2.10. reports.py - produce various metrics and reports

Functions to create reports from genomics pipeline data.

usage: reports.py subcommand

2.10.1. subcommands



Possible choices: assembly_stats, coverage_only, consolidate_fastqc, consolidate_spike_count, aggregate_spike_count, aggregate_alignment_counts, plot_coverage, align_and_plot_coverage, fastqc

2.10.2. Sub-commands

2.10.2.1. assembly_stats

Fetch assembly-level statistics for a given sample

reports.py assembly_stats [-h]
                          [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
                          [--assembly_dir ASSEMBLY_DIR]
                          [--assembly_tmp ASSEMBLY_TMP]
                          [--align_dir ALIGN_DIR] [--reads_dir READS_DIR]
                          [--raw_reads_dir RAW_READS_DIR]
                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                          [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                          samples [samples ...] outFile

2.10.2.1.1. Positional Arguments

samples

Sample names.

outFile

Output report file.

2.10.2.1.2. Named Arguments

--cov_thresholds

Genome coverage thresholds to report on. (default: (1, 5, 20, 100))

Default: (1, 5, 20, 100)

--assembly_dir

Directory with assembly outputs. (default: ‘data/02_assembly’)

Default: 'data/02_assembly'

--assembly_tmp

Directory with assembly temp files. (default: ‘tmp/02_assembly’)

Default: 'tmp/02_assembly'

--align_dir

Directory with reads aligned to own assembly. (default: ‘data/02_align_to_self’)

Default: 'data/02_align_to_self'

--reads_dir

Directory with unaligned filtered read BAMs. (default: ‘data/01_per_sample’)

Default: 'data/01_per_sample'

--raw_reads_dir

Directory with unaligned raw read BAMs. (default: ‘data/00_raw’)

Default: 'data/00_raw'

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False
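Because --cov_thresholds accepts one or more values, an argparse-style parser will keep consuming arguments after it until it reaches the next option. The sketch below uses a minimal stand-in parser (not the actual reports.py parser; the int type and the sample names are assumptions) to show that placing the positionals first and --cov_thresholds last parses cleanly, while placing the option directly before the sample names does not.

```python
import argparse

# Stand-in parser that mirrors only three arguments from the usage line above.
# The real reports.py parser is not shown in this document; type=int is assumed.
parser = argparse.ArgumentParser(prog="reports.py assembly_stats")
parser.add_argument("--cov_thresholds", nargs="+", type=int,
                    default=(1, 5, 20, 100))
parser.add_argument("samples", nargs="+")
parser.add_argument("outFile")

# Positionals first, multi-value option last: unambiguous.
args = parser.parse_args(
    ["sampleA", "sampleB", "report.txt", "--cov_thresholds", "1", "10"]
)

# Multi-value option directly before the positionals: the option swallows
# the sample names, and argparse exits with an error.
try:
    parser.parse_args(
        ["--cov_thresholds", "1", "10", "sampleA", "sampleB", "report.txt"]
    )
    option_first_ok = True
except SystemExit:
    option_first_ok = False
```

If the option must come first, another named argument placed between it and the positionals also ends the value list.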

2.10.2.2. coverage_only

reports.py coverage_only [-h]
                         [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
                         [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                         [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                         mapped_bams [mapped_bams ...] out_report

2.10.2.2.1. Positional Arguments

mapped_bams

Aligned-to-self mapped bam files.

out_report

Output report file.

2.10.2.2.2. Named Arguments

--cov_thresholds

Genome coverage thresholds to report on. (default: (1, 5, 20, 100))

Default: (1, 5, 20, 100)

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False
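As an illustration of what a coverage threshold report might contain, the following sketch counts how many genome positions reach each threshold. It assumes that a threshold of N means "depth of at least N reads at a position"; the function name and the input list are hypothetical, and the actual columns written by reports.py are not described here.

```python
def count_positions_at_thresholds(depths, thresholds=(1, 5, 20, 100)):
    """Return {threshold: number of positions with depth >= threshold}."""
    return {t: sum(1 for d in depths if d >= t) for t in thresholds}


# Hypothetical per-position read depths for a seven-base genome.
depths = [0, 1, 4, 5, 19, 20, 150]
counts = count_positions_at_thresholds(depths)
print(counts)  # {1: 6, 5: 4, 20: 2, 100: 1}
```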

2.10.2.3. consolidate_fastqc

Consolidate multiple FASTQC reports into one.

reports.py consolidate_fastqc [-h]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                              inDirs [inDirs ...] outFile

2.10.2.3.1. Positional Arguments

inDirs

Input FASTQC directories.

outFile

Output report file.

2.10.2.3.2. Named Arguments

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False

2.10.2.4. consolidate_spike_count

Consolidate multiple spike count reports into one.

reports.py consolidate_spike_count [-h]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   inDir outFile

2.10.2.4.1. Positional Arguments

inDir

Input spike count directory.

outFile

Output report file.

2.10.2.4.2. Named Arguments

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False

2.10.2.5. aggregate_spike_count

Aggregate multiple spike count reports into one.

reports.py aggregate_spike_count [-h]
                                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                 [--version] [--tmp_dir TMP_DIR]
                                 [--tmp_dirKeep]
                                 inDir outFile

2.10.2.5.1. Positional Arguments

inDir

Input spike count directory.

outFile

Output report file.

2.10.2.5.2. Named Arguments

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False

2.10.2.6. aggregate_alignment_counts

Aggregate multiple reports from read_utils.py bwamem_idxstats into one report.

reports.py aggregate_alignment_counts [-h]
                                      [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                      [--version] [--tmp_dir TMP_DIR]
                                      [--tmp_dirKeep]
                                      in_reports [in_reports ...] outFile

2.10.2.6.1. Positional Arguments

in_reports

TSV reports with alignment counts from read_utils.py bwamem_idxstats.

outFile

Output report file.

2.10.2.6.2. Named Arguments

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False

2.10.2.7. plot_coverage

Generate a coverage plot from an aligned BAM file.

reports.py plot_coverage [-h] [--plotFormat] [--plotDataStyle] [--plotStyle]
                         [--plotWidth PLOT_WIDTH] [--plotHeight PLOT_HEIGHT]
                         [--plotDPI PLOT_DPI] [--plotTitle PLOT_TITLE]
                         [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
                         [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
                         [-q BASE_Q_THRESHOLD] [-Q MAPPING_Q_THRESHOLD]
                         [-m MAX_COVERAGE_DEPTH] [-l READ_LENGTH_THRESHOLD]
                         [--binLargePlots]
                         [--binningSummaryStatistic {max,min,mean,median}]
                         [--outSummary OUT_SUMMARY] [--plotOnlyNonDuplicates]
                         [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                         [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                         in_bam out_plot_file

2.10.2.7.1. Positional Arguments

in_bam

Input reads, BAM format.

out_plot_file

The generated chart file

2.10.2.7.2. Named Arguments

--plotFormat

Possible choices: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp

File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat. Valid formats include: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp

--plotDataStyle

Possible choices: filled, line, dots

The plot data display style. Valid options: filled, line, dots (default: ‘filled’)

Default: 'filled'

--plotStyle

Possible choices: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10

The plot visual style. Valid options: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10 (default: ‘ggplot’)

Default: 'ggplot'

--plotWidth

Width of the plot in pixels (default: 880)

Default: 880

--plotHeight

Height of the plot in pixels (default: 680)

Default: 680

--plotDPI

Dots per inch for rendered output; more useful for vector modes (default: 100.0)

Default: 100.0
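Plotting libraries that size figures in inches relate pixel dimensions to DPI by a simple division. The sketch below shows that arithmetic for the default --plotWidth, --plotHeight, and --plotDPI values; the helper name is hypothetical, and whether reports.py performs exactly this conversion internally is an assumption.

```python
def figure_size_inches(width_px=880, height_px=680, dpi=100.0):
    """Convert pixel dimensions to (width, height) in inches at a given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


size = figure_size_inches()
print(size)  # (8.8, 6.8)
```

Under this assumption, doubling the DPI while keeping the pixel dimensions fixed halves the size in inches.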

--plotTitle

The title displayed on the coverage plot (default: ‘Coverage Plot’)

Default: 'Coverage Plot'

--plotXLimits

Limits on the x-axis of the coverage plot; args are ‘<min> <max>’

--plotYLimits

Limits on the y-axis of the coverage plot; args are ‘<min> <max>’

-q

The minimum base quality threshold

-Q

The minimum mapping quality threshold

-m

The max coverage depth (default: None)

-l

Read length threshold

--binLargePlots

Plot summary read depth in one-pixel-width bins for large plots.

Default: False

--binningSummaryStatistic

Possible choices: max, min, mean, median

Statistic used to summarize each bin (max, min, mean, or median).

Default: 'max'

--outSummary

Coverage summary TSV file. Default is to write to temp.

--plotOnlyNonDuplicates

Plot only non-duplicates (samtools -F 1024), coverage counted by bedtools rather than samtools.

Default: False

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False
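The --binLargePlots and --binningSummaryStatistic options describe reducing a long depth track to one value per bin. The following is a minimal sketch of that idea using equal-width bins; the function name and the bin boundaries are assumptions, and the exact binning used by reports.py is not documented here.

```python
import statistics

# Map each documented statistic name to a function over a list of depths.
SUMMARY_FUNCS = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
}


def bin_depths(depths, n_bins, statistic="max"):
    """Summarize per-position depths into n_bins roughly equal-width bins."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    summarize = SUMMARY_FUNCS[statistic]
    n = len(depths)
    if n <= n_bins:
        # Nothing to bin: at most one position per bin already.
        return list(depths)
    binned = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        binned.append(summarize(depths[start:end]))
    return binned


# Hypothetical depths for eight positions, drawn into a four-pixel-wide plot.
depths = [0, 2, 4, 10, 3, 3, 8, 1]
binned_max = bin_depths(depths, 4)           # [2, 10, 3, 8]
binned_min = bin_depths(depths, 4, "min")    # [0, 4, 3, 1]
```

With max as the statistic, narrow coverage spikes stay visible after binning, while min highlights dropouts instead.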

2.10.2.8. align_and_plot_coverage

Take reads, align them to a reference (with BWA-MEM by default, or Novoalign via --aligner), and generate a coverage plot.

reports.py align_and_plot_coverage [-h] [--plotFormat] [--plotDataStyle]
                                   [--plotStyle] [--plotWidth PLOT_WIDTH]
                                   [--plotHeight PLOT_HEIGHT]
                                   [--plotDPI PLOT_DPI]
                                   [--plotTitle PLOT_TITLE]
                                   [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
                                   [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
                                   [-q BASE_Q_THRESHOLD]
                                   [-Q MAPPING_Q_THRESHOLD]
                                   [-m MAX_COVERAGE_DEPTH]
                                   [-l READ_LENGTH_THRESHOLD]
                                   [--binLargePlots]
                                   [--binningSummaryStatistic {max,min,mean,median}]
                                   [--outSummary OUT_SUMMARY]
                                   [--outBam OUT_BAM] [--sensitive]
                                   [--excludeDuplicates]
                                   [--JVMmemory JVMMEMORY]
                                   [--picardOptions [PICARDOPTIONS ...]]
                                   [--minScoreToFilter MIN_SCORE_TO_FILTER]
                                   [--aligner {novoalign,bwa}]
                                   [--aligner_options ALIGNER_OPTIONS]
                                   [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   in_bam out_plot_file ref_fasta

2.10.2.8.1. Positional Arguments

in_bam

Input reads, BAM format.

out_plot_file

The generated chart file

ref_fasta

Reference genome, FASTA format.

2.10.2.8.2. Named Arguments

--plotFormat

Possible choices: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp

File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat. Valid formats include: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp

--plotDataStyle

Possible choices: filled, line, dots

The plot data display style. Valid options: filled, line, dots (default: ‘filled’)

Default: 'filled'

--plotStyle

Possible choices: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10

The plot visual style. Valid options: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10 (default: ‘ggplot’)

Default: 'ggplot'

--plotWidth

Width of the plot in pixels (default: 880)

Default: 880

--plotHeight

Height of the plot in pixels (default: 680)

Default: 680

--plotDPI

Dots per inch for rendered output; more useful for vector modes (default: 100.0)

Default: 100.0

--plotTitle

The title displayed on the coverage plot (default: ‘Coverage Plot’)

Default: 'Coverage Plot'

--plotXLimits

Limits on the x-axis of the coverage plot; args are ‘<min> <max>’

--plotYLimits

Limits on the y-axis of the coverage plot; args are ‘<min> <max>’

-q

The minimum base quality threshold

-Q

The minimum mapping quality threshold

-m

The max coverage depth (default: None)

-l

Read length threshold

--binLargePlots

Plot summary read depth in one-pixel-width bins for large plots.

Default: False

--binningSummaryStatistic

Possible choices: max, min, mean, median

Statistic used to summarize each bin (max, min, mean, or median).

Default: 'max'

--outSummary

Coverage summary TSV file. Default is to write to temp.

--outBam

Output aligned, indexed BAM file. Default is to write to temp.

--sensitive

Equivalent to giving bwa: ‘-k 12 -A 1 -B 1 -O 1 -E 1’. Only relevant if the bwa aligner is selected (the default).

Default: False

--excludeDuplicates

MarkDuplicates with Picard and only plot non-duplicates

Default: False

--JVMmemory

JVM virtual memory size (default: ‘2g’)

Default: '2g'

--picardOptions

Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value …

Default: []

--minScoreToFilter

Filter bwa alignments using this value as the minimum allowed alignment score. Specifically, sum the alignment scores across all alignments for each query (including reads in a pair, supplementary and secondary alignments) and then only include, in the output, queries whose summed alignment score is at least this value. This is only applied when the aligner is ‘bwa’. The filtering on a summed alignment score is sensible for reads in a pair and supplementary alignments, but may not be reasonable if bwa outputs secondary alignments (i.e., if ‘-a’ is in the aligner options). (default: not set - i.e., do not filter bwa’s output)
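The summed-score filter described above can be sketched as follows. This is an illustration only: alignments are modeled as hypothetical (query name, alignment score) pairs rather than real BAM records, and the function name is not part of reports.py.

```python
from collections import defaultdict


def filter_by_summed_score(alignments, min_score):
    """Keep all alignments of queries whose summed score is >= min_score.

    `alignments` is an iterable of (query_name, alignment_score) pairs;
    both mates of a pair and any supplementary or secondary alignments
    are assumed to share the same query name.
    """
    alignments = list(alignments)
    totals = defaultdict(int)
    for name, score in alignments:
        totals[name] += score
    return [(name, score) for name, score in alignments
            if totals[name] >= min_score]


# Hypothetical records: readA and readB are pairs, readC is a single read.
alignments = [
    ("readA", 60), ("readA", 55),   # summed score 115
    ("readB", 30), ("readB", 20),   # summed score 50
    ("readC", 90),                  # summed score 90
]
kept = filter_by_summed_score(alignments, 100)
print(kept)  # [('readA', 60), ('readA', 55)]
```

The example also shows why secondary alignments can distort the filter: each extra alignment reported for a query adds to its sum, so a query could pass the threshold without any single good alignment.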

--aligner

Possible choices: novoalign, bwa

aligner (default: ‘bwa’)

Default: 'bwa'

--aligner_options

aligner options (default for novoalign: “-r Random -l 40 -g 40 -x 20 -t 100 -k”; for bwa: bwa defaults)

--NOVOALIGN_LICENSE_PATH

A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable. (default: None)

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False

2.10.2.9. fastqc

reports.py fastqc [-h] [--out_zip OUT_ZIP] [--threads THREADS] inBam out_html

2.10.2.9.1. Positional Arguments

inBam

Input reads, BAM format.

out_html

Output report, HTML format.

2.10.2.9.2. Named Arguments

--out_zip

Output data, zip archive.

--threads

Number of threads.
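When the fastqc subcommand is driven from another script, the argument list can be assembled programmatically. The sketch below builds such a list from the usage line above; the helper name and the file names are hypothetical, and it assumes reports.py is available on the PATH.

```python
import shlex


def build_fastqc_command(in_bam, out_html, out_zip=None, threads=None):
    """Build an argv list for `reports.py fastqc` from the documented options."""
    cmd = ["reports.py", "fastqc"]
    if out_zip is not None:
        cmd += ["--out_zip", out_zip]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    cmd += [in_bam, out_html]
    return cmd


# Hypothetical input and output file names.
cmd = build_fastqc_command("sample1.bam", "sample1_fastqc.html", threads=4)
print(shlex.join(cmd))
# reports.py fastqc --threads 4 sample1.bam sample1_fastqc.html
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with unusual file names.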