3.7. reports.py - produce various metrics and reports

Functions to create reports from genomics pipeline data.

usage: reports.py subcommand

Fetch assembly-level statistics for a given sample

usage: reports.py assembly_stats [-h]
                                 [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
                                 [--assembly_dir ASSEMBLY_DIR]
                                 [--assembly_tmp ASSEMBLY_TMP]
                                 [--align_dir ALIGN_DIR]
                                 [--reads_dir READS_DIR]
                                 [--raw_reads_dir RAW_READS_DIR]
                                 samples [samples ...] outFile
Positional arguments:
samples Sample names.
outFile Output report file.
Options:
--cov_thresholds=(1, 5, 20, 100)
 Genome coverage thresholds to report on. (default: (1, 5, 20, 100))
--assembly_dir Directory with assembly outputs. (default: %(default)s)
--assembly_tmp Directory with assembly temp files. (default: %(default)s)
--align_dir Directory with reads aligned to own assembly. (default: %(default)s)
--reads_dir Directory with unaligned filtered read BAMs. (default: %(default)s)
--raw_reads_dir Directory with unaligned raw read BAMs. (default: %(default)s)
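
Both samples and --cov_thresholds accept several values, so argument order matters. The following is a minimal sketch, assuming reports.py builds its parser with Python's argparse (the usage format suggests this). The parser below is a hypothetical mirror written for illustration, not the script's own code, and type=int for the thresholds and the sample and file names are assumptions. Placing the positionals first keeps the sample names and output file from being consumed as threshold values.

```python
import argparse

# Hypothetical mirror of the assembly_stats interface (not the real parser).
parser = argparse.ArgumentParser(prog="reports.py assembly_stats")
parser.add_argument("samples", nargs="+", help="Sample names.")
parser.add_argument("outFile", help="Output report file.")
parser.add_argument(
    "--cov_thresholds",
    nargs="+",
    type=int,  # assumption: thresholds are integers
    default=(1, 5, 20, 100),
    help="Genome coverage thresholds to report on.",
)

# Positionals first, then the multi-valued option.
args = parser.parse_args(
    ["sampleA", "sampleB", "assembly_stats.txt", "--cov_thresholds", "1", "10"]
)
print(args.samples, args.outFile, args.cov_thresholds)
```

If --cov_thresholds came first, its value list would extend up to the next option flag, so the sample names would be read as thresholds and rejected.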

Write or print pairwise alignment summary information for sequences in two FASTA files, including SNPs, ambiguous bases, and indels.

usage: reports.py alignment_summary [-h] [--outfileName OUTFILENAME]
                                    inFastaFileOne inFastaFileTwo
Positional arguments:
inFastaFileOne First FASTA file for an alignment
inFastaFileTwo Second FASTA file for an alignment
Options:
--outfileName Output file for counts in TSV format
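
To illustrate what a pairwise summary of SNPs, ambiguous bases, and indels can look like, here is a minimal sketch. The function name summarize_pair and its counting rules are assumptions made for this example: indels are counted per gapped column, and any non-ACGT, non-gap character counts as ambiguous. The definitions used by alignment_summary itself may differ.

```python
UNAMBIGUOUS = set("ACGT")

def summarize_pair(seq_one, seq_two):
    """Count SNPs, ambiguous positions, and gapped columns in two aligned sequences."""
    if len(seq_one) != len(seq_two):
        raise ValueError("aligned sequences must have equal length")
    counts = {"snps": 0, "ambiguous": 0, "indels": 0}
    for a, b in zip(seq_one.upper(), seq_two.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences carries no information
        if a == "-" or b == "-":
            counts["indels"] += 1  # one sequence has a base, the other a gap
        elif a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            counts["ambiguous"] += 1  # e.g. N or another IUPAC code
        elif a != b:
            counts["snps"] += 1  # two unambiguous bases that differ
    return counts

summary = summarize_pair("ACGTNAC-T", "ACCTAACGT")
print(summary)
```

In this toy pair there is one mismatch (G/C), one ambiguous position (N/A), and one gapped column (-/G).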

Consolidate multiple FASTQC reports into one.

usage: reports.py consolidate_fastqc [-h] inDirs [inDirs ...] outFile
Positional arguments:
inDirs Input FASTQC directories.
outFile Output report file.

Consolidate multiple spike count reports into one.

usage: reports.py consolidate_spike_count [-h] inDir outFile
Positional arguments:
inDir Input spike count directory.
outFile Output report file.

Generate a coverage plot from an aligned bam file

usage: reports.py plot_coverage [-h] [--plotFormat] [--plotDataStyle]
                                [--plotStyle] [--plotWidth PLOT_WIDTH]
                                [--plotHeight PLOT_HEIGHT]
                                [--plotDPI PLOT_DPI] [--plotTitle PLOT_TITLE]
                                [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
                                [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
                                [-q BASE_Q_THRESHOLD] [-Q MAPPING_Q_THRESHOLD]
                                [-m MAX_COVERAGE_DEPTH]
                                [-l READ_LENGTH_THRESHOLD]
                                [--outSummary OUT_SUMMARY]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                in_bam out_plot_file
Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file

Options:
--plotFormat File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat.

Possible choices: pgf, raw, tiff, eps, svgz, pdf, png, tif, ps, jpg, svg, rgba, jpeg

--plotDataStyle The plot data display style. (default: %(default)s)

Possible choices: filled, line, dots

--plotStyle The plot visual style. (default: %(default)s)

Possible choices: dark_background, fivethirtyeight, seaborn-pastel, seaborn-muted, seaborn-bright, seaborn-dark, seaborn-white, seaborn-paper, seaborn-darkgrid, grayscale, seaborn-whitegrid, seaborn-colorblind, seaborn-ticks, seaborn-deep, ggplot, seaborn-dark-palette, bmh, seaborn-talk, classic, seaborn-poster, seaborn-notebook

--plotWidth Width of the plot in pixels (default: %(default)s)
--plotHeight Height of the plot in pixels (default: %(default)s)
--plotDPI=80.0 Dots per inch for rendered output; more useful for vector modes (default: 80.0)
--plotTitle=Coverage Plot
 The title displayed on the coverage plot (default: ‘Coverage Plot’)
--plotXLimits Limits on the x-axis of the coverage plot; args are ‘<min> <max>’
--plotYLimits Limits on the y-axis of the coverage plot; args are ‘<min> <max>’
-q The minimum base quality threshold
-Q The minimum mapping quality threshold
-m The max coverage depth (default: %(default)s)
-l Read length threshold
--outSummary Coverage summary TSV file. Default is to write to temp.
 Plot only non-duplicates (samtools -F 1024), coverage counted by bedtools rather than samtools.

--loglevel Verbosity of output. [default: %(default)s]

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
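
The plot format is inferred from the extension of out_plot_file, and the plot size is given in pixels together with a DPI value. The sketch below shows how those settings could be resolved. The helper names and the 1024x768 pixel size are hypothetical. The pixel-to-inch conversion assumes a plotting library that sizes figures in inches, as matplotlib does, and is not taken from the reports.py source.

```python
import os

# Formats listed for --plotFormat in this section.
VALID_FORMATS = {
    "pgf", "raw", "tiff", "eps", "svgz", "pdf", "png",
    "tif", "ps", "jpg", "svg", "rgba", "jpeg",
}

def resolve_plot_format(out_plot_file, plot_format=None):
    """Use an explicit format if given, else infer it from the file extension."""
    if plot_format is None:
        plot_format = os.path.splitext(out_plot_file)[1].lstrip(".").lower()
    if plot_format not in VALID_FORMATS:
        raise ValueError("unsupported plot format: %r" % plot_format)
    return plot_format

def figure_size_inches(width_px, height_px, dpi=80.0):
    """Convert a pixel size to inches for a given DPI (hypothetical helper)."""
    return (width_px / dpi, height_px / dpi)

print(resolve_plot_format("coverage.pdf"))         # inferred from extension
print(resolve_plot_format("coverage.out", "png"))  # explicit override
print(figure_size_inches(1024, 768, dpi=80.0))
```

An explicit format is useful when the output filename has no extension or one that is not in the list of valid formats.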

Take reads, align to reference with BWA-MEM, and generate a coverage plot

usage: reports.py align_and_plot_coverage [-h] [--plotFormat]
                                          [--plotDataStyle] [--plotStyle]
                                          [--plotWidth PLOT_WIDTH]
                                          [--plotHeight PLOT_HEIGHT]
                                          [--plotDPI PLOT_DPI]
                                          [--plotTitle PLOT_TITLE]
                                          [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
                                          [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
                                          [-q BASE_Q_THRESHOLD]
                                          [-Q MAPPING_Q_THRESHOLD]
                                          [-m MAX_COVERAGE_DEPTH]
                                          [-l READ_LENGTH_THRESHOLD]
                                          [--outSummary OUT_SUMMARY]
                                          [--outBam OUT_BAM] [--sensitive]
                                          [--JVMmemory JVMMEMORY]
                                          [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                          [--minScoreToFilter MIN_SCORE_TO_FILTER]
                                          [--aligner {novoalign,bwa}]
                                          [--aligner_options ALIGNER_OPTIONS]
                                          [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version] [--tmp_dir TMP_DIR]
                                          in_bam out_plot_file ref_fasta
Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file
ref_fasta Reference genome, FASTA format.

Options:
--plotFormat File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat.

Possible choices: pgf, raw, tiff, eps, svgz, pdf, png, tif, ps, jpg, svg, rgba, jpeg

--plotDataStyle The plot data display style. (default: %(default)s)

Possible choices: filled, line, dots

--plotStyle The plot visual style. (default: %(default)s)

Possible choices: dark_background, fivethirtyeight, seaborn-pastel, seaborn-muted, seaborn-bright, seaborn-dark, seaborn-white, seaborn-paper, seaborn-darkgrid, grayscale, seaborn-whitegrid, seaborn-colorblind, seaborn-ticks, seaborn-deep, ggplot, seaborn-dark-palette, bmh, seaborn-talk, classic, seaborn-poster, seaborn-notebook

--plotWidth Width of the plot in pixels (default: %(default)s)
--plotHeight Height of the plot in pixels (default: %(default)s)
--plotDPI=80.0 Dots per inch for rendered output; more useful for vector modes (default: 80.0)
--plotTitle=Coverage Plot
 The title displayed on the coverage plot (default: ‘Coverage Plot’)
--plotXLimits Limits on the x-axis of the coverage plot; args are ‘<min> <max>’
--plotYLimits Limits on the y-axis of the coverage plot; args are ‘<min> <max>’
-q The minimum base quality threshold
-Q The minimum mapping quality threshold
-m The max coverage depth (default: %(default)s)
-l Read length threshold
--outSummary Coverage summary TSV file. Default is to write to temp.
--outBam Output aligned, indexed BAM file. Default is to write to temp.
--sensitive Equivalent to giving bwa: ‘-k 12 -A 1 -B 1 -O 1 -E 1’. Only relevant if the bwa aligner is selected (the default).
 MarkDuplicates with Picard and only plot non-duplicates
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
--minScoreToFilter Filter bwa alignments using this value as the minimum allowed alignment score. Specifically, sum the alignment scores across all alignments for each query (including reads in a pair, supplementary and secondary alignments) and then only include, in the output, queries whose summed alignment score is at least this value. This is only applied when the aligner is ‘bwa’. The filtering on a summed alignment score is sensible for reads in a pair and supplementary alignments, but may not be reasonable if bwa outputs secondary alignments (i.e., if ‘-a’ is in the aligner options). (default: not set - i.e., do not filter bwa’s output)

--aligner Aligner to use. (default: bwa)

Possible choices: novoalign, bwa

--aligner_options Aligner options (default for novoalign: “-r Random -l 40 -g 40 -x 20 -t 100 -k”; for bwa: bwa defaults)
--NOVOALIGN_LICENSE_PATH A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable. (default: %(default)s)

--loglevel Verbosity of output. [default: %(default)s]

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
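
The summed-score filter described for --minScoreToFilter can be sketched as follows. This is an illustration only: the function name filter_by_summed_score and the (query name, alignment score) tuples are assumptions chosen for the example, and the real command works on BAM records rather than Python tuples.

```python
from collections import defaultdict

def filter_by_summed_score(alignments, min_score):
    """Keep alignments whose query has a summed alignment score >= min_score.

    `alignments` is an iterable of (query_name, alignment_score) pairs; all
    alignments sharing a query name (both mates, supplementary, secondary)
    contribute to that query's sum.
    """
    alignments = list(alignments)
    totals = defaultdict(int)
    for query_name, score in alignments:
        totals[query_name] += score
    return [(q, s) for q, s in alignments if totals[q] >= min_score]

toy_alignments = [
    ("read1", 60), ("read1", 50),  # pair sums to 110
    ("read2", 40),                 # single alignment, sum 40
    ("read3", 30), ("read3", 20),  # pair sums to 50
]
kept = filter_by_summed_score(toy_alignments, min_score=60)
print(kept)
```

Because the sum runs over every alignment of a query, secondary alignments emitted with ‘-a’ inflate the total, which is why the option text cautions against combining the two.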

usage: reports.py fastqc [-h] inBam outHtml
Positional arguments:
inBam Input reads, BAM format.
outHtml Output report, HTML format.