3.7. reports.py - produce various metrics and reports

Functions to create reports from genomics pipeline data.

usage: reports.py subcommand
Sub-commands:
assembly_stats

Fetch assembly-level statistics for a given sample

usage: reports.py assembly_stats [-h]
                                 [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
                                 [--assembly_dir ASSEMBLY_DIR]
                                 [--assembly_tmp ASSEMBLY_TMP]
                                 [--align_dir ALIGN_DIR]
                                 [--reads_dir READS_DIR]
                                 [--raw_reads_dir RAW_READS_DIR]
                                 samples [samples ...] outFile
Positional arguments:
samples Sample names.
outFile Output report file.
Options:
--cov_thresholds=(1, 5, 20, 100)
 Genome coverage thresholds to report on. (default: 1, 5, 20, 100)
--assembly_dir=data/02_assembly
 Directory with assembly outputs. (default: data/02_assembly)
--assembly_tmp=tmp/02_assembly
 Directory with assembly temp files. (default: tmp/02_assembly)
--align_dir=data/02_align_to_self
 Directory with reads aligned to own assembly. (default: data/02_align_to_self)
--reads_dir=data/01_per_sample
 Directory with unaligned filtered read BAMs. (default: data/01_per_sample)
--raw_reads_dir=data/00_raw
 Directory with unaligned raw read BAMs. (default: data/00_raw)
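
As a rough illustration, the usage above could be assembled from Python as shown below. This is only a sketch: the helper name build_assembly_stats_cmd, the sample names, and the output path are hypothetical, and it assumes the command line is parsed with standard argparse behaviour.

```python
import shlex


def build_assembly_stats_cmd(samples, out_file, cov_thresholds=None, assembly_dir=None):
    """Build an argv list for `reports.py assembly_stats` (hypothetical helper)."""
    # Positionals go first: under standard argparse behaviour, a multi-value
    # flag such as --cov_thresholds would otherwise consume the names after it.
    cmd = ["reports.py", "assembly_stats", *samples, out_file]
    if assembly_dir is not None:
        cmd += ["--assembly_dir", assembly_dir]
    if cov_thresholds:
        cmd += ["--cov_thresholds", *(str(t) for t in cov_thresholds)]
    return cmd


# Hypothetical sample names and output path, for illustration only.
cmd = build_assembly_stats_cmd(
    ["sampleA", "sampleB"], "assembly_stats.txt", cov_thresholds=[1, 5, 20]
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run unchanged; options left as None fall back to the defaults listed above.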
coverage_only

usage: reports.py coverage_only [-h]
                                [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version]
                                mapped_bams [mapped_bams ...] out_report
Positional arguments:
mapped_bams Aligned-to-self mapped bam files.
out_report Output report file.
Options:
--cov_thresholds=(1, 5, 20, 100)
 Genome coverage thresholds to report on. (default: 1, 5, 20, 100)
--loglevel=INFO

Verbosity of output. (default: INFO)

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

--version, -V show program’s version number and exit
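
To make the role of --cov_thresholds concrete, here is a minimal sketch. It assumes each threshold is reported as the number of genome positions whose read depth meets or exceeds it. The function name and the depth values are hypothetical, and the actual report columns written by reports.py are not described in this section.

```python
def bases_at_or_above(depths, thresholds=(1, 5, 20, 100)):
    """Count positions whose read depth is >= each threshold (assumed meaning)."""
    return {t: sum(1 for d in depths if d >= t) for t in thresholds}


# Hypothetical per-position depths for a six-base region.
depths = [0, 3, 7, 25, 120, 18]
print(bases_at_or_above(depths))
```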
alignment_summary

Write or print pairwise alignment summary information for sequences in two FASTA files, including SNPs, ambiguous bases, and indels.

usage: reports.py alignment_summary [-h] [--outfileName OUTFILENAME]
                                    [--printCounts]
                                    inFastaFileOne inFastaFileTwo
Positional arguments:
inFastaFileOne First FASTA file for an alignment
inFastaFileTwo Second FASTA file for an alignment
Options:
--outfileName Output file for counts in TSV format
--printCounts=False
 Undocumented
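
The kind of tally this sub-command reports can be sketched as follows. This is not the tool's implementation: it assumes the two sequences are already aligned to equal length with "-" marking gaps, treats any non-ACGT base as ambiguous, and counts indels per alignment column. The function name and the example sequences are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")


def summarize_pair(seq_a, seq_b):
    """Tally SNPs, ambiguous bases, and indel columns for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"snps": 0, "ambiguous": 0, "indel_columns": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: nothing to compare
        if a == "-" or b == "-":
            counts["indel_columns"] += 1
        elif a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            counts["ambiguous"] += 1
        elif a != b:
            counts["snps"] += 1
    return counts


# Hypothetical aligned sequences: one SNP, one gap column, one ambiguous base.
print(summarize_pair("ACGT-NA", "ACCTGAA"))
```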
consolidate_fastqc

Consolidate multiple FASTQC reports into one.

usage: reports.py consolidate_fastqc [-h] inDirs [inDirs ...] outFile
Positional arguments:
inDirs Input FASTQC directories.
outFile Output report file.
consolidate_spike_count

Consolidate multiple spike count reports into one.

usage: reports.py consolidate_spike_count [-h] inDir outFile
Positional arguments:
inDir Input spike count directory.
outFile Output report file.
plot_coverage

Generate a coverage plot from an aligned bam file

usage: reports.py plot_coverage [-h] [--plotFormat] [--plotDataStyle]
                                [--plotStyle] [--plotWidth PLOT_WIDTH]
                                [--plotHeight PLOT_HEIGHT]
                                [--plotDPI PLOT_DPI] [--plotTitle PLOT_TITLE]
                                [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
                                [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
                                [-q BASE_Q_THRESHOLD] [-Q MAPPING_Q_THRESHOLD]
                                [-m MAX_COVERAGE_DEPTH]
                                [-l READ_LENGTH_THRESHOLD]
                                [--outSummary OUT_SUMMARY]
                                [--plotOnlyNonDuplicates]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                [--tmp_dirKeep]
                                in_bam out_plot_file
Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file
Options:
--plotFormat

File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat. Valid formats include: eps, tif, pgf, svgz, pdf, jpg, svg, jpeg, tiff, png, rgba, raw, ps

Possible choices: eps, tif, pgf, svgz, pdf, jpg, svg, jpeg, tiff, png, rgba, raw, ps

--plotDataStyle=filled

The plot data display style. Valid options: filled, line, dots (default: filled)

Possible choices: filled, line, dots

--plotStyle=ggplot

The plot visual style. Valid options: grayscale, seaborn-pastel, seaborn-whitegrid, fivethirtyeight, bmh, seaborn-muted, seaborn-paper, seaborn-colorblind, seaborn-dark, seaborn-darkgrid, seaborn-deep, seaborn-poster, seaborn-dark-palette, seaborn-notebook, seaborn-ticks, seaborn-talk, classic, dark_background, seaborn-white, seaborn-bright, ggplot (default: ggplot)

Possible choices: grayscale, seaborn-pastel, seaborn-whitegrid, fivethirtyeight, bmh, seaborn-muted, seaborn-paper, seaborn-colorblind, seaborn-dark, seaborn-darkgrid, seaborn-deep, seaborn-poster, seaborn-dark-palette, seaborn-notebook, seaborn-ticks, seaborn-talk, classic, dark_background, seaborn-white, seaborn-bright, ggplot

--plotWidth=880
 Width of the plot in pixels (default: 880)
--plotHeight=680
 Height of the plot in pixels (default: 680)
--plotDPI=80.0 Dots per inch for rendered output, more useful for vector modes (default: 80.0)
--plotTitle=Coverage Plot
 The title displayed on the coverage plot (default: ‘Coverage Plot’)
--plotXLimits Limits on the x-axis of the coverage plot; args are ‘<min> <max>’
--plotYLimits Limits on the y-axis of the coverage plot; args are ‘<min> <max>’
-q The minimum base quality threshold
-Q The minimum mapping quality threshold
-m The max coverage depth (default: %(default)s)
-l Read length threshold
--outSummary Coverage summary TSV file. Default is to write to temp.
--plotOnlyNonDuplicates=False
 Plot only non-duplicates (samtools -F 1024), coverage counted by bedtools rather than samtools.
--loglevel=INFO

Verbosity of output. (default: INFO)

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
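
The --plotFormat description above says the format is inferred from the extension of out_plot_file unless it is set explicitly. A minimal sketch of that rule follows. The function name is hypothetical, the lower-casing of the extension is an assumption, and the tool's real logic may differ.

```python
import os

# Formats listed under "Possible choices" for --plotFormat.
VALID_FORMATS = {
    "eps", "tif", "pgf", "svgz", "pdf", "jpg", "svg",
    "jpeg", "tiff", "png", "rgba", "raw", "ps",
}


def resolve_plot_format(out_plot_file, plot_format=None):
    """Return the explicit format if given, else infer it from the extension."""
    if plot_format is None:
        # Assumption: extensions are matched case-insensitively.
        plot_format = os.path.splitext(out_plot_file)[1].lstrip(".").lower()
    if plot_format not in VALID_FORMATS:
        raise ValueError(f"unsupported plot format: {plot_format!r}")
    return plot_format


print(resolve_plot_format("coverage.pdf"))         # inferred from the extension
print(resolve_plot_format("coverage.out", "png"))  # explicit value takes priority
```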
align_and_plot_coverage

Take reads, align to reference with BWA-MEM, and generate a coverage plot

usage: reports.py align_and_plot_coverage [-h] [--plotFormat]
                                          [--plotDataStyle] [--plotStyle]
                                          [--plotWidth PLOT_WIDTH]
                                          [--plotHeight PLOT_HEIGHT]
                                          [--plotDPI PLOT_DPI]
                                          [--plotTitle PLOT_TITLE]
                                          [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
                                          [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
                                          [-q BASE_Q_THRESHOLD]
                                          [-Q MAPPING_Q_THRESHOLD]
                                          [-m MAX_COVERAGE_DEPTH]
                                          [-l READ_LENGTH_THRESHOLD]
                                          [--outSummary OUT_SUMMARY]
                                          [--outBam OUT_BAM] [--sensitive]
                                          [--excludeDuplicates]
                                          [--JVMmemory JVMMEMORY]
                                          [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                          [--minScoreToFilter MIN_SCORE_TO_FILTER]
                                          [--aligner {novoalign,bwa}]
                                          [--aligner_options ALIGNER_OPTIONS]
                                          [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version] [--tmp_dir TMP_DIR]
                                          [--tmp_dirKeep]
                                          in_bam out_plot_file ref_fasta
Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file
ref_fasta Reference genome, FASTA format.
Options:
--plotFormat

File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat. Valid formats include: eps, tif, pgf, svgz, pdf, jpg, svg, jpeg, tiff, png, rgba, raw, ps

Possible choices: eps, tif, pgf, svgz, pdf, jpg, svg, jpeg, tiff, png, rgba, raw, ps

--plotDataStyle=filled

The plot data display style. Valid options: filled, line, dots (default: filled)

Possible choices: filled, line, dots

--plotStyle=ggplot

The plot visual style. Valid options: grayscale, seaborn-pastel, seaborn-whitegrid, fivethirtyeight, bmh, seaborn-muted, seaborn-paper, seaborn-colorblind, seaborn-dark, seaborn-darkgrid, seaborn-deep, seaborn-poster, seaborn-dark-palette, seaborn-notebook, seaborn-ticks, seaborn-talk, classic, dark_background, seaborn-white, seaborn-bright, ggplot (default: ggplot)

Possible choices: grayscale, seaborn-pastel, seaborn-whitegrid, fivethirtyeight, bmh, seaborn-muted, seaborn-paper, seaborn-colorblind, seaborn-dark, seaborn-darkgrid, seaborn-deep, seaborn-poster, seaborn-dark-palette, seaborn-notebook, seaborn-ticks, seaborn-talk, classic, dark_background, seaborn-white, seaborn-bright, ggplot

--plotWidth=880
 Width of the plot in pixels (default: 880)
--plotHeight=680
 Height of the plot in pixels (default: 680)
--plotDPI=80.0 Dots per inch for rendered output, more useful for vector modes (default: 80.0)
--plotTitle=Coverage Plot
 The title displayed on the coverage plot (default: ‘Coverage Plot’)
--plotXLimits Limits on the x-axis of the coverage plot; args are ‘<min> <max>’
--plotYLimits Limits on the y-axis of the coverage plot; args are ‘<min> <max>’
-q The minimum base quality threshold
-Q The minimum mapping quality threshold
-m The max coverage depth (default: %(default)s)
-l Read length threshold
--outSummary Coverage summary TSV file. Default is to write to temp.
--outBam Output aligned, indexed BAM file. Default is to write to temp.
--sensitive=False
 Equivalent to giving bwa: ‘-k 12 -A 1 -B 1 -O 1 -E 1’. Only relevant if the bwa aligner is selected (the default).
--excludeDuplicates=False
 MarkDuplicates with Picard and only plot non-duplicates
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
--minScoreToFilter
 Filter bwa alignments using this value as the minimum allowed alignment score. Specifically, sum the alignment scores across all alignments for each query (including reads in a pair, supplementary and secondary alignments) and then only include, in the output, queries whose summed alignment score is at least this value. This is only applied when the aligner is ‘bwa’. The filtering on a summed alignment score is sensible for reads in a pair and supplementary alignments, but may not be reasonable if bwa outputs secondary alignments (i.e., if ‘-a’ is in the aligner options). (default: not set - i.e., do not filter bwa’s output)
--aligner=bwa

Aligner to use (default: bwa)

Possible choices: novoalign, bwa

--aligner_options
 Aligner options (default for novoalign: “-r Random -l 40 -g 40 -x 20 -t 100 -k”, bwa: bwa defaults)
--NOVOALIGN_LICENSE_PATH
 A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable. (default: %(default)s)
--loglevel=INFO

Verbosity of output. (default: INFO)

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files. (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
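
The summed-score rule described for --minScoreToFilter can be sketched as below. This is an illustration under stated assumptions, not the tool's code: alignments are modelled as plain (query_name, alignment_score) tuples rather than BAM records, and the function name and example values are hypothetical.

```python
from collections import defaultdict


def filter_by_summed_score(alignments, min_score):
    """Keep every alignment of a query whose scores sum to at least min_score."""
    alignments = list(alignments)
    totals = defaultdict(int)
    for query_name, score in alignments:
        # Mates, supplementary, and secondary alignments share a query name,
        # so their scores accumulate under the same key.
        totals[query_name] += score
    return [(q, s) for q, s in alignments if totals[q] >= min_score]


# Hypothetical records: r1 is a pair (40 + 35), r2 and r3 are single alignments.
records = [("r1", 40), ("r1", 35), ("r2", 30), ("r3", 80)]
print(filter_by_summed_score(records, min_score=70))
```

Here r1 is kept because its two alignments together reach the threshold even though neither does alone. This also shows why the option text warns about secondary alignments, which would inflate a query's total.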
fastqc

usage: reports.py fastqc [-h] inBam outHtml
Positional arguments:
inBam Input reads, BAM format.
outHtml Output report, HTML format.