3.7. reports.py - produce various metrics and reports

Functions to create reports from genomics pipeline data.

usage: reports.py subcommand
Sub-commands:
assembly_stats

Fetch assembly-level statistics for the given samples.

usage: reports.py assembly_stats [-h]
                                 [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
                                 [--assembly_dir ASSEMBLY_DIR]
                                 [--assembly_tmp ASSEMBLY_TMP]
                                 [--align_dir ALIGN_DIR]
                                 [--reads_dir READS_DIR]
                                 [--raw_reads_dir RAW_READS_DIR]
                                 samples [samples ...] outFile
Positional arguments:
samples Sample names.
outFile Output report file.
Options:
--cov_thresholds=(1, 5, 20, 100)
 Genome coverage thresholds to report on. (default: 1, 5, 20, 100)
--assembly_dir=data/02_assembly
 Directory with assembly outputs. (default: data/02_assembly)
--assembly_tmp=tmp/02_assembly
 Directory with assembly temp files. (default: tmp/02_assembly)
--align_dir=data/02_align_to_self
 Directory with reads aligned to own assembly. (default: data/02_align_to_self)
--reads_dir=data/01_per_sample
 Directory with unaligned filtered read BAMs. (default: data/01_per_sample)
--raw_reads_dir=data/00_raw
 Directory with unaligned raw read BAMs. (default: data/00_raw)
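The --cov_thresholds option takes a list of read depths. A minimal sketch of one plausible reading, assuming each threshold is reported as the number of reference positions covered by at least that many reads. The function name `coverage_breadth` and its input (a list of per-position depths) are hypothetical and not part of reports.py.

```python
from typing import Dict, Iterable, Sequence


def coverage_breadth(
    depths: Sequence[int], thresholds: Iterable[int] = (1, 5, 20, 100)
) -> Dict[int, int]:
    """Count positions whose read depth is at least each threshold.

    Hypothetical helper: `depths` holds one read-depth value per reference
    position; the default thresholds mirror --cov_thresholds.
    """
    return {t: sum(1 for d in depths if d >= t) for t in thresholds}


if __name__ == "__main__":
    print(coverage_breadth([0, 1, 3, 6, 25, 150]))
    # {1: 5, 5: 3, 20: 2, 100: 1}
```

Dividing each count by the reference length would give the covered fraction of the genome instead of a position count.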
alignment_summary

Write or print pairwise alignment summary information for sequences in two FASTA files, including SNPs, ambiguous bases, and indels.

usage: reports.py alignment_summary [-h] [--outfileName OUTFILENAME]
                                    [--printCounts]
                                    inFastaFileOne inFastaFileTwo
Positional arguments:
inFastaFileOne First FASTA file for an alignment
inFastaFileTwo Second FASTA file for an alignment
Options:
--outfileName Output file for counts in TSV format
--printCounts=False
 Undocumented
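The kinds of counts that alignment_summary reports can be illustrated with a short sketch. It assumes the two sequences are already aligned to equal length with `-` as the gap character, and it uses its own hypothetical column definitions: a SNP column has two different unambiguous bases (A, C, G, T), an ambiguous column has any other IUPAC code in either sequence, and an indel column has a gap in exactly one sequence. reports.py may define or count these differently.

```python
from typing import Dict

UNAMBIGUOUS = frozenset("ACGT")
GAP = "-"


def summarize_alignment(seq_a: str, seq_b: str) -> Dict[str, int]:
    """Count SNP, ambiguous, and indel columns in two pre-aligned sequences.

    Hypothetical helper; the column definitions are assumptions, not
    necessarily the ones alignment_summary uses.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"snps": 0, "ambiguous": 0, "indel_columns": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == GAP and b == GAP:
            continue  # gap in both sequences: nothing to compare
        if a == GAP or b == GAP:
            counts["indel_columns"] += 1  # gap in exactly one sequence
        elif a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            if a != b:
                counts["snps"] += 1  # two different unambiguous bases
        else:
            counts["ambiguous"] += 1  # N, R, Y, ... in either sequence
    return counts
```

For example, aligning `ACGTNAC-T` against `ACCTRACGT` yields one SNP (G/C), one ambiguous column (N/R), and one indel column (-/G).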
consolidate_fastqc

Consolidate multiple FASTQC reports into one.

usage: reports.py consolidate_fastqc [-h] inDirs [inDirs ...] outFile
Positional arguments:
inDirs Input FASTQC directories.
outFile Output report file.
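Each extracted FastQC output directory contains a tab-separated summary.txt whose lines hold a status (PASS, WARN, or FAIL), a module name, and the input file name. As a sketch of what consolidating several reports involves, the hypothetical `consolidate` function below writes one row per input directory and one column per module. This layout is illustrative and is not necessarily the format that consolidate_fastqc writes.

```python
import csv
import os
from typing import Dict, List


def read_fastqc_summary(fastqc_dir: str) -> Dict[str, str]:
    """Map FastQC module name -> status (PASS/WARN/FAIL) from summary.txt."""
    statuses: Dict[str, str] = {}
    with open(os.path.join(fastqc_dir, "summary.txt")) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:  # status, module name, file name
                statuses[fields[1]] = fields[0]
    return statuses


def consolidate(in_dirs: List[str], out_file: str) -> None:
    """Write a TSV with one row per FastQC directory and one column per module.

    Hypothetical layout; rows are keyed by directory name, so two inputs
    with the same basename would overwrite each other.
    """
    summaries = {
        os.path.basename(os.path.normpath(d)): read_fastqc_summary(d) for d in in_dirs
    }
    modules = sorted({m for s in summaries.values() for m in s})
    with open(out_file, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["Sample"] + modules)
        for sample, statuses in summaries.items():
            writer.writerow([sample] + [statuses.get(m, "") for m in modules])
```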
consolidate_spike_count

Consolidate multiple spike count reports into one.

usage: reports.py consolidate_spike_count [-h] inDir outFile
Positional arguments:
inDir Input spike count directory.
outFile Output report file.
plot_coverage

Generate a coverage plot from an aligned BAM file.

usage: reports.py plot_coverage [-h] [--plotFormat] [--plotDataStyle]
                                [--plotStyle] [--plotWidth PLOT_WIDTH]
                                [--plotHeight PLOT_HEIGHT]
                                [--plotDPI PLOT_DPI] [--plotTitle PLOT_TITLE]
                                [-q BASE_Q_THRESHOLD] [-Q MAPPING_Q_THRESHOLD]
                                [-m MAX_COVERAGE_DEPTH]
                                [-l READ_LENGTH_THRESHOLD]
                                [--outSummary OUT_SUMMARY]
                                [--plotOnlyNonDuplicates]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                [--tmp_dirKeep]
                                in_bam out_plot_file
Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file
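The --plotFormat option infers the chart format from the file extension of out_plot_file unless it is set explicitly. A minimal sketch of that rule, assuming a case-insensitive match against the listed format choices. `resolve_plot_format` is a hypothetical helper, not the reports.py implementation.

```python
import os
from typing import Optional

# Choices listed for --plotFormat in this section.
VALID_FORMATS = frozenset(
    {"eps", "pgf", "pdf", "png", "jpeg", "jpg", "ps", "svgz",
     "tif", "rgba", "svg", "raw", "tiff"}
)


def resolve_plot_format(out_plot_file: str, plot_format: Optional[str] = None) -> str:
    """Return the explicit format if given, else the lowercased file extension.

    Hypothetical helper; case-insensitive extension matching is an assumption.
    """
    if plot_format is None:
        # os.path.splitext("cov.PNG") -> ("cov", ".PNG")
        plot_format = os.path.splitext(out_plot_file)[1].lstrip(".").lower()
    if plot_format not in VALID_FORMATS:
        raise ValueError(f"unsupported plot format: {plot_format!r}")
    return plot_format
```

An explicit format always wins, so `resolve_plot_format("plot.out", "pdf")` returns `"pdf"` even though `.out` is not a recognized extension.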
Options:
--plotFormat

File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat.

Possible choices: eps, pgf, pdf, png, jpeg, jpg, ps, svgz, tif, rgba, svg, raw, tiff

--plotDataStyle=filled

The plot data display style (default: filled)

Possible choices: filled, line, dots

--plotStyle=ggplot

The plot visual style (default: ggplot)

Possible choices: fivethirtyeight, seaborn-poster, classic, seaborn-dark-palette, seaborn-muted, grayscale, seaborn-notebook, dark_background, seaborn-paper, seaborn-white, seaborn-ticks, seaborn-colorblind, seaborn-dark, seaborn-pastel, seaborn-talk, seaborn-darkgrid, seaborn-whitegrid, bmh, seaborn-bright, ggplot, seaborn-deep

--plotWidth=1024
 Width of the plot in pixels (default: 1024)
--plotHeight=768
 Height of the plot in pixels (default: 768)
--plotDPI=80.0
 Dots per inch for rendered output, more useful for vector modes (default: 80.0)
--plotTitle=Coverage Plot
 The title displayed on the coverage plot (default: ‘Coverage Plot’)
-q The minimum base quality threshold
-Q The minimum mapping quality threshold
-m=1000000 The maximum coverage depth (default: 1000000)
-l Read length threshold
--outSummary Coverage summary TSV file. Default is to write to temp.
--plotOnlyNonDuplicates=False
 Plot only non-duplicates (samtools -F 1024)
--loglevel=DEBUG

Verbosity of output (default: DEBUG)

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
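The --plotWidth and --plotHeight values are given in pixels, while matplotlib sizes figures in inches. Assuming reports.py converts between the two by dividing by --plotDPI (an assumption; the conversion is not documented here), the defaults work out to a 12.8 × 9.6 inch figure. `figsize_inches` is a hypothetical helper.

```python
from typing import Tuple


def figsize_inches(
    width_px: int = 1024, height_px: int = 768, dpi: float = 80.0
) -> Tuple[float, float]:
    """Convert pixel dimensions to a (width, height) tuple in inches.

    Defaults mirror --plotWidth, --plotHeight, and --plotDPI; the
    pixels / DPI conversion is an assumption about how reports.py sizes plots.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


print(figsize_inches())           # (12.8, 9.6)
print(figsize_inches(dpi=160.0))  # (6.4, 4.8)
```

Passing that tuple as `figsize=` together with `dpi=80.0` to `matplotlib.pyplot.figure` would, with default `savefig` settings, render a 1024 × 768 pixel raster image.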
align_and_plot_coverage

Take reads, align to reference with BWA-MEM, and generate a coverage plot

usage: reports.py align_and_plot_coverage [-h] [--plotFormat]
                                          [--plotDataStyle] [--plotStyle]
                                          [--plotWidth PLOT_WIDTH]
                                          [--plotHeight PLOT_HEIGHT]
                                          [--plotDPI PLOT_DPI]
                                          [--plotTitle PLOT_TITLE]
                                          [-q BASE_Q_THRESHOLD]
                                          [-Q MAPPING_Q_THRESHOLD]
                                          [-m MAX_COVERAGE_DEPTH]
                                          [-l READ_LENGTH_THRESHOLD]
                                          [--outSummary OUT_SUMMARY]
                                          [--outBam OUT_BAM] [--sensitive]
                                          [--excludeDuplicates]
                                          [--JVMmemory JVMMEMORY]
                                          [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                          [-T MIN_SCORE_TO_OUTPUT]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version] [--tmp_dir TMP_DIR]
                                          [--tmp_dirKeep]
                                          in_bam out_plot_file ref_fasta
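When scripting align_and_plot_coverage from Python, it can help to build the argument list programmatically. The sketch below uses only flags documented in this section; the helper name `align_and_plot_argv` and its keyword arguments are hypothetical. It places --picardOptions last because that option accepts a variable number of OPTIONNAME=value tokens, and anything after it that does not look like a flag would be swallowed as a Picard option.

```python
from typing import Dict, List, Optional


def align_and_plot_argv(
    in_bam: str,
    out_plot_file: str,
    ref_fasta: str,
    sensitive: bool = False,
    exclude_duplicates: bool = False,
    jvm_memory: str = "2g",
    picard_options: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an argument list for `reports.py align_and_plot_coverage` (hypothetical helper)."""
    argv = ["reports.py", "align_and_plot_coverage", in_bam, out_plot_file, ref_fasta]
    if sensitive:
        argv.append("--sensitive")
    if exclude_duplicates:
        argv.append("--excludeDuplicates")
    argv += ["--JVMmemory", jvm_memory]
    if picard_options:
        # --picardOptions takes any number of OPTIONNAME=value tokens, so it goes
        # last: argparse keeps consuming tokens until it sees another option flag.
        argv.append("--picardOptions")
        argv += [f"{name}={value}" for name, value in picard_options.items()]
    return argv
```

For example, `align_and_plot_argv("s.bam", "s.png", "ref.fasta", picard_options={"VALIDATION_STRINGENCY": "LENIENT"})` ends with `--picardOptions VALIDATION_STRINGENCY=LENIENT`. The list could then be handed to `subprocess.run(argv, check=True)`, assuming reports.py is executable and on the PATH.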
Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file
ref_fasta Reference genome, FASTA format.
Options:
--plotFormat

File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat.

Possible choices: eps, pgf, pdf, png, jpeg, jpg, ps, svgz, tif, rgba, svg, raw, tiff

--plotDataStyle=filled

The plot data display style (default: filled)

Possible choices: filled, line, dots

--plotStyle=ggplot

The plot visual style (default: ggplot)

Possible choices: fivethirtyeight, seaborn-poster, classic, seaborn-dark-palette, seaborn-muted, grayscale, seaborn-notebook, dark_background, seaborn-paper, seaborn-white, seaborn-ticks, seaborn-colorblind, seaborn-dark, seaborn-pastel, seaborn-talk, seaborn-darkgrid, seaborn-whitegrid, bmh, seaborn-bright, ggplot, seaborn-deep

--plotWidth=1024
 Width of the plot in pixels (default: 1024)
--plotHeight=768
 Height of the plot in pixels (default: 768)
--plotDPI=80.0
 Dots per inch for rendered output, more useful for vector modes (default: 80.0)
--plotTitle=Coverage Plot
 The title displayed on the coverage plot (default: ‘Coverage Plot’)
-q The minimum base quality threshold
-Q The minimum mapping quality threshold
-m=1000000 The maximum coverage depth (default: 1000000)
-l Read length threshold
--outSummary Coverage summary TSV file. Default is to write to temp.
--outBam Output aligned, indexed BAM file. Default is to write to temp.
--sensitive=False
 Equivalent to giving bwa: ‘-k 12 -A 1 -B 1 -O 1 -E 1’
--excludeDuplicates=False
 Mark duplicates with Picard MarkDuplicates and plot only non-duplicates
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
-T=30 The minimum score to output during alignment (default: 30)
--loglevel=DEBUG

Verbosity of output (default: DEBUG)

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
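Duplicate filtering comes down to SAM flag bit 1024 (0x400), which marks a read as a PCR or optical duplicate. The plot_coverage help describes --plotOnlyNonDuplicates as `samtools -F 1024`, and since Picard MarkDuplicates records duplicates by setting that same bit, --excludeDuplicates presumably filters on it as well. A minimal sketch of the `-F` semantics, where a read is kept only if none of the masked bits are set (the function name is hypothetical):

```python
from typing import List, Tuple

DUPLICATE_FLAG = 0x400  # 1024: SAM flag bit "PCR or optical duplicate"


def passes_exclude_filter(flag: int, exclude_mask: int = DUPLICATE_FLAG) -> bool:
    """Mimic `samtools view -F MASK`: keep a read only if no masked bit is set."""
    return (flag & exclude_mask) == 0


# (read name, SAM flag) pairs; 1123 is flag 99 with the duplicate bit added.
reads: List[Tuple[str, int]] = [("read1", 99), ("read2", 1123), ("read3", 147)]
kept = [name for name, flag in reads if passes_exclude_filter(flag)]
print(kept)  # ['read1', 'read3']
```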