3.7. reports.py - produce various metrics and reports
Functions to create reports from genomics pipeline data.
usage: reports.py subcommand
- Sub-commands:
- assembly_stats
Fetch assembly-level statistics for a given sample
usage: reports.py assembly_stats [-h] [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]] [--assembly_dir ASSEMBLY_DIR] [--assembly_tmp ASSEMBLY_TMP] [--align_dir ALIGN_DIR] [--reads_dir READS_DIR] [--raw_reads_dir RAW_READS_DIR] samples [samples ...] outFile
- Positional arguments:
samples Sample names.
outFile Output report file.
- Options:
--cov_thresholds=(1, 5, 20, 100) Genome coverage thresholds to report on.
--assembly_dir=data/02_assembly Directory with assembly outputs.
--assembly_tmp=tmp/02_assembly Directory with assembly temp files.
--align_dir=data/02_align_to_self Directory with reads aligned to own assembly.
--reads_dir=data/01_per_sample Directory with unaligned filtered read BAMs.
--raw_reads_dir=data/00_raw Directory with unaligned raw read BAMs.
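The usage line above combines a multi-value positional (samples), a single trailing positional (outFile), and a multi-value option (--cov_thresholds). The following is a minimal sketch of how such a signature parses with Python's argparse. The parser here is hypothetical and mirrors only the usage line; it is not the parser defined in reports.py, and the sample and file names are made up.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the assembly_stats usage line;
    # only two of the documented options are reproduced here.
    parser = argparse.ArgumentParser(prog="reports.py assembly_stats")
    parser.add_argument("samples", nargs="+", help="Sample names.")
    parser.add_argument("outFile", help="Output report file.")
    parser.add_argument("--cov_thresholds", nargs="+", type=int,
                        default=(1, 5, 20, 100),
                        help="Genome coverage thresholds to report on.")
    parser.add_argument("--assembly_dir", default="data/02_assembly",
                        help="Directory with assembly outputs.")
    return parser


# The last positional value becomes outFile; the ones before it are samples.
args = build_parser().parse_args(
    ["sampleA", "sampleB", "report.txt", "--cov_thresholds", "1", "10"])
print(args.samples, args.outFile, args.cov_thresholds)
```

In this sketch the multi-value option is given after the positionals, because an argparse option declared with nargs="+" consumes the plain values that directly follow it.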
- coverage_only
usage: reports.py coverage_only [-h] [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] mapped_bams [mapped_bams ...] out_report
- Positional arguments:
mapped_bams Aligned-to-self mapped BAM files.
out_report Output report file.
- Options:
--cov_thresholds=(1, 5, 20, 100) Genome coverage thresholds to report on.
--loglevel=INFO Verbosity of output.
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
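As an illustration of what reporting on coverage thresholds can mean, the sketch below counts how many genome positions reach each threshold, given a list of per-position read depths. This is an assumption about the kind of metric involved, not the computation reports.py performs; the function name and the input depths are hypothetical.

```python
def count_positions_at_thresholds(depths, thresholds=(1, 5, 20, 100)):
    """Return {threshold: number of positions with depth >= threshold}."""
    return {t: sum(1 for d in depths if d >= t) for t in thresholds}


# Hypothetical per-position depths for a six-base genome.
depths = [0, 3, 7, 25, 120, 19]
print(count_positions_at_thresholds(depths))
```

Dividing each count by len(depths) would give the fraction of the genome covered at that depth.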
- alignment_summary
Write or print pairwise alignment summary information for sequences in two FASTA files, including SNPs, ambiguous bases, and indels.
usage: reports.py alignment_summary [-h] [--outfileName OUTFILENAME] [--printCounts] inFastaFileOne inFastaFileTwo
- Positional arguments:
inFastaFileOne First FASTA file for an alignment.
inFastaFileTwo Second FASTA file for an alignment.
- Options:
--outfileName Output file for counts in TSV format.
--printCounts=False Undocumented
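To make the three reported categories concrete, here is a minimal sketch that tallies SNPs, ambiguous bases, and indel positions for one pair of already-aligned sequences. The definitions used (a SNP is a mismatch between two unambiguous bases, an ambiguous position has a non-ACGT base on either side, an indel position has a gap on exactly one side) are assumptions for illustration; the definitions and counting units used by alignment_summary may differ.

```python
UNAMBIGUOUS = set("ACGT")


def summarize_pair(seq_one, seq_two):
    """Count differences between two aligned, equal-length sequences."""
    if len(seq_one) != len(seq_two):
        raise ValueError("aligned sequences must have equal length")
    counts = {"snps": 0, "ambiguous": 0, "indel_positions": 0}
    for a, b in zip(seq_one.upper(), seq_two.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: nothing to compare
        if a == "-" or b == "-":
            counts["indel_positions"] += 1
        elif a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            counts["ambiguous"] += 1
        elif a != b:
            counts["snps"] += 1
    return counts


print(summarize_pair("ACGTN-A", "ATGTAAA"))
```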
- consolidate_fastqc
Consolidate multiple FASTQC reports into one.
usage: reports.py consolidate_fastqc [-h] inDirs [inDirs ...] outFile
- Positional arguments:
inDirs Input FASTQC directories.
outFile Output report file.
- consolidate_spike_count
Consolidate multiple spike count reports into one.
usage: reports.py consolidate_spike_count [-h] inDir outFile
- Positional arguments:
inDir Input spike count directory.
outFile Output report file.
- plot_coverage
Generate a coverage plot from an aligned bam file
usage: reports.py plot_coverage [-h] [--plotFormat] [--plotDataStyle] [--plotStyle] [--plotWidth PLOT_WIDTH] [--plotHeight PLOT_HEIGHT] [--plotDPI PLOT_DPI] [--plotTitle PLOT_TITLE] [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS] [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS] [-q BASE_Q_THRESHOLD] [-Q MAPPING_Q_THRESHOLD] [-m MAX_COVERAGE_DEPTH] [-l READ_LENGTH_THRESHOLD] [--outSummary OUT_SUMMARY] [--plotOnlyNonDuplicates] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] in_bam out_plot_file
- Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file.
- Options:
--plotFormat File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat.
Possible choices: eps, tif, pgf, svgz, pdf, jpg, svg, jpeg, tiff, png, rgba, raw, ps
--plotDataStyle=filled The plot data display style.
Possible choices: filled, line, dots
--plotStyle=ggplot The plot visual style.
Possible choices: grayscale, seaborn-pastel, seaborn-whitegrid, fivethirtyeight, bmh, seaborn-muted, seaborn-paper, seaborn-colorblind, seaborn-dark, seaborn-darkgrid, seaborn-deep, seaborn-poster, seaborn-dark-palette, seaborn-notebook, seaborn-ticks, seaborn-talk, classic, dark_background, seaborn-white, seaborn-bright, ggplot
--plotWidth=880 Width of the plot in pixels.
--plotHeight=680 Height of the plot in pixels.
--plotDPI=80.0 Dots per inch for rendered output; more useful for vector modes.
--plotTitle=Coverage Plot The title displayed on the coverage plot (default: ‘Coverage Plot’).
--plotXLimits Limits on the x-axis of the coverage plot; args are ‘<min> <max>’.
--plotYLimits Limits on the y-axis of the coverage plot; args are ‘<min> <max>’.
-q The minimum base quality threshold.
-Q The minimum mapping quality threshold.
-m The maximum coverage depth.
-l Read length threshold.
--outSummary Coverage summary TSV file. Default is to write to temp.
--plotOnlyNonDuplicates=False Plot only non-duplicates (samtools -F 1024); coverage is counted by bedtools rather than samtools.
--loglevel=INFO Verbosity of output.
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files.
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
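The --plotFormat description above says the format is inferred from the extension of out_plot_file unless it is set explicitly. A minimal sketch of that rule follows. The function name and error handling are hypothetical and this is not the code used by reports.py; the set of valid formats is copied from the choices listed above.

```python
import os

VALID_FORMATS = {"eps", "tif", "pgf", "svgz", "pdf", "jpg", "svg",
                 "jpeg", "tiff", "png", "rgba", "raw", "ps"}


def resolve_plot_format(out_plot_file, plot_format=None):
    """Pick the explicit format if given, else the file extension."""
    if plot_format is None:
        # splitext returns e.g. (".../coverage", ".pdf"); drop the dot.
        plot_format = os.path.splitext(out_plot_file)[1].lstrip(".")
    plot_format = plot_format.lower()
    if plot_format not in VALID_FORMATS:
        raise ValueError("unsupported plot format: %r" % plot_format)
    return plot_format


print(resolve_plot_format("coverage.pdf"))
print(resolve_plot_format("coverage.out", plot_format="png"))
```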
- align_and_plot_coverage
Take reads, align to reference with BWA-MEM, and generate a coverage plot
usage: reports.py align_and_plot_coverage [-h] [--plotFormat] [--plotDataStyle] [--plotStyle] [--plotWidth PLOT_WIDTH] [--plotHeight PLOT_HEIGHT] [--plotDPI PLOT_DPI] [--plotTitle PLOT_TITLE] [--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS] [--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS] [-q BASE_Q_THRESHOLD] [-Q MAPPING_Q_THRESHOLD] [-m MAX_COVERAGE_DEPTH] [-l READ_LENGTH_THRESHOLD] [--outSummary OUT_SUMMARY] [--outBam OUT_BAM] [--sensitive] [--excludeDuplicates] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--minScoreToFilter MIN_SCORE_TO_FILTER] [--aligner {novoalign,bwa}] [--aligner_options ALIGNER_OPTIONS] [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] in_bam out_plot_file ref_fasta
- Positional arguments:
in_bam Input reads, BAM format.
out_plot_file The generated chart file.
ref_fasta Reference genome, FASTA format.
- Options:
--plotFormat File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat.
Possible choices: eps, tif, pgf, svgz, pdf, jpg, svg, jpeg, tiff, png, rgba, raw, ps
--plotDataStyle=filled The plot data display style.
Possible choices: filled, line, dots
--plotStyle=ggplot The plot visual style.
Possible choices: grayscale, seaborn-pastel, seaborn-whitegrid, fivethirtyeight, bmh, seaborn-muted, seaborn-paper, seaborn-colorblind, seaborn-dark, seaborn-darkgrid, seaborn-deep, seaborn-poster, seaborn-dark-palette, seaborn-notebook, seaborn-ticks, seaborn-talk, classic, dark_background, seaborn-white, seaborn-bright, ggplot
--plotWidth=880 Width of the plot in pixels.
--plotHeight=680 Height of the plot in pixels.
--plotDPI=80.0 Dots per inch for rendered output; more useful for vector modes.
--plotTitle=Coverage Plot The title displayed on the coverage plot (default: ‘Coverage Plot’).
--plotXLimits Limits on the x-axis of the coverage plot; args are ‘<min> <max>’.
--plotYLimits Limits on the y-axis of the coverage plot; args are ‘<min> <max>’.
-q The minimum base quality threshold.
-Q The minimum mapping quality threshold.
-m The maximum coverage depth.
-l Read length threshold.
--outSummary Coverage summary TSV file. Default is to write to temp.
--outBam Output aligned, indexed BAM file. Default is to write to temp.
--sensitive=False Equivalent to giving bwa: ‘-k 12 -A 1 -B 1 -O 1 -E 1’. Only relevant if the bwa aligner is selected (the default).
--excludeDuplicates=False Mark duplicates with Picard’s MarkDuplicates and plot only non-duplicates.
--JVMmemory=2g JVM virtual memory size.
--picardOptions=[] Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
--minScoreToFilter Filter bwa alignments using this value as the minimum allowed alignment score. Specifically, sum the alignment scores across all alignments for each query (including reads in a pair, supplementary and secondary alignments) and then only include, in the output, queries whose summed alignment score is at least this value. This is only applied when the aligner is ‘bwa’. The filtering on a summed alignment score is sensible for reads in a pair and supplementary alignments, but may not be reasonable if bwa outputs secondary alignments (i.e., if ‘-a’ is in the aligner options). (default: not set, i.e., do not filter bwa’s output)
--aligner=bwa Aligner to use.
Possible choices: novoalign, bwa
--aligner_options Aligner options (default for novoalign: “-r Random -l 40 -g 40 -x 20 -t 100 -k”; for bwa: bwa defaults).
--NOVOALIGN_LICENSE_PATH A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable.
--loglevel=INFO Verbosity of output.
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files.
--tmp_dirKeep=False Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
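The --minScoreToFilter description above amounts to a two-pass filter: sum the alignment scores per query name, then keep every alignment whose query total meets the threshold. The sketch below illustrates that logic on plain (query_name, alignment_score) tuples. The function name and the tuple representation are assumptions for illustration; the actual tool operates on bwa output rather than on Python tuples.

```python
from collections import defaultdict


def filter_by_summed_score(alignments, min_score):
    """Keep alignments whose query has a summed score >= min_score.

    alignments: iterable of (query_name, alignment_score) tuples.
    """
    alignments = list(alignments)
    totals = defaultdict(int)
    # First pass: total score per query across all of its alignments.
    for name, score in alignments:
        totals[name] += score
    # Second pass: keep every alignment belonging to a passing query.
    return [(name, score) for name, score in alignments
            if totals[name] >= min_score]


# Hypothetical records: r1 has two alignments (for example, a read pair).
records = [("r1", 40), ("r1", 30), ("r2", 20), ("r3", 60)]
print(filter_by_summed_score(records, 60))
```

Here r1 passes because its two alignments sum to 70 even though neither reaches 60 alone, which is why the description notes that the summed score can be misleading when secondary alignments are present.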
- fastqc
usage: reports.py fastqc [-h] inBam outHtml
- Positional arguments:
inBam Input reads, BAM format.
outHtml Output report, HTML format.