2.10. reports.py - produce various metrics and reports
Functions to create reports from genomics pipeline data.
usage: reports.py subcommand
2.10.2. Sub-commands
2.10.2.1. assembly_stats
Fetch assembly-level statistics for one or more samples.
reports.py assembly_stats [-h]
[--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
[--assembly_dir ASSEMBLY_DIR]
[--assembly_tmp ASSEMBLY_TMP]
[--align_dir ALIGN_DIR] [--reads_dir READS_DIR]
[--raw_reads_dir RAW_READS_DIR]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
samples [samples ...] outFile
2.10.2.1.1. Positional Arguments
- samples
Sample names.
- outFile
Output report file.
2.10.2.1.2. Named Arguments
- --cov_thresholds
Genome coverage thresholds to report on. (default: (1, 5, 20, 100))
Default: (1, 5, 20, 100)
- --assembly_dir
Directory with assembly outputs. (default: ‘data/02_assembly’)
Default: 'data/02_assembly'
- --assembly_tmp
Directory with assembly temp files. (default: ‘tmp/02_assembly’)
Default: 'tmp/02_assembly'
- --align_dir
Directory with reads aligned to own assembly. (default: ‘data/02_align_to_self’)
Default: 'data/02_align_to_self'
- --reads_dir
Directory with unaligned filtered read BAMs. (default: ‘data/01_per_sample’)
Default: 'data/01_per_sample'
- --raw_reads_dir
Directory with unaligned raw read BAMs. (default: ‘data/00_raw’)
Default: 'data/00_raw'
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
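The usage text above can be turned into a command line programmatically. The following is a minimal sketch, not part of reports.py: the helper name build_assembly_stats_cmd and the sample and file names are hypothetical, and only the flag names are taken from the usage text. Positionals are placed first on the assumption that the multi-value --cov_thresholds option would otherwise absorb whatever follows it.

```python
import shlex


def build_assembly_stats_cmd(samples, out_file, cov_thresholds=None,
                             assembly_dir=None, loglevel=None):
    """Build an argv list for `reports.py assembly_stats`.

    Hypothetical helper: only the flag names come from the usage text.
    """
    if not samples:
        raise ValueError("at least one sample name is required")
    # Positionals go first so the multi-value --cov_thresholds option
    # cannot swallow a sample name or the output path.
    cmd = ["reports.py", "assembly_stats", *samples, out_file]
    if cov_thresholds:
        cmd += ["--cov_thresholds", *(str(t) for t in cov_thresholds)]
    if assembly_dir:
        cmd += ["--assembly_dir", assembly_dir]
    if loglevel:
        cmd += ["--loglevel", loglevel]
    return cmd


# Placeholder sample names and output path, for illustration only.
cmd = build_assembly_stats_cmd(
    ["sampleA", "sampleB"], "assembly_stats.txt",
    cov_thresholds=(1, 10), loglevel="DEBUG",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run without shell quoting concerns.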
2.10.2.2. coverage_only
reports.py coverage_only [-h]
[--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
mapped_bams [mapped_bams ...] out_report
2.10.2.2.1. Positional Arguments
- mapped_bams
Mapped BAM files of reads aligned to their own assembly.
- out_report
Output report file.
2.10.2.2.2. Named Arguments
- --cov_thresholds
Genome coverage thresholds to report on. (default: (1, 5, 20, 100))
Default: (1, 5, 20, 100)
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
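To illustrate what reporting on coverage thresholds could look like, here is a minimal sketch. It assumes each threshold is a minimum read depth and that the report counts genome positions at or above it; the actual columns written by coverage_only are not described in this section, and the function name and depth values are hypothetical.

```python
def coverage_at_thresholds(depths, thresholds=(1, 5, 20, 100)):
    """Count positions whose depth is at least each threshold.

    `depths` is a per-position list of read depths (an assumed input shape).
    """
    return {t: sum(1 for d in depths if d >= t) for t in thresholds}


# Toy per-position depths, for illustration only.
depths = [0, 3, 7, 25, 120, 5]
counts = coverage_at_thresholds(depths)
for threshold, n_positions in counts.items():
    print(f">= {threshold}x: {n_positions} positions")
```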
2.10.2.3. consolidate_fastqc
Consolidate multiple FASTQC reports into one.
reports.py consolidate_fastqc [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
inDirs [inDirs ...] outFile
2.10.2.3.1. Positional Arguments
- inDirs
Input FASTQC directories.
- outFile
Output report file.
2.10.2.3.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
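The --tmp_dir and --tmp_dirKeep behaviour described above can be sketched with a context manager. This is an illustration of the documented semantics only, not the tool's implementation; the name scratch_dir and its parameters are hypothetical, and the sketch uses the platform's default temp location when no base is given (the tool's documented default is ‘/tmp’).

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_dir(base=None, keep_on_error=False):
    """Yield a fresh temp directory under `base`.

    The directory is always removed on success. On failure it is removed
    too, unless keep_on_error is set (mirroring --tmp_dirKeep).
    """
    path = tempfile.mkdtemp(dir=base)
    failed = False
    try:
        yield path
    except BaseException:
        failed = True
        raise
    finally:
        if not (failed and keep_on_error):
            shutil.rmtree(path, ignore_errors=True)


# Success: the directory is deleted afterwards.
with scratch_dir() as ok_dir:
    pass
removed_on_success = not os.path.exists(ok_dir)

# Failure with keep_on_error=True: the directory survives for inspection.
kept_dir = None
try:
    with scratch_dir(keep_on_error=True) as kept_dir:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
kept_on_failure = os.path.isdir(kept_dir)
shutil.rmtree(kept_dir, ignore_errors=True)  # clean up the demo directory
```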
2.10.2.4. consolidate_spike_count
Consolidate multiple spike count reports into one.
reports.py consolidate_spike_count [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
inDir outFile
2.10.2.4.1. Positional Arguments
- inDir
Input spike count directory.
- outFile
Output report file.
2.10.2.4.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
2.10.2.5. aggregate_spike_count
Aggregate multiple spike count reports into one.
reports.py aggregate_spike_count [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
inDir outFile
2.10.2.5.1. Positional Arguments
- inDir
Input spike count directory.
- outFile
Output report file.
2.10.2.5.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
2.10.2.6. aggregate_alignment_counts
Aggregate multiple reports from read_utils.py bwamem_idxstats into one report.
reports.py aggregate_alignment_counts [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
in_reports [in_reports ...] outFile
2.10.2.6.1. Positional Arguments
- in_reports
TSV reports with alignment counts from read_utils.py bwamem_idxstats.
- outFile
Output report file.
2.10.2.6.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
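As a rough illustration of aggregating per-sample alignment counts into one table, consider the sketch below. The input layout is an assumption: each report is treated as a two-column TSV of reference name and read count, which may not match what read_utils.py bwamem_idxstats actually writes. The function names, sample names, and the output layout are likewise hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate_counts(named_reports):
    """Merge (sample, file handle) pairs into {reference: {sample: count}}.

    Assumes two tab-separated columns per row: reference name, count.
    """
    table = defaultdict(dict)
    for sample, handle in named_reports:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            table[row[0]][sample] = int(row[1])
    return dict(table)


def write_matrix(table, samples, out):
    """Write one row per reference and one column per sample (0 if absent)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["reference", *samples])
    for ref in sorted(table):
        writer.writerow([ref] + [table[ref].get(s, 0) for s in samples])


# In-memory stand-ins for report files, for illustration only.
reports = [
    ("s1", io.StringIO("refA\t10\nrefB\t2\n")),
    ("s2", io.StringIO("refA\t4\n")),
]
table = aggregate_counts(reports)
out = io.StringIO()
write_matrix(table, ["s1", "s2"], out)
print(out.getvalue(), end="")
```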
2.10.2.7. plot_coverage
Generate a coverage plot from an aligned bam file
reports.py plot_coverage [-h] [--plotFormat] [--plotDataStyle] [--plotStyle]
[--plotWidth PLOT_WIDTH] [--plotHeight PLOT_HEIGHT]
[--plotDPI PLOT_DPI] [--plotTitle PLOT_TITLE]
[--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
[--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
[-q BASE_Q_THRESHOLD] [-Q MAPPING_Q_THRESHOLD]
[-m MAX_COVERAGE_DEPTH] [-l READ_LENGTH_THRESHOLD]
[--binLargePlots]
[--binningSummaryStatistic {max,min,mean,median}]
[--outSummary OUT_SUMMARY] [--plotOnlyNonDuplicates]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
in_bam out_plot_file
2.10.2.7.1. Positional Arguments
- in_bam
Input reads, BAM format.
- out_plot_file
The generated chart file
2.10.2.7.2. Named Arguments
- --plotFormat
Possible choices: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp
File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat. Valid formats include: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp
- --plotDataStyle
Possible choices: filled, line, dots
The plot data display style. Valid options: filled, line, dots (default: ‘filled’)
Default: 'filled'
- --plotStyle
Possible choices: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10
The plot visual style. Valid options: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10 (default: ‘ggplot’)
Default: 'ggplot'
- --plotWidth
Width of the plot in pixels (default: 880)
Default: 880
- --plotHeight
Height of the plot in pixels (default: 680)
Default: 680
- --plotDPI
Dots per inch for rendered output; more useful for vector modes (default: 100.0)
Default: 100.0
- --plotTitle
The title displayed on the coverage plot (default: ‘Coverage Plot’)
Default: 'Coverage Plot'
- --plotXLimits
Limits on the x-axis of the coverage plot; args are ‘<min> <max>’
- --plotYLimits
Limits on the y-axis of the coverage plot; args are ‘<min> <max>’
- -q
The minimum base quality threshold
- -Q
The minimum mapping quality threshold
- -m
The max coverage depth (default: None)
- -l
Read length threshold
- --binLargePlots
Plot summary read depth in one-pixel-width bins for large plots.
Default: False
- --binningSummaryStatistic
Possible choices: max, min, mean, median
Statistic used to summarize each bin (max, min, mean, or median).
Default: 'max'
- --outSummary
Coverage summary TSV file. Default is to write to temp.
- --plotOnlyNonDuplicates
Plot only non-duplicate reads (samtools -F 1024); coverage is then counted by bedtools rather than samtools.
Default: False
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
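The interaction of --binLargePlots and --binningSummaryStatistic can be pictured with a small sketch. It assumes one bin per pixel of plot width and equal-sized bins; how plot_coverage actually chooses bin boundaries is not stated here, and the function name bin_depths is hypothetical.

```python
import statistics

# Maps each documented choice to a function that summarizes one bin.
SUMMARIES = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
}


def bin_depths(depths, n_bins, statistic="max"):
    """Reduce per-position depths to at most n_bins summarized values."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    summarize = SUMMARIES[statistic]
    n = len(depths)
    if n <= n_bins:
        return list(depths)  # nothing to bin: at most one position per bin
    binned = []
    for i in range(n_bins):
        # Integer arithmetic keeps the bins contiguous and non-empty.
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        binned.append(summarize(depths[start:end]))
    return binned


# Toy depths; a real plot would use one bin per pixel of --plotWidth.
depths = [1, 9, 2, 8, 3, 7, 4, 6]
by_max = bin_depths(depths, 4, "max")
by_min = bin_depths(depths, 4, "min")
print(by_max, by_min)
```

With 'max' (the documented default) narrow coverage spikes stay visible after binning, whereas 'min' instead preserves narrow dropouts.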
2.10.2.8. align_and_plot_coverage
Take reads, align them to a reference (with BWA-MEM by default, or Novoalign via --aligner), and generate a coverage plot.
reports.py align_and_plot_coverage [-h] [--plotFormat] [--plotDataStyle]
[--plotStyle] [--plotWidth PLOT_WIDTH]
[--plotHeight PLOT_HEIGHT]
[--plotDPI PLOT_DPI]
[--plotTitle PLOT_TITLE]
[--plotXLimits PLOT_X_LIMITS PLOT_X_LIMITS]
[--plotYLimits PLOT_Y_LIMITS PLOT_Y_LIMITS]
[-q BASE_Q_THRESHOLD]
[-Q MAPPING_Q_THRESHOLD]
[-m MAX_COVERAGE_DEPTH]
[-l READ_LENGTH_THRESHOLD]
[--binLargePlots]
[--binningSummaryStatistic {max,min,mean,median}]
[--outSummary OUT_SUMMARY]
[--outBam OUT_BAM] [--sensitive]
[--excludeDuplicates]
[--JVMmemory JVMMEMORY]
[--picardOptions [PICARDOPTIONS ...]]
[--minScoreToFilter MIN_SCORE_TO_FILTER]
[--aligner {novoalign,bwa}]
[--aligner_options ALIGNER_OPTIONS]
[--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
in_bam out_plot_file ref_fasta
2.10.2.8.1. Positional Arguments
- in_bam
Input reads, BAM format.
- out_plot_file
The generated chart file
- ref_fasta
Reference genome, FASTA format.
2.10.2.8.2. Named Arguments
- --plotFormat
Possible choices: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp
File format of the coverage plot. By default it is inferred from the file extension of out_plot_file, but it can be set explicitly via --plotFormat. Valid formats include: eps, jpg, jpeg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff, webp
- --plotDataStyle
Possible choices: filled, line, dots
The plot data display style. Valid options: filled, line, dots (default: ‘filled’)
Default: 'filled'
- --plotStyle
Possible choices: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10
The plot visual style. Valid options: Solarize_Light2, _classic_test_patch, _mpl-gallery, _mpl-gallery-nogrid, bmh, classic, dark_background, fast, fivethirtyeight, ggplot, grayscale, petroff10, seaborn-v0_8, seaborn-v0_8-bright, seaborn-v0_8-colorblind, seaborn-v0_8-dark, seaborn-v0_8-dark-palette, seaborn-v0_8-darkgrid, seaborn-v0_8-deep, seaborn-v0_8-muted, seaborn-v0_8-notebook, seaborn-v0_8-paper, seaborn-v0_8-pastel, seaborn-v0_8-poster, seaborn-v0_8-talk, seaborn-v0_8-ticks, seaborn-v0_8-white, seaborn-v0_8-whitegrid, tableau-colorblind10 (default: ‘ggplot’)
Default: 'ggplot'
- --plotWidth
Width of the plot in pixels (default: 880)
Default: 880
- --plotHeight
Height of the plot in pixels (default: 680)
Default: 680
- --plotDPI
Dots per inch for rendered output; more useful for vector modes (default: 100.0)
Default: 100.0
- --plotTitle
The title displayed on the coverage plot (default: ‘Coverage Plot’)
Default: 'Coverage Plot'
- --plotXLimits
Limits on the x-axis of the coverage plot; args are ‘<min> <max>’
- --plotYLimits
Limits on the y-axis of the coverage plot; args are ‘<min> <max>’
- -q
The minimum base quality threshold
- -Q
The minimum mapping quality threshold
- -m
The max coverage depth (default: None)
- -l
Read length threshold
- --binLargePlots
Plot summary read depth in one-pixel-width bins for large plots.
Default: False
- --binningSummaryStatistic
Possible choices: max, min, mean, median
Statistic used to summarize each bin (max, min, mean, or median).
Default: 'max'
- --outSummary
Coverage summary TSV file. Default is to write to temp.
- --outBam
Output aligned, indexed BAM file. Default is to write to temp.
- --sensitive
Equivalent to giving bwa: ‘-k 12 -A 1 -B 1 -O 1 -E 1’. Only relevant if the bwa aligner is selected (the default).
Default: False
- --excludeDuplicates
Run Picard MarkDuplicates and plot only non-duplicates.
Default: False
- --JVMmemory
JVM virtual memory size (default: ‘2g’)
Default: '2g'
- --picardOptions
Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value …
Default: []
- --minScoreToFilter
Filter bwa alignments using this value as the minimum allowed alignment score. Specifically, sum the alignment scores across all alignments for each query (including reads in a pair, supplementary and secondary alignments) and then only include, in the output, queries whose summed alignment score is at least this value. This is only applied when the aligner is ‘bwa’. The filtering on a summed alignment score is sensible for reads in a pair and supplementary alignments, but may not be reasonable if bwa outputs secondary alignments (i.e., if ‘-a’ is in the aligner options). (default: not set - i.e., do not filter bwa’s output)
- --aligner
Possible choices: novoalign, bwa
Aligner to use (default: ‘bwa’)
Default: 'bwa'
- --aligner_options
Aligner options (default for novoalign: “-r Random -l 40 -g 40 -x 20 -t 100 -k”; for bwa: bwa defaults)
- --NOVOALIGN_LICENSE_PATH
A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable. (default: None)
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verbosity of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
Show the program’s version number and exit.
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default: '/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default: False
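The summed-score filter described under --minScoreToFilter can be sketched as follows. Plain (query name, alignment score) tuples stand in for BAM records here, which is a simplification; the function name is hypothetical, and how the tool reads scores from bwa's output is not shown in this section.

```python
from collections import defaultdict


def filter_by_summed_score(alignments, min_score):
    """Keep alignments of queries whose summed score is at least min_score.

    `alignments` is an iterable of (query_name, alignment_score) tuples.
    All alignments sharing a query name (mates, supplementary and
    secondary alignments) contribute to that query's total.
    """
    alignments = list(alignments)
    totals = defaultdict(int)
    for query, score in alignments:
        totals[query] += score
    return [(q, s) for q, s in alignments if totals[q] >= min_score]


# Toy records, for illustration only.
alignments = [
    ("read1", 60), ("read1", 55),   # pair sums to 115: kept
    ("read2", 30), ("read2", 20),   # pair sums to 50: dropped
    ("read3", 100),                 # single alignment at the cutoff: kept
]
kept = filter_by_summed_score(alignments, min_score=100)
print(kept)
```

The example also shows why secondary alignments can distort the filter: each extra alignment for a query raises its total, so a query with many weak alignments could pass the cutoff.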
2.10.2.9. fastqc
reports.py fastqc [-h] [--out_zip OUT_ZIP] [--threads THREADS] inBam out_html
2.10.2.9.1. Positional Arguments
- inBam
Input reads, BAM format.
- out_html
Output report, HTML format.
2.10.2.9.2. Named Arguments
- --out_zip
Output data, zip archive.
- --threads
Number of threads.