viral-ngs: genomic analysis pipelines for viral sequencing¶
Description of the methods¶
Much more documentation to come...
TO DO: here we will put a high level description of the various tools that exist here, perhaps with some pictures and such. We will describe why we used certain tools and approaches / how other approaches fell short / what kinds of problems certain steps are trying to solve. Perhaps some links to papers and such. Kind of a mini-methods paper here.
Viral genome analysis¶
De novo assembly, reference-assisted assembly improvements, gene annotation, species-level variation, within-host variation, etc.
Taxonomic read filtration¶
This mainly covers human read depletion (prior to submission to NCBI SRA), but also the step where we restrict reads to a particular taxon of interest (the species you’re studying).
Taxonomic read identification¶
Nothing much here at the moment. This will be integrated later, once it is ready.
Installation¶
System dependencies¶
This is known to install cleanly on most modern Linux systems with Python, Java, and some basic development libraries. On Ubuntu 14.04 LTS, the following APT packages should be installed on top of the vanilla setup:
python3 python3-pip python3-nose
python-software-properties
zlib zlib1g zlib1g-dev
libblas3gf libblas-dev liblapack3gf liblapack-dev
libatlas-dev libatlas3-base libatlas3gf-base libatlas-base-dev
gfortran
oracle-java8-installer
libncurses5-dev
The Fortran libraries (including BLAS and ATLAS) are required to install numpy via pip from source. If you want to avoid this system dependency, note that numpy is not actually required when you run under Python 3.4.
Java >= 1.7 is required by GATK and Picard.
Python dependencies¶
The command line tools require Python >= 2.7 or >= 3.4. Required packages (like pysam and Biopython) are listed in requirements.txt and can be installed the usual pip way:
pip install -r requirements.txt
Additionally, in order to use the pipeline infrastructure, Python 3.4 is required (Python 2 is not supported) and you must install snakemake as well:
pip install snakemake==3.2 yappi==0.94
However, most of the real functionality is encapsulated in the command line tools, which can be used without any of the pipeline infrastructure.
You should either sudo pip install or use a virtualenv (recommended).
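The version requirements above can be checked before installing anything. The following is only a minimal sketch and not part of viral-ngs: the function name `supported_features` is hypothetical, and it assumes that "Python 3.4 is required" for the pipelines means 3.4 or newer.

```python
import sys


def supported_features(version_info=None):
    """Report which parts of the toolkit a given Python version can run.

    Assumption: the pipeline requirement "Python 3.4" is read as
    "3.4 or newer"; the command line tools accept 2.7 or >= 3.4.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    is_py27 = (major == 2 and minor >= 7)
    is_py34_plus = (major, minor) >= (3, 4)
    return {
        "command_line_tools": is_py27 or is_py34_plus,
        "pipelines": is_py34_plus,  # Snakemake pipelines: Python 2 unsupported
    }
```

Calling `supported_features()` with no argument inspects the running interpreter.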
Tool dependencies¶
A lot of effort has gone into writing auto download/compile wrappers for most of the bioinformatic tools we rely on here. They will auto-download and install the first time they are needed by any command. If you want to pre-install all of the external tools, simply type this:
python -m unittest test.test_tools.TestToolsInstallation -v
However, there are two tools in particular that cannot be auto-installed due to licensing restrictions. You will need to download and install these tools on your own (paying for them if your use case requires it) and set environment variables pointing to their installed locations.
- GATK - http://www.broadinstitute.org/gatk/
- Novoalign - http://www.novocraft.com/products/novoalign/
The environment variables you will need to set are GATK_PATH and NOVOALIGN_PATH. These should be set to the full directory path that contains these tools (the jar file for GATK and the executable binaries for Novoalign).
Alternatively, if you are using the Snakemake pipelines, you can create a dictionary called “env_vars” in the config.json file for Snakemake, and the pipelines will automatically set all environment variables prior to running any scripts.
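As an illustration of the two options above, here is a minimal sketch of how an "env_vars" dictionary could be copied into the environment and how the two tool directories could be sanity-checked. Only the key name "env_vars" and the variable names GATK_PATH and NOVOALIGN_PATH come from the text; the helper names and the install paths are hypothetical, and the real pipelines may handle this differently.

```python
import json
import os


def apply_env_vars(config_text, environ=None):
    """Copy the "env_vars" dictionary of a config.json string into environ."""
    if environ is None:
        environ = os.environ
    config = json.loads(config_text)
    for name, value in config.get("env_vars", {}).items():
        environ[name] = str(value)
    return environ


def missing_tool_dirs(environ, names=("GATK_PATH", "NOVOALIGN_PATH")):
    """Return the variables that are unset or do not point at a directory."""
    return [n for n in names if not os.path.isdir(environ.get(n, ""))]


# Hypothetical install locations; replace them with your own directories.
example_config = json.dumps({
    "env_vars": {
        "GATK_PATH": "/opt/gatk",
        "NOVOALIGN_PATH": "/opt/novoalign",
    }
}, indent=2)
```

Running `missing_tool_dirs(os.environ)` before starting a long job gives an early warning if either variable is unset or points at a path that is not a directory.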
The version of MOSAIK we use seems to fail to compile with GCC 4.9 but compiles fine with GCC 4.4. We have not tried intermediate versions of GCC, nor the latest versions of MOSAIK.
Command line tools¶
taxon_filter.py - tools for taxonomic removal or filtration of reads¶
This script contains a number of utilities for filtering NGS reads based on membership or non-membership in a species / genus / taxonomic grouping.
usage: taxon_filter.py subcommand
- Sub-commands:
- deplete_human
Undocumented
usage: taxon_filter.py deplete_human [-h] [--taxfiltBam TAXFILTBAM] --bmtaggerDbs BMTAGGERDBS [BMTAGGERDBS ...] --blastDbs BLASTDBS [BLASTDBS ...] [--lastDb LASTDB] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam revertBam bmtaggerBam rmdupBam blastnBam
- Positional arguments:
  inBam  Input BAM file.
  revertBam  Output BAM: read markup reverted with Picard.
  bmtaggerBam  Output BAM: depleted of human reads with BMTagger.
  rmdupBam  Output BAM: bmtaggerBam run through M-Vicuna duplicate removal.
  blastnBam  Output BAM: rmdupBam run through another depletion of human reads with BLASTN.
- Options:
  --taxfiltBam  Output BAM: blastnBam run through taxonomic selection via LASTAL.
  --bmtaggerDbs  Reference databases (one or more) to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
  --blastDbs  One or more reference databases for blast to deplete from input.
  --lastDb  One reference database for last (required if --taxfiltBam is specified).
  --JVMmemory=4g  JVM virtual memory size for Picard FilterSamReads (default: 4g)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
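To make the argument order of deplete_human concrete, the following sketch assembles an argument list from the usage line above. It is only an illustration: the helper name, the output file naming scheme, and the database names in the usage note are hypothetical. The positional file names are placed before the multi-valued --bmtaggerDbs and --blastDbs options so that those options cannot absorb them.

```python
def deplete_human_argv(in_bam, out_prefix, bmtagger_dbs, blast_dbs,
                       last_db=None, jvm_memory=None):
    """Build an argument list for `taxon_filter.py deplete_human`."""
    if not bmtagger_dbs or not blast_dbs:
        raise ValueError("need at least one BMTagger and one BLAST database")
    # Positional arguments, in the order given by the usage line:
    # inBam revertBam bmtaggerBam rmdupBam blastnBam
    argv = ["taxon_filter.py", "deplete_human", in_bam]
    for stage in ("revert", "bmtagger", "rmdup", "blastn"):
        argv.append("%s.%s.bam" % (out_prefix, stage))  # hypothetical naming
    argv += ["--bmtaggerDbs"] + list(bmtagger_dbs)
    argv += ["--blastDbs"] + list(blast_dbs)
    if last_db is not None:
        # --lastDb is required whenever --taxfiltBam is given.
        argv += ["--taxfiltBam", out_prefix + ".taxfilt.bam",
                 "--lastDb", last_db]
    if jvm_memory is not None:
        argv += ["--JVMmemory", jvm_memory]
    return argv
```

For example, `deplete_human_argv("in.bam", "sample", ["hg19"], ["hs_rna"])` returns a list that could be passed to `subprocess.check_call`, assuming taxon_filter.py is executable and on your PATH.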
- trim_trimmomatic
Undocumented
usage: taxon_filter.py trim_trimmomatic [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 pairedOutFastq1 pairedOutFastq2 clipFasta
- Positional arguments:
  inFastq1  Input reads 1
  inFastq2  Input reads 2
  pairedOutFastq1  Paired output 1
  pairedOutFastq2  Paired output 2
  clipFasta  Fasta file with adapters, PCR sequences, etc. to clip off
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_lastal_bam
Undocumented
usage: taxon_filter.py filter_lastal_bam [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam db outBam
- Positional arguments:
  inBam  Input reads
  db  Database of taxa we keep
  outBam  Output reads, filtered to refDb
- Options:
  --JVMmemory=4g  JVM virtual memory size (default: 4g)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_lastal
Undocumented
usage: taxon_filter.py filter_lastal [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq refDb outFastq
- Positional arguments:
  inFastq  Input fastq file
  refDb  Reference database to retain from input
  outFastq  Output fastq file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- partition_bmtagger
Undocumented
usage: taxon_filter.py partition_bmtagger [-h] [--outMatch OUTMATCH OUTMATCH] [--outNoMatch OUTNOMATCH OUTNOMATCH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 refDbs [refDbs ...]
- Positional arguments:
  inFastq1  Input fastq file; 1st end of paired-end reads.
  inFastq2  Input fastq file; 2nd end of paired-end reads. Must have same names as inFastq1
  refDbs  Reference databases (one or more) to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
- Options:
  --outMatch  Filenames for fastq output of matching reads.
  --outNoMatch  Filenames for fastq output of unmatched reads.
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- deplete_bam_bmtagger
Undocumented
usage: taxon_filter.py deplete_bam_bmtagger [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refDbs [refDbs ...] outBam
- Positional arguments:
  inBam  Input BAM file.
  refDbs  Reference databases (one or more) to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
  outBam  Output BAM file.
- Options:
  --JVMmemory=4g  JVM virtual memory size (default: 4g)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- deplete_blastn
Undocumented
usage: taxon_filter.py deplete_blastn [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq outFastq refDbs [refDbs ...]
- Positional arguments:
  inFastq  Input fastq file.
  outFastq  Output fastq file with matching reads removed.
  refDbs  One or more reference databases for blast.
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- deplete_blastn_paired
Undocumented
usage: taxon_filter.py deplete_blastn_paired [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] infq1 infq2 outfq1 outfq2 refDbs [refDbs ...]
- Positional arguments:
  infq1  Input fastq file.
  infq2  Input fastq file.
  outfq1  Output fastq file with matching reads removed.
  outfq2  Output fastq file with matching reads removed.
  refDbs  One or more reference databases for blast.
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- deplete_blastn_bam
Undocumented
usage: taxon_filter.py deplete_blastn_bam [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refDbs [refDbs ...] outBam
- Positional arguments:
  inBam  Input BAM file.
  refDbs  One or more reference databases for blast.
  outBam  Output BAM file with matching reads removed.
- Options:
  --JVMmemory=4g  JVM virtual memory size (default: 4g)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
assembly.py - de novo assembly¶
This script contains a number of utilities for viral sequence assembly from NGS reads. Primarily used for Lassa and Ebola virus analysis in the Sabeti Lab / Broad Institute Viral Genomics.
usage: assembly.py subcommand
- Sub-commands:
- trim_rmdup_subsamp
Undocumented
usage: assembly.py trim_rmdup_subsamp [-h] [--n_reads N_READS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam clipDb outBam
- Positional arguments:
  inBam  Input reads, unaligned BAM format.
  clipDb  Trimmomatic clip DB.
  outBam  Output reads, unaligned BAM format (currently, read groups and other header information are destroyed in this process).
- Options:
  --n_reads=100000  Subsample reads to no more than this many pairs. (default 100000)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- assemble_trinity
Undocumented
usage: assembly.py assemble_trinity [-h] [--n_reads N_READS] [--outReads OUTREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam clipDb outFasta
- Positional arguments:
  inBam  Input unaligned reads, BAM format.
  clipDb  Trimmomatic clip DB.
  outFasta  Output assembly.
- Options:
  --n_reads=100000  Subsample reads to no more than this many pairs. (default 100000)
  --outReads  Save the trimmomatic/prinseq/subsamp reads to a BAM file
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- order_and_orient
Undocumented
usage: assembly.py order_and_orient [-h] [--inReads INREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFasta inReference outFasta
- Positional arguments:
  inFasta  Input de novo assembly/contigs, FASTA format.
  inReference  Reference genome for ordering, orienting, and merging contigs, FASTA format.
  outFasta  Output assembly, FASTA format, with the same number of chromosomes as inReference, and in the same order.
- Options:
  --inReads  Input reads in unaligned BAM format. These can be used to improve the merge process.
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- impute_from_reference
Undocumented
usage: assembly.py impute_from_reference [-h] [--newName NEWNAME] [--minLength MINLENGTH] [--minUnambig MINUNAMBIG] [--replaceLength REPLACELENGTH] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFasta inReference outFasta
- Positional arguments:
  inFasta  Input assembly/contigs, FASTA format, already ordered, oriented and merged with inReference.
  inReference  Reference genome to impute with, FASTA format.
  outFasta  Output assembly, FASTA format.
- Options:
  --newName  rename output chromosome (default: do not rename)
  --minLength=0  minimum length for contig (default: 0)
  --minUnambig=0.0  minimum percentage unambiguous bases for contig (default: 0.0)
  --replaceLength=0  length of ends to be replaced with reference (default: 0)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- refine_assembly
Undocumented
usage: assembly.py refine_assembly [-h] [--outBam OUTBAM] [--outVcf OUTVCF] [--min_coverage MIN_COVERAGE] [--novo_params NOVO_PARAMS] [--chr_names [CHR_NAMES [CHR_NAMES ...]]] [--keep_all_reads] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFasta inBam outFasta
- Positional arguments:
  inFasta  Input assembly, FASTA format, pre-indexed for Picard, Samtools, and Novoalign.
  inBam  Input reads, unaligned BAM format.
  outFasta  Output refined assembly, FASTA format, indexed for Picard, Samtools, and Novoalign.
- Options:
  --outBam  Reads aligned to inFasta. Unaligned and duplicate reads have been removed. GATK indel realigned.
  --outVcf  GATK genotype calls for genome in inFasta coordinate space.
  --min_coverage=3  Minimum read coverage required to call a position unambiguous.
  --novo_params=-r Random -l 40 -g 40 -x 20 -t 100  Alignment parameters for Novoalign.
  --chr_names=[]  Rename all output chromosomes (default: retain original chromosome names)
  --keep_all_reads=False  Retain all reads in BAM file? Default is to remove unaligned and duplicate reads.
  --JVMmemory=2g  JVM virtual memory size (default: 2g)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_short_seqs
Undocumented
usage: assembly.py filter_short_seqs [-h] [-f FORMAT] [-of OUTPUT_FORMAT] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFile minLength minUnambig outFile
- Positional arguments:
  inFile  input sequence file
  minLength  minimum length for contig
  minUnambig  minimum percentage unambiguous bases for contig
  outFile  output file
- Options:
  -f=fasta, --format=fasta  Format for input sequence (default: fasta)
  -of=fasta, --output-format=fasta  Format for output sequence (default: fasta)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
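The two thresholds of filter_short_seqs can be pictured with the sketch below. It is not the tool's implementation: the function name is hypothetical, minUnambig is assumed to be a fraction between 0 and 1 (the text calls it a percentage), and "unambiguous" is assumed to mean A, C, G, or T.

```python
def passes_filter(seq, min_length, min_unambig):
    """Return True if seq is long enough and unambiguous enough.

    Assumptions: min_unambig is a fraction (0.0 to 1.0) and only
    A, C, G and T count as unambiguous bases.
    """
    if not seq or len(seq) < min_length:
        return False
    unambiguous = sum(1 for base in seq.upper() if base in "ACGT")
    return unambiguous / float(len(seq)) >= min_unambig
```

A contig such as "ACGTNN" has four unambiguous bases out of six, so it passes with min_unambig=0.5 but fails with min_unambig=0.9.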
- modify_contig
Undocumented
usage: assembly.py modify_contig [-h] [-n NAME] [-cn] [-t] [-r5] [-r3] [-l REPLACE_LENGTH] [-f FORMAT] [-r] [-rn] [-ca] [--tmpDir TMPDIR] [--tmpDirKeep] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] input output ref
- Positional arguments:
  input  input alignment of reference and contig (should contain exactly 2 sequences)
  output  Destination file for modified contigs
  ref  reference sequence name (exact match required)
- Options:
  -n, --name  fasta header output name (default: existing header)
  -cn=False, --call-reference-ns=False  should the reference sequence be called if there is an N in the contig and a more specific base in the reference (default: False)
  -t=False, --trim-ends=False  should ends of contig.fasta be trimmed to length of reference (default: False)
  -r5=False, --replace-5ends=False  should the 5’-end of contig.fasta be replaced by reference (default: False)
  -r3=False, --replace-3ends=False  should the 3’-end of contig.fasta be replaced by reference (default: False)
  -l=10, --replace-length=10  length of ends to be replaced (if replace-ends is yes) (default: 10)
  -f=fasta, --format=fasta  Format for input alignment (default: fasta)
  -r=False, --replace-end-gaps=False  Replace gaps at the beginning and end of the sequence with reference sequence (default: False)
  -rn=False, --remove-end-ns=False  Remove leading and trailing N’s in the contig (default: False)
  -ca=False, --call-reference-ambiguous=False  should the reference sequence be called if the contig seq is ambiguous and the reference sequence is more informative & consistent with the ambiguous base (i.e. Y->C) (default: False)
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
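The -cn and -ca options of modify_contig describe per-base decisions that are easier to see in code. The sketch below is one reading of those descriptions, not the tool's implementation: the function name is hypothetical, and it assumes that -ca applies only to ambiguity codes other than N, while N is governed by -cn alone.

```python
# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
SPECIFIC = ("A", "C", "G", "T")


def resolve_base(contig_base, ref_base,
                 call_reference_ns=False, call_reference_ambiguous=False):
    """Pick the output base for one alignment column."""
    contig, ref = contig_base.upper(), ref_base.upper()
    if ref not in SPECIFIC:
        return contig  # the reference is not more informative
    if contig == "N":
        return ref if call_reference_ns else contig
    if contig in IUPAC and contig not in SPECIFIC:
        # e.g. Y (C or T) is consistent with a reference C, not with A
        if call_reference_ambiguous and ref in IUPAC[contig]:
            return ref
    return contig
```

With call_reference_ambiguous=True, a contig Y aligned to a reference C becomes C, while a contig Y aligned to a reference A stays Y because A is not one of the bases Y stands for.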
- vcf_to_fasta
Undocumented
usage: assembly.py vcf_to_fasta [-h] [--trim_ends] [--min_coverage MIN_DP] [--major_cutoff MAJOR_CUTOFF] [--min_dp_ratio MIN_DP_RATIO] [--name [NAME [NAME ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inVcf outFasta
- Positional arguments:
  inVcf  Input VCF file
  outFasta  Output FASTA file
- Options:
  --trim_ends=False  If specified, we will strip off continuous runs of N’s from the beginning and end of the sequences before writing to output. Interior N’s will not be changed.
  --min_coverage=3  Specify minimum read coverage (with full agreement) to make a call. [default: 3]
  --major_cutoff=0.5  If the major allele is present at a frequency higher than this cutoff, we will call an unambiguous base at that position. If it is equal to or below this cutoff, we will call an ambiguous base representing all possible alleles at that position. [default: 0.5]
  --min_dp_ratio=0.0  The input VCF file often reports two read depth values (DP): one for the position as a whole, and one for the sample in question. We can optionally reject calls in which the sample read count is below a specified fraction of the total read count. This filter will not apply to any sites unless both DP values are reported. [default: 0.0]
  --name=[]  output sequence names (default: reference names in VCF file)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
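The --min_coverage, --major_cutoff, and --trim_ends rules of vcf_to_fasta can be sketched for a single position as follows. This is a simplified illustration under stated assumptions, not the tool's code: the function names are hypothetical, only single-base alleles are handled, and --min_coverage is applied to the total depth (the "with full agreement" qualifier is not modelled).

```python
# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR_BASES = dict((frozenset(bases), code) for code, bases in IUPAC.items())


def call_base(allele_counts, min_coverage=3, major_cutoff=0.5):
    """Call one consensus base from a {base: read count} dictionary."""
    observed = dict((a.upper(), n) for a, n in allele_counts.items() if n > 0)
    depth = sum(observed.values())
    if depth == 0 or depth < min_coverage:
        return "N"  # not enough reads to make a call
    major = max(observed, key=observed.get)
    if observed[major] / float(depth) > major_cutoff:
        return major  # strictly above the cutoff: unambiguous call
    # At or below the cutoff: ambiguity code covering all observed alleles.
    return CODE_FOR_BASES.get(frozenset(observed), "N")


def trim_ends(seq):
    """Strip runs of N from both ends; interior N's are left alone."""
    return seq.strip("Nn")
```

Note that a 50/50 split between C and T yields Y under the default cutoff, because the major allele frequency must be strictly higher than 0.5 for an unambiguous call.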
- trim_fasta
Undocumented
usage: assembly.py trim_fasta [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFasta outFasta
- Positional arguments:
  inFasta  Input fasta file
  outFasta  Output (trimmed) fasta file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
- deambig_fasta
Undocumented
usage: assembly.py deambig_fasta [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFasta outFasta
- Positional arguments:
  inFasta  Input fasta file
  outFasta  Output fasta file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
- dpdiff
Undocumented
usage: assembly.py dpdiff [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inVcfs [inVcfs ...] outFile
- Positional arguments:
  inVcfs  Input VCF file
  outFile  Output flat file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
interhost.py - species and population-level genetic variation¶
This script contains a number of utilities for SNP calling, multi-alignment, phylogenetics, etc.
usage: interhost.py subcommand
intrahost.py - within-host genetic variation (iSNVs)¶
This script contains a number of utilities for intrahost variant calling and annotation for viral genomes.
usage: intrahost.py subcommand
- Sub-commands:
- tabfile_rename
Undocumented
usage: intrahost.py tabfile_rename [-h] [--col_idx COL] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFile mapFile outFile
- Positional arguments:
  inFile  Input flat file
  mapFile  Map file. Two-column headerless file that maps input values to output values. This script will error if there are values in inFile that do not exist in mapFile.
  outFile  Output flat file
- Options:
  --col_idx=0  Which column number to replace (0-based index). [default: 0]
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
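The behaviour described for tabfile_rename (replace one column through a two-column map, and fail on unmapped values) can be sketched as below. The function names are hypothetical and tab-delimited files are an assumption; the delimiter used by the real tool is not stated here.

```python
def read_map(lines):
    """Parse a two-column, headerless map file into a dictionary."""
    mapping = {}
    for line in lines:
        old, new = line.rstrip("\r\n").split("\t")
        mapping[old] = new
    return mapping


def rename_column(rows, mapping, col_idx=0):
    """Replace the values in column col_idx (0-based) using mapping.

    Raises KeyError for a value that is missing from the map,
    mirroring the documented "will error" behaviour.
    """
    renamed = []
    for row in rows:
        fields = row.rstrip("\r\n").split("\t")
        key = fields[col_idx]
        if key not in mapping:
            raise KeyError("value %r not found in map file" % key)
        fields[col_idx] = mapping[key]
        renamed.append("\t".join(fields))
    return renamed
```

Both functions accept any iterable of lines, so open file handles can be passed in directly.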
- vphaser_to_vcf
Undocumented
usage: intrahost.py vphaser_to_vcf [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFile refFasta multiAlignment outVcf
- Positional arguments:
  inFile  Input vPhaser2 text file
  refFasta  Reference genome FASTA
  multiAlignment  Consensus genomes multi-alignment FASTA
  outVcf  Output VCF file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
- Fws
Undocumented
usage: intrahost.py Fws [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inVcf outVcf
- Positional arguments:
  inVcf  Input VCF file
  outVcf  Output VCF file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
- iSNV_table
Undocumented
usage: intrahost.py iSNV_table [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inVcf outFile
- Positional arguments:
  inVcf  Input VCF file
  outFile  Output text file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
- iSNP_per_patient
Undocumented
usage: intrahost.py iSNP_per_patient [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFile outFile
- Positional arguments:
  inFile  Input text file
  outFile  Output text file
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
read_utils.py - utilities that manipulate bam and fastq files¶
Utilities for working with sequence reads, such as converting formats and fixing mate pairs.
usage: read_utils.py subcommand
- Sub-commands:
- purge_unmated
Undocumented
usage: read_utils.py purge_unmated [-h] [--regex REGEX] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
  inFastq1  Input fastq file; 1st end of paired-end reads.
  inFastq2  Input fastq file; 2nd end of paired-end reads.
  outFastq1  Output fastq file; 1st end of paired-end reads.
  outFastq2  Output fastq file; 2nd end of paired-end reads.
- Options:
  --regex=^@(\S+)/[1|2]$  Perl regular expression to parse paired read IDs (default: ^@(\S+)/[1|2]$)
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
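To show what the default --regex of purge_unmated captures, the sketch below applies the same pattern with Python's `re` module and finds the read names present in both files. It is an illustration only: the function names are hypothetical, records are assumed to span exactly four lines, and the real tool may work differently. Note that the character class `[1|2]` also accepts a literal `|`, since `|` has no special meaning inside brackets.

```python
import re

# Default --regex from the option list; group 1 is the shared read name.
READ_ID = re.compile(r"^@(\S+)/[1|2]$")


def read_names(fastq_lines):
    """Return the read name from the header of each 4-line FASTQ record."""
    names = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # sequence, separator, or quality line
        match = READ_ID.match(line.rstrip("\r\n"))
        if match is None:
            raise ValueError("unparseable read ID: %r" % line)
        names.append(match.group(1))
    return names


def mated_names(fastq1_lines, fastq2_lines):
    """Return the set of read names that occur in both input files."""
    return set(read_names(fastq1_lines)) & set(read_names(fastq2_lines))
```

Reads whose names are not in the returned set have lost their mate and are the ones a purge step would drop.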
- fastq_to_fasta
Undocumented
usage: read_utils.py fastq_to_fasta [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq outFasta
- Positional arguments:
  inFastq  Input fastq file.
  outFasta  Output fasta file.
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- index_fasta_samtools
Undocumented
usage: read_utils.py index_fasta_samtools [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFasta
- Positional arguments:
  inFasta  Reference genome, FASTA format.
- Options:
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
- index_fasta_picard
Undocumented
usage: read_utils.py index_fasta_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFasta
- Positional arguments:
  inFasta  Input reference genome, FASTA format.
- Options:
  --JVMmemory=512m  JVM virtual memory size (default: 512m)
  --picardOptions=[]  Optional arguments to Picard’s CreateSequenceDictionary, OPTIONNAME=value ...
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- mkdup_picard
Undocumented
usage: read_utils.py mkdup_picard [-h] [--outMetrics OUTMETRICS] [--remove] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBams [inBams ...] outBam
- Positional arguments:
  inBams  Input reads, BAM format.
  outBam  Output reads, BAM format.
- Options:
  --outMetrics  Output metrics file. Default is to dump to a temp file.
  --remove=False  Instead of marking duplicates, remove them entirely (default: False)
  --JVMmemory=2g  JVM virtual memory size (default: 2g)
  --picardOptions=[]  Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
  --loglevel=DEBUG  Verboseness of output. [default: DEBUG]
    Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
  --version, -V  show program’s version number and exit
  --tmpDir=/tmp  Base directory for temp files. [default: /tmp]
  --tmpDirKeep=False  Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- revert_bam_picard
Undocumented
usage: read_utils.py revert_bam_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s RevertSam, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
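The --tmpDir and --tmpDirKeep options describe a temp directory that is normally deleted at the end of a run, even after a failure, but can be kept when an exception occurs. A minimal Python sketch of that behavior follows. The context manager name tmp_dir and its parameters are hypothetical and only illustrate the described semantics.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def tmp_dir(base=None, keep_on_error=False):
    """Yield a fresh temp directory under `base` (hypothetical helper)."""
    path = tempfile.mkdtemp(dir=base)
    try:
        yield path
    except Exception:
        # On failure, delete unless the caller asked to keep the directory.
        if not keep_on_error:
            shutil.rmtree(path, ignore_errors=True)
        raise
    else:
        # On success, always clean up.
        shutil.rmtree(path, ignore_errors=True)


with tmp_dir() as workdir:
    scratch_file = os.path.join(workdir, "scratch.txt")
    with open(scratch_file, "w") as handle:
        handle.write("intermediate data\n")
print(os.path.exists(workdir))
```

Keeping the directory only on failure is useful for debugging, since the intermediate files of the failed step remain available for inspection.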
- picard
Undocumented
usage: read_utils.py picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] command
- Positional arguments:
command Picard command.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- sort_bam
Undocumented
usage: read_utils.py sort_bam [-h] [--index] [--md5] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam {unsorted,queryname,coordinate}
- Positional arguments:
inBam Input bam file.
outBam Output bam file, sorted.
sortOrder How to sort the reads.
Possible choices: unsorted, queryname, coordinate
- Options:
--index=False Index outBam (default: False)
--md5=False MD5 checksum outBam (default: False)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s SortSam, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- merge_bams
Undocumented
usage: read_utils.py merge_bams [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBams [inBams ...] outBam
- Positional arguments:
inBams Input bam files.
outBam Output bam file.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s MergeSamFiles, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_bam
Undocumented
usage: read_utils.py filter_bam [-h] [--exclude] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam readList outBam
- Positional arguments:
inBam Input bam file.
readList Input file of read IDs.
outBam Output bam file.
- Options:
--exclude=False If specified, readList is a list of reads to remove from input. Default behavior is to treat readList as an inclusion list (all unnamed reads are removed).
--JVMmemory=4g JVM virtual memory size (default: 4g)
--picardOptions=[] Optional arguments to Picard’s FilterSamReads, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
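The difference between the default inclusion behavior and --exclude can be illustrated on plain read IDs. This is a minimal Python sketch of the selection logic only; the function filter_reads is hypothetical, and the real command delegates the work to Picard’s FilterSamReads on BAM files.

```python
def filter_reads(read_ids, read_list, exclude=False):
    """Select read IDs by an inclusion or exclusion list (illustrative only)."""
    listed = set(read_list)
    if exclude:
        # Exclusion list: drop every read that is named in read_list.
        return [r for r in read_ids if r not in listed]
    # Inclusion list (default): keep only reads named in read_list.
    return [r for r in read_ids if r in listed]


reads = ["r1", "r2", "r3", "r4"]
kept_by_inclusion = filter_reads(reads, ["r2", "r4"])
kept_by_exclusion = filter_reads(reads, ["r2", "r4"], exclude=True)
print(kept_by_inclusion)
print(kept_by_exclusion)
```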
- bam_to_fastq
Undocumented
usage: read_utils.py bam_to_fastq [-h] [--outHeader OUTHEADER] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outFastq1 outFastq2
- Positional arguments:
inBam Input bam file.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--outHeader Optional text file name that will receive bam header.
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s SamToFastq, OPTIONNAME=value ...
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- fastq_to_bam
Undocumented
usage: read_utils.py fastq_to_bam [-h] (--sampleName SAMPLENAME | --header HEADER) [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 outBam
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outBam Output bam file.
- Options:
--sampleName Sample name to insert into the read group header.
--header Optional text file containing header.
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[] Optional arguments to Picard’s FastqToSam, OPTIONNAME=value ... Note that header-related options will be overwritten by HEADER if present.
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
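The usage line shows (--sampleName SAMPLENAME | --header HEADER), which is how argparse renders a required, mutually exclusive pair of options: exactly one of the two must be given. A minimal Python sketch of a parser with that shape follows. The prog name and help strings are placeholders, and this is not the actual parser from read_utils.py.

```python
import argparse

parser = argparse.ArgumentParser(prog="fastq_to_bam_sketch")
# required=True means exactly one option from the group must be supplied.
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--sampleName", help="Sample name for the read group header.")
group.add_argument("--header", help="Text file containing a header.")
parser.add_argument("inFastq1")
parser.add_argument("inFastq2")
parser.add_argument("outBam")

args = parser.parse_args(["--sampleName", "S1", "r1.fastq", "r2.fastq", "out.bam"])
print(args.sampleName, args.header, args.outBam)
```

Passing both options, or neither, makes argparse print an error message and exit with a non-zero status.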
- split_reads
Undocumented
usage: read_utils.py split_reads [-h] [--maxReads MAXREADS | --numChunks NUMCHUNKS] [--indexLen INDEXLEN] [--format {fastq,fasta}] [--outSuffix OUTSUFFIX] inFileName outPrefix
- Positional arguments:
inFileName Input fastq or fasta file.
outPrefix Output files will be named ${outPrefix}01${outSuffix}, ${outPrefix}02${outSuffix}...
- Options:
--maxReads Maximum number of reads per chunk (default 1000 if neither maxReads nor numChunks is specified).
--numChunks Number of output files, if maxReads is not specified.
--indexLen=2 Number of characters to append to outPrefix for each output file (default: 2). Number of files must not exceed 10^INDEXLEN.
--format=fastq Input fastq or fasta file (default: fastq).
Possible choices: fastq, fasta
--outSuffix= Output filename suffix (e.g. .fastq or .fastq.gz). A suffix ending in .gz will cause the output file to be gzip compressed. Default is no suffix.
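The output naming and chunking rules described for split_reads can be sketched in Python as follows. The function chunk_names is hypothetical; the way it derives the number of chunks from maxReads (ceiling division, at least one file) is an assumption, and only the naming pattern, the default of 1000 reads, and the 10^INDEXLEN bound come from the help text.

```python
def chunk_names(num_reads, out_prefix, max_reads=None, num_chunks=None,
                index_len=2, out_suffix=""):
    """Return the output file names for a split (illustrative sketch)."""
    if max_reads is not None and num_chunks is not None:
        raise ValueError("specify at most one of max_reads and num_chunks")
    if max_reads is None and num_chunks is None:
        max_reads = 1000  # documented default when neither is given
    if num_chunks is None:
        # Assumption: ceiling division, and at least one output file.
        num_chunks = max(1, (num_reads + max_reads - 1) // max_reads)
    # Bound as stated in the help text for --indexLen.
    if num_chunks > 10 ** index_len:
        raise ValueError("number of files exceeds 10^indexLen")
    # Zero-padded, 1-based index between prefix and suffix.
    return [out_prefix + str(i).zfill(index_len) + out_suffix
            for i in range(1, num_chunks + 1)]


names = chunk_names(2500, "reads.", max_reads=1000, out_suffix=".fastq")
print(names)
```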
- split_bam
Undocumented
usage: read_utils.py split_bam [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBams [outBams ...]
- Positional arguments:
inBam Input BAM file.
outBams Output BAM files.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_mvicuna_bam
Undocumented
usage: read_utils.py rmdup_mvicuna_bam [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
- Options:
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- dup_remove_mvicuna
Undocumented
usage: read_utils.py dup_remove_mvicuna [-h] [--unpairedOutFastq UNPAIREDOUTFASTQ] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 pairedOutFastq1 pairedOutFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
pairedOutFastq1 Output fastq file; 1st end of paired-end reads.
pairedOutFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--unpairedOutFastq File name of output unpaired reads.
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_prinseq_fastq
Undocumented
usage: read_utils.py rmdup_prinseq_fastq [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_bam_mapped_only
Undocumented
usage: read_utils.py filter_bam_mapped_only [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam
- Positional arguments:
inBam Input aligned reads, BAM format.
outBam Output sorted indexed reads, filtered to aligned-only, BAM format.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- novoalign
Undocumented
usage: read_utils.py novoalign [-h] [--options OPTIONS] [--min_qual MIN_QUAL] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Novoindex.
outBam Output reads, BAM format (aligned).
- Options:
--options="-r Random" Novoalign options (default: -r Random)
--min_qual=0 Filter outBam to minimum mapping quality (default: 0)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- novoindex
Undocumented
usage: read_utils.py novoindex [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] refFasta
- Positional arguments:
refFasta Reference genome, FASTA format.
- Options:
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
- gatk_ug
Undocumented
usage: read_utils.py gatk_ug [-h] [--options OPTIONS] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refFasta outVcf
- Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outVcf Output calls in VCF format. If this filename ends with .gz, GATK will BGZIP compress the output and produce a Tabix index file as well.
- Options:
--options="--min_base_quality_score 15 -ploidy 4" UnifiedGenotyper options (default: --min_base_quality_score 15 -ploidy 4)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- gatk_realign
Undocumented
usage: read_utils.py gatk_realign [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format, aligned to refFasta.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outBam Realigned reads.
- Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- align_and_fix
Undocumented
usage: read_utils.py align_and_fix [-h] [--outBamAll OUTBAMALL] [--outBamFiltered OUTBAMFILTERED] [--novoalign_options NOVOALIGN_OPTIONS] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refFasta
- Positional arguments:
inBam Input unaligned reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard and Novoalign.
- Options:
--outBamAll Aligned, sorted, and indexed reads. Unmapped reads are retained and duplicate reads are marked, not removed.
--outBamFiltered Aligned, sorted, and indexed reads. Unmapped reads and duplicate reads are removed from this file.
--novoalign_options="-r Random" Novoalign options (default: -r Random)
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
reports.py - produce various metrics and reports¶
Reports
usage: reports.py subcommand
- Sub-commands:
- assembly_stats
Undocumented
usage: reports.py assembly_stats [-h] [--cov_thresholds COV_THRESHOLDS [COV_THRESHOLDS ...]] [--assembly_dir ASSEMBLY_DIR] [--assembly_tmp ASSEMBLY_TMP] [--align_dir ALIGN_DIR] samples [samples ...] outFile
- Positional arguments:
samples Sample names.
outFile Output report file.
- Options:
--cov_thresholds=(1, 5, 20, 100) Genome coverage thresholds to report on. (default: (1, 5, 20, 100))
--assembly_dir=data/02_assembly Directory with assembly outputs. (default: data/02_assembly)
--assembly_tmp=tmp/02_assembly Directory with assembly temp files. (default: tmp/02_assembly)
--align_dir=data/02_align_to_self Directory with reads aligned to own assembly. (default: data/02_align_to_self)
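One plausible reading of the coverage thresholds is a count of genome positions whose read depth meets each threshold. The Python sketch below illustrates that reading on a toy depth vector. The function coverage_at_thresholds and the choice to report position counts are assumptions; the actual report produced by assembly_stats may be computed differently.

```python
def coverage_at_thresholds(depths, thresholds=(1, 5, 20, 100)):
    """Count positions with depth >= each threshold (assumed interpretation)."""
    return {t: sum(1 for d in depths if d >= t) for t in thresholds}


# Toy per-position read depths for a six-base genome.
depths = [0, 3, 7, 25, 150, 150]
summary = coverage_at_thresholds(depths)
print(summary)
```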
- consolidate_bamstats
Undocumented
usage: reports.py consolidate_bamstats [-h] inFiles [inFiles ...] outFile
- Positional arguments:
inFiles Input report files.
outFile Output report file.
- consolidate_fastqc
Undocumented
usage: reports.py consolidate_fastqc [-h] inDirs [inDirs ...] outFile
- Positional arguments:
inDirs Input FASTQC directories.
outFile Output report file.
- coverage_summary
Undocumented
usage: reports.py coverage_summary [-h] [--runFile RUNFILE] [--bamstatsDir BAMSTATSDIR] coverageDir coverageSuffix outFile
- Positional arguments:
coverageDir Input coverage report directory.
coverageSuffix Suffix of all coverage files.
outFile Output report file.
- Options:
--runFile Link in plate info from seq runs.
--bamstatsDir Link in read info from BAM alignments.
- consolidate_coverage
Undocumented
usage: reports.py consolidate_coverage [-h] inFiles [inFiles ...] adj outFile
- Positional arguments:
inFiles Input coverage files.
adj Report adjective.
outFile Output report file.
- consolidate_spike_count
Undocumented
usage: reports.py consolidate_spike_count [-h] inFiles [inFiles ...] outFile
- Positional arguments:
inFiles Input coverage files.
outFile Output report file.
broad_utils.py - for data generated at the Broad Institute¶
Utilities for getting sequences out of the Broad walk-up sequencing pipeline. These utilities are probably not of much use outside the Broad.
usage: broad_utils.py subcommand
- Sub-commands:
- get_bustard_dir
Undocumented
usage: broad_utils.py get_bustard_dir [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] inDir
- Positional arguments:
inDir Picard directory.
- Options:
--loglevel=ERROR Verboseness of output. [default: ERROR]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
- get_run_date
Undocumented
usage: broad_utils.py get_run_date [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] inDir
- Positional arguments:
inDir Picard directory.
- Options:
--loglevel=ERROR Verboseness of output. [default: ERROR]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
- get_all_names
Undocumented
usage: broad_utils.py get_all_names [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] {samples,libraries,runs} runfile
- Positional arguments:
type Type of name.
Possible choices: samples, libraries, runs
runfile File with seq run information.
- Options:
--loglevel=ERROR Verboseness of output. [default: ERROR]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
- make_barcodes_file
Undocumented
usage: broad_utils.py make_barcodes_file [-h] inFile outFile
- Positional arguments:
inFile Input tab-delimited file with a header and 3-5 named columns (the last two are optional): sample, barcode_1, barcode_2, library_id_per_sample, run_id_per_library
outFile Output BARCODE_FILE file for Picard.
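The input layout for make_barcodes_file (three required named columns, two optional ones) can be read and checked with the standard csv module. This Python sketch only shows the parsing and validation step. The function read_sample_sheet and the example barcode values are hypothetical, and the sketch does not reproduce how the tool writes the Picard BARCODE_FILE.

```python
import csv
import io

REQUIRED = ("sample", "barcode_1", "barcode_2")
OPTIONAL = ("library_id_per_sample", "run_id_per_library")


def read_sample_sheet(handle):
    """Parse a tab-delimited sheet with a header row (illustrative only)."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    # Keep the named columns that are present; optional ones may be absent.
    wanted = [c for c in REQUIRED + OPTIONAL if c in header]
    return [{c: row[c] for c in wanted} for row in reader]


# Made-up example content; real barcodes come from the sequencing run.
example = "sample\tbarcode_1\tbarcode_2\nS1\tACGTACGT\tTTGCAAGG\n"
rows = read_sample_sheet(io.StringIO(example))
print(rows)
```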
- extract_barcodes
Undocumented
usage: broad_utils.py extract_barcodes [-h] [--outMetrics OUTMETRICS] [--read_structure READ_STRUCTURE] [--max_mismatches MAX_MISMATCHES] [--minimum_base_quality MINIMUM_BASE_QUALITY] [--min_mismatch_delta MIN_MISMATCH_DELTA] [--max_no_calls MAX_NO_CALLS] [--minimum_quality MINIMUM_QUALITY] [--compress_outputs COMPRESS_OUTPUTS] [--num_processors NUM_PROCESSORS] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inDir lane barcodeFile outDir
- Positional arguments:
inDir Bustard directory.
lane Lane number.
barcodeFile Input tab-delimited file with a header and four named columns: barcode_name, library_name, barcode_sequence_1, barcode_sequence_2
outDir Output directory for barcodes.
- Options:
--outMetrics Output metrics file. Default is to dump to a temp file.
--read_structure=101T8B8B101T Picard ExtractIlluminaBarcodes READ_STRUCTURE (default: 101T8B8B101T)
--max_mismatches=1 Picard ExtractIlluminaBarcodes MAX_MISMATCHES (default: 1)
--minimum_base_quality=15 Picard ExtractIlluminaBarcodes MINIMUM_BASE_QUALITY (default: 15)
--min_mismatch_delta Picard ExtractIlluminaBarcodes MIN_MISMATCH_DELTA (default: not set)
--max_no_calls Picard ExtractIlluminaBarcodes MAX_NO_CALLS (default: not set)
--minimum_quality Picard ExtractIlluminaBarcodes MINIMUM_QUALITY (default: not set)
--compress_outputs Picard ExtractIlluminaBarcodes COMPRESS_OUTPUTS (default: not set)
--num_processors=4 Picard ExtractIlluminaBarcodes NUM_PROCESSORS (default: 4)
--JVMmemory=8g JVM virtual memory size (default: 8g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
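The default READ_STRUCTURE value 101T8B8B101T follows Picard’s notation of a cycle count followed by a one-letter segment type, where T is a template read, B a barcode read, and S a skipped segment. A minimal Python sketch that splits such a string into segments is shown below. The function parse_read_structure is hypothetical and accepts only those three segment types.

```python
import re


def parse_read_structure(read_structure):
    """Split a read structure such as 101T8B8B101T into (cycles, type) pairs."""
    segments = re.findall(r"(\d+)([TBS])", read_structure)
    # Reject input that is empty or has characters outside <count><type> units.
    rebuilt = "".join(count + kind for count, kind in segments)
    if not segments or rebuilt != read_structure:
        raise ValueError("invalid read structure: %r" % read_structure)
    return [(int(count), kind) for count, kind in segments]


segments = parse_read_structure("101T8B8B101T")
total_cycles = sum(count for count, _ in segments)
print(segments, total_cycles)
```

For the default value this describes two 101-cycle template reads separated by two 8-cycle barcode reads.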
- make_params_file
Undocumented
usage: broad_utils.py make_params_file [-h] inFile bamDir outFile
- Positional arguments:
inFile Input tab-delimited file with a header and four named columns: barcode_name, library_name, barcode_sequence_1, barcode_sequence_2
bamDir Directory for output bams.
outFile Output LIBRARY_PARAMS file for Picard.
- illumina_basecalls
Undocumented
usage: broad_utils.py illumina_basecalls [-h] [--read_structure READ_STRUCTURE] [--sequencing_center SEQUENCING_CENTER] [--adapters_to_check [ADAPTERS_TO_CHECK [ADAPTERS_TO_CHECK ...]]] [--platform PLATFORM] [--max_reads_in_ram_per_tile MAX_READS_IN_RAM_PER_TILE] [--max_records_in_ram MAX_RECORDS_IN_RAM] [--num_processors NUM_PROCESSORS] [--apply_eamss_filter APPLY_EAMSS_FILTER] [--force_gc FORCE_GC] [--first_tile FIRST_TILE] [--tile_limit TILE_LIMIT] [--include_non_pf_reads INCLUDE_NON_PF_READS] [--run_start_date RUN_START_DATE] [--read_group_id READ_GROUP_ID] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBustardDir inBarcodesDir flowcell lane paramsFile
- Positional arguments:
inBustardDir Bustard directory.
inBarcodesDir Barcodes directory.
flowcell Flowcell ID.
lane Lane number.
paramsFile Input tab-delimited file with a header and five named columns: BARCODE_1, BARCODE_2, OUTPUT, SAMPLE_ALIAS, LIBRARY_NAME
- Options:
--read_structure=101T8B8B101T Picard IlluminaBasecallsToSam READ_STRUCTURE (default: 101T8B8B101T)
--sequencing_center=BI Picard IlluminaBasecallsToSam SEQUENCING_CENTER (default: BI)
--adapters_to_check=('PAIRED_END', 'NEXTERA_V1', 'NEXTERA_V2') Picard IlluminaBasecallsToSam ADAPTERS_TO_CHECK (default: ('PAIRED_END', 'NEXTERA_V1', 'NEXTERA_V2'))
--platform Picard IlluminaBasecallsToSam PLATFORM (default: not set)
--max_reads_in_ram_per_tile=100000 Picard IlluminaBasecallsToSam MAX_READS_IN_RAM_PER_TILE (default: 100000)
--max_records_in_ram=100000 Picard IlluminaBasecallsToSam MAX_RECORDS_IN_RAM (default: 100000)
--num_processors=4 Picard IlluminaBasecallsToSam NUM_PROCESSORS (default: 4)
--apply_eamss_filter Picard IlluminaBasecallsToSam APPLY_EAMSS_FILTER (default: not set)
--force_gc=False Picard IlluminaBasecallsToSam FORCE_GC (default: False)
--first_tile Picard IlluminaBasecallsToSam FIRST_TILE (default: not set)
--tile_limit Picard IlluminaBasecallsToSam TILE_LIMIT (default: not set)
--include_non_pf_reads Picard IlluminaBasecallsToSam INCLUDE_NON_PF_READS (default: not set)
--run_start_date Picard IlluminaBasecallsToSam RUN_START_DATE (default: not set)
--read_group_id Picard IlluminaBasecallsToSam READ_GROUP_ID (default: not set)
--JVMmemory=54g JVM virtual memory size (default: 54g)
--loglevel=DEBUG Verboseness of output. [default: DEBUG]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Using the Snakemake pipelines¶
Much more documentation to come...
The pipelines are built on Snakemake, which is documented at https://bitbucket.org/johanneskoester/snakemake/wiki/Home
Note that Python 3.4 is required to use these tools with Snakemake.