3.2. taxon_filter.py - tools for taxonomic removal or filtration of reads
This script contains a number of utilities for filtering NGS reads based on membership or non-membership in a species / genus / taxonomic grouping.
usage: taxon_filter.py subcommand
- Sub-commands:
- deplete
Run the entire depletion pipeline: bwa, bmtagger, mvicuna, blastn.
usage: taxon_filter.py deplete [-h] [--bwaDbs [BWADBS [BWADBS ...]]] [--bmtaggerDbs [BMTAGGERDBS [BMTAGGERDBS ...]]] [--blastDbs [BLASTDBS [BLASTDBS ...]]] [--srprismMemory SRPRISM_MEMORY] [--chunkSize CHUNKSIZE] [--JVMmemory JVMMEMORY] [--clearTags] [--tagsToClear TAGS_TO_CLEAR [TAGS_TO_CLEAR ...]] [--doNotSanitize] [--threads THREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam [revertBam] bwaBam bmtaggerBam rmdupBam blastnBam
- Positional arguments:
inBam  Input BAM file.
revertBam  Output BAM: read markup reverted with Picard.
bwaBam  Output BAM: depleted of reads with BWA.
bmtaggerBam  Output BAM: depleted of reads with BMTagger.
rmdupBam  Output BAM: bmtaggerBam run through M-Vicuna duplicate removal.
blastnBam  Output BAM: rmdupBam run through another depletion of reads with BLASTN.
- Options:
--bwaDbs=()  Reference databases for bwa to deplete from input.
--bmtaggerDbs=()  Reference databases to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
--blastDbs=()  Reference databases for blast to deplete from input.
--srprismMemory=7168  Memory for srprism.
--chunkSize=1000000  blastn chunk size (default: 1000000)
--JVMmemory=4g  JVM virtual memory size for Picard FilterSamReads (default: 4g)
--clearTags=False  When supplying an aligned input file, clear the per-read attribute tags.
--tagsToClear=['XT', 'X0', 'X1', 'XA', 'AM', 'SM', 'BQ', 'CT', 'XN', 'OC', 'OP']  A space-separated list of tags to remove from all reads in the input bam file (default: XT X0 X1 XA AM SM BQ CT XN OC OP)
--doNotSanitize=False  Undocumented
--threads  Number of threads (default: all available cores)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
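A minimal sketch of assembling a `deplete` command line from Python. The file names, database names, and the assumption that `taxon_filter.py` is on the PATH are all hypothetical. The positional file names are placed before the database options as a precaution: options that accept several values, such as `--bwaDbs [BWADBS ...]`, would otherwise absorb any file names that follow them.

```python
import shlex


def build_deplete_cmd(in_bam, out_prefix, bwa_dbs=(), bmtagger_dbs=(),
                      blast_dbs=(), threads=None):
    """Return an argv list for `taxon_filter.py deplete` (illustrative helper)."""
    # One hypothetical output name per stage, in the documented positional order:
    # revertBam, bwaBam, bmtaggerBam, rmdupBam, blastnBam.
    stages = ("revert", "bwa", "bmtagger", "rmdup", "blastn")
    outputs = [f"{out_prefix}.{stage}.bam" for stage in stages]

    # Positionals first, so multi-value options cannot swallow them.
    cmd = ["taxon_filter.py", "deplete", in_bam, *outputs]

    for flag, dbs in (("--bwaDbs", bwa_dbs),
                      ("--bmtaggerDbs", bmtagger_dbs),
                      ("--blastDbs", blast_dbs)):
        if dbs:
            cmd += [flag, *dbs]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_deplete_cmd("sample.bam", "sample",
                        bwa_dbs=["hg19"], blast_dbs=["rRNA", "univec"], threads=4)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the real paths are filled in.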
- deplete_human
A wrapper around 'deplete'; deprecated but preserved for legacy compatibility.
usage: taxon_filter.py deplete_human [-h] [--bwaDbs [BWADBS [BWADBS ...]]] [--bmtaggerDbs [BMTAGGERDBS [BMTAGGERDBS ...]]] [--blastDbs [BLASTDBS [BLASTDBS ...]]] [--srprismMemory SRPRISM_MEMORY] [--chunkSize CHUNKSIZE] [--JVMmemory JVMMEMORY] [--clearTags] [--tagsToClear TAGS_TO_CLEAR [TAGS_TO_CLEAR ...]] [--doNotSanitize] [--threads THREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam [revertBam] bwaBam bmtaggerBam rmdupBam blastnBam
- Positional arguments:
inBam  Input BAM file.
revertBam  Output BAM: read markup reverted with Picard.
bwaBam  Output BAM: depleted of reads with BWA.
bmtaggerBam  Output BAM: depleted of reads with BMTagger.
rmdupBam  Output BAM: bmtaggerBam run through M-Vicuna duplicate removal.
blastnBam  Output BAM: rmdupBam run through another depletion of reads with BLASTN.
- Options:
--bwaDbs=()  Reference databases for bwa to deplete from input.
--bmtaggerDbs=()  Reference databases to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
--blastDbs=()  Reference databases for blast to deplete from input.
--srprismMemory=7168  Memory for srprism.
--chunkSize=1000000  blastn chunk size (default: 1000000)
--JVMmemory=4g  JVM virtual memory size for Picard FilterSamReads (default: 4g)
--clearTags=False  When supplying an aligned input file, clear the per-read attribute tags.
--tagsToClear=['XT', 'X0', 'X1', 'XA', 'AM', 'SM', 'BQ', 'CT', 'XN', 'OC', 'OP']  A space-separated list of tags to remove from all reads in the input bam file (default: XT X0 X1 XA AM SM BQ CT XN OC OP)
--doNotSanitize=False  Undocumented
--threads  Number of threads (default: all available cores)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
- filter_lastal_bam
Restrict input reads to those that align to the given reference database using LASTAL.
usage: taxon_filter.py filter_lastal_bam [-h] [-n MAX_GAPLESS_ALIGNMENTS_PER_POSITION] [-l MIN_LENGTH_FOR_INITIAL_MATCHES] [-L MAX_LENGTH_FOR_INITIAL_MATCHES] [-m MAX_INITIAL_MATCHES_PER_POSITION] [--JVMmemory JVMMEMORY] [--threads THREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam db outBam
- Positional arguments:
inBam  Input reads
db  Database of taxa we keep
outBam  Output reads, filtered to refDb
- Options:
-n=1  maximum gapless alignments per query position (default: 1)
-l=5  minimum length for initial matches (default: 5)
-L=50  maximum length for initial matches (default: 50)
-m=100  maximum initial matches per query position (default: 100)
--JVMmemory=4g  JVM virtual memory size (default: 4g)
--threads  Number of threads (default: all available cores)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
- deplete_bam_bmtagger
Use bmtagger to deplete input reads against several databases.
usage: taxon_filter.py deplete_bam_bmtagger [-h] [--srprismMemory SRPRISM_MEMORY] [--JVMmemory JVMMEMORY] [--clearTags] [--tagsToClear TAGS_TO_CLEAR [TAGS_TO_CLEAR ...]] [--doNotSanitize] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refDbs [refDbs ...] outBam
- Positional arguments:
inBam  Input BAM file.
refDbs  Reference databases (one or more) to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
outBam  Output BAM file.
- Options:
--srprismMemory=7168  Memory for srprism.
--JVMmemory=4g  JVM virtual memory size (default: 4g)
--clearTags=False  When supplying an aligned input file, clear the per-read attribute tags.
--tagsToClear=['XT', 'X0', 'X1', 'XA', 'AM', 'SM', 'BQ', 'CT', 'XN', 'OC', 'OP']  A space-separated list of tags to remove from all reads in the input bam file (default: XT X0 X1 XA AM SM BQ CT XN OC OP)
--doNotSanitize=False  Undocumented
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
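Because each BMTagger database must already have its index files, a pre-flight check can save a failed run. The sketch below tests only the three files named in the refDbs description; the "etc." there suggests srprism may produce further files that this check does not cover. The helper name and the `hg19` database name are hypothetical.

```python
import tempfile
from pathlib import Path

# Suffixes named explicitly in the refDbs description above.
REQUIRED_SUFFIXES = (".bitmask", ".srprism.idx", ".srprism.map")


def missing_bmtagger_files(db_prefix):
    """Return the documented index files that do not exist for db_prefix."""
    candidates = [db_prefix + suffix for suffix in REQUIRED_SUFFIXES]
    return [path for path in candidates if not Path(path).exists()]


# Demonstration in a throwaway directory: only the bitmask has been created.
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "hg19")
    Path(prefix + ".bitmask").touch()
    missing_names = [Path(p).name for p in missing_bmtagger_files(prefix)]

print(missing_names)
```

An empty list means all three documented files are present for that database prefix.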
- deplete_blastn_bam
Use blastn to remove reads that match at least one of the specified databases.
usage: taxon_filter.py deplete_blastn_bam [-h] [--chunkSize CHUNKSIZE] [--JVMmemory JVMMEMORY] [--clearTags] [--tagsToClear TAGS_TO_CLEAR [TAGS_TO_CLEAR ...]] [--doNotSanitize] [--threads THREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refDbs [refDbs ...] outBam
- Positional arguments:
inBam  Input BAM file.
refDbs  One or more reference databases for blast. An ephemeral database will be created if a fasta file is provided.
outBam  Output BAM file with matching reads removed.
- Options:
--chunkSize=1000000  FASTA chunk size (default: 1000000)
--JVMmemory=4g  JVM virtual memory size (default: 4g)
--clearTags=False  When supplying an aligned input file, clear the per-read attribute tags.
--tagsToClear=['XT', 'X0', 'X1', 'XA', 'AM', 'SM', 'BQ', 'CT', 'XN', 'OC', 'OP']  A space-separated list of tags to remove from all reads in the input bam file (default: XT X0 X1 XA AM SM BQ CT XN OC OP)
--doNotSanitize=False  Undocumented
--threads  Number of threads (default: all available cores)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
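The general idea behind a chunk size can be illustrated as follows. This is a sketch of chunked processing in general, not the script's actual code, and it assumes that `--chunkSize` counts reads per chunk; the documentation above does not state the unit.

```python
from itertools import islice


def chunked(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Seven hypothetical read names split into chunks of three.
reads = [f"read{i}" for i in range(7)]
sizes = [len(chunk) for chunk in chunked(reads, 3)]
print(sizes)
```

Under this reading, a smaller chunk size means more, shorter blastn jobs, while a larger one means fewer, longer jobs.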
- deplete_bwa_bam
Use BWA to remove reads that match at least one of the specified databases.
usage: taxon_filter.py deplete_bwa_bam [-h] [--clearTags] [--tagsToClear TAGS_TO_CLEAR [TAGS_TO_CLEAR ...]] [--doNotSanitize] [--JVMmemory JVMMEMORY] [--threads THREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inBam refDbs [refDbs ...] outBam
- Positional arguments:
inBam  Input BAM file.
refDbs  One or more reference databases for bwa. An ephemeral database will be created if a fasta file is provided.
outBam  Output BAM file with matching reads removed.
- Options:
--clearTags=False  When supplying an aligned input file, clear the per-read attribute tags.
--tagsToClear=['XT', 'X0', 'X1', 'XA', 'AM', 'SM', 'BQ', 'CT', 'XN', 'OC', 'OP']  A space-separated list of tags to remove from all reads in the input bam file (default: XT X0 X1 XA AM SM BQ CT XN OC OP)
--doNotSanitize=False  Undocumented
--JVMmemory=2g  JVM virtual memory size (default: 2g)
--threads  Number of threads (default: all available cores)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
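To make the effect of `--clearTags` and `--tagsToClear` concrete, the sketch below strips the listed tags from a single SAM-format text record. It illustrates what "clearing per-read attribute tags" means and is not how the script does it; the example read and the helper name are hypothetical. It relies only on the SAM convention that the first 11 tab-separated columns are mandatory and any later columns are optional `TAG:TYPE:VALUE` fields.

```python
# Default tag list as shown for --tagsToClear.
DEFAULT_TAGS_TO_CLEAR = ("XT", "X0", "X1", "XA", "AM", "SM",
                         "BQ", "CT", "XN", "OC", "OP")


def clear_tags(sam_line, tags=DEFAULT_TAGS_TO_CLEAR):
    """Return sam_line with optional fields whose tag is in `tags` removed."""
    fields = sam_line.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]
    kept = [field for field in optional if field.split(":", 1)[0] not in tags]
    return "\t".join(mandatory + kept)


# A made-up aligned read carrying two aligner tags (XT, X0) plus NM and RG.
line = ("r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII"
        "\tXT:A:U\tNM:i:0\tX0:i:1\tRG:Z:lib1")
cleaned = clear_tags(line)
print(cleaned.split("\t")[11:])
```

Tags outside the list, such as NM and RG here, are left untouched.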
- lastal_build_db
Build a database for use with LAST from an input FASTA file.
usage: taxon_filter.py lastal_build_db [-h] [--outputFilePrefix OUTPUTFILEPREFIX] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inputFasta outputDirectory
- Positional arguments:
inputFasta  Location of the input FASTA file
outputDirectory  Location for the output files (default: current working directory)
- Options:
--outputFilePrefix  Prefix for the output file name (default: inputFasta name, sans ".fasta" extension)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
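A sketch of pairing `lastal_build_db` with `filter_lastal_bam`: build a LAST database from a reference FASTA, then keep only the reads that align to it. All paths are hypothetical. In particular, passing `outputDirectory/prefix` as the `db` argument is an assumption about how the two sub-commands fit together and should be checked against a real run.

```python
def lastal_commands(ref_fasta, db_dir, prefix, in_bam, out_bam):
    """Return (build_cmd, filter_cmd) argv lists (illustrative helper)."""
    build_cmd = ["taxon_filter.py", "lastal_build_db", ref_fasta, db_dir,
                 "--outputFilePrefix", prefix]

    # Assumption: the database is addressed as <outputDirectory>/<prefix>.
    db = f"{db_dir}/{prefix}"

    # The -n/-l/-L/-m values are the documented defaults, spelled out explicitly.
    filter_cmd = ["taxon_filter.py", "filter_lastal_bam", in_bam, db, out_bam,
                  "-n", "1", "-l", "5", "-L", "50", "-m", "100"]
    return build_cmd, filter_cmd


build_cmd, filter_cmd = lastal_commands(
    "refs/ebola.fasta", "dbs", "ebola", "sample.bam", "sample.ebola.bam")
print(" ".join(build_cmd))
print(" ".join(filter_cmd))
```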
- bwa_build_db
Create a database for use with bwa from an input reference FASTA file
usage: taxon_filter.py bwa_build_db [-h] [--outputFilePrefix OUTPUTFILEPREFIX] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inputFasta outputDirectory
- Positional arguments:
inputFasta  Location of the input FASTA file
outputDirectory  Location for the output files
- Options:
--outputFilePrefix  Prefix for the output file name (default: inputFasta name, sans ".fasta" extension)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
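The documented default for `--outputFilePrefix` (the input FASTA name without its ".fasta" extension) can be sketched as below. How the script treats other extensions such as ".fa" is not stated above; this sketch simply leaves such names unchanged, which may differ from the real behaviour.

```python
import os


def default_output_prefix(input_fasta):
    """Approximate the documented default prefix for the *_build_db commands."""
    name = os.path.basename(input_fasta)
    suffix = ".fasta"
    # Strip only a trailing ".fasta"; anything else is returned as-is.
    return name[:-len(suffix)] if name.endswith(suffix) else name


print(default_output_prefix("/refs/hg19.fasta"))
print(default_output_prefix("/refs/phiX.fa"))
```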
- blastn_build_db
Create a database for use with blastn from an input reference FASTA file
usage: taxon_filter.py blastn_build_db [-h] [--outputFilePrefix OUTPUTFILEPREFIX] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inputFasta outputDirectory
- Positional arguments:
inputFasta  Location of the input FASTA file
outputDirectory  Location for the output files
- Options:
--outputFilePrefix  Prefix for the output file name (default: inputFasta name, sans ".fasta" extension)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
- bmtagger_build_db
Create a database for use with Bmtagger from an input FASTA file.
usage: taxon_filter.py bmtagger_build_db [-h] [--outputFilePrefix OUTPUTFILEPREFIX] [--word_size WORD_SIZE] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep] inputFasta outputDirectory
- Positional arguments:
inputFasta  Location of the input FASTA file
outputDirectory  Location for the output files (where *.bitmask and *.srprism files will be stored)
- Options:
--outputFilePrefix  Prefix for the output file name (default: inputFasta name, sans ".fasta" extension)
--word_size=18  Database word size (default: 18)
--loglevel=INFO  Verboseness of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V  Show program's version number and exit.
--tmp_dir=/tmp  Base directory for temp files (default: /tmp)
--tmp_dirKeep=False  Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
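Since the `bwa_build_db`, `blastn_build_db`, and `bmtagger_build_db` sub-commands share the same positional arguments, one reference FASTA can be prepared for all three depletion tools in a single pass. The sketch below only builds the argv lists; the paths and the helper name are hypothetical, and `--word_size` is passed only to the BMTagger command, the only one that documents it.

```python
def build_db_commands(input_fasta, output_dir, word_size=18):
    """Return one argv list per depletion database type (illustrative helper)."""
    positional = [input_fasta, output_dir]
    return {
        "bwa": ["taxon_filter.py", "bwa_build_db", *positional],
        "blastn": ["taxon_filter.py", "blastn_build_db", *positional],
        "bmtagger": ["taxon_filter.py", "bmtagger_build_db", *positional,
                     "--word_size", str(word_size)],
    }


commands = build_db_commands("refs/hg19.fasta", "dbs")
for tool, argv in commands.items():
    print(tool, "->", " ".join(argv))
```

Each list can be run with `subprocess.run(argv, check=True)` in an environment where the script and its underlying tools are installed.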