3.1. taxon_filter.py - tools for taxonomic removal or filtration of reads

This script contains a number of utilities for filtering NGS reads based on membership or non-membership in a species / genus / taxonomic grouping.

usage: taxon_filter.py subcommand
Sub-commands:
deplete_human

Run the entire depletion pipeline: bmtagger, mvicuna, blastn. Optionally, use lastal to select a specific taxon of interest.

usage: taxon_filter.py deplete_human [-h] [--taxfiltBam TAXFILTBAM]
                                     --bmtaggerDbs BMTAGGERDBS
                                     [BMTAGGERDBS ...] --blastDbs BLASTDBS
                                     [BLASTDBS ...] [--lastDb LASTDB]
                                     [--JVMmemory JVMMEMORY]
                                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                     [--version] [--tmpDir TMPDIR]
                                     [--tmpDirKeep]
                                     inBam revertBam bmtaggerBam rmdupBam
                                     blastnBam
Positional arguments:
inBam Input BAM file.
revertBam Output BAM: read markup reverted with Picard.
bmtaggerBam Output BAM: depleted of human reads with BMTagger.
rmdupBam Output BAM: bmtaggerBam run through M-Vicuna duplicate removal.
blastnBam Output BAM: rmdupBam run through another depletion of human reads with BLASTN.
Options:
--taxfiltBam Output BAM: blastnBam run through taxonomic selection via LASTAL.
--bmtaggerDbs Reference databases (one or more) to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
--blastDbs One or more reference databases for blast to deplete from input.
--lastDb One reference database for last (required if --taxfiltBam is specified).
--JVMmemory=4g JVM virtual memory size for Picard FilterSamReads (default: 4g)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
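As a rough illustration, the deplete_human invocation can be assembled in Python before being passed to something like subprocess.run. The helper name build_deplete_human_cmd, the file names, and the database names are all hypothetical; only the flags and the positional order come from the usage text above. Positionals are placed before the multi-value options on the assumption that the script follows standard argparse behaviour, where --bmtaggerDbs and --blastDbs would otherwise absorb any bare arguments that follow them.

```python
def build_deplete_human_cmd(in_bam, revert_bam, bmtagger_bam, rmdup_bam,
                            blastn_bam, bmtagger_dbs, blast_dbs,
                            taxfilt_bam=None, last_db=None, jvm_memory=None):
    """Return an argv list for `taxon_filter.py deplete_human` (hypothetical helper)."""
    if not bmtagger_dbs or not blast_dbs:
        raise ValueError("--bmtaggerDbs and --blastDbs each need at least one database")
    # The help text states that --lastDb is required when --taxfiltBam is given.
    if taxfilt_bam is not None and last_db is None:
        raise ValueError("--lastDb is required when --taxfiltBam is specified")

    # Positionals first, in the order shown in the usage line.
    cmd = ["taxon_filter.py", "deplete_human",
           in_bam, revert_bam, bmtagger_bam, rmdup_bam, blastn_bam]
    cmd += ["--bmtaggerDbs", *bmtagger_dbs]
    cmd += ["--blastDbs", *blast_dbs]
    if taxfilt_bam is not None:
        cmd += ["--taxfiltBam", taxfilt_bam, "--lastDb", last_db]
    if jvm_memory is not None:
        cmd += ["--JVMmemory", jvm_memory]
    return cmd


# Placeholder file and database names, for illustration only.
cmd = build_deplete_human_cmd(
    "in.bam", "revert.bam", "bmtagger.bam", "rmdup.bam", "blastn.bam",
    bmtagger_dbs=["hg19", "rrna"], blast_dbs=["hybsel_probes"],
    taxfilt_bam="taxfilt.bam", last_db="target_taxon")
print(" ".join(cmd))
```

Building the list first makes it easy to log or inspect the command before running it.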
trim_trimmomatic

Trim read sequences with Trimmomatic.

usage: taxon_filter.py trim_trimmomatic [-h]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmpDir TMPDIR]
                                        [--tmpDirKeep]
                                        inFastq1 inFastq2 pairedOutFastq1
                                        pairedOutFastq2 clipFasta
Positional arguments:
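inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
pairedOutFastq1
 Paired output fastq file for read 1.
pairedOutFastq2
 Paired output fastq file for read 2.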
clipFasta Fasta file with adapters, PCR sequences, etc. to clip off
Options:
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
filter_lastal_bam

Restrict input reads to those that align to the given reference database using LASTAL.

usage: taxon_filter.py filter_lastal_bam [-h] [--JVMmemory JVMMEMORY]
                                         [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                         [--version] [--tmpDir TMPDIR]
                                         [--tmpDirKeep]
                                         inBam db outBam
Positional arguments:
inBam Input reads
db Database of taxa we keep
outBam Output reads, filtered to db
Options:
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
filter_lastal

Restrict input reads to those that align to the given reference database using LASTAL. Also, remove duplicates with prinseq.

usage: taxon_filter.py filter_lastal [-h]
                                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                     [--version] [--tmpDir TMPDIR]
                                     [--tmpDirKeep]
                                     inFastq refDb outFastq
Positional arguments:
inFastq Input fastq file
refDb Reference database to retain from input
outFastq Output fastq file
Options:
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
partition_bmtagger

Use bmtagger to partition input reads into ones that match at least one of several databases and ones that don’t match any of the databases.

usage: taxon_filter.py partition_bmtagger [-h] [--outMatch OUTMATCH OUTMATCH]
                                          [--outNoMatch OUTNOMATCH OUTNOMATCH]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version] [--tmpDir TMPDIR]
                                          [--tmpDirKeep]
                                          inFastq1 inFastq2 refDbs
                                          [refDbs ...]
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads. Read names must match those in inFastq1.
refDbs Reference databases (one or more) to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
Options:
--outMatch Two filenames for fastq output of matching reads.
--outNoMatch Two filenames for fastq output of unmatched reads.
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
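The usage line shows --outMatch and --outNoMatch each taking exactly two values, while refDbs takes one or more. The stand-in parser below is reconstructed from that usage line alone to show how such a command line is split up; it is not the script's own parser, and the file and database names are placeholders.

```python
import argparse

# Stand-in parser mirroring the usage line; an assumption, not the real code.
parser = argparse.ArgumentParser(prog="taxon_filter.py partition_bmtagger")
parser.add_argument("inFastq1")
parser.add_argument("inFastq2")
parser.add_argument("refDbs", nargs="+")       # one or more databases
parser.add_argument("--outMatch", nargs=2)     # exactly two filenames
parser.add_argument("--outNoMatch", nargs=2)   # exactly two filenames

args = parser.parse_args([
    "--outMatch", "match.1.fastq", "match.2.fastq",
    "--outNoMatch", "clean.1.fastq", "clean.2.fastq",
    "reads.1.fastq", "reads.2.fastq", "hg19", "rrna",
])
print(args.outMatch)
print(args.outNoMatch)
print(args.refDbs)
```

Because both options consume a fixed number of values, they can appear before the positionals without swallowing them.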
deplete_bam_bmtagger

Use bmtagger to deplete input reads against several databases.

usage: taxon_filter.py deplete_bam_bmtagger [-h] [--JVMmemory JVMMEMORY]
                                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                            [--version] [--tmpDir TMPDIR]
                                            [--tmpDirKeep]
                                            inBam refDbs [refDbs ...] outBam
Positional arguments:
inBam Input BAM file.
refDbs Reference databases (one or more) to deplete from input. For each db, requires prior creation of db.bitmask by bmtool, and db.srprism.idx, db.srprism.map, etc. by srprism mkindex.
outBam Output BAM file.
Options:
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
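Since each refDbs entry depends on index files created beforehand, a pre-flight check can catch a missing index before a long run starts. The sketch below checks only the three suffixes named in the refDbs description; the "etc." there suggests srprism writes further files that are not covered here. The function name missing_bmtagger_files and the hg19 prefix are hypothetical.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the refDbs help text; the list may be incomplete.
REQUIRED_SUFFIXES = (".bitmask", ".srprism.idx", ".srprism.map")


def missing_bmtagger_files(db_prefix):
    """Return the expected index files that do not exist for one database prefix."""
    return [db_prefix + suffix
            for suffix in REQUIRED_SUFFIXES
            if not Path(db_prefix + suffix).is_file()]


# Demonstration against a throwaway directory with one file deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "hg19")
    Path(prefix + ".bitmask").touch()
    Path(prefix + ".srprism.idx").touch()
    missing = [Path(p).name for p in missing_bmtagger_files(prefix)]

print(missing)  # ['hg19.srprism.map']
```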
deplete_blastn

Use blastn to remove reads that match at least one of the databases.

usage: taxon_filter.py deplete_blastn [-h]
                                      [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                      [--version] [--tmpDir TMPDIR]
                                      [--tmpDirKeep]
                                      inFastq outFastq refDbs [refDbs ...]
Positional arguments:
inFastq Input fastq file.
outFastq Output fastq file with matching reads removed.
refDbs One or more reference databases for blast.
Options:
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
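The rule "remove reads that match at least one of the databases" amounts to taking the union of the per-database hit sets and keeping everything outside it. The toy sketch below shows only that set logic; it does not run blastn, and the function name deplete, the read IDs, and the sequences are made up for illustration.

```python
def deplete(reads, hits_per_db):
    """Drop reads whose ID appears in the hit set of any database.

    reads: iterable of (read_id, sequence) tuples.
    hits_per_db: list of sets, one set of matching read IDs per database.
    """
    matched = set().union(*hits_per_db)  # a hit in any database is enough
    return [(rid, seq) for rid, seq in reads if rid not in matched]


reads = [("r1", "ACGT"), ("r2", "GGCC"), ("r3", "TTAA")]
hits = [{"r1"}, {"r1", "r3"}]  # hypothetical hits from two databases
kept = deplete(reads, hits)
print(kept)  # [('r2', 'GGCC')]
```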
deplete_blastn_paired

Use blastn to remove reads that match at least one of the databases.

usage: taxon_filter.py deplete_blastn_paired [-h]
                                             [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                             [--version] [--tmpDir TMPDIR]
                                             [--tmpDirKeep]
                                             infq1 infq2 outfq1 outfq2 refDbs
                                             [refDbs ...]
Positional arguments:
infq1 Input fastq file; 1st end of paired-end reads.
infq2 Input fastq file; 2nd end of paired-end reads.
outfq1 Output fastq file for infq1 with matching reads removed.
outfq2 Output fastq file for infq2 with matching reads removed.
refDbs One or more reference databases for blast.
Options:
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
deplete_blastn_bam

Use blastn to remove reads that match at least one of the specified databases.

usage: taxon_filter.py deplete_blastn_bam [-h] [--chunkSize CHUNKSIZE]
                                          [--JVMmemory JVMMEMORY]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version] [--tmpDir TMPDIR]
                                          [--tmpDirKeep]
                                          inBam refDbs [refDbs ...] outBam
Positional arguments:
inBam Input BAM file.
refDbs One or more reference databases for blast.
outBam Output BAM file with matching reads removed.
Options:
--chunkSize=1000000
 FASTA chunk size (default: 1000000)
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files. [default: /tmp]
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
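The help text does not say what unit --chunkSize uses. Assuming it counts sequences per FASTA chunk, the splitting step might look like the generic sketch below. The function name chunked is hypothetical and the code is not taken from the script.

```python
from itertools import islice


def chunked(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Seven placeholder read IDs split into chunks of three.
read_ids = ["read%d" % i for i in range(7)]
sizes = [len(chunk) for chunk in chunked(read_ids, 3)]
print(sizes)  # [3, 3, 1]
```

Under that assumption, a smaller chunk size would lower the peak size of each blastn query at the cost of more invocations.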