3.1. metagenomics.py - metagenomic analyses

This script contains a number of utilities for metagenomic analyses.

usage: metagenomics.py subcommand
Sub-commands:
kraken

Classify reads by taxon using Kraken

usage: metagenomics.py kraken [-h] [--outReport OUTREPORT]
                              [--outReads OUTREADS]
                              [--filterThreshold FILTERTHRESHOLD]
                              [--numThreads NUMTHREADS]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                              inBam db
Positional arguments:
inBam Input unaligned reads, BAM format.
db Kraken database directory.
Options:
--outReport Kraken report output file.
--outReads Kraken per read output file.
--filterThreshold=0.05
 Kraken filter threshold (default: 0.05)
--numThreads=1 Number of threads to run. (default: 1)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG]
 Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
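As a minimal sketch, a kraken invocation can be assembled as an argument list in Python before it is run. The helper `build_kraken_cmd` and the file names (`reads.bam`, `kraken_db`, `report.txt`) are hypothetical; only the flag names and the order of the positional arguments are taken from the usage text above.

```python
def build_kraken_cmd(in_bam, db, out_report=None, out_reads=None,
                     filter_threshold=None, num_threads=None):
    """Assemble an argument list for `metagenomics.py kraken`.

    Hypothetical helper: only the flag names come from the usage text.
    """
    cmd = ["metagenomics.py", "kraken"]
    optional = [
        ("--outReport", out_report),
        ("--outReads", out_reads),
        ("--filterThreshold", filter_threshold),
        ("--numThreads", num_threads),
    ]
    # Only emit the options the caller actually set.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    # Positional arguments come last: inBam, then db.
    cmd += [in_bam, db]
    return cmd


cmd = build_kraken_cmd("reads.bam", "kraken_db",
                       out_report="report.txt", num_threads=4)
print(" ".join(cmd))
```

Assuming the script is executable and on the PATH, the resulting list could be handed to `subprocess.run(cmd, check=True)`.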
diamond

Classify reads by the taxon of the Lowest Common Ancestor (LCA)

usage: metagenomics.py diamond [-h] [--outM8 OUTM8] [--outLca OUTLCA]
                               [--numThreads NUMTHREADS]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                               inBam db taxDb outReport
Positional arguments:
inBam Input unaligned reads, BAM format.
db Diamond database directory.
taxDb Taxonomy database directory.
outReport Output taxonomy report.
Options:
--outM8 Blast m8 formatted output file.
--outLca Output LCA assignments for each read.
--numThreads=1 Number of threads (default: 1)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG]
 Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
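To illustrate what a Lowest Common Ancestor assignment means, here is a small sketch on a toy taxonomy. It is not the script's implementation: the `parents` map, its taxids, and both function names are made up for illustration. The only convention assumed is that the root taxon lists itself as its own parent.

```python
def lineage(taxid, parents):
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while parents[taxid] != taxid:  # the root is its own parent
        taxid = parents[taxid]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids, parents):
    """Return the deepest taxon shared by the lineages of all taxids."""
    taxids = list(taxids)
    if not taxids:
        raise ValueError("need at least one taxid")
    common = set(lineage(taxids[0], parents))
    for taxid in taxids[1:]:
        common &= set(lineage(taxid, parents))
    # Walking up from the first taxid, the first shared node is the deepest.
    for node in lineage(taxids[0], parents):
        if node in common:
            return node
    raise ValueError("taxids do not share a root")


# Toy taxonomy (hypothetical ids): 1 is the root, 10 has children 100 and 101.
parents = {1: 1, 10: 1, 20: 1, 100: 10, 101: 10}

print(lowest_common_ancestor([100, 101], parents))
print(lowest_common_ancestor([100, 20], parents))
```

A read whose hits fall on taxa 100 and 101 would be assigned to their shared parent 10, while hits on 100 and 20 only agree at the root.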
krona

Create an interactive HTML report from a tabular metagenomic report

usage: metagenomics.py krona [-h] [--queryColumn QUERYCOLUMN]
                             [--taxidColumn TAXIDCOLUMN]
                             [--scoreColumn SCORECOLUMN] [--noHits] [--noRank]
                             [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                             [--version]
                             inTsv db outHtml
Positional arguments:
inTsv Input tab delimited file.
db Krona taxonomy database directory.
outHtml Output html report.
Options:
--queryColumn=2
 Column of query id. (default: 2)
--taxidColumn=3
 Column of taxonomy id. (default: 3)
--scoreColumn Column of score.
--noHits=False Include wedge for no hits.
--noRank=False Include no rank assignments.
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG]
 Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
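The column options can be pictured with a short sketch that pulls the query id, taxonomy id, and optional score out of tab-delimited lines. It assumes the column numbers are 1-based, which matches the defaults of 2 and 3 for Kraken-format input. The function name and the sample rows are hypothetical.

```python
def select_columns(lines, query_column=2, taxid_column=3, score_column=None):
    """Extract (query, taxid[, score]) tuples using 1-based column numbers."""
    needed = max(query_column, taxid_column, score_column or 0)
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < needed:
            continue  # skip blank or short lines
        row = [fields[query_column - 1], fields[taxid_column - 1]]
        if score_column is not None:
            row.append(fields[score_column - 1])
        rows.append(tuple(row))
    return rows


# Made-up rows: classification flag, read id, taxid, a fourth numeric column.
report = ["C\tread1\t562\t101\n", "U\tread2\t0\t101\n"]

print(select_columns(report))
print(select_columns(report, score_column=4))
```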
align_rna

Align to metagenomics bwa index, mark duplicates, and generate LCA report

usage: metagenomics.py align_rna [-h] [--dupeReport DUPEREPORT] [--sensitive]
                                 [--outBam OUTBAM] [--outLca OUTLCA]
                                 [--dupeLca DUPELCA] [--numThreads NUMTHREADS]
                                 [--JVMmemory JVMMEMORY]
                                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                 [--version] [--tmp_dir TMP_DIR]
                                 [--tmp_dirKeep]
                                 inBam db taxDb outReport
Positional arguments:
inBam Input unaligned reads, BAM format.
db Bwa index prefix.
taxDb Taxonomy database directory.
outReport Output taxonomy report.
Options:
--dupeReport Generate report including duplicates.
--sensitive=False
 Use sensitive instead of default BWA mem options.
--outBam Output aligned, indexed BAM file. Default is to write to temp.
--outLca Output LCA assignments for each read.
--dupeLca Output LCA assignments for each read including duplicates.
--numThreads=1 Number of threads (default: 1)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG]
 Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
report_merge

Merge multiple metagenomic reports into a single metagenomic report. Any Krona input files created by this

usage: metagenomics.py report_merge [-h]
                                    [--outSummaryReport OUT_KRAKEN_SUMMARY]
                                    [--krakenDB KRAKEN_DB]
                                    [--outByQueryToTaxonID OUT_KRONA_INPUT]
                                    [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                    [--version] [--tmp_dir TMP_DIR]
                                    [--tmp_dirKeep]
                                    metagenomic_reports
                                    [metagenomic_reports ...]
Positional arguments:
metagenomic_reports
 Input metagenomic reports with the query ID and taxon ID in the 2nd and 3rd columns (Kraken format)
Options:
--outSummaryReport
 Path of human-readable metagenomic summary report, created by kraken-report
--krakenDB Kraken database (needed for outSummaryReport)
--outByQueryToTaxonID
 Output metagenomic report suitable for Krona input.
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG]
 Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
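As a rough sketch of what merging Kraken-format reports involves, the snippet below concatenates the query id and taxon id columns (2nd and 3rd) from several inputs and tallies reads per taxon. It is an illustration under stated assumptions, not the script's actual merge logic. `merge_reports` and the sample rows are hypothetical.

```python
from collections import Counter


def merge_reports(reports, query_column=2, taxid_column=3):
    """Collect (query_id, taxid) pairs from several Kraken-format inputs."""
    merged = []
    for lines in reports:
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            # Keep only lines that actually contain both columns.
            if len(fields) >= max(query_column, taxid_column):
                merged.append((fields[query_column - 1],
                               fields[taxid_column - 1]))
    return merged


# Two made-up per-read reports.
sample_a = ["C\tread1\t562\n", "U\tread2\t0\n"]
sample_b = ["C\tread3\t562\n"]

merged = merge_reports([sample_a, sample_b])
reads_per_taxid = Counter(taxid for _, taxid in merged)
print(merged)
print(reads_per_taxid)
```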
subset_taxonomy

Generate a subset of the taxonomy db files filtered by the whitelist. Taxids given as whitelistTaxids are added to the taxonomy together with their parents, while taxids given as whitelistTreeTaxids are added together with both their parents and all of their child taxa. Whitelisted GIs and accessions can only be provided as files, and the resulting gi/accession2taxid files are filtered to include only the entries in those whitelist files. Finally, the taxids (plus parents) of the whitelisted GIs/accessions are also included.

usage: metagenomics.py subset_taxonomy [-h]
                                       [--whitelistTaxids WHITELISTTAXIDS [WHITELISTTAXIDS ...]]
                                       [--whitelistTaxidFile WHITELISTTAXIDFILE]
                                       [--whitelistTreeTaxids WHITELISTTREETAXIDS [WHITELISTTREETAXIDS ...]]
                                       [--whitelistTreeTaxidFile WHITELISTTREETAXIDFILE]
                                       [--whitelistGiFile WHITELISTGIFILE]
                                       [--whitelistAccessionFile WHITELISTACCESSIONFILE]
                                       [--skipGi] [--skipAccession]
                                       [--skipDeadAccession]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       taxDb outputDb
Positional arguments:
taxDb Taxonomy database directory (containing nodes.dmp, parents.dmp etc.)
outputDb Output taxonomy database directory
Options:
--whitelistTaxids
 List of taxids to add to taxonomy (with parents)
--whitelistTaxidFile
 File containing taxids - one per line - to add to taxonomy with parents.
--whitelistTreeTaxids
 List of taxids to add to taxonomy (with parents and children)
--whitelistTreeTaxidFile
 File containing taxids - one per line - to add to taxonomy with parents and children.
--whitelistGiFile
 File containing GIs - one per line - to add to taxonomy with nodes.
--whitelistAccessionFile
 File containing accessions - one per line - to add to taxonomy with nodes.
--skipGi=False Skip GI to taxid mapping files
--skipAccession=False
 Skip accession to taxid mapping files
--skipDeadAccession=False
 Skip dead accession to taxid mapping files
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG]
 Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
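The difference between the two whitelist flavours (taxids with parents versus taxids with parents and children) can be sketched on a toy taxonomy. This is an illustration only, not the script's code: the `parents` map, its taxids, and the function names are hypothetical, and the root is assumed to list itself as its own parent.

```python
def with_parents(taxids, parents):
    """Whitelisted taxids plus every ancestor up to the root."""
    keep = set()
    for taxid in taxids:
        # Stop as soon as we reach a node that is already kept.
        while taxid not in keep:
            keep.add(taxid)
            taxid = parents[taxid]
    return keep


def with_parents_and_children(taxids, parents):
    """Whitelisted taxids plus all ancestors and all descendants."""
    children = {}
    for child, parent in parents.items():
        if child != parent:
            children.setdefault(parent, []).append(child)
    # Depth-first walk down from each whitelisted taxid.
    seen = set()
    stack = list(taxids)
    while stack:
        taxid = stack.pop()
        if taxid in seen:
            continue
        seen.add(taxid)
        stack.extend(children.get(taxid, []))
    return with_parents(taxids, parents) | seen


# Toy taxonomy (hypothetical ids): 1 is the root.
parents = {1: 1, 10: 1, 20: 1, 100: 10, 101: 10, 1000: 100}

print(sorted(with_parents([100], parents)))
print(sorted(with_parents_and_children([10], parents)))
```

In this toy tree, whitelisting 100 with parents keeps 100, 10 and 1, while whitelisting 10 as a tree additionally pulls in 100, 101 and 1000 but leaves the unrelated branch 20 out.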
kraken_build

Builds a kraken database from a library directory of fastas and a taxonomy db directory. The --subsetTaxonomy option allows shrinking the taxonomy to only include taxids associated with the library folders. For this to work, the library fastas must have the standard id names such as `>NC1234.1` accessions, `>gi|123456789|ref|XXXX||`, or custom kraken name `>kraken:taxid|1234|`. Setting --minimizerLen (default: 16) to a small value, such as 10, will drastically shrink the db size for small inputs, which is useful for testing. The built db may include symlinks to the original --library / --taxonomy directories. If you want to build a static, archivable version of the library, use the --clean option, which will also remove any unnecessary files.

usage: metagenomics.py kraken_build [-h] [--library LIBRARY]
                                    [--taxonomy TAXONOMY] [--subsetTaxonomy]
                                    [--minimizerLen MINIMIZERLEN]
                                    [--kmerLen KMERLEN]
                                    [--maxDbSize MAXDBSIZE] [--clean]
                                    [--numThreads NUMTHREADS]
                                    [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                    [--version] [--tmp_dir TMP_DIR]
                                    [--tmp_dirKeep]
                                    db
Positional arguments:
db Kraken database directory.
Options:
--library Library directory of fasta files.
--taxonomy Taxonomy db directory.
--subsetTaxonomy=False
 Subset taxonomy based on library fastas.
--minimizerLen Minimizer length
--kmerLen Kmer length
--maxDbSize Maximum db size (will shrink if too big)
--clean=False Clean by deleting other database files after build
--numThreads=1 Number of threads to run. (default: 1)
--loglevel=DEBUG
 Verbosity of output. [default: DEBUG]
 Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
--tmp_dir=/tmp Base directory for temp files. [default: /tmp]
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there's a failure.
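Before building with --subsetTaxonomy, it can help to check that the library fasta headers follow one of the three id styles named above. The sketch below classifies header lines with regular expressions. The patterns are loose approximations written for this example, not the ones Kraken or this script uses, and `classify_header` is a hypothetical name.

```python
import re

# Approximate patterns for the three header styles (assumptions, not Kraken's own).
HEADER_PATTERNS = [
    ("kraken_taxid", re.compile(r"^>kraken:taxid\|(\d+)\|")),
    ("gi", re.compile(r"^>gi\|(\d+)\|")),
    ("accession", re.compile(r"^>([A-Za-z_]+\d+\.\d+)")),
]


def classify_header(line):
    """Return (kind, identifier) for a recognised header, else None."""
    for kind, pattern in HEADER_PATTERNS:
        match = pattern.match(line)
        if match:
            return kind, match.group(1)
    return None


for header in [">NC1234.1", ">gi|123456789|ref|XXXX||",
               ">kraken:taxid|1234|", ">seq1"]:
    print(header, classify_header(header))
```

Headers that come back as `None` (such as `>seq1` here) would be the ones to rename before relying on taxonomy subsetting.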