2.5. metagenomics.py - utilities for metagenomic analyses
This script contains a number of utilities for metagenomic analyses.
usage: metagenomics.py subcommand
2.5.2. Sub-commands
2.5.2.1. subset_taxonomy
Generate a subset of the taxonomy db files filtered by the whitelist. The whitelist taxids add specific taxids plus their parents to the taxonomy, while whitelistTreeTaxids add specific taxids plus both their parents and all of their child taxa. Whitelist GIs and accessions can only be provided in file form; the resulting gi/accession2taxid files are filtered to include only the entries in the whitelist files. Finally, the taxids (plus parents) for those GIs/accessions are also included.
metagenomics.py subset_taxonomy [-h]
[--whitelistTaxids WHITELISTTAXIDS [WHITELISTTAXIDS ...]]
[--whitelistTaxidFile WHITELISTTAXIDFILE]
[--whitelistTreeTaxids WHITELISTTREETAXIDS [WHITELISTTREETAXIDS ...]]
[--whitelistTreeTaxidFile WHITELISTTREETAXIDFILE]
[--whitelistGiFile WHITELISTGIFILE]
[--whitelistAccessionFile WHITELISTACCESSIONFILE]
[--skipGi] [--skipAccession]
[--skipDeadAccession]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
taxDb outputDb
2.5.2.1.1. Positional Arguments
- taxDb
Taxonomy database directory (containing nodes.dmp, names.dmp, etc.)
- outputDb
Output taxonomy database directory
2.5.2.1.2. Named Arguments
- --whitelistTaxids
List of taxids to add to taxonomy (with parents)
- --whitelistTaxidFile
File containing taxids - one per line - to add to taxonomy with parents.
- --whitelistTreeTaxids
List of taxids to add to taxonomy (with parents and children)
- --whitelistTreeTaxidFile
File containing taxids - one per line - to add to taxonomy with parents and children.
- --whitelistGiFile
File containing GIs - one per line - to add to taxonomy with nodes.
- --whitelistAccessionFile
File containing accessions - one per line - to add to taxonomy with nodes.
- --skipGi
Skip GI to taxid mapping files
Default:
False
- --skipAccession
Skip accession to taxid mapping files
Default:
False
- --skipDeadAccession
Skip dead accession to taxid mapping files
Default:
False
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
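The difference between the two whitelist kinds can be illustrated on a toy parent map. This is only a sketch: the `expand_whitelist` helper, the other function names, and the miniature taxonomy are hypothetical and are not taken from metagenomics.py.

```python
from collections import defaultdict

# Toy taxonomy as child -> parent. Taxid 1 is the root and is its own
# parent, as in NCBI nodes.dmp. All other IDs are made up.
PARENTS = {1: 1, 10: 1, 20: 10, 21: 10, 30: 20, 31: 20, 40: 1}


def ancestors(taxid, parents):
    """Return taxid plus every ancestor up to the root."""
    lineage = {taxid}
    while parents[taxid] != taxid:
        taxid = parents[taxid]
        lineage.add(taxid)
    return lineage


def descendants(taxid, parents):
    """Return every taxon strictly below taxid."""
    children = defaultdict(set)
    for child, parent in parents.items():
        if child != parent:
            children[parent].add(child)
    found, stack = set(), [taxid]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def expand_whitelist(parents, taxids=(), tree_taxids=()):
    """Plain taxids pull in parents; tree taxids pull in parents and children."""
    keep = set()
    for t in taxids:
        keep |= ancestors(t, parents)
    for t in tree_taxids:
        keep |= ancestors(t, parents) | descendants(t, parents)
    return keep


plain = expand_whitelist(PARENTS, taxids=[20])
tree = expand_whitelist(PARENTS, tree_taxids=[20])
```

With this toy data, `plain` holds taxid 20 and its lineage to the root, while `tree` additionally holds the two taxa below 20.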
2.5.2.2. filter_taxids_to_focal_hits
Generate a subset of the taxids_tsv file filtered by the focal_report_tsv. Only rows of taxids_tsv whose taxid is contained in focal_report_tsv, or is a child/descendant of a node contained in it, are emitted.
metagenomics.py filter_taxids_to_focal_hits [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
taxids_tsv focal_report_tsv
taxdb_dir min_read_count
output_tsv
2.5.2.2.1. Positional Arguments
- taxids_tsv
TSV file where first column is a taxid
- focal_report_tsv
TSV produced by taxlevel_plurality
- taxdb_dir
Taxonomy database directory (containing nodes.dmp, names.dmp, etc.)
- min_read_count
ignore focal_report_tsv entries below this read count
- output_tsv
Output TSV file where first column is a taxid
2.5.2.2.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
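The filtering rule for this sub-command can be sketched as follows. The helper names, the toy parent map, and the `(taxid, read count)` shape used for the focal hits are assumptions made for illustration; the actual column layout of focal_report_tsv is not described here, and the sketch assumes taxids_tsv has no header row.

```python
import csv
import io

# Toy taxonomy as child -> parent; taxid 1 is the root. IDs are made up.
PARENTS = {1: 1, 10: 1, 20: 10, 21: 10, 30: 20}


def is_under(taxid, focal, parents):
    """True if taxid is in focal or has an ancestor in focal."""
    while True:
        if taxid in focal:
            return True
        parent = parents.get(taxid, taxid)
        if parent == taxid:  # reached the root (or an unknown taxid)
            return False
        taxid = parent


def filter_rows(taxids_tsv, focal_hits, parents, min_read_count):
    # Ignore focal entries below the read-count cutoff before matching.
    focal = {taxid for taxid, reads in focal_hits if reads >= min_read_count}
    reader = csv.reader(taxids_tsv, delimiter="\t")
    return [row for row in reader
            if row and is_under(int(row[0]), focal, parents)]


taxids_tsv = io.StringIO("30\tA\n21\tB\n20\tC\n")
focal_hits = [(20, 500), (21, 2)]  # hypothetical (taxid, read count) pairs
kept = filter_rows(taxids_tsv, focal_hits, PARENTS, min_read_count=10)
```

Here taxid 21 falls below the cutoff and is dropped from the focal set, so only the rows for taxid 20 and its descendant 30 survive.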
2.5.2.3. kraken2
Classify reads by taxon using Kraken2
metagenomics.py kraken2 [-h] [--outReports OUTREPORTS [OUTREPORTS ...]]
[--outReads OUTREADS [OUTREADS ...]]
[--minimum_hit_groups MINIMUM_HIT_GROUPS]
[--min_base_qual MIN_BASE_QUAL]
[--confidence CONFIDENCE] [--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
db inBams [inBams ...]
2.5.2.3.1. Positional Arguments
- db
Kraken database directory.
- inBams
Input unaligned reads, BAM format.
2.5.2.3.2. Named Arguments
- --outReports
Kraken2 summary report output file. Multiple filenames space separated.
- --outReads
Kraken2 per read classification output file. Multiple filenames space separated.
- --minimum_hit_groups
Minimum hit groups (Kraken2 default: 2)
- --min_base_qual
Minimum base quality (default None)
- --confidence
Kraken2 confidence score threshold (default None)
- --threads
Number of threads; by default all cores are used
Default:
2
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
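A small helper for assembling a kraken2 invocation from Python might look like the sketch below. The flags come from the usage text above; the helper name and all file names are hypothetical, and listing one report per input BAM is an assumption rather than something the usage text states.

```python
import shlex


def kraken2_argv(db, in_bams, out_reports=(), out_reads=(),
                 confidence=None, threads=None):
    """Assemble an argument list for the kraken2 sub-command (not executed here)."""
    # Positionals go first: --outReports and --outReads accept several values,
    # so with an argparse-style parser they could swallow db/inBams if they
    # were placed in front of them.
    argv = ["metagenomics.py", "kraken2", db, *in_bams]
    if out_reports:
        argv += ["--outReports", *out_reports]
    if out_reads:
        argv += ["--outReads", *out_reads]
    if confidence is not None:
        argv += ["--confidence", str(confidence)]
    if threads is not None:
        argv += ["--threads", str(threads)]
    return argv


argv = kraken2_argv(
    "kraken2_db",                    # hypothetical database directory
    ["sample1.bam", "sample2.bam"],  # hypothetical unaligned BAM inputs
    out_reports=["sample1.report.txt", "sample2.report.txt"],
    threads=4,
)
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Building a list rather than a shell string avoids quoting problems when file names contain spaces.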
2.5.2.4. kb
Runs kb count on the input BAM files.
- Args:
in_bam (list): List of input BAM files.
out_dir (str): Output directory. Defaults to None.
index (str): Path to the kb index file.
t2g (list|str): Transcript-to-gene mapping file(s).
kmer_len (int, optional): K-mer size for the alignment. Defaults to 31.
parity (str, optional): Library parity. Defaults to ‘single’.
technology (str, optional): Sequencing technology used. Defaults to ‘bulk’.
h5ad (bool, optional): Whether to output HDF5 file. Defaults to False.
loom (bool, optional): Whether to output Loom file. Defaults to False.
protein (bool, optional): Whether the sequence contains amino acids. Defaults to False.
threads (int, optional): Number of threads to use. Defaults to None.
metagenomics.py kb [-h] [--index INDEX] [--t2g T2G] [--kmer_len KMER_LEN]
[--parity {single,paired}]
[--technology {10xv2,10xv3,10xv3-3prime,10xv3-5prime,dropseq,indrop,celseq,celseq2,smartseq2,bulk}]
[--h5ad] [--loom] [--protein] [--out_dir OUT_DIR]
[--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
in_bam
2.5.2.4.1. Positional Arguments
- in_bam
Input unaligned reads, BAM format.
2.5.2.4.2. Named Arguments
- --index
kb index file.
- --t2g
Transcript-to-gene mapping file.
- --kmer_len
k-mer size (default: 31bp)
Default:
31
- --parity
Possible choices: single, paired
Library parity (default: single)
Default:
'single'
- --technology
Possible choices: 10xv2, 10xv3, 10xv3-3prime, 10xv3-5prime, dropseq, indrop, celseq, celseq2, smartseq2, bulk
Technology used to generate the data (default: bulk)
Default:
'bulk'
- --h5ad
Output HDF5 file (default: False)
Default:
False
- --loom
Output Loom file (default: False)
Default:
False
- --protein
True if sequence contains amino acids (default: False).
Default:
False
- --out_dir
Output directory (default: kb_out)
Default:
'kb_out'
- --threads
Number of threads; by default all cores are used
Default:
2
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.5. kma
metagenomics.py kma [-h] [--outPrefixes OUTPREFIXES [OUTPREFIXES ...]]
[--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
db inBams [inBams ...]
2.5.2.5.1. Positional Arguments
- db
KMA database prefix.
- inBams
Input unaligned reads, BAM format.
2.5.2.5.2. Named Arguments
- --outPrefixes
KMA output prefixes.
- --threads
Number of threads.
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.6. kma_build
metagenomics.py kma_build [-h] [--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
ref_fasta db_prefix
2.5.2.6.1. Positional Arguments
- ref_fasta
Reference FASTA file.
- db_prefix
Output database prefix.
2.5.2.6.2. Named Arguments
- --threads
Number of threads.
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.7. krona
Create an interactive HTML report from a tabular metagenomic report
metagenomics.py krona [-h] [--sample_name SAMPLE_NAME]
[--queryColumn QUERYCOLUMN] [--taxidColumn TAXIDCOLUMN]
[--scoreColumn SCORECOLUMN]
[--magnitudeColumn MAGNITUDECOLUMN] [--noHits]
[--noRank] [--inputType {tsv,kraken2}]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
inReports [inReports ...] db outHtml
2.5.2.7.1. Positional Arguments
- inReports
Input report file (default: tsv)
- db
Krona taxonomy database directory.
- outHtml
Output html report.
2.5.2.7.2. Named Arguments
- --sample_name
Title of dataset (default basename(inReport))
- --queryColumn
Column of query id. (default 2)
Default:
2
- --taxidColumn
Column of taxonomy id. (default 3)
Default:
3
- --scoreColumn
Column of score. (default None)
- --magnitudeColumn
Column of magnitude. (default None)
- --noHits
Include wedge for no hits.
Default:
False
- --noRank
Include no rank assignments.
Default:
False
- --inputType
Possible choices: tsv, kraken2
Handling for specialized report types.
Default:
'tsv'
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
2.5.2.8. report_merge
Merge multiple metagenomic reports into a single metagenomic report suitable for Krona input.
metagenomics.py report_merge [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
metagenomic_reports [metagenomic_reports ...]
out_krona_input
2.5.2.8.1. Positional Arguments
- metagenomic_reports
Input metagenomic reports with the query ID and taxon ID in the 2nd and 3rd columns (Kraken format)
- out_krona_input
Output metagenomic report suitable for Krona input.
2.5.2.8.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.9. filter_bam_to_taxa
Filter an (already classified) input bam file to only include reads that have been mapped to specified taxonomic IDs or scientific names. This requires a classification file, as produced by tools such as Kraken, as well as the NCBI taxonomy database.
metagenomics.py filter_bam_to_taxa [-h] [--exclude]
[--taxNames TAX_NAMES [TAX_NAMES ...]]
[--taxIDs TAX_IDS [TAX_IDS ...]]
[--without-children]
[--read_id_col READ_ID_COL]
[--tax_id_col TAX_ID_COL]
[--out_count OUT_COUNT]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
in_bam read_IDs_to_tax_IDs out_bam
nodes_dmp names_dmp
2.5.2.9.1. Positional Arguments
- in_bam
Input bam file.
- read_IDs_to_tax_IDs
TSV file mapping read IDs to taxIDs, Kraken-format by default. Assumes bijective mapping of read ID to tax ID.
- out_bam
Output bam file, filtered to the taxa specified
- nodes_dmp
nodes.dmp file from ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
- names_dmp
names.dmp file from ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
2.5.2.9.2. Named Arguments
- --exclude
Switch filtration to remove all reads falling under matching taxa (and keep all non-matching). Default is the inverse: keep all reads falling under matching taxa (and remove all non-matching).
Default:
False
- --taxNames
The taxonomic names to include. More than one can be specified. Mapped to Tax IDs by lowercase exact match only. Ex. “Viruses”. This is in addition to any taxonomic IDs provided.
- --taxIDs
The NCBI taxonomy IDs to include. More than one can be specified. This is in addition to any taxonomic names provided.
- --without-children
Omit reads classified more specifically than each taxon specified (without this a taxon and its children are included).
Default:
False
- --read_id_col
The (zero-indexed) number of the column in read_IDs_to_tax_IDs containing read IDs. (default: 1)
Default:
1
- --tax_id_col
The (zero-indexed) number of the column in read_IDs_to_tax_IDs containing Taxonomy IDs. (default: 2)
Default:
2
- --out_count
Write a file with the number of reads matching the specified taxa.
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
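The zero-indexed column defaults line up with the Kraken per-read output, where the read ID and taxid sit in the 2nd and 3rd tab-separated columns. The sketch below illustrates that selection logic together with the effect of --exclude; the `select_read_ids` helper and the sample lines are hypothetical, and it assumes the set of taxids has already been expanded to include any children.

```python
import csv
import io

# Three lines in Kraken per-read layout:
# status, read ID, taxid, sequence length, LCA mapping.
KRAKEN_READS = (
    "C\tread1\t10239\t150\t10239:116\n"
    "U\tread2\t0\t150\t0:116\n"
    "C\tread3\t9606\t150\t9606:116\n"
)


def select_read_ids(lines, taxids, read_id_col=1, tax_id_col=2, exclude=False):
    """Return read IDs whose taxid is in taxids (or is not, when exclude=True)."""
    selected = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        matches = int(row[tax_id_col]) in taxids
        if matches != exclude:
            selected.add(row[read_id_col])
    return selected


keep = select_read_ids(io.StringIO(KRAKEN_READS), {10239})
inverse = select_read_ids(io.StringIO(KRAKEN_READS), {10239}, exclude=True)
```

With the default mode only `read1` (classified as taxid 10239, Viruses) is selected; with `exclude=True` the other two reads are selected instead.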
2.5.2.10. taxlevel_summary
Aggregates taxonomic abundance data from multiple Kraken-format summary files. It is intended to report information on a particular taxonomic level (--taxlevelFocus; ex. ‘species’), within a higher-level grouping (--taxHeading; ex. ‘Viruses’). By default, when --taxHeading is at the same level as --taxlevelFocus, a summary with lines for each sample is emitted. Otherwise, a histogram is returned. If per-sample information is desired, --noHist can be specified. In per-sample data, the suffix “-pt” indicates percentage, so a value of 0.02 is 0.0002 of the total number of reads for the sample. If --topN is specified, only the top N most abundant taxa are included in the histogram count or per-sample output. If a number is specified for --countThreshold, only taxa with that number of reads (or greater) are included. Full data is returned via --jsonOut (filtered by --topN and --countThreshold), whereas --csvOut returns a summary.
metagenomics.py taxlevel_summary [-h] [--jsonOut JSON_OUT] [--csvOut CSV_OUT]
[--taxHeading TAX_HEADINGS [TAX_HEADINGS ...]]
[--taxlevelFocus TAXLEVEL_FOCUS]
[--topN TOP_N_ENTRIES]
[--countThreshold COUNT_THRESHOLD]
[--zeroFill] [--noHist] [--includeRoot]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
summary_files_in [summary_files_in ...]
2.5.2.10.1. Positional Arguments
- summary_files_in
Kraken-format summary text file with tab-delimited taxonomic levels.
2.5.2.10.2. Named Arguments
- --jsonOut
The path to a json file containing the relevant parsed summary data in json format.
- --csvOut
The path to a csv file containing sample-specific counts.
- --taxHeading
The taxonomic heading to analyze (default: ‘Viruses’). More than one can be specified.
Default:
'Viruses'
- --taxlevelFocus
The taxonomic level to summarize (totals by Genus, etc.) (default: ‘species’).
Default:
'species'
- --topN
Only include the top N most abundant taxa by read count (default: 100)
Default:
100
- --countThreshold
Minimum number of reads to be included (default: 1)
Default:
1
- --zeroFill
When absent from a sample, write zeroes (rather than leaving blank).
Default:
False
- --noHist
Write out a report by-sample rather than a histogram.
Default:
False
- --includeRoot
Include the count of reads at the root level and the unclassified bin.
Default:
False
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
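The “-pt” arithmetic and the two filters described for this sub-command can be sketched on made-up counts. The `summarize` helper and its data are hypothetical and only illustrate the documented behaviour: a percentage of 0.02 corresponds to a fraction of 0.0002 of the sample's reads.

```python
def summarize(counts, total_reads, top_n=100, count_threshold=1):
    """Return (taxon, reads, percent) rows, most abundant first."""
    # Keep taxa with at least count_threshold reads ...
    kept = {taxon: n for taxon, n in counts.items() if n >= count_threshold}
    # ... then keep only the top_n most abundant of those.
    top = sorted(kept.items(), key=lambda item: item[1], reverse=True)[:top_n]
    # The percentage is 100 * reads / total, so 2 reads in 10,000 gives 0.02.
    return [(taxon, n, 100 * n / total_reads) for taxon, n in top]


counts = {"taxon A": 2, "taxon B": 50, "taxon C": 0}  # made-up read counts
rows = summarize(counts, total_reads=10_000, top_n=2)
for taxon, reads, pct in rows:
    print(f"{taxon}\t{reads}\t{pct:.2f}")
```

In this toy sample, taxon C is removed by the count threshold and the remaining two taxa are reported in descending order of read count.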
2.5.2.11. taxlevel_plurality
Identifies the most abundant taxon (of any rank) contributing to a node of interest in the taxonomic tree. It is intended to highlight the primary contributor of taxonomic signal within a taxonomic category of interest, for example, the most abundant virus among all viruses.
metagenomics.py taxlevel_plurality [-h] [--min_reads MIN_READS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR]
[--tmp_dirKeep]
summary_file tax_heading out_report
2.5.2.11.1. Positional Arguments
- summary_file
input Kraken-format summary text file with tab-delimited taxonomic levels.
- tax_heading
The taxonomic heading to analyze.
- out_report
tab-delimited output file.
2.5.2.11.2. Named Arguments
- --min_reads
Only include hits with more than min_reads (default: 1)
Default:
1
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.12. kb_extract
Runs kb extract on the input BAM file.
- Args:
in_bam (str): Input BAM file.
index (str): Path to the kb index file.
t2g (str): Path to the transcript-to-gene mapping file.
targets (str): Comma-separated list of target sequences to extract.
protein (bool): True if sequence contains amino acids. Defaults to False.
out_dir (str): Output directory. Defaults to None.
h5ad (str): Path to the output h5ad file. Can pull IDs to extract from this file. Defaults to None.
threshold (int, optional): Minimum read count threshold for a target to be extracted. Defaults to 1.
threads (int, optional): Number of threads to use. Defaults to None.
metagenomics.py kb_extract [-h] [--index INDEX] [--t2g T2G]
[--out_dir OUT_DIR] [--protein] [--targets TARGETS]
[--h5ad H5AD] [--threshold THRESHOLD]
[--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
in_bam
2.5.2.12.1. Positional Arguments
- in_bam
Input unaligned reads, BAM format.
2.5.2.12.2. Named Arguments
- --index
kb index file.
- --t2g
Transcript to gene mapping file.
- --out_dir
Output directory (default: kb_out)
Default:
'kb_out'
- --protein
True if sequence contains amino acids (default: False).
Default:
False
- --targets
Comma-separated list of target sequences to extract from input sequences.
- --h5ad
Path to the output h5ad file. Can pull IDs to extract from this file.
- --threshold
Minimum read count threshold for a target to be extracted (only used when extracting IDs from h5ad; default: 1)
Default:
1
- --threads
Number of threads; by default all cores are used
Default:
2
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
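How --threshold and --targets relate can be sketched with a plain dictionary standing in for the per-target read counts that would be read from an h5ad file. The helper name and the counts are hypothetical, and treating the threshold as inclusive is an assumption based on the word "minimum".

```python
def targets_from_counts(counts, threshold=1):
    """Return a comma-separated --targets value for IDs with enough reads."""
    selected = sorted(target for target, n in counts.items() if n >= threshold)
    return ",".join(selected)


counts = {"seqA": 12, "seqB": 0, "seqC": 3}  # made-up read counts per target ID
targets = targets_from_counts(counts, threshold=3)
```

The resulting string has the comma-separated shape that --targets expects.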
2.5.2.13. kb_top_taxa
Identifies the most abundant taxon (of any rank) contributing to a taxa node of interest in kb count output.
It is intended to highlight the primary contributor of taxonomic signal within a taxonomic category of interest, for example, the most abundant virus among all viruses.
- Args:
counts_tar (str): Path to the input kb count tarball (tar.zst format).
out_report (str): Path to the output report file.
id_to_tax_map (str, optional): Path to the ID to taxonomy mapping file (CSV format).
target_taxon (str): The taxonomic category to analyze (default: ‘Viruses’).
metagenomics.py kb_top_taxa [-h] [--id-to-tax-map ID_TO_TAX_MAP]
[--target-taxon TARGET_TAXON]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
counts_tar out_report
2.5.2.13.1. Positional Arguments
- counts_tar
Input kb count tarball (tar.zst format).
- out_report
Tab-delimited output file.
2.5.2.13.2. Named Arguments
- --id-to-tax-map
ID to taxonomy mapping file (CSV format).
- --target-taxon
Target taxonomic category to analyze (default: Viruses).
Default:
'Viruses'
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.14. kb_merge_h5ads
Merge multiple kb count output tarballs into a single h5ad file with sample metadata.
Extracts h5ad files from counts_unfiltered folder and adds sample names from matrix.cells.
- Args:
in_count_tars (list): List of input kb count tarballs (tar.zst format).
out_h5ad (str): Path to the output h5ad file.
tmp_dir (str, optional): Temporary directory for extraction.
metagenomics.py kb_merge_h5ads [-h] [--out-h5ad OUT_H5AD]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
in_count_tars [in_count_tars ...]
2.5.2.14.1. Positional Arguments
- in_count_tars
Input kb count tarballs to merge (tar.zst format).
2.5.2.14.2. Named Arguments
- --out-h5ad
Output merged h5ad file.
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.15. krona_build
Builds a Krona taxonomy database
metagenomics.py krona_build [-h] [--taxdump_tar_gz TAXDUMP_TAR_GZ]
[--get_accessions]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
db
2.5.2.15.1. Positional Arguments
- db
Krona taxonomy database output directory.
2.5.2.15.2. Named Arguments
- --taxdump_tar_gz
NCBI taxdump.tar.gz file
- --get_accessions
Fetch NCBI accession to taxid mappings. This is not required for processing kraken1/2/uniq hits, only for BLAST hits, and adds a significant amount of time and database space (default false).
Default:
False
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
2.5.2.16. kraken2_build
Builds a kraken2 database from a library directory of fastas and a taxonomy db directory. The --subsetTaxonomy option allows shrinking the taxonomy to only include taxids associated with the library folders. For this to work, the library fastas must have standard ID names such as >NC1234.1 accessions, >gi|123456789|ref|XXXX||, or the custom kraken name >kraken:taxid|1234|.
metagenomics.py kraken2_build [-h] [--tax_db TAX_DB]
[--taxdump_out TAXDUMP_OUT]
[--standard_libraries {archaea,bacteria,plasmid,viral,human,fungi,plant,protozoa,nr,nt,env_nr,env_nt,UniVec,UniVec_Core} [{archaea,bacteria,plasmid,viral,human,fungi,plant,protozoa,nr,nt,env_nr,env_nt,UniVec,UniVec_Core} ...]]
[--custom_libraries CUSTOM_LIBRARIES [CUSTOM_LIBRARIES ...]]
[--kmerLen KMERLEN]
[--minimizerLen MINIMIZERLEN]
[--minimizerSpaces MINIMIZERSPACES] [--protein]
[--maxDbSize MAXDBSIZE] [--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
db
2.5.2.16.1. Positional Arguments
- db
Kraken database output directory.
2.5.2.16.2. Named Arguments
- --tax_db
Use pre-existing kraken2 taxonomy db structure
- --taxdump_out
Save ncbi taxdump.tar.gz file
- --standard_libraries
Possible choices: archaea, bacteria, plasmid, viral, human, fungi, plant, protozoa, nr, nt, env_nr, env_nt, UniVec, UniVec_Core
A list of “standard” kraken libraries to download on the fly and add.
- --custom_libraries
Custom fasta files with properly formatted headers.
- --kmerLen
k-mer length (kraken2 default: 35nt/15aa)
- --minimizerLen
Minimizer length (kraken2 default: 31nt/12aa)
- --minimizerSpaces
Number of characters in minimizer that are ignored in comparisons (kraken2 default: 7nt/0aa)
- --protein
Build protein database (default false/nucleotide).
Default:
False
- --maxDbSize
Maximum db size in GB (default: none)
- --threads
Number of threads; by default all cores are used
Default:
2
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
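Whether a library fasta uses one of the three header styles named in the description can be checked before building. The sketch below is an illustration only: the regular expressions are an approximation of those styles, and `classify_header` is a hypothetical helper, not part of metagenomics.py or Kraken2.

```python
import re

# Patterns approximating the three header styles named in the description.
HEADER_PATTERNS = [
    ("kraken_taxid", re.compile(r"^>kraken:taxid\|(\d+)\|")),
    ("gi", re.compile(r"^>gi\|(\d+)\|")),
    ("accession", re.compile(r"^>([A-Za-z_]+\d+\.\d+)")),
]


def classify_header(line):
    """Return (style, identifier) for a recognised header, else None."""
    for style, pattern in HEADER_PATTERNS:
        match = pattern.match(line)
        if match:
            return style, match.group(1)
    return None


headers = [
    ">NC1234.1 example sequence",
    ">gi|123456789|ref|XXXX||",
    ">kraken:taxid|1234|",
    ">seq1",
]
results = [classify_header(h) for h in headers]
```

Headers that return `None`, such as `>seq1` here, are the ones that would likely need renaming before taxonomy subsetting could associate them with a taxid.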
2.5.2.17. kb_build
Builds a kb index from a reference fasta file.
- Args:
ref_fasta (str): Path to the reference sequence fasta file.
index (str): Path to the output kb index file.
workflow (str): Type of index to create. Options are ‘standard’, ‘nac’, ‘kite’, ‘custom’.
kmer_len (int): k-mer length (default: 31).
protein (bool): True if sequence contains amino acids (default: False).
threads (int): Number of threads to use (default: None).
metagenomics.py kb_build [-h] [--index INDEX]
[--workflow {standard,nac,kite,custom}]
[--kmer_len KMER_LEN] [--protein] [--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
ref_fasta
2.5.2.17.1. Positional Arguments
- ref_fasta
Reference sequence fasta file.
2.5.2.17.2. Named Arguments
- --index
kb output index file.
- --workflow
Possible choices: standard, nac, kite, custom
Type of index to create (default: ‘standard’).
Default:
'standard'
- --kmer_len
k-mer length (default: 31).
- --protein
True if sequence contains amino acids (default: False).
Default:
False
- --threads
Number of threads; by default all cores are used
Default:
2
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False