3.5. intrahost.py - within-host genetic variation (iSNVs)

This script contains a number of utilities for intrahost variant calling and annotation for viral genomes.

usage: intrahost.py subcommand
Sub-commands:
vphaser_one_sample

Input: a single BAM file representing reads from one sample, mapped to its own consensus assembly. It may contain multiple read groups and libraries. Output: a headerless tab-separated file containing filtered V-Phaser 2 variants, with an additional column for the sequence/chrom name, and with library counts and p-values appended to the counts for each allele.

usage: intrahost.py vphaser_one_sample [-h]
                                       [--vphaserNumThreads VPHASERNUMTHREADS]
                                       [--minReadsEach MINREADSEACH]
                                       [--maxBias MAXBIAS]
                                       [--removeDoublyMappedReads]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version]
                                       inBam inConsFasta outTab
Positional arguments:
inBam Input Bam file.
inConsFasta Consensus assembly fasta.
outTab Tab-separated headerless output file.
Options:
--vphaserNumThreads
 Number of threads in call to V-Phaser 2.
--minReadsEach=5
 Minimum number of reads on each strand (default: 5).
--maxBias=10 Maximum allowable ratio of number of reads on the two strands (default: 10). Ignored if minReadsEach = 0.
--removeDoublyMappedReads=False
 When calling V-Phaser, remove reads mapping to more than one contig. Default is to keep the reads.
--loglevel=DEBUG
 Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
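
The interaction between --minReadsEach and --maxBias can be sketched as follows. This is a minimal illustration of the rule as described in the option help, not the script's actual code; the function name passes_strand_filter and its arguments are hypothetical.

```python
def passes_strand_filter(fwd, rev, min_reads_each=5, max_bias=10):
    """Return True if an allele has enough support on both strands.

    fwd, rev: read counts on the forward and reverse strands (hypothetical inputs).
    Defaults mirror the documented defaults of --minReadsEach and --maxBias.
    """
    if min_reads_each <= 0:
        # Per the option help, the bias check is ignored when minReadsEach = 0.
        return True
    if fwd < min_reads_each or rev < min_reads_each:
        return False
    # Both counts are >= min_reads_each > 0 here, so the division is safe.
    return max(fwd, rev) / min(fwd, rev) <= max_bias
```

Under these assumptions, an allele with 50 forward and 6 reverse reads passes (ratio about 8.3), while one with 100 forward and 5 reverse reads is rejected (ratio 20).
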
vphaser

Run V-Phaser 2 on the input file without any additional filtering. Combine the non-header lines of the CHROM.var.raw.txt files it produces, adding CHROM as the first field on each line.

usage: intrahost.py vphaser [-h] [--numThreads NUMTHREADS]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version]
                            inBam outTab
Positional arguments:
inBam Input Bam file.
outTab Tab-separated headerless output file.
Options:
--numThreads Number of threads in call to V-Phaser 2.
--loglevel=DEBUG
 Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
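
The combining step described for this sub-command (concatenate the non-header lines of each CHROM.var.raw.txt file, prefixing CHROM) might look roughly like the sketch below. It assumes that header lines begin with "#" and that all per-chromosome files sit in one directory; both are assumptions, and combine_vphaser_outputs is a hypothetical helper, not part of intrahost.py.

```python
import glob
import os


def combine_vphaser_outputs(out_dir, out_tab):
    """Merge <CHROM>.var.raw.txt files into one headerless tab file."""
    suffix = ".var.raw.txt"
    paths = sorted(glob.glob(os.path.join(out_dir, "*" + suffix)))
    with open(out_tab, "w") as out:
        for path in paths:
            # CHROM is taken from the file name, e.g. segA.var.raw.txt -> segA.
            chrom = os.path.basename(path)[:-len(suffix)]
            with open(path) as handle:
                for line in handle:
                    # Assumption: header lines start with '#'; skip them and blanks.
                    if not line.strip() or line.startswith("#"):
                        continue
                    out.write(chrom + "\t" + line.rstrip("\n") + "\n")
```
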
tabfile_rename

Take input tab file and copy to an output file while changing the values in a specific column based on a mapping file. The first line will pass through untouched (it is assumed to be a header).

usage: intrahost.py tabfile_rename [-h] [--col_idx COL]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version]
                                   inFile mapFile outFile
Positional arguments:
inFile Input flat file
mapFile Map file. Two-column headerless file that maps input values to output values. This script will error if there are values in inFile that do not exist in mapFile.
outFile Output flat file
Options:
--col_idx=0 Which column number to replace (0-based index). [default: 0]
--loglevel=DEBUG
 Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
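
The renaming behavior can be illustrated with a short sketch: the first line passes through unchanged, and every later line has one column replaced via the map file. The function rename_column is a hypothetical stand-in written for illustration; the real script's error handling may differ (here an unmapped value simply raises KeyError).

```python
def rename_column(in_file, map_file, out_file, col_idx=0):
    """Copy in_file to out_file, remapping the values in column col_idx."""
    mapping = {}
    with open(map_file) as handle:
        for line in handle:
            if line.strip():
                # Two-column headerless map: old value, then new value.
                old, new = line.rstrip("\n").split("\t")[:2]
                mapping[old] = new
    with open(in_file) as fin, open(out_file, "w") as fout:
        for line_no, line in enumerate(fin):
            if line_no == 0:
                # The first line is assumed to be a header and is left untouched.
                fout.write(line)
                continue
            fields = line.rstrip("\n").split("\t")
            # Raises KeyError when a value is missing from the map file.
            fields[col_idx] = mapping[fields[col_idx]]
            fout.write("\t".join(fields) + "\n")
```
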
merge_to_vcf

Combine and convert parsed, filtered V-Phaser 2 output text files into VCF format. Assumptions: the consensus assemblies used in creating the alignments do not extend beyond the ends of the reference, and the number of alignment files equals the number of chromosomes/segments.

usage: intrahost.py merge_to_vcf [-h] --samples SAMPLES [SAMPLES ...] --isnvs
                                 ISNVS [ISNVS ...] --alignments ALIGNMENTS
                                 [ALIGNMENTS ...] [--strip_chr_version]
                                 [--naive_filter] [--parse_accession]
                                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                 [--version]
                                 refFasta outVcf
Positional arguments:
refFasta The target reference genome. outVcf will use these chromosome names, coordinate spaces, and reference alleles
outVcf Output VCF file containing all variants
Options:
--samples A list of sample names
--isnvs A list of file names from the output of vphaser_one_sample. These must be in the SAME ORDER as samples.
--alignments A list of FASTA files containing multiple alignments of the input assemblies, with one file per chromosome/segment. Each alignment file will contain a line for each sample, as well as the reference genome to which they were aligned.
--strip_chr_version=False
 If set, strip any trailing version numbers from the chromosome names. If the chromosome name ends with a period followed by integers, this is interpreted as a version number to be removed. This is because GenBank accession numbers are often used by SnpEff databases downstream, but without the corresponding version number. Default is false (leave chromosome names untouched).
--naive_filter=False
 If set, keep only the alleles that have at least two independent libraries of support and allele freq > 0.005. Default is false (do not filter at this stage).
--parse_accession=False
 If set, parse only the accession for the chromosome name. Helpful if snpEff has to create its own database
--loglevel=DEBUG
 Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
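
Two of the options above describe simple rules that are easy to sketch. The example below shows one plausible reading of --strip_chr_version (drop a trailing period followed by digits) and of --naive_filter (at least two libraries of support and allele frequency above 0.005). The function names and the dictionary keys n_libs and freq are hypothetical, chosen only for illustration.

```python
import re


def strip_chr_version(name):
    """Remove a trailing '.<digits>' version suffix, if present."""
    return re.sub(r"\.\d+$", "", name)


def naive_filter(alleles, min_libs=2, min_freq=0.005):
    """Keep alleles with enough library support and frequency.

    alleles: list of dicts with hypothetical keys 'n_libs' (number of
    independent libraries supporting the allele) and 'freq' (allele frequency).
    """
    return [a for a in alleles
            if a["n_libs"] >= min_libs and a["freq"] > min_freq]
```

With this sketch, a name such as KJ660346.2 becomes KJ660346, while a name without a numeric version suffix is returned unchanged.
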
Fws

Compute the Fws statistic on iSNV data. See Manske, 2012 (Nature)

usage: intrahost.py Fws [-h]
                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                        [--version]
                        inVcf outVcf
Positional arguments:
inVcf Input VCF file
outVcf Output VCF file
Options:
--loglevel=DEBUG
 Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
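
For orientation, Fws is commonly written as 1 - Hw/Hs, where Hw is the within-sample heterozygosity and Hs is the heterozygosity of the population as a whole. The sketch below applies that form at a single site, with heterozygosity taken as 1 minus the sum of squared allele frequencies. This is an assumption-laden illustration: the script and the cited paper may aggregate across sites differently, and the function names and input layout are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fws_per_sample(sample_freqs):
    """Per-sample Fws at one site.

    sample_freqs: one allele-frequency list per sample, with alleles in the
    same order for every sample (hypothetical input layout).
    Returns None for every sample when the site is monomorphic population-wide.
    """
    n_samples = len(sample_freqs)
    n_alleles = len(sample_freqs[0])
    # Population frequency of each allele: the mean across samples.
    pop = [sum(s[i] for s in sample_freqs) / n_samples
           for i in range(n_alleles)]
    hs = heterozygosity(pop)
    if hs == 0:
        return [None] * n_samples
    return [1.0 - heterozygosity(s) / hs for s in sample_freqs]
```

In this form, a sample that is fixed for one allele at a site where the population is mixed gets a value of 1, and a sample as diverse as the population gets 0.
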
iSNV_table

Convert VCF iSNV data to tabular text

usage: intrahost.py iSNV_table [-h]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version]
                               inVcf outFile
Positional arguments:
inVcf Input VCF file
outFile Output text file
Options:
--loglevel=DEBUG
 Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
iSNP_per_patient

Aggregate tabular iSNP data per patient x position (all time points averaged)

usage: intrahost.py iSNP_per_patient [-h]
                                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                     [--version]
                                     inFile outFile
Positional arguments:
inFile Input text file
outFile Output text file
Options:
--loglevel=DEBUG
 Verboseness of output. [default: DEBUG] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program's version number and exit
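
The per-patient aggregation can be pictured as a group-by over (patient, position) followed by an average across time points. The sketch below shows that idea only; the column names patient, chrom, pos and freq are hypothetical, since the actual table layout is not described here, and average_per_patient is not a function of intrahost.py.

```python
from collections import defaultdict


def average_per_patient(rows):
    """Average allele frequency per (patient, chrom, pos) over all time points.

    rows: iterable of dicts with hypothetical keys
    'patient', 'chrom', 'pos' and 'freq'.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of freq, count]
    for row in rows:
        key = (row["patient"], row["chrom"], row["pos"])
        totals[key][0] += row["freq"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```
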