2.8. intrahost.py - intrahost variant calling and annotation
This script contains a number of utilities for intrahost variant calling and annotation for viral genomes.
usage: intrahost.py subcommand
2.8.2. Sub-commands
2.8.2.1. vphaser_one_sample
- Input: a single BAM file, representing reads from one sample, mapped to its own consensus assembly. It may contain multiple read groups and libraries.
- Output: a tab-separated file with no header containing filtered V-Phaser 2 output variants, with an additional column for the sequence/chrom name, and with library counts and p-values appended to the counts for each allele.
intrahost.py vphaser_one_sample [-h] [--vphaserNumThreads VPHASERNUMTHREADS]
[--minReadsEach MINREADSEACH]
[--maxBias MAXBIAS]
[--removeDoublyMappedReads]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
inBam inConsFasta outTab
2.8.2.1.1. Positional Arguments
- inBam
Input Bam file.
- inConsFasta
Consensus assembly fasta.
- outTab
Tab-separated headerless output file.
2.8.2.1.2. Named Arguments
- --vphaserNumThreads
Number of threads in call to V-Phaser 2.
- --minReadsEach
Minimum number of reads on each strand (default: 5).
Default: 5
- --maxBias
Maximum allowable ratio of number of reads on the two strands (default: 10). Ignored if minReadsEach = 0.
Default: 10
- --removeDoublyMappedReads
When calling V-Phaser, remove reads mapping to more than one contig. Default is to keep the reads.
Default: False
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
show program’s version number and exit
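The interaction between --minReadsEach and --maxBias described above can be sketched as a predicate on the forward- and reverse-strand read counts of one allele. This is a minimal illustration under stated assumptions, not the script's actual code: the function name `passes_strand_filter` is hypothetical, and the bias is assumed to be the ratio of the larger strand count to the smaller one.

```python
def passes_strand_filter(fwd, rev, min_reads_each=5, max_bias=10.0):
    """Return True if an allele's strand counts pass both filters.

    Assumed semantics (not taken from the intrahost.py source):
    - each strand must carry at least min_reads_each reads;
    - the ratio between the two strand counts must not exceed max_bias;
    - the ratio test is skipped when min_reads_each is 0, as documented.
    """
    if fwd < min_reads_each or rev < min_reads_each:
        return False
    if min_reads_each == 0:
        return True  # maxBias is ignored in this case
    # Both counts are at least 1 here, so the division is safe.
    ratio = max(fwd, rev) / min(fwd, rev)
    return ratio <= max_bias


if __name__ == "__main__":
    print(passes_strand_filter(40, 6))    # both strands covered, ratio about 6.7
    print(passes_strand_filter(100, 5))   # ratio 20 exceeds the default maxBias
    print(passes_strand_filter(12, 0, min_reads_each=0))  # bias check skipped
```

With the defaults, an allele supported by 100 forward and 5 reverse reads meets the per-strand minimum but would be dropped for strand bias under this reading of the options.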
2.8.2.2. vphaser
- Run V-Phaser 2 on the input file without any additional filtering.
- Combine the non-header lines of the CHROM.var.raw.txt files it produces,
adding CHROM as the first field on each line.
intrahost.py vphaser [-h] [--numThreads NUMTHREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
inBam outTab
2.8.2.2.1. Positional Arguments
- inBam
Input Bam file.
- outTab
Tab-separated headerless output file.
2.8.2.2.2. Named Arguments
- --numThreads
Number of threads in call to V-Phaser 2.
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
show program’s version number and exit
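The combining step described for this sub-command can be sketched as follows. This is an illustration only, not the script's implementation: the helper names `combine_vphaser_raw` and `write_tab` are hypothetical, and treating lines that start with `#` as header lines is an assumption about the V-Phaser 2 output format.

```python
import glob
import os


def combine_vphaser_raw(out_dir):
    """Yield rows from every CHROM.var.raw.txt file in out_dir.

    CHROM (the file name minus the '.var.raw.txt' suffix) is prepended
    as the first field. Lines starting with '#' are treated as header
    lines and skipped; that marker is an assumption of this sketch.
    """
    suffix = ".var.raw.txt"
    for path in sorted(glob.glob(os.path.join(out_dir, "*" + suffix))):
        chrom = os.path.basename(path)[: -len(suffix)]
        with open(path) as handle:
            for line in handle:
                if line.startswith("#") or not line.strip():
                    continue
                yield [chrom] + line.rstrip("\n").split("\t")


def write_tab(rows, out_path):
    """Write rows as a headerless tab-separated file."""
    with open(out_path, "w") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")
```

A caller would pass the directory V-Phaser 2 wrote into and hand the resulting rows to `write_tab` to produce a file shaped like outTab.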
2.8.2.3. tabfile_rename
- Copy an input tab file to an output file, changing the values in a specific column based on a mapping file. The first line passes through untouched (it is assumed to be a header).
intrahost.py tabfile_rename [-h] [--col_idx COL]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
inFile mapFile outFile
2.8.2.3.1. Positional Arguments
- inFile
Input flat file
- mapFile
Map file. Two-column headerless file that maps input values to output values. This script will error if there are values in inFile that do not exist in mapFile.
- outFile
Output flat file
2.8.2.3.2. Named Arguments
- --col_idx
Which column number to replace (0-based index). [default: 0]
Default: 0
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
show program’s version number and exit
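The renaming behavior described for this sub-command (header passed through, one column rewritten via the map file, error on unmapped values) can be sketched in a few lines. The function names `rename_column` and `read_map` are hypothetical, and raising `KeyError` is one possible way to signal the documented error, not necessarily what the script does.

```python
def read_map(map_lines):
    """Parse a two-column headerless, tab-separated mapping into a dict."""
    return dict(
        line.rstrip("\n").split("\t")[:2] for line in map_lines if line.strip()
    )


def rename_column(lines, mapping, col_idx=0):
    """Return lines with column col_idx rewritten through mapping.

    The first line is returned unchanged (assumed to be a header).
    A value missing from mapping raises KeyError, mirroring the
    documented behaviour of erroring on unmapped values.
    """
    out = []
    for i, line in enumerate(lines):
        if i == 0:
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        fields[col_idx] = mapping[fields[col_idx]]
        out.append("\t".join(fields) + "\n")
    return out
```

Note that col_idx is 0-based, so the default rewrites the first column.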
2.8.2.4. merge_to_vcf
- Combine and convert V-Phaser 2 parsed filtered output text files into VCF format.
- Assumptions: the consensus assemblies used in creating alignments do not extend beyond the ends of the reference, and the number of alignment files equals the number of chromosomes/segments.
intrahost.py merge_to_vcf [-h] [--samples [SAMPLES ...]] --isnvs ISNVS
[ISNVS ...] --alignments ALIGNMENTS [ALIGNMENTS ...]
[--strip_chr_version] [--naive_filter]
[--parse_accession]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
refFasta outVcf
2.8.2.4.1. Positional Arguments
- refFasta
The target reference genome. outVcf will use these chromosome names, coordinate spaces, and reference alleles.
- outVcf
Output VCF file containing all variants
2.8.2.4.2. Named Arguments
- --samples
A list of sample names
- --isnvs
A list of file names from the output of vphaser_one_sample. These must be in the SAME ORDER as samples.
- --alignments
A list of fasta files containing multialignment of input assemblies, with one file per chromosome/segment. Each alignment file will contain a line for each sample, as well as the reference genome to which they were aligned.
- --strip_chr_version
If set, strip any trailing version numbers from the chromosome names. If the chromosome name ends with a period followed by integers, this is interpreted as a version number to be removed. This is because Genbank accession numbers are often used by SnpEff databases downstream, but without the corresponding version number. Default is false (leave chromosome names untouched).
Default: False
- --naive_filter
If set, keep only the alleles that have at least two independent libraries of support and allele freq > 0.005. Default is false (do not filter at this stage).
Default: False
- --parse_accession
If set, parse only the accession for the chromosome name. Helpful if snpEff has to create its own database.
Default: False
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
show program’s version number and exit
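Two of the options above, --strip_chr_version and --naive_filter, describe simple rules that can be sketched directly from their descriptions. The function names below are hypothetical, the accession used in the usage lines is made up, and the code is an illustration of the documented rules rather than the script's own implementation.

```python
import re


def strip_chr_version(chrom):
    """Remove a trailing '.<digits>' version suffix, if present."""
    return re.sub(r"\.\d+$", "", chrom)


def passes_naive_filter(n_libraries, allele_freq, min_libs=2, min_freq=0.005):
    """Keep an allele supported by at least min_libs independent
    libraries whose allele frequency is strictly above min_freq."""
    return n_libraries >= min_libs and allele_freq > min_freq


if __name__ == "__main__":
    print(strip_chr_version("AB123456.1"))   # made-up accession with a version
    print(strip_chr_version("segment_L"))    # no version suffix, left as-is
    print(passes_naive_filter(2, 0.01))
    print(passes_naive_filter(1, 0.40))
```

Only a suffix made of a period followed by digits at the very end of the name is removed, so names that merely contain a period elsewhere are left untouched.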
2.8.2.5. Fws
Compute the Fws statistic on iSNV data. See Manske et al., 2012 (Nature).
intrahost.py Fws [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
inVcf outVcf
2.8.2.5.1. Positional Arguments
- inVcf
Input VCF file
- outVcf
Output VCF file
2.8.2.5.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
show program’s version number and exit
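For orientation, Fws is commonly defined as 1 - H_w / H_s, where H_w is the within-host heterozygosity and H_s is the heterozygosity of the sampled population. The sketch below applies that definition to a single biallelic site. It is a simplified per-site illustration, with the hypothetical function name `fws` and the assumption that each sample contributes one allele frequency; it is not a statement of how the sub-command computes or stores the value.

```python
def fws(allele_freqs):
    """Per-site Fws = 1 - H_w / H_s for one biallelic site.

    allele_freqs: within-sample frequency of one allele, one per sample.
    Returns None when fewer than two samples are given or when the
    population heterozygosity H_s is zero (Fws is undefined there).
    """
    if len(allele_freqs) < 2:
        return None
    # Population allele frequency, taken as the mean across samples.
    p_s = sum(allele_freqs) / len(allele_freqs)
    h_s = 2.0 * p_s * (1.0 - p_s)
    if h_s == 0.0:
        return None
    # Mean within-sample heterozygosity.
    h_w = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / len(allele_freqs)
    return 1.0 - h_w / h_s
```

Under this definition, samples that are each fixed for different alleles give a value of 1, while samples that each carry the population's mix give a value of 0.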
2.8.2.6. iSNV_table
Convert VCF iSNV data to tabular text
intrahost.py iSNV_table [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
inVcf outFile
2.8.2.6.1. Positional Arguments
- inVcf
Input VCF file
- outFile
Output text file
2.8.2.6.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
show program’s version number and exit
2.8.2.7. iSNP_per_patient
Aggregate tabular iSNP data per patient x position (all time points averaged)
intrahost.py iSNP_per_patient [-h]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
inFile outFile
2.8.2.7.1. Positional Arguments
- inFile
Input text file
- outFile
Output text file
2.8.2.7.2. Named Arguments
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default: 'INFO'
- --version, -V
show program’s version number and exit
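The per-patient aggregation described for this sub-command (one value per patient and position, averaged over time points) can be sketched as a group-and-average step. The column names `patient`, `chrom`, `pos` and `freq` are hypothetical placeholders, since the actual layout of the input table is not specified here, and the function name `average_per_patient` is likewise an assumption.

```python
from collections import defaultdict


def average_per_patient(rows):
    """Average allele frequency per (patient, chrom, pos).

    rows: iterable of dicts with hypothetical keys 'patient', 'chrom',
    'pos' and 'freq'; each row is one time point for one patient.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row["patient"], row["chrom"], row["pos"])
        sums[key] += float(row["freq"])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Rows read with `csv.DictReader(handle, delimiter="\t")` would fit this shape, provided the table's header uses matching column names.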