2.8. intrahost.py - intrahost variant calling and annotation

This script contains a number of utilities for intrahost variant calling and annotation for viral genomes.

usage: intrahost.py subcommand

2.8.1. subcommands



Possible choices: vphaser_one_sample, vphaser, tabfile_rename, merge_to_vcf, Fws, iSNV_table, iSNP_per_patient

2.8.2. Sub-commands

2.8.2.1. vphaser_one_sample

Input: a single BAM file, representing reads from one sample, mapped to its own consensus assembly. It may contain multiple read groups and libraries.

Output: a tab-separated file with no header containing filtered V-Phaser 2 output variants, with an additional column for the sequence/chrom name, and with library counts and p-values appended to the counts for each allele.

intrahost.py vphaser_one_sample [-h] [--vphaserNumThreads VPHASERNUMTHREADS]
                                [--minReadsEach MINREADSEACH]
                                [--maxBias MAXBIAS]
                                [--removeDoublyMappedReads]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version]
                                inBam inConsFasta outTab

2.8.2.1.1. Positional Arguments

inBam

Input Bam file.

inConsFasta

Consensus assembly fasta.

outTab

Tab-separated headerless output file.

2.8.2.1.2. Named Arguments

--vphaserNumThreads

Number of threads in call to V-Phaser 2.

--minReadsEach

Minimum number of reads on each strand (default: 5).

Default: 5

--maxBias

Maximum allowable ratio of number of reads on the two strands (default: 10). Ignored if minReadsEach = 0.

Default: 10

--removeDoublyMappedReads

When calling V-Phaser, remove reads mapping to more than one contig. Default is to keep the reads.

Default: False

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit
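
The interaction between --minReadsEach and --maxBias can be illustrated with a minimal sketch. The function name passes_strand_filter and its arguments are hypothetical and are not part of intrahost.py; treating a ratio exactly equal to maxBias as acceptable is an assumption based on the phrase "maximum allowable ratio".

```python
def passes_strand_filter(fwd_reads, rev_reads, min_reads_each=5, max_bias=10):
    """Return True if an allele's per-strand read counts pass both thresholds.

    Hypothetical helper for illustration; not the script's own code.
    """
    # With min_reads_each == 0 the minimum is trivially met and, as documented,
    # max_bias is ignored.
    if min_reads_each == 0:
        return True
    # Require enough supporting reads on each strand.
    if fwd_reads < min_reads_each or rev_reads < min_reads_each:
        return False
    # Require the two strand counts to be within a factor of max_bias of each
    # other. Written multiplicatively so that no division is needed.
    return fwd_reads <= rev_reads * max_bias and rev_reads <= fwd_reads * max_bias


# (forward, reverse) read counts for four made-up alleles.
examples = [(50, 40), (3, 200), (5, 60), (5, 50)]
kept = [pair for pair in examples if passes_strand_filter(*pair)]
print(kept)
```

With the default thresholds, (3, 200) fails the per-strand minimum and (5, 60) exceeds the allowed ratio, so only the other two pairs are kept.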

2.8.2.2. vphaser

Run V-Phaser 2 on the input file without any additional filtering. Combine the non-header lines of the CHROM.var.raw.txt files it produces, adding CHROM as the first field on each line.

intrahost.py vphaser [-h] [--numThreads NUMTHREADS]
                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                     [--version]
                     inBam outTab

2.8.2.2.1. Positional Arguments

inBam

Input Bam file.

outTab

Tab-separated headerless output file.

2.8.2.2.2. Named Arguments

--numThreads

Number of threads in call to V-Phaser 2.

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit
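
The combining step described for this subcommand can be sketched as follows. This is an illustration only: the function names are hypothetical, and the rule that header lines begin with '#' is an assumption about the V-Phaser 2 output format rather than something stated here.

```python
import glob
import os


def combine_raw_var_files(vphaser_dir):
    """Yield rows from every CHROM.var.raw.txt in vphaser_dir, CHROM first."""
    suffix = ".var.raw.txt"
    for path in sorted(glob.glob(os.path.join(vphaser_dir, "*" + suffix))):
        # The part of the file name before the suffix is taken as CHROM.
        chrom = os.path.basename(path)[: -len(suffix)]
        with open(path) as handle:
            for line in handle:
                # Assumption: header lines start with '#'. Blank lines are
                # skipped as well.
                if not line.strip() or line.startswith("#"):
                    continue
                yield [chrom] + line.rstrip("\n").split("\t")


def write_combined(vphaser_dir, out_tab):
    """Write the combined rows to a headerless tab-separated file."""
    with open(out_tab, "w") as out:
        for row in combine_raw_var_files(vphaser_dir):
            out.write("\t".join(row) + "\n")
```

Calling write_combined on a directory of V-Phaser 2 results would produce a file shaped like the outTab described above, under the stated assumptions.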

2.8.2.3. tabfile_rename

Copy an input tab-separated file to an output file, replacing the values in one column according to a mapping file. The first line passes through untouched (it is assumed to be a header).

intrahost.py tabfile_rename [-h] [--col_idx COL]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version]
                            inFile mapFile outFile

2.8.2.3.1. Positional Arguments

inFile

Input flat file

mapFile

Map file. Two-column headerless file that maps input values to output values. This script will error if there are values in inFile that do not exist in mapFile.

outFile

Output flat file

2.8.2.3.2. Named Arguments

--col_idx

Which column number to replace (0-based index). [default: 0]

Default: 0

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit
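
The renaming behaviour can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the script's own code: the function below is hypothetical, it assumes tab-separated input with no quoting, and it signals an unmapped value by raising KeyError.

```python
def tabfile_rename(in_file, map_file, out_file, col_idx=0):
    """Copy in_file to out_file, renaming values in column col_idx (0-based)."""
    # Load the two-column, headerless mapping of input value -> output value.
    mapping = {}
    with open(map_file) as handle:
        for line in handle:
            if line.strip():
                old, new = line.rstrip("\n").split("\t")
                mapping[old] = new

    with open(in_file) as src, open(out_file, "w") as dst:
        for line_number, line in enumerate(src):
            fields = line.rstrip("\n").split("\t")
            # The first line is assumed to be a header and is left untouched.
            if line_number > 0:
                # A value missing from the map raises KeyError here.
                fields[col_idx] = mapping[fields[col_idx]]
            dst.write("\t".join(fields) + "\n")
```

Failing loudly on an unmapped value, rather than passing it through, makes an incomplete map file visible immediately instead of leaving a mix of old and new names in the output.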

2.8.2.4. merge_to_vcf

Combine and convert parsed, filtered V-Phaser 2 output text files into VCF format. Assumptions: the consensus assemblies used in creating the alignments do not extend beyond the ends of the reference, and the number of alignment files equals the number of chromosomes/segments.

intrahost.py merge_to_vcf [-h] [--samples [SAMPLES ...]] --isnvs ISNVS
                          [ISNVS ...] --alignments ALIGNMENTS [ALIGNMENTS ...]
                          [--strip_chr_version] [--naive_filter]
                          [--parse_accession]
                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                          [--version]
                          refFasta outVcf

2.8.2.4.1. Positional Arguments

refFasta

The target reference genome. outVcf will use these chromosome names, coordinate spaces, and reference alleles.

outVcf

Output VCF file containing all variants

2.8.2.4.2. Named Arguments

--samples

A list of sample names

--isnvs

A list of file names from the output of vphaser_one_sample. These must be in the SAME ORDER as samples.

--alignments

A list of FASTA files containing multiple alignments of the input assemblies, with one file per chromosome/segment. Each alignment file will contain a line for each sample, as well as the reference genome to which they were aligned.

--strip_chr_version

If set, strip any trailing version numbers from the chromosome names. If the chromosome name ends with a period followed by integers, this is interpreted as a version number to be removed. This is because GenBank accession numbers are often used by SnpEff databases downstream, but without the corresponding version number. Default is false (leave chromosome names untouched).

Default: False

--naive_filter

If set, keep only the alleles that have at least two independent libraries of support and allele freq > 0.005. Default is false (do not filter at this stage).

Default: False

--parse_accession

If set, parse only the accession for the chromosome name. Helpful if SnpEff has to create its own database.

Default: False

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit
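
Two of the options above, --strip_chr_version and --naive_filter, describe simple rules that can be sketched in Python. Both functions are hypothetical illustrations and not the script's own code; in particular, the (allele, frequency, supporting libraries) tuple layout is an assumption made for this example.

```python
import re


def strip_chr_version(chrom):
    """Drop a trailing '.<integers>' version suffix from a chromosome name."""
    return re.sub(r"\.\d+$", "", chrom)


def naive_filter(alleles, min_libraries=2, min_freq=0.005):
    """Keep alleles with enough library support and a high enough frequency.

    alleles: iterable of (allele, frequency, n_supporting_libraries) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    return [
        entry
        for entry in alleles
        # "At least two" libraries is inclusive; the frequency bound is strict.
        if entry[2] >= min_libraries and entry[1] > min_freq
    ]


print(strip_chr_version("KJ660346.2"))
print(naive_filter([("A", 0.9, 3), ("G", 0.004, 3), ("T", 0.05, 1)]))
```

A name such as chr1, which has no period followed by digits at its end, passes through strip_chr_version unchanged.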

2.8.2.5. Fws

Compute the Fws statistic on iSNV data. See Manske et al., 2012 (Nature).

intrahost.py Fws [-h]
                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                 [--version]
                 inVcf outVcf

2.8.2.5.1. Positional Arguments

inVcf

Input VCF file

outVcf

Output VCF file

2.8.2.5.2. Named Arguments

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit
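
For orientation, Fws is commonly written as 1 - Hw/Hs, where Hw is the within-host heterozygosity and Hs is the heterozygosity of the population. The sketch below shows one simple per-site, biallelic formulation using H = 2p(1 - p). It is an assumption-laden illustration: the function name is hypothetical, and the exact calculation performed by this subcommand may differ.

```python
def fws_per_site(allele_freqs):
    """Per-site Fws from one allele frequency per sample (biallelic site).

    Returns None when fewer than two samples are given or when the site is
    monomorphic across the population (Hs == 0).
    """
    if len(allele_freqs) < 2:
        return None
    # Population-level frequency and heterozygosity.
    p_s = sum(allele_freqs) / len(allele_freqs)
    h_s = 2 * p_s * (1.0 - p_s)
    if h_s == 0.0:
        return None
    # Mean within-host heterozygosity across samples.
    h_w = sum(2 * p * (1.0 - p) for p in allele_freqs) / len(allele_freqs)
    return 1.0 - h_w / h_s
```

Under this formulation, samples that are each fixed for different alleles give a value of 1, while samples that each mirror the population frequency give a value of 0.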

2.8.2.6. iSNV_table

Convert VCF iSNV data to tabular text

intrahost.py iSNV_table [-h]
                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                        [--version]
                        inVcf outFile

2.8.2.6.1. Positional Arguments

inVcf

Input VCF file

outFile

Output text file

2.8.2.6.2. Named Arguments

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

2.8.2.7. iSNP_per_patient

Aggregate tabular iSNP data per patient x position (all time points averaged)

intrahost.py iSNP_per_patient [-h]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version]
                              inFile outFile

2.8.2.7.1. Positional Arguments

inFile

Input text file

outFile

Output text file

2.8.2.7.2. Named Arguments

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit
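
The aggregation can be pictured as a group-by over (patient, position) followed by an average. The sketch below is illustrative only: the function name and the column names patient, pos, and freq are hypothetical, and the use of an arithmetic mean is an assumption based on the description "all time points averaged".

```python
from collections import defaultdict
from statistics import mean


def isnp_per_patient(rows):
    """Average allele frequency over time points per (patient, position).

    rows: iterable of dicts with hypothetical keys 'patient', 'pos', 'freq'.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["patient"], int(row["pos"]))].append(float(row["freq"]))
    return [
        {"patient": patient, "pos": pos, "freq": mean(freqs)}
        for (patient, pos), freqs in sorted(grouped.items())
    ]


rows = [
    {"patient": "P1", "pos": "100", "time": "1", "freq": "0.25"},
    {"patient": "P1", "pos": "100", "time": "2", "freq": "0.75"},
    {"patient": "P2", "pos": "100", "time": "1", "freq": "0.5"},
]
print(isnp_per_patient(rows))
```

For a genome with several segments, the grouping key would also need the chromosome name so that equal positions on different segments stay separate.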