3.5. intrahost.py - within-host genetic variation (iSNVs)
This script contains a number of utilities for intrahost variant calling and annotation for viral genomes.
usage: intrahost.py subcommand
- Sub-commands:
- vphaser_one_sample
Input: a single BAM file, representing reads from one sample, mapped to its own consensus assembly. It may contain multiple read groups and libraries.
Output: a tab-separated file with no header, containing filtered V-Phaser 2 output variants with an additional column for the sequence/chrom name, and with library counts and p-values appended to the counts for each allele.
usage: intrahost.py vphaser_one_sample [-h] [--vphaserNumThreads VPHASERNUMTHREADS] [--minReadsEach MINREADSEACH] [--maxBias MAXBIAS] [--removeDoublyMappedReads] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inBam inConsFasta outTab
- Positional arguments:
inBam  Input BAM file.
inConsFasta  Consensus assembly FASTA.
outTab  Tab-separated headerless output file.
- Options:
--vphaserNumThreads  Number of threads in call to V-Phaser 2.
--minReadsEach=5  Minimum number of reads on each strand (default: 5).
--maxBias=10  Maximum allowable ratio of number of reads on the two strands (default: 10). Ignored if minReadsEach = 0.
--removeDoublyMappedReads=False  When calling V-Phaser, remove reads mapping to more than one contig. Default is to keep the reads.
--loglevel=DEBUG  Verboseness of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
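The interaction between --minReadsEach and --maxBias can be illustrated with a small sketch. This is not the code in intrahost.py; it only assumes what the option text states (a minimum read count on each strand, a maximum ratio between the two strand counts, and no ratio check when minReadsEach is 0). The function name, the per-allele framing, and the inclusive comparisons are assumptions made for illustration.

```python
def passes_strand_filter(fwd, rev, min_reads_each=5, max_bias=10):
    """Hypothetical strand filter for one allele.

    fwd, rev: read counts supporting the allele on the forward and
    reverse strands. Defaults mirror the documented option defaults.
    """
    if min_reads_each <= 0:
        # Per the option text, maxBias is ignored when minReadsEach is 0.
        return True
    if fwd < min_reads_each or rev < min_reads_each:
        return False
    # Both counts are at least min_reads_each (> 0) here, so the ratio
    # is well defined. Treating the bound as inclusive is an assumption.
    return max(fwd, rev) / min(fwd, rev) <= max_bias
```

With the defaults, an allele seen 5 times on one strand and 50 times on the other would pass under these assumptions, while 5 versus 51 would not.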
- vphaser
Run V-Phaser 2 on the input file without any additional filtering. Combine the non-header lines of the CHROM.var.raw.txt files it produces, adding CHROM as the first field on each line.
usage: intrahost.py vphaser [-h] [--numThreads NUMTHREADS] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inBam outTab
- Positional arguments:
inBam  Input BAM file.
outTab  Tab-separated headerless output file.
- Options:
--numThreads  Number of threads in call to V-Phaser 2.
--loglevel=DEBUG  Verboseness of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
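The combining step described for this sub-command can be sketched as follows. This is a minimal illustration, not the script's implementation: the function name and directory layout are hypothetical, and it assumes that header lines in the CHROM.var.raw.txt files begin with "#".

```python
import glob
import os


def combine_var_raw(vphaser_dir, out_path, suffix=".var.raw.txt"):
    """Concatenate the non-header lines of every CHROM.var.raw.txt file
    in vphaser_dir, prefixing each line with CHROM.

    Returns the number of variant lines written.
    """
    n_written = 0
    with open(out_path, "w") as out:
        pattern = os.path.join(vphaser_dir, "*" + suffix)
        for path in sorted(glob.glob(pattern)):
            # CHROM is the file name with the suffix removed.
            chrom = os.path.basename(path)[:-len(suffix)]
            with open(path) as handle:
                for line in handle:
                    # Assumption: header lines start with "#".
                    if line.startswith("#") or not line.strip():
                        continue
                    out.write(chrom + "\t" + line.rstrip("\n") + "\n")
                    n_written += 1
    return n_written
```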
- tabfile_rename
Take input tab file and copy to an output file while changing the values in a specific column based on a mapping file. The first line will pass through untouched (it is assumed to be a header).
usage: intrahost.py tabfile_rename [-h] [--col_idx COL] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFile mapFile outFile
- Positional arguments:
inFile  Input flat file.
mapFile  Map file. Two-column headerless file that maps input values to output values. This script will error if there are values in inFile that do not exist in mapFile.
outFile  Output flat file.
- Options:
--col_idx=0  Which column number to replace (0-based index) (default: 0).
--loglevel=DEBUG  Verboseness of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
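The renaming behaviour described above (header passed through, one column rewritten from a two-column map, error on unmapped values) can be sketched in a few lines. The function names are hypothetical, and raising KeyError is only one way to produce the error the documentation mentions.

```python
def read_map(map_lines):
    """Build a dict from a two-column, tab-separated, headerless map."""
    mapping = {}
    for line in map_lines:
        if line.strip():
            old, new = line.rstrip("\n").split("\t")[:2]
            mapping[old] = new
    return mapping


def rename_column(in_lines, mapping, col_idx=0):
    """Return the input lines with column col_idx (0-based) renamed.

    The first line is treated as a header and copied unchanged.
    A value missing from mapping raises KeyError.
    """
    out = []
    for i, line in enumerate(in_lines):
        line = line.rstrip("\n")
        if i == 0:
            out.append(line)  # header passes through untouched
            continue
        fields = line.split("\t")
        fields[col_idx] = mapping[fields[col_idx]]
        out.append("\t".join(fields))
    return out
```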
- merge_to_vcf
Combine and convert V-Phaser 2 parsed, filtered output text files into VCF format. Assumptions: consensus assemblies used in creating the alignments do not extend beyond the ends of the reference; the number of alignment files equals the number of chromosomes/segments.
usage: intrahost.py merge_to_vcf [-h] --samples SAMPLES [SAMPLES ...] --isnvs ISNVS [ISNVS ...] --alignments ALIGNMENTS [ALIGNMENTS ...] [--strip_chr_version] [--naive_filter] [--parse_accession] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] refFasta outVcf
- Positional arguments:
refFasta  The target reference genome. outVcf will use these chromosome names, coordinate spaces, and reference alleles.
outVcf  Output VCF file containing all variants.
- Options:
--samples  A list of sample names.
--isnvs  A list of file names from the output of vphaser_one_sample. These must be in the SAME ORDER as samples.
--alignments  A list of FASTA files containing multiple alignments of the input assemblies, with one file per chromosome/segment. Each alignment file will contain a line for each sample, as well as the reference genome to which they were aligned.
--strip_chr_version=False  If set, strip any trailing version numbers from the chromosome names. If the chromosome name ends with a period followed by integers, this is interpreted as a version number to be removed. This is because GenBank accession numbers are often used by SnpEff databases downstream, but without the corresponding version number. Default is false (leave chromosome names untouched).
--naive_filter=False  If set, keep only the alleles that have at least two independent libraries of support and allele freq > 0.005. Default is false (do not filter at this stage).
--parse_accession=False  If set, parse only the accession for the chromosome name. Helpful if SnpEff has to create its own database.
--loglevel=DEBUG  Verboseness of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
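Two of the options above describe simple rules that are easy to show in code. The sketch below follows only the option text: a trailing period followed by digits is removed from a chromosome name, and an allele is kept when it has at least two supporting libraries and a frequency above 0.005. The function names and the dictionary keys "n_libs" and "freq" are hypothetical, chosen for illustration, and do not reflect the script's internal data structures.

```python
import re


def strip_chr_version(name):
    """Remove a trailing '.<digits>' version suffix, if present."""
    return re.sub(r"\.\d+$", "", name)


def naive_filter(alleles, min_libs=2, min_freq=0.005):
    """Keep alleles with enough library support and frequency.

    alleles: list of dicts with hypothetical keys
      'n_libs' (number of independent supporting libraries) and
      'freq'   (allele frequency, between 0 and 1).
    """
    return [a for a in alleles
            if a["n_libs"] >= min_libs and a["freq"] > min_freq]
```

For example, a chromosome named KJ660346.2 would become KJ660346, while a name without a numeric version suffix is left unchanged.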
- Fws
Compute the Fws statistic on iSNV data. See Manske et al., 2012 (Nature).
usage: intrahost.py Fws [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inVcf outVcf
- Positional arguments:
inVcf  Input VCF file.
outVcf  Output VCF file.
- Options:
--loglevel=DEBUG  Verboseness of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
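For orientation, Fws compares within-host heterozygosity (Hw) to population-level heterozygosity (Hs) as Fws = 1 - Hw/Hs. The sketch below is a simplified per-site version using H = 1 - sum(p_i^2) over allele frequencies. It is an illustration under those assumptions, not necessarily the calculation this sub-command performs, and the original paper estimates the Hw/Hs relationship across allele-frequency bins rather than site by site. The function names are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity for one site: 1 - sum of squared
    allele frequencies (freqs should sum to 1)."""
    return 1.0 - sum(p * p for p in freqs)


def fws_site(sample_freqs, population_freqs):
    """Simplified per-site Fws = 1 - Hw / Hs.

    sample_freqs: allele frequencies within one sample (Hw).
    population_freqs: allele frequencies across the population (Hs).
    Returns None when the site is invariant in the population (Hs == 0).
    """
    hs = heterozygosity(population_freqs)
    if hs == 0:
        return None
    return 1.0 - heterozygosity(sample_freqs) / hs
```

Under this formulation, a sample that is fixed for one allele at a site that is polymorphic in the population has Fws near 1, and a sample as diverse as the population has Fws near 0.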
- iSNV_table
Convert VCF iSNV data to tabular text
usage: intrahost.py iSNV_table [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inVcf outFile
- Positional arguments:
inVcf  Input VCF file.
outFile  Output text file.
- Options:
--loglevel=DEBUG  Verboseness of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
- iSNP_per_patient
Aggregate tabular iSNP data per patient x position (all time points averaged)
usage: intrahost.py iSNP_per_patient [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFile outFile
- Positional arguments:
inFile  Input text file.
outFile  Output text file.
- Options:
--loglevel=DEBUG  Verboseness of output (default: DEBUG).
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
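The aggregation described for iSNP_per_patient (one value per patient and position, averaged over time points) can be sketched as below. The row layout (patient, chrom, pos, freq) and the function name are assumptions for illustration; the actual columns of the tabular iSNV file are not specified here.

```python
from collections import defaultdict


def average_per_patient(rows):
    """Average allele frequency per (patient, chrom, pos).

    rows: iterable of (patient, chrom, pos, freq) tuples, one per
    time point. This layout is hypothetical.
    Returns a dict mapping (patient, chrom, pos) to the mean freq.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for patient, chrom, pos, freq in rows:
        acc = sums[(patient, chrom, pos)]
        acc[0] += freq
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```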