2.5. read_utils.py - utilities that manipulate bam and fastq filesΒΆ
Utilities for working with sequence reads, such as converting formats and fixing mate pairs.
usage: read_utils.py subcommand
- Sub-commands:
- purge_unmated
Undocumented
usage: read_utils.py purge_unmated [-h] [--regex REGEX] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads. inFastq2 Input fastq file; 2nd end of paired-end reads. outFastq1 Output fastq file; 1st end of paired-end reads. outFastq2 Output fastq file; 2nd end of paired-end reads. - Options:
--regex=^@(\S+)/[1|2]$ Perl regular expression to parse paired read IDs (default: %(default)s) --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- fastq_to_fasta
Undocumented
usage: read_utils.py fastq_to_fasta [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq outFasta
- Positional arguments:
inFastq Input fastq file. outFasta Output fasta file. - Options:
--loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- index_fasta_samtools
Undocumented
usage: read_utils.py index_fasta_samtools [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] inFasta
- Positional arguments:
inFasta Reference genome, FASTA format. - Options:
--loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
- index_fasta_picard
Undocumented
usage: read_utils.py index_fasta_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFasta
- Positional arguments:
inFasta Input reference genome, FASTA format. - Options:
--JVMmemory=512m JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s CreateSequenceDictionary, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- mkdup_picard
Undocumented
usage: read_utils.py mkdup_picard [-h] [--outMetrics OUTMETRICS] [--remove] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBams [inBams ...] outBam
- Positional arguments:
inBams Input reads, BAM format. outBam Output reads, BAM format. - Options:
--outMetrics Output metrics file. Default is to dump to a temp file. --remove=False Instead of marking duplicates, remove them entirely (default: %(default)s) --JVMmemory=2g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- revert_bam_picard
Undocumented
usage: read_utils.py revert_bam_picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format. outBam Output reads, BAM format. - Options:
--JVMmemory=2g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s RevertSam, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- picard
Undocumented
usage: read_utils.py picard [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] command
- Positional arguments:
command picard command - Options:
--JVMmemory=2g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- sort_bam
Undocumented
usage: read_utils.py sort_bam [-h] [--index] [--md5] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam {unsorted,queryname,coordinate}
- Positional arguments:
inBam Input bam file. outBam Output bam file, sorted. sortOrder How to sort the reads. [default: %(default)s]
Possible choices: unsorted, queryname, coordinate
- Options:
--index=False Index outBam (default: %(default)s) --md5=False MD5 checksum outBam (default: %(default)s) --JVMmemory=2g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s SortSam, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- merge_bams
Undocumented
usage: read_utils.py merge_bams [-h] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBams [inBams ...] outBam
- Positional arguments:
inBams Input bam files. outBam Output bam file. - Options:
--JVMmemory=2g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s MergeSamFiles, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_bam
Undocumented
usage: read_utils.py filter_bam [-h] [--exclude] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam readList outBam
- Positional arguments:
inBam Input bam file. readList Input file of read IDs. outBam Output bam file. - Options:
--exclude=False If specified, readList is a list of reads to remove from input. Default behavior is to treat readList as an inclusion list (all unnamed reads are removed). --JVMmemory=4g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s FilterSamReads, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- bam_to_fastq
Undocumented
usage: read_utils.py bam_to_fastq [-h] [--outHeader OUTHEADER] [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outFastq1 outFastq2
- Positional arguments:
inBam Input bam file. outFastq1 Output fastq file; 1st end of paired-end reads. outFastq2 Output fastq file; 2nd end of paired-end reads. - Options:
--outHeader Optional text file name that will receive bam header. --JVMmemory=2g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s SamToFastq, OPTIONNAME=value ... --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- fastq_to_bam
Undocumented
usage: read_utils.py fastq_to_bam [-h] (--sampleName SAMPLENAME | --header HEADER) [--JVMmemory JVMMEMORY] [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 outBam
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads. inFastq2 Input fastq file; 2nd end of paired-end reads. outBam Output bam file. - Options:
--sampleName Sample name to insert into the read group header. --header Optional text file containing header. --JVMmemory=2g JVM virtual memory size (default: %(default)s) --picardOptions=[] Optional arguments to Picard’s FastqToSam, OPTIONNAME=value ... Note that header-related options will be overwritten by HEADER if present. --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- split_reads
Undocumented
usage: read_utils.py split_reads [-h] [--maxReads MAXREADS | --numChunks NUMCHUNKS] [--indexLen INDEXLEN] [--format {fastq,fasta}] [--outSuffix OUTSUFFIX] inFileName outPrefix
- Positional arguments:
inFileName Input fastq or fasta file. outPrefix Output files will be named ${outPrefix}01${outSuffix}, ${outPrefix}02${outSuffix}... - Options:
--maxReads Maximum number of reads per chunk (default 1000 if neither maxReads nor numChunks is specified). --numChunks Number of output files, if maxReads is not specified. --indexLen=2 Number of characters to append to outputPrefix for each output file (default %(default)s). Number of files must not exceed 10^INDEXLEN. --format=fastq Input fastq or fasta file (default: %(default)s).
Possible choices: fastq, fasta
--outSuffix= Output filename suffix (e.g. .fastq or .fastq.gz). A suffix ending in .gz will cause the output file to be gzip compressed. Default is no suffix.
- split_bam
Undocumented
usage: read_utils.py split_bam [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBams [outBams ...]
- Positional arguments:
inBam Input BAM file. outBams Output BAM files - Options:
--loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_mvicuna_bam
Undocumented
usage: read_utils.py rmdup_mvicuna_bam [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam
- Positional arguments:
inBam Input reads, BAM format. outBam Output reads, BAM format. - Options:
--JVMmemory=4g JVM virtual memory size (default: %(default)s) --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- dup_remove_mvicuna
Undocumented
usage: read_utils.py dup_remove_mvicuna [-h] [--unpairedOutFastq UNPAIREDOUTFASTQ] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 pairedOutFastq1 pairedOutFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads. inFastq2 Input fastq file; 2nd end of paired-end reads. pairedOutFastq1 Output fastq file; 1st end of paired-end reads. pairedOutFastq2 Output fastq file; 2nd end of paired-end reads. - Options:
--unpairedOutFastq File name of output unpaired reads --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- rmdup_prinseq_fastq
Undocumented
usage: read_utils.py rmdup_prinseq_fastq [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inFastq1 inFastq2 outFastq1 outFastq2
- Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads. inFastq2 Input fastq file; 2nd end of paired-end reads. outFastq1 Output fastq file; 1st end of paired-end reads. outFastq2 Output fastq file; 2nd end of paired-end reads. - Options:
--loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- filter_bam_mapped_only
Undocumented
usage: read_utils.py filter_bam_mapped_only [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam outBam
- Positional arguments:
inBam Input aligned reads, BAM format. outBam Output sorted indexed reads, filtered to aligned-only, BAM format. - Options:
--JVMmemory=2g JVM virtual memory size (default: %(default)s) --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- novoalign
Undocumented
usage: read_utils.py novoalign [-h] [--options OPTIONS] [--min_qual MIN_QUAL] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format. refFasta Reference genome, FASTA format, pre-indexed by Novoindex. outBam Output reads, BAM format (aligned). - Options:
--options=-r Random Novoalign options (default: %(default)s) --min_qual=0 Filter outBam to minimum mapping quality (default: %(default)s) --JVMmemory=2g JVM virtual memory size (default: %(default)s) --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- novoindex
Undocumented
usage: read_utils.py novoindex [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] refFasta
- Positional arguments:
refFasta Reference genome, FASTA format. - Options:
--loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
- gatk_ug
Undocumented
usage: read_utils.py gatk_ug [-h] [--options OPTIONS] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refFasta outVcf
- Positional arguments:
inBam Input reads, BAM format. refFasta Reference genome, FASTA format, pre-indexed by Picard. outVcf Output calls in VCF format. If this filename ends with .gz, GATK will BGZIP compress the output and produce a Tabix index file as well. - Options:
--options=--min_base_quality_score 15 -ploidy 4 UnifiedGenotyper options (default: %(default)s) --JVMmemory=2g JVM virtual memory size (default: %(default)s) --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
- gatk_realign
Undocumented
usage: read_utils.py gatk_realign [-h] [--JVMmemory JVMMEMORY] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}] [--version] [--tmpDir TMPDIR] [--tmpDirKeep] inBam refFasta outBam
- Positional arguments:
inBam Input reads, BAM format, aligned to refFasta. refFasta Reference genome, FASTA format, pre-indexed by Picard. outBam Realigned reads. - Options:
--JVMmemory=2g JVM virtual memory size (default: %(default)s) --loglevel=DEBUG Verboseness of output. [default: %(default)s]
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit --tmpDir=/tmp Base directory for temp files. [default: %(default)s] --tmpDirKeep=False Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.