2.5. read_utils.py - utilities that manipulate bam and fastq files

Utilities for working with sequence reads, such as converting formats and fixing mate pairs.

usage: read_utils.py subcommand
Sub-commands:
purge_unmated

Undocumented

usage: read_utils.py purge_unmated [-h] [--regex REGEX]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmpDir TMPDIR]
                                   [--tmpDirKeep]
                                   inFastq1 inFastq2 outFastq1 outFastq2
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
Options:
--regex=^@(\S+)/[1|2]$
 Perl regular expression to parse paired read IDs (default: ^@(\S+)/[1|2]$)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
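
As an illustration of what the default --regex captures, here is a minimal Python sketch. It is not the tool's actual implementation; mate_key and mated_ids are hypothetical helper names, and the sketch only looks at header lines. Note that the character class [1|2] in the default pattern also accepts a literal | as the mate suffix.

```python
import re

# Default value of --regex as listed in the options above.
DEFAULT_REGEX = r'^@(\S+)/[1|2]$'

def mate_key(header_line, pattern=DEFAULT_REGEX):
    """Return the read ID shared by both mates, or None if the header does not match."""
    m = re.match(pattern, header_line.strip())
    return m.group(1) if m else None

def mated_ids(headers1, headers2):
    """Return the set of read IDs present in both lists of FASTQ header lines."""
    ids1 = {mate_key(h) for h in headers1}
    ids2 = {mate_key(h) for h in headers2}
    return (ids1 & ids2) - {None}

print(mate_key('@read7/1'))
print(sorted(mated_ids(['@a/1', '@b/1'], ['@b/2', '@c/2'])))
```

Headers that lack the /1 or /2 suffix do not match the default pattern; in that case a different --regex would be needed.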
fastq_to_fasta

Undocumented

usage: read_utils.py fastq_to_fasta [-h]
                                    [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                    [--version] [--tmpDir TMPDIR]
                                    [--tmpDirKeep]
                                    inFastq outFasta
Positional arguments:
inFastq Input fastq file.
outFasta Output fasta file.
Options:
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
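
The conversion this sub-command performs can be pictured with the following Python sketch. It assumes strict four-line FASTQ records and is not the code read_utils.py itself runs; fastq_to_fasta_records is a hypothetical name.

```python
import io

def fastq_to_fasta_records(in_handle, out_handle):
    """Write the ID and sequence of each 4-line FASTQ record as FASTA; return the record count."""
    count = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of input
        seq = in_handle.readline()
        in_handle.readline()  # '+' separator line, discarded
        in_handle.readline()  # quality line, discarded
        # Replace the leading '@' of the FASTQ header with the FASTA '>' marker.
        out_handle.write('>' + header[1:].rstrip('\n') + '\n' + seq.rstrip('\n') + '\n')
        count += 1
    return count

fastq = io.StringIO('@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n')
fasta = io.StringIO()
n = fastq_to_fasta_records(fastq, fasta)
print(n)
print(fasta.getvalue(), end='')
```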
index_fasta_samtools

Undocumented

usage: read_utils.py index_fasta_samtools [-h]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version]
                                          inFasta
Positional arguments:
inFasta Reference genome, FASTA format.
Options:
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
index_fasta_picard

Undocumented

usage: read_utils.py index_fasta_picard [-h] [--JVMmemory JVMMEMORY]
                                        [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmpDir TMPDIR]
                                        [--tmpDirKeep]
                                        inFasta
Positional arguments:
inFasta Input reference genome, FASTA format.
Options:
--JVMmemory=512m
 JVM virtual memory size (default: 512m)
--picardOptions=[]
 Optional arguments to Picard’s CreateSequenceDictionary, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
mkdup_picard

Undocumented

usage: read_utils.py mkdup_picard [-h] [--outMetrics OUTMETRICS] [--remove]
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                  inBams [inBams ...] outBam
Positional arguments:
inBams Input reads, BAM format.
outBam Output reads, BAM format.
Options:
--outMetrics Output metrics file. Default is to dump to a temp file.
--remove=False Instead of marking duplicates, remove them entirely (default: False)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
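
The usage line above can be assembled programmatically as in the sketch below. build_mkdup_argv is a hypothetical helper, not part of read_utils.py, and VALIDATION_STRINGENCY=LENIENT is only an example of a Picard OPTIONNAME=value pair. Because --picardOptions accepts a variable number of values, argparse can also consume positional arguments that follow it, so the sketch places it last.

```python
def build_mkdup_argv(in_bams, out_bam, remove=False, jvm_memory=None, picard_options=()):
    """Build an argument list for the mkdup_picard sub-command (illustrative only)."""
    argv = ['read_utils.py', 'mkdup_picard']
    if remove:
        argv.append('--remove')
    if jvm_memory:
        argv += ['--JVMmemory', jvm_memory]
    # Positional arguments: one or more input BAMs, then the output BAM.
    argv += list(in_bams) + [out_bam]
    # Variable-length option goes last so it cannot swallow the positionals.
    if picard_options:
        argv += ['--picardOptions'] + list(picard_options)
    return argv

print(' '.join(build_mkdup_argv(['a.bam', 'b.bam'], 'out.bam', remove=True,
                                picard_options=['VALIDATION_STRINGENCY=LENIENT'])))
```

The resulting list could be passed to subprocess.run, assuming read_utils.py is on the PATH.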
revert_bam_picard

Undocumented

usage: read_utils.py revert_bam_picard [-h] [--JVMmemory JVMMEMORY]
                                       [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmpDir TMPDIR]
                                       [--tmpDirKeep]
                                       inBam outBam
Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s RevertSam, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
picard

Undocumented

usage: read_utils.py picard [-h] [--JVMmemory JVMMEMORY]
                            [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                            command
Positional arguments:
command Picard command to run.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
sort_bam

Undocumented

usage: read_utils.py sort_bam [-h] [--index] [--md5] [--JVMmemory JVMMEMORY]
                              [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                              inBam outBam {unsorted,queryname,coordinate}
Positional arguments:
inBam Input bam file.
outBam Output bam file, sorted.
sortOrder
 How to sort the reads. Possible choices: unsorted, queryname, coordinate
Options:
--index=False Index outBam (default: False)
--md5=False MD5 checksum outBam (default: False)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s SortSam, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
merge_bams

Undocumented

usage: read_utils.py merge_bams [-h] [--JVMmemory JVMMEMORY]
                                [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                inBams [inBams ...] outBam
Positional arguments:
inBams Input bam files.
outBam Output bam file.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s MergeSamFiles, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
filter_bam

Undocumented

usage: read_utils.py filter_bam [-h] [--exclude] [--JVMmemory JVMMEMORY]
                                [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                inBam readList outBam
Positional arguments:
inBam Input bam file.
readList Input file of read IDs.
outBam Output bam file.
Options:
--exclude=False
 If specified, readList is a list of reads to remove from input. Default behavior is to treat readList as an inclusion list (all unnamed reads are removed).
--JVMmemory=4g JVM virtual memory size (default: 4g)
--picardOptions=[]
 Optional arguments to Picard’s FilterSamReads, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
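
The effect of --exclude on readList can be summarised with the Python sketch below. It operates on plain lists of read IDs rather than BAM records, and filter_reads is a hypothetical name used only for illustration; the real sub-command delegates the work to Picard’s FilterSamReads.

```python
def filter_reads(read_ids, read_list, exclude=False):
    """Keep reads named in read_list (default) or drop them (exclude=True)."""
    listed = set(read_list)
    if exclude:
        # --exclude given: readList names the reads to remove.
        return [r for r in read_ids if r not in listed]
    # Default: readList is an inclusion list; unnamed reads are removed.
    return [r for r in read_ids if r in listed]

reads = ['r1', 'r2', 'r3']
print(filter_reads(reads, ['r2']))
print(filter_reads(reads, ['r2'], exclude=True))
```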
bam_to_fastq

Undocumented

usage: read_utils.py bam_to_fastq [-h] [--outHeader OUTHEADER]
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                  inBam outFastq1 outFastq2
Positional arguments:
inBam Input bam file.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
Options:
--outHeader Optional text file name that will receive bam header.
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s SamToFastq, OPTIONNAME=value ...
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
fastq_to_bam

Undocumented

usage: read_utils.py fastq_to_bam [-h]
                                  (--sampleName SAMPLENAME | --header HEADER)
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                  inFastq1 inFastq2 outBam
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outBam Output bam file.
Options:
--sampleName Sample name to insert into the read group header.
--header Optional text file containing header.
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s FastqToSam, OPTIONNAME=value ... Note that header-related options will be overwritten by HEADER if present.
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
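
The usage line shows (--sampleName SAMPLENAME | --header HEADER), meaning exactly one of the two must be given. The sketch below reproduces that constraint with a stand-alone argparse parser. It is an assumption about how such a parser could be declared, not a copy of the read_utils.py source, and the program name fastq_to_bam_sketch is made up.

```python
import argparse

def make_parser():
    """Build a parser mirroring the required either/or options of fastq_to_bam."""
    parser = argparse.ArgumentParser(prog='fastq_to_bam_sketch')
    # required=True: one of the two options must be supplied, but never both.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--sampleName')
    group.add_argument('--header')
    parser.add_argument('inFastq1')
    parser.add_argument('inFastq2')
    parser.add_argument('outBam')
    return parser

args = make_parser().parse_args(['--sampleName', 'S1', 'r1.fq', 'r2.fq', 'out.bam'])
print(args.sampleName, args.header)
```

Passing both options, or neither, makes argparse print a usage error and exit.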
split_reads

Undocumented

usage: read_utils.py split_reads [-h]
                                 [--maxReads MAXREADS | --numChunks NUMCHUNKS]
                                 [--indexLen INDEXLEN]
                                 [--format {fastq,fasta}]
                                 [--outSuffix OUTSUFFIX]
                                 inFileName outPrefix
Positional arguments:
inFileName Input fastq or fasta file.
outPrefix Output files will be named ${outPrefix}01${outSuffix}, ${outPrefix}02${outSuffix}...
Options:
--maxReads Maximum number of reads per chunk (default 1000 if neither maxReads nor numChunks is specified).
--numChunks Number of output files, if maxReads is not specified.
--indexLen=2 Number of characters to append to outPrefix for each output file (default: 2). Number of files must not exceed 10^INDEXLEN.
--format=fastq
 Format of the input file (default: fastq). Possible choices: fastq, fasta

--outSuffix= Output filename suffix (e.g. .fastq or .fastq.gz). A suffix ending in .gz will cause the output file to be gzip compressed. Default is no suffix.
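
The naming scheme and chunk arithmetic described above can be sketched as follows. This is an illustration under the documented defaults, not the tool's own code; chunk_names and num_chunks_for are hypothetical helpers.

```python
def num_chunks_for(total_reads, max_reads=1000):
    """Number of output files needed when each holds at most max_reads reads."""
    return -(-total_reads // max_reads)  # ceiling division

def chunk_names(out_prefix, num_files, index_len=2, out_suffix=''):
    """Return output names: prefix, zero-padded 1-based index, then suffix."""
    if num_files > 10 ** index_len:
        raise ValueError('number of files must not exceed 10^indexLen')
    return [f'{out_prefix}{i:0{index_len}d}{out_suffix}' for i in range(1, num_files + 1)]

n = num_chunks_for(2500)
print(n)
print(chunk_names('out.', n, out_suffix='.fastq'))
```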
split_bam

Undocumented

usage: read_utils.py split_bam [-h]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                               inBam outBams [outBams ...]
Positional arguments:
inBam Input BAM file.
outBams Output BAM files.
Options:
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
rmdup_mvicuna_bam

Undocumented

usage: read_utils.py rmdup_mvicuna_bam [-h] [--JVMmemory JVMMEMORY]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmpDir TMPDIR]
                                       [--tmpDirKeep]
                                       inBam outBam
Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
Options:
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
dup_remove_mvicuna

Undocumented

usage: read_utils.py dup_remove_mvicuna [-h]
                                        [--unpairedOutFastq UNPAIREDOUTFASTQ]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmpDir TMPDIR]
                                        [--tmpDirKeep]
                                        inFastq1 inFastq2 pairedOutFastq1
                                        pairedOutFastq2
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
pairedOutFastq1
 Output fastq file; 1st end of paired-end reads.
pairedOutFastq2
 Output fastq file; 2nd end of paired-end reads.
Options:
--unpairedOutFastq
 File name of output unpaired reads
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
rmdup_prinseq_fastq

Undocumented

usage: read_utils.py rmdup_prinseq_fastq [-h]
                                         [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                         [--version] [--tmpDir TMPDIR]
                                         [--tmpDirKeep]
                                         inFastq1 inFastq2 outFastq1 outFastq2
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
Options:
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
filter_bam_mapped_only

Undocumented

usage: read_utils.py filter_bam_mapped_only [-h] [--JVMmemory JVMMEMORY]
                                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                            [--version] [--tmpDir TMPDIR]
                                            [--tmpDirKeep]
                                            inBam outBam
Positional arguments:
inBam Input aligned reads, BAM format.
outBam Output sorted indexed reads, filtered to aligned-only, BAM format.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
novoalign

Undocumented

usage: read_utils.py novoalign [-h] [--options OPTIONS] [--min_qual MIN_QUAL]
                               [--JVMmemory JVMMEMORY]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                               inBam refFasta outBam
Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Novoindex.
outBam Output reads, BAM format (aligned).
Options:
--options=-r Random
 Novoalign options (default: -r Random)
--min_qual=0 Filter outBam to minimum mapping quality (default: 0)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
novoindex

Undocumented

usage: read_utils.py novoindex [-h]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version]
                               refFasta
Positional arguments:
refFasta Reference genome, FASTA format.
Options:
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
gatk_ug

Undocumented

usage: read_utils.py gatk_ug [-h] [--options OPTIONS] [--JVMmemory JVMMEMORY]
                             [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                             [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                             inBam refFasta outVcf
Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outVcf Output calls in VCF format. If this filename ends with .gz, GATK will BGZIP compress the output and produce a Tabix index file as well.
Options:
--options=--min_base_quality_score 15 -ploidy 4
 UnifiedGenotyper options (default: --min_base_quality_score 15 -ploidy 4)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
gatk_realign

Undocumented

usage: read_utils.py gatk_realign [-h] [--JVMmemory JVMMEMORY]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                  inBam refFasta outBam
Positional arguments:
inBam Input reads, BAM format, aligned to refFasta.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outBam Realigned reads.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V Show program’s version number and exit.
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.