3.6. read_utils.py - utilities that manipulate bam and fastq files

Utilities for working with sequence reads, such as converting formats and fixing mate pairs.

usage: read_utils.py subcommand
Sub-commands:
purge_unmated

Use mergeShuffledFastqSeqs to purge unmated reads, and put corresponding reads in the same order. Corresponding sequences must have sequence identifiers of the form SEQID/1 and SEQID/2.

usage: read_utils.py purge_unmated [-h] [--regex REGEX]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   inFastq1 inFastq2 outFastq1 outFastq2
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
Options:
--regex=^@(\S+)/[1|2]$
 Perl regular expression to parse paired read IDs (default: ^@(\S+)/[1|2]$)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
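The pairing rule can be illustrated in Python. This is a minimal sketch under stated assumptions: the real subcommand delegates to mergeShuffledFastqSeqs, while this version holds FASTQ lines in memory and keeps the order of the first file. The helper names `parse_fastq`, `index_by_id` and `purge_unmated` are hypothetical. It does reuse the default `--regex` shown above. Note that `[1|2]` is a character class, so it also accepts a literal `|` as the mate suffix.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Default --regex from the usage above, kept unchanged. "[1|2]" is a character
# class, so "@X/|" also matches.
READ_ID_RE = re.compile(r"^@(\S+)/[1|2]$")

Record = Tuple[str, ...]  # (header, sequence, "+", qualities)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; a trailing partial record is ignored."""
    for i in range(0, len(lines) - 3, 4):
        yield tuple(lines[i:i + 4])


def index_by_id(lines: List[str]) -> Dict[str, Record]:
    """Map each read ID (header without the /1 or /2 suffix) to its record."""
    records: Dict[str, Record] = {}
    for rec in parse_fastq(lines):
        match = READ_ID_RE.match(rec[0])
        if match:
            records[match.group(1)] = rec
    return records


def purge_unmated(fq1: List[str], fq2: List[str]) -> Tuple[List[Record], List[Record]]:
    """Keep only reads whose mate is present, in the order of the first file."""
    mates1, mates2 = index_by_id(fq1), index_by_id(fq2)
    shared = [read_id for read_id in mates1 if read_id in mates2]
    return [mates1[r] for r in shared], [mates2[r] for r in shared]


fq1 = ["@A/1", "ACGT", "+", "IIII", "@B/1", "GGGG", "+", "IIII", "@C/1", "TTTT", "+", "IIII"]
fq2 = ["@C/2", "AAAA", "+", "IIII", "@A/2", "CCCC", "+", "IIII", "@D/2", "GGTT", "+", "IIII"]
out1, out2 = purge_unmated(fq1, fq2)
print([r[0] for r in out1], [r[0] for r in out2])
```

Reads B and D have no mate, so both are dropped. Both outputs list A before C.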
index_fasta_samtools

Index a reference genome for Samtools.

usage: read_utils.py index_fasta_samtools [-h]
                                          [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                          [--version]
                                          inFasta
Positional arguments:
inFasta Reference genome, FASTA format.
Options:
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
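A Samtools FASTA index (`.fai`) has five columns per sequence: NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH. The sketch below assumes the subcommand produces the standard `samtools faidx` index. The helper `fai_entries` is hypothetical, and unlike samtools it does not check that every sequence line has the same width.

```python
from typing import List, Tuple

FaiEntry = Tuple[str, int, int, int, int]


def fai_entries(fasta: bytes) -> List[FaiEntry]:
    """Compute .fai rows: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH."""
    entries: List[list] = []
    pos = 0  # byte offset of the current line within the file
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            # OFFSET is the byte position of the first base after the header line.
            entries.append([name, 0, pos + len(line), 0, 0])
        elif entries:
            current = entries[-1]
            bases = len(line.rstrip(b"\r\n"))
            if current[3] == 0:
                current[3] = bases      # bases per line
                current[4] = len(line)  # bytes per line, newline included
            current[1] += bases
        pos += len(line)
    return [tuple(e) for e in entries]


for row in fai_entries(b">chr1 desc\nACGT\nAC\n>chr2\nGG\n"):
    print("\t".join(map(str, row)))
```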
index_fasta_picard

Create an index file for a reference genome suitable for Picard/GATK.

usage: read_utils.py index_fasta_picard [-h] [--JVMmemory JVMMEMORY]
                                        [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmp_dir TMP_DIR]
                                        [--tmp_dirKeep]
                                        inFasta
Positional arguments:
inFasta Input reference genome, FASTA format.
Options:
--JVMmemory=512m
 JVM virtual memory size (default: 512m)
--picardOptions=[]
 Optional arguments to Picard’s CreateSequenceDictionary, OPTIONNAME=value ...
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
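CreateSequenceDictionary writes a SAM header with one `@SQ` line per reference sequence. The sketch below builds only the SN (name), LN (length) and M5 (MD5) tags. Picard also writes `@HD` and a UR tag, which are omitted here. The helpers `read_fasta` and `sq_lines` are hypothetical.

```python
import hashlib
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, using the first word of each header."""
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.strip())
    return {n: "".join(parts) for n, parts in seqs.items()}


def sq_lines(text: str) -> List[str]:
    """Build @SQ header lines carrying SN, LN and M5 tags."""
    lines = []
    for name, seq in read_fasta(text).items():
        # M5 is the MD5 of the upper-cased sequence with whitespace removed.
        md5 = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        lines.append(f"@SQ\tSN:{name}\tLN:{len(seq)}\tM5:{md5}")
    return lines


print("\n".join(sq_lines(">chr1\nACGT\nac\n>chr2\nGG\n")))
```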
mkdup_picard

Mark or remove duplicate reads from BAM file.

usage: read_utils.py mkdup_picard [-h] [--outMetrics OUTMETRICS] [--remove]
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inBams [inBams ...] outBam
Positional arguments:
inBams Input reads, BAM format.
outBam Output reads, BAM format.
Options:
--outMetrics Output metrics file. Default is to dump to a temp file.
--remove=False Instead of marking duplicates, remove them entirely (default: False)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s MarkDuplicates, OPTIONNAME=value ...
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
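The difference between marking and removing duplicates (`--remove`) can be shown with a deliberately simplified, single-end model. This is not Picard's algorithm: MarkDuplicates also handles read pairs, soft clipping and optical duplicates. Here reads sharing a reference, 5' position and strand form a group, and the read with the highest summed base quality stays unmarked. The dict fields `ref`, `pos5`, `reverse`, `quals` and `flag` are a hypothetical in-memory representation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DUPLICATE_FLAG = 0x400  # SAM flag bit 1024: PCR or optical duplicate


def mark_duplicates(reads: List[dict], remove: bool = False) -> List[dict]:
    """Flag (or drop, if remove=True) all but the best read at each 5' position and strand."""
    groups: Dict[Tuple[str, int, bool], List[dict]] = defaultdict(list)
    for read in reads:
        groups[(read["ref"], read["pos5"], read["reverse"])].append(read)
    result = []
    for group in groups.values():
        best = max(group, key=lambda r: sum(r["quals"]))
        for read in group:
            if read is best:
                result.append(read)
            elif not remove:
                result.append({**read, "flag": read["flag"] | DUPLICATE_FLAG})
    return result


reads = [
    {"name": "r1", "ref": "chr1", "pos5": 100, "reverse": False, "quals": [30, 30], "flag": 0},
    {"name": "r2", "ref": "chr1", "pos5": 100, "reverse": False, "quals": [20, 20], "flag": 0},
    {"name": "r3", "ref": "chr1", "pos5": 100, "reverse": True, "quals": [20, 20], "flag": 16},
]
print({r["name"]: r["flag"] for r in mark_duplicates(reads)})
```

Here r2 is flagged as a duplicate of r1. r3 sits on the opposite strand, so it forms its own group.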
revert_bam_picard

Revert BAM to raw reads

usage: read_utils.py revert_bam_picard [-h] [--JVMmemory JVMMEMORY]
                                       [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                       [--clearTags]
                                       [--tagsToClear TAGS_TO_CLEAR [TAGS_TO_CLEAR ...]]
                                       [--doNotSanitize]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       inBam outBam
Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s RevertSam, OPTIONNAME=value ...
--clearTags=False
 When supplying an aligned input file, clear the per-read attribute tags.
--tagsToClear=['XT', 'X0', 'X1', 'XA', 'AM', 'SM', 'BQ', 'CT', 'XN', 'OC', 'OP']
 A space-separated list of tags to remove from all reads in the input BAM file (default: XT X0 X1 XA AM SM BQ CT XN OC OP)
--doNotSanitize=False
 Undocumented
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
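The effect of clearing tags can be pictured on one SAM text record. This is only an illustration: the subcommand itself works on BAM files through Picard's RevertSam. The sketch drops optional `TAG:TYPE:VALUE` fields whose tag appears in the default `--tagsToClear` list and leaves the 11 mandatory SAM columns alone. The helper `clear_tags` is hypothetical.

```python
from typing import Iterable

# Default --tagsToClear list from the option above.
DEFAULT_TAGS_TO_CLEAR = ["XT", "X0", "X1", "XA", "AM", "SM", "BQ", "CT", "XN", "OC", "OP"]


def clear_tags(sam_line: str, tags: Iterable[str] = DEFAULT_TAGS_TO_CLEAR) -> str:
    """Drop optional TAG:TYPE:VALUE fields whose tag is listed in `tags`."""
    fields = sam_line.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]  # SAM has 11 mandatory columns
    drop = set(tags)
    kept = [f for f in optional if f.split(":", 1)[0] not in drop]
    return "\t".join(mandatory + kept)


line = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tRG:Z:grp1\tX0:i:1"
print(clear_tags(line))
```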
picard

Generic Picard runner.

usage: read_utils.py picard [-h] [--JVMmemory JVMMEMORY]
                            [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                            command
Positional arguments:
command Picard command to run.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard, OPTIONNAME=value ...
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
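The `OPTIONNAME=value` convention for `--picardOptions` can be checked and assembled into a Java command line as sketched below. Everything here besides that convention is an assumption: the helper `picard_argv`, the `picard.jar` path, and the mapping of `--JVMmemory` to Java's `-Xmx` maximum-heap flag are illustrative and are not taken from this tool's implementation.

```python
import shlex
from typing import List, Sequence


def picard_argv(command: str, picard_options: Sequence[str] = (),
                jvm_memory: str = "2g", jar: str = "picard.jar") -> List[str]:
    """Assemble a hypothetical java command line for one Picard tool."""
    for opt in picard_options:
        name, sep, _ = opt.partition("=")
        if not sep or not name:
            raise ValueError(f"expected OPTIONNAME=value, got {opt!r}")
    # Assumption: the JVM memory setting becomes the maximum heap size (-Xmx).
    return ["java", f"-Xmx{jvm_memory}", "-jar", jar, command, *picard_options]


argv = picard_argv("ValidateSamFile", ["INPUT=in.bam", "MODE=SUMMARY"])
print(shlex.join(argv))
```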
sort_bam

Sort BAM file

usage: read_utils.py sort_bam [-h] [--index] [--md5] [--JVMmemory JVMMEMORY]
                              [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                              inBam outBam {unsorted,queryname,coordinate}
Positional arguments:
inBam Input bam file.
outBam Output bam file, sorted.
sortOrder How to sort the reads. Possible choices: unsorted, queryname, coordinate.
Options:
--index=False Index outBam (default: False)
--md5=False MD5 checksum outBam (default: False)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s SortSam, OPTIONNAME=value ...
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
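The three `sortOrder` choices differ as follows:

- `queryname` groups records by read name, which keeps mates together.
- `coordinate` orders records by reference (in header order) and then by position.
- `unsorted` leaves records as they are.

The sketch below is simplified. It uses plain string comparison for names, which Picard's own queryname comparator may not match exactly. The record dicts and `ref_order` mapping are hypothetical.

```python
from typing import Dict, List


def sort_reads(reads: List[dict], order: str, ref_order: Dict[str, int]) -> List[dict]:
    """Return reads in 'unsorted', 'queryname' or 'coordinate' order."""
    if order == "unsorted":
        return list(reads)
    if order == "queryname":
        # Simplified: plain string order on read names.
        return sorted(reads, key=lambda r: r["name"])
    if order == "coordinate":
        # Reference order follows the header's @SQ lines; unmapped ("*") reads go last.
        unmapped = len(ref_order)
        return sorted(reads, key=lambda r: (ref_order.get(r["ref"], unmapped), r["pos"]))
    raise ValueError(f"unknown sort order: {order}")


refs = {"chr1": 0, "chr2": 1}
reads = [
    {"name": "b", "ref": "chr2", "pos": 5},
    {"name": "a", "ref": "*", "pos": 0},
    {"name": "c", "ref": "chr1", "pos": 50},
    {"name": "d", "ref": "chr1", "pos": 10},
]
print([r["name"] for r in sort_reads(reads, "coordinate", refs)])
```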
downsample_bams

Downsample multiple bam files to the smallest read count in common, or to the specified count.

usage: read_utils.py downsample_bams [-h] [--outPath OUT_PATH]
                                     [--readCount SPECIFIED_READ_COUNT]
                                     [--deduplicateBefore | --deduplicateAfter]
                                     [--JVMmemory JVMMEMORY]
                                     [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                     [--threads THREADS]
                                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                     [--version] [--tmp_dir TMP_DIR]
                                     [--tmp_dirKeep]
                                     in_bams [in_bams ...]
Positional arguments:
in_bams Input bam files.
Options:
--outPath Output path. If not provided, downsampled BAM files will be written alongside each source BAM file.
--readCount The number of reads to downsample to.
--deduplicateBefore=False
 De-duplicate reads before downsampling. Mutually exclusive with --deduplicateAfter.
--deduplicateAfter=False
 De-duplicate reads after downsampling. Mutually exclusive with --deduplicateBefore.
--JVMmemory=4g JVM virtual memory size (default: 4g)
--picardOptions=[]
 Optional arguments to Picard’s DownsampleSam, OPTIONNAME=value ...
--threads Number of threads (default: all available cores)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
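Downsampling "to the smallest read count in common" can be expressed as a per-file keep probability: the target count divided by the file's own count. This sketch assumes that approach. Picard's DownsampleSam does take a probability, but how this subcommand derives it is not stated above. Because sampling is random, the resulting counts are only approximately equal. The helper `downsample_probabilities` is hypothetical.

```python
from typing import Dict, Optional


def downsample_probabilities(read_counts: Dict[str, int],
                             read_count: Optional[int] = None) -> Dict[str, float]:
    """Fraction of reads to keep from each BAM so that all end near the same count."""
    # Without --readCount, the target is the smallest count among the inputs.
    target = read_count if read_count is not None else min(read_counts.values())
    # Files already at or below the target are kept whole (probability 1.0).
    return {path: (min(1.0, target / n) if n else 1.0) for path, n in read_counts.items()}


print(downsample_probabilities({"a.bam": 1000, "b.bam": 250, "c.bam": 500}))
```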
merge_bams

Merge multiple BAMs into one

usage: read_utils.py merge_bams [-h] [--JVMmemory JVMMEMORY]
                                [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                [--tmp_dirKeep]
                                inBams [inBams ...] outBam
Positional arguments:
inBams Input bam files.
outBam Output bam file.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s MergeSamFiles, OPTIONNAME=value ...
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
filter_bam

Filter BAM file by read name

usage: read_utils.py filter_bam [-h] [--exclude] [--JVMmemory JVMMEMORY]
                                [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                [--tmp_dirKeep]
                                inBam readList outBam
Positional arguments:
inBam Input bam file.
readList Input file of read IDs.
outBam Output bam file.
Options:
--exclude=False
 If specified, readList is a list of reads to remove from the input. By default, readList is treated as an inclusion list (all reads not named in readList are removed).
--JVMmemory=4g JVM virtual memory size (default: 4g)
--picardOptions=[]
 Optional arguments to Picard’s FilterSamReads, OPTIONNAME=value ...
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
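The include/exclude behaviour of `--exclude` reduces to one membership test per read name. Mates share a read name, so a pair is kept or removed together. This sketch works on a plain list of names; the helper `filter_by_name` is hypothetical.

```python
from typing import Iterable, List, Set


def filter_by_name(read_names: Iterable[str], read_list: Set[str],
                   exclude: bool = False) -> List[str]:
    """Keep reads named in read_list, or drop exactly those when exclude=True."""
    return [name for name in read_names if (name in read_list) != exclude]


names = ["r1", "r2", "r3", "r2"]  # r2 appears twice, as both mates of a pair
print(filter_by_name(names, {"r2"}), filter_by_name(names, {"r2"}, exclude=True))
```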
fastq_to_bam

Convert a pair of fastq paired-end read files and optional text header to a single bam file.

usage: read_utils.py fastq_to_bam [-h]
                                  (--sampleName SAMPLENAME | --header HEADER)
                                  [--JVMmemory JVMMEMORY]
                                  [--picardOptions [PICARDOPTIONS [PICARDOPTIONS ...]]]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inFastq1 inFastq2 outBam
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outBam Output bam file.
Options:
--sampleName Sample name to insert into the read group header.
--header Text file containing the header. Exactly one of --sampleName or --header must be given.
--JVMmemory=2g JVM virtual memory size (default: 2g)
--picardOptions=[]
 Optional arguments to Picard’s FastqToSam, OPTIONNAME=value ... Note that header-related options will be overwritten by HEADER if present.
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
join_paired_fastq

Join paired fastq reads into single reads with Ns

usage: read_utils.py join_paired_fastq [-h] [--outFormat OUTFORMAT]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       output inFastqs [inFastqs ...]
Positional arguments:
output Output file.
inFastqs Input fastq files (2 if paired, 1 if interleaved).
Options:
--outFormat=fastq
 Output file format (default: fastq).
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
split_bam

Split BAM file equally into several output BAM files.

usage: read_utils.py split_bam [-h]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                               inBam outBams [outBams ...]
Positional arguments:
inBam Input BAM file.
outBams Output BAM files.
Options:
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
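Splitting "equally" means the output sizes differ by at most one item. The sketch below shows that arithmetic on a generic sequence. Whether the real subcommand keeps mates together is not stated. If it matters, one option is to group records by read name first and split the groups. The helper `split_evenly` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(items: Sequence[T], n: int) -> List[List[T]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("need at least one output")
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


print(split_evenly(list(range(10)), 3))
```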
reheader_bam

Copy a BAM file (inBam to outBam) while renaming elements of the BAM header. The mapping file lists one (key, old value, new value) mapping per line. For example:

    LB    lib1     lib_one
    SM    sample1  Sample_1
    SM    sample2  Sample_2
    SM    sample3  Sample_3
    CN    broad    BI

usage: read_utils.py reheader_bam [-h]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inBam rgMap outBam
Positional arguments:
inBam Input reads, BAM format.
rgMap Tabular file containing three columns: field, old, new.
outBam Output reads, BAM format.
Options:
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
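Applying an rgMap to header text can be sketched as follows. The sketch assumes the mapping file is tab-separated and that only `@RG` lines are rewritten. The example keys LB, SM and CN are all read-group fields, but the real tool may touch other header lines too. The helpers `parse_rg_map` and `rename_header` are hypothetical.

```python
from typing import List, Tuple

Mapping = Tuple[str, str, str]  # (field, old value, new value)


def parse_rg_map(text: str) -> List[Mapping]:
    """Read a three-column, tab-separated mapping file, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            field, old, new = line.split("\t")
            rows.append((field, old, new))
    return rows


def rename_header(header_lines: List[str], mappings: List[Mapping]) -> List[str]:
    """Rewrite FIELD:old to FIELD:new inside @RG header lines."""
    lookup = {(field, old): new for field, old, new in mappings}
    out = []
    for line in header_lines:
        if line.startswith("@RG"):
            fields = []
            for item in line.split("\t"):
                key, sep, value = item.partition(":")
                new = lookup.get((key, value)) if sep else None
                fields.append(f"{key}:{new}" if new is not None else item)
            line = "\t".join(fields)
        out.append(line)
    return out


mapping = parse_rg_map("LB\tlib1\tlib_one\nSM\tsample1\tSample_1\nCN\tbroad\tBI\n")
header = ["@HD\tVN:1.6", "@RG\tID:rg1\tSM:sample1\tLB:lib1\tCN:broad"]
print(rename_header(header, mapping))
```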
reheader_bams

Copy BAM files while renaming elements of the BAM header. The mapping file lists one (key, old value, new value) mapping per line; rows with the key FN name an input BAM file and the output BAM file to write. For example:

    LB    lib1     lib_one
    SM    sample1  Sample_1
    SM    sample2  Sample_2
    SM    sample3  Sample_3
    CN    broad    BI
    FN    in1.bam  out1.bam
    FN    in2.bam  out2.bam

usage: read_utils.py reheader_bams [-h]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   rgMap
Positional arguments:
rgMap Tabular file containing three columns: field, old, new.
Options:
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
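A reheader_bams mapping file mixes two kinds of rows: FN rows that pair input and output files, and header-field rows. The sketch below separates them, assuming tab-separated columns. The helper `split_reheader_map` is hypothetical.

```python
from typing import Dict, List, Tuple


def split_reheader_map(text: str) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Separate FN (input BAM -> output BAM) rows from header-field rows."""
    files: Dict[str, str] = {}
    fields: List[Tuple[str, str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        key, old, new = line.split("\t")
        if key == "FN":
            files[old] = new
        else:
            fields.append((key, old, new))
    return files, fields


files, fields = split_reheader_map(
    "SM\tsample1\tSample_1\nFN\tin1.bam\tout1.bam\nFN\tin2.bam\tout2.bam\n"
)
print(files, fields)
```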
rmdup_cdhit_bam

Remove duplicate reads from BAM file using cd-hit-dup.

usage: read_utils.py rmdup_cdhit_bam [-h] [--JVMmemory JVM_MEMORY]
                                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                     [--version] [--tmp_dir TMP_DIR]
                                     [--tmp_dirKeep]
                                     inBam outBam
Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
Options:
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
rmdup_mvicuna_bam

Remove duplicate reads from a BAM file using M-Vicuna. The primary advantage of this approach over Picard’s MarkDuplicates tool is that Picard requires input reads to be aligned to a reference, whereas M-Vicuna can operate on unaligned reads.

usage: read_utils.py rmdup_mvicuna_bam [-h] [--JVMmemory JVMMEMORY]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       inBam outBam
Positional arguments:
inBam Input reads, BAM format.
outBam Output reads, BAM format.
Options:
--JVMmemory=4g JVM virtual memory size (default: 4g)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
rmdup_prinseq_fastq

Run prinseq-lite’s duplicate removal operation on paired-end reads. Also removes reads with more than one N.

usage: read_utils.py rmdup_prinseq_fastq [-h] [--includeUnmated]
                                         [--unpairedOutFastq1 UNPAIREDOUTFASTQ1]
                                         [--unpairedOutFastq2 UNPAIREDOUTFASTQ2]
                                         [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                         [--version] [--tmp_dir TMP_DIR]
                                         [--tmp_dirKeep]
                                         inFastq1 inFastq2 outFastq1 outFastq2
Positional arguments:
inFastq1 Input fastq file; 1st end of paired-end reads.
inFastq2 Input fastq file; 2nd end of paired-end reads.
outFastq1 Output fastq file; 1st end of paired-end reads.
outFastq2 Output fastq file; 2nd end of paired-end reads.
Options:
--includeUnmated=False
 Include unmated reads in the main output fastq files (default: False)
--unpairedOutFastq1
 File name of output unpaired reads from 1st end of paired-end reads (independent of --includeUnmated)
--unpairedOutFastq2
 File name of output unpaired reads from 2nd end of paired-end reads (independent of --includeUnmated)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
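The two filters described above can be sketched together: drop reads with more than one N, then drop exact duplicate pairs. This is a simplified model, not prinseq-lite's own logic, and its deduplication modes differ. Here a pair counts as a duplicate only when both mate sequences match exactly. If one mate fails the N filter, the survivor is treated as unpaired, in the spirit of `--unpairedOutFastq1`/`--unpairedOutFastq2`. That pairing policy is an assumption. The helper `rmdup_pairs` is hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (mate 1 sequence, mate 2 sequence)


def rmdup_pairs(pairs: List[Pair], max_n: int = 1) -> Tuple[List[Pair], List[str], List[str]]:
    """Return (kept pairs, unpaired mate-1 reads, unpaired mate-2 reads)."""
    kept: List[Pair] = []
    unpaired1: List[str] = []
    unpaired2: List[str] = []
    seen = set()
    for seq1, seq2 in pairs:
        ok1 = seq1.upper().count("N") <= max_n
        ok2 = seq2.upper().count("N") <= max_n
        if ok1 and ok2:
            if (seq1, seq2) not in seen:  # exact duplicate pairs: first occurrence wins
                seen.add((seq1, seq2))
                kept.append((seq1, seq2))
        elif ok1:
            unpaired1.append(seq1)  # mate 2 failed the N filter
        elif ok2:
            unpaired2.append(seq2)  # mate 1 failed the N filter
    return kept, unpaired1, unpaired2


pairs = [("ACGT", "TTGG"), ("ACGT", "TTGG"), ("ANNT", "TTGG"), ("ACGA", "NNNN"), ("ANGT", "CCGG")]
kept, u1, u2 = rmdup_pairs(pairs)
print(kept, u1, u2)
```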
filter_bam_mapped_only

Use Samtools to reduce a BAM file to only reads that are aligned (-F 4), have a non-zero mapping quality (-q 1), and are not marked as PCR/optical duplicates (-F 1024).

usage: read_utils.py filter_bam_mapped_only [-h]
                                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                            [--version] [--tmp_dir TMP_DIR]
                                            [--tmp_dirKeep]
                                            inBam outBam
Positional arguments:
inBam Input aligned reads, BAM format.
outBam Output sorted indexed reads, filtered to aligned-only, BAM format.
Options:
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
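The three Samtools criteria above combine into a single per-record test. Flags 4 (unmapped) and 1024 (duplicate) must both be unset, which is the same as requiring `flag & 0x404 == 0`, and the mapping quality must be at least 1. The predicate name `passes_mapped_only` is hypothetical.

```python
UNMAPPED = 0x4     # SAM flag 4: read is unmapped
DUPLICATE = 0x400  # SAM flag 1024: PCR or optical duplicate


def passes_mapped_only(flag: int, mapq: int) -> bool:
    """Mirror the -F 4, -F 1024 and -q 1 criteria for a single alignment record."""
    return not (flag & (UNMAPPED | DUPLICATE)) and mapq >= 1


print([passes_mapped_only(f, q) for f, q in [(0, 60), (4, 0), (1024, 60), (0, 0), (16, 1)]])
```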
novoalign

Align reads with Novoalign. Sort and index BAM output.

usage: read_utils.py novoalign [-h] [--options OPTIONS] [--min_qual MIN_QUAL]
                               [--JVMmemory JVMMEMORY]
                               [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                               inBam refFasta outBam
Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Novoindex.
outBam Output reads, BAM format (aligned).
Options:
--options=-r Random
 Novoalign options (default: -r Random)
--min_qual=0 Filter outBam to minimum mapping quality (default: 0)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--NOVOALIGN_LICENSE_PATH
 A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable.
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
novoindex

Index a reference genome for Novoalign.

usage: read_utils.py novoindex [-h]
                               [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version]
                               refFasta
Positional arguments:
refFasta Reference genome, FASTA format.
Options:
--NOVOALIGN_LICENSE_PATH
 A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable.
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
gatk_ug

Call genotypes using the GATK UnifiedGenotyper.

usage: read_utils.py gatk_ug [-h] [--options OPTIONS] [--JVMmemory JVMMEMORY]
                             [--GATK_PATH GATK_PATH]
                             [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                             [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                             inBam refFasta outVcf
Positional arguments:
inBam Input reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outVcf Output calls in VCF format. If this filename ends with .gz, GATK will BGZIP compress the output and produce a Tabix index file as well.
Options:
--options=--min_base_quality_score 15 -ploidy 4
 UnifiedGenotyper options (default: --min_base_quality_score 15 -ploidy 4)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--GATK_PATH A path containing the GATK jar file. This overrides the GATK_ENV environment variable or the GATK conda package.
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
gatk_realign

Local realignment of BAM files with GATK IndelRealigner.

usage: read_utils.py gatk_realign [-h] [--JVMmemory JVMMEMORY]
                                  [--GATK_PATH GATK_PATH]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep] [--threads THREADS]
                                  inBam refFasta outBam
Positional arguments:
inBam Input reads, BAM format, aligned to refFasta.
refFasta Reference genome, FASTA format, pre-indexed by Picard.
outBam Realigned reads.
Options:
--JVMmemory=2g JVM virtual memory size (default: 2g)
--GATK_PATH A path containing the GATK jar file. This overrides the GATK_ENV environment variable or the GATK conda package.
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
--threads Number of threads (default: all available cores)
align_and_fix

Take reads, align them to the reference with Novoalign (or BWA, see --aligner), optionally mark duplicates with Picard, realign indels with GATK, and optionally filter the final file to mapped, non-duplicate reads.

usage: read_utils.py align_and_fix [-h] [--outBamAll OUTBAMALL]
                                   [--outBamFiltered OUTBAMFILTERED]
                                   [--aligner_options ALIGNER_OPTIONS]
                                   [--aligner {novoalign,bwa}]
                                   [--JVMmemory JVMMEMORY] [--threads THREADS]
                                   [--skipMarkDupes] [--GATK_PATH GATK_PATH]
                                   [--NOVOALIGN_LICENSE_PATH NOVOALIGN_LICENSE_PATH]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   inBam refFasta
Positional arguments:
inBam Input unaligned reads, BAM format.
refFasta Reference genome, FASTA format; will be indexed by Picard and Novoalign.
Options:
--outBamAll Aligned, sorted, and indexed reads. Unmapped and duplicate reads are retained. By default, duplicate reads are marked. If --skipMarkDupes is specified, duplicate reads are included in the output without being marked.
--outBamFiltered
 Aligned, sorted, and indexed reads. Unmapped reads are removed from this file, as well as any marked duplicate reads. Note that if --skipMarkDupes is provided, duplicates will not be marked and will be included in the output.
--aligner_options
 Aligner options (default for novoalign: “-r Random”; for bwa: “-T 30”)
--aligner=novoalign
 Aligner (default: novoalign). Possible choices: novoalign, bwa.
--JVMmemory=4g JVM virtual memory size (default: 4g)
--threads Number of threads (default: all available cores)
--skipMarkDupes=False
 If specified, duplicate reads will not be marked in the resulting output file.
--GATK_PATH A path containing the GATK jar file. This overrides the GATK_ENV environment variable or the GATK conda package.
--NOVOALIGN_LICENSE_PATH
 A path to the novoalign.lic file. This overrides the NOVOALIGN_LICENSE_PATH environment variable.
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
bwamem_idxstats

Take reads, align to reference with BWA-MEM and perform samtools idxstats.

usage: read_utils.py bwamem_idxstats [-h] [--outBam OUTBAM]
                                     [--outStats OUTSTATS]
                                     [--minScoreToFilter MIN_SCORE_TO_FILTER]
                                     [--alignerOptions ALIGNER_OPTIONS]
                                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                     [--version] [--tmp_dir TMP_DIR]
                                     [--tmp_dirKeep]
                                     inBam refFasta
Positional arguments:
inBam Input unaligned reads, BAM format.
refFasta Reference genome, FASTA format, pre-indexed by Picard and Novoalign.
Options:
--outBam Output aligned, indexed BAM file.
--outStats Output idxstats file.
--minScoreToFilter
 Filter bwa alignments using this value as the minimum allowed alignment score. Specifically, sum the alignment scores across all alignments for each query (including reads in a pair, supplementary and secondary alignments) and then only include, in the output, queries whose summed alignment score is at least this value. This is only applied when the aligner is ‘bwa’. The filtering on a summed alignment score is sensible for reads in a pair and supplementary alignments, but may not be reasonable if bwa outputs secondary alignments (i.e., if ‘-a’ is in the aligner options). (default: not set - i.e., do not filter bwa’s output)
--alignerOptions
 bwa options (default: bwa defaults)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
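The `--minScoreToFilter` rule sums alignment scores per query and keeps a query only if the total reaches the threshold. The sketch below assumes the scores have already been pulled from each record's AS tag as (query name, score) pairs. How records without an AS tag are treated is not covered here. The helper `queries_passing_score` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def queries_passing_score(alignments: List[Tuple[str, int]], min_score: int) -> List[str]:
    """Names of queries whose summed alignment scores reach min_score."""
    totals: Dict[str, int] = defaultdict(int)
    for query_name, alignment_score in alignments:
        # Mates, supplementary and secondary records share the query name,
        # so they all add to the same total.
        totals[query_name] += alignment_score
    return [name for name, total in totals.items() if total >= min_score]


alns = [("q1", 40), ("q1", 35), ("q2", 50), ("q3", 20), ("q3", 15)]
print(queries_passing_score(alns, 60))
```

With a threshold of 60, only q1 (40 + 35 = 75) passes; q3 totals 35 and is filtered out. This also shows the caveat in the option text: extra secondary alignments inflate a query's total.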
extract_tarball

Extract an input .tar, .tgz, .tar.gz, .tar.bz2, .tar.lz4, or .zip file to a given directory (or to one chosen automatically), and print the resulting directory path to stdout.

usage: read_utils.py extract_tarball [-h]
                                     [--compression {gz,bz2,lz4,zip,none,auto}]
                                     [--pipe_hint PIPE_HINT]
                                     [--threads THREADS]
                                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                     [--version] [--tmp_dir TMP_DIR]
                                     [--tmp_dirKeep]
                                     tarfile out_dir
Positional arguments:
tarfile Input tar file. May be “-” for stdin.
out_dir Output directory
Options:
--compression=auto
 Compression type (default: auto). Possible choices: gz, bz2, lz4, zip, none, auto. Auto-detection is incompatible with stdin input unless --pipe_hint is specified.
--pipe_hint If tarfile is stdin, you can provide a file-like URI string for pipe_hint that ends with a common compression file extension if you want to use compression=auto.
--threads Number of threads (default: all available cores)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION.
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
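Auto-detection works from the file name, or from `--pipe_hint` when the input is stdin (`-`), because a bare stream carries no extension. The sketch below maps the extensions listed in the description to the `--compression` choices. The suffix table and the helper `detect_compression` are assumptions, not the subcommand's actual detection code.

```python
from typing import Optional

# Assumed suffix -> --compression choice; checked in order, longest suffixes first.
_SUFFIXES = [
    (".tar.gz", "gz"), (".tgz", "gz"),
    (".tar.bz2", "bz2"),
    (".tar.lz4", "lz4"),
    (".zip", "zip"),
    (".tar", "none"),
]


def detect_compression(tarfile: str, pipe_hint: Optional[str] = None) -> str:
    """Guess the compression type from a file name, or from pipe_hint for stdin."""
    name = pipe_hint if tarfile == "-" else tarfile
    if name is None:
        raise ValueError("auto-detection from stdin needs pipe_hint")
    lowered = name.lower()
    for suffix, compression in _SUFFIXES:
        if lowered.endswith(suffix):
            return compression
    raise ValueError(f"cannot infer compression from {name!r}")


print(detect_compression("-", pipe_hint="s3://bucket/run1.tar.lz4"))
```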