3.9. illumina.py - for raw Illumina outputs

Utilities for demultiplexing Illumina data.

usage: illumina.py subcommand
Sub-commands:
illumina_demux

Read Illumina runs and produce BAM files, demultiplexing to one BAM per sample; for simplex runs, a single BAM bearing the flowcell ID is produced. Wraps Picard’s ExtractIlluminaBarcodes (for multiplexed samples) and IlluminaBasecallsToSam while handling the various required input formats. The input may be an Illumina BCL directory or a tar.gz archive of one.

usage: illumina.py illumina_demux [-h] [--outMetrics OUTMETRICS]
                                  [--commonBarcodes COMMONBARCODES]
                                  [--sampleSheet SAMPLESHEET]
                                  [--runInfo RUNINFO] [--flowcell FLOWCELL]
                                  [--read_structure READ_STRUCTURE]
                                  [--max_mismatches MAX_MISMATCHES]
                                  [--minimum_base_quality MINIMUM_BASE_QUALITY]
                                  [--min_mismatch_delta MIN_MISMATCH_DELTA]
                                  [--max_no_calls MAX_NO_CALLS]
                                  [--minimum_quality MINIMUM_QUALITY]
                                  [--compress_outputs COMPRESS_OUTPUTS]
                                  [--sequencing_center SEQUENCING_CENTER]
                                  [--adapters_to_check [ADAPTERS_TO_CHECK [ADAPTERS_TO_CHECK ...]]]
                                  [--platform PLATFORM]
                                  [--max_reads_in_ram_per_tile MAX_READS_IN_RAM_PER_TILE]
                                  [--max_records_in_ram MAX_RECORDS_IN_RAM]
                                  [--apply_eamss_filter APPLY_EAMSS_FILTER]
                                  [--force_gc FORCE_GC]
                                  [--first_tile FIRST_TILE]
                                  [--tile_limit TILE_LIMIT]
                                  [--include_non_pf_reads INCLUDE_NON_PF_READS]
                                  [--run_start_date RUN_START_DATE]
                                  [--read_group_id READ_GROUP_ID]
                                  [--compression_level COMPRESSION_LEVEL]
                                  [--JVMmemory JVMMEMORY] [--threads THREADS]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inDir lane outDir
Positional arguments:
inDir Illumina BCL directory (or tar.gz of BCL directory). This is the top-level run directory.
lane Lane number.
outDir Output directory for BAM files.
Options:
--outMetrics Output ExtractIlluminaBarcodes metrics file. Default is to dump to a temp file.
--commonBarcodes
 Write a TSV report of all barcode counts, in descending order. Only applicable for read structures containing “B”.
--sampleSheet Override SampleSheet. Input tab-delimited or CSV file with a header and four named columns: barcode_name, library_name, barcode_sequence_1, barcode_sequence_2. Default is to look for a SampleSheet.csv in the inDir.
--runInfo Override RunInfo. Input xml file. Default is to look for a RunInfo.xml file in the inDir.
--flowcell Override flowcell ID (default: read from RunInfo.xml).
--read_structure
 Override read structure (default: read from RunInfo.xml).
--max_mismatches=1
 Picard ExtractIlluminaBarcodes MAX_MISMATCHES (default: 1)
--minimum_base_quality=10
 Picard ExtractIlluminaBarcodes MINIMUM_BASE_QUALITY (default: 10)
--min_mismatch_delta
 Picard ExtractIlluminaBarcodes MIN_MISMATCH_DELTA (default: %(default)s)
--max_no_calls Picard ExtractIlluminaBarcodes MAX_NO_CALLS (default: %(default)s)
--minimum_quality
 Picard ExtractIlluminaBarcodes MINIMUM_QUALITY (default: %(default)s)
--compress_outputs
 Picard ExtractIlluminaBarcodes COMPRESS_OUTPUTS (default: %(default)s)
--sequencing_center
 Picard IlluminaBasecallsToSam SEQUENCING_CENTER (default: %(default)s)
--adapters_to_check=('PAIRED_END', 'NEXTERA_V1', 'NEXTERA_V2')
 Picard IlluminaBasecallsToSam ADAPTERS_TO_CHECK (default: PAIRED_END, NEXTERA_V1, NEXTERA_V2)
--platform Picard IlluminaBasecallsToSam PLATFORM (default: %(default)s)
--max_reads_in_ram_per_tile=200000
 Picard IlluminaBasecallsToSam MAX_READS_IN_RAM_PER_TILE (default: 200000)
--max_records_in_ram=1000000
 Picard IlluminaBasecallsToSam MAX_RECORDS_IN_RAM (default: 1000000)
--apply_eamss_filter
 Picard IlluminaBasecallsToSam APPLY_EAMSS_FILTER (default: %(default)s)
--force_gc Picard IlluminaBasecallsToSam FORCE_GC (default: %(default)s)
--first_tile Picard IlluminaBasecallsToSam FIRST_TILE (default: %(default)s)
--tile_limit Picard IlluminaBasecallsToSam TILE_LIMIT (default: %(default)s)
--include_non_pf_reads=False
 Picard IlluminaBasecallsToSam INCLUDE_NON_PF_READS (default: False)
--run_start_date
 Picard IlluminaBasecallsToSam RUN_START_DATE (default: %(default)s)
--read_group_id
 Picard IlluminaBasecallsToSam READ_GROUP_ID (default: %(default)s)
--compression_level=7
 Picard IlluminaBasecallsToSam COMPRESSION_LEVEL (default: 7)
--JVMmemory=7g JVM virtual memory size (default: 7g)
--threads=0 Number of threads (default: 0)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
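
As an illustration of the --sampleSheet format and the argument order above, the following minimal sketch writes a tab-delimited sample sheet with the four named columns and assembles an illumina_demux command line. The helper functions, paths, barcode values and library names are hypothetical and are not part of illumina.py; only flags listed in the usage text are used.

```python
import csv
import os
import tempfile

# Column names taken from the --sampleSheet description; the row values are made up.
COLUMNS = ["barcode_name", "library_name", "barcode_sequence_1", "barcode_sequence_2"]


def write_sample_sheet(path, rows):
    """Write a tab-delimited sample sheet with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def build_demux_command(in_dir, lane, out_dir, sample_sheet=None, threads=None):
    """Assemble an argv list for illumina_demux (hypothetical helper)."""
    cmd = ["illumina.py", "illumina_demux"]
    if sample_sheet is not None:
        cmd += ["--sampleSheet", sample_sheet]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    # Positional arguments go last, in the order inDir lane outDir.
    cmd += [in_dir, str(lane), out_dir]
    return cmd


sheet_path = os.path.join(tempfile.mkdtemp(), "samples.tsv")
write_sample_sheet(sheet_path, [
    {"barcode_name": "bc01", "library_name": "sample1.l1",
     "barcode_sequence_1": "ACGTACGT", "barcode_sequence_2": "TGCATGCA"},
])
cmd = build_demux_command("run_dir", 1, "out_bams",
                          sample_sheet=sheet_path, threads=4)
print(" ".join(cmd))
# To execute, with illumina.py on PATH:
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.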
lane_metrics

Write out lane metrics to a tsv file.

usage: illumina.py lane_metrics [-h] [--read_structure READ_STRUCTURE]
                                [--JVMmemory JVMMEMORY]
                                [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                [--version] [--tmp_dir TMP_DIR]
                                [--tmp_dirKeep]
                                inDir outPrefix
Positional arguments:
inDir Illumina BCL directory (or tar.gz of BCL directory). This is the top-level run directory.
outPrefix Prefix path to the *.illumina_lane_metrics and *.illumina_phasing_metrics files.
Options:
--read_structure
 Override read structure (default: read from RunInfo.xml).
--JVMmemory=8g JVM virtual memory size (default: 8g)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
common_barcodes

Extract Illumina barcodes for a run and write a TSV report of the barcode counts in descending order.

usage: illumina.py common_barcodes [-h] [--truncateToLength TRUNCATETOLENGTH]
                                   [--omitHeader] [--includeNoise]
                                   [--outMetrics OUTMETRICS]
                                   [--sampleSheet SAMPLESHEET]
                                   [--flowcell FLOWCELL]
                                   [--read_structure READ_STRUCTURE]
                                   [--max_mismatches MAX_MISMATCHES]
                                   [--minimum_base_quality MINIMUM_BASE_QUALITY]
                                   [--min_mismatch_delta MIN_MISMATCH_DELTA]
                                   [--max_no_calls MAX_NO_CALLS]
                                   [--minimum_quality MINIMUM_QUALITY]
                                   [--compress_outputs COMPRESS_OUTPUTS]
                                   [--JVMmemory JVMMEMORY]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   inDir lane outSummary
Positional arguments:
inDir Illumina BCL directory (or tar.gz of BCL directory). This is the top-level run directory.
lane Lane number.
outSummary Path to the summary file (.tsv format). It includes the columns (barcode1, likely_index_name1, barcode2, likely_index_name2, count), where each likely index name is either the name of the index that exactly matches the barcode sequence or the names of indexes at a Hamming distance of 1 from it.
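
The likely-index lookup described for outSummary can be sketched as follows. This is a minimal illustration of exact matching with a fallback to Hamming distance 1, not the code illumina.py uses; the function names and the index table are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def likely_index_names(barcode, known_indexes):
    """Return exact-match names if any exist, else names at Hamming distance 1."""
    exact = sorted(name for name, seq in known_indexes.items() if seq == barcode)
    if exact:
        return exact
    return sorted(
        name for name, seq in known_indexes.items()
        if len(seq) == len(barcode) and hamming(seq, barcode) == 1
    )


# Hypothetical index table; names and sequences are made up for illustration.
known = {"idx_A": "ACGTACGT", "idx_B": "ACGTACGA", "idx_C": "TTTTTTTT"}
exact_hit = likely_index_names("ACGTACGT", known)   # exact match to idx_A
near_hits = likely_index_names("ACGTACGC", known)   # one mismatch from idx_A and idx_B
print(exact_hit, near_hits)
```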
Options:
--truncateToLength
 If specified, only this number of barcodes will be returned. Useful if you only want the top N barcodes.
--omitHeader=False
 If specified, a header will not be added to the outSummary tsv file.
--includeNoise=False
 If specified, barcodes with periods (".") will be included.
--outMetrics Output ExtractIlluminaBarcodes metrics file. Default is to dump to a temp file.
--sampleSheet Override SampleSheet. Input tab-delimited or CSV file with a header and four named columns: barcode_name, library_name, barcode_sequence_1, barcode_sequence_2. Default is to look for a SampleSheet.csv in the inDir.
--flowcell Override flowcell ID (default: read from RunInfo.xml).
--read_structure
 Override read structure (default: read from RunInfo.xml).
--max_mismatches=1
 Picard ExtractIlluminaBarcodes MAX_MISMATCHES (default: 1)
--minimum_base_quality=10
 Picard ExtractIlluminaBarcodes MINIMUM_BASE_QUALITY (default: 10)
--min_mismatch_delta
 Picard ExtractIlluminaBarcodes MIN_MISMATCH_DELTA (default: %(default)s)
--max_no_calls Picard ExtractIlluminaBarcodes MAX_NO_CALLS (default: %(default)s)
--minimum_quality
 Picard ExtractIlluminaBarcodes MINIMUM_QUALITY (default: %(default)s)
--compress_outputs
 Picard ExtractIlluminaBarcodes COMPRESS_OUTPUTS (default: %(default)s)
--JVMmemory=8g JVM virtual memory size (default: 8g)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
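
The combined effect of descending-order reporting, --truncateToLength and --includeNoise can be pictured with the sketch below. It assumes barcodes arrive as (barcode1, barcode2) pairs and uses a hypothetical function name; it illustrates the described behaviour and is not the implementation in illumina.py.

```python
from collections import Counter


def summarize_barcodes(observed, truncate_to_length=None, include_noise=False):
    """Count barcode pairs and return (barcode1, barcode2, count) rows,
    most common first."""
    counts = Counter(
        pair for pair in observed
        # Unless noise is requested, skip pairs where a barcode contains ".".
        if include_noise or not any("." in bc for bc in pair)
    )
    # most_common(None) returns every entry; most_common(n) keeps the top n.
    return [(b1, b2, n) for (b1, b2), n in counts.most_common(truncate_to_length)]


# Made-up observations for illustration.
reads = [("ACGT", "TTAA"), ("ACGT", "TTAA"), ("GGCC", "TTAA"), ("A.GT", "TTAA")]
rows = summarize_barcodes(reads, truncate_to_length=2)
for row in rows:
    print("\t".join(str(field) for field in row))
```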
miseq_fastq_to_bam

Convert fastq read files to a single BAM file. Fastq file names must conform to the patterns emitted by MiSeq machines. Sample metadata must be provided in a SampleSheet.csv that corresponds to the fastq file name. Specifically, the _S##_ index in the fastq file name is used to find the corresponding row in the SampleSheet.
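
A minimal sketch of pulling the _S##_ index out of a file name is shown below. It assumes the common MiSeq naming pattern Name_S##_L###_R#_###.fastq(.gz) and that the index is a 1-based row number among the samples; both are assumptions for illustration, and the function name and sample rows are hypothetical.

```python
import re

# Assumed MiSeq pattern: <name>_S<index>_L<lane>_R<read>_<chunk>.fastq[.gz]
_MISEQ_NAME = re.compile(r"_S(\d+)_L\d{3}_R[12]_\d{3}\.fastq(?:\.gz)?$")


def sample_index_from_fastq(filename):
    """Return the integer from the _S##_ field of a MiSeq-style fastq name."""
    match = _MISEQ_NAME.search(filename)
    if match is None:
        raise ValueError(f"not a MiSeq-style fastq name: {filename!r}")
    return int(match.group(1))


# Hypothetical sample rows, in the order they appear in the sample sheet.
rows = [{"Sample_ID": "sampleA"}, {"Sample_ID": "sampleB"}, {"Sample_ID": "sampleC"}]
idx = sample_index_from_fastq("sampleB_S2_L001_R1_001.fastq.gz")
row = rows[idx - 1]  # assumes _S##_ is a 1-based row number
print(idx, row["Sample_ID"])
```

Anchoring the pattern at the end of the name keeps a sample name that itself contains "_S3_" from being misread as the index.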

usage: illumina.py miseq_fastq_to_bam [-h] [--inFastq2 INFASTQ2]
                                      [--runInfo RUNINFO]
                                      [--sequencing_center SEQUENCING_CENTER]
                                      [--JVMmemory JVMMEMORY]
                                      [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                      [--version] [--tmp_dir TMP_DIR]
                                      [--tmp_dirKeep]
                                      outBam sampleSheet inFastq1
Positional arguments:
outBam Output BAM file.
sampleSheet Input SampleSheet.csv file.
inFastq1 Input fastq file; 1st end of paired-end reads if paired.
Options:
--inFastq2 Input fastq file; 2nd end of paired-end reads.
--runInfo Input RunInfo.xml file.
--sequencing_center
 Name of your sequencing center (default is the sequencing machine ID from the RunInfo.xml)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
extract_fc_metadata

Extract RunInfo.xml and SampleSheet.csv from the provided Illumina directory.

usage: illumina.py extract_fc_metadata [-h]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       flowcell outRunInfo outSampleSheet
Positional arguments:
flowcell Illumina directory (possibly tarball)
outRunInfo Output RunInfo.xml file.
outSampleSheet Output SampleSheet.csv file.
Options:
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp).
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
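
Several options on this page default to values read from RunInfo.xml, such as the flowcell ID and the read structure. The sketch below shows one way those two values could be derived from an extracted RunInfo.xml, using Picard's read-structure notation ("T" for template cycles, "B" for barcode cycles). The XML content is a made-up example, the function name is hypothetical, and the logic illumina.py actually applies may differ.

```python
import xml.etree.ElementTree as ET

# Made-up RunInfo.xml content for illustration.
RUNINFO_XML = """<RunInfo>
  <Run Id="EXAMPLE_RUN" Number="1">
    <Flowcell>EXAMPLEFC</Flowcell>
    <Reads>
      <Read Number="1" NumCycles="101" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="4" NumCycles="101" IsIndexedRead="N" />
    </Reads>
  </Run>
</RunInfo>
"""


def flowcell_and_read_structure(xml_text):
    """Return (flowcell ID, read structure) parsed from RunInfo.xml text."""
    root = ET.fromstring(xml_text)
    flowcell = root.findtext("./Run/Flowcell")
    reads = sorted(
        root.findall("./Run/Reads/Read"),
        key=lambda read: int(read.get("Number")),
    )
    # Indexed reads become barcode ("B") segments, others template ("T").
    structure = "".join(
        read.get("NumCycles") + ("B" if read.get("IsIndexedRead") == "Y" else "T")
        for read in reads
    )
    return flowcell, structure


flowcell, structure = flowcell_and_read_structure(RUNINFO_XML)
print(flowcell, structure)
```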