3.8. illumina.py - for raw Illumina outputs

Utilities for demultiplexing Illumina data.

usage: illumina.py subcommand
Sub-commands:
illumina_demux

Demultiplex Illumina runs and produce BAM files, one per sample. Wraps Picard’s ExtractIlluminaBarcodes and IlluminaBasecallsToSam while handling the various required input formats. Can also read Illumina BCL directories and tar.gz archives of BCL directories. TO DO: read BCL or tar.gz BCL directories from S3 / object store.

usage: illumina.py illumina_demux [-h] [--outMetrics OUTMETRICS]
                                  [--commonBarcodes COMMONBARCODES]
                                  [--sampleSheet SAMPLESHEET]
                                  [--flowcell FLOWCELL]
                                  [--read_structure READ_STRUCTURE]
                                  [--max_mismatches MAX_MISMATCHES]
                                  [--minimum_base_quality MINIMUM_BASE_QUALITY]
                                  [--min_mismatch_delta MIN_MISMATCH_DELTA]
                                  [--max_no_calls MAX_NO_CALLS]
                                  [--minimum_quality MINIMUM_QUALITY]
                                  [--compress_outputs COMPRESS_OUTPUTS]
                                  [--sequencing_center SEQUENCING_CENTER]
                                  [--adapters_to_check [ADAPTERS_TO_CHECK [ADAPTERS_TO_CHECK ...]]]
                                  [--platform PLATFORM]
                                  [--max_reads_in_ram_per_tile MAX_READS_IN_RAM_PER_TILE]
                                  [--max_records_in_ram MAX_RECORDS_IN_RAM]
                                  [--num_processors NUM_PROCESSORS]
                                  [--apply_eamss_filter APPLY_EAMSS_FILTER]
                                  [--force_gc FORCE_GC]
                                  [--first_tile FIRST_TILE]
                                  [--tile_limit TILE_LIMIT]
                                  [--include_non_pf_reads INCLUDE_NON_PF_READS]
                                  [--run_start_date RUN_START_DATE]
                                  [--read_group_id READ_GROUP_ID]
                                  [--JVMmemory JVMMEMORY]
                                  [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                  [--version] [--tmp_dir TMP_DIR]
                                  [--tmp_dirKeep]
                                  inDir lane outDir
Positional arguments:
inDir Illumina BCL directory (or tar.gz of BCL directory). This is the top-level run directory.
lane Lane number.
outDir Output directory for BAM files.
Options:
--outMetrics Output ExtractIlluminaBarcodes metrics file. Default is to dump to a temp file.
--commonBarcodes
 Write a TSV report of all barcode counts, in descending order.
--sampleSheet Override SampleSheet. Input tab-delimited or CSV file with a header and four named columns: barcode_name, library_name, barcode_sequence_1, barcode_sequence_2. Default is to look for a SampleSheet.csv in inDir.
--flowcell Override flowcell ID (default: read from RunInfo.xml).
--read_structure
 Override read structure (default: read from RunInfo.xml).
--max_mismatches=0
 Picard ExtractIlluminaBarcodes MAX_MISMATCHES (default: 0)
--minimum_base_quality=25
 Picard ExtractIlluminaBarcodes MINIMUM_BASE_QUALITY (default: 25)
--min_mismatch_delta
 Picard ExtractIlluminaBarcodes MIN_MISMATCH_DELTA
--max_no_calls Picard ExtractIlluminaBarcodes MAX_NO_CALLS
--minimum_quality
 Picard ExtractIlluminaBarcodes MINIMUM_QUALITY
--compress_outputs
 Picard ExtractIlluminaBarcodes COMPRESS_OUTPUTS
--sequencing_center
 Picard IlluminaBasecallsToSam SEQUENCING_CENTER
--adapters_to_check=('PAIRED_END', 'NEXTERA_V1', 'NEXTERA_V2')
 Picard IlluminaBasecallsToSam ADAPTERS_TO_CHECK (default: ('PAIRED_END', 'NEXTERA_V1', 'NEXTERA_V2'))
--platform Picard IlluminaBasecallsToSam PLATFORM
--max_reads_in_ram_per_tile=100000
 Picard IlluminaBasecallsToSam MAX_READS_IN_RAM_PER_TILE (default: 100000)
--max_records_in_ram=100000
 Picard IlluminaBasecallsToSam MAX_RECORDS_IN_RAM (default: 100000)
--num_processors=4
 Picard IlluminaBasecallsToSam NUM_PROCESSORS (default: 4)
--apply_eamss_filter
 Picard IlluminaBasecallsToSam APPLY_EAMSS_FILTER
--force_gc=False
 Picard IlluminaBasecallsToSam FORCE_GC (default: False)
--first_tile Picard IlluminaBasecallsToSam FIRST_TILE
--tile_limit Picard IlluminaBasecallsToSam TILE_LIMIT
--include_non_pf_reads=False
 Picard IlluminaBasecallsToSam INCLUDE_NON_PF_READS (default: False)
--run_start_date
 Picard IlluminaBasecallsToSam RUN_START_DATE
--read_group_id
 Picard IlluminaBasecallsToSam READ_GROUP_ID
--JVMmemory=54g
 JVM virtual memory size (default: 54g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
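
As a rough illustration of how the --sampleSheet override and the positional arguments fit together, the sketch below writes a tab-delimited sample sheet with the four named columns and assembles an illumina_demux command line. The helper names (write_sample_sheet, build_demux_command), the sample names, the barcode sequences and all paths are hypothetical placeholders; only the column names, flags and argument order are taken from the usage text above.

```python
import csv
import os
import shlex
import tempfile

# Column names as listed for --sampleSheet.
COLUMNS = ["barcode_name", "library_name", "barcode_sequence_1", "barcode_sequence_2"]


def write_sample_sheet(path, rows):
    """Write a tab-delimited sample sheet with a header row (hypothetical helper)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def build_demux_command(in_dir, lane, out_dir, sample_sheet=None, max_mismatches=None):
    """Assemble an argv list: optional flags first, then inDir lane outDir."""
    cmd = ["illumina.py", "illumina_demux"]
    if sample_sheet is not None:
        cmd += ["--sampleSheet", sample_sheet]
    if max_mismatches is not None:
        cmd += ["--max_mismatches", str(max_mismatches)]
    cmd += [in_dir, str(lane), out_dir]
    return cmd


# Placeholder samples and barcodes, for illustration only.
rows = [
    {"barcode_name": "sample1", "library_name": "sample1.l1",
     "barcode_sequence_1": "ACGTACGT", "barcode_sequence_2": "TTGCAATC"},
    {"barcode_name": "sample2", "library_name": "sample2.l1",
     "barcode_sequence_1": "GGATCCAA", "barcode_sequence_2": "CATGCATG"},
]

with tempfile.TemporaryDirectory() as tmp:
    sheet = os.path.join(tmp, "samples.tsv")
    write_sample_sheet(sheet, rows)
    with open(sheet) as handle:
        header = handle.readline().rstrip("\n")

cmd = build_demux_command(
    "/data/runs/example_run", 1, "/data/demux/lane1",
    sample_sheet="samples.tsv", max_mismatches=1,
)
print(header)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where illumina.py is on the PATH; the sketch only prints the command and does not run it.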
common_barcodes

Extract Illumina barcodes for a run and write a TSV report of the barcode counts in descending order.

usage: illumina.py common_barcodes [-h] [--truncateToLength TRUNCATETOLENGTH]
                                   [--omitHeader] [--includeNoise]
                                   [--outMetrics OUTMETRICS]
                                   [--sampleSheet SAMPLESHEET]
                                   [--flowcell FLOWCELL]
                                   [--read_structure READ_STRUCTURE]
                                   [--max_mismatches MAX_MISMATCHES]
                                   [--minimum_base_quality MINIMUM_BASE_QUALITY]
                                   [--min_mismatch_delta MIN_MISMATCH_DELTA]
                                   [--max_no_calls MAX_NO_CALLS]
                                   [--minimum_quality MINIMUM_QUALITY]
                                   [--compress_outputs COMPRESS_OUTPUTS]
                                   [--JVMmemory JVMMEMORY]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmp_dir TMP_DIR]
                                   [--tmp_dirKeep]
                                   inDir lane outSummary
Positional arguments:
inDir Illumina BCL directory (or tar.gz of BCL directory). This is the top-level run directory.
lane Lane number.
outSummary Path to the summary file (TSV format). It contains two columns: barcode and count.
Options:
--truncateToLength
 If specified, only this number of barcodes will be returned. Useful if you only want the top N barcodes.
--omitHeader=False
 If specified, a header will not be added to the outSummary tsv file.
--includeNoise=False
 If specified, barcodes with periods (".") will be included.
--outMetrics Output ExtractIlluminaBarcodes metrics file. Default is to dump to a temp file.
--sampleSheet Override SampleSheet. Input tab-delimited or CSV file with a header and four named columns: barcode_name, library_name, barcode_sequence_1, barcode_sequence_2. Default is to look for a SampleSheet.csv in inDir.
--flowcell Override flowcell ID (default: read from RunInfo.xml).
--read_structure
 Override read structure (default: read from RunInfo.xml).
--max_mismatches=0
 Picard ExtractIlluminaBarcodes MAX_MISMATCHES (default: 0)
--minimum_base_quality=25
 Picard ExtractIlluminaBarcodes MINIMUM_BASE_QUALITY (default: 25)
--min_mismatch_delta
 Picard ExtractIlluminaBarcodes MIN_MISMATCH_DELTA
--max_no_calls Picard ExtractIlluminaBarcodes MAX_NO_CALLS
--minimum_quality
 Picard ExtractIlluminaBarcodes MINIMUM_QUALITY
--compress_outputs
 Picard ExtractIlluminaBarcodes COMPRESS_OUTPUTS
--JVMmemory=8g JVM virtual memory size (default: 8g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
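
The outSummary file described above is a two-column TSV (barcode, count), so it can be read back with a few lines of Python. This is a minimal sketch: the function name read_barcode_counts is hypothetical, and the header labels and barcode values in the example input are made-up placeholders, since the exact header text is not specified here. Pass has_header=False for files written with --omitHeader.

```python
import csv
import io


def read_barcode_counts(handle, has_header=True):
    """Return (barcode, count) tuples from a two-column TSV (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row, whatever its labels are
    return [(row[0], int(row[1])) for row in reader if row]


# Stand-in for an outSummary file; header labels and values are placeholders.
example = io.StringIO("barcode\tcount\nACGTACGT\t1200\nTTGCAATC\t950\n")
counts = read_barcode_counts(example)
top_barcode, top_count = counts[0]
print(top_barcode, top_count)
```

Because the report is written in descending order of count, the first data row is the most common barcode.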
miseq_fastq_to_bam

Convert fastq read files to a single BAM file. Fastq file names must conform to the patterns emitted by MiSeq machines. Sample metadata must be provided in a SampleSheet.csv that corresponds to the fastq file name. Specifically, the _S##_ index in the fastq file name is used to find the corresponding row in the SampleSheet.

usage: illumina.py miseq_fastq_to_bam [-h] [--inFastq2 INFASTQ2]
                                      [--runInfo RUNINFO]
                                      [--sequencing_center SEQUENCING_CENTER]
                                      [--JVMmemory JVMMEMORY]
                                      [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                      [--version] [--tmp_dir TMP_DIR]
                                      [--tmp_dirKeep]
                                      outBam sampleSheet inFastq1
Positional arguments:
outBam Output BAM file.
sampleSheet Input SampleSheet.csv file.
inFastq1 Input fastq file; 1st end of paired-end reads if paired.
Options:
--inFastq2 Input fastq file; 2nd end of paired-end reads.
--runInfo Input RunInfo.xml file.
--sequencing_center
 Name of your sequencing center (default is the sequencing machine ID from the RunInfo.xml)
--JVMmemory=2g JVM virtual memory size (default: 2g)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
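
The _S##_ index mentioned in the description can be pulled out of a file name with a regular expression. The sketch below is only an illustration of that naming convention, not the tool's own code: the function name sample_index_from_fastq and the example file name are hypothetical, and the lookup of the matching SampleSheet row is deliberately left out.

```python
import re
from pathlib import Path

# Matches the "_S<number>_" token found in MiSeq-style fastq file names.
_SAMPLE_INDEX = re.compile(r"_S(\d+)_")


def sample_index_from_fastq(path):
    """Return the integer from the first _S##_ token in the file name."""
    match = _SAMPLE_INDEX.search(Path(path).name)
    if match is None:
        raise ValueError(f"no _S##_ index found in {path!r}")
    return int(match.group(1))


# Hypothetical file name following the MiSeq pattern.
idx = sample_index_from_fastq("runs/SampleA_S3_L001_R1_001.fastq.gz")
print(idx)
```

Note that this takes the first _S##_ token it finds, so a sample name that itself contains such a token would need stricter matching.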
extract_fc_metadata

Extract RunInfo.xml and SampleSheet.csv from the provided Illumina directory.

usage: illumina.py extract_fc_metadata [-h]
                                       [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                       [--version] [--tmp_dir TMP_DIR]
                                       [--tmp_dirKeep]
                                       flowcell outRunInfo outSampleSheet
Positional arguments:
flowcell Illumina directory (possibly tarball)
outRunInfo Output RunInfo.xml file.
outSampleSheet Output SampleSheet.csv file.
Options:
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmp_dir=/tmp Base directory for temp files (default: /tmp)
--tmp_dirKeep=False
 Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
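
To check by hand whether a run tarball contains the two metadata files before calling extract_fc_metadata, something like the following could be used. It is a sketch built on Python's standard tarfile module and is not how illumina.py itself is implemented; the function name find_metadata_members and the tiny stand-in archive built for the demonstration are assumptions made for the example.

```python
import os
import tarfile
import tempfile

WANTED = ("RunInfo.xml", "SampleSheet.csv")


def find_metadata_members(tar_path):
    """Map each wanted file name to the first tar member with that base name."""
    found = {}
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            base = os.path.basename(member.name)
            if member.isfile() and base in WANTED and base not in found:
                found[base] = member.name
    return found


# Build a tiny stand-in run tarball so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = os.path.join(tmp, "run")
    os.makedirs(run_dir)
    for name in WANTED:
        with open(os.path.join(run_dir, name), "w") as handle:
            handle.write("placeholder\n")
    tar_path = os.path.join(tmp, "run.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(run_dir, arcname="run")
    members = find_metadata_members(tar_path)

print(sorted(members))
```

A missing key in the returned dictionary indicates that the corresponding file was not found anywhere in the archive.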