2.6. kmer_utils.py - commands for working with sets of kmers

Commands for working with sets of kmers

usage: kmer_utils.py subcommand

2.6.1. subcommands



Possible choices: build_kmer_db, dump_kmer_counts, filter_reads, kmers_binary_op, kmers_set_counts

2.6.2. Sub-commands

2.6.2.1. build_kmer_db

Build a database of kmers occurring in given sequences.

kmer_utils.py build_kmer_db [-h] [--kmerSize KMER_SIZE] [--minOccs MIN_OCCS]
                            [--maxOccs MAX_OCCS] [--counterCap COUNTER_CAP]
                            [--singleStrand] [--memLimitGb MEM_LIMIT_GB]
                            [--memLimitLaxness {0,1,2}] [--threads THREADS]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                            seq_files [seq_files ...] kmer_db

2.6.2.1.1. Positional Arguments

seq_files

Files from which to extract kmers (fasta/fastq/bam, fasta/fastq may be .gz or .bz2)

kmer_db

kmer database (with or without .kmc_pre/.kmc_suf suffix)

2.6.2.1.2. Named Arguments

--kmerSize, -k

kmer size

Default: 25

--minOccs, -ci

drop kmers with fewer than this many occurrences

Default: 1

--maxOccs, -cx

drop kmers with more than this many occurrences

Default: 2147483647

--counterCap, -cs

cap kmer counts at this value

Default: 255

--singleStrand, -b

do not add kmers from reverse complements of input sequences

Default: False

--memLimitGb

Max memory to use, in GB

Default: 8

--memLimitLaxness

Possible choices: 0, 1, 2

How strict is --memLimitGb? 0=strict, 1=lax, 2=even more lax

Default: 0

--threads

Number of threads; by default all cores are used

Default: 2

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False
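To make the interaction of --kmerSize, --minOccs, --maxOccs, --counterCap and --singleStrand concrete, here is a minimal pure-Python sketch of kmer counting. It is an illustration only, not the tool's implementation (the database is a kmc database, per the kmer_db argument). The function names, the skipping of kmers with non-ACGT bases, and the choice to apply the occurrence filter before the cap are assumptions of this sketch.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(seqs, kmer_size=25, min_occs=1, max_occs=2**31 - 1,
                counter_cap=255, single_strand=False):
    """Count kmers in seqs (hypothetical helper mirroring the options above)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - kmer_size + 1):
            kmer = seq[i:i + kmer_size]
            if set(kmer) - set("ACGT"):
                continue  # assumption: kmers with ambiguous bases are skipped
            if not single_strand:
                # Treat a kmer and its reverse complement as the same kmer.
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    # Assumption: drop by --minOccs/--maxOccs first, then cap at --counterCap.
    return {k: min(c, counter_cap)
            for k, c in counts.items() if min_occs <= c <= max_occs}
```

With the default double-stranded behavior, count_kmers(["ACGTT", "AACGT"], kmer_size=4) merges CGTT and its reverse complement AACG into a single entry; passing single_strand=True keeps them separate.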

2.6.2.2. dump_kmer_counts

Dump kmers and their counts from kmer database to a text file

kmer_utils.py dump_kmer_counts [-h] [--minOccs MIN_OCCS] [--maxOccs MAX_OCCS]
                               [--threads THREADS]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                               kmer_db out_kmers

2.6.2.2.1. Positional Arguments

kmer_db

kmer database (with or without .kmc_pre/.kmc_suf suffix)

out_kmers

text file to which to write the kmers

2.6.2.2.2. Named Arguments

--minOccs, -ci

drop kmers with fewer than this many occurrences

Default: 1

--maxOccs, -cx

drop kmers with more than this many occurrences

Default: 2147483647

--threads

Number of threads; by default all cores are used

Default: 2

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False
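The text file written to out_kmers can be loaded back into a dictionary for downstream analysis. The sketch below assumes each line holds a kmer and its count separated by whitespace; this layout is an assumption, not something stated above, so check a real output file before relying on it. The function name is hypothetical.

```python
def read_kmer_counts(path):
    """Load a kmer dump into {kmer: count}.

    Assumption: one "KMER<whitespace>COUNT" record per line.
    """
    counts = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or unexpected lines
            kmer, count = fields
            counts[kmer] = int(count)
    return counts
```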

2.6.2.3. filter_reads

Filter reads based on their kmer contents.

Can also be used to filter contigs or reference sequences, but we’ll refer to filtering of reads in the documentation.

Note that “occurrence of a kmer” means “occurrence of the kmer or its reverse complement” if kmer_db was built with single_strand==False.

Inputs:

kmer_db: the kmc kmer database

in_reads: the reads to filter. Can be a .fasta, .fastq, or .bam; fasta or fastq can be compressed with gzip or bzip2. If a .bam, a read pair is kept if either mate passes the filter.

Outputs:

out_reads: file to which filtered reads are written. The type is determined from the extension; the same types as above are supported.

Params:

db_min_occs: only consider database kmers with at least this count

db_max_occs: only consider database kmers with at most this count

read_min_occs: only keep reads with at least this many occurrences of kmers from the database

read_max_occs: only keep reads with no more than this many occurrences of kmers from the database

read_min_occs_frac: only keep reads with at least this many occurrences of kmers from the database, interpreted as a fraction of read length in kmers

read_max_occs_frac: only keep reads with no more than this many occurrences of kmers from the database, interpreted as a fraction of read length in kmers

(Note: minimal read kmer content can be given as absolute counts or fraction of read length, but not both).

hard_mask: if True, in the output reads, kmers not passing the filter are replaced by Ns

threads: use this many threads
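The filtering rule described by these parameters can be sketched in a few lines of Python. This is a hedged illustration, not the tool's code: the function names are hypothetical, the database is modeled as a plain dict of canonical kmers (as if built without --singleStrand), "read length in kmers" is taken to be len(read) - kmer_size + 1, and the fractional thresholds are compared without rounding.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def read_passes_filter(read, db_counts, kmer_size,
                       db_min_occs=1, db_max_occs=2**31 - 1,
                       read_min_occs=0, read_max_occs=2**31 - 1,
                       read_min_occs_frac=0.0, read_max_occs_frac=1.0):
    """Decide whether a read is kept (sketch under the assumptions above)."""
    if read_min_occs > 0 and read_min_occs_frac > 0:
        raise ValueError("give the minimum as a count or a fraction, not both")
    # Only database kmers whose count lies in [db_min_occs, db_max_occs] matter.
    usable = {k for k, c in db_counts.items() if db_min_occs <= c <= db_max_occs}
    read = read.upper()
    n_kmers = max(len(read) - kmer_size + 1, 0)
    # Count positions in the read whose kmer occurs in the usable database set.
    occs = sum(canonical(read[i:i + kmer_size]) in usable for i in range(n_kmers))
    return (read_min_occs <= occs <= read_max_occs
            and read_min_occs_frac * n_kmers <= occs <= read_max_occs_frac * n_kmers)
```

For example, with db_counts = {"AACG": 3, "ACGT": 1} and kmer_size=4, the read "ACGTT" contains two database kmers (ACGT, and CGTT via its reverse complement AACG), so it passes with the default thresholds but is rejected with read_min_occs=3.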

kmer_utils.py filter_reads [-h] [--dbMinOccs DB_MIN_OCCS]
                           [--dbMaxOccs DB_MAX_OCCS]
                           [--readMinOccs READ_MIN_OCCS]
                           [--readMaxOccs READ_MAX_OCCS]
                           [--readMinOccsFrac READ_MIN_OCCS_FRAC]
                           [--readMaxOccsFrac READ_MAX_OCCS_FRAC] [--hardMask]
                           [--threads THREADS]
                           [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                           [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                           kmer_db in_reads out_reads

2.6.2.3.1. Positional Arguments

kmer_db

kmer database (with or without .kmc_pre/.kmc_suf suffix)

in_reads

input reads, as fasta/fastq/bam

out_reads

output reads

2.6.2.3.2. Named Arguments

--dbMinOccs

ignore database kmers with count below this

Default: 1

--dbMaxOccs

ignore database kmers with count above this

Default: 2147483647

--readMinOccs

filter out reads with fewer than this many db kmers

Default: 0

--readMaxOccs

filter out reads with more than this many db kmers

Default: 2147483647

--readMinOccsFrac

filter out reads with fewer than this many db kmers, interpreted as fraction of read length

Default: 0.0

--readMaxOccsFrac

filter out reads with more than this many db kmers, interpreted as fraction of read length

Default: 1.0

--hardMask

In the output reads, mask the invalid kmers

Default: False

--threads

Number of threads; by default all cores are used

Default: 2

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False

2.6.2.4. kmers_binary_op

Perform a simple binary operation on kmer sets.

kmer_utils.py kmers_binary_op [-h] [--resultMinOccs RESULT_MIN_OCCS]
                              [--resultMaxOccs RESULT_MAX_OCCS]
                              [--resultCounterCap RESULT_COUNTER_CAP]
                              [--threads THREADS]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                              {intersect,union,kmers_subtract,counters_subtract}
                              kmer_db1 kmer_db2 kmer_db_out

2.6.2.4.1. Positional Arguments

op

Possible choices: intersect, union, kmers_subtract, counters_subtract

binary operation to perform

kmer_db1

first kmer set

kmer_db2

second kmer set

kmer_db_out

output kmer db

2.6.2.4.2. Named Arguments

--resultMinOccs

from the result drop kmers with counts below this

Default: 1

--resultMaxOccs

from the result drop kmers with counts above this

Default: 2147483647

--resultCounterCap

cap output counters at this value

Default: 255

--threads

Number of threads; by default all cores are used

Default: 2

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False
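The four operations are not spelled out above, so the following Python sketch shows one plausible reading on in-memory dicts of kmer counts. The counter rules used here (minimum for intersect, sum for union, first-set counts for kmers_subtract, positive differences for counters_subtract) are assumptions modeled on the defaults of KMC's simple set operations, and the function name is hypothetical; verify against the tool before depending on exact counts.

```python
def kmers_binary_op(op, db1, db2, result_min_occs=1,
                    result_max_occs=2**31 - 1, result_counter_cap=255):
    """Combine two {kmer: count} dicts (sketch; counter rules are assumptions)."""
    if op == "intersect":
        # kmers present in both sets; assumed counter = the smaller count
        merged = {k: min(db1[k], db2[k]) for k in db1.keys() & db2.keys()}
    elif op == "union":
        # kmers present in either set; assumed counter = sum of counts
        merged = {k: db1.get(k, 0) + db2.get(k, 0)
                  for k in db1.keys() | db2.keys()}
    elif op == "kmers_subtract":
        # kmers of the first set that are absent from the second
        merged = {k: c for k, c in db1.items() if k not in db2}
    elif op == "counters_subtract":
        # subtract counts; keep only kmers whose count stays positive
        merged = {k: c - db2.get(k, 0) for k, c in db1.items()}
        merged = {k: c for k, c in merged.items() if c > 0}
    else:
        raise ValueError("unknown op: " + op)
    # Apply --resultMinOccs/--resultMaxOccs, then cap at --resultCounterCap.
    return {k: min(c, result_counter_cap)
            for k, c in merged.items()
            if result_min_occs <= c <= result_max_occs}
```

Under these assumptions, kmers_subtract removes a kmer as soon as it appears in the second set at all, while counters_subtract removes it only when the second set's count is at least as large as the first's.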

2.6.2.5. kmers_set_counts

Copy the kmer database, setting all kmer counts in the output to the given value.

kmer_utils.py kmers_set_counts [-h] [--threads THREADS]
                               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                               [--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
                               kmer_db_in value kmer_db_out

2.6.2.5.1. Positional Arguments

kmer_db_in

input kmer db

value

all kmer counts in the output will be set to this value

kmer_db_out

output kmer db

2.6.2.5.2. Named Arguments

--threads

Number of threads; by default all cores are used

Default: 2

--loglevel

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION

Verboseness of output. [default: ‘INFO’]

Default: 'INFO'

--version, -V

show program’s version number and exit

--tmp_dir

Base directory for temp files. [default: ‘/tmp’]

Default: '/tmp'

--tmp_dirKeep

Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

Default: False
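When the subcommands are driven from a script, it can help to build the argument lists programmatically. The sketch below assembles command lines for build_kmer_db and filter_reads using only flags documented above; the helper names are hypothetical, and the script path "kmer_utils.py" is an assumption about how the tool is installed.

```python
def build_kmer_db_cmd(seq_files, kmer_db, kmer_size=25, single_strand=False,
                      threads=None, script="kmer_utils.py"):
    """Return an argv list for `kmer_utils.py build_kmer_db` (hypothetical helper)."""
    cmd = [script, "build_kmer_db", "--kmerSize", str(kmer_size)]
    if single_strand:
        cmd.append("--singleStrand")
    if threads is not None:
        cmd += ["--threads", str(threads)]
    # Positional arguments: one or more sequence files, then the database name.
    return cmd + list(seq_files) + [kmer_db]


def filter_reads_cmd(kmer_db, in_reads, out_reads, read_min_occs=None,
                     hard_mask=False, script="kmer_utils.py"):
    """Return an argv list for `kmer_utils.py filter_reads` (hypothetical helper)."""
    cmd = [script, "filter_reads"]
    if read_min_occs is not None:
        cmd += ["--readMinOccs", str(read_min_occs)]
    if hard_mask:
        cmd.append("--hardMask")
    # Positional arguments: database, input reads, output reads.
    return cmd + [kmer_db, in_reads, out_reads]
```

The resulting lists can be passed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems with file names.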