2.11. file_utils.py - utilities to perform various file manipulations
Utilities for dealing with files.
usage: file_utils.py subcommand
2.11.2. Sub-commands
2.11.2.1. merge_tarballs
Merges separate tarballs into one tarball. Data can be piped in and/or out.
file_utils.py merge_tarballs [-h] [--extractToDiskPath EXTRACT_TO_DISK_PATH]
[--pipeInHint PIPE_HINT_IN]
[--pipeOutHint PIPE_HINT_OUT] [--threads THREADS]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version] [--tmp_dir TMP_DIR] [--tmp_dirKeep]
out_tarball in_tarballs [in_tarballs ...]
2.11.2.1.1. Positional Arguments
- out_tarball
output tarball (*.tar.gz|*.tar.lz4|*.tar.bz2|*.tar.zst|-);
compression is inferred from the file extension.
Note: if “-” is used, output is written to stdout, and
--pipeOutHint must be provided to indicate the compression type when it is not gzip (gzip is used by default).
- in_tarballs
input tarballs (*.tar.gz|*.tar.lz4|*.tar.bz2|*.tar.zst)
2.11.2.1.2. Named Arguments
- --extractToDiskPath
If specified, the tar contents will also be extracted to a local directory.
- --pipeInHint
If specified, this compression type is used for piped input.
Default:
'gz'
- --pipeOutHint
If specified, this compression type is used for piped output.
Default:
'gz'
- --threads
Number of threads; by default all cores are used
Default:
2
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
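The merge itself can be illustrated with Python's standard library. This is only a minimal sketch of the idea, not the implementation used by file_utils.py: the function name `merge_tarballs_sketch` is hypothetical, the output is always gzip-compressed, and the standard `tarfile` module is assumed to be sufficient, which means it does not cover the .tar.lz4 inputs (or, on most Python versions, .tar.zst) that the real sub-command accepts.

```python
import tarfile


def merge_tarballs_sketch(out_path, in_paths):
    """Copy every member of each input tarball into one gzip-compressed tarball.

    Hypothetical helper for illustration; not part of file_utils.py.
    """
    with tarfile.open(out_path, "w:gz") as out_tar:
        for in_path in in_paths:
            # "r:*" lets tarfile detect the input compression transparently.
            with tarfile.open(in_path, "r:*") as in_tar:
                for member in in_tar:
                    # Only regular files carry a data stream; directories and
                    # links are written as header-only entries.
                    fileobj = in_tar.extractfile(member) if member.isfile() else None
                    out_tar.addfile(member, fileobj)
```

Members are appended in input order, so a file name that occurs in more than one input ends up in the output more than once.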
2.11.2.2. rename_fasta_sequences
Renames the sequences in a fasta file. Behavior modes:
1. If the input file has exactly one sequence and suffix_always is False, then the output file’s sequence is named new_name.
2. In all other cases, the output file’s sequences are named <new_name>-<i>, where <i> is an increasing number from 1..<# of sequences>.
file_utils.py rename_fasta_sequences [-h] [--suffix_always]
[--tmp_dir TMP_DIR] [--tmp_dirKeep]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
in_fasta out_fasta new_name
2.11.2.2.1. Positional Arguments
- in_fasta
input fasta sequences
- out_fasta
output (renamed) fasta sequences
- new_name
new sequence base name
2.11.2.2.2. Named Arguments
- --suffix_always
append numeric index ‘-1’ to <new_name> if only one sequence exists in <input> (default: False)
Default:
False
- --tmp_dir
Base directory for temp files. [default: ‘/tmp’]
Default:
'/tmp'
- --tmp_dirKeep
Keep the tmp_dir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
Default:
False
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
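The two behavior modes of rename_fasta_sequences can be expressed as a short function. This is a sketch of the naming rule only, assuming the fasta content is supplied as a list of lines without trailing newlines; the function name `rename_fasta_sketch` is hypothetical and is not part of file_utils.py.

```python
def rename_fasta_sketch(fasta_lines, new_name, suffix_always=False):
    """Return fasta lines with every header replaced according to the naming rule.

    Hypothetical helper for illustration; not part of file_utils.py.
    """
    lines = list(fasta_lines)
    n_seqs = sum(1 for line in lines if line.startswith(">"))
    # A bare new_name is used only for a single sequence without suffix_always.
    use_suffix = suffix_always or n_seqs != 1
    renamed = []
    index = 0
    for line in lines:
        if line.startswith(">"):
            index += 1
            renamed.append(f">{new_name}-{index}" if use_suffix else f">{new_name}")
        else:
            # Sequence lines pass through unchanged.
            renamed.append(line)
    return renamed
```

For example, a single-sequence input with new_name `x` yields the header `>x`, while the same input with suffix_always set yields `>x-1`.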
2.11.2.3. tsv_derived_cols
Modifies a metadata table by computing derived columns on the fly, adding new columns or replacing existing ones.
file_utils.py tsv_derived_cols [-h] [--table_map [TABLE_MAP ...]]
[--lab_highlight_loc LAB_HIGHLIGHT_LOC]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
in_tsv out_tsv
2.11.2.3.1. Positional Arguments
- in_tsv
input metadata
- out_tsv
output metadata
2.11.2.3.2. Named Arguments
- --table_map
Mapping tables. Each mapping table is a tsv with a header. The first column is the output column name for this mapping (it will be created or overwritten). The subsequent columns are matching criteria. The value in the first column is written to the output column. The exception is when all match columns are ‘*’; in this case, the value in the first column is the column header name to copy over.
- --lab_highlight_loc
This option copies the ‘originating_lab’ and ‘submitting_lab’ columns to new ones including a prefix, but only if they match certain criteria. The value of this string must be of the form prefix;col_header=value:col_header=value. For example, ‘MA;country=USA:division=Massachusetts’ will copy the originating_lab and submitting_lab columns to MA_originating_lab and MA_submitting_lab, but only for those rows where country=USA and division=Massachusetts.
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
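The --lab_highlight_loc string format (prefix;col_header=value:col_header=value) can be illustrated with a small parser and row filter. This is a sketch under stated assumptions rather than the tool's actual code: the function names are hypothetical, rows are plain dictionaries, and rows that do not match are assumed to receive an empty string in the new columns, which the help text above does not specify.

```python
def parse_highlight_spec(spec):
    """Split 'prefix;col=value:col=value' into (prefix, {col: value})."""
    prefix, _, criteria_text = spec.partition(";")
    criteria = dict(item.split("=", 1) for item in criteria_text.split(":"))
    return prefix, criteria


def apply_highlight(rows, spec, lab_cols=("originating_lab", "submitting_lab")):
    """Copy the lab columns to prefixed columns for rows matching every criterion.

    Hypothetical helper for illustration; not part of file_utils.py.
    """
    prefix, criteria = parse_highlight_spec(spec)
    out_rows = []
    for row in rows:
        new_row = dict(row)
        matches = all(row.get(col) == val for col, val in criteria.items())
        for col in lab_cols:
            # Assumption: non-matching rows get an empty value in the new column.
            new_row[f"{prefix}_{col}"] = row.get(col, "") if matches else ""
        out_rows.append(new_row)
    return out_rows
```

With the spec `MA;country=USA:division=Massachusetts`, only rows whose country is USA and whose division is Massachusetts have MA_originating_lab and MA_submitting_lab filled in.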
2.11.2.4. tsv_join
Performs a full outer join of tables.
file_utils.py tsv_join [-h] --join_id JOIN_ID
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
[--version]
in_tsvs [in_tsvs ...] out_tsv
2.11.2.4.1. Positional Arguments
- in_tsvs
input tsvs
- out_tsv
output tsv
2.11.2.4.2. Named Arguments
- --join_id
column name to join on
- --loglevel
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
Verboseness of output. [default: ‘INFO’]
Default:
'INFO'
- --version, -V
show program’s version number and exit
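A full outer join keeps every join key that appears in any input table and every column that appears in any input table. The sketch below shows the idea on in-memory rows; it is not the tool's implementation. The function name `full_outer_join` is hypothetical, and two behaviors are assumptions that the help text does not state: missing cells are filled with empty strings, and when two tables share a column, the value from the later table wins.

```python
def full_outer_join(tables, join_id):
    """Join lists of dict rows on join_id, keeping all keys and all columns.

    Hypothetical helper for illustration; not part of file_utils.py.
    Returns (header, rows), where rows are dicts covering every header column.
    """
    header = [join_id]
    merged = {}  # join key -> combined row, in first-seen order
    for table in tables:
        for row in table:
            for col in row:
                if col not in header:
                    header.append(col)
            # Assumption: later tables overwrite earlier values for shared columns.
            merged.setdefault(row[join_id], {}).update(row)
    # Assumption: cells with no value in any input are filled with "".
    rows = [{col: combined.get(col, "") for col in header} for combined in merged.values()]
    return header, rows
```

Reading the inputs with `csv.DictReader(f, delimiter="\t")` and writing the result with `csv.DictWriter` would connect this sketch to actual tsv files.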