ani_rep¶
Compute the average nucleotide identity (ANI) between each input genome and all GTDB-Tk representative genomes.
Arguments¶
usage: gtdbtk ani_rep (--genome_dir GENOME_DIR | --batchfile BATCHFILE)
--out_dir OUT_DIR [--no_mash] [--mash_k MASH_K]
[--mash_s MASH_S] [--mash_d MASH_D] [--mash_v MASH_V]
[--mash_db MASH_DB] [--min_af MIN_AF] [-x EXTENSION]
[--prefix PREFIX] [--cpus CPUS] [--tmpdir TMPDIR]
[--debug] [-h]
mutually exclusive required arguments¶
- --genome_dir
directory containing genome files in FASTA format
- --batchfile
path to file describing genomes - tab separated in 2 or 3 columns (FASTA file, genome ID, translation table [optional])
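A batchfile in the two-column form (FASTA file, genome ID) can be generated from a directory of genomes. The following is a minimal sketch and not part of GTDB-Tk: the helper name `write_batchfile` is hypothetical, and using the file name without its extension as the genome ID is an assumption made for illustration.

```python
from pathlib import Path


def write_batchfile(genome_dir, batchfile, extension="fna"):
    """Write a 2-column, tab-separated batchfile: FASTA path, genome ID.

    Hypothetical helper: the genome ID is taken to be the file name
    without its extension, which is an assumption, not a GTDB-Tk rule.
    Returns the number of genomes written.
    """
    suffix = "." + extension
    rows = []
    # Sort so the batchfile order is reproducible between runs.
    for fasta in sorted(Path(genome_dir).iterdir()):
        if fasta.is_file() and fasta.name.endswith(suffix):
            genome_id = fasta.name[: -len(suffix)]
            rows.append(f"{fasta}\t{genome_id}")
    Path(batchfile).write_text("".join(row + "\n" for row in rows))
    return len(rows)
```

The resulting file can then be passed with `--batchfile` in place of `--genome_dir`.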
required named arguments¶
- --out_dir
directory to output files
optional Mash arguments¶
- --no_mash
skip pre-filtering of genomes using Mash
- --mash_k
k-mer size [1-32]
Default: 16
- --mash_s
maximum number of non-redundant hashes
Default: 5000
- --mash_d
maximum distance to keep [0-1]
Default: 0.15
- --mash_v
maximum p-value to keep [0-1]
Default: 1.0
- --mash_db
path to save/read (if exists) the Mash reference sketch database (.msh)
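The effect of the `--mash_d` and `--mash_v` thresholds can be pictured as a filter over the tabular output of `mash dist` (reference, query, distance, p-value, shared hashes). This is a sketch under assumptions rather than GTDB-Tk's own code: the function name `filter_mash_hits` is hypothetical, and treating both thresholds as inclusive is an assumption.

```python
def filter_mash_hits(lines, max_d=0.15, max_p=1.0):
    """Keep (reference, query, distance) for rows within both thresholds.

    Each row is expected in `mash dist` tabular form:
    reference, query, distance, p-value, shared-hashes.
    Treating the thresholds as inclusive is an assumption.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ref, query, dist, p_value = line.split("\t")[:4]
        if float(dist) <= max_d and float(p_value) <= max_p:
            kept.append((ref, query, float(dist)))
    return kept
```

Only the genome pairs that survive such a pre-filter need a full ANI calculation, which is why `--no_mash` is expected to increase the number of comparisons.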
optional skani arguments¶
- --min_af
minimum alignment fraction to assign genome to a species cluster
Default: 0.5
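To illustrate how an alignment-fraction threshold such as `--min_af` can gate an assignment, the sketch below picks the highest-ANI hit among those meeting the threshold. The function name `best_hit`, the `(reference_id, ani, af)` tuple layout, and the "highest ANI wins" rule are all assumptions for illustration; the source does not spell out GTDB-Tk's selection rule, and any ANI cutoff it applies is not modelled here.

```python
def best_hit(hits, min_af=0.5):
    """Return the highest-ANI hit whose alignment fraction is >= min_af.

    `hits` is an iterable of (reference_id, ani, af) tuples; this layout
    and the selection rule are assumptions, not GTDB-Tk's documented logic.
    Returns None when no hit meets the alignment-fraction threshold.
    """
    eligible = [hit for hit in hits if hit[2] >= min_af]
    if not eligible:
        return None
    return max(eligible, key=lambda hit: hit[1])
```

With this rule, a hit with very high ANI but an alignment fraction below the threshold is ignored in favour of a lower-ANI hit that aligns over more of the genome.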
Named Arguments¶
- -x, --extension
extension of files to process, gz = gzipped
Default: “fna”
- --prefix
prefix for all output files
Default: “gtdbtk”
- --cpus
number of CPUs to use
Default: 1
- --tmpdir
specify alternative directory for temporary files
Default: “/tmp”
- --debug
create intermediate files for debugging purposes
Files output¶
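gtdbtk.ani_summary.tsv (summary of results; file names are shown with the default prefix)
gtdbtk.ani_closest.tsv (closest representative hits)
intermediate_results/mash/ (Mash sketch files)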
Example¶
Input¶
gtdbtk ani_rep --batchfile genomes/500_batchfile.tsv --out_dir user_vs_reps --cpus 90
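When the command is launched from a script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only flags listed in the usage above; the function name `build_ani_rep_command` and the paths in the example call are hypothetical.

```python
import shlex


def build_ani_rep_command(out_dir, genome_dir=None, batchfile=None,
                          cpus=1, no_mash=False, min_af=None):
    """Build an argument list for `gtdbtk ani_rep` from documented flags."""
    # --genome_dir and --batchfile are mutually exclusive; one is required.
    if (genome_dir is None) == (batchfile is None):
        raise ValueError("give exactly one of genome_dir or batchfile")
    cmd = ["gtdbtk", "ani_rep"]
    if genome_dir is not None:
        cmd += ["--genome_dir", str(genome_dir)]
    else:
        cmd += ["--batchfile", str(batchfile)]
    cmd += ["--out_dir", str(out_dir), "--cpus", str(cpus)]
    if no_mash:
        cmd.append("--no_mash")
    if min_af is not None:
        cmd += ["--min_af", str(min_af)]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_ani_rep_command("ani_rep/", genome_dir="genomes/", cpus=70)
print(shlex.join(cmd))
```

On a system where GTDB-Tk is installed, the list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.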
Output¶
[2024-03-27 16:43:25] INFO: GTDB-Tk v2.3.2
[2024-03-27 16:43:25] INFO: gtdbtk ani_rep --batchfile genomes/500_batchfile.tsv --out_dir user_vs_reps --cpus 90
[2024-03-27 16:43:25] INFO: Using GTDB-Tk reference data version r214: /srv/db/gtdbtk/official/release214_skani/release214
[2024-03-27 16:43:25] INFO: Loading reference genomes.
[2024-03-27 16:43:25] INFO: Using Mash version 2.2.2
[2024-03-27 16:43:25] INFO: Creating Mash sketch file: user_vs_reps/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2024-03-27 16:43:27] INFO: Completed 500 genomes in 1.42 seconds (351.61 genomes/second).
[2024-03-27 16:43:27] INFO: Creating Mash sketch file: user_vs_reps/intermediate_results/mash/gtdbtk.gtdb_ref_sketch.msh
[2024-03-27 16:46:55] INFO: Completed 85,205 genomes in 3.47 minutes (24,519.48 genomes/minute).
[2024-03-27 16:46:55] INFO: Calculating Mash distances.
[2024-03-27 16:47:37] INFO: Calculating ANI with skani v0.2.1.
[2024-03-27 16:47:45] INFO: Completed 4,383 comparisons in 7.68 seconds (570.58 comparisons/second).
[2024-03-27 16:47:46] INFO: Summary of results saved to: user_vs_reps/gtdbtk.ani_summary.tsv
[2024-03-27 16:47:46] INFO: Closest representative hits saved to: user_vs_reps/gtdbtk.ani_closest.tsv
[2024-03-27 16:47:46] INFO: Done.