ani_rep

Compute the average nucleotide identity (ANI) of each input genome against all GTDB-Tk representative genomes.

Arguments

usage: gtdbtk ani_rep (--genome_dir GENOME_DIR | --batchfile BATCHFILE)
                      --out_dir OUT_DIR [--no_mash] [--mash_k MASH_K]
                      [--mash_s MASH_S] [--mash_d MASH_D] [--mash_v MASH_V]
                      [--mash_db MASH_DB] [--min_af MIN_AF] [-x EXTENSION]
                      [--prefix PREFIX] [--cpus CPUS] [--tmpdir TMPDIR]
                      [--debug] [-h]

mutually exclusive required arguments

--genome_dir

directory containing genome files in FASTA format

--batchfile

path to a tab-separated file describing the genomes, with 2 or 3 columns (FASTA file, genome ID, translation table [optional])
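A batchfile in the two-column form described above can be generated from a directory of FASTA files. The following is a minimal sketch, not part of GTDB-Tk: the function name `write_batchfile` is hypothetical, and it assumes the genome ID is simply the file name with the extension removed. The optional third column (translation table) is omitted.

```python
from pathlib import Path


def write_batchfile(genome_dir, batchfile, extension="fna"):
    """Write a two-column batchfile (FASTA path, genome ID) and return the row count."""
    genome_dir = Path(genome_dir)
    rows = []
    for fasta in sorted(genome_dir.glob(f"*.{extension}")):
        # Assumption: genome ID = file name without the ".<extension>" suffix.
        genome_id = fasta.name[: -(len(extension) + 1)]
        rows.append(f"{fasta}\t{genome_id}")
    # One tab-separated row per genome, each terminated by a newline.
    Path(batchfile).write_text("".join(row + "\n" for row in rows))
    return len(rows)
```

For example, `write_batchfile("genomes/", "genomes/batchfile.tsv")` would list every `.fna` file in `genomes/`; pass `extension="fna.gz"` for gzipped input.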

required named arguments

--out_dir

directory to write output files to

optional Mash arguments

--no_mash

skip pre-filtering of genomes using Mash

--mash_k

k-mer size [1-32]

Default: 16

--mash_s

maximum number of non-redundant hashes

Default: 5000

--mash_d

maximum distance to keep [0-1]

Default: 0.15

--mash_v

maximum p-value to keep [0-1]

Default: 1.0

--mash_db

path to save/read (if exists) the Mash reference sketch database (.msh)
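The effect of the --mash_d and --mash_v thresholds can be illustrated with a small filter over Mash distance rows. This is a sketch under stated assumptions, not GTDB-Tk's own code: it assumes rows in the tab-separated layout printed by `mash dist` (reference, query, distance, p-value, shared hashes), treats both thresholds as inclusive, and uses the hypothetical function name `filter_mash_hits`.

```python
def filter_mash_hits(lines, max_d=0.15, max_p=1.0):
    """Keep (reference, query, distance) for rows that pass both thresholds."""
    kept = []
    for line in lines:
        # Assumed columns: reference, query, distance, p-value, shared hashes.
        ref, query, dist, p_value, _shared = line.rstrip("\n").split("\t")
        # Assumption: both thresholds are inclusive upper bounds.
        if float(dist) <= max_d and float(p_value) <= max_p:
            kept.append((ref, query, float(dist)))
    return kept
```

With the defaults shown above, only the distance threshold removes pairs, since a maximum p-value of 1.0 accepts every row.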

optional skani arguments

--min_af

minimum alignment fraction to assign genome to a species cluster

Default: 0.5
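One way to picture how --min_af interacts with ANI is to discard hits below the alignment fraction cutoff before choosing the closest representative. The sketch below is illustrative only and is not taken from GTDB-Tk: the function name `closest_representative` and the `(reference, ani, af)` tuple layout are assumptions, ANI is ranked first with AF as a tie-breaker, and no species-level ANI threshold is applied.

```python
def closest_representative(hits, min_af=0.5):
    """Return the highest-ANI (reference, ani, af) hit with af >= min_af, or None."""
    # Drop hits whose alignment fraction is below the cutoff.
    eligible = [hit for hit in hits if hit[2] >= min_af]
    if not eligible:
        return None
    # Rank by ANI, then by AF to break ties (an assumption of this sketch).
    return max(eligible, key=lambda hit: (hit[1], hit[2]))
```

In this sketch a hit with a very high ANI but an alignment fraction below the cutoff is ignored, so a lower-ANI representative with better coverage can be selected instead.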

Named Arguments

-x, --extension

extension of files to process (gz = gzipped)

Default: “fna”

--prefix

prefix for all output files

Default: “gtdbtk”

--cpus

number of CPUs to use

Default: 1

--tmpdir

specify alternative directory for temporary files

Default: “/tmp”

--debug

create intermediate files for debugging purposes

Example

Input

gtdbtk ani_rep --batchfile genomes/500_batchfile.tsv --out_dir user_vs_reps --cpus 90

Output

[2024-03-27 16:43:25] INFO: GTDB-Tk v2.3.2
[2024-03-27 16:43:25] INFO: gtdbtk ani_rep --batchfile genomes/500_batchfile.tsv --out_dir user_vs_reps --cpus 90
[2024-03-27 16:43:25] INFO: Using GTDB-Tk reference data version r214: /srv/db/gtdbtk/official/release214_skani/release214
[2024-03-27 16:43:25] INFO: Loading reference genomes.
[2024-03-27 16:43:25] INFO: Using Mash version 2.2.2
[2024-03-27 16:43:25] INFO: Creating Mash sketch file: user_vs_reps/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2024-03-27 16:43:27] INFO: Completed 500 genomes in 1.42 seconds (351.61 genomes/second).
[2024-03-27 16:43:27] INFO: Creating Mash sketch file: user_vs_reps/intermediate_results/mash/gtdbtk.gtdb_ref_sketch.msh
[2024-03-27 16:46:55] INFO: Completed 85,205 genomes in 3.47 minutes (24,519.48 genomes/minute).
[2024-03-27 16:46:55] INFO: Calculating Mash distances.
[2024-03-27 16:47:37] INFO: Calculating ANI with skani v0.2.1.
[2024-03-27 16:47:45] INFO: Completed 4,383 comparisons in 7.68 seconds (570.58 comparisons/second).
[2024-03-27 16:47:46] INFO: Summary of results saved to: user_vs_reps/gtdbtk.ani_summary.tsv
[2024-03-27 16:47:46] INFO: Closest representative hits saved to: user_vs_reps/gtdbtk.ani_closest.tsv
[2024-03-27 16:47:46] INFO: Done.
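The command recorded in the log above can also be launched from a script. The following sketch builds the argument list from the flags documented in the usage section and enforces the mutually exclusive input options. The helper names `build_ani_rep_cmd` and `run_ani_rep` are hypothetical, and running it assumes `gtdbtk` is installed and on the PATH.

```python
import subprocess


def build_ani_rep_cmd(out_dir, genome_dir=None, batchfile=None, cpus=1, extra=()):
    """Build the argument list for `gtdbtk ani_rep` from the documented flags."""
    # --genome_dir and --batchfile are mutually exclusive, and one is required.
    if (genome_dir is None) == (batchfile is None):
        raise ValueError("pass exactly one of genome_dir or batchfile")
    cmd = ["gtdbtk", "ani_rep"]
    if genome_dir is not None:
        cmd += ["--genome_dir", str(genome_dir)]
    else:
        cmd += ["--batchfile", str(batchfile)]
    cmd += ["--out_dir", str(out_dir), "--cpus", str(cpus)]
    # Any further documented options, e.g. ["--no_mash"] or ["--min_af", "0.6"].
    cmd += list(extra)
    return cmd


def run_ani_rep(**kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_ani_rep_cmd(**kwargs), check=True)
```

Calling `run_ani_rep(out_dir="user_vs_reps", batchfile="genomes/500_batchfile.tsv", cpus=90)` would issue the same command that appears in the second line of the log.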