ani_rep

Compute the average nucleotide identity (ANI) between each input genome and all GTDB-Tk representative genomes.

Arguments

usage: gtdbtk ani_rep (--genome_dir GENOME_DIR | --batchfile BATCHFILE)
                      --out_dir OUT_DIR [--no_mash] [--mash_k MASH_K]
                      [--mash_s MASH_S] [--mash_d MASH_D] [--mash_v MASH_V]
                      [--mash_db MASH_DB] [--min_af MIN_AF] [-x EXTENSION]
                      [--prefix PREFIX] [--cpus CPUS] [--tmpdir TMPDIR]
                      [--debug] [-h]

mutually exclusive required arguments

--genome_dir

directory containing genome files in FASTA format

--batchfile

path to a tab-separated file describing the genomes, with 2 or 3 columns (FASTA file, genome ID, translation table [optional])
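A batchfile in the two-column form can be generated from a directory of FASTA files. The following is a minimal sketch, not part of GTDB-Tk: the helper name `write_batchfile` is hypothetical, and it assumes the genome ID should be the file name with its extension removed.

```python
from pathlib import Path


def write_batchfile(genome_dir, batchfile, extension="fna"):
    """Write a two-column, tab-separated batchfile (FASTA path, genome ID).

    Assumption: the genome ID is the file name without its extension.
    Returns the number of genomes written.
    """
    suffix = "." + extension
    # Collect matching files first so the output file is never picked up.
    fasta_files = sorted(
        p for p in Path(genome_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
    with open(batchfile, "w") as out:
        for path in fasta_files:
            genome_id = path.name[:-len(suffix)]
            out.write(f"{path}\t{genome_id}\n")
    return len(fasta_files)
```

The resulting file can then be passed with `--batchfile` in place of `--genome_dir`.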

required named arguments

--out_dir

directory to write output files to

optional Mash arguments

--no_mash

skip pre-filtering of genomes using Mash

--mash_k

k-mer size [1-32]

Default: 16

--mash_s

maximum number of non-redundant hashes

Default: 5000

--mash_d

maximum distance to keep [0-1]

Default: 0.15

--mash_v

maximum p-value to keep [0-1]

Default: 1.0

--mash_db

path to the Mash reference sketch database (.msh); the file is read if it exists, otherwise it is created and saved at this path
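To illustrate how the `--mash_d` and `--mash_v` thresholds combine, here is a minimal sketch of a pre-filter over Mash results. It is not GTDB-Tk's or Mash's own code: the function name, the tuple layout, and the choice of inclusive comparisons are all assumptions made for illustration.

```python
def filter_mash_hits(hits, max_d=0.15, max_p=1.0):
    """Keep hits whose distance and p-value are both within the thresholds.

    Each hit is assumed to be a (query, reference, distance, p_value) tuple.
    Defaults mirror the documented defaults for --mash_d and --mash_v.
    Inclusive comparison (<=) is an assumption of this sketch.
    """
    return [h for h in hits if h[2] <= max_d and h[3] <= max_p]


# Hypothetical values, for illustration only.
example_hits = [
    ("query_1", "ref_A", 0.05, 0.0),
    ("query_1", "ref_B", 0.30, 0.0),
]
close_hits = filter_mash_hits(example_hits)
```

With the default p-value threshold of 1.0, only the distance threshold removes candidates; lowering `max_p` adds a second condition.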

optional FastANI arguments

--min_af

minimum alignment fraction to assign genome to a species cluster

Default: 0.5

named arguments

-x, --extension

extension of files to process; use gz for gzipped files

Default: “fna”

--prefix

prefix for all output files

Default: “gtdbtk”

--cpus

number of CPUs to use

Default: 1

--tmpdir

specify alternative directory for temporary files

Default: “/tmp”

--debug

create intermediate files for debugging purposes

Example

Input

gtdbtk ani_rep --genome_dir genomes/ --out_dir ani_rep/ --cpus 70
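When the command is launched from a script, the argument list can be assembled programmatically. The sketch below uses only flags listed in the usage text above; the function names `build_ani_rep_cmd` and `run_ani_rep` are hypothetical helpers, not part of GTDB-Tk.

```python
import subprocess


def build_ani_rep_cmd(out_dir, genome_dir=None, batchfile=None,
                      cpus=1, no_mash=False, min_af=None, extension=None):
    """Build the argument list for `gtdbtk ani_rep`."""
    # --genome_dir and --batchfile are mutually exclusive; one is required.
    if (genome_dir is None) == (batchfile is None):
        raise ValueError("provide exactly one of genome_dir or batchfile")
    cmd = ["gtdbtk", "ani_rep"]
    if genome_dir is not None:
        cmd += ["--genome_dir", str(genome_dir)]
    else:
        cmd += ["--batchfile", str(batchfile)]
    cmd += ["--out_dir", str(out_dir), "--cpus", str(cpus)]
    if no_mash:
        cmd.append("--no_mash")
    if min_af is not None:
        cmd += ["--min_af", str(min_af)]
    if extension is not None:
        cmd += ["-x", extension]
    return cmd


def run_ani_rep(**kwargs):
    """Run the command; requires GTDB-Tk and its reference data installed."""
    return subprocess.run(build_ani_rep_cmd(**kwargs), check=True)
```

Calling `build_ani_rep_cmd("ani_rep/", genome_dir="genomes/", cpus=70)` reproduces the command shown above as a list of arguments.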

Output

[2020-04-13 10:51:58] INFO: GTDB-Tk v1.1.0
[2020-04-13 10:51:58] INFO: gtdbtk ani_rep --genome_dir genomes/ --out_dir ani_rep/ --cpus 70
[2020-04-13 10:51:58] INFO: Using GTDB-Tk reference data version r89: /release89
[2020-04-13 10:51:59] INFO: Using Mash version 2.2.2
[2020-04-13 10:51:59] INFO: Creating Mash sketch file: ani_rep/intermediate_results/mash/gtdbtk.user_query_sketch.msh
==> Sketching 3 of 3 (100.0%) genomes
[2020-04-13 10:51:59] INFO: Creating Mash sketch file: ani_rep/intermediate_results/mash/gtdbtk.gtdb_ref_sketch.msh
==> Sketching 24706 of 24706 (100.0%) genomes
[2020-04-13 10:53:13] INFO: Calculating Mash distances.
[2020-04-13 10:53:14] INFO: Calculating ANI with FastANI.
==> Processing 874 of 874 (100.0%) comparisons.
[2020-04-13 10:53:23] INFO: Summary of results saved to: ani_rep/gtdbtk.ani_summary.tsv
[2020-04-13 10:53:23] INFO: Closest representative hits saved to: ani_rep/gtdbtk.ani_closest.tsv
[2020-04-13 10:53:23] INFO: Done.
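The log reports where the result files were written. A minimal sketch for recovering those paths from captured log text follows; it assumes the `... saved to: <path>` wording seen in the example output above, which may differ between GTDB-Tk versions. The function name `output_paths` is hypothetical.

```python
import re

# Matches lines such as:
#   [timestamp] INFO: Summary of results saved to: ani_rep/gtdbtk.ani_summary.tsv
SAVED_TO = re.compile(r"INFO: (?P<label>.+?) saved to: (?P<path>\S+)")


def output_paths(log_text):
    """Map each 'saved to' label found in the log to its file path."""
    return {m.group("label"): m.group("path")
            for m in SAVED_TO.finditer(log_text)}
```

The returned paths point to tab-separated files, which can be opened with any TSV reader.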