classify

Determine taxonomic classification of genomes.

Arguments

usage: gtdbtk classify (--genome_dir GENOME_DIR | --batchfile BATCHFILE)
                       --align_dir ALIGN_DIR --out_dir OUT_DIR
                       [--skip_ani_screen] [-x EXTENSION] [--prefix PREFIX]
                       [--cpus CPUS] [--pplacer_cpus PPLACER_CPUS]
                       [--scratch_dir SCRATCH_DIR] [--genes] [-f]
                       [--min_af MIN_AF] [--tmpdir TMPDIR] [--debug] [-h]

mutually exclusive required arguments

--genome_dir: directory containing genome files in FASTA format
--batchfile: path to file describing genomes - tab separated in 2 or 3 columns (FASTA file, genome ID, translation table [optional])
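
The two-column batchfile layout described above can be generated from a directory of FASTA files. The following is a minimal sketch, not part of GTDB-Tk: the function name `make_batchfile` is hypothetical, and it assumes the genome ID should be the file name without its extension. The optional third column (translation table) is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with GTDB-Tk): write a two-column,
# tab-separated batchfile with one line per genome:
#   <path to FASTA file> <TAB> <genome ID>
# Assumption: the genome ID is the file name with the extension removed.
# Usage: make_batchfile GENOME_DIR EXTENSION OUTPUT_FILE
make_batchfile() {
    local genome_dir="$1" ext="$2" out="$3"
    local f
    : > "$out"                      # create or truncate the output file
    for f in "$genome_dir"/*."$ext"; do
        [ -e "$f" ] || continue     # skip the literal pattern if nothing matches
        printf '%s\t%s\n' "$f" "$(basename "$f" ".$ext")" >> "$out"
    done
}
```

For example, `make_batchfile genomes fna batchfile.tsv` would list every `genomes/*.fna` file; the resulting file can then be passed with `--batchfile`.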

required named arguments

--align_dir: output directory of ‘align’ command
--out_dir: directory to output files
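
Since exactly one of `--genome_dir` and `--batchfile` must accompany `--align_dir` and `--out_dir`, it can help to assemble the command in one place and review it before running. The sketch below is a hypothetical wrapper (`classify_cmd` is not a GTDB-Tk command) that only prints the command line; it uses no flags beyond those documented here and assumes the paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print a `gtdbtk classify` command line
# without executing it.
# Usage: classify_cmd (genome_dir|batchfile) INPUT ALIGN_DIR OUT_DIR [CPUS]
# Assumption: paths contain no whitespace (they are printed unquoted).
classify_cmd() {
    local mode="$1" input="$2" align_dir="$3" out_dir="$4" cpus="${5:-1}"
    case "$mode" in
        genome_dir|batchfile) ;;    # the two mutually exclusive input modes
        *)
            echo "mode must be 'genome_dir' or 'batchfile'" >&2
            return 1
            ;;
    esac
    printf 'gtdbtk classify --%s %s --align_dir %s --out_dir %s --cpus %s\n' \
        "$mode" "$input" "$align_dir" "$out_dir" "$cpus"
}
```

Calling `classify_cmd batchfile genomes/3_batchfile.tsv 3_align/ 3_classify 50` prints the same command used in the example further down this page.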

Named Arguments

--skip_ani_screen

Skip the skani ANI screening step that classifies genomes before tree placement.

-x, --extension

extension of files to process, gz = gzipped

Default: “fna”

--prefix

prefix for all output files

Default: “gtdbtk”

--cpus

number of CPUs to use

Default: 1

--pplacer_cpus

number of CPUs to use during pplacer placement

--scratch_dir

reduce pplacer memory usage by writing to disk (slower).

--genes

indicates input files contain predicted proteins as amino acids (skip gene calling). Warning: This flag will skip the ANI comparison steps (ANI screen and classification).

-f, --full_tree

use the unsplit bacterial tree for the classify step; this is the original GTDB-Tk approach (version < 2) and requires more than 320 GB of RAM to load the reference tree

--min_af

minimum alignment fraction to assign genome to a species cluster

Default: 0.5

--tmpdir

specify alternative directory for temporary files

Default: “/tmp”

--debug

create intermediate files for debugging purposes

Files output

Example

Input

gtdbtk classify --batchfile genomes/3_batchfile.tsv --align_dir 3_align/ --out_dir 3_classify --cpus 50

Output

[2025-08-05 17:41:11] INFO: GTDB-Tk v2.5.0
[2025-08-05 17:41:11] INFO: gtdbtk classify --batchfile genomes/3_batchfile.tsv --align_dir 3_align/ --out_dir 3_classify --cpus 50
[2025-08-05 17:41:11] INFO: Using GTDB-Tk reference data version r226: /srv/db/gtdbtk/official/release226
[2025-08-05 17:41:13] INFO: Loading reference genomes.
[2025-08-05 17:41:13] INFO: Calculating all vs all ANI with skani v0.2.1.
[2025-08-05 17:41:14] INFO: Sketching genomes
[2025-08-05 17:42:21] INFO: Sketches done: 1min 6secs
[2025-08-05 17:42:22] INFO: Running comparisons
[2025-08-05 17:42:51] INFO: Comparisons finished, capturing results.
[2025-08-05 17:44:45] INFO: 0 genome(s) have been classified using the ANI pre-screening step.
[2025-08-05 17:44:45] TASK: Placing 3 bacterial genomes into backbone reference tree with pplacer using 50 CPUs (be patient).
[2025-08-05 17:44:45] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2025-08-05 17:48:20] INFO: Calculating RED values based on reference tree.
[2025-08-05 17:48:21] INFO: 3 out of 3 have an class assignments. Those genomes will be reclassified.
[2025-08-05 17:48:21] TASK: Placing 2 bacterial genomes into class-level reference tree 7 (1/2) with pplacer using 50 CPUs (be patient).
[2025-08-05 18:00:49] INFO: Calculating RED values based on reference tree.
[2025-08-05 18:00:54] TASK: Traversing tree to determine classification method.
[2025-08-05 18:00:54] INFO: Completed 2 genomes in 0.00 seconds (6,043.67 genomes/second).
[2025-08-05 18:00:54] TASK: Calculating average nucleotide identity using skani (v0.2.1).
[2025-08-05 18:00:55] INFO: Completed 13 comparisons in 1.18 seconds (11.02 comparisons/second).
[2025-08-05 18:00:55] INFO: 0 genome(s) have been classified using skani and pplacer.
[2025-08-05 18:00:56] TASK: Placing 1 bacterial genomes into class-level reference tree 2 (2/2) with pplacer using 50 CPUs (be patient).
[2025-08-05 18:12:59] INFO: Calculating RED values based on reference tree.
[2025-08-05 18:13:05] TASK: Traversing tree to determine classification method.
[2025-08-05 18:13:05] INFO: Completed 1 genome in 0.00 seconds (1,681.08 genomes/second).
[2025-08-05 18:13:05] INFO: 0 genome(s) have been classified using skani and pplacer.
[2025-08-05 18:13:05] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
[2025-08-05 18:13:05] INFO: Done.
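
Most of the wall-clock time in the run above falls between consecutive TASK lines (the pplacer placements). As a rough sketch for reviewing a saved copy of this console output, the hypothetical function below lists only the TASK entries with their timestamps. It assumes every line follows the `[YYYY-MM-DD HH:MM:SS] LEVEL: message` layout shown above and reads the log from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each TASK line of a GTDB-Tk console log as
#   HH:MM:SS <TAB> message
# Assumption: lines look like "[YYYY-MM-DD HH:MM:SS] LEVEL: message".
list_tasks() {
    awk '$3 == "TASK:" {
        time = substr($2, 1, length($2) - 1)   # drop the trailing "]"
        msg = $0
        sub(/^\[[^]]*\] TASK: /, "", msg)      # strip timestamp and level
        printf "%s\t%s\n", time, msg
    }'
}
```

If the output was captured to a file (for instance a hypothetical `run.log`), `list_tasks < run.log` gives a compact view of when each placement and traversal step started.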