classify¶
Determine taxonomic classification of genomes.
Arguments¶
usage: gtdbtk classify (--genome_dir GENOME_DIR | --batchfile BATCHFILE)
--align_dir ALIGN_DIR --out_dir OUT_DIR
[--skip_ani_screen] [-x EXTENSION] [--prefix PREFIX]
[--cpus CPUS] [--pplacer_cpus PPLACER_CPUS]
[--scratch_dir SCRATCH_DIR] [--genes] [-f]
[--min_af MIN_AF] [--tmpdir TMPDIR] [--debug] [-h]
mutually exclusive required arguments¶
- --genome_dir
directory containing genome files in FASTA format
- --batchfile
path to file describing genomes: tab-separated, with 2 or 3 columns (FASTA file, genome ID, translation table [optional])
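The batchfile layout described above can be generated from a directory of FASTA files. The following is a minimal sketch, not part of GTDB-Tk: the function name `make_batchfile` is hypothetical, it assumes the genome ID should be the file name without its extension, and it leaves out the optional third column (translation table).

```shell
# Hypothetical helper (not part of GTDB-Tk): writes one line per FASTA file
# as "<FASTA path><TAB><genome ID>", using the file name minus its
# extension as the genome ID (an assumption made for this sketch).
make_batchfile() {
    genome_dir=$1
    batchfile=$2
    ext=${3:-fna}                     # same default as -x/--extension

    : > "$batchfile"                  # create or truncate the output file
    for fasta in "$genome_dir"/*."$ext"; do
        [ -e "$fasta" ] || continue   # skip the literal pattern if nothing matched
        genome_id=$(basename "$fasta" ".$ext")
        printf '%s\t%s\n' "$fasta" "$genome_id" >> "$batchfile"
    done
}
```

For example, `make_batchfile genomes/ genomes/batchfile.tsv` would write one line per `.fna` file in `genomes/`; the resulting file can then be passed to `--batchfile`.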
required named arguments¶
- --align_dir
output directory of ‘align’ command
- --out_dir
directory to output files
Named Arguments¶
- --skip_ani_screen
Skip the skani ANI screening step that classifies genomes before tree placement.
- -x, --extension
extension of files to process, gz = gzipped
Default: “fna”
- --prefix
prefix for all output files
Default: “gtdbtk”
- --cpus
number of CPUs to use
Default: 1
- --pplacer_cpus
number of CPUs to use during pplacer placement
- --scratch_dir
reduce pplacer memory usage by writing to disk (slower).
- --genes
indicates input files contain predicted proteins as amino acids (skip gene calling).
Warning: This flag will skip the ANI comparison steps (ANI screen and classification).
- -f, --full_tree
use the unsplit bacterial tree for the classify step; this is the original GTDB-Tk approach (version < 2) and requires more than 320 GB of RAM to load the reference tree
- --min_af
minimum alignment fraction to assign genome to a species cluster
Default: 0.5
- --tmpdir
specify alternative directory for temporary files
Default: “/tmp”
- --debug
create intermediate files for debugging purposes
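Several of the optional arguments above can be combined in one call. The sketch below only prints a command line (a dry run) and does not invoke GTDB-Tk. The function name `print_classify_cmd` is hypothetical, and every path and number passed to it is a placeholder rather than a recommendation.

```shell
# Hypothetical dry-run helper: prints a classify command that adds
# --pplacer_cpus and --scratch_dir to the required arguments.
# It does not execute anything.
print_classify_cmd() {
    batchfile=$1
    align_dir=$2
    out_dir=$3
    cpus=$4
    pplacer_cpus=$5
    scratch_dir=$6

    printf 'gtdbtk classify --batchfile %s --align_dir %s --out_dir %s' \
        "$batchfile" "$align_dir" "$out_dir"
    printf ' --cpus %s --pplacer_cpus %s --scratch_dir %s\n' \
        "$cpus" "$pplacer_cpus" "$scratch_dir"
}
```

Calling `print_classify_cmd genomes/3_batchfile.tsv 3_align/ 3_classify 50 1 scratch/` prints the assembled command, which can be checked before it is run by hand.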
Files output¶
- classify
- intermediate_results
ani_screen
Example¶
Input¶
gtdbtk classify --batchfile genomes/3_batchfile.tsv --align_dir 3_align/ --out_dir 3_classify --cpus 50
Output¶
[2025-08-05 17:41:11] INFO: GTDB-Tk v2.5.0
[2025-08-05 17:41:11] INFO: gtdbtk classify --batchfile genomes/3_batchfile.tsv --align_dir 3_align/ --out_dir 3_classify --cpus 50
[2025-08-05 17:41:11] INFO: Using GTDB-Tk reference data version r226: /srv/db/gtdbtk/official/release226
[2025-08-05 17:41:13] INFO: Loading reference genomes.
[2025-08-05 17:41:13] INFO: Calculating all vs all ANI with skani v0.2.1.
[2025-08-05 17:41:14] INFO: Sketching genomes
[2025-08-05 17:42:21] INFO: Sketches done: 1min 6secs
[2025-08-05 17:42:22] INFO: Running comparisons
[2025-08-05 17:42:51] INFO: Comparisons finished, capturing results.
[2025-08-05 17:44:45] INFO: 0 genome(s) have been classified using the ANI pre-screening step.
[2025-08-05 17:44:45] TASK: Placing 3 bacterial genomes into backbone reference tree with pplacer using 50 CPUs (be patient).
[2025-08-05 17:44:45] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2025-08-05 17:48:20] INFO: Calculating RED values based on reference tree.
[2025-08-05 17:48:21] INFO: 3 out of 3 have an class assignments. Those genomes will be reclassified.
[2025-08-05 17:48:21] TASK: Placing 2 bacterial genomes into class-level reference tree 7 (1/2) with pplacer using 50 CPUs (be patient).
[2025-08-05 18:00:49] INFO: Calculating RED values based on reference tree.
[2025-08-05 18:00:54] TASK: Traversing tree to determine classification method.
[2025-08-05 18:00:54] INFO: Completed 2 genomes in 0.00 seconds (6,043.67 genomes/second).
[2025-08-05 18:00:54] TASK: Calculating average nucleotide identity using skani (v0.2.1).
[2025-08-05 18:00:55] INFO: Completed 13 comparisons in 1.18 seconds (11.02 comparisons/second).
[2025-08-05 18:00:55] INFO: 0 genome(s) have been classified using skani and pplacer.
[2025-08-05 18:00:56] TASK: Placing 1 bacterial genomes into class-level reference tree 2 (2/2) with pplacer using 50 CPUs (be patient).
[2025-08-05 18:12:59] INFO: Calculating RED values based on reference tree.
[2025-08-05 18:13:05] TASK: Traversing tree to determine classification method.
[2025-08-05 18:13:05] INFO: Completed 1 genome in 0.00 seconds (1,681.08 genomes/second).
[2025-08-05 18:13:05] INFO: 0 genome(s) have been classified using skani and pplacer.
[2025-08-05 18:13:05] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
[2025-08-05 18:13:05] INFO: Done.
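Log output like the above can be summarised with standard text tools. This is a minimal sketch that assumes the log has been saved to a file and that the message wording matches the example output shown here. Both function names are hypothetical, and the log path is whatever file the output was captured to.

```shell
# Hypothetical helper: prints the number of genomes reported as classified
# by the ANI pre-screening step (the INFO line shown in the example output).
ani_screen_count() {
    sed -n 's/.*INFO: \([0-9][0-9]*\) genome(s) have been classified using the ANI pre-screening step\..*/\1/p' "$1"
}

# Hypothetical helper: counts the pplacer placement tasks
# (lines containing "TASK: Placing").
placement_task_count() {
    grep -c 'TASK: Placing' "$1"
}
```

Both functions take the path to the saved log as their only argument, for example `ani_screen_count classify.log`.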