Installing GTDB-Tk
GTDB-Tk is available through multiple sources; you only need to choose one. If you are unsure which to choose, Bioconda is generally the easiest.
Sources
Alternatively, GTDB-Tk can be run online through KBase (third party). Note that the version may not be the most recent release.
Hardware requirements
| Domain | Memory | Storage | Time |
|---|---|---|---|
| Archaea | ~60 GB | ~106 GB | ~90 minutes / 1,000 genomes @ 64 CPUs |
| Bacteria | ~90 GB (545 GB when using --full_tree) | ~106 GB | ~90 minutes / 1,000 genomes @ 64 CPUs |
Note
The amount of memory reported can vary depending on the number of pplacer threads. See "GTDB-Tk reaches the memory limit / pplacer crashes" for more information.
Python libraries
GTDB-Tk is designed for Python >=3.6 and requires the following libraries, which will be automatically installed:
| Library | Version | Reference |
|---|---|---|
| DendroPy | >= 4.1.0 | Sukumaran, J. and Mark T. Holder. 2010. DendroPy: A Python library for phylogenetic computing. Bioinformatics 26: 1569-1571. |
| NumPy | >= 1.9.0 | Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2 |
|  | >= 4.35.0 |  |
Please cite these libraries if you use GTDB-Tk in your work.
Third-party software
GTDB-Tk makes use of the following third-party dependencies and assumes they are on your system path:
Tip
The check_install command will verify that all of the programs are on the path.
| Software | Version | Reference |
|---|---|---|
| Prodigal | >= 2.6.2 | Hyatt D, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119. doi: 10.1186/1471-2105-11-119. |
| HMMER | >= 3.1b2 | Eddy SR. 2011. Accelerated profile HMM searches. PLOS Comp. Biol., 7:e1002195. |
| pplacer | >= 1.1 | Matsen FA, et al. 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11:538. |
| skani | >= 0.2.1 | Shaw J. and Yu Y.W. 2023. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods, 20:1661–1665. |
| FastTree | >= 2.1.9 | Price MN, et al. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One, 5, e9490. |
| Mash | >= 2.2 | Ondov BD, et al. 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132. doi: 10.1186/s13059-016-0997-x. |
Please cite these tools if you use GTDB-Tk in your work.
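As a rough complement to check_install, the path requirement can also be checked by hand. The sketch below is not part of GTDB-Tk: `missing_tools` is a hypothetical helper, and the executable names passed to it (for example `hmmsearch` for HMMER) are assumptions about how these tools are typically installed, so adjust them to match your system.

```shell
# Print the name of every given executable that is not found on PATH.
# (Hypothetical helper for illustration; not provided by GTDB-Tk.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; your installation may name them differently.
missing=$(missing_tools prodigal hmmsearch pplacer skani FastTree mash)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All listed tools were found on PATH."
fi
```

This only confirms that the executables can be found; it does not check their versions against the minimums in the table.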
GTDB-Tk reference data
GTDB-Tk requires ~110 GB of external data that must be downloaded and unarchived:
For full package:
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/latest/auxillary_files/gtdbtk_package/full_package/gtdbtk_data.tar.gz
wget https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_package/full_package/gtdbtk_data.tar.gz  # alternative: mirror for Australia
tar xvzf gtdbtk_data.tar.gz
For split package:
To create an archive from the GTDB-Tk release data parts:
Ensure all parts of the GTDB-Tk release data are in the same directory.
Open a terminal or command prompt.
Navigate to the directory containing the parts of the GTDB-Tk release data.
Use the following command to concatenate all parts into a single archive: cat gtdbtk_r220_data.tar.gz.part_* > gtdbtk_r220_data.tar.gz
Once the command finishes executing, you will have a single archive file named ‘gtdbtk_r220_data.tar.gz’ in the same directory.
You can find the gtdbtk_r220_data.tar.gz.part_* files under: https://data.ace.uq.edu.au/public/gtdb/data/releases/release220/220.0/auxillary_files/gtdbtk_package/split_package/gtdbtk_r220_data.tar.gz.part_aa
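The concatenation step can be wrapped in a small guard so that it fails clearly when no parts are present. This is a minimal sketch that assumes the parts follow the `<archive>.part_*` naming shown above; `join_parts` is a hypothetical helper name, not a GTDB-Tk command.

```shell
# Concatenate "<archive>.part_*" files (in glob order) into "<archive>".
# Returns non-zero if no part files exist next to the given archive name.
join_parts() {
    archive=$1
    set -- "$archive".part_*
    if [ ! -e "$1" ]; then
        echo "No part files found for $archive" >&2
        return 1
    fi
    cat "$@" > "$archive"
}

# Example usage (run in the directory that holds the downloaded parts):
# join_parts gtdbtk_r220_data.tar.gz && tar xvzf gtdbtk_r220_data.tar.gz
```

The shell expands the glob in lexicographic order, so parts named `part_aa`, `part_ab`, and so on are joined in the intended sequence.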
Note
Different versions of the GTDB release data may not run on all versions of GTDB-Tk; check the supported versions in the table below.
| GTDB Release | Minimum version | Maximum version | MD5 |
|---|---|---|---|
|  | 2.4.0 | Current | 5aafa1b9c27ceda003d75adf238ed9e0 |
|  | 2.1.0 | 2.3.2 | 630745840850c532546996b22da14c27 |
|  | 2.1.0 | 2.3.2 | df468d63265e8096d8ca01244cb95f30 |
|  | 2.0.0 | 2.0.0 | b04c55104b491f84e053a9011b36164a |
|  | 1.5.0 | 1.7.0 | 4986526c2b935fd4dcc2e604c0322517 |
|  | 1.3.0 | 1.4.2 | 06924c63f4b555ac6fd1525b09901186 |
|  | 0.3.0 | 0.1.2 | 82966ef36086237d7230955e2bfff759 |
|  | 0.2.1 | 0.2.2 | f71408d69fa2a289f2cdc734b7a58a02 |
|  | 0.1.0 | 0.1.6 | d019b3541746c3673181f24e666594ba |
|  | 0.0.6 | 0.0.7 | 9cf523761da843b5787f591f6c5a80de |