Installing GTDB-Tk
GTDB-Tk is available through multiple sources; you only need to choose one. If you are unsure which to choose, Bioconda is generally the easiest.
Sources
Alternatively, GTDB-Tk can be run online through KBase (third party). Note that the version available there may not be the most recent release.
Hardware requirements
| Domain | Memory | Storage | Time |
|---|---|---|---|
| Archaea | ~100 GB | ~100 GB | ~90 minutes / 1,000 genomes @ 64 CPUs |
| Bacteria | ~140 GB (950 GB when using --full_tree) | ~100 GB | ~90 minutes / 1,000 genomes @ 64 CPUs |
Note
The amount of memory reported can vary depending on the number of pplacer threads. See "GTDB-Tk reaches the memory limit / pplacer crashes" for more information.
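As a rough pre-flight check, the memory figures in the table can be compared against the machine's total memory. This is a minimal sketch, not part of GTDB-Tk: the helper names required_mem_gb and enough_memory are hypothetical, the automatic lookup reads MemTotal from /proc/meminfo (Linux only), and the table's approximate values are treated as hard thresholds.

```shell
# Hypothetical helpers (not part of GTDB-Tk). Values are the approximate
# memory figures from the hardware requirements table, in GB.
required_mem_gb() {
  case "$1" in
    archaea) echo 100 ;;
    bacteria) echo 140 ;;
    bacteria_full_tree) echo 950 ;;   # bacteria with --full_tree
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Succeeds if the available memory in kB (second argument, or MemTotal from
# /proc/meminfo on Linux) covers the requirement. 1 GB is taken as 1024 * 1024 kB.
# Returns 2 if the mode is unknown or the memory size cannot be determined.
enough_memory() {
  local need_gb avail_kb
  need_gb=$(required_mem_gb "$1") || return 2
  avail_kb=${2:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)}
  [ -n "$avail_kb" ] || return 2
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

if enough_memory bacteria; then
  echo "Memory meets the approximate bacterial requirement."
else
  echo "Memory may be insufficient for bacterial classification."
fi
```

Because the figures vary with the number of pplacer threads, a failed check is a hint to reduce threads or use a larger machine rather than a hard limit.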
Python libraries
GTDB-Tk is designed for Python >=3.6 and requires the following libraries, which are installed automatically:
| Library | Version | Reference |
|---|---|---|
| DendroPy | >= 4.1.0 | Sukumaran J, Holder MT. 2010. DendroPy: A Python library for phylogenetic computing. Bioinformatics, 26:1569-1571. |
| NumPy | >= 1.9.0 | Harris CR, et al. 2020. Array programming with NumPy. Nature, 585:357-362. doi: 10.1038/s41586-020-2649-2. |
| | >= 4.35.0 | |
Please cite these libraries if you use GTDB-Tk in your work.
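To confirm that an existing environment meets these minimum versions, a short check can compare each module's version string against the table. This is a sketch under stated assumptions: version_ge and check_py_lib are hypothetical helper names, version comparison relies on GNU sort -V, python3 is assumed to be the interpreter GTDB-Tk runs under, and only the two libraries named in the table are checked.

```shell
# Hypothetical helper: succeeds if version $1 >= version $2.
# Uses GNU `sort -V`, which orders 1.9.0 before 1.26.0 (plain string order would not).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: import a module with python3 and compare its
# __version__ against a minimum. Returns 1 if missing or too old.
check_py_lib() {
  local module=$1 minimum=$2 found
  found=$(python3 -c "import $module; print($module.__version__)" 2>/dev/null) || {
    echo "$module: not installed"; return 1; }
  if version_ge "$found" "$minimum"; then
    echo "$module $found: OK (>= $minimum)"
  else
    echo "$module $found: too old (need >= $minimum)"; return 1
  fi
}

# Report on each library; keep going if one is missing or too old.
check_py_lib dendropy 4.1.0 || true
check_py_lib numpy 1.9.0 || true
```

Version sorting is used instead of a plain string comparison because strings compare character by character, which would wrongly rank 1.9.0 above 1.26.0.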
Third-party software
GTDB-Tk makes use of the following third-party dependencies and assumes they are on your system path:
Tip
The check_install command will verify that all of the programs are on the path.
| Software | Version | Reference |
|---|---|---|
| Prodigal | >= 2.6.2 | Hyatt D, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119. doi: 10.1186/1471-2105-11-119. |
| HMMER | >= 3.1b2 | Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput. Biol., 7:e1002195. |
| pplacer | >= 1.1 | Matsen FA, et al. 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11:538. |
| skani | >= 0.2.1 | Shaw J, Yu YW. 2023. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods, 20:1661-1665. |
| FastTree | >= 2.1.9 | Price MN, et al. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One, 5:e9490. |
Please cite these tools if you use GTDB-Tk in your work.
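Before running check_install, a quick shell loop can list which executables are missing from PATH. This is a minimal sketch: missing_tools is a hypothetical helper, and the executable names (prodigal, hmmsearch from HMMER, pplacer, skani, FastTree) are assumptions based on each package's usual binaries; GTDB-Tk may call additional programs, so the check_install command remains the authoritative check.

```shell
# Hypothetical helper: print each named executable that is NOT on $PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Executable names are assumptions, not taken from GTDB-Tk itself.
missing=$(missing_tools prodigal hmmsearch pplacer skani FastTree)
if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  echo "$missing"
else
  echo "All listed tools are on PATH."
fi
```

command -v is used rather than which because it is specified by POSIX and built into the shell.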
GTDB-Tk reference data
GTDB-Tk requires ~100 GB of external data (for R232) that must be downloaded and unarchived:
For the full package, download the archive from one of the following locations (the second is a mirror for Australia), then extract it:
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/latest/auxillary_files/gtdbtk_package/full_package/gtdbtk_data.tar.gz
wget https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_package/full_package/gtdbtk_data.tar.gz
tar xvzf gtdbtk_data.tar.gz
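Because the unarchived reference data occupies roughly 100 GB, it can be worth confirming there is enough free space before downloading. A minimal sketch, assuming POSIX df -P output: has_free_space_gb is a hypothetical helper, and the 100 GB threshold comes from the text above. The compressed archive also takes space while it is being extracted, so the headroom actually needed is likely larger (an assumption; the archive size is not stated here).

```shell
# Hypothetical helper: succeeds if the filesystem holding directory $1 has
# at least $2 GB free. `df -Pk` reports 1024-byte blocks in a stable,
# single-line-per-filesystem format; 1 GB is taken as 1024 * 1024 kB.
# Returns 2 if the free space cannot be determined.
has_free_space_gb() {
  local dir=$1 need_gb=$2 avail_kb
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 {print $4}')
  [ -n "$avail_kb" ] || return 2
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Directory you plan to download and extract into (placeholder: current directory).
target_dir=.
if has_free_space_gb "$target_dir" 100; then
  echo "At least 100 GB free in $target_dir."
else
  echo "Less than 100 GB free in $target_dir; the reference data may not fit."
fi
```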
For the split package: the split package is currently not available for R232. For this release, we focused on adding more mirrors and reducing the storage footprint of the package.
To create an archive from the GTDB-Tk release data parts:
1. Ensure all parts of the GTDB-Tk release data are in the same directory.
2. Open a terminal or command prompt.
3. Navigate to the directory containing the parts of the GTDB-Tk release data.
4. Use the following command to concatenate all parts into a single archive: cat gtdbtk_r232_data.tar.gz.part_* > gtdbtk_r232_data.tar.gz
5. Once the command finishes, you will have a single archive file named ‘gtdbtk_r232_data.tar.gz’ in the same directory.
You can find the gtdbtk_r232_data.tar.gz.part_* files under: https://data.ace.uq.edu.au/public/gtdb/data/releases/release232/232.0/auxillary_files/gtdbtk_package/split_package/
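The plain cat command relies on the shell's alphabetical glob order, which is correct for suffixes like part_aa, part_ab but would place part_10 before part_2 if numeric suffixes were used. The suffix scheme of the release files is not stated here, so the following sketch (join_parts is a hypothetical helper) sorts the parts with GNU sort -V and refuses to run if no parts are found.

```shell
# Hypothetical helper: concatenate "$1".part_* into file "$2" in version order.
# `sort -V` keeps part_2 before part_10 and leaves part_aa, part_ab order intact.
join_parts() {
  local prefix=$1 out=$2 part
  set -- "$prefix".part_*
  # With no matches, the glob stays literal and the file does not exist.
  [ -e "$1" ] || { echo "no parts found for $prefix" >&2; return 1; }
  printf '%s\n' "$@" | sort -V | while IFS= read -r part; do
    cat -- "$part"
  done > "$out"
}

# Usage, from the directory holding the parts:
# join_parts gtdbtk_r232_data.tar.gz gtdbtk_r232_data.tar.gz
```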
Set the GTDB-Tk reference data path:
GTDB-Tk requires an environment variable named GTDBTK_DATA_PATH to be set to the directory containing the unarchived reference data. This is documented under:
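For a bash session, the variable can be set with export, and a small check can catch typos before a long run. In this sketch /path/to/gtdbtk_data is a placeholder for wherever you unarchived the data, persisting the setting in ~/.bashrc is an assumption about your shell setup, and check_data_path is a hypothetical helper that only confirms the directory exists (it does not validate its contents).

```shell
# Placeholder path: replace with the directory containing the unarchived data.
export GTDBTK_DATA_PATH=/path/to/gtdbtk_data

# Optional: persist the setting for future bash sessions.
# echo 'export GTDBTK_DATA_PATH=/path/to/gtdbtk_data' >> ~/.bashrc
# In a conda environment (conda >= 4.8), an alternative is:
# conda env config vars set GTDBTK_DATA_PATH=/path/to/gtdbtk_data

# Hypothetical sanity check: the variable is set and names an existing directory.
check_data_path() {
  if [ -z "${GTDBTK_DATA_PATH:-}" ]; then
    echo "GTDBTK_DATA_PATH is not set" >&2; return 1
  fi
  if [ ! -d "$GTDBTK_DATA_PATH" ]; then
    echo "GTDBTK_DATA_PATH is not a directory: $GTDBTK_DATA_PATH" >&2; return 1
  fi
}
```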
Note
Different versions of the GTDB release data may not run on every version of GTDB-Tk. Check the supported versions:
| GTDB release | Minimum GTDB-Tk version | Maximum GTDB-Tk version | MD5 |
|---|---|---|---|
| | 2.7.0 | Current | 25a59e0352b1fd150c589f56559767d4 |
| | 2.4.1 | 2.6.1 | 24b476ea5a4ef30519d461e56cc4a27f |
| | 2.4.0 | 2.6.1 | 5aafa1b9c27ceda003d75adf238ed9e0 |
| | 2.1.0 | 2.3.2 | 630745840850c532546996b22da14c27 |
| | 2.1.0 | 2.3.2 | df468d63265e8096d8ca01244cb95f30 |
| | 2.0.0 | 2.0.0 | b04c55104b491f84e053a9011b36164a |
| | 1.5.0 | 1.7.0 | 4986526c2b935fd4dcc2e604c0322517 |
| | 1.3.0 | 1.4.2 | 06924c63f4b555ac6fd1525b09901186 |
| | 0.3.0 | 0.1.2 | 82966ef36086237d7230955e2bfff759 |
| | 0.2.1 | 0.2.2 | f71408d69fa2a289f2cdc734b7a58a02 |
| | 0.1.0 | 0.1.6 | d019b3541746c3673181f24e666594ba |
| | 0.0.6 | 0.0.7 | 9cf523761da843b5787f591f6c5a80de |
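The MD5 column can be used to confirm a download is complete and uncorrupted before extracting it. A minimal sketch, assuming the MD5 values refer to the downloaded .tar.gz archive: verify_md5 is a hypothetical helper that uses GNU md5sum (on macOS, md5 -q prints the bare hash instead).

```shell
# Hypothetical helper: succeeds if the MD5 of file $1 equals $2.
# Reading from stdin keeps md5sum's output to "hash  -" regardless of file name.
verify_md5() {
  local file=$1 expected=$2 actual
  actual=$(md5sum < "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "$file: MD5 OK"
  else
    echo "$file: MD5 mismatch (got $actual, expected $expected)" >&2
    return 1
  fi
}

# Usage, with the hash from the row for GTDB-Tk 2.7.0 and later:
# verify_md5 gtdbtk_data.tar.gz 25a59e0352b1fd150c589f56559767d4
```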