NEWS / UPDATES:

2014-11-12

Since publication we've been receiving many feature requests and bug reports. So far we've been concentrating on the latter, and it's now time to turn to the former.

The first major round of fixes is now in the main distribution, version 0.3.0! One major change is a switch away from using PyTables to parse BAM files; GroopM now uses a new tool called BamM instead. This makes the parse and extract steps much, much, much faster.

We've had a few very dedicated beta testers try the new version out, but as always we can't test every system. If you find a bug, please let me know.

Of course, to take advantage of this, you'll need to reinstall GroopM :(


Install GroopM

I use and love Linux, and GroopM was developed to work on a Linux system. I'm not saying it won't work elsewhere, but I haven't tried it. YMMV. People have successfully used GroopM on many different flavours of Linux as well as on OS X Mavericks (10.9). If you try it somewhere else, please let me know; I'd like to keep this list up to date.

Using pip is the recommended method, as it will automatically install many (but not all) of GroopM's dependencies.

Install dependencies and use pip as much as possible

This guide assumes you're starting from a completely blank system, so it may look like there are a lot of packages to install. If you're installing GroopM on an existing bioinformatics system, many of these will already be installed. The following is what I'd type on a fresh Ubuntu install:

$ sudo apt-get -y install git build-essential zlib1g-dev python-numpy python-pip python-dev cython libhdf5-dev libfreetype6-dev libpng-dev python-pillow python-matplotlib libblas-dev liblapack-dev gfortran

GroopM now uses BamM to parse BAM files and produce coverage profiles. BamM is not available via pip, so it needs to be installed separately. Instructions are here.

Next you need to install numexpr. This is straightforward if the above dependencies have been met:

$ sudo pip install numexpr

Then install PyTables

$ sudo pip install git+https://github.com/PyTables/PyTables.git@v.3.1.1#egg=tables

Finally, install GroopM

$ sudo pip install GroopM

From source on GitHub

If you prefer, you can always install from source directly. You will need the following dependencies:

  • numpy >= 1.6.1
  • scipy >= 0.10.1
  • matplotlib >= 1.1.0
  • pysam >= 0.3
  • PIL >= 1.1.7
  • BamM >= 1.3.0
  • Cython
  • GTK or Tk development packages installed on your machine

Clone the repo from GitHub:

$ git clone https://github.com/minillinim/GroopM.git

Then change into the GroopM directory and type:

$ sudo python setup.py install


Use GroopM

You can check out our brand new, ever more detailed manual, or the development API. If you're in a hurry, read on:

GroopM was developed to be used in conjunction with a specific experimental design pattern. Before you try GroopM please ensure:

  • You are using a sequencing platform developed after 1987
  • You have sampled your metagenomic community at 3 or more time points / spatial positions

Still with me? Great!

Before you can use GroopM you'll need to assemble and map your reads. The general recipe is to make a co-assembly of ALL of your data using Velvet or similar. Then map each of your read sets to these contigs using BWA or similar. If you have N sampling points, your aim is to produce N sorted, indexed BAM files (one per sample). samtools can help with this.
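The mapping step can be sketched as a small POSIX shell function. This is a sketch, not part of GroopM: the function name map_samples, the SAMPLE_1.fq.gz / SAMPLE_2.fq.gz paired-read naming and the use of a samtools 1.x release are all assumptions. It builds the BWA index once, then produces one coordinate-sorted BAM plus its .bai index per sample.

```shell
# Hypothetical helper (not part of GroopM): one sorted, indexed BAM per sample.
# Assumes paired reads named SAMPLE_1.fq.gz / SAMPLE_2.fq.gz and samtools 1.x.
map_samples() {
    if [ "$#" -lt 2 ]; then
        echo "usage: map_samples CONTIGS SAMPLE [SAMPLE ...]" >&2
        return 1
    fi
    contigs="$1"
    shift
    # Index the co-assembled contigs once so BWA can align against them
    bwa index "$contigs" || return 1
    for sample in "$@"; do
        # Align this sample's read pairs to the contigs (SAM output)
        bwa mem "$contigs" "${sample}_1.fq.gz" "${sample}_2.fq.gz" \
            > "${sample}.sam" || return 1
        # Coordinate-sort into a BAM file, then drop the intermediate SAM
        samtools sort -o "${sample}.bam" "${sample}.sam" || return 1
        rm -f "${sample}.sam"
        # Write SAMPLE.bam.bai next to the sorted BAM
        samtools index "${sample}.bam" || return 1
    done
}

# Usage (hypothetical file and sample names):
# map_samples contigs.fa siteA siteB siteC
```

Writing the intermediate SAM file keeps each tool's exit status visible to the function; piping bwa mem straight into samtools sort saves disk space if you prefer, at the cost of needing `set -o pipefail` in bash to catch alignment failures.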

The typical workflow for GroopM is as follows:

  • parse - Load the contigs and coverage info into PyTables
  • core - Produce a set of bins
  • refine - Make sure these bins are ok. Fix any errors you see
  • recruit - Recruit smaller contigs into your existing bins
  • extract - Extract binned contigs

GroopM was designed to be as parameter-free as possible. For more information on these steps type:

$ groopm OPTION -h
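Putting the workflow together, here is a hedged sketch of a wrapper that first checks the inputs GroopM expects (three or more BAM files, each with the .bai index that samtools index writes) and then runs the five steps in order. The wrapper name run_groopm, the database file name and the positional argument order shown for groopm parse and groopm extract are assumptions; confirm them with groopm OPTION -h on your install. Since refine is where you inspect and fix bins, expect to interact with it.

```shell
# Hypothetical wrapper around the five GroopM steps (not part of GroopM).
# Assumed argument order; check `groopm <step> -h` on your install:
#   groopm parse   DB CONTIGS BAM [BAM ...]
#   groopm extract DB CONTIGS
run_groopm() {
    # DB + CONTIGS + at least 3 BAMs (one per sample) = 5 or more arguments
    if [ "$#" -lt 5 ]; then
        echo "usage: run_groopm DB CONTIGS BAM BAM BAM [BAM ...]" >&2
        return 1
    fi
    db="$1"
    contigs="$2"
    shift 2
    # Each BAM must be sorted and indexed; samtools index writes BAM.bai
    for bam in "$@"; do
        if [ ! -f "$bam.bai" ]; then
            echo "missing index $bam.bai (run: samtools index $bam)" >&2
            return 1
        fi
    done
    groopm parse "$db" "$contigs" "$@" || return 1
    groopm core "$db" || return 1
    # refine: inspect the bins and fix any errors (a hands-on step)
    groopm refine "$db" || return 1
    groopm recruit "$db" || return 1
    groopm extract "$db" "$contigs"
}

# Usage (hypothetical names):
# run_groopm my_db.gm contigs.fa siteA.bam siteB.bam siteC.bam
```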


What Next?

After you've finished binning your contigs, you will need to assess the quality of your bins (completeness and contamination). We suggest using our other tool, CheckM, to do this.
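As a sketch, assuming the bins extracted by GroopM sit in one directory and share a file extension, a CheckM run could be wrapped as below. The function name assess_bins is hypothetical, and the default extension fna is a guess; pass whatever extension your bin files actually use. `checkm lineage_wf -x EXT BIN_DIR OUT_DIR` is CheckM's lineage-specific workflow.

```shell
# Hypothetical helper (not part of GroopM or CheckM): run CheckM's lineage
# workflow over a directory of extracted bins. The default extension "fna"
# is an assumption; pass the extension your bin files actually use.
assess_bins() {
    if [ "$#" -lt 2 ]; then
        echo "usage: assess_bins BIN_DIR OUT_DIR [EXTENSION]" >&2
        return 1
    fi
    bins_dir="$1"
    out_dir="$2"
    ext="${3:-fna}"
    # CheckM picks bins by file extension, so fail early if nothing matches
    set -- "$bins_dir"/*."$ext"
    if [ ! -e "$1" ]; then
        echo "no *.$ext files found in $bins_dir" >&2
        return 1
    fi
    # Estimate completeness and contamination for every bin
    checkm lineage_wf -x "$ext" "$bins_dir" "$out_dir"
}

# Usage (hypothetical directory names):
# assess_bins bins checkm_out fna
```

The extension check matters because CheckM only considers files ending in the given extension; a mismatch would otherwise give you an empty result rather than an obvious error.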


Cite GroopM

If you use this software, we'd love you to cite us. Our paper is now published in PeerJ. You can get it here. Please cite as: "Imelfort M, Parks D, Woodcroft BJ, Dennis P, Hugenholtz P, Tyson GW. (2014) GroopM: an automated tool for the recovery of population genomes from related metagenomes. PeerJ 2:e603 http://dx.doi.org/10.7717/peerj.603".


Talk to us

All GroopM-related suggestions, criticisms or abuse should be directed to Mike Imelfort: m_dot_imelfort_at_uq_dot_edu_dot_au


Licensing

GroopM is licensed using the GNU General Public License version 3 as published by the Free Software Foundation.

This site and the GroopM logo are copyright Mike Imelfort

This site was built using a template from the wonderful people at Bootswatch.