CloVR - Cloud Virtual Resource for Automated Sequence Analysis From Your Desktop

Project ID
FG-145
Project Categories
Computer Science
NSF Grant Number
0949201
NSF Grant URL
http://nsf.gov/awardsearch/showAward.do?AwardNumber=0949201
Completed
Abstract
Background: Recently, second-generation sequencing platforms (e.g. 454, Illumina, SOLiD) have made genomic tools affordable and increased their popularity in the broader research community. However, demands on computational resources and the lack of standardized analysis tools increasingly represent a bottleneck in the bioinformatics analysis of large-scale sequence data. Results: Here, we present the Cloud Virtual Resource (CloVR) software package, which takes advantage of two technologies, Virtual Machines and Cloud computing, to provide a new community resource for sequence analysis, suitable for large-scale sequencing projects. CloVR is available as an Open Source virtual machine at http://clovr.org and bundles pre-installed and pre-configured bioinformatics tools into automated pipelines. With the CloVR virtual machine, users can run supported pipelines on their local computers or use scalable on-demand Cloud computing services to perform CPU-intensive tasks over the Internet, without having to install additional software. To support a large variety of sequencing projects, the CloVR virtual machine is composed of separate sequence analysis tracks. Each track within CloVR comprises the entire suite of Open Source software tools necessary to support a fully automated analysis as required in a typical genomics project. Currently supported applications include BLAST search (CloVR-Search), single microbial whole-genome shotgun (WGS) assembly and annotation (CloVR-Microbe), metagenomic WGS assembly, gene prediction and BLAST comparison (CloVR-Metagenomics), and 16S phylogeny (CloVR-16S). CloVR currently supports VMware for local execution, as well as the commercial Amazon EC2 Cloud (http://aws.amazon.com/ec2/) and the free academic Nimbus Science Clouds (http://www.scienceclouds.org/).
Conclusion: CloVR is a genomics tool that enables any researcher with a sequencing machine and an Internet connection to perform complex and computationally demanding sequence analysis and join the genomic revolution.
Use of FutureSystems
We plan to install and test the CloVR VM on FutureGrid and make it available to project collaborators and the community at large. We have done extensive testing on Amazon EC2 and on two free academic clouds: DIAG (http://diagcomputing.org/), which uses Nimbus, and the now-defunct Magellan (http://magellan.alcf.anl.gov/), which used Eucalyptus. We are eager to add FutureGrid to this list.


Scale of Use
A dozen or so VMs several times a week to test and process experimental data