Next Generation Sequencing in the Cloud

Project Details

Project Lead
Jonathan Klinginsmith 
Project Manager
Jonathan Klinginsmith 
Supporting Experts
Zhenhua Guo, Saliya Ekanayake  
Institution
Indiana University, School of Informatics  
Discipline
Computer Science (401) 
Subdiscipline
26.0613 Genetics, Plant and Animal 

Abstract

This project will analyze next generation sequencing (NGS) algorithms and workflows in the cloud.

Intellectual Merit

Many genomic data sets are already hosted either publicly or in clouds such as Amazon. Many researchers have created algorithms using the MapReduce paradigm for pleasingly parallel problems. These algorithms fit nicely in clouds; however, we are also interested in better understanding how well other NGS algorithms map to clouds. Questions such as "Are there limits to using clouds for certain algorithms?" and "Can current NGS algorithms be modified to perform well in the cloud?" are important for researchers to answer.

Broader Impacts

This work will enhance scientific understanding of how next generation sequencing (NGS) algorithms operate in cloud computing infrastructures. Through this work, researchers will gain a better understanding of how to run NGS algorithms and workflows in computing environments such as clouds, which provide the necessary scale of resources.

Scale of Use

I will request a few VMs for initial testing. To perform some tests at small scale, I may request tens of VMs for a virtual cluster. How long the VMs run will depend on the analysis or workflow being tested.