De novo assembly of genomes and metagenomes from next generation sequencing data

Project Details

Project Lead
Haixu Tang 
Project Manager
Haixu Tang 
Project Members
Heewook Lee, Mina Rho, Ram Podicheti, Gregory Zynda, Mingjie Wang  
Institution
Indiana University, School of Informatics and Computing  
Discipline
Biology (603) 
Subdiscipline
14.14 Environmental/Environmental Health Engineering 

Abstract

We will use the FutureGrid computing resource to assemble next-generation sequencing (NGS) reads from eukaryotic genome projects and metagenome projects, including the Human Microbiome Project and the Earth Microbiome Project. The massive sequencing data generated by NGS sequencers have revolutionized many fields of biology, but require extensive computing resources to analyze. In particular, we would like to use the computer clusters with large continuous RAM from the FutureGrid project to test assembly algorithms that we developed for NGS data and to analyze large datasets from microbiome projects, which may lead to new findings.

Intellectual Merit

Because NGS datasets are so large, testing and improving assembly algorithms for them is very time consuming. FutureGrid resources provide a unique opportunity to test these algorithms on large real datasets. The results will help the genomics community develop and improve assembly algorithms.

Broader Impacts

NGS techniques have been applied to many different areas, ranging from biology to environmental science and new energy research. The success of the proposed project will have a great impact on these application areas.

Scale of Use

We need to use compute nodes with large RAM for one week.