Sequence alignment for Phylogenetic Tree Generation on Big Data Set

Project Details

Project Lead
Geoffrey Fox 
Project Manager
Geoffrey Fox 
Project Members
Saliya Ekanayake, Loran Saggu, Khaliq Satchell, Bibrak Qamar Chandio, Pulasthi Wickramasinghe, Md Enayat Ullah
Indiana University, Community Grids Laboratory  
Computer Science (401) 
26.01 Biology, General 


As sequence generation becomes faster, the computing power available to process these sequences must increase as well. In our dataset, we need to handle sequences at the scale of millions, cluster and visualize them, and find reference sequences in each cluster.
However, multiple sequence alignment (MSA) remains a challenge for us, as it can easily overwhelm traditional compute nodes. We need to test different alignment methods on high-performance computers, and possibly GPUs, to address the problems posed by MSA. After sequence alignment, it is possible to generate a phylogenetic tree with thousands of branches and visualize it in 3D.
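
To give a rough sense of why this workload strains a single node, the sketch below estimates the number of pairwise comparisons and the associated memory. It is a back-of-envelope illustration only: the function names, the 8-byte distance entries, and the 4-byte dynamic-programming cells are assumptions made for this example, not measurements or design choices from the project. It also assumes a pipeline that starts from all-against-all pairwise comparisons, which is common for clustering and progressive alignment but may not match the methods actually tested.

```python
def pairwise_count(n):
    """Number of unordered sequence pairs among n sequences."""
    return n * (n - 1) // 2


def distance_matrix_bytes(n, bytes_per_entry=8):
    """Memory for the upper triangle of an n x n distance matrix.

    bytes_per_entry=8 assumes double-precision floats (an assumption).
    """
    return pairwise_count(n) * bytes_per_entry


def dp_table_bytes(len_a, len_b, bytes_per_cell=4):
    """Memory for a full dynamic-programming table for one pairwise alignment.

    bytes_per_cell=4 assumes 32-bit integer scores (an assumption).
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


if __name__ == "__main__":
    # Sequence counts chosen to span "thousands" up to "million scale".
    for n in (1_000, 100_000, 1_000_000):
        pairs = pairwise_count(n)
        gib = distance_matrix_bytes(n) / 2**30
        print(f"{n:>9} sequences: {pairs:>15} pairs, "
              f"{gib:.2f} GiB for distances")

    # Sequence lengths at the two ends of the 1000 to 5000 range.
    for length in (1_000, 5_000):
        mib = dp_table_bytes(length, length) / 2**20
        print(f"length {length}: {mib:.1f} MiB per full DP table")
```

The pair count grows quadratically with the number of sequences, which is why moving from thousands to millions of sequences changes the problem from a single-node job to one that needs large memory or distributed storage.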

Intellectual Merit

We are running experiments to make multiple sequence alignment and visualization feasible for thousands of sequences with lengths from 1000 to 5000.

Broader Impacts

If we can perform MSA on our dataset in an acceptable amount of time on FutureGrid, we will be able to offer it as a service to biologists who need phylogenetic trees for their research.

Scale of Use

A few nodes with fast CPUs and large memory.