Bacterial Proteins Data Clustering and Visualization

Project Details

Project Lead
Geoffrey Fox 
Project Manager
Geoffrey Fox 
Project Members
Bo Feng, Vibhatha Abeykoon, Pulasthi Wickramasinghe  
Institution
Indiana University, Community Grids Laboratory  
Discipline
Biomedical Engineering (103) 
Subdiscipline
14.09 Computer Engineering 

Abstract

DAMDS is a deterministic annealing implementation of the Multidimensional Scaling (MDS) algorithm.

DAPWC (Deterministic Annealing Pairwise Clustering, dapwc) is a scalable, parallel clustering program that operates on data in non-vector spaces.

 

The data are bacterial proteins connected by edges with associated similarity scores. The eventual goal is two layers of clustering: first grouping proteins into functional clusters, and then grouping organisms according to the functions they share.

 

We need to convert the data to a set of N nodes. For each node i, a distance Dij exists for only a small number of nodes j, so the set of j for which Dij exists needs to be stored efficiently. This data is distributed over the nodes of the computer.

  1. Dij should be symmetrized.

  2. The distance is something like (minimum HFSP score)/(HFSP score), so it is close to 0 if the score is large and 1 if the score is at its minimum value.
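
The conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function name build_sparse_distances, the (i, j, score) edge format, and the choice to symmetrize by averaging the two directions are all hypothetical. It stores only the pairs that have a score, as a dictionary of neighbors per node, and it does not address distribution over compute nodes.

```python
from collections import defaultdict


def build_sparse_distances(edges, min_score=None):
    """Turn (i, j, score) edges into a symmetric sparse distance map.

    Hypothetical helper: the edge format and the averaging rule are
    assumptions made for illustration. Scores must be positive.
    Returns {i: {j: D_ij}} containing only pairs that have a score.
    """
    edges = list(edges)
    if not edges:
        return {}
    if min_score is None:
        # Use the smallest observed score as the "minimum HFSP score".
        min_score = min(score for _, _, score in edges)

    # Gather distances per unordered pair; a pair may be listed once
    # or in both directions with different scores.
    pair_values = defaultdict(list)
    for i, j, score in edges:
        if i == j:
            continue  # skip self-edges
        key = (i, j) if i < j else (j, i)
        pair_values[key].append(min_score / score)

    # Symmetrize (assumption: average the available directions) and
    # mirror the value so that D_ij == D_ji.
    dist = defaultdict(dict)
    for (i, j), values in pair_values.items():
        d = sum(values) / len(values)
        dist[i][j] = d
        dist[j][i] = d
    return dict(dist)


# Toy example with made-up scores, not real HFSP data.
edges = [(0, 1, 20.0), (1, 0, 40.0), (1, 2, 10.0)]
dist = build_sparse_distances(edges)
```

In this toy input the pair (1, 2) carries the minimum score and so gets distance 1, while pairs with larger scores get smaller distances. Other symmetrization rules, such as taking the minimum or maximum of the two directions, would fit the same structure.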

 

Artifacts:

       https://drive.google.com/file/d/1npARS1UhFzi0wh5qIGlxA6DNV2z2yrzQ/view?usp=sharing  

 

       Functional Basis of Microorganism Classification

       Chengsheng Zhu, Tom O. Delmont, Timothy M. Vogel, Yana Bromberg

 

Intellectual Merit

This project is purely for research purposes, as a collaboration on biological research.

Broader Impacts

Supports fellow researchers in enhancing their research.

Scale of Use

For research purposes.