Bacterial Proteins Data Clustering and Visualization

Project ID
FG-546
Project Lead
Abstract

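DAMDS is a deterministic annealing implementation of the Multidimensional Scaling (MDS) algorithm, which places points in a low-dimensional space so that their Euclidean distances approximate the given dissimilarities.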
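As a rough illustration of what MDS optimizes, the sketch below computes the raw stress, sum over i<j of w_ij * (||x_i - x_j|| - delta_ij)^2, and performs one unweighted SMACOF (Guttman transform) step, which never increases that stress. This is a sketch under stated assumptions, not the DAMDS code: DAMDS adds a deterministic annealing temperature schedule on top of a stress-minimizing iteration, and that annealing is omitted entirely here. The function names stress and guttman_step are hypothetical.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]

def stress(X: Matrix, delta: Matrix, weights: Optional[Matrix] = None) -> float:
    """Weighted raw stress: sum over i<j of w_ij * (||x_i - x_j|| - delta_ij)^2."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = 1.0 if weights is None else weights[i][j]
            d = math.dist(X[i], X[j])  # Euclidean distance in the embedding
            total += w * (d - delta[i][j]) ** 2
    return total

def guttman_step(X: Matrix, delta: Matrix) -> Matrix:
    """One unweighted SMACOF update X_new = (1/n) * B(X) * X (no annealing)."""
    n, dim = len(X), len(X[0])
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(X[i], X[j])
                if d > 0:  # coincident points contribute nothing
                    B[i][j] = -delta[i][j] / d
        # Diagonal makes every row of B sum to zero.
        B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
    return [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(dim)]
            for i in range(n)]
```

Usage: starting from a distorted layout of three collinear points, a single step recovers the target spacing.

```python
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
delta = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(stress(X, delta))                     # 2.0
print(stress(guttman_step(X, delta), delta))  # close to 0
```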

DAPWC (Deterministic Annealing Pairwise Clustering) is a scalable, parallel clustering program that operates on data in a non-vector space, that is, data described only by pairwise distances rather than by coordinate vectors.
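Because the data have no coordinates, a pairwise clustering method scores a clustering directly from the distance matrix. The sketch below assumes the standard pairwise clustering cost H = 0.5 * sum_k [sum_i sum_j M_i(k) M_j(k) D_ij] / C(k), with C(k) = sum_i M_i(k), where M_i(k) is the (possibly soft) membership of point i in cluster k, plus a generic Boltzmann soft-assignment helper showing how a temperature T moves memberships from uniform (high T) to hard (low T). The names pairwise_cost and soft_assign are hypothetical, and this is not the DAPWC implementation or its exact update rules.

```python
import math
from typing import List

Matrix = List[List[float]]

def pairwise_cost(D: Matrix, M: Matrix) -> float:
    """Pairwise clustering cost for memberships M[i][k] and distances D[i][j]."""
    n, K = len(M), len(M[0])
    total = 0.0
    for k in range(K):
        C = sum(M[i][k] for i in range(n))  # (soft) size of cluster k
        if C == 0:
            continue  # empty cluster contributes nothing
        s = sum(M[i][k] * M[j][k] * D[i][j] for i in range(n) for j in range(n))
        total += 0.5 * s / C
    return total

def soft_assign(eps: Matrix, T: float) -> Matrix:
    """Boltzmann memberships M_i(k) proportional to exp(-eps_i(k) / T)."""
    out = []
    for row in eps:
        m = min(row)  # shift by the minimum for numerical stability
        w = [math.exp(-(e - m) / T) for e in row]
        z = sum(w)
        out.append([x / z for x in w])
    return out
```

Usage: two tight pairs far apart cost less when each pair shares a cluster.

```python
D = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 1], [10, 10, 1, 0]]
print(pairwise_cost(D, [[1, 0], [1, 0], [0, 1], [0, 1]]))  # 1.0
print(pairwise_cost(D, [[1, 0], [0, 1], [1, 0], [0, 1]]))  # 10.0
```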

The data consist of bacterial proteins connected by edges, each edge carrying a similarity score. The eventual goal is two layers of clustering: first, grouping proteins into functional clusters, and then grouping organisms according to the functions they share.
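One plausible way to connect the two layers, assumed here rather than taken from the project, is to describe each organism by the set of functional (protein) clusters its proteins fall into and then compare organisms with a set distance such as Jaccard. The resulting organism-by-organism distances could then be clustered again. The mappings and the names organism_profiles and jaccard_distance are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

def organism_profiles(protein_cluster: Dict[str, int],
                      protein_organism: Dict[str, str]) -> Dict[str, Set[int]]:
    """Map each organism to the set of functional clusters its proteins belong to."""
    profiles: Dict[str, Set[int]] = defaultdict(set)
    for protein, cluster in protein_cluster.items():
        org = protein_organism.get(protein)
        if org is not None:  # skip proteins with no known organism
            profiles[org].add(cluster)
    return dict(profiles)

def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |A & B| / |A | B|; two empty profiles are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Usage: organisms A and B share one of three functional clusters, giving a distance of 2/3.

```python
pc = {"p1": 1, "p2": 2, "p3": 2, "p4": 3}
po = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
prof = organism_profiles(pc, po)           # {"A": {1, 2}, "B": {2, 3}}
print(jaccard_distance(prof["A"], prof["B"]))
```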

We need to convert the data into a set of N nodes. For each node i, a distance D_ij is defined only for a small number of nodes j, so the set of j for which D_ij exists must be stored efficiently (sparsely). This storage is distributed across the nodes of the computer.
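A common sparse layout for this kind of data is compressed sparse row (CSR): row i's neighbors j and distances D_ij occupy the slice indptr[i]:indptr[i+1] of two flat arrays, so only existing entries are stored. To spread the rows over P compute nodes, one simple assumed scheme gives each node a contiguous, nearly equal block of rows. This is a sketch, not the project's storage format; build_csr, row_neighbors and partition_rows are hypothetical names.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (i, j, D_ij)

def build_csr(n: int, edges: List[Edge]) -> Tuple[List[int], List[int], List[float]]:
    """Build CSR arrays (indptr, cols, vals) from an edge list over n nodes."""
    indptr = [0] * (n + 1)
    for i, _, _ in edges:
        indptr[i + 1] += 1          # count entries per row
    for r in range(n):
        indptr[r + 1] += indptr[r]  # prefix sum gives row start offsets
    cols = [0] * len(edges)
    vals = [0.0] * len(edges)
    fill = indptr[:-1]              # slice copy: next free slot in each row
    for i, j, d in edges:
        pos = fill[i]
        cols[pos], vals[pos] = j, d
        fill[i] += 1
    return indptr, cols, vals

def row_neighbors(indptr: List[int], cols: List[int], vals: List[float],
                  i: int) -> List[Tuple[int, float]]:
    """Return [(j, D_ij), ...] for the stored entries of row i."""
    lo, hi = indptr[i], indptr[i + 1]
    return list(zip(cols[lo:hi], vals[lo:hi]))

def partition_rows(n: int, P: int) -> List[Tuple[int, int]]:
    """Split rows 0..n-1 into P contiguous [start, end) blocks of near-equal size."""
    base, extra = divmod(n, P)
    blocks, start = [], 0
    for p in range(P):
        size = base + (1 if p < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks
```

In a distributed run, each process would hold only the CSR slice for its own block of rows; the sketch only computes the block boundaries.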

  1. D_ij should be symmetrized, so that D_ij = D_ji.

  2. The distance is something like (minimum HFSP score) / (HFSP score), so it approaches 0 when the score is large and equals 1 when the score is at the minimum value.
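The two preparation steps can be sketched together: convert each directed HFSP score s_ij to D_ij = s_min / s_ij, then symmetrize. The symmetrization rule is an assumption: when both directions exist the two distances are averaged, and when only one exists it is mirrored. If no s_min is supplied, the sketch uses the smallest observed score. The function name symmetric_distances is hypothetical.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]

def symmetric_distances(scores: Dict[Pair, float],
                        s_min: Optional[float] = None) -> Dict[Pair, float]:
    """Turn directed scores into symmetric distances D_ij = D_ji, keyed (min, max)."""
    if any(s <= 0 for s in scores.values()):
        raise ValueError("scores must be positive")
    if s_min is None:
        s_min = min(scores.values())  # assumed default: smallest observed score
    # Distance is 1 at the minimum score and approaches 0 as the score grows.
    directed = {pair: s_min / s for pair, s in scores.items()}
    sym: Dict[Pair, float] = {}
    for (i, j), d in directed.items():
        if i == j:
            continue  # ignore self-edges
        key = (min(i, j), max(i, j))
        if key in sym:
            continue  # already handled from the other direction
        back = directed.get((j, i))
        # Assumption: average both directions when present, else mirror.
        sym[key] = d if back is None else 0.5 * (d + back)
    return sym
```

Usage: the (0, 1) pair has scores in both directions and is averaged; (1, 2) has one and is mirrored.

```python
print(symmetric_distances({(0, 1): 10.0, (1, 0): 20.0, (1, 2): 40.0}))
# {(0, 1): 0.75, (1, 2): 0.25}
```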

Artifacts:

       https://drive.google.com/file/d/1npARS1UhFzi0wh5qIGlxA6DNV2z2yrzQ/view?usp=sharing  

       Functional Basis of Microorganism Classification

       Chengsheng Zhu, Tom O. Delmont, Timothy M. Vogel, Yana Bromberg

Use of FutureSystems
To test the algorithms and generate results.
Scale of Use
For research purposes.