Bacterial Proteins Data Clustering and Visualization
DAMDS is the deterministic annealing implementation of the Multidimensional Scaling (MDS) algorithm.
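As a hedged illustration only: deterministic-annealing MDS is commonly described (for example as DA-SMACOF) as adding a temperature schedule on top of SMACOF stress majorization. The sketch below shows plain SMACOF with no annealing. It assumes a dense, complete, symmetric distance matrix D with a zero diagonal. The names `smacof` and `stress` are hypothetical and are not the DAMDS interface.

```python
import numpy as np

def smacof(D, dim=2, n_iter=300, seed=0):
    """Plain (unit-weight) SMACOF stress majorization; NO annealing.

    D: symmetric (n, n) target distances with zero diagonal (assumed dense).
    Returns an (n, dim) embedding whose Euclidean distances approximate D.
    """
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))          # random starting configuration
    for _ in range(n_iter):
        # Current Euclidean distances between embedded points.
        dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
        # B(X): off-diagonal entries -D_ij / d_ij, 0 where d_ij == 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            B = np.where(dist > 0, -D / dist, 0.0)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))    # b_ii = -sum_{j != i} b_ij
        # Guttman transform for unit weights: X <- B(X) X / n.
        X = B @ X / n
    return X

def stress(D, X):
    """Raw stress: sum over pairs i < j of (D_ij - d_ij(X))^2."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return 0.5 * ((D - dist) ** 2).sum()       # full matrix counts each pair twice
```

SMACOF never increases the stress from one iteration to the next; a DA variant would additionally start from a smoothed ("hot") version of the stress and cool it to help avoid poor local minima.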
DAPWC (Deterministic Annealing Pairwise Clustering) is a scalable, parallel clustering program that operates on data in a non-vector space: it needs only the pairwise distances between points, not their coordinates.
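A minimal sketch of deterministic-annealing pairwise clustering, assuming a small dense distance matrix and a fixed number K of clusters. The real DAPWC program is parallel and, as far as this sketch is concerned, its cluster-splitting schedule is not reproduced here. The mean-field cost used below, eps_i(k) = sum_j D_ij M_j(k) / C(k) - 0.5 * sum_jl D_jl M_j(k) M_l(k) / C(k)^2, is the standard pairwise-clustering form. The function name and all parameters (`cool`, `inner`, `noise`, `t_min_ratio`) are hypothetical.

```python
import numpy as np

def da_pairwise_clustering(D, K, cool=0.9, t_min_ratio=1e-3, inner=50,
                           noise=1e-6, seed=0):
    """Deterministic-annealing pairwise clustering with a FIXED number K of
    clusters (simplification: no cluster splitting). D is a symmetric (n, n)
    distance matrix; no coordinates are needed."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    M = np.full((n, K), 1.0 / K)            # soft memberships M_i(k)
    T = float(D.max())                      # start hot: all clusters coincide
    T_min = t_min_ratio * T
    while T > T_min:
        # Small noise at each temperature lets clusters separate once T drops
        # below the point where the symmetric solution becomes unstable.
        M = M + noise * rng.random((n, K))
        M /= M.sum(axis=1, keepdims=True)
        for _ in range(inner):
            C = np.maximum(M.sum(axis=0), 1e-12)   # soft cluster sizes C(k)
            DM = D @ M                              # sum_j D_ij M_j(k)
            # eps_i(k) = sum_j D_ij M_j(k) / C(k)
            #            - 0.5 * sum_{j,l} D_jl M_j(k) M_l(k) / C(k)^2
            eps = DM / C - 0.5 * (M * DM).sum(axis=0) / C**2
            logits = -eps / T
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            M = np.exp(logits)                            # Gibbs/softmax update
            M /= M.sum(axis=1, keepdims=True)
        T *= cool                                         # geometric cooling
    return M.argmax(axis=1), M
```

At high temperature every point belongs equally to all clusters; as T is lowered, memberships harden, which is what makes the method less sensitive to initialization than a hard pairwise clustering.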
The data are bacterial proteins connected by edges that carry similarity scores. The eventual goal is two layers of clustering: first, grouping proteins into functional clusters, and then grouping organisms according to the functions they share.
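A minimal sketch of the second layer, assuming each protein already has a functional-cluster label (integers 0..F-1) and a known source organism. It builds a binary organism-by-function profile and compares organisms by Jaccard distance over the functions they share. The Jaccard choice, the organism names, and the helper names are assumptions for illustration, not this project's method.

```python
import numpy as np

def organism_function_profiles(protein_organism, protein_cluster):
    """Binary organism-by-function matrix: P[o, f] is True if organism o has
    at least one protein in functional cluster f (labels assumed 0..F-1)."""
    organisms = sorted(set(protein_organism))
    row = {org: r for r, org in enumerate(organisms)}
    P = np.zeros((len(organisms), max(protein_cluster) + 1), dtype=bool)
    for org, func in zip(protein_organism, protein_cluster):
        P[row[org], func] = True
    return organisms, P

def jaccard_distances(P):
    """Pairwise Jaccard distance 1 - |A & B| / |A | B| between rows of P."""
    A = P.astype(float)
    inter = A @ A.T                         # shared functions per organism pair
    sizes = A.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = np.where(union > 0, inter / union, 1.0)
    return 1.0 - sim
```

The resulting organism-by-organism distance matrix is again non-vector data, so it could be handed to a pairwise clustering method in the same way as the protein distances.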
We need to convert the data into a set of N nodes. For each node i, a distance Dij exists for only a small number of nodes j, so the j for which Dij exists must be stored efficiently (sparsely). This storage is distributed over the nodes of the computer.
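One common way to store a sparse Dij efficiently is compressed sparse row (CSR) form, sketched below with NumPy under the assumption that the input is a list of (i, j, Dij) triples. The row-block split shows one hypothetical way to distribute nodes across processes; it is not necessarily how DAPWC partitions its data. `build_csr`, `neighbours`, and `row_blocks` are hypothetical names.

```python
import numpy as np

def build_csr(n, rows, cols, vals):
    """Store sparse distances as CSR arrays: node i's neighbours are
    indices[indptr[i]:indptr[i+1]], with distances in the same slice of data."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    vals = np.asarray(vals, dtype=float)
    order = np.lexsort((cols, rows))        # sort by row, then by column
    rows, cols, vals = rows[order], cols[order], vals[order]
    counts = np.bincount(rows, minlength=n) # number of stored j per node i
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return indptr, cols, vals

def neighbours(indptr, indices, data, i):
    """Return (j indices, Dij values) for node i."""
    start, end = indptr[i], indptr[i + 1]
    return indices[start:end], data[start:end]

def row_blocks(n, n_procs):
    """Hypothetical partition: contiguous, nearly equal row blocks per process."""
    bounds = np.linspace(0, n, n_procs + 1).astype(int)
    return [(int(bounds[p]), int(bounds[p + 1])) for p in range(n_procs)]
```

Memory grows with the number of stored pairs rather than with N squared, and each process only needs the indptr slice and edges for its own block of rows.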
Dij should be symmetrized so that Dij = Dji.
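The notes do not say which symmetrization rule to use. The sketch below assumes one simple rule: average the two directions when both Dij and Dji exist, and copy the single value when only one direction is present. The dictionary input format and the name `symmetrize` are hypothetical.

```python
def symmetrize(edges):
    """edges: dict mapping (i, j) -> Dij, possibly with Dij != Dji or one
    direction missing. Returns a dict with Dij == Dji for every stored pair.

    Assumed rule: average both directions if both exist, otherwise copy the
    one available value."""
    grouped = {}
    for (i, j), d in edges.items():
        key = (min(i, j), max(i, j))        # unordered pair
        grouped.setdefault(key, []).append(d)
    out = {}
    for (i, j), ds in grouped.items():
        d = sum(ds) / len(ds)               # mean of the available directions
        out[(i, j)] = d
        out[(j, i)] = d
    return out
```

Taking the maximum or minimum of the two directions instead of the mean would be equally simple to implement; which rule suits HFSP-derived distances is a modelling decision.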
The distance is something like Dij = (minimum HFSP score) / (HFSP score of i and j), so Dij is close to 0 when the score is large and equals 1 when the score is at the minimum value.
Functional Basis of Microorganism Classification
Chengsheng Zhu, Tom O. Delmont, Timothy M. Vogel, Yana Bromberg