High Performance Data Analytics with DSC-SPIDAL

Project ID
Project Categories
Computer Science
Project Keywords
NSF Grant Number
Scalable Parallel Interoperable Data Analytics (SPIDAL) is a novel library being developed by the Digital Science Center (DSC) at Indiana University. The library is available on GitHub [1]. The current version provides high-performance multidimensional scaling, pairwise clustering, and vector clustering algorithms. We also provide a data visualization tool that lets scientists inspect results in 3D and draw conclusions from them. We have successfully analyzed life science data with these tools in the past [2,3] and plan to extend the analysis to larger datasets in this project.

[1] https://github.com/orgs/DSC-SPIDAL
[2] Yang Ruan, Geoffrey L. House, Saliya Ekanayake, Ursel Schütte, James D. Bever, Haixu Tang, and Geoffrey Fox. Integration of Clustering and Multidimensional Scaling to Determine Phylogenetic Trees as Spherical Phylograms Visualized in 3 Dimensions. First International Workshop on Cloud for Bio (C4Bio 2014), part of CCGrid 2014, the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, Illinois, May 27-29, 2014.
[3] Yang Ruan, Saliya Ekanayake, Mina Rho, Haixu Tang, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. DACIDR: Deterministic Annealed Clustering with Interpolative Dimension Reduction using a Large Collection of 16S rRNA Sequences. ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM BCB), Orlando, Florida, October 7-10, 2012.
Use of FutureSystems
FutureGrid HPC resources will be used to run SPIDAL for data analysis.
Scale of Use
We would need around 32 nodes, allocated for a couple of days at a time, over a couple of months.