Parallelization of heterogeneous workloads for Imaging Genomic Browser

Project Details

Project Lead
Ryan Newton 
Project Manager
Ryan Newton 
Project Members
Eric Jiang, Rebecca Swords, Sajith Sasidharan, Li Shen, Sungeun Kim, Abhishek Kulkarni, Eric Holk, Aaron Todd, Edward Amsden, Aaron Hsu  
Institution
Indiana University, School of Informatics and Computer Science  
Discipline
Computer Science (401) 

Abstract

With collaborators in the IU Medical School, we are applying our next-generation parallel programming libraries to a recent application in genome analysis (described at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3065788/). The application enables a user to explore correlations between genotypes and brain structure. It presents a challenging target for parallelization for two reasons. First, workloads are dynamic, driven by a user manipulating a GUI. Second, workloads include both 3D image processing and genome analysis components, and the image processing component is a good candidate for GPU execution. Our software framework balances parallelism between CPUs and GPUs on multiple nodes. The ideal platform for evaluating our techniques is therefore a cluster with both GPUs and a high number of CPU cores per node, which lets us test the scaling of multi-threading, distribution, and CPU/GPU partitioning simultaneously. For this reason we are interested in using the new Delta cluster.

Intellectual Merit

Heterogeneous distributed platforms present critical new problems to the software development process, and much recent research has attempted to address them. Our particular approach uses high-level domain-specific languages that present enough information to the compiler to enable code generation for different platforms (e.g., CPU and GPU). Further, we use new dynamic load balancing techniques to manage load between nodes and between multiple CPUs and multiple GPUs. The research goals supported by this project are two-fold, corresponding to the collaborators involved: first, advances in compilers and language runtimes, and second, advances in the analysis of large imaging/genomic data sets.
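
To illustrate the general idea of dynamic load balancing across devices of unequal speed, here is a minimal Python sketch. It is not the project's framework or runtime. It is a deterministic simulation in which idle workers pull the next task from a shared queue, compared against a static round-robin split. The function names, the worker names "cpu" and "gpu", and the speed and cost numbers are all hypothetical and chosen only for illustration.

```python
import heapq


def dynamic_schedule(task_costs, worker_speeds):
    """Simulate workers pulling tasks from a shared queue as they become idle.

    task_costs: list of abstract work units per task (hypothetical numbers).
    worker_speeds: dict mapping worker name -> work units per simulated second.
    Returns (makespan, assignment), where assignment maps worker -> task indices.
    """
    # Heap entries are (time at which the worker becomes free, worker name).
    free_at = [(0.0, name) for name in sorted(worker_speeds)]
    heapq.heapify(free_at)
    assignment = {name: [] for name in worker_speeds}
    makespan = 0.0
    for index, cost in enumerate(task_costs):
        # The earliest-idle worker takes the next task from the queue.
        now, name = heapq.heappop(free_at)
        finish = now + cost / worker_speeds[name]
        assignment[name].append(index)
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, name))
    return makespan, assignment


def static_schedule(task_costs, worker_speeds):
    """Baseline: deal tasks round-robin, ignoring worker speed.

    Returns the makespan (time at which the slowest worker finishes).
    """
    names = sorted(worker_speeds)
    busy = {name: 0.0 for name in names}
    for index, cost in enumerate(task_costs):
        name = names[index % len(names)]
        busy[name] += cost / worker_speeds[name]
    return max(busy.values())


if __name__ == "__main__":
    costs = [4.0] * 10                     # ten equal tasks (assumed values)
    speeds = {"cpu": 1.0, "gpu": 4.0}      # assumed relative speeds
    dyn_makespan, dyn_assignment = dynamic_schedule(costs, speeds)
    print("dynamic:", dyn_makespan, dyn_assignment)
    print("static: ", static_schedule(costs, speeds))
```

With these assumed numbers, the shared queue finishes in 8 simulated seconds because the faster worker ends up taking eight of the ten tasks, while the round-robin split takes 20 seconds because the slower worker is handed half of the work. Real runtimes must also account for data transfer costs and task granularity, which this sketch ignores.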

Broader Impacts

The software libraries that we have been developing are already used by an open-source software community (e.g., major packages on hackage.org depend on our monad-par library). All software developed in the course of this project will likewise be made publicly available and supported.

Scale of Use

At this stage of the project we will primarily be running benchmarks to evaluate the scalability of our software. Running our benchmark suites can take from one hour to a few hours, but requires exclusive access to a set of machines. Ideally, we would be able to run a benchmark suite periodically (say, every week or two) as we incrementally improve the software.