Automatic Extraction of Heterogeneous Parallelism from Array-based High-level Languages

Project Details

Project Lead
Arun Chauhan 
Project Manager
Arun Chauhan 
Project Members
Chun-Yu Shei, Pushkar Ratnalikar, Milinda Pathirage  
Indiana University, School of Informatics and Computing  
Computer Science (401) 


High-level array-based languages, such as MATLAB and R, provide a highly productive programming environment to scientists and engineers in a diverse set of domains. However, these languages continue to suffer from an "abstraction penalty" that hinders performance and keeps programmers from using them to write production-quality code. One way to improve performance is to let programs written in these languages leverage the heterogeneous parallelism available on modern machines. Automatic parallelization is an attractive option for these languages for several reasons. First, array-based syntax naturally conveys fine-grained parallelism. Second, because these languages are high-level, compiler analysis of them is often simpler. Finally, a fully automatic approach is highly attractive to the typical users of these languages, who are often not expert programmers.

For this project we propose to automatically extract coarse-grained parallelism in the form of task parallelism. Our initial studies of MATLAB and R programs have found significant potential for such parallelism. We are in the process of designing and implementing compiler algorithms that automatically extract task parallelism from MATLAB and R programs and translate the code to C++ based on Intel Threading Building Blocks (TBB). We will use FutureGrid resources to run experiments that validate our approach and provide feedback to help us improve our algorithms. Our experiments do not require MATLAB or R to be available on FutureGrid, although their availability would certainly be convenient.

Intellectual Merit

We expect this project to yield novel compiler analysis techniques for array-based languages. Additionally, although dataflow computation models have been known for a long time, very little is understood about automatically translating standard imperative programs into the macro-dataflow style that TBB enables. This research is expected to produce a new understanding of the equivalence between the two models, the limits of that equivalence, and the issues involved in achieving such an automatic translation.

Broader Impacts

Two PhD students working on this project will gain experience using a shared high-performance infrastructure through FutureGrid. The research outcomes will be published in leading international conferences and journals. Techniques developed through this work will be incorporated into an open-source MATLAB/R-to-C++ compiler. Some of the results may also be used directly by MathWorks through a collaboration that is currently being set up.

Scale of Use

One dedicated node at a time.