Harp

Project Details

Project Lead
Judy Qiu 
Project Manager
Langshi Chen 
Project Members
Bo Peng, Langshi Chen, Vibhatha Abeykoon, Tony Liu, Anurag Sharma, Prawal Gangwar, Raviteja Bingi, Manpreet Gulia, Zhao Zhao, Sabra Ossen, Bo Feng, Bokyoon Na, Tiana Deckard, Yining Wang
Institution
Indiana University, School of Informatics and Computing  
Discipline
Computer Science (401) 
Subdiscipline
14.09 Computer Engineering 

Abstract

Harp is an open-source project that builds on Twister and Twister4Azure. We implemented it as a library that plugs into Hadoop and enables users to run complex data analysis applications on both clouds and supercomputers. We will support domain scientists who use this runtime environment to conduct large-scale data analysis for biomedical science, computer vision, and social science applications.

Intellectual Merit

We have shown that earlier standalone enhanced versions of MapReduce can be replaced by Harp, a Hadoop plug-in. Harp offers data abstractions suited to high-performance iterative computation together with MPI-quality communication, and it can drive libraries such as Mahout, MLlib, and DAAL, as well as deep learning libraries, on HPC and cloud systems.

Broader Impacts

The Harp library provides a common set of data abstractions and related collective communication abstractions that transform MapReduce programming models into Map-Collective models, thereby addressing the large collectives that are a distinctive feature of data-intensive and data mining applications.
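
As a rough illustration of the Map-Collective idea described above, the sketch below simulates a map phase followed by a single allreduce in plain Python. It does not use Harp or Hadoop: the function names `allreduce_sum` and `map_collective_mean` and the sample partitions are hypothetical and chosen only for this example, and the "workers" are simulated as entries in a list rather than as separate processes.

```python
from typing import List


def allreduce_sum(partials: List[List[float]]) -> List[List[float]]:
    """Simulated allreduce: every worker receives the element-wise sum
    of all workers' partial vectors (hypothetical helper, not a Harp API)."""
    total = [sum(column) for column in zip(*partials)]
    return [list(total) for _ in partials]


def map_collective_mean(partitions: List[List[float]]) -> List[float]:
    """Compute the global mean of partitioned data, one result per worker."""
    # Map phase: each worker summarizes its own partition as [sum, count].
    partials = [[float(sum(p)), float(len(p))] for p in partitions]
    # Collective phase: one allreduce stands in for the shuffle-and-reduce
    # step, so every worker holds the global [sum, count] afterwards.
    combined = allreduce_sum(partials)
    # Each worker derives the global mean locally from the shared totals.
    return [s / n for s, n in combined]


# Three simulated workers, each holding one partition of the data.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
means = map_collective_mean(partitions)
```

Because every simulated worker ends the collective step with the same global totals, an iterative algorithm could start its next iteration immediately without writing intermediate results to storage. That is the general motivation for collective communication in iterative computation; the sketch makes no claim about how Harp implements it.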

Scale of Use

10 to 20 nodes