Comparison of MapReduce Systems

Project ID
Project Categories
Technology Evaluation
Project Keywords
Project Alumni
Hui Li (lihui)
We are in the era of Big Data. The rapid growth of information in science requires processing large amounts of scientific data. One proposed solution is to apply data flow languages and runtimes to data-intensive applications. Examples of such systems include Google MapReduce, Microsoft Dryad, and CGL Twister. In this project, we will study the applicability and performance of these runtimes for solving Big Data problems in science.
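To make the data flow model concrete, here is a minimal single-process Python sketch of the MapReduce programming model (map, shuffle by key, reduce) applied to word counting. It does not use the Hadoop, Dryad, or Twister APIs; `map_reduce` and the word-count functions are hypothetical names chosen for illustration. Real runtimes distribute the same three phases across many nodes and also handle partitioning, fault tolerance, and data movement.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a single-process MapReduce job over an in-memory list of records."""
    # Map phase: each input record emits zero or more (key, value) pairs.
    # Shuffle phase: intermediate values are grouped by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: all values for one key are combined into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    # Total number of occurrences of this word across all lines.
    return sum(counts)


lines = ["big data in science", "data flow runtimes for big data"]
counts = map_reduce(lines, word_count_mapper, word_count_reducer)
print(counts["data"])  # 3
print(counts["big"])   # 2
```

Because the mapper and reducer only see one record or one key at a time, a runtime is free to run many mapper and reducer instances in parallel; that independence is what makes the model attractive for large scientific datasets.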
Use of FutureSystems
Deploy runtimes such as Twister, Hadoop, and Dryad on FutureGrid resources.
Scale of Use
Examples of the scale of use include:
1) A parallel SW-G (Smith-Waterman-Gotoh) alignment job with 10,000 sequences on 32 nodes in a Hadoop cluster.
2) Parallel matrix multiplication of order 31,200 on 16 nodes in a Dryad cluster.
3) Parallel PageRank on a 10 GB web graph on 16 nodes in a Twister cluster.
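The PageRank workload is iterative: each round of map and reduce produces a new rank vector that feeds the next round, which is the pattern Twister is designed for. The following Python sketch shows one way to express PageRank as repeated map/shuffle/reduce steps. It is a single-process illustration, not Twister's API; the function names, the damping factor of 0.85, the iteration count, and the toy graph are assumptions, and the sketch assumes every page has at least one out-link (no dangling pages).

```python
from collections import defaultdict


def map_page(links, rank):
    # Map: a page emits an equal share of its current rank to each out-link.
    for target in links:
        yield target, rank / len(links)


def pagerank_step(graph, ranks, damping=0.85):
    n = len(graph)
    # Shuffle: group the emitted rank shares by target page.
    incoming = defaultdict(list)
    for page, links in graph.items():
        for target, share in map_page(links, ranks[page]):
            incoming[target].append(share)
    # Reduce: sum the shares for each page and apply the damping factor.
    return {
        page: (1.0 - damping) / n + damping * sum(incoming[page])
        for page in graph
    }


def pagerank(graph, iterations=100, damping=0.85):
    # The graph is static across iterations; only the rank vector changes,
    # which is the kind of loop-invariant data an iterative runtime can cache.
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        ranks = pagerank_step(graph, ranks, damping)
    return ranks


# Hypothetical toy graph; assumes every page has at least one out-link.
web_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web_graph)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 4))
```

In a one-shot MapReduce runtime, each iteration would typically be a separate job that rereads the graph; keeping the graph in memory across iterations and passing only the small rank vector between rounds is the main saving an iterative runtime aims for.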