Evaluation of Hadoop for IO-intensive applications

Project Details

Project Lead
Zhenhua Guo 
Project Manager
Zhenhua Guo 
Institution
Indiana University, Pervasive Technology Institute  
Discipline
Computer Science (401) 

Abstract

One advantage of MapReduce is its data-affinity-aware scheduling, which places computation close to the data it reads and makes MapReduce more efficient than traditional HPC systems for data-intensive applications. In this project, we evaluate the performance of Hadoop for IO-intensive applications.

Intellectual Merit

We closely investigate how Hadoop performs when running IO-intensive applications. The execution time of a MapReduce job is affected by many factors. We select several important ones (e.g., input data size and the number of nodes) and measure how each affects job run time.
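A measurement of this kind can be organized as a parameter sweep. The following is only a minimal sketch of that idea, not the harness used in this project: the names `sweep` and `placeholder_job` are hypothetical, the input sizes are illustrative assumptions, and the node counts are simply taken from the planned 20-60 node range. The placeholder stands in for whatever call would actually submit a Hadoop job and wait for it to finish.

```python
import itertools
import time


def sweep(run_job, input_sizes_gb, node_counts):
    """Time run_job once for every (input size, node count) combination."""
    results = []
    for size_gb, nodes in itertools.product(input_sizes_gb, node_counts):
        start = time.perf_counter()
        run_job(size_gb, nodes)  # assumed to block until the job completes
        elapsed = time.perf_counter() - start
        results.append({"input_gb": size_gb, "nodes": nodes, "seconds": elapsed})
    return results


def placeholder_job(size_gb, nodes):
    # Hypothetical stand-in: a real harness would launch a MapReduce job
    # on `nodes` nodes over `size_gb` of input and wait for it here.
    time.sleep(0.001)


if __name__ == "__main__":
    # Illustrative values only; the input sizes are assumptions.
    for row in sweep(placeholder_job, [1, 10], [20, 40, 60]):
        print(row)
```

Varying one factor at a time while holding the others fixed makes it easier to attribute a change in run time to a single factor.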

Broader Impacts

This project helps Hadoop users understand how the factors we considered influence performance. Users can then tune those factors to maximize performance in their specific environments.

Scale of Use

We plan to use 20-60 HPC nodes.

Results

The results are presented in detail in the following file: https://portal.futuregrid.org/sites/default/files/HadoopEvaluationResults.pdf