Fault management in Map-Reduce

Project Details

Project Lead
Selvi Kadirvel 
Project Manager
Selvi Kadirvel 
Institution
University of Florida, Advanced Computing and Information Systems Lab  
Discipline
Computer Science (401) 

Abstract

The purpose of this project is to evaluate the performance penalties experienced by Map-Reduce jobs in the presence of different types of injected faults. We will begin with the Hadoop implementation of Map-Reduce. Hadoop has built-in fault-tolerance mechanisms; however, these mechanisms incur performance penalties when faults occur, as indicated both by our prior research on in-house clusters and by other recent literature. This project will enable us to evaluate these penalties at large scale, especially in the heterogeneous environment provided by FutureGrid.
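
To make the experimental methodology concrete, the sketch below illustrates one way a fault-injection run could be driven. It is a minimal illustration, not part of the proposal itself: it assumes a running Hadoop cluster with the standard wordcount example and passwordless SSH to worker nodes, and the paths and host name (HADOOP, EXAMPLES_JAR, VICTIM_NODE) are placeholders. It times a fault-free job, then repeats the job while stopping one worker's TaskTracker mid-run, so the difference in completion time exposes the recovery penalty.

#!/usr/bin/env python
# Minimal fault-injection sketch (illustrative; paths and hosts are assumptions).
import subprocess
import threading
import time

HADOOP = "/usr/local/hadoop/bin/hadoop"                  # assumed install path
EXAMPLES_JAR = "/usr/local/hadoop/hadoop-examples.jar"   # assumed examples jar
VICTIM_NODE = "worker03"                                 # placeholder host name
FAULT_DELAY_S = 60                                       # inject 60 s into the job

def inject_fault():
    """Stop the TaskTracker on one worker mid-job to simulate a node fault."""
    time.sleep(FAULT_DELAY_S)
    subprocess.call(["ssh", VICTIM_NODE,
                     "/usr/local/hadoop/bin/hadoop-daemon.sh stop tasktracker"])

def run_job(label):
    """Run a wordcount job and return its wall-clock completion time in seconds."""
    start = time.time()
    subprocess.call([HADOOP, "jar", EXAMPLES_JAR, "wordcount",
                     "input", "output-" + label])
    return time.time() - start

baseline = run_job("baseline")                 # fault-free reference run

injector = threading.Thread(target=inject_fault)
injector.start()
faulty = run_job("faulty")                     # run with the injected fault
injector.join()

print("baseline: %.1f s, with fault: %.1f s, penalty: %.1f%%"
      % (baseline, faulty, 100.0 * (faulty - baseline) / baseline))

In an actual study one would restart the stopped daemon between trials, repeat each configuration many times, and vary the fault type and injection time; the point here is only the shape of the measurement loop.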

Intellectual Merit

Predicting the performance of distributed applications is a challenging problem. The ability to quantify performance penalties for Map-Reduce applications will enable us to propose mechanisms to overcome them, making Map-Reduce more readily usable for applications that require performance guarantees.

Broader Impacts

Performance in the presence of faults is a critical goal for applications executing in enterprise data centers and cloud computing environments. The technologies developed to achieve this will benefit a wide range of communities in academia, industry, and government that use Map-Reduce for bioinformatics, text mining, machine learning, web indexing, ad analytics, and other workloads.

Scale of Use

I would like to begin with a 16-VM cluster and expand to a few hundred VMs as my experiments proceed.