Improve resource utilization in MapReduce

Project Details

Project Lead
Zhenhua Guo 
Project Manager
Zhenhua Guo 
Institution
Indiana University, Pervasive Technology Institute  
Discipline
Computer Science (401) 

Abstract

Hadoop partitions the physical resources of each slave node into conceptual map and reduce slots, which cap the number of tasks that can run concurrently on that node. We observed that this mechanism can leave resources underutilized when not all task slots on a node are occupied. To increase resource utilization, we propose a new mechanism called resource stealing. In addition, Hadoop's default mechanism for triggering speculative execution may launch many non-beneficial speculative tasks, which are killed before they complete. We therefore propose Benefit Aware Speculative Execution (BASE), which reduces the number of non-beneficial speculative tasks without sacrificing performance.
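The utilization gap described above can be illustrated with a toy model. The sketch below is a hypothetical Python illustration, not the project's implementation. It assumes that a node's resources form a single divisible quantity (`capacity`), that each slot reserves an equal share, and that running tasks can actually use any extra resources they are given. `SlaveNode`, `share_per_task`, and `assign_task` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class SlaveNode:
    """Toy model of one slave node (hypothetical, for illustration only)."""

    capacity: float   # total resources on the node, in an abstract unit
    slots: int        # configured task slots (maximum concurrent tasks)
    running: int = 0  # tasks currently occupying slots

    def share_per_task(self, stealing: bool) -> float:
        """Resources available to each running task."""
        if self.running == 0:
            return 0.0
        if stealing:
            # Running tasks also use the shares reserved for idle slots.
            return self.capacity / self.running
        # Slot model: each task is limited to its own slot's share.
        return self.capacity / self.slots

    def utilization(self, stealing: bool) -> float:
        """Fraction of node capacity in use."""
        return self.share_per_task(stealing) * self.running / self.capacity

    def assign_task(self) -> None:
        """Start a new task in a free slot."""
        if self.running >= self.slots:
            raise RuntimeError("no free slot on this node")
        # Shares are recomputed from `running`, so stolen resources are given
        # back: every task, including the new one, still gets >= capacity / slots.
        self.running += 1


node = SlaveNode(capacity=8.0, slots=4, running=2)
print(node.utilization(stealing=False))  # 0.5
print(node.utilization(stealing=True))   # 1.0
```

With two of four slots busy, the slot model leaves half the node idle, while stealing lets the two running tasks use all of it. Assigning more tasks shrinks each share back toward `capacity / slots`, so in this model stealing never blocks normal task scheduling.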

Intellectual Merit

This project addresses two inefficiencies in Hadoop: resources left idle by unused task slots, and non-beneficial speculative execution. Our proposed resource stealing increases resource utilization without interfering with normal Hadoop task scheduling, and our proposed Benefit Aware Speculative Execution (BASE) can eliminate most non-beneficial speculative tasks without degrading performance.
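The difference between a purely progress-based trigger and a benefit check can be sketched as follows. This is not BASE itself; it is a minimal Python sketch assuming that a task's progress rate stays constant and that the runtime of a fresh attempt can be estimated by some other means (the `fresh_runtime` input). All names and thresholds (`gap=0.2`, `min_elapsed=60.0`) are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    progress: float  # fraction of work completed, in [0, 1]
    elapsed: float   # seconds since the task attempt started


def estimated_remaining(task: TaskStatus) -> float:
    """Remaining time, assuming the task keeps its average progress rate so far."""
    if task.progress <= 0.0:
        return float("inf")  # no progress yet, so nothing to extrapolate from
    return task.elapsed * (1.0 - task.progress) / task.progress


def is_straggler(task: TaskStatus, avg_progress: float,
                 gap: float = 0.2, min_elapsed: float = 60.0) -> bool:
    """Progress-gap trigger: far behind the average and running long enough."""
    return task.elapsed >= min_elapsed and task.progress < avg_progress - gap


def should_speculate(task: TaskStatus, avg_progress: float,
                     fresh_runtime: float) -> bool:
    """Launch a backup only if it is expected to finish before the original."""
    if not is_straggler(task, avg_progress):
        return False
    # fresh_runtime: estimated duration of a new attempt (hypothetical input,
    # e.g. the mean runtime of completed tasks in the same phase).
    return fresh_runtime < estimated_remaining(task)


slow = TaskStatus(progress=0.5, elapsed=100.0)
print(estimated_remaining(slow))                          # 100.0
print(should_speculate(slow, 0.9, fresh_runtime=150.0))   # False
print(should_speculate(slow, 0.9, fresh_runtime=40.0))    # True
```

In both calls the task counts as a straggler under the progress-gap rule; only when a fresh attempt is expected to finish sooner than the original does the benefit check launch a backup.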

Broader Impacts

MapReduce/Hadoop is used in both industry and academia to run large-scale data processing applications. The approaches proposed and evaluated in this project increase resource utilization, which can improve throughput and reduce job run time. Users can therefore run MapReduce jobs more efficiently, and scientists become more productive because they obtain results sooner and can tune their applications accordingly.

Scale of Use

We used 20 to 40 bare-metal machines on a periodic basis.

Results

We ran CPU-, I/O-, and network-intensive applications to evaluate our algorithms. The results show that resource stealing achieves higher resource utilization and thus reduces job run time, and that BASE significantly reduces the number of non-beneficial speculative tasks without degrading performance.
The detailed results of this project are presented in our paper "Improving Resource Utilization in MapReduce" [bib]ResStealAndBASE[/bib].