Wide area distributed file system for MapReduce applications on FutureGrid platform

Project ID
FG-60
Project Categories
Computer Science
Project Alumni
Fugang Wang (fuwang)
Gregor von Laszewski (gvonlasz)
Completed
Abstract
Map/Reduce is a software framework introduced by Google for processing large datasets in parallel across a large number of compute nodes. It has received significant attention as a programming model for various scientific problems on a wide variety of computing architectures, including large-scale compute clusters, GPGPUs, and multi-core architectures. However, some data-intensive computing applications, such as those in the Large Hadron Collider (LHC) computing project, earth observation sciences, and biomedicine, require large-scale distributed data processing across multiple data centers connected by high-speed networks. In this project, we aim to develop a software framework for Map/Reduce across distributed data centers in support of such data-intensive applications.
Use of FutureSystems
We plan to use FutureGrid as a test bed to set up several clusters and deploy a wide-area parallel file system for MapReduce applications.
Scale of Use
Distributed Hadoop clusters spanning a wide area, each with at least 8 compute nodes.