Distributed MapReduce

Project Details

Project Lead
Chenyu Wang 
Project Manager
Chenyu Wang 
Project Members
Jerome Mitchell, Bingjing Zhang  
Institution
University of Minnesota, Twin Cities, Computer Science School, Distributed System Lab  
Discipline
Computer Science (401) 
Subdiscipline
11.07 Computer Science 

Abstract

Data is now generated rapidly at locations all over the world. Because this data is widely dispersed and often geographically distant from the data centers that process it, analyzing it is a major challenge both for large companies and for pay-as-you-go cloud services. In earlier experiments we compared the performance of different Hadoop architectures on widely scattered data sets and proposed performing the computation as close to the data as possible. We also showed that several factors must be balanced when the data to be processed resides outside the cloud (see http://www-users.cs.umn.edu/~cardosa/cardosa-mapred11.pdf). This project will carry out further experiments on the implementation of this distributed MapReduce system.
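
As a rough illustration of the trade-off described above, the sketch below compares how much data crosses wide-area links under two hypothetical deployment choices: copying all input to one central cluster before running the job, or running the map phase at each data site and shipping only the intermediate results. The site sizes, the reduction_ratio parameter, and the function names are assumptions made for illustration; they are not taken from the project or from Hadoop itself.

```python
def centralized_wan_gb(site_gb, central_site):
    """Data (GB) sent over the WAN when every remote site copies its raw
    input to the central site before the job starts."""
    return sum(size for site, size in site_gb.items() if site != central_site)


def distributed_wan_gb(site_gb, central_site, reduction_ratio):
    """Data (GB) sent over the WAN when each remote site runs the map phase
    locally and ships only intermediate data to the central site.

    reduction_ratio is the assumed intermediate-size / input-size ratio,
    taken to be the same at every site for simplicity.
    """
    return sum(
        size * reduction_ratio
        for site, size in site_gb.items()
        if site != central_site
    )


# Hypothetical input sizes per site, in GB.
sites = {"us": 100, "eu": 80, "asia": 60}

central = centralized_wan_gb(sites, "us")
aggregating = distributed_wan_gb(sites, "us", reduction_ratio=0.1)
expanding = distributed_wan_gb(sites, "us", reduction_ratio=2.0)

print("copy everything first:", central)
print("map locally, job shrinks data:", aggregating)
print("map locally, job expands data:", expanding)
```

Under these assumptions, computing near the data helps when the job shrinks its input and hurts when the intermediate data is larger than the input, which is one example of the factors that have to be balanced. The model ignores link bandwidth, compute capacity at each site, and job runtime.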

Intellectual Merit

This project will advance distributed systems research, particularly on problems where moving data is a costly part of the overall computation workflow (for example, scientific data that must be imported into computing clusters). It will also help build an improved Hadoop prototype that performs better on widely distributed data sets; the prototype will be open sourced so that it can be used in other scientific experiments. Our previously published paper shows that this problem is worth pursuing: http://www-users.cs.umn.edu/~cardosa/cardosa-mapred11.pdf

Broader Impacts

The students involved range from undergraduates to Ph.D. students. The project also strengthens collaborative research across several research groups, including networking, databases, and distributed systems. The finished software could be used in further research on topics such as spatial data mining and social networks.

Scale of Use

A few VMs for an experiment.