Benchmarking Hadoop and Spark

Project Details

Project Lead
Karan Kotabagi 
Project Manager
Gregor Laszewski
Project Members
Karan Kamatgi, Manoj Joshi, Ramyashree DG  
Indiana University, Computer Science  
Computer Science (401) 


Apache Spark is a cluster-computing framework. Its in-memory processing can make some workloads run up to 100 times faster than Hadoop MapReduce. In this project, Spark is installed natively on a Raspberry Pi cluster with one master and four workers. The main goal is to set up a Spark cluster and configure it so that its performance, ease of deployment, flexibility, and scalability can be compared across several platforms: a direct installation on the Raspberry Pi cluster, Docker, and FutureSystems Echo.
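To make the performance comparison concrete, here is a minimal sketch of a timing harness. It assumes each platform is measured by running the same workload several times and taking the median wall-clock time, then expressing each platform's time relative to a chosen baseline. The `word_count` workload and the function names are hypothetical stand-ins. On the actual clusters, the timed callable would instead submit the Spark job under test.

```python
import statistics
import time
from collections import Counter
from typing import Callable, Dict, List


def word_count(lines: List[str]) -> Counter:
    """Hypothetical stand-in workload: count words across input lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def time_workload(workload: Callable[[], object], repeats: int = 5) -> float:
    """Run the workload `repeats` times; return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to a single slow run.
    return statistics.median(samples)


def speedups(run_times: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Express each platform's run time as a speedup relative to the baseline platform."""
    base = run_times[baseline]
    return {platform: base / t for platform, t in run_times.items()}


if __name__ == "__main__":
    data = ["spark runs on the raspberry pi cluster"] * 10_000
    measured = time_workload(lambda: word_count(data))
    print(f"median run time: {measured:.4f} s")
```

Once a median time has been recorded for each platform, for example under keys such as `"raspberry-pi"`, `"docker"`, and `"echo"`, `speedups` normalizes them against one baseline. Platforms with different absolute speeds can then be compared on a single scale.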

Intellectual Merit

The project is being carried out in parallel with drafting the paper.

Broader Impacts

The experimental results are expected to inform the development of future systems.

Scale of Use

I request around 7 days to run the proposed experiments and analyze the run times on the Echo system.


This will be updated.