Investigating the Apache Big Data Stack

Project Details

Project Lead
Ibrahim Hallac 
Project Manager
Ibrahim Hallac 
Project Members
Galip Aydin  
Firat University, Computer Science Department  
Computer Science (401) 


This project investigates Apache's Big Data technologies, which together can be called the Apache Big Data Stack. These projects can replace and compete with one another, yet they can also work together.

The steps for this project will be:

1. Deploying a cloud: Several virtual machines will be used in a small cloud, preferably OpenStack, deployed on FutureGrid. FutureGrid's platform will be explored on these virtual machines by accessing, managing, and snapshotting them.

2. Installation of the Apache components:
Once the small cloud of machines is available and prerequisites such as the JDK and SSH are installed:
--Apache Mesos will be explored by running the K-Means algorithm on sample data.
--Mahout will also be run on top of Hadoop for comparison.

3. Results and Future Work
The tests and small applications will provide working knowledge of Apache's Big Data solutions and hands-on experience with the FutureGrid platform. This knowledge will be used to determine where machine learning applications fit in the Big Data and cloud computing ecosystem.
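
For reference, the K-Means algorithm named in step 2 can be sketched on a single machine as follows. This is only a minimal illustration in plain Python, assuming small in-memory data; it is not the Mesos or Mahout implementation the project will run, and the names kmeans, assign, and sq_dist are hypothetical helpers introduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    points is a list of equal-length numeric tuples. Initial centroids
    are k points sampled at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    # Recompute labels so they match the returned centroids.
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(sample, k=2)
    print(centers, labels)
```

Distributed implementations parallelize the same two steps: the assignment step is independent per point, and the update step reduces to per-cluster sums and counts.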

Intellectual Merit

Apache's Big Data stack keeps expanding. We will explore these technologies to learn which to use, where, and when, especially for machine learning applications. This project will help us see the big picture of Apache's Big Data projects.

Broader Impacts

This project can serve as a guide for researchers who need a platform for developing parallel machine learning programs. Apache offers many solutions, and machine learning experiments using them will benefit scientists who work on problems involving big data.

Scale of Use

We will use several VMs per experiment. The VMs will be managed by an IaaS tool.