Course: Big Data Open Source Software and Projects (Data Science Curriculum)

Project Details

Project Lead
Geoffrey Fox 
Project Manager
Sidd Maini 
Project Members
Fugang Wang, Sidd Maini, Gregor von Laszewski, Scott McCaulay, Abhik Seal, Anesu Chaora, Sriram Pulipaka, Fazle Rabbi, Naveen Madhire, Aravindh Varadharaju, Harsh Seth, Rakesh Menon, Rahul Singhania, Amritanshu Joshi, satwik narlanka, William k., Priyank Kabaria, Karthik Mohandas Bangera, Pushkar Raj, Ian Wood, Siddhardha Raju Mandapati, Hyungro Lee, Yukai Xiao  
Indiana University, Community Grids Laboratory  
Computer Science (401) 


This course studies software used in many commercial activities to study Big Data. The backdrop for the course is the ~150 software subsystems illustrated at
We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack).

The course covers the following material:
a) The cloud computing architecture underlying ABDS and the contrast of this with HPC.
b) The software architecture, with its different layers covering broad functionality, and the rationale for each layer.
c) We will give application examples.
d) Then we will go through selected software systems, about 10% of those in the Kaleidoscope, which have already been deployed on FutureGrid systems using OpenStack and Chef recipes.
e) Each student will choose one other open source member of the Kaleidoscope and deploy it as in d).
f) The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
g) Teams of up to 3 students can be formed, with a corresponding increase in scope in activities e) and f).

Intellectual Merit

One of the main data science classes, being offered for the first time in Fall 2014 with online and residential sections.

Broader Impacts

Our MOOC-style delivery ensures broad impact.

Scale of Use

Modest, as this is a class.