Course: Big Data Open Source Software and Projects (Data Science Curriculum)

This course studies software used in many commercial activities to analyze Big Data. The backdrop for the course is the ~150 software subsystems illustrated at
We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack).

The course covers the following material:
a) The cloud computing architecture underlying ABDS and how it contrasts with HPC.
b) The software architecture with its different layers, covering the broad functionality of and rationale for each layer.
c) We will give application examples.
d) We will then go through selected software systems (about 10% of those in the Kaleidoscope) that have already been deployed on FutureGrid systems using OpenStack and Chef recipes.
e) Each student will choose one other open source member of the Kaleidoscope and deploy it as in d).
f) The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
g) Teams of up to 3 students can be formed, with a corresponding increase in the scope of activities e) and f).
