Twister2 Yarn/Mesos Integration

Project Details

Project Lead
Ahmet Uyar 
Project Manager
Ahmet Uyar 
Project Members
Gurhan Gunduz  
Institution
Indiana University, Digital Science Center  
Discipline
Computer Science (401) 

Abstract

Twister2 is a big data toolkit built from loosely coupled components, each of which can have different implementations to support various applications. Such a polymorphic design would allow services and data analytics to be integrated seamlessly and to extend from edge to cloud to HPC environments. Twister2 will have a pluggable architecture for deployment on various cluster resource managers, including Slurm, Mesos and Yarn. As a starting point, we would like to run Twister2 jobs on Yarn/Mesos. The work has three parts: a) choose a framework on top of Yarn/Mesos to interface with, e.g. Aurora, Marathon or Reef; b) write a plugin for the Twister2 resource scheduler that starts the containers on those systems; c) improve the plugin API along the way. For these purposes we will need four nodes on which we can install and experiment with Yarn, Mesos, Aurora, Marathon, Reef, etc.
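
To make the pluggable scheduler idea concrete, the following is a minimal Java sketch of what a resource scheduler plugin contract could look like. It is not the Twister2 API: every name in it (JobRequest, ResourceSchedulerPlugin, InMemoryPlugin, PluginRegistry, and the plugin names "mesos-marathon" and "yarn-reef") is hypothetical and chosen only for illustration. The stand-in plugin only records jobs in memory, whereas a real plugin would call the chosen framework (Aurora, Marathon, Reef) to start containers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchedulerPluginSketch {

    /** Hypothetical description of the resources a job asks for. */
    static final class JobRequest {
        final String jobName;
        final int containers;
        final int cpusPerContainer;
        final int memoryMbPerContainer;

        JobRequest(String jobName, int containers, int cpusPerContainer, int memoryMbPerContainer) {
            this.jobName = jobName;
            this.containers = containers;
            this.cpusPerContainer = cpusPerContainer;
            this.memoryMbPerContainer = memoryMbPerContainer;
        }
    }

    /** Hypothetical plugin contract; an assumption, not the actual Twister2 interface. */
    interface ResourceSchedulerPlugin {
        String name();                      // key used to select the plugin
        String submit(JobRequest request);  // returns a handle for the started job
        boolean terminate(String jobHandle);
    }

    /** Stand-in plugin that only tracks jobs in memory instead of contacting a cluster. */
    static final class InMemoryPlugin implements ResourceSchedulerPlugin {
        private final String name;
        private final Map<String, JobRequest> running = new LinkedHashMap<>();
        private int counter = 0;

        InMemoryPlugin(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public String submit(JobRequest request) {
            if (request.containers <= 0) {
                throw new IllegalArgumentException("A job needs at least one container");
            }
            counter++;
            String handle = name + "/" + request.jobName + "-" + counter;
            running.put(handle, request);
            return handle;
        }

        @Override
        public boolean terminate(String jobHandle) {
            // True only if the handle referred to a job that was still tracked.
            return running.remove(jobHandle) != null;
        }
    }

    /** Maps plugin names to implementations so the target system is a configuration choice. */
    static final class PluginRegistry {
        private final Map<String, ResourceSchedulerPlugin> plugins = new LinkedHashMap<>();

        void register(ResourceSchedulerPlugin plugin) {
            plugins.put(plugin.name(), plugin);
        }

        ResourceSchedulerPlugin lookup(String name) {
            ResourceSchedulerPlugin plugin = plugins.get(name);
            if (plugin == null) {
                throw new IllegalArgumentException("No scheduler plugin registered for: " + name);
            }
            return plugin;
        }
    }

    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register(new InMemoryPlugin("mesos-marathon"));
        registry.register(new InMemoryPlugin("yarn-reef"));

        // Hypothetical job: four containers, matching the four-node test setup.
        JobRequest request = new JobRequest("wordcount", 4, 2, 2048);
        ResourceSchedulerPlugin plugin = registry.lookup("mesos-marathon");

        String handle = plugin.submit(request);
        System.out.println("Submitted " + request.jobName + " via " + plugin.name() + " as " + handle);
        System.out.println("Terminated: " + plugin.terminate(handle));
    }
}
```

Keeping the contract this small is one possible design choice: the job-submission code depends only on the interface, so adding a new resource manager would mean registering one more implementation rather than changing the caller.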

Intellectual Merit

The Twister2 project foresees that the share of large-scale applications driven by data will increase rapidly in the future. The HPC community has tended to focus mostly on heavily compute-bound applications, and with these new developments there is an opportunity to explore data-driven applications using HPC features such as high-speed interconnects and many-core machines. Data-driven computing frameworks are still in their early stages, and there are four driving application areas (streaming, data pipelines, machine learning, and services) with different processing requirements. The Twister2 project will explore the convergence of these application areas under a common event-driven model. In this project we will explore known big data resource management frameworks such as Yarn and Mesos and integrate them with Twister2.

Broader Impacts

This project will be an important component of the Twister2 project. It will let the Twister2 big data toolkit run on various cluster resource management systems.

Scale of Use

We expect this project to take a number of months. During this time we will be experimenting with these systems frequently, possibly daily.