Deep Learning framework in Python under RaPyDLI

Project Details

Project Lead
Md. Lisul Islam 
Project Manager
Md. Lisul Islam 
Project Members
Khandokar Md. Nayem  
Institution
Indiana University, Bloomington, Department of Computer Science
Discipline
Computer Science (401) 
Subdiscipline
14.09 Computer Engineering 

Abstract

The Rapid Python Deep Learning Infrastructure (RaPyDLI) project aims to combine high-level Python, C/C++, and Java environments with carefully designed libraries supporting GPU accelerators and MIC coprocessors. Interactive analysis and visualization will be supported, together with scaling from current terabyte-sized datasets to petabyte datasets, to enable substantial progress in the complexity and capability of DL applications. A broad range of storage models will be supported, including network file systems, databases, and HDFS. We aim to deploy Caffe and TensorFlow, state-of-the-art deep learning frameworks, on FutureSystems and to pursue further algorithmic improvements in these frameworks.
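As a first sanity check of such a deployment, the minimal sketch below (an illustration only, assuming a TensorFlow installation with GPU support; it is not part of RaPyDLI itself) verifies that TensorFlow can see a GPU and places a small matrix multiplication on it:

    import tensorflow as tf

    # List visible GPU accelerators; an empty list means TensorFlow
    # will fall back to CPU execution.
    gpus = tf.config.list_physical_devices("GPU")
    print("Visible GPUs:", gpus)

    # A small matrix multiply; TensorFlow places it on a GPU
    # automatically when one is available.
    a = tf.random.normal([1024, 1024])
    b = tf.random.normal([1024, 1024])
    c = tf.matmul(a, b)
    print("Result placed on:", c.device)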

Intellectual Merit

RaPyDLI will design a prototype software framework that provides highly optimized, distributed, easy-to-use implementations of the basic primitives that underlie commonly used Deep Learning (DL) modules. The framework will scale across the platform spectrum, from desktops to today's largest supercomputers and beyond, to machines with different accelerators such as GPU and MIC chips. Because DL modules can be mixed, matched, and combined freely, a surprisingly small set of optimized operations can represent a rich array of functionality useful for attacking a range of applications.
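To illustrate the idea, the sketch below composes a two-layer classifier from three primitives (gemm, relu, softmax). The primitive names are hypothetical, and the plain NumPy bodies are stand-ins for the optimized GPU/MIC kernels the framework would provide:

    import numpy as np

    # Stand-in primitives; RaPyDLI would dispatch these to optimized kernels.
    def gemm(a, b):
        return a @ b                     # dense matrix multiply

    def relu(x):
        return np.maximum(x, 0.0)        # elementwise nonlinearity

    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)   # row-wise softmax

    # A fully connected classifier built entirely from the primitives.
    def dense_classifier(x, w1, w2):
        return softmax(gemm(relu(gemm(x, w1)), w2))

    x = np.random.randn(32, 784)              # a batch of 32 inputs
    w1 = np.random.randn(784, 256) * 0.01     # hidden-layer weights
    w2 = np.random.randn(256, 10) * 0.01      # output-layer weights
    print(dense_classifier(x, w1, w2).shape)  # (32, 10)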

Broader Impacts

We have already identified a modest number of key routines (fewer than 100) grouped into 10 broad categories that are sufficient to implement many state-of-the-art DL algorithms and have scaled them up dramatically, with the only change in user software being a requirement to call our new libraries, as shown in the accompanying figure. Our co-design process will create a middleware infrastructure that helps connect, in a highly scalable way, a wide range of exploratory DL algorithms (many of which are already implemented for single nodes) to the high-performance operations needed for DL research on distributed HPC systems. Our approach mimics the way BLAS and BLACS provided primitives to implement and extend high-level linear algebra codes.
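A sketch of the "only change is the library call" idea follows; the module name rapydli is hypothetical, standing in for the proposed library, while NumPy plays the role of an existing single-node backend:

    # User code is written against the primitive API; swapping the
    # provider of dl_ops moves it from single-node to distributed
    # execution without touching the algorithm itself.
    import numpy as dl_ops         # existing single-node backend
    # import rapydli as dl_ops     # proposed distributed backend (hypothetical)

    def forward(x, w):
        # Unchanged user algorithm, analogous to high-level code
        # built on BLAS primitives.
        return dl_ops.tanh(dl_ops.matmul(x, w))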

Scale of Use

We will need access to a GPU-compatible platform to measure the performance gains of several deep learning algorithms.
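A minimal timing harness of the kind such measurements would use, sketched here with a NumPy matrix multiply on the CPU as a placeholder for the GPU kernels to be measured:

    import time
    import numpy as np

    def benchmark(fn, *args, repeats=10):
        # Report the best wall-clock time over several runs, a common
        # convention when comparing CPU and accelerator performance.
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(*args)
            times.append(time.perf_counter() - t0)
        return min(times)

    a = np.random.randn(2048, 2048).astype(np.float32)
    b = np.random.randn(2048, 2048).astype(np.float32)
    print(f"matmul best time: {benchmark(np.matmul, a, b):.4f} s")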