MPI Java Performance Evaluation

Project Details

Project Lead
Saliya Ekanayake 
Project Manager
Saliya Ekanayake 
Project Members
Nigel Pugh, Tori Wilbon  
Institution
Virginia Tech, Network Dynamics and Simulation Science Laboratory (NDSSL)  
Discipline
Computer Science (401) 

Abstract

In the last few years, Java has gained popularity for processing “big data”, mostly through the Apache big data stack, a collection of open source frameworks for handling large volumes of data that includes popular systems such as Hadoop, the Hadoop Distributed File System (HDFS), and Spark. Past efforts to introduce Java to High Performance Computing (HPC) were not embraced by the community because of performance concerns. However, with continuous improvements in Java performance, interest in Java message passing support for HPC has been increasing. We support this idea and show its feasibility for solving real-world data analytics problems. This includes a performance evaluation of two MPI Java frameworks, OpenMPI and FastMPJ, on real-life machine learning problems.

Intellectual Merit

Our analysis will serve as proof that large-scale data analytics problems can be solved efficiently using the Java ecosystem while benefiting from the parallel capabilities of MPI.

Broader Impacts

If this study yields the positive results we expect, we will be able to offer our algorithms to the wider scientific community, giving researchers the opportunity to analyze their data efficiently. The algorithms comprise a suite of data clustering and multidimensional scaling implementations.

Scale of Use

Around 20 nodes for a few weeks.