Performance Evaluation of HDFS and FusionFS

Project ID
Project Keywords
The amount of data is growing explosively in many fields of the arts, science, and engineering, and processing such large volumes of data has become a major challenge. To support this computation, new infrastructures have emerged, most of which rely on distributed computing environments that process data in parallel across the nodes of a large cluster.
Distributed computing environments handle large-scale data for many applications while providing better performance. One well-known solution is the use of distributed file systems (DFSs). A DFS allows users of physically distributed computers to share data and storage resources through a common file system. Of the many DFSs available, the most popular is HDFS (Hadoop Distributed File System). The performance of HDFS has been compared with that of other distributed file systems, but not with newer DFSs such as FusionFS. Moreover, MapReduce is implemented on HDFS but not on FusionFS. The main goal of this project is therefore to run MapReduce on FusionFS and to evaluate the performance of HDFS and FusionFS across various parameters.
Use of FutureSystems
To set up Hadoop clusters in which HDFS can be replaced with FusionFS
Scale of Use
A few VMs for the experimental setup of a Hadoop cluster