Cloud Peer

Project Details

Project Lead
Kiruba Karan 
Project Manager
Kiruba Karan 
Supporting Experts
Zhenhua Guo  
Institution
Anna University, ECE  
Discipline
Engineering, n.e.c. (114) 

Abstract

Scope of Research Work: Data replication plays a vital role in cloud computing. Our goal is to build an efficient data replication algorithm that increases data availability and decreases bandwidth consumption. We will store a large amount of data in the Hadoop Distributed File System (HDFS) and design an efficient search algorithm to locate replicated data in the datacenter. We will use Java or Python as our development language, together with the Map/Reduce framework provided by Hadoop.

Research Objectives: We will use the resources first to generate a large amount of data and to implement the replication algorithm. We will then pre-process the data, store it in a Hadoop cluster, and query it using Map/Reduce programs. This is itself challenging and requires substantial disk space. We will investigate the most effective way to query the data with Map/Reduce programming.

Required Open Cirrus Resources: For the data generation and pre-processing phase: 8 cores @ 16 GB/node main memory and 3 TB/node of storage. For the query phase: 128 cores @ 32 GB/node main memory and 1 TB/node of storage in addition to the data storage.
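The search for replicated data described above fits the Map/Reduce model naturally: map each record to a content hash, then reduce by grouping record identifiers that share a hash. The following is a minimal single-process sketch of that idea in Python; the record identifiers and contents are illustrative placeholders, not data from the proposal, and a real deployment would run the equivalent logic as Hadoop map and reduce tasks.

```python
import hashlib
from collections import defaultdict

def map_phase(records):
    """Emit (content_hash, record_id) pairs, one per input record."""
    for record_id, content in records:
        digest = hashlib.sha256(content.encode()).hexdigest()
        yield digest, record_id

def reduce_phase(pairs):
    """Group record ids by hash; any hash with more than one id
    marks replicated data in the store."""
    groups = defaultdict(list)
    for digest, record_id in pairs:
        groups[digest].append(record_id)
    return {d: ids for d, ids in groups.items() if len(ids) > 1}

# Illustrative records: blk-003 holds the same content as blk-001.
records = [
    ("blk-001", "sensor reading A"),
    ("blk-002", "sensor reading B"),
    ("blk-003", "sensor reading A"),
]
replicas = reduce_phase(map_phase(records))
print(replicas)
```

In Hadoop terms, the shuffle stage performs the grouping by key that `defaultdict` does here, so the mapper and reducer bodies carry over almost unchanged.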

Intellectual Merit

Effective replication strategies can increase the availability of the data, benefiting users, while decreasing the cost of replication.
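The availability/cost trade-off can be made concrete with the standard independent-failure model: if each replica is reachable with probability p, then with r replicas the data is available with probability 1 - (1 - p)^r, while storage cost grows linearly in r. The numbers below are illustrative, not measurements from the project.

```python
def availability(p, r):
    """Probability that at least one of r replicas is reachable,
    assuming each replica is independently up with probability p."""
    return 1 - (1 - p) ** r

# Each extra replica multiplies the unavailability (1 - p) by itself
# once more, so availability rises quickly while cost grows linearly.
for r in (1, 2, 3):
    print(r, availability(0.95, r))
```

This is why a well-chosen replication factor can deliver high availability without the cost of replicating everything many times over.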

Broader Impacts

Effective replication strategies can increase the availability of the data, benefiting users, while decreasing the cost of replication.

Scale of Use

I want to run a set of comparisons on the entire system, and each comparison will need a number of days to complete.