Supporting multi-dimensional indexes in HDFS on the FutureGrid platform

Project Details

Project Lead
Abhijeet Kodgire 
Project Manager
Abhijeet Kodgire 
Project Members
Lizhe Wang, Gregor von Laszewski  
Institution
Indiana University, Computer Science  

Abstract

MapReduce is a patented software framework introduced by Google to support distributed computing on large data sets across clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming, although their purpose in MapReduce is not the same as in their original forms. Hadoop is an open source framework that implements the MapReduce paradigm. The Hadoop Distributed File System (HDFS) currently uses key-value pairs as the index for data processing. Requirements to process large, high-dimensional data sets have emerged in many application domains, such as Geographic Information Systems (GIS). Such systems involve complex data processing, such as range queries, nearest neighbor queries, and distance join queries. Hierarchical index structures, such as the R-tree, are used to organize multi-dimensional data. This independent study develops a research strategy for supporting an R-tree-like multi-dimensional index in HDFS. The plan is to develop an HDFS upperware that maps the R-tree index to the pair index and efficiently handles complex data processing requirements.
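
To make the idea of mapping a hierarchical index onto a pair index more concrete, the following is a minimal sketch and not the design of the planned upperware. It assumes two-dimensional points, a simple sort-and-pack bulk load in place of true R-tree insertion, and an in-memory dictionary standing in for whatever key-value store would sit on top of HDFS. The function names (build_pairs, range_query), the "node-N" key scheme, and the record layout are all hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """Return True when two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle (MBR) of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def build_pairs(points: Dict[str, Point], fanout: int = 2) -> Tuple[Dict[str, dict], str]:
    """Pack points into an R-tree-like hierarchy stored as key-value pairs.

    Each pair maps a hypothetical node key ("node-N") to a record holding the
    node's MBR and either point entries (leaf) or child node keys (inner).
    Returns the pair store and the key of the root node.
    """
    if not points:
        raise ValueError("at least one point is required")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    store: Dict[str, dict] = {}
    counter = 0

    # Leaf level: sort by coordinates and pack consecutive points together.
    items = sorted(points.items(), key=lambda kv: kv[1])
    level: List[str] = []
    for i in range(0, len(items), fanout):
        group = items[i:i + fanout]
        key = f"node-{counter}"
        counter += 1
        mbr = union([(x, y, x, y) for _, (x, y) in group])
        store[key] = {"mbr": mbr, "leaf": True, "entries": group}
        level.append(key)

    # Upper levels: group child keys until a single root remains.
    while len(level) > 1:
        next_level: List[str] = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            key = f"node-{counter}"
            counter += 1
            mbr = union([store[k]["mbr"] for k in group])
            store[key] = {"mbr": mbr, "leaf": False, "entries": group}
            next_level.append(key)
        level = next_level

    return store, level[0]


def range_query(store: Dict[str, dict], root_key: str, window: Rect) -> List[str]:
    """Return the sorted names of all points inside the query window.

    The traversal only ever performs lookups by key, which is the operation
    a pair index provides; subtrees whose MBR misses the window are skipped.
    """
    hits: List[str] = []
    stack = [root_key]
    while stack:
        node = store[stack.pop()]
        if not intersects(node["mbr"], window):
            continue
        if node["leaf"]:
            for name, (x, y) in node["entries"]:
                if window[0] <= x <= window[2] and window[1] <= y <= window[3]:
                    hits.append(name)
        else:
            stack.extend(node["entries"])
    return hits if not hits else sorted(hits)
```

Calling build_pairs on a dictionary of named points returns the pair store and the root key, and range_query(store, root_key, (xmin, ymin, xmax, ymax)) then answers a range query using key lookups only. A nearest neighbor or distance join query would need a different traversal over the same pairs and is not covered by this sketch.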

Intellectual Merit

I will update it later.

Broader Impacts

This is an independent study project at Indiana University. The goal is to develop upperware for HDFS that supports multi-dimensional data.

Scale of Use

2-3 days a week