Development of an Index File System to Support Geoscience Data with Hadoop

Project Details

Project Lead
Sonali Karwa 
Project Manager
Sonali Karwa 
Project Members
Lizhe Wang, Gregor von Laszewski, Geoffrey Fox  
Indiana University, Pervasive Technology Institute  
Computer Science (401) 
14.09 Computer Engineering 


Background - Science

A geographic information system (GIS) is a system that captures, stores, analyzes, manages, and presents data linked to locations [1]. GIS analysis and simulation are used to investigate and understand the environment around us. The amount of data to be processed increases as populations grow, communities become more complex, or the processing area expands. As the data grows, so does the time required for simulation and analysis. Furthermore, the analysis of complex problems, such as research on climate change [2], uses more than one dataset, which increases the computational requirements even further.

Background - Technology

* MapReduce [3] is a patented software framework introduced by Google to support distributed computing on large data sets on clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms.
* Hadoop [4] is an open source framework that implements the MapReduce paradigm.

This independent study plans to develop an indexed file system that stores Geoscience data, for example a file that records the name and location of each Geoscience data file. The index will be used to generate the key/value pairs and, later, for parallel GIS operations with the Hadoop framework. Open source GIS software, such as GRASS [5] or PostGIS [6], is planned for development.

The FutureGrid project [7] aims to develop a high-performance grid test bed that will allow scientists to collaboratively develop and test innovative approaches to parallel, grid, and cloud computing. This independent study will use FutureGrid as its development platform.
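
The index file described above, which stores each Geoscience data file name and its location, could look roughly like the following Python sketch. The tab-separated layout, the function names write_index and read_pairs, and the sample file names and paths are all assumptions made for illustration; the project description does not specify an index format or an implementation.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_index(entries: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one tab-separated record per data file: <file name>\t<location>.

    The record layout is a hypothetical choice, not the project's format.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for name, location in entries:
        writer.writerow([name, location])


def read_pairs(src: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (file name, location) pairs, the kind of key/value input
    a map task could consume. Malformed or empty records are skipped."""
    for row in csv.reader(src, delimiter="\t"):
        if len(row) != 2:
            continue
        yield row[0], row[1]


if __name__ == "__main__":
    # Sample entries are made up for this example.
    sample = [
        ("elevation.tif", "/geodata/raster/elevation.tif"),
        ("landuse.shp", "/geodata/vector/landuse.shp"),
    ]
    buffer = io.StringIO()
    write_index(sample, buffer)
    buffer.seek(0)
    for key, value in read_pairs(buffer):
        print(key, value)
```

A line-oriented text index like this is convenient because each record can be handed to a mapper independently, but a real index would likely also need spatial metadata such as bounding boxes.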

Intellectual Merit

(provided later)

Broader Impacts

This is also an independent study project at Indiana University.

Scale of Use

The resources will be used frequently, about 5 days a week.


My results are uploaded at this link:

The password to download: futuregrid