Laboratory for Cosmological Data Mining

Project Details

Project Lead
Robert Brunner 
Project Manager
Robert Brunner 
Project Members
Edward Kim, Robert Santucci, Fanshi Liu, Nick Ciaglia  
Institution
University of Illinois, Astronomy & NCSA  
Discipline
Astronomy (201) 
Subdiscipline
40.03 Astrophysics 

Abstract

We will evaluate the application of Hadoop, and of cloud computing in general, to large-scale cosmological data mining. Specifically, we will explore the use of Mahout classification and clustering codes to determine source classifications and distance estimates for objects detected in large photometric surveys. We will also explore developing specific clustering measurement codes, such as the two-point correlation function, for the Hadoop MapReduce framework. Finally, we will look to push the machine learning tasks to the calibrated image data themselves in order to obtain more accurate classifications.

Intellectual Merit

Our project will explore large-scale data mining on the FutureGrid system. Most of the algorithms we will use are not traditional MapReduce tasks; thus, we will help develop the cloud computing approach to general-purpose data mining. In addition, our image data mining will help lead the way for other researchers who need to perform bulk image analysis and mining.

Broader Impacts

Beyond guiding others, both within and outside our field, who may be interested in our data mining efforts, we will also teach students in our research group how to use FutureGrid and Hadoop, as well as specific data mining algorithms and their implementations in Mahout.

Scale of Use

We wish to scale as large as possible. We will also try Mahout on GPUs if deemed feasible.