Big Data Analytics for GatorCloud

Project Details

Project Lead
Dapeng Wu 
Project Manager
Dapeng Wu 
Project Members
Qiuyuan Huang, Dong Wang  
Institution
University of Florida, Dept. of Electrical and Computer Engineering  
Discipline
Electrical and Related Engineering (106) 
Subdiscipline
14.09 Computer Engineering 

Abstract

Cyberinfrastructure has fundamentally transformed the landscape of many disciplines, becoming the engine of change for the next revolution. We propose the GatorCloud cyberinfrastructure to dramatically boost the Campus Research Network (CRN) to a backbone of over 100 Gb/s connecting multiple primary data center sites across the campus, and to further enhance it with cutting-edge software-defined networking (SDN) capabilities and novel cloud services. Additional SDN switches will be deployed at multiple sites of the Florida LambdaRail (FLR).

GatorCloud will strengthen Florida's participation in high-impact national and international projects, such as the CMS experiment, which recently helped discover the Higgs boson, the Open Science Grid, the Global Environment for Network Innovations (GENI), Kepler, and FutureGrid. These projects involve terabytes to petabytes of storage and require terascale to petascale computation. With GatorCloud's transformative capabilities and services, UF can meet the terabyte-scale data-throughput needs of researchers and their collaborators around the world.
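For a rough sense of what a 100 Gb/s backbone means for terabyte-scale data, the back-of-the-envelope arithmetic below estimates ideal transfer times. It is only a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), a single transfer running at full line rate, and no protocol, storage, or contention overhead. Real end-to-end throughput will be lower.

```latex
% Idealized transfer time: data size in bits divided by link rate in bits/s
% (assumes decimal units and full 100 Gb/s line rate with zero overhead)
t = \frac{\text{data size (bits)}}{\text{link rate (bits/s)}}

t_{1\,\mathrm{TB}} = \frac{8 \times 10^{12}\ \text{bits}}{10^{11}\ \text{bits/s}} = 80\ \text{s}

t_{1\,\mathrm{PB}} = \frac{8 \times 10^{15}\ \text{bits}}{10^{11}\ \text{bits/s}} = 8 \times 10^{4}\ \text{s} \approx 22.2\ \text{h}
```

Even under these optimistic assumptions, moving a petabyte takes most of a day, which is why backbone capacity matters for the petabyte-scale projects listed above.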

Intellectual Merit

The proposed GatorCloud project has the following intellectual merits. (1) It is the first SDN-enabled campus cloud, offering novel Network as a Service (NaaS), HPC as a Service (HPCaaS), and Cloud-in-Cloud as a Service (CCaaS) on an academic campus for the first time. (2) Leveraging the flexibility of SDN, we further propose the novel notion of application-aware network provisioning and resource management for data-intensive HPC applications. (3) In addition to the existing HPC programming models supported by GatorCloud, we propose novel programming models that let developers write complex parallel scientific applications in a sequential style. (4) The proposed data cloud provides an efficient common data repository with multiple user and developer interfaces (portal, client, CLI, and Web services via SOAP/REST) and toolkits.

Broader Impacts

The proposed project has transformative impacts in many respects. (1) GatorCloud provides a unique platform for synergistic collaboration among networking, cloud, HPC, and domain scientists. (2) It offers a platform for conducting state-of-the-art cyberinfrastructure research on data-intensive and computation-intensive applications. By integrating with existing data center resources, GatorCloud presents a user-friendly interface to aggregated resources across the campus. (3) It offers a high-performance platform for developing and exploiting data-intensive applications in multiple high-impact research domains, e.g., high-energy physics (searching for the Higgs boson and dark matter), computational astronomy (discovering planets), and computational biology (reconstructing the evolutionary history of birds). (4) The proposed research on the future Internet and future clouds is expected to have a profound impact on shaping the future of cyberinfrastructure. (5) Through this project, we can strengthen our close collaboration with FLR and other universities in Florida (e.g., FIU, a minority-serving institution). The Sunshine State Education & Research Computing Alliance (SSERCA) is a newly created organization that builds cyberinfrastructure for all Florida universities connected by FLR, and GatorCloud offers it a distinctive shared infrastructure. (6) It gives UF a resource for training and educating graduate and undergraduate students in many related courses and research projects. By leveraging the UF EDGE remote education program, it will benefit many students in the state and beyond. (7) Because GatorCloud and related projects are interdisciplinary and socially relevant, they can inspire minority students to pursue STEM degree programs. We are committed to recruiting under-represented minority and female students.

Scale of Use

The NSF project runs from October 1, 2012, to September 30, 2014. Some runs of our algorithms may require as many as 100 Hadoop nodes.