Course: Applied Cyberinfrastructure Concepts

Project Details

Project Lead
Nirav Merchant 
Project Manager
Nirav Merchant 
Project Members
Kate Kharitonova, Jorge Rodriguez, Joseph Thibodeau, Christopher Ray, Rachel Gladysz, Kyle Beard, Joshua Chuang, Xiao Liu, Keith Crosby, Victor Uribe, Lucy Carruthers, Doug Bostic, Ian Montgomery, Jeremy Jalnos, Te Dasinger, Alex Wainwright, Bryan LaFrese, Nicholas Gizzi, Jordan Gietz, York Myers, Margaret Preston  
Institution
University of Arizona, Arizona Research Laboratories, School of Information Sciences Technology and Arts  
Discipline
Computer Science (401) 
Subdiscipline
11.01 Computer and Information Sciences, General 

Abstract

The resources provided by FutureGrid will be used by students enrolled in Applied Cyberinfrastructure Concepts (ISTA 420/520, Fall 2013) at the University of Arizona. This project-based learning class will introduce fundamental concepts, tools and resources for effectively managing common tasks associated with analyzing large datasets. It will provide familiarity with cyberinfrastructure (CI) resources available at the University of Arizona campus, the iPlant Collaborative, NSF XSEDE centers, FutureGrid and commercial providers such as Amazon. Students will learn to apply relevant CI skills (for their final project) and develop wiki-based documentation of these best practices, while learning how to collaborate effectively in interdisciplinary team settings and how to target the optimal CI resources and tools for their project. The course will comprise a series of guest lectures by subject matter experts from projects that have developed widely adopted foundational cyberinfrastructure resources, followed by hands-on laboratory exercises focused on those resources (some of which will be tailored for cloud resources on FutureGrid). The students will use these resources and the practical experience gained from the laboratory exercises for a final project. The final project will include data sets and requirements provided by domain scientists (Genomics and Geosciences). Students will be provided access to compute resources at UA campus clusters, the iPlant Collaborative, NSF XSEDE and FutureGrid. Students will also learn how to write a proposal for obtaining future allocations on large-scale national resources through XSEDE.

Intellectual Merit

The computational challenges encountered while managing, analyzing and visualizing data are a common and recurring theme across domains. The rapid evolution of computational capabilities, such as cloud computing, Hadoop and NoSQL, has augmented the traditional HPC model, providing a wide array of choices for analyzing research data. This course aims to equip students from very diverse disciplines, ranging from astronomy and hydrology to the life sciences and CISE disciplines, with a broad understanding of how these computational resources and technologies can be used effectively for their specific research questions. It will introduce students to opportunities, guidance and an initial roadmap for gaining access to scalable computational resources and for connecting with a community of practitioners when working with local, regional, national and commercial infrastructure providers.

Broader Impacts

This course will provide an orientation to the wide array of resources that are essential for supporting and fostering creative approaches to managing computational challenges across disciplines. The current roster of students includes many disciplines that have not had access to coursework or formal training covering topics in high-throughput computing. Through this course we intend to foster interdisciplinary collaborations (among students from various disciplines) while providing practical experience to students who have traditionally not had access to these capabilities. Our goal is to promote computational thinking, which is essential to prepare the current and future generations of “data scientists”, affording them the audacity and ability to scale their research.

Scale of Use

There are 24 students in the class. Certain assignments will be based on FutureGrid resources, and these will be conducted in groups of 4. I expect the whole class to use 4-10 VMs for design and prototyping (over a 2-week period) and then to scale out to 40 VMs for the final assignment (for the whole class). Hardware requirements will be machines with 2 to 4 cores, 4 to 8 GB of RAM and 10-50 GB of disk space (EBS-style or some other form of persistent storage). All scale-out will be done under the guidance of, and in collaboration with, the FutureGrid team once the proof-of-concept VMs are functional. All requirements are flexible and can be tailored to the resources available.
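
As a rough illustration of the arithmetic behind these figures, the sketch below works out the number of student groups and the aggregate resource envelope implied by the stated per-VM ranges. The variable and function names are hypothetical, and the assumption that every VM in a phase is sized identically is made only for illustration; it is not part of the request itself.

```python
# Figures taken from the Scale of Use text; all names are illustrative only.
STUDENTS = 24
GROUP_SIZE = 4
PROTOTYPE_VMS = (4, 10)    # min and max VMs during design/prototyping
SCALE_OUT_VMS = 40         # VMs for the final assignment (whole class)
CORES_PER_VM = (2, 4)
RAM_GB_PER_VM = (4, 8)
DISK_GB_PER_VM = (10, 50)


def envelope(vm_count, per_vm_range):
    """Return (low, high) totals, assuming every VM has the same size."""
    low, high = per_vm_range
    return vm_count * low, vm_count * high


# Number of project groups implied by the class size.
groups = STUDENTS // GROUP_SIZE

# Aggregate range for the scale-out phase (assumption: uniform VM sizes).
scale_out = {
    "cores": envelope(SCALE_OUT_VMS, CORES_PER_VM),
    "ram_gb": envelope(SCALE_OUT_VMS, RAM_GB_PER_VM),
    "disk_gb": envelope(SCALE_OUT_VMS, DISK_GB_PER_VM),
}

# Prototype phase: fewest VMs at the smallest size up to most VMs at the largest.
prototype_cores = (
    envelope(PROTOTYPE_VMS[0], CORES_PER_VM)[0],
    envelope(PROTOTYPE_VMS[1], CORES_PER_VM)[1],
)

if __name__ == "__main__":
    print(f"groups: {groups}")
    print(f"prototype cores: {prototype_cores[0]}-{prototype_cores[1]}")
    for name, (low, high) in scale_out.items():
        print(f"scale-out {name}: {low}-{high}")
```

Under these assumptions, the class forms 6 groups, and the scale-out phase would need between 80 and 160 cores, 160 to 320 GB of RAM and 400 to 2000 GB of disk in total.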