Large-scale agent-based simulation of GitHub

Project Details

Project Lead
Pik-Mai Hui 
Project Manager
Pik-Mai Hui 
Institution
Indiana University Bloomington, School of Informatics and Computing  
Discipline
Computer Science (401) 
Subdiscipline
45.99 Social Sciences, Other 

Abstract

The project is supported by a DARPA grant. Our team consists of three sub-teams of PIs and Ph.D. students from the University of Southern California, the University of Notre Dame, and Indiana University Bloomington. The goal of the project is to develop a large-scale, detailed agent-based simulation capable of modeling and predicting real-world events. We are currently focusing on simulation using data extracted from the GHTorrent project, which collects records of public activity on GitHub. We are seeking a high-performance computing platform suitable for hosting our datasets and for analyzing them. The technologies we would like to use include Spark and Hadoop. We estimate the size of the dataset at roughly 2 TB to 5 TB uncompressed.

Intellectual Merit

Agent-based models (ABMs) are known to be hard to scale. We aim to scale our ABM to the size of real-world systems and to measure its performance, in the hope of understanding the mechanisms behind real-world emergent phenomena.
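
To illustrate why scaling is a concern, the following is a minimal sketch of an agent-based simulation loop. It is not the project's model: the event types, the `simulate` function, and the per-agent activity rate are all hypothetical and chosen only for illustration. The point is that a naive loop visits every agent on every step, so the cost grows with the product of agents and steps.

```python
import random
from collections import Counter

# Hypothetical event types, loosely inspired by GitHub activity (an assumption).
EVENT_TYPES = ["push", "fork", "watch", "issue"]


def simulate(num_agents, num_repos, num_steps, seed=0):
    """Run a toy agent-based simulation and return the generated events."""
    rng = random.Random(seed)
    # Each agent gets an activity rate: the chance that it acts in a given step.
    activity = [rng.random() * 0.5 for _ in range(num_agents)]
    events = []
    for step in range(num_steps):
        # Every agent is visited on every step: O(num_agents * num_steps) work.
        for agent in range(num_agents):
            if rng.random() < activity[agent]:
                repo = rng.randrange(num_repos)
                events.append((step, agent, repo, rng.choice(EVENT_TYPES)))
    return events


if __name__ == "__main__":
    events = simulate(num_agents=1000, num_repos=50, num_steps=10)
    counts = Counter(kind for _, _, _, kind in events)
    print(len(events), dict(counts))
```

With millions of agents, this single-threaded loop becomes the bottleneck, which is one reason a distributed platform is of interest.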

Broader Impacts

Not only will this project be the first to scale an ABM up to millions of agents, it will also lay the foundation for understanding social interaction and team mechanisms in technology development.

Scale of Use

We would like enough vCores to run our measurements at a reasonable speed, and enough disk space to host our dataset locally. The project will likely last at least two semesters, and possibly longer.

Results

Publications.