Information Diffusion in Online Social Networks

Project Details

Project Lead
Karissa McKelvey
Project Manager
Karissa McKelvey
Project Members
Clayton Davis, Azadeh Nematzadeh, Lilian Weng, Emilio Ferrara, Jacob Read, Mohsen JafariAsbagh, Rohit Alekar, Giovanni Luca Ciampaglia  
Supporting Experts
Indiana University, Center for Complex Networks and Systems Research  
Computer Science (401) 
11.01 Computer and Information Sciences, General 


The focus of this research project is understanding how information propagates through complex socio-technical information networks. Leveraging large-scale behavioral trace data from online social networking platforms, we are able to analyze and model the spread of information, from political discourse to market trends, in unprecedented detail.

Our work to date includes a number of core research themes. Truthy is a web-based system to analyze and visualize the diffusion of information on Twitter. The Truthy system evaluates thousands of tweets an hour to identify new and emerging bursts of activity around memes of various flavors. Building on this foundation, we have undertaken several analyses of political communication on Twitter, addressing political polarization and cross-ideological communication, the automated prediction of political affiliation from network and text data, and partisan asymmetries in online political engagement. Members of the Truthy team have successfully applied a custom psycholinguistic sentiment analysis framework to the problem of forecasting key market indicators, technology which now underpins the trading decisions of a $40 million investment fund.

The current focus of the project is on three directions:

1. Expanding the platform to make the data more easily accessible and thus more useful to social scientists, reporters, and the general public.
2. Modeling efforts to better understand how information spreads, why some memes go viral, the role of sentiment in the diffusion process, and the mutual interaction between traffic on the network and the emergent structure of the network.
3. Combining sophisticated network analysis with content and time series mining in a machine learning framework to automatically detect deceptive, coordinated attempts to spread misinformation, such as astroturf in political campaigns.

Intellectual Merit

The project is aimed at modeling the diffusion of information online and empirically discriminating among models of the mechanisms driving the spread of memes. We explore why some ideas cause viral explosions while others are quickly forgotten. Our analysis goes beyond the traditional approach of applying epidemic diffusion processes and focuses on cascade size distributions and popularity time series in order to model the agents and processes driving the online diffusion of information, including users and their topical interests, competition for user attention, and the chronological age of information. Completion of our project will result in a better understanding of information flow and could assist in elucidating the complex mechanisms that underlie a variety of human dynamics and organizations. The analysis will involve studying meme diffusion in large-scale social media by collecting and analyzing massive streams of public micro-blogging data.
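
As an illustration of what a cascade size distribution looks like in practice, the following is a minimal sketch, not the project's actual pipeline. It assumes a simplified input of (meme, user) posting events and takes the size of a cascade to be the number of distinct users who posted a meme; the function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def cascade_sizes(events):
    """Map each meme to the number of distinct users who posted it.

    `events` is an iterable of (meme, user) pairs. Treating "distinct
    users per meme" as the cascade size is an assumption made for
    this sketch.
    """
    users_by_meme = defaultdict(set)
    for meme, user in events:
        users_by_meme[meme].add(user)
    return {meme: len(users) for meme, users in users_by_meme.items()}


def ccdf(sizes):
    """Return [(s, P(size >= s))] for each observed size s, ascending."""
    counts = Counter(sizes)
    total = sum(counts.values())
    remaining = total  # number of cascades with size >= current s
    result = []
    for s in sorted(counts):
        result.append((s, remaining / total))
        remaining -= counts[s]
    return result


# Hypothetical sample data: repeated posts by one user count once.
events = [
    ("#a", "u1"), ("#a", "u2"), ("#a", "u3"),
    ("#b", "u1"),
    ("#c", "u2"), ("#c", "u2"),
    ("#d", "u4"), ("#d", "u5"),
]
sizes = cascade_sizes(events)
distribution = ccdf(sizes.values())
```

Plotting such a complementary cumulative distribution on log-log axes is a common way to compare heavy-tailed empirical data against the predictions of competing diffusion models.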

Broader Impacts

The project stands to benefit both the research community and the public significantly. Our data will be made available via APIs and include information on meme propagation networks, statistical data, and relevant user and content features. The open-source platform we develop will be made publicly available and will be extensible to more research areas as a growing share of human activity moves online. Additionally, we will create a web service open to the public for monitoring trends, bursts, and suspicious memes. This service could mitigate the diffusion of false and misleading ideas, detect hate speech and subversive propaganda, and assist in the preservation of open debate.

Scale of Use

I want to run a set of algorithms at predefined time intervals to update statistics for real-time information visualization tools.

Results

Truthy is a website that visualizes tweets collected from Twitter specifically relating to politics, social movements, and news from the past 90 days. We would like to increase this capacity to over a year and compute the statistics at more frequent intervals.
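
The periodic update over a fixed retention window described above could be sketched roughly as follows. This is an illustrative assumption, not the system's real design: the class name `RollingMemeCounts`, the use of per-meme counts as the statistic, and timestamps given in seconds are all hypothetical. Changing `window_seconds` from 90 days to a year, or calling `update` more often, corresponds to the two increases requested.

```python
from collections import Counter, deque

DAY = 86400  # seconds per day


class RollingMemeCounts:
    """Per-meme counts over a sliding time window (hypothetical sketch).

    Assumes events are added in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, meme), oldest first
        self._counts = Counter()  # meme -> count inside the window

    def add(self, timestamp, meme):
        """Record one occurrence of `meme` at `timestamp`."""
        self._events.append((timestamp, meme))
        self._counts[meme] += 1

    def update(self, now):
        """Drop events older than the window and return current counts.

        Intended to be called at each predefined interval.
        """
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            _, meme = self._events.popleft()
            self._counts[meme] -= 1
            if self._counts[meme] == 0:
                del self._counts[meme]
        return dict(self._counts)


stats = RollingMemeCounts(window_seconds=90 * DAY)
stats.add(0, "#a")
stats.add(10 * DAY, "#a")
stats.add(50 * DAY, "#b")

at_day_60 = stats.update(now=60 * DAY)    # nothing has expired yet
at_day_95 = stats.update(now=95 * DAY)    # the event at day 0 expires
at_day_120 = stats.update(now=120 * DAY)  # the event at day 10 expires
```

Because each event is appended and removed at most once, the cost of an update is proportional to the number of events that expired since the previous call, so shortening the update interval does not by itself increase the total work.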