Information Diffusion in Online Social Networks

Project ID
Project Categories
Computer Science
NSF Grant Number
The focus of this research project is understanding how information propagates through complex socio-technical information networks. Leveraging large-scale behavioral trace data from online social networking platforms, we are able to analyze and model the spread of information, from political discourse to market trends, in unprecedented detail.
Our work to date includes a number of core research themes. Truthy is a web-based system to analyze and visualize the diffusion of information on Twitter. The Truthy system evaluates thousands of tweets an hour to identify new and emerging bursts of activity around memes of various flavors. Building on this foundation, we have undertaken several analyses of political communication on Twitter, addressing political polarization and cross-ideological communication, the automated prediction of political affiliation from network and text data, and partisan asymmetries in online political engagement. Members of the Truthy team have also applied a custom psycholinguistic sentiment analysis framework to the problem of forecasting key market indicators; this technology now underpins the trading decisions of a $40 million investment fund.
The current focus of the project is on three directions:
1. Expanding the platform to make the data more easily accessible and thus more useful to social scientists, reporters, and the general public.
2. Modeling efforts to better understand how information spreads, why some memes go viral, the role of sentiment in the diffusion process, and the mutual interaction between traffic on the network and the emergent structure of the network.
3. Combining sophisticated network analysis with content and time series mining in a machine learning framework to automatically detect deceptive, coordinated attempts to spread misinformation, such as astroturf in political campaigns.
Use of FutureSystems
We are interested in using parallel algorithms, specifically for sentiment analysis and graph algorithms over large sets of Twitter data. The algorithms will be run on this cluster at predetermined time intervals, and the results will then be used for visualization on our website.
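As a rough illustration of the kind of parallel sentiment computation described above, the following Python sketch scores a batch of tweets across several worker processes. It is a minimal sketch under stated assumptions, not the project's actual framework: the toy word list `LEXICON` and the function names `score_tweet` and `score_tweets` are hypothetical and chosen only for this example.

```python
from concurrent.futures import ProcessPoolExecutor
import re

# Toy lexicon, purely illustrative (an assumption, not the project's
# psycholinguistic framework).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}
TOKEN = re.compile(r"[a-z']+")


def score_tweet(text):
    """Return the sum of lexicon weights for the words in one tweet."""
    return sum(LEXICON.get(tok, 0) for tok in TOKEN.findall(text.lower()))


def score_tweets(tweets, workers=1, chunksize=1000):
    """Score a batch of tweets, optionally across several worker processes.

    With workers <= 1 the batch is scored serially; otherwise the tweets
    are handed to a process pool in chunks of `chunksize`.
    """
    if workers <= 1:
        return [score_tweet(t) for t in tweets]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_tweet, tweets, chunksize=chunksize))
```

A scheduled job could call `score_tweets(batch, workers=8)` on each new batch of tweets and write the scores out for the visualization layer. When using more than one worker, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.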
Scale of Use
I want to run a set of algorithms at predefined time intervals to update statistics for real-time information visualization tools.