The Spread of Misinformation: Using Data to Improve Semantic, Social, and Conversational Network Modeling

In today’s world, misinformation is practically ubiquitous. But how do we even begin to solve such an extensive problem? CDS Faculty Fellow Sarah Shugars recently gave a talk on one way to approach the issue at Networks 2021: A Joint Sunbelt & NetSci Conference, which took place virtually July 5 – July 10. The conference was the first joint gathering of the world’s leading network societies: INSNA, a professional association for researchers interested in social network analysis, and the Network Science Society, an organization whose mission is to serve and represent the rapidly growing network science research community.

Sarah’s talk, “Networks All the Way Down: Assessing Modeling Choices for Socio-Semantic Networks of Political Dialogue”, centers on conversations, particularly political ones. Whether online or in person, conversations can be interpreted as “rich socio-semantic networks”, linked across multiple dimensions. People are in contact with one another via social networks and spread messages and ideas through semantic networks, which are knowledge bases that represent semantic relations between concepts. Each facet of these networks presents an opportunity to better understand the implications of discourse and to develop appropriate mitigations for toxic speech and the spread of misinformation. However, current studies tend to opt for simpler models that fail to capture the full complexity of these networks.
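To make the idea of layered networks concrete, here is a minimal sketch that builds two layers from the same toy conversation: a social layer (who replies to whom) and a simple semantic layer (which words appear together in a message). The messages, author names, and function names are all hypothetical, and word co-occurrence is only a stand-in for a semantic network; it is not the modeling approach used in the project.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy conversation: (author, author replied to or None, text).
messages = [
    ("ana", None, "police reform now"),
    ("ben", "ana", "reform needs funding"),
    ("ana", "ben", "funding police reform"),
]


def social_edges(messages):
    """Count directed reply edges between authors (the social layer)."""
    return Counter(
        (author, target)
        for author, target, _ in messages
        if target is not None
    )


def semantic_edges(messages):
    """Count word pairs that co-occur in a message (a simplified semantic layer)."""
    edges = Counter()
    for _, _, text in messages:
        # Sort the distinct words so each pair has one canonical ordering.
        words = sorted(set(text.lower().split()))
        edges.update(combinations(words, 2))
    return edges


social = social_edges(messages)
semantic = semantic_edges(messages)

print(dict(social))
print(semantic.most_common(3))
```

The same three messages yield two different graphs, which is the sense in which a single conversation can be read along more than one dimension.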

To tackle this, Sarah looks at discourse on platforms such as Twitter and Reddit, as well as other relevant settings such as US Congressional debates. Their talk focused on Twitter and Black Lives Matter (BLM). Sarah and their team report that 2,076,490 original tweets posted between April 1 and May 30, 2021 mentioned Black Lives Matter or BLM. The team then used the Twitter API’s new Academic Track “conversation_id” mechanism to collect all tweets “in the same conversation” as any of those roughly two million original tweets. That step returned an additional 13,932,265 tweets that the initial keyword search had not captured. A standard semantic model built from the keyword search alone therefore ignores the conversation level, which raises the question of how many documents are interconnected and how many of those interconnected documents are missing.
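The gap between a keyword search and a conversation-level collection can be illustrated with a small sketch. It works on hypothetical in-memory records rather than calling the Twitter API; the record layout, the keyword list, and the function names are assumptions made for illustration, with only the “conversation_id” field name taken from the description above.

```python
# Hypothetical toy records; only the "conversation_id" field name comes from the text.
tweets = [
    {"id": 1, "conversation_id": 1, "text": "Black Lives Matter march downtown today"},
    {"id": 2, "conversation_id": 1, "text": "What time does it start?"},
    {"id": 3, "conversation_id": 1, "text": "Noon, at city hall"},
    {"id": 4, "conversation_id": 4, "text": "Thoughts on the BLM coverage?"},
    {"id": 5, "conversation_id": 4, "text": "Mostly fair, I think"},
    {"id": 6, "conversation_id": 6, "text": "Unrelated post about lunch"},
]

# Assumed keyword list for the illustration.
KEYWORDS = ("black lives matter", "blm")


def matches_keywords(text):
    """Return True if the text contains any keyword, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def expand_by_conversation(tweets):
    """Return (ids found by keyword, ids found only via a shared conversation)."""
    keyword_ids = {t["id"] for t in tweets if matches_keywords(t["text"])}
    # Conversations that contain at least one keyword-matching tweet.
    seed_conversations = {
        t["conversation_id"] for t in tweets if t["id"] in keyword_ids
    }
    in_conversation = {
        t["id"] for t in tweets if t["conversation_id"] in seed_conversations
    }
    return keyword_ids, in_conversation - keyword_ids


keyword_ids, missed_ids = expand_by_conversation(tweets)
print(sorted(keyword_ids))  # [1, 4]
print(sorted(missed_ids))   # [2, 3, 5]
```

In this toy data the replies never mention the keywords, so a keyword-only collection would drop them even though they belong to the same conversations.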

“Discourse is fundamentally relational — how we respond and who we respond to are intimately connected to our social relationships and topical interests,” says Sarah. “Truly understanding and intervening in ‘the public discourse,’ then, requires a rich examination of the ways in which our social ties, conversational messages, and even word choice tie together as part of an ongoing, collective conversation around matters of common concern.”

Ultimately, the project presents a system for modeling the social, semantic, and conversational networks of political discourse across a range of contexts. They demonstrate what can and cannot be inferred from individual network models and how these models can be used together to better understand socio-semantic patterns. “Networks All the Way Down” is still in development but will be available as a preprint this fall.

For more information on Sarah’s research, please visit

By Ashley C. McDonald
