The Misalignment Problem: CDS Faculty Fellow Sarah Shugars on Social Media Stance vs. Public Opinion Surveys

CDS Faculty Fellow Sarah Shugars and their colleagues recently published a paper, “(Mis)alignment Between Stance Expressed in Social Media Data and Public Opinion Surveys,” describing the misalignment between a person’s stance as expressed on social media and their stance as measured by public opinion polls.

Their goal with the research was to answer two questions: 1) when do annotators (people who categorize data) agree with each other on a particular categorization for a stance and 2) how well does human behavior map to other ways of inferring a person’s attitude (i.e., surveys)?

From here, they looked at stances on four topics: Donald Trump, COVID-related lockdowns, face masks, and COVID-19 vaccinations. For these four topics, they collected survey responses and extracted data from each respondent’s Twitter account, using annotators to categorize the tweets.

For the first question, they found that annotator agreement depended on how confident the annotators were in each annotation. When one annotator felt very confident in a categorization, the others tended to be similarly confident. Agreement fell off, however, when annotators were less sure of themselves.
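To make the idea of agreement varying with confidence concrete, here is a minimal sketch of one way to measure it. It is not the paper's method: the function name, the 1-to-3 confidence scale, the rule of bucketing each annotator pair by its less confident member, and the example labels are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def agreement_by_confidence(items):
    """Return {confidence bucket: share of annotator pairs with the same label}.

    `items` is a list of tweets; each tweet is a list of (label, confidence)
    pairs, one per annotator. Confidence is a hypothetical 1-3 scale.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for annotations in items:
        for (label_a, conf_a), (label_b, conf_b) in combinations(annotations, 2):
            # Assumption: a pair is only as confident as its less confident member.
            bucket = min(conf_a, conf_b)
            total[bucket] += 1
            agree[bucket] += int(label_a == label_b)
    return {bucket: agree[bucket] / total[bucket] for bucket in total}


# Invented example data: three tweets, three annotators each.
example_items = [
    [("support", 3), ("support", 3), ("support", 3)],
    [("oppose", 3), ("oppose", 3), ("neutral", 1)],
    [("support", 1), ("oppose", 1), ("support", 2)],
]

rates = agreement_by_confidence(example_items)
for bucket in sorted(rates):
    print(f"confidence {bucket}: pairwise agreement {rates[bucket]:.2f}")
```

In this toy data, high-confidence pairs always agree while low-confidence pairs rarely do, which mirrors the pattern described above. A real analysis would more likely use a chance-corrected statistic such as Krippendorff's alpha.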

They then turned to the second question: how accurately does the annotated Twitter data map to the survey results? Well, it’s complicated. When a survey response was not neutral, the annotators were often able to match it. However, there seemed to be a disconnect on specific topics (i.e., mask mandates and vaccinations); this may have been due to a change in opinion over time or to people adhering to social norms on social media.
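The comparison described above can be sketched as a per-topic match rate between each person's survey answer and the stance annotators assigned to their tweets. This is only an illustrative sketch under stated assumptions: the record layout, the stance labels, the `skip_neutral` parameter, and the example data are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def match_rate_by_topic(records, skip_neutral=True):
    """Return {topic: share of records where the annotated stance equals the survey stance}.

    `records` is a list of (topic, survey_stance, annotated_stance) tuples.
    With skip_neutral=True, neutral survey responses are left out, since the
    comparison is clearest when the respondent took a side.
    """
    matches = defaultdict(int)
    counts = defaultdict(int)
    for topic, survey_stance, annotated_stance in records:
        if skip_neutral and survey_stance == "neutral":
            continue
        counts[topic] += 1
        matches[topic] += int(survey_stance == annotated_stance)
    return {topic: matches[topic] / counts[topic] for topic in counts}


# Invented example data, for illustration only.
example_records = [
    ("trump", "oppose", "oppose"),
    ("trump", "support", "support"),
    ("masks", "support", "oppose"),
    ("masks", "support", "support"),
    ("masks", "neutral", "support"),
]

topic_rates = match_rate_by_topic(example_records)
for topic, rate in topic_rates.items():
    print(f"{topic}: {rate:.2f} of non-neutral survey stances matched")
```

Breaking the rate out by topic is what makes a topic-specific disconnect visible; a single pooled accuracy figure would hide it.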

Ultimately, their work concludes that stance detection can often be unreliable and does not capture the same information that public opinion polling does. A key factor is the difference between the external perception that social media invites and an individual’s internal perception when answering survey questions themselves. With this in mind, the findings suggest some exciting directions for working with this kind of data.

They will be presenting their paper this year at the Conference on Empirical Methods in Natural Language Processing (EMNLP), one of the primary high-impact conferences in the field of NLP. For further details, read the paper, coauthored with Kenneth Joseph, Ryan Gallagher, Jon Green, Alexi Quintana Mathé, Zijian An, and David Lazer, on arxiv.org.

About Sarah Shugars: They are a computational political scientist studying American political behavior and developing new methods in natural language processing, network analysis, and machine learning.

Currently, they are a CDS Moore-Sloan Faculty Fellow at NYU’s Center for Data Science (CDS) and a Research Fellow at George Washington University’s School of Media & Public Affairs. Their research focuses on how people express their political views, reason about political issues, and engage with others around matters of common concern. They received their Ph.D. from Northeastern’s Network Science program in Spring 2020.

By Keerthana Manivasakan
