Just Short of Suspension: How Suspension Warnings Can Reduce Hate Speech on Twitter
With over 396 million users, Twitter is one of the largest social media platforms. As the platform continues to grow, concern about the rise of hate speech on it has grown more vocal. The most common way of addressing hate speech on social media is banning an individual’s account. However, banning accounts can have the unintended consequence of pushing users to migrate to more radical platforms.
To address this problem, NYU Politics Department PhD candidate Mustafa Yildirim, together with CDS affiliated professor Jonathan Nagler, CDS associated professor Richard Bonneau, and CDS affiliated professor Joshua A. Tucker, examined the effectiveness of warnings in preventing hate speech in their recent paper “Short of Suspension: How Suspension Warnings Can Reduce Hate Speech on Twitter.”
In designing effective warnings, the group drew upon three principles: costliness, credibility, and legitimacy. Costliness captures how much a community’s perception of the severity of punishment drives deterrence — does a particular user have more to lose with their follower base given the warning they’ve received? Credibility captures whether the user believes the threatened punishment will actually be carried out — is the warning coming from an authoritative source, and does that make the user more likely to heed it? Legitimacy is the extent to which the act of being warned is considered legitimate by the target — does the warning respect the target? Using these principles, they developed several warnings, each emphasizing one of the three concerns to varying degrees.
Credibility was critical when it came to research design. To ensure that warnings were sent only to people who could credibly believe their account might be suspended, the researchers limited the participant population to users who had previously used hateful language on Twitter and who followed someone who had used hate speech and been recently suspended.
From here, the researchers extracted over 600,000 tweets to identify likely candidates for suspension and their followers. They identified 4,327 such followers and assigned each to one of six treatment groups or a control group. All tweets sent to the treatment groups were prefaced with the following sentence: “The user @[account] you follow was suspended, and I suspect that this was because of hateful language.”
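The assignment step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code: the group names are hypothetical labels standing in for the paper’s actual treatment conditions, and the randomization scheme is a plain uniform draw.

```python
import random

# Reproducible toy randomization; the study's actual procedure may differ.
random.seed(42)

# 4,327 followers identified from suspended accounts, as in the study.
followers = [f"user_{i}" for i in range(4327)]

# One control group plus six treatments; names here are illustrative only,
# loosely reflecting the costliness/credibility/legitimacy framing.
groups = [
    "control",
    "costliness_high", "costliness_low",
    "credibility_high", "credibility_low",
    "legitimacy_high", "legitimacy_low",
]

# Assign every follower to a group uniformly at random.
assignment = {user: random.choice(groups) for user in followers}

# Only treated users receive the shared preamble plus their group's message;
# the control group receives nothing.
treated = [user for user, group in assignment.items() if group != "control"]
```

With a uniform draw, each group ends up with roughly 4,327 / 7 ≈ 618 users; real experiments often use blocked or balanced randomization instead.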
However, each treatment group received a different message with a varying degree of intensity based on the principles of costliness, credibility, and legitimacy. The messages ranged from costliness — “If you continue to use hate speech, you might lose your posts, friends, and followers, and not get your account back. @[your_account]” — to credibility — “Twitter suspends thousands of users each month. I am a professional researcher who studies suspensions due to hate speech. My model says that you might also get suspended. @[your_account].”
Ultimately, they concluded that a single warning tweet could decrease the ratio of tweets with hateful language by 10%, with certain principles, like legitimacy, suggesting decreases as high as 15 to 20%. The group suggested that the warnings could be even more effective if sent from more popular accounts, but noted that more research is needed to see what the effect would be of receiving a similar message from Twitter itself, probably the only actor capable of producing these types of warnings at scale.
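The outcome measure — the share of a user’s tweets containing hateful language — can be sketched as follows. The classifier here is a toy keyword stand-in, not the actual hate-speech measure the authors used, and the tweet lists are fabricated for illustration.

```python
def hateful_ratio(tweets, is_hateful):
    """Share of a user's tweets flagged as hateful by the given classifier."""
    if not tweets:
        return 0.0
    return sum(is_hateful(t) for t in tweets) / len(tweets)

# Toy stand-in classifier; real measures use curated dictionaries or models.
def toy_classifier(tweet):
    return "hateful" in tweet

# Fabricated before/after samples for one hypothetical treated user.
before = ["hateful post", "normal post", "hateful post", "ok"]
after = ["normal post", "ok", "hateful post", "fine"]

r_before = hateful_ratio(before, toy_classifier)
r_after = hateful_ratio(after, toy_classifier)

# Relative decrease in the hateful-tweet ratio; the paper reports
# treatment-vs-control decreases of roughly 10-20% on this kind of measure.
relative_drop = (r_before - r_after) / r_before
```

In the study itself, such ratios would be compared between treatment and control groups rather than simply before and after for a single user.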
To learn more, find their paper here.