An exploration of disinformation through a Russian troll farm
This project is an investigation into “troll” farms and how technology is used to sow discord among the American public through social media. The data used in this investigation comes primarily from Facebook, as presented to the House Intelligence Committee during the November 2017 open hearings on social media companies ("Exposing Russia’s Effort to Sow Discord Online," n.d.). The particular data used in this project was extracted and compiled by the Department of Communication at the University of Maryland ("Internet Research Agency Ads," n.d.).
To understand disinformation and its intentional dissemination, it is important to understand trolling as it exists on the internet. An internet troll is traditionally someone who delights in sowing discord on the internet (Campbell, 2006). By provoking unsuspecting victims, trolls incite confusion, distress, and misinformation. If victims respond or retaliate, they in effect contribute to the stream of misinformation by giving the troll a reason to respond.
Trolls have existed on the internet since well before the term came into modern usage. The Wired article “The War Between alt.tasteless and rec.pets.cats” (Quittner, 1994) outlines a typical troll scenario in the early 1990s between two Usenet groups without using the term troll. alt.tasteless “trolled” rec.pets.cats through a coordinated “stealth attack,” effectively flooding its communication channels with posts ranging from nonsensical to dangerous. An active member of rec.pets.cats received death threats from members of alt.tasteless when she took proactive measures to quell the attack, which shows that even in the early 1990s trolling could spill over into the physical world.
As the internet aged, trolling became more refined and more mainstream. The anonymity provided by forums such as 4chan gave users a space to troll without much consequence and, more importantly, with a supportive community (Feinberg, 2014). While trolls were a nuisance, their impact was largely contained to cyberspace and their mission was generally arbitrary (albeit with hints of sexism, racism, and reactionary politics). In the 2010s, a renewed interest in trolls arose. High-profile controversies such as Gamergate and Sad Puppies began impacting communities and their autonomy. Around this time, trolls were increasingly coordinating to affect specific communities, such as the gamer community or the science fiction community, in pointed ways.
Something else began to happen around the 2010s. In 2011, Russia experienced large protests after Vladimir Putin announced his plans to run for a third presidential term. The protesters leveraged social media to organize and express dissent. After Putin was sworn in as president, measures were taken to rein in the perceived western slant of the internet. Adrian Chen, in the New York Times article “The Agency” (2015), outlines the political situation in Russia and notes that the political administration understands this as information warfare. Measures were taken to redistribute information on the internet to make it more pro-Kremlin, or at least to dilute the perceived western biases on the internet. This was done by hiring Russian citizens to work on “troll farms,” posting articles and comments on blogs, news sites, and social media in order to create an atmosphere of animosity and, more importantly, confusion.
“The Agency” also makes an important link between fake news cropping up in the US and these same Russian troll farms. Reporting on a fake news story that developed about a hazardous toxic breach in 2014, Chen connected the users and content in the accounts with similar Russian fake news stories through a company called The Internet Research Agency. Delving deeper into similar sources, Chen discovered that a number of social media groups were also connected to these Russian troll farms. A common feature was their tendency to be patriotic and their tenuous and misshapen critiques of Barack Obama (Chen, 2015).
As the 2016 United States election drew closer, the Internet Research Agency’s social presence did not wane. In fact, its social media posts saw a noticeable uptick, particularly around some key election dates. For example, between March and May 2016, the Internet Research Agency nearly tripled its Facebook posts. What happened? Donald Trump was declared the presumptive Republican nominee on May 3 after all other opponents dropped out of the Republican primaries. While these new posts did not garner many clicks on Facebook, the posts in October 2016 did. What happened? The presidential election was held on November 8, 2016. These posts generated a tremendous number of impressions (views) and clicks (interactions). Activity increased again in February 2017 following the inauguration of President Trump, and again in April 2017 following the investigation into illicit Russian communications by National Security Advisor Michael Flynn.
All advertisements in this dataset were paid for in Russian rubles. October 2016, the month leading up to the election, saw the biggest monetary investment from the troll farm, with just over half a million rubles (nearly $8,000) spent by the Internet Research Agency on Facebook advertisements. This is not an insubstantial sum, especially considering that the ads appeared in 3 million feeds (impressions) and were engaged with 300,000 times (clicks) that same month.
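The figures above imply a click-through rate and a cost per click, which a short calculation makes explicit. The following is a rough sketch that uses only the rounded numbers quoted in the text; the variable names are illustrative, and the results are only as precise as those rounded inputs.

```python
# Rounded October 2016 figures quoted in the text (approximations, not exact dataset totals).
spend_rub = 500_000       # "just over half a million rubles"
spend_usd = 8_000         # "nearly $8,000"
impressions = 3_000_000   # ads shown in feeds
clicks = 300_000          # interactions with the ads

# Share of impressions that led to a click.
click_through_rate = clicks / impressions

# Approximate cost of one click, in rubles and in dollars.
rub_per_click = spend_rub / clicks
usd_per_click = spend_usd / clicks

# Approximate cost of one thousand impressions, in dollars.
usd_per_thousand_impressions = spend_usd / impressions * 1000

print(f"Click-through rate: {click_through_rate:.0%}")
print(f"Cost per click: {rub_per_click:.2f} rubles (about ${usd_per_click:.3f})")
print(f"Cost per 1,000 impressions: about ${usd_per_thousand_impressions:.2f}")
```

On these rounded inputs, roughly one impression in ten led to a click, at a cost of under two rubles per click.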
A cursory glance at the timeline of the dataset shows a clear correlation between Internet Research Agency posts and United States political events. Who was targeted with these advertisements? A reasonable guess might be the “swing states,” or states that could reasonably be won by either the Democratic or the Republican party. However, a quick glance at the map indicates that this is not the case with the Internet Research Agency advertisements. Instead, a peculiar pattern can be observed: other than a number of ads targeted at Milwaukee, WI (in a traditional swing state), advertisements were largely focused on a few select cities, including St. Louis and Ferguson, MO, with 165 ads (112 and 53 respectively); Baltimore, MD, with 116 ads; New York City, NY, with 116 ads; and Atlanta, GA, with 94 ads. Texas was also a popular target, receiving 75 ads targeted at the state as a whole as well as 55 ads targeted at specific cities.
Missouri, Maryland, New York, Georgia, and Texas are all squarely not swing states. Why were these states and cities targeted? As with the posts timeline, some geopolitical events might help explain this data. Ferguson, Baltimore, New York City, Milwaukee, and Cleveland all experienced high-profile cases of police brutality and protests in 2014, 2015, and 2016 (Friedersdorf, 2015) (Baker, Goodman, & Mueller, 2015) (Eligon & Nolan, 2016) (Fausset, 2015) ("Tamir Rice," 2014). 2015 overall was a significant year in terms of police killings, especially of members of minority communities ("2015 Police Violence Report," n.d.). This suggests that the Internet Research Agency had an agenda that was largely propelled by these events.
To understand the nature of the advertisement content, topic modeling was performed on the ad descriptions. Topic modeling provides a method for “distant reading,” that is, identifying common words or character strings across a body of text. Advertisements were divided into three periods based on the posts chart: from the beginning of the dataset (June 2015) through just before Donald Trump was named the presumptive Republican nominee (March 2016); from when Trump was named the presumptive Republican nominee (April 2016) through the end of the election (December 2016); and after the election through the end of the posts (August 2017).
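The three-way split described above can be sketched as a simple date lookup. This is an illustration only: the period labels, the `period_for` helper, and the assumption that boundaries fall on whole calendar months are not taken from the original analysis.

```python
from datetime import date

# Period boundaries as described in the text, assumed to fall on whole months.
# The labels are hypothetical names chosen for this sketch.
PERIODS = [
    ("pre-nomination", date(2015, 6, 1), date(2016, 3, 31)),
    ("election", date(2016, 4, 1), date(2016, 12, 31)),
    ("post-election", date(2017, 1, 1), date(2017, 8, 31)),
]


def period_for(ad_date):
    """Return the label of the period containing ad_date, or None if it falls outside all three."""
    for label, start, end in PERIODS:
        if start <= ad_date <= end:
            return label
    return None


# Example dates, chosen only to exercise each branch.
print(period_for(date(2015, 11, 2)))   # pre-nomination
print(period_for(date(2016, 10, 15)))  # election
print(period_for(date(2017, 2, 20)))   # post-election
```

Each ad description would then be grouped under its period label before a topic model is run on each group separately.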
Five topics were extracted from the texts and loosely given categorical names: exclusion, identity, immigration, police brutality, and race. These names were chosen based on the keywords extracted by the topic modeling. Exclusion refers to words that largely concern the protection of oneself or one’s community, such as second amendment, guns, free, safe, and defend. Identity is derived from keywords pertaining to sexuality, gender, and marriage. The immigration topic consists of keywords such as immigration, refugees, community, and country. Police brutality contains police, stop, brutality, and video. Finally, the race topic contains words such as black, white, america/n, and people.
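The naming step can be mimicked with a keyword-overlap rule, shown here purely as an illustration. This is not topic modeling itself and not necessarily how the names were assigned in this project; the keyword sets contain only the example words listed in the paragraph above, and `label_topic` is a hypothetical helper.

```python
# Example keywords per category, copied from the paragraph above (not the full model output).
TOPIC_KEYWORDS = {
    "exclusion": {"second", "amendment", "guns", "free", "safe", "defend"},
    "identity": {"sexuality", "gender", "marriage"},
    "immigration": {"immigration", "refugees", "community", "country"},
    "police brutality": {"police", "stop", "brutality", "video"},
    "race": {"black", "white", "america", "american", "people"},
}


def label_topic(top_words):
    """Name a topic after the category sharing the most keywords with it.

    Returns None when no category shares any keyword. Ties go to the
    category listed first in TOPIC_KEYWORDS.
    """
    words = {w.lower() for w in top_words}
    scores = {label: len(words & keywords) for label, keywords in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# A made-up list of top words for one extracted topic.
print(label_topic(["police", "video", "stop", "people"]))  # police brutality
```

A rule like this only makes the loose, keyword-driven nature of the labels explicit; borderline topics would still need a human reader.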
Displaying these topics graphically highlights some interesting trends. Before the 2016 primaries, issues such as identity, sexuality, Obama, and immigration shaped many of the Internet Research Agency advertisements. During the elections, advertisements focused on police, police brutality, and immigration. After the election, there was a noticeable uptick in exclusion topics. These topics emphasize guarding oneself or one’s community against others through issues such as the 2nd amendment, guns, blackmatters (a spin on Black Lives Matter), defense, and freedom. Race was a sizeable topic in each of these periods, suggesting a common theme throughout.
Paired with the geographic targeting of the troll farm ads described earlier, this is perhaps not surprising, as many ads were directed towards communities that had experienced high levels of systemic racism and recent police brutality. Similarly, advertisements directed towards Texas, a historically right-leaning state close to debates concerning immigration, could be paired with these topics as well, particularly those centered on immigration and exclusion.
How did these troll farms sow discord? From the above analysis, an understanding of the timeline, location, and topics of the Internet Research Agency advertisements has been laid out. But how did these advertisements affect the political process in the United States? A hint comes from Chen’s research into the Internet Research Agency back in 2015. Chen discovered through interviewing past employees of the company, including a Russian activist who worked there as a mole, that the intended result of the internet presence was to essentially insert noise into the online information frontier. As the internet provided a means for any person to post, and for information from other (unwanted) sources to be accessible, one way to combat this was to inject disinformation in order to make it difficult for the consumer to parse content in any meaningful way without requiring a vast amount of overhead research.
A closer look at some of the popular Facebook advertisements supports this. The Internet Research Agency invented activist groups such as Blacktivist and Black Matters US, directed at already weathered and vulnerable communities. Presumably, these were invented in an effort to dilute social movements such as Black Lives Matter, an actual activist group gaining traction at the same time (Levin, 2017). This links the activities of the troll farm to the recent awareness around “fake” news. In some cases, the advertisements succeeded in gathering enough support to spin off websites that continued well after the Internet Research Agency was eliminated (Michel, 2017), effectively diverting support from actual, well-established social movements as well as causing confusion among already wearied communities.
Furthermore, building upon the advertisement tagging done by students and faculty at the Department of Communication at the University of Maryland, the tags can be analyzed to understand which advertisements did not overlap in content and message. Advertisements in this dataset were tagged according to the content of the advertisement. An investigation into which tags do and do not appear together can therefore give meaningful information about the advertisements. In particular, noting which tags do not overlap might suggest an intentional effort to create disparity between communities.
Building on the Blacktivist example, the set of tags associated with Blacktivist was compared to the set of tags appearing with the MAGA tag. The resulting network graph shows a number of tags to the left and right of the circle of connected tags; these are the tags associated with either MAGA or Blacktivist, but not both. Interestingly, tags such as religion, refugees, illegals, borders, bluelivesmatter (pro law enforcement), terrorism, texas, confederate flag, being patriotic, 2nd amendment, and many more appear only with the MAGA tag. Tags such as black excellence, brown power, self made man, intoxicants, and a number of tags related to media appear only with the Blacktivist tag. These differences suggest that content was specifically created to appeal to different values and further polarize those communities.
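The comparison described above amounts to a set difference over co-occurring tags. The sketch below assumes each ad is represented as a set of tag strings; the `compare_tag_sets` helper and the four toy ads are invented for illustration and are not rows from the dataset.

```python
def compare_tag_sets(ads, tag_a, tag_b):
    """Split the tags co-occurring with tag_a or tag_b into (only_a, shared, only_b).

    ads is an iterable of sets of tags, one set per advertisement.
    """
    with_a, with_b = set(), set()
    for tags in ads:
        if tag_a in tags:
            with_a |= tags
        if tag_b in tags:
            with_b |= tags
    # The two anchor tags themselves are not of interest in the comparison.
    with_a -= {tag_a, tag_b}
    with_b -= {tag_a, tag_b}
    return with_a - with_b, with_a & with_b, with_b - with_a


# Toy ads using tag names mentioned in the text; the combinations are made up.
toy_ads = [
    {"maga", "texas", "2nd amendment"},
    {"maga", "police"},
    {"blacktivist", "black excellence", "police"},
    {"blacktivist", "media"},
]

only_maga, shared, only_blacktivist = compare_tag_sets(toy_ads, "maga", "blacktivist")
print(sorted(only_maga))         # ['2nd amendment', 'texas']
print(sorted(shared))            # ['police']
print(sorted(only_blacktivist))  # ['black excellence', 'media']
```

In the network graph, the shared set corresponds to the connected circle of tags, and the two exclusive sets correspond to the tags on either side.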
The analysis of the Facebook data as released by Facebook to the House Intelligence Committee suggests a number of things about “troll” farms, “fake” news, and disinformation. First, it is clear that Russian citizens are being hired by the Russian Government in order to create content, comments, and advertisements. Second, it is clear that these online information sources overlap neatly with political and social events in the United States. Third, specific locations are targeted that have experienced recent social unrest. Fourth, content is largely centered around politically tumultuous topics such as race, racism, immigration, firearms, police brutality, and identity. Fifth, advertisements appear to be strategically placed in order to dilute social movements and to further polarize communities.
These observations are important for understanding disinformation. The Facebook ads were specifically created with the intent to confuse, alarm, dissuade, incite, and demoralize. At minimum, they create noise in the information ecosystem that makes it harder for a citizen to distinguish information that pertains to the real from information that belongs to the fake. They create a discourse of demagoguery and rhetoric that is distanced from critical awareness. With platforms such as Facebook that are built on appealing to emotion and driven by profit, a ready-made infrastructure is in place, providing the means to distribute information, misinformation, and disinformation rapidly and through complex channels.
Paired with other recent revelations about the use of personal data, such as the Facebook-Cambridge Analytica data scandal ("The Data That Turned the World Upside Down," 2017) (Calabresi, 2017), the need to understand disinformation and its quest for discontent is immediate. Scholars such as Shoshana Zuboff, in her account of surveillance capitalism (2015), have made it clear that politics, the economy, and the self are largely driven by data from a few forces, Facebook being a primary player. Without scholarship and regulation around how this data is leveraged and by whom, the public is at the mercy of campaigns such as Russia’s Internet Research Agency.
2015 Police Violence Report. (n.d.). Retrieved March 21, 2019, from https://mappingpoliceviolence.org/2015
Baker, A., Goodman, J. D., & Mueller, B. (2015, June 13). Beyond the Chokehold: The Path to Eric Garner’s Death. The New York Times. Retrieved from https://www.nytimes.com/2015/06/14/nyregion/eric-garner-police-chokehold-staten-island.html
Chen, A. (2015, June 2). The Agency. The New York Times. Retrieved from https://www.nytimes.com/2015/06/07/magazine/the-agency.html
Eligon, J., & Nolan, K. (2016). Angry After Milwaukee Police Shooting, Protesters Turn Against Media, Too. The New York Times. Retrieved from https://www.nytimes.com/2016/08/16/us/milwaukee-police-shooting-protests.html
EXCLUSIVE: Website targeting black Americans appears to be elaborate Russian propaganda effort. (2017, October 12). Retrieved March 21, 2019, from https://thinkprogress.org/black-matters-us-site-90625b18f262/
Exposing Russia’s Effort to Sow Discord Online: The Internet Research Agency and Advertisements Permanent Select Committee on Intelligence. (n.d.). Retrieved March 21, 2019, from https://intelligence.house.gov/social-media-content/
Fausset, R. (2015). Police Killing of Unarmed Georgia Man Leaves Another Town in Disbelief. The New York Times. Retrieved from https://www.nytimes.com/2015/03/11/us/chamblee-georgia-police-shooting-anthony-hill.html
Feinberg, A. (2014, October 30). The Birth of the Internet Troll. Retrieved March 21, 2019, from https://gizmodo.com/the-first-internet-troll-1652485292
Friedersdorf, C. (2015, April 22). The Brutality of Police Culture in Baltimore. Retrieved March 21, 2019, from https://www.theatlantic.com/politics/archive/2015/04/the-brutality-of-police-culture-in-baltimore/391158/
Inside Russia’s Social Media War on America. (2017, May 18). Retrieved March 21, 2019, from http://time.com/4783932/inside-russia-social-media-war-america/
Internet Trolls. (2006, December 10). Retrieved March 21, 2019, from https://web.archive.org/web/20061210043940/http://members.aol.com:80/intwg/trolls.htm
Levin, S. (2017, September 30). Did Russia fake black activism on Facebook to sow division in the US? The Guardian. Retrieved from https://www.theguardian.com/technology/2017/sep/30/blacktivist-facebook-account-russia-us-election
Lindblad, P., Murphy, N., Pfister, D. S., Styer, M., Summers, E., & Yang, M. (n.d.). Internet Research Agency Ads Dataset. Retrieved from https://mith.umd.edu/irads/data.zip
Pasternack, A., Grassegger, H., & Krogerus, M. (2017, January 28). The Data That Turned the World Upside Down. Retrieved March 21, 2019, from https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win
Zuboff, S. (2015). Big other: Surveillance Capitalism and the Prospects of an Information Civilization. Journal of Information Technology, 30(1), 75–89. https://doi.org/10.1057/jit.2015.5
The data used in this investigation comes primarily from Facebook, as presented to the House Intelligence Committee during the November 2017 open hearings on social media companies (https://intelligence.house.gov/social-media-content/).
The particular data used in this project was extracted and compiled by the Department of Communication at the University of Maryland (https://mith.umd.edu/irads/data.zip). The tag matrix developed in a University of Maryland course was used prominently for this project.
While the data for this project is derived primarily from one source, the processing and presentation of the data were performed using a number of tools and platforms. This site is hosted on GitHub via Jekyll.
Tableau (https://www.tableau.com/) was used to perform a variety of analyses and visualizations on the data. Embedded interactive graphics are hosted on Tableau Public.
Cytoscape (https://cytoscape.org) was used to create the static images of network graphs.
OpenRefine (http://openrefine.org/), along with Excel, was used to clean data.
RStudio (https://www.rstudio.com/) was used to flip the matrix into an edgelist for graphing purposes. It was also used to test plotting of network graphs.
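For readers unfamiliar with the matrix-to-edgelist step, the idea can be sketched in a few lines. The project did this in RStudio; the Python version below is only an illustration, and it assumes a binary matrix with one row per ad and one column per tag, turned into a weighted tag-to-tag edge list. The actual layout of the tag matrix and of the edge list may differ, and `matrix_to_edgelist` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def matrix_to_edgelist(tag_names, rows):
    """Convert a binary ad-by-tag matrix into weighted tag co-occurrence edges.

    tag_names: column labels, one per tag.
    rows: one list of 0/1 cells per ad, aligned with tag_names.
    Returns a sorted list of (tag_a, tag_b, weight) tuples, where weight is
    the number of ads carrying both tags.
    """
    edge_weights = Counter()
    for row in rows:
        # Tags present on this ad, sorted so each pair has one canonical order.
        present = sorted(name for name, cell in zip(tag_names, row) if cell)
        for tag_a, tag_b in combinations(present, 2):
            edge_weights[(tag_a, tag_b)] += 1
    return [(a, b, weight) for (a, b), weight in sorted(edge_weights.items())]


# Toy matrix: three ads, three tags (values invented for illustration).
toy_tags = ["maga", "police", "texas"]
toy_rows = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]

edges = matrix_to_edgelist(toy_tags, toy_rows)
for edge in edges:
    print(edge)
```

An edge list in this source, target, weight shape is a common input format for network tools such as Cytoscape.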