Given this enormous volume of social media data, analysts have come to recognize Twitter as a virtual treasure trove for data mining, social network analysis, and sensing public opinion trends and groundswells of support for (or opposition to) various political and social initiatives. Twitter Trend Topics in particular are increasingly recognized as a valuable proxy for measuring public opinion.
General presidential elections were held in Brazil on October 5, 2014. No candidate received more than 50% of the vote, so a runoff election was held on October 26th.
In the first round, Dilma Rousseff (Partido dos Trabalhadores) won 41.6% of the vote, ahead of Aécio Neves (Partido da Social Democracia Brasileira) with 33.6%, and Marina Silva (Partido Socialista Brasileiro) with 21.3%. Rousseff and Neves contested the runoff on October 26th with Rousseff being re-elected by a narrow margin, 51.6% to Neves’ 48.4%. The analysis in this article relates specifically to the October 26th runoff election.
Partido dos Trabalhadores (PT) is one of the biggest political parties in Brazil. It is the party of the current and former presidents, Dilma Rousseff and Luiz Inácio Lula da Silva. Partido da Social Democracia Brasileira (PSDB) is the party of former president Fernando Henrique Cardoso.
I began social media data mining by extracting Twitter Trend Topic data for the 14 Brazilian cities for which data is supplied via the Twitter API, namely: Brasília, Belém, Belo Horizonte, Curitiba, Porto Alegre, Recife, Rio de Janeiro, Salvador, São Paulo, Campinas, Fortaleza, Goiânia, Manaus, and São Luís.
I queried the Twitter REST API to get the top 10 Twitter Trend Topics for these 14 cities at 20-minute intervals (the interval being dictated by restrictions that Twitter places on its API). The query is limited to these 14 cities by specifying their Yahoo! GeoPlanet WOEIDs (Where On Earth IDs).
For this proof-of-concept, I used Python and a Twitter library (cleverly called “twitter”) to get all the social network data for the day of the runoff election (Oct 26th), as well as the two days prior (Oct 24th and 25th). For each day, I performed about 70 different queries to capture the trend topics at each point in time.
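As a rough illustration of one polling round, here is a minimal sketch rather than the script actually used for this analysis. It assumes the response has the shape of Twitter's `trends/place` endpoint (a one-element list whose first item holds a `"trends"` list of objects with a `"name"` field). The function name `collect_trend_topics`, the injected `fetch_trends` callable, and the WOEID values in the usage comment are all hypothetical.

```python
def collect_trend_topics(fetch_trends, woeids_by_city, top_n=10):
    """Return {city: [topic names]} for a single polling round.

    fetch_trends is any callable that takes a WOEID and returns the
    trends/place response for it. With the "twitter" package this could be
    (assumption, shown for orientation only):
        api = twitter.Twitter(auth=twitter.OAuth(...))
        fetch_trends = lambda woeid: api.trends.place(_id=woeid)
    """
    topics_by_city = {}
    for city, woeid in woeids_by_city.items():
        response = fetch_trends(woeid)
        # The endpoint wraps its result in a one-element list.
        trends = response[0]["trends"][:top_n]
        topics_by_city[city] = [trend["name"] for trend in trends]
    return topics_by_city


# Usage (WOEIDs below are placeholders, not real identifiers):
#   snapshot = collect_trend_topics(fetch_trends, {"CityA": 111, "CityB": 222})
```

Passing the fetcher in as a parameter keeps the sketch runnable without credentials; repeating the call every 20 minutes and storing each snapshot would give roughly the per-day data set described above.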
Below is an example of the JSON object returned in response to each query (this example was based on a query for data on October 26th at 12:40:00 AM, and only shows the data for Belo Horizonte).
Social Network Theory is the study of how people, organizations, or groups interact with others inside their network. There are three primary types of social networks:
Social networks are considered complex networks, since they display non-trivial topological features, with patterns of connection between their elements that are neither purely regular nor purely random.
Social network analysis examines the structure of relationships between social entities. These entities are often people, but may also be social groups, political organizations, financial networks, residents of a community, citizens of a country, and so on. The empirical study of networks has played a central role in social science, and many of the mathematical and statistical tools used for studying networks were first developed in sociology.
To create a network using the Twitter Trend Topics, I defined the following rules:
For example, on October 26th, the cities of Fortaleza and Campinas had 11 trend topics in common, so the network for that day includes an edge between Fortaleza and Campinas with a weight of 11:
In addition, to aid the process of weighting the relationships between the cities, I also considered topics that were not related to the election itself (the premise being that cities that share other common priorities and interests may be more inclined to share the same political leanings).
Although the order of the trend topics could potentially have some significance to the analysis, for purposes of simplification of the proof-of-concept, I chose to ignore the ordering of the topics in the trend topic list.
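The weighting idea can be sketched as follows. This is an illustrative reading of the approach, assuming that each city is a node and that an edge's weight is simply the number of distinct trend topics two cities share on a given day (ordering ignored, as noted above). The function name `build_city_network` and the sample topics are hypothetical.

```python
from itertools import combinations


def build_city_network(topics_by_city):
    """Return {(city_a, city_b): weight}, weight = number of shared topics."""
    # Sets discard both duplicates and the ordering of the trend list.
    topic_sets = {city: set(topics) for city, topics in topics_by_city.items()}
    edges = {}
    for city_a, city_b in combinations(sorted(topic_sets), 2):
        shared = topic_sets[city_a] & topic_sets[city_b]
        if shared:  # only connect cities with at least one topic in common
            edges[(city_a, city_b)] = len(shared)
    return edges


# Made-up topics, purely to show the shape of the input and output.
sample = {
    "Fortaleza": ["#a", "#b", "#c"],
    "Campinas": ["#b", "#c", "#d"],
    "Manaus": ["#z"],
}
print(build_city_network(sample))
```

City pairs are stored in alphabetical order so that each undirected edge appears exactly once.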
To assist us in predicting election results, we consider not only the trend topics in common between cities, but also how the content of those topics relates to likely support for each of the two principal political parties; i.e., Partido dos Trabalhadores (PT) and Partido da Social Democracia Brasileira (PSDB).
First, I created a list of words and phrases perceived to indicate a positive leaning toward, or support for, one of the parties. (Populating this list is admittedly a highly complex task. In the context of this proof of concept, I deliberately took a simplified approach. If anything, this makes the caliber of the results all the more intriguing, since a more highly tuned list of terms and phrases would presumably further improve the accuracy of the results.)
Then, for each node, I count:
Using the city of Fortaleza again as an example, I ended up with counts of:
Fortaleza['PT'] = 56
Fortaleza['PSDB'] = 37
We thereby draw the conclusion that Fortaleza residents have an overall preference for Partido dos Trabalhadores (PT).
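A minimal sketch of this counting step is shown below. It assumes that a topic counts toward a party when it contains any term from that party's list, and that a city's topics are kept as a list with repeats across the day's queries. The term lists, the function names, and the substring-matching rule are all assumptions for illustration, not the actual lists or logic used here.

```python
def count_party_mentions(topics, party_terms):
    """Count, per party, how many topics contain one of its support terms."""
    counts = {party: 0 for party in party_terms}
    for topic in topics:
        text = topic.lower()
        for party, terms in party_terms.items():
            # Case-insensitive substring match against each term.
            if any(term.lower() in text for term in terms):
                counts[party] += 1
    return counts


def preferred_party(counts):
    """Return the party with the highest count (first one wins a tie)."""
    return max(counts, key=counts.get)


# Using the Fortaleza counts reported above:
print(preferred_party({"PT": 56, "PSDB": 37}))  # PT
```

A production version would need proper tokenization and a carefully curated term list, since naive substring matching can easily produce false positives.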
Based on this algorithm, the analysis yields results that are surprisingly similar to the actual election results, especially when one considers the general simplicity of our approach. Here’s a comparison of the predictive results based on the Twitter Trend Topic data as compared with the real election results (red is used to represent Partido dos Trabalhadores and blue is used to represent Partido da Social Democracia Brasileira):
Improved scientific rigor, as well as more sophisticated algorithms and metrics, would undoubtedly improve the results even further.
Here are a few metrics, for example, that could be used to infer a node’s importance or influence, which could in turn inform the type of predictive analysis described in this article:
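One simple candidate, offered here as an assumption rather than as a metric taken from this analysis, is weighted degree (sometimes called node strength): the sum of the weights of all edges touching a node. The sketch below assumes the network is stored as a dictionary mapping city pairs to weights; the function name and the second edge in the example are hypothetical.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Return {node: sum of weights of all edges incident to that node}."""
    strength = defaultdict(int)
    for (node_a, node_b), weight in edges.items():
        # An undirected edge contributes its weight to both endpoints.
        strength[node_a] += weight
        strength[node_b] += weight
    return dict(strength)


# The first edge comes from the example above; the second is made up.
edges = {("Campinas", "Fortaleza"): 11, ("Fortaleza", "Manaus"): 4}
print(weighted_degree(edges))
```

Cities with a high weighted degree share many topics with many other cities, so their party leaning could plausibly be given more influence over their neighbors in a refined model.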
But even without that level of sophistication, the results achieved with this simple proof-of-concept provide a compelling demonstration of effective predictive analysis using Twitter Trend Topic data. There is clearly the potential to take social media data analysis even further in the future.
An accomplished software engineer, Elder specializes in machine learning and data science. He has expertise in the full life cycle of the software design process, including: requirement specifications, prototyping, proof of concept, human-interface design, implementation, testing, and maintenance.