
Top Gear or Black Mirror: Inferring Political Leaning from Non-political Content

Ahmet Kurnaz
Department of Public Administration and Political Science
Çanakkale Onsekiz Mart University
Çanakkale, Turkey
ahmetkurnaz@hotmail.com
Scott A. Hale
Oxford Internet Institute
University of Oxford
Oxford, United Kingdom
scott.hale@oii.ox.ac.uk
Abstract

Polarization and echo chambers are often studied in the context of explicitly political events such as elections, and little scholarship has examined the mixing of political groups in non-political contexts. A major obstacle to studying political polarization in non-political contexts is that political leaning (i.e., left vs right orientation) is often unknown. Nonetheless, political leaning is known to correlate (sometimes quite strongly) with many lifestyle choices, leading to stereotypes such as the “latte-drinking liberal.” We develop a machine learning classifier to infer political leaning from non-political text and, optionally, the accounts a user follows on social media. We use Voter Advice Application results shared on Twitter as our ground truth and train and test our classifier on a Twitter dataset comprising the 3,200 most recent tweets of each user, after removing any tweets with political text. We correctly classify the political leaning of most users (F1 scores range from 0.70 to 0.85 depending on coverage). We find no relationship between users’ level of political activity and our classification results. We apply our classifier to a case study of news sharing in the UK and discover that, in general, the sharing of political news exhibits a distinctive left–right divide while the sharing of sports news does not.

Introduction

Word choice, grammatical structures, and other linguistic features often correlate with demographic variables such as age and gender, although audience and other performative aspects can play a significant role Wang et al. (2019); Nguyen et al. (2016).

Latent attribute inference describes the field of research in computer science that tries to infer demographic and other attributes from individuals’ behaviours. While much attention has been paid to inferring gender, age, and location from social media data Wang et al. (2019); Liu et al. (2019); Zagheni et al. (2017); Zhang et al. (2016); Chen et al. (2015); Sap et al. (2014); Nguyen et al. (2013); Rosenthal and McKeown (2011); Rao et al. (2010), less work has been done on inferring political orientation. Nonetheless, political orientation is known to correlate (sometimes quite strongly) with many lifestyle choices, which leads to stereotypes such as the “latte-drinking liberal” or “bird-hunting conservative” DellaPosta et al. (2015). Individuals’ perception of their own political identities also correlates with the degree of moral value adoption and behaviour Talaifar and Swann (2019). Furthermore, political leaning may influence perceptions of non-political topics: Ahn et al. (2014) were able to infer political leaning from fMRI data in which subjects were shown non-political images as stimuli. Even academic publications and their findings contain some signals of the political leanings of their authors Jelveh et al. (2014).

It is therefore very likely that the content people share, the accounts they follow, and the words they use on social media also contain traces of their political orientation, and that a properly trained machine learning system can infer individuals’ political orientations even in non-political contexts.

Although there are several political spectrum models Eysenck (1964); Sznajd-Weron and Sznajd (2005); Kitschelt (1994), in this study, we focus specifically on a unidimensional left–right spectrum, given its ubiquity in political science and popular culture alike.

We develop and freely share our machine learning system, which achieves an F1 score of more than 0.85 even when considering non-political content. Our ground-truth data come from individuals who used Voter Advice Applications (VAAs) during the 2015 and 2017 UK General Elections. Many respondents shared their VAA results on social media during the elections, and we use these social media accounts and their VAA results to train our system. We gather multiple datasets, including these users’ political and non-political Twitter activity.

After developing our classifier, we use the system to infer the political orientations of individuals sharing news articles to investigate the level of polarization in sharing different types of news and from different sources. Specifically, we investigate the sharing of political and sports news from The Telegraph (right-leaning), the BBC (centrist), and The Guardian (left-leaning).

We first present our classifier’s motivation, development, and evaluation before turning to the news sharing case study. Finally, we conclude with broader implications and directions for future research.

Inferring political leaning

Political ideology prediction from digital trace data has become a core interest for researchers. Prior work has focused on predicting ideological stances using text data, network data, or a combination.

Social media, news outlets, and parliamentary discussions are the most popular sources of textual data: work on social media has used both users’ and politicians’ textual data Preoţiuc-Pietro et al. (2017); Conover et al. (2011). News articles are also often political, and their political leaning has been studied Kulkarni et al. (2018); Iyyer et al. (2014). Although legislators are already officially affiliated with a political party, several studies have classified the members of the US Congress Iyyer et al. (2014); Diermeier et al. (2012); Yu et al. (2008).

Text-based approaches focus on the words people choose to use Preoţiuc-Pietro et al. (2017), the style of their messages, and social-media specific attributes such as hashtags Weber et al. (2013); Conover et al. (2011). These approaches work well for polarised topics where the left and right use different phrases to talk about the same things: for example, “gun control” and “gun rights.”

Network approaches have used both textual elements such as hashtags, mentions, and URLs Gu et al. (2016) to create networks, and social links Barberá (2015).

Within Twitter, there is a divide between using either retweets and mentions (e.g., Conover et al., 2011; Gu et al., 2016; Hale, 2014) or followers and friends (e.g., King et al., 2016; Barberá, 2015; Golbeck and Hansen, 2014; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Compton et al., 2014) to infer latent attributes. Networks based on mentions and retweets are often preferred over follower networks for two reasons. First, taking a snapshot of an extensive network is difficult given the Twitter API rate limits. Second, retweets and mentions often have greater recency, can be analysed temporally, and can form weighted networks (e.g., based on how many times users mention each other). As retweeting is more publicly visible than following a user, there is also a greater social cost to retweeting another account, suggesting that users will more carefully consider which users they retweet or mention. Retweeting has been used as a sign of support (e.g., Conover et al., 2011; Wong et al., 2016), but it is important to note that some mentions, retweets, and hashtag uses will be specifically to call attention to content with which a user disagrees.

As detailed below, we examine the use of both words (through topic modelling) in political and non-political text and network features (the accounts a user follows) to estimate the political leaning of social media users.

We find that network features and political text reveal political leaning equally well, although network features have lower coverage. Inferring political leaning from non-political text is less accurate, but network data helps in that task, and performance is on par with previous work inferring political leaning from other sources.

Ground truth data

We collected all mentions of social media accounts and URLs associated with the three most-used voter advice applications (VAAs) during the 2015 and 2017 General Elections in the United Kingdom. (We used commercial firehose access for the 2015 election and elevated direct access during the 2017 election; we believe our dataset represents the entire population as closely as possible.) VAAs ask individuals a series of political and policy issue questions to help them understand their preferences and how these align with the political candidates running for office. Users of VAAs have the opportunity (but not the obligation) to share their results on social media.

The three VAAs from which we collected data were “I Side With You” (ISW), “Who Should You Vote For” (WSYVF), and “Vote Match” (VM). We found 4,118 unique users who shared their VAA results: 2,975 of these users came from ISW, 794 from VM, and 387 from WSYVF. Our data include only original tweets, not retweets of VAA result URLs. Thirty-eight users shared results from multiple VAAs.

After collecting the data during the elections, we extracted all VAA result URLs in July 2018, downloaded the content of VAA result pages, and extracted users’ matches with political parties. We furthermore obtained the most recent 3,200 public tweets as of January 2019 of each user who shared a VAA result by using the user_timeline endpoint of Twitter’s public RESTful API.

All VAAs provided the voter’s alignment with individual political parties. WSYVF also gave a left–right score, but the exact details of this score are unknown. Therefore, we develop our own left–right or political leaning score using only the matches to parties, which can be applied consistently across all three VAAs.

We calculate political leaning scores by subtracting each user’s match to Labour from their match to the Conservatives (the UK’s two largest parties), so that positive scores indicate right-leaning users. (We also experimented with summing a user’s match with left-leaning parties (Labour, SNP, Greens, Plaid Cymru, Sinn Féin) and subtracting this sum from the sum of the user’s matches with right-leaning parties (Conservatives, UKIP, British National Party). This procedure produced similar results but was less transparent, as not all parties stand in all constituencies.) We normalize all scores into the [-1,1] interval for each platform before combining data from all the VAAs.
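The computation itself is simple; the sketch below is illustrative only (the dictionary keys are hypothetical, and the paper does not state its precise per-platform normalization rule, so rescaling by the largest absolute score is assumed here):

```python
# Minimal sketch of the leaning score. Keys are hypothetical; values are a
# user's percentage match with each party as reported by the VAA.
def leaning_score(matches: dict) -> float:
    # Positive scores indicate right-leaning users.
    return matches["con"] - matches["lab"]

def normalize(scores: list) -> list:
    # Assumed rescaling into [-1, 1]: divide every score on a platform by
    # the largest absolute score observed on that platform.
    bound = max(abs(s) for s in scores) or 1
    return [s / bound for s in scores]
```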

Users with a political leaning score greater than zero are labelled as “right-leaning” while those with scores less than zero are labelled as “left-leaning.” We dropped all users with political leaning scores equal to zero. A small number of users shared results from multiple VAAs. We checked for consistency and removed one inconsistent user from our dataset. We created one political leaning score for users with multiple but consistent political leanings by taking the mean of their political leaning scores from the multiple VAAs.

In total, our ground-truth dataset contains 2,694 unique users from all three VAAs. We collected the most recent 3,200 tweets of 1,921 of these users in January 2019.

We detect the language of all tweets using the Compact Language Detector 2 Sites (2013) and remove all users with fewer than 75% of their most recent 3,200 tweets in English. Last, we remove users who tweeted fewer than 100 times. The final dataset comprises 1,760 users. Of these, 1,396 are left-leaning and 364 are right-leaning (Figure 1).
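CLD2 is available through several bindings; a minimal sketch of this filtering step using the pycld2 Python binding (our choice for illustration, not necessarily the authors’ tooling) could look like:

```python
import pycld2 as cld2

def share_english(tweets: list) -> float:
    """Fraction of tweets whose top detected language is English."""
    english = 0
    for text in tweets:
        is_reliable, _, details = cld2.detect(text)
        # details holds up to three (name, code, percent, score) guesses.
        if is_reliable and details[0][1] == "en":
            english += 1
    return english / len(tweets) if tweets else 0.0

def keep_user(tweets: list) -> bool:
    """Apply the paper's filters: at least 100 tweets and >= 75% English."""
    return len(tweets) >= 100 and share_english(tweets) >= 0.75
```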

Figure 1: Histogram of left- and right-leaning users sharing Voter Advice Application (VAA) results on Twitter.

Predictive variables

We build five datasets using users’ tweets and Twitter ‘friends’ (i.e., the accounts a user follows). The first three datasets are (1) network information, (2) political text, and (3) non-political text. The final two are hybrid datasets constructed by joining the textual and network data.

Textual data

We create two datasets of texts from tweets authored by the users. The first is a set of political text from tweets containing a political term (e.g., #ge2015, labour) primarily collected during the 2015 and 2017 elections with firehose or elevated streaming API access. The second dataset, non-political text, is mainly from users’ most recent 3,200 tweets collected with the RESTful API. We find that some recent tweets are political (e.g., discussing Brexit) and exclude tweets with political words from the non-political dataset.

Determining political tweets

To detect political tweets, we develop a two-step procedure. In the first step, we create an index from election and non-election word-occurrence frequencies and use it to extract an initial list of political words. In the second step, we train a word embedding model to extend this list of political terms.

In the first step, we select words that appear in at least 250 different tweets during an election period. (The election periods are as follows: GE2010, 2010-04-12 to 2010-06-06; GE2015, 2014-11-19 to 2015-06-07; GE2017, 2017-04-18 to 2017-07-08.) Then, we examine word occurrences both within and outside of the election periods, normalizing the two frequencies by the total tweet count for each period. Finally, we divide the out-of-election rate by the in-election rate to obtain a political index; lower values indicate words concentrated in election periods and hence more likely to be political.

Setting the political index threshold to 0.25 yields a list of 316 words with scores below 0.25. However, this initial list contains several ambiguous words, so we manually examined all words and removed 78 vague terms (#eurovision2015, #pensioners, youtuber, tuition, etc.) before continuing to the second step. We also added political words that were present throughout the whole period, including ‘brexit’, ‘eu’, ‘trump’, ‘clinton’, ‘sanders’, ‘johnson’, ‘gop’, and ‘referendum.’
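A minimal sketch of the index, assuming per-word counts of the number of distinct tweets containing each word inside and outside the election periods:

```python
from collections import Counter

def political_index(word: str, in_counts: Counter, out_counts: Counter,
                    n_in: int, n_out: int) -> float:
    """Out-of-election rate divided by in-election rate; low => political."""
    in_rate = in_counts[word] / n_in      # share of in-election tweets with the word
    out_rate = out_counts[word] / n_out   # share of out-of-election tweets with it
    return out_rate / in_rate

def candidates(in_counts, out_counts, n_in, n_out):
    """Candidate political words: >= 250 in-election tweets, index < 0.25."""
    return [w for w, c in in_counts.items() if c >= 250 and
            political_index(w, in_counts, out_counts, n_in, n_out) < 0.25]
```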

In the second step, we use word embeddings to find the terms closest to the political words extracted in the first step. We train a skip-gram embedding model (skip window: 5, minimum frequency: 100) and, using Euclidean distance, choose the three closest words to each political term in the list. We then again manually inspect all words and remove ambiguous terms. This results in 433 political words. The complete list of words is available in the supplemental materials.
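A sketch of this expansion step using gensim (the library choice and the vector dimensionality are our assumptions; the window and minimum-frequency settings are those reported above):

```python
import numpy as np
from gensim.models import Word2Vec

# tokenized_tweets: assumed iterable of token lists, one per tweet.
model = Word2Vec(sentences=tokenized_tweets, sg=1,       # skip-gram
                 window=5, min_count=100, vector_size=100)

def closest_words(term: str, k: int = 3) -> list:
    """The k nearest vocabulary words to `term` by Euclidean distance."""
    target = model.wv[term]
    dists = np.linalg.norm(model.wv.vectors - target, axis=1)
    order = np.argsort(dists)
    # Skip the first hit, which is the term itself (distance zero).
    return [model.wv.index_to_key[i] for i in order[1:k + 1]]
```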

We label any tweet containing a term from this final list as political. Other tweets are labelled non-political. In the end, the non-political dataset consists of 3.66 million tweets, and the political dataset contains 888 thousand tweets (Figure 2).

Figure 2: Frequency of political and non-political tweets by date. There are distinctive spikes in political tweets during the elections, but political tweets also occur outside these periods.

Network data

We collected the accounts followed (the ‘friends’) of 1,742 of the 1,760 users in our dataset who shared VAA results. Next, we create a binary matrix in which the columns are Twitter accounts followed by multiple users and the rows are the users in our dataset.
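A minimal sketch of this construction (all names are hypothetical):

```python
from collections import Counter
import numpy as np

def friends_matrix(friends_by_user: dict):
    """Binary users-by-accounts matrix; columns are accounts followed by
    more than one user in the dataset."""
    counts = Counter(a for friends in friends_by_user.values() for a in friends)
    columns = sorted(a for a, c in counts.items() if c > 1)
    col_idx = {a: j for j, a in enumerate(columns)}
    users = list(friends_by_user)
    X = np.zeros((len(users), len(columns)), dtype=np.int8)
    for i, user in enumerate(users):
        for acct in friends_by_user[user]:
            if acct in col_idx:
                X[i, col_idx[acct]] = 1
    return users, columns, X
```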

Approach

Our goal is to estimate political leaning at a per-user level, not at a per-tweet level. We, therefore, sort all tweets of each user by date and concatenate these tweets into two text documents: one for political text and one for non-political text.

To construct document-feature matrices (DFMs), we remove URLs and non-alpha characters from the text. Next, we stem all words using Porter’s algorithm Porter (1980) and exclude words on the SMART stop word list Lewis et al. (2004). Finally, we create unigrams, bigrams, and trigrams as the input features to our classifier; the documents are the combined tweets of each user.

To reduce the dimensionality of the feature space, we remove sparse terms, setting the sparsity threshold to 0.9 for the political dataset and 0.85 for the non-political dataset. Consequently, the political DFM consists of 1,760 users and 4,676 features, and the non-political DFM contains 1,760 users and 6,613 features. We also remove sparse terms from the network DFM by setting sparsity to 0.88; the network DFM consists of 1,742 users and 55 features (followed accounts).
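A sketch of this preprocessing pipeline in Python (scikit-learn and NLTK are our stand-ins for the tooling; assuming the sparsity threshold is the usual share-of-documents cut-off, a sparsity of 0.9 corresponds to keeping terms present in at least 10% of documents, i.e. min_df=0.10, and 0.85 to min_df=0.15):

```python
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = PorterStemmer()
# Assumed: the SMART stop word list, one word per line, in a local file.
SMART_STOPWORDS = set(open("smart_stopwords.txt").read().split())

def preprocess(doc: str) -> str:
    """Strip URLs and non-alpha characters, drop stop words, Porter-stem."""
    doc = re.sub(r"https?://\S+", " ", doc.lower())
    tokens = re.findall(r"[a-z]+", doc)
    return " ".join(stemmer.stem(t) for t in tokens if t not in SMART_STOPWORDS)

vectorizer = CountVectorizer(preprocessor=preprocess,
                             ngram_range=(1, 3),  # uni-, bi-, and trigrams
                             min_df=0.10)         # drop sparse terms
# dfm = vectorizer.fit_transform(user_documents)  # one document per user
```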

Last but not least, our dataset contains more left-leaning than right-leaning users. We therefore select all of the right-leaning users and an equal number of randomly chosen left-leaning users. We combine these two subsets and shuffle the documents in the resulting DFM, which contains 728 users balanced across our two classes (left-leaning and right-leaning) for the political and non-political sample datasets. When using network data on its own or in combination with text, the numbers are slightly lower, as network data was not available for all users.

Dataset NB NN SVMlin SVMpoly SVMrad
net 0.27 0.75 0.73 0.75 0.75
non-pol 0.54 0.64 0.60 0.65 0.65
non-pol+net 0.59 0.70 0.68 0.71 0.72
pol 0.66 0.75 0.74 0.73 0.72
pol+net 0.67 0.69 0.66 0.71 0.67
Table 1: Average F1 scores for the training datasets. We use 10-fold cross-validation while training the models. We repeat this procedure for five different randomly selected, but balanced, samples.

Classifier       network           non-political     non-pol+net       political         pol+net
                 F1    P     R     F1    P     R     F1    P     R     F1    P     R     F1    P     R
Naïve Bayes      0.29  0.17  1.00  0.64  0.63  0.67  0.70  0.69  0.72  0.75  0.79  0.71  0.73  0.75  0.71
Neural Network   0.77  0.71  0.83  0.69  0.67  0.73  0.77  0.74  0.80  0.81  0.80  0.82  0.74  0.70  0.78
SVMlin           0.74  0.66  0.85  0.65  0.59  0.78  0.72  0.66  0.80  0.79  0.76  0.83  0.74  0.68  0.82
SVMpoly          0.77  0.72  0.83  0.69  0.72  0.71  0.78  0.74  0.83  0.74  0.78  0.79  0.75  0.70  0.81
SVMrad           0.78  0.74  0.83  0.70  0.81  0.67  0.75  0.70  0.81  0.74  0.77  0.78  0.73  0.65  0.83

Table 2: F1, Precision, and Recall scores for predicting left–right political leaning on the 20% fully-held out test data (averaged across five balanced samples). Five datasets are used: network friends data only, non-political tweet text, non-political and network data (non-pol+net), political tweet text only, and political text and network data (pol+net). Results are shown for five classifiers: Naïve Bayes, Neural Network, and SVMs with linear, polynomial, and radial kernels. The highest F1, precision, and recall values for each dataset (column) are in bold.

Classification procedure

Our classification procedure follows these steps: (1) we split each sample dataset into training (80%) and test (20%) sets; (2) we create a topic model using only the training data; (3) we train the political leaning classifiers on the training data using 10-fold cross-validation; (4) we apply the topic model to the test data; and (5) we predict the leanings of the test data. By following this procedure, we completely isolate the test dataset from both topic generation and classifier training.
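A minimal sketch of this isolation in Python (the paper’s topic models come from the STM framework of Roberts et al. (2019), an R package; scikit-learn’s LDA stands in here purely to illustrate keeping the test split out of topic fitting, and labels are assumed encoded as 0/1):

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.decomposition import LatentDirichletAllocation  # stand-in topic model
from sklearn.svm import SVC

# (1) Hold out 20% of users before any model fitting.
X_train, X_test, y_train, y_test = train_test_split(
    dfm, labels, test_size=0.2, stratify=labels, random_state=0)

# (2) Fit the topic model on the training documents only.
topics = LatentDirichletAllocation(n_components=150, random_state=0)
theta_train = topics.fit_transform(X_train)   # per-user topic proportions

# (3) 10-fold cross-validation on the training thetas.
clf = SVC(kernel="rbf", probability=True)
print(cross_val_score(clf, theta_train, y_train, cv=10, scoring="f1").mean())

# (4)-(5) Project held-out users into the topic space, then predict.
clf.fit(theta_train, y_train)
predictions = clf.predict(topics.transform(X_test))
```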

We use Structural Topic Models (STM) to create our topic models. However, we do not provide prevalence covariates, to prevent biased topic generation; we feed the topic modelling algorithm text data only. In this case, the algorithm becomes an implementation of the Correlated Topic Model (CTM) Roberts et al. (2019).

We treat the number of topics as a hyperparameter that we tune through cross-validation on the training data. We find that 150 topics gives the most meaningful results, given the large corpus (3.8M tweets) and long period (approx. nine years). We use spectral initialization, as advised by the creators, since it generates ‘better’ and more ‘consistent’ topics by using “a spectral decomposition (non-negative matrix factorization) of the word co-occurrence matrix” Roberts et al. (2019). Once the model is fitted, we extract the theta distributions to classify documents (users); theta gives the probability of a document belonging to each topic.

To infer political leanings and compare the results, we train and compare neural network (NN), Naïve Bayes (NB), and support-vector machine (SVM) classifiers. The SVMs are trained with linear, radial and polynomial kernels. SVM, NB and NN are three of the most popular machine learning methods to classify text, although the ‘best’ approach depends heavily on the data and the specific task being performed Wang and Manning (2012); Ng and Jordan (2001).

After performing a classification, we can apply a threshold to the classifier’s probabilistic output. If the probability of a user’s classification is above the threshold, we assign that user the predicted outcome; when the probability is below the threshold, we assign a value of ‘unknown.’ Table 3 reports results for the best-performing model on each dataset; its ‘Unknown’ column gives the proportion of users in each classification task with probabilities below the stated threshold.
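A minimal sketch of this thresholding step (continuing the hypothetical names from the pipeline sketch above; sklearn classifiers expose predict_proba and classes_):

```python
import numpy as np

def apply_threshold(probs: np.ndarray, classes, threshold: float):
    """Keep the predicted class where the classifier is confident enough;
    label everyone else 'unknown'. `probs` has one row per user."""
    best = probs.max(axis=1)
    labels = np.asarray(classes).astype(str)[probs.argmax(axis=1)]
    return np.where(best >= threshold, labels, "unknown")

# e.g. apply_threshold(clf.predict_proba(theta_test), clf.classes_, 0.70)
```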

Classification results

We first calculate F1 scores using 10-fold cross-validation on the training data (Table 1). For those results, we repeat the classification with five different balanced, random samples and report the average F1 scores across all folds and samples. We also report F1, precision, and recall scores on the entirely held-out test data (Table 2). This test data had no role in creating the topic models or other features and better approximates real-world performance.

In Table 1 we compare the performance of our models on the training data. The neural network, SVMpoly, and SVMrad models perform similarly on the network dataset, and the SVMlin and neural network models perform comparably on the political-text and network-only datasets. The F1 scores for the dataset combining political text and network data are generally lower than for either alone, suggesting the signals from these two sources partly overlap.

In Table 2, we compare the neural network, SVM and NB model performances on the unseen test data averaged across five balanced samples. The results show the best performing classifier changes with the task. The neural network model performs best on the political dataset, but the SVMpoly classifier has the highest F1 score for both hybrid datasets. Lastly, for the non-political and network datasets the SVMrad model has the highest F1 score.

Our main objective in this paper is to understand how well political leaning can be predicted from non-political text, so we further explore how different probability thresholds can be applied to the outputs of the best-performing classifiers and report the results in Table 3.

For the non-political dataset we select the SVMrad model, and for the non-political+network dataset we select the SVMpoly model. Classifying political leaning from non-political tweets is more challenging than using political tweets. Incorporating network data helps increase coverage and accuracy in the non-political case. However, even without network data, it is possible to infer a political leaning for nearly half of the users with high performance (for a probability threshold of 0.51, the F1 score is 0.90, but 56% of users are classified as being of an unknown political leaning).

Estimating political leaning from political tweets has a similar F1 score to using network data alone. However, combining network and political text data results in lower confidence estimates (and a higher number of unknown users than either text or network data alone at each threshold).

Applying a probability threshold makes comparison across models easier: the threshold excludes users with lower-certainty results and marks them as unknown. The impact of each threshold on performance is reasonably consistent across input datasets: F1 scores increase with higher thresholds, but so too do the percentages of users classified with an unknown political leaning. For example, for the non-political input dataset, a 0.62 probability threshold increases the F1 score by up to 23 percentage points but also marks nearly 97% of users in the dataset as unknown. The effects are similar for other input datasets but less extreme.

Dataset Prob. Threshold F1 Prec. Rec. Unknown
net 0.50 0.86 0.85 0.87 0.00
(SVMrad) 0.64 0.90 0.88 0.92 0.23
0.68 0.95 0.95 0.95 0.41
non-pol 0.50 0.77 0.75 0.79 0.00
(SVMrad) 0.51 0.90 0.93 0.87 0.56
0.62 1.00 1.00 1.00 0.97
non-pol+net 0.50 0.85 0.82 0.88 0.00
(SVMpoly) 0.70 0.90 0.90 0.90 0.36
0.78 0.95 0.95 0.95 0.47
pol 0.50 0.85 0.83 0.87 0.00
(NN) 0.84 0.90 0.88 0.92 0.27
0.94 0.95 0.92 0.97 0.41
pol+net 0.50 0.81 0.75 0.87 0.00
(SVMpoly) 0.74 0.91 0.90 0.92 0.52
0.76 0.96 0.97 0.94 0.59
Table 3: Example thresholds and F1, precision, and recall scores for the best performing models (the models with the highest F1 scores in Table 2) for each dataset. Different probability thresholds are set to give base, 0.9 and 0.95 F1 scores.

Post-hoc analysis

We extract the eleven most important topics and the ten most influential Twitter accounts from our best-performing non-political+network SVMpoly model.

Table 4 shows the 10 most significant Twitter accounts in the SVMpoly non-political+network model. Using our ground-truth dataset, we compute the percentage of left- and right-leaning users who follow each account and find that a higher proportion of the users following @10DowningStreet, @realDonaldTrump, and @JeremyClarkson are right-leaning. The remaining accounts are followed more by left-leaning users than right-leaning users. Although most of these accounts are political, two are not: Jeremy Clarkson, the former Top Gear presenter, and Charlie Brooker, the creator of Black Mirror.

Account Screen Name Left (%) Right (%)
Jeremy Corbyn @jeremycorbyn 47 11
Owen Jones @OwenJones84 33 4
UK Prime Minister @10DowningStreet 17 36
Donald J. Trump @realDonaldTrump 16 39
Charlie Brooker @charltonbrooker 30 8
Caroline Lucas @CarolineLucas 25 2
Jeremy Clarkson @JeremyClarkson 15 35
The Green Party @TheGreenParty 20 3
Jon Snow @jonsnowC4 25 7
NHS Million @NHSMillion 21 3
Table 4: The most influential Twitter accounts identified by the non-political+network model, sorted by importance. The Left column displays the percentage of users in the ground-truth data labelled as left-leaning who follow the account; the Right column displays the equivalent value for right-leaning users. All ten accounts are verified (have a blue checkmark badge) on Twitter.

After detecting the most important topics, we assign a label to each based on its most prevalent words and estimate the effect of political leaning on topic prevalence. Figure 3 shows the contrast between leanings. The top words for these topics are displayed in Table 5, and a complete list of all topics is available in the supplemental materials. The word lists are developed in two steps: first, we list the top 10 words by FREX, LIFT, and log scores; then we concatenate these three lists in that order and extract the top 15 unique words. (“FREX is the weighted harmonic mean of the word’s rank in terms of exclusivity and frequency,” “LIFT weights words by dividing by their frequency in other topics, therefore giving higher weight to words that appear less frequently in other topics,” and “Score divides the log frequency of the word in the topic by the log frequency of the word in other topics” Roberts et al. (2019).)

Figure 3 shows that topics discussing social democratic values are more prevalent among left-leaning users. A gardening-related topic also leans left, while the Premier League and a pollution-related topic lean right. It should be noted that most 95% confidence intervals cross zero, indicating that any left–right distinction is subtle.

Figure 3: Topic prevalence. We estimate a regression predicting the proportion of each document about each topic in the STM model using political leaning. We take a global approach in approximating the measurement uncertainty in the topic proportions. We report the topic proportion estimates and 95% confidence intervals for the 11 topics with the largest left–right differences. For more details see Roberts et al. (2019).
Topic Name Features
London/Beer unit_kingdom, kingdom, ben, unit, beer, pub, london, januari, station, tim, wimbledon, cheat, rip, slice, peak
Pollution plastic, contain, ocean, pollut, dr, usa, sea, recycl, dwp, path, locat, commit, toe, current, planet
Follow/Unfollow unfollow, automat, peopl_follow, stat, follow, check, person, ireland, mention, reach, irish, track, retweet, big_fan, joe
Technology/EU android, dj, ee, eu, app, greec, independ, mobil, scotland, ipad, euro, europ, io, beat, scottish
Entertainment beth, xx, haha, xxx, deploy, sean, artist, nicki, wire, hahaha, jay, spain, boyfriend, til, willi
Africa/Corruption ni, nigeria, lawyer, african, corrupt, parcel, bbcradio, lol, journo, arrest, wk, individu, brazil, pregnant, id
MUFC/Premier League mufc, manutd, unit, rooney, manchest_unit, utd, goal, golf, moy, mourinho, ronaldo, pogba, score, fifa, leagu
Sugar Tax frog, level, skill, c***, sugar, committe, divin, nhs, suspect, holland, hunter, jeremi, bullshit, jeremi_hunt, ww
Premier League west_ham, spur, ham, salah, everton, kane, chelsea, klopp, pogba, lukaku, goal, mourinho, tottenham, midfield, liverpool
Soc. Dem. Values foodbank, poverti, nhs, wage, worker, crisi, fund, wick, solidar, food_bank, grenfel, bank, auster, polit, privat
Gardening bloom, garden, cute, layer, kinda, ship, charact, rich, bless, hug, glad, mom, damn, lmao, gonna
Table 5: Words belonging to the corresponding topics. Unique words are selected by combining the top 10 words by FREX, LIFT, and log scores, in that order. Full topic models are in the supplemental materials.

Effect of political activity

After labelling political tweets as discussed, we develop a political activity index by dividing the number of political tweets by the number of overall tweets per user. To determine whether there is a relationship between political activity and classifier performance, we apply a Pearson correlation test to the classifier probabilities and our measure of political activity. For the best-performing non-political textual model (SVMrad), we find only very weak correlations between the classifier probability and political activity (r = 0.15), the number of tokens (r = 0.02), and leaning scores (r = -0.02). The best-performing political textual model (NN) similarly has weak correlations between its predicted probabilities and political activity (r = 0.21), the number of tokens (r = 0.17), and leaning scores (r = 0.13).
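A minimal sketch of the index and the correlation test (the per-user inputs are assumed):

```python
from scipy.stats import pearsonr

def political_activity(n_political: int, n_total: int) -> float:
    """Share of a user's tweets that are labelled political."""
    return n_political / n_total

# Assumed inputs: (n_political, n_total) pairs and per-user classifier
# probabilities, aligned by user.
# activity = [political_activity(p, t) for p, t in tweet_counts]
# r, p_value = pearsonr(classifier_probabilities, activity)
```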

Case study: Polarization and news sharing

Having developed a classifier that successfully predicts users’ left–right political leaning even with non-political text as input, we now apply it in a short case study. Social media has become intertwined with ideas of political polarization and echo chambers in popular culture Pariser (2011). There are, of course, many studies that show polarization in explicitly political contexts, such as the hyperlinks of political bloggers Adamic and Glance (2005), the sharing of political news Flaxman et al. (2016), and Twitter activity in the context of politics Barberá (2015); Hong and Kim (2016).

Despite this, there is a lack of studies examining the polarization and echo chamber hypotheses in diverse contexts (for more, see Dubois and Blank, 2018). The Internet offers the potential to access a variety of information sources, and enthusiasm for politics and consumption of more media sources correlate negatively with the likelihood of being in an echo chamber Dubois and Blank (2018); Barberá et al. (2015).

In this case study, we apply our non-political classifier to the sharing of news from three UK media outlets: The Telegraph (right-leaning), the BBC (centrist), and The Guardian (left-leaning). The case study fulfils two objectives. First, it demonstrates the face validity of our classifier by matching common expectations: political news from The Telegraph should be shared chiefly by right-leaning users, political news from The Guardian should be shared mostly by left-leaning users, and the BBC should be the least polarized of the three. Second, the case study advances the scholarly conversation on polarization by examining the sharing of non-political news, namely sports news, from these three outlets. Our results show that many right-leaning users consume sports news from The Guardian, suggesting that more right-leaning users engage with at least the sports section of The Guardian than conventional wisdom would expect.

Data and methods

We examine the number of left- and right-leaning accounts sharing news in a two-by-three design. Our three news sources are The Telegraph, the BBC, and The Guardian, and our two news types are political and sports news.

We first use elevated Twitter Streaming API access to collect URLs related to theguardian.com, telegraph.co.uk, and bbc.co.uk. We then manually examine these URLs to identify URL patterns explicitly associated with political news and sports news. For example, all URLs containing bbc.co.uk/sport/ are labelled as sports news from the BBC, and all URLs containing theguardian.com/politics/2018/ are labelled as political news from The Guardian. A complete list of URL patterns is available in the supplemental materials.
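A minimal sketch of this labelling step, using the pattern table from the supplemental materials (Table 7):

```python
# Pattern table mirroring Table 7 in the supplemental materials.
PATTERNS = {
    ("Guardian", "political"): ["theguardian.com/politics/2018/"],
    ("Guardian", "sport"): ["theguardian.com/sport/2018/"],
    ("BBC", "political"): ["bbc.co.uk/news/uk-politics-"],
    ("BBC", "sport"): ["bbc.co.uk/sport/"],
    ("Telegraph", "political"): ["telegraph.co.uk/politics/2018/"],
    ("Telegraph", "sport"): ["telegraph.co.uk/football/2018/",
                             "telegraph.co.uk/cycling/2018/",
                             "telegraph.co.uk/cricket/2018/",
                             "telegraph.co.uk/rugby-union/2018/"],
}

def label_url(url: str):
    """Return (source, news type) for a shared URL, or None if nothing matches."""
    for (source, news_type), needles in PATTERNS.items():
        if any(needle in url for needle in needles):
            return source, news_type
    return None
```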

Returning to the Twitter Streaming API, we collect all tweets sharing URLs that match any of our patterns between 1 September 2018 and 8 September 2018. We then query the 3,200 most recent tweets of each user and collect the friends of each user appearing in our dataset, using the Twitter RESTful API statuses/user_timeline and friends/ids endpoints.

Each user’s recent tweets and friends are used as input to estimate that user’s left–right political leaning. We clean the text as described in the development of our classifier and combine all tweets from a given user into one document per user. The resulting DFMs are extremely sparse (approx. 25M features). We therefore remove terms with a total frequency lower than three, which makes the DFM dimensions computationally feasible (approx. 5M features). We then drop features that are not present in our non-political DFM, which also removes political terms. In the last step, we fit the non-political topic model to the news data to predict the political leaning of each user.
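A minimal sketch of this feature-alignment step (operating on a scipy sparse DFM; names are hypothetical):

```python
import numpy as np

def align_features(dfm, vocab, reference_vocab):
    """Keep only columns whose total count is at least three and whose term
    also appears in the reference (non-political) DFM's vocabulary."""
    totals = np.asarray(dfm.sum(axis=0)).ravel()
    reference = set(reference_vocab)
    keep = [j for j, term in enumerate(vocab)
            if totals[j] >= 3 and term in reference]
    return dfm[:, keep], [vocab[j] for j in keep]
```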

Our setup matches the non-political+network approach described above. To classify the users sharing political and sports news, we select the SVMpoly classifier, which yielded the best performance on the non-political+network task, and set the probability threshold to 0.7.

News sharing results

The results of our classification are shown in Table 6. They conform with our hypothesis that, for political news, the political leaning of users matches the political leaning of the publishers. For example, among users sharing political news who were classified as left-leaning, over 75% shared articles from the left-leaning Guardian. Similarly, over half of the users classified as right-leaning shared political news links from the right-leaning Telegraph.

In contrast to politics, we find The Guardian is the most popular source of sports news among users classified as either left- or right-leaning. In general, right-leaning users share more sports news than left-leaning users. Even so, the number of right-leaning users sharing sports news from The Telegraph is 8.5 times the number of left-leaning users sharing articles from that source, the largest proportional difference in the dataset. We suspect sports news sharing is influenced by availability: The Telegraph has a paywall, while the BBC and The Guardian do not. It makes sense that subscribers paying for The Telegraph would get both political and sports news from it but that non-subscribers would look elsewhere. That many right-leaning users share sports news from the left-leaning but free-to-access Guardian suggests that factors beyond political leaning, such as availability and cost, heavily influence the news sources people read.

Overall, the case study demonstrates our classifier’s face validity: it detects the expected leanings of users sharing political news. At the same time, it suggests that the sharing of sports news is not as polarized and points toward new avenues of study made possible by a classifier that can predict political leaning from non-political text.

Guardian BBC Telegraph Total
Political
       Left 1,223 240 161 1,624
       Right 812 377 1,230 2,419
       Unknown 650 238 334 1,222
Sport
       Left 314 295 44 653
       Right 1,631 869 378 2,878
       Unknown 956 564 115 1,635
Table 6: The political leaning of individuals sharing news, by source and type. We set the probability threshold to 0.7, which yielded an F1 score of 0.90 during our model development (non-political+network). The highest value in each row, excluding the total, is in bold.

Discussion

Polarization is often studied in explicitly political contexts; however, understanding the actual effects of polarization requires researchers to explore non-political contexts. This, in turn, requires the ability to estimate political leaning in these non-political contexts. Building on the concept of ‘lifestyle politics’ DellaPosta et al. (2015), this paper has shown it is possible to estimate political leaning from non-political text.

What exactly counts as non-political text is unclear. Our preliminary analysis found that tweets sent outside of election periods were still often political in nature (e.g., containing discussion about the UK’s relationship with Europe). In response, we developed an approach to define and expand a list of political keywords, and only tweets without these keywords were considered for our tasks with non-political text. The topics displayed in Table 5 suggest this worked reasonably well. However, a qualitative inspection of the topics reveals a small number of political tweets remain in the non-political dataset: eight of the 150 topics (5.3%) are political (Israel–Palestine, Scottish independence, Brexit, Macron–Merkel, the Green Party, UK politics, US politics, Westminster). Nonetheless, these form a tiny part of the dataset and cannot be responsible for the performance of our classifiers. Indeed, applying the best-performing classifier trained on political text to the non-political text results in accuracy no better than random chance.

We find only weak correlations between classifier performance and either political activity or political leaning. This suggests that individuals with different activity levels and at different points on the left–right spectrum all leave traces of their political leaning.

The use of Voter Advice Applications (VAAs) for ground-truth data represents an exciting way to build larger datasets for training and testing machine learning classifiers. In contrast to human coders (e.g., Conover et al., 2011) or crowdsourcing (e.g., Iyyer et al., 2014), VAA results are self-reported. Preoţiuc-Pietro et al. (2017) used self-report surveys, but VAAs can yield larger sample sizes, provide a longer assessment, and rely on the genuine motivation of users, who also learn how their political preferences align with major parties. Nonetheless, it will be important to understand any biases in who shares VAA results on social media platforms. For instance, it may be that only the most politically engaged users share VAA results on social media.

In addition to the variety of input data used, prior research has also tried various machine learning approaches to classifying political leaning, including SVM, Naïve Bayes, and neural networks. Reported performance scores range from 60% to 90% Preoţiuc-Pietro et al. (2017); Gu et al. (2016); Jelveh et al. (2014); Iyyer et al. (2014); Yu et al. (2008); Kulkarni et al. (2018). Thus, our F1 score of 0.81 for political text is good, and achieving an F1 score of 0.70 for non-political text alone, or 0.75 for non-political text and network data, is in line with other work using explicitly political text.

Ideology, of course, is much broader than a simple left–right or liberal–conservative spectrum. Our work therefore complements approaches such as that of Graham et al. (2012), who apply moral foundations theory and find liberals and conservatives do not differ dramatically on moral foundations. The topics we find echo the moral foundations Graham et al. identify: liberals give higher importance to individualizing foundations (harm/care and fairness/reciprocity), while conservatives give higher importance to binding foundations (in-group/loyalty, authority/respect, and purity/sanctity). That the pollution topic is more associated with the right may seem counter-intuitive, but we note the effect sizes are small and the analysis considers only prevalence, not attitude or sentiment.

We find the accounts that a user follows often reveal that user’s political leaning. While some of these accounts are explicitly political, others, such as Jeremy Clarkson, a former Top Gear presenter, and Charlie Brooker, the creator of Black Mirror, are not. In other words, interest in Top Gear/Grand Tour or Black Mirror may hint at individuals’ political leaning more than expected. It may be unexpected that an official government account such as that of the UK Prime Minister divides left and right so clearly, but it is worth noting in the UK context that the Conservative Party has been in power since 2010.

Although our F1 scores indicate there are left–right differences, claiming that all right-leaning users, or all left-leaning users, behave alike would be a gross oversimplification. There may be significant differences in other countries and cultures. In addition, our data are not representative: we cannot generalize the results of this study to explain complex structures such as conservative or social-democratic ideologies.

Our brief case study of news sharing hints at promising ways in which classifiers of non-political content may be used. We found that political news sharing on Twitter is associated with the political leaning of users, but that sports news sharing is less polarised. Many accounts classified as right-leaning shared sports news from The Guardian, a left-leaning source. Telegraph sports news, however, was still heavily associated with right-leaning users. We suspect paywalls partially drive this result: The Telegraph is the only source with a paywall in our data. It may be that many right-leaning users look to The Guardian and other sources for sports news rather than pay for access to The Telegraph. On the other hand, individuals who already pay for access to The Telegraph likely also choose to get their sports news there.

The level of polarization observed on social media thus depends in part on the type of content examined. As mentioned, most studies examine polarization in the context of the sharing of explicitly political content. However, our results indicate that a more holistic view of the content shared beyond politics may show less polarization. Our results also suggest that editorial decisions such as having a paywall or not affect the level of polarization observed in link-sharing on social media.

Conclusions

As political orientation influences and correlates with many aspects of non-political life, there is good reason to expect that political leaning can be inferred even from non-political text.

We first developed a classifier using political text from tweets primarily associated with general elections in the UK, achieving an F1 score of 0.81. We then collected recent tweets from users outside of any election period and removed tweets with political keywords. Using this non-political text data to estimate the political leaning of users, we were still able to achieve an F1 score of 0.70. Furthermore, incorporating the network information of these users alongside the textual features from their tweets further increased the F1 score.

Using our classifier, we were able to classify the political leaning of users sharing political and sports news on Twitter from three UK national news sources. Our analysis indicated the importance of examining polarization in political and non-political contexts.

This study has shown that political leaning can be classified from non-political text and has highlighted the importance of studying political leaning in non-political contexts. We hope this paper can help spur research to understand polarization in non-political contexts.

References

Appendix A Supplemental materials

This supplemental material includes additional detail on how we identify political content, the URL patterns we use to identify the sharing of political and sports news, and all topics in the 150-topic models for both non-political and political content.

Identifying political content

Political Words


#askmay, #battlefornumber10, #bbcdebate, #bbcelection, #bbcqt, #bbcsp, #believeinbritain, #brexit, #brexitdeal, #bristolgreenmp, #bristolwest, #budget2015, #budget2017, #cameronmustgo, #conservative, #conservatives, #corbyn, #cpc17, #dementiatax, #dupcoalition, #ed4pm, #election, #election2015, #election2017, #electionday, #electionday2017, #forthemany, #ge15, #ge17, #ge2015, #ge2017, #generalelection, #generalelection17, #generalelection2017, #getcameronout, #greens, #greensurge, #hungparliament, #imvotinglabour, #itvdebate, #jc4pm, #jc4pm2019, #jeremycorbyn, #jezwecan, #labour, #labourdoorstep, #labourleadership, #labourmanifesto, #labourmustwin, #le2017, #leadersdebate, #libdem, #libdemfightback, #libdems, #makejunetheendofmay, #marrshow, #mayvcorbyn, #newsnight, #nhscyberattack, #nigelfarage, #peston, #plaid15, #pmqs, #queensspeech, #registertovote, #ridge, #saveilf, #snp, #snpout, #tories, #toriesout, #toriesoutnow, #tory, #toryelectionfraud, #torymanifesto, #ukip, #uklabour, #victorialive, #vote, #vote2017, #votecameronout, #voteconservative, #votegreen2015, #votegreen2017, #votelabour, #votelibdem, #votematch, #votesnp, #voteukip, #weakandwobbly, #whyimvotingukip, #whyvote, #wsyvf, @aaronbastani, @ainemichellel, @amberruddhr, @amelia_womack, @andrewspoooner, @andysearson, @angelarayner, @angiemeader, @angry_voice, @angrysalmond, @annaturley, @barrygardiner, @bbcbreaking, @bbclaurak, @bbcnews, @bbcpolitics, @bbcr4today, @bbcscotlandnews, @bonn1egreer, @borisjohnson, @brexitbin, @bristolgreen, @britainelects, @campbellclaret, @carolinelucas, @cchqpress, @charliewoof81, @christinasnp, @chukaumunna, @chunkymark, @conservatives, @corbyn_power, @corbynator2, @d_raval, @daily_politics, @darrenhall2015, @david_cameron, @davidjfhalliday, @davidjo52951945, @davidlammy, @davidschneider, @dawnhfoster, @debbie_abrahams, @dmreporter, @dpjhodges, @dvatw, @ed_miliband, @el4jc, @emilythornberry, @evolvepolitics, @faisalislam, @fight4uk, @frasernelson, @gdnpolitics, @georgeaylett, @georgeeaton, @grantshapps, @guardian, @guardiannews, @guidofawkes, @hackneyabbott, @harrietharman, @harryslaststand, @hephaestus7, @hrtbps, @huffpostuk, @huffpostukpol, @iainmartin1, @iandunt, @imajsaclaimant, @independent, @ipsosmori, @itvnews, @jacob_rees_mogg, @james4labour, @jameskelly, @jamesmelville, @jamieross7, @jeremy_hunt, @jeremycorbyn, @jeremycorbyn4pm, @jimwaterson, @joglasg, @johnmcdonnellmp, @johnprescott, @johnrentoul, @jolyonmaugham, @jon_swindon, @jonashworth, @joswinson, @keir_starmer, @kevin_maguire, @kezdugdale, @krishgm, @laboureoin, @labourleft, @labourlewis, @labourpress, @ladydurrant, @leannewood, @liamyoung, @libdempress, @libdems, @louisemensch, @lucympowell, @mancman10, @marcherlord1, @markfergusonuk, @mhairihunter, @michaelrosenyes, @mikegalsworthy, @mirrorpolitics, @mmaher70, @molly4bristol, @mollymep, @montie, @mrmalky, @msmithsonpb, @natalieben, @newsthump, @nhaparty, @nhsmillion, @nick_clegg, @nickreeves9876, @nicolasturgeon, @nigel_farage, @normanlamb, @nw_nicholas, @owenjones84, @patricianpino, @paulmasonnews, @paulnuttallukip, @paulwaugh, @peoplesmomentum, @peston, @peterstefanovi2, @petewishart, @plaid_cymru, @politicshome, @prisonplanet, @rachael_swindon, @realdonaldtrump, @reclaimthenews, @redhotsquirrel, @redpeter99, @rhonddabryant, @richardburgon, @richardjmurphy, @robmcd85, @rosscolquhoun, @ruthdavidsonmsp, @sarahchampionmp, @scottishlabour, @scottories, @screwlabour, @skwawkbox, @skynews, @skynewsbreak, @socialistvoice, @standardnews, @stvnews, @sunny_hundal, @survation, 
@suzanneevans1, @telegraph, @telegraphnews, @telepolitics, @thecanarysays, @thegreenparty, @themingford, @thepileus, @theredrag, @theresa_may, @theresa_may’s, @therightarticle, @thesnp, @thisisamy_, @timfarron, @timothy_stanley, @tnewtondunn, @tomlondon6, @toryfibs, @trevdick, @trobinsonnewera, @uk_rants, @ukip, @uklabour, @vincecable, @wesstreeting, @willblackwriter, @wingsscotland, @wowpetition, @yougov, @yvettecoopermp, abbot, abbott, bennett, bernie, boris, brexit, cameron, cameron’s, campaign, campaigning, candidate, candidates, canvassing, centrist, clegg, clegg’s, clinton, coalition, communist, comres, con, conservative, conservatives, constituency, constituents, corbyn, corbyn’s, councillor, councillors, cymru, davidson, debates, dem, democrat, democrats, dems, dup, election, elections, electoral, entitlement, eu, fallon, farage, farage’s, farron, fein, fiscal, gardiner, ge, gop, gove, govern, government, government’s, govt, govt’s, greens, grn, guardian, hamas, hammond’s, hillary, hustings, icm, ifs, ind, ira, isis, islamist, johnson, johnson’s, jones, labour, labour’s, labours, ld, ldem, leanne, lib, libdem, libdems, liberal, manifesto, manifestos, may’s, mcdonnell, miliband, miliband’s, milliband, minister, mogg, mps, nuttall, obama, oth, palin, paxman, plaid, pm’s, policies, policy, poll, polling, polls, portillo, putin, queen’s, ref, referendum, republican, republicans, rudd, russia, salmond, sanders, scrapping, sinn, snp, snp’s, socialist, soros, sturgeon, sturgeon’s, terror, terrorism, terrorist, theresa, thornberry, tns, tories, tories’, tory, trident, trump, ukip, ukip’s, unionist, vaughan, vladimir, vote, voter, voters, votes, voting, yougov

Ambiguous Words


free, #amazonbasket, #beastfromtheeast, #bournemouth, #colchester, #eurovision2015, #finland, #firearms, #foxhunting, #freebies, #glastonbury, #grenfell, #grenfelltower, #leeds, #londonbridge, #marr, #onelovemanchester, #spanishgp, #thursdaythoughts, #trident, #uk, #york, #youtube, @daraobriain, @emmakennedy, @jamin2g, @janeygodley, @kindleuk, @marcuschown, @paulbernaluk, @pinknews, @rustyrockets, @sophiareed1, @swanseacity1, 10m, 1bn, 2015, 34, 8bn, 8th, allowance, amber, ars, attack, austerity, baptist, behave, benz, branch, burrows, buts, chaos, clarkson, conference, cons, costed, costings, cuts, davey, david, deal, debate, deceased, defend, dementia, diane, diaz, dodging, donald, dunk, ed, emily, esther, exc, factions, fairer, fees, firearms, firefighters, foxes, freddy, general, glastonbury, grenfell, gy, hamlets, hs2, hsbc, ht, htt, hung, inheritance, intention, jeremy, june, kensington, kyle, lab, lambert, landslide, leader, leaders’, leaflet, leafleting, leaflets, lorde, marginal, matched, may, mcintyre, meeting, meetup, members, membership, methodist, middlesbrough, middleton, muppets, murphy, natalie, nicola, nigel, observer, participate, party, pensioners, pledge, pledges, plymouth, posters, progressive, prop, reception, register, registered, rehearsing, results, rodgers, rt, ruth, scarpping, scrap, seat, seats, sentencing, sheeran, slater, sos, source, stable, sub, supporting, surge, tactical, tactically, th, theo, tim, transformative, tuition, uc, uk, violence, watson, weak, weir, wheat, youtuber

News case study URL patterns

Source      Type        URL pattern
Guardian    political   theguardian.com/politics/2018/
Guardian    sport       theguardian.com/sport/2018/
BBC         political   bbc.co.uk/news/uk-politics-
BBC         sport       bbc.co.uk/sport/
Telegraph   political   telegraph.co.uk/politics/2018/
Telegraph   sport       telegraph.co.uk/football/2018/
                        telegraph.co.uk/cycling/2018/
                        telegraph.co.uk/cricket/2018/
                        telegraph.co.uk/rugby-union/2018/
Table 7: URL patterns used to select sports and political news sharing on Twitter from The Guardian, the BBC, and The Telegraph for the news sharing case study.

Full topic models

This section provides the full non-political and political topic models.

Non-Political Topics

The topics we name in the paper are: 38 (Entertainment), 46 (Technology/EU), 48 (Gardening), 58 (MUFC/Premier League), 75 (London/Beer), 77 (Follow/Unfollow), 91 (Africa/Corruption), 95 (Sugar Tax), 98 (Premier League), 114 (Soc. Dem. Values), 148 (Pollution).

The explicitly political topics we identified in the non-political model are 27 (Israel–Palestine), 37 (Scottish independence), 54 (UK politics), 62 (EU politics), 69 (the Green Party), 84 (Brexit), 86 (the US), 106 (Westminster).

Table 8: Words belonging to non-political topics. Unique words are selected by combining the top 10 words by FREX, LIFT, and log scores, in the given order.
Topic Features
1 risk, daili, de, hr, late, leadership, insur, manag, thetim, innov, aug
2 youtub, youtub_video, video, video_youtub, islam, lyric, hd, feat, music_video, trailer, video_make, parodi, muslim
3 anxieti, ur, mental, mental_health, depress, mental_ill, ill, stigma, base, disord, anxious, therapist, mental_health_issu, insta, xx
4 man_love, steel, andrew, everton, liam, lfc, liverpool, gallagh, jim, lg, oasi, arena, easter_egg, klopp, giveaway
5 comic, health, daili, technolog, mentalhealth, healthcar, late, tech, busi, dc, medicin, mm, book
6 fm, mile, en, gb, step, travel, insid, leadership, eu, construct, stem, sustain
7 gay, today_find, pride, tho, netflixuk, porn, coffe, sean, ah, memori, twitter_account, dublin, domino, work_today, ugh
8 ink, illustr, draw, japanes, lesbian, omg, lil, irish, queer, ireland, submiss, dublin, plz, freelanc, eurovis
9 bronz, trophi, ps, earn, silver, gold, destini, broadcast, assassin, giveaway, gran, game
10 tourism, student, research, academ, univers, publish, phd, colleg, studi, organis, leisur, philosoph, prof, philosophi, sustain
11 lewi, xx, haha, chris, happi_birthday, birthday, gari, cheer, liam, bf, spencer, haha_good, oop, lol
12 sampl, repli, bargain, thriller, kindl, free, mysteri, buy, author, book, suspens, amazonuk, trilog, romanc
13 academi, chariti, budget, kent, educ, sector, trust, tax, manufactur, school, sussex, cyber, mat, implic
14 chanc_win, giveaway, competit, follow_chanc, enter, follow_chanc_win, chanc, follow_win, simpli, prize, winner_announc, follow_enter, follow_retweet, bundl, win
15 playlist, folk, tea, tuesday, thursday, monday, wednesday, august, decemb, novemb, reel, hardi, rhythm, elvi, octob
16 angel, pari, sam, bastard, danni, barber, pump, shite, lad, jim, barclay, glenn, innit, daft, casual
17 ebay, volum, nobl, barn, check, dvd, review, ukchang, sign_petit, petit, sign_petit_ukchang, petit_ukchang, disc, fossil
18 fb, mk, past_week, donat, past, awesom, geek, airport, cake, cycl, browni, amazonuk, phase, protein, workout
19 autom, schedul, script, web, window, marcus, loop, solut, technic, wizard, stop_work, scrape, boundari, blog, datum
20 gt_gt, gt, gt_gt_gt, album, bird, song, music, vibe, mega, moth, dope, make_sick, bloodi_good, blur, ala
21 railway, carriag, rail, destini, train, ps, delay, se, marvel, sigh, molli, virginmedia, berri, geek, lbc
22 lodg, swansea, ken, swan, leicest, anna, squar, mason, ed, instal, surnam, ceremoni, virginmedia, cardiff
23 canal, trip, boat, sun, great_day, festiv, weekend, rugbi, lock, cycl, sail, countrysid, great_weekend, visitor, temp
24 sw, pool, waterloo, citi, pr, xxxx, leader, anna, andrea, kyli, compassion, abba, samuel, yawn, grayl
25 sibl, studi, immigr, patient, diseas, evid, effect, outcom, increas, clinic, diabet, mortal, norm, intervent, genet
26 shaw, race, bet, cricket, bbcsport, bat, ffs, ball, bowl, counti, fort, moor, hardi, prove_wrong, tbh
27 antisemit, israel, jew, palestinian, jewish, isra, racist, anti_semit, semit, gaza, reluct
28 post_photo, photo, facebook, post, fisher, hors, race, xxx, parti, indi, sim, happi_day, rider, adel, xx
29 subscript, content, exclus, access, fan, renew, month, join, utd, portug, ab, manchest_unit, ferguson, leed, man_utd
30 durham, newcastl, laura, lauren, xo, hun, isl, dubai, cute, nah, luci, mc, lmao, tbh
31 develop, birmingham, role, derbi, coventri, net, softwar, job, hire, engin, infrastructur, midland, bbc_news
32 ur, pl, gal, wanna, uni, liter, babe, xo, bc, omg, ew, soz, makeup, sophi, casualti
33 detect, youtub, trailer, cinema, video_youtub, director, color, magazin, lee, ai, maggi, nasa, arch, depict, vice
34 imaceleb, doctorwho, wale, gbbo, nurs, swansea, flood, die_age, doctor, bbcone, phillip, minut_silenc, iain, evacu, sharon
35 eurovis, sync, brit, fave, song, spice, fab, chart, singl, battl, woo, wk, love_song, mexico, album
36 mi, run, race, marathon, runner, nike, crush, pace, jedi, paul, hm, endur, sprint, gym
37 scottish, scotland, independ, proport, glasgow, scot, averag, edinburgh, 3, wee, ruth, good_support, tier, uefa, ranger
38 beth, xx, haha, xxx, deploy, sean, artist, nicki, wire, hahaha, knit, navi, bbc_radio, willi, lol
39 tshirt, york, wed, edinburgh, magic, earn, badg, awesom, film, packag, surf, tee, yorkshir, chanc_win
40 keith, trek, samuel, christ, faith, god, spirit, israel, august, promis, ministri, profound, divin, amanda, revel
41 sheffield, yorkshir, south, vine, centr, ago_today, year_ago_today, sarah, year_ago, shop, hillsborough, world_big, arthur, xbox, lol
42 disabl, dwp, disabl_peopl, benefit, univers_credit, poverti, welfar, credit, homeless, auster, pip, sanction, luther, duncan
43 consult, insight, appl, io, iphon, karl, server, tim, read_tweet, cook, smartphon, broadband, samsung, softwar
44 afc, arsenal, wenger, chelsea, spur, arsenal_fan, alexi, sanchez, tottenham, emir, au, bayern, dortmund
45 legal, law, court, lawyer, lynn, ms, divorc, student, aid, justic, suprem_court, profess, mainten, academ
46 android, dj, ee, eu, app, greec, independ, mobil, scotland, ipad, tutori, currenc, rubi, versus, workout
47 lt, oxford, input, lt_lt, hey, stream, ill, nowplay, rank, hall, aha, acoust, meter, gordon, ah_good
48 bloom, garden, cute, layer, kinda, ship, charact, rich, bless, hug, hope_feel, roller, ldn, shoutout, gosh
49 thanksgiv, km, coffe, cat, tourist, holiday, gym, tube, latin, breakfast, sticker, latt, florida, unlock, kitti
50 connect, typo, lbc, garylinek, coin, sugar, lord, bot, threaten, climat_chang, fork, lord_sugar, edl, man_make, russian
51 gay, eurovis, lgbt, money_make, pride, australia, marriag, equal, sydney, homophob, gay_man, lgbtq, homophobia, abba, queer
52 bbc_news, bbc, news, great_good, portrait, exhibit, irish, ireland, wonder, museum, serena, twitter_follow, austria, thame, sad_news
53 opinion, fact, agre, understand, genuin, sens, fair, forbid, disagre, interest, logic, necessarili, piti, grasp, satir
54 bruce, anna, remain, parliament, migrant, illeg, franc, countri, politician, puppi, griev, leaver, traitor, des, imparti
55 fuck, cunt, fuckin, connor, yer, kyle, shite, ye, shit, lmao, cum, shag, fanni, fuck_hate, wank
56 writer, kickstart, movi, film, cinema, write, author, doctorwho, book, episod, haul, andr, crowdfund, day_leav, big_screen
57 follow_follow, blog, blogger, instagram, shoe, outfit, fashion, dress, gorgeous, beauti, follow_back, skirt, aso, luxuri, wardrob
58 mufc, manutd, unit, rooney, manchest_unit, utd, goal, golf, moy, mourinho, bale, trafford, ucl, cristiano, pogba
59 shade, film, star_war, hairdress, starwar, movi, hell, war, jedi, damn, shaun, laser, film_make, titan, awaken
60 lol, bald, lmao, im, omg, tho, tbh, ur, ppl, wtf, lol_good, sooo, goin, carniv, sooooo
61 cat, eye, eden, mill, insect, fuck_fuck, biscuit, ian, tree, poem, badger, windsor, good_god, andr, ha
62 deficit, el, la, los, remind, econom, en, tax, america, del, es, nationalist, merkel, macron, lo
63 turkish, lisa, bing, miss, mailonlin, harvey, louis, bloke, lbc, custom_servic, leo, robbi, vip, xx
64 art, galleri, artist, exhibit, studio, paint, print, workshop, beach, landscap, contemporari, tide, artwork
65 cricket, broad, bat, ash, stoke, test, england, bowl, root, ball, aussi, ol, over, jonni, fuck
66 thor, teacher, physic, activ, educ, child, school, teach, student, earli_year, nurseri, pe, classroom, toddler, egypt
67 stain, graphic, websit, van, ff, sign, newcastl, vehicl, updat, wall, saturday_morn, copper, frost, banner, high_street
68 exam, collin, uni, revis, essay, err, sister, lectur, graduat, hull, abbey, gcse, time_aliv, snapchat
69 green_parti, climat, green, climat_chang, ride, bike, peac, nhs, mag, hunt, pollut, heathrow, countrysid, environment, finland
70 stamp, paul, writer, morri, blue, andr, scotland, fabric, craft, recip, thumb, sew, stripe, xx, tho
71 jennif, hahaha, danni, haha, lawrenc, em, il, fav, nottingham, stop_watch, hunger, larri, fuck, im
72 market, strategi, content, social_medium, brand, social, digit, medium, trend, tip, optim, linkedin, influenc, algorithm, facebook
73 mia, xfactor, sing, factor, song, sterl, anthoni, england, fuck_fuck, gonna, fluffi, big_brother, week_day, carrot, blade
74 dec, furi, tyson, fight, fighter, box, mate, jr, pal, champ, boxer, ko, parker, big_man, aj
75 unit_kingdom, kingdom, ben, unit, beer, pub, london, januari, station, tim, buffet, slice, high_street, poet, buff
76 ppl, didnt, im, sandra, ive, doesnt, isnt, cuz, folk, what, your, hmmm, lol, hes
77 unfollow, automat, peopl_follow, stat, follow, check, person, ireland, mention, reach, shane, big_fan, monitor, vinc, find_peopl
78 long_term, ear, lfc, defend, poor, injuri, midfield, el, defens, hes, dale, trent, analys, salli, your
79 jane, lt, xxx, xo, nugget, ad, robert, hay, laura, lt_lt, sob, xxxx
80 blaze, nfl, wrestl, season, coach, defens, chief, matt, game, offens, vardi, boston, houston, panther, ref
81 maintain, action, figur, base, statement, appar, talk, link, prefer, give, mental_health, connect, make, time
82 council, communiti, wellb, local, west, committe, servic, resid, fund, volunt, social_care, neighbourhood, district, turnout, great_hear
83 nathan, josh, hall, eve, exam, jame, gonna, shit, revis, preston, homework, kennedi, quid, il, fuck
84 deal, trade, remain, democraci, agreement, union, border, european, freedom, good_pay, david_davi, european_union, leaver, negoti, norway
85 sign_petit, petit, frack, ukchang, degre, nhs, petit_ukchang, sign_petit_ukchang, sign, gaza, privatis, open_letter, georg_osborn
86 presid, america, syria, cnn, health_care, american, gun, senat, white_hous, potus, congress, fox_news, presid_unit, presid_unit_state, fbi
87 ted, steve, neighbour, chris, math, podcast, sum, wealth, hey, parliament, ch, sky_news, gorilla, puzzl, worth_watch
88 barrel, rick, turner, swansea, church, tire, brighton, rugbi, mate, morn, julia, tobi, chick, dissert, briton
89 chef, restaur, menu, manchest, lunch, food, meal, open, dish, roast, good_food, dine, norman, citi_centr, pm_pm
90 librari, museum, cat, omg, jo, daniel, liter, ur, staff, pay, patron, windsor, charlott, printer, citizenship
91 ni, nigeria, lawyer, african, corrupt, parcel, bbcradio, lol, journo, arrest, astronaut, eye_open, hoo, detain, wk
92 karaok, drink, gin, bake, tire, cake, nottingham, beer, dave, lil, gemma, cider, burnley, lush, buffet
93 choir, experi, care, young_peopl, ne, scotland, wee, kenni, glasgow, edinburgh, portray, peopl_care, jk_rowl, show_support, rowl
94 carter, saint, billi, st, fa, dan, ranger, bbcsport, band, radio, show_tonight, nra, curl, di, andi_murray
95 frog, level, skill, cunt, sugar, committe, divin, nhs, suspect, holland, jeremi_hunt, coffin, rag, blog, fascist
96 theapprentic, ps, nintendo, xbox, switch, mario, episod, game, star_war, trailer, ea, batman, starwar, consol
97 raf, york, scotland, poppi, veteran, fraser, soldier, jim, store, boss, primark, lili, today_rememb, case_miss, bbcradio
98 west_ham, spur, ham, salah, everton, kane, chelsea, klopp, pogba, lukaku, cl, mane, hazard, fuck, midfield
99 blackpool, wimbledon, itv, liverpool, strict, fabul, tenni, eurovis, ff, manutd, casualti, handsom, watson, saturday_night, im
100 vegan, hunt, anim, lash, rescu, wildlif, fox, rickygervai, protect, meat, rhino, endang, extinct, eleph
101 dart, mark, piersmorgan, lbc, pint, proud, midland, yep, kthopkin, twat, wetherspoon, hood, lap, chuck, rickygervai
102 leed, veggi, daughter, gbbo, rat, mum, charlott, sew, kim, advert, elton, usernam, crappi, amber, fuck
103 rio, euro, olymp, gold, medal, doctorwho, bridg, wale, silver, itv, andi_murray, world_record, showcas, pursuit, bbcsport
104 print, font, leed, design, poster, gig, mount, beer, brew, peter, ale, pale, punk, fest
105 album, gaga, omg, song, ariana, kate, meghan, singl, tour, icon, good_song, rihanna, banger, beyonc, nicki
106 economi, rural, britain, unemploy, budget, growth, chancellor, union, bn, westminst, commonwealth, margaret_thatcher, good_futur, gdp, booth
107 bowi, gig, david_bowi, album, david, band, music, ticket, tour, vinyl, great_weekend, tix, acoust, happi_friday, pre_order
108 mate, lincoln, lad, haha, newcastl, sunderland, hahaha, aye, dean, cunt, mate_good, morri, hahahahaha, ashley, fuck
109 muslim, islam, migrant, rape, tommi, ukrain, religion, gang, tommi_robinson, immigr, mosqu, free_speech, hate_crime, merkel, groom
110 celtic, ranger, rodger, wit, hes, scottish, bn, robert, brendan, mcgregor, iv, lennon, gerrard, daft, midfield
111 pin, fanci, newcastl, iphon, favorit, fab, nowplay, nois, bike, wear, christma_dinner, cube, lamp, rickygervai, bicycl
112 retail, water, ceo, energi, custom, market, gas, sector, regul, innov, supplier, icymi, storag, resili
113 yoga, theatr, delici, fabul, cast, workout, drag, mac, audit, dinner, repost, luv, nightclub, popcorn, rehears
114 foodbank, poverti, nhs, wage, worker, crisi, fund, wick, solidar, food_bank, grayl, live_wage, privatis, hunger, comrad
115 irl, ass, pokemon, damn, movi, dude, meme, bc, charact, shit, wizard, behold, vampir, worm, demon
116 franci, scout, shift, tattoo, beer, german, tbh, world_cup, worldcup, incid, navi, ballot, houston, troop, sadiq
117 black, smh, nah, bro, ass, mad, girl, lmao, drake, tryna, pierc, tl, black_man, black_peopl, hoe
118 cruis, safeti, driver, bbc_news, fire, leicest, car, vegetarian, member, union, exposur, fleet, trail, england_wale, fog
119 world_cup, goal, cup, fulham, footbal, leagu, worldcup, premier_leagu, sterl, england, huddersfield, pep, southgat, manchest_citi, colombia
120 franchis, agent, properti, episod, bro, soundcloud, ms, review, lol, itun, buyer, estat, mortgag, seller, xx
121 lfc, liverpool, belfast, klopp, anfield, suarez, xx, hillsborough, gerrard, lui, ski, salah
122 ironi, racist, rail, remain, ban, odd, invent, lefti, sane, tho, appropri, dodgi, mph, impli, nearbi
123 panel, energi, solar, hill, climat, local, council, sustain, green, winner, borough, effici, org, st_centuri, coal
124 outdoor, mountain, spotifi, lake, peak, trail, summit, district, honey, walk, moss, sculptur, vintag, nowplay, yorkshir
125 loveisland, georgia, alex, megan, laura, wes, love_island, liverpool, adam, beyonc, madonna, kyli, loyal, xx
126 church, fr, pray, mass, bishop, priest, cathol, mari, feast, worship, ministri, christ
127 villa, aston, cricket, dr, premier_leagu, xi, premier, england, score, mini, sydney, russia, off, good_lad, hugh
128 xxx, fab, xx, xxxx, wonder, love_show, gorgeous, smile, daughter, ador, make_smile, daisi, mt, hope_feel, bravo
129 photographi, cornwal, photograph, bird, sunset, essex, natur, wolf, photo, van, je, mere, sunris, bonfir, wildlif
130 carbon, thread, cox, bark, artist, andrea, tran, cc, jersey, energi, orient, fossil, solar, mt, word_day
131 calm, london, friday, airport, languag, uniqu, googl, ad, gatwick, german, camden, fur, den, paddington, croatia
132 welsh, cardiff, wale, languag, member, independ, william, group, english, facebook, renam, mud, royal_famili, christma_card, dump
133 recruit, warehous, graduat, virginmedia, email, glasgow, author, bobbi, web, slight, rupert, ten_minut, murdoch, amanda, email_address
134 mrjamesob, gun, gender, feminist, dumb, argument, dude, disagre, ideolog, nazi, alt, rifl, notch, ration, virtu
135 tom, jen, tit, nowplay, pint, doo, wrestl, fool, telli, tbh, minc_pie, minc, preston, fuck
136 christian, bibl, jesus, christ, ye, church, sin, pray, god, faith, satan, worship, vers, atheist, flesh
137 bristol, cook, bang, tape, record, arm, releas, bear, hideous, field, bleach, trace, gag, twelv, cd
138 googl, search, index, mobil, site, rank, content, tool, updat, link, algorithm, thx, optim, crawl, fetch
139 clarkson, cycl, favorit, eastend, driver, pride, bike, alan, gun, cyclist, waitros, potus, adida, paedophil, bicycl
140 bed, jodi, sleep, boyfriend, tire, flavour, eat, wash, nap, tesco, bedtim, payday, bra, lose_weight, crave
141 io, appl, raspberri, app, microsoft, iphon, window, ipad, code, mac, chrome, beta, browser, os, butler
142 airway, jet, nowplay, flight, rout, airport, leed, pepper, ryanair, airlin, paradis, ba, heathrow
143 norwich, neil, wes, gut, ticket, bet, championship, superb, refere, season, game_today, season_ticket, alloc, applaus, win_ticket
144 ha_ha, ha, wright, nurs, placement, ami, jo, disabl, luke, hug, brace, ar, jo_cox, orlando, lol
145 korean, riot, na, player, team, split, tournament, korea, region, champ, passiv, reddit, tier, leagu, game
146 gender, tran, feminist, sex, rape, woman, sexual, male, femal, femin, gum, misogyni, sexism, prostitut, sexual_assault
147 greg, cfc, chelsea, cont, sadiqkhan, iran, racist, mourinho, hazard, staff, diego, fifti, grenfelltow, fuck, jose
148 plastic, contain, ocean, pollut, dr, usa, sea, recycl, dwp, path, curios, litter, tonn, live_uk, crook
149 southampton, player, saint, squad, club, fc, season, midfield, loan, leagu, crook, start_season, make_chang, scorer, delight_announc
150 blake, chees, babe, holli, dale, aha, haha, joe, ahh, mac, fml, aso, dissert, deffo, bruis

Political Topic Model

This is the topic model fitted on the political text dataset.
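The caption of Table 9 below notes that each topic's unique words are obtained by combining the top-10 words from the FREX, LIFT, and log-score rankings, in that order. The following minimal Python sketch illustrates that merge-and-deduplicate step; it is an illustration only, the function name and example word lists are hypothetical, and the rankings themselves would come from the fitted topic model (e.g., the stm package's labelTopics output).

def select_unique_words(frex, lift, score, k=10):
    # Merge the top-k words from each ranking, keeping only the first
    # occurrence of each word and preserving FREX -> LIFT -> log-score order.
    seen = set()
    combined = []
    for ranking in (frex, lift, score):
        for word in ranking[:k]:
            if word not in seen:
                seen.add(word)
                combined.append(word)
    return combined

# Hypothetical example for a single topic:
frex = ["vote_labour", "labour_parti", "labour_win"]
lift = ["labour_parti", "votelabour", "dup"]
score = ["labour", "tori_govern", "vote_labour"]
print(select_unique_words(frex, lift, score))
# -> ['vote_labour', 'labour_parti', 'labour_win', 'votelabour', 'dup',
#     'labour', 'tori_govern']

Because earlier rankings take precedence, a word that scores highly on several measures appears only once, at its earliest position.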

Table 9: Words belonging to political topics. Unique words are selected by combining the top-10 words from the FREX, LIFT, and log-score rankings, in that order.
Topic Features
1 blue, chicken, stori, ukchang, thegreenparti, vote_green, food, stuff, breakfast, philip, fri, lover, elect_manifesto, clair, green
2 life, world, fulfil, promis, breakdown, announc, labour, admir, check, gap, precis
3 mate, teen, fiction, stori, wolf, truth, true_stori, beach, heart, dirti, stori_good, top_stori, love_stori
4 tomorrow, tonight_vote, tonight, vote_tomorrow, forget, sum, asham, home, centr, releas, vote_govern, liam, mp_back, week_ago, vote
5 jeremycorbyn, marri, jeremycorbyn_theresa, peopl, jeremycorbyn_labour, stand, make, peopl_vote, vote_ge, sad, theresa_jeremycorbyn, behav, neighbour, vote_jeremycorbyn, februari
6 green, parti, polit_parti_side, green_parti, today, brand, campaign, labour_back, labour_polit, labour_polit_parti, parti_ge, conserv
7 prime, prime_minist, minist, categori, presid, live, chief, birthday, call, offic, sincer, minist_theresa, life
8 soar, vote_retweet, retweet, sampl, retweet_vote, maguir, sampl_size, kevin_maguir, size, kevin, dailymailuk, tomorrow_vote, poll, faisalislam
9 bbcqt, labour_ge, hrtbps, audienc, bbcquestiontim, owenjon, dimblebi, carolineluca, soubri, anna_soubri, interrupt, hmm, audienc_member, white_man, vote_record
10 agre, matter, true, wrong, politician, vote, vote_vote, peopl_vote, chang, vote_matter, vote_parti, parti_vote, chang_vote
11 ref, vote_support, cheer, lee, wow, morn, appoint, mr, jack, laugh, coach, merri_christma, moan, asap, milliband
12 chanc_win, competit, enter, select, follow, winner, chanc, product, prize, copi, tag, entri, fanci, win
13 cameron, david_cameron, david, miliband, surgeri, ed, ed_miliband, telegraph, osborn, pay, reshuffl, georg_osborn, labour_leader, pmqs
14 gb, york, endur, sajidjavid, javid, ruthdavidsonmsp, squar, arrang, remoan, good_luck, oop, curious, sajid, corbyn
15 instagram, snapchat, girl, storylin, artist, gonna, music, fuck, favorit, song, cute, good_stori, album, horror_stori, true_stori
16 jean, william, plaid, lib_dem, lib, leann, dem, cardiff, bbcdebat, ld, erm, bad_idea, leannewood, mare, tori_major
17 rhetor, energi, water, sector, industri, connect, blog, invest, berlin, confer, ceo, sustain, retail, cc, leadsom
18 read, small, outrag, act, make_big, differ, big, luck, govt, power, wash, mouth, yep, door, love
19 bbc, bbc_news, make_happen, news, davi, bbcnew, david_davi, news_brexit, blair, fish, news_tori, river, banner, botch, brexit
20 revis, exam, histori, ur, uni, lol, god, haha, obama, annoy, horror_stori, histori_book, toy_stori, presid_obama, snapchat
21 clegg, nick_clegg, nick, send_messag, leadersdeb, mind, divis, tuition_fee, end, tuition, xenophobia, sin, childish, generous, goodby
22 hollywood, front_page, page, newspap, front, soldier, beer, host, soviet, journal, hillsborough, genocid, murray, flee, instal
23 septemb, vote, labour, check, card, life, announc, world, wild, stori, dog
24 support_peopl, owner, shop, pride, mine, wolf, oop, agent, wall, remain_win, govern_support, peopl_peopl, merri_christma, support, wto
25 carolin_luca, liberti, guidofawk, telegraphnew, brighton, compromis, fish, afneil, carolin, dpjhodg, clap, luca, retail, cricket, wilson
26 peoplesvot, peoplesvot_uk, ordinari_peopl, martin, insult, edinburgh, deliv, outcom, attempt, approach, bell, call_peopl, bbc_live, bad_brexit, uk
27 conserv, vote_conserv, parti_side, conserv_parti, conserv_win, parti_work, parti, conserv_mp, conserv_govern, today, good_good, conserv_vote, campaign, polit
28 sign_petit, ukchang, petit, sign_petit_ukchang, sign, petit_ukchang, theresa_mp, anim, hunt, farm, trophi, xx, petit_call, call_govern
29 bold, elect_quiz, quiz, uk_elect, elect_quiz_result, quiz_result, wsyvf, uk_elect_quiz, result_labour, result, exhibit
30 rend, properti, agenc, photograph, landlord, insur, provid, embrac, agent, inform, cloud, tenant, aspect, packag, vat
31 leadersdeb, russel_brand, libdem, bbcdebat, barackobama, farron, tim_farron, ge, candid, liber_democrat, natali_bennett, hust, charl, natali, elect_debat
32 technolog, comic, health, daili, late, health_care, busi, tech, con, conserv_win, ep, top_stori, great_stori, stori
33 kent, scienc, plastic, environ, natur, climat, theatr, scheme, nato, beauti, recycl, museum, marin, robot, exhibit
34 life, check, promis, world, stori, life_stori, announc, vote, true, match, wed
35 gibraltar, collabor, spanish, wto, escap, spain, freedom, de, sovereignti, evil, 4, convent, dictatorship, anti_democrat, oppress
36 lib_dem, lib, dem, vote_futur, vote_lib_dem, vote_lib, lol, tim, amber_rudd, rudd, snake, yup, gear, comic, sigh
37 cancer, healthi, test, research, mum, amaz, share, inspir, share_stori, birmingham, top_stori, fab, love_stori, charli, grammar_school
38 hiv, experi, student, care, sexual, woman, welsh, lgbt, young, inspir, share_stori, launch_campaign, sister, peopl_live, harass
39 salmond, indyref, telegraph, shot, scot, currenc, scottish_independ, scottish, scotland, independ, milliband, scotland_vote, sterl, undecid, snp
40 battlefornumb, nhsmillion, leadersdeb, doctor, nhs, dr, jeremi_hunt, ge_labour, paxman, privatis, exempt, privatis_nhs, uk_polit, ukchang, dwp
41 unbeliev, wit, ref, thick, pathet, folk, disgrac, newsnight, pr, daili_mail, fiasco, carpet, slaveri, year_year, bitter
42 termin, cornwal, googl, comput, datum, facebook, art, search, site, websit, storytel, web, museum, traffic, exhibit
43 polit, parti, polit_parti, campaign, today, judg, debat, mp, reform, system, parti_leader, british_polit, conserv
44 manchest, wisdom, uklabour, abbott, wood, dian_abbott, dian, stephen, euref, resid, grammar_school, grammar, blast, champagn, steven
45 reward, swindon, pool, leadsom, andrea, rachael, rachael_swindon, wale, voteleav, vote_leav, compassion, badg, dpjhodg, britainelect
46 isi, saudi, yemen, obama, plea, syria, clinton, refuge, donald_trump, arabia, hunger, humanitarian, barack_obama, presid_obama, barack
47 vote_lose, irish, im, gonna, magic, abort, tree, money_tree, magic_money_tree, magic_money, dublin, wanna, black_woman, lend, cathol
48 gt, gt_gt, gay, marriag, miss, facebook, news, blog, sex, datum, privaci, berlin, nake, sex_marriag, bake
49 singl_market, hard_brexit, procedur, davi, brexit_brexit, market, brexit, singl, citizen, negoti, brexit_impact, ken_clark, ep, eea, soft_brexit
50 disabl, welfar, laboureoin, welfar_cut, poverti, jon, cut, mirrorpolit, swindon, tori, tax_credit, corbyn_leader, bedroom_tax, dailymirror, harryslaststand
51 innov, guardian, ian, drink, creativ, advertis, art, ad, food, de, storytel, gif, ted, magazin
52 peoplesvot, stopbrexit, proport, brexit_deal, fbpe, represent, deal_brexit, squeez, democraci, stop_brexit, vote_system, make_voic_hear, miser, peoplesvotemarch, brexit
53 ree_mogg, mogg, ree, bori_johnson, bori, jacob, jacob_ree_mogg, jacob_ree, johnson, dear_theresa, shower, tobi, call_jeremi, michael_gove, cartoon
54 cambridg, council, stark, councillor, fbpe, resid, citi, amend, motion, local_elect, scrutini, final_deal, localelect, vote_local, district
55 ft, victoria, iandunt, folk, music, tuesday, monday, wednesday, podcast, bill, februari, final_brexit_deal, final_brexit, hillaryclinton, guardian
56 launch, campaign, today, commit, polit, parti, confer, pledg, green, futur, parti_confer
57 councillor, local, joy, council, tim, canvass, candid, resid, cpc, ward, borough, district, afternoon, session, local_govern
58 welsh, ceremoni, wale, cymru, cardiff, plaid, dawn, deputi, plaid_cymru, councillor, deputi_leader, local_govern
59 elector_commiss, custom_union, brussel, custom, commiss, 2_referendum, chequer, elector, trade, union, darren, remain_campaign, grime, reckon, regulatori
60 fb, regist_vote, regist, young_peopl, ireland, northern_ireland, northern, girl, young, obligatori, peopl_regist, make_voic_hear, mini, make_voic, young_peopl_vote
61 comprehens, smear, tori_mp, jeremi_corbyn, video, msm, accus, expos, jeremi, homeless, bomb_syria, blairit, nerv, attack_corbyn, huffpostukpol
62 deal_brexit, ree, mogg, ree_mogg, deal, jacob, erg, jacob_ree, jacob_ree_mogg, brexit_deal, raab, domin_raab, irish_border, white_paper, countri_brexit
63 skynew, ukip_vote, skynewsbreak, gold, airport, hour, leed, maguir, idiot, abbott, olymp, flight, maker, yorkshir, kevin_maguir
64 vote_labour, labour_parti, labour_win, labour, vote_labour_vote, labour_govern, votelabour, dup, tori_govern, labour_vote, theresa_lose, lose_major, labour_gain, labour_elect, labour_labour
65 realdonaldtrump, piersmorgan, potus, crook, presid, trump, nigel_farag, nigel, presid_trump, fake, anti_trump, haha, cnn, william, fbi
66 today_good, id, bbclaurak, king, enter, broadcast, code, user, text, garden, santa, privaci, marvel, februari, communic