Boston University, Boston, MA

Inferring Human Traits From Facebook Statuses

Andrew Cutler    Brian Kulis
Abstract

This paper explores the use of language models to predict 20 human traits from users’ Facebook status updates. The data was collected by the myPersonality project, and includes user statuses along with their personality, gender, political identification, religion, race, satisfaction with life, IQ, self-disclosure, fair-mindedness, and belief in astrology. A single interpretable model matches state-of-the-art results for well-studied tasks such as predicting gender and personality, and sets the standard on other traits such as IQ, sensational interests, political identity, and satisfaction with life. Additionally, highly weighted words are published for each trait. These lists are valuable for creating hypotheses about human behavior, as well as for understanding what information a model is extracting. Using performance and extracted features we analyze models built on social media. The real world problems we explore include gendered classification bias and Cambridge Analytica’s use of psychographic models.

Keywords:
Social Media, Psychographic Prediction, NLP.

1 Introduction

Facebook’s 2 billion users spend an average of 50 minutes a day on Facebook, Messenger, or Instagram [1]. Industry seeks to obtain, model and actualize this mountain of data in a variety of ways. For example, social media can be used to establish creditworthiness [2, 3], persuade voters [4, 5], or seek cognitive behavioral therapy from a chatbot [6]. Many of these tasks depend on knowing something about the personal life of the user. When determining the risk of default, a creditor may be interested in a debtor’s impulsiveness or strength of support network. A user’s home town could disambiguate a search term. Or—reflecting society’s values—a social media company may be less willing to flag inflammatory language when the speaker is criticizing their own [7].

Social media’s endlessly logged interactions have also been a boon to understanding human behavior. Researchers have used various social networks to model bullying [8], urban mobility [9], and the interplay of friendship and shared interests [10]. Such studies do not have the benefit of a controlled setting where a single variable can be isolated. However, orders of magnitude more observations in participants’ natural habitat offer more fidelity to lived experience [11]. Additionally, subjects can be sampled from countries not so singularly Western, Educated, Industrialized, Rich, and Democratic (or WEIRD, in the parlance of Henrich et al. [12]).

In this paper we show how readily different personality and demographic information can be extracted from Facebook statuses. Our reported performance is useful to learn how traits are related to online behavior. For example, sensational interests as measured by the Sensational Interest Questionnaire (SIQ) have been studied for internal reliability [13], relationship to physical aggression [14], and role in intrasexual competition [15]. Yet work connecting SIQ with social media use relies on individually labeling sensational interests in statuses and is only predictive among males [16]. Our model performs well for both males and females without hand-labeling statuses. Similarly, other research found no relationship between satisfaction with life (SWL) and status updates [17]; we show modest test set performance. Finally, although Facebook Likes have been shown to be highly predictive of many personal traits [18], language models with good performance on this dataset have been limited to predicting personality, age, and gender [19, 20, 21].

The benchmark also helps assess the efficacy of services that explicitly or implicitly rely on inferring these traits. This is valuable to those developing new services as well as to users concerned about privacy. Of particular interest is the role of psychographic models in Cambridge Analytica’s (CA) marketing strategy. From leaked internal communications, in 2014 CA amassed a dataset of Facebook profiles and traits almost identical to those in the myPersonality dataset [22]. The week after CA’s project became public, Facebook’s stock plummeted $75 billion [23]. One factor in that drop was the belief that Facebook had allowed a third party to create a powerful marketing tool that could manipulate elections [24, 22]. There are dozens of publications on the myPersonality dataset. However, this is the first to predict SIQ, fair-mindedness, and self-disclosure, which CA discussed in relation to building user models [22].

Besides performance benchmarks, the other major contribution of this paper is the set of most highly weighted words for predicting each trait. The weights also say something about human behavior. The interpretation here is more complex: regression on tens of thousands of features is fraught with over-fitting and collinearity. Despite those problems, in Section 3 we argue that the weights can still be treated as a data exploration tool similar to clustering. We provide examples of previously studied relationships that are borne out in the word lists, and believe the lists are a useful tool to develop yet unstudied hypotheses.

Highly weighted features are also an important way to analyze models. We argue in section 4.4 that a militarism predictor CA may have built is accurate, but extracts obvious features. Additionally, by inspecting the features in an Atheist vs. Agnostic classifier we find many gendered words. We demonstrate the bias empirically, then fix the classifier to be more fair. This approach is instructive for interrogating more critical models built on social media data.

This paper includes many contributions that could stand alone. We show that the text of Facebook statuses can predict user SWL and SIQ. We expand the prediction of political identity from a single spectrum (liberal/conservative) to twelve distinct ideologies with varying levels of overlap and popularity. On that task, we establish state of the art performance with a model that also provides informative features for every pairwise political comparison. We recreate models CA may have built, and report their performance and the type of information they extracted. We bring character level deep learning to gender prediction. To our knowledge, we also set the standard for predicting IQ, fair-mindedness, self-disclosure, race, and religion from Facebook statuses. Finally, we propose a novel method to make classification less biased.

Given the broad scope of this paper, some contributions are given less space than they would typically merit. Even so, we believe it is important to report results on many traits in a single paper. This demonstrates the power of a simple model and allows task difficulty and extracted features to be compared across traits without concerns about changing experimental setup.

2 Background

2.1 myPersonality Dataset

From 2008 to 2012, over 7 million Facebook users took the myPersonality quiz produced by the psychologist David Stillwell [11]. After answering at least 20 questions, users were scored on the Big Five personality axes: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Over 3 million of those users agreed to give researchers access to their extant Facebook profile and their personality scores. A much smaller subset of users answered additional questionnaires about their interests, Friends’ personality, belief in astrology, and other personal information. The research community has added to the dataset by providing race labels for several hundred thousand users; representing the text of statuses in terms of their Linguistic Inquiry and Word Count (LIWC) statistics [25]; and much more. Labels used in this study are listed in Tables 1 and 2, along with descriptive statistics. To see all available labels, visit myPersonality.org.

myPersonality.org lists 43 publications that use this data. Most work explores the relationship between personality and easily extractable features such as number of Friends or Likes, geographic location, or user-Like pairs. For example, user-Like pairs are shown to predict a user’s personality better than the judgment of that user’s spouse [26]. In 2013, Schwartz et al. introduced the open vocabulary approach (or bag of words) to personality, gender, and age prediction [19]. This significantly outperforms closed-vocabulary approaches such as LIWC that rely on domain knowledge to assign each word to one or more of 69 categories. For an excellent overview of related work, we direct readers to that paper’s introduction [19].

2.2 Language Models

2.2.1 Bag of Words

The majority of our experiments use bag of words (BoW) term frequency-inverse document frequency (tf-idf) preprocessing followed by $\ell_{2}$ regularized regression. First, the vocabulary is limited to the $k$ most common words in a given training set. Then a matrix of word counts, $N$, is constructed, where $N_{ij}$ refers to how often word $j$ is used by subject $i$. Each row is normalized to sum to one, moved to a log scale, and divided by $d_{j}$, the fraction of documents in which word $j$ appears. In more formal notation, each element of the tf-idf matrix is defined by

$$W_{ij}=\frac{1+\log\Big(\frac{N_{ij}}{\sum_{j'=1}^{k}N_{ij'}}\Big)}{d_{j}}.$$

$W$ is then normalized so each row lies on the unit sphere. $W$ can now be used for linear classification or regression with $\ell_{2}$ regularization on the parameters. This is commonly called Ridge Regression. For binary classification problems, labels are assigned values of $\{-1,1\}$ and a threshold determines the predicted label. For categorical data with more than two labels, we train a classifier on each pair of labels. The predicted label is decided by majority vote of the $\frac{c(c-1)}{2}$ classifiers, where $c$ is the number of classes.
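As an illustration of the weighting described in this section, the following is a minimal pure-Python sketch, not the implementation used for the experiments (which rely on sklearn). The function name `tfidf_matrix` and the toy documents are hypothetical. One assumption is made explicit in the code: entries for words a subject never uses are left at zero, since the logarithm of zero is undefined and the text does not say how such entries are handled.

```python
import math
from collections import Counter


def tfidf_matrix(docs, k):
    """Return (vocab, W) for a list of tokenized documents, one per subject."""
    # Vocabulary: the k most common words across all documents.
    totals = Counter(word for doc in docs for word in doc)
    vocab = [word for word, _ in totals.most_common(k)]
    col = {word: j for j, word in enumerate(vocab)}

    # d_j: fraction of documents in which word j appears.
    doc_freq = [0] * len(vocab)
    for doc in docs:
        for word in set(doc):
            if word in col:
                doc_freq[col[word]] += 1
    d = [count / len(docs) for count in doc_freq]

    W = []
    for doc in docs:
        counts = Counter(word for word in doc if word in col)
        row_total = sum(counts.values())
        row = [0.0] * len(vocab)
        for word, n_ij in counts.items():
            j = col[word]
            # (1 + log of the row-normalized count) / d_j.
            # Assumption: unused words stay at zero, since log(0) is undefined.
            row[j] = (1.0 + math.log(n_ij / row_total)) / d[j]
        # Project each row onto the unit sphere.
        norm = math.sqrt(sum(x * x for x in row))
        W.append([x / norm for x in row] if norm > 0 else row)
    return vocab, W


# Hypothetical toy corpus: three subjects, three distinct words.
docs = [["a", "b", "a"], ["b", "c"], ["a", "c", "c", "c"]]
vocab, W = tfidf_matrix(docs, k=3)
```

Note that a literal reading of the formula gives negative entries for words that make up less than $1/e$ of a subject's text; the sketch keeps that behavior rather than guessing at a correction.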

2.2.2 Character-Level Convolutional Neural Network

For gender prediction, we also train a 49-layer character-level convolutional neural network (char-CNN) described in [27]. Much like successful computer vision architectures [28], each character is embedded in continuous space and combined with neighbors by many layers of convolutional filters. Unlike BoW models, CNNs preserve the temporal dimension, allowing the use of syntactic information. While a great advantage, and theoretically more similar to human cognition, this requires different preprocessing. During training, all inputs must be the same length along the temporal axis despite the wide variation in total length of users’ statuses. We chose to split users’ concatenated statuses into chunks of no more than 4000 characters, and no fewer than 1000, as this is enough text for humans to perform gender classification [29]. Each chunk contains roughly 800 words. Chunks from the same user are assigned entirely to either the training or test set. Unfortunately, preprocessing differences do not allow for a direct comparison between methods. However, enforcing the same preprocessing for both models would necessarily limit one.
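The chunking and user-level split can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, dropping a trailing chunk shorter than 1000 characters is an assumption (the text does not say how remainders are handled), and the 20% test fraction is borrowed from the BoW setup rather than stated for the char-CNN.

```python
import random


def chunk_statuses(text, max_len=4000, min_len=1000):
    """Split one user's concatenated statuses into fixed-size chunks."""
    chunks = [text[i:i + max_len] for i in range(0, len(text), max_len)]
    # Assumption: a trailing chunk shorter than min_len is discarded.
    return [chunk for chunk in chunks if len(chunk) >= min_len]


def split_users(user_ids, test_fraction=0.2, seed=0):
    """Assign whole users, and therefore all their chunks, to train or test."""
    ids = sorted(user_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])


long_user = chunk_statuses("x" * 9500)   # chunks of 4000, 4000, 1500
short_tail = chunk_statuses("x" * 8500)  # the 500-character tail is dropped
train_ids, test_ids = split_users(range(10))
```

Splitting by user rather than by chunk keeps text from the same author out of both sets, which would otherwise inflate test accuracy.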

2.3 Labels

Tables 1 and 2 provide statistics of the continuous and categorical data respectively. What follows is a brief description of each label and how it was collected.

2.3.1 Gender

is the binary label users supplied when setting up their Facebook account. Offering this information was common before 2008, and mandatory from 2008-2014. In 2014 (after the collection of this dataset), Facebook added 56 more gender options but still uses a binary representation to monetize users [30].

2.3.2 Race

labels provided in the dataset are inferred from profile pictures using the Faceplusplus.com algorithm, which can identify races termed White, Black, and Asian. A noisy measure of visual phenotype is not the gold standard for the study of race; however, our results indicate it is related to social media use.

2.3.3 Political identity

is limited to the twelve most common responses: IPA, anarchist, centrist, conservative, democrat, doesn’t care, hates politics, independent, liberal, libertarian, republican, and very liberal. These are heterogeneous categories from an open-ended question. No work was done to limit labels to political parties (e.g. remove “doesn’t care”), disambiguate misspelled or similar responses (e.g. combine “anarchy” and “anarchist” or “liberal” and “very liberal”), or limit responses to one country. To produce the word list for Liberals and Conservatives in Table 15, we combine “liberal”, “very liberal”, and “democrat” as well as “conservative”, “very conservative”, and “republican”. The most likely meaning of IPA is the Independence Party of America, which was in its nascence during this survey. The party is most popular among young people disaffected by the two party system, a sentiment reflected by the users who report IPA.

2.3.4 Religion

categories were limited to the nine most common responses, and similar labels were combined. Three variants of Catholic (“catholic”, “christian-catholic”, and “romancatholic”) were merged to form Catholic. Likewise, Christian refers to “christian”, “christian-baptist” and “christian-evangelical”. The entire list includes: Atheist, Agnostic, Catholic, Christian, Hindu, and None.

2.3.5 Belief in star sign

is the user’s response to “Horoscopes provide useful information to help guide my decisions?” Options include: Strongly Agree, Slightly Agree, No Opinion, Slightly Disagree, and Strongly Disagree.

2.3.6 Personality

is determined by a survey on five axes: Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism. Users answer 20-300 questions which are used to score each personality component on a scale of 1-5. There is a large body of research showing that five factor analysis is explanatory for behavior [31], and its measurement is reproducible [32]. That work is now adapting to larger datasets collected online [11].

2.3.7 Sensational Interests

include Militarism, Violent-Occult, Intellectual Recreation, Occult Credulousness, and Wholesome activities. Users can indicate “Great Dislike”, “Slight Dislike”, “No Opinion”, “Slight Interest”, and “Great Interest” for 28 different items including: “Drugs”, “Paganism”, “Philosophy”, “Survivalism”, and “Vampires and Wolves”. Interest levels are calculated by summing responses from relevant items. The full calculation can be found in [13].

2.3.8 IQ

is determined by 20 questions that conform to Raven’s Standard Progressive Matrices. The development and validation of these questions is explained in [33] and [34]. Because performance on IQ tests has been rising at roughly 0.3 points a year over the past century and IQ is defined as mean 100, the scoring of a test is properly defined over an age cohort [35]. These scores do not take age into account and the mean is 114.

2.3.9 Satisfaction with life, self-disclosure, and fair-mindedness

are assessed by separate questionnaires. SWL is a measure of global well being somewhat robust to short term mood fluctuations [36].

3 The Interpretation of Feature Weights

A common approach to understand traits in social science is to solve

$$X=UT+\epsilon,$$

where $X$ is observations of subjects, $T$ is the traits of subjects, $U$ is a transition matrix, and $\epsilon$ is model error [3, 13, 37, 38, 39, 40, 41, 42, 43]. Traits are preferred to be orthogonal to promote compactness without sacrificing modeling power. The Big 5 personality model is both criticized and defended on grounds of trait independence, explanatory power, and measurability, which conforms to the linear model above [44]. Because the traits are defined by language they will not be completely orthogonal. Additionally, observations are not independent. As such, values in $U$ will have dependencies across both rows and columns. Some traits like personality are used to predict other traits or life events [13, 40]. Learning those relationships can be interpreted as informing our beliefs about column dependencies for $U$ when both traits are part of $T$.

In this paper, $X$ is the tf-idf word matrix, $T$ is defined by our labels, and the model weights are some estimate of $U$ we define as $\hat{U}$. Row dependencies in $\hat{U}$ are based on how words function. For example, ‘camp’ and ‘camping’ perform similar roles in a status. Likewise, the relationship between IQ and agreeableness will be embedded in the columns of $\hat{U}$. However, many of the tasks have little training data and the solution is ill-posed. Regularization encourages generalization, but does not provide any guarantees. Further, sometimes $\epsilon$ dominates the model when observations are not very explanatory or the relationship to a trait is not linear. Given these challenges, what confidence can be placed in the estimate $\hat{U}$?

These problems mirror those faced when clustering data. Clustering does not come with guarantees it will yield sensible answers in diverse scenarios [45]. However, it is broadly useful when exploring large sets of data [46, 47, 48]. Similarly, $\hat{U}$ can be viewed as a way of ranking features for exploration. A highly ranked observation is not proof it is important. But several highly ranked observations with functional coherence may suggest a hypothesis, particularly when coupled with domain knowledge of row and column dependencies in $U$.

The 55 most highly weighted features for each label are reported in the Appendix. Though the word lists are shown in order of importance, this ranking is not strict. Different regularization, preprocessing, or train/test splits can alter the ordering, especially when there are few examples. Additionally, more common words with lower weights may be used more often in a model’s prediction, but may not appear at the top of a list. One may use $\ell_{1}$ regularization to obtain an arbitrarily small number of non-zero weights [49]. This encourages weighting common words and provides more stable rankings. We demonstrate that approach with our IQ model in Section 4.2.5.

There are many well-studied phenomena embedded in the $\hat{U}$ produced by our work. For example, Sarah Palin is the only politician indicated in the liberal word list in Table 15. Likewise, Nancy Pelosi ranks just below Ronald Reagan among conservative words. This accords with literature on the memorability of negative ads [50], importance of outgroup prejudice for social identity [51, 52], and biases women face in politics [53, 54]. We hope the many word lists in the appendix will be useful to researchers in the development of new hypotheses.

$\hat{U}$ is also useful to understand models built on social media data. Until recently, the models themselves were not very important. However, machine learning can now be used to estimate sensitive traits such as criminal recidivism [43]. Given the literalness with which estimates are often interpreted, it is essential to note that model weights are causal for the predicted label. In Section 4.5 we use our understanding of the input features to characterize information the model extracts to predict religion. This dataset also includes demographic labels, which show predicted religion labels are more gendered than the ground truth.

We hope the included word lists (a) highlight unstudied relationships about these traits and (b) illustrate what kind of information is extracted from social media by machine learning systems.

4 Results and Discussion

4.1 Experimental Setup

All BoW experiments employ the same preprocessing. Users must have over 500 words across all their statuses. 80% of the data is randomly assigned to the training set; the remaining samples constitute the test set. The vocabulary is limited to the 40,000 most common words in each training set. Words must be used by at least 10 users but no more than 60% of users in the training set. The regularization parameter is tuned via efficient leave one out cross validation [55] when $n<10{,}000$, and $3$-fold cross validation for larger datasets. All BoW models are implemented using the sklearn library [56]. Table 1 reports the number of samples and explained variance (EV) of the predictions on continuous data. Table 2 reports the number of classes, ratio of samples in the dominant class, homogeneity, and performance on tasks with categorical data.
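The user and vocabulary filters above can be sketched in pure Python as follows. The function names are hypothetical, and the order of operations (filter by user coverage first, then keep the most frequent remaining words) is an assumption, since the text does not specify it. In sklearn, these limits appear to correspond roughly to the `max_features`, `min_df`, and `max_df` parameters of `TfidfVectorizer`, though the exact settings used are not stated.

```python
from collections import Counter


def eligible_users(user_tokens, min_words=500):
    """Keep users with more than min_words words across all statuses."""
    return {u: toks for u, toks in user_tokens.items() if len(toks) > min_words}


def select_vocabulary(user_tokens, max_size=40000, min_users=10,
                      max_user_frac=0.6):
    """Pick the most common words that satisfy the user-coverage limits."""
    n_users = len(user_tokens)
    term_count = Counter()  # total occurrences of each word
    user_count = Counter()  # number of users who use each word
    for toks in user_tokens.values():
        term_count.update(toks)
        user_count.update(set(toks))
    # Assumption: apply the coverage filter first, then rank by frequency.
    allowed = [w for w in term_count
               if min_users <= user_count[w] <= max_user_frac * n_users]
    allowed.sort(key=lambda w: (-term_count[w], w))
    return allowed[:max_size]


# Hypothetical toy training set with thresholds scaled down to match.
toy = {"u1": ["the", "cat"], "u2": ["the", "dog"], "u3": ["the", "cat", "cat"]}
toy_vocab = select_vocabulary(toy, max_size=10, min_users=2, max_user_frac=0.7)
kept = eligible_users({"a": ["w"] * 501, "b": ["w"] * 500})
```

In the toy example, "the" is dropped for being used by every user and "dog" for being used by only one, leaving "cat".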

Table 1: Prediction Accuracy on Continuous Data
Label                        N        EV
Personality
  Openness                   84451    0.171
  Conscientiousness          84451    0.120
  Extroversion               84451    0.141
  Agreeableness              84451    0.090
  Neuroticism                84451    0.100
Sensational Interests
  Militarism                 4074     0.165
  Violent-Occult             4074     0.192
  Intellectual Recreation    4074     0.033
  Occult Credulousness       4074     0.144
  Wholesome Activities       4074     0.108
Satisfaction With Life       2502     0.034
Self Disclosure              2006     0.092
Fair-Mindedness              2006     0.064
IQ                           1807     0.128

Explained Variance (EV) is $1-\frac{\mathrm{Var}(y-\hat{y})}{\mathrm{Var}(y)}$, where $\hat{y}$ is the predicted label.
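For concreteness, the EV definition can be written as a short pure-Python sketch (hypothetical helper names, toy values). It also illustrates one property of the metric: because only the variance of the residual matters, a prediction that is off by a constant still scores 1.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance(y, y_hat):
    """EV = 1 - Var(y - y_hat) / Var(y)."""
    residuals = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - variance(residuals) / variance(y)


y = [1.0, 2.0, 3.0, 4.0]
ev_perfect = explained_variance(y, y)                       # exact predictions
ev_mean = explained_variance(y, [2.5, 2.5, 2.5, 2.5])       # always predict the mean
ev_offset = explained_variance(y, [v + 1.0 for v in y])     # constant bias
```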

Table 2: Prediction Accuracy on Categorical Data
Label                  N        Classes   Mode    Homogeneity   F1-score   Acc
Gender                 109104   2         0.598   0.519         0.92       0.903
Race                   22059    3         0.682   0.52          0.74       0.766
Political identity     19769    12        0.213   0.133         0.33       0.337
Religious identity     8388     5         0.488   0.318         0.54       0.541
Belief in Star Sign    7115     5         0.331   0.245         0.32       0.334

Mode is the ratio of the dominant class. Homogeneity is the probability two random samples will be of the same class. The F1-Score is the harmonic mean of precision and recall. For non-binary labels, the precision and recall for each class is weighted by its support.
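The Mode and Homogeneity columns can be computed from a label list as in the sketch below. The helper names and placeholder labels are hypothetical, and homogeneity is computed as the sum of squared class proportions, which assumes the two random samples are drawn with replacement (a negligible difference at these sample sizes). With a 0.598/0.402 split, the result agrees with the gender row of Table 2.

```python
from collections import Counter


def mode_ratio(labels):
    """Fraction of samples that belong to the dominant class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def homogeneity(labels):
    """Probability that two samples drawn with replacement share a class."""
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())


# Placeholder labels reproducing the gender class balance in Table 2.
labels = ["a"] * 598 + ["b"] * 402
gender_mode = mode_ratio(labels)
gender_homogeneity = homogeneity(labels)
```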

Table 3: Gender Prediction
Model                   Accuracy
Human Majority Vote     0.840
LIWC                    0.784
Tri-grams               0.914
Tri-grams + LIWC        0.916
BoW (40k Vocab)         0.903
BoW (500k Vocab)        0.928
49 layer char-CNN       0.901

Human baseline is the majority vote (n=210) in gender prediction on Twitter data [29]. LIWC and Tri-grams are reported in [19].

4.2 Performance

4.2.1 Gender

Table 3 compares our gender predictor to several other methods. The BoW model with a vocabulary of 500,000 yields accuracy of 92.8%, 1.4 percentage points more accurate than the tri-gram model reported by Schwartz et al. [19]. Even though the same dataset is used, the comparison is not direct. The tri-gram model seeks to remove the age information from words, has a larger vocabulary, preserves some temporal relationships in the tri-grams, and draws a different train/test split. Moreover, the preprocessing is more restrictive and only includes users with at least 1000 words. Notwithstanding these discrepancies, which may boost or dampen performance, the results are very similar. When the LIWC representation is added to the tri-grams, there is a slight improvement to 91.6% accuracy. Preprocessing is even less similar for the char-CNN described in Section 2.2.2. The human baseline of 84.0% consists of volunteer judgments based on 20-40 user tweets as reported by Nguyen et al. [29]. This is less text than is available to the other models, and from a different social media platform. But, with 210 volunteer guesses per user, it provides a relevant human baseline.

4.2.2 Personality

After gender, personality is the most studied trait in this paper. Here too, Schwartz et al. achieve the best results to date [19]. They report the square root of EV to two significant digits: 0.42, 0.35, 0.38, 0.31, 0.31. In that format, we are just 0.01 beneath the state of the art for openness and agreeableness, 0.01 better for neuroticism, and equivalent for the remaining traits. As with gender, we achieve this with a simpler model.
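The comparison can be checked directly from the EV values in Table 1. The sketch below assumes the five reported values are listed in the same trait order as Table 1; the variable names are illustrative.

```python
import math

# Explained variance from Table 1.
ev = {
    "Openness": 0.171,
    "Conscientiousness": 0.120,
    "Extroversion": 0.141,
    "Agreeableness": 0.090,
    "Neuroticism": 0.100,
}

# Square root of EV as reported in [19]; trait order is assumed to match Table 1.
reported = {
    "Openness": 0.42,
    "Conscientiousness": 0.35,
    "Extroversion": 0.38,
    "Agreeableness": 0.31,
    "Neuroticism": 0.31,
}

ours = {trait: round(math.sqrt(value), 2) for trait, value in ev.items()}
difference = {trait: round(ours[trait] - reported[trait], 2) for trait in ev}
```

Under that assumption, `difference` comes out to -0.01 for Openness and Agreeableness, 0.01 for Neuroticism, and 0.0 for the other two traits.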

4.2.3 Political Identity

Prediction accuracy of 33.7% is a gain of 11.7% over the baseline strategy of always predicting the mode, ‘doesn’t care’. As noted in the experiments section, training samples are weighted inversely to their class representation; therefore, ignoring any class will result in an equal loss. This does not provide the highest classification accuracy. However, we believe when some classes are sparsely populated an MSE optimal classifier that is highly biased toward the mode should not be the standard. For reference, equal sample weights and the same training scheme yield classification accuracy of 36.3% and a weighted F1 score of 31.6%. With equal weights, however, five classes (IPA, hates politics, independent, libertarian, and very liberal) have no representation in the test set predictions, whereas the weighted classifier predicts each class at least once.
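One common way to weight samples inversely to class representation is sketched below. The exact formula used in the experiments is not given, so the normalization here (total weight equal to the number of samples, following the usual "balanced" heuristic) is an assumption, as are the function name and toy labels.

```python
from collections import Counter


def inverse_class_weights(labels):
    """Per-sample weights such that every class has the same total weight."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    # Assumed normalization: n_samples / (n_classes * class_count).
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Toy example: a common class and a rare class.
labels = ["common"] * 8 + ["rare"] * 2
weights = inverse_class_weights(labels)
class_totals = Counter()
for label, weight in zip(labels, weights):
    class_totals[label] += weight
```

With these weights, each class contributes the same total to the training loss, which is why ignoring any one class costs the same as ignoring any other.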

According to Preotiuc-Pietro et al., all previous research on predicting political ideology from social media text has used binary labels such as liberal vs conservative or Democrat vs Republican. They broaden the classification task to include seven gradations on the liberal to conservative spectrum [57]. When predicting ideological tilt from tweets, they achieve a 2.6% boost over baseline (19.6%) with BoW followed by logistic regression. Word2Vec feature embeddings [58] and multi-target learning with some hand-crafted labels yield an 8.0% boost. From classification along grades of a single spectrum, we significantly expand the task to twelve diverse identities with varying levels of representation and ideological overlap while maintaining classification accuracy.

In Table 6 we report the matrix of highest weighted words for separating users in each pairwise class comparison. As with race, belief in star sign, and religion, we plan on making expanded pairwise lists available online. In Table 7 we report the confusion matrix. Note that many errors are between similar labels, such as liberal and democrat. Ease of training, strong performance, and representation of minority classes make a majority vote system of shallow pairwise classifiers a good approach for this task.
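The majority vote over pairwise classifiers can be sketched as follows. The pairwise outcomes here are made up for illustration (they stand in for trained classifiers), and ties are broken by whichever label collected its votes first, which is an assumption since tie handling is not described.

```python
from collections import Counter
from itertools import combinations

LABELS = [
    "IPA", "anarchist", "centrist", "conservative", "democrat",
    "doesn't care", "hates politics", "independent", "liberal",
    "libertarian", "republican", "very liberal",
]


def majority_vote(pairwise_winners):
    """Return the label that wins the most pairwise comparisons."""
    votes = Counter(pairwise_winners.values())
    # Assumption: ties go to the label that appears first in the vote tally.
    return votes.most_common(1)[0][0]


# One classifier per unordered pair of labels: c(c-1)/2 = 66 for c = 12.
pairs = list(combinations(LABELS, 2))

# Hypothetical outcomes for a single user: "liberal" wins every comparison
# it takes part in; otherwise the first label of the pair wins.
winners = {(a, b): "liberal" if "liberal" in (a, b) else a for a, b in pairs}
predicted = majority_vote(winners)
```

Because each pairwise classifier is a separate linear model, its weights can be read off as the words that best separate that particular pair, which is what the pairwise word lists report.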

For binary comparison, by pooling {‘very liberal’,‘liberal’,‘democrat’} and {‘very conservative’,‘conservative’,‘republican’} we achieve 76.4% accuracy; 12.1% above baseline. Table 15 shows the top 55 liberal and conservative words.

4.2.4 Religion

Religion seems to be more difficult to glean from statuses than political identity. At 54.1%, accuracy is a modest 5.3% above guessing the mode. The most highly weighted pairwise words are in Table 8, and Table 9 shows the confusion matrix. The most highly weighted word to distinguish someone who is agnostic from an atheist is ‘boyfriend’. This led us to look deeper at that pairwise classifier in Section 4.5. Binary labels were constructed by pooling {‘catholic’, ‘christian-catholic’, ‘romancatholic’, ‘christian’, ‘christian-baptist’} and {‘atheist’, ‘agnostic’, ‘none’}. We achieve 78.0% accuracy, 5.2% above baseline. Those words are in Table 15. To our knowledge, there is no other multi-class religion predictor to which our results can be compared.

4.2.5 IQ

In a genome wide association meta study of 78,308 individuals, 336 single nucleotide polymorphisms were found to explain 2.1-4.8% of the IQ variance among the test population [59]. We achieve 12.8% EV with a model trained on fewer than 2000 users and their statuses. Using $\ell_{1}$ regularization to limit the vocabulary to the ten most informative words (final, physics; ayaw, family, friend, heart, lmao, nite, strong, ur) still yields 5.6% EV. The relative accuracy of such a trivial model that leverages intuitive features is a helpful comparison for any project predicting this important trait. To our knowledge, this is the only work to date that infers IQ from social media.

The selected features are also informative. Words suggesting intelligence, ‘final’ and ‘physics’, are parsimonious and singularly academic. Whereas the university experience is sufficient to find users with high IQ, features inversely related to IQ are more focused on disposition. From Table 10, agreeableness is implied by ‘family’ and ‘heart’; conscientiousness is implied by ‘family’ and ‘lmao’; and low openness is implied by ‘ur’. Overall, the list can be characterized as prosocial, or at least concerned with social relationships. Predicting low IQ with prosocial features seems to challenge some previous research.

Gottlieb et al observed that learning disabled children were more likely to engage in solitary play [60]. Play has also been observed to be more aggressive [61]. More directly related to our task, McConaughy and Ritter showed a positive correlation between the IQ of learning disabled boys and social competence scores; and a negative correlation between IQ and behavior problem scores [62]. For further review of the subject see [63].

An MSE optimal classifier seeks to generalize information about samples near the average. This can cause bias when classifying minorities, but is instructive when interpreting features. Features should say something about the majority of our sample, those with IQ near the mean. This explains why antisocial behavior among those with extremely low IQ does not preclude prosocial behavior indicating moderately lower IQ. Reflecting the limitations of this type of study, words like ‘family’, ‘friend’, and ‘heart’ could also be caused by differing norms for social media use or many other factors. That prosocial words predict lower IQ does, however, suggest interesting future work.

4.2.6 Sensational Interests

In this study, SIQ is the easiest continuous variable to predict, even with an order of magnitude less training data than personality. The SIQ lists 28 discrete interests like ‘black magic’ and ‘the armed forces’. Very similar terms can be recovered from statuses: ‘zombie’, ‘blood’, ‘vampire’; ‘military’, ‘marines’, ‘training’. Personality tests, on the other hand, ask more abstract questions like ‘I shirk my duties’ for conscientiousness. Many of these duties seem to be extracted in Table 10: ‘studying’, ‘busy’, ‘obstacles’. But many more training examples are required for similar performance.

This is the first work to demonstrate an automatic system for predicting SIQ. Previous research relied on manually counting the number of sensational interests in statuses. The count was only correlated with militarism among men; the relationship was negative for women [16].

4.2.7 Satisfaction With Life

Previous research cast doubt on the relationship between status updates and SWL [17]. The number of positive words used on Facebook nationwide in a given day, week, or month, is inversely correlated with the SWL of that time period’s myPersonality participants. The interpretation of that result is that it “challenges the assumption that linguistic analysis of internet messages is related to underlying psychological states.” Here we show that a BoW model accounts for 3.4% of the variance in SWL scores. Moreover, the most important words the model finds are intuitive. Lower SWL is implied by “fucking”, “hate”, “bored”, “interview”, “sick”, “hospital”, “insomnia”, “farmville”, and “video”. The deleterious effects of joblessness, anger, chronic illness, and isolation are well documented. Words positively associated with SWL—“camping”, “imagination”, “epic”, “cleaned”, “success”—make similar sense.

Conversational AI on Facebook Messenger is an efficacious and scalable way to administer cognitive behavioral therapy [6]. Our results show linguistic analysis can shed light on underlying psychological states. This is important to find users that could benefit from such treatment.

4.2.8 Belief in Star Sign

Compared to political identity, BSS has seven fewer classes and a far more homogeneous distribution. Even so, the BSS classifier performs slightly worse than the politics classifier and roughly on par with the baseline of predicting the mode. Unlike our race, gender, politics and sensational interests, we don’t wear belief in astrology on our sleeve.

4.3 Model Selection

BoW models are somewhat unintuitive. Humans use syntactic information when decoding language, which the model discards. Yet, for many tasks they achieve state of the art performance. We compare our BoW to a character-level CNN on gender prediction, our most data rich problem. A character-level CNN is well suited to large amounts of messy, user generated data. Pooling layers in a CNN allow generalization of words like “gooooooooo” and “gooooooo”, while BoW must learn distinct weights. Surprisingly, the CNN does not outperform the simple BoW as shown in Table 3.

We found that the choice of prediction model is less important than preprocessing. In initial experiments, Support Vector Machines [64], logistic regression, and $\ell_2$-regularized regression yielded similar performance, depending on the choice of $n$-grams and whether Singular Value Decomposition was used [65]. We implement ridge regression and classification for simplicity.
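As a rough illustration of how a ridge model over word counts yields an interpretable word list, the following sketch fits an $\ell_2$-penalized linear regression with plain gradient descent. Everything in it is an assumption made for illustration: the six toy statuses, their scores, the hyperparameters, and the helper names `featurize` and `fit_ridge` are invented here, and the optimizer is a stand-in rather than the implementation used for the reported results.

```python
from collections import Counter


def featurize(docs):
    """Build a unigram count matrix and return (vocabulary, rows)."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for word, count in Counter(doc.split()).items():
            row[index[word]] = float(count)
        rows.append(row)
    return vocab, rows


def fit_ridge(X, y, alpha=1.0, lr=0.1, steps=5000):
    """Minimize (sum of squared errors + alpha * ||w||^2) / n by gradient descent.

    The intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) + b - target
                     for row, target in zip(X, y)]
        grad_w = [2.0 * (sum(r * row[j] for r, row in zip(residuals, X))
                         + alpha * w[j]) / n
                  for j in range(d)]
        grad_b = 2.0 * sum(residuals) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy statuses and scores, invented for illustration only.
docs = ["bored hate sick", "hate bored", "camping success",
        "family camping success", "sick bored", "family success"]
scores = [2.0, 2.5, 6.0, 6.5, 2.0, 6.0]

vocab, X = featurize(docs)
weights, intercept = fit_ridge(X, scores)

# Sort words by learned weight: most negative first, most positive last.
ranked = sorted(zip(weights, vocab))
most_negative, most_positive = ranked[0][1], ranked[-1][1]
```

Reading `ranked` from either end gives the kind of highly weighted word list published in the Appendix. The penalty `alpha` shrinks all weights toward zero, which keeps rare words from dominating the list.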

Inferring human traits from social media is now being done using deep models [66, 57]. That may be useful in some cases, but for this project the deep model offered neither a performance boost nor insight into underlying human behavior. Perhaps a continuous bag of words [58] and recurrent neural network [67] would have done better, but researchers should not consider deep learning essential for this field. Moreover, any performance gains should be weighed against the loss of interpretability.

4.4 Cambridge Analytica

With current technology, Facebook statuses are a better predictor of someone’s IQ than the totality of their genetic material [59]. When a marketing firm adds such a tool to its arsenal, it is natural to be suspicious. Indeed, The Guardian article that broke the CA story was headlined “‘I made Steve Bannon’s psychological warfare tool’: meet the data war whistleblower” [24]. (Steve Bannon is the former chief executive of the Trump presidential campaign.) However, closer inspection of psychographic models casts doubt on their ability to add value to an advertising campaign, even when the predictions are accurate. In this paper we show that militarism is one of the most easily inferred traits. At 16.5% explained variance, it is more predictable than any of the Big 5 personality traits except openness, even with just 5% of the training data. SIQ is also a much stronger predictor of aggressive behavior than the Big 5 [14]. If this trait were actionable for the Trump campaign, it is interesting that the two most highly weighted features are “xbox” and “man”. Gaming interest and gender are already available via Facebook’s advertising platform; reaching that demographic does not require an independent model. Additionally, Steve Bannon’s belief in the political power of gamers predates CA’s psychographic model by a decade [68].

Readers are encouraged to view the word lists in the Appendix through the lens of the task accuracy reported in Tables 1 and 2. They may come to the same conclusion as the Trump campaign, which, according to CBS News, “never used the psychographic data at the heart of a whistleblower who once worked to help acquire the data’s reporting – principally because it was relatively new and of suspect quality and value.” [69]. Performance results and extracted features allow for more informed discussion, particularly for SIQ, fair-mindedness, and self-disclosure, for which we report the first accurate prediction model.

There are limitations to this analysis. Our models only use statuses; Likes and network statistics could increase accuracy. Further, other psychographic traits beyond militarism may be politically useful but have no obvious demographic stand-in. Finally, we don’t have access to CA’s exact dataset and instead built our models on the myPersonality dataset.

Table 4: Agnostic vs Atheist Confusion Matrix

Men              Predicted
                 Agnostic  Atheist  Total
True Agnostic    36        33       69
True Atheist     28        58       86
Total            64        91

Women            Predicted
                 Agnostic  Atheist  Total
True Agnostic    86        21       107
True Atheist     34        16       50
Total            120       37

Table 5: Fair Agnostic vs Atheist Confusion Matrix

Men              Predicted
                 Agnostic  Atheist  Total
True Agnostic    40        29       69
True Atheist     31        55       86
Total            71        84

Women            Predicted
                 Agnostic  Atheist  Total
True Agnostic    85        22       107
True Atheist     31        19       50
Total            116       41

4.5 Gender Bias in Atheist vs Agnostic Classifier

Highly weighted atheist words include “fucking”, “bloody”, “maths”, “degrees”, “disease”, “wifey”, and “religion”. Meanwhile, “beautiful”, “santa”, “friggin”, “thank”, “hubby”, “miles”, and “paperwork” imply the user is agnostic. This paints a picture of academic, male, disagreeable, and British atheists. Agnostic words are more positive, female, and related to mundane preparation. A more complete list is shown in Table 15. What follows is an empirical analysis of our estimator’s gender bias, a discussion of fairness, and results from debiasing the model.

In this dataset, atheists and agnostics are 33.5% and 50.3% female, respectively. This is a stronger female preference for agnosticism than in random surveys across the United States, which report 32% and 38%, respectively [70]. Table 4 shows the confusion matrices for men and women. The ratio of true to predicted atheists is 0.945 for men and 1.35 for women; that is, the classifier labels too few women as atheist and too many as agnostic. Similarly, the ratio of false atheist to false agnostic predictions is 90.8% larger for men than for women. The classification of women, the minority in this dataset, is highly distorted.
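The figures quoted from Table 4 can be checked with a few lines of arithmetic. The sketch below is a reading of the table, not the authors' evaluation code: it assumes the unlabeled rows of the women's matrix follow the same order as the men's (true agnostic, then true atheist), and the helper names are hypothetical. Under that reading, 0.945 and 1.35 are reproduced by dividing the number of true atheists by the number of predicted atheists in each group.

```python
# Confusion counts from Table 4, keyed by (true label, predicted label).
TABLE_4 = {
    "men": {("agnostic", "agnostic"): 36, ("agnostic", "atheist"): 33,
            ("atheist", "agnostic"): 28, ("atheist", "atheist"): 58},
    "women": {("agnostic", "agnostic"): 86, ("agnostic", "atheist"): 21,
              ("atheist", "agnostic"): 34, ("atheist", "atheist"): 16},
}


def true_to_predicted(cm, label):
    """Number of users who truly hold `label` divided by the number predicted as it."""
    true_total = sum(n for (t, _), n in cm.items() if t == label)
    predicted_total = sum(n for (_, p), n in cm.items() if p == label)
    return true_total / predicted_total


def false_atheist_to_false_agnostic(cm):
    """Agnostics mislabeled atheist divided by atheists mislabeled agnostic."""
    return cm[("agnostic", "atheist")] / cm[("atheist", "agnostic")]


ratios = {g: true_to_predicted(cm, "atheist") for g, cm in TABLE_4.items()}
error_ratios = {g: false_atheist_to_false_agnostic(cm)
                for g, cm in TABLE_4.items()}

# How much larger the men's error ratio is than the women's, as a fraction.
relative_gap = error_ratios["men"] / error_ratios["women"] - 1.0
```

With these counts, `ratios` is roughly 0.945 for men and 1.35 for women, and `relative_gap` is roughly 0.908, matching the 90.8% reported above.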

Models built to generalize information often amplify biases in training data. Cooking videos elicit female pronouns in machine-generated captions 68% more often than male pronouns, even though the training data shows only 33% more women cooking [71]. Word embeddings used in machine translation [72], information retrieval [73], and student grade prediction [74] produce analogies such as “man is to computer programmer as woman is to homemaker” [75].

There are many notions of fairness defined over an individual [76, 77, 78], population [79, 80], or information available to the model [81]. Building a fair estimator often requires domain knowledge to define a similarity metric [76], make corpus-level constraints [71], or construct a causal model that separates protected information from other latent variables [78]. In this paper, we will use the notion of Disparate Mistreatment to measure fairness [79]. That is, if protected classes experience disparate rates of false positive, false negative or overall misclassification, the estimator is unfair.
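To make the Disparate Mistreatment criterion concrete, the sketch below computes the three error rates it compares (false positive, false negative, and overall misclassification) for each gender from the Table 4 counts. Treating “atheist” as the positive class is an arbitrary choice made here for illustration, and the `Counts` class and `mistreatment_gaps` helper are hypothetical names, not part of [79] or of this paper's code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Binary confusion counts with 'atheist' as the (assumed) positive class."""
    tp: int  # true atheist, predicted atheist
    fn: int  # true atheist, predicted agnostic
    fp: int  # true agnostic, predicted atheist
    tn: int  # true agnostic, predicted agnostic

    def rates(self):
        total = self.tp + self.fn + self.fp + self.tn
        return {
            "false_positive_rate": self.fp / (self.fp + self.tn),
            "false_negative_rate": self.fn / (self.fn + self.tp),
            "misclassification_rate": (self.fp + self.fn) / total,
        }


def mistreatment_gaps(group_a, group_b):
    """Absolute difference between two groups on each error rate."""
    rates_a, rates_b = group_a.rates(), group_b.rates()
    return {name: abs(rates_a[name] - rates_b[name]) for name in rates_a}


# Counts taken from Table 4.
men = Counts(tp=58, fn=28, fp=33, tn=36)
women = Counts(tp=16, fn=34, fp=21, tn=86)

gaps = mistreatment_gaps(men, women)
```

An estimator with no Disparate Mistreatment would have all three gaps near zero. On Table 4 the false positive and false negative gaps are both large even though the overall misclassification rates are close, which is why the criterion looks at each error type separately.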

To mitigate Disparate Mistreatment we explicitly encode gender as $\{-1, 0, 1\}$ for {male, unknown, female} in the feature vector at train time. At test time the gender of all samples is encoded as unknown. The intuition is that latent variables are amplified when they are easy to extract and correlated with the target. As demonstrated by the accuracy of our race and gender predictors, that is often the case for protected information. There often exist more informative, if more subtle, traits than the protected features. For example, atheists and agnostics report a yawning gap in the share who don’t believe in God, at 92% and 41% [70]. Additionally, religiosity is shown to be correlated with both Agreeableness and Conscientiousness [82]. But gender is much easier to extract than belief in God or personality. By explicitly giving the model gender information, we hope that the model will do more to extract those other features.
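A minimal sketch of this encoding is given below, reading the codes as -1, 0, and 1 for male, unknown, and female. The function names, the toy word counts, and the weight vector are hypothetical and only serve to show the mechanics: with a linear model, setting the gender feature to 0 at test time removes the explicit gender term from the score, so only the text features contribute.

```python
GENDER_CODE = {"male": -1.0, "unknown": 0.0, "female": 1.0}


def add_gender_feature(bow_row, gender, training):
    """Append the gender code when training; always append 'unknown' at test time."""
    code = GENDER_CODE[gender] if training else GENDER_CODE["unknown"]
    return list(bow_row) + [code]


def linear_score(weights, row):
    """Score of a linear model without intercept."""
    return sum(w * x for w, x in zip(weights, row))


# Toy word counts for one user and toy weights (last weight is for gender).
bow_row = [0.0, 2.0, 1.0]
weights = [0.5, -1.0, 0.25, 2.0]

train_row = add_gender_feature(bow_row, "female", training=True)
test_row = add_gender_feature(bow_row, "female", training=False)

train_score = linear_score(weights, train_row)
test_score = linear_score(weights, test_row)
text_only_score = linear_score(weights[:-1], bow_row)
```

In this toy example `test_score` equals `text_only_score`. The hope expressed above is that, during training, the explicit feature absorbs the gender signal that would otherwise be spread across gendered words.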

This approach produces much less Disparate Mistreatment of men and women. The ratio of true to predicted atheists moves closer to parity, at 1.02 for men and 1.22 for women. Additionally, the ratio of false atheist to false agnostic predictions is now only 31.8% larger for men, compared to 90.8% without intervention. The most highly weighted agnostic words for the new fair classifier are also less gendered; “hair”, “wifey”, and “boyfriend” are no longer in the top 55, as reported in Table 15. We also saw no decay in classification rate.
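The before-and-after comparison can be recomputed directly from Tables 4 and 5. The sketch below is a check on the published counts, not the original evaluation code; the tuple layout and function names are assumptions made here, and the women's rows are again read in the order true agnostic, then true atheist.

```python
# Each tuple holds counts in the order:
# (agnostic->agnostic, agnostic->atheist, atheist->agnostic, atheist->atheist),
# where "a->b" means true label a, predicted label b.
TABLE_4 = {"men": (36, 33, 28, 58), "women": (86, 21, 34, 16)}
TABLE_5 = {"men": (40, 29, 31, 55), "women": (85, 22, 31, 19)}


def accuracy(table):
    """Overall classification rate pooled across both genders."""
    correct = sum(c[0] + c[3] for c in table.values())
    total = sum(sum(c) for c in table.values())
    return correct / total


def error_ratio_gap(table):
    """How much larger the false-atheist / false-agnostic ratio is for men."""
    men = table["men"][1] / table["men"][2]
    women = table["women"][1] / table["women"][2]
    return men / women - 1.0


baseline = {"accuracy": accuracy(TABLE_4), "gap": error_ratio_gap(TABLE_4)}
fair = {"accuracy": accuracy(TABLE_5), "gap": error_ratio_gap(TABLE_5)}
```

On these counts the gap falls from about 0.908 to about 0.318, while pooled accuracy moves from about 0.628 to about 0.638, consistent with the statement that the classification rate did not decay.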

The gender bias of the atheism classifier is clear by simply inspecting its most heavily weighted features. More opaque models should be subjected to more rigorous inspection for bias.

5 Conclusion and Future Work

We match or set the state of the art for the 20 traits in this paper. Additionally, we provide the top words for many pairwise classification problems, and top 55 words for regression or binary classification problems. We hope researchers from many fields find the benchmarks and word lists useful. Our analysis of psychographic models in marketing as well as gender bias in a religion classifier are examples of how these performance measures and extracted features can be used together.

In future work we hope to explore what types of unfairness can be solved by our approach in Section 4.5. Further, models built on traits with few examples are well suited to be augmented by transfer learning. This is especially pressing for detecting states like low satisfaction with life, which can be somewhat ameliorated at low cost.

References

  • [1] J. B. Stewart, “Facebook has 50 minutes of your time each day. it wants more,” The New York Times, vol. 5, 2016.
  • [2] SunCorp, “Digitising reputation pays off in the rental market,” 2017.
  • [3] A. E. Khandani, A. J. Kim, and A. W. Lo, “Consumer credit-risk models via machine-learning algorithms,” Journal of Banking & Finance, vol. 34, no. 11, pp. 2767–2787, 2010.
  • [4] D. L. Cogburn and F. K. Espinoza-Vasquez, “From networked nominee to networked nation: Examining the impact of web 2.0 and social media on political participation and civic engagement in the 2008 obama campaign,” Journal of Political Marketing, vol. 10, no. 1-2, pp. 189–213, 2011.
  • [5] R. J. González, “Hacking the citizenry?: Personality profiling, ‘big data’ and the election of donald trump,” Anthropology Today, vol. 33, no. 3, pp. 9–12, 2017.
  • [6] K. K. Fitzpatrick, A. Darcy, and M. Vierhile, “Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): a randomized controlled trial,” JMIR mental health, vol. 4, no. 2, 2017.
  • [7] R. Allan, “Hard questions: Who should decide what is hate speech in an online global community?,” 2017.
  • [8] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec, “Antisocial behavior in online discussion communities.,” in ICWSM, pp. 61–70, 2015.
  • [9] A. Noulas, S. Scellato, R. Lambiotte, M. Pontil, and C. Mascolo, “A tale of many cities: universal patterns in human urban mobility,” PloS one, vol. 7, no. 5, p. e37027, 2012.
  • [10] S.-H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha, “Like like alike: joint friendship and interest propagation in social networks,” in Proceedings of the 20th international conference on World wide web, pp. 537–546, ACM, 2011.
  • [11] M. Kosinski, S. C. Matz, S. D. Gosling, V. Popov, and D. Stillwell, “Facebook as a research tool for the social sciences: Opportunities, challenges, ethical considerations, and practical guidelines.,” American Psychologist, vol. 70, no. 6, p. 543, 2015.
  • [12] J. Henrich, S. J. Heine, and A. Norenzayan, “The weirdest people in the world?,” Behavioral and Brain Sciences, vol. 33, no. 2-3, p. 61–83, 2010.
  • [13] V. Egan, J. Auty, R. Miller, S. Ahmadi, C. Richardson, and I. Gargan, “Sensational interests and general personality traits,” The Journal of Forensic Psychiatry, vol. 10, no. 3, pp. 567–582, 1999.
  • [14] V. Egan and V. Campbell, “Sensational interests, sustaining fantasies and personality predict physical aggression,” Personality and Individual Differences, vol. 47, no. 5, pp. 464–469, 2009.
  • [15] A. Weiss, V. Egan, and A. J. Figueredo, “Sensational interests as a form of intrasexual competition,” Personality and Individual Differences, vol. 36, no. 3, pp. 563–573, 2004.
  • [16] G. Hagger-Johnson, V. Egan, and D. Stillwell, “Are social networking profiles reliable indicators of sensational interests?,” Journal of Research in Personality, vol. 45, no. 1, pp. 71–76, 2011.
  • [17] N. Wang, M. Kosinski, D. Stillwell, and J. Rust, “Can well-being be measured using facebook status updates? validation of facebook’s gross national happiness index,” Social Indicators Research, vol. 115, no. 1, pp. 483–491, 2014.
  • [18] M. Kosinski, D. Stillwell, and T. Graepel, “Private traits and attributes are predictable from digital records of human behavior,” Proceedings of the National Academy of Sciences, vol. 110, no. 15, pp. 5802–5805, 2013.
  • [19] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E. Seligman, et al., “Personality, gender, and age in the language of social media: The open-vocabulary approach,” PloS one, vol. 8, no. 9, p. e73791, 2013.
  • [20] G. Farnadi, G. Sitaraman, S. Sushmita, F. Celli, M. Kosinski, D. Stillwell, S. Davalos, M.-F. Moens, and M. De Cock, “Computational personality recognition in social media,” User modeling and user-adapted interaction, vol. 26, no. 2-3, pp. 109–142, 2016.
  • [21] M. Sap, G. Park, J. Eichstaedt, M. Kern, D. Stillwell, M. Kosinski, L. Ungar, and H. A. Schwartz, “Developing age and gender predictive lexica over social media,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1146–1151, 2014.
  • [22] The New York Times, “How trump consultants exploited the data of millions,” 2018.
  • [23] MarketWatch, “Facebook valuation drops $75 billion in week after cambridge analytica scandal,” 2018.
  • [24] The Guardian, “‘i made steve bannon’s psychological warfare tool’: meet the data war whistleblower,” 2018.
  • [25] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry and word count: Liwc 2001,” Mahway: Lawrence Erlbaum Associates, vol. 71, no. 2001, p. 2001, 2001.
  • [26] W. Youyou, M. Kosinski, and D. Stillwell, “Computer-based personality judgments are more accurate than those made by humans,” Proceedings of the National Academy of Sciences, vol. 112, no. 4, pp. 1036–1040, 2015.
  • [27] A. Conneau, H. Schwenk, L. Barrault, and Y. Lecun, “Very deep convolutional networks for text classification,” in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, vol. 1, pp. 1107–1116, 2017.
  • [28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012.
  • [29] D. Nguyen, D. Trieschnigg, A. S. Doğruöz, R. Gravel, M. Theune, T. Meder, and F. De Jong, “Why gender and age prediction from tweets is hard: Lessons from a crowdsourcing experiment,” in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1950–1961, 2014.
  • [30] R. Bivens, “The gender binary will not be deprogrammed: Ten years of coding gender on facebook,” New Media & Society, vol. 19, no. 6, pp. 880–898, 2017.
  • [31] J. M. Digman, “Personality structure: Emergence of the five-factor model,” Annual review of psychology, vol. 41, no. 1, pp. 417–440, 1990.
  • [32] R. R. McCrae and P. T. Costa, “Validation of the five-factor model of personality across instruments and observers.,” Journal of personality and social psychology, vol. 52, no. 1, p. 81, 1987.
  • [33] M. LLC, “The development and piloting of an online iq test,” 2014.
  • [34] M. Kosinski, “Measurement and prediction of individual and group differences in the digital environment,” Department of Psychology University of Cambridge, 2014.
  • [35] J. R. Flynn, “Massive iq gains in 14 nations: What iq tests really measure.,” Psychological bulletin, vol. 101, no. 2, p. 171, 1987.
  • [36] E. Diener, R. A. Emmons, R. J. Larsen, and S. Griffin, “The satisfaction with life scale,” Journal of personality assessment, vol. 49, no. 1, pp. 71–75, 1985.
  • [37] L. Cooke, J. Wardle, E. Gibson, M. Sapochnik, A. Sheiham, and M. Lawson, “Demographic, familial and trait predictors of fruit and vegetable consumption by pre-school children,” Public health nutrition, vol. 7, no. 2, pp. 295–302, 2004.
  • [38] M. Peciña, H. Azhar, T. M. Love, T. Lu, B. L. Fredrickson, C. S. Stohler, and J.-K. Zubieta, “Personality trait predictors of placebo analgesia and neurobiological correlates,” Neuropsychopharmacology, vol. 38, no. 4, p. 639, 2013.
  • [39] L. C. Quilty, M. Sellbom, J. L. Tackett, and R. M. Bagby, “Personality trait predictors of bipolar disorder symptoms,” Psychiatry Research, vol. 169, no. 2, pp. 159–163, 2009.
  • [40] R. P. Tett, D. N. Jackson, and M. Rothstein, “Personality measures as predictors of job performance: a meta-analytic review,” Personnel psychology, vol. 44, no. 4, pp. 703–742, 1991.
  • [41] G. Park, H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, M. Kosinski, D. J. Stillwell, L. H. Ungar, and M. E. Seligman, “Automatic personality assessment through social media language.,” Journal of personality and social psychology, vol. 108, no. 6, p. 934, 2015.
  • [42] N. Cesare, C. Grant, and E. O. Nsoesie, “Detection of user demographics on social media: A review of methods and recommendations for best practices,” arXiv preprint arXiv:1702.01807, 2017.
  • [43] J. Kleinberg, S. Mullainathan, and M. Raghavan, “Inherent trade-offs in the fair determination of risk scores,” arXiv preprint arXiv:1609.05807, 2016.
  • [44] O. P. John and S. Srivastava, “The big five trait taxonomy: History, measurement, and theoretical perspectives,” Handbook of personality: Theory and research, vol. 2, no. 1999, pp. 102–138, 1999.
  • [45] J. M. Kleinberg, “An impossibility theorem for clustering,” in Advances in neural information processing systems, pp. 463–470, 2003.
  • [46] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM computing surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.
  • [47] R. Shamir and R. Sharan, “Algorithmic approaches to clustering gene expression data,” Current Topics in Computational Molecular Biology, p. 269, 2002.
  • [48] S. Dixon, E. Pampalk, and G. Widmer, “Classification of dance music by periodicity patterns,” 2003.
  • [49] N. Meinshausen and B. Yu, “Lasso-type recovery of sparse representations for high-dimensional data,” The Annals of Statistics, pp. 246–270, 2009.
  • [50] R. R. Lau, L. Sigelman, and I. B. Rovner, “The effects of negative political campaigns: a meta-analytic reassessment,” Journal of Politics, vol. 69, no. 4, pp. 1176–1209, 2007.
  • [51] L. Huddy, “Group identity and political cohesion,” Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, 2003.
  • [52] N. R. Branscombe and D. L. Wann, “Collective self-esteem consequences of outgroup derogation when a valued social identity is on trial,” European Journal of Social Psychology, vol. 24, no. 6, pp. 641–657, 1994.
  • [53] M. C. Schneider and A. L. Bos, “Measuring stereotypes of female politicians,” Political Psychology, vol. 35, no. 2, pp. 245–266, 2014.
  • [54] K. Dolan, “The impact of gender stereotyped evaluations on support for women candidates,” Political Behavior, vol. 32, no. 1, pp. 69–88, 2010.
  • [55] A. Vehtari, A. Gelman, and J. Gabry, “Efficient implementation of leave-one-out cross-validation and waic for evaluating fitted bayesian models,” arXiv preprint arXiv:1507.04544, 2015.
  • [56] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • [57] D. Preoţiuc-Pietro, Y. Liu, D. Hopkins, and L. Ungar, “Beyond binary labels: political ideology prediction of twitter users,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 729–740, 2017.
  • [58] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in neural information processing systems, pp. 3111–3119, 2013.
  • [59] S. Sniekers, S. Stringer, K. Watanabe, P. R. Jansen, J. R. Coleman, E. Krapohl, E. Taskesen, A. R. Hammerschlag, A. Okbay, D. Zabaneh, et al., “Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence,” Nature genetics, vol. 49, no. 7, p. 1107, 2017.
  • [60] B. W. Gottlieb, J. Gottlieb, D. Berkell, and L. Levy, “Sociometric status and solitary play of ld boys and girls,” Journal of Learning Disabilities, vol. 19, no. 10, pp. 619–622, 1986.
  • [61] T. Bryan, R. Wheeler, J. Felcan, and T. Henek, ““come on, dummy” an observational study of children’s communications,” Journal of Learning Disabilities, vol. 9, no. 10, pp. 661–669, 1976.
  • [62] S. H. McConaughy and D. R. Ritter, “Social competence and behavioral problems of learning disabled boys aged 6-11,” Journal of Learning Disabilities, vol. 19, no. 1, pp. 39–45, 1986.
  • [63] C. J. Bellanti and K. L. Bierman, “Disentangling the impact of low cognitive ability and inattention on social behavior and peer relationships,” Journal of Clinical Child Psychology, vol. 29, no. 1, pp. 66–75, 2000.
  • [64] J. A. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural processing letters, vol. 9, no. 3, pp. 293–300, 1999.
  • [65] G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” Numerische mathematik, vol. 14, no. 5, pp. 403–420, 1970.
  • [66] M. Iyyer, P. Enns, J. Boyd-Graber, and P. Resnik, “Political ideology detection using recursive neural networks,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1113–1122, 2014.
  • [67] B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S. Lehmann, “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm,” arXiv preprint arXiv:1708.00524, 2017.
  • [68] Wired, “The decline and fall of an ultra rich online gaming empire,” 2008.
  • [69] CBS News, “Trump campaign phased out use of cambridge analytica data before election,” 2018.
  • [70] Pew, “Religious landscape study.,” 2014.
  • [71] J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang, “Men also like shopping: Reducing gender bias amplification using corpus-level constraints,” arXiv preprint arXiv:1707.09457, 2017.
  • [72] W. Y. Zou, R. Socher, D. Cer, and C. D. Manning, “Bilingual word embeddings for phrase-based machine translation,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1393–1398, 2013.
  • [73] S. Clinchant and F. Perronnin, “Aggregating continuous word embeddings for information retrieval,” in Proceedings of the workshop on continuous vector space models and their compositionality, pp. 100–109, 2013.
  • [74] J. Luo, S. E. Sorour, K. Goda, and T. Mine, “Predicting student grade based on free-style comments using word2vec and ann by considering prediction results obtained in consecutive lessons.,” International Educational Data Mining Society, 2015.
  • [75] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai, “Man is to computer programmer as woman is to homemaker? debiasing word embeddings,” in Advances in Neural Information Processing Systems, pp. 4349–4357, 2016.
  • [76] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness through awareness,” in Proceedings of the 3rd innovations in theoretical computer science conference, pp. 214–226, ACM, 2012.
  • [77] M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth, “Rawlsian fairness for machine learning,” arXiv preprint arXiv:1610.09559, 2016.
  • [78] M. J. Kusner, J. Loftus, C. Russell, and R. Silva, “Counterfactual fairness,” in Advances in Neural Information Processing Systems, pp. 4069–4079, 2017.
  • [79] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi, “Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment,” in Proceedings of the 26th International Conference on World Wide Web, pp. 1171–1180, International World Wide Web Conferences Steering Committee, 2017.
  • [80] M. Hardt, E. Price, N. Srebro, et al., “Equality of opportunity in supervised learning,” in Advances in neural information processing systems, pp. 3315–3323, 2016.
  • [81] N. Grgic-Hlaca, M. B. Zafar, K. P. Gummadi, and A. Weller, “The case for process fairness in learning: Feature selection for fair decision making,” in NIPS Symposium on Machine Learning and the Law, vol. 1, p. 2, 2016.
  • [82] V. Saroglou, “Religiousness as a cultural adaptation of basic traits: A five-factor model perspective,” Personality and social psychology review, vol. 14, no. 1, pp. 108–125, 2010.
Table 6: Pairwise Politics Words

                IPA           anarchist     centrist      conserv.      dem.          doesn’t care  hates pol.    indep.        lib.          liber.        repub.        v. lib.
IPA             -             fuck          wishes        wishes        smh           yay           rain          congrats      wishes        money         church        damn
anarchist       excited       -             wishes        driving       excited       lol           dont          driving       excited       ready         ready         excited
centrist        xd            fuck          -             lord          today         tattoo        shit          surgery       shit          government    school        damn
conservative    xd            fuck          damn          -             fb            anymore       shit          damn          damn          art           school        damn
democrat        xd            fuck          wishes        tonight       -             stupid        fuck          died          wishes        government    church        wishes
doesn’t care    packers       fuck          wishes        lord          smh           -             shit          definitely    wishes        government    church        damn
hates politics  class         music         dey           loves         fb            tht           -             movie         wishes        email         camp          damn
independent     xd            fuck          wishes        lord          valentine     sitting       fuck          -             wishes        beer          parents       damn
liberal         xd            fuck          final         lord          im            xd            im            gonna         -             government    church        damn
libertarian     xd            fuck          headache      lord          walk          xd            dont          till          packing       -             girls         vacation
republican      xd            fuck          wishes        wishes        smh           mum           fuck          minute        wishes        fucking       -             damn
very liberal    xd            xd            boy           lord          im            xd            xd            school        missing       im            im            -

Table 7: Politics Confusion Matrix

Predicted Label
                IPA           anar.         centrist      conserv.      dem.          doesn’t care  hates pol.    indep.        lib.          liber.        repub.        v. lib.       Total
IPA             0             2             3             3             11            18            2             1             3             1             16            1             61
anarchist       0             24            4             3             5             21            1             3             15            5             4             3             88
centrist        2             9             74            40            52            66            3             6             95            7             43            4             401
conservative    2             5             29            113           26            31            0             7             53            5             62            0             333
democrat        5             17            53            36            321           101           4             18            80            9             89            3             736
doesn’t care    3             39            51            29            122           373           12            12            105           12            102           9             869
hates politics  0             4             6             1             6             30            5             3             6             0             2             0             63
independent     0             8             16            13            35            22            1             8             29            4             25            1             162
liberal         1             18            51            27            74            51            6             6             223           15            24            13            509
libertarian     0             12            17            9             17            28            0             6             32            11            12            4             148
republican      1             8             19            57            67            64            1             8             29            3             179           3             439
very liberal    0             4             25            2             11            22            2             2             67            1             6             3             145
Total           14            150           348           333           747           827           37            80            737           73            564           44            3954

Table 8: Pairwise Religion Words
            atheist    agnostic   catholic   christian  none
atheist     -          boyfriend  thank      church     lol
agnostic    fucking    -          prayers    church     lol
catholic    fucking    fucking    -          lol        lol
christian   fucking    fucking    mass       -          xmas
none        fucking    apartment  god        church     -

The most highly weighted word from each pairwise classifier. Word implies top label.

Table 9: Religion Confusion Matrix
Predicted Label
Atheist Agnostic Catholic Christian None Total
Atheist 68 29 17 16 21 151
Agnostic 54 69 27 55 11 216
Catholic 27 37 172 130 9 375
Christian 35 48 126 560 26 795
None 22 11 19 50 39 141
Total 206 194 361 811 106 1678

In the remaining tables the top 55 words are listed in order for each trait.

Table 10: Personality Words
Openness Conscientious Extroversion
- + - + - +
bored art lost gym internet party
boring poetry fucking ready quiet guys
husband beautiful xd weekend bored amazing
attitude universe phone excited listening audition
shopping peace im success apparently baby
dinner poem bored finished computer haha
tv writing fuck studying stupid dance
game books gonna busy pc girls
proud theatre sick vacation hmm fabulous
ur dream procrastination arm anime blast
dentist mind internet officially tt ready
daughter book computer family dark im
dont woman probably relax probably wine
haha guitar cousins tennis sims success
stupid damn hates wonderful didn lets
ni awesome sims special watching excited
ipod tea anybody win slow super
bed apartment charger glad depressing text
justin insomnia sister piano calculus chill
gift xd playing scholarship kind phone
2nd adventure grounded received anymore dear
hurt cali poker lmao repost parties
ohh far tt degrees maybe support
baseball philosophy status state draw loves
mum sigh momma tons yay pics
pray nature ftw motor trying hey
school maybe press obstacles books big
repost music dead research shadow hit
booked blues failed extremely bother met
lord chill forgot circumstances damned pirate
ops fam depression workout suppose ben
nice epic lazy paid reading rocked
tmr places youtube 100 cat gang
dam rights 420 hit poor sex
idol dragons school surgery depression sing
snowing woot http law sigh btw
pissed vampire awsome university games gorgeous
shut soul pokemon anatomy drawing musical
maths eclipse woke blessings odd cali
msn drawing dammit hmmmm 10th girlfriend
aldean strange hair husband pokemon stoked
vodka planet wished counting nice folks
comes yay cleaning calc essay ponder
eid dreams fine louis pointless wanna
alot blood dunno delhi managed hahahaha
waste sushi enemy final looks pool
worst smoking social drive grr tanning
kiero contact yo lets darkness hello
soo lines procrastinator iphone saw pumped
mas deep black lunch crying chillin
staff genius magic yankees lonely theatre
12 novel wasn running laptop kiss
piss smh fans weather shouldn office
transformers worried kinda zone paranoid cock
car folks trying smart walking lauren
Table 11: Personality Words Continued
Agreeable Neurotic Satisfaction With Life
- + - + - +
fucking wonderful loving sick bored family
stupid amazing girlfriend nervous fuck loving
kill awesome wife stressed fucking hope
shopping haha awesome depression hates thankful
shit smile parties depressed bday india
burn happiness party anymore apparently wonderful
bitch phone weekend lonely damn busy
pissed urself haha stress internet friend
punch family doing fucking zero heart
hates blessed game tired chem man
death status sunday trying wat yum
hell music kansas depressing supposed fb
suck woop guy sims ma glad
freak hands delicious anxiety hating beautiful
piss heart beach worst spend lauren
dead spirit definitely hair la lord
xmas smiles swag fed dumb wine
karma guy started scream young swim
fight moment ready fine british energy
blood beautiful hunting nightmare killed lunch
awful movie power rip hmm locked
deal theres funniest tears france woot
misery car melody horrible chances sons
fuck dancing hawaii flu simply special
enemies lord action worse exams trust
fake guitar hit issues mum wish
pathetic sore chillin scared main weeks
irony sara workout stressful hate day
dumb help flow fml edge father
cunt walk portland care dnt tried
care excited seat shes party journey
devil prayers smart stressing kept hospital
black knowing snowboarding ugh dat email
ich valentines knowing sad didn business
russian borrow sore gary months santa
idiots laura greatest hates du walked
cunts notifications success die rain lights
wtf beard basketball actually pass kingdom
crap reli update scary bus work
truck snowboarding gf boyfriend okay lol
deleted sorry women pills australia mommy
anger chillin gotta crying shooting turkey
die hill followed kitty england nap
tu whats jumping awful africa revenge
nightmare hearts fool hurt rachel truly
annoyed kindness dancing bored fml son
rip study greatness fair metal final
bloody worry blast screaming uk reached
drama clients woke dreading school survived
bitches smells ass friggin wtf dont
stupidity troops hitting suicide matt 0
hair sing cock miserable freakin god
wifi goood wise quiet 15 kitchen
fat holy kiss xd 200 normal
rage faster toes sadness free blessing
Table 12: Sensational Interest Words
Militaristic Violent-Occult Intellectual Recreation
- + - + - +
sleeping man lord hell im life
ugh xbox pray zombie course jon
sad gets cousins damn boring beautiful
excited gotta church fuck painful dancing
lovely good michael bitch decision yoga
oh training allah ass hurts thankful
hair headed jesus drink bus peace
shopping truck game blood game kinda
husband guitar 0 lmao stupid truly
sick guys summer xd bak la
cares bro gosh woot hero ich
mum gun praise halloween problem miss
boyfriend boom sunday play yeah likes
lady epic dad guys christ comfort
concert work loving drunk gona lol
today weight mum thanx id wtf
gaga gym team animal sittin insomnia
okay bike hospital sanity die chicken
pic dang 10 fucking horse children
adorable game tv dragons yell tired
sunday blast christ burn chuck lovely
ordered lol heal vampires 2day ap
birth war usa blah tommorrow funny
lots black personal man ow things
poor fish best loved bored man
ben military ray pissed fukin simple
fine woot nervous lil inbox thank
settings 12 thing bday race period
birthday till look send basketball countdown
cousins ppl week body word baby
shoes brave 2morrow metal rhys beach
art 17 quite head tell hey
omg fight poor piss step depression
stop success brazil blast wats jobs
wear marines cup theyre coke cure
prince hrs zumba cause football manage
round sword account gun penguins sugar
come make website death won aware
neighbours ko tryna vampire facebookers singing
basement friend study bleh letters egg
music hit haha tattoo awsome taste
speak play soccer ppl dont rains
thoughts pics feeling dead blah log
story hahaha christmas woman till taught
weird troops round purple playing coolest
awful army youth peaceful dead yellow
quite running story message fact cheers
rachel mag bible shit learned small
hear strong woah angel visit society
alice knw grace kinda address fly
tea beer prayers tongue 14 social
promised hehehe plan sushi chilling boo
jesus comwatch feat wolf win beauty
actually xoxo anybody poke pokemon world
counting run stressed kick sees sunshine
Table 13: Sensational Interest Words Continued
Occult Credulousness (-), Occult Credulousness (+), Wholesome Activities (-), Wholesome Activities (+), Belief in Star Sign (No), Belief in Star Sign (Yes)
church zombie coke woot minutes omg
praise ass michigan camping didn im
jesus bitch stupid fish church ready
lord halloween pathetic life praise friend
bible animal ops yesterday jesus mind
christ sign husband beautiful probably ass
team omg didn rain physics butt
quite xd hurts man jess stay
loving job kurwa mexico white tom
pray woot evil wish religion tomarrow
paper wish afternoon river iv october
game cure problem love officially promise
blessed street taylor path imagine lol
salvation vampire idea moon christ searching
ops guys jess haha germany bitch
summer send glee snow giants bleh
michael lol mum bike saw eye
spent thanx mental hahaha wants cute
youth luck meg ghost north family
cousins wtf mad baking decided halloween
word nature 360 grandma discovered hanging
god cancer pissed live 11th haunted
homework woohoo club goin ouch japanese
alarm miss uni sky skin mother
0 barely lyrics cat doesn dinner
haha moment head animal bacon card
player bar recently netflix train help
sunday safe internet birds hahaha bored
college proud min smile lasts luv
wedding woman lesson happiness america luck
prayer mom bus mom haven neighbors
glory away rly yum burning yum
forgiveness dare debate fishing pray fireworks
ann inches kevin truly thursday lmao
mm boyfriend inbox fell jessica tt
political il jeez make prince tired
fact nd official clean knew person
greatest pls nite portland umm nd
confused aware ms smells quiero watch
appreciated xmas lack lake deserves ya
algebra hell saw create heres prom
brazil solstice troy making finds crazy
travel date sims 2010 kim upload
daughter vampires school josh heard elf
bacon copy thinks children punch hehe
laura purple thanking laughing groups crack
personal haunted die sa car bell
week theyre hates law amazing human
greater lmao stuff jobs sick finish
statement later band earth tape lnk
messed interview thieves gets drink june
tv peeps feels hehehe morn change
em peaceful elm swimming dallas costume
poor drunk germany wa cops shit
trust dunno sat monkeys waters decorating
Table 14: Psychographic Words
Self-Disclosure (-), Self-Disclosure (+), Fair-Mindedness (-), Fair-Mindedness (+), IQ (-), IQ (+)
bored family bored excited nite exam
fuck loving wat business ur hours
fucking hope soon says lmao sigh
hates thankful dad apartment alot camping
bday india xd great family finish
apparently wonderful stage delicious omg paper
damn busy pass sure 2011 wtf
internet friend moon needed city il
zero heart haha seattle lol finds
chem man kitty uni help important
wat yum tired airport wew read
supposed fb mum thankful boy physics
ma glad farmville dallas heart google
hating beautiful face learn com ra
spend lauren drank weekend angie xd
la lord fuk definitely www wifi
dumb wine fuck dinner ha text
young swim ma card 333 weeks
british energy sun amazing tom studying
killed lunch crap tonight goodnight training
hmm locked bday exciting history course
france woot shit degrees xxx student
chances sons hopefully classes xdd magic
simply special feel support friend kinda
exams trust fails priceless morning everytime
mum wish va oh mum raining
main weeks big certainly christmas yea
hate day nd government eid maths
edge father smoke ticket kay semester
dnt tried yay food gives maybe
party journey watchin january din exciting
kept hospital sick couple beautiful point
dat email wedding php folks kno
didn business regret journey luv excited
months santa seconds universe 0 imma
du walked im 21 hacked months
rain lights ignore grateful secrets flying
pass kingdom tt pay iam final
bus work lose size forgiveness nah
okay lol marriage class strong library
australia mommy lolz situation busy used
shooting turkey fukin duke jo chem
england nap picture honesty hate brain
africa revenge blessing austin ti everybody
rachel truly slow tires nightmare awesome
fml son anxiety 29 ayaw groups
metal final cy3 sisters prayer progress
uk reached library mother fought champion
school survived tmr heading ow calculus
wtf dont fucking bc sana behave
matt 0 epic piece tired den
freakin god il summer afraid badly
15 kitchen marie breakfast para times
200 normal bunch answer sum mobil
free blessing loaded surgery movie fun
Table 15: Religion and Politics Words
Agnostic vs. Atheist; Agnostic vs. Atheist (Fair); Religious vs. Not Religious; Conservative vs. Liberal (two columns per comparison, in the order named)
extra physics miles fucking church fucking church damn
miles fucking working physics pray fuck truck happy
turn snowing extra wat prayers xmas government fb
hair shit awhile fuck god damn america smh
packing wat packing bloody easter shit pray marriage
awhile write turn shit lord bloody haha xmas
insane bloody super write blessed hell prayers chicago
working enter hubby maths christmas ass deer sex
hubby fuck chill xx ugh india christmas hell
points sigh free snowing praying zombie country fam
friggin thinks sleepy enter hw fuckin tonight lovely
santa talk santa thinks ppl halloween 17 halloween
heck weeks heck talk prayer car lord health
wishes town ready science game yay awesome saw
child science friggin sigh believe social god yoga
free maths vacation hai family xx military celebrate
boyfriend degrees work cancer ready quite texas gay
lady lolz thursday person fb religion freedom apartment
learn record late coursework bless drink savior wtf
super xmas points town im oh dad thoughts
houston tom pack xd calling using bible shit
service hai houston weeks dang shitty jesus glee
pack person insane tom paper internet supper gaga
late dat ya film jesus fucked girls da
wanting tyler relax dat school damned huge palin
hasn cod join kill camp omfg praying 2010
mai afraid busy lolz gosh meh camp help
sleepy untill learn msn heart indian soldiers mexico
worked present child english success post byu mother
fly wifey headed xmas mary head christ indian
chill movie favorite chemistry strength cricket disney lady
join xx beautiful afraid butt any1 risen studies
kyle cancer season na fishing dragon beach social
dun boring san pierced brother lovely tournament art
thursday rape fly dick military body troops holiday
taken month worked anatomy sad new schools shitty
childhood kill service bbc uncle boyfriend leave ve
mother welcome spring tell senior teeth ill free
thank clinton wanting untill fair nice blonde earthquake
headed nicht halloween memory mom fml armed street
ya ay lady bothered tan warped xbox phone
london brother thank horse watching woke reagan lakers
beautiful tell childhood record em bleh utah ur
jail hadn mai cod president wednesday served fine
hates pierced hair ki smh gods tide relationship
paperwork wild paperwork nicht love afford gators asshole
wanna use 4th sheep haha japanese pelosi worried
clear perfect hopefully chem future tongue husband purple
san return missed brother best robert stinks putting
til needed peace fancy emails sophie trial omg
halloween paid hasn degrees goin holy picked nature
bring half trip disease football eye beep prop
kindle horse mother realised latest tattoo gun black
vida disease sunshine room thank decent trailer live
powers chuck kyle religion matthew odd ready eid
Table 16: Race Words
White vs. Black; White vs. Asian; Black vs. Asian (two columns per comparison, in the order named)
tonight smh tonight asian smh korea
dad fb blonde tt fb sa
stupid lord town tmr lord na
exited fam fuckin korea wit asian
thinks nigeria ass chinese aint gay
ends yall college ng da chinese
journey black gas na yall internet
meet fathers dope korean lol korean
hahahahahaha mj worse china say monday
fun yuh night ang fam xd
awesome gon men aq jackson tmr
ability birthday sons asians cos shooting
night mad adult chen michael philippines
mas lol pretty guys finals 3d
wouldnt finish theres thailand ass babe
chargers dey idea taiwan yuh heaven
bein asap hope karaoke black important
aftr tryna ability sa ny tan
pretty jackson melissa chan sooooo thailand
eh came state dream mad yummy
tom degrassi unique company mind completely
exhausted wat weekend craving season woot
tough iz screaming zzz wat smell
great hw mamaya holiday birthday bought
running pple tune wanna degrassi fly
exciting jus figure ms hell tt
yankees braids inside nguyen chelsea worry
politics haters exited singapore woman ruin
mirror females wine yang figure passed
pepsi misfits 5th hu african skating
roll god superman fat nigeria english
animal man emotionally ftw episode belong
grr omg sell gg iz shot
gay african sitting rice smart mas
tattoo desires february tttt saying grandpa
2nite chelsea easter damnit asap lazy
spend female months 555 attention sacrifice
monday cousin saying wong knowing grr
sorrow holla expecting achieve ki broken
ed smart rollin pa meeting yang
healthy laker wheres mode hw beer
enjoyable favour eminem lmao sings chatting
actually dis apparently pride india meet
charity money does bbq gas shoulder
delete happy status super self ang
iron mii legit 1st ready funn
blonde aye 30 long college shoes
comforted hard wen skating mj wood
standards wuz eric mean search dad
shot ready yelled heart years apart
chose nigga mis dx misfits aj
chatting jamaica breaking faith blessed line
damage bus homework expectation advice jack
innocent facebook actually research boys totally
thnx cos wishes hard fathers tomorrow