Boston University, Boston, MA

Inferring Human Traits From Facebook Statuses

Andrew Cutler Brian Kulis

Abstract

This paper explores the use of language models to predict 20 human traits from users’ Facebook status updates. The data was collected by the myPersonality project, and includes user statuses along with their personality, gender, political identification, religion, race, satisfaction with life, IQ, self-disclosure, fair-mindedness, and belief in astrology. A single interpretable model meets state of the art results for well-studied tasks such as predicting gender and personality; and sets the standard on other traits such as IQ, sensational interests, political identity, and satisfaction with life. Additionally, highly weighted words are published for each trait. These lists are valuable for creating hypotheses about human behavior, as well as for understanding what information a model is extracting. Using performance and extracted features we analyze models built on social media. The real world problems we explore include gendered classification bias and Cambridge Analytica’s use of psychographic models.

Keywords:

Social Media, Psychographic Prediction, NLP.

1 Introduction

Facebook’s 2 billion users spend an average of 50 minutes a day on Facebook, Messenger, or Instagram [1]. Industry seeks to obtain, model and actualize this mountain of data in a variety of ways. For example, social media can be used to establish creditworthiness [2, 3], persuade voters [4, 5], or seek cognitive behavioral therapy from a chatbot [6]. Many of these tasks depend on knowing something about the personal life of the user. When determining the risk of default, a creditor may be interested in a debtor’s impulsiveness or strength of support network. A user’s home town could disambiguate a search term. Or—reflecting society’s values—a social media company may be less willing to flag inflammatory language when the speaker is criticizing their own [7].

Social media’s endlessly logged interactions have also been a boon to understanding human behavior. Researchers have used various social networks to model bullying [8], urban mobility [9], and the interplay of friendship and shared interests [10]. Such studies do not have the benefit of a controlled setting where a single variable can be isolated. However, orders of magnitude more observations in participants’ natural habitat offer more fidelity to lived experience [11]. Additionally, subjects can be sampled from countries not so singularly Western, Educated, Industrialized, Rich, and Democratic (or WEIRD, in the parlance of Henrich et al. [12]).

In this paper we show how readily different personality and demographic information can be extracted from Facebook statuses. Our reported performance is useful to learn how traits are related to online behavior. For example, sensational interests as measured by the Sensational Interest Questionnaire (SIQ) have been studied for internal reliability [13], relationship to physical aggression [14], and role in intrasexual competition [15]. Yet work connecting SIQ with social media use relies on individually labeling sensational interests in statuses and is only predictive among males [16]. Our model performs well for both males and females without hand-labeling statuses. Similarly, other research found no relationship between satisfaction with life (SWL) and status updates [17]; we show modest test set performance. Finally, although Facebook Likes have been shown to be highly predictive of many personal traits [18], language models with good performance on this dataset have been limited to predicting personality, age, and gender [19, 20, 21].

The benchmark also helps assess the efficacy of services that explicitly or implicitly rely on inferring these traits. This is valuable to those developing new services as well as to users concerned about privacy. Of particular interest is the role of psychographic models in Cambridge Analytica’s (CA) marketing strategy. From leaked internal communications, in 2014 CA amassed a dataset of Facebook profiles and traits almost identical to those in the myPersonality dataset [22]. The week after CA’s project became public, Facebook’s stock plummeted $75 billion [23]. One factor in that drop was the belief that Facebook had allowed a third party to create a powerful marketing tool that could manipulate elections [24, 22]. There are dozens of publications on the myPersonality dataset. However, this is the first to predict SIQ, fair-mindedness, and self-disclosure, which CA discussed in relation to building user models [22].

Besides performance benchmarks, the other major contribution of this paper is the list of the most highly weighted words for predicting each trait. The weights also say something about human behavior. The interpretation here is more complex: regression on tens of thousands of features is fraught with over-fitting and collinearity. Despite those problems, in Section 3 we argue that the weights can still be treated as a data exploration tool similar to clustering. We provide examples of previously studied relationships that are borne out in the word lists, and believe the lists are a useful tool to develop yet unstudied hypotheses.

Highly weighted features are also an important way to analyze models. We argue in section 4.4 that a militarism predictor CA may have built is accurate, but extracts obvious features. Additionally, by inspecting the features in an Atheist vs. Agnostic classifier we find many gendered words. We demonstrate the bias empirically, then fix the classifier to be more fair. This approach is instructive for interrogating more critical models built on social media data.

This paper includes many contributions that could stand alone. We show that the text of Facebook statuses can predict user SWL and SIQ. We expand the prediction of political identity from a single spectrum (liberal/conservative) to twelve distinct ideologies with varying levels of overlap and popularity. On that task, we establish state of the art performance with a model that also provides informative features for every pairwise political comparison. We recreate models CA may have built, and report their performance and the type of information they extracted. We bring character level deep learning to gender prediction. To our knowledge, we also set the standard for predicting IQ, fair-mindedness, self-disclosure, race, and religion from Facebook statuses. Finally, we propose a novel method to make classification less biased.

Given the broad scope of this paper, some contributions are given less space than they would typically merit. Even so, we believe it is important to report results on many traits in a single paper. This demonstrates the power of a simple model and allows task difficulty and extracted features to be compared across traits without concerns about changing experimental setup.

2 Background

2.1 myPersonality Dataset

From 2008 to 2012, over 7 million Facebook users took the myPersonality quiz produced by the psychologist David Stillwell [11]. After answering at least 20 questions, users were scored on the Big Five personality axes: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Over 3 million of those users agreed to give researchers access to their extant Facebook profile and their personality scores. A much smaller subset of users answered additional questionnaires about their interests, Friends’ personality, belief in astrology, and other personal information. The research community has added to the dataset by providing race labels for several hundred thousand users; representing the text of statuses in terms of their Linguistic Inquiry and Word Count (LIWC) statistics [25]; and much more. Labels used in this study are listed in Tables 1 and 2, along with descriptive statistics. To see all available labels, visit myPersonality.org.

myPersonality.org lists 43 publications that use this data. Most work explores the relationship between personality and easily extractable features such as number of Friends or Likes, geographic location, or user-Like pairs. For example, user-Like pairs are shown to be better predictors of a personality than one’s spouse [26]. In 2013, Schwartz et al introduced the open vocabulary approach (or bag of words) to personality, gender, and age prediction [19]. This significantly outperforms closed-vocabulary approaches such as LIWC that rely on domain knowledge to assign each word to one or more of 69 categories. For an excellent overview of related work, we direct readers to that paper’s introduction [19].

2.2 Language Models

2.2.1 Bag of Words

The majority of our experiments use bag of words (BoW) term frequency-inverse document frequency (tf-idf) preprocessing followed by $\ell_{2}$ regularized regression. First, the vocabulary is limited to the $k$ most common words in a given training set. Then a matrix of word counts, $N$, is constructed, where $N_{ij}$ refers to how often word $j$ is used by subject $i$. Each row is normalized to sum to one, moved to a log scale, and divided by $d$, the ratio of documents in which each word appears. In more formal notation, each element of the tf-idf matrix is defined by

W_{ij}=\frac{1+\log\Big(\frac{N_{ij}}{\sum_{j'=1}^{k}N_{ij'}}\Big)}{d_{j}}.

$W$ is then normalized so each row lies on the unit sphere. $W$ can now be used for linear classification or regression with $\ell_{2}$ regularization on the parameters. This is commonly called Ridge Regression. For binary classification problems, labels are assigned values of $\{-1,1\}$ and a threshold determines the predicted label. For categorical data with more than two labels, we train a classifier on each pair of labels. The predicted label is decided by majority vote of the $\frac{c(c-1)}{2}$ classifiers, where $c$ is the number of classes.
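The weighting described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation (the experiments use sklearn); the function name `tfidf_rows` is hypothetical, and giving zero weight to words a subject never uses is an assumption, since the text does not say how zero counts are handled.

```python
import math


def tfidf_rows(counts):
    """counts[i][j] = times subject i used word j. Returns unit-norm weight rows."""
    n_subjects = len(counts)
    n_words = len(counts[0])
    # d_j: fraction of subjects (documents) in which word j appears at least once.
    d = [sum(1 for row in counts if row[j] > 0) / n_subjects for j in range(n_words)]
    weighted = []
    for row in counts:
        total = sum(row)
        w = []
        for j, n in enumerate(row):
            if n == 0:
                w.append(0.0)  # assumption: unused words get zero weight
            else:
                # Row-normalized count on a log scale, divided by d_j.
                w.append((1.0 + math.log(n / total)) / d[j])
        norm = math.sqrt(sum(x * x for x in w))
        # Project each row onto the unit sphere (skip all-zero rows).
        weighted.append([x / norm for x in w] if norm > 0 else w)
    return weighted


toy_counts = [[2, 0, 2], [1, 3, 0]]  # two subjects, three words (toy data)
W = tfidf_rows(toy_counts)
```

The resulting rows can then be passed to any $\ell_{2}$ regularized linear model.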

2.2.2 Character-Level Convolutional Neural Network

For gender prediction, we also train a 49 layer character level convolutional neural network (char-CNN) described in [27]. Much like successful computer vision architectures [28], each character is embedded in continuous space and combined with neighbors by many layers of convolutional filters. Unlike BoW models, CNNs preserve the temporal dimension, allowing the use of syntactic information. While a great advantage, and theoretically more similar to human cognition, this requires different preprocessing. During training, all inputs must be the same length along the temporal axis despite the wide variation in total length of users’ statuses. We chose to split users’ concatenated statuses into chunks of no more than 4000 characters, and no less than 1000, as this is enough text for humans to perform gender classification [29]. Each chunk contains roughly 800 words. Chunks from the same user are assigned entirely to either the training or test set. Unfortunately, preprocessing differences do not allow for a direct comparison between methods. However, enforcing the same preprocessing for both models would necessarily limit one.
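As a rough illustration of the chunking step, the sketch below splits one user's concatenated statuses into pieces of at most 4000 characters and keeps only pieces of at least 1000 characters. Dropping a short trailing piece is an assumption (the text does not say what happens to the remainder), and `chunk_statuses` is a hypothetical name.

```python
def chunk_statuses(text, max_len=4000, min_len=1000):
    """Split concatenated statuses into fixed-size chunks for the char-CNN."""
    chunks = [text[i:i + max_len] for i in range(0, len(text), max_len)]
    # Assumption: a trailing chunk shorter than min_len is discarded.
    return [c for c in chunks if len(c) >= min_len]


long_user = chunk_statuses("a" * 9500)   # pieces of 4000, 4000, and 1500 characters
short_user = chunk_statuses("a" * 999)   # too little text for a single chunk
```

To keep chunks from one user on the same side of the split, the train/test assignment would be made per user before chunking.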

2.3 Labels

Tables 1 and 2 provide statistics of the continuous and categorical data respectively. What follows is a brief description of each label and how it was collected.

2.3.1 Gender

is the binary label users supplied when setting up their Facebook account. Offering this information was common before 2008, and mandatory from 2008-2014. In 2014, (after the collection of this dataset) Facebook added 56 more gender options but still uses a binary representation to monetize users [30].

2.3.2 Race

labels provided in the dataset are inferred from profile pictures using the Faceplusplus.com algorithm, which can identify races termed White, Black, and Asian. A noisy measure of visual phenotype is not the gold standard for the study of race; however, our results indicate it is related to social media use.

2.3.3 Political identity

is limited to the twelve most common responses: IPA, anarchist, centrist, conservative, democrat, doesn’t care, hates politics, independent, liberal, libertarian, republican, and very liberal. These are heterogeneous categories from an open-ended question. No work was done to limit labels to political parties (e.g. remove “doesn’t care”), disambiguate misspelled or similar responses (e.g. combine “anarchy” and “anarchist” or “liberal” and “very liberal”), or limit responses to one country. To produce the word list for Liberals and Conservatives in Table 15, we combine “liberal”, “very liberal”, and “democrat” as well as “conservative”, “very conservative”, and “republican”. The most likely meaning of IPA is the Independence Party of America, which was in its nascence during this survey. The party is most popular among young people disaffected by the two party system, a sentiment reflected by the users who report IPA.

2.3.4 Religion

categories were limited to the nine most common responses, and similar labels were combined. Three variants of Catholic (“catholic”, “christian-catholic”, and “romancatholic”) were merged to form Catholic. Likewise, Christian refers to “christian”, “christian-baptist” and “christian-evangelical”. The entire list includes: Atheist, Agnostic, Catholic, Christian, Hindu, and None.

2.3.5 Belief in star sign

is the user’s response to “Horoscopes provide useful information to help guide my decisions?” Options include: Strongly Agree, Slightly Agree, No Opinion, Slightly Disagree, and Strongly Disagree.

2.3.6 Personality

is determined by a survey on five axes: Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism. Users answer 20-300 questions which are used to score each personality component on a scale of 1-5. There is a large body of research showing that five factor analysis is explanatory for behavior [31], and its measurement is reproducible [32]. That work is now adapting to larger datasets collected online [11].

2.3.7 Sensational Interests

include Militarism, Violent-Occult, Intellectual Recreation, Occult Credulousness, and Wholesome activities. Users can indicate “Great Dislike”, “Slight Dislike”, “No Opinion”, “Slight Interest”, and “Great Interest” for 28 different items including: “Drugs”, “Paganism”, “Philosophy”, “Survivalism”, and “Vampires and Wolves”. Interest levels are calculated by summing responses from relevant items. The full calculation can be found in [13].

2.3.8 IQ

is determined by 20 questions that conform to Raven’s Standard Progressive Matrices. The development and validation of these questions is explained in [33] and [34]. Because performance on IQ tests has been rising at roughly 0.3 points a year over the past century and IQ is defined as mean 100, the scoring of a test is properly defined over an age cohort [35]. These scores do not take age into account and the mean is 114.

2.3.9 Satisfaction with life, self-disclosure, and fair-mindedness

are assessed by separate questionnaires. SWL is a measure of global well being somewhat robust to short term mood fluctuations [36].

3 The Interpretation of Feature Weights

A common approach to understand traits in social science is to solve

X=UT+\epsilon,

where $X$ is observations of subjects, $T$ is the traits of subjects, $U$ is a transition matrix, and $\epsilon$ is model error [3, 13, 37, 38, 39, 40, 41, 42, 43]. Traits are preferred to be orthogonal to promote compactness without sacrificing modeling power. The Big 5 personality model is both criticized and defended on grounds of trait independence, explanatory power, and measurability, which conforms to the linear model above [44]. Because the traits are defined by language they will not be completely orthogonal. Additionally, observations are not independent. As such, values in $U$ will have dependencies across both rows and columns. Some traits like personality are used to predict other traits or life events [13, 40]. Learning those relationships can be interpreted as informing our beliefs about column dependencies for $U$ when both traits are part of $T$.

In this paper, $X$ is the tf-idf word matrix, $T$ is defined by our labels, and the model weights are some estimate of $U$ we define as $\hat{U}$. Row dependencies in $\hat{U}$ are based on how words function. For example, ‘camp’ and ‘camping’ perform similar roles in a status. Likewise, the relationship between IQ and agreeableness will be embedded in the columns of $\hat{U}$. However, many of the tasks have little training data and the solution is ill-posed. Regularization encourages generalization, but does not provide any guarantees. Further, sometimes $\epsilon$ dominates the model when observations are not very explanatory or the relationship to a trait is not linear. Given these challenges, what confidence can be placed in the estimate $\hat{U}$?

These problems mirror those faced when clustering data. Clustering does not come with guarantees it will yield sensible answers in diverse scenarios [45]. However, it is broadly useful when exploring large sets of data [46, 47, 48]. Similarly, $\hat{U}$ can be viewed as a way of ranking features for exploration. A highly ranked observation is not proof it is important. But several highly ranked observations with functional coherence may suggest a hypothesis, particularly when coupled with domain knowledge of row and column dependencies in $U$.

The 55 most highly weighted features for each label are reported in the Appendix. Though the word lists are shown in order of importance, this ranking is not strict. Different regularization, preprocessing, or train/test splits can alter the ordering, especially when there are few examples. Additionally, more common words with lower weights may be used more often in a model’s prediction, but may not appear at the top of a list. One may use $\ell_{1}$ regularization to obtain an arbitrarily small number of non-zero weights [49]. This encourages weighting common words and provides more stable rankings. We demonstrate that approach with our IQ model in Section 4.2.5.
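Reading a word list off a fitted linear model amounts to sorting vocabulary entries by their weight. The sketch below shows one way this could be done. The weights are invented toy values (only the words come from the lists discussed in this paper), and `top_words` is a hypothetical helper, not part of the authors' code.

```python
def top_words(weights, k=55):
    """Return the k most positive and k most negative words of a linear model."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    positive = [word for word, value in ranked[:k] if value > 0]
    negative = [word for word, value in ranked[::-1][:k] if value < 0]
    return positive, negative


# Toy weights for illustration only; real values come from the trained model.
toy_weights = {"camping": 0.8, "hate": -0.9, "the": 0.01, "bored": -0.4, "success": 0.5}
positive, negative = top_words(toy_weights, k=2)
```

Because the ordering depends on regularization and the train/test split, such a list is best treated as a ranking for exploration rather than a strict ordering.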

There are many well-studied phenomena embedded in the $\hat{U}$ produced by our work. For example, Sarah Palin is the only politician indicated in the liberal word list in Table 15. Likewise, Nancy Pelosi ranks just below Ronald Reagan among conservative words. This accords with literature on the memorability of negative ads [50], importance of outgroup prejudice for social identity [51, 52], and biases women face in politics [53, 54]. We hope the many word lists in the appendix will be useful to researchers in the development of new hypotheses.

$\hat{U}$ is also useful to understand models built on social media data. Until recently, the models themselves were not very important. However, machine learning can now be used to estimate sensitive traits such as criminal recidivism [43]. Given the literalness with which estimates are often interpreted, it is essential to note that model weights are causal for the predicted label. In Section 4.5 we use our understanding of the input features to characterize information the model extracts to predict religion. This dataset also includes demographic labels, which show predicted religion labels are more gendered than the ground truth.

We hope the included word lists (a) highlight unstudied relationships about these traits and (b) illustrate what kind of information is extracted from social media by machine learning systems.

4 Results and Discussion

4.1 Experimental Setup

All BoW experiments employ the same preprocessing. Users must have over 500 words in the sum of all their statuses. 80% of the data is randomly assigned to the training set; the remaining samples constitute the test set. The vocabulary is limited to the 40,000 most common words in each training set. Words must be used by at least 10 users but no more than 60% of users in the training set. The regularization parameter is tuned via efficient leave one out cross validation [55] when $n<10,000$, and $3$-fold cross validation for larger datasets. All BoW models are implemented using the sklearn library [56]. Table 1 reports the number of samples and explained variance (EV) of the predictions on continuous data. Table 2 reports the number of classes, ratio of samples in the dominant class, homogeneity, and performance on tasks with categorical data.
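The vocabulary rules above can be sketched as follows. This is an illustration under assumptions, not the authors' code: "most common" is taken to mean highest total count, the user-frequency filter is applied before truncating to the size limit, and `build_vocabulary` is a hypothetical name. The toy call uses small thresholds so the effect is visible.

```python
from collections import Counter


def build_vocabulary(user_tokens, max_size=40000, min_users=10, max_user_frac=0.6):
    """user_tokens: one list of word tokens per training-set user."""
    n_users = len(user_tokens)
    total = Counter()      # how often each word is used overall
    user_freq = Counter()  # how many users use each word at least once
    for tokens in user_tokens:
        total.update(tokens)
        user_freq.update(set(tokens))
    eligible = [w for w in total
                if min_users <= user_freq[w] <= max_user_frac * n_users]
    # Assumption: rank by total count, ties broken alphabetically.
    eligible.sort(key=lambda w: (-total[w], w))
    return eligible[:max_size]


toy_users = [["a", "b"], ["a", "c"], ["a", "b", "b"], ["d"]]
vocab = build_vocabulary(toy_users, max_size=10, min_users=2, max_user_frac=0.6)
```

In the toy data, "a" is dropped for being used by more than 60% of users, while "c" and "d" are dropped for being used by too few.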

Table 1: Prediction Accuracy on Continuous Data

Label	N	EV
Personality
Openness	84451	0.171
Conscientiousness	84451	0.120
Extroversion	84451	0.141
Agreeableness	84451	0.090
Neuroticism	84451	0.100
Sensational Interests
Militarism	4074	0.165
Violent-Occult	4074	0.192
Intellectual Recreation	4074	0.033
Occult Credulousness	4074	0.144
Wholesome Activities	4074	0.108
Satisfaction With Life	2502	0.034
Self Disclosure	2006	0.092
Fair-Mindedness	2006	0.064
IQ	1807	0.128

Explained Variance (EV) is $1-\frac{\mathrm{Var}(y-\hat{y})}{\mathrm{Var}(y)}$, where $\hat{y}$ is the predicted label.
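The EV definition can be checked on toy numbers with a few lines of standard-library Python. The function name and the data are illustrative only.

```python
from statistics import pvariance


def explained_variance(y, y_hat):
    """EV = 1 - Var(y - y_hat) / Var(y)."""
    residuals = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - pvariance(residuals) / pvariance(y)


y_true = [1.0, 2.0, 3.0, 4.0]
ev_perfect = explained_variance(y_true, [1.0, 2.0, 3.0, 4.0])
ev_mean = explained_variance(y_true, [2.5, 2.5, 2.5, 2.5])
ev_shifted = explained_variance(y_true, [2.0, 3.0, 4.0, 5.0])
```

Because only the variance of the residual enters, a prediction that is off by a constant (the last case) still scores an EV of 1, while always predicting the mean scores 0.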

Table 2: Prediction Accuracy on Categorical Data

Label	N	Classes	Mode	Homogeneity	F1-score	Acc
Gender	109104	2	0.598	0.519	0.92	0.903
Race	22059	3	0.682	0.52	0.74	0.766
Political identity	19769	12	0.213	0.133	0.33	0.337
Religious identity	8388	5	0.488	0.318	0.54	0.541
Belief in Star Sign	7115	5	0.331	0.245	0.32	0.334

Mode is the ratio of the dominant class. Homogeneity is the probability two random samples will be of the same class. The F1-Score is the harmonic mean of precision and recall. For non-binary labels, the precision and recall for each class is weighted by its support.
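The Mode and Homogeneity columns can be reproduced from class labels as sketched below, assuming the two random samples are drawn with replacement, so that homogeneity is the sum of squared class shares. With a 59.8% / 40.2% split this gives about 0.519, which matches the Gender row of Table 2. The helper name and placeholder labels are hypothetical.

```python
from collections import Counter


def mode_and_homogeneity(labels):
    """Return (share of the dominant class, P(two random samples share a class))."""
    n = len(labels)
    shares = [count / n for count in Counter(labels).values()]
    # Assumption: sampling with replacement, so P(same class) = sum of p_c^2.
    return max(shares), sum(p * p for p in shares)


toy_labels = ["a"] * 598 + ["b"] * 402  # same class split as the Gender row
mode, homogeneity = mode_and_homogeneity(toy_labels)
```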

Table 3: Gender Prediction

Model	Accuracy
Human Majority Vote	0.840
LIWC	0.784
Tri-grams	0.914
Tri-grams + LIWC	0.916
BoW (40k Vocab)	0.903
BoW (500k Vocab)	0.928
49 layer char-CNN	0.901

Human baseline is the majority vote (n=210) in gender prediction on Twitter data [29]. LIWC and Tri-grams are reported in [19].

4.2 Performance

4.2.1 Gender

Table 3 compares our gender predictor to several other methods. The BoW model with a vocabulary of 500,000 yields accuracy of 92.8%, 1.4 percentage points more accurate than the tri-gram model reported by Schwartz et al [19]. Even though the same dataset is used, the comparison is not direct. The tri-gram model seeks to remove the age information from words, has a larger vocabulary, preserves some temporal relationships in the tri-grams, and draws a different train/test split. Moreover, the preprocessing is more restrictive and only includes users with at least 1000 words. Notwithstanding these discrepancies, which may boost or dampen performance, the results are very similar. When the LIWC representation is added to the tri-grams, there is a slight improvement to 91.6% accuracy. Preprocessing is even less similar for the char-CNN described in Section 2.2.2. The human baseline of 84.0% consists of volunteer judgments based on 20-40 user tweets as reported by Nguyen et al [29]. This is less text than is available to the other models, and from a different social media platform. But, with 210 volunteer guesses per user, it provides a relevant human baseline.

4.2.2 Personality

After gender, personality is the most studied trait in this paper. Likewise, Schwartz et al achieve the best results to date [19]. They report the square root of EV to two significant digits: 0.42, 0.35, 0.38, 0.31, 0.31. In that format, we are just 0.01 beneath the state of the art for openness and agreeableness, 0.01 better for neuroticism, and equivalent for the remaining traits. As with gender, we achieve this with a simpler model.
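The comparison can be reproduced from Table 1 by taking the square root of each EV and rounding to two digits. The sketch assumes the five values quoted from Schwartz et al. are listed in the same trait order as Table 1, which the text implies but does not state.

```python
import math

# EV values from Table 1.
ours_ev = {"openness": 0.171, "conscientiousness": 0.120, "extroversion": 0.141,
           "agreeableness": 0.090, "neuroticism": 0.100}
# Values quoted from Schwartz et al. [19]; the trait order is an assumption.
schwartz = {"openness": 0.42, "conscientiousness": 0.35, "extroversion": 0.38,
            "agreeableness": 0.31, "neuroticism": 0.31}

ours_sqrt = {trait: round(math.sqrt(ev), 2) for trait, ev in ours_ev.items()}
difference = {trait: round(ours_sqrt[trait] - schwartz[trait], 2) for trait in ours_ev}
```

Here `difference` holds our value minus the reported one, so negative entries mark traits where this model is behind.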

4.2.3 Political Identity

Prediction accuracy of 33.7% is a gain of 11.7% over the baseline strategy of always predicting the mode, ‘doesn’t care’. As noted in the experiments section, training samples are weighted inversely to their class representation; therefore, ignoring any class will result in an equal loss. This does not provide the highest classification accuracy. However, we believe when some classes are sparsely populated an MSE optimal classifier that is highly biased toward the mode should not be the standard. For reference, equal sample weights and the same training scheme yield classification accuracy of 36.3% and a weighted F1 score of 31.6%. With equal weights, however, five classes (IPA, hates politics, independent, libertarian, and very liberal) have no representation in the test set predictions, whereas the weighted classifier predicts each class at least once.
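One common way to weight samples inversely to class representation is sketched below. The exact normalization used in the experiments is not given in the text, so the scaling here (every class receives the same total weight, and the weights sum to the number of samples) is an assumption, as is the function name.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """One weight per sample, inversely proportional to its class frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    # Assumption: scale so that each class carries total weight n / c.
    return [n_samples / (n_classes * counts[label]) for label in labels]


toy_labels = ["liberal", "liberal", "liberal", "IPA"]
weights = inverse_frequency_weights(toy_labels)
total_liberal = sum(w for w, label in zip(weights, toy_labels) if label == "liberal")
total_ipa = sum(w for w, label in zip(weights, toy_labels) if label == "IPA")
```

With these weights, misclassifying every member of a rare class costs as much as misclassifying every member of a common one.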

According to Preotiuc-Pietro et al., all previous research on predicting political ideology from social media text has used binary labels such as liberal vs conservative or Democrat vs Republican. They broaden the classification task to include seven gradations on the liberal to conservative spectrum [57]. When predicting ideological tilt from tweets, they achieve a 2.6% boost over baseline (19.6%) with BoW followed by logistic regression. Word2Vec feature embeddings [58] and multi-target learning with some hand-crafted labels yield an 8.0% boost. From classification along grades of a single spectrum, we significantly expand the task to twelve diverse identities with varying levels of representation and ideological overlap while maintaining classification accuracy.

In Table 6 we report the matrix of highest weighted words for separating users in each pairwise class comparison. As with race, belief in star sign, and religion, we plan on making expanded pairwise lists available online. In Table 7 we report the confusion matrix. Note that many errors are between similar labels, such as liberal and democrat. Ease of training, strong performance, and representation of minority classes make a majority vote system of shallow pairwise classifiers a good approach for this task.
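The pairwise majority vote can be sketched in a few lines. The callable `pairwise_winner` stands in for a trained binary classifier and is hypothetical, as is the rule of breaking ties by class order. For twelve classes the scheme needs 12 · 11 / 2 = 66 pairwise classifiers.

```python
from collections import Counter
from itertools import combinations


def n_pairwise(c):
    """Number of pairwise classifiers for c classes: c(c-1)/2."""
    return c * (c - 1) // 2


def one_vs_one_predict(classes, pairwise_winner, x):
    """Let every pair of classes vote on sample x and return the majority label."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b, x)] += 1
    # Assumption: ties go to the class listed first.
    return max(classes, key=lambda c: votes[c])


def toy_winner(a, b, scores):
    # Stand-in for a trained pairwise model: pick the class with the higher score.
    return a if scores[a] >= scores[b] else b


toy_classes = ["liberal", "conservative", "centrist"]
toy_scores = {"liberal": 0.2, "conservative": 0.9, "centrist": 0.5}
predicted = one_vs_one_predict(toy_classes, toy_winner, toy_scores)
```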

For binary comparison, by pooling {‘very liberal’,‘liberal’,‘democrat’} and {‘very conservative’,‘conservative’,‘republican’} we achieve 76.4% accuracy; 12.1% above baseline. Table 15 shows the top 55 liberal and conservative words.

4.2.4 Religion

Religion seems to be more difficult to glean from statuses than political identity. At 54.1%, accuracy is a modest 5.3% above guessing the mode. The most highly weighted pairwise words are in Table 8, and Table 9 shows the confusion matrix. The most highly weighted word to distinguish someone who is agnostic from an atheist is ‘boyfriend’. This led us to look deeper at that pairwise classifier in Section 4.5. Binary labels were constructed by pooling {‘catholic’, ‘christian-catholic’, ‘romancatholic’, ‘christian’, ‘christian-baptist’} and {‘atheist’, ‘agnostic’, ‘none’}. We achieve 78.0% accuracy, 5.2% above baseline. Those words are in Table 15. To our knowledge, there is no other multi class religion predictor to which our results can be compared.

4.2.5 IQ

In a genome wide association meta study of 78,308 individuals, 336 single nucleotide polymorphisms were found to explain 2.1-4.8% of the IQ variance among the test population [59]. We achieve 12.8% EV with a model trained on fewer than 2,000 users and their statuses. Using $\ell_{1}$ regularization to limit the vocabulary to the ten most informative words (final, physics; ayaw, family, friend, heart, lmao, nite, strong, ur) still yields 5.6% EV. The relative accuracy of such a trivial model that leverages intuitive features is a helpful comparison for any project predicting this important trait. To our knowledge, this is the only work to date that infers IQ from social media.
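To illustrate how $\ell_{1}$ regularization drives weak features to exactly zero, here is a small coordinate-descent sketch of the lasso on toy data. It is not the solver used for the IQ model (the paper does not say which one was used), it assumes centered data with no intercept, and the objective scaling ($\frac{1}{2n}\|y-Xw\|^{2}+\alpha\|w\|_{1}$) and all names are assumptions.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Toy data: feature 0 is strongly related to y, feature 1 only weakly.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.1, 2.0, 0.1]
w_toy = lasso_coordinate_descent(X_toy, y_toy, alpha=0.1)
```

Raising `alpha` removes more features, which is how a vocabulary could be reduced to a handful of words.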

The selected features are also informative. Words suggesting intelligence (‘final’ and ‘physics’) are parsimonious and singularly academic. Whereas the university experience is sufficient to find users with high IQ, features inversely related to IQ are more focused on disposition. From Table 10, agreeableness is implied by ‘family’ and ‘heart’; conscientiousness is implied by ‘family’ and ‘lmao’; and low openness is implied by ‘ur’. Overall, the list can be characterized as prosocial, or at least concerned with social relationships. Predicting low IQ with prosocial features seems to challenge some previous research.

Gottlieb et al observed that learning disabled children were more likely to engage in solitary play [60]. Play has also been observed to be more aggressive [61]. More directly related to our task, McConaughy and Ritter showed a positive correlation between the IQ of learning disabled boys and social competence scores; and a negative correlation between IQ and behavior problem scores [62]. For further review of the subject see [63].

An MSE optimal classifier seeks to generalize information about samples near the average. This can cause bias when classifying minorities, but is instructive when interpreting features. Features should say something about the majority of our sample, those with IQ near the mean. This explains why antisocial behavior among those with extremely low IQ does not preclude prosocial behavior indicating moderately lower IQ. Reflecting the limitations of this type of study, words like ‘family’, ‘friend’, and ‘heart’ could also be caused by differing norms for social media use or many other factors. Prosocial words predicting lower IQ does however suggest interesting future work.

4.2.6 Sensational Interests

In this study, SIQ is the easiest continuous variable to predict, even with an order of magnitude less training data than personality. The SIQ lists 28 discrete interests like ‘black magic’ and ‘the armed forces’. Very similar terms can be recovered from statuses: ‘zombie’, ‘blood’, ‘vampire’; ‘military’, ‘marines’, ‘training’. Personality tests, on the other hand, ask more abstract questions like ‘I shirk my duties’ for conscientiousness. Many of these duties seem to be extracted in Table 10: ‘studying’, ‘busy’, ‘obstacles’. But many more training examples are required for similar performance.

This is the first work to demonstrate an automatic system for predicting SIQ. Previous research relied on manually counting the number of sensational interests in statuses. The count was only correlated with militarism among men; the relationship was negative for women [16].

4.2.7 Satisfaction With Life

Previous research cast doubt on the relationship between status updates and SWL [17]. The number of positive words used on Facebook nationwide in a given day, week, or month, is inversely correlated with the SWL of that time period’s myPersonality participants. The interpretation of that result is that it “challenges the assumption that linguistic analysis of internet messages is related to underlying psychological states.” Here we show that a BoW model accounts for 3.4% of the variance in SWL scores. Moreover, the most important words the model finds are intuitive. Lower SWL is implied by “fucking”, “hate”, “bored”, “interview”, “sick”, “hospital”, “insomnia”, “farmville”, and “video”. The deleterious effects of joblessness, anger, chronic illness, and isolation are well documented. Words positively associated with SWL—“camping”, “imagination”, “epic”, “cleaned”, “success”—make similar sense.

Conversational AI on Facebook Messenger is an efficacious and scalable way to administer cognitive behavioral therapy [6]. Our results show linguistic analysis can shed light on underlying psychological states. This is important to find users that could benefit from such treatment.

4.2.8 Belief in Star Sign

Compared to political identity, BSS has seven fewer classes and a far more homogeneous distribution. Even so, the BSS classifier performs slightly worse than the politics classifier and roughly on par with the baseline of predicting the mode. Unlike our race, gender, politics and sensational interests, we don’t wear belief in astrology on our sleeve.

4.3 Model Selection

BoW models are somewhat unintuitive. Humans use syntactic information when decoding language, which the model discards. Yet, for many tasks they achieve state of the art performance. We compare our BoW to a character-level CNN on gender prediction, our most data rich problem. A character-level CNN is well suited to large amounts of messy, user generated data. Pooling layers in a CNN allow generalization of words like “gooooooooo” and “gooooooo”, while BoW must learn distinct weights. Surprisingly, the CNN does not outperform the simple BoW as shown in Table 3.

We found that the choice of prediction model is not as important as preprocessing. In initial experiments, Support Vector Machines [64], logistic regression, and $\ell_{2}$-regularized regression yielded similar performance, depending on the choice of $n$-grams and whether Singular Value Decomposition was used [65]. We implement ridge regression and classification for simplicity.

Inferring human traits from social media is now being done using deep models [66, 57]. That may be useful in some cases, but for this project the deep model offered no performance boost or insight into underlying human behavior. Perhaps a continuous bag of words [58] and recurrent neural network [67] would have done better, but researchers should not consider deep learning essential for this field. Moreover, any performance gains should be weighed against loss of interpretability.

4.4 Cambridge Analytica

With current technology, Facebook statuses are a better predictor of someone’s IQ than the totality of their genetic material [59]. When a marketing firm adds such a tool to their arsenal it is natural to be suspicious. Indeed, The Guardian article that broke the CA story was headlined “‘I made Steve Bannon’s psychological warfare tool’: meet the data war whistleblower” [24]. (Steve Bannon is the former chief executive of the Trump presidential campaign.) However, closer inspection of psychographic models casts doubt on their ability to add value to an advertising campaign, even when the predictions are accurate. In this paper we show that militarism is one of the most easily inferred traits. At 16.5% explained variance, it is more predictable than any of the Big 5 personality traits except openness, even with just 5% of the training data. SIQ is also a much stronger predictor of aggressive behavior than the Big 5 [14]. If this trait were actionable for the Trump campaign, it is interesting that the two most highly weighted features are ‘xbox’ and ‘man’. Gaming interest and gender are already available via Facebook’s advertising platform; reaching that demographic does not require an independent model. Additionally, Steve Bannon’s belief in the political power of gamers predates CA’s psychographic model by a decade [68].

Readers are encouraged to view the word lists in the Appendix through the lens of task accuracy in Tables 1 and 2. They may come to the same conclusion as the Trump campaign, which, according to CBS News, “never used the psychographic data at the heart of a whistleblower who once worked to help acquire the data’s reporting – principally because it was relatively new and of suspect quality and value.” [69]. Performance results and extracted features allow for more informed discussion, particularly for SIQ, fair-mindedness and self-disclosure, on which we report the first accurate prediction model.

There are limitations to this analysis. Our models only use statuses; Likes and network statistics could increase accuracy. Further, other psychographic traits beyond militarism may be politically useful but have no obvious demographic stand-in. Finally, we don’t have access to CA’s exact dataset and instead built our models on the myPersonality dataset.

Table 4: Agnostic vs Atheist Confusion Matrix

		Predicted (Men)			Predicted (Women)
		Agnostic	Atheist	Total	Agnostic	Atheist	Total
True	Agnostic	36	33	69	86	21	107
True	Atheist	28	58	86	34	16	50
	Total	64	91		120	37

Table 5: Fair Agnostic vs Atheist Confusion Matrix

		Predicted (Men)			Predicted (Women)
		Agnostic	Atheist	Total	Agnostic	Atheist	Total
True	Agnostic	40	29	69	85	22	107
True	Atheist	31	55	86	31	19	50
	Total	71	84		116	41

4.5 Gender Bias in Atheist vs Agnostic Classifier

Highly weighted atheist words include “fucking”, “bloody”, “maths”, “degrees”, “disease”, “wifey”, and “religion”. Meanwhile, “beautiful”, “santa”, “friggin”, “thank”, “hubby”, “miles”, and “paperwork” imply the user is agnostic. This paints a picture of academic, male, disagreeable and British atheists. Agnostic words are more positive, female, and related to mundane preparation. A more complete list is shown in Table 15. What follows is an empirical analysis of our estimator’s gender bias, a discussion of fairness, and results from debiasing the model.

In this dataset, atheists and agnostics are 33.5% and 50.3% female, respectively. This is a stronger female preference for agnosticism than in random surveys across the United States, which report 32% and 38%, respectively [70]. Table 4 shows the confusion matrices for men and women. The ratio of true to predicted atheists is 0.945 for men (86/91) and 1.35 for women (50/37). Similarly, the ratio of false atheist to false agnostic predictions is 90.8% larger for men than for women. The classification of women, the minority in this dataset, is highly distorted.

Models built to generalize information often amplify biases in training data. Cooking videos elicit female pronouns in machine-generated captions 68% more than male pronouns, even though the training data shows only 33% more women cooking [71]. Word embeddings used in machine translation [72], information retrieval [73], and student grade prediction [74] produce analogies such as “man is to computer programmer as woman is to homemaker” [75].

There are many notions of fairness defined over an individual [76, 77, 78], population [79, 80], or information available to the model [81]. Building a fair estimator often requires domain knowledge to define a similarity metric [76], make corpus-level constraints [71], or construct a causal model that separates protected information from other latent variables [78]. In this paper, we will use the notion of Disparate Mistreatment to measure fairness [79]. That is, if protected classes experience disparate rates of false positive, false negative or overall misclassification, the estimator is unfair.
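As a concrete reading of Disparate Mistreatment, the per-group error rates can be computed directly from the counts in Table 4. This is a minimal sketch: treating “atheist” as the positive class is an assumption made here for illustration, and the class name `GroupConfusion` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupConfusion:
    """2x2 counts for one protected group; 'atheist' is the (assumed) positive class."""
    true_neg: int   # agnostic predicted agnostic
    false_pos: int  # agnostic predicted atheist
    false_neg: int  # atheist predicted agnostic
    true_pos: int   # atheist predicted atheist

    def false_positive_rate(self) -> float:
        return self.false_pos / (self.false_pos + self.true_neg)

    def false_negative_rate(self) -> float:
        return self.false_neg / (self.false_neg + self.true_pos)

    def error_rate(self) -> float:
        total = self.true_neg + self.false_pos + self.false_neg + self.true_pos
        return (self.false_pos + self.false_neg) / total


# Counts copied from Table 4.
men = GroupConfusion(true_neg=36, false_pos=33, false_neg=28, true_pos=58)
women = GroupConfusion(true_neg=86, false_pos=21, false_neg=34, true_pos=16)

for name, group in (("men", men), ("women", women)):
    print(f"{name}: FPR={group.false_positive_rate():.3f} "
          f"FNR={group.false_negative_rate():.3f} "
          f"error={group.error_rate():.3f}")
```

Under this convention the two groups have similar overall error rates but very different false positive and false negative rates: 34 of 50 atheist women are misclassified, against 28 of 86 atheist men.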

To mitigate Disparate Mistreatment we explicitly encode gender in the feature vector at training time, using $\{-1, 0, 1\}$ for {male, unknown, female}. At test time the gender of all samples is encoded as unknown. The intuition is that latent variables are amplified when they are easy to extract and correlated with the target. As demonstrated by the accuracy of our race and gender predictors, that is often the case for protected information. There often exist more informative, if more subtle, traits than the protected features. For example, atheists and agnostics report a yawning gap in the share who don’t believe in God, at 92% and 41% [70]. Additionally, religiosity is shown to be correlated with both Agreeableness and Conscientiousness [82]. But gender is much easier to extract than belief in God or personality. By explicitly giving the model gender information, we hope that the model will do more to extract those other features.
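The encoding step can be sketched as follows. The helper `add_gender_feature` and its list-of-lists feature layout are hypothetical names chosen for illustration; only the $-1$/0/$1$ coding and the train/test asymmetry come from the description above.

```python
# Coding from the text: male = -1, unknown = 0, female = 1.
GENDER_CODE = {"male": -1.0, "unknown": 0.0, "female": 1.0}


def add_gender_feature(features, genders, training):
    """Append one gender column to each feature vector.

    At training time the recorded gender is used. At test time every
    sample is coded as unknown (0), so the fitted gender weight
    contributes nothing to the prediction.
    """
    rows = []
    for vector, gender in zip(features, genders):
        code = GENDER_CODE[gender] if training else GENDER_CODE["unknown"]
        rows.append(list(vector) + [code])
    return rows


# Toy word-count vectors for two users (illustrative values only).
bow = [[1, 0, 2], [0, 3, 1]]
genders = ["male", "female"]

train_rows = add_gender_feature(bow, genders, training=True)
test_rows = add_gender_feature(bow, genders, training=False)

print(train_rows)
print(test_rows)
```

In a linear model, zeroing the column at test time means whatever the model attributed to gender directly during training is dropped from the prediction, leaving only the weights on the word features.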

This approach produces much less Disparate Mistreatment of men and women. The ratio of true to predicted atheists moves closer to parity at 1.02 for men (86/84) and 1.22 for women (50/41), as shown in Table 5. Additionally, the ratio of false atheist to false agnostic predictions is now only 31.8% larger for men, compared to 90.8% without intervention. The most highly weighted agnostic words for the new fair classifier are also less gendered; “hair”, “wifey”, and “boyfriend” are no longer in the top 55, as reported in Table 15. We also saw no decay in classification rate.
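The figures quoted in this section can be recomputed from the counts in Tables 4 and 5. In the sketch below the reported values 0.945 and 1.35 (Table 4) and 1.02 and 1.22 (Table 5) are reproduced by dividing the number of true atheists by the number of predicted atheists; the function names and the nested-tuple layout are illustrative choices, not part of the original analysis.

```python
# Rows are true labels (agnostic, atheist); columns are predictions
# (agnostic, atheist). Counts are copied from Tables 4 and 5.
TABLE_4 = {"men": ((36, 33), (28, 58)), "women": ((86, 21), (34, 16))}
TABLE_5 = {"men": ((40, 29), (31, 55)), "women": ((85, 22), (31, 19))}


def true_to_predicted_atheists(m):
    """True atheists divided by predicted atheists."""
    true_atheists = m[1][0] + m[1][1]
    predicted_atheists = m[0][1] + m[1][1]
    return true_atheists / predicted_atheists


def false_atheist_to_false_agnostic(m):
    """Agnostics labeled atheist divided by atheists labeled agnostic."""
    return m[0][1] / m[1][0]


def gender_gap_percent(table):
    """How much larger (in percent) the false-prediction ratio is for men."""
    men = false_atheist_to_false_agnostic(table["men"])
    women = false_atheist_to_false_agnostic(table["women"])
    return (men / women - 1.0) * 100.0


for label, table in (("Table 4", TABLE_4), ("Table 5", TABLE_5)):
    men = true_to_predicted_atheists(table["men"])
    women = true_to_predicted_atheists(table["women"])
    print(f"{label}: men={men:.3f} women={women:.3f} "
          f"gap={gender_gap_percent(table):.1f}%")
```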

The gender bias of the atheism classifier is clear from simply inspecting its most heavily weighted features. More opaque models should be subjected to more rigorous inspection for bias.

5 Conclusion and Future Work

We match or set the state of the art for the 20 traits in this paper. Additionally, we provide the top words for many pairwise classification problems, and top 55 words for regression or binary classification problems. We hope researchers from many fields find the benchmarks and word lists useful. Our analysis of psychographic models in marketing as well as gender bias in a religion classifier are examples of how these performance measures and extracted features can be used together.

In future work we hope to explore what types of unfairness can be solved by our approach in Section 4.5. Further, models built on traits with few examples are well suited to be augmented by transfer learning. This is especially pressing for detecting states like low satisfaction with life, which can be somewhat ameliorated at low cost.

References

[1] J. B. Stewart, “Facebook has 50 minutes of your time each day. it wants more,” The New York Times, vol. 5, 2016.
[2] SunCorp, “Digitising reputation pays off in the rental market,” 2017.
[3] A. E. Khandani, A. J. Kim, and A. W. Lo, “Consumer credit-risk models via machine-learning algorithms,” Journal of Banking & Finance, vol. 34, no. 11, pp. 2767–2787, 2010.
[4] D. L. Cogburn and F. K. Espinoza-Vasquez, “From networked nominee to networked nation: Examining the impact of web 2.0 and social media on political participation and civic engagement in the 2008 obama campaign,” Journal of Political Marketing, vol. 10, no. 1-2, pp. 189–213, 2011.
[5] R. J. González, “Hacking the citizenry?: Personality profiling, ‘big data’ and the election of donald trump,” Anthropology Today, vol. 33, no. 3, pp. 9–12, 2017.
[6] K. K. Fitzpatrick, A. Darcy, and M. Vierhile, “Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): a randomized controlled trial,” JMIR mental health, vol. 4, no. 2, 2017.
[7] R. Allan, “Hard questions: Who should decide what is hate speech in an online global community?,” 2017.
[8] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec, “Antisocial behavior in online discussion communities.,” in ICWSM, pp. 61–70, 2015.
[9] A. Noulas, S. Scellato, R. Lambiotte, M. Pontil, and C. Mascolo, “A tale of many cities: universal patterns in human urban mobility,” PloS one, vol. 7, no. 5, p. e37027, 2012.
[10] S.-H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha, “Like like alike: joint friendship and interest propagation in social networks,” in Proceedings of the 20th international conference on World wide web, pp. 537–546, ACM, 2011.
[11] M. Kosinski, S. C. Matz, S. D. Gosling, V. Popov, and D. Stillwell, “Facebook as a research tool for the social sciences: Opportunities, challenges, ethical considerations, and practical guidelines.,” American Psychologist, vol. 70, no. 6, p. 543, 2015.
[12] J. Henrich, S. J. Heine, and A. Norenzayan, “The weirdest people in the world?,” Behavioral and Brain Sciences, vol. 33, no. 2-3, pp. 61–83, 2010.
[13] V. Egan, J. Auty, R. Miller, S. Ahmadi, C. Richardson, and I. Gargan, “Sensational interests and general personality traits,” The Journal of Forensic Psychiatry, vol. 10, no. 3, pp. 567–582, 1999.
[14] V. Egan and V. Campbell, “Sensational interests, sustaining fantasies and personality predict physical aggression,” Personality and Individual Differences, vol. 47, no. 5, pp. 464–469, 2009.
[15] A. Weiss, V. Egan, and A. J. Figueredo, “Sensational interests as a form of intrasexual competition,” Personality and Individual Differences, vol. 36, no. 3, pp. 563–573, 2004.
[16] G. Hagger-Johnson, V. Egan, and D. Stillwell, “Are social networking profiles reliable indicators of sensational interests?,” Journal of Research in Personality, vol. 45, no. 1, pp. 71–76, 2011.
[17] N. Wang, M. Kosinski, D. Stillwell, and J. Rust, “Can well-being be measured using facebook status updates? validation of facebook’s gross national happiness index,” Social Indicators Research, vol. 115, no. 1, pp. 483–491, 2014.
[18] M. Kosinski, D. Stillwell, and T. Graepel, “Private traits and attributes are predictable from digital records of human behavior,” Proceedings of the National Academy of Sciences, vol. 110, no. 15, pp. 5802–5805, 2013.
[19] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E. Seligman, et al., “Personality, gender, and age in the language of social media: The open-vocabulary approach,” PloS one, vol. 8, no. 9, p. e73791, 2013.
[20] G. Farnadi, G. Sitaraman, S. Sushmita, F. Celli, M. Kosinski, D. Stillwell, S. Davalos, M.-F. Moens, and M. De Cock, “Computational personality recognition in social media,” User modeling and user-adapted interaction, vol. 26, no. 2-3, pp. 109–142, 2016.
[21] M. Sap, G. Park, J. Eichstaedt, M. Kern, D. Stillwell, M. Kosinski, L. Ungar, and H. A. Schwartz, “Developing age and gender predictive lexica over social media,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1146–1151, 2014.
[22] The New York Times, “How trump consultants exploited the data of millions,” 2018.
[23] MarketWatch, “Facebook valuation drops $75 billion in week after cambridge analytica scandal,” 2018.
[24] The Guardian, “‘i made steve bannon’s psychological warfare tool’: meet the data war whistleblower,” 2018.
[25] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry and word count: Liwc 2001,” Mahway: Lawrence Erlbaum Associates, vol. 71, no. 2001, p. 2001, 2001.
[26] W. Youyou, M. Kosinski, and D. Stillwell, “Computer-based personality judgments are more accurate than those made by humans,” Proceedings of the National Academy of Sciences, vol. 112, no. 4, pp. 1036–1040, 2015.
[27] A. Conneau, H. Schwenk, L. Barrault, and Y. Lecun, “Very deep convolutional networks for text classification,” in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, vol. 1, pp. 1107–1116, 2017.
[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012.
[29] D. Nguyen, D. Trieschnigg, A. S. Doğruöz, R. Gravel, M. Theune, T. Meder, and F. De Jong, “Why gender and age prediction from tweets is hard: Lessons from a crowdsourcing experiment,” in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1950–1961, 2014.
[30] R. Bivens, “The gender binary will not be deprogrammed: Ten years of coding gender on facebook,” New Media & Society, vol. 19, no. 6, pp. 880–898, 2017.
[31] J. M. Digman, “Personality structure: Emergence of the five-factor model,” Annual review of psychology, vol. 41, no. 1, pp. 417–440, 1990.
[32] R. R. McCrae and P. T. Costa, “Validation of the five-factor model of personality across instruments and observers.,” Journal of personality and social psychology, vol. 52, no. 1, p. 81, 1987.
[33] M. LLC, “The development and piloting of an online iq test,” 2014.
[34] M. Kosinski, “Measurement and prediction of individual and group differences in the digital environment,” Department of Psychology University of Cambridge, 2014.
[35] J. R. Flynn, “Massive iq gains in 14 nations: What iq tests really measure.,” Psychological bulletin, vol. 101, no. 2, p. 171, 1987.
[36] E. Diener, R. A. Emmons, R. J. Larsen, and S. Griffin, “The satisfaction with life scale,” Journal of personality assessment, vol. 49, no. 1, pp. 71–75, 1985.
[37] L. Cooke, J. Wardle, E. Gibson, M. Sapochnik, A. Sheiham, and M. Lawson, “Demographic, familial and trait predictors of fruit and vegetable consumption by pre-school children,” Public health nutrition, vol. 7, no. 2, pp. 295–302, 2004.
[38] M. Peciña, H. Azhar, T. M. Love, T. Lu, B. L. Fredrickson, C. S. Stohler, and J.-K. Zubieta, “Personality trait predictors of placebo analgesia and neurobiological correlates,” Neuropsychopharmacology, vol. 38, no. 4, p. 639, 2013.
[39] L. C. Quilty, M. Sellbom, J. L. Tackett, and R. M. Bagby, “Personality trait predictors of bipolar disorder symptoms,” Psychiatry Research, vol. 169, no. 2, pp. 159–163, 2009.
[40] R. P. Tett, D. N. Jackson, and M. Rothstein, “Personality measures as predictors of job performance: a meta-analytic review,” Personnel psychology, vol. 44, no. 4, pp. 703–742, 1991.
[41] G. Park, H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, M. Kosinski, D. J. Stillwell, L. H. Ungar, and M. E. Seligman, “Automatic personality assessment through social media language.,” Journal of personality and social psychology, vol. 108, no. 6, p. 934, 2015.
[42] N. Cesare, C. Grant, and E. O. Nsoesie, “Detection of user demographics on social media: A review of methods and recommendations for best practices,” arXiv preprint arXiv:1702.01807, 2017.
[43] J. Kleinberg, S. Mullainathan, and M. Raghavan, “Inherent trade-offs in the fair determination of risk scores,” arXiv preprint arXiv:1609.05807, 2016.
[44] O. P. John and S. Srivastava, “The big five trait taxonomy: History, measurement, and theoretical perspectives,” Handbook of personality: Theory and research, vol. 2, no. 1999, pp. 102–138, 1999.
[45] J. M. Kleinberg, “An impossibility theorem for clustering,” in Advances in neural information processing systems, pp. 463–470, 2003.
[46] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM computing surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.
[47] R. Shamir and R. Sharan, “Algorithmic approaches to clustering gene expression data,” Current Topics in Computational Molecular Biology, p. 269, 2002.
[48] S. Dixon, E. Pampalk, and G. Widmer, “Classification of dance music by periodicity patterns,” 2003.
[49] N. Meinshausen and B. Yu, “Lasso-type recovery of sparse representations for high-dimensional data,” The Annals of Statistics, pp. 246–270, 2009.
[50] R. R. Lau, L. Sigelman, and I. B. Rovner, “The effects of negative political campaigns: a meta-analytic reassessment,” Journal of Politics, vol. 69, no. 4, pp. 1176–1209, 2007.
[51] L. Huddy, “Group identity and political cohesion,” Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, 2003.
[52] N. R. Branscombe and D. L. Wann, “Collective self-esteem consequences of outgroup derogation when a valued social identity is on trial,” European Journal of Social Psychology, vol. 24, no. 6, pp. 641–657, 1994.
[53] M. C. Schneider and A. L. Bos, “Measuring stereotypes of female politicians,” Political Psychology, vol. 35, no. 2, pp. 245–266, 2014.
[54] K. Dolan, “The impact of gender stereotyped evaluations on support for women candidates,” Political Behavior, vol. 32, no. 1, pp. 69–88, 2010.
[55] A. Vehtari, A. Gelman, and J. Gabry, “Efficient implementation of leave-one-out cross-validation and waic for evaluating fitted bayesian models,” arXiv preprint arXiv:1507.04544, 2015.
[56] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[57] D. Preoţiuc-Pietro, Y. Liu, D. Hopkins, and L. Ungar, “Beyond binary labels: political ideology prediction of twitter users,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 729–740, 2017.
[58] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in neural information processing systems, pp. 3111–3119, 2013.
[59] S. Sniekers, S. Stringer, K. Watanabe, P. R. Jansen, J. R. Coleman, E. Krapohl, E. Taskesen, A. R. Hammerschlag, A. Okbay, D. Zabaneh, et al., “Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence,” Nature genetics, vol. 49, no. 7, p. 1107, 2017.
[60] B. W. Gottlieb, J. Gottlieb, D. Berkell, and L. Levy, “Sociometric status and solitary play of ld boys and girls,” Journal of Learning Disabilities, vol. 19, no. 10, pp. 619–622, 1986.
[61] T. Bryan, R. Wheeler, J. Felcan, and T. Henek, ““come on, dummy” an observational study of children’s communications,” Journal of Learning Disabilities, vol. 9, no. 10, pp. 661–669, 1976.
[62] S. H. McConaughy and D. R. Ritter, “Social competence and behavioral problems of learning disabled boys aged 6-11,” Journal of Learning Disabilities, vol. 19, no. 1, pp. 39–45, 1986.
[63] C. J. Bellanti and K. L. Bierman, “Disentangling the impact of low cognitive ability and inattention on social behavior and peer relationships,” Journal of Clinical Child Psychology, vol. 29, no. 1, pp. 66–75, 2000.
[64] J. A. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural processing letters, vol. 9, no. 3, pp. 293–300, 1999.
[65] G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” Numerische mathematik, vol. 14, no. 5, pp. 403–420, 1970.
[66] M. Iyyer, P. Enns, J. Boyd-Graber, and P. Resnik, “Political ideology detection using recursive neural networks,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1113–1122, 2014.
[67] B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S. Lehmann, “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm,” arXiv preprint arXiv:1708.00524, 2017.
[68] Wired, “The decline and fall of an ultra rich online gaming empire,” 2008.
[69] CBS News, “Trump campaign phased out use of cambridge analytica data before election,” 2018.
[70] Pew, “Religious landscape study.,” 2014.
[71] J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang, “Men also like shopping: Reducing gender bias amplification using corpus-level constraints,” arXiv preprint arXiv:1707.09457, 2017.
[72] W. Y. Zou, R. Socher, D. Cer, and C. D. Manning, “Bilingual word embeddings for phrase-based machine translation,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1393–1398, 2013.
[73] S. Clinchant and F. Perronnin, “Aggregating continuous word embeddings for information retrieval,” in Proceedings of the workshop on continuous vector space models and their compositionality, pp. 100–109, 2013.
[74] J. Luo, S. E. Sorour, K. Goda, and T. Mine, “Predicting student grade based on free-style comments using word2vec and ann by considering prediction results obtained in consecutive lessons.,” International Educational Data Mining Society, 2015.
[75] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai, “Man is to computer programmer as woman is to homemaker? debiasing word embeddings,” in Advances in Neural Information Processing Systems, pp. 4349–4357, 2016.
[76] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness through awareness,” in Proceedings of the 3rd innovations in theoretical computer science conference, pp. 214–226, ACM, 2012.
[77] M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth, “Rawlsian fairness for machine learning,” arXiv preprint arXiv:1610.09559, 2016.
[78] M. J. Kusner, J. Loftus, C. Russell, and R. Silva, “Counterfactual fairness,” in Advances in Neural Information Processing Systems, pp. 4069–4079, 2017.
[79] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi, “Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment,” in Proceedings of the 26th International Conference on World Wide Web, pp. 1171–1180, International World Wide Web Conferences Steering Committee, 2017.
[80] M. Hardt, E. Price, N. Srebro, et al., “Equality of opportunity in supervised learning,” in Advances in neural information processing systems, pp. 3315–3323, 2016.
[81] N. Grgic-Hlaca, M. B. Zafar, K. P. Gummadi, and A. Weller, “The case for process fairness in learning: Feature selection for fair decision making,” in NIPS Symposium on Machine Learning and the Law, vol. 1, p. 2, 2016.
[82] V. Saroglou, “Religiousness as a cultural adaptation of basic traits: A five-factor model perspective,” Personality and social psychology review, vol. 14, no. 1, pp. 108–125, 2010.

Table 6: Pairwise Politics Words

	IPA	anarchist	centrist	conserv.	dem.	doesn’t care	hates pol.	indep.	lib.	liber.	repub.	v. lib.
IPA		fuck	wishes	wishes	smh	yay	rain	congrats	wishes	money	church	damn
anarchist	excited		wishes	driving	excited	lol	dont	driving	excited	ready	ready	excited
centrist	xd	fuck		lord	today	tattoo	shit	surgery	shit	government	school	damn
conservative	xd	fuck	damn		fb	anymore	shit	damn	damn	art	school	damn
democrat	xd	fuck	wishes	tonight		stupid	fuck	died	wishes	government	church	wishes
doesn’t care	packers	fuck	wishes	lord	smh		shit	definitely	wishes	government	church	damn
hates politics	class	music	dey	loves	fb	tht		movie	wishes	email	camp	damn
independent	xd	fuck	wishes	lord	valentine	sitting	fuck		wishes	beer	parents	damn
liberal	xd	fuck	final	lord	im	xd	im	gonna		government	church	damn
libertarian	xd	fuck	headache	lord	walk	xd	dont	till	packing		girls	vacation
republican	xd	fuck	wishes	wishes	smh	mum	fuck	minute	wishes	fucking		damn
very liberal	xd	xd	boy	lord	im	xd	xd	school	missing	im	im

Table 7: Politics Confusion Matrix

	Predicted Label
	IPA	anar.	centrist	conserv.	dem.	doesn’t care	hates pol.	indep.	lib.	liber.	repub.	v. lib.	Total
IPA	0	2	3	3	11	18	2	1	3	1	16	1	61
anarchist	0	24	4	3	5	21	1	3	15	5	4	3	88
centrist	2	9	74	40	52	66	3	6	95	7	43	4	401
conservative	2	5	29	113	26	31	0	7	53	5	62	0	333
democrat	5	17	53	36	321	101	4	18	80	9	89	3	736
doesn’t care	3	39	51	29	122	373	12	12	105	12	102	9	869
hates politics	0	4	6	1	6	30	5	3	6	0	2	0	63
independent	0	8	16	13	35	22	1	8	29	4	25	1	162
liberal	1	18	51	27	74	51	6	6	223	15	24	13	509
libertarian	0	12	17	9	17	28	0	6	32	11	12	4	148
republican	1	8	19	57	67	64	1	8	29	3	179	3	439
very liberal	0	4	25	2	11	22	2	2	67	1	6	3	145
Total	14	150	348	333	747	827	37	80	737	73	564	44	3954

Table 8: Pairwise Religion Words

	atheist	agnostic	catholic	christian	none
atheist		boyfriend	thank	church	lol
agnostic	fucking		prayers	church	lol
catholic	fucking	fucking		lol	lol
christian	fucking	fucking	mass		xmas
none	fucking	apartment	god	church

The most highly weighted word from each pairwise classifier. Word implies top label.

Table 9: Religion Confusion Matrix

	Predicted Label
	Atheist	Agnostic	Catholic	Christian	None	Total
Atheist	68	29	17	16	21	151
Agnostic	54	69	27	55	11	216
Catholic	27	37	172	130	9	375
Christian	35	48	126	560	26	795
None	22	11	19	50	39	141
Total	206	194	361	811	106	1678

In the remaining tables the top 55 words are listed in order for each trait.

Table 10: Personality Words

Openness		Conscientious		Extroversion
-	+	-	+	-	+
bored	art	lost	gym	internet	party
boring	poetry	fucking	ready	quiet	guys
husband	beautiful	xd	weekend	bored	amazing
attitude	universe	phone	excited	listening	audition
shopping	peace	im	success	apparently	baby
dinner	poem	bored	finished	computer	haha
tv	writing	fuck	studying	stupid	dance
game	books	gonna	busy	pc	girls
proud	theatre	sick	vacation	hmm	fabulous
ur	dream	procrastination	arm	anime	blast
dentist	mind	internet	officially	tt	ready
daughter	book	computer	family	dark	im
dont	woman	probably	relax	probably	wine
haha	guitar	cousins	tennis	sims	success
stupid	damn	hates	wonderful	didn	lets
ni	awesome	sims	special	watching	excited
ipod	tea	anybody	win	slow	super
bed	apartment	charger	glad	depressing	text
justin	insomnia	sister	piano	calculus	chill
gift	xd	playing	scholarship	kind	phone
2nd	adventure	grounded	received	anymore	dear
hurt	cali	poker	lmao	repost	parties
ohh	far	tt	degrees	maybe	support
baseball	philosophy	status	state	draw	loves
mum	sigh	momma	tons	yay	pics
pray	nature	ftw	motor	trying	hey
school	maybe	press	obstacles	books	big
repost	music	dead	research	shadow	hit
booked	blues	failed	extremely	bother	met
lord	chill	forgot	circumstances	damned	pirate
ops	fam	depression	workout	suppose	ben
nice	epic	lazy	paid	reading	rocked
tmr	places	youtube	100	cat	gang
dam	rights	420	hit	poor	sex
idol	dragons	school	surgery	depression	sing
snowing	woot	http	law	sigh	btw
pissed	vampire	awsome	university	games	gorgeous
shut	soul	pokemon	anatomy	drawing	musical
maths	eclipse	woke	blessings	odd	cali
msn	drawing	dammit	hmmmm	10th	girlfriend
aldean	strange	hair	husband	pokemon	stoked
vodka	planet	wished	counting	nice	folks
comes	yay	cleaning	calc	essay	ponder
eid	dreams	fine	louis	pointless	wanna
alot	blood	dunno	delhi	managed	hahahaha
waste	sushi	enemy	final	looks	pool
worst	smoking	social	drive	grr	tanning
kiero	contact	yo	lets	darkness	hello
soo	lines	procrastinator	iphone	saw	pumped
mas	deep	black	lunch	crying	chillin
staff	genius	magic	yankees	lonely	theatre
12	novel	wasn	running	laptop	kiss
piss	smh	fans	weather	shouldn	office
transformers	worried	kinda	zone	paranoid	cock
car	folks	trying	smart	walking	lauren

Table 11: Personality Words Continued

Agreeable		Neurotic		Satisfaction With Life
-	+	-	+	-	+
fucking	wonderful	loving	sick	bored	family
stupid	amazing	girlfriend	nervous	fuck	loving
kill	awesome	wife	stressed	fucking	hope
shopping	haha	awesome	depression	hates	thankful
shit	smile	parties	depressed	bday	india
burn	happiness	party	anymore	apparently	wonderful
bitch	phone	weekend	lonely	damn	busy
pissed	urself	haha	stress	internet	friend
punch	family	doing	fucking	zero	heart
hates	blessed	game	tired	chem	man
death	status	sunday	trying	wat	yum
hell	music	kansas	depressing	supposed	fb
suck	woop	guy	sims	ma	glad
freak	hands	delicious	anxiety	hating	beautiful
piss	heart	beach	worst	spend	lauren
dead	spirit	definitely	hair	la	lord
xmas	smiles	swag	fed	dumb	wine
karma	guy	started	scream	young	swim
fight	moment	ready	fine	british	energy
blood	beautiful	hunting	nightmare	killed	lunch
awful	movie	power	rip	hmm	locked
deal	theres	funniest	tears	france	woot
misery	car	melody	horrible	chances	sons
fuck	dancing	hawaii	flu	simply	special
enemies	lord	action	worse	exams	trust
fake	guitar	hit	issues	mum	wish
pathetic	sore	chillin	scared	main	weeks
irony	sara	workout	stressful	hate	day
dumb	help	flow	fml	edge	father
cunt	walk	portland	care	dnt	tried
care	excited	seat	shes	party	journey
devil	prayers	smart	stressing	kept	hospital
black	knowing	snowboarding	ugh	dat	email
ich	valentines	knowing	sad	didn	business
russian	borrow	sore	gary	months	santa
idiots	laura	greatest	hates	du	walked
cunts	notifications	success	die	rain	lights
wtf	beard	basketball	actually	pass	kingdom
crap	reli	update	scary	bus	work
truck	snowboarding	gf	boyfriend	okay	lol
deleted	sorry	women	pills	australia	mommy
anger	chillin	gotta	crying	shooting	turkey
die	hill	followed	kitty	england	nap
tu	whats	jumping	awful	africa	revenge
nightmare	hearts	fool	hurt	rachel	truly
annoyed	kindness	dancing	bored	fml	son
rip	study	greatness	fair	metal	final
bloody	worry	blast	screaming	uk	reached
drama	clients	woke	dreading	school	survived
bitches	smells	ass	friggin	wtf	dont
stupidity	troops	hitting	suicide	matt	0
hair	sing	cock	miserable	freakin	god
wifi	goood	wise	quiet	15	kitchen
fat	holy	kiss	xd	200	normal
rage	faster	toes	sadness	free	blessing

Table 12: Sensational Interest Words

Militaristic		Violent-Occult		Intellectual Recreation
-	+	-	+	-	+
sleeping	man	lord	hell	im	life
ugh	xbox	pray	zombie	course	jon
sad	gets	cousins	damn	boring	beautiful
excited	gotta	church	fuck	painful	dancing
lovely	good	michael	bitch	decision	yoga
oh	training	allah	ass	hurts	thankful
hair	headed	jesus	drink	bus	peace
shopping	truck	game	blood	game	kinda
husband	guitar	0	lmao	stupid	truly
sick	guys	summer	xd	bak	la
cares	bro	gosh	woot	hero	ich
mum	gun	praise	halloween	problem	miss
boyfriend	boom	sunday	play	yeah	likes
lady	epic	dad	guys	christ	comfort
concert	work	loving	drunk	gona	lol
today	weight	mum	thanx	id	wtf
gaga	gym	team	animal	sittin	insomnia
okay	bike	hospital	sanity	die	chicken
pic	dang	10	fucking	horse	children
adorable	game	tv	dragons	yell	tired
sunday	blast	christ	burn	chuck	lovely
ordered	lol	heal	vampires	2day	ap
birth	war	usa	blah	tommorrow	funny
lots	black	personal	man	ow	things
poor	fish	best	loved	bored	man
ben	military	ray	pissed	fukin	simple
fine	woot	nervous	lil	inbox	thank
settings	12	thing	bday	race	period
birthday	till	look	send	basketball	countdown
cousins	ppl	week	body	word	baby
shoes	brave	2morrow	metal	rhys	beach
art	17	quite	head	tell	hey
omg	fight	poor	piss	step	depression
stop	success	brazil	blast	wats	jobs
wear	marines	cup	theyre	coke	cure
prince	hrs	zumba	cause	football	manage
round	sword	account	gun	penguins	sugar
come	make	website	death	won	aware
neighbours	ko	tryna	vampire	facebookers	singing
basement	friend	study	bleh	letters	egg
music	hit	haha	tattoo	awsome	taste
speak	play	soccer	ppl	dont	rains
thoughts	pics	feeling	dead	blah	log
story	hahaha	christmas	woman	till	taught
weird	troops	round	purple	playing	coolest
awful	army	youth	peaceful	dead	yellow
quite	running	story	message	fact	cheers
rachel	mag	bible	shit	learned	small
hear	strong	woah	angel	visit	society
alice	knw	grace	kinda	address	fly
tea	beer	prayers	tongue	14	social
promised	hehehe	plan	sushi	chilling	boo
jesus	comwatch	feat	wolf	win	beauty
actually	xoxo	anybody	poke	pokemon	world
counting	run	stressed	kick	sees	sunshine

Table 13: Sensational Interest Words Continued

Occult Credulousness		Wholesome Activities		Belief in Star Sign
-	+	-	+	No	Yes
church	zombie	coke	woot	minutes	omg
praise	ass	michigan	camping	didn	im
jesus	bitch	stupid	fish	church	ready
lord	halloween	pathetic	life	praise	friend
bible	animal	ops	yesterday	jesus	mind
christ	sign	husband	beautiful	probably	ass
team	omg	didn	rain	physics	butt
quite	xd	hurts	man	jess	stay
loving	job	kurwa	mexico	white	tom
pray	woot	evil	wish	religion	tomarrow
paper	wish	afternoon	river	iv	october
game	cure	problem	love	officially	promise
blessed	street	taylor	path	imagine	lol
salvation	vampire	idea	moon	christ	searching
ops	guys	jess	haha	germany	bitch
summer	send	glee	snow	giants	bleh
michael	lol	mum	bike	saw	eye
spent	thanx	mental	hahaha	wants	cute
youth	luck	meg	ghost	north	family
cousins	wtf	mad	baking	decided	halloween
word	nature	360	grandma	discovered	hanging
god	cancer	pissed	live	11th	haunted
homework	woohoo	club	goin	ouch	japanese
alarm	miss	uni	sky	skin	mother
0	barely	lyrics	cat	doesn	dinner
haha	moment	head	animal	bacon	card
player	bar	recently	netflix	train	help
sunday	safe	internet	birds	hahaha	bored
college	proud	min	smile	lasts	luv
wedding	woman	lesson	happiness	america	luck
prayer	mom	bus	mom	haven	neighbors
glory	away	rly	yum	burning	yum
forgiveness	dare	debate	fishing	pray	fireworks
ann	inches	kevin	truly	thursday	lmao
mm	boyfriend	inbox	fell	jessica	tt
political	il	jeez	make	prince	tired
fact	nd	official	clean	knew	person
greatest	pls	nite	portland	umm	nd
confused	aware	ms	smells	quiero	watch
appreciated	xmas	lack	lake	deserves	ya
algebra	hell	saw	create	heres	prom
brazil	solstice	troy	making	finds	crazy
travel	date	sims	2010	kim	upload
daughter	vampires	school	josh	heard	elf
bacon	copy	thinks	children	punch	hehe
laura	purple	thanking	laughing	groups	crack
personal	haunted	die	sa	car	bell
week	theyre	hates	law	amazing	human
greater	lmao	stuff	jobs	sick	finish
statement	later	band	earth	tape	lnk
messed	interview	thieves	gets	drink	june
tv	peeps	feels	hehehe	morn	change
em	peaceful	elm	swimming	dallas	costume
poor	drunk	germany	wa	cops	shit
trust	dunno	sat	monkeys	waters	decorating

Table 14: Psychographic Words

Self-Disclosure		Fair-Mindedness		IQ
-	+	-	+	-	+
bored	family	bored	excited	nite	exam
fuck	loving	wat	business	ur	hours
fucking	hope	soon	says	lmao	sigh
hates	thankful	dad	apartment	alot	camping
bday	india	xd	great	family	finish
apparently	wonderful	stage	delicious	omg	paper
damn	busy	pass	sure	2011	wtf
internet	friend	moon	needed	city	il
zero	heart	haha	seattle	lol	finds
chem	man	kitty	uni	help	important
wat	yum	tired	airport	wew	read
supposed	fb	mum	thankful	boy	physics
ma	glad	farmville	dallas	heart	google
hating	beautiful	face	learn	com	ra
spend	lauren	drank	weekend	angie	xd
la	lord	fuk	definitely	www	wifi
dumb	wine	fuck	dinner	ha	text
young	swim	ma	card	333	weeks
british	energy	sun	amazing	tom	studying
killed	lunch	crap	tonight	goodnight	training
hmm	locked	bday	exciting	history	course
france	woot	shit	degrees	xxx	student
chances	sons	hopefully	classes	xdd	magic
simply	special	feel	support	friend	kinda
exams	trust	fails	priceless	morning	everytime
mum	wish	va	oh	mum	raining
main	weeks	big	certainly	christmas	yea
hate	day	nd	government	eid	maths
edge	father	smoke	ticket	kay	semester
dnt	tried	yay	food	gives	maybe
party	journey	watchin	january	din	exciting
kept	hospital	sick	couple	beautiful	point
dat	email	wedding	php	folks	kno
didn	business	regret	journey	luv	excited
months	santa	seconds	universe	0	imma
du	walked	im	21	hacked	months
rain	lights	ignore	grateful	secrets	flying
pass	kingdom	tt	pay	iam	final
bus	work	lose	size	forgiveness	nah
okay	lol	marriage	class	strong	library
australia	mommy	lolz	situation	busy	used
shooting	turkey	fukin	duke	jo	chem
england	nap	picture	honesty	hate	brain
africa	revenge	blessing	austin	ti	everybody
rachel	truly	slow	tires	nightmare	awesome
fml	son	anxiety	29	ayaw	groups
metal	final	cy3	sisters	prayer	progress
uk	reached	library	mother	fought	champion
school	survived	tmr	heading	ow	calculus
wtf	dont	fucking	bc	sana	behave
matt	0	epic	piece	tired	den
freakin	god	il	summer	afraid	badly
15	kitchen	marie	breakfast	para	times
200	normal	bunch	answer	sum	mobil
free	blessing	loaded	surgery	movie	fun

Table 15: Religion and Politics Words

Agnostic vs Atheist		Agnostic vs Atheist (Fair)		Religious vs Not		Conservative vs Liberal
extra	physics	miles	fucking	church	fucking	church	damn
miles	fucking	working	physics	pray	fuck	truck	happy
turn	snowing	extra	wat	prayers	xmas	government	fb
hair	shit	awhile	fuck	god	damn	america	smh
packing	wat	packing	bloody	easter	shit	pray	marriage
awhile	write	turn	shit	lord	bloody	haha	xmas
insane	bloody	super	write	blessed	hell	prayers	chicago
working	enter	hubby	maths	christmas	ass	deer	sex
hubby	fuck	chill	xx	ugh	india	christmas	hell
points	sigh	free	snowing	praying	zombie	country	fam
friggin	thinks	sleepy	enter	hw	fuckin	tonight	lovely
santa	talk	santa	thinks	ppl	halloween	17	halloween
heck	weeks	heck	talk	prayer	car	lord	health
wishes	town	ready	science	game	yay	awesome	saw
child	science	friggin	sigh	believe	social	god	yoga
free	maths	vacation	hai	family	xx	military	celebrate
boyfriend	degrees	work	cancer	ready	quite	texas	gay
lady	lolz	thursday	person	fb	religion	freedom	apartment
learn	record	late	coursework	bless	drink	savior	wtf
super	xmas	points	town	im	oh	dad	thoughts
houston	tom	pack	xd	calling	using	bible	shit
service	hai	houston	weeks	dang	shitty	jesus	glee
pack	person	insane	tom	paper	internet	supper	gaga
late	dat	ya	film	jesus	fucked	girls	da
wanting	tyler	relax	dat	school	damned	huge	palin
hasn	cod	join	kill	camp	omfg	praying	2010
mai	afraid	busy	lolz	gosh	meh	camp	help
sleepy	untill	learn	msn	heart	indian	soldiers	mexico
worked	present	child	english	success	post	byu	mother
fly	wifey	headed	xmas	mary	head	christ	indian
chill	movie	favorite	chemistry	strength	cricket	disney	lady
join	xx	beautiful	afraid	butt	any1	risen	studies
kyle	cancer	season	na	fishing	dragon	beach	social
dun	boring	san	pierced	brother	lovely	tournament	art
thursday	rape	fly	dick	military	body	troops	holiday
taken	month	worked	anatomy	sad	new	schools	shitty
childhood	kill	service	bbc	uncle	boyfriend	leave	ve
mother	welcome	spring	tell	senior	teeth	ill	free
thank	clinton	wanting	untill	fair	nice	blonde	earthquake
headed	nicht	halloween	memory	mom	fml	armed	street
ya	ay	lady	bothered	tan	warped	xbox	phone
london	brother	thank	horse	watching	woke	reagan	lakers
beautiful	tell	childhood	record	em	bleh	utah	ur
jail	hadn	mai	cod	president	wednesday	served	fine
hates	pierced	hair	ki	smh	gods	tide	relationship
paperwork	wild	paperwork	nicht	love	afford	gators	asshole
wanna	use	4th	sheep	haha	japanese	pelosi	worried
clear	perfect	hopefully	chem	future	tongue	husband	purple
san	return	missed	brother	best	robert	stinks	putting
til	needed	peace	fancy	emails	sophie	trial	omg
halloween	paid	hasn	degrees	goin	holy	picked	nature
bring	half	trip	disease	football	eye	beep	prop
kindle	horse	mother	realised	latest	tattoo	gun	black
vida	disease	sunshine	room	thank	decent	trailer	live
powers	chuck	kyle	religion	matthew	odd	ready	eid

Table 16: Race Words

White vs. Black		White vs. Asian		Black vs. Asian
tonight	smh	tonight	asian	smh	korea
dad	fb	blonde	tt	fb	sa
stupid	lord	town	tmr	lord	na
exited	fam	fuckin	korea	wit	asian
thinks	nigeria	ass	chinese	aint	gay
ends	yall	college	ng	da	chinese
journey	black	gas	na	yall	internet
meet	fathers	dope	korean	lol	korean
hahahahahaha	mj	worse	china	say	monday
fun	yuh	night	ang	fam	xd
awesome	gon	men	aq	jackson	tmr
ability	birthday	sons	asians	cos	shooting
night	mad	adult	chen	michael	philippines
mas	lol	pretty	guys	finals	3d
wouldnt	finish	theres	thailand	ass	babe
chargers	dey	idea	taiwan	yuh	heaven
bein	asap	hope	karaoke	black	important
aftr	tryna	ability	sa	ny	tan
pretty	jackson	melissa	chan	sooooo	thailand
eh	came	state	dream	mad	yummy
tom	degrassi	unique	company	mind	completely
exhausted	wat	weekend	craving	season	woot
tough	iz	screaming	zzz	wat	smell
great	hw	mamaya	holiday	birthday	bought
running	pple	tune	wanna	degrassi	fly
exciting	jus	figure	ms	hell	tt
yankees	braids	inside	nguyen	chelsea	worry
politics	haters	exited	singapore	woman	ruin
mirror	females	wine	yang	figure	passed
pepsi	misfits	5th	hu	african	skating
roll	god	superman	fat	nigeria	english
animal	man	emotionally	ftw	episode	belong
grr	omg	sell	gg	iz	shot
gay	african	sitting	rice	smart	mas
tattoo	desires	february	tttt	saying	grandpa
2nite	chelsea	easter	damnit	asap	lazy
spend	female	months	555	attention	sacrifice
monday	cousin	saying	wong	knowing	grr
sorrow	holla	expecting	achieve	ki	broken
ed	smart	rollin	pa	meeting	yang
healthy	laker	wheres	mode	hw	beer
enjoyable	favour	eminem	lmao	sings	chatting
actually	dis	apparently	pride	india	meet
charity	money	does	bbq	gas	shoulder
delete	happy	status	super	self	ang
iron	mii	legit	1st	ready	funn
blonde	aye	30	long	college	shoes
comforted	hard	wen	skating	mj	wood
standards	wuz	eric	mean	search	dad
shot	ready	yelled	heart	years	apart
chose	nigga	mis	dx	misfits	aj
chatting	jamaica	breaking	faith	blessed	line
damage	bus	homework	expectation	advice	jack
innocent	facebook	actually	research	boys	totally
thnx	cos	wishes	hard	fathers	tomorrow