Sentiment Paradoxes in Social Networks:
Why Your Friends Are More Positive Than You?

Xinyi Zhou, Shengmin Jin, Reza Zafarani
Data Lab, Department of EECS, Syracuse University
{zhouxinyi, shengmin, reza}@data.syr.edu

Abstract

Most people consider their friends to be more positive than themselves, exhibiting a Sentiment Paradox. Psychology research attributes this paradox to human cognition bias. To understand this phenomenon, we study sentiment paradoxes in social networks. Our work shows that social connections (friends, followees, or followers) of users are indeed (not just illusively) more positive than the users themselves. This is mostly due to positive users having more friends. We identify five sentiment paradoxes at different network levels ranging from triads to large-scale communities. Empirical and theoretical evidence is provided to validate the existence of such sentiment paradoxes. By investigating the relationships between the sentiment paradox and other well-developed network paradoxes, i.e., friendship paradox and activity paradox, we find that user sentiments are positively correlated with their number of friends but rarely with their social activity. Finally, we demonstrate how sentiment paradoxes can be used to predict user sentiments.

Introduction

Sentiment analysis, also known as opinion mining, analyzes individual opinions, sentiments, and attitudes towards various entities such as individuals, products, organizations, and topics (?). Relying on advancements in natural language processing and machine learning (?), existing studies in sentiment analysis have made substantial progress towards classifying and predicting sentiments of independent individuals and groups in social networks, focusing on tasks such as content sentiment prediction and review spam detection (?).

However, existing studies have paid less attention to sentiments among interacting users, whose sentiments may depend on one another. With the unavoidable peer influence in social networks (?), it is essential to consider user interactions when studying their sentiments, especially in large-scale social networks. For example, Lin et al. find that the stress levels of users are closely related to those of their friends on social media (?). A common observation with respect to sentiments of interacting users is that many users feel their friends are more positive than themselves, experiencing a sentiment paradox. There have been many discussions on why this phenomenon takes place, with psychology research linking it to human cognition biases. For example, Jordan et al. (?) show that most people have a tendency to underestimate the negative feelings of others. With many users in social networks experiencing a sentiment paradox, i.e., being less positive than their friends, can we attribute all such perceptions to human cognition biases alone? In other words, do sentiment paradoxes exist not only in user cognition, but also in reality?

The Present Work: Sentiment Paradoxes in Networks. We investigate whether users are indeed less positive than their social connections (friends, followees, or followers) in social networks. Possible interpretations for the existence (or non-existence) of the sentiment paradox are provided by mining the relationships between sentiment paradoxes and other well-established network paradoxes, i.e., friendship paradox and activity paradox. Finally, as an application, we show how sentiment paradoxes can be used to predict user sentiments (positive or negative).

Overall, the specific contributions of this paper are:

1. Five sentiment paradoxes are identified in both undirected (friend) and directed (follower and followee) social networks and at multiple network levels (triad-, community-, and network-level). Our work shows that for most users their friends are indeed more positive than them, mostly due to the fact that more positive users are more likely to have more friends, followers, and followees;

2. We empirically and mathematically verify each paradox, where our mathematical analysis allows us to determine whether such a paradox is expected to exist;

3. We investigate the connections between the sentiment paradox and two other well-established network paradoxes: friendship paradox and activity paradox. Our results reveal factors that can determine the (1) existence and the (2) magnitude of sentiment paradoxes; and

4. We demonstrate the role that the sentiment paradox can play in practical applications, i.e., in predicting a user’s sentiments by looking at the sentiments of his or her social connections.

The remainder of the work is organized as follows. Experimental setup is presented first, followed by a formal definition for the general sentiment paradox, and sentiment paradoxes in triads and communities. Then, we investigate the connections of sentiment paradox to other network paradoxes, which helps determine the existence and magnitude of sentiment paradoxes. One application of sentiment paradoxes, i.e., user sentiment prediction, is provided next. Finally, a literature review and some conclusions are provided.

Experimental Setup

To study sentiment paradoxes at different network levels, proper data that contains user sentiments and their network information (e.g., friends or communities joined) is required.

Dataset. We have crawled a large-scale dataset from LiveJournal (?; ?) (footnote 1: the data is released at http://data.syr.edu/get/EmotionPatterns). LiveJournal is a popular blogging and social networking site, where users can maintain a blog, journal, or a diary. Data collected from LiveJournal has several advantages:

1. Sentiments are directly provided: when posting blogs, users can report their sentiments by selecting a mood (e.g., excited, busy or angry, see Appendix for a sample user post with its mood), which provides access to sentiment ground truth;

2. Both undirected (friends) and directed (followees and followers) relationships of users exist, i.e., a directed and an undirected network. Note that these relationships are separate: a user can choose to subscribe (follow) another person without approval, and/or befriend (with approval) so that the two users can share some private posts. Hence, two users can follow each other (i.e., two directed edges in the directed network), but not be friends (no edge between them in the undirected network); and

3. Community membership information is explicitly available on user profiles (i.e., no need to detect them using community detection, which can be subjective (?)). Users can decide to create or join communities. Each community is often related to some topic and users in the same community often share similar interests.

We have collected the following data spanning more than 10 years (from 1999 to 2010) (?; ?): (i) users and their posts to obtain user sentiments; (ii) friendships and followee/follower relationships among users; and (iii) community memberships for all users. We only retain users with ten or more posts to exclude occasionally active or inactive users. The post distribution of these excluded users, provided in the Appendix, indicates that most ($\sim$97%) of them have posted nothing. As the set of moods is limited (132 moods), we manually convert each mood in our dataset to its sentiment polarity (details can be seen in the Appendix): positive ($+$, e.g., cheerful, excited and happy), negative ($-$, e.g., angry, annoyed and depressed) or neutral ($0$, e.g., busy). Some statistics on our data are provided in Table 1.

Table 1: Data Statistics

Data	Number
# Users	115,444
# User Posts	12,404,868
# Friendships	246,164
# Followee/Follower Relationships	793,948
# Triads (Undirected)	262,036
# Triads (Directed)	7,264,770
# Communities	200,208
# Community Memberships	2,473,074

User Sentiments. Traditionally, to obtain user sentiment values, one can rely on self-assessment surveys, which are time-consuming for a large number of users. Here, we adopt an automatic approach by investigating the historical posts of users (see Definition 1) (?).

Definition 1 (Subjective Well-Being (SWB)) (footnote 2: Strictly, what we study is a component of the SWB rather than SWB itself as it includes both affective and cognitive parts.)

Assume user $u$ has $N_{p}(u)$ positive posts and $N_{n}(u)$ negative posts. The SWB value of $u$ , denoted by $S(u)$ , is

$$S(u)=\frac{N_{p}(u)-N_{n}(u)}{N_{p}(u)+N_{n}(u)}. \tag{1}$$

Note that $S(u)\in[-1,+1]$ , where $-1$ shows an extremely negative user and $+1$ , an extremely positive user.
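Definition 1 can be sketched in a few lines of Python. The mood-to-polarity dictionary below contains only the seven example moods quoted in the text (the full 132-mood mapping is in the Appendix and is not reproduced here), and returning `None` for users with no positive or negative posts is an assumption of this sketch, since Equation (1) is undefined in that case.

```python
# Example moods quoted in the text; NOT the full 132-mood mapping.
MOOD_POLARITY = {
    "cheerful": +1, "excited": +1, "happy": +1,
    "angry": -1, "annoyed": -1, "depressed": -1,
    "busy": 0,
}


def swb(moods):
    """Return S(u) = (N_p - N_n) / (N_p + N_n) for one user's post moods.

    Returns None when the user has neither positive nor negative posts,
    because Equation (1) divides by zero there (assumption of this sketch).
    """
    polarities = [MOOD_POLARITY[m] for m in moods]
    n_pos = sum(1 for p in polarities if p > 0)
    n_neg = sum(1 for p in polarities if p < 0)
    if n_pos + n_neg == 0:
        return None
    return (n_pos - n_neg) / (n_pos + n_neg)


# Two positive posts, one negative, one neutral: (2 - 1) / (2 + 1).
print(swb(["happy", "excited", "angry", "busy"]))
```

Neutral posts do not enter the ratio, so a user who only ever posts neutral moods has no defined SWB under this reading.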

Sentiment Distribution. The distribution of user sentiments can be obtained by plotting the SWB distribution. As observed in Figure 1, the SWB distribution approximately follows a normal distribution $\mathcal{N}(\mu,\sigma^{2})$ , which aligns with findings on sentiment distributions in other social networks (e.g., that of Twitter (?)). Using a normal fit, we obtain the SWB distribution, which is $\mathcal{N}(0,0.08)$ .

Figure 1: User Sentiment Distribution (SWB values)

Sentiment Paradox

In this section, we mainly focus on a “general” sentiment paradox, which can be observed among all users of a network. We first present the definition of sentiment paradox, followed by experiments to verify its existence and mathematical proofs on whether the paradox is expected to exist.

Definition. The sentiment paradox, or network sentiment paradox, can be summarized as

Paradox 1 (Sentiment Paradox)

Your friends, followees, or followers are more positive than you.

Empirical Verification. To verify whether the sentiment paradox exists, we take the following three steps:

I. User Sentiment Assignment. We calculate how positive or negative users are by computing their SWB (Definition 1).

II. Computing Paradox Magnitude. Consider a user whose SWB value is less than the (i) mean or (ii) median of the SWB values of his or her connections. We can consider three types of connections: friends, followees, or followers. We consider this user as being less positive than his or her connections and denote the proportion of such users in a social network as the sentiment paradox magnitude:

Definition 2 (Sentiment Paradox Magnitude)

Consider a social network with a set of users $U=\{u_{i}\}$ , $i=1,2,\cdots,n$ , each with a SWB value $S(u_{i})$ . For each user $u_{i}$ , we denote her connections (either friends, followers, or followees) by $c_{ij}$ , $j=1,2,\cdots,m$ . The sentiment paradox magnitude of the network is calculated by

$$M=\frac{\sum_{u_{i}}I(\,S(u_{i})<\bar{S}(c_{ij})\,)}{||U||}, \tag{2}$$

where $I(a<b)=1$ when $a<b$ and is 0 otherwise. The value $\bar{S}(c_{ij})$ is the (i) mean or the (ii) median of $S(c_{ij})$ ’s.

When the magnitude is greater than 0.5, we say the sentiment paradox strongly holds in the network. When the magnitude is less than or equal to 0.5, but is still greater than the proportion of users that are more positive than their connections, and that of users that are as positive as their connections, we say the paradox weakly holds in the network.
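As a rough illustration of Definition 2 and of the strong/weak criterion above, the following sketch computes the three proportions reported in Table 2 ("Holds", "Does not hold", "Unknown"). The function names are hypothetical, and skipping users who have no connections is an assumption of this sketch.

```python
from statistics import mean, median


def paradox_magnitude(swb, neighbors, summary=mean):
    """Proportions of users below / above / equal to their connections' SWB.

    swb:       dict user -> SWB value
    neighbors: dict user -> list of connections (friends, followees, or followers)
    summary:   statistics.mean or statistics.median
    Users without connections are skipped (assumption of this sketch).
    """
    holds = not_holds = unknown = 0
    for user, conns in neighbors.items():
        if not conns:
            continue
        reference = summary([swb[c] for c in conns])
        if swb[user] < reference:
            holds += 1
        elif swb[user] > reference:
            not_holds += 1
        else:
            unknown += 1
    total = holds + not_holds + unknown
    if total == 0:
        raise ValueError("no user has any connection")
    return holds / total, not_holds / total, unknown / total


def verdict(holds, not_holds, unknown):
    """Apply the strong/weak criterion stated in the text."""
    if holds > 0.5:
        return "strong"
    if holds > not_holds and holds > unknown:
        return "weak"
    return "none"


# Toy network: one hub (u1) connected to three users.
toy_swb = {"u1": 0.1, "u2": -0.2, "u3": -0.3, "u4": -0.4}
toy_net = {"u1": ["u2", "u3", "u4"], "u2": ["u1"], "u3": ["u1"], "u4": ["u1"]}
print(paradox_magnitude(toy_swb, toy_net), paradox_magnitude(toy_swb, toy_net, median))
```

In the toy network, three of the four users are less positive than their single friend, so the paradox strongly holds even though the hub is the most positive user.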

III. Assessing Statistical Significance. To assess the statistical significance of our findings, we compute the difference between the observed and expected paradox magnitudes. To compute the expected paradox magnitude, we maintain the SWB distribution of users and their network structure, but randomly assign a SWB value to each user. After random assignments, we recalculate the paradox magnitude. We conduct this experiment 1,000 times, and record the average magnitude, which is the expected paradox magnitude. To assess how significant the difference between observed and expected paradox magnitudes is, we compute surprise (?):

Definition 3 (Surprise)

In a social network with $N$ users, if paradox magnitude is $M$ and expected paradox magnitude is $M_{\mathsf{Expected}}$ ( $M_{\mathsf{Expected}}\neq 0$ and $1$ ), the surprise value is

$$\mathsf{surprise}=\frac{N(M-M_{\mathsf{Expected}})}{\sqrt{N\cdot M_{\mathsf{Expected}}(1-M_{\mathsf{Expected}})}}. \tag{3}$$

A surprise value on the order of tens is highly significant, indicating that $p$ -values are nearly zero.
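Step III and Definition 3 can be sketched as follows. Shuffling the observed SWB values across users is one way to "maintain the SWB distribution and network structure" while randomizing the assignment; whether the original experiment shuffles or resamples is not stated, so treat that choice, the function names, and the seed handling as assumptions.

```python
import math
import random
from statistics import mean


def surprise(n_users, observed, expected):
    """Equation (3): scaled gap between observed and expected magnitudes."""
    if not 0.0 < expected < 1.0:
        raise ValueError("expected magnitude must lie strictly between 0 and 1")
    return n_users * (observed - expected) / math.sqrt(
        n_users * expected * (1.0 - expected)
    )


def expected_magnitude(swb, neighbors, trials=1000, seed=0):
    """Average paradox magnitude (mean-based) over random SWB reassignments."""
    rng = random.Random(seed)
    users = list(swb)
    values = [swb[u] for u in users]
    scored = [u for u in users if neighbors.get(u)]  # users with connections
    total = 0.0
    for _ in range(trials):
        rng.shuffle(values)  # keeps the SWB distribution, breaks user identity
        shuffled = dict(zip(users, values))
        below = sum(
            shuffled[u] < mean(shuffled[c] for c in neighbors[u]) for u in scored
        )
        total += below / len(scored)
    return total / trials


# Friends row of Table 2(a): 79,453 users, 55.11% observed, 50.10% expected.
print(round(surprise(79_453, 0.5511, 0.5010), 2))
```

Plugging the rounded percentages from the friends row of Table 2(a) into Equation (3) gives a value within a few hundredths of the reported 28.21; the small gap is presumably due to rounding of the published proportions.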

Following this three-step process, we obtain the results in Table 2, where “Holds” (“Does not hold”) indicates that users are less (more) positive than their connections. “Unknown” indicates that users are as positive as their connections. In both undirected and directed networks, irrespective of using mean or the median, we make the following three observations:

1. Sentiment paradox strongly holds within the network, as the observed sentiment paradox magnitudes (user proportions) for all networks are greater than 0.5;

2. The observed paradox magnitude values are all higher than the expected paradox magnitudes; and

3. The surprise values are on the order of tens, which indicates that the observed paradox magnitudes are all statistically significant.

Table 2: Empirical Verification of Sentiment Paradox. The observed proportions being greater than 0.5 indicates that the sentiment paradox strongly holds within networks. The observed proportions are higher than the expected ones, and this difference is statistically significant, as the surprise values are on the order of tens.

Connection	Sentiment Paradox	#Users (Observed)	Prop. (Observed)	Prop. (Expected)	Surprise
Friends	Holds	43,786	55.11%	50.10%	28.21
	Does not hold	35,588	44.79%	49.90%	-28.77
	Unknown	79	0.10%	0.00%	-
	Total	79,453	100%	100%	-
Followees	Holds	44,336	54.11%	49.50%	26.41
	Does not hold	36,699	44.79%	49.41%	-26.50
	Unknown	906	1.10%	1.09%	0.47
	Total	81,941	100%	100%	-
Followers	Holds	26,287	52.87%	49.58%	14.67
	Does not hold	23,015	46.29%	49.60%	-14.74
	Unknown	420	0.84%	0.82%	0.40
	Total	49,722	100%	100%	-

(a) User Sentiments vs. Average Sentiments of Connections

Connection	Sentiment Paradox	#Users (Observed)	Prop. (Observed)	Prop. (Expected)	Surprise
Friends	Holds	43,621	54.90%	50.00%	27.61
	Does not hold	35,684	44.91%	49.96%	-28.44
	Unknown	148	0.19%	0.04%	20.76
	Total	79,453	100%	100%	-
Followees	Holds	43,311	52.86%	49.00%	22.13
	Does not hold	36,789	44.90%	48.91%	-23.01
	Unknown	1,841	2.24%	2.09%	3.07
	Total	81,941	100%	100%	-
Followers	Holds	25,542	51.37%	48.92%	10.92
	Does not hold	23,073	46.40%	48.91%	-11.17
	Unknown	1,107	2.23%	2.17%	0.85
	Total	49,722	100%	100%	-

(b) User Sentiments vs. Median Sentiments of Connections

Table 3: Sentiment Paradoxes at the Triad and Community Levels

		Triad Sentiment Paradox				Common-neighbor Sentiment Paradox				Community Sentiment Paradox				Common-interest Sentiment Paradox
Connection	Sentiment Paradox	#Users (Obs.)	Prop. (Obs.)	Prop. (Exp.)	Surp.	#Users (Obs.)	Prop. (Obs.)	Prop. (Exp.)	Surp.	#Users (Obs.)	Prop. (Obs.)	Prop. (Exp.)	Surp.	#Users (Obs.)	Prop. (Obs.)	Prop. (Exp.)	Surp.
Friends	Holds	11,044	52.52%	48.41%	11.91	11,333	53.89%	50.07%	11.10	20,381	52.19%	49.10%	12.21	20,742	53.12%	49.88%	12.77
	Does not hold	9,298	44.22%	48.27%	-11.74	9,695	46.11%	49.93%	-11.10	17,887	45.80%	48.93%	-12.37	18,254	46.75%	50.06%	-13.09
	Unknown	686	3.26%	3.32%	-0.47	0	0.00%	0.00%	-	783	2.01%	1.97%	-0.57	55	0.14%	0.06%	6.67
	Total	21,028	100%	100%	-	21,028	100%	100%	-	39,051	100%	100%	-	39,051	100%	100%	-
Followees	Holds	37,108	53.58%	49.03%	23.95	37,698	54.43%	50.01%	23.27	19,176	50.44%	46.96%	13.60	20,081	52.82%	48.78%	15.75
	Does not hold	30,830	44.51%	49.00%	-23.64	31,564	45.57%	49.99%	-23.27	16,502	43.40%	46.99%	-14.03	16,985	44.67%	48.75%	-15.89
	Unknown	1,326	1.91%	1.97%	-1.14	2	0.00%	0.00%	-	2,341	6.16%	6.05%	0.90	953	2.51%	2.47%	0.43
	Total	69,264	100%	100%	-	69,264	100%	100%	-	38,019	100%	100%	-	38,019	100%	100%	-
Followers	Holds	39,496	52.61%	49.08%	19.35	39,952	53.22%	50.01%	17.59	13,314	49.95%	46.92%	9.91	13,872	52.04%	48.77%	10.69
	Does not hold	34,183	45.54%	49.07%	-19.35	35,116	46.78%	49.99%	-17.59	11,733	44.01%	47.18%	-10.37	12,127	45.49%	48.79%	-10.79
	Unknown	1,389	1.85%	1.85%	0	0	0.00%	0.00%	-	1,611	6.04%	5.90%	0.97	659	2.47%	2.44%	0.32
	Total	75,068	100%	100%	-	75,068	100%	100%	-	26,658	100%	100%	-	26,658	100%	100%	-

Theoretical Verification. We observe empirically from Table 2 that the expected magnitudes for the sentiment paradox to hold and not hold are almost the same, indicating that the paradox is not expected to exist within networks. Theorem 1 theoretically justifies this empirical observation.

Theorem 1

If the SWB values of users in a network follow a normal distribution $\mathcal{N}(\mu,\sigma^{2})$ , the SWB value of a user is expected to be equal to the (i) mean and (ii) median of SWB values of his connections (friends, followees, or followers), i.e., a user is expected to be as positive as his connections.

Proof 1

Assume random variable $S\in[-1,1]$, which denotes the SWB values of users, follows a normal distribution $\mathcal{N}(\mu,\sigma^{2})$. Assume we sample $n$ times from this distribution, where $n$ is the number of users in the network. For each user $u_{i}$, $i=1,2,\cdots,n$, we have two sample sets: (i) $S_{u}^{i}$, with size one, as the SWB value of user $u_{i}$; (ii) $S_{f}^{i}$, as the SWB values of the connections (friends, followees, or followers) of user $u_{i}$. Let $\bar{S}_{u}$ denote the sample mean of the sample $S_{u}^{i}$, and let $\bar{S}_{f}$ denote the sample mean of the sample $S_{f}^{i}$. Note that $\bar{S}_{u}=S(u_{i})$ as $||S_{u}^{i}||=1$. $S\sim\mathcal{N}(\mu,\sigma^{2})$ implies $\bar{S}_{u}\sim\mathcal{N}(\mu,\sigma^{2})$ and $\bar{S}_{f}\sim\mathcal{N}(\mu,\sigma_{c}^{2})$, for some $\sigma_{c}$. Hence, $E(\bar{S}_{u})=E(\bar{S}_{f})=\mu$ and $E(\bar{S}_{u}-\bar{S}_{f})=0$, which indicates that the expected SWB values of users are equal to the expected average SWB values of their connections. For the median, the proof is similar, as the median and the mean are the same value in a normal distribution.
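The link between this proof and the expected magnitudes of about 0.5 in Table 2 can be spelled out as follows. The derivation additionally assumes that a user's SWB value is independent of the mean SWB of his connections, which is what random assignment approximately provides but is not stated explicitly in the proof; $m$ is the number of connections of $u_{i}$, as in Definition 2.

```latex
\begin{align*}
E(\bar{S}_{f}) &= \frac{1}{m}\sum_{j=1}^{m}E\big(S(c_{ij})\big)=\mu,\\
\bar{S}_{u}-\bar{S}_{f} &\sim \mathcal{N}\big(0,\ \sigma^{2}+\sigma_{c}^{2}\big)
  \quad\text{(under the independence assumption)},\\
P\big(\bar{S}_{u}<\bar{S}_{f}\big) &= P\big(\bar{S}_{u}-\bar{S}_{f}<0\big)=\tfrac{1}{2}.
\end{align*}
```

Under these assumptions a user is equally likely to be less or more positive than his connections, which is consistent with the nearly equal expected proportions for "Holds" and "Does not hold".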

Sentiment Paradox in Triads

Triads (groups of three connected people) are crucial components of networks, especially when investigating ideas such as structural balance (i.e., a friend of a friend is a friend), clusterability (i.e., friends form small groups) and transitivity (i.e., $A$ is a friend of $B$, $B$ is a friend of $C$, so $A$ is a friend of $C$). In this section, we study sentiments among interacting users in triads. We will investigate if a sentiment paradox exists at the triad level, and aim to provide explanations on the existence (or lack) of such a paradox.

To explore if the sentiment paradox holds within triads, assume user $u_{i}$, $i=1,2,\cdots,n$ is a member of triads $t_{j}$, $j=1,2,\cdots,m$. Within each $t_{j}$, we compare the sentiment (i.e., the SWB value) of $u_{i}$ with the mean and median of those of his two connections (friends, followees, or followers). Note that at the triad level, the results based on either the mean or the median should be the same because each user has no more than two connections within a triad. If, in the majority of triads that $u_{i}$ is a member of, $u_{i}$ is less positive than his connections, we consider $u_{i}$ as a user exhibiting the sentiment paradox at the triad level. Then, we compute the proportion of such users in the overall network, and conduct significance analysis as in the last section. Note that such a paradox is not expected to exist, as proved in Theorem 1. However, Table 3 provides the empirical results, which can be summarized as:

Paradox 2 (Triad Sentiment Paradox)

Your friends, followees, or followers in a triad are more positive than you.
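The triad-level procedure can be sketched for the undirected friendship network as below. This sketch assumes a triad is a closed triangle of three mutually connected users and reads "majority" as strictly more than half; both readings, and the function name, are assumptions rather than details confirmed by the text.

```python
from itertools import combinations


def triad_paradox_users(swb, adj):
    """Find users who are less positive than their triad partners in most triads.

    swb: dict user -> SWB value
    adj: dict user -> set of friends (undirected, symmetric, no self-loops)
    Returns (users exhibiting the paradox, users that belong to some triad).
    """
    exhibiting, in_triads = set(), set()
    for user, friends in adj.items():
        less = total = 0
        for v, w in combinations(sorted(friends), 2):
            if w in adj[v]:  # user, v and w are pairwise connected
                total += 1
                # With two partners, mean and median coincide.
                if swb[user] < (swb[v] + swb[w]) / 2:
                    less += 1
        if total:
            in_triads.add(user)
            if less > total / 2:
                exhibiting.add(user)
    return exhibiting, in_triads


# Toy data: triangle a-b-c plus a pendant user d attached to a.
toy_swb = {"a": 0.5, "b": 0.0, "c": -0.5, "d": 0.8}
toy_adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
exhibiting, in_triads = triad_paradox_users(toy_swb, toy_adj)
print(len(exhibiting) / len(in_triads))
```

The ratio printed at the end plays the role of the "Holds" proportion in Table 3, computed only over users who appear in at least one triad.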

On the other hand, one can think of a triad as a pair of users sharing a common neighbor. This observation motivates us to verify whether there is a sentiment paradox between users and their connections with whom users share common neighbors. Hence, we conduct an experiment similar to the one performed to validate the sentiment paradox in the last section, except that we only compare sentiments between users and a subset of their connections with whom users share a triad, i.e., have at least one common neighbor. The paradox is not expected to exist either, as proved in Theorem 1. However, the empirical results in Table 3 show that:

Paradox 3 (Common-neighbor Sentiment Paradox)

Your friends (followees or followers) with whom you share friends (followees or followers) are more positive than you.
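A minimal sketch of the common-neighbor variant for the undirected network follows: each user is compared only with those friends who share at least one other friend with him. The function name and the mean-based comparison are choices of this sketch.

```python
from statistics import mean


def common_neighbor_magnitude(swb, adj):
    """Proportion of users below the mean SWB of friends sharing a common friend.

    adj: dict user -> set of friends (undirected, symmetric, no self-loops).
    Users with no such friend are skipped. Returns None if nobody qualifies.
    """
    holds = scored = 0
    for user, friends in adj.items():
        # adj[v] & friends is the set of neighbors common to user and v.
        subset = [v for v in friends if adj[v] & friends]
        if subset:
            scored += 1
            holds += swb[user] < mean(swb[v] for v in subset)
    return holds / scored if scored else None


# Same toy data as a triangle a-b-c with a pendant user d attached to a.
toy_swb = {"a": 0.5, "b": 0.0, "c": -0.5, "d": 0.8}
toy_adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(common_neighbor_magnitude(toy_swb, toy_adj))
```

Here the pendant user d shares no friend with a, so d is neither scored himself nor counted among a's qualifying friends.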

Sentiment Paradox in Communities

Similar to triads, communities also play an important role in understanding social networks (?). We take a similar approach to triad-level paradoxes and study sentiments among interacting users within communities. However, we highlight that unlike triads, communities can have different sizes (i.e., number of members) and different levels of interactions among their members (i.e., different densities). Hence, in addition to investigating whether the sentiment paradox exists at the community level, we also assess whether the existence or magnitude of such paradoxes depends on the size or level of connections within communities. Similar to triads, we do not expect a sentiment paradox to exist at the community level, as proved in Theorem 1.

First, we assume user $u_{i}$, $i=1,2,\cdots,n$ is involved in communities $c_{k}$, $k=1,2,\cdots,p$. For each $c_{k}$, we compare the sentiment (i.e., the SWB value) of $u_{i}$ with the mean and median of those of his connections (friends, followees, or followers) within the community. If, in a majority of communities that $u_{i}$ belongs to, $u_{i}$ is less positive than his connections in the community, we consider $u_{i}$ as exhibiting the paradox. Finally, we compute the fraction of such users in the network and perform statistical significance analysis. Table 3 has the results, which we summarize as the following paradox:

Paradox 4 (Community Sentiment Paradox)

Your friends, followees, or followers within a community are more positive than you.
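The community-level comparison can be sketched in the same spirit. Counting only communities in which the user has at least one connection, and reading "majority" as strictly more than half, are assumptions of this sketch; the names are hypothetical.

```python
from statistics import mean


def community_paradox_users(swb, adj, memberships):
    """Users less positive than their in-community connections in most communities.

    adj:         dict user -> set of connections
    memberships: dict community -> set of member users
    Only communities where the user has at least one connection are counted
    (assumption of this sketch).
    """
    less, total = {}, {}
    for members in memberships.values():
        for user in members:
            inside = [v for v in adj.get(user, ()) if v in members]
            if not inside:
                continue
            total[user] = total.get(user, 0) + 1
            if swb[user] < mean(swb[v] for v in inside):
                less[user] = less.get(user, 0) + 1
    return {u for u in total if less.get(u, 0) > total[u] / 2}


# Toy data: two communities overlapping in user a.
toy_swb = {"a": 0.5, "b": 0.0, "c": -0.5, "d": 0.8}
toy_adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
toy_communities = {"music": {"a", "b", "c"}, "film": {"a", "d"}}
print(community_paradox_users(toy_swb, toy_adj, toy_communities))
```

User a is less positive than his connection in one of his two communities, which is not a strict majority, so only c is reported.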

Additionally, we conduct an experiment similar to the one performed to verify the common-neighbor sentiment paradox in the last section, in which we only compare sentiments between users and a subset of their connections with whom users share a community. Statistical significance is computed in the same way as before.

The results are shown in Table 3. We observe a sentiment paradox within such users. As users in our dataset mostly form communities around a common interest (one community often refers to a certain topic), we denote this paradox as the common-interest sentiment paradox:

Paradox 5 (Common-interest Sentiment Paradox)

Your friends, followees, or followers with whom you share some interests are more positive than you.

Impact of Community Variations. To assess the impact of variations in communities on the sentiment paradox, we measure the paradox magnitude by changing the community size (i.e., number of members/nodes) or community density (i.e., number of connections/edges). We vary the community size from 1 to 1,200, and community density from 1 to (i) 2,000 in the undirected network, and (ii) 4,000 in the directed network (footnote 3: Community size and density both follow a power-law-like distribution. Only around ten (less than 0.005%) communities exist in which number of users is greater than 1,200, or friendships among users is greater than 2,000, or following and follower relationships among users is greater than 4,000). Then, we calculate the paradox magnitude within these communities. The results are in Figure 2.

We observe from Figure 2 that the proportion of communities within which the paradox holds becomes larger as communities become larger or denser, ultimately reaching 0.7 (and at times, over 0.9), while the expected magnitude is always around 0.5. Even when the community size or its density is very small, the observed proportion of communities where the sentiment paradox holds is always higher than the expected proportion, and the observed proportion of communities where the sentiment paradox does not hold is always lower than the expected values.

Connections to Network Paradoxes

Social networks exhibit many counter-intuitive properties. We assess the connection between sentiment paradox and two of the most commonly observed network paradoxes: (1) friendship paradox and (2) activity paradox, which provides opportunities to investigate the relationships among user sentiments, social connections and activities.

Friendship Paradox

One of the most well-known network paradoxes is the friendship paradox, first observed by Feld (?), which states that users have fewer friends than their friends, on average. The paradox also holds for the median value (?). In our data, in addition to sentiment paradoxes at different network levels, we observe a friendship paradox for most users (both mean and median, see Figure 5). Here, we explore the interplay between node degrees and the sentiment paradox, motivated by the following facts:

1. A user with degree $d$ contributes his SWB value $d$ times to the average SWB distribution of friends of users. We illustrate this fact using an example.

Example 1

Consider a simple undirected friendship network (see Figure 4) with four users $u_{1}$ , $u_{2}$ , $u_{3}$ , and $u_{4}$ , whose corresponding SWB values are $+0.1$ , $-0.2$ , $-0.3$ , and $-0.4$ . For user $u_{1}$ , the average SWB value of his friends is $\frac{(-0.2)+(-0.3)+(-0.4)}{3}$ . The average SWB values of the friends of $u_{2}$ , $u_{3}$ , and $u_{4}$ are all $+0.1$ . Thus, user $u_{1}$ with degree three contributes his SWB value three times (as a friend of $u_{2}$ , $u_{3}$ , and $u_{4}$ ) to the average SWB distribution of friends of users, while other users, with degree one, contribute their SWB values only once.

Figure 4: An Illustration for Example 1

2. Compared with the distribution of user sentiments (SWBs, see Figure 1), the distributions of the mean and median of sentiments of friends, followees or followers of users are skewed to the right (see Figure 5 for friends), i.e., the latter distributions are a weighted version of the former one, weighting those with comparatively high SWB values more.
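The two facts above can be checked numerically on the four-user network of Example 1. The sketch below computes each user's average friend SWB and then compares the plain mean of SWB values with the degree-weighted mean, i.e., the mean taken over all (user, friend) pairs; the variable names are illustrative.

```python
from statistics import mean

# Network of Example 1: u1 is connected to u2, u3, and u4.
swb = {"u1": 0.1, "u2": -0.2, "u3": -0.3, "u4": -0.4}
friends = {"u1": ["u2", "u3", "u4"], "u2": ["u1"], "u3": ["u1"], "u4": ["u1"]}

# Average SWB of each user's friends.
friend_means = {u: mean(swb[v] for v in fs) for u, fs in friends.items()}

# A user of degree d appears in d friend lists, so pooling all friend
# SWB values weights each user by his degree.
degree = {u: len(fs) for u, fs in friends.items()}
weighted_mean = sum(degree[u] * swb[u] for u in swb) / sum(degree.values())
plain_mean = mean(swb.values())

print(friend_means)
print(plain_mean, weighted_mean)
```

Because the most positive user ($u_{1}$) also has the highest degree, the degree-weighted mean (about $-0.1$) is higher than the plain mean (about $-0.2$), which is the mechanism behind the right-skewed distributions of friends' sentiments.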

Given these two facts, it is natural to study whether users with relatively high (in-, out-) degrees are more positive (i.e., have higher SWB values) than those with relatively low (in-, out-) degrees. We verify this hypothesis in two ways.

Table 4: Correlations Between User Sentiments (SWB) and Number of Social Connections. Correlations are all positive and highly significant as $p$-values approach zero.

	Correlation Coefficient
(SWB, # Friends)	0.05 ( $p$ -value $\rightarrow 0$ )
(SWB, # Followees)	0.04 ( $p$ -value $\rightarrow 0$ )
(SWB, # Followers)	0.04 ( $p$ -value $\rightarrow 0$ )

Table 5: Average Number of Social Connections for Positive, Negative and Neutral Users. In general, positive users have (30% to 45%) more social connections than the others.

Users	Friends	Followees	Followers
Positive ( $+$ )	5.09	8.01	8.16
Negative ( $-$ )	3.58	5.99	5.88
Neutral (0)	3.70	5.90	5.80
Overall	4.25	6.88	6.88

(a) When SWB values of users are between $-1$ and $+1$

Users	Friends	Followees	Followers
Positive ( $+$ )	5.05	7.94	8.07
Negative ( $-$ )	3.67	6.06	5.96
Neutral (0)	3.70	5.90	5.80
Overall	4.27	6.87	6.88

(b) When SWB values are between $-0.5$ and $+0.5$

I. Without labeling a user as positive, negative or neutral, we directly compute the correlation coefficient between user sentiments (SWB values) and their number of (i) friends (degrees), (ii) followees (out-degrees), and (iii) followers (in-degrees). Results are presented in Table 4 and indicate that the sentiments of users are positively correlated with their number of friends, followees and followers. The coefficients are small (0.04 to 0.05), but the $p$-values approach zero (i.e., results are highly significant).

We further visualize these correlations in Figure 6, where a least-squares fit of the trend is provided. The fit further validates that the SWB value of users is positively related to their (in-, out-) degrees, especially when SWB values are between $-0.5$ and $+0.5$. Concretely, the (in-, out-) degree of users with SWB value $+0.5$ is about six higher than that of users with SWB value $-0.5$. In other words, more positive users usually have more friends, followees and followers.

Note that users with extreme sentiments (i.e., whose SWB values approach $-1$ or $+1$) are not sufficiently representative, as they make up a very small proportion (less than 7%) of the population. In Figure 6, it can be observed that such users seem to have significantly larger degrees. However, this phenomenon can be attributed to the degree distribution, which is almost power-law and has a heavy tail: once one user in the group has a significantly larger degree, it easily leads to a peak in the least-squares fit. Hence, our conclusion here is obtained mainly based on users with SWB values between $-0.5$ and $+0.5$, as these users are more representative than users whose sentiments are not in this range.
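For readers who want to reproduce this kind of analysis, the sketch below computes a correlation coefficient and a least-squares line in plain Python. The text does not say which correlation coefficient is used, so Pearson's is an assumption here, and the three data points are toy numbers invented for illustration, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Toy data (invented): SWB values and degrees of three hypothetical users.
toy_swb = [-0.5, 0.0, 0.5]
toy_degree = [2, 5, 8]
print(pearson(toy_swb, toy_degree), least_squares(toy_swb, toy_degree))
```

A slope of six degrees per unit of SWB, as in the toy data, corresponds to a degree gap of about six between users at $-0.5$ and $+0.5$.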

II. We further consider all users and group them based on being positive ($S(u)>0$), negative ($S(u)<0$), or neutral ($S(u)=0$). The average (in-, out-) degree for users within each group is then calculated and provided in Table 5(a). Table 5(a) shows that positive users have more friends, followees, and followers than the other users, and more than the averages computed over all users. In particular, the numbers of friends, followees, and followers of positive users are on average 30% to 45% greater than those of negative users. Additionally, we compute the average (in-, out-) degree of users whose SWB values are between $-0.5$ and $+0.5$. The results are shown in Table 5(b) and lead to the same conclusion.
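The grouping step can be sketched as follows, together with a quick arithmetic check of the 30% to 45% range using the averages printed in Table 5(a). The function name is hypothetical, and treating a user missing from the degree dictionary as having degree zero is an assumption of this sketch.

```python
def average_degree_by_polarity(swb, degree):
    """Average degree of positive, negative, and neutral users."""
    sums = {"positive": [0, 0], "negative": [0, 0], "neutral": [0, 0]}
    for user, s in swb.items():
        key = "positive" if s > 0 else "negative" if s < 0 else "neutral"
        sums[key][0] += degree.get(user, 0)  # missing users count as degree 0
        sums[key][1] += 1
    return {k: total / n for k, (total, n) in sums.items() if n}


# (positive, negative) average connection counts copied from Table 5(a).
table_5a = {
    "friends": (5.09, 3.58),
    "followees": (8.01, 5.99),
    "followers": (8.16, 5.88),
}
gaps = {kind: 100 * (pos / neg - 1) for kind, (pos, neg) in table_5a.items()}
for kind, gap in gaps.items():
    print(f"{kind}: positive users have {gap:.1f}% more than negative users")
```

All three relative gaps computed from Table 5(a) fall inside the 30% to 45% range stated in the text.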

Table 6: Feature List

Feature Group (# Features)	Description
General Sentiment Paradox (6)	Mean and median of SWB values of one’s social connections (friends, followees, or followers)
Triad Sentiment Paradox (6)	Mean and median of SWB values of one’s social connections in a triad
Common-neighbor Sentiment Paradox (6)	Mean and median of SWBs of one’s social connections with whom he shares common neighbors
Community Sentiment Paradox (6)	Mean and median of SWB values of one’s social connections in a community
Common-interest Sentiment Paradox (6)	Mean and median of SWBs of one’s social connections with whom he shares common interests
Friendship Paradox (9)	The number of degrees, in-degrees and out-degrees of oneself, and the mean and median of degrees, in-degrees and out-degrees of one’s social connections

Activity Paradox

In social networks such as Twitter (?) and Digg (?), researchers have discovered the existence of an activity paradox: users are less active than their friends, on average. We also observe an activity paradox in our data, although it is weaker than the friendship paradox (see Figure 5), which inspires us to explore the potential relationships between user activity and sentiments. We quantify user activity as follows:

Definition 4 (User Activity)

Suppose user $u$ has posted $n$ posts in a social network, where the first post was published on date $d_{1}$ and the last one was posted on date $d_{n}$ . The activity of user $u$ is defined as

$$A(u)=\frac{\Delta{t}}{d_{n}-d_{1}}\,n, \tag{4}$$

where $\Delta{t}$ is the size of the time window over which we measure activity.

Note that $\frac{d_{n}-d_{1}}{\Delta{t}}$ indicates how many $\Delta{t}$ ’s (e.g., months) a user has been active on the network and $A(u)$ indicates the average number of posts of user $u$ in $\Delta{t}$ period.
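Definition 4 translates directly into code. The sketch below measures $d_{n}-d_{1}$ in days and returns `None` when all posts fall on the same date, since Equation (4) is undefined there; both choices are assumptions of this sketch.

```python
from datetime import date


def activity(post_dates, window_days=30):
    """Equation (4): average number of posts per window of `window_days` days.

    Returns None when the first and last post share a date, because the
    denominator d_n - d_1 is zero (assumption of this sketch).
    """
    span_days = (max(post_dates) - min(post_dates)).days
    if span_days == 0:
        return None
    return window_days * len(post_dates) / span_days


# Hypothetical user: 24 posts spread over 360 days.
toy_dates = [date(2005, 1, 1)] + [date(2005, 6, 15)] * 22 + [date(2005, 12, 27)]
print(activity(toy_dates))
```

For the hypothetical user above, 24 posts over 360 days give an activity of two posts per 30-day window.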

The relation between user sentiments (i.e., SWB values) and activity (i.e., the average number of posts of users per $\Delta{t}=30$ days) is shown in Figure 7. We observe that the value of $\Delta{t}$ does not influence the result. There seems to be a slight positive correlation; however, the number of user posts is rarely affected by user SWB values when they lie between $-0.5$ and $+0.5$, a range that covers 93% of our users. Therefore, we do not consider user activity to have a significant impact on the sentiment paradox.

Table 7: Distribution of Positive, Negative and Neutral Users

User	Number	Proportion
Positive ( $+$ )	50,705	43.92%
Negative ( $-$ )	61,066	52.90%
Neutral ( $0$ )	3,673	3.18%
Total	115,444	100.00%

User Sentiment Prediction

Sentiment paradoxes reveal a relationship between users and their social connections (friends, followees, or followers) at the triad, community, and network levels: in general, users are less positive than their social connections. In this section, we demonstrate how a user’s sentiment (positive or negative) can be predicted from the general sentiments of his social connections at these three levels. Before detailing the approach, we report the distribution of positive, negative, and neutral users in Table 7.
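To put the class distribution in Table 7 in context, the short sketch below recomputes the reported proportions and the accuracy of a trivial predictor that always outputs the larger class. It assumes that neutral users are left out of the binary (positive or negative) task; the paper does not state this explicitly, so the baseline figure is only indicative.

```python
# User counts taken from Table 7.
counts = {"positive": 50705, "negative": 61066, "neutral": 3673}

total = sum(counts.values())
# Proportions in percent, rounded to two decimals as in Table 7.
proportions = {k: round(100 * v / total, 2) for k, v in counts.items()}

# Assumption: only positive and negative users enter the binary task.
binary_total = counts["positive"] + counts["negative"]
majority_baseline = max(counts["positive"], counts["negative"]) / binary_total

print(proportions)
print(f"majority-class accuracy: {majority_baseline:.3f}")
```

Under that assumption, always predicting "negative" would be correct for roughly 54.6% of users, which is a useful reference point when reading the accuracy values reported for the classifiers.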

To predict user sentiments (positive or negative), we cast the task as a binary classification problem and address it within a supervised machine learning framework. Specifically, we represent each user as a set of machine learning features. The features are inspired by the five validated sentiment paradoxes and by the friendship paradox, which we have shown to be correlated with the sentiment paradox. The features are presented in Table 6. Several common supervised classifiers are then trained and used to predict user sentiments (positive or negative) under ten-fold cross-validation. Results are evaluated by accuracy (ACC) and AUC.
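The features described above can be sketched as follows for a single type of social connection. The function name `connection_features`, the dictionary-based graph representation, and the choice to skip connections without an SWB value are illustrative assumptions; the paper's actual feature pipeline is not shown in the text.

```python
from statistics import mean, median

def connection_features(user, swb, neighbors):
    """Degree of `user` plus mean and median SWB of the user's connections.

    `swb` maps a user id to an SWB value; `neighbors` maps a user id to a
    list of connected user ids (e.g., friends, followees, or followers).
    """
    connected = neighbors.get(user, [])
    # Assumption: connections without a known SWB value are ignored.
    values = [swb[v] for v in connected if v in swb]
    return {
        "degree": len(connected),
        "mean_swb": mean(values) if values else None,
        "median_swb": median(values) if values else None,
    }

# Toy data, not taken from the paper.
swb = {"a": 0.2, "b": -0.4, "c": 0.6, "d": 0.1}
neighbors = {"a": ["b", "c", "d"]}
features = connection_features("a", swb, neighbors)
```

Restricting `neighbors` to the connections that share a triad, a community, common neighbors, or common interests with the user would yield the corresponding feature groups in the same way.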

Table 9 provides the overall results, obtained with XGBoost (?), which performs best among the supervised classifiers we compared, including logistic regression, decision trees, naïve Bayes, random forests, and Support Vector Machines (SVM); see Table 8 for the performance comparison. The results in Table 9 indicate that (1) among single sentiment paradoxes, the general one performs best in predicting user sentiments; (2) combining all sentiment paradoxes outperforms using any single one; and (3) in general, using all features (the five sentiment paradoxes plus the correlated friendship paradox) performs best, achieving 62% accuracy and an AUC of 0.60.
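For readers less familiar with the two evaluation measures, the sketch below computes accuracy and AUC in plain Python. AUC is computed here as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count one half); this pairwise form is chosen for clarity and is not necessarily how the paper's numbers were produced. The labels, scores, and 0.5 decision threshold are toy assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A positive outranking a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = positive user, 0 = negative user) and classifier scores.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc_value = accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)
```

The pairwise loop is quadratic in the number of examples, so a rank-based computation would be preferable on a dataset of the size used here.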

Table 8: Performance Comparison by using Various Supervised Classifiers in Predicting User Sentiments. XGBoost performs best among all selections.

Classifier	ACC	AUC
Logistic Regression	.613	.590
Decision Tree	.596	.580
Naïve Bayes	.600	.580
Random Forest	.590	.580
SVM	.587	.573
XGBoost	.620	.600

Table 9: Performance Comparison by using Various Feature Groups in Predicting User Sentiments. Among all single sentiment paradoxes, the general one performs best. Combining all sentiment paradoxes outperforms using any single one.

Feature Group	ACC	AUC
General Sentiment Paradox	.601	.581
Triad Sentiment Paradox	.589	.568
Common-neighbor Sentiment Paradox	.592	.571
Community Sentiment Paradox	.590	.569
Common-interest Sentiment Paradox	.589	.569
All Sentiment Paradoxes	.617	.600
All Sentiment Paradoxes + Friendship Paradox	.620	.600

Related Work

Numerous studies have looked at network paradoxes, especially the friendship paradox. For example, the friendship paradox has been observed in many online (e.g., Quora (?) and Twitter (?)) and offline networks (?). Kooti et al. (?) have observed and proved that the friendship paradox must exist, based on the mean value, in social networks, as node degrees always follow heavy-tail distributions. A recent study shows that the friendship paradox can help identify popular users by connecting it with friendship strength among users (?). Nettasinghe and Krishnamurthy utilize the friendship paradox to design randomized polling methods for social networks (?).

In addition to the friendship paradox, recent literature has explored other network paradoxes such as the user activity paradox, the happiness paradox (?), and the scientific collaboration paradox, which indicates that researchers always have fewer coauthors, citations, and publications, and a lower h-index, than their collaborators (?; ?; ?). The development of these non-friendship paradoxes, however, is at an early stage: potential explanations for their existence and their applications have rarely been investigated.

Conclusion

This work is motivated by the limitation of current sentiment analysis studies, which have not considered interacting users in social networks, and by the phenomenon that people often consider their friends to be more positive than themselves, which psychology often attributes to human cognition biases. We present five sentiment paradoxes at the triad, community, and network levels, all empirically and mathematically validated in undirected (i.e., with friendships) and directed (i.e., with follower and followee relationships) networks. By studying the relations between the sentiment paradox and various characteristics of networks and users, we observe that (i) sentiment distributions determine the expected (non-)existence of sentiment paradoxes; (ii) node degrees (i.e., the number of social connections of users) are positively correlated with user sentiments; and (iii) there is no clear pattern between user sentiments and user activity. These connections (though not causal) can be responsible for the existence and magnitude of sentiment paradoxes in social networks, which cannot be attributed solely to human cognition bias, as they generally exist in social networks. Additionally, we demonstrate, for the first time, the application of our findings to user sentiment prediction. In the future, we will further analyze causal relationships between users’ connections (degrees) and sentiments. Sentiment paradoxes in dynamic social networks, as well as in the “like” and “comment” networks, will be part of our future studies.

References

[Bagrow, Danforth, and Mitchell 2017] Bagrow, J. P.; Danforth, C. M.; and Mitchell, L. 2017. Which friends are more popular than you?: Contact strength and the friendship paradox in social networks. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 103–108. ACM.
[Benevenuto, Laender, and Alves 2016] Benevenuto, F.; Laender, A. H.; and Alves, B. L. 2016. The H-index paradox: your coauthors have a higher H-index than you do. Scientometrics 106(1):469–474.
[Bollen et al. 2011] Bollen, J.; Gonçalves, B.; Ruan, G.; and Mao, H. 2011. Happiness is assortative in online social networks. Artificial Life 17(3):237–251.
[Bollen et al. 2017] Bollen, J.; Gonçalves, B.; van de Leemput, I.; and Ruan, G. 2017. The happiness paradox: your friends are happier than you. EPJ Data Science 6(1):4.
[Breck and Cardie 2017] Breck, E., and Cardie, C. 2017. Opinion mining and sentiment analysis. In The Oxford Handbook of Computational Linguistics 2nd edition. Oxford University Press.
[Chen and Guestrin 2016] Chen, T., and Guestrin, C. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. ACM.
[Eom and Jo 2014] Eom, Y.-H., and Jo, H.-H. 2014. Generalized friendship paradox in complex networks: The case of scientific collaboration. Scientific reports 4:4603.
[Feld 1991] Feld, S. L. 1991. Why your friends have more friends than you do. American Journal of Sociology 96(6):1464–1477.
[Ferrara and Yang 2015] Ferrara, E., and Yang, Z. 2015. Quantifying the effect of sentiment on information diffusion in social media. PeerJ Computer Science 1:e26.
[Fortunato 2010] Fortunato, S. 2010. Community detection in graphs. Physics reports 486(3-5):75–174.
[Fotouhi, Momeni, and Rabbat 2014] Fotouhi, B.; Momeni, N.; and Rabbat, M. G. 2014. Generalized friendship paradox: An analytical approach. In International Conference on Social Informatics, 339–352. Springer.
[Hodas, Kooti, and Lerman 2013] Hodas, N. O.; Kooti, F.; and Lerman, K. 2013. Friendship paradox redux: Your friends are more interesting than you. ICWSM 13:8–10.
[Hodas, Kooti, and Lerman 2014] Hodas, N.; Kooti, F.; and Lerman, K. 2014. Network weirdness: Exploring the origins of network paradoxes. In Proceedings of the ICWSM, 8–10.
[Iyer 2018] Iyer, S. 2018. Friendship paradoxes on Quora. In Guide to Big Data Applications. Springer. 205–244.
[Jin and Zafarani 2017] Jin, S., and Zafarani, R. 2017. Emotions in social networks: Distributions, patterns, and models. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 1907–1916.
[Jordan et al. 2011] Jordan, A. H.; Monin, B.; Dweck, C. S.; Lovett, B. J.; John, O. P.; and Gross, J. J. 2011. Misery has more company than people think: Underestimating the prevalence of others’ negative emotions. Personality and Social Psychology Bulletin 37(1):120–135.
[Leskovec, Huttenlocher, and Kleinberg 2010] Leskovec, J.; Huttenlocher, D.; and Kleinberg, J. 2010. Signed networks in social media. In Proceedings of the SIGCHI conference on human factors in computing systems, 1361–1370. ACM.
[Lewis, Gonzalez, and Kaufman 2012] Lewis, K.; Gonzalez, M.; and Kaufman, J. 2012. Social selection and peer influence in an online social network. Proceedings of the National Academy of Sciences 109(1):68–72.
[Lin et al. 2017] Lin, H.; Jia, J.; Qiu, J.; Zhang, Y.; Shen, G.; Xie, L.; Tang, J.; Feng, L.; and Chua, T.-S. 2017. Detecting stress based on social interactions in social networks. IEEE Transactions on Knowledge and Data Engineering 29(9):1820–1833.
[Liu 2012] Liu, B. 2012. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies 5(1):1–167.
[Nettasinghe and Krishnamurthy 2018] Nettasinghe, B., and Krishnamurthy, V. 2018. What do your friends think? Efficient polling methods for networks using friendship paradox. arXiv preprint arXiv:1802.06505.
[Pires, Marquitti, and Guimaraes Jr 2017] Pires, M. M.; Marquitti, F. M.; and Guimaraes Jr, P. R. 2017. The friendship paradox in species-rich ecological networks: Implications for conservation and monitoring. Biological conservation 209:245–252.
[Ravi and Ravi 2015] Ravi, K., and Ravi, V. 2015. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-Based Systems 89:14–46.
[Zafarani and Liu 2009] Zafarani, R., and Liu, H. 2009. Social computing data repository at ASU.

Appendix

Post Distribution of Inactive Users

In our experiments, we only retain users with ten or more posts, to exclude occasionally active or inactive users. The post distribution of the excluded users is presented in Figure 8. The distribution indicates that a substantial number of the users not considered in our study have posted nothing.

Illustration of User Post

When posting blogs on LiveJournal, users can explicitly report their sentiments by selecting a mood. An illustration can be seen in Figure 9, where the mood is Chipper.

Sentiment Polarity Identification of Moods

There are 132 moods available on LiveJournal. The sentiment polarity (positive, neutral, or negative) of these moods is determined as shown in Table 10.

Table 10: Moods and Their Sentiment Polarity

Polarity	Moods
Positive	amused; accomplished; artistic; bouncy; calm; cheerful; content; creative; complacent; determined; excited; ecstatic; energetic; full; good; giggly; grateful; happy; hopeful; high; impressed; jubilant; loved; peaceful; productive; pleased; rejuvenated; sympathetic; satisfied; thankful; thoughtful; working
Neutral	awake; blah; blank; busy; chipper; contemplative; ditzy; dorky; drained; drunk; flirty; geeky; groggy; horny; hot; hyper; indescribable; intimidated; mellow; nerdy; okay; optimistic; recumbent; refreshed; relaxed; rushed; shocked; sleepy; surprised
Negative	aggravated; angry; annoyed; anxious; apathetic; bitchy; bored; cold; confused; cranky; crappy; crazy; crushed; curious; cynical; depressed; devious; dirty; disappointed; discontent; distressed; embarrassed; enthralled; envious; exanimate; enraged; exhausted; frustrated; giddy; gloomy; grumpy; guilty; hungry; indifferent; infuriated; irate; irritated; jealous; lazy; lethargic; listless; lonely; melancholy; mischievous; moody; morose; naughty; nauseated; nervous; nostalgic; numb; pessimistic; pissed off; pensive; predatory; quixotic; rejected; relieved; restless; sad; scared; sick; silly; sore; stressed; thirsty; tired; touched; uncomfortable; weird; worried
