Reconstructing social sensitivity from evolution of content volume in Twitter
Abstract
We set up a simple mathematical model for the dynamics of public interest in terms of media coverage and social interactions. We test the model on a series of events related to violence in the US during 2020, using the volume of tweets and retweets as a proxy for public interest, and the volume of news as a proxy for media coverage. The model successfully fits the data and allows us to infer a measure of social sensitivity that correlates with human mobility data. These findings suggest the basic ingredients and mechanisms that regulate social responses capable of igniting social mobilizations.
Introduction
The continuous expansion of the digital environment creates new and faster ways to exchange information and opinions [1]. At the same time, it also provides access to unprecedented amounts of data, allowing the quantitative investigation of the forces that underlie the diffusion of information [2] and the formation of public interest [3, 4].
Dynamical systems have been particularly successful in identifying collective mechanisms that give rise to public opinion [5, 6]. Using variables that describe the expansions and contractions of content volume, these models explain empirical data remarkably well [7].
In the domain of social media, the emergence of extreme opinions from moderate initial conditions has recently been reported [8, 9]. But extreme social reactions also appear beyond the domain of opinions and debates. Normally, people react to the news by sharing information and discussing opinions. On a few occasions, however, and under heightened social sensitivity, a reactive state may emerge, giving rise to street demonstrations, protests and riots [10] that have been extensively studied and modeled [11, 12].
Is it possible to extract a measure of social sensitivity from the content volume of digital media? Here we hypothesize that social sensitivity regulates the dynamics of the public reacting to the media coverage of massive events. To test this hypothesis, we set up a deliberately simple model for public interest modulated by media coverage and social interactions [13, 14, 15, 16] that allows us to infer a measure of social sensitivity. We capitalize on the paradigmatic model developed by Granovetter [17] based on the concept of critical mass, which represents the fraction of interested people needed to induce interest in the rest of the population. We investigate this in connection with a series of highly sensitive events that took place in the US during 2020.

Data
The Black Lives Matter movement [18] encompasses events of different nature and volume of activity in social media (Figure 1a). Here we analyze a subset of the events well covered by media sources, displayed in chronological order in Figure 1b. The time evolution of these events is shown in Figure 1c. Representing the public interest, we show in black the volume of tweets and retweets containing the keywords George Floyd, Breonna Taylor, Jacob Blake, Rayshard Brooks, Ahmaud Arbery and Andrés Guardado. Red filled curves correspond to the volume of tweets from the 29 most followed official media accounts containing the same keywords (see Supplementary Material for further details on the quantification of media coverage).
Beyond a general resemblance between the public interest (black) and media coverage (red) across events, the traces are not merely copies of each other. One common feature is that the public interest grows faster than coverage at the events’ onset. Here we propose that this effect is explained by the heightened social sensitivity that characterizes this type of event. To test this, we set up a model based on the one proposed by Granovetter, detailed in the next section.
Model
Our approach is grounded in the Granovetter model [17], originally proposed to explain the emergence of riots. In this model, agents adopt a binary state which we interpret as interested () or non-interested () in the event. The dynamics of the system are described in terms of the public interest, the fraction , where is the size of the system. Each agent is characterized by a threshold , which is the fraction of interested agents needed to induce interest in the agent. Thresholds are random variables whose cumulative distribution is interpreted here as social engagement, given that it represents the fraction of agents that become active because their threshold lies below .
Assuming that thresholds are normally distributed , we have:
(1) |
When is low, small groups can trigger interest in the rest of the system. Conversely, high values of require a larger fraction of interested people to induce interest in the rest of the population. We therefore identify the quantity as the social sensitivity of the population.
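For reference, the normal-threshold assumption corresponds to the standard Gaussian cumulative distribution. The symbols below are labels assumed here for illustration ($x$ for the public interest, $\mu$ and $\sigma$ for the mean and dispersion of the thresholds, $E$ for the social engagement) and may differ from the original notation:

```latex
E(x) \;=\; \frac{1}{2}\left[\, 1 + \operatorname{erf}\!\left( \frac{x-\mu}{\sqrt{2}\,\sigma} \right) \right]
```

In this form, lowering $\mu$ at fixed $x$ raises $E(x)$, which is the sense in which a low mean threshold lets a small interested group engage the rest of the population.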
In his original model, Granovetter described the dynamics of the public interest regardless of the influence of the media. To include this, we propose a modified model that reads (see details in Materials and Methods):
(2) |
When the system is not exposed to the media (), we recover the original Granovetter model, in which the dynamics of the public interest is driven by the social engagement with a time scale controlled by . On the contrary, when exposure to the media is maximum (), the public interest is only driven by the media coverage . In the general case , media coverage acts as an external field that modulates the public interest. Of course, media coverage and public interest are far from being independent of each other. On the contrary, they feed one another; in mathematical terms, a closed model would require another equation for the evolution of media coverage modulated by the public interest. Here we tackle this by feeding equation 2 with the experimental time traces of media coverage .
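The two limiting cases described above are consistent with a relaxation equation of the form dx/dt = [(1 - e) E(x) + e c(t) - x] / tau, where x is the public interest, E the social engagement, c the media coverage, e the exposure and tau the timescale. The following is a minimal sketch that integrates this assumed form with `scipy.integrate.solve_ivp`. The symbol names, the synthetic Gaussian coverage pulse and all parameter values are illustrative assumptions, not quantities taken from the data.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import ndtr  # standard normal CDF


def engagement(x, mu, sigma):
    """Fraction of agents whose (normally distributed) threshold lies below x."""
    return ndtr((x - mu) / sigma)


def coverage_pulse(t, peak_day=10.0, width=3.0):
    """Synthetic media coverage in [0, 1]; a stand-in for the measured trace."""
    return np.exp(-0.5 * ((t - peak_day) / width) ** 2)


def simulate_interest(exposure, tau, mu=0.5, sigma=0.1, t_end=30.0):
    """Integrate the assumed form dx/dt = [(1-e)*E(x) + e*c(t) - x] / tau."""

    def rhs(t, x):
        social = (1.0 - exposure) * engagement(x, mu, sigma)
        media = exposure * coverage_pulse(t)
        return (social + media - x) / tau

    t_eval = np.linspace(0.0, t_end, 301)
    sol = solve_ivp(rhs, (0.0, t_end), [0.0], t_eval=t_eval, max_step=0.1)
    return sol.t, sol.y[0]


# General case, media-only limit (e = 1) and no-media limit (e = 0).
t, x_mixed = simulate_interest(exposure=0.4, tau=1.0)
_, x_media_only = simulate_interest(exposure=1.0, tau=1.0)
_, x_no_media = simulate_interest(exposure=0.0, tau=1.0)
```

With e = 1 the interest is a delayed, smoothed copy of the coverage pulse, while with e = 0 and no initially interested agents the system stays essentially at rest, since almost no thresholds lie below zero for these illustrative values.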
Let us summarize the principal components of our model. On the one hand, we have two variables that quantify the volume of opinions and information shared by people: the public interest and the media coverage . On the other hand, we have the social sensitivity and the social engagement , two variables that describe macroscopic interactions among people. In the next section we show that the social variables can be reliably derived from the collected data shown in Figure 1c.

Results
To reconstruct the social variables, we integrate equation 2 using the volume of tweeted news as a proxy for the coverage . We seek the functions that minimize the difference between the resulting public interest and the volume of tweets and retweets. In Figure 2 we show the best-fitting curves for the public interest (upper panels) and the reconstructed social engagement and social sensitivity (lower panels, grey and red curves respectively).
The two social variables are of a different nature. While the engagement is a threshold-based variable whose dynamics can be expected to be fast, represents the slower, more gradual build-up of social sensitivity across the whole population. Accordingly, we find that this variable changes appreciably over periods of 15 days, which is, as expected, longer than the typical time scales of the media coverage and public interest (see Methods).
A summary of the fitting parameters is found in Table 1. We find that exposure is rather stable across events, . This indicates that, although media coverage is important, people are mainly influenced by the social environment in this kind of event. Unlike exposure, the time scale decreases as the events accumulate over time. This is also expected, since the first four events (Arbery, Floyd, Brooks and Guardado) occurred one immediately after the other (Figure 1b), speeding up the dynamics of public interest along the sequence. After a pause of about two months, the same speeding-up effect is seen for Taylor, which occurred right after Blake.
Event | Exposure | Timescale (days) |
---|---|---|
Arbery | | |
Floyd | | |
Brooks | | |
Guardado | | |
Blake | | |
Taylor | | |
To quantify the performance of our model, we compare its goodness of fit with two basic models: one in which coverage is predicted by public interest alone, and the opposite one where public interest is predicted by coverage alone (see Methods). In Figure 3 we show the mean square errors for the three models. Comparison of the basic models shows that public interest tends to predict coverage better than coverage predicts public interest. This is also apparent from the time series (Figure 1c), where the response of the media is slower with respect to the public interest at the onset of the events. Our model performs better than the basic models, explaining this delay by an increase in the social sensitivity . The inferred dynamics of the social variables are shown in the lower panels of Figure 2.


Bottom panels of Figure 2 show periods of increasing social sensitivity, which lead to a sudden increase in social engagement, when a macroscopic fraction of agents becomes interested in the events. If these dynamics are accurate, we should expect an impact beyond the digital environment. To investigate the emergence of measurable collective activity associated with an increase in social sensitivity, we collected mobility measures across the US territory [19]. In Figure 4 we show attendance at recreation places, groceries, pharmacies and public transport stations in the counties and periods of time when the events took place. We find different degrees of correlation between the social sensitivity and mobility patterns for the most massive events using a lag of 3 days. In the case of Floyd, social sensitivity correlates with all four mobility measures, with a peak in the mean Spearman’s rank coefficient ; in the case of Taylor, for two of the mobility measures; for Blake, and only one measure ( in all cases). The last three events were less massive, and accordingly we find no significant correlations with social sensitivity.
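A lagged rank correlation of this kind can be sketched with `scipy.stats.spearmanr`. The direction of the lag (sensitivity leading mobility), the function name `lagged_spearman` and the synthetic series are assumptions made for illustration; the text only states that a 3-day lag was used.

```python
import numpy as np
from scipy.stats import spearmanr


def lagged_spearman(sensitivity, mobility, lag=3):
    """Spearman correlation between sensitivity at day t and mobility at day t + lag."""
    s = np.asarray(sensitivity, dtype=float)
    m = np.asarray(mobility, dtype=float)
    # Drop the unmatched ends so both series have the same length.
    s_aligned = s[: len(s) - lag]
    m_aligned = m[lag:]
    rho, p_value = spearmanr(s_aligned, m_aligned)
    return rho, p_value


# Synthetic check: mobility is a monotonic function of sensitivity, delayed 3 days.
rng = np.random.default_rng(0)
sens = rng.random(40)
mob = np.zeros(40)
mob[3:] = sens[:-3] ** 3
rho, p_value = lagged_spearman(sens, mob, lag=3)
```

Because Spearman's coefficient depends only on ranks, any monotonic relation between the aligned series gives a coefficient of one, which is why the cubic transformation in the synthetic check does not reduce it.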
Taken together, these results suggest that our low-dimensional approximation of the Granovetter model captures the basic ingredients that regulate social responses of very different magnitudes, which are indeed capable of igniting social mobilizations. The model implements the hypothesis that agents become involved through media exposure and also through the presence of a critical mass of interested agents in the system, which allows us to characterize the social sensitivity of the population.
Discussion and conclusions
Fluctuating interactions among people in massive social events are difficult to quantify. In this work we set up a simple mathematical model that allows us to infer how social interactions influence the content volume representing public interest, given the media coverage. We then test our model on Twitter volume data related to the Black Lives Matter movement.
We find that this formulation fits the experimental series better than two models in which public interest and coverage explain each other in the absence of social interactions. Crucially, we show that the evolution of the social sensitivity correlates with variations in mobility data due to protests and riots during the events that drew the majority of the attention, presumably the most moving ones.
A possible limitation of our model is related to the assumption of uniform mixing [20] in pairwise interaction, given that public interest time series were collected from Twitter, which is indeed highly structured. The topology of social networks plays a key role when dealing with opinions of different sign, which give rise for instance to the emergence of echo chambers [21, 22]. In our work, however, we are dealing with the volume of keywords, regardless of ideological leanings. We show that, at least for the highly sensitive events analyzed here, the structure of the network can be disregarded, in line with similar models that assume uniform mixing and successfully explain the dynamics of time series related to different hashtags in Twitter [5, 6, 7]. Simple as it is, our model provides direct and interpretable measures of social engagement.
We are witnessing a rapid development of algorithms that are capable of organizing massive amounts of data based on statistical relationships. However, this growth has not been matched by a development of dynamical models capable of generalizing our knowledge [23]. We hope that this work contributes to our understanding of public interest, showing the potential of a simple model to explain social reactions within and outside the digital environment.
Materials and Methods
Corpus of data
We collected all the available tweets in English containing the keywords George Floyd, Breonna Taylor, Jacob Blake, Rayshard Brooks, Ahmaud Arbery, Andrés Guardado, Sean Monterrosa, Daniel Prude, Deon Kay, Walter Wallace Jr., Dijon Kizzee, Andre Hill, Dolal Idd, Marcellis Stinnette and Hakim Littleton, in a period of one month around a significant event related to each topic.
Tweets were collected using the Twitter API v2 [24]. We also collected the tweets with the same keywords from the group of most followed news accounts in Twitter [25]: @cnnbrk, @nytimes, @CNN, @BBCBreaking, @BBCWorld, @TheEconomist, @Reuters, @WSJ, @TIME, @ABC, @washingtonpost, @AP, @XHNews, @ndtv, @HuffPost, @BreakingNews, @guardian, @FinancialTimes, @SkyNews, @AJEnglish, @SkyNewsBreak, @Newsweek, @CNBC, @France24_en, @guardiannews, @RT_com, @Independent, @CBCNews, @Telegraph. Twitter data is available at https://shorturl.ae/AcUge.
Mobility measures correspond to the US county associated with each event. From all mobility-related time series we extract the trend to compare with social engagement.
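The trend-extraction method is not specified here. One common choice for daily mobility series, which carry a strong weekly cycle, is a centered 7-day moving average. The sketch below uses that choice purely as an assumption, with a hypothetical function name and synthetic data.

```python
import numpy as np


def weekly_trend(series, window=7):
    """Centered moving average; returns len(series) - window + 1 values."""
    series = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")


# Synthetic mobility: linear trend plus a weekly pattern that sums to zero.
days = np.arange(28)
weekly_pattern = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -3.0, 2.0])
mobility = 0.5 * days + weekly_pattern[days % 7]
trend = weekly_trend(mobility)  # trend[i] is centered on day i + 3
```

Averaging over exactly one week cancels any pattern that repeats every seven days, so only the slower component remains.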
We provide here a brief context of the analyzed events. George Perry Floyd Jr. was murdered by a police officer in Minneapolis (Ramsey County), Minnesota, on May 25, 2020. Breonna Taylor was fatally shot in Louisville (Jefferson County), Kentucky, on March 13, 2020. On September 23, several protests occurred after the charging decision in Taylor’s death was announced. Jacob S. Blake was shot and seriously injured by a police officer in Kenosha County, Wisconsin, on August 23. Rayshard Brooks was murdered on June 12, 2020 in Atlanta (Fulton County), Georgia. Ahmaud Arbery was murdered on February 23, 2020 in Glynn County, Georgia. The case gained wide attention after a video of the shooting that led to his death went viral on May 7. Andrés Guardado was killed by a Deputy Sheriff in Los Angeles County, California, on June 18, 2020.
Data fitting
We first normalized both public interest and media coverage with respect to their peak values. To find a timescale for the dynamics of the social sensitivity, we parameterized as a cubic spline of equally spaced nodes within a 1-month period. The fitting error either falls abruptly at (Floyd and Blake) or does not change significantly in the range (Taylor, Brooks, Arbery and Guardado). We therefore fixed the value , for which changes appreciably on a timescale of days.
The media coverage was interpolated in order to obtain a continuous signal. Interpolation and numerical integration of equations 1 and 2 were performed with the library scipy [26]. Parameter fitting was performed using a grid search in parameter in combination with a minimization routine for the rest of the parameters ( and nodes of ). The routine consists of integrating the model and varying the parameters until a convergence criterion is reached. We used Sequential Least Squares Programming for bounded problems in scipy to minimize the mean square error between the output of the model and the data.
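The procedure above can be sketched as follows: a grid search over the exposure, with `scipy.optimize.minimize(method="SLSQP")` adjusting the timescale and the spline nodes for each grid value. This is a minimal sketch under stated assumptions, not the authors' code. The model form, the fixed dispersion `SIGMA`, the bounds, the starting point, the finite-difference step and the synthetic data are all hypothetical choices.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize
from scipy.special import ndtr  # standard normal CDF

SIGMA = 0.1  # assumed fixed threshold dispersion (hypothetical value)


def simulate(t_obs, coverage, exposure, tau, mu_nodes):
    """Integrate the assumed model with a spline-parameterized mean threshold."""
    node_times = np.linspace(t_obs[0], t_obs[-1], len(mu_nodes))
    mu = CubicSpline(node_times, mu_nodes)

    def rhs(t, x):
        social = (1.0 - exposure) * ndtr((x - mu(t)) / SIGMA)
        return (social + exposure * coverage(t) - x) / tau

    sol = solve_ivp(rhs, (t_obs[0], t_obs[-1]), [0.0], t_eval=t_obs,
                    max_step=1.0, rtol=1e-6)
    return sol.y[0]


def fit_event(t_obs, interest, coverage_obs, exposures, n_nodes=5, maxiter=100):
    """Grid search over exposure; SLSQP over tau and the spline nodes."""
    coverage = CubicSpline(t_obs, coverage_obs)  # continuous coverage signal
    bounds = [(0.1, 10.0)] + [(0.0, 1.0)] * n_nodes
    start = np.r_[1.0, np.full(n_nodes, 0.5)]
    best = None
    for e in exposures:
        def mse(p, e=e):
            model = simulate(t_obs, coverage, e, p[0], p[1:])
            return float(np.mean((model - interest) ** 2))

        res = minimize(mse, start, method="SLSQP", bounds=bounds,
                       options={"maxiter": maxiter, "eps": 1e-3})
        if best is None or res.fun < best["mse"]:
            best = {"mse": float(res.fun), "exposure": e,
                    "tau": float(res.x[0]), "mu_nodes": res.x[1:]}
    return best


# Synthetic example: generate data from known parameters, then fit them.
t_obs = np.arange(0.0, 31.0)
coverage_obs = np.exp(-0.5 * ((t_obs - 10.0) / 3.0) ** 2)
true_nodes = np.array([0.6, 0.3, 0.3, 0.5, 0.6])
interest_obs = simulate(t_obs, CubicSpline(t_obs, coverage_obs), 0.3, 1.5, true_nodes)
best = fit_event(t_obs, interest_obs, coverage_obs, exposures=[0.2, 0.3], maxiter=20)
```

A relatively large finite-difference step (`eps`) is used because the objective is computed through an adaptive ODE solver, whose small numerical noise can otherwise dominate the estimated gradients.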
Basic models
We compare the goodness of fit with two basic models. In one of them, coverage is predicted by public interest and in the other it is the other way around, . Both basic models were set up to be nonlinear functions approximated by 7th-order polynomials, and , without zeroth-order term (). In this way, the basic models match the number of fitting parameters of the model (, and , with ).
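A 7th-order polynomial without a constant term can be fitted by ordinary least squares on a design matrix whose columns are the powers 1 to 7 of the predictor. The sketch below illustrates this with `numpy.linalg.lstsq`. The function name and the synthetic traces are hypothetical, and the fitting routine actually used for the basic models is not stated in the text.

```python
import numpy as np


def fit_polynomial_no_intercept(x, y, order=7):
    """Least-squares fit of y ~ sum_{k=1..order} a_k x^k (no constant term)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns x^1 ... x^order; dropping x^0 removes the zeroth-order term.
    design = np.vander(x, order + 1, increasing=True)[:, 1:]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    mse = float(np.mean((design @ coef - y) ** 2))
    return coef, mse


# Hypothetical normalized traces, used only to exercise the function.
interest = np.linspace(0.0, 1.0, 50)
coverage = 2.0 * interest - interest ** 3

coef_cov, mse_cov = fit_polynomial_no_intercept(interest, coverage)  # coverage from interest
coef_int, mse_int = fit_polynomial_no_intercept(coverage, interest)  # interest from coverage
```

Each basic model has seven free coefficients, which matches the parameter count quoted above.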
Analytical formulation of the model
Equation 2 is an analytical approximation of the threshold-based model proposed by Mark Granovetter [17] with the addition of an external field. In this model, agents adopt a binary state which we interpret as interest () or non-interest () in a given topic. The dynamics of the system are described in terms of the fraction of interested agents , where is the size of the system. Each agent also has an associated threshold , which is the fraction of interested agents needed to induce interest in agent . The thresholds are random variables between and taken from a probability density . On the other hand, the external field is introduced through a parameter independent of the state of the system.
With these ingredients, the dynamics of the system are as follows: the fraction of interested agents can change because a random agent interacts with the media with probability and becomes interested () in a given topic with probability or disinterested () with probability ; otherwise, with probability , the agent observes the system. In this last case, if the fraction of interested agents is greater than the threshold of the agent (), then it becomes interested (); otherwise, it becomes disinterested (). Agents’ states are updated synchronously, independently of their initial state.
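The update rule described above can be illustrated with a small agent-based simulation. The sketch below is a hypothetical implementation. Thresholds are drawn from a normal distribution without truncation to the unit interval, and the population size, parameter values and function name are assumptions.

```python
import numpy as np


def simulate_agents(coverage, exposure, mu, sigma, n_agents=10000, seed=0):
    """Synchronous threshold dynamics with media as an external field.

    coverage : sequence of per-step probabilities that the media induce interest.
    Returns the fraction of interested agents after each step.
    """
    rng = np.random.default_rng(seed)
    thresholds = rng.normal(mu, sigma, n_agents)
    x = 0.0  # initial fraction of interested agents
    history = []
    for c in coverage:
        looks_at_media = rng.random(n_agents) < exposure
        media_state = rng.random(n_agents) < c   # interested with probability c
        social_state = x > thresholds            # interested if x exceeds the threshold
        state = np.where(looks_at_media, media_state, social_state)
        x = state.mean()
        history.append(x)
    return np.array(history)


media_only = simulate_agents(np.full(50, 0.3), exposure=1.0, mu=0.5, sigma=0.1)
no_media = simulate_agents(np.zeros(50), exposure=0.0, mu=0.5, sigma=0.1)
cascade = simulate_agents(np.ones(50), exposure=0.4, mu=0.5, sigma=0.1)
```

With full exposure the interested fraction simply tracks the coverage level. With partial exposure and strong coverage, the media push the fraction past the bulk of the thresholds and social interactions complete the cascade.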
Following [27], we derive the analytical expression for the dynamics of shown in equation 2. Let be the probability that the fraction of interested agents at time is . The master equation for is:
where and are the transition probabilities that a given agent becomes interested or disinterested given . These probabilities are given by:
where is the threshold cumulative distribution function , which by definition is the fraction of agents whose threshold is below ().
In the limit of infinite population (), , where is now the fraction of interested agents and a continuous variable . In this limit, the following approximations are taken:
with . Replacing the above expressions in the master equation and neglecting terms of order, we obtain:
For a well-defined initial condition, ( is the Dirac delta) and re-scaling time , the solution of the above equation (pages 53-54 of [28]) is given by:
In particular, if the thresholds are normally distributed with mean and dispersion , . Finally, by adding a constant that allows us to adjust the time scale, equation 2 is obtained.
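As a consistency check on the deterministic limit, the update rule can be written out explicitly. The notation is assumed here for illustration ($e$ for exposure, $c$ for coverage, $F$ for the cumulative threshold distribution, $\tau$ for the timescale), and the one-step form of the transition probabilities is an assumption rather than a quotation of the original expressions:

```latex
\begin{aligned}
A(x) &= e\,c + (1-e)\,F(x), \\
W^{+}(x) &= (1-x)\,A(x), \qquad W^{-}(x) = x\,\bigl[1-A(x)\bigr], \\
W^{+}(x) - W^{-}(x) &= A(x) - x, \\
\tau\,\frac{dx}{dt} &= (1-e)\,F(x) + e\,c - x .
\end{aligned}
```

Here $A(x)$ is the probability that an updated agent ends up interested regardless of its previous state, so the drift reduces to $A(x) - x$.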
Equation 2 has equilibria given by . The stability of these points is given by the sign of:
where it can be observed that the parameter plays no role in setting the stability.
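Assuming the relaxation form $\tau\,\dot{x} = (1-e)\,F(x) + e\,c - x$ with constant coverage $c$ and normally distributed thresholds (the symbols are illustrative labels, not necessarily the original notation), the equilibrium and stability conditions can be written as:

```latex
\begin{aligned}
x^{*} &= (1-e)\,F(x^{*}) + e\,c, \\
\left.\frac{\partial \dot{x}}{\partial x}\right|_{x^{*}} &= \frac{1}{\tau}\Bigl[(1-e)\,F'(x^{*}) - 1\Bigr],
\qquad
F'(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right].
\end{aligned}
```

An equilibrium is stable when this derivative is negative. Since $\tau > 0$ only rescales the expression, it does not affect the sign, and $c$ enters only through the location of $x^{*}$.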
For reference, we summarize here all the variables and parameters of the model used throughout the manuscript:
Public interest | |
Media coverage | |
Social engagement | |
Social sensitivity | |
Media exposure | |
Timescale |
Acknowledgements
This research was partially funded by the Universidad de Buenos Aires (UBA), the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) through grant PIP-11220200102083CO, and the Agencia Nacional de Promoción de la Investigación, el Desarrollo Tecnológico y la Innovación through grant PICT-2020-SERIEA-00966.
References
- Wu and Huberman [2007] F. Wu and B. A. Huberman, Novelty and collective attention, Proc. Natl. Acad. Sci. 104 (2007).
- Leskovec et al. [2009] J. Leskovec, L. Backstrom, and J. Kleinberg, Meme-tracking and the dynamics of the news cycle, in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (2009) pp. 497–506.
- Pinto et al. [2019] S. Pinto, F. Albanese, C. O. Dorso, and P. Balenzuela, Quantifying time-dependent media agenda and public opinion by topic modeling, Physica A: Statistical Mechanics and its Applications 524, 614 (2019).
- Albanese et al. [2020] F. Albanese, S. Pinto, V. Semeshenko, and P. Balenzuela, Analyzing mass media influence using natural language processing and time series analysis, Journal of Physics: Complexity 1, 025005 (2020).
- Towers et al. [2015] S. Towers, S. Afzal, G. Bernal, N. Bliss, S. Brown, B. Espinoza, J. Jackson, J. Judson-Garcia, M. Khan, M. Lin, et al., Mass media and the contagion of fear: the case of ebola in america, PloS one 10, e0129179 (2015).
- Muhlmeyer et al. [2019] M. Muhlmeyer, J. Huang, and S. Agarwal, Event triggered social media chatter: A new modeling framework, IEEE Transactions on Computational Social Systems 6, 197 (2019).
- Lorenz-Spreen et al. [2019] P. Lorenz-Spreen, B. M. Mønsted, P. Hövel, and S. Lehmann, Accelerating dynamics of collective attention, Nature communications 10, 1 (2019).
- Baumann et al. [2020a] F. Baumann, P. Lorenz-Spreen, I. M. Sokolov, and M. Starnini, Emergence of polarized ideological opinions in multidimensional topic spaces, arXiv preprint arXiv:2007.00601 (2020a).
- Baumann et al. [2020b] F. Baumann, P. Lorenz-Spreen, I. M. Sokolov, and M. Starnini, Modeling echo chambers and polarization dynamics in social networks, Physical Review Letters 124, 048301 (2020b).
- Drury et al. [2020] J. Drury, C. Stott, R. Ball, S. Reicher, F. Neville, L. Bell, M. Biddlestone, S. Choudhury, M. Lovell, and C. Ryan, A social identity model of riot diffusion: From injustice to empowerment in the 2011 London riots, European Journal of Social Psychology 50, 646 (2020).
- Agamennone [2020] M. Agamennone, Riots and Uprisings : Modelling Conflict between Centralised and Decentralised Systems, Ph.D. thesis, King’s College London (2020).
- Bonnasse-Gahot et al. [2018] L. Bonnasse-Gahot, H. Berestycki, M. A. Depuiset, M. B. Gordon, S. Roché, N. Rodriguez, and J. P. Nadal, Epidemiological modelling of the 2005 French riots: A spreading wave and the role of contagion, Scientific Reports 8, 1 (2018), arXiv:1701.07479 .
- Guo and McCombs [2015] L. Guo and M. McCombs, The power of information networks: New directions for agenda setting (Routledge, 2015).
- Castellano et al. [2009] C. Castellano, S. Fortunato, and V. Loreto, Statistical physics of social dynamics, Reviews of modern physics 81, 591 (2009).
- Balenzuela et al. [2015] P. Balenzuela, J. P. Pinasco, and V. Semeshenko, The undecided have the key: interaction-driven opinion dynamics in a three state model, PloS one 10, e0139572 (2015).
- Barrera Lemarchand et al. [2020] F. Barrera Lemarchand, V. Semeshenko, J. Navajas, and P. Balenzuela, Polarizing crowds: Consensus and bipolarization in a persuasive arguments model, Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 063141 (2020).
- Granovetter [1978] M. Granovetter, Threshold models of collective behavior, American journal of sociology 83, 1420 (1978).
- Carney [2016] N. Carney, All Lives Matter, but so Does Race, Humanity & Society 40, 180 (2016).
- [19] https://www.google.com/covid19/mobility/.
- Burghardt et al. [2019] K. Burghardt, W. Rand, and M. Girvan, Inferring models of opinion dynamics from aggregated jury data, PLOS ONE 14, 1 (2019).
- Cinelli et al. [2021] M. Cinelli, G. De Francisci Morales, A. Galeazzi, W. Quattrociocchi, and M. Starnini, The echo chamber effect on social media, Proceedings of the National Academy of Sciences 118, 10.1073/pnas.2023301118 (2021), https://www.pnas.org/content/118/9/e2023301118.full.pdf .
- Cota et al. [2019] W. Cota, S. Ferreira, R. Pastor-Satorras, and M. Starnini, Quantifying echo chamber effects in information spreading over political communication networks, EPJ Data Science 8, 10.1140/epjds/s13688-019-0213-9 (2019).
- Brunton et al. [2016] S. L. Brunton, J. L. Proctor, J. N. Kutz, and W. Bialek, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences of the United States of America 113, 3932 (2016), arXiv:1509.03580 .
- [24] https://developer.twitter.com/en/use-cases/do-research/academic-research.
- [25] https://www.intelligencefusion.co.uk/insights/resources/article/top-30-most-followed-news-accounts-on-twitter/.
- Virtanen et al. [2020] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nature Methods 17, 261 (2020).
- Akhmetzhanov et al. [2013] A. R. Akhmetzhanov, L. Worden, and J. Dushoff, Effects of mixing in threshold models of social behavior, Physical Review E 88, 012816 (2013).
- Gardiner et al. [1985] C. W. Gardiner et al., Handbook of stochastic methods, Vol. 3 (Springer, Berlin, 1985).
Supplementary Material
Coverage estimation

Measures of media activity and public interest were collected from all the available tweets (in English) containing the keywords George Floyd, Breonna Taylor, Jacob Blake, Rayshard Brooks, Ahmaud Arbery, Andrés Guardado, Sean Monterrosa, Daniel Prude, Deon Kay, Walter Wallace Jr., Dijon Kizzee, Andre Hill, Dolal Idd, Marcellis Stinnette and Hakim Littleton, in a period of one month around a significant event related to each topic using the Twitter API v2 [24].

In particular, media coverage was estimated by collecting tweets with the mentioned keywords from the group of most followed news accounts in Twitter [25]: @cnnbrk, @nytimes, @CNN, @BBCBreaking, @BBCWorld, @TheEconomist, @Reuters, @WSJ, @TIME, @ABC, @washingtonpost, @AP, @XHNews, @ndtv, @HuffPost, @BreakingNews, @guardian, @FinancialTimes, @SkyNews, @AJEnglish, @SkyNewsBreak, @Newsweek, @CNBC, @France24_en, @guardiannews, @RT_com, @Independent, @CBCNews, @Telegraph. Twitter data is available at https://shorturl.ae/AcUge.
To validate the measure of media coverage, we compare this quantity with information directly obtained from media articles. In particular, we tracked news articles related to the main events analyzed in the paper from five major media outlets: The New York Times, Fox News, USA Today, Washington Post and Huffington Post.
Figure 5 shows that the coverage reported in the main manuscript is similar to the number of articles in which the keyword is mentioned and also to the number of mentions.
Figure 6 shows the correlation between reported media coverage and the number of mentions in the articles. A coefficient higher than 0.8 is obtained in all cases except for Guardado, suggesting that both approaches to measuring media activity are equivalent. The difference in the Guardado case is due to the fact that only a few articles were found in the analyzed media.