

PrEF: Percolation-based Evolutionary Framework for the diffusion-source-localization problem in large networks

Yang Liu1 (yangliuyh@gmail.com), Xiaoqi Wang1, Xi Wang2,3, Zhen Wang1, Jürgen Kurths4
1Northwestern Polytechnical University
2The Chinese University of Hong Kong
3Stanford University
4Potsdam Institute for Climate Impact Research
Abstract

We assume that the states of a number of nodes in a network can be investigated if necessary, and study what configuration of those nodes facilitates a better solution to the diffusion-source-localization (DSL) problem. In particular, we formulate a candidate set that is guaranteed to contain the diffusion source, and propose a method, the Percolation-based Evolutionary Framework (PrEF), to minimize such a set. One can then conduct a more intensive investigation on only a few nodes to target the source. To achieve this, we first demonstrate that there are similarities between the DSL problem and the network immunization problem. We find that minimizing the candidate set is equivalent to minimizing the order parameter if we view the observer set as the removal node set. Hence, PrEF is developed based on network percolation and an evolutionary algorithm. The effectiveness of the proposed method is validated on both model and empirical networks under varied circumstances. Our results show that the developed approach achieves a much smaller candidate set than the state of the art in almost all cases. Meanwhile, our approach is also more stable, i.e., its performance is similar irrespective of infection probabilities, diffusion models, and outbreak ranges. More importantly, our approach might provide a new framework to tackle the DSL problem in extremely large networks.

1 Introduction

There has recently been an enormous amount of interest in the diffusion-source-localization (DSL) problem on networks, which aims to find the real source of an ongoing or finished diffusion [1, 2, 3]. Two specific scenarios are epidemics and misinformation, both of which can be well modeled on networks. As one of the biggest enemies of global health, infectious diseases can cause rapid population declines or species extinction [4], from the Black Death (probably bubonic plague), which is estimated to have killed as much as one third of the population of Europe between 1346 and 1350 [5], to the present COVID-19 pandemic, which might result in the largest global recession in history; moreover, climate change keeps exacerbating the spread of diseases and increasing the probability of global epidemics [6, 7]. Hence, the study of the DSL problem can potentially help administrations make policies to prevent future outbreaks and thus save many lives and resources. Regarding misinformation, as one of the biggest threats to our societies, it can greatly impact global health and the economy and further weaken public trust in governments [8]: during the ongoing COVID-19 pandemic, fake news circulating on online social networks (OSNs) has made people believe that the virus can be killed by drinking salty water or bleach [9], and the 'Barack Obama was injured' hoax wiped out $130 billion in stock value [10]. In this circumstance, localizing the real source plays an important role in containing misinformation fundamentally.

The main task of the DSL problem is to find an estimator that infers a source from the known partial information; the ideal estimator is the one that returns the real source. However, due to the complexity of contact patterns and the uncertainty of diffusion models, the real source is generally almost impossible to infer exactly, even when the underlying network is a tree [11]. Hence, as an alternative, the error distance is used, and an estimator is said to be better than another if the corresponding inferred source is closer to the real source in hop-distance [2, 12, 13, 14, 3, 11]. Various methods have therefore been developed to minimize such error distance under different assumptions about the known information, such as observers having knowledge of time stamps and infection directions [2], the diffusion information [15], or the states of all nodes [3], etc. [14, 11]. But here we ask: what should we do next once we acquire an estimator with a small error distance? Indeed, one can carry out an intensive detection on the neighbor region of the inferred source to search for the real source. For regular networks, a small error distance is usually associated with a small number of nodes to be checked. However, most real-world networks are heterogeneous, which means that even a short error distance might correspond to a great number of nodes, especially in social networks.

Hence, in this paper, we present a method, the Percolation-based Evolutionary Framework (PrEF), to tackle the DSL problem by suppressing a candidate set that is guaranteed to contain the real source. In particular, we assume that there is a group of nodes in the network whose states can be investigated if necessary. Those nodes are also assumed to have information on both time stamps and infection directions. Our goal is then to use as few observers (nodes) as possible to achieve the containment of the candidate set. We find that this goal can be reached via the solution of the network immunization problem. Hence, we build our method on network percolation and evolutionary computation. Results on both model and empirical networks show that the proposed method substantially outperforms the state-of-the-art approaches.

Key contributions of this paper are summarized as follows:

  • DSL vs. network immunization. We concretely study and derive the connection between the DSL problem and the network immunization problem, and find that the solution of the latter can be used to effectively cope with the former.

  • Percolation-based evolutionary framework. We propose a percolation-based evolutionary framework to solve the DSL problem, which builds on a network percolation process and potentially works for a range of scenarios.

  • Extensive evaluation on synthetic and empirical networks. We evaluate the proposed method on two synthetic networks and four empirical networks drawn from different real-world scenarios, with sizes up to over 800,000 nodes. Results show that our method is more effective, efficient, and stable than state-of-the-art approaches, and is also capable of handling large networks.

2 Related Work

DSL approaches. Shah and Zaman first studied the DSL problem with a single diffusion source and introduced the rumor centrality method, which counts the distinct diffusion configurations of each node [1, 16]. They considered the Susceptible-Infected (SI) model and concluded that a node is the source with higher probability if it has a larger rumor centrality. Following that, Dong et al. also investigated the SI model and proposed a better approach based on the local rumor center, generalized from rumor centrality, given that a prior distribution is known for each node being the rumor source [17]. Similarly, Zhu and Ying investigated the problem under the Susceptible-Infected-Recovered (SIR) model and found that the Jordan center can be used to characterize such probability [12]. Wang et al. showed that the detection probability can be boosted if multiple diffusion snapshots are observed [13], which can be viewed as integrating the information of the diffusion time to some extent. Indeed, if time stamps or other additional information are known, the corresponding methods usually work better [2, 3, 18]. In short, almost all existing methods study the DSL problem on either simple networks (such as tree-like or model networks) or small empirical networks. Hence, their performance might be questioned in real and complex scenarios, such as networks having many cycles [19, 20].

Network immunization. The network immunization problem aims to find a key group of nodes whose removal fragments a given network to the greatest extent, which has been proved to be an NP-hard problem [21]. In general, approaches for coping with this problem fall into four categories. The first obtains the key group by strategies such as randomly selecting nodes from the network, usually called local-information-based methods [22, 23]. Since the network topology does not have to be precisely known, these methods can be quite useful in some scenarios. When the network topology is known, the second category is usually much more effective. Methods in this category draw the key group by directly ranking nodes according to measurements like degree centrality, eigenvector centrality, PageRank, and betweenness centrality [20]. More concretely, they first calculate the importance of each node using these centralities and choose the top-ranked nodes as the key group. The third category takes the same strategy but heuristically updates the importance of nodes in the remaining network after the most important node is removed; the key group eventually consists of all removed nodes [21]. The last category obtains the key group in indirect ways [24, 25]. For instance, refs. [26, 27] achieve the goal by tackling the feedback vertex set problem.

3 Model

We assume that a diffusion $\zeta$ occurs on a network $G(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ are the corresponding node set and edge set, respectively. Letting $e_{uv}\in\mathcal{E}$ be the edge between nodes $u$ and $v$, we define the nearest-neighbor set of $u$ as $\Gamma(u)=\{v,\forall e_{uv}\in\mathcal{E}\}$. A connected component $c_i$ of $G$ is a subnetwork $G'(\mathcal{V}',\mathcal{E}')$ satisfying $\mathcal{V}'\subset\mathcal{V}$, $\mathcal{E}'=\mathcal{E}\cap(\mathcal{V}'\times\mathcal{V}')$, and $\mathcal{E}\cap((\mathcal{V}\setminus\mathcal{V}')\times\mathcal{V}')\equiv\emptyset$. In particular, denoting $|c_i|$ the size of $c_i$ (i.e., the number of nodes in $c_i$), the largest connected component (LCC), $c_{\max}$, is defined as the component that contains the most nodes, that is, $c_{\max}=\arg\max_{c_i}|c_i|$. Now assuming that $G''$ is the remaining network of $G$ after the removal of a fraction $q$ of nodes and their incident edges, the corresponding size of the LCC, $|c''_{\max}|$, is a monotonically decreasing function of $q$. This function is also known as the order parameter $\mathcal{G}(q)=|c''_{\max}|/n$, where $n=|\mathcal{V}|$ is the number of nodes in $G$. According to percolation theory [28], when $q$ is large enough, say larger than the critical threshold $q_c$, the probability that a randomly selected node belongs to the LCC approaches zero. In other words, if $q>q_c$, there is no giant connected component in $G''$.
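As a concrete illustration, $\mathcal{G}(q)$ can be measured directly from an adjacency list: remove the chosen nodes, find the largest surviving component via a BFS, and divide by $n$. The following pure-Python sketch is illustrative (the ring network and removal set are not from the paper):

```python
from collections import defaultdict

def largest_cc_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    seen = set()
    best = 0
    for s in adj:
        if s in removed or s in seen:
            continue
        # BFS over the surviving subnetwork
        stack, comp = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            comp += 1
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, comp)
    return best

def order_parameter(adj, removed):
    """G(q) = |LCC of the remaining network| / n."""
    return largest_cc_size(adj, removed) / len(adj)

# Toy example: a ring of 10 nodes; removing 2 opposite nodes halves the LCC.
adj = defaultdict(set)
for i in range(10):
    adj[i].add((i + 1) % 10)
    adj[(i + 1) % 10].add(i)
print(order_parameter(adj, []))      # 1.0
print(order_parameter(adj, [0, 5]))  # 0.4  (two chains of 4 nodes each)
```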

The diffusion $\zeta$ is generally associated with four factors: the network structure $G$, the diffusion source $v_s$, the dynamic model $M$, and the corresponding time $t$, say $\zeta(G,v_s,M,t)$. Regarding $M$, here we particularly consider the Susceptible-Infected-Recovered (SIR) model [29] as an example to explain the proposed method; more models are discussed in the result section. Nodes of $G$ governed by the SIR model are either susceptible, infected, or recovered. As $t\rightarrow t+1$, an infected node $u$ has an infection probability $\beta_{uv}$ (or a time interval $\tau_{uv}=1/\beta_{uv}$) to transmit the information or virus, say $\varsigma$, to each susceptible neighbor $v\in\Gamma(u)$. Meanwhile, it recovers with a recovery probability $\gamma_u$ (or after a duration $\tau_u=1/\gamma_u$). Recovered nodes stay in the recovered state, and $\varsigma$ cannot pass through a recovered node.
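The discrete-time SIR dynamics above can be sketched as follows; for simplicity this sketch assumes a uniform infection probability `beta` for every edge and a uniform recovery probability `gamma` for every node, whereas the paper allows per-edge $\beta_{uv}$ and per-node $\gamma_u$:

```python
import random

def simulate_sir(adj, source, beta, gamma, seed=None, t_max=1000):
    """Discrete-time SIR on adjacency dict `adj`; returns {node: infection time}."""
    rng = random.Random(seed)
    infected = {source}
    recovered = set()
    t_infect = {source: 0}
    t = 0
    while infected and t < t_max:
        t += 1
        newly_infected, newly_recovered = set(), set()
        for u in infected:
            for v in adj[u]:
                # u transmits to each susceptible neighbor with probability beta
                if v not in t_infect and rng.random() < beta:
                    newly_infected.add(v)
                    t_infect[v] = t
            if rng.random() < gamma:   # u recovers with probability gamma
                newly_recovered.add(u)
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return t_infect

# Toy run on a path network 0-1-2-3-4 starting from node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
times = simulate_sir(adj, source=0, beta=1.0, gamma=0.0, seed=1)
print(times)  # with beta=1, gamma=0: {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```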

Now supposing that a group of nodes $\mathcal{O}\subset\mathcal{V}$ is particularly chosen as observers, whose states can hence be investigated if necessary, we study what and how the configuration of $\mathcal{O}$ could facilitate better solutions to the diffusion-source-localization (DSL) problem [1, 2, 3]. In particular, we assume that a node $u\in\mathcal{O}$ records the relative infection time stamp $t_u$ once it gets infected. Besides, we also consider that part of $\mathcal{O}$, say $\mathcal{O}_d$, has the ability to record the infection direction, that is, a node $u\in\mathcal{O}_d$ can also show us the node $v$ if $v$ transmits $\varsigma$ to $u$. Based on these assumptions, for a diffusion $\zeta$ triggered by an unknown diffusion source $v_s$ at time $t_0$, the DSL problem aims to design an estimator $\sigma(G,\mathcal{O})$ that gives us the inferred source $\widehat{v}_s=\sigma(G,\mathcal{O})$ satisfying $\widehat{v}_s=\arg\max_{v\in G}\mathcal{P}(\mathcal{O}|v)$, where $\mathcal{P}(\mathcal{O}|v)$ is the probability that we observe $\mathcal{O}$ given $\zeta(G,v,M,t)$. Obviously, the state of a node $i\in\mathcal{O}$ is governed by all parameters of $\zeta$ but with unknown $M$ and $t$. Hence, with high probability $\widehat{v}_s$ differs from the real source $v_s$ in most scenarios [2, 12, 3]. Therefore, the error distance $\epsilon=h(\widehat{v}_s,v_s)$ is used to verify the performance of an estimator, where $h(\widehat{v}_s,v_s)$ represents the hop-distance between $\widehat{v}_s$ and $v_s$. Usually, an estimator is said to be better than another if it has a smaller $\epsilon$.

But here we ask: what should we do next after we obtain an estimator with a small $\epsilon$? In other words, can the estimator help us find the diffusion source more easily? Indeed, after acquiring $\widehat{v}_s$, one can conduct a more intensive search on the neighbor region of $\widehat{v}_s$ to detect $v_s$. In this case, a small $\epsilon$ generally corresponds to a small search region. However, due to the heterogeneity of contact patterns in most real-world networks [30], even a small region (i.e., a small $\epsilon$) might be associated with a lot of nodes. Therefore, it would be more practical in real scenarios if such an estimator gave us a candidate set $\mathcal{V}_c$ satisfying $v_s\in\mathcal{V}_c$ for sure. Hence, we formulate the goal function that this paper aims to achieve as

$\widehat{\mathcal{O}}=\arg\min_{\mathcal{O}}|\mathcal{V}_c|,\ v_s\in\mathcal{V}_c\ \text{for sure},$ (1)

where $|\mathcal{V}_c|$ is the size of $\mathcal{V}_c$. Intuitively, Eq. (1) is developed based on the assumption that the smaller the candidate set, the lower the cost of the intensive search. In general, $\mathcal{V}_c$ should be finite, guaranteed by a finite $\mathcal{O}$, even for an infinite $G$; otherwise, the cost would be infinite since the intensive search would have to be carried out on an infinite population.

4 Method

Let $\mathcal{V}_r$ be the removal node set and $\mathcal{V}_o$ the rest, i.e., $\mathcal{V}_r\cup\mathcal{V}_o=\mathcal{V}$ and $\mathcal{V}_r\cap\mathcal{V}_o=\emptyset$. For the subnetwork $G'$ induced by $\mathcal{V}_o$, the boundary of a connected component $c_i$, say $\widehat{c}_i$, is defined as

$\widehat{c}_i=\{u|e_{uv}\in\mathcal{E},\forall v\in c_i,\forall u\in\mathcal{V}_r\}.$ (2)

Likewise, we write the component cover $\alpha(u)$ that a specific node $u\in\mathcal{V}_r$ corresponds to as

$\alpha(u)=\bigcup_{v\in\Gamma(u)}c_i(v),$ (3)

where $c_i(v)$ represents the component that node $v$ belongs to. Denoting $t_v$ the time stamp at which node $v$ gets infected and $\mathcal{O}'=\{u|u=\arg\min_{v\in\mathcal{O}}t_v\}$, where $t_v$ is assumed to be infinite if $v$ is still susceptible, the proposed approach is developed based on the following observation (Observation 1).

Observation 1.

Letting

$\mathcal{V}_c'=\bigcap_{\forall u\in\mathcal{O}'}\alpha(u)\cup\{u\},$ (4)

we then have $v_s\in\mathcal{V}_c'$ for sure.

Figure 1: Examples regarding DSL, where $\mathcal{V}_r=\mathcal{O}$ (observers) and $\mathcal{V}_o$ is the rest. Nodes in the susceptible state are colored green (marked $\mathbf{S}$), infected orange ($\mathbf{I}$), and recovered blue ($\mathbf{R}$). $t_v$ represents the time stamp at which node $v$ gets infected, e.g., $t_1$ of node $1$. (a) Snapshot of $\zeta$. (b) $\mathcal{O}'=\{1\}$, i.e., $t_1<t_v,v\in[2,7]$. (c) $\mathcal{O}'=\{1,2\}$, i.e., $t_1=t_2<t_v,v\in[3,7]$.

Example 1. Considering Fig. 1(a) and (b), the boundary of the connected component that the diffusion source belongs to is $\{1,2\}$.

Example 2. Regarding Fig. 1(b), $\alpha(1)$ consists of all nodes covered by the shadows.

Example 3. With respect to Fig. 1(c), $\mathcal{V}_c'$ comprises node $1$, node $2$, the diffusion source, and the node adjacent to the source. $\blacksquare$
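A minimal computational sketch of Observation 1, interpreting Eq. (4) as in Example 3 (the intersection of the covers $\alpha(u)$ over the earliest-infected observers $\mathcal{O}'$, together with $\mathcal{O}'$ itself); the path network and infection times below are illustrative, not from the paper:

```python
def components(adj, removed):
    """Connected components of the subnetwork on V minus `removed`."""
    removed = set(removed)
    comp_of, comps = {}, []
    for s in adj:
        if s in removed or s in comp_of:
            continue
        comp, stack = set(), [s]
        comp_of[s] = len(comps)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in removed and v not in comp_of:
                    comp_of[v] = len(comps)
                    stack.append(v)
        comps.append(comp)
    return comp_of, comps

def candidate_set(adj, observers, t_infect):
    """V_c' of Observation 1: intersect the component covers alpha(u) over
    the earliest-infected observers O', then add O' itself (Example 3)."""
    comp_of, comps = components(adj, observers)
    infected = {u: t_infect[u] for u in observers if u in t_infect}
    t_min = min(infected.values())
    earliest = [u for u, t in infected.items() if t == t_min]
    vc = None
    for u in earliest:
        # alpha(u): union of the components adjacent to observer u
        alpha = set().union(*(comps[comp_of[v]] for v in adj[u] if v in comp_of))
        vc = alpha if vc is None else vc & alpha
    return vc | set(earliest)

# Path 0-1-2-3-4 with observers {1, 3}; a diffusion from node 2 hits both first.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(candidate_set(adj, observers=[1, 3], t_infect={1: 1, 3: 1}))  # {1, 2, 3}
```

The real source (node 2) is indeed contained in the returned candidate set.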

We now consider the generation of the observer set $\mathcal{O}$ (or equivalently $\mathcal{V}_r$). For a given network $G(\mathcal{V},\mathcal{E})$ constructed by the configuration model [31], letting $\langle k\rangle$ and $\langle k^2\rangle$ be the first and second moments of the corresponding degree sequence, respectively, we have Lemma 1.

Lemma 1.

(Molloy-Reed criterion [31]) A network $G$ constructed based on the configuration model with high probability has a giant connected component (GCC) if

$\langle k^2\rangle/\langle k\rangle>2,$ (5)

where a GCC represents a connected component whose size is proportional to the network size $n$.

Now suppose that $\mathcal{V}_r$ consists of nodes randomly chosen from $G$ and let $q=|\mathcal{V}_r|/n$ be the fraction of removed nodes. Apparently, such removal changes the degree sequence of the remaining network (i.e., the subnetwork $G'$; see also Fig. 1(b) as an example). For a node $v$ shared by both $G$ and $G'$, the probability that its degree $k$ (in $G$) decreases to a specific degree $k'$ (in $G'$) is

$\binom{k}{k'}(1-q)^{k'}q^{k-k'},$

where $\binom{k}{k'}$ is the binomial coefficient (note that each node is removed with probability $q$). Letting $p_k$ denote the degree distribution of $G$, we then have the new degree distribution $p'_{k'}$ of $G'$ as

$p'_{k'}=\sum_{k=k'}^{\infty}p_k\binom{k}{k'}(1-q)^{k'}q^{k-k'},$ (6)

and hence the corresponding first and second moments can be further obtained as

$\langle k'\rangle=\sum_{k'=0}^{\infty}k'p'_{k'}=\sum_{k'=0}^{\infty}k'\sum_{k=k'}^{\infty}p_k\binom{k}{k'}(1-q)^{k'}q^{k-k'}=(1-q)\langle k\rangle$ (7)

and

$\langle k'^2\rangle=(1-q)^2\langle k^2\rangle+q(1-q)\langle k\rangle,$ (8)

respectively. Since $G$ is constructed using the configuration model, its edges are independent of each other. That is, each edge of $G$ shares the same probability of connecting to $v\in\mathcal{V}_r$. In other words, the removal of $v$ removes each edge with the same probability, and hence $G'$ can be viewed as a special network that is also constructed by the configuration model. Thus, Lemma 1 can be used to determine whether a GCC exists in $G'$, and we reach

$q_c=1-\frac{1}{\langle k^2\rangle/\langle k\rangle-1},$ (9)

where $q_c$ is the critical threshold of $q$, that is: i) if $q<q_c$, with high probability there is a GCC in $G'$; ii) if $q>q_c$, with high probability there is no GCC in $G'$ [32, 33].
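Eqs. (7)-(9) can be checked numerically: under random removal, each of a node's $k$ edges survives independently with probability $1-q$, so the remaining degree is $k'\sim\mathrm{Binomial}(k,1-q)$. A small Monte-Carlo sketch with an illustrative toy degree sequence:

```python
import random

def thinned_moments(degrees, q, trials=20000, seed=0):
    """Monte-Carlo moments of k' ~ Binomial(k, 1-q): under random removal of
    a fraction q of nodes, each of a node's k edges survives independently."""
    rng = random.Random(seed)
    m1 = m2 = 0.0
    for _ in range(trials):
        k = rng.choice(degrees)
        kp = sum(rng.random() < 1 - q for _ in range(k))  # surviving degree k'
        m1 += kp
        m2 += kp * kp
    return m1 / trials, m2 / trials

degrees = [1, 2, 2, 3, 3, 3, 4, 6]               # toy degree sequence
q = 0.4
k1 = sum(degrees) / len(degrees)                 # <k>   = 3.0
k2 = sum(k * k for k in degrees) / len(degrees)  # <k^2> = 11.0
m1, m2 = thinned_moments(degrees, q)
print(m1, (1 - q) * k1)                          # ~1.8 vs 1.8   (Eq. (7))
print(m2, (1 - q) ** 2 * k2 + q * (1 - q) * k1)  # ~4.68 vs 4.68 (Eq. (8))
print(1 - 1 / (k2 / k1 - 1))                     # q_c = 0.625   (Eq. (9))
```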

For random networks (such as Erdős-Rényi (ER) networks [34]), $\langle k^2\rangle=\langle k\rangle(\langle k\rangle+1)$ gives $q_c=1-1/\langle k\rangle$, which indicates that $q_c$ is usually less than $1$ and increases as $G$ becomes denser. But for heterogeneous networks, say $p_k\sim k^{-\ell}$, the second moment $\langle k^2\rangle=\sum_k k^2 p_k\sim\sum_k k^{2-\ell}$ diverges if $\ell<3$ (most empirical networks have $2<\ell<3$ [20]), which means $q_c$ approaches $1$.

Remark 1.

From the above analysis, we have the following conclusions for the case that the observer set $\mathcal{O}$ is randomly chosen. For random networks, $q_c<1$ indicates that one can always find a proper $\mathcal{O}$ that achieves a finite $\mathcal{V}_c'$, and usually the denser the network, the larger the required observer set $\mathcal{O}$. But for heterogeneous networks, $q_c\rightarrow 1$ means that this goal can only be achieved by putting almost all nodes into the observer set $\mathcal{O}$. $\blacksquare$

We further consider the case that $\mathcal{O}$ consists of hubs [35], i.e., nodes having more connections than others in a particular network. Specifically, for a network $G(\mathcal{V},\mathcal{E})$, we first define a sequence $\mathcal{S}$ over $\mathcal{V}$ in which each element is uniquely associated with a node in $G$. Then, letting $\mathcal{S}(i)=u$ and $\mathcal{S}(j)=v$ with $i<j$ if $k_u\geqslant k_v$, where $k_u$ represents the degree of node $u$, we have $\mathcal{O}=\{\mathcal{S}(i),i\in[1,\lfloor nq+0.5\rfloor]\}$. For heterogeneous networks under such removal of hubs, a threshold $q_c<1$ can be achieved and obtained by numerically solving [36]

$q_c^{(2-\ell)/(1-\ell)}-2=\frac{2-\ell}{3-\ell}k_{\min}\left(q_c^{(3-\ell)/(1-\ell)}-1\right),$ (10)

where $k_{\min}$ is the minimum degree.
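Eq. (10) can be solved numerically, e.g., by bisection; the sketch below assumes a single sign change of the residual on $(0,1)$, which holds for the illustrative case $\ell=2.5$, $k_{\min}=1$ (the function name and defaults are our own):

```python
def qc_hub_removal(ell, k_min, lo=1e-9, hi=1.0 - 1e-9, iters=200):
    """Bisection solve of Eq. (10) for the hub-removal threshold q_c."""
    def f(q):
        a = (2 - ell) / (1 - ell)
        b = (3 - ell) / (1 - ell)
        return q ** a - 2 - (2 - ell) / (3 - ell) * k_min * (q ** b - 1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:  # root lies in [lo, mid]
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# For ell = 2.5, k_min = 1, Eq. (10) reduces to q^(1/3) + q^(-1/3) = 3, whose
# root in (0, 1) is ((3 - sqrt(5))/2)**3 ~ 0.0557: removing about 5.6% of the
# largest hubs already destroys the GCC, whereas random removal needs q -> 1.
print(round(qc_hub_removal(2.5, 1), 4))  # 0.0557
```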

For $\mathcal{V}_c'$, however, we cannot obtain an explicit equation indicating whether it is finite. But we can roughly show that there is $q_c<1$ yielding a finite $\mathcal{V}_c'$ for networks generated by the configuration model with degree distribution $p_k\sim k^{-\ell}$. Supposing that the size of the LCC of $G'$ is proportional to $n^b$ with $b<1$, then

$k_{\max}n^b/n$ (11)

gives the possibly largest size of $\mathcal{V}_c'$ (as a fraction of $n$), where $k_{\max}$ is the maximum degree of $G$, here assumed to be unique. In the mentioned case, $k_{\max}=k_{\min}n^{1/(\ell-1)}$ holds, and hence expression (11) approaches $0$ as $n\rightarrow\infty$ if $2<\ell<3$ (again, the case we are particularly interested in), which indicates that one can always find some proper value of $b>0$ satisfying $b<(\ell-2)/(\ell-1)$ for a given $\ell$. Note that, as with random removal, each edge of $G$ shares the same probability of being removed. Hence, both the size of the LCC and the number of connections that node $v_{\max}$ has with $\mathcal{V}_o$ decrease as $q$ increases, where $v_{\max}$ represents the node whose degree is $k_{\max}$.

But for networks having $k_{\max}\sim n$, such as star-shaped networks, $\mathcal{V}_c'$ is still infinite when $q$ is finite. In this case, the information of only the infection time stamps of nodes in $\mathcal{O}$ is apparently not enough. However, if the infection direction is also recorded, then the central node alone as the unique observer is sufficient for the DSL problem in those star-shaped networks. Note that most existing DSL methods would also fail in star-shaped networks. Hence, we further assume that part of $\mathcal{O}$, say $\mathcal{O}_d$, can also show the infection direction. Obviously, $|\mathcal{V}_c'|$ is a monotonically decreasing function of $|\mathcal{O}_d|$ for a specific $q$. In particular, if $\mathcal{O}_d=\mathcal{O}$, then the size of $\mathcal{V}_c'$ is bounded by the size of the LCC.

Remark 2.

In general, a finite $\mathcal{V}_c'$ for heterogeneous networks can be achieved by carefully choosing $\mathcal{O}$. In other words, the configuration of $\mathcal{O}$ plays a fundamental role in the suppression of $\mathcal{V}_c'$. Associating $\mathcal{O}$ with the sequence $\mathcal{S}$ (e.g., random removal can be viewed as removal over a random sequence $\mathcal{S}$), our goal is now to acquire a better $\mathcal{S}$ that gives a smaller $\mathcal{V}_c'$ for a specific $q$. Besides, since the number of components that $v_{\max}$ connects to is usually difficult to measure, especially for real-world networks, here we choose to achieve the containment of $\mathcal{V}_c'$ by curbing the LCC of $G'$ (see also Eq. (11)), which coincides with the suppression of the order parameter $\mathcal{G}(q)$. $\blacksquare$

Therefore, we reach

$\mathcal{\widehat{S}}=\begin{cases}\arg\min_{\mathcal{S}}q_c(\mathcal{S}),&\text{if $\delta$ is given},\\\arg\min_{\mathcal{S}}F(\mathcal{S}),&\text{otherwise},\end{cases}$ (12)

where $q_c(\mathcal{S})=\arg\min_q\mathcal{G}(q)\leqslant\delta$, $F(\mathcal{S})=\sum_q\mathcal{G}(q)$, and $\delta$ is a given parameter (such as $\delta=0.01$). Eq. (12) is also known as the network immunization problem, which aims to contain epidemics by isolating as few nodes as possible [33]. Many methods have been proposed to cope with this problem [21, 27, 37, 38, 39, 40, 24, 25]. Here we particularly choose the approach based on the evolutionary framework (AEF) to construct $\mathcal{S}$, since it achieves the state of the art in most networks.

We first introduce several auxiliary variables to ease the description of AEF. Consider a given network $G(\mathcal{V},\mathcal{E})$ and the corresponding sequence $\mathcal{S}$. Let $\mathcal{S}_i^{\bot}=(\mathcal{S}_i,\mathcal{S}_{i+1},...,\mathcal{S}_h)$, where $\mathcal{S}_i$ is a subsequence of $\mathcal{S}$. Likewise, denote $G'(\mathcal{S}_i^{\bot})$ a subnetwork $G'(\mathcal{V}',\mathcal{E}')$, in which $\mathcal{V}'=\{\mathcal{S}_i^{\bot}(j),\forall j\}$ and $\mathcal{E}'=\mathcal{E}\cap(\mathcal{V}'\times\mathcal{V}')$. Based on that, $F$ of $G'(\mathcal{S}_i^{\bot})$ regarding $\mathcal{S}_i^{\bot}$ is written as $F'(\mathcal{S}_i^{\bot})$. Further, letting $p'_{\max}=|\text{LCC}|/n$ of $G'$, we define the critical subsequence $\mathcal{S}_c$ as the subsequence satisfying $p'_{\max}\leqslant\delta$ for $G'(\mathcal{S}_i^{\bot})$ and $p'_{\max}>\delta$ for $G'((\mathcal{S}_c,\mathcal{S}_i^{\bot}))$. Note that all $F'$ are scaled by $n$, namely, the size of the studied network $G$.

The core of AEF is the relationship-related (RR) strategy, which works by repeatedly pruning the whole sequence $\mathcal{S}$. Specifically, per iteration $T$, RR keeps a new sequence $\mathcal{S}'$ (i.e., $\mathcal{S}\leftarrow\mathcal{S}'$) if $F(\mathcal{S}')<F(\mathcal{S})$ (or $q_c(\mathcal{S}')<q_c(\mathcal{S})$), which is obtained through the following steps. 1) Let $j=n$, $\mathcal{S}'\leftarrow\mathcal{S}$, and $G'(\mathcal{V}',\mathcal{E}')$ be a subnetwork of $G$, which consists of all nodes in $\mathcal{V}'=\{\mathcal{S}'(z),z\in[j,n]\}$ and the associated edges in $\mathcal{E}'=\{e_{uv},\forall u,v\in\mathcal{V}'\}$. 2) Construct the candidate set $\bar{s}_j$ by randomly choosing $\Delta$ times from $\{\mathcal{S}'(i),i\in[\max(j-r\times n,1),j]\}$, where $\Delta\in[1,\widehat{\Delta}]$ and $r\in(0,\widehat{r}]$ are randomly generated per iteration, and $\widehat{\Delta}$ and $\widehat{r}$ are given parameters. 3) Choose the node $u$,

$u=\arg\min_v\xi(v),\forall v\in\bar{s}_j,$ (13)

where $\xi(v)=\sum_{c_i\in c(v)}|c_i|$ or $\xi(v)=\prod_{c_i\in c(v)}|c_i|$, in which $c(v)$ is the set of components that node $v$ would connect to. 4) Update $G'$ and $\mathcal{S}'$, namely, $\mathcal{V}'\leftarrow\mathcal{V}'\cup\{u\}$, $\mathcal{E}'\leftarrow\mathcal{E}'\cup\{e_{uv},\forall v\in\mathcal{V}',v\neq u\}$, and swap $\mathcal{S}'(j)$ and $\mathcal{S}'(z)$ satisfying $\mathcal{S}'(z)=u$. 5) $j\leftarrow j-1$. 6) Repeat steps 2)-5) until $j=1$, which accounts for one round (see also Algorithm 1). RR acquires the solution by repeating steps 1)-6) $\widehat{T}$ times.

Algorithm 1: One round of RR [38]
Input: $G(\mathcal{V},\mathcal{E})$, $\mathcal{S}$, $\widehat{\Delta}$, $\widehat{r}$
Output: $\mathcal{S}$
1: Initialization: $\mathcal{V}'\leftarrow\{\}$, $\mathcal{E}'\leftarrow\{\}$, $j\leftarrow n$, $\mathcal{S}'\leftarrow\mathcal{S}$, $\Delta$, and $r$
2: while $j\geqslant 1$ do
3:   $j\leftarrow j-1$
4:   Get the candidate set $\bar{s}_j$ based on $\Delta$ and $r$
5:   Choose node $u\in\bar{s}_j$ based on Eq. (13)
6:   Update $G'(\mathcal{V}',\mathcal{E}')$ and $\mathcal{S}'$
7: if $F(\mathcal{S}')<F(\mathcal{S})$ then
8:   $\mathcal{S}\leftarrow\mathcal{S}'$
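A simplified pure-Python sketch of one RR round, using a union-find structure to track component sizes during node re-insertion; the acceptance test $F(\mathcal{S}')<F(\mathcal{S})$ and the repetition over $\widehat{T}$ rounds are omitted for brevity, and the function names and parameter defaults are illustrative:

```python
import random

class DSU:
    """Union-find tracking component sizes during node re-insertion."""
    def __init__(self):
        self.parent, self.size = {}, {}
    def add(self, u):
        self.parent[u], self.size[u] = u, 1
    def find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path halving
            u = self.parent[u]
        return u
    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.size[rv] += self.size[ru]

def rr_round(adj, order, delta_hat=6, r_hat=0.5, seed=None):
    """One RR round: rebuild G node by node in reverse over `order`; at each
    position, swap in the candidate whose re-insertion joins the smallest
    total component mass (the sum version of xi(v) in Eq. (13))."""
    rng = random.Random(seed)
    order = list(order)
    n = len(order)
    dsu, present = DSU(), set()
    for j in range(n - 1, -1, -1):
        delta = rng.randint(1, delta_hat)
        r = rng.uniform(1e-9, r_hat)
        window = order[max(j - int(r * n), 0):j + 1]
        cand = [rng.choice(window) for _ in range(delta)]
        def xi(v):  # total size of the distinct components v would connect to
            roots = {dsu.find(w) for w in adj[v] if w in present}
            return sum(dsu.size[rt] for rt in roots)
        u = min(cand, key=xi)
        dsu.add(u)
        present.add(u)
        for w in adj[u]:
            if w in present:
                dsu.union(u, w)
        z = order.index(u)
        order[j], order[z] = order[z], order[j]
    return order

# Toy usage on an 8-node ring: returns a (possibly reordered) permutation.
adj = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
new_order = rr_round(adj, list(range(8)), seed=0)
print(sorted(new_order) == list(range(8)))  # True
```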

Observation 2.

Supposing that

$F_i'=F'(\mathcal{S}_i^{\bot}\leftarrow\mathcal{S}_{i+1}^{\bot})=F'(\mathcal{S}_i^{\bot})-F'(\mathcal{S}_{i+1}^{\bot})$ (14)

holds, then for a specific sequence $\mathcal{S}$ regarding a given network $G$, $F_i'$ is independent of $F_j'$ for any $i\neq j$. That is, in such a case, the order of nodes within $\mathcal{S}_i$ has no effect on $F_j'$. $\blacksquare$

AEF is developed based on RR and Observation 2. Specifically, at $T_p$, a random integer $j(T_p)\in[\pi_1,\pi_2]$ is generated, where $\pi_1$ and $\pi_2$ are two given boundaries. Let $\mathcal{S}_i=(\mathcal{S}(z)),\forall z\in[j(T_p)\times(i-1)+1,\min(j(T_p)\times i,n)]$. Then, for all subsequences $\mathcal{S}_i,\forall i\in[1,h]$, RR with the optimization of $F$ is conducted if $\delta$ is unknown; otherwise, $\mathcal{S}_c$ is optimized by RR with $q_c$ minimized.

Hence, the containment of $\mathcal{V}_c'$ has been achieved (see also Eq. (4)), based on which existing DSL approaches can be used to further acquire the candidate set $\mathcal{V}_c\subset\mathcal{V}_c'$ (see also Eq. (1)). Here, since our goal is a framework that can effectively cope with the DSL problem in large-scale networks, we propose the following approach. Let $\mathcal{O}''$ be the effective periphery node set of $\mathcal{V}_c''$, defined as

\mathcal{O}^{\prime\prime}=\{u,\forall u\in\mathcal{O},\exists\Gamma(u)\cap\mathcal{V}_{c}^{\prime\prime}\neq\emptyset,t_{u}\neq\infty\}, (15)

where

\mathcal{V}_{c}^{\prime\prime}=\bigcap_{\forall u\in\mathcal{O}^{\prime}}\alpha(u). (16)

Letting t_min = min_{u∈𝒪″} t_u, we first refine the time stamps by t_u′ = t_u − t_min. Then, a Reverse-Influence-Sampling (RIS) [41] like strategy is conducted to infer the source v̂_c, which works as follows. 1) Let Λ = {} and G″(𝒱′, ℰ″) be the reverse network of G′(𝒱′, ℰ′), satisfying |ℰ″| ≡ |ℰ′| and e_uv ∈ ℰ″ if e_vu ∈ ℰ′. 2) Randomly choose a node u ∈ 𝒪″ and let t_u″ = t_0′ + t_u′, where t_0′ is randomly drawn from [0, t̂_0] and t̂_0 is a given parameter. 3) View u as the source: it transmits ς to one of its randomly chosen neighbors and then recovers. 4) Repeat such transmission for t_u″ steps and denote the latest infected node by v. 5) Let Λ = Λ ∪ {v}. 6) Repeat 2)–5) T_Λ times. Using θ(v) to represent the frequency of a node v ∈ Λ in Λ, we estimate the source v̂_c by

\widehat{v}_{c}=\operatorname*{arg\,max}_{v}\theta(v). (17)

The candidate set 𝒱_c (see also Eq. (1)) is finally acquired by simply considering a few layers of neighbors of v̂_c.
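The six steps above reduce each sample to a time-limited random walk on the reverse network. Below is a minimal Python sketch, assuming the reverse network G″ is given as an adjacency dict and writing θ as a Counter; `estimate_source`, `radj`, and `t_obs` are illustrative names, not from the paper.

```python
import random
from collections import Counter

def estimate_source(radj, observers, t_obs, t0_max=5, n_samples=10000, seed=None):
    """RIS-like source estimate (sketch). `radj`: reverse network as a
    dict node -> list of reverse neighbours; `observers`: the effective
    periphery node set; `t_obs[u]`: refined time stamp t'_u = t_u - t_min.
    Each sample walks t''_u = t'_0 + t'_u random steps backwards from a
    randomly chosen observer and records the node reached."""
    rng = random.Random(seed)
    tally = Counter()
    obs = list(observers)
    for _ in range(n_samples):
        u = rng.choice(obs)                    # step 2: random observer
        steps = rng.randint(0, t0_max) + t_obs[u]
        v = u
        for _ in range(steps):                 # steps 3-4: random walk
            nbrs = radj.get(v, [])
            if not nbrs:
                break
            v = rng.choice(nbrs)
        tally[v] += 1                          # step 5: record endpoint
    # Eq. (17): the most frequently reached node is the source estimate
    return tally.most_common(1)[0][0]
```

For instance, on a star whose observers all sit one hop (and one time step) from the center, every walk terminates at the center, which is then returned as the estimate.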

Remark 3.

Relying on AEF, a finite 𝒱_c′ can be achieved by a small 𝒪, especially when 𝒪_d is large, i.e., the larger R_d, the better the corresponding result, where R_d = |𝒪_d|/|𝒪| characterizes the fraction of 𝒪_d with respect to 𝒪. In tandem with the approach that we present to draw 𝒱_c from 𝒱_c′, we name this framework the Percolation-based Evolutionary Framework (PrEF) for the diffusion-source-localization problem. Note that other strategies, such as existing DSL methods, can also be further developed or integrated into PrEF to acquire 𝒱_c based on 𝒱_c′. \blacksquare

Figure 2: The fraction of the candidate set ϕ as a function of the infection probability β_1 regarding SIR1 with γ_1 = 0.1. (a) The ER network with ⟨k⟩ = 3.50 and q_c = 0.2100. (b) The SF network with ⟨k⟩ = 4.00, ℓ = 3.0, and q_c = 0.1150. (c) The PG network with q_c = 0.0810. (d) The SCM network with q_c = 0.0664. Samples are generated at ε = 0.10.

5 Results

Competitors. We mainly compare the proposed approach with the Jordan-Center (JC) method [12], which generally achieves comparable results in most cases [15, 11, 18]. For JC, the corresponding candidate set 𝒱_c is constructed based on the associated node rank, since the neighbor-based strategy usually results in a much larger 𝒱_c. Meanwhile, an observer set 𝒪 consisting of hubs is also considered as a baseline, denoted Hubs_s. Besides, since most current DSL approaches do not work for large networks, we also verify the performance of the proposed method by comparing it with approaches from the network immunization field, including the collective influence (CI) [21], the min-sum and reverse-greedy (MSRG) strategy [27], and the FINDER (FInding key players in Networks through DEep Reinforcement learning) method [25].

Settings. JC considers all infected and recovered nodes to achieve the source localization. PrEF is conducted with Δ̂ = 50, r̂ = 1, T̂ = 20, π_1 = 1, π_2 = ⌊0.1 × n⌋, T_p = 5,000 for networks of n ⩽ 10^5, T_p = 2,500 for 10^5 < n ⩽ 10^6, T_p = 500 for n > 10^6 (AEF), and T_Λ = 10^6 (RIS). Besides, we use PrEF(R_d) to represent PrEF regarding a specific R_d. In addition, for each network, q_c is obtained at 𝒢(q_c) ≈ 0.005 of AEF.

Diffusion models. SIR1: β_uv = β_1 and γ_u = γ_1, ∀u, v ∈ 𝒱. SIR2: β_uv ∈ [β_0, β_1] and γ_u = 0, ∀u, v ∈ 𝒱, i.e., the Susceptible-Infected (SI) model [29]. SIR3: β_uv ∈ [β_0, β_1] and γ_u = 1, ∀u, v ∈ 𝒱, i.e., the Independent Cascade (IC) model [42, 43].

Letting n(t, I) and n(t, R) respectively be the number of nodes in the infected and recovered states at time t of a particular diffusion ζ(G, v_s, M, t), we generate a DSL sample by the following process. 1) A node v_s ∈ 𝒱 is randomly chosen as the diffusion source to trigger ζ. 2) ζ is terminated at the first moment when

(n(t,\text{I})+n(t,\text{R}))/n\geqslant\varepsilon,

where ε is the outbreak range rate. Note that (n(t, I) + n(t, R))/n might be much larger than ε if the infection probability is large.
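The sample-generation process can be sketched for SIR1 as follows, assuming a synchronous discrete-time SIR process with uniform β and γ; the function and argument names are illustrative, and a run may also terminate early if the diffusion dies out before reaching ε.

```python
import random

def generate_dsl_sample(adj, beta=0.5, gamma=0.1, eps=0.10, seed=None):
    """Generate one DSL sample under SIR1 (sketch): pick a random source,
    run discrete-time SIR with infection probability `beta` per contact and
    recovery probability `gamma` per step, and stop once the outbreak covers
    a fraction >= eps of the network (or the diffusion dies out)."""
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    src = rng.choice(nodes)                    # step 1: random source
    infected, recovered = {src}, set()
    # step 2: terminate at the first moment (n(t,I)+n(t,R))/n >= eps
    while infected and (len(infected) + len(recovered)) / n < eps:
        new_inf, new_rec = set(), set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and v not in recovered and rng.random() < beta:
                    new_inf.add(v)
            if rng.random() < gamma:
                new_rec.add(u)
        infected = (infected | new_inf) - new_rec
        recovered |= new_rec
    return src, infected, recovered
```

As the note above says, the final outbreak fraction can overshoot ε considerably when β is large, since a whole synchronous step is applied before the stopping test.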

Evaluation metric. We mainly consider the fraction of the candidate set (see also Eq. (1)), ϕ, to evaluate the performance of the proposed method, which is defined as

\phi=|\mathcal{V}_{c}|/n.

In what follows, ϕ is the mean over 1,000 independent realizations unless otherwise stated. Besides, we also use ϕ(·) to denote ϕ of a specific approach; e.g., ϕ(PrEF) represents ϕ of PrEF.

Table 1: Experimental networks.
Networks n m
ER 10,000 35,000
SF 10,000 40,000
PG 4,941 6,594
SCM 7,228 24,784
LOCG 196,591 950,327
WG 875,713 4,322,051

Networks. We consider both model networks (the ER network [34] and the scale-free (SF) network [30]) and empirical networks (the Power Grid (PG) network [19], the Scottish-cattle-movement (SCM) network [37], the loc-Gowalla (LOCG) network [44], and the web-Google (WG) network [45]). We choose the PG network since it is widely used to evaluate the performance of DSL approaches; the rest are all closely associated with the DSL problem. In particular, the SCM network is a network of Scottish cattle movements, on which the study of the DSL problem plays an important role for food security [46]. Besides, the LOCG network is a location-based online social network, and the WG network is a Google web graph whose nodes represent web pages and whose edges are hyperlinks among them. Studying them can potentially contribute to the containment of misinformation. The basic information on these networks can be found in Table 1.

Figure 3: ϕ as a function of the outbreak range rate ε, where SIR1 with β_1 = 0.5 and γ_1 = 0.1 is considered. (a) The ER network. (b) The SF network. (c) The PG network. (d) The SCM network.
Figure 4: ϕ as a function of the fraction of observers q. Here SIR1 is conducted with β_1 = 0.5, γ_1 = 0.1, and ε = 0.10. (a) and (b) The LOCG network. (c) and (d) The WG network. Besides, R_d = 0 is for (a) and (c), and R_d = 1 for (b) and (d).

Results. We first fix q to verify the performance of PrEF over varied infection probability β_1. Indeed, if the diffusion is symmetrical, JC is an effective estimator (see Figs. 2a and 2b when β_1 is large), but its effectiveness sharply decreases as β_1 decreases. By contrast, PrEF performs steadily over the whole range of β_1 and is much better than JC when β_1 is small; for instance, ϕ(PrEF(1)) = 0.0004 versus ϕ(JC) = 0.0721 at β_1 = 0.1 in the SF network. Besides, PrEF(0) apparently works better in the ER network than in the SF network, which indicates that k_max might affect the effectiveness of PrEF(0), since the SF network has a much larger k_max. To further demonstrate this, we also consider two empirical networks: the Power Grid network (with k_max = 19) and the Scottish network (with k_max = 3667). As shown in Figs. 3c and 3d, even Hubs_s is more effective than JC in the Power Grid network, while JC, Hubs_s, and PrEF(0) all fail in the Scottish network. In contrast, PrEF(1) works extremely well in both cases. Further considering the fraction of the candidate set ϕ as a function of the outbreak range rate ε (Fig. 3), PrEF(0) and PrEF(1) still show stable performance while ϕ(JC) rapidly increases with ε.

Figure 5: ϕ of R_d regarding SIR1 with β_1 = 0.5 and γ_1 = 0.1, where ‘Random’ represents that 𝒪_d is randomly chosen from 𝒪 while ‘Importance’ corresponds to the case that 𝒪_d is generated relying on 𝒮. (a) The ER network. (b) The SF network. (c) The PG network. (d) The SCM network. Samples are generated at ε = 0.10.
Figure 6: ϕ of ε regarding SIR1, SIR2, and SIR3, where SIR1 is with β_1 = 0.5 and γ_1 = 0.1, and β_uv ∈ [0, 1] of SIR2 and β_uv ∈ [0.5, 1] of SIR3 are both randomly generated. (a) The ER network. (b) The SF network. (c) The PG network. (d) The SCM network. Solid and unfilled marks are associated with PrEF(0) and PrEF(1), respectively.

We further evaluate the performance of PrEF under different q by comparing it with CI, MSRG, and FINDER on the two large networks. From the results shown in Fig. 4, we draw the following conclusions: i) ϕ → 1 when q → 0, in accordance with our previous discussion, i.e., 𝒢(q) → 1 when q → 0; ii) a method that performs better for R_d = 0 usually also works better for R_d = 1; iii) for a specific q, PrEF always has a much smaller ϕ than CI, MSRG, and FINDER, especially on the WG network. Indeed, the size of the observer set 𝒪, the value of R_d (see also Fig. 5), and the strategy generating 𝒪 all play fundamental roles in minimizing ϕ. In particular, PrEF has the best performance over almost the whole range of q. Besides, the results in Fig. 6 further demonstrate that the proposed method is also stable against varied diffusion models.

6 Conclusion

Aiming at a state-of-the-art approach to the DSL problem in large networks, we have developed the PrEF method based on network percolation and evolutionary computation, which can effectively narrow the search region of the diffusion source. Specifically, we have found that the DSL problem is to a degree equivalent to the network immunization problem if immune nodes are viewed as observers, and hence it can be tackled in a similar scheme. In particular, we have demonstrated that the search region is bounded by the LCC if the direction information of the diffusion is known, regardless of the network structure; for the case that only the time stamp is recorded, both the LCC and the largest degree affect the search region. We have also conducted extensive experiments to evaluate the performance of the proposed method. The results show that our method is much more effective, efficient, and stable compared to existing approaches.

References

  • [1] D. Shah and T. Zaman, “Detecting sources of computer viruses in networks: theory and experiment,” in Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2010, pp. 203–214.
  • [2] P. C. Pinto, P. Thiran, and M. Vetterli, “Locating the source of diffusion in large-scale networks,” Physical review letters, vol. 109, no. 6, p. 068702, 2012.
  • [3] S. S. Ali, T. Anwar, A. Rastogi, and S. A. M. Rizvi, “EPA: Exoneration and prominence based age for infection source identification,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 891–900.
  • [4] C. D. Harvell, C. E. Mitchell, J. R. Ward, S. Altizer, A. P. Dobson, R. S. Ostfeld, and M. D. Samuel, “Climate warming and disease risks for terrestrial and marine biota,” Science, vol. 296, no. 5576, pp. 2158–2162, 2002.
  • [5] F. Brauer, C. Castillo-Chavez, and C. Castillo-Chavez, Mathematical models in population biology and epidemiology.   Springer, 2012, vol. 2.
  • [6] A. J. McMichael, D. H. Campbell-Lendrum, C. F. Corvalán, K. L. Ebi, A. Githeko, J. D. Scheraga, and A. Woodward, Climate change and human health: risks and responses.   World Health Organization, 2003.
  • [7] D. T. Jamison, L. H. Summers, G. Alleyne, K. J. Arrow, S. Berkley, A. Binagwaho, F. Bustreo, D. Evans, R. G. Feachem, J. Frenk et al., “Global health 2035: a world converging within a generation,” The Lancet, vol. 382, no. 9908, pp. 1898–1955, 2013.
  • [8] X. Zhou and R. Zafarani, “A survey of fake news: Fundamental theories, detection methods, and opportunities,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–40, 2020.
  • [9] S. Sahoo, S. K. Padhy, J. Ipsita, A. Mehra, and S. Grover, “Demystifying the myths about COVID-19 infection and its societal importance,” Asian Journal of Psychiatry, vol. 54, p. 102244, 2020.
  • [10] K. Rapoza, “Can ’fake news’ impact the stock market?” https://www.forbes.com/sites/kenrapoza/2017/02/26/can-fake-news-impact-the-stock-market/?sh=5f820b392fac, 2017, accessed: 2021-09-15.
  • [11] J. Choi, S. Moon, J. Woo, K. Son, J. Shin, and Y. Yi, “Information source finding in networks: Querying with budgets,” IEEE/ACM Transactions on Networking, vol. 28, no. 5, pp. 2271–2284, 2020.
  • [12] K. Zhu and L. Ying, “Information source detection in the SIR model: A sample-path-based approach,” IEEE/ACM Transactions on Networking, vol. 24, no. 1, pp. 408–421, 2014.
  • [13] Z. Wang, W. Dong, W. Zhang, and C. W. Tan, “Rumor source detection with multiple observations: Fundamental limits and algorithms,” ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 1, pp. 1–13, 2014.
  • [14] J. Jiang, S. Wen, S. Yu, Y. Xiang, and W. Zhou, “Identifying propagation sources in networks: State-of-the-art and comparative studies,” IEEE Communications Surveys & Tutorials, vol. 19, no. 1, pp. 465–481, 2016.
  • [15] A. Y. Lokhov, M. Mézard, H. Ohta, and L. Zdeborová, “Inferring the origin of an epidemic with a dynamic message-passing algorithm,” Physical Review E, vol. 90, no. 1, p. 012801, 2014.
  • [16] D. Shah and T. Zaman, “Rumor centrality: a universal source detector,” in Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems, 2012, pp. 199–210.
  • [17] W. Dong, W. Zhang, and C. W. Tan, “Rooting out the rumor culprit from suspects,” in 2013 IEEE International Symposium on Information Theory.   IEEE, 2013, pp. 2671–2675.
  • [18] Y. Chai, Y. Wang, and L. Zhu, “Information sources estimation in time-varying networks,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 2621–2636, 2021.
  • [19] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.
  • [20] M. Newman, Networks.   Oxford university press, 2018.
  • [21] F. Morone and H. A. Makse, “Influence maximization in complex networks through optimal percolation,” Nature, vol. 524, no. 7563, pp. 65–68, 2015.
  • [22] R. Cohen, S. Havlin, and D. Ben-Avraham, “Efficient immunization strategies for computer networks and populations,” Physical Review Letters, vol. 91, no. 24, p. 247901, 2003.
  • [23] Y. Liu, Y. Deng, and B. Wei, “Local immunization strategy based on the scores of nodes,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 26, no. 1, p. 013106, 2016.
  • [24] X.-L. Ren, N. Gleinig, D. Helbing, and N. Antulov-Fantulin, “Generalized network dismantling,” Proceedings of the national academy of sciences, vol. 116, no. 14, pp. 6554–6559, 2019.
  • [25] C. Fan, L. Zeng, Y. Sun, and Y.-Y. Liu, “Finding key players in complex networks through deep reinforcement learning,” Nature Machine Intelligence, pp. 1–8, 2020.
  • [26] S. Mugisha and H.-J. Zhou, “Identifying optimal targets of network attack by belief propagation,” Physical Review E, vol. 94, no. 1, p. 012305, 2016.
  • [27] A. Braunstein, L. Dall’Asta, G. Semerjian, and L. Zdeborová, “Network dismantling,” Proceedings of the National Academy of Sciences, vol. 113, no. 44, pp. 12 368–12 373, 2016.
  • [28] D. Stauffer and A. Aharony, Introduction to percolation theory.   CRC press, 2018.
  • [29] M. J. Keeling and P. Rohani, Modeling infectious diseases in humans and animals.   Princeton university press, 2011.
  • [30] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
  • [31] M. Molloy and B. Reed, “A critical point for random graphs with a given degree sequence,” Random structures & algorithms, vol. 6, no. 2-3, pp. 161–180, 1995.
  • [32] R. Cohen, K. Erez, D. Ben-Avraham, and S. Havlin, “Resilience of the internet to random breakdowns,” Physical Review Letters, vol. 85, no. 21, p. 4626, 2000.
  • [33] A.-L. Barabási et al., Network science.   Cambridge university press, 2016.
  • [34] P. Erdős and A. Rényi, “On random graphs I.” Publicationes Mathematicae (Debrecen), vol. 6, pp. 290–297, 1959.
  • [35] R. Albert, H. Jeong, and A.-L. Barabási, “Error and attack tolerance of complex networks,” Nature, vol. 406, no. 6794, pp. 378–382, 2000.
  • [36] R. Cohen, K. Erez, D. Ben-Avraham, and S. Havlin, “Breakdown of the internet under intentional attack,” Physical Review Letters, vol. 86, no. 16, p. 3682, 2001.
  • [37] P. Clusella, P. Grassberger, F. J. Pérez-Reche, and A. Politi, “Immunization and targeted destruction of networks using explosive percolation,” Physical Review Letters, vol. 117, no. 20, p. 208301, 2016.
  • [38] Y. Liu, X. Wang, and J. Kurths, “Optimization of targeted node set in complex networks under percolation and selection,” Physical Review E, vol. 98, no. 1, p. 012313, 2018.
  • [39] ——, “Framework of evolutionary algorithm for investigation of influential nodes in complex networks,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 6, pp. 1049–1063, 2019.
  • [40] C. Fan, Y. Sun, Z. Li, Y.-Y. Liu, M. Chen, and Z. Liu, “Dismantle large networks through deep reinforcement learning,” in ICLR representation learning on graphs and manifolds workshop, 2019.
  • [41] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier, “Maximizing social influence in nearly optimal time,” in Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms.   SIAM, 2014, pp. 946–957.
  • [42] J. Goldenberg, B. Libai, and E. Muller, “Talk of the network: A complex systems look at the underlying process of word-of-mouth,” Marketing letters, vol. 12, no. 3, pp. 211–223, 2001.
  • [43] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 2003, pp. 137–146.
  • [44] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2011, pp. 1082–1090.
  • [45] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters,” Internet Mathematics, vol. 6, no. 1, pp. 29–123, 2009.
  • [46] M. Keeling, M. Woolhouse, R. May, G. Davies, and B. T. Grenfell, “Modelling vaccination strategies against foot-and-mouth disease,” Nature, vol. 421, no. 6919, pp. 136–142, 2003.