

Adversarial Classification on Social Networks

Sixie Yu sixie.yu@vanderbilt.edu Yevgeniy Vorobeychik yevgeniy.vorobeychik@vanderbilt.edu  and  Scott Alfeld salfeld@amherst.edu
Abstract.

The spread of unwanted or malicious content through social media has become a major challenge. Traditional examples include social network spam, but an important new concern is the propagation of fake news through social media. A common approach for mitigating this problem is to use standard statistical classification to distinguish malicious (e.g., fake news) instances from benign (e.g., actual news stories) ones. However, such an approach ignores the fact that malicious instances propagate through the network, which matters both for quantifying impact (e.g., fake news diffusing through the network) and for capturing detection redundancy (bad content can be detected at different nodes). An additional concern is evasion attacks, whereby the generators of malicious instances modify the content to escape detection. We model this problem as a Stackelberg game between the defender, who chooses parameters of the detection model, and an attacker, who chooses both the node at which to initiate malicious spread and the nature of the malicious entities. We develop a novel bi-level programming formulation of this problem, as well as a novel solution approach based on implicit function gradients, and experimentally demonstrate the advantage of our approach over alternatives which ignore network structure.

Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN

Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN

Computer Science, Amherst College, Amherst, MA

CCS Concepts: Computing methodologies, Multi-agent systems

1. Introduction

Consider a large online social network, such as Facebook or Twitter. It enables unprecedented levels of social interaction in the digital space, as well as sharing of valuable information among individuals. It is also a treasure trove of potentially vulnerable individuals for unscrupulous parties to exploit in pursuit of economic, social, or political advantage. In a perfect world, the social network is an enabler, allowing diffusion of valuable information. We can think of this “benign” information as originating stochastically from some node, and subsequently propagating over the network to its neighbors (e.g., through retweeting a news story), then their neighbors, and so on. But just as the network is a conduit for valuable information, so it is for “malicious” content. Moreover, such undesirable content can be targeted: first, by selecting an influential starting point on the network (akin to influence maximization), and second, by tuning the content for maximal impact. For example, an adversary may craft the headline of a fake news story to capture the most attention. Consider the illustration in Figure 1, where an attacker crafts a fake news story and shares it with Adam. The story is then shared by Adam with his friends, and so on.

Figure 1. An example of the propagation of malicious content.

These are not abstract concerns. Recently, widespread malicious content (e.g., fake news, antisocial posts) in online social networks has become a major concern. For example, considering that over 50% of adults in the U.S. regard social media as a primary source of news holcomb2013news , the negative impact of fake news can be substantial. According to Allcott et al. allcott2017social , over 37 million news stories later proven fake were shared on Facebook in the final three months of the 2016 U.S. presidential election. In addition to fake news, anti-social posts in online communities negatively affect other users and damage community dynamics cheng2015antisocial , while social network spam and phishing can defraud users and spread malicious software cormack2008email .

The managers of online social networks are not powerless against these threats, and can deploy detection methods, such as statistical classifiers, to identify and stop the spread of malicious content. However, such traditional mitigations have not as yet proved adequate. We focus on two of the reasons for this: first, adversaries can tune content to avoid being detected, and second, traditional learning approaches do not account for network structure. The fact that network structure mediates both spread and detection has in turn two consequences: first, we have to account for the impact of detection errors in terms of benign or malicious content subsequently propagating through the network, and second, the fact that we can potentially detect malicious content at multiple nodes on the network creates a degree of redundancy. Consequently, while traditional detection methods use training data to learn a single “global” classifier of malicious and benign content, we show that specializing such learning to network structure, and using different classifiers at different nodes, can dramatically improve performance.

To address the problem of malicious content detection on social networks, we propose two significant modeling innovations. First, we explicitly model the diffusion process of content over networks as a function of the content (or, rather, features thereof). This is a generalization of typical network influence models, which abstract away the nature of the information being shared. It is also a crucial generalization in our setting, as it allows us to directly model the attacker's balancing act between increasing social influence and avoiding detection. Second, we consider the problem of designing a collection of heterogeneous statistical detectors which explicitly account for network structure and diffusion at the level of individual nodes, rather than merely training data of past benign and malicious instances. We formalize the overall problem as a Stackelberg game between a defender (the manager of the social network), who deploys a collection of heterogeneous detectors, and an attacker, who optimally chooses both the starting node for malicious content and the content itself. This results in a complex bi-level optimization problem, and we introduce a novel technical approach for solving it: we first consider a restricted model in which the defender knows the node being attacked, which allows us to develop a projected gradient descent approach for this restricted problem, and we subsequently use it to devise a heuristic algorithm for tackling the original problem. We show that our approach offers a dramatic improvement over both traditional homogeneous statistical detection and a common adversarial classification approach.

Related Work

A number of prior efforts have considered limiting adversarial influence on social networks. Most of these pit two influence maximization players against one another, with both choosing a subset of nodes to maximize the spread of their own influence (blocking the influence of the other). For example, Budak et al. budak2011limiting consider the problem of blocking a “bad” campaign using a “good” campaign that spreads and thereby neutralizes the “bad” influence. Similarly, Tsai et al. tsai2012security study a zero-sum game between two parties with competing interests in a networked environment, with each party choosing a subset of nodes for initial influence. Vorobeychik et al. vorobeychik2015securing consider an influence blocking game in which the defender chooses from a small set of security configurations for each node, while the attacker chooses an initial set of nodes to start a malicious cascade. The main differences between this prior work and ours are that (a) our diffusion process depends on the malicious content in addition to network topology, (b) detection at each node is explicitly accomplished using machine learning techniques, rather than an abstract small set of configurations, and (c) we consider an attacker who, in addition to choosing the starting point of a malicious cascade, chooses the content in part to evade the machine learning-based detectors. The issue of using heterogeneous (personalized) filters was previously studied by Laszka et al. Laszka15 , but this work did not consider network structure or adversarial evasion.

Our paper is also related to prior research in single-agent influence maximization and adversarial learning. Kempe et al. kempe2003maximizing initiated the study of influence maximization, where the goal is to select a set of nodes to maximize the expected number of affected nodes under discrete-time diffusion processes. Rodriguez et al. gomez2012influence and Du et al. du2012learning ; du2013uncover ; du2013scalable considered continuous-time diffusion processes to model information diffusion; we extend this model. Prior adversarial machine learning work, in turn, focuses on the design of a single detector (classifier) that is robust to evasion attacks dalvi2004adversarial ; bruckner2011stackelberg ; li2014feature . However, this work does not consider malicious content spreading over a social network.

2. Model

We are interested in protecting a set of agents on a social network from malicious content originating from an external source, while allowing regular (benign) content to diffuse. The social network is represented by a graph $G=(V,E)$, where $V$ is the set of vertices (agents) and $E$ is the set of edges. An edge between a pair of nodes represents communication or influence between them. For example, an edge from $i$ to $j$ may mean that $j$ can see and repost a video or a news article shared by $i$. For simplicity, we assume that the network is undirected; generalization is direct.

We suppose that each message (benign or malicious) originates from a node on the network (which may differ for different messages) and then propagates to others. We utilize a finite set of instances diffusing over the network (of both malicious and benign content) as a training dataset $D$. Each instance, malicious or benign, is represented by a feature vector $x\in\mathbb{R}^{n}$, where $n$ is the dimension of the feature space. The dataset $D$ is partitioned into $D^{+}$ and $D^{-}$, where $D^{+}$ corresponds to malicious and $D^{-}$ to benign instances.

To analyze the diffusion of benign and malicious content on social networks in the presence of an adversary, we develop formal models of (a) the diffusion process, (b) the defender, who aims to prevent the spread of malicious content while allowing benign content to diffuse, (c) the attacker, who attempts to maximize the influence of a malicious message, and (d) the game between the attacker and defender. We present these next.

2.1. Continuous-Time Diffusion

Given an undirected network with a known topology, we use a continuous-time diffusion process to model the propagation of content (malicious or benign) through the social network, extending Rodriguez et al. gomez2012influence . In our model, diffusion will depend not merely on the network structure, but also on the nature of the item propagating through the network, which we quantify by a feature vector $x$ as above.

Suppose that the diffusion process for a single message originates at a node $s$. First, $x$ is transmitted from $s$ to its direct neighbors. The time taken by a propagation through an edge $e$ is sampled from a distribution over time, $f_{e}(t;\mathbf{w}_{e},x)$, which is a function of the edge itself and the entity $x$, and is parametrized by $\mathbf{w}_{e}$. The affected (influenced) neighbors of $s$ then propagate $x$ to their neighbors, and so on. We assume that an affected agent remains affected through the diffusion process.

Given a sample of propagation times over all edges, the time $t_{i}$ taken to affect an agent $i$ is the length of the shortest path between $s$ and $i$, where the weights of edges are the propagation times associated with these edges. The continuous-time diffusion process is supplied with a time window $T$, which captures the time-sensitive nature of propagation; for example, people are generally concerned about a news story for several months, but not for years. An agent is affected if and only if the length of its shortest path to $s$ is less than or equal to $T$. The diffusion process terminates when the shortest path from $s$ to every unaffected agent exceeds $T$. We define the influence $\sigma(s,x)$ of an instance $x$ initially affecting a network node $s$ as the expected number of affected agents over a fixed time window $T$.

Figure 2. Rayleigh distributions with different $1/\gamma^{2}$.

We assume that the distributions associated with edges are Rayleigh distributions (illustrated in Figure 2), which have density function $f(t;\gamma)=\frac{t}{\gamma^{2}}e^{-t^{2}/(2\gamma^{2})}$, where $t\geq 0$ and $\gamma$ is the scale parameter. (It is straightforward to allow for alternative distributions, such as Weibull.) The Rayleigh distribution is commonly used in epidemiology and survival analysis wallinga2004different and has recently been applied to model information diffusion in social networks gomez2012influence ; du2013uncover . In order to account for heterogeneity among mutual interactions of agents, and to let the influence of a process depend on the content being diffused, we parameterize the Rayleigh distribution of each edge by letting $1/\gamma^{2}=\mathbf{w}^{T}x$, where $\mathbf{w}$ is sampled from the uniform distribution over $[0,1]$. This parameterization results in the following density function for an arbitrary edge:

f_{e}(t;\mathbf{w}_{e},x)=t(\mathbf{w}_{e}^{T}x)\,e^{-\frac{1}{2}t^{2}(\mathbf{w}_{e}^{T}x)}.   (1)

We denote by $\mathcal{W}=\{\mathbf{w}_{e}\mid e\in E\}$ the joint parametrization of all edges.

Throughout, we assume that the parameters $\mathcal{W}$ are given, and known to both the defender and the attacker. A number of other research studies explore how to learn these parameter vectors from data du2012learning ; du2013uncover .
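
To make the diffusion model concrete, the following sketch estimates the influence $\sigma(s,x)$ by Monte Carlo simulation: it samples content-dependent Rayleigh propagation times for every edge and runs Dijkstra's algorithm truncated at the time window $T$. The function and variable names (`adj`, `W`, `sample_edge_times`, `estimate_influence`) are our own illustrative choices, not the paper's implementation, and we assume $\mathbf{w}_{e}^{T}x>0$ for every edge.

```python
import heapq
import numpy as np

def sample_edge_times(W, x, rng):
    """Sample one Rayleigh propagation time per edge, with rate 1/gamma^2 = w_e^T x.
    W maps an undirected edge (i, j) to its parameter vector w_e; x is the content."""
    times = {}
    for e, w_e in W.items():
        rate = max(float(w_e @ x), 1e-12)          # 1/gamma^2 for this edge and content
        times[e] = rng.rayleigh(scale=1.0 / np.sqrt(rate))
    return times

def estimate_influence(adj, W, s, x, T, n_samples=1000, seed=0):
    """Monte Carlo estimate of sigma(s, x): the expected number of agents whose
    shortest-path delay from the seed s is within the time window T (seed included)."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(n_samples):
        times = sample_edge_times(W, x, rng)
        dist, pq = {s: 0.0}, [(0.0, s)]            # Dijkstra truncated at T
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v in adj[u]:
                t = times.get((u, v), times.get((v, u)))
                nd = d + t
                if nd <= T and nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
        total += len(dist)
    return total / n_samples
```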

2.2. Defender Model

To protect the network against the spread of malicious content, the network manager (henceforth, the defender) can deploy statistical detection, which considers a particular instance (e.g., a tweet with an embedded link) and decides whether or not it is safe. The traditional way of deploying such a system is to take the dataset $D$ of labeled malicious and benign examples, train a classifier, and use it to detect new malicious content. However, this approach entirely ignores the fact that a social network mediates the spread of both malicious and benign entities. Moreover, both the nature (as captured in the feature vector) and the origin of malicious instances are deliberate decisions by the adversary aiming to maximize impact (and harm, from the defender's perspective). Our key innovations are (a) to allow heterogeneous parametrization of classifiers deployed at different nodes, and (b) to explicitly consider both diffusion and adversarial manipulation during learning. In combination, this enables us to significantly boost detection effectiveness in social network settings.

Let $\Theta=\{\theta_{1},\theta_{2},\cdots,\theta_{|V|}\}$ be a vector of parameters of the detection models deployed on the network, where each $\theta_{i}\in\Theta$ represents the model used for content shared by node $i$. (Below, we focus on $\theta_{i}$ corresponding to detection thresholds as an illustration; generalization is direct.) We now extend our definition of expected influence to be a function of the detector parameters, denoting it by $\sigma(i,\Theta,x)$, since any content $x$ (malicious or benign) starting at node $i$ which is classified as malicious at a node $j$ (not necessarily the same as $i$) will be blocked from spreading any further.

We define the defender’s utility as

U_{d}=\alpha\sum_{x\in D^{-}}\sum_{i\in V}\sigma(i,\Theta,x)-(1-\alpha)\sum_{x\in D^{+}}\sigma(s,\Theta,z(x)),   (2)

where $s$ is the starting node targeted by the adversary, and $z(x)$ is the adversary's modification of the original content $x$ (made in an attempt to bypass detection). The first part of the utility represents the influence of benign content that the defender wishes to maximize, while the second part denotes the influence of malicious content that the defender aims to limit, with $\alpha$ trading off the relative importance of these two considerations. Observe that we assume that benign content originates uniformly over the set of nodes, while the malicious origin is selected by the adversary. The defender's action space is the set of all possible parameters $\Theta$ of the detectors deployed at all network nodes. Note that, as is typical in machine learning, we are using the finite labeled dataset $D$ as a proxy for expected utility with respect to malicious and benign content generated from the same distribution as the data.
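
For illustration, here is a hedged sketch of how Eq. (2) could be estimated empirically, reusing `sample_edge_times` and `estimate_influence` from the sketch above. A node whose detector flags the content is treated as blocking the cascade (it neither receives nor relays the item); `detect`, `filtered_adj`, `defender_utility`, and `malicious_pairs` are illustrative names we introduce here, not the paper's code.

```python
def filtered_adj(adj, x, detect):
    """Drop nodes whose detector flags content x; they neither receive nor relay it."""
    keep = {u for u in adj if not detect(u, x)}
    return {u: [v for v in adj[u] if v in keep] for u in keep}

def defender_utility(adj, W, T, detect, benign, malicious_pairs, alpha=0.5):
    """Monte Carlo estimate of Eq. (2).
    benign: benign feature vectors (assumed to start at every node);
    malicious_pairs: (s, z(x)) pairs chosen by the attacker;
    detect(j, x) -> True if node j's detector classifies x as malicious."""
    u_benign = sum(
        estimate_influence(filtered_adj(adj, x, detect), W, i, x, T)
        for x in benign for i in adj if not detect(i, x))
    u_malicious = sum(
        estimate_influence(filtered_adj(adj, z, detect), W, s, z, T)
        for (s, z) in malicious_pairs if not detect(s, z))
    return alpha * u_benign - (1 - alpha) * u_malicious
```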

2.3. Attacker Model

The attacker's decision is twofold: (1) find a node $s\in V$ from which to start the diffusion; and (2) transform the malicious content from $x$ (its original, or ideal, form) into another feature vector $z(x)$ with the aim of avoiding detection. The first decision is reminiscent of the influence maximization problem kempe2003maximizing . The second decision is commonly known as an evasion attack on classifiers lowd2005adversarial ; li2014feature . In our case, the adversary attempts to balance three considerations: (a) impact, mediated by the diffusion of malicious content, (b) evasion, or avoiding being detected (a critical consideration for impact as well), and (c) a cost of modifying the original “ideal” content into another form, which corresponds to the associated reduced effectiveness of the transformed content, or the effort involved in the transformation. We impose this last consideration as a hard constraint that $||z(x)-x||_{p}\leq\epsilon$ for an exogenously specified $\epsilon$, where $\|\cdot\|_{p}$ is the $l_{p}$ norm.

Consider the collection of detectors with parameters $\Theta$ deployed on the network. We say that a malicious instance is detected at a node $i$ if $\mathbbm{1}[\theta_{i}(x)=1]=1$, where $\mathbbm{1}(\cdot)$ is the 0-1 indicator function. The optimization problem of the attacker corresponding to an original malicious instance $x\in D^{+}$ is then:

\max_{i,z}\quad \sigma(i,\Theta,z)   (3)
s.t.\quad ||z-x||_{p}\leq\epsilon
\quad\quad\ \mathbbm{1}[\theta_{j}(z)=1]=0,\ \forall j\in V

where the first constraint is the attacker's budget limit, while the second constraint requires that the attack instance $z$ remain undetected. If Problem (3) does not have a feasible solution, the attacker sends the original malicious instance without any modification. Consequently, the pair $(s,z(x))$ in the defender's utility function above is the solution to Problem (3).

2.4. Stackelberg Game Formulation

We formally model the interaction between the defender and the attacker as a Stackelberg game in which the defender is the leader (choosing the parameters of node-level detectors) and the attacker the follower (choosing a node to start malicious diffusion, as well as the content thereof). We assume that the attacker knows $\Theta$, as well as all relevant parameters (such as $\mathcal{W}$), before constructing its attack. The equilibrium of this game is the joint choice of $(\Theta,s(\Theta),z(x;\Theta))$, where $s(\Theta)$ and $z(x;\Theta)$ solve Problem (3), thereby maximizing the attacker's utility, and $\Theta$ maximizes the defender's utility given $s$ and $z$. More precisely, we aim to find a Strong Stackelberg Equilibrium (SSE), where the attacker breaks ties in the defender's favor.

We propose finding solutions to this Stackelberg game using the following optimization problem:

\max_{\Theta}\quad \alpha\sum_{x\in D^{-}}\sum_{i}\sigma(i,\Theta,x)-(1-\alpha)\sum_{x\in D^{+}}\sigma(s,\Theta,z(x))   (4)
s.t.\quad \forall x\in D^{+}:\ (s,z(x))\in\operatorname*{arg\,max}_{j,z}\sigma(j,\Theta,z)
\quad\quad\ \forall x\in D^{+}:\ ||z(x)-x||_{p}\leq\epsilon
\quad\quad\ \forall x\in D^{+}:\ \mathbbm{1}[\theta_{k}(z(x))=1]=0,\ \forall k\in V

This is a hierarchical optimization problem, in which the upper-level problem maximizes the defender's utility and its constraints contain the lower-level problem, which is the attacker's optimization problem.

The optimization problem (4) is generally intractable for several reasons. First, Problem (4) is a bilevel optimization problem colson2007overview , which is hard even when the upper- and lower-level problems are both linear colson2007overview . The second difficulty lies in maximizing $\sigma(i,\Theta,x)$ (the attacker's problem), as the objective function does not have an explicit closed-form expression. In what follows, we develop a principled approach to address these technical challenges.

3. Solution Approach

We start by making the strong assumption that the defender knows the node being attacked. This will allow us to make considerable progress in transforming the problem into a significantly more tractable form. Subsequently, we relax this assumption, developing an effective heuristic algorithm for computing the SSE of the original problem.

First, we utilize the tree structure of a continuous-time diffusion process to convert (4) into a tractable bilevel optimization problem. We then collapse the bilevel optimization into a single-level optimization problem by leveraging the Karush-Kuhn-Tucker (KKT) conditions boyd2004convex . The assumption that the defender knows the node being attacked allows us to solve the resulting single-level optimization problem using projected gradient descent.

3.1. Collapsing the Bilevel Problem

A continuous-time diffusion process proceeds in a breadth-first-search fashion. It starts from an agent $i$ trying to influence each of its neighbors. Then its neighbors try to influence their neighbors, and so on. Notice that once an agent becomes affected, it is no longer affected by others. The main consequence of this propagation process is that it results in a propagation tree rooted at $i$, with its structure intimately connected to the starting node $i$. This is where we leverage the assumption that the defender knows the starting node of the attack: in this case, the tree structure can be pre-computed, and fixed for the optimization problem.

We divide the agents traversed by the tree into several layers according to their distances from the source, where each layer is indexed by $l$. Since the structure of the tree depends on $i$, $l$ is a function of $i$, $l(i)$. An example influence propagation tree is depicted in Figure 3, where the first layer consists of $\{j,k,\cdots,g\}$. The number next to each edge represents the weight sampled from the associated distribution.

We define a matrix $\mathbf{A}_{l}\in\mathbb{R}^{N_{l}\times n}$, where $N_{l}$ is the number of agents in layer $l$ and $n$ is the feature dimension of $x$. Each row of $\mathbf{A}_{l}$ corresponds to the parametrization vector $\mathbf{w}$ of an edge in layer $l$ (an edge is in layer $l$ if one of its endpoints is in layer $l-1$ while the other is in layer $l$; the source is always in layer zero). For example, in Figure 3, $\mathbf{A}_{1}=[\mathbf{w}^{T}_{ij};\mathbf{w}^{T}_{ik};\cdots;\mathbf{w}^{T}_{ig}]$. The product $\mathbf{A}_{l}x$ is a vector in $\mathbb{R}^{N_{l}}$, where each element corresponds to the parameter $1/\gamma^{2}$ of an edge in layer $l$.
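
As a small illustration of this construction, the sketch below builds the layer matrices $\mathbf{A}_{l}$ from a breadth-first traversal rooted at the source; `layer_matrices`, `adj`, and `W` are hypothetical names reused from the earlier sketches, and we collect only the BFS tree edges between consecutive layers.

```python
from collections import deque
import numpy as np

def layer_matrices(adj, W, s):
    """Build A_l for l = 1, 2, ...: one row (the edge's parameter vector w_e) per
    BFS tree edge whose endpoints lie in layers l-1 and l, with layer 0 = {s}."""
    layer = {s: 0}
    queue = deque([s])
    rows = {}                                  # l -> list of w_e vectors
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in layer:                 # first time v is reached: tree edge (u, v)
                layer[v] = layer[u] + 1
                queue.append(v)
                w_e = W.get((u, v), W.get((v, u)))
                rows.setdefault(layer[v], []).append(np.asarray(w_e))
    return {l: np.vstack(r) for l, r in rows.items()}
```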

Figure 3. An example continuous-time diffusion process.

Recall that a sample of random variables from the Rayleigh distributions associated with edges corresponds to a sample of weights associated with these edges. With a fixed time window $T$, small edge weights result in wider diffusion of the content over the social network. For example, in Figure 3, if the number next to each edge represents a sample of weights, then with $T=1$ the propagation starting from $i$ can only reach agents $j$ and $k$. However, if in another sample $t_{i,j},t_{i,k},t_{i,g}$ all become $0.1$, then with the same time window the propagation can reach every agent in the network. Consequently, the attacker's goal is to increase $1/\gamma^{2}=\mathbf{w}_{e}^{T}x$ for each edge $e$: the attacker can modify the malicious instance $x$ such that the inner products between $x$ and the parameter vectors $\mathbf{w}_{e}$ of the edges are large. Consequently, we can formulate the attacker's optimization problem with respect to malicious content $z$ for a given original feature vector $x$ as

\max_{z}\quad \sum_{l}k_{l}\mathbf{1}^{T}\mathbf{A}_{l}z   (5)
s.t.\quad ||z-x||_{p}\leq\epsilon
\quad\quad\ \mathbbm{1}[\theta_{k}(z)=1]=0,\ \forall k\in V.

The attacker aims to make $1/\gamma^{2}$ for each edge as large as possible, which is captured by the objective term $\mathbf{1}^{T}\mathbf{A}_{l}z$, where $\mathbf{1}\in\mathbb{R}^{N_{l}}$ is a vector with all elements equal to one. Intuitively, this means the attacker is trying to maximize, on average, the parameter $1/\gamma^{2}$ of every edge at layer $l$. Here, $[k_{1},k_{2},\cdots,k_{l}]$ is a vector of decreasing coefficients that provides additional flexibility in modeling the attacker's behavior: they re-weight the importance of each layer. For example, setting $k_{1}=e^{0},k_{2}=e^{-1},\cdots,k_{l}=e^{-l}$ models an attacker who tries to make malicious instances spread wider at the earlier layers of the diffusion.
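
Since the objective of (5) is linear in $z$, a simple way to approximate the attacker's evasion step is available when the budget is the $l_{2}$ norm used in our experiments. The sketch below takes the closed-form maximizer on the $l_{2}$ ball and falls back to the unmodified instance if any detector would flag the result; it treats the detection constraints as a feasibility check rather than enforcing them inside the optimization, and `attack_content` and `undetected` are illustrative names, not the paper's code.

```python
import numpy as np

def attack_content(x, A_layers, k, eps, undetected):
    """Approximately solve (5) under an l2 budget: maximize sum_l k_l * 1^T A_l z
    subject to ||z - x||_2 <= eps, ignoring the detection constraints and
    checking them afterwards (the attacker sends x unchanged if z is detected)."""
    # Gradient of the linear objective: d = sum_l k_l * A_l^T 1
    d = sum(k[l] * A_layers[l].sum(axis=0) for l in A_layers)
    norm = np.linalg.norm(d)
    if norm == 0.0:
        return x
    z = x + eps * d / norm              # closed-form maximizer on the l2 ball around x
    z = np.clip(z, 0.0, None)           # keep features, and hence w_e^T z, nonnegative
    return z if undetected(z) else x
```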

We now use similar ideas to convert the upper-level optimization problem of (4) into a more tractable form. Suppose that the node being attacked is $s$ (and known to the defender). The defender then wants the detection model at each node $j$ to accurately identify both malicious and benign content. This is achieved by the two indicator functions inside terms (1) and (2) of the reformulated defender objective (6):

\max_{\Theta}\ \underbrace{\alpha\sum_{x\in D^{-}}\sum_{j}\mathbbm{1}[\theta_{j}(x)=0]\sum_{l}k_{l}\mathbf{c}^{T}_{l,j}\mathbf{A}_{l}x}_{(1)}-\underbrace{(1-\alpha)\sum_{x\in D^{+}}\mathbbm{1}[\theta_{s}(z(x))=0]\sum_{l}k_{l}\mathbf{c}^{T}_{l,s}\mathbf{A}_{l}z(x)}_{(2)}   (6)

Notice that this expression includes a vector $\mathbf{c}_{l,j}\in\mathbb{R}^{N_{l}}$ that does not appear in (5). $\mathbf{c}_{l,j}$ is a function of $\Theta$ and $x$, for a given node $j$ which triggers the diffusion (we omit these arguments below for clarity):

\mathbf{c}_{l,j}=\begin{bmatrix}\mathbbm{1}[\theta_{l_{1}}(x)=0]\\ \mathbbm{1}[\theta_{l_{2}}(x)=0]\\ \vdots\\ \mathbbm{1}[\theta_{l_{N_{l}}}(x)=0]\end{bmatrix}   (7)

Slightly abusing notation, we let $l_{i}$, $i\in\{1,2,\cdots,N_{l}\}$, denote the $i$th agent in layer $l$. The term $k_{l}\mathbf{c}^{T}_{l,j}\mathbf{A}_{l}x$ in (1) can be expanded as follows:

k_{l}\mathbf{c}^{T}_{l,j}\mathbf{A}_{l}x=k_{l}\begin{bmatrix}\mathbbm{1}[\theta_{l_{1}}(x)=0],\ldots,\mathbbm{1}[\theta_{l_{N_{l}}}(x)=0]\end{bmatrix}\begin{bmatrix}\mathbf{w}^{T}_{l_{1}}x\\ \vdots\\ \mathbf{w}^{T}_{l_{N_{l}}}x\end{bmatrix}=k_{l}\big(\mathbbm{1}[\theta_{l_{1}}(x)=0]\mathbf{w}^{T}_{l_{1}}x+\cdots+\mathbbm{1}[\theta_{l_{N_{l}}}(x)=0]\mathbf{w}^{T}_{l_{N_{l}}}x\big),   (8)

noting again that $l$ and $N_{l}$ depend on $j$, the starting node of the diffusion process. From expression (8), the defender tries to find $\Theta$ that minimizes the impact of false positives while maximizing the impact of true negatives: if each benign instance $x\in D^{-}$ is correctly identified (false-positive rates are zero and true-negative rates are one), the summation in the last line of expression (8) attains its maximum possible value.

In addition to facilitating the propagation of benign content, the defender wants to limit the propagation of malicious content, which is embodied in term (2). The expression in (2) is similar to that in (1), except that the summation is over the malicious content $D^{+}$, and (2) accounts for false negatives. In this case, $\mathbf{c}_{l,s}$ is a function of $z(x)$, the adversarially modified feature vector obtained from $x$.

We now re-formulate problem (4) as a new bilevel optimization problem (9). The upper-level problem corresponds to the defender's strategy (6), and the lower-level problem to the attacker's optimization problem (5). Here, $s$ is again the node chosen by the attacker.

\min_{\Theta}\quad (1-\alpha)\sum_{x\in D^{+}}\mathbbm{1}[\theta_{s}(z(x))=0]\sum_{l}k_{l}\mathbf{c}^{T}_{l,s}\mathbf{A}_{l}z(x)-\alpha\sum_{x\in D^{-}}\sum_{j}\mathbbm{1}[\theta_{j}(x)=0]\sum_{l}k_{l}\mathbf{c}^{T}_{l,j}\mathbf{A}_{l}x
s.t.\quad \forall x\in D^{+}:\ z(x)\leftarrow\operatorname*{arg\,max}_{z}\sum_{l}k_{l}\mathbf{1}^{T}\mathbf{A}_{l}z
\quad\quad\quad\quad s.t.\quad \forall x\in D^{+}:\ ||z(x)-x||_{p}\leq\epsilon
\quad\quad\quad\quad\quad\quad\ \forall x\in D^{+}:\ \mathbbm{1}[\theta_{k}(z(x))=1]=0,\ \forall k\in V
\quad\quad\quad\quad\quad\quad\ \forall x\in D^{+}:\ z(x)\succeq 0,   (9)

where the last constraint ensures that $\mathbf{w}^{T}z(x)\geq 0$ for all attacks $z(x)$.

The final step, inspired by mei2015security ; mei2015using , is to convert (9) into a single-level optimization problem via the KKT conditions boyd2004convex of the lower-level problem. With appropriate norm constraints (e.g., the $l_{2}$ norm) and a convex relaxation of the indicator functions (i.e., convex surrogates of the indicator functions), the lower-level problem of (9) is convex. A convex optimization problem can be equivalently represented by its KKT conditions burges1998tutorial . The single-level optimization problem then becomes:

\min_{\Theta}\quad \hat{F}_{d}   (10)
s.t.\quad \forall x:
\quad \partial_{z}\big(-\sum_{l}k_{l}\mathbf{c}^{T}_{l,s}\mathbf{A}_{l}z+\lambda g(z,x)+\mu^{T}h(z,\Theta)-\eta^{T}z\big)=0
\quad \lambda g(z,x)=0,\ \lambda\geq 0
\quad g(z,x)\leq 0
\quad \eta\odot(-z)=0,\ \eta\succeq 0
\quad h(z,\Theta)=0

where $\hat{F}_{d}$ is the objective function of Problem (9), and $\lambda$, $\mu$, $\eta$ are (vectors of) Lagrange multipliers. $g(z,x)=||z-x||_{p}-\epsilon\leq 0$ is the attacker's budget constraint, $h(z,\Theta)$ is the set of equality constraints $\mathbbm{1}[\theta_{j}(z)=1]=0,\ \forall j\in V$, and $\eta\odot(-z)$ is the Hadamard (elementwise) product between $\eta$ and $(-z)$.

3.2. Projected Gradient Descent

In this section we demonstrate how to solve the single-level optimization problem obtained above by projected gradient descent. The key technical challenge is that we do not have an explicit representation of the gradients with respect to the defender's decision $\Theta$, as these are related to $\Theta$ only indirectly, via the optimal solution to the attacker's optimization problem. We derive these gradients based on the implicit function of the defender's utility with respect to $\Theta$.

We begin by outlining the overall iterative projected gradient descent procedure. In iteration $t$ we update the parameters of the detection models by taking a projected gradient step:

\Theta^{(t+1)}=\text{Proj}_{\mathcal{A}_{d}}\big(\Theta^{(t)}-\beta_{t}\nabla_{\Theta}\hat{F}_{d}\big|_{\Theta=\Theta^{(t)}}\big)   (11)

where $\mathcal{A}_{d}$ is the feasible domain of $\Theta$ and $\beta_{t}$ is the learning rate. With $\Theta^{(t+1)}$ we solve for $z^{(t+1)}$, which is the optimal attack for a fixed $\Theta^{(t+1)}$. $\nabla_{\Theta}\hat{F}_{d}$ is the gradient of the upper-level problem.

Expanding $\nabla_{\Theta}\hat{F}_{d}$ using the chain rule, and still using $s$ as the initially attacked node, we obtain

\nabla_{\Theta}\hat{F}_{d}=(1-\alpha)\,(1)-\alpha\,(2), where
(1)=\sum_{x\in D^{+}}\Big[\frac{\partial\mathbbm{1}[\theta_{s}(z(x))=0]}{\partial\Theta}\sum_{l}k_{l}\mathbf{c}^{T}_{l,s}\mathbf{A}_{l}z(x)+\mathbbm{1}[\theta_{s}(z(x))=0]\underbrace{\frac{\partial\big[\sum_{l}k_{l}\mathbf{c}^{T}_{l,s}\mathbf{A}_{l}z(x)\big]}{\partial\Theta}}_{(a)}\Big]
(2)=\sum_{x\in D^{-}}\sum_{j}\Big[\frac{\partial\mathbbm{1}[\theta_{j}(x)=0]}{\partial\Theta}\sum_{l}k_{l}\mathbf{c}^{T}_{l,j}\mathbf{A}_{l}x+\mathbbm{1}[\theta_{j}(x)=0]\underbrace{\frac{\partial\big[\sum_{l}k_{l}\mathbf{c}^{T}_{l,j}\mathbf{A}_{l}x\big]}{\partial\Theta}}_{(b)}\Big]   (12)

In both (1) and (2), the term $\frac{\partial\mathbbm{1}[\theta_{j}(x)=0]}{\partial\Theta}$ depends on the specific detection models; we give a concrete example of its derivation in Section 3.5.

In $\sum_{l}k_{l}\mathbf{c}^{T}_{l,s}\mathbf{A}_{l}z(x)$ there are two terms that are functions of $\Theta$: $\mathbf{c}_{l,s}$ and $z(x)$. Consequently, (a) can be expanded as:

(a)=\sum_{l}k_{l}\Big[\frac{\partial\mathbf{c}_{l,s}}{\partial\Theta_{l}}\mathbf{A}_{l}z(x)+\Big[\frac{\partial z(x)}{\partial\Theta_{l}}\Big]^{T}\mathbf{A}^{T}_{l}\mathbf{c}_{l,s}\Big].   (13)

Note that only the detection models of the agents at layer $l$ contribute to $\mathbf{c}_{l,s}$. Thus, $\frac{\partial\mathbf{c}_{l,s}}{\partial\Theta_{l}}$ is a Jacobian matrix of dimension $N_{l}\times N_{l}$, where $N_{l}$ is the number of agents at layer $l$ and $\Theta_{l}$ denotes the detection models of those $N_{l}$ agents. Since $\mathbf{c}_{l,s}$ also depends on the specific detection models of the agents, we defer its derivation to Section 3.5.

$\frac{\partial z(x)}{\partial\Theta_{l}}$ is an $n\times N_{l}$ Jacobian matrix and presents the main difficulty, because we do not have an explicit function relating the attacker's optimal decision $z(x)$ to $\Theta_{l}$. Fortunately, the constraints in (10) implicitly define $z(x)$ in terms of $\Theta$:

\mathbf{f}(\Theta,z,\lambda,\mu,\eta)=\begin{bmatrix}\partial_{z}\big(-\sum_{l}k_{l}\mathbf{c}^{T}_{l,s}\mathbf{A}_{l}z+\lambda g(z,x)+\mu^{T}h(z,\Theta)-\eta^{T}z\big)\\ \lambda g(z,x)\\ \mu^{T}h(z,\Theta)\\ \eta\odot(-z)\end{bmatrix}   (14)

$\Theta$ and the attacked malicious instance $z$ satisfy $\mathbf{f}(\Theta,z,\lambda,\mu,\eta)=\mathbf{0}$. The Implicit Function Theorem zorichmathematical states that if $\mathbf{f}(\Theta,z,\lambda,\mu,\eta)$ is continuous and differentiable and the Jacobian matrix

\left[\frac{\partial\mathbf{f}}{\partial z}\,\Big|\,\frac{\partial\mathbf{f}}{\partial\lambda}\,\Big|\,\frac{\partial\mathbf{f}}{\partial\mu}\,\Big|\,\frac{\partial\mathbf{f}}{\partial\eta}\right]

has full rank, then there is a unique implicit function $I(\Theta)=(z,\lambda,\mu,\eta)$. Moreover, its derivative $\frac{\partial I}{\partial\Theta}$ is:

\frac{\partial I}{\partial\Theta}=-\left[\frac{\partial\mathbf{f}}{\partial z}\,\Big|\,\frac{\partial\mathbf{f}}{\partial\lambda}\,\Big|\,\frac{\partial\mathbf{f}}{\partial\mu}\,\Big|\,\frac{\partial\mathbf{f}}{\partial\eta}\right]^{-1}\left(\frac{\partial\mathbf{f}}{\partial\Theta}\right).   (15)

$\frac{\partial\mathbf{f}}{\partial z}$ is the Jacobian matrix of $\mathbf{f}(\Theta,z,\lambda,\mu,\eta)$ with respect to $z$, and so on. $\frac{\partial z}{\partial\Theta}\in\mathbb{R}^{n\times N}$ is given by the first $n$ rows of $\frac{\partial I}{\partial\Theta}$, and $\frac{\partial z}{\partial\Theta_{l}}$ can be indexed column-wise by the nodes at layer $l$.
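
Numerically, (15) amounts to one linear solve per malicious instance. The sketch below shows this step, assuming the stacked Jacobian has already been assembled as dense arrays; `implicit_dz_dtheta`, `J`, and `df_dtheta` are illustrative names, and we assume `J` is square and of full rank, as the theorem requires.

```python
import numpy as np

def implicit_dz_dtheta(J, df_dtheta, n):
    """Implicit-function gradient (15): solve J @ dI_dtheta = -df_dtheta and keep
    the first n rows, which correspond to dz/dTheta.
    J: Jacobian of f w.r.t. (z, lambda, mu, eta), shape (m, m);
    df_dtheta: Jacobian of f w.r.t. Theta, shape (m, N)."""
    dI_dtheta = np.linalg.solve(J, -df_dtheta)   # full-rank assumption from the theorem
    return dI_dtheta[:n, :]                      # the n x N block dz/dTheta
```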

(b) can be expanded similarly to (a), except that the attacker does not modify benign content, so $x\in D^{-}$ is not a function of $\Theta$:

(b)=\sum_{l}\sum_{j}k_{l}\Big[\frac{\partial\mathbf{c}_{l,j}}{\partial\Theta_{l}}\mathbf{A}_{l}x\Big].   (16)

The full projected gradient descent approach is given by Algorithm 1.

Algorithm 1 Find Defense Strategy
1: Input: agent $j$
2: Initialize: $\Theta^{(0)},\lambda,\mu,\eta,\beta_{0}$
3: for $t=1\cdots k$ do
4:   $\Theta^{(t+1)}=\text{Proj}_{\mathcal{A}_{d}}\big(\Theta^{(t)}-\beta_{t}\nabla_{\Theta}\hat{F}_{d}\big|_{\Theta=\Theta^{(t)}}\big)$
5: end for
6: return $\Theta^{(k+1)}$
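
For concreteness, here is a minimal Python rendering of Algorithm 1, assuming a gradient oracle `grad_F` that evaluates $\nabla_{\Theta}\hat{F}_{d}$ via (12)-(16), and using the box $[0,1]^{N}$ as the feasible domain $\mathcal{A}_{d}$ (as in the logistic-regression instantiation of Section 3.5). The step-size schedule is our own assumption.

```python
import numpy as np

def find_defense(theta0, grad_F, n_iters=100, beta0=0.1):
    """Algorithm 1 as a projected-gradient loop over the detection thresholds."""
    theta = np.asarray(theta0, dtype=float)
    for t in range(n_iters):
        beta_t = beta0 / np.sqrt(t + 1.0)       # a simple diminishing step size (an assumption)
        theta = theta - beta_t * grad_F(theta)  # gradient step on the defender objective
        theta = np.clip(theta, 0.0, 1.0)        # projection onto the feasible domain [0, 1]^N
    return theta
```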

3.3. Optimal Attack

So far, we have assumed that the network node being attacked is fixed. However, the ultimate goal is to allow the attacker to choose both the node $s$ and the modification $z$ of the malicious content. We begin our generalization by first allowing the attacker to optimize these jointly.

The full attacker algorithm which results is described in Algorithm 2.

Algorithm 2 Optimal Attack Strategy
1: Input: $\Theta, x$
2: Initialize: $ret=[\,]$
3: for $i=1\cdots|V|$ do
4:   $z(i,x)\leftarrow$ Solve (5)
5:   $\hat{U}_{a}(i)\leftarrow$ Optimal objective value of (5)
6:   $\big(i,z(i,x),\hat{U}_{a}(i)\big)$ appended to $ret$
7: end for
8: $z,s\leftarrow\text{OptimalTuple}(ret)$
9: return $z,s$

Recall that the tree structure of a propagation depends on the agent being attacked, which makes the objective function of (5) a function of the agent being attacked. Thus, for a given fixed $\Theta$, the attacker iterates through each agent $i$ and solves problem (5), assuming the propagation starts from $i$, resulting in an associated utility $\hat{U}_{a}(i)$ and an attacked instance $z(i,x)$. Then $i$, $z(i,x)$, and $\hat{U}_{a}(i)$ are appended as a 3-tuple to a list (the sixth step in Algorithm 2). When the iteration completes, the attacker picks the optimal 3-tuple in terms of utility (the eighth step in Algorithm 2, where the function OptimalTuple(ret) finds the optimal 3-tuple in the list ret). The node $s$ and the corresponding attack instance $z$ in this optimal 3-tuple become the optimal attack.

3.4. SSE Heuristic Algorithm

Now we take the final step, relaxing the assumption that the attacker chooses a fixed node to attack which is known to the defender prior to choosing $\Theta$. Our main observation is that fixing $s$ in the defender's algorithm above allows us to find a collection of heterogeneous detector parameters $\Theta$, and we can evaluate the actual utility of the associated defense (i.e., if the attacker optimally chooses both $s$ and $z$ in response) by using Algorithm 2. We use this insight to devise a simple heuristic: iterate over all potential nodes $s$ that can be attacked, compute the associated defense $\Theta(s)$ (using the optimistic definition of defender's utility in which $s$ is assumed fixed), then find the actual optimal attack in response for each $x\in D^{+}$. Finally, choose the $\Theta(s)$ which has the best actual defender utility.

This heuristic algorithm is described in Algorithm 3.

Algorithm 3 Optimal Defense Strategy
1: Input: $G=(V,E), \mathcal{W}, D$
2: for $j=1\cdots|V|$ do
3:   $\Theta_{j}\leftarrow$ Apply Algorithm 1
4:   $\forall x\in D^{+}: (s,z(x))\leftarrow$ Apply Algorithm 2
5:   $\hat{U}_{d}(j)\leftarrow\text{DefenderUtility}(\Theta_{j},(s,z(x)))$
6: end for
7: $j\leftarrow\operatorname*{arg\,max}_{j}\hat{U}_{d}(j)$
8: return $\Theta_{j}$

The fifth step in the algorithm includes the function DefenderUtility, which evaluates the defender's utility $\hat{U}_{d}(j)$. Note that the input argument $s$ of this function is used to determine the tree structure of the propagation started from $s$.

Recall that Algorithm 1 solves (10), which requires the specific detection model in order to compute the relevant gradients. Therefore, in what follows, we present a concrete example of how to solve (10) when the detection models are logistic regressions. Specifically, we illustrate how to derive the two terms, $\frac{\partial\mathbbm{1}[\theta_{j}(z)=0]}{\partial\Theta}$ and $\frac{\partial\mathbf{c}_{l,j}}{\partial\Theta_{l}}$, that depend on the particular details of the detection model.

3.5. Illustration: Logistic Regression

We consider the logistic regression model used for detection at individual nodes to illustrate the ideas developed above. For a node $i$, its detection model has two components: the logistic regression $\frac{1}{1+e^{-\phi^{T}x}}$, where $\phi$ is the weight vector of the logistic regression and $x$ the instance propagated to $i$, and a detection threshold $\theta_{i}$ (which is the parameter the defender will optimize). An instance is classified as benign if $\frac{1}{1+e^{-\phi^{T}x}}\leq\theta_{i}$. Thus (slightly abusing notation as before), $\theta_{i}(x)\neq 0$ ($x$ is classified as malicious) if $\frac{1}{1+e^{-\phi^{T}x}}\geq\theta_{i}$.

With the specific form of the detection models we can derive $\frac{\partial\mathbbm{1}[\theta_{j}(x)=0]}{\partial\Theta}$ and $\frac{\partial\mathbf{c}_{l}}{\partial\Theta_{l}}$ (omitting the node index $s$ or $j$ for clarity). A technical challenge is that the indicator function $\mathbbm{1}(\cdot)$ is neither continuous nor differentiable, which makes it difficult to characterize its derivative with respect to $\Theta$. However, observe that for logistic regression, $\theta_{j}(x)=0$ (that is, $\frac{1}{1+e^{-\phi^{T}x}}\leq\theta_{j}$) is equivalent to $\log\big(\frac{\theta_{j}}{1-\theta_{j}}\big)\geq\phi^{T}x$. Therefore we use $\log\big(\frac{\theta_{j}}{1-\theta_{j}}\big)-\phi^{T}x$ as a surrogate for $\mathbbm{1}[\cdot]$. Then $\frac{\partial\mathbbm{1}[\theta_{j}(x)=0]}{\partial\Theta}$ is an $N$-dimensional vector with the $j$th element equal to $\frac{1}{\theta_{j}-\theta^{2}_{j}}$.

\mathbf{c}_{l}=\begin{bmatrix}\log\big(\frac{\theta_{l_{1}}}{1-\theta_{l_{1}}}\big)-\phi^{T}x\\ \log\big(\frac{\theta_{l_{2}}}{1-\theta_{l_{2}}}\big)-\phi^{T}x\\ \vdots\end{bmatrix}   (17)

and $\frac{\partial\mathbf{c}_{l}}{\partial\Theta_{l}}$ becomes an $N_{l}\times N_{l}$ diagonal matrix:

\frac{\partial\mathbf{c}_{l}}{\partial\Theta_{l}}=\begin{bmatrix}\frac{1}{\theta_{l_{1}}-\theta^{2}_{l_{1}}}&&\\ &\ddots&\\ &&\frac{1}{\theta_{l_{N_{l}}}-\theta^{2}_{l_{N_{l}}}}\end{bmatrix}   (18)

With equations (12)-(16), $\frac{\partial\mathbf{c}_{l}}{\partial\Theta_{l}}$, and $\frac{\partial\mathbbm{1}[\theta_{j}(x)=0]}{\partial\Theta}$, we can now calculate $\nabla_{\Theta}\hat{F}_{d}$. Since the thresholds $\theta_{i}\in[0,1]$, the defender's action space is $[0,1]^{N}$. When updating $\Theta$ by (11), we therefore project it back onto $[0,1]^{N}$ in each iteration.
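
The two model-specific quantities of this section are simple to compute in code. The sketch below evaluates the surrogate vector (17) and the diagonal Jacobian (18) for a layer of logistic-regression detectors; `c_layer` and `dc_dtheta` are illustrative names, with `thetas` holding the thresholds of the nodes in the layer and `phi` the shared logistic-regression weight vector.

```python
import numpy as np

def c_layer(thetas, phi, x):
    """Surrogate vector c_l in (17): entry j is log(theta_j / (1 - theta_j)) - phi^T x,
    which is nonnegative exactly when node j classifies x as benign."""
    return np.log(thetas / (1.0 - thetas)) - float(phi @ x)

def dc_dtheta(thetas):
    """Diagonal Jacobian (18): d/dtheta log(theta / (1 - theta)) = 1 / (theta - theta^2)."""
    return np.diag(1.0 / (thetas - thetas ** 2))
```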

4. Experiments

In this section we experimentally evaluate our proposed approach. We used the Spam dataset Lichman:2013 from the UCI machine learning repository as the training dataset for the logistic regression model. The Spam dataset contains 4601 emails, where each email is represented by a 57-dimensional feature vector. We divided the dataset into three disjoint subsets: $D^{\prime}$, used to learn the logistic regression (tuning the weight vectors with thresholds set to 0.5) as well as the other models to which we compare; $D_{\text{train}}$, used in Algorithm 3 to find the optimal defense strategy; and $D_{\text{test}}$, used to test the performance of the defense strategy. The sizes of $D^{\prime}$, $D_{\text{train}}$, and $D_{\text{test}}$ are 3681, 460, and 460, respectively. They are all randomly sampled from $D$.

Our experiments were conducted on two synthetic networks with 64 nodes: Barabasi-Albert preferential attachment networks (BA) barabasi1999emergence and Watts-Strogatz networks (Small-World) watts1998collective . BA is characterized by its power-law degree distribution, where connectivity is heavily skewed towards high-degree nodes. The power-law degree distribution, $P(k)\sim k^{-r}$, gives the probability that a randomly selected node has $k$ neighbors. The degree distributions of many real-world social networks have previously been shown to be reasonably approximated by the power-law distribution with $r\in[2.1,2.4]$ barabasi2002evolution . Our experiments for BA were conducted across two sets of parameters: $r=2.1$ and $r=2.3$.

The Small-World topology is well known for balancing shortest-path distance between pairs of nodes and local clustering in a way that qualitatively resembles real networks ugander2011anatomy . In our experiments we consider two kinds of Small-World networks. The first has an average shortest-path length of 5.9 and a local clustering coefficient of 0.144; in this case the local clustering coefficient is close to what has been observed in large-scale Facebook friendship networks ugander2011anatomy . The second has an average shortest-path length of 5 and a local clustering coefficient of 0.08, where the local clustering coefficient is close to that of the electric power grid of the western United States watts1998collective .

Our node-level detectors use logistic regression, with our algorithm producing the threshold for each of these. The trade-off parameter $\alpha$ was set to 0.5 and the time window $T$ was set to 1. We applied standard pre-processing techniques to transform each feature to lie between zero and one. The attacker's budget is measured by the squared $l_{2}$ norm, and the budget limit $\epsilon$ is varied from 0.001 to 0.01. We compare our strategy with three others based on traditional approaches: Baseline, Re-training, and Personalized-single-threshold; we describe these next.

Baseline: This is the typical approach which simply learns a logistic regression on the training data, sets all thresholds to 0.5, and deploys this model at all nodes.

Re-training: The idea of re-training, common in adversarial classification, is to iteratively augment the original training data with attacked instances, re-training the logistic regression each time, until convergence barreno2006can ; li2016general . The logistic regressions deployed at the nodes are homogeneous, with all thresholds being 0.5.

Personalized-single-threshold: This strategy is only allowed to tune a single agent's threshold. It has access to $D_{\text{train}}$, which includes unattacked emails. The strategy iterates through each node $i$ and finds its optimal threshold, where optimality is measured by the defender's utility as defined in (2) and the expected influence of an instance is estimated by simulating 1000 propagations started from $i$. The strategy then picks the node with the largest utility and sets that node's threshold to its optimal value.
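A minimal sketch of this baseline follows; utility is a hypothetical helper encapsulating the Monte Carlo estimate of (2), and the threshold grid is our assumption:

```python
import numpy as np

def personalized_single_threshold(nodes, utility, grid=np.linspace(0.0, 1.0, 101)):
    """Choose the single node and threshold maximizing the defender's utility (a sketch).

    utility(i, theta) is a hypothetical helper estimating the utility (2) when only
    node i's threshold is set to theta (all others remain at 0.5); internally it would
    average over 1000 simulated propagations started from i.
    """
    _, best_node, best_theta = max((utility(i, t), i, t) for i in nodes for t in grid)
    return best_node, best_theta
```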

As stated earlier, network topologies and the parameter vectors associated with edges are assumed to be known by both the defender and the attacker. The attacker has full knowledge of the defense strategy, including the weight vectors of the logistic regressions as well as their thresholds. As in the definition of a Stackelberg game, the evaluation procedure lets the defender first choose its strategy $\Theta^{\ast}$; the attacker then computes its best response, choosing both the initial node $s$ for the attack and the transformation of the malicious content $z$ aimed at evading the classifiers. Finally, the defender's utility is calculated by (2), where the expected influence is estimated by simulating 1000 propagations originating from $s$ for each malicious instance $z$.
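The evaluation protocol can be summarized by the sketch below; attacker_best_response and defender_utility are hypothetical helpers standing in for the attacker's optimization and the Monte Carlo estimate of (2):

```python
def evaluate_defense(theta_star, attacker_best_response, defender_utility):
    """Stackelberg evaluation protocol (a sketch with hypothetical helpers).

    The defender commits to theta_star; the attacker, knowing the deployed detectors,
    best-responds with a start node s and an evasion-modified instance z; the defender's
    utility (2) is then estimated by simulating 1000 propagations from s for each z.
    """
    s, z = attacker_best_response(theta_star)       # attacker's best response (s, z)
    return defender_utility(theta_star, s, z)       # Monte Carlo estimate of (2)
```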

The experimental results for BA ($r=2.1$) and Small-World (average shortest path length $5.9$, local clustering coefficient $0.144$) are shown in Figure 4, and the experimental results for BA ($r=2.3$) and Small-World (average shortest path length $5$, local clustering coefficient $0.08$) are shown in Figure 5.

Figure 4. The performance of each defense strategy. Each bar is averaged over 10 random topologies. Left: BA. Right: Small-World.
Figure 5. The performance of each defense strategy. Each bar is averaged over 10 random topologies. Left: BA. Right: Small-World.

As we can observe from the experiments, our algorithm outperforms all of the alternatives in nearly every instance; the sole exception is when the attacker's budget is $0.001$, which effectively eliminates the adversarial component from learning. For larger budgets, our algorithm is remarkably robust even as the other algorithms perform quite poorly, so that when $\epsilon=0.01$ there is a rather dramatic gap between our approach and all alternatives. Not surprisingly, the most dramatic differences are observed on the BA topology: with the large variance in the degree distribution across nodes, our heterogeneous detection is particularly valuable in this setting. In contrast, the degradation of the other methods on Small-World topologies is not quite as dramatic, although the improvement offered by the proposed approach is still quite pronounced. Among the alternatives, it is also revealing that personalizing thresholds results in second-best performance: again, taking account of network topology is crucial. Somewhat surprisingly, it often outperforms re-training, which explicitly accounts for adversarial evasion but not network topology.

5. Conclusion

We address the problem of adversarial detection of malicious content spreading through social networks. Traditional approaches use either a homogeneous detector or a personalized filtering approach. Both ignore (and thus fail to exploit knowledge of) the network topology, and most filtering approaches in prior literature ignore the presence of an adversary. We present a combination of modeling and algorithmic advances to systematically address this problem. On the modeling side, we extend diffusion modeling to allow for dependence on the content propagating through the network, model the attacker as choosing both the malicious content and the initial target on the social network, and allow the defender to choose heterogeneous detectors over the network to block malicious content while allowing benign diffusion. On the algorithmic side, we solve the resulting Stackelberg game by first representing it as a bilevel program, then collapsing this program into a single-level program by exploiting the problem structure and applying KKT conditions, and finally deriving a projected gradient descent algorithm using explicit and implicit gradient information. Our experiments show that our approach dramatically outperforms homogeneous classification, adversarial learning, and heterogeneous but non-adversarial alternatives.

6. Acknowledgements

This work was supported, in part, by the National Science Foundation (CNS-1640624, IIS-1649972, and IIS-1526860), the Office of Naval Research (N00014-15-1-2621), and the Army Research Office (W911NF-16-1-0069).

References

  • [1] H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Technical report, National Bureau of Economic Research, 2017.
  • [2] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
  • [3] A.-L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3):590–614, 2002.
  • [4] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, computer and communications security, pages 16–25. ACM, 2006.
  • [5] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [6] M. Brückner and T. Scheffer. Stackelberg games for adversarial prediction problems. In Proceedings of the 17th ACM SIGKDD, pages 547–555. ACM, 2011.
  • [7] C. Budak, D. Agrawal, and A. El Abbadi. Limiting the spread of misinformation in social networks. In Proceedings of the 20th international conference on World wide web, pages 665–674. ACM, 2011.
  • [8] C. J. Burges. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2):121–167, 1998.
  • [9] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec. Antisocial behavior in online discussion communities. In International Conference on Weblogs and Social Media, pages 61–70, 2015.
  • [10] B. Colson, P. Marcotte, and G. Savard. An overview of bilevel optimization. Annals of operations research, 153(1):235–256, 2007.
  • [11] G. V. Cormack et al. Email spam filtering: A systematic review. Foundations and Trends® in Information Retrieval, 1(4):335–455, 2008.
  • [12] N. Dalvi, P. Domingos, S. Sanghai, D. Verma, et al. Adversarial classification. In Proceedings of the tenth ACM SIGKDD, pages 99–108. ACM, 2004.
  • [13] N. Du, L. Song, M. G. Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in neural information processing systems, pages 3147–3155, 2013.
  • [14] N. Du, L. Song, H. Woo, and H. Zha. Uncover topic-sensitive information diffusion networks. In Artificial Intelligence and Statistics, pages 229–237, 2013.
  • [15] N. Du, L. Song, M. Yuan, and A. J. Smola. Learning networks of heterogeneous influence. In Advances in Neural Information Processing Systems, pages 2780–2788, 2012.
  • [16] M. Gomez-Rodriguez and B. Schölkopf. Influence maximization in continuous time diffusion networks. In Proceedings of the 29th International Coference on International Conference on Machine Learning, pages 579–586. Omnipress, 2012.
  • [17] J. Holcomb, J. Gottfried, and A. Mitchell. News use across social media platforms. Pew Research Journalism Project, 2013.
  • [18] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD, pages 137–146. ACM, 2003.
  • [19] A. Laszka, Y. Vorobeychik, and X. Koutsoukos. Optimal personalized filtering against spear-phishing attacks. In AAAI Conference on Artificial Intelligence, 2015.
  • [20] B. Li and Y. Vorobeychik. Feature cross-substitution in adversarial classification. In Advances in neural information processing systems, pages 2087–2095, 2014.
  • [21] B. Li, Y. Vorobeychik, and X. Chen. A general retraining framework for scalable adversarial classification. arXiv preprint arXiv:1604.02606, 2016.
  • [22] M. Lichman. UCI machine learning repository, 2013.
  • [23] D. Lowd and C. Meek. Adversarial learning. In Proceedings of the eleventh ACM SIGKDD, pages 641–647. ACM, 2005.
  • [24] S. Mei and X. Zhu. The security of latent dirichlet allocation. In Artificial Intelligence and Statistics, pages 681–689, 2015.
  • [25] S. Mei and X. Zhu. Using machine teaching to identify optimal training-set attacks on machine learners. In AAAI Conference on Artificial Intelligence, pages 2871–2877, 2015.
  • [26] J. Tsai, T. H. Nguyen, and M. Tambe. Security games for controlling contagion. In AAAI Conference on Artificial Intelligence, 2012.
  • [27] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The anatomy of the facebook social graph. arXiv preprint arXiv:1111.4503, 2011.
  • [28] Y. Vorobeychik and J. Letchford. Securing interdependent assets. Autonomous Agents and Multi-Agent Systems, 29(2):305–333, 2015.
  • [29] J. Wallinga and P. Teunis. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160(6):509–516, 2004.
  • [30] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, 1998.
  • [31] V. A. Zorich and R. Cooke. Mathematical Analysis I. 2004.