Proc. of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), B. An, N. Yorke-Smith, A. El Fallah Seghrouchni, G. Sukthankar (eds.), May 2020, Auckland, New Zealand.
Green Security Game with Community Engagement
Abstract.
While game-theoretic models and algorithms have been developed to combat illegal activities, such as poaching and over-fishing, in green security domains, none of the existing work considers the crucial aspect of community engagement: community members are recruited by law enforcement as informants and can provide valuable tips, e.g., the location of ongoing illegal activities, to assist patrols. We fill this gap and (i) introduce a novel two-stage security game model for community engagement, with a bipartite graph representing the informant-attacker social network and a level-$\kappa$ response model for attackers inspired by cognitive hierarchy; (ii) provide complexity results and exact, approximate, and heuristic algorithms for selecting informants and allocating patrollers against level-$\kappa$ ($\kappa<\infty$) attackers; (iii) provide a novel algorithm to find the optimal defender strategy against level-$\infty$ attackers, which converts the problem of optimizing a parameterized fixed point to a bi-level optimization problem, where the inner level is just a linear program, and the outer level has only a linear number of variables and a single linear constraint. We also evaluate the algorithms through extensive experiments.
Key words and phrases:
Security Game; Computational Sustainability; Community Engagement1. Introduction
Despite the significance of protecting natural resources to environmental sustainability, a common lack of funding leads to an extremely low density of law enforcement units (referred to as defenders) to combat illegal activities such as wildlife poaching and overfishing (referred to as attacks). Due to insufficient sanctions, attackers are able to launch frequent attacks Le Gallic and Cox (2006); Leader-Williams and Milner-Gulland (1993), making it even more challenging to effectively detect and deter criminal activities through patrolling. To improve patrol efficiency, law enforcement agencies often recruit informants from local communities and plan defensive resources based on the tips they provide Linkie et al. (2015). Since attackers are often from the same local community and their activities can be observed by informants through social interactions, such tips contain detailed information about ongoing or upcoming criminal activities and, if known to defenders, can directly guide the allocation of defensive resources. In fact, community engagement is listed by the World Wide Fund for Nature as one of the six pillars towards zero poaching WWF (2015). The importance of community engagement goes beyond green security domains concerned with environmental conservation and extends to domains such as fighting urban crime Tublitz and Lawrence (2014); Gill et al. (2014).
Previous research in computational game theory has led to models and algorithms that help defenders allocate limited resources in the presence of attackers, with applications to enforcing traffic Rosenfeld and Kraus (2017), combating oil siphoning Wang et al. (2018), and deceiving cyber adversaries Schlenker et al. (2018), in addition to protecting critical infrastructure Pita et al. (2008) and combating wildlife crime Fang et al. (2017). However, none of this work has considered the essential element of community engagement.
Community engagement leads to fundamentally new challenges that do not exist in the previous literature. First, the defender not only needs to determine how to patrol but also needs to decide whom to recruit as informants. Second, there can be multiple attackers, and the existence of informants makes the success or failure of their attacks interdependent, since a tip about one attacker's actions can change the defender's patrol against the others. Third, because of the combinatorial nature of the tips, representing the defender's strategy requires exponential space, making the problem of finding the optimal defender strategy extremely challenging. Fourth, attackers may notice the patrol pattern over time and adapt their strategies accordingly.
In this paper, we provide the first study to fill this gap with a novel two-stage security game model for community engagement, which represents the social network between potential informants and attackers as a bipartite graph. In the first stage of the game, the defender recruits a set of informants under a budget constraint, and in the second stage, the defender chooses a set of targets to protect based on tips from the recruited informants. Inspired by the quantal cognitive hierarchy model Wright and Leyton-Brown (2014), we use a level-$\kappa$ response model for attackers, taking into account the fact that attackers can perform iterative reasoning and that the attackers' strategies impact the actual marginal strategy of the defender.
Our second contribution includes complexity results and algorithms for computing the optimal defender strategy against level-$\kappa$ ($\kappa<\infty$) attackers. We show that the problem of selecting the optimal set of informants is NP-hard. Further, based on sampling techniques, we develop an approximation algorithm to compute the optimal patrol strategy and a heuristic algorithm to find the optimal set of informants to recruit. For expository purposes, we mainly describe the algorithms for level-0 attackers and provide the extension to level-$\kappa$ ($0<\kappa<\infty$) attackers in Appendix M.
The third contribution is a novel algorithm to find the optimal defender strategy against level-$\infty$ attackers, which is an extremely challenging task: an attacker's strategy may affect the defender's marginal strategy, which in turn affects the attackers' strategies, and the level-$\infty$ attacker is defined through a fixed-point argument; as a result, the defender's utility relies crucially on solving a parameterized fixed-point problem. A naive mathematical-programming-based formulation is prohibitively large to solve. We instead reduce the program to a bi-level optimization problem, where both levels become more tractable. In particular, the inner-level optimization is a linear program, and the outer-level optimization has only a linear number of variables and a single linear constraint.
Finally, we conduct extensive experiments. We compare the running time and solution quality of the different algorithms. We show that our bi-level optimization algorithm achieves better performance than the algorithm adapted from previous works. We also compare the case with level-0 attackers and the case with insider threat (i.e., the attackers are aware of the informants), where we formulate the problem as a mathematical program and solve it by adapting an algorithm from previous works. We show that the defender suffers a utility loss if the insider threat is not taken into consideration and the defender assumes a naive attacker model (level-0).
2. Related Work and Background
Community engagement is studied in criminology. Smith and Humphreys (2015); Moreto (2015); Duffy et al. (2015) investigate the role of community engagement in wildlife conservation. Linkie et al. (2015); Gill et al. (2014) show the positive effects of community-oriented strategies. However, they lack a mathematical model for strategic defender-attacker interactions.
Recruitment of informants has also been proposed to study societal attitudes in relation to crimes using evolutionary game theory models. Short et al. (2013) formulate the problem of solving recruitment strategies as an optimal control problem to account for limited resources and budget. In contrast to their work, we emphasize the synergy of community engagement and allocation of defensive resources and aim to find the best strategy of recruiting informants and allocating defensive resources.
In security domains, Stackelberg Security Game (SSG) has been applied to a variety of security problems Tambe (2011), with variants accounting for alarm systems, surveillance cameras, and drones that can provide information in real time Basilico et al. (2017); Ma et al. (2018); Guo et al. (2017). Unlike the sensors that provide location-based information as studied in previous works, the kind of tips the informants can provide depends on their social connections, an essential feature about community engagement.
Other than the full rationality model, boundedly rational behavioral models such as quantal response (QR) McKelvey and Palfrey (1995); Yang et al. (2012) and subjective utility quantal response Nguyen et al. (2013) have been explored in the study of SSG. Our model and solution approach are compatible with most existing behavioral models in the SSG literature, but for an expository purpose, we only focus on the QR model.
3. Model
In this section, we introduce our novel two-stage green security game with community engagement. The key addition is the consideration of informants from local communities. They can be recruited and trained by the defender to provide tips about ongoing or upcoming attacks.
Following existing works on SSGs Jain et al. (2010); Korzhyk et al. (2011), we consider a game with a set of targets $T$ with $|T|=n$. The defender has $r$ units of defensive resources, each of which can protect or cover one target, with no scheduling constraint. An attacker can choose a target to attack. If target $t$ is attacked, the defender (attacker) receives $U^{d,c}_t$ ($U^{a,c}_t$) if the target is covered, and otherwise receives $U^{d,u}_t$ ($U^{a,u}_t$).
Informants recruited by the defender can provide tips regarding the exact targets in ongoing or upcoming attacks, but tip frequency and usefulness may vary due to heterogeneity in the informants' social connections. We model the interactions and connections between potential informants (i.e., members of the community who are known to be non-attackers and can be recruited by the defender) and potential attackers using a bipartite graph $G=(U,V,E)$, where $U$ is the set of potential informants and $V$ is the set of potential attackers. Here we assume the defender has access to a list of potential attackers, which could be provided by the conservation site manager, since the deployment of our work relies on the manager's domain knowledge, experience, and understanding of the social connections among community members.
When an attacker decides to launch an attack, an informant who has interacted with the attacker previously may know his target location. Formally, for each $v\in V$, we assume that $v$ will attack a target with probability $p_v$, but the target is unknown without informants, and each attacker acts independently. An edge $(u,v)\in E$ is associated with an information sharing intensity $w_{uv}$, representing the probability that the attack activities of attacker $v$ are reported by $u$, given that $v$ attacks and $u$ is recruited as an informant.
In the first stage, the defender recruits at most $k$ informants, and in the second stage, the defender receives tips from the informants and allocates $r$ units of defensive resources. The defender's goal is to maximize her expected utility, defined as the summation of the utilities for each attack.
Let $A\subseteq U$ denote the set of informants recruited in the first stage, where $|A|\le k$, and let $V_A\subseteq V$ denote the set of attackers connected to at least one informant in $A$. We represent tips as a vector of disjoint subsets of attackers $\theta=(\theta_1,\dots,\theta_n)$, where $\theta_t$ is the set of attackers who are reported to attack target $t$, such that $\theta_t\cap\theta_{t'}=\emptyset$ for any $t\neq t'$. An attacker is reported if there exists $t$ such that he is in $\theta_t$; otherwise he is unreported. We also denote by $V_\theta=\bigcup_t\theta_t$ the set of reported attackers. It is possible that $V_\theta=\emptyset$, and we say the defender is informed if $V_\theta\neq\emptyset$. Note that $\theta$ is a compact representation of the tips received by the defender, as it neglects the identities of the informants, which are not crucial to the defender's decision making given that all tips are assumed to be correct.
In practice, tips are infrequent and the defender is often very protective of the informants. Thus, the attackers are often not aware of the existence of informants unless there is a significant insider threat. In addition, patrols can be divided into two categories: routine patrols and ambush patrols, where the latter are in response to tips from informants. Ambush patrols are costly, often requiring rangers to lie in wait for many hours for the possibility of catching a poacher. If not informed, the defender follows her routine patrol strategy $x=(x_1,\dots,x_n)$, with $x_t$ denoting the probability that target $t$ is covered. Naturally, under this assumption the defender should use a routine strategy that is optimal against the QR model, which can be computed following Yang et al. (2012). If informed, she uses a different strategy $x^\theta$ based on the tip $\theta$. Assume that each attacker, if deciding to attack a target, responds to the defender's strategy following a known behavioral model, the QR model. We define $q(\hat{x})=(q_1(\hat{x}),\dots,q_n(\hat{x}))$, where $q_t(\hat{x})$ is the probability of attacking target $t$, defined by
$$q_t(\hat{x})=\frac{e^{\lambda\left(\hat{x}_t U^{a,c}_t+(1-\hat{x}_t)U^{a,u}_t\right)}}{\sum_{t'}e^{\lambda\left(\hat{x}_{t'} U^{a,c}_{t'}+(1-\hat{x}_{t'})U^{a,u}_{t'}\right)}}\qquad(1)$$
and $\hat{x}$ is the attacker's subjective belief about the coverage probabilities. In the above equation, $\lambda$ is the precision parameter McKelvey and Palfrey (1995), fixed throughout the paper. We discuss the relaxation of some of the assumptions mentioned above in Section 8.
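To make Equation (1) concrete, here is a minimal sketch of the QR attack distribution; the payoff arrays and the value of $\lambda$ below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def quantal_response(x_hat, ua_cov, ua_unc, lam=2.0):
    """Attack distribution under the QR model (Equation (1)).

    x_hat  : attacker's believed coverage probability per target
    ua_cov : attacker's utility per target if it is covered
    ua_unc : attacker's utility per target if it is uncovered
    lam    : QR precision parameter (lambda)
    """
    expected = x_hat * ua_cov + (1.0 - x_hat) * ua_unc  # attacker's expected utility
    logits = lam * expected
    logits -= logits.max()          # stabilize the softmax numerically
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy example: 3 targets with illustrative beliefs and payoffs.
x_hat = np.array([0.5, 0.3, 0.2])
ua_cov = np.array([-1.0, -2.0, -1.5])   # penalties when caught
ua_unc = np.array([3.0, 4.0, 2.0])      # rewards when not caught
print(quantal_response(x_hat, ua_cov, ua_unc))
```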
3.1. Level-$\kappa$ Response Model
Motivated by the costly ambush patrols and inspired by cognitive hierarchy theory, we propose the level-$\kappa$ response model as the attackers' behavioral model.
When the informants' report intensities are negligible, the attackers almost always face the routine patrol $x$. But when the report intensities are not negligible, the attackers' behavior will change the marginal probability that a target is covered. Thus we assume that level-0 attackers simply play the quantal response against the routine patrol $x$: $q^{(0)}=q(x)$. The defender will then sometimes be informed with different tips $\theta$ and respond with $x^\theta$ accordingly. Over time, the attackers will learn about the change in the frequency with which each target is covered. We denote the induced defender's marginal strategy at level 0 by $\bar{x}^{(0)}$. After observing $\bar{x}^{(0)}$, level-1 attackers will update their strategies from $q^{(0)}$ to $q^{(1)}=q(\bar{x}^{(0)})$. Similarly, attackers at level $\kappa$ ($\kappa\ge 1$) use the quantal response against the defender's marginal strategy at level $\kappa-1$, i.e., $q^{(\kappa)}=q(\bar{x}^{(\kappa-1)})$, where $\bar{x}^{(\kappa-1)}$ is the marginal coverage induced by $q^{(\kappa-1)}$. In Section 5, we also define level-$\infty$ attackers.
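To illustrate how one level of the model updates the attacker's strategy, the sketch below handles the single-attacker case: it forms the induced marginal coverage and takes the quantal response to it. The tip model (one tip per target, received with probability `pi` given an attack) and all names are our simplifying assumptions.

```python
import numpy as np

def qr(x_bar, ua_cov, ua_unc, lam=2.0):
    """Quantal response (Equation (1)) against marginal coverage x_bar."""
    z = lam * (x_bar * ua_cov + (1.0 - x_bar) * ua_unc)
    w = np.exp(z - z.max())
    return w / w.sum()

def one_level_update(q, x, x_tip, pi, ua_cov, ua_unc, lam=2.0):
    """One step of the level-kappa model (single attacker).

    q     : attacker's strategy at the current level, shape (n,)
    x     : routine patrol strategy, shape (n,)
    x_tip : x_tip[t'] is the patrol used after a tip on target t', shape (n, n)
    pi    : probability a tip is received, given an attack
    """
    # No tip with prob. (1 - pi): routine patrol x is used; tip theta_{t'}
    # arrives with prob. pi * q[t']: the response x_tip[t'] is used instead.
    x_bar = (1.0 - pi) * x + pi * (q @ x_tip)   # induced marginal coverage
    return qr(x_bar, ua_cov, ua_unc, lam)

# q0 = qr(x, ...) is the level-0 strategy; q1 = one_level_update(q0, ...) is level-1.
```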
Denote by $EU(A)$ the defender's optimal utility when she recruits the set of informants $A$ and uses the optimal defending strategy. The key questions raised by this model are (i) how to recruit a set of at most $k$ informants and (ii) how to respond to the provided tips so as to maximize the expected utility $EU(A)$.
4. Defending against Level-0 Attackers
In this section, we tackle the case where all attackers are level-0, providing complexity results and algorithms to find the optimal set of informants. Designing efficient algorithms for this computationally hard problem is particularly challenging due to the combinatorial nature of the tips and the exponentially many possible informant selections. Furthermore, in the general case, attackers are heterogeneous and we do not know which attackers will be reported, making it hard to compute $EU(A)$.
4.1. Complexity Results
Before presenting our complexity results, we first define some useful notation. For an attacker $v$, let $r_v(A)=1-\prod_{u\in A:(u,v)\in E}(1-w_{uv})$ be the probability of $v$ being reported given that he attacks and the defender recruits $A$. Given the set of informants $A$ and the tips $\theta$, an unreported attacker $v\notin V_\theta$ attacks a target with probability $\frac{p_v(1-r_v(A))}{1-p_v r_v(A)}$ by Bayes' rule. Given $\theta$ and the reported attacks $|\theta_t|$ on each target $t$, we can compute the expected utility on $t$ if $t$ is covered, $EU^c_t(\theta)$, and the expected utility if it is uncovered, $EU^u_t(\theta)$, by aggregating the reported attacks and the attack probabilities of unreported attackers. Then, the expected gain of target $t$ if covered can be written as $\Delta_t(\theta)=EU^c_t(\theta)-EU^u_t(\theta)$.
Theorem 4.1. When the defender is informed by the recruited informants $A$, the optimal allocation of defensive resources given the tips $\theta$ can be determined in $O(|V|+n\log n)$ time. Given tips from the recruited informants, the defender can find the optimal resource allocation by greedily protecting the $r$ targets with the highest expected gains. The proof of Theorem 4.1 is deferred to Appendix B. However, the problem of computing the optimal set of informants is still hard: selecting the optimal set of at most $k$ informants is NP-hard (Theorem 4.2, proved in Appendix C).
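A minimal sketch of the greedy allocation behind Theorem 4.1, assuming the per-target expected attack weights implied by the tips have already been computed; the aggregation into `attack_weight` is a simplified stand-in for the exact expressions above.

```python
import numpy as np

def greedy_allocation(attack_weight, ud_cov, ud_unc, r):
    """Cover the r targets with the highest expected gain (Theorem 4.1).

    attack_weight : expected number of attacks on each target given the tips
    ud_cov        : defender utility per attack on a covered target
    ud_unc        : defender utility per attack on an uncovered target
    r             : number of defensive resources
    """
    gain = attack_weight * (ud_cov - ud_unc)   # expected gain of covering each target
    covered = np.argsort(gain)[::-1][:r]       # r targets with the highest gains
    utility = attack_weight @ ud_unc + gain[covered].sum()
    return set(covered.tolist()), utility

# Example: 4 targets, 2 resources, illustrative weights and payoffs.
cov_set, util = greedy_allocation(np.array([1.5, 0.2, 0.9, 0.4]),
                                  np.array([1.0, 1.0, 1.0, 1.0]),
                                  np.array([-2.0, -1.0, -3.0, -1.0]), r=2)
print(cov_set, util)
```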
4.2. Finding the Optimal Set of Informants
In this subsection, we develop exact and heuristic informant selection algorithms to compute the optimal set of informants. To find the set $A$ that maximizes $EU(A)$, we first focus on computing $EU(A)$, providing a dynamic-programming-based algorithm and approximation algorithms.
4.2.1. Calculating $EU(A)$
Let $u^*$ be the expected utility when using the optimal routine defending strategy against a single attack, which can be obtained by the algorithms introduced in Yang et al. (2012). Then $EU(A)$ can be explicitly written as the expectation, over all possible tips $\theta$, of the defender's optimal utility given $\theta$, where each uninformed attack contributes $u^*$.
Directly computing $EU(A)$ from the above expression is intractable due to the exponential number of tip combinations. However, it is possible to avoid a significant amount of enumeration by organizing the calculation carefully. We first develop an Enumeration and Dynamic Programming-based Algorithm (EDPA) to compute the exact $EU(A)$, as shown in Algorithm 1.
First, we compute the utility when the defender is not informed (lines 4-6). Then, we focus on calculating the total utility when the defender is informed. By the linearity of expectation, this utility can be computed as the summation of the expected utilities obtained from all targets, so we focus on the expected utility of a single target $t$. For each target $t$, Algorithm 1 enumerates all possible types of tips (lines 2-7). We denote each type of tip by a tuple $(S,c)$, which encodes the set $S$ of reported attackers and the number $c$ of reported attackers targeting location $t$. The probability of receiving a tip of type $(S,c)$ is the product of the probability of $S$ being the set of reported attackers and the probability that exactly $c$ of them target $t$, where
$$\Pr[S\mid A]=\prod_{v\in S}p_v r_v(A)\prod_{v\in V_A\setminus S}\bigl(1-p_v r_v(A)\bigr)\qquad(2)$$
is the probability of $S$ being the set of reported attackers given $A$ (line 3). Let $h_t(S,c)$ be the probability of $t$ being among the $r$ targets with the highest expected gain given $(S,c)$ (lines 12-13). For a given tip type $(S,c)$, the expected contribution of target $t$ to the total utility is its expected covered utility weighted by $h_t(S,c)$ plus its expected uncovered utility weighted by $1-h_t(S,c)$. We can then compute $EU(A)$ by summing over all possible $(S,c)$.
The calculation of $h_t(S,c)$ is all that remains. This can be done efficiently via Algorithm 2, a dynamic-programming-based calculation. Let $T_{-t}$ denote the set of targets apart from $t$ (line 1), and let $f(j,l,m)$ be the probability of having $l$ reported attacks among the first $j$ targets of $T_{-t}$, with $m$ of those targets having expected gain higher than $\Delta_t$, given a tip of type $(S,c)$. Then $h_t(S,c)$ can be neatly written as a sum of entries of $f$, which can be calculated using dynamic programming (lines 5-11). Computing $f$ amounts to counting the ways the remaining reported attacks can be partitioned among the targets, where we also account for the constraint introduced by the limited number of resources. To calculate $f(j,l,m)$, we enumerate the number of reported attacks on the $j$-th target (line 6) and compare the resulting gain with $\Delta_t$ (line 8): if it is higher, we update from the entries with $m-1$ (line 9), and otherwise from the entries with $m$ (line 11). The time complexity of Algorithm 2 is polynomial in $n$, $r$, and $|V_A|$, while Algorithm 1 is exponential in $|V_A|$ because it enumerates the subsets $S$.
Since EDPA runs in exponential time, we introduce approximation methods to estimate $EU(A)$. Let $\widetilde{EU}_C(A)$ be the estimated defender's utility returned by Algorithm 1 if only subsets of reported attackers $S$ with $|S|\le C$ are enumerated in line 2. We denote by C-Truncated this approach of estimating $EU(A)$. Next, we show that $\widetilde{EU}_C(A)$ is close to the exact $EU(A)$ when it is unlikely that many attacks happen at the same time. Formally, assume that the expected number of attacks is bounded by a constant $c_0$, that is, $\sum_{v\in V}p_v\le c_0$; then $\widetilde{EU}_C(A)$ is an estimation of $EU(A)$ with bounded error.
Lemma 4.2.1.
Assume that $\sum_{v\in V}p_v\le c_0$ and $C\ge c_0$. Then the estimation error $|\widetilde{EU}_C(A)-EU(A)|$ is at most the probability of having at least $C$ attacks times the maximum utility at stake, which decays exponentially in $C$ by a Chernoff bound.
The proof of Lemma 4.2.1 is deferred to Appendix G. The time complexity of C-Truncated is polynomial in $n$ and $|V|$ for constant $C$.
However, when $c_0$ is large, we have to set $C$ to be large for C-Truncated to obtain a high-quality solution; otherwise the error becomes unbounded. To mitigate this limitation, we also propose an alternative sampling approach, T-Sampling, to estimate $EU(A)$ in general cases without restrictions on $\sum_v p_v$. Instead of enumerating all possible $S$ as EDPA does, T-Sampling draws i.i.d. samples of the set of reported attackers, where each sample $S$ is drawn with probability $\Pr[S\mid A]$. T-Sampling takes the average, over all samples, of the expected defender's utility when $S$ is the set of reported attackers as the estimate of $EU(A)$. We can sample $S$ as follows: (i) let $S=\emptyset$ initially; (ii) for each $v\in V_A$, add $v$ to $S$ with probability $p_v r_v(A)$; (iii) return $S$ as a sample of the set of reported attackers. By Equation (2), this sampling process is consistent with the distribution of the reported set. T-Sampling returns an estimate of $EU(A)$ in time polynomial in the number of samples, $n$, and $|V|$.
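A sketch of the T-Sampling sampler described above; `utility_given_reported` is a placeholder for the per-sample utility computation (e.g., built on the greedy allocation of Theorem 4.1), and the graph encoding is our assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def report_prob(v, A, w, p):
    """Probability that attacker v is reported: he attacks with prob. p[v] and
    at least one recruited informant connected to v reports (intensities w[u][v])."""
    miss = 1.0
    for u in A:
        if v in w[u]:
            miss *= 1.0 - w[u][v]
    return p[v] * (1.0 - miss)

def sample_reported_set(A, w, p):
    """One i.i.d. sample of the reported-attacker set, consistent with Eq. (2)."""
    return {v for v in p if rng.random() < report_prob(v, A, w, p)}

def t_sampling(A, w, p, utility_given_reported, n_samples=100):
    """Estimate EU(A) by averaging the defender's utility over sampled tips."""
    total = 0.0
    for _ in range(n_samples):
        total += utility_given_reported(sample_reported_set(A, w, p))
    return total / n_samples
```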
Proposition.
Let $\widetilde{EU}_N(A)$ be the estimate of $EU(A)$ given by T-Sampling using $N$ samples. Then $\widetilde{EU}_N(A)$ concentrates around $EU(A)$, with an error bound that shrinks as $N$ grows, by a standard Hoeffding-type argument.
4.2.2. Selecting Informants
Given the algorithms for computing $EU(A)$, a straightforward way of selecting informants is to enumerate all subsets $A\subseteq U$ with $|A|\le k$ (denoted as Select).
When using C-Truncated as a subroutine to compute $EU(A)$, the solution quality of the selected set of informants is guaranteed by the following theorem.
Assume that $\sum_{v\in V}p_v\le c_0$ and $C\ge c_0$. Let $A^*$ and $\widetilde{A}$ be the optimal set of informants and the one chosen using C-Truncated, respectively. Then the utility loss $EU(A^*)-EU(\widetilde{A})$ is bounded by twice the estimation error of Lemma 4.2.1.
Proposition.
Using T-Sampling to estimate $EU(A)$, the optimal set of informants is found with high probability when the number of samples $N$ is sufficiently large.
Based on existing results in submodular optimization Nemhauser et al. (1978), one may expect a greedy algorithm that step by step adds the informant leading to the largest utility to work well. However, the set function $EU(\cdot)$ in our problem violates submodularity (see Appendix F), and such a greedy algorithm does not guarantee a $(1-1/e)$-approximation. Therefore, we propose GSA (Greedy-based Search Algorithm) for the selection of informants, as shown in Algorithm 3. GSA starts by calling $\mathrm{GSA}(\emptyset)$. While $|A|<k$, it expands the current set of informants $A$ by adding $u_1$ (respectively $u_2$) to $A$ and recursing, where $u_1$ and $u_2$ are the two informants that give the largest marginal gain in $EU$ (lines 4-5); otherwise, it updates the best solution found so far with $A$ (lines 1-3).
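A compact sketch of the branching scheme in GSA; the bookkeeping in the paper's Algorithm 3 may differ in details, and `eu` stands for any of the $EU(A)$ estimators above.

```python
def gsa(A, U, k, eu):
    """Greedy-based Search Algorithm (sketch): branch on the two informants
    with the largest marginal gain instead of committing to one greedily.

    A  : current set of informants (frozenset)
    U  : all potential informants (frozenset)
    k  : budget on the number of informants
    eu : callable estimating EU(.) for a set of informants
    Returns the best set found and its estimated utility.
    """
    best = (A, eu(A))
    if len(A) >= k or not (U - A):
        return best
    # Rank remaining candidates by marginal gain and recurse on the top two.
    ranked = sorted(U - A, key=lambda u: eu(A | {u}), reverse=True)
    for u in ranked[:2]:
        cand = gsa(A | {u}, U, k, eu)
        if cand[1] > best[1]:
            best = cand
    return best

# Usage: best_set, best_val = gsa(frozenset(), frozenset(range(m)), k, eu_estimator)
```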
We identify a tractable case to conclude the section.
Lemma 4.2.2.
Given the set of recruited informants $A$, the defender's expected utility $EU(A)$ can be computed in polynomial time if $w_{uv}=1$ for all $(u,v)\in E$. When $k$ is a constant, the optimal set of informants can be computed in polynomial time.
This represents the case where the informants have strong connections with a particular group of attackers and can get full access to their attack plans. We refer to the property that $w_{uv}=1$ for all $(u,v)\in E$ as SISI (Strong Information Sharing Intensity). Denote by ASISI (Algorithm for SISI) the polynomial-time algorithm of Lemma 4.2.2. We provide more details about the SISI case in Appendices D and E.
We summarize the time complexity of all algorithms for computing $EU(A)$ in Table 1 in the Appendix.
5. Defending Against Level-$\infty$ Attackers
As discussed in Section 3.1, a level-$\kappa$ attacker may keep adapting to the new marginal strategy formed by his current level of behavior. In this section, we first show in Theorem 5.1 that there exists a fixed-point strategy for the attacker in our level-$\kappa$ response model, and then use it to define level-$\infty$ attackers.
We formulate the problem of finding the optimal defender’s strategy for this case as a mathematical program. However, such a program can be too large to solve. We propose a novel technique that reduces the program to a bi-level optimization problem, with both levels much more tractable.
Theorem 5.1. Let $\Delta_n$ be the probability simplex over the targets, and let $\bar{x}(q)$ denote the defender's marginal coverage induced by attacker strategy $q$. Given the defender's strategies $x$ and $\{x^\theta\}$, there exists $q^*\in\Delta_n$ such that $q^*=q(\bar{x}(q^*))$.
Proof.
Since $\Delta_n$ is a compact convex set and $q(\bar{x}(\cdot))$ is a continuous function of the attacker's strategy, by Brouwer's fixed-point theorem, there exists $q^*$ such that $q^*=q(\bar{x}(q^*))$. ∎
According to the definition of level-$\kappa$ attackers, we have $q^{(\kappa)}=q(\bar{x}(q^{(\kappa-1)}))$. Slightly generalizing this definition, we define a level-$\infty$ attacker as follows:
Definition (level-$\infty$ attacker). Given the defender's strategies $x$ and $\{x^\theta\}$, the strategy $q^\infty$ of a level-$\infty$ attacker satisfies $q^\infty=q(\bar{x}(q^\infty))$.
Remark.
Although the level-$\infty$ attacker is defined through a fixed-point argument, we still stick to the Stackelberg assumption: the defender leads and the attacker follows. Notice that in the equation $q^\infty=q(\bar{x}(q^\infty))$, $q^\infty$ is only defined after the defender commits to the strategies $x$ and $\{x^\theta\}$. However, this differs from the standard Strong Stackelberg Equilibrium Korzhyk et al. (2011) in that the attacker follows a level-$\infty$ response model, as defined by the fixed-point equation.
Also, as we will discuss in Section 7.1.3 on our experiments, when $r=n$, the defender's optimal strategy is not to use up all the available resources. This is clearly different from a Nash equilibrium, as the defender still has incentives to use more resources.
5.1. Convergence Condition for the Level-$\kappa$ Response Model
We focus on the single-attacker case, where there are only $n+1$ different types of tips. We use $\theta_t$ to denote the tip in which the attacker is reported to attack target $t$. When the attacker uses strategy $q$, the probability of receiving $\theta_t$ is $\pi q_t$, where $\pi$ is the probability that an attack is reported.
Theorem 5.2. Let $\phi(q)=q(\bar{x}(q))$. In the single-attacker case, if there exists a constant $c<1$ satisfying the condition of Lemma 5.1 below, then level-$\kappa$ agents converge to level-$\infty$ agents as $\kappa$ approaches infinity.
The proof of Theorem 5.2 is omitted since it is immediate from the following lemma:
Lemma 5.1.
In the single-attacker case, if there exists a constant $c<1$ that bounds, for all targets $t$, the sensitivity of $\phi$ to the attacker's strategy (see Equation (8) in Appendix I), then $\phi$ is $c$-Lipschitz with respect to the $\ell_1$-norm, i.e., $\phi$ is a contraction.
Corollary.
In the single-attacker case, if the precision $\lambda$ and the tip probability are small enough that the condition of Lemma 5.1 holds, then level-$\kappa$ agents converge to level-$\infty$ agents as $\kappa$ goes to infinity.
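A minimal fixed-point iteration for approximating the level-$\infty$ strategy, assuming a callable that performs one level update (e.g., the `one_level_update` sketch from Section 3.1 with its arguments bound):

```python
import numpy as np

def level_infinity(q0, step, tol=1e-10, max_iter=10_000):
    """Iterate q <- phi(q) to approximate the level-infinity strategy.

    q0   : initial attacker strategy (e.g., the level-0 strategy)
    step : callable implementing one level update q -> q(x_bar(q))
    Converges whenever phi is a contraction (Lemma 5.1); otherwise the
    iterates may oscillate, as Appendix H illustrates.
    """
    q = q0
    for _ in range(max_iter):
        q_next = step(q)
        if np.abs(q_next - q).sum() < tol:   # l1 distance, matching the lemma
            return q_next, True
        q = q_next
    return q, False   # did not converge (non-contractive case)

# Usage: q_inf, ok = level_infinity(q0, lambda q: one_level_update(
#            q, x, x_tip, pi, ua_cov, ua_unc))
```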
5.2. A Bi-Level Optimization for Solving the Optimal Defender’s Strategy
In this section, we still consider the single-attacker case and assume the defender has $r$ resources. Clearly, the optimal set of informants consists of the $k$ informants with the highest information sharing intensities. It remains to compute the optimal strategies $x$ and $\{x^{\theta_t}\}$. Given the optimal set of informants $A$, the probability that an attack is reported is $\pi$. The probability of receiving tip $\theta_t$ is then $\pi q_t$, which depends on the attacker's strategy $q$. Let $x^{\theta_t}$ be the defender strategy when receiving tip $\theta_t$.
Let $q^\infty$ be the strategy of the level-$\infty$ attacker. Given $q^\infty$ and the corresponding $x^{\theta_t}$'s, the attack falls on target $t$ with probability $q^\infty_t$, and the coverage of target $t$ conditioned on its being attacked is $\pi x^{\theta_t}_t+(1-\pi)x_t$. Therefore, the defender's expected utility is
$$\sum_t q^\infty_t\Bigl[\bigl(\pi x^{\theta_t}_t+(1-\pi)x_t\bigr)U^{d,c}_t+\bigl(1-\pi x^{\theta_t}_t-(1-\pi)x_t\bigr)U^{d,u}_t\Bigr].$$
The problem of finding the optimal defender strategy can then be formulated as a mathematical program that maximizes this expected utility over $x$ and the $x^{\theta_t}$'s, subject to the fixed-point condition $q^\infty=q(\bar{x})$ and the resource constraints.
In the single-attacker case, we need $n$ and $n^2$ variables to represent $x$ and the $x^{\theta_t}$'s, respectively. We can use the QRI-MILP algorithm (an algorithm that computes an approximate optimal defender strategy against a variant of level-0 attackers who take into account the impact of informants when determining the target they attack; see Appendix J for more details) to find the solution. However, this approach needs to solve a mixed integer program and does not scale well.
To tackle the problem, we focus on the defender’s marginal strategy instead of the full strategy representation, and decompose the above program into a bi-level optimization problem.
Let $\bar{x}$ denote the defender's marginal coverage, where we slightly abuse notation and use $\theta_0$ to denote the case of receiving no tip and $x^{\theta_0}$ to denote $x$. The bi-level optimization method works as follows. At the inner level, we fix an arbitrary feasible $\bar{x}$ and solve the following mathematical program:
$$\max_{x,\{x^{\theta_t}\}}\ \sum_t q_t\Bigl[\bigl(\pi x^{\theta_t}_t+(1-\pi)x_t\bigr)U^{d,c}_t+\bigl(1-\pi x^{\theta_t}_t-(1-\pi)x_t\bigr)U^{d,u}_t\Bigr]$$
$$\text{s.t.}\quad (1-\pi)x_t+\pi\sum_{t'}q_{t'}x^{\theta_{t'}}_t=\bar{x}_t\ \ \forall t;\qquad \sum_t x^{\theta}_t\le r\ \ \forall\theta;\qquad x^{\theta}_t\in[0,1]\ \ \forall\theta,t,$$
where $q=q(\bar{x})$.
Since $\bar{x}$ is fixed, $q=q(\bar{x})$ and the tip probabilities are also fixed. Thus, the program above becomes a linear program with $x$ and the $x^{\theta_t}$'s as variables. We can always find a feasible solution by simply setting $x^\theta=\bar{x}$ for all $\theta$, provided $\sum_t\bar{x}_t\le r$. Solving this linear program gives the optimal defender's utility $g(\bar{x})$ for any feasible $\bar{x}$. To find the optimal defender strategy, we solve the outer-level optimization problem below:
$$\max_{\bar{x}\in[0,1]^n}\ g(\bar{x})\qquad\text{s.t.}\quad \sum_t\bar{x}_t\le r.$$
Since the feasible region of $\bar{x}$ is continuous, we can use any standard algorithm (e.g., gradient descent) to solve the outer-level program. The inner-level linear program still suffers from scalability problems when there are multiple attackers. However, in that case the optimal objective value can be well approximated by sampling a subset of the possible tips $\theta$, or by focusing only on the $\theta$'s with the highest probabilities. For those $\theta$'s that are not considered, we can always use $x$ as the default strategy for $x^\theta$.
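A sketch of the bi-level scheme for the single-attacker case, under the tip model used above: the inner level is solved with scipy's `linprog`, and the outer level is illustrated by random search over the marginal $\bar{x}$ subject to the single constraint $\sum_t\bar{x}_t\le r$. The variable layout and the projection step are our assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def inner_lp(x_bar, q, pi, ud_cov, ud_unc, r):
    """Inner level: with x_bar (hence q) fixed, solve the LP over the routine
    strategy x and the tip responses X[t'] (single attacker)."""
    n = len(x_bar)
    m = n + n * n                        # variables: x (n), X flattened (n*n)
    diff = ud_cov - ud_unc
    c = np.zeros(m)                      # objective: coverage gains per target
    c[:n] = q * (1.0 - pi) * diff
    for t in range(n):
        c[n + t * n + t] = q[t] * pi * diff[t]   # coverage X[t, t] when tipped
    A_eq = np.zeros((n, m))              # marginal consistency constraints
    for t in range(n):
        A_eq[t, t] = 1.0 - pi
        for tp in range(n):
            A_eq[t, n + tp * n + t] = pi * q[tp]
    A_ub = np.zeros((1 + n, m))          # resource constraints: x and each X[t']
    A_ub[0, :n] = 1.0
    for tp in range(n):
        A_ub[1 + tp, n + tp * n:n + (tp + 1) * n] = 1.0
    b_ub = np.full(1 + n, float(r))
    res = linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=x_bar,
                  bounds=[(0.0, 1.0)] * m)
    return (q @ ud_unc - res.fun) if res.success else -np.inf

def outer_search(pi, ua_cov, ua_unc, ud_cov, ud_unc, r,
                 n_iter=2000, lam=2.0, seed=0):
    """Outer level: random search over x_bar with sum(x_bar) <= r."""
    rng = np.random.default_rng(seed)
    n = len(ud_cov)
    best_val, best_xbar = -np.inf, None
    for _ in range(n_iter):
        x_bar = rng.uniform(0.0, 1.0, n)
        if x_bar.sum() > r:
            x_bar *= r / x_bar.sum()     # project onto the single constraint
        z = lam * (x_bar * ua_cov + (1.0 - x_bar) * ua_unc)
        w = np.exp(z - z.max())
        q = w / w.sum()                  # attacker's QR response to x_bar
        val = inner_lp(x_bar, q, pi, ud_cov, ud_unc, r)
        if val > best_val:
            best_val, best_xbar = val, x_bar
    return best_val, best_xbar
```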
6. Defending Against Informant-Aware Attackers
We now consider a variant of our model where attackers take into account the impact of informants when determining the target they attack. Specifically, we assume the attackers follow the QR behavioral model but incorporate the probability of being discovered when determining their expected utility for attacking a target. (Consider attackers that have had experience playing against the defender: over time, an attacker may start to consider his expected utility in practice, which is affected by informants.) In this setting, the attackers' subjective belief $\hat{x}$ of the target coverage probabilities does not necessarily satisfy $\sum_t\hat{x}_t\le r$. Consider the example of a single attacker and a single informant with report intensity 1. Assume that the defender has $r=1$ and always protects the reported target. Then no matter which target the attacker chooses to attack, it will always be covered.
We focus on the single-attacker case. We first consider the problem of computing the optimal defender strategy given the set of informants and the associated probability $\pi$ of receiving a tip. In the general case with multiple attackers, we would need to specify the defender strategy for each combination of tips received. However, when there is only one attacker, we can succinctly describe the defender strategy by her default strategy without tips, $x$, and her probability $y_t$ of defending location $t$ after receiving a tip for that location. Under the QR adversary model, the probability of the attacker targeting location $t$ is then the quantal response to the effective coverage $\pi y_t+(1-\pi)x_t$.
This leads to the following optimization problem, QRI, to compute the optimal defender strategy:
maximize the defender's expected utility, subject to (3) the attacker's strategy being the QR response to the effective coverage, (4) the default strategy $x$ using at most $r$ resources, and (5) all coverage probabilities $x_t$, $y_t$ lying in $[0,1]$.
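A minimal sketch of the informant-aware attacker's response under this representation; the blend of $y_t$ and $x_t$ into an effective coverage follows the description above, and all names are our assumptions.

```python
import numpy as np

def informant_aware_qr(x, y, pi, ua_cov, ua_unc, lam=2.0):
    """Single attacker who accounts for informants: the coverage he faces on
    target t blends the tip response y[t] (tip arrives with prob. pi) with
    the default coverage x[t]."""
    x_eff = pi * y + (1.0 - pi) * x           # effective coverage per target
    z = lam * (x_eff * ua_cov + (1.0 - x_eff) * ua_unc)
    w = np.exp(z - z.max())                   # QR / softmax with precision lam
    return w / w.sum()
```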
7. Experiment
In this section, we demonstrate the effectiveness of our proposed algorithms through extensive experiments. In our experiments, all reported results are averaged over 30 randomly generated game instances. See Appendix L for details about generating game instances and parameters. Unless specified otherwise, all game instances are generated in this way.
7.1. Experimental Results
We compare the scalability and solution quality of Select, using EDPA, C-Truncated, or T-Sampling to obtain $EU(A)$, and of GSA, in different problem settings against level-0 attackers.
First, we test the case where the expected number of attacks is bounded. We fix the other parameters and enumerate the number of attackers from 2 to 16. The results are shown in Figure 1(a). We also include Greedy as a baseline that always chooses the informants maximizing the probability of receiving tips. We can see that T-Sampling performs best in terms of runtime but fails to provide high-quality solutions. While C-Truncated is slower than T-Sampling, it performs best, with no error on all test cases. However, when there is no restriction on the expected number of attacks, as shown in Figure 1(b), C-Truncated performs badly, even worse than Greedy on large instances, while T-Sampling performs much better and GSA performs best. We also fix the numbers of attackers and informants and vary the number of targets from 5 to 25. The results are shown in Figure 1(c). GSA is the fastest but provides slightly worse solutions than C-Truncated. The runtime of Greedy is less than 0.3s for all instances tested.
We then perform a case study to show the trade-off between the optimal number of resources to allocate and the optimal number of informants to recruit under a budget constraint when defending against level-0 attackers. We generate a game instance and assign a unit cost to allocating one defensive resource and a unit cost to hiring one informant. Given a budget $B$, the defender can recruit $k$ informants and allocate $r$ resources whenever the total cost is within $B$. The trade-off between the optimal $r$ and $k$ is shown in Figure 1(d). In the same instance, we study how the defender's utility changes when increasing the number of recruited informants with $r$ fixed. Given a fixed number of resources, the defender should recruit as many informants as possible. We can also see that if the defender can acquire sufficient resources, the importance of recruiting additional informants diminishes. This result provides useful guidance to defenders such as conservation agencies in allocating their budget and recruiting informants.
We run additional experiments for the SISI case and conduct a case study showing the estimation errors for all sets of informants on 2 instances. We present the results in Appendix K.
[Figure 1. (a)-(c): runtime and solution quality of Select with EDPA, C-Truncated, and T-Sampling, and of GSA and Greedy; (d): trade-off between resources and informants under a budget; (e): level-0 vs. level-$\infty$ attackers; (f): bi-level algorithm vs. QRI-MILP; (g): level-0 vs. informant-aware attackers.]
7.1.1. Level-0 vs. Level-$\infty$ Attackers
We consider a single attacker, let the bipartite graph be fully connected, and vary the number of informants and defensive resources. We first fix the defender's strategy to the one that is optimal against level-0 attackers and compare the utility achieved by the defender when facing a level-0 attacker versus a level-$\infty$ attacker. We show how the defender utility varies with the number of informants and defensive resources in Figure 1(e). On average, the defender utility against a level-$\infty$ attacker is lower than that against a level-0 attacker. We also show the utility of the defender using her optimal strategy against a level-$\infty$ attacker. We can see that when facing a level-$\infty$ attacker, the defender utility when using the optimal strategy is higher by a clear margin than when using the strategy optimized against level-0 attackers.
7.1.2. Level-0 vs. Informant-Aware Attackers
We again let the bipartite graph be fully connected and vary the number of informants and defensive resources. We assume that the defender recruits the informants with the highest information sharing intensities. The optimal defender strategy for the informant-aware attacker case is found using QRI-MILP. The defender strategy for the level-0 attacker case is computed using PASAQ Yang et al. (2012). The defender utility against the level-0 attacker is found by first computing the optimal responses to each tip and then using them to compute the expected utility.
In Figure 1(g), we show how the defender utility in the two cases varies with the number of informants and defensive resources. On average, we see that the defender utility is marginally higher against the level-0 attacker than against the informant-aware attacker, particularly when the defender has either very few or very many defensive resources. We also compare the utility of the level-0 defender (who assumes level-0 attackers) and the informant-aware defender (who assumes informant-aware attackers). The results are deferred to Appendix K.
7.1.3. Comparison between the Bi-Level Algorithm and QRI-MILP
We empirically compare the bi-level optimization algorithm with QRI-MILP. We let the bipartite graph be fully connected and vary the number of defensive resources $r$ and the number of informants $k$.
In both cases, we assume that the defender recruits the informants with the highest information sharing intensities. The results are shown in Figure 1(f). In general, our bi-level algorithm gives higher expected defender utilities than the QRI-MILP algorithm, except when $r$ is close to $n$. Our results show that both increasing the number of resources and hiring more informants increase the defender's utility. However, as the number of resources $r$ increases, the utility gain from hiring more informants diminishes.
Intuitively, if the number of resources equals the number of targets, the defender should always cover all the targets. Interestingly, during our experiments, we observed that in this case the optimal defender strategy may not use all of her resources to cover all the targets. The reason is that in a general-sum game, by deliberately decreasing the probability of protecting a certain target, the defender can lure the attacker into attacking that target more frequently, and thus increase her expected utility. Such strategies can be found in real-world wildlife protection, where patrollers may sometimes deliberately ignore tips. This is also reflected in our bi-level algorithm. If the defender always uses all her resources, then both the defender's and the attacker's strategies are fixed, and hiring more informants does not increase the defender's expected utility. But if the defender's strategy does not always use all her resources, then hiring more informants can help (see the bi-level algorithm for the case $r=n$ in Figure 1(f)).
8. Discussion and Conclusion
In this paper, we introduced a novel two-stage security game model and a multi-level QR behavioral model that incorporate community engagement. We provided complexity results, developed algorithms to find (sub-)optimal groups of informants to recruit against level-0 attackers, and evaluated the algorithms through extensive experiments. Our results also generalize to the case where informants have heterogeneous recruitment costs and to different attacker response models, such as the SUQR model Nguyen et al. (2013), by calculating the attacker's response correspondingly. See Appendix M on how to extend our algorithms to defend against level-$\kappa$ ($0<\kappa<\infty$) attackers. In Section 5, we defined a more powerful type of attacker that responds to the marginal strategy and developed a bi-level optimization algorithm to find the optimal defender strategy in this case.
In the anti-poaching domain, some conservation site managers run so-called "intelligence" operations that rely on informants in nearby villages to alert rangers when they learn the poachers' plans in advance. The deployment of this work relies on the site manager to provide their understanding of the social connections among community members. The edges and parameters of the bipartite graph in our model can be extracted from a local social media application or from historical data collected by site managers. Recruiting and training reliable informants is costly, and managers may only be able to afford a limited number of them. Our model and solution can help managers efficiently recruit informants, make the best use of tips, and evaluate the trade-off between allocating budget to hiring rangers versus recruiting informants in a timely fashion.
For future work, instead of using a particular behavior model, we can use historical records as training data and learn the attackers' behavior in different domains. It would also be interesting to consider the case where informants can only provide inaccurate tips or other types of tips, e.g., that some subset of targets will be attacked instead of a single location. We can also model the informants as strategic agents: in real life, informants may provide fake information if they have their own utility structures, and we could reward them to elicit true information and maximize the defender's utility.
Acknowledgement
This work is supported in part by NSF grant IIS-1850477 and a research grant from Lockheed Martin.
References
- Basilico et al. (2017) Nicola Basilico, Andrea Celli, Giuseppe De Nittis, and Nicola Gatti. 2017. Coordinating multiple defensive resources in patrolling games with alarm systems. In AAMAS’17. 678–686.
- Duffy et al. (2015) Rosaleen Duffy, Freya AV St John, Bram Büscher, and DAN Brockington. 2015. The militarization of anti-poaching: undermining long term goals? Environmental Conservation 42, 4 (2015), 345–348.
- Fang et al. (2017) Fei Fang, Thanh Hong Nguyen, Rob Pickles, Wai Y. Lam, Gopalasamy R. Clements, Bo An, Amandeep Singh, Brian C. Schwedock, Milind Tambe, and Andrew Lemieux. 2017. PAWS - A Deployed Game-Theoretic Application to Combat Poaching. AI Magazine (2017). http://www.aaai.org/ojs/index.php/aimagazine/article/view/2710
- Gill et al. (2014) Charlotte Gill, David Weisburd, Cody W Telep, and Trevor Bennett. 2014. Community-oriented policing to reduce crime, disorder and fear and increase satisfaction and legitimacy among citizens: A systematic review. Journal of Experimental Criminology (2014).
- Guo et al. (2017) Qingyu Guo, Boyuan An, Branislav Bosansky, and Christopher Kiekintveld. 2017. Comparing strategic secrecy and Stackelberg commitment in security games. In IJCAI-17.
- Jain et al. (2010) Manish Jain, Jason Tsai, James Pita, Christopher Kiekintveld, Shyamsunder Rathi, Milind Tambe, and Fernando Ordónez. 2010. Software assistants for randomized patrol planning for the lax airport police and the federal air marshal service. Interfaces (2010).
- Korzhyk et al. (2011) Dmytro Korzhyk, Zhengyu Yin, Christopher Kiekintveld, Vincent Conitzer, and Milind Tambe. 2011. Stackelberg vs. Nash in security games: An extended investigation of interchangeability, equivalence, and uniqueness. Journal of Artificial Intelligence Research 41 (2011), 297–327.
- Le Gallic and Cox (2006) Bertrand Le Gallic and Anthony Cox. 2006. An economic analysis of illegal, unreported and unregulated (IUU) fishing: Key drivers and possible solutions. Marine Policy 30, 6 (2006), 689–695.
- Leader-Williams and Milner-Gulland (1993) N Leader-Williams and EJ Milner-Gulland. 1993. Policies for the enforcement of wildlife laws: the balance between detection and penalties in Luangwa Valley, Zambia. Conservation Biology 7, 3 (1993), 611–617.
- Linkie et al. (2015) Matthew Linkie, Deborah J. Martyr, Abishek Harihar, Dian Risdianto, Rudijanta T. Nugraha, Maryati, Nigel Leader‐Williams, and Wai‐Ming Wong. 2015. EDITOR’S CHOICE: Safeguarding Sumatran tigers: evaluating effectiveness of law enforcement patrols and local informant networks. Journal of Applied Ecology (2015).
- Ma et al. (2018) Xiaobo Ma, Yihui He, Xiapu Luo, Jianfeng Li, Mengchen Zhao, Bo An, and Xiaohong Guan. 2018. Camera Placement Based on Vehicle Traffic for Better City Security Surveillance. IEEE Intelligent Systems 33, 4 (Jul 2018), 49–61. https://doi.org/10.1109/mis.2018.223110904
- McKelvey and Palfrey (1995) Richard D McKelvey and Thomas R Palfrey. 1995. Quantal response equilibria for normal form games. Games and economic behavior (1995).
- Moreto (2015) William D Moreto. 2015. Introducing intelligence-led conservation: bridging crime and conservation science. Crime Science 4, 1 (2015), 15.
- Nemhauser et al. (1978) George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. 1978. An analysis of approximations for maximizing submodular set functions—I. Mathematical programming 14, 1 (1978), 265–294.
- Nguyen et al. (2013) Thanh Hong Nguyen, Rong Yang, Amos Azaria, Sarit Kraus, and Milind Tambe. 2013. Analyzing the Effectiveness of Adversary Modeling in Security Games.. In AAAI.
- Pita et al. (2008) James Pita, Manish Jain, Janusz Marecki, Fernando Ordóñez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport. In AAMAS: industrial track.
- Rosenfeld and Kraus (2017) Ariel Rosenfeld and Sarit Kraus. 2017. When Security Games Hit Traffic: Optimal Traffic Enforcement Under One Sided Uncertainty.. In IJCAI. 3814–3822.
- Schlenker et al. (2018) Aaron Schlenker, Omkar Thakoor, Haifeng Xu, Fei Fang, Milind Tambe, Long Tran-Thanh, Phebe Vayanos, and Yevgeniy Vorobeychik. 2018. Deceiving Cyber Adversaries: A Game Theoretic Approach. In AAMAS.
- Short et al. (2013) Martin B Short, Ashley B Pitcher, and Maria R D’Orsogna. 2013. External conversions of player strategy in an evolutionary game: A cost-benefit analysis through optimal control. European Journal of Applied Mathematics 24, 1 (2013), 131–159.
- Smith and Humphreys (2015) MLR Smith and Jasper Humphreys. 2015. The Poaching Paradox: Why South Africa’s ‘Rhino Wars’ Shine a Harsh Spotlight on Security and Conservation. Ashgate Publishing Company.
- Tambe (2011) Milind Tambe. 2011. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge University Press.
- Tublitz and Lawrence (2014) Rebecca Tublitz and Sarah Lawrence. 2014. The Fitness Improvement Training Zone Program. (2014).
- Wang et al. (2018) Xinrun Wang, Bo An, Martin Strobel, and Fookwai Kong. 2018. Catching Captain Jack: Efficient Time and Space Dependent Patrols to Combat Oil-Siphoning in International Waters. (2018). https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16312
- Wright and Leyton-Brown (2014) James R Wright and Kevin Leyton-Brown. 2014. Level-0 meta-models for predicting human behavior in games. In Proceedings of the fifteenth ACM conference on Economics and computation. ACM, 857–874.
- WWF (2015) WWF. 2015. Developing an approach to community-based crime prevention. http://zeropoaching.org/pdfs/Community-based-crime%20prevention-strategies.pdf. (2015).
- Yang et al. (2012) Rong Yang, Fernando Ordonez, and Milind Tambe. 2012. Computing optimal strategy against quantal response in security games. In AAMAS.
Appendix:
Green Security Game with Community Engagement
Appendix A Complexity of different algorithms
Algorithm | Time Complexity
---|---
EDPA | exponential in the number of attackers
C-Truncated | polynomial for constant truncation parameter $C$
T-Sampling | polynomial in the number of samples
ASISI | polynomial
GSA | polynomially many calls to the $EU(A)$ subroutine
Appendix B Proof of Theorem 4.1
Theorem 4.1 (restated). When the defender is informed by the recruited informants $A$, the optimal allocation of defensive resources given the tips $\theta$ can be determined in $O(|V|+n\log n)$ time.
Proof.
Given the tips $\theta$, the defender should calculate $\Delta_t(\theta)$ for each target $t$, and then allocate the resources to the $r$ targets with the highest $\Delta_t(\theta)$.
The above strategy is indeed optimal: the expected utility with no resources is $\sum_t EU^u_t(\theta)$, and once an additional unit of resource is available, it should always be allocated to the uncovered target that yields the largest increase in expected utility, i.e., the uncovered target with the largest $\Delta_t(\theta)$.
The calculation of $\Delta_t(\theta)$ for all targets can be done in $O(|V|)$ time, and finding the $r$ largest values can be done in $O(n\log n)$ time, leading to an overall complexity of $O(|V|+n\log n)$. ∎
Appendix C Proof of Theorem 4.2
Theorem 4.2 (restated). Computing the optimal set of at most $k$ informants to recruit is NP-hard.
Proof.
Consider the case where $w_{uv}=1$ for all $(u,v)\in E$, $p_v=1$ for all $v\in V$, and the targets are uniform, i.e., the payoffs are the same for all targets; we therefore drop the target subscripts for simplicity.
To start with, we investigate how the defender's expected utility depends on a given $A$. Since $w_{uv}=1$ and $p_v=1$ for all attackers, all attackers in $V_A$ will be reported to attack some location. Let the random variable $X_t$ be the number of attackers reported to attack location $t$. Since the targets are uniform, each attacker attacks each location with probability $1/n$. The defender's expected utility then decomposes into the gain from covering the $r$ targets with the most reported attacks plus terms that do not depend on $A$.
The latter terms are independent of the choice of informants, so to maximize the expected utility it suffices to maximize the expected gain from the reported attacks.
We prove by induction on $|V_A|$ that this expected gain strictly increases as $|V_A|$ increases, i.e., it increases by a positive amount whenever $|V_A|$ increases by 1:
(1) The base case holds since the gain is zero when $|V_A|=0$ and positive when $|V_A|=1$.
(2) For the inductive step, consider $V_A$ and the corresponding counts $X_1,\dots,X_n$. Add one attacker to $V_A$; with probability $1/n$ he targets the location with the largest count, in which case the covered gain increases. By a simple coupling argument, the expected gain increases by a positive amount.
Thus, in this case, solving the original problem is equivalent to finding the set $A$ with $|A|\le k$ that maximizes the size of $V_A$ in the first stage.
We show that this optimization problem is NP-hard using a reduction from MAXIMUM COVERAGE: we are given a number $k$ and a collection of sets $S_1,\dots,S_m$, and the objective is to choose at most $k$ sets so that the number of covered elements is maximized. Let $U=\{u_1,\dots,u_m\}$, let $V$ be the union of all elements, and connect $u_i$ to $v$ if and only if $v\in S_i$, with $w_{u_iv}=1$ and $p_v=1$. Then finding a set $A$ with $|A|\le k$ that maximizes the size of $V_A$ is equivalent to finding a subset of at most $k$ sets that maximizes the number of covered elements in the MAXIMUM COVERAGE instance. ∎
Appendix D Proof of Lemma 4.2.2
Lemma 4.2.2 (restated). Given the set of recruited informants $A$, the defender's expected utility can be computed in polynomial time if $w_{uv}=1$ for all $(u,v)\in E$. When $k$ is a constant, the optimal set of informants can be computed in polynomial time.
Proof.
Since $w_{uv}=1$ for all $(u,v)\in E$, we have $r_v(A)=1$ for each $v\in V_A$. Therefore, the expected gain of a target with $c$ reported attacks depends only on $c$, and the calculation depends only on the size of $S$. Thus, instead of enumerating the sets $S$, we enumerate $s=|S|$ in line 2 of Algorithm 1 and replace $\Pr[S\mid A]$ with $\Pr[|S|=s]$, which can be obtained by expanding the polynomial $\prod_{v\in V_A}\bigl((1-p_v)+p_v z\bigr)$. Therefore, $EU(A)$ can be calculated in polynomial time.
Since all sets $A$ with $|A|\le k$ can be enumerated in $O(|U|^k)$ time, the optimal set of informants can be computed in polynomial time for constant $k$. ∎
Appendix E ASISI
Appendix F Defender Utility is not Submodular
We provide a counterexample that disproves the submodularity of $EU(\cdot)$.
Example.
Consider a network with two targets and a single defensive resource, with informants, attackers, attack probabilities, information sharing intensities, and payoffs chosen appropriately. One can then verify that the marginal gain of adding an informant to a larger set of informants exceeds its marginal gain when added to a subset, i.e., there exist $B\subset A$ and $u$ with $EU(A\cup\{u\})-EU(A)>EU(B\cup\{u\})-EU(B)$, contradicting submodularity.
Appendix G Proof of Lemma 4.2.1
Lemma 4.2.1 (restated).
Proof.
Let the random variable $X$ be the number of attacks, let $\mathcal{E}_1$ be the event of having at least $C$ reported attackers, and let $\mathcal{E}_2$ be the event of having at least $C$ attacks. Let $EU(A\mid\mathcal{E})$ be the expected defender's utility over all possible tips given an event $\mathcal{E}$. Noticing that $\mathcal{E}_1$ implies $\mathcal{E}_2$, we can bound the estimation error by a term that decays exponentially in $C$ (6) times a bounded utility difference (7).
Inequality (6) follows by the Chernoff bound applied to $X$ (with $\mathbb{E}[X]\le c_0$), and inequality (7) follows since the utilities at stake are bounded.
∎
Appendix H Non-convergence of the Level-$\kappa$ Response
Example.
Suppose there is a single attacker and two targets, with suitably chosen payoffs.
In this case, there are only two possible tips: the attacker attacks target 1 ($\theta_1$) or target 2 ($\theta_2$). Assume that a single informant is recruited with some report probability, and that the defender has only 1 defensive resource and uses a fixed pair of tip-response strategies.
When the precision $\lambda$ is small, the level-$\kappa$ response converges to a fixed point $q^\infty$. However, when $\lambda$ is large, the process eventually oscillates between two strategies.
Appendix I Proof of Lemma 5.1
Lemma 5.1 (restated).
Proof.
Given defender’s strategy and , define:
Then a level-(+1) attacker’s strategy can be computed by
The convergence of level- is equivalent to the convergence of .
The marginal strategy can be written as:
Notice that the function is just the quantal response against :
where is the attacker’s expected utility of attacking target when the defender’s marginal strategy is : . Therefore,
Note that in the above equation, , and . Thus we have:
(8) |
On the other hand, means . Plugging into Equation (8), we get:
For any , let . So and . Therefore,
∎
Appendix J Algorithm for Defending against Informant-Aware Attackers
Proposition J.1.
The optimal objective of QRI is non-decreasing in the tip probability $\pi$.
Proof.
Consider the two optimization problems induced by different values $\pi_1<\pi_2$ of the tip probability. Let $(x,y)$ be a solution for the problem with $\pi=\pi_1$. Then, replacing each $y_t$ with $y'_t=\bigl(\pi_1 y_t+(\pi_2-\pi_1)x_t\bigr)/\pi_2$ yields a feasible solution for the problem with $\pi=\pi_2$ that achieves the same objective value, since the effective coverage $\pi_2 y'_t+(1-\pi_2)x_t=\pi_1 y_t+(1-\pi_1)x_t$ is unchanged. To see why it is feasible, observe that constraint (3) is satisfied by construction and constraint (5) is satisfied since each new $y'_t$ is a convex combination of the previous $y_t$ and $x_t$, which were both in $[0,1]$. ∎
Proposition J.1 implies that when selecting informants, it is optimal to simply maximize $\pi$. Since $\pi=1-\prod_{u\in A}(1-w_u)$ for the single attacker, we can select informants greedily, choosing those with the largest information sharing intensities $w_u$. We can then solve the optimization problem to find the optimal allocation of resources. Finally, we discuss how to find an approximate solution to QRI using a MILP approach.
We can compute the optimal defender strategy by adapting the approach used in the PASAQ algorithm Yang et al. (2012). Let $N(x,y)$ and $D(x,y)$ be the numerator and denominator of the (fractional) objective in QRI. As with PASAQ, we binary search on the optimal value $r^*$. We can check the feasibility of a given $r^*$ by rewriting the objective as $N(x,y)-r^*D(x,y)$ and checking the sign of its optimal value. To solve the new optimization problem, which still has a non-linear objective function, we adapt their approach of approximating the objective function with piecewise-linear constraints and write a MILP.
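The binary-search reduction can be illustrated on a toy fractional objective: $\max N(z)/D(z)$ with $D>0$ is at least $r^*$ iff $\max_z N(z)-r^*D(z)\ge 0$. In QRI-MILP the inner maximization is a MILP; here it is a brute-force grid scan, purely for illustration.

```python
import numpy as np

def binary_search_ratio(N, D, zs, lo, hi, iters=60):
    """Maximize N(z)/D(z) over candidates zs by binary search on the value r*.

    Uses the equivalence: max N/D >= r*  iff  max_z N(z) - r* D(z) >= 0
    (requires D(z) > 0 for all z).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if max(N(z) - mid * D(z) for z in zs) >= 0.0:
            lo = mid            # value mid is achievable
        else:
            hi = mid
    return lo

zs = np.linspace(0.0, 1.0, 1001)
best = binary_search_ratio(lambda z: 1.0 + z, lambda z: 1.0 + z * z, zs, 0.0, 2.0)
print(best)   # approx. max of (1+z)/(1+z^2) on [0,1], about 1.207
```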
First, we rewrite the objective in terms of the per-target exponential terms induced by the QR model. Two non-linear functions of the coverage variables then need approximation. As in PASAQ, we partition $[0,1]$ into segments and use the segment slopes to build piecewise-linear approximations of the two functions.
The key change in our MILP compared to PASAQ is that we replace the original defender resource constraint with constraints (9) - (12), which take into account the ability of the defender to respond to tips.
QRI-MILP:
maximize the piecewise-linear approximation of the objective, subject to (9)-(12) the resource constraints adapted to tip responses, (13)-(14) the segment variables composing each coverage probability, and (15)-(16) integrality and ordering constraints on the segment indicator variables.
Proposition J.2.
The feasible region for $(x,y)$ of QRI-MILP is equivalent to that of QRI.
Proof.
Given this equivalence, the proof of the approximate correctness of PASAQ applies here, and we can find an $\epsilon$-optimal solution for arbitrarily small $\epsilon$ Yang et al. (2012). ∎
Appendix K Additional Experiment Results
In Figure 2, we compare the performance of the level-0 defender and the informant-aware defender when playing against an informant-aware attacker. We see that, although its strategy is computed under the level-0 attacker assumption, the utility of the level-0 defender is only slightly lower than that of the informant-aware defender. For a fixed number of resources, the difference in utility grows as the defender recruits more informants and thus has a higher probability of receiving a tip.
[Figures 2-4. Figure 2: level-0 vs. informant-aware defenders against an informant-aware attacker; Figure 3: runtime and relative errors in the SISI case; Figure 4: estimation errors of C-Truncated and T-Sampling on two instances.]
We test the special case assuming SISI. We fix the other parameters and enumerate the number of attackers from 2 to 16. The average runtime of all algorithms, including ASISI, together with the relative errors of GSA and T-Sampling, is shown in Figure 3. Though ASISI and GSA are the fastest of all, the average relative error of GSA is slightly above zero. T-Sampling is slightly slower than ASISI and GSA, and its solutions are less accurate than the other two in this case.
Another experiment is a case study on 2 instances with the numbers of targets, informants, attackers, and resources fixed. We run EDPA, C-Truncated, and T-Sampling on each instance and show the estimation errors for all sets of informants $A$. The results are shown in Figure 4, where the red lines indicate the error bound given by Lemma 4.2.1. We encode each set of informants in binary, so the code of a set indicates which informants it contains. The first instance is constructed to show that the bound given by Lemma 4.2.1 is empirically tight, i.e., the estimation error of C-Truncated can be large but is still bounded; the other instance is randomly generated. T-Sampling shows larger errors with higher variances compared to C-Truncated.
Appendix L Experiment Setup
To generate the game instances, we first fix the sets $U$ and $V$. For each informant $u\in U$, we sample his degree uniformly at random and then sample a uniformly random subset of $V$ of that size as his neighbors. Each information sharing intensity $w_{uv}$ is drawn uniformly at random. For the attack probabilities, in the general case each $p_v$ is drawn uniformly at random; when we restrict the expected number of attacks, we normalize the vector of attack probabilities accordingly. For the payoff matrix, each reward is drawn uniformly at random and each penalty from the negated range. The precision parameter $\lambda$ is set to 2. The routine strategy $x$ and $u^*$ are obtained by a binary search with a convex optimization, as introduced in Yang et al. (2012). The number of samples used in T-Sampling is set to 100. In GSA, EDPA is used to calculate $EU(A)$.
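A sketch of the instance generator described above; where exact ranges were lost in extraction, the uniform ranges below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_instance(n_targets, n_informants, n_attackers, max_deg=3):
    """Random game instance; the ranges are illustrative stand-ins for the
    (extraction-damaged) values in the text."""
    # Bipartite graph: each informant u connects to deg(u) random attackers.
    w = {}
    for u in range(n_informants):
        deg = int(rng.integers(1, max_deg + 1))
        nbrs = rng.choice(n_attackers, size=min(deg, n_attackers), replace=False)
        w[u] = {int(v): float(rng.uniform(0.0, 1.0)) for v in nbrs}  # intensities
    p = rng.uniform(0.0, 1.0, size=n_attackers)       # attack probabilities
    # Payoffs: rewards positive, penalties negative (lambda = 2 elsewhere).
    ud_cov = rng.uniform(0.0, 2.0, size=n_targets)
    ud_unc = -rng.uniform(0.0, 2.0, size=n_targets)
    ua_unc = rng.uniform(0.0, 2.0, size=n_targets)
    ua_cov = -rng.uniform(0.0, 2.0, size=n_targets)
    return w, p, (ud_cov, ud_unc, ua_cov, ua_unc)
```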
In the QRI-MILP algorithm, the optimal defender strategy is found with a small approximation parameter. The bi-level optimization algorithm is implemented in MATLAB R2017a: the inner-level linear program is solved using the linprog function, and the outer-level optimization is solved with the fmincon function.
Appendix M Defending Against Level-$\kappa$ Attackers
In Section 4, we deal with the case with only level-0 attackers and provide algorithms to find the optimal set of informants to recruit. In this section, we show how those approaches can easily be extended to the case with level-$\kappa$ ($0<\kappa<\infty$) attackers.
Once $q^{(\kappa)}$ is given, the tip distribution can easily be obtained, and so can $u^{*(\kappa)}$, the defender's expected utility when using the optimal strategy against a single attack by a level-$\kappa$ attacker. To get the solution, we simply replace $u^*$ with $u^{*(\kappa)}$ and apply Select or GSA. To calculate $q^{(\kappa)}$ by definition, all that remains is to calculate the marginal strategies $\bar{x}^{(j)}$ for $j<\kappa$. The marginal probability of each target being covered can be calculated in a way similar to EDPA.