
A full version of the paper published in Proc. of AAMAS 2022.

Masaryk University, Brno, Czechia

Minimizing Expected Intrusion Detection Time
in Adversarial Patrolling

David Klaška david.klaska@mail.muni.cz Antonín Kučera tony@fi.muni.cz Vít Musil musil@fi.muni.cz  and  Vojtěch Řehák rehak@fi.muni.cz
Abstract.

In adversarial patrolling games, a mobile Defender strives to discover intrusions at vulnerable targets initiated by an Attacker. The Attacker’s utility is traditionally defined as the probability of completing an attack, possibly weighted by target costs. However, in many real-world scenarios, the actual damage caused by the Attacker depends on the time elapsed from the attack’s initiation to its detection. We introduce a formal model for such scenarios, and we show that the Defender always has an optimal strategy achieving maximal protection. We also prove that finite-memory Defender’s strategies are sufficient for achieving protection arbitrarily close to the optimum. Then, we design an efficient strategy synthesis algorithm based on differentiable programming and gradient descent. We evaluate the efficiency of our method experimentally.

Key words and phrases:
Strategy synthesis, Security Games, Adversarial Patrolling

1. Introduction

This paper follows the security games line of work studying optimal allocation of limited security resources for achieving optimal target coverage Tambe (2011). Practical applications of security games include the deployment of police checkpoints at the Los Angeles International Airport Pita et al. (2008), the scheduling of federal air marshals over the U.S. domestic airline flights Tsai et al. (2009), the arrangement of city guards in Los Angeles Metro Fave et al. (2014), the positioning of U.S. Coast Guard patrols to secure selected locations An et al. (2014), and also applications to wildlife protection Ford et al. (2014); Wang et al. (2019); Xu (2021).

Patrolling games are a special type of security games where a mobile Defender moves among protected targets with the aim of detecting possible incidents. Compared with static monitoring facilities such as sensor networks or surveillance systems, patrolling is more flexible and less costly to implement and maintain. Due to these advantages Yan et al. (2013), patrolling is indispensable in detecting crime Jakob et al. (2011); Chen et al. (2017), managing disasters Maza et al. (2011), protecting wildlife Wang et al. (2019); Xu (2021), etc. Many works consider human Defenders such as police squads or rangers Wang et al. (2019), where the patrolling horizon is bounded. Recent technological advances motivate the study of robotic patrolling with unbounded horizon, where the Defender is an autonomous device operating for a long time without interruption.

Most of the existing patrolling models can be classified as either regular or adversarial. Regular patrolling can be seen as a form of surveillance where the Defender aims at discovering accidents as quickly as possible by minimizing the time lag between two consecutive visits for each target. Here, a Defender’s strategy is typically a single path or a cycle visiting all targets. In adversarial patrolling, the Defender strives to protect the targets against an Attacker exploiting the best attack opportunities maximizing the damage. The solution concept is typically based on Stackelberg equilibrium Yin et al. (2010); Sinha et al. (2018), where the Defender commits to a strategy γ\gamma and the Attacker follows by selecting a strategy π\pi maximizing the expected Attacker’s utility against γ\gamma. Defender’s strategies are typically randomized so that the Attacker cannot foresee the next Defender’s moves, and the Defender aims at maximizing the probability of discovering an attack before its completion. The adversarial model is also appropriate in situations when a certain protection level must be guaranteed even if the accidents happen at the least convenient moment.

In infinite-horizon adversarial patrolling models, every target τ\tau is assigned a finite resilience d(τ)d(\tau), and an attack at τ\tau is discovered if the Defender visits τ\tau in the next d(τ)d(\tau) time units. Although this model is adequate in many scenarios, it is not applicable when the actual damage depends on the time elapsed since initiating the attack. For example, if the attack involves setting a fire, punching a hole in a fuel tank, or setting a trap, then the associated damage increases with time. In this case, the Defender should aim at minimizing the expected attack discovery time rather than maximizing the probability of visiting a target before a deadline. We refer to Section 2.3 for a more detailed discussion.

In this work, we formalize the objective of minimizing the expected attack discovery time in infinite-horizon adversarial patrolling, and we design an efficient strategy synthesis algorithm. We start by fixing a suitable formal model. The terrain is modeled by the standard patrolling graph, and the Defender’s/Attacker’s strategies are also defined in the standard way. However, the expected damage caused by attacking a target τ\tau is defined as the expected time of visiting τ\tau by the Defender since initiating the attack, multiplied by the target cost α(τ)\alpha(\tau). Intuitively, α(τ)\alpha(\tau) is the “damage per time unit” when attacking τ\tau. We use Stackelberg equilibrium as the underlying solution concept, and define the protection value of a given Defender’s strategy γ\gamma as the expected attack discovery time (weighted by target costs) guaranteed by γ\gamma against an arbitrary Attacker’s strategy.

In general, a Defender’s strategy γ\gamma may randomize and the choice of the next move may depend on the whole history of moves. The randomization is crucial for increasing protection value (a concrete example is given in Section 2.3). Since general strategies are not finitely representable, they are not algorithmically workable. Recent results on infinite-horizon adversarial patrolling Klaška et al. (2018, 2021) identify the subclass of regular Defender’s strategies as sufficiently powerful to maximize the probability of timely attack discovery. Here, a strategy is regular if it uses finite memory and rational probability distributions. However, it is not clear whether regular strategies are equivalently powerful as general strategies when minimizing the expected attack discovery time. Perhaps surprisingly, we show that the answer is positive, despite all issues caused by specific properties of this objective. More precisely, we prove that the limit protection value achievable by regular strategies is the same as the limit protection value achievable by general strategies. This non-trivial result is based on deep insights into the structure of (sub)optimal Defender’s strategies.

Our second main contribution is an algorithm synthesizing a regular Defender’s strategy and its protection value for a given patrolling graph. We show that the protection value of a regular strategy is a differentiable function, and we proceed by designing an efficient strategy improvement procedure based on gradient descent.

We evaluate our algorithm experimentally on instances of considerable size. Since our work initiates the study of infinite-horizon adversarial patrolling with the expected attack discovery time, there is no baseline set by previous works. To estimate the quality of the constructed regular strategies, we consider instances where the optimal protection value can be determined by hand, but constructing the associated Defender’s strategy is sufficiently tricky to properly examine the capabilities of our strategy synthesis algorithm.

The experiments also show that our algorithm is sufficiently fast for recomputing a patrolling strategy dynamically when the underlying patrolling graph changes due to unpredictable external events. Hence, the applicability of our results is not limited just to static scenarios.

Our main contribution can be summarized as follows:

  • We propose a formal model for infinite-horizon adversarial patrolling where the damage caused by attacking a target depends on the time needed to discover the attack.

  • We prove that regular strategies can achieve the same limit protection value as general strategies.

  • We design an efficient algorithm synthesizing a regular Defender’s strategy for a given patrolling graph, and we evaluate its functionality experimentally.

1.1. Related work

The literature on regular and adversarial patrolling is rich; the existing overviews include Huang et al. (2019); Almeida et al. (2004); Portugal and Rocha (2011). We give a summary of previous results on infinite-horizon adversarial patrolling, which is perhaps closest to our work.

Most of the existing results concentrate on computing an optimal moving strategy for certain topologies of admissible moves. The underlying solution concept is the Stackelberg equilibrium Sinha et al. (2018); Yin et al. (2010), where the Defender/Attacker play the roles of the Leader/Follower.

For general topologies, deciding the existence of a perfect Defender’s strategy discovering all attacks in time is PSPACE-complete Ho and Ouaknine (2015). Consequently, computing an optimal Defender’s strategy is PSPACE-hard. Moreover, computing an ε-optimal strategy for ε ≤ 1/2n, where n is the number of vertices, is NP-hard Klaška et al. (2020). Hence, no feasible strategy synthesis algorithm can guarantee (sub)optimality for all inputs, and finding a high-quality strategy in reasonable time is challenging.

The existing strategy synthesis algorithms are based either on mathematical programming, reinforcement learning, or gradient descent. The first approach suffers from scalability issues caused by non-linear constraints Basilico et al. (2012a, 2009a). Reinforcement learning has so far been successful mainly for patrolling with finite horizon, such as green security games Wang et al. (2019); Biswas et al. (2021); Xu (2021); Karwowski et al. (2019). Gradient descent techniques for finite-memory strategies Kučera and Lamser (2016); Klaška et al. (2018, 2021) are applicable to patrolling graphs of reasonable size.

Patrolling for restricted topologies has been studied for lines, circles (Agmon et al., 2008a, b), or fully connected environments (Brázdil et al., 2018). Apart from special topologies, specific variants and aspects of the patrolling problem have been studied, including moving targets Bosansky et al. (2011); Fang et al. (2013), multiple patrolling units Basilico et al. (2010), movement of the Attacker on the graph Basilico et al. (2009b), or reaction to alarms Munoz de Cote et al. (2013); Basilico et al. (2012b).

2. The Model

In this section, we introduce a formal model for infinite-horizon patrolling where the Defender aims at minimizing the expected attack discovery time. The terrain (protected area) is modeled by the standard patrolling graph Basilico et al. (2009a, 2012a); Klaška et al. (2018, 2021). We use the variant where time is spent by performing edges Klaška et al. (2018, 2021) rather than by staying in vertices Basilico et al. (2009a, 2012a). The model of the Defender/Attacker is also standard Kučera and Lamser (2016); Klaška et al. (2020) (the study of patrolling models usually starts by considering the scenario with one Defender and one Attacker, and we stick to this approach). The new ingredient of our model is the way of evaluating the protection achieved by Defender’s strategies (see Section 2.3), and here we devote more space to explaining and justifying our definitions.

In the rest of this paper, we use ℕ and ℕ₊ to denote the sets of non-negative and positive integers, respectively. We assume familiarity with basic notions of probability, Markov chain theory, and calculus.

2.1. Terrain model

Locations where the Defender selects the next move are modeled as vertices in a directed graph. The edges correspond to admissible moves, and are labeled by the corresponding traversal time. The protected targets are a subset of vertices with positive weights representing their costs. Formally, a patrolling graph is a tuple G = (V, T, E, tm, α) where

  • V is a finite set of vertices (Defender’s positions);

  • T ⊆ V is a non-empty set of targets;

  • E ⊆ V × V is a set of edges (admissible moves);

  • tm : E → ℕ₊ specifies the traversal time of an edge;

  • α : T → ℝ₊ defines the costs of targets.

We require that G is strongly connected, i.e., for all v, u ∈ V there is a path from v to u. We write u → v instead of (u, v) ∈ E, and use α_max to denote the maximal target cost.

The sets of all non-empty finite and infinite paths in G are denoted by ℋ (histories) and 𝒲 (walks), respectively. For a given history h = v_1, …, v_n, we use tm(h) = ∑_{i=1}^{n−1} tm(v_i, v_{i+1}) to denote the total traversal time of h.
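For concreteness, the following sketch shows one possible in-memory representation of a patrolling graph and of the total traversal time tm(h) of a history. The class and field names are our own illustration and not part of the formal model; the example instance encodes the graph of Fig. 2a discussed later in Section 2.3.1.

```python
from dataclasses import dataclass

@dataclass
class PatrollingGraph:
    """A patrolling graph G = (V, T, E, tm, alpha)."""
    vertices: set     # V
    targets: set      # T, a non-empty subset of V
    tm: dict          # traversal times: edge (u, v) -> positive int; E is the support of tm
    alpha: dict       # target costs: tau -> positive number

    def tm_history(self, h):
        """Total traversal time of a history h = (v_1, ..., v_n)."""
        return sum(self.tm[(h[i], h[i + 1])] for i in range(len(h) - 1))

# The graph of Fig. 2a (Section 2.3.1): centre v, targets tau_1 (cost 1) and
# tau_2 (cost 2), all four edges with traversal time 1.
G = PatrollingGraph(
    vertices={"v", "tau1", "tau2"},
    targets={"tau1", "tau2"},
    tm={("v", "tau1"): 1, ("tau1", "v"): 1, ("v", "tau2"): 1, ("tau2", "v"): 1},
    alpha={"tau1": 1.0, "tau2": 2.0},
)
assert G.tm_history(("tau2", "v", "tau1", "v", "tau2")) == 4
```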

2.2. Defender and Attacker

We adopt a simplified patrolling scenario with one Defender and one Attacker. In the rest of this section, let GG be a fixed patrolling graph.

2.2.1. Defender.

A Defender’s strategy is a function γ\gamma assigning to every history hh\in\mathcal{H} of Defender’s moves a probability distribution on VV such that γ(h)(v)>0\gamma(h)(v)>0 only if hvhv\in\mathcal{H}, i.e., uvu\to v where uu is the last vertex of hh. We also use 𝑤𝑎𝑙𝑘(h)\mathit{walk}(h) to denote the set of all walks initiated by a given hh\in\mathcal{H}.

For every initial vertex v where the Defender starts patrolling, the strategy γ determines the probability space (𝒲, ℱ, ℙ^{γ,v}) over the walks in the standard way, i.e., ℱ is the σ-field generated by all walk(h) where h ∈ ℋ, and ℙ^{γ,v} is the unique probability measure satisfying ℙ^{γ,v}(walk(h)) = ∏_{i=1}^{n−1} γ(v_1, …, v_i)(v_{i+1}) for every history h = v_1, …, v_n where v_1 = v (if v_1 ≠ v, we have that ℙ^{γ,v}(walk(h)) = 0). We use 𝔼^{γ,v}[R] to denote the expected value of a random variable R in this probability space.
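The defining product translates directly into code. In the sketch below (our own illustration), a strategy is any Python function mapping a history, given as a tuple of vertices, to a dict of successor probabilities.

```python
def walk_probability(gamma, v, h):
    """P^{gamma,v}(walk(h)) for a history h = (v_1, ..., v_n)."""
    if h[0] != v:
        return 0.0
    p = 1.0
    for i in range(1, len(h)):
        p *= gamma(h[:i]).get(h[i], 0.0)  # gamma(v_1, ..., v_i)(v_{i+1})
    return p
```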

2.2.2. Attacker.

The Attacker observes the history of Defender’s moves and decides whether and where to initiate an attack. In general, the Attacker may be able to determine the next Defender’s move right after the Defender leaves the vertex currently visited. For the Attacker, this is the best moment to attack, because initiating an attack in the middle of a Defender’s move can only decrease the Attacker’s utility (cf. Section 2.3).

Formally, an observation is a sequence o=v1,,vn,vnvn+1o=v_{1},\ldots,v_{n},v_{n}{\rightarrow}v_{n+1}, where v1,,vnv_{1},\ldots,v_{n} is a path in GG. Intuitively, v1,,vnv_{1},\ldots,v_{n} is the sequence of vertices visited by the Defender, vnv_{n} is the currently visited vertex, and vnvn+1v_{n}{\rightarrow}v_{n+1} is the edge taken next. The set of all observations is denoted by Ω\Omega.

An Attacker’s strategy is a function π:Ω{𝑤𝑎𝑖𝑡,𝑎𝑡𝑡𝑎𝑐𝑘τ:τT}\pi\colon\Omega\rightarrow\{\mathit{wait},\mathit{attack}_{\tau}:\tau\in T\}. We require that if π(v1,,vn,vnu)=𝑎𝑡𝑡𝑎𝑐𝑘τ\pi(v_{1},\ldots,v_{n},v_{n}{\rightarrow}u)=\mathit{attack}_{\tau} for some τT\tau\in T, then π(v1,,vi,vivi+1)=𝑤𝑎𝑖𝑡\pi(v_{1},\ldots,v_{i},v_{i}{\rightarrow}v_{i+1})=\mathit{wait} for all 1i<n1\leq i<n, i.e., the Attacker exploits an optimal attack opportunity.

2.3. Protection value

Suppose the Defender commits to a strategy γ\gamma and the Attacker selects a strategy π\pi. The expected damage caused by π\pi against γ\gamma is the expected time to discover an attack scheduled by π\pi weighted by target costs.

More precisely, we say that a target τ is attacked along a walk w = v_1, v_2, … if π(v_1, …, v_n, v_n → v_{n+1}) = attack_τ for some n. Note that the index n is unique if it exists. Let m > n be the least index such that v_m = τ. If no such m exists, we say that the attack along w is not discovered. Otherwise, the attack is discovered in time tm(v_n, …, v_m).

Let 𝒟^π : 𝒲 → ℕ_∞ be the function defined as follows:

\mathcal{D}^{\pi}(w)=\begin{cases}\ell\cdot\alpha(\tau)&\text{if $\tau$ is attacked along $w$ and the attack is discovered in time $\ell$;}\\ \infty&\text{if $\tau$ is attacked along $w$ and the attack is not discovered;}\\ 0&\text{if no target is attacked along $w$.}\end{cases}

The expected damage caused by π\pi against γ\gamma initiated in vv is defined as 𝔼γ,v[𝒟π]\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi}]. Since the Defender may choose the initial vertex vv, we define the protection value achieved by γ\gamma and the limit protection value as follows:

\operatorname{Val}(\gamma)\ =\ \min_{v}\ \sup_{\pi}\ \mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi}]\qquad(1)
\operatorname{Val}\ =\ \inf_{\gamma}\ \operatorname{Val}(\gamma)\qquad(2)

We say that a Defender’s strategy γ\gamma is optimal if Val(γ)=Val\operatorname{Val}(\gamma)=\operatorname{Val}.
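The value 𝒟^π(w) on a single walk is easy to evaluate once the attack point is known. The helper below (names ours) takes the index n at which the attack on τ is launched and scans the walk prefix for the first later visit of τ, following the definition above.

```python
import math

def damage(tm, alpha, walk, n, tau):
    """D^pi(w) for an attack on target tau launched while the Defender starts
    traversing the edge walk[n] -> walk[n+1] (0-based indices).
    tm maps edges (u, v) to traversal times, alpha maps targets to costs;
    walk is a finite prefix assumed long enough to witness the discovery."""
    for m in range(n + 1, len(walk)):
        if walk[m] == tau:
            t = sum(tm[(walk[i], walk[i + 1])] for i in range(n, m))  # tm(v_n, ..., v_m)
            return t * alpha[tau]
    return math.inf  # the attack is not discovered within the given prefix
```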

2.3.1. Discussion

In this section, we discuss possible alternative approaches to formalizing the objective of discovering an initiated attack as quickly as possible.

Note that this objective is implicitly taken into account in regular patrolling, where the Defender aims at minimizing the time lag between two consecutive visits of each target (see Section 1). For randomized strategies, one may try to minimize the expected time lag between two consecutive visits of each target. At first glance, this objective seems similar to minimizing sup_π 𝔼^{γ,v}[𝒟^π]. In reality, the objective is different and problematic. To see this, consider the trivial patrolling graph of Fig. 1a with two targets τ_1, τ_2 and four edges (incl. two self-loops) with traversal time 1. The costs of both targets are equal to 1. A natural strategy γ_1 for patrolling these targets is a deterministic loop alternately visiting τ_1 and τ_2 (see Fig. 1b). Then, the maximal expected time lag between two consecutive visits of a target is 2, and we also have that Val(γ_1) = 2. However, consider the randomized strategy γ_2 of Fig. 1c. In the target currently visited, the Defender performs the self-loop with probability 0.99, and with the remaining probability 0.01, the Defender moves to the other target (see the dashed arrows in Fig. 1c). For γ_2, the maximal expected time lag between two consecutive visits of a target is again equal to 2 (in Markov chain terminology, the stationary distribution determined by γ_2 assigns 1/2 to each target, and hence the mean recurrence time is equal to 2 for each target). Hence, if we adopted minimizing the maximal expected time lag between two consecutive visits of a target as the Defender’s objective, the strategies γ_1 and γ_2 would be equivalently good, despite the fact that the expected time to visit τ_2 from τ_1 is 100 when the Defender commits to γ_2. Observe that the difference between γ_1 and γ_2 is captured properly by our approach.

Figure 1. We compare two strategies (b) and (c) on graph (a). Strategy (b) patrols the targets in a deterministic loop, while strategy (c) applies randomization. Both have the same expected time lag between two consecutive visits to a target but differ in the protection significantly.

Let us note that although the example of Fig. 1 contains self-loops (which do not appear in real-world patrolling graphs), these can easily be avoided by inserting auxiliary vertices so that the demonstrated deficiency is still present.
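The numbers used in the discussion above can be checked by elementary first-passage analysis of the two-state chain induced by γ_2; the snippet below is only a sanity check of these constants.

```python
# First-passage analysis for gamma_2 (Fig. 1c): self-loop with prob. 0.99,
# switch to the other target with prob. 0.01; every edge takes 1 time unit.
p_move = 0.01

# Expected time to reach tau_2 from tau_1:  h = 0.99 * (1 + h) + 0.01 * 1
hitting_time = 1.0 / p_move                              # = 100
# Mean recurrence time of tau_2:  1 with prob. 0.99, 1 + h with prob. 0.01
recurrence_time = 0.99 * 1 + 0.01 * (1 + hitting_time)   # = 2
print(hitting_time, recurrence_time)                     # 100.0 2.0
```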

One may still argue that the above problem is caused just by allowing the Defender to use randomized strategies. This is a valid objection. So, the question is whether randomization is really needed, i.e., whether the Defender can achieve strictly better protection by using randomized strategies.

Consider the graph of Fig. 2 with two targets τ_1 and τ_2 where α(τ_1) = 1, α(τ_2) = 2, and the traversal time of every edge is 1.

The protection achieved by an arbitrary deterministic strategy is not better than 8, because the Attacker can wait until the Defender starts moving from τ_2 to v so that the next move selected after arriving to v will be the edge leading to τ_1. Note that the Attacker knows the Defender’s strategy and can observe its moves, and hence he can recognize this attack opportunity. Since the Defender needs at least 4 time units to return to τ_2, the expected damage caused by this attack is at least 8.

The Attacker’s ability to anticipate future Defender’s moves can be decreased by randomization. Consider the simple strategy σ_b of Fig. 2b, where the Defender moves from v to τ_1 and τ_2 with probability p and 1 − p, respectively. After reaching a target, the Defender returns to v. The optimal value of p is 7/2 − √41/2 ≐ 0.3, and the expected damage of an optimal attack is then ≐ 7.7. Note that the strategy σ_b is memoryless, i.e., its decisions depend just on the currently visited vertex.

At first glance, it is not clear whether the protection can be improved, because the probability p used by σ_b implements an optimal “balance” between visiting τ_1 and τ_2 determined by the weights of τ_1 and τ_2. However, consider the finite-memory strategy σ_c of Fig. 2c, which is “almost deterministic” except for the moment when the Defender returns to v from τ_2. Here, the strategy selects the next edge uniformly at random. Then, the expected damage of an optimal attack is 6 (the best attack opportunity is to attack τ_2 right after the robot starts moving from τ_2 to v; the expected time needed to visit τ_2 is then equal to 3, yielding the expected damage 6). Hence, protection is increased not only by randomization, but also by appropriate use of memory.

Figure 2. On graph (a), a memoryless randomized strategy (b) is outperformed by a randomized strategy (c) with finite memory.
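The constants 0.3, 7.7, and 6 used above can be reproduced by the same kind of first-passage reasoning; the sketch below (function names ours) evaluates the worst attack against σ_b as a function of p, and records the value of σ_c computed in the text.

```python
import math

def worst_damage_sigma_b(p):
    """Worst-case expected damage against the memoryless strategy of Fig. 2b."""
    hit_tau1 = (2 - p) / p        # expected time to reach tau_1 from v
    hit_tau2 = (1 + p) / (1 - p)  # expected time to reach tau_2 from v
    # Best attack on tau_i: attack when the Defender leaves v towards the other target.
    attack_tau1 = 1.0 * (2 + hit_tau1)
    attack_tau2 = 2.0 * (2 + hit_tau2)
    return max(attack_tau1, attack_tau2)

p_opt = 7 / 2 - math.sqrt(41) / 2                               # balances the two attack values
print(round(p_opt, 3), round(worst_damage_sigma_b(p_opt), 2))   # 0.298 7.7

# Strategy sigma_c (Fig. 2c): attacking tau_2 when the Defender leaves tau_2 gives
# expected discovery time 1 + (0.5 * 1 + 0.5 * 3) = 3, hence expected damage 2 * 3 = 6.
```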

3. Finite-memory Defender’s strategies

In this section, we prove that finite-memory Defender’s strategies can achieve the same limit protection value as general strategies.

Let G be a patrolling graph. A general Defender’s strategy for G (see Section 2.2.1) depends on the whole history of moves and cannot be finitely represented. A computationally feasible subclass is that of finite-memory (or regular) strategies Kučera and Lamser (2016); Klaška et al. (2018, 2021), where the relevant information about the history is represented by finitely many memory elements assigned to each vertex.

Formally, let mem : V → ℕ be a function assigning to every vertex the number of memory elements. The set of augmented vertices is defined by V̂ = {(v, m) : v ∈ V, 1 ≤ m ≤ mem(v)}. We use v̂ to denote an augmented vertex of the form (v, m) where m ≤ mem(v).

A regular Defender’s strategy for G is a function σ : V̂ → Dist(V̂) where σ(v, m)(v′, m′) > 0 only if v → v′. We say that σ is unambiguous if for all v, v′ ∈ V and m ≤ mem(v) there is at most one m′ such that σ(v, m)(v′, m′) > 0.

Intuitively, the Defender starts patrolling in a designated initial vertex v with initial memory element m, and then traverses the vertices of G and updates the memory according to σ. Hence, the current memory element represents some information about the history of visited vertices.

For every initial v̂ ∈ V̂, the strategy σ determines the probability space over the walks in the way described in Section 2.2.1. The only difference is that the probability of walk(v_1, …, v_n) where v_1 = v is defined as ∑_{v̂_2, …, v̂_n} ∏_{i=1}^{n−1} σ(v̂_i)(v̂_{i+1}), where v̂_1 = v̂ is the initial augmented vertex. Hence, the notion of protection value defined in Section 2.3 is applicable also to regular strategies (where min_v is replaced with min_{v̂} in (1)).
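In code, a regular strategy is just a finite table. The sketch below (our own representation) stores σ as a dict over augmented vertices, checks unambiguity, and computes the probability of an unaugmented walk by summing over memory sequences exactly as in the formula above.

```python
from collections import defaultdict

# sigma[(v, m)] is a dict mapping augmented vertices (v', m') to probabilities.

def is_unambiguous(sigma):
    for _, dist in sigma.items():
        successors = defaultdict(set)
        for (v2, m2), p in dist.items():
            if p > 0:
                successors[v2].add(m2)
        if any(len(mems) > 1 for mems in successors.values()):
            return False
    return True

def walk_probability_regular(sigma, v_hat, path):
    """P^{sigma,v_hat}(walk(path)): a forward pass summing over memory sequences."""
    if path[0] != v_hat[0]:
        return 0.0
    weights = {v_hat: 1.0}             # probability mass of each augmented prefix end
    for v_next in path[1:]:
        nxt = defaultdict(float)
        for u_hat, w in weights.items():
            for (v2, m2), p in sigma.get(u_hat, {}).items():
                if v2 == v_next:
                    nxt[(v2, m2)] += w * p
        if not nxt:
            return 0.0
        weights = nxt
    return sum(weights.values())
```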

An important question is whether regular strategies can achieve the same limit protection value as general strategies. The answer is positive, and it is proven in two steps. First, we show that there exists an optimal Defender’s strategy γ\gamma satisfying Val(γ)=Val\operatorname{Val}(\gamma)=\operatorname{Val} (see Section 2.3). Then, for arbitrarily small ε>0\varepsilon>0, we prove the existence of a regular strategy σ\sigma such that Val(σ)Val(γ)+ε\operatorname{Val}(\sigma)\leq\operatorname{Val}(\gamma)+\varepsilon.

Theorem 3.1.

For every patrolling graph, there exists a Defender’s strategy γ\gamma such that Val(γ)=Val\operatorname{Val}(\gamma)=\operatorname{Val}.

Proof.

By the definition of Val\operatorname{Val}, there exist a vertex vv and an infinite sequence Γ=γ1,γ2,\Gamma=\gamma_{1},\gamma_{2},\ldots of Defender’s strategies such that Val(γi)=supπ𝔼γi,v[𝒟π]\operatorname{Val}(\gamma_{i})=\sup_{\pi}\mathbb{E}^{\gamma_{i},v}[\mathcal{D}^{\pi}] for all i1i\geq 1, and the infinite sequence Val(γ1),Val(γ2),\operatorname{Val}(\gamma_{1}),\operatorname{Val}(\gamma_{2}),\ldots converges to Val\operatorname{Val}.

Let Histories=h1,h2,\textit{Histories}=h_{1},h_{2},\ldots be a sequence where every history occurs exactly once (without any special ordering). Let Γ0=Γ\Gamma_{0}=\Gamma. For every i1i\geq 1, we inductively define an infinite sequence of strategies Γi\Gamma_{i} and a probability distribution γ(hi)\gamma(h_{i}) over VV, assuming that Γi1\Gamma_{i-1} has already been defined. Let tt be the last vertex of hih_{i}, and let {u1,,uk}\{u_{1},\ldots,u_{k}\} be the set of all immediate successors of tt in GG (i.e., tujt\to u_{j} for all jkj\leq k). Since every bounded infinite sequence of real numbers contains an infinite convergent subsequence, there exists an infinite subsequence Γi=ϱ1,ϱ2,\Gamma_{i}=\varrho_{1},\varrho_{2},\ldots of Γi1\Gamma_{i-1} such that the sequence

\varrho_{1}(h_{i})(u_{j}),\ \varrho_{2}(h_{i})(u_{j}),\ \ldots

is convergent for every jkj\leq k. We put

\gamma(h_{i})(u_{j})\quad=\quad\lim_{\ell\to\infty}\ \varrho_{\ell}(h_{i})(u_{j})\,.

It is easy to check that j=1kγ(hi)(uj)=1\sum_{j=1}^{k}\gamma(h_{i})(u_{j})=1, i.e., γ(hi)\gamma(h_{i}) is indeed a distribution on VV. Hence, the function γ\gamma is a Defender’s strategy, and we show that Val(γ)=Val\operatorname{Val}(\gamma)=\operatorname{Val}.

For the sake of contradiction, suppose sup_π 𝔼^{γ,v}[𝒟^π] − Val = δ > 0. Let ε = δ/4. Then, there exists an Attacker’s strategy π* such that

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad\geq\quad\operatorname{Val}+\delta-\varepsilon\qquad(3)

Let 𝐴𝑡𝑡(Ω)Ω\mathit{Att}(\Omega)\subseteq\Omega be the set of all observations oo such that π(o)=𝑎𝑡𝑡𝑎𝑐𝑘τ\pi^{*}(o)=\mathit{attack}_{\tau} for some τT\tau\in T. For every o=v1,,vn,vnvn+1𝐴𝑡𝑡(Ω)o=v_{1},\ldots,v_{n},v_{n}{\to}v_{n+1}\in\mathit{Att}(\Omega), let 𝑤𝑎𝑙𝑘(o)\mathit{walk}(o) be the set of all walks starting with v1,,vn+1v_{1},\ldots,v_{n+1}. Observe that if o,o𝐴𝑡𝑡(Ω)o,o^{\prime}\in\mathit{Att}(\Omega) and ooo\neq o^{\prime}, then 𝑤𝑎𝑙𝑘(o)𝑤𝑎𝑙𝑘(o)=\mathit{walk}(o)\cap\mathit{walk}(o^{\prime})=\emptyset. Furthermore, for every w𝒲o𝐴𝑡𝑡(Ω)𝑤𝑎𝑙𝑘(o)w\in\mathcal{W}\smallsetminus\bigcup_{o\in\mathit{Att}(\Omega)}\mathit{walk}(o), we have that 𝒟π(w)=0\mathcal{D}^{\pi^{*}}(w)=0. Hence, we obtain

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad=\quad\sum_{o\in\mathit{Att}(\Omega)}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]\qquad(4)

where 𝔼γ,v[𝒟π𝑤𝑎𝑙𝑘(o)]\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)] is the conditional expected value of 𝒟π\mathcal{D}^{\pi^{*}} under the condition that a walk starts with oo. (Note that the conditional expectation is undefined when γ,v(𝑤𝑎𝑙𝑘(o))=0\mathbb{P}^{\gamma,v}(\mathit{walk}(o))=0. In that case, “0undefined0\cdot\textrm{undefined}” is interpreted as 0.) Hence, there exists a finite set of observations O𝐴𝑡𝑡(Ω)O\subseteq\mathit{Att}(\Omega) such that

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad\leq\quad\varepsilon+\sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]\qquad(5)

For each o = v_1, …, v_n, v_n → v_{n+1} ∈ O, let τ_o be the target attacked by π* after observing o, and let ℋ(o) be the set of all histories h initiated in v_{n+1} such that τ_o is the last vertex of h and τ_o occurs exactly once in h. We use o ⊙ h to denote the history v_1, …, v_n, h. We have that 𝔼^{γ,v}[𝒟^{π*} ∣ walk(o)] is equal to

\sum_{h\in\mathcal{H}(o)}\mathbb{P}^{\gamma,v}\big(\mathit{walk}(o\odot h)\mid\mathit{walk}(o)\big)\cdot\big(\mathit{tm}(h)+\mathit{tm}(v_{n},v_{n+1})\big)\cdot\alpha(\tau_{o})\qquad(6)

where γ,v(𝑤𝑎𝑙𝑘(oh)𝑤𝑎𝑙𝑘(o))\mathbb{P}^{\gamma,v}\big{(}\mathit{walk}(o\odot h)\mid\mathit{walk}(o)\big{)} is the conditional probability of performing a walk starting with oho\odot h under the condition that a walk starting with oo is performed. Clearly, there exists a finite H(o)(o)H(o)\subseteq\mathcal{H}(o) such that the sum (6) decreases at most by ε\varepsilon when hh ranges over H(o)H(o) instead of (o)\mathcal{H}(o). For short, we use Eγ,v(o)E^{\gamma,v}(o) to denote the sum

\sum_{h\in H(o)}\mathbb{P}^{\gamma,v}\big(\mathit{walk}(o\odot h)\mid\mathit{walk}(o)\big)\cdot\big(\mathit{tm}(h)+\mathit{tm}(v_{n},v_{n+1})\big)\cdot\alpha(\tau_{o})\qquad(7)

Hence,

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]\quad\leq\quad\varepsilon+E^{\gamma,v}(o)\qquad(8)

By combining (5) and (8), we obtain

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad\leq\quad 2\varepsilon+\sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot E^{\gamma,v}(o)\qquad(9)

Since ε=δ/4\varepsilon=\delta/4, this implies

\sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot E^{\gamma,v}(o)\quad\geq\quad\operatorname{Val}+\varepsilon\qquad(10)

Let H=oOH(o)H=\bigcup_{o\in O}H(o). Since HH is finite, there exists an index \ell such that all elements of HH appear among the first \ell elements of Histories. Consider the sequence Γ=ϱ1,ϱ2,\Gamma_{\ell}=\varrho_{1},\varrho_{2},\ldots and observe that, for all i1i\geq 1,

\operatorname{Val}(\varrho_{i})\ =\ \sup_{\pi}\mathbb{E}^{\varrho_{i},v}[\mathcal{D}^{\pi}]\ \geq\ \mathbb{E}^{\varrho_{i},v}[\mathcal{D}^{\pi^{*}}]\ \geq\ \sum_{o\in O}\mathbb{P}^{\varrho_{i},v}(\mathit{walk}(o))\cdot E^{\varrho_{i},v}(o)\qquad(11)

Since the sequence of distributions ϱ1(h),ϱ2(h),\varrho_{1}(h),\varrho_{2}(h),\ldots converges to γ(h)\gamma(h) for every hHh\in H, we also obtain that

\lim_{i\to\infty}\sum_{o\in O}\mathbb{P}^{\varrho_{i},v}(\mathit{walk}(o))\cdot E^{\varrho_{i},v}(o)\ =\ \sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot E^{\gamma,v}(o)\qquad(12)

By combining (10), (11), and (12), we obtain Val(ϱ_j) ≥ Val + ε/2 for all sufficiently large j. This means that the sequence Val(ϱ_1), Val(ϱ_2), … does not converge to Val, and we have a contradiction. ∎

Theorem 3.2.

Let GG be a patrolling graph, and let 𝑅𝑒𝑔\mathit{Reg} be the class of all regular strategies for GG. Then infσ𝑅𝑒𝑔Val(σ)=Val\inf_{\sigma\in\mathit{Reg}}\operatorname{Val}(\sigma)=\operatorname{Val}.

Proof.

By Theorem 3.1, there exists a Defender’s strategy γ such that Val(γ) = Val. We show that for every ε > 0, there exist sufficiently large d, ℓ ∈ ℕ such that Val(σ_{d,ℓ}) ≤ Val(γ) + ε, where σ_{d,ℓ} is a regular strategy obtained by d-discretization and ℓ-folding of γ.

A d-discretization of γ is a strategy γ_d where for all h ∈ ℋ and v ∈ V, the following conditions are satisfied:

  • γ_d(h)(v) = k/d for some k ∈ {0, …, d};

  • γ_d(h)(v) = 0 iff γ(h)(v) = 0;

  • |γ_d(h)(v) − γ(h)(v)| ≤ |V|/d.

Observe that a d-discretization of γ exists for every d ≥ |V|.
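One way to see that a d-discretization exists is to construct it greedily: round every positive probability to a multiple of 1/d while keeping it positive, and then repair the total mass. The sketch below is one such construction (our own); the proof only needs the existence of some d-discretization.

```python
import math

def discretize(dist, d):
    """A d-discretization of a probability distribution `dist` (a dict v -> prob):
    every output probability is a positive multiple of 1/d on the original support,
    the probabilities sum to 1, and each entry moves by at most |V|/d."""
    support = [v for v, p in dist.items() if p > 0]
    assert d >= len(support)
    k = {v: max(1, math.floor(d * dist[v])) for v in support}
    # Repair the total so that the numerators sum to exactly d.
    while sum(k.values()) > d:
        v = max((u for u in support if k[u] >= 2), key=lambda u: k[u])
        k[v] -= 1
    while sum(k.values()) < d:
        v = max(support, key=lambda u: d * dist[u] - k[u])   # largest remainder
        k[v] += 1
    return {v: k[v] / d for v in support}
```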

The regular strategy σ_{d,ℓ} is obtained by ℓ-folding the strategy γ_d constructed for a sufficiently large d. We can view γ_d as an infinite tree T where the nodes are histories and h →^x hu iff γ_d(h)(u) = x. Furthermore, we label each node h of T with the last vertex of h. Since the edge probabilities range over finitely many values, the tree T contains only finitely many subtrees of height ℓ up to isomorphism preserving both node and edge labels. For a given history h = v_0, …, v_n, let f_h, s_h ∈ ℕ be the lexicographically smallest pair of indices such that f_h < s_h, both f_h and s_h are integer multiples of ℓ, and the subtrees of height ℓ rooted at the prefixes v_0, …, v_{f_h} and v_0, …, v_{s_h} are isomorphic. If no such f_h, s_h exist, we say that h does not contain a folding pair.

Note that there exists a constant cc_{\ell} independent of hh such that every history hh of length at least cc_{\ell} contains a folding pair fh,shf_{h},s_{h} with both components bounded by cc_{\ell}. We define a strategy γd,\gamma_{d,\ell} as follows:

  • γ_{d,ℓ}(h) = γ_d(h) for every history h without a folding pair.

  • γ_{d,ℓ}(h) = γ_{d,ℓ}(h′) for every history h with a folding pair f_h, s_h, where h′ is obtained from h = v_0, …, v_n by deleting the subpath v_{f_h}, …, v_{s_h−1}.

Note that γd,\gamma_{d,\ell} can be equivalently represented as a finite-memory strategy σd,\sigma_{d,\ell} where the memory elements correspond to the (finitely many) histories without a folding pair.

It remains to show that for every ε>0\varepsilon>0 there are sufficiently large d,d,\ell such that Val(γd,)Val+ε\operatorname{Val}(\gamma_{d,\ell})\leq\operatorname{Val}+\varepsilon.

We start by observing important structural properties of the optimal strategy γ\gamma. Let vVv^{*}\in V such that supπ𝔼γ,v[𝒟π]=Val\sup_{\pi}\mathbb{E}^{\gamma,v^{*}}[\mathcal{D}^{\pi}]=\operatorname{Val}. A history hh is γ\gamma-eligible if γ,v(𝑤𝑎𝑙𝑘(h))>0\mathbb{P}^{\gamma,v^{*}}(\mathit{walk}(h))>0. For every γ\gamma-eligible history hh, let γ[h]\gamma[h] be a strategy such that γ[h](h)=γ(hh)\gamma[h](h^{\prime})=\gamma(h\odot h^{\prime}) for every hh^{\prime}\in\mathcal{H} initiated in the last vertex uu of hh (for the other hh^{\prime}, the strategy γ[h]\gamma[h] is defined arbitrarily). We show that

\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]\quad=\quad\operatorname{Val}\qquad(13)

Clearly,

\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]\quad\geq\quad\operatorname{Val}\qquad(14)

for every γ\gamma-eligible hh because otherwise we have a contradiction with the definition of Val\operatorname{Val}. Now suppose supπ𝔼γ[h],u[𝒟π]>Val\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]>\operatorname{Val} for some γ\gamma-eligible h=v1,,vnh=v_{1},\ldots,v_{n}. We show that then there is an Attacker’s strategy π\pi such that 𝔼γ,v[𝒟π]>Val\mathbb{E}^{\gamma,v^{*}}[\mathcal{D}^{\pi}]>\operatorname{Val}, which contradicts the optimality of γ\gamma. Let HH be the set of all γ\gamma-eligible histories of the form v1,,vk,tv_{1},\ldots,v_{k},t where k<nk<n and tvk+1t\neq v_{k+1}. Furthermore, let κ=supπ𝔼γ[h],u[𝒟π]Val\kappa=\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]-\operatorname{Val}, and let δ>0\delta>0 be a constant satisfying

\delta\cdot\big(1-\mathbb{P}^{\gamma,v^{*}}(\mathit{walk}(h))\big)\quad\leq\quad\frac{\mathbb{P}^{\gamma,v^{*}}(\mathit{walk}(h))\cdot\kappa}{3}

For every h′ ∈ H, let π_{h′} be an Attacker’s strategy such that 𝔼^{γ[h′],u′}[𝒟^{π_{h′}}] ≥ Val − δ, where u′ is the last vertex of h′, and we also fix an Attacker’s strategy π_h such that 𝔼^{γ[h],v_n}[𝒟^{π_h}] ≥ Val + (2/3)κ. The strategy π is defined as follows. For every ℏ ∈ H ∪ {h} and every observation of the form h″, u → t such that h″t is γ[ℏ]-eligible and π_ℏ(h″, u → t) = attack_τ for some τ ∈ T, we put π(ℏ ⊙ h″, u → t) = attack_τ. Splitting 𝔼^{γ,v*}[𝒟^π] according to whether a walk extends h or some h′ ∈ H, and using the choice of δ, we obtain 𝔼^{γ,v*}[𝒟^π] ≥ Val + ℙ^{γ,v*}(walk(h)) · κ/3 > Val, which is a contradiction.

For every target τ, let π[τ] be the Attacker’s strategy where π[τ](u, u → v) = attack_τ for all u, v ∈ V, i.e., π[τ] attacks τ right after the Defender starts its walk. An immediate consequence of (13) is that for every γ-eligible history h ending in a vertex u and every target τ, we have that

\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}]\quad\leq\quad\operatorname{Val}\qquad(15)

Let RτR_{\tau} be a function assigning to every walk w=v0,v1,w=v_{0},v_{1},\ldots the least nn such that vn=τv_{n}=\tau. If there is no such nn, we put Rτ(w)=R_{\tau}(w)=\infty. Since Rτ𝒟π[τ]R_{\tau}\leq\mathcal{D}^{\pi[\tau]}, we have that

\mathbb{P}^{\gamma[h],u}(R_{\tau}\geq 2\operatorname{Val})\quad\leq\quad\frac{1}{2}\qquad(16)

by (15) and Markov inequality.

Note that (16) holds for all γ\gamma-eligible hh and τ\tau. Hence, we have that

\mathbb{P}^{\gamma[h],u}(R_{\tau}\geq i\cdot 2\operatorname{Val})\quad\leq\quad\frac{1}{2^{i}}\qquad(17)

because Rτi2ValR_{\tau}\geq i\cdot 2\operatorname{Val} requires success of ii consecutive independent experiments where each experiment succeeds with probability bounded by 1/21/2.

Now we show that for every ε>0\varepsilon>0, there exist sufficiently large d,d,\ell such that Val(γd,)Val(γ)+ε\operatorname{Val}(\gamma_{d,\ell})\leq\operatorname{Val}(\gamma)+\varepsilon, which proves our theorem.

For the rest of this proof, we fix ε>0\varepsilon>0. Furthermore, we fix kk\in\mathbb{N} satisfying

\sum_{i=2}^{\infty}\frac{1}{2^{k(i-1)}}\cdot i\cdot k\cdot 2\operatorname{Val}\cdot\mathit{tm}_{\max}\cdot\alpha_{\max}\quad\leq\quad\varepsilon/4\qquad(18)

Here 𝑡𝑚max=max{𝑡𝑚(e)eE}\mathit{tm}_{\max}=\max\{\mathit{tm}(e)\mid e\in E\} and αmax=max{α(τ)τT}\alpha_{\max}=\max\{\alpha(\tau)\mid\tau\in T\}. Note that such a kk exists because the above sum converges to 0 as kk\to\infty. For every ii\in\mathbb{N}, let i\mathcal{I}_{i} be the set of all integers jj satisfying

(i-1)\cdot k\cdot 2\operatorname{Val}\quad\leq\quad j\quad<\quad i\cdot k\cdot 2\operatorname{Val}\,.

We have that

\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}]=\sum_{i=1}^{\infty}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]\qquad(19)

Observe

\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\ \leq\ i\cdot k\cdot 2\operatorname{Val}\cdot\mathit{tm}_{\max}\cdot\alpha_{\max}\qquad(20)
\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]\ \leq\ \frac{1}{2^{k(i-1)}}\qquad(21)

Inequality (20) is trivial, and (21) follows from (17). Using (18), we obtain

\sum_{i=2}^{\infty}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]\ \leq\ \sum_{i=2}^{\infty}i\cdot k\cdot 2\operatorname{Val}\cdot\mathit{tm}_{\max}\cdot\alpha_{\max}\cdot\frac{1}{2^{k(i-1)}}\ \leq\ \varepsilon/4\qquad(22)

Consider a strategy γd\gamma_{d} where d|V|d\geq|V|. Then every γ\gamma-eligible history is γd\gamma_{d}-eligible, and vice-versa. Furthermore, for every δ>0\delta>0, there exists a sufficiently large dd such that, for all hh, τ\tau, and ii,

\mathbb{E}^{\gamma_{d}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\ \leq\ \mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]+\delta
\mathbb{P}^{\gamma_{d}[h],u}[R_{\tau}\in\mathcal{I}_{i}]\ \leq\ (1/2+\delta)^{k\cdot(i-1)}

Consequently, we can fix a sufficiently large dd such that the values of

\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{1}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{1}],\qquad\sum_{i=2}^{\infty}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]

increase at most by ε/4\varepsilon/4 when γ\gamma is replaced with γd\gamma_{d}. Then,

\mathbb{E}^{\gamma_{d}[h],u}[\mathcal{D}^{\pi[\tau]}]\cdot\mathbb{P}^{\gamma_{d}[h],u}[R_{\tau}\in\mathcal{I}_{1}]\ \leq\ \operatorname{Val}+\frac{\varepsilon}{4}
\sum_{i=2}^{\infty}\mathbb{E}^{\gamma_{d}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma_{d}[h],u}[R_{\tau}\in\mathcal{I}_{i}]\ \leq\ \frac{\varepsilon}{2}\qquad(23)

Now consider a strategy γd,\gamma_{d,\ell} where =k2Val\ell=k\cdot 2\operatorname{Val}, and let hh be a γd,\gamma_{d,\ell}-eligible history. Then 𝔼γd,[h],u[𝒟π[τ]]\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}] is equal to

\sum_{i=1}^{\infty}\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma_{d,\ell}[h],u}[R_{\tau}\in\mathcal{I}_{i}]

By definition of \ell-folding, for every i1i\geq 1 there is a γd\gamma_{d}-eligible history hh^{\prime} without a folding pair such that

\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\ =\ \mathbb{E}^{\gamma_{d}[h^{\prime}],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{1}]\qquad(24)

Furthermore,

\mathbb{P}^{\gamma_{d,\ell}[h],u}[R_{\tau}\in\mathcal{I}_{i}]\ \leq\ (1-\xi)^{k(i-1)}\qquad(25)

where ξ is the maximal ℙ^{γ_d[h′],u}[R_τ ∈ ℐ_1] such that h′ is a γ_d-eligible history without a folding pair.

Due to (24) and (25), the bounds of (23) remain valid when γd\gamma_{d} is replaced with γd,\gamma_{d,\ell} and hh is a γd,\gamma_{d,\ell}-eligible history. Hence,

\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}]\ \leq\ \operatorname{Val}+\varepsilon\qquad(26)

Since (26) holds for every γd,\gamma_{d,\ell}-eligible hh and every target τ\tau, we obtain Val(γd,)Val+ε\operatorname{Val}(\gamma_{d,\ell})\leq\operatorname{Val}+\varepsilon. ∎

4. Strategy Synthesis Algorithm

In this section, we design an efficient algorithm synthesizing a regular Defender’s strategy and its protection value Val(σ) for a given patrolling graph.

4.1. Computing Val(σ)\operatorname{Val}(\sigma)

First, we show how to compute Val(σ)\operatorname{Val}(\sigma) for a given regular strategy σ\sigma. Let GG be a patrolling graph and σ\sigma a regular strategy for GG. Let E^\smash{\widehat{E}} be the set of all (u^,v^)V^×V^(\smash{\widehat{u}},\smash{\widehat{v}})\in\smash{\widehat{V}}\times\smash{\widehat{V}} such that σ(u^)(v^)>0\sigma(\smash{\widehat{u}})(\smash{\widehat{v}})>0, i.e., E^\smash{\widehat{E}} is the set of augmented edges used by σ\sigma. For every target τ\tau, let π[τ]\pi[\tau] be the Attacker strategy where for all u,vVu,v\in V we have that π[τ](u,uv)=𝑎𝑡𝑡𝑎𝑐𝑘τ\pi[\tau](u,u\to v)=\mathit{attack}_{\tau} , i.e., π[τ]\pi[\tau] attacks τ\tau immediately after the Defender starts its walk.

For every e^=(u^,v^)E^\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}})\in\smash{\widehat{E}} and τT\tau\in T, let τ,e^\mathcal{L}_{\tau,\smash{\widehat{e}}} be the expected damage caused by an attack at τ\tau scheduled right after the Defender starts traversing e^\smash{\widehat{e}}, i.e.,

\mathcal{L}_{\tau,\widehat{e}}\quad=\quad\mathbb{E}^{\sigma,\widehat{u}}\big[\mathcal{D}^{\pi[\tau]}\mid\mathit{walk}(\widehat{e})\big]\,.\qquad(27)

Hence, τ,e^\mathcal{L}_{\tau,\smash{\widehat{e}}} is the conditional expected value of 𝒟π[τ]\mathcal{D}^{\pi[\tau]} under the condition that the Defender’s walk starts by traversing e^\smash{\widehat{e}}.

Consider the directed graph G^=(V^,E^)\smash{\widehat{G}}=(\smash{\widehat{V}},\smash{\widehat{E}}), and let \mathcal{B} denote the set of all bottom strongly connected components of G^\smash{\widehat{G}}. Let

\mathcal{L}(\sigma)\quad=\quad\min_{B\in\mathcal{B}}\ \max_{\tau\in T}\ \max_{\widehat{e}\in E(B)}\ \mathcal{L}_{\tau,\widehat{e}}\qquad(28)

where E(B)=E^(B×B)E(B)=\smash{\widehat{E}}\cap(B\times B) is the set of augmented edges in the component BB used by σ\sigma. We have the following:

Theorem 4.1.

Let σ\sigma be a regular strategy for a patrolling graph GG. Then Val(σ)(σ)\operatorname{Val}(\sigma)\leq\mathcal{L}(\sigma). If σ\sigma is unambiguous, then Val(σ)=(σ)\operatorname{Val}(\sigma)=\mathcal{L}(\sigma).

Proof.

For purposes of this proof, we need to introduce several notions. An augmented walk is an infinite sequence v^1,v^2,\smash{\widehat{v}}_{1},\smash{\widehat{v}}_{2},\dots such that vivi+1v_{i}\to v_{i+1} for all ii. The set of all augmented walks is denoted 𝒲^\smash{\widehat{\mathcal{W}}}. An augmented history is a non-empty finite prefix of an augmented walk. The set of all augmented histories is denoted by ^\smash{\widehat{\mathcal{H}}}, and for h^^\smash{\widehat{h}}\in\smash{\widehat{\mathcal{H}}}, 𝑤𝑎𝑙𝑘(h^)\mathit{walk}(\smash{\widehat{h}}) denotes the set of all augmented walks starting with h^\smash{\widehat{h}}.

Furthermore, for every regular strategy σ\sigma and every initial v^V^\smash{\widehat{v}}\in\smash{\widehat{V}}, we define the probability space (𝒲^,^,^σ,v^)(\smash{\widehat{\mathcal{W}}},\smash{\widehat{\mathcal{F}}},\smash{\widehat{\mathbb{P}}}^{\sigma,\smash{\widehat{v}}}) over the augmented walks in the expected way.

For an arbitrary walk w=v1,v2,w=v_{1},v_{2},\dots, let Aug(w)Aug(w) denote the set of all augmented walks of the form v^1,v^2,\smash{\widehat{v}}_{1},\smash{\widehat{v}}_{2},\dots. Observe that for every measurable event FF\in\mathcal{F}, writing F^=wFAug(w)\smash{\widehat{F}}=\bigcup_{w\in F}Aug(w), we have F^^\smash{\widehat{F}}\in\smash{\widehat{\mathcal{F}}} and ^σ,v^(F^)=σ,v^(F)\smash{\widehat{\mathbb{P}}}^{\sigma,\smash{\widehat{v}}}(\smash{\widehat{F}})=\mathbb{P}^{\sigma,\smash{\widehat{v}}}(F). Hence, to simplify our notation, we write \mathbb{P} instead of ^\smash{\widehat{\mathbb{P}}}. The extension of the random variable 𝒟π\mathcal{D}^{\pi} to augmented walks is straightforward.

Let σ\sigma be a regular Defender’s strategy. We prove Val(σ)(σ)\operatorname{Val}(\sigma)\leq\mathcal{L}(\sigma). Recall

\operatorname{Val}(\sigma)\ =\ \min_{\widehat{v}\in\widehat{V}}\ \sup_{\pi}\ \mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}]
\mathcal{L}(\sigma)\ =\ \min_{B\in\mathcal{B}}\ \max_{\tau\in T}\ \max_{\widehat{e}\in E(B)}\ \mathcal{L}_{\tau,\widehat{e}}

Let BB^{*}\in\mathcal{B} be a bottom strongly connected component achieving the above minimum. We show that

\min_{\widehat{v}\in\widehat{V}}\ \sup_{\pi}\ \mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}]\ \leq\ \max_{\tau\in T}\ \max_{\widehat{e}\in E(B^{*})}\ \mathcal{L}_{\tau,\widehat{e}}.

Denoting the right-hand side by MM, we show that there is v^V^\smash{\widehat{v}}\in\smash{\widehat{V}} such that supπ𝔼σ,v^[𝒟π]M\sup_{\pi}\ \mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\leq M. Choose an arbitrary v^B\smash{\widehat{v}}\in B^{*} and assume, for the sake of contradiction, that supπ𝔼σ,v^[𝒟π]>M\sup_{\pi}\ \mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]>M. Hence, there is an Attacker’s strategy π\pi such that 𝔼σ,v^[𝒟π]>M\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]>M.

Now we decompose the expectation 𝔼σ,v^[𝒟π]\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}] according to augmented observations, i.e., sequences o^=v^1,,v^n,v^nv^n+1\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\rightarrow}\smash{\widehat{v}}_{n+1}, where v1,,vn+1v_{1},\ldots,v_{n+1} is a path in GG. Given o^=v^1,,v^n,v^nv^n+1\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\rightarrow}\smash{\widehat{v}}_{n+1}, we use oo to denote the “unaugmented” observation v1,,vn,vnvn+1v_{1},\ldots,v_{n},v_{n}{\rightarrow}v_{n+1}. Let Ω^\smash{\widehat{\Omega}} be the set of all augmented observation, and let 𝐴𝑡𝑡(Ω^)Ω^\mathit{Att}(\smash{\widehat{\Omega}}){\subseteq}\smash{\widehat{\Omega}} be the set of all o^\smash{\widehat{o}} where π(o)𝑤𝑎𝑖𝑡\pi(o)\neq\mathit{wait}. For every o^=v^1,,v^n,v^nv^n+1\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\to}\smash{\widehat{v}}_{n+1} in 𝐴𝑡𝑡(Ω^)\mathit{Att}(\smash{\widehat{\Omega}}), let 𝑤𝑎𝑙𝑘(o^)\mathit{walk}(\smash{\widehat{o}}) be the set of all augmented walks starting with v^1,,v^n+1\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n+1}. Note that if o^,o^𝐴𝑡𝑡(Ω^)\smash{\widehat{o}},\smash{\widehat{o}}^{\prime}\in\mathit{Att}(\smash{\widehat{\Omega}}) and o^o^\smash{\widehat{o}}\neq\smash{\widehat{o}}^{\prime}, then 𝑤𝑎𝑙𝑘(o^)𝑤𝑎𝑙𝑘(o^)=\mathit{walk}(\smash{\widehat{o}})\cap\mathit{walk}(\smash{\widehat{o}}^{\prime})=\emptyset. Furthermore, for every w^𝒲^o^𝐴𝑡𝑡(Ω^)𝑤𝑎𝑙𝑘(o^)\smash{\widehat{w}}\in\smash{\widehat{\mathcal{W}}}\setminus\bigcup_{\smash{\widehat{o}}\in\mathit{Att}(\smash{\widehat{\Omega}})}\mathit{walk}(\smash{\widehat{o}}), we have that 𝒟π(w^)=0\mathcal{D}^{\pi}(\smash{\widehat{w}})=0. Hence, we obtain

\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}]=\sum_{\widehat{o}\in\mathit{Att}(\widehat{\Omega})}\mathbb{P}^{\sigma,\widehat{v}}(\mathit{walk}(\widehat{o}))\cdot\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}\mid\mathit{walk}(\widehat{o})].

Since

\sum_{\widehat{o}\in\mathit{Att}(\widehat{\Omega})}\mathbb{P}^{\sigma,\widehat{v}}(\mathit{walk}(\widehat{o}))\leq 1,

there is o^=v^1,,v^n,v^nv^n+1𝐴𝑡𝑡(Ω^)\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\to}\smash{\widehat{v}}_{n+1}\in\mathit{Att}(\smash{\widehat{\Omega}}) such that

\mathbb{P}^{\sigma,\widehat{v}}(\mathit{walk}(\widehat{o}))>0\quad\text{and}\quad\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}\mid\mathit{walk}(\widehat{o})]>M.

Let e^=(v^n,v^n+1)\smash{\widehat{e}}=(\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n+1}), and let τT\tau\in T be the attacked target (i.e., π(o)=𝑎𝑡𝑡𝑎𝑐𝑘τ\pi(o)=\mathit{attack}_{\tau}). Since v^=v^1B\smash{\widehat{v}}=\smash{\widehat{v}}_{1}\in B^{*} and σ,v^(𝑤𝑎𝑙𝑘(o^))>0\mathbb{P}^{\sigma,\smash{\widehat{v}}}(\mathit{walk}(\smash{\widehat{o}}))>0, it holds that e^E(B)\smash{\widehat{e}}\in E(B^{*}). Since σ\sigma is regular, its (randomized) behavior after e^\smash{\widehat{e}} is always the same, regardless of the traversed history. Therefore,

\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}\mid\mathit{walk}(\widehat{o})]=\mathbb{E}^{\sigma,\widehat{v}_{n}}[\mathcal{D}^{\pi[\tau]}\mid\mathit{walk}(\widehat{e})]=\mathcal{L}_{\tau,\widehat{e}}\,.

Hence, τ,e^>M\mathcal{L}_{\tau,\smash{\widehat{e}}}>M, which contradicts the definition of MM.

Now assume that σ\sigma is unambiguous. We prove that Val(σ)(σ)\operatorname{Val}(\sigma)\geq\mathcal{L}(\sigma). So, assume that v^V^\smash{\widehat{v}}\in\smash{\widehat{V}} achieves the minimum in the definition of Val(σ)\operatorname{Val}(\sigma). We construct an Attacker’s strategy π\pi such that 𝔼σ,v^[𝒟π](σ)\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\geq\mathcal{L}(\sigma). For this purpose, let φ:T\varphi\colon\mathcal{B}\rightarrow T and ψ:E^\psi\colon\mathcal{B}\rightarrow\smash{\widehat{E}} be such that for every BB\in\mathcal{B}, ψ(B)E(B)\psi(B)\in E(B) and the choice τ=φ(B),e^=ψ(B)\tau=\varphi(B),\smash{\widehat{e}}=\psi(B) achieves the maximal τ,e^\mathcal{L}_{\tau,\smash{\widehat{e}}} (see the definition of (σ)\mathcal{L}(\sigma)). In particular, for every BB\in\mathcal{B}, we have

\mathcal{L}_{\varphi(B),\psi(B)}\geq\mathcal{L}(\sigma).\qquad(29)

Note that the condition ψ(B)E(B)\psi(B)\in E(B) for every BB\in\mathcal{B} implies that ψ\psi is injective. Moreover, let \mathcal{E} denote the range of ψ\psi (thus, ψ:\psi\colon\mathcal{B}\rightarrow\mathcal{E} is a bijection) and for every e^\smash{\widehat{e}}\in\mathcal{E}, let τ(e^)\tau(\smash{\widehat{e}}) denote the target φ(ψ1(e^))\varphi(\psi^{-1}(\smash{\widehat{e}})). Using this notation, (29) becomes

\mathcal{L}_{\tau(\widehat{e}),\widehat{e}}\geq\mathcal{L}(\sigma) (30)

for every e^\smash{\widehat{e}}\in\mathcal{E}.

For every observation oΩo\in\Omega s.t. σ,v^(walk(o))>0\mathbb{P}^{\sigma,\smash{\widehat{v}}}(walk(o))>0, there is exactly one augmented observation of the form o^\smash{\widehat{o}} such that σ,v^(walk(o^))>0\mathbb{P}^{\sigma,\smash{\widehat{v}}}(walk(\smash{\widehat{o}}))>0 (this follows by a trivial induction on the length of oo). In the rest of this proof, for every oΩo\in\Omega, the symbol o^\smash{\widehat{o}} denotes the unique augmented observation satisfying the above. Now, for each oΩo\in\Omega, we define π(o)\pi(o) as follows: Let o^=v^1,,v^n,v^nv^n+1\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\rightarrow}\smash{\widehat{v}}_{n+1}. If none of the augmented edges (v^i,v^i+1)(\smash{\widehat{v}}_{i},\smash{\widehat{v}}_{i+1}) for 1i<n1\leq i<n is in \mathcal{E} and the last augmented edge e^=(v^n,v^n+1)\smash{\widehat{e}}=(\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n+1}) does appear in \mathcal{E}, we put π(o)=𝑎𝑡𝑡𝑎𝑐𝑘τ(e^)\pi(o)=\mathit{attack}_{\tau(\smash{\widehat{e}})}. Otherwise, we put π(o)=𝑤𝑎𝑖𝑡\pi(o)=\mathit{wait}. For every e^\smash{\widehat{e}}\in\mathcal{E}, let 𝐴𝑡𝑡(e^)\mathit{Att}(\smash{\widehat{e}}) denote the union of 𝑤𝑎𝑙𝑘(o^)\mathit{walk}(\smash{\widehat{o}}) over all oΩo\in\Omega such that π(o)𝑤𝑎𝑖𝑡\pi(o)\neq\mathit{wait} and o^\smash{\widehat{o}} ends with e^\smash{\widehat{e}}. Since the Attacker may attack only once along any (augmented) walk, we have 𝐴𝑡𝑡(e^)𝐴𝑡𝑡(e^)=\mathit{Att}(\smash{\widehat{e}})\cap\mathit{Att}(\smash{\widehat{e}}^{\prime})=\emptyset for all e^,e^\smash{\widehat{e}},\smash{\widehat{e}}^{\prime}\in\mathcal{E} such that e^e^\smash{\widehat{e}}\neq\smash{\widehat{e}}^{\prime}. Since 𝒟π\mathcal{D}^{\pi} is non-negative, we obtain

\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}]\geq\sum_{\widehat{e}\in\mathcal{E}}\mathbb{P}^{\sigma,\widehat{v}}(\mathit{Att}(\widehat{e}))\cdot\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}\mid\mathit{Att}(\widehat{e})]. (31)

Since σ\sigma is regular and, for every e^\smash{\widehat{e}}\in\mathcal{E}, π\pi always attacks the same target τ(e^)\tau(\smash{\widehat{e}}) when the Defender starts traversing e^\smash{\widehat{e}} (regardless of the previous history), we have

\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}\mid\mathit{Att}(\widehat{e})]=\mathcal{L}_{\tau(\widehat{e}),\widehat{e}}\geq\mathcal{L}(\sigma),

where the inequality follows from (30). Substituting into (31) yields

\mathbb{E}^{\sigma,\widehat{v}}[\mathcal{D}^{\pi}]\geq\mathcal{L}(\sigma)\cdot\sum_{\widehat{e}\in\mathcal{E}}\mathbb{P}^{\sigma,\widehat{v}}(\mathit{Att}(\widehat{e})).

The desired inequality 𝔼σ,v^[𝒟π](σ)\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\geq\mathcal{L}(\sigma) follows from the fact that the above sum is equal to 11 (this follows by applying basic results of finite Markov chain theory; the Defender almost surely visits some bottom strongly connected component BB\in\mathcal{B} and there it almost surely traverses every edge infinitely often. In particular, the Defender almost surely visits the unique edge e^E(B)\smash{\widehat{e}}\in\mathcal{E}\cap E(B)). ∎

Our proof of Theorem 4.1 reveals that the Attacker can cause an expected damage equal to (σ)\mathcal{L}(\sigma) if it can observe the memory updates performed by the Defender. If σ\sigma is unambiguous, then the Attacker can determine the memory updates just by observing the history of the Defender’s moves. However, if the memory updates are randomized, the Attacker needs to access the Defender’s internal data structures during a patrolling walk. Depending on the setup, this may or may not be possible. By the worst-case paradigm of adversarial patrolling, (σ)\mathcal{L}(\sigma) is more appropriate than Val(σ)\operatorname{Val}(\sigma) for measuring the protection achieved by σ\sigma. As we shall see, our strategy synthesis algorithm typically outputs unambiguous regular strategies where (σ)=Val(σ)\mathcal{L}(\sigma)=\operatorname{Val}(\sigma). Hence, it does not really matter whether (σ)\mathcal{L}(\sigma) is understood as the protection achieved by σ\sigma or just a bound on this protection.

Fix BB\in\mathcal{B} and τT\tau\in T. If BB does not contain any augmented vertex of the form τ^\smash{\widehat{\tau}}, then τ,e^=\mathcal{L}_{\tau,\smash{\widehat{e}}}=\infty for all e^E(B)\smash{\widehat{e}}\in E(B). Otherwise, to every e^=(u^,v^)E(B)\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}})\in E(B), we associate a variable Xe^X_{\smash{\widehat{e}}}, and create a system of linear equations

X_{\widehat{e}}\;=\;\mathit{tm}(u,v)+\begin{cases}0&\mbox{if $v=\tau$,}\\ \sum_{\widehat{v}\to\widehat{w}}\sigma(\widehat{v})(\widehat{w})\cdot X_{(\widehat{v},\widehat{w})}&\mbox{otherwise}\end{cases} (32)

over e^E(B)\smash{\widehat{e}}\in E(B). By a straightforward generalization of (Norris, 1998, Theorem 1.3.5), system (32) has a unique solution, equal to (τ,e^)e^E(B)(\mathcal{L}_{\tau,\smash{\widehat{e}}})_{\smash{\widehat{e}}\in E(B)}.

Observe that for all augmented edges e^,g^E(B)\smash{\widehat{e}},\smash{\widehat{g}}\in E(B) leading to the same augmented vertex (say e^=(u^,v^)\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}}) and g^=(w^,v^)\smash{\widehat{g}}=(\smash{\widehat{w}},\smash{\widehat{v}})), we have τ,g^=τ,e^𝑡𝑚(u,v)+𝑡𝑚(w,v)\mathcal{L}_{\tau,\smash{\widehat{g}}}=\mathcal{L}_{\tau,\smash{\widehat{e}}}-\mathit{tm}(u,v)+\mathit{tm}(w,v). Therefore, we may reduce the number of variables as well as equations of the system from |E(B)||E(B)| to |B||B|. Indeed, to every v^B\smash{\widehat{v}}\in B, we assign a variable Yv^Y_{\smash{\widehat{v}}}, and construct a system of linear equations

Y_{\widehat{v}}\;=\;\begin{cases}0&\mbox{if $v=\tau$,}\\ \sum_{\widehat{v}\to\widehat{w}}\sigma(\widehat{v})(\widehat{w})\cdot(\mathit{tm}(v,w)+Y_{\widehat{w}})&\mbox{otherwise.}\end{cases} (33)

Then, for every e^=(u^,v^)E(B)\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}})\in E(B), we have τ,e^=𝑡𝑚(u,v)+yv^\mathcal{L}_{\tau,\smash{\widehat{e}}}=\mathit{tm}(u,v)+y_{\smash{\widehat{v}}}, where (yv^)v^B(y_{\smash{\widehat{v}}})_{\smash{\widehat{v}}\in B} is the unique solution of the system (33).
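
For concreteness, the following Python sketch shows one way to assemble and solve system (33) for a fixed bottom strongly connected component and a fixed target with an off-the-shelf linear solver. The data structures (states, succ, sigma, tm, target) are illustrative placeholders, not part of our implementation.

import numpy as np

def expected_detection_times(states, succ, sigma, tm, target):
    # Solve system (33): Y_v = 0 if the underlying vertex of v is the attacked
    # target, and Y_v = sum_w sigma[v][w] * (tm[v][w] + Y_w) otherwise.
    # states : augmented vertices of one bottom strongly connected component B
    # succ   : dict mapping each state to the list of its successors
    # sigma  : dict of dicts, sigma[v][w] = probability of moving from v to w
    # tm     : dict of dicts, tm[v][w] = traversal time of the underlying edge
    # target : predicate, True iff the underlying vertex of the state is tau
    idx = {v: i for i, v in enumerate(states)}
    A = np.eye(len(states))
    b = np.zeros(len(states))
    for v in states:
        if target(v):
            continue  # the equation Y_v = 0 is already encoded by A[i, i] = 1, b[i] = 0
        i = idx[v]
        for w in succ[v]:
            A[i, idx[w]] -= sigma[v][w]
            b[i] += sigma[v][w] * tm[v][w]
    return dict(zip(states, np.linalg.solve(A, b)))

The damage for an augmented edge of E(B) ending in an augmented vertex is then obtained by adding the traversal time of that edge to the value returned for the vertex.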

4.2. Optimization Scheme

Our strategy synthesis algorithm is based on interpreting \mathcal{L} as a piecewise differentiable function and applying methods of differentiable programming. We start from a random strategy σ\sigma, repeatedly compute (σ)\mathcal{L}(\sigma) and update the strategy against the direction of its gradient.

The optimization algorithm is described in Algo. 1. On the forward pass, strategies are produced from real-valued coefficients by a Softmax function that outputs probability distributions. For every target τ\tau, we solve the system (33) to obtain the damages (τ,e^)τT,e^E(B)(\mathcal{L}_{\tau,\smash{\widehat{e}}})_{\tau\in T,\smash{\widehat{e}}\in E(B)}. Then, instead of the hard maximum in equation (28), we optimize a loss function defined by

\text{loss}\;=\;\sum\nolimits_{\tau,\widehat{e}}\Phi_{\varepsilon}(\mathcal{L}_{\tau,\widehat{e}})^{2}, (34)

where Φε(t)=0\Phi_{\varepsilon}(t)=0 for t[0,mεm)t\in[0,m-\varepsilon m) and Φε(t)=1+(tm)/εm\Phi_{\varepsilon}(t)=1+(t-m)/\varepsilon m for t[mεm,m]t\in[m-\varepsilon m,m], in which m=¯m=\overline{\mathcal{L}} is the hard maximum, the bar denotes the stop-gradient operator, and ε(0,1)\varepsilon\in(0,1) is a hyperparameter. Minimizing the loss instead of \mathcal{L} leads to more efficient gradient propagation. On top of this, we encourage the model to prefer deterministic strategies over randomized ones by adding the average entropy of the strategy’s probability distributions, weighted by a factor β\beta.
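
To illustrate, a minimal Python (PyTorch) sketch of this loss could look as follows; the tensor names and the small constant added inside the logarithm are our illustrative choices.

import torch

def phi_eps(damages, eps):
    # Soft threshold from (34); the hard maximum m is detached, i.e., treated
    # as a constant by the stop-gradient operator (assumes m > 0).
    m = damages.detach().max()
    return torch.clamp(1.0 + (damages - m) / (eps * m), min=0.0)

def training_loss(damages, strategy, eps=0.3, beta=0.2):
    # damages : tensor of the values L_{tau,e} over all targets and edges of E(B)
    # strategy: tensor of outgoing probability distributions (rows sum to 1)
    loss = (phi_eps(damages, eps) ** 2).sum()                        # equation (34)
    entropy = -(strategy * torch.log(strategy + 1e-12)).sum(-1).mean()
    return loss + beta * entropy                                     # entropy term with factor beta

The detach() call plays the role of the stop-gradient operator: the hard maximum enters the loss only as a constant, so the gradient flows through the individual damages.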

On the backward pass, the loss gradient is computed using automatic differentiation; we add decaying Gaussian noise and update the coefficients using the Adam optimizer (Kingma and Ba, 2015).

As Softmax never produces probability distributions containing zeros, we cut off the outputs below a certain threshold (called the rounding threshold) so that the endpoint values 0 and 1 can appear when the strategy is evaluated. Note that edges with zero probability are excluded from E(B)E(B), which is crucial for equation (33).

Algorithm 1 Strategy optimization
coefficients \leftarrow Init()
for step \in steps do
  // Forward pass
  strategy \leftarrow Softmax(coefficients)
   // solving linear system (33)
  damage \leftarrow Solve(strategy)
  loss \leftarrow Loss(damage) ++ β\beta\cdotEntropy(strategy)
  // Backward pass
  coefficients.grad \leftarrow Gradient(loss)
   // automatic differentiation
  coefficients.grad += Noise(step)
   // Adam optimizer’s step
  coefficients += Step(coefficients.grad, step)
  // Strategy evaluation
  strategy \leftarrow Cutoff(Softmax(coefficients))
  damage \leftarrow Solve(strategy)
  \mathcal{L}\leftarrow Max(damage)
  Save \mathcal{L}, strategy
return strategy with the smallest \mathcal{L}
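
The following Python (PyTorch) sketch mirrors Algorithm 1. It is a skeleton under the assumption that solve_damages is a differentiable routine (not shown) that builds and solves system (33) for the given strategy, e.g., via torch.linalg.solve; it reuses training_loss from the sketch after (34). The default hyperparameters match the values reported in Section 5.1, while the particular noise-decay schedule is an illustrative choice.

import torch

def synthesize(init_coeffs, solve_damages, steps=100, lr=0.5, eps=0.3, beta=0.2,
               rounding=1e-3, noise=0.1):
    # init_coeffs: real-valued strategy coefficients (one row of logits per augmented vertex)
    coeffs = init_coeffs.clone().requires_grad_(True)
    opt = torch.optim.Adam([coeffs], lr=lr)
    best_val, best_strategy = float("inf"), None
    for step in range(steps):
        # forward pass
        strategy = torch.softmax(coeffs, dim=-1)
        loss = training_loss(solve_damages(strategy), strategy, eps, beta)
        # backward pass with decaying Gaussian gradient noise
        opt.zero_grad()
        loss.backward()
        coeffs.grad += noise / (1 + step) * torch.randn_like(coeffs)
        opt.step()
        # evaluation of the rounded strategy
        with torch.no_grad():
            rounded = torch.softmax(coeffs, dim=-1)
            rounded[rounded < rounding] = 0.0
            rounded = rounded / rounded.sum(dim=-1, keepdim=True)
            val = solve_damages(rounded).max().item()
            if val < best_val:
                best_val, best_strategy = val, rounded.clone()
    return best_strategy, best_val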

5. Experiments

We experimentally evaluate the strategy synthesis algorithm on a series of synthetic graphs of increasing size. We perform two sets of tests. The first analyzes runtimes, while the second focuses on the achieved protection values.

The experiments were performed on a desktop machine running Ubuntu 20.04 LTS with an Intel® Core™ i7-8700 processor (6 cores, 12 threads) and 32 GB RAM.

5.1. Runtime Analysis

We generate synthetic graphs with n=10,20,,100n=10,20,\ldots,100 vertices. To obtain random but similarly structured graphs, we start with a grid of size n×nn\times n and choose nn of its nodes as the vertices of our patrolling graph, half of them being targets. All vertices are equipped with 6 memory elements. The travel time between two vertices is set to the number of edges on the shortest path between them in the original grid. In the final patrolling graph, we omit every edge that has an alternative connection of at most the same length passing through another chosen vertex.
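
As an illustration, graphs of this kind can be generated along the following lines (a Python sketch with our own naming; the assignment of the 6 memory elements per vertex is done when the graph is augmented and is not shown). In a full grid, the number of edges on a shortest path between two cells equals their Manhattan distance, which the sketch uses directly.

import random
from itertools import combinations

def grid_patrolling_graph(n, seed=0):
    # Choose n cells of an n x n grid as vertices; half of them are targets.
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n) for j in range(n)]
    vertices = rng.sample(cells, n)
    targets = set(rng.sample(vertices, n // 2))
    # Travel time = number of edges on the shortest grid path (Manhattan distance).
    tm = {}
    for u, v in combinations(vertices, 2):
        d = abs(u[0] - v[0]) + abs(u[1] - v[1])
        tm[(u, v)] = tm[(v, u)] = d
    # Omit edges with an alternative connection of at most the same length
    # passing through another chosen vertex.
    edges = [(u, v) for (u, v) in tm
             if not any(w not in (u, v) and tm[(u, w)] + tm[(w, v)] <= tm[(u, v)]
                        for w in vertices)]
    return vertices, targets, edges, tm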

For each nn, we generate 10 graphs with nn vertices and run 10 optimization trials of 100 steps for each graph. In Fig. 3, we report statistics of the average step-times in seconds, aggregated by nn. Note that even considerably large graphs are processed in units of seconds, which confirms the applicability of our algorithm to dynamically changing environments (see Section 1).

Recall that one optimization step consists of a forward pass (computing the damage and loss of the current strategy), a backward pass (computing the gradient), and one additional evaluation of the rounded strategy. For the hyperparameters, we set ε=0.3\varepsilon=0.3, β=0.2\beta=0.2, learning rate =0.5=0.5, cutoff threshold =0.1=0.1, and rounding threshold =0.001=0.001 (for a deeper explanation, see Section 4.2).

Figure 3. Runtime analysis of the strategy synthesis algorithm (in seconds).

Fig. 4 shows the convergence of the values during the optimization process. Here we fix one graph for each number of vertices and run 10 trials, each with 120 steps. The colors are assigned to individual graphs; the areas show the ranges of the values obtained during the optimization process, and the solid lines highlight the minimal values.

Figure 4. Values of strategies synthesized during the first 120 steps. The colored areas show ranges of the obtained values during the optimization process. The solid lines highlight the minimal values.

5.2. Patrolling Airport Gates

One typical application of patrolling is security patrolling at airports (see Section 1). Airport buildings have a specific tree-like structure: a central node connects the terminals, and each terminal consists of pairs of symmetric gates joined by halls. A patrolling graph for an airport with three terminals of 4, 2, and 6 gates is shown in Fig. 5.

Figure 5. A patrolling graph for an airport with 3 terminals.

We generate a sequence of random airport graphs with 3 terminals and an increasing number of gates determined randomly. Hence, an airport graph with nn gates has n/2n/2 halls and exactly one central node. The gates are targets and have exactly one memory element, while halls and the central node are non-target vertices with 44 memory elements each (thus, the strategy can “remember” the previously visited vertex). The costs of all targets are equal to 11, and all edges have the same traversal time 11.

Since the target costs are the same, we can estimate Val\operatorname{Val} (the achievable protection) by the length of the shortest cycle visiting all targets, which is 2(|V|1)2(|V|-1) where VV is the set of vertices. Note that automatic synthesis of a regular strategy with comparable protection is tricky—the synthesis algorithm must “discover” the relevance of the previously visited vertex and design the memory updates accordingly.
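
For illustration, the airport graphs and the baseline can be constructed as follows (a Python sketch with our own naming). We assume here that the halls of each terminal form a corridor attached to the central node, as suggested by Fig. 5; the baseline 2(|V|-1) is the same for any tree-shaped airport whose targets are the gates. Memory elements are not shown.

def airport_graph(gates_per_terminal=(4, 2, 6)):
    # Tree-shaped airport patrolling graph: central node C, halls H, gates G.
    # Every edge has traversal time 1; the gates are the targets.
    vertices, edges = ["C"], []
    gate_id = 0
    for t, n_gates in enumerate(gates_per_terminal):
        prev = "C"
        for h in range(n_gates // 2):          # one hall per pair of gates
            hall = "H{}.{}".format(t, h)
            vertices.append(hall)
            edges.append((prev, hall))
            for _ in range(2):
                gate = "G{}".format(gate_id)
                gate_id += 1
                vertices.append(gate)
                edges.append((hall, gate))
            prev = hall
    targets = [v for v in vertices if v.startswith("G")]
    baseline = 2 * (len(vertices) - 1)         # shortest cycle visiting all targets
    return vertices, edges, targets, baseline

For the airport of Fig. 5 (terminals with 4, 2, and 6 gates), this yields 19 vertices and the baseline 36.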

We use the same hyperparameters as above. For each airport, we synthesized 30 strategies, each in 500 optimization steps. In Fig. 6, we show the protection values of the synthesized strategies for an increasing number of vertices, normalized by the baseline 2(|V|1)2(|V|-1). For larger |V||V|, the optimization converges to locally optimal randomized strategies with worse protection than the deterministic-loop strategy. In particular, for |V|=91|V|=91, the protection achieved by the constructed strategy is about 33%33\% worse than the baseline on average (with the best strategy found being about 20%20\% above the baseline).

Figure 6. Normalized values of strategies synthesized for airport graphs with increasing number of vertices. The normalization baseline is the shortest cycle visiting all targets with length 2(v1)2(v-1) where vv stands for the number of vertices.

6. Conclusion

Our experimental results show that high-quality Defender’s strategies can be computed very quickly even for instances of considerable size. Hence, our algorithm can also be used to re-compute a Defender’s strategy dynamically when the patrolling scenario changes.

The main problem encountered in our experiments is the existence of locally optimal randomized strategies in which the optimization loop gets stuck. An interesting question is whether this problem can be overcome by tuning the parameters of gradient descent or by constructing the initial seeds in a more sophisticated way.

A natural continuation of our study is extending the presented results to scenarios with multiple Defenders and Attackers.

Acknowledgements

Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-21-1-0189.

Vít Musil is also supported by the Operational Programme Research, Development and Education - Project Postdoc2MUNI (No. CZ.02.2.69/0.0/0.0/18_053/0016952).

Disclaimer

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

References

  • Agmon et al. (2008a) N. Agmon, S. Kraus, and G. Kaminka. 2008a. Multi-Robot Perimeter Patrol in Adversarial Settings. In Proceedings of ICRA 2008. IEEE Computer Society Press, 2339–2345.
  • Agmon et al. (2008b) N. Agmon, V. Sadov, G.A. Kaminka, and S. Kraus. 2008b. The impact of adversarial knowledge on adversarial planning in perimeter patrol. In Proceedings of AAMAS 2008. 55–62.
  • Almeida et al. (2004) A. Almeida, G. Ramalho, H. Santana, P. Tedesco, T. Menezes, V. Corruble, and Y. Chevaleyre. 2004. Recent Advances on Multi-Agent Patrolling. Advances in Artificial Intelligence – SBIA 3171 (2004), 474–483.
  • An et al. (2014) B. An, E. Shieh, R. Yang, M. Tambe, C. Baldwin, J. DiRenzo, B. Maule, and G. Meyer. 2014. Protect—A Deployed Game Theoretic System for Strategic Security Allocation for the United States Coast Guard. AI Magazine 33, 4 (2014), 96–110.
  • Basilico et al. (2009a) N. Basilico, N. Gatti, and F. Amigoni. 2009a. Leader-follower strategies for robotic patrolling in environments with arbitrary topologies. In Proceedings of AAMAS 2009. 57–64.
  • Basilico et al. (2012a) N. Basilico, N. Gatti, and F. Amigoni. 2012a. Patrolling Security Games: Definitions and Algorithms for Solving Large Instances with Single Patroller and Single Intruder. Artificial Inteligence 184–185 (2012), 78–123.
  • Basilico et al. (2009b) N. Basilico, N. Gatti, T. Rossi, S. Ceppi, and F. Amigoni. 2009b. Extending algorithms for mobile robot patrolling in the presence of adversaries to more realistic settings. In Proceedings of WI-IAT 2009. 557–564.
  • Basilico et al. (2010) Nicola Basilico, Nicola Gatti, and Federico Villa. 2010. Asynchronous Multi-Robot Patrolling against Intrusion in Arbitrary Topologies. In Proceedings of AAAI 2010.
  • Basilico et al. (2012b) N. Basilico, G. De Nittis, and N. Gatti. 2012b. A Security Game Combining Patrolling and Alarm-Triggered Responses Under Spatial and Detection Uncertainties. In Proceedings of AAAI 2016. 404–410.
  • Biswas et al. (2021) A. Biswas, G. Aggarwal, P. Varakantham, and M. Tambe. 2021. Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2021).
  • Bosansky et al. (2011) B. Bosansky, V. Lisy, M. Jakob, and M. Pechoucek. 2011. Computing Time-Dependent Policies for Patrolling Games with Mobile Targets. In Proceedings of AAMAS 2011.
  • Brázdil et al. (2018) T. Brázdil, A. Kučera, and V. Řehák. 2018. Solving Patrolling Problems in the Internet Environment. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2018). 121–127.
  • Chen et al. (2017) H. Chen, T. Cheng, and S. Wise. 2017. Developing an Online Cooperative Police Patrol Routing Strategy. Computers, Environment and Urban Systems 62 (2017), 19–29.
  • Fang et al. (2013) Fei Fang, Albert Xin Jiang, and Milind Tambe. 2013. Optimal Patrol Strategy for Protecting Moving Targets with Multiple Mobile Resources. In Proceedings of AAMAS 2013.
  • Fave et al. (2014) F.M. Delle Fave, A.X. Jiang, Z. Yin, C . Zhang, M. Tambe, S. Kraus, and J. Sullivan. 2014. Game-Theoretic Security Patrolling with Dynamic Execution Uncertainty and a Case Study on a Real Transit System. Journal of Artificial Intelligence Research 50 (2014), 321–367.
  • Ford et al. (2014) B. Ford, D. Kar, F.M. Delle Fave, R. Yang, and M. Tambe. 2014. PAWS: Adaptive Game-Theoretic Patrolling for Wildlife Protection. In Proceedings of AAMAS 2014. 1641–1642.
  • Ho and Ouaknine (2015) Hsi-Ming Ho and J. Ouaknine. 2015. The Cyclic-Routing UAV Problem is PSPACE-Complete. In Proceedings of FoSSaCS 2015 (Lecture Notes in Computer Science, Vol. 9034). Springer, 328–342.
  • Huang et al. (2019) L. Huang, M. Zhou, K. Hao, and E. Hou. 2019. A Survey of Multi-robot Regular and Adversarial Patrolling. IEEE/CAA Journal of Automatica Sinica 6, 4 (2019), 894–903.
  • Jakob et al. (2011) M. Jakob, O. Vanek, and M. Pechoucek. 2011. Using Agents to Improve International Maritime Transport Security. IEEE Intelligent Systems 26, 1 (2011), 90–96.
  • Karwowski et al. (2019) J. Karwowski, J. Mandziuk, A. Zychowski, F. Grajek, and B. An. 2019. A Memetic Approach for Sequential Security Games on a Plane with Moving Targets. In Proceedings of AAAI 2019. 970–977.
  • Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of ICLR 2015.
  • Klaška et al. (2018) D. Klaška, A. Kučera, T. Lamser, and V. Řehák. 2018. Automatic Synthesis of Efficient Regular Strategies in Adversarial Patrolling Games. In Proceedings of AAMAS 2018. 659–666.
  • Klaška et al. (2021) D. Klaška, A. Kučera, V. Musil, and V. Řehák. 2021. Regstar: Efficient Strategy Synthesis for Adversarial Patrolling Games. In Proceedings of UAI 2021.
  • Klaška et al. (2020) D. Klaška, A. Kučera, and V. Řehák. 2020. Adversarial Patrolling with Drones. In Proceedings of AAMAS 2020. 629–637.
  • Kučera and Lamser (2016) A. Kučera and T. Lamser. 2016. Regular Strategies and Strategy Improvement: Efficient Tools for Solving Large Patrolling Problems. In Proceedings of AAMAS 2016. 1171–1179.
  • Maza et al. (2011) I. Maza, F. Caballero, J. Capitán, J.R. Martínez de Dios, and A. Ollero. 2011. Experimental Results in Multi-UAV Coordination for Disaster Management and Civil Security Applications. Journal of Intelligent and Robotic Systems 61, 1–4 (2011), 563–585.
  • Munoz de Cote et al. (2013) Enrique Munoz de Cote, Ruben Stranders, Nicola Basilico, Nicola Gatti, and Nick Jennings. 2013. Introducing alarms in adversarial patrolling games: extended abstract. In Proceedings of AAMAS 2013. 1275–1276.
  • Norris (1998) J.R. Norris. 1998. Markov Chains. Cambridge University Press.
  • Pita et al. (2008) J. Pita, M. Jain, J. Marecki, F. Ordónez, C. Portway, M. Tambe, C. Western, P. Paruchuri, and S. Kraus. 2008. Deployed ARMOR Protection: The Application of a Game Theoretic Model for Security at the Los Angeles Int. Airport. In Proceedings of AAMAS 2008. 125–132.
  • Portugal and Rocha (2011) D. Portugal and R. Rocha. 2011. A Survey on Multi-Robot Patrolling Algorithms. Technological Innovation for Sustainability 349 (2011), 139–146.
  • Sinha et al. (2018) A. Sinha, F. Fang, B. An, C. Kiekintveld, and M. Tambe. 2018. Stackelberg Security Games: Looking Beyond a Decade of Success. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2018). 5494–5501.
  • Tambe (2011) M. Tambe. 2011. Security and Game Theory. Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.
  • Tsai et al. (2009) J. Tsai, S. Rathi, C. Kiekintveld, F. Ordóñez, and M. Tambe. 2009. IRIS—A Tool for Strategic Security Allocation in Transportation Networks Categories and Subject Descriptors. In Proceedings of AAMAS 2009. 37–44.
  • Wang et al. (2019) Y. Wang, Z.R. Shi, L. Yu, Y. Wu, R. Singh, L. Joppa, and F. Fang. 2019. Deep Reinforcement Learning for Green Security Games with Real-Time Information. In Proceedings of AAAI 2019. 1401–1408.
  • Xu (2021) L. Xu. 2021. Learning and Planning Under Uncertainty for Green Security. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2021).
  • Yan et al. (2013) Z. Yan, N. Jouandeau, and A.A. Cherif. 2013. A Survey and Analysis of Multi-Robot Coordination. International Journal of Advanced Robotic Systems 10, 12 (2013), 1–18.
  • Yin et al. (2010) Z. Yin, D. Korzhyk, C. Kiekintveld, V. Conitzer, and M. Tambe. 2010. Stackelberg vs. Nash in security games: Interchangeability, equivalence, and uniqueness. In Proceedings of AAMAS 2010. 1139–1146.