\acmConference

A full version of the paper published in Proc. of AAMAS 2022 \acmYear \acmDOI \acmPrice \acmISBN \acmSubmissionID493 \affiliation \institutionMasaryk University \cityBrno \countryCzechia \affiliation \institutionMasaryk University \cityBrno \countryCzechia \affiliation \institutionMasaryk University \cityBrno \countryCzechia \affiliation \institutionMasaryk University \cityBrno \countryCzechia

Minimizing Expected Intrusion Detection Time
in Adversarial Patrolling

David Klaška david.klaska@mail.muni.cz , Antonín Kučera tony@fi.muni.cz , Vít Musil musil@fi.muni.cz and Vojtěch Řehák rehak@fi.muni.cz

Abstract.

In adversarial patrolling games, a mobile Defender strives to discover intrusions at vulnerable targets initiated by an Attacker. The Attacker’s utility is traditionally defined as the probability of completing an attack, possibly weighted by target costs. However, in many real-world scenarios, the actual damage caused by the Attacker depends on the time elapsed since the attack’s initiation to its detection. We introduce a formal model for such scenarios, and we show that the Defender always has an optimal strategy achieving maximal protection. We also prove that finite-memory Defender’s strategies are sufficient for achieving protection arbitrarily close to the optimum. Then, we design an efficient strategy synthesis algorithm based on differentiable programming and gradient descent. We evaluate the efficiency of our method experimentally.

Key words and phrases:

Strategy synthesis, Security Games, Adversarial Patrolling

1. Introduction

This paper follows the security games line of work studying optimal allocation of limited security resources for achieving optimal target coverage Tambe (2011). Practical applications of security games include the deployment of police checkpoints at the Los Angeles International Airport Pita et al. (2008), the scheduling of federal air marshals over the U.S. domestic airline flights Tsai et al. (2009), the arrangement of city guards in Los Angeles Metro Fave et al. (2014), the positioning of U.S. Coast Guard patrols to secure selected locations An et al. (2014), and also applications to wildlife protection Ford et al. (2014); Wang et al. (2019); Xu (2021).

Patrolling games are a special type of security games where a mobile Defender moves among protected targets with the aim of detecting possible incidents. Compared with static monitoring facilities such as sensor networks or surveillance systems, patrolling is more flexible and less costly on implementation and maintenance. Due to these advantages Yan et al. (2013), patrolling is indispensable in detecting crimes Jakob et al. (2011); Chen et al. (2017), managing disasters Maza et al. (2011), wildlife protection Wang et al. (2019); Xu (2021), etc. Many works consider human Defenders such as police squads or rangers Wang et al. (2019) where the patrolling horizon is bounded. Recent technological advances motivate the study of robotic patrolling with unbounded horizon where the Defender is an autonomous device operating for a long time without interruption.

Most of the existing patrolling models can be classified as either regular or adversarial. Regular patrolling can be seen as a form of surveillance where the Defender aims at discovering accidents as quickly as possible by minimizing the time lag between two consecutive visits for each target. Here, a Defender’s strategy is typically a single path or a cycle visiting all targets. In adversarial patrolling, the Defender strives to protect the targets against an Attacker exploiting the best attack opportunities maximizing the damage. The solution concept is typically based on Stackelberg equilibrium Yin et al. (2010); Sinha et al. (2018), where the Defender commits to a strategy $\gamma$ and the Attacker follows by selecting a strategy $\pi$ maximizing the expected Attacker’s utility against $\gamma$ . Defender’s strategies are typically randomized so that the Attacker cannot foresee the next Defender’s moves, and the Defender aims at maximizing the probability of discovering an attack before its completion. The adversarial model is also appropriate in situations when a certain protection level must be guaranteed even if the accidents happen at the least convenient moment.

In infinite-horizon adversarial patrolling models, every target $\tau$ is assigned a finite resilience $d(\tau)$ , and an attack at $\tau$ is discovered if the Defender visits $\tau$ in the next $d(\tau)$ time units. Although this model is adequate in many scenarios, it is not applicable when the actual damage depends on the time elapsed since initiating the attack. For example, if the attack involves setting a fire, punching a hole in a fuel tank, or setting a trap, then the associated damage increases with time. In this case, the Defender should aim at minimizing the expected attack discovery time rather than maximizing the probability of visiting a target before a deadline. We refer to Section 2.3 for a more detailed discussion.

In this work, we formalize the objective of minimizing the expected attack discovery time in infinite-horizon adversarial patrolling, and we design an efficient strategy synthesis algorithm. We start by fixing a suitable formal model. The terrain is modeled by the standard patrolling graph, and the Defender’s/Attacker’s strategies are also defined in the standard way. However, the expected damage caused by attacking a target $\tau$ is defined as the expected time of visiting $\tau$ by the Defender since initiating the attack, multiplied by the target cost $\alpha(\tau)$ . Intuitively, $\alpha(\tau)$ is the “damage per time unit” when attacking $\tau$ . We use Stackelberg equilibrium as the underlying solution concept, and define the protection value of a given Defender’s strategy $\gamma$ as the expected attack discovery time (weighted by target costs) guaranteed by $\gamma$ against an arbitrary Attacker’s strategy.

In general, a Defender’s strategy $\gamma$ may randomize and the choice of the next move may depend on the whole history of moves. The randomization is crucial for increasing protection value (a concrete example is given in Section 2.3). Since general strategies are not finitely representable, they are not algorithmically workable. Recent results on infinite-horizon adversarial patrolling Klaška et al. (2018, 2021) identify the subclass of regular Defender’s strategies as sufficiently powerful to maximize the probability of timely attack discovery. Here, a strategy is regular if it uses finite memory and rational probability distributions. However, it is not clear whether regular strategies are equivalently powerful as general strategies when minimizing the expected attack discovery time. Perhaps surprisingly, we show that the answer is positive, despite all issues caused by specific properties of this objective. More precisely, we prove that the limit protection value achievable by regular strategies is the same as the limit protection value achievable by general strategies. This non-trivial result is based on deep insights into the structure of (sub)optimal Defender’s strategies.

Our second main contribution is an algorithm synthesizing a regular Defender’s strategy and its protection value for a given patrolling graph. We show that the protection value of a regular strategy is a differentiable function, and we proceed by designing an efficient strategy improvement procedure based on gradient descent.

We evaluate our algorithm experimentally on instances of considerable size. Since our work initiates the study of infinite-horizon adversarial patrolling with the expected attack discovery time, there is no baseline set by previous works. To estimate the quality of the constructed regular strategies, we consider instances where the optimal protection value can be determined by hand, but constructing the associated Defender’s strategy is sufficiently tricky to properly examine the capabilities of our strategy synthesis algorithm.

The experiments also show that our algorithm is sufficiently fast for recomputing a patrolling strategy dynamically when the underlying patrolling graph changes due to unpredictable external events. Hence, the applicability of our results is not limited just to static scenarios.

Our main contribution can be summarized as follows:

•

We propose a formal model for infinite-horizon adversarial patrolling where the damage caused by attacking a target depends on the time needed to discover the attack.
•

We prove that regular strategies can achieve the same limit protection value as general strategies.
•

We design an efficient algorithm synthesizing a regular Defender’s strategy for a given patrolling graph, and we evaluate its functionality experimentally.

1.1. Related work

The literature on regular and adversarial patrolling is rich; the existing overviews include Huang et al. (2019); Almeida et al. (2004); Portugal and Rocha (2011). We give a summary of previous results on infinite-horizon adversarial patrolling, which is perhaps closest to our work.

Most of the existing results concentrate on computing an optimal moving strategy for certain topologies of admissible moves. The underlying solution concept is the Stackelberg equilibrium Sinha et al. (2018); Yin et al. (2010), where the Defender/Attacker play the roles of the Leader/Follower.

For general topologies, the existence of a perfect Defender’s strategy discovering all attacks in time is $\mathsf{PSPACE}$ -complete Ho and Ouaknine (2015). Consequently, computing an optimal Defender’s strategy is $\mathsf{PSPACE}$ -hard. Moreover, computing an $\varepsilon$ -optimal strategy for $\varepsilon\leq 1/2n$ , where $n$ is the number of vertices, is $\mathsf{NP}$ -hard Klaška et al. (2020). Hence, no feasible strategy synthesis algorithm can guarantee (sub)optimality for all inputs, and finding high-quality strategy in reasonable time is challenging.

The existing strategy synthesis algorithms are based either on mathematical programming, reinforcement learning, or gradient descent. The first approach suffers from scalability issues caused by non-linear constraints Basilico et al. (2012a, 2009a). Reinforcement learning has so far been successful mainly for patrolling with finite horizon, such as green security games Wang et al. (2019); Biswas et al. (2021); Xu (2021); Karwowski et al. (2019). Gradient descent techniques for finite-memory strategies Kučera and Lamser (2016); Klaška et al. (2018, 2021) are applicable to patrolling graphs of reasonable size.

Patrolling for restricted topologies has been studied for lines, circles (Agmon et al., 2008a, b), or fully connected environments (Brázdil et al., 2018). Apart of special topologies, specific variants and aspects of the patrolling problem have been studied, including moving targets Bosansky et al. (2011); Fang et al. (2013), multiple patrolling units Basilico et al. (2010), movement of the Attacker on the graph Basilico et al. (2009b), or reaction to alarms Munoz de Cote et al. (2013); Basilico et al. (2012b).

2. The Model

In this section we introduce a formal model for infinite-horizon patrolling where the Defender aims at minimizing the expected attack discovery time. The terrain (protected area) is modeled by the standard patrolling graph Basilico et al. (2009a, 2012a); Klaška et al. (2018, 2021). We use the variant where time is spent by performing edges Klaška et al. (2018, 2021) rather than by staying in vertices Basilico et al. (2009a, 2012a). The model of Defender/Attacker is also standard Kučera and Lamser (2016); Klaška et al. (2020) (the study of patrolling models usually starts by considering the scenario with one Defender and one Attacker, and we stick to this approach). The new ingredient of our model in the way of evaluating the protection achieved by Defender’s strategies (see Section 2.3), and here we devote more space to explaining and justifying our definitions.

In the rest of this paper, we use $\mathbb{N}$ and $\mathbb{N}_{+}$ to denote the sets of non-negative and positive integers. We assume familiarity with basic notions of probability, Markov chain theory, and calculus.

2.1. Terrain model

Locations where the Defender selects the next move are modeled as vertices in a directed graph. The edges correspond to admissible moves, and are labeled by the corresponding traversal time. The protected targets are a subset of vertices with integer weights representing their costs. Formally, a patrolling graph is a tuple $G=(V,T,E,\mathit{tm},\alpha)$ where

•

$V$ is a finite set of vertices (Defender’s positions);
•

$T\subseteq V$ is a non-empty set of targets;
•

$E\subseteq V\times V$ is a set of edges (admissible moves);
•

$\mathit{tm}\colon E\to\mathbb{N}_{+}$ specifies the traversal time of an edge;
•

$\alpha\colon T\to\mathbb{R}_{+}$ defines the costs of targets.

We require that $G$ is strongly connected, i.e., for all $v,u\in V$ there is a path from $v$ to $u$ . We write $u\to v$ instead of $(u,v)\in E$ , and use $\alpha_{\max}$ to denote the maximal target cost.

The set of all non-empty finite and infinite paths in $G$ are denoted by $\mathcal{H}$ (histories) and $\mathcal{W}$ (walks), respectively. For a given history $h=v_{1},\ldots,v_{n}$ , we use $\mathit{tm}(h)=\sum_{i=1}^{n-1}\mathit{tm}(v_{i},v_{i+1})$ to denote the total traversal time of $h$ .

2.2. Defender and Attacker

We adopt a simplified patrolling scenario with one Defender and one Attacker. In the rest of this section, let $G$ be a fixed patrolling graph.

2.2.1. Defender.

A Defender’s strategy is a function $\gamma$ assigning to every history $h\in\mathcal{H}$ of Defender’s moves a probability distribution on $V$ such that $\gamma(h)(v)>0$ only if $hv\in\mathcal{H}$ , i.e., $u\to v$ where $u$ is the last vertex of $h$ . We also use $\mathit{walk}(h)$ to denote the set of all walks initiated by a given $h\in\mathcal{H}$ .

For every initial vertex $v$ where the Defender starts patrolling, the strategy $\gamma$ determines the probability space $(\mathcal{W},\mathcal{F},\mathbb{P}^{\gamma,v})$ over the walks in the standard way, i.e., $\mathcal{F}$ is the $\sigma$ -field generated by all $\mathit{walk}(h)$ where $h\in\mathcal{H}$ , and $\mathbb{P}^{\gamma,v}$ is the unique probability measure satisfying $\mathbb{P}^{\gamma,v}(\mathit{walk}(h))=\prod_{i=1}^{n-1}\gamma(v_{1},\ldots,v_{i})(v_{i+1})$ for every history $h=v_{1},\ldots,v_{n}$ where $v_{1}=v$ (if $v_{1}\neq v$ , we have that $\mathbb{P}^{\gamma,v}(\mathit{walk}(h))=0$ ). We use $\mathbb{E}^{\gamma,v}[R]$ to denote the expected value of a random variable $R$ in this probability space.

2.2.2. Attacker.

The Attacker observes the history of Defender’s moves and decides whether and where to initiate an attack. In general, the Attacker may be able to determine the next Defender’s move right after the Defender leaves the vertex currently visited. For the Attacker, this is the best moment to attack, because initiating an attack in the middle of a Defender’s move can only decrease the Attacker’s utility (cf. Section 2.3).

Formally, an observation is a sequence $o=v_{1},\ldots,v_{n},v_{n}{\rightarrow}v_{n+1}$ , where $v_{1},\ldots,v_{n}$ is a path in $G$ . Intuitively, $v_{1},\ldots,v_{n}$ is the sequence of vertices visited by the Defender, $v_{n}$ is the currently visited vertex, and $v_{n}{\rightarrow}v_{n+1}$ is the edge taken next. The set of all observations is denoted by $\Omega$ .

An Attacker’s strategy is a function $\pi\colon\Omega\rightarrow\{\mathit{wait},\mathit{attack}_{\tau}:\tau\in T\}$ . We require that if $\pi(v_{1},\ldots,v_{n},v_{n}{\rightarrow}u)=\mathit{attack}_{\tau}$ for some $\tau\in T$ , then $\pi(v_{1},\ldots,v_{i},v_{i}{\rightarrow}v_{i+1})=\mathit{wait}$ for all $1\leq i<n$ , i.e., the Attacker exploits an optimal attack opportunity.

2.3. Protection value

Suppose the Defender commits to a strategy $\gamma$ and the Attacker selects a strategy $\pi$ . The expected damage caused by $\pi$ against $\gamma$ is the expected time to discover an attack scheduled by $\pi$ weighted by target costs.

More precisely, we say that a target $\tau$ is attacked along a walk $w=v_{1},v_{2},\ldots$ if $\pi(v_{1},\ldots,v_{n},v_{n}\rightarrow v_{n+1})=\tau$ for some $n$ . Note that the index $n$ is unique if it exists. Let $m>n$ be the least index such that $v_{m}=\tau$ . If no such $m$ exists, we say that the attack along $w$ is not discovered. Otherwise, the attack is discovered in time $\mathit{tm}(v_{n},\ldots,v_{m})$ .

Let $\mathcal{D}^{\pi}:\mathcal{W}\rightarrow\mathbb{N}_{\infty}$ be a function defined as follows:

\mathcal{D}^{\pi}(w)=\begin{cases}\ell\cdot\alpha(\tau)&\parbox[t]{260.17464pt}{if $\tau$ is attacked along $w$ and the attack is\\ discovered in time $\ell$;}\\ \infty&\parbox[t]{260.17464pt}{if $\tau$ is attacked along $w$ and the attack is not\\ discovered;}\\ 0&\mbox{if no target is attacked along $w$.}\end{cases}

The expected damage caused by $\pi$ against $\gamma$ initiated in $v$ is defined as $\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi}]$ . Since the Defender may choose the initial vertex $v$ , we define the protection value achieved by $\gamma$ and the limit protection value as follows:

	$\displaystyle\operatorname{Val}(\gamma)$	$\displaystyle=$	$\displaystyle\mathop{\mathrm{min}\vphantom{\mathrm{sup}}}_{v}\sup_{\pi}\ \mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi}]$		(1)
	$\displaystyle\operatorname{Val}$	$\displaystyle=$	$\displaystyle\inf_{\gamma}\ \operatorname{Val}(\gamma)$		(2)

We say that a Defender’s strategy $\gamma$ is optimal if $\operatorname{Val}(\gamma)=\operatorname{Val}$ .

2.3.1. Discussion

In this section, we discuss possible alternative approaches to formalizing the objective of discovering an initiated attack as quickly as possible.

Note that this objective is implicitly taken into account in regular patrolling where the Defender aims at minimizing the time lag between two consecutive visits for each target (see Section 1). For randomized strategies, one may try to minimize the expected time lag between two consecutive visits for each target. At first glance, this objective seems similar to minimizing $\sup_{\pi}\ \mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi}]$ . In reality, the objective is different and problematic. To see this, consider the trivial patrolling graph of Fig. 1a with two targets $\tau_{1},\tau_{2}$ and four edges (incl. two self-loops) with traversal time $1$ . The costs of both targets are equal to $1$ . A natural strategy $\gamma_{1}$ for patrolling these targets is a deterministic loop alternately visiting $\tau_{1}$ and $\tau_{2}$ (see Fig 1b). Then, the maximal expected time lag between two consecutive visits of a target is $2$ , and we also have that $\operatorname{Val}(\gamma_{1})=2$ . However, consider the randomized strategy $\gamma_{2}$ of Fig. 1c. In the target currently visited, the Defender performs the self-loop with probability $0.99$ , and with the remaining probability $0.01$ , the Defender moves to the other target (see the dashed arrows in Fig. 1c). For $\gamma_{2}$ , the maximal expected time lag between two consecutive visits of a target is again equal to $2$ (in Markov chain terminology, the stationary distribution determined by $\gamma_{2}$ assigns $1/2$ to each target, and hence the mean recurrence time is equal to $2$ for each target). Hence, if we adopted minimizing the maximal expected time lag between two consecutive visits of a target as the Defender’s objective, the strategies $\gamma_{1}$ and $\gamma_{2}$ would be equivalently good, despite the fact that the expected time to visit $\tau_{2}$ from $\tau_{1}$ is $100$ when the Defender commits to $\gamma_{2}$ . Observe that the difference between $\gamma_{1}$ and $\gamma_{2}$ is captured properly by our approach.

Figure 1. We compare two strategies (b) and (c) on graph (a). Strategy (b) patrols the targets in a deterministic loop, while strategy (c) applies randomization. Both have the same expected time lag between two consecutive visits to a target but differ in the protection significantly.

Let us note that although the example of Fig. 1 contains self-loops (which do not appear in real-world patrolling graphs), these can easily be avoided by inserting auxiliary vertices so that the demonstrated deficiency is still present.

One may still argue that the above problem is caused just by allowing the Defender to use randomized strategies. This is a valid objection. So, the question is whether randomization is really needed, i.e., whether the Defender can achieve strictly better protection by using randomized strategies.

Consider the graph of Fig. 2 with two targets $\tau_{1}$ and $\tau_{2}$ where $\alpha(\tau_{1})=1$ , $\alpha(\tau_{2})=2$ , and the traversal time of every edge is $1$ .

The protection achieved by an arbitrary deterministic strategy is not better than $8$ , because the Attacker can wait until the Defender starts moving from $\tau_{2}$ to $v$ so that the next move selected after arriving to $v$ will be edge leading to $\tau_{1}$ . Note that the Attacker knows the Defender’s strategy and can observe its moves, and hence he can recognize this attack opportunity. Since the Defender needs at least $4$ time units to return to $\tau_{2}$ , the expected damage caused by this attack is at least $8$ .

The Attacker’s ability to anticipate future Defender’s moves can be decreased by randomization. Consider the simple strategy $\sigma_{b}$ of Fig. 2b, where the Defender moves from $v$ to $\tau_{1}$ and $\tau_{2}$ with probability $p$ and $1-p$ , respectively. After reaching a target, the Defender returns to $v$ . The optimal value of $p$ is $\frac{7}{2}-\frac{\sqrt{41}}{2}\doteq 0.3$ , and the expected damage of an optimal attack is then $\doteq 7.7$ . Note that the strategy $\sigma_{b}$ is memoryless, i.e., its decisions depend just on the currently visited vertex.

At first glance, it is not clear whether the protection can be improved, because the probability $p$ used by $\sigma_{b}$ implements an optimal “balance” between visiting $\tau_{1}$ and $\tau_{2}$ determined by the weights of $\tau_{1}$ and $\tau_{2}$ . However, consider the finite-memory strategy $\sigma_{c}$ of Fig. 2c which is “almost deterministic” except for the moment when the Defender returns to $v$ from $\tau_{2}$ . Here, the strategy select the next edge uniformly at random. Then, the expected damage of an optimal attack is $6$ (the best attack opportunity is to attack $\tau_{2}$ right after the robot starts moving form $\tau_{2}$ to $v$ . The expected time need to visit $\tau_{2}$ is then equal to $3$ , yielding the expected damage $6$ ). Hence, protection is increased not only by randomization, but also by appropriate use of memory.

Figure 2. On graph (a), a memoryless randomized strategy (b) is outperformed by a randomized strategy (c) with finite memory.

3. Finite-memory Defender’s strategies

In this section, we prove that finite-memory Defender’s strategies can achieve the same limit protection value as general strategies.

Let $G$ be a patrolling graph. A general Defender’s strategy for $G$ (see Section 2.2.1) depends on the whole history of moves and cannot be finitely represented. A computationally feasible subclass are finite-memory (or regular) strategies Kučera and Lamser (2016); Klaška et al. (2018, 2021) where the relevant information about the history is represented by finitely memory elements assigned to each vertex.

Formally, let $\operatorname{mem}\colon V\to\mathbb{N}$ be a function assigning to every vertex the number of memory elements. The set of augmented vertices is defined by $\smash{\widehat{V}}=\{(v,m)\colon v\in V,\,1\leq m\leq\operatorname{mem}(v)\}$ . We use $\smash{\widehat{v}}$ to denote an augmented vertex of the form $(v,m)$ where $m\leq\operatorname{mem}(v)$ .

A regular Defender’s strategy for $G$ is a function $\sigma\colon\smash{\widehat{V}}\to\mathit{Dist}(\smash{\widehat{V}})$ where $\sigma(v,m)(v^{\prime},m^{\prime})>0$ only if $v\to v^{\prime}$ . We say that $\sigma$ is unambiguous if for all $v,v^{\prime}\in V$ and $m\leq\operatorname{mem}(v)$ there is at most one $m^{\prime}$ such that $\sigma(v,m)(v^{\prime},m^{\prime})>0$ .

Intuitively, the Defender starts patrolling in a designated initial vertex $v$ with initial memory element $m$ , and then traverses the vertices of $G$ and updates the memory according to $\sigma$ . Hence, the current memory element represents some information about the history of visited vertices.

For every initial $\smash{\widehat{v}}\in\smash{\widehat{V}}$ , the strategy $\sigma$ determines the probability space over the walks in the way described in Section 2.2.1. The only difference is that the probability of $\mathit{walk}(v_{1},\ldots,v_{n})$ where $v_{1}=v$ is defined as $\sum_{\smash{\widehat{v}}_{2},\ldots,\smash{\widehat{v}}_{n}}\prod_{i=1}^{n-1}\sigma(\smash{\widehat{v}}_{i})(\smash{\widehat{v}}_{i+1})$ . Here, $\smash{\widehat{v}}_{1}=\smash{\widehat{v}}$ is the initial augmented vertex. Hence, the notion of protection value defined in Section 2.3 is applicable also to regular strategies (where $\min_{v}$ is replaced with $\min_{\smash{\widehat{v}}}$ ) in (1)).

An important question is whether regular strategies can achieve the same limit protection value as general strategies. The answer is positive, and it is proven in two steps. First, we show that there exists an optimal Defender’s strategy $\gamma$ satisfying $\operatorname{Val}(\gamma)=\operatorname{Val}$ (see Section 2.3). Then, for arbitrarily small $\varepsilon>0$ , we prove the existence of a regular strategy $\sigma$ such that $\operatorname{Val}(\sigma)\leq\operatorname{Val}(\gamma)+\varepsilon$ .

{theorem}

For every patrolling graph, there exists a Defender’s strategy $\gamma$ such that $\operatorname{Val}(\gamma)=\operatorname{Val}$ .

Proof.

By the definition of $\operatorname{Val}$ , there exist a vertex $v$ and an infinite sequence $\Gamma=\gamma_{1},\gamma_{2},\ldots$ of Defender’s strategies such that $\operatorname{Val}(\gamma_{i})=\sup_{\pi}\mathbb{E}^{\gamma_{i},v}[\mathcal{D}^{\pi}]$ for all $i\geq 1$ , and the infinite sequence $\operatorname{Val}(\gamma_{1}),\operatorname{Val}(\gamma_{2}),\ldots$ converges to $\operatorname{Val}$ .

Let $\textit{Histories}=h_{1},h_{2},\ldots$ be a sequence where every history occurs exactly once (without any special ordering). Let $\Gamma_{0}=\Gamma$ . For every $i\geq 1$ , we inductively define an infinite sequence of strategies $\Gamma_{i}$ and a probability distribution $\gamma(h_{i})$ over $V$ , assuming that $\Gamma_{i-1}$ has already been defined. Let $t$ be the last vertex of $h_{i}$ , and let $\{u_{1},\ldots,u_{k}\}$ be the set of all immediate successors of $t$ in $G$ (i.e., $t\to u_{j}$ for all $j\leq k$ ). Since every bounded infinite sequence of real numbers contains an infinite convergent subsequence, there exists an infinite subsequence $\Gamma_{i}=\varrho_{1},\varrho_{2},\ldots$ of $\Gamma_{i-1}$ such that the sequence

\varrho_{1}(h_{i})(u_{j}),\ \varrho_{2}(h_{i})(u_{j}),\ \ldots

is convergent for every $j\leq k$ . We put

\gamma(h_{i})(u_{j})\quad=\quad\lim_{\ell\to\infty}\ \varrho_{\ell}(h_{i})(u_{j})\,.

It is easy to check that $\sum_{j=1}^{k}\gamma(h_{i})(u_{j})=1$ , i.e., $\gamma(h_{i})$ is indeed a distribution on $V$ . Hence, the function $\gamma$ is a Defender’s strategy, and we show that $\operatorname{Val}(\gamma)=\operatorname{Val}$ .

For the sake of contradiction, suppose $\sup_{\pi}\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi}]-\operatorname{Val}=\delta>0$ . Let $\varepsilon=\delta/4$ . Then, there exists an Attackers strategy $\pi^{*}$ such that

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad\geq\quad\operatorname{Val}\ +\ \delta\ -\ \varepsilon

(3)

Let $\mathit{Att}(\Omega)\subseteq\Omega$ be the set of all observations $o$ such that $\pi^{*}(o)=\mathit{attack}_{\tau}$ for some $\tau\in T$ . For every $o=v_{1},\ldots,v_{n},v_{n}{\to}v_{n+1}\in\mathit{Att}(\Omega)$ , let $\mathit{walk}(o)$ be the set of all walks starting with $v_{1},\ldots,v_{n+1}$ . Observe that if $o,o^{\prime}\in\mathit{Att}(\Omega)$ and $o\neq o^{\prime}$ , then $\mathit{walk}(o)\cap\mathit{walk}(o^{\prime})=\emptyset$ . Furthermore, for every $w\in\mathcal{W}\smallsetminus\bigcup_{o\in\mathit{Att}(\Omega)}\mathit{walk}(o)$ , we have that $\mathcal{D}^{\pi^{*}}(w)=0$ . Hence, we obtain

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad=\quad\sum_{o\in\mathit{Att}(\Omega)}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]

(4)

where $\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]$ is the conditional expected value of $\mathcal{D}^{\pi^{*}}$ under the condition that a walk starts with $o$ . (Note that the conditional expectation is undefined when $\mathbb{P}^{\gamma,v}(\mathit{walk}(o))=0$ . In that case, “ $0\cdot\textrm{undefined}$ ” is interpreted as $0$ .) Hence, there exists a finite set of observations $O\subseteq\mathit{Att}(\Omega)$ such that

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad\leq\quad\varepsilon+\sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]

(5)

For each $o=v_{1},\ldots,v_{n},v_{n}{\to}v_{n+1}\in O$ , let $\tau_{o}$ be the target attacked by $\pi$ after observing $o$ , and let $\mathcal{H}(o)$ be the set of all histories $h$ initiated in $v_{n+1}$ such that $\tau_{o}$ is the last vertex of $h$ and $\tau_{o}$ occurs exactly once in $h$ . We use $o\odot h$ to denote the history $v_{1},\ldots,v_{n},h$ . We have that $\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]$ is equal to

\sum_{h\in\mathcal{H}(o)}\mathbb{P}^{\gamma,v}\big{(}\mathit{walk}(o\odot h)\mid\mathit{walk}(o)\big{)}\cdot\big{(}\mathit{tm}(h)+\mathit{tm}(v_{n},v_{n+1})\big{)}\cdot\alpha(\tau_{o})

(6)

where $\mathbb{P}^{\gamma,v}\big{(}\mathit{walk}(o\odot h)\mid\mathit{walk}(o)\big{)}$ is the conditional probability of performing a walk starting with $o\odot h$ under the condition that a walk starting with $o$ is performed. Clearly, there exists a finite $H(o)\subseteq\mathcal{H}(o)$ such that the sum (6) decreases at most by $\varepsilon$ when $h$ ranges over $H(o)$ instead of $\mathcal{H}(o)$ . For short, we use $E^{\gamma,v}(o)$ to denote the sum

\sum_{h\in H(o)}\mathbb{P}^{\gamma,v}\big{(}\mathit{walk}(o\odot h)\mid\mathit{walk}(o)\big{)}\cdot\big{(}\mathit{tm}(h)+\mathit{tm}(v_{n},v_{n+1})\big{)}\cdot\alpha(\tau_{o})

(7)

Hence,

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}\mid\mathit{walk}(o)]\quad\leq\quad\varepsilon+E^{\gamma,v}(o)

(8)

By combining (5) and (8), we obtain

\mathbb{E}^{\gamma,v}[\mathcal{D}^{\pi^{*}}]\quad\leq\quad 2\varepsilon+\sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot E^{\gamma,v}(o)

(9)

Since $\varepsilon=\delta/4$ , this implies

\sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot E^{\gamma,v}(o)\quad\geq\quad\operatorname{Val}+\varepsilon

(10)

Let $H=\bigcup_{o\in O}H(o)$ . Since $H$ is finite, there exists an index $\ell$ such that all elements of $H$ appear among the first $\ell$ elements of Histories. Consider the sequence $\Gamma_{\ell}=\varrho_{1},\varrho_{2},\ldots$ and observe that, for all $i\geq 1$ ,

$\displaystyle\operatorname{Val}(\varrho_{i})$	$\displaystyle=$	$\displaystyle\sup_{\pi}\mathbb{E}^{\varrho_{i},v}[\mathcal{D}^{\pi}]$	(11)
	$\displaystyle\geq$	$\displaystyle\mathbb{E}^{\varrho_{i},v}[\mathcal{D}^{\pi^{*}}]$
	$\displaystyle\geq$	$\displaystyle\sum_{o\in O}\mathbb{P}^{\varrho_{i},v}(\mathit{walk}(o))\cdot E^{\varrho_{i},v}(o)$

Since the sequence of distributions $\varrho_{1}(h),\varrho_{2}(h),\ldots$ converges to $\gamma(h)$ for every $h\in H$ , we also obtain that

\lim_{i\to\infty}\sum_{o\in O}\mathbb{P}^{\varrho_{i},v}(\mathit{walk}(o))\cdot E^{\varrho_{i},v}(o)\ =\ \sum_{o\in O}\mathbb{P}^{\gamma,v}(\mathit{walk}(o))\cdot E^{\gamma,v}(o)

(12)

By combining (10), (11), and (22), we obtain $\operatorname{Val}(\varrho_{j})\geq\operatorname{Val}+\varepsilon/2$ for all sufficiently large $j$ . This means that the sequence $\operatorname{Val}(\varrho_{1}),\operatorname{Val}(\varrho_{2}),\ldots$ does not converge to $\operatorname{Val}$ , and we have a contradiction. ∎

{theorem}

Let $G$ be a patrolling graph, and let $\mathit{Reg}$ be the class of all regular strategies for $G$ . Then $\inf_{\sigma\in\mathit{Reg}}\operatorname{Val}(\sigma)=\operatorname{Val}$ .

Proof.

By Theorem 3.1, there exists a Defender’s strategy $\gamma$ such that $\operatorname{Val}(\gamma)=\operatorname{Val}$ . We show that for every $\varepsilon>0$ , there exist sufficiently large $d,\ell\in\mathbb{N}$ such that $\operatorname{Val}(\sigma_{d,\ell})\leq\operatorname{Val}(\gamma)+\varepsilon$ , where $\sigma_{\delta,\ell}$ is a regular strategy obtained by $d$ -discretization and $\ell$ -folding of $\gamma$ .

A $d$ -discretization of $\gamma$ is a strategy $\gamma_{d}$ where for all $h\in\mathcal{H}$ and $v\in V$ , the following conditions are satisfied:

•

$\gamma_{d}(h)(v)=k/d$ for some $k\in\{0,\ldots,d\}$ ;
•

$\gamma_{d}(h)(v)=0$ iff $\gamma(h)(v)=0$ ;
•

$|\gamma_{d}(h)(v)-\gamma(h)(v)|\leq|V|/d$ .

Observe that a $d$ -discretization of $\gamma$ exists for every $d\geq|V|$ .

The regular strategy $\sigma_{d,\ell}$ is obtained by $\ell$ -folding the strategy $\gamma_{d}$ constructed for a sufficiently large $d$ . We can view $\gamma_{d}$ as an infinite tree $T$ where the nodes are histories and $h\stackrel{{\scriptstyle\raisebox{-0.90417pt}{\scriptsize$x$}}}{{\rightarrow}}hu$ iff $\gamma(h)(u)=x$ . Furthermore, we label each node $h$ of $T$ with the last vertex of $h$ . Since the edge probabilities range over finitely many values, the tree $T$ contains only finitely many subtrees of height $\ell$ up to isomorphism preserving both node and edge labels. For a given history $h=v_{0},\ldots,v_{n}$ , let $f_{h},s_{h}\in\mathbb{N}$ be the lexicographically smallest pair of indexes such that $f_{h}<s_{h}$ , both $f_{h}$ and $s_{h}$ are integer multiples of $\ell$ , and the subtrees of height $\ell$ rooted by $v_{f_{h}}$ and $v_{s_{h}}$ are isomorphic. If no such $f_{h},s_{h}$ exist, we say that $h$ does not contain a folding pair.

Note that there exists a constant $c_{\ell}$ independent of $h$ such that every history $h$ of length at least $c_{\ell}$ contains a folding pair $f_{h},s_{h}$ with both components bounded by $c_{\ell}$ . We define a strategy $\gamma_{d,\ell}$ as follows:

•

$\gamma_{d,\ell}(h)=\gamma_{d}(h)$ for every history $h$ without a folding pair.
•

$\gamma_{d,\ell}(h)=\gamma_{d,\ell}(h^{\prime})$ for every history $h$ with a folding pair $f_{h},s_{h}$ , where the $h^{\prime}$ is obtained from $h=v_{0},\ldots,v_{n}$ by deleting the subpath $v_{f_{h}},\ldots,v_{s_{h}-1}$ .

Note that $\gamma_{d,\ell}$ can be equivalently represented as a finite-memory strategy $\sigma_{d,\ell}$ where the memory elements correspond to the (finitely many) histories without a folding pair.

It remains to show that for every $\varepsilon>0$ there are sufficiently large $d,\ell$ such that $\operatorname{Val}(\gamma_{d,\ell})\leq\operatorname{Val}+\varepsilon$ .

We start by observing important structural properties of the optimal strategy $\gamma$ . Let $v^{*}\in V$ such that $\sup_{\pi}\mathbb{E}^{\gamma,v^{*}}[\mathcal{D}^{\pi}]=\operatorname{Val}$ . A history $h$ is $\gamma$ -eligible if $\mathbb{P}^{\gamma,v^{*}}(\mathit{walk}(h))>0$ . For every $\gamma$ -eligible history $h$ , let $\gamma[h]$ be a strategy such that $\gamma[h](h^{\prime})=\gamma(h\odot h^{\prime})$ for every $h^{\prime}\in\mathcal{H}$ initiated in the last vertex $u$ of $h$ (for the other $h^{\prime}$ , the strategy $\gamma[h]$ is defined arbitrarily). We show that

\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]\quad=\quad\operatorname{Val}

(13)

Clearly,

\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]\quad\geq\quad\operatorname{Val}

(14)

for every $\gamma$ -eligible $h$ because otherwise we have a contradiction with the definition of $\operatorname{Val}$ . Now suppose $\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]>\operatorname{Val}$ for some $\gamma$ -eligible $h=v_{1},\ldots,v_{n}$ . We show that then there is an Attacker’s strategy $\pi$ such that $\mathbb{E}^{\gamma,v^{*}}[\mathcal{D}^{\pi}]>\operatorname{Val}$ , which contradicts the optimality of $\gamma$ . Let $H$ be the set of all $\gamma$ -eligible histories of the form $v_{1},\ldots,v_{k},t$ where $k<n$ and $t\neq v_{k+1}$ . Furthermore, let $\kappa=\sup_{\pi}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi}]-\operatorname{Val}$ , and let $\delta>0$ be a constant satisfying

\delta\cdot\big{(}1-\mathbb{P}^{\gamma,v^{*}}(\mathit{walk}(h))\big{)}\quad\leq\quad\frac{\mathbb{P}^{\gamma,v^{*}}(\mathit{walk}(h))\cdot\kappa}{3}

For every $h^{\prime}\in H$ , let $\pi_{h^{\prime}}$ be an Attacker’s strategy such that $\mathbb{E}^{\gamma[h^{\prime}],u}[\mathcal{D}^{\pi_{h^{\prime}}}]\geq\operatorname{Val}-\delta$ , and we also fix an Attacker’s strategy $\pi_{h}$ such that $\mathbb{E}^{\gamma[h],v_{n}}[\mathcal{D}^{\pi_{h^{\prime}}}]\geq\operatorname{Val}+\frac{2}{3}\kappa$ . The strategy $\pi$ is defined as follows. For every $\hbar\in H\cup\{h\}$ and every observation of the form $h^{\prime\prime},u\to t$ such that $h^{\prime\prime}t$ is $\gamma[\hbar]$ -eligible and $\pi_{\hbar}(h^{\prime\prime},u{\to}t)=\mathit{attack}_{\tau}$ for some $\tau\in T$ , we put $\pi(\hbar\odot h^{\prime\prime},u{\to}t)=\mathit{attack}_{\tau}$ . Thus, we obtain $\mathbb{E}^{\gamma,v^{*}}[\mathcal{D}^{\pi}]\geq\operatorname{Val}+\kappa/3$ .

For every target $\tau$ , let $\pi[\tau]$ be the Attacker’s strategy where $\pi[\sigma](u,u\to v)=\mathit{attack}_{\tau}$ for all $u,v\in V$ , i.e., $\pi[\tau]$ attacks $\tau$ right after the Defender starts its walk. An immediate consequence of (13) is that for every $\gamma$ -eligible history $h$ ending in a vertex $u$ and every target $\tau$ , we have that

\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}]\quad\leq\quad\operatorname{Val}

(15)

Let $R_{\tau}$ be a function assigning to every walk $w=v_{0},v_{1},\ldots$ the least $n$ such that $v_{n}=\tau$ . If there is no such $n$ , we put $R_{\tau}(w)=\infty$ . Since $R_{\tau}\leq\mathcal{D}^{\pi[\tau]}$ , we have that

\mathbb{P}^{\gamma[h],u}(R_{\tau}\geq 2\operatorname{Val})\quad\leq\quad\frac{1}{2}

(16)

by (15) and Markov inequality.

Note that (16) holds for all $\gamma$ -eligible $h$ and $\tau$ . Hence, we have that

\mathbb{P}^{\gamma[h],u}(R_{\tau}\geq i\cdot 2\operatorname{Val})\quad\leq\quad\frac{1}{2^{i}}

(17)

because $R_{\tau}\geq i\cdot 2\operatorname{Val}$ requires success of $i$ consecutive independent experiments where each experiment succeeds with probability bounded by $1/2$ .

Now we show that for every $\varepsilon>0$ , there exist sufficiently large $d,\ell$ such that $\operatorname{Val}(\gamma_{d,\ell})\leq\operatorname{Val}(\gamma)+\varepsilon$ , which proves our theorem.

For the rest of this proof, we fix $\varepsilon>0$ . Furthermore, we fix $k\in\mathbb{N}$ satisfying

\sum_{i=2}^{\infty}\frac{1}{2^{k(i-1)}}\cdot i\cdot k\cdot 2\operatorname{Val}\cdot\mathit{tm}_{\max}\cdot\alpha_{\max}\quad\leq\quad\varepsilon/4

(18)

Here $\mathit{tm}_{\max}=\max\{\mathit{tm}(e)\mid e\in E\}$ and $\alpha_{\max}=\max\{\alpha(\tau)\mid\tau\in T\}$ . Note that such a $k$ exists because the above sum converges to $0$ as $k\to\infty$ . For every $i\in\mathbb{N}$ , let $\mathcal{I}_{i}$ be the set of all integers $j$ satisfying

(i-1)\cdot k\cdot 2\operatorname{Val}\quad\leq\quad j\quad<\quad i\cdot k\cdot 2\operatorname{Val}\,.

We have that

\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}]=\sum_{i=1}^{\infty}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]

(19)

Observe

	$\displaystyle\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]$	$\displaystyle\leq$	$\displaystyle i\cdot k\cdot 2\operatorname{Val}\cdot\mathit{tm}_{\max}\cdot\alpha_{\max}$		(20)
	$\displaystyle\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]$	$\displaystyle\leq$	$\displaystyle\frac{1}{2^{k(i-1)}}$		(21)

Inequality (20) is trivial, and (21) follows from (17). Using (18), we obtain

			$\displaystyle\sum_{i=2}^{\infty}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]$		(22)
		$\displaystyle\leq$	$\displaystyle\sum_{i=2}^{\infty}i\cdot k\cdot 2\operatorname{Val}\cdot\mathit{tm}_{\max}\cdot\alpha_{\max}\cdot\frac{1}{2^{k(i-1)}}\quad\leq\quad\varepsilon/4$		(22)

Consider a strategy $\gamma_{d}$ where $d\geq|V|$ . Then every $\gamma$ -eligible history is $\gamma_{d}$ -eligible, and vice-versa. Furthermore, for every $\delta>0$ , there exists a sufficiently large $d$ such that, for all $h$ , $\tau$ , and $i$ ,

	$\displaystyle\mathbb{E}^{\gamma_{d}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]$	$\displaystyle\leq$	$\displaystyle\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]+\delta$
	$\displaystyle\mathbb{P}^{\gamma_{d}[h],u}[R_{\tau}\in\mathcal{I}_{i}]$	$\displaystyle\leq$	$\displaystyle(1/2+\delta)^{k\cdot(i-1)}$

Consequently, we can fix a sufficiently large $d$ such that the values of

	$\displaystyle\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{1}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{1}],$
	$\displaystyle\sum_{i=2}^{\infty}\mathbb{E}^{\gamma[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma[h],u}[R_{\tau}\in\mathcal{I}_{i}]$

increase at most by $\varepsilon/4$ when $\gamma$ is replaced with $\gamma_{d}$ . Then,

	$\displaystyle\mathbb{E}^{\gamma_{d}[h],u}[\mathcal{D}^{\pi[\tau]}]\cdot\mathbb{P}^{\gamma_{d}[h],u}[R_{\tau}\in\mathcal{I}_{1}]$	$\displaystyle\leq$	$\displaystyle\operatorname{Val}+\frac{\varepsilon}{4}$
	$\displaystyle\sum_{i=2}^{\infty}\mathbb{E}^{\gamma_{d}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma_{d}[h],u}[R_{\tau}\in\mathcal{I}_{i}]$	$\displaystyle\leq$	$\displaystyle\frac{\varepsilon}{2}$		(23)

Now consider a strategy $\gamma_{d,\ell}$ where $\ell=k\cdot 2\operatorname{Val}$ , and let $h$ be a $\gamma_{d,\ell}$ -eligible history. Then $\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}]$ is equal to

\sum_{i=1}^{\infty}\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\cdot\mathbb{P}^{\gamma_{d,\ell}[h],u}[R_{\tau}\in\mathcal{I}_{i}]

By definition of $\ell$ -folding, for every $i\geq 1$ there is a $\gamma_{d}$ -eligible history $h^{\prime}$ without a folding pair such that

\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{i}]\ =\ \mathbb{E}^{\gamma_{d}[h^{\prime}],u}[\mathcal{D}^{\pi[\tau]}\mid R_{\tau}\in\mathcal{I}_{1}]

(24)

Furthermore,

\mathbb{P}^{\gamma_{d,\ell}[h],u}[R_{\tau}\in\mathcal{I}_{i}]\ \leq\ (1-\xi)^{k(i-1)}

(25)

where $\xi$ is the maximal $\mathbb{P}^{\gamma_{d}[h^{\prime}],u}[R_{\tau}\in\mathcal{I}_{1}]$ such that $h^{\prime}$ is a $\gamma_{d}$ -eligible history $h^{\prime}$ without a folding pair .

Due to (24) and (25), the bounds of (23) remain valid when $\gamma_{d}$ is replaced with $\gamma_{d,\ell}$ and $h$ is a $\gamma_{d,\ell}$ -eligible history. Hence,

\mathbb{E}^{\gamma_{d,\ell}[h],u}[\mathcal{D}^{\pi[\tau]}]\leq\operatorname{Val}+\varepsilon

(26)

Since (26) holds for every $\gamma_{d,\ell}$ -eligible $h$ and every target $\tau$ , we obtain $\operatorname{Val}(\gamma_{d,\ell})\leq\operatorname{Val}+\varepsilon$ . ∎

4. Strategy Synthesis Algorithm

In this section, we design an efficient algorithm synthesizing a regular Defender’s strategy for a given patrolling graph capable of balancing the trade-off between maximizing $\operatorname{Val}(\sigma)$ and minimizing the tail probabilities $P_{c}(\operatorname{Val}(\sigma))$ .

4.1. Computing $\operatorname{Val}(\sigma)$

First, we show how to compute $\operatorname{Val}(\sigma)$ for a given regular strategy $\sigma$ . Let $G$ be a patrolling graph and $\sigma$ a regular strategy for $G$ . Let $\smash{\widehat{E}}$ be the set of all $(\smash{\widehat{u}},\smash{\widehat{v}})\in\smash{\widehat{V}}\times\smash{\widehat{V}}$ such that $\sigma(\smash{\widehat{u}})(\smash{\widehat{v}})>0$ , i.e., $\smash{\widehat{E}}$ is the set of augmented edges used by $\sigma$ . For every target $\tau$ , let $\pi[\tau]$ be the Attacker strategy where for all $u,v\in V$ we have that $\pi[\tau](u,u\to v)=\mathit{attack}_{\tau}$ , i.e., $\pi[\tau]$ attacks $\tau$ immediately after the Defender starts its walk.

For every $\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}})\in\smash{\widehat{E}}$ and $\tau\in T$ , let $\mathcal{L}_{\tau,\smash{\widehat{e}}}$ be the expected damage caused by an attack at $\tau$ scheduled right after the Defender starts traversing $\smash{\widehat{e}}$ , i.e.,

\mathcal{L}_{\tau,\smash{\widehat{e}}}\quad=\quad\mathbb{E}^{\sigma,\smash{\widehat{u}}}\big{[}\mathcal{D}^{\pi[\tau]}\mid walk(\smash{\widehat{e}})\big{]}\,.

(27)

Hence, $\mathcal{L}_{\tau,\smash{\widehat{e}}}$ is the conditional expected value of $\mathcal{D}^{\pi[\tau]}$ under the condition that the Defender’s walk starts by traversing $\smash{\widehat{e}}$ .

Consider the directed graph $\smash{\widehat{G}}=(\smash{\widehat{V}},\smash{\widehat{E}})$ , and let $\mathcal{B}$ denote the set of all bottom strongly connected components of $\smash{\widehat{G}}$ . Let

\mathcal{L}(\sigma)\quad=\quad\min_{B\in\mathcal{B}}\ \max_{\tau\in T}\ \max_{\smash{\widehat{e}}\in E(B)}\ \mathcal{L}_{\tau,\smash{\widehat{e}}}

(28)

where $E(B)=\smash{\widehat{E}}\cap(B\times B)$ is the set of augmented edges in the component $B$ used by $\sigma$ . We have the following:

{theorem}

Let $\sigma$ be a regular strategy for a patrolling graph $G$ . Then $\operatorname{Val}(\sigma)\leq\mathcal{L}(\sigma)$ . If $\sigma$ is unambiguous, then $\operatorname{Val}(\sigma)=\mathcal{L}(\sigma)$ .

Proof.

For purposes of this proof, we need to introduce several notions. An augmented walk is an infinite sequence $\smash{\widehat{v}}_{1},\smash{\widehat{v}}_{2},\dots$ such that $v_{i}\to v_{i+1}$ for all $i$ . The set of all augmented walks is denoted $\smash{\widehat{\mathcal{W}}}$ . An augmented history is a non-empty finite prefix of an augmented walk. The set of all augmented histories is denoted by $\smash{\widehat{\mathcal{H}}}$ , and for $\smash{\widehat{h}}\in\smash{\widehat{\mathcal{H}}}$ , $\mathit{walk}(\smash{\widehat{h}})$ denotes the set of all augmented walks starting with $\smash{\widehat{h}}$ .

Furthermore, for every regular strategy $\sigma$ and every initial $\smash{\widehat{v}}\in\smash{\widehat{V}}$ , we define the probability space $(\smash{\widehat{\mathcal{W}}},\smash{\widehat{\mathcal{F}}},\smash{\widehat{\mathbb{P}}}^{\sigma,\smash{\widehat{v}}})$ over the augmented walks in the expected way.

For an arbitrary walk $w=v_{1},v_{2},\dots$ , let $Aug(w)$ denote the set of all augmented walks of the form $\smash{\widehat{v}}_{1},\smash{\widehat{v}}_{2},\dots$ . Observe that for every measurable event $F\in\mathcal{F}$ , writing $\smash{\widehat{F}}=\bigcup_{w\in F}Aug(w)$ , we have $\smash{\widehat{F}}\in\smash{\widehat{\mathcal{F}}}$ and $\smash{\widehat{\mathbb{P}}}^{\sigma,\smash{\widehat{v}}}(\smash{\widehat{F}})=\mathbb{P}^{\sigma,\smash{\widehat{v}}}(F)$ . Hence, to simplify our notation, we write $\mathbb{P}$ instead of $\smash{\widehat{\mathbb{P}}}$ . The extension of the random variable $\mathcal{D}^{\pi}$ to augmented walks is straightforward.

Let $\sigma$ be a regular Defender’s strategy. We prove $\operatorname{Val}(\sigma)\leq\mathcal{L}(\sigma)$ . Recall

	$\displaystyle\operatorname{Val}(\sigma)$	$\displaystyle=$	$\displaystyle\mathop{\mathrm{min}\vphantom{\mathrm{sup}}}_{\smash{\widehat{v}}\in\smash{\widehat{V}}}\sup_{\pi}\ \mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]$
	$\displaystyle\mathcal{L}(\sigma)$	$\displaystyle=$	$\displaystyle\min_{B\in\mathcal{B}}\ \max_{\tau\in T}\ \max_{\smash{\widehat{e}}\in E(B)}\ \mathcal{L}_{\tau,\smash{\widehat{e}}}$

Let $B^{*}\in\mathcal{B}$ be a bottom strongly connected component achieving the above minimum. We show that

\mathop{\mathrm{min}\vphantom{\mathrm{sup}}}_{\smash{\widehat{v}}\in\smash{\widehat{V}}}\sup_{\pi}\ \mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\leq\max_{\tau\in T}\ \max_{\smash{\widehat{e}}\in E(B^{*})}\ \mathcal{L}_{\tau,\smash{\widehat{e}}}.

Denoting the right-hand side by $M$ , we show that there is $\smash{\widehat{v}}\in\smash{\widehat{V}}$ such that $\sup_{\pi}\ \mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\leq M$ . Choose an arbitrary $\smash{\widehat{v}}\in B^{*}$ and assume, for the sake of contradiction, that $\sup_{\pi}\ \mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]>M$ . Hence, there is an Attacker’s strategy $\pi$ such that $\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]>M$ .

Now we decompose the expectation $\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]$ according to augmented observations, i.e., sequences $\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\rightarrow}\smash{\widehat{v}}_{n+1}$ , where $v_{1},\ldots,v_{n+1}$ is a path in $G$ . Given $\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\rightarrow}\smash{\widehat{v}}_{n+1}$ , we use $o$ to denote the “unaugmented” observation $v_{1},\ldots,v_{n},v_{n}{\rightarrow}v_{n+1}$ . Let $\smash{\widehat{\Omega}}$ be the set of all augmented observation, and let $\mathit{Att}(\smash{\widehat{\Omega}}){\subseteq}\smash{\widehat{\Omega}}$ be the set of all $\smash{\widehat{o}}$ where $\pi(o)\neq\mathit{wait}$ . For every $\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\to}\smash{\widehat{v}}_{n+1}$ in $\mathit{Att}(\smash{\widehat{\Omega}})$ , let $\mathit{walk}(\smash{\widehat{o}})$ be the set of all augmented walks starting with $\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n+1}$ . Note that if $\smash{\widehat{o}},\smash{\widehat{o}}^{\prime}\in\mathit{Att}(\smash{\widehat{\Omega}})$ and $\smash{\widehat{o}}\neq\smash{\widehat{o}}^{\prime}$ , then $\mathit{walk}(\smash{\widehat{o}})\cap\mathit{walk}(\smash{\widehat{o}}^{\prime})=\emptyset$ . Furthermore, for every $\smash{\widehat{w}}\in\smash{\widehat{\mathcal{W}}}\setminus\bigcup_{\smash{\widehat{o}}\in\mathit{Att}(\smash{\widehat{\Omega}})}\mathit{walk}(\smash{\widehat{o}})$ , we have that $\mathcal{D}^{\pi}(\smash{\widehat{w}})=0$ . Hence, we obtain

\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]=\sum_{\smash{\widehat{o}}\in\mathit{Att}(\smash{\widehat{\Omega}})}\mathbb{P}^{\sigma,\smash{\widehat{v}}}(\mathit{walk}(\smash{\widehat{o}}))\cdot\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}\mid\mathit{walk}(\smash{\widehat{o}})].

Since

\sum_{\smash{\widehat{o}}\in\mathit{Att}(\smash{\widehat{\Omega}})}\mathbb{P}^{\sigma,\smash{\widehat{v}}}(\mathit{walk}(\smash{\widehat{o}}))\leq 1,

there is $\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\to}\smash{\widehat{v}}_{n+1}\in\mathit{Att}(\smash{\widehat{\Omega}})$ such that

\mathbb{P}^{\sigma,\smash{\widehat{v}}}(\mathit{walk}(\smash{\widehat{o}}))>0\quad\hbox{and}\quad\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}\mid\mathit{walk}(\smash{\widehat{o}})]>M.

Let $\smash{\widehat{e}}=(\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n+1})$ , and let $\tau\in T$ be the attacked target (i.e., $\pi(o)=\mathit{attack}_{\tau}$ ). Since $\smash{\widehat{v}}=\smash{\widehat{v}}_{1}\in B^{*}$ and $\mathbb{P}^{\sigma,\smash{\widehat{v}}}(\mathit{walk}(\smash{\widehat{o}}))>0$ , it holds that $\smash{\widehat{e}}\in E(B^{*})$ . Since $\sigma$ is regular, its (randomized) behavior after $\smash{\widehat{e}}$ is always the same, regardless of the traversed history. Therefore,

\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}\mid\mathit{walk}(\smash{\widehat{o}})]=\mathbb{E}^{\sigma,\smash{\widehat{v}}_{n}}[\mathcal{D}^{\pi[\tau]}\mid\mathit{walk}(\smash{\widehat{e}})]=\mathcal{L}_{\tau,\smash{\widehat{e}}}\,.

Hence, $\mathcal{L}_{\tau,\smash{\widehat{e}}}>M$ , which contradicts the definition of $M$ .

Now assume that $\sigma$ is unambiguous. We prove that $\operatorname{Val}(\sigma)\geq\mathcal{L}(\sigma)$ . So, assume that $\smash{\widehat{v}}\in\smash{\widehat{V}}$ achieves the minimum in the definition of $\operatorname{Val}(\sigma)$ . We construct an Attacker’s strategy $\pi$ such that $\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\geq\mathcal{L}(\sigma)$ . For this purpose, let $\varphi\colon\mathcal{B}\rightarrow T$ and $\psi\colon\mathcal{B}\rightarrow\smash{\widehat{E}}$ be such that for every $B\in\mathcal{B}$ , $\psi(B)\in E(B)$ and the choice $\tau=\varphi(B),\smash{\widehat{e}}=\psi(B)$ achieves the maximal $\mathcal{L}_{\tau,\smash{\widehat{e}}}$ (see the definition of $\mathcal{L}(\sigma)$ ). In particular, for every $B\in\mathcal{B}$ , we have

\mathcal{L}_{\varphi(B),\psi(B)}\geq\mathcal{L}(\sigma).

(29)

Note that the condition $\psi(B)\in E(B)$ for every $B\in\mathcal{B}$ implies that $\psi$ is injective. Moreover, let $\mathcal{E}$ denote the range of $\psi$ (thus, $\psi\colon\mathcal{B}\rightarrow\mathcal{E}$ is a bijection) and for every $\smash{\widehat{e}}\in\mathcal{E}$ , let $\tau(\smash{\widehat{e}})$ denote the target $\varphi(\psi^{-1}(\smash{\widehat{e}}))$ . Using this notation, (29) becomes

\mathcal{L}_{\tau(\smash{\widehat{e}}),\smash{\widehat{e}}}\geq\mathcal{L}(\sigma)

(30)

for every $\smash{\widehat{e}}\in\mathcal{E}$ .

For every observation $o\in\Omega$ s.t. $\mathbb{P}^{\sigma,\smash{\widehat{v}}}(walk(o))>0$ , there is exactly one augmented observation of the form $\smash{\widehat{o}}$ such that $\mathbb{P}^{\sigma,\smash{\widehat{v}}}(walk(\smash{\widehat{o}}))>0$ (this follows by a trivial induction on the length of $o$ ). In the rest of this proof, for every $o\in\Omega$ , the symbol $\smash{\widehat{o}}$ denotes the unique augmented observation satisfying the above. Now, for each $o\in\Omega$ , we define $\pi(o)$ as follows: Let $\smash{\widehat{o}}=\smash{\widehat{v}}_{1},\ldots,\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n}{\rightarrow}\smash{\widehat{v}}_{n+1}$ . If none of the augmented edges $(\smash{\widehat{v}}_{i},\smash{\widehat{v}}_{i+1})$ for $1\leq i<n$ is in $\mathcal{E}$ and the last augmented edge $\smash{\widehat{e}}=(\smash{\widehat{v}}_{n},\smash{\widehat{v}}_{n+1})$ does appear in $\mathcal{E}$ , we put $\pi(o)=\mathit{attack}_{\tau(\smash{\widehat{e}})}$ . Otherwise, we put $\pi(o)=\mathit{wait}$ . For every $\smash{\widehat{e}}\in\mathcal{E}$ , let $\mathit{Att}(\smash{\widehat{e}})$ denote the union of $\mathit{walk}(\smash{\widehat{o}})$ over all $o\in\Omega$ such that $\pi(o)\neq\mathit{wait}$ and $\smash{\widehat{o}}$ ends with $\smash{\widehat{e}}$ . Since the Attacker may attack only once along any (augmented) walk, we have $\mathit{Att}(\smash{\widehat{e}})\cap\mathit{Att}(\smash{\widehat{e}}^{\prime})=\emptyset$ for all $\smash{\widehat{e}},\smash{\widehat{e}}^{\prime}\in\mathcal{E}$ such that $\smash{\widehat{e}}\neq\smash{\widehat{e}}^{\prime}$ . Since $\mathcal{D}^{\pi}$ is non-negative, we obtain

\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\geq\sum_{\widehat{e}\in\mathcal{E}}\mathbb{P}^{\sigma,\smash{\widehat{v}}}(\mathit{Att}(\smash{\widehat{e}}))\cdot\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}\mid\mathit{Att}(\smash{\widehat{e}})].

(31)

Since $\sigma$ is regular and, for every $\smash{\widehat{e}}\in\mathcal{E}$ , $\pi$ always attacks the same target $\tau(\smash{\widehat{e}})$ when the Defender starts traversing $\smash{\widehat{e}}$ (regardless of the previous history), we have

\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}\mid\mathit{Att}(\smash{\widehat{e}})]=\mathcal{L}_{\tau(\smash{\widehat{e}}),\smash{\widehat{e}}}\geq\mathcal{L}(\sigma),

where the inequality follows from (30). Substituting into (31) yields

\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\geq\mathcal{L}(\sigma)\cdot\sum_{\widehat{e}\in\mathcal{E}}\mathbb{P}^{\sigma,\smash{\widehat{v}}}(\mathit{Att}(\smash{\widehat{e}})).

The desired inequality $\mathbb{E}^{\sigma,\smash{\widehat{v}}}[\mathcal{D}^{\pi}]\geq\mathcal{L}(\sigma)$ follows from the fact that the above sum is equal to $1$ (this follows by applying basic results of finite Markov chain theory; the Defender almost surely visits some bottom strongly connected component $B\in\mathcal{B}$ and there it almost surely traverses every edge infinitely often. In particular, the Defender almost surely visits the unique edge $\smash{\widehat{e}}\in\mathcal{E}\cap E(B)$ ). ∎

Our proof of Theorem 4.1 reveals that the Attacker can cause the expected damage equal to $\mathcal{L}(\sigma)$ if it can observe the memory updates performed by the Defender. If $\sigma$ is unambiguous, then the Attacker can determine the memory updates just by observing the history of Defender’s moves. However, if the memory updates are randomized, the Attacker needs to access the Defender’s internal data structures during a patrolling walk. Depending on a setup, this may or may not be possible. By the worst-case paradigm of adversarial patrolling, $\mathcal{L}(\sigma)$ is more appropriate than $\operatorname{Val}(\sigma)$ for measuring the protection achieved by $\sigma$ . As we shall see, our strategy synthesis algorithm typically outputs unambiguous regular strategies where $\mathcal{L}(\sigma)=\operatorname{Val}(\sigma)$ . Hence, it does not really matter whether $\mathcal{L}(\sigma)$ is understood as the protection achieved by $\sigma$ or just a bound on this protection.

Fix $B\in\mathcal{B}$ and $\tau\in T$ . If $B$ does not contain any augmented vertex of the form $\smash{\widehat{\tau}}$ , then $\mathcal{L}_{\tau,\smash{\widehat{e}}}=\infty$ for all $\smash{\widehat{e}}\in E(B)$ . Otherwise, to every $\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}})\in E(B)$ , we associate a variable $X_{\smash{\widehat{e}}}$ , and create a system of linear equations

X_{\smash{\widehat{e}}}\;=\;\mathit{tm}(u,v)+\begin{cases}0&\mbox{if $v=\tau$,}\\ \sum_{\smash{\widehat{v}}\to\smash{\widehat{w}}}\sigma(\smash{\widehat{v}})(\smash{\widehat{w}})\cdot X_{(\smash{\widehat{v}},\smash{\widehat{w}})}&\mbox{otherwise}\end{cases}

(32)

over $\smash{\widehat{e}}\in E(B)$ . By a straightforward generalization of (Norris, 1998, Theorem 1.3.5), system (32) has a unique solution, equal to $(\mathcal{L}_{\tau,\smash{\widehat{e}}})_{\smash{\widehat{e}}\in E(B)}$ .

Observe that for all augmented edges $\smash{\widehat{e}},\smash{\widehat{g}}\in E(B)$ leading to the same augmented vertex (say $\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}})$ and $\smash{\widehat{g}}=(\smash{\widehat{w}},\smash{\widehat{v}})$ ), we have $\mathcal{L}_{\tau,\smash{\widehat{g}}}=\mathcal{L}_{\tau,\smash{\widehat{e}}}-\mathit{tm}(u,v)+\mathit{tm}(w,v)$ . Therefore, we may reduce the number of variables as well as equations of the system from $|E(B)|$ to $|B|$ . Indeed, to every $\smash{\widehat{v}}\in B$ , we assign a variable $Y_{\smash{\widehat{v}}}$ , and construct a system of linear equations

Y_{\smash{\widehat{v}}}\;=\;\begin{cases}0&\mbox{if $v=\tau$,}\\ \sum_{\smash{\widehat{v}}\to\smash{\widehat{w}}}\sigma(\smash{\widehat{v}})(\smash{\widehat{w}})\cdot(\mathit{tm}(v,w)+Y_{\smash{\widehat{w}}})&\mbox{otherwise.}\end{cases}

(33)

Then, for every $\smash{\widehat{e}}=(\smash{\widehat{u}},\smash{\widehat{v}})\in E(B)$ , we have $\mathcal{L}_{\tau,\smash{\widehat{e}}}=\mathit{tm}(u,v)+y_{\smash{\widehat{v}}}$ , where $(y_{\smash{\widehat{v}}})_{\smash{\widehat{v}}\in B}$ is the unique solution of the system (33).

4.2. Optimization Scheme

Our strategy synthesis algorithm is based on interpreting $\mathcal{L}$ as a piecewise differentiable function and applying methods of differentiable programming. We start from a random strategy $\sigma$ , repeatedly compute $\mathcal{L}(\sigma)$ and update the strategy against the direction of its gradient.

The optimization algorithm is described in Algo. 1. On forward pass, strategies are produced from real-valued coefficients by a Softmax function that outputs probability distributions. For every target $\tau$ , we solve the system (33) to obtain a damage $(\mathcal{L}_{\tau,\smash{\widehat{e}}})_{\tau\in T,\smash{\widehat{e}}\in E(B)}$ . Then, instead of hard maximum in equation (28) we optimize a loss function defined by

\text{loss}\;=\;\sum\nolimits_{\tau,\smash{\widehat{e}}}\Phi_{\varepsilon}(\mathcal{L}_{\tau,\smash{\widehat{e}}})^{2},

(34)

where $\Phi_{\varepsilon}(t)=0$ for $t\in[0,m-\varepsilon m)$ and $\Phi_{\varepsilon}(t)=1+(t-m)/\varepsilon m$ for $t\in[m-\varepsilon m,m]$ , in which $m=\overline{\mathcal{L}}$ is the hard maximum, the bar denotes the stop-gradient operator, and $\varepsilon\in(0,1)$ is a hyperparameter. Minimizing loss instead of $\mathcal{L}$ leads to a more efficient gradient propagation. On top of this, we enforce the model to prefer deterministic strategies over randomized by adding an average entropy of strategies’ probability distributions with a factor $\beta$ .

On backward pass, the loss gradient is computed using the automatic differentiation, we add decaying Gaussian noise and update the coefficients using Adam optimizer (Kingma and Ba, 2015).

As Softmax never produces probability distributions containing zeros, we cut the outputs at a certain threshold (called rounding threshold) to allow endpoint values on evaluation. Note that edges with zero probabilities are excluded from $E(B)$ which is crucial for equation (33).

Algorithm 1 Strategy optimization

coefficients

\leftarrow

Init()

for step

\in

steps do

// Forward pass

strategy

\leftarrow

Softmax(coefficients)

// solving linear system (33)

damage

\leftarrow

Solve(strategy)

loss

\leftarrow

Loss(damage)

+

\beta\cdot

Entropy(strategy)

// Backward pass

coefficients.grad

\leftarrow

Gradient(loss)

// automatic differentiation

coefficients.grad += Noise(step)

// Adam optimizer’s step

coefficients += Step(coefficients.grad, step)

// Strategy evaluation

strategy

\leftarrow

Cutoff(Softmax(coefficients))

damage

\leftarrow

Solve(strategy)

\mathcal{L}\leftarrow

Max(damage)

Save

\mathcal{L}

, strategy return strategy with the smallest

\mathcal{L}

5. Experiments

We experimentally evaluate strategy synthesis algorithm on series of synthetic graphs with increasing sizes. We perform two sets of tests. The first analyzes runtimes while the second one focuses on the achieved protection values.

The experiments were performed on a desktop machine with Ubuntu 20.04 LTS running on Intel^® Core™ i7-8700 Processor (6 cores, 12 threads) with 32GB RAM.

5.1. Runtime Analysis

We generate synthetic graphs with $n=10,20,\ldots,100$ vertices. To obtain random but similarly structured graphs, we start with a grid of size $n\times n$ and choose its $n$ nodes as the vertices of our patrolling graph, half of them being targets. All vertices are equipped with 6 memory elements. The travel time between vertices is set to the number of edges on the shortest path in the original grid. In the final patrolling graph, we omit those edges that have an alternative connection of at most the same length visiting another vertex.

For each $n$ , we generate 10 graphs of $n$ vertices and run 10 optimization trials with 100 steps for each graph. In Fig. 3, we report statistics of average step-times in seconds aggregated by $n$ . Note than even considerably large graphs are processed in units of seconds, which confirms the applicability of our algorithm to dynamically changing environments (see Section 1).

Recall that one optimization step consists of a forward pass (damage and loss of the current strategy), backward pass (gradient), and one more test evaluation. For hyperparameters, we set $\varepsilon=0.3$ , $\beta=0.2$ , learning rate $=0.5$ , cutoff threshold $=0.1$ , and rounding threshold $=0.001$ (for a deeper explanation, see Section 4.2).

Refer to caption — Figure 3. Runtime analysis of the strategy synthesis algorithm (in seconds).

Fig. 4 shows the convergence of values during the optimization process. Here we fix one graph for each number of vertices and run 10 trials each with 120 steps. The colors are assigned to individual graphs, the areas show ranges of the obtained values during the optimization process. The solid lines highlight minimal values.

5.2. Patrolling Airport Gates

One typical application of patrolling is security patrol at airports (see Section 1). Airport buildings have a specific tree structure of terminals with symmetric gates with a central node connecting the terminals. A terminal typically consists of pairs of gates joined by halls. A patrolling graph for an airport with three terminals of 4, 2, and 6 gates is shown in Fig. 5.

Figure 5. A patrolling graph for an airport with 3 terminals.

We generate a sequence of random airport graphs with 3 terminals and an increasing number of gates determined randomly. Hence, an airport graph with $n$ gates has $n/2$ halls and exactly one central node. The gates are targets and have exactly one memory element, while halls and the central node are non-target vertices with $4$ memory elements each (thus, the strategy can “remember” the previously visited vertex). The costs of all targets are equal to $1$ , and all edges have the same traversal time $1$ .

Since the target costs are the same, we can estimate $\operatorname{Val}$ (the achievable protection) by the length of the shortest cycle visiting all targets, which is $2(|V|-1)$ where $V$ is the set of vertices. Note that automatic synthesis of a regular strategy with comparable protection is tricky—the synthesis algorithm must “discover” the relevance of the previously visited vertex and design the memory updates accordingly.

We use the same hyperparameters as above. For each airport, we synthesized 30 strategies, each in 500 steps of iterations. In Fig. 6, we show the protection values of the synthesized strategies for increasing number of vertices normalized by the baseline $2(|V|-1)$ . For larger $|V|$ , the optimization converges to locally optimal randomized strategies with worse protection than the deterministic-loop strategy. In particular, for $|V|=91$ , the protection achieved by the constructed strategy is about $33\%$ worse than the baseline on average (with best found strategy loosing $20\%$ above the baseline).

6. Conclusion

The outcomes show that high-quality Defender’s strategies are computed very quickly even for instances of considerable size. Hence, our algorithm can also be used to re-compute a Defender’s strategy dynamically when the patrolling scenario changes.

The problem encountered in our experiments is the existence of locally optimal randomized strategies where the optimization loop gets stuck. An interesting question is whether this problem can be overcome by tuning the parameters of gradient descent or by constructing the initial seeds in a more sophisticated way.

A natural continuation of our study is extending the presented results to scenarios with multiple Defenders and Attackers.

Acknowledgements

Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-21-1-0189.

Vít Musil is also supported from Operational Programme Research, Development and Education - Project Postdoc2MUNI (No. CZ.02.2.69/0.0/0.0/18_053/0016952).

Disclaimer

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

References

(1)
Agmon et al. (2008a) N. Agmon, S. Kraus, and G. Kaminka. 2008a. Multi-Robot Perimeter Patrol in Adversarial Settings. In Proceedings of ICRA 2008. IEEE Computer Society Press, 2339–2345.
Agmon et al. (2008b) N. Agmon, V. Sadov, G.A. Kaminka, and S. Kraus. 2008b. The impact of adversarial knowledge on adversarial planning in perimeter patrol. In Proceedings of AAMAS 2008. 55–62.
Almeida et al. (2004) A. Almeida, G. Ramalho, H. Santana, P. Tedesco, T. Menezes, V. Corruble, and Y. Chevaleyr. 2004. Recent Advances on Multi-Agent Patrolling. Advances in Artificial Intelligence – SBIA 3171 (2004), 474–483.
An et al. (2014) B. An, E. Shieh, R. Yang, M. Tambe, C. Baldwin, J. DiRenzo, B. Maule, and G. Meyer. 2014. Protect—A Deployed Game Theoretic System for Strategic Security Allocation for the United States Coast Guard. AI Magazine 33, 4 (2014), 96–110.
Basilico et al. (2009a) N. Basilico, N. Gatti, and F. Amigoni. 2009a. Leader-follower strategies for robotic patrolling in environments with arbitrary topologies. In Proceedings of AAMAS 2009. 57–64.
Basilico et al. (2012a) N. Basilico, N. Gatti, and F. Amigoni. 2012a. Patrolling Security Games: Definitions and Algorithms for Solving Large Instances with Single Patroller and Single Intruder. Artificial Inteligence 184–185 (2012), 78–123.
Basilico et al. (2009b) N. Basilico, N. Gatti, T. Rossi, S. Ceppi, and F. Amigoni. 2009b. Extending algorithms for mobile robot patrolling in the presence of adversaries to more realistic settings. In Proccedings of WI-IAT 2009. 557–564.
Basilico et al. (2010) Nicola Basilico, Nicola Gatti, and Federico Villa. 2010. Asynchronous Multi-Robot Patrolling against Intrusion in Arbitrary Topologies. In Proceedings of AAAI 2010.
Basilico et al. (2012b) N. Basilico, G. De Nittis, and N. Gatti. 2012b. A Security Game Combining Patrolling and Alarm-Triggered Responses Under Spatial and Detection Uncertainties. In Proceedings of AAAI 2016. 404–410.
Biswas et al. (2021) A. Biswas, G. Aggarwal, P. Varakantham, and M. Tambe. 2021. Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2021).
Bosansky et al. (2011) B. Bosansky, V. Lisy, M. Jakob, and M. Pechoucek. 2011. Computing Time-Dependent Policies for Patrolling Games with Mobile Targets. In Proceedings of AAMAS 2011.
Brázdil et al. (2018) T. Brázdil, A. Kučera, and V. Řehák. 2018. Solving Patrolling Problems in the Internet Environment. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2018). 121–127.
Chen et al. (2017) H. Chen, T. Cheng, and S. Wise. 2017. Developing an Online Cooperative Police Patrol Routing Strategy. Computers, Environment and Urban Systems 62 (2017), 19–29.
Fang et al. (2013) Fei Fang, Albert Xin Jiang, and Milind Tambe. 2013. Optimal Patrol Strategy for Protecting Moving Targets with Multiple Mobile Resources. In Proceedings of AAMAS 2013.
Fave et al. (2014) F.M. Delle Fave, A.X. Jiang, Z. Yin, C . Zhang, M. Tambe, S. Kraus, and J. Sullivan. 2014. Game-Theoretic Security Patrolling with Dynamic Execution Uncertainty and a Case Study on a Real Transit System. Journal of Artificial Intelligence Research 50 (2014), 321–367.
Ford et al. (2014) B. Ford, D. Kar, F.M. Delle Fave, R. Yang, and M. Tambe. 2014. PAWS: Adaptive Game-Theoretic Patrolling for Wildlife Protection. In Proceedings of AAMAS 2014. 1641–1642.
Ho and Ouaknine (2015) Hsi-Ming Ho and J. Ouaknine. 2015. The Cyclic-Routing UAV Problem is PSPACE-Complete. In Proceedings of FoSSaCS 2015 (Lecture Notes in Computer Science, Vol. 9034). Springer, 328–342.
Huang et al. (2019) L. Huang, M. Zhou, K. Hao, and E. Hou. 2019. A Survey of Multi-robot Regular and Adversarial Patrolling. IEEE/CAA Journal of Automatica Sinica 6, 4 (2019), 894–903.
Jakob et al. (2011) M. Jakob, O. Vanek, and M. Pechoucek. 2011. Using Agents to Improve International MaritimeTransport Security. IEEE Intelligent Systems 26, 1 (2011), 90–96.
Karwowski et al. (2019) J. Karwowski, J. Mandziuk, A. Zychowski, F. Grajek, and B. An. 2019. A Memetic Approach for Sequential Security Games on a Plane with Moving Targets. In Proceedings of AAAI 2019. 970–977.
Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of ICLR 2015.
Klaška et al. (2018) D. Klaška, A. Kučera, T. Lamser, and V. Řehák. 2018. Automatic Synthesis of Efficient Regular Strategies in Adversarial Patrolling Games. In Proceedings of AAMAS 2018. 659–666.
Klaška et al. (2021) D. Klaška, A. Kučera, V. Musil, and V. Řehák. 2021. Regstar: Efficient Strategy Synthesis for Adversarial Patrolling Games. In Proceedings of UAI 2021.
Klaška et al. (2020) D. Klaška, A. Kučera, and V. Řehák. 2020. Adversarial Patrolling with Drones. In Proceedings of AAMAS 2020. 629–637.
Kučera and Lamser (2016) A. Kučera and T. Lamser. 2016. Regular Strategies and Strategy Improvement: Efficient Tools for Solving Large Patrolling Problems. In Proceedings of AAMAS 2016. 1171–1179.
Maza et al. (2011) I. Maza, F. Caballero, J. Capitán, J.R. Martínez de Dios, and A. Ollero. 2011. Experimental Results in Multi-UAV Coordination for Disaster Management and Civil Security Applications. Journal of Intelligent and Robotic Systems 61, 1–4 (2011), 563–585.
Munoz de Cote et al. (2013) Enrique Munoz de Cote, Ruben Stranders, Nicola Basilico, Nicola Gatti, and Nick Jennings. 2013. Introducing alarms in adversarial patrolling games: extended abstract. In Proceedings of AAMAS 2013. 1275–1276.
Norris (1998) J.R. Norris. 1998. Markov Chains. Cambridge University Press.
Pita et al. (2008) J. Pita, M. Jain, J. Marecki, F. Ordónez, C. Portway, M. Tambe, C. Western, P. Paruchuri, and S. Kraus. 2008. Deployed ARMOR Protection: The Application of a Game Theoretic Model for Security at the Los Angeles Int. Airport. In Proceedings of AAMAS 2008. 125–132.
Portugal and Rocha (2011) D. Portugal and R. Rocha. 2011. A Survey on Multi-Robot Patrolling Algorithms. Technological Innovation for Sustainability 349 (2011), 139–146.
Sinha et al. (2018) A. Sinha, F. Fang, B. An, C. Kiekintveld, and M. Tambe. 2018. Stackelberg Security Games: Looking Beyond a Decade of Success. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2018). 5494–5501.
Tambe (2011) M. Tambe. 2011. Security and Game Theory. Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.
Tsai et al. (2009) J. Tsai, S. Rathi, C. Kiekintveld, F. Ordóñez, and M. Tambe. 2009. IRIS—A Tool for Strategic Security Allocation in Transportation Networks Categories and Subject Descriptors. In Proceedings of AAMAS 2009. 37–44.
Wang et al. (2019) Y. Wang, Z.R. Shi, L. Yu, Y. Wu, R. Singh, L. Joppa, and F. Fang. 2019. Deep Reinforcement Learning for Green Security Games with Real-Time Information. In Proceedings of AAAI 2019. 1401–1408.
Xu (2021) L. Xu. 2021. Learning and Planning Under Uncertainty for Green Security. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2021).
Yan et al. (2013) Z. Yan, N. Jouandeau, and A.A. Cherif. 2013. A Survey and Analysis of Multi-Robot Coordination. International Journal of Advanced Robotic Systems 10, 12 (2013), 1–18.
Yin et al. (2010) Z. Yin, D. Korzhyk, C. Kiekintveld, V. Conitzer, and M. Tambe. 2010. Stackelberg vs. Nash in security games: Interchangeability, equivalence, and uniqueness. In Proceedings of AAMAS 2010. 1139–1146.

Minimizing Expected Intrusion Detection Time in Adversarial Patrolling

Abstract.

Key words and phrases:

1. Introduction

1.1. Related work

2. The Model

2.1. Terrain model

2.2. Defender and Attacker

2.2.1. Defender.

2.2.2. Attacker.

2.3. Protection value

2.3.1. Discussion

3. Finite-memory Defender’s strategies

Proof.

Proof.

4. Strategy Synthesis Algorithm

4.1. Computing Val⁡(σ)\operatorname{Val}(\sigma)

Proof.

4.2. Optimization Scheme

5. Experiments

5.1. Runtime Analysis

5.2. Patrolling Airport Gates

6. Conclusion

Acknowledgements

Disclaimer

References

Minimizing Expected Intrusion Detection Time
in Adversarial Patrolling

4.1. Computing $\operatorname{Val}(\sigma)$