
General Cops and Robbers Games with randomness

Frédéric Simard fsima063@uottawa.ca Josée Desharnais josee.desharnais@ift.ulaval.ca François Laviolette francois.laviolette@ift.ulaval.ca School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada Department of Computer Science and Software Engineering, Université Laval, Québec, QC, Canada
Abstract

Cops and Robbers games have been studied for the last few decades in computer science and mathematics. As in general pursuit-evasion games, pursuers (cops) seek to capture evaders (robbers); however, players move in turn, are constrained to move on a discrete structure, usually a graph, and know the exact location of their opponent. In 2017, Bonato and MacGillivray [2] presented a general characterization of Cops and Robbers games so that they could be studied in a unified framework. However, their model does not cover games where stochastic events may occur, such as the robbers moving in a random fashion. In this paper we present a novel model with stochastic elements that we call a Generalized Probabilistic Cops and Robbers game (GPCR). A typical such game is one where the robber moves according to a probability distribution, either because she is lost or drunk rather than evading, or because she is a robot. We present results for solving GPCR games, thus enabling one to study properties of the optimal strategies in large classes of Cops and Robbers games. Some classic Cops and Robbers game properties are also extended.

keywords:
Cops and Robbers games, pursuit games, optimal strategies, graph theory, stochastic games
journal: Theoretical Computer Science

1 Introduction

Cops and Robbers games have been studied as examples of discrete-time pursuit games on graphs since the publication of Quilliot's doctoral thesis [29] in 1978 and, independently, Nowakowski and Winkler's article [26] in 1983. Both works describe a turn-based game in which a lone cop pursues a robber on the vertices of a graph. The game evolves in discrete time and with perfect information. The cop wins if he eventually occupies the same vertex as the robber; otherwise, if the play continues indefinitely, the robber wins. A given graph is copwin if the cop has a winning strategy: for any possible move the robber makes, the cop has an answer that eventually leads him to catch the robber (in finite time). As there is no tie, exactly one player always has a (deterministic) winning strategy.

Since the first exposition of the game of Cop and Robber, many variants have emerged. Notably, Aigner and Fromme [1] introduced the cop number in 1984: the minimal number of cops required on a graph to capture a robber. Since then, more alternatives have been described, each one modifying one or more game parameters, such as the speed of the players, the radius of capture of the cops, etc. We refer to Bonato and Nowakowski's book [4] for a comprehensive description of these different formulations. The survey on guaranteed graph searching problems by Fomin and Thilikos [11] is also a great reference on the subject. In graph searching games, the objective is to capture a fugitive on a graph; the problems in which the fugitive is always caught are called guaranteed.

In 2017, Bonato and MacGillivray [2] presented a first generalization of Cops and Robbers games that encompasses the majority of the variants described previously. Indeed, all two-player, turn-based, discrete-time pursuit games of perfect information on graphs in which both players play optimally are contained in Bonato and MacGillivray's model. As such, this model encompasses all pursuit games deemed combinatorial (we refer to Conway's book On Numbers and Games [9] for an introduction to the subject of combinatorial games), that is, turn-based games of perfect information played on a discrete structure without any randomness.

Recently, some researchers such as Prałat and Kehagias [19], Komarov and Winkler [22] and Simard et al. [31] described a game, called the Cop and Drunk Robber game, in which the robber walks in a random fashion: each of her movements is described by a uniform random walk on the vertices of the graph. In general, this strategy is suboptimal. Since this particular game cannot be described by Bonato and MacGillivray’s model, it appears natural to seek to extend their framework to integrate games with random events.

There has also been a recent push towards more game theoretic approaches to modeling Cops and Robbers games, notably by Konstantinidis, Kehagias and others (see for example [23, 18, 16, 17, 24]). Our paper can be considered more in line with this way of treating Cops and Robbers games than more traditional approaches.

This paper thus presents a model of Cops and Robbers games that is more general than that of Bonato and MacGillivray. The main objective of this model is to incorporate games such as the Cop and Drunk Robber game. The probabilistic nature of this game leads us to define a framework different from that of Bonato and MacGillivray.

In Cops and Robbers games, one is generally interested in the question of solving a game. This question is universal in game theory, where one defines a solution concept such as the Nash equilibrium. In Cops and Robbers games, the cops' point of view is often adopted and one seeks to determine whether it is feasible, and if so how, for them to capture the robbers. In stochastic Cops and Robbers games, one can generalize the question to a quantitative scale of success: what is the (best) probability for the cops to capture the robbers, and which strategy achieves it. One can also ask the dual question of the minimal number of cops required to capture the robbers with some probability. In deterministic games, this graph parameter is known as the cop number.

One can note that many solutions of Cops and Robbers games share the same structure, and this is reflected in the fact that they can be solved with a recursive expression. Indeed, Nowakowski and Winkler [26] in 1983 presented a preorder relation on vertices, writing $x\preceq_{n}y$ when the cop has a winning strategy in at most $n$ moves if positioned on vertex $y$ while the robber is on vertex $x$. An important aspect of this relation $\preceq_{n}$ is that it can be computed recursively, and thus leads to a polynomial-time algorithm for computing its values, as well as the strategy of the cop. This relation was extended 20 years later by Hahn and MacGillivray [15] in order to solve games of $k$ cops by letting players move on the graph's strong product. Clarke and MacGillivray [7] have also defined a characterization of $k$-cop-win graphs through a dismantling strategy and studied the algorithmic complexity of the problem. For a fixed $k$, the problem can be solved in polynomial time of degree $2k+2$. On a related note, Kinnersley [20] proved that it is EXPTIME-complete to determine whether the cop number of a graph $G$ is less than some integer $k$ when both $G$ and $k$ are part of the input. This shows that Clarke and MacGillivray's result is, in a sense, optimal.

In games with stochastic components, such order relations can be generalized by considering the probability of capture, as is done in a recent paper on the Optimal Search Path (OSP) problem [31]. A recursion $w_{n}(x,y)$ is defined: it represents the probability that a cop standing on vertex $y$ captures the robber, positioned on vertex $x$, in at most $n$ steps. This relation, defined on the Cop and Drunk Robber game [19, 22, 31], is analogous to Nowakowski and Winkler's $x\preceq_{n}y$ and is slightly more general, as it can model the robber's random movement. One can wonder up to what point the relation $w_{n}$ can be extended while preserving its polynomial nature. Theorem 2.13 and Proposition 2.24 give an answer to this question.

This paper is divided as follows. Section 2 presents our model of Cops and Robbers games and the $w_{n}$ recursion, along with some complexity results, notably on $w_{n}$. Stationarity results on $w_{n}$ are also included. Since most Cops and Robbers games are played on graphs, another formulation of our model on such a structure is presented in Section 3. We conclude in Section 4.

2 An abstract Cops and Robbers game

We now present a general model of Probabilistic Cops and Robbers games; it is played with perfect information, is turn-based starting with the cops, and takes place on a discrete structure. From each state/configuration of the game, after choosing their actions, the cops and robbers jump to a state according to their transition matrices, denoted $T_{\mathrm{rob}}$ and $T_{\mathrm{cop}}$. These matrices may encode probabilistic behaviours: $T_{\mathrm{cop}}(s,a,s')$ is interpreted as the probability that the cop, starting in $s$ and playing action $a$, will arrive in $s'$. (The notation $T(s,a,s')$ reflects a transition matrix view: it corresponds to annotating the edge $[s,s']$ of the transition system with an action $a$ and a positive value, the probability. In the Markov Decision Processes (MDP) community, it is also written $T_{a}(s,s')$ or $\mathbb{P}(s'\mid s,a)$.)

Definition 2.1.

A Generalized Probabilistic Cops and Robbers game (GPCR) is played by two players, the cop team and the robber team. It is given by the following tuple

$$\mathcal{G} = \left(S, i_{0}, F, A, T_{\mathrm{cop}}, T_{\mathrm{rob}}\right), \qquad (1)$$

satisfying

1. $S=S_{\mathrm{cop}}\times S_{\mathrm{rob}}\times S_{\mathrm{o}}$, the non-empty finite set of states representing the possible configurations of the game. The sets $S_{\mathrm{cop}}$ and $S_{\mathrm{rob}}$ hold the possible cops and robbers positions, while $S_{\mathrm{o}}$ may contain other relevant information (like whose turn it is).

2. $i_{0}\in S$ is the initial state.

3. $F\subseteq S$ is the set of final (winning) states for the cops.

4. $A=A_{\mathrm{cop}}\cup A_{\mathrm{rob}}$, with $A_{\mathrm{cop}}$ and $A_{\mathrm{rob}}$ the non-empty, finite sets of actions of the cops and robbers, respectively.

5. $T_{\mathrm{cop}}:S\times A_{\mathrm{cop}}\times S\rightarrow[0,1]$ is a transition function for the cops, that is,

$$\sum_{s'\in S}T_{\mathrm{cop}}(s,a,s')\in\{0,1\}\ \text{ for all } s\in S \text{ and } a\in A_{\mathrm{cop}}.$$

When the sum is 1, we say that $a$ is playable in $s$, and we write $A_{\mathrm{cop}}(s)$ for the set of playable actions for the cops at state $s\in S$. Furthermore, $T_{\mathrm{cop}}$ also satisfies:

(a) for all $s\in S$, $A_{\mathrm{cop}}(s)\neq\emptyset$;

(b) if $s\in F$, then $T_{\mathrm{cop}}(s,a,s)=1$ for all actions $a\in A_{\mathrm{cop}}$; hence $T_{\mathrm{cop}}(s,a,s')=0$ for all $s'\neq s$.

6. $T_{\mathrm{rob}}$ is a transition function for the robbers, similar to $T_{\mathrm{cop}}$. $A_{\mathrm{rob}}(s)$ is the set of playable actions by the robbers in state $s\in S$.

A play of $\mathcal{G}$ is an infinite sequence $i_{0}a_{0}s_{1}a_{1}s_{2}a_{2}\dots\in(SA_{\mathrm{cop}}SA_{\mathrm{rob}})^{\omega}$ of states and playable actions of $\mathcal{G}$ that alternates the moves of $\mathrm{cop}$ and $\mathrm{rob}$. It thus satisfies $T_{\mathrm{cop}}(s_{j},a_{j},s_{j+1})>0$ for $j=0,2,4,\dots$ and $T_{\mathrm{rob}}(s_{j},a_{j},s_{j+1})>0$ for $j=1,3,5,\dots$. The cops win whenever a final state $s\in F$ is encountered; otherwise the robbers win. A turn is a subsequence of two moves, starting with $\mathrm{cop}$. We also consider finite plays, and we write $\mathcal{G}_{n}$ for the game where plays are finite with $n$ (complete) turns.

An equivalent, and sometimes handier, formulation for $T_{\mathrm{cop}}$ is to define $T_{\mathrm{cop}}(s,a)$ as a distribution on $S$, for an action $a$ playable in $s$. The correspondence is $T_{\mathrm{cop}}(s,a)(X)=\sum_{s'\in X}T_{\mathrm{cop}}(s,a,s')$ for $X\subseteq S$. For example, the second condition of the fifth item in the preceding definition could have been stated $T_{\mathrm{cop}}(s,a)=\delta_{s}$, where $\delta_{s}$ is the Dirac distribution on an element $s$, that is, $\delta_{s}$ has value 1 on $\{s\}$ and is 0 elsewhere.

A play progresses as follows: from a state $s$, the cops choose an action $a_{\mathrm{cop}}\in A_{\mathrm{cop}}(s)$, which results in a new state $s'$, randomly chosen according to distribution $T_{\mathrm{cop}}(s,a_{\mathrm{cop}})$; then the robbers play an action $a_{\mathrm{rob}}\in A_{\mathrm{rob}}(s')$, which results in the next state $s''$, drawn with probability $T_{\mathrm{rob}}(s',a_{\mathrm{rob}},s'')$. Once a final state is reached, the players are forced to stay in the same state. Notice that one could record whose turn it is in the third component of the states: $S_{\mathrm{o}}=\{\mathrm{cop},\mathrm{rob}\}$. However, this doubles the state set and complicates the definition of the transition function. In most games, it is more intuitive to define the rules for movement independently of when a transition is taken, as in chess.
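To make these dynamics concrete, here is a minimal Python sketch, with our own illustrative names, of one possible encoding of a GPCR game together with the simulation of a single play; the policies here are memoryless maps from states to actions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Set

State = Hashable
Action = Hashable
Dist = Dict[State, float]  # finite distribution: state -> probability

@dataclass
class GPCR:
    """Container mirroring the tuple (S, i0, F, A, T_cop, T_rob).
    Playability is folded into A_cop/A_rob: they return the playable
    actions at a state, instead of rows of T summing to 0."""
    S: List[State]
    i0: State
    F: Set[State]
    A_cop: Callable[[State], List[Action]]
    A_rob: Callable[[State], List[Action]]
    T_cop: Callable[[State, Action], Dist]
    T_rob: Callable[[State, Action], Dist]

def sample(dist: Dist) -> State:
    return random.choices(list(dist), weights=list(dist.values()))[0]

def play(game: GPCR, cop_policy, rob_policy, max_turns: int) -> bool:
    """Simulate at most max_turns turns; return True iff the cops win."""
    s = game.i0
    for _ in range(max_turns):
        s = sample(game.T_cop(s, cop_policy(s)))  # cops move first
        if s in game.F:
            return True
        s = sample(game.T_rob(s, rob_policy(s)))  # then the robbers
        if s in game.F:
            return True
    return False
```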

We sometimes use the notation $s_{\textsf{x}}$, for $\textsf{x}\in\{\mathrm{cop},\mathrm{rob},\mathrm{o}\}$, to denote the projection of a state $s\in S$ on the set $S_{\textsf{x}}$. The set $S_{\mathrm{o}}$ is rarely used in the current section, but will be valuable further on, such as in Example 3.5 on dynamic graphs whose structures vary with time.

In what follows, we write $\mathrm{Dist}_{B}$ for the set of discrete distributions on a set $B$, and $\mathcal{U}_{B}\in\mathrm{Dist}_{B}$ for the discrete uniform distribution on the same set.

Most of the example games we will describe will be between a single cop and a single robber, even if the definition specifies a cop team and a robber team. The usual way of presenting the positions of the cop team is with a single vertex in the strong product of each member’s possible territory.

2.1 Encoding of known games and processes, stochastic or not

We now describe a few known games, following the structure of Definition 2.1. The first one is a typical, deterministic example of a Cops and Robbers game. We say a game is deterministic when both distributions defined by $T_{\mathrm{cop}}$ and $T_{\mathrm{rob}}$ are concentrated on a single point, in other words if $T_{\mathrm{cop}}(s,a)$ and $T_{\mathrm{rob}}(s,a)$ are Dirac for all $s\in S$ and $a\in A$. The reader can safely skip this section.

Example 2.2 (Classic Cop and Robber game).

Let $G=(V,E)$ be a finite graph. In this game, each team consists of a single player, who walks on the vertices of the graph, successively choosing the next move in his or her neighbourhood. The final states are those in which both players share a vertex, in which case the cop wins. The tricky part in encoding this game is that in their first moves, the cop and the robber can choose whatever vertices they want, so the rule of movement differs at the first move from the rest of the play. So we let $i_{\mathrm{cop}},i_{\mathrm{rob}}\notin V$ be two elements that will serve as starting points for the cop and the robber. Because the first moves are chosen in turn, the set of states $S$ below must contain states in $V\times\{i_{\mathrm{rob}}\}$, which can only be reached after the cop's first move, but before the robber's. To simplify $S$, we include states that will never be reached; this is governed by the transition functions. The different sets are:

$$\begin{aligned}
i_{0} &= (i_{\mathrm{cop}}, i_{\mathrm{rob}})\\
S &= (\{i_{\mathrm{cop}}\}\cup V)\times(\{i_{\mathrm{rob}}\}\cup V)\\
F &= \{(x,x)\in V^{2}\}\\
A_{\mathrm{cop}} &= V\\
A_{\mathrm{rob}} &= V.
\end{aligned}$$

Let $(c,r)\in S$, $x\in V$, and actions $c'\in A_{\mathrm{cop}}$ and $r'\in A_{\mathrm{rob}}$. We define:

$$T_{\mathrm{cop}}((c,r),c',(x,r)) = \begin{cases}1,&\text{if } x=c' \text{ and } (c=i_{\mathrm{cop}} \text{ or } c'\in N[c]),\\ 0,&\text{otherwise;}\end{cases}$$

$$T_{\mathrm{rob}}((c,r),r',(c,x)) = \begin{cases}1,&\text{if } x=r' \text{ and } (r=i_{\mathrm{rob}} \text{ or } r'\in N[r]),\\ 0,&\text{otherwise.}\end{cases}$$

Thus, for a state $(c,r)\in S\setminus\{i_{0}\}$ with $c\in V$, the playable action set is $A_{\mathrm{cop}}(c,r)=N[c]$. Similarly, for the robber we get $A_{\mathrm{rob}}(c,r)=N[r]$. Because a play starts with the cop, it is not required to specify the condition $c\neq i_{\mathrm{cop}}$ in the function $T_{\mathrm{rob}}$. Similarly, it is not necessary to make a special case of the state $c=r$, since the play ends there anyway.
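This encoding translates directly into code. The sketch below, reusing the hypothetical GPCR container introduced earlier, builds the classic game from a vertex list V and a closed-neighbourhood map N; all transitions are Dirac distributions, and playability is folded into the action maps.

```python
def classic_game(V, N, i_cop="i_cop", i_rob="i_rob"):
    """Classic Cop and Robber game (Example 2.2) on a finite graph given
    by its vertex list V and closed-neighbourhood map N (v is in N[v])."""
    S = [(c, r) for c in [i_cop, *V] for r in [i_rob, *V]]
    F = {(x, x) for x in V}

    def A_cop(s):
        c, _ = s
        return list(V) if c == i_cop else list(N[c])  # first move is free

    def A_rob(s):
        _, r = s
        return list(V) if r == i_rob else list(N[r])

    def T_cop(s, a):
        _, r = s
        return {(a, r): 1.0}  # Dirac: the cop lands on the chosen vertex

    def T_rob(s, a):
        c, _ = s
        return {(c, a): 1.0}

    return GPCR(S, (i_cop, i_rob), F, A_cop, A_rob, T_cop, T_rob)
```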

The stochasticity of Definition 2.1 is motivated by the following example, called the Cop and Drunk Robber game. It is rather similar to the one just presented except that the robber moves randomly on the vertices of the graph.

Example 2.3 (Cop and Drunk Robber game).

From the preceding example, only the robber's transition function $T_{\mathrm{rob}}$ is modified; the rest stays the same. Let $(c,r)\in S$ and $r'\in A_{\mathrm{rob}}$. The robber's transition function is then:

$$T_{\mathrm{rob}}((c,r),r') = \begin{cases}\delta_{(c,r')},&\text{if } r=i_{\mathrm{rob}},\\ \mathcal{U}_{\{c\}\times N[r]},&\text{otherwise.}\end{cases}$$

The robber, after the first move, moves uniformly at random in her neighbourhood, which amounts to ignoring her action $r'\in A_{\mathrm{rob}}$. One could also restrict her actions with $A_{\mathrm{rob}}(s)=\{1\}$ when $s\in S\setminus\{i_{0}\}$.

In the Cop and Drunk Robber game, the robber moves according to a uniform distribution on her neighbourhood. Varying her transition function could represent various scenarios. For example, the robber's probability of ending on a vertex $r'$ from vertex $r$ could depend on the distance between $r$ and $r'$, as in the sketch below.
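Both the uniform walk of Example 2.3 and such a biased variant fit in a few lines; the sketch below omits the initial-placement case for brevity, and the weight function is a hypothetical parameter (for instance, one decreasing with the distance between r and x).

```python
def T_rob_drunk(s, a, N):
    """Example 2.3: the robber's action is ignored; she moves uniformly
    at random over her closed neighbourhood N[r]."""
    c, r = s
    return {(c, x): 1.0 / len(N[r]) for x in N[r]}

def T_rob_weighted(s, a, N, weight):
    """A hypothetical biased variant: each neighbour x is chosen with
    probability proportional to weight(r, x)."""
    c, r = s
    total = sum(weight(r, x) for x in N[r])
    return {(c, x): weight(r, x) / total for x in N[r]}
```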

In addition to the Cop and Drunk Robber game itself, a recent paper by Simard et al. [31] presented a variant of this game in which the robber can evade capture. The main difference between these games is that the cop may fail to catch the robber even when standing on the same vertex. This game is presented in the next example.

Example 2.4 (Cop and Drunk Defending Robber).

The game's main structure is again similar to that of Example 2.2, but we need a jail state $j^{*}\notin V$ to simulate the capture of the robber. The initial state is the same, and we have:

$$\begin{aligned}
i_{0} &= (i_{\mathrm{cop}}, i_{\mathrm{rob}})\\
S &= (\{i_{\mathrm{cop}}\}\cup V)\times(\{i_{\mathrm{rob}}\}\cup V)\cup\{(j^{*},j^{*})\}\\
F &= \{(j^{*},j^{*})\}.
\end{aligned}$$

When the players do not meet, they move on $G$ as before. Yet, when the cop steps on the same vertex $v$ as the robber, there is a probability $p(v)$ that the robber gets captured, where $p:V\rightarrow[0,1]$. For $(c,r)\not\in F$, the robber's transition function is then:

$$T_{\mathrm{rob}}((c,r),r') = \begin{cases}\delta_{(c,r')},&\text{if } r=i_{\mathrm{rob}},\\ \mathcal{U}_{\{c\}\times N[r]},&\text{if } c\neq r \text{ and } r\neq i_{\mathrm{rob}},\\ D_{r},&\text{if } c=r \text{ and } r\neq i_{\mathrm{rob}},\end{cases}$$

$$\text{where }\ D_{r}(x) = \begin{cases}\dfrac{1-p(r)}{|N[r]|},&\text{if } x\in\{c\}\times N[r] \text{ and } c=r,\\[1ex] p(r),&\text{if } x=(j^{*},j^{*}).\end{cases}$$

When the cop steps on the robber's vertex ($c=r$) at the end of his turn, the next move of the robber follows the distribution $D_{r}$. The robber is caught by the cop with probability $p(r)$, bringing the play to a final state; otherwise she proceeds as expected: the target state is chosen uniformly at random in the robber's neighbourhood. Variations of this game could be defined through different distributions for $T_{\mathrm{rob}}((c,r),r')$ with $c\neq r$. Likewise, in $D_{r}$, the factor $\frac{1}{|N[r]|}$ could be replaced with any distribution on $N[r]$.
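For concreteness, here is a sketch of this transition function in the style of the earlier snippets, where p is the capture-probability map and the jail pair is our encoding of $(j^{*},j^{*})$.

```python
def T_rob_defending(s, a, N, p, i_rob="i_rob", jail=("j*", "j*")):
    """Example 2.4 (sketch): on a shared vertex the robber is jailed with
    probability p(r); otherwise she walks uniformly on N[r]."""
    c, r = s
    if r == i_rob:
        return {(c, a): 1.0}            # free choice of starting vertex
    if c != r:
        return {(c, x): 1.0 / len(N[r]) for x in N[r]}
    dist = {(c, x): (1.0 - p(r)) / len(N[r]) for x in N[r]}  # D_r
    dist[jail] = p(r)
    return dist
```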

We now present the Cop and Fast Robber game with a surveillance zone, as first formulated in Marcoux [25]. This example is reconsidered further on in Section 3. Chalopin et al. also studied a game of Cop and Fast Robber, with the aim of characterizing graph classes [6].

Example 2.5 (Cop and Fast Robber).

This game is similar to the classic one (Example 2.2), except that the robber is not limited to a single transition. It has been studied by Fomin et al. [10]. We present a variation where the cop can capture the robber when she appears in his watch zone, even in the middle of a path movement. This watch zone can simulate the use of a weapon by the cop. The states now contain, in addition to both players' positions, the set of vertices watched by the cop. We assume here that the cop's watch zone is his neighbourhood, as in Marcoux [25]; Fomin et al.'s version is retrieved with a watch zone consisting of a single vertex, the cop's position. In the initial state, the cop's watch zone is empty, since the robber cannot be captured before her first step. We again use a jail state $j^{*}\notin V$. When both players find themselves there, the game ends and the robber has lost. Hence, we let:

$$\begin{aligned}
i_{0} &= (i_{\mathrm{cop}},\emptyset,i_{\mathrm{rob}})\ \text{ with } i_{\mathrm{cop}},i_{\mathrm{rob}}\notin V,\\
F &= \{(j^{*},\emptyset,j^{*})\},\\
S &= \left(\{(i_{\mathrm{cop}},\emptyset)\}\cup\{(c,N[c])\mid c\in V\}\right)\times\left(\{i_{\mathrm{rob}}\}\cup V\right)\cup F.
\end{aligned}$$

Let $(c,C,r)\in S$ be the current state and $c'\in A_{\mathrm{cop}}$ an action of the cop. Here is the cop's transition function, for $(c,C,r)\not\in F$:

$$T_{\mathrm{cop}}((c,C,r),c') = \begin{cases}\delta_{(c',N[c'],r)},&\text{if } c=i_{\mathrm{cop}} \text{ and } c'\in V, \text{ or if } c\in V \text{ and } c'\in N[c],\\ 0,&\text{otherwise.}\end{cases}$$

As in the classic game, the cop can jump to any vertex in his first move; after that, he moves in the neighbourhood of his current position. His watch zone then changes to $N[c']$. We use $C$ as the watch zone in this definition to emphasize the fact that it does not influence the cop's next state. On her turn, on vertex $r_{1}\in V$, the robber's action consists in choosing a path $\pi=(r_{1},r_{2},\dots,r_{n})$ of finite length $n>0$, that is, $[r_{i},r_{i+1}]$ is an edge in $E$ for each $i=1,2,\dots,n-1$. The robber's transition function is:

$$T_{\mathrm{rob}}((c,C,r_{1}),\pi) = \begin{cases}\delta_{(c,C,r_{n})},&\text{if } r_{1}=i_{\mathrm{rob}} \text{ and } r_{n}\in V\setminus N[c], \text{ or}\\ &\text{if } r_{1}\in V \text{ and } r_{i}\notin C \text{ for all } 2\leq i\leq n,\\ \delta_{(j^{*},\emptyset,j^{*})},&\text{otherwise.}\end{cases}$$

The robber is thus ensured to reach her destination $r_{n}$ provided that she never crosses the cop's watch zone along her path $\pi$. If she does cross it, she is taken to the jail state $(j^{*},\emptyset,j^{*})$.
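The path condition translates into a simple membership test; the following sketch uses our own argument names, with C the watch zone recorded in the state.

```python
def T_rob_fast(state, path, i_rob="i_rob", jail=("j*", frozenset(), "j*")):
    """Example 2.5 (sketch): the robber follows `path` and is jailed as
    soon as she crosses the cop's watch zone C."""
    c, C, r1 = state
    if r1 == i_rob:
        # First placement: only the endpoint matters, and C = N[c] after
        # the cop's first move, so this checks r_n in V \ N[c].
        ok = path[-1] not in C
    else:
        ok = all(v not in C for v in path[1:])
    return {(c, C, path[-1]): 1.0} if ok else {jail: 1.0}
```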

In Section 3, we present this game again, but with the possibility for the robber to evade capture.

Hence, because of Definition 2.1's rather general formulation, it is possible to encode a great variety of random events resulting from the cops' or the robbers' actions. In the following example, we encode a simple inhomogeneous Markov chain by dropping one of the two players. This makes the example fairly degenerate, but it also shows the generality of Definition 2.1.

Example 2.6 (Finite Markov chain).

A Markov chain is a sequence of random variables $X_{0},X_{1},\dots$ on a space $E$, having the Markov property. So we can assume that the evolution is given by an initial distribution $q$ on $E$ and a family of matrices $M_{0},M_{1},\dots$, where $M_{i}(s,s')$ is the probability that $X_{i+1}=s'$ given that $X_{i}=s$. We can encode it as a GPCR game from Definition 2.1. In previous examples, we ignored the third component of states, $S_{\mathrm{o}}$; here we can ignore one of the players' sets, like $S_{\mathrm{rob}}$; equivalently, we can assume a single state for the robber and no effect by $T_{\mathrm{rob}}$. We define:

$$\begin{aligned}
i_{0} &\notin E\\
S &= \{i_{0}\}\cup(E\times\mathbb{N})\\
F &= \emptyset\\
A &= \{1\}\\
T_{\mathrm{cop}}(i_{0},1,(e,0)) &= q(e)\\
T_{\mathrm{cop}}((e,j),1,(e',j+1)) &= M_{j}(e,e').
\end{aligned}$$

Since the action of the player has no influence on the progress of the game, it is natural to define $A$ as a singleton. Technically, a play alternates between the moves of cops and robbers, so it is a sequence $i_{0}1(e_{0},0)1(e_{0},0)1(e_{1},1)1(e_{1},1)\dots$; the repetitions reflect the fact that the robber has no effect. If we ignore the useless information in such a play, we obtain a sequence $i_{0}e_{0}e_{1}e_{2}\dots$, which is just a walk in the Markov chain (and the robber wins). Another way to write down this model would have been to let the two players play similarly, with $T_{\mathrm{rob}}=T_{\mathrm{cop}}$, but the states would then have to be triplets, and the initial state would force a less simple encoding.

Similarly, we can encode a finite-state Markov Decision Process (MDP) with reachability objectives [28] with Definition 2.1. The encoding satisfies that the optimal value of the MDP is 1 if the cops win; otherwise it is 0 and the robber wins.

The probabilistic Zombies and Survivors game on graphs [3] can also be viewed as a GPCR game, one in which only the robbers play optimally. It models a situation in which a single robber (the survivor) tries to escape a set of cops (the zombies). However, the cops have to choose their initial vertices at random and, on each turn, choose randomly among the set of vertices that minimize the distance to the robber.

2.2 Strategies

A deterministic (or pure) strategy is a function that prescribes to a player which action to play on each possible game history. Some strategies are better than others; we will be interested in the probability of winning for the cops, which will be attained by following a strategy. Ultimately, we are interested in memoryless strategies, that is, those that only depend on the present state, and not on the previous moves; nevertheless, we need to define more general strategies as well.

Definition 2.7.

Let $\mathcal{G}$ be a game. A history on $\mathcal{G}$ is an initial fragment of a play on $\mathcal{G}$ ending in a state. $H_{\mathcal{G}}$ is the set of histories on $\mathcal{G}$.

1. The set of general strategies is $\Omega^{\mathrm{g}}=\{\sigma:H_{\mathcal{G}}\rightarrow A\}$.

2. The set of memoryless strategies is $\Omega=\{\sigma:S\rightarrow A\}$.

3. The set of finite horizon strategies is $\Omega^{\mathrm{f}}=\{\sigma:(S\times\mathbb{N})\rightarrow A\}$.

A finite horizon strategy counts the number of turns remaining and is otherwise memoryless. A finite horizon strategy is conveniently defined on $\mathcal{G}$ but is actually played on $\mathcal{G}_{n}$, hence the following definition of how such a strategy is followed. At turn 0 of $h$ (histories $i_{0}$ and $i_{0}a_{0}s_{1}$), there are $n$ turns remaining, so $\sigma$ is evaluated with $n$ in the second coordinate of its argument; at turn 1 (histories $i_{0}a_{0}s_{1}a_{1}s_{2}$ and $i_{0}a_{0}s_{1}a_{1}s_{2}a_{2}s_{3}$), there are $n-1$ turns remaining.

Definition 2.8.

Let $h=i_{0}a_{0}s_{1}a_{1}s_{2}a_{2}s_{3}\dots$ be a (finite or infinite) play of $\mathcal{G}$.

1. $h$ follows a general strategy $\sigma\in\Omega^{\mathrm{g}}$ for the cops if for all $j=0,2,4,\dots$ we have $a_{j}=\sigma(i_{0}a_{0}s_{1}a_{1}s_{2}a_{2}s_{3}\dots s_{j})$. Similarly for the robbers.

2. $h$ follows a memoryless strategy $\sigma\in\Omega_{\mathrm{cop}}$ for the cops if for all $j=0,2,4,\dots$ we have $a_{j}=\sigma(s_{j})$. Similarly for the robbers.

3. $h$ follows a finite horizon strategy $\sigma\in\Omega^{\mathrm{f}}_{\mathrm{cop}}$ on $\mathcal{G}_{n}$ for the cops if for $j=0,2,4,\dots,2n$ we have $a_{j}=\sigma(s_{j},\,n-\frac{j}{2})$.

4. $h$ follows a finite horizon strategy $\sigma\in\Omega^{\mathrm{f}}_{\mathrm{rob}}$ on $\mathcal{G}_{n}$ for the robbers if for $j=1,3,5,\dots,2n+1$ we have $a_{j}=\sigma(s_{j},\,n-\frac{j-1}{2})$.

These strategies are all deterministic, or pure: a single action is chosen. Some papers consider mixed or behavioral strategies, where this choice is randomized. This is unnecessary in our setting because, as is well known in perfect information games, among all optimal strategies, there is always a pure one. We will come back to this when we study optimal strategies later on.

We now present an example where the optimal strategy for the infinite game is memoryless (only depends on the states), but, for any finite horizon game 𝒢n\mathcal{G}_{n}, it is a finite horizon strategy.

Example 2.9.

This example is in the spirit of the Cop and Drunk Robber game, presented in Example 2.3. As in that example, the cop moves in his neighbourhood and so does the robber, who cannot choose her action; the difference with Example 2.3 is that the robber's movement is not uniform. The graph is a cycle of length 5. The robber moves clockwise with probability 0.9, and counterclockwise with probability 0.1. If the cop is at distance 1 from the robber at his turn, of course he wins in this turn. Otherwise, the cop is at distance 2, more specifically at clockwise distance 2 or 3. Let us focus on states $s$ where this clockwise distance is 2 (from the cop to the robber). In the long run, the cop's best choice is to move counterclockwise. However, if only one turn remains, the best move for the cop is the clockwise move, because then with probability $0.1$ the robber will jump to his position, whereas the probability of winning is zero in the counterclockwise direction. So the best strategy $\sigma$ for $\mathcal{G}_{n}$ satisfies $\sigma(s,n)\neq\sigma(s,1)$ in such a state $s$, for $n>1$, hence it is not memoryless. Indeed, for example, $\sigma(s,2)\neq\sigma(s,1)$ because the probability of catching the robber by playing counterclockwise when 2 turns remain is at least $0.9$, whereas it is $0.19$ by playing clockwise ($0.1$ in one move of the robber plus $0.09$ in two moves).
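These values can be checked by brute force. The following sketch (our own code, under the convention that a state is final as soon as the two players share a vertex) computes the optimal capture probability on the 5-cycle; with the cop on vertex 0 and the robber on vertex 2, it returns 0.1 for one remaining turn and $0.91 = 0.9 + 0.1\cdot 0.1$ for two, the counterclockwise value.

```python
from functools import lru_cache

N = 5                     # cycle length
P_CW, P_CCW = 0.9, 0.1    # robber drifts clockwise / counterclockwise

@lru_cache(maxsize=None)
def w(n, cop, rob):
    """Optimal capture probability within n turns, cop to move first."""
    if cop == rob:
        return 1.0
    if n == 0:
        return 0.0
    best = 0.0
    for c in ((cop - 1) % N, cop, (cop + 1) % N):   # cop's possible moves
        if c == rob:
            val = 1.0                               # immediate capture
        else:
            val = sum(p * (1.0 if r == c else w(n - 1, c, r))
                      for p, r in ((P_CW, (rob + 1) % N),
                                   (P_CCW, (rob - 1) % N)))
        best = max(best, val)
    return best

print(w(1, 0, 2))   # 0.1  (the clockwise move is best)
print(w(2, 0, 2))   # 0.91 (the counterclockwise move is best)
```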

2.3 Winning conditions in GPCR games

In this section we are interested in winning strategies for the cops, their probability of winning in a given number $n$ of turns (that is, in $\mathcal{G}_{n}$), and their probability of winning without any limit on the number of turns (in $\mathcal{G}$).

Given finite horizon strategies $\sigma_{\mathrm{cop}}$ and $\sigma_{\mathrm{rob}}$, for the cops and for the robbers, we consider the probability that the robbers are captured in $n$ steps or less:

$$p_{n}(\sigma_{\mathrm{cop}},\sigma_{\mathrm{rob}}) := \mathbb{P}\left[\text{``capture in at most } n \text{ steps''}\mid\sigma_{\mathrm{cop}},\sigma_{\mathrm{rob}}\right].$$

Since the cops want to maximize this probability and the robbers want to minimize it, the probability for the cops to win in $n$ turns or less (playing optimally), whatever the robbers' strategy, is:

$$p_{n}^{*} := \max_{\sigma_{\mathrm{cop}}\in\Omega^{\mathrm{f}}_{\mathrm{cop}}}\ \min_{\sigma_{\mathrm{rob}}\in\Omega^{\mathrm{f}}_{\mathrm{rob}}} p_{n}(\sigma_{\mathrm{cop}},\sigma_{\mathrm{rob}}). \qquad (2)$$

This is in fact the value of $\mathcal{G}_{n}$ in the sense of game theory. In game theory, the value of $\mathcal{G}_{n}$ exists if

$$\max_{\sigma_{\mathrm{cop}}\in\Omega_{\mathrm{cop}}^{\mathrm{g}}}\ \min_{\sigma_{\mathrm{rob}}\in\Omega_{\mathrm{rob}}^{\mathrm{g}}} p_{n}(\sigma_{\mathrm{cop}},\sigma_{\mathrm{rob}}) = \min_{\sigma_{\mathrm{rob}}\in\Omega_{\mathrm{rob}}^{\mathrm{g}}}\ \max_{\sigma_{\mathrm{cop}}\in\Omega_{\mathrm{cop}}^{\mathrm{g}}} p_{n}(\sigma_{\mathrm{cop}},\sigma_{\mathrm{rob}}). \qquad (3)$$

In our setting, defining the payoff of a play as 1 when the robbers are captured and 0 otherwise, we have, by Wal and Wessels [33], that the game $\mathcal{G}_{n}$ has value $p_{n}^{*}$. That the restriction to finite horizon strategies indeed achieves the value of $\mathcal{G}_{n}$ is given again by Wal and Wessels, who call such strategies Markov strategies. Finally, since $\mathcal{G}_{n}$ is finite and with perfect information, a standard game-theoretical argument [27] justifies that the optimal strategies are deterministic (or pure).

We say that the cops and the robbers play optimally in $\mathcal{G}_{n}$ if they each follow a strategy that yields probability $p_{n}^{*}$ for the cops to win. We will show later on, but it is also straightforward from the definition (by induction: for $n+1$, the cops can follow their optimal strategy for $n$ and play anything on the last turn), that $p_{n}^{*}$ is increasing in $n$; since it is moreover bounded by 1, the limit always exists, and we will prove that it is equal to the value of $\mathcal{G}$.

Indeed, from a known result on Simple Stochastic Games (SSG), one can show that $\mathcal{G}$ has a value and that this value is achieved by a pair of optimal strategies that are deterministic (or pure) and memoryless. The argument is well known in the literature on SSGs, but requires a construction, so we leave it to Appendix A. Thus, let us write the value of game $\mathcal{G}$ as $p_{\mathcal{G}}^{*}$, that is,

$$p_{\mathcal{G}}^{*} = \max_{\sigma_{\mathrm{cop}}\in\Omega_{c}}\ \min_{\sigma_{\mathrm{rob}}\in\Omega_{r}} \mathbb{P}\left[\text{``capture in a play''}\mid\sigma_{\mathrm{cop}},\sigma_{\mathrm{rob}}\right], \qquad (4)$$

and the equality still holds when the $\min$ and $\max$ operators are switched. This value is guaranteed by Theorem A.1 [8, 30]. In Proposition 2.16, we will show that the difference between the cops using a finite horizon strategy in $\mathcal{G}_{n}$ and a memoryless one in $\mathcal{G}$ is negligible for a sufficiently large integer $n$.

Equation (2) returns either 0 or 1 in deterministic games such as the Classic Cop and Robber game. We seek here to study games that can be stochastic, where $p_{n}^{*}$ can take any value in $[0,1]$. Thus, we adapt the usual definition of copwin to our broader model.

Definition 2.10.

Let $\mathcal{G}$ be a GPCR game. We say $\mathcal{G}$ is

1. $c(p,n)$-win if the cops can ensure a win with probability at least $p$ in at most $n$ turns, that is, $p_{n}^{*}\geq p$;

2. $p$-copwin if it is $c(p,n)$-win for some $n\in\mathbb{N}$;

3. almost surely copwin if the cops can win when they are allowed to play infinitely, that is, $p_{\mathcal{G}}^{*}=1$;

4. copwin if it is $c(1,n)$-win for some $n\in\mathbb{N}$.

It is easy to see that when $\mathcal{G}$ corresponds to the Classic Cop and Robber game, as defined in Example 2.2, this definition of copwin coincides with the classical one. In that sense, it can be considered a generalization of the classical one, because in any copwin finite graph, the cop wins in at most $n=|V(G)|^{2}$ turns.

Remark 2.11.

We will see in Proposition 2.16 that $\lim_{n\to\infty}p_{n}^{*}=p_{\mathcal{G}}^{*}$. Thus, if there exists $n$ such that $p_{n}^{*}>0$, and if all states reachable within a finite number of moves of the cops' optimal strategy are in the same strongly connected component, then $p_{\mathcal{G}}^{*}=1$. Indeed, after $n$ turns, if the play is not over, the cops can go back to the configuration where $p_{n}^{*}>0$: the initial position that is proposed by the cops' strategy. In that state, the probability that the robbers have not been caught is at most $1-p_{n}^{*}$; the probability that the robbers are not caught after $m$ repetitions of this cycle is at most $(1-p_{n}^{*})^{m}$. It is thus zero in the limit. This happens, for example, if $p_{n}^{*}>0$ and $\mathcal{G}$ is played on a strongly connected graph. However, we cannot, in general, claim that if $p_{n}^{*}>0$ after $n>|S|$ turns have been played, then $p_{\mathcal{G}}^{*}=1$.

We define a probabilistic analogue of the cop number $c(G)$, which is the minimal number of cops required on a graph $G$ in order for the cops to capture the robbers. It is an important subject of research in Classic Cops and Robbers games [4], in particular relating to Meyniel's conjecture that $c(G)\in O\!\left(\sqrt{|V(G)|}\right)$. Furthermore, one of the main areas of research on Cops and Robbers games that involve random events is the expected capture time of the robbers [22, 21, 19]. Thus, we also generalize the expected capture time of the robbers to any game $\mathcal{G}$.

Adding cops to a game $\mathcal{G}$ is done in the natural way: the set of cop states $S_{\mathrm{cop}}$ is the Cartesian product of the sets of single-cop positions, and the transition function is updated so as to let all cops move in one step.

Definition 2.12.

The $(p,n)$-cop number $c_{p}^{n}(\mathcal{G})$ of a game $\mathcal{G}$ is the minimal number of cops required for the capture of the robbers in at most $n$ turns with probability at least $p$. In other words, $c_{p}^{n}(\mathcal{G})$ is the minimal number of cops required for a game $\mathcal{G}$ to be $c(p,n)$-win. The $p$-cop number, $c_{p}(\mathcal{G})=c_{p}^{\infty}(\mathcal{G})$, is the minimal number of cops necessary for having $p_{\mathcal{G}}^{*}\geq p$.

Let $T_{\mathcal{G}}^{p}$ be the random variable giving the number of turns required for the robbers to be captured with probability at least $p$ in $\mathcal{G}$ under optimal strategies. Then, the $p$-expected capture time of the robbers is $\mathbb{E}\left[T_{\mathcal{G}}^{p}\right]$. The expected capture time of the robbers is $\mathbb{E}\left[T_{\mathcal{G}}^{1}\right]$.

Since some of the optimal strategies of $\mathcal{G}$ are memoryless, we can turn the question of computing $\mathbb{E}\left[T_{\mathcal{G}}^{p}\right]$ into that of computing an expected hitting time in a Markov chain. Let us write $\sigma_{\mathrm{cop}}^{*}$ ($\sigma_{\mathrm{rob}}^{*}$) for the optimal strategy of the cops (robbers) in $\mathcal{G}$, and let $\mathcal{M}$ be the Markov chain that has, for any state $s\in S$, the two states $(s,\sigma_{\mathrm{cop}}^{*}(s))$ and $(s,\sigma_{\mathrm{rob}}^{*}(s))$. Furthermore, let $M$ be its transition matrix, which is governed by the distributions $T_{\mathrm{cop}}(s,\sigma_{\mathrm{cop}}^{*}(s))$ and $T_{\mathrm{rob}}(s,\sigma_{\mathrm{rob}}^{*}(s))$. Suppose $(X_{n})_{n\geq 0}$ describes the stochastic process on $\mathcal{M}$ beginning at the initial state $i_{0}$; then $T:=\frac{1}{2}\min\{n\geq 0 : X_{n}\in F\}$ is the hitting time of $F$ from $i_{0}$, expressed in turns (each turn consists of two moves). The expectation of $T$ is $\mathbb{E}\left[T_{\mathcal{G}}^{1}\right]$.
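Once $M$ and $F$ are fixed, this expected hitting time is the solution of a linear system. Here is a sketch with NumPy, assuming $F$ is reached with probability 1 from every state (otherwise the system is singular and the expectation is infinite):

```python
import numpy as np

def expected_capture_time(M, final):
    """Expected number of turns to hit `final` (a set of state indices)
    from each state of a finite Markov chain with row-stochastic matrix M,
    by solving (I - Q) h = 1 on the non-final states; the result is halved
    since one game turn corresponds to two chain steps."""
    n = M.shape[0]
    rest = [i for i in range(n) if i not in final]
    Q = M[np.ix_(rest, rest)]
    h = np.linalg.solve(np.eye(len(rest)) - Q, np.ones(len(rest)))
    steps = np.zeros(n)
    steps[rest] = h
    return steps / 2.0
```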

2.4 Solving GPCR games

As with Bonato and MacGillivray's model, we define a method for solving GPCR games, that is, for computing the probability that the cops capture the robbers in an optimal play, and the strategy to follow. This method takes the form of a recursion defining the probability $w_{n}(s)$ that state $s$ leads to a final state in at most $n$ steps ($w$ stands for winning in the following theorem). This recursion gives a strategy for the cops.

Theorem 2.13.

Let $\mathcal{G}$ be a GPCR game, and let:

$$w_{0}(s) := \begin{cases}1,&\text{if } s\in F,\\ 0,&\text{otherwise;}\end{cases}$$

$$w_{n}(s) := \begin{cases}1,&\text{if } s\in F,\\[1ex] \displaystyle\max_{a\in A_{\mathrm{cop}}(s)}\ \sum_{s'\in S}T_{\mathrm{cop}}(s,a,s')\ \min_{a'\in A_{\mathrm{rob}}(s')}\ \sum_{s''\in S}T_{\mathrm{rob}}(s',a',s'')\,w_{n-1}(s''),&\text{otherwise.}\end{cases} \qquad (5)$$

Then $w_{n}(s)$ gives the probability for the robbers to be captured in $n$ turns or less, given that both players play optimally, starting in state $s$. Thus,

$$w_{n}(i_{0})=p_{n}^{*}.$$

This also says that $\mathcal{G}$ is $c(p,n)$-win if and only if $w_{n}(i_{0})\geq p$. For $(s,k)\in S\times\mathbb{N}$, let $\sigma^{*}_{\mathrm{cop}}(s,k)$ be given by the $\operatorname{argmax}$ in place of the $\max$ in Equation (5); the argmax is not necessarily unique. This defines finite horizon strategies that are optimal in $\mathcal{G}_{n}$.

The recursive part of $w_{n}$'s definition reads as follows: to win, the cops must take the best action $a$; this leads them to state $s'$ with probability $T_{\mathrm{cop}}(s,a,s')$; from this state, the robbers choose the action $a'$ that gives them the smallest probability of being caught. Action $a'$ leads the robbers to state $s''$ with probability $T_{\mathrm{rob}}(s',a',s'')$, which is multiplied by the probability that the cops catch the robbers from this state, $w_{n-1}(s'')$. Since the cops want a high probability, a maximum is taken; it is the converse for the robbers. The full equation gives the expected probability of capture of the robbers by the cops when both players move optimally.

Proof.

The proof is by induction on $n$. We prove that $w_{n}(s)$ gives the probability for the robbers to be captured in $n$ turns or less, given that both players play optimally, starting in state $s$. Let $s$ be any state.

If $n=0$, then the cops win if and only if $s\in F$, in which case, by definition, we do have $w_{0}(s)=1$. Otherwise the robbers win and $w_{0}(s)=0$, as wanted.

If $n>0$, suppose the result holds for $n-1$ and let $s$ be the current state. If this state is final, then the robbers are caught in $n$ turns or less with probability 1 and $w_{n}(s)=1$, as desired. Otherwise, let the cops, playing first, choose an action $a_{\mathrm{cop}}\in A_{\mathrm{cop}}(s)$, after which the next state $s'$ is drawn according to $T_{\mathrm{cop}}(s,a_{\mathrm{cop}},s')$. Then, the robbers can choose an action $a_{\mathrm{rob}}\in A_{\mathrm{rob}}(s')$, in which case the next state $s''$ is drawn with probability $T_{\mathrm{rob}}(s',a_{\mathrm{rob}},s'')$. By the induction hypothesis, we know a final state will be encountered in $n-1$ turns or less with probability $w_{n-1}(s'')$ starting from state $s''$. Thus, the probability that the robbers are caught in $n$ turns or less by playing action $a_{\mathrm{rob}}$ after the cops have reached state $s'$ is given by:

$$\sum_{s''\in S}T_{\mathrm{rob}}(s',a_{\mathrm{rob}},s'')\,w_{n-1}(s'').$$

Note that if $s'\in F$, this value is exactly $w_{n-1}(s')$, since by definition we must have $T_{\mathrm{rob}}(s',a_{\mathrm{rob}},s'')=1$ if $s''=s'$ and 0 otherwise. The robbers wish to minimize this value over their set of available actions, which is possible since both sets $S$ and $A_{\mathrm{rob}}$ are finite. Hence, supposing action $a_{\mathrm{cop}}\in A_{\mathrm{cop}}(s)$ has been chosen by the cops, the game stochastically transits to some other state $s'\in S$ with probability $T_{\mathrm{cop}}(s,a_{\mathrm{cop}},s')$. Thus, with probability

$$\sum_{s'\in S}T_{\mathrm{cop}}(s,a_{\mathrm{cop}},s')\ \min_{a_{\mathrm{rob}}\in A_{\mathrm{rob}}(s')}\ \sum_{s''\in S}T_{\mathrm{rob}}(s',a_{\mathrm{rob}},s'')\,w_{n-1}(s''),$$

the robbers are caught in at most $n$ turns from state $s$ when the cops play action $a_{\mathrm{cop}}$. The cops want to maximize this value and, as for the robbers, this is possible because the considered sets are finite. Thus, the cops must play the action

$$\operatorname{argmax}_{a_{\mathrm{cop}}\in A_{\mathrm{cop}}(s)}\ \sum_{s'\in S}T_{\mathrm{cop}}(s,a_{\mathrm{cop}},s')\ \min_{a_{\mathrm{rob}}\in A_{\mathrm{rob}}(s')}\ \sum_{s''\in S}T_{\mathrm{rob}}(s',a_{\mathrm{rob}},s'')\,w_{n-1}(s'').$$

The claim about $\sigma^{*}_{\mathrm{cop}}$ is straightforward from this result. The choices of actions at the initial state thus give the probability $w_{n}(i_{0})$. Because $p_{n}^{*}$ is, by definition, the probability of capture of the robbers in $n$ turns or less when both players play optimally, we conclude that $w_{n}(i_{0})=p_{n}^{*}$. ∎

This result implies that the $w_{n}$'s are probabilities that increase with $n$. In other words, we have the following corollary.

Corollary 2.14.

For any $n\in\mathbb{N}$ and $s\in S$, we have $0\leq w_{n}(s)\leq w_{n+1}(s)\leq 1$.
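Equation (5) is a dynamic program that can be transcribed directly into code. The sketch below assumes a game object in the style of the earlier GPCR container, with finite state and action collections and transition maps returning dictionaries of probabilities; it also records the argmax actions, giving the finite horizon strategy of Theorem 2.13.

```python
def solve(game, n):
    """Compute w_n (Equation (5)) and a finite horizon strategy
    sigma[(s, k)]: an argmax action with k turns remaining."""
    w = {s: 1.0 if s in game.F else 0.0 for s in game.S}   # w_0
    sigma = {}
    for k in range(1, n + 1):
        w_next = {}
        for s in game.S:
            if s in game.F:
                w_next[s] = 1.0
                continue
            best_val, best_act = -1.0, None
            for a in game.A_cop(s):
                # cops' expected value: robbers reply with the action
                # minimizing the capture probability
                val = sum(
                    p * min(sum(q * w[s2]
                                for s2, q in game.T_rob(s1, a1).items())
                            for a1 in game.A_rob(s1))
                    for s1, p in game.T_cop(s, a).items())
                if val > best_val:
                    best_val, best_act = val, a
            w_next[s], sigma[(s, k)] = best_val, best_act
        w = w_next
    return w, sigma
```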

Note that there are many optimal strategies for the cops in $\mathcal{G}_{n}$, that is, strategies that have value $p_{n}^{*}$, but they are not all equally efficient. Consider a game $\mathcal{G}_{n}$ where the robbers can be caught in $k<n$ turns with probability 1, and let $\sigma_{k}$ be an optimal strategy for $\mathcal{G}_{k}$. Then the strategy that stays idle for $n-k$ turns and then behaves as prescribed by $\sigma_{k}$ is optimal, but not efficient, and it respects the argmax of Equation (5). The next proposition shows how to define an efficient one.

Proposition 2.15.

For each $N\in\mathbb{N}$, there exists an optimal strategy $\sigma_{N}^{*}$ with horizon $N$ that satisfies: for all $s\in S$, if $w_{N_{1}}(s)=w_{N_{2}}(s)$ for $N_{1}\leq N_{2}\leq N$, then $\sigma_{N}^{*}(s,N_{1})=\sigma_{N}^{*}(s,N_{2})$. Similarly for the robbers.

Proof.

For any $(s,m)\in S\times\mathbb{N}$, we denote by $\mathrm{ACT}(s,m)$ the set of actions that achieve the maximum in Equation (5) for $w_{m}(s)$. We have proved in Theorem 2.13 that any strategy satisfying $\sigma(s,m)\in\mathrm{ACT}(s,m)$ for all $(s,m)\in S\times\mathbb{N}$ is optimal. Let us prove that if $w_{N_{1}}(s)=w_{N_{1}+1}(s)$ for $N_{1}\in\mathbb{N}$, then $\mathrm{ACT}(s,N_{1})\subseteq\mathrm{ACT}(s,N_{1}+1)$. By contradiction, let $k$ be the smallest integer such that there is a state $s$ and an action $a\in\mathrm{ACT}(s,k)\setminus\mathrm{ACT}(s,k+1)$. By induction, the cops play action $a$ at time $k+1$, and then, with horizon $k$, they choose an optimal action in $\sigma_{k-1}^{*}$ that is also in $\sigma_{k}^{*}$ (possible by minimality of $k$), and so on until the last turn, where they stay in place or play any available action. This gives us a value of at least $w_{k}(s)$ and, by definition, at most $w_{k+1}(s)$. Since $w_{k}(s)=w_{k+1}(s)$, the finite horizon strategy defined above is optimal. This is a contradiction, since $a$ should then be in $\mathrm{ACT}(s,k+1)$. Thus we obtain that if $w_{N_{1}}(s)=w_{N_{2}}(s)$ for $N_{1}\leq N_{2}\leq N$, then $\mathrm{ACT}(s,N_{1})\subseteq\mathrm{ACT}(s,N_{2})$, for all $N_{1}\leq N_{2}\leq N$. Hence the wanted strategy exists. The argument is similar for the robbers. ∎

Although $w_{n}(i_{0})$ only gives the value of the game $\mathcal{G}_{n}$ with finite horizon strategies, we can show that this relation, as a function of $n$, converges to the value of $\mathcal{G}$.

Proposition 2.16.

The value of $\mathcal{G}$ is $\lim_{n\to\infty}p_{n}^{*}$. Furthermore, the optimal strategies of $\mathcal{G}_{n}$ are $\epsilon$-optimal strategies of $\mathcal{G}$ for any $\epsilon>0$ and sufficiently large integer $n$.

Proof.

From a previous argument, we know that some pair $(s_{c},s_{r})$ of optimal memoryless strategies for the cops and the robbers yields a probability $p_{\mathcal{G}}^{*}$ of winning for the cops. It holds that $p_{n}^{*}\leq p_{\mathcal{G}}^{*}$ for any integer $n$, since the value of $\mathcal{G}_{n}$ can only be at most the value of $\mathcal{G}$. Since $p_{n}^{*}$ is non-decreasing in $n$ and bounded above by $p_{\mathcal{G}}^{*}$, we have $\lim_{n\to\infty}p_{n}^{*}\leq p_{\mathcal{G}}^{*}$.

Now, let us play the strategies $(s_{c},s_{r})$ chosen above in the game $\mathcal{G}_{n}$, for any integer $n$. Consider the probability that the cops win in $\mathcal{G}_{n}$ when both players follow those strategies. These probabilities, for each $n$, form a sequence $(v_{n})_{n\in\mathbb{N}}:=(v_{1},v_{2},\dots)$. This sequence is non-decreasing and bounded above by $p_{\mathcal{G}}^{*}$.

Let $A_{n}$ be the event ``there is a capture in at most $n$ turns under strategies $s_{c}$ and $s_{r}$''. Observe that $A_{0}\subseteq A_{1}\subseteq\dots$ is a non-decreasing sequence. Thus, by the Monotone Convergence Theorem:

$$\begin{aligned}
p_{\mathcal{G}}^{*} &= \mathbb{P}\left[\{h\mid h\text{ is a play following } s_{c},s_{r}\text{ where the cops win}\}\right]\\
&= \mathbb{P}\left[\cup_{i=0}^{\infty}A_{i}\right]\\
&= \lim_{n\to\infty}\mathbb{P}\left[A_{n}\right]\\
&= \lim_{n\to\infty}v_{n}.
\end{aligned}$$

Thus, for any $\epsilon>0$ there exists an integer $N$ such that for all $n\geq N$, $p_{\mathcal{G}}^{*}-v_{n}<\epsilon$. But we also have $v_{n}\leq p_{n}^{*}$ for any integer $n$, since $w_{n}(i_{0})$ is the value of $\mathcal{G}_{n}$. Hence, it follows that $0\leq p_{\mathcal{G}}^{*}-p_{n}^{*}\leq p_{\mathcal{G}}^{*}-v_{n}=\left\lvert p_{\mathcal{G}}^{*}-v_{n}\right\rvert<\epsilon$. This completes the proof. ∎

It is interesting to note that this result only applies if there are best strategies for the cops and the robbers. In particular, it is not true if $\mathcal{G}$ is played on the infinite graph of the following example.

Example 2.17.

Consider an infinite star graph with a central vertex from which paths of length $n$ are deployed, for every integer $n$, and consider the Classic Cops and Robbers game $\mathcal{G}$ on this graph with one cop and one robber. The best move for the cop is to start on the (infinitely branching) central vertex. Then, whatever vertex the robber chooses, the cop will catch her in a finite number of turns, so this graph is almost surely copwin in the sense of Definition 2.10. However, this number of turns is unbounded, so when playing in $\mathcal{G}_{n}$, the robber can simply choose a vertex at distance greater than $n$; hence the value of $\mathcal{G}_{n}$ is 0 for all $n$. The proof of the proposition fails in this case because, the graph being infinite, there is no optimal strategy for the robber in $\mathcal{G}$: whatever vertex the robber chooses, there is always a farther vertex that would allow her to survive for more turns, that is, there is always a better strategy.

Under certain conditions, which will be further studied in Subsection 2.6, the sequence $(w_{n})_{n\in\mathbb{N}}$ becomes constant.

Definition 2.18.

We say that $(w_{n})_{n\in\mathbb{N}}$ is stationary if there exists an integer $N\in\mathbb{N}$ such that $w_{n}(s)=w_{n+1}(s)$ for all $n>N$ and $s\in S$. We write $\overline{w}$ for the stationary part of $(w_{n})_{n\in\mathbb{N}}$.

Remark 2.19.

It follows from the definition of $w_{n}$ that if, for some $N$, $w_{N}(s)=w_{N+1}(s)$ for all $s\in S$, then $(w_{n})_{n\in\mathbb{N}}$ is stationary and $\overline{w}$ starts at $n=N$ or less.

From Theorem 2.13 we deduce Theorem 2.20, which is more in line with traditional game-theoretical arguments and shows that, in addition to the equality $\lim_{n\to\infty}w_{n}(i_{0})=p_{\mathcal{G}}^{*}$, we can explicitly compute the optimal strategy of the cops in $\mathcal{G}$ from the limit of the $w_{n}$'s.

Theorem 2.20.

The (point-wise) limit $w_{\infty}:=\lim_{n\to\infty}w_{n}$ exists and satisfies

$$w_{\infty}(s) = \begin{cases}1,&\text{if } s\in F,\\[1ex] \displaystyle\max_{a\in A_{\mathrm{cop}}(s)}\ \sum_{s'\in S}T_{\mathrm{cop}}(s,a,s')\ \min_{a'\in A_{\mathrm{rob}}(s')}\ \sum_{s''\in S}T_{\mathrm{rob}}(s',a',s'')\,w_{\infty}(s''),&\text{otherwise.}\end{cases} \qquad (6)$$

Moreover, the optimal (memoryless) strategy for the cops in $\mathcal{G}$, from any state $s$, can be retrieved as a cops' action for which the maximum in Equation (6) is achieved.

Proof.

Let $L$ be the lattice of functions $S\rightarrow[0,1]$, ordered point-wise, with the null function as bottom element $\bot$. Equation (2.13) determines the following function $\mathcal{F}:L\rightarrow L$: for $f:S\to[0,1]$ and $s\in S$,

$$\mathcal{F}(f)(s):=\begin{cases}1, & \text{if } s\in F,\\[4pt] \displaystyle\max_{a\in A_{\mathrm{cop}}(s)}\sum_{s'\in S}T_{\mathrm{cop}}(s,a,s')\min_{a'\in A_{\mathrm{rob}}(s')}\sum_{s''\in S}T_{\mathrm{rob}}(s',a',s'')\,f(s''), & \text{otherwise.}\end{cases}$$

From previous remarks, $\mathcal{F}$ is monotone increasing. Thus, we deduce from the Knaster-Tarski fixed point theorem [14] that $\mathcal{F}$ has a least fixed point given by $w_{\infty}:=\lim_{n\to\infty}\mathcal{F}^{n}(\bot)$. Furthermore, we have $\mathcal{F}(\bot)=w_{0}$ and $\mathcal{F}(w_{n-1})=w_{n}$, so $\mathcal{F}^{n+1}(\bot)=w_{n}$ for every integer $n$; thus $w_{\infty}=\lim_{n\to\infty}w_{n}$ and it satisfies Equation (6).

We showed in Theorem 2.13 that $w_{n}(i_{0})=p_{n}^{*}$, and in Proposition 2.16 that $\lim_{n\to\infty}p_{n}^{*}=p_{\mathcal{G}}^{*}$. Consequently, $w_{\infty}(i_{0})=p_{\mathcal{G}}^{*}$; hence $w_{\infty}(i_{0})$ is the probability that the cops capture the robbers when both teams play optimally. Similarly, one can show that $w_{\infty}(s)$ is the probability that, starting at $s$, the cops capture the robbers when both teams play optimally. This, together with the fact that $w_{\infty}$ satisfies Equation (6), implies that the optimal strategy for the cops is coherent with an action achieving the $\operatorname{argmax}$ in place of the $\max$ operator in Equation (6). One cannot choose just any such action: a temporarily bad action, such as staying idle, can give the same probability of winning as another action, but it can be chosen only a finite number of times, which is incompatible with a memoryless strategy. ∎

Remark 2.21.

Recall that $w_{n}(i_{0})=p_{n}^{*}$ and that, by definition, $p_{n}^{*}=\min_{\sigma_{\mathrm{rob}}\in\Omega_{\mathrm{rob}}^{\mathrm{g}}}\max_{\sigma_{\mathrm{cop}}\in\Omega_{\mathrm{cop}}^{\mathrm{g}}}p_{n}(\sigma_{\mathrm{cop}},\sigma_{\mathrm{rob}})$. Thus, we could have defined $w_{n}(i_{0})$ with the operators $\min$ and $\max$ switched. We can then deduce the optimal robbers' strategies by flipping those operators and replacing the $\min$ operator by an $\operatorname{argmin}$ operator. This also holds for $w_{\infty}$.
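To make the fixed-point characterization concrete, the following minimal Python sketch implements one application of the operator $\mathcal{F}$ of the proof of Theorem 2.20 (equivalently, one step of the recursion $w_{n}$) and iterates it from the null function. The data layout is our own illustrative choice, not notation from the paper: states are dictionary keys, `A_cop(s)` and `A_rob(s)` return non-empty action sets, and `T_cop(s, a)`, `T_rob(s, a)` return dictionaries mapping successor states to probabilities; the stopping rule is a numerical tolerance rather than the exact fixed point.

```python
def apply_F(w, S, F, A_cop, A_rob, T_cop, T_rob):
    """One application of the operator F of Theorem 2.20 (Equation (6)).

    w maps states to values in [0, 1]; S is the state space; F the set of
    final states.  All names here are illustrative conventions."""
    new_w = {}
    for s in S:
        if s in F:
            new_w[s] = 1.0
            continue
        # max over cop actions of the expected value, where the robbers
        # reply at each cop-successor state s1 so as to minimize.
        new_w[s] = max(
            sum(
                p * min(
                    sum(q * w[s2] for s2, q in T_rob(s1, a1).items())
                    for a1 in A_rob(s1)
                )
                for s1, p in T_cop(s, a).items()
            )
            for a in A_cop(s)
        )
    return new_w

def approximate_w_inf(S, F, A_cop, A_rob, T_cop, T_rob, eps=1e-9):
    """Iterate F from the bottom element (the null function); the
    (n+1)-st iterate is w_n and the limit is w_infinity."""
    w = {s: 0.0 for s in S}
    while True:
        new_w = apply_F(w, S, F, A_cop, A_rob, T_cop, T_rob)
        if max(abs(new_w[s] - w[s]) for s in S) < eps:
            return new_w
        w = new_w
```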

Now, with the help of Equation (2.13) we can generalize the classic theorem of Cops and Robbers games. This is done in the next corollary.

Corollary 2.22.

Let $\mathcal{G}$ be a GPCR game. Then $\mathcal{G}$ is copwin if and only if the sequence $(w_{n})_{n\in\mathbb{N}}$ is stationary and

$$\overline{w}(i_{0})=1.$$

Moreover, the game is $p$-copwin if and only if the sequence is stationary and

$$\overline{w}(i_{0})\geq p.$$

If $\mathcal{G}$ is not $p$-copwin for any $p$, then the game is almost surely copwin if and only if the sequence is not stationary and

$$w_{\infty}(i_{0})=1.$$
Remark 2.23.

If the GPCR game $\mathcal{G}$ is deterministic, then $w_{n}(s)$ is $0$ or $1$ for any $n\in\mathbb{N}$ and $s\in S$. It therefore follows from the monotonicity of $(w_{n})_{n\in\mathbb{N}}$ (see Corollary 2.14) and from Remark 2.19 that the stationary part starts at some $N\leq\left\lvert S\right\rvert$. Indeed, if $w_{n}\neq w_{n+1}$, there is at least one $s$ such that $w_{n}(s)=0$ and $w_{n+1}(s)=1$; this difference can be observed at most $\left\lvert S\right\rvert$ times.

The conditions under which $(w_{n})_{n\in\mathbb{N}}$ is stationary are presented in Proposition 2.26.

2.5 The computational complexity of the $w_{n}$ recursion

We now show a result on the algorithmic complexity of computing the function $w_{n}$ (Equation (2.13)). This function is computable with dynamic programming, yet it may require a large number of operations, especially as its complexity is a function of the size of the state space. Recall that Equation (2.13) was devised to be as general and efficient as possible. However, given the context of Definition 2.1, the best one can hope for is a complexity polynomial in the size of the state and action spaces.

Proposition 2.24.

In the worst case and under a dynamic programming approach, computing $w_{n}$ requires $O\!\left(n\left\lvert S\right\rvert^{3}\max|A_{\mathrm{cop}}|\max|A_{\mathrm{rob}}|\right)$ operations, where $\max|A_{\mathrm{cop}}|$ stands for $\max_{s\in S}\left\lvert A_{\mathrm{cop}}(s)\right\rvert$, and similarly for $\max|A_{\mathrm{rob}}|$. The spatial complexity is $O\!\left(n\left\lvert S\right\rvert\right)$.

Proof.

Let $a_{n}$ be the number of operations required for computing the recursion of $w_{n}$. Assume that computing the probabilities $T_{\mathrm{cop}}$ and $T_{\mathrm{rob}}$ has unit cost. Clearly, $a_{0}=1$. In the worst case, when $n>0$, all elements of the sets $A_{\mathrm{cop}}$ and $A_{\mathrm{rob}}$ must be considered in order to ensure optimality of the actions chosen, and thus $\max|A_{\mathrm{rob}}|\max|A_{\mathrm{cop}}|$ combinations of actions are examined. We always have $\left\lvert S\right\rvert\geq\left\lvert\left\{s'\in S\mid T_{\mathrm{cop}}(s,a_{\mathrm{cop}},s')>0\right\}\right\rvert$, and similarly for $T_{\mathrm{rob}}(s,a_{\mathrm{rob}},s')$. Then, in the worst case,

$$a_{n} \leq \left\lvert S\right\rvert^{3}\max|A_{\mathrm{rob}}|\max|A_{\mathrm{cop}}| + a_{n-1} \leq n\left\lvert S\right\rvert^{3}\max|A_{\mathrm{cop}}|\max|A_{\mathrm{rob}}| + 1,$$

where we assumed that all values of $w_{n-1}$ are kept in memory at every step. Memorizing those values requires a spatial complexity of at most $O\!\left(n\left\lvert S\right\rvert\right)$. The final complexity is thus $O\!\left(n\left\lvert S\right\rvert^{3}\max|A_{\mathrm{cop}}|\max|A_{\mathrm{rob}}|\right)$. ∎

Consequently, both spatial and temporal algorithmic complexities depend on the three sets $S$, $A_{\mathrm{cop}}$ and $A_{\mathrm{rob}}$. This suggests that these complexities may be high if the number of available actions is. One could imagine a game in which actions are paths, resulting in complexity exponential in $\left\lvert S\right\rvert$. Still, whenever $|A_{\mathrm{cop}}|\in O(p(\left\lvert S\right\rvert))$ and $|A_{\mathrm{rob}}|\in O(q(\left\lvert S\right\rvert))$ for some polynomials $p$ and $q$, Equation (2.13) is clearly computable in time polynomial in the size of $S$. Moreover, as we will see in Corollary 2.27, $w_{n}$ does not have to be computed for all $n$ in order to determine whether the cops have a winning strategy; essentially, $n=\left\lvert S\right\rvert$ suffices. In many studied cases, $|S|$ is itself polynomial in the size of the structure on which the game is played, leading each time to polynomial-time algorithms for solving the game.

2.6 A stationarity result

In traditional games of Cops and Robbers where a relation $\preceq_{n}$ is defined (such as the classic game [26] and the game with $k$ cops [7]), it is useful to prove results on the convergence of the recursion $\preceq_{n}$. One demonstrates that the relation becomes stationary, that is, there exists a number $N\in\mathbb{N}$ such that for all integers $n>N$ and all pairs of vertices $(u,v)\in V^{2}$, if $u\preceq_{n}v$, then $u\preceq_{n+1}v$. One then writes $\preceq$ for the stationary part of the sequence, i.e. $\preceq\,=\,\preceq_{N}$. This result is vital for solving Cops and Robbers games, as it ensures the relation $\preceq$ can be computed in finite time.

Contrary to the relation $\preceq_{n}$ found in deterministic Cops and Robbers games (such as the classic game in Example 2.2), the recursion $w_{n}$ does not always become stationary. For example, consider the triangle $K_{3}$ with one cop and one robber: although it is copwin in the classical sense, whenever one adds a probability of capture on the vertices, say $1/M$ for $M>0$, then after $n$ turns the cop will have captured the robber with probability only $1-(1-\frac{1}{M})^{n}$. Thus, after $n$ turns, the cop can only ensure a probability of capture strictly less than $1$, although he can clearly win with probability $p$ for any $p\in[0,1]$. In other words, a game may be almost surely copwin, but not $c(1,n)$-win for any integer $n$. In the following proposition, we formulate and prove an upper bound on the minimal number of steps $n$ required to determine $p_{\mathcal{G}}^{*}$, the probability of capture in an infinite game.

Recall that it does not hold in general that, in a copwin graph (in the classical sense of one cop against one robber), every optimal strategy of the cop prevents him from visiting a vertex more than once [5]. Were this true, we could easily upper bound the capture time of the robber. However, we show in Lemma 2.25 that a milder version of this result holds for states, instead of mere cop positions.

To gain an intuition of why the following lemma is true, it is important to note that the condition of stationarity is a very strong one. The contrapositive of the lemma may be more informative: the only way for $w_{n}$ to become stationary is that no loop is possible in any play following the optimal strategies of the players. An example of a graph where this fails is a cycle of length 3 in which the robber moves in both directions with equal probability in every state: there are plays where the robber is caught only after an arbitrarily large number of turns. On the other hand, an acyclic graph does induce stationarity of $w_{n}$.

Lemma 2.25.

Suppose $(w_{n}(s))_{n\in\mathbb{N}}$ is stationary at $N>0$ in a game $\mathcal{G}$ for a state $s$, and that the cops and robbers follow their optimal strategies from Proposition 2.15. Then every play from $s$ that is winning for the cops visits any given state at most once (at the end of a turn).

Proof.

We prove the result for both the cops and the robbers, that is, in a winning play where they follow their optimal strategies, neither of them visits the same state twice on their turn. Because of stationarity and Proposition 2.16, the optimal strategy for the cops in $\mathcal{G}$ is also optimal in $\mathcal{G}_{n}$, $n\geq N$. Suppose the lemma is false. Then there is a winning play $\pi$ (i.e., reaching $F$ in $N$ turns or less) from state $s$ containing a loop through a state $s_{k}$ that is thus reached twice by the same player, the second time at $s_{l}$, $k<l$ (with $k$ and $l$ of the same parity). This play follows the optimal strategies of the players. None of the states of the loop are in $F$, by definition of a play. Consider the set $\Pi$ of plays $\pi_{i}$ that start as $\pi$ until the first occurrence of the loop state, repeat the loop fragment $s_{k}a_{k}s_{k+1}\ldots a_{l-1}s_{l}$ for $i$ times, and then continue as the fragment of $\pi$ after it exits $s_{l}=s_{k}$ for the last time. These are plays (in particular, they alternate between the players). All these plays are winning (one of them may be $\pi$), but infinitely many of them reach $F$ at a turn greater than $N$. If we prove that these plays follow the optimal strategies, this contradicts stationarity, as the robbers are caught in more than $N$ turns in infinitely many of them, which implies that the value of $\mathcal{G}_{N+k}$ is strictly greater than the value of $\mathcal{G}_{N}$ for infinitely many $k$.

Any play of $\Pi$ does follow the optimal strategies. Indeed, since $\pi$ follows the optimal memoryless strategies, every time the play reaches state $s_{k}=s_{l}$, the same action is chosen for the player; in the first occurrence it leads into the loop, in the last one it leads out of it. This is possible because the action leads to a stochastic next state. ∎

Note that the lemma fails if the robbers do not play well. Indeed, consider the very simple deterministic game played on a cycle of length greater than 3; a robber can avoid capture indefinitely by traveling away from the cop, so $(w_{n}(s))_{n\in\mathbb{N}}$ is stationary for every state $s$. Consider a play where the robber decides to stop after having traveled 8 times around the cycle. The play is winning for the cop, but even if the cop follows the optimal strategy, the same state is encountered 8 times.

Proposition 2.26.

Let $\mathcal{G}$ be a GPCR game and $s\in S$. Then the recursion $w_{n}$ defined by Equation (2.13) is such that:

  1. if $w_{\left\lvert S\right\rvert}(s)=0$, then $w_{\left\lvert S\right\rvert+k}(s)=0$ for every $k>0$;

  2. if $w_{\left\lvert S\right\rvert+1}(s)>w_{\left\lvert S\right\rvert}(s)$, then $(w_{n}(s))_{n\in\mathbb{N}}$ is not stationary.

Proof.

For the first claim, assume that $w_{\left\lvert S\right\rvert+k}(s)>0$. Then there is a path $\pi$ from state $s$ to a final state in $F$ that follows $\sigma^{*}_{\left\lvert S\right\rvert+k}$ (and that has positive probability). If this path is longer than $\left\lvert S\right\rvert$, then it contains a repetition of at least one state $s'$, at turns, say, $m_{1}$ and $m_{2}$. Consider the finite-horizon strategy that follows $\sigma^{*}_{\left\lvert S\right\rvert+k}$ for the first $m_{1}$ turns and then follows $\sigma^{*}_{\left\lvert S\right\rvert+k-m_{2}}$, which is the strategy followed by $\sigma^{*}_{\left\lvert S\right\rvert+k}$ from the second occurrence of $s'$ in $\pi$. Removing from $\pi$ the subpath between $m_{1}$ and $m_{2}$, we obtain a shorter path that has positive value and follows this strategy. Continuing this procedure, we obtain a path of length $\left\lvert S\right\rvert$ or less, and Claim 1 is proved.

From Lemma 2.25, if $(w_{n}(s))_{n\in\mathbb{N}}$ is stationary from $N$, there is no (positive-probability, winning) play in which the same state is encountered twice in the first $N$ turns of $\mathcal{G}_{N}$ following $\sigma_{N}^{*}$. Now, suppose $N\geq|S|$. Then there is no repetition of states, which implies that for all $s\in S$, all paths contributing to the value $w_{N}(s)$ are of length at most $|S|$, and the result follows. ∎

It is interesting to note the contrapositive of the second item of Proposition 2.26: if $(w_{n}(s))_{n\in\mathbb{N}}$ is stationary for some state $s$, then $w_{\left\lvert S\right\rvert}(s)=w_{\left\lvert S\right\rvert+1}(s)$; in other words, the stationary part starts at turn $\left\lvert S\right\rvert$ at the latest. This result is stated per state, so other states may not be stationary. Note, however, that we cannot deduce stationarity from observing $w_{\left\lvert S\right\rvert}(s)=w_{\left\lvert S\right\rvert+1}(s)$ alone, because the sequence may stay stable for a few turns and then be updated with a positive value. We can nevertheless complete the algorithmic complexity analysis of Proposition 2.24.

Corollary 2.27.

In the worst case, under a dynamic programming approach, at most $O\!\left(\left\lvert S\right\rvert^{4}\max|A_{\mathrm{cop}}|\max|A_{\mathrm{rob}}|\right)$ operations are sufficient to determine whether $w_{n}$ is null, stationary and equal to a number $p\in(0,1]$, or infinitely increasing.

Proof.

The result follows from Propositions 2.26 and 2.24 by substituting $\left\lvert S\right\rvert$ for $n$. For stationarity, for example, if $(w_{n})_{n\in\mathbb{N}}$ is stationary, then $(w_{n}(s))_{n\in\mathbb{N}}$ is stationary for all $s\in S$, so we can conclude that $(w_{n})_{n\in\mathbb{N}}$ is stationary at $n=\left\lvert S\right\rvert$. ∎
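The test of Corollary 2.27 can be phrased directly over the iterates. The sketch below assumes a function `step` implementing one application of Equation (2.13) (for instance `lambda w: apply_F(w, S, F, A_cop, A_rob, T_cop, T_rob)` with the earlier sketch), and exact arithmetic (e.g. `fractions.Fraction` values) so that the pointwise comparison at $n=\left\lvert S\right\rvert$ is meaningful; these conventions are ours, not the paper's.

```python
def classify(step, S, i0):
    """Classify the recursion at the horizon of Corollary 2.27.

    step: one application of Equation (2.13); S: the state space;
    i0: the initial state.  Returns a label and the value at i0."""
    w = {s: 0 for s in S}
    for _ in range(len(S) + 1):      # F(bottom) = w_0, so this yields w_{|S|}
        w = step(w)
    w_next = step(w)                 # w_{|S|+1}
    if all(w_next[s] == w[s] for s in S):
        # By Remark 2.19 the whole sequence is stationary from |S| on.
        return ("null" if w[i0] == 0 else "stationary"), w[i0]
    # By monotonicity (Corollary 2.14) a non-stationary sequence keeps
    # increasing somewhere.
    return "increasing", w_next[i0]
```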

2.7 Bonato and MacGillivray’s generalized Cops and Robbers game

This subsection is dedicated to a comparison with Bonato and MacGillivray's generalized Cops and Robbers game [2], another attempt at studying Cops and Robbers games in a general form. For the sake of self-containment, their model is transcribed here. This model is completely deterministic and is thus included as a special case of Definition 2.1.

Bonato and MacGillivray’s game is presented in the following definition.

Definition 2.28 (Bonato and MacGillivray’s game).

A discrete-time process $\mathcal{G}$ is a generalized Cops and Robbers game if it satisfies the following rules:

  1. Two players, a pursuer and an evader, compete against each other.

  2. The game has perfect information.

  3. There is a set $\mathcal{P}_{P}$ of admissible positions for the pursuer and a set $\mathcal{P}_{E}$ for the evader. The set of admissible positions of the game is the subset $\mathcal{P}\subseteq\mathcal{P}_{P}\times\mathcal{P}_{E}$ of positions that can be reached according to the rules of the game. The set of game states is the subset $\mathcal{S}\subseteq\mathcal{P}\times\{P,E\}$ such that $((p_{P},q_{E}),X)\in\mathcal{S}$ if, when $X$ is the player next to play, the position $(p_{P},q_{E})$ can be reached by following the rules of the game.

  4. For each game state and each player, there exists a non-empty set of allowed moves. Each move leaves the other player's position unchanged. We write $\mathcal{A}_{P}(p_{P},q_{E})$ for the set of moves allowed to the pursuer when the game state is $((p_{P},q_{E}),P)$, and $\mathcal{A}_{E}(p_{P},q_{E})$ for the set of moves allowed to the evader when the game state is $((p_{P},q_{E}),E)$.

  5. The rules of the game specify how the game begins: there exists a set $\mathcal{I}\subseteq\mathcal{P}_{P}\times\mathcal{P}_{E}$ of admissible starting positions. We define $\mathcal{I}_{P}=\{p_{P} : \exists\, q_{E}\in\mathcal{P}_{E},\ (p_{P},q_{E})\in\mathcal{I}\}$ and, for $p_{P}\in\mathcal{P}_{P}$, the set $\mathcal{I}_{E}(p_{P})=\{q_{E}\in\mathcal{P}_{E} : (p_{P},q_{E})\in\mathcal{I}\}$. The game $\mathcal{G}$ starts with the pursuer choosing a starting position $p_{P}\in\mathcal{I}_{P}$ and then the evader choosing a starting position $q_{E}\in\mathcal{I}_{E}(p_{P})$.

  6. After both players have chosen their initial positions, the game unfolds alternately, with the pursuer moving first. Each player, on his turn, must choose an admissible action given the current state.

  7. The rules of the game specify when the pursuer has captured the evader. In other words, there is a subset $\mathcal{F}$ of final positions. The pursuer wins $\mathcal{G}$ if, at any moment, the current position belongs to $\mathcal{F}$. The evader wins if the position never belongs to $\mathcal{F}$.

Only Cops and Robbers games in which the set $\mathcal{P}$ is finite are considered. The games considered are played on a finite sequence of turns indexed by the natural integers, including 0.

We also present how the same authors defined an extension of the relation $\preceq_{n}$ of Nowakowski and Winkler [26] in order to solve the set of games characterized by their model.

Definition 2.29 (Bonato and MacGillivray's $\preceq_{n}$).

Let $\mathcal{G}$ be a Cops and Robbers game given by Definition 2.28. We let:

  1. $q_{E}\preceq_{0}p_{P}$ if and only if $(p_{P},q_{E})\in\mathcal{F}$.

  2. Suppose that $\preceq_{0},\preceq_{1},\dots,\preceq_{i-1}$ have all been defined for some $i\geq 1$. Define $q_{E}\preceq_{i}p_{P}$ if $(p_{P},q_{E})\in\mathcal{F}$, or if $((p_{P},q_{E}),E)\in\mathcal{S}$ and for all $x_{E}\in\mathcal{A}_{E}(p_{P},q_{E})$, either $(p_{P},x_{E})\in\mathcal{F}$ or there exists some $w_{P}\in\mathcal{A}_{P}(p_{P},x_{E})$ such that $x_{E}\preceq_{j}w_{P}$ for some $j<i$.

By definition, $\preceq_{i}$ contains $\preceq_{i-1}$ for all $i\geq 1$. Since $\mathcal{P}_{E}$ and $\mathcal{P}_{P}$ are finite, there exists some $t$ such that $\preceq_{t}\,=\,\preceq_{k}$ for all $k\geq t$. We define $\preceq\,=\,\preceq_{t}$.

Bonato and MacGillivray then use the relation of Definition 2.29 to show a necessary and sufficient condition for the existence of a winning strategy for the pursuer, greatly similar to the corresponding theorem of Nowakowski and Winkler [26].

Theorem 2.30 (The copwin theorem of Bonato and MacGillivray).

The pursuer has a winning strategy in a game of Cops and Robbers characterized by Definition 2.28 if and only if there exists some $p_{P}\in\mathcal{I}_{P}$ such that for all $q_{E}\in\mathcal{I}_{E}(p_{P})$, either $(p_{P},q_{E})\in\mathcal{F}$ or there exists $w_{P}\in\mathcal{A}_{P}(p_{P},q_{E})$ such that $q_{E}\preceq w_{P}$.

It should be clear at this point that Definitions 2.1 and 2.28 both describe alternating pursuit games of perfect information that unfold on discrete structures. Although the notation differs, it should also be clear that Bonato and MacGillivray's model is embedded in ours. The only difference between the two formalisms has to do with the initial states: we allow only one initial state, $i_{0}$, which is not the case in Definition 2.28. This does not cause any problem, as it suffices to play one more turn in our model, or even to modify the first reachable states. In order to simplify what follows, we assume the sets of initial states in both models are equivalent. We conclude that Equation (2.13) should encode the relation $\preceq_{n}$ of Definition 2.29. Indeed, apart from its deterministic character, the relation $\preceq_{n}$ is greatly similar to our recursion. Both relations are binary and recursive, and both share the same structure: a single case when $n=0$, in which neither player may make another move; a second case when $n>0$ but the current state is final; finally, a last case, again when $n>0$, in which both players must choose an action that is optimal in the subsequent turns. We now formally show how those two equations are related.

We first note that Equation (2.13) can be simplified when following Bonato and MacGillivray's model. Since the component $s_{\mathrm{o}}$ is not used in what follows, we simply write $(c,r)\in S$. Since the game is deterministic, we let players choose their next position directly. The recursion $w_{n}$ is thus given by:

$$w_{0}(c,r)=1 \iff (c,r)\in F;$$
$$w_{n}(c,r)=\begin{cases}1, & \text{if } (c,r)\in F;\\[4pt] \displaystyle\max_{c'\in A_{\mathrm{cop}}(c,r)}\ \min_{r'\in A_{\mathrm{rob}}(c',r)} w_{n-1}(c',r'), & \text{otherwise.}\end{cases} \qquad (7)$$

The following theorem makes the connection between our formalism and that of Bonato and MacGillivray. To clarify the exposition, the relation $\preceq_{n}$ is written in our model, that of Definition 2.1. Given the preceding remarks, this incurs no loss of generality.

Theorem 2.31.

Let the relation $\preceq_{n}$ be given by Definition 2.29 and $w_{n}$ the recursion given by Equation (2.13). Assume $\mathcal{G}$ is a GPCR game given by Definition 2.1, but following the specifications of Definition 2.28. Then we have:

$$w_{n}(c,r)=1 \iff \exists\, c'\in A_{\mathrm{cop}}(c,r) : r\preceq_{n}c'. \qquad (8)$$
Proof.

First, observe that the relation $\preceq_{n}$ compares the positions of the pursuer and the evader. These positions are encoded in the game states $S$ of our model. Moreover, the sets of actions $\mathcal{A}$ defined in Definition 2.28 are in fact restrictions of the sets of actions of Definition 2.1. Indeed, actions in $\mathcal{A}$ directly correspond to game positions, whereas in Definition 2.1 we allow the action sets to be disjoint from the set of states. It is thus possible to define a game $\mathcal{G}$ that respects the hypotheses of Definition 2.28 and in which Expression (8) is well-defined. A subtle difference between the two formalisms has to do with the turn counters: in the recursion $w_{n}$ the cops are next to play, while in the relation $\preceq_{n}$ the robbers are next to move. This does not change the fact that the cops play first in both games. We now prove the result by induction, similarly as in the proof of Proposition 3.2. Base case: $n=0$. $w_{0}(c,r)=1$ if and only if $(c,r)\in F$, and $(c,r)\in F$ if and only if $r\preceq_{0}c$. Induction step: assume the result holds for $n\leq k$ and let us show it for $n=k+1$. It holds that $w_{k+1}(c,r)=1$ if and only if $(c,r)\in F$, in which case $r\preceq_{k+1}c$ by definition, or there exists an action $c'\in A_{\mathrm{cop}}(c,r)$ for the cops such that no matter the response $r'\in A_{\mathrm{rob}}(c',r)$ of the robbers, we have $w_{k}(c',r')=1$. By the induction hypothesis, $w_{k}(c',r')=1$ if and only if there exists an action $c''\in A_{\mathrm{cop}}(c',r')$ such that $r'\preceq_{k}c''$. Thus, if the cops play action $c'$, they position themselves in a state in which $r\preceq_{k+1}c'$. Conversely, assume there exists an action $c'\in A_{\mathrm{cop}}(c,r)$ such that $r\preceq_{k+1}c'$. Then, by definition, for every response $r'\in A_{\mathrm{rob}}(c',r)$ of the robbers there exists an action $c''\in A_{\mathrm{cop}}(c',r')$ of the cops such that $r'\preceq_{k}c''$. In this case, by the induction hypothesis, we have $w_{k}(c',r')=1$. The cops play action $c'\in A_{\mathrm{cop}}(c,r)$, in which case we have $w_{k+1}(c,r)=1$. ∎

3 A concrete model of GPCR games

In this section we present a more concrete model of GPCR games, closer to the usual definitions in the literature. We specify that the game is played on a graph, without fixing its particular shape. The actions of the players correspond to paths, as in the game of Cop and Fast Robber [25]. The game presented in Definition 2.1 is abstract because its sets do not depend on any precise structure, and so neither does the algorithmic complexity of computing Equation (2.13). The point of reformulating Definition 2.1 is to refine some results and formulate them in terms of the graph's structure.

3.1 Definition of concrete Cops and Robbers games

In the game presented below, players move along paths since, in light of the literature, such actions appear to be the most general. We also grant the cops a watch zone that enables them to capture the robbers whenever the latter are observed. We write $\mathcal{P}$ for the set of finite paths in a graph and $\mathcal{P}_{v}\subseteq\mathcal{P}$ for the set of paths that start at vertex $v\in V$. To simplify the notation, we formulate the concrete model in the setting where there is one cop and one robber, and without the auxiliary information set $S_{\mathrm{o}}$; the extension to the general case is straightforward.

Definition 3.1.

A GPCR game $\mathcal{G}=(S,i_{0},F,A,T_{\mathrm{cop}},T_{\mathrm{rob}})$ with one cop and one robber (Definition 2.1) is concrete if there is a graph $G=(V,E)$ satisfying:

  1. $S=S_{\mathrm{cop}}\times S_{\mathrm{rob}}$ is a finite set of configurations of the game.

  2. $i_{0}=(i_{\mathrm{cop}},i_{\mathrm{rob}})$, where $i_{\mathrm{cop}},i_{\mathrm{rob}}\not\in V$.

  3. $S_{\mathrm{cop}}\subseteq V\times\mathcal{P}(E)\cup\{i_{\mathrm{cop}}\}$ is the set of configurations of the cop. The second coordinate is the cop's watch zone.

  4. $S_{\mathrm{rob}}\subseteq V\cup\{i_{\mathrm{rob}}\}$ is the set of positions of the robber.

  5. $A_{\mathrm{cop}}((c,z),r)\subseteq\mathcal{P}_{c}\times\mathcal{P}(E)$ is the set of available actions for the cop: he can move along a path from his current position $c$ and choose a watch zone. From the initial state, $A_{\mathrm{cop}}(i_{\mathrm{cop}})\subseteq V\times\mathcal{P}(E)$.

  6. $A_{\mathrm{rob}}((c,z),r)\subseteq\mathcal{P}_{r}$ is the set of available actions for the robber: she can move along a path from her current position $r$. From the initial state, $A_{\mathrm{rob}}(i_{\mathrm{rob}})\subseteq V$.

The definition of a play and all previous remarks that apply to Definition 2.1 remain applicable to Definition 3.1. A peculiarity here is that the cop has his own watch zone, consisting of a set of edges; thus the cop can only capture the robber on the robber's turn. Indeed, seeing as the robber moves along paths, we can explicitly deduce at which point the robber is susceptible to being caught crossing the cop's watch zone. This is a natural modeling choice that makes writing the probability of capture easier.

3.2 Example: Classic Cop and Robber game

Nowakowski and Winkler’s, and Quilliot’s, game is now presented in the form of Definition 3.1. In this game, we will consider that the game is over not when the cop reaches the same position as the robber, but exactly after that, during the robber’s turn, when she tries to escape. This slightly different interpretation leads to the same game. Our presentation allows to model and solve a more general situation where the robber could have a possibility of escaping, even if the cop reaches the robber’s position. Let G=(V,E)G=(V,E) be a finite, undirected, reflexive and connected graph and let:

$$S_{\mathrm{cop}}=V\times\mathcal{P}(E), \qquad S_{\mathrm{rob}}=V, \qquad A_{\mathrm{cop}}(c,r)=\{([c,c'],E_{c'})\mid [c,c']\in E\}.$$

The watch zone $E_{c'}$ of the next state is the set of edges incident to the cop's next position $c'$. The final states are those in which both players stand on the same vertex, $F=\{(c,r)\in S : c=r\}$. The initial state is $i_{0}=(i_{\mathrm{cop}},i_{\mathrm{rob}})$ and we let the players choose any vertex from it, that is, $A_{\mathrm{cop}}(i_{\mathrm{cop}},i_{\mathrm{rob}})=A_{\mathrm{rob}}(c,i_{\mathrm{rob}})=V$, with $c\in V$. Finally, the transition probabilities are trivial since the game is deterministic.

Now, in order to show that Equation (2.13) is well-defined, we demonstrate how it encodes the relation $\preceq_{n}$ of Nowakowski and Winkler [26]. Since the game is deterministic, Equation (2.13) reduces to:

$$w_{0}(c,r)=1 \iff c=r;$$
$$w_{n}(c,r)=\max_{c'\in N[c]}\ \min_{r'\in N[r]} w_{n-1}(c',r'). \qquad (9)$$

This equation is also a particular case of Equation (7). The next proposition shows that Equation (9) simulates the relation $\preceq_{n}$.

Proposition 3.2.

It holds that $w_{n}(c,r)=1$ if and only if there exists a vertex $c'\in N[c]$ such that $r\preceq_{n}c'$.

Proof.

We prove the result by induction. Note that in the recursion $w_{n}$ it is the cop's turn to play, while in the relation $\preceq_{n}$ the robber is next to move. Base case: $n=0$. $w_{0}(c,r)=1$ if and only if $r=c$, and $r=c$ if and only if $r\preceq_{0}c$. Induction step: assume the result holds for $n\leq k$ and let us show it holds for $n=k+1$. Then $w_{k+1}(c,r)=1$ if and only if there exists an action $c'$ for the cop from which, no matter the response $r'$ of the robber, we have $w_{k}(c',r')=1$. By the induction hypothesis, $w_{k}(c',r')=1$ if and only if there exists a vertex $c''\in N[c']$ such that $r'\preceq_{k}c''$. Thus, the cop can play action $c'\in N[c]$ and we have $r\preceq_{k+1}c'$. Conversely, if there exists a vertex $c'\in N[c]$ such that $r\preceq_{k+1}c'$, then, by definition, for any action $r'\in N[r]$ of the robber there exists a response $c''\in N[c']$ of the cop such that $r'\preceq_{k}c''$. By the induction hypothesis, we thus have $w_{k}(c',r')=1$. In this case, the cop can play action $c'\in N[c]$ such that, no matter the answer $r'\in N[r]$ of the robber, $w_{k}(c',r')=1$. By definition, we thus have $w_{k+1}(c,r)=1$. ∎
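As an illustration, the following Python sketch evaluates the deterministic recursion of Equation (9) and the copwin criterion on small graphs. Capture on contact is made explicit (the inner minimum evaluates to $1$ when the robber cannot leave the watch zone $E_{c'}$), which matches the watch-zone semantics above; all function and variable names are ours.

```python
def copwin(vertices, closed_nbhd):
    """Decide whether the classic game on a reflexive graph is copwin,
    by iterating Equation (9).  closed_nbhd(v) must return N[v].  Since
    the game is deterministic, |S| = |V|^2 rounds suffice (Remark 2.23)."""
    states = [(c, r) for c in vertices for r in vertices]
    w = {(c, r): c == r for (c, r) in states}
    for _ in range(len(states)):
        w = {
            (c, r): c == r or any(
                # moving onto the robber captures her outright; otherwise
                # every robber reply must already be winning for the cop
                c1 == r or all(
                    r1 == c1 or w[(c1, r1)] for r1 in closed_nbhd(r)
                )
                for c1 in closed_nbhd(c)
            )
            for (c, r) in states
        }
    # From i0 the cop chooses his start first, then the robber replies:
    # the graph is copwin iff some start beats every reply.
    return any(all(w[(c, r)] for r in vertices) for c in vertices)

# A path on three vertices is copwin; the 4-cycle is not.
path = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}
cycle = {v: {v, (v - 1) % 4, (v + 1) % 4} for v in range(4)}
print(copwin(path, path.get), copwin(cycle, cycle.get))  # True False
```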

3.3 Example: Cop and Fast Defending Robber game

Definition 3.1 is further illustrated with the following example, which describes the game of Cop and Fast Robber with probability of capture: a variant of the game presented by Fomin et al. [10], already mentioned in Example 2.5, and a variant of Example 2.4, in which the robber could evade capture. Unsurprisingly, given that both games ask the robbers to move along paths, it is easier to write this new game following Definition 3.1.

For a path $\pi\in\mathcal{P}$ on a graph $G$, we write $\pi[k]$ for its $k$-th vertex and $\pi[*]$ for its last one. Let $G=(V,E)$ be a finite graph. Assume that the cop guards a watch zone $C\subset E$ and that each time the robber crosses an edge $e$, she survives with probability $q_{C}(e)$ (between 0 and 1). In Example 2.4 a capture probability was used; here we define a survival probability, as it is simpler to use in the current context. Contrary to the Defending Robber game of Example 2.4, the probability of survival depends on the cop's watch zone as well as on the robber's action. Compared to Example 2.5, only the cop's watch zone and the transition functions are modified. So we have an element $j^{*}\notin V$ and the set of final states is $F=\{(j^{*},\emptyset,j^{*})\}$. We write $E_{c}$ for the set of edges incident to $c$ and, similarly, $E_{\pi}$ for the set of edges of a path $\pi$. Let:

$$T_{\mathrm{cop}}((c,E_{c},r),c')=\begin{cases}\delta_{(c',E_{c'},r)}, & \text{if } c=i_{\mathrm{cop}} \text{ and } c'\in V, \text{ or if } c\in V \text{ and } c'\in N[c];\\ 0, & \text{otherwise.}\end{cases}$$

The robber’s transition function is given by:

$$T_{\mathrm{rob}}((c,E_{c},r),\pi)=\begin{cases}\delta_{(c,E_{c},\pi[*])}, & \text{if } E_{\pi}\cap E_{c}=\emptyset;\\ D_{(r,\pi[*])}, & \text{if } E_{\pi}\cap E_{c}\neq\emptyset;\end{cases}$$

where $D_{(r,\pi[*])}$ is a function satisfying:

$$D_{(r,\pi[*])}(x)=\begin{cases}\prod_{e\in E_{\pi}}q_{E_{c}}(e), & \text{if } x=(c,E_{c},\pi[*]);\\[4pt] 1-\prod_{e\in E_{\pi}}q_{E_{c}}(e), & \text{if } x=(j^{*},\emptyset,j^{*}).\end{cases}$$

Note that to retrieve the game considered in Example 2.5 and Marcoux's thesis [25], we should rather use a watch zone $E_{c}$ containing all edges on paths of length 2 from $c$, and change the conditions on $T_{\mathrm{rob}}$ to $E_{\pi_{1}}\cap E_{c}=\emptyset$ and $E_{\pi_{1}}\cap E_{c}\neq\emptyset$, where $\pi_{1}$ is the subpath of $\pi$ starting at $\pi[1]$.

Since the watch zone is determined by the cop's position, we can use the simplified notation $(c,r)$ for a state $(c,E_{c},r)$. The recursion of Equation (2.13) can then be written as follows. For the jail state, $w_{i}(j^{*},\emptyset,j^{*})=1$ for all $i\geq 0$. For $(c,E_{c},r)\neq(j^{*},\emptyset,j^{*})$, we have $w_{0}(c,r)=0$ and, for $n\geq 1$,

$$w_{n}(c,r)=\max_{c'\in N[c]}\ \min_{\pi\in\mathcal{P}_{r}}\ T_{\mathrm{rob}}((c',r),\pi,(c',\pi[*]))\,w_{n-1}(c',\pi[*])+T_{\mathrm{rob}}((c',r),\pi,(j^{*},j^{*})).$$

Following Proposition 2.24, the algorithmic complexity of the previous recursion is at most $O(n\Delta\left\lvert V\right\rvert^{6}\left\lvert\mathcal{P}\right\rvert)$, where $\Delta$ is the maximal degree of $G$. Indeed, $S$ corresponds to the set of pairs of vertices, the cop can only move within his neighbourhood, and the robber is allowed to choose any path of finite length. Hence, even if we restrict the paths available to the robber to elementary paths (paths that do not cross the same vertex twice), the number of possible robber actions, and therefore the size of $\mathcal{P}$, is exponential in the size of the graph on which the game is played. However, as shown in the next proposition, $w_{n}$ can be computed in time polynomial in the size of the graph itself.

Proposition 3.3.

Computing $w_{n}(i)$ in the Cop and Fast Defending Robber game requires at most $O\!\left(\left\lvert V\right\rvert^{3}\log\left\lvert V\right\rvert+(n+1)\left\lvert V\right\rvert^{2}\left\lvert E\right\rvert\right)$ operations and uses at most $O(\left\lvert V\right\rvert^{3})$ space, for any $n\in\mathbb{N}$.

Proof.

Let $\mathcal{P}_{r}^{r'}$ be the set of paths beginning at $r$ and ending at $r'$, and let $(c,E_{c})$ be a cop position. The robber's transition function can be simplified by setting $q_{E_{c}}(e)=1$ if $e\notin E_{c}$. Then $T_{\mathrm{rob}}((c,r),\pi,(c,\pi[*]))=\prod_{e\in E_{\pi}}q_{E_{c}}(e)$ if the robber is not caught on $\pi$. The previous recursion, when state $(c,r)$ is not final, simplifies to:

$$\begin{aligned}
w_{n}(c,r) &= \max_{c'\in N[c]}\ \min_{\substack{r'\in V\\ \pi\in\mathcal{P}_{r}^{r'}}}\left(\prod_{e\in E_{\pi}}q_{E_{c'}}(e)\,w_{n-1}(c',r')+1-\prod_{e\in E_{\pi}}q_{E_{c'}}(e)\right)\\
&= \max_{c'\in N[c]}\ \min_{\substack{r'\in V\\ \pi\in\mathcal{P}_{r}^{r'}}}\Big((w_{n-1}(c',r')-1)\prod_{e\in E_{\pi}}q_{E_{c'}}(e)+1\Big).
\end{aligned}$$

If $c'$ and $r'$ are fixed, we look for the path $\pi$ minimizing the expression in parentheses, hence maximizing $\prod_{e\in E_{\pi}}q_{E_{c'}}(e)$ (note that $w_{n-1}(c',r')-1\leq 0$). This is the same path that maximizes $\sum_{e\in E_{\pi}}\log q_{E_{c'}}(e)$, because $\log$ is a monotone increasing function. Because $\log q_{E_{c'}}(e)\leq 0$ when $q_{E_{c'}}(e)\in[0,1]$, we can equivalently minimize $\sum_{e\in E_{\pi}}-\log q_{E_{c'}}(e)$, a shortest-path problem with non-negative weights.

Observe that the survival probabilities $q_{E_{c'}}(e)$ depend only on the vertex $c'$ and the edge $e$. Thus, prior to evaluating $w_{n}(c,r)$, we can precompute $\left\lvert V\right\rvert$ all-pairs shortest path tables (one for each possible cop position) by weighting each edge $e\in E$ with $-\log q_{E_{x}}(e)$ for each source $x\in V$. This is done in $O\!\left(\left\lvert E\right\rvert\left\lvert V\right\rvert^{2}+\left\lvert V\right\rvert^{3}\log\left\lvert V\right\rvert\right)$ operations, for example by using the algorithm of Fredman and Tarjan [12], and takes $O(\left\lvert V\right\rvert^{3})$ space (the paths themselves need not be stored: any one of them can be recomputed in $O(\Delta\left\lvert V\right\rvert)$). Thus, for each $c'$ we store the value $\prod_{e\in E_{\pi}}q_{E_{c'}}(e)$ for the shortest path $\pi$ between $r$ and $r'$. Finding the next robber position $r'$ then requires at most $O(\left\lvert V\right\rvert)$ operations.

Now, assume $w_{n-1}(c',r')$ is computed for all $c'$, $r'$, in time $a_{n-1}$. We look for the vertex $c'\in N[c]$ maximizing

$$\min_{\substack{r'\in V\\ \pi\in\mathcal{P}_{r}^{r'}}}\Big((w_{n-1}(c',r')-1)\prod_{e\in E_{\pi}}q_{E_{c'}}(e)+1\Big).$$

The values $w_{n-1}(c',r')$ are already computed, as well as the products $\prod_{e\in E_{\pi}}q_{E_{c'}}(e)$. Thus, at most $O(\left\lvert N[c]\right\rvert\left\lvert V\right\rvert)$ operations are required to evaluate $w_{n}(c,r)$ when $c$ and $r$ are fixed. To find all maxima, that is, for all $c\in V$ and $r\in V$, we need at most $O\!\left(\left\lvert V\right\rvert\sum_{c\in V}\left\lvert N[c]\right\rvert\right)=O(2\left\lvert V\right\rvert\left\lvert E\right\rvert)$ operations. On turn $n$, we make a number of operations $a_{n}\in O(\left\lvert V\right\rvert\left\lvert E\right\rvert)+a_{n-1}\subseteq O(n\left\lvert V\right\rvert\left\lvert E\right\rvert)$. The total complexity is thus:

$$O\!\left(\left\lvert E\right\rvert\left\lvert V\right\rvert^{2}+\left\lvert V\right\rvert^{3}\log\left\lvert V\right\rvert\right)+a_{n}=O\!\left((n+1)\left\lvert E\right\rvert\left\lvert V\right\rvert^{2}+\left\lvert V\right\rvert^{3}\log\left\lvert V\right\rvert\right).$$

The bottleneck for the spatial complexity is the shortest path computations, which require at most $O(\left\lvert V\right\rvert^{3})$ space. For each $w_{n}$ we only need $w_{n-1}$, so no other $w_{k}$, $k<n-1$, needs to be stored. ∎
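A sketch of the precomputation used in this proof follows: for each cop position $x$, edges are weighted by $-\log q_{E_{x}}(e)$ and a shortest-path run from every robber source recovers $\max_{\pi}\prod_{e\in E_{\pi}}q_{E_{x}}(e)$. We use binary-heap Dijkstra runs (slightly worse than the Fibonacci-heap bound quoted above) and a dictionary `q` mapping (cop position, edge) pairs to survival probabilities, defaulting to $1$ off the watch zone; these conventions are ours, not the paper's.

```python
import heapq
import math

def max_survival(vertices, edges, q):
    """best[x][r][r2] = maximum, over paths from r to r2, of the product
    of survival probabilities when the cop sits at x (a sketch of the
    precomputation in Proposition 3.3)."""
    adj = {v: set() for v in vertices}
    for (u, v) in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = {}
    for x in vertices:                      # one edge weighting per cop position
        def cost(u, v):
            qe = q.get((x, frozenset((u, v))), 1.0)
            return -math.log(qe) if qe > 0.0 else math.inf
        best[x] = {}
        for r in vertices:                  # Dijkstra from each robber source
            dist = {r: 0.0}
            heap = [(0.0, r)]
            while heap:
                d, u = heapq.heappop(heap)
                if d > dist.get(u, math.inf):
                    continue                # stale heap entry
                for v in adj[u]:
                    nd = d + cost(u, v)
                    if nd < dist.get(v, math.inf):
                        dist[v] = nd
                        heapq.heappush(heap, (nd, v))
            # back to probabilities: exp(-(sum of -log q)) = prod q
            best[x][r] = {v: math.exp(-d) for v, d in dist.items()}
    return best
```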

An important aspect of the fast robber game is its ability to model situations of imperfect information in which the cops only gather information on the robber's position at regular intervals. This game, deemed with witness, is shown by Chalopin et al. [6] to correspond to the game of Cop and Fast Robber (without watch zone). In essence, the authors present an equivalence between the classes of copwin graphs in the witness game and in the fast robber one. One may wonder whether the same holds in the stochastic case.

3.4 Example: Cop and Drunk Robber game

Let us revisit the Cop and Drunk Robber game of Example 2.3 with the concrete model and Equation (2.13).

We can show that it is always easier to capture a robber moving randomly than a robber playing optimally. For the sake of generality, assume the robber can play according to any distribution in $\mathrm{Dist}_{N[r]}$ when she finds herself on vertex $r$. Let $\phi\subseteq(\mathrm{Dist}_{N[r]})_{r\in V}$ be a sequence of distributions on the vertices $V$ and $\phi_{r}$ its component in $\mathrm{Dist}_{N[r]}$. Then we write $w_{n}^{\phi}(c,r)$ for the recursion in which $T_{\mathrm{rob}}((c,r),r')=\phi_{r}$; in other words:

$$w_{n}^{\phi}(c,r)=\max_{c'\in N[c]}\sum_{r'\in N[r]}\phi_{r}(r')\,w_{n-1}^{\phi}(c',r'), \qquad (10)$$

if $c\neq r$, and $w_{n}^{\phi}(c,r)=1$ if $c=r$, for all $n\geq 0$. The classic recursion from Equation (9) is written $w_{n}(c,r)$.

Proposition 3.4.

It is always easier to capture a robber playing randomly than an adversarial one, that is:

$$w_{n}^{\phi}(c,r)\geq w_{n}(c,r).$$
Proof.

We write $\delta_{N[r]}$ for the set of Dirac distributions defined on $N[r]$. The robber would be hardest to capture if she minimized her probability of capture, and since her optimal strategy is deterministic, she loses nothing by minimizing over Dirac distributions only. Thus, we compute:

$$\begin{aligned}
w_{n}^{\phi}(c,r) &:= \max_{c'\in N[c]}\sum_{r'\in N[r]}\phi_{r}(r')\,w_{n-1}^{\phi}(c',r')\\
&\geq \max_{c'\in N[c]}\ \min_{\psi\subseteq(\mathrm{Dist}_{N[r]})_{r\in V}}\sum_{r'\in N[r]}\psi_{r}(r')\,w_{n-1}^{\psi}(c',r')\\
&= \max_{c'\in N[c]}\ \min_{\psi\subseteq(\delta_{N[r]})_{r\in V}}\sum_{r'\in N[r]}\psi_{r}(r')\,w_{n-1}^{\psi}(c',r')\\
&= \max_{c'\in N[c]}\ \min_{r'\in N[r]} w_{n-1}(c',r').
\end{aligned}$$

The first line is the definition of Equation (10) with the robber playing according to the distribution $\phi$. If she could choose this distribution, she could settle on a distribution $\psi\subseteq(\mathrm{Dist}_{N[r]})_{r\in V}$ ensuring her a greater probability of survival, which justifies the second line. Then, we observe that since her optimal strategy is deterministic, it corresponds to a sequence of Dirac distributions, and she loses nothing by playing according to $\psi\subseteq(\delta_{N[r]})_{r\in V}$. The last line is simply the preceding one rewritten without distributions, as in this case $\psi_{r}$ is concentrated on a single vertex $r'\in N[r]$. ∎
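The following sketch contrasts the two recursions numerically: the drunk value of Equation (10) with $\phi_{r}$ uniform on $N[r]$, and the adversarial value of Equation (9), with capture on contact made explicit as in the classic-game sketch of Subsection 3.2. The graph encoding and names are ours.

```python
def drunk_vs_adversarial(vertices, closed_nbhd, n):
    """n-horizon capture probabilities against a drunk robber (uniform
    phi_r, Equation (10)) and against an adversarial one (Equation (9))."""
    states = [(c, r) for c in vertices for r in vertices]
    drunk = {s: 1.0 if s[0] == s[1] else 0.0 for s in states}
    adv = dict(drunk)
    for _ in range(n):
        drunk = {  # the robber averages uniformly over N[r]
            (c, r): 1.0 if c == r else max(
                1.0 if c1 == r else sum(
                    1.0 if r1 == c1 else drunk[(c1, r1)]
                    for r1 in closed_nbhd(r)
                ) / len(closed_nbhd(r))
                for c1 in closed_nbhd(c)
            )
            for (c, r) in states
        }
        adv = {    # the robber minimizes over N[r]
            (c, r): 1.0 if c == r else max(
                1.0 if c1 == r else min(
                    1.0 if r1 == c1 else adv[(c1, r1)]
                    for r1 in closed_nbhd(r)
                )
                for c1 in closed_nbhd(c)
            )
            for (c, r) in states
        }
    return drunk, adv

# On the 4-cycle, an adversarial robber starting opposite the cop is
# never caught, while the drunk one is caught with probability tending
# to 1, in accordance with Proposition 3.4.
C4 = {v: {v, (v - 1) % 4, (v + 1) % 4} for v in range(4)}
drunk, adv = drunk_vs_adversarial(C4, C4.get, n=50)
print(adv[(0, 2)], round(drunk[(0, 2)], 3))  # 0.0 and approximately 1.0
```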

3.5 Example: Temporal Cop and Robber game

In graph theory, one can define many random processes that stochastically generate graphs varying at each time step. One thus obtains a sequence of graphs $G_{0},G_{1},\dots$ representing the evolution of a network over time. Those graphs are called dynamic graphs, link streams, time-varying graphs or temporal networks, depending on the community, and can model, for example, the destruction of a bridge or of a road that makes it impossible for the players to pass through.

Suppose $k$ cops chase $l$ robbers on the sequence $G_{0},G_{1},\dots$. In order to take into account the variable nature of the underlying structure of a game from Definition 3.1, we can use the component $S_{\mathrm{o}}$ as a turn counter. Let $G_{t}=(V_{t},E_{t})$ be the graph generated at time $t$,

$$S_{t}=V_{t}^{k}\times\mathcal{P}(E_{t})^{k}\times V_{t}^{l}\times\{t\}$$

and $S=\bigcup_{t=1}^{\infty}S_{t}$. Hence, at each time step $t$ a new graph $G_{t}$ is created according to a certain process and the set of states is renewed. The sets of actions can also be redefined. Let $\mathcal{P}_{u}^{G_{t}}$ be the set of finite paths on $G_{t}$ that begin at vertex $u\in V_{t}$. The sets of actions are thus:

$$A_{\mathrm{cop}}(c,C,r,t)\subseteq\prod_{i=1}^{k}\mathcal{P}_{c^{i}}^{G_{t}}\times\mathcal{P}(E_{t}); \qquad A_{\mathrm{rob}}(c,C,r,t)\subseteq\prod_{i=1}^{l}\mathcal{P}_{r^{i}}^{G_{t}}.$$

Since this example is rather general, we leave the transition functions undefined. We require, however, that they follow the arrow of time: if $s_{t}\in S_{t}$ is a game state at time $t$ and $a_{\mathrm{cop}}\in A_{\mathrm{cop}}(s_{t})$ is a cop action, then $T_{\mathrm{cop}}(s_{t},a_{\mathrm{cop}})(X)>0$ only if $X\subseteq S_{t+1}$. The same holds for the robbers.

4 Conclusion

This paper presented a relatively simple yet very general model for describing games of Cops and Robbers that, notably, may include stochastic aspects. The game $\mathcal{G}$ was presented along with a method of resolution in the form of a recursion $w_{n}$ in Theorem 2.13. We showed in Proposition 2.16 that we can always retrieve an $\epsilon$-optimal strategy for $\mathcal{G}$ from the recursion $w_{n}$ (for large enough $n$). Moreover, in Proposition 2.26 we showed that if the recursion becomes stationary, stationarity must occur at index $\left\lvert S\right\rvert$ at the latest. This is a first step in the analysis of the rate of convergence of the recursion.

We have shown how some classic Cops and Robbers games can be written in our model and extended. Many more games could now be studied as GPCR games, such as, under certain conditions, the Firefighting game, in which a team of firefighters seeks to prevent the nodes of a graph from burning. An interesting notion captured by our framework, in Definition 3.1, is that of the surveillance zones of the cops, which can be chosen at each step. Thus, we claim that a wide variety of games of Cops and Robbers can be solved with the concepts developed in this paper. Furthermore, such a broad exposition of games of Cops and Robbers enables one to study the effects of modifying certain rules, for example the number of cops or the speed of the players. That is, one can use Equation (2.13) and probe its values in order to test how modifying these rules affects the ability of the cops to capture the robbers.

We have extended the classic notion of cop number with the $p$-cop number, although the behaviour of this function remains an open question. The expected capture time of the robbers is also of great interest; this function can now be studied on large swaths of Cops and Robbers games. In part, this question is motivated by a paper of Simard et al. [31] on the relation between an Operations Research problem and the resolution of a Cop and Drunk Robber game. Specifically, the authors tackled the problem of upper bounding the probability of detecting a hidden and randomly moving object on a graph with a single optimally moving searcher. This problem, being NP-hard [32], is constrained to be solved in a maximum number of time steps $T\in\mathbb{N}$. In particular, it appears that if one could tightly upper bound the expected capture time of a game derived from Definition 2.1, then one could, following the ideas presented in this paper, deduce the optimal number of searchers to send on a mission to rescue the object. With this number in hand, one could further apply the ideas of this article along with Equation (2.13) in order to help solve this search problem with multiple searchers.

Finally, a last avenue of research worth mentioning, and possibly of most interest to researchers in robotics and operations research, concerns the extension of the model of Definition 2.1 to games of imperfect information. Imperfect information refers to the lack of knowledge of one or both players. Cops and Robbers games of imperfect information thus contain games in which robbers are invisible, which can model problems of graph search such as the one mentioned above. Game theory seems apt to enable the transition from perfect to imperfect information games through the use of belief states. Such a generalization could be paired with the branch-and-bound method presented in Simard et al. [31] in order to solve more general search problems.

In light of the literature on Cops and Robbers games, this paper distances itself from most studies on the subject. Indeed, we do not claim any results on typical Cops and Robbers questions, such as the asymptotic behaviour of $c_{p}(\mathcal{G})$ or dismantling schemes characterizing classes of winning graphs. However, we think that modelling such a wide variety of games opens the door to further studies on Cops and Robbers games, which can now be tackled in their generality, something that was not possible before. Thus, although our model may not enable one to compute analytical solutions to classical questions of Cops and Robbers games, we have good hope that algorithmic ones will be devised in order to solve more general problems on classes, not of graphs, but of games. In short, new and promising avenues of research have come to light with the objects presented in this paper, and we hope researchers will be driven to tackle the open questions they unearth.

Acknowledgement

The authors acknowledge the careful reading of the reviewers, which helped improve the presentation of the paper. Josée Desharnais and François Laviolette acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC, grant numbers 239294 and 262067).

References

  • [1] M. Aigner and M. Fromme. A game of cops and robbers. Discrete Applied Mathematics, 8(1):1–12, 1984.
  • [2] Anthony Bonato and Gary MacGillivray. Characterizations and algorithms for generalized cops and robbers games. Contributions to Discrete Mathematics, 12(1):1–10, 2017.
  • [3] Anthony Bonato, Dieter Mitsche, Xavier Pérez-Giménez, and Pawel Pralat. A probabilistic version of the game of zombies and survivors on graphs. Theor. Comput. Sci., 655:2–14, 2016.
  • [4] Anthony Bonato and Richard J. Nowakowski. The Game of Cops and Robbers on Graphs. American Mathematical Society, 2011.
  • [5] M. Boyer, S. El Harti, A. El Ouarari, R. Ganian, T. Gavenciak, G. Hahn, C. Moldenauer, I. Rutter, B. Thériault, and M. Vatshelle. Cops-and-robbers: remarks and problems. Journal of Combinatorial Mathematics and Combinatorial Computing, 85, 2013.
  • [6] Jérémie Chalopin, Victor Chepoi, Nicolas Nisse, and Yann Vaxès. Cop and Robber Games When the Robber Can Hide and Ride. SIAM Journal on Discrete Mathematics, 25(1):333–359, jan 2011.
  • [7] Nancy E. Clarke and Gary MacGillivray. Characterizations of k-copwin graphs. Discrete Mathematics, 312(8):1421–1425, 2012.
  • [8] Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992.
  • [9] John Horton Conway. On numbers and games. IMA, 1976.
  • [10] Fedor V. Fomin, Petr A. Golovach, Jan Kratochvíl, Nicolas Nisse, and Karol Suchan. Pursuing a fast robber on a graph. Theoretical Computer Science, 411(7-9):1167–1181, feb 2010.
  • [11] Fedor V. Fomin and Dimitrios M. Thilikos. An annotated bibliography on guaranteed graph searching. Theoretical Computer Science, 399(3):236–245, 2008.
  • [12] Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34(3):596–615, 1987.
  • [13] Hugo Gimbert and Florian Horn. Simple stochastic games with few random vertices are easy to solve. In International Conference on Foundations of Software Science and Computational Structures, pages 5–19. Springer, 2008.
  • [14] Andrzej Granas and James Dugundji. Fixed Point Theory. Springer-Verlag, 2003.
  • [15] Geňa Hahn and Gary MacGillivray. A note on k-cop, l-robber games on graphs. Discrete Mathematics, 306(19):2492–2497, 2006. Creation and Recreation: A Tribute to the Memory of Claude Berge.
  • [16] Ath. Kehagias. Generalized cops and robbers: A multi-player pursuit game on graphs. Dynamic Games and Applications, 2018.
  • [17] Ath. Kehagias and G. Konstantinidis. Selfish cops and passive robber: Qualitative games. Theoretical Computer Science, 680:25–35, 2017.
  • [18] Athanasios Kehagias, Dieter Mitsche, and Paweł Prałat. Cops and invisible robbers: The cost of drunkenness. Theoretical Computer Science, 481:100–120, 2013.
  • [19] Athanasios Kehagias and Paweł Prałat. Some remarks on cops and drunk robbers. Theoretical Computer Science, 463:133–147, 2012. Special Issue on Theory and Applications of Graph Searching Problems.
  • [20] William B. Kinnersley. Cops and Robbers is EXPTIME-complete. Journal of Combinatorial Theory, Series B, 111:201–220, 2015.
  • [21] Natasha Komarov. Expected Capture Time in Variants of Cops & Robbers Games. PhD thesis, Dartmouth College, 2013.
  • [22] Natasha Komarov and Peter Winkler. Capturing the Drunk Robber on a Graph. The Electronic Journal of Combinatorics, 21(3):14, 2014.
  • [23] G. Konstantinidis and A. Kehagias. Selfish cops and active robber: Multi-player pursuit evasion on graphs. Theoretical Computer Science, 780:84–102, 2019.
  • [24] G. Konstantinidis and Ath. Kehagias. Simultaneously moving cops and robbers. Theoretical Computer Science, 645:48–59, 2016.
  • [25] Héli Marcoux. Jeux de poursuite policier-voleur sur un graphe, le cas du voleur rapide. Master's thesis, Université Laval, 2014.
  • [26] Richard Nowakowski and Peter Winkler. Vertex-to-vertex pursuit in a graph. Discrete Mathematics, 43(2-3):235–239, 1983.
  • [27] Martin J. Osborne and Ariel Rubinstein. A course in game theory. MIT Press, 1994.
  • [28] Martin L. Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
  • [29] Alain Quilliot. Problèmes de jeux, de point fixe, de connectivité et de représentation sur des graphes, des ensembles ordonnés et des hypergraphes. Doctoral thesis (thèse d'état), Université de Paris VI, France, 1983.
  • [30] Lloyd S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
  • [31] Frédéric Simard, Michael Morin, Claude-Guy Quimper, François Laviolette, and Josée Desharnais. Bounding an Optimal Search Path with a Game of Cop and Robber on Graphs. In Principles and Practice of Constraint Programming: 21st International Conference, CP 2015, Cork, Ireland, August 31–September 4, 2015, Proceedings, volume 9255, pages 403–418. Springer Science+Business Media, 2015.
  • [32] K E Trummel and J R Weisinger. The Complexity of the Optimal Searcher Path Problem. Operations Research, 34(2):324–327, 1986.
  • [33] J. van der Wal and J. Wessels. On Markov games. Memorandum COSOR. Technische Hogeschool Eindhoven, 1975.

Appendix A Constructing a GPCR game as a Simple Stochastic Game

The following argument is inspired by the SSG exposition of Gimbert and Horn [13]. A simple stochastic game is a tuple $(V, V_{\max}, V_{\min}, V_R, E, t, p)$, where $(V,E)$ describes a directed graph $G$ and $V_{\max}, V_{\min}, V_R$ form a partition of $V$. There is a special vertex $t \in V$, called the target, and $p$ is a probability function such that for every vertex $w \in V$ and $v \in V_R$, $p(w \mid v)$ is the probability of transiting from $v$ to $w$. There are two players, $\max$ and $\min$, and the game is played with perfect information. The set $V_{\max}$ contains the vertices controlled by player $\max$, that is, where this player is next to play, and $V_{\min}$ those controlled by player $\min$. The set of edges $E$ is defined by the possible moves in the game. The game proceeds as follows. A token is placed on some initial vertex $i \in V_{\max} \cup V_{\min}$ and the player controlling that vertex moves the token along an edge. If the token lands on a vertex of $V_{\max} \cup V_{\min}$, the corresponding player makes the next move; if it lands on a vertex $v \in V_R$, an outneighbour of $v$ is chosen randomly according to the distribution $p(\cdot \mid v)$ and the token is moved there. The game ends if $t$ is ever reached, in which case $\max$ wins; otherwise it continues indefinitely and $\min$ wins.
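To make the definition concrete, here is a minimal sketch of an SSG as a data structure, together with the simulation of a single play. All names (SSG, play, and so on) are ours, not from the paper or from [13], and strategies are restricted to memoryless functions of the current vertex for simplicity; Theorem A.1 below shows this restriction is harmless.

import random
from dataclasses import dataclass

@dataclass
class SSG:
    v_max: set      # vertices controlled by player max
    v_min: set      # vertices controlled by player min
    v_rand: set     # random vertices V_R
    target: object  # the target vertex t
    edges: dict     # vertex -> list of out-neighbours (the relation E)
    prob: dict      # random vertex v -> {w: p(w | v)}

def play(game, start, sigma, tau, max_steps=10_000):
    """Simulate one (truncated) play from `start`, where sigma and tau map a
    vertex to a chosen out-neighbour; returns True iff the target is reached."""
    v = start
    for _ in range(max_steps):
        if v == game.target:
            return True          # max wins
        if v in game.v_max:
            v = sigma(v)
        elif v in game.v_min:
            v = tau(v)
        else:                    # random vertex: sample from p(. | v)
            succs = list(game.prob[v])
            weights = [game.prob[v][w] for w in succs]
            v = random.choices(succs, weights=weights)[0]
    return False                 # truncated; an unending play is a win for min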

Following Gimbert and Horn, we define a play as an infinite sequence of vertices $v_0 v_1 \dots$ of $G$ such that $(v_i, v_{i+1}) \in E$ for all $i$, and a finite play (what we called a history) as a finite prefix of a play. A strategy for $\max$ is a function $\sigma : V^* V_{\max} \rightarrow V$ and a strategy for $\min$ is a function $\tau : V^* V_{\min} \rightarrow V$, where $V^*$ is the set of finite plays. We require that for each finite play $v_0 \dots v_n$ and vertex $v \in V_{\max}$, $(v, \sigma(v_0 \dots v_n v)) \in E$, and similarly for $\tau$. Note that such strategies are deterministic, which is without loss of generality. For convenience we write $\Gamma_{\max}$ and $\Gamma_{\min}$ for the sets of $\max$ (resp. $\min$) strategies. Now, for any vertex $v \in V$, we can define the value of $v$ for $\max$ (resp. $\min$) as the best probability of the target being reached from $v$ that this player can guarantee. If $p(t \mid \sigma, \tau, v)$ is the probability that $t$ is reached from $v$ under strategies $\sigma$ and $\tau$, then we let

\[
\underline{val}(v) := \sup_{\sigma \in \Gamma_{\max}} \inf_{\tau \in \Gamma_{\min}} p(t \mid \sigma, \tau, v),
\qquad
\overline{val}(v) := \inf_{\tau \in \Gamma_{\min}} \sup_{\sigma \in \Gamma_{\max}} p(t \mid \sigma, \tau, v).
\]
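As an aside (our illustration, not the paper's), once deterministic memoryless strategies $\sigma$ and $\tau$ are fixed, the game collapses to a Markov chain, and $p(t \mid \sigma, \tau, v)$ is the least nonnegative solution of $x = Px$ with $x_t$ pinned to $1$, where $P$ is the chain's transition matrix (each row of a controlled vertex contains a single $1$, placed by the fixed strategy). A monotone iteration from below recovers it:

import numpy as np

def reach_prob(P, t_idx, iters=100_000, tol=1e-12):
    """Least fixed point of x = P x with x[t_idx] pinned to 1; the limit is
    the vector of probabilities of reaching the target from each vertex."""
    x = np.zeros(P.shape[0])
    x[t_idx] = 1.0
    for _ in range(iters):
        y = P @ x
        y[t_idx] = 1.0           # the target is absorbing and winning
        if np.max(np.abs(y - x)) < tol:
            break
        x = y
    return x

Starting from the all-zero vector (except at $t$) matters: the linear system alone can have several solutions when some recurrent class avoids $t$, and the reachability probabilities form its least nonnegative solution.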

The following theorem about simple stochastic games is well known [8, 13, 30].

Theorem A.1.

In any simple stochastic game and from any vertex $v$, $\underline{val}(v) = \overline{val}(v)$, and we write $val(v) := \underline{val}(v)$. Furthermore, there exist deterministic and memoryless strategies for players $\max$ and $\min$ that achieve the value $val(v)$.
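Theorem A.1 also suggests a standard way of approximating $val$: since memoryless strategies suffice, $val$ is the least fixed point of the one-step operator that maximizes over out-neighbours at $\max$ vertices, minimizes at $\min$ vertices, and averages at random vertices. The sketch below (our names, reusing the SSG container introduced earlier) runs this value iteration from below; it only approximates $val$ up to a stopping tolerance, and we recall that solving SSGs exactly in polynomial time is a long-standing open problem [8].

import numpy as np

def value_iteration(game, iters=100_000, tol=1e-12):
    """Approximate val(v) for every vertex of an SSG by iterating the
    local optimality operator, starting from 0 everywhere except the target."""
    verts = sorted(game.edges, key=str)     # fix an ordering of the vertices
    idx = {v: i for i, v in enumerate(verts)}
    x = np.zeros(len(verts))
    x[idx[game.target]] = 1.0
    for _ in range(iters):
        y = x.copy()
        for v in verts:
            if v == game.target:
                continue                    # the target keeps value 1
            vals = [x[idx[w]] for w in game.edges[v]]
            if v in game.v_max:
                y[idx[v]] = max(vals)       # max picks the best move
            elif v in game.v_min:
                y[idx[v]] = min(vals)       # min picks the worst move for max
            else:                           # random vertex: expected value
                y[idx[v]] = sum(game.prob[v][w] * x[idx[w]]
                                for w in game.edges[v])
        if np.max(np.abs(y - x)) < tol:
            return dict(zip(verts, y))
        x = y
    return dict(zip(verts, x))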

We write a GPCR game $\mathcal{G}$ as a simple stochastic game by describing a directed graph $G = (V = V_{\mathrm{cop}} \cup V_{\mathrm{rob}} \cup V_R \cup \{t\}, E)$, where $V_{\mathrm{cop}}$ is the set of vertices controlled by the cops, $V_{\mathrm{rob}}$ the set of vertices controlled by the robbers, $V_R$ the set of random vertices and $t$ the target vertex for the cops. The set of edges $E$ is induced by the transition functions $T_{\mathrm{cop}}$ and $T_{\mathrm{rob}}$. If there exists a play in $\mathcal{G}$ with a subsequence $sas'$, then we add an edge labelled $a$ from $s$ to some node $v \in V_R$, and an edge from $v$ to $s'$ weighted either by $T_{\mathrm{cop}}(s,a,s')$ or $T_{\mathrm{rob}}(s,a,s')$, depending on whether $a$ was played by the cops or by the robbers. We assume that vertex $t$ holds all final states of $F$; thus all transitions of the form $T_{\mathrm{cop}}(s,a,f) > 0$ or $T_{\mathrm{rob}}(s,a,f) > 0$, for any state $s$, action $a$ and final state $f$, induce edges from some vertex of $V_R$ to $t$. In this game the cops win if and only if they can reach $t$ from the initial vertex $i_0$, so this simple stochastic game corresponds to $\mathcal{G}$. We deduce from Theorem A.1 that $val(v)$ exists and that it is the probability that the cops capture the robbers from a vertex, or state, $v$ of $\mathcal{G}$. Since this SSG has a value, so does $\mathcal{G}$, and this value is the probability just mentioned.
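A hedged sketch of this construction in code: the interface to the GPCR game (states, actions, t_cop, t_rob, finals, cop_to_move) is an assumption made for illustration, not the paper's notation. Each pair (s, a) becomes a fresh random vertex, all final states of $F$ are merged into the single target $t$, and the result reuses the SSG container from the first sketch.

def gpcr_to_ssg(states, actions, t_cop, t_rob, finals, cop_to_move):
    """Build the SSG of Appendix A from a GPCR game. `actions(s)` lists the
    moves available in state s, `t_cop`/`t_rob` are the transition functions,
    `finals` is the set F, and `cop_to_move(s)` tells whose turn it is."""
    TARGET = "t"
    v_max, v_min, v_rand = set(), set(), set()
    edges, prob = {}, {}
    for s in states:
        (v_max if cop_to_move(s) else v_min).add(s)   # cops play max
        T = t_cop if cop_to_move(s) else t_rob
        edges[s] = []
        for a in actions(s):
            v = (s, a)                  # fresh random vertex for (state, action)
            v_rand.add(v)
            edges[s].append(v)
            dist = {}
            for s2 in states:
                q = T(s, a, s2)
                if q > 0:
                    w = TARGET if s2 in finals else s2   # merge F into t
                    dist[w] = dist.get(w, 0.0) + q
            prob[v] = dist
            edges[v] = list(dist)
    edges[TARGET] = [TARGET]            # make the target absorbing
    return SSG(v_max, v_min, v_rand, TARGET, edges, prob)

Capture probabilities from any state can then be approximated by running value_iteration on the returned game, in line with the argument above.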