
Asymptotic Security using Bayesian Defense Mechanism with Application to Cyber Deception

Hampei Sasahara, Member, IEEE, and Henrik Sandberg, Senior Member, IEEE

H. Sasahara is with the Department of Systems and Control Engineering, Tokyo Institute of Technology, Tokyo, 152-8552 Japan (e-mail: sasahara@sc.e.titech.ac.jp). H. Sandberg is with the Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, SE-100 44 Sweden (e-mail: hsan@kth.se). This work was supported in part by JSPS KAKENHI Grant Number 22K21272, the Swedish Research Council grant 2016-00861, and Digital Futures (projects DEMOCRITUS and RoSE).

©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract

This paper addresses the question of whether model knowledge can guide a defender to appropriate decisions, or not, when an attacker intrudes into a control system. The model-based defense scheme considered in this study, namely the Bayesian defense mechanism, chooses reasonable reactions through observation of the system’s behavior using models of the system’s stochastic dynamics, the vulnerability to be exploited, and the attacker’s objective. On the other hand, rational attackers take deceptive strategies to mislead the defender into making inappropriate decisions. In this paper, their dynamic decision making is formulated as a stochastic signaling game. Based on martingale analysis, it is shown that the belief on the true scenario has a limit in a stochastic sense at an equilibrium. This fact implies that there are only two possible cases: the defender asymptotically detects the attack with a firm belief, or the attacker takes actions such that the system’s behavior becomes nominal after a finite number of time steps. Consequently, if different scenarios result in different stochastic behaviors, the Bayesian defense mechanism guarantees the system to be secure in an asymptotic manner, provided that effective countermeasures are implemented. As an application of this finding, a defensive deception that utilizes asymmetric recognition of the vulnerabilities exploited by the attacker is analyzed. It is shown that the attacker possibly withdraws even if the defender is unaware of the exploited vulnerabilities, as long as the defender’s unawareness is concealed by the defensive deception.

Index Terms

Bayesian methods, game theory, intrusion detection, security, stochastic systems.

1 Introduction


Societal monetary loss caused by cyber crime is currently estimated to be about one trillion USD per year, and, even worse, a rising trend can be observed [1]. Another trend is that not only information systems but also control systems, which are typically governed by physical laws, are exposed to cyber threats, as demonstrated by recent incidents [2, 3, 4, 5]. Deception is a key notion for predicting the consequences of incidents. Rational attackers take deceptive strategies, i.e., the attacker tries to conceal her existence and even mislead the defender into making inappropriate decisions. An example of deception is the replay attack, which hijacks the plant’s sensors, eavesdrops on the nominal data transmitted while the system operates under normal conditions, and replays the recorded nominal data during the execution of another, damaging attack. A replay attack was executed in the Stuxnet incident, and it was an essential factor leading to serious damage in the targeted plant [6]. The incident suggests that prevention of deception is a fundamental requirement for secure system design.

Assuming a situation where an attacker might intrude into a control system in which a defense mechanism is implemented, this paper addresses the following question: can model knowledge guide the defender to appropriate decisions against the attacker’s deceptive strategies? Specifically, we consider the case where the stochastic model of the control system, the vulnerability to be exploited, and the objective of the attacker are known. The setting naturally leads to Bayesian defense mechanisms, which monitor the system’s behavior and form a belief on the existence of an attacker using the model. If the system’s behavior is inconsistent with the nominal one, the belief increases owing to Bayes’ rule. When the belief is strong enough, the Bayesian defense mechanism proactively carries out a proper reaction. On the other hand, we also suppose a powerful attacker who knows the model and the defense scheme to be implemented. The attacker aims to achieve her objective while avoiding detection by deceiving the defender.

For mathematical analysis, we formulate the decision making as a dynamic game with incomplete information. More specifically, we refer to the game as a stochastic signaling game, because it is a stochastic game [7] in the sense that the system’s dynamics is given as a Markov decision process (MDP) governed by two players and it is also a signaling game [8] in the sense that one player’s type is unknown to the opponent. In this game, the attacker strategically chooses harmful actions while avoiding being detected, while the defender, namely, the Bayesian defense mechanism, chooses appropriate counteractions according to her belief.

Based on the game-theoretic formulation, we find that model knowledge can always lead the defender to appropriate decisions in an asymptotic sense as long as the system’s dynamics admits no stealthy attacks. More specifically, there are only two possible cases: one is that the defender asymptotically forms a firm belief on the existence of an attacker and the other is that the attacker takes harmless actions after finite time such that the system converges to nominal behavior. This finding leads to the conclusion that the Bayesian defense mechanism guarantees the system to be secure in an asymptotic manner.

The analysis means that the defender always wins in an asymptotic manner when the stochastic model of the system is available and the vulnerability exploited for the intrusion is known and modeled. In practice, however, it is hard to be aware of all possible vulnerabilities in advance. As an application of the finding above, we consider a defensive deception using bluffing that utilizes asymmetric recognition between the attacker and the defender. Specifically, we suppose that the defender is unaware of the exploited vulnerability but the attacker is unaware of the defender’s unawareness. If the state of the system does not possess any information about the defender’s recognition of the vulnerability, the attacker cannot identify whether the defender is aware of the vulnerability or not. The result obtained in the former part suggests that the attacker may withdraw if the defender’s reactions affect only the attacker’s utility without influencing the system’s behavior. The difficulty of the analysis is that standard incomplete information games, which assume a common prior, cannot describe this situation: the common prior implicitly assumes that the attacker is aware of the defender’s unawareness. To overcome the difficulty, we employ the Mertens-Zamir model, which can represent incomplete information games without the common prior assumption using the notion of a belief hierarchy [9, 10]. Based on this setting, we show, in a formal manner, that the defensive deception effectively works when the attacker strongly believes that the defender is aware of the vulnerability.

Related Work

Model-based security analysis helps the system designer prioritize security investments [11]. Attack graphs [12] and attack trees [13] are basic models of vulnerabilities, attacks, and consequences. Incorporating defensive actions into the graphical representation induces defense trees [14]. For dynamic models, attack countermeasure trees, partially observable MDPs, and Bayesian network models have been used [15, 16, 17]. Those probabilistic models naturally lead to Bayesian defense mechanisms, such as Bayesian intrusion detection [18, 19], Bayesian intrusion response [20], and Bayesian security risk management [21]. Meanwhile, the model of the dynamical system to be protected is also used for control system security [22, 23]. For example, identifying the existence of stealthy attacks and removing the vulnerability require the dynamical model [24, 25], and attack detection performance can be enhanced by model knowledge [26]. Our Bayesian defense mechanisms can be interpreted as a generalization of those approaches. This work reveals a fundamental property of such commonly used model-based defense schemes.

Game theory is a standard approach to modeling decision making in cyber security, where there inevitably arises a need to address strategic interactions between the attacker and the defender [27, 28]. In particular, games with incomplete information play a crucial role in deceptive situations [29, 30, 31, 32]. The modeling in this study follows the signaling game framework in [33, 34]. Our main concern is with the asymptotic phenomena in dynamic deception and the effectiveness of model knowledge.

Our finding is based on an analysis of the asymptotic behavior of Bayesian inference. The convergence property of Bayesian inference to the true parameter is referred to as Bayesian consistency, which has been investigated mainly in the context of statistics [35, 36]. However, those existing results are basically applicable only to independent and identically distributed (i.i.d.) samples, because the discussion mostly relies on the strong law of large numbers (SLLN). Although there is an extension to Markov chains [37], the observable variable in our work is not Markov. Indeed, sophisticated attackers can choose strategies such that the state at each step is correlated with the entire previous trajectory. Thus, existing results on Bayesian consistency cannot be applied to our problem in a straightforward manner.

Preliminary versions of this work have been presented in [38, 39], but they made the claim of Theorem 2 as an assumption rather than proving it. Moreover, they did not include rigorous proofs of the claims in Section 3 and analysis of the bluffing proposed in Section 4.

Organization and Preliminaries

In Section 2, we present a motivating example of water supply networks, and subsequently, formulate the decision making as a stochastic signaling game. Section 3 analyzes the consequence of the formulated game and shows that Bayesian defense mechanisms can achieve asymptotic security of the system to be protected. In Section 4, we analyze a defensive deception that utilizes asymmetric recognition as an application of the finding of Section 3. The game of interest is reformulated using the Mertens-Zamir model. It is shown that the attacker possibly stops the execution even if the defender is unaware of the exploited vulnerabilities, as long as the defender’s belief is concealed. Section 5 verifies the theoretical results through numerical simulation. Finally, Section 6 concludes and summarizes the paper.

Let $\mathbb{N}$, $\mathbb{Z}_{+}$, and $\mathbb{R}$ be the sets of natural numbers, nonnegative integers, and real numbers, respectively. The $k$-ary Cartesian power of the set $\mathcal{X}$ is denoted by $\mathcal{X}^{k}$. The tuple $(x_{0},\ldots,x_{k})$ is denoted by $x_{0:k}$. The cardinality of a set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. For a set $\mathcal{X}$, the Kronecker delta $\delta:\mathcal{X}\times\mathcal{X}\to\{0,1\}$ is defined by $\delta(x,y)=1$ if $x=y$ and $\delta(x,y)=0$ otherwise. The $\sigma$-algebra generated by a random variable $X$ is denoted by $\sigma(X)$. For a sequence of events $E_{k}$ for $k\in\mathbb{N}$, the supremum set $\cap_{N=1}^{\infty}\cup_{k=N}^{\infty}E_{k}$, namely, the event where $E_{k}$ occurs infinitely often, is denoted by $\{E_{k}\ {\rm i.o.}\}$. Jensen’s inequality, which is often applied in this paper, is given as follows: for a real convex function $\varphi$ and a finite set $\mathcal{X}$, the inequality

\sum_{x\in\mathcal{X}} p(x)\varphi(a(x)) \geq \varphi\left(\sum_{x\in\mathcal{X}} p(x)a(x)\right) \qquad (1)

holds, where $a:\mathcal{X}\to\mathbb{R}$ and $p:\mathcal{X}\to[0,1]$ satisfies $\sum_{x\in\mathcal{X}}p(x)=1$. The inequality is reversed if $\varphi$ is concave. The generalized Borel-Cantelli second lemma is given as follows [40, Theorem 4.3.4]: let $\mathcal{F}_{k}$ for $k\in\mathbb{Z}_{+}$ be a filtration of a probability space $(\Omega,\mathcal{F},\mathbb{P})$ with $\mathcal{F}_{0}:=\{\emptyset,\Omega\}$, and let $E_{k}$ for $k\in\mathbb{Z}_{+}$ be a sequence of events with $E_{k}\in\mathcal{F}_{k+1}$. Then

\{E_{k}\ {\rm i.o.}\} = \left\{\omega\in\Omega : \sum_{k=0}^{\infty}\mathbb{P}(E_{k}|\mathcal{F}_{k})(\omega)=\infty\right\}. \qquad (2)
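As a quick numerical illustration of (1), the following Python snippet, which we add for concreteness, checks Jensen's inequality on a three-element set; the distribution $p$, the map $a$, and the convex function $\varphi$ are made-up values, not from the paper.

```python
import numpy as np

# Hypothetical finite set X = {0, 1, 2}, distribution p, and map a.
p = np.array([0.2, 0.5, 0.3])   # p(x) with sum_x p(x) = 1
a = np.array([-1.0, 0.5, 2.0])  # a(x)
phi = lambda t: t ** 2          # a real convex function

lhs = float(np.sum(p * phi(a)))  # sum_x p(x) phi(a(x))
rhs = float(phi(np.sum(p * a)))  # phi(sum_x p(x) a(x))
assert lhs >= rhs                # Jensen's inequality (1)
print(f"{lhs:.3f} >= {rhs:.3f}")
```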

The appendix contains the proofs of the claims made in the paper.

2 Modeling using Stochastic Signaling Games

2.1 Motivating Example

As a motivating example, we consider water distribution networks (WDNs), which supply drinking water of suitable quality to customers. Because of their indispensability to our life, WDNs are an attractive target for adversaries and expose their architecture to cyber-physical attacks [41]. In particular, we treat the water tank system illustrated in Fig. 1, where a tank is connected to a reservoir within a WDN. The amount of water in the tank varies owing to consumption and flow to and from the external network. Thus the tank system needs to be properly controlled through actuation of the pump and the valve to keep the water amount within a desired range [42]. A programmable logic controller (PLC) transmits on/off control signals to the pump and the valve while monitoring the state, namely, the water level of the tank. The dynamics is modeled as an MDP, where the state space and the action space are given by quantized water levels and finite control actions. Interaction with the external network is modeled as the randomness in the process.
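To make the setting concrete, the following Python sketch encodes such a tank as a finite MDP; the five quantized levels, the action names, and all probabilities are hypothetical values chosen for illustration, not taken from the paper.

```python
# Hypothetical quantized water-tank MDP: states are water levels, the
# (possibly hijacked) PLC picks the pump action, and the defender's
# reaction is assumed here not to move the water level.
X = range(5)              # levels 0 (empty) .. 4 (overflow)
A = ("off", "on")         # control signal a_k
R = ("idle", "inspect")   # reaction r_k

def P(x_next, x, a, r):
    """Transition probability P(x'|x,a,r); the 0.8/0.2 split stands in
    for the random inflow/outflow from the external network."""
    target = min(max(x + (1 if a == "on" else -1), 0), 4)
    if target == x:                        # saturated at a boundary
        return 1.0 if x_next == x else 0.0
    if x_next == target:
        return 0.8                         # intended transition
    if x_next == x:
        return 0.2                         # disturbance keeps the level
    return 0.0

# Sanity check: each row of the kernel is a probability distribution.
assert all(abs(sum(P(xn, x, a, r) for xn in X) - 1) < 1e-9
           for x in X for a in A for r in R)
```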

Figure 1: Motivating example: water tank system connected to a reservoir within a water distribution network. The programmable logic controller (PLC) transmits on/off control signals to the pump and the valve while monitoring the state, namely, the water level of the tank. In the scenario, adversarial software possibly intrudes into the PLC, and the infected PLC then tries to cause overflow by sending inappropriate control signals without being detected. A Bayesian defense mechanism, which utilizes the monitored state data and forms her belief on the existence of an attacker based on the system model, is also equipped to deal with the attack.

We here suppose an attack scenario considered in [43]. The adversary succeeds in hijacking the PLC and can directly manipulate its control logic. Such an intrusion can be carried out by stealthy and evasive maneuvers in advanced persistent threats [44]. The objective of the attack is to damage the system by causing water overflow through inappropriate control signals without being detected. To deal with this attack, we consider a Bayesian defense mechanism, which utilizes the monitored state data and forms her belief on the existence of an attacker based on the system model. The Bayesian defense mechanism chooses a proper reaction by identifying whether the system is under attack through observation of the state. If the system’s behavior is highly suspicious, the defense mechanism takes an aggressive reaction such as log analysis, dispatch of operators, or emergency shutdown.

The defender’s belief on the existence of an attacker plays a key role in analyzing the consequence of the threat. When the attacker naively executes an attack, the system’s behavior becomes different from that of normal operation, and accordingly the belief increases. On the other hand, if the attacker chooses sophisticated attacks that deceive the defender, the belief may decrease. Our main interest in this study is to investigate the defense capability achieved by the Bayesian defense mechanism.

2.2 Modeling using Stochastic Signaling Game

We introduce the general description based on dynamic games with incomplete information. In particular, we refer to the game as a stochastic signaling game where the system’s dynamics is given as an MDP and the type of a player is unknown to the opponent.

The system to be protected with a Bayesian defense mechanism is depicted in Fig. 2. The system is modeled by a finite MDP governed by two players, as in standard stochastic games. Formally, the MDP considered in this paper is given by the tuple $\mathcal{M}:=(\mathcal{X},\mathcal{A},\mathcal{R},P,P_{0})$, where $\mathcal{X}$ is a finite state space, $\mathcal{A}$ and $\mathcal{R}$ are finite action spaces, $P:\mathcal{X}\times\mathcal{X}\times\mathcal{A}\times\mathcal{R}\to[0,1]$ is a transition probability, and $P_{0}:\mathcal{X}\to[0,1]$ is the probability distribution of the initial state. The state at the $k$th step is denoted by $x_{k}\in\mathcal{X}$. There is an agent who can alter the system through an action $a_{k}\in\mathcal{A}$ for $k\in\mathbb{Z}_{+}$. We refer to the agent as the sender, as in standard signaling games. Based on the measured output, the Bayesian defense mechanism, called the receiver, chooses an action $r_{k}\in\mathcal{R}$ at each time step. We henceforth refer to $r_{k}$ as a reaction to emphasize that it denotes a counteraction against potentially malicious actions. The system dynamics is given by $P$, where the transition probability from $x$ to $x'$ under $a$ and $r$ is denoted by $P(x'|x,a,r)$. To eliminate the possibility of trivial stealthy attacks, we assume that the system’s behavior varies in a stochastic sense when different actions are taken.

Assumption 1

For any $x\in\mathcal{X}$ and $r\in\mathcal{R}$, there exists $x'\in\mathcal{X}$ such that

P(x'|x,a,r) \neq P(x'|x,a',r) \qquad (3)

for any pair of distinct actions $a\neq a'$.

Figure 2: Block diagram of the system to be protected using the Bayesian defense mechanism. The system is governed by actions and reactions, decided by the sender and the receiver, respectively. The sender type $\theta_{\rm b}$ means that the system is normally operated. The other type $\theta_{\rm m}$ means that there exists an attacker who executes malicious actions. The receiver is the Bayesian defense mechanism, which forms her belief on the existence of an attacker from the measured data and chooses reactions based on the belief.

Next, we determine the class of decision rules. Let $\theta\in{\it\Theta}$ denote the type of the sender. For simplicity, the type is assumed to be binary, i.e., ${\it\Theta}=\{\theta_{\rm b},\theta_{\rm m}\}$, where $\theta_{\rm b}$ and $\theta_{\rm m}$ correspond to benign and malicious senders, respectively; they describe the situations where there does not and does exist an adversary. The true type $\theta$ is known to the sender but unknown to the receiver. Let $\bar{s}^{\rm s}:=(\bar{s}^{\rm s}_{k})_{k\in\mathbb{Z}_{+}}$ and $\bar{s}^{\rm r}:=(\bar{s}^{\rm r}_{k})_{k\in\mathbb{Z}_{+}}$ denote the sender’s and receiver’s pure strategies, respectively. It is assumed that the receiver’s only available information about the sender type is the state, i.e., she can observe neither her instantaneous utility, defined below, nor the sender’s action. Similarly, it is assumed that the sender can observe only the state and her own action. The strategies at the $k$th step with the available information are given by $\bar{s}^{\rm s}_{k}:{\it\Theta}\times\mathcal{H}^{\rm s}_{k}\to\mathcal{A}$ and $\bar{s}^{\rm r}_{k}:\mathcal{H}^{\rm r}_{k}\to\mathcal{R}$, where $h^{\rm s}_{k}\in\mathcal{H}^{\rm s}_{k}$ and $h^{\rm r}_{k}\in\mathcal{H}^{\rm r}_{k}$ are the histories at the $k$th step, given by $h^{\rm s}_{k}=(x_{0:k},a_{0:k-1})$ and $h^{\rm r}_{k}=(x_{0:k},r_{0:k-1})$. Note that the resulting state trajectory is not Markov, since the strategies depend on the entire history. Because we consider pure strategies, it suffices to consider the state-history dependent strategies $s^{\rm s}_{k}:{\it\Theta}\times\mathcal{X}^{k+1}\to\mathcal{A}$ and $s^{\rm r}_{k}:\mathcal{X}^{k+1}\to\mathcal{R}$, recursively defined by

\begin{array}{l} s^{\rm s}_{k}(\theta,x_{0:k}):=\bar{s}^{\rm s}_{k}(\theta,x_{0:k},s^{\rm s}_{0:k-1}(x_{0:k-1})),\\ s^{\rm r}_{k}(x_{0:k}):=\bar{s}^{\rm r}_{k}(x_{0:k},s^{\rm r}_{0:k-1}(x_{0:k-1})). \end{array} \qquad (4)

The strategy profile is denoted by $s:=(s^{\rm s},s^{\rm r})$. The sender’s and receiver’s admissible strategy sets are denoted by $\mathcal{S}^{\rm s}$ and $\mathcal{S}^{\rm r}$, respectively, and the set of admissible strategy profiles by $\mathcal{S}:=\mathcal{S}^{\rm s}\times\mathcal{S}^{\rm r}$. Note that, although we do not specify $\mathcal{S}$ here, it can be taken to be any set of state-history dependent strategies. While we consider a general strategy set in Sec. 3, we impose a constraint on $\mathcal{S}$ in Sec. 4.

Once a strategy profile is fixed, the stochastic property of the system is induced. Construct the canonical measurable space $(\Omega,\mathcal{F})$ of the MDP with the sender type, where $\Omega:={\it\Theta}\times\Pi_{k=0}^{\infty}(\mathcal{X}\times\mathcal{A}\times\mathcal{R})$ and $\mathcal{F}$ is its product $\sigma$-algebra [45, Chapter 2]. We write $\omega=(\theta,(x_{0},a_{0},r_{0}),(x_{1},a_{1},r_{1}),\ldots)\in\Omega$. The random variables $\Theta$, $X_{k}$, $A_{k}$, and $R_{k}$ are defined on $(\Omega,\mathcal{F})$ by the projections of $\omega$, i.e., $\Theta(\omega):=\theta$, $X_{k}(\omega):=x_{k}$, $A_{k}(\omega):=a_{k}$, and $R_{k}(\omega):=r_{k}$. The probability measure on $(\Omega,\mathcal{F})$ induced by $s$ is denoted by $\mathbb{P}^{s}$, which satisfies

\left\{\begin{array}{l}
\mathbb{P}^{s}(X_{0}=x_{0})=P_{0}(x_{0}),\\
\mathbb{P}^{s}(A_{k}=a_{k}|\Theta=\theta,X_{0:k}=x_{0:k})=\delta(a_{k},s^{\rm s}_{k}(\theta,x_{0:k})),\\
\mathbb{P}^{s}(R_{k}=r_{k}|X_{0:k}=x_{0:k})=\delta(r_{k},s^{\rm r}_{k}(x_{0:k})),\\
\mathbb{P}^{s}(X_{k+1}=x_{k+1}|X_{0:k}=x_{0:k},A_{k}=a_{k},R_{k}=r_{k})=P(x_{k+1}|x_{k},a_{k},r_{k}),\\
\mathbb{P}^{s}(\Theta=\theta)=\pi_{0}(\theta)
\end{array}\right. \qquad (5)

for any $k\in\mathbb{Z}_{+}$, with the initial distribution of the sender type $\pi_{0}:{\it\Theta}\to[0,1]$. We denote the conditional probability $\mathbb{P}^{s}(\cdot|\Theta=\theta)$ by $\mathbb{P}^{s}_{\theta}$. To simplify the notation, we denote the conditional probability mass function with type $\theta$ by

p^{s}_{\theta}(x_{k+1}|x_{0:k}) := \mathbb{P}^{s}_{\theta}(X_{k+1}=x_{k+1}|X_{0}=x_{0},\ldots,X_{k}=x_{k}). \qquad (6)

The expectation with respect to $\mathbb{P}^{s}$ is denoted by $\mathbb{E}^{s}$.
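As an illustration of how the measure $\mathbb{P}^{s}$ in (5) generates trajectories, here is a minimal Python sampler we add for concreteness; the function and argument names are ours, and the kernel and strategy objects are assumed to follow the conventions of the tank sketch above.

```python
import random

def sample_trajectory(P0, P, s_sender, s_receiver, pi0, K, seed=0):
    """Sample (theta, x_{0:K}, a_{0:K-1}, r_{0:K-1}) according to (5).

    P0, pi0: dicts mapping states/types to probabilities (the keys of P0
    are taken as the state space); P(x_next, x, a, r): transition kernel;
    s_sender(theta, xs) and s_receiver(xs): pure strategies on histories.
    """
    rng = random.Random(seed)

    def draw(pmf):  # sample from a dict {value: probability}
        values, weights = zip(*pmf.items())
        return rng.choices(values, weights=weights)[0]

    theta = draw(pi0)                      # P^s(Theta = theta) = pi_0(theta)
    xs, acts, reacts = [draw(P0)], [], []
    for _ in range(K):
        a = s_sender(theta, tuple(xs))     # deterministic, cf. delta in (5)
        r = s_receiver(tuple(xs))
        xs.append(draw({xn: P(xn, xs[-1], a, r) for xn in P0}))
        acts.append(a)
        reacts.append(r)
    return theta, xs, acts, reacts
```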

We next introduce each player’s belief on the uncertain variables. The receiver’s belief at the $k$th step is given by

\pi^{\rm r}_{k}(\theta,a_{0:k-1}|x_{0:k},r_{0:k-1}) := \mathbb{P}^{s}(\Theta=\theta, A_{0:k-1}=a_{0:k-1}|X_{0:k}=x_{0:k}, R_{0:k-1}=r_{0:k-1}) \qquad (7)

for $k\in\mathbb{Z}_{+}$. The belief can be recursively computed by Bayes’ rule

\pi^{\rm r}_{k+1}(\theta,a_{0:k}|x_{0:k+1},r_{0:k}) = \delta(a_{k},s^{\rm s}_{k}(\theta,x_{0:k})) \times \dfrac{P(x_{k+1}|x_{k},s^{\rm s}_{k}(\theta,x_{0:k}),r_{k})\,\pi^{\rm r}_{k}(\theta,a_{0:k-1}|x_{0:k},r_{0:k-1})}{\sum_{\phi\in{\it\Theta}}P(x_{k+1}|x_{k},s^{\rm s}_{k}(\phi,x_{0:k}),r_{k})\,\pi^{\rm r}_{k}(\phi,a_{0:k-1}|x_{0:k},r_{0:k-1})} \qquad (8)

when the denominator is nonzero. To simplify notation, we introduce the receiver’s belief on the sender type only:

\pi^{\rm r}_{k}(\theta|x_{0:k}) := \pi^{\rm r}_{k}(\theta, s^{\rm s}_{0:k-1}(\theta,x_{0:k-1})|x_{0:k}, s^{\rm r}_{0:k-1}(x_{0:k-1})), \qquad (9)

which follows Bayes’ rule

\pi^{\rm r}_{k+1}(\theta|x_{0:k+1}) = \dfrac{P(x_{k+1}|x_{k},s^{\rm s}_{k}(\theta,x_{0:k}),s^{\rm r}_{k}(x_{0:k}))\,\pi^{\rm r}_{k}(\theta|x_{0:k})}{\sum_{\phi\in{\it\Theta}}P(x_{k+1}|x_{k},s^{\rm s}_{k}(\phi,x_{0:k}),s^{\rm r}_{k}(x_{0:k}))\,\pi^{\rm r}_{k}(\phi|x_{0:k})}. \qquad (10)

The sender’s belief can be defined similarly and is denoted by $\pi^{\rm s}_{k}(r_{0:k-1}|\theta,x_{0:k},a_{0:k-1})$.
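For concreteness, the type-only Bayes recursion (10) can be implemented in a few lines; this is our own sketch with illustrative names, assuming the same kernel and strategy conventions as above.

```python
def belief_update(belief, x_next, xs, P, s_sender, s_receiver):
    """One step of (10): map pi_k^r(.|x_{0:k}) to pi_{k+1}^r(.|x_{0:k+1}).

    belief: dict {theta: probability}; xs: observed history x_{0:k}.
    """
    r = s_receiver(tuple(xs))
    lik = {th: P(x_next, xs[-1], s_sender(th, tuple(xs)), r) for th in belief}
    z = sum(lik[th] * belief[th] for th in belief)  # denominator of (10)
    if z == 0.0:
        raise ValueError("zero-probability transition: (10) is undefined")
    return {th: lik[th] * belief[th] / z for th in belief}
```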

In Sec. 3, the initial beliefs are assumed to be known to both players, i.e., we make the common prior assumption. Since we consider pure strategies, $r_{0:k-1}$ is uniquely determined by $x_{0:k-1}$ once the strategy is fixed. Hence, the sender’s belief does not appear explicitly in Sec. 3. On the other hand, in Sec. 4, we consider the case where the initial belief is unknown to the sender, modeling the possibility of bluffing.

Let $U^{\rm s}:{\it\Theta}\times\mathcal{X}\times\mathcal{A}\times\mathcal{R}\to\mathbb{R}$ be the sender’s instantaneous utility. For a given strategy profile $s\in\mathcal{S}$ and type $\theta\in{\it\Theta}$, the sender’s expected average utility at the $k$th step with horizon length $T$ is given by

\bar{U}^{\rm s}_{k,T}(s_{k:k+T}|\theta,x_{0:k}) := \mathbb{E}^{s}\left[\left.\dfrac{1}{T+1}\sum_{\tau=k}^{k+T}U^{\rm s}(\Theta,X_{\tau},s^{\rm s}_{\tau}(\Theta,X_{0:\tau}),s^{\rm r}_{\tau}(X_{0:\tau}))\,\right|\,\theta,x_{0:k}\right]. \qquad (11)

Similarly, with the receiver’s instantaneous utility $U^{\rm r}:{\it\Theta}\times\mathcal{X}\times\mathcal{A}\times\mathcal{R}\to\mathbb{R}$, the receiver’s expected average utility at the $k$th step with horizon length $T$ is given by

\bar{U}^{\rm r}_{k,T}(s_{k:k+T}|x_{0:k}) := \mathbb{E}^{s}\left[\left.\dfrac{1}{T+1}\sum_{\tau=k}^{k+T}U^{\rm r}(\Theta,X_{\tau},s^{\rm s}_{\tau}(\Theta,X_{0:\tau}),s^{\rm r}_{\tau}(X_{0:\tau}))\,\right|\,x_{0:k}\right]. \qquad (12)

We denote the limits by $(\bar{U}^{\rm s}_{k},\bar{U}^{\rm r}_{k}):=\lim_{T\to\infty}(\bar{U}^{\rm s}_{k,T},\bar{U}^{\rm r}_{k,T})$, assuming they exist. Under this notation, the strategy profile $s=(s^{\rm s},s^{\rm r})$ is said to be a perfect Bayesian equilibrium (PBE) if

\left\{\begin{array}{l}
s^{\rm s}_{k:\infty}\in{\rm BR}^{\rm s}_{k}(s^{\rm r}_{k:\infty}|\theta,x_{0:k}),\quad\forall\theta\in{\it\Theta},\\
s^{\rm r}_{k:\infty}\in{\rm BR}^{\rm r}_{k}(s^{\rm s}_{k:\infty}|x_{0:k})
\end{array}\right. \qquad (13)

for any $k\in\mathbb{Z}_{+}$ and $x_{0:k}\in\mathcal{X}^{k+1}$, where ${\rm BR}^{\rm s}_{k}$ and ${\rm BR}^{\rm r}_{k}$ are the best responses defined by

\begin{array}{l}
{\rm BR}^{\rm s}_{k}(s^{\rm r}_{k:\infty}|\theta,x_{0:k}) := \operatorname*{arg\,max}_{\tilde{s}^{\rm s}_{k:\infty}\in\mathcal{S}^{\rm s}_{k:\infty}} \bar{U}^{\rm s}_{k}((\tilde{s}^{\rm s}_{k:\infty},s^{\rm r}_{k:\infty})|\theta,x_{0:k}),\\
{\rm BR}^{\rm r}_{k}(s^{\rm s}_{k:\infty}|x_{0:k}) := \operatorname*{arg\,max}_{\tilde{s}^{\rm r}_{k:\infty}\in\mathcal{S}^{\rm r}_{k:\infty}} \bar{U}^{\rm r}_{k}((s^{\rm s}_{k:\infty},\tilde{s}^{\rm r}_{k:\infty})|x_{0:k}).
\end{array} \qquad (14)

Note that our analysis can be extended to general objective functions rather than expected average utilities, as long as the adversary with those utilities avoids being detected, which is formally stated in Definition 1 below.

We define the game formulated above by

\mathcal{G}_{1}:=(\mathcal{M},\mathcal{S},U,{\it\Theta},\pi_{0}), \qquad (15)

where the initial belief is common information. This game belongs to the class of stochastic games with incomplete, imperfect, and asymmetric information. Owing to the existence of the type $\theta$, which is unknown to the receiver, the information is incomplete. Because the actions taken by each player are unobservable to the opponent, the information is imperfect and asymmetric. Although investigating the existence of equilibria and computing them are challenging, we discuss properties of equilibria on the premise that they exist and are given, because our interest here lies in the consequences of the threat.

3 Analysis: Asymptotic Security

In this section, we analyze asymptotic behaviors of beliefs and actions when the adversary avoids being detected. It is shown that the system is guaranteed to be secure in an asymptotic manner as long as the defender possesses an effective counteraction.

3.1 Belief’s Asymptotic Behavior

The random variable of the belief on the type $\theta\in{\it\Theta}$ at the $k$th step, $\pi^{\theta}_{k}:\Omega\to[0,1]$, is given by

\pi^{\theta}_{k}(\omega) := \pi^{\rm r}_{k}(\theta|X_{0:k}(\omega)). \qquad (16)

Recall that $\pi^{\theta}_{k}$ represents the defender’s confidence in the existence of an attacker. If the belief is low in spite of the existence of malicious signals, the Bayesian defense mechanism is deceived. Because we are interested in whether the Bayesian defense mechanism is permanently deceived or not, we examine the asymptotic behavior of the belief.

We first investigate the increments of the belief sequence. The following lemma is key to our analysis.

Lemma 1

Consider the game $\mathcal{G}_{1}$. The belief of the true type $\pi^{\theta}_{k}$ is a submartingale with respect to the probability measure $\mathbb{P}^{s}_{\theta}$ and the filtration $\sigma(X_{0:k})$ for any type $\theta$ and strategy profile $s$.

Lemma 1 roughly implies that the expectation of the belief on the true type is non-decreasing. As a direct conclusion of this lemma, the following theorem holds.

Theorem 1

Consider the game $\mathcal{G}_{1}$. There exists an integrable random variable $\pi^{\theta}_{\infty}:\Omega\to[0,1]$ such that

\lim_{k\to\infty}\pi^{\theta}_{k}=\pi^{\theta}_{\infty}\quad\mathbb{P}^{s}_{\theta}\text{-a.s.} \qquad (17)

for any type $\theta$ and strategy profile $s$.

Theorem 1 implies that the belief has a limit even if an intermittent attack is executed. Fig. 3 depicts the distributions of the belief sequence when there exists an attacker. Owing to the model knowledge, if the adversary stops the attack at some time step, then the belief is invariant, which is illustrated by the transition of the belief at $k=1$ in Fig. 3. Moreover, the expectation of the belief is non-decreasing over time, as claimed by Lemma 1. Thus, there exists a limit $\pi^{\theta_{\rm m}}_{\infty}$, as shown at the right of Fig. 3.

Figure 3: Distributions of the belief sequence when there exists an attacker. Lemma 1 and Theorem 1 claim that the expectation is non-decreasing over time and that the belief has a limit. When the adversary stops the attack, the belief is invariant.
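The martingale behavior claimed by Lemma 1 and Theorems 1 and 2 is easy to observe numerically. The following toy Python experiment, our own i.i.d. two-type example that is far simpler than the game itself, runs the Bayes recursion under the true type $\theta_{\rm m}$ and shows the belief settling near one:

```python
import random

rng = random.Random(1)

# Toy i.i.d. example: under attack (theta_m) a "suspicious" state occurs
# with probability 0.6, under normal operation (theta_b) with 0.5.
q_m, q_b = 0.6, 0.5

belief = 0.1                 # pi_0(theta_m) > 0, cf. Theorem 2
for k in range(20000):
    suspicious = rng.random() < q_m          # state sampled under theta_m
    lm = q_m if suspicious else 1.0 - q_m
    lb = q_b if suspicious else 1.0 - q_b
    belief = lm * belief / (lm * belief + lb * (1.0 - belief))  # cf. (10)

print(f"belief after 20000 steps: {belief:.4f}")  # typically close to 1
```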

We next investigate the limit. An undesirable limit is $\pi^{\theta}_{\infty}=0$, which means that the defender is completely deceived. We show that this does not happen as long as the initial belief is nonzero. The following lemma holds.

Lemma 2

Consider the game $\mathcal{G}_{1}$. If $\pi^{\theta}_{0}>0$ for any type $\theta$, then $\log(\pi^{\theta}_{k})$ with any base converges $\mathbb{P}^{s}_{\theta}$-almost surely to an integrable random variable as $k\to\infty$ for any type $\theta$ and strategy profile $s$.

Lemma 2 leads to the following theorem.

Theorem 2

Consider the game $\mathcal{G}_{1}$. If $\pi^{\theta}_{0}>0$, then

\pi^{\theta}_{\infty}>0\quad\mathbb{P}^{s}_{\theta}\text{-a.s.} \qquad (18)

for any type $\theta$ and strategy profile $s$.

Theorem 2 implies that the complete deception described by $\pi^{\theta}_{\infty}=0$ does not occur.

Remark: Theorems 1 and 2 can heuristically be justified from an information-theoretic perspective as follows. Suppose that the state sequence $x_{0:k}$ is observed at the $k$th step. Then the belief is given by

\pi_{k}(\theta|x_{0:k}) = \dfrac{\pi_{0}(\theta)}{\sum_{\phi\neq\theta}\frac{p^{s}_{\phi}(x_{0:k})}{p^{s}_{\theta}(x_{0:k})}\,\pi_{0}(\phi)+\pi_{0}(\theta)} = \dfrac{\pi_{0}(\theta)}{\sum_{\phi\neq\theta}\exp(kS^{\phi}_{k}(x_{0:k}))\,\pi_{0}(\phi)+\pi_{0}(\theta)} \qquad (19)

where $p^{s}_{\theta}(x_{0:k})$ and $p^{s}_{\phi}(x_{0:k})$ are the joint probability mass functions of $x_{0:k}$ with respect to $\mathbb{P}^{s}_{\theta}$ and $\mathbb{P}^{s}_{\phi}$, respectively, and

S^{\phi}_{k}(x_{0:k}) := \dfrac{1}{k}\sum_{i=1}^{k}\log\dfrac{p^{s}_{\phi}(x_{i}|x_{0:i-1})}{p^{s}_{\theta}(x_{i}|x_{0:i-1})}. \qquad (20)

Assuming that $p^{s}_{\theta}(x_{k}|x_{0:k-1})$ approaches a stationary distribution $p^{s}_{\theta}(x)$ on $\mathcal{X}$ and that the SLLN can be applied, we have

\lim_{k\to\infty}S^{\phi}_{k} = \mathbb{E}_{x\sim p^{s}_{\theta}}\left[\log p^{s}_{\phi}(x)/p^{s}_{\theta}(x)\right] = -D_{\rm KL}(p^{s}_{\theta}\,\|\,p^{s}_{\phi}) \qquad (21)

where $D_{\rm KL}$ denotes the Kullback-Leibler divergence. Since $D_{\rm KL}$ is nonnegative for any pair of distributions, $S^{\phi}_{k}$ converges to a nonpositive number, which results in convergence of $\pi^{\theta}_{k}$. If $p^{s}_{\theta}\neq p^{s}_{\phi}$ for any $\phi\in{\it\Theta}\setminus\{\theta\}$, the limit of $S^{\phi}_{k}$ becomes negative, and hence $\lim_{k\to\infty}\exp(kS^{\phi}_{k}(x_{0:k}))=0$, which leads to

\lim_{k\to\infty}\pi_{k}(\theta|x_{0:k}) = \dfrac{\pi_{0}(\theta)}{\sum_{\phi\neq\theta}\lim_{k\to\infty}\exp(kS^{\phi}_{k}(x_{0:k}))\,\pi_{0}(\phi)+\pi_{0}(\theta)} = 1. \qquad (22)

Thus, the belief of the true type converges to one. Such a convergence property of the Bayesian estimator on the true parameter, referred to as Bayesian consistency, has been investigated mainly in the context of statistics [35, 36]. In this sense, Theorems 1 and 2 can be regarded as another representation of Bayesian consistency. However, note again that this discussion is not a rigorous proof but a heuristic justification because the state is essentially non-i.i.d. and even non-ergodic in our game-theoretic formulation.
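The heuristic (20)-(21) can likewise be checked numerically in the i.i.d. case; the two distributions in the Python sketch below are made up for illustration:

```python
import math
import random

rng = random.Random(0)

p_theta = [0.3, 0.7]  # hypothetical true i.i.d. state distribution on {0,1}
p_phi   = [0.5, 0.5]  # alternative type's distribution

kl = sum(p * math.log(p / q) for p, q in zip(p_theta, p_phi))

k = 200_000
xs = rng.choices((0, 1), weights=p_theta, k=k)
S_k = sum(math.log(p_phi[x] / p_theta[x]) for x in xs) / k  # (20)

# S_k is close to -KL(p_theta || p_phi) < 0, so exp(k*S_k) -> 0, cf. (21)-(22).
print(f"S_k = {S_k:.4f}, -KL = {-kl:.4f}")
```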

3.2 Asymptotic Security

It has turned out that the belief has a positive limit. To clarify our interest, we define the notion of detection-averse utilities.

Definition 1

(Detection-averse Utilities) A pair of utilities $(U^{\rm s},U^{\rm r})$ in the game $\mathcal{G}_{1}$ is detection-averse when

\pi^{\theta_{\rm m}}_{\infty}<1\quad\mathbb{P}^{s}_{\theta_{\rm m}}\text{-a.s.} \qquad (23)

for any PBE $s$.

Definition 1 characterizes utilities where the malicious sender avoids having the defender form a firm belief on the existence of an attacker. An example of detection-averse utilities is given in Appendix 7. Naturally, strategies reasonable for the attacker should be detection-averse as long as the defender possesses an effective counteraction. If the utilities of interest are not detection-averse, this means that the defense mechanism cannot cope with the attack because the attacker is not afraid to reveal herself. For protecting such systems, appropriate counteractions should be implemented beforehand.

Suppose that there is an effective countermeasure, and hence the utilities are detection-averse. A simple strategy for the malicious sender that satisfies (23) is to imitate the benign sender’s strategy after a finite number of time steps. We give a formal definition of such strategies.

Definition 2

(Asymptotically Benign Strategy) A strategy profile $s$ in the game $\mathcal{G}_{1}$ is asymptotically benign when

\lim_{k\to\infty}\delta\left(A^{\theta_{\rm m}}_{k},A^{\theta_{\rm b}}_{k}\right)=1\quad\mathbb{P}^{s}_{\theta_{\rm m}}\text{-a.s.} \qquad (24)

where $A^{\theta}_{k}$ is the action taken by the sender of type $\theta$, defined by $A^{\theta}_{k}:=s^{\rm s}_{k}(\theta,X_{0:k})$.

The objective of this subsection is to show that Bayesian defense mechanisms can restrict all reasonable strategies to be asymptotically benign as long as an effective countermeasure is implemented.

As a preparation for proving our main claim, we investigate the asymptotic behavior of state transition. From Theorems 1 and 2, we can expect that the state eventually loses information on the type, which is justified by the following lemma.

Lemma 3

Consider the game $\mathcal{G}_{1}$ with detection-averse utilities. If $\pi^{\theta_{\rm m}}_{0}>0$, then

\lim_{k\to\infty}\left|p^{s}_{\theta_{\rm m}}(X_{k+1}|X_{0:k})-p^{s}_{\theta_{\rm b}}(X_{k+1}|X_{0:k})\right|=0\quad\mathbb{P}^{s}_{\theta_{\rm m}}\text{-a.s.} \qquad (25)

for any PBE $s$.

Under Assumption 1, which eliminates the possibility of stealthy attacks, Lemma 3 implies that the actions themselves must be identical. This fact yields the following theorem, one of the main results in this paper.

Theorem 3

Consider the game $\mathcal{G}_{1}$ with detection-averse utilities. Let Assumption 1 hold and assume $\pi^{\theta_{\rm m}}_{0}>0$. Then every PBE of $\mathcal{G}_{1}$ is asymptotically benign.

Theorem 3 implies that the malicious sender’s action converges to the benign action. Equivalently, an attacker necessarily behaves as a benign sender after a finite number of time steps. Therefore, the system is guaranteed to be secure in an asymptotic manner, i.e., Bayesian defense mechanisms can prevent deception in an asymptotic sense. This result indicates the powerful defense capability achieved by model knowledge.

4 Application: Analysis of Defensive Deception utilizing Asymmetric Recognition

4.1 Idea of Defensive Deception using Bluffing

The result in Section 3 claims that the defender, namely the Bayesian defense mechanism, always wins in an asymptotic manner when the stochastic model of the system is available and the vulnerability to be exploited for intrusion is known and modeled. The latter condition is quantitatively described by the condition $\pi_{0}(\theta_{\rm m})>0$. Although the derived result proves a quite powerful defense capability, it is almost impossible to be aware of all possible vulnerabilities in advance. Moreover, it is also challenging to implement effective countermeasures for all scenarios and to compute the equilibrium of the dynamic game.

In this section, as an application of the finding in the previous section, we consider a defensive deception using bluffing, which utilizes the asymmetric recognition between the attacker and the defender. Suppose that an attacker exploits a vulnerability of which the defender is unaware, but the attacker is unaware of the defender’s unawareness. Then their recognition becomes asymmetric in the sense that the attacker does not correctly recognize the defender’s recognition of the vulnerability. This situation naturally arises in practice because the defender’s recognition is private information. By utilizing the asymmetric recognition, the defender can possibly deceive the attacker into believing that the defender might be aware of the vulnerability and carrying out effective counteractions. Specifically, we consider bluffing strategies under which the system’s state does not possess information about the defender’s belief. For instance, if the defender chooses reactions that affect only the players’ utilities without influencing the system, the state is independent of the reaction. By concealing the defender’s unawareness, the defender’s recognition, which is quantified by her belief, is completely unknown to the attacker over time.

The defensive deception can possibly force the attacker to withdraw even if the defender is actually unaware of the exploited vulnerability. For instance, consider the example in Sec. 2.1 and suppose that the defender can carry out an emergency shutdown of the system. Suppose also that the attacker wants to keep administrative privileges of the PLC. In this case, the attacker may rationally terminate her evasive maneuvers after a finite number of time steps owing to the risk of a sudden shutdown. The objective of this section is to show that this hypothesis is true in a formal manner.

4.2 Reformulation using Type Structure

The situation of interest in this section is that the defender is unaware of the vulnerability to be exploited, but the attacker is not necessarily aware of this unawareness. To address the uncertainty about the defender’s recognition, the attacker forms a belief on the defender’s belief. Fig. 4 illustrates the attacker’s belief on the defender’s belief under the common prior assumption made in the previous section, i.e., the defender’s initial belief is known to the attacker. In this case, the attacker has a firm belief that the defender is unaware of the vulnerability. On the other hand, Fig. 5 illustrates the attacker’s belief without the common prior assumption. The attacker’s belief is then no longer firm, as depicted in the figure. In addition, because of the lack of the common prior assumption, the defender also forms another belief on the attacker’s belief on the defender’s belief on the existence of an attacker. This procedure repeats indefinitely and induces infinitely many beliefs.

Figure 4: The attacker’s belief on the defender’s belief with symmetric recognition, which is the case of the game $\mathcal{G}_{1}$. The attacker is aware of the fact that the defender is unaware of the vulnerability. Moreover, the defender is aware of the attacker’s awareness.
Figure 5: The attacker’s belief on the defender’s belief with asymmetric recognition. Because the defender’s true belief is unknown to the attacker, the attacker forms a belief on both cases, i.e., that the defender is aware or unaware of the vulnerability. Moreover, the defender forms a belief on the attacker’s belief. This process induces the notion of a belief hierarchy.

The notion of a belief hierarchy has been proposed to handle the infinitely many beliefs [9, 10, 46]. A belief hierarchy is formed as follows. Let $\Delta(\cdot)$ denote the set of probability measures over a set. The first-order initial belief is given as $\pi^{1}_{0}\in\Delta({\it\Theta})$, which describes the defender’s initial belief on the existence of the attacker. The second-order initial belief is given as $\pi^{2}_{0}\in\Delta(\Delta({\it\Theta}))$, which describes the attacker’s initial belief on the defender’s first-order belief. In a similar manner, the belief at any level is given, and the tuple of beliefs at all levels is referred to as a belief hierarchy.

To handle belief hierarchies, the Mertens-Zamir model has been introduced [9, 10, 46]. The model considers a type structure, in which a belief hierarchy is embedded. A type structure consists of players, sets of types, and initial beliefs. In particular, a type structure for our situation of interest can be given by

\mathcal{T}=(({\rm s},{\rm r}),({\it\Theta}^{\rm s},{\it\Theta}^{\rm r}),(\pi^{\rm s}_{0},\pi^{\rm r}_{0})) \qquad (26)

where $({\rm s},{\rm r})$ represents the sender and the receiver, ${\it\Theta}^{\rm s}$ and ${\it\Theta}^{\rm r}$ represent the sets of player types, and $\pi^{\rm s}_{0}:{\it\Theta}^{\rm r}\times{\it\Theta}^{\rm s}\to[0,1]$ and $\pi^{\rm r}_{0}:{\it\Theta}^{\rm s}\times{\it\Theta}^{\rm r}\to[0,1]$ represent the initial beliefs. The value $\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s})$ denotes the sender’s initial belief on the receiver type $\theta^{\rm r}$ when the sender type is $\theta^{\rm s}$, and $\pi^{\rm r}_{0}(\theta^{\rm s}|\theta^{\rm r})$ denotes the corresponding receiver’s initial belief. The first-order initial belief is given by $\pi^{1}_{0}(\theta^{\rm s})=\pi^{\rm r}_{0}(\theta^{\rm s}|\theta^{\rm r})$ for the true receiver type $\theta^{\rm r}\in{\it\Theta}^{\rm r}$, and the second-order initial belief is given by $\pi^{2}_{0}(\pi^{\rm r}(\cdot|\theta^{\rm r})|\theta^{\rm s})=\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s})$ for the true sender type $\theta^{\rm s}\in{\it\Theta}^{\rm s}$. By repeating this procedure, the belief at any level of the belief hierarchy can be derived from the type structure. Importantly, for any reasonable belief hierarchy there exists a type structure that can generate the belief hierarchy of interest. For a formal discussion, see [9, 10, 46].

We model the situation of interest by using the binary type sets

{\it\Theta}^{\rm s}=\{\theta^{\rm s}_{\rm b},\theta^{\rm s}_{\rm m}\},\quad {\it\Theta}^{\rm r}=\{\theta^{\rm r}_{\rm u},\theta^{\rm r}_{\rm a}\}. \qquad (27)

While $\theta^{\rm s}_{\rm b}$ and $\theta^{\rm s}_{\rm m}$ represent benign and malicious senders, respectively, $\theta^{\rm r}_{\rm u}$ and $\theta^{\rm r}_{\rm a}$ represent receivers unaware and aware of the vulnerability, respectively. The receiver’s initial beliefs are set to

\pi^{\rm r}_{0}(\theta^{\rm s}_{\rm b}|\theta^{\rm r}_{\rm u})=1,\quad \pi^{\rm r}_{0}(\theta^{\rm s}_{\rm m}|\theta^{\rm r}_{\rm u})=0 \qquad (28)

and

\pi^{\rm r}_{0}(\theta^{\rm s}_{\rm b}|\theta^{\rm r}_{\rm a})=\alpha,\quad \pi^{\rm r}_{0}(\theta^{\rm s}_{\rm m}|\theta^{\rm r}_{\rm a})=1-\alpha \qquad (29)

with $\alpha\in[0,1)$. These initial beliefs mean that the receiver $\theta^{\rm r}_{\rm u}$ is unaware of the vulnerability and firmly believes that the system is normally operated, while the receiver $\theta^{\rm r}_{\rm a}$ is aware of the vulnerability and suspects the existence of an attacker with probability $1-\alpha$. The sender’s initial beliefs are assumed to be given by

\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm u}|\theta^{\rm s}_{\rm b})=1,\quad \pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm b})=0, \qquad (30)

and

\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm u}|\theta^{\rm s}_{\rm m})=\beta,\quad \pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})=1-\beta \qquad (31)

with $\beta\in[0,1]$. The malicious sender does not know the true receiver type, i.e., whether the receiver is aware of the vulnerability or not. The given initial beliefs are summarized in Table 1.

Table 1: Initial Beliefs on the Opponent Type

                                                    $\theta^{\rm s}_{\rm b}$    $\theta^{\rm s}_{\rm m}$
  $\pi^{\rm r}_{0}(\cdot|\theta^{\rm r}_{\rm u})$   1                           0
  $\pi^{\rm r}_{0}(\cdot|\theta^{\rm r}_{\rm a})$   $\alpha$                    $1-\alpha$

                                                    $\theta^{\rm r}_{\rm u}$    $\theta^{\rm r}_{\rm a}$
  $\pi^{\rm s}_{0}(\cdot|\theta^{\rm s}_{\rm b})$   1                           0
  $\pi^{\rm s}_{0}(\cdot|\theta^{\rm s}_{\rm m})$   $\beta$                     $1-\beta$
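For concreteness, the initial beliefs (28)-(31) summarized in Table 1 can be encoded directly; the Python dictionary layout below is our own illustrative choice, not the paper's notation.

```python
# Encoding of the initial beliefs (28)-(31) of the type structure (26).
alpha, beta = 0.9, 0.5   # free parameters, alpha in [0,1), beta in [0,1]

pi0_r = {  # receiver's belief on the sender type, Table 1 (top)
    "theta_r_u": {"theta_s_b": 1.0,   "theta_s_m": 0.0},
    "theta_r_a": {"theta_s_b": alpha, "theta_s_m": 1.0 - alpha},
}
pi0_s = {  # sender's belief on the receiver type, Table 1 (bottom)
    "theta_s_b": {"theta_r_u": 1.0,  "theta_r_a": 0.0},
    "theta_s_m": {"theta_r_u": beta, "theta_r_a": 1.0 - beta},
}

# Each conditional belief is a probability distribution over opponent types.
assert all(abs(sum(row.values()) - 1.0) < 1e-12
           for table in (pi0_r, pi0_s) for row in table.values())
```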

In accordance with the introduction of the type structure, the definitions of strategies and the solution concept need to be slightly modified. The contrasting ingredients of the game with symmetric recognition and the one with asymmetric recognition are listed in Table 2, where those with asymmetric recognition can be defined analogously. The conditional probability $\mathbb{P}^{s}(\cdot|\Theta^{\rm s}=\theta^{\rm s},\Theta^{\rm r}=\theta^{\rm r})$, which is the probability measure induced by $s^{\rm s}(\theta^{\rm s},\cdot)$ and $s^{\rm r}(\theta^{\rm r},\cdot)$, is denoted by $\mathbb{P}^{s}_{\theta^{\rm s},\theta^{\rm r}}$. The sender’s expected average utility at the $k$th step with horizon length $T$ is given by

\bar{U}^{\rm s}_{k,T}(s_{k:k+T}|\theta^{\rm s},x_{0:k}) := \dfrac{1}{T+1}\,\mathbb{E}^{s}\left[\left.\sum_{\tau=k}^{k+T}U^{\rm s}(\Theta^{\rm s},X_{\tau},s^{\rm s}_{\tau}(\Theta^{\rm s},X_{0:\tau}),s^{\rm r}_{\tau}(\Theta^{\rm r},X_{0:\tau}))\,\right|\,\theta^{\rm s},x_{0:k}\right]. \qquad (32)

The receiver’s expected average utility at the $k$th step with horizon length $T$ is given by

\bar{U}^{\rm r}_{k,T}(s_{k:k+T}|\theta^{\rm r},x_{0:k}) := \dfrac{1}{T+1}\,\mathbb{E}^{s}\left[\left.\sum_{\tau=k}^{k+T}U^{\rm r}(\Theta^{\rm s},X_{\tau},s^{\rm s}_{\tau}(\Theta^{\rm s},X_{0:\tau}),s^{\rm r}_{\tau}(\Theta^{\rm r},X_{0:\tau}))\,\right|\,\theta^{\rm r},x_{0:k}\right]. \qquad (33)

A strategy profile $s$ is said to be a PBE when the limits of the utilities $(\bar{U}^{\rm s}_{k},\bar{U}^{\rm r}_{k})$ satisfy

\left\{\begin{array}{l}
s^{\rm s}_{k:\infty}\in{\rm BR}^{\rm s}_{k}(s^{\rm r}_{k:\infty}|\theta^{\rm s},x_{0:k}),\quad\forall\theta^{\rm s}\in{\it\Theta}^{\rm s},\\
s^{\rm r}_{k:\infty}\in{\rm BR}^{\rm r}_{k}(s^{\rm s}_{k:\infty}|\theta^{\rm r},x_{0:k}),\quad\forall\theta^{\rm r}\in{\it\Theta}^{\rm r},
\end{array}\right. \qquad (34)

for any $k\in\mathbb{Z}_{+}$ and $x_{0:k}\in\mathcal{X}^{k+1}$, where

\begin{array}{l}
{\rm BR}^{\rm s}_{k}(s^{\rm r}_{k:\infty}|\theta^{\rm s},x_{0:k}) := \operatorname*{arg\,max}_{\tilde{s}^{\rm s}_{k:\infty}\in\mathcal{S}^{\rm s}_{k:\infty}} \bar{U}^{\rm s}_{k}((\tilde{s}^{\rm s}_{k:\infty},s^{\rm r}_{k:\infty})|\theta^{\rm s},x_{0:k}),\\
{\rm BR}^{\rm r}_{k}(s^{\rm s}_{k:\infty}|\theta^{\rm r},x_{0:k}) := \operatorname*{arg\,max}_{\tilde{s}^{\rm r}_{k:\infty}\in\mathcal{S}^{\rm r}_{k:\infty}} \bar{U}^{\rm r}_{k}((s^{\rm s}_{k:\infty},\tilde{s}^{\rm r}_{k:\infty})|\theta^{\rm r},x_{0:k})).
\end{array} \qquad (35)
Table 2: Contrasting ingredients of the games with symmetric and asymmetric recognition

                         symmetric recognition          asymmetric recognition
  receiver’s strategy    $s^{\rm r}_{k}(x_{0:k})$       $s^{\rm r}_{k}(\theta^{\rm r},x_{0:k})$
  sender’s belief        N/A                            $\pi^{\rm s}(\theta^{\rm r}|\theta^{\rm s})$
  receiver’s belief      $\pi^{\rm r}(\theta)$          $\pi^{\rm r}(\theta^{\rm s}|\theta^{\rm r})$
  sender’s utility       $\bar{U}^{\rm s}(s,\theta)$    $\bar{U}^{\rm s}(s,\theta^{\rm s})$
  receiver’s utility     $\bar{U}^{\rm r}(s)$           $\bar{U}^{\rm r}(s,\theta^{\rm r})$

We define the game formulated above by

\mathcal{G}_{2}:=(\mathcal{M},\mathcal{S},U,({\it\Theta}^{\rm s},{\it\Theta}^{\rm r}),(\pi^{\rm s}_{0},\pi^{\rm r}_{0})), \qquad (36)

where the defender’s initial belief is not common information, in contrast to $\mathcal{G}_{1}$.

In the following discussion, we analyze $\mathcal{G}_{2}$ through $\mathcal{G}_{1}$. To clarify their relationship, we describe the game $\mathcal{G}_{1}$ using the modified formulation. Define another game

\hat{\mathcal{G}}_{2}:=(\mathcal{M},\mathcal{S},U,({\it\Theta}^{\rm s},{\it\Theta}^{\rm r}),(\hat{\pi}^{\rm s}_{0},\pi^{\rm r}_{0})), \qquad (37)

where

\hat{\pi}^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})=1. \qquad (38)

This initial belief means that the adversary believes that the defender is aware of the vulnerability. The situation of $\hat{\mathcal{G}}_{2}$ is the same as that of $\mathcal{G}_{1}$ when the defender is aware of the vulnerability. Thus, these games lead to the same consequence when the true types are $\theta^{\rm s}_{\rm m}$ and $\theta^{\rm r}_{\rm a}$. The following lemma holds.

Lemma 4

Consider the games $\mathcal{G}_{1}$ and $\hat{\mathcal{G}}_{2}$. For a strategy profile $\hat{s}_{2}=(\hat{s}^{\rm s}_{2},\hat{s}^{\rm r}_{2})$ in $\hat{\mathcal{G}}_{2}$, let $s_{1}=(s^{\rm s}_{1},s^{\rm r}_{1})$ be a strategy profile in $\mathcal{G}_{1}$ such that

s^{\rm s}_{1}:=\hat{s}^{\rm s}_{2},\quad s^{\rm r}_{1}:=\hat{s}^{\rm r}_{2}|_{\theta^{\rm r}=\theta^{\rm r}_{\rm a}} \qquad (39)

where $\hat{s}^{\rm r}_{2}|_{\theta^{\rm r}=\theta^{\rm r}_{\rm a}}$ is the restriction of $\hat{s}^{\rm r}_{2}$ to $\theta^{\rm r}=\theta^{\rm r}_{\rm a}$. Then the probability measures induced by $s_{1}$ and $\hat{s}_{2}$ are equal when $\theta^{\rm s}=\theta^{\rm s}_{\rm m}$ and $\theta^{\rm r}=\theta^{\rm r}_{\rm a}$, i.e.,

\mathbb{P}^{s_{1}}_{\theta^{\rm s}_{\rm m}}=\mathbb{P}^{\hat{s}_{2}}_{\theta^{\rm s}_{\rm m},\theta^{\rm r}_{\rm a}}. \qquad (40)

Also, if $\hat{s}^{\rm s}_{2,k:\infty}\in{\rm BR}^{\rm s}_{k}(\hat{s}^{\rm r}_{2,k:\infty}|\theta^{\rm s}_{\rm m},x_{0:k})$, then $s^{\rm s}_{1,k:\infty}\in{\rm BR}^{\rm s}_{k}(s^{\rm r}_{1,k:\infty}|\theta_{\rm m},x_{0:k})$.

We extend the notions of detection-averse utilities and asymptotically benign strategies to $\mathcal{G}_{2}$. Our objective is to investigate the effectiveness of the proposed defensive deception. It is possible to define detection-averse utilities directly through the game $\mathcal{G}_{2}$, as utilities under which the resulting equilibrium leads the adversary to avoid being detected. However, this definition would immediately mean that the defensive deception works well, and no result derived from it could show its effectiveness. Instead, we say that utilities in $\mathcal{G}_{2}$ are detection-averse when the adversary avoids being detected if she is certain that the defender is aware of the vulnerability.

Definition 3

(Detection-averse Utilities in $\mathcal{G}_{2}$) A pair of utilities $(U^{\rm s},U^{\rm r})$ in the game $\mathcal{G}_{2}$ is detection-averse when

\lim_{k\to\infty}\pi^{\rm r}_{k}(\theta^{\rm s}_{\rm m}|\theta^{\rm r}_{\rm a})<1\quad\mathbb{P}^{s}_{\theta^{\rm s}_{\rm m},\theta^{\rm r}_{\rm a}}\text{-a.s.} \qquad (41)

for any PBE $s$ of $\hat{\mathcal{G}}_{2}$.

Note that Definition 3 is a necessary requirement to make the game interesting, because the adversary is not afraid of being detected at all without this condition.

Next, we define the desirable strategies that should be achieved by Bayesian defense mechanisms. We say that a strategy in $\mathcal{G}_{2}$ is asymptotically benign when it becomes benign regardless of the defender’s awareness.

Definition 4

(Asymptotically Benign Strategies in $\mathcal{G}_{2}$) A strategy profile $s$ in the game $\mathcal{G}_{2}$ is asymptotically benign when

\lim_{k\to\infty}\delta\left(A^{\theta_{\rm m}}_{k},A^{\theta_{\rm b}}_{k}\right)=1\quad\mathbb{P}^{s}_{\theta^{\rm s}_{\rm m},\theta^{\rm r}}\text{-a.s.} \qquad (42)

for any $\theta^{\rm r}\in{\it\Theta}^{\rm r}$.

Note that Definition 4 requires the strategy to be asymptotically benign for any $\theta^{\rm r}\in{\it\Theta}^{\rm r}$. In other words, the strategy needs to be asymptotically benign even if the defender is unaware of the vulnerability.

4.3 Passively Bluffing Strategies

We expect that there exists a chance of preventing attacks that exploit unnoticed vulnerabilities if the state does not possess information about the defender’s recognition. To formally verify this expectation, we define passively bluffing strategies.

Definition 5

(Passively Bluffing Strategies) A strategy profile ss in 𝒢2\mathcal{G}_{2} is a passively bluffing strategy profile when the sender’s belief satisfies

πks(θr|X0:k,θs)=π0s(θr|θs)θs,θrsa.s.\pi^{\rm s}_{k}(\theta^{\rm r}|X_{0:k},\theta^{\rm s})=\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s})\quad\mathbb{P}^{s}_{\theta^{\rm s},\theta^{\rm r}}{\rm-a.s.} (43)

for any θsΘs,θrΘr,\theta^{\rm s}\in{\it\Theta}^{\rm s},\theta^{\rm r}\in{\it\Theta}^{\rm r}, and k+k\in{\mathbb{Z}_{+}}. A strategy profile set 𝒮\mathcal{S} in 𝒢2\mathcal{G}_{2} is a passively bluffing strategy set when all of its elements are passively bluffing.

Definition 5 requires the sender’s belief to be invariant over time. If the strategy is passively bluffing, the adversary cannot identify, even in an asymptotic sense, whether the defender is aware of the exploited vulnerability. Note that passively bluffing strategies can be regarded as a commitment. It is well known that restricting the feasible strategies, referred to as commitment, can be beneficial in a game [47, 48]. In what follows, we investigate the effectiveness of this specific commitment.

Passively bluffing strategies can relax the condition for asymptotically benign strategies. The following lemma holds.

Lemma 5

Consider the game 𝒢2\mathcal{G}_{2}. If a passively bluffing strategy profile ss satisfies

limkδ(Akθm,Akθb)=1θms,θarsa.s.\lim_{k\to\infty}\delta\left(A^{\theta_{\rm m}}_{k},A^{\theta_{\rm b}}_{k}\right)=1\quad\mathbb{P}^{s}_{\theta^{\rm s}_{\rm m},\theta^{\rm r}_{\rm a}}{\rm-a.s.} (44)

then ss is asymptotically benign.

The difference between (42) and (44) is the required receiver type. Lemma 5 implies that if a passively bluffing strategy profile is asymptotically benign when the receiver is aware of the vulnerability, then it is asymptotically benign even when the receiver is unaware of the vulnerability.

Remark: Although Definition 5 depends not only on the receiver’s strategy but also on the sender’s strategy for generality, in practice the bluffing should be realized by the defender alone. A simple approach for the defender to achieve the bluffing is to choose reactions that do not influence the system’s behavior. Let pb\mathcal{R}_{\rm pb}\subset\mathcal{R} be the set of reactions under which the system’s dynamics are independent of the reaction, i.e., the transition probability satisfies

P(x|x,a,r)=P(x|x,a,r)P(x^{\prime}|x,a,r)=P(x^{\prime}|x,a,r^{\prime}) (45)

for any x𝒳,x𝒳x^{\prime}\in\mathcal{X},x\in\mathcal{X}, a𝒜a\in\mathcal{A}, rpbr\in\mathcal{R}_{\rm pb}, rpbr^{\prime}\in\mathcal{R}_{\rm pb}. If the receiver’s strategy takes only reactions in pb,\mathcal{R}_{\rm pb}, every strategy profile is passively bluffing. Indeed, because the transition probability is independent of rpbr\in\mathcal{R}_{\rm pb}, the probability distribution of the state is independent of θr\theta^{\rm r}. Thus, from Bayes’ rule, we have

πks(θr|x0:k,θs)=pθs,θrs(x0:k)π0s(θr|θs)ϕrΘrpθs,ϕrs(x0:k)π0s(ϕr|θs)=pθss(x0:k)π0s(θr|θs)ϕrΘrpθss(x0:k)π0s(ϕr|θs)=π0s(θr|θs)ϕrΘrπ0s(ϕr|θs)=π0s(θr|θs)\begin{array}[]{cl}\pi^{\rm s}_{k}(\theta^{\rm r}|x_{0:k},\theta^{\rm s})&=\dfrac{p^{s}_{\theta^{\rm s},\theta^{\rm r}}(x_{0:k})\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s})}{\sum_{\phi^{\rm r}\in{\it\Theta}^{\rm r}}p^{s}_{\theta^{\rm s},\phi^{\rm r}}(x_{0:k})\pi^{\rm s}_{0}(\phi^{\rm r}|\theta^{\rm s})}\\ &=\dfrac{p^{s}_{\theta^{\rm s}}(x_{0:k})\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s})}{\sum_{\phi^{\rm r}\in{\it\Theta}^{\rm r}}p^{s}_{\theta^{\rm s}}(x_{0:k})\pi^{\rm s}_{0}(\phi^{\rm r}|\theta^{\rm s})}\\ &=\dfrac{\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s})}{\sum_{\phi^{\rm r}\in{\it\Theta}^{\rm r}}\pi^{\rm s}_{0}(\phi^{\rm r}|\theta^{\rm s})}\\ &=\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s})\end{array} (46)

when pθs,θrs(x0:k)0p^{s}_{\theta^{\rm s},\theta^{\rm r}}(x_{0:k})\neq 0. An example of such reactions is analyzing the network log and raising an alarm in the operation room without applying any control to the system itself. Note that the reaction still affects the players’ decision making through their utility functions, even if (45) holds.
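As an illustration of the belief invariance in (46), the following Python sketch (ours, not part of the formal development; all numerical values are illustrative assumptions) simulates a Markov chain whose transition kernel ignores the reaction and checks that the sender’s posterior over the receiver types never moves from its prior.

```python
import random

# Illustrative sketch (not from the paper): when the transition kernel
# ignores the reaction, cf. (45), the likelihood of any observed path is
# identical for all receiver types, so Bayes' rule leaves the sender's
# belief at its prior, cf. (46). All numbers below are assumed for the demo.

def p_abn(x, a):
    # P(x' = x_a | x, a, r): independent of the reaction r by construction.
    return {"xn": 0.2, "xa": 0.5}[x] + (0.1 if a == "am" else 0.0)

def step(x, a):
    return "xa" if random.random() < p_abn(x, a) else "xn"

def lik(x_next, x, a):
    return p_abn(x, a) if x_next == "xa" else 1.0 - p_abn(x, a)

belief = {"aware": 0.8, "unaware": 0.2}  # sender's prior over receiver types
x = "xn"
for _ in range(100):
    a = random.choice(["ab", "am"])
    x_next = step(x, a)
    # Bayes update: the likelihood does not depend on the receiver type,
    # so it cancels between numerator and denominator.
    post = {t: lik(x_next, x, a) * b for t, b in belief.items()}
    z = sum(post.values())
    belief = {t: v / z for t, v in post.items()}
    x = x_next

print(belief)  # remains {'aware': 0.8, 'unaware': 0.2} (up to float error)
```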

4.4 Analysis

Our expectation can be described in a quantitative form based on the definition of passively bluffing strategies, which leads to a simple representation of the sender’s utility. If ss is passively bluffing, the sender’s belief is invariant over time. Hence, the sender’s utility with infinite horizon is given by

U¯ks(sk:|θms,x0:k)=θrΘrU¯k,θrs(sk:|θms,x0:k)π0s(θr|θms)\begin{array}[]{l}\displaystyle{\bar{U}^{\rm s}_{k}(s_{k:\infty}|\theta^{\rm s}_{\rm m},x_{0:k})=\sum_{\theta^{\rm r}\in{\it\Theta}^{\rm r}}\bar{U}^{\rm s}_{k,\theta^{\rm r}}(s_{k:\infty}|\theta^{\rm s}_{\rm m},x_{0:k})\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s}_{\rm m})}\end{array} (47)

where

U¯k,θrs(sk:|θms,x0:k):=limT1T+1×𝔼θrs[τ=kk+TUs(θms,Xτ,sτs(θms,X0:τ),sτr(θr,X0:τ))|x0:k].\begin{array}[]{l}\bar{U}^{\rm s}_{k,\theta^{\rm r}}(s_{k:\infty}|\theta^{\rm s}_{\rm m},x_{0:k}):=\lim_{T\to\infty}\dfrac{1}{T+1}\\ \displaystyle{\times\mathbb{E}^{s}_{\theta^{\rm r}}\left[\left.\sum_{\tau=k}^{k+T}U^{\rm s}(\theta^{\rm s}_{\rm m},X_{\tau},s^{\rm s}_{\tau}(\theta^{\rm s}_{\rm m},X_{0:\tau}),s^{\rm r}_{\tau}(\theta^{\rm r},X_{0:\tau}))\right|x_{0:k}\right].}\end{array} (48)

Note that U¯k,θars\bar{U}^{\rm s}_{k,\theta^{\rm r}_{\rm a}} and U¯k,θurs\bar{U}^{\rm s}_{k,\theta^{\rm r}_{\rm u}} denote the sender’s utilities in the two cases where the defender is aware and unaware of the vulnerability, respectively. Thus (47) implies that, when the strategy is passively bluffing, the sender’s utility is simply a sum weighted by her initial beliefs. Therefore, we can expect the sender to possibly stop the execution in the middle of the attack if π0s(θar|θms)\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m}) is sufficiently large. We show the existence of such a sender’s initial belief. Note that π0s(θar|θms)=1\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})=1 is the trivial case, and thus we consider sender’s initial beliefs that are strictly less than one.
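As a concrete illustration of the threshold structure behind (47), consider the following sketch with assumed average utilities: attacking yields a negative average utility when the defender is aware (she eventually counters) and a positive one when she is unaware, while benign behavior yields zero. The numbers are ours, not the paper’s.

```python
# Assumed average utilities (illustrative): attacking is harmful to the
# sender when the defender is aware and profitable when she is unaware.
u_aware, u_unaware, u_benign = -3.0, 2.0, 0.0

def expected_attack_utility(pi0_aware):
    # Weighted sum in the spirit of (47): under a passively bluffing
    # strategy the weights stay at the sender's initial belief.
    return pi0_aware * u_aware + (1.0 - pi0_aware) * u_unaware

# The rational sender prefers benign behavior once the weighted utility
# drops below u_benign, i.e., for pi0_aware > u_unaware / (u_unaware - u_aware).
print(u_unaware / (u_unaware - u_aware))  # 0.4 for these numbers
for pi0 in (0.2, 0.4, 0.6, 0.8):
    print(pi0, expected_attack_utility(pi0) > u_benign)
```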

First, we rephrase the result in Sec. 3. Let 𝒮nab\mathcal{S}_{\rm nab} denote the set of non-asymptotically-benign strategies in 𝒢2\mathcal{G}_{2}. Our aim here is to show that the set of PBE of 𝒢2\mathcal{G}_{2} does not overlap with 𝒮nab\mathcal{S}_{\rm nab} when the attacker strongly believes that the defender is aware of the vulnerability. It suffices to show that there is no overlap between the set of PBE of 𝒢2\mathcal{G}_{2} and 𝒮nab:=𝒮nab𝒮\mathcal{S}_{\rm nab}^{\ast}:=\mathcal{S}_{\rm nab}\cap\mathcal{S}^{\ast} where

𝒮:={(ss,sr)𝒮:sk:sBRks(sk:r|θbs,x0:k),sk:rBRkr(sk:s|θr,x0:k),θrΘr,k+,x0:k𝒳k+1},\begin{array}[]{l}\mathcal{S}^{\ast}:=\{(s^{\rm s},s^{\rm r})\in\mathcal{S}:s^{\rm s}_{k:\infty}\in{\rm BR}^{\rm s}_{k}(s^{\rm r}_{k:\infty}|\theta^{\rm s}_{\rm b},x_{0:k}),\\ s^{\rm r}_{k:\infty}\in{\rm BR}^{\rm r}_{k}(s^{\rm s}_{k:\infty}|\theta^{\rm r},x_{0:k}),\forall\theta^{\rm r}\in{\it\Theta}^{\rm r},k\in{\mathbb{Z}_{+}},x_{0:k}\in\mathcal{X}^{k+1}\},\end{array} (49)

i.e., the set of strategy profiles in which the benign sender and the receiver of any type take best responses. Note that 𝒮nab\mathcal{S}_{\rm nab} and 𝒮\mathcal{S}^{\ast} of the games 𝒢2\mathcal{G}_{2} and 𝒢^2\hat{\mathcal{G}}_{2} are identical because both sets are independent of the malicious sender’s belief. The following lemma is another description of the claim of Theorem 3 in terms of U¯θars\bar{U}^{\rm s}_{\theta^{\rm r}_{\rm a}} and 𝒮nab\mathcal{S}_{\rm nab}^{\ast}.

Lemma 6

Consider the game 𝒢^2\hat{\mathcal{G}}_{2} with detection-averse utilities. Let Assumption 1 hold. For any strategy profile s=(ss,sr)s=(s^{\rm s},s^{\rm r}) in 𝒮nab\mathcal{S}_{\rm nab}^{\ast}, there exists s~s𝒮s\tilde{s}^{\rm s}\in\mathcal{S}^{\rm s} such that

Dθar(s,s~s)>0D_{\theta^{\rm r}_{\rm a}}(s,\tilde{s}^{\rm s})>0 (50)

holds where

Dθr(s,s~s):=U¯θrs((s~s,sr),θms)U¯θrs(s,θms).D_{\theta^{\rm r}}(s,\tilde{s}^{\rm s}):=\bar{U}^{\rm s}_{\theta^{\rm r}}((\tilde{s}^{\rm s},s^{\rm r}),\theta^{\rm s}_{\rm m})-\bar{U}^{\rm s}_{\theta^{\rm r}}(s,\theta^{\rm s}_{\rm m}). (51)

Lemma 6 implies the existence of a function

g:𝒮nab𝒮ss.t.Dθar(s,g(s))>0g:\mathcal{S}_{\rm nab}^{\ast}\to\mathcal{S}^{\rm s}\quad{\rm s.t.}\quad D_{\theta^{\rm r}_{\rm a}}(s,g(s))>0 (52)

for any s𝒮nab.s\in\mathcal{S}^{\ast}_{\rm nab}. Thus we have γ0\gamma\geq 0 where

γ:=infs𝒮nabDθar(s,g(s)).\gamma:=\inf_{s\in\mathcal{S}_{\rm nab}^{\ast}}D_{\theta^{\rm r}_{\rm a}}(s,g(s)). (53)

We now assume that Dθar(s,g(s))D_{\theta^{\rm r}_{\rm a}}(s,g(s)) is uniformly lower bounded by a positive value.

Assumption 2

For the game 𝒢^2\hat{\mathcal{G}}_{2}, there exists gg in (52) such that the infimum (53) is positive, i.e., γ>0\gamma>0.

Assumption 2 eliminates the case where the difference between the sender’s utilities achievable by asymptotically benign strategies and non-asymptotically-benign strategies is infinitesimally small.

The following theorem, the main result of this section, holds.

Theorem 4

Consider the game 𝒢2\mathcal{G}_{2} with detection-averse utilities and a passively bluffing strategy set. Let Assumptions 1 and 2 hold. Then, there exists a sender’s initial belief π0s(θar|θms)<1\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})<1 such that every PBE of 𝒢2\mathcal{G}_{2} is asymptotically benign.

Theorem 4 implies that the system can possibly be protected by passively bluffing strategies if the attacker strongly believes that the defender is aware of the vulnerability. The result suggests the importance of concealing the defender’s recognition and the effectiveness of defensive deception.

5 Simulation

In this section, we confirm the theoretical results through numerical simulation.

5.1 Fundamental Setup

We assume the state space and the action space to be binary, i.e., 𝒳={xn,xa}\mathcal{X}=\{x_{\rm n},x_{\rm a}\} and 𝒜={ab,am}.\mathcal{A}=\{a_{\rm b},a_{\rm m}\}. The states xnx_{\rm n} and xax_{\rm a} represent the normal and abnormal states, respectively, and aba_{\rm b} and ama_{\rm m} represent benign and malicious actions, respectively. The benign and malicious actions correspond to nominal and malicious control signals, respectively. The reaction set is given by ={rb,rm,rmb}\mathcal{R}=\{r_{\rm b},r_{\rm m},r^{\rm b}_{\rm m}\}. The state transition diagram is depicted in Fig. 6. The initial state is set to xnx_{\rm n}. The transition probabilities are given as follows. The transition probability from xnx_{\rm n} is

P(xa|xn,a,r)={0.2ifa=ab,0.3ifa=amP(x_{\rm a}|x_{\rm n},a,r)=\left\{\begin{array}[]{cl}0.2&{\rm if}\ a=a_{\rm b},\\ 0.3&{\rm if}\ a=a_{\rm m}\end{array}\right. (54)

for any rr\in\mathcal{R}, which means that the probability from the normal state to the abnormal state is increased by the malicious action and it is independent of the reaction. The transition probability from xax_{\rm a} to xax_{\rm a} is given by Table 3. The probability from the abnormal state to the abnormal state is increased by the malicious action and it is decreased by the reaction rmr_{\rm m}. The reaction rmbr^{\rm b}_{\rm m} corresponds to bluffing since it induces the same transition probability as rbr_{\rm b}.

Table 3: Transition probabilities from the abnormal state to the abnormal state.
P(xa|xa,a,r)P(x_{\rm a}|x_{\rm a},a,r) rbr_{\rm b} rmr_{\rm m} rmbr^{\rm b}_{\rm m}
aba_{\rm b} 0.5 0.3 0.5
ama_{\rm m} 0.6 0.4 0.6
Figure 6: State transition diagram of the numerical example.
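For reproducibility, the transition kernel of this example, (54) together with Table 3, can be encoded compactly. The following Python sketch is ours and is reused by the later snippets in this section.

```python
# Transition kernel of the example: probability of reaching (or staying in)
# the abnormal state x_a, from (54) and Table 3.
P_TO_ABNORMAL = {
    # From x_n: independent of the reaction, cf. (54).
    ("xn", "ab"): 0.2,
    ("xn", "am"): 0.3,
    # From x_a: Table 3; the bluffing reaction r_m^b mimics r_b.
    ("xa", "ab", "rb"): 0.5, ("xa", "ab", "rm"): 0.3, ("xa", "ab", "rmb"): 0.5,
    ("xa", "am", "rb"): 0.6, ("xa", "am", "rm"): 0.4, ("xa", "am", "rmb"): 0.6,
}

def p_abnormal(x, a, r):
    """Return P(x' = x_a | x, a, r)."""
    return P_TO_ABNORMAL[(x, a)] if x == "xn" else P_TO_ABNORMAL[(x, a, r)]
```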

The utilities are given as follows. The benign sender’s utility is

Us(θb,x,a,r)={1ifx=xn,0otherwiseU^{\rm s}(\theta_{\rm b},x,a,r)=\left\{\begin{array}[]{cl}1&{\rm if}\ x=x_{n},\\ 0&{\rm otherwise}\end{array}\right. (55)

for any a𝒜a\in\mathcal{A} and rr\in\mathcal{R}, which means that the benign sender prefers the normal state regardless of the other variables. The malicious sender’s utility is given by Table 4. The benign action aba_{\rm b} is a risk-free action, which always induces zero utility, while the malicious action ama_{\rm m} is a risky action. If the reaction is rb,r_{\rm b}, the malicious sender obtains a positive utility, where the abnormal state xax_{\rm a} is preferred to xnx_{\rm n}. On the other hand, if the reaction is rm,r_{\rm m}, the malicious sender incurs a loss. The receiver’s utility is set to be independent of a𝒜a\in\mathcal{A} and is given by Table 5, where aa is omitted. The receiver obtains utility only when she takes the appropriate reaction for the sender type. When the appropriate reaction is chosen, the normal state is preferred to the abnormal state. Note that rmbr^{\rm b}_{\rm m} induces the same utilities as rmr_{\rm m} but increases the probability of the abnormal state. Therefore, there is no motivation to choose rmbr^{\rm b}_{\rm m} when the defender’s recognition is known to the attacker.

Table 4: Malicious Sender’s Utility
Us(θm,x,ab,r)U^{\rm s}(\theta_{\rm m},x,a_{\rm b},r) rbr_{\rm b} rmr_{\rm m} rmbr^{\rm b}_{\rm m}
xnx_{\rm n} 0 0 0
xax_{\rm a} 0 0 0
Us(θm,x,am,r)U^{\rm s}(\theta_{\rm m},x,a_{\rm m},r) rbr_{\rm b} rmr_{\rm m} rmbr^{\rm b}_{\rm m}
xnx_{\rm n} 1 -3 -3
xax_{\rm a} 2 -3 -3
Table 5: Receiver’s Utility
Ur(θb,x,r)U^{\rm r}(\theta_{\rm b},x,r) rbr_{\rm b} rmr_{\rm m} rmbr^{\rm b}_{\rm m}
xnx_{\rm n} 5 0 0
xax_{\rm a} 1 0 0
Ur(θm,x,r)U^{\rm r}(\theta_{\rm m},x,r) rbr_{\rm b} rmr_{\rm m} rmbr^{\rm b}_{\rm m}
xnx_{\rm n} 0 5 5
xax_{\rm a} 0 1 1
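The payoff tables translate directly into code. The following sketch (ours) encodes (55), Table 4, and Table 5, and is reused by the equilibrium snippet below.

```python
# Payoffs from (55), Table 4, and Table 5.
def U_sender(theta, x, a, r):
    if theta == "benign":
        return 1.0 if x == "xn" else 0.0      # (55): prefers the normal state
    if a == "ab":                             # Table 4: risk-free action
        return 0.0
    if r == "rb":                             # undetected attack pays off
        return 1.0 if x == "xn" else 2.0
    return -3.0                               # countered by r_m or r_m^b

def U_receiver(theta, x, r):
    # Table 5: reward only for the reaction appropriate to the sender type.
    appropriate = (theta == "benign" and r == "rb") or \
                  (theta == "malicious" and r in ("rm", "rmb"))
    if not appropriate:
        return 0.0
    return 5.0 if x == "xn" else 1.0
```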

Since it is difficult to compute an exact equilibrium for the infinite-horizon problem, we treat a sequence of equilibria of finite-horizon problems as a tractable approximation [49]. Letting (sks,skr,,sk+T1s,sk+T1r)(s^{\rm s}_{k},s^{\rm r}_{k},\ldots,s^{\rm s}_{k+T-1},s^{\rm r}_{k+T-1}) be the resulting equilibrium of the finite-horizon game, we use skss^{\rm s}_{k} and skrs^{\rm r}_{k} as the kkth strategies, as in receding horizon control. The equilibrium is obtained through brute-force search. For the game 𝒢2\mathcal{G}_{2}, the strategies in the simulation are given in a similar manner. The horizon length is set to T=2T=2. In the numerical examples, the equilibrium is uniquely determined.
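To convey the flavor of the brute-force computation, the following sketch (ours) finds a pure equilibrium of the one-shot stage game induced by the receiver’s current belief, using U_sender and U_receiver from the sketch above. It is a deliberate simplification of the paper’s T=2 receding-horizon search: extending it to two-step action/reaction sequences with Bayes-updated beliefs between the steps is mechanical but longer.

```python
from itertools import product

ACTIONS, REACTIONS = ("ab", "am"), ("rb", "rm", "rmb")

def stage_equilibrium(x, pi_m):
    """Brute-force a pure equilibrium of the stage game at state x, given
    the receiver's belief pi_m on the malicious type. The benign sender's
    action affects no stage payoff here, so only a_m is enumerated."""
    found = None
    for a_m, r in product(ACTIONS, REACTIONS):
        # Sender best-response check for the candidate profile (a_m, r).
        s_ok = all(U_sender("malicious", x, a_m, r) >=
                   U_sender("malicious", x, a2, r) for a2 in ACTIONS)
        # Receiver's expected utility given her belief (utilities in
        # Table 5 are independent of the sender's action).
        def r_val(rr):
            return ((1 - pi_m) * U_receiver("benign", x, rr)
                    + pi_m * U_receiver("malicious", x, rr))
        r_ok = all(r_val(r) >= r_val(r2) for r2 in REACTIONS)
        if s_ok and r_ok:
            found = (a_m, r)
    return found

print(stage_equilibrium("xn", 0.01))  # ('am', 'rb'): attack under low belief
print(stage_equilibrium("xn", 0.90))  # benign action with a countermeasure
```

Note that in this myopic version rmr_{\rm m} and rmbr^{\rm b}_{\rm m} tie for the receiver, whereas in the dynamic T=2T=2 game rmr_{\rm m} is strictly better because rmbr^{\rm b}_{\rm m} raises the probability of the abnormal state; this is precisely why choosing rmbr^{\rm b}_{\rm m} requires a commitment.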

5.2 Simulation: Asymptotic Security

In the first scenario, we consider the case where the vulnerability is known, and thus this situation corresponds to the game 𝒢1\mathcal{G}_{1} in (15). The initial belief is given by π0r(θm)=0.01\pi^{\rm r}_{0}(\theta_{\rm m})=0.01, which is known to the sender. The true sender type is given by θs=θms\theta^{\rm s}=\theta^{\rm s}_{\rm m}.

Under this setting, sample paths of the belief on the malicious sender, the state, the action, and the reaction with θ=θm\theta=\theta_{\rm m} are depicted in Fig. 7. The belief converges to a nonzero value over time as claimed by Theorems 1 and 2. The action converges to the benign action as claimed by Theorem 3. The graphs evidence the asymptotic security achieved by the Bayesian defense mechanism. In more detail, it can be observed that the malicious sender takes the malicious action ama_{\rm m} while the receiver takes the reaction rbr_{\rm b} until about the time step k=50k=50. This is because the receiver’s belief on the malicious sender is low during the beginning of the game. Between the time steps k=50k=50 and k=100k=100, aba_{\rm b} and rmr_{\rm m} sporadically appear because the belief has increased. Finally, after the time step k=100k=100, the belief exceeds a threshold, which results in the fixed actions a=aba=a_{\rm b} and r=rmr=r_{\rm m}. It is notable that rmbr^{\rm b}_{\rm m} is not chosen at all since, as explained above, there is no reason to choose it.
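The belief dynamics observed in Fig. 7 follow the recursive Bayes update (cf. (77)). A sketch of one update step along a sample path, using p_abnormal from the earlier snippet (our code; the equilibrium actions of the two sender types are supplied externally):

```python
def update_belief(pi_m, x, x_next, a_mal, a_ben, r):
    """One Bayes step for the receiver's belief on the malicious type:
    compare the likelihood of the observed transition under each type."""
    def lik(a):
        p = p_abnormal(x, a, r)
        return p if x_next == "xa" else 1.0 - p
    num = lik(a_mal) * pi_m
    den = num + lik(a_ben) * (1.0 - pi_m)
    return num / den if den > 0 else pi_m

pi = 0.01                                   # initial belief, as in Sec. 5.2
pi = update_belief(pi, "xn", "xa", "am", "ab", "rb")
print(pi)  # ~0.0149: an abnormal transition nudges the belief upward
```

Iterating this update reproduces the qualitative behavior of Fig. 7: the belief drifts while the prescribed actions of the two sender types differ and freezes once they coincide, consistent with Theorems 1-3.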

Figure 7: Sample paths of the belief on the malicious sender, state, action, and reactions with θ=θm\theta=\theta_{\rm m}. The belief converges to a nonzero value over time as claimed by Theorems 1 and 2. The action converges to the benign action as claimed by Theorem 3. The results evidence asymptotic security achieved by the Bayesian defense mechanism.

5.3 Simulation: Defensive Deception using Bluffing

In the second scenario, we consider the case where the defender is unaware of the vulnerability and the attacker is unaware of the defender’s unawareness. This situation corresponds to the game 𝒢2\mathcal{G}_{2} in (36). The initial beliefs are given by π0r(θms|θar)=0.3\pi^{\rm r}_{0}(\theta^{\rm s}_{\rm m}|\theta^{\rm r}_{\rm a})=0.3 and π0s(θar|θms)=0.8.\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})=0.8. The true types are given by θs=θms\theta^{\rm s}=\theta^{\rm s}_{\rm m} and θr=θur\theta^{\rm r}=\theta^{\rm r}_{\rm u}. Note that π0r(θms|θur)=0\pi^{\rm r}_{0}(\theta^{\rm s}_{\rm m}|\theta^{\rm r}_{\rm u})=0, and hence the defender remains completely unaware of the attack while the game proceeds.

We first consider the case where the strategy is not passively bluffing. The transition probability is the same as in the previous simulation, where it depends on the receiver’s reaction. As a result, the state carries information about the receiver type.

Fig. 8 depicts sample paths of the belief on the malicious sender that the receiver would hold if she were aware of the vulnerability, the sender’s belief on the receiver being aware, the actual state, the actual action, and the reactions that a receiver aware of the vulnerability would take. It can be observed that the sender’s belief converges to zero, i.e., the sender notices that the receiver is unaware of the vulnerability. As a result, malicious actions are constantly taken after a sufficiently large number of time steps. The result indicates that the defense mechanism fails to defend the system in this case.

Figure 8: Sample paths of the receiver’s belief on the malicious sender when the receiver is aware, the sender’s belief on the receiver being aware, state, action, and the reactions that would be taken if the receiver were aware, where the strategy is not passively bluffing. The sender’s belief converges to zero, i.e., the attacker notices that the defender is unaware of the vulnerability. As a result, the malicious action is continuously taken after a sufficiently large number of time steps.

We next consider the bluffing case. As a commitment to a passively bluffing strategy, we restrict the reaction set to ={rb,rmb}\mathcal{R}=\{r_{\rm b},r^{\rm b}_{\rm m}\}. Then the transition probability is independent of the reaction, and hence every strategy becomes passively bluffing.
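In code, the commitment amounts to an assertion on the kernel: restricted to {rb,rmb}\{r_{\rm b},r^{\rm b}_{\rm m}\}, the function p_abnormal from the earlier sketch satisfies (45).

```python
# Commitment check (sketch): over the restricted reaction set the kernel is
# reaction-independent, so every strategy profile is passively bluffing.
assert all(p_abnormal(x, a, "rb") == p_abnormal(x, a, "rmb")
           for x in ("xn", "xa") for a in ("ab", "am"))
```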

Fig. 9 depicts sample paths of the same quantities as in Fig. 8 under the bluffing setting. The sender’s belief is invariant over time because the state does not carry information about the receiver type. Thus, the malicious sender remains cautious about being detected. As a result, the benign action is continuously taken after a sufficiently large number of time steps, in contrast to Fig. 8. The result indicates that asymptotic security is achieved by the bluffing even if the defender is unaware of the vulnerability. The simulation suggests the importance of concealing the defender’s belief even if doing so degrades control performance.

Figure 9: Sample paths of the receiver’s belief on the malicious sender when the receiver is aware, the sender’s belief on the receiver being aware, state, action, and the reactions that would be taken if the receiver were aware, where the strategy is passively bluffing. The sender’s belief is invariant over time because the state does not carry information about the receiver type. Thus, the attacker remains cautious about being detected. As a result, the benign action is continuously taken after a sufficiently large number of time steps, in contrast to Fig. 8.

6 Conclusion

This study has analyzed the defense capability achieved by Bayesian defense mechanisms. It has been shown that the system to be protected can be guaranteed to be secure by Bayesian defense mechanisms provided that effective countermeasures are implemented. This fact implies that model knowledge can prevent the defender from being deceived in an asymptotic sense. As a defensive deception utilizing the derived asymptotic security, bluffing that exploits asymmetric recognition has been considered. It has also been shown that the attacker possibly stops the execution in the middle of the attack in a rational manner when she strongly believes the defender to be aware of the vulnerability, even if the vulnerability is unnoticed.

Important future work includes an extension to infinite state spaces, because the state space of a control system is typically a subset of a Euclidean space. For this purpose, existing Bayesian consistency analysis for general sample spaces should be useful [36]. Moreover, although the state is assumed to be observable in this framework, a generalization to the partially observable setting would be more practical. We expect that key properties such as Lemma 1 still hold if the system requirements, such as Assumption 1, are appropriately modified. Another direction is to extend the results to non-binary types. Finally, finding a general condition for detection-averse utilities is an important issue. For this purpose, the example in Appendix 7 should be helpful.

\appendices

7 Example of Detection-averse Utilities

Consider an MDP with binary spaces 𝒳={xn,xa},𝒜={ab,am},\mathcal{X}=\{x_{\rm n},x_{\rm a}\},\mathcal{A}=\{a_{\rm b},a_{\rm m}\}, and ={rb,rm}\mathcal{R}=\{r_{\rm b},r_{\rm m}\}. The initial state is xnx_{\rm n}; the state moves to xax_{\rm a} with probability p>0p>0 when a=ama=a_{\rm m} and stays at xnx_{\rm n} otherwise. The objective of the receiver is to detect the true sender type, which is modeled by

Ur(θ,x,a,r)={1if(θ,r)=(θb,rb)or(θm,rm),0otherwise.U^{\rm r}(\theta,x,a,r)=\left\{\begin{array}[]{cl}1&{\rm if}\ (\theta,r)=(\theta_{\rm b},r_{\rm b})\ {\rm or}\ (\theta_{\rm m},r_{\rm m}),\\ 0&{\rm otherwise}.\end{array}\right. (56)

The malicious sender’s utility is given by

Us(θm,x,a,rb)={1ifa=am,0otherwise,Us(θm,x,a,rm)=1,U^{\rm s}(\theta_{\rm m},x,a,r_{\rm b})=\left\{\begin{array}[]{cl}1&{\rm if}\ a=a_{\rm m},\\ 0&{\rm otherwise},\end{array}\right.U^{\rm s}(\theta_{\rm m},x,a,r_{\rm m})=-1, (57)

which means that the adversary wants to avoid being detected. The benign sender is assumed to choose aba_{\rm b} at all times. The initial belief satisfies π0r(θm)<1/2\pi^{\rm r}_{0}(\theta_{\rm m})<1/2. Let (ss,sr)(s^{\rm s},s^{\rm r}) be a strategy profile such that πθm=1\pi^{\theta_{\rm m}}_{\infty}=1 with probability q>0q>0. Take such an x0:x_{0:\infty}; then there exists N+N\in{\mathbb{Z}_{+}} such that πkr(θm|x0:k)=1\pi^{\rm r}_{k}(\theta_{\rm m}|x_{0:k})=1 for any k>Nk>N, since

πkr(θm|x0:k)={π0r(θm)ifxτ=xnτ{0,,k},1otherwise.\pi^{\rm r}_{k}(\theta_{\rm m}|x_{0:k})=\left\{\begin{array}[]{ll}\pi^{\rm r}_{0}(\theta_{\rm m})&{\rm if}\ x_{\tau}=x_{\rm n}\ \forall\tau\in\{0,\ldots,k\},\\ 1&{\rm otherwise}.\end{array}\right. (58)

The receiver’s best response at x0:kx_{0:k} is to take rmr_{\rm m}, which makes the sender’s average utility at x0:kx_{0:k} equal to 1-1. Thus, the sender’s average utility at the initial state is q<0-q<0. However, if the sender takes the strategy such that ak=aba_{k}=a_{\rm b} for any k+k\in{\mathbb{Z}_{+}}, then the sender’s average utility at the initial state is 0. Thus, sss^{\rm s} is not a best response to srs^{\rm r}, which means that the utilities are detection-averse.
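The belief dynamics (58) of this example are simple enough to state in two lines of code; the following sketch (ours) makes the detection logic explicit.

```python
# Sketch of (58): the belief stays at the prior while only x_n is observed
# and jumps to one as soon as x_a appears, because the benign sender (who
# always plays a_b) can never trigger the abnormal state.
def appendix_belief(prior, path):
    return prior if all(x == "xn" for x in path) else 1.0

print(appendix_belief(0.3, ["xn", "xn", "xn"]))  # 0.3
print(appendix_belief(0.3, ["xn", "xa", "xn"]))  # 1.0
```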

8 Proofs of Propositions

In the proofs, we omit the symbol ss in the notation for simplicity when no confusion arises.

Proof.

Proof of Lemma 1: It is clear that πkθ\pi^{\theta}_{k} is adapted to the filtration σ(X0:k)\sigma(X_{0:k}). It is also clear that πkθ\pi^{\theta}_{k} is integrable with respect to θ\mathbb{P}_{\theta} since it is bounded. Thus, it suffices to show

𝔼[πk+1θ|σ(X0:k)]πkθθa.s.\mathbb{E}\left[\pi^{\theta}_{k+1}|\sigma(X_{0:k})\right]\geq\pi^{\theta}_{k}\quad\mathbb{P}_{\theta}{\rm-a.s.} (59)

for the claim. For a fixed outcome ωΩ\omega\in\Omega with which X0:k(ω)=x0:kX_{0:k}(\omega)=x_{0:k}, the inequality is equivalent to

xk+1𝒳pθ(xk+1|x0:k)πk+1(θ|x0:k+1)πkr(θ|x0:k).\sum_{x_{k+1}\in\mathcal{X}}p_{\theta}(x_{k+1}|x_{0:k})\pi_{k+1}(\theta|x_{0:k+1})\geq\pi^{\rm r}_{k}(\theta|x_{0:k}). (60)

Thus it suffices to show (60) for any kk\in\mathbb{N} and x0:k𝒳k+1x_{0:k}\in\mathcal{X}^{k+1}.

First, we reduce the index of the summation in (60). When πkr(θ|x0:k)=0\pi^{\rm r}_{k}(\theta|x_{0:k})=0, the inequality (60) always holds. Thus we assume πkr(θ|x0:k)>0\pi^{\rm r}_{k}(\theta|x_{0:k})>0 in the following. Define

𝒳k0:={xk+1𝒳:ϕΘpϕ(xk+1|x0:k)πkr(ϕ|x0:k)=0}.\mathcal{X}^{0}_{k}:=\left\{x_{k+1}\in\mathcal{X}:\sum_{\phi\in{\it\Theta}}p_{\phi}(x_{k+1}|x_{0:k})\pi^{\rm r}_{k}(\phi|x_{0:k})=0\right\}. (61)

Because πkr(θ|x0:k)\pi^{\rm r}_{k}(\theta|x_{0:k}) is positive and every term of the sum is nonnegative, if xk+1x_{k+1} belongs to 𝒳k0\mathcal{X}^{0}_{k} then pθ(xk+1|x0:k)=0p_{\theta}(x_{k+1}|x_{0:k})=0. Hence (60) is equivalent to

xk+1𝒳k+pθ(xk+1|x0:k)πk+1r(θ|x0:k+1)πkr(θ|x0:k)\sum_{x_{k+1}\in\mathcal{X}^{+}_{k}}p_{\theta}(x_{k+1}|x_{0:k})\pi^{\rm r}_{k+1}(\theta|x_{0:k+1})\geq\pi^{\rm r}_{k}(\theta|x_{0:k}) (62)

where 𝒳k+:=𝒳𝒳k0\mathcal{X}^{+}_{k}:=\mathcal{X}\setminus\mathcal{X}^{0}_{k}.

To simplify notation, we define

π(θ):=πkr(θ|x0:k),pϕ(x):=pϕ(x|x0:k),𝒳+:=𝒳k+\pi(\theta):=\pi^{\rm r}_{k}(\theta|x_{0:k}),\ p_{\phi}(x):=p_{\phi}(x|x_{0:k}),\ \mathcal{X}^{+}:=\mathcal{X}^{+}_{k} (63)

for fixed kk and x0:kx_{0:k}. Then the inequality (62) is equivalent to

x𝒳+pθ(x)pθ(x)π(θ)ϕΘpϕ(x)π(ϕ)π(θ).\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)\dfrac{p_{\theta}(x)\pi(\theta)}{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}\geq\pi(\theta). (64)

Because π(θ)>0\pi(\theta)>0, this inequality is equivalent to

x𝒳+pθ(x)pθ(x)ϕΘpϕ(x)π(ϕ)=:G(θ)1.\underbrace{\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)\dfrac{p_{\theta}(x)}{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}}_{=:G(\theta)}\geq 1. (65)

By rewriting G(θ)G(\theta), we have

G(θ)=x𝒳+pθ(x)1ϕΘpϕ(x)pθ(x)π(ϕ).\begin{array}[]{cl}G(\theta)&=\displaystyle{\sum_{x\in\mathcal{X}^{+}}}p_{\theta}(x)\dfrac{1}{\sum_{\phi\in{\it\Theta}}\frac{p_{\phi}(x)}{p_{\theta}(x)}\pi(\phi)}.\end{array} (66)

By applying Jensen’s inequality in (1) with the functions p(x):=pθ(x),a(x):=ϕΘpϕ(x)pθ(x)π(ϕ),p(x):=p_{\theta}(x),a(x):=\sum_{\phi\in{\it\Theta}}\frac{p_{\phi}(x)}{p_{\theta}(x)}\pi(\phi), and φ(ξ):=1/ξ\varphi(\xi):=1/\xi, we have

G(θ)φ(x𝒳+pθ(x)ϕΘpϕ(x)pθ(x)π(ϕ))=φ(x𝒳+ϕΘpϕ(x)π(ϕ))=φ(ϕΘ(x𝒳+pϕ(x))π(ϕ))=φ(ϕΘπ(ϕ))=φ(1)=1,\begin{array}[]{cl}G(\theta)&\geq\varphi\left(\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)\sum_{\phi\in{\it\Theta}}\frac{p_{\phi}(x)}{p_{\theta}(x)}\pi(\phi)\right)\\ &=\varphi\left(\sum_{x\in\mathcal{X}^{+}}\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)\right)\\ &=\varphi\left(\sum_{\phi\in{\it\Theta}}\left(\sum_{x\in\mathcal{X}^{+}}p_{\phi}(x)\right)\pi(\phi)\right)\\ &=\varphi\left(\sum_{\phi\in{\it\Theta}}\pi(\phi)\right)\\ &=\varphi(1)\\ &=1,\end{array} (67)

which leads to the claim. ∎
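A quick numerical check of Lemma 1 (ours; the likelihoods are arbitrary example values for a binary-type, binary-observation setting): computing the one-step expected posterior exactly shows that it never falls below the current posterior, as the submartingale inequality (59) requires.

```python
# Exact one-step check of the submartingale property: under the true type
# theta = m, E_m[pi_{k+1} | pi_k] >= pi_k, cf. (59)-(60).
p = {"m": {"xn": 0.3, "xa": 0.7},   # assumed likelihoods p_theta(x)
     "b": {"xn": 0.8, "xa": 0.2}}

def posterior(pi, x):
    num = p["m"][x] * pi
    return num / (num + p["b"][x] * (1.0 - pi))

for pi in (0.1, 0.4, 0.9):
    expected_next = sum(p["m"][x] * posterior(pi, x) for x in ("xn", "xa"))
    print(pi, expected_next, expected_next >= pi)  # always True
```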

Proof.

Proof of Theorem 1: Because the belief is uniformly bounded over time, we have supk𝔼θ[πkθ]<.\sup_{k\in\mathbb{N}}\mathbb{E}_{\theta}\left[\pi^{\theta}_{k}\right]<\infty. From Lemma 1 and Doob’s convergence theorem [50, Theorem 4.4.1], the claim holds. ∎

Proof.

Proof of Lemma 2: Since π0θ>0\pi^{\theta}_{0}>0, we have πkθ>0\pi^{\theta}_{k}>0 θ\mathbb{P}_{\theta}-almost surely for any kk\in\mathbb{N}. Thus log(πkθ)\log(\pi^{\theta}_{k}) is well-defined. We first show that log(πkθ)\log(\pi^{\theta}_{k}) is a submartingale with respect to the probability measure θ\mathbb{P}_{\theta} and the filtration σ(X0:k)\sigma(X_{0:k}). It is clear that log(πkθ)\log(\pi^{\theta}_{k}) is adapted to the filtration σ(X0:k)\sigma(X_{0:k}). Because the number of elements in the support of log(πkθ)\log(\pi^{\theta}_{k}) is finite, log(πkθ)\log(\pi^{\theta}_{k}) is integrable for any kk\in\mathbb{N}. Thus it suffices to show that

𝔼θ[log(πk+1θ)|σ(X0:k)]log(πkθ)θa.s.\mathbb{E}_{\theta}\left[\log(\pi^{\theta}_{k+1})|\sigma(X_{0:k})\right]\geq\log(\pi^{\theta}_{k})\quad\mathbb{P}_{\theta}{\rm-a.s.} (68)

As in the proof of Lemma 1, this inequality is equivalent to

xk+1𝒳k+pθ(xk+1|x0:k)log(πk+1r(θ|x0:k+1))log(πkr(θ|x0:k))\sum_{x_{k+1}\in\mathcal{X}^{+}_{k}}p_{\theta}(x_{k+1}|x_{0:k})\log(\pi^{\rm r}_{k+1}(\theta|x_{0:k+1}))\geq\log(\pi^{\rm r}_{k}(\theta|x_{0:k})) (69)

for any x0:k𝒳k+1x_{0:k}\in\mathcal{X}^{k+1}. With the notation (63), this is equivalent to

x𝒳+pθ(x)log(pθ(x)π(θ)ϕΘpϕ(x)π(ϕ))log(π(θ)).\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)\log\left(\dfrac{p_{\theta}(x)\pi(\theta)}{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}\right)\geq\log(\pi(\theta)). (70)

Because the left-hand side can be rewritten by

x𝒳+pθ(x)log(pθ(x)π(θ)ϕΘpϕ(x)π(ϕ))=x𝒳+pθ(x){log(pθ(x)ϕΘpϕ(x)π(ϕ))+log(π(θ))}=x𝒳+pθ(x)log(pθ(x)ϕΘpϕ(x)π(ϕ))+log(π(θ)),\begin{array}[]{cl}&\displaystyle{\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)}\log\left(\dfrac{p_{\theta}(x)\pi(\theta)}{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}\right)\\ =&\displaystyle{\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)}\left\{\log\left(\dfrac{p_{\theta}(x)}{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}\right)+\log(\pi(\theta))\right\}\\ =&\displaystyle{\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)}\log\left(\dfrac{p_{\theta}(x)}{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}\right)+\log(\pi(\theta)),\end{array} (71)

the inequality (70) is equivalent to

x𝒳+pθ(x)log(pθ(x)ϕΘpϕ(x)π(ϕ))0,\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)\log\left(\dfrac{p_{\theta}(x)}{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}\right)\geq 0, (72)

which is also equivalent to

x𝒳+pθ(x)log(ϕΘpϕ(x)π(ϕ)pθ(x))H(θ)0.\underbrace{\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)\log\left(\dfrac{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}{p_{\theta}(x)}\right)}_{H(\theta)}\leq 0. (73)

By applying Jensen’s inequality for a concave function with p(x):=pθ(x),a(x):=ϕΘpϕ(x)π(ϕ)pθ(x),p(x):=p_{\theta}(x),a(x):=\frac{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}{p_{\theta}(x)}, and φ(ξ):=log(ξ),\varphi(\xi):=\log(\xi), we have

H(θ)log(x𝒳+pθ(x)ϕΘpϕ(x)π(ϕ)pθ(x))=log(x𝒳+ϕΘpϕ(x)π(ϕ))=log(1)=0,\begin{array}[]{cl}H(\theta)&\leq\log\left(\sum_{x\in\mathcal{X}^{+}}p_{\theta}(x)\frac{\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)}{p_{\theta}(x)}\right)\\ &=\log\left(\sum_{x\in\mathcal{X}^{+}}\sum_{\phi\in{\it\Theta}}p_{\phi}(x)\pi(\phi)\right)\\ &=\log(1)\\ &=0,\end{array} (74)

which implies that log(πkθ)\log(\pi^{\theta}_{k}) is a submartingale.

From Doob’s convergence theorem [50, Theorem 4.4.1], it suffices to show that the expectation of the nonnegative part of log(πkθ)\log(\pi^{\theta}_{k}) is uniformly bounded. Because πkθ(0,1]\pi^{\theta}_{k}\in(0,1], log(πkθ)\log(\pi^{\theta}_{k}) is nonpositive for any kk\in\mathbb{N}, and hence the uniform boundedness holds. ∎

Proof.

Proof of Theorem 2: We prove the claim by contradiction. Define EE as the inverse image of {0}\{0\} for πθ\pi^{\theta}_{\infty}. Assume that θ(E)>0\mathbb{P}_{\theta}(E)>0. For any ωE,\omega\in E, πkθ(ω)0\pi^{\theta}_{k}(\omega)\to 0 as kk\to\infty. Hence, from the continuity of logarithm functions, it turns out that log(πkθ(ω))\log(\pi^{\theta}_{k}(\omega))\to-\infty as kk\to\infty. This means that log(πkθ(ω))\log(\pi^{\theta}_{k}(\omega)) diverges for ωE\omega\in E. However, Lemma 2 states that log(πkθ)\log(\pi^{\theta}_{k}) converges θ\mathbb{P}_{\theta}-almost surely. This is a contradiction. ∎

Proof.

Proof of Lemma 3: We first show that the coefficient of Bayes’ rule converges to one, i.e.,

limkfk(θm,X0:k)=1θma.s.\lim_{k\to\infty}f_{k}(\theta_{\rm m},X_{0:k})=1\quad\mathbb{P}_{\theta_{\rm m}}{\rm-a.s.} (75)

where fk+1(θ,x0:k+1):=πk+1θ/πkθf_{k+1}(\theta,x_{0:k+1}):=\pi^{\theta}_{k+1}/\pi^{\theta}_{k}. Because πθm\pi^{\theta_{\rm m}}_{\infty} is nonzero θm\mathbb{P}_{\theta_{\rm m}}-almost surely from Theorem 2, we have

limkfk+1(θm,X0:k)=limk(πk+1θm/πkθm)=πθm/πθm=1\begin{array}[]{cl}\displaystyle{\lim_{k\to\infty}f_{k+1}(\theta_{\rm m},X_{0:k})}&\displaystyle{=\lim_{k\to\infty}\left(\pi^{\theta_{\rm m}}_{k+1}/\pi^{\theta_{\rm m}}_{k}\right)}\\ &=\pi^{\theta_{\rm m}}_{\infty}/\pi^{\theta_{\rm m}}_{\infty}\\ &=1\end{array} (76)

θm\mathbb{P}_{\theta_{\rm m}}-almost surely. Thus (75) holds.

It is observed that fkf_{k} can be calculated by

fks(θ,x0:k)=pθs(xk|x0:k1)ϕΘpϕs(xk|x0:k1)πk1r(ϕ|x0:k1)f^{s}_{k}(\theta,x_{0:k})=\dfrac{p^{s}_{\theta}(x_{k}|x_{0:k-1})}{\sum_{\phi\in{\it\Theta}}p^{s}_{\phi}(x_{k}|x_{0:k-1})\pi^{\rm r}_{k-1}(\phi|x_{0:k-1})} (77)

from (10). Define

fN,k+1(ω):=pθm(Xk+1(ω)|X0:k(ω)),fD,k+1(ω):=ϕΘpϕ(Xk+1(ω)|X0:k(ω))πkϕ(ω),\begin{array}[]{l}f_{{\rm N},k+1}(\omega):=p_{\theta_{\rm m}}(X_{k+1}(\omega)|X_{0:k}(\omega)),\\ f_{{\rm D},k+1}(\omega):=\sum_{\phi\in{\it\Theta}}p_{\phi}(X_{k+1}(\omega)|X_{0:k}(\omega))\pi^{\phi}_{k}(\omega),\end{array} (78)

which denote the numerator and the denominator of the coefficient fk+1(θm,X0:k+1(ω))f_{k+1}(\theta_{\rm m},X_{0:k+1}(\omega)), respectively. Since π0θm>0\pi^{\theta_{\rm m}}_{0}>0, we have 0<fD,k+110<f_{{\rm D},k+1}\leq 1 for any ωΩ,k+\omega\in\Omega,k\in{\mathbb{Z}_{+}}. Thus

0fD,k+1|fk+11||fk+11|θma.s.0\leq f_{{\rm D},k+1}|f_{k+1}-1|\leq|f_{k+1}-1|\quad\mathbb{P}_{\theta_{\rm m}}{\rm-a.s.} (79)

for any k+k\in{\mathbb{Z}_{+}}. Because limk|fk+11|=0θma.s.\lim_{k\to\infty}|f_{k+1}-1|=0\quad\mathbb{P}_{\theta_{\rm m}}{\rm-a.s.} from (75), the squeeze theorem for (79) yields

limkfD,k+1|fk+1θm,s1|=0θma.s.\lim_{k\to\infty}f_{{\rm D},k+1}|f^{\theta_{\rm m},s}_{k+1}-1|=0\quad\mathbb{P}_{\theta_{\rm m}}{\rm-a.s.} (80)

Now we have

fD,k+1|fk+1θm,s1|=|fN,k+1fD,k+1|=|pθm(Xk+1|X0:k)ϕΘpϕ(Xk+1|X0:k)πkϕ|=|pθm(Xk+1|X0:k)(1πkθm)pθb(Xk+1|X0:k)πkθb|=|pθm(Xk+1|X0:k)pθb(Xk+1|X0:k)|(1πkθm).\begin{array}[]{l}f_{{\rm D},k+1}|f^{\theta_{\rm m},s}_{k+1}-1|\\ =|f_{{\rm N},k+1}-f_{{\rm D},k+1}|\\ =|p_{\theta_{\rm m}}(X_{k+1}|X_{0:k})-\sum_{\phi\in{\it\Theta}}p_{\phi}(X_{k+1}|X_{0:k})\pi^{\phi}_{k}|\\ =|p_{\theta_{\rm m}}(X_{k+1}|X_{0:k})(1-\pi^{\theta_{\rm m}}_{k})-p_{\theta_{\rm b}}(X_{k+1}|X_{0:k})\pi^{\theta_{\rm b}}_{k}|\\ =|p_{\theta_{\rm m}}(X_{k+1}|X_{0:k})-p_{\theta_{\rm b}}(X_{k+1}|X_{0:k})|(1-\pi^{\theta_{\rm m}}_{k}).\end{array} (81)

Since ss is a PBE with detection-averse utilities, limk(1πkθm)0θma.s.\lim_{k\to\infty}(1-\pi^{\theta_{\rm m}}_{k})\neq 0\quad\mathbb{P}_{\theta_{\rm m}}{\rm-a.s.} Therefore, (80) leads to the claim. ∎

Proof.

Proof of Theorem 3: From the definition of the conditional probability mass function, we have

pθ(Xk+1|X0:k)=P(Xk+1|Xk,sks(θ,X0:k),skr(X0:k)).p_{\theta}(X_{k+1}|X_{0:k})=P(X_{k+1}|X_{k},s^{\rm s}_{k}(\theta,X_{0:k}),s^{\rm r}_{k}(X_{0:k})). (82)

Thus the claim of Lemma 3 can be rewritten by

|P(Xk+1|Xk,Akθm,skr(X0:k))P(Xk+1|Xk,Akθb,skr(X0:k))|0\begin{array}[]{l}|P(X_{k+1}|X_{k},A^{\theta_{\rm m}}_{k},s^{\rm r}_{k}(X_{0:k}))-P(X_{k+1}|X_{k},A^{\theta_{\rm b}}_{k},s^{\rm r}_{k}(X_{0:k}))|\\ \to 0\end{array} (83)

θm\mathbb{P}_{\theta_{\rm m}}-almost surely as kk\to\infty. From finiteness of the MDP, the condition (83) is equivalent to

θm({Eki.o.})=0\mathbb{P}_{\theta_{\rm m}}(\{E_{k}\ {\rm i.o.}\})=0 (84)

where

Ek:={P(Xk+1|Xk,Akθm,Rk)P(Xk+1|Xk,Akθb,Rk)}.E_{k}:=\left\{P(X_{k+1}|X_{k},A^{\theta_{\rm m}}_{k},R_{k})\neq P(X_{k+1}|X_{k},A^{\theta_{\rm b}}_{k},R_{k})\right\}. (85)

By applying the generalized Borel-Cantelli’s second lemma in (2) with k:=σ(X0:k),\mathcal{F}_{k}:=\sigma(X_{0:k}), we have

θm(E)=0\mathbb{P}_{\theta_{\rm m}}(E)=0 (86)

where

E:={ωΩ:k=0θm(Ek|σ(X0:k))(ω)=}.\textstyle{E:=\left\{\omega\in\Omega:\sum_{k=0}^{\infty}\mathbb{P}_{\theta_{\rm m}}\left(E_{k}|\sigma(X_{0:k})\right)(\omega)=\infty\right\}.} (87)

We derive a simpler description of the event EE. For any ωΩ\omega\in\Omega, the set of nonnegative integers +{\mathbb{Z}_{+}} can be divided into two disjoint subsets ^+(ω)\hat{\mathbb{Z}}_{+}(\omega) and +^+(ω){\mathbb{Z}_{+}}\setminus\hat{\mathbb{Z}}_{+}(\omega) such that

{Akθm(ω)Akθb(ω),k^+(ω),Akθm(ω)=Akθb(ω),k+^+(ω).\left\{\begin{array}[]{l}A^{\theta_{\rm m}}_{k}(\omega)\neq A^{\theta_{\rm b}}_{k}(\omega),\quad\forall k\in\hat{\mathbb{Z}}_{+}(\omega),\\ A^{\theta_{\rm m}}_{k}(\omega)=A^{\theta_{\rm b}}_{k}(\omega),\quad\forall k\in{\mathbb{Z}_{+}}\setminus\hat{\mathbb{Z}}_{+}(\omega).\end{array}\right. (88)

For a fixed ωΩ\omega\in\Omega, θm(Ek|σ(X0:k))(ω)=0\mathbb{P}_{\theta_{\rm m}}(E_{k}|\sigma(X_{0:k}))(\omega)=0 for k+^+(ω)k\in{\mathbb{Z}_{+}}\setminus\hat{\mathbb{Z}}_{+}(\omega). Thus we have

k=0θm(Ek|σ(X0:k))(ω)=k^+(ω)θm(Ek|σ(X0:k))(ω).\sum_{k=0}^{\infty}\mathbb{P}_{\theta_{\rm m}}\left(E_{k}|\sigma(X_{0:k})\right)(\omega)=\sum_{k\in\hat{\mathbb{Z}}_{+}(\omega)}\mathbb{P}_{\theta_{\rm m}}\left(E_{k}|\sigma(X_{0:k})\right)(\omega). (89)

Moreover, for any k+k\in{\mathbb{Z}_{+}} and ωΩ\omega\in\Omega, we have

θm(Ek|σ(X0:k))(ω)=xk+1𝒳k+1(ω)P(xk+1|xk,akθm,rk)\mathbb{P}_{\theta_{\rm m}}\left(E_{k}|\sigma(X_{0:k})\right)(\omega)=\sum_{x_{k+1}\in\mathcal{X}_{k+1}(\omega)}P(x_{k+1}|x_{k},a^{\theta_{\rm m}}_{k},r_{k}) (90)

where

𝒳k+1(ω):={x𝒳:P(x|xk,akθm,rk)P(x|xk,akθb,rk)}\mathcal{X}_{k+1}(\omega):=\{x\in\mathcal{X}:P(x|x_{k},a^{\theta_{\rm m}}_{k},r_{k})\neq P(x|x_{k},a^{\theta_{\rm b}}_{k},r_{k})\} (91)

with x0:k:=X0:k(ω),akθ:=Akθ(ω),x_{0:k}:=X_{0:k}(\omega),a^{\theta}_{k}:=A^{\theta}_{k}(\omega), and rk:=Rk(ω)r_{k}:=R_{k}(\omega). Thus, the condition in the definition of EE can be rewritten by

k^+(ω)xk+1𝒳k+1(ω)P(xk+1|xk,akθm,rk)=.\displaystyle{\sum_{k\in\hat{\mathbb{Z}}_{+}(\omega)}\sum_{x_{k+1}\in\mathcal{X}_{k+1}(\omega)}P(x_{k+1}|x_{k},a^{\theta_{\rm m}}_{k},r_{k})=\infty.} (92)

Now we define F:={AkθmAkθbi.o.}.F:=\{A^{\theta_{\rm m}}_{k}\neq A^{\theta_{\rm b}}_{k}\ {\rm i.o.}\}. We show the claim by contradiction. Assume θm(F)>0\mathbb{P}_{\theta_{\rm m}}(F)>0. Then θm(E|F)\mathbb{P}_{\theta_{\rm m}}(E|F) is well-defined. From (86), we have θm(EF)=θm(E|F)θm(F)=0.\mathbb{P}_{\theta_{\rm m}}(E\cap F)=\mathbb{P}_{\theta_{\rm m}}(E|F)\mathbb{P}_{\theta_{\rm m}}(F)=0. Because θm(F)\mathbb{P}_{\theta_{\rm m}}(F) is assumed to be nonzero, this equation implies θm(E|F)=0.\mathbb{P}_{\theta_{\rm m}}(E|F)=0. We now calculate θm(E|F)\mathbb{P}_{\theta_{\rm m}}(E|F) from its definition. Let

γ(ω):=infkZ^+(ω)xk+1𝒳k+1(ω)P(xk+1|xk,akθm,rk).\gamma(\omega):=\inf_{k\in\hat{Z}_{+}(\omega)}\sum_{x_{k+1}\in\mathcal{X}_{k+1}(\omega)}P(x_{k+1}|x_{k},a^{\theta_{\rm m}}_{k},r_{k}). (93)

For any ωΩ\omega\in\Omega, the set 𝒳k+1(ω)\mathcal{X}_{k+1}(\omega) is nonempty for k^+(ω)k\in\hat{\mathbb{Z}}_{+}(\omega) from Assumption 1. This fact and the finiteness of the MDP imply that γ(ω)>0.\gamma(\omega)>0. The infimum yields the inequality

k^+(ω)xk+1𝒳k+1(ω)P(xk+1|xk,akθm,rk)|^+(ω)|γ(ω).\sum_{k\in\hat{\mathbb{Z}}_{+}(\omega)}\sum_{x_{k+1}\in\mathcal{X}_{k+1}(\omega)}P(x_{k+1}|x_{k},a^{\theta_{\rm m}}_{k},r_{k})\geq\left|\hat{\mathbb{Z}}_{+}(\omega)\right|\gamma(\omega). (94)

If ωF\omega\in F, then ^+(ω)\hat{\mathbb{Z}}_{+}(\omega) has infinite elements and hence |^+(ω)|γ(ω)=|\hat{\mathbb{Z}}_{+}(\omega)|\gamma(\omega)=\infty. Thus, for any ωF\omega\in F, from the inequality (94), the condition (92) holds. Therefore, θm(E|F)=1\mathbb{P}_{\theta_{\rm m}}(E|F)=1, which is a contradiction. Hence θm(F)=0\mathbb{P}_{\theta_{\rm m}}(F)=0 holds. ∎

Proof.

Proof of Lemma 4: The former claim is obvious since the probability measures are independent of the strategies with θbs\theta^{\rm s}_{\rm b} and θur\theta^{\rm r}_{\rm u}. Assume s^2,k:sBRks(s^2,k:r|θms,x0:k)\hat{s}^{\rm s}_{2,k:\infty}\in{\rm BR}^{\rm s}_{k}(\hat{s}^{\rm r}_{2,k:\infty}|\theta^{\rm s}_{\rm m},x_{0:k}) for any k+k\in{\mathbb{Z}_{+}} and x0:k𝒳k+1x_{0:k}\in\mathcal{X}^{k+1}. Because π^ks(θar|θms)=1\hat{\pi}^{\rm s}_{k}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})=1 and π^ks(θur|θms)=0\hat{\pi}^{\rm s}_{k}(\theta^{\rm r}_{\rm u}|\theta^{\rm s}_{\rm m})=0 for k+k\in{\mathbb{Z}_{+}}, the malicious sender’s expected average utility in 𝒢^2\hat{\mathcal{G}}_{2} is given by

U¯k,Ts(s^2,k:k+T|θms,x0:k):=1T+1×𝔼s[τ=kk+TUs(θms,Xk,s^2,τs(θms,X0:τ),s^2,τr(θar,X0:τ))|x0:k].\begin{array}[]{l}\bar{U}^{\rm s}_{k,T}(\hat{s}_{2,{k:k+T}}|\theta^{\rm s}_{\rm m},x_{0:k}):=\dfrac{1}{T+1}\\ \displaystyle{\small{\times\mathbb{E}^{s}\left[\left.\sum_{\tau=k}^{k+T}U^{\rm s}(\theta^{\rm s}_{\rm m},X_{k},\hat{s}^{\rm s}_{2,\tau}(\theta^{\rm s}_{\rm m},X_{0:\tau}),\hat{s}^{\rm r}_{2,\tau}(\theta^{\rm r}_{\rm a},X_{0:\tau}))\right|x_{0:k}\right].}}\end{array} (95)

This is the malicious sender’s utility in 𝒢1\mathcal{G}_{1}. Thus, BRks(s^2,k:r|θms,x0:k){\rm BR}^{\rm s}_{k}(\hat{s}^{\rm r}_{2,k:\infty}|\theta^{\rm s}_{\rm m},x_{0:k}) in 𝒢^2\hat{\mathcal{G}}_{2} is equal to BRks(s1,k:r|θm,x0:k){\rm BR}^{\rm s}_{k}(s^{\rm r}_{1,k:\infty}|\theta_{\rm m},x_{0:k}) in 𝒢1\mathcal{G}_{1}. Hence, s1,k:sBRks(s1,k:r|θm,x0:k).s^{\rm s}_{1,k:\infty}\in{\rm BR}^{\rm s}_{k}(s^{\rm r}_{1,k:\infty}|\theta_{\rm m},x_{0:k}). ∎

Proof.

Proof of Lemma 5: If ss is a passively bluffing strategy, then the distribution of HksH^{\rm s}_{k} is independent of the receiver type. Thus the distribution of δ(Akθm,Akθb)\delta\left(A^{\theta_{\rm m}}_{k},A^{\theta_{\rm b}}_{k}\right) is also independent of the receiver type. Hence, if (44) holds, the same condition holds for θur\theta^{\rm r}_{\rm u} as well. ∎

Proof.

Proof of Lemma 6: Take s=(ss,sr)𝒮nabs=(s^{\rm s},s^{\rm r})\in\mathcal{S}_{\rm nab}^{\ast}. Consider 𝒢1\mathcal{G}_{1} corresponding to 𝒢^2\hat{\mathcal{G}}_{2}. Let s1=(s1s,s1r)s_{1}=(s^{\rm s}_{1},s^{\rm r}_{1}) be a strategy profile in 𝒢1\mathcal{G}_{1} given by (39). From Lemma 4, we have

θmss1(δ(Akθm,Akθb)=0)=θms,θars^2(δ(Akθm,Akθb)=0).\mathbb{P}^{s_{1}}_{\theta^{\rm s}_{\rm m}}\left(\delta(A^{\theta_{\rm m}}_{k},A^{\theta_{\rm b}}_{k})=0\right)=\mathbb{P}^{\hat{s}_{2}}_{\theta^{\rm s}_{\rm m},\theta^{\rm r}_{\rm a}}\left(\delta(A^{\theta_{\rm m}}_{k},A^{\theta_{\rm b}}_{k})=0\right). (96)

From the contraposition of Lemma 5, this equation implies that s1s_{1} is not asymptotically benign in the sense of the game 𝒢1\mathcal{G}_{1} since s𝒮nabs\in\mathcal{S}_{\rm nab}. Thus, s1s_{1} is not a PBE of 𝒢1\mathcal{G}_{1} from Theorem 3. This means that s1ss^{\rm s}_{1} contains a strategy that is not a best response. Because s𝒮s\in\mathcal{S}^{\ast}, this means s1,k:sBRks(s1,k:r|θm,x0:k)s^{\rm s}_{1,k:\infty}\not\in{\rm BR}^{\rm s}_{k}(s^{\rm r}_{1,k:\infty}|\theta_{\rm m},x_{0:k}) for some kk. From the contraposition of Lemma 4, sk:sBRks(sk:r|θms,x0:k)s^{\rm s}_{k:\infty}\not\in{\rm BR}^{\rm s}_{k}(s^{\rm r}_{k:\infty}|\theta^{\rm s}_{\rm m},x_{0:k}), which is equivalent to (50). ∎

Proof.

Proof of Theorem 4: We prove the existence of π0s(θar|θms)<1\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})<1 such that the contraposition of the condition holds, i.e., if ss is not asymptotically benign then ss is not a PBE. Let s𝒮nabs\in\mathcal{S}_{\rm nab}. If s𝒮s\not\in\mathcal{S}^{\ast}, ss is not a PBE. Thus we suppose s𝒮nabs\in\mathcal{S}_{\rm nab}^{\ast}. It suffices to show that there exists π0s(θar|θms)<1\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})<1 such that

infs𝒮nabD(s,g(s))>0\inf_{s\in\mathcal{S}_{\rm nab}^{\ast}}D(s,g(s))>0 (97)

where D(s,s~s):=U¯s((s~s,sr),θms,πs)U¯s(s,θms,πs)D(s,\tilde{s}^{\rm s}):=\bar{U}^{\rm s}((\tilde{s}^{\rm s},s^{\rm r}),\theta^{\rm s}_{\rm m},\pi^{\rm s})-\bar{U}^{\rm s}(s,\theta^{\rm s}_{\rm m},\pi^{\rm s}) and gg is given in (52).

From (47), we have D(s,s~s)=θrΘrDθr(s,s~s)π0s(θr|θms).D(s,\tilde{s}^{\rm s})=\sum_{\theta^{\rm r}\in{\it\Theta}^{\rm r}}D_{\theta^{\rm r}}(s,\tilde{s}^{\rm s})\pi^{\rm s}_{0}(\theta^{\rm r}|\theta^{\rm s}_{\rm m}). From the definition of γ\gamma in (53), we have

D(s,g(s))Dθur(s,g(s))π0s(θur|θms)+γπ0s(θar|θms)=γ+(Dθur(s,g(s))γ)π0s(θur|θms)\begin{array}[]{cl}D(s,g(s))&\geq D_{\theta^{\rm r}_{\rm u}}(s,g(s))\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm u}|\theta^{\rm s}_{\rm m})+\gamma\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})\\ &=\gamma+(D_{\theta^{\rm r}_{\rm u}}(s,g(s))-\gamma)\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm u}|\theta^{\rm s}_{\rm m})\end{array} (98)

Consider the case where Dθur(s,g(s))γ0D_{\theta^{\rm r}_{\rm u}}(s,g(s))-\gamma\geq 0 for any s𝒮nabs\in\mathcal{S}_{\rm nab}^{\ast}. Then (98) implies that D(s,g(s))γD(s,g(s))\geq\gamma for any π0s(θar|θms)<1\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm a}|\theta^{\rm s}_{\rm m})<1. From Assumption 2, this inequality leads to infs𝒮nabD(s,g(s))γ>0,\inf_{s\in\mathcal{S}_{\rm nab}^{\ast}}D(s,g(s))\geq\gamma>0, which implies (97).

Next, consider the case where Dθur(s,g(s))γ<0D_{\theta^{\rm r}_{\rm u}}(s,g(s))-\gamma<0 for some s𝒮nabs\in\mathcal{S}_{\rm nab}^{\ast}. By taking an initial belief π0s(θur|θms)>0\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm u}|\theta^{\rm s}_{\rm m})>0 such that

π0s(θur|θms)<infs𝒯γDθur(s,g(s))γ\pi^{\rm s}_{0}(\theta^{\rm r}_{\rm u}|\theta^{\rm s}_{\rm m})<\inf_{s\in\mathcal{T}}\dfrac{-\gamma}{D_{\theta^{\rm r}_{\rm u}}(s,g(s))-\gamma} (99)

where 𝒯:={s𝒮nab:Dθur(s,g(s))γ<0},\mathcal{T}:=\{s\in\mathcal{S}_{\rm nab}^{\ast}:D_{\theta^{\rm r}_{\rm u}}(s,g(s))-\gamma<0\}, we have (97) from (98). From the definition of DθurD_{\theta^{\rm r}_{\rm u}} in (51), Dθur(s,g(s))D_{\theta^{\rm r}_{\rm u}}(s,g(s)) is bounded in 𝒯\mathcal{T} because U¯θrs\bar{U}^{\rm s}_{\theta^{\rm r}} is bounded for any strategy. Thus we have

infs𝒯γDθur(s,g(s))γ>0.\inf_{s\in\mathcal{T}}\dfrac{-\gamma}{D_{\theta^{\rm r}_{\rm u}}(s,g(s))-\gamma}>0. (100)

Thus a nonzero initial belief that satisfies (99) exists. ∎

References

  • [1] McAfee, “The hidden costs of cybercrime,” Tech. Rep., 2020, [Online]. Available: https://www.mcafee.com/enterprise/en-us/assets/reports/rp-hidden-costs-of-cybercrime.pdf.
  • [2] Cybersecurity & Infrastructure Security Agency, “Stuxnet malware mitigation,” Tech. Rep. ICSA-10-238-01B, 2014, [Online]. Available: https://www.us-cert.gov/ics/advisories/ICSA-10-238-01B.
  • [3] ——, “HatMan - safety system targeted malware,” Tech. Rep. MAR-17-352-01, 2017, [Online]. Available: https://www.us-cert.gov/ics/MAR-17-352-01-HatMan-Safety-System-Targeted-Malware-Update-B.
  • [4] ——, “Cyber-attack against Ukrainian critical infrastructure,” Tech. Rep. IR-ALERT-H-16-056-01, 2018, [Online]. Available: https://www.us-cert.gov/ics/alerts/IR-ALERT-H-16-056-01.
  • [5] ——, “DarkSide ransomware: Best practices for preventing business disruption from ransomware attacks,” Tech. Rep. AA21-131A, 2021, [Online]. Available: https://us-cert.cisa.gov/ncas/alerts/aa21-131a.
  • [6] N. Falliere, L. O. Murchu, and E. Chien, “W32. Stuxnet Dossier,” Symantec, Tech. Rep., 2011.
  • [7] A. Neyman and S. Sorin, Eds., Stochastic Games and Applications.   Springer, 2003.
  • [8] I.-K. Cho and D. M. Kreps, “Signaling Games and Stable Equilibria,” The Quarterly Journal of Economics, vol. 102, no. 2, pp. 179–221, 1987.
  • [9] J. Mertens and S. Zamir, “Formulation of Bayesian analysis for games with incomplete information,” International Journal of Game Theory, vol. 14, pp. 1–29, 1985.
  • [10] E. Dekel and M. Siniscalchi, “Epistemic game theory,” in Handbook of Game Theory.   Elsevier, 2015, ch. 12, pp. 619–702.
  • [11] M. S. Lund, B. Solhaug, and K. Stolen, Model-Driven Risk Analysis.   Springer, 2011.
  • [12] C. Phillips and L. P. Swiler, “A graph-based system for network-vulnerability analysis,” in Proc. 1998 Workshop on New Security Paradigms, 1998, p. 71–79.
  • [13] B. Schneier, “Attack trees,” Dr. Dobb’s Journal, vol. 24, no. 12, pp. 21–29, 1999.
  • [14] S. Bistarelli, F. Fioravanti, and P. Peretti, “Defense trees for economic evaluation of security investments,” in Proc. First International Conference on Availability, Reliability and Security, 2006.
  • [15] A. Roy, D. S. Kim, and K. S. Trivedi, “Attack countermeasure trees (ACT): towards unifying the constructs of attack and defense trees,” Security and Communication Networks, vol. 5, no. 8, pp. 929–943, 2012.
  • [16] E. Miehling, M. Rasouli, and D. Teneketzis, “A POMDP approach to the dynamic defense of large-scale cyber networks,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 10, pp. 2490–2505, 2018.
  • [17] S. Chockalingam, W. Pieters, A. Teixeira, and P. Gelder, “Bayesian network models in cyber security: A systematic review,” in Secure IT Systems, ser. Lecture Notes in Computer Science.   Springer, 2017.
  • [18] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, “Bayesian event classification for intrusion detection,” in Proc. 19th Annual Computer Security Applications Conference, 2003, pp. 14–23.
  • [19] W. Alhakami, A. ALharbi, S. Bourouis, R. Alroobaea, and N. Bouguila, “Network anomaly intrusion detection using a nonparametric Bayesian approach and feature selection,” IEEE Access, vol. 7, pp. 52 181–52 190, 2019.
  • [20] S. A. Zonouz, H. Khurana, W. H. Sanders, and T. M. Yardley, “RRE: A game-theoretic intrusion response and recovery engine,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 2, pp. 395–406, 2014.
  • [21] N. Poolsappasit, R. Dewri, and I. Ray, “Dynamic security risk management using Bayesian attack graphs,” IEEE Trans. Dependable Secure Comput., vol. 9, no. 1, pp. 61–74, 2012.
  • [22] H. Sandberg, S. Amin, and K. H. Johansson, “Cyberphysical security in networked control systems: An introduction to the issue,” IEEE Control Systems Magazine, vol. 35, no. 1, pp. 20–23, 2015.
  • [23] S. M. Dibaji, M. Pirani, D. B. Flamholz, A. M. Annaswamy, K. H. Johansson, and A. Chakrabortty, “A systems and control perspective of CPS security,” Annual Reviews in Control, vol. 47, pp. 394–411, 2019.
  • [24] A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson, “Revealing stealthy attacks in control systems,” in Proc. 50th Annual Allerton Conference on Communication, Control, and Computing, 2012, pp. 1806–1813.
  • [25] J. Milošević, A. Teixeira, T. Tanaka, K. H. Johansson, and H. Sandberg, “Security measure allocation for industrial control systems: Exploiting systematic search techniques and submodularity,” International Journal of Robust and Nonlinear Control, vol. 30, no. 11, pp. 4278–4302, 2020.
  • [26] J. Giraldo et al., “A survey of physics-based attack detection in cyber-physical systems,” ACM Comput. Surv., vol. 51, no. 4, 2018.
  • [27] M. Tambe, Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned.   Cambridge University Press, 2012.
  • [28] T. Alpcan and T. Başar, Network Security: A Decision and Game-Theoretic Approach.   Cambridge University Press, 2010.
  • [29] J. Pawlick, E. Colbert, and Q. Zhu, “A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy,” ACM Computing Surveys, vol. 52, no. 4, 2019.
  • [30] M. O. Sayin and T. Başar, “Deception-as-defense framework for cyber-physical systems,” in Safety, Security and Privacy for Cyber-Physical Systems, R. M. Ferrari and A. M. H. Teixeira, Eds.   Springer International Publishing, 2021, pp. 287–317.
  • [31] H. Sasahara and H. Sandberg, “Epistemic signaling games for cyber deception with asymmetric recognition,” IEEE Contr. Syst. Lett., vol. 6, pp. 854–859, 2022.
  • [32] J. Pawlick and Q. Zhu, Game Theory for Cyber Deception: From Theory to Applications, ser. Static & Dynamic Game Theory: Foundations & Applications.   Springer, 2021.
  • [33] J. Pawlick, E. Colbert, and Q. Zhu, “Modeling and analysis of leaky deception using signaling games with evidence,” IEEE Trans. Inf. Forensics Security, vol. 14, no. 7, pp. 1871–1886, July 2019.
  • [34] Q. Zhu and Z. Xu, “Secure estimation of CPS with a digital twin,” in Cross-Layer Design for Secure and Resilient Cyber-Physical Systems.   Springer, 2020, pp. 115–138.
  • [35] P. Diaconis and D. Freedman, “On the consistency of Bayes estimation,” The Annals of Statistics, vol. 14, no. 1, pp. 1–26, 1986.
  • [36] S. Walker, “New approaches to Bayesian consistency,” The Annals of Statistics, vol. 32, no. 5, pp. 2028–2043, 2004.
  • [37] P. Eichelsbacher and A. Ganesh, “Bayesian inference for Markov chains,” Journal of Applied Probability, vol. 39, no. 1, pp. 91–99, 2002.
  • [38] H. Sasahara, S. Sarıtaş, and H. Sandberg, “Asymptotic security of control systems by covert reaction: Repeated signaling game with undisclosed belief,” in Proc. 59th IEEE Conference on Decision and Control, 2020.
  • [39] H. Sasahara and H. Sandberg, “Asymptotic security by model-based incident handlers for Markov decision processes,” in Proc. 60th IEEE Conference on Decision and Control, 2021.
  • [40] R. Durrett, Probability: Theory and Examples, ser. Cambridge Series in Statistical and Probabilistic Mathematics.   Cambridge University Press, 2019.
  • [41] A. Rasekh, A. Hassanzadeh, S. Mulchandani, S. Modi, and M. K. Banks, “Smart water networks and cyber security,” Journal of Water Resources Planning and Management, vol. 142, no. 7, 2016.
  • [42] E. Creaco, A. Campisano, N. Fontana, G. Marini, P. R. Page, and T. Walski, “Real time control of water distribution networks: A state-of-the-art review,” Water Research, vol. 161, pp. 517–530, 2019.
  • [43] R. Taormina, S. Galelli, N. O. Tippenhauer, E. Salomons, and A. Ostfeld, “Characterizing cyber-physical attacks on water distribution systems,” Journal of Water Resources Planning and Management, vol. 143, no. 5, 2017.
  • [44] P. Chen, L. Desmet, and C. Huygens, “A study on advanced persistent threats,” in Proc. International Conference on Communications and Multimedia Security, 2014, pp. 63–72.
  • [45] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming.   John Wiley & Sons, Inc., 1994.
  • [46] S. Zamir, “Bayesian games: Games with incomplete information,” in Encyclopedia of Complexity and Systems Science.   Springer, 2009, pp. 426–441.
  • [47] A. Heifetz, “Commitment,” in Game Theory: Interactive Strategies in Economics and Management.   Cambridge University Press, 2012, ch. 20.
  • [48] J. Letchford, D. Korzhyk, and V. Conitzer, “On the value of commitment,” Autonomous Agents and Multi-Agent Systems, vol. 28, no. 6, pp. 986–1016, 2014.
  • [49] H. S. Chang and S. I. Marcus, “Two-person zero-sum Markov games: Receding horizon approach,” IEEE Trans. Autom. Control, vol. 48, no. 11, pp. 1951–1961, 2003.
  • [50] E. Çinlar, Probability and Statistics, ser. Graduate Texts in Mathematics.   Springer, 2011.
{IEEEbiography}

Hampei Sasahara (M’15) received the Ph.D. degree in engineering from Tokyo Institute of Technology in 2019. He is currently an Assistant Professor with Tokyo Institute of Technology, Tokyo, Japan. From 2019 to 2021, he was a Postdoctoral Scholar with KTH Royal Institute of Technology, Stockholm, Sweden. His main interests include secure control system design and control of large-scale systems.

{IEEEbiography}

Henrik Sandberg (F’23) received the M.Sc. degree in engineering physics and the Ph.D. degree in automatic control from Lund University, Lund, Sweden, in 1999 and 2004, respectively. He is a Professor with the Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden. From 2005 to 2007, he was a Postdoctoral Scholar with the California Institute of Technology, Pasadena, CA, USA. In 2013, he was a Visiting Scholar with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, USA. He has also held visiting appointments with the Australian National University, Canberra, ACT, Australia, and the University of Melbourne, Parkville, VIC, Australia. His current research interests include security of cyberphysical systems, power systems, model reduction, and fundamental limitations in control. Dr. Sandberg received the Best Student Paper Award from the IEEE Conference on Decision and Control in 2004, an Ingvar Carlsson Award from the Swedish Foundation for Strategic Research in 2007, and a Consolidator Grant from the Swedish Research Council in 2016. He has served on the editorial boards of IEEE Transactions on Automatic Control and the IFAC Journal Automatica.