
Nonzero-sum Discrete-time Stochastic Games with Risk-sensitive Ergodic Cost Criterion

Bivakar Bose, Chandan Pal, Somnath Pradhan and Subhamay Saha

Bivakar Bose: Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati-781039, India, email: b.bivakar@iitg.ernet.in
Chandan Pal: Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati-781039, India, email: cpal@iitg.ernet.in
Somnath Pradhan: Department of Mathematics, Indian Institute of Science Education and Research Bhopal, Bhopal-462066, India, email: somnath@iiserb.ac.in
Subhamay Saha: Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati-781039, India, email: saha.subhamay@iitg.ac.in
Abstract

In this paper we study infinite horizon nonzero-sum stochastic games for controlled discrete-time Markov chains on a Polish state space with the risk-sensitive ergodic cost criterion. Under suitable assumptions we show that the associated ergodic optimality equations admit unique solutions. Finally, the existence of a Nash equilibrium in randomized stationary strategies is established by showing that an appropriate set-valued map has a fixed point.

Key Words : Nonzero-sum game, risk-sensitive ergodic cost criterion, optimality equations, Nash equilibrium, set-valued map

Mathematics Subject Classification: 91A15, 91A50

1 Introduction

We consider infinite horizon ergodic-cost risk-sensitive two-person nonzero-sum stochastic games for discrete time Markov decision processes (MDPs) on a general state space with bounded cost function. We first establish the existence of unique solutions of the corresponding optimality equations. This is used to establish the existence of optimal randomized stationary strategies for a player fixing the strategy of the other player. We then define a suitable topology on the space of randomized stationary strategies. Then under certain separability assumptions we show that a suitably defined set-valued map has a fixed point. This gives the existence of a Nash equilibrium for the nonzero-sum game under consideration. The main step towards this end is to establish the upper semi-continuity of the defined set-valued map.

In a stochastic control problem, since the cost payable depends on the random evolution of the underlying stochastic process, the cost itself is also random. The most basic approach is therefore to minimize the expected cost; this is the risk-neutral approach. Its obvious shortcoming is that it does not take the risk factor into account. In the risk-sensitive criterion, the expected value of the exponential of the cost is considered. Generally the risk associated with a random quantity is quantified by the standard deviation. Hence the risk-sensitive criterion provides significantly better protection from risk, since it also captures the effects of the higher-order moments of the cost; see [24] for more details. The analysis of risk-sensitive control is technically more involved because of the exponential nature of the cost: accumulated costs are multiplicative over time, as opposed to additive in the risk-neutral case. Risk-sensitive control problems also have deep connections to robust control theory, the part of the control theory literature which deals with model uncertainty; see [24] and the references therein. The risk-sensitive criterion also arises naturally in portfolio optimization problems in mathematical finance, see [6] and associated references. Nonzero-sum stochastic games arise naturally in strategic multi-agent decision-making systems such as socioeconomic systems [16], network security [1], routing and scheduling [19], and so on.

Following the seminal work of Howard and Matheson [15] there has been a lot of work on risk-sensitive control problems. For an up-to-date survey on ergodic risk-sensitive control problems where the underlying stochastic process is a discrete-time Markov chain, see [7]. In the multi-controller setup, discrete-time risk-sensitive zero-sum games on a countable state space have been studied in [2, 11]. In [4] the authors consider discrete-time zero-sum risk-sensitive stochastic games on a Borel state space for both the discounted and the ergodic cost criterion. Their analysis involves transforming the original risk-sensitive problem into a risk-neutral problem. Nonzero-sum risk-sensitive stochastic games with both discounted and ergodic cost criteria for countable state space and bounded cost have been studied in [3, 22]. In [23] discrete-time nonzero-sum stochastic games under the risk-sensitive ergodic cost criterion with countable state space and unbounded cost function have been investigated. To the best of our knowledge, this is the first paper which considers risk-sensitive nonzero-sum games on a general state space. The analysis of nonzero-sum games on a general state space is significantly more involved than in the case of a countable state space. One of the reasons is that for a countable state space the topology on the space of randomized stationary controls is fairly simple to work with; for a general state space this becomes a substantial challenge. Already in the risk-neutral setup the analysis becomes substantially involved and requires additional assumptions. In the risk-neutral setup, discrete-time nonzero-sum ergodic cost games with general state space have been studied in [10, 17, 21]. In [10], the authors prove the existence of a Nash equilibrium under an additive reward additive transition (ARAT) assumption. In [17], the authors establish the existence of an \epsilon-Nash equilibrium. In [21], the authors assume that the transition law is a combination of finitely many probability measures. In this work we also assume the ARAT condition.

In this paper we first consider the ergodic optimality equations which correspond to control problems with the strategy of one player fixed. We show the existence of unique solutions to these equations following a span contraction approach. For this analysis we make a certain ergodicity assumption along with a few extra assumptions on the state transition kernel. Analogous assumptions appear in [8], where the authors consider risk-sensitive control problems with the ergodic cost criterion. In order to establish the existence of a Nash equilibrium for the nonzero-sum game we first define an appropriate set-valued map and then use Fan’s fixed point theorem [9]. This involves showing that the defined set-valued map is upper semi-continuous under an appropriate topology.

The rest of the paper is organized as follows. Section 2 introduces the game model, some preliminaries and notations. In Section 3 we show the existence of unique solutions to the optimality equations using a certain span-norm contraction. The existence of a Nash equilibrium is shown in Section 4.

2 The Game Model

In this section, we present the discrete-time nonzero-sum stochastic game model and introduce the notations utilized throughout the paper. The following elements are needed to describe the discrete-time nonzero-sum stochastic game:

{X,A,B,P,c1,c2},\displaystyle\{X,A,B,P,c_{1},c_{2}\},

where each component is described below.

  • X is the state space, assumed to be a Polish space endowed with the Borel \sigma-algebra \mathcal{X}.

  • A and B are the action spaces of players 1 and 2 respectively, assumed to be compact metric spaces. Let \mathcal{A} and \mathcal{B} denote the Borel \sigma-algebras on A and B respectively.

  • P is the transition kernel from X\times A\times B to \mathcal{P}(X), where for any metric space D, \mathcal{P}(D) denotes the space of all probability measures on D endowed with the topology of weak convergence.

  • c_{i}:X\times A\times B\to\mathbb{R}, i=1,2, is the one-stage cost function of player i, assumed to be bounded and continuous in (a,b)\in A\times B. Since the cost is bounded, without loss of generality we assume 0\leq c_{i}\leq\bar{c}.

At each stage (time) the players observe the current state x\in X of the system, and then players 1 and 2 independently choose actions a\in A and b\in B respectively. As a consequence two things happen:

  • (i)

    player i,i=1,2i,i=1,2, pays an immediate cost ci(x,a,b).c_{i}(x,a,b).

  • (ii)

    the system moves to a new state yy with the distribution P(|x,a,b).P(\cdot|x,a,b).

The whole process then repeats from the new state yy. The cost accumulates throughout the course of the game. The planning horizon is taken to be infinite and each player wants to minimize his/her infinite-horizon risk-sensitive cost with respect to some cost criterion, which is in our case defined by (2.1) below.

At each stage the players choose their actions independently on the basis of the available information. The information available for decision making at time t0:={0,1,2,}t\in\mathbb{N}_{0}:=\{0,1,2,\ldots\} is given by the history of the process up to that time

ht:=(x0,a0,b0,x1,a1,b1,,at1,bt1,xt)Ht,h_{t}:=\left(x_{0},a_{0},b_{0},x_{1},a_{1},b_{1},\ldots,a_{t-1},b_{t-1},x_{t}\right)\in H_{t},

where H0=X,Ht=Ht1×(A×B×X),,H=(X×A×B)H_{0}=X,H_{t}=H_{t-1}\times(A\times B\times X),\ldots,H_{\infty}=(X\times A\times B)^{\infty} are the history spaces. The history spaces are endowed with the corresponding Borel σ\sigma-algebra. A strategy for player 1 is a sequence π1={πt1}t0\pi^{1}=\left\{\pi_{t}^{1}\right\}_{t\in\mathbb{N}_{0}} of stochastic kernels πt1:Ht𝒫(A)\pi_{t}^{1}:H_{t}\rightarrow\mathcal{P}(A). The set of all strategies for player 1 is denoted by Π1\Pi_{1}. A strategy π1Π1\pi^{1}\in\Pi_{1} is called a Markov strategy if

πt1(ht1,a,b,x)()=πt1(ht1,a,b,x)()\pi_{t}^{1}\left(h_{t-1},a,b,x\right)(\cdot)=\pi_{t}^{1}\left(h_{t-1}^{\prime},a^{\prime},b^{\prime},x\right)(\cdot)

for all ht1,ht1Ht1,a,aA,b,bB,xX,t0h_{t-1},h_{t-1}^{\prime}\in H_{t-1},a,a^{\prime}\in A,b,b^{\prime}\in B,x\in X,t\in\mathbb{N}_{0}. Thus a Markov strategy for player 1 can be identified with a sequence of measurable maps {Φt1},Φt1:X𝒫(A)\left\{\varPhi_{t}^{1}\right\},\varPhi_{t}^{1}:X\rightarrow\mathcal{P}(A). A Markov strategy {Φt1}\left\{\Phi_{t}^{1}\right\} is called a stationary strategy if Φt1=Φ:X𝒫(A)\varPhi_{t}^{1}=\Phi:X\rightarrow\mathcal{P}(A) for all tt. Let M1M_{1} and S1S_{1} denote the set of Markov and stationary strategies for player 1 , respectively. The strategies for player 2 are defined similarly. Let Π2,M2,\Pi_{2},M_{2}, and S2S_{2} denote the set of arbitrary, Markov and stationary strategies for player 2, respectively.

Given an initial distribution π~0𝒫(X)\tilde{\pi}_{0}\in\mathcal{P}(X) and a pair of strategies (π1,π2)Π1×Π2(\pi^{1},\pi^{2})\in\Pi_{1}\times\Pi_{2}, the corresponding state and action processes {Xt},{At},{Bt}\left\{X_{t}\right\},\left\{A_{t}\right\},\left\{B_{t}\right\} are stochastic processes defined on the canonical space (H,𝔅(H),Pπ~0π1,π2)(H_{\infty},\mathfrak{B}(H_{\infty}),P_{\tilde{\pi}_{0}}^{\pi^{1},\pi^{2}})(where 𝔅(H)\mathfrak{B}(H_{\infty}) is the Borel σ\sigma-field on HH_{\infty}) via the projections Xt(h)=xt,At(h)=at,Bt(h)=btX_{t}\left(h_{\infty}\right)=x_{t},A_{t}\left(h_{\infty}\right)=a_{t},B_{t}\left(h_{\infty}\right)=b_{t}, where Pπ~0π1,π2P_{\tilde{\pi}_{0}}^{\pi^{1},\pi^{2}} is uniquely determined by π1,π2\pi^{1},\pi^{2} and π~0\tilde{\pi}_{0} by Ionescu Tulcea’s theorem [5, Proposition 7.28]. When π~0=δx0\tilde{\pi}_{0}=\delta_{x_{0}} (the dirac measure at x0x_{0}),x0X~{}x_{0}\in X, we simply write this probability measure as Px0π1,π2P_{x_{0}}^{\pi_{1},\pi_{2}}. For htHt,aA,bBh_{t}\in H_{t},a\in A,b\in B, we have

Px0π1,π2(X0=x0)=1,\displaystyle P_{x_{0}}^{\pi^{1},\pi^{2}}(X_{0}=x_{0})=1,
Px0π1,π2(Xt+1E|ht,At=a,Bt=b)=P(E|xt,a,b)E𝒳,\displaystyle P_{x_{0}}^{\pi^{1},\pi^{2}}(X_{t+1}\in E|h_{t},A_{t}=a,B_{t}=b)=P(E|x_{t},a,b)~{}\forall E\in\mathcal{X},
Px0π1,π2(AtF,BtG|ht)=πt1(ht)(F)πt2(ht)(G)F𝒜,G.\displaystyle P_{x_{0}}^{\pi^{1},\pi^{2}}(A_{t}\in F,B_{t}\in G|h_{t})=\pi_{t}^{1}(h_{t})(F)\pi_{t}^{2}(h_{t})(G)~{}\forall F\in\mathcal{A},~{}\forall G\in\mathcal{B}.

Let E_{\tilde{\pi}_{0}}^{\pi^{1},\pi^{2}} (resp. E_{x_{0}}^{\pi^{1},\pi^{2}}) denote the expectation operator with respect to the probability measure P_{\tilde{\pi}_{0}}^{\pi^{1},\pi^{2}} (resp. P_{x_{0}}^{\pi^{1},\pi^{2}}). Moreover, from [13] we know that under any pair (\Phi,\Psi)\in S_{1}\times S_{2} of stationary strategies, the corresponding state process \{X_{t}\} is a Markov process.

Ergodic cost criterion: We now define the risk-sensitive ergodic cost criterion for nonzero-sum discrete-time game. Let (Xt,At,Bt)(X_{t},A_{t},B_{t}) be the corresponding process with X0=xXX_{0}=x\in X and θ>0\theta>0 be the risk-sensitive parameter. For a pair of strategies (π1,π2)(Π1×Π2)(\pi^{1},\pi^{2})\in(\Pi_{1}\times\Pi_{2}), the risk-sensitive ergodic cost criterion for player i=1,2i=1,2 is given by

J^{\pi^{1},\pi^{2}}_{i}(x)=\limsup_{n\to\infty}\frac{1}{n}\ln E^{\pi^{1},\pi^{2}}_{x}\bigg[e^{\theta\sum_{t=0}^{n-1}c_{i}(x_{t},a_{t},b_{t})}\bigg] \qquad (2.1)

Since the risk-sensitive parameter remains the same throughout, we assume without loss of generality that θ=1\theta=1. Note that, Jiπ1,π2J^{\pi^{1},\pi^{2}}_{i} for i=1,2i=1,2 are bounded as our cost functions are bounded.
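For a concrete feel for (2.1), the following minimal Python sketch estimates the risk-sensitive ergodic cost of player 1 under fixed randomized stationary strategies on a small hypothetical finite model (three states, two actions per player; none of this data comes from the paper). It simulates the chain and combines the exponentiated accumulated costs via a log-sum-exp to avoid overflow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (illustrative only): 3 states, 2 actions per player.
nX, nA, nB = 3, 2, 2
P = rng.dirichlet(np.ones(nX), size=(nX, nA, nB))   # P[x, a, b] is a distribution over next states
c1 = rng.uniform(0.0, 1.0, size=(nX, nA, nB))       # bounded one-stage cost of player 1

# Fixed randomized stationary strategies Phi(x)(da) and Psi(x)(db).
Phi = np.full((nX, nA), 1.0 / nA)
Psi = np.full((nX, nB), 1.0 / nB)

def estimate_J1(x0, n=500, n_samples=2000, theta=1.0):
    """Monte Carlo estimate of (1/n) ln E_x[ exp(theta * sum_{t<n} c1(X_t, A_t, B_t)) ]."""
    log_terms = np.empty(n_samples)
    for k in range(n_samples):
        x, S = x0, 0.0
        for _ in range(n):
            a = rng.choice(nA, p=Phi[x])
            b = rng.choice(nB, p=Psi[x])
            S += theta * c1[x, a, b]
            x = rng.choice(nX, p=P[x, a, b])
        log_terms[k] = S
    # (1/n) * ln( mean_k exp(S_k) ), computed stably via log-sum-exp.
    return (np.logaddexp.reduce(log_terms) - np.log(n_samples)) / n

print(estimate_J1(x0=0))
```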

Nash equilibrium: A pair of strategies (\pi^{*1},\pi^{*2})\in\Pi_{1}\times\Pi_{2} is called a Nash equilibrium (for the ergodic cost criterion) if

J1π1,π2(x)J1π1,π2(x)for all π1Π1andxXJ^{\pi^{*1},\pi^{*2}}_{1}(x)\leq J^{\pi^{1},\pi^{*2}}_{1}(x)~{}\text{for all }\pi^{1}\in\Pi_{1}~{}\text{and}~{}x\in X

and

J^{\pi^{*1},\pi^{*2}}_{2}(x)\leq J^{\pi^{*1},\pi^{2}}_{2}(x)~\text{for all }\pi^{2}\in\Pi_{2}~\text{and}~x\in X.

Our primary goal is to establish the existence of a Nash equilibrium in stationary strategies.

For i=1,2i=1,2, let us define transition measures P~i\tilde{P}_{i} from X×A×B𝒫(X)X\times A\times B\to\mathcal{P}(X) by

P~i(dy|x,a,b)=eci(x,a,b)P(dy|x,a,b).\tilde{P}_{i}(dy|x,a,b)=e^{c_{i}(x,a,b)}P(dy|x,a,b). (2.2)

Moreover, define for i=1,2,φ𝒫(A)i=1,2,~{}\varphi\in\mathcal{P}(A) and ψ𝒫(B)\psi\in\mathcal{P}(B)

P(dy|x,φ,ψ):=BAP(dy|x,a,b)φ(da)ψ(db)P(dy|x,\varphi,\psi):=\int_{B}\int_{A}P(dy|x,a,b)\varphi(da)\psi(db)

and

P~i(dy|x,φ,ψ):=BAP~i(dy|x,a,b)φ(da)ψ(db).\tilde{P}_{i}(dy|x,\varphi,\psi):=\int_{B}\int_{A}\tilde{P}_{i}(dy|x,a,b)\varphi(da)\psi(db).

Obviously \tilde{P}_{i} for i=1,2 is in general not a probability measure. The corresponding normalizing constant, for x\in X, \varphi\in\mathcal{P}(A) and \psi\in\mathcal{P}(B), is given by

c~i(x,φ,ψ)=XP~i(dy|x,φ,ψ)=BAeci(x,a,b)φ(da)ψ(db).\tilde{c}_{i}(x,\varphi,\psi)=\int_{X}\tilde{P}_{i}(dy|x,\varphi,\psi)=\int_{B}\int_{A}e^{c_{i}(x,a,b)}\varphi(da)\psi(db). (2.3)

Since 0cic¯0\leq c_{i}\leq\bar{c}, the function c~i\tilde{c}_{i} is also bounded. More precisely 1c~i(x,φ,ψ)ec¯1\leq\tilde{c}_{i}(x,\varphi,\psi)\leq e^{\bar{c}} for i=1,2i=1,2 and for each xX,φ𝒫(A)x\in X,\varphi\in\mathcal{P}(A) and ψ𝒫(B)\psi\in\mathcal{P}(B). Thus for i=1,2i=1,2

P^i(|x,φ,ψ):=P~i(|x,φ,ψ)c~i(x,φ,ψ)\hat{P}_{i}(\cdot|x,\varphi,\psi):=\frac{\tilde{P}_{i}(\cdot|x,\varphi,\psi)}{\tilde{c}_{i}(x,\varphi,\psi)} (2.4)

defines a probability transition kernel and we also use the notation c^i(x,φ,ψ)\hat{c}_{i}(x,\varphi,\psi):= lnc~i(x,φ,ψ)\ln\tilde{c}_{i}(x,\varphi,\psi) for i=1,2i=1,2.

We will use the above transformations to convert our optimality equations (3.5) and (3.9) into equations of a well-studied additive form. This is beneficial because it allows us to prove the existence of unique solutions to the optimality equations, as we will see in the next section.
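To make the transformation explicit, note that by (2.2)-(2.4), for any v\in\mathbb{B}(X), x\in X, \varphi\in\mathcal{P}(A) and \psi\in\mathcal{P}(B),

\int_{B}\int_{A}e^{c_{2}(x,a,b)}\int_{X}e^{v(y)}P(dy|x,a,b)\varphi(da)\psi(db)=\int_{X}e^{v(y)}\tilde{P}_{2}(dy|x,\varphi,\psi)=\tilde{c}_{2}(x,\varphi,\psi)\int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\varphi,\psi),

so that taking logarithms turns a multiplicative relation of the form e^{v(x)+\rho}=\inf_{\psi}\{\cdots\} into the additive form v(x)+\rho=\inf_{\psi}\big[\hat{c}_{2}(x,\varphi,\psi)+\ln\int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\varphi,\psi)\big]; the same computation applies to \tilde{P}_{1}, \tilde{c}_{1} and \hat{P}_{1}.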

Let \mathbb{B}(X) denote the space of all real-valued bounded measurable functions on X, endowed with the supremum norm \|\cdot\|. For a fixed \varphi\in\mathcal{P}(A) and v\in\mathbb{B}(X) define the operator:

Tv(x)=\inf_{\psi\in\mathcal{P}(B)}\bigg[\hat{c}_{2}(x,\varphi,\psi)+\ln\int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\varphi,\psi)\bigg]. \qquad (2.5)

Due to the dual representation of the exponential certainty equivalent [12, Lemma 3.3] it is possible to write (2.5) as

Tv(x)=\inf_{\psi}\sup_{\mu}\bigg[\hat{c}_{2}(x,\varphi,\psi)+\int_{X}v(y)\mu(dy)-I(\mu,\hat{P}_{2}(\cdot|x,\varphi,\psi))\bigg], \qquad (2.6)

where the supremum is over all probability measures μ𝒫(X)\mu\in\mathcal{P}(X) and I(p,q)I(p,q) is the relative entropy of the two probability measures p,qp,q which is defined by

I(p,q):=Xlndpdqp(dx)I(p,q):=\int_{X}ln\frac{dp}{dq}p(dx)

when p<<qp<<q and ++\infty otherwise. Note that the supremum in (2.6) is attained at the probability measure given by

μxψvφ(E):=Eev(y)P^2(dy|x,φ,ψ)Xev(y)P^2(dy|x,φ,ψ)\mu^{\varphi}_{x\psi v}(E):=\frac{\int_{E}e^{v(y)}\hat{P}_{2}(dy|x,\varphi,\psi)}{\int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\varphi,\psi)} (2.7)

for measurable set EXE\subset X. Obviously μxψvφ\mu^{\varphi}_{x\psi v} can be interpreted as a transition kernel.

Let v𝔹(X)v\in\mathbb{B}(X). The span semi-norm of vv is defined as:

vsp=supxXv(x)infxXv(x).\|v\|_{sp}=\sup_{x\in X}v(x)-\inf_{x\in X}v(x). (2.8)

In the last part of this section we make a couple of assumptions that will be in force throughout the rest of the paper. First we consider the following continuity assumption, which is quite standard in the literature; see [4, 8, 10, 18] for instance. It will allow us to show that the map T defined in (2.5) is a contraction in the span semi-norm.

Assumption 2.1.

For a fixed xXx\in X the transition measure PP is strongly continuous in (a,b)(a,b), i.e. for all bounded and measurable v:Xv:X\to\mathbb{R} we have that (a,b)Xv(y)P(dy|x,a,b)(a,b)\mapsto\int_{X}v(y)P(dy|x,a,b) is continuous in (a,b)(a,b).

It follows from (2.2) that \tilde{P}_{i}(dy|x,a,b) is also strongly continuous in (a,b) for i=1,2, which leads us to the following remark.

Remark 2.1.

Since for any metric space D the space \mathcal{P}(D) is endowed with the topology of weak convergence, it follows immediately from Assumption 2.1 that for all bounded and measurable functions v:X\to\mathbb{R} and each fixed x\in X, the map (\varphi,\psi)\mapsto\int_{X}v(y)\hat{P}_{i}(dy|x,\varphi,\psi), i=1,2, is continuous on \mathcal{P}(A)\times\mathcal{P}(B).

Next we have the following ergodicity assumption.

Assumption 2.2.

(i) There exists a real number 0<\delta<1 such that

supP(|x,φ,ψ)P(x,φ,ψ)TV2δ,\sup\left\|{P}(\cdot|x,\varphi,\psi)-{P}\left(\cdot\mid x^{\prime},\varphi^{\prime},\psi^{\prime}\right)\right\|_{\mathrm{TV}}\leq 2\delta,

where the supremum is over all x,xX,φ,φ𝒫(A),ψ,ψ𝒫(B)x,x^{\prime}\in X,\varphi,\varphi^{\prime}\in\mathcal{P}(A),\psi,\psi^{\prime}\in\mathcal{P}(B) and TV\|\cdot\|_{\mathrm{TV}} denotes the total variation norm.

(ii) For all x\in X, a\in A and b\in B, P(\mathcal{O}|x,a,b)>0 for every open set \mathcal{O}\subset X.
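On a finite toy model Assumption 2.2 can be verified directly. Since the total variation norm is jointly convex, the supremum over randomized actions in part (i) is attained at Dirac measures, so it suffices to compare the rows of the transition array. The Python sketch below (with purely hypothetical model data) computes the resulting constant \delta.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Hypothetical finite model (illustrative only): P[x, a, b] is a distribution over next states.
nX, nA, nB = 4, 2, 3
P = rng.dirichlet(5.0 * np.ones(nX), size=(nX, nA, nB))

def ergodicity_coefficient(P):
    """Smallest delta with sup ||P(.|x,a,b) - P(.|x',a',b')||_TV <= 2*delta,
    the supremum over randomized actions being attained at pure actions."""
    rows = P.reshape(-1, P.shape[-1])
    delta = 0.0
    for p, q in product(rows, rows):
        delta = max(delta, 0.5 * np.abs(p - q).sum())
    return delta

delta = ergodicity_coefficient(P)
print("delta =", delta, "-> Assumption 2.2(i) holds" if delta < 1 else "-> Assumption 2.2(i) fails")
```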

3 Solution to the optimality equations

In this section, we demonstrate that the operator TT defined in (2.5) is a contraction. The fixed point of TT corresponds to the solution of the optimality equation for player 1. In the latter part of this section, we define another operator UU corresponding to player 2 and establish results analogous to those obtained for TT.

Proposition 3.1.

Under Assumptions 2.1 and 2.2 the operator TT maps 𝔹(X)\mathbb{B}(X) to 𝔹(X)\mathbb{B}(X) and for each M>0M>0, there exists a positive constant α(M)<1\alpha(M)<1 such that for all v1,v2𝔹M(X)v_{1},v_{2}\in\mathbb{B}_{M}(X)

Tv1Tv2spα(M)v1v2sp,\left\|Tv_{1}-Tv_{2}\right\|_{sp}\leq\alpha(M)\left\|v_{1}-v_{2}\right\|_{sp},

where 𝔹M(X)={v𝔹(X):vspM}\mathbb{B}_{M}(X)=\{v\in\mathbb{B}(X):\|v\|_{sp}\leq M\}.

Proof.

From the definition of TT it follows that TT transforms 𝔹(X)\mathbb{B}(X) into itself and the infimum is attained. For given functions v1,v2𝔹(X)v_{1},v_{2}\in\mathbb{B}(X) and x1,x2Xx_{1},x_{2}\in X, let Ψ1,Ψ2S2\Psi_{1},\Psi_{2}\in S_{2} be such that for i=1,2i=1,2

Tv_{i}(x_{i})=\sup_{\mu}\big[\hat{c}_{2}(x_{i},\varphi,\Psi_{i}(x_{i}))+\int_{X}v_{i}(y)\mu(dy)-I(\mu,\hat{P}_{2}(\cdot|x_{i},\varphi,\Psi_{i}(x_{i})))\big].

Then we obtain that

(Tv1)(x2)(Tv2)(x2)((Tv1)(x1)(Tv2)(x1))\displaystyle(Tv_{1})(x_{2})-(Tv_{2})(x_{2})-((Tv_{1})(x_{1})-(Tv_{2})(x_{1}))
supμ{c^2(x2,φ,Ψ2(x2))+Xv1(y)μ(dy)I(μ,P2^(|x2,φ,Ψ2(x2)))}\displaystyle\leq\sup_{\mu}\left\{\hat{c}_{2}\left(x_{2},\varphi,\Psi_{2}(x_{2})\right)+\int_{X}v_{1}\left(y\right)\mu(dy)-I\left(\mu,\hat{P_{2}}\left(\cdot|x_{2},\varphi,\Psi_{2}(x_{2})\right)\right)\right\}
supμ{c^2(x2,φ,Ψ2(x2))+Xv2(y)μ(dy)I(μ,P2^(|x2,φ,Ψ2(x2)))}\displaystyle-\sup_{\mu}\left\{\hat{c}_{2}\left(x_{2},\varphi,\Psi_{2}(x_{2})\right)+\int_{X}v_{2}(y)\mu(dy)-I\left(\mu,\hat{P_{2}}\left(\cdot|x_{2},\varphi,\Psi_{2}(x_{2})\right)\right)\right\}
supμ{c^2(x1,φ,Ψ1(x1))+Xv1(y)μ(dy)I(μ,P2^(|x1,φ,Ψ1(x1)))}\displaystyle-\sup_{\mu}\left\{\hat{c}_{2}\left(x_{1},\varphi,\Psi_{1}(x_{1})\right)+\int_{X}v_{1}\left(y\right)\mu\left(dy\right)-I\left(\mu,\hat{P_{2}}\left(\cdot|x_{1},\varphi,\Psi_{1}(x_{1})\right)\right)\right\}
+supμ{c^2(x1,φ,Ψ1(x1))+Xv2(y)μ(dy)I(μ,P2^(|x1,φ,Ψ1(x1)))}\displaystyle+\sup_{\mu}\left\{\hat{c}_{2}\left(x_{1},\varphi,\Psi_{1}(x_{1})\right)+\int_{X}v_{2}\left(y\right)\mu\left(dy\right)-I\left(\mu,\hat{P_{2}}\left(\cdot|x_{1},\varphi,\Psi_{1}(x_{1})\right)\right)\right\}
c^2(x2,φ,Ψ2(x2))+Xv1(y)μx2Ψ2v1φ(dy)I(μx2Ψ2v1φ,P2^(|x2,φ,Ψ2(x2)))\displaystyle\leq\hat{c}_{2}\left(x_{2},\varphi,\Psi_{2}(x_{2})\right)+\int_{X}v_{1}\left(y\right)\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}(dy)-I\left(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}},\hat{P_{2}}\left(\cdot|x_{2},\varphi,\Psi_{2}(x_{2})\right)\right)
c^2(x2,φ,Ψ2(x2))Xv2(y)μx2Ψ2v1φ(dy)+I(μx2Ψ2v1φ,P2^(|x2,φ,Ψ2(x2)))\displaystyle-\hat{c}_{2}\left(x_{2},\varphi,\Psi_{2}(x_{2})\right)-\int_{X}v_{2}(y)\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}(dy)+I\left(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}},\hat{P_{2}}\left(\cdot|x_{2},\varphi,\Psi_{2}(x_{2})\right)\right)
c^2(x1,φ,Ψ1(x1))Xv1(y)μx1Ψ1v2φ(dy)+I(μx1Ψ1v2φ,P2^(|x1,φ,Ψ1(x1)))\displaystyle-\hat{c}_{2}\left(x_{1},\varphi,\Psi_{1}(x_{1})\right)-\int_{X}v_{1}\left(y\right)\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}}(dy)+I\left(\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}},\hat{P_{2}}\left(\cdot|x_{1},\varphi,\Psi_{1}(x_{1})\right)\right)
+c^2(x1,φ,Ψ1(x1))+Xv2(y)μx1Ψ1v2φ(dy)I(μx1Ψ1v2φ,P2^(|x1,φ,Ψ1(x1)))\displaystyle+\hat{c}_{2}\left(x_{1},\varphi,\Psi_{1}(x_{1})\right)+\int_{X}v_{2}\left(y\right)\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}}(dy)-I\left(\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}},\hat{P_{2}}\left(\cdot|x_{1},\varphi,\Psi_{1}(x_{1})\right)\right)
=Δ(v1(y)v2(y))(μx2Ψ2v1φμx1Ψ1v2φ)(dy)+Δc(v1(y)v2(y))(μx2Ψ2v1φμx1Ψ1v2φ)(dy)\displaystyle=\int_{\Delta}\left(v_{1}\left(y\right)-v_{2}\left(y\right)\right)\left(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}}\right)(dy)+\int_{\Delta^{c}}\left(v_{1}\left(y\right)-v_{2}\left(y\right)\right)\left(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}}\right)(dy)
supyX(v1(y)v2(y))(μx2Ψ2v1φμx1Ψ1v2φ)(Δ)+infyX(v1(y)v2(y))(μx2Ψ2v1φμx1Ψ1v2φ)(Δc)\displaystyle\leq\sup_{y\in X}\left(v_{1}\left(y\right)-v_{2}\left(y\right)\right)\left(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}}\right)(\Delta)+\inf_{y\in X}\left(v_{1}\left(y\right)-v_{2}\left(y\right)\right)\left(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}}\right)({\Delta}^{c})
=v1v2sp(μx2Ψ2v1φμx1Ψ1v2φ)(Δ),\displaystyle=\|v_{1}-v_{2}\|_{sp}(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}})(\Delta),

where the set Δ\Delta comes from the Hahn-Jordan decomposition of μx2Ψ2v1φμx1Ψ1v2φ\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}} and Δc\Delta^{c} denotes the complement of Δ\Delta. Now, taking supremum over x1,x2Xx_{1},x_{2}\in X in the above set-up we have

Tv1Tv2spv1v2spsupE𝒳supx1,x2XsupΨ1,Ψ2S2(μx2Ψ2v1φμx1Ψ1v2φ)(E).\left\|Tv_{1}-Tv_{2}\right\|_{sp}\leq\|v_{1}-v_{2}\|_{sp}\sup_{E\in\mathcal{X}}\sup_{x_{1},x_{2}\in X}\sup_{\Psi_{1},\Psi_{2}\in S_{2}}(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}})(E).

We claim that

supv1,v2;v1sp,v2spMsupE𝒳supx1,x2XsupΨ1,Ψ2S2(μx2Ψ2v1φμx1Ψ1v2φ)(E)=α(M)<1.\sup_{v_{1},v_{2};\|v_{1}\|_{sp},\|v_{2}\|_{sp}\leq M}\sup_{E\in\mathcal{X}}\sup_{x_{1},x_{2}\in X}\sup_{\Psi_{1},\Psi_{2}\in S_{2}}(\mu^{\varphi}_{x_{2}\Psi_{2}v_{1}}-\mu^{\varphi}_{x_{1}\Psi_{1}v_{2}})(E)=\alpha(M)<1. (3.1)

Suppose (3.1) does not hold. Then there exist sequences \{v_{1n}\},\{v_{2n}\} with \|v_{1n}\|_{sp}\leq M, \|v_{2n}\|_{sp}\leq M, sets \{E_{n}\}\subset\mathcal{X}, states \{x_{1n}\},\{x_{2n}\}\subset X and strategies \{\Psi_{1n}\},\{\Psi_{2n}\}\subset S_{2} such that

(μx2nΨ2nv1nφμx1nΨ1nv2nφ)(En)1asn.(\mu^{\varphi}_{x_{2n}\Psi_{2n}v_{1n}}-\mu^{\varphi}_{x_{1n}\Psi_{1n}v_{2n}})(E_{n})\rightarrow 1~{}~{}~{}\text{as}~{}~{}~{}n\to\infty.

As μx2nΨ2nv1nφ\mu^{\varphi}_{x_{2n}\Psi_{2n}v_{1n}} and μx1nΨ1nv2nφ\mu^{\varphi}_{x_{1n}\Psi_{1n}v_{2n}} are probability measures therefore

μx2nΨ2nv1nφ(En)1asn,\mu^{\varphi}_{x_{2n}\Psi_{2n}v_{1n}}(E_{n})\to 1~{}~{}~{}\text{as}~{}~{}~{}n\to\infty,

and

μx1nΨ1nv2nφ(En)0asn.\mu^{\varphi}_{x_{1n}\Psi_{1n}v_{2n}}(E_{n})\to 0~{}~{}~{}\text{as}~{}~{}~{}n\to\infty.

Since for each xX,ψ𝒫(B)x\in X,\psi\in\mathcal{P}(B) and v𝔹(X)v\in\mathbb{B}(X) from (2.7) we get

evspP2^(E|x,φ,ψ)μxψvφ(E),e^{-\|v\|_{sp}}\hat{P_{2}}\left(E|x,\varphi,\psi\right)\leq\mu^{\varphi}_{x\psi v}(E),

we have

P2^(Enc|x2n,φ,Ψ2n(x2n))0asn,\hat{P_{2}}\left(E_{n}^{c}|x_{2n},\varphi,\Psi_{2n}(x_{2n})\right)\to 0~{}~{}~{}\text{as}~{}~{}~{}n\to\infty,

and

P2^(En|x1n,φ,Ψ1n(x1n))0asn.\hat{P_{2}}\left(E_{n}|x_{1n},\varphi,\Psi_{1n}(x_{1n})\right)\to 0~{}~{}~{}\text{as}~{}~{}~{}n\to\infty.

Consequently using (2.4) direct calculations imply

P(Enc|x2n,φ,Ψ2n(x2n))0asn,P\left(E_{n}^{c}|x_{2n},\varphi,\Psi_{2n}(x_{2n})\right)\to 0~{}~{}~{}\text{as}~{}~{}~{}n\to\infty,

and

P(En|x1n,φ,Ψ1n(x1n))0asn.P\left(E_{n}|x_{1n},\varphi,\Psi_{1n}(x_{1n})\right)\to 0~{}~{}~{}\text{as}~{}~{}~{}n\to\infty.

Hence

limn(P(En|x2n,φ,Ψ2n(x2n))P(En|x1n,φ,Ψ1n(x1n)))=1\lim_{n\to\infty}\bigg{(}P\left(E_{n}|x_{2n},\varphi,\Psi_{2n}(x_{2n})\right)-P\left(E_{n}|x_{1n},\varphi,\Psi_{1n}(x_{1n})\right)\bigg{)}=1 (3.2)

But from Assumption 2.2(i) we get, for all x,x^{\prime}\in X, \psi,\psi^{\prime}\in\mathcal{P}(B) and E\in\mathcal{X},

P(E|x,φ,ψ)P(E|x,φ,ψ)δ,P(E|x,\varphi,\psi)-P(E|x^{\prime},\varphi,\psi^{\prime})\leq\delta,

which contradicts (3.2). Hence (3.1) holds, and the proposition follows. ∎

We now make additional assumptions in order to show that T is a global contraction on \mathbb{B}_{L}(X) for a suitable constant L, specified in Lemma 3.1 below.

Assumption 3.1.

There exists λ𝒫(X)\lambda\in\mathcal{P}(X) such that P(|x,a,b)<<λP(\cdot|x,a,b)<<\lambda for all xX,aA,x\in X,a\in A, and bB.b\in B. Also let h:X×A×B×Xh:X\times A\times B\times X\to\mathbb{R} be the Radon-Nikodym derivative of P(|x,a,b)P(\cdot|x,a,b) with respect to λ\lambda.

Assumption 3.2.
supx,xXsupyXsupaAsupbBh(x,a,b,y)h(x,a,b,y)=κ<.\sup_{x,x^{\prime}\in X}\sup_{y\in X}\sup_{a\in A}\sup_{b\in B}\frac{h(x,a,b,y)}{h(x^{\prime},a,b,y)}=\kappa<\infty.
Lemma 3.1.

Under the Assumptions 2.1, 3.1 and 3.2 the operator TT transforms 𝔹(X)\mathbb{B}(X) into 𝔹L(X)\mathbb{B}_{L}(X), where L=lnκ+3c¯L=ln\kappa+3\bar{c}. Furthermore, TT is a global contraction in 𝔹L(X).\mathbb{B}_{L}(X).

Proof.

Notice that for a v𝔹(X)v\in\mathbb{B}(X) we have

Tv(x)Tv(x)supψ𝒫(B)[c^2(x,φ,ψ)c^2(x,φ,ψ)+lnXev(y)P^2(dy|x,φ,ψ)Xev(y)P^2(dy|x,φ,ψ)].Tv(x)-Tv(x^{\prime})\leq\sup_{\psi\in\mathcal{P}(B)}\Bigg{[}\hat{c}_{2}(x,\varphi,\psi)-\hat{c}_{2}(x^{\prime},\varphi,\psi)+ln\frac{\int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\varphi,\psi)}{\int_{X}e^{v(y)}\hat{P}_{2}(dy|x^{\prime},\varphi,\psi)}\Bigg{]}. (3.3)

Now we get

lnXev(y)P^2(dy|x,φ,ψ)Xev(y)P^2(dy|x,φ,ψ)\displaystyle ln\frac{\int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\varphi,\psi)}{\int_{X}e^{v(y)}\hat{P}_{2}(dy|x^{\prime},\varphi,\psi)}
=lnXev(y)BAec2(x,a,b)h(x,a,b,y)h(x,a,b,y)h(x,a,b,y)φ(da)ψ(db)λ(dy)Xev(y)BAec2(x,a,b)h(x,a,b,y)φ(da)ψ(db)λ(dy).c~2(x,φ,ψ)c~2(x,φ,ψ)\displaystyle=ln\frac{\int_{X}e^{v(y)}\int_{B}\int_{A}e^{c_{2}(x,a,b)}{\frac{h(x,a,b,y)}{h(x^{\prime},a,b,y)}}h(x^{\prime},a,b,y)\varphi(da)\psi(db)\lambda(dy)}{\int_{X}e^{v(y)}\int_{B}\int_{A}e^{c_{2}(x^{\prime},a,b)}h(x^{\prime},a,b,y)\varphi(da)\psi(db)\lambda(dy)}.\frac{\tilde{c}_{2}(x^{\prime},\varphi,\psi)}{\tilde{c}_{2}(x,\varphi,\psi)}
lnκ+lnc~2(x,φ,ψ)c~2(x,φ,ψ)+lnXev(y)BAec2(x,a,b)h(x,a,b,y)φ(da)ψ(db)λ(dy)Xev(y)BAec2(x,a,b)h(x,a,b,y)φ(da)ψ(db)λ(dy)\displaystyle\leq ln\kappa+ln\frac{\tilde{c}_{2}(x^{\prime},\varphi,\psi)}{\tilde{c}_{2}(x,\varphi,\psi)}+ln\frac{\int_{X}e^{v(y)}\int_{B}\int_{A}e^{c_{2}(x,a,b)}h(x^{\prime},a,b,y)\varphi(da)\psi(db)\lambda(dy)}{\int_{X}e^{v(y)}\int_{B}\int_{A}e^{c_{2}(x^{\prime},a,b)}h(x^{\prime},a,b,y)\varphi(da)\psi(db)\lambda(dy)}
lnκ+lnec¯+lnec¯Xev(y)BAh(x,a,b,y)φ(da)ψ(db)λ(dy)Xev(y)BAh(x,a,b,y)φ(da)ψ(db)λ(dy)\displaystyle\leq ln\kappa+lne^{\bar{c}}+ln\frac{e^{\bar{c}}\int_{X}e^{v(y)}\int_{B}\int_{A}h(x^{\prime},a,b,y)\varphi(da)\psi(db)\lambda(dy)}{\int_{X}e^{v(y)}\int_{B}\int_{A}h(x^{\prime},a,b,y)\varphi(da)\psi(db)\lambda(dy)}
=\displaystyle= lnκ+2c¯.\displaystyle ln\kappa+2\bar{c}.

In the above expression the first equality follows from Assumption 3.1, first inequality follows from Assumption 3.2 and the last inequality follows from (2.3) and the fact that 0c2c¯0\leq c_{2}\leq\bar{c}. So, from (3.3) we have

Tv(x)Tv(x)lnκ+3c¯.Tv(x)-Tv(x^{\prime})\leq ln\kappa+3\bar{c}. (3.4)

From (3.4) it follows that \|Tv\|_{sp}\leq L for every v\in\mathbb{B}(X), so T maps \mathbb{B}(X) into \mathbb{B}_{L}(X). Combining this with Proposition 3.1 (applied with M=L), T is a contraction in the span semi-norm on \mathbb{B}_{L}(X) with modulus \alpha(L)<1. ∎

Before proceeding to the main theorem of this section, we briefly outline some main points. Suppose player 2 announces that he/she is going to employ a strategy ΨS2\Psi\in S_{2}. In such a scenario, player 1 attempts to minimize

J^{\pi^{1},\Psi}_{1}(x)=\limsup_{n\to\infty}\frac{1}{n}\ln E^{\pi^{1},\Psi}_{x}\bigg[e^{\sum_{t=0}^{n-1}c_{1}(x_{t},a_{t},b_{t})}\bigg]

over \pi^{1}\in\Pi_{1}. Thus for player 1 this is a discrete-time Markov decision problem with risk-sensitive ergodic cost. Player 2 faces an analogous problem when player 1 announces that his strategy is \Phi\in S_{1}. This leads us to the following theorem.

Theorem 3.1.

Suppose Assumptions 2.1, 2.2, 3.1 and 3.2 are satisfied. Then for each \Phi\in S_{1}, there exists a unique solution pair (\rho_{2}^{*},v_{2}^{*})\in\mathbb{R}_{+}\times\mathbb{B}_{L}(X) with v_{2}^{*}(x_{0})=0 (where x_{0}\in X is some fixed reference state), satisfying

ev(x)+ρ=infψ𝒫(B)BAec2(x,a,b)Xev(y)P(dy|x,a,b)Φ(x)(da)ψ(db).e^{v(x)+\rho}={\inf_{\psi\in\mathcal{P}(B)}\int_{B}\int_{A}e^{c_{2}(x,a,b)}\int_{X}e^{v(y)}P(dy|x,a,b)\Phi(x)(da)\psi(db)}. (3.5)

In addition, a strategy \Psi^{*}\in S_{2} is an optimal strategy for player 2, given that player 1 chooses \Phi, if and only if \Psi^{*}(x) attains the point-wise minimum in (3.5). Moreover,

ρ2=infπ2Π2lim supn1nlnExΦ,π2[et=0n1c2(xt,at,bt)].\rho^{*}_{2}=\inf_{\pi^{2}\in\Pi_{2}}\limsup_{n\to\infty}\frac{1}{n}lnE^{\Phi,\pi^{2}}_{x}\bigg{[}e^{\sum_{t=0}^{n-1}c_{2}(x_{t},a_{t},b_{t})}\bigg{]}. (3.6)

(:=ρ2Φ=infπ2Π2J2Φ,π2)\bigg{(}:=\rho^{*\Phi}_{2}=\inf_{\pi^{2}\in\Pi_{2}}J^{\Phi,\pi^{2}}_{2}\bigg{)}

Proof.

Notice that (3.5) can be rewritten as

v(x)+ρ=infψ𝒫(B)[c^2(x,Φ(x),ψ)+lnXev(y)P^2(dy|x,Φ(x),ψ)].v(x)+\rho=\inf_{\psi\in\mathcal{P}(B)}\bigg{[}\hat{c}_{2}(x,\Phi(x),\psi)+ln\int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\Phi(x),\psi)\bigg{]}. (3.7)

By Lemma 3.1, T is a global contraction in the span semi-norm on \mathbb{B}_{L}(X), so it has a span fixed point \hat{v}_{2}\in\mathbb{B}_{L}(X), unique up to an additive constant. The function \hat{v}_{2} together with the constant \rho_{2}^{*}=T\hat{v}_{2}-\hat{v}_{2} solves (3.7), and consequently (3.5).

Let v2(x)=v^2(x)v^2(x0)v^{*}_{2}(x)=\hat{v}_{2}(x)-\hat{v}_{2}(x_{0}). Then v2(x0)=0v^{*}_{2}(x_{0})=0 and it can be easily seen that (ρ2,v2)(\rho_{2}^{*},v^{*}_{2}) satisfies (3.7). Since, v2(x)=v^2(x)v^2(x0)v^{*}_{2}(x)=\hat{v}_{2}(x)-\hat{v}_{2}(x_{0}) and v^2spL\|\hat{v}_{2}\|_{sp}\leq L, so v2spL\|v^{*}_{2}\|_{sp}\leq L as well.

Let (\rho^{\prime},v^{\prime}) be another solution of (3.5), i.e., it satisfies \rho^{\prime}+v^{\prime}(x)=Tv^{\prime}(x) with v^{\prime}(x_{0})=0. Then clearly v^{\prime} is also a span fixed point of T. Hence v^{*}_{2}(x)-v^{\prime}(x) is constant. Since v^{*}_{2}(x_{0})-v^{\prime}(x_{0})=0, it follows that v^{*}_{2}\equiv v^{\prime}, and then it easily follows that \rho^{*}_{2}=\rho^{\prime}.

The proof of the remaining part is analogous to the proof of [12, Theorem 2.1], which is carried out for a countable state space but extends easily to our general state space setting. ∎
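On a finite toy model the solution pair in Theorem 3.1 can be approximated numerically by relative value iteration with the span contraction T. The Python sketch below is purely illustrative (hypothetical finite state and action spaces, a fixed strategy \Phi of player 1 and randomly generated data); it iterates v \leftarrow Tv-(Tv)(x_{0}) and reads off \rho_{2}^{*} as (Tv)(x_{0}) at convergence. The infimum over \psi\in\mathcal{P}(B) is computed over pure actions only, which is justified because the exponential of the bracket in (3.7), cf. (3.5), is linear in \psi.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical finite model (illustrative only).
nX, nA, nB = 4, 2, 2
P   = rng.dirichlet(np.ones(nX), size=(nX, nA, nB))   # P[x, a, b] is a distribution over next states
c2  = rng.uniform(0.0, 1.0, size=(nX, nA, nB))        # player 2's one-stage cost
Phi = np.full((nX, nA), 1.0 / nA)                      # fixed stationary strategy of player 1
x0  = 0                                                # reference state, v(x0) = 0

def T(v):
    """Exponential form of (3.7): exp(Tv(x)) = min_b sum_a Phi(x,a) e^{c2(x,a,b)} sum_y P(y|x,a,b) e^{v(y)}.
    Restricting to pure actions b suffices because the expression is linear in psi."""
    inner = np.einsum('xaby,y->xab', P, np.exp(v))     # E[e^{v(X_1)} | x, a, b]
    val = np.einsum('xa,xab->xb', Phi, np.exp(c2) * inner)
    return np.log(val.min(axis=1))

v = np.zeros(nX)
for _ in range(1000):
    Tv = T(v)
    v_new = Tv - Tv[x0]            # relative value iteration keeps v(x0) = 0
    if np.abs(v_new - v).max() < 1e-12:
        v = v_new
        break
    v = v_new

rho2_star = T(v)[x0]               # optimal risk-sensitive ergodic cost of player 2 given Phi
print("rho_2* =", rho2_star, "\nv_2* =", v)
```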

For a fixed ψ𝒫(B),\psi\in\mathcal{P}(B), and v𝔹(X)v\in\mathbb{B}(X) define the operator:

Uv(x)=infφ𝒫(A)[c^1(x,φ,ψ)+lnXev(y)P^1(dy|x,φ,ψ)].Uv(x)=\inf_{\varphi\in\mathcal{P}(A)}\bigg{[}\hat{c}_{1}(x,\varphi,\psi)+ln\int_{X}e^{v(y)}\hat{P}_{1}(dy|x,\varphi,\psi)\bigg{]}. (3.8)

By similar arguments we can also show that UU is a global contraction in the span norm in 𝔹L(X)\mathbb{B}_{L}(X) and the following theorem holds true.

Theorem 3.2.

Suppose Assumptions 2.1, 2.2, 3.1 and 3.2 are satisfied. Then for each \Psi\in S_{2}, there exists a unique solution pair (\rho^{*}_{1},v_{1}^{*})\in\mathbb{R}_{+}\times\mathbb{B}_{L}(X) with v_{1}^{*}(x_{0})=0 (where x_{0} is the fixed state from Theorem 3.1), satisfying

ev(x)+ρ=infφ𝒫(A)BAec1(x,a,b)Xev(y)P(dy|x,a,b)φ(da)Ψ(x)(db).e^{v(x)+\rho}={\inf_{\varphi\in\mathcal{P}(A)}\int_{B}\int_{A}e^{c_{1}(x,a,b)}\int_{X}e^{v(y)}P(dy|x,a,b)\varphi(da)\Psi(x)(db)}. (3.9)

In addition, a strategy \Phi^{*}\in S_{1} is an optimal strategy for player 1, given that player 2 chooses \Psi, if and only if \Phi^{*}(x) attains the point-wise minimum in (3.9). Moreover,

ρ1=infπ1Π1lim supn1nlnExπ1,Ψ[et=0n1c1(xt,at,bt)].\rho^{*}_{1}=\inf_{\pi^{1}\in\Pi_{1}}\limsup_{n\to\infty}\frac{1}{n}lnE^{\pi^{1},\Psi}_{x}\big{[}e^{\sum_{t=0}^{n-1}c_{1}(x_{t},a_{t},b_{t})}\big{]}. (3.10)

(:=ρ1Ψ=infπ1Π1J1π1,Ψ)\bigg{(}:=\rho^{*\Psi}_{1}=\inf_{\pi^{1}\in\Pi_{1}}J^{\pi^{1},\Psi}_{1}\bigg{)}

4 Existence of Nash equilibrium

In this section we establish the existence of a pair of stationary equilibrium strategies for the nonzero-sum game. To this end we first outline a standard procedure for establishing the existence of a Nash equilibrium. From Theorem 3.2 it follows that, given that player 2 is using the strategy \Psi\in S_{2}, we can find an optimal response \Phi^{*}\in S_{1} for player 1. Clearly \Phi^{*} depends on \Psi, and moreover there may be several optimal responses for player 1 in S_{1}. Analogous results hold for player 2 if player 1 announces that he is going to use a strategy \Phi\in S_{1}. Hence, given a pair of strategies (\Phi,\Psi)\in S_{1}\times S_{2}, we can find a set of pairs of optimal responses \{(\Phi^{*},\Psi^{*})\in S_{1}\times S_{2}\} via the appropriate pair of optimality equations described above. This defines a set-valued map, and clearly any fixed point of this set-valued map is a Nash equilibrium.

To ensure the existence of a Nash equilibrium, we make the following separability assumptions.

Assumption 4.1.

(i) There exist two substochastic kernels

P1:X×A𝒫(X),P2:X×B𝒫(X)P_{1}:X\times A\rightarrow\mathcal{P}(X),\quad P_{2}:X\times B\rightarrow\mathcal{P}(X)

such that

P(x,a,b)=P1(x,a)+P2(x,b),xX,aA,bB.P(\cdot\mid x,a,b)=P_{1}(\cdot\mid x,a)+P_{2}(\cdot\mid x,b),\quad x\in X,\quad a\in A,\quad b\in B.

(ii) Since P<<\lambda, we have P_{1}<<\lambda and P_{2}<<\lambda. Let h_{1} and h_{2} denote the respective densities. We assume that for each x,y\in X, h_{1}(x,\cdot,y) and h_{2}(x,\cdot,y) are continuous, and that for each x\in X,

\int_{X}\bigg[\sup_{a}|h_{1}(x,a,y)|+\sup_{b}|h_{2}(x,b,y)|\bigg]\lambda(dy)<\infty.
Assumption 4.2.

The cost functions c_{i}, i=1,2, are separable in the action variables, i.e., there exist bounded functions, continuous in the second variable,

ci1:X×A,ci2:X×B,i=1,2,c_{i1}:X\times A\rightarrow\mathbb{R},\quad c_{i2}:X\times B\rightarrow\mathbb{R},\quad i=1,2,

such that

ci(x,a,b)=ci1(x,a)+ci2(x,b),xX,aA,bB.c_{i}(x,a,b)=c_{i1}(x,a)+c_{i2}(x,b),\quad x\in X,\quad a\in A,\quad b\in B.

Following [14] and [18] we topologize the spaces S1S_{1} and S2S_{2} with the topology of relaxed controls introduced in [20]. We identify two elements Φ,Φ^S1\Phi,\hat{\Phi}\in S_{1} if Φ=Φ^\Phi=\hat{\Phi} a.e. λ\lambda (where λ\lambda is as in Assumption 3.1). Let

Y1={f:X×AfY_{1}=\{f:X\times A\rightarrow\mathbb{R}\mid f is measurable in the first argument and continuous in the second and there exists gL1(λ)g\in L^{1}(\lambda) such that |f(x,a)|g(x)|f(x,a)|\leq g(x) for every aA}a\in A\}.

Then Y1Y_{1} is a Banach space with norm [20]

fW=Xsupa|f(x,a)|λ(dx).\|f\|_{W}=\int_{X}\sup_{a}|f(x,a)|\lambda(dx).

Every ΦS1\Phi\in S_{1} (with the λ\lambda-a.e. equivalence relation) can be identified with the element ΛΦY1\Lambda_{\Phi}\in Y_{1}^{*} (the dual of Y1Y_{1}) defined as

ΛΦ(f)=XAf(x,a)Φ(x)(da)λ(dx).\Lambda_{\Phi}(f)=\int_{X}\int_{A}f(x,a)\Phi(x)(da)\lambda(dx).

Thus S1S_{1} can be identified with a subset of Y1Y_{1}^{*}. Equip S1S_{1} with the weak-star topology. Then it can be shown as in [18] that S1S_{1} is compact and metrizable. S2S_{2} can be topologized analogously.

Next, we present the following lemmas, which play a pivotal role in establishing the upper semi-continuity of the specific set-valued map mentioned earlier.

Lemma 4.1.

Let \Phi_{m}\to\Phi in S_{1} and \Psi_{m}\to\Psi in S_{2} in the weak star topology. Then under Assumption 4.2, for i=1,2, \hat{c}_{i}(x,\Phi_{m},\Psi_{m})\to\hat{c}_{i}(x,\Phi,\Psi) as m\to\infty.

Proof.

We have for i=1,2i=1,2,

c^i(x,Φm,Ψm)=lnc~i(x,Φm,Ψm)\displaystyle\hat{c}_{i}(x,\Phi_{m},\Psi_{m})=ln\tilde{c}_{i}(x,\Phi_{m},\Psi_{m})
=lnBAeci(x,a,b)Φm(x)(da)Ψm(x)(db)\displaystyle=ln\int_{B}\int_{A}e^{c_{i}(x,a,b)}\Phi_{m}(x)(da)\Psi_{m}(x)(db)
=lnAeci1(x,a)Φm(x)(da)+lnBeci2(x,b)Ψm(x)(db)\displaystyle=ln\int_{A}e^{c_{i1}(x,a)}\Phi_{m}(x)(da)+ln\int_{B}e^{c_{i2}(x,b)}\Psi_{m}(x)(db)
=lnX1.λ(dx)Aeci1(x,a)Φm(x)(da)+lnX1.λ(dx)Beci2(x,b)Ψm(x)(db)\displaystyle=ln\int_{X}1.\lambda(dx)\int_{A}e^{c_{i1}(x,a)}\Phi_{m}(x)(da)+ln\int_{X}1.\lambda(dx)\int_{B}e^{c_{i2}(x,b)}\Psi_{m}(x)(db)
=lnXA(1.eci1(x,a))Φm(x)(da)λ(dx)+lnXB(1.eci2(x,b))Ψm(x)(db)λ(dx).\displaystyle=ln\int_{X}\int_{A}\bigg{(}1.e^{c_{i1}(x,a)}\bigg{)}\Phi_{m}(x)(da)\lambda(dx)+ln\int_{X}\int_{B}\bigg{(}1.e^{c_{i2}(x,b)}\bigg{)}\Psi_{m}(x)(db)\lambda(dx).

Now by Assumption 4.2 and since ΦmΦandΨmΨ\Phi_{m}\to\Phi~{}\text{and}~{}\Psi_{m}\to\Psi in the weak star topology, the result is immediate. ∎

Lemma 4.2.

Suppose Assumptions 2.1, 3.1, 4.1 and 4.2 hold. Let {vm}\{v_{m}\} be a uniformly bounded sequence in 𝔹(X)\mathbb{B}(X) and v𝔹(X)v\in\mathbb{B}(X) be a weak star limit point of {vm}\{v_{m}\}. If ΦmΦS1andΨmΨS2\Phi_{m}\to{\Phi}\in S_{1}~{}\text{and}~{}\Psi_{m}\to\Psi\in S_{2} in the weak star topology, then for each xXx\in X and i=1,2i=1,2,

Xvm(y)Pi^(dy|x,Φm,Ψm)Xv(y)Pi^(dy|x,Φ,Ψ)asm.\int_{X}{v_{m}(y)}\hat{P_{i}}(dy|x,\Phi_{m},\Psi_{m})\to\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi,\Psi)~{}~{}~{}~{}\text{as}~{}~{}~{}~{}m\rightarrow\infty.
Proof.

Note that

|Xvm(y)Pi^(dy|x,Φm(x),Ψm(x))Xv(y)Pi^(dy|x,Φ(x),Ψ(x))|\displaystyle\Bigg{|}\int_{X}{v_{m}(y)}\hat{P_{i}}(dy|x,\Phi_{m}(x),\Psi_{m}(x))-\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi(x),\Psi(x))\Bigg{|} (4.1)
|Xvm(y)Pi^(dy|x,Φm(x),Ψm(x))Xv(y)Pi^(dy|x,Φm(x),Ψm(x))|\displaystyle\leq\Bigg{|}\int_{X}{v_{m}(y)}\hat{P_{i}}(dy|x,\Phi_{m}(x),\Psi_{m}(x))-\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi_{m}(x),\Psi_{m}(x))\Bigg{|}
+|Xv(y)Pi^(dy|x,Φm(x),Ψm(x))Xv(y)Pi^(dy|x,Φ(x),Ψ(x))|.\displaystyle+\Bigg{|}\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi_{m}(x),\Psi_{m}(x))-\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi(x),\Psi(x))\Bigg{|}.

We claim that

Xv(y)Pi^(dy|x,Φm,Ψm)Xv(y)Pi^(dy|x,Φ,Ψ)asm.\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi_{m},\Psi_{m})\to\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi,\Psi)~{}~{}~{}~{}\text{as}~{}~{}~{}~{}m\rightarrow\infty. (4.2)

Observe that, under Assumption 4.1(i)(i), Assumption 4.2 and using (2.4) we get

\int_{X}v(y)\hat{P_{i}}(dy|x,\Phi_{m},\Psi_{m})
=\frac{\int_{B}\int_{A}\int_{X}v(y)e^{c_{i1}(x,a)}e^{c_{i2}(x,b)}P_{1}(dy|x,a)\Phi_{m}(x)(da)\Psi_{m}(x)(db)}{\tilde{c}_{i}(x,\Phi_{m}(x),\Psi_{m}(x))}+\frac{\int_{B}\int_{A}\int_{X}v(y)e^{c_{i1}(x,a)}e^{c_{i2}(x,b)}P_{2}(dy|x,b)\Phi_{m}(x)(da)\Psi_{m}(x)(db)}{\tilde{c}_{i}(x,\Phi_{m}(x),\Psi_{m}(x))}
=\frac{\int_{A}\int_{X}v(y)e^{c_{i1}(x,a)}h_{1}(x,a,y)\lambda(dy)\Phi_{m}(x)(da)\int_{B}e^{c_{i2}(x,b)}\Psi_{m}(x)(db)}{\tilde{c}_{i}(x,\Phi_{m}(x),\Psi_{m}(x))}+\frac{\int_{B}\int_{X}v(y)e^{c_{i2}(x,b)}h_{2}(x,b,y)\lambda(dy)\Psi_{m}(x)(db)\int_{A}e^{c_{i1}(x,a)}\Phi_{m}(x)(da)}{\tilde{c}_{i}(x,\Phi_{m}(x),\Psi_{m}(x))}.

By using Lemma 4.1 and since ΦmΦS1andΨmΨS2\Phi_{m}\to{\Phi}\in S_{1}~{}\text{and}~{}\Psi_{m}\to\Psi\in S_{2}, under Assumption 4.1(i)(i), (4.2) holds true, i.e. the second term in the right hand side of (4.1) goes to zero as mm\to\infty. Now we show that the first one also goes to zero.

Again note that,

|Xvm(y)Pi^(dy|x,Φm(x),Ψm(x))Xv(y)Pi^(dy|x,Φm(x),Ψm(x))|\displaystyle\Bigg{|}\int_{X}{v_{m}(y)}\hat{P_{i}}(dy|x,\Phi_{m}(x),\Psi_{m}(x))-\int_{X}{v(y)}\hat{P_{i}}(dy|x,\Phi_{m}(x),\Psi_{m}(x))\Bigg{|}
=|XBAvm(y)v(y)c~i(x,Φm(x),Ψm(x))eci(x,a,b)P(dy|x,a,b)Φm(x)(da)Ψm(x)(db)|\displaystyle=\Bigg{|}\int_{X}\int_{B}\int_{A}\frac{{v_{m}(y)}-{v(y)}}{\tilde{c}_{i}(x,\Phi_{m}(x),\Psi_{m}(x))}e^{c_{i}(x,a,b)}P(dy|x,a,b)\Phi_{m}(x)(da)\Psi_{m}(x)(db)\Bigg{|}
ec¯BA|X(vm(y)v(y))P(dy|x,a,b)|Φm(x)(da)Ψm(x)(db)\displaystyle\leq e^{\bar{c}}\int_{B}\int_{A}\Bigg{|}\int_{X}{({v_{m}(y)}-{v(y)})}P(dy|x,a,b)\Bigg{|}\Phi_{m}(x)(da)\Psi_{m}(x)(db)
ec¯supbBsupaA|X(vm(y)v(y))P(dy|x,a,b)|\displaystyle\leq e^{\bar{c}}~{}\sup_{b\in B}\sup_{a\in A}\Bigg{|}\int_{X}{({v_{m}(y)}-{v(y)})}P(dy|x,a,b)\Bigg{|}
=ec¯supbBsupaA|X(vm(y)v(y))h(x,a,b,y)λ(dy)|,\displaystyle=e^{\bar{c}}~{}\sup_{b\in B}\sup_{a\in A}\Bigg{|}\int_{X}{({v_{m}(y)}-{v(y)})}h(x,a,b,y)\lambda(dy)\Bigg{|},

where h=h1+h2h=h_{1}+h_{2}. From the compactness of A,BA,B and the continuity of h(x,.,.,y)h(x,.,.,y), it follows that for m,m\in\mathbb{N},

Vm(x)\displaystyle V_{m}(x) :=|X(vm(y)v(y))h(x,am,bm,y)λ(dy)|=supbBsupaA|X(vm(y)v(y))h(x,a,b,y)λ(dy)|,\displaystyle:=\Bigg{|}\int_{X}{({v_{m}(y)}-{v(y)})}h(x,a_{m},b_{m},y)\lambda(dy)\Bigg{|}=\sup_{b\in B}\sup_{a\in A}\Bigg{|}\int_{X}{({v_{m}(y)}-{v(y)})}h(x,a,b,y)\lambda(dy)\Bigg{|},

for some sequences {am}A\{a_{m}\}\in A and {bm}B\{b_{m}\}\in B. We now prove that Vm(x)0V_{m}(x)\to 0 as mm\to\infty.

Since AA and BB are compact , without loss of generality, we can assume that

ama0 and bmb0, for some a0A and b0B.a_{m}\rightarrow a_{0}\quad\text{ and }\quad b_{m}\rightarrow b_{0},\quad\text{ for some }a_{0}\in A\text{ and }b_{0}\in B.

Note that, for each mm, we have

Vm(x)\displaystyle V_{m}(x) |(vm(y)v(y))(h(x,am,bm,y)h(x,a0,b0,y))λ(dy)|+\displaystyle\leq\left|\int\left({v_{m}(y)}-{v(y)}\right)\left(h\left(x,a_{m},b_{m},y\right)-h\left(x,a_{0},b_{0},y\right)\right)\lambda(dy)\right|+ (4.3)
|(vm(y)v(y))h(x,a0,b0,y)λ(dy)|.\displaystyle\hskip 156.49014pt\left|\int\left({v_{m}(y)}-{v(y)}\right)h\left(x,a_{0},b_{0},y\right)\lambda(dy)\right|.

Moreover,

|(vm(y)v(y))(h(x,am,bm,y)h(x,a0,b0,y))λ(dy)|\displaystyle\left|\int\left({v_{m}(y)}-{v(y)}\right)\left(h\left(x,a_{m},b_{m},y\right)-h\left(x,a_{0},b_{0},y\right)\right)\lambda(dy)\right|\leq
vm(y)v(y)h(x,am,bm,)h(x,a0,b0,)L1(λ).\displaystyle\hskip 128.0374pt\left\|{v_{m}(y)}-{v(y)}\right\|\left\|h\left(x,a_{m},b_{m},\cdot\right)-h\left(x,a_{0},b_{0},\cdot\right)\right\|_{L^{1}(\lambda)}.

By Assumption 4.1(ii) and the boundedness of \{\|v_{m}-v\|\}, it follows from the last inequality that the first term on the right-hand side of (4.3) goes to zero as m\rightarrow\infty. Since v is a weak star limit of \{v_{m}\} and h(x,a_{0},b_{0},\cdot)\in L^{1}(\lambda), the second term on the right-hand side of (4.3) also goes to zero as m\rightarrow\infty. Thus V_{m}(x)\rightarrow 0 as m\rightarrow\infty, and the result follows. ∎

Lemma 4.3.

For M>0, let \{v_{m}\} be any sequence in \mathbb{B}_{M}(X) with v_{m}(x_{0})=0 for all m\in\mathbb{N}. Then \{v_{m}\} is uniformly bounded.

Proof.

Now for each fixed mm\in\mathbb{N} as,

infxXvm(x)vm(x0)=0\inf_{x\in X}v_{m}(x)\leq v_{m}(x_{0})=0

and vm(x)spM\|v_{m}(x)\|_{sp}\leq M, using (2.8) we have

supxXvm(x)M\sup_{x\in X}v_{m}{(x)}\leq M (4.4)

Again as supxXvm(x)vm(x0)\sup_{x\in X}v_{m}(x)\geq v_{m}(x_{0}) and vm(x)spM\|v_{m}(x)\|_{sp}\leq M, using (2.8) we have

infxXvm(x)M\inf_{x\in X}v_{m}{(x)}\geq-M (4.5)

Now from, (4.4) and (4.5) for fixed mm\in\mathbb{N} and each xXx\in X we get

Mvm(x)M-M\leq v_{m}(x)\leq M

Therefore, {vm}\{{v_{m}}\} is a uniformly bounded sequence in 𝔹M(X)\mathbb{B}_{M}(X). ∎

For a fixed \Phi\in S_{1}, let

H(Φ)={ΨS2:c^2(x,Φ(x),Ψ(x))+lnXev2Φ(y)P^2(dy|x,Φ(x),Ψ(x))\displaystyle H(\Phi)=\bigg{\{}\Psi^{*}\in S_{2}:\hat{c}_{2}(x,\Phi(x),\Psi^{*}(x))+ln\int_{X}e^{v^{*\Phi}_{2}(y)}\hat{P}_{2}(dy|x,\Phi(x),\Psi^{*}(x))
=infψ𝒫(B)[c^2(x,Φ(x),ψ)+lnXev2Φ(y)P^2(dy|x,Φ(x),ψ)]},\displaystyle\hskip 99.58464pt=\inf_{\psi\in\mathcal{P}(B)}\bigg{[}\hat{c}_{2}(x,\Phi(x),\psi)+ln\int_{X}e^{v^{*\Phi}_{2}(y)}\hat{P}_{2}(dy|x,\Phi(x),\psi)\bigg{]}\bigg{\}},

where v2Φ{v^{*\Phi}_{2}} is the unique solution of (3.5) corresponding to the strategy ΦS1\Phi\in S_{1}.

Similarly, for a fixed \Psi\in S_{2}, let

H(Ψ)={ΦS1:c^1(x,Φ(x),Ψ(x))+lnXev1Ψ(y)P^1(dy|x,Φ(x),Ψ(x))\displaystyle H(\Psi)=\bigg{\{}\Phi^{*}\in S_{1}:\hat{c}_{1}(x,\Phi^{*}(x),\Psi(x))+ln\int_{X}e^{v^{*\Psi}_{1}(y)}\hat{P}_{1}(dy|x,\Phi^{*}(x),\Psi(x))
=\inf_{\varphi\in\mathcal{P}(A)}\bigg[\hat{c}_{1}(x,\varphi,\Psi(x))+\ln\int_{X}e^{v^{*\Psi}_{1}(y)}\hat{P}_{1}(dy|x,\varphi,\Psi(x))\bigg]\bigg\},

where v1Ψ{v^{*\Psi}_{1}} is the unique solution of (3.9) corresponding to the strategy ΨS2\Psi\in S_{2}.

Remark 4.1.

Since the exponential and logarithmic functions are increasing, H(\Phi) and H(\Psi) also admit the following equivalent expressions:

H(Φ)={ΨS2:BAec2(x,a,b)Xev2Φ(y)P(dy|x,a,b)Φ(x)(da)Ψ(x)(db)\displaystyle H(\Phi)=\bigg{\{}\Psi^{*}\in S_{2}:\int_{B}\int_{A}e^{c_{2}(x,a,b)}\int_{X}e^{v^{*\Phi}_{2}(y)}P(dy|x,a,b)\Phi(x)(da)\Psi^{*}(x)(db)
=infψ𝒫(B)BAec2(x,a,b)Xev2Φ(y)P(dy|x,a,b)Φ(x)(da)ψ(db)}.\displaystyle\hskip 85.35826pt=\inf_{\psi\in\mathcal{P}(B)}\int_{B}\int_{A}e^{c_{2}(x,a,b)}\int_{X}e^{v^{*\Phi}_{2}(y)}P(dy|x,a,b)\Phi(x)(da)\psi(db)\bigg{\}}.
H(Ψ)={ΦS1:BAec1(x,a,b)Xev1Ψ(y)P(dy|x,a,b)Φ(x)(da)Ψ(x)(db)\displaystyle H(\Psi)=\bigg{\{}\Phi^{*}\in S_{1}:\int_{B}\int_{A}e^{c_{1}(x,a,b)}\int_{X}e^{v^{*\Psi}_{1}(y)}P(dy|x,a,b)\Phi^{*}(x)(da)\Psi(x)(db)
=infφ𝒫(A)BAec1(x,a,b)Xev1Ψ(y)P(dy|x,a,b)φ(da)Ψ(x)(db)}.\displaystyle\hskip 85.35826pt=\inf_{\varphi\in\mathcal{P}(A)}\int_{B}\int_{A}e^{c_{1}(x,a,b)}\int_{X}e^{v^{*\Psi}_{1}(y)}P(dy|x,a,b)\varphi(da)\Psi(x)(db)\bigg{\}}.

Next set

H(Φ,Ψ)=H(Ψ)×H(Φ).H(\Phi,\Psi)=H(\Psi)\times H(\Phi).
Lemma 4.4.

Under Assumptions 2.1, 3.1, 4.1 and 4.2, H(Φ,Ψ)H(\Phi,\Psi) is a non-empty compact convex subset of S1×S2S_{1}\times S_{2}.

Proof.

From Remark 2.1, we know that \int_{X}e^{v(y)}\hat{P}_{2}(dy|x,\Phi(x),\psi) is continuous on \mathcal{P}(A)\times\mathcal{P}(B) for each x\in X. As B is compact, \mathcal{P}(B) is also compact. It is then easy to see that H(\Phi) is non-empty. Let \Psi_{m}^{*}\in H(\Phi); as S_{2} is compact, \{\Psi^{*}_{m}\} has a convergent subsequence (denoted by the same sequence by abuse of notation) such that \Psi^{*}_{m}\to\hat{\Psi}\in S_{2}. Now for any \psi\in\mathcal{P}(B)

c^2(x,Φ(x),Ψm(x))+lnXev2Φ(y)P^2(dy|x,Φ(x),Ψm(x))\displaystyle\hat{c}_{2}(x,\Phi(x),\Psi_{m}^{*}(x))+ln\int_{X}e^{{v^{*\Phi}_{2}}(y)}\hat{P}_{2}(dy|x,\Phi(x),\Psi_{m}^{*}(x)) (4.6)
c^2(x,Φ(x),ψ)+lnXev2Φ(y)P^2(dy|x,Φ(x),ψ).\displaystyle\hskip 142.26378pt\leq\hat{c}_{2}(x,\Phi(x),\psi)+ln\int_{X}e^{{v^{*\Phi}_{2}}(y)}\hat{P}_{2}(dy|x,\Phi(x),\psi).

Using Lemma 4.1 and 4.2, from (4.6) we get for any ψ𝒫(B)\psi\in\mathcal{P}(B)

c^2(x,Φ(x),Ψ^(x))+lnXev2Φ(y)P^2(dy|x,Φ(x),Ψ^(x))\displaystyle\hat{c}_{2}(x,\Phi(x),\hat{\Psi}(x))+ln\int_{X}e^{{v^{*\Phi}_{2}}(y)}\hat{P}_{2}(dy|x,\Phi(x),\hat{\Psi}(x))
c^2(x,Φ(x),ψ)+lnXev2Φ(y)P^2(dy|x,Φ(x),ψ)λa.e.\displaystyle\hskip 113.81102pt\leq\hat{c}_{2}(x,\Phi(x),\psi)+ln\int_{X}e^{{v^{*\Phi}_{2}}(y)}\hat{P}_{2}(dy|x,\Phi(x),\psi)\hskip 7.11317pt\lambda-a.e.

Hence it follows that \hat{\Psi}\in H(\Phi), and therefore H(\Phi) is closed. Since S_{2} is a compact metric space, it follows that H(\Phi) is also compact. Using Remark 4.1, the convexity of H(\Phi) and H(\Psi) follows easily. By analogous arguments, H(\Psi) is also a non-empty compact subset of S_{1}. Hence H(\Phi,\Psi) is a non-empty compact convex subset of S_{1}\times S_{2}. ∎

The next lemma proves the upper semi-continuity of a certain set-valued map. This result will be useful in establishing the existence of a Nash equilibrium in the space of stationary Markov strategies.

Lemma 4.5.

Under Assumptions 2.1, 2.2, 3.1, 3.2, 4.1 and 4.2 the map (Φ,Ψ)H(Φ,Ψ)(\Phi,\Psi)\to H(\Phi,\Psi) from S1×S22S1×S2S_{1}\times S_{2}\to 2^{S_{1}\times S_{2}} is upper semi-continuous.

Proof.

Let \Psi_{m}^{*}\in H(\Phi_{m}). The sequence \{\Phi_{m}\} has a convergent subsequence (denoted by the same sequence by abuse of notation) such that \Phi_{m}\to\bar{\Phi}\in S_{1}, and similarly \{\Psi_{m}^{*}\} has a convergent subsequence such that \Psi_{m}^{*}\to\hat{\Psi}\in S_{2}. Since \{v^{*\Phi_{m}}_{2}\}\subset\mathbb{B}_{L}(X) and \{\rho^{*\Phi_{m}}_{2}\} is bounded, without loss of generality let v^{*\Phi_{m}}_{2}\to v_{2} in the weak star sense and \rho^{*\Phi_{m}}_{2}\to\rho_{2}. By Theorem 3.1 and the definition of H(\Phi_{m}) we have, for each m,

\rho^{*\Phi_{m}}_{2}+v^{*\Phi_{m}}_{2}(x)=\hat{c}_{2}(x,\Phi_{m}(x),\Psi_{m}^{*}(x))+\ln\int_{X}e^{v^{*\Phi_{m}}_{2}(y)}\hat{P}_{2}(dy|x,\Phi_{m}(x),\Psi_{m}^{*}(x)). \qquad (4.7)

Using Lemmas 4.1, 4.2 and 4.3 it follows that

ρ2+v2(x)=c^2(x,Φ¯(x),Ψ^(x))+lnXev2(y)P^2(dy|x,Φ¯(x),Ψ^(x))λa.e.\rho_{2}+v_{2}(x)=\hat{c}_{2}(x,\bar{\Phi}(x),\hat{\Psi}(x))+ln\int_{X}e^{{v_{2}(y)}}\hat{P}_{2}(dy|x,\bar{\Phi}(x),\hat{\Psi}(x))\hskip 14.22636pt\lambda-a.e. (4.8)

From (4.7) for any ψ𝒫(B)\psi\in\mathcal{P}(B) we get

ρ2Φm+v2Φmc^2(x,Φm(x),ψ)+lnXev2Φm(y)P^2(dy|x,Φm(x),ψ).{\rho^{*\Phi_{m}}_{2}}+{v^{*\Phi_{m}}_{2}}\leq\hat{c}_{2}(x,\Phi_{m}(x),\psi)+ln\int_{X}e^{{v^{*\Phi_{m}}_{2}}(y)}\hat{P}_{2}(dy|x,\Phi_{m}(x),\psi).

Again using Lemma 4.1, 4.2 and 4.3 it follows that

ρ2+v2(x)c^2(x,Φ¯(x),ψ)+lnXev2(y)P^2(dy|x,Φ¯(x),ψ)λa.e.\rho_{2}+v_{2}(x)\leq\hat{c}_{2}(x,\bar{\Phi}(x),\psi)+ln\int_{X}e^{{v_{2}(y)}}\hat{P}_{2}(dy|x,\bar{\Phi}(x),\psi)\hskip 14.22636pt\lambda-a.e. (4.9)

Let v2(x)=v2(x)v2(x0)v^{*}_{2}(x)=v_{2}(x)-v_{2}(x_{0}). Then from (4.9) we get, for any ψ𝒫(B)\psi\in\mathcal{P}(B)

ρ2+v2(x)c^2(x,Φ¯(x),ψ)+lnXev2(y)P^2(dy|x,Φ¯(x),ψ)λa.e.\rho_{2}+v^{*}_{2}(x)\leq\hat{c}_{2}(x,\bar{\Phi}(x),\psi)+ln\int_{X}e^{{v^{*}_{2}(y)}}\hat{P}_{2}(dy|x,\bar{\Phi}(x),\psi)\hskip 14.22636pt\lambda-a.e. (4.10)

and from (4.8) we get

ρ2+v2(x)=c^2(x,Φ¯(x),Ψ^(x))+lnXev2(y)P^2(dy|x,Φ¯(x),Ψ^(x))infψ𝒫(B)[c^2(x,Φ¯(x),ψ)+lnXev2(y)P^2(dy|x,Φ¯(x),ψ)]}λa.e.\begin{aligned} \rho_{2}+v^{*}_{2}(x)&=\hat{c}_{2}(x,\bar{\Phi}(x),\hat{\Psi}(x))+ln\int_{X}e^{{v^{*}_{2}(y)}}\hat{P}_{2}(dy|x,\bar{\Phi}(x),\hat{\Psi}(x))\hskip 14.22636pt\\ &\geq\inf_{\psi\in\mathcal{P}(B)}\bigg{[}\hat{c}_{2}(x,\bar{\Phi}(x),\psi)+ln\int_{X}e^{{v^{*}_{2}(y)}}\hat{P}_{2}(dy|x,\bar{\Phi}(x),\psi)\bigg{]}\hskip 14.22636pt\end{aligned}\Biggl{\}}\lambda-a.e. (4.11)

Since (4.10) holds for every ψ𝒫(B)\psi\in\mathcal{P}(B), from (4.10) and (4.11) we get

ρ2+v2(x)=infψ𝒫(B)[c^2(x,Φ¯(x),ψ)+lnXev2(y)P^2(dy|x,Φ¯(x),ψ)]λa.e.\rho_{2}+v^{*}_{2}(x)=\inf_{\psi\in\mathcal{P}(B)}\bigg{[}\hat{c}_{2}(x,\bar{\Phi}(x),\psi)+ln\int_{X}e^{{v^{*}_{2}(y)}}\hat{P}_{2}(dy|x,\bar{\Phi}(x),\psi)\bigg{]}\hskip 14.22636pt\lambda-a.e. (4.12)

with v^{*}_{2}(x_{0})=0. By Theorem 3.1, (4.12) has a unique solution (\rho^{*\bar{\Phi}}_{2},v^{*\bar{\Phi}}_{2}) (corresponding to \bar{\Phi}\in S_{1}) satisfying v^{*\bar{\Phi}}_{2}(x_{0})=0. Therefore \rho_{2}=\rho^{*\bar{\Phi}}_{2} and v^{*}_{2}=v^{*\bar{\Phi}}_{2}. Thus, from (4.11) and (4.12) it follows that \hat{\Psi}\in H(\bar{\Phi}).

Similarly, suppose \Phi_{m}^{*}\in H(\Psi_{m}) and, along a suitable subsequence, \Phi_{m}^{*}\to\hat{\Phi}\in S_{1} and \Psi_{m}\to\bar{\Psi}\in S_{2}. Then by similar arguments one can show that \hat{\Phi}\in H(\bar{\Psi}). This proves that (\hat{\Phi},\hat{\Psi})\in H(\bar{\Phi},\bar{\Psi}). Hence the map (\Phi,\Psi)\to H(\Phi,\Psi) is upper semi-continuous. ∎

We are now ready to establish the existence of a Nash equilibrium, which follows directly from Fan’s fixed point theorem [9].

Theorem 4.1.

Suppose that the Assumptions 2.1, 2.2, 3.1, 3.2, 4.1 and 4.2 are satisfied. Then there exists a Nash equilibrium in the space of stationary strategies S1×S2S_{1}\times S_{2}.

Proof.

Using Lemmas 4.4 and 4.5, it follows from Fan’s fixed point theorem that the map (\Phi,\Psi)\to H(\Phi,\Psi) from S_{1}\times S_{2}\to 2^{S_{1}\times S_{2}} has a fixed point (\Phi^{*},\Psi^{*})\in S_{1}\times S_{2}. This implies that (\rho_{1}^{*\Psi^{*}},v_{1}^{*\Psi^{*}}) and (\rho_{2}^{*\Phi^{*}},v_{2}^{*\Phi^{*}}) satisfy the following coupled optimality equations:

\rho_{1}^{*\Psi^{*}}+v_{1}^{*\Psi^{*}}(x)=\inf_{\varphi\in\mathcal{P}(A)}\bigg[\hat{c}_{1}(x,\varphi,\Psi^{*}(x))+\ln\int_{X}e^{v_{1}^{*\Psi^{*}}(y)}\hat{P}_{1}(dy|x,\varphi,\Psi^{*}(x))\bigg] \qquad (4.13)
=\hat{c}_{1}(x,\Phi^{*}(x),\Psi^{*}(x))+\ln\int_{X}e^{v_{1}^{*\Psi^{*}}(y)}\hat{P}_{1}(dy|x,\Phi^{*}(x),\Psi^{*}(x)),

and

\rho_{2}^{*\Phi^{*}}+v_{2}^{*\Phi^{*}}(x)=\inf_{\psi\in\mathcal{P}(B)}\bigg[\hat{c}_{2}(x,\Phi^{*}(x),\psi)+\ln\int_{X}e^{v_{2}^{*\Phi^{*}}(y)}\hat{P}_{2}(dy|x,\Phi^{*}(x),\psi)\bigg] \qquad (4.14)
=\hat{c}_{2}(x,\Phi^{*}(x),\Psi^{*}(x))+\ln\int_{X}e^{v_{2}^{*\Phi^{*}}(y)}\hat{P}_{2}(dy|x,\Phi^{*}(x),\Psi^{*}(x)).

Now by Theorem 3.2, from (4.13), it follows that

ρ1Ψ=infπ1Π1J1π1,Ψ=J1Φ,Ψ.\rho_{1}^{*\Psi^{*}}=\inf_{\pi^{1}\in\Pi_{1}}J^{\pi^{1},\Psi^{*}}_{1}=J^{\Phi^{*},\Psi^{*}}_{1}. (4.15)

Similarly, by Theorem 3.1, from (4.14), it follows that

ρ2Φ=infπ2Π2J2Φ,π2=J2Φ,Ψ.\rho^{*\Phi^{*}}_{2}=\inf_{\pi^{2}\in\Pi_{2}}J^{\Phi^{*},\pi^{2}}_{2}=J^{\Phi^{*},\Psi^{*}}_{2}. (4.16)

Thus, from equations (4.15) and (4.16), we get

J1π1,ΨJ1Φ,Ψ,π1Π1,J2Φ,π2J2Φ,Ψ,π2Π2.}\begin{aligned} &J^{\pi_{1},\Psi^{*}}_{1}\geq J^{\Phi^{*},\Psi^{*}}_{1},\forall\pi^{1}\in\Pi_{1},\\ &J^{\Phi^{*},\pi^{2}}_{2}\geq J^{\Phi^{*},\Psi^{*}}_{2},\forall\pi^{2}\in\Pi_{2}.\end{aligned}\Biggl{\}} (4.17)

Hence (\Phi^{*},\Psi^{*})\in S_{1}\times S_{2} forms a \lambda-equilibrium pair of stationary strategies (i.e., (4.17) holds on a set of \lambda-measure 1). Then, by a construction analogous to that in Theorem 1 of [18], the existence of the desired Nash equilibrium follows. ∎
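As a purely numerical illustration of Theorem 4.1 (and not part of its proof), one can search for a stationary equilibrium on a small finite toy model satisfying the ARAT Assumptions 4.1 and 4.2 by iterating best responses: given \Psi, compute an optimal response of player 1 via (3.9), then an optimal response of player 2 via (3.5), and stop when neither player changes. All model data below are hypothetical, the search is restricted to pure stationary strategies, and convergence of this heuristic is not guaranteed in general; the sketch only reports the candidate pair found at termination.

```python
import numpy as np

rng = np.random.default_rng(3)
nX, nA, nB = 3, 2, 2

# Hypothetical ARAT model (Assumptions 4.1 and 4.2): additive transitions and costs.
P1 = 0.5 * rng.dirichlet(np.ones(nX), size=(nX, nA))        # substochastic, P1[x, a] sums to 1/2
P2 = 0.5 * rng.dirichlet(np.ones(nX), size=(nX, nB))        # substochastic, P2[x, b] sums to 1/2
P  = P1[:, :, None, :] + P2[:, None, :, :]                  # P[x, a, b, y] = P1[x, a, y] + P2[x, b, y]
c11, c12 = rng.uniform(0, 1, (nX, nA)), rng.uniform(0, 1, (nX, nB))
c21, c22 = rng.uniform(0, 1, (nX, nA)), rng.uniform(0, 1, (nX, nB))
c1 = c11[:, :, None] + c12[:, None, :]                      # c1[x, a, b] = c11[x, a] + c12[x, b]
c2 = c21[:, :, None] + c22[:, None, :]

def best_response(ce, Pe, x0=0, iters=5000, tol=1e-12):
    """Relative value iteration for the one-player risk-sensitive ergodic problem
    with effective cost ce[x, u] and effective kernel Pe[x, u, y]; returns a pure
    optimal stationary strategy and the optimal ergodic cost rho."""
    v = np.zeros(ce.shape[0])
    for _ in range(iters):
        q = np.exp(ce) * (Pe @ np.exp(v))        # q[x, u] = e^{ce(x,u)} * E[e^{v(X_1)} | x, u]
        v_new = np.log(q.min(axis=1))
        v_new -= v_new[x0]                       # keep v(x0) = 0
        if np.abs(v_new - v).max() < tol:
            v = v_new
            break
        v = v_new
    q = np.exp(ce) * (Pe @ np.exp(v))
    return q.argmin(axis=1), np.log(q.min(axis=1))[x0]

idx = np.arange(nX)
Phi = np.zeros(nX, dtype=int)                    # pure stationary strategies (Dirac measures)
Psi = np.zeros(nX, dtype=int)
for _ in range(100):
    Phi_new, rho1 = best_response(c1[idx, :, Psi], P[idx, :, Psi, :])          # player 1 vs Psi
    Psi_new, rho2 = best_response(c2[idx, Phi_new, :], P[idx, Phi_new, :, :])  # player 2 vs Phi_new
    if np.array_equal(Phi_new, Phi) and np.array_equal(Psi_new, Psi):
        break
    Phi, Psi = Phi_new, Psi_new

print("candidate equilibrium strategies:", Phi, Psi, " ergodic costs:", rho1, rho2)
```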

References

  • [1] Tansu Alpcan and Tamer Başar. Network security: A decision and game-theoretic approach. Cambridge University Press, Cambridge, 2011.
  • [2] Arnab Basu and Mrinal K Ghosh. Zero-sum risk-sensitive stochastic games on a countable state space. Stochastic Process. Appl., 124(1):961–983, 2014.
  • [3] Arnab Basu and Mrinal K Ghosh. Nonzero-sum risk-sensitive stochastic games on a countable state space. Mathematics of Operations Research, 43(2):516–532, 2018.
  • [4] N. Bauerle and U. Rieder. Zero-sum risk sensitive stochastic games. Stoch. Processes and Their Appl, 127:622–642, 2017.
  • [5] Dimitri Bertsekas and Steven E Shreve. Stochastic optimal control: the discrete-time case, volume 5. Athena Scientific, 1996.
  • [6] Tomasz R. Bielecki, Stanley R. Pliska, and Shuenn-Jyi Sheu. Risk sensitive portfolio management with Cox-Ingersoll-Ross interest rates: the HJB equation. SIAM J. Control Optim., 44(5):1811–1843, 2005.
  • [7] Anup Biswas and Vivek S. Borkar. Ergodic risk-sensitive control—a survey. Annu. Rev. Control, 55:118–141, 2023.
  • [8] Giovanni B Di Masi and Lukasz Stettner. Risk-sensitive control of discrete-time markov processes with infinite horizon. SIAM Journal on Control and Optimization, 38(1):61–78, 1999.
  • [9] Ky Fan. Fixed-point and minimax theorems in locally convex topological linear spaces. Proceedings of the National Academy of Sciences, 38(2):121–126, 1952.
  • [10] Mrinal K Ghosh and Arunabha Bagchi. Stochastic games with average payoff criterion. Applied Mathematics and Optimization, 38:283–301, 1998.
  • [11] Mrinal K Ghosh, Subrata Golui, Chandan Pal, and Somnath Pradhan. Discrete-time zero-sum games for markov chains with risk-sensitive average cost criterion. Stochastic Processes and their Applications, 158:40–74, 2023.
  • [12] Daniel Hernández-Hernández and Steven I Marcus. Risk sensitive control of markov processes in countable state space. Systems & control letters, 29(3):147–155, 1996.
  • [13] Onésimo Hernández-Lerma. Adaptive Markov control processes, volume 79. Springer Science & Business Media, 2012.
  • [14] CJ Himmelberg, Thiruvenkatachari Parthasarathy, TES Raghavan, and FS Van Vleck. Existence of p-equilibrium and optimal stationary strategies in stochastic games. Proceedings of the American Mathematical Society, 60(1):245–251, 1976.
  • [15] Ronald A. Howard and James E. Matheson. Risk-sensitive Markov decision processes. Management Sci., 18:356–369, 1971/72.
  • [16] Matthew O. Jackson. Social and economic networks. Princeton University Press, Princeton, NJ, 2008.
  • [17] Andrzej S. Nowak and Eitan Altman. ϵ\epsilon-equilibria for stochastic games with uncountable state space and unbounded costs. SIAM J. Control Optim., 40(6):1821–1839, 2002.
  • [18] T Parthasarathy. Existence of equilibrium stationary strategies in discounted stochastic games. Sankhyā: The Indian Journal of Statistics, Series A, pages 114–127, 1982.
  • [19] Tim Roughgarden. Twenty Lectures on Algorithmic Game Theory. Cambridge University Press, 2016.
  • [20] Jack Warga. Functions of relaxed controls. SIAM Journal on Control, 5(4):628–641, 1967.
  • [21] Qingda Wei and Xian Chen. Nonzero-sum expected average discrete-time stochastic games: the case of uncountable spaces. SIAM J. Control Optim., 57(6):4099–4124, 2019.
  • [22] Qingda Wei and Xian Chen. Risk-sensitive average equilibria for discrete-time stochastic games. Dynamic Games and Applications, 9:521–549, 2019.
  • [23] Qingda Wei and Xian Chen. Nonzero-sum risk-sensitive average stochastic games: the case of unbounded costs. Dynamic games and Applications, 11:835–862, 2021.
  • [24] Peter Whittle. Risk-sensitive linear/quadratic/gaussian control. Advances in Applied Probability, 13(4):764–777, 1981.