Nonzero-sum Discrete-time Stochastic Games with Risk-sensitive Ergodic Cost Criterion
Abstract
In this paper we study infinite horizon nonzero-sum stochastic games for controlled discrete-time Markov chains on a Polish state space with risk-sensitive ergodic cost criterion. Under suitable assumptions we show that the associated ergodic optimality equations admit unique solutions. Finally, the existence of a Nash equilibrium in randomized stationary strategies is established by showing that an appropriate set-valued map has a fixed point.
Key Words: Nonzero-sum game, risk-sensitive ergodic cost criterion, optimality equations, Nash equilibrium, set-valued map
Mathematics Subject Classification: 91A15, 91A50
1 Introduction
We consider infinite horizon ergodic-cost risk-sensitive two-person nonzero-sum stochastic games for discrete-time Markov decision processes (MDPs) on a general state space with bounded cost functions. We first establish the existence of unique solutions to the corresponding optimality equations. This is used to establish the existence of optimal randomized stationary strategies for one player when the strategy of the other player is fixed. We then define a suitable topology on the space of randomized stationary strategies and, under certain separability assumptions, show that a suitably defined set-valued map has a fixed point. This yields the existence of a Nash equilibrium for the nonzero-sum game under consideration. The main step towards this end is to establish the upper semi-continuity of the defined set-valued map.
In a stochastic control problem, since the cost payable depends on the random evolution of the underlying stochastic process, the cost itself is random. So the most basic approach is to minimize the expected cost. This is the risk-neutral approach. But the obvious shortcoming of this approach is that it does not account for risk. In the risk-sensitive criterion, the expected value of the exponential of the cost is considered. Generally the risk associated with a random quantity is quantified by its standard deviation. The risk-sensitive criterion therefore provides significantly better protection against risk, since it captures the effects of the higher-order moments of the cost as well; see [24] for more details. The analysis of risk-sensitive control is technically more involved because of the exponential nature of the cost: accumulated costs are multiplicative over time, as opposed to additive in the risk-neutral case. Risk-sensitive control problems also have deep connections to robust control theory, the part of the control literature which deals with model uncertainty; see [24] and the references therein. The risk-sensitive criterion also arises naturally in portfolio optimization problems in mathematical finance; see [6] and the associated references. Nonzero-sum stochastic games arise naturally in strategic multi-agent decision-making settings such as socioeconomic systems [16], network security [1], routing and scheduling [19], and so on.
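Indeed, for a random total cost $C$ and a risk-sensitivity parameter $\theta>0$, a formal expansion of the cumulant generating function makes this implicit risk adjustment explicit:
$$\frac{1}{\theta}\,\ln E\big[e^{\theta C}\big]\ =\ E[C]\ +\ \frac{\theta}{2}\,\mathrm{Var}(C)\ +\ O(\theta^{2}),$$
so that minimizing the risk-sensitive cost penalizes not only the mean but also the variance (and, through the higher-order terms, all higher cumulants) of the accumulated cost.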
Following the seminal work of Howard and Matheson [15] there has been a lot of work on risk-sensitive control problems. For an up-to-date survey on ergodic risk-sensitive control problems where the underlying stochastic process is a discrete-time Markov chain, see [7]. In the multi-controller setup, discrete-time risk-sensitive zero-sum games on a countable state space have been studied in [2, 11]. In [4] the authors consider discrete-time zero-sum risk-sensitive stochastic games on a Borel state space for both the discounted and the ergodic cost criteria. Their analysis involves transforming the original risk-sensitive problem into a risk-neutral problem. Nonzero-sum risk-sensitive stochastic games with both discounted and ergodic cost criteria for a countable state space and bounded costs have been studied in [3, 22]. In [23] discrete-time nonzero-sum stochastic games under the risk-sensitive ergodic cost criterion with a countable state space and unbounded cost functions have been investigated. To the best of our knowledge, this is the first paper which considers risk-sensitive nonzero-sum games on a general state space. The analysis of nonzero-sum games on a general state space is significantly more involved than in the case of a countable state space. One of the reasons is that for a countable state space the topology on the space of randomized stationary controls is fairly simple to work with, whereas for a general state space it becomes a substantial challenge. Even in the risk-neutral setup the analysis is considerably more involved and requires additional assumptions. In the risk-neutral setup, discrete-time nonzero-sum ergodic cost games with a general state space have been studied in [10, 17, 21]. In [10], the authors prove the existence of a Nash equilibrium under an additive reward and additive transition (ARAT) assumption. In [17], the authors establish the existence of an ε-Nash equilibrium. In [21], the authors assume that the transition law is a combination of finitely many probability measures. In this work we also assume the ARAT condition.
In this paper we first consider the ergodic optimality equations which correspond to control problems with the strategy of one player fixed. We show the existence of unique solutions to these equations following a span contraction approach. For this analysis we make a certain ergodicity assumption along with a few extra assumptions on the state transition kernel; analogous assumptions appear in [8], where the authors consider risk-sensitive control problems with ergodic cost criterion. To establish the existence of a Nash equilibrium for the nonzero-sum game we first define an appropriate set-valued map and then apply Fan's fixed point theorem [9]. This involves showing that the defined set-valued map is upper semi-continuous under the appropriate topology.
The rest of the paper is organized as follows. Section 2 introduces the game model, some preliminaries and notation. In Section 3 we show the existence of a unique solution to the optimality equation using a certain span-norm contraction. The existence of a Nash equilibrium is shown in Section 4.
2 The Game Model
In this section, we present the discrete-time nonzero-sum stochastic game model and introduce the notation utilized throughout the paper. The discrete-time nonzero-sum stochastic game is described by the collection
$$\big(S,\ A,\ B,\ P(\cdot\,|\,x,a,b),\ c^{1},\ c^{2}\big),$$
where each component is described below.
• $S$ is the state space, assumed to be a Polish space endowed with its Borel $\sigma$-algebra $\mathcal{B}(S)$.
• $A$ and $B$ are the action spaces of players 1 and 2 respectively, assumed to be compact metric spaces. Let $\mathcal{B}(A)$ and $\mathcal{B}(B)$ denote the Borel $\sigma$-algebras on $A$ and $B$ respectively.
• $P(\cdot\,|\,x,a,b)$ is the transition kernel from $S\times A\times B$ to $\mathcal{P}(S)$, where for any metric space $Y$, $\mathcal{P}(Y)$ denotes the space of all probability measures on $Y$ with the topology of weak convergence.
• $c^{i}:S\times A\times B\to\mathbb{R}$, $i=1,2$, is the one-stage cost function for player $i$, assumed to be bounded and continuous on $S\times A\times B$. Since the cost is bounded, without loss of generality we let $0\le c^{i}\le 1$.
At each stage (time) $t\in\{0,1,2,\dots\}$ the players observe the current state $x\in S$ of the system and then players 1 and 2 independently choose actions $a\in A$ and $b\in B$ respectively. As a consequence two things happen:
(i) player $i$ pays an immediate cost $c^{i}(x,a,b)$, $i=1,2$;
(ii) the system moves to a new state $x'$ with the distribution $P(\cdot\,|\,x,a,b)$.
The whole process then repeats from the new state $x'$. The cost accumulates throughout the course of the game. The planning horizon is taken to be infinite and each player wants to minimize his/her infinite-horizon risk-sensitive cost, which in our case is defined by (2.1) below.
At each stage the players choose their actions independently on the basis of the available information. The information available for decision making at time $t$ is given by the history of the process up to that time,
$$h_t\ =\ (x_0,a_0,b_0,\dots,x_{t-1},a_{t-1},b_{t-1},x_t)\ \in\ H_t,$$
where $H_0:=S$ and $H_t:=(S\times A\times B)^{t}\times S$, $t\ge 1$, are the history spaces. The history spaces are endowed with the corresponding Borel $\sigma$-algebras. A strategy for player 1 is a sequence $\pi^1=\{\pi^1_t\}_{t\ge 0}$ of stochastic kernels $\pi^1_t:H_t\to\mathcal{P}(A)$. The set of all strategies for player 1 is denoted by $\Pi^1$. A strategy $\pi^1$ is called a Markov strategy if
$$\pi^1_t(\cdot\,|\,h_t)\ =\ \pi^1_t(\cdot\,|\,x_t)$$
for all $h_t\in H_t$, $t\ge 0$. Thus a Markov strategy for player 1 can be identified with a sequence $\{\mu_t\}$ of measurable maps $\mu_t:S\to\mathcal{P}(A)$. A Markov strategy $\{\mu_t\}$ is called a stationary strategy if $\mu_t=\mu$ for all $t$, for some measurable map $\mu:S\to\mathcal{P}(A)$. Let $\Pi^1_M$ and $\Pi^1_S$ denote the sets of Markov and stationary strategies for player 1, respectively. The strategies for player 2 are defined similarly. Let $\Pi^2$, $\Pi^2_M$ and $\Pi^2_S$ denote the sets of arbitrary, Markov and stationary strategies for player 2, respectively.
Given an initial distribution $\eta\in\mathcal{P}(S)$ and a pair of strategies $(\pi^1,\pi^2)\in\Pi^1\times\Pi^2$, the corresponding state and action processes $\{X_t\}$, $\{A_t\}$, $\{B_t\}$ are stochastic processes defined on the canonical space $\big((S\times A\times B)^{\infty},\,\mathcal{F},\,P^{\pi^1,\pi^2}_{\eta}\big)$ (where $\mathcal{F}$ is the product Borel $\sigma$-field on $(S\times A\times B)^{\infty}$) via the projections $X_t(\omega)=x_t$, $A_t(\omega)=a_t$, $B_t(\omega)=b_t$, where $P^{\pi^1,\pi^2}_{\eta}$ is uniquely determined by $\eta$, $\pi^1$, $\pi^2$ and $P$ by Ionescu Tulcea's theorem [5, Proposition 7.28]. When $\eta=\delta_x$ (the Dirac measure at $x\in S$), we simply write this probability measure as $P^{\pi^1,\pi^2}_{x}$.
Let the corresponding expectation operators with respect to the probability measures $P^{\pi^1,\pi^2}_{\eta}$ and $P^{\pi^1,\pi^2}_{x}$ be denoted by $E^{\pi^1,\pi^2}_{\eta}$ and $E^{\pi^1,\pi^2}_{x}$, respectively. Now from [13] we know that under any pair of Markov strategies, the corresponding state process $\{X_t\}$ is a Markov process.
Ergodic cost criterion: We now define the risk-sensitive ergodic cost criterion for the nonzero-sum discrete-time game. Let $\{(X_t,A_t,B_t)\}$ be the corresponding state-action process with $X_0=x$ and let $\theta>0$ be the risk-sensitivity parameter. For a pair of strategies $(\pi^1,\pi^2)\in\Pi^1\times\Pi^2$, the risk-sensitive ergodic cost criterion for player $i$, $i=1,2$, is given by
$$\rho^{i}(x,\pi^1,\pi^2)\ :=\ \limsup_{n\to\infty}\ \frac{1}{n}\,\ln E^{\pi^1,\pi^2}_{x}\Big[e^{\theta\sum_{t=0}^{n-1}c^{i}(X_t,A_t,B_t)}\Big]. \qquad (2.1)
Since the risk-sensitivity parameter remains the same throughout, we assume without loss of generality that $\theta=1$. Note that $\rho^{i}(x,\pi^1,\pi^2)$, $i=1,2$, are bounded, as our cost functions are bounded; in fact $0\le\rho^{i}\le 1$.
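When both players fix stationary strategies, the state process is a Markov chain, and on a finite state space the limit in (2.1) can be computed exactly: it is the logarithm of the Perron root of the cost-twisted kernel. The following minimal Python sketch illustrates this on a hypothetical two-state chain (all model data are illustrative and not part of the paper's general Polish-space setting):

import numpy as np

# Hypothetical 2-state toy model: both players' stationary strategies are
# already fixed, so the controlled chain reduces to a Markov chain with
# transition matrix P and a per-stage cost c(x) in [0, 1], say for player 1.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
c = np.array([0.2, 0.9])

# Cost-twisted kernel: Q(x, y) = exp(c(x)) * P(x, y).
Q = np.exp(c)[:, None] * P

# For this irreducible chain, the criterion (2.1) with theta = 1 equals the
# log of the spectral radius of Q, independently of the initial state x.
rho = np.log(np.max(np.abs(np.linalg.eigvals(Q))))

# Cross-check against (1/n) * log E_x[exp(sum_{t<n} c(X_t))] computed directly.
n = 2000
v = np.exp(c)                  # horizon-1 value: E_x[exp(c(X_0))]
for _ in range(n - 1):
    v = np.exp(c) * (P @ v)    # one more stage of accumulated cost
print(rho, np.log(v) / n)      # both entries of log(v)/n approach rho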
Nash equilibrium: A pair of strategies $(\pi^{1*},\pi^{2*})\in\Pi^1\times\Pi^2$ is called a Nash equilibrium (for the ergodic cost criterion) if
$$\rho^{1}(x,\pi^{1*},\pi^{2*})\ \le\ \rho^{1}(x,\pi^{1},\pi^{2*})\quad\text{for all }\pi^{1}\in\Pi^1,\ x\in S,$$
and
$$\rho^{2}(x,\pi^{1*},\pi^{2*})\ \le\ \rho^{2}(x,\pi^{1*},\pi^{2})\quad\text{for all }\pi^{2}\in\Pi^2,\ x\in S.$$
Our primary goal is to establish the existence of a Nash equilibrium in stationary strategies.
For $\mu\in\mathcal{P}(A)$ and $\nu\in\mathcal{P}(B)$, let us define the transition measures
$$P(dy\,|\,x,\mu,\nu)\ :=\ \int_B\int_A P(dy\,|\,x,a,b)\,\mu(da)\,\nu(db), \qquad (2.2)$$
together with the averaged costs $c^{i}(x,\mu,\nu):=\int_B\int_A c^{i}(x,a,b)\,\mu(da)\,\nu(db)$. For stationary strategies $\mu\in\Pi^1_S$ and $\nu\in\Pi^2_S$ we use the same notation, with $\mu(x)$ and $\nu(x)$ in place of $\mu$ and $\nu$.
Moreover, for $\mu\in\mathcal{P}(A)$, $\nu\in\mathcal{P}(B)$ and $i=1,2$, define
$$\hat P^{i}(dy\,|\,x,\mu,\nu)\ :=\ \int_B\int_A e^{c^{i}(x,a,b)}\,P(dy\,|\,x,a,b)\,\mu(da)\,\nu(db).$$
Obviously $\hat P^{i}(\cdot\,|\,x,\mu,\nu)$ is in general not a probability measure. The normalizing constant for $\hat P^{i}$ is given by
$$m^{i}(x,\mu,\nu)\ :=\ \hat P^{i}(S\,|\,x,\mu,\nu)\ =\ \int_B\int_A e^{c^{i}(x,a,b)}\,\mu(da)\,\nu(db). \qquad (2.3)$$
Since $0\le c^{i}\le 1$, the function $m^{i}$ is also bounded. More precisely, $1\le m^{i}(x,\mu,\nu)\le e$ for $i=1,2$ and for each $x\in S$, $\mu\in\mathcal{P}(A)$ and $\nu\in\mathcal{P}(B)$. Thus for $i=1,2$,
$$\tilde P^{i}(dy\,|\,x,\mu,\nu)\ :=\ \frac{\hat P^{i}(dy\,|\,x,\mu,\nu)}{m^{i}(x,\mu,\nu)} \qquad (2.4)$$
defines a probability transition kernel, and for bounded measurable $u$ we also use the notation $\tilde P^{i}u(x,\mu,\nu):=\int_S u(y)\,\tilde P^{i}(dy\,|\,x,\mu,\nu)$.
We will use the above transformations to convert our optimality equations (3.5) and (3.9) into a well-known form. This is beneficial because it helps us prove the existence of unique solutions to the optimality equations, as we will see in the next section.
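Concretely, with $m^{i}$ and $\tilde P^{i}$ as in (2.3)-(2.4), one has for every bounded measurable $u$ the elementary identity
$$\ln\Big[\int_S e^{u(y)}\,\hat P^{i}(dy\,|\,x,\mu,\nu)\Big]\ =\ \ln m^{i}(x,\mu,\nu)\ +\ \ln\Big[\int_S e^{u(y)}\,\tilde P^{i}(dy\,|\,x,\mu,\nu)\Big],$$
which splits the multiplicative expression into a bounded additive part (recall $0\le\ln m^{i}\le 1$) and a log-moment computed under a genuine probability kernel; it is this splitting that makes span-contraction arguments available.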
Define $B(S)$, the space of all real-valued bounded measurable functions on $S$, endowed with the supremum norm $\|u\|:=\sup_{x\in S}|u(x)|$. For a fixed $\nu\in\Pi^2_S$ and $u\in B(S)$ define the operator:
$$Tu(x)\ :=\ \inf_{\mu\in\mathcal{P}(A)}\ \ln\Big[\int_S e^{u(y)}\,\hat P^{1}(dy\,|\,x,\mu,\nu)\Big]. \qquad (2.5)$$
Due to the dual representation of the exponential certainty equivalent [12, Lemma 3.3] it is possible to write (2.5) as
$$Tu(x)\ =\ \inf_{\mu\in\mathcal{P}(A)}\Big[\ln m^{1}(x,\mu,\nu)\ +\ \sup_{\hat Q}\Big(\int_S u(y)\,\hat Q(dy)\ -\ I\big(\hat Q\,\big\|\,\tilde P^{1}(\cdot\,|\,x,\mu,\nu)\big)\Big)\Big], \qquad (2.6)$$
where the supremum is over all probability measures $\hat Q\in\mathcal{P}(S)$ and $I(\hat Q\,\|\,Q)$ is the relative entropy of the two probability measures, which is defined by
$$I(\hat Q\,\|\,Q)\ :=\ \int_S \ln\Big(\frac{d\hat Q}{dQ}\Big)\,d\hat Q$$
when $\hat Q\ll Q$, and $I(\hat Q\,\|\,Q):=+\infty$ otherwise. Note that the supremum in (2.6) is attained at the probability measure given by
$$\hat Q^{*}_{x,\mu,\nu}(D)\ :=\ \frac{\int_D e^{u(y)}\,\tilde P^{1}(dy\,|\,x,\mu,\nu)}{\int_S e^{u(y)}\,\tilde P^{1}(dy\,|\,x,\mu,\nu)} \qquad (2.7)$$
for measurable sets $D\subseteq S$. Obviously $\hat Q^{*}$ can be interpreted as a transition kernel.
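Since the variational formula (2.6) and the maximizer (2.7) are valid in particular on finite supports, they can be verified numerically; the following small Python sketch does so on an assumed five-point space (all data hypothetical):

import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.dirichlet(np.ones(n))      # a probability measure Q on n points
u = rng.uniform(0.0, 1.0, size=n)  # a bounded function u

# Left-hand side: log of the integral of e^u with respect to Q.
lhs = np.log(np.sum(np.exp(u) * Q))

# The twisted measure, i.e. the claimed maximizer (cf. (2.7)).
Qhat = np.exp(u) * Q / np.sum(np.exp(u) * Q)

def dv_objective(R, Q, u):
    # int u dR - I(R || Q), with the convention 0 * log 0 = 0.
    mask = R > 0
    return np.sum(u * R) - np.sum(R[mask] * np.log(R[mask] / Q[mask]))

# The twisted measure attains the supremum ...
print(np.isclose(lhs, dv_objective(Qhat, Q, u)))
# ... and no randomly drawn probability measure exceeds it.
for _ in range(1000):
    R = rng.dirichlet(np.ones(n))
    assert dv_objective(R, Q, u) <= lhs + 1e-9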
Let $u\in B(S)$. The span semi-norm of $u$ is defined as:
$$\|u\|_{sp}\ :=\ \sup_{x\in S}u(x)\ -\ \inf_{x\in S}u(x). \qquad (2.8)$$
In the last part of this section we make a couple of assumptions that will be in force throughout the rest of the paper. First we consider the following continuity assumption, which is quite standard in the literature; see [4, 8, 10, 18] for instance. It will allow us to show that the map defined in (2.5) is a contraction in the span semi-norm.
Assumption 2.1.
For a fixed $x\in S$, the transition measure $P(\cdot\,|\,x,a,b)$ is strongly continuous in $(a,b)$, i.e. for all bounded and measurable $u:S\to\mathbb{R}$ we have that $(a,b)\mapsto\int_S u(y)\,P(dy\,|\,x,a,b)$ is continuous on $A\times B$.
It follows from (2.2) that $P(\cdot\,|\,x,\mu,\nu)$ is also strongly continuous in $(\mu,\nu)$ for each fixed $x\in S$, which leads us to the following remark.
Remark 2.1.
Since for any metric space $Y$ the space $\mathcal{P}(Y)$ is endowed with the topology of weak convergence, it follows immediately from Assumption 2.1 that for all bounded and measurable functions $u:S\to\mathbb{R}$ and a fixed $x\in S$, the map $(\mu,\nu)\mapsto\int_S u(y)\,P(dy\,|\,x,\mu,\nu)$ is continuous on $\mathcal{P}(A)\times\mathcal{P}(B)$.
Next we have the following ergodicity assumption.
Assumption 2.2.
There exists a real number $0<\delta<1$ such that
$$\sup\,\big\|P(\cdot\,|\,x,a,b)-P(\cdot\,|\,x',a',b')\big\|_{TV}\ \le\ 2(1-\delta),$$
where the supremum is over all $x,x'\in S$, $a,a'\in A$, $b,b'\in B$, and $\|\cdot\|_{TV}$ denotes the total variation norm. Moreover, $P(U\,|\,x,a,b)>0$ for every $(x,a,b)\in S\times A\times B$ and any non-empty open set $U\subseteq S$.
3 Solution to the optimality equations
In this section, we demonstrate that the operator $T$ defined in (2.5) is a contraction. The fixed point of $T$ corresponds to the solution of the optimality equation for player 1. In the latter part of this section, we define another operator $\tilde T$ corresponding to player 2 and establish results analogous to those obtained for $T$.
Proposition 3.1.
Proof.
From the definition of $T$ it follows that $T$ transforms $B(S)$ into itself and that the infimum in (2.5) is attained. For given functions $u,w\in B(S)$ and $x\in S$, let $\mu^{w}_{x}\in\mathcal{P}(A)$ be a measure at which the infimum defining $Tw(x)$ is attained.
Then we obtain that
where the set $K$ comes from the Hahn-Jordan decomposition of the corresponding signed measure and $K^{c}$ denotes the complement of $K$. Now, taking the supremum over $x\in S$ in the above estimate we have
We claim that
(3.1)
Suppose (3.1) does not hold. Then there exist sequences such that
As and are probability measures therefore
and
Since for each and from (2.7) we get
we have
and
Consequently, using (2.4), direct calculations imply the claim (3.1). ∎
We will now make additional assumptions in order to show that $T$ is a global contraction in the span semi-norm.
Assumption 3.1.
There exists $\lambda\in\mathcal{P}(S)$ such that $P(\cdot\,|\,x,a,b)\ll\lambda$ for all $(x,a,b)\in S\times A\times B$. Also let $f(x,a,b,\cdot)$ be the Radon-Nikodym derivative of $P(\cdot\,|\,x,a,b)$ with respect to $\lambda$.
Assumption 3.2.
Lemma 3.1.
Proof.
Notice that for a $u\in B(S)$ we have
(3.3)
Now we get
In the above expression the first equality follows from Assumption 3.1, the first inequality follows from Assumption 3.2 and the last inequality follows from (2.3) and the bound $1\le m^{1}\le e$. So, from (3.3) we have
(3.4)
From (3.4) it follows that the contraction factor is strictly less than one. Hence $T$ is a global contraction in the span semi-norm on $B(S)$. ∎
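On a finite-state, finite-action approximation, the span contraction just established is precisely what makes relative value iteration converge. A minimal Python sketch, assuming a hypothetical toy model for player 1 in which player 2's fixed stationary strategy has already been absorbed into the transition and cost data (the log-form operator below is the finite analogue of (2.5)):

import numpy as np

# Hypothetical finite model: 3 states, 2 actions for player 1; P[x, a, y] is
# the transition law and c[x, a] in [0, 1] the stage cost, with player 2's
# stationary strategy already averaged out as in Section 2.
P = np.array([[[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
              [[0.3, 0.4, 0.3], [0.5, 0.2, 0.3]],
              [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]])
c = np.array([[0.1, 0.8], [0.5, 0.2], [0.9, 0.4]])

def T(u):
    # (T u)(x) = min_a [ c(x, a) + log sum_y P(x, a, y) * exp(u(y)) ].
    return np.min(c + np.log(P @ np.exp(u)), axis=1)

u = np.zeros(3)
for _ in range(500):
    Tu = T(u)
    u = Tu - Tu[0]            # normalize so that u(x0) = 0: span iteration
rho = T(u)[0]                 # ergodic value from the fixed point rho + u = T u
print(rho, np.max(np.abs(T(u) - rho - u)))   # residual of the optimality equation

Under the span contraction the iterates converge in the span semi-norm, and the pair (rho, u) approximates the solution of the optimality equation, mirroring the fixed-point argument used below.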
Before proceeding to the main theorem of this section, we briefly outline the main points. Suppose player 2 announces that he/she is going to employ a strategy $\nu\in\Pi^2_S$. In such a scenario, player 1 attempts to minimize $\rho^{1}(x,\pi^{1},\nu)$ over $\pi^{1}\in\Pi^1$. Thus for player 1 this is a discrete-time Markov decision problem with risk-sensitive ergodic cost. Player 2 faces the analogous situation when player 1 announces his/her strategy to be $\mu\in\Pi^1_S$. This leads us to the following theorem.
Theorem 3.1.
Proof.
Notice that (3.5) can be rewritten as
(3.7)
By Lemma 3.1, $T$ is a global contraction in the span semi-norm on $B(S)$, so it has a span fixed point $h\in B(S)$ (which is unique up to an additive constant), and $h$ together with the associated constant $\rho$ solves (3.7) and consequently (3.5).
Let $\bar h:=h-h(x_0)$. Then $\bar h(x_0)=0$ and it can easily be seen that $(\rho,\bar h)$ satisfies (3.7). Since $h\in B(S)$ and $h(x_0)$ is a constant, $\bar h\in B(S)$ as well.
Let $(\rho',h')$ be another solution of (3.5), i.e., it satisfies (3.5) with $h'(x_0)=0$. Then clearly $h'$ is also a span fixed point of $T$. Hence $\|h'-\bar h\|_{sp}=0$. Since $h'(x_0)=\bar h(x_0)=0$, it follows that $h'=\bar h$. It then easily follows that $\rho'=\rho$.
The proof of the remaining part is analogous to the proof of [12, Theorem 2.1], which is carried out for a countable state space but extends easily to our general state space setting. ∎
For a fixed $\mu\in\Pi^1_S$ and $u\in B(S)$ define the operator:
$$\tilde Tu(x)\ :=\ \inf_{\nu\in\mathcal{P}(B)}\ \ln\Big[\int_S e^{u(y)}\,\hat P^{2}(dy\,|\,x,\mu,\nu)\Big]. \qquad (3.8)$$
By similar arguments we can also show that $\tilde T$ is a global contraction in the span semi-norm on $B(S)$, and the following theorem holds true.
Theorem 3.2.
Suppose Assumptions 2.1, 2.2, 3.1 and 3.2 are satisfied. Then for a fixed $\mu\in\Pi^1_S$, there exists a unique solution pair $(\rho_2,h_2)\in\mathbb{R}\times B(S)$ with $h_2(x_0)=0$ (where $x_0\in S$ is some fixed state), satisfying
$$\rho_2 + h_2(x)\ =\ \inf_{\nu\in\mathcal{P}(B)}\ \ln\Big[\int_S e^{h_2(y)}\,\hat P^{2}(dy\,|\,x,\mu,\nu)\Big],\qquad x\in S. \qquad (3.9)$$
In addition, a strategy $\nu^{*}\in\Pi^2_S$ is an optimal strategy of player 2, given that player 1 chooses $\mu$, if and only if $\nu^{*}(x)$ attains the point-wise minimum in (3.9) for every $x\in S$. Moreover,
$$\rho_2\ =\ \inf_{\pi^{2}\in\Pi^2}\rho^{2}(x,\mu,\pi^{2})\ =\ \rho^{2}(x,\mu,\nu^{*})\qquad\text{for all }x\in S. \qquad (3.10)$$
4 Existence of Nash equilibrium
In this section we establish the existence of a pair of stationary equilibrium strategies for the nonzero-sum game. To this end we first outline a standard procedure for establishing the existence of a Nash equilibrium. From Theorems 3.1 and 3.2 it follows that, given that player 2 is using a strategy $\nu\in\Pi^2_S$, we can find an optimal response $\mu^{*}\in\Pi^1_S$ for player 1. Clearly $\mu^{*}$ depends on $\nu$, and moreover there may be several optimal responses for player 1 in $\Pi^1_S$. Analogous results hold for player 2 if player 1 announces that he/she is going to use a strategy $\mu\in\Pi^1_S$. Hence, given a pair of strategies $(\mu,\nu)\in\Pi^1_S\times\Pi^2_S$, we can find a set of pairs of optimal responses via the appropriate pair of optimality equations described above. This defines a set-valued map, and clearly any fixed point of this set-valued map is a Nash equilibrium; a toy numerical illustration follows.
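As an illustration of this fixed-point viewpoint (and not of the paper's argument, which works on a Polish space via Fan's theorem), one can brute-force a small finite game in Python: compute each player's risk-sensitive ergodic cost for every pair of pure stationary strategies via the log-Perron-root formula used earlier, and test the Nash property directly. All model data below are hypothetical, and the search is restricted to pure stationary pairs, among which an equilibrium need not exist:

import numpy as np
from itertools import product

# Hypothetical toy game: 2 states, 2 actions per player. P[x, a, b] is a
# distribution over next states; c1 and c2 are the players' stage costs.
rng = np.random.default_rng(1)
nS, nA, nB = 2, 2, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nB))   # shape (nS, nA, nB, nS)
c1 = rng.uniform(0, 1, (nS, nA, nB))
c2 = rng.uniform(0, 1, (nS, nA, nB))

def ergodic_cost(f, g, c):
    # Risk-sensitive ergodic cost under the pure stationary pair (f, g):
    # log spectral radius of the twisted kernel exp(c(x)) * P(x, .).
    Pf = np.array([P[x, f[x], g[x]] for x in range(nS)])
    cf = np.array([c[x, f[x], g[x]] for x in range(nS)])
    Q = np.exp(cf)[:, None] * Pf
    return np.log(np.max(np.abs(np.linalg.eigvals(Q))))

pures1 = list(product(range(nA), repeat=nS))   # pure stationary strategies
pures2 = list(product(range(nB), repeat=nS))
for f, g in product(pures1, pures2):
    best1 = all(ergodic_cost(f, g, c1) <= ergodic_cost(f2, g, c1) + 1e-12
                for f2 in pures1)
    best2 = all(ergodic_cost(f, g, c2) <= ergodic_cost(f, g2, c2) + 1e-12
                for g2 in pures2)
    if best1 and best2:
        print("Nash pair (pure, stationary):", f, g)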
To ensure the existence of a Nash equilibrium, we first make the following separability assumptions.
Assumption 4.1.
There exist two substochastic kernels $P_1(dy\,|\,x,a)$ and $P_2(dy\,|\,x,b)$ such that
$$P(dy\,|\,x,a,b)\ =\ P_1(dy\,|\,x,a)\ +\ P_2(dy\,|\,x,b),\qquad (x,a,b)\in S\times A\times B.$$
Since $P(\cdot\,|\,x,a,b)\ll\lambda$, we have $P_1(\cdot\,|\,x,a)\ll\lambda$ and $P_2(\cdot\,|\,x,b)\ll\lambda$. Let $f_1$ and $f_2$ be the respective densities. We assume that for each $(x,y)\in S\times S$, $f_1(x,a,y)$ and $f_2(x,b,y)$ are continuous in $a$ and in $b$, respectively.
Assumption 4.2.
The cost functions $c^{i}$, $i=1,2$, are separable in the action variables, i.e., there exist bounded functions $c^{i}_1:S\times A\to\mathbb{R}$ and $c^{i}_2:S\times B\to\mathbb{R}$, continuous in the second variable, such that
$$c^{i}(x,a,b)\ =\ c^{i}_1(x,a)\ +\ c^{i}_2(x,b).$$
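For instance (an illustrative family only, not required by the results), Assumptions 4.1 and 4.2 hold whenever the model is a mixture of two single-controller models:
$$P(dy\,|\,x,a,b)\ =\ \alpha\,Q_1(dy\,|\,x,a)\ +\ (1-\alpha)\,Q_2(dy\,|\,x,b),\qquad c^{i}(x,a,b)\ =\ c^{i}_1(x,a)\ +\ c^{i}_2(x,b),$$
where $\alpha\in(0,1)$, $Q_1$ and $Q_2$ are stochastic kernels (so that $P_1:=\alpha Q_1$ and $P_2:=(1-\alpha)Q_2$ are substochastic), and the $c^{i}_j$ are bounded and continuous in the action variable.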
Following [14] and [18] we topologize the spaces $\Pi^1_S$ and $\Pi^2_S$ with the topology of relaxed controls introduced in [20]. We identify two elements of $\Pi^1_S$ if they agree $\lambda$-a.e. (where $\lambda$ is as in Assumption 3.1). Let $\mathcal{G}$ be the set of all functions $g:S\times A\to\mathbb{R}$ such that $g$ is measurable in the first argument and continuous in the second, and there exists $h_g\in L^{1}(\lambda)$ such that $|g(x,a)|\le h_g(x)$ for every $(x,a)\in S\times A$.
Then $\mathcal{G}$ is a Banach space with the norm [20]
$$\|g\|_{\mathcal{G}}\ :=\ \int_S\sup_{a\in A}|g(x,a)|\,\lambda(dx).$$
Every $\mu\in\Pi^1_S$ (with the $\lambda$-a.e. equivalence relation) can be identified with the element $\Lambda_\mu\in\mathcal{G}^{*}$ (the dual of $\mathcal{G}$) defined as
$$\Lambda_\mu(g)\ :=\ \int_S\int_A g(x,a)\,\mu(x)(da)\,\lambda(dx),\qquad g\in\mathcal{G}.$$
Thus $\Pi^1_S$ can be identified with a subset of $\mathcal{G}^{*}$. Equip $\Pi^1_S$ with the weak-star topology. Then it can be shown as in [18] that $\Pi^1_S$ is compact and metrizable. The set $\Pi^2_S$ can be topologized analogously.
Next, we present the following lemmas, which play a pivotal role in showing the upper semi-continuity of the specific set-valued map mentioned earlier.
Lemma 4.1.
Let $\nu_n\to\nu$ in the weak-star topology. Then under Assumption 4.2, for every $u\in L^{1}(\lambda)$, $\mu\in\mathcal{P}(A)$ and $i=1,2$,
$$\int_S u(x)\,c^{i}(x,\mu,\nu_n)\,\lambda(dx)\ \longrightarrow\ \int_S u(x)\,c^{i}(x,\mu,\nu)\,\lambda(dx)\quad\text{as }n\to\infty.$$
Proof.
We have, for $i=1,2$,
Now by Assumption 4.2 and since $\nu_n\to\nu$ in the weak-star topology, the result is immediate. ∎
Lemma 4.2.
Proof.
Note that
(4.1)
We claim that
(4.2)
Observe that under Assumptions 4.1 and 4.2, and using (2.4), we get
By Lemma 4.1 and Assumption 4.1, (4.2) holds true, i.e. the second term on the right-hand side of (4.1) goes to zero as $n\to\infty$. Now we show that the first term also goes to zero.
Again note that,
where . From the compactness of and the continuity of , it follows that for
for some sequences. We now prove that this term goes to zero as $n\to\infty$.
Since $A$ and $B$ are compact, without loss of generality we can assume that
Note that, for each , we have
(4.3)
Moreover,
By Assumption 4.1 and the boundedness of the integrand, the last inequality shows that the first term on the right-hand side of (4.3) goes to zero as $n\to\infty$. Since $\nu$ is a weak-star limit of $\{\nu_n\}$, the second term on the right-hand side of (4.3) also goes to zero as $n\to\infty$. Thus we have established the desired convergence. Hence the result follows. ∎
Lemma 4.3.
Let $\{h_n\}$ be any sequence in $B(S)$ of solutions to the above optimality equations with $h_n(x_0)=0$ for all $n$. Then $\{h_n\}$ is uniformly bounded.
Proof.
Remark 4.1.
Since the exponential and logarithmic functions are increasing, the operators $T$ and $\tilde T$ also admit the following equivalent expressions:
Next set
Proof.
From Remark 2.1, we know that the map in question is continuous on $\mathcal{P}(A)$ for each fixed $x\in S$. As $\mathcal{P}(A)$ is compact, it is easy to see that the set of minimizers is non-empty. Take any sequence in this set; as $\mathcal{P}(A)$ is compact, it has a convergent subsequence (denoted by the same sequence by an abuse of notation). Now for any
(4.6)
Using Lemmas 4.1 and 4.2, from (4.6) we get, for any
Hence the limit belongs to the set, and therefore the set is closed. Since it is a closed subset of a compact metric space, it is also compact. Using Remark 4.1, convexity follows easily. By analogous arguments, the corresponding set for player 2 is also a non-empty compact convex subset. Hence the product set is a non-empty compact convex subset. ∎
The next lemma proves the upper semi-continuity of a certain set-valued map. This result will be useful in establishing the existence of a Nash equilibrium in the space of stationary Markov strategies.
Proof.
Consider the convergent subsequences involved (denoted by the same sequences by an abuse of notation). Since the corresponding sequence of solutions is bounded, without loss of generality we may assume that it also converges in the weak-star sense. Then, since
(4.7)
Using Lemmas 4.1, 4.2 and 4.3, it follows that
(4.8)
From (4.7) for any we get
Again using Lemmas 4.1, 4.2 and 4.3, it follows that
(4.9)
Let . Then from (4.9) we get, for any
(4.10)
and from (4.8) we get
(4.11)
Since (4.10) holds for every , from (4.10) and (4.11) we get
(4.12)
with the normalization fixed above. Now by Theorem 3.1 we can say that (4.12) has a unique solution (corresponding to the limiting strategy) satisfying this normalization. Thus, from (4.11) and (4.12) the required identification follows.
Suppose, along a suitable subsequence, the analogous convergences hold. Then by similar arguments one can draw the same conclusion. This proves the required inclusion; hence the map is upper semi-continuous. ∎
We are now in a position to establish the existence of a Nash equilibrium, which follows directly from Fan's fixed point theorem [9].
Theorem 4.1.
References
- [1] Tansu Alpcan and Tamer Başar. Network Security: A Decision and Game-theoretic Approach. Cambridge University Press, Cambridge, 2011.
- [2] Arnab Basu and Mrinal K Ghosh. Zero-sum risk-sensitive stochastic games on a countable state space. Stochastic Process. Appl., 124(1):961–983, 2014.
- [3] Arnab Basu and Mrinal K Ghosh. Nonzero-sum risk-sensitive stochastic games on a countable state space. Mathematics of Operations Research, 43(2):516–532, 2018.
- [4] N. Bäuerle and U. Rieder. Zero-sum risk-sensitive stochastic games. Stochastic Process. Appl., 127:622–642, 2017.
- [5] Dimitri Bertsekas and Steven E Shreve. Stochastic optimal control: the discrete-time case, volume 5. Athena Scientific, 1996.
- [6] Tomasz R. Bielecki, Stanley R. Pliska, and Shuenn-Jyi Sheu. Risk sensitive portfolio management with Cox-Ingersoll-Ross interest rates: the HJB equation. SIAM J. Control Optim., 44(5):1811–1843, 2005.
- [7] Anup Biswas and Vivek S. Borkar. Ergodic risk-sensitive control—a survey. Annu. Rev. Control, 55:118–141, 2023.
- [8] Giovanni B. Di Masi and Lukasz Stettner. Risk-sensitive control of discrete-time Markov processes with infinite horizon. SIAM Journal on Control and Optimization, 38(1):61–78, 1999.
- [9] Ky Fan. Fixed-point and minimax theorems in locally convex topological linear spaces. Proceedings of the National Academy of Sciences, 38(2):121–126, 1952.
- [10] Mrinal K Ghosh and Arunabha Bagchi. Stochastic games with average payoff criterion. Applied Mathematics and Optimization, 38:283–301, 1998.
- [11] Mrinal K Ghosh, Subrata Golui, Chandan Pal, and Somnath Pradhan. Discrete-time zero-sum games for Markov chains with risk-sensitive average cost criterion. Stochastic Processes and their Applications, 158:40–74, 2023.
- [12] Daniel Hernández-Hernández and Steven I Marcus. Risk sensitive control of Markov processes in countable state space. Systems & Control Letters, 29(3):147–155, 1996.
- [13] Onésimo Hernández-Lerma. Adaptive Markov control processes, volume 79. Springer Science & Business Media, 2012.
- [14] CJ Himmelberg, Thiruvenkatachari Parthasarathy, TES Raghavan, and FS Van Vleck. Existence of p-equilibrium and optimal stationary strategies in stochastic games. Proceedings of the American Mathematical Society, 60(1):245–251, 1976.
- [15] Ronald A. Howard and James E. Matheson. Risk-sensitive Markov decision processes. Management Sci., 18:356–369, 1971/72.
- [16] Matthew O. Jackson. Social and economic networks. Princeton University Press, Princeton, NJ, 2008.
- [17] Andrzej S. Nowak and Eitan Altman. ε-equilibria for stochastic games with uncountable state space and unbounded costs. SIAM J. Control Optim., 40(6):1821–1839, 2002.
- [18] T Parthasarathy. Existence of equilibrium stationary strategies in discounted stochastic games. Sankhyā: The Indian Journal of Statistics, Series A, pages 114–127, 1982.
- [19] Tim Roughgarden. Twenty Lectures on Algorithmic Game Theory. Cambridge University Press, 2016.
- [20] Jack Warga. Functions of relaxed controls. SIAM Journal on Control, 5(4):628–641, 1967.
- [21] Qingda Wei and Xian Chen. Nonzero-sum expected average discrete-time stochastic games: the case of uncountable spaces. SIAM J. Control Optim., 57(6):4099–4124, 2019.
- [22] Qingda Wei and Xian Chen. Risk-sensitive average equilibria for discrete-time stochastic games. Dynamic Games and Applications, 9:521–549, 2019.
- [23] Qingda Wei and Xian Chen. Nonzero-sum risk-sensitive average stochastic games: the case of unbounded costs. Dynamic Games and Applications, 11:835–862, 2021.
- [24] Peter Whittle. Risk-sensitive linear/quadratic/gaussian control. Advances in Applied Probability, 13(4):764–777, 1981.