Nonzero-sum Discrete-time Stochastic Games with Risk-sensitive Ergodic Cost Criterion
Abstract
In this paper we study infinite horizon nonzero-sum stochastic games for controlled discrete-time Markov chains on a Polish state space with risk-sensitive ergodic cost criterion. Under suitable assumptions we show that the associated ergodic optimality equations admit unique solutions. Finally, the existence of a Nash equilibrium in randomized stationary strategies is established by showing that an appropriate set-valued map has a fixed point.
Key Words: Nonzero-sum game, risk-sensitive ergodic cost criterion, optimality equations, Nash equilibrium, set-valued map
Mathematics Subject Classification: 91A15, 91A50
1 Introduction
We consider infinite horizon ergodic-cost risk-sensitive two-person nonzero-sum stochastic games for discrete-time Markov decision processes (MDPs) on a general state space with bounded cost functions. We first establish the existence of unique solutions to the corresponding optimality equations. This is used to establish the existence of optimal randomized stationary strategies for one player when the strategy of the other player is fixed. We then define a suitable topology on the space of randomized stationary strategies and, under certain separability assumptions, show that a suitably defined set-valued map has a fixed point. This yields the existence of a Nash equilibrium for the nonzero-sum game under consideration. The main step towards this end is to establish the upper semi-continuity of the defined set-valued map.
In a stochastic control problem, since the cost payable depends on the random evolution of the underlying stochastic process, the cost itself is random. So the most basic approach is to minimize the expected cost. This is the risk-neutral approach. But the obvious shortcoming of this approach is that it does not account for risk. In the risk-sensitive criterion, the expected value of the exponential of the cost is considered. Generally the risk associated with a random quantity is quantified by its standard deviation. The risk-sensitive criterion therefore provides significantly better protection against risk, since it captures the effects of the higher-order moments of the cost as well; see [24] for more details. The analysis of risk-sensitive control is technically more involved because of the exponential nature of the cost: accumulated costs are multiplicative over time, as opposed to additive in the risk-neutral case. Risk-sensitive control problems also have deep connections to robust control theory, the part of the control literature which deals with model uncertainty; see [24] and the references therein. The risk-sensitive criterion also arises naturally in portfolio optimization problems in mathematical finance; see [6] and the associated references. Nonzero-sum stochastic games arise naturally in strategic multi-agent decision-making settings such as socioeconomic systems [16], network security [1], routing and scheduling [19], and so on.
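Indeed, for a random total cost $C$ and a risk-sensitivity parameter $\theta>0$, a formal expansion of the cumulant generating function makes this implicit risk adjustment explicit:
$$\frac{1}{\theta}\,\ln E\big[e^{\theta C}\big]\ =\ E[C]\ +\ \frac{\theta}{2}\,\mathrm{Var}(C)\ +\ O(\theta^{2}),$$
so that minimizing the risk-sensitive cost penalizes not only the mean but also the variance (and, through the higher-order terms, all higher cumulants) of the accumulated cost.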
Following the seminal work of Howard and Matheson [15] there has been a lot of work on risk-sensitive control problems. For an up-to-date survey on ergodic risk-sensitive control problems where the underlying stochastic process is a discrete-time Markov chain, see [7]. In the multi-controller setup, discrete-time risk-sensitive zero-sum games on a countable state space have been studied in [2, 11]. In [4] the authors consider discrete-time zero-sum risk-sensitive stochastic games on a Borel state space for both the discounted and the ergodic cost criteria. Their analysis involves transforming the original risk-sensitive problem into a risk-neutral problem. Nonzero-sum risk-sensitive stochastic games with both discounted and ergodic cost criteria for a countable state space and bounded costs have been studied in [3, 22]. In [23] discrete-time nonzero-sum stochastic games under the risk-sensitive ergodic cost criterion with a countable state space and unbounded cost functions have been investigated. To the best of our knowledge, this is the first paper which considers risk-sensitive nonzero-sum games on a general state space. The analysis of nonzero-sum games on a general state space is significantly more involved than in the case of a countable state space. One of the reasons is that for a countable state space the topology on the space of randomized stationary controls is fairly simple to work with, whereas for a general state space it becomes a substantial challenge. Even in the risk-neutral setup the analysis is considerably more involved and requires additional assumptions. In the risk-neutral setup, discrete-time nonzero-sum ergodic cost games with a general state space have been studied in [10, 17, 21]. In [10], the authors prove the existence of a Nash equilibrium under an additive reward and additive transition (ARAT) assumption. In [17], the authors establish the existence of an ε-Nash equilibrium. In [21], the authors assume that the transition law is a combination of finitely many probability measures. In this work we also assume the ARAT condition.
In this paper we first consider the ergodic optimality equations which correspond to control problems with the strategy of one player fixed. We show the existence of unique solutions to these equations following a span contraction approach. For this analysis we make a certain ergodicity assumption along with a few extra assumptions on the state transition kernel; analogous assumptions appear in [8], where the authors consider risk-sensitive control problems with ergodic cost criterion. To establish the existence of a Nash equilibrium for the nonzero-sum game we first define an appropriate set-valued map and then apply Fan's fixed point theorem [9]. This involves showing that the defined set-valued map is upper semi-continuous under the appropriate topology.
The rest of the paper is organized as follows. Section 2 introduces the game model, some preliminaries and notation. In Section 3 we show the existence of a unique solution to the optimality equation using a certain span-norm contraction. The existence of a Nash equilibrium is shown in Section 4.
2 The Game Model
In this section, we present the discrete-time nonzero-sum stochastic game model and introduce the notation utilized throughout the paper. The discrete-time nonzero-sum stochastic game is described by the collection
$$\big(S,\ A,\ B,\ P(\cdot\,|\,x,a,b),\ c^{1},\ c^{2}\big),$$
where each component is described below.
• $S$ is the state space, assumed to be a Polish space endowed with its Borel $\sigma$-algebra $\mathcal{B}(S)$.
• $A$ and $B$ are the action spaces of players 1 and 2 respectively, assumed to be compact metric spaces. Let $\mathcal{B}(A)$ and $\mathcal{B}(B)$ denote the Borel $\sigma$-algebras on $A$ and $B$ respectively.
• $P(\cdot\,|\,x,a,b)$ is the transition kernel from $S\times A\times B$ to $\mathcal{P}(S)$, where for any metric space $Y$, $\mathcal{P}(Y)$ denotes the space of all probability measures on $Y$ with the topology of weak convergence.
• $c^{i}:S\times A\times B\to\mathbb{R}$, $i=1,2$, is the one-stage cost function for player $i$, assumed to be bounded and continuous on $S\times A\times B$. Since the cost is bounded, without loss of generality we let $0\le c^{i}\le 1$.
At each stage (time) $t\in\{0,1,2,\dots\}$ the players observe the current state $x\in S$ of the system and then players 1 and 2 independently choose actions $a\in A$ and $b\in B$ respectively. As a consequence two things happen:
(i) player $i$ pays an immediate cost $c^{i}(x,a,b)$, $i=1,2$;
(ii) the system moves to a new state $x'$ with the distribution $P(\cdot\,|\,x,a,b)$.
The whole process then repeats from the new state $x'$. The cost accumulates throughout the course of the game. The planning horizon is taken to be infinite and each player wants to minimize his/her infinite-horizon risk-sensitive cost, which in our case is defined by (2.1) below.
At each stage the players choose their actions independently on the basis of the available information. The information available for decision making at time $t$ is given by the history of the process up to that time,
$$h_t\ =\ (x_0,a_0,b_0,\dots,x_{t-1},a_{t-1},b_{t-1},x_t)\ \in\ H_t,$$
where $H_0:=S$ and $H_t:=(S\times A\times B)^{t}\times S$, $t\ge 1$, are the history spaces. The history spaces are endowed with the corresponding Borel $\sigma$-algebras. A strategy for player 1 is a sequence $\pi^1=\{\pi^1_t\}_{t\ge 0}$ of stochastic kernels $\pi^1_t:H_t\to\mathcal{P}(A)$. The set of all strategies for player 1 is denoted by $\Pi^1$. A strategy $\pi^1$ is called a Markov strategy if
$$\pi^1_t(\cdot\,|\,h_t)\ =\ \pi^1_t(\cdot\,|\,x_t)$$
for all $h_t\in H_t$, $t\ge 0$. Thus a Markov strategy for player 1 can be identified with a sequence $\{\mu_t\}$ of measurable maps $\mu_t:S\to\mathcal{P}(A)$. A Markov strategy $\{\mu_t\}$ is called a stationary strategy if $\mu_t=\mu$ for all $t$, for some measurable map $\mu:S\to\mathcal{P}(A)$. Let $\Pi^1_M$ and $\Pi^1_S$ denote the sets of Markov and stationary strategies for player 1, respectively. The strategies for player 2 are defined similarly. Let $\Pi^2$, $\Pi^2_M$ and $\Pi^2_S$ denote the sets of arbitrary, Markov and stationary strategies for player 2, respectively.
Given an initial distribution $\eta\in\mathcal{P}(S)$ and a pair of strategies $(\pi^1,\pi^2)\in\Pi^1\times\Pi^2$, the corresponding state and action processes $\{X_t\}$, $\{A_t\}$, $\{B_t\}$ are stochastic processes defined on the canonical space $\big((S\times A\times B)^{\infty},\,\mathcal{F},\,P^{\pi^1,\pi^2}_{\eta}\big)$ (where $\mathcal{F}$ is the product Borel $\sigma$-field on $(S\times A\times B)^{\infty}$) via the projections $X_t(\omega)=x_t$, $A_t(\omega)=a_t$, $B_t(\omega)=b_t$, where $P^{\pi^1,\pi^2}_{\eta}$ is uniquely determined by $\eta$, $\pi^1$, $\pi^2$ and $P$ by Ionescu Tulcea's theorem [5, Proposition 7.28]. When $\eta=\delta_x$ (the Dirac measure at $x\in S$), we simply write this probability measure as $P^{\pi^1,\pi^2}_{x}$.
Let the corresponding expectation operators with respect to the probability measures $P^{\pi^1,\pi^2}_{\eta}$ and $P^{\pi^1,\pi^2}_{x}$ be denoted by $E^{\pi^1,\pi^2}_{\eta}$ and $E^{\pi^1,\pi^2}_{x}$, respectively. Now from [13] we know that under any pair of Markov strategies, the corresponding state process $\{X_t\}$ is a Markov process.
Ergodic cost criterion: We now define the risk-sensitive ergodic cost criterion for the nonzero-sum discrete-time game. Let $\{(X_t,A_t,B_t)\}$ be the corresponding state-action process with $X_0=x$ and let $\theta>0$ be the risk-sensitivity parameter. For a pair of strategies $(\pi^1,\pi^2)\in\Pi^1\times\Pi^2$, the risk-sensitive ergodic cost criterion for player $i$, $i=1,2$, is given by
$$\rho^{i}(x,\pi^1,\pi^2)\ :=\ \limsup_{n\to\infty}\ \frac{1}{n}\,\ln E^{\pi^1,\pi^2}_{x}\Big[e^{\theta\sum_{t=0}^{n-1}c^{i}(X_t,A_t,B_t)}\Big]. \qquad (2.1)
Since the risk-sensitivity parameter remains the same throughout, we assume without loss of generality that $\theta=1$. Note that $\rho^{i}(x,\pi^1,\pi^2)$, $i=1,2$, are bounded, as our cost functions are bounded; in fact $0\le\rho^{i}\le 1$.
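When both players fix stationary strategies, the state process is a Markov chain, and on a finite state space the limit in (2.1) can be computed exactly: it is the logarithm of the Perron root of the cost-twisted kernel. The following minimal Python sketch illustrates this on a hypothetical two-state chain (all model data are illustrative and not part of the paper's general Polish-space setting):

import numpy as np

# Hypothetical 2-state toy model: both players' stationary strategies are
# already fixed, so the controlled chain reduces to a Markov chain with
# transition matrix P and a per-stage cost c(x) in [0, 1], say for player 1.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
c = np.array([0.2, 0.9])

# Cost-twisted kernel: Q(x, y) = exp(c(x)) * P(x, y).
Q = np.exp(c)[:, None] * P

# For this irreducible chain, the criterion (2.1) with theta = 1 equals the
# log of the spectral radius of Q, independently of the initial state x.
rho = np.log(np.max(np.abs(np.linalg.eigvals(Q))))

# Cross-check against (1/n) * log E_x[exp(sum_{t<n} c(X_t))] computed directly.
n = 2000
v = np.exp(c)                  # horizon-1 value: E_x[exp(c(X_0))]
for _ in range(n - 1):
    v = np.exp(c) * (P @ v)    # one more stage of accumulated cost
print(rho, np.log(v) / n)      # both entries of log(v)/n approach rho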
Nash equilibrium: A pair of strategies $(\pi^{1*},\pi^{2*})\in\Pi^1\times\Pi^2$ is called a Nash equilibrium (for the ergodic cost criterion) if
$$\rho^{1}(x,\pi^{1*},\pi^{2*})\ \le\ \rho^{1}(x,\pi^{1},\pi^{2*})\quad\text{for all }\pi^{1}\in\Pi^1,\ x\in S,$$
and
$$\rho^{2}(x,\pi^{1*},\pi^{2*})\ \le\ \rho^{2}(x,\pi^{1*},\pi^{2})\quad\text{for all }\pi^{2}\in\Pi^2,\ x\in S.$$
Our primary goal is to establish the existence of a Nash equilibrium in stationary strategies.
For $\mu\in\mathcal{P}(A)$ and $\nu\in\mathcal{P}(B)$, let us define the transition measures
$$P(dy\,|\,x,\mu,\nu)\ :=\ \int_B\int_A P(dy\,|\,x,a,b)\,\mu(da)\,\nu(db), \qquad (2.2)$$
together with the averaged costs $c^{i}(x,\mu,\nu):=\int_B\int_A c^{i}(x,a,b)\,\mu(da)\,\nu(db)$. For stationary strategies $\mu\in\Pi^1_S$ and $\nu\in\Pi^2_S$ we use the same notation, with $\mu(x)$ and $\nu(x)$ in place of $\mu$ and $\nu$.
Moreover, for $\mu\in\mathcal{P}(A)$, $\nu\in\mathcal{P}(B)$ and $i=1,2$, define
$$\hat P^{i}(dy\,|\,x,\mu,\nu)\ :=\ \int_B\int_A e^{c^{i}(x,a,b)}\,P(dy\,|\,x,a,b)\,\mu(da)\,\nu(db).$$
Obviously $\hat P^{i}(\cdot\,|\,x,\mu,\nu)$ is in general not a probability measure. The normalizing constant for $\hat P^{i}$ is given by
$$m^{i}(x,\mu,\nu)\ :=\ \hat P^{i}(S\,|\,x,\mu,\nu)\ =\ \int_B\int_A e^{c^{i}(x,a,b)}\,\mu(da)\,\nu(db). \qquad (2.3)$$
Since $0\le c^{i}\le 1$, the function $m^{i}$ is also bounded. More precisely, $1\le m^{i}(x,\mu,\nu)\le e$ for $i=1,2$ and for each $x\in S$, $\mu\in\mathcal{P}(A)$ and $\nu\in\mathcal{P}(B)$. Thus for $i=1,2$,
$$\tilde P^{i}(dy\,|\,x,\mu,\nu)\ :=\ \frac{\hat P^{i}(dy\,|\,x,\mu,\nu)}{m^{i}(x,\mu,\nu)} \qquad (2.4)$$
defines a probability transition kernel, and for bounded measurable $u$ we also use the notation $\tilde P^{i}u(x,\mu,\nu):=\int_S u(y)\,\tilde P^{i}(dy\,|\,x,\mu,\nu)$.
We will use the above transformations to convert our optimality equations (3.5) and (3.9) into a well-known form. This is beneficial because it helps us prove the existence of unique solutions to the optimality equations, as we will see in the next section.
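Concretely, with $m^{i}$ and $\tilde P^{i}$ as in (2.3)-(2.4), one has for every bounded measurable $u$ the elementary identity
$$\ln\Big[\int_S e^{u(y)}\,\hat P^{i}(dy\,|\,x,\mu,\nu)\Big]\ =\ \ln m^{i}(x,\mu,\nu)\ +\ \ln\Big[\int_S e^{u(y)}\,\tilde P^{i}(dy\,|\,x,\mu,\nu)\Big],$$
which splits the multiplicative expression into a bounded additive part (recall $0\le\ln m^{i}\le 1$) and a log-moment computed under a genuine probability kernel; it is this splitting that makes span-contraction arguments available.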
Define $B(S)$, the space of all real-valued bounded measurable functions on $S$, endowed with the supremum norm $\|u\|:=\sup_{x\in S}|u(x)|$. For a fixed $\nu\in\Pi^2_S$ and $u\in B(S)$ define the operator:
$$Tu(x)\ :=\ \inf_{\mu\in\mathcal{P}(A)}\ \ln\Big[\int_S e^{u(y)}\,\hat P^{1}(dy\,|\,x,\mu,\nu)\Big]. \qquad (2.5)$$
Due to the dual representation of the exponential certainty equivalent [12, Lemma 3.3] it is possible to write (2.5) as
$$Tu(x)\ =\ \inf_{\mu\in\mathcal{P}(A)}\Big[\ln m^{1}(x,\mu,\nu)\ +\ \sup_{\hat Q}\Big(\int_S u(y)\,\hat Q(dy)\ -\ I\big(\hat Q\,\big\|\,\tilde P^{1}(\cdot\,|\,x,\mu,\nu)\big)\Big)\Big], \qquad (2.6)$$
where the supremum is over all probability measures $\hat Q\in\mathcal{P}(S)$ and $I(\hat Q\,\|\,Q)$ is the relative entropy of the two probability measures, which is defined by
$$I(\hat Q\,\|\,Q)\ :=\ \int_S \ln\Big(\frac{d\hat Q}{dQ}\Big)\,d\hat Q$$
when $\hat Q\ll Q$, and $I(\hat Q\,\|\,Q):=+\infty$ otherwise. Note that the supremum in (2.6) is attained at the probability measure given by
$$\hat Q^{*}_{x,\mu,\nu}(D)\ :=\ \frac{\int_D e^{u(y)}\,\tilde P^{1}(dy\,|\,x,\mu,\nu)}{\int_S e^{u(y)}\,\tilde P^{1}(dy\,|\,x,\mu,\nu)} \qquad (2.7)$$
for measurable sets $D\subseteq S$. Obviously $\hat Q^{*}$ can be interpreted as a transition kernel.
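Since the variational formula (2.6) and the maximizer (2.7) are valid in particular on finite supports, they can be verified numerically; the following small Python sketch does so on an assumed five-point space (all data hypothetical):

import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.dirichlet(np.ones(n))      # a probability measure Q on n points
u = rng.uniform(0.0, 1.0, size=n)  # a bounded function u

# Left-hand side: log of the integral of e^u with respect to Q.
lhs = np.log(np.sum(np.exp(u) * Q))

# The twisted measure, i.e. the claimed maximizer (cf. (2.7)).
Qhat = np.exp(u) * Q / np.sum(np.exp(u) * Q)

def dv_objective(R, Q, u):
    # int u dR - I(R || Q), with the convention 0 * log 0 = 0.
    mask = R > 0
    return np.sum(u * R) - np.sum(R[mask] * np.log(R[mask] / Q[mask]))

# The twisted measure attains the supremum ...
print(np.isclose(lhs, dv_objective(Qhat, Q, u)))
# ... and no randomly drawn probability measure exceeds it.
for _ in range(1000):
    R = rng.dirichlet(np.ones(n))
    assert dv_objective(R, Q, u) <= lhs + 1e-9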
Let $u\in B(S)$. The span semi-norm of $u$ is defined as:
$$\|u\|_{sp}\ :=\ \sup_{x\in S}u(x)\ -\ \inf_{x\in S}u(x). \qquad (2.8)$$
In the last part of this section we make a couple of assumptions that will be in force throughout the rest of the paper. First we consider the following continuity assumption, which is quite standard in the literature; see [4, 8, 10, 18] for instance. It will allow us to show that the map defined in (2.5) is a contraction in the span semi-norm.
Assumption 2.1.
For a fixed $x\in S$, the transition measure $P(\cdot\,|\,x,a,b)$ is strongly continuous in $(a,b)$, i.e. for all bounded and measurable $u:S\to\mathbb{R}$ we have that $(a,b)\mapsto\int_S u(y)\,P(dy\,|\,x,a,b)$ is continuous on $A\times B$.
It follows from (2.2) that $P(\cdot\,|\,x,\mu,\nu)$ is also strongly continuous in $(\mu,\nu)$ for each fixed $x\in S$, which leads us to the following remark.
Remark 2.1.
Since for any metric space $Y$ the space $\mathcal{P}(Y)$ is endowed with the topology of weak convergence, it follows immediately from Assumption 2.1 that for all bounded and measurable functions $u:S\to\mathbb{R}$ and a fixed $x\in S$, the map $(\mu,\nu)\mapsto\int_S u(y)\,P(dy\,|\,x,\mu,\nu)$ is continuous on $\mathcal{P}(A)\times\mathcal{P}(B)$.
Next we have the following ergodicity assumption.
Assumption 2.2.
There exists a real number $0<\delta<1$ such that
$$\sup\,\big\|P(\cdot\,|\,x,a,b)-P(\cdot\,|\,x',a',b')\big\|_{TV}\ \le\ 2(1-\delta),$$
where the supremum is over all $x,x'\in S$, $a,a'\in A$, $b,b'\in B$, and $\|\cdot\|_{TV}$ denotes the total variation norm. Moreover, $P(U\,|\,x,a,b)>0$ for every $(x,a,b)\in S\times A\times B$ and any non-empty open set $U\subseteq S$.
3 Solution to the optimality equations
In this section, we demonstrate that the operator $T$ defined in (2.5) is a contraction. The fixed point of $T$ corresponds to the solution of the optimality equation for player 1. In the latter part of this section, we define another operator $\tilde T$ corresponding to player 2 and establish results analogous to those obtained for $T$.
Proposition 3.1.
Proof.
From the definition of $T$ it follows that $T$ transforms $B(S)$ into itself and that the infimum in (2.5) is attained. For given functions $u,w\in B(S)$ and $x\in S$, let $\mu^{w}_{x}\in\mathcal{P}(A)$ be a measure at which the infimum defining $Tw(x)$ is attained.
Then we obtain that
where the set $K$ comes from the Hahn-Jordan decomposition of the corresponding signed measure and $K^{c}$ denotes the complement of $K$. Now, taking the supremum over $x\in S$ in the above estimate we have
We claim that
(3.1)
Suppose (3.1) does not hold. Then there exist sequences such that
As and are probability measures therefore
and
Since for each and from (2.7) we get
we have
and
Consequently, using (2.4), direct calculations imply the claim (3.1). ∎
We will now make additional assumptions in order to show that $T$ is a global contraction in the span semi-norm.
Assumption 3.1.
There exists $\lambda\in\mathcal{P}(S)$ such that $P(\cdot\,|\,x,a,b)\ll\lambda$ for all $(x,a,b)\in S\times A\times B$. Also let $f(x,a,b,\cdot)$ be the Radon-Nikodym derivative of $P(\cdot\,|\,x,a,b)$ with respect to $\lambda$.
Assumption 3.2.
Lemma 3.1.
Proof.
Notice that for a $u\in B(S)$ we have
(3.3)
Now we get
In the above expression the first equality follows from Assumption 3.1, the first inequality follows from Assumption 3.2 and the last inequality follows from (2.3) and the bound $1\le m^{1}\le e$. So, from (3.3) we have
(3.4)
From (3.4) it follows that the contraction factor is strictly less than one. Hence $T$ is a global contraction in the span semi-norm on $B(S)$. ∎
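On a finite-state, finite-action approximation, the span contraction just established is precisely what makes relative value iteration converge. A minimal Python sketch, assuming a hypothetical toy model for player 1 in which player 2's fixed stationary strategy has already been absorbed into the transition and cost data (the log-form operator below is the finite analogue of (2.5)):

import numpy as np

# Hypothetical finite model: 3 states, 2 actions for player 1; P[x, a, y] is
# the transition law and c[x, a] in [0, 1] the stage cost, with player 2's
# stationary strategy already averaged out as in Section 2.
P = np.array([[[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
              [[0.3, 0.4, 0.3], [0.5, 0.2, 0.3]],
              [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]])
c = np.array([[0.1, 0.8], [0.5, 0.2], [0.9, 0.4]])

def T(u):
    # (T u)(x) = min_a [ c(x, a) + log sum_y P(x, a, y) * exp(u(y)) ].
    return np.min(c + np.log(P @ np.exp(u)), axis=1)

u = np.zeros(3)
for _ in range(500):
    Tu = T(u)
    u = Tu - Tu[0]            # normalize so that u(x0) = 0: span iteration
rho = T(u)[0]                 # ergodic value from the fixed point rho + u = T u
print(rho, np.max(np.abs(T(u) - rho - u)))   # residual of the optimality equation

Under the span contraction the iterates converge in the span semi-norm, and the pair (rho, u) approximates the solution of the optimality equation, mirroring the fixed-point argument used below.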
Before proceeding to the main theorem of this section, we briefly outline the main points. Suppose player 2 announces that he/she is going to employ a strategy $\nu\in\Pi^2_S$. In such a scenario, player 1 attempts to minimize $\rho^{1}(x,\pi^{1},\nu)$ over $\pi^{1}\in\Pi^1$. Thus for player 1 this is a discrete-time Markov decision problem with risk-sensitive ergodic cost. Player 2 faces the analogous situation when player 1 announces his/her strategy to be $\mu\in\Pi^1_S$. This leads us to the following theorem.
Theorem 3.1.
Proof.
Notice that (3.5) can be rewritten as
(3.7)
By Lemma 3.1, $T$ is a global contraction in the span semi-norm on $B(S)$, so it has a span fixed point $h\in B(S)$ (which is unique up to an additive constant), and $h$ together with the associated constant $\rho$ solves (3.7) and consequently (3.5).
Let $\bar h:=h-h(x_0)$. Then $\bar h(x_0)=0$ and it can easily be seen that $(\rho,\bar h)$ satisfies (3.7). Since $h\in B(S)$ and $h(x_0)$ is a constant, $\bar h\in B(S)$ as well.
Let $(\rho',h')$ be another solution of (3.5), i.e., it satisfies (3.5) with $h'(x_0)=0$. Then clearly $h'$ is also a span fixed point of $T$. Hence $\|h'-\bar h\|_{sp}=0$. Since $h'(x_0)=\bar h(x_0)=0$, it follows that $h'=\bar h$. It then easily follows that $\rho'=\rho$.
The proof of the remaining part is analogous to the proof of [12, Theorem 2.1], which is carried out for a countable state space but extends easily to our general state space setting. ∎
For a fixed $\mu\in\Pi^1_S$ and $u\in B(S)$ define the operator:
$$\tilde Tu(x)\ :=\ \inf_{\nu\in\mathcal{P}(B)}\ \ln\Big[\int_S e^{u(y)}\,\hat P^{2}(dy\,|\,x,\mu,\nu)\Big]. \qquad (3.8)$$
By similar arguments we can also show that $\tilde T$ is a global contraction in the span semi-norm on $B(S)$, and the following theorem holds true.
Theorem 3.2.
Suppose Assumptions 2.1, 2.2, 3.1 and 3.2 are satisfied. Then for a fixed $\mu\in\Pi^1_S$, there exists a unique solution pair $(\rho_2,h_2)\in\mathbb{R}\times B(S)$ with $h_2(x_0)=0$ (where $x_0\in S$ is some fixed state), satisfying
$$\rho_2 + h_2(x)\ =\ \inf_{\nu\in\mathcal{P}(B)}\ \ln\Big[\int_S e^{h_2(y)}\,\hat P^{2}(dy\,|\,x,\mu,\nu)\Big],\qquad x\in S. \qquad (3.9)$$
In addition, a strategy $\nu^{*}\in\Pi^2_S$ is an optimal strategy of player 2, given that player 1 chooses $\mu$, if and only if $\nu^{*}(x)$ attains the point-wise minimum in (3.9) for every $x\in S$. Moreover,
$$\rho_2\ =\ \inf_{\pi^{2}\in\Pi^2}\rho^{2}(x,\mu,\pi^{2})\ =\ \rho^{2}(x,\mu,\nu^{*})\qquad\text{for all }x\in S. \qquad (3.10)$$
4 Existence of Nash equilibrium
In this section we establish the existence of a pair of stationary equilibrium strategies for the nonzero-sum game. To this end we first outline a standard procedure for establishing the existence of a Nash equilibrium. From Theorems 3.1 and 3.2 it follows that, given that player 2 is using a strategy $\nu\in\Pi^2_S$, we can find an optimal response $\mu^{*}\in\Pi^1_S$ for player 1. Clearly $\mu^{*}$ depends on $\nu$, and moreover there may be several optimal responses for player 1 in $\Pi^1_S$. Analogous results hold for player 2 if player 1 announces that he/she is going to use a strategy $\mu\in\Pi^1_S$. Hence, given a pair of strategies $(\mu,\nu)\in\Pi^1_S\times\Pi^2_S$, we can find a set of pairs of optimal responses via the appropriate pair of optimality equations described above. This defines a set-valued map, and clearly any fixed point of this set-valued map is a Nash equilibrium; a toy numerical illustration follows.
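As an illustration of this fixed-point viewpoint (and not of the paper's argument, which works on a Polish space via Fan's theorem), one can brute-force a small finite game in Python: compute each player's risk-sensitive ergodic cost for every pair of pure stationary strategies via the log-Perron-root formula used earlier, and test the Nash property directly. All model data below are hypothetical, and the search is restricted to pure stationary pairs, among which an equilibrium need not exist:

import numpy as np
from itertools import product

# Hypothetical toy game: 2 states, 2 actions per player. P[x, a, b] is a
# distribution over next states; c1 and c2 are the players' stage costs.
rng = np.random.default_rng(1)
nS, nA, nB = 2, 2, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nB))   # shape (nS, nA, nB, nS)
c1 = rng.uniform(0, 1, (nS, nA, nB))
c2 = rng.uniform(0, 1, (nS, nA, nB))

def ergodic_cost(f, g, c):
    # Risk-sensitive ergodic cost under the pure stationary pair (f, g):
    # log spectral radius of the twisted kernel exp(c(x)) * P(x, .).
    Pf = np.array([P[x, f[x], g[x]] for x in range(nS)])
    cf = np.array([c[x, f[x], g[x]] for x in range(nS)])
    Q = np.exp(cf)[:, None] * Pf
    return np.log(np.max(np.abs(np.linalg.eigvals(Q))))

pures1 = list(product(range(nA), repeat=nS))   # pure stationary strategies
pures2 = list(product(range(nB), repeat=nS))
for f, g in product(pures1, pures2):
    best1 = all(ergodic_cost(f, g, c1) <= ergodic_cost(f2, g, c1) + 1e-12
                for f2 in pures1)
    best2 = all(ergodic_cost(f, g, c2) <= ergodic_cost(f, g2, c2) + 1e-12
                for g2 in pures2)
    if best1 and best2:
        print("Nash pair (pure, stationary):", f, g)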
To ensure the existence of a Nash equilibrium, we first make the following separability assumptions.
Assumption 4.1.
There exist two substochastic kernels $P_1(dy\,|\,x,a)$ and $P_2(dy\,|\,x,b)$ such that
$$P(dy\,|\,x,a,b)\ =\ P_1(dy\,|\,x,a)\ +\ P_2(dy\,|\,x,b),\qquad (x,a,b)\in S\times A\times B.$$
Since $P(\cdot\,|\,x,a,b)\ll\lambda$, we have $P_1(\cdot\,|\,x,a)\ll\lambda$ and $P_2(\cdot\,|\,x,b)\ll\lambda$. Let $f_1$ and $f_2$ be the respective densities. We assume that for each $(x,y)\in S\times S$, $f_1(x,a,y)$ and $f_2(x,b,y)$ are continuous in $a$ and in $b$, respectively.
Assumption 4.2.
The cost functions $c^{i}$, $i=1,2$, are separable in the action variables, i.e., there exist bounded functions $c^{i}_1:S\times A\to\mathbb{R}$ and $c^{i}_2:S\times B\to\mathbb{R}$, continuous in the second variable, such that
$$c^{i}(x,a,b)\ =\ c^{i}_1(x,a)\ +\ c^{i}_2(x,b).$$
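For instance (an illustrative family only, not required by the results), Assumptions 4.1 and 4.2 hold whenever the model is a mixture of two single-controller models:
$$P(dy\,|\,x,a,b)\ =\ \alpha\,Q_1(dy\,|\,x,a)\ +\ (1-\alpha)\,Q_2(dy\,|\,x,b),\qquad c^{i}(x,a,b)\ =\ c^{i}_1(x,a)\ +\ c^{i}_2(x,b),$$
where $\alpha\in(0,1)$, $Q_1$ and $Q_2$ are stochastic kernels (so that $P_1:=\alpha Q_1$ and $P_2:=(1-\alpha)Q_2$ are substochastic), and the $c^{i}_j$ are bounded and continuous in the action variable.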
Following [14] and [18] we topologize the spaces $\Pi^1_S$ and $\Pi^2_S$ with the topology of relaxed controls introduced in [20]. We identify two elements of $\Pi^1_S$ if they agree $\lambda$-a.e. (where $\lambda$ is as in Assumption 3.1). Let $\mathcal{G}$ be the set of all functions $g:S\times A\to\mathbb{R}$ such that $g$ is measurable in the first argument and continuous in the second, and there exists $h_g\in L^{1}(\lambda)$ such that $|g(x,a)|\le h_g(x)$ for every $(x,a)\in S\times A$.
Then $\mathcal{G}$ is a Banach space with the norm [20]
$$\|g\|_{\mathcal{G}}\ :=\ \int_S\sup_{a\in A}|g(x,a)|\,\lambda(dx).$$
Every $\mu\in\Pi^1_S$ (with the $\lambda$-a.e. equivalence relation) can be identified with the element $\Lambda_\mu\in\mathcal{G}^{*}$ (the dual of $\mathcal{G}$) defined as
$$\Lambda_\mu(g)\ :=\ \int_S\int_A g(x,a)\,\mu(x)(da)\,\lambda(dx),\qquad g\in\mathcal{G}.$$
Thus $\Pi^1_S$ can be identified with a subset of $\mathcal{G}^{*}$. Equip $\Pi^1_S$ with the weak-star topology. Then it can be shown as in [18] that $\Pi^1_S$ is compact and metrizable. The set $\Pi^2_S$ can be topologized analogously.
Next, we present the following lemmas, which play a pivotal role in showing the upper semi-continuity of the specific set-valued map mentioned earlier.
Lemma 4.1.
Let $\nu_n\to\nu$ in the weak-star topology. Then under Assumption 4.2, for every $u\in L^{1}(\lambda)$, $\mu\in\mathcal{P}(A)$ and $i=1,2$,
$$\int_S u(x)\,c^{i}(x,\mu,\nu_n)\,\lambda(dx)\ \longrightarrow\ \int_S u(x)\,c^{i}(x,\mu,\nu)\,\lambda(dx)\quad\text{as }n\to\infty.$$
Proof.
We have, for $i=1,2$,
Now by Assumption 4.2 and since $\nu_n\to\nu$ in the weak-star topology, the result is immediate. ∎
Lemma 4.2.
Proof.
Note that
(4.1)
We claim that
(4.2)
Observe that under Assumptions 4.1 and 4.2, and using (2.4), we get
By Lemma 4.1 and Assumption 4.1, (4.2) holds true, i.e. the second term on the right-hand side of (4.1) goes to zero as $n\to\infty$. Now we show that the first term also goes to zero.
Again note that,
where . From the compactness of and the continuity of , it follows that for
for some sequences. We now prove that this term goes to zero as $n\to\infty$.
Since $A$ and $B$ are compact, without loss of generality we can assume that
Note that, for each , we have
(4.3)
Moreover,
By Assumption 4.1 and the boundedness of the integrand, the last inequality shows that the first term on the right-hand side of (4.3) goes to zero as $n\to\infty$. Since $\nu$ is a weak-star limit of $\{\nu_n\}$, the second term on the right-hand side of (4.3) also goes to zero as $n\to\infty$. Thus we have established the desired convergence. Hence the result follows. ∎
Lemma 4.3.
Let $\{h_n\}$ be any sequence in $B(S)$ of solutions to the above optimality equations with $h_n(x_0)=0$ for all $n$. Then $\{h_n\}$ is uniformly bounded.
Proof.
Remark 4.1.
Since the exponential and logarithmic functions are increasing, the operators $T$ and $\tilde T$ also admit the following equivalent expressions:
Next set
Proof.
From Remark 2.1, we know that the map in question is continuous on $\mathcal{P}(A)$ for each fixed $x\in S$. As $\mathcal{P}(A)$ is compact, it is easy to see that the set of minimizers is non-empty. Take any sequence in this set; as $\mathcal{P}(A)$ is compact, it has a convergent subsequence (denoted by the same sequence by an abuse of notation). Now for any
(4.6)
Using Lemmas 4.1 and 4.2, from (4.6) we get, for any
Hence the limit belongs to the set, and therefore the set is closed. Since it is a closed subset of a compact metric space, it is also compact. Using Remark 4.1, convexity follows easily. By analogous arguments, the corresponding set for player 2 is also a non-empty compact convex subset. Hence the product set is a non-empty compact convex subset. ∎
The next lemma proves the upper semi-continuity of a certain set-valued map. This result will be useful in establishing the existence of a Nash equilibrium in the space of stationary Markov strategies.
Proof.
Consider the convergent subsequences involved (denoted by the same sequences by an abuse of notation). Since the corresponding sequence of solutions is bounded, without loss of generality we may assume that it also converges in the weak-star sense. Then, since
(4.7)
Using Lemmas 4.1, 4.2 and 4.3, it follows that
(4.8)
From (4.7) for any we get
Again using Lemmas 4.1, 4.2 and 4.3, it follows that
(4.9)
Let . Then from (4.9) we get, for any
(4.10)
and from (4.8) we get
(4.11)
Since (4.10) holds for every , from (4.10) and (4.11) we get
(4.12)
with the normalization fixed above. Now by Theorem 3.1 we can say that (4.12) has a unique solution (corresponding to the limiting strategy) satisfying this normalization. Thus, from (4.11) and (4.12) the required identification follows.
Suppose, along a suitable subsequence, the analogous convergences hold. Then by similar arguments one can draw the same conclusion. This proves the required inclusion; hence the map is upper semi-continuous. ∎
We are now in a position to establish the existence of a Nash equilibrium, which follows directly from Fan's fixed point theorem [9].
Theorem 4.1.
References
- [1] Tansu Alpcan and Tamer Başar. Network Security: A Decision and Game-theoretic Approach. Cambridge University Press, Cambridge, 2011.
- [2] Arnab Basu and Mrinal K Ghosh. Zero-sum risk-sensitive stochastic games on a countable state space. Stochastic Process. Appl., 124(1):961–983, 2014.
- [3] Arnab Basu and Mrinal K Ghosh. Nonzero-sum risk-sensitive stochastic games on a countable state space. Mathematics of Operations Research, 43(2):516–532, 2018.
- [4] N. Bäuerle and U. Rieder. Zero-sum risk-sensitive stochastic games. Stochastic Process. Appl., 127:622–642, 2017.
- [5] Dimitri Bertsekas and Steven E Shreve. Stochastic optimal control: the discrete-time case, volume 5. Athena Scientific, 1996.
- [6] Tomasz R. Bielecki, Stanley R. Pliska, and Shuenn-Jyi Sheu. Risk sensitive portfolio management with Cox-Ingersoll-Ross interest rates: the HJB equation. SIAM J. Control Optim., 44(5):1811–1843, 2005.
- [7] Anup Biswas and Vivek S. Borkar. Ergodic risk-sensitive control—a survey. Annu. Rev. Control, 55:118–141, 2023.
- [8] Giovanni B. Di Masi and Lukasz Stettner. Risk-sensitive control of discrete-time Markov processes with infinite horizon. SIAM Journal on Control and Optimization, 38(1):61–78, 1999.
- [9] Ky Fan. Fixed-point and minimax theorems in locally convex topological linear spaces. Proceedings of the National Academy of Sciences, 38(2):121–126, 1952.
- [10] Mrinal K Ghosh and Arunabha Bagchi. Stochastic games with average payoff criterion. Applied Mathematics and Optimization, 38:283–301, 1998.
- [11] Mrinal K Ghosh, Subrata Golui, Chandan Pal, and Somnath Pradhan. Discrete-time zero-sum games for Markov chains with risk-sensitive average cost criterion. Stochastic Processes and their Applications, 158:40–74, 2023.
- [12] Daniel Hernández-Hernández and Steven I Marcus. Risk sensitive control of Markov processes in countable state space. Systems & Control Letters, 29(3):147–155, 1996.
- [13] Onésimo Hernández-Lerma. Adaptive Markov control processes, volume 79. Springer Science & Business Media, 2012.
- [14] CJ Himmelberg, Thiruvenkatachari Parthasarathy, TES Raghavan, and FS Van Vleck. Existence of p-equilibrium and optimal stationary strategies in stochastic games. Proceedings of the American Mathematical Society, 60(1):245–251, 1976.
- [15] Ronald A. Howard and James E. Matheson. Risk-sensitive Markov decision processes. Management Sci., 18:356–369, 1971/72.
- [16] Matthew O. Jackson. Social and economic networks. Princeton University Press, Princeton, NJ, 2008.
- [17] Andrzej S. Nowak and Eitan Altman. ε-equilibria for stochastic games with uncountable state space and unbounded costs. SIAM J. Control Optim., 40(6):1821–1839, 2002.
- [18] T Parthasarathy. Existence of equilibrium stationary strategies in discounted stochastic games. Sankhyā: The Indian Journal of Statistics, Series A, pages 114–127, 1982.
- [19] Tim Roughgarden. Twenty Lectures on Algorithmic Game Theory. Cambridge University Press, 2016.
- [20] Jack Warga. Functions of relaxed controls. SIAM Journal on Control, 5(4):628–641, 1967.
- [21] Qingda Wei and Xian Chen. Nonzero-sum expected average discrete-time stochastic games: the case of uncountable spaces. SIAM J. Control Optim., 57(6):4099–4124, 2019.
- [22] Qingda Wei and Xian Chen. Risk-sensitive average equilibria for discrete-time stochastic games. Dynamic Games and Applications, 9:521–549, 2019.
- [23] Qingda Wei and Xian Chen. Nonzero-sum risk-sensitive average stochastic games: the case of unbounded costs. Dynamic Games and Applications, 11:835–862, 2021.
- [24] Peter Whittle. Risk-sensitive linear/quadratic/gaussian control. Advances in Applied Probability, 13(4):764–777, 1981.