
Incentive Compatibility in Two-Stage Repeated Stochastic Games

Bharadwaj Satchidanandan and Munther A. Dahleh
Abstract

We address mechanism design for two-stage repeated stochastic games, a novel setting in which many emerging problems in next-generation electricity markets can be readily modeled. We introduce a new notion of equilibrium, called Dominant Strategy Non-Bankrupting Equilibrium (DNBE), which requires players to make very few assumptions about the behavior of other players in order to employ their equilibrium strategy. Consequently, a mechanism that renders truth-telling a DNBE could be quite effective in molding real-world behavior along truthful lines. We present a mechanism for two-stage repeated stochastic games that renders truth-telling a Dominant Strategy Non-Bankrupting Equilibrium.

I Introduction

The power system is on the cusp of a revolution. The coming decade could witness increased penetration of renewable energy and Electric Vehicles (EVs), integration of EV energy storage, demand response programs, and more. These changes will have a profound impact on electricity market operations, and new mechanisms must be devised to address a variety of important problems that are anticipated to arise in next-generation electricity markets. Most existing mechanism design settings are insufficient to model certain crucial features of these problems. To address this, we introduce the setting of Two-Stage Repeated Stochastic Games, in which many problems that arise in electricity markets can be readily modeled. The setting extends the one-shot two-stage stochastic game introduced in [1] to repeated play.

At a high level, a two-stage stochastic game, as the name suggests, consists of two stages. In the first stage, the players do not know their valuation functions precisely; they know only the probability distribution of their valuation functions. It is only in the second stage of the game that the valuation functions are realized. However, the social planner cannot wait until the second stage to decide on the outcome: it must make certain decisions in the first stage itself, based on the players' bids of the probability distributions of their valuation functions. Once the valuation functions are realized and reported in the second stage, the social planner can correct the first-stage outcome by taking certain recourse actions, but this comes at a cost [1]. It is the stochasticity of the players' valuation functions, together with the prospect of their misreporting both the probability distribution and the realization of their valuation functions, that precludes the use of classical mechanism design techniques to design efficient and incentive-compatible mechanisms for this setting.

Motivated by applications to electricity markets, which operate every day, we consider a setting in which a two-stage stochastic game is played repeatedly. Repeated play affords the players a large class of strategies that adapt a player's actions to all past observations and to the inferences drawn from them. In other settings with a strategy space this large, such as iterative auctions or dynamic games, there is typically an important implication for mechanism design: it may be impossible to obtain truth-telling as a dominant strategy equilibrium [2]. Consequently, in such scenarios, it is common to settle for mechanisms that render truth-telling only a Nash equilibrium, or variants thereof, even though Nash equilibria are known to be poor models of real-world behavior. This is because, to employ its Nash equilibrium strategy, each player must make overly specific assumptions about the behavior of the other players, which it may not be willing to make. In general, the smaller the burden of speculation that an equilibrium imposes, the more plausible it is that the equilibrium models real-world behavior.

Guided by the above maxim, we develop a new notion of equilibrium called the Dominant Strategy Non-Bankrupting Equilibrium (DNBE), which requires players to make very few assumptions about the behavior of the other players in order to employ their equilibrium strategy. Specifically, the only assumption that the players must make to play their DNBE strategy is that no player employs a strategy that leads to its own bankruptcy. We make this precise in Section II. This assumption is mild, in that it is quite likely to hold in practice. Consequently, a mechanism that implements a desired behavior as a DNBE, as opposed to only a Nash equilibrium, could be quite effective in molding real-world behavior along the desired lines.

We then present a mechanism for two-stage repeated stochastic games that renders truth-telling a dominant strategy non-bankrupting equilibrium. The mechanism is individually rational in that, under a mild condition on the optimal expected social welfare (see Theorem 1), every player is guaranteed to accrue a nonnegative utility by truth-telling, regardless of the strategies that the other players employ. Moreover, if every player bids truthfully, then the outcome that the mechanism produces maximizes social welfare. The mechanism generalizes the mechanism that we developed in [3] for energy storage markets.

Finally, we apply the mechanism to design an efficient and incentive-compatible demand response market. There are two main takeaways that we wish to highlight for designers of next-generation electricity markets. The first is that the "bidding language" of the day-ahead market needs to be redesigned. In today's electricity markets, the generators and loads bid their supply and demand functions, respectively. However, with the inclusion of demand response providers, who may not know precisely at the time of the day-ahead market how much they will be able to reduce consumption the following day, the day-ahead market should allow for bids that are only probabilistic in nature. Only in real time, if and when called upon for demand response, should the demand response providers be required to disclose their actual costs of curtailing consumption. The theory developed in this paper allows such probabilistic bids to be submitted to the system operator. The second takeaway is that "simple" mechanisms, such as making payments proportional to the power curtailed by demand response providers, which have been employed in previous demand response trials, are incapable of attaining the optimal social welfare. Significant welfare gains can be obtained by employing carefully designed mechanisms that account for the uncertainties of the market participants.

The rest of the paper is organized as follows. Section II begins with a precise description of a two-stage repeated stochastic game, defines the notion of dominant strategy non-bankrupting equilibrium, and formulates the mechanism design problem. Section III develops a mechanism for two-stage repeated stochastic games that guarantees truth-telling to be a dominant strategy non-bankrupting equilibrium. Section IV describes the application of the results to the design of demand response markets. Section V provides an account of related work. Section VI concludes the paper.

Notation: Vectors and sequences are denoted using boldface letters. Given a sequence $\mathbf{x}=\{x(1),x(2),\ldots\}$, we denote by $\mathbf{x}^{l}$ the segment $\{x(1),\ldots,x(l)\}$. The hat notation is used to denote bids: given a variable $x$ that is private to a player, we denote by $\widehat{x}$ the bid that the player submits for $x$.

II Problem Formulation

A two-stage stochastic game played by $n$ players and a social planner is described by

  1. a publicly known set $\Delta$, known as the type space of the players,

  2. a publicly known set $\Theta$ of probability mass functions over $\Delta$, known as the supertype space of the players,

  3. for each $i\in\{1,\ldots,n\}$, a probability distribution $\theta_{i}\in\Theta$, known as player $i$'s supertype, that is privately known to player $i$ in the first stage of the game, and which it is supposed to report to the social planner in the first stage,

  4. a set $\mathcal{O}_{1}$ of first-stage outcomes,

  5. a first-stage decision rule $g^{*}_{1}:\Theta^{n}\to\mathcal{O}_{1}$ according to which the social planner chooses the first-stage outcome as a function of the players' supertype bids,

  6. for each $i\in\{1,\ldots,n\}$, player $i$'s type $\delta_{i}\in\Delta$, which is "drawn by nature" at random according to $\theta_{i}$, whose realization is privately observed by player $i$ in the second stage of the game, and which it is supposed to report to the social planner in the second stage,

  7. a set $\mathcal{O}_{2}$ of second-stage outcomes, or "recourse actions," that the social planner can choose,

  8. a second-stage decision rule $g^{*}_{2}:\Theta^{n}\times\Delta^{n}\to\mathcal{O}_{2}$ according to which the social planner chooses the second-stage outcome as a function of the players' type and supertype bids,

  9. a cost function $c:\mathcal{O}_{1}\times\mathcal{O}_{2}\to\mathbb{R}$ that specifies, for every $(o_{1},o_{2})\in\mathcal{O}_{1}\times\mathcal{O}_{2}$, the cost incurred by the social planner for choosing the outcome $o_{1}$ in the first stage and taking the recourse action $o_{2}$ in the second stage,

  10. for each $i\in\{1,\ldots,n\}$, a valuation function $v_{i}:\Delta\times\mathcal{O}_{1}\times\mathcal{O}_{2}\to\mathbb{R}$ of player $i$ that specifies, for every $\delta_{i}\in\Delta$ and every $(o_{1},o_{2})\in\mathcal{O}_{1}\times\mathcal{O}_{2}$, the valuation of player $i$ if its type is $\delta_{i}$ and the social planner chooses the outcomes $o_{1}$ and $o_{2}$ in the first and the second stage of the game, respectively.

The first- and second-stage decision rules $(g_{1}^{*},g_{2}^{*})$ that we consider are those that maximize the expected social welfare. To elaborate, let $g_{1}:\Theta^{n}\to\mathcal{O}_{1}$ be any first-stage decision rule and $g_{2}:\Theta^{n}\times\Delta^{n}\to\mathcal{O}_{2}$ be any second-stage decision rule. If the players bid their types and supertypes truthfully, then the expected social welfare that results from using the decision rule $(g_{1},g_{2})$ is

$$\mathbb{E}_{\boldsymbol{\delta}\sim\boldsymbol{\theta}}\Big[\sum_{i=1}^{n}v_{i}\big(\delta_{i},g_{1}(\boldsymbol{\theta}),g_{2}(\boldsymbol{\theta},\boldsymbol{\delta})\big)-c\big(g_{1}(\boldsymbol{\theta}),g_{2}(\boldsymbol{\theta},\boldsymbol{\delta})\big)\Big]=:W(\boldsymbol{\theta},g_{1},g_{2}).$$

The goal of the social planner is to maximize the expected social welfare, and so the decision rule $(g_{1}^{*},g_{2}^{*})$ that it employs is

$$(g_{1}^{*},g_{2}^{*})=\operatorname*{arg\,max}_{g_{1},g_{2}}\;W(\cdot,g_{1},g_{2}), \tag{1}$$

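where the maximization is understood in the pointwise sense, i.e., separately for each $\boldsymbol{\theta}$. We denote the resulting optimal expected social welfare by $W^{*}(\boldsymbol{\theta})\coloneqq W(\boldsymbol{\theta},g_{1}^{*},g_{2}^{*})$, and by $W^{*}(\boldsymbol{\theta}_{-i})$ the analogous quantity for the game in which player $i$ is absent. The social planner computes $g_{1}^{*}$ and $g_{2}^{*}$ and announces them to the players before the game commences.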
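For finite $\Delta$, $\mathcal{O}_1$ and $\mathcal{O}_2$, the pointwise maximization in (1) can be carried out by brute force. Because $g_2$ may depend on the realized types, the second stage picks the best recourse for each $\boldsymbol{\delta}$, and the first stage picks the $o_1$ that maximizes the expected value of that best recourse. The Python sketch below assumes finite sets and a hypothetical single-player unit-procurement instance that is not from the paper; all function names are illustrative.

```python
import itertools


def joint_support(thetas):
    """Yield (delta, prob) for delta ~ theta_1 x ... x theta_n (finite supports)."""
    for combo in itertools.product(*(list(th.items()) for th in thetas)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield tuple(t for t, _ in combo), prob


def realized_welfare(delta, o1, o2, v, c):
    """sum_i v_i(delta_i, o1, o2) - c(o1, o2) for one type realization."""
    return sum(v(i, d, o1, o2) for i, d in enumerate(delta)) - c(o1, o2)


def expected_best_recourse(o1, thetas, O2, v, c):
    """E_delta[max_{o2} realized welfare] for a fixed first-stage outcome o1."""
    return sum(prob * max(realized_welfare(delta, o1, o2, v, c) for o2 in O2)
               for delta, prob in joint_support(thetas))


def g1_star(thetas, O1, O2, v, c):
    """First-stage rule: the o1 with the largest expected best-recourse welfare."""
    return max(O1, key=lambda o1: expected_best_recourse(o1, thetas, O2, v, c))


def g2_star(thetas, delta, O1, O2, v, c):
    """Second-stage rule: the best recourse given g1*(thetas) and realized types."""
    o1 = g1_star(thetas, O1, O2, v, c)
    return max(O2, key=lambda o2: realized_welfare(delta, o1, o2, v, c))


def W_star(thetas, O1, O2, v, c):
    """Optimal expected social welfare W*(theta)."""
    return max(expected_best_recourse(o1, thetas, O2, v, c) for o1 in O1)


# Hypothetical toy instance: a player of type 1 needs one unit worth 4; a unit
# costs 1 if reserved in stage 1 (o1) and 3 if bought as a recourse (o2).
O1, O2 = [0, 1], [0, 1]


def v(i, d, o1, o2):
    return 4 * min(d, o1 + o2)


def c(o1, o2):
    return 1 * o1 + 3 * o2


print(g1_star([{0: 0.5, 1: 0.5}], O1, O2, v, c))    # 1: reserve when P(type 1) = 0.5
print(g1_star([{0: 0.75, 1: 0.25}], O1, O2, v, c))  # 0: rely on recourse when it is 0.25
```

Reserving pays off here once the probability of needing the unit exceeds $1/3$, which is the trade-off between the cheap first-stage unit and the costly recourse that the two-stage structure is meant to capture.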

The problem that we study is one in which a two-stage stochastic game of the above form is played repeatedly on each day $l$, $l\in\mathbb{Z}_{+}$. For ease of exposition, we assume that the supertypes of the players remain the same on all days and that only their types differ across days, though this assumption can be relaxed in a straightforward manner. Consequently, for each player $i\in\{1,\ldots,n\}$, we denote by $\theta_{i}$ its privately known supertype, which remains the same on all days, and by $\delta_{i}(l)$ its privately known type on day $l$. The sequence $\{\boldsymbol{\delta}(1),\boldsymbol{\delta}(2),\ldots\}$ is assumed to be Independent and Identically Distributed (IID) with $\boldsymbol{\delta}(1)\sim\theta_{1}\times\cdots\times\theta_{n}$.

II-A First-stage strategy

On each day $l$, each player $i$ is required to report its supertype to the social planner in the first stage so that the latter can compute the optimal first-stage outcome. Since the players' supertypes are assumed to remain the same on all days, it suffices for the players to bid their supertypes just once, namely, in the first stage of the game on day $1$. For strategic reasons that will become clear shortly, the players may not bid their supertypes truthfully, and so we denote by $\widehat{\theta}_{i}$ the supertype bid of player $i$ and by $\sigma_{i}:\Theta\to\Theta$ the first-stage strategy according to which player $i$ constructs its supertype bid. Therefore, $\widehat{\theta}_{i}=\sigma_{i}(\theta_{i})$. Once all players submit their supertype bids, the social planner computes the first-stage outcome as $g_{1}^{*}(\boldsymbol{\sigma}(\boldsymbol{\theta}))$, where $\boldsymbol{\sigma}(\boldsymbol{\theta})\coloneqq[\sigma_{1}(\theta_{1}),\ldots,\sigma_{n}(\theta_{n})]$. The game then proceeds to the second stage.

II-B Second-stage bidding policy

In the second stage on each day $l$, each player $i$ observes the realization of $\delta_{i}(l)$, which it is supposed to report to the social planner. However, for strategic reasons that will become clear shortly, the players may not bid their type realizations truthfully, and so we denote by $\widehat{\delta}_{i}(l)$ player $i$'s type bid on day $l$. We allow the player to construct its type bid on any day $l$ using all information available to it until day $l$, and in accordance with any randomized, history-dependent policy of its choosing. Specifically, a second-stage bidding policy $\mu$ of player $i$ is a rule which specifies, for each $o_{1}\in\mathcal{O}_{1}$ and each $l\in\mathbb{Z}_{+}$, a probability transition kernel $\mathbb{P}_{\mu}\big(\widehat{\delta}_{i}(l)\,\big|\,\delta_{i}^{l},\widehat{\delta}_{i}^{l-1},o_{2}^{l-1};o_{1}\big)$ according to which player $i$ constructs its second-stage bid $\widehat{\delta}_{i}(l)$ on day $l$ if the first-stage outcome is $o_{1}$. We denote by $\Pi_{i}$ the set of all second-stage bidding policies available to player $i$.

Note that the second-stage bidding policy is a rule that maps the history of observations available to a player to its second-stage bid. While the bids that the rule produces are random, because the types and second-stage outcomes are random, there is nothing random about the rule itself. Consequently, a player can, without loss of generality, choose its second-stage bidding policy on day $1$ as a function of its supertype. This leads to the notion of a second-stage strategy, which is described next.

II-C Second-stage strategy

A second-stage strategy of player $i$ is a function $\pi_{i}:\Theta\to\Pi_{i}$ which specifies the second-stage bidding policy that it employs as a function of its private supertype $\theta_{i}$. Therefore, $\pi_{i}(\theta_{i})$ is the second-stage bidding policy employed by player $i$.

Once all players submit their type bids, the social planner computes the second-stage outcome for day $l$ as $o_{2}(l)=g_{2}^{*}(\boldsymbol{\sigma}(\boldsymbol{\theta}),\widehat{\boldsymbol{\delta}}(l))$. Note that once the players' first-stage strategies and second-stage bidding policies are fixed, a functional relationship is established between the types and the type bids, and all random variables become well-defined.

II-D Strategies and Strategy profiles

We refer to the pair of first- and second-stage strategies simply as a strategy, i.e., $S_{i}\coloneqq(\sigma_{i},\pi_{i})$ is referred to as the strategy of player $i$. We denote by $\Lambda_{i}$ the set of strategies available to player $i$. Finally, we refer to $\boldsymbol{S}\coloneqq(S_{1},\ldots,S_{n})$ as the strategy profile of the players and denote by $\Lambda$ the set of strategy profiles $\Lambda_{1}\times\cdots\times\Lambda_{n}$.

II-E Truthful strategies

The stochasticity of the player types necessitates the definition of truthful strategy to be weaker than requiring a player to bid its type truthfully on all days.

Definition 1.

A strategy $S_{i}=(\sigma_{i},\pi_{i})$ of player $i$, $i\in\{1,\ldots,n\}$, is truthful if

  (i) $\sigma_{i}(\theta)=\theta$ for every $\theta\in\Theta$, and

  (ii) for every $\theta\in\Theta$ and every $o_{1}\in\mathcal{O}_{1}$, there exists $\mathcal{L}\subseteq\mathbb{Z}_{+}$ with $\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{l\in\mathcal{L}\}}=0$ such that for all $l\notin\mathcal{L}$,

    $$\mathbb{P}_{\pi_{i}(\theta)}\big(\widehat{\delta}_{i}(l)\,\big|\,\delta_{i}^{l},\widehat{\delta}_{i}^{l-1},o_{2}^{l-1};o_{1}\big)=\mathds{1}_{\{\widehat{\delta}_{i}(l)=\delta_{i}(l)\}}.$$

A strategy profile $(S_{1},\ldots,S_{n})$ is a truthful strategy profile if $S_{i}$ is truthful for every $i\in\{1,\ldots,n\}$.

In other words, a strategy $S_{i}$ is truthful if the supertype bid is truthful and the type bid is truthful on "almost all days," i.e., on all days outside a set $\mathcal{L}$ of zero density. We denote by $\mathcal{T}_{i}\subset\Lambda_{i}$ the set of all truthful strategies available to player $i$.
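Condition (ii) of Definition 1 still permits misreporting infinitely often, provided that the misreporting days form a set of zero density. The hypothetical Python check below compares a policy that misreports on perfect-square days (density $\lfloor\sqrt{L}\rfloor/L\to 0$, so condition (ii) can hold) with one that misreports on every tenth day (density $0.1$, so it cannot). The two policies and the helper names are illustrative only.

```python
import math


def deviation_density(L, is_deviation_day):
    """Fraction of days 1..L that lie in the (hypothetical) misreporting set."""
    return sum(1 for l in range(1, L + 1) if is_deviation_day(l)) / L


def is_perfect_square(l):
    root = math.isqrt(l)
    return root * root == l


for L in (100, 10_000, 1_000_000):
    print(L,
          deviation_density(L, is_perfect_square),      # 0.1, 0.01, 0.001
          deviation_density(L, lambda l: l % 10 == 0))  # 0.1 every time
```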

II-F Payments and utilities

At the end of each day, the social planner collects from each player a payment that is determined as a function of the bids submitted up to that day. We denote by $p_{i,l}:\Theta^{n}\times(\Delta^{l})^{n}\to\mathbb{R}$ the payment rule, so that $p_{i,l}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}^{l})$ specifies the amount that player $i$ should pay on day $l$. The utility accrued by player $i$ is defined as

$$u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\coloneqq\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\Big[v_{i}\big(\delta_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)-p_{i,l}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}^{l})\Big]. \tag{2}$$

Note that a player's utility is a random variable that depends on the realization of the type sequence $\boldsymbol{\delta}^{\infty}$.

II-G Non-Bankrupting strategies

As mentioned in Section I, a “mild” behavioral assumption, one that is quite likely to hold in practice, is that no player behaves in a manner that might result in its own bankruptcy. This is captured by the notion of a non-bankrupting strategy.

Definition 2.

A strategy $S_{i}$ of player $i$, $i\in\{1,\ldots,n\}$, is non-bankrupting if for all $(\boldsymbol{S}_{-i},\boldsymbol{\theta})$,

$$u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})>-\infty$$

for all $\boldsymbol{\delta}^{\infty}$, except perhaps on a set of probability zero.

A strategy profile $\boldsymbol{S}=(S_{1},\ldots,S_{n})$ is non-bankrupting if $S_{i}$ is non-bankrupting for every $i\in\{1,\ldots,n\}$.

We denote by $\mathcal{NB}_{i}$ the set of non-bankrupting strategies of player $i$, by $\mathcal{NB}_{-i}$ the set of non-bankrupting strategy profiles of all players except player $i$, and by $\mathcal{NB}$ the set of non-bankrupting strategy profiles of all players.

II-H Dominant Strategy Non-Bankrupting Equilibrium

We are now ready to introduce a notion of equilibrium that is “slightly” weaker than dominant strategy equilibrium.

Definition 3.

A strategy profile $\boldsymbol{S}=(S_{1},\ldots,S_{n})\in\mathcal{NB}$ is a Dominant Strategy Non-Bankrupting Equilibrium (DNBE) if for all $i\in\{1,\ldots,n\}$, all $\boldsymbol{S}^{\prime}_{-i}\in\mathcal{NB}_{-i}$, all $S_{i}^{\prime}\in\Lambda_{i}$, and all $\boldsymbol{\theta}$,

$$u_{i}(S_{i},\boldsymbol{S}^{\prime}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\geq u_{i}(S^{\prime}_{i},\boldsymbol{S}^{\prime}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty}) \tag{3}$$

for all $\boldsymbol{\delta}^{\infty}$, except perhaps on a set of probability zero.

It is perhaps instructive to contrast DNBE with Dominant Strategy Equilibrium (DSE) and Nash Equilibrium (NE) to gain a better appreciation of the notion. Note that for a strategy profile $\boldsymbol{S}$ to form a Nash equilibrium, it must hold for every $i\in\{1,\ldots,n\}$ that $S_{i}$ is a best response to $\boldsymbol{S}_{-i}$. On the other hand, for the strategy profile $\boldsymbol{S}$ to form a DNBE, we must have for all $i\in\{1,\ldots,n\}$ that $S_{i}$ is a best response not only to $\boldsymbol{S}_{-i}$, but also to all $\boldsymbol{S}^{\prime}_{-i}\in\mathcal{NB}_{-i}$. It follows that any dominant strategy non-bankrupting equilibrium is also a Nash equilibrium, but not vice versa. The stronger notion of dominant strategy equilibrium requires for all $i\in\{1,\ldots,n\}$ that $S_{i}$ is a best response to every $\boldsymbol{S}^{\prime}_{-i}\in\Lambda_{-i}$ (the set of all strategy profiles of the players other than $i$), and not just to those in $\mathcal{NB}_{-i}$ as required by DNBE. Hence, any dominant strategy equilibrium is also a dominant strategy non-bankrupting equilibrium. Fig. 1 illustrates the hierarchy formed by these equilibrium notions.

Figure 1: Hierarchy of equilibrium notions (nested sets, from innermost to outermost: DSE, DNBE, NE, and the set of strategy profiles). Any dominant strategy equilibrium is also a dominant strategy non-bankrupting equilibrium, and any dominant strategy non-bankrupting equilibrium is also a Nash equilibrium.

II-I Mechanism Design Problem

Arbitrarily fix the strategy profile $\boldsymbol{S}$ of the players. The long-term average social welfare that results from the game is

$$q(\boldsymbol{S},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\coloneqq\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\Big[\sum_{i=1}^{n}v_{i}\big(\delta_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)-c\big(g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)\Big]. \tag{4}$$

The objective of the social planner is to ensure that the average social welfare $q(\boldsymbol{S},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})$ equals the optimal value $W^{*}(\boldsymbol{\theta})$, which is what would result almost surely if all players employed a truthful strategy. However, each player $i$ seeks to maximize its own utility given by (2), and so it may deviate from a truthful strategy if doing so could yield a higher utility. This brings us to the mechanism design problem: we wish to design a payment rule $\{p_{i,l}:(i,l)\in\{1,\ldots,n\}\times\mathbb{Z}_{+}\}$ such that the truthful strategy profile is a Dominant Strategy Non-Bankrupting Equilibrium. The next section develops the mechanism and establishes the incentive and efficiency properties that it guarantees.

III An Efficient and Incentive-Compatible Mechanism for Two-Stage Repeated Stochastic Games

For each $i\in\{1,\ldots,n\}$, the payment of player $i$ on any day $l$ consists of two components, $p_{i}^{F}$ and $p_{i,l}^{S}$, which the social planner can compute at the end of the first and the second stage of the game, respectively, on day $l$. These payment functions are defined next.

III-A First-stage payment

The first-stage payment $p_{i}^{F}$ is a function of only the first-stage bids of the players. Since these bids remain the same on all days, so does the first-stage payment. The first-stage payment is simply the VCG payment and is defined as

$$p_{i}^{F}(\widehat{\boldsymbol{\theta}})\coloneqq W^{*}(\widehat{\boldsymbol{\theta}}_{-i})-\mathbb{E}_{\boldsymbol{\delta}\sim\mathbb{P}_{\widehat{\boldsymbol{\theta}}}}\Big[\sum_{j\neq i}v_{j}\big(\delta_{j},g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\boldsymbol{\delta})\big)-c\big(g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\boldsymbol{\delta})\big)\Big], \tag{5}$$

where $\widehat{\boldsymbol{\theta}}_{-i}$ denotes the supertype bids of all players other than player $i$.
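The following brute-force Python sketch evaluates (5) on a hypothetical two-player instance that is not from the paper: a single cheap first-stage unit (cost 1) can be reserved for at most one player, recourse units cost 3 each, and a player of type 1 values being served at 4. As an assumption that suits this toy, $W^{*}(\widehat{\boldsymbol{\theta}}_{-i})$ is computed by dropping player $i$'s valuation from the welfare while keeping the outcome sets unchanged; the expectation over $\boldsymbol{\delta}$ uses the supertype bids. All names are illustrative.

```python
import itertools

# Hypothetical two-player toy instance (not from the paper).
N = 2
O1 = [(0, 0), (1, 0), (0, 1)]                    # one cheap unit, at most one player
O2 = list(itertools.product([0, 1], repeat=N))   # recourse units per player


def v(i, d, o1, o2):
    """A player of type 1 values being served (at least one unit) at 4."""
    return 4 * min(d, o1[i] + o2[i])


def c(o1, o2):
    """First-stage units cost 1 each, recourse units cost 3 each."""
    return 1 * sum(o1) + 3 * sum(o2)


def joint(thetas):
    """Yield (delta, prob) for delta ~ theta_1 x ... x theta_n."""
    for combo in itertools.product(*(list(th.items()) for th in thetas)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield tuple(t for t, _ in combo), prob


def welfare(delta, o1, o2, players):
    """sum_{j in players} v_j(delta_j, o1, o2) - c(o1, o2)."""
    return sum(v(j, delta[j], o1, o2) for j in players) - c(o1, o2)


def best_o2(delta, o1, players):
    return max(O2, key=lambda o2: welfare(delta, o1, o2, players))


def expected_value(thetas, o1, players):
    """E_delta[welfare under the best recourse], counting only `players`."""
    return sum(p * welfare(d, o1, best_o2(d, o1, players), players)
               for d, p in joint(thetas))


def g1_star(thetas, players):
    return max(O1, key=lambda o1: expected_value(thetas, o1, players))


def W_star(thetas, players):
    return expected_value(thetas, g1_star(thetas, players), players)


def first_stage_payment(i, thetas_hat):
    """Eq. (5): W*(theta_hat_{-i}) minus the others' expected welfare at the
    outcomes (g1*, g2*) chosen when player i is present."""
    everyone = list(range(N))
    others = [j for j in everyone if j != i]
    o1 = g1_star(thetas_hat, everyone)
    with_i = sum(p * welfare(d, o1, best_o2(d, o1, everyone), others)
                 for d, p in joint(thetas_hat))
    return W_star(thetas_hat, others) - with_i


thetas_hat = [{0: 0.1, 1: 0.9}, {0: 0.5, 1: 0.5}]
print([round(first_stage_payment(i, thetas_hat), 6) for i in range(N)])  # [1.5, 1.5]
```

In this instance, player 0, whose likely need claims the cheap unit, and player 1, whose need triggers costly recourse, each impose an expected externality of 1.5 on the other, and that is the amount the first-stage payment charges.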

III-B Second-stage payment

At a high level, the first function of the second-stage payment is to bind the players' first-stage and second-stage strategies together. To achieve this, the second-stage payment rule compares the empirical frequencies of the players' type bids with their supertype bids and penalizes discrepancies between them. To elaborate, denote by $\widehat{\theta}_{i}(t)$ the probability that a random variable distributed according to $\widehat{\theta}_{i}$ takes the value $t$, $t\in\Delta$. On each day $l$ and for each player $i$, $(l,i)\in\mathbb{Z}_{+}\times\{1,\ldots,n\}$, the second-stage payment rule computes the discrepancy

$$\widehat{f}_{i,t}(l)\coloneqq\bigg[\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\delta}_{i}(l^{\prime})=t\}}\bigg]-\widehat{\theta}_{i}(t) \tag{6}$$

for every $t\in\Delta$, and imposes a penalty of $J_{p}(l)$ on player $i$ if $\widehat{f}_{i,t}(l)$ falls outside a window of size $r(l)$ for some $t\in\Delta$, i.e., if

$$|\widehat{f}_{i,t}(l)|\geq r(l) \tag{7}$$

for some $t\in\Delta$.
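A minimal Python sketch of the discrepancy (6) and the test (7), assuming a finite type set; the function names are hypothetical, `bids` holds one player's type bids for days $1$ through $l$, and `theta_hat` is its supertype bid as a dictionary. In the example, the bids `[1, 1, 1, 0]` against $\widehat{\theta}_{i}=(0.5,0.5)$ give $\widehat{f}_{i,1}(4)=0.25$, which meets (7) for $r(4)=0.2$ but not for $r(4)=0.3$.

```python
from collections import Counter


def f_hat(bids, theta_hat, types):
    """Eq. (6): empirical frequency of each type t among the bids minus theta_hat(t)."""
    l = len(bids)
    counts = Counter(bids)  # missing types count as 0
    return {t: counts[t] / l - theta_hat.get(t, 0.0) for t in types}


def meets_condition_7(bids, theta_hat, types, r_l):
    """True if |f_hat_{i,t}(l)| >= r(l) for some type t, i.e. (7) holds."""
    return any(abs(x) >= r_l for x in f_hat(bids, theta_hat, types).values())


theta_hat = {0: 0.5, 1: 0.5}
bids = [1, 1, 1, 0]
print(f_hat(bids, theta_hat, [0, 1]))                   # {0: -0.25, 1: 0.25}
print(meets_condition_7(bids, theta_hat, [0, 1], 0.3))  # False
print(meets_condition_7(bids, theta_hat, [0, 1], 0.2))  # True
```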

In a setting of repeated play, the sequence of second-stage outcomes serves as a source of common randomness that the players could use to correlate their second-stage bids, should doing so yield a higher utility than fabricating their bids independently of the other players' bids. The second function of the second-stage payment is to disincentivize such strategies. Towards this end, on each day $l$ and for each player $i$, $(l,i)\in\mathbb{Z}_{+}\times\{1,\ldots,n\}$, the second-stage payment rule computes

$$\widehat{h}_{i,\mathbf{d}}(l)\coloneqq\bigg[\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\delta}_{i}(l^{\prime})=d_{i},\,\widehat{\boldsymbol{\delta}}_{-i}(l^{\prime})=\mathbf{d}_{-i}\}}\bigg]-\widehat{\theta}_{i}(d_{i})\bigg[\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l^{\prime})=\mathbf{d}_{-i}\}}\bigg] \tag{8}$$

for every $\mathbf{d}\in\Delta^{n}$, and imposes a penalty of $J_{p}(l)$ on player $i$ if $\widehat{h}_{i,\mathbf{d}}(l)$ falls outside a window of size $r(l)$ for some $\mathbf{d}\in\Delta^{n}$, i.e., if

$$|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l) \tag{9}$$

for some $\mathbf{d}\in\Delta^{n}$.
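A corresponding Python sketch of (8) and (9) for player $i$, again with hypothetical names; `profiles` holds the bid profiles $\widehat{\boldsymbol{\delta}}(1),\ldots,\widehat{\boldsymbol{\delta}}(l)$ as tuples. The example suggests why (9) is needed in addition to (7): if player 0 simply copies player 1's bid, its marginal frequencies can match $\widehat{\theta}_{0}$ exactly, so (7) never fires, yet $|\widehat{h}_{0,\mathbf{d}}(4)|$ reaches $0.25$; an independent-looking pattern gives $\widehat{h}_{0,\mathbf{d}}(4)=0$ for every $\mathbf{d}$.

```python
import itertools


def h_hat(i, profiles, theta_hat_i, types):
    """Eq. (8) for player i and every d in Delta^n.

    profiles[k] is the tuple of all players' type bids on day k + 1.
    """
    l = len(profiles)
    n = len(profiles[0])
    out = {}
    for d in itertools.product(types, repeat=n):
        d_others = d[:i] + d[i + 1:]
        # Empirical frequency of the full bid profile d.
        joint = sum(1 for b in profiles if b == d) / l
        # Empirical frequency of the other players' bids equalling d_{-i}.
        others = sum(1 for b in profiles if b[:i] + b[i + 1:] == d_others) / l
        out[d] = joint - theta_hat_i.get(d[i], 0.0) * others
    return out


def meets_condition_9(i, profiles, theta_hat_i, types, r_l):
    """True if |h_hat_{i,d}(l)| >= r(l) for some d, i.e. (9) holds."""
    return any(abs(x) >= r_l for x in h_hat(i, profiles, theta_hat_i, types).values())


theta_hat_0 = {0: 0.5, 1: 0.5}
copying = [(0, 0), (1, 1), (0, 0), (1, 1)]      # player 0 copies player 1
independent = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(meets_condition_9(0, copying, theta_hat_0, [0, 1], 0.2))      # True
print(meets_condition_9(0, independent, theta_hat_0, [0, 1], 0.2))  # False
```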

How should the window-size sequence $\{r\}$ be chosen? On the one hand, the window size $r(l)$ must tend to zero as $l$ tends to infinity; otherwise, the set of bid sequences $\{\widehat{\boldsymbol{\delta}}\}$ that avoid (7) and (9) would be "large," thereby violating incentive compatibility. On the other hand, if $\{r\}$ decays too quickly, then even truthful type bids would meet (7) or (9) infinitely often, thereby incurring a large penalty and violating individual rationality. Hence, $\{r\}$ should be chosen in a manner that balances the two objectives. This is achieved by choosing $\{r\}$ such that

$$\lim_{l\to\infty}r(l)=0, \tag{10}$$

and for some $\gamma>0$,

$$r(l)\geq\sqrt{\frac{\ln(2l^{1+\gamma})}{2l}} \tag{11}$$

for all $l\in\mathbb{Z}_{+}$.¹

¹ It suffices that (11) holds not for all $l$ but only for all sufficiently large $l$.
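The hypothetical Python sketch below takes the smallest window permitted by (11), with an arbitrary choice $\gamma=0.1$, and checks numerically that it satisfies (10), that it shrinks more slowly than $1/\sqrt{l}$, and that with this choice the Azuma-Hoeffding tail $2e^{-2lr^{2}(l)}$ appearing in (18) equals $l^{-(1+\gamma)}$, which is summable over $l$.

```python
import math


def window_size(l, gamma=0.1):
    """Smallest r(l) allowed by (11); gamma = 0.1 is an arbitrary choice."""
    return math.sqrt(math.log(2 * l ** (1 + gamma)) / (2 * l))


def azuma_tail(l, gamma=0.1):
    """Tail bound 2 exp(-2 l r(l)^2) from (18), evaluated at this r(l)."""
    return 2 * math.exp(-2 * l * window_size(l, gamma) ** 2)


for l in (1, 100, 10_000, 1_000_000):
    # r(l) -> 0, as (10) requires, while r(l) * sqrt(l) keeps growing,
    # i.e. the window shrinks only slightly more slowly than 1 / sqrt(l).
    print(l, round(window_size(l), 5), round(window_size(l) * math.sqrt(l), 3))

# With equality in (11), the tail bound equals l^-(1 + gamma): summable in l.
print(math.isclose(azuma_tail(100), 100 ** -1.1))  # True
```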

To obtain intuition for condition (11), note that the empirical frequency $\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\delta_{i}(l^{\prime})=t\}}$ resulting from the true type sequence of player $i$ is a random variable with mean $\theta_{i}(t)$ and a standard deviation that scales as $1/\sqrt{l}$. Therefore, if the window size decayed at the same rate, the probability of the empirical frequency falling outside the window would not vanish. This suggests that the window size must decay more slowly than $1/\sqrt{l}$. By letting the window size decay only slightly more slowly than $1/\sqrt{l}$, namely at the rate specified by condition (11), truthful bids are guaranteed, almost surely, to meet (7) and (9) for only finitely many values of $l$. This is established in Lemma 1.
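To illustrate the claim that truthful bids meet (7) only finitely often, the following Monte Carlo sketch (Python, standard library only) simulates a truthful player with binary types and a hypothetical $\theta_{i}(1)=0.3$, using the window from (11) with $\gamma=0.1$. For binary types, $|\widehat{f}_{i,0}(l)|=|\widehat{f}_{i,1}(l)|$, so checking one type suffices. This is a single simulated path under assumed parameters, not a proof; any violations it finds should cluster at small $l$.

```python
import math
import random


def window_size(l, gamma=0.1):
    """Smallest r(l) allowed by (11); gamma = 0.1 is an arbitrary choice."""
    return math.sqrt(math.log(2 * l ** (1 + gamma)) / (2 * l))


def violation_days(p=0.3, L=20_000, seed=0):
    """Days l on which a truthful bidder with P(type 1) = p meets condition (7)."""
    rng = random.Random(seed)
    ones = 0
    days = []
    for l in range(1, L + 1):
        ones += 1 if rng.random() < p else 0  # truthful bid equals the true type
        f_hat_1 = ones / l - p                # discrepancy (6) for t = 1
        if abs(f_hat_1) >= window_size(l):
            days.append(l)
    return days


days = violation_days()
print(len(days), days[-1] if days else None)
```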

How should the penalty sequence $\{J_{p}\}$ be chosen? As shown in Lemma 1, truthful players almost surely incur a penalty only finitely often, and so the long-term average penalty that they incur is almost surely zero regardless of how $\{J_{p}\}$ is chosen. Therefore, the only objective in the design of $\{J_{p}\}$ is for every non-truthful strategy to incur a sufficiently high penalty. This is accomplished by choosing $\{J_{p}\}$ to be any nonnegative sequence such that

$$\lim_{l\to\infty}\frac{J_{p}(l)}{l}=\infty. \tag{12}$$
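A hypothetical Python illustration of the two roles of $\{J_{p}\}$, with the arbitrary choice $J_{p}(l)=l^{2}$, which satisfies (12) because $J_{p}(l)/l=l\to\infty$. A player penalized on only finitely many days (here days 1, 3 and 7) has a long-term average penalty that vanishes, whereas a player penalized on a positive fraction of days (here every 100th day) faces an average penalty that grows without bound.

```python
def average_penalty(L, penalty_days, J=lambda l: l ** 2):
    """(1/L) * sum of J_p(l) over penalty days l <= L, with J_p(l) = l**2 by default."""
    return sum(J(l) for l in penalty_days if l <= L) / L


finitely_often = {1, 3, 7}                  # total penalty 1 + 9 + 49 = 59
every_100th = range(100, 1_000_001, 100)

for L in (1_000, 10_000, 100_000):
    # First column shrinks like 59 / L; second grows roughly like L**2 / 300.
    print(L, average_penalty(L, finitely_often), average_penalty(L, every_100th))
```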

We now have the necessary quantities to define the second-stage payment function. Define the event

$$E_{i,\boldsymbol{S}}(l)\coloneqq\Big\{\max_{t\in\Delta}|\widehat{f}_{i,t}(l)|\geq r(l)\Big\}\cup\Big\{\max_{\mathbf{d}\in\Delta^{n}}|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\Big\}, \tag{13}$$

which is the event that at least one of (7) and (9) occurs. The second-stage payment of player $i$ on day $l$ is defined as

$$p_{i,l}^{S}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}^{l})\coloneqq\Big[v_{i}\big(\widehat{\delta}_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)-\mathbb{E}_{\boldsymbol{\delta}\sim\mathbb{P}_{\widehat{\boldsymbol{\theta}}}}\big[v_{i}\big(\delta_{i},g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\boldsymbol{\delta})\big)\big]\Big]+J_{p}(l)\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}. \tag{14}$$

A negative value of the above quantity indicates a transfer from the social planner to player $i$ on day $l$. Note that if all players employ a truthful strategy, then the long-term average second-stage payment almost surely equals zero for every player.

The total payment $p_{i,l}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}^{l})$ that player $i$ transfers to the social planner on day $l$ is

$$p_{i,l}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}^{l})\coloneqq p_{i}^{F}(\widehat{\boldsymbol{\theta}})+p_{i,l}^{S}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}^{l}). \tag{15}$$
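The hypothetical Python sketch below assembles the daily transfer (15) from (5) and (14) and evaluates the corresponding summand of the utility (2). It takes the valuations, their expectation under the supertype bids, and $p_{i}^{F}$ as given numbers (the values 1.5, 3.6 and $J_{p}(l)=100$ are arbitrary). It makes visible an algebraic consequence of (14): on a day on which player $i$ reports truthfully and is not penalized, the realized valuation cancels, so its utility for that day equals $\mathbb{E}[v_{i}]-p_{i}^{F}$ whatever its type turns out to be.

```python
import math


def second_stage_payment(v_reported, v_expected, J_l, penalized):
    """Eq. (14): v_i at the reported type, minus its expectation under the
    supertype bids, plus J_p(l) if the event E_{i,S}(l) occurred on day l."""
    return (v_reported - v_expected) + (J_l if penalized else 0.0)


def daily_payment(p_first, v_reported, v_expected, J_l, penalized):
    """Eq. (15): total transfer from player i to the planner on day l."""
    return p_first + second_stage_payment(v_reported, v_expected, J_l, penalized)


def daily_utility(v_true, p_first, v_reported, v_expected, J_l, penalized):
    """Day-l summand of (2): realized valuation minus the day's payment."""
    return v_true - daily_payment(p_first, v_reported, v_expected, J_l, penalized)


# Arbitrary illustrative numbers: p_i^F = 1.5, E[v_i] = 3.6, J_p(l) = 100.
for v in (4.0, 0.0):                        # truthful: reported == realized
    u = daily_utility(v, 1.5, v, 3.6, 100.0, penalized=False)
    print(v, round(u, 10))                  # 2.1 on both days
print(math.isclose(daily_utility(0.0, 1.5, 0.0, 3.6, 100.0, True), 2.1 - 100.0))  # True
```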

The following theorem establishes the incentive and optimality guarantees of the mechanism.

Theorem 1.

Consider the two-stage repeated stochastic game induced by the payment rule (15).

  1. A truthful strategy profile is a dominant strategy non-bankrupting equilibrium.

  2. If for every $i\in\{1,\ldots,n\}$ and every $\boldsymbol{\theta}$,

     $$W^{*}(\boldsymbol{\theta})-W^{*}(\boldsymbol{\theta}_{-i})\geq 0, \tag{16}$$

     then every player obtains a nonnegative utility by employing a truthful strategy, regardless of the strategies that the other players employ.

  3. If every player employs a truthful strategy, then the long-term average social welfare (4) that results is almost surely equal to its optimal value $W^{*}(\boldsymbol{\theta})$.

Proof.

Arbitrarily fix $\boldsymbol{\theta}$, $i\in\{1,\ldots,n\}$, the strategy $S_{i}\in\Lambda_{i}$ that player $i$ employs, and the strategy profile $\boldsymbol{S}_{-i}\in\mathcal{NB}_{-i}$ that all other players employ. We begin with a lemma.

Lemma 1.

For $T_{i}\in\mathcal{T}_{i}$, it holds almost surely that

$$\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\,\mathds{1}_{\{E_{i,(T_{i},\boldsymbol{S}_{-i})}(l)\}}=0. \tag{17}$$

I.e., if player $i$ employs a truthful strategy, then the long-term average penalty that it pays is almost surely zero.

Proof.

It suffices to show that the events $\{E_{i,(T_{i},\boldsymbol{S}_{-i})}(l)\}$ almost surely occur only finitely often. Arbitrarily fix $\mathbf{d}\in\Delta^{n}$. Define $\mathcal{F}_{l}\coloneqq\sigma(\widehat{\boldsymbol{\delta}}_{-i}^{l},\delta_{i}^{l-1})$, so that $\big(\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l')=\mathbf{d}_{-i}\}}\big[\mathds{1}_{\{\delta_{i}(l')=d_{i}\}}-\theta_{i}(d_{i})\big],\mathcal{F}_{l'+1}\big)$ is a martingale difference sequence bounded by unity. It follows from the Azuma-Hoeffding inequality that

$$\mathbb{P}\bigg(\bigg|\frac{1}{l}\sum_{l'=1}^{l}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l')=\mathbf{d}_{-i}\}}\big[\mathds{1}_{\{\delta_{i}(l')=d_{i}\}}-\theta_{i}(d_{i})\big]\bigg|\geq r(l)\bigg)\leq 2e^{-2lr^{2}(l)}. \tag{18}$$

Combining the above inequality with (11) implies

$$\mathbb{P}\bigg(\bigg|\frac{1}{l}\sum_{l'=1}^{l}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l')=\mathbf{d}_{-i}\}}\big[\mathds{1}_{\{\delta_{i}(l')=d_{i}\}}-\theta_{i}(d_{i})\big]\bigg|\geq r(l)\bigg)\leq\frac{1}{l^{1+\gamma}}. \tag{19}$$

Using (8) and the fact that player $i$ employs a truthful strategy, the above inequality implies

$$\mathbb{P}\big(|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\big)\leq\frac{1}{l^{1+\gamma}}, \tag{20}$$

which, since $\gamma>0$, in turn implies that $\sum_{l=1}^{\infty}\mathbb{P}\big(|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\big)<\infty$. Invoking the Borel-Cantelli lemma, we have that the event $\{|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\}$ almost surely occurs only finitely often.
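The passage from (18) to (19) only needs $2e^{-2lr^{2}(l)}\leq l^{-(1+\gamma)}$. Condition (11) is stated outside this section, so the sketch below assumes it takes exactly this form and solves for the smallest admissible radius, $r(l)=\sqrt{(\ln 2+(1+\gamma)\ln l)/(2l)}$, which still tends to zero. It also checks that the resulting bounds are summable, which is what the Borel-Cantelli step uses. The helper names are illustrative only.

```python
import math


def min_radius(l, gamma):
    """Smallest r(l) with 2*exp(-2*l*r(l)**2) <= l**-(1 + gamma).

    Taking logs: -2 l r^2 <= -ln 2 - (1 + gamma) ln l, so
    r(l) = sqrt((ln 2 + (1 + gamma) ln l) / (2 l)).
    """
    return math.sqrt((math.log(2.0) + (1.0 + gamma) * math.log(l)) / (2.0 * l))


def azuma_hoeffding_bound(l, r):
    """Right-hand side of (18): 2 exp(-2 l r^2)."""
    return 2.0 * math.exp(-2.0 * l * r * r)


def borel_cantelli_partial_sum(gamma, upto):
    """Partial sum of the bound in (19)-(20); it stays below 1 + 1/gamma for gamma > 0."""
    return sum(l ** -(1.0 + gamma) for l in range(1, upto + 1))
```

With $\gamma=0.1$, for instance, `min_radius(1000, 0.1)` is roughly 0.064 and shrinks further as $l$ grows, so the test in (7) and (9) becomes progressively tighter while a truthful player still violates it only finitely often.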

Similarly, $(\mathds{1}_{\{\delta_{i}(l')=d_{i}\}}-\theta_{i}(d_{i}),\mathcal{F}_{l'+1})$ is a martingale difference sequence bounded by unity, and following the same sequence of arguments as above, it can be established that the event $\{|\widehat{f}_{i,d_{i}}(l)|\geq r(l)\}$ almost surely occurs only finitely often.

Since $\mathbf{d}$ was chosen arbitrarily and $\Delta^{n}$ is finite, we have that for every $\mathbf{d}\in\Delta^{n}$, the events $\{|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\}$ and $\{|\widehat{f}_{i,d_{i}}(l)|\geq r(l)\}$ almost surely occur only finitely often, and the desired result follows. ∎

We have

$$u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})=\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\Big(v_{i}\big(\delta_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)-p_{i,l}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}^{l})\Big),$$

where $\widehat{\boldsymbol{\theta}}$ and $\widehat{\boldsymbol{\delta}}^{\infty}$ are determined in accordance with $\boldsymbol{S}$. Substituting (5) and (14) into (15), substituting the resulting expression for $p_{i,l}$ into the above equality, and simplifying the result yields

$$\begin{aligned}
u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})&=\big[W^{*}(\widehat{\boldsymbol{\theta}})-W^{*}(\widehat{\boldsymbol{\theta}}_{-i})\big]\\
&\quad+\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\Big(v_{i}\big(\delta_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)-v_{i}\big(\widehat{\delta}_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)\Big)\\
&\quad-\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\,\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}.
\end{aligned}\tag{21}$$

Arbitrarily fix $T_{i}\in\mathcal{T}_{i}$. Then, using Lemma 1 and some straightforward algebra, we obtain

$$\begin{aligned}
u_{i}(T_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})-u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})&=\big[W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})-W^{*}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})\big]\\
&\quad+\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\Big(v_{i}\big(\widehat{\delta}_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)-v_{i}\big(\delta_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)\Big)\\
&\quad+\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\,\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}.
\end{aligned}\tag{22}$$

In what follows, we show that the above quantity is almost surely nonnegative, which implies that truthful strategy profiles are dominant strategy non-bankrupting equilibria.

Define

$$\nu_{i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})\coloneqq\mathbb{E}_{(\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i})\sim\widehat{\theta}_{i}\times\widehat{\boldsymbol{\theta}}_{-i}}\Big[v_{i}\big(\widehat{\delta}_{i},g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}})\big)\Big] \tag{23}$$

and

$$\nu_{-i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})\coloneqq\mathbb{E}_{(\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i})\sim\widehat{\theta}_{i}\times\widehat{\boldsymbol{\theta}}_{-i}}\Big[\sum_{j\neq i}v_{j}\big(\widehat{\delta}_{j},g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}})\big)-c\big(g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}})\big)\Big] \tag{24}$$

so that

$$W^{*}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})=\nu_{i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})+\nu_{-i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i}). \tag{25}$$

Let $\mu(\Delta,\Delta^{n})$ be the set of joint probability mass functions over $\Delta\times\Delta^{n}$. For $\psi\in\mu(\Delta,\Delta^{n})$, define

$$\rho_{i}(\psi)\coloneqq\mathbb{E}_{(\delta_{i},[\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i}])\sim\psi}\Big[v_{i}\big(\delta_{i},g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}})\big)\Big]. \tag{26}$$

Let $\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})\subset\mu(\Delta,\Delta^{n})$ be the set of joint probability mass functions whose “$x$-marginal” is $\theta_{i}$ and whose “$y$-marginal” is $\widehat{\theta}_{1}\times\cdots\times\widehat{\theta}_{n}$. Then, for every $\psi\in\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})$,

$$W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})\geq\rho_{i}(\psi)+\nu_{-i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i}). \tag{27}$$

To see this, note that if $(\delta_{i},\boldsymbol{\delta}_{-i})\sim\theta_{i}\times\widehat{\boldsymbol{\theta}}_{-i}$, then the social planner can map $\delta_{i}$ to a random variable $\delta_{i}'$ using an appropriate probability transition kernel $P_{\delta_{i}'|\boldsymbol{\delta}}$ such that $(\delta_{i},[\delta_{i}',\boldsymbol{\delta}_{-i}])\sim\psi\in\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})$. Consequently, by choosing the first-stage outcome as $g_{1}^{*}(\widehat{\boldsymbol{\theta}})$ and the second-stage outcome as $g_{2}^{*}(\widehat{\boldsymbol{\theta}},[\delta_{i}',\boldsymbol{\delta}_{-i}])$, the social planner can attain an expected social welfare of $\rho_{i}(\psi)+\nu_{-i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})$. It follows that the optimal expected social welfare $W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})$ is at least as large, which yields (27).²

² This argument requires the second-stage decision rule to be randomized, whereas we have assumed $g_{1}^{*}$ and $g_{2}^{*}$ to be deterministic functions. This apparent gap can be addressed by noting that an optimal decision rule $(g_{1}^{*},g_{2}^{*})$ can be found within the class of deterministic functions.
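Because $\rho_i$ is linear in $\psi$ and $\Psi(\theta_i,\widehat{\boldsymbol{\theta}})$ is a polytope, the quantity $\sup_{\psi\in\Psi(\theta_i,\widehat{\boldsymbol{\theta}})}\rho_i(\psi)$ that appears later in the proof is a linear program of transportation type. The Python sketch below works through the smallest case, assuming a two-point true type and, purely for illustration, collapsing the reported profile to a two-point variable. Then $\Psi$ is a segment parameterized by $a=\psi(0,0)$ within the Fréchet bounds, and the linear $\rho_i$ is maximized at an endpoint. The payoff matrix `V` is a hypothetical stand-in for $v_i(\delta_i,g_1^*(\widehat{\boldsymbol{\theta}}),g_2^*(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}))$.

```python
def coupling(a, p, q):
    """2x2 joint pmf with x-marginal (p, 1-p), y-marginal (q, 1-q) and psi[0][0] = a."""
    return [[a, p - a], [q - a, 1.0 - p - q + a]]


def coupling_range(p, q):
    """Frechet bounds: the values of a for which coupling(a, p, q) is nonnegative."""
    return max(0.0, p + q - 1.0), min(p, q)


def rho(psi, V):
    """rho(psi) = E_{(x, y) ~ psi}[V[x][y]], linear in psi as in (26)."""
    return sum(psi[x][y] * V[x][y] for x in range(2) for y in range(2))


def sup_rho(p, q, V):
    """Maximum of the linear rho over the segment of couplings (attained at an endpoint)."""
    lo, hi = coupling_range(p, q)
    return max(rho(coupling(lo, p, q), V), rho(coupling(hi, p, q), V))


# Hypothetical numbers: theta_i = (0.6, 0.4), reported variable ~ (0.3, 0.7).
p, q = 0.6, 0.3
V = [[3.0, 0.0], [1.0, 2.0]]
independent = coupling(p * q, p, q)  # the product coupling always lies in Psi
```

Here `sup_rho(p, q, V)` evaluates to 1.7 (attained at $a=0.3$), while the independent coupling gives 1.22, so different members of $\Psi$ can yield quite different values of $\rho_i$.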

Suppose for a moment that each player $j\in\{1,\ldots,n\}$ employs a stationary second-stage bidding policy $\mu_{S}^{j}$, so that $\widehat{\delta}_{j}(l)$ is chosen as a function of $\delta_{j}(l)$ according to some probability kernel $P^{j}_{\widehat{\delta}_{j}|\delta_{j}}$ for every $l$. For player $j$'s strategy to be non-bankrupting, it is necessary that $\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{j}(l)=t\}}=\widehat{\theta}_{j}(t)$ almost surely for every $t\in\Delta$, since otherwise (7) would be violated infinitely often, resulting in an infinite average penalty. So, for every $j\in\{1,\ldots,n\}$, if player $j$'s strategy is to be non-bankrupting, then $P^{j}_{\widehat{\delta}_{j}|\delta_{j}}$ must be such that $\widehat{\delta}_{j}(1)\sim\widehat{\theta}_{j}$ given $\delta_{j}(1)\sim\theta_{j}$. It follows that for every $j\in\{1,\ldots,n\}$, $(\delta_{j}(1),\widehat{\boldsymbol{\delta}}(1))\sim\psi_{j}$ for some $\psi_{j}\in\Psi(\theta_{j},\widehat{\boldsymbol{\theta}})$. It also follows that $\{(\boldsymbol{\delta}(1),\widehat{\boldsymbol{\delta}}(1)),(\boldsymbol{\delta}(2),\widehat{\boldsymbol{\delta}}(2)),\ldots\}$ is a sequence of IID random variables, and so we obtain using the Strong Law of Large Numbers (SLLN) that the RHS of (22) almost surely equals $[W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})-W^{*}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})]+[\nu_{i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})-\rho_{i}(\psi_{i})]$. Upon substituting (25), this becomes $W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})-\nu_{-i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})-\rho_{i}(\psi_{i})$, and combining it with (27) implies the nonnegativity of (22).
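The necessary condition above, namely that a stationary kernel must push $\theta_j$ forward to $\widehat{\theta}_j$, is easy to check numerically. The Python sketch below computes the bid distribution induced by a kernel and builds one admissible kernel with the north-west corner rule, a standard way to construct some coupling with prescribed marginals; it is one choice among many and is not prescribed by the mechanism. The distributions `theta_j` and `theta_hat_j` are hypothetical.

```python
def bid_distribution(theta, kernel):
    """Distribution of the bid when the type ~ theta and the bid ~ kernel[type]."""
    return [sum(theta[d] * kernel[d][t] for d in range(len(theta)))
            for t in range(len(kernel[0]))]


def northwest_corner_kernel(theta, theta_hat, tol=1e-12):
    """Kernel P (rows = true type, columns = bid) with theta P = theta_hat.

    Builds a joint pmf by the north-west corner rule, then divides each row by
    theta[d]; rows with theta[d] == 0 are never used and are left as zeros.
    """
    joint = [[0.0] * len(theta_hat) for _ in theta]
    supply, demand = list(theta), list(theta_hat)
    r = c = 0
    while r < len(theta) and c < len(theta_hat):
        m = min(supply[r], demand[c])  # move as much mass as both sides allow
        joint[r][c] += m
        supply[r] -= m
        demand[c] -= m
        if supply[r] <= tol:
            r += 1
        else:
            c += 1
    return [[joint[d][t] / theta[d] if theta[d] > 0 else 0.0 for t in range(len(theta_hat))]
            for d in range(len(theta))]


theta_j = [0.5, 0.3, 0.2]       # hypothetical true type distribution
theta_hat_j = [0.2, 0.2, 0.6]   # hypothetical reported distribution
P = northwest_corner_kernel(theta_j, theta_hat_j)
always_bid_two = [[0.0, 0.0, 1.0]] * 3  # a kernel that ignores the true type
```

The kernel `P` reproduces $\widehat{\theta}_j$, whereas `always_bid_two` induces the bid distribution $(0,0,1)\neq\widehat{\theta}_j$; by the argument above, such a policy would violate (7) infinitely often.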

However, to fabricate their type bids, the players need not restrict themselves to stationary policies; they can employ any history-dependent policy. The rest of the proof is devoted to showing that the same result, namely, the nonnegativity of (22), holds even in the general case where the players may employ any non-bankrupting strategy. The key to establishing this is the following lemma, which characterizes the empirical joint distribution of the reported types when all players employ non-bankrupting strategies.

Lemma 2.

Suppose that for every $j\in\{1,\ldots,n\}$,

$$\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\,\mathds{1}_{\{E_{j,\boldsymbol{S}}(l)\}}<\infty. \tag{28}$$

Then, for every $\mathbf{d}\in\Delta^{n}$,

$$\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}(l)=\mathbf{d}\}}=\prod_{j=1}^{n}\widehat{\theta}_{j}(d_{j}). \tag{29}$$
Proof.

It suffices to show that for all $\mathbf{d}\in\Delta^{n}$ and all $k\in\{1,\ldots,n-1\}$,

$$\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}=\widehat{\theta}_{k}(d_{k})\bigg[\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg] \tag{30}$$

and that

$$\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{n}(l)=d_{n}\}}=\widehat{\theta}_{n}(d_{n}), \tag{31}$$

where $\mathbf{d}_{k:n}\coloneqq[d_{k}\;d_{k+1}\;\ldots\;d_{n}]$ and $\widehat{\boldsymbol{\delta}}_{k:n}(l)$ is defined likewise.

Combining (28) with (12) implies that $\limsup_{L\to\infty}\sum_{l=1}^{L}\mathds{1}_{\{E_{j,\boldsymbol{S}}(l)\}}<\infty$ for every $j\in\{1,\ldots,n\}$; i.e., the event sequence $\{E_{j,\boldsymbol{S}}(l)\}$ occurs only finitely often. Hence, using (13) and (10), we obtain that for all $\mathbf{d}\in\Delta^{n}$ and all $j\in\{1,\ldots,n\}$,

$$\lim_{L\to\infty}\widehat{f}_{j,d_{j}}(L)=0 \tag{32}$$

and

$$\lim_{L\to\infty}\widehat{h}_{j,\mathbf{d}}(L)=0. \tag{33}$$

Substituting (6) in (32) implies

$$\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{j}(l)=d_{j}\}}=\widehat{\theta}_{j}(d_{j}) \tag{34}$$

for all $d_{j}\in\Delta$ and all $j\in\{1,\ldots,n\}$, which in particular establishes (31).

Substituting (8) in (33) implies

$$\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{j}(l)=d_{j},\,\widehat{\boldsymbol{\delta}}_{-j}(l)=\mathbf{d}_{-j}\}}=\widehat{\theta}_{j}(d_{j})\bigg[\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-j}(l)=\mathbf{d}_{-j}\}}\bigg] \tag{35}$$

for all $\mathbf{d}\in\Delta^{n}$ and all $j\in\{1,\ldots,n\}$. In concluding (35), we have assumed that the limit on the RHS exists; justifying this requires certain additional arguments, which we omit so as not to distract from the main aspects of the proof.

The equality (30) can now be established by noting that

$$\begin{aligned}
\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}
&=\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\sum_{t_{1},\ldots,t_{k-1}}\mathds{1}_{\{\widehat{\delta}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}\\
&=\sum_{t_{1},\ldots,t_{k-1}}\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}\\
&=\sum_{t_{1},\ldots,t_{k-1}}\widehat{\theta}_{k}(d_{k})\bigg[\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg]\\
&=\widehat{\theta}_{k}(d_{k})\bigg[\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\sum_{t_{1},\ldots,t_{k-1}}\mathds{1}_{\{\widehat{\delta}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg]\\
&=\widehat{\theta}_{k}(d_{k})\bigg[\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg],
\end{aligned}\tag{36}$$

where the third equality follows from (35), and the second and fourth equalities interchange a finite sum with a limit. ∎
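Lemma 2 relies on the conditional statistic in (33) as well as the marginal one in (32): matching each player's marginal bid frequencies is not enough. The Python sketch below illustrates this with two players whose bids are individually distributed as $\widehat{\theta}_j=(1/2,1/2)$ but perfectly correlated. It assumes, consistent with how (6) and (8) are used in (20), (34) and (35), that $\widehat{f}_{j,t}(L)=\frac{1}{L}\sum_{l\leq L}\mathds{1}_{\{\widehat{\delta}_j(l)=t\}}-\widehat{\theta}_j(t)$ and $\widehat{h}_{j,\mathbf{d}}(L)=\frac{1}{L}\sum_{l\leq L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-j}(l)=\mathbf{d}_{-j}\}}[\mathds{1}_{\{\widehat{\delta}_j(l)=d_j\}}-\widehat{\theta}_j(d_j)]$; the exact definitions are given outside this section, and the bid sequences are hypothetical.

```python
def f_hat(bids, j, t, theta_hat_j):
    """Assumed form of (6): frequency of player j bidding t, minus theta_hat_j(t)."""
    return sum(1 for b in bids if b[j] == t) / len(bids) - theta_hat_j[t]


def h_hat(bids, j, d, theta_hat_j):
    """Assumed form of (8): correlation of player j's bid with the others' bids d_{-j}."""
    total = 0.0
    for b in bids:
        if all(b[k] == d[k] for k in range(len(d)) if k != j):
            total += (1.0 if b[j] == d[j] else 0.0) - theta_hat_j[d[j]]
    return total / len(bids)


theta_hat = [0.5, 0.5]
L = 1000
correlated = [(l % 2, l % 2) for l in range(L)]           # both players always bid the same value
independent = [(l % 2, (l // 2) % 2) for l in range(L)]  # cycles through all four bid pairs
```

Both sequences pass the marginal check ($\widehat{f}=0$), but `h_hat(correlated, 0, (0, 0), theta_hat)` equals 0.25 for every even $L$, so the correlated bids would violate (33), whereas the independent sequence gives 0.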

It follows from (12) that $\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}$ can only take the values $0$ and $\infty$. In the latter case, the nonnegativity of (22) is immediate. In the former case, since $\boldsymbol{S}_{-i}$ is a non-bankrupting strategy profile, we have that for all $j\in\{1,\ldots,n\}$,

$$\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\,\mathds{1}_{\{E_{j,\boldsymbol{S}}(l)\}}<\infty \tag{37}$$

almost surely. Consequently, Lemma 2 applies, and we get

$$\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}v_{i}\big(\widehat{\delta}_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)=\nu_{i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i}). \tag{38}$$

Now, consider the empirical joint distribution $\psi_{L}(d,\widehat{\mathbf{d}})\coloneqq\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\delta_{i}(l)=d,\,\widehat{\boldsymbol{\delta}}(l)=\widehat{\mathbf{d}}\}}$, where $d\in\Delta$ and $\widehat{\mathbf{d}}\in\Delta^{n}$. Note that $\psi_{L}\in\mu(\Delta,\Delta^{n})$ for all $L\in\mathbb{Z}_{+}$. It follows from the SLLN that for any $d\in\Delta$, $\lim_{L\to\infty}\sum_{\widehat{\mathbf{d}}}\psi_{L}(d,\widehat{\mathbf{d}})=\theta_{i}(d)$. Since (37) holds, we obtain using Lemma 2 that for any $\widehat{\mathbf{d}}\in\Delta^{n}$, $\lim_{L\to\infty}\sum_{d}\psi_{L}(d,\widehat{\mathbf{d}})=\prod_{j=1}^{n}\widehat{\theta}_{j}(\widehat{d}_{j})$. I.e., the sequence $\{\psi_{L}\}$ of empirical joint distributions is such that its $x$-marginal approaches the distribution $\theta_{i}$ and its $y$-marginal approaches the distribution $\widehat{\theta}_{1}\times\cdots\times\widehat{\theta}_{n}$. It can be shown as a consequence that $\{\psi_{L}\}$ approaches the set $\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})$, in that $\min_{\psi\in\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})}\|\psi-\psi_{L}\|\to 0$ as $L\to\infty$, where $\|\cdot\|$ can be any norm defined on the set $\mu(\Delta,\Delta^{n})$. Also, the function $\rho_{i}:\mu(\Delta,\Delta^{n})\to\mathbb{R}$ defined in (26) is a continuous function over a compact set, and hence uniformly continuous. It follows that

$$\liminf_{L\to\infty}\rho_{i}(\psi_{L})\leq\sup_{\psi\in\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})}\rho_{i}(\psi). \tag{39}$$

Note also that $\frac{1}{L}\sum_{l=1}^{L}v_{i}\big(\delta_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)=\mathbb{E}_{(\delta_{i},[\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i}])\sim\psi_{L}}\big[v_{i}\big(\delta_{i},g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}})\big)\big]=\rho_{i}(\psi_{L})$. Taking the limit inferior as $L\to\infty$ and using (39) implies

$$\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}v_{i}\big(\delta_{i}(l),g_{1}^{*}(\widehat{\boldsymbol{\theta}}),g_{2}^{*}(\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\delta}}(l))\big)\leq\sup_{\psi\in\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})}\rho_{i}(\psi). \tag{40}$$
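The identity preceding (40), that the time average of player $i$'s realized valuations equals $\rho_i(\psi_L)$, is simply a regrouping of the sum by the values of $(\delta_i(l),\widehat{\boldsymbol{\delta}}(l))$. The Python sketch below checks it on a random sequence; the valuation `v` and the sequences are hypothetical, with $g_1^*$ and $g_2^*$ folded into `v`.

```python
import random
from collections import Counter


def empirical_joint(true_types, bid_profiles):
    """psi_L(d, d_hat) = (1/L) * #{l : delta_i(l) = d and delta_hat(l) = d_hat}."""
    L = len(true_types)
    counts = Counter(zip(true_types, bid_profiles))
    return {key: n / L for key, n in counts.items()}


def rho(psi, v):
    """rho_i(psi) = sum over (d, d_hat) of psi(d, d_hat) * v(d, d_hat), as in (26)."""
    return sum(prob * v(d, d_hat) for (d, d_hat), prob in psi.items())


def v(d, d_hat):
    """Hypothetical valuation: true type d, outcome determined by the bid profile d_hat."""
    return d * (1.0 + sum(d_hat)) - 0.1 * max(d_hat)


rng = random.Random(1)
L = 5000
true_types = [rng.randrange(3) for _ in range(L)]
bid_profiles = [(rng.randrange(3), rng.randrange(3)) for _ in range(L)]
time_average = sum(v(d, b) for d, b in zip(true_types, bid_profiles)) / L
```

The identity holds for every $L$ and every bidding strategy; the proof then controls $\psi_L$ through its marginals, using the SLLN for the $x$-marginal and Lemma 2 for the $y$-marginal.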

Substituting (38) and (40) in (22) yields

$$u_{i}(T_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})-u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\geq\big[W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})-W^{*}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})\big]+\nu_{i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})-\sup_{\psi\in\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})}\rho_{i}(\psi).$$

Upon substituting (25), the RHS of the above inequality becomes $W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})-\nu_{-i}(\widehat{\theta}_{i},\widehat{\boldsymbol{\theta}}_{-i})-\sup_{\psi\in\Psi(\theta_{i},\widehat{\boldsymbol{\theta}})}\rho_{i}(\psi)$. Combining this with (27) implies its nonnegativity, thereby establishing the nonnegativity of (22).

We now prove the second statement of the theorem. Arbitrarily fix $\boldsymbol{S}_{-i}\in\Lambda_{-i}$ and $T_{i}\in\mathcal{T}_{i}$. Using (15), (2) and Lemma 1, we obtain almost surely that $u_{i}(T_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})=\big[W^{*}(\theta_{i},\widehat{\boldsymbol{\theta}}_{-i})-W^{*}(\widehat{\boldsymbol{\theta}}_{-i})\big]\geq 0$, where the inequality follows from (16). Hence, truth-telling is individually rational for every player.

That the mechanism maximizes social welfare under truthful bidding is a straightforward consequence of the optimality of the first- and the second-stage decision rules. ∎

The following section describes an application of the mechanism to the design of demand response markets.

IV Application to Demand Response Markets

As mentioned in Section I, one of the motivating reasons for introducing the environment of a two-stage repeated stochastic game is its ability to readily model many problems that arise in the context of next-generation electricity markets. We illustrate one such problem in this section, namely, mechanism design for demand response markets. In addition to illustrating an application of the proposed framework, the results of this section also serve to illustrate the benefits of using the proposed mechanism as opposed to other “natural” mechanisms that a policy-maker might employ in such scenarios.

One of the main requirements of power systems operations is that the power supply has to equal the random demand at each time instant. In conventional systems, the power supply can be controlled, and so the generation is continuously adjusted to follow the random demand to maintain balance. However, at deep levels of renewable energy penetration, the generation becomes random. A popular paradigm for maintaining demand-supply balance in such a system is to make the demand follow the random supply. This typically involves curtailing consumption during times of power supply shortage. This is referred to as demand response, and is achieved by using incentives to modulate the demand.

One of the key challenges in implementing demand response is that, in order to optimally allocate a desired consumption reduction among the demand response providers, the system operator must know the providers' costs for curtailing consumption. These costs are in general random and private to the loads, which could misreport them to achieve more favorable allocations for themselves. The goal of the mechanism designer is to elicit both the probability distribution and the realization of the private costs truthfully. See [4] for more details. In what follows, we describe how the mechanism developed in the previous section can be applied to this problem.

In this section, we overload certain notation. Specifically, whenever a demand response market-specific quantity maps to a two-stage repeated stochastic game-specific quantity, the former will be denoted using the same symbol that has been used for the latter.

Consider a system consisting of $n$ Demand Response (DR) providers and a reserve generator. Each DR provider has a cost function that specifies the cost it incurs as a function of its power consumption reduction. We assume that the cost function is parameterizable and denote by $\delta_{i}(l)$ the parameter that specifies the cost function of DR provider $i$ on day $l$. Hence, $c(x,\delta_{i}(l))$ denotes the cost that DR provider $i$ incurs on day $l$ for curtailing its consumption by $x$ units from its baseline. The sequence $\boldsymbol{\delta}^{\infty}$ is IID with $\boldsymbol{\delta}(1)\sim\boldsymbol{\theta}\coloneqq\theta_{1}\times\cdots\times\theta_{n}$, where $\theta_{i}$ denotes the probability distribution of $\delta_{i}(1)$. The reserve generator has associated with it a production cost function $c_{s}:\mathbb{R}\to\mathbb{R}$, which specifies the cost it incurs as a function of the power that it produces.

Denote by $d(l)$ the power shortage on day $l$. The system operator wishes to minimize the social cost of compensating the shortage, and therefore wishes to determine the consumption reduction of the DR providers and the reserve generation on day $l$ as

$$\begin{aligned}
(\mathbf{x}^{*}(\boldsymbol{\delta}(l)),g_{s}^{*}(\boldsymbol{\delta}(l)))=\operatorname*{arg\,min}_{x_{1},\ldots,x_{n},g_{s}}\quad&\sum_{i=1}^{n}c(x_{i},\delta_{i}(l))+c_{s}(g_{s})\\
\text{subject to}\quad&\sum_{i=1}^{n}x_{i}+g_{s}=d(l).
\end{aligned}\tag{41}$$

The problem, of course, is that the system operator does not know $\{\delta_{1}(l),\ldots,\delta_{n}(l)\}$, and so it requests the DR providers to bid their cost functions. Denote by $\widehat{\delta}_{i}(l)$ the parameter that DR provider $i$ bids on day $l$. The system operator computes $\mathbf{x}^{*}(\widehat{\boldsymbol{\delta}}(l))$ and pays each DR provider $i$ a payment $p_{i}(l)$ on day $l$ for reducing its consumption by $x_{i}^{*}(\widehat{\boldsymbol{\delta}}(l))$. The average utility that DR provider $i$ accrues is defined as

$$u_{i}^{\infty}\coloneqq\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\Big(p_{i}(l)-c\big(x_{i}^{*}(\widehat{\boldsymbol{\delta}}(l)),\delta_{i}(l)\big)\Big). \tag{42}$$
Figure 2: Social cost vs. the number of DR providers. The larger the number of participants in the demand response program, the lower the social cost of the program.

It is straightforward to see that the average utility of each DR provider is a function of not only its own bidding strategy, but also the bidding strategy of the other DR loads. Consequently, a DR provider may not bid its cost truthfully if there is a possibility for it to obtain a larger utility by misreporting its cost. This in turn could result in the demand response program operating in a manner that is not social cost-minimizing. This motivates the mechanism design problem. The mechanism presented in the previous section can be used to design a payment rule which results in truth-telling being a dominant strategy non-bankrupting equilibrium.

For our numerical study, we have taken $c(x,\delta_{i})=\frac{\delta_{i}}{2}x^{2}$, $c_{s}(x,\delta_{s})=\frac{\delta_{s}}{2}x^{2}$, $\boldsymbol{\theta}$ to be a product of beta distributions of unit mean and variance $2$, and $\delta_{s}(l)$ to also be beta distributed with the same parameters.
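With these quadratic costs, problem (41) has a closed-form solution, derived here only as an illustration. Treating $c_s(g_s)=\frac{\delta_s}{2}g_s^2$ for a given $\delta_s$ and assuming no capacity or sign constraints on $x_i$ and $g_s$ (an assumption not stated in the text), the first-order conditions equalize marginal costs, $\delta_i x_i=\delta_s g_s=\lambda$, and the balance constraint gives $\lambda=d/S$ with $S=\sum_i 1/\delta_i+1/\delta_s$. The minimum social cost is then $d^2/(2S)$; since each additional provider adds a positive term to $S$, this is consistent with the trend in Fig. 2. The function name is hypothetical.

```python
def optimal_dispatch(deltas, delta_s, shortage):
    """Closed-form solution of (41) for c(x, delta) = delta/2 x^2 and c_s(g) = delta_s/2 g^2.

    Assumes all cost parameters are positive and ignores capacity/sign limits.
    Returns (x, g_s, social_cost).
    """
    s = sum(1.0 / dl for dl in deltas) + 1.0 / delta_s
    lam = shortage / s                  # common marginal cost lambda = d / S
    x = [lam / dl for dl in deltas]     # delta_i * x_i = lambda
    g_s = lam / delta_s                 # delta_s * g_s = lambda
    cost = sum(0.5 * dl * xi * xi for dl, xi in zip(deltas, x)) + 0.5 * delta_s * g_s * g_s
    return x, g_s, cost
```

For example, `optimal_dispatch([1.0, 2.0], 2.0, 4.0)` allocates $x=(2,1)$ and $g_s=1$ at a social cost of $4=d^2/(2S)$ with $S=2$; adding a third provider with $\delta=2$ raises $S$ to $2.5$ and lowers the cost to $3.2$.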

Fig. 2 quantifies how the social cost reduces as the participation of DR providers increases. Fig. 3 illustrates how the payment resulting from the proposed mechanism behaves from the point of view of a randomly chosen DR provider. Specifically, we fix the cost function of a randomly chosen DR provider and plot how its average payment varies with the mean of the costs of the other DR providers. Qualitatively, the higher the mean cost of a DR provider, the higher the inelasticity of its demand. Hence, Fig. 3 quantifies the rate at which the payment received by a given DR load increases as a function of the inelasticity of the other DR providers.

An arguably natural alternative to the proposed mechanism is the posted price mechanism, wherein the system operator announces the payment $p_{pp}$ that the DR providers would receive per unit reduction in their power consumption. Each DR provider $i$ then chooses its curtailment $x_{i,pp}^{*}(l)$ on day $l$ as $x_{i,pp}^{*}(l)=\arg\min_{x}\;c(x,\delta_{i}(l))-p_{pp}x$. The residual mismatch $g_{s}(l)\coloneqq d(l)-\sum_{i=1}^{n}x_{i,pp}^{*}(l)$ is purchased in the spot market at a cost of $c_{s}(g_{s}(l),\delta_{s}(l))=\frac{\delta_{s}(l)}{2}g_{s}^{2}(l)$. Such a mechanism has been employed, for example, in a prior demand response trial in the United Kingdom.
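Under the same quadratic costs, a price-taking provider facing price $p_{pp}$ curtails $x_{i,pp}^*(l)=p_{pp}/\delta_i(l)$, so the posted-price social cost can be compared directly with the optimum of (41), $d^2/(2S)$ with $S=\sum_i 1/\delta_i+1/\delta_s$. The Python sketch below does this on a hypothetical discrete distribution (each cost parameter equally likely to be 1 or 3), not the beta distributions used for Fig. 4. Because the optimum is a pointwise minimum, no posted price can beat it, and on a given day a posted price matches it only if $p_{pp}$ happens to equal $d(l)/S(l)$, which varies from day to day.

```python
import itertools


def posted_price_cost(p, deltas, delta_s, shortage):
    """Social cost when each provider curtails x_i = p / delta_i and the rest is bought at spot."""
    x = [p / dl for dl in deltas]
    g_s = shortage - sum(x)  # residual mismatch purchased in the spot market
    return sum(0.5 * dl * xi * xi for dl, xi in zip(deltas, x)) + 0.5 * delta_s * g_s * g_s


def optimal_cost(deltas, delta_s, shortage):
    """Minimum of (41) with quadratic costs: d^2 / (2 S), S = sum 1/delta_i + 1/delta_s."""
    s = sum(1.0 / dl for dl in deltas) + 1.0 / delta_s
    return shortage * shortage / (2.0 * s)


def expected_costs(prices, support, shortage, n):
    """Average costs over all equally likely draws of (delta_1, ..., delta_n, delta_s)."""
    draws = list(itertools.product(support, repeat=n + 1))
    opt = sum(optimal_cost(dr[:n], dr[n], shortage) for dr in draws) / len(draws)
    posted = {p: sum(posted_price_cost(p, dr[:n], dr[n], shortage) for dr in draws) / len(draws)
              for p in prices}
    return opt, posted


prices = [0.25 * k for k in range(1, 21)]
opt, posted = expected_costs(prices, support=[1.0, 3.0], shortage=4.0, n=2)
best_price = min(posted, key=posted.get)
```

In this toy setting, even the best price on the grid leaves the expected posted-price cost strictly above the expected optimal cost, and locating that best price requires knowing the distribution of the cost parameters.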

How do such “simple,” “natural” alternatives compare with the proposed mechanism? Fig. 4 compares the social cost attained by the proposed mechanism with the social cost attained by the posted price mechanism. Two important observations are in order. First, note that there exists a price point at which the posted price mechanism attains its minimum social cost. However, this price point is a function of the type distributions of the DR loads, which are their private knowledge. This requires the system operator to perform price discovery in order to compute the optimal price point, a process that is vulnerable to strategic manipulation by the DR providers. Second, even assuming that the DR providers do not manipulate the price discovery, the minimum social cost that can be attained by the posted price mechanism is in general strictly larger than what can be attained by employing the proposed mechanism.

Figure 3: Average payment received by a fixed DR provider as a function of the mean of the supertypes of the other DR providers. The fixed DR provider has cost parameter $\delta_{i}(l)=4$ for all $l$, and the supertypes of the other loads are beta distributed with varying mean and a fixed variance of $2$. The average payment received by a given load increases as the demand of the other loads becomes more and more inelastic.
Figure 4: The social cost attained by the posted price mechanism vs. the price.

V Related Work

The setting of two-stage stochastic games was introduced in [1] which considers a one-shot setting and develops a mechanism that renders truthful bidding a sequential ex post Nash equilibrium. Reference [5] considers a two-stage game setting to model electricity markets consisting of wind power producers and develops a mechanism that incentivizes truthful bidding. However, it assumes that it is only in the first stage of the game that the wind power producers can bid strategically, and not in the second stage. In contrast, the setting that we have considered assumes that the valuation function distribution and the valuation function realization are private to the players, and that they can misreport either or both of them to accrue a higher utility. Reference [6] presents a two-stage mechanism called the generalized Groves mechanism. In terms of the terminology and the framework presented in this paper, the setting in [6] can be interpreted as each player having a privately known distribution of its valuation function which it is required to bid to the social planner. The joint distribution of the players’ valuation functions is assumed to be common knowledge. The social planner chooses an outcome that maximizes the expected social welfare based on the bids. After the social planner chooses the outcome, the valuation functions realize, which the players are required to bid in the second stage. Following this, a final payment is made. The payment rule guarantees that truth-telling by all players is an ex post Nash equilibrium. It is important to recognize that it is only the payment rule that has two stages in the aforementioned setting, and not the game itself. This in fact is one of the key departures of the one-shot two-stage stochastic game setting from the setting considered in [6]; the latter doesn’t include the possibility for the social planner to take recourse actions after the valuation functions realize. 
In the context of electricity markets, not only is it feasible to take recourse actions, it is also imperative to do so if grid stability is to be maintained. Reference [7] builds upon the mechanism proposed in [6] to devise a two-stage mechanism for bilateral trade. A power system offering a demand response program is considered in [8, 9], and a two-stage mechanism is presented by which a certain quantity of power can be apportioned among the loads when a demand response event occurs. The first stage establishes a contingency plan that specifies the amount of power that would be supplied to each load in each contingency and the corresponding price; the second stage, during which the contingency occurs, allows the loads to trade among themselves at the price established in the first stage. It is shown that the second-stage trade results in an allocation that Pareto dominates the first-stage allocation. All of the aforementioned papers consider a one-shot game, whereas the setting that we have considered is one of repeated plays. As mentioned in Section I, repeated plays introduce additional complexities for mechanism design, which can be attributed to the availability of history-dependent bidding strategies to the players. A similar challenge manifests in dynamic games. References [10, 11, 12, 13, 14] are some of the papers that address the problem of mechanism design for dynamic games. The solution concept adopted in most of the literature on dynamic games is ex post Nash equilibrium or variants thereof. With the exception of certain special cases such as [14], we are unaware of any work that attempts to go beyond Nash equilibrium or its variants and implement truth-telling in stronger notions of equilibria for broad classes of repeated or dynamic games. A generously disposed reader may view the present paper as an attempt in that direction.

VI Conclusion

We have considered two-stage repeated stochastic games wherein private information is revealed over two stages and the social planner is constrained to make a decision in each stage. The setting models many important problems that arise in next-generation electricity markets. Recognizing the limitations of Nash equilibria in molding real-world behavior, we have introduced the notion of a dominant strategy non-bankrupting equilibrium (DNBE), which requires the players to make very few assumptions about the behavior of the other players in order to employ their equilibrium strategy. Consequently, a mechanism that implements a certain desired behavior as a DNBE could effectively mold real-world behavior along the desired lines. We have developed a mechanism for two-stage repeated stochastic games that implements truth-telling as a DNBE. The mechanism is also individually rational and maximizes social welfare.

References

  • [1] S. Ieong, A. M.-C. So, and M. Sundararajan, “Stochastic mechanism design,” in International Workshop on Web and Internet Economics. Springer, 2007, pp. 269–280.
  • [2] D. Bergemann and J. Välimäki, “Dynamic mechanism design: An introduction,” Journal of Economic Literature, vol. 57, no. 2, pp. 235–274, 2019.
  • [3] B. Satchidanandan and M. A. Dahleh, “An efficient and incentive-compatible mechanism for energy storage markets,” arXiv preprint arXiv:2012.11540, 2020.
  • [4] B. Satchidanandan, M. Roozbehani, and M. A. Dahleh, “A two-stage mechanism for demand response markets,” IEEE Control Systems Letters, vol. 7, pp. 49–54, 2023.
  • [5] W. Tang and R. Jain, “Market mechanisms for buying random wind,” IEEE Transactions on Sustainable Energy, vol. 6, no. 4, pp. 1615–1623, 2015.
  • [6] C. Mezzetti, “Mechanism design with interdependent valuations: Efficiency,” Econometrica, vol. 72, no. 5, pp. 1617–1626, 2004.
  • [7] T. Kunimoto and C. Zhang, “Efficient bilateral trade with interdependent values: The use of two-stage mechanisms,” Singapore Management University, School of Economics, Economics and Statistics Working Papers 14-2020, May 2020. [Online]. Available: https://ideas.repec.org/p/ris/smuesw/2020_014.html
  • [8] J. A. Doucet, K. J. Min, M. Roland, and T. Strauss, “A two-stage mechanism to improve electricity rationing,” The Canadian Journal of Economics / Revue canadienne d’Economique, vol. 29, pp. S270–S275, 1996. [Online]. Available: http://www.jstor.org/stable/135999
  • [9] J. A. Doucet, K. J. Min, M. Roland, and T. Strauss, “Electricity rationing through a two-stage mechanism,” Energy Economics, vol. 18, no. 3, pp. 247–263, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/014098839600014X
  • [10] A. Pavan, I. Segal, and J. Toikka, “Dynamic mechanism design: A Myersonian approach,” Econometrica, vol. 82, no. 2, pp. 601–653, 2014.
  • [11] D. P. Baron and D. Besanko, “Regulation and information in a continuing relationship,” Information Economics and Policy, vol. 1, no. 3, pp. 267–302, 1984.
  • [12] M. Battaglini, “Long-term contracting with Markovian consumers,” American Economic Review, vol. 95, no. 3, pp. 637–658, 2005.
  • [13] D. Bergemann and J. Välimäki, “Efficient dynamic auctions,” Cowles Foundation Discussion Paper, no. 1584, 2006.
  • [14] K. Ma and P. R. Kumar, “Incentive compatibility in stochastic dynamic systems,” IEEE Transactions on Automatic Control, vol. 66, no. 2, pp. 651–666, 2020.