Incentive Compatibility in Two-Stage Repeated Stochastic Games

Bharadwaj Satchidanandan and Munther A. Dahleh

Abstract

We address mechanism design for two-stage repeated stochastic games, a novel setting in which many emerging problems in next-generation electricity markets can be readily modeled. We introduce a new notion of equilibrium called Dominant Strategy Non-Bankrupting Equilibrium (DNBE), which requires players to make very few assumptions about the behavior of other players in order to employ their equilibrium strategy. Consequently, a mechanism that renders truth-telling a DNBE could be quite effective in molding real-world behavior along truthful lines. We present a mechanism for two-stage repeated stochastic games that renders truth-telling a Dominant Strategy Non-Bankrupting Equilibrium.

I Introduction

The power system is on the cusp of a revolution. The coming decade could witness increased renewable energy penetration, Electric Vehicle (EV) penetration, EV energy storage integration, demand response programs, etc. These changes have a profound impact on electricity market operations. New mechanisms must be devised to address a variety of important problems that are anticipated to arise in next-generation electricity markets. Most existing mechanism design settings are insufficient to model certain crucial features of these problems. To address this, we introduce the setting of Two-Stage Repeated Stochastic Games, in which many problems that arise in the context of electricity markets can be readily modeled. The setting is an extension of the one-shot two-stage stochastic game introduced in [1] to repeated plays.

At a high level, a two-stage stochastic game, as the name suggests, consists of two stages. In the first stage, the players do not know their valuation functions precisely, but only know the probability distributions of their valuation functions. It is only in the second stage of the game that the valuation functions are realized. However, the social planner cannot wait until the second stage to decide on the outcome. It is constrained to make certain decisions in the first stage itself, based on the players' bids for the probability distributions of their valuation functions. Once the valuation functions are realized and reported in the second stage, the social planner can make corrections to the first-stage outcome by taking certain recourse actions, but this comes at a cost [1]. It is the stochasticity of the players’ valuation functions, together with the possibility that they misreport both the probability distribution and the realization of their valuation functions, that precludes the use of classical mechanism design techniques to design efficient and incentive-compatible mechanisms for this setting.

Motivated by applications to electricity markets which operate every day, we consider a setting wherein a two-stage stochastic game is played repeatedly. Repeated playing affords the players a large class of strategies that adapt a player’s actions to all past observations and inferences obtained therefrom. In other settings such as iterative auctions or dynamic games where a large strategy space of this sort manifests, it typically has an important implication for mechanism design: It may be impossible to obtain truth-telling as a dominant strategy equilibrium [2]. Consequently, in such scenarios, it is common to settle for mechanisms that render truth-telling only a Nash equilibrium, or variants thereof, even though Nash equilibria are known to be poor models of real-world behavior. This is because each player must make overly specific assumptions about the behavior of the other players in order to employ its Nash equilibrium strategy, and players may not make these assumptions in practice. In general, the smaller the burden of speculation in an equilibrium, the more plausible it is that it models real-world behavior.

Guided by the above maxim, we develop a new notion of equilibrium called the Dominant Strategy Non-Bankrupting Equilibrium (DNBE) that requires players to make very few assumptions about the behavior of the other players in order to employ their equilibrium strategy. Specifically, the only assumption that the players are required to make to play their DNBE strategy is that no player employs a strategy that leads to its own bankruptcy. We make this more precise in Section II. That this assumption is mild, in the sense that it is quite likely to hold in practice, needs no belaboring. Consequently, a mechanism that implements a certain desired behavior as a DNBE, as opposed to only a Nash equilibrium, could be quite effective in molding real-world behavior along the desired lines.

We then present a mechanism for two-stage repeated stochastic games that renders truth-telling a dominant strategy non-bankrupting equilibrium. The mechanism is individually rational in that every player is guaranteed to accrue a nonnegative utility by truth-telling regardless of what strategies the other players employ. Moreover, if every player bids truthfully, then the outcome that the mechanism produces maximizes social welfare. The mechanism is a generalization of the mechanism that we have developed in [3] for energy storage markets.

Finally, we apply the mechanism to design an efficient and incentive-compatible demand response market. There are two main takeaways that we wish to highlight for designers of next-generation electricity markets. The first is that there is a need to redesign the “bidding language” of the day-ahead market. In today’s electricity markets, the generators and loads bid their supply and demand functions respectively. However, with the inclusion of demand response providers who may not know exactly in the day-ahead market their ability to reduce consumption the following day, the day-ahead market should allow for bids that are only probabilistic in nature. It is only in real time, if and when called upon for demand response, that the demand response providers should be required to disclose their actual costs for curtailing consumption. The theory developed in the paper allows for such probabilistic bids to be submitted to the system operator. Secondly, our results show that “simple” mechanisms like making payments proportional to the power curtailed by demand response providers, which have been employed in previous demand response trials, are incapable of attaining the optimal social welfare. Significant welfare gains can be obtained by employing carefully-designed mechanisms that take into account the uncertainties of the market participants.

The rest of the paper is organized as follows. Section II begins with a precise description of a two-stage repeated stochastic game, defines the notion of dominant strategy non-bankrupting equilibrium, and formulates the mechanism design problem. Section III develops a mechanism for two-stage repeated stochastic games that guarantees truth-telling to be a dominant strategy non-bankrupting equilibrium. Section IV describes the application of the results to the design of demand response markets. Section V provides an account of related work. Section VI concludes the paper.

Notation: Vectors and sequences are denoted using boldface letters. Given a sequence $\mathbf{x}=\{x(1),x(2),\ldots\},$ we denote by $\mathbf{x}^{l}$ the segment $\{x(1),\ldots,x(l)\}.$ The hat notation is used to denote bids: Given a variable $x$ that is private to a player, we denote by $\widehat{x}$ the bid that the player submits for $x.$

II Problem Formulation

A two-stage stochastic game played by $n$ players and overseen by a social planner is described by

1. a publicly-known set $\Delta$ known as the type space of the players,

2. a publicly-known set $\Theta$ of probability mass functions over $\Delta$, known as the supertype space of the players,

3. for each $i\in\{1,\ldots,n\},$ a probability distribution $\theta_{i}\in\Theta$, known as player $i$’s supertype, that is privately known to player $i$ in the first stage of the game, and which it is supposed to report to the social planner in the first stage,

4. a set $\mathcal{O}_{1}$ of first-stage outcomes,

5. a first-stage decision rule $g^{*}_{1}\colon\Theta^{n}\to\mathcal{O}_{1}$ according to which the social planner chooses the first-stage outcome as a function of the players’ supertype bids,

6. for each $i\in\{1,\ldots,n\},$ player $i$’s type $\delta_{i}\in\Delta$ that is “drawn by nature” at random according to $\theta_{i}$, whose realization is privately observed by player $i$ in the second stage of the game, and which it is supposed to report to the social planner in the second stage,

7. a set $\mathcal{O}_{2}$ of second-stage outcomes or “recourse actions” that the social planner can choose,

8. a second-stage decision rule $g^{*}_{2}\colon\Theta^{n}\times\Delta^{n}\to\mathcal{O}_{2}$ according to which the social planner chooses the second-stage outcome as a function of the players’ type and supertype bids,

9. a cost function $c\colon\mathcal{O}_{1}\times\mathcal{O}_{2}\to\mathbb{R}$ that specifies for every $(o_{1},o_{2})\in\mathcal{O}_{1}\times\mathcal{O}_{2},$ the cost incurred by the social planner for choosing the outcome $o_{1}$ in the first stage and taking the recourse action $o_{2}$ in the second stage,

10. for each $i\in\{1,\ldots,n\},$ a valuation function $v_{i}\colon\Delta\times\mathcal{O}_{1}\times\mathcal{O}_{2}\to\mathbb{R}$ of player $i$ that specifies for every $\delta_{i}\in\Delta$ and every $(o_{1},o_{2})\in\mathcal{O}_{1}\times\mathcal{O}_{2},$ the valuation of player $i$ if its type is $\delta_{i}$ and the social planner chooses the outcomes $o_{1}$ and $o_{2}$ in the first and the second stage of the game respectively.

The first- and second-stage decision rules $(g_{1}^{*},g_{2}^{*})$ that we consider are those that maximize the expected social welfare. To elaborate, let $g_{1}\colon\Theta^{n}\to\mathcal{O}_{1}$ be any first-stage decision rule and $g_{2}\colon\Theta^{n}\times\Delta^{n}\to\mathcal{O}_{2}$ be any second-stage decision rule. If the players bid their types and supertypes truthfully, then the expected social welfare that results as a consequence of using the decision rule $(g_{1},g_{2})$ is

\mathbb{E}_{\boldsymbol{\delta}\sim\boldsymbol{\theta}}\Big[\sum_{i=1}^{n}v_{i}(\delta_{i},g_{1}(\boldsymbol{\theta}),g_{2}(\boldsymbol{\theta},\boldsymbol{\delta}))-c(g_{1}(\boldsymbol{\theta}),g_{2}(\boldsymbol{\theta},\boldsymbol{\delta}))\Big]=:W(\boldsymbol{\theta},g_{1},g_{2}).

The goal of the social planner is to maximize the expected social welfare, and so the decision rule $(g_{1}^{*},g_{2}^{*})$ that it employs is

\displaystyle(g_{1}^{*},g_{2}^{*})=\operatorname*{arg\,max}_{g_{1},g_{2}}\;{W}(\cdot,g_{1},g_{2}),

(1)

where the maximization is defined in the pointwise sense. The social planner computes $g_{1}^{*}$ and $g_{2}^{*}$ and announces them to the players before the game commences.
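For a finite toy instance, the pointwise maximization in (1) can be sketched by brute force: for each candidate first-stage outcome, pick the best recourse action separately for every type profile, then keep the first-stage outcome with the highest expected welfare. The function name `optimal_rules`, the dictionary encoding of supertypes, and the one-player capacity example are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def optimal_rules(thetas, types, O1, O2, v, c):
    """Brute-force (g1*, g2*) at one supertype profile of a finite toy game.

    thetas : list of dicts {type: probability}, one per player
    v      : list of callables v_i(delta_i, o1, o2)
    c      : callable c(o1, o2)
    Returns (o1_star, g2_star, W_star), where g2_star maps each type
    profile (a tuple) to the chosen recourse action.
    """
    n = len(thetas)

    def welfare(delta, o1, o2):
        return sum(v[i](delta[i], o1, o2) for i in range(n)) - c(o1, o2)

    def prob(delta):
        p = 1.0
        for i in range(n):
            p *= thetas[i].get(delta[i], 0.0)
        return p

    best = None
    for o1 in O1:
        # Given o1, the best recourse can be picked separately for each delta.
        g2 = {d: max(O2, key=lambda o2: welfare(d, o1, o2))
              for d in product(types, repeat=n)}
        value = sum(prob(d) * welfare(d, o1, g2[d]) for d in g2)
        if best is None or value > best[2]:
            best = (o1, g2, value)
    return best


# Hypothetical one-player example: delta is tomorrow's demand (0 or 1 unit),
# o1 is capacity bought day-ahead at unit cost 1, o2 is capacity bought in
# real time at unit cost 3, and each served unit of demand is worth 4.
v = [lambda delta, o1, o2: 4 * min(delta, o1 + o2)]
c = lambda o1, o2: 1 * o1 + 3 * o2

o1_star, g2_star, W_star = optimal_rules(
    [{0: 0.5, 1: 0.5}], types=[0, 1], O1=[0, 1], O2=[0, 1], v=v, c=c)
print(o1_star, g2_star, W_star)
```

In this toy instance, buying capacity day-ahead is optimal when demand is equally likely to be 0 or 1, whereas a supertype that puts probability 0.9 on zero demand makes it better to wait and use the costlier recourse only when needed.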

The problem that we study is one where a two-stage stochastic game of the above form is played repeatedly on each day $l,$ $l\in\mathbb{Z}_{+}.$ For ease of exposition, we assume that the supertypes of the players remain the same on all days and it is only their types that differ across days, though this assumption can be relaxed in a straightforward manner. Consequently, for each player $i$ , $i\in\{1,\ldots,n\},$ we denote by $\theta_{i}$ its privately known supertype which remains the same on all days and by $\delta_{i}(l)$ its privately known type on day $l.$ The sequence $\{\boldsymbol{\delta}(1),\boldsymbol{\delta}(2),\ldots\}$ is assumed to be Independent and Identically Distributed (IID) with $\boldsymbol{\delta}(1)\sim\theta_{1}\times\ldots\times\theta_{n}.$

II-A First-stage strategy

On each day $l,$ each player $i$ is required to report its supertype to the social planner in the first stage so that the latter can compute the optimal first-stage outcome. Since the players’ supertypes are assumed to remain the same on all days, it suffices for the players to bid their supertypes just once, namely, in the first stage of the game on day $1.$ Owing to strategic reasons that will be clear shortly, the players may not bid their supertypes truthfully, and so we denote by ${\widehat{\theta}_{i}}$ the supertype bid of player $i$ and by $\sigma_{i}\colon\Theta\to\Theta$ the first-stage strategy according to which player $i$ constructs its supertype bid. Therefore, $\widehat{\theta}_{i}=\sigma_{i}(\theta_{i}).$ Once all players submit their supertype bids, the social planner computes the first-stage outcome as $g_{1}^{*}(\boldsymbol{\sigma}(\boldsymbol{\theta})),$ where $\boldsymbol{\sigma}(\boldsymbol{\theta})\coloneqq[\sigma_{1}(\theta_{1}),\ldots,\sigma_{n}(\theta_{n})].$ The game then proceeds to the second stage.

II-B Second-stage bidding policy

In the second stage on each day $l$ , each player $i$ observes the realization of $\delta_{i}(l)$ which it is supposed to report to the social planner. However, owing to strategic reasons that will become clear shortly, the players may not bid their type realizations truthfully, and so we denote by $\widehat{\delta}_{i}(l)$ player $i$ ’s type bid on day $l.$ We allow for the player to construct its type bid on any day $l$ using all information available to it until day $l$ , and in accordance with any randomized, history-dependent policy of its choosing. Specifically, a second-stage bidding policy $\mu$ of player $i$ is a rule which specifies for each $o_{1}\in\mathcal{O}_{1}$ and each $l\in\mathbb{Z}_{+},$ a probability transition kernel $\mathbb{P}_{\mu}(\widehat{\delta}_{i}(l)\big{|}\delta_{i}^{l},\widehat{\delta}_{i}^{l-1},o_{2}^{l-1};o_{1})$ according to which player $i$ constructs its second-stage bid $\widehat{\delta}_{i}(l)$ on day $l$ if the first-stage outcome is $o_{1}$ . We denote by $\Pi_{i}$ the set of all second-stage bidding policies available to player $i.$

Note that the second stage bidding policy is a rule that maps the history of observations available to a player to its second-stage bid. While the outcome of the rule is random owing to the types and second-stage outcomes being random, there is nothing random about the rule itself. Consequently, a player without any loss of generality can choose its second-stage bidding policy right on day $1$ as a function of its supertype. This leads to the notion of the second-stage strategy which is described next.

II-C Second-stage strategy

A second-stage strategy of player $i$ is a function $\pi_{i}\colon\Theta\to\Pi_{i}$ which specifies the second-stage bidding policy that it employs as a function of its private supertype $\theta_{i}.$ Therefore, $\pi_{i}(\theta_{i})$ is the second-stage bidding policy employed by player $i.$

Once all players submit their type bids, the social planner computes the second-stage outcome for day $l$ as $o_{2}(l)=g_{2}^{*}(\boldsymbol{\sigma}(\boldsymbol{\theta}),\widehat{\boldsymbol{\delta}}(l))$ . Note that once the players’ first-stage and second-stage bidding policies are fixed, a functional relationship is established between the types and the type bids, and all random variables become well-defined.

II-D Strategies and Strategy profiles

We refer to the pair of first- and second-stage strategies simply as a strategy. That is, $S_{i}\coloneqq(\sigma_{i},{\pi}_{i})$ is referred to as the strategy of player $i.$ We denote by $\Lambda_{i}$ the set of strategies available to player $i.$ Finally, we refer to $\boldsymbol{S}\coloneqq(S_{1},\ldots,S_{n})$ as the strategy profile of the players and denote by $\Lambda$ the set of strategy profiles $\Lambda_{1}\times\ldots\times\Lambda_{n}$.

II-E Truthful strategies

The stochasticity of the player types necessitates the definition of truthful strategy to be weaker than requiring a player to bid its type truthfully on all days.

Definition 1.

A strategy $S_{i}=(\sigma_{i},{\pi}_{i})$ of player $i$ , $i\in\{1,\ldots,n\},$ is truthful if

(i)

$\sigma_{i}(\theta)=\theta$ for every $\theta\in\Theta,$ and

(ii)

for every $\theta\in\Theta$ and every $o_{1}\in\mathcal{O}_{1},$ there exists $\mathcal{L}\subseteq\mathbb{Z}_{+}$ with $\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{l\in\mathcal{L}\}}=0$ such that for all $l\notin\mathcal{L},$

\mathbb{P}_{\pi_{i}(\theta)}(\widehat{\delta}_{i}(l)\big{|}\delta_{i}^{l},\widehat{\delta}_{i}^{l-1},o_{2}^{l-1};o_{1})=\mathds{1}_{\{\widehat{\delta}_{i}(l)=\delta_{i}(l)\}}.

A strategy profile $(S_{1},\ldots,S_{n})$ is a truthful strategy profile if $S_{i}$ is truthful for every $i\in\{1,\ldots,n\}.$

In other words, a strategy $S_{i}$ is truthful if the supertype bid is truthful and the type bid is truthful on “almost all days.” We denote by $\mathcal{T}_{i}\subset\Lambda_{i}$ the set of all truthful strategies available to player $i.$

II-F Payments and utilities

The social planner collects a payment from each player at the end of each day that is determined as a function of the bids that they submit until that day. We denote by $p_{i,l}\colon\Theta^{n}\times(\Delta^{n})^{l}\to\mathbb{R}$ the payment rule so that $p_{i,l}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}^{l})$ specifies the amount that player $i$ should pay on day $l$. The utility accrued by player $i$ is defined as

\displaystyle u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\coloneqq\bigg{[}\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}v_{i}(\delta_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}(l)))-p_{i,l}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}^{l})\bigg{]}.

(2)

Note that a player’s utility is a random variable that depends on the realization of the type sequence $\boldsymbol{\delta}^{\infty}.$

II-G Non-Bankrupting strategies

As mentioned in Section I, a “mild” behavioral assumption, one that is quite likely to hold in practice, is that no player behaves in a manner that might result in its own bankruptcy. This is captured by the notion of a non-bankrupting strategy.

Definition 2.

A strategy $S_{i}$ of player $i,$ $i\in\{1,\ldots,n\},$ is non-bankrupting if for all $(\boldsymbol{S}_{-i},\boldsymbol{\theta}),$

u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})>-\infty

for all $\boldsymbol{\delta}^{\infty},$ except perhaps on a set of probability zero.

A strategy profile $\boldsymbol{S}=(S_{1},\ldots,S_{n})$ is non-bankrupting if $S_{i}$ is non-bankrupting for every $i\in\{1,\ldots,n\}.$

We denote by $\mathcal{NB}_{i}$ the set of non-bankrupting strategies of player $i,$ by $\mathcal{NB}_{-i}$ the set of non-bankrupting strategy profiles of all players except player $i,$ and by $\mathcal{NB}$ the set of non-bankrupting strategy profiles of all players.

II-H Dominant Strategy Non-Bankrupting Equilibrium

We are now ready to introduce a notion of equilibrium that is “slightly” weaker than dominant strategy equilibrium.

Definition 3.

A strategy profile $\boldsymbol{S}=(S_{1},\ldots,S_{n})\in\mathcal{NB}$ is a Dominant Strategy Non-Bankrupting Equilibrium (DNBE) if for all $i\in\{1,\ldots,n\},$ all $S^{\prime}_{-i}\in\mathcal{NB}_{-i},$ all $S_{i}^{\prime}\in\Lambda_{i},$ and all $\boldsymbol{\theta},$

\displaystyle u_{i}(S_{i},S^{\prime}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\geq u_{i}(S^{\prime}_{i},S^{\prime}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})

(3)

for all $\boldsymbol{\delta}^{\infty},$ except perhaps on a set of probability zero.

It is perhaps instructive to contrast DNBE with Dominant Strategy Equilibrium (DSE) and Nash Equilibrium (NE) to gain a better appreciation of the notion. Note that for a strategy profile $\boldsymbol{S}$ to form a Nash equilibrium, it must hold for every $i\in\{1,\ldots,n\}$ that $S_{i}$ is a best response to $\boldsymbol{S}_{-i}.$ On the other hand, for the strategy profile $\boldsymbol{S}$ to form a DNBE, we must have for all $i\in\{1,\ldots,n\}$ that $S_{i}$ is a best response not only to $\boldsymbol{S}_{-i}$ , but also to all $\boldsymbol{S}^{\prime}_{-i}\in\mathcal{NB}_{-i}.$ It follows that any dominant strategy non-bankrupting equilibrium is also a Nash equilibrium but not vice-versa. The stronger notion of dominant strategy equilibrium requires for all $i\in\{1,\ldots,n\}$ that $S_{i}$ is a best response to every $\boldsymbol{S}^{\prime}_{-i}\in\Lambda_{-i},$ and not just to those in $\mathcal{NB}_{-i}$ as required by DNBE. Hence, any dominant strategy equilibrium is also a dominant strategy non-bankrupting equilibrium. Fig. 1 illustrates the hierarchy formed by these equilibrium notions.

Figure 1: Hierarchy of equilibrium notions. Any dominant strategy equilibrium is also a dominant strategy non-bankrupting equilibrium, and any dominant strategy non-bankrupting equilibrium is also a Nash equilibrium.

II-I Mechanism Design Problem

Arbitrarily fix the strategy profile $\boldsymbol{S}$ of the players. The long-term average social welfare that results from the game is

\displaystyle q(\boldsymbol{S},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\coloneqq\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\bigg[\sum_{i=1}^{n}v_{i}(\delta_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}(l)))-c(g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}(l)))\bigg].

(4)

The objective of the social planner is to ensure that the average social welfare $q(\boldsymbol{S},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})$ equals the optimal value $W^{*}(\boldsymbol{\theta})\coloneqq W(\boldsymbol{\theta},g_{1}^{*},g_{2}^{*})$, which is the value that results almost surely if all players employ a truthful strategy. However, the objective of each player $i$ is to maximize its own utility given by (2), and so it may deviate from a truthful strategy if doing so could yield a higher utility. This brings us to the mechanism design problem. We wish to design a payment rule $\{p_{i,l}\colon(i,l)\in\{1,\ldots,n\}\times\mathbb{Z}_{+}\}$ such that each player employing a truthful strategy is a Dominant Strategy Non-Bankrupting Equilibrium. The next section develops the mechanism and establishes the incentive and efficiency properties guaranteed by it.

III An Efficient and Incentive-Compatible Mechanism for Two-Stage Repeated Stochastic Games

For each $i\in\{1,\ldots,n\},$ the payment of player $i$ on any day $l$ consists of two components $p_{i}^{F}$ and $p_{i}^{S}$ that can be computed by the social planner at the end of the first and the second stages of the game respectively on day $l.$ These payment functions are defined next.

III-A First-stage payment

The first-stage payment $p_{i}^{F}$ is a function of only the first-stage bids of the players. Since these quantities remain the same on all days, so do the first-stage payments. The first-stage payment is simply the VCG payment and is defined as

\displaystyle p_{i}^{F}(\boldsymbol{\widehat{\theta}})\coloneqq W^{*}(\boldsymbol{\widehat{\theta}}_{-i})-\mathbb{E}_{\boldsymbol{\delta}\sim\mathbb{P}_{\boldsymbol{\widehat{\theta}}}}\bigg{[}\sum_{j\neq i}v_{j}(\delta_{j},g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\boldsymbol{\delta}))-c(g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\boldsymbol{\delta}))\bigg{]},

(5)

where $\widehat{\boldsymbol{\theta}}_{-i}$ denotes the supertype bids of all players other than player $i$ .
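As a rough illustration of (5), the sketch below evaluates the first-stage payment for a finite toy game. It takes the welfare-maximizing outcomes at the submitted bids as given, and it takes the value of $W^{*}(\boldsymbol{\widehat{\theta}}_{-i})$ as an input, under the assumption (made for this example only) that it denotes the optimal expected welfare of the game with player $i$ absent. The function name and the one-player capacity example are hypothetical.

```python
from itertools import product


def first_stage_payment(i, theta_bids, types, o1_star, g2_star, v, c,
                        W_without_i):
    """Evaluate (5) for player i in a finite toy game.

    o1_star, g2_star : outcomes of the welfare-maximizing rules at the bids
                       (g2_star maps type-profile tuples to recourse actions)
    W_without_i      : assumed value of W*(theta_hat_{-i}), supplied by caller
    """
    n = len(theta_bids)
    others_welfare = 0.0
    for d in product(types, repeat=n):
        # Probability of the type profile d under the product of the bids.
        p = 1.0
        for j in range(n):
            p *= theta_bids[j].get(d[j], 0.0)
        o2 = g2_star[d]
        others = sum(v[j](d[j], o1_star, o2) for j in range(n) if j != i)
        others_welfare += p * (others - c(o1_star, o2))
    return W_without_i - others_welfare


# Hypothetical one-player example: demand delta in {0, 1}, day-ahead capacity
# o1 at unit cost 1, real-time capacity o2 at unit cost 3, value 4 per unit
# of served demand. With no players the planner buys nothing, so the
# assumed W*(theta_hat_{-i}) is 0.
v = [lambda delta, o1, o2: 4 * min(delta, o1 + o2)]
c = lambda o1, o2: 1 * o1 + 3 * o2
theta_bids = [{0: 0.5, 1: 0.5}]
o1_star, g2_star = 1, {(0,): 0, (1,): 0}  # welfare-maximizing at these bids

payment = first_stage_payment(0, theta_bids, [0, 1], o1_star, g2_star, v, c,
                              W_without_i=0.0)
print(payment)
```

In this instance the player pays exactly the procurement cost that its presence imposes on the planner. Its expected valuation is 2, so its expected surplus after the first-stage payment is 1, which is the difference $W^{*}(\boldsymbol{\theta})-W^{*}(\boldsymbol{\theta}_{-i})$ for this toy game.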

III-B Second-stage payment

At a high level, the first functionality of the second-stage payment is to bind the first-stage and the second-stage strategies of the players. To achieve this, the second-stage payment rule compares the empirical frequencies of the players’ type bids with their supertype bids and penalizes discrepancies between them. To elaborate, denote by $\widehat{\theta}_{i}(t)$ the probability that a random variable distributed according to $\widehat{\theta}_{i}$ takes the value $t,$ $t\in\Delta.$ On each day $l$ and for each player $i,$ $(l,i)\in\mathbb{Z}_{+}\times\{1,\ldots,n\},$ the second-stage payment rule computes the discrepancy

\displaystyle\widehat{f}_{i,t}(l)\coloneqq\bigg{[}\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\delta}_{i}(l^{\prime})=t\}}\bigg{]}-\widehat{\theta}_{i}(t)

(6)

for every $t\in\Delta$ , and imposes a penalty of $J_{p}(l)$ on player $i$ if $\widehat{f}_{i,t}(l)$ falls outside a window of size $r(l)$ for some $t\in\Delta,$ i.e., if

\displaystyle|\widehat{f}_{i,t}(l)|\geq r(l)

(7)

for some $t\in\Delta.$
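The marginal test in (6) and (7) amounts to comparing a histogram of a player's type bids with its supertype bid. A minimal sketch follows, with the function names, the string-valued types, and the window value `r=0.1` chosen purely for illustration.

```python
from collections import Counter


def marginal_discrepancies(type_bids, theta_bid, types):
    """Return {t: f_hat_{i,t}(l)} as in (6), with l = len(type_bids)."""
    l = len(type_bids)
    counts = Counter(type_bids)
    return {t: counts[t] / l - theta_bid.get(t, 0.0) for t in types}


def triggers_penalty(type_bids, theta_bid, types, r):
    """True if |f_hat_{i,t}(l)| >= r for some t, i.e. condition (7) holds."""
    f = marginal_discrepancies(type_bids, theta_bid, types)
    return any(abs(x) >= r for x in f.values())


# Hypothetical supertype bid and two four-day bid histories.
theta_bid = {"low": 0.5, "high": 0.5}
types = ["low", "high"]
consistent = ["low", "high", "high", "low"]
skewed = ["high", "high", "high", "low"]

print(marginal_discrepancies(skewed, theta_bid, types))
print(triggers_penalty(consistent, theta_bid, types, r=0.1))
print(triggers_penalty(skewed, theta_bid, types, r=0.1))
```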

In a setting of repeated playing, the sequence of second-stage outcomes serves as a source of common randomness which the players can potentially use to correlate their second-stage bids if there is a possibility for them to accrue a higher utility by doing so than by fabricating their bids independently of the other players’ bids. The second functionality of the second-stage payment is to disincentivize such strategies. Towards this end, on each day $l$ and for each player $i$ , $(l,i)\in\mathbb{Z}_{+}\times\{1,\ldots,n\},$ the second-stage payment rule computes

\displaystyle\widehat{h}_{i,\mathbf{d}}(l)\coloneqq\bigg{[}\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\delta}_{i}(l^{\prime})=d_{i},\widehat{\boldsymbol{\delta}}_{-i}(l^{\prime})=\mathbf{d}_{-i}\}}\bigg{]}-\bigg{[}\widehat{\theta}_{i}(d_{i})\bigg{]}\bigg{[}\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l^{\prime})=\mathbf{d}_{-i}\}}\bigg{]}

(8)

for every $\mathbf{d}\in\Delta^{n},$ and imposes a penalty of $J_{p}(l)$ on player $i$ if it falls outside a window of size $r(l)$ for some $\mathbf{d}\in\Delta^{n},$ i.e., if

\displaystyle|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)

(9)

for some $\mathbf{d}\in\Delta^{n}.$
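To see what (8) detects that (6) does not, consider a player whose bids have the right marginal frequencies but simply copy another player's bids. The sketch below computes $\widehat{h}_{i,\mathbf{d}}(l)$ for such a history; the function name and the two-player histories are illustrative assumptions.

```python
def joint_discrepancy(i, bid_history, theta_bid_i, d):
    """Return h_hat_{i,d}(l) as in (8), with l = len(bid_history).

    bid_history : list of tuples, one per day, holding all n players' type bids
    theta_bid_i : dict {type: probability}, player i's supertype bid
    d           : tuple of length n, the type profile being tested
    """
    l = len(bid_history)
    d_others = d[:i] + d[i + 1:]
    joint = sum(1 for b in bid_history if b == d)
    others = sum(1 for b in bid_history if b[:i] + b[i + 1:] == d_others)
    return joint / l - theta_bid_i.get(d[i], 0.0) * others / l


theta_bid_0 = {0: 0.5, 1: 0.5}

# Player 0 copies player 1: marginal frequencies match the bid (0.5 each),
# yet the bids are perfectly correlated with the other player's bids.
copied = [(0, 0), (1, 1), (0, 0), (1, 1)]
# A history whose joint frequencies factor as a product.
balanced = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(joint_discrepancy(0, copied, theta_bid_0, (0, 0)))
print(joint_discrepancy(0, balanced, theta_bid_0, (0, 0)))
```

The copied history would pass the marginal test (7) for any window size, but its joint discrepancy of 0.25 triggers (9) as soon as $r(l)$ falls to 0.25 or below.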

How should the window size sequence $\{r\}$ be chosen? On the one hand, the window size $r(l)$ must tend to zero as $l$ tends to infinity, for otherwise the set of sequences $\{\boldsymbol{\widehat{\delta}}\}$ that avoid the penalty conditions (7) and (9) would be “large,” thereby violating incentive compatibility. On the other hand, if $\{r\}$ decays too quickly, then even truthful type bids would trigger (7) and (9) infinitely often, thereby incurring a large penalty and violating individual rationality. Hence, the sequence $\{r\}$ should be chosen in a manner that balances the two objectives. This is achieved by choosing $\{r\}$ such that

\displaystyle\lim_{l\to\infty}r(l)=0,

(10)

and for some $\gamma>0,$

\displaystyle r(l)\geq\sqrt{\frac{\ln\left(2l^{1+\gamma}\right)}{2l}}

(11)

for all $l\in\mathbb{Z}_{+}.$ (Footnote 1: It suffices that (11) holds for all sufficiently large $l$ rather than for all $l$.)

To obtain an intuition for condition (11), note that the empirical frequency $\frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\delta_{i}(l^{\prime})=t\}}$ resulting from the true type sequence of player $i$ is a random variable with mean $\theta_{i}(t)$ and a standard deviation that scales as ${{1}/\sqrt{l}}.$ Therefore, if the window size decayed at the same rate, the probability of the empirical frequency falling outside the window would not vanish as $l$ grows. This suggests that the window size must decay more slowly than ${{1}/\sqrt{l}}.$ By letting the window size decay only slightly more slowly than ${{1}/\sqrt{l}}$, namely at the rate specified by condition (11), truthful bids are guaranteed to almost surely trigger (7) and (9) for at most finitely many values of $l$. This is established in Lemma 1.
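The role of condition (11) can be checked numerically: when $r(l)$ meets (11) with equality, the Hoeffding-type bound $2e^{-2lr^{2}(l)}$ used in the proof of Lemma 1 equals $1/l^{1+\gamma}$, which is summable, while $r(l)$ still tends to zero. The sketch below assumes equality in (11) and the illustrative choice $\gamma=0.5$; the function names are hypothetical.

```python
import math


def window(l, gamma):
    """Smallest window size allowed by (11) on day l."""
    return math.sqrt(math.log(2 * l ** (1 + gamma)) / (2 * l))


def hoeffding_bound(l, r):
    """The bound 2 * exp(-2 * l * r^2) on the probability of leaving the window."""
    return 2 * math.exp(-2 * l * r * r)


gamma = 0.5
for l in (1, 10, 1000, 10 ** 6):
    r = window(l, gamma)
    # The last two columns agree: the bound collapses to 1 / l^(1 + gamma).
    print(l, r, hoeffding_bound(l, r), 1 / l ** (1 + gamma))
```

The product $r(l)\sqrt{l}$ grows like $\sqrt{\ln l}$, which is the sense in which the window decays only slightly more slowly than $1/\sqrt{l}$.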

How should the penalty sequence $\{J_{p}\}$ be chosen? As shown in Lemma 1, truthful players incur a penalty only finitely often almost surely, and so the long-term average penalty that they incur is almost surely zero regardless of how the sequence $\{J_{p}\}$ is chosen. Therefore, the only objective in the design of $\{J_{p}\}$ is for every non-truthful strategy to incur a sufficiently high penalty. This is accomplished by choosing $\{J_{p}\}$ to be any nonnegative sequence such that

\displaystyle\lim_{l\to\infty}\frac{J_{p}(l)}{l}=\infty.

(12)
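The effect of condition (12) can be illustrated with a single player and the marginal test (7) only. The sketch assumes the hypothetical penalty sequence $J_{p}(l)=l^{2}$, which satisfies (12), a window that meets (11) with equality for $\gamma=0.5$, and two deterministic bid histories; the alternating history is a stand-in for bids whose frequencies match the supertype bid, not a sample of IID truthful play.

```python
import math


def window(l, gamma=0.5):
    """Window size meeting (11) with equality."""
    return math.sqrt(math.log(2 * l ** (1 + gamma)) / (2 * l))


def average_penalty(type_bids, theta_bid, types, J_p=lambda l: l ** 2,
                    gamma=0.5):
    """(1/L) * sum over l of J_p(l) * 1{(7) is triggered on day l}.

    Simplification: only the marginal test (7) is checked, not (9).
    """
    counts = {t: 0 for t in types}
    total = 0.0
    for l, bid in enumerate(type_bids, start=1):
        counts[bid] += 1
        worst = max(abs(counts[t] / l - theta_bid.get(t, 0.0)) for t in types)
        if worst >= window(l, gamma):
            total += J_p(l)
    return total / len(type_bids)


theta_bid = {0: 0.5, 1: 0.5}
types = [0, 1]

matching = [l % 2 for l in range(1000)]   # frequencies track the bid
always_one = [1] * 1000                   # persistent misreport

print(average_penalty(matching, theta_bid, types))
print(average_penalty(always_one[:100], theta_bid, types))
print(average_penalty(always_one, theta_bid, types))
```

Under these assumptions the matching history is never penalized, while the running average penalty of the persistent misreport keeps growing with the horizon, which is the behavior that (12) is designed to produce.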

We now have the necessary quantities to define the second-stage payment function. Define the event

\displaystyle{E}_{i,\boldsymbol{S}}(l)\coloneqq\Big\{\max_{t\in\Delta}\;|\widehat{f}_{i,t}(l)|\geq r(l)\Big\}\;\cup\;\Big\{\max_{\mathbf{d}\in\Delta^{n}}|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\Big\}

(13)

which denotes the occurrence of at least one of (7) and (9). The second-stage payment of player $i$ on day $l$ is defined as

\displaystyle p_{i,l}^{S}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}^{l})\coloneqq\Bigg{[}v_{i}(\widehat{\delta}_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))-\mathbb{E}_{\boldsymbol{\delta}\sim\mathbb{P}_{\boldsymbol{\widehat{\theta}}}}\big{[}v_{i}(\delta_{i},g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\boldsymbol{\delta}))\big{]}\Bigg{]}+J_{p}(l)\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}.

(14)

A negative value of the above quantity implies a transfer from the social planner to player $i$ on day $l.$ Note that if all players employ a truthful strategy, then the long-term average second-stage payment almost surely equals zero for every player.

The total payment $p_{i,l}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}^{l})$ that player $i$ transfers to the social planner on day $l$ is

\displaystyle p_{i,l}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}^{l})\coloneqq p_{i}^{F}(\boldsymbol{\widehat{\theta}})+p_{i,l}^{S}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}^{l}).

(15)

The following theorem establishes the incentive and optimality guarantees of the mechanism.

Theorem 1.

Consider the two-stage repeated stochastic game induced by the payment rule (15).

1. A truthful strategy profile is a dominant strategy non-bankrupting equilibrium.

2. If for every $i\in\{1,\ldots,n\}$ and every $\boldsymbol{\theta},$

$\displaystyle W^{*}(\boldsymbol{\theta})-W^{*}(\boldsymbol{\theta}_{-i})\geq 0,$ (16)

then every player obtains a nonnegative utility by employing a truthful strategy regardless of the strategies that the other players employ.

3. If every player employs a truthful strategy, then the long-term average social welfare (4) that results is almost surely equal to its optimal value $W^{*}(\boldsymbol{\theta}).$

Proof.

Arbitrarily fix $\boldsymbol{\theta},$ $i\in\{1,\ldots,n\},$ the strategy $S_{i}\in\Lambda_{i}$ that player $i$ employs, and the strategy profile $\boldsymbol{S}_{-i}\in\mathcal{NB}_{-i}$ that all other players employ. We begin with a lemma.

Lemma 1.

For $T_{i}\in\mathcal{T}_{i},$ it holds almost surely that

\displaystyle\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\mathds{1}_{\{E_{i,(T_{i},\boldsymbol{S}_{-i})}(l)\}}=0.

(17)

I.e., if player $i$ employs a truthful strategy, then the long-term average penalty that it pays is almost surely zero.

Proof.

It suffices to show that $\{E_{i,(T_{i},\boldsymbol{S}_{-i})}(l)\}$ almost surely occurs only finitely often. Arbitrarily fix $\mathbf{d}\in\Delta^{n}$ . Define $\mathcal{F}_{l}\coloneqq\sigma(\widehat{\boldsymbol{\delta}}_{-i}^{l},\delta_{i}^{l-1})$ so that $\big{(}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l^{\prime})=\mathbf{d}_{-i}\}}\big{[}\mathds{1}_{\{\delta_{i}(l^{\prime})=d_{i}\}}-\theta_{i}(d_{i})\big{]},\mathcal{F}_{l^{\prime}+1}\big{)}$ is a martingale difference sequence bounded by unity. It follows from the Azuma-Hoeffding inequality that

\displaystyle\mathbb{P}\big{(}\big{|}\ \frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l^{\prime})=\mathbf{d}_{-i}\}}\big{[}\mathds{1}_{\{\delta_{i}(l^{\prime})=d_{i}\}}-\theta_{i}(d_{i})\big{]}\big{|}\geq r(l)\big{)}\leq 2e^{-2lr^{2}(l)}.

(18)

Combining the above inequality with (11) implies

\displaystyle\mathbb{P}\big{(}\big{|}\ \frac{1}{l}\sum_{l^{\prime}=1}^{l}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-i}(l^{\prime})=\mathbf{d}_{-i}\}}\big{[}\mathds{1}_{\{\delta_{i}(l^{\prime})=d_{i}\}}-\theta_{i}(d_{i})\big{]}\big{|}\geq r(l)\big{)}\leq\frac{1}{l^{1+\gamma}}.

(19)

Using (8) and the fact that player $i$ employs a truthful strategy, the above inequality implies

\displaystyle\mathbb{P}\big{(}\big{|}\widehat{h}_{i,\mathbf{d}}(l)\big{|}\geq r(l)\big{)}\leq\frac{1}{l^{1+\gamma}}

(20)

which in turn implies that $\sum_{l=1}^{\infty}\mathbb{P}\big{(}\big{|}\widehat{h}_{i,\mathbf{d}}(l)\big{|}\geq r(l)\big{)}<\infty.$ Invoking the Borel-Cantelli lemma, we have that $\{|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\}$ almost surely occurs only finitely often.

Similarly, $(\mathds{1}_{\{{\delta}_{i}(l^{\prime})=d_{i}\}}-\theta_{i}(d_{i}),\mathcal{F}_{l^{\prime}+1})$ is a martingale difference sequence bounded by unity and following the same sequence of arguments as above, it can be established that $\{|\widehat{f}_{i,d_{i}}(l)|\geq r(l)\}$ almost surely occurs only finitely often.

Since $\mathbf{d}$ is arbitrarily chosen, we have that for every $\mathbf{d}\in\Delta^{n},$ $\{|\widehat{h}_{i,\mathbf{d}}(l)|\geq r(l)\}$ and $\{|\widehat{f}_{i,d_{i}}(l)|\geq r(l)\}$ almost surely occur only finitely often, and the desired result follows. ∎
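The concentration and Borel-Cantelli steps in the proof of Lemma 1 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the threshold $r(l)=\sqrt{((1+\gamma)\ln l+\ln 2)/(2l)}$ is a hypothetical choice that makes the right-hand side of (18) equal to $l^{-(1+\gamma)}$ (the actual threshold is fixed by (11) and may differ), and the values $\gamma=0.5$ and $\theta_{i}(d_{i})=0.3$ are arbitrary. It sums the tail bounds and counts how often the empirical frequency of a truthful IID report sequence leaves the threshold band.

```python
import math
import random

GAMMA = 0.5  # hypothetical exponent; the paper's gamma enters through (11)

def threshold(l, gamma=GAMMA):
    """Hypothetical r(l) for which 2*exp(-2*l*r(l)^2) equals l^-(1+gamma)."""
    return math.sqrt(((1.0 + gamma) * math.log(l) + math.log(2.0)) / (2.0 * l))

def tail_bound(l, gamma=GAMMA):
    """Right-hand side of the Azuma-Hoeffding inequality (18) at this threshold."""
    return 2.0 * math.exp(-2.0 * l * threshold(l, gamma) ** 2)

def count_exceedances(prob, horizon, seed=0):
    """Count days l with |empirical frequency - prob| >= r(l) for IID truthful reports."""
    rng = random.Random(seed)
    hits = 0
    exceed = 0
    for l in range(1, horizon + 1):
        hits += rng.random() < prob  # indicator that the type equals d_i on day l
        if abs(hits / l - prob) >= threshold(l):
            exceed += 1
    return exceed

# The tail bounds are summable, which is what the Borel-Cantelli lemma needs.
partial_sum = sum(tail_bound(l) for l in range(1, 100001))
exceedances = count_exceedances(prob=0.3, horizon=10000)
print(partial_sum, exceedances)
```

The partial sums stay bounded because $\sum_{l}l^{-(1+\gamma)}$ converges for any $\gamma>0$, and in the simulation the threshold is crossed on only a handful of early days, if at all.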

We have

\displaystyle u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})

\displaystyle=\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\bigg{[}v_{i}({\delta}_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))-p_{i,l}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}^{l})\bigg{]},

where $\boldsymbol{\widehat{\theta}}$ and $\boldsymbol{\widehat{\delta}}^{\infty}$ are determined in accordance with $\boldsymbol{S}.$ Substituting (5) and (14) into (15), substituting the resulting expression for $p_{i,l}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}^{l})$ into the above equality, and simplifying the result yields

$\displaystyle u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})$	$\displaystyle=\big{[}W^{*}(\boldsymbol{\widehat{\theta}})-W^{*}(\boldsymbol{\widehat{\theta}}_{-i})\big{]}$
	$\displaystyle+\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\bigg{(}v_{i}({\delta}_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))-v_{i}(\widehat{\delta}_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))\bigg{)}$
	$\displaystyle-\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}.$	(21)

Arbitrarily fix $T_{i}\in\mathcal{T}_{i}$ . Then, we obtain using Lemma 1 and some straightforward algebra that

$\displaystyle u_{i}(T_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})-u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})$	$\displaystyle=\big{[}W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})-W^{*}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})\big{]}$
	$\displaystyle+\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\bigg{(}v_{i}(\widehat{\delta}_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))-v_{i}({\delta}_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))\bigg{)}$
	$\displaystyle+\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}.$	(22)

In what follows, we show that the above quantity is almost surely nonnegative, implying that truthful strategy profiles are Dominant Strategy Non-Bankrupting Equilibria.

Define

\displaystyle\nu_{i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})\coloneqq\mathbb{E}_{(\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i})\sim\widehat{\theta}_{i}\times\boldsymbol{\widehat{\theta}}_{-i}}\bigg{[}v_{i}\big{(}\widehat{\delta}_{i},g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}})\big{)}\bigg{]}

(23)

and

\displaystyle\nu_{-i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})\coloneqq\mathbb{E}_{(\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i})\sim\widehat{\theta}_{i}\times\boldsymbol{\widehat{\theta}}_{-i}}\bigg{[}\sum_{j\neq i}v_{j}\big{(}\widehat{\delta}_{j},g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}})\big{)}-c\big{(}g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}})\big{)}\bigg{]}

(24)

so that

\displaystyle W^{*}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})=\nu_{i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})+\nu_{-i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i}).

(25)

Let $\mu(\Delta,\Delta^{n})$ be the set of joint probability mass functions over $\Delta\times\Delta^{n}.$ For $\psi\in\mu(\Delta,\Delta^{n}),$ define

\displaystyle\rho_{i}(\psi)\coloneqq\mathbb{E}_{({\delta}_{i},[\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i}])\sim\psi}\bigg{[}v_{i}({\delta}_{i},g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}))\bigg{]}.

(26)

Let $\Psi(\theta_{i},\boldsymbol{\widehat{\theta}})\subset\mu(\Delta,\Delta^{n})$ be the set of joint probability mass functions with “$x$-marginal” distributed according to $\theta_{i}$ and “$y$-marginal” distributed according to ${\widehat{\theta}_{1}\times\ldots\times\widehat{\theta}_{n}}.$ Then, for every $\psi\in\Psi(\theta_{i},\boldsymbol{\widehat{\theta}}),$

\displaystyle W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})\geq\rho_{i}(\psi)+\nu_{-i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i}).

(27)

To see this, note that if $(\delta_{i},\boldsymbol{\delta}_{-i})\sim\theta_{i}\times\boldsymbol{\widehat{\theta}}_{-i},$ then the social planner can map $\delta_{i}$ to a random variable $\delta_{i}^{\prime}$ using an appropriate probability transition kernel $P_{\delta_{i}^{\prime}|\boldsymbol{\delta}}$ such that $(\delta_{i},[\delta_{i}^{\prime},\boldsymbol{\delta}_{-i}])\sim\psi\in\Psi(\theta_{i},\boldsymbol{\widehat{\theta}}).$ Consequently, by choosing the first-stage outcome as $g_{1}^{*}(\boldsymbol{\widehat{\theta}})$ and the second-stage outcome as $g_{2}^{*}(\boldsymbol{\widehat{\theta}},[{\delta}_{i}^{\prime},\boldsymbol{\delta}_{-i}]),$ an expected social welfare of $\rho_{i}(\psi)+\nu_{-i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})$ can be attained. It follows that the optimal expected social welfare $W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})$ is at least as large, which yields (27). (Footnote 2: This argument requires the second-stage decision rule to be randomized whereas we have assumed $g_{1}^{*}$ and $g_{2}^{*}$ to be deterministic functions. This apparent gap can be addressed by noting that an optimal decision rule $(g_{1}^{*},g_{2}^{*})$ can be found within the class of deterministic functions.)

Suppose for a moment that each player $j\in\{1,\ldots,n\}$ employs a stationary second-stage bidding policy $\mu_{S}^{j}$ so that $\widehat{\delta}_{j}(l)$ is chosen as a function of $\delta_{j}(l)$ according to some probability kernel $P^{j}_{\widehat{\delta}_{j}|\delta_{j}}$ for every $l.$ For player $j$’s strategy to be non-bankrupting, it is necessary that $\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{j}(l)=t\}}=\widehat{\theta}_{j}(t)$ almost surely for every $t\in\Delta,$ since otherwise (7) would be violated infinitely often, resulting in an infinite average penalty. So, for every $j\in\{1,\ldots,n\},$ if player $j$’s strategy is to be non-bankrupting, then $P^{j}_{\widehat{\delta}_{j}|\delta_{j}}$ must be such that $\widehat{\delta}_{j}(1)\sim\widehat{\theta}_{j}$ given $\delta_{j}(1)\sim\theta_{j}.$ It follows that for every $j\in\{1,\ldots,n\},$ $(\delta_{j}(1),\boldsymbol{\widehat{\delta}}(1))\sim\psi_{j}$ for some $\psi_{j}\in\Psi(\theta_{j},\boldsymbol{\widehat{\theta}}).$ It also follows that $\{(\boldsymbol{\delta}(1),\boldsymbol{\widehat{\delta}}(1)),(\boldsymbol{\delta}(2),\boldsymbol{\widehat{\delta}}(2)),\ldots\}$ is a sequence of IID random variables, and so we obtain using the Strong Law of Large Numbers (SLLN) that the RHS of (22) almost surely equals $[W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})-W^{*}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})]+[\nu_{i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})-\rho_{i}(\psi_{i})].$ Upon substituting (25), this becomes $W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})-\nu_{-i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})-\rho_{i}(\psi_{i}),$ and combining it with (27) implies the nonnegativity of (22).
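To make the stationary case concrete, the following Python sketch builds the joint distribution induced by a stationary kernel and checks its marginals. It is a simplified illustration, not part of the proof: it tracks only the pair $(\delta_{j},\widehat{\delta}_{j})$ of a single player rather than the full reported profile, the type space is taken to be $\{0,1\}$, and the numbers in `theta`, `theta_hat`, and `kernel` are hypothetical.

```python
def joint_from_kernel(theta, kernel):
    """psi(d, d_hat) = theta(d) * P(d_hat | d) for a stationary bidding kernel."""
    return [[theta[d] * kernel[d][d_hat] for d_hat in range(len(kernel[d]))]
            for d in range(len(theta))]

def marginals(psi):
    """Return (x-marginal over true types, y-marginal over reported types)."""
    x_marginal = [sum(row) for row in psi]
    y_marginal = [sum(psi[d][d_hat] for d in range(len(psi)))
                  for d_hat in range(len(psi[0]))]
    return x_marginal, y_marginal

theta = [0.5, 0.5]        # true type distribution (illustrative)
theta_hat = [0.25, 0.75]  # first-stage bid, here a misreport (illustrative)
kernel = [[0.5, 0.5],     # P(d_hat | d = 0)
          [0.0, 1.0]]     # P(d_hat | d = 1)

psi = joint_from_kernel(theta, kernel)
x_marginal, y_marginal = marginals(psi)
print(psi, x_marginal, y_marginal)
```

Here the kernel is chosen so that the reported types are distributed according to the bid `theta_hat` while the true types remain distributed according to `theta`, which is the marginal structure that membership in $\Psi$ requires.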

However, in fabricating their type bids, the players need not restrict themselves to stationary policies and can employ any history-dependent policy. The rest of the proof is devoted to showing that the same result, namely the nonnegativity of (22), holds even in the general case where the players may employ any non-bankrupting strategy. The key to establishing this is the following lemma, which characterizes the empirical joint distribution of the reported types when all players employ non-bankrupting strategies.

Lemma 2.

Suppose that for every $j\in\{1,\ldots,n\},$

\displaystyle\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\mathds{1}_{\{E_{j,\boldsymbol{S}}(l)\}}<\infty.

(28)

Then, for every $\mathbf{d}\in\Delta^{n},$

\displaystyle\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\boldsymbol{\widehat{\delta}}(l)=\mathbf{d}\}}=\prod_{j=1}^{n}\widehat{\theta}_{j}(d_{j}).

(29)

Proof.

It suffices to show that for all $\mathbf{d}\in\Delta^{n}$ and all $k\in\{1,\ldots,n-1\},$

\displaystyle\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}=\widehat{\theta}_{k}(d_{k})\bigg{[}\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg{]}

(30)

and that

\displaystyle\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{{\delta}}_{n}(l)={d}_{n}\}}=\widehat{\theta}_{n}(d_{n}),

(31)

where $\mathbf{d}_{k:n}\coloneqq[d_{k}\;d_{k+1}\;\ldots\;d_{n}]$ and $\widehat{\boldsymbol{\delta}}_{k:n}(l)$ is defined likewise.

Combining (28) with (12) implies that $\limsup_{L\to\infty}\sum_{l=1}^{L}\mathds{1}_{\{E_{j,\boldsymbol{S}}(l)\}}<\infty$ for every $j\in\{1,\ldots,n\}$ . I.e., the event sequence $\{E_{j,\boldsymbol{S}}(l)\}$ occurs only finitely often. Hence, we obtain using (13) and (10) that for all $\mathbf{d}\in\Delta^{n}$ and all $j\in\{1,\ldots,n\},$

\displaystyle\lim_{L\to\infty}\widehat{f}_{j,d_{j}}(L)=0

(32)

and

\displaystyle\lim_{L\to\infty}\widehat{h}_{j,\mathbf{d}}(L)=0.

(33)

Substituting (6) in (32) implies

\displaystyle\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{j}(l)=d_{j}\}}=\widehat{\theta}_{j}(d_{j})

(34)

for all ${d}_{j}\in\Delta$ and all $j\in\{1,\ldots,n\},$ which in particular establishes (31).

Substituting (8) in (33) implies

\displaystyle\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{j}(l)=d_{j},\widehat{\boldsymbol{\delta}}_{-j}(l)=\mathbf{d}_{-j}\}}=\widehat{\theta}_{j}(d_{j})\bigg{[}\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{-j}(l)=\mathbf{d}_{-j}\}}\bigg{]}

(35)

for all $\mathbf{d}\in\Delta^{n}$ and all $j\in\{1,\ldots,n\}$ . In concluding (35), we have assumed that the limit in the RHS exists, to justify which certain additional arguments are required. We omit these details since they might lessen the focus on the main aspects of the proof.

The equality (30) can now be established by noting that

$\displaystyle\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}$	$\displaystyle=\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\sum_{t_{1},\ldots,t_{k-1}}\mathds{1}_{\{{\widehat{\delta}}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}$
	$\displaystyle=\sum_{t_{1},\ldots,t_{k-1}}\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k:n}(l)=\mathbf{d}_{k:n}\}}$
	$\displaystyle=\sum_{t_{1},\ldots,t_{k-1}}\bigg{[}\widehat{\theta}_{k}(d_{k})\bigg{]}\bigg{[}\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\delta}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg{]}$
	$\displaystyle=\widehat{\theta}_{k}(d_{k})\bigg{[}\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\sum_{t_{1},\ldots,t_{k-1}}\mathds{1}_{\{\widehat{\delta}_{1}(l)=t_{1},\ldots,\widehat{\delta}_{k-1}(l)=t_{k-1},\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg{]}$
	$\displaystyle=\widehat{\theta}_{k}(d_{k})\bigg{[}\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{\widehat{\boldsymbol{\delta}}_{k+1:n}(l)=\mathbf{d}_{k+1:n}\}}\bigg{]},$	(36)

where the third equality follows from (35). ∎
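The role of the two limits (32) and (33) can be illustrated with a small deterministic example. The Python sketch below assumes the forms of $\widehat{f}_{j,d_{j}}$ and $\widehat{h}_{j,\mathbf{d}}$ that are suggested by (34) and (35); the definitions (6) and (8) themselves lie outside this part of the text, so the functions `f_hat` and `h_hat` are inferred stand-ins, and the two-player report sequences are hypothetical. It shows that reports with correct marginals but perfect correlation drive $\widehat{f}$ to zero while leaving $\widehat{h}$ bounded away from zero, so only the joint statistic detects the failure of the product form (29).

```python
def f_hat(reports_j, d_j, theta_hat_j):
    """Marginal statistic: empirical frequency of d_j minus its bid probability."""
    L = len(reports_j)
    return sum(r == d_j for r in reports_j) / L - theta_hat_j[d_j]

def h_hat(reports_j, reports_other, d_j, d_other, theta_hat_j):
    """Joint statistic: average of 1{other = d_other} * (1{own = d_j} - theta_hat_j(d_j))."""
    L = len(reports_j)
    total = 0.0
    for own, other in zip(reports_j, reports_other):
        if other == d_other:
            total += (own == d_j) - theta_hat_j[d_j]
    return total / L

L = 1000
theta_hat = [0.5, 0.5]                       # bid distribution of both players
player1 = [l % 2 for l in range(L)]          # reports 0, 1, 0, 1, ...
copycat = list(player1)                      # other player mirrors player 1
balanced = [(l // 2) % 2 for l in range(L)]  # reports 0, 0, 1, 1, ...

print(f_hat(player1, 0, theta_hat), f_hat(copycat, 0, theta_hat))
print(h_hat(player1, copycat, 0, 0, theta_hat))   # correlation is detected
print(h_hat(player1, balanced, 0, 0, theta_hat))  # product-form joint frequencies
```

In the mirrored case every marginal frequency matches the bid, yet the joint statistic equals $0.25$; in the balanced case all four report pairs occur equally often and both statistics vanish.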

It follows from (12) that $\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\mathds{1}_{\{E_{i,\boldsymbol{S}}(l)\}}$ can only take the values $0$ or $\infty.$ In the latter case, the nonnegativity of (22) is immediate. In the former case, since $\boldsymbol{S}_{-i}$ is a non-bankrupting strategy profile, we have that for all $j\in\{1,\ldots,n\},$

\displaystyle\limsup_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}J_{p}(l)\mathds{1}_{\{E_{j,\boldsymbol{S}}(l)\}}<\infty

(37)

almost surely. Consequently, Lemma 2 applies, and we get

\displaystyle\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}v_{i}(\widehat{\delta}_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))=\nu_{i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i}).

(38)

Now, consider the empirical joint distribution $\psi_{L}(d,\widehat{\mathbf{d}})\coloneqq\frac{1}{L}\sum_{l=1}^{L}\mathds{1}_{\{{\delta}_{i}(l)={d},\widehat{\boldsymbol{\delta}}(l)=\widehat{\mathbf{d}}\}},$ where ${d}\in\Delta$ and $\widehat{\mathbf{d}}\in\Delta^{n}.$ Note that $\psi_{L}\in\mu(\Delta,\Delta^{n})$ for all $L\in\mathbb{Z}_{+}.$ It follows from the SLLN that for any $d\in\Delta,$ $\lim_{L\to\infty}\sum_{\widehat{\mathbf{d}}}\psi_{L}(d,\widehat{\mathbf{d}})=\theta_{i}(d)$ almost surely. Since (37) holds, we obtain using Lemma 2 that for any $\widehat{\mathbf{d}}\in\Delta^{n},$ $\lim_{L\to\infty}\sum_{d}\psi_{L}(d,\widehat{\mathbf{d}})=\prod_{j=1}^{n}\widehat{\theta}_{j}(\widehat{d}_{j}).$ I.e., the sequence $\{\psi_{L}\}$ of empirical joint distributions is such that its $x$-marginal approaches the distribution $\theta_{i}$ and its $y$-marginal approaches the distribution ${\widehat{\theta}_{1}\times\ldots\times\widehat{\theta}_{n}}.$ It can be shown as a consequence that $\{\psi_{L}\}$ approaches the set $\Psi(\theta_{i},\boldsymbol{\widehat{\theta}})$ in that $\min_{\psi\in\Psi(\theta_{i},\boldsymbol{\widehat{\theta}})}\|\psi-\psi_{L}\|\to 0$ as $L\to\infty,$ where $\|\cdot\|$ can be any norm defined on the set $\mu(\Delta,\Delta^{n}).$ Also, the function $\rho_{i}:\mu(\Delta,\Delta^{n})\to\mathbb{R}$ defined in (26) is a continuous function over a compact set, and hence uniformly continuous. It follows that

\displaystyle\liminf_{L\to\infty}\rho_{i}(\psi_{L})\leq\sup_{\psi\in\Psi(\theta_{i},\boldsymbol{\widehat{\theta}})}\rho_{i}(\psi).

(39)

Note also that $\frac{1}{L}\sum_{l=1}^{L}v_{i}(\delta_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}(l)))=\mathbb{E}_{(\delta_{i},[\widehat{\delta}_{i},\widehat{\boldsymbol{\delta}}_{-i}])\sim\psi_{L}}[v_{i}(\delta_{i},g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\widehat{\boldsymbol{\delta}}))]=\rho_{i}(\psi_{L}).$ Taking the limit inferior as $L\to\infty$ and using (39) implies

\displaystyle\liminf_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}v_{i}(\delta_{i}(l),g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\boldsymbol{\widehat{\delta}}(l)))\leq\sup_{\psi\in\Psi(\theta_{i},\boldsymbol{\widehat{\theta}})}\rho_{i}(\psi).

(40)
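The identity between the time average of the valuations and $\rho_{i}(\psi_{L})$ used in the step above is a regrouping of the sum by the value of the pair $(\delta_{i}(l),\widehat{\boldsymbol{\delta}}(l))$. The Python sketch below checks it on a short hypothetical history. The function `value` is an arbitrary stand-in for $v_{i}(\cdot,g_{1}^{*}(\boldsymbol{\widehat{\theta}}),g_{2}^{*}(\boldsymbol{\widehat{\theta}},\cdot))$ and is not the paper's valuation function; the type and report sequences are likewise made up.

```python
from collections import Counter

def empirical_joint(true_types, reported_profiles):
    """psi_L(d, d_hat): fraction of days with true type d and reported profile d_hat."""
    L = len(true_types)
    counts = Counter(zip(true_types, reported_profiles))
    return {pair: c / L for pair, c in counts.items()}

def rho(psi, value):
    """Expectation of value(d, d_hat) under the joint pmf psi, mirroring (26)."""
    return sum(p * value(d, d_hat) for (d, d_hat), p in psi.items())

def value(d, d_hat):
    """Hypothetical stand-in for player i's valuation at the chosen outcomes."""
    return d * sum(d_hat) - 0.5 * sum(d_hat) ** 2

true_types = [1, 2, 1, 2, 2, 1]                              # delta_i(l)
reported = [(1, 1), (2, 1), (2, 2), (1, 1), (2, 1), (1, 1)]  # reported profiles

psi_L = empirical_joint(true_types, reported)
time_average = sum(value(d, dh) for d, dh in zip(true_types, reported)) / len(true_types)
print(rho(psi_L, value), time_average)
```

Because the two quantities agree for every finite $L$, any bound on $\rho_{i}(\psi_{L})$ transfers directly to the time average.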

Substituting (38) and (40) in (22) yields

\displaystyle u_{i}(T_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})-u_{i}(S_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})\geq[W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})-W^{*}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}}_{-i})]+\nu_{i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}_{-i}})-\sup_{\psi\in\Psi(\theta_{i},\boldsymbol{\widehat{\theta}})}\rho_{i}(\psi).

Upon substituting (25), the RHS of the above inequality becomes $W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})-\nu_{-i}(\widehat{\theta}_{i},\boldsymbol{\widehat{\theta}_{-i}})-\sup_{\psi\in\Psi(\theta_{i},\boldsymbol{\widehat{\theta}})}\rho_{i}(\psi).$ Combining this with (27) implies its nonnegativity, thereby establishing the nonnegativity of (22).

We now prove the second statement of the theorem. Arbitrarily fix $\boldsymbol{S}_{-i}\in\Lambda_{-i}$ and $T_{i}\in\mathcal{T}_{i}.$ Using (15), (2) and Lemma 1, we obtain almost surely that $u_{i}(T_{i},\boldsymbol{S}_{-i},\boldsymbol{\theta},\boldsymbol{\delta}^{\infty})=\big{[}W^{*}(\theta_{i},\boldsymbol{\widehat{\theta}}_{-i})-W^{*}(\boldsymbol{\widehat{\theta}}_{-i})\big{]}\geq 0$ , where the inequality follows from (16). Hence, truth-telling is individually rational for every player.

That the mechanism maximizes social welfare under truthful bidding is a straightforward consequence of the optimality of the first- and the second-stage decision rules. ∎

The following section describes an application of the mechanism to the design of demand response markets.

IV Application to Demand Response Markets

As mentioned in Section I, one of the motivating reasons for introducing the environment of a two-stage repeated stochastic game is its ability to readily model many problems that arise in the context of next-generation electricity markets. We illustrate one such problem in this section, namely, mechanism design for demand response markets. In addition to illustrating an application of the proposed framework, the results of this section also serve to illustrate the benefits of using the proposed mechanism as opposed to other “natural” mechanisms that a policy-maker might employ in such scenarios.

One of the main requirements of power systems operations is that the power supply has to equal the random demand at each time instant. In conventional systems, the power supply can be controlled, and so the generation is continuously adjusted to follow the random demand to maintain balance. However, at deep levels of renewable energy penetration, the generation becomes random. A popular paradigm for maintaining demand-supply balance in such a system is to make the demand follow the random supply. This typically involves curtailing consumption during times of power supply shortage. This is referred to as demand response, and is achieved by using incentives to modulate the demand.

One of the key challenges in implementing demand response is that in order to optimally allocate a desired consumption reduction among the demand response providers, their costs for curtailing consumption must be known, which are in general random and private to the loads, and which they could misreport to achieve more favorable allocations for themselves. The goal of the mechanism designer is to elicit both the probability distribution and the realization of the private costs truthfully. See [4] for more details. In what follows, we describe how the mechanism developed in the previous section can be applied to this problem.

In this section, we overload certain notation. Specifically, whenever a demand response market-specific quantity maps to a two-stage repeated stochastic game-specific quantity, the former will be denoted using the same symbol that has been used for the latter.

Consider a system consisting of $n$ Demand Response (DR) providers and a reserve generator. Each DR provider has a cost function that specifies the cost it incurs as a function of its power consumption reduction. We assume that the cost function is parameterizable and denote by $\delta_{i}(l)$ the parameter that specifies the cost function of DR provider $i$ on day $l.$ Hence, $c(x,\delta_{i}(l))$ denotes the cost that DR provider $i$ incurs on day $l$ for curtailing its consumption by $x$ units from its baseline. The sequence $\boldsymbol{\delta}^{\infty}$ is IID with $\boldsymbol{\delta}(1)\sim\boldsymbol{\theta}\coloneqq\theta_{1}\times\ldots\times\theta_{n},$ where $\theta_{i}$ denotes the probability distribution of ${\delta}_{i}(1).$ The reserve generator has associated with it a production function $c_{s}:\mathbb{R}\to\mathbb{R}$ which specifies the cost it incurs as a function of the power that it produces.

Denote by $d(l)$ the power shortage on day $l.$ The system operator wishes to minimize the social cost of compensating the shortage, and therefore wishes to determine the consumption reduction of the DR providers and the reserve generation on day $l$ as

	$\displaystyle(\mathbf{x}^{*}(\boldsymbol{\delta}(l)),g_{s}^{*}(\boldsymbol{\delta}(l)))=\operatorname*{arg\,min}_{x_{1},\ldots,x_{n},g_{s}}\quad$	$\displaystyle\sum_{i=1}^{n}c(x_{i},\delta_{i}(l))+c_{s}(g_{s})$		(41)
	$\displaystyle\mathrm{subject\;to}\quad$	$\displaystyle\sum_{i=1}^{n}x_{i}+g_{s}=d(l).$

The problem of course is that the system operator does not know $\{\delta_{1}(l),\ldots,\delta_{n}(l)\},$ and so it requests the DR providers to bid their cost functions. Denote by $\widehat{\delta}_{i}(l)$ the parameter that DR provider $i$ bids on day $l$ . The system operator computes $\mathbf{x}^{*}(\widehat{\boldsymbol{\delta}}(l))$ and pays each DR load $i$ a payment $p_{i}(l)$ on day $l$ for reducing its consumption by $x_{i}^{*}(\boldsymbol{\widehat{\delta}}(l)).$ The average utility that DR provider $i$ accrues is defined as

\displaystyle u_{i}^{\infty}\coloneqq\lim_{L\to\infty}\frac{1}{L}\sum_{l=1}^{L}\bigg{[}p_{i}(l)-c(x_{i}^{*}(\widehat{\boldsymbol{\delta}}(l)),\delta_{i}(l))\bigg{]}.

(42)

Figure 2: Social cost vs. the number of DR providers. The larger the number of participants in the demand response program, the lower the social cost of the program.

It is straightforward to see that the average utility of each DR provider is a function of not only its own bidding strategy, but also the bidding strategy of the other DR loads. Consequently, a DR provider may not bid its cost truthfully if there is a possibility for it to obtain a larger utility by misreporting its cost. This in turn could result in the demand response program operating in a manner that is not social cost-minimizing. This motivates the mechanism design problem. The mechanism presented in the previous section can be used to design a payment rule which results in truth-telling being a dominant strategy non-bankrupting equilibrium.

For our numerical study, we have taken $c(x,\delta_{i})=\frac{\delta_{i}}{2}x^{2}$ , $c_{s}(x,\delta_{s})=\frac{\delta_{s}}{2}x^{2},$ $\boldsymbol{\theta}$ to be a product of beta distributions of unit mean and variance $2$ , and $\delta_{s}(l)$ to also be beta distributed with the same parameters.
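For the quadratic costs above, the dispatch problem (41) admits a closed-form solution, which the following Python sketch computes for a single day. It is a minimal illustration rather than the code used for the numerical study: it assumes that all cost parameters are strictly positive and that curtailments and reserve generation are unconstrained in sign, as in (41), and the input numbers are arbitrary.

```python
def optimal_dispatch(deltas, delta_s, shortage):
    """Solve (41) for quadratic costs c(x, delta) = delta * x**2 / 2.

    The KKT conditions equalize marginal costs across all resources:
    delta_i * x_i = delta_s * g_s = lam for every DR provider i.
    """
    inverse_sum = sum(1.0 / d for d in deltas) + 1.0 / delta_s
    lam = shortage / inverse_sum   # common marginal cost (Lagrange multiplier)
    x = [lam / d for d in deltas]  # curtailment of each DR provider
    g_s = lam / delta_s            # reserve generation
    cost = sum(0.5 * d * xi ** 2 for d, xi in zip(deltas, x)) + 0.5 * delta_s * g_s ** 2
    return x, g_s, cost

x, g_s, cost = optimal_dispatch(deltas=[1.0, 2.0], delta_s=2.0, shortage=4.0)
print(x, g_s, cost)
```

For these inputs the common marginal cost is 2, so the two providers curtail 2 and 1 units, the reserve generator supplies 1 unit, and the social cost is 4. Providers with smaller cost parameters absorb a larger share of the shortage.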

Fig. 2 quantifies how the social cost reduces as the participation of DR providers increases. Fig. 3 illustrates how the payment resulting from the proposed mechanism behaves from the point of view of a randomly chosen DR provider. Specifically, we fix the cost function of a randomly chosen DR provider and plot how its average payment varies with the mean of the costs of the other DR providers. Qualitatively, the higher the mean cost of a DR provider, the higher the inelasticity of its demand. Hence, Fig. 3 quantifies the rate at which the payment received by a given DR load increases as a function of the inelasticity of the other DR providers.

An arguably natural alternative to the proposed mechanism is the posted price mechanism, wherein the system operator announces the payment $p_{pp}$ that the DR providers would receive per unit reduction in their power consumption. Each DR provider $i$ then chooses its curtailment $x_{i,pp}^{*}(l)$ on day $l$ as ${x}_{i,pp}^{*}(l)=\operatorname*{arg\,min}_{x}\;c(x,\delta_{i}(l))-p_{pp}x.$ The residual mismatch $g_{s}(l)\coloneqq d(l)-\sum_{i=1}^{n}x_{i,pp}^{*}(l)$ is purchased in the spot market at a cost of $c_{s}(g_{s}(l),\delta_{s}(l))=\frac{\delta_{s}(l)}{2}g^{2}_{s}(l).$ Such a mechanism has been employed, for example, in a prior demand response trial in the United Kingdom.

How do such “simple,” “natural” alternatives compare with the proposed mechanism? Fig. 4 compares the social cost attained by the proposed mechanism with the social cost attained by the posted price mechanism. Two observations are in order. First, there exists a price point at which the posted price mechanism attains its minimum social cost. However, this price point is a function of the type distributions of the DR loads, which are their private knowledge. The system operator must therefore perform price discovery in order to compute the optimal price point, a process that is vulnerable to strategic manipulation by the DR providers. Second, even assuming that the DR providers do not manipulate the price discovery, the minimum social cost that can be attained by the posted price mechanism is in general strictly larger than what can be attained by employing the proposed mechanism.
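The second observation can be reproduced on a toy instance with quadratic costs. The Python sketch below is an illustration under stated assumptions and does not reproduce Fig. 4: it uses a single DR provider, two equally likely days with made-up cost parameters, and it counts only curtailment and reserve costs as social cost, treating payments as transfers. Under a posted price $p$, a provider with cost $\frac{\delta}{2}x^{2}$ curtails $x=p/\delta$.

```python
def optimal_cost(deltas, delta_s, shortage):
    """Minimum of (41) for quadratic costs: shortage**2 / (2 * sum of inverse parameters)."""
    inverse_sum = sum(1.0 / d for d in deltas) + 1.0 / delta_s
    return 0.5 * shortage ** 2 / inverse_sum

def posted_price_cost(price, deltas, delta_s, shortage):
    """Social cost when each provider curtails price / delta_i and the reserve covers the rest."""
    x = [price / d for d in deltas]
    g_s = shortage - sum(x)
    return sum(0.5 * d * xi ** 2 for d, xi in zip(deltas, x)) + 0.5 * delta_s * g_s ** 2

# Two equally likely days, each given as (deltas, delta_s, shortage); illustrative numbers.
days = [([1.0], 1.0, 2.0), ([3.0], 1.0, 2.0)]

avg_optimal = sum(optimal_cost(*day) for day in days) / len(days)
prices = [k / 1000.0 for k in range(0, 3001)]  # grid search over a fixed posted price
avg_posted = [sum(posted_price_cost(p, *day) for day in days) / len(days) for p in prices]
best_posted = min(avg_posted)
best_price = prices[avg_posted.index(best_posted)]
print(avg_optimal, best_price, best_posted)
```

On any single day a posted price equal to that day's optimal marginal cost reproduces the optimal dispatch, but one fixed price cannot do so on both days, so the best achievable average cost under a posted price stays strictly above the average optimal cost in this instance.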

V Related Work

The setting of two-stage stochastic games was introduced in [1], which considers a one-shot setting and develops a mechanism that renders truthful bidding a sequential ex post Nash equilibrium. Reference [5] considers a two-stage game setting to model electricity markets consisting of wind power producers and develops a mechanism that incentivizes truthful bidding. However, it assumes that it is only in the first stage of the game that the wind power producers can bid strategically, and not in the second stage. In contrast, the setting that we have considered assumes that the valuation function distribution and the valuation function realization are private to the players, and that they can misreport either or both of them to accrue a higher utility.

Reference [6] presents a two-stage mechanism called the generalized Groves mechanism. In terms of the terminology and the framework presented in this paper, the setting in [6] can be interpreted as each player having a privately known distribution of its valuation function which it is required to bid to the social planner. The joint distribution of the players’ valuation functions is assumed to be common knowledge. The social planner chooses an outcome that maximizes the expected social welfare based on the bids. After the social planner chooses the outcome, the valuation functions realize, which the players are required to bid in the second stage. Following this, a final payment is made. The payment rule guarantees that truth-telling by all players is an ex post Nash equilibrium. It is important to recognize that it is only the payment rule that has two stages in the aforementioned setting, and not the game itself. This in fact is one of the key departures of the one-shot two-stage stochastic game setting from the setting considered in [6]; the latter does not include the possibility for the social planner to take recourse actions after the valuation functions realize. In the context of electricity markets, not only is it feasible to take recourse actions, it is also imperative to take recourse actions if grid stability is to be maintained. Reference [7] builds upon the mechanism proposed in [6] to devise a two-stage mechanism for bilateral trade.

A power system offering a demand response program is considered in [8, 9], and a two-stage mechanism is presented using which a certain quantity of power can be apportioned among the loads when a demand response event occurs. The first stage establishes a contingency plan that specifies the amount of power that would be supplied to each load in each contingency and the corresponding price, and the second stage, during which the contingency occurs, allows the loads to trade among themselves at the price established in the first stage. It is shown that the second-stage trade results in an allocation that Pareto dominates the first-stage allocation.

All of the aforementioned papers consider a one-shot game whereas the setting that we have considered is one of repeated plays. As mentioned in Section I, the aspect of repeated plays introduces certain additional complexities for mechanism design that can be attributed to the availability of history-dependent bidding strategies to players. A similar challenge manifests in dynamic games. References [10, 11, 12, 13, 14] are some of the papers that address the problem of mechanism design for dynamic games. The solution concept adopted in most of the literature on dynamic games is ex post Nash equilibrium or variants thereof. With the exception of certain special cases such as in [14], we are unaware of any other work that tries to go beyond Nash equilibrium or its variants and implement truth-telling in stronger notions of equilibria for broad classes of repeated or dynamic games. The present paper may be viewed as an attempt in that direction.

VI Conclusion

We have considered two-stage repeated stochastic games wherein private information is revealed over two stages and the social planner is constrained to make a decision in each stage. The setting models many important problems that arise in next-generation electricity markets. Recognizing the limitation of Nash equilibria in molding real-world behavior, we have introduced the notion of a dominant strategy non-bankrupting equilibrium (DNBE), which requires the players to make very few assumptions about the behaviors of the other players to employ their equilibrium strategy. Consequently, a mechanism that implements a certain desired behavior as a DNBE could effectively mold real-world behavior along the desired lines. We have developed a mechanism for two-stage repeated stochastic games that implements truth-telling as a DNBE. The mechanism is also individually rational and maximizes social welfare.

References

[1] S. Ieong, A. M.-C. So, and M. Sundararajan, “Stochastic mechanism design,” in International Workshop on Web and Internet Economics. Springer, 2007, pp. 269–280.
[2] D. Bergemann and J. Välimäki, “Dynamic mechanism design: An introduction,” Journal of Economic Literature, vol. 57, no. 2, pp. 235–274, 2019.
[3] B. Satchidanandan and M. A. Dahleh, “An efficient and incentive-compatible mechanism for energy storage markets,” arXiv preprint arXiv:2012.11540, 2020.
[4] B. Satchidanandan, M. Roozbehani, and M. A. Dahleh, “A two-stage mechanism for demand response markets,” IEEE Control Systems Letters, vol. 7, pp. 49–54, 2023.
[5] W. Tang and R. Jain, “Market mechanisms for buying random wind,” IEEE Transactions on Sustainable Energy, vol. 6, no. 4, pp. 1615–1623, 2015.
[6] C. Mezzetti, “Mechanism design with interdependent valuations: Efficiency,” Econometrica, vol. 72, no. 5, pp. 1617–1626, 2004.
[7] T. Kunimoto and C. Zhang, “Efficient bilateral trade with interdependent values: The use of two-stage mechanisms,” Singapore Management University, School of Economics, Economics and Statistics Working Papers 14-2020, May 2020. [Online]. Available: https://ideas.repec.org/p/ris/smuesw/2020_014.html
[8] J. A. Doucet, K. J. Min, M. Roland, and T. Strauss, “A two-stage mechanism to improve electricity rationing,” The Canadian Journal of Economics / Revue canadienne d’Economique, vol. 29, pp. S270–S275, 1996. [Online]. Available: http://www.jstor.org/stable/135999
[9] J. A. Doucet, K. J. Min, M. Roland, and T. Strauss, “Electricity rationing through a two-stage mechanism,” Energy Economics, vol. 18, no. 3, pp. 247–263, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/014098839600014X
[10] A. Pavan, I. Segal, and J. Toikka, “Dynamic mechanism design: A Myersonian approach,” Econometrica, vol. 82, no. 2, pp. 601–653, 2014.
[11] D. P. Baron and D. Besanko, “Regulation and information in a continuing relationship,” Information Economics and Policy, vol. 1, no. 3, pp. 267–302, 1984.
[12] M. Battaglini, “Long-term contracting with Markovian consumers,” American Economic Review, vol. 95, no. 3, pp. 637–658, 2005.
[13] D. Bergemann and J. Välimäki, “Efficient dynamic auctions,” Cowles Foundation Discussion Paper, no. 1584, 2006.
[14] K. Ma and P. R. Kumar, “Incentive compatibility in stochastic dynamic systems,” IEEE Transactions on Automatic Control, vol. 66, no. 2, pp. 651–666, 2020.