
A Nash Equilibrium Solution for Periodic Double Auctions

Bharat Manvi (bharat.manvi@tcs.com) and Easwar Subramanian (easwar.subramanian@tcs.com); TCS Innovation Labs, Hyderabad, India. ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract

We consider a periodic double auction (PDA) setting where buyers of the auction have multiple (but finite) opportunities to procure multiple but fixed units of a commodity. The goal of each buyer participating in such auctions is to reduce their cost of procurement by planning their purchase across multiple rounds of the PDA. Formulating such optimal bidding strategies in a multi-agent periodic double auction setting is a challenging problem, as such strategies involve planning across current and future auctions. In this work, we consider one such setup wherein the composite supply curve is known to all buyers. Specifically, for the complete information setting, we model the PDA as a Markov game and derive a Markov perfect Nash equilibrium (MPNE) solution to devise an optimal bidding strategy for the case when each buyer is allowed to make one bid per round of the PDA. Thereafter, the efficacy of the Nash policies obtained is demonstrated with numerical experiments.

I Introduction

Auctions are mechanisms that facilitate buying and selling of goods between market participants. In a double auction, multiple buyers and sellers submit their bids and asks, respectively, to a market institution in order to trade a target quantity of a commodity. A bid or ask consists of a price-quantity pair $(p,q)$ indicating that the participant is willing to buy/sell $q$ units of the commodity at a unit price $p$. The market institution matches the buy bids with the sell asks to determine the clearing price and cleared quantities for all sellers and buyers. These types of auctions are prevalent in stock exchanges [1] and energy markets [2]. For example, in energy markets, power generating companies are the sellers, energy brokers servicing retail customers are the buyers, and an energy market regulator plays the role of the central market institution. Since the volume of trade is very high in such markets [3], it is prudent to design an optimal bidding strategy on behalf of a market participant to bring in profits and system efficiency to the ecosystem. The design of such optimal bidding strategies becomes more important in a periodic double auction (PDA) setup (see Figure 1) wherein buyers and sellers participate in a (finite) sequence of auctions to exchange certain units of a commodity [4]. For example, an energy broker, armed with an estimated energy requirement for a future time slot, participates in day-ahead auctions to procure the required energy from power generating companies by competing with other energy brokers. In these auctions, the broker has more than one opportunity to procure the estimated energy by participating in a sequence of auctions. For the purpose of this exposition, such an auction setup, as depicted in Figure 1, is referred to as a periodic double auction (PDA).
Evidently, in this PDA setup, an optimal bidding strategy involves planning across current and future time auctions and any small improvement in the bidding strategies of the market participants can lead to improved profits and system efficiency. Motivated by this, we formulate periodic double auctions as a Markov game and derive equilibrium solutions to devise optimal bidding strategies.

Figure 1: A Periodic Double Auction Setup

Equilibrium solutions for double auctions have been studied extensively in the past. For example, the work of Satterthwaite and Williams [5] proved the existence of non-trivial equilibria for $k$-double auctions. Analytical solutions for Nash equilibrium strategies for double auctions with the average clearing price rule (ACPR) have been derived for the one-buyer, one-seller, single-shot case for uniformly distributed valuations [6] and scale-based strategies [7, 8]. However, all the above approaches were developed for single-shot double auctions, whereas sequential decision making in a multi-agent setting is required to study PDAs. In recent years, the Markov game framework has been used to devise bidding strategies in multi-shot auctions, wherein approaches such as multi-agent Q-learning [9], multi-agent deep Q networks [10] and deep deterministic policy gradients [11] are deployed. However, most of these works do not involve deriving analytical solutions for equilibrium strategies. As far as we know, this work is the first attempt to find analytical solutions for equilibrium strategies in a PDA setup, and herein lies our main contribution.

We now elaborate on certain aspects of the PDA considered in this work, as the equilibrium analysis depends on these specifics [12]. First, we assume that the total supply available is enough to meet the overall demand requirement and that all the asks from the suppliers can be clubbed into a composite supply curve that is made known to all the buyers before they make their bids. This implies that the resultant Markov game has only the buyers as the game participants. Second, each PDA consists of a fixed number of rounds, i.e., each buyer has the same fixed number of auctions to procure their respective estimated demand. Third, buyers estimate their respective procurement need before the start of the first auction of the PDA and the estimate is not altered during the course of the PDA. Fourth, the buyers do not attempt to buy more than their outstanding requirement in any auction. Fifth, in the case that a particular buyer is not able to procure her targeted units of the commodity even after exhausting all rounds of the PDA, the shortfall is procured outside of the auction at a higher cost. Finally, we consider the uniform payment rule (UPR), wherein the clearing price decided through the auction mechanism is the same for all market participants. Specifically, we consider the average clearing price rule (ACPR) [8], where the market clearing price (MCP) is computed as the average of the last cleared bid and the last cleared ask.

II Market Clearing Mechanism

We begin by introducing some notation. For any positive integer $K$, let $[K]$ denote the set $\{1,\cdots,K\}$. We consider a system of $N$ buyers participating in $H$ rounds of a PDA to procure multiple units of a commodity from a set of sellers. For the purpose of this exposition, we assume that the composite supply curve from all the sellers at round $h\in[H]$, consisting of $M_{h}$ asks, is known to all the buyers and is given by $\mathcal{L}^{h}=\{(p^{h}_{1},q^{h}_{1}),(p^{h}_{2},q^{h}_{2}),\ldots,(p^{h}_{M_{h}},q^{h}_{M_{h}})\}$, where $p^{h}_{m}\in[0,p_{\max}]$ and $q^{h}_{m}\in[0,q_{\max}]$ are respectively the price and quantity components of the ask $(p^{h}_{m},q^{h}_{m})$ with $m\in[M_{h}]$, and $p_{\max},q_{\max}$ are suitable upper bounds for the ask. The total supply available, at any round $h\in[H]$, is given by

$$Q^{\mathscr{S},h}=\sum\nolimits_{m\in[M_{h}]}q^{h}_{m}. \tag{1}$$

Denote the outstanding requirement of a buyer $b\in[N]$ at round $h$ as $Q^{b,h}\geq 0$. Let $\mathcal{Q}^{h}=\{Q^{1,h},Q^{2,h},\dots,Q^{N,h}\}$ be the vector that contains the outstanding requirement of all the $N$ buyers at round $h$. Let $\mathcal{B}^{h}$ denote the set of all bids placed by all buyers at round $h$, wherein each buyer $b\in[N]$ places at most one bid consisting of a price-quantity pair $(p^{b,h},q^{b,h})$ with $p^{b,h}\in[0,p_{\max}]$ and $q^{b,h}\in[0,Q^{b,h}]$. The number of bids placed at round $h$ is given by $B^{h}\leq N$ and the total demand from all buyers at round $h$ is given by

$$Q^{\mathscr{D},h}=\sum\nolimits_{b\in[N]}q^{b,h} \tag{2}$$

with $q^{b,h}=Q^{b,h}$.

In this work, we consider the average clearing price rule (ACPR) as the clearing mechanism. The ACPR is a special case of the $k$-double auction ($k\in[0,1]$), with $k=0.5$. The MCP in a $k$-double auction is defined as $\lambda^{h}=k\cdot p^{h}_{d}+(1-k)\cdot p^{b,h}_{l}$, where $p^{h}_{d}$ and $p^{b,h}_{l}$ are the last cleared ask and bid prices respectively. In particular, for ACPR the clearing price is $\lambda^{h}=\frac{p^{h}_{d}+p^{b,h}_{l}}{2}$. Here, the bids with bid price greater than $p^{b,h}_{l}$ are fully cleared and the bid with bid price $p^{b,h}_{l}$ is either fully or partially cleared. Similarly, the asks with price less than $p^{h}_{d}$ are fully cleared and the ask with ask price $p^{h}_{d}$ is either fully or partially cleared. The cleared quantity of the last cleared ask and bid depends on the total cleared quantity $Q^{h}$, which is given as $Q^{h}=\min\{\sum\nolimits_{j=1}^{d}q^{h}_{j},\sum\nolimits_{i=1}^{l}q^{b,h}_{i}\}$. An example of the ACPR mechanism with a cleared price and a total cleared quantity is shown in Figure 2.

Figure 2: Average Clearing Price Rule
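
The clearing rule of this section can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions and not the clearing engine used in this work: the function name `clear_acpr`, the representation of asks and bids as lists of `(price, quantity)` tuples, and the tie-breaking by input order among equal-priced bids are hypothetical choices made here. Asks are matched in increasing order of price and bids in decreasing order of price, and the MCP is taken as the weighted average of the last cleared ask and bid prices with weight $k$ on the ask and $1-k$ on the bid, so that $k=0.5$ corresponds to the ACPR.

```python
from typing import List, Optional, Tuple

Order = Tuple[float, float]  # (price, quantity); hypothetical representation


def clear_acpr(asks: List[Order], bids: List[Order],
               k: float = 0.5) -> Tuple[Optional[float], List[float]]:
    """Clear one round; return (MCP or None, cleared quantity per bid)."""
    asks = sorted(asks, key=lambda a: a[0])                  # cheapest ask first
    order = sorted(range(len(bids)), key=lambda b: -bids[b][0])  # highest bid first
    ask_rem = [q for _, q in asks]
    bid_rem = [bids[b][1] for b in order]
    cleared = [0.0] * len(bids)
    i = j = 0
    last_ask = last_bid = None
    while i < len(asks) and j < len(order):
        if ask_rem[i] <= 0:          # current ask exhausted, move up the supply curve
            i += 1
            continue
        if bid_rem[j] <= 0:          # current bid exhausted (or a zero-quantity bid)
            j += 1
            continue
        ask_price, bid_price = asks[i][0], bids[order[j]][0]
        if bid_price < ask_price:    # supply and demand curves no longer cross
            break
        traded = min(ask_rem[i], bid_rem[j])
        ask_rem[i] -= traded
        bid_rem[j] -= traded
        cleared[order[j]] += traded
        last_ask, last_bid = ask_price, bid_price
    if last_ask is None:             # nothing cleared in this round
        return None, cleared
    return k * last_ask + (1 - k) * last_bid, cleared
```

For instance, with asks `[(10.0, 5.0), (20.0, 5.0), (30.0, 5.0)]` and bids `[(100.0, 4.0), (20.0, 8.0)]`, the last cleared ask and bid are both priced at 20, so the sketch returns an MCP of 20.0 with cleared quantities `[4.0, 6.0]`: the high-priced bid is fully cleared and the lower-priced bid is partially cleared.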

III The Markov Game Framework

Having described the clearing process of a double auction, we now model the PDA consisting of $N$ buyers with horizon $H$ as a finite horizon Markov game (sometimes known as a stochastic game [13]) specified by $\mathcal{M}=\langle N,S,A,C,P,H\rangle$. The ingredients of $\mathcal{M}$ are a finite set of players $N$; a state space $S$; for each player $b\in[N]$, an action set $A^{b}$; a transition probability $P$ from $S\times A\rightarrow S$, where $A=\times_{b\in N}A^{b}$ is the set of action profiles, with $P(s^{\prime}|s,a)$ as the probability that the next state is $s^{\prime}\in S$, given that the current state is $s\in S$ and the current action profile is $a\in A$; and a payoff function $\mathcal{C}$ from $S\times A\rightarrow\mathbb{R}^{N}$, where the $b$-th coordinate of $\mathcal{C}$, denoted $C^{b}$, is the payoff to player $b$ as a function of state and action profile. (Although we use the term payoff, $C$ actually specifies the cost function.)

More specifically, we let the state at round $h$, denoted by $s^{h}$, consist of $\{\mathcal{Q}^{h},\mathcal{L}^{h}\}$, and the action $a^{b,h}\in A^{b}$ of player $b\in[N]$ at round $h$ consists of at most one bid belonging to the bounded set $[0,p_{\max}]\times[0,q_{\max}]$. The payoff function $C^{b,h}:S\times A\rightarrow\mathbb{R}$ returns a scalar value to player $b$ specifying her cost of procurement (if any) for the auction at round $h$. More precisely,

$$C^{b,h}(s^{h},a^{h})=\begin{cases}\lambda^{h}\cdot\alpha^{b,h}&\text{at non-terminal state }s^{h}\\ \Psi\times Q^{b,h}&\text{when }h=H+1,\end{cases}$$

where $a^{h}=(a^{b,h},a^{-b,h})\in A$ is the joint action containing one action for each player at round $h$, with $a^{-b,h}$ specifying the $N-1$ actions of all players except $b$. In addition, $\lambda^{h}\geq 0$ is the clearing price of the auction at round $h$ and $\alpha^{b,h}$ is the cleared quantity for the buyer $b$ at round $h$. The entity $\Psi\geq 0$ is the unit price of procuring the commodity outside of the $H$ auctions and $Q^{b,H+1}$ is the remaining units of the commodity to be procured by buyer $b$ after exhausting the $H$ rounds of the PDA. Given a state $s^{h}\in S$ and action profile $a^{h}\in A$, the next state at round $h+1$ is given by $s^{h+1}=\{\mathcal{Q}^{h+1},\mathcal{L}^{h+1}\}$, where $\mathcal{L}^{h+1}$ refers to the uncleared asks of the supply curve from round $h$ and $\mathcal{Q}^{h+1}=\{Q^{1,h+1},\cdots,Q^{N,h+1}\}$ with $Q^{b,h+1}=Q^{b,h}-\alpha^{b,h},\;\forall\;b\in[N]$.
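
The state transition and the cost accounting described above can be sketched as follows. This is a hedged illustration only: the helper names `step_requirements`, `roll_over_asks` and `trajectory_cost` are hypothetical, asks are assumed to be cleared from the cheapest upwards, and the per-round clearing prices and cleared quantities are taken as given inputs rather than computed.

```python
from typing import List, Tuple

Order = Tuple[float, float]  # (price, quantity); hypothetical representation


def step_requirements(requirements: List[float],
                      cleared: List[float]) -> List[float]:
    """Q^{b,h+1} = Q^{b,h} - alpha^{b,h} for every buyer b."""
    return [q - a for q, a in zip(requirements, cleared)]


def roll_over_asks(asks: List[Order], total_cleared: float) -> List[Order]:
    """Uncleared asks of round h, which form the supply curve of round h+1.

    Assumes asks are consumed in increasing order of price.
    """
    remaining = []
    left = total_cleared
    for price, qty in sorted(asks, key=lambda a: a[0]):
        used = min(qty, left)
        left -= used
        if qty - used > 0:
            remaining.append((price, qty - used))
    return remaining


def trajectory_cost(prices: List[float], cleared_b: List[float],
                    initial_requirement: float, psi: float) -> float:
    """Total cost of one buyer along a trajectory.

    Sums lambda^h * alpha^{b,h} over the auction rounds and adds the
    terminal cost psi * Q^{b,H+1} for the quantity left unprocured.
    """
    auction_cost = sum(lam * alpha for lam, alpha in zip(prices, cleared_b))
    leftover = initial_requirement - sum(cleared_b)
    return auction_cost + psi * leftover
```

As a usage example, a buyer with an initial requirement of 6 units who clears 3 units at price 10 and 2 units at price 20 over two rounds, with an outside price of 100, incurs `trajectory_cost([10.0, 20.0], [3.0, 2.0], 6.0, 100.0) = 170.0`, of which 100 comes from the single unit bought outside the auction.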

At any round $h$, having seen the state $s^{h}$, the players choose their actions based on a policy. A (Markov) policy for a player $b\in[N]$ is a collection of maps $\pi^{b}=\{\pi^{b,h}:S\rightarrow\Delta_{A_{b}}\}_{h=1}^{H}$, where each $\pi^{b,h}(\cdot|s^{h})\in\Delta_{A_{b}}$ specifies the probability of taking action $a^{h}\in A_{b}$ at state $s^{h}$. Let $\pi=(\pi^{b},\pi^{-b})$ be the joint policy containing one policy for each player $b\in[N]$, where $\pi^{-b}$ denotes the $N-1$ policies of all players except $b$. The value of a joint policy $\pi$ (not necessarily Markov), at round $h$, for any player $b$ is a function $V^{h}_{\pi}:S\rightarrow\mathbb{R}$ defined as below.

$$V_{\pi}^{h}(s)=\mathbb{E}_{\tau\sim(P,\pi^{b},\pi^{-b})}\left[\sum\limits_{h^{\prime}=h}^{H+1}C^{b,h^{\prime}}(s^{h^{\prime}},a^{b,h^{\prime}},a^{-b,h^{\prime}})\,\middle|\,s^{h}=s\right]$$

with $a^{b,h^{\prime}}\sim\pi^{b}$, $a^{-b,h^{\prime}}\sim\pi^{-b}$, and $\tau$ a trajectory of the Markov game generated by following the joint policy $\pi$. As the Markov game pertaining to this work involves cost minimization as the objective, the optimal policy for any player is one that minimizes the value function. However, in a multi-agent scenario, when other players act rationally, finding an optimal policy is equivalent to finding a (Nash) equilibrium solution, which is the best response to the rational behaviour of the other participating agents. Hence, in this paper, for the PDA modelled as a Markov game, we look for MPNE [14, 15] solutions defined as below.

Definition III.1

Given an $N$-player finite horizon stochastic game specified by $\mathcal{M}=\langle N,S,A,C,P,H\rangle$, a joint policy $\pi_{*}=(\pi^{b}_{*},\pi^{-b}_{*})$ is a (Markov) perfect Nash equilibrium (MPNE) if for all $b\in[N]$, for all $s\in S$, for all $h\in[H]$ and for all Markov policies $\pi^{b}:S\rightarrow\Delta_{A_{b}}$, we have

$$V^{h}_{\pi_{*}^{b},\pi_{*}^{-b}}(s)\leq V^{h}_{\pi^{b},\pi_{*}^{-b}}(s).$$

The perfectness of the Nash equilibrium is due to the condition that the inequality in Definition III.1 holds for every round $h\in[H]$ and for every element of the state space $S$. In the sequel, we propose an MPNE solution for the PDA problem described in Section I.

IV A Nash Strategy for the Single Bid Case

Having elaborated on the Markov game framework, we now describe a joint policy which is an MPNE for the PDA setup considered in this work wherein each buyer is allowed to place one bid per round of the Markov game. Note that here, the goal is to find MPNE in the space of deterministic policies.

Recall from Equations (1) and (2) that $Q^{\mathscr{D},h}$ and $Q^{\mathscr{S},h}$ denote the total demand requirement and the total supply available at round $h$. At each round $h$, let $[N^{h}]$ denote the set of $N$ players indexed in decreasing order of their quantity requirement (for this work, we assume that the players' quantity requirements are unique). Now, let $u_{h}$ be the index of the ask from the set $\mathcal{L}^{h}$ at which all of the demand requirement at round $h$ is met. That is, $u_{h}=\operatorname*{arg\,min}\nolimits_{j}\left(Q^{\mathscr{D},h}\leq\sum\nolimits_{m=1}^{j}q^{h}_{m}\right)$, the lowest index $j$ for which the inequality holds. At round $h$, denote $Q^{\mathscr{D}_{-b},h}=Q^{\mathscr{D},h}-Q^{b,h},\;b\in[N^{h}]$ as the demand requirement of all players except the player $b$. Let $v^{b}_{h},\;b\in[N^{h}]$ be the lowest index of the ordered set $\mathcal{L}^{h}$ such that the total supply available from the first $v^{b}_{h}$ asks satisfies the demand requirement of all players except $b$. That is, $v^{b}_{h}=\operatorname*{arg\,min}\nolimits_{j}\left(Q^{\mathscr{D}_{-b},h}<\sum\nolimits_{m=1}^{j}q^{h}_{m}\right)\;\forall\;b\in[N^{h}]$. Next, let us define the index $z_{h}$ as $z_{h}=u_{h}-(H-h)$. Finally, let $\psi^{h}=\max\{1,\operatorname*{arg\,max}_{j}\{v^{j}_{h}\leq z_{h}\}\}$ be the player who bids $p_{z_{h}}$ and let $\phi^{h}$ be the player with the maximum requirement. Note that $\psi^{h}=\phi^{h}$ when $\psi^{h}=1$.
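
The indices $u_{h}$, $v^{b}_{h}$ and $z_{h}$ can be computed from the cumulative supply curve, as in the following Python sketch. It is an illustration under assumptions: the function names are hypothetical, ask quantities are assumed to be listed in the order of the supply curve, indices are 1-based as in the text, and the sketch raises an error when supply is inadequate instead of handling that case. No clamping is applied to $z_{h}$, and the player index $\psi^{h}$ is deliberately not computed here.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def first_index(cum_supply: List[float], target: float,
                strict: bool) -> Optional[int]:
    """Lowest 1-based j with target <= cum_supply[j] (or < if strict)."""
    for j, supply in enumerate(cum_supply, start=1):
        if (target < supply) if strict else (target <= supply):
            return j
    return None


def supply_indices(ask_quantities: List[float], requirements: List[float],
                   H: int, h: int) -> Tuple[int, List[Optional[int]], int]:
    """Return (u_h, [v_h^b for each buyer b], z_h)."""
    cum_supply = list(accumulate(ask_quantities))
    total_demand = sum(requirements)
    # u_h: first ask at which cumulative supply covers the total demand
    u = first_index(cum_supply, total_demand, strict=False)
    if u is None:
        raise ValueError("supply is inadequate to meet the total demand")
    # v_h^b: first ask at which cumulative supply exceeds the demand of all but b
    v = [first_index(cum_supply, total_demand - q, strict=True)
         for q in requirements]
    z = u - (H - h)
    return u, v, z
```

For example, with ask quantities `[5, 5, 5, 5]`, requirements `[8, 4, 2]`, $H=3$ and $h=2$, the cumulative supply is `[5, 10, 15, 20]` and the sketch returns $u_{h}=3$, $v_{h}=(2,3,3)$ and $z_{h}=2$.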

The joint policy $\pi_{*}$ for a player $b\in[N^{h}]$ at round $h\in[H]$ for state $s^{h}\in S$ can now be formulated as

$$\pi^{b,h}_{*}(s)=\begin{cases}p^{b,h}=0,\;q^{b,h}=0&\text{if }Q^{b,h}=0,\;\forall\;b\in[N^{h}]\\ \pi_{1}^{b,h}(s)&\text{if }H-h\geq u_{h}-v^{\phi}_{h}\\ \pi_{2}^{b,h}(s)&\text{otherwise}\end{cases} \tag{3}$$

where the policies $\pi_{1}^{b,h}(s)$ and $\pi_{2}^{b,h}(s)$ are defined as

$$\pi^{b,h}_{1}(s)=\begin{cases}p^{b,h}=p_{v^{\phi}_{h}},\;q^{b,h}=Q^{b,h}&\text{if }Q^{b,h}>0,\;b=\phi^{h}\\ p^{b,h}=p_{\max},\;q^{b,h}=Q^{b,h}&\text{if }Q^{b,h}>0,\;b\neq\phi^{h}\end{cases}$$

$$\pi^{b,h}_{2}(s)=\begin{cases}p^{b,h}=p_{z_{h}},\;q^{b,h}=Q^{b,h}&\text{if }Q^{b,h}>0,\;b=\psi^{h}\\ p^{b,h}=p_{\max},\;q^{b,h}=Q^{b,h}&\text{if }Q^{b,h}>0,\;b\neq\psi^{h}\end{cases}$$

Here, $p_{\max}$ is the maximum possible bid price and is greater than the largest possible ask price, i.e., $p_{\max}>p_{M_{H}}$. The policy in (3) suggests that the player with the highest requirement waits for the other players to have their demand satisfied, provided there are enough rounds, as determined by $H-h\geq u_{h}-v^{\phi}_{h}$. In this case, the player with the highest requirement also determines the MCP. However, if there are not enough rounds ($H-h<u_{h}-v^{\phi}_{h}$), then every player $b\neq\psi^{h}$ bids for its whole quantity at the highest possible price and the player $\psi^{h}$ bids a price that decides the clearing price. In the case when there is only one buyer left in the market, the policy in (3) recommends that the player follow the supply curve.
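
The case structure of the policy in (3) can be sketched in Python as below. This is a hedged reading rather than a definitive implementation: the function name `nash_bids` is hypothetical, the indices $u_{h}$, $v^{\phi}_{h}$, $z_{h}$ and the players $\phi^{h}$, $\psi^{h}$ are assumed to be computed elsewhere and passed in, players are identified by 0-based list positions, ask prices are listed in supply-curve order, and the zero bid is applied to each player whose own outstanding requirement is zero.

```python
from typing import List, Tuple

Bid = Tuple[float, float]  # (price, quantity); hypothetical representation


def nash_bids(requirements: List[float], ask_prices: List[float],
              u: int, v_phi: int, z: int, phi: int, psi: int,
              H: int, h: int, p_max: float) -> List[Bid]:
    """One bid per player following the case structure of policy (3).

    u, v_phi, z are 1-based ask indices; phi and psi are 0-based player
    positions (an assumption of this sketch).
    """
    enough_rounds = (H - h) >= (u - v_phi)
    # pi_1: player phi sets the price at p_{v_phi}; pi_2: player psi at p_{z}
    setter = phi if enough_rounds else psi
    price_index = v_phi if enough_rounds else z
    if not 1 <= price_index <= len(ask_prices):
        raise ValueError("price index falls outside the supply curve")
    setter_price = ask_prices[price_index - 1]

    bids = []
    for b, q in enumerate(requirements):
        if q == 0:
            bids.append((0.0, 0.0))        # nothing left to procure
        elif b == setter:
            bids.append((setter_price, q))  # price-setting bid
        else:
            bids.append((p_max, q))         # all others bid at p_max
    return bids
```

With requirements `[8, 4, 0]`, ask prices `[10, 20, 30, 40]`, $u_{h}=3$, $v^{\phi}_{h}=1$, $z_{h}=2$, $H=3$ and $p_{\max}=100$, the sketch selects $\pi_{1}$ at $h=1$ (player 0 bids at price 10, player 1 at 100) and $\pi_{2}$ at $h=3$ (player 1 bids at price 20, player 0 at 100), while the player with zero requirement places the zero bid in both cases.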

Having described the joint policy, we now evaluate the value of the policy $\pi_{*}^{b,h}$ at round $h$, for player $b\in[N^{h}]$ at state $s^{h}\in S$. To this end, the MCP $\lambda^{h}$, the total market cleared quantity $Q^{h}$ and the cleared quantity $\alpha^{b,h}$ for a buyer $b$ adopting the policy $\pi_{*}^{b,h}$ at round $h\in[H]$ for state $s^{h}$ are tabulated in the lemma below.

Lemma IV.1

If, at round $h$, the available supply is adequate to satisfy the outstanding requirement of all players, that is, $Q^{\mathscr{D},h}\leq Q^{\mathscr{S},h}$, and if all the players follow the policy $\pi_{*}^{b,h}$ given in Equation (3), then Table I gives the clearing price and quantity for the players.

TABLE I: Cleared price and quantities

| | Case: $H-h\geq u_{h}-v^{\phi}_{h}$ | Case: $H-h<u_{h}-v^{\phi}_{h}$ |
|---|---|---|
| Clearing price | $\lambda^{h}=p_{v^{\phi}_{h}}$ | $\lambda^{h}=p_{z_{h}}$ |
| Total market cleared quantity at round $h$ | $Q^{h}_{*}=\min\left(\sum\limits_{j=1}^{v^{\phi}_{h}}q^{h}_{j},\sum\limits_{b\in[N]}Q^{b,h}\right)$ | $Q^{h}_{*}=\min\left(\sum\limits_{j=1}^{z_{h}}q^{h}_{j},\sum\limits_{b\in[N]}Q^{b,h}\right)$ |
| Bids placed at round $h$ by all other players (fully cleared) | $\alpha^{b,h}=Q^{b,h},\;\forall b\neq\phi^{h}$ | $\alpha^{b,h}=Q^{b,h},\;\forall b\neq\psi^{h}$ |
| Bid placed at round $h$ by the remaining player | For $b=\phi^{h}$: $\alpha^{b,h}=Q^{h}-\sum\limits_{b\in[N]\setminus\phi^{h}}q^{b,h}_{j}$ | For $b=\psi^{h}$: $\alpha^{b,h}=Q^{h}-\sum\limits_{b\in[N]\setminus\psi^{h}}q^{b,h}_{j}$ |
Proof:

First note that the policy $\pi_{*}^{b,h}$ uses just two bid prices, the higher of which is $p_{\max}\geq p^{h}_{M_{h}}$. This implies that there exists at least one bid whose price is greater than that of some ask, and hence the total cleared quantity satisfies $Q^{h}>0$. In the case that, at round $h$, there is adequate supply to cater to the demand of all buyers, that is, $Q^{\mathscr{D},h}\leq Q^{\mathscr{S},h}$, the player $\phi^{h}$ has the maximum requirement and bids at price $p_{v^{\phi}_{h}}$. By construction, $p_{v^{\phi}_{h}}$ is also the point where the supply and demand curves intersect, and hence the MCP is $p_{v^{\phi}_{h}}$. It is now easy to see that the total market cleared quantity is given by

$$Q^{h}=\min\left(\sum\nolimits_{j=1}^{v^{\phi}_{h}}q^{h}_{j},\sum\nolimits_{b\in[N^{h}]}Q^{b,h}\right).$$

As the bids placed at the higher price $p_{\max}$ get cleared first, and since the available supply is enough to cater to the outstanding demand requirement at round $h$, the bids get cleared exactly as stated in the first column of Table I. The case $H-h<u_{h}-v^{\phi}_{h}$ can be shown along similar lines. ∎

Having described the clearing implications for a buyer $b\in[N^{h}]$ of following the policy $\pi_{*}^{b,h}$ of Equation (3) at state $s^{h}\in S$, we now compute the value of the equilibrium policy for a buyer $b\in[N^{h}]$, which follows from Lemma IV.1. When $H-h\geq u_{h}-v^{\phi}_{h}$, we have

$$V^{h}_{\pi^{b}_{*},\pi^{-b}_{*}}(s)=\begin{cases}p_{v^{\phi}_{h}}\times Q^{b,h},&\text{if }b\neq\phi^{h}\\ p_{v^{\phi}_{h}}\times\left(Q^{h}-\sum\nolimits_{b\in[N]\setminus\phi^{h}}q^{b,h}_{j}\right)+\sum\nolimits_{k=h+1}^{H}p_{v^{\phi}_{k}}\times Q^{k},&\text{if }b=\phi^{h}.\end{cases} \tag{4}$$

On the other hand, when $H-h<u_{h}-v^{\phi}_{h}$, we have

$$V^{h}_{\pi^{b}_{*},\pi^{-b}_{*}}(s)=\begin{cases}p_{z_{h}}\times Q^{b,h},&\text{if }b\neq\psi^{h}\\ p_{z_{h}}\times\left(Q^{h}-\sum\nolimits_{b\in[N]\setminus\psi^{h}}q^{b,h}_{j}\right)+\sum\nolimits_{k=h+1}^{H}p_{z_{k}}\times Q^{k},&\text{if }b=\psi^{h}.\end{cases} \tag{5}$$

V Equilibrium Analysis

In this section, for the PDA considered in this exposition, we show that the policy in (3) is an MPNE in the space of all deterministic policies. More precisely, we need to show that, for all $b\in[N^{h}]$, for all $s\in S$, for all $h\in[H]$ and for any deterministic policy $\pi^{b}:S\rightarrow A_{b}$, we have

$$V^{h}_{\pi_{*}^{b},\pi_{*}^{-b}}(s)\leq V^{h}_{\pi^{b},\pi_{*}^{-b}}(s).$$

Denote the bid of buyer $b\in[N^{h}]$ at state $s$ and round $h$, as prescribed by the policy $\pi^{b,h}_{*}(s)$, as $(p_{*}^{b,h},q_{*}^{b,h})$. Since each bid of a player $b\in[N^{h}]$ belongs to the bounded set $[0,p_{\max}]\times[0,q_{\max}]$ and, at any round $h$, the player $b$ does not bid more than the outstanding demand requirement $Q^{b,h}$, the possible deviations available to a player $b$ at a state $s$ and round $h$ can be tabulated as below.

TABLE II: Possible Deviations

| Higher Priced Deviations | Lower Priced Deviations | Equal Priced Deviation |
|---|---|---|
| $p^{b,h}>p^{b,h}_{*}$, $q^{b,h}<q^{b,h}_{*}$ | $p^{b,h}<p^{b,h}_{*}$, $q^{b,h}<q^{b,h}_{*}$ | $p^{b,h}=p^{b,h}_{*}$, $q^{b,h}<q^{b,h}_{*}$ |
| $p^{b,h}>p^{b,h}_{*}$, $q^{b,h}=q^{b,h}_{*}$ | $p^{b,h}<p^{b,h}_{*}$, $q^{b,h}=q^{b,h}_{*}$ | |

Given these deviations, we now show that, at any state $s$ and at any round $h$, a player $b\in[N^{h}]$ deviating from the policy $\pi^{b,h}_{*}(s)$ (Equation (3)) in any of the ways listed above (Table II) will not incur any less expenditure than what is accounted for via the value functions in Equations (4) and (5). To this end, we first provide results that will be used later in the analysis. The first result provides an insight into how the MCP varies across the rounds of a PDA.

Lemma V.1

Consider a PDA with $H$ rounds with ACPR. In the case when the composite supply curve does not change across the rounds of the PDA, the MCPs at rounds $h$ and $h+1$ are related as

$$\lambda^{h+1}\geq\lambda^{h}.$$
Proof:

Recall that once ACPR is chosen as the clearing mechanism of a PDA, at every round $h\in[H]$ the MCP is $\lambda^{h}=\frac{p^{h}_{d}+p^{b,h}_{l}}{2}$, which lies in the interval $[p_{d}^{h},p^{b,h}_{l}]$, where $p_{d}^{h}$ is the price of the last cleared ask and $p^{b,h}_{l}$ is the price of the last cleared bid (at round $h$). The result now follows by noting that the uncleared asks of round $h$, from which the asks of round $h+1$ would be rolled out, have prices greater than or equal to $p_{d}^{h}$. ∎

Recall that the policy in Equation (3) suggests that $N-1$ players bid at price $p_{\max}$ and the remaining player bids at a specified (lower) price. The next result states that no deviation in the bid by a player recommended to bid at price $p_{\max}$, at any round $h\in[H]$, reduces that player's procurement cost.

Lemma V.2

Let the conditions of Lemma V.1 hold with $\Psi>\beta\cdot p_{\max}$ ($\beta>1$) and let $\omega\in[N^{h}]$ be a player that is prescribed by the policy in Equation (3) to bid at price $p_{\max}$ to procure his outstanding demand requirement at round $h$. If the player $\omega$ deviates from the said policy to another policy $\pi^{\omega}$ at round $h$ at state $s$, then

$$V^{h}_{\pi_{*}^{\omega},\pi_{*}^{-\omega}}(s)\leq V^{h}_{\pi^{\omega},\pi_{*}^{-\omega}}(s).$$
Proof:

Among the five deviations enumerated in Table II, the deviations with a bid price greater than $p_{\max}$ are not applicable to player $\omega$ (as by design $p_{\max}$ is the maximum bid price). The other three deviations (at round $h$) have either a bid price less than or equal to $p_{\max}$ or a bid quantity less than or equal to $Q^{\omega,h}$. In the case $q^{\omega,h}<Q^{\omega,h}$, the bid placed by $\omega$ loses priority when compared to following the policy in Equation (3). This implies that the quantity $Q^{\omega,h}$ could be only partially cleared at round $h$ (as opposed to $Q^{\omega,h}$ being fully cleared if policy (3) is followed). Further, by Lemma V.1, the remaining requirement of $Q^{\omega,h}$ is likely to be cleared at a higher price in future rounds; hence the overall cost incurred by player $\omega$ is greater than or equal to the cost incurred when the policy in (3) is followed.

Now suppose player $\omega$ deviates in bid price while keeping the bid quantity fixed at $q^{\omega,h}=Q^{\omega,h}$. First consider the case when there are enough rounds for the player $\phi^{h}$ (i.e., $H-h\geq u_{h}-v^{\phi}_{h}$). If player $\omega$ bids below the price $p_{\max}$, then the priority of its bid decreases. Here, by arguments similar to those made earlier using Lemma V.1, it can be concluded that the deviation is expensive. Next, if the player $\phi^{h}$ does not have enough rounds, then the policy $\pi_{2}$ is recommended. Here, if the player $\omega$ has a requirement greater than that of the player bidding at $p_{z_{h}}$, then for a bid price $p^{\omega,h}\in[p_{z_{h}},p_{\max})$ the value function is unchanged for player $\omega$. However, if the player $\omega$ bids at a price $p^{\omega,h}\in[p_{v^{\psi}_{h}},p_{z_{h}})$, then by construction the number of remaining rounds is insufficient for the player to procure the outstanding requirement. Hence the player has to buy a non-zero quantity from the balancing market at price $\Psi$. Furthermore, if the player $\omega$ has a smaller requirement than the player bidding at $p_{z_{h}}$, then for a bid price $p^{\omega,h}\in(p_{z_{h}},p_{\max})$ the value function is unchanged for player $\omega$, and for the bid price $p^{\omega,h}=p_{z_{h}}$ the value function might increase due to Lemma V.1. Finally, for a bid price $p^{\omega,h}\in[p_{v^{\psi}_{h}},p_{z_{h}})$, the condition $\Psi>\beta\cdot p_{\max}$ leads to a higher value function for the player $\omega$. ∎

Lemma V.3

Let the conditions of Lemma V.1 hold with the balancing cost $\Psi>\beta\cdot p_{\max}$ ($\beta>1$) and let $\omega\in[N^{h}]$ be a player that is prescribed by the policy in Equation (3) to bid at the price $p_{v^{\phi}_{h}}$ or $p_{z_{h}}$ to procure its outstanding demand requirement at round $h$. If the player deviates to another policy $\pi$ at round $h$, instead of following the policy in Equation (3), then, for any state $s\in S$,

$$V^{h}_{\pi_{*}^{\omega},\pi_{*}^{-\omega}}(s)\leq V^{h}_{\pi,\pi_{*}^{-\omega}}(s).$$
Proof:

Here, for player $\omega$, all five deviations listed in Table II are possible. Consider first the case of policy $\pi_{1}$, which has $p_{v^{\phi}_{h}}$ as the bid price. Here, the deviations with a bid price greater than $p_{v^{\phi}_{h}}$ increase the MCP, which leads to an increased value function. Next, if the bid price is less than $p_{v^{\phi}_{h}}$, then by construction the bid of player $\phi^{h}$ is not cleared. Similarly, for a bid quantity $q^{b,h}<Q^{b,h}$, the cleared quantity at round $h$ is less than the cleared quantity under the MPNE policy. Hence, in both of these cases, by Lemma V.1, the value function increases.

For the policy $\pi_{2}$, the recommended price is $p_{z_{h}}$. If the player $\omega$ bids at a price greater than $p_{z_{h}}$, then, as in the earlier case, the MCP at round $h$ would be greater than or equal to $p_{z_{h}}$, resulting in a possible increase in the cost of procurement. If the player bids at price $p_{z_{h}}$ with a bid quantity less than $Q^{\omega,h}$, the cleared quantity could be less than the demand procured by following policy (3), which would imply that more demand needs to be satisfied in the remaining rounds. Again by Lemma V.1, this could lead to a higher cost of procurement. Finally, if the bid price is $p^{\omega,h}\in[p_{v^{\psi}_{h}},p_{z_{h}})$, then by the choice of $\Psi>\beta\cdot p_{\max}$ with suitable $\beta>1$, the value function increases. ∎

Theorem V.1

Let the conditions of Lemma V.1 hold with the balancing cost $\Psi>\beta\cdot p_{\max}$ ($\beta>1$). If a buyer $b\in[N^{h}]$ deviates to another policy $\pi$ at round $h$, instead of following the policy in Equation (3), then, for any state $s\in S$ and $h\in[H]$,

$$V^{h}_{\pi_{*}^{b},\pi_{*}^{-b}}(s)\leq V^{h}_{\pi^{b},\pi_{*}^{-b}}(s). \tag{6}$$
Proof:

From Lemmas V.1, V.2 and V.3 the value function of the policy (3) satisfies (6). ∎

Note that the policy in (3) is a Markov policy since it depends only on the present state $s$. Moreover, the inequality (6) holding for all $h\in[H]$ and $s\in S$ implies that the policy satisfies sub-game perfectness.

VI Simulations

This section considers a simple numerical setup to demonstrate the efficacy of the Nash policies described in Section IV. Our setup consists of three players (buyers) in the market. The players go through a PDA simulator which has H=24H=24 rounds to procure the required quantity. The quantity requirement of the four players (P0, P1, P2) at some round hHh\leq H is given as 𝒬h=(232.18,164.6,90.7)\mathcal{Q}^{h}=(232.18,164.6,90.7). The players P0 and P2 are the players with largest and smallest requirement respectively. The players know the supply curve (ask pattern) h\mathcal{L}_{h} which has 31 asks and the total supply Q𝒮=1502.38>Q𝒟,h=487.48Q^{\mathscr{S}}=1502.38>Q^{\mathscr{D},h}=487.48. We consider two values of hh, namely h=1h=1 and h=23h=23, wherein the choice h=1h=1 satisfies the condition HhuhvhϕH-h\geq u_{h}-v^{\phi}_{h} and the latter does not.

Figure 3(a) compares the value function when all players adopt the policy in (3) with a joint policy in which player P0 deviates to a bid price higher than the prescribed price $p_{z_{h}}$ at $h=1$. The higher bid price of P0 results in a higher cost because of the increased MCP. In Figure 3(b), for $h=23$, the condition $H-h\geq u_{h}-v^{\phi}_{h}$ is not satisfied and hence the prescribed bid price of P0 is $p_{\max}$. When P0 bids less than $p_{\max}$, its bid priority decreases, resulting in procurement outside of the PDA at the higher cost $\Psi$ and thereby increasing the overall cost. Figure 3(c) considers the case $h=24$, wherein the condition $H-h\geq u_{h}-v^{\phi}_{h}$ is not satisfied. Here, we consider the deviation by the minimum requirement player P2 to bid at a price $p\in(p_{v^{\psi}_{h}},p_{z_{h}})$ less than the prescribed price $p_{z_{h}}$. Although this deviation to a lower price results in a lower cost of procurement at round $h$, it leads to a higher overall cost, as the player has to buy more units of the commodity outside of the auction at the higher price $\Psi\geq\beta\cdot p_{\max}$. Finally, in Figure 3(d), we consider the average cost incurred by the players in 100 PDAs (each with $H=24$ rounds) with varying demand requirements. In each of these 100 PDAs, we let player P0 deviate from the prescribed Nash policy to the zero-intelligence (ZI) policy [16], and the corresponding value functions are compared with the value function for the Nash policy.
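The cost trade-off behind Figure 3(c) can be sketched in a few lines: a lower bid price reduces the spend inside the auction but leaves a shortfall that is charged at the balancing cost $\Psi$. All numbers below ($p_{\max}=50$, $\beta=2$, $\Psi=120$, the clearing prices and cleared quantities) are hypothetical and are not the values used in the experiments. The helper `zi_bid_price` is likewise only an assumption about what a ZI-style bidder in the spirit of [16] might look like; how the ZI buyer was configured is not stated here.

```python
import random
from typing import List, Tuple


def procurement_cost(trades: List[Tuple[float, float]], demand: float, psi: float) -> float:
    """Auction spend plus the balancing charge on any unmet demand.

    trades: (clearing price, cleared quantity) for each round.
    """
    bought = sum(q for _, q in trades)
    spend = sum(p * q for p, q in trades)
    shortfall = max(demand - bought, 0.0)
    return spend + psi * shortfall


def zi_bid_price(rng: random.Random, p_min: float, p_max: float) -> float:
    """Zero-intelligence style bid: a price drawn uniformly at random."""
    return rng.uniform(p_min, p_max)


# Hypothetical numbers: p_max = 50, beta = 2, so psi must exceed 100.
P_MAX, PSI, DEMAND_P2 = 50.0, 120.0, 90.7

follow = procurement_cost([(30.0, 90.7)], DEMAND_P2, PSI)    # full demand cleared
deviate = procurement_cost([(25.0, 60.0)], DEMAND_P2, PSI)   # cheaper but partial
print(follow < deviate)

rng = random.Random(0)
print(zi_bid_price(rng, 0.0, P_MAX))
```

With these numbers the deviation saves 5 per unit on the 60 units it clears, but the remaining 30.7 units are bought at 120 each, so the overall cost is higher than when the full demand clears in the auction.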

Figure 3: Comparison of the value function of the MPNE policy and a deviation. (a) The value function for $h=1$ with player P0 deviating to a price higher than $p_{z_{h}}$. (b) The value function for $h=23$ with player P0 deviating from the MPNE to a price lower than the prescribed price $p_{\max}$. (c) The value function for $h=24$ with player P2 deviating from the Nash policy to bid at a price lower than $p_{z_{h}}$. (d) Average cost incurred by the players in a series of 100 PDAs, each with horizon $H=24$, with deviation by player P0 to the ZI policy.

VII Conclusion

In this paper, we formulate optimal bidding strategies for a periodic double auction setting consisting of multiple buyers competing with each other to satisfy their respective demands. Each buyer has multiple opportunities to procure the required quantity, and the composite supply curve is known to all of them. The problem is modeled as a Markov game, and we propose equilibrium solutions that could act as optimal bidding strategies when all buyers behave rationally. Apart from proving that the proposed policies are indeed MPNE, we also conducted simple numerical simulations to demonstrate the efficacy of the proposed solution framework. The PDA setup considered in this paper has applications in devising optimal bidding strategies for day-ahead electricity markets.

Although we have considered only the case of adequate supply with one bid per auction per buyer in this work, we believe that the cases of multiple bids per auction and inadequate supply can be handled using the techniques developed here. Even though the equilibrium solutions proposed here are for the complete information setting, they are still important for two reasons. First, as far as we know, ours is the first work to derive analytical equilibrium solutions for multi-shot auctions. Second, these policies can be used as a baseline for comparison with a policy obtained in an incomplete information setting, which is a direction for our future work.

References

  • [1] S. Parsons, J. A. Rodriguez-Aguilar, and M. Klein, “Auctions and Bidding: A Guide for Computer Scientists,” ACM Computing Surveys, vol. 43, no. 2, pp. 1–59, Jan. 2011. [Online]. Available: https://doi.org/10.1145/1883612.1883617
  • [2] W. Ketter, J. Collins, and M. de Weerdt, “The 2020 Power Trading Agent Competition,” SSRN Electronic Journal, 2020. [Online]. Available: https://doi.org/10.2139/ssrn.3564107
  • [3] “Nord Pool AS Annual Report,” 2020. [Online]. Available: www.nordpoolgroup.com/49eea7/globalassets/download-center/annual-report/annual-review-2020.pdf
  • [4] M. M. P. Chowdhury, C. Kiekintveld, S. Tran, and W. Yeoh, “Bidding in Periodic Double Auctions Using Heuristics and Dynamic Monte Carlo Tree Search,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. International Joint Conferences on Artificial Intelligence Organization, Jul. 2018, pp. 166–172. [Online]. Available: https://doi.org/10.24963/ijcai.2018/23
  • [5] M. A. Satterthwaite and S. R. Williams, “Bilateral Trade with the Sealed Bid k-double Auction: Existence and efficiency,” Journal of Economic Theory, vol. 48, no. 1, pp. 107–133, June 1989. [Online]. Available: https://doi.org/10.1016/0022-0531(89)90121-x
  • [6] K. Chatterjee and W. Samuelson, “Bargaining Under Incomplete Information,” in Operations Research, vol. 31, 1983.
  • [7] S. Ghosh, S. Gujar, P. Paruchuri, E. Subramanian, and S. Bhat, “Bidding in Smart Grid PDAs: Theory, Analysis and Strategy,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 02, pp. 1974–1981, Apr. 2020. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/5568
  • [8] S. Chandlekar, E. Subramanian, S. Bhat, P. Paruchuri, and S. Gujar, “Multi-Unit Double Auctions: Equilibrium Analysis and Bidding Strategy Using DDPG in Smart-Grids,” in Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, ser. AAMAS ’22. International Foundation for Autonomous Agents and Multiagent Systems, 2022, pp. 1569–1571.
  • [9] N. Rashedi, M. A. Tajeddini, and H. Kebriaei, “Markov Game Approach for Multi-agent Competitive Bidding Strategies in Electricity Market,” IET Generation, Transmission and Distribution, vol. 10, no. 15, pp. 3756–3763, Nov. 2016. [Online]. Available: https://doi.org/10.1049/iet-gtd.2016.0075
  • [10] A. Ghasemi, A. Shojaeighadikolaei, K. Jones, M. Hashemi, A. G. Bardas, and R. Ahmadi, “A Multi-Agent Deep Reinforcement Learning Approach for a Distributed Energy Marketplace in Smart Grids,” in 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). IEEE, Nov. 2020. [Online]. Available: https://doi.org/10.1109/smartgridcomm47815.2020.9302981
  • [11] Y. Du, F. Li, H. Zandi, and Y. Xue, “Approximating Nash Equilibrium in Day-ahead Electricity Market Bidding with Multi-agent Deep Reinforcement Learning,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 3, pp. 534–544, 2021. [Online]. Available: https://doi.org/10.35833/mpce.2020.000502
  • [12] R. Wilson, “Strategic analysis of auctions,” Handbook of Game Theory with Economic Applications, vol. 1, pp. 227–279, 1992.
  • [13] Y. Zhang, G. Qu, P. Xu, Y. Lin, Z. Chen, and A. Wierman, “Global convergence of localized policy iteration in networked multi-agent reinforcement learning,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 7, no. 1, pp. 1–51, Feb. 2023. [Online]. Available: https://doi.org/10.1145/3579443
  • [14] Y. Yang and J. Wang, “An overview of multi-agent reinforcement learning from game theoretical perspective,” CoRR, vol. abs/2011.00583, 2020. [Online]. Available: https://arxiv.org/abs/2011.00583
  • [15] J. Li, Y. Zhou, T. Ren, and J. Zhu, “Exploration analysis in finite-horizon turn-based stochastic games,” in Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), ser. Proceedings of Machine Learning Research, J. Peters and D. Sontag, Eds., vol. 124. PMLR, 03–06 Aug 2020, pp. 201–210. [Online]. Available: https://proceedings.mlr.press/v124/li20a.html
  • [16] D. K. Gode and S. Sunder, “Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality,” Journal of Political Economy, vol. 101, no. 1, pp. 119–137, 1993.