
Incentive Designs for Learning Agents to Stabilize Coupled Exogenous Systems

Jair Certório, Nuno C. Martins, Richard J. La, and Murat Arcak
The work of Certório and Martins was supported by the AFOSR grant FA95502310467, and the NSF grants 2135561 and 2139713. The work of Arcak was supported by the NSF grant CNS-2135791. Jair Certório, Nuno C. Martins, and Richard J. La are with the Department of Electrical and Computer Engineering and Institute for Systems Research, University of Maryland, College Park, College Park, MD 20740 USA (e-mail: {certorio, nmartins, hyongla}@umd.edu). Murat Arcak is with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA (e-mail: arcak@eecs.berkeley.edu).
Abstract

We consider a large population of learning agents noncooperatively selecting strategies from a common set, influencing the dynamics of an exogenous system (ES) we seek to stabilize at a desired equilibrium. Our approach is to design a dynamic payoff mechanism capable of shaping the population’s strategy profile, thus affecting the ES’s state, by offering incentives for specific strategies within budget limits. Employing system-theoretic passivity concepts, we establish conditions under which a payoff mechanism can be systematically constructed to ensure the global asymptotic stability of the ES’s equilibrium. In comparison to previous approaches originally studied in the context of the so-called epidemic population games, the method proposed here allows for more realistic epidemic models and other types of ESs, such as predator-prey dynamics. The stability of the equilibrium is established with the support of a Lyapunov function, which provides useful bounds on the transient states.

I Introduction

Systems whose behavior depends on the strategic choices of many agents can be studied through the lens of evolutionary game theory, in particular when considering large populations of nondescript agents that repeatedly revise their strategies. Examples of systems with such dependency on the aggregate choice of a large population include models of traffic congestion [1], optimal power dispatch [2], distributed task allocation [3], building temperature control [4], and epidemic mitigation [5].

The coupling between the evolutionary dynamics, which models the population’s strategic choices, and the exogenous system (ES) dynamics, which captures the state of the system affected by the decisions of the population, makes the task of designing stabilizing policies challenging. This is especially true when we aim to design policies that not only improve the behavior of the system, but also provide performance guarantees that hold at any given time.

In our work, we generalize the design concept from [5] to a larger class of ESs which, when agents stop revising their strategies, have a Lyapunov function and satisfy some mild assumptions. We design incentives that guarantee the convergence of the population state to an equilibrium, which can be selected independently of the payoff mechanism. While our results require similar assumptions on the behavior of the population to those of [5], unlike [5, 6] we obtain a bound on the instantaneous cost of implementing our incentives.

In addition, we show that the proposed payoff mechanism is compatible with many ESs, such as the epidemic models studied by [5, 6, 7] and the Leslie-Gower model for studying the interaction between populations of hosts and parasites [8]. As an application, we use the proposed framework to devise incentives for a modified epidemic model studied in [6], while considering disease transmission rates that depend nonlinearly on the agents’ choices.

I-A Contributions

The goal of our study is to develop a new framework for designing a dynamic payoff mechanism that guarantees the convergence of both the population and the ES, whose dynamics are influenced by the strategic choices of the agents in the population, to a desirable equilibrium that can be selected independently of the payoff. The incentive we design has a bound on the instantaneous reward offered to the population, which is not guaranteed in the previous studies that use $\delta$-passivity for designing policies that mitigate epidemics.

The assumptions we introduce on the learning rule employed by the agents are similar to those of [5]. However, we relax one of the assumptions on the learning rule, which is not easy to verify, and replace it with another assumption that is easier to check. The proposed mechanism works on previously studied ESs [5, 6, 7], but is not limited to epidemic models; our framework is more general and can be applied to any system satisfying the conditions identified in this paper, even when the learning rule is unknown to the policy maker. As an illustrative example, we show in §VI-A that the Leslie-Gower system studied in [8] satisfies these conditions.

II Related Works

Earlier studies showed that, for potential games, bounded-rational learning rules with the positive correlation (PC) property, where agents revise their strategies in a way that increases their payoffs, guarantee the convergence of the population state to Nash equilibria (NEs) [9]. In addition, Hofbauer and Sandholm [10] established that, for the class of contractive games, many evolutionary dynamics lead the population state to an NE. A recent study demonstrated that for certain potential and strictly contractive games, the population state converges to an NE, even when the revision rates depend explicitly on the current strategies of the agents [11, 12]. For a survey of earlier studies and the applications of population games we refer the reader to [13, 14], and the references therein.

Motivated by the class of contractive games, Fox and Shamma [15] showed that certain learning rules, such as impartial pairwise comparison and excess-payoff target rules, exhibit a form of passivity which they named $\delta$-passivity. The concept of $\delta$-passivity was generalized in [16] to admit a large class of dynamic payoff mechanisms, and in [1], which introduced $\delta$-dissipativity. In [17], the authors determined a sufficient condition for the interconnections of $\delta$-dissipative dynamical systems to also be $\delta$-dissipative.

Adapting tools from robust control, Mabrok and Shamma [18] studied the passivity properties of higher-order games and determined necessary conditions for evolutionary dynamics to be stable for all higher-order passive games. Their work also proved that the replicator dynamics is lossless [18]; it was later shown not to be $\delta$-passive [19].

The population game framework has been used in many problems. For example, it has been used for distributed optimization [20] and distributed NE seeking [21]. Obando et al. [4] studied a temperature control problem, where the population models the heating power to be distributed in a building and is coupled to a thermal model of the building. We refer the reader to a survey [14] for additional examples.

To the best of our knowledge, [5] is the first study that used $\delta$-passivity as a design tool: a dynamic payoff mechanism was designed to lessen the impact of an epidemic subject to a limit on the long-term budget available to the decision maker. This work was extended to cases with nonnegligible disease mortality rates [6] and to scenarios with noisy payoffs to agents [22]. The same framework was also used to consider two-population scenarios [7].

Our work extends the design method in [5] to a larger class of ESs, which includes epidemic models as examples. Our assumptions are similar to those of previous studies, but the proposed dynamic payoff mechanism has a provable bound on the incentives provided to the agents. We also determine conditions under which a class of ESs coupled to a population of learning agents can be stabilized to a desired equilibrium.

III Population Games and Learning Rules

We consider a population of a large number of nondescript agents, in which each agent follows a single strategy at any given time and can repeatedly revise its strategy, based on the payoffs of the available strategies at the revision times. We assume that the agents have a common set of $n$ strategies available to them. The instantaneous payoff obtained by following the $i$-th strategy at time $t$ is given by $p_i(t)$, and the payoff vector offered to the population at time $t$ is denoted by $p(t) := (p_i(t) \mid i \in [n])$, where $[n] := \{1, \dots, n\}$. The payoff perceived by the agents at time $t$ is the difference between the rewards offered by the policy maker at time $t$, denoted by $r(t)$, and the vector of intrinsic costs of the $n$ strategies, denoted by $c$. Thus, the payoff vector at time $t$ is given by

$p(t) = r(t) - c$.   (1)

These assumptions render the tools of population games well-suited for analyzing the strategic interactions among the agents. The population state at time $t$ is denoted by $x(t) := (x_i(t) \mid i \in [n])$, with $x_i(t)$ being the proportion of the population following the $i$-th strategy at time $t$. The vector $x(t)$ takes values in the standard simplex

$\mathbb{X} := \left\{ x \in [0,1]^n \;\middle|\; \sum_{i=1}^{n} x_i = 1 \right\}$.

In the large-population limit, for $t \geq 0$, the population state $x$ evolves according to the Evolutionary Dynamics Model (EDM)

$\dot{x}(t) = V(x(t), p(t))$,   (EDM)

where $V: \mathbb{X} \times \mathbb{R}^n \rightarrow \mathbb{R}^n$. The $i$-th element of $V(x,p)$ is

$V_i(x,p) := \sum_{j=1}^{n} \left( x_j \tau_{ji}(x,p) - x_i \tau_{ij}(x,p) \right)$,   (2)

with a learning rule (also referred to as a revision protocol) $\tau$ that is a Lipschitz continuous map $\tau: \mathbb{X} \times \mathbb{R}^n \rightarrow [0, \bar{\tau}]^{n \times n}$, bounded above by $\bar{\tau} > 0$.
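For concreteness, the EDM vector field in (2) is straightforward to evaluate numerically. The following Python sketch assumes the learning rule is supplied as a function returning the $n \times n$ matrix of switch rates; the function names are ours and are not part of any released code.

```python
import numpy as np

def edm_vector_field(x, p, tau):
    """Right-hand side of (2): V_i(x,p) = sum_j (x_j*tau_ji(x,p) - x_i*tau_ij(x,p)).

    x   : population state, shape (n,), nonnegative entries summing to 1
    p   : payoff vector, shape (n,)
    tau : callable returning the n-by-n matrix T with T[i, j] = tau_ij(x, p)
    """
    T = tau(x, p)
    inflow = T.T @ x              # sum_j x_j * tau_ji, for each strategy i
    outflow = x * T.sum(axis=1)   # x_i * sum_j tau_ij, for each strategy i
    return inflow - outflow
```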

We consider the EDM coupled to an ES, whose state at time $t$ is denoted by $y(t)$. The ES state $y(t)$ takes values in $\mathbb{Y} \subset \mathbb{R}^m$ and evolves according to

$\dot{y}(t) = f(y(t); x(t))$,   (3)

where $f: \mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^m$ is locally Lipschitz continuous and the population state $x(t)$ acts as a time-varying parameter of the ES. We assume that, for any $x \in \mathbb{X}$, if $x(t) \equiv x$ then (3) has a unique equilibrium, denoted by $y^*(x)$.

The dynamics of the rewards $r(t)$ offered to the agents are described by the following:

$\dot{q}(t) = G(y(t), x(t), q(t))$,   (4)
$r(t) = H(y(t), x(t), q(t))$,
$q(0) = q_0$,

where $G$ and $H$ form a dynamic payoff mechanism to be designed by the policy maker.
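The closed loop formed by (EDM), (3), and (4) is an ordinary differential equation in the stacked state $(x, y, q)$ and can be simulated with a standard ODE solver. The Python sketch below assumes the maps $V$, $f$, $G$, and $H$ are supplied by the user; it only illustrates how the three blocks are interconnected through (1).

```python
import numpy as np
from scipy.integrate import solve_ivp

def closed_loop_rhs(t, z, n, m, V, f, G, H, c):
    """Stacked right-hand side of (EDM), (3), and (4), with z = (x, y, q)."""
    x, y, q = z[:n], z[n:n + m], z[n + m:]
    r = H(y, x, q)      # rewards offered by the policy maker
    p = r - c           # perceived payoffs, as in (1)
    return np.concatenate([V(x, p), f(y, x), G(y, x, q)])

# Example usage (V, f, G, H, c, and the initial condition z0 are problem-specific):
# sol = solve_ivp(closed_loop_rhs, (0.0, 300.0), z0, args=(n, m, V, f, G, H, c))
```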

Definition 1

A learning rule $\tau$ is said to satisfy the positive correlation (PC) condition if the following holds: for all $(x,p) \in \mathbb{X} \times \mathbb{R}^n$,

$V(x,p) \neq 0 \;\Rightarrow\; p^\top V(x,p) > 0$.
Definition 2

A learning rule $\tau$ is Nash stationary (NS) if, given the best response map $\mathscr{M}: \mathbb{R}^n \rightarrow 2^{\mathbb{X}}$, where

$\mathscr{M}(p) := \operatorname*{arg\,max}_{x \in \mathbb{X}} \; p^\top x, \quad p \in \mathbb{R}^n$,

the following holds:

$V(x,p) = 0 \;\Leftrightarrow\; x \in \mathscr{M}(p), \quad p \in \mathbb{R}^n$.
Definition 3

An EDM is $\delta$-passive if there exist (i) a differentiable function $\mathcal{S}: \mathbb{X} \times \mathbb{R}^n \rightarrow \mathbb{R}_{\geq 0}$ and (ii) a Lipschitz continuous function $\mathcal{P}: \mathbb{X} \times \mathbb{R}^n \rightarrow \mathbb{R}_{\geq 0}$, which satisfy the following inequality for all $x$, $p$, and $u$ in $\mathbb{X}$, $\mathbb{R}^n$, and $\mathbb{R}^n$, respectively:

$\frac{\partial \mathcal{S}}{\partial x}(x,p) V(x,p) + \frac{\partial \mathcal{S}}{\partial p}(x,p) u \leq -\mathcal{P}(x,p) + u^\top V(x,p)$,   (5)

where $\mathcal{S}$ and $\mathcal{P}$ must also satisfy the equivalences below:

$\mathcal{S}(x,p) = 0 \;\Leftrightarrow\; V(x,p) = 0$,
$\mathcal{P}(x,p) = 0 \;\Leftrightarrow\; V(x,p) = 0$.

Since the EDM is determined by the learning rule, we say that the learning rule is $\delta$-passive if the resulting EDM is $\delta$-passive. Two well-known classes of learning rules that satisfy the PC and NS conditions and lead to $\delta$-passive evolutionary dynamics are the separable excess payoff target and the impartial pairwise comparison learning rules.

Example 1

A learning rule $\tau$ is said to be of the separable excess payoff target type [23] if, for each $j$ in $[n]$, there is some $\rho_j: \mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$ such that

$\tau_{ij}(x,p) = \rho_j(p_j - x^\top p)$ for all $i \in [n]$,

and $\rho_j$ satisfies $\rho_j(v) = 0$ for $v \leq 0$ and $\rho_j(v) > 0$ for $v > 0$.
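A standard instance of this class is the Brown-von Neumann-Nash (BNN) protocol, which takes $\rho_j(v) = \max\{v, 0\}$ for every $j$. A minimal Python sketch (the function name is ours):

```python
import numpy as np

def bnn_protocol(x, p):
    """Separable excess payoff target rule with rho_j(v) = max(v, 0) (BNN protocol).

    Returns the n-by-n matrix tau with tau[i, j] = max(p_j - x^T p, 0) for every i.
    """
    excess = np.maximum(p - x @ p, 0.0)   # excess payoff of each target strategy j
    return np.tile(excess, (len(p), 1))   # the rate does not depend on the current strategy i
```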

Example 2

A learning rule is said to be of the impartial pairwise comparison type [24] if, for each $j$ in $[n]$, there is some $\rho_j: \mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$ such that

$\tau_{ij}(x,p) = \rho_j(p_j - p_i)$ for all $i \in [n]$,

and $\rho_j$ satisfies $\rho_j(v) = 0$ for $v \leq 0$ and $\rho_j(v) > 0$ for $v > 0$.
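The Smith protocol, with $\rho_j(v) = \max\{v, 0\}$ for every $j$, is the canonical impartial pairwise comparison rule. A minimal sketch, which can be passed as the `tau` argument of the EDM vector field sketched earlier:

```python
import numpy as np

def smith_protocol(x, p):
    """Impartial pairwise comparison rule with rho_j(v) = max(v, 0) (Smith protocol).

    Returns the n-by-n matrix tau with tau[i, j] = max(p_j - p_i, 0).
    """
    return np.maximum(p[None, :] - p[:, None], 0.0)
```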

Lastly, we introduce a lemma that will be useful for proving our main result (Theorem 1) in the following section.

Lemma 1

For any fixed $q \in \mathbb{R}^n$ and $\bar{x} \in \mathscr{M}(q)$, the only vector $x \in \mathbb{X}$ that satisfies

$x \in \mathscr{M}(\bar{x} - x + q)$   (6)

is $x = \bar{x}$.

Proof.

Since $\bar{x} \in \mathscr{M}(q)$, it is a solution to (6). To see that no other solution exists, rewrite (6) as

$(y - x)^\top (x - \bar{x} - q) \geq 0 \quad \forall y \in \mathbb{X}$.

Define $\phi(x) = \|x - \bar{x} - q\|_2^2$, and note from the inequality above that

$(y - x)^\top \nabla \phi(x) \geq 0 \quad \forall y \in \mathbb{X}$.

It then follows from the minimum principle that $x \in \operatorname*{arg\,min}_{z \in \mathbb{X}} \phi(z)$. As this is a convex problem with a strictly convex objective, it has a unique solution. Since $\bar{x}$ is a solution as noted above, the lemma follows. ∎

IV Main Result

Our goal is to design a dynamic payoff mechanism given by the maps $G$ and $H$, which not only guarantees the convergence of the population state to some $x^* \in \mathbb{X}$ selected by the policy maker, but also gives bounds on the ES state $y(t)$ and the instantaneous cost $r(t)$ incurred by the policy maker. Our convergence result assumes that the population adopts a learning rule that is $\delta$-passive, NS, and PC, along with some conditions on the ES in (3). For the incentive design, we do not need to know the learning rule used by the agents as long as it satisfies the properties above.

Although our assumptions are similar to those in previous works, we do not require the assumption [5, Eq. (13)] on the storage function associated with the learning rule; instead, we assume that the learning rule is PC, which is simpler to check.

Our approach to designing the dynamic payoff mechanism leverages the $\delta$-passivity of the EDM, which yields a Lyapunov function for the overall system. The Lyapunov function is used to bound $(y,x)(t)$ based on the initial condition $(y,x)(0)$. Moreover, the maps $G$ and $H$, combined with the bound on $(y,x)(t)$, also enable us to bound the instantaneous rewards provided to the agents, as discussed in §V.

Theorem 1

Consider a payoff vector $p^* \in \mathbb{R}^n$, a population state $x^* \in \mathscr{M}(p^*)$, and positive design parameters $k_1$, $k_2$, and $k_3$. Suppose that the exogenous system (3) satisfies the following:

(i) There is a nonnegative continuously differentiable function $\mathcal{U}: \mathbb{Y} \times \mathbb{X} \rightarrow \mathbb{R}_{\geq 0}$ such that, for any $x \in \mathbb{X}$ and $y \in \mathbb{Y}$, $\frac{\partial \mathcal{U}}{\partial y}(y;x) f(y;x) \leq 0$.

(ii) For any $\alpha \in \mathbb{R}_{\geq 0}$, the set $\{(y,x) \in \mathbb{Y} \times \mathbb{X} \mid \mathcal{U}(y;x) \leq \alpha\}$ is compact.

(iii) For every $x \in \mathbb{X}$, the set $\{y^*(x)\}$ is the largest invariant subset of $\{y \in \mathbb{Y} \mid \frac{\partial \mathcal{U}}{\partial y}(y;x) f(y;x) = 0\}$.

(iv) The function $\mathcal{U}$ satisfies $\frac{\partial \mathcal{U}}{\partial x}(y^*(x);x) = \mathbf{0}$ for every $x \in \mathbb{X}$.

In addition, assume that

(v) the learning rule $\tau$ is Nash stationary, $\delta$-passive, and positively correlated.

Then, the dynamic payoff mechanism given by

$G(y,x,q) = -k_1 \nabla_x \mathcal{U}(y;x) - k_2 (x - x^*) - k_3 (q - p^*)$,   (7)
$H(y,x,q) = c + q$,

guarantees that, for any initial condition $(y,x)(0) \in \mathbb{Y} \times \mathbb{X}$ and $q_0$ in $\mathbb{R}^n$, we have $(y,x,q)(t) \xrightarrow{t \rightarrow \infty} (y^*(x^*), x^*, p^*)$.
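The mechanism (7) is simple to implement once $\nabla_x \mathcal{U}$ is available. A minimal Python sketch of the two maps, with function and argument names of our choosing:

```python
import numpy as np

def payoff_mechanism(y, x, q, grad_x_U, x_star, p_star, c, k1, k2, k3):
    """Dynamic payoff mechanism (7): returns (q_dot, r) = (G(y,x,q), H(y,x,q)).

    grad_x_U : callable returning the gradient of U(y; x) with respect to x
    """
    q_dot = -k1 * grad_x_U(y, x) - k2 * (x - x_star) - k3 * (q - p_star)   # G(y, x, q)
    r = c + q                                                              # H(y, x, q)
    return q_dot, r
```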

Proof.

Define the following candidate Lyapunov function for the overall system comprised of (2), (3), and (4).

$\mathcal{L}(y,x,q) := k_1 \mathcal{U}(y;x) + k_3 \left( \max_i(p_i^*) - x^\top p^* \right) + \frac{k_2}{2} \|x - x^*\|_2^2 + \mathcal{S}(x,q)$,   (8)

where $\mathcal{S}$ is the storage function of the EDM. Due to our selection of $H$, we have $p = q$.

We denote by $\dot{\mathcal{L}}(y,x,q)$ the directional derivative of the function (8) along the vector field defined by (2), (3), and (4) at the point $(y,x,q)$:

$\dot{\mathcal{L}}(y,x,q) = k_1 \left( \nabla_y \mathcal{U}(y;x)^\top f(y;x) + \nabla_x \mathcal{U}(y;x)^\top V(x,q) \right) - k_3 V(x,q)^\top p^* + k_2 (x - x^*)^\top V(x,q) + \nabla_x \mathcal{S}(x,q)^\top V(x,q) + \nabla_p \mathcal{S}(x,q)^\top G(y,x,q)$
$\leq k_1 \nabla_y \mathcal{U}(y;x)^\top f(y;x) + V(x,q)^\top G(y,x,q) + \left( k_1 \nabla_x \mathcal{U}(y;x) + k_2 (x - x^*) - k_3 p^* \right)^\top V(x,q) - \mathcal{P}(x,q)$
$= k_1 \nabla_y \mathcal{U}(y;x)^\top f(y;x) - k_3 q^\top V(x,q) - \mathcal{P}(x,q)$,

where the inequality follows from (5) with $u = G(y,x,q)$. Conditions (i) and (v) imply that $\dot{\mathcal{L}}$ is nonpositive, and $\mathcal{L}$ is a nonstrict Lyapunov function.

By condition (ii) and $\dot{\mathcal{L}} \leq 0$, the set $\{(y,x)(t) \mid t \geq 0\}$ is bounded. Since (7) is a bounded-input bounded-output linear system with state $q$ and bounded input $-k_1 \nabla_x \mathcal{U}(y;x) - k_2 (x - x^*) + k_3 p^*$, which is a continuous function of the trajectory $\{(y,x)(t) \mid t \geq 0\}$, we obtain that $\{(y,x,q)(t) \mid t \geq 0\}$ is bounded.

Let $\mathbb{E}$ denote the largest invariant subset within $\{(y,x,q) \mid \dot{\mathcal{L}}(y,x,q) = 0\}$. From the LaSalle-Krasovskii invariance principle, $(y,x,q)(t)$ converges to the $\omega$-limit set $L^+ \subset \mathbb{E}$, which is compact and invariant with respect to (2), (3), and (4) [25, Lemma 4.1].

For any $(y,x,q)(0) \in \mathbb{E}$, the trajectory satisfies $x(t) = x(0)$ and $y(t) = y^*(x(0))$ for all $t \geq 0$: indeed, $\dot{\mathcal{L}}(y(t),x(t),q(t)) = 0$ for all $t \geq 0$ implies $\dot{x}(t) = 0$ for all $t \geq 0$, and condition (iii) of Theorem 1 then gives $y(t) = y^*(x(0))$. This, together with condition (iv), leads to

$\dot{q}(t) = -k_2 (x(0) - x^*) - k_3 (q(t) - p^*)$,

so that $q(t) \xrightarrow{t \rightarrow \infty} \frac{k_2}{k_3}(x^* - x(0)) + p^*$ and

$(y,x,q)(t) \xrightarrow{t \rightarrow \infty} \left( y^*(x(0)), x(0), \frac{k_2}{k_3}(x^* - x(0)) + p^* \right)$.

Since the learning rule is assumed NS, we must have

$x(0) \in \mathscr{M}\left( x^* - x(0) + \frac{k_3}{k_2} p^* \right)$,

since multiplying the payoff vector by the positive constant $k_2/k_3$ does not change the best response set. By Lemma 1 we have $x(0) = x^*$ and $L^+ = \{(y^*(x^*), x^*, p^*)\}$. ∎

V Bounds

As proven in Theorem 1, if the ES in (3) satisfies (i)-(iv) and the learning rule satisfies (v), the payoff mechanism described in (7) can be used to stabilize the ES and the population at a desired equilibrium. In addition, we are able to determine bounds on the state and on the rewards $r(t)$ offered to the population.

If $p(t) = \mathbf{0}$ at some $t$, the storage function of the EDM is equal to zero at $t$, as any $x(t) \in \mathbb{X}$ is a best response. In particular, if $p(0) = p^* = \mathbf{0}$, both $\mathcal{S}(x(0), p(0))$ and $\max_i(p_i^*) - x(0)^\top p^*$ are equal to zero, and we have

$\mathcal{L}_0 := \mathcal{L}(y(0), x(0), \mathbf{0}) = k_1 \mathcal{U}(y(0); x(0)) + \frac{k_2}{2} \|x(0) - x^*\|_2^2$.

Furthermore, because $\mathcal{L}$ is nonincreasing along trajectories and $\mathcal{S}(x, \mathbf{0}) = 0$ for any $x \in \mathbb{X}$, we obtain that, for any $t \geq 0$,

$\mathcal{L}(y(t), x(t), \mathbf{0}) \leq \mathcal{L}(y(t), x(t), p(t)) \leq \mathcal{L}(y(0), x(0), \mathbf{0})$.   (9)

This in turn can be used to bound not only $y(t)$ and $x(t)$ but also the policy maker's instantaneous cost $\bar{c}(t) := x(t)^\top r(t)$: for any $t \geq 0$,

$\bar{c}(t) \leq \max\{ g(y,x) \mid x \in \mathbb{X},\; y \in \mathbb{Y},\; \mathcal{L}(y,x,\mathbf{0}) \leq \mathcal{L}_0 \}$,   (10)
$g(y,x) := \left\| c + G(y,x,\mathbf{0})/k_3 \right\|_\infty$,

where the terms $G(y,x,\mathbf{0})$, $\mathcal{L}(y,x,\mathbf{0})$, and $\mathcal{L}_0$ are affected by the choice of the parameters $k_1$, $k_2$, and $k_3$.
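The right-hand side of (10) is a maximization over the sublevel set $\{\mathcal{L}(y,x,\mathbf{0}) \leq \mathcal{L}_0\}$, which is generally nonconvex. A crude but simple way to estimate it is by sampling, as in the Python sketch below; the names are ours, and the result is only a lower estimate of the true maximum (a deterministic grid or a constrained solver can tighten it).

```python
import numpy as np

def estimate_cost_bound(L, g, L0, sample_Y, n, num_samples=100_000, seed=0):
    """Sampling estimate of max{ g(y, x) : x in the simplex, y in Y, L(y, x, 0) <= L0 } in (10).

    L        : callable L(y, x, q), the Lyapunov function (8)
    g        : callable g(y, x) = || c + G(y, x, 0)/k3 ||_inf
    sample_Y : callable drawing one sample y from the set Y
    """
    rng = np.random.default_rng(seed)
    best = -np.inf
    for _ in range(num_samples):
        x = rng.dirichlet(np.ones(n))    # sample from the simplex
        y = sample_Y(rng)
        if L(y, x, np.zeros(n)) <= L0:   # keep only points in the sublevel set
            best = max(best, g(y, x))
    return best
```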

VI Examples

In §VI-A and §VI-B we present two systems that fit our framework as the ES in (3). They meet conditions (i)-(iv) of Theorem 1 and, when coupled to a population that employs a learning rule satisfying condition (v), the dynamic payoff mechanism described in Theorem 1 stabilizes the overall system at a desired equilibrium $(y^*(x^*), x^*)$. In §VI-C we consider a modification of [6] to exemplify how our theorem can be leveraged for design: we first select a target equilibrium population state $x^*$ that minimizes the disease transmission rate subject to a budget constraint, and choose the parameters $k_1$, $k_2$, and $k_3$ so that the peak size of the infected population is guaranteed to remain below a given threshold. We then present simulation results using several different learning rules.

Our examples focus on systems that are naturally coupled to a population of agents and are affected by the strategic choices of the agents. The system considered in [5, 6] is a compartmental model of an epidemic disease, and the population state affects the transmission rate of the disease. Similarly, [7] considers an epidemic model with two interacting populations, with the transmission rates of each population being affected by its agents' current strategies. Korobeinikov [8] studied a host-parasite model and, by finding a nonstrict Lyapunov function, proved convergence to the unique equilibrium of the model. We modify this model so that some of its parameters change according to the population state. We choose a desirable equilibrium $(y^*(x^*), x^*)$ of this modified model to reduce the number of parasites at the equilibrium, and then use (7) to stabilize the equilibrium.

VI-A Leslie-Gower predator-prey model

Korobeinikov [8] studies the Leslie-Gower model that captures the interaction of populations of hosts and parasites. Let $O(t)$ and $P(t)$ denote the number of hosts and parasites, respectively, at time $t$. The population sizes evolve according to the following differential equations:

$\dot{O}(t) = (z_1 - a_1 P(t) - b_1 O(t)) O(t)$,   (11a)
$\dot{P}(t) = (z_2 - a_2 P(t)/O(t)) P(t)$,   (11b)

where $z_1$, $z_2$, $a_1$, and $a_2$ are positive, and $b_1$ is nonnegative. The intrinsic population growth rates of the hosts and parasites are $z_1$ and $z_2$, respectively. The parameter $b_1$ relates to a growth limit on the hosts without parasites, while $a_1$ relates to the decrease of hosts due to the parasites, and $a_2$ relates to a population limit on the parasites due to the number of hosts. The unique co-existing equilibrium, where $O^*, P^* > 0$, is

$O^* = \dfrac{z_1}{a_1 z_2 + a_2 b_1}$ and $P^* = \dfrac{z_1 z_2}{a_1 z_2 + a_2 b_1}$.

The following is a nonstrict Lyapunov function of (11) on $(0,\infty)^2$:

$U(O,P) := \tilde{\mathcal{U}}((O,P); (O^*,P^*))$,   (12)

where

$\tilde{\mathcal{U}}(v;w) := \log\left(\dfrac{v_1}{w_1}\right) + \dfrac{w_1}{v_1} + \dfrac{a_1 w_1}{a_2} \left( \log\left(\dfrac{v_2}{w_2}\right) + \dfrac{w_2}{v_2} \right)$,

with $v, w \in \mathbb{R}_{>0}^2$. The directional derivative of (12) along the vector field defined by (11) is

$\dot{U}(O,P) = -\dfrac{a_1}{P}(P - P^*)^2 - \dfrac{b_1}{O}(O - O^*)^2$,

with positive $a_1$ and nonnegative $b_1$. This confirms that (12) is a Lyapunov function of (11).

Suppose that $z_1$ and $z_2$ are functions of the population state, i.e., $z_1, z_2: \mathbb{X} \rightarrow \mathbb{R}_{>0}$, and $b_1: \mathbb{X} \rightarrow \mathbb{R}_{\geq 0}$. Such scenarios could arise when the agents are the farmers who breed and raise livestock, which are the hosts affected by parasites. The strategic choices of the agents could include, for example, how many animals to breed or which measures to take to reduce the spread of parasites, e.g., diagnosing, isolating, and treating infected hosts. In this case, the Leslie-Gower model as the ES in (3) is described by

$f(y;x) := \begin{bmatrix} (z_1(x) - a_1 y_2 - b_1(x) y_1)\, y_1 \\ (z_2(x) - a_2 y_2 / y_1)\, y_2 \end{bmatrix}$

with the equilibrium

$y^*(x) := \begin{bmatrix} \frac{z_1(x)}{a_1 z_2(x) + a_2 b_1(x)} \\ \frac{z_1(x) z_2(x)}{a_1 z_2(x) + a_2 b_1(x)} \end{bmatrix} \in \mathbb{Y} = (0,\infty)^2$.

Suppose $z_1$, $z_2$, and $b_1$ are continuously differentiable so that the equilibrium map $y^*$ is also continuously differentiable.
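For reference, the modified Leslie-Gower vector field $f(y;x)$ and its equilibrium map $y^*(x)$ above translate directly into code. A minimal Python sketch, with $z_1$, $z_2$, and $b_1$ supplied as functions of the population state and with function names of our choosing:

```python
import numpy as np

def leslie_gower_f(y, x, a1, a2, z1, z2, b1):
    """Leslie-Gower ES vector field f(y; x) with x-dependent parameters z1, z2, b1."""
    O, P = y  # hosts and parasites, both positive
    return np.array([(z1(x) - a1 * P - b1(x) * O) * O,
                     (z2(x) - a2 * P / O) * P])

def leslie_gower_equilibrium(x, a1, a2, z1, z2, b1):
    """Co-existence equilibrium y*(x) of the modified Leslie-Gower model."""
    denom = a1 * z2(x) + a2 * b1(x)
    return np.array([z1(x) / denom, z1(x) * z2(x) / denom])
```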

Define

$\mathcal{U}(y;x) := \tilde{\mathcal{U}}(y; y^*(x)) - \dfrac{a_1}{a_2} y_1^*(x)$.

Note that (a) $\mathcal{U}(y;x) > 0$ for any $y \neq y^*(x)$, and (b) for any $x \in \mathbb{X}$, if $x(t) \equiv x$, then $\dot{\mathcal{U}}(y(t); x(t)) = \dot{U}(y_1(t), y_2(t))$. Thus, it satisfies conditions (i) and (iii). As a continuous function, the sublevel sets of $\mathcal{U}$ are closed, and we can verify that they are also bounded, so condition (ii) is satisfied. Lastly, $\mathcal{U}$ satisfies (iv):

$\dfrac{\partial \mathcal{U}}{\partial x}(y^*(x); x) = \left( \dfrac{\partial \tilde{\mathcal{U}}}{\partial w}(y^*(x); y^*(x)) - \dfrac{a_1}{a_2} \begin{bmatrix} 1 \\ 0 \end{bmatrix}^\top \right) \dfrac{\partial y^*}{\partial x}(x) = \mathbf{0}$,

because

$\dfrac{\partial \tilde{\mathcal{U}}}{\partial w}(y^*(x); y^*(x)) = \dfrac{a_1}{a_2} \begin{bmatrix} 1 \\ 0 \end{bmatrix}^\top$.

VI-B Epidemic Population Games (EPG)

Previous studies on EPG [5, 6, 7] examined epidemic compartmental models coupled to a population, with the epidemic model being the ES. Here we show that the model used in [6] satisfies the conditions in Theorem 1, even for a more general dependency of the transmission rates on the agents’ strategies than that considered in [6]. The epidemic model satisfies conditions (i)-(iv) and, when coupled to a population employing a learning rule that satisfies (v), we can use Theorem 1 to drive them to a desirable equilibrium. A similar analysis shows that the epidemic models in [5, 7] also satisfy conditions (i)-(iv).

We first briefly describe the normalized susceptible-infectious-recovered-susceptible (SIRS) model, which is the ES we aim to stabilize. Let $I(t)$ and $R(t)$ denote the proportions of infected and recovered agents, respectively, in the population at time $t$. Suppose that $N(t)$ is the population size at time $t$. The population size changes according to $\dot{N}(t) = (g - \delta I(t)) N(t)$, where $g := \theta - \zeta$ is the difference between the birth rate $\theta$ and the natural death rate $\zeta$, and $\delta > 0$ is the disease death rate. The disease recovery rate and the rate at which a recovered individual becomes susceptible again due to waning immunity are denoted by $\gamma$ and $\psi$, respectively.

Since the model is normalized, $I(t) \in [0,1]$ and $R(t) \in [0, 1 - I(t)]$ at any time $t$. The ES state $y(t)$ is given by $(I(t), R(t))$ and evolves according to

$\dot{I}(t) = (\mathcal{B}(t) S(t) + \delta I(t) - \sigma) I(t)$,   (13a)
$\dot{R}(t) = \gamma I(t) - \omega R(t) + \delta R(t) I(t)$,   (13b)

where $\mathcal{B}(t)$ is the average transmission rate at time $t$, $S(t) := 1 - I(t) - R(t)$ is the proportion of susceptible agents, $\bar{\sigma} := \gamma + \zeta + \delta$, $\sigma := g + \bar{\sigma}$, $\bar{\omega} := \psi + \zeta$, $\omega := g + \bar{\omega}$, and $\bar{\sigma}^{-1}$ is the mean infectious period of an affected individual (till recovery or death). The adopted time unit is one day, and newborns are assumed susceptible. As in [6], we assume $\delta > 0$ but moderate, such that $\delta < \min\{\omega, \gamma\}$. Also, $\mathcal{B}(t) > \sigma$ for all $t \geq 0$, so that there is a unique endemic equilibrium.

For fixed $\mathcal{B}(t) \equiv B > \sigma$, the endemic equilibrium of (13) is given by the following differentiable functions of $B$:

$I_B^* := \dfrac{b_B - \sqrt{\Delta}}{2\delta(B - \delta)}$, and
$R_B^* := (1 - \sigma/B) - (1 - \delta/B) I_B^*$,

where $b_B := \gamma B + \omega(B - \delta) + \delta(B - \sigma)$, and the discriminant is $\Delta := b_B^2 - 4\delta\omega(B - \delta)(B - \sigma)$.
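As a sanity check, the endemic equilibrium can be computed directly from these expressions (with the discriminant as written above, reconstructed from the quadratic that $I_B^*$ solves). A minimal Python sketch:

```python
import numpy as np

def endemic_equilibrium(B, gamma, omega, sigma, delta):
    """Endemic equilibrium (I_B*, R_B*) of the SIRS model (13) for a fixed B > sigma."""
    b_B = gamma * B + omega * (B - delta) + delta * (B - sigma)
    Delta = b_B**2 - 4 * delta * omega * (B - delta) * (B - sigma)
    I_star = (b_B - np.sqrt(Delta)) / (2 * delta * (B - delta))
    R_star = (1 - sigma / B) - (1 - delta / B) * I_star
    return I_star, R_star
```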

Now, suppose $\mathcal{B}(t) \equiv B(x(t))$, where $B: \mathbb{X} \rightarrow (\sigma, \infty)$ is a continuously differentiable function, and let $\mathbb{Y} := (0,1] \times [0,1]$ and $y^*(x) := (I_{B(x)}^*, R_{B(x)}^*)$. For a fixed $x(t) \equiv x$, the following is a strict Lyapunov function for (13) on $\mathbb{Y}$:

$\mathcal{U}(I,R;x) := \widetilde{\mathcal{U}}(I,R;B(x))$,   (15)

where

$\widetilde{\mathcal{U}}(I,R;B) := (I - I_B^*) + I_B^* \ln\dfrac{I_B^*}{I} + \dfrac{a_B}{2}(R - R_B^*)^2$,

and $a_B := B/(\gamma + \delta R_B^*)$. The derivative of (15) along trajectories is

$\dfrac{d}{dt} \widetilde{\mathcal{U}}(I(t),R(t);B) = -(B - \delta)(I(t) - I_B^*)^2 - a_B(\omega - \delta I(t))(R(t) - R_B^*)^2$,

and it is negative for any $(I(t),R(t)) \in \mathbb{Y} \setminus \{y^*(x)\}$.

Since (15) is a strict Lyapunov function when $x(t)$ is constant, it satisfies conditions (i) and (iii). As $\mathcal{U}$ is a continuous function, its sublevel sets are closed. Moreover, because the sublevel sets are also contained in $\mathbb{Y} \times \mathbb{X}$, which is a bounded set, they are also bounded and, hence, condition (ii) holds. Lastly, we can verify condition (iv) as follows:

$\dfrac{\partial \mathcal{U}}{\partial x}(I_{B(x)}^*, R_{B(x)}^*; x) = \dfrac{\partial \widetilde{\mathcal{U}}}{\partial B}(I_{B(x)}^*, R_{B(x)}^*; B(x)) \dfrac{\partial B}{\partial x}(x) = 0$,

where the second equality follows from $\dfrac{\partial \widetilde{\mathcal{U}}}{\partial B}(I_{B(x)}^*, R_{B(x)}^*; B(x)) = 0$.

VI-C Designing an Intervention Policy for Epidemics with Nonlinear Infection Rate

We consider a modification of the EPG studied in [6], which was described in §VI-B, as an application of Theorem 1 to a dynamic payoff design problem. We aim to mitigate an epidemic outbreak and reduce the endemic level of infected agents while guaranteeing that the long-term cost of the policy maker does not exceed some available budget $c^*$.

The study in [6] considers the average transmission rate that depends linearly on the population state, with $\mathcal{B}(t) \equiv \beta^\top x(t)$, where $\beta \in \mathbb{R}_{>0}^n$. Such dependency on $x(t)$ is consistent with the choices of the susceptible agents determining the likelihood of contracting the disease when exposed, e.g., choosing to wear masks or getting vaccinated. In their model, the proportion of susceptible agents following the $i$-th strategy at time $t$ is $x_i(t) S(t)$, and for those agents the rate of new infections is equal to $\beta_i x_i(t) S(t) I(t)$.

Suppose that we allow the average transmission rate to depend on both the choices of the susceptible agents and those of the infected agents. For example, an infected agent that takes no preventive measures is more likely to transmit the disease than another infected agent that does take preventive measures. In this case, the rate of new infections among susceptible agents following strategy $i$ due to the exposure to infected agents adopting strategy $j$ would be $\beta_{ij} x_i(t) S(t) x_j(t) I(t)$. Therefore, the average transmission rate is given by

$\mathcal{B}(t) \equiv B(x(t)) := x(t)^\top Q x(t), \quad t \geq 0$,

with $Q \in \mathbb{R}^{n \times n}$ being a positive matrix with the elements $Q_{ij} = \beta_{ij}$. We assume that the disease is too infectious to be eradicated, so that $B(x) \geq \sigma$ for all $x \in \mathbb{X}$.

We aim to select a target population equilibrium $x^*$ that minimizes the transmission rate subject to a budget constraint. In order to find $x^*$ we solve

$x^* \in \operatorname*{arg\,min}_{z \in \mathbb{X}} B(z) \;\text{ s.t. }\; c^\top z - \min_i c_i \leq c^*$,

where $c$ is the vector of intrinsic costs of the strategies, and $c^*$ is the long-term budget available to the policy maker.
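This target-selection step is a quadratic program over the simplex with one linear budget constraint; since $Q$ need not be symmetric positive semidefinite, a generic solver may only return a local minimizer. A minimal Python sketch using SciPy, with a function name of our choosing:

```python
import numpy as np
from scipy.optimize import minimize

def select_target_state(Q, c, c_star):
    """Find x* minimizing B(z) = z^T Q z over the simplex, subject to
    c^T z - min_i c_i <= c_star (the long-term budget constraint)."""
    Q, c = np.asarray(Q, float), np.asarray(c, float)
    n = len(c)
    cons = [{"type": "eq",   "fun": lambda z: np.sum(z) - 1.0},            # z on the simplex
            {"type": "ineq", "fun": lambda z: c_star + np.min(c) - c @ z}] # budget constraint
    res = minimize(lambda z: z @ Q @ z, np.ones(n) / n,
                   bounds=[(0.0, 1.0)] * n, constraints=cons, method="SLSQP")
    return res.x
```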

After determining $x^*$, we select $k_1, k_2, k_3 > 0$ and use the payoff mechanism described in Theorem 1 to lead the population and epidemic states to the selected equilibrium.

Example 3

Consider a disease with parameters $\delta = 0.005$, $\zeta = 0$, $\theta = 0.0002$, $\gamma = 0.1$ (mean recovery period $\sim 10$ days), and $\bar{\omega} = 0.011$ (mean immunity period $\sim 91$ days). Agents have three available strategies with

$Q = \begin{bmatrix} 0.13 & 0.18 & 0.2 \\ 0.16 & 0.22 & 0.23 \\ 0.17 & 0.28 & 0.5 \end{bmatrix}$, and $c = \begin{bmatrix} 0.2 \\ 0.1 \\ 0 \end{bmatrix}$.

The initial conditions are $I(0) = 0.019$ and $R(0) = 0.172$ for the epidemic, $x(0) = (1\; 0\; 0)^\top$ for the EDM, and $q(0) = (0\; 0\; 0)^\top$ for the payoff dynamics. The long-term budget of the policy maker is $c^* = 0.1$, and we select $p^* = \mathbf{0}$, which yields $x^* \approx (1\; 10\; 1)/12$ and $y^*(x^*) \approx (5.1\%, 46.9\%)$. Our goal is to design $G$ and $H$ so that $I(t) \leq 10\%$ for all $t \geq 0$.

For simplicity, we select $q_0 = -c$, $p^* = p(0) = \mathbf{0}$, and $k_3 = 1$, as $k_3$ does not affect the bound on $I(t)$ if $p^* = \mathbf{0}$. Based on (9), we look for values of $k_1$ and $k_2$ that meet our requirement that $I(t) \leq 10\%$ for all $t \geq 0$, by solving

$I_{\text{max}}(k_1, k_2) := \max_{I,R,x} \; I$
s.t. $x \in \mathbb{X}$,
$I, R \geq 0$,
$I + R \leq 1$,
$\mathcal{L}(y, x, \mathbf{0}) \leq \mathcal{L}(y(0), x(0), \mathbf{0})$,   (16)

where $k_1$ and $k_2$ affect constraint (16).
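Numerically, $I_{\text{max}}(k_1, k_2)$ can be estimated with a constrained solver over the decision vector $(I, R, x)$. A minimal Python sketch under our own naming; since the problem is nonconvex, a few restarts (or a grid over initial points) help avoid poor local maxima.

```python
import numpy as np
from scipy.optimize import minimize

def I_max(k1, k2, L, L0, n):
    """Estimate I_max(k1, k2): maximize I subject to x in the simplex,
    I, R >= 0, I + R <= 1, and the sublevel-set constraint (16).

    L  : callable L(y, x, q, k1, k2), the Lyapunov function (8) with p* = 0
    L0 : the value L(y(0), x(0), 0)
    """
    cons = [{"type": "ineq", "fun": lambda v: 1.0 - v[0] - v[1]},          # I + R <= 1
            {"type": "eq",   "fun": lambda v: np.sum(v[2:]) - 1.0},        # x on the simplex
            {"type": "ineq",
             "fun": lambda v: L0 - L(v[:2], v[2:], np.zeros(n), k1, k2)}]  # constraint (16)
    v0 = np.concatenate([[0.05, 0.3], np.ones(n) / n])                     # v = (I, R, x)
    res = minimize(lambda v: -v[0], v0, bounds=[(0.0, 1.0)] * (2 + n),
                   constraints=cons, method="SLSQP")
    return -res.fun
```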

We solve the optimization above numerically for several values of $k_1$ and $k_2$, and show the results in Fig. 1. The requirement that $I(t) \leq 10\%$ is met for $k_1, k_2$ in the region on the bottom right of the plot, and we select $k_1 = 2$ and $k_2 = 0.022$. We then use the reward mechanism given by Theorem 1 to guide the population to the desired equilibrium.

The simulation results for several different learning rules that satisfy condition (v) in Theorem 1 are shown in Fig. 2 (see the simulation code at github/jcert/incentive-design-coupled-dynamics for more details on the learning rules that were used). All of them converge and satisfy the requirement that $I(t) \leq 10\%$ for all $t \geq 0$. Had we not used the bound $I_{\text{max}}(k_1, k_2)$ to determine the parameters of the reward mechanism, the bound on the peak of infections might have been violated, as shown in Fig. 3. In both figures we observe that the instantaneous cost $\bar{c}(t)$ converges to $c^*$, which is represented as a dashed black line in the plots.

Figure 1: Bound $I_{\text{max}}(k_1, k_2)$, for the conditions in Example 3, when varying $k_1$ and $k_2$.
Figure 2: Simulation of Example 3, for many different learning rules, using $k_1 = 2$, $k_2 = 0.022$, $k_3 = 1$.
Figure 3: Simulation of Example 3, for many different learning rules, using $k_1 = k_2 = k_3 = 1$.

VII Conclusion

We studied a large population of learning agents whose strategic choices influence the dynamics of an ES we seek to stabilize at a desired equilibrium. Our framework can be used to design a dynamic payoff mechanism that guarantees the convergence of both the population and the ES (Theorem 1). When the conditions on the ES stated in the theorem are met, the designed incentives can stabilize more general systems than previously considered.

We also presented example systems that satisfy the conditions of our main result (§VI-A and §VI-B) and applied our framework to design incentives that mitigate an epidemic with a nonlinear infection rate subject to a long-term budget constraint (§VI-C). Unlike the incentives designed in the previous studies of EPG, our payoff mechanism is guaranteed to have a bound on the instantaneous reward offered to the population.

In future research, we plan to extend the results by relaxing the assumptions on the ES so that they need only hold locally, and by allowing the EDM to depend on the ES states. Another direction we are interested in pursuing is to examine design problems where the payoffs of certain strategies can only be partially designed.

References

  • [1] M. Arcak and N. C. Martins, “Dissipativity Tools for Convergence to Nash Equilibria in Population Games,” IEEE Trans. Control Netw. Syst., vol. 8, no. 1, pp. 39–50, Mar. 2021.
  • [2] A. Pantoja and N. Quijano, “A population dynamics approach for the dispatch of distributed generators,” IEEE Transactions on Industrial Electronics, vol. 58, no. 10, pp. 4559–4567, Oct. 2011.
  • [3] S. Park, Y. D. Zhong, and N. E. Leonard, “Multi-Robot Task Allocation Games in Dynamically Changing Environments,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   Xi’an, China: IEEE, May 2021, pp. 8678–8684. [Online]. Available: https://ieeexplore.ieee.org/document/9561809/
  • [4] G. Obando, A. Pantoja, and N. Quijano, “Building temperature control based on population dynamics,” IEEE Transactions on Control Systems Technology, vol. 22, no. 1, pp. 404–412, Jan. 2014.
  • [5] N. C. Martins, J. Certório, and R. J. La, “Epidemic population games and evolutionary dynamics,” Automatica, vol. 153, p. 111016, Jul. 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0005109823001711
  • [6] J. Certório, N. C. Martins, and R. J. La, “Epidemic Population Games With Nonnegligible Disease Death Rate,” IEEE Control Systems Letters, vol. 6, pp. 3229–3234, 2022.
  • [7] J. Certório, R. J. La, and N. C. Martins, “Epidemic Population Games for Policy Design: Two Populations with Viral Reservoir Case Study,” in Proc. IEEE conf. decis. control (CDC), 2023, pp. 7667–7674. [Online]. Available: https://ieeexplore.ieee.org/document/10383665
  • [8] A. Korobeinikov, “A Lyapunov function for Leslie-Gower predator-prey models,” Appl. Math. Lett., vol. 14, no. 6, pp. 697–699, Aug. 2001. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S089396590180029X
  • [9] W. H. Sandholm, “Potential games with continuous player sets,” Journal of economic theory, vol. 97, pp. 81–108, 2001.
  • [10] J. Hofbauer and W. H. Sandholm, “Stable games and their dynamics,” Journal of Economic Theory, vol. 144, no. 4, pp. 1665–1693.e4, Jul. 2009.
  • [11] S. Kara, N. C. Martins, and M. Arcak, “Population games with erlang clocks: Convergence to nash equilibria for pairwise comparison dynamics,” in Proc. IEEE conf. decis. control (CDC), 2022, pp. 7688–7695.
  • [12] S. Kara and N. C. Martins, “Excess Payoff Evolutionary Dynamics With Strategy-Dependent Revision Rates: Convergence to Nash Equilibria for Potential Games,” IEEE Control Systems Letters, vol. 7, pp. 1009–1014, 2023.
  • [13] W. H. Sandholm, Population games and evolutionary dynamics.   MIT Press, 2010.
  • [14] N. Quijano, C. Ocampo-Martinez, J. Barreiro-Gomez, G. Obando, A. Pantoja, and E. Mojica-Nava, “The Role of Population Games and Evolutionary Dynamics in Distributed Control Systems: The Advantages of Evolutionary Game Theory,” IEEE Control Syst., vol. 37, no. 1, pp. 70–97, 2017. [Online]. Available: https://ieeexplore.ieee.org/document/7823106/
  • [15] M. J. Fox and J. S. Shamma, “Population games, stable games, and passivity,” Games, vol. 4, pp. 561–583, 2013.
  • [16] S. Park, N. C. Martins, and J. S. Shamma, “Payoff dynamics model and evolutionary dynamics model: Feedback and convergence to equilibria,” 2020.
  • [17] K. S. Schweidel and M. Arcak, “Compositional Analysis of Interconnected Systems Using Delta Dissipativity,” IEEE Control Systems Letters, vol. 6, pp. 662–667, 2022.
  • [18] M. A. Mabrok and J. S. Shamma, “Passivity analysis of higher order evolutionary dynamics and population games,” in Proc. IEEE conf. decis. control (CDC), 2016, pp. 6129–6134. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/7799211
  • [19] S. Park, J. S. Shamma, and N. C. Martins, “Passivity and evolutionary game dynamics,” in Proc. IEEE conf. decis. control (CDC), 2018, pp. 3553–3560.
  • [20] J. Martinez-Piazuelo, N. Quijano, and C. Ocampo-Martinez, “A Payoff Dynamics Model for Equality-Constrained Population Games,” IEEE Control Systems Letters, vol. 6, pp. 530–535, 2022.
  • [21] J. Martinez-Piazuelo, C. Ocampo-Martinez, and N. Quijano, “On Distributed Nash Equilibrium Seeking in a Class of Contractive Population Games,” IEEE Control Systems Letters, pp. 1–1, 2022.
  • [22] S. Park, J. Certório, N. C. Martins, and R. J. La, “Epidemic population games and perturbed best response dynamics,” 2024.
  • [23] W. H. Sandholm, “Excess payoff dynamics and other well-behaved evolutionary dynamics,” Journal of Economic Theory, vol. 124, no. 2, pp. 149–170, Oct. 2005. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0022053105000554
  • [24] ——, “Pairwise comparison dynamics and evolutionary foundations for Nash equilibrium,” Games, vol. 1, no. 1, pp. 3–17, 2010.
  • [25] H. K. Khalil, Nonlinear systems.   Prentice Hall, 1995.