
Heterogeneous Beliefs and Multi-Population Learning in Network Games

Shuyue Hu Shanghai Artificial Intelligence Laboratory Harold Soh National University of Singapore Georgios Piliouras Singapore University of Technology and Design
Abstract

The effect of population heterogeneity in multi-agent learning is practically relevant but remains far from well understood. Motivated by this, we introduce a model of multi-population learning that allows for heterogeneous beliefs within each population and where agents respond to their beliefs via smooth fictitious play (SFP). We show that the system state — a probability distribution over beliefs — evolves according to a system of partial differential equations. We establish the convergence of SFP to Quantal Response Equilibria in different classes of games capturing both network competition and network coordination. We also prove that beliefs eventually homogenize in all network games. Although the initial belief heterogeneity disappears in the limit, we show that it plays a crucial role in equilibrium selection in the case of coordination games, as it helps select highly desirable equilibria. By contrast, in the case of network competition, the resulting limit behavior is independent of the initialization of beliefs, even when the underlying game has many distinct Nash equilibria.

1 Introduction

Smooth fictitious play (SFP) and its variants are arguably among the most well-studied learning models in AI and game theory [2, 3, 21, 22, 9, 19, 36, 37, 42, 18, 17]. SFP describes a belief-based learning process: agents form beliefs about the play of opponents and update their beliefs based on observations. Informally, an agent's belief can be thought of as reflecting how likely its opponents are to play each strategy. During game play, each agent plays a smoothed best response to its beliefs. Much of the literature on SFP is framed in the context of homogeneous beliefs models where all agents in a given role have the same beliefs. This includes models with one agent in each player role [3, 2, 39] as well as single-population models in which all agents have the same beliefs [21, 22]. SFP is known to converge in large classes of homogeneous beliefs models (e.g., most 2-player games [9, 19, 3]). However, in the context of heterogeneous beliefs, where agents in a population have different beliefs, SFP has been explored to a much lesser extent.

The study of heterogeneous beliefs (or, more broadly, population heterogeneity) is important and practically relevant. From a multi-agent systems perspective, heterogeneous beliefs widely exist in many applications, such as traffic management, online trading, and video game playing. For example, it is natural to expect that public opinions generally diverge on autonomous vehicles and that people hold different beliefs about the behaviors of taxi drivers versus non-professional drivers. From a machine learning perspective, recent empirical advances hint that injecting heterogeneity potentially accelerates population-based training of neural networks and improves learning performance [25, 29, 44]. From a game theory perspective, accounting for heterogeneity of beliefs better explains the results of some human experiments [10, 11].

Heterogeneous beliefs models of SFP are not entirely new. In the pioneering work [12], Fudenberg and Takahashi examine the heterogeneity issue in 2-population settings by appealing to techniques from stochastic approximation theory. This approach, which is typical in the SFP literature, relates the limit behavior of each individual to an ordinary differential equation (ODE) and has yielded significant insights for many homogeneous beliefs models [3, 2, 19, 39]. However, this approach, as also noted by Fudenberg and Takahashi, “does not provide very precise estimates of the effect of the initial condition of the system.” Consider a population of agents, each of which can choose between two pure strategies $s_1$ and $s_2$. Let us imagine two cases: (i) every agent in the population shares the same belief that their opponents play a mixed strategy choosing $s_1$ and $s_2$ with equal probability $0.5$, and (ii) half of the agents believe that their opponents play the pure strategy $s_1$ with certainty and the other half believe that their opponents play the pure strategy $s_2$ with certainty. The stochastic approximation approach would generally treat these two cases equally, providing little information about the heterogeneity in beliefs and its consequential effects on the system evolution. This drives our motivating questions:

How do heterogeneous populations evolve under SFP? How much and under what conditions does the heterogeneity in beliefs affect their long-term behaviors?

Model and Solutions. In this paper, we study the dynamics of SFP in general classes of multi-population network games that allow for heterogeneous beliefs. In a multi-population network game, each vertex of the network represents a population (continuum) of agents, and each edge represents a series of 2-player subgames between two neighboring populations. Note that multi-population network games include all the 2-population games considered in [12] and are representative of subclasses of real-world systems where the graph structure is evident [czechowski2021poincar]. We consider that, for a certain population, individual agents form separate beliefs about each neighbor population and observe the mean strategy play of that population. Taking an approach different from stochastic approximation, we define the system state as a probability measure over the space of beliefs, which allows us to precisely examine the impact of heterogeneous beliefs on system evolution. This probability measure changes over time in response to agents' learning. Thus, the main challenge is to analyze the evolution of the measure, which in general requires the development of new techniques.

As a starting point, we establish a system of partial differential equations (PDEs) to track the evolution of the measure in the continuous-time limit (Proposition 1). The PDEs that we derive are akin to the continuity equations commonly encountered in physics (a continuity equation is a PDE that describes the transport of some quantity, e.g., mass, energy, or momentum, in a physical system) and do not admit a general solution. Appealing to moment closure approximation [13], we circumvent the need to solve the PDEs and directly analyze the dynamics of the mean and variance (Proposition 2 and Theorem 1). As one of our key results, we prove that the variance of beliefs always decays quadratically fast with time in all network games (Theorem 1). Put differently, beliefs will eventually homogenize and the distribution of beliefs will collapse to a single point, regardless of the initial distribution of beliefs, the 2-player subgames that agents play, and the number of populations and strategies. This result is non-trivial and perhaps somewhat counterintuitive. After all, one may find it more natural to expect that the distribution of beliefs would converge to some distribution rather than a single point, as evidenced by recent studies on Q-learning and Cross learning [23, 24, 27].

Technically, the eventual belief homogenization has a significant implication — it informally hints that the asymptotic system state of an initially heterogeneous system is likely to be the same as in a homogeneous system. We show that the fixed points of SFP correspond to Quantal Response Equilibria (QRE) in network games for both homogeneous and initially heterogeneous systems (Theorem 2). (QRE is a game-theoretic solution concept under bounded rationality; in this paper, we refer to its canonical form, also called logit equilibria or logit QRE in the literature [14].) As our main result, we establish the convergence of SFP to QRE in different classes of games capturing both network competition and network coordination, independent of belief initialization. Specifically, for competitive network games, we first prove via a Lyapunov argument that SFP converges to a unique QRE in homogeneous systems, even when the underlying game has many distinct Nash equilibria (Theorem 3). Then, we show that this convergence result carries over to initially heterogeneous systems (Theorem 4) by leveraging the fact that the mean belief dynamics of initially heterogeneous systems is asymptotically autonomous [31], with its limit dynamics being the belief dynamics of a homogeneous system (Lemma 1). For coordination network games, we also prove the convergence to QRE for homogeneous and initially heterogeneous systems in which the underlying network has a star structure (Theorem 5).

On the other hand, the eventual belief homogenization may lead to a misconception that belief heterogeneity has little effect on system evolution. Using an example of 2-population stag hunt games, we show that belief heterogeneity actually plays a crucial role in equilibrium selection, even though it eventually vanishes. As shown in Figure 1, changing the variance of initial beliefs results in different limit behaviors, even when the mean of the initial beliefs remains unchanged; in particular, while a small variance leads to the less desirable equilibrium $(H,H)$, a large variance leads to the payoff dominant equilibrium $(S,S)$. Thus, in the case of network coordination, initial belief heterogeneity can help select the highly desirable equilibrium and provides interesting insights into the thorny problem of equilibrium selection [26]. By contrast, in the case of network competition, we prove (Theorems 3 and 4 on the convergence to a unique QRE in competitive network games) as well as showcase experimentally that the resulting limit behavior is independent of the initialization of beliefs, even if the underlying game has many distinct Nash equilibria.

Figure 1: The system dynamics under the effects of different variances of initial beliefs (thin lines: predictions of our PDE model, shaded wide lines: simulation results). $\bar{\mu}_{2S}$ represents the mean belief about population $2$ and $\bar{x}_{1S}$ represents the mean probability of playing strategy $S$ in population $1$. Initially, we set the mean beliefs $\bar{\mu}_{2S}=\bar{\mu}_{1S}=0.3$ (details of the setup are summarized in the supplementary). Given the same initial mean belief, different initial variances $\sigma^{2}(\mu_{2S})$ lead to the convergence to different beliefs (the left panel) and even to different strategy choices (the right panel). In particular, a large initial variance helps select the payoff dominant equilibrium $(S,S)$ in stag hunt games.

Related Works.

SFP and its variants have recently attracted a lot of attention in AI research [36, 37, 42, 18, 17]. There is a significant literature that analyzes SFP in different models [3, 7, 21, 19], and the paper most closely related to our work is [12]. Fudenberg and Takahashi [12] also examine the heterogeneity issue and anticipate belief homogenization in the limit under 2-population settings. In this paper, we consider multi-population network games, which are a generalization of their setting. (The analysis presented in this paper covers all generic 2-population network games, all generic bipartite network games where the game played on each edge is the same along all edges, and all weighted zero-sum games, which require neither the graph to be bipartite nor the same game to be played on each edge.) Moreover, our approach is more fundamental, as the PDEs that we derive provide much richer information about the system evolution and thus precisely estimate the temporal effects of heterogeneity, which is generally intractable in [12]. Therefore, using our approach, we are able to show an interesting finding — the initial heterogeneity plays a crucial role in equilibrium selection (Figure 1) — which cannot be shown using the approach in [12]. Last but not least, to our knowledge, our paper is the first to present a systematic study of smooth fictitious play in general classes of network games.

On the other hand, networked multi-agent learning constitutes one of the current frontiers in AI and ML research [43, 30, 16]. Recent theoretical advances on network games provide conditions for learning behaviors not to be chaotic [6, 34], and investigate the convergence of Q-learning and continuous-time FP in the case of network competition [7, 28]. However, [7, 28] consider only one agent on each vertex, and hence their models are essentially models of homogeneous systems.

Lahkar and Seymour [27] and Hu et al. [23, 24] also use the continuity equations as a tool to study population heterogeneity in multi-agent systems where a single population of agents applies Cross learning or Q-learning to play symmetric games. They either prove or numerically showcase that heterogeneity generally persists. Our results complement these advances by showing that heterogeneity vanishes under SFP and that heterogeneity helps select highly desirable equilibria. Moreover, methodologically, we establish new proof techniques for the convergence of learning dynamics in heterogeneous systems by leveraging seminal results (Lemmas 1 and 2) from the asymptotically autonomous dynamical system literature, which may be of independent interest.

2 Preliminaries

Population Network Games.

A population network game (PNG) $\Gamma=(N,(V,E),(S_{i},\omega_{i})_{\forall i\in V},(\mathbf{A}_{ij})_{(i,j)\in E})$ consists of a multi-agent system $N$ distributed over a graph $(V,E)$, where $V=\{1,\dots,n\}$ is the set of vertices, each of which represents a population (continuum) of agents, and $E$ is the set of pairs $(i,j)$ of populations $i\neq j\in V$. For each population $i\in V$, agents of this population have a finite set $S_{i}$ of pure strategies (or actions) with generic elements $s_{i}\in S_{i}$. Agents may also use mixed strategies (or choice distributions). For an arbitrary agent $k$ in population $i$, its mixed strategy is a vector $\mathbf{x}_{i}(k)\in\Delta_{i}$, where $\Delta_{i}$ is the simplex in $\mathbb{R}^{|S_{i}|}$ such that $\sum_{s_{i}\in S_{i}}x_{is_{i}}(k)=1$ and $x_{is_{i}}(k)\geq 0$ for all $s_{i}\in S_{i}$. Each edge $(i,j)\in E$ defines a series of two-player subgames between populations $i$ and $j$, such that at a given time step, each agent in population $i$ is randomly paired up with another agent in population $j$ to play a two-player subgame. We denote the payoff matrices for agents of populations $i$ and $j$ in these two-player subgames by $\mathbf{A}_{ij}\in\mathbb{R}^{|S_{i}|\times|S_{j}|}$ and $\mathbf{A}_{ji}\in\mathbb{R}^{|S_{j}|\times|S_{i}|}$, respectively. Note that at a given time step, each agent chooses a (mixed or pure) strategy and plays that strategy in all two-player subgames. Let $\mathbf{x}=(\mathbf{x}_{i},\{\mathbf{x}_{j}\}_{(i,j)\in E})$ be a mixed strategy profile, where $\mathbf{x}_{i}$ (or $\mathbf{x}_{j}$) denotes a generic mixed strategy in population $i$ (or $j$). Given the mixed strategy profile $\mathbf{x}$, the expected payoff of using $\mathbf{x}_{i}$ in the game $\Gamma$ is

r_{i}(\mathbf{x})=r_{i}(\mathbf{x}_{i},\{\mathbf{x}_{j}\}_{(i,j)\in E})\coloneqq\sum_{(i,j)\in E}\mathbf{x}_{i}^{\top}\mathbf{A}_{ij}\mathbf{x}_{j}. (1)

The game $\Gamma$ is competitive (or weighted zero-sum) if there exist positive constants $\omega_{1},\ldots,\omega_{n}$ such that

\sum_{i\in V}\omega_{i}r_{i}(\mathbf{x})=\sum_{(i,j)\in E}\left(\omega_{i}\mathbf{x}_{i}^{\top}\mathbf{A}_{ij}\mathbf{x}_{j}+\omega_{j}\mathbf{x}_{j}^{\top}\mathbf{A}_{ji}\mathbf{x}_{i}\right)=0,\quad\forall\mathbf{x}\in\prod_{i\in V}\Delta_{i}. (2)

On the other hand, $\Gamma$ is a coordination network game if, for each edge $(i,j)\in E$, the payoff matrices of the two-player subgame satisfy $\mathbf{A}_{ij}=\mathbf{A}_{ji}^{\top}$.

Smooth Fictitious Play.

SFP is a belief-based model for learning in games. In SFP, agents form beliefs about the play of opponents and respond to these beliefs via smooth best responses. Given a game $\Gamma$, consider an arbitrary agent $k$ in a population $i\in V$. Let $V_{i}=\{j\in V:(i,j)\in E\}$ be the set of neighbor populations. Agent $k$ maintains a weight $\kappa^{i}_{js_{j}}(k)$ for each opponent strategy $s_{j}\in S_{j}$ of a neighbor population $j\in V_{i}$. Based on the weights, agent $k$ forms a belief about the neighbor population $j$, such that each opponent strategy $s_{j}$ is played with probability

\mu_{js_{j}}^{i}(k)=\frac{\kappa^{i}_{js_{j}}(k)}{\sum_{s_{j}^{\prime}\in S_{j}}\kappa^{i}_{js_{j}^{\prime}}(k)}. (3)

Let $\bm{\upmu}_{j}^{i}(k)$ be the vector of beliefs whose $s_{j}$-th element equals $\mu_{js_{j}}^{i}(k)$. Agent $k$ forms separate beliefs for each neighbor population and plays a smooth best response to the set of beliefs $\{\bm{\upmu}_{j}^{i}(k)\}_{j\in V_{i}}$. Given a game $\Gamma$, agent $k$'s expected payoff for using a pure strategy $s_{i}\in S_{i}$ is

u_{is_{i}}(k)=r_{i}(\mathbf{e}_{s_{i}},\{\bm{\upmu}_{j}^{i}(k)\}_{j\in V_{i}})=\sum_{j\in V_{i}}\mathbf{e}_{s_{i}}^{\top}\mathbf{A}_{ij}\bm{\upmu}_{j}^{i}(k) (4)

where $\mathbf{e}_{s_{i}}$ is the unit vector whose $s_{i}$-th element is $1$. The probability of playing strategy $s_{i}$ is then given by

x_{is_{i}}(k)=\frac{\exp(\beta u_{is_{i}}(k))}{\sum_{s_{i}^{\prime}\in S_{i}}\exp(\beta u_{is_{i}^{\prime}}(k))} (5)

where $\beta$ is a temperature (or the degree of rationality). We consider that agents observe the mean mixed strategy of each neighbor population. As such, at a given time step $t$, agent $k$ updates the weights for each opponent strategy $s_{j}\in S_{j}$, $j\in V_{i}$, as follows:

\kappa_{js_{j}}^{i}(k,t+1)=\kappa_{js_{j}}^{i}(k,t)+\bar{x}_{js_{j}}(t) (6)

where $\bar{x}_{js_{j}}$ is the mean probability of playing strategy $s_{j}$ in population $j$, i.e., $\bar{x}_{js_{j}}=\frac{1}{n_{j}}\sum_{l\in\text{population }j}x_{js_{j}}(l)$, with $n_{j}$ denoting the number of agents. For simplicity, we assume the initial sum of weights $\sum_{s_{j}\in S_{j}}\kappa_{js_{j}}^{i}(k,0)$ to be the same for every agent in the system $N$ and denote this initial sum by $\lambda$. Observe that Equation 6 can be rewritten as

(\lambda+t+1)\mu_{js_{j}}^{i}(k,t+1)=(\lambda+t)\mu_{js_{j}}^{i}(k,t)+\bar{x}_{js_{j}}(t). (7)

Hence, even though agent $k$ directly updates the weights, its individual state can be characterized by the set of beliefs $\{\bm{\upmu}_{j}^{i}(k)\}_{j\in V_{i}}$. In the following, we usually drop the time index $t$ and the agent index $k$ in the bracket (depending on the context) for notational convenience.
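To make the update rule concrete, the following minimal Python sketch implements Equations 3–7 for a single agent in population $i$: it keeps one weight vector per neighbor population, converts weights into beliefs, computes the logit (smooth best) response of Equation 5, and adds the observed mean play of each neighbor to the weights as in Equation 6. The class and variable names (e.g., `SFPAgent`, `payoff_matrices`) are our own illustrative choices and not part of the paper's notation.

```python
import numpy as np

class SFPAgent:
    """Minimal sketch of one smooth fictitious play agent (Equations 3-7)."""

    def __init__(self, payoff_matrices, lam=1.0, beta=5.0):
        # payoff_matrices: dict mapping neighbor population j -> A_ij (|S_i| x |S_j|)
        self.A = payoff_matrices
        self.beta = beta
        # Initial weights kappa^i_{j s_j}; their sum per neighbor equals lambda.
        self.kappa = {j: np.full(A_ij.shape[1], lam / A_ij.shape[1])
                      for j, A_ij in payoff_matrices.items()}

    def beliefs(self):
        # Equation 3: normalize the weights into a belief vector per neighbor.
        return {j: k / k.sum() for j, k in self.kappa.items()}

    def strategy(self):
        # Equations 4-5: expected payoffs against the beliefs, then logit response.
        mu = self.beliefs()
        u = sum(self.A[j] @ mu[j] for j in self.A)
        z = np.exp(self.beta * (u - u.max()))   # max-shift for numerical stability
        return z / z.sum()

    def observe(self, mean_play):
        # Equation 6: add the observed mean mixed strategy of each neighbor.
        for j, xbar_j in mean_play.items():
            self.kappa[j] = self.kappa[j] + xbar_j


# Usage sketch: one agent of population 1 against a single neighbor population 2.
A_12 = np.array([[1.0, -1.0], [-1.0, 1.0]])
agent = SFPAgent({2: A_12})
for t in range(100):
    agent.observe({2: np.array([0.6, 0.4])})   # hypothetical observed mean play
print(agent.beliefs()[2], agent.strategy())
```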

3 Belief Dynamics in Population Network Games

Observe that for an arbitrary agent $k$, its belief $\bm{\upmu}^{i}_{j}(k)$ is in the simplex $\Delta_{j}=\{\bm{\upmu}^{i}_{j}(k)\in\mathbb{R}^{|S_{j}|}\,|\,\sum_{s_{j}\in S_{j}}\mu_{js_{j}}^{i}(k)=1,\ \mu_{js_{j}}^{i}(k)\geq 0,\ \forall s_{j}\in S_{j}\}$. We assume that the system state is characterized by a Borel probability measure $P$ defined on the state space $\Delta=\prod_{i\in V}\Delta_{i}$. Given $\bm{\upmu}_{i}\in\Delta_{i}$, we write the marginal probability density function as $p(\bm{\upmu}_{i},t)$. Note that $p(\bm{\upmu}_{i},t)$ is the density of agents having the belief $\bm{\upmu}_{i}$ about population $i$ throughout the system. Define $\bm{\upmu}=\{\bm{\upmu}_{i}\}_{i\in V}\in\Delta$. Since agents maintain separate beliefs about different neighbor populations, the joint probability density function $p(\bm{\upmu},t)$ can be factorized, i.e., $p(\bm{\upmu},t)=\prod_{i\in V}p(\bm{\upmu}_{i},t)$. We make the following assumption on the initial marginal density functions.

Assumption 1.

At time $t=0$, for each population $i\in V$, the marginal density function $p(\bm{\upmu}_{i},t)$ is continuously differentiable and has zero mass at the boundary of the simplex $\Delta_{i}$.

This assumption is standard and common for a “nice” probability distribution. Under this mild condition, we determine the evolution of the system state $P$ with the following proposition, using techniques similar to those in [27, 23].

Proposition 1 (Population Belief Dynamics).

The continuous-time dynamics of the marginal density function $p(\bm{\upmu}_{i},t)$ for each population $i\in V$ is governed by the partial differential equation

-\frac{\partial p(\bm{\upmu}_{i},t)}{\partial t}=\nabla\cdot\left(p(\bm{\upmu}_{i},t)\frac{\mathbf{\bar{x}}_{i}-\bm{\upmu}_{i}}{\lambda+t+1}\right) (8)

where $\nabla\cdot$ is the divergence operator and $\bar{\mathbf{x}}_{i}$ is the mean mixed strategy whose $s_{i}$-th element is

\bar{x}_{is_{i}}=\int_{\prod_{j\in V_{i}}\Delta_{j}}\frac{\exp{(\beta u_{is_{i}})}}{\sum_{s_{i}^{\prime}\in S_{i}}\exp{(\beta u_{is_{i}^{\prime}})}}\prod_{j\in V_{i}}p(\bm{\upmu}_{j},t)\left(\prod_{j\in V_{i}}d\bm{\upmu}_{j}\right) (9)

where $u_{is_{i}}=\sum_{j\in V_{i}}\mathbf{e}_{s_{i}}^{\top}\mathbf{A}_{ij}\bm{\upmu}_{j}$.

For every marginal density function $p(\bm{\upmu}_{i},t)$, the total mass is always conserved (Corollary 1 of the supplementary); moreover, the mass at the boundary of the simplex $\Delta_{i}$ always remains zero, indicating that agents' beliefs will never go to extremes (Corollary 2 of the supplementary).

Generalizing the notion of a system state to a distribution over beliefs allows us to address a very specific question — the impact of belief heterogeneity on system evolution. That said, partial differential equations (Equation 8) are notoriously difficult to solve. Here we resort to the evolution of moments based on the evolution of the distribution (Equation 8). In the following proposition, we show that the characterization of belief heterogeneity is important, as the dynamics of the mean system state (or the mean belief dynamics) is indeed affected by belief heterogeneity.

Proposition 2 (Mean Belief Dynamics).

The dynamics of the mean belief $\bar{\bm{\upmu}}_{i}$ about each population $i\in V$ is governed by a system of differential equations such that, for each strategy $s_{i}$,

\frac{d\bar{\mu}_{is_{i}}}{dt}\approx\frac{f_{s_{i}}(\{\bar{\bm{\upmu}}_{j}\}_{j\in V_{i}})-\bar{\mu}_{is_{i}}}{\lambda+t+1}+\frac{\sum_{j\in V_{i}}\sum_{s_{j}\in S_{j}}\frac{\partial^{2}f_{s_{i}}(\{\bar{\bm{\upmu}}_{j}\}_{j\in V_{i}})}{(\partial\mu_{js_{j}})^{2}}\text{Var}(\mu_{js_{j}})}{2(\lambda+t+1)}. (10)

where $f_{s_{i}}(\{\bar{\bm{\upmu}}_{j}\}_{j\in V_{i}})$ is the logit choice function (Equation 5) applied to strategy $s_{i}\in S_{i}$ and evaluated at the mean beliefs, and $\text{Var}(\mu_{js_{j}})$ is the variance of the belief $\mu_{js_{j}}$ in the entire system.

In general, the mean belief dynamics is under the joint effects of the mean, variance, and infinitely many higher moments of the belief distribution. To allow for more conclusive results, we apply the moment closure approximation and assume the effects of the third and higher moments to be negligible. (Moment closure is a typical approximation method used to estimate moments of population models [13, 15, 32]. To use moment closure, a level is chosen past which all cumulants are set to zero. The conventional choice of the level is 2, i.e., setting the third and higher cumulants to zero.)
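As a generic one-dimensional illustration of this second-order closure (our own worked example, not one of the paper's equations), consider a scalar belief $\mu$ with mean $\bar{\mu}$ and a smooth choice function $f$. Expanding $f$ around $\bar{\mu}$ and taking expectations gives

\mathbb{E}[f(\mu)]\approx\mathbb{E}\left[f(\bar{\mu})+f^{\prime}(\bar{\mu})(\mu-\bar{\mu})+\tfrac{1}{2}f^{\prime\prime}(\bar{\mu})(\mu-\bar{\mu})^{2}\right]=f(\bar{\mu})+\tfrac{1}{2}f^{\prime\prime}(\bar{\mu})\,\text{Var}(\mu),

since $\mathbb{E}[\mu-\bar{\mu}]=0$; discarding the third- and higher-order terms is precisely the closure at level 2, and applying it coordinate-wise (with zero covariances across neighbor populations) yields the variance term in Equation 10.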

Now, just for a moment, suppose that the system beliefs are homogeneous, i.e., the beliefs of every individual are the same. In this case, the mean belief dynamics are effectively the belief dynamics of individuals. The following proposition follows from Equation 7.

Proposition 3 (Belief Dynamics for Homogeneous Populations).

For a homogeneous system, the dynamics of the belief $\bm{\upmu}_{i}$ about each population $i\in V$ is governed by a system of differential equations such that, for each strategy $s_{i}$,

\frac{d\mu_{is_{i}}}{dt}=\frac{x_{is_{i}}-\mu_{is_{i}}}{\lambda+t+1}=\frac{f_{s_{i}}(\{\bm{\upmu}_{j}\}_{j\in V_{i}})-\mu_{is_{i}}}{\lambda+t+1} (11)

where $\mu_{is_{i}}$ is the same for all agents in each neighbor population $j\in V_{i}$.

Intuitively, the mean belief dynamics indicates the trend of beliefs in a system, and the variance of beliefs indicates belief heterogeneity. Contrasting Propositions 2 and 3, it is clear that the variance of beliefs (belief heterogeneity) plays a role in determining the mean belief dynamics (the trend of beliefs) for heterogeneous systems. It is then natural to ask: how does belief heterogeneity evolve over time? How much does belief heterogeneity affect the trend of beliefs? Our investigation of these questions reveals an interesting finding — the variance of beliefs asymptotically tends to zero.

Theorem 1 (Quadratic Decay of the Variance of Population Beliefs).

The dynamics of the variance of the beliefs $\bm{\upmu}_{i}$ about each population $i\in V$ is governed by a system of differential equations such that, for each strategy $s_{i}$,

\frac{d\text{Var}(\mu_{is_{i}})}{dt}=-\frac{2\text{Var}(\mu_{is_{i}})}{\lambda+t+1}. (12)

At any given time $t$, $\text{Var}(\mu_{is_{i}})=\left(\frac{\lambda+1}{\lambda+t+1}\right)^{2}\sigma^{2}(\mu_{is_{i}})$, where $\sigma^{2}(\mu_{is_{i}})$ is the initial variance. Thus, the variance $\text{Var}(\mu_{is_{i}})$ decays to zero quadratically fast with time.

This quadratic decay of the variance holds no matter what 2-player subgames agents play and what the initial conditions are. Put differently, the beliefs will eventually homogenize in all population network games. This fact immediately determines the system state in the limit.

Corollary 1.

As time $t\to\infty$, the density function $p(\bm{\upmu}_{i},t)$ for each population $i\in V$ evolves into a Dirac delta function, and the variance of the choice distributions within each population $i\in V$ also goes to zero.

Note that while the choice distributions will homogenize within each population, they are not necessarily the same across different populations. This is because the strategy choice of each population is in response to its own set of neighbor populations (which are generally different).
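As a quick numerical illustration of Theorem 1 (our own sketch, not part of the paper), note that the discrete update in Equation 7 is affine in the belief with the same coefficient for every agent, so the variance contracts by the factor $\left(\frac{\lambda+t}{\lambda+t+1}\right)^{2}$ at each step, whatever mean play is observed; hence $\text{Var}(\mu_{is_{i}})\cdot(\lambda+t)^{2}$ stays invariant and the variance decays as $\Theta(1/t^{2})$, matching the quadratic decay of Theorem 1. The short sketch below checks this with an arbitrary observed mean-play sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n_agents = 1.0, 10_000

# Heterogeneous initial beliefs (about a single opponent strategy), drawn from a Beta.
mu = rng.beta(2.0, 5.0, size=n_agents)

for t in range(1001):
    if t % 250 == 0:
        print(f"t={t:4d}  Var={mu.var():.3e}  Var*(lam+t)^2={mu.var() * (lam + t)**2:.3e}")
    xbar = 0.5 + 0.3 * np.sin(0.1 * t)  # arbitrary mean play observed by all agents
    # Equation 7: (lam+t+1) * mu(t+1) = (lam+t) * mu(t) + xbar(t)
    mu = ((lam + t) * mu + xbar) / (lam + t + 1)
```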

4 Convergence of Smooth Fictitious Play in Population Network Games

The finding on belief homogenization is non-trivial and also technically important. One implication is that the fixed points of systems with initially heterogeneous beliefs are the same as those of systems with homogeneous beliefs. Thus, it follows from the belief dynamics for homogeneous systems (Proposition 3) that the fixed points of the system dynamics have the following property.

Theorem 2 (Fixed Points of System Dynamics).

For any system that initially has homogeneous or heterogeneous beliefs, the fixed points of the system dynamics are pairs $(\bm{\upmu}^{\ast},\mathbf{x}^{\ast})$ that satisfy $\mathbf{x}_{i}^{\ast}=\bm{\upmu}_{i}^{\ast}$ for each population $i\in V$ and are the solutions of the system of equations

x_{is_{i}}^{\ast}=\frac{\exp\left(\beta\sum_{j\in V_{i}}\mathbf{e}_{s_{i}}^{\top}\mathbf{A}_{ij}\mathbf{x}_{j}^{\ast}\right)}{\sum_{s_{i}^{\prime}\in S_{i}}\exp\left(\beta\sum_{j\in V_{i}}\mathbf{e}_{s_{i}^{\prime}}^{\top}\mathbf{A}_{ij}\mathbf{x}_{j}^{\ast}\right)} (13)

for every strategy $s_{i}\in S_{i}$ and population $i\in V$. Such fixed points always exist and coincide with the Quantal Response Equilibria (QRE) [33] of the population network game $\Gamma$.

Note that the above theorem applies to all population network games.
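For intuition, the fixed points of Equation 13 can be computed numerically by iterating the logit map over all populations with some damping; the sketch below does this for a small 2-population example. It is our own illustrative solver (names such as `solve_qre` are ours), not an algorithm proposed in the paper, and with enough damping it settles in small examples like this one; it is not a general convergence guarantee.

```python
import numpy as np

def logit(u, beta):
    # Logit (smooth best response) map used in Equations 5 and 13.
    z = np.exp(beta * (u - u.max()))
    return z / z.sum()

def solve_qre(A, sizes, beta=2.0, damping=0.1, iters=10_000, tol=1e-12):
    """Damped fixed-point iteration of Equation 13.

    A[i][j]: payoff matrix of population i against neighbor j (None if no edge).
    sizes[i]: number of pure strategies |S_i| of population i.
    """
    x = []
    for s in sizes:                                  # start from a skewed profile
        v = np.linspace(1.0, 2.0, s)
        x.append(v / v.sum())
    for _ in range(iters):
        new_x = []
        for i in range(len(sizes)):
            u = sum(A[i][j] @ x[j] for j in range(len(sizes)) if A[i][j] is not None)
            new_x.append((1 - damping) * x[i] + damping * logit(u, beta))
        if max(np.abs(a - b).max() for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x

# Example: 2-population matching pennies (one edge, weighted zero-sum).
A12 = np.array([[1.0, -1.0], [-1.0, 1.0]])
A = [[None, A12], [-A12.T, None]]
print(solve_qre(A, sizes=[2, 2]))  # both populations end up near [0.5, 0.5]
```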

We study the convergence of SFP to the QRE in both the case of network competition and the case of network coordination. Due to space limits, in the following we mainly focus on network competition and present only the main result on network coordination.

4.1 Network Competition

Consider a competitive population network game $\Gamma$. Note that in competitive network games, the Nash equilibrium payoffs need not be unique (in clear contrast to two-player settings), and such games generally allow for infinitely many Nash equilibria. In the following theorem, focusing on homogeneous systems, we establish the convergence of the belief dynamics to a unique QRE, regardless of the number of Nash equilibria in the underlying game.

Theorem 3 (Convergence in Homogeneous Network Competition).

Given a competitive $\Gamma$, for any system that has homogeneous beliefs, the belief dynamics (Equation 11) converges to a unique QRE, which is globally asymptotically stable.

Proof Sketch.

We prove this theorem by showing that the “distance” between $\mathbf{x}_{i}$ and $\bm{\upmu}_{i}$ is strictly decreasing until the QRE is reached. In particular, we measure the distance in terms of the perturbed payoff and construct a strict Lyapunov function

L\coloneqq\sum_{i\in V}\omega_{i}\left[\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)-\pi_{i}\left(\bm{\upmu}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\right] (14)

where $\omega_{1},\ldots,\omega_{n}$ are the positive weights given by $\Gamma$, and $\pi_{i}$ is a perturbed payoff function defined as $\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\coloneqq\mathbf{x}_{i}^{\top}\sum_{j\in V_{i}}\mathbf{A}_{ij}\bm{\upmu}_{j}-\frac{1}{\beta}\sum_{s_{i}\in S_{i}}x_{is_{i}}\ln(x_{is_{i}})$. ∎
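As an informal numerical check of this Lyapunov argument (our own sketch, assuming a simple 2-population matching pennies game with unit weights $\omega_{1}=\omega_{2}=1$), the snippet below integrates the homogeneous belief dynamics of Equation 11 with a forward Euler scheme and reports the value of $L$ from Equation 14 along the trajectory, which should be decreasing.

```python
import numpy as np

beta, lam, dt, steps = 2.0, 1.0, 0.01, 200_000

A12 = np.array([[1.0, -1.0], [-1.0, 1.0]])
A = {1: {2: A12}, 2: {1: -A12.T}}          # weighted zero-sum with omega_1 = omega_2 = 1
mu = {1: np.array([0.9, 0.1]), 2: np.array([0.2, 0.8])}   # homogeneous beliefs about 1, 2

def logit(u):
    z = np.exp(beta * (u - u.max()))
    return z / z.sum()

def perturbed_payoff(x, u):
    # pi_i(x_i, {mu_j}) = x_i^T sum_j A_ij mu_j - (1/beta) sum_s x_s ln x_s
    return x @ u - (1.0 / beta) * np.sum(x * np.log(x))

def lyapunov(mu):
    total = 0.0
    for i in A:
        u = sum(Aij @ mu[j] for j, Aij in A[i].items())
        total += perturbed_payoff(logit(u), u) - perturbed_payoff(mu[i], u)
    return total

t = 0.0
for step in range(steps):
    if step % 40_000 == 0:
        print(f"t = {t:8.1f}   L = {lyapunov(mu):.6f}")
    x = {i: logit(sum(Aij @ mu[j] for j, Aij in A[i].items())) for i in A}
    mu = {i: mu[i] + dt * (x[i] - mu[i]) / (lam + t + 1) for i in A}   # Equation 11
    t += dt
print(f"final beliefs: {mu[1]}, {mu[2]}   L = {lyapunov(mu):.6f}")
```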

Next, we turn to systems with initially heterogeneous beliefs. Leveraging that the variance of beliefs eventually goes to zero, we establish the following lemma.

Lemma 1.

For a system that initially has heterogeneous beliefs, the mean belief dynamics (Equation 10) is asymptotically autonomous [31] with the limit equation $\frac{d\bm{\upmu}_{i}}{dt}=\mathbf{x}_{i}-\bm{\upmu}_{i}$, which after time reparameterization is equivalent to the belief dynamics for homogeneous systems (Equation 11).

For ease of presentation, we follow the convention of denoting the solution flows of an asymptotically autonomous system and of its limit equation by $\phi$ and $\Theta$, respectively. Thieme [40] provides the following seminal result that connects the limit behaviors of $\phi$ and $\Theta$.

Lemma 2 (Thieme [40] Theorem 4.2).

Let $(X,d)$ be a metric space. Assume that the equilibria of $\Theta$ are isolated compact $\Theta$-invariant subsets of $X$, that the $\omega$-$\Theta$-limit set of any pre-compact $\Theta$-orbit contains a $\Theta$-equilibrium, and that the point $(s,x)$, $s\geq t_{0}$, $x\in X$, has a pre-compact $\phi$-orbit. Then the following alternative holds: 1) $\phi(t,s,x)\to e$, $t\to\infty$, for some $\Theta$-equilibrium $e$; or 2) the $\omega$-$\phi$-limit set of $(s,x)$ contains finitely many $\Theta$-equilibria which are chained to each other in a cyclic way.

Combining the above results, we prove the convergence for initially heterogeneous systems.

Theorem 4 (Convergence in Initially Heterogeneous Network Competition).

Given a competitive $\Gamma$, for any system that initially has heterogeneous beliefs, the mean belief dynamics (Equation 10) converges to a unique QRE.

The following corollary immediately follows as a result of belief homogenization.

Corollary 2.

For any competitive $\Gamma$, under smooth fictitious play, the choice distributions and beliefs of every individual converge to a unique QRE (given in Theorem 2), regardless of belief initialization and the number of Nash equilibria in $\Gamma$.

4.2 Network Coordination

We delegate most of the results on coordination network games to the supplementary, and summarize only the main result here.

Theorem 5 (Convergence in Network Coordination with Star Structure).

Given a coordination $\Gamma$ whose network structure consists of a single star or multiple disconnected stars, each orbit of the belief dynamics (Equation 11) for homogeneous systems, as well as each orbit of the mean belief dynamics (Equation 10) for initially heterogeneous systems, converges to the set of QRE.

Note that this theorem applies to all 2-population coordination games, as network games with or without star structure are essentially the same when there are only two vertices. We also remark that pure or mixed Nash equilibria in coordination network games are complex; as reported in recent works [5, 4, 1], finding a pure Nash equilibrium is PLS-complete. Hence, learning in the general case of network coordination is difficult and generally requires some conditions for theoretical analysis [34, 35].

5 Experiments: Equilibrium Selection in Population Network Games

In this section, we complement our theory and present an empirical study of SFP in a two-population coordination (stag hunt) game and a five-population zero-sum (asymmetric matching pennies) game. Importantly, these two games both have multiple Nash equilibria, which naturally raises the problem of equilibrium selection.

        H        S
H    (1, 1)   (2, 0)
S    (0, 2)   (4, 4)
Table 1: Stag Hunt.
Figure 2: Asymmetric Matching Pennies.

5.1 Two-Population Stag Hunt Games

We have shown in Figure 1 (in the introduction) that given the same initial mean belief, changing the variances of initial beliefs can result in different limit behaviors. In the following, we systematically study the effect of initial belief heterogeneity by visualizing how it affects the regions of attraction to different equilibria.

Game Description. We consider a two-population stag hunt game, where each player in populations $1$ and $2$ has two actions $\{H,S\}$. As shown in the payoff bi-matrix (Table 1), there are two pure strategy Nash equilibria in this game: $(H,H)$ and $(S,S)$. While $(H,H)$ is risk dominant, $(S,S)$ is indeed more desirable as it is payoff dominant as well as Pareto optimal.

Results. In this game, population 1 forms beliefs about population 2 and vice versa. We denote the initial mean beliefs by a pair $(\bar{\mu}_{2H},\bar{\mu}_{1H})$. We numerically solve the mean belief dynamics for a large range of initial mean beliefs, given different variances of initial beliefs. In Figure 3, for each pair of initial mean beliefs, we color the corresponding data point based on which QRE the system eventually converges to. We observe that as the variance of initial beliefs increases (from the left to the right panel), a larger range of initial mean beliefs results in convergence to the QRE that approximates the payoff dominant equilibrium $(S,S)$. Put differently, a higher degree of initial belief heterogeneity leads to a larger region of attraction to $(S,S)$. Hence, although belief heterogeneity eventually vanishes, it provides an approach to equilibrium selection, as it helps select the highly desirable equilibrium.
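A minimal agent-based sketch of this effect, under our own assumed parameter choices (temperature, $\lambda$, initial mean, and Beta-distributed initial beliefs; it is not the exact setup of the paper, whose details are in the supplementary), simulates the SFP process of Equations 5 and 7 for the stag hunt of Table 1 with two different initial variances. Mirroring Figure 1, one expects the small-variance run to settle near the $(H,H)$-like QRE and the large-variance run near the $(S,S)$-like QRE; the exact switching threshold depends on $\beta$, $\lambda$, and the initial mean.

```python
import numpy as np

def stag_hunt_population_sfp(var0, mean0=0.75, n=5000, beta=10.0, lam=1.0, T=2000, seed=0):
    """Agent-based SFP (Equations 5 and 7) for the 2-population stag hunt of Table 1.

    Each agent holds a scalar belief: the probability that the OTHER population plays H.
    Initial beliefs are Beta-distributed with mean `mean0` and variance `var0`.
    """
    rng = np.random.default_rng(seed)
    c = mean0 * (1 - mean0) / var0 - 1.0            # Beta concentration from mean/variance
    q = rng.beta(mean0 * c, (1 - mean0) * c, n)     # pop 1's beliefs about pop 2
    p = rng.beta(mean0 * c, (1 - mean0) * c, n)     # pop 2's beliefs about pop 1

    def play_H(belief):
        # Table 1 gives u_H - u_S = 3 * belief - 2 for either population; logit response.
        return 1.0 / (1.0 + np.exp(-beta * (3.0 * belief - 2.0)))

    for t in range(T):
        x1, x2 = play_H(q).mean(), play_H(p).mean()   # mean plays of populations 1 and 2
        q = ((lam + t) * q + x2) / (lam + t + 1)       # Equation 7
        p = ((lam + t) * p + x1) / (lam + t + 1)
    return play_H(q).mean(), play_H(p).mean()

print("small initial variance:", stag_hunt_population_sfp(var0=0.002))
print("large initial variance:", stag_hunt_population_sfp(var0=0.06))
```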

Figure 3: Belief heterogeneity helps select the payoff dominant equilibrium $(S,S)$ (yellow: the equilibrium $(S,S)$, blue: the equilibrium $(H,H)$). As the variance of initial beliefs increases (from the left to the right panel), a larger range of initial mean beliefs will approximately reach the equilibrium $(S,S)$ in the limit. For each panel, the initial variances of the two populations $\sigma^{2}(\mu_{1H})$ and $\sigma^{2}(\mu_{2H})$ are the same.
Figure 4: With different belief initializations, SFP selects a unique equilibrium where all agents in population $3$ play strategy $H$ with probability $0.5$. We run 100 simulation runs for each initialization. The thin lines represent the mean mixed strategy (the choice probability of $H$) and the shaded areas represent the variance of the mixed strategies in the population. In the legends, $B$ denotes the Beta distribution; the two Beta distributions correspond to the initial beliefs about the neighbor populations $2$ and $4$, respectively.

5.2 Five-Population Asymmetric Matching Pennies Games

We have shown in Corollary 2 that SFP converges to a unique QRE even if there are multiple Nash equilibria in a competitive $\Gamma$. In the following, we corroborate this by providing empirical evidence from agent-based simulations with different belief initializations (the details of the simulations are summarized in the supplementary).

Game Description. Consider a five-population asymmetric matching pennies game [28], where the network structure is a line (depicted in Figure 2). Each agent has two actions $\{H,T\}$. Agents in populations $1$ and $5$ do not learn; they always play strategies $H$ and $T$, respectively. Agents in populations $2$ to $4$ receive $+1$ if they match the strategy of the opponent in the next population and $-1$ if they mismatch it. Conversely, they receive $+1$ if they mismatch the strategy of the opponent in the previous population and $-1$ if they match it. Hence, this game has infinitely many Nash equilibria of the form: agents in populations $2$ and $4$ play strategy $T$, whereas agents in population $3$ are indifferent between strategies $H$ and $T$.

Results. In this game, agents in each population form two beliefs (one about the previous population and one about the next population). We are mainly interested in the strategies of population $3$, as the Nash equilibria differ in the strategies of population $3$. For validation, we vary population $3$'s beliefs about the neighbor populations $2$ and $4$, and fix all other beliefs. As shown in Figure 4, given different initializations of beliefs, agents in population $3$ converge to the same equilibrium where they all play strategy $H$ with probability $0.5$. Therefore, even when the underlying zero-sum game has many Nash equilibria, SFP with different initial belief heterogeneity selects a unique equilibrium, addressing the problem of equilibrium selection.
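For concreteness, one way to encode this line-structured PNG (our own illustrative construction, with strategy index 0 for $H$ and 1 for $T$) is sketched below: each learning population gets a matching payoff matrix toward its next neighbor and a mismatching one toward its previous neighbor, and the payoffs we assign to the non-learning populations 1 and 5 (an assumption of ours, irrelevant to learning) make every edge satisfy the weighted zero-sum condition of Equation 2 with unit weights.

```python
import numpy as np

# Matching pennies block: +1 for matching the other side's strategy, -1 otherwise
# (strategy index 0 = H, 1 = T).
MATCH = np.array([[1.0, -1.0], [-1.0, 1.0]])

# A[i][j]: payoff matrix of population i against neighbor j on the line 1-2-3-4-5.
A = {i: {} for i in range(1, 6)}
for i in (2, 3, 4):
    A[i][i + 1] = MATCH       # +1 for matching the next population
    A[i][i - 1] = -MATCH      # +1 for mismatching the previous population
# Populations 1 and 5 do not learn; their payoffs below are our own assumption,
# chosen only so that every edge is zero-sum.
A[1][2], A[5][4] = MATCH, -MATCH

# Weighted zero-sum check (Equation 2 with omega_i = 1): A_ij + A_ji^T = 0 on each edge.
for i, j in [(1, 2), (2, 3), (3, 4), (4, 5)]:
    assert np.allclose(A[i][j] + A[j][i].T, 0.0)
print("line network is weighted zero-sum with unit weights")
```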

6 Conclusions

We study a heterogeneous beliefs model of SFP in network games. Representing the system state with a distribution over beliefs, we prove that beliefs eventually become homogeneous in all network games. We establish the convergence of SFP to Quantal Response Equilibria in general competitive network games as well as coordination network games with star structure. We experimentally show that although the initial belief heterogeneity vanishes in the limit, it plays a crucial role in equilibrium selection and helps select highly desirable equilibria.

Appendix A: Corollaries and Proofs omitted in Section 3

Proof of Proposition 1

It follows from Equation 7 in the main paper that the change in $\bm{\upmu}_{j}^{i}(k,t)$ between two discrete time steps is

\bm{\upmu}_{j}^{i}(k,t+1)=\bm{\upmu}_{j}^{i}(k,t)+\frac{\bar{\mathbf{x}}_{j}(t)-\bm{\upmu}_{j}^{i}(k,t)}{\lambda+t+1}. (15)
Lemma 3.

Under Assumption 1 (in the main paper), for an arbitrary agent $k$ in population $i$, its belief $\bm{\upmu}_{j}^{i}(k,t)$ about a neighbor population $j$ will never reach an extreme belief (i.e., the boundary of the simplex $\Delta_{j}$).

Proof.

Assumption 1 ensures that $\bar{\mathbf{x}}_{j}(0)$ is in the interior of the simplex $\Delta_{j}$. Moreover, the logit choice function (Equation 5 in the main paper) also ensures that $\bar{\mathbf{x}}_{j}(t)$ stays in the interior of $\Delta_{j}$ afterwards for a finite temperature $\beta$. Hence, from Equation 15, one can see that $\bm{\upmu}_{j}^{i}(k,t)$ will stay in the interior of $\Delta_{j}$ for every time step $t$. ∎

In the following, for notational convenience, we sometimes drop the agent index $k$ and the time index $t$ depending on the context. Consider a population $i$. We rewrite the change in the beliefs about this population as follows.

\bm{\upmu}_{i}(t+1)=\bm{\upmu}_{i}(t)+\frac{\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)}{\lambda+t+1}. (16)

Suppose that the amount of time that passes between two successive time steps is $\delta\in(0,1]$. We rewrite the above equation as

\bm{\upmu}_{i}(t+\delta)=\bm{\upmu}_{i}(t)+\delta\frac{\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)}{\lambda+t+1}. (17)

Next, we consider a test function $\theta(\bm{\upmu}_{i})$. Define

Y=\frac{\mathbb{E}[\theta(\bm{\upmu}_{i}(t+\delta))]-\mathbb{E}[\theta(\bm{\upmu}_{i}(t))]}{\delta}. (18)

Applying a Taylor expansion of $\theta(\bm{\upmu}_{i}(t+\delta))$ at $\bm{\upmu}_{i}(t)$, we obtain

\theta(\bm{\upmu}_{i}(t+\delta))=\theta(\bm{\upmu}_{i}(t))+\frac{\delta}{\lambda+t+1}\partial_{\bm{\upmu}_{i}}\theta(\bm{\upmu}_{i})\left[\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)\right]
+\frac{\delta^{2}}{2(\lambda+t+1)^{2}}\left[\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)\right]^{\top}\mathbf{H}\theta(\bm{\upmu}_{i})\left[\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)\right]
+o\left(\left[\delta\frac{\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)}{\lambda+t+1}\right]^{2}\right) (19)

where $\mathbf{H}$ denotes the Hessian matrix. Hence, the expectation $\mathbb{E}[\theta(\bm{\upmu}_{i}(t+\delta))]$ is

\mathbb{E}[\theta(\bm{\upmu}_{i}(t+\delta))]=\mathbb{E}[\theta(\bm{\upmu}_{i}(t))]+\frac{\delta}{\lambda+t+1}\mathbb{E}[\partial_{\bm{\upmu}_{i}}\theta(\bm{\upmu}_{i}(t))(\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))]
+\frac{\delta^{2}}{2(\lambda+t+1)^{2}}\mathbb{E}\left[[\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)]^{\top}\mathbf{H}\theta(\bm{\upmu}_{i})\left[\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)\right]\right]
+\frac{\delta^{2}}{2(\lambda+t+1)^{2}}\mathbb{E}[o([\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)]^{2})] (20)

Moving the term $\mathbb{E}[\theta(\bm{\upmu}_{i}(t))]$ to the left-hand side and dividing both sides by $\delta$, we recover the quantity $Y$, i.e.,

Y=\frac{1}{\lambda+t+1}\mathbb{E}[\partial_{\bm{\upmu}_{i}}\theta(\bm{\upmu}_{i}(t))(\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))]
+\frac{\delta}{2(\lambda+t+1)^{2}}\mathbb{E}[[\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)]^{\top}\mathbf{H}\theta(\bm{\upmu}_{i}(t))[\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t)]+o\left((\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))^{2}\right)] (21)

Taking the limit of $Y$ as $\delta\to 0$, the contribution of the second term on the right-hand side vanishes, yielding

\lim_{\delta\to 0}Y=\frac{1}{\lambda+t+1}\mathbb{E}[\partial_{\bm{\upmu}_{i}}\theta(\bm{\upmu}_{i}(t))(\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))] (22)
=\frac{1}{\lambda+t+1}\int p(\bm{\upmu}_{i}(t),t)\left[\partial_{\bm{\upmu}_{i}}\theta(\bm{\upmu}_{i}(t))(\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))\right]d\bm{\upmu}_{i}(t). (23)

Apply integration by parts. We obtain

\lim_{\delta\to 0}Y=0-\frac{1}{\lambda+t+1}\int\theta(\bm{\upmu}_{i}(t))\nabla\cdot\left[p(\bm{\upmu}_{i}(t),t)(\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))\right]d\bm{\upmu}_{i}(t) (24)

where we have leveraged that the probability mass $p(\bm{\upmu}_{i},t)$ at the boundary $\partial\Delta_{i}$ remains zero as a result of Lemma 1. On the other hand, according to the definition of $Y$,

\lim_{\delta\to 0}Y=\lim_{\delta\to 0}\int\theta(\bm{\upmu}_{i}(t))\frac{p(\bm{\upmu}_{i},t+\delta)-p(\bm{\upmu}_{i},t)}{\delta}d\bm{\upmu}_{i}=\int\theta(\bm{\upmu}_{i}(t))\partial_{t}p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}. (25)

Therefore, we have the equality

\int\theta(\bm{\upmu}_{i}(t))\partial_{t}p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}=-\frac{1}{\lambda+t+1}\int\theta(\bm{\upmu}_{i}(t))\nabla\cdot\left[p(\bm{\upmu}_{i}(t),t)(\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))\right]d\bm{\upmu}_{i}(t). (26)

As $\theta$ is a test function, this leads to

\partial_{t}p(\bm{\upmu}_{i},t)=-\frac{1}{\lambda+t+1}\nabla\cdot\left[p(\bm{\upmu}_{i}(t),t)(\bar{\mathbf{x}}_{i}(t)-\bm{\upmu}_{i}(t))\right]. (27)

Rearranging the terms, we obtain Equation 8 in the main paper. By the definition of expectation given a probability distribution, it is straightforward to obtain Equation 9 in the main paper. Q.E.D.

Remarks: The PDEs we derived are akin to the continuity equation commonly encountered in physics in the study of conserved quantities. The continuity equation describes transport phenomena (e.g., of mass or energy) in a physical system. This renders a physical interpretation for our PDE model: under SFP, the belief dynamics of a heterogeneous system is analogous to the transport of the agent mass in the simplex $\Delta=\prod_{i\in V}\Delta_{i}$.

Corollaries of Proposition 1

Corollary 3.

For any population $i\in V$, the system beliefs about this population never go to extremes.

Proof.

This is a straightforward result of Lemma 1. ∎

Corollary 4.

For any population $i\in V$, the total probability mass of $p(\bm{\upmu}_{i},t)$ always remains conserved.

Proof.

Consider the time derivative of the total probability mass

\frac{d}{dt}\int p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}. (28)

Apply the Leibniz rule to interchange differentiation and integration,

\frac{d}{dt}\int p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}=\int\frac{\partial p(\bm{\upmu}_{i},t)}{\partial t}d\bm{\upmu}_{i}. (29)

Substitute $\frac{\partial p(\bm{\upmu}_{i},t)}{\partial t}$ with Equation 8 in the main paper,

\frac{d}{dt}\int p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}
=-\int\nabla\cdot\left(p(\bm{\upmu}_{i},t)\frac{\mathbf{\bar{x}}_{i}-\bm{\upmu}_{i}}{\lambda+t+1}\right)d\bm{\upmu}_{i} (30)
=-\int\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(p(\bm{\upmu}_{i},t)\frac{\bar{x}_{is_{i}}-\mu_{is_{i}}}{\lambda+t+1}\right)d\bm{\upmu}_{i} (31)
=-\frac{1}{\lambda+t+1}\left[\int\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}p(\bm{\upmu}_{i},t)\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}+\int p(\bm{\upmu}_{i},t)\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}\right] (32)

Apply integration by parts,

\int\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}p(\bm{\upmu}_{i},t)\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}=0-\int p(\bm{\upmu}_{i},t)\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}. (33)

where we have leveraged that the probability mass $p(\bm{\upmu}_{i},t)$ at the boundary $\partial\Delta_{i}$ remains zero. Hence, the terms within the bracket of Equation 32 cancel out, and

\frac{d}{dt}\int p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}=0. (34)

Proof of Proposition 2

Lemma 4.

The dynamics of the mean belief $\bar{\bm{\upmu}}_{i}$ about each population $i\in V$ is governed by a differential equation

\frac{d\bar{\mu}_{is_{i}}}{dt}=\frac{\bar{x}_{is_{i}}-\bar{\mu}_{is_{i}}}{\lambda+t+1},\qquad\forall s_{i}\in S_{i}. (35)
Proof.

The time derivative of the mean belief about strategy $s_{i}$ is

\frac{d\bar{\mu}_{is_{i}}}{dt}=\frac{d}{dt}\int\mu_{is_{i}}p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}. (36)

We apply the Leibniz rule to interchange differentiation and integration, and then substitute $\frac{\partial p(\bm{\upmu}_{i},t)}{\partial t}$ with Equation 8 in the main paper.

\frac{d}{dt}\int\mu_{is_{i}}p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i} (37)
=\int\mu_{is_{i}}\frac{\partial p(\bm{\upmu}_{i},t)}{\partial t}d\bm{\upmu}_{i} (38)
=-\int\mu_{is_{i}}\nabla\cdot\left(p(\bm{\upmu}_{i},t)\frac{\mathbf{\bar{x}}_{i}-\bm{\upmu}_{i}}{\lambda+t+1}\right)d\bm{\upmu}_{i} (39)
=-\int\mu_{is_{i}}\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(p(\bm{\upmu}_{i},t)\frac{\bar{x}_{is_{i}}-\mu_{is_{i}}}{\lambda+t+1}\right)d\bm{\upmu}_{i} (40)
=\gamma\left[\int\mu_{is_{i}}\sum_{s_{i}\in S_{i}}\left(\partial_{\mu_{is_{i}}}p(\bm{\upmu}_{i},t)\right)\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}+\int\mu_{is_{i}}p(\bm{\upmu}_{i},t)\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}\right] (41)

where $\gamma\coloneqq-\frac{1}{\lambda+t+1}$. Apply integration by parts to the first term in Equation 41.

\int\mu_{is_{i}}\sum_{s_{i}\in S_{i}}\left(\partial_{\mu_{is_{i}}}p(\bm{\upmu}_{i},t)\right)\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}
=-\int\mu_{is_{i}}p(\bm{\upmu}_{i},t)\left[\sum_{s_{i}^{\prime}\in S_{i}}\partial_{\mu_{is_{i}^{\prime}}}(\bar{x}_{is_{i}^{\prime}}-\mu_{is_{i}^{\prime}})\right]+p(\bm{\upmu}_{i},t)\partial_{\mu_{is_{i}}}\left[\mu_{is_{i}}(\bar{x}_{is_{i}}-\mu_{is_{i}})\right]d\bm{\upmu}_{i} (42)

where we have leveraged that the probability mass at the boundary remains zero. Hence, it follows from Equation 41 that

\frac{d}{dt}\int\mu_{is_{i}}p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i} (43)
=-\gamma\int\mu_{is_{i}}p(\bm{\upmu}_{i},t)\sum_{s_{i}^{\prime}\in S_{i}}\partial_{\mu_{is_{i}^{\prime}}}(\bar{x}_{is_{i}^{\prime}}-\mu_{is_{i}^{\prime}})d\bm{\upmu}_{i}-\gamma\int p(\bm{\upmu}_{i},t)\partial_{\mu_{is_{i}}}\left[\mu_{is_{i}}(\bar{x}_{is_{i}}-\mu_{is_{i}})\right]d\bm{\upmu}_{i}
+\gamma\int\mu_{is_{i}}p(\bm{\upmu}_{i},t)\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i} (44)
=\gamma\int p(\bm{\upmu}_{i},t)\left[\mu_{is_{i}}\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)-\partial_{\mu_{is_{i}}}\left[\mu_{is_{i}}(\bar{x}_{is_{i}}-\mu_{is_{i}})\right]\right]d\bm{\upmu}_{i} (45)
=\gamma\left[\int p(\bm{\upmu}_{i},t)\mu_{is_{i}}d\bm{\upmu}_{i}-\int p(\bm{\upmu}_{i},t)\bar{x}_{is_{i}}d\bm{\upmu}_{i}\right] (46)
=\frac{\bar{x}_{is_{i}}-\bar{\mu}_{is_{i}}}{\lambda+t+1} (47)

We repeat the mean probability $\bar{x}_{is_{i}}$, which has been given in Equation 9 in the main paper, as follows:

\bar{x}_{is_{i}}=\int\frac{\exp{(\beta u_{is_{i}})}}{\sum_{s_{i}^{\prime}\in S_{i}}\exp{(\beta u_{is_{i}^{\prime}})}}\prod_{j\in V_{i}}p(\bm{\upmu}_{j},t)\left(\prod_{j\in V_{i}}d\bm{\upmu}_{j}\right) (48)

where $u_{is_{i}}=\sum_{j\in V_{i}}\mathbf{e}_{s_{i}}^{\top}\mathbf{A}_{ij}\bm{\upmu}_{j}$. Define $\bar{\bm{\upmu}}\coloneqq\{\bar{\bm{\upmu}}_{j}\}_{j\in V_{i}}$ and

f_{s_{i}}(\{\bm{\upmu}_{j}\}_{j\in V_{i}})\coloneqq\frac{\exp{(\beta\sum_{j\in V_{i}}\mathbf{e}_{s_{i}}^{\top}\mathbf{A}_{ij}\bm{\upmu}_{j})}}{\sum_{s_{i}^{\prime}\in S_{i}}\exp{(\beta\sum_{j\in V_{i}}\mathbf{e}_{s_{i}^{\prime}}^{\top}\mathbf{A}_{ij}\bm{\upmu}_{j})}}. (49)

Applying the Taylor expansion to approximate this function at the mean belief $\bar{\bm{\upmu}}$, we have

f_{s_{i}}(\{\bm{\upmu}_{j}\}_{j\in V_{i}})\approx f_{s_{i}}(\bar{\bm{\upmu}})+\nabla f_{s_{i}}(\bar{\bm{\upmu}})\cdot(\bm{\upmu}-\bar{\bm{\upmu}})+\frac{1}{2}(\bm{\upmu}-\bar{\bm{\upmu}})^{\top}\mathbf{H}f_{s_{i}}(\bar{\bm{\upmu}})(\bm{\upmu}-\bar{\bm{\upmu}})+O(||\bm{\upmu}-\bar{\bm{\upmu}}||^{3}) (50)

where $\mathbf{H}$ denotes the Hessian matrix. Hence, we can rewrite Equation 48 as

\bar{x}_{is_{i}}=\int f_{s_{i}}(\{\bm{\upmu}_{j}\}_{j\in V_{i}})\prod_{j\in V_{i}}p(\bm{\upmu}_{j},t)\left(\prod_{j\in V_{i}}d\bm{\upmu}_{j}\right) (51)
\approx f_{s_{i}}(\bar{\bm{\upmu}})+\int\nabla f_{s_{i}}(\bar{\bm{\upmu}})\cdot\bm{\upmu}\prod_{j\in V_{i}}p(\bm{\upmu}_{j},t)\left(\prod_{j\in V_{i}}d\bm{\upmu}_{j}\right)-\nabla f_{s_{i}}(\bar{\bm{\upmu}})\cdot\bar{\bm{\upmu}}
+\int\frac{1}{2}(\bm{\upmu}-\bar{\bm{\upmu}})^{\top}\mathbf{H}f_{s_{i}}(\bar{\bm{\upmu}})(\bm{\upmu}-\bar{\bm{\upmu}})\prod_{j\in V_{i}}p(\bm{\upmu}_{j},t)\left(\prod_{j\in V_{i}}d\bm{\upmu}_{j}\right)
+\int O(||\bm{\upmu}-\bar{\bm{\upmu}}||^{3})\prod_{j\in V_{i}}p(\bm{\upmu}_{j},t)\left(\prod_{j\in V_{i}}d\bm{\upmu}_{j}\right) (52)

Observe that in Equation 52, the second and the third terms cancel out. Moreover, for any two neighbor populations $j,k\in V_{i}$, the beliefs $\bm{\upmu}_{j},\bm{\upmu}_{k}$ about these two populations are separate and independent. Hence, the covariances of these beliefs are zero. We apply the moment closure approximation [32, 13] at the second order and obtain

\bar{x}_{is_{i}}\approx f_{s_{i}}(\bar{\bm{\upmu}})+\frac{1}{2}\sum_{j\in V_{i}}\sum_{s_{j}\in S_{j}}\frac{\partial^{2}f_{s_{i}}(\bar{\bm{\upmu}})}{(\partial\mu_{js_{j}})^{2}}\text{Var}(\mu_{js_{j}}). (53)

Hence, substituting $\bar{x}_{is_{i}}$ in Lemma 4 with the above approximation, we have the mean belief dynamics

\frac{d\bar{\mu}_{is_{i}}}{dt}\approx\frac{f_{s_{i}}(\bar{\bm{\upmu}})-\bar{\mu}_{is_{i}}}{\lambda+t+1}+\frac{\sum_{j\in V_{i}}\sum_{s_{j}\in S_{j}}\frac{\partial^{2}f_{s_{i}}(\bar{\bm{\upmu}})}{(\partial\mu_{js_{j}})^{2}}\text{Var}(\mu_{js_{j}})}{2(\lambda+t+1)}. (54)

Q.E.D.

Remarks: the use of the moment closure approximation (considering only the first and second moments) is for obtaining more conclusive results. Strictly speaking, the mean belief dynamics also depends on the third and higher moments. However, we observe in the experiments that these moments in general have little effect on the mean belief dynamics. To be more specific, given the same initial mean beliefs, while the variance of initial beliefs can sometimes change the limit behaviors of a system, we do not observe similar phenomena for the third and higher moments.

Proof of Proposition 3

Consider a population $i$. It follows from Equation 7 in the main paper that the change in the beliefs about this population can be written as follows.

\bm{\upmu}_{i}(t+1)=\bm{\upmu}_{i}(t)+\frac{\mathbf{x}_{i}(t)-\bm{\upmu}_{i}(t)}{\lambda+t+1}. (55)

Suppose that the amount of time that passes between two successive time steps is $\delta\in(0,1]$. We rewrite the above equation as

\bm{\upmu}_{i}(t+\delta)=\bm{\upmu}_{i}(t)+\delta\frac{\mathbf{x}_{i}(t)-\bm{\upmu}_{i}(t)}{\lambda+t+1}. (56)

Moving the term 𝛍i(t)\bm{\upmu}_{i}(t) to the left-hand side and dividing both sides by δ\delta, we obtain

𝛍i(t+δ)𝛍i(t)δ=𝐱i(t)𝛍i(t)λ+t+1.\frac{\bm{\upmu}_{i}(t+\delta)-\bm{\upmu}_{i}(t)}{\delta}=\frac{\mathbf{x}_{i}(t)-\bm{\upmu}_{i}(t)}{\lambda+t+1}. (57)

Letting the amount of time δ\delta between two successive time steps go to zero, we have

d𝛍idt=limδ0𝛍i(t+δ)𝛍i(t)δ=𝐱i(t)𝛍i(t)λ+t+1.\frac{d\bm{\upmu}_{i}}{dt}=\lim_{\delta\to 0}\frac{\bm{\upmu}_{i}(t+\delta)-\bm{\upmu}_{i}(t)}{\delta}=\frac{\mathbf{x}_{i}(t)-\bm{\upmu}_{i}(t)}{\lambda+t+1}. (58)

Note that for continuous-time dynamics, we usually drop the time index in brackets, which yields the belief dynamics (Equation 11) in Proposition 3. Q.E.D.
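As a quick numerical sanity check of this limit, the following Python sketch (illustrative only; it uses a fixed choice probability $x=0.7$ chosen arbitrarily) compares the discrete belief update in Equation 55 with an Euler integration of the ODE in Equation 58 for a single belief coordinate.

```python
# Discrete SFP belief update (Equation 55) vs. its continuous-time limit (Equation 58),
# for one belief coordinate and a constant choice probability x.
lam, x, mu0, T = 10.0, 0.7, 0.2, 200

# Discrete update: mu(t+1) = mu(t) + (x - mu(t)) / (lam + t + 1).
mu_disc = mu0
for t in range(T):
    mu_disc += (x - mu_disc) / (lam + t + 1)

# ODE: dmu/dt = (x - mu) / (lam + t + 1), integrated with small Euler steps.
dt = 1e-3
mu_ode, t_now = mu0, 0.0
while t_now < T:
    mu_ode += dt * (x - mu_ode) / (lam + t_now + 1)
    t_now += dt

print(f"discrete mu(T) = {mu_disc:.4f}, ODE mu(T) = {mu_ode:.4f}")
```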

Proof of Theorem 1

Without loss of generality, we consider the variance of the belief μisi\mu_{is_{i}} about strategy sis_{i} of population ii. Note that

Var(μisi)=𝔼[(μisi)2](μ¯isi)2.\displaystyle\text{Var}(\mu_{is_{i}})=\mathbb{E}[(\mu_{is_{i}})^{2}]-(\bar{\mu}_{is_{i}})^{2}. (59)

Hence, we have

dVar(μisi)dt=d𝔼[(μisi)2]dt2μ¯isidμ¯isidt.\frac{d\text{Var}(\mu_{is_{i}})}{dt}=\frac{d\mathbb{E}[(\mu_{is_{i}})^{2}]}{dt}-2\bar{\mu}_{is_{i}}\frac{d\bar{\mu}_{is_{i}}}{dt}. (60)

Consider the first term on the right hand side. We apply the Leibniz rule to interchange differentiation and integration, and then substitute p(𝛍i,t)t\frac{\partial p(\bm{\upmu}_{i},t)}{\partial t} with Equation 8 in the main paper.

d𝔼[(μisi)2]dt\displaystyle\frac{d\mathbb{E}[(\mu_{is_{i}})^{2}]}{dt}
=(μisi)2p(𝛍i,t)t𝑑𝛍i\displaystyle=\int(\mu_{is_{i}})^{2}\frac{\partial p(\bm{\upmu}_{i},t)}{\partial t}d\bm{\upmu}_{i} (61)
=(μisi)2(p(𝛍i,t)𝐱¯i𝛍iλ+t+1)𝑑𝛍i\displaystyle=-\int(\mu_{is_{i}})^{2}\nabla\cdot\left(p(\bm{\upmu}_{i},t)\frac{\mathbf{\bar{x}}_{i}-\bm{\upmu}_{i}}{\lambda+t+1}\right)d\bm{\upmu}_{i} (62)
=(μisi)2siSiμisi(p(𝛍i,t)x¯isiμisiλ+t+1)d𝛍i\displaystyle=-\int(\mu_{is_{i}})^{2}\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(p(\bm{\upmu}_{i},t)\frac{\bar{x}_{is_{i}}-\mu_{is_{i}}}{\lambda+t+1}\right)d\bm{\upmu}_{i} (63)
=γ(μisi)2siSiμisip(𝛍i,t)(x¯isiμisi)d𝛍i+γ(μisi)2p(𝛍i,t)siSiμisi(x¯isiμisi)d𝛍i\displaystyle=\gamma\int(\mu_{is_{i}})^{2}\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}p(\bm{\upmu}_{i},t)\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}+\gamma\int(\mu_{is_{i}})^{2}p(\bm{\upmu}_{i},t)\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i} (64)

where γ1λ+t+1\gamma\coloneqq-\frac{1}{\lambda+t+1}. Applying integration by parts to the first term in Equation 64 yields

(μisi)2siSiμisip(𝛍i,t)(x¯isiμisi)d𝛍i\displaystyle\int(\mu_{is_{i}})^{2}\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}p(\bm{\upmu}_{i},t)\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i}
=(μisi)2p(𝛍i,t)[siSiμisi(x¯isiμisi)]+p(𝛍i,t)μisi[(μisi)2(x¯isiμisi)]d𝛍i\displaystyle=-\int(\mu_{is_{i}})^{2}p(\bm{\upmu}_{i},t)\left[\sum_{s_{i}^{\prime}\in S_{i}}\partial_{\mu_{is_{i}^{\prime}}}(\bar{x}_{is_{i}^{\prime}}-\mu_{is_{i}^{\prime}})\right]+p(\bm{\upmu}_{i},t)\partial_{\mu_{is_{i}}}\left[(\mu_{is_{i}})^{2}(\bar{x}_{is_{i}}-\mu_{is_{i}})\right]d\bm{\upmu}_{i} (65)

where we have leveraged that the probability mass at the boundary remains zero. Combining the above two equations, we obtain

d𝔼[(μisi)2]dt\displaystyle\frac{d\mathbb{E}[(\mu_{is_{i}})^{2}]}{dt}
=γ(μisi)2p(𝛍i,t)[siSiμisi(x¯isiμisi)]+p(𝛍i,t)μisi[(μisi)2(x¯isiμisi)]d𝛍i\displaystyle=-\gamma\int(\mu_{is_{i}})^{2}p(\bm{\upmu}_{i},t)\left[\sum_{s_{i}^{\prime}\in S_{i}}\partial_{\mu_{is_{i}^{\prime}}}(\bar{x}_{is_{i}^{\prime}}-\mu_{is_{i}^{\prime}})\right]+p(\bm{\upmu}_{i},t)\partial_{\mu_{is_{i}}}\left[(\mu_{is_{i}})^{2}(\bar{x}_{is_{i}}-\mu_{is_{i}})\right]d\bm{\upmu}_{i}
+γ(μisi)2p(𝛍i,t)siSiμisi(x¯isiμisi)d𝛍i\displaystyle\qquad+\gamma\int(\mu_{is_{i}})^{2}p(\bm{\upmu}_{i},t)\sum_{s_{i}\in S_{i}}\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i} (66)
=γ[p(𝛍i,t)μisi[(μisi)2(x¯isiμisi)]]+(μisi)2p(𝛍i,t)μisi(x¯isiμisi)d𝛍i\displaystyle=\gamma\int\left[-p(\bm{\upmu}_{i},t)\partial_{\mu_{is_{i}}}\left[(\mu_{is_{i}})^{2}(\bar{x}_{is_{i}}-\mu_{is_{i}})\right]\right]+(\mu_{is_{i}})^{2}p(\bm{\upmu}_{i},t)\partial_{\mu_{is_{i}}}\left(\bar{x}_{is_{i}}-\mu_{is_{i}}\right)d\bm{\upmu}_{i} (67)
=γ2(μisi)2p(𝛍i,t)𝑑𝛍iγ2x¯isiμisip(𝛍i,t)𝑑𝛍i\displaystyle=\gamma\int 2(\mu_{is_{i}})^{2}p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i}-\gamma\int 2\bar{x}_{is_{i}}\mu_{is_{i}}p(\bm{\upmu}_{i},t)d\bm{\upmu}_{i} (68)
=2𝔼[(μisi)2]2x¯isiμ¯isiλ+t+1.\displaystyle=-\frac{2\mathbb{E}[(\mu_{is_{i}})^{2}]-2\bar{x}_{is_{i}}\bar{\mu}_{is_{i}}}{\lambda+t+1}. (69)

Next, we consider the second term in Equation 60. By Lemma 4, we have

2μ¯isidμ¯isidt=2μ¯isi(x¯isiμ¯isi)λ+t+1.\displaystyle 2\bar{\mu}_{is_{i}}\frac{d\bar{\mu}_{is_{i}}}{dt}=\frac{2\bar{\mu}_{is_{i}}(\bar{x}_{is_{i}}-\bar{\mu}_{is_{i}})}{\lambda+t+1}. (70)

Combining Equations 69 and 70, the dynamics of the variance is

dVar(μisi)dt\displaystyle\frac{d\text{Var}(\mu_{is_{i}})}{dt} =2𝔼[(μisi)2]2x¯isiμ¯isiλ+t+12μ¯isi(x¯isiμ¯isi)λ+t+1\displaystyle=-\frac{2\mathbb{E}[(\mu_{is_{i}})^{2}]-2\bar{x}_{is_{i}}\bar{\mu}_{is_{i}}}{\lambda+t+1}-\frac{2\bar{\mu}_{is_{i}}(\bar{x}_{is_{i}}-\bar{\mu}_{is_{i}})}{\lambda+t+1} (71)
=2(μ¯isi)22𝔼[(μisi)2]λ+t+1\displaystyle=\frac{2(\bar{\mu}_{is_{i}})^{2}-2\mathbb{E}[(\mu_{is_{i}})^{2}]}{\lambda+t+1} (72)
=2Var(μisi)λ+t+1.\displaystyle=-\frac{2\text{Var}(\mu_{is_{i}})}{\lambda+t+1}. (73)

Q.E.D.
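Since Equation 73 is a separable linear ODE, its solution can be written in closed form; the following short (standard) derivation makes the homogenization rate explicit and is the source of the factor $\left(\frac{\lambda+1}{\lambda+t+1}\right)^{2}$ that appears later in the proof of Theorem 4 (Equation 100).

```latex
\frac{d\,\text{Var}(\mu_{is_i})}{\text{Var}(\mu_{is_i})} = -\frac{2\,dt}{\lambda+t+1}
\;\Longrightarrow\;
\ln\frac{\text{Var}(\mu_{is_i}(t))}{\text{Var}(\mu_{is_i}(0))} = -2\ln\frac{\lambda+t+1}{\lambda+1}
\;\Longrightarrow\;
\text{Var}(\mu_{is_i}(t)) = \text{Var}(\mu_{is_i}(0))\left(\frac{\lambda+1}{\lambda+t+1}\right)^{2}
\xrightarrow[t\to\infty]{} 0.
```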

Remarks: We believe the rationale behind this phenomenon is twofold: 1) agents apply smooth fictitious play, and 2) agents respond to the mean strategy play of other populations rather than to the strategy play of some fixed agents. Regarding the former, we note that under a similar setting, population homogenization may not occur if agents apply other learning methods, e.g., Q-learning or Cross learning. Regarding the latter, imagine that agents adjust their beliefs in response to the strategies of some fixed agents. For example, consider two populations, one containing agents A and C and the other containing agents B and D. Suppose that agents A and B form a fixed pair and adjust their beliefs only in response to each other; the same applies to agents C and D. In this case, belief homogenization may not occur.

Appendix B: Proofs omitted in Section 4.1

Proof of Theorem 2

Belief homogenization implies that the fixed points of systems with initially heterogeneous beliefs coincide with those of systems with homogeneous beliefs. Thus, to analyze the fixed points, we focus on homogeneous systems. It is straightforward to see that

d𝛍idt=𝐱i𝛍iλ+t+1=0𝐱i=𝛍i.\displaystyle\frac{d\bm{\upmu}_{i}}{dt}=\frac{\mathbf{x}_{i}-\bm{\upmu}_{i}}{\lambda+t+1}=0\implies\mathbf{x}_{i}=\bm{\upmu}_{i}. (74)

Denote the fixed points of the system dynamics, which satisfy the above equation, by (𝐱i,𝛍i)(\mathbf{x}_{i}^{\ast},\bm{\upmu}_{i}^{\ast}) for each population ii. By the logit choice function (Equation 5 in the main paper), we have

xisi=exp(βuisi)siSiexp(βuisi)=exp(βjVi𝐞si𝐀ij𝛍j)siSiexp(βjVi𝐞si𝐀ij𝛍j).\displaystyle x_{is_{i}}^{\ast}=\frac{\exp{(\beta u_{is_{i}})}}{\sum_{s_{i}^{\prime}\in S_{i}}\exp{(\beta u_{is_{i}^{\prime}})}}=\frac{\exp{(\beta\sum_{j\in V_{i}}\mathbf{e}_{s_{i}}^{\top}\mathbf{A}_{ij}\bm{\upmu}_{j}^{\ast})}}{\sum_{s_{i}^{\prime}\in S_{i}}\exp{(\beta\sum_{j\in V_{i}}\mathbf{e}_{s_{i}^{\prime}}^{\top}\mathbf{A}_{ij}\bm{\upmu}_{j}^{\ast})}}. (75)

Leveraging that 𝐱i=𝛍i,iV\mathbf{x}_{i}^{\ast}=\bm{\upmu}_{i}^{\ast},\forall i\in V at the fixed points, we can replace 𝛍j\bm{\upmu}_{j}^{\ast} with 𝐱j\mathbf{x}_{j}^{\ast}. Q.E.D.
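The fixed-point characterization in Equation 75 also suggests a simple numerical procedure for locating a QRE. The Python sketch below is illustrative only (it is not among the released scripts): it assumes a hypothetical two-population zero-sum game with payoff matrices A12 and A21 and runs a damped fixed-point iteration on the logit map, which can be read as an Euler discretization of the belief dynamics; convergence of this plain iteration is not guaranteed in general and depends on the step size.

```python
import numpy as np

def logit(u, beta):
    z = np.exp(beta * (u - u.max()))    # shift by the max for numerical stability
    return z / z.sum()

# Hypothetical two-population zero-sum game (matching pennies) and temperature.
A12 = np.array([[1.0, -1.0], [-1.0, 1.0]])
A21 = -A12.T                            # weighted zero-sum counterpart (unit weights)
beta = 10.0

# Damped fixed-point iteration on x_i = logit(beta * A_ij x_j), i.e., Equation 75.
alpha = 0.01                            # small step size for stability
x1 = np.array([0.9, 0.1])
x2 = np.array([0.2, 0.8])
for _ in range(20000):
    new_x1 = logit(A12 @ x2, beta)
    new_x2 = logit(A21 @ x1, beta)
    x1 = (1 - alpha) * x1 + alpha * new_x1
    x2 = (1 - alpha) * x2 + alpha * new_x2

print("QRE candidate:", x1, x2)         # expected: approximately uniform for this game
```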

Proof of Theorem 3

Consider a population ii. The set of neighbor populations is ViV_{i}, the set of beliefs about the neighbor populations is {𝛍j}jVi\{\bm{\upmu}_{j}\}_{j\in V_{i}}, and the choice distribution is 𝐱i\mathbf{x}_{i}. Given a population network game Γ\Gamma, the expected payoff is given by 𝐱i(i,j)EAij𝛍j\mathbf{x}_{i}^{\top}\sum_{(i,j)\in E}A_{ij}\bm{\upmu}_{j}. Define a perturbed payoff function

πi(𝐱i,{𝛍j}jVi)𝐱ijViAij𝛍j+v(𝐱i)\displaystyle\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\coloneqq\mathbf{x}_{i}^{\top}\sum_{j\in V_{i}}A_{ij}\bm{\upmu}_{j}+v(\mathbf{x}_{i}) (76)

where v(𝐱i)=1βsiSixisiln(xisi)v(\mathbf{x}_{i})=-\frac{1}{\beta}\sum_{s_{i}\in S_{i}}x_{is_{i}}\ln(x_{is_{i}}). Under this form of v(𝐱i)v(\mathbf{x}_{i}), the maximization of πi\pi_{i} yields the choice distribution 𝐱i\mathbf{x}_{i} from the logit choice function [8]. Based on this, we establish the following lemma.

Lemma 5.

For a choice distribution 𝐱i\mathbf{x}_{i} of SFP in a population network game,

𝐱iπi(𝐱i,{𝛍j}jVi)=𝟎andjVi(Aij𝛍j)=𝐱iv(𝐱i).\partial_{\mathbf{x}_{i}}\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)=\mathbf{0}\quad\text{and}\quad\sum_{j\in V_{i}}\left(A_{ij}\bm{\upmu}_{j}\right)^{\top}=-\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i}). (77)
Proof.

This lemma immediately follows from the fact that the maximization of πi\pi_{i} will yield the choice distribution 𝐱i\mathbf{x}_{i} from the logit choice function [8]. ∎

The belief dynamics of homogeneous systems can be simplified after time-reparameterization.

Lemma 6.

Given τ=lnλ+t+1λ+1\tau=\ln\frac{\lambda+t+1}{\lambda+1}, the belief dynamics of homogeneous systems (given in Equation 11 in the main paper) is equivalent to

d𝛍idτ=𝐱i𝛍i.\displaystyle\frac{d\bm{\upmu}_{i}}{d\tau}=\mathbf{x}_{i}-\bm{\upmu}_{i}. (78)
Proof.

From τ=lnλ+t+1λ+1\tau=\ln\frac{\lambda+t+1}{\lambda+1}, we have

t=(λ+1)(exp(τ)1).\displaystyle t=(\lambda+1)(\exp{(\tau)}-1). (79)

By the chain rule, for each dimension sis_{i},

dμisidτ\displaystyle\frac{d\mu_{is_{i}}}{d\tau} =dμisidtdtdτ\displaystyle=\frac{d\mu_{is_{i}}}{dt}\frac{dt}{d\tau} (80)
=xisiμisiλ+t+1d((λ+1)(exp(τ)1))dτ\displaystyle=\frac{x_{is_{i}}-\mu_{is_{i}}}{\lambda+t+1}\frac{d\left((\lambda+1)(\exp{(\tau)}-1)\right)}{d\tau} (81)
=xisiμisiλ+(λ+1)(exp(τ)1)+1(λ+1)exp(τ)\displaystyle=\frac{x_{is_{i}}-\mu_{is_{i}}}{\lambda+(\lambda+1)(\exp{(\tau)}-1)+1}(\lambda+1)\exp{(\tau)} (82)
=xisiμisi.\displaystyle=x_{is_{i}}-\mu_{is_{i}}. (83)

Next, we define the Lyapunov function LL as

LiVωiLis.t.Liπi(𝐱i,{𝛍j}jVi)πi(𝛍i,{𝛍j}jVi).L\coloneqq\sum_{i\in V}\omega_{i}L_{i}\quad\text{s.t.}\quad L_{i}\coloneqq\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)-\pi_{i}\left(\bm{\upmu}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right). (84)

where {ωi}iV\{\omega_{i}\}_{i\in V} is the set of positive weights defined in the weighted zero-sum game Γ\Gamma. The function LL is non-negative because, for every iVi\in V, 𝐱i\mathbf{x}_{i} maximizes the function πi\pi_{i}. When 𝐱i=𝛍i\mathbf{x}_{i}=\bm{\upmu}_{i} for every iVi\in V, the function LL attains its minimum value 0.

Rewrite LL as

L=iV[ωiπi(𝐱i,{𝛍j}jVi)ωi𝛍ijViAij𝛍jωiv(𝛍i)].L=\sum_{i\in V}\left[\omega_{i}\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)-\omega_{i}\bm{\upmu}_{i}^{\top}\sum_{j\in V_{i}}A_{ij}\bm{\upmu}_{j}-\omega_{i}v(\bm{\upmu}_{i})\right]. (85)

We observe that πi(𝐱i,{𝛍j}jVi)\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right) is convex in 𝛍j,jVi\bm{\upmu}_{j},j\in V_{i} by Danskin’s theorem, and v(𝛍i)-v(\bm{\upmu}_{i}) is strictly convex in 𝛍i\bm{\upmu}_{i}. Moreover, by the weighted zero-sum property given in Equation 2 in the main paper, we have

iV(ωi𝛍ijViAij𝛍j)=0\displaystyle\sum_{i\in V}\left(\omega_{i}\bm{\upmu}_{i}^{\top}\sum_{j\in V_{i}}A_{ij}\bm{\upmu}_{j}\right)=0 (86)

since μiΔi,μjΔj\mu_{i}\in\Delta_{i},\mu_{j}\in\Delta_{j} for every i,jV.i,j\in V. Therefore, the function LL is a strictly convex function and attains its minimum value 0 at a unique point 𝐱i=𝛍i\mathbf{x}_{i}=\bm{\upmu}_{i}, iV.\forall i\in V.

Consider the function LiL_{i}. Its time derivative is

L˙i\displaystyle\dot{L}_{i} =𝐱iπi(𝐱i,{𝛍j}jVi)𝐱˙i+jVi[𝛍jπi(𝐱i,{𝛍j}jVi)𝛍˙j]\displaystyle=\partial_{\mathbf{x}_{i}}\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\dot{\mathbf{x}}_{i}+\sum_{j\in V_{i}}\left[\partial_{\bm{\upmu}_{j}}\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\dot{\bm{\upmu}}_{j}\right] (87)
𝛍iπi(𝛍i,{𝛍j}jVi)𝛍˙ijVi[𝛍jπi(𝛍i,{𝛍j}jVi)𝛍˙j].\displaystyle\quad-\partial_{\bm{\upmu}_{i}}\pi_{i}\left(\bm{\upmu}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\dot{\bm{\upmu}}_{i}-\sum_{j\in V_{i}}\left[\partial_{\bm{\upmu}_{j}}\pi_{i}\left(\bm{\upmu}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\dot{\bm{\upmu}}_{j}\right].

Note that the partial derivative 𝐱iπi\partial_{\mathbf{x}_{i}}\pi_{i} equals 𝟎\mathbf{0} by Lemma 5. Thus, we can rewrite this as

L˙i\displaystyle\dot{L}_{i} =𝛍iπi(𝛍i,{𝛍j}jVi)𝛍˙i+jVi[𝛍jπi(𝐱i,{𝛍j}jVi)𝛍jπi(𝛍i,{𝛍j}jVi)]𝛍˙j\displaystyle=\partial_{\bm{\upmu}_{i}}\pi_{i}\left(\bm{\upmu}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\dot{\bm{\upmu}}_{i}+\sum_{j\in V_{i}}\left[\partial_{\bm{\upmu}_{j}}\pi_{i}\left(\mathbf{x}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)-\partial_{\bm{\upmu}_{j}}\pi_{i}\left(\bm{\upmu}_{i},\{\bm{\upmu}_{j}\}_{j\in V_{i}}\right)\right]\dot{\bm{\upmu}}_{j} (88)
=[jVi(Aij𝛍j)+μiv(𝛍i)](𝐱i𝛍i)+jVi(𝐱iAij𝛍iAij)(𝐱j𝛍j)\displaystyle=-\left[\sum_{j\in V_{i}}\left(A_{ij}\bm{\upmu}_{j}\right)^{\top}+\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})\right](\mathbf{x}_{i}-\bm{\upmu}_{i})+\sum_{j\in V_{i}}\left(\mathbf{x}_{i}^{\top}A_{ij}-\bm{\upmu}_{i}^{\top}A_{ij}\right)(\mathbf{x}_{j}-\bm{\upmu}_{j}) (89)
=[𝐱iv(𝐱i)μiv(𝛍i)](𝐱i𝛍i)+jVi(𝐱iAij𝐱j𝛍iAij𝐱j𝐱iAij𝛍j+𝛍iAij𝛍j).\displaystyle=\left[\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i})-\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})\right](\mathbf{x}_{i}-\bm{\upmu}_{i})+\sum_{j\in V_{i}}\left(\mathbf{x}_{i}^{\top}A_{ij}\mathbf{x}_{j}-\bm{\upmu}_{i}^{\top}A_{ij}\mathbf{x}_{j}-\mathbf{x}_{i}^{\top}A_{ij}\bm{\upmu}_{j}+\bm{\upmu}_{i}^{\top}A_{ij}\bm{\upmu}_{j}\right). (90)

where from Equation 89 to 90, we apply Lemma 5 to substitute jVi(Aij𝛍j)\sum_{j\in V_{i}}\left(A_{ij}\bm{\upmu}_{j}\right)^{\top} with 𝐱iv(𝐱i)-\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i}). Hence, summing over all the populations, the time derivative of LL is

L˙\displaystyle\dot{L} =iVωi[𝐱iv(𝐱i)μiv(𝛍i)](𝐱i𝛍i)\displaystyle=\sum_{i\in V}\omega_{i}\left[\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i})-\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})\right](\mathbf{x}_{i}-\bm{\upmu}_{i})
+iVjViωi(𝐱iAij𝐱j𝛍iAij𝐱j𝐱iAij𝛍j+𝛍iAij𝛍j).\displaystyle\quad+\sum_{i\in V}\sum_{j\in V_{i}}\omega_{i}\left(\mathbf{x}_{i}^{\top}A_{ij}\mathbf{x}_{j}-\bm{\upmu}_{i}^{\top}A_{ij}\mathbf{x}_{j}-\mathbf{x}_{i}^{\top}A_{ij}\bm{\upmu}_{j}+\bm{\upmu}_{i}^{\top}A_{ij}\bm{\upmu}_{j}\right). (91)

The summation in the second line is equivalent to

(i,j)E(ωi𝐱iAij𝐱j+ωj𝐱jAji𝐱i)(ωi𝛍iAij𝐱j+ωj𝐱jAji𝛍i)\displaystyle\sum_{(i,j)\in E}(\omega_{i}\mathbf{x}_{i}^{\top}A_{ij}\mathbf{x}_{j}+\omega_{j}\mathbf{x}_{j}^{\top}A_{ji}\mathbf{x}_{i})-(\omega_{i}\bm{\upmu}_{i}^{\top}A_{ij}\mathbf{x}_{j}+\omega_{j}\mathbf{x}_{j}^{\top}A_{ji}\bm{\upmu}_{i}) (92)
(ωi𝐱iAij𝛍j+ωj𝛍jAji𝐱i)+(ωi𝛍iAij𝛍j+ωj𝛍jAji𝛍i).\displaystyle\ \qquad-(\omega_{i}\mathbf{x}_{i}^{\top}A_{ij}\bm{\upmu}_{j}+\omega_{j}\bm{\upmu}_{j}^{\top}A_{ji}\mathbf{x}_{i})+(\omega_{i}\bm{\upmu}_{i}^{\top}A_{ij}\bm{\upmu}_{j}+\omega_{j}\bm{\upmu}_{j}^{\top}A_{ji}\bm{\upmu}_{i}). (93)

By the weighted zero-sum property given in Equation 2 in the main paper, this summation equals 0, yielding

L˙=iVωi[𝐱iv(𝐱i)μiv(𝛍i)](𝐱i𝛍i).\dot{L}=\sum_{i\in V}\omega_{i}\left[\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i})-\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})\right](\mathbf{x}_{i}-\bm{\upmu}_{i}). (94)

Note that the function vv is strictly concave, so its second derivative is negative definite. By this property, L˙0\dot{L}\leq 0, with equality only if 𝐱i=𝛍i,iV\mathbf{x}_{i}=\bm{\upmu}_{i},\forall i\in V, which corresponds to the QRE. Therefore, LL is a strict Lyapunov function, and the global asymptotic stability of the QRE follows. Q.E.D.

Remarks: Intuitively, the Lyapunov function defined above measures the distance between the QRE and a given set of beliefs. The idea of measuring this distance in terms of entropy-regularized payoffs is inspired by the seminal work [19]. However, in contrast to the network games considered in this paper, Hofbauer and Hopkins [19] consider SFP in two-player games. To our knowledge, there has so far been no systematic study of SFP in network games.

Proof of Theorem 4

The proof of Theorem 4 leverages seminal results on asymptotically autonomous dynamical systems [31, 40, 41], which are conventionally defined as follows.

Definition 1.

A nonautonomous system of differential equations in RnR^{n}

x=f(t,x)x^{\prime}=f(t,x) (95)

is said to be asymptotically autonomous with limit equation

y=g(y),y^{\prime}=g(y), (96)

if f(t,x)g(x),t,f(t,x)\to g(x),t\to\infty, where the convergence is uniform on each compact subset of RnR^{n}. Conventionally, the solution flow of Eq. 95 is called the asymptotically autonomous semiflow (denoted by ϕ\phi) and the solution flow of Eq. 96 is called the limit semiflow (denoted by Θ\Theta).

Based on this definition, we establish Lemma 1 in the main paper, which is repeated as follows.

Lemma 7.

For a system that initially has heterogeneous beliefs, the mean belief dynamics is asymptotically autonomous [31] with the limit equation

d𝛍idt=𝐱i𝛍i\displaystyle\frac{d\bm{\upmu}_{i}}{dt}=\mathbf{x}_{i}-\bm{\upmu}_{i} (97)

which after time-reparameterization is equivalent to the belief dynamics for homogeneous systems.

Proof.

We first time-reparameterize the mean belief dynamics of heterogeneous systems. Assume τ=lnλ+t+1λ+1\tau=\ln\frac{\lambda+t+1}{\lambda+1}. By the chain rule and Equation 54, for each dimension sis_{i},

dμ¯isidτ\displaystyle\frac{d\bar{\mu}_{is_{i}}}{d\tau} =dμ¯isidtdtdτ\displaystyle=\frac{d\bar{\mu}_{is_{i}}}{dt}\frac{dt}{d\tau} (98)
=[fsi(𝛍¯)μ¯isiλ+t+1+jVisjSj2fsi(𝛍¯)(μjsj)2Var(μjsj)2(λ+t+1)]d((λ+1)(exp(τ)1))dτ\displaystyle=\left[\frac{f_{s_{i}}(\bar{\bm{\upmu}})-\bar{\mu}_{is_{i}}}{\lambda+t+1}+\frac{\sum_{j\in V_{i}}\sum_{s_{j}\in S_{j}}\frac{\partial^{2}f_{s_{i}}(\bar{\bm{\upmu}})}{(\partial\mu_{js_{j}})^{2}}\text{Var}(\mu_{js_{j}})}{2(\lambda+t+1)}\right]\frac{d\left((\lambda+1)(\exp{(\tau)}-1)\right)}{d\tau} (99)
=fsi(𝛍¯)μ¯isi+12jVisjSj2fsi(𝛍¯)(μjsj)2(λ+1λ+t+1)2σ2(μjsj)λ+(λ+1)(exp(τ)1)+1(λ+1)exp(τ)\displaystyle=\frac{f_{s_{i}}(\bar{\bm{\upmu}})-\bar{\mu}_{is_{i}}+\frac{1}{2}\sum_{j\in V_{i}}\sum_{s_{j}\in S_{j}}\frac{\partial^{2}f_{s_{i}}(\bar{\bm{\upmu}})}{(\partial\mu_{js_{j}})^{2}}\left(\frac{\lambda+1}{\lambda+t+1}\right)^{2}\sigma^{2}(\mu_{js_{j}})}{\lambda+(\lambda+1)(\exp{(\tau)}-1)+1}\left(\lambda+1\right)\exp{(\tau)} (100)
=fsi(𝛍¯)μ¯isi+12jVisjSj2fsi(𝛍¯)(μjsj)2σ2(μjsj)exp(2τ).\displaystyle=f_{s_{i}}(\bar{\bm{\upmu}})-\bar{\mu}_{is_{i}}+\frac{1}{2}\sum_{j\in V_{i}}\sum_{s_{j}\in S_{j}}\frac{\partial^{2}f_{s_{i}}(\bar{\bm{\upmu}})}{(\partial\mu_{js_{j}})^{2}}\sigma^{2}(\mu_{js_{j}})\exp{(-2\tau)}. (101)

Here $\sigma^{2}(\mu_{js_j})$ denotes the initial variance of $\mu_{js_j}$; by Theorem 1, $\text{Var}(\mu_{js_j})=\left(\frac{\lambda+1}{\lambda+t+1}\right)^{2}\sigma^{2}(\mu_{js_j})=\sigma^{2}(\mu_{js_j})\exp{(-2\tau)}$, which gives Equation 100. Observe that $\exp{(-2\tau)}$ decays to zero exponentially fast and that both $\sigma^{2}(\mu_{js_j})$ and $\frac{\partial^{2}f_{s_i}(\bar{\bm{\upmu}})}{(\partial\mu_{js_j})^{2}}$ are bounded for every $\bar{\bm{\upmu}}$ in the product of simplices $\prod_{j\in V_i}\Delta_{j}$. Hence, Equation 101 converges locally and uniformly to the following equation:

dμ¯isidτ=fsi(𝛍¯)μ¯isi.\displaystyle\frac{d\bar{\mu}_{is_{i}}}{d\tau}=f_{s_{i}}(\bar{\bm{\upmu}})-\bar{\mu}_{is_{i}}. (102)

Note that xisi=fsi(𝛍¯)x_{is_{i}}=f_{s_{i}}(\bar{\bm{\upmu}}) for homogeneous systems, and the above equation is algebraically equivalent to Equation 97. Hence, by Definition 1, Equation 101 is asymptotically autonomous with the limit equation being Equation 97. ∎

By the above lemma, we can formally connect the limit behaviors of initially heterogeneous systems with those of homogeneous systems. Recall that Theorem 3 in the main paper states that under SFP, there is a unique rest point (QRE) of the belief dynamics in a weighted zero-sum network game Γ\Gamma; this excludes the case where finitely many equilibria are chained to each other. Hence, combining this with Lemma 2 in the main paper, we conclude that the mean belief dynamics of initially heterogeneous systems converges to the unique QRE. Q.E.D.

Appendix C: Results and Proofs omitted in Section 4.2

For the case of network coordination, we consider, for technical reasons, networks that consist of a single star or multiple disconnected stars. In Figure 5, we present examples of the considered network structures with different numbers of nodes (populations).

Figure 5: Population network games where the underlying network consists of star structures.

In the following theorem, focusing on homogeneous systems, we establish the convergence of the belief dynamics to the set of QRE.

Theorem 6 (Convergence in Homogeneous Network Coordination with Star Structure).

Given a coordination game Γ\Gamma where the network structure consists of a single star or multiple disconnected stars, each orbit of the belief dynamics for homogeneous systems converges to the set of QRE.

Proof.

Consider a root population jj of a star structure. Its set of leaf (neighbor) populations is VjV_{j}, the set of beliefs about the leaf populations is {𝛍i}iVj\{\bm{\upmu}_{i}\}_{i\in V_{j}}, and the choice distribution is 𝐱j\mathbf{x}_{j}. Given the game Γ\Gamma, the expected payoff is 𝐱jiVjAji𝛍i\mathbf{x}_{j}^{\top}\sum_{i\in V_{j}}A_{ji}\bm{\upmu}_{i}. Define a perturbed payoff function

πj(𝐱j,{𝛍i}iVj)𝐱jiVjAji𝛍i+v(𝐱j)\displaystyle\pi_{j}\left(\mathbf{x}_{j},\{\bm{\upmu}_{i}\}_{i\in V_{j}}\right)\coloneqq\mathbf{x}_{j}^{\top}\sum_{i\in V_{j}}A_{ji}\bm{\upmu}_{i}+v(\mathbf{x}_{j}) (103)

where v(𝐱j)=1βsjSjxjsjln(xjsj)v(\mathbf{x}_{j})=-\frac{1}{\beta}\sum_{s_{j}\in S_{j}}x_{js_{j}}\ln(x_{js_{j}}). Under this form of v(𝐱j)v(\mathbf{x}_{j}), the maximization of πj\pi_{j} yields the choice distribution 𝐱j\mathbf{x}_{j} from the logit choice function [8].

Consider a leaf population ii of the root population jj. It has only one neighbor population, which is population jj. Thus, given the game Γ\Gamma, the expected payoff is 𝐱iAij𝛍j\mathbf{x}_{i}^{\top}A_{ij}\bm{\upmu}_{j}. Define a perturbed payoff function

πi(𝐱i,𝛍j)𝐱iAij𝛍j+v(𝐱i)\displaystyle\pi_{i}\left(\mathbf{x}_{i},\bm{\upmu}_{j}\right)\coloneqq\mathbf{x}_{i}^{\top}A_{ij}\bm{\upmu}_{j}+v(\mathbf{x}_{i}) (104)

where v(𝐱i)=1βsiSixisiln(xisi)v(\mathbf{x}_{i})=-\frac{1}{\beta}\sum_{s_{i}\in S_{i}}x_{is_{i}}\ln(x_{is_{i}}). Similarly, the maximization of πi\pi_{i} yields the choice distribution 𝐱i\mathbf{x}_{i} from the logit choice function [8]. Based on this, we establish the following lemma.

Lemma 8.

For choice distributions of SFP in a population network game with star structure,

𝐱jπj(𝐱j,{𝛍i}iVj)=𝟎andiVj(Aji𝛍i)=𝐱jv(𝐱j)\displaystyle\partial_{\mathbf{x}_{j}}\pi_{j}\left(\mathbf{x}_{j},\{\bm{\upmu}_{i}\}_{i\in V_{j}}\right)=\mathbf{0}\quad\text{and}\quad\sum_{i\in V_{j}}\left(A_{ji}\bm{\upmu}_{i}\right)^{\top}=-\partial_{\mathbf{x}_{j}}v(\mathbf{x}_{j})\quad if jj is a root population, (105)
𝐱iπi(𝐱i,𝛍j)=𝟎and(Aij𝛍j)=𝐱iv(𝐱i)\displaystyle\partial_{\mathbf{x}_{i}}\pi_{i}\left(\mathbf{x}_{i},\bm{\upmu}_{j}\right)=\mathbf{0}\quad\text{and}\quad\left(A_{ij}\bm{\upmu}_{j}\right)^{\top}=-\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i})\quad if ii is a leaf population. (106)
Proof.

This lemma immediately follows from the fact that the maximization of πj\pi_{j} and πi\pi_{i} , respectively, yield the choice distributions 𝐱j\mathbf{x}_{j} and 𝐱i\mathbf{x}_{i} from the logit choice function [8]. ∎

For readability, we repeat the belief dynamics of a homogeneous population after time-reparameterization, which was established in Lemma 6 in Appendix B, as follows:

d𝛍idτ=𝐱i𝛍i.\displaystyle\frac{d\bm{\upmu}_{i}}{d\tau}=\mathbf{x}_{i}-\bm{\upmu}_{i}. (107)

Let V\mathcal{R}\subset V be the set of all root populations. We define

LjLjs.t.Lj\displaystyle L\coloneqq\sum_{j\in\mathcal{R}}L_{j}\quad\text{s.t.}\quad L_{j} 𝛍jiVjAji𝛍i+v(𝛍j)+iVjv(𝛍i).\displaystyle\coloneqq\bm{\upmu}_{j}^{\top}\sum_{i\in V_{j}}A_{ji}\bm{\upmu}_{i}+v(\bm{\upmu}_{j})+\sum_{i\in V_{j}}v(\bm{\upmu}_{i}). (108)

Consider the function LjL_{j}. Its time derivative L˙j\dot{L}_{j} is

L˙j=[𝛍j(𝛍jiVjAji𝛍i)𝛍˙j+iVj𝛍i(𝛍jiVjAji𝛍i)𝛍˙i]+μjv(𝛍j)𝛍˙j+iVjμiv(𝛍i)𝛍˙i\displaystyle\dot{L}_{j}=\left[\partial_{\bm{\upmu}_{j}}(\bm{\upmu}_{j}^{\top}\sum_{i\in V_{j}}A_{ji}\bm{\upmu}_{i})\dot{\bm{\upmu}}_{j}+\sum_{i\in V_{j}}\partial_{\bm{\upmu}_{i}}(\bm{\upmu}_{j}^{\top}\sum_{i\in V_{j}}A_{ji}\bm{\upmu}_{i})\dot{\bm{\upmu}}_{i}\right]+\partial_{\mathbf{\upmu}_{j}}v(\bm{\upmu}_{j})\dot{\bm{\upmu}}_{j}+\sum_{i\in V_{j}}\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})\dot{\bm{\upmu}}_{i} (109)
=iVj(Aji𝛍i)(𝐱j𝛍j)+[iVj𝛍jAji(𝐱i𝛍i)]+μjv(𝛍j)(𝐱j𝛍j)+iVjμiv(𝛍i)(𝐱i𝛍i).\displaystyle=\sum_{i\in V_{j}}(A_{ji}\bm{\upmu}_{i})^{\top}(\mathbf{x}_{j}-\bm{\upmu}_{j})+\left[\sum_{i\in V_{j}}\bm{\upmu}_{j}^{\top}A_{ji}(\mathbf{x}_{i}-\bm{\upmu}_{i})\right]+\partial_{\mathbf{\upmu}_{j}}v(\bm{\upmu}_{j})(\mathbf{x}_{j}-\bm{\upmu}_{j})+\sum_{i\in V_{j}}\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})(\mathbf{x}_{i}-\bm{\upmu}_{i}). (110)

Since Γ\Gamma is a coordination game, we have $(A_{ij}\bm{\upmu}_{j})^{\top}=\bm{\upmu}_{j}^{\top}A_{ij}^{\top}=\bm{\upmu}_{j}^{\top}A_{ji}$. Hence, applying Lemma 8, we can substitute $\sum_{i\in V_j}(A_{ji}\bm{\upmu}_{i})^{\top}$ with $-\partial_{\mathbf{x}_j}v(\mathbf{x}_{j})$, and $\bm{\upmu}_{j}^{\top}A_{ji}$ with $-\partial_{\mathbf{x}_i}v(\mathbf{x}_{i})$, yielding

L˙j=𝐱jv(𝐱j)(𝐱j𝛍j)+[iVj(𝐱iv(𝐱i))(𝐱i𝛍i)]+μjv(𝛍j)(𝐱j𝛍j)+iVjμiv(𝛍i)(𝐱i𝛍i)\displaystyle\dot{L}_{j}=-\partial_{\mathbf{x}_{j}}v(\mathbf{x}_{j})(\mathbf{x}_{j}-\bm{\upmu}_{j})+\left[\sum_{i\in V_{j}}(-\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i}))(\mathbf{x}_{i}-\bm{\upmu}_{i})\right]+\partial_{\mathbf{\upmu}_{j}}v(\bm{\upmu}_{j})(\mathbf{x}_{j}-\bm{\upmu}_{j})+\sum_{i\in V_{j}}\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})(\mathbf{x}_{i}-\bm{\upmu}_{i}) (111)
=(μjv(𝛍j)𝐱jv(𝐱j))(𝐱j𝛍j)+iVj(μiv(𝛍i)𝐱iv(𝐱i))(𝐱i𝛍i)\displaystyle=(\partial_{\mathbf{\upmu}_{j}}v(\bm{\upmu}_{j})-\partial_{\mathbf{x}_{j}}v(\mathbf{x}_{j}))(\mathbf{x}_{j}-\bm{\upmu}_{j})+\sum_{i\in V_{j}}(\partial_{\mathbf{\upmu}_{i}}v(\bm{\upmu}_{i})-\partial_{\mathbf{x}_{i}}v(\mathbf{x}_{i}))(\mathbf{x}_{i}-\bm{\upmu}_{i}) (112)

Note that the function vv is strictly concave, so its second derivative is negative definite. By this property, L˙j0\dot{L}_{j}\geq 0, with equality only if 𝐱i=𝛍i,iVj\mathbf{x}_{i}=\bm{\upmu}_{i},\forall i\in V_{j} and 𝐱j=𝛍j\mathbf{x}_{j}=\bm{\upmu}_{j}. Thus, the time derivative of the function LL satisfies L˙=jL˙j0\dot{L}=\sum_{j\in\mathcal{R}}\dot{L}_{j}\geq 0, with equality only if 𝐱i=𝛍i,iVj,𝐱j=𝛍j,j\mathbf{x}_{i}=\bm{\upmu}_{i},\forall i\in V_{j},\mathbf{x}_{j}=\bm{\upmu}_{j},\forall j\in\mathcal{R}. Since LL is continuous on the compact product of simplices and non-decreasing along every orbit, each orbit converges to the largest invariant subset of $\{\dot{L}=0\}$, which is exactly the set of rest points, i.e., the set of QRE. ∎

We generalize the convergence result to initially heterogeneous systems in the following theorem.

Theorem 7 (Convergence in Initially Heterogeneous Network Coordination with Star Structure).

Given a coordination game Γ\Gamma where the network structure consists of a single star or multiple disconnected stars, each orbit of the mean belief dynamics for initially heterogeneous systems converges to the set of QRE.

Proof.

The proof technique is similar to that for initially heterogeneous competitive network games. By Lemma 1 in the main paper, the mean belief dynamics of initially heterogeneous systems is asymptotically autonomous, with the limit equation given by the belief dynamics of homogeneous systems. Therefore, it follows from Lemma 2 in the main paper that the convergence result for homogeneous systems carries over to initially heterogeneous systems. ∎

Remarks: The convergence of SFP in coordination games and potential games has been established in 2-player settings [19] as well as in some n-player settings [20, 39]. Our work differs from these previous works in two aspects. First, our work allows for heterogeneous beliefs. Second, we consider agents that maintain separate beliefs about each neighboring population, whereas in the previous works agents do not distinguish between their opponents. Thus, even when the beliefs in the system are homogeneous, our setting is still different from (and more complicated than) the previous settings.

Appendix D: Omitted Experimental Details

Numerical Method for the PDE model.

PDEs are notoriously difficult to solve, and only limited classes of PDEs admit analytic solutions. Hence, similar to previous research [23], we resort to numerical methods for PDEs; in particular, we use the finite difference method [38].
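To illustrate the discretization, the following Python sketch applies a first-order upwind finite-difference scheme to a one-dimensional instance of the continuity-equation PDE (Equation 8 in the main paper). It is illustrative only and is not the released finitedifference.m: it tracks a single belief coordinate on [0, 1], uses a hypothetical 2x2 payoff matrix, and, purely for simplicity, computes the drift from a logit response to the density's own mean belief (in the full model the drift couples the densities of neighboring populations).

```python
import numpy as np

# Upwind finite-difference sketch for dp/dt = -d/dmu [ p(mu,t) (xbar - mu)/(lam + t + 1) ].
beta, lam = 10.0, 10.0
A = np.array([[4.0, 0.0], [3.0, 3.0]])     # hypothetical payoffs (rows: own strategy)

K = 200                                    # number of grid cells on [0, 1]
dmu = 1.0 / K
centers = (np.arange(K) + 0.5) * dmu       # cell centers
edges = np.arange(K + 1) * dmu             # cell edges

# Initial belief density proportional to a Beta(14, 6) density, normalized on the grid.
p = centers**13 * (1.0 - centers)**5
p /= p.sum() * dmu

def xbar(mean_mu):
    """Logit response (probability of the first strategy) to the mean belief."""
    u = beta * (A @ np.array([mean_mu, 1.0 - mean_mu]))
    z = np.exp(u - u.max())
    return (z / z.sum())[0]

dt, T, t = 1e-3, 50.0, 0.0
while t < T:
    mean_mu = (p * centers).sum() * dmu
    v = (xbar(mean_mu) - edges) / (lam + t + 1)      # drift velocity at cell edges
    F = np.zeros(K + 1)                              # upwind fluxes; zero flux at the boundary
    F[1:-1] = np.where(v[1:-1] > 0, v[1:-1] * p[:-1], v[1:-1] * p[1:])
    p = p - dt / dmu * (F[1:] - F[:-1])
    t += dt

print("final mean belief:", (p * centers).sum() * dmu)
```

The small time step keeps the scheme within its CFL stability limit; zero-flux boundaries conserve the total probability mass, consistent with the density remaining zero at the boundary of the simplex.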

Agent-based Simulations.

The presented simulation results are averaged over 100 independent simulation runs to smooth out the randomness. For each simulation run, there are 1,0001,000 agents in each population. For each agent, the initial beliefs are sampled from the given initial probability distribution.

Detailed Experimental Setups for Figure 1.

In the case of small initial variance, the initial beliefs $\mu_{1H}$ and $\mu_{2H}$ are distributed according to $\text{Beta}(280,120)$. By contrast, in the case of large initial variance, the initial beliefs $\mu_{1H}$ and $\mu_{2H}$ are distributed according to $\text{Beta}(14,6)$. Thus, in both cases, the initial mean beliefs are $\bar{\mu}_{1H}=\bar{\mu}_{2H}=0.7$ and $\bar{\mu}_{1S}=\bar{\mu}_{2S}=0.3$. In both cases, the initial sum of weights is $\lambda=10$ and the temperature is $\beta=10$.
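As a complement to the setup above, the following Python sketch illustrates one agent-based simulation run in the large-initial-variance case, with λ = β = 10 and initial beliefs drawn from Beta(14, 6). It is illustrative only: the 2x2 payoff matrix is a hypothetical stag hunt (the exact payoffs are given in the main paper), and agents update toward the populations' mean logit responses rather than toward sampled actions.

```python
import numpy as np

# One agent-based simulation run: two populations of SFP agents in a 2x2 coordination game.
rng = np.random.default_rng(0)
n_agents, lam, beta, T = 1000, 10.0, 10.0, 2000
A = np.array([[4.0, 0.0], [3.0, 3.0]])       # hypothetical stag hunt (rows: own H/S, cols: opponent H/S)

# Beliefs that the other population plays H: mu_about_1 is held by population-2 agents
# (their beliefs about population 1), and vice versa for mu_about_2.
mu_about_1 = rng.beta(14, 6, n_agents)
mu_about_2 = rng.beta(14, 6, n_agents)

def prob_H(beliefs):
    """Each agent's logit probability of playing H, given its own belief."""
    u = beta * (A @ np.vstack([beliefs, 1.0 - beliefs]))   # 2 x n_agents expected payoffs
    u -= u.max(axis=0)
    z = np.exp(u)
    return z[0] / z.sum(axis=0)

for t in range(T):
    xbar_1 = prob_H(mu_about_2).mean()       # mean strategy of population 1
    xbar_2 = prob_H(mu_about_1).mean()       # mean strategy of population 2
    mu_about_1 += (xbar_1 - mu_about_1) / (lam + t + 1)
    mu_about_2 += (xbar_2 - mu_about_2) / (lam + t + 1)

print("final mean beliefs about H:", mu_about_1.mean(), mu_about_2.mean())
print("final belief variances:", mu_about_1.var(), mu_about_2.var())
```

The printed variances shrink over time, illustrating the belief homogenization established in Theorem 1.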

Detailed Experimental Setups for Figure 3.

We visualize the regions of attraction of different equilibria in stag hunt games by numerically solving the mean belief dynamics (Equation 10 in the main paper). The initial variances have been given in the title of each panel. In all cases, the initial sum of weights λ=0\lambda=0 and the temperature β=5\beta=5.

Detailed Experimental Setups for Figure 4.

We let the initial beliefs about populations 1, 3 and 5 remain unchanged across different cases, and vary the initial beliefs about populations 2 and 4. The initial beliefs about populations 1, 3 and 5, denoted by μ1H\mu_{1H}, μ3H\mu_{3H} and μ5H\mu_{5H}, are distributed according to the distributions Beta(20,10)\text{Beta}(20,10), Beta(6,4)\text{Beta}(6,4), and Beta(10,5)\text{Beta}(10,5), respectively. The initial beliefs about populations 2 and 4 have been given in the legends of Figure 4. In all cases, the initial sum of weights λ=10\lambda=10 and the temperature β=10\beta=10. Note that μiT=1μiH\mu_{iT}=1-\mu_{iH} for all populations i=1,2,3,4,5.i=1,2,3,4,5.

Source Code and Computing Resource.

We have attached the source code for reproducing our main experiments. The Matlab script finitedifference.m numerically solves our PDE model presented in Proposition 1 in the main paper. The Matlab script regionofattraction.m visualizes the region of attraction of different equilibria in stag hunt games, which are presented in Figure 3. The Python scripts simulation(staghunt).py and simulation(matchingpennies).py correspond to the agent-based simulations in two-population stag hunt games and five-population asymmetric matching pennies games, respectively. We use a laptop (CPU: AMD Ryzen 7 5800H) to run all the experiments.

References

  • [1] Yakov Babichenko and Aviad Rubinstein. Settling the complexity of nash equilibrium in congestion games. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1426–1437, 2021.
  • [2] Michel Benaïm and Mathieu Faure. Consistency of vanishingly smooth fictitious play. Mathematics of Operations Research, 38(3):437–450, 2013.
  • [2] Michel Benaïm and Mathieu Faure. Consistency of vanishingly smooth fictitious play. Mathematics of Operations Research, 38(3):437–450, 2013.
  • [4] Shant Boodaghians, Rucha Kulkarni, and Ruta Mehta. Smoothed efficient algorithms and reductions for network coordination games. arXiv preprint arXiv:1809.02280, 2018.
  • [5] Yang Cai and Constantinos Daskalakis. On minmax theorems for multiplayer games. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete algorithms, pages 217–234. SIAM, 2011.
  • [6] Aleksander Czechowski and Georgios Piliouras. Poincar\\backslash{\{e}\}-bendixson limit sets in multi-agent learning. 2022.
  • [7] Christian Ewerhart and Kremena Valkanova. Fictitious play in networks. Games and Economic Behavior, 123:182–206, 2020.
  • [8] Drew Fudenberg, Fudenberg Drew, David K Levine, and David K Levine. The theory of learning in games, volume 2. MIT press, 1998.
  • [9] Drew Fudenberg and David M Kreps. Learning mixed equilibria. Games and economic behavior, 5(3):320–367, 1993.
  • [10] Drew Fudenberg and David K Levine. Steady state learning and nash equilibrium. Econometrica: Journal of the Econometric Society, pages 547–573, 1993.
  • [11] Drew Fudenberg and David K Levine. Measuring players’ losses in experimental games. The Quarterly Journal of Economics, 112(2):507–536, 1997.
  • [12] Drew Fudenberg and Satoru Takahashi. Heterogeneous beliefs and local information in stochastic fictitious play. Games and Economic Behavior, 71(1):100–120, 2011.
  • [13] Colin S Gillespie. Moment-closure approximations for mass-action models. IET systems biology, 3(1):52–58, 2009.
  • [14] Jacob K Goeree, Charles A Holt, and Thomas R Palfrey. Quantal response equilibrium. In Quantal Response Equilibrium. Princeton University Press, 2016.
  • [15] Leo A Goodman. Population growth of the sexes. Biometrics, 9(2):212–225, 1953.
  • [16] Shubham Gupta, Rishi Hazra, and Ambedkar Dukkipati. Networked multi-agent reinforcement learning with emergent communication. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 1858–1860, 2020.
  • [17] Jiequn Han and Ruimeng Hu. Deep fictitious play for finding markovian nash equilibrium in multi-agent games. In Mathematical and Scientific Machine Learning, pages 221–245. PMLR, 2020.
  • [18] Erik Hernández, Antonio Barrientos, and Jaime Del Cerro. Selective smooth fictitious play: An approach based on game theory for patrolling infrastructures with a multi-robot system. Expert Systems with Applications, 41(6):2897–2913, 2014.
  • [19] Josef Hofbauer and Ed Hopkins. Learning in perturbed asymmetric games. Games and Economic Behavior, 52(1):133–152, 2005.
  • [20] Josef Hofbauer and William H Sandholm. On the global convergence of stochastic fictitious play. Econometrica, 70(6):2265–2294, 2002.
  • [21] Josef Hofbauer and William H Sandholm. Evolution in games with randomly disturbed payoffs. Journal of economic theory, 132(1):47–69, 2007.
  • [22] Ed Hopkins. A note on best response dynamics. Games and Economic Behavior, 29(1-2):138–150, 1999.
  • [23] Shuyue Hu, Chin-wing Leung, and Ho-fung Leung. Modelling the dynamics of multiagent q-learning in repeated symmetric games: a mean field theoretic approach. In Advances in Neural Information Processing Systems, pages 12102–12112, 2019.
  • [24] Shuyue Hu, Chin-Wing Leung, Ho-fung Leung, and Harold Soh. The dynamics of q-learning in population games: A physics-inspired continuity equation model. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, page 615–623, 2022.
  • [25] Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
  • [26] KG Binmore AP Kirman et al. Frontiers of game theory. Mit Press, 1993.
  • [27] Ratul Lahkar and Robert M Seymour. Reinforcement learning in population games. Games and Economic Behavior, 80:10–38, 2013.
  • [28] Stefanos Leonardos and Georgios Piliouras. Exploration-exploitation in multi-agent learning: Catastrophe theory meets game theory. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11263–11271, 2021.
  • [29] Chin-Wing Leung, Shuyue Hu, and Ho-Fung Leung. Self-play or group practice: Learning to play alternating markov game in multi-agent system. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 9234–9241. IEEE, 2021.
  • [30] Yiwei Liu, Jiamou Liu, Kaibin Wan, Zhan Qin, Zijian Zhang, Bakhadyr Khoussainov, and Liehuang Zhu. From local to global norm emergence: Dissolving self-reinforcing substructures with incremental social instruments. In International Conference on Machine Learning, pages 6871–6881. PMLR, 2021.
  • [31] L Markus. Asymptotically autonomous differential systems. contributions to the theory of nonlinear oscillations iii (s. lefschetz, ed.), 17-29. Annals of Mathematics Studies, 36, 1956.
  • [32] Timothy I Matis and Ivan G Guardiola. Achieving moment closure through cumulant neglect. The Mathematica Journal, 12:12–2, 2010.
  • [33] Richard D McKelvey and Thomas R Palfrey. Quantal response equilibria for normal form games. Games and economic behavior, 10(1):6–38, 1995.
  • [34] Sai Ganesh Nagarajan, David Balduzzi, and Georgios Piliouras. From chaos to order: Symmetry and conservation laws in game dynamics. In International Conference on Machine Learning, pages 7186–7196. PMLR, 2020.
  • [35] Gerasimos Palaiopanos, Ioannis Panageas, and Georgios Piliouras. Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos. Advances in Neural Information Processing Systems, 30, 2017.
  • [36] Julien Perolat, Bilal Piot, and Olivier Pietquin. Actor-critic fictitious play in simultaneous move multistage games. In International Conference on Artificial Intelligence and Statistics, pages 919–928. PMLR, 2018.
  • [37] Sarah Perrin, Julien Pérolat, Mathieu Laurière, Matthieu Geist, Romuald Elie, and Olivier Pietquin. Fictitious play for mean field games: Continuous time analysis and applications. Advances in Neural Information Processing Systems, 33:13199–13213, 2020.
  • [38] Gordon Dennis Smith. Numerical solution of partial differential equations: finite difference methods. Clarendon Press, 1985.
  • [39] Brian Swenson and H Vincent Poor. Smooth fictitious play in n×\times 2 potential games. In 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pages 1739–1743. IEEE, 2019.
  • [40] Horst R Thieme. Convergence results and a poincaré-bendixson trichotomy for asymptotically autonomous differential equations. Journal of mathematical biology, 30(7):755–763, 1992.
  • [41] Horst R Thieme. Asymptotically autonomous differential equations in the plane. The Rocky Mountain Journal of Mathematics, pages 351–380, 1994.
  • [42] Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, and Andreea Minca. Provable fictitious play for general mean-field games. arXiv preprint arXiv:2010.04211, 2020.
  • [43] Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, and Tamer Basar. Fully decentralized multi-agent reinforcement learning with networked agents. In International Conference on Machine Learning, pages 5872–5881. PMLR, 2018.
  • [44] Rui Zhao, Jinming Song, Hu Haifeng, Yang Gao, Yi Wu, Zhongqian Sun, and Yang Wei. Maximum entropy population based training for zero-shot human-ai coordination. arXiv preprint arXiv:2112.11701, 2021.