Near-Optimal No-Regret Learning in General Games
Abstract
We show that Optimistic Hedge – a common variant of multiplicative-weights-updates with recency bias – attains $\mathrm{poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is $\mathrm{poly}(\log T)$. Our bound improves, exponentially, the $\Theta(\sqrt{T})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias [SALS15], and the $O(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games [CP20]. A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}(1/T)$.
1 Introduction
Online learning has a long history that is intimately related to the development of game theory, convex optimization, and machine learning. One of its earliest instantiations can be traced to Brown's proposal [Bro49] of fictitious play as a method to solve two-player zero-sum games. Indeed, as shown by [Rob51], when the players of a (zero-sum) matrix game use fictitious play to iteratively update their actions in response to each other's history of play, the resulting dynamics converge in the following sense: the product of the empirical distributions of strategies for each player converges to the set of Nash equilibria in the game, though the rate of convergence is now known to be exponentially slow [DP14]. Moreover, such convergence to Nash equilibria fails in non-zero-sum games [Sha64].
The slow convergence of fictitious play to Nash equilibria in zero-sum matrix games and its non-convergence in general-sum games can be mitigated by appealing to the pioneering works [Bla54, Han57] and the ensuing literature on no-regret learning [CBL06]. It is known that if both players of a zero-sum matrix game experience regret that is at most $R(T)$ after $T$ rounds, the product of the players' empirical distributions of strategies is an $(R(T)/T)$-approximate Nash equilibrium. More generally, if each player of a general-sum, multi-player game experiences regret that is at most $R(T)$, the empirical distribution of joint strategies converges to a coarse correlated equilibrium (in general-sum games, it is typical to focus on proving convergence rates for weaker types of equilibrium than Nash, such as coarse correlated equilibria, since finding Nash equilibria is PPAD-complete [DGP06, CDT09]) of the game, at a rate of $R(T)/T$. Importantly, a multitude of online learning algorithms, such as the celebrated Hedge and Follow-The-Perturbed-Leader algorithms, guarantee $O(\sqrt{T})$ adversarial regret [CBL06]. Thus, when such algorithms are employed by all players in a game, their regret implies convergence to coarse correlated equilibria (and Nash equilibria of matrix games) at a rate of $O(1/\sqrt{T})$.
While standard no-regret learners guarantee $O(\sqrt{T})$ regret for each player in a game, the players can do better by employing specialized no-regret learning procedures. Indeed, it was established by [DDK11] that there exists a somewhat complex no-regret learner based on Nesterov's excessive gap technique [Nes05], which guarantees $O(\log T)$ regret to each player of a two-player zero-sum game. This represents an exponential improvement over the $\Theta(\sqrt{T})$ regret guaranteed by standard no-regret learners. More generally, [SALS15] established that if players of a multi-player, general-sum game use any algorithm from the family of Optimistic Mirror Descent (MD) or Optimistic Follow-the-Regularized-Leader (FTRL) algorithms (which are analogues of the MD and FTRL algorithms, respectively, with recency bias), each player enjoys regret that is $O(T^{1/4})$. This was recently improved by [CP20] to $O(T^{1/6})$ in the special case of two-player games in which the players use Optimistic Hedge, a particularly simple representative from both the Optimistic MD and Optimistic FTRL families.
The above results for general-sum games represent significant improvements over the $\Theta(\sqrt{T})$ regret attainable by standard no-regret learners, but are not as dramatic as the logarithmic regret that has been shown attainable by no-regret learners, albeit more complex ones, in 2-player zero-sum games (e.g., [DDK11]). Indeed, despite extensive work on no-regret learning, understanding the optimal regret that can be guaranteed by no-regret learning algorithms in general-sum games has remained elusive. This question is especially intriguing in light of experiments suggesting that polylogarithmic regret should be attainable [SALS15, HAM21]. In this paper we settle this question by showing that no-regret learners can guarantee polylogarithmic regret to each player in general-sum multi-player games. Moreover, this regret is attainable by a particularly simple algorithm – Optimistic Hedge:
Table 1:

| Algorithm | Setting | Regret in games | Adversarial regret |
|---|---|---|---|
| Hedge (& many other algs.) | multi-player, general-sum | $\Theta(\sqrt{T})$ [CBL06] | $O(\sqrt{T})$ [CBL06] |
| Excessive Gap Technique | 2-player, 0-sum | $O(\log T)$ [DDK11] | $O(\sqrt{T})$ [DDK11] |
| DS-OptMD, OptDA | 2-player, 0-sum | $\mathrm{poly}(\log T)$ [HAM21] | $O(\sqrt{T})$ [HAM21] |
| Optimistic Hedge | multi-player, general-sum | $O(T^{1/4})$ [RS13b, SALS15] | $O(\sqrt{T})$ [RS13b, SALS15] |
| Optimistic Hedge | 2-player, general-sum | $O(T^{1/6})$ [CP20] | |
| Optimistic Hedge | multi-player, general-sum | $\mathrm{poly}(\log T)$ (Theorem 3.1) | $O(\sqrt{T})$ (Corollary D.1) |
Theorem 1.1 (Abbreviated version of Theorem 3.1).
Suppose that $n$ players play a general-sum multi-player game, with a finite set of strategies per player, over $T$ rounds. Suppose also that each player uses Optimistic Hedge to update her strategy in every round, as a function of the history of play so far. Then each player experiences $\mathrm{poly}(\log T)$ regret.
An immediate corollary of Theorem 1.1 is that the empirical distribution of play is a $\mathrm{poly}(\log T)/T$-approximate coarse correlated equilibrium (CCE) of the game. We remark that Theorem 1.1 bounds the total regret experienced by each player of the multi-player game, which is the most standard regret objective for no-regret learning in games, and which is essential to achieve convergence to CCE. For the looser objective of the average of all players' regrets, [RS13b] established an $O(\log T)$ bound for Optimistic Hedge in two-player zero-sum games, and [SALS15] generalized this bound to $n$-player general-sum games. Note that since some players may experience negative regret [HAM21], the average of the players' regrets cannot be used in general to bound the maximum regret experienced by any individual player. Finally, we remark that several results in the literature posit no-regret learning as a model of agents' rational behavior; for instance, [Rou09, ST13, RST17] show that no-regret learners in smooth games enjoy strong Price-of-Anarchy bounds. By showing that each agent can obtain very small regret in games by playing Optimistic Hedge, Theorem 1.1 strengthens the plausibility of the common assumption made in this literature that each agent will choose to use such a no-regret algorithm.
1.1 Related work
Table 1 summarizes the prior works that aim to establish optimal regret bounds for no-regret learners in games. We remark that [CP20] shows that the regret of Hedge is $\Omega(\sqrt{T})$ even in 2-player games where each player has 2 actions, meaning that optimism is necessary to obtain fast rates. The table also includes a recent result of [HAM21] showing that when the players in a 2-player zero-sum game with $d$ actions per player use a variant of Optimistic Hedge with adaptive step size (a special case of their algorithms DS-OptMD and OptDA), each player has polylogarithmic regret. The techniques of [HAM21] differ substantially from ours: the result in [HAM21] is based on showing that the joint strategies rapidly converge, pointwise, to a Nash equilibrium. Such a result seems very unlikely to extend to our setting of general-sum games, since finding an approximate Nash equilibrium even in 2-player games is PPAD-complete [CDT09]. We also remark that the earlier work [KHSC18] shows that each player obtains a comparable regret bound when using a certain algorithm based on Optimistic MD in 2-player zero-sum games; their technique is heavily tailored to 2-player zero-sum games, relying on the notion of duality in such a setting.
[FLL+16] shows that one can obtain fast rates in games for a broader class of algorithms (e.g., including Hedge) if one adopts a relaxed (approximate) notion of optimality. [WL18] uses optimism to obtain adaptive regret bounds for bandit problems. Many recent papers (e.g., [DP19, GPD20, LGNPw21, HAM21, WLZL21, AIMM21]) have studied the last-iterate convergence of algorithms from the Optimistic Mirror Descent family, which includes Optimistic Hedge. Finally, a long line of papers (e.g., [HMcW+03, DFP+10, KLP11, BCM12, PP16, BP18, MPP18, BP19, CP19, VGFL+20]) has studied the dynamics of learning algorithms in games. Essentially none of these papers uses optimism, and many of them show non-convergence (e.g., divergence or recurrence) of the iterates of various learning algorithms such as FTRL and Mirror Descent when used in games.
2 Preliminaries
Notation.
For a positive integer $m$, let $[m] := \{1, 2, \ldots, m\}$. For a finite set $\mathcal{S}$, let $\Delta^{\mathcal{S}}$ denote the space of distributions on $\mathcal{S}$. For $m \in \mathbb{N}$, we will write $\Delta^m := \Delta^{[m]}$ and interpret elements of $\Delta^m$ as vectors in $\mathbb{R}^m$. For a vector $x \in \mathbb{R}^m$ and $a \in [m]$, we denote the $a$th coordinate of $x$ as $x(a)$. For vectors $x, y \in \mathbb{R}^m$, write $\langle x, y \rangle := \sum_{a \in [m]} x(a) y(a)$. The base-2 logarithm of $x > 0$ is denoted $\log x$.
No-regret learning in games.
We consider a game with $n$ players, where player $i \in [n]$ has action space $\mathcal{A}_i$ with $|\mathcal{A}_i| = d_i$ actions. We may assume without loss of generality that $\mathcal{A}_i = [d_i]$ for each player $i$. The joint action space is $\mathcal{A} := \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$. The specification of the game is completed by a collection of loss functions $\mathcal{L}_i : \mathcal{A} \to [0, 1]$, one per player $i \in [n]$. For an action profile $a = (a_1, \ldots, a_n) \in \mathcal{A}$ and $i \in [n]$, $\mathcal{L}_i(a)$ is the loss player $i$ experiences when each player $j \in [n]$ plays $a_j$. A mixed strategy for player $i$ is a distribution $x_i \in \Delta^{\mathcal{A}_i}$ over $\mathcal{A}_i$, with the probability of playing action $a_i$ given by $x_i(a_i)$. Given a mixed strategy profile $x = (x_1, \ldots, x_n)$ (or an action profile $a = (a_1, \ldots, a_n)$) and a player $i$, we let $x_{-i}$ (or $a_{-i}$, respectively) denote the profile after removing the $i$th mixed strategy $x_i$ (or the $i$th action $a_i$, respectively).
The players play the game for a total of $T$ rounds. At the beginning of each round $t \in [T]$, each player $i$ chooses a mixed strategy $x_i^{(t)} \in \Delta^{\mathcal{A}_i}$. The loss vector of player $i$, denoted $\ell_i^{(t)} \in [0, 1]^{\mathcal{A}_i}$, is defined as $\ell_i^{(t)}(a_i) := \mathbb{E}_{a_{-i} \sim x_{-i}^{(t)}}\left[\mathcal{L}_i(a_i, a_{-i})\right]$. As a matter of convention, set $\ell_i^{(0)}$ to be the all-zeros vector. We consider the full-information setting in this paper, meaning that player $i$ observes its full loss vector $\ell_i^{(t)}$ for each round $t$. Finally, player $i$ experiences a loss of $\langle x_i^{(t)}, \ell_i^{(t)} \rangle$. The goal of each player is to minimize its regret, defined as:

$\mathrm{Reg}_{i,T} := \sum_{t=1}^{T} \langle x_i^{(t)}, \ell_i^{(t)} \rangle - \min_{a \in \mathcal{A}_i} \sum_{t=1}^{T} \ell_i^{(t)}(a).$
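For concreteness, here is a minimal sketch (ours, not the paper's code) of this regret computation in Python:

```python
import numpy as np

def regret(strategies, losses):
    """Regret of a single player over T rounds.

    strategies: (T, d) array; row t is the mixed strategy x_i^(t).
    losses:     (T, d) array; row t is the loss vector ell_i^(t).
    Returns sum_t <x_i^(t), ell_i^(t)> minus the loss of the best
    fixed action in hindsight.
    """
    incurred = float(np.sum(strategies * losses))    # sum of <x^(t), ell^(t)>
    best_fixed = float(np.min(losses.sum(axis=0)))   # best single action in hindsight
    return incurred - best_fixed
```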
Optimistic hedge.
The Optimistic Hedge algorithm chooses mixed strategies for player $i$ as follows: at time $t = 1$, it sets $x_i^{(1)}$ to be the uniform distribution on $\mathcal{A}_i$. Then for all $t \ge 1$, player $i$'s strategy at iteration $t + 1$ is defined as follows, for a step size $\eta > 0$ and each $a \in \mathcal{A}_i$:
$x_i^{(t+1)}(a) := \frac{x_i^{(t)}(a) \cdot \exp\left(-\eta\left(2\ell_i^{(t)}(a) - \ell_i^{(t-1)}(a)\right)\right)}{\sum_{a' \in \mathcal{A}_i} x_i^{(t)}(a') \cdot \exp\left(-\eta\left(2\ell_i^{(t)}(a') - \ell_i^{(t-1)}(a')\right)\right)}. \qquad (1)$
Optimistic Hedge is a modification of Hedge, which performs the updates $x_i^{(t+1)}(a) \propto x_i^{(t)}(a) \cdot \exp(-\eta\, \ell_i^{(t)}(a))$. The update (1) modifies the Hedge update by replacing the loss vector $\ell_i^{(t)}$ with a predictor of the following iteration's loss vector, namely $2\ell_i^{(t)} - \ell_i^{(t-1)} = \ell_i^{(t)} + (\ell_i^{(t)} - \ell_i^{(t-1)})$. Hedge corresponds to FTRL with a negative entropy regularizer (see, e.g., [Bub15]), whereas Optimistic Hedge corresponds to Optimistic FTRL with a negative entropy regularizer [RS13b, RS13a].
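To illustrate update (1) side by side with the vanilla Hedge update, here is a short sketch (our code, under the conventions above, with $\ell_i^{(0)} = 0$):

```python
import numpy as np

def hedge_step(x, loss, eta):
    """Vanilla Hedge: x(a) is reweighted by exp(-eta * loss(a))."""
    w = x * np.exp(-eta * loss)
    return w / w.sum()

def optimistic_hedge_step(x, loss, prev_loss, eta):
    """Optimistic Hedge: the most recent loss also serves as a predictor
    of the next one, so the exponent uses 2*loss - prev_loss (cf. (1))."""
    w = x * np.exp(-eta * (2.0 * loss - prev_loss))
    return w / w.sum()

# One step from the uniform distribution (prev_loss = 0 by convention).
x = np.full(3, 1.0 / 3.0)
loss = np.array([0.2, 0.9, 0.5])
print(optimistic_hedge_step(x, loss, np.zeros(3), eta=0.1))
```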
Distributions & divergences.
For distributions $p, q$ on a finite domain $\mathcal{A}$, the KL divergence between $p$ and $q$ is $\mathrm{KL}(p \,\|\, q) := \sum_{a \in \mathcal{A}} p(a) \log \frac{p(a)}{q(a)}$. The chi-squared divergence between $p$ and $q$ is $\chi^2(p \,\|\, q) := \sum_{a \in \mathcal{A}} \frac{(p(a) - q(a))^2}{q(a)}$. For a distribution $x$ on $\mathcal{A}$ and a vector $\ell \in \mathbb{R}^{\mathcal{A}}$, we write $\langle x, \ell \rangle = \sum_{a \in \mathcal{A}} x(a) \ell(a)$. Also define $\|\ell\|_x := \sqrt{\sum_{a \in \mathcal{A}} x(a)\, \ell(a)^2}$. If $x$ further has full support, then define $\|\ell\|_{x,*} := \sqrt{\sum_{a \in \mathcal{A}} \ell(a)^2 / x(a)}$. The above notations will often be used when $x$ is the mixed strategy $x_i^{(t)}$ for some player $i$ and $\ell$ is a loss vector $\ell_i^{(t)}$; in such a case the norms $\|\cdot\|_x$ and $\|\cdot\|_{x,*}$ are often called local norms.
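A small sketch (ours) of these quantities, following the conventions reconstructed above; note the paper's logarithm is base 2, whereas natural log is used below, which changes constants only:

```python
import numpy as np

def kl(p, q):
    """KL divergence: sum_a p(a) * log(p(a) / q(a)), with 0 * log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi_squared(p, q):
    """Chi-squared divergence: sum_a (p(a) - q(a))^2 / q(a)."""
    return float(np.sum((p - q) ** 2 / q))

def local_norm(x, v):
    """Local norm of v at the distribution x: sqrt(sum_a x(a) * v(a)^2)."""
    return float(np.sqrt(np.sum(x * v ** 2)))
```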
3 Results
Below we state our main theorem, which shows that when all players in a game play according to Optimistic Hedge with appropriate step size, they all experience polylogarithmic individual regrets.
Theorem 3.1 (Formal version of Theorem 1.1).
There are constants $C, C' > 0$ so that the following holds. Suppose a time horizon $T \in \mathbb{N}$ and a game with $n$ players and $d_i$ actions for each player $i \in [n]$ is given. Suppose all players play according to Optimistic Hedge with any positive step size $\eta \le \frac{1}{C n \log^4 T}$. Then for any $i \in [n]$, the regret of player $i$ satisfies
(2)
In particular, if the players' step size is chosen as $\eta = \frac{1}{C n \log^4 T}$, then the regret of player $i$ satisfies
$\mathrm{Reg}_{i,T} \le O\left(n \log d_i \cdot \log^4 T\right). \qquad (3)$
A common goal in the literature on learning in games is to obtain an algorithm that achieves fast rates when played by all players, and so that each player still obtains the optimal rate of $O(\sqrt{T})$ in the adversarial setting (i.e., when player $i$ receives an arbitrary sequence of losses $\ell_i^{(1)}, \ldots, \ell_i^{(T)}$). We show in Corollary D.1 (in the appendix) that by running Optimistic Hedge with an adaptive step size, this is possible. Table 1 compares our regret bounds discussed in this section to those of prior work.
4 Proof overview
In this section we overview the proof of Theorem 3.1; the full proof may be found in the appendix.
4.1 New adversarial regret bound
The first step in the proof of Theorem 3.1 is to prove a new regret bound (Lemma 4.1 below) for Optimistic Hedge that holds for an adversarial sequence of losses. We will show in later sections that when all players play according to Optimistic Hedge, the right-hand side of the regret bound (4) is bounded by a quantity that grows only poly-logarithmically in $T$.
Lemma 4.1.
There is a constant $C > 0$ so that the following holds. Suppose any player $i$ follows the Optimistic Hedge updates (1) with step size $\eta > 0$, for an arbitrary sequence of losses $\ell_i^{(1)}, \ldots, \ell_i^{(T)} \in [0,1]^{\mathcal{A}_i}$. Then
$\mathrm{Reg}_{i,T} \le \frac{\log d_i}{\eta} + C\eta \sum_{t=1}^{T} \left\|\ell_i^{(t)} - \ell_i^{(t-1)}\right\|_{x_i^{(t)}}^2 - \frac{1}{C\eta} \sum_{t=1}^{T-1} \left\|x_i^{(t+1)} - x_i^{(t)}\right\|_{x_i^{(t)}}^2. \qquad (4)$
The detailed proof of Lemma 4.1 can be found in Section A, but we sketch the main steps here. The starting point is a refinement of [RS13a, Lemma 3] (stated as Lemma A.5), which gives an upper bound for the regret in terms of local norms corresponding to each of the iterates of Optimistic Hedge. The bound involves the difference between the Optimistic Hedge iterates and auxiliary iterates defined by:
(5)
We next show (in Lemma A.2) that and may be lower bounded by and , respectively. Note it is a standard fact that the KL divergence between two distributions is upper bounded by the chi-squared divergence between them; by contrast, Lemma A.2 can exploit that , and are close to each other to show a reverse inequality. Finally, exploiting the exponential-weights-style functional relationship between and , we show (in Lemma A.3) that the $\chi^2$-divergence may be lower bounded by , leading to the term being subtracted in (4). The $\chi^2$-divergence , as well as the term in (5), are bounded in a similar manner to obtain (4).
4.2 Finite differences
Given Lemma 4.1, in order to establish Theorem 3.1, it suffices to show Lemma 4.2 below. Indeed, (6) below implies that the right-hand side of (4) is bounded above by its first term $\log d_i / \eta$, up to lower-order terms, which is bounded above by the bound of Theorem 3.1 for the choice of $\eta$ there. (Notice that the factor in (6) is not important for this argument – any constant less than 1 would suffice.)
Lemma 4.2 (Abbreviated; detailed version in Section C.3).
Suppose all players play according to Optimistic Hedge with step size satisfying for a sufficiently large constant . Then for any , the losses for player satisfy:
(6)
The definition below allows us to streamline our notation when proving Lemma 4.2.
Definition 4.1 (Finite differences).
Suppose $s = (s_1, \ldots, s_T)$ is a sequence of vectors in $\mathbb{R}^d$. For integers $h \ge 0$, the order-$h$ finite difference sequence for the sequence $s$, denoted by $\mathsf{D}^h s$, is the sequence defined recursively as: $(\mathsf{D}^0 s)_t := s_t$ for all $t \in [T]$, and

$(\mathsf{D}^h s)_t := (\mathsf{D}^{h-1} s)_{t+1} - (\mathsf{D}^{h-1} s)_t \qquad (7)$

for all $h \ge 1$, $t \in [T - h]$. (We remark that while Definition 4.1 is stated for a 1-indexed sequence $(s_1, \ldots, s_T)$, we will also occasionally consider 0-indexed sequences $(s_0, \ldots, s_T)$, in which case the same recursive definition (7) holds for the finite differences, now defined for $t \in \{0, \ldots, T - h\}$.)
Remark 4.3.
Notice that another way of writing (7) is: $(\mathsf{D}^h s)_t = \sum_{j=0}^{h} (-1)^{h-j} \binom{h}{j} s_{t+j}$. We also remark for later use that $\mathsf{D}^h$ acts linearly on sequences: for instance, for a fixed matrix $M$, $\mathsf{D}^h(Ms) = M(\mathsf{D}^h s)$, where $Ms$ denotes the sequence $(Ms_1, \ldots, Ms_T)$.
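A short sketch (ours) implementing Definition 4.1, with a numeric check of the binomial formula from Remark 4.3:

```python
import numpy as np
from math import comb

def finite_difference(s, h):
    """Order-h finite differences of a sequence s of length T.
    Returns a sequence of length T - h (Definition 4.1)."""
    d = np.asarray(s, dtype=float)
    for _ in range(h):
        d = d[1:] - d[:-1]   # (D^k s)_t = (D^{k-1} s)_{t+1} - (D^{k-1} s)_t
    return d

# Remark 4.3: the recursion unrolls to a signed binomial sum.
rng = np.random.default_rng(0)
s, h = rng.standard_normal(10), 3
direct = np.array([sum((-1) ** (h - j) * comb(h, j) * s[t + j] for j in range(h + 1))
                   for t in range(len(s) - h)])
assert np.allclose(finite_difference(s, h), direct)
```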
Let $H := \lceil \log T \rceil$, where $T$ denotes the fixed time horizon from Theorem 3.1 (and thus Lemma 4.2). In the proof of Lemma 4.2, we will bound the finite differences of order $h \le H$ for certain sequences. The bound (6) of Lemma 4.2 may be rephrased as an upper bound on the (squared norms of the) first-order finite differences of the loss sequence; to prove this, we proceed in two steps:
1. (Upwards induction step) First, in Lemma 4.4 below, we find an upper bound on the order-$h$ finite differences of the losses, for all orders $h$ up to $H$, which decays exponentially in $h$. This is done via upwards induction on $h$, i.e., first proving the base case using boundedness of the losses and then proceeding inductively. The main technical tool we develop for the inductive step is a weak form of the chain rule for finite differences, Lemma 4.5. The inductive step uses the fact that all players are following Optimistic Hedge to relate the $h$th order finite differences of player $i$'s loss sequence to the $h$th order finite differences of the strategy sequences of the other players; then we use the exponential-weights style updates of Optimistic Hedge and Lemma 4.5 to relate the $h$th order finite differences of the strategies to lower-order finite differences of the losses.
2. (Downwards induction step) We next show that for all $h \le H$, the order-$h$ finite differences of the losses admit a suitable bound. This is shown via downwards induction on $h$, namely first establishing the base case $h = H$ by using the result of item 1 and then treating the smaller values of $h$. The inductive step makes use of the discrete Fourier transform (DFT) to relate the finite differences of different orders (see Lemmas 4.7 and 4.8). In particular, Parseval's equality together with a standard relationship between the DFT of the finite differences of a sequence and the DFT of that sequence allow us to first prove the inductive step in the frequency domain and then transport it back to the original (time) domain.
In the following subsections we explain in further detail how the two steps above are completed.
4.3 Upwards induction proof overview
Addressing item 1 in the previous subsection, the lemma below gives a bound on the supremum norm of the -th order finite differences of each player’s loss vector, when all players play according to Optimistic Hedge and experience losses according to their loss functions .
Lemma 4.4 (Abbreviated).
Fix a step size satisfying . If all players follow Optimistic Hedge updates with step size , then for any player , integer satisfying , and time step , it holds that
A detailed version of Lemma 4.4, together with its full proof, may be found in Section B.4. We next give a proof overview of Lemma 4.4 for the case of 2 players, i.e., ; we show in Section B.4 how to generalize this computation to general . Below we introduce the main technical tool in the proof, a “boundedness chain rule,” and then outline how it is used to prove Lemma 4.4.
Main technical tool for Lemma 4.4: boundedness chain rule.
We say that a function is a softmax-type function if there are real numbers and some so that for all , Lemma 4.5 below may be interpreted as a “boundedness chain rule” for finite differences. To explain the context for this lemma, recall that given an infinitely differentiable vector-valued function and an infinitely differentiable function , the higher order derivatives of the function may be computed in terms of those of and using the chain rule. Lemma 4.5 considers an analogous setting where the input variable to is discrete-valued, taking values in (and so we identify the function with the sequence ). In this case, the higher order finite differences of the sequence (Definition 4.1) take the place of the higher order derivatives of with respect to . Though there is no generic chain rule for finite differences, Lemma 4.5 states that, at least when is a softmax-type function, we may bound the higher order finite differences of the sequence . In the lemma’s statement we let denote the sequence .
Lemma 4.5 (“Boundedness chain rule” for finite differences; abbreviated).
Suppose that , is a softmax-type function, and is a sequence of vectors in satisfying for . Suppose for some , for each and , it holds that . Then for all ,
A detailed version of Lemma 4.5 may be found in Section B.3. While Lemma 4.5 requires to be a softmax-type function for simplicity (and this is the only type of function we will need to consider for the case ) we remark that the detailed version of Lemma 4.5 allows to be from a more general family of analytic functions whose higher order derivatives are appropriately bounded. The proof of Lemma 4.4 for all requires that more general form of Lemma 4.5.
The proof of Lemma 4.5 proceeds by considering the Taylor expansion of the function at the origin, which we write as follows: for , , where , denotes the quantity and denotes . The fact that is a softmax-type function ensures that the radius of convergence of its Taylor series is at least 1, i.e., for any satisfying . By the assumption that for each , we may therefore decompose as:
(8)
where denotes the sequence of scalars for all . The fact that is a softmax-type function allows us to establish strong bounds on for each in Lemma B.5. The proof of Lemma B.5 bounds the by exploiting the simple form of the derivative of a softmax-type function to decompose each into a sum of terms. Then we establish a bijection between the terms of this decomposition and graph structures we refer to as factorial trees; that bijection together with the use of an appropriate generating function allow us to complete the proof of Lemma B.5.
Lemma 4.6 (Abbreviated; detailed version in Section B.2).
Fix any , a multi-index and set . For each of the functions , and for each , there are integers , , and , so that the following holds. For any sequence of vectors, it holds that, for each ,
(9)
Lemma 4.6 expresses the th order finite differences of the sequence as a sum of terms, each of which is a product of finite order differences of a sequence (i.e., the th coordinate of the vectors ). Crucially, when using Lemma 4.6 to prove Lemma 4.5, the assumption of Lemma 4.5 gives that for each , each , and each , we have the bound . These assumed bounds may be used to bound the right-hand side of (9), which together with Lemma 4.6 and (8) lets us complete the proof of Lemma 4.5.
Proving Lemma 4.4 using the boundedness chain rule.
Next we discuss how Lemma 4.5 is used to prove Lemma 4.4, namely to bound for each , , and . Lemma 4.4 is proved using induction, with the base case being a straightforward consequence of the fact that for all . For the rest of this section we focus on the inductive case, i.e., we pick some and assume Lemma 4.4 holds for all .
The first step is to reduce the claim of Lemma 4.4 to the claim that the upper bound holds for each . Recalling that we are only sketching here the case for simplicity, this reduction proceeds as follows: for , define the matrix by , for . We have assumed that all players are using Optimistic Hedge and thus ; for our case here (), this may be rewritten as , . Thus
where the first equality is from Remark 4.3 and the inequality follows since all entries of have absolute value . A similar computation allows us to show .
To complete the inductive step it remains to upper bound the quantities for . To do so, we note that the definition of the Optimistic Hedge updates (1) implies that for any , and , we have
(10)
For , , set Also, for each , , , and any vector define Thus (10) gives that for , . Viewing as a fixed parameter and letting vary, it follows that for and , .
Recalling that our goal is to bound for each , we can do so by using Lemma 4.5 with and , if we can show that its precondition is met, i.e. that for all , the appropriate and appropriate constants . Helpfully, the definition of as a partial sum allows us to relate the -th order finite differences of the sequence to the -th order finite differences of the sequence as follows:
(11)
Since for , the inductive assumption of Lemma 4.4 gives a bound on the -norm of the terms on the right-hand side of (11), which are sufficient for us to apply Lemma 4.5. Note that the inductive assumption gives an upper bound on that only scales with , whereas Lemma 4.5 requires scaling of . This discrepancy is corrected by the factor of on the right-hand side of (11), which gives the desired scaling (since for the choice ).
4.4 Downwards induction proof overview
In this section we discuss in further detail item 2 in Section 4.2; in particular, we will show that there is a parameter so that for all integers satisfying ,
(12)
where hides factors polynomial in . The validity of (12) for implies Lemma 4.2. On the other hand, as long as we choose the value in (12) to satisfy , then Lemma 4.4 implies that . This gives that (12) holds for . To show that (12) holds for all , we use downwards induction; fix any , and assume that (12) has been shown for all satisfying . Our main tool in the inductive step is to apply Lemma 4.7 below. To state it, for , we say that a sequence of distributions is -consecutively close if for each , it holds that . (Here, for distributions $p, q$, the ratio $p/q$ denotes the vector whose $a$th entry is $p(a)/q(a)$.) Lemma 4.7 shows that given a sequence of vectors for which the variances of its second-order finite differences are bounded by the variances of its first-order finite differences, a similar relationship holds between its first- and zeroth-order finite differences.
Lemma 4.7.
There is a sufficiently large constant so that the following holds. For any and , suppose that and satisfy the following conditions:
1. The sequence is -consecutively close for some .
2. It holds that
Then
Given Lemma 4.7, the inductive step for establishing (12) is straightforward: we apply Lemma 4.7 with and for all . The fact that are updated with Optimistic Hedge may be used to establish that precondition 1 of Lemma 4.7 holds. Since and , that the inductive hypothesis (12) holds for implies that precondition 2 of Lemma 4.7 holds for appropriate . Thus Lemma 4.7 implies that (12) holds for the value , which completes the inductive step.
On the proof of Lemma 4.7.
Finally we discuss the proof of Lemma 4.7. One technical challenge is the fact that the vectors are not constant functions of , but rather change slowly (as constrained by being -consecutively close). The main tool for dealing with this difficulty is Lemma C.1, which shows that for a -consecutively close sequence , for any vector , . This fact, together with some algebraic manipulations, lets us reduce to the case that all are equal. It is also relatively straightforward to reduce to the case that for all , i.e., so that . We may further separate into its individual components , and treat each one separately, thus allowing us to reduce to a one-dimensional problem. Finally, we make one further reduction, which is to replace the finite differences in Lemma 4.7 with circular finite differences, defined below:
Definition 4.2 (Circular finite difference).
Suppose $s = (s_1, \ldots, s_T)$ is a sequence of vectors in $\mathbb{R}^d$. For integers $h \ge 0$, the level-$h$ circular finite difference sequence for the sequence $s$, denoted by $\mathsf{D}_\circ^h s$, is the sequence defined recursively as: $(\mathsf{D}_\circ^0 s)_t := s_t$ for all $t \in [T]$, and

$(\mathsf{D}_\circ^h s)_t := (\mathsf{D}_\circ^{h-1} s)_{t+1} - (\mathsf{D}_\circ^{h-1} s)_t \qquad (13)$

for all $h \ge 1$ and $t \in [T]$, where indices are taken modulo $T$ (so that $s_{T+1}$ is read as $s_1$).
Circular finite differences for a sequence are defined similarly to finite differences (Definition 4.1) except that unlike for finite differences, where $(\mathsf{D}^h s)_t$ is not defined for $t > T - h$, the entries $(\mathsf{D}_\circ^h s)_t$ are defined for all $t \in [T]$ by "wrapping around" back to the beginning of the sequence. The above-described reductions, which are worked out in detail in Section C.2, allow us to reduce proving Lemma 4.7 to proving the following simpler lemma:
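In code (our sketch), the only change relative to Definition 4.1 is that the index wraps around:

```python
import numpy as np

def circular_finite_difference(s, h):
    """Level-h circular finite differences (Definition 4.2): every index is
    defined because the sequence wraps around modulo its length."""
    d = np.asarray(s, dtype=float)
    for _ in range(h):
        d = np.roll(d, -1) - d   # s_{T+1} is read as s_1
    return d
```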
Lemma 4.8.
Suppose , , and is a sequence of reals satisfying
(14)
Then
To prove Lemma 4.8, we apply the discrete Fourier transform to both sides of (14) and use the Cauchy-Schwarz inequality in the frequency domain. For a sequence $s = (s_1, \ldots, s_T)$ of reals, its (discrete) Fourier transform is the sequence $\hat{s} = (\hat{s}_1, \ldots, \hat{s}_T)$ defined by $\hat{s}_k := \frac{1}{\sqrt{T}} \sum_{t=1}^{T} s_t \, e^{-2\pi i k t / T}$. Below we prove Lemma 4.8 for a special case; we defer the general case to Section C.1.
Proof of Lemma 4.8 for a special case.
We have the following:
where the first equality uses Parseval's equality, the second uses Fact C.3 (in the appendix), and the inequality uses Cauchy-Schwarz. By Parseval's equality and Fact C.3, the right-hand side of the above equals , which, by assumption, is at most . Rearranging terms completes the proof. ∎
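Both ingredients of this argument are easy to check numerically; the sketch below (ours) uses NumPy's unnormalized DFT convention, under which Parseval's equality picks up a factor of $1/T$:

```python
import numpy as np

T = 16
s = np.random.default_rng(1).standard_normal(T)

# Fact C.3 for the first-order case: the DFT of the circular finite
# difference equals the DFT of s multiplied pointwise by (omega^k - 1),
# where omega = exp(2*pi*i/T).
d = np.roll(s, -1) - s
omega_k = np.exp(2j * np.pi * np.arange(T) / T)
assert np.allclose(np.fft.fft(d), (omega_k - 1.0) * np.fft.fft(s))

# Parseval's equality (Fact C.2), in NumPy's convention:
assert np.isclose(np.sum(s ** 2), np.sum(np.abs(np.fft.fft(s)) ** 2) / T)
```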
References
- [AIMM21] Waïss Azizian, Franck Iutzeler, Jérome Malick, and Panayotis Mertikopoulos. The last-iterate convergence rate of optimistic mirror descent in stochastic variational inequalities. In Conference on Learning Theory, pages 1–32, 2021.
- [BCM12] Maria-Florina Balcan, Florin Constantin, and Ruta Mehta. The Weighted Majority Algorithm does not Converge in Nearly Zero-sum Games. In ICML Workshop on Markets, Mechanisms, and Multi-Agent Models, 2012.
- [Bla54] David Blackwell. Controlled Random Walks. In Proceedings of the International Congress of Mathematicians, volume 3, pages 336–338, 1954.
- [BP18] James P. Bailey and Georgios Piliouras. Multiplicative Weights Update in Zero-Sum Games. In Proceedings of the 2018 ACM Conference on Economics and Computation - EC ’18, pages 321–338, Ithaca, NY, USA, 2018. ACM Press.
- [BP19] James P. Bailey and Georgios Piliouras. Fast and furious learning in zero-sum games: Vanishing regret with non-vanishing step sizes. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 12977–12987, 2019.
- [Bro49] George W. Brown. Some Notes on Computation of Games Solutions. Technical report, RAND Corporation, Santa Monica, CA, 1949.
- [Bub15] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Found. Trends Mach. Learn., 8(3–4):231–357, November 2015.
- [CBL06] Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
- [CDT09] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player nash equilibria. Journal of the ACM (JACM), 56(3):1–57, 2009.
- [CP19] Yun Kuen Cheung and Georgios Piliouras. Vortices instead of equilibria in minmax optimization: Chaos and butterfly effects of online learning in zero-sum games. In Proceedings of the Thirty-Second Conference on Learning Theory, pages 807–834, 2019.
- [CP20] Xi Chen and Binghui Peng. Hedging in games: Faster convergence of external and swap regrets. In Advances in Neural Information Processing Systems, volume 33, pages 18990–18999. Curran Associates, Inc., 2020.
- [CS04] Imre Csiszár and Paul C. Shields. Information Theory and Statistics: A Tutorial. Commun. Inf. Theory, 1(4):417–528, December 2004.
- [DDK11] Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. Near-Optimal No-Regret Algorithms for Zero-Sum Games. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms (SODA), 2011.
- [DFP+10] Constantinos Daskalakis, Rafael Frongillo, Christos H. Papadimitriou, George Pierrakos, and Gregory Valiant. On learning algorithms for nash equilibria. In Proceedings of the Third International Conference on Algorithmic Game Theory, SAGT’10, page 114–125, Berlin, Heidelberg, 2010. Springer-Verlag.
- [DGP06] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a nash equilibrium. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (STOC), 2006.
- [DP14] Constantinos Daskalakis and Qinxuan Pan. A counter-example to Karlin’s strong conjecture for fictitious play. In Proceedings of the 55th Annual Symposium on Foundations of Computer Science (FOCS), 2014.
- [DP19] Constantinos Daskalakis and Ioannis Panageas. Last-iterate convergence: Zero-sum games and constrained min-max optimization. In 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, volume 124 of LIPIcs, pages 27:1–27:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
- [FLL+16] Dylan J Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, and Eva Tardos. Learning in games: Robustness of fast convergence. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
- [GKP89] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, Reading, 1989.
- [GPD20] Noah Golowich, Sarath Pattathil, and Constantinos Daskalakis. Tight last-iterate convergence rates for no-regret learning in multi-player games. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
- [HAM21] Yu-Guan Hsieh, Kimon Antonakopoulos, and Panayotis Mertikopoulos. Adaptive learning in continuous games: Optimal regret bounds and convergence to nash equilibrium. In Conference on Learning Theory, 2021.
- [Han57] James Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957.
- [HMcW+03] Sergiu Hart and Andreu Mas-Colell. Uncoupled dynamics do not lead to Nash equilibrium. American Economic Review, 93(5):1830–1836, 2003.
- [KHSC18] Ehsan Asadi Kangarshahi, Ya-Ping Hsieh, Mehmet Fatih Sahin, and Volkan Cevher. Let’s be honest: An optimal no-regret framework for zero-sum games. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2488–2496. PMLR, 10–15 Jul 2018.
- [KLP11] Robert Kleinberg, Katrina Ligett, and Georgios Piliouras. Beyond the Nash equilibrium barrier. In Innovations in Computer Science (ICS), pages 125–140, 2011.
- [LGNPw21] Qi Lei, Sai Ganesh Nagarajan, Ioannis Panageas, and Xiao Wang. Last iterate convergence in no-regret learning: constrained min-max optimization for convex-concave landscapes. In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, pages 1441–1449. PMLR, 13–15 Apr 2021.
- [MPP18] Panayotis Mertikopoulos, Christos Papadimitriou, and Georgios Piliouras. Cycles in adversarial regularized learning. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’18, page 2703–2717, USA, 2018. Society for Industrial and Applied Mathematics.
- [Nes05] Yu Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1):235–249, 2005.
- [PP16] Christos Papadimitriou and Georgios Piliouras. From nash equilibria to chain recurrent sets: Solution concepts and topology. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, ITCS ’16, page 227–235, New York, NY, USA, 2016. Association for Computing Machinery.
- [Rob51] Julia Robinson. An Iterative Method of Solving a Game. Annals of mathematics, pages 296–301, 1951.
- [Rou09] Tim Roughgarden. Intrinsic robustness of the price of anarchy. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09, page 513–522, New York, NY, USA, 2009. Association for Computing Machinery.
- [RS13a] Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Annual Conference on Learning Theory, pages 993–1019, 2013.
- [RS13b] Alexander Rakhlin and Karthik Sridharan. Optimization, Learning, and Games with Predictable Sequences. arXiv:1311.1869 [cs], November 2013. arXiv: 1311.1869.
- [RST17] Tim Roughgarden, Vasilis Syrgkanis, and Éva Tardos. The price of anarchy in auctions. J. Artif. Int. Res., 59(1):59–101, May 2017.
- [SALS15] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
- [Sha64] L. Shapley. Some Topics in Two-Person Games. Advances in Game Theory, 1964.
- [ST13] Vasilis Syrgkanis and Eva Tardos. Composable and efficient mechanisms. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC ’13, page 211–220, New York, NY, USA, 2013. Association for Computing Machinery.
- [VGFL+20] Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Thanasis Lianeas, Panayotis Mertikopoulos, and Georgios Piliouras. No-regret learning and mixed nash equilibria: They do not mix. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1380–1391. Curran Associates, Inc., 2020.
- [WL18] Chen-Yu Wei and Haipeng Luo. More adaptive algorithms for adversarial bandits. In Proceedings of the 31st Conference On Learning Theory, pages 1263–1291, 2018.
- [WLZL21] Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, and Haipeng Luo. Linear last-iterate convergence in constrained saddle-point optimization. In International Conference on Learning Representations, 2021.
Appendix A Proofs for Section 4.1
In this section we prove Lemma 4.1. Throughout the section we use the notation of Lemma 4.1: in particular, we assume that any player follows the Optimistic Hedge updates (1) with step size , for an arbitrary sequence of losses .
A.1 Preliminary lemmas
The first few lemmas in this section pertain to vectors $p, q \in \Delta^m$, for some $m \in \mathbb{N}$; note that such vectors may be viewed as distributions on $[m]$. Let $p/q$ denote the Radon-Nikodym derivative, i.e., the vector whose $a$th component is $p(a)/q(a)$.
Lemma A.1.
If , then .
Proof.
The lemma is immediate from the definition of the divergence:
∎
It is a standard fact (though one which we do not need in our proofs) that for all $p, q$, $\mathrm{KL}(p \,\|\, q) \le \chi^2(p \,\|\, q)$. The below lemma shows an inequality in the opposite direction when the ratios $p(a)/q(a)$ are bounded:
Lemma A.2.
There is a constant $C > 0$ so that the following holds. Suppose that for $p, q \in \Delta^m$ the ratios $p(a)/q(a)$ are bounded above and below by absolute constants. Then $\mathrm{KL}(p \,\|\, q) \ge \chi^2(p \,\|\, q) / C$.
Proof.
There is a constant so that for any , for all , we have
Set , so that for all by assumption. Then for , we have
∎
The next lemma considers two vectors which are related by a multiplicative weights-style update with loss vector ; the lemma relates to .
Lemma A.3.
There is a constant so that the following holds. Suppose that , , , and satisfy, for each ,
(15)
Then
Proof.
Let , where denotes the all-1s vector. Note that , and that if we replace with , (15) remains true. Moreover, . Thus, by replacing with , we may assume from here on that and that .
Note that
where is a random variable that takes values with probability . As long as is a sufficiently large constant, we have that, for all satisfying ,
(16)
Thus, for a sufficiently large constant , we have, for all satisfying ,
(17)
Moreover, since , we have from (16) that . For a sufficiently large constant it follows that
(18)
Combining (17) and (18) and again using the fact that , we get, for some sufficiently large constant , as long as ,
By the assumption that , we have , and thus the above gives the desired result. ∎
We will need the following standard lemma:
Lemma A.4 ([RS13a], Eq. (26)).
For any , , , if it holds that , then for any ,
For , we define the vector by
(19)
Additionally define to be the uniform distribution over .
The next lemma, Lemma A.5, is very similar to [RS13a, Lemma 3], and is indeed essentially shown in the course of the proof of that lemma. Note that no boundedness assumption is placed on the vectors in Lemma A.5. For completeness we provide a full proof of the lemma.
Lemma A.5 (Refinement of Lemma 3, [RS13a]).
Suppose that any player follows the Optimistic Hedge updates (1) with step size , for an arbitrary sequence of losses . For any vector , it holds that
(20)
Proof.
For any , it holds that
(21)
For , set . Using the definition of the dual norm and the fact , we have
(22)
It is immediate from the definitions of (in (19)) and (in (1)) that for ,
(23)
Using Lemma A.4 with , we obtain
(24)
Next, we note that, again by (19) and (1), for ,
Using Lemma A.4 with , we obtain
(25)
By (21), (22), (24), and (25), we have
(26)
The statement of the lemma follows by summing (26) over and using the fact that for any choice of , . ∎
A.2 Proof of Lemma 4.1
Now we are ready to prove Lemma 4.1. For convenience we restate the lemma.
Lemma 4.1 (restated).
There is a constant $C > 0$ so that the following holds. Suppose any player $i$ follows the Optimistic Hedge updates (1) with step size $\eta > 0$, for an arbitrary sequence of losses $\ell_i^{(1)}, \ldots, \ell_i^{(T)} \in [0,1]^{\mathcal{A}_i}$. Then for any vector $x^\star \in \Delta^{\mathcal{A}_i}$, it holds that
$\sum_{t=1}^{T} \left\langle x_i^{(t)} - x^\star, \ell_i^{(t)} \right\rangle \le \frac{\log d_i}{\eta} + C\eta \sum_{t=1}^{T} \left\|\ell_i^{(t)} - \ell_i^{(t-1)}\right\|_{x_i^{(t)}}^2 - \frac{1}{C\eta} \sum_{t=1}^{T-1} \left\|x_i^{(t+1)} - x_i^{(t)}\right\|_{x_i^{(t)}}^2. \qquad (27)$
Proof.
Lemma A.5 gives that, for any ,
(28)
Note that for any vectors , if there is a vector so that for all , , we have that
Therefore, by (19) and (23), respectively, we obtain that, for ,
(29)
(Above we have also used that for all .) Thus, for , we can apply Lemma A.2 and show, for a sufficiently large constant ,
(30)
(31)
Note also that for vectors we have that . By Lemma A.3 and (19), we have that, for a sufficiently large constant , as long as ,
(32)
and
(33)
Appendix B Proofs for Section 4.3
In this section we give the full proof of Lemma 4.4. In Section B.1 we introduce some preliminaries. In Section B.2 we prove Lemma 4.5, the “boundedness chain rule” for finite differences. In Section B.4 we show how to use this lemma to prove Lemma 4.4.
B.1 Additional preliminaries
In this section we introduce some additional notations and basic combinatorial lemmas. Definition B.1 introduces the shift operator , which like the finite difference operator , maps one sequence to another sequence.
Definition B.1 (Shift operator).
Suppose is a sequence of vectors . For integers , the -shift sequence for the sequence , denoted by , is the sequence , defined by for .
For sequences $a$ and $b$ of real numbers, we denote by $ab$ the product sequence, whose $t$th element is $a_t b_t$. Lemmas B.1 and B.2 below are standard analogues of the product rule for finite differences. The (straightforward) proofs are provided for completeness.
Lemma B.1 (Product rule; Eq. (2.55) of [GKP89]).
Suppose and are sequences of real numbers. Then the product sequence satisfies
Proof.
We compute
∎
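Lemma B.1 is easy to sanity-check numerically (our sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = rng.standard_normal(12), rng.standard_normal(12)
D = lambda s: s[1:] - s[:-1]   # first-order finite difference

# Product rule: D(a*b)_t = a_{t+1} * (D b)_t + (D a)_t * b_t.
assert np.allclose(D(a * b), a[1:] * D(b) + D(a) * b[:-1])
```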
Lemma B.2 (Multivariate product rule).
Suppose that and for , are sequences of real numbers. Then the product sequence satisfies
Proof.
We compute
∎
Lemma B.4 and Lemma B.3, which is used in the proof of the former, are used to bound certain sums with many terms in the proof of Lemma 4.5. To state Lemma B.3 we make one definition. For positive integers and any , define
where the sum is over integers satisfying for . In the definition of , the quantity (which arises when some ) is interpreted as 1.
Lemma B.3.
For any positive integers and any so that , , and , then
Proof of Lemma B.3.
We may rewrite and then upper bound it as follows:
(36)
where (36) follows since is convex in for , and therefore, in the interval , takes on maximal values at the endpoints. We see
for when . Also,
for when . (This inequality is easily seen to be equivalent to the fact that , which follows from the fact that for and for .) Therefore,
∎
Lemma B.4.
Fix integers . For any function , define, for each , . Then, for any ,
(37)
Proof.
In the case that , we simply use the fact that the number of functions is , and each term of the summation on the left-hand side of (37) is at most 1. In the remainder of the proof we may thus assume that .
For any tuple of non-negative integers with , there are (see [CS04, Lemma 2.2] for a proof of this inequality) functions such that for all . Combining these like terms,
(38)
We evaluate this sum in 2 cases: whether or not is greater than . The contribution to this sum coming from terms with is
(39)
by Lemma B.3.
We next consider the case where . For a specific term with , we know there is a unique such that since . So, we can represent the contribution to the sum from this case as
(40)
(41)
(42)
where (40) follows by symmetry, (41) follows by factoring out the contribution of and letting , and (42) follows by Lemma B.3.
Lemma B.5.
For , let such that . For each , define to be the function
and let denote the Taylor series of . Then for any and any integer ,
Proof.
Note that, for each ,
and so
It is straightforward to see that the following equalities hold for any , :
We claim that for any , we can express as a polynomial in comprised of monomials each of degree . We verify this by induction, first noting that after taking zero derivatives, the function is a degree-1 monomial. Assume that for some sequence , we can express
where each . We see that for each , there is some sequence of bits so that
(43)
where we define, for each ,
Thus, can be expressed as a sum of monomials of degree , completing the inductive step.
This inductive argument also demonstrates a bijection between the monomials of and a combinatorial structure that we call factorial trees. Formally, we define a factorial tree to be a directed graph on vertices such that each vertex has a single incoming edge from one of the vertices in . (For a non-negative integer , we write .) For a factorial tree , let denote the parent of a vertex . A particular factorial tree represents the monomial that was generated by choosing the term in (43) for derivation when taking the derivative , for each . (See Figure 1 for an example.)
Figure 1: An example of a factorial tree and the monomial it represents.
Each of the monomials comprising is a product of terms corresponding to indices (i.e., the first term in the product is either or , the second term is either or , and so on). We say that a term corresponding to index is perturbed if it is (as opposed to ). From our construction, we see that the term is perturbed if and there is no such that . That is, is a leaf in the corresponding factorial tree and the parent of corresponds to the same index as . One can think of as a coloring of all the vertices of the factorial tree with colors, except the root of the tree (vertex ) which has fixed color . Then, we can say the term is perturbed if and only if is a leaf with the same color as its parent. We call such a leaf a petal. For , we let be the set of petals on tree with color , be the set of leaves of tree , and be the set of all non-leaves other than the fixed-color root. Therefore,
(where we let for notational convenience) | ||||
where in the last step we decompose, for each factorial tree , into the tuple of indices corresponding to the non-leaves , and the tuple of indices corresponding to the leaves .
We note that, fixing tree and the colors of all non-leaves ,
And so,
(as no non-leaf can ever be a petal) | ||||
where is the set of all factorial trees and is the uniform distribution over . For a specific vertex , we note that if and only if it is not the parent of any vertex . So,
(44)
We will show via induction that, for any vertex set
(45)
Having established the base case for every with , we assume (45) holds for all with . For any set of vertices , consider an arbitrary partition of into two sets with . We see
by the inductive hypothesis, as desired. Thus, is at most the coefficient of the polynomial
and so
and
as desired. ∎
Lemma B.6.
Let be softmax-type functions. That is, for each , there is some and indices such that
where for all . Let denote the Taylor series of . Then for any integer ,
Proof.
Letting denote the Taylor series of for all , we have and therefore
Lemma B.7.
Let be any softmax-type function. Then the radius of convergence of the Taylor series of at the origin is at least 1.
Proof.
For a complex number , write to denote the real and imaginary parts, respectively, of . Note that for any with for all , we have
and thus . Moreover, for any such point , it holds that . It then follows that for such we have . In particular, is holomorphic on the region .
Fix any , and let . By the multivariate version of Cauchy’s integral formula,
The power series of at is defined as , where . For any with , we have , which tends to as . Thus, by the (multivariate version of the) Cauchy-Hadamard theorem, the radius of convergence of the power series of at is at least . ∎
B.2 Proof of Lemma 4.6
In this section we prove Lemma 4.6, which, as explained in Section 4.3, is an important ingredient in the proof of Lemma 4.5. The detailed version of Lemma 4.6 is presented below; it includes several claims which are omitted for simplicity in the abbreviated version in Section 4.3.
Lemma 4.6 (Detailed).
Fix any integer , a multi-index and set . For each of the functions , and for each , there are integers , , and , so that the following holds. For any sequence of vectors, it holds that
(47)
Moreover, the following properties hold:
1. For each and , . In particular, .
2. For each and , it holds that .
3. For each , , and , .
Proof of Lemma 4.6.
We use induction on . First note that in the case and for any , we have that , and so for the unique function , for all , we may take , , and ensure that for each there are values of so that .
Now fix any integer , and suppose the statement of the claim holds for all . We have that
(48)
(49)
where (48) uses Lemma B.2 and (49) uses the commutativity of and . For each , we construct functions , defined by for , and for . Next, for , we define the quantities as follows:
• Set if , and .
• Set if , and if .
• Set .
By (49) and the above definitions, we have
thus verifying (9) for the value .
Finally we verify that items 1 through 3 in the lemma statement hold. The definition of above together with the inductive hypothesis ensures that for all , , thus verifying item 1 of the lemma statement. Since for all , it follows from the inductive hypothesis that ; this verifies item 2. Finally, note that for any and , , and thus item 3 follows from the inductive hypothesis. ∎
B.3 Proof of Lemma 4.5
In this section we prove Lemma 4.5. To introduce the detailed version of the lemma we need the following definition. Suppose is a real-valued function that is real-analytic in a neighborhood of the origin. For real numbers , we say that is -bounded if the Taylor series of at , denoted , satisfies, for each integer , . In the statement of Lemma 4.5 below, the quantity is interpreted as 1 (in particular, for ).
Lemma 4.5 (“Boundedness chain rule” for finite differences; detailed).
Suppose that , is a -bounded function so that the radius of convergence of its power series at is at least , and is a sequence of vectors satisfying for . Suppose for some , for each and , it holds that for some . Then for all ,
Proof of Lemma 4.5.
Note that the th order finite differences of a constant sequence are identically 0 for , so by subtracting from , we may assume without loss of generality that . (Here denotes the all-zeros vector.)
By assumption, the radius of convergence of the power series of at the origin is at least , and so for each , there is a real number so that for with for each ,
(50)
Let ; by the assumption that is -bounded, we have that for all .
For , recall that denotes the sequence , defined by . Then since for all , we have that, for , .
We next upper bound the quantities . To do so, fix some , and set . For each function and , recall the integers , , defined in Lemma 4.6. By assumption it holds that for each , each , each , . It follows that for each and function ,
where the last equality uses that (item 1 of Lemma 4.6). Then by Lemma 4.6, we have:
(51)
where (51) follows from Lemma B.4, the fact that , and that (item 1 of Lemma 4.6).
B.4 Proof of Lemma 4.4
Lemma 4.4 (Detailed).
Fix a parameter . If all players follow Optimistic Hedge updates with step size , then for any player , integer satisfying , time step , it holds that
Proof.
We have that for each agent , each , and each , . Thus, for ,
(55)
(56)
where (55) and (56) use Remark 4.3 and in (56), refers to the sequence , .
In the remainder of the proof we will prepend to the loss sequence the vectors . We will also prepend to the strategy sequence . Next notice that for any agent , any , and any , by the definition (1) of the Optimistic Hedge updates, it holds that, for each ,
Note in particular that our definitions of ensure that the above equation holds even for . Now fix an integer satisfying ; for , let us write
Also, for a vector and an index , define
(57)
so that for . In particular, for any , and any choices of for all ,
(58)
Next, note that
meaning that for any ,
(59)
We next establish the following claims which will allow us to prove Lemma 4.4 by induction.
Claim B.8.
For any , , and , it holds that .
Proof of Claim B.8.
The claim is immediate from the triangle inequality and the fact that for all . ∎
Claim B.9.
Fix so that . Suppose that for some and for all , all , and all , it holds that . Suppose that the step size satisfies . Then for all and ,
(60)
Proof of Claim B.9.
Set , so that the assumption of the claim gives .
We first use Lemma 4.5 to bound, for each , , and for all , the quantity . In particular, we will apply Lemma 4.5 with , , the value of in the statement of Claim B.9, , and the sequence , for , defined as
namely the concatenation of the vectors . The function in Lemma 4.5 is set to the function that takes as input the concatenation of for all and outputs:
(61)
where the function are as defined in (57). We first verify the preconditions of Lemma 4.5. By Lemma B.6, is a -bounded function for some constant . By Lemma B.7, the radius of convergence of each function at is at least 1; thus the radius of convergence of at is at least . Claim B.8 gives that for all . Thus, since ,
for . Next, for and , we have
(62)
(63)
(64)
where (62) follows from (59), (63) follows from the assumption in the statement of Claim B.9 and , and (64) follows from the fact that . It then follows from Lemma 4.5 and (58) that
(65)
(66)
(67)
(In particular, (65) uses (58), (66) uses the definition of in (61), and (67) uses Lemma 4.5.)
Appendix C Proofs for Section 4.4
The main goal of this section is to prove Lemma 4.7. First, in Section C.1, we prove some preliminary lemmas; we then prove Lemma 4.7 in Section C.2.
C.1 Preliminary lemmas
Lemma C.1 shows that and are close when the entries of are close; it will be applied with equal to the strategies played in the course of Optimistic Hedge.
Lemma C.1.
Suppose and are given, and is a vector. Suppose are distributions with for some . Then
(69)
Proof.
We first prove that . To do so, note that since adding a constant to every entry of does not change or , by replacing with , we may assume without loss of generality that . Thus . Now we may compute:
(70)
where (70) uses the fact that .
By interchanging the roles of , we obtain that
This completes the proof of the lemma. ∎
Next we prove Lemma 4.8 (recall that only the special case was proved in Section 4.4). For convenience the lemma is repeated below.
Lemma 4.8 (Restated).
Suppose , , and is a sequence of reals satisfying
(71)
Then
To prove Lemma 4.8 we need the following basic facts about the Fourier transform:
Fact C.2 (Parseval’s equality).
It holds that .
The second fact gives a formula for the Fourier transform of the circular finite differences; its simple form is the reason we work with circular finite differences in this section:
Fact C.3.
For , .
Proof of Lemma 4.8.
Note that the discrete Fourier transform of satisfies , and similarly , for . By the Cauchy-Schwarz inequality, Parseval’s equality (Fact C.2), Fact C.3, and the assumption that (71) holds, we have
(72)
Note that for real numbers and with , it holds that
Taking and (for which is immediate) and using (72) then gives
as desired. ∎
C.2 Proof of Lemma 4.7
Now we prove Lemma 4.7. For convenience we restate the lemma below with the exact value of the constant referred to in the version in Section 4.4.
Lemma 4.7 (Restated).
For any and , suppose that and satisfy the following conditions:
1. The sequence is -consecutively close for some .
2. It holds that
Then
(73)
Proof.
Fix a positive integer , to be specified exactly below. For , define by
(74)
Then
(75)
(76)
where (75) uses the fact that and so for all , and the final inequality (76) follows from assumption 2 of the lemma statement.
By (74) and Lemma C.1 with , we have, for some constant ,
(77)
Here we have used that for , it holds that since .
For any integer , we define the sequence , for . Thus for , which implies that for all , , , and thus
(78)
By the definition of the sequence , for , we have
(79)
For , let us now define
(80)
so that, by (77), (78), and (79),
(81)
By (80) and Lemma 4.8 applied to the sequence , it holds that, for each ,
(82)
Then we have:
(83)
(84)
(85)
where (83) follows from (79), (84) follows from (82) and (78), and (85) follows from (81). Summing the above for , we obtain, for some constant ,
(86)
(87)
(88)
(89)
(90)
(91)
(92)
where:
- (86) follows since and thus for all ;
- the remaining steps (87)–(92) follow by combining the preceding displays with the lemmas established above.
Now choose , so that . Therefore, as long as , we have, since , that
Then it follows from (92) that
(93)
Using that , the inequality can be satisfied by ensuring that . Note that our choice of ensures that , as was assumed earlier. Moreover, we have . Thus, (93) gives the desired result. ∎
C.3 Completing the proof of Theorem 3.1
Using the lemma developed in the previous sections we now can complete the proof of Theorem 3.1. We begin by proving Lemma 4.2. The lemma is restated formally below.
Lemma 4.2 (Detailed).
There are constants so that the following holds. Suppose a time horizon is given, we set , and all players play according to Optimistic Hedge with step size satisfying . Then for any , the losses for player satisfy:
(94)
We state a generic version of this lemma that can be applied in more general settings.
Lemma C.4.
For any integers and , we set , , and . Suppose that and satisfy the following
1. For each and , it holds that
2. The sequence is -consecutively close for some .
Then,
(95)
Proof of Lemma 4.2.
We wish to apply Lemma C.4 with , and , as well as . To do so, we must verify that the preconditions of Lemma C.4 hold when our sequences arise from the dynamics of players playing Optimistic Hedge with step size satisfying .
Set (note that is the constant appearing in item 2 of Lemma C.4 and item 1 of Lemma 4.7 in Section C.2). Our assumption that implies that, as long as the constant satisfies ,
(96)
To verify precondition 1, we apply Lemma 4.4 with the parameter in the lemma set to : a valid selection as . We conclude that, for each , and , it holds that since as required by the lemma. To verify precondition 2, we first confirm that our selection of places it in the desired interval as . By the definition of the Optimistic Hedge updates, for all and , we have . Thus, the sequence is -consecutively close (since for satisfying (96)). Therefore, Lemma C.4 applies and we have
for , as desired. ∎
So, it suffices to prove Lemma C.4.
Proof of Lemma C.4.
Set . Therefore, from item 1 we have,
(97)
We will now prove, via reverse induction on , that for all satisfying ,
(98)
with (note that is the constant appearing in the inequality (73) of the statement of Lemma 4.7 in Section C.2). The base case is verified by (97) and the fact that . Now suppose that (98) holds for some satisfying . We will now apply Lemma 4.7, with and for , as well as , , , and the parameter of Lemma 4.7 set to . We verify that precondition 1 holds due to precondition 2 of Lemma C.4 and the fact that . Moreover, precondition 2 holds by our inductive hypothesis (98) and our choice of . Therefore, by Lemma 4.7 and the fact that for our choice of , it follows that
where the final equality follows since is chosen so that . This completes the proof of the inductive step. Thus (98) holds for . Using again that the sequence is -consecutively close, we see that
(99)
(100)
(101)
(102)
where (99) and (101) follow from Lemma C.1, and (100) uses (98) for . Now, (102) verifies the statement of the lemma as and . ∎
We are finally ready to prove Theorem 3.1. For convenience the theorem is restated below.
Theorem 3.1 (Restated).
There are constants $C, C' > 0$ so that the following holds. Suppose a time horizon $T$ is given. Suppose all players play according to Optimistic Hedge with any positive step size $\eta \le \frac{1}{C n \log^4 T}$. Then for any $i \in [n]$, the regret of player $i$ satisfies
(103)
In particular, if the players' step size is chosen as $\eta = \frac{1}{C n \log^4 T}$, then the regret of player $i$ satisfies
$\mathrm{Reg}_{i,T} \le O\left(n \log d_i \cdot \log^4 T\right). \qquad (104)$
Proof.
The conclusion of the theorem is immediate if , so we may assume from here on that . Moreover, the conclusion of (103) is immediate if (as necessarily), so we may also assume that . Let be the constant of Lemma 4.1, let be the constant called in Lemma 4.2 and be the constant called in Lemma 4.2. As long as the constant of Theorem 3.1 is chosen so that and implies that , we have the following:
(105)
(106)
(107)
where (105) follows from Lemma 4.1, (106) follows from Lemma 4.2, and (107) follows from the upper bound . We have thus established (103). The upper bound (104) follows immediately. ∎
Appendix D Adversarial regret bounds
In this section we discuss how Optimistic Hedge can be modified to yield an algorithm that obtains the fast rates of Theorem 3.1 when played by all players, and which still obtains the optimal rate of $O(\sqrt{T})$ in the adversarial setting. Such guarantees are common in the literature [DDK11, RS13b, SALS15, KHSC18, HAM21]. The guarantees of this modification of Optimistic Hedge are stated in the following corollary (of Lemmas 4.1 and 4.2):
Corollary D.1.
There is an algorithm which, if played by all players in a game, achieves the polylogarithmic regret bound of Theorem 3.1 for each player $i$; moreover, when player $i$ is faced with an adversarial sequence of losses, the algorithm's regret is $O(\sqrt{T})$.
Proof.
Let be the constant called in Theorem 3.1 and be the constant called in Lemma 4.2. The algorithm of Corollary D.1 is obtained as follows:
1. Initially run Optimistic Hedge with the step size of Theorem 3.1.
2. If, for some , (94) first fails to hold at time , i.e.,
(108)
then set a new step size accordingly and continue running Optimistic Hedge with that step size for the remaining rounds.
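As a rough illustration of this two-phase scheme, the following sketch (ours; the test `condition_holds`, standing in for the check of (94)/(108), and both step sizes are hypothetical placeholders) switches step size once, the first time the condition fails:

```python
import numpy as np

def adaptive_optimistic_hedge(loss_stream, T, d, eta_game, eta_adv, condition_holds):
    """Run Optimistic Hedge with step size eta_game; permanently switch to
    eta_adv the first time condition_holds(t, losses) fails (cf. (108))."""
    x = np.full(d, 1.0 / d)        # uniform initial strategy
    prev = np.zeros(d)             # ell^(0) = 0 by convention
    eta, switched = eta_game, False
    losses = []
    for t in range(T):
        loss = loss_stream(t, x)   # environment / other players supply the loss
        losses.append(loss)
        if not switched and not condition_holds(t, losses):
            eta, switched = eta_adv, True
        x = x * np.exp(-eta * (2.0 * loss - prev))
        x /= x.sum()
        prev = loss
    return x
```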
If there is no so that (108) holds (and by Lemma 4.2, this will be the case when is played by all players in a game), then the proof of Theorem 3.1 shows that the regret of each player is bounded as . Otherwise, since is defined as the smallest integer at least 4 so that (108) holds, we have
and thus, by Lemma 4.1, for any ,
(109)
Further, by the choice of step size for time steps , we have, for any ,
(110)
(111)
where (110) uses [SALS15, Proposition 7]. Adding (109) and (111) completes the proof of the corollary. ∎