\NameLaura Greenstreet \Emaillaura.greenstreet@gmail.com
\NameNicholas J. A. Harvey \Emailnickhar@cs.ubc.ca
\NameVictor Sanches Portella \Emailvictorsp@cs.ubc.ca
\addrUniversity of British Columbia, Department of Computer Science
Efficient and Optimal Fixed-Time Regret with Two Experts
Abstract
Prediction with expert advice is a foundational problem in online learning. In instances with $T$ rounds and $n$ experts, the classical Multiplicative Weights Update method suffers at most $\sqrt{(T/2)\ln n}$ regret when $T$ is known beforehand. Moreover, this is asymptotically optimal when both $T$ and $n$ grow to infinity. However, when the number of experts $n$ is small/fixed, algorithms with better regret guarantees exist. Cover showed in 1967 a dynamic programming algorithm for the two-experts problem restricted to $\{0,1\}$ costs that suffers at most $\sqrt{T/(2\pi)}$ regret with $O(T^2)$ pre-processing time. In this work, we propose an optimal algorithm for prediction with two experts' advice that works even for costs in $[0,1]$ and with $O(1)$ processing time per turn. Our algorithm builds on recent work on the experts problem based on techniques and tools from stochastic calculus.
keywords:
experts, Cover, online learning, optimal, fixed-time

1 Introduction
The foundational problem in online learning of prediction with expert advice (or simply the experts' problem) consists of a sequential game between a player and an adversary. In each turn, the player chooses (possibly randomly) one of $n$ experts to follow. Concurrently, the adversary chooses for each expert a cost in $[0,1]$. At the end of a turn, the player sees the costs of all experts and suffers the cost of the expert they followed. The performance of the player is usually measured by the regret: the difference between their cumulative loss and the cumulative loss of the best expert in hindsight. In this case, we are interested in strategies for the player whose (expected) regret against any adversary is sublinear in the total number $T$ of rounds of the game.
A well-known strategy for the player is the Multiplicative Weights Update (MWU) method (Arora et al., 2012). In the fixed-time setting — that is, when the player knows beforehand the total number $T$ of rounds — MWU with a carefully-chosen fixed step-size suffers at most $\sqrt{(T/2)\ln n}$ regret. Additionally, this regret bound is asymptotically optimal when both $n$ and $T$ grow to infinity (Cesa-Bianchi et al., 1997). Yet, if the number of experts $n$ is fixed/small, better regret guarantees may be possible. From a theoretical standpoint, there is a clear motivation for the case where $n$ is fixed: MWU can suffer regret arbitrarily close to $\sqrt{(T/2)\ln n}$ for any fixed $n$ as the number of rounds grows111This is known to hold when MWU is used with a fixed or decreasing step-size, which are the usual cases when MWU is applied to the experts' problem. When the step-size of MWU is allowed to be arbitrary, Gravin et al. (2017) show that MWU can still suffer regret arbitrarily close to a constant fraction of this bound as the number of rounds grows. (Gravin et al., 2017). This means that different ideas are necessary for player strategies to guarantee smaller regret.
1.1 The Case of Two Experts
Of course, a natural question is whether regret smaller than $\sqrt{(T/2)\ln n}$ is even possible as $T$ grows even if $n$ is fixed. Cover (1967) showed that for two experts (that is, for $n = 2$) a regret of $\sqrt{T/(2\pi)}$, up to lower-order terms, is the best possible in the worst case by showing an algorithm for the case with costs in $\{0,1\}$. In related work, the attainable worst-case regret bounds for three (Abbasi-Yadkori et al., 2017; Kobzar et al., 2020) and four (Bayraktar et al., 2020) experts were recently determined up to lower-order terms. Although optimal, Cover's algorithm is based on a dynamic programming approach that takes $O(T^2)$ time in total. In comparison, MWU takes $O(1)$ time per round to compute the probabilities to assign to the two experts. Finally, one can adapt Cover's algorithm for costs in $[0,1]$, but the standard approach is to randomly round the costs to either $0$ or $1$. In this case, the regret guarantees only hold in expectation.
In a related line of work, Harvey et al. (2020b) study the 2-experts problem in the anytime setting, that is, the case where the player/algorithm does not know the total number of rounds ahead of time. They showed an optimal strategy for the player whose regret on any round $t$ is at most $\frac{\gamma}{2}\sqrt{t}$, where the constant $\gamma \approx 1.30693$ arises naturally in the study of Brownian motion (see Mörters and Peres, 2010 for an introduction to the field and historical references). Moreover, their algorithm can be computed (up to machine precision) in $O(1)$ time per round since it boils down to the evaluation of well-known mathematical functions such as the exponential function and the imaginary error function. In this work, we combine similar ideas based on stochastic calculus together with Cover's algorithm to propose an efficient and optimal fixed-time algorithm for 2 experts.
Known result (Cover, 1967):
There is a dynamic programming algorithm for the 2-experts problem with costs in $\{0,1\}$ that suffers at most $\sqrt{T/(2\pi)} + O(1)$ regret in games with $T$ rounds and requires $O(T^2)$ pre-processing time.
Our contribution:
An algorithm for the two experts' problem with costs in $[0,1]$ that suffers at most $\sqrt{T/(2\pi)} + O(1)$ regret in games with $T$ rounds. This new algorithm has $O(1)$ running time per round and is based on discretizing a continuous-time solution obtained using ideas from stochastic calculus.
More precisely, one of the key steps is deriving a player for the continuous-time setting from Harvey et al. (2020b) that exploits the knowledge of the time horizon to obtain regret bounds better than those attainable in the anytime setting. However, unlike in the anytime setting, discretizing this algorithm leads to non-negative discretization error. Another key contribution of our paper is showing that this discretization error is small. Finally, the connection to Cover's classical algorithm sheds new intuition on the classical optimal solution. Interestingly, our results could be formally presented without resorting to stochastic calculus. Yet, it is the stochastic calculus point of view that guides us through the design and analysis of the algorithm.
Text organization:
We first formally define the experts problem and discuss some assumptions and simplifications in Section 2. In Section 3 we present a brief summary of Cover’s optimal algorithm for two experts. In Section 4 we define an analogous continuous-time problem and describe a solution inspired by Cover’s algorithm. Finally, in Section 5 we present and analyze a discretized version of the continuous-time algorithm, showing it enjoys optimal worst-case regret bounds.
2 Prediction with Expert Advice
In this section we define more precisely the problem of prediction with expert advice. The problem is parameterized by a fixed number $n$ of experts. A (strategy for the) player is a function $p$ that, given the cost vectors chosen by the adversary in previous rounds, outputs a probability distribution over the experts represented by a vector $p_t \in [0,1]^n$ with $\sum_{i=1}^{n} p_t(i) = 1$. Similarly, a (strategy for the) adversary is a function $\ell$ that, given the player's previous choices of distributions over the experts, outputs a vector $\ell_t \in [0,1]^n$ of expert costs for round $t$, where $\ell_t(i)$ is the cost of expert $i$. The performance of a player strategy $p$ in a game with $T$ rounds against an adversary $\ell$ is measured by the regret, defined as
\[
\mathrm{Regret}_T(p, \ell) \;:=\; \sum_{t=1}^{T} \langle p_t, \ell_t \rangle \;-\; \min_{i \in \{1, \dots, n\}} \sum_{t=1}^{T} \ell_t(i),
\]
where above, and for the remainder of this section222If no specific strategies $p$ or $\ell$ are clear from the context, one may take $p$ and $\ell$ to be arbitrary strategies, and we shall omit $p$ and $\ell$ when they are clear from context., we have $p_t = p(\ell_1, \dots, \ell_{t-1})$ and $\ell_t = \ell(p_1, \dots, p_{t-1})$ for all $t \in \{1, \dots, T\}$. Moreover, whenever the loss vectors are clear from context, we define the cumulative loss of expert $i$ at round $t$ by $L_t(i) := \sum_{s=1}^{t} \ell_s(i)$. In this text, for each $T$ we want to devise a strategy for the player that suffers regret sublinear in $T$ against any adversary in a game with $T$ rounds. That is, we want a family of strategies $\{p^{(T)}\}_{T \in \mathbb{N}}$ such that
\[
\sup_{\ell}\, \mathrm{Regret}_T\big(p^{(T)}, \ell\big) \;\in\; o(T), \tag{1}
\]
where the supremum ranges over all possible adversaries, even those that have full knowledge of (and may even be adversarial to) the player’s strategy.
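To make the definition above concrete, here is a minimal sketch of how one would compute the regret of a finished game from its transcript. The array names (`costs`, `probs`) and the use of Python are ours, purely for illustration.

```python
import numpy as np

def regret(costs: np.ndarray, probs: np.ndarray) -> float:
    """Regret of a player over T rounds with n experts.

    costs: shape (T, n); costs[t, i] is the cost of expert i on round t (in [0, 1]).
    probs: shape (T, n); probs[t, i] is the mass the player put on expert i on round t.
    """
    player_loss = float((costs * probs).sum())          # sum_t <p_t, l_t>
    best_expert_loss = float(costs.sum(axis=0).min())   # min_i sum_t l_t(i)
    return player_loss - best_expert_loss
```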
2.1 Restricted Adversaries
In (1), the supremum ranges over all possible adversaries for a game with $T$ rounds. However, we need only consider in the supremum oblivious adversaries (Karlin and Peres, 2017, Section 18.5.4), that is, adversaries whose choice on each round depends only on the round number and not on the choices of the player. For any $\ell_1, \dots, \ell_T \in [0,1]^n$, we denote by $(\ell_1, \dots, \ell_T)$ the oblivious adversary that plays $\ell_t$ on round $t$.
In fact, we may restrict our attention to even smaller sets of adversaries (for details on these reductions, see Gravin et al., 2016 and Karlin and Peres, 2017, Section 18.5.3). First, in (1) we need only consider binary adversaries, that is, adversaries which can assign only costs in $\{0,1\}$ to the experts. Furthermore, to obtain the value of the optimal regret for two experts we only need to consider adversaries that pick cost vectors in $\{(0,1), (1,0)\}$, which we call restricted binary adversaries. Intuitively, the adversary can do no better by placing equal costs on both experts at any given round. The optimal algorithm for two experts proposed by Cover (1967) heavily relies on the assumption that the adversary is a restricted binary one and does not extend to general costs in $[0,1]$ without resorting to randomly rounding the costs — which makes the regret guarantees hold only in expectation.
In this work we design an algorithm that suffers at most $\sqrt{T/(2\pi)} + O(1)$ regret for arbitrary costs in $[0,1]$. Our initial analysis handles only restricted binary adversaries, but simple concavity arguments extend the upper bound to general adversaries. Throughout this text we fix a time horizon $T \in \mathbb{N}$.
2.2 The Gap Between Experts
The case where we have only 2 experts admits a simplification that aids us greatly in the design of upper and lower bounds on the optimal regret. Namely, the gap (between experts) at round $t$ is given by $g_t := |L_t(1) - L_t(2)|$, where $L_t$ is the cumulative loss vector at round $t$ as defined in Section 2. Furthermore, we call lagging expert (on round $t$) an expert with maximum cumulative loss on round $t$ among both experts. Similarly, we call leading expert (on round $t$) an expert with minimum cumulative loss on round $t$. The following proposition from Harvey et al. (2020b) shows that, for the restricted binary adversaries described earlier, the regret can be almost fully characterized by the expert gaps and the player's choices of distributions on the experts. In the next proposition (and throughout the remainder of the text), for any predicate $P$ we define $[P]$ to be 1 if $P$ is true and 0 otherwise.
Proposition 2.1 (Harvey et al., 2020a, Proposition 2.3).
Fix $T \in \mathbb{N}$, let $p$ be a player strategy, and let $\ell_1, \dots, \ell_T \in \{(0,1), (1,0)\}$ be the expert costs chosen by the adversary. For each $t \in \{1, \dots, T\}$, set $g_0 := 0$, let $x_t$ be the probability mass placed on the lagging expert on round $t$, and let $g_t$ be the gap between experts on round $t$. Then,
\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} x_t\,(g_t - g_{t-1}) \;+\; Z_T,
\]
where $Z_T$ is a correction term that only accrues on rounds $t$ with $g_{t-1} = 0$. In particular, if for every $t$ with $g_{t-1} = 0$ we have $x_t = \tfrac12$, then
\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} x_t\,(g_t - g_{t-1}).
\]
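Since this proposition is the backbone of everything that follows, here is a quick numerical check of the clean form of the identity (our sketch, with ties at zero gap split evenly so that $x_t = \frac12$ there; the player's other choices of mass are arbitrary).

```python
import random

def check_gap_identity(T: int = 200, seed: int = 1) -> None:
    """Check Regret_T = sum_t x_t (g_t - g_{t-1}) on a random restricted
    binary game, for a player that puts mass x on the lagging expert
    (x = 1/2 whenever the gap is zero)."""
    rng = random.Random(seed)
    L = [0.0, 0.0]                           # cumulative losses of the two experts
    player_loss, rhs = 0.0, 0.0
    for _ in range(T):
        gap_old = abs(L[0] - L[1])
        lag = 0 if L[0] >= L[1] else 1       # lagging expert (arbitrary on ties)
        x = 0.5 if gap_old == 0 else rng.random() / 2
        cost = [0.0, 0.0]
        cost[rng.randrange(2)] = 1.0         # restricted binary adversary
        player_loss += x * cost[lag] + (1 - x) * cost[1 - lag]
        L[0] += cost[0]
        L[1] += cost[1]
        rhs += x * (abs(L[0] - L[1]) - gap_old)
    print(player_loss - min(L), rhs)         # the two values coincide

check_gap_identity()
```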
3 An Overview of Cover’s Algorithm
Although in this section we give only a brief overview of Cover's algorithm, for the sake of completeness we provide a full description and analysis of the algorithm in Appendix A. The key idea in Cover's algorithm is to compute optimal decisions for all possible scenarios beforehand. This is a feasible approach when we know the total number of rounds $T$ and the adversary is a (restricted) binary adversary. More precisely, we will restrict our attention to player strategies parameterized by functions $q$ which place probability mass $q_t(g)$ on the lagging expert on round $t$ if the gap between experts at the end of round $t-1$ is $g$, and mass $1 - q_t(g)$ on the leading expert. Then the “regret-to-be-suffered” by $q$ from any round $t$ onwards with a given gap $g$ between experts is
\[
V_t(g) \;:=\;
\begin{cases}
0, & t = T,\\[2pt]
V_{t+1}(1) + \max\{\, q_{t+1}(0),\; 1 - q_{t+1}(0) \,\}, & t < T,\ g = 0,\\[2pt]
\max\{\, V_{t+1}(g+1) + q_{t+1}(g),\;\; V_{t+1}(g-1) - q_{t+1}(g) \,\}, & t < T,\ g \ge 1.
\end{cases} \tag{2}
\]
We can compute all entries of $V$ as defined above via a dynamic programming approach, starting with $V_T(g) = 0$ for all $g$ and then computing these values for earlier rounds. Moreover, there is a simple strategy $q^*$ that minimizes the worst-case regret $V_0(0)$. Interestingly, the worst-case regret of $q^*$ given by $V_0(0)$ is tightly connected with symmetric random walks, where a symmetric random walk (of length $n$ starting at $x$) is a sequence $X_0, X_1, \dots, X_n$ of random variables with $X_0 = x$ and $X_k = X_{k-1} + D_k$ for each $k$, where $D_1, \dots, D_n$ are i.i.d. uniform random variables on $\{-1, +1\}$. The next theorem summarizes the guarantees on the regret of $q^*$, showing that it suffers no more than $\sqrt{T/(2\pi)} + O(1)$ regret. Moreover, it is worth noting that no player strategy can do better asymptotically in $T$ (for a complete proof of the lower bound, see Appendix A.5).
Theorem 3.1 (Cover, 1967, and Karlin and Peres, 2017, Section 18.5.3).
For every $t \in \{0, \dots, T\}$ and $g \ge 0$, let the random variable $N_t(g)$ be the number of passages through 0 of a symmetric random walk of length $T - t$ starting at position $g$. Then $V^*_t(g) = \frac12\,\mathbb{E}[N_t(g)]$ for every $t$ and $g$, where $V^*$ is the regret-to-be-suffered of the optimal strategy $q^*$. In particular,
\[
V^*_0(0) \;=\; \tfrac12\,\mathbb{E}[N_0(0)] \;\le\; \sqrt{\frac{T}{2\pi}} + O(1).
\]
Finally, although not more efficiently computable than the dynamic programming approach, $q^*$ has a closed-form solution (see Karlin and Peres, 2017, Section 18.5.3) given, for $t \in \{1, \dots, T\}$ and $g \ge 0$, by
\[
q^*_t(g) \;=\; \Pr[\,X_{T-t} > g\,] \;+\; \tfrac12\,\Pr[\,X_{T-t} = g\,], \tag{3}
\]
where $X$ is a symmetric random walk of length $T - t$. This closed-form solution will serve as inspiration for our continuous-time algorithm.
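The dynamic program itself is short enough to state in code. The sketch below follows our reading of the recurrences in Theorem A.1 (see Appendix A): the optimal mass equalizes the two branches of the max, which simultaneously gives an averaging recurrence for the potential. Index conventions and variable names are ours.

```python
import numpy as np

def cover_dp(T: int):
    """O(T^2) dynamic program for the two-expert problem with {0,1} costs.

    V[t, g]: worst-case regret still to be suffered after round t, given gap g.
    q[t, g]: mass placed on the lagging expert on round t+1, given gap g at
             the end of round t (q[t, 0] = 1/2 splits the mass on ties).
    """
    V = np.zeros((T + 1, T + 3))   # padded in g; unreachable entries stay 0
    q = np.zeros((T, T + 3))
    for t in range(T - 1, -1, -1):
        q[t, 0] = 0.5
        V[t, 0] = V[t + 1, 1] + 0.5
        for g in range(1, t + 1):  # gaps larger than t are unreachable at round t
            q[t, g] = (V[t + 1, g - 1] - V[t + 1, g + 1]) / 2  # equalizer
            V[t, g] = (V[t + 1, g - 1] + V[t + 1, g + 1]) / 2
    return V, q

V, _ = cover_dp(1000)
print(V[0, 0], np.sqrt(1000 / (2 * np.pi)))  # worst-case regret vs. sqrt(T/(2*pi))
```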
4 A Continuous-Time Problem
Cover's player strategy $q^*$ is optimal, but it is defined only for restricted binary adversaries. It is likely that it can be extended to binary adversaries, but it is definitely less clear how to extend such an algorithm to general adversaries picking costs in $[0,1]$. Moreover, even when Cover's algorithm can be used, it is quite inefficient: we either need to compute the table $V^*$, which has $O(T^2)$ entries, or at each round we need to compute the probabilities in (3). In the latter case, in the first round we already need $\Omega(T)$ time to exactly compute the probabilities related to a length-$(T-1)$ random walk.
To devise a new algorithm for the two-experts problem, we first look at an analogous continuous-time problem, proposed by Harvey et al. (2020b); a similar setting was previously studied by Freund (2009). The main idea is to translate the random-walk connection from the discrete case into a stochastic problem in continuous time, and then exploit the heavy machinery of stochastic calculus to derive a continuous-time solution.
4.1 Regret as a Discrete Stochastic Integral
Let us begin by further connecting Cover's algorithm to random walks. Let $p$ be a player strategy induced by some function $q$ as in the previous section. If $q_t(0) = \frac12$ for all $t$, then Proposition 2.1 tells us that, for any restricted binary adversary with sequence of gaps $g_0, g_1, \dots, g_T$ (where $g_0 = 0$) we have
\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} q_t(g_{t-1})\,(g_t - g_{t-1}). \tag{4}
\]
The right-hand side of the above equation is a discrete analogue of the Riemann–Stieltjes integral of $q$ with respect to $g$. In fact, if $(g_t)_t$ is a random sequence333We usually also require some kind of restriction on the sequence, such as requiring it to be a martingale or a local martingale., the above is also known as a discrete stochastic integral. In particular, consider the case where $(g_t)_t$ is a length-$T$ reflected (i.e., absolute value of a) symmetric random walk. Then, any possible sequence of deterministic gaps has a positive probability of being realized by this walk. In other words, any sequence of gaps is in the support of the reflected random walk. Thus, bounding the worst-case regret of $p$ is equivalent to bounding almost surely the value of (4) when $(g_t)_t$ is a reflected symmetric random walk. This idea will prove itself powerful in continuous time even though it is not very insightful for the discrete-time problem.
4.2 A Continuous-Time Problem
A stochastic process that can be seen as the continuous-time analogue of symmetric random walks is Brownian motion (Revuz and Yor, 1999; Mörters and Peres, 2010). We fix a standard Brownian motion $B = (B_t)_{t \ge 0}$ throughout the remainder of this text. Inspired by the observation that the discrete regret boils down to a discrete stochastic integral, Harvey et al. (2020b) define a continuous analogue of regret as a continuous stochastic integral. More specifically, given a function $p \colon [0, T) \times \mathbb{R} \to [0,1]$ such that $p(t, 0) = \frac12$ for all $t \in [0, T)$, define the continuous regret at time $T$ by
\[
\mathrm{ContRegret}_T(p) \;:=\; \lim_{s \nearrow T} \int_0^{s} p(t, |B_t|)\,\mathrm{d}|B_t|,
\]
where the term in the limit of the right-hand side above is the stochastic integral (from $0$ to $s$) of $p$ with respect to the process $|B|$. We take the limit as a mere technicality: $p$ need not be defined at time $T$ and we want to ensure left-continuity of the continuous regret (the limit is well-defined since a stochastic integral with respect to a reflected Brownian motion is guaranteed to have limits from the left and to be continuous from the right). It is worth noting that stochastic integrals are usually defined with respect to martingales or local martingales, but $|B|$ is neither. Still, $|B|$ happens to be a semi-martingale, which roughly means that it can be written as a sum of two processes: a local martingale and a process of bounded variation. In this case one can still define stochastic integrals in a way that foundational results such as Itô's formula still hold; details can be found in Revuz and Yor (1999). We do not give the precise definition of a stochastic integral since we shall not deal with it directly. Still, one may think intuitively of such integrals as random Riemann–Stieltjes integrals, although the precise definition is more delicate.
Let us now look for a continuous function $p$ with $p(t, 0) = \frac12$ for all $t \in [0, T)$ with small continuous regret. Note that without the condition of continuity or the requirement that $p(\cdot, 0) = \frac12$, the problem would be trivial. If we did not require $p(t, 0) = \frac12$ for all $t$, then taking $p$ identically $0$ would yield zero continuous regret. Moreover, dropping this requirement would go against the analogous conditions needed in the discrete case, where regret could be written as a “discrete stochastic integral” in Proposition 2.1 only when the player splits its mass evenly in rounds with zero gap. Finally, requiring continuity of $p$ is a way to avoid technicalities and “unfair” player strategies.
When working with Riemann integrals, instead of manipulating the definitions directly we use powerful and general results such as the Fundamental Theorem of Calculus (FTC). Analogously, the following result, known as Itô's formula, is one of the main tools we use when manipulating stochastic integrals; it can be seen as an analogue of the FTC (and shows how stochastic integrals do not always follow the classical rules of calculus). We denote by $C^{1,2}$ the class of bivariate functions that are continuously differentiable with respect to their first argument and twice continuously differentiable with respect to their second argument. Moreover, for any function $f \in C^{1,2}$ we denote by $\partial_t f$ the partial derivative of $f$ with respect to its first argument, and we denote by $\partial_g f$ and $\partial_{gg} f$, respectively, the first and second derivatives of $f$ with respect to its second argument.
Theorem 4.1 (Itô’s Formula, see Revuz and Yor, 1999, Theorem IV.3.3).
Let $f$ be in $C^{1,2}$ and let $s \in [0, T]$. Then, almost surely,
\[
f(s, |B_s|) - f(0, 0) \;=\; \int_0^s \partial_g f(t, |B_t|)\,\mathrm{d}|B_t| \;+\; \int_0^s \Big( \partial_t f(t, |B_t|) + \tfrac12\,\partial_{gg} f(t, |B_t|) \Big)\,\mathrm{d}t. \tag{5}
\]
Note that the first integral in the equation of the above theorem resembles the definition of the continuous regret. In fact, the above result shows an alternative way to write the continuous regret at time $T$ of a function $p$ such that there is $R \in C^{1,2}$ with $\partial_g R = p$. However, it might be hard to compute (or even to bound) the second integral in (5). A straightforward way to circumvent this problem is to look for functions $R$ such that the second integral in (5) is 0. For that, it suffices to consider functions $R$ that satisfy the backwards heat equation on $[0, T) \times \mathbb{R}$, that is,
\[
\partial_t R(t, g) + \tfrac12\,\partial_{gg} R(t, g) \;=\; 0 \qquad \text{for all } (t, g) \in [0, T) \times \mathbb{R}. \tag{BHE}
\]
We summarize the above discussion and its implications in the following lemma.
Lemma 4.2.
Let $R$ be in $C^{1,2}$ and let $p \colon [0, T) \times \mathbb{R} \to [0,1]$ be such that $p(t, 0) = \frac12$ for all $t \in [0, T)$, such that (BHE) holds, and such that $\partial_g R = p$. Then
\[
\mathrm{ContRegret}_T(p) \;=\; \lim_{s \nearrow T} R(s, |B_s|) - R(0, 0) \qquad \text{almost surely.}
\]
4.3 A Solution Inspired by Cover’s Algorithm
In the remainder of this text we will make extensive use of a well-known function related to the Gaussian distribution known as the complementary error function, defined by
\[
\operatorname{erfc}(x) \;:=\; \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-u^2}\,\mathrm{d}u \qquad \text{for all } x \in \mathbb{R}.
\]
In Section 4.4 we will show that the function $p^*$ on $[0, T) \times \mathbb{R}$ given by
\[
p^*(t, g) \;:=\; \frac12\,\operatorname{erfc}\!\left( \frac{g}{\sqrt{2(T - t)}} \right)
\]
satisfies $\mathrm{ContRegret}_T(p^*) \le \sqrt{T/(2\pi)}$ almost surely. Before bounding the continuous regret, it is enlightening to see how $p^*$ is related to Cover's algorithm.
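Concretely, evaluating this player requires a single call to a standard-library implementation of $\operatorname{erfc}$. The following sketch (our code, under the reconstruction of $p^*$ given above) makes this explicit.

```python
from math import erfc, sqrt

def p_star(t: float, g: float, T: float) -> float:
    """Continuous-time player: mass on the lagging expert at time t < T, gap g.

    p*(t, g) = (1/2) erfc(g / sqrt(2 (T - t))): equal to 1/2 at g = 0,
    decaying to 0 as the gap grows, and sharpening as t approaches T.
    """
    return 0.5 * erfc(g / sqrt(2.0 * (T - t)))

print(p_star(0.0, 0.0, 100.0))    # 0.5: tied experts get equal mass
print(p_star(99.0, 10.0, 100.0))  # ~0: a large gap late in the game is decisive
```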
Specifically, let $q^*$ be as in (3). Due to the Central Limit Theorem, $p^*$ can be seen as an approximation of $q^*$. To see why, let $X_0, X_1, \dots$ be a symmetric random walk, and define $D_k := X_k - X_{k-1}$ and $\bar D_k := \frac{D_k + 1}{2}$ for each $k \ge 1$. Note that $\bar D_k$ follows a Bernoulli distribution with parameter $\frac12$ for any $k \ge 1$. Moreover, let $G$ be a Gaussian random variable with mean 0 and variance 1. Then, by setting $n := T - t$, the Central Limit Theorem guarantees
\[
\frac{X_n}{\sqrt{n}} \;=\; \frac{1}{\sqrt{n}} \sum_{k=1}^{n} D_k \;\xrightarrow{\;n \to \infty\;}\; G,
\]
where the limit holds in distribution. Thus, we roughly have that $X_{T-t}$ and $\sqrt{T - t}\,G$ have similar distributions. Then,
\[
q^*_t(g) \;=\; \Pr[X_{T-t} > g] + \tfrac12 \Pr[X_{T-t} = g] \;\approx\; \Pr\big[\sqrt{T - t}\;G \ge g\big] \;=\; \tfrac12\operatorname{erfc}\!\left( \frac{g}{\sqrt{2(T - t)}} \right) \;=\; p^*(t, g).
\]
One may already presume that using $p^*$ in place of $q^*$ in the discrete experts' problem should yield a regret bound close to the guarantees on the regret of Cover's algorithm. Indeed, using the Berry–Esseen Theorem (Durrett, 2019, Section 3.4.4) to more precisely bound the difference between $p^*$ and $q^*$ yields a regret bound with suboptimal constants against binary adversaries. However, it is not clear if the approximation error would yield the optimal constant in the regret bound. Additionally, these guarantees do not naturally extend to arbitrary experts' costs in $[0,1]$. In Section 5 we will show how to use an algorithm closely related to $p^*$ that enjoys a clean bound on the discrete-time regret.
Deriving $p^*$ directly from a PDE.
We have derived $p^*$ by a heuristic argument to approximate $q^*$. Yet, one can derive the same solution without ever making use of $q^*$ by approaching the problem directly from the stochastic calculus point of view. Namely, consider player strategies that satisfy the BHE, are non-negative, and that place mass $\frac12$ on each expert when the gap is $0$. With only these conditions we would end up with anytime solutions similar to the ones considered by Harvey et al. (2020b). In the fixed-time case we can “invert time” by the change of variables $s = T - t$. Then the BHE becomes the traditional heat equation, which $p^*$ satisfies together with the boundary conditions.
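The change of variables is a one-line computation; we sketch it below in our own notation, with the boundary conditions we believe are intended (mass $\frac12$ at zero gap, and no mass on a strictly lagging expert at the horizon).

```latex
% Invert time: s = T - t and u(s, g) = p(T - s, g). Then
%   \partial_t p + \tfrac12 \partial_{gg} p = 0
%     \iff  \partial_s u = \tfrac12 \partial_{gg} u   (standard heat equation).
% On the half-line g >= 0 with boundary data u(s, 0) = 1/2 and initial data
% u(0, g) = 0 for g > 0, the classical solution is
\[
  u(s, g) \;=\; \tfrac12\operatorname{erfc}\!\Big(\frac{g}{\sqrt{2s}}\Big),
  \qquad\text{that is,}\qquad
  p^*(t, g) \;=\; \tfrac12\operatorname{erfc}\!\Big(\frac{g}{\sqrt{2(T-t)}}\Big).
\]
```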
4.4 Bounding the Continuous Regret
Interestingly, not only is $p^*$ in $C^{1,2}$, but it also satisfies the backwards heat equation, even though we never explicitly required such a condition to hold. Since the proof of this fact boils down to technical but otherwise straightforward computations, we defer it to Appendix C.
Lemma 4.3.
For all $t \in [0, T)$ and $g \in \mathbb{R}$ we have $\partial_t p^*(t, g) + \tfrac12\,\partial_{gg} p^*(t, g) = 0$.
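While the proof in Appendix C is a direct computation, the claim is also easy to check symbolically. The snippet below (our sketch; it assumes the `sympy` package) verifies the backwards heat equation for $p^*$, written in terms of the time remaining $\tau = T - t$ so that $\partial_t = -\partial_\tau$.

```python
import sympy as sp

g, tau = sp.symbols('g tau', positive=True)  # tau = T - t > 0
p = sp.erfc(g / sp.sqrt(2 * tau)) / 2         # p*(t, g) with tau = T - t

# BHE: d/dt p + (1/2) d^2/dg^2 p = 0, i.e. -d/dtau p + (1/2) d^2/dg^2 p = 0.
bhe = -sp.diff(p, tau) + sp.diff(p, g, 2) / 2
print(sp.simplify(bhe))  # prints 0
```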
However, recall that to use Lemma 4.2 we need a function $R$ with $\partial_g R = p^*$ that satisfies the backwards heat equation; it is not enough that $p^*$ itself satisfies the backwards heat equation. Luckily, the following proposition shows how to obtain such a function based on $p^*$.
Proposition 4.4 (Harvey et al., 2020a, Lemma 5.6).
Let $p \colon [0, T) \times \mathbb{R} \to \mathbb{R}$ be in $C^{1,2}$ and satisfy (BHE). Then the function $R$ given by
\[
R(t, g) \;:=\; \int_0^g p(t, y)\,\mathrm{d}y \;+\; \frac12 \int_t^T \partial_g p(s, 0)\,\mathrm{d}s \qquad \text{for all } (t, g) \in [0, T) \times \mathbb{R}
\]
is in $C^{1,2}$, satisfies (BHE), and is such that $\partial_g R = p$.
In light of the above proposition, for all $(t, g) \in [0, T) \times \mathbb{R}$ define
\[
R^*(t, g) \;:=\; \int_0^g p^*(t, y)\,\mathrm{d}y \;+\; \frac12 \int_t^T \partial_g p^*(s, 0)\,\mathrm{d}s.
\]
In this case, we can evaluate these integrals and obtain a formula for $R^*$ that is easier to analyze. Although we defer a complete proof of the next equation to Appendix C, using that $\int \operatorname{erfc}(u)\,\mathrm{d}u = u\operatorname{erfc}(u) - e^{-u^2}/\sqrt{\pi} + C$ (Olver et al., 2010, Section 7.7(i)) and that $\partial_g p^*(s, 0) = -1/\sqrt{2\pi(T - s)}$, we can show for every $t \in [0, T)$ and $g \in \mathbb{R}$ that
\[
R^*(t, g) \;=\; \frac{g}{2}\operatorname{erfc}\!\left( \frac{g}{\sqrt{2(T - t)}} \right) \;-\; \sqrt{\frac{T - t}{2\pi}}\; e^{-g^2/(2(T - t))}. \tag{6}
\]
Since $R^*$ satisfies (BHE), Lemma 4.2 shows that the continuous regret of $p^*$ is given exactly by $\lim_{s \nearrow T} R^*(s, |B_s|) - R^*(0, 0)$. The following lemma shows a bound on $R^*$ and, thus, a bound on the continuous regret of $p^*$.
Lemma 4.5.
We have $R^*(0, 0) = -\sqrt{T/(2\pi)}$ and
\[
R^*(t, g) \;\le\; 0 \qquad \text{for all } (t, g) \in [0, T) \times \mathbb{R}_{\ge 0}.
\]
Proof 4.6.
The facts that $R^*(0, 0) = -\sqrt{T/(2\pi)}$ and that $R^*(t, 0) \le 0$ for all $t \in [0, T)$ are easily verifiable. For the bound on $R^*(t, g)$ for $g > 0$, note first that for any $x > 0$ we have
\[
\operatorname{erfc}(x) \;\le\; \frac{e^{-x^2}}{x\sqrt{\pi}}.
\]
Therefore, for all $g > 0$ we have
\[
\frac{g}{2}\operatorname{erfc}\!\left( \frac{g}{\sqrt{2(T - t)}} \right) \;\le\; \sqrt{\frac{T - t}{2\pi}}\; e^{-g^2/(2(T - t))}.
\]
Applying the above to (6) yields the desired bound.
Combining these results we get the desired bound on the continuous regret of $p^*$, which we summarize in the following theorem.
Theorem 4.7.
We have $\mathrm{ContRegret}_T(p^*) \le \sqrt{T/(2\pi)}$ almost surely.
5 From Continuous to Discrete Time
In the continuous-time algorithm we have that $-R^*(t, g)$ bounds the continuous regret still to be suffered from time $t$ onwards with gap $g$ by the strategy that places probability mass on the lagging expert444We have never formally defined lagging and leading experts in continuous time, and we do not intend to do so. Here we are extrapolating the view given by Proposition 2.1 of regret as a stochastic integral of the probability put on the lagging expert with respect to the gaps for the sake of intuition. according to $p^*$. At the same time, for Cover's algorithm we have $V^*_t(g)$ as an upper bound on the regret-to-be-suffered when the mass on the lagging expert is given by $q^*$. Furthermore, similar to the relation between $p^*$ and $R^*$, we can write $q^*$ as a function of $V^*$ (details can be found in Appendix A): at round $t$ with gap $g$ at the end of round $t - 1$, the probability mass placed on the lagging expert in Cover's algorithm is555For small $g$ this does not follow directly, but our goal at the moment is only to build intuition.
\[
q^*_t(g) \;=\; \frac{V^*_t(g - 1) - V^*_t(g + 1)}{2}.
\]
That is, $q^*$ is a sort of discrete derivative of $V^*$ with respect to its second argument. From this analogy, one might expect that a discrete derivative of $R^*$ with respect to its second argument yields a good strategy for the player in the original experts' problem. As we shall see, this is exactly the case. Additionally, computing such a discrete derivative of $R^*$ amounts to a couple of evaluations of the complementary error function, which we can assume to be computable (up to machine precision) in constant time.
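The resulting player is cheap to evaluate. The sketch below implements the discrete derivative of $R^*$ using the closed form (6); the extension of $R^*$ to negative gaps via $R^*(t, -g) = R^*(t, g) - g$ and the handling of the last round are our reading of the definitions in Section 5.2.

```python
from math import erfc, exp, pi, sqrt

def R_star(t: float, g: float, T: float) -> float:
    """Potential (6): R*(t,g) = (g/2) erfc(g / sqrt(2(T-t)))
                               - sqrt((T-t)/(2 pi)) exp(-g^2 / (2(T-t))),
    extended to g < 0 via R*(t, -g) = R*(t, g) - g (from erfc(-x) = 2 - erfc(x))."""
    if g < 0:
        return R_star(t, -g, T) + g
    tau = T - t
    return (g / 2) * erfc(g / sqrt(2 * tau)) - sqrt(tau / (2 * pi)) * exp(-g * g / (2 * tau))

def q_disc(t: int, g: int, T: int) -> float:
    """Mass on the lagging expert on round t with gap g: a central discrete
    derivative of R*. Each call costs O(1): two erfc evaluations."""
    if t == T:                 # R* is undefined at t = T; last round handled separately
        return 0.5 if g == 0 else 0.0
    return (R_star(t, g + 1, T) - R_star(t, g - 1, T)) / 2

print(q_disc(1, 0, 100))  # 0.5 exactly, by the negative-gap identity
```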
In this section we describe the discretized algorithm and give an upper bound on its regret against restricted binary adversaries, that is, adversaries that choose costs in $\{(0,1), (1,0)\}$. Luckily, unlike Cover's algorithm, the strategy we shall see in this section smoothly extends to general costs in $[0,1]$ while preserving its performance guarantees. Since this extension amounts to concavity arguments, we defer its details to Appendix E.
5.1 Discrete Itô’s Formula
In Section 4, the main tool to relate the continuous regret to the function $R^*$ was Itô's formula. Similarly, one of the main tools for the analysis of the discretized continuous-time algorithm will be a discrete version of Itô's formula. In order to state such a formula and to describe the algorithm, some standard notation for discrete derivatives will be useful. Namely, for any function $f \colon \{0, \dots, T\} \times \mathbb{Z} \to \mathbb{R}$ and any $(t, g) \in \{1, \dots, T\} \times \mathbb{Z}$, define
\[
\mathsf{D}_t f(t, g) \;:=\; f(t, g) - f(t-1, g), \qquad
\mathsf{D}_g f(t, g) \;:=\; \frac{f(t, g+1) - f(t, g-1)}{2}, \qquad
\mathsf{D}_{gg} f(t, g) \;:=\; f(t, g+1) - 2 f(t, g) + f(t, g-1).
\]
We are now in place to state a discrete analogue of Itô's formula. One important assumption of the next theorem is that the points $g_0, g_1, \dots, g_n$ are such that successive values have absolute difference equal to $1$. In the case where these are gaps in a 2-experts problem, this means that the adversary needs to be a restricted binary adversary. The version of the next theorem as stated — including the dependence on the first argument of $f$ — can be found in Harvey et al. (2020b, Lemma 3.7). Yet, this theorem is a slight generalization of earlier results such as the ones due to Fujita (2008, Section 2) and Kudzhma (1982, Theorem 2).
Theorem 5.1 (Discrete Itô’s Formula).
Let $g_0, g_1, \dots, g_n \in \mathbb{Z}$ be such that $|g_t - g_{t-1}| = 1$ for every $t \in \{1, \dots, n\}$ and let $f \colon \{0, \dots, n\} \times \mathbb{Z} \to \mathbb{R}$. Then,
\[
f(n, g_n) - f(0, g_0) \;=\; \sum_{t=1}^{n} \mathsf{D}_g f(t, g_{t-1})\,(g_t - g_{t-1}) \;+\; \sum_{t=1}^{n} \Big( \mathsf{D}_t f(t, g_{t-1}) + \tfrac12\,\mathsf{D}_{gg} f(t, g_{t-1}) \Big).
\]
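Since this is an exact identity rather than an approximation, it can be checked numerically on any $\pm 1$ path. The sketch below does so for an arbitrary test function; the discrete-derivative conventions match the ones stated above.

```python
import random

def discrete_ito_gap(f, gaps):
    """Return (lhs, rhs) of the discrete Ito formula for a path with
    |g_t - g_{t-1}| = 1; the two values coincide exactly."""
    n = len(gaps) - 1
    lhs = f(n, gaps[n]) - f(0, gaps[0])
    rhs = 0.0
    for t in range(1, n + 1):
        g = gaps[t - 1]
        Dg = (f(t, g + 1) - f(t, g - 1)) / 2            # discrete space derivative
        Dt = f(t, g) - f(t - 1, g)                      # discrete time derivative
        Dgg = f(t, g + 1) - 2 * f(t, g) + f(t, g - 1)   # discrete second derivative
        rhs += Dg * (gaps[t] - g) + Dt + Dgg / 2
    return lhs, rhs

f = lambda t, g: t * g * g + 3 * g                      # an arbitrary test function
path = [0]
for _ in range(50):
    path.append(path[-1] + random.choice((-1, 1)))
print(discrete_ito_gap(f, path))                        # both values agree
```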
The first summation on the right-hand side of discrete Itô's formula can be seen as a discrete stochastic integral when $(g_t)_t$ is a random sequence. Remarkably, this term is extremely similar to the regret formula from Proposition 2.1. Thus, if we were to use discrete Itô's formula to bound the regret, it would be desirable for the second term to (approximately) satisfy an analogue of (BHE). In fact, the potential $V^*$ from Cover's algorithm satisfies the discrete BHE (with some care needed when the gap is zero, see Appendix A.3). Furthermore, the connection to the BHE seems to extend to other problems in online learning: in recent work, Zhang et al. (2022) showed how coin-betting with potentials that satisfy the BHE yields optimal algorithms for unconstrained online learning.
Since $R^*$ satisfies (BHE), one might hope that it would also satisfy such a discrete backwards-heat equation, yielding an upper bound on the regret of the strategy given by its discrete derivative. In the work of Harvey et al. (2020b) in the anytime setting, it was the case that the terms in the second sum were non-negative, which in a sense means that the discretized algorithm suffers negative discretization error. In the fixed-time setting we are not as lucky.
5.2 Discretizing the Algorithm
Based on the discussion at the beginning of this section, a natural way to discretize the algorithm from Section 4 is to define the function $q \colon \{1, \dots, T\} \times \mathbb{Z}_{\ge 0} \to [0,1]$ by
\[
q_t(g) \;:=\;
\begin{cases}
\dfrac{R^*(t, g+1) - R^*(t, g-1)}{2}, & t < T,\\[6pt]
\tfrac12\,[g = 0], & t = T,
\end{cases}
\]
where we need to treat the case $t = T$ differently since $R^*$ is not defined at time $T$. It is not clear from its definition, but we indeed have $q_t(g) \in [0,1]$ for all $t$ and $g$. We defer the (relatively technical) proof of the next result to Appendix D.
Lemma 5.2.
We have $q_t(g) \in [0, 1]$ for all $(t, g) \in \{1, \dots, T\} \times \mathbb{Z}_{\ge 0}$.
Our goal now is to combine Proposition 2.1 and the discrete Itô formula to bound the regret of the player strategy induced by $q$. Since $R^*$ satisfies the (BHE), one might hope that it is close to satisfying the discrete version of this equation. To formalize this idea, for all $t \in \{1, \dots, T-1\}$ and $g \in \mathbb{Z}_{\ge 0}$ define
\[
\epsilon_t(g) \;:=\; \big| \mathsf{D}_t R^*(t, g) - \partial_t R^*(t, g) \big|
\qquad\text{and}\qquad
\delta_t(g) \;:=\; \tfrac12\,\big| \mathsf{D}_{gg} R^*(t, g) - \partial_{gg} R^*(t, g) \big|.
\]
The above terms measure how well the first derivative with respect to the first variable and the second derivative with respect to the second variable are approximated by their discrete analogues. That is, these are basically the discretization errors on the derivatives of $R^*$. Then, combining the fact that $R^*$ satisfies (BHE) together with Proposition 2.1 yields the following theorem.
Theorem 5.3.
Consider a game of $T$ rounds with a restricted binary adversary with gap sequence given by $g_0, g_1, \dots, g_T$ such that $g_0 = 0$ and $|g_t - g_{t-1}| = 1$ for all $t \in \{1, \dots, T\}$. Then,
\[
\mathrm{Regret}_T \;\le\; -R^*(0, 0) \;+\; \sum_{t=1}^{T-1} \big( \epsilon_t(g_{t-1}) + \delta_t(g_{t-1}) \big) \;+\; \frac12. \tag{7}
\]
5.3 Bounding the Discretization Error
In light of Theorem 5.3, it suffices to bound the accumulated discretization error of the derivatives to obtain good bounds on the regret of the player induced by $q$. The next two lemmas show that both $\epsilon_t(g)$ and $\delta_t(g)$ are in $O\big((T - t)^{-3/2}\big)$. Since
\[
\sum_{t=1}^{T-1} \frac{1}{(T - t)^{3/2}} \;=\; \sum_{k=1}^{T-1} \frac{1}{k^{3/2}} \;\in\; O(1), \tag{9}
\]
this will show that $q$ suffers at most $\sqrt{T/(2\pi)} + O(1)$ regret.666This together with Prop. 2.1 also shows that the difference between the regret of $q$ and that of Cover's optimal strategy $q^*$ is in $O(1)$. Since the proofs of these bounds are relatively technical but otherwise not considerably insightful, we defer them to Appendix D.
Lemma 5.5.
For any $t \in \{1, \dots, T-1\}$ and $g \in \mathbb{Z}_{\ge 0}$ we have
\[
\epsilon_t(g) + \delta_t(g) \;\in\; O\big((T - t)^{-3/2}\big).
\]
Combining the above lemmas together with Theorem 5.3 yields the following regret bound.
Theorem 5.6.
Define the player strategy that, on each round $t$ with gap $g$ at the end of round $t-1$, places probability mass $q_t(g)$ on the lagging expert, and consider a game of $T$ rounds against a restricted binary adversary. Then,
\[
\mathrm{Regret}_T \;\le\; \sqrt{\frac{T}{2\pi}} + O(1).
\]
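As a sanity check of the final bound, one can simulate the player `q_disc` from the sketch in Section 5 against a restricted binary adversary and compare the realized regret with $\sqrt{T/(2\pi)}$. A uniformly random adversary is far from worst case, so the realized regret is typically well below the bound; this is an illustration only.

```python
import random
from math import pi, sqrt

def simulate(T: int, seed: int = 0):
    """Play q_disc against a random restricted binary adversary (costs in
    {(0,1), (1,0)}) and report (realized regret, sqrt(T / (2 pi)))."""
    rng = random.Random(seed)
    L = [0.0, 0.0]                        # cumulative losses of the two experts
    player_loss = 0.0
    for t in range(1, T + 1):
        gap = abs(L[0] - L[1])
        lag = 0 if L[0] >= L[1] else 1    # lagging expert; ties broken arbitrarily
        x = q_disc(t, round(gap), T)      # mass on the lagging expert
        cost = [0.0, 0.0]
        cost[rng.randrange(2)] = 1.0      # adversary: exactly one expert pays 1
        player_loss += x * cost[lag] + (1 - x) * cost[1 - lag]
        L[0] += cost[0]
        L[1] += cost[1]
    return player_loss - min(L), sqrt(T / (2 * pi))

print(simulate(10_000))
```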
6 On Optimal Regret for More than Two Experts
In this paper we presented an efficient and optimal algorithm for two experts in the fixed-time setting. A natural question is whether similar techniques can be used to find the minimax regret when we have more than two experts. Encouragingly, techniques from stochastic calculus were also used to find the optimal regret for 4 experts (Bayraktar et al., 2020). Yet, it is not clear how to use similar techniques for an arbitrary number of experts. The approach used in this paper and by Harvey et al. (2020b) heavily relies on the gap parameterization of the problem. Although there is an analogous parameterization of the experts' problem into gaps that yields a claim similar to Proposition 2.1, it is not clear what an analogous continuous-time problem should be to guide us in the algorithm design process, since the gap processes are not independent — even with independent costs on the experts. Moreover, many of the approaches in related work (Abbasi-Yadkori et al., 2017; Harvey et al., 2020b; Bayraktar et al., 2020) focus on specific adversaries such as the comb adversary. However, the latter does not seem to be a worst-case adversary for cases such as five experts (Chase, 2019). We are not aware of adversaries that could yield worst-case regret for an arbitrary fixed number of experts, although asymptotically in $n$ and $T$ it is well-known that assigning costs at random is minimax optimal (Cesa-Bianchi and Lugosi, 2006).
We would like to thank the anonymous ICML 2022 reviewers for their insightful comments. In particular, reviewer 1 suggested the use of Berry–Esseen-like results to derive regret bounds, noted the regret difference between our algorithm and Cover's, and found a calculation mistake.
N. Harvey was supported by an NSERC Discovery Grant.
References
- Abbasi-Yadkori et al. (2017) Yasin Abbasi-Yadkori, Peter L. Bartlett, and Victor Gabillon. Near minimax optimal players for the finite-time 3-expert prediction problem. In Annual Conference on Neural Information Processing Systems (NIPS), pages 3033–3042, 2017.
- Arora et al. (2012) Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8:121–164, 2012.
- Bayraktar et al. (2020) Erhan Bayraktar, Ibrahim Ekren, and Xin Zhang. Finite-time 4-expert prediction problem. Communications in Partial Differential Equations, 45(7):714–757, 2020.
- Cesa-Bianchi and Lugosi (2006) Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006. ISBN 978-0-521-84108-5.
- Cesa-Bianchi et al. (1997) Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.
- Chase (2019) Zachary Chase. Experimental evidence for asymptotic non-optimality of comb adversary strategy. December 2019. URL http://arxiv.org/abs/1912.01548.
- Cover (1967) Thomas M. Cover. Behavior of sequential predictors of binary sequences. In Trans. Fourth Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes (Prague, 1965), pages 263–272. Academia, Prague, 1967.
- Durrett (2019) Rick Durrett. Probability—theory and examples, volume 49 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2019. Fifth edition.
- Freund (2009) Yoav Freund. A method for hedging in continuous time. October 2009. URL http://arxiv.org/abs/0904.3356.
- Fujita (2008) Takahiko Fujita. A random walk analogue of Lévy’s theorem. Studia Sci. Math. Hungar., 45(2):223–233, 2008.
- Gravin et al. (2016) Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Towards optimal algorithms for prediction with expert advice. In Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 528–547. ACM, New York, 2016.
- Gravin et al. (2017) Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Tight lower bounds for multiplicative weights algorithmic families. In 44th International Colloquium on Automata, Languages, and Programming (ICALP), volume 80, Art. No. 48, 14 pp., 2017.
- Harvey et al. (2020a) Nicholas J. A. Harvey, Christopher Liaw, Edwin Perkins, and Sikander Randhawa. Optimal anytime regret with two experts. February 2020a. URL https://arxiv.org/abs/2002.08994v2.
- Harvey et al. (2020b) Nicholas J. A. Harvey, Christopher Liaw, Edwin A. Perkins, and Sikander Randhawa. Optimal anytime regret for two experts. In 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 1404–1415. IEEE, 2020b.
- Karlin and Peres (2017) Anna R. Karlin and Yuval Peres. Game theory, alive. American Mathematical Society, Providence, RI, 2017.
- Kobzar et al. (2020) Vladimir A. Kobzar, Robert V. Kohn, and Zhilei Wang. New potential-based bounds for prediction with expert advice. In Conference on Learning Theory, (COLT), volume 125 of Proceedings of Machine Learning Research, pages 2370–2405. PMLR, 2020.
- Kudzhma (1982) R. Kudzhma. Itô’s formula for a random walk. Litovsk. Mat. Sb., 22(3):122–127, 1982.
- Mörters and Peres (2010) Peter Mörters and Yuval Peres. Brownian motion, volume 30 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2010.
- Olver et al. (2010) Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert, and Charles W. Clark, editors. NIST handbook of mathematical functions. U.S. Department of Commerce, National Institute of Standards and Technology, Washington, DC; Cambridge University Press, Cambridge, 2010.
- Revuz and Yor (1999) Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion, volume 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third edition, 1999.
- Robbins (1955) Herbert Robbins. A remark on Stirling’s formula. Amer. Math. Monthly, 62:26–29, 1955.
- Zhang et al. (2022) Zhiyu Zhang, Ashok Cutkosky, and Ioannis Paschalidis. PDE-based optimal strategy for unconstrained online learning. January 2022. URL https://arxiv.org/abs/2201.07877.
Appendix A Cover’s Algorithm for Two Experts
In this section, we shall review the optimal algorithm for the 2-experts problem originally proposed by Cover (1967) and the matching lower-bound.
A.1 A Dynamic Programming View
In the fixed-time setting we know the total number of rounds $T$ before the start of the game. Thus, we may compute ahead of time all the possible states the game can be in on each round and decide the probabilities on the experts the player should choose in each case to minimize the worst-possible regret. More specifically, we start with a function $q \colon \{1, \dots, T\} \times \{0, \dots, T\} \to [0,1]$ that represents our player strategy: for any round $t$, if at the end of round $t-1$ the experts' gap is $g$, on round $t$ the player places probability mass $q_t(g)$ on the lagging777When the gap is 0, which means that both experts have the same cumulative loss, we break ties arbitrarily. For the optimal algorithm we shall ultimately derive this will not matter since we will have $q_t(0) = \frac12$ for all $t$. expert and probability $1 - q_t(g)$ on the leading expert, and we denote by $p^q$ such a player strategy defined by $q$. Now, for all $t \in \{0, \dots, T\}$ and $g \in \{0, \dots, T\}$, denote by $V_t(g)$ the maximum regret-to-be-suffered by the player strategy defined by $q$ on rounds $t+1, \dots, T$ given that at the end of round $t$ the gap between experts is $g$. Slightly more formally, we have
\[
V_t(g) \;:=\; \sup\{\text{regret suffered by } p^q \text{ on rounds } t+1, \dots, T \text{ against restricted binary adversaries with gap } g \text{ at round } t\}, \tag{10}
\]
where the supremum ranges over all restricted binary continuations of the game. Above we take the supremum instead of the maximum only to account for cases where the set we are considering is empty (and, thus, the supremum evaluates to $-\infty$), such as when $g$ and $t$ have distinct parities or when $g > t$. Note that by definition of $V$ we have that the worst-case regret of $p^q$ against restricted binary adversaries equals $V_0(0)$.
Thus, if we compute $V_0(0)$, then we have a bound on the worst-case regret of $p^q$ against restricted binary adversaries. The following theorem shows how we can compute this value in a dynamic programming style.
Theorem A.1.
For any $q \colon \{1, \dots, T\} \times \{0, \dots, T\} \to [0,1]$, and for all $t \in \{0, \dots, T\}$ and $g \in \{0, \dots, T\}$ such that $V_t(g) > -\infty$, we have
\begin{align}
V_T(g) &= 0, \tag{11}\\
V_t(g) &= \max\big\{\, V_{t+1}(g+1) + q_{t+1}(g),\;\; V_{t+1}(g-1) - q_{t+1}(g) \,\big\} \qquad \text{for } t < T,\ g \ge 1, \tag{12}\\
V_t(0) &= V_{t+1}(1) + \max\big\{\, q_{t+1}(0),\; 1 - q_{t+1}(0) \,\big\} \qquad \text{for } t < T. \tag{13}
\end{align}
Proof A.2.
First, note that (11) clearly holds by the definition of $V$. To show that equations (12) and (13) hold, let $t < T$ and $g \ge 0$ be such that $V_t(g) > -\infty$. Let $\ell_{t+1}, \dots, \ell_T$ be a sequence of cost vectors attaining the supremum in the definition of $V_t(g)$. First, suppose $g \ge 1$ and consider round $t+1$. Then there are two cases for the gap $g_{t+1}$: either the gap goes up and $g_{t+1} = g + 1$, or it goes down and $g_{t+1} = g - 1$. This together with Proposition 2.1 and the fact that the player places mass $q_{t+1}(g)$ on the lagging expert implies
\[
V_t(g) \;=\; \max\big\{\, V_{t+1}(g+1) + q_{t+1}(g),\;\; V_{t+1}(g-1) - q_{t+1}(g) \,\big\}.
\]
By taking the maximum over all possible cost vectors with gap $g$ at round $t$ we obtain (12). Now suppose $g = 0$. In this case, suppose without loss of generality that expert 1 is the expert to whom $q$ assigns mass $q_{t+1}(0)$ (recall that the strategy breaks ties arbitrarily when the gap is 0). Proposition 2.1 together with the fact that the adversary may choose which of the two experts suffers a cost implies
\[
V_t(0) \;=\; V_{t+1}(1) + \max\big\{\, q_{t+1}(0),\; 1 - q_{t+1}(0) \,\big\}.
\]
Since the gap on round $t+1$ is certainly $1$ in this case, taking the maximum over all the adversaries with gap $0$ on round $t$ yields (13).
For the sake of convenience, we redefine $V_t(g)$ for all $(t, g)$ such that $V_t(g) = -\infty$ to, instead, be the value given by the equations from the above theorem.888There will be places where this definition requires access to undefined or “out-of-bounds” entries (such as entries with gap $T + 1$). In such cases, we set such undefined/out-of-bounds values to $0$. This does not affect any of our results and makes it less cumbersome to design and analyze the algorithm.
A.2 Picking Optimal Probabilities
We are interested in a function $q^*$, if any, that minimizes $V_0(0)$. To see that there is indeed such a function, note that we can formulate the problem of minimizing $V_0(0)$ as a linear program using Theorem A.1 to design the constraints. Such a linear program is certainly bounded (the regret is always between $0$ and $T$) and feasible. Thus, let $q^*$ be a function that attains the minimum and define $V^* := V^{q^*}$. The next theorem shows that $V^*$ can be computed recursively and shows how to obtain $q^*$ from $V^*$.
Theorem A.3.
For each $t \in \{0, \dots, T-1\}$ and $g \in \{0, \dots, T\}$,
\[
V^*_t(g) \;=\;
\begin{cases}
V^*_{t+1}(1) + \tfrac12, & g = 0,\\[4pt]
\dfrac{V^*_{t+1}(g+1) + V^*_{t+1}(g-1)}{2}, & g \ge 1.
\end{cases}
\]
Furthermore, if we define $q^*$ by
\[
q^*_{t+1}(g) \;:=\;
\begin{cases}
\tfrac12, & g = 0,\\[4pt]
\dfrac{V^*_{t+1}(g-1) - V^*_{t+1}(g+1)}{2}, & g \ge 1,
\end{cases}
\]
then $V^{q^*} = V^*$.
Proof A.4.
Let us show that $q^*$ as defined in the statement of the theorem attains $\min_q V^q_0(0)$, where the minimum ranges over all functions $q$ from $\{1, \dots, T\} \times \{0, \dots, T\}$ to $[0,1]$. Note that smaller values of any entry of $V^q$ can only make $V^q_0(0)$ smaller. Thus, showing that $q^*$ minimizes $V^q_t(g)$ for all $g$ (given that $V^q_s$ is fixed for $s > t$ and all gaps) suffices to show that $q^*$ minimizes $V^q_0(0)$999One may fear that choosing $q$ to minimize $V^q_t(g)$ might increase other entries, making this argument invalid. However, note that $V^q_t(g)$ depends only on $q_{t+1}$ and entries of the form $V^q_{t+1}(g')$. Thus, it is not hard to see that by proceeding from higher to smaller values of $t$, we can in fact pick $q$ to minimize all entries simultaneously.. Moreover, by Theorem A.1 and the definition of $q^*$ we have that $V^{q^*}$ obeys the formulas in the statement of this theorem. Thus, we only need to show that this choice of $q^*$ indeed minimizes the entries of $V^q$.
Let us first show that
\[
0 \;\le\; V^*_t(g-1) - V^*_t(g+1) \;\le\; 1 \qquad \text{for all } t \in \{0, \dots, T\} \text{ and } g \ge 1. \tag{14}
\]
For $t = T$, since $V^*_T \equiv 0$, the above claim clearly holds. Let $t < T$ and assume the claim holds at $t + 1$. If $g \ge 2$ we have
\[
V^*_t(g-1) - V^*_t(g+1) \;=\; \frac{V^*_{t+1}(g-2) - V^*_{t+1}(g)}{2} + \frac{V^*_{t+1}(g) - V^*_{t+1}(g+2)}{2}.
\]
Each of the two terms above, by the induction hypothesis, is in $[0, \frac12]$. Similarly, if $g = 1$ we have
\[
V^*_t(0) - V^*_t(2) \;=\; \frac12 + \frac{V^*_{t+1}(1) - V^*_{t+1}(3)}{2},
\]
and the last term is in $[0, \frac12]$ by the induction hypothesis. This completes the proof of (14).
Let $t < T$ and $g \ge 0$. Let us now show that $q^*$ minimizes $V^q_t(g)$ given that the entries of the form $V^*_{t+1}(g')$ for $g' \in \{0, \dots, T\}$ are fixed. For $g = 0$, Theorem A.1 shows that $V^q_t(0) = V^*_{t+1}(1) + \max\{q_{t+1}(0), 1 - q_{t+1}(0)\}$, which is minimized when $q_{t+1}(0) = \frac12$. For $g \ge 1$, Theorem A.1 tells us that
\[
V^q_t(g) \;=\; \max\big\{\, V^*_{t+1}(g+1) + q_{t+1}(g),\;\; V^*_{t+1}(g-1) - q_{t+1}(g) \,\big\}.
\]
Since the first term is increasing and the second is decreasing in $q_{t+1}(g)$, and $q^*_{t+1}(g) \ge 0$ by (14), $q^*_{t+1}(g)$ certainly minimizes $V^q_t(g)$ since it makes both terms in the maximum equal. Finally, (14) guarantees that $q^*_{t+1}(g) \in [0,1]$101010In fact, it guarantees that $q^*_{t+1}(g) \le \frac12$. Intuitively this makes sense since we want to give more probability to the current best/leading expert than to the lagging expert..
A.3 Discrete Backwards Heat Equation
Interestingly, the potential function $V^*$ for Cover's optimal algorithm satisfies the discrete backwards heat equation when the gap is not $0$. For simplicity, let us focus on the case of a round $t < T$ and gap $g \ge 1$. Then, taking $\mathsf{D}_t$ and $\mathsf{D}_{gg}$ as defined in Section 5.1, we have
\[
\mathsf{D}_t V^*(t+1, g) + \tfrac12\,\mathsf{D}_{gg} V^*(t+1, g)
\;=\; \frac{V^*_{t+1}(g+1) + V^*_{t+1}(g-1)}{2} - V^*_t(g)
\;=\; 0
\]
(by Theorem A.3). The same holds for the case where the gap is zero, but we need to extend $V^*$ to the gap $-1$. Namely, set $V^*_t(-1) := V^*_t(1) + 1$ for all $t$. This guarantees that $V^*_t(0) = \frac{V^*_{t+1}(1) + V^*_{t+1}(-1)}{2}$, so the cases with zero gap agree with the formulas for non-zero gaps in Theorem A.3. Interestingly, one may verify that $R^*$ admits the analogous extension by setting $R^*(t, -g) := R^*(t, g) - g$ for $g \ge 0$, which is consistent with the closed form (6).
A.4 Connecting the Regret with Random Walks
As argued before, to give an upper bound on the regret of $p^{q^*}$, where $q^*$ is as in Theorem A.3, we need only bound the value of $V^*_0(0)$. Interestingly, the entries of $V^*$ have a strong connection to random walks, and this helps us give an upper bound on $V^*_0(0)$. In the next theorem and for the remainder of the text, a random walk (of length $n$ starting at $x$) is a sequence of random variables $X_0, X_1, \dots, X_n$ where $X_0 = x$, $X_k = X_{k-1} + D_k$ for each $k \in \{1, \dots, n\}$, and $D_1, \dots, D_n$ are i.i.d. random variables taking values in $\{-1, +1\}$. If we do not specify a starting point of a random walk, take it to be $0$. We say that the random walk is symmetric if $\Pr[D_k = 1] = \Pr[D_k = -1] = \frac12$. Moreover, a reflected random walk (of length $n$) is the sequence of random variables $|X_0|, |X_1|, \dots, |X_n|$ where $X$ is a random walk. Finally, we say that a random walk passes through $0$ if the event $\{X_k = 0\}$ happens for some $k$.
The following lemma gives numeric bounds on the expected number of passages through of a symmetric random walk. Its proof boils down to careful applications of Stirling’s formula and can be found in Appendix B.
Lemma A.5.
Let the random variable $N$ be the number of passages through $0$ of a reflected symmetric random walk of length $T$. Then,
\[
\sqrt{\frac{T}{2\pi}} - O(1) \;\le\; \frac12\,\mathbb{E}[N] \;\le\; \sqrt{\frac{T}{2\pi}} + O(1).
\]
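The constant in Lemma A.5 is easy to corroborate numerically. The snippet below (our sketch; the exact counting convention for a “passage” is our guess and only shifts the estimate by $O(1)$) estimates $\frac12\,\mathbb{E}[N]$ by simulation.

```python
import random
from math import pi, sqrt

def half_expected_passages(T: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of (1/2) E[N] for N = number of times a reflected
    symmetric random walk of length T sits at 0 (counted over steps 1..T)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = 0
        for _ in range(T):
            x += rng.choice((-1, 1))
            total += (x == 0)
    return total / (2 * trials)

print(half_expected_passages(2500), sqrt(2500 / (2 * pi)))  # both are roughly 20
```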
We are now in position to prove an upper-bound on the performance of .
Theorem A.6.
For every $t \in \{0, \dots, T\}$ and $g \in \{0, \dots, T\}$, let the random variable $N_t(g)$ be the number of passages through 0 of a reflected symmetric random walk of length $T - t$ starting at position $g$. Then $V^*_t(g) = \frac12\,\mathbb{E}[N_t(g)]$ for every $t$ and $g$. In particular,
\[
V^*_0(0) \;=\; \tfrac12\,\mathbb{E}[N_0(0)] \;\le\; \sqrt{\frac{T}{2\pi}} + O(1).
\]
Proof A.7.
Let us show that $V^*_t(g) = \frac12\,\mathbb{E}[N_t(g)]$ for all $t$ and $g$ by backwards induction on $t$. For $t = T$ we have $V^*_T(g) = 0 = \frac12\,\mathbb{E}[N_T(g)]$. Assume the claim holds for $t + 1$ and let $g \in \{0, \dots, T\}$. Suppose first $g \ge 1$. Then the walk moves to $g - 1$ or to $g + 1$ with probability $\frac12$ each, so $\mathbb{E}[N_t(g)] = \frac12\big(\mathbb{E}[N_{t+1}(g-1)] + \mathbb{E}[N_{t+1}(g+1)]\big)$. By Theorem A.3 and by the induction hypothesis, we have
\[
V^*_t(g) \;=\; \frac{V^*_{t+1}(g-1) + V^*_{t+1}(g+1)}{2} \;=\; \frac12\,\mathbb{E}[N_t(g)].
\]
Similarly, for the case $g = 0$ the walk passes through $0$ and then behaves as a walk started at gap $1$, so that $\mathbb{E}[N_t(0)] = 1 + \mathbb{E}[N_{t+1}(1)]$ and
\[
V^*_t(0) \;=\; V^*_{t+1}(1) + \frac12 \;=\; \frac12\,\mathbb{E}[N_t(0)].
\]
In particular, we have $V^*_0(0) = \frac12\,\mathbb{E}[N_0(0)]$ and Lemma A.5 gives us the desired numerical bound.
A.5 Lower Bound on the Optimal Regret
In the previous section we showed that Cover's algorithm suffers regret at most $\sqrt{T/(2\pi)} + O(1)$. In fact, by the definition of $V$ (see (10)) we have that $V^*_0(0)$ is the minimum worst-case regret among algorithms of the form $p^q$, where $q$ is some function as in Appendix A.1. However, this does not tell us whether more general player strategies can do better or not. The next theorem shows that any player strategy suffers, in the worst case, at least $\sqrt{T/(2\pi)} - O(1)$ regret. The proof boils down to lower-bounding the expected regret against a random adversary that plays uniformly from $\{(0,1), (1,0)\}$.
Theorem A.8.
Let $p$ be a player strategy for a 2-experts game with $T$ rounds. Then, there is a restricted binary adversary $\ell$ such that
\[
\mathrm{Regret}_T(p, \ell) \;\ge\; \sqrt{\frac{T}{2\pi}} - O(1).
\]
Proof A.9.
Let $\ell_1, \dots, \ell_T$ be i.i.d. random variables such that each $\ell_t$ is equal to a vector in $\{(0,1), (1,0)\}$ chosen uniformly at random, and let $\ell$ be the (randomized) oblivious adversary that plays $\ell_t$ at round $t$. We shall show that
\[
\mathbb{E}[\mathrm{Regret}_T(p, \ell)] \;\ge\; \sqrt{\frac{T}{2\pi}} - O(1), \tag{15}
\]
which implies the existence of a deterministic adversary as described in the statement. For each $t$, let the random variable $g_t$ be the gap between experts due to the costs of $\ell$ up to round $t$, define $g_0 := 0$, and set $x_t$ to be the probability mass placed by $p$ on a lagging expert on round $t$. It is worth noting already that $(g_t)_{t=0}^{T}$ is a reflected random walk of length $T$. By Proposition 2.1 we have
\[
\mathbb{E}[\mathrm{Regret}_T(p, \ell)] \;=\; \mathbb{E}\Big[\sum_{t=1}^{T} x_t\,(g_t - g_{t-1})\,[g_{t-1} > 0]\Big] \;+\; \mathbb{E}\Big[\sum_{t=1}^{T} r_t\,[g_{t-1} = 0]\Big],
\]
where $r_t$ denotes the regret suffered on round $t$ and we recall that for any predicate $P$ we have $[P]$ equal to $1$ if $P$ is true, and equal to $0$ otherwise. First, let us show that
\[
\mathbb{E}\Big[\sum_{t=1}^{T} x_t\,(g_t - g_{t-1})\,[g_{t-1} > 0]\Big] \;=\; 0. \tag{16}
\]
For each $t$, define $\mathbb{E}_t[\,\cdot\,] := \mathbb{E}[\,\cdot \mid \ell_1, \dots, \ell_t\,]$, that is, $\mathbb{E}_t$ is the conditional expectation given the choices of the random adversary on rounds $1, \dots, t$. Let $t \in \{1, \dots, T\}$. On the event $\{g_{t-1} > 0\}$, one can see that $g_t - g_{t-1}$ is independent of $\ell_1, \dots, \ell_{t-1}$ and is uniformly distributed on $\{-1, +1\}$. This together with the fact that $x_t$ is a function of $\ell_1, \dots, \ell_{t-1}$ implies
\[
\mathbb{E}_{t-1}\big[x_t\,(g_t - g_{t-1})\,[g_{t-1} > 0]\big] \;=\; x_t\,[g_{t-1} > 0]\;\mathbb{E}_{t-1}[g_t - g_{t-1}] \;=\; 0.
\]
This ends the proof of (16). Let us now show that
\[
\mathbb{E}\Big[\sum_{t=1}^{T} r_t\,[g_{t-1} = 0]\Big] \;=\; \frac12\,\mathbb{E}\big[\,|\{t \in \{1, \dots, T\} : g_{t-1} = 0\}|\,\big]. \tag{17}
\]
For each $t$, since the distribution played by $p$ on round $t$ is a function of $\ell_1, \dots, \ell_{t-1}$ and $\ell_t$ is independent of $\ell_1, \dots, \ell_{t-1}$, on the event $\{g_{t-1} = 0\}$ the expected mass placed on the expert that suffers a cost on round $t$ is $\frac12$. Thus,
\[
\mathbb{E}_{t-1}\big[r_t\,[g_{t-1} = 0]\big] \;=\; \tfrac12\,[g_{t-1} = 0].
\]
This completes the proof of (17), and the desired numerical lower bound is given by Lemma A.5.
Appendix B On the Passages Through Zero of a Symmetric Random Walk
In this section we shall prove Lemma A.5, which bounds the expected number of passages through $0$ of a symmetric random walk. First, we need a simple corollary of Stirling's formula (which we state here for convenience) to bound central binomial coefficients.
Theorem B.1 (Stirling’s Formula, Robbins, 1955).
For any $n \ge 1$ we have
\[
\sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n e^{1/(12n+1)} \;\le\; n! \;\le\; \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n e^{1/(12n)}.
\]
Corollary B.2.
For any $k \ge 1$ we have
\[
\frac{2^{2k}}{\sqrt{\pi k}}\, e^{-1/(4k)} \;\le\; \binom{2k}{k} \;\le\; \frac{2^{2k}}{\sqrt{\pi k}}.
\]
Proof B.3.
Let $k \ge 1$. For the upper bound, by Theorem B.1 we have
\[
\binom{2k}{k} \;=\; \frac{(2k)!}{(k!)^2} \;\le\; \frac{\sqrt{4\pi k}\,(2k/e)^{2k}\, e^{1/(24k)}}{2\pi k\,(k/e)^{2k}\, e^{2/(12k+1)}} \;=\; \frac{2^{2k}}{\sqrt{\pi k}}\, e^{1/(24k) - 2/(12k+1)} \;\le\; \frac{2^{2k}}{\sqrt{\pi k}}.
\]
Similarly, for the lower bound we have
\[
\binom{2k}{k} \;\ge\; \frac{\sqrt{4\pi k}\,(2k/e)^{2k}\, e^{1/(24k+1)}}{2\pi k\,(k/e)^{2k}\, e^{2/(12k)}} \;=\; \frac{2^{2k}}{\sqrt{\pi k}}\, e^{1/(24k+1) - 1/(6k)} \;\ge\; \frac{2^{2k}}{\sqrt{\pi k}}\, e^{-1/(4k)},
\]
since $1/(24k+1) - 1/(6k) \ge -1/(4k)$ for $k \ge 1$.
We are now ready to prove Lemma A.5, which we restate for convenience.
See A.5
Proof B.4.
Let $(X_k)_{k=0}^{T}$ be a symmetric random walk starting at $0$, so that $(|X_k|)_k$ is a reflected symmetric random walk and $N$ counts its passages through $0$. Note that
\[
\Pr[X_{2k} = 0] \;=\; \binom{2k}{k}\, 2^{-2k} \qquad\text{and}\qquad \Pr[X_{2k+1} = 0] \;=\; 0.
\]
Therefore,
\[
\mathbb{E}[N] \;=\; \Theta(1) + \sum_{k=1}^{\lfloor T/2 \rfloor} \binom{2k}{k}\, 2^{-2k},
\]
where the $\Theta(1)$ accounts for the boundary counting convention. Using Corollary B.2, a consequence of Stirling's approximation to the factorial function, we can show upper and lower bounds on the above quantity. Namely, for the upper bound we have
\[
\sum_{k=1}^{\lfloor T/2 \rfloor} \binom{2k}{k}\, 2^{-2k} \;\le\; \sum_{k=1}^{\lfloor T/2 \rfloor} \frac{1}{\sqrt{\pi k}} \;\le\; \int_0^{T/2} \frac{\mathrm{d}s}{\sqrt{\pi s}} \;=\; 2\sqrt{\frac{T}{2\pi}}.
\]
We proceed similarly for the lower bound. By setting $m := \lfloor T/2 \rfloor$ we get
\[
\sum_{k=1}^{m} \binom{2k}{k}\, 2^{-2k} \;\ge\; \sum_{k=1}^{m} \frac{e^{-1/(4k)}}{\sqrt{\pi k}} \;\ge\; \sum_{k=1}^{m} \frac{1}{\sqrt{\pi k}} - \sum_{k=1}^{m} \frac{1 - e^{-1/(4k)}}{\sqrt{\pi k}} \;\ge\; 2\sqrt{\frac{m}{\pi}} - O(1),
\]
where we used that $1 - e^{-u} \le u$, so that the second sum is in $O(1)$. To conclude the proof, some simple calculations yield
\[
\sqrt{\frac{T}{2\pi}} - O(1) \;\le\; \frac12\,\mathbb{E}[N] \;\le\; \sqrt{\frac{T}{2\pi}} + O(1).
\]
Appendix C Missing Proofs for Section 4
See 4.3
Proof C.1.
Fix $t \in [0, T)$ and $g \in \mathbb{R}$. Then,
\[
\partial_t p^*(t, g) \;=\; -\frac{g}{\sqrt{\pi}\,(2(T-t))^{3/2}}\; e^{-g^2/(2(T-t))}.
\]
Similarly,
\[
\partial_g p^*(t, g) \;=\; -\frac{1}{\sqrt{2\pi(T-t)}}\; e^{-g^2/(2(T-t))},
\]
and
\[
\partial_{gg} p^*(t, g) \;=\; \frac{g}{\sqrt{2\pi}\,(T-t)^{3/2}}\; e^{-g^2/(2(T-t))} \;=\; -2\,\partial_t p^*(t, g),
\]
as desired.
Let us now prove (6).
Lemma C.2.
For all $t \in [0, T)$ and $g \in \mathbb{R}$, we have
\[
R^*(t, g) \;=\; \frac{g}{2}\operatorname{erfc}\!\left( \frac{g}{\sqrt{2(T-t)}} \right) \;-\; \sqrt{\frac{T-t}{2\pi}}\; e^{-g^2/(2(T-t))}.
\]
Proof C.3.
Fix $t \in [0, T)$. Using that $\int \operatorname{erfc}(u)\,\mathrm{d}u = u\operatorname{erfc}(u) - e^{-u^2}/\sqrt{\pi} + C$ (Olver et al., 2010, Section 7.7(i)), we have
\[
\int_0^g p^*(t, y)\,\mathrm{d}y \;=\; \frac{g}{2}\operatorname{erfc}\!\left( \frac{g}{\sqrt{2(T-t)}} \right) - \sqrt{\frac{T-t}{2\pi}}\, e^{-g^2/(2(T-t))} + \sqrt{\frac{T-t}{2\pi}}
\]
for all $g \in \mathbb{R}$. Similarly, using that $\partial_g p^*(s, 0) = -1/\sqrt{2\pi(T-s)}$ we have
\[
\frac12 \int_t^T \partial_g p^*(s, 0)\,\mathrm{d}s \;=\; -\frac12 \int_t^T \frac{\mathrm{d}s}{\sqrt{2\pi(T-s)}} \;=\; -\sqrt{\frac{T-t}{2\pi}}.
\]
Plugging both equations into the definition of $R^*$ concludes the proof.
Appendix D Missing Proofs for Section 5
Let us begin by proving a crucial property of the function $q$ defined in Section 5.
See 5.2
Proof D.1.
The claim follows directly from the definition of $q_T$ for $t = T$. Let $t < T$. From the definition of $q_t$, we have
\[
q_t(g) \;=\; \frac{R^*(t, g+1) - R^*(t, g-1)}{2}.
\]
Moreover, note that for any $a \le b$ we have
\[
R^*(t, b) - R^*(t, a) \;=\; \int_a^b p^*(t, y)\,\mathrm{d}y, \tag{18}
\]
since $\partial_g R^*(t, \cdot) = p^*(t, \cdot)$. Therefore, for any $t < T$ and $g \ge 0$,
\[
q_t(g) \;=\; \frac12 \int_{g-1}^{g+1} p^*(t, y)\,\mathrm{d}y \;\in\; [0, 1],
\]
since $p^*$ takes values in $[0, 1]$.
This section contains the proofs of the bounds on $\epsilon_t$ and $\delta_t$. We start by bounding $\epsilon_t$.
See 5.5
Proof D.2.
Fix $t \in \{1, \dots, T-1\}$ and $g \in \mathbb{Z}_{\ge 0}$. Note that $R^*(\cdot, g)$ is continuously differentiable on $[0, T)$ and twice continuously differentiable on $(0, T)$. Thus, by Taylor's Theorem, there is $\xi \in (t-1, t)$ such that
\[
R^*(t-1, g) \;=\; R^*(t, g) - \partial_t R^*(t, g) + \tfrac12\,\partial_{tt} R^*(\xi, g),
\]
where $\partial_{tt} R^*$ denotes the second derivative of $R^*$ with respect to its first argument. Therefore,
\[
\epsilon_t(g) \;=\; \big| \mathsf{D}_t R^*(t, g) - \partial_t R^*(t, g) \big| \;\le\; \tfrac12 \sup_{\xi \in (t-1, t)} \big| \partial_{tt} R^*(\xi, g) \big|.
\]
Thus, to bound $\epsilon_t(g)$ we need only bound $\partial_{tt} R^*$. Computing the derivatives yields
\[
\partial_t R^*(t, g) \;=\; \frac{e^{-g^2/(2(T-t))}}{2\sqrt{2\pi(T-t)}}
\]
and
\[
\big| \partial_{tt} R^*(\xi, g) \big| \;=\; \frac{e^{-g^2/(2(T-\xi))}}{4\sqrt{2\pi}\,(T-\xi)^{3/2}}\, \Big| 1 - \frac{g^2}{T-\xi} \Big| \;\in\; O\big((T - t)^{-3/2}\big),
\]
since $u\, e^{-u}$ is bounded for $u \ge 0$ and $T - \xi \ge T - t$.
To bound $\delta_t$, we will need to be slightly more careful. First, we will need the following simple lemma about the Lipschitz continuity of $\partial_{gg} p^*$.
Lemma D.3.
Let $t \in [0, T)$ and define $h(g) := \partial_{gg} p^*(t, g)$ for every $g \in \mathbb{R}$. Then $h$ is $L$-Lipschitz continuous with $L \in O\big((T - t)^{-3/2}\big)$.
Proof D.4.
Let $g_1, g_2 \in \mathbb{R}$. First, note that
\[
h'(g) \;=\; \frac{e^{-g^2/(2(T-t))}}{\sqrt{2\pi}\,(T-t)^{3/2}} \Big( 1 - \frac{g^2}{T-t} \Big),
\]
so that $\sup_{g \in \mathbb{R}} |h'(g)| \in O\big((T-t)^{-3/2}\big)$. Therefore, using the mean value theorem, for any $g_1, g_2$ we have
\[
|h(g_1) - h(g_2)| \;\le\; O\big((T-t)^{-3/2}\big)\,|g_1 - g_2|.
\]
We are now ready to bound $\delta_t(g)$. Fix $t \in \{1, \dots, T-1\}$ and $g \in \mathbb{Z}_{\ge 0}$. Moreover, denote by $\partial_{ggg} R^*$ the third partial derivative of $R^*$ with respect to its second argument. By Taylor's Theorem, there are $\xi_+ \in (g, g+1)$ and $\xi_- \in (g-1, g)$ such that
\[
\mathsf{D}_{gg} R^*(t, g) \;=\; \partial_{gg} R^*(t, g) + \tfrac16 \big( \partial_{ggg} R^*(t, \xi_+) - \partial_{ggg} R^*(t, \xi_-) \big).
\]
Therefore,
\[
\delta_t(g) \;=\; \tfrac12 \big| \mathsf{D}_{gg} R^*(t, g) - \partial_{gg} R^*(t, g) \big| \;\le\; \tfrac1{12} \big| \partial_{ggg} R^*(t, \xi_+) - \partial_{ggg} R^*(t, \xi_-) \big|.
\]
Let $\tau := T - t$. To compute the partial derivatives, first note that $\partial_g R^*(t, g) = p^*(t, g)$. Thus, one may check that
\[
\partial_{gg} R^*(t, g) \;=\; \partial_g p^*(t, g) \;=\; -\frac{e^{-g^2/(2\tau)}}{\sqrt{2\pi\tau}} \tag{19}
\]
and
\[
\partial_{ggg} R^*(t, g) \;=\; \partial_{gg} p^*(t, g) \;=\; \frac{g\, e^{-g^2/(2\tau)}}{\sqrt{2\pi}\,\tau^{3/2}}.
\]
By Lemma D.3, we know that $\partial_{ggg} R^*(t, \cdot) = \partial_{gg} p^*(t, \cdot)$ is Lipschitz continuous with Lipschitz constant in $O(\tau^{-3/2})$. Therefore,
\[
\delta_t(g) \;\le\; O(\tau^{-3/2}) \cdot |\xi_+ - \xi_-| \;\in\; O\big((T - t)^{-3/2}\big),
\]
since $|\xi_+ - \xi_-| \le 2$.
Appendix E Extending the Regret Analysis for General Costs
In Section 5, we relied on the fact that successive gap values had absolute difference exactly $1$. This assumption was fundamental for the version of the discrete Itô formula that we used (see the assumption on the sequence $g_0, \dots, g_n$ in the statement of Theorem 5.1). It was also required by Proposition 2.1 to connect the regret with the “discrete stochastic integral”. To extend the upper bound on the regret of the algorithm from Section 5 to general costs, we follow the same techniques used by Harvey et al. to extend the guarantees of their algorithm: we shall use a more general version of the discrete Itô formula, concavity of $R^*$ with respect to its second argument, and a lemma relating the per-round regret to terms that appear in the more general version of the discrete Itô formula (see Harvey et al., 2020b, Section 3.3 for details on these arguments).
As in the work of Harvey et al., we will rely on a more general version of the discrete Itô’s formula that holds for general costs. The main issue with this general formula is that more work is needed to relate it to the regret of our player strategy.
Theorem E.1 (General Discrete Itô’s Formula, Harvey et al., 2020a, Lemma 3.13).
Let $f \colon \{0, \dots, T\} \times \mathbb{R} \to \mathbb{R}$ be a function and let $g_0, g_1, \dots, g_T \in \mathbb{R}$. Then,
\[
f(T, g_T) - f(0, g_0) \;=\; \sum_{t=1}^{T} \big( f(t, g_t) - f(t, g_{t-1}) \big) \;+\; \sum_{t=1}^{T} \mathsf{D}_t f(t, g_{t-1}).
\]
Fix $T \in \mathbb{N}$, fix gaps $g_0, g_1, \dots, g_T \in \mathbb{R}_{\ge 0}$ with $g_0 = 0$, and set $x_t := q_t(g_{t-1})$ for every $t \in \{1, \dots, T\}$. For the remainder of this section all results will be regarding a game of $T$ rounds against an oblivious adversary with gap sequence $(g_t)_{t=1}^{T}$.
For every $t \in \{1, \dots, T\}$, define the per-round regret (at round $t$) by
\[
r_t \;:=\; \langle p_t, \ell_t \rangle \;-\; \Big( \min_{i \in \{1,2\}} L_t(i) - \min_{i \in \{1,2\}} L_{t-1}(i) \Big).
\]
Our goal in this section is to prove the following lemma.
Lemma E.2.
For every $t \in \{1, \dots, T-1\}$ we have
\[
r_t \;\le\; R^*(t, g_t) - R^*(t, g_{t-1}) + \mathsf{D}_t R^*(t, g_{t-1}) + \epsilon_t(g_{t-1}) + \delta_t(g_{t-1}). \tag{20}
\]
Combining the above lemma with Theorem E.1 and the fact that $R^*$ satisfies (BHE) yields
\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} r_t \;\le\; R^*(T-1, g_{T-1}) - R^*(0, 0) + \sum_{t=1}^{T-1} \big( \epsilon_t(g_{t-1}) + \delta_t(g_{t-1}) \big) + O(1),
\]
where the $O(1)$ term accounts for the last round. Since $R^*(t, g) \le 0$ for any $(t, g)$, we have $R^*(T-1, g_{T-1}) \le 0$. At this point, the exact same proof of Theorem 5.6 applies and we obtain the same regret bound. Thus, it only remains to prove Lemma E.2. In order to prove Lemma E.2, we will use the following result from Harvey et al. (2020a).
Proposition E.3 (Harvey et al., 2020a, Lemma 3.14).
Let $g := g_{t-1}$ and $g' := g_t$ be the values of the gap on rounds $t-1$ and $t$, respectively, and let $x := x_t$ be the probability mass put on the worst (lagging) expert at round $t$ by the player (with ties broken arbitrarily when $g = 0$). For all $t \in \{1, \dots, T\}$:
1. If a best expert at time $t-1$ remains a best expert at time $t$, then
\[
r_t \;=\; x\,(g' - g).
\]
2. If a best expert at time $t-1$ does not remain a best expert at time $t$, then $g + g' \le 1$ and
\[
r_t \;=\; (1 - x)\,(g + g') - g.
\]
We shall also make use of the following fact about concave functions.
Lemma E.4.
Let $h \colon \mathbb{R} \to \mathbb{R}$ be a concave function and let $a \le c \le b$ be real numbers. Then $h(c) \ge \min\{h(a), h(b)\}$.
Proof E.5 (Proof of Lemma E.2).
Case 1: a best expert at round $t-1$ remains a best expert at round $t$.
In this case, by Proposition E.3, (20) is equivalent to
\[
R^*(t, g_t) - R^*(t, g_{t-1}) + \mathsf{D}_t R^*(t, g_{t-1}) + \epsilon_t(g_{t-1}) + \delta_t(g_{t-1}) - x_t\,(g_t - g_{t-1}) \;\ge\; 0. \tag{21}
\]
Since the term $x_t (g_t - g_{t-1})$ is linear in $g_t$ and since $R^*(t, \cdot)$ is concave (by (19) we know that $\partial_{gg} R^*$ is negative everywhere), we conclude that the whole left-hand side is concave as a function of $g_t$. Thus, by Lemma E.4 it suffices to prove the above inequality for $g_t \in \{g_{t-1} - 1,\, g_{t-1} + 1\}$ to prove that it holds for all $g_t \in [g_{t-1} - 1,\, g_{t-1} + 1]$. But for $g_t \in \{g_{t-1} - 1, g_{t-1} + 1\}$, inequality (21) is exactly the per-round inequality established in the analysis of the restricted binary case.
Case 2: a best expert at round $t-1$ does not remain a best expert at round $t$.
In this case, by Proposition E.3, (20) is equivalent to
\[
R^*(t, g_t) - R^*(t, g_{t-1}) + \mathsf{D}_t R^*(t, g_{t-1}) + \epsilon_t(g_{t-1}) + \delta_t(g_{t-1}) - \big( (1 - x_t)(g_{t-1} + g_t) - g_{t-1} \big) \;\ge\; 0. \tag{22}
\]
Again, the left-hand side of the above inequality is concave as a function of $g_t$. Since $g_{t-1} + g_t \le 1$ and $g_t \ge 0$, we know that $g_{t-1} \le 1$. Thus, it suffices to prove the above inequality for $g_t \in \{0,\, 1 - g_{t-1}\}$. For $g_t = 1 - g_{t-1}$ we have
\[
(1 - x_t)(g_{t-1} + g_t) - g_{t-1} \;=\; 1 - x_t - g_{t-1} \;=\; x_t\,(g_t - g_{t-1}) + (1 - 2x_t)(1 - g_{t-1}),
\]
and in the previous case we showed that (21) is non-negative for all $g_t \in [g_{t-1} - 1, g_{t-1} + 1]$. Since $x_t \le \frac12$ and $g_{t-1} \le 1$ in this case, we have in particular that (22) holds for $g_t = 1 - g_{t-1}$. Suppose now that $g_t = 0$. Since $g_{t-1} \le 1$, we have that (22) is equivalent to
\[
R^*(t, 0) - R^*(t, g_{t-1}) + \mathsf{D}_t R^*(t, g_{t-1}) + \epsilon_t(g_{t-1}) + \delta_t(g_{t-1}) + x_t\, g_{t-1} \;\ge\; 0.
\]
By the definition of $q$ and since $p^*(t, y) \le \frac12$ for all $y \ge 0$ (see (18)), we have
\[
R^*(t, g_{t-1}) - R^*(t, 0) \;=\; \int_0^{g_{t-1}} p^*(t, y)\,\mathrm{d}y \;\le\; \frac{g_{t-1}}{2},
\]
and combining this bound with the remaining terms on the left-hand side concludes the proof of (22).