
Laura Greenstreet (laura.greenstreet@gmail.com)
Nicholas J. A. Harvey (nickhar@cs.ubc.ca)
Victor Sanches Portella (victorsp@cs.ubc.ca)
University of British Columbia, Department of Computer Science

Efficient and Optimal Fixed-Time Regret with Two Experts

Abstract

Prediction with expert advice is a foundational problem in online learning. In instances with $T$ rounds and $n$ experts, the classical Multiplicative Weights Update method suffers at most $\sqrt{(T/2)\ln n}$ regret when $T$ is known beforehand. Moreover, this is asymptotically optimal when both $T$ and $n$ grow to infinity. However, when the number of experts $n$ is small/fixed, algorithms with better regret guarantees exist. In 1967, Cover gave a dynamic programming algorithm for the two-experts problem restricted to $\{0,1\}$ costs that suffers at most $\sqrt{T/2\pi}+O(1)$ regret with $O(T^{2})$ pre-processing time. In this work, we propose an optimal algorithm for prediction with two experts’ advice that works even for costs in $[0,1]$ and with $O(1)$ processing time per turn. Our algorithm builds on recent work on the experts problem based on techniques and tools from stochastic calculus.

keywords:
experts, Cover, online learning, optimal, fixed-time

1 Introduction

The foundational problem in online learning of prediction with expert advice (or simply the experts’ problem) consists of a sequential game between a player and an adversary. In each turn, the player chooses (possibly randomly) one of $n$ experts to follow. Concurrently, the adversary chooses for each expert a cost in $[0,1]$. At the end of a turn, the player sees the costs of all experts and suffers the cost of the expert they followed. The performance of the player is usually measured by the regret: the difference between their cumulative loss and the cumulative loss of the best expert in hindsight. We are interested in strategies for the player whose (expected) regret against any adversary is sublinear in the total number of rounds of the game.

A well-known strategy for the player is the Multiplicative Weights Update (MWU) method (Arora et al., 2012). In the fixed-time setting — that is, when the player knows beforehand the total number of rounds $T$ — MWU with a carefully-chosen fixed step-size suffers at most $\sqrt{(T/2)\ln n}$ regret. Additionally, this regret bound is asymptotically optimal when both $n$ and $T$ grow to infinity (Cesa-Bianchi et al., 1997). Yet, if the number of experts $n$ is fixed/small, better regret guarantees may be possible. From a theoretical standpoint, there is a clear motivation for the case where $n$ is fixed: MWU can suffer regret arbitrarily close to $\sqrt{(T/2)\ln n}$ for any $n$ as the number of rounds $T$ grows (Gravin et al., 2017). This is known to hold when MWU is used with a fixed or decreasing step-size, which are the usual cases when MWU is applied to the experts’ problem; when the step-size of MWU is allowed to be arbitrary, Gravin et al. (2017) show that MWU can still suffer regret arbitrarily close to $(2/3)\sqrt{(T/2)\ln n}$ as $T$ grows. This means that different ideas are necessary for player strategies to guarantee smaller regret.
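For concreteness, the following is a minimal sketch (ours, not from the paper) of MWU with a fixed step-size; the tuning $\eta=\sqrt{8\ln(n)/T}$ is the standard choice behind the $\sqrt{(T/2)\ln n}$ bound, and the random costs below are only an illustration.

```python
import math, random

def mwu_regret(costs, eta):
    """Run MWU with a fixed step-size on a given cost sequence and return the regret."""
    n = len(costs[0])
    w = [1.0] * n                       # exponential weights
    player_loss, cum = 0.0, [0.0] * n   # player's loss and cumulative expert losses
    for cost in costs:
        total = sum(w)
        x = [wi / total for wi in w]    # distribution played this round
        player_loss += sum(xi * ci for xi, ci in zip(x, cost))
        for i, ci in enumerate(cost):
            cum[i] += ci
            w[i] *= math.exp(-eta * ci)
    return player_loss - min(cum)

T, n = 10_000, 2
rng = random.Random(0)
costs = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)    # fixed-time tuning
print(mwu_regret(costs, eta), math.sqrt((T / 2) * math.log(n)))
```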

1.1 The Case of Two Experts

Of course, a natural question is whether regret smaller than $\sqrt{(T/2)\ln n}$ is even possible as $T$ grows even if $n$ is fixed. Cover (1967) showed that for two experts (that is, for $n=2$) a regret of $\sqrt{T/2\pi}+O(1)$ is the best possible in the worst case by exhibiting an algorithm for the case with costs in $\{0,1\}$. In related work, the attainable worst-case regret bounds for three (Abbasi-Yadkori et al., 2017; Kobzar et al., 2020) and four (Bayraktar et al., 2020) experts were recently improved, respectively, to $\sqrt{8T/(9\pi)}+O(\ln T)$ and $\sqrt{T\pi/8}$ up to lower-order terms. Although optimal, Cover’s algorithm is based on a dynamic programming approach that takes $O(T^{2})$ time in total. In comparison, MWU takes $O(1)$ time per round to compute the probabilities to assign to the two experts at each round. Finally, one can adapt Cover’s algorithm to costs in $[0,1]$, but the standard approach is to randomly round the costs to either $0$ or $1$. In this case, the regret guarantees only hold in expectation.

In a related line of work, Harvey et al. (2020b) study the 2-experts problem in the anytime setting, that is, in the case where the player/algorithm does not know the total number of rounds $T$ ahead of time. They showed an optimal strategy for the player whose regret on any round $t\in\mathbb{N}$ is at most $(\gamma/2)\sqrt{t}$, where the constant $\gamma\approx 1.30693$ arises naturally in the study of Brownian motion (see Mörters and Peres, 2010 for an introduction to the field and historical references). Moreover, their algorithm can be computed (up to machine precision) in $O(1)$ time since it boils down to the evaluation of well-known mathematical functions such as the exponential function and the imaginary error function. In this work, we combine similar ideas based on stochastic calculus together with Cover’s algorithm to propose an efficient and optimal fixed-time algorithm for 2-experts.

Known result (Cover, 1967):

There is a dynamic programming algorithm for the 2-experts problem with costs in $\{0,1\}$ that suffers at most $\sqrt{T/2\pi}+0.5$ regret in games with $T$ rounds and requires $O(T^{2})$ pre-processing time.

Our contribution:

An algorithm for the two experts’ problem with costs in $[0,1]$ that suffers at most $\sqrt{T/2\pi}+1.3$ regret in games with $T$ rounds. This new algorithm has running time $O(T)$ and is based on discretizing a continuous-time solution obtained using ideas from stochastic calculus.

More precisely, one of the key steps is deriving a player in the continuous-time setting from Harvey et al. (2020b) that exploits the knowledge of the time horizon to obtain regret bounds better than in the anytime setting. However, unlike in the anytime setting, discretizing this algorithm leads to non-negative discretization error. Another key contribution of our paper is showing that this discretization error is small. Finally, the connection to Cover’s classical algorithm sheds new intuition on the classical optimal solution. Interestingly, our results could be formally presented without resorting to stochastic calculus. Yet, it is the stochastic calculus point of view that guides us through the design and analysis of the algorithm.

Text organization:

We first formally define the experts problem and discuss some assumptions and simplifications in Section 2. In Section 3 we present a brief summary of Cover’s optimal algorithm for two experts. In Section 4 we define an analogous continuous-time problem and describe a solution inspired by Cover’s algorithm. Finally, in Section 5 we present and analyze a discretized version of the continuous-time algorithm, showing it enjoys optimal worst-case regret bounds.

2 Prediction with Expert Advice

In this section we shall define more precisely the problem of prediction with expert advice. The problem is parameterized by a fixed number $n\in\mathbb{N}$ of experts. A (strategy for the) player is a function $\mathcal{A}$ that, given cost vectors $\ell_{1},\dotsc,\ell_{t}\in[0,1]^{n}$ chosen by the adversary in previous rounds, outputs a probability distribution over the $n$ experts represented by a vector $x_{t+1}\in\Delta_{n}\coloneqq\{\,x\in[0,1]^{n}\colon\sum_{i=1}^{n}x(i)=1\,\}$. Similarly, a (strategy for the) adversary is a function $\mathcal{B}$ that, given the player’s previous choices $x_{1},\dotsc,x_{t}\in\Delta_{n}$ of distributions over the experts, outputs a vector $\ell_{t+1}\in[0,1]^{n}$ of expert costs for round $t+1$, where $\ell_{t+1}(i)$ is the cost of expert $i\in[n]\coloneqq\{1,\dotsc,n\}$. The performance of a player strategy $\mathcal{A}$ in a game with $T\in\mathbb{N}$ rounds against an adversary $\mathcal{B}$ is measured by the regret, defined as

\operatorname{Regret}(T,\mathcal{A},\mathcal{B})\coloneqq\sum_{t=1}^{T}\ell_{t}^{\mathsf{T}}x_{t}-\min_{i\in[n]}\sum_{t=1}^{T}\ell_{t}(i),

where above, and for the remainder of this section, we have $x_{t}\coloneqq\mathcal{A}(\ell_{1},\dotsc,\ell_{t-1})$ and $\ell_{t}\coloneqq\mathcal{B}(x_{1},\dotsc,x_{t-1})$ for all $t\in[T]$ (if no specific strategies are clear from the context, one may take $\mathcal{A}$ and $\mathcal{B}$ to be arbitrary strategies, and we shall omit $\mathcal{A}$ and $\mathcal{B}$ when they are clear from context). Moreover, whenever the loss vectors $\ell_{1},\dotsc,\ell_{T}$ are clear from context, we define the cumulative loss of expert $i\in[n]$ at round $t\in[T]$ by $L_{t}(i)\coloneqq\sum_{j=1}^{t}\ell_{j}(i)$. In this text, for each $T\in\mathbb{N}$ we want to devise a strategy $\mathcal{A}_{T}$ for the player that suffers regret sublinear in $T$ against any adversary in a game with $T$ rounds. That is, we want a family of strategies $\{\mathcal{A}_{T}\}_{T\in\mathbb{N}}$ such that

\lim_{T\to\infty}\frac{1}{T}\sup_{\mathcal{B}}\operatorname{Regret}(T,\mathcal{A}_{T},\mathcal{B})=0, (1)

where the supremum ranges over all possible adversaries, even those that have full knowledge of (and may even be adversarial to) the player’s strategy.

2.1 Restricted Adversaries

In (1), the supremum ranges over all the possible adversaries for a game with $T$ rounds. However, we need only consider in the supremum oblivious adversaries (Karlin and Peres, 2017, Section 18.5.4), that is, adversaries $\mathcal{B}$ whose choice on each round depends only on the round number and not on the choices of the player. For any $\ell=(\ell_{1},\dotsc,\ell_{T})^{\mathsf{T}}\in(\mathbb{R}^{n})^{T}$, we denote by $\mathcal{B}_{\ell}$ the oblivious adversary that plays $\ell_{t}$ on round $t\in[T]$.

In fact, we may restrict our attention to even smaller sets of adversaries (for details on these reductions, see Gravin et al., 2016 and Karlin and Peres, 2017, Section 18.5.3). First, in (1) we need only consider binary adversaries, that is, adversaries which can assign only costs in $\{0,1\}$ to the experts. Furthermore, to obtain the value of the optimal regret for two experts we only need to consider adversaries that pick cost vectors in $\mathcal{L}\coloneqq\{(1,0)^{\mathsf{T}},(0,1)^{\mathsf{T}}\}$, which we call restricted binary adversaries. Intuitively, the adversary can do no better by placing equal costs on both experts at any given round. The optimal algorithm for two experts proposed by Cover (1967) heavily relies on the assumption that the adversary is a restricted binary one and does not extend to general costs in $[0,1]$ without resorting to randomly rounding the costs — which makes the regret guarantees hold only in expectation.

In this work we design an algorithm that suffers at most $\sqrt{T/(2\pi)}+O(1)$ regret for arbitrary costs in $[0,1]$. Our initial analysis handles only restricted binary adversaries, but simple concavity arguments extend the upper bound to general adversaries. Throughout this text we fix a time horizon $T\in\mathbb{N}$.

2.2 The Gap Between Experts

The case where we have only 2 experts admits a simplification that aids us greatly in the design of upper and lower bounds on the optimal regret. Namely, the gap (between experts) at round $t\in[T]$ is given by $\lvert L_{t}(1)-L_{t}(2)\rvert$, where $L_{t}$ is the cumulative loss vector at round $t$ as defined in Section 2. Furthermore, we call lagging expert (on round $t\in[T]$) an expert with maximum cumulative loss on round $t$ among both experts. Similarly, we call leading expert (on round $t\in[T]$) an expert with minimum cumulative loss on round $t$. The following proposition from Harvey et al. (2020b) shows that, for the restricted binary adversaries described earlier, the regret can be almost fully characterized by the expert gaps and the player’s choices of distributions on the experts. In the next proposition (and throughout the remainder of the text), for any predicate $P$ we define $[P]$ to be $1$ if $P$ is true and $0$ otherwise.

Proposition 2.1 (Harvey et al., 2020a, Proposition 2.3).

Fix $T\in\mathbb{N}$, let $\mathcal{A}$ be a player strategy, and let $\ell_{1},\dotsc,\ell_{T}\in\{(1,0)^{\mathsf{T}},(0,1)^{\mathsf{T}}\}$ be the expert costs chosen by the adversary. For each $t\in[T]$, set $x_{t}\coloneqq\mathcal{A}(\ell_{1},\dotsc,\ell_{t-1})$, let $p_{t}\in\{x_{t}(1),x_{t}(2)\}$ be the probability mass placed on the lagging expert on round $t$, and let $g_{t}$ be the gap between experts on round $t$. Then,

\operatorname{Regret}(T)=\sum_{t=1}^{T}[{g_{t-1}>0}]\,p_{t}\cdot(g_{t}-g_{t-1})+\sum_{t=1}^{T}[{g_{t-1}=0}]\,\ell_{t}^{\mathsf{T}}x_{t},

where $g_{0}\coloneqq 0$. In particular, if for every $t\in[T]$ with $g_{t-1}=0$ we have $x_{t}(1)=x_{t}(2)=1/2$, then

\operatorname{Regret}(T)=\sum_{t=1}^{T}p_{t}\cdot(g_{t}-g_{t-1}).
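As a quick sanity check of Proposition 2.1, the sketch below (ours; the strategy $p$ is an arbitrary placeholder) simulates a game against a random restricted binary adversary and verifies that the regret computed from the losses matches the gap-based sum.

```python
import random

def check_prop_2_1(T, p, seed=0):
    """Regret from losses vs. the gap-based sum of Proposition 2.1."""
    rng = random.Random(seed)
    L = [0.0, 0.0]                                   # cumulative expert losses
    player_loss, gap_sum, g_prev = 0.0, 0.0, 0
    for t in range(1, T + 1):
        lagging = 0 if L[0] >= L[1] else 1           # ties broken arbitrarily
        pt = 0.5 if g_prev == 0 else p(t, g_prev)    # mass on the lagging expert
        x = [0.0, 0.0]
        x[lagging], x[1 - lagging] = pt, 1.0 - pt
        cost = rng.choice([(1.0, 0.0), (0.0, 1.0)])  # restricted binary adversary
        player_loss += cost[0] * x[0] + cost[1] * x[1]
        L[0] += cost[0]; L[1] += cost[1]
        g = abs(L[0] - L[1])
        gap_sum += pt * (g - g_prev)                 # second formula of Prop. 2.1
        g_prev = g
    regret = player_loss - min(L)
    assert abs(regret - gap_sum) < 1e-9
    return regret

print(check_prop_2_1(1000, lambda t, g: 0.4))
```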

3 An Overview of Cover’s Algorithm

Although in this section we give only a brief overview of Cover’s algorithm, for the sake of completeness we provide a full description and analysis of the algorithm in Appendix A. The key idea in Cover’s algorithm is to compute optimal decisions for all possible scenarios beforehand. This is a feasible approach when we know the total number of rounds and the adversary is a (restricted) binary adversary. More precisely, we will focus our attention on player strategies $\mathcal{A}_{p}$ parameterized by functions $p\colon[T]\times\{0,\dotsc,T-1\}\to[0,1]$ which place $p(t,g)$ probability mass on the lagging expert on round $t$ if the gap between experts is $g$, and $1-p(t,g)$ mass on the leading expert. Then the “regret-to-be-suffered” by $\mathcal{A}_{p}$ at any round $t$ with a given gap between experts $g$ is

V_{p}[t,g]\coloneqq\sup\{\,\operatorname{Regret}(T,\mathcal{A}_{p},\mathcal{B}_{\ell})-\operatorname{Regret}(t,\mathcal{A}_{p},\mathcal{B}_{\ell})\colon\ell\in\mathcal{L}^{T}\ \text{s.t.}\ \lvert L_{t}(1)-L_{t}(2)\rvert=g\,\}. (2)

We can compute all entries of $V_{p}$ as defined above via a dynamic programming approach, starting with $V_{p}[T,g]$ for all $g\in\{0,\dotsc,T-1\}$ and then computing these values for earlier rounds. Moreover, there is a simple strategy $p^{*}$ that minimizes the worst-case regret $V_{p}[0,0]$. Interestingly, the worst-case regret of $\mathcal{A}_{p^{*}}$, given by $V^{*}[0,0]$, is tightly connected with symmetric random walks, where a symmetric random walk (of length $t$ starting at $g$) is a sequence of random variables $(S_{i})_{i=0}^{t}$ with $S_{i}\coloneqq g+X_{1}+\dotsm+X_{i}$ for each $i\in\{0,\dotsc,t\}$, where $\{X_{j}\}_{j\in[t]}$ are i.i.d. uniform random variables on $\{\pm 1\}$. The next theorem summarizes the guarantees on the regret of $\mathcal{A}_{p^{*}}$, showing that it suffers no more than $\sqrt{T/2\pi}+O(1)$ regret. Moreover, it is worth noting that no player strategy can do any better asymptotically in $T$ (for a complete proof of the lower bound, see Appendix A.5).

Theorem 3.1 (Cover, 1967, and Karlin and Peres, 2017, Section 18.5.3).

For every $r,g\in\mathbb{N}$, let the random variable $Z_{r}(g)$ be the number of passages through 0 of a symmetric random walk of length $r$ starting at position $g$. Then $V^{*}[t,g]=\tfrac{1}{2}\bm{\mathrm{E}}[Z_{T-t}(g)]$ for every $t,g\in\mathbb{N}$. In particular,

V^{*}[0,0]=\frac{1}{2}\bm{\mathrm{E}}[Z_{T}(0)]\leq\sqrt{\frac{T}{2\pi}}+\frac{1}{2}.

Finally, although not more efficiently computable than the dynamic programming approach, $p^{*}$ has a closed-form solution (see Karlin and Peres, 2017, Section 18.5.3) given, for $t\in[T]$ and $g\in\{0,\dotsc,T-1\}$, by

p^{*}(t,g)=\frac{1}{2}\bm{\mathrm{P}}(S_{T-t}=g)+\bm{\mathrm{P}}(S_{T-t}>g), (3)

where $S_{T-t}$ is a symmetric random walk of length $T-t$. This closed-form solution will serve as inspiration for our continuous-time algorithm.
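As an illustration (our sketch, assuming the random walk in (3) starts at 0), the closed form can be evaluated directly from binomial probabilities; note that each evaluation sums over $O(T-t)$ terms.

```python
from math import comb

def p_star(t, g, T):
    """Closed form (3): mass on the lagging expert, via the law of a
    length (T - t) symmetric +-1 random walk started at 0 (our assumption)."""
    m = T - t
    def prob_eq(k):                       # P(S_m = k)
        if (m + k) % 2 != 0 or abs(k) > m:
            return 0.0
        return comb(m, (m + k) // 2) / 2**m
    return 0.5 * prob_eq(g) + sum(prob_eq(k) for k in range(g + 1, m + 1))

print(p_star(1, 0, 100), p_star(1, 10, 100))   # p*(t, 0) is always 1/2
```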

4 A Continuous-Time Problem

Cover’s player strategy is optimal, but it is defined only for restricted binary adversaries. It is likely that it can be extended to binary adversaries, but it is definitely less clear how to extend such an algorithm for general adversaries picking costs in $[0,1]$. Moreover, even when Cover’s algorithm can be used, it is quite inefficient: we either need to compute $V^{*}$, which has $O(T^{2})$ entries, or at each round we need to compute the probabilities in (3). In the latter case, in the first round we already need $O(T^{2})$ time to exactly compute the probabilities related to a length $T-1$ random walk.

To devise a new algorithm for the two experts problem, we first look at an analogous continuous-time problem, first proposed by Harvey et al. (2020b) and with a similar setting previously studied by Freund (2009). The main idea is to translate the random walk connection from the discrete case into a stochastic problem in continuous time, and then exploit the heavy machinery of stochastic calculus to derive a continuous time solution.

4.1 Regret as a Discrete Stochastic Integral

Let us begin by further connecting Cover’s algorithm to random walks. Let $\mathcal{A}_{p}$ be a player strategy induced by some function $p\colon[T]\times\{0,\dotsc,T-1\}\to\mathbb{R}$. If $p(t,0)=1/2$ for all $t\in[T]$, then Proposition 2.1 tells us that, for any sequence of gaps $g_{1},\dotsc,g_{T}\in\mathbb{R}_{\geq 0}$ generated by a restricted binary adversary, and with $g_{0}\coloneqq 0$, we have

\operatorname{Regret}(T)=\sum_{t=1}^{T}p(t,g_{t-1})(g_{t}-g_{t-1}). (4)

The right-hand side of the above equation is a discrete analog of the Riemann–Stieltjes integral of $p$ with respect to $g$. In fact, if $(g_{t})_{t=0}^{T}$ is a random sequence (usually with some additional restriction, such as being a martingale or a local martingale), the above is also known as a discrete stochastic integral. In particular, consider the case where $(g_{t})_{t=0}^{T}$ is a length $T$ reflected (i.e., absolute value of a) symmetric random walk. Then any possible sequence of deterministic gaps has a positive probability of being realized by $(g_{t})_{t=0}^{T}$. In other words, any sequence of gaps is in the support of $(g_{t})_{t=0}^{T}$. Thus, bounding the worst-case regret of $\mathcal{A}_{p}$ is equivalent to bounding almost surely the value of (4) when $(g_{t})_{t=0}^{T}$ is a reflected symmetric random walk. This idea will prove itself powerful in continuous time even though it is not very insightful for the discrete-time problem.

4.2 A Continuous-Time Problem

A stochastic process that can be seen as the continuous-time analogue of symmetric random walks is Brownian motion (Revuz and Yor, 1999; Mörters and Peres, 2010). We fix a Brownian motion $(B_{t})_{t\geq 0}$ throughout the remainder of this text. Inspired by the observation that the discrete regret boils down to a discrete stochastic integral, Harvey et al. (2020b) define a continuous analogue of regret as a continuous stochastic integral. More specifically, given a function $p\colon[0,T)\times\mathbb{R}\to[0,1]$ such that $p(t,0)=1/2$ for all $t\in\mathbb{R}_{\geq 0}$, define the continuous regret at time $T$ by

\operatorname{ContRegret}(T,p)\coloneqq\lim_{\varepsilon\downarrow 0}\int_{0}^{T-\varepsilon}p(t,\lvert B_{t}\rvert)\mathop{}\!\mathrm{d}\lvert B_{t}\rvert,

where the term in the limit on the right-hand side above is the stochastic integral (from 0 to $T-\varepsilon$) of $p$ with respect to the process $(\lvert B_{t}\rvert)_{t\geq 0}$. We take the limit as a mere technicality: $p$ need not be defined at time $T$ and we want to ensure left-continuity of the continuous regret (the limit is well-defined since a stochastic integral with respect to a reflected Brownian motion is guaranteed to have limits from the left and to be continuous from the right). It is worth noting that stochastic integrals are usually defined with respect to martingales or local martingales, but $(\lvert B_{t}\rvert)_{t\geq 0}$ is neither. Still, $(\lvert B_{t}\rvert)_{t\geq 0}$ happens to be a semi-martingale, which roughly means that it can be written as a sum of two processes: a local martingale and a process of bounded variation. In this case one can still define stochastic integrals in a way that foundational results such as Itô’s formula still hold; details can be found in Revuz and Yor (1999). We do not give the precise definition of a stochastic integral since we shall not deal with it directly. Still, one may think intuitively of such integrals as random Riemann–Stieltjes integrals, although the precise definition is more delicate.

Let us now look for a continuous function $p\colon[0,T)\times\mathbb{R}\to[0,1]$ with $p(t,0)=1/2$ for all $t\geq 0$ and with small continuous regret. Note that without the continuity condition or the requirement that $p(t,0)=1/2$ for $t\geq 0$, the problem would be trivial. If we did not require $p(t,0)=1/2$ for all $t\in[0,T)$, then taking $p(t,g)\coloneqq 0$ everywhere would yield 0 continuous regret. Moreover, dropping this requirement would go against the analogous conditions needed in the discrete case, where the regret could be written as a “discrete stochastic integral” in Proposition 2.1 only when the player chooses $(1/2,1/2)^{\mathsf{T}}$ in rounds with 0 gap. Finally, requiring continuity of $p$ is a way to avoid technicalities and “unfair” player strategies.

When working with Riemann integrals, instead of manipulating the definitions directly we use powerful and general results such as the Fundamental Theorem of Calculus (FTC). Analogously, the following result, known as Itô’s formula, is one of the main tools we use when manipulating stochastic integrals and which can be seen as an analogue of the FTC (and shows how stochastic integrals do not always follow the classical rules of calculus). We denote by $C^{1,2}$ the class of bivariate functions that are continuously differentiable with respect to their first argument and twice continuously differentiable with respect to their second argument. Moreover, for any function $f\in C^{1,2}$ we denote by $\partial_{t}f$ the partial derivative of $f$ with respect to its first argument, and we denote by $\partial_{g}f$ and $\partial_{gg}f$, respectively, the first and second derivatives of $f$ with respect to its second argument.

Theorem 4.1 (Itô’s Formula, see Revuz and Yor, 1999, Theorem IV.3.3).

Let $f\colon[0,T)\times\mathbb{R}\to\mathbb{R}$ be in $C^{1,2}$ and let $T^{\prime}\in[0,T)$. Then, almost surely,

f(T^{\prime},\lvert B_{T^{\prime}}\rvert)-f(0,\lvert B_{0}\rvert)=\int_{0}^{T^{\prime}}\partial_{g}f(t,\lvert B_{t}\rvert)\mathop{}\!\mathrm{d}\lvert B_{t}\rvert+\int_{0}^{T^{\prime}}\big{[}\partial_{t}f(t,\lvert B_{t}\rvert)+\tfrac{1}{2}\partial_{gg}f(t,\lvert B_{t}\rvert)\big{]}\mathop{}\!\mathrm{d}t. (5)

Note that the first integral in the equation of the above theorem resembles the definition of the continuous regret. In fact, the above result shows an alternative way to write the continuous regret at time $T$ of a function $p\colon[0,T)\times\mathbb{R}\to[0,1]$ such that there is $R\in C^{1,2}$ with $\partial_{g}R=p$. However, it might be hard to compute (or even to bound) the second integral in (5). A straightforward way to circumvent this problem is to look for functions such that the second integral in (5) is 0. For that, it suffices to consider functions $R\in C^{1,2}$ that satisfy the backwards heat equation on $[0,T)\times\mathbb{R}$, that is,

\overset{*}{\Delta}R(t,g)\coloneqq\partial_{t}R(t,g)+\frac{1}{2}\partial_{gg}R(t,g)=0,\qquad\forall(t,g)\in[0,T)\times\mathbb{R}. (BHE)

We summarize the above discussion and its implications in the following lemma.

Lemma 4.2.

Let $R\colon[0,T)\times\mathbb{R}\to\mathbb{R}$ be in $C^{1,2}$ such that $\partial_{g}R(t,g)=p(t,g)$ for all $(t,g)\in[0,T)\times\mathbb{R}_{\geq 0}$, such that (BHE) holds, and such that $R(0,0)=0$. Then $\lim_{t\uparrow T}R(t,\lvert B_{t}\rvert)=\operatorname{ContRegret}(T,p)$ almost surely.

4.3 A Solution Inspired by Cover’s Algorithm

In the remainder of this text we will make extensive use of a well-known function related to the Gaussian distribution known as the complementary error function, defined by

\operatorname*{erfc}(z)\coloneqq 1-\frac{2}{\sqrt{\pi}}\int_{0}^{z}e^{-x^{2}}\mathop{}\!\mathrm{d}x=\frac{2}{\sqrt{\pi}}\int_{z}^{\infty}e^{-x^{2}}\mathop{}\!\mathrm{d}x,\qquad\forall z\in\mathbb{R}.

In Section 4.4 we will show that the function $Q\colon(-\infty,T)\times\mathbb{R}\to[0,1]$ in $C^{1,2}$ given by

Q(t,g)\coloneqq\frac{1}{2}\operatorname*{erfc}\left(\frac{g}{\sqrt{2(T-t)}}\right),\qquad\forall(t,g)\in(-\infty,T)\times\mathbb{R}

satisfies $\operatorname{ContRegret}(T,Q)=\sqrt{T/(2\pi)}$ almost surely. Before bounding the continuous regret, it is enlightening to see how $Q$ is related to Cover’s algorithm.

Specifically, let $p^{*}$ be as in (3). Due to the Central Limit Theorem, $Q$ can be seen as an approximation of $p^{*}$. To see why, let $(S_{t})_{t=0}^{\infty}$ be a symmetric random walk, and define $X_{t}\coloneqq S_{t}-S_{t-1}$ and $Y_{t}\coloneqq(X_{t}+1)/2$ for each $t\geq 1$. Note that $Y_{t}$ follows a Bernoulli distribution with parameter $1/2$ for any $t\geq 1$. Moreover, let $Z$ be a Gaussian random variable with mean 0 and variance 1. Then, by setting $\mu\coloneqq\bm{\mathrm{E}}[2Y_{1}]=1$ and $\sigma^{2}\coloneqq\bm{\mathrm{E}}[(2Y_{1}-\mu)^{2}]=1$, the Central Limit Theorem guarantees

\frac{1}{\sqrt{L}}S_{L}=\frac{1}{\sqrt{L}}\sum_{i=1}^{L}X_{i}=\frac{1}{\sqrt{L}}\sum_{i=1}^{L}(2Y_{i}-1)=\frac{\sqrt{L}}{\sigma}\left(\frac{1}{L}\sum_{i=1}^{L}2Y_{i}-\mu\right)\stackrel{L\to\infty}{\longrightarrow}Z,

where the limit holds in distribution. Thus, we roughly have that $S_{L}$ and $\sqrt{L}Z$ have similar distributions. Then,

p^{*}(t,g)=\frac{1}{2}\bm{\mathrm{P}}(S_{T-t}=g)+\bm{\mathrm{P}}(S_{T-t}>g)\approx\frac{1}{2}\bm{\mathrm{P}}((\sqrt{T-t})Z=g)+\bm{\mathrm{P}}((\sqrt{T-t})Z>g)
=\bm{\mathrm{P}}\left(Z>\frac{g}{\sqrt{T-t}}\right)=\frac{1}{2}\operatorname*{erfc}\left(\frac{g}{\sqrt{2(T-t)}}\right)=Q(t,g).

One may already presume that using $Q$ in place of $p^{*}$ in the discrete experts’ problem should yield a regret bound close to the guarantees on the regret of Cover’s algorithm. Indeed, using the Berry–Esseen Theorem (Durrett, 2019, Section 3.4.4) to more precisely bound the difference between $p^{*}$ and $Q$ yields an $O(\sqrt{T})$ regret bound with suboptimal constants against binary adversaries. However, it is not clear whether the approximation error would yield the optimal constant in the regret bound. Additionally, these guarantees do not naturally extend to arbitrary experts’ costs in $[0,1]$. In Section 5 we will show how to use an algorithm closely related to $Q$ that enjoys a clean bound on the discrete-time regret.
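The sketch below (ours) compares the two rules numerically; p_star repeats the closed form (3) from Section 3 for self-containment, and the specific horizon and gaps are arbitrary.

```python
from math import comb, erfc, sqrt

def p_star(t, g, T):                      # closed form (3), as in Section 3
    m = T - t
    P = lambda k: comb(m, (m + k) // 2) / 2**m if (m + k) % 2 == 0 and abs(k) <= m else 0.0
    return 0.5 * P(g) + sum(P(k) for k in range(g + 1, m + 1))

def Q(t, g, T):                           # continuous-time strategy of Section 4.3
    return 0.5 * erfc(g / sqrt(2 * (T - t)))

T = 400
for g in (0, 4, 10, 30):
    print(g, round(p_star(200, g, T), 4), round(Q(200, g, T), 4))
```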

Deriving $Q$ directly from a PDE.

We have derived $Q$ by a heuristic argument to approximate $p^{*}$. Yet, one can derive the same solution without ever making use of $p^{*}$ by approaching the problem directly from the stochastic calculus point of view. Namely, consider player strategies that satisfy the BHE, are non-negative, and place $1/2$ mass on each expert when the gap is 0. With only these conditions we would end up with anytime solutions similar to the ones considered by Harvey et al. (2020b). In the fixed-time case we can “invert time” by a change of variables $t\leftarrow T-t$. Then the BHE becomes the traditional heat equation, which $Q$ satisfies together with the boundary conditions.
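To make the change of variables explicit (a display of ours, restating the boundary-condition argument): writing $s\coloneqq T-t$ and $\tilde{Q}(s,g)\coloneqq Q(T-s,g)$, the backwards heat equation turns into the standard heat equation, and $\tilde{Q}$ is the heat flow started from a step,

\partial_{s}\tilde{Q}(s,g)=\tfrac{1}{2}\partial_{gg}\tilde{Q}(s,g),\qquad\tilde{Q}(s,g)=\frac{1}{2}\operatorname*{erfc}\left(\frac{g}{\sqrt{2s}}\right)=\bm{\mathrm{P}}(g+W_{s}<0)\quad\text{with}\ W_{s}\sim\mathcal{N}(0,s),

so that $\tilde{Q}(0,\cdot)$ is the step function $[g<0]+\tfrac{1}{2}[g=0]$, and in particular $Q(t,0)=1/2$ for every $t<T$.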

4.4 Bounding the Continuous Regret

Interestingly, not only is $Q$ in $C^{1,2}$, but it also satisfies the backwards heat equation, even though we have never explicitly required such a condition to hold. Since the proof of this fact boils down to technical but otherwise straightforward computations, we defer it to Appendix C.

Lemma 4.3.

For all $t\in[0,T)$ and $g\in\mathbb{R}_{\geq 0}$ we have $\overset{*}{\Delta}Q(t,g)=0$.

However, recall that to use Lemma 4.2 we need a function $R\in C^{1,2}$ with $\partial_{g}R=Q$ that satisfies the backwards heat equation; $Q$ itself need not satisfy the backwards heat equation. Luckily enough, the following proposition shows how to obtain such a function $R$ based on $Q$.

Proposition 4.4 (Harvey et al., 2020a, Lemma 5.6).

Let $h\colon[0,T)\times\mathbb{R}\to\mathbb{R}$ be in $C^{1,2}$ and define

f(t,g)\coloneqq\int_{0}^{g}h(t,y)\mathop{}\!\mathrm{d}y-\frac{1}{2}\int_{0}^{t}\partial_{g}h(s,0)\mathop{}\!\mathrm{d}s,\qquad\forall(t,g)\in[0,T)\times\mathbb{R}.

Then,

(i) $f\in C^{1,2}$,

(ii) if $h$ satisfies (BHE), then so does $f$,

(iii) $h=\partial_{g}f$.

In light of the above proposition, for all $(t,g)\in(-\infty,T)\times\mathbb{R}$ define

R(t,g)\coloneqq\int_{0}^{g}Q(t,x)\mathop{}\!\mathrm{d}x-\frac{1}{2}\int_{0}^{t}\partial_{g}Q(s,0)\mathop{}\!\mathrm{d}s.

In the case above, we can evaluate these integrals and obtain a formula for $R$ that is easier to analyze. Although we defer a complete proof of the next equation to Appendix C, using that $\int_{0}^{y}\operatorname*{erfc}(x)\mathop{}\!\mathrm{d}x=y\operatorname*{erfc}(y)-\frac{1}{\sqrt{\pi}}e^{-y^{2}}+\frac{1}{\sqrt{\pi}}$ (Olver et al., 2010, Section 7.7(i)) and that $\frac{\mathop{}\!\mathrm{d}}{\mathop{}\!\mathrm{d}x}\operatorname*{erfc}(x)=-\frac{2}{\sqrt{\pi}}e^{-x^{2}}$, we can show for every $t\in(-\infty,T)$ and $g\in\mathbb{R}$ that

R(t,g)=\frac{g}{2}\operatorname*{erfc}\left(\frac{g}{\sqrt{2(T-t)}}\right)-\sqrt{\frac{T-t}{2\pi}}\exp\left(-\frac{g^{2}}{2(T-t)}\right)+\sqrt{\frac{T}{2\pi}}. (6)

Since $R$ satisfies (BHE), Lemma 4.2 shows that the continuous regret of $Q$ is given exactly by $R$. The following lemma shows a bound on $R$ and, thus, a bound on the continuous regret of $Q$.
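The sketch below (ours; the horizon and the evaluation point are arbitrary) evaluates $Q$ and the closed form (6) for $R$, and checks by finite differences that $\partial_{g}R\approx Q$ and that $R$ approximately satisfies (BHE).

```python
from math import erfc, exp, sqrt, pi

T = 100.0

def Q(t, g):
    return 0.5 * erfc(g / sqrt(2 * (T - t)))

def R(t, g):                               # closed form (6)
    s = T - t
    return (g / 2) * erfc(g / sqrt(2 * s)) \
        - sqrt(s / (2 * pi)) * exp(-g * g / (2 * s)) + sqrt(T / (2 * pi))

t, g, h = 30.0, 4.0, 1e-4
dg = (R(t, g + h) - R(t, g - h)) / (2 * h)               # ~ partial_g R
dt = (R(t + h, g) - R(t - h, g)) / (2 * h)               # ~ partial_t R
dgg = (R(t, g + h) - 2 * R(t, g) + R(t, g - h)) / h**2   # ~ partial_gg R
print(abs(dg - Q(t, g)), abs(dt + 0.5 * dgg))            # both close to zero
```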

Lemma 4.5.

We have $R(0,0)=0$ and

R(t,g)\leq\sqrt{\frac{T}{2\pi}},\qquad\forall(t,g)\in[0,T)\times\mathbb{R}.
Proof 4.6.

The facts that $R(0,0)=0$ and that $R(t,g)\leq\sqrt{T/(2\pi)}$ for $g\leq 0$ and $t\in[0,T)$ are easily verifiable (for $g\leq 0$ the first two terms of (6) are non-positive). For the bound on $R$ for $g>0$, note first that for any $z>0$ we have

\operatorname*{erfc}(z)=\frac{2}{\sqrt{\pi}}\int_{z}^{\infty}e^{-x^{2}}\mathop{}\!\mathrm{d}x=\frac{2}{\sqrt{\pi}}\int_{z}^{\infty}\frac{2x}{2x}e^{-x^{2}}\mathop{}\!\mathrm{d}x\leq\frac{1}{z\sqrt{\pi}}\int_{z}^{\infty}2xe^{-x^{2}}\mathop{}\!\mathrm{d}x=\frac{e^{-z^{2}}}{z\sqrt{\pi}}.

Therefore, for all $(t,g)\in[0,T)\times\mathbb{R}_{>0}$ we have

\frac{g}{2}\operatorname*{erfc}\left(\frac{g}{\sqrt{2(T-t)}}\right)\leq\frac{g}{2}\cdot\frac{\sqrt{2(T-t)}\exp\left(\frac{-g^{2}}{2(T-t)}\right)}{g\sqrt{\pi}}=\sqrt{\frac{T-t}{2\pi}}\exp\left(\frac{-g^{2}}{2(T-t)}\right).

Applying the above to (6) yields the desired bound.

Combining these results we get the desired bound on the continuous regret of $Q$, which we summarize in the following theorem.

Theorem 4.7.

We have $\operatorname{ContRegret}(T,Q)\leq\sqrt{T/(2\pi)}$ almost surely.

5 From Continuous to Discrete Time

In the continuous-time algorithm, $R(t,g)$ is the continuous regret at time $t$ with gap $g$ of the strategy that places probability mass on the lagging expert according to $Q(t,g)=\partial_{g}R(t,g)$. (We have never formally defined lagging and leading experts in continuous time, and we do not intend to do so; here we are extrapolating, for the sake of intuition, the view given by Proposition 2.1 of regret as a stochastic integral of the probability put on the lagging expert with respect to the gaps.) At the same time, for Cover’s algorithm we have $V^{*}[t,g]$ as an upper bound on the regret when the mass on the lagging expert is given by $p^{*}(t,g)$. Furthermore, similar to the relation between $Q$ and $R$, we can write $p^{*}$ as a function of $V^{*}$ (details can be found in Appendix A): on round $t$, when the gap at round $t-1$ is $g$, the probability mass placed on the lagging expert in Cover’s algorithm is (for $g=0$ this does not follow directly, but our goal at the moment is only to build intuition)

p^{*}(t,g)=\frac{V^{*}[t,g-1]-V^{*}[t,g+1]}{2}\approx\partial_{g}V^{*}[t,g].

That is, $p^{*}$ is a sort of discrete derivative of $V^{*}$ with respect to its second argument. From this analogy, one might expect that a discrete derivative of $R$ with respect to its second argument yields a good strategy for the player in the original experts’ problem. As we shall see, this is exactly the case. Additionally, computing the discrete derivative of $R$ amounts to a couple of evaluations of the complementary error function, which we can assume to be computable (up to machine precision) in constant time.

In this section we shall describe the discretized algorithm and give an upper bound on its regret against restricted binary adversaries, that is, adversaries that choose costs in $\{(0,1)^{\mathsf{T}},(1,0)^{\mathsf{T}}\}$. Luckily, unlike Cover’s algorithm, the strategy we shall see in this section smoothly extends to general costs in $[0,1]$ while preserving its performance guarantees. Since this extension amounts to concavity arguments, we defer its details to Appendix E.

5.1 Discrete Itô’s Formula

In Section 4, the main tool to relate the continuous regret to the function $R$ was Itô’s formula. Similarly, one of the main tools for the analysis of the discretized continuous-time algorithm will be a discrete version of Itô’s formula. In order to state such a formula and to describe the algorithm, some standard notation for discrete derivatives will be useful. Namely, for any function $f\colon\mathbb{R}^{2}\to\mathbb{R}$ and any $t,g\in\mathbb{R}$, define

f_{g}(t,g)\coloneqq\frac{f(t,g+1)-f(t,g-1)}{2},
f_{t}(t,g)\coloneqq f(t,g)-f(t-1,g),
f_{gg}(t,g)\coloneqq f(t,g+1)+f(t,g-1)-2f(t,g).

We are now in place to state a discrete analogue of Itô’s formula. One important assumption of the next theorem is that $g_{0},\dotsc,g_{T}\in\mathbb{R}$ are such that successive values have absolute difference equal to 1. In the case where $g_{0},\dotsc,g_{T}$ are gaps in a 2-experts problem, this means that the adversary needs to be a restricted binary adversary. The version of the next theorem as stated — including the dependence on $t$ — can be found in Harvey et al. (2020b, Lemma 3.7). Yet, this theorem is a slight generalization of earlier results such as the ones due to Fujita (2008, Section 2) and Kudzhma (1982, Theorem 2).

Theorem 5.1 (Discrete Itô’s Formula).

Let $g_{0},g_{1},\dotsc,g_{T}\in\mathbb{R}$ be such that $\lvert g_{t}-g_{t-1}\rvert=1$ for every $t\in[T]$ and let $f\colon\mathbb{R}^{2}\to\mathbb{R}$. Then,

f(T,g_{T})-f(0,g_{0})=\sum_{t=1}^{T}f_{g}(t,g_{t-1})(g_{t}-g_{t-1})+\sum_{t=1}^{T}\big{(}\tfrac{1}{2}f_{gg}(t,g_{t-1})+f_{t}(t,g_{t-1})\big{)}.
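The identity can be checked directly; the sketch below (ours, with an arbitrary test function) draws a random $\pm 1$ path and verifies that both sides agree.

```python
import random

def fg(f, t, g):  return (f(t, g + 1) - f(t, g - 1)) / 2
def ft(f, t, g):  return f(t, g) - f(t - 1, g)
def fgg(f, t, g): return f(t, g + 1) + f(t, g - 1) - 2 * f(t, g)

def discrete_ito_gap(f, T, seed=0):
    """Absolute difference between the two sides of discrete Ito's formula."""
    rng = random.Random(seed)
    g = [0]
    for _ in range(T):
        g.append(g[-1] + rng.choice((-1, 1)))      # |g_t - g_{t-1}| = 1
    lhs = f(T, g[T]) - f(0, g[0])
    rhs = sum(fg(f, t, g[t - 1]) * (g[t] - g[t - 1])
              + 0.5 * fgg(f, t, g[t - 1]) + ft(f, t, g[t - 1])
              for t in range(1, T + 1))
    return abs(lhs - rhs)

print(discrete_ito_gap(lambda t, g: 0.5 * (g * g - t) + t * g, 500))  # ~ 0
```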

The first summation on the right-hand side of discrete Itô’s formula can be seen as a discrete stochastic integral when $(g_{t})_{t=0}^{T}$ is a random sequence. Remarkably, this term is extremely similar to the regret formula from Proposition 2.1. Thus, if we were to use discrete Itô’s formula to bound the regret, it would be desirable for the second term to (approximately) satisfy an analogue of (BHE). In fact, the potential $V^{*}$ from Cover’s algorithm satisfies the discrete BHE (with some care needed when the gap is zero, see Appendix A.3). Furthermore, the connection to the BHE seems to extend to other problems in online learning: in recent work, Zhang et al. (2022) showed how coin-betting with potentials that satisfy the BHE yields optimal algorithms for unconstrained online learning.

Since $R$ satisfies (BHE), one might hope that $R$ would also satisfy such a discrete backwards-heat inequality, yielding an upper bound on the regret of the strategy given by $R_{g}$. In the work of Harvey et al. (2020b) in the anytime setting, it was the case that the terms in the second sum were non-negative, which in a sense means that the discretized algorithm suffers negative discretization error. In the fixed-time setting we are not as lucky.

5.2 Discretizing the Algorithm

Based on the discussion at the beginning of this section, a natural way to discretize the algorithm from Section 4 is to define the function $q\colon[T]\times\{0,\dotsc,T-1\}\to\mathbb{R}$ by

q(t,g)\coloneqq\begin{cases}R_{g}(t,g)&\text{if}\ t<T,\\ [{g=0}]\frac{1}{2}&\text{if}\ t=T,\end{cases}\qquad\forall t\in[T],\forall g\in\{0,\dotsc,T-1\},

where we need to treat the case at the very last step differently since $R$ is not defined on $\{T\}\times\mathbb{R}$. It is not clear from its definition, but we indeed have $q(t,0)=1/2$ for all $t\in[T]$. We defer the (relatively technical) proof of the next result to Appendix D.
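A minimal sketch (ours) of the resulting player: $q(t,g)$ is a discrete derivative of the closed form (6), so each round costs a constant number of erfc and exp evaluations; the horizon below is arbitrary.

```python
from math import erfc, exp, sqrt, pi

def make_player(T):
    """Return q(t, g): the probability mass the discretized strategy A_q
    places on the lagging expert on round t when the gap after round t-1 is g."""
    def R(t, g):                           # closed form (6)
        s = T - t
        return (g / 2) * erfc(g / sqrt(2 * s)) \
            - sqrt(s / (2 * pi)) * exp(-g * g / (2 * s)) + sqrt(T / (2 * pi))

    def q(t, g):
        if t == T:                         # R is not defined at time T
            return 0.5 if g == 0 else 0.0
        return (R(t, g + 1) - R(t, g - 1)) / 2   # discrete derivative R_g(t, g)
    return q

q = make_player(1000)
print(q(1, 0), q(500, 0), q(500, 10))      # q(t, 0) = 1/2, as in Lemma 5.2
```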

Lemma 5.2.

We have $q(t,0)=1/2$ for all $t\in[T]$.

Our goal now is to combine Proposition 2.1 and discrete Itô’s formula to bound the regret of $\mathcal{A}_{q}$. Since $R$ satisfies (BHE), one might hope that $R$ is close to satisfying the discrete version of this equation. To formalize this idea, for all $t\in(-\infty,T)$ and $g\in\mathbb{R}$ define

r_{gg}(t,g)\coloneqq\partial_{gg}R(t,g)-R_{gg}(t,g)\qquad\text{and}\qquad r_{t}(t,g)\coloneqq\partial_{t}R(t,g)-R_{t}(t,g).

The above terms measure how well the first derivative with respect to the first variable and the second derivative with respect to the second variable are each approximated by their discrete analogues. That is, these are basically the discretization errors on the derivatives of $R$. Then, combining the fact that $R$ satisfies (BHE) together with Proposition 2.1 yields the following theorem.

Theorem 5.3.

Consider a game of $\mathcal{A}_{q}$ against a restricted binary adversary with gap sequence given by $g_{0},g_{1},g_{2},\dotsc,g_{T}\in\{0,\dotsc,T\}$ such that $g_{0}=0$ and $\lvert g_{t}-g_{t-1}\rvert=1$ for all $t\in[T]$. Then,

\operatorname{Regret}(T)\leq\sqrt{\frac{T}{2\pi}}+\frac{1}{2}+\frac{1}{2}\sum_{t=1}^{T-1}r_{gg}(t,g_{t-1})+\sum_{t=1}^{T-1}r_{t}(t,g_{t-1}). (7)
Proof 5.4.

Lemma 5.2 and Proposition 2.1 yield

\operatorname{Regret}(T)=\sum_{t=1}^{T}q(t,g_{t-1})(g_{t}-g_{t-1})\leq\sum_{t=1}^{T-1}q(t,g_{t-1})(g_{t}-g_{t-1})+\frac{1}{2}, (8)

where in the last inequality we used that $q(T,g_{T-1})\leq 1/2$. Furthermore, by discrete Itô’s formula (Theorem 5.1), we have

R(T-1,g_{T-1})-R(0,g_{0})=\sum_{t=1}^{T-1}R_{g}(t,g_{t-1})(g_{t}-g_{t-1})+\sum_{t=1}^{T-1}\big{(}\tfrac{1}{2}R_{gg}(t,g_{t-1})+R_{t}(t,g_{t-1})\big{)}
\stackrel{\text{(BHE)}}{=}\sum_{t=1}^{T-1}q(t,g_{t-1})(g_{t}-g_{t-1})-\sum_{t=1}^{T-1}\big{(}\tfrac{1}{2}r_{gg}(t,g_{t-1})+r_{t}(t,g_{t-1})\big{)}
\stackrel{\text{(8)}}{\geq}\operatorname{Regret}(T)-\frac{1}{2}-\sum_{t=1}^{T-1}\big{(}\tfrac{1}{2}r_{gg}(t,g_{t-1})+r_{t}(t,g_{t-1})\big{)}.

Rearranging and using the facts given by Lemma 4.5 that $R(0,0)=0$ and that $R(T-1,g_{T-1})\leq\sqrt{T/(2\pi)}$ yields the desired bound on the regret.

5.3 Bounding the Discretization Error

In light of Theorem 5.3, it suffices to bound the accumulated discretization error of the derivatives to obtain potentially good bounds on the regret of $\mathcal{A}_{q}$. The next lemma shows that both $r_{t}(t,g)$ and $r_{gg}(t,g)$ are in $O((T-t)^{-3/2})$. Since

\sum_{t=1}^{T-1}(T-t)^{-3/2}\leq\int_{t=0}^{T-1}(T-t)^{-3/2}\mathop{}\!\mathrm{d}t=2\left(1-\frac{1}{\sqrt{T}}\right)\leq 2, (9)

this will show that $\mathcal{A}_{q}$ suffers at most $\sqrt{T/2\pi}+O(1)$ regret (together with Proposition 2.1, this also shows that the difference between the regret of $\mathcal{A}_{q}$ and that of $\mathcal{A}_{Q}$ is in $O(1)$). Since the proofs of these bounds are relatively technical but otherwise not considerably insightful, we defer them to Appendix D.

Lemma 5.5.

For any $t\in(-\infty,T)$ and $g\in\mathbb{R}$ we have

r_{t}(t,g)\leq\frac{\sqrt{2}}{8\sqrt{\pi}}\cdot\frac{1}{(T-t)^{3/2}}\qquad\text{and}\qquad r_{gg}(t,g)\leq\frac{2\sqrt{2}}{3\sqrt{\pi}}\cdot\frac{1}{(T-t)^{3/2}}.

Combining the above lemma together with Theorem 5.3 yields the following regret bound.

Theorem 5.6.

Define $q(t,g)\coloneqq R_{g}(t,g)$ for all $(t,g)\in\{0,\dotsc,T-1\}^{2}$, and consider a game of $\mathcal{A}_{q}$ against a restricted binary adversary. Then,

\operatorname{Regret}(T)\leq\sqrt{\frac{T}{2\pi}}+1.24.
Proof 5.7.

Let $g_{1},g_{2},\dotsc,g_{T}\in\{0,\dotsc,T\}$ be the gap sequence and set $g_{0}\coloneqq 0$. We have

\operatorname{Regret}(T)\leq\sqrt{\frac{T}{2\pi}}+\frac{1}{2}+\frac{1}{2}\sum_{t=1}^{T-1}r_{gg}(t,g_{t-1})+\sum_{t=1}^{T-1}r_{t}(t,g_{t-1}) \qquad\text{(by Theorem 5.3),}
\leq\sqrt{\frac{T}{2\pi}}+\frac{1}{2}+\left(\frac{\sqrt{2}}{8\sqrt{\pi}}+\frac{\sqrt{2}}{3\sqrt{\pi}}\right)\sum_{t=1}^{T-1}\frac{1}{(T-t)^{3/2}} \qquad\text{(by Lemma 5.5),}
\leq\sqrt{\frac{T}{2\pi}}+\frac{1}{2}+\left(\frac{\sqrt{2}}{4\sqrt{\pi}}+\frac{2\sqrt{2}}{3\sqrt{\pi}}\right)\leq\sqrt{\frac{T}{2\pi}}+1.24 \qquad\text{(by (9)).}

6 On Optimal Regret for More than Two Experts

In this paper we presented an efficient and optimal algorithm for two experts in the fixed-time setting. A natural question is whether similar techniques can be used to find the minimax regret when we have more than two experts. Encouragingly, techniques from stochastic calculus were also used to find the optimal regret for 4 experts (Bayraktar et al., 2020). Yet, it is not clear how to use similar techniques for cases with an arbitrary number of experts. The approach used in this paper and by Harvey et al. (2020b) heavily relies on the gap parameterization of the problem. Although there is an analogous parameterization of the $n$ experts’ problem into $n-1$ gaps that yields a claim similar to Proposition 2.1, it is not clear what would be an analogous continuous-time problem to guide us in the algorithm design process since the gap processes are not independent—even with independent costs on the experts. Moreover, many of the approaches in related work (Abbasi-Yadkori et al., 2017; Harvey et al., 2020b; Bayraktar et al., 2020) focus on specific adversaries such as the comb adversary. However, the latter does not seem to be a worst-case adversary for cases such as five experts (Chase, 2019). We are not aware of adversaries that could yield worst-case regret for an arbitrary fixed number of experts, although asymptotically in $n$ and $T$ it is well-known that assigning $\{0,1\}$ costs at random is minimax optimal (Cesa-Bianchi and Lugosi, 2006).

Acknowledgments

We would like to thank the anonymous ICML 2022 reviewers for their insightful comments. In particular, reviewer 1 suggested the use of Berry–Esseen-like results to derive $O(\sqrt{T})$ regret, noted the $O(1)$ regret difference between $\mathcal{A}_{q}$ and $\mathcal{A}_{Q}$, and found a calculation mistake.

N. Harvey was supported by an NSERC Discovery Grant.

References

  • Abbasi-Yadkori et al. (2017) Yasin Abbasi-Yadkori, Peter L. Bartlett, and Victor Gabillon. Near minimax optimal players for the finite-time 3-expert prediction problem. In Annual Conference on Neural Information Processing Systems (NIPS), pages 3033–3042, 2017.
  • Arora et al. (2012) Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8:121–164, 2012.
  • Bayraktar et al. (2020) Erhan Bayraktar, Ibrahim Ekren, and Xin Zhang. Finite-time 4-expert prediction problem. Communications in Partial Differential Equations, 45(7):714–757, 2020.
  • Cesa-Bianchi and Lugosi (2006) Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006. ISBN 978-0-521-84108-5.
  • Cesa-Bianchi et al. (1997) Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.
  • Chase (2019) Zachary Chase. Experimental evidence for asymptotic non-optimality of comb adversary strategy. December 2019. URL http://arxiv.org/abs/1912.01548.
  • Cover (1967) Thomas M. Cover. Behavior of sequential predictors of binary sequences. In Trans. Fourth Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes (Prague, 1965), pages 263–272. Academia, Prague, 1967.
  • Durrett (2019) Rick Durrett. Probability—theory and examples, volume 49 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2019. Fifth edition.
  • Freund (2009) Yoav Freund. A method for hedging in continuous time. October 2009. URL http://arxiv.org/abs/0904.3356.
  • Fujita (2008) Takahiko Fujita. A random walk analogue of Lévy’s theorem. Studia Sci. Math. Hungar., 45(2):223–233, 2008.
  • Gravin et al. (2016) Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Towards optimal algorithms for prediction with expert advice. In Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 528–547. ACM, New York, 2016.
  • Gravin et al. (2017) Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Tight lower bounds for multiplicative weights algorithmic families. In 44th International Colloquium on Automata, Languages, and Programming (ICALP), volume 80, Art. No. 48, 14 pages, 2017.
  • Harvey et al. (2020a) Nicholas J. A. Harvey, Christopher Liaw, Edwin Perkins, and Sikander Randhawa. Optimal anytime regret with two experts. February 2020a. URL https://arxiv.org/abs/2002.08994v2.
  • Harvey et al. (2020b) Nicholas J. A. Harvey, Christopher Liaw, Edwin A. Perkins, and Sikander Randhawa. Optimal anytime regret for two experts. In 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 1404–1415. IEEE, 2020b.
  • Karlin and Peres (2017) Anna R. Karlin and Yuval Peres. Game theory, alive. American Mathematical Society, Providence, RI, 2017.
  • Kobzar et al. (2020) Vladimir A. Kobzar, Robert V. Kohn, and Zhilei Wang. New potential-based bounds for prediction with expert advice. In Conference on Learning Theory, (COLT), volume 125 of Proceedings of Machine Learning Research, pages 2370–2405. PMLR, 2020.
  • Kudzhma (1982) R. Kudzhma. Itô’s formula for a random walk. Litovsk. Mat. Sb., 22(3):122–127, 1982.
  • Mörters and Peres (2010) Peter Mörters and Yuval Peres. Brownian motion, volume 30 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2010.
  • Olver et al. (2010) Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert, and Charles W. Clark, editors. NIST handbook of mathematical functions. U.S. Department of Commerce, National Institute of Standards and Technology, Washington, DC; Cambridge University Press, Cambridge, 2010.
  • Revuz and Yor (1999) Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion, volume 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third edition, 1999.
  • Robbins (1955) Herbert Robbins. A remark on Stirling’s formula. Amer. Math. Monthly, 62:26–29, 1955.
  • Zhang et al. (2022) Zhiyu Zhang, Ashok Cutkosky, and Ioannis Paschalidis. PDE-based optimal strategy for unconstrained online learning. January 2022. URL https://arxiv.org/abs/2201.07877.

Appendix A Cover’s Algorithm for Two Experts

In this section, we shall review the optimal algorithm for the 2-experts problem originally proposed by Cover (1967) and the matching lower-bound.

A.1 A Dynamic Programming View

In the fixed-time setting we know the total number of rounds before the start of the game. Thus, we may compute ahead of time all the possible states the game can be in on each round and decide the probabilities on the experts the player should choose in each case to minimize the worst-possible regret. More specifically, we start with a function $p\colon[T]\times\{0,\dotsc,T-1\}\to[0,1]$ that represents our player strategy: for any $t\in[T]$, if at the end of round $t-1$ the experts’ gap is $g\in\{0,\dotsc,T-1\}$, on round $t$ the player places $p(t,g)$ probability mass on the lagging expert and $1-p(t,g)$ probability on the leading expert (when the gap is 0, which means that both experts have the same cumulative loss, we break ties arbitrarily; for the optimal algorithm we shall ultimately derive this will not matter since we will have $p(t,0)=1/2$ for all $t\in[T]$), and we denote by $\mathcal{A}_{p}$ the player strategy defined by $p$. Now, for all $t\in\{0,\dotsc,T\}$ and $g\in\{0,\dotsc,T\}$, denote by $V_{p}[t,g]$ the maximum regret-to-be-suffered by the player strategy defined by $p$ on rounds $t+1,\dotsc,T$ given that at the end of round $t$ the gap between experts is $g$. Slightly more formally, we have

V_{p}[t,g]\coloneqq\sup\{\,\operatorname{Regret}(T,\mathcal{A}_{p},\mathcal{B}_{\ell})-\operatorname{Regret}(t,\mathcal{A}_{p},\mathcal{B}_{\ell})\colon\ell\in\mathcal{L}^{T}\ \text{such that}\ \lvert L_{t}(1)-L_{t}(2)\rvert=g\,\}, (10)

where $\mathcal{L}\coloneqq\{(1,0)^{\mathsf{T}},(0,1)^{\mathsf{T}}\}$. Above we take the supremum instead of the maximum only to account for cases where the set we are considering is empty (and, thus, the supremum evaluates to $-\infty$), such as when $t$ and $g$ have distinct parities or when $g>t$. Note that by the definition of $V_{p}$ we have

V_{p}[0,0]=\max_{\ell\in\mathcal{L}^{T}}\operatorname{Regret}(T,\mathcal{A}_{p},\mathcal{B}_{\ell}).

Thus, if we compute $V_{p}[0,0]$, then we have a bound on the worst-case regret of $\mathcal{A}_{p}$ against restricted binary adversaries. The following theorem shows how we can compute this value in a dynamic programming style.

Theorem A.1.

For any $p\colon[T]\times\{0,\dotsc,T-1\}\to[0,1]$, and for all $t,g\in\{0,\dotsc,T\}$ such that $V_{p}[t,g]\neq-\infty$, we have

V_{p}[t,g]=0 \qquad\text{if}\ t=T, (11)
V_{p}[t,g]=\max\begin{cases}V_{p}[t+1,g+1]+p(t+1,g)\\ V_{p}[t+1,g-1]-p(t+1,g)\end{cases} \qquad\text{if}\ t<T\ \text{and}\ g>0, (12)
V_{p}[t,g]=V_{p}[t+1,1]+\max\{p(t+1,0),1-p(t+1,0)\} \qquad\text{if}\ t<T\ \text{and}\ g=0. (13)
Proof A.2.

First, note that (11) clearly holds by the definition of $V_{p}$. To show that equations (12) and (13) hold, let $t,g\in\{0,\dotsc,T\}$ be such that $V_{p}[t,g]\neq-\infty$. Let $\ell\in\mathcal{L}^{T}$ be a sequence of cost vectors such that $g_{t}\coloneqq\lvert L_{t}(1)-L_{t}(2)\rvert=g$. First, suppose $t<T$ and $g>0$. Then there are two cases for $\ell$: either the gap $g_{t+1}\coloneqq\lvert L_{t+1}(1)-L_{t+1}(2)\rvert$ goes up and $g_{t+1}=g_{t}+1$, or it goes down and $g_{t+1}=g_{t}-1$. This together with Proposition 2.1 and the fact that $g_{t}=g>0$ implies

\operatorname{Regret}(T)-\operatorname{Regret}(t)=\operatorname{Regret}(T)-\operatorname{Regret}(t+1)+p(t+1,g_{t})(g_{t+1}-g_{t})
=\begin{cases}\operatorname{Regret}(T)-\operatorname{Regret}(t+1)+p(t+1,g),&\text{if}\ g_{t+1}=g+1,\\ \operatorname{Regret}(T)-\operatorname{Regret}(t+1)-p(t+1,g),&\text{if}\ g_{t+1}=g-1.\end{cases}

By taking the maximum over all possible cost vectors with gap $g$ at round $t$ we obtain (12). Now suppose $t<T$ and $g=0$. In this case, suppose without loss of generality that 1 is the expert to whom $\mathcal{A}_{p}$ assigns mass $p(t+1,0)$ (recall that the strategy $\mathcal{A}_{p}$ breaks ties arbitrarily when the gap is 0). Proposition 2.1 together with the fact that $\ell_{t+1}\in\mathcal{L}=\{(1,0)^{\mathsf{T}},(0,1)^{\mathsf{T}}\}$ yields

\operatorname{Regret}(T)-\operatorname{Regret}(t)=\operatorname{Regret}(T)-\operatorname{Regret}(t+1)+\ell_{t+1}^{\mathsf{T}}x_{t+1}
=\begin{cases}\operatorname{Regret}(T)-\operatorname{Regret}(t+1)+p(t+1,0)&\text{if}\ \ell_{t+1}(1)=1,\\ \operatorname{Regret}(T)-\operatorname{Regret}(t+1)+1-p(t+1,0)&\text{if}\ \ell_{t+1}(1)=0.\end{cases}

Since the gap on round $t+1$ is certainly 1 in this case, taking the maximum over all the adversaries with gap 0 on round $t$ yields (13).

For the sake of convenience, we redefine $V_{p}[t,g]$ for all $t,g$ such that $V_{p}[t,g]=-\infty$ to, instead, be the value given by the equations from the above theorem. (There will be places where this definition requires access to undefined or “out-of-bounds” entries, such as entries with gap $T$ and time $t<T$. In such cases, we set these undefined/out-of-bounds values to 0. This does not affect any of our results and makes it less cumbersome to design and analyze the algorithm.)
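A small sketch (ours) of the dynamic program of Theorem A.1 for a given $p$; out-of-range entries are treated as 0, as discussed above, and the naive $p\equiv 1/2$ strategy is used only as an example (its worst-case regret is $T/2$).

```python
def worst_case_regret(T, p):
    """V_p[0][0]: worst-case regret of A_p against restricted binary adversaries,
    computed backwards in time via the recurrences (11)-(13)."""
    V = [[0.0] * (T + 2) for _ in range(T + 1)]      # V[t][g]; extra column as 0 padding
    for t in range(T - 1, -1, -1):
        for g in range(T, 0, -1):
            V[t][g] = max(V[t + 1][g + 1] + p(t + 1, g),
                          V[t + 1][g - 1] - p(t + 1, g))
        V[t][0] = V[t + 1][1] + max(p(t + 1, 0), 1.0 - p(t + 1, 0))
    return V[0][0]

print(worst_case_regret(100, lambda t, g: 0.5))      # 50.0 for the naive strategy
```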

A.2 Picking Optimal Probabilities

We are interested in a function $p^{*}\colon[T]\times\{0,\dotsc,T-1\}\to[0,1]$, if any, that minimizes $V_{p}[0,0]$. To see that there is indeed such a function, note that we can formulate the problem of minimizing $V_{p}[0,0]$ as a linear program using Theorem A.1 to design the constraints. Such a linear program is certainly bounded (the regret is always between 0 and $T$) and feasible. Thus, let $p^{*}\colon[T]\times\{0,\dotsc,T-1\}\to[0,1]$ be a function that attains $\min_{p}V_{p}[0,0]$ and define $V^{*}\coloneqq V_{p^{*}}$. The next theorem shows that $V^{*}$ can be computed recursively and shows how to obtain $p^{*}$ from $V^{*}$.

Theorem A.3.

For each t,g{0,,T}t,g\in\{0,\dotsc,T\}

V[t,g]\displaystyle V^{*}[t,g] =0\displaystyle=0 ift=Torg = T,\displaystyle\qquad\text{if}\leavevmode\nobreak\ t=T\leavevmode\nobreak\ \text{or}\leavevmode\nobreak\ \text{g = T},
V[t,g]\displaystyle V^{*}[t,g] =12(V[t+1,g+1]+V[t+1,g1])\displaystyle=\frac{1}{2}\left(V^{*}[t+1,g+1]+V^{*}[t+1,g-1]\right) ift<Tand 0<g<T,\displaystyle\text{if}\leavevmode\nobreak\ t<T\leavevmode\nobreak\ \text{and}\leavevmode\nobreak\ 0<g<T,
V[t,0]\displaystyle V^{*}[t,0] =V[t+1,1]+12\displaystyle=V^{*}[t+1,1]+\frac{1}{2} ift<T.\displaystyle\text{if}\leavevmode\nobreak\ t<T.

Furthermore, if we define p:{0,,T}2[0,1]p^{*}\colon\{0,\dotsc,T\}^{2}\to[0,1] by

p(t,g){12(Vp[t,g1]Vp[t,g+1]),ifg>0,12ifg=0,t[T],g{0,,T1},p^{*}(t,g)\coloneqq\begin{cases}\frac{1}{2}(V_{p^{*}}[t,g-1]-V_{p^{*}}[t,g+1]),&\text{if}\leavevmode\nobreak\ g>0,\\ \frac{1}{2}&\text{if}\leavevmode\nobreak\ {g=0},\end{cases}\qquad\forall t\in[T],\forall g\in\{0,\dotsc,T-1\},

then Vp=VV_{p^{*}}=V^{*}.

Proof A.4.

Let us show that p^{*} as defined in the statement of the theorem attains \inf_{p}V_{p}[0,0], where the infimum ranges over all functions from [T]\times\{0,\dotsc,T-1\} to [0,1]. Note that smaller values of any entry of V_{p^{*}} can only make V_{p^{*}}[0,0] smaller. Thus, showing that p^{*}(t+1,g) minimizes V_{p^{*}}[t,g] for all t,g\in\{0,\dotsc,T-1\} (given that V_{p^{*}}[t^{\prime},g^{\prime}] is fixed for t^{\prime}\geq t+1 and g^{\prime}\in\{0,\dotsc,T\}) suffices to show that p^{*} minimizes999One may fear that choosing p^{*}(t+1,g) to minimize V_{p^{*}}[t,g] might increase other entries, making this argument invalid. However, note that V_{p^{*}}[t,g] depends only on p^{*}(t+1,g) and entries V_{p^{*}}[t^{\prime},g^{\prime}] with t^{\prime}>t. Thus, by proceeding from higher to smaller values of t\in\{0,\dotsc,T\}, we can in fact pick p^{*}(t+1,g) to minimize V_{p^{*}}[t,g]. V_{p^{*}}[0,0]. Moreover, by Theorem A.1 and the definition of p^{*} we have that V_{p^{*}} obeys the formulas in the statement of this theorem. Thus, we only need to show that this choice of p^{*} indeed minimizes the entries of V_{p^{*}}.

Let us first show that

Vp[t,g1]Vp[t,g+1][0,1]for allg{1,,T1}andt{0,,T}.V_{p^{*}}[t,g-1]-V_{p^{*}}[t,g+1]\in[0,1]\leavevmode\nobreak\ \text{for all}\leavevmode\nobreak\ g\in\{1,\dotsc,T-1\}\leavevmode\nobreak\ \text{and}\leavevmode\nobreak\ t\in\{0,\dotsc,T\}. (14)

We prove (14) by backward induction on t. For t=T, since V_{p^{*}}[T,\cdot]\equiv 0, the above claim clearly holds. Now let t\in\{0,\dotsc,T-1\} and g\in\{1,\dotsc,T-1\}, and suppose (14) holds with t replaced by t+1. If g=1 we have

Vp[t,g1]Vp[t,g+1]\displaystyle V_{p^{*}}[t,g-1]-V_{p^{*}}[t,g+1] =Vp[t+1,g]+1212(Vp[t+1,g]+Vp[t+1,g+2])\displaystyle=V_{p^{*}}[t+1,g]+\frac{1}{2}-\frac{1}{2}\left(V_{p^{*}}[t+1,g]+V_{p^{*}}[t+1,g+2]\right)
=12(Vp[t+1,g]Vp[t+1,g+2])+12.\displaystyle=\frac{1}{2}\left(V_{p^{*}}[t+1,g]-V_{p^{*}}[t+1,g+2]\right)+\frac{1}{2}.

The last term above, by the induction hypothesis, is in [0,1][0,1]. Similarly, if T1g2T-1\geq g\geq 2 we have

Vp[t,g1]Vp[t,g+1]\displaystyle V_{p^{*}}[t,g-1]-V_{p^{*}}[t,g+1]
=12(Vp[t+1,g]+Vp[t+1,g2])12(Vp[t+1,g]+Vp[t+1,g+2])\displaystyle=\frac{1}{2}\left(V_{p^{*}}[t+1,g]+V_{p^{*}}[t+1,g-2]\right)-\frac{1}{2}\left(V_{p^{*}}[t+1,g]+V_{p^{*}}[t+1,g+2]\right)
=12(Vp[t+1,g2]Vp[t+1,g])+12(Vp[t+1,g]Vp[t+1,g+2]),\displaystyle=\frac{1}{2}\left(V_{p^{*}}[t+1,g-2]-V_{p^{*}}[t+1,g]\right)+\frac{1}{2}\left(V_{p^{*}}[t+1,g]-V_{p^{*}}[t+1,g+2]\right),

and the last term is in [0,1][0,1] by the induction hypothesis. This completes the proof of (14).

Let t\in\{0,\dotsc,T-1\} and g\in\{0,\dotsc,T-1\}. Let us now show that p^{*}(t+1,g) minimizes V_{p^{*}}[t,g] given that entries of the form V_{p^{*}}[t^{\prime},g^{\prime}] for t^{\prime}\geq t+1 and g^{\prime}\in\{0,\dotsc,T\} are fixed. For g=0, Theorem A.1 shows that V_{p^{*}}[t,0]=V_{p^{*}}[t+1,1]+\max\{1-\alpha,\alpha\} for some \alpha\in[0,1], which is minimized when \alpha=1/2=p^{*}(t+1,0). For g>0, Theorem A.1 tells us that

Vp[t,g]=max{Vp[t+1,g+1]+p(t+1,g),Vp[t+1,g1]p(t+1,g)}.V_{p^{*}}[t,g]=\max\{V_{p^{*}}[t+1,g+1]+p^{*}(t+1,g),V_{p^{*}}[t+1,g-1]-p^{*}(t+1,g)\}.

Since p(t+1,g)0p^{*}(t+1,g)\geq 0 and V[t+1,g+1]V[t+1,g1]V^{*}[t+1,g+1]\leq V^{*}[t+1,g-1] by (14), p(t+1,g)p^{*}(t+1,g) certainly minimizes V[t,g]V^{*}[t,g] since it makes both terms in the maximum equal. Finally, (14) guarantees that101010In fact, it guarantees that p(t,g)1/2p^{*}(t,g)\leq 1/2. Intuitively this makes sense since we want to give more probability to the current best/leading expert than to the lagging expert. p(t+1,g)[0,1]p^{*}(t+1,g)\in[0,1].
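To make the recursion concrete, here is a minimal sketch (ours, not from the original; the array indexing and the out-of-bounds convention of footnote 8 are our own reading) that computes the table V^{*} and the probabilities p^{*} by the backward recursion of Theorem A.3. This is the O(T^{2})-time pre-processing of Cover's algorithm mentioned in the introduction.

import numpy as np

def cover_tables(T):
    # V[t, g] for t, g in {0, ..., T}; entries with t = T or g = T stay 0,
    # matching the boundary cases of Theorem A.3.
    V = np.zeros((T + 1, T + 1))
    p = np.zeros((T + 1, T + 1))   # p[t, g] is only meaningful for t >= 1 and g <= T - 1
    for t in range(T - 1, -1, -1):
        V[t, 0] = V[t + 1, 1] + 0.5
        for g in range(1, T):
            V[t, g] = 0.5 * (V[t + 1, g + 1] + V[t + 1, g - 1])
    for t in range(1, T + 1):
        p[t, 0] = 0.5
        for g in range(1, T):
            p[t, g] = 0.5 * (V[t, g - 1] - V[t, g + 1])
    return V, p

if __name__ == "__main__":
    T = 100
    V, p = cover_tables(T)
    print(V[0, 0], np.sqrt(T / (2 * np.pi)))   # optimal worst-case regret vs. sqrt(T / (2 pi))

For T = 100 the two printed quantities already agree to about two decimal places, in line with the bounds proved below.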

A.3 Discrete Backwards Heat Equation

Interestingly, the potential function V^{*} for Cover's optimal algorithm satisfies the discrete backwards heat equation when the gap is not zero. For simplicity, let us focus on the case with t\in[T] and gap g\in\{1,\dotsc,T-1\}. Then, taking V^{*}[t,T+1]=0 we have

V_{t}^{*}[t,g]=V^{*}[t,g]-V^{*}[t-1,g]
=V^{*}[t,g]-\frac{1}{2}\left(V^{*}[t,g+1]+V^{*}[t,g-1]\right) \qquad\text{(by Theorem A.3)}
=-\frac{1}{2}\left(-2V^{*}[t,g]+V^{*}[t,g+1]+V^{*}[t,g-1]\right)
=-\frac{1}{2}V_{gg}^{*}[t,g].

The same holds for the case where the gap is zero, but we need to extend V^{*} to gap -1. Namely, set V^{*}[t,-1]=V^{*}[t,1]+1 for t<T. This guarantees that p^{*}(t,0)=1/2=\frac{1}{2}(V^{*}[t,-1]-V^{*}[t,1]) and V^{*}[t,0]=\frac{1}{2}(V^{*}[t+1,1]+V^{*}[t+1,-1]), so the cases with zero gap agree with the formulas for non-zero gaps in Theorem A.3. Interestingly, one may verify that p^{*}(t,g) also satisfies the discrete BHE by setting p^{*}(t,-g)=1-p^{*}(t,g) for g\geq 0.
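As a quick numerical sanity check of this discussion (a sketch under our own indexing conventions, not from the original), one can rebuild the table of Theorem A.3 and verify V_{t}^{*}[t,g]=-\frac{1}{2}V_{gg}^{*}[t,g] on the interior of the table.

import numpy as np

T = 60
V = np.zeros((T + 1, T + 1))
for t in range(T - 1, -1, -1):           # backward recursion of Theorem A.3
    V[t, 0] = V[t + 1, 1] + 0.5
    for g in range(1, T):
        V[t, g] = 0.5 * (V[t + 1, g + 1] + V[t + 1, g - 1])

for t in range(1, T):                    # V_t uses V[t - 1, g], hence t >= 1
    for g in range(1, T - 1):            # non-zero gaps away from the boundary
        Vt = V[t, g] - V[t - 1, g]                        # backward difference in t
        Vgg = V[t, g + 1] + V[t, g - 1] - 2 * V[t, g]     # central second difference in g
        assert abs(Vt + 0.5 * Vgg) < 1e-12
print("discrete backwards heat equation holds on the interior of the table")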

A.4 Connecting the Regret with Random Walks

As argued before, to give an upper-bound on the regret of 𝒜p\mathcal{A}_{p^{*}}, where pp^{*} is as in Theorem A.3, we need only to bound the value of Vp[0,0]=V[0,0]V_{p^{*}}[0,0]=V^{*}[0,0]. Interestingly, the entries of VV^{*} have a strong connection to random walks, and this helps us give an upper-bound on the value of V[0,0]V^{*}[0,0]. In the next theorem and for the remainder of the text, a random walk (of length tt\in\mathbb{N} starting at g0g\in\mathbb{Z}_{\geq 0}) is a sequence of random variables (Si)i=0t(S_{i})_{i=0}^{t} where Sig+X1++XiS_{i}\coloneqq g+X_{1}+\dotsm+X_{i} for each i{0,,t}i\in\{0,\dotsc,t\} and {Xj}j[t]\{X_{j}\}_{j\in[t]} are i.i.d. random variables taking values in {±1}\{\pm 1\}. If we do not specify a starting point of a random walk, take it to be 0. We say that StS_{t} is symmetric if 𝐏(X1=1)=𝐏(X1=1)=1/2\bm{\mathrm{P}}(X_{1}=1)=\bm{\mathrm{P}}(X_{1}=-1)=1/2. Moreover, a reflected random walk (of length tt) is the sequence of random variables (|Si|)i{0,,t}(\lvert S_{i}\rvert)_{i\in\{0,\dotsc,t\}} where (Si)i=0t(S_{i})_{i=0}^{t} is a random walk. Finally, we say that a random walk (Si)i=0t(S_{i})_{i=0}^{t} passes through gg\in\mathbb{N} if the event {Si=g}\{S_{i}=g\} happens for some i{0,,t}i\in\{0,\dotsc,t\}.

The following lemma gives numeric bounds on the expected number of passages through 0 of a symmetric random walk. Its proof boils down to careful applications of Stirling’s formula and can be found in Appendix B.

Lemma A.5.

Let the random variable ZT(0)Z_{T}(0) be the number of passages through 0 of a reflected symmetric random walk of length TT. Then,

\sqrt{\frac{2T}{\pi}}-\frac{2}{5}\leq\bm{\mathrm{E}}[Z_{T}(0)]\leq 1+\sqrt{\frac{2T}{\pi}}.

We are now in position to prove an upper-bound on the performance of 𝒜p\mathcal{A}_{p^{*}}.

Theorem A.6.

For every r,g\in\mathbb{N}, let the random variable Z_{r}(g) be the number of passages through 0 of a reflected symmetric random walk of length r starting at position g. Then V^{*}[t,g]=\tfrac{1}{2}\bm{\mathrm{E}}[Z_{T-t-1}(g)] for every t,g\in\{0,\dotsc,T\}. In particular,

V^{*}[0,0]=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-1}(0)]\leq\sqrt{\frac{T}{2\pi}}+\frac{1}{2}.
Proof A.7.

Let us show that V^{*}[t,g]=(1/2)\bm{\mathrm{E}}[Z_{T-t-1}(g)] for all t,g\in\{0,\dotsc,T\} by induction on T-t. For t=T we have Z_{T-t-1}(g)=0 (by convention, a walk of negative length makes no passages through 0). Assume t=T-1 and let g\in\{0,\dotsc,T\}. If g>0 we have (1/2)\bm{\mathrm{E}}[Z_{0}(g)]=0=V^{*}[T-1,g]. If g=0, we have (1/2)\bm{\mathrm{E}}[Z_{0}(0)]=1/2=V^{*}[T-1,0]. Suppose now t<T-1. If g=T, then we have V^{*}[t,g]=0=(1/2)\bm{\mathrm{E}}[Z_{T-t-1}(g)], since a walk of length T-t-1<T starting at T never reaches 0. Now let us look at the case T>g>0. By Theorem A.3 and by the induction hypothesis, we have

V^{*}[t,g]=\frac{1}{2}\left(V^{*}[t+1,g+1]+V^{*}[t+1,g-1]\right)
=\frac{1}{2}\left(\frac{1}{2}\bm{\mathrm{E}}[Z_{T-t-2}(g+1)]+\frac{1}{2}\bm{\mathrm{E}}[Z_{T-t-2}(g-1)]\right)
=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-t-1}(g)],

where the last equality follows by conditioning on the first step of a walk of length T-t-1 started at g: with probability 1/2 the remaining steps form a walk of length T-t-2 started at g+1, with probability 1/2 one started at g-1, and since g>0 the starting point contributes no passage through 0.

Similarly, for the case when g=0g=0 we have

V[t,0]=V[t+1,1]+12=12𝐄[ZTt2(1)+1]=12𝐄[ZTt1(0)].V^{*}[t,0]=V^{*}[t+1,1]+\frac{1}{2}=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-t-2}(1)+1]=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-t-1}(0)].

In particular, we have V^{*}[0,0]=(1/2)\bm{\mathrm{E}}[Z_{T-1}(0)], and Lemma A.5 yields V^{*}[0,0]\leq\frac{1}{2}\left(1+\sqrt{2(T-1)/\pi}\right)\leq\sqrt{T/(2\pi)}+1/2, the desired numerical bound.
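The identity V^{*}[t,g]=(1/2)\bm{\mathrm{E}}[Z_{T-t-1}(g)] is also easy to probe by simulation. The sketch below (ours; it uses the convention from the base case above that a length-0 walk started at 0 already counts one passage through 0) compares the dynamic-programming table with a Monte Carlo estimate.

import numpy as np

rng = np.random.default_rng(0)
T = 40

V = np.zeros((T + 1, T + 1))             # V*[t, g] via the recursion of Theorem A.3
for t in range(T - 1, -1, -1):
    V[t, 0] = V[t + 1, 1] + 0.5
    for g in range(1, T):
        V[t, g] = 0.5 * (V[t + 1, g + 1] + V[t + 1, g - 1])

def expected_passages(length, start, samples=200_000):
    # Monte Carlo estimate of E[Z_length(start)]; the starting point counts when start == 0
    steps = rng.choice([-1, 1], size=(samples, length))
    walk = np.abs(start + np.cumsum(steps, axis=1))
    return ((start == 0) + np.sum(walk == 0, axis=1)).mean()

for t, g in [(0, 0), (5, 3), (20, 0), (10, 7)]:
    print(V[t, g], 0.5 * expected_passages(T - t - 1, g))   # the two columns should roughly agree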

A.5 Lower Bound on the Optimal Regret

In the previous section we showed that Cover's algorithm suffers regret at most \sqrt{T/(2\pi)}+O(1). In fact, by the definition of V_{p} (see (10)) and V^{*} we have that V^{*}[0,0] is the minimum worst-case regret among algorithms of the form \mathcal{A}_{p}, where p\colon[T]\times\{0,\dotsc,T-1\}\to[0,1] is some function. However, this does not tell us whether more general player strategies can do better or not. The next theorem shows that any player strategy suffers, in the worst case, at least \sqrt{T/(2\pi)}-O(1) regret. The proof of the theorem boils down to lower-bounding the expected regret of any player against a random adversary that plays uniformly from \mathcal{L}=\{(0,1)^{\mathsf{T}},(1,0)^{\mathsf{T}}\}.

Theorem A.8.

Let 𝒜\mathcal{A} be a player strategy for a 2-experts game with TT\in\mathbb{N} rounds. Then, there is T\ell\in\mathcal{L}^{T} such that

Regret(T,𝒜,)T2π12π15\operatorname{Regret}(T,\mathcal{A},\mathcal{B}_{\ell})\geq\sqrt{\frac{T}{2\pi}}-\sqrt{\frac{1}{2\pi}}-\frac{1}{5}
Proof A.9.

Let {~t}t=1T\{\tilde{\ell}_{t}\}_{t=1}^{T} be i.i.d. random variables such that ~t\tilde{\ell}_{t} is equal to a vector in ={(0,1)𝖳,(1,0)𝖳}\mathcal{L}=\{(0,1)^{\mathsf{T}},(1,0)^{\mathsf{T}}\} chosen uniformly at random and let ~\tilde{\mathcal{B}} be the (randomized) oblivious adversary that plays ~t\tilde{\ell}_{t} at round tt. We shall show that

𝐄[Regret(T,𝒜,~)]T2π12π15,\bm{\mathrm{E}}[\operatorname{Regret}(T,\mathcal{A},\tilde{\mathcal{B}})]\geq\sqrt{\frac{T}{2\pi}}-\sqrt{\frac{1}{2\pi}}-\frac{1}{5}, (15)

which implies the existence of a deterministic adversary as described in the statement. For each t\in\{0,\dotsc,T\}, let the random variable \tilde{g}_{t} be the gap between the experts' cumulative losses after round t under the costs of \tilde{\mathcal{B}}, and for each t\in[T] define x_{t}\coloneqq\mathcal{A}(\tilde{\ell}_{1},\dotsc,\tilde{\ell}_{t-1}) and p_{t}\coloneqq x_{t}(i_{t}), where i_{t}\in[2] is a lagging expert on round t. It is worth noting already that (\tilde{g}_{t})_{t=0}^{T} is a reflected symmetric random walk of length T. By Proposition 2.1 we have

𝐄[Regret(T)]=t=1T𝐄[[g~t1>0]pt(g~tg~t1)]+t=1T𝐄[[g~t1=0]~t𝖳xt],\bm{\mathrm{E}}[\operatorname{Regret}(T)]=\sum_{t=1}^{T}\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}>0}]p_{t}\cdot(\tilde{g}_{t}-\tilde{g}_{t-1})}\big{]}+\sum_{t=1}^{T}\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}=0}]\tilde{\ell}_{t}^{\mathsf{T}}x_{t}}\big{]},

where we recall that for any predicate PP we have [P][{P}] equals 11 if PP is true, and equals 0 otherwise. First, let us show that

𝐄[[g~t1>0]pt(g~tg~t1)]=0,t[T].\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}>0}]p_{t}\cdot(\tilde{g}_{t}-\tilde{g}_{t-1})}\big{]}=0,\qquad\forall t\in[T]. (16)

For each t{0,,T}t\in\{0,\dotsc,T\}, define 𝐄t[]𝐄[|~1,,~t]\bm{\mathrm{E}}_{t}[\cdot]\coloneqq\bm{\mathrm{E}}[\cdot\rvert\tilde{\ell}_{1},\dotsc,\tilde{\ell}_{t}], that is, 𝐄t\bm{\mathrm{E}}_{t} is the conditional expectation given the choices of the random adversary on rounds 1,,t1,\dotsc,t. Let t[T]t\in[T]. On the event {g~t1>0}\{\tilde{g}_{t-1}>0\}, one can see that g~tg~t1\tilde{g}_{t}-\tilde{g}_{t-1} is independent of ~1,,~t1\tilde{\ell}_{1},\dotsc,\tilde{\ell}_{t-1} and is uniformly distributed on {±1}\{\pm 1\}. This together with the fact that ptp_{t} is a function of ~1,,~t1\tilde{\ell}_{1},\dotsc,\tilde{\ell}_{t-1} implies

𝐄[[g~t1>0]pt(g~tg~t1)]\displaystyle\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}>0}]p_{t}\cdot(\tilde{g}_{t}-\tilde{g}_{t-1})}\big{]} =𝐄[𝐄t1[[g~t1>0]pt(g~tg~t1)]]\displaystyle=\bm{\mathrm{E}}\Big{[}{\bm{\mathrm{E}}_{t-1}\big{[}{[{\tilde{g}_{t-1}>0}]p_{t}\cdot(\tilde{g}_{t}-\tilde{g}_{t-1})}\big{]}}\Big{]}
=𝐄[pt𝐄t1[[g~t1>0](g~tg~t1)]]\displaystyle=\bm{\mathrm{E}}\Big{[}{p_{t}\bm{\mathrm{E}}_{t-1}\big{[}{[{\tilde{g}_{t-1}>0}](\tilde{g}_{t}-\tilde{g}_{t-1})}\big{]}}\Big{]}
=𝐄[pt𝐄[[g~t1>0](g~tg~t1)]]\displaystyle=\bm{\mathrm{E}}\Big{[}{p_{t}\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}>0}](\tilde{g}_{t}-\tilde{g}_{t-1})}\big{]}}\Big{]}
=𝐄[pt0]=0.\displaystyle=\bm{\mathrm{E}}[{p_{t}\cdot 0}]=0.

This ends the proof of (16). Let us now show that

t=1T𝐄[[g~t1=0]~t𝖳xt]=12𝐄[ZT1(0)].\sum_{t=1}^{T}\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}=0}]\tilde{\ell}_{t}^{\mathsf{T}}x_{t}}\big{]}=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-1}(0)]. (17)

For each t[T]t\in[T], since xtx_{t} is a function of ~1,,~t1\tilde{\ell}_{1},\dotsc,\tilde{\ell}_{t-1} and ~t\tilde{\ell}_{t} is independent of ~1,,~t1\tilde{\ell}_{1},\dotsc,\tilde{\ell}_{t-1}, we have

𝐄[[g~t1=0]~t𝖳xt]\displaystyle\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}=0}]\tilde{\ell}_{t}^{\mathsf{T}}x_{t}}\big{]} =𝐄[𝐄t1[[g~t1=0]~t𝖳xt]]\displaystyle=\bm{\mathrm{E}}\Big{[}{\bm{\mathrm{E}}_{t-1}\big{[}{[{\tilde{g}_{t-1}=0}]\tilde{\ell}_{t}^{\mathsf{T}}x_{t}}\big{]}}\Big{]}
=𝐄[[g~t1=0]𝐄t1[~t]𝖳xt]\displaystyle=\bm{\mathrm{E}}\Big{[}{[{\tilde{g}_{t-1}=0}]\bm{\mathrm{E}}_{t-1}\big{[}{\tilde{\ell}_{t}}\big{]}^{\mathsf{T}}x_{t}}\Big{]}
=𝐄[[g~t1=0]𝐄[~t]𝖳xt]\displaystyle=\bm{\mathrm{E}}\Big{[}{[{\tilde{g}_{t-1}=0}]\bm{\mathrm{E}}\big{[}{\tilde{\ell}_{t}}\big{]}^{\mathsf{T}}x_{t}}\Big{]}
=\bm{\mathrm{E}}\Big{[}[{\tilde{g}_{t-1}=0}]\Big{(}\frac{1}{2}x_{t}(1)+\frac{1}{2}x_{t}(2)\Big{)}\Big{]}=\frac{1}{2}\bm{\mathrm{P}}(\tilde{g}_{t-1}=0).

Thus,

t=1T𝐄[[g~t1=0]~t𝖳xt]=12t=1T𝐏(g~t1=0)=12𝐄[t=1T𝟙{g~t1=0}]=12𝐄[ZT1(0)].\sum_{t=1}^{T}\bm{\mathrm{E}}\big{[}{[{\tilde{g}_{t-1}=0}]\tilde{\ell}_{t}^{\mathsf{T}}x_{t}}\big{]}=\frac{1}{2}\sum_{t=1}^{T}\bm{\mathrm{P}}(\tilde{g}_{t-1}=0)=\frac{1}{2}\bm{\mathrm{E}}\Big{[}{\sum_{t=1}^{T}\mathbbm{1}_{\{\tilde{g}_{t-1}=0\}}}\Big{]}=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-1}(0)].

This completes the proof of (17). Combining (16) and (17), we get \bm{\mathrm{E}}[\operatorname{Regret}(T,\mathcal{A},\tilde{\mathcal{B}})]=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-1}(0)], and Lemma A.5 yields \frac{1}{2}\bm{\mathrm{E}}[Z_{T-1}(0)]\geq\frac{1}{2}\big{(}\sqrt{2(T-1)/\pi}-\frac{2}{5}\big{)}=\sqrt{\frac{T-1}{2\pi}}-\frac{1}{5}\geq\sqrt{\frac{T}{2\pi}}-\sqrt{\frac{1}{2\pi}}-\frac{1}{5}, where the last inequality uses \sqrt{T-1}\geq\sqrt{T}-1. This proves (15).
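The equality \bm{\mathrm{E}}[\operatorname{Regret}]=\frac{1}{2}\bm{\mathrm{E}}[Z_{T-1}(0)] that drives this proof holds for any player, which makes it easy to check by simulation. The sketch below (ours) plays a simple gap-based strategy, with an arbitrary probability q on the lagging expert, against the uniform random adversary and compares the empirical regret with the exact value of \frac{1}{2}\bm{\mathrm{E}}[Z_{T-1}(0)].

import numpy as np
from math import comb

rng = np.random.default_rng(1)
T, samples = 64, 200_000
q = 0.3     # probability placed on the lagging expert; per the proof, any choice gives the same mean

# s[i, t] = +1 if expert 1 receives cost 1 in round t of simulation i (uniform over L)
s = rng.choice([-1, 1], size=(samples, T))
D = np.concatenate([np.zeros((samples, 1)), np.cumsum(s, axis=1)], axis=1)   # L_t(1) - L_t(2)
prev = D[:, :-1]                                       # signed gap before each round

# player's cost in a round: 1/2 on ties, otherwise q exactly when the cost-1 expert is the lagging one
cost1_is_lagging = (np.sign(prev) == s)
player_cost = np.where(prev == 0, 0.5, np.where(cost1_is_lagging, q, 1 - q))
best_expert_loss = (T - np.abs(D[:, -1])) / 2
regret = player_cost.sum(axis=1) - best_expert_loss

exact = 0.5 * sum(comb(2 * k, k) / 4**k for k in range((T - 1) // 2 + 1))   # (1/2) E[Z_{T-1}(0)]
print(regret.mean(), exact)                            # these agree up to Monte Carlo error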

Appendix B On the Passages Through Zero of a Symmetric Random Walk

In this section we shall prove Lemma A.5, which bounds the expected number of passages through 0 of a symmetric random walk. First, we need a simple corollary of Stirling’s formula (which we state here for convenience) to bound binomial terms.

Theorem B.1 (Stirling’s Formula, Robbins, 1955).

For any nn\in\mathbb{N} we have

2πn(ne)ne1/(12n+1)<n!<2πn(ne)ne1/(12n).\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}e^{1/(12n+1)}<n!<\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}e^{1/(12n)}.
Corollary B.2.

For any nn\in\mathbb{N} we have

22nπn(1215n)(2nn)22nπn.\frac{2^{2n}}{\sqrt{\pi n}}\left(1-\frac{2}{15n}\right)\leq\binom{2n}{n}\leq\frac{2^{2n}}{\sqrt{\pi n}}.
Proof B.3.

Let nn\in\mathbb{N}. For the upper-bound, we have

\binom{2n}{n}=\frac{(2n)!}{(n!)^{2}}
<\frac{\sqrt{2\pi\cdot 2n}\cdot(2n)^{2n}\cdot e^{-2n}\cdot e^{1/(24n)}}{2\pi n\cdot n^{2n}\cdot e^{-2n}\cdot e^{2/(12n+1)}}
=\frac{2^{2n}}{\sqrt{\pi n}}\cdot\exp\left(\frac{1}{24n}-\frac{2}{12n+1}\right)
\leq\frac{2^{2n}}{\sqrt{\pi n}}\cdot\exp\left(-\frac{1}{24n}\right)\leq\frac{2^{2n}}{\sqrt{\pi n}},

where the second-to-last inequality uses \frac{1}{12n}\leq\frac{2}{12n+1}, which holds for n\geq 1.

Similarly, for the lower-bound we have

\binom{2n}{n}=\frac{(2n)!}{(n!)^{2}}
>\frac{\sqrt{2\pi\cdot 2n}\cdot(2n)^{2n}\cdot e^{-2n}\cdot e^{1/(24n+1)}}{2\pi n\cdot n^{2n}\cdot e^{-2n}\cdot e^{2/(12n)}}
=\frac{2^{2n}}{\sqrt{\pi n}}\cdot\exp\left(\frac{1}{24n+1}-\frac{1}{6n}\right)
\geq\frac{2^{2n}}{\sqrt{\pi n}}\cdot\exp\left(-\frac{2}{15n}\right)\geq\frac{2^{2n}}{\sqrt{\pi n}}\left(1-\frac{2}{15n}\right),

where the second-to-last inequality uses \frac{1}{24n+1}\geq\frac{1}{30n}=\frac{1}{6n}-\frac{2}{15n} for n\geq 1, and the last inequality uses e^{-x}\geq 1-x for all x\in\mathbb{R}.

We are now ready to prove Lemma A.5, which we restate for convenience.

See A.5

Proof B.4.

Let {St}t=0T\{S_{t}\}_{t=0}^{T} be a symmetric random walk and define XiSiSi1X_{i}\coloneqq S_{i}-S_{i-1} for every i[T]i\in[T]. Note that

\bm{\mathrm{P}}(\lvert S_{t}\rvert=0)=\bm{\mathrm{P}}(S_{t}=0)=\bm{\mathrm{P}}\big{(}\lvert\{\,i\in[t]\colon X_{i}=1\,\}\rvert=\lvert\{\,i\in[t]\colon X_{i}=-1\,\}\rvert\big{)}
=\begin{cases}0&\text{if}\ t\ \text{is odd},\\ \binom{t}{t/2}2^{-t}&\text{if}\ t\ \text{is even}.\end{cases}

Therefore,

𝐄[ZT(0)]=t=0T𝐏(St=0)=k=0T/2𝐏(S2k=0)=k=0T/2(2kk)122k.\bm{\mathrm{E}}[Z_{T}(0)]=\sum_{t=0}^{T}\bm{\mathrm{P}}(S_{t}=0)=\sum_{k=0}^{\lfloor T/2\rfloor}\bm{\mathrm{P}}(S_{2k}=0)=\sum_{k=0}^{\lfloor T/2\rfloor}\binom{2k}{k}\frac{1}{2^{2k}}.

Using Corollary B.2, a consequence of Stirling’s approximation to the factorial function, we can show upper- and lower-bounds to the above quantity. Namely, for the upper-bound we have

\sum_{k=0}^{\lfloor T/2\rfloor}\binom{2k}{k}\frac{1}{2^{2k}}\leq 1+\sum_{k=1}^{\lfloor T/2\rfloor}\frac{2^{2k}}{\sqrt{\pi k}}\frac{1}{2^{2k}}=1+\frac{1}{\sqrt{\pi}}\Bigg{(}\sum_{k=1}^{\lfloor T/2\rfloor}\frac{1}{\sqrt{k}}\Bigg{)}\leq 1+\frac{1}{\sqrt{\pi}}\int_{0}^{T/2}\frac{1}{\sqrt{x}}\mathop{}\!\mathrm{d}x
=1+\left(2\sqrt{\frac{x}{\pi}}\right)\Bigg{\rvert}_{x=0}^{x=T/2}=1+\sqrt{\frac{2T}{\pi}}.

We proceed similarly for the lower-bound. By setting βk=1k3/2\beta\coloneqq\sum_{k=1}^{\infty}k^{-3/2} we get

\sum_{k=0}^{\lfloor T/2\rfloor}\binom{2k}{k}\frac{1}{2^{2k}}\geq 1+\sum_{k=1}^{\lfloor T/2\rfloor}\frac{2^{2k}}{\sqrt{\pi k}}\left(1-\frac{2}{15k}\right)\frac{1}{2^{2k}}=1+\frac{1}{\sqrt{\pi}}\Bigg{(}\sum_{k=1}^{\lfloor T/2\rfloor}\frac{1}{\sqrt{k}}-\sum_{k=1}^{\lfloor T/2\rfloor}\frac{2}{15k^{3/2}}\Bigg{)}
\geq 1+\frac{1}{\sqrt{\pi}}\Bigg{(}\int_{1}^{\lfloor T/2\rfloor+1}\frac{1}{\sqrt{x}}\mathop{}\!\mathrm{d}x-\frac{2}{15}\beta\Bigg{)}=1+\frac{1}{\sqrt{\pi}}\Big{(}2\sqrt{\lfloor T/2\rfloor+1}-2-\frac{2}{15}\beta\Big{)}
\geq 1+\frac{1}{\sqrt{\pi}}\Big{(}\sqrt{2T}-2-\frac{2}{15}\beta\Big{)}=\sqrt{\frac{2T}{\pi}}+1-\frac{1}{\sqrt{\pi}}\Big{(}2+\frac{2}{15}\beta\Big{)},

where the second inequality uses that 1/\sqrt{x} is decreasing (so that \frac{1}{\sqrt{k}}\geq\int_{k}^{k+1}\frac{1}{\sqrt{x}}\mathop{}\!\mathrm{d}x) together with \beta\geq\sum_{k=1}^{\lfloor T/2\rfloor}k^{-3/2}, and the last inequality uses \lfloor T/2\rfloor+1\geq T/2.

To conclude the proof, some simple calculations yield

\beta=\sum_{k=1}^{\infty}\frac{1}{k^{3/2}}=1+\sum_{k=2}^{\infty}\frac{1}{k^{3/2}}\leq 1+\int_{1}^{\infty}\frac{1}{x^{3/2}}\mathop{}\!\mathrm{d}x=1+\left(-\frac{2}{\sqrt{x}}\right)\Bigg{\rvert}_{x=1}^{\infty}=3.

Therefore, 1-\frac{1}{\sqrt{\pi}}\big{(}2+\frac{2}{15}\beta\big{)}\geq 1-\frac{12}{5\sqrt{\pi}}\geq-\frac{2}{5}, where the last inequality holds because 7\sqrt{\pi}\geq 12. Combined with the previous display, this yields \bm{\mathrm{E}}[Z_{T}(0)]\geq\sqrt{2T/\pi}-2/5, as claimed.
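Both Corollary B.2 and Lemma A.5 can be checked numerically. The sketch below (ours) evaluates \bm{\mathrm{E}}[Z_{T}(0)] exactly from the sum above and tests the bounds on a few values.

from math import comb, sqrt, pi

def expected_zeros(T):
    # exact E[Z_T(0)] = sum_{k=0}^{floor(T/2)} C(2k, k) / 4^k
    return sum(comb(2 * k, k) / 4**k for k in range(T // 2 + 1))

for n in [1, 2, 5, 10, 100]:                     # Corollary B.2
    c = comb(2 * n, n)
    assert (4**n / sqrt(pi * n)) * (1 - 2 / (15 * n)) <= c <= 4**n / sqrt(pi * n)

for T in [1, 2, 10, 100, 1000, 2000]:            # Lemma A.5
    z = expected_zeros(T)
    assert sqrt(2 * T / pi) - 2 / 5 <= z <= 1 + sqrt(2 * T / pi)

print("Corollary B.2 and Lemma A.5 hold on all tested values")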

Appendix C Missing Proofs for Section 4

See 4.3

Proof C.1.

Fix t[0,T)t\in[0,T) and g0g\in\mathbb{R}_{\geq 0}. Then,

tQ(t,g)\displaystyle\partial_{t}Q(t,g) =12terfc(g2(Tt))=1πexp(g22(Tt))t(g2(Tt))\displaystyle=\frac{1}{2}\partial_{t}\operatorname*{erfc}\left(\frac{g}{\sqrt{2(T-t)}}\right)=-\frac{1}{\sqrt{\pi}}\exp\left(-\frac{g^{2}}{2(T-t)}\right)\partial_{t}\left(\frac{g}{\sqrt{2(T-t)}}\right)
=1πexp(g22(Tt))g(2(Tt))3/2.\displaystyle=-\frac{1}{\sqrt{\pi}}\exp\left(-\frac{g^{2}}{2(T-t)}\right)\frac{g}{(2(T-t))^{3/2}}.

Similarly,

gQ(t,g)\displaystyle\partial_{g}Q(t,g) =12gerfc(g2(Tt))=1πexp(g22(Tt))g(g2(Tt))\displaystyle=\frac{1}{2}\partial_{g}\operatorname*{erfc}\left(\frac{g}{\sqrt{2(T-t)}}\right)=-\frac{1}{\sqrt{\pi}}\exp\left(-\frac{g^{2}}{2(T-t)}\right)\partial_{g}\left(\frac{g}{\sqrt{2(T-t)}}\right)
=1πexp(g22(Tt))12(Tt)\displaystyle=-\frac{1}{\sqrt{\pi}}\exp\left(-\frac{g^{2}}{2(T-t)}\right)\frac{1}{\sqrt{2(T-t)}}

and

ggQ(t,g)\displaystyle\partial_{gg}Q(t,g) =12π(Tt)gexp(g22(Tt))\displaystyle=-\frac{1}{\sqrt{2\pi(T-t)}}\partial_{g}\exp\left(-\frac{g^{2}}{2(T-t)}\right)
=12π(Tt)exp(g22(Tt))gTt=2tQ(t,g),\displaystyle=\frac{1}{\sqrt{2\pi(T-t)}}\exp\left(-\frac{g^{2}}{2(T-t)}\right)\frac{g}{T-t}=-2\partial_{t}Q(t,g),

as desired.

Let us now prove (6).

Lemma C.2.

For all t(,T)t\in(-\infty,T) and gg\in\mathbb{R}, we have

R(t,g)=g2erfc(g2(Tt))Tt2πexp(g22(Tt))+T2π.R(t,g)=\frac{g}{2}\operatorname*{erfc}\Bigg{(}\frac{g}{\sqrt{2(T-t)}}\Bigg{)}-\sqrt{\frac{T-t}{2\pi}}\exp\Bigg{(}-\frac{g^{2}}{2(T-t)}\Bigg{)}+\sqrt{\frac{T}{2\pi}}.
Proof C.3.

Fix (t,g)(,T)×(t,g)\in(-\infty,T)\times\mathbb{R}. Using that 0gerfc(x)dx=gerfc(g)1πeg2+1π\int_{0}^{g}\operatorname*{erfc}(x)\mathop{}\!\mathrm{d}x=g\operatorname*{erfc}(g)-\frac{1}{\sqrt{\pi}}e^{-g^{2}}+\frac{1}{\sqrt{\pi}} (Olver et al., 2010, Section 7.7(i)), we have

0gQ(t,x)dx\displaystyle\int_{0}^{g}Q(t,x)\mathop{}\!\mathrm{d}x =2(Tt)20g/2(Tt)erfc(y)dy\displaystyle=\frac{\sqrt{2(T-t)}}{2}\int_{0}^{\nicefrac{{g}}{{\sqrt{2(T-t)}}}}\operatorname*{erfc}(y)\mathop{}\!\mathrm{d}y
=2(Tt)2(xerfc(x)ex2π)|x=0x=g/2(Tt)\displaystyle=\frac{\sqrt{2(T-t)}}{2}\left(x\operatorname*{erfc}(x)-\frac{e^{-x^{2}}}{\sqrt{\pi}}\right)\Bigg{|}_{x=0}^{x=\nicefrac{{g}}{{\sqrt{2(T-t)}}}}
=g2erfc(g2(Tt))Tt2πexp(g22(Tt))+Tt2π\displaystyle=\frac{g}{2}\operatorname*{erfc}\Bigg{(}\frac{g}{\sqrt{2(T-t)}}\Bigg{)}-\sqrt{\frac{T-t}{2\pi}}\exp\Bigg{(}-\frac{g^{2}}{2(T-t)}\Bigg{)}+\sqrt{\frac{T-t}{2\pi}}

for g0g\geq 0. Similarly, using that ddxerfc(x)=2πex2\frac{\mathop{}\!\mathrm{d}}{\mathop{}\!\mathrm{d}x}\operatorname*{erfc}(x)=-\frac{2}{\sqrt{\pi}}e^{-x^{2}} we have

120tgQ(s,0)ds=120t(12π(Ts))ds=Tt2πT2π.\frac{1}{2}\int_{0}^{t}\partial_{g}Q(s,0)\mathop{}\!\mathrm{d}s=\frac{1}{2}\int_{0}^{t}\left(-\frac{1}{\sqrt{2\pi(T-s)}}\right)\mathop{}\!\mathrm{d}s=\sqrt{\frac{T-t}{2\pi}}-\sqrt{\frac{T}{2\pi}}.

Plugging both equations into the definition of RR concludes the proof.
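The computation above is easy to double-check with numerical quadrature. The sketch below (ours) takes R(t,g)=\int_{0}^{g}Q(t,x)\mathop{}\!\mathrm{d}x-\frac{1}{2}\int_{0}^{t}\partial_{g}Q(s,0)\mathop{}\!\mathrm{d}s — our reading of the definition from Section 5 implied by this proof — and compares it with the closed form of Lemma C.2.

import numpy as np
from scipy.special import erfc
from scipy.integrate import quad

T = 50.0

def Q(t, g):
    return 0.5 * erfc(g / np.sqrt(2.0 * (T - t)))

def dQ_dg(t, g):
    return -np.exp(-g**2 / (2.0 * (T - t))) / np.sqrt(2.0 * np.pi * (T - t))

def R_quadrature(t, g):
    first, _ = quad(lambda x: Q(t, x), 0.0, g)
    second, _ = quad(lambda s: dQ_dg(s, 0.0), 0.0, t)
    return first - 0.5 * second

def R_closed(t, g):
    return (g / 2.0 * erfc(g / np.sqrt(2.0 * (T - t)))
            - np.sqrt((T - t) / (2.0 * np.pi)) * np.exp(-g**2 / (2.0 * (T - t)))
            + np.sqrt(T / (2.0 * np.pi)))

for t, g in [(0.0, 0.0), (10.0, 3.0), (30.0, -2.0), (49.0, 0.5)]:
    assert abs(R_quadrature(t, g) - R_closed(t, g)) < 1e-6
print("closed form of Lemma C.2 matches numerical integration")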

Appendix D Missing Proofs for Section 5

Let us begin by proving a crucial condition on the function q defined in Section 5.

See 5.2

Proof D.1.

The claim follows directly from the definition of qq for t=Tt=T. Let t{0,,T1}t\in\{0,\dotsc,T-1\}. From the definition of qq, we have

q(t,0)\displaystyle q(t,0) =Rg(t,0)=12(R(t,1)R(t,1))\displaystyle=R_{g}(t,0)=\frac{1}{2}(R(t,1)-R(t,-1))
=12(01Q(t,x)dx120tgQ(s,0)ds01Q(t,x)dx+120tgQ(s,0)ds)\displaystyle=\frac{1}{2}\left(\int_{0}^{1}Q(t,x)\mathop{}\!\mathrm{d}x-\frac{1}{2}\int_{0}^{t}\partial_{g}Q(s,0)\mathop{}\!\mathrm{d}s-\int_{0}^{-1}Q(t,x)\mathop{}\!\mathrm{d}x+\frac{1}{2}\int_{0}^{t}\partial_{g}Q(s,0)\mathop{}\!\mathrm{d}s\right)
=12(01Q(t,x)dx01Q(t,x)dx)\displaystyle=\frac{1}{2}\left(\int_{0}^{1}Q(t,x)\mathop{}\!\mathrm{d}x-\int_{0}^{-1}Q(t,x)\mathop{}\!\mathrm{d}x\right)
=12(01Q(t,x)dx+01Q(t,x)dx)\displaystyle=\frac{1}{2}\left(\int_{0}^{1}Q(t,x)\mathop{}\!\mathrm{d}x+\int_{0}^{1}Q(t,-x)\mathop{}\!\mathrm{d}x\right)
=1201(Q(t,x)+Q(t,x))dx.\displaystyle=\frac{1}{2}\int_{0}^{1}\big{(}Q(t,x)+Q(t,-x)\big{)}\mathop{}\!\mathrm{d}x.

Moreover, note that for any zz\in\mathbb{R} we have

erfc(z)=12π0zex2dx=1+2π0zex2dx=2erfc(z).\operatorname*{erfc}(-z)=1-\frac{2}{\sqrt{\pi}}\int_{0}^{-z}e^{-x^{2}}\mathop{}\!\mathrm{d}x=1+\frac{2}{\sqrt{\pi}}\int_{0}^{z}e^{-x^{2}}\mathop{}\!\mathrm{d}x=2-\operatorname*{erfc}(z). (18)

Therefore, Q(t,x)=1Q(t,x)Q(t,-x)=1-Q(t,x) for any xx\in\mathbb{R} and

q(t,0)=1201(Q(t,x)+Q(t,x))dx=12011dx=12.q(t,0)=\frac{1}{2}\int_{0}^{1}\big{(}Q(t,x)+Q(t,-x)\big{)}\mathop{}\!\mathrm{d}x=\frac{1}{2}\int_{0}^{1}1\mathop{}\!\mathrm{d}x=\frac{1}{2}.
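Using the closed form of Lemma C.2, the claim can also be confirmed numerically. The sketch below (ours) takes q(t,g)=R_{g}(t,g)=\frac{1}{2}(R(t,g+1)-R(t,g-1)), our reading of the discrete derivative used above, and checks that q(t,0)=1/2 for every t<T.

import numpy as np
from scipy.special import erfc

T = 100.0

def R(t, g):
    return (g / 2.0 * erfc(g / np.sqrt(2.0 * (T - t)))
            - np.sqrt((T - t) / (2.0 * np.pi)) * np.exp(-g**2 / (2.0 * (T - t)))
            + np.sqrt(T / (2.0 * np.pi)))

def q(t, g):
    # central discrete derivative R_g(t, g)
    return 0.5 * (R(t, g + 1.0) - R(t, g - 1.0))

for t in range(0, int(T)):
    assert abs(q(float(t), 0.0) - 0.5) < 1e-12
print("q(t, 0) = 1/2 for all t < T; for instance q(50, 5) =", q(50.0, 5.0))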

This section contains the proofs of the bounds on r_{t} and r_{gg}. We start by bounding r_{t}.

See 5.5

Proof D.2.

Fix t\in(-\infty,T) and g\in\mathbb{R}. Note that R is continuously differentiable with respect to its first argument on [t-1,t] and twice differentiable on (t-1,t). Thus, by Taylor's Theorem, there is t^{\prime}\in(t-1,t) such that

R(t1,g)=R(t,g)+(1)tR(t,g)+(1)2ttR(t,g)2,R(t-1,g)=R(t,g)+(-1)\partial_{t}R(t,g)+(-1)^{2}\frac{\partial_{tt}R(t^{\prime},g)}{2},

where ttR(t,g)\partial_{tt}R(t^{\prime},g) denotes the second derivative of RR with respect to its first argument at (t,g)(t^{\prime},g). Therefore,

rt(t,g)=tR(t,g)(R(t,g)R(t1,g))=ttR(t,g)2.r_{t}(t,g)=\partial_{t}R(t,g)-(R(t,g)-R(t-1,g))=\frac{\partial_{tt}R(t^{\prime},g)}{2}.

Thus, to bound rt(t,g)r_{t}(t,g) we need only to bound (1/2)ttR(t,g)(1/2)\partial_{tt}R(t^{\prime},g). Computing the derivatives yields

tR(t,g)=122π(Tt)exp(g22(Tt))\partial_{t}R(t^{\prime},g)=\frac{1}{2\sqrt{2\pi(T-t^{\prime})}}\exp\Big{(}-\frac{g^{2}}{2(T-t^{\prime})}\Big{)}

and

ttR(t,g)\displaystyle\partial_{tt}R(t^{\prime},g) =28π(Tt)5/2exp(g22(Tt))(Ttg2)\displaystyle=\frac{\sqrt{2}}{8\sqrt{\pi}(T-t^{\prime})^{5/2}}\exp\Big{(}-\frac{g^{2}}{2(T-t^{\prime})}\Big{)}\left(T-t^{\prime}-g^{2}\right)
28π(Tt)5/2exp(g22(Tt))(Tt)\displaystyle\leq\frac{\sqrt{2}}{8\sqrt{\pi}(T-t^{\prime})^{5/2}}\exp\Big{(}-\frac{g^{2}}{2(T-t^{\prime})}\Big{)}\left(T-t^{\prime}\right)
28π(Tt)3/2\displaystyle\leq\frac{\sqrt{2}}{8\sqrt{\pi}(T-t^{\prime})^{3/2}} (Since Tt>0T-t^{\prime}>0),
28π(Tt)3/2\displaystyle\leq\frac{\sqrt{2}}{8\sqrt{\pi}(T-t)^{3/2}} (Since Tt>TtT-t^{\prime}>T-t).

To bound r_{gg}, we will need to be slightly more careful. First, we will need the following simple lemma about the Lipschitz continuity of x\in\mathbb{R}\mapsto xe^{-x^{2}}.

Lemma D.3.

Let K>0K>0 and define f(α)αeα2/Kf(\alpha)\coloneqq\alpha e^{-\alpha^{2}/K} for every α\alpha\in\mathbb{R}. Then ff is 22-Lipschitz continuous.

Proof D.4.

Let α\alpha\in\mathbb{R}. First, note that f(α)=eα2/K(12α2/K)f^{\prime}(\alpha)=e^{-\alpha^{2}/K}(1-2\alpha^{2}/K). Therefore, using the fact that eβ1+βe^{\beta}\geq 1+\beta for any β\beta\in\mathbb{R} we have

\lvert f^{\prime}(\alpha)\rvert=\frac{1}{\exp\left(\frac{\alpha^{2}}{K}\right)}\Bigg{\lvert}1-\frac{2\alpha^{2}}{K}\Bigg{\rvert}\leq\frac{1}{1+\frac{\alpha^{2}}{K}}\Bigg{\lvert}1-\frac{2\alpha^{2}}{K}\Bigg{\rvert}\leq\frac{1+\frac{2\alpha^{2}}{K}}{1+\frac{\alpha^{2}}{K}}\leq 2,

where the second inequality is the triangle inequality and the last one holds since 1+2u\leq 2(1+u) for u\geq 0.

We are now ready to bound r_{gg}. Fix t\in(-\infty,T) and g\in\mathbb{R}. Moreover, denote by \partial_{g}^{(3)}R the third partial derivative of R with respect to its second argument. By Taylor's Theorem, there are g_{+}^{\prime}\in(g,g+1) and g_{-}^{\prime}\in(g-1,g) such that

R(t,g+1)\displaystyle R(t,g+1) =R(t,g)+gR(t,g)+12ggR(t,g)+13!g(3)R(t,g+)and\displaystyle=R(t,g)+\partial_{g}R(t,g)+\frac{1}{2}\partial_{gg}R(t,g)+\frac{1}{3!}\partial_{g}^{(3)}R(t,g_{+}^{\prime})\quad\text{and}
R(t,g1)\displaystyle R(t,g-1) =R(t,g)gR(t,g)+12ggR(t,g)13!g(3)R(t,g).\displaystyle=R(t,g)-\partial_{g}R(t,g)+\frac{1}{2}\partial_{gg}R(t,g)-\frac{1}{3!}\partial_{g}^{(3)}R(t,g_{-}^{\prime}).

Therefore,

rgg(t,g)=ggR(t,g)(R(t,g+1)+R(t,g1)2R(t,g))=13!(g(3)R(t,g)g(3)R(t,g+)).r_{gg}(t,g)=\partial_{gg}R(t,g)-(R(t,g+1)+R(t,g-1)-2R(t,g))=\frac{1}{3!}(\partial_{g}^{(3)}R(t,g_{-}^{\prime})-\partial_{g}^{(3)}R(t,g_{+}^{\prime})).

Let gg^{\prime}\in\mathbb{R}. To compute the partial derivatives, first note that

gR(t,g)=Q(t,g)=12erfc(g/2(Tt)).\partial_{g}R(t,g^{\prime})=Q(t,g^{\prime})=\frac{1}{2}\operatorname*{erfc}(g^{\prime}/\sqrt{2(T-t)}).

Thus, one may check that

ggR(t,g)\displaystyle\partial_{gg}R(t,g^{\prime}) =122πexp((g)22(Tt))12(Tt)\displaystyle=-\frac{1}{2}\frac{2}{\sqrt{\pi}}\exp\Big{(}-\frac{(g^{\prime})^{2}}{2(T-t)}\Big{)}\frac{1}{\sqrt{2(T-t)}} (19)
=12π(Tt)exp((g)22(Tt))\displaystyle=-\frac{1}{\sqrt{2\pi(T-t)}}\exp\Big{(}-\frac{(g^{\prime})^{2}}{2(T-t)}\Big{)}
   and
g(3)R(t,g)\displaystyle\partial_{g}^{(3)}R(t,g^{\prime}) =12π(Tt)exp((g)22(Tt))2g2(Tt)\displaystyle=\frac{1}{\sqrt{2\pi(T-t)}}\exp\Big{(}-\frac{(g^{\prime})^{2}}{2(T-t)}\Big{)}\frac{2g^{\prime}}{2(T-t)}
=12πg(Tt)3/2exp((g)22(Tt)).\displaystyle=\frac{1}{\sqrt{2\pi}}\frac{g^{\prime}}{(T-t)^{3/2}}\exp\Big{(}-\frac{(g^{\prime})^{2}}{2(T-t)}\Big{)}.

By Lemma D.3, we know that \partial_{g}^{(3)}R(t,\cdot) is Lipschitz continuous with Lipschitz constant 2(2\pi)^{-1/2}(T-t)^{-3/2}. Therefore,

rgg(t,g)=13!(g(3)R(t,g)g(3)R(t,g+))232π(Tt)3/2|gg+|223π(Tt)3/2.r_{gg}(t,g)=\frac{1}{3!}(\partial_{g}^{(3)}R(t,g_{-}^{\prime})-\partial_{g}^{(3)}R(t,g_{+}^{\prime}))\leq\frac{2}{3\sqrt{2\pi}(T-t)^{3/2}}\lvert g_{-}^{\prime}-g_{+}^{\prime}\rvert\leq\frac{2\sqrt{2}}{3\sqrt{\pi}(T-t)^{3/2}}.
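Both bounds are easy to test numerically from the definitions r_{t}(t,g)=\partial_{t}R(t,g)-(R(t,g)-R(t-1,g)) and r_{gg}(t,g)=\partial_{gg}R(t,g)-(R(t,g+1)+R(t,g-1)-2R(t,g)) used above. A sketch (ours), with the closed form of R from Lemma C.2:

import numpy as np
from scipy.special import erfc

T = 200.0

def R(t, g):
    return (g / 2.0 * erfc(g / np.sqrt(2.0 * (T - t)))
            - np.sqrt((T - t) / (2.0 * np.pi)) * np.exp(-g**2 / (2.0 * (T - t)))
            + np.sqrt(T / (2.0 * np.pi)))

def dR_dt(t, g):
    return np.exp(-g**2 / (2.0 * (T - t))) / (2.0 * np.sqrt(2.0 * np.pi * (T - t)))

def dR_dgg(t, g):
    return -np.exp(-g**2 / (2.0 * (T - t))) / np.sqrt(2.0 * np.pi * (T - t))

for t in [1.0, 10.0, 100.0, 190.0]:
    for g in [0.0, 0.5, 3.0, 20.0]:
        r_t = dR_dt(t, g) - (R(t, g) - R(t - 1.0, g))
        r_gg = dR_dgg(t, g) - (R(t, g + 1.0) + R(t, g - 1.0) - 2.0 * R(t, g))
        assert r_t <= np.sqrt(2.0) / (8.0 * np.sqrt(np.pi) * (T - t) ** 1.5) + 1e-12
        assert r_gg <= 2.0 * np.sqrt(2.0) / (3.0 * np.sqrt(np.pi) * (T - t) ** 1.5) + 1e-12
print("the error terms r_t and r_gg respect the bounds above at all tested points")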

Appendix E Extending the Regret Analysis for General Costs

In Section 5, we relied on the fact that the gap increments were in \{+1,-1\}. This assumption was fundamental for the version of the discrete Itô's Formula that we used (see the assumption on g_{0},\dotsc,g_{T} in the statement of Theorem 5.1). It was also required by Proposition 2.1 to connect the regret with the “discrete stochastic integral”. To extend the upper-bound on the regret of the algorithm from Section 5 to general costs we will follow the same techniques used by Harvey et al. to extend the guarantees of their algorithm to general costs: we shall use a more general version of the discrete Itô's formula, concavity of R with respect to its second argument, and a lemma relating the per-round regret to terms that appear in the more general version of the discrete Itô's formula (see Harvey et al., 2020b, Section 3.3 for details on these arguments).

As in the work of Harvey et al., we will rely on a more general version of the discrete Itô’s formula that holds for general [0,1][0,1] costs. The main issue with this general formula is that more work is needed to relate it to the regret of our player strategy.

Theorem E.1 (General Discrete Itô’s Formula, Harvey et al., 2020a, Lemma 3.13).

Let f:2f\colon\mathbb{R}^{2}\to\mathbb{R} be a function and let g0,g1,,gTg_{0},g_{1},\dotsc,g_{T}\in\mathbb{R}. Then,

f(T,gT)f(0,g0)=t=1T\displaystyle f(T,g_{T})-f(0,g_{0})=\sum_{t=1}^{T} (f(t,gt)f(t,gt1+1)+f(t,gt11)2)\displaystyle\Big{(}f(t,g_{t})-\frac{f(t,g_{t-1}+1)+f(t,g_{t-1}-1)}{2}\Big{)}
+t=1T(12fgg(t,gt1)+ft(t,gt1)).\displaystyle+\sum_{t=1}^{T}\big{(}\tfrac{1}{2}f_{gg}(t,g_{t-1})+f_{t}(t,g_{t-1})\big{)}.
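Since this identity is purely algebraic, it can be verified directly. The sketch below (ours) uses the discrete derivatives f_{t}(t,g)=f(t,g)-f(t-1,g) and f_{gg}(t,g)=f(t,g+1)+f(t,g-1)-2f(t,g), which is our reading of the conventions from Section 5, and checks the formula for an arbitrary function and gap sequence.

import numpy as np

rng = np.random.default_rng(3)
T = 50

def f(t, g):
    # any function of (t, g) works; the identity below is purely algebraic
    return np.sin(0.1 * t) * np.cos(0.3 * g) + 0.01 * t * g

g = np.concatenate([[0.0], np.cumsum(rng.uniform(-1.0, 1.0, size=T))])   # g_0, ..., g_T

lhs = f(T, g[T]) - f(0, g[0])
rhs = 0.0
for t in range(1, T + 1):
    f_gg = f(t, g[t - 1] + 1) + f(t, g[t - 1] - 1) - 2 * f(t, g[t - 1])   # discrete second derivative
    f_t = f(t, g[t - 1]) - f(t - 1, g[t - 1])                              # discrete time derivative
    rhs += f(t, g[t]) - 0.5 * (f(t, g[t - 1] + 1) + f(t, g[t - 1] - 1))
    rhs += 0.5 * f_gg + f_t
assert abs(lhs - rhs) < 1e-9
print("the general discrete Itô formula checks out for a random gap sequence")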

Fix T\in\mathbb{N}, fix gaps g_{1},\dotsc,g_{T}\in\mathbb{R}_{\geq 0}, and set g_{0}\coloneqq 0. For the remainder of this section, all results concern a game of \mathcal{A}_{q} against an oblivious adversary with gap sequence g_{0},g_{1},\dotsc,g_{T}.

For every t\in[T], define the per-round regret (at round t) by

ΔRegret(t)Regret(t)Regret(t1).\Delta_{\operatorname{Regret}}(t)\coloneqq\operatorname{Regret}(t)-\operatorname{Regret}(t-1).

Our goal in this section is to prove the following lemma.

Lemma E.2.

For every t[T]t\in[T] we have

ΔRegret(t)R(t,gt)R(t,gt1+1)+R(t,gt11)2,t[T].\Delta_{\operatorname{Regret}}(t)\leq R(t,g_{t})-\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2},\qquad\forall t\in[T]. (20)

Combining the above lemma with Theorem E.1 and the fact that RR satisfies (BHE) yields

Regret(T)\displaystyle\operatorname{Regret}(T) =t=1TΔRegret(t)R(T1,gT1)+ΔRegret(T)+t=1T1(12rgg(t,gt1)+rt(t,gt1)).\displaystyle=\sum_{t=1}^{T}\Delta_{\operatorname{Regret}}(t)\leq R(T-1,g_{T-1})+\Delta_{\operatorname{Regret}}(T)+\sum_{t=1}^{T-1}(\tfrac{1}{2}r_{gg}(t,g_{t-1})+r_{t}(t,g_{t-1})).

Since q(T,g)=[g=0](1/2)q(T,g)=[{g=0}](1/2) for any g{0,,T1}g\in\{0,\dotsc,T-1\}, we have ΔRegret(T)1/2\Delta_{\operatorname{Regret}}(T)\leq 1/2. At this point, the exact same proof of Theorem 5.6 applies and we obtain the same regret bound. Thus, it only remains to prove Lemma E.2. In order to prove Lemma E.2, we will use the following result from Harvey et al. (2020a).

Proposition E.3 (Harvey et al., 2020a, Lemma 3.14).

Let gt1g_{t-1} and gtg_{t} be the values of the gap on rounds t1t-1 and tt, respectively, and let q(t,gt1)q(t,g_{t-1}) be the probability mass put in the worst expert at round tt by the player (with q(t,0)=1/2q(t,0)=1/2). For all t1t\geq 1,

  1. 1.

    If a best expert at time t1t-1 remains a best expert at time tt, then,

    ΔRegret(t)=q(t,gt1)(gtgt1).\Delta_{\operatorname{Regret}}(t)=q(t,g_{t-1})(g_{t}-g_{t-1}).
  2. 2.

    If no best expert at time t-1 remains a best expert at time t, then g_{t}+g_{t-1}\leq 1 and

    ΔRegret(t)=gtq(t,gt1)(gt+gt1).\Delta_{\operatorname{Regret}}(t)=g_{t}-q(t,g_{t-1})(g_{t}+g_{t-1}).

We shall also make use of the following fact about concave functions.

Lemma E.4.

Let f\colon\mathbb{R}\to\mathbb{R} be a concave function and let \alpha<\beta be real numbers. Then f(x)\geq\min\{f(\alpha),f(\beta)\} for every x\in[\alpha,\beta].

Proof E.5 (Proof of Lemma E.2).

To prove (20), we will consider each one of the cases from Proposition E.3 separately.

Case 1.

In this case, (20) is equivalent to

0q(t,gt1)(gtgt1)+R(t,gt)R(t,gt1+1)+R(t,gt11)2.0\leq-q(t,g_{t-1})(g_{t}-g_{t-1})+R(t,g_{t})-\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2}. (21)

Since the first term on the right-hand side of the above inequality is linear in g_{t} and since R(t,\cdot) is concave (by (19) we know that \partial_{gg}R(t,\cdot) is negative everywhere), we conclude that the whole right-hand side is concave as a function of g_{t}. Thus, by Lemma E.4 it suffices to prove the above inequality for g_{t}\in\{g_{t-1}-1,g_{t-1}+1\} in order for it to hold for every g_{t}\in[g_{t-1}-1,g_{t-1}+1] (and g_{t} indeed lies in this interval, since the costs are in [0,1]). But for g_{t}\in\{g_{t-1}-1,g_{t-1}+1\} the right-hand side of (20) becomes exactly q(t,g_{t-1})(g_{t}-g_{t-1}), so (21) holds with equality at both endpoints.

Case 2.

In this case, (20) is equivalent to

0gt+q(t,gt1)(gt+gt1)+R(t,gt)R(t,gt1+1)+R(t,gt11)2.0\leq-g_{t}+q(t,g_{t-1})(g_{t}+g_{t-1})+R(t,g_{t})-\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2}. (22)

Again, the right-hand side of the above inequality is concave as a function of gtg_{t}. Since gt0g_{t}\geq 0 and gt+gt11g_{t}+g_{t-1}\leq 1, we know that gt[0,1gt1]g_{t}\in[0,1-g_{t-1}]. Thus, it suffices to prove the above inequality for gt{0,1gt1}g_{t}\in\{0,1-g_{t-1}\}. For gt=0g_{t}=0 we have

gt+q(t,gt1)(gt+gt1)+R(t,gt)R(t,gt1+1)+R(t,gt11)2\displaystyle-g_{t}+q(t,g_{t-1})(g_{t}+g_{t-1})+R(t,g_{t})-\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2}
=\displaystyle= q(t,gt1)(gtgt1)+R(t,gt)R(t,gt1+1)+R(t,gt11)2,\displaystyle-q(t,g_{t-1})(g_{t}-g_{t-1})+R(t,g_{t})-\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2},

and in the previous case we showed that the above is non-negative for all gt[gt11,gt1+1]g_{t}\in[g_{t-1}-1,g_{t-1}+1]. Since gt11g_{t-1}\leq 1 in this case, we have in particular that the above holds for gt=0g_{t}=0. Suppose now that gt=1gt1g_{t}=1-g_{t-1}. Since q(t,gt1)=Rg(t,gt1)q(t,g_{t-1})=R_{g}(t,g_{t-1}), we have that (22) is equivalent to

0\displaystyle 0 gt+q(t,gt1)(gt+gt1)+R(t,gt)R(t,gt1+1)+R(t,gt11)2\displaystyle\leq-g_{t}+q(t,g_{t-1})(g_{t}+g_{t-1})+R(t,g_{t})-\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2}
=gt11+q(t,gt1)+R(t,1gt1)R(t,gt1+1)+R(t,gt11)2\displaystyle=g_{t-1}-1+q(t,g_{t-1})+R(t,1-g_{t-1})-\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2}
=gt11+R(t,gt1+1)R(t,gt11)2+R(t,1gt1)R(t,gt1+1)+R(t,gt11)2\displaystyle=\begin{aligned} g_{t-1}-1&+\frac{R(t,g_{t-1}+1)-R(t,g_{t-1}-1)}{2}+R(t,1-g_{t-1})\\ -&\frac{R(t,g_{t-1}+1)+R(t,g_{t-1}-1)}{2}\end{aligned}
=gt11+R(t,1gt1)R(t,gt11).\displaystyle=g_{t-1}-1+R(t,1-g_{t-1})-R(t,g_{t-1}-1).

By the definition of RR and since erfc(z)=2erfc(z)\operatorname*{erfc}(-z)=2-\operatorname*{erfc}(z) for all zz\in\mathbb{R} (see (18)), we have

R(t,1gt1)R(t,gt11)\displaystyle R(t,1-g_{t-1})-R(t,g_{t-1}-1) =1gt12erfc(1gt12(Tt))gt112erfc(gt112(Tt))\displaystyle=\frac{1-g_{t-1}}{2}\operatorname*{erfc}\left(\frac{1-g_{t-1}}{\sqrt{2(T-t)}}\right)-\frac{g_{t-1}-1}{2}\operatorname*{erfc}\left(\frac{g_{t-1}-1}{\sqrt{2(T-t)}}\right)
=(1gt12)(erfc(1gt12(Tt))+erfc(gt112(Tt)))\displaystyle=\left(\frac{1-g_{t-1}}{2}\right)\left(\operatorname*{erfc}\left(\frac{1-g_{t-1}}{\sqrt{2(T-t)}}\right)+\operatorname*{erfc}\left(\frac{g_{t-1}-1}{\sqrt{2(T-t)}}\right)\right)
=(1gt12)2=1gt1.\displaystyle=\left(\frac{1-g_{t-1}}{2}\right)2=1-g_{t-1}.

This concludes the proof of (22), and hence of Lemma E.2.
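Both per-round inequalities can also be probed numerically. The sketch below (ours) uses the closed form of R from Lemma C.2 and q(t,g)=\frac{1}{2}(R(t,g+1)-R(t,g-1)) — our reading of the definition from Section 5 — and tests (21) and (22) on random instances.

import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(4)
T = 100.0

def R(t, g):
    return (g / 2.0 * erfc(g / np.sqrt(2.0 * (T - t)))
            - np.sqrt((T - t) / (2.0 * np.pi)) * np.exp(-g**2 / (2.0 * (T - t)))
            + np.sqrt(T / (2.0 * np.pi)))

def q(t, g):
    return 0.5 * (R(t, g + 1.0) - R(t, g - 1.0))

for _ in range(10_000):
    t = rng.uniform(1.0, T - 1.0)
    g_prev = rng.uniform(0.0, 5.0)
    avg = 0.5 * (R(t, g_prev + 1.0) + R(t, g_prev - 1.0))
    # Case 1: the leader is unchanged and g_t lies in [g_prev - 1, g_prev + 1]
    g_new = g_prev + rng.uniform(-1.0, 1.0)
    assert q(t, g_prev) * (g_new - g_prev) <= R(t, g_new) - avg + 1e-9
    # Case 2: the leader changes, so g_new + g_prev <= 1 (only possible when g_prev <= 1)
    if g_prev <= 1.0:
        g_new = rng.uniform(0.0, 1.0 - g_prev)
        assert g_new - q(t, g_prev) * (g_new + g_prev) <= R(t, g_new) - avg + 1e-9
print("inequalities (21) and (22) hold on all sampled instances")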