Non-asymptotic estimation of risk measures using stochastic gradient Langevin dynamics

Jiarui Chu (corresponding author), Princeton University, ORFE, jiaruic@princeton.edu
Ludovic Tangpi, Princeton University, ORFE, ludovic.tangpi@princeton.edu

Abstract

In this paper we study the approximation of some law-invariant risk measures. As a starting point, we approximate the average value at risk using stochastic gradient Langevin dynamics, which can be seen as a variant of the stochastic gradient descent algorithm. Further, Kusuoka's spectral representation allows us to bootstrap the estimation of the average value at risk to extend the algorithm to general law-invariant risk measures. We present both theoretical, non-asymptotic convergence rates of the approximation algorithm and numerical simulations.

Keywords Convex risk measure · Stochastic Optimization · Risk minimization · Average value at risk · Stochastic gradient Langevin

Mathematics Subject Classification 91G70 · 90C90

Statements and Declarations The authors gratefully acknowledge support from the NSF grant DMS-2005832. The authors have no competing interests to declare.

1 Introduction

Every financial decision involves some degree of risk. Quantifying the risk associated with a future random outcome allows organizations to compare financial decisions and develop risk management plans to prepare for potential loss and uncertainty. By the seminal work of Artzner, Delbaen, Eber, and Heath [3], the canonical way to quantify the riskiness of a random financial position $X$ is to compute the number $\rho(X)$ for a convex risk measure $\rho$, whose definition we recall.

Definition 1.1.

A mapping $\rho:\mathbb{L}^{\infty}\to\mathbb{R}$ is a convex risk measure if it satisfies the following conditions for all $X,Y\in\mathbb{L}^{\infty}$:

  • translation invariance: $\rho(X+m)=\rho(X)-m$ for all $m\in\mathbb{R}$

  • monotonicity: $\rho(X)\geq\rho(Y)$ if $X\leq Y$

  • convexity: $\rho(\lambda X+(1-\lambda)Y)\leq\lambda\rho(X)+(1-\lambda)\rho(Y)$ for $\lambda\in[0,1]$.

Intuitively, $\rho(X)$ measures the minimum amount of capital that should be added to the current financial position $X$ to make it acceptable. (In the rest of the paper, for notational simplicity, we will assume risk measures to be increasing. That is, we work with $\rho(-X)$. This does not restrict the generality.) Due to its fundamental importance in quantitative finance, the theory of risk measures (sometimes called quantitative risk management) has been extensively developed. We refer for instance to [18, 28, 42, 16, 45, 15, 33, 46] for a few milestones and the influential textbooks of McNeil, Frey, and Embrechts [48] and Föllmer and Schied [30] for overviews.

An important problem for risk managers in practice is to efficiently simulate the number $\rho(X)$ for a financial position $X$ and a risk measure $\rho$. The difficulty here stems from the fact that, unless $\rho$ is a "simple enough" risk measure and the law of $X$ belongs to a tractable family of distributions, there is no closed-form formula for computing $\rho(X)$. The goal of this paper is to develop a method to numerically simulate the riskiness $\rho(X)$ for general convex risk measures, including when the law of $X$ is not necessarily known (as is the case in practical applications).

One commonly used measure of the riskiness of a financial position is the value at risk (VaR). For a given risk intolerance $u\in(0,1)$, the value at risk $\mathrm{VaR}_{u}(X)$ of $X$ is the $(1-u)$-quantile of the distribution of $X$. Despite the various shortcomings of this measure of risk documented by the academic community [48], $\mathrm{VaR}$ remains the standard in the banking industry, and due to its widespread use, the computation of $\mathrm{VaR}$ has been extensively studied. We refer interested readers for instance to [35, 38, 10, 25], and references therein for various simulation techniques. Recommendations [50] from the Basel Committee on Banking Supervision, which advises on risk management for financial institutions, have revived the development of convex risk measures such as the average value at risk (AVaR), also called conditional value at risk or expected shortfall. This risk measure is the expected loss given that losses are greater than or equal to the $\mathrm{VaR}$. That is, $\mathrm{AVaR}_{u}$ is given by:

\[
\mathrm{AVaR}_{u}(X):=\mathbb{E}[X \mid X>\mathrm{VaR}_{u}(X)]. \tag{1.1}
\]

For general distributions of $X$, $\mathrm{AVaR}_{u}(X)$ usually does not have a closed-form expression. Therefore, in practice, numerical estimations are often required. As a result, the estimation of $\mathrm{AVaR}$ has received considerable attention. We refer for instance to works by Eckstein and Kupper [24] and Bühler, Gonon, Teichmann, and Wood [9] in which (among other things) the simulation of optimized certainty equivalents (of which $\mathrm{AVaR}$ is a particular case) is considered using deep learning techniques. One approximation technique for the $\mathrm{AVaR}$ is based on Monte Carlo type algorithms. In this direction, let us refer for instance to Hong and Liu [37] and Chen [13] on Monte Carlo estimation of $\mathrm{VaR}$ and $\mathrm{AVaR}$, and Zhu and Zhou [64] on nested Monte Carlo estimation. More recently, motivated by developments of gradient descent methods in stochastic optimization (in particular the stochastic gradient Langevin dynamics (SGLD) technique), Sabanis and Zhang [56] provide non-asymptotic error bounds for the estimation of $\mathrm{AVaR}$. Other works developing such gradient descent techniques in the context of risk management include Iyengar and Ma [41], Tamar, Glassner, and Mannor [59] and Soma and Yoshida [58]. Essentially, these papers take advantage of new developments in machine learning and optimization, see e.g. Allen-Zhu [2], Gelfand and Mitter [34], Nesterov [49] and Raginsky, Rakhlin, and Telgarsky [53]. Let us also mention the recent work of Reppen and Soner [54], who develop a data-driven approach based on ideas from learning theory.

In this work we go beyond the numerical simulation of $\mathrm{AVaR}$ by extending stochastic gradient descent type techniques to compute a large family of risk measures, including the $\mathrm{AVaR}$. We are interested in deriving explicit (non-asymptotic) error estimates for the approximation. We will restrict our attention to law-invariant convex risk measures (whose definition we recall below), since in practice, only the law of a financial position can be (approximately) observed. In fact, the requirement for a risk measure to be law-invariant is natural and is satisfied by most risk measures. (All risk measures considered in this work will be implicitly assumed to be convex law-invariant risk measures.)

Definition 1.2.

[32] A risk measure $\rho$ is law-invariant if for all $X,Y$ with the same distribution, we have $\rho(X)=\rho(Y)$.

To the best of our knowledge, the papers considering (non-parametric) estimation of general convex risk measures are Weber [61], Belomestny and Krätschmer [6] and Bartl and Tangpi [4]. These papers consider a (data-driven) Monte-Carlo estimation method by proposing a plug-in estimator based on the empirical measure of the historical observations of the underlying distribution of the random outcome. Weber [61] proves a large deviation theorem, and Belomestny and Krätschmer [6] provide a central limit theorem. Note that both of these papers give asymptotic estimation results. Bartl and Tangpi [4] provide sharp non-asymptotic convergence rates for the estimation.

To estimate general law-invariant convex risk measures, we rely on Kusuoka's spectral representation [46]. Intuitively, this representation says that any law-invariant risk measure can be constructed as an integral of the $\mathrm{AVaR}$ risk measure. Therefore, the first step of our approximation of general law-invariant risk measures is to estimate $\mathrm{AVaR}$. Since we would like to analyze approximation algorithms for the risk of claims with possibly non-convex payoffs, we employ the idea of Raginsky, Rakhlin, and Telgarsky [53] and use stochastic gradient Langevin dynamics which, essentially, adds Gaussian noise to the unbiased estimate of the gradient in stochastic gradient descent. To quantify the distance between the estimator and the true value of the risk measure, we present non-asymptotic rates on the mean squared estimation error both in the case of the $\mathrm{AVaR}$ and of general law-invariant risk measures. The proof of the mean squared error of estimating the $\mathrm{AVaR}$ makes use of the observation that the SGLD algorithm is a variant of the Euler-Maruyama discretization of the solution of the Langevin stochastic differential equation (SDE). This observation allows us to use results on the convergence rate of the Euler-Maruyama scheme and classical techniques for deriving the convergence rate of the solution of the Langevin SDE to the invariant measure. For the rate on the mean squared estimation error of the general case, our proof relies heavily on Kusuoka's representation, which allows one to build general law-invariant risk measures from $\mathrm{AVaR}$.

Beyond our theoretical guarantees for the convergence of approximation algorithms for general convex risk measures, the present work also contributes to the non-convex optimization literature in that we propose a new proof for the convergence of SGLD algorithms for some non-convex objective functions. The idea is essentially to reduce the problem to the analysis of contractivity properties of the semigroup originating from a Langevin diffusion with non-convex potential. This problem was notably investigated by Eberle [23].

The paper is organized as follows: We start by describing the approximation techniques and presenting the main results in Section 2. In the same section, we also present numerical results on the estimation of AVaR. In Section 3, we prove the rates on the mean squared error for the estimation of AVaR. The derivation of the mean squared error for the estimation of a general law-invariant risk measure is done in Section 4.

Notations: Let $\mathbb{N}^{\star}:=\mathbb{N}\setminus\{0\}$ and let $\mathbb{R}_{+}^{\star}$ be the set of positive real numbers. Fix an arbitrary Polish space $E$ endowed with a metric $d_{E}$. Throughout this paper, for every $p$-dimensional $E$-valued vector $e$ with $p\in\mathbb{N}^{\star}$, we denote by $e^{1},\ldots,e^{p}$ its coordinates. For $(\alpha,\beta)\in\mathbb{R}^{p}\times\mathbb{R}^{p}$, we also denote by $\alpha\cdot\beta$ the usual inner product, with associated norm $\|\cdot\|$, which we simplify to $|\cdot|$ when $p$ is equal to $1$. For any $(\ell,c)\in\mathbb{N}^{\star}\times\mathbb{N}^{\star}$, $E^{\ell\times c}$ will denote the space of $\ell\times c$ matrices with $E$-valued entries.

Let $\mathcal{B}(E)$ be the Borel $\sigma$-algebra on $E$ (for the topology generated by the metric $d_{E}$ on $E$). For any $p\geq 1$ and any two probability measures $\mu$ and $\nu$ on $(E,\mathcal{B}(E))$ with finite $p$-moments, we denote by $\mathcal{W}_{p}(\mu,\nu)$ the $p$-Wasserstein distance between $\mu$ and $\nu$, that is

\[
\mathcal{W}_{p}(\mu,\nu):=\bigg(\inf_{\alpha\in\Gamma(\mu,\nu)}\int_{E\times E}d_{E}(x,y)^{p}\,\alpha(\mathrm{d}x,\mathrm{d}y)\bigg)^{1/p},
\]

where the infimum is taken over the set $\Gamma(\mu,\nu)$ of all couplings $\alpha$ of $\mu$ and $\nu$, that is, probability measures on $\big(E^{2},\mathcal{B}(E)^{\otimes 2}\big)$ with marginals $\mu$ and $\nu$ on the first and second factors respectively.

2 Approximation technique and main results

In this section we rigorously describe the approximation method developed in this article as well as our main results. Throughout, we fix a probability space $(\Omega,\mathcal{F},\mathbb{P})$ on which all random variables will be defined, unless otherwise stated. Let us denote by $\mathbb{L}^{\infty}$ the space of essentially bounded random variables on this probability space. The starting point of our method is the following spectral representation of law-invariant risk measures due to Kusuoka [46]:

Theorem 2.1.

A mapping $\rho:\mathbb{L}^{\infty}\to\mathbb{R}$ is a law-invariant risk measure if and only if it satisfies

\[
\rho(X)=\sup_{\gamma\in\mathcal{M}}\bigg(\int_{[0,1)}\mathrm{AVaR}_{u}(X)\,\gamma(\mathrm{d}u)-\beta(\gamma)\bigg),\quad\text{for all }X\in\mathbb{L}^{\infty} \tag{2.1}
\]

for some functional $\beta:\mathcal{M}\to[0,\infty)$, where $\mathcal{M}$ is the set of all Borel probability measures on $[0,1]$.

In fact, this spectral representation suggests that the risk measure $\mathrm{AVaR}$ is the "basic building block" from which all convex law-invariant risk measures can be constructed. Thus, the idea will be to propose an approximation algorithm for $\mathrm{AVaR}$ that will later be bootstrapped to derive an algorithm for general law-invariant risk measures. This approach is also used in [4] for a very different approximation method.

2.1 Approximation of average value at risk

Let us first focus on estimating $\mathrm{AVaR}$. For this purpose, recall (see e.g. [30, Proposition 4.51]) that for every $X\in\mathbb{L}^{\infty}$ and $u\in(0,1)$, $\mathrm{AVaR}_{u}(X)$ takes the form

\[
\mathrm{AVaR}_{u}(X)=\inf_{q\in\mathbb{R}}\Big(\frac{1}{1-u}\mathbb{E}[(X-q)^{+}]+q\Big).
\]

In other words, $\mathrm{AVaR}_{u}(X)$ is nothing but the value of a stochastic optimization problem. In most financial applications, the contingent claim $X$ whose risk is assessed is of the form $X=f(r,S)$ where $S=(S^{1},\dots,S^{d})$ is a $d$-dimensional random vector of risk factors, and $r\in\mathcal{X}$, see e.g. [48, Section 2.1] for details. Note that here the space $\mathcal{X}$ can be infinite dimensional. A standard practice to approach the infinite dimensional case is to use neural networks for approximation, which leads to non-convex objective functions. Therefore, we allow $f$ to be non-convex with certain regularity conditions. A standard example arises when $X$ is the profit and loss (P&L) of an investment strategy. In this case, $r$ is the portfolio and the random vector $S$ represents (increments of) the stock prices. That is, $f(r,S):=\sum_{i=1}^{d}r_{i}(S_{1}^{i}-S^{i}_{0})$ where $S^{i}_{1}$ and $S^{i}_{0}$ are the values of the stock $i$ at times $1$ and $0$, respectively. Hence, we let $r$ be in a compact and convex set $A\subseteq\mathbb{R}^{d-1}$, and our goal will be to estimate the value of the (multi-dimensional) risk minimization problem

\begin{align}
\overline{\mathrm{AVaR}}_{u}(f) &:=\inf_{r\in A}\mathrm{AVaR}_{u}(f(r,S)) \notag\\
&=\inf_{r\in A,\,q\in\mathbb{R}}\Big(\frac{1}{1-u}\mathbb{E}\Big[\big(f(r,S)-q\big)^{+}\Big]+q\Big). \tag{2.2}
\end{align}
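As a rough numerical illustration of the variational formula for $\mathrm{AVaR}_{u}$ recalled above, the following Python sketch evaluates the empirical version of the objective for a fixed sample of $X$ and minimizes it over $q$. It is not the authors' code: the function names, the Gaussian toy sample and the brute-force search over sample points are assumptions made only for illustration. Since the empirical objective is convex and piecewise linear in $q$ with kinks at the sample points, its minimum is attained at a sample point, and when $(1-u)$ times the sample size is an integer it coincides with the average of the largest observations.

```python
import random


def ru_objective(sample, q, u):
    """Empirical version of q + E[(X - q)^+] / (1 - u)."""
    excess = sum(max(x - q, 0.0) for x in sample)
    return q + excess / ((1.0 - u) * len(sample))


def avar_by_minimization(sample, u):
    """Brute-force minimum over q; a minimizer lies at one of the sample points."""
    return min(ru_objective(sample, q, u) for q in sample)


def avar_by_tail_mean(sample, u):
    """Average of the largest (1 - u) fraction of the sample."""
    k = round((1.0 - u) * len(sample))
    return sum(sorted(sample)[-k:]) / k


rng = random.Random(0)  # fixed seed, illustrative only
toy_sample = [rng.gauss(0.0, 1.0) for _ in range(400)]  # hypothetical data
print(avar_by_minimization(toy_sample, 0.95), avar_by_tail_mean(toy_sample, 0.95))
```

The quadratic-cost search is only meant to make the minimization explicit; for large samples one would sort once and read off the tail average.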

A natural way to numerically solve such problems is by gradient descent. However, when the dataset is large, gradient descent usually does not perform well, since computing the gradient on the full dataset at each iteration is computationally expensive.

Among others, one method that has been proposed to get around the high computational cost of gradient descent is the stochastic gradient descent (SGD) algorithm, which replaces the true gradient with an unbiased estimate calculated from a random subset of the data. A more recent approach, called stochastic gradient Langevin dynamics (SGLD), injects random noise into an unbiased estimate of the gradient at each iteration of the SGD algorithm. Originally introduced by Welling and Teh [62] as a tool for Bayesian posterior sampling on large-scale and high-dimensional datasets, SGLD maintains the scalability property of SGD and has a few advantages over it: by adding noise to SGD, SGLD navigates out of saddle points and local minima more easily [7], outperforms SGD in terms of accuracy [19], and overcomes the curse of dimensionality [14]. Moreover, SGLD also applies to cases where the objective function is non-convex but sufficiently regular [34, 53].
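The difference between the two update rules can be sketched in a few lines of Python. This is a toy, one-dimensional illustration under stated assumptions, not the implementation used in the paper: the quadratic objective $\mathbb{E}[(z-S)^{2}]/2$, the function names, the minibatch size and the noise variance $2h/\lambda$ per step (the usual Euler-Maruyama scaling with step size $h$ and inverse temperature $\lambda$) are all choices made here for illustration.

```python
import math
import random


def minibatch_gradient(z, batch):
    """Unbiased estimate of the gradient of E[(z - S)^2] / 2, namely z - mean(batch)."""
    return z - sum(batch) / len(batch)


def sgd_step(z, grad, h):
    """Plain stochastic gradient descent update."""
    return z - h * grad


def sgld_step(z, grad, h, lam, rng):
    """SGD update plus Gaussian noise of variance 2 * h / lam (an assumed scaling)."""
    return z - h * grad + math.sqrt(2.0 * h / lam) * rng.gauss(0.0, 1.0)


def run_sgld(data, z0=0.0, h=0.05, lam=1e3, steps=500, batch_size=32, seed=0):
    """Iterate the SGLD update with a fresh random minibatch at every step."""
    rng = random.Random(seed)
    z = z0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        z = sgld_step(z, minibatch_gradient(z, batch), h, lam, rng)
    return z


data_rng = random.Random(1)
toy_data = [data_rng.gauss(3.0, 1.0) for _ in range(2000)]  # hypothetical data
print(run_sgld(toy_data))  # should end up near the sample mean of toy_data
```

Letting `lam` grow removes the injected noise, so the SGLD step reduces to the SGD step; smaller values of `lam` add more exploration at the cost of a noisier iterate.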

We will apply the SGLD in the present context of estimation of $\mathrm{AVaR}$. Recall that our goal is to solve the optimization problem given in equation (2.2). Let $z:=(r,q)$, and consider the (objective) function

\[
\widetilde{L}(r,q):=\frac{1}{1-u}\mathbb{E}\Big[\big(f(r,S)-q\big)^{+}\Big]+q
\]

and, given a strictly positive constant $\gamma>0$, let

\[
L(r,q):=\widetilde{L}(r,q)+\frac{\gamma}{2}\|q\|^{2},\quad\text{and }\overline{L}(r,q):=L(r,q)+\frac{\gamma}{2}\mathrm{dist}^{2}(r,A) \tag{2.3}
\]

be the usual penalized objective function, where

\[
\mathrm{dist}^{2}(r,A):=\inf_{x\in A}\|r-x\|^{2}
\]

denotes the squared distance from $r$ to the set $A$. Since for $\gamma$ small we have

\[
\inf_{(r,q)\in\mathbb{R}^{d}}\overline{L}(r,q)\approx\overline{\mathrm{AVaR}}_{u}(f),
\]

we will approximate the left-hand side above using the SGLD algorithm, which consists in approximating its minimizer by the (support of the) invariant measure of the Markov chain $(Z^{\lambda}_{m,h})$ given by

\[
Z_{m+1,h}^{\lambda}=Z_{m,h}^{\lambda}-\nabla\overline{L}(Z_{m,h}^{\lambda})h+\sqrt{2\lambda^{-1}}\xi_{m}, \tag{2.4}
\]

where $(\xi_{m})_{m\geq 1}$ are independent Gaussian random variables. In the practice of financial risk management, the distribution of $S$ is typically unknown. This is a well-studied issue in quantitative finance; we refer for instance to [5, 17, 44] and the references therein. In particular, $\nabla L$ cannot be directly computed. It will be replaced by an unbiased estimator. Following Monte Carlo simulation ideas, we let $(S^{1},\dots,S^{P})$ be independent copies of $S$ and $(\widetilde{W}^{1},\dots,\widetilde{W}^{N})$ be independent Brownian motions, and we thus let

\[
\widetilde{\ell}(z):=\frac{1}{P}\sum_{p=1}^{P}\frac{1}{1-u}(f(r,S^{p})-q)^{+}+q,\quad\ell(z):=\widetilde{\ell}(z)+\frac{\gamma}{2}\|q\|^{2},\quad\text{and }\overline{\ell}(z):=\ell(z)+\mathrm{dist}^{2}(r,A),\quad\text{with }z=(r,q)\in\mathbb{R}^{d}.
\]

In the following, we will take $P=N$ for simplicity. Put

\begin{align}
\widetilde{Z}_{m+1,h}^{\lambda,n}&=\widetilde{Z}_{m,h}^{\lambda,n}-\nabla L(\widetilde{Z}_{m,h}^{\lambda,n})h+\sqrt{2\lambda^{-1}}\Delta\widetilde{W}_{h}^{n},\quad\text{with}\quad\Delta\widetilde{W}_{h}^{n}:=\widetilde{W}^{n}_{m+1}-\widetilde{W}^{n}_{m}, \tag{2.5}\\
\widetilde{Z}_{m+1,h}^{\prime\lambda,n}&=\widetilde{Z}_{m,h}^{\prime\lambda,n}-\nabla\ell(\widetilde{Z}_{m,h}^{\prime\lambda,n})h+\sqrt{2\lambda^{-1}}\Delta\widetilde{W}_{h}^{n},\quad\text{with}\quad\Delta\widetilde{W}_{h}^{n}:=\widetilde{W}^{n}_{m+1}-\widetilde{W}^{n}_{m}, \tag{2.6}
\end{align}

and

\[
\overline{Z}_{m+1,h}^{\lambda,n}=\overline{Z}_{m,h}^{\lambda,n}-\nabla\overline{\ell}(\overline{Z}_{m,h}^{\lambda,n})h+\sqrt{2\lambda^{-1}}\Delta\overline{W}_{h}^{n},\quad\text{with}\quad\Delta\overline{W}_{h}^{n}:=\overline{W}^{n}_{m+1}-\overline{W}^{n}_{m}. \tag{2.7}
\]

We will show that

\[
\widetilde{\mathrm{AVaR}}_{u}(f):=\frac{1}{N}\sum_{n=1}^{N}\overline{\ell}(\widetilde{Z}^{\prime\lambda,n}_{M,h}) \tag{2.8}
\]

approximates $\overline{\mathrm{AVaR}}_{u}(f)$. Note that the optimal portfolio $r$ can be easily recovered. It is simply the $r$-component of $\overline{Z}^{\lambda,n}_{M,h}$, that is, its first $d-1$ coordinates since $z=(r,q)$. Similarly, the value at risk can be obtained from the Markov chain $\overline{Z}^{\lambda,n}_{M,h}$. See Remark 3.1 for details. Let us now formulate the assumptions we make on $f$ and the random vector $S$.
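A minimal Python sketch of the scheme (2.6) and the estimator (2.8), in the simplest case $f(r,S)=S$ where there is no portfolio variable and the distance term vanishes, could look as follows. Several choices here are assumptions rather than the authors' implementation: all $N$ chains share the same sample $(S^{1},\dots,S^{P})$, the Brownian increments are taken over steps of length $h$ (so the injected noise has variance $2h/\lambda$), a subgradient is used at the kinks of the empirical objective, and the parameter values are picked only so that the toy run is fast.

```python
import bisect
import math
import random


def empirical_objective(q, sorted_s, u, gamma):
    """ell(q) = (1/P) sum (S^p - q)^+ / (1 - u) + q + gamma/2 * q^2, for f(r, S) = S."""
    excess = sum(s - q for s in sorted_s if s > q)
    return excess / ((1.0 - u) * len(sorted_s)) + q + 0.5 * gamma * q * q


def empirical_gradient(q, sorted_s, u, gamma):
    """Derivative of ell in q (a subgradient choice at the kinks)."""
    n_above = len(sorted_s) - bisect.bisect_right(sorted_s, q)
    return -n_above / ((1.0 - u) * len(sorted_s)) + 1.0 + gamma * q


def sgld_avar(sample, u=0.95, gamma=1e-8, lam=1e4, h=1e-2, M=1000, N=20, seed=0):
    """Average of the empirical objective at the terminal states of N SGLD chains."""
    rng = random.Random(seed)
    sorted_s = sorted(sample)
    noise_scale = math.sqrt(2.0 * h / lam)  # assumes Brownian increments over steps of length h
    values = []
    for _ in range(N):
        q = 0.0  # hypothetical initial condition
        for _ in range(M):
            q += -h * empirical_gradient(q, sorted_s, u, gamma) + noise_scale * rng.gauss(0.0, 1.0)
        values.append(empirical_objective(q, sorted_s, u, gamma))
    return sum(values) / N


data_rng = random.Random(1)
toy_sample = [data_rng.gauss(0.0, 1.0) for _ in range(4000)]  # hypothetical data
estimate = sgld_avar(toy_sample)
k = round(0.05 * len(toy_sample))
tail_mean = sum(sorted(toy_sample)[-k:]) / k  # empirical AVaR of the same sample
print(estimate, tail_mean)
```

Comparing against the tail average of the same sample isolates the optimization error of the chains from the sampling error of the data.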

Assumption 2.2.

The random variable $S$ takes values in $\mathbb{R}^{d}$ and the function $f:\mathbb{R}^{d}\times\mathbb{R}^{d-1}\to\mathbb{R}$ is Borel measurable, and they satisfy
(i) $S$ has finite fourth moment.
(ii) The function $(r,s)\mapsto f(r,s)$ is Lipschitz-continuous and continuously differentiable.
(iii) $\inf_{r}\mathbb{E}[f(r,S)]>0$.
(iv) The random variable $\nabla_{r}f(r,S)$ is bounded, uniformly in $S$, and $\nabla_{s}f(r,\cdot)$ is Lipschitz, uniformly in $r$.
(v) Consider the function $\kappa$ defined as

\[
\kappa(u):=\inf\Big\{-\sqrt{2\lambda}\frac{(z-z^{\prime})\cdot(\nabla L(z^{\prime})-\nabla L(z))}{\|z-z^{\prime}\|},\,\,z,z^{\prime}\in\mathbb{R}^{d}:\|z-z^{\prime}\|=u\Big\}.
\]

It holds

\[
\liminf_{u\to\infty}\kappa(u)>0\quad\text{and}\quad\int_{0}^{1}u\kappa(u)^{-}\,\mathrm{d}u<\infty.
\]

Let us briefly comment on these conditions before stating the result. The integrability, regularity, and lower boundedness conditions (i)-(iii) ensure that the problem is well-posed. The boundedness condition (iv) is assumed mostly to simplify the exposition. Most of our statements will remain true if it is replaced by a suitable integrability condition. We introduce the more involved condition (v) to make up for the possible lack of convexity of the objective function $L$. This condition is by now standard when employing coupling by reflection techniques to prove contractivity of diffusion semigroups. We refer for instance to Eberle [23, 22] or the earlier work of Chen and Li [12]. Note, for instance, that this condition is automatically satisfied if $f$ is convex (since in this case $L$ is strongly convex) or when $L$ is strictly convex outside a given ball (see [23, Example 1]).
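To make the strongly convex case concrete, here is a short worked computation. It assumes that $L$ is differentiable and $m$-strongly convex for some constant $m>0$; the symbol $m$ is introduced here only for illustration and does not appear in the assumptions above.

```latex
% Assumption (for illustration): L is differentiable and m-strongly convex, m > 0.
\[
(z-z')\cdot\big(\nabla L(z)-\nabla L(z')\big)\;\geq\; m\,\|z-z'\|^{2}
\quad\Longrightarrow\quad
\kappa(u)\;\geq\;\sqrt{2\lambda}\,m\,u\;\geq\;0 ,
\]
\[
\text{so that}\quad \liminf_{u\to\infty}\kappa(u)=+\infty>0
\quad\text{and}\quad
\int_{0}^{1}u\,\kappa(u)^{-}\,\mathrm{d}u=0<\infty .
\]
```

The first implication follows by dividing the strong convexity inequality by $\|z-z'\|=u$ and multiplying by $\sqrt{2\lambda}$; the negative part $\kappa(u)^{-}$ then vanishes identically, so both requirements of condition (v) hold.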

The following is the first main result of this work:

Theorem 2.3.

Let Assumption 2.2 hold. Let $t,M,h>0$ be such that $h=\frac{t}{M^{2}}$. For all $t,\lambda>0$, $0<\gamma<1$, and $M,N\in\mathbb{N}^{\star}$, we have

\[
\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\overline{\ell}(\widetilde{Z}_{M,h}^{\prime\lambda,n})-\overline{\mathrm{AVaR}}_{u}(f)\Big|^{2}\bigg]\leq C^{1}_{(u,t,\lambda,t)}\frac{1}{N}+C^{2}_{(u,t,\lambda)}\gamma^{2}+C^{3}_{(u,t,\lambda)}h^{2}+C^{4}_{(u,\lambda)}e^{-tC^{5}_{(\lambda)}}+C^{6}_{(u)}\frac{1}{\lambda^{2}}, \tag{2.9}
\]

where the constants are given in the appendix.

Theorem 2.3 provides a non-asymptotic rate for the convergence of the estimator $\frac{1}{N}\sum_{n=1}^{N}\overline{\ell}(\widetilde{Z}^{\prime\lambda,n}_{M,h})$ to the (optimized) average value at risk. Such a rate is crucial in applications since it gives a precise order of magnitude for the choice of the parameters $M,N,\gamma$ and $\lambda$ needed to achieve a desired order of accuracy. Moreover, the rate is independent of the dimension $d$, implying in particular that the rate is not made worse when increasing the size of the portfolio $S=(S^{1},\dots,S^{d})$ (or in general the number of risk factors). Furthermore, observe that the estimator $\widetilde{\mathrm{AVaR}}_{u}(f)$ is rather easy to simulate: one only needs to simulate $N$ independent Gaussian random variables, simulate the iterative scheme (2.6) for each of them, and compute the empirical average of the outcomes. We provide numerical results on the estimation of AVaR in Section 2.3 below.
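To illustrate how a bound of the form (2.9) can guide the choice of parameters, the following Python sketch computes sizes that make each of the five terms at most a target $\varepsilon$. It rests on a strong simplifying assumption made purely for illustration: every constant in (2.9) is set to $1$, whereas the actual constants depend on $u$, $t$ and $\lambda$ and must be taken from the appendix. The function name and the returned dictionary are hypothetical.

```python
import math


def parameter_orders(eps):
    """Parameter sizes making each term of a bound of the form (2.9) at most eps,
    under the purely illustrative assumption that every constant equals 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    t = math.log(1.0 / eps)                 # exp(-t) <= eps
    h_target = math.sqrt(eps)               # h^2 <= eps
    M = math.ceil(math.sqrt(t / h_target))  # smallest M with t / M^2 <= h_target
    return {
        "N": math.ceil(1.0 / eps),          # 1 / N <= eps
        "gamma": math.sqrt(eps),            # gamma^2 <= eps
        "lam": 1.0 / math.sqrt(eps),        # 1 / lam^2 <= eps
        "t": t,
        "M": M,
        "h": t / (M * M),                   # step size tied to t and M as in the theorem
    }


print(parameter_orders(1e-4))
```

With the true constants, which themselves vary with $t$ and $\lambda$, the balance between the terms will differ, so the output should be read only as an indication of how the terms scale.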

Remark 2.4.

Observe that the method developed here can also be used (with minor changes) to simulate the value function of utility maximization problems of the form

\[
\sup_{r\in A}\mathbb{E}^{\mu}[U(f(r,S))]
\]

where $U$ is a concave utility function and $\mathbb{E}^{\mu}$ is the expectation when $S\sim\mu$, or even of the robust utility maximization problem

\[
\sup_{r\in A}\inf_{\mu\in\mathcal{P}}\mathbb{E}^{\mu}[U(f(r,S))]
\]

where $\mathcal{P}$ is the set of possible distributions of $S$. In the latter case, one will need to compute (or find an appropriate unbiased estimator of) $\nabla L$ with

\[
L(r):=\inf_{\mu\in\mathcal{P}}\mathbb{E}^{\mu}[U(f(r,S))].
\]

This is easily done for instance when $\mathcal{P}$ is a ball with respect to the Wasserstein metric around a given distribution $\mu_{0}$, see e.g. [5].

2.2 Approximation of general convex risk measures

Let us return to the problem of approximating general law-invariant convex risk measures. In this context (as in the case of $\mathrm{AVaR}$) our goal is to simulate the optimized risk measure

\[
\overline{\rho}(f):=\inf_{r\in A}\rho(f(r,S)).
\]

To that end, let us recall a notion of regularity of risk measures introduced in [4] that will be needed to derive an explicit non-asymptotic convergence rate. Recall that a random variable $X^{*}$ is said to follow the Pareto distribution with scale parameter $x>0$ and shape parameter $q>0$ if

\[
\mathbb{P}(X^{*}\geq t)=\begin{cases}(x/t)^{q} & \text{if } t\geq x,\\ 1 & \text{if } t<x.\end{cases}
\]

Definition 2.5.

[4] Let $q\in(1,\infty)$, and let $X^{*}$ follow the Pareto distribution with scale parameter $1$ and shape parameter $q$. A convex risk measure $\rho:\mathbb{L}^{\infty}\to\mathbb{R}$ is said to be $q$-regular if it satisfies

\[
\sup_{n\in\mathbb{N}}\rho(X^{*}\wedge n)<\infty.
\]
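As a quick sanity check of this definition in the case $\rho=\mathrm{AVaR}_{u}$, the Python sketch below approximates $\mathrm{AVaR}_{u}(X^{*}\wedge n)$ for increasing truncation levels $n$. It uses the standard representation of $\mathrm{AVaR}_{u}$ as the average of the quantile function over $(u,1)$ (with the increasing convention adopted in this paper) and the Pareto quantile function $(1-s)^{-1/q}$; the midpoint rule, the grid size and the function names are assumptions made for illustration.

```python
def avar_truncated_pareto(u, q, n, cells=20000):
    """Midpoint-rule approximation of AVaR_u(min(X*, n)) for X* ~ Pareto(scale 1, shape q).

    Uses AVaR_u(X) = (1 / (1 - u)) * integral over (u, 1) of the quantile function,
    with quantile (1 - s)^(-1/q) for the Pareto law; below, w stands for 1 - s.
    """
    width = (1.0 - u) / cells
    total = 0.0
    for i in range(cells):
        w = (i + 0.5) * width
        total += min(w ** (-1.0 / q), n)
    return total / cells


def avar_pareto_limit(u, q):
    """Closed form of AVaR_u(X*) for shape q > 1: q / (q - 1) * (1 - u)^(-1/q)."""
    return q / (q - 1.0) * (1.0 - u) ** (-1.0 / q)


for level in (10, 100, 1000, 1e6):
    print(level, avar_truncated_pareto(0.95, 2.0, level), avar_pareto_limit(0.95, 2.0))
```

The truncated values increase with $n$ and stay below the finite closed-form limit whenever $q>1$, which is the boundedness required by the definition.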

We refer to [4] for a discussion on this notion of regularity, but note for instance that $\mathrm{AVaR}$ is $q$-regular for all $q>1$ and that this notion of regularity is slightly stronger than the well-known Fatou property and the Lebesgue property often assumed for risk measures, see e.g. Föllmer and Schied [30]. Moreover, one consequence of $q$-regularity is the following slight refinement of Kusuoka's representation: The risk measure $\rho$ satisfies

\[
\rho(f(r,S))=\sup_{\gamma\in\mathcal{M}:\,\beta(\gamma)\leq b}\left(\int_{[0,1)}\mathrm{AVaR}_{u}(f(r,S))\,\gamma(\mathrm{d}u)-\beta(\gamma)\right) \tag{2.10}
\]

for some $b>0$, see [4, Lemma 4.4] for details. Thus, the estimator we consider for $\overline{\rho}$ is given by

\[
\widetilde{\rho}^{\delta}(f):=\operatorname*{ess\,sup}_{\gamma\in\mathcal{M}:\,\beta(\gamma)\leq b}\left(\int_{[0,\delta)}\widetilde{\mathrm{AVaR}}_{u}(f)\,\gamma(\mathrm{d}u)-\beta(\gamma)\right) \tag{2.11}
\]

for some $\delta\in(0,1)$, and where $\widetilde{\mathrm{AVaR}}_{u}(f)$ is the estimator of $\overline{\mathrm{AVaR}}_{u}(f)$ given by (2.8), which implicitly depends on $u$ through the objective functions $L$ and $\overline{\ell}$. The following theorem gives a convergence rate for the approximation of the general law-invariant convex risk measure $\overline{\rho}(f)$ by $\widetilde{\rho}^{\delta}(f)$.

Theorem 2.6.

Let $\rho$ be a $q$-regular convex risk measure with $q>1$. Let $f$ be bounded and satisfy the assumptions of Theorem 2.3. Let $h=\frac{t}{M^{2}}$. For all $t,\lambda>0$, $0<\gamma<1$, and $M,N\in\mathbb{N}^{\star}$, we have

\[
\mathbb{E}\Big[|\overline{\rho}(f)-\widetilde{\rho}^{\delta}(f)|^{2}\Big]\leq C(1-\delta)^{1/q}+C^{7}_{(\delta,t,\lambda)}\frac{1}{N}+C^{2}_{(\delta,t,\lambda)}\gamma^{2}+C^{3}_{(\delta,t,\lambda)}h^{2}+C^{4}_{(\delta,\lambda)}e^{-tC^{5}_{(\lambda)}}+C^{6}_{(\delta)}\frac{1}{\lambda^{2}},
\]

where $C^{7}_{(\delta,t,\lambda)}$ is given in the Appendix, and the constants $C^{2}_{(\delta,t,\lambda)},C^{3}_{(\delta,t,\lambda)},C^{4}_{(\delta,\lambda)}$ and $C^{6}_{(\delta)}$ correspond to those given in the Appendix, with $u$ replaced by $\delta$.

2.3 Numerical results on AVaR

Let us complement the above theoretical guarantees with empirical experiments (code available at https://github.com/jiaruic/sgld_risk_measures). We first focus on the approximation of the average value at risk and the value at risk with respect to the time evolution of the Markov chain in the SGLD algorithm. Thus, for the numerical computations, we set

\[
A=[0,1]^{d},\quad\lambda=10^{8},\quad\gamma=10^{-8},\quad h=10^{-4},\quad\text{and}\quad u=0.95.
\]

We will consider two cases in our experiments. In the first case we assume the underlying distribution to be known and use Monte–Carlo simulation, and in the second case we use real historical stock price data.

2.3.1 Monte Carlo simulation

For the Monte Carlo experiments, we set $N=5000$. Figure 1(a) shows the convergence of AVaR in the 1-dimensional case with $f(r,S)=S$, where $S$ is sampled from a Gaussian distribution. Figure 1(b) shows the estimation error, $\widetilde{\mathrm{AVaR}}_{u}-\overline{\mathrm{AVaR}}_{u}$, where $\overline{\mathrm{AVaR}}_{u}$ is the theoretical average value at risk for 1-dimensional Gaussian distributions given by

\[
\overline{\mathrm{AVaR}}_{u}=\mu+\sigma\frac{\phi(\Phi^{-1}(u))}{1-u}, \tag{2.12}
\]

where $\phi$ and $\Phi$ are, respectively, the PDF and the CDF of a standard Gaussian distribution.
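For reference, the closed form (2.12) can be evaluated with the Python standard library alone. The sketch below also cross-checks it against a midpoint-rule average of the Gaussian quantile function over $(u,1)$, which is another standard expression for the average value at risk; the function names and the grid size are assumptions made for illustration.

```python
from statistics import NormalDist


def gaussian_avar(mu, sigma, u):
    """Closed form (2.12): mu + sigma * phi(Phi^{-1}(u)) / (1 - u)."""
    std = NormalDist()
    return mu + sigma * std.pdf(std.inv_cdf(u)) / (1.0 - u)


def gaussian_avar_by_quadrature(mu, sigma, u, cells=20000):
    """Midpoint-rule average of the Gaussian quantile function over (u, 1)."""
    dist = NormalDist(mu, sigma)
    width = (1.0 - u) / cells
    return sum(dist.inv_cdf(u + (i + 0.5) * width) for i in range(cells)) / cells


print(gaussian_avar(0.0, 1.0, 0.95), gaussian_avar_by_quadrature(0.0, 1.0, 0.95))
```

Such a reference value is what an estimation error like the one in Figure 1(b) is measured against.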

Figure 1: Numerical Results for 1-d Gaussian random variables. (a) Paths of AVaR. (b) Estimation Error of AVaR.

For the multi-dimensional case, we take the function

\[
f(r,S)=\sum_{i=1}^{d}\frac{e^{r_{i}}}{\sum_{j=1}^{d}e^{r_{j}}}S_{i},\text{ for }i=1,\cdots,d.
\]

Figure 2(a) and Figure 2(b) show the convergence of VaR and AVaR in the 2-dimensional case, where $S^{1}$ is sampled from $\mathcal{N}(1,4)$ and $S^{2}$ is sampled from $\mathcal{N}(0,1)$.

Figure 2: Portfolio of two Gaussian random variables. (a) Path of VaR. (b) Path of AVaR.

2.3.2 Numerical results with real data

In this subsection, we compute AVaR for a portfolio of 106 stocks using real aggregated stock prices over 15-minute time intervals from January 2, 2015 to August 31, 2015. Among 128 NASDAQ stocks that are "sufficiently liquid", we remove the ones with missing values, and use the remaining 106 stocks. For a detailed description of the data used and for a definition of "sufficiently liquid", please refer to Section 3.2 of Pohl, Ristig, Schachermayer, and Tangpi [52]. We use changes in stock prices instead of stock prices themselves, because stock prices are highly dependent. We present paths of the estimated optimized VaR and AVaR of the portfolio of 106 stocks in Figures 3(a) and 3(b) respectively.

Figure 3: Computations on real price increments of portfolio of 106 stocks. (a) Paths of VaR. (b) Paths of AVaR.

In addition, our approach can also be easily applied to a fixed portfolio of stocks. We take 20 stocks from the 106 stocks described above, and consider a fixed portfolio of equal weights, i.e., $r_{i}=\frac{1}{20}$ for each $i$. We present paths of the estimated AVaR in Figure 4.

Figure 4: Path of AVaR for a fixed portfolio

2.4 Numerical results on general risk measures

In order to simulate general risk measures one needs to specify the penalty function β\beta, or alternatively the precise form of ρ\rho since β\beta is given by [31]

β(γ)=supX𝒜ρ[0,1]AVaRu(X)γ(du),with𝒜ρ:={X𝕃:ρ(X+m)0}.\beta(\gamma)=\sup_{X\in\mathcal{A}_{\rho}}\int_{[0,1]}\mathrm{AVaR}_{u}(X)\gamma(du),\quad\text{with}\quad\mathcal{A}_{\rho}:=\{X\in\mathbb{L}^{\infty}:\rho(X+m)\leq 0\}.

In general, the simulation of ρ~δ(f)\tilde{\rho}^{\delta}(f) as given in (2.11) will probably require introducing neural networks since it is the value of an infinite dimensional optimization problem. This will be addressed in future research. We will focus here on a case where the problem can be simplified.

In fact, denote by $\partial_{\mu}\beta$ the so-called linear functional derivative of $\beta$. It is defined as the function $\partial_{\mu}\beta:\mathcal{M}([0,1])\times[0,1]\to\mathbb{R}$ such that

$$\beta(\mu^{\prime})-\beta(\mu)=\int_{0}^{1}\int_{0}^{1}\partial_{\mu}\beta((1-\lambda)\mu+\lambda\mu^{\prime},x)\,(\mu^{\prime}-\mu)(\mathrm{d}x)\,\mathrm{d}\lambda.$$

Up to an additive constant, there exists a unique such derivative $\partial_{\mu}\beta$, see e.g. [11]. We have the following:

Proposition 2.7.

Let the assumptions of Theorem 2.6 hold and assume that $\beta$ admits a second order linear functional derivative that is jointly continuous and such that

$$\sup_{\eta_{1},\eta_{2}\in\mathbb{L}^{2}}\mathbb{E}\Big[\sup_{\mu\in\mathcal{M}([0,1])}|\partial_{\mu}\beta(\mu,\eta_{1})|+\sup_{\mu\in\mathcal{M}([0,1])}|\partial_{\mu}^{2}\beta(\mu,\eta_{1},\eta_{2})|\Big]\leq K<\infty.$$

Then, it holds

$$\begin{aligned}
&\mathbb{E}\bigg[\Big|\inf_{(x_{i})_{i=1,\dots,J}\subset[0,1]}F\Big(\frac{1}{J}\sum_{i=1}^{J}\delta_{x_{i}}\Big)-\rho(f)\Big|^{2}\bigg]\\
&\quad\leq\frac{4K^{2}}{J^{2}}+C(1-\delta)^{1/q}+C(1-\delta)^{1/q}+C^{7}_{(\delta,t,\lambda)}\frac{1}{N}+C^{2}_{(\delta,t,\lambda)}\gamma^{2}+C^{3}_{(\delta,t,\lambda)}h^{2}+C^{4}_{(\delta,\lambda)}e^{-tC^{5}_{(\lambda)}}+C^{6}_{(\delta)}\frac{1}{\lambda^{2}},
\end{aligned}$$

with $F(\mu):=\int_{0}^{1}\widetilde{\mathrm{AVaR}}_{u}(f)\,\mu(\mathrm{d}u)-\beta(\mu)$ (recall Equation (2.8)).

This is a direct consequence of Theorem 2.6 and [39, Theorem 2.4]. The proof is omitted.

As an illustrative example, we consider the so-called entropic value-at-risk introduced by Ahmadi-Javid [1] and studied e.g. by Pichler and Schlotter [51] and Föllmer and Knispel [27] in connection to large portfolio asymptotics. This is a risk measure based on the Rényi entropy given by

$$\rho_{u}(X)=\sup\Big\{\mathbb{E}[ZX]:Z\geq 0,\ \mathbb{E}[Z]=1,\ H_{q}(Z)\leq\log\frac{1}{1-u}\Big\}$$

with $H_{q}(Z):=\frac{1}{q-1}\log\mathbb{E}Z^{q}$ for $q\in\mathbb{R}^{*}_{+}\setminus\{1\}$. The associated penalty function takes the form

$$\beta(\gamma):=\begin{cases}0&\text{if }\int_{0}^{1}\sigma_{\gamma}(x)^{q}\,\mathrm{d}x\leq\big(\frac{1}{1-u}\big)^{q-1},\\ +\infty&\text{else,}\end{cases}\quad\text{with}\quad\sigma_{\gamma}(x):=\int_{0}^{x}\frac{1}{1-v}\,\gamma(\mathrm{d}v).$$

To numerically compute the entropic value-at-risk, for a large $k$, we simulate

$$\sup_{(x_{i})_{i=1,\dots,J}\subset[0,1]}F\Big(\frac{1}{J}\sum_{i=1}^{J}\delta_{x_{i}}\Big):=\sup_{(x_{i})_{i=1,\dots,J}\subset[0,1]}\bigg(\frac{1}{J}\sum_{i=1}^{J}\widetilde{\mathrm{AVaR}}_{x_{i}}(f)-k\Big\{\int_{0}^{1}\Big(\frac{1}{J}\sum_{i=1}^{J}\frac{1}{1-x_{i}}1_{\{x_{i}\leq x\}}\Big)^{q}\mathrm{d}x-\Big(\frac{1}{1-u}\Big)^{q-1}\Big\}^{+}\bigg).$$
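The penalty term in this objective involves the integral of a step function, which can be evaluated exactly once the points are sorted. The following is a minimal sketch under that observation; the function names `sigma_q_integral` and `penalty` are hypothetical, and the points are assumed to lie in $[0,1)$ so that $1/(1-x_i)$ is finite.

```python
import numpy as np

def sigma_q_integral(x, q):
    """Exact value of int_0^1 sigma(s)^q ds for the step function
    sigma(s) = (1/J) * sum_i 1/(1 - x_i) * 1{x_i <= s}, with all x_i in [0, 1)."""
    x = np.sort(np.asarray(x, dtype=float))
    J = x.size
    levels = np.cumsum(1.0 / (1.0 - x)) / J   # value of sigma on [x_(i), x_(i+1))
    widths = np.diff(np.append(x, 1.0))       # lengths of those intervals
    return float(np.sum(levels ** q * widths))

def penalty(x, q, u, k):
    """k * { int_0^1 sigma^q ds - (1/(1-u))^(q-1) }^+ for the points x."""
    excess = sigma_q_integral(x, q) - (1.0 / (1.0 - u)) ** (q - 1.0)
    return k * max(excess, 0.0)

# Illustrative call on one random set of points (parameter values are examples).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=5000)
print(penalty(points, q=1.00001, u=0.95, k=1e18))
```

On $[0,x_{(1)})$ the step function vanishes, so that interval contributes nothing for $q>0$ and is omitted from the sum.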

For Monte Carlo simulation, we set $J=5000$, $k=10^{18}$, $q=1.00001$, and estimate the supremum over $(x_{i})_{i=1,\dots,J}\subset[0,1]$ by the maximum over 5000 random partitions of the interval $[0,1]$, each consisting of $J$ points. Figure 5(a) shows the convergence of the entropic value-at-risk in the one-dimensional case with $f(r,S)=S$, where $S$ is sampled from $\mathcal{N}(1,2)$. Figure 5(b) shows the estimation error compared to the theoretical entropic value-at-risk for $\mathcal{N}(1,2)$, given by $\rho_{u}(X)=1+\sqrt{-2\log((1-u)^{2})}$.

Figure 5: Numerical Results Entropic Value-at-risk. (a) Paths of Entropic Value-at-risk. (b) Estimation Error of Entropic Value-at-risk.
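The Gaussian benchmark can be checked numerically. The sketch below assumes that $\mathcal{N}(1,2)$ denotes mean 1 and variance 2, and uses the classical (Shannon-entropy, i.e. $q\to 1$) form of the entropic value-at-risk, $\inf_{z>0}z^{-1}\log\big(M_X(z)/(1-u)\big)$ with $M_X$ the moment generating function. For a Gaussian this has the closed form $\mu+\sigma\sqrt{-2\log(1-u)}$, which for $\mu=1$, $\sigma^2=2$ coincides with $1+\sqrt{-2\log((1-u)^2)}$. The function names and the grid are illustrative choices.

```python
import numpy as np

def evar_gaussian_closed_form(mu, sigma, u):
    """Closed-form entropic value-at-risk of N(mu, sigma^2) at level u."""
    return mu + sigma * np.sqrt(-2.0 * np.log(1.0 - u))

def evar_gaussian_numeric(mu, sigma, u, z_grid):
    """Grid minimization of (1/z) * log(M_X(z) / (1 - u)) over z > 0,
    using M_X(z) = exp(mu * z + sigma^2 * z^2 / 2) for a Gaussian."""
    values = mu + 0.5 * sigma**2 * z_grid - np.log(1.0 - u) / z_grid
    return values.min()

u = 0.95
mu, sigma = 1.0, np.sqrt(2.0)              # assumption: variance 2
z_grid = np.linspace(1e-3, 20.0, 200_001)  # illustrative grid for z

closed = evar_gaussian_closed_form(mu, sigma, u)
numeric = evar_gaussian_numeric(mu, sigma, u, z_grid)
alt = 1.0 + np.sqrt(-2.0 * np.log((1.0 - u) ** 2))  # equivalent form
print(closed, numeric, alt)
```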

3 Rates for the average value at risk

This section is dedicated to the proof of Theorem 2.3. We start with some preliminary considerations that introduce the ideas used in the proofs. The details of the proofs are given in Subsection 3.2.

3.1 Preliminaries

The starting point of our method is to recognize (2.6) as the $m$-th step of the Euler-Maruyama scheme that discretizes the stochastic differential equation

$$\mathrm{d}Z_{t}^{\lambda}=-\nabla L(Z_{t}^{\lambda})\,\mathrm{d}t+\sqrt{2\lambda^{-1}}\,\mathrm{d}W_{t},\quad Z_{0}^{\lambda}=z\in\mathbb{R}^{d} \tag{3.1}$$

where $W$ is a $d$-dimensional Brownian motion. This SDE is the Langevin SDE, with inverse temperature parameter $\lambda$. The Langevin SDE is widely studied in physics [57] and for the sampling of Gibbs distributions via Markov chain Monte Carlo methods [21]. Equipping the probability space $(\Omega,\mathcal{F},\mathbb{P})$ with the $\mathbb{P}$-completion of the filtration of $W$, the equation (3.1) admits a unique strong solution. It is well known that this solution has a unique invariant measure, denoted by $\mu_{\infty}^{\lambda}$, whose density reads (as usual, in this article we use the same notation for a probability measure on $\mathbb{R}^{n}$, for any $n\in\mathbb{N}$, and for its density function)

$$\mu_{\infty}^{\lambda}(x)=\frac{e^{-\lambda L(x)}}{\int_{\mathbb{R}^{d}}e^{-\lambda L(z)}\,\mathrm{d}z}, \tag{3.2}$$

see e.g. [47, Lemma 2.1]. In this work, the interest of the Langevin equation (aside from its analytical tractability) stems from the fact that the limiting measure $\mu_{\infty}$ of $\mu_{\infty}^{\lambda}$ as $\lambda\to\infty$ concentrates on the minimizers of $L$, which we will show exist. This follows from results of Hwang [40]. Intuitively, this means that if $(r^{*},q^{*})$ is the minimizer of $L$, then for $\lambda\to\infty$

$$\int_{\mathbb{R}^{d}}L(z)\,\mu_{\infty}^{\lambda}(\mathrm{d}z)\approx L(r^{*},q^{*}). \tag{3.3}$$

Moreover, the Langevin equation allows us to exploit classical techniques in order to derive explicit convergence rates to the invariant measure in the present non–convex potential case.
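To make the link between the Euler-Maruyama scheme and the Gibbs measure concrete, here is a minimal sketch on a toy potential. It assumes $L(z)=z^{2}/2$ in dimension one (not the loss function of this paper), for which the invariant density proportional to $e^{-\lambda L}$ is Gaussian with variance $1/\lambda$. The function name `euler_langevin` and all parameter values are illustrative.

```python
import numpy as np

def euler_langevin(grad, z0, lam, h, n_steps, rng):
    """Euler-Maruyama steps z <- z - h * grad(z) + sqrt(2 h / lam) * xi,
    run in parallel for every entry of z0 (independent chains)."""
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        xi = rng.standard_normal(z.shape)      # Brownian increment / sqrt(h)
        z = z - h * grad(z) + np.sqrt(2.0 * h / lam) * xi
    return z

rng = np.random.default_rng(0)
lam, h, n_steps, n_chains = 50.0, 0.01, 2000, 20000

# Toy potential L(z) = z^2 / 2, so grad L(z) = z; all chains start at z = 1.
z_final = euler_langevin(lambda z: z, np.ones(n_chains), lam, h, n_steps, rng)

print("sample mean:", z_final.mean())
print("sample variance * lambda:", z_final.var() * lam)  # close to 1
```

As $\lambda$ grows, the variance $1/\lambda$ shrinks and the samples concentrate around the minimizer $z=0$ of the toy potential, which is the mechanism behind (3.3).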

Remark 3.1.

One interesting byproduct of our method is that the simulation of $\mathrm{AVaR}$ directly allows one to compute the value at risk and the optimal portfolios, as well as to derive non-asymptotic rates. Let us illustrate this on the problem of simulating the optimal portfolios $r^{*}$ in Equation 2.2. As observed above, $\mu^{\lambda}_{\infty}$ converges to a measure $\mu$ supported on the optimal portfolios. Now, let $G:\mathbb{R}^{d}\to\mathbb{R}$ be a strictly convex function such that the gradient $\nabla G$ is invertible. Then, by Taylor's expansion we have

G(Z¯M,hλ,n)G(q,r)G(K)(Z¯M,hλ,n(q,r))G(\overline{Z}^{\lambda,n}_{M,h})-G(q^{*},r^{*})\geq\nabla G(K)(\overline{Z}^{\lambda,n}_{M,h}-(q^{*},r^{*}))

for some random variable KK, showing that

Z~M,hλ,n(q,r)G(K)1|G(Z~M,hλ,n)G(q,r)|.\|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}-(q^{*},r^{*})\|\leq\|\nabla G(K)^{-1}\||G(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-G(q^{*},r^{*})|.

Therefore, provided that the inverse of $\nabla G$ does not grow too fast, the argument we give below to prove Theorem 2.3 would also allow us to derive theoretical guarantees for the optimal portfolio, replacing $L$ by $G$.

3.2 Proof of Theorem 2.3

Throughout this section we assume that the assumptions of Theorem 2.3 are satisfied. We split the proof into several intermediate lemmas. The first one, which is probably well known, asserts that the optimization problem defining $\overline{\mathrm{AVaR}}(f)$ admits a solution.

Lemma 3.2.

The function L~\widetilde{L} defined in Equation 2.3 admits a minimum.

Proof.

In [40, Proposition 2.1], Hwang gives a sufficient condition for $\widetilde{L}$ to admit a minimum: the family $\{\mu_{\infty}^{\lambda}\}_{\lambda>0}$ is tight. A sufficient condition for the tightness of $\{\mu_{\infty}^{\lambda}\}_{\lambda>0}$ is that there exists $\varepsilon>0$ such that the set $B:=\{z\in\mathbb{R}^{d}:\widetilde{L}(z)\leq\varepsilon\}$ is compact [40, Proposition 2.3]. The rest of the proof checks the compactness of the set $B$ for any $\varepsilon>0$.

Since $\widetilde{L}$ is continuous, the set $B=\{z\in\mathbb{R}^{d}:\widetilde{L}(z)\leq\varepsilon\}$ is closed as the pre-image of the closed set $(-\infty,\varepsilon]$. In addition, since $\frac{1}{1-u}(x)^{+}>x$ for $|x|$ large enough, we have that $B=\{z\in\mathbb{R}^{d}:\frac{1}{1-u}\mathbb{E}[(f(r,S)-q)^{+}]+q\leq\varepsilon\}$ is bounded. To see this, assume to the contrary that $B$ is unbounded. Then there exists a sequence $\{z_{i}\}\subset B$ such that $\|z_{i}\|\to\infty$. Then for the subsequence of $\{z_{i}\}$ with $\frac{1}{1-u}(f(r,S)-q)^{+}>f(r,S)$, we have $\widetilde{L}(z_{i})\to\infty$, which contradicts $\widetilde{L}(z_{i})\leq\varepsilon$. Thus, the set $B$ is bounded, and is therefore compact. ∎
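The existence of a minimizer can be illustrated on an empirical version of the objective $q\mapsto q+\frac{1}{1-u}\mathbb{E}[(X-q)^{+}]$. The sketch below assumes a fixed portfolio, so that only the scalar $q$ is optimized, and uses standard normal samples as a hypothetical stand-in for $f(r,S)$. Since the empirical objective is convex and piecewise linear in $q$, its minimum is attained at a sample point, and the minimal value equals the average of the largest $(1-u)n$ observations.

```python
import numpy as np

def ru_objective(q, losses, u):
    """Empirical objective q + E[(X - q)^+] / (1 - u) for a fixed portfolio."""
    return q + np.mean(np.maximum(losses - q, 0.0)) / (1.0 - u)

rng = np.random.default_rng(1)
losses = rng.normal(size=2000)   # hypothetical samples of f(r, S), r fixed
u = 0.95

# The objective is piecewise linear in q with kinks at the sample points,
# so it is enough to search over the samples themselves.
values = np.array([ru_objective(q, losses, u) for q in losses])
q_star = losses[values.argmin()]   # an empirical value at risk
avar_ru = values.min()             # empirical AVaR via minimization

k = int(round((1.0 - u) * losses.size))
avar_tail = np.sort(losses)[-k:].mean()   # average of the k largest samples
print(q_star, avar_ru, avar_tail)
```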

To derive the claimed convergence rate, we decompose the expected error into terms that will be handled independently. First, we will exploit the approximation (3.3). Next, using the $N$ independent Brownian motions $\widetilde{W}^{n}$ introduced just before (2.6), we construct $N$ i.i.d. copies $\widehat{Z}_{t}^{\lambda,n}$ of the solution of the Langevin equation as solutions of the SDEs

$$\mathrm{d}\widehat{Z}^{\lambda,n}_{t}=-\nabla L(\widehat{Z}^{\lambda,n}_{t})\,\mathrm{d}t+\sqrt{2\lambda^{-1}}\,\mathrm{d}\widetilde{W}^{n}_{t},\quad n=1,\dots,N. \tag{3.4}$$

Recall that

$$\begin{aligned}
\widetilde{Z}_{m+1,h}^{\lambda,n}&=\widetilde{Z}_{m,h}^{\lambda,n}-\nabla L(\widetilde{Z}_{m,h}^{\lambda,n})h+\sqrt{2\lambda^{-1}}\Delta\widetilde{W}_{h}^{n},\\
\widetilde{Z'}_{m+1,h}^{\lambda,n}&=\widetilde{Z'}_{m,h}^{\lambda,n}-\nabla\ell(\widetilde{Z'}_{m,h}^{\lambda,n})h+\sqrt{2\lambda^{-1}}\Delta\widetilde{W}_{h}^{n},
\end{aligned}\quad\text{with}\quad\Delta\widetilde{W}_{h}^{n}:=\widetilde{W}^{n}_{m+1}-\widetilde{W}^{n}_{m},$$

and

$$\overline{Z}_{m+1,h}^{\lambda,n}=\overline{Z}_{m,h}^{\lambda,n}-\nabla\overline{\ell}(\overline{Z}_{m,h}^{\lambda,n})h+\sqrt{2\lambda^{-1}}\Delta\overline{W}_{h}^{n},\quad\text{with}\quad\Delta\overline{W}_{h}^{n}:=\overline{W}^{n}_{m+1}-\overline{W}^{n}_{m}.$$

We decompose the error as

$$\begin{aligned}
&\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\overline{\ell}(\overline{Z}^{\lambda,n}_{M,h})-\overline{\mathrm{AVaR}}_{u}(f)\Big|^{2}\bigg]\\
&\quad\leq 2^{6}\,\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\overline{\ell}(\overline{Z}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}\ell(\widetilde{Z'}^{\lambda,n}_{M,h})\Big|^{2}\bigg]+2^{6}\,\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\ell(\widetilde{Z'}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}L(\widetilde{Z}^{\lambda,n}_{M,h})\Big|^{2}\bigg]\\
&\qquad+2^{6}\,\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}L(\widetilde{Z}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}L(\widehat{Z}^{\lambda,n}_{t})\Big|^{2}\bigg]+2^{6}\,\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}L(\widehat{Z}^{\lambda,n}_{t})-\mathbb{E}[L(Z_{t}^{\lambda})]\Big|^{2}\bigg]+2^{6}\Big|\mathbb{E}[L(Z^{\lambda}_{t})]-\int_{\mathbb{R}^{d}}L\,\mathrm{d}\mu^{\lambda}_{\infty}\Big|^{2}\\
&\qquad+2^{6}\Big|\int_{\mathbb{R}^{d}}L\,\mathrm{d}\mu_{\infty}^{\lambda}-\int_{\mathbb{R}^{d}}\widetilde{L}\,\mathrm{d}\mu_{\infty}^{\lambda}\Big|^{2}+2^{6}\Big|\int_{\mathbb{R}^{d}}\widetilde{L}\,\mathrm{d}\mu_{\infty}^{\lambda}-\overline{\mathrm{AVaR}}_{u}(f)\Big|^{2}.
\end{aligned} \tag{3.5}$$

The rest of the proof consists in controlling each term above separately.
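The constant $2^{6}$ in (3.5) can be justified by an elementary inequality. As a sketch (the shorthand $a_{1},\dots,a_{7}$ for the seven successive differences is introduced here only for illustration), the total error is the telescoping sum $a_{1}+\dots+a_{7}$, and the Cauchy-Schwarz inequality gives

```latex
\Big|\sum_{i=1}^{7}a_{i}\Big|^{2}\;\leq\;7\sum_{i=1}^{7}|a_{i}|^{2}\;\leq\;2^{6}\sum_{i=1}^{7}|a_{i}|^{2}.
```

Taking expectations term by term then yields a bound of the form (3.5); the constant $2^{6}$ is valid but not sharp.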

Lemma 3.3.

Under the conditions of Theorem 2.3, for all $t$, $0<\gamma<1$, $u\in(0,1)$, $N,M\in\mathbb{N}^{\star}$ and $\lambda>1$, we have

$$\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\overline{\ell}(\overline{Z}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}\ell(\widetilde{Z'}^{\lambda,n}_{M,h})\Big|^{2}\bigg]\leq C^{\prime 1}_{(u,t,\lambda)}h^{2}+C^{\prime 2}_{(u,t,\lambda)}\gamma^{2},$$

where $C^{\prime 1}_{(u,t,\lambda)}$ and $C^{\prime 2}_{(u,t,\lambda)}$ are given in equations (5.1) and (5.2).

Proof.

Let $\overline{Z}^{\lambda,n,d-1}_{M,h}$ denote the last $d-1$ coordinates of $\overline{Z}^{\lambda,n}_{M,h}$. By the definition of $\overline{\ell}$ and Jensen's inequality, we have

$$\begin{aligned}
&\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\overline{\ell}(\overline{Z}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}\ell(\widetilde{Z'}^{\lambda,n}_{M,h})\Big|^{2}\bigg]\\
&\quad=\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\Big\{\ell(\overline{Z}^{\lambda,n}_{M,h})+\frac{\gamma}{2}\mathrm{dist}^{2}(\overline{Z}^{\lambda,n,d-1}_{M,h},A)-\ell(\widetilde{Z'}^{\lambda,n}_{M,h})\Big\}\Big|^{2}\bigg]\\
&\quad\leq C\bigg(\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}\Big[\Big(\ell(\overline{Z}^{\lambda,n}_{M,h})-\ell(\widetilde{Z'}^{\lambda,n}_{M,h})\Big)^{2}\Big]+\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\mathbb{E}\,\mathrm{dist}^{4}(\overline{Z}^{\lambda,n,d-1}_{M,h},A)\bigg).
\end{aligned} \tag{3.6}$$

For the first term in (3.6), using the definition of $\ell$, we have

$$\begin{aligned}
\mathbb{E}\Big[\Big(\ell(\overline{Z}^{\lambda,n}_{M,h})-\ell(\widetilde{Z'}^{\lambda,n}_{M,h})\Big)^{2}\Big]
&=\mathbb{E}\Big[\Big(\widetilde{\ell}(\overline{Z}^{\lambda,n}_{M,h})-\widetilde{\ell}(\widetilde{Z'}^{\lambda,n}_{M,h})\Big)^{2}\Big]+\frac{\gamma^{2}}{2}\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{M,h}-\widetilde{Z'}^{\lambda,n}_{M,h}\big\|^{2}\Big]\\
&\leq\bigg(\Big(\frac{2}{1-u}\Big)^{2}+\frac{\gamma^{2}}{2}\bigg)\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{M,h}-\widetilde{Z'}^{\lambda,n}_{M,h}\big\|^{2}\Big],
\end{aligned} \tag{3.7}$$

where we used that $\widetilde{\ell}$ is $\frac{2}{1-u}$-Lipschitz in the last step.

To control 𝔼[Z¯M,hλ,nZ~M,hλ,n2]\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{2}\Big{]}, using the definitions of Z¯M,hλ,n\overline{Z}^{\lambda,n}_{M,h} and Z~M,hλ,n\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}, we have

𝔼[Z¯M,hλ,nZ~M,hλ,n2]C𝔼[Z¯M1,hλ,nZ~M1,hλ,n2]+Ch2𝔼[¯(Z¯M1,hλ,n)(Z~M1,hλ,n)2].\displaystyle\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{2}\Big{]}\leq C\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{M-1,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h}\big{\|}^{2}\Big{]}+Ch^{2}\mathbb{E}\Big{[}\big{\|}\nabla\overline{\ell}(\overline{Z}^{\lambda,n}_{M-1,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h})\big{\|}^{2}\Big{]}.

Using this recursive relationship, it can be checked by induction that we have

𝔼[Z¯M,hλ,nZ~M,hλ,n2]Cm=0M1h2𝔼[¯(Z¯m,hλ,n)(Z~m,hλ,n)2].\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{2}\Big{]}\leq C\sum_{m=0}^{M-1}h^{2}\mathbb{E}\Big{[}\big{\|}\nabla\overline{\ell}(\overline{Z}^{\lambda,n}_{m,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})\big{\|}^{2}\Big{]}. (3.8)

Note that the derivative of dist2(r,A)\text{dist}^{2}(r,A) is 2(rA(r)),2(r-A(r)), where we denote A(r):=argmindist2(r,A)A(r):=\operatorname*{arg\,min}{\text{dist}^{2}(r,A)}. In addition, A(r)=1\nabla A(r)=1. Therefore, ¯\nabla\overline{\ell} is (11u+2)\Big{(}\frac{1}{1-u}+2\Big{)}- Lipschitz. By adding and then subtracting ¯(Z~m,hλ,n)\nabla\overline{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h}), and then using the definitions of ¯\nabla\overline{\ell} and \nabla\ell, we have

𝔼[¯(Z¯m,hλ,n)(Z~m,hλ,n)2]\displaystyle\mathbb{E}\Big{[}\big{\|}\nabla\overline{\ell}(\overline{Z}^{\lambda,n}_{m,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})\big{\|}^{2}\Big{]}
\displaystyle\leq C𝔼[¯(Z¯m,hλ,n)¯(Z~m,hλ,n)2]+C𝔼[¯(Z~m,hλ,n)(Z~m,hλ,n)2]\displaystyle C\mathbb{E}\Big{[}\big{\|}\nabla\overline{\ell}(\overline{Z}^{\lambda,n}_{m,h})-\nabla\overline{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})\big{\|}^{2}\Big{]}+C\mathbb{E}\Big{[}\big{\|}\nabla\overline{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})\big{\|}^{2}\Big{]}
\displaystyle\leq (1(1u)2+2)𝔼[Z¯m,hλ,nZ~m,hλ,n2]+C𝔼[Z~m,hλ,nA(Z~m,hλ,n)2].\displaystyle\Big{(}\frac{1}{(1-u)^{2}}+2\Big{)}\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{m,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{m,h}\big{\|}^{2}\Big{]}+C\mathbb{E}\Big{[}\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{m,h}-A(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})\big{\|}^{2}\Big{]}. (3.9)

Combining equations (3.8) and (3.9), we have

𝔼[Z¯M1,hλ,nZ~M1,hλ,n2]Ch2m=0M1{(1(1u)2+2)𝔼[Z¯m,hλ,nZ~m,hλ,n2]+𝔼[Z~m,hλ,n2]+𝔼[A(Z~m,hλ,n)2]}.\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{M-1,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h}\big{\|}^{2}\Big{]}\leq Ch^{2}\sum_{m=0}^{M-1}\Bigg{\{}\Big{(}\frac{1}{(1-u)^{2}}+2\Big{)}\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{m,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{m,h}\big{\|}^{2}\Big{]}+\mathbb{E}\Big{[}\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{m,h}\big{\|}^{2}\Big{]}+\mathbb{E}\Big{[}\big{\|}A(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})\big{\|}^{2}\Big{]}\Bigg{\}}.

Using the discrete version of the Grönwall’s inequality [36, Proposition 5], we have

𝔼[Z¯M1,hλ,nZ~M1,hλ,n2]Ch2M(𝔼[Z~M,hλ,n2]+𝔼[A(Z~M,hλ,n)2])exp(C(1(1u)2+2)Mh2).\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{M-1,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h}\big{\|}^{2}]\leq Ch^{2}M\bigg{(}\mathbb{E}\Big{[}\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{2}\Big{]}+\mathbb{E}\Big{[}\big{\|}A(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\big{\|}^{2}\Big{]}\bigg{)}\exp\Bigg{(}C\bigg{(}\frac{1}{(1-u)^{2}}+2\bigg{)}Mh^{2}\Bigg{)}. (3.10)

For the second term in (3.6), we rewrite as

γ2Nn=1N𝔼dist4(Z¯M,hλ,n,d1,A)=γ2Nn=1N𝔼[Z¯M,hλ,n,d1A(Z¯M,hλ,n,d1)4]γ2Nn=1N𝔼Z¯M,hλ,n4+Cγ2,\displaystyle\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\mathbb{E}\text{dist}^{4}(\overline{Z}^{\lambda,n,d-1}_{M,h},A)=\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n,d-1}_{M,h}-A(\overline{Z}^{\lambda,n,d-1}_{M,h})\big{\|}^{4}\Big{]}\leq\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\mathbb{E}\big{\|}\overline{Z}^{\lambda,n}_{M,h}\big{\|}^{4}+C\gamma^{2}, (3.11)

where the last step follows because the set AA is compact.

It remains to bound the fourth moments of $\overline{Z}^{\lambda,n}_{M,h}$ and $\widetilde{Z'}^{\lambda,n}_{M,h}$. Using the definition of $\overline{Z}^{\lambda,n}_{m+1,h}$, and letting $C_{d}=\big(\frac{d}{2}+1\big)\big(\frac{d}{2}\big)$, we have

$$\begin{aligned}
\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{m,h}\big\|^{4}\Big]
&=\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{m-1,h}-h\nabla\overline{\ell}(\overline{Z}^{\lambda,n}_{m-1,h})+\sqrt{2\lambda^{-1}}\Delta\overline{W}^{n}_{h}\big\|^{4}\Big]\\
&\leq C\bigg(\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{m-1,h}\big\|^{4}\Big]+h^{4}\mathbb{E}\Big[|\nabla\overline{\ell}(\overline{Z}^{\lambda,n}_{m-1,h})|^{4}\Big]+\frac{h^{2}}{\lambda^{2}}\Big(\frac{d}{2}+1\Big)\Big(\frac{d}{2}\Big)\bigg)\\
&\leq C\bigg(\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{m-1,h}\big\|^{4}\Big]+h^{4}\mathbb{E}\Big[\big|\nabla\ell(\overline{Z}^{\lambda,n}_{m-1,h})\big|^{4}\Big]+h^{4}\mathbb{E}\Big[\big\|2(\overline{Z}^{\lambda,n}_{m-1,h}-A(\overline{Z}^{\lambda,n}_{m-1,h}))\big\|^{4}\Big]+\frac{h^{2}}{\lambda^{2}}C_{d}\bigg)\\
&\leq C\bigg(\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{m-1,h}\big\|^{4}\Big]+h^{4}\mathbb{E}\bigg[\Big\|\frac{2}{1-u}+\gamma\overline{Z}^{\lambda,n}_{m-1,h}\Big\|^{4}\bigg]+h^{4}\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{m-1,h}\big\|^{4}\Big]+h^{4}\mathbb{E}\Big[\big\|A(\overline{Z}^{\lambda,n}_{m-1,h})\big\|^{4}\Big]+\frac{h^{2}}{\lambda^{2}}C_{d}\bigg)\\
&\leq C\bigg((1+\gamma^{4}h^{4}+h^{4})\mathbb{E}\Big[\big\|\overline{Z}^{\lambda,n}_{m-1,h}\big\|^{4}\Big]+\frac{h^{4}}{(1-u)^{4}}+h^{4}+\frac{h^{2}}{\lambda^{2}}C_{d}\bigg),
\end{aligned}$$

where we used the compactness of AA in the last step. Using this recursive relationship, it can be checked by induction that for all mm\in\mathbb{N}, we have

𝔼[Z¯m,hλ,n4]\displaystyle\mathbb{E}\Big{[}\big{\|}\overline{Z}^{\lambda,n}_{m,h}\big{\|}^{4}\Big{]} C((1+γ4h4+h4)m𝔼[Z¯0,hλ,n4]+(h4(1u)4+h4+h2λ2Cd)i=1m(1+γ4h4+h4)i)\displaystyle\leq C\bigg{(}(1+\gamma^{4}h^{4}+h^{4})^{m}\mathbb{E}[\big{\|}\overline{Z}^{\lambda,n}_{0,h}\big{\|}^{4}]+\Big{(}\frac{h^{4}}{(1-u)^{4}}+h^{4}+\frac{h^{2}}{\lambda^{2}}C_{d}\Big{)}\sum_{i=1}^{m}(1+\gamma^{4}h^{4}+h^{4})^{i}\bigg{)}
C((1+γ4h4+h4)m+(h4(1u)4+h4+h2λ2Cd)(1+γ4h4+h4)((1+γ4h4+h4)m1)(1+γ4h4+h4)1)\displaystyle\leq C\bigg{(}(1+\gamma^{4}h^{4}+h^{4})^{m}+\Big{(}\frac{h^{4}}{(1-u)^{4}}+h^{4}+\frac{h^{2}}{\lambda^{2}}C_{d}\Big{)}\frac{(1+\gamma^{4}h^{4}+h^{4})((1+\gamma^{4}h^{4}+h^{4})^{m}-1)}{(1+\gamma^{4}h^{4}+h^{4})-1}\bigg{)}
C((1+γ4h4+h4)m+(h4(1u)4+h4+h2λ2Cd)(1+γ4h4+h4)m),\displaystyle\leq C\bigg{(}(1+\gamma^{4}h^{4}+h^{4})^{m}+\Big{(}\frac{h^{4}}{(1-u)^{4}}+h^{4}+\frac{h^{2}}{\lambda^{2}}C_{d}\Big{)}(1+\gamma^{4}h^{4}+h^{4})^{m}\bigg{)}, (3.12)

where the second inequality uses the sum of a geometric series. The bound on the fourth moment of $\widetilde{Z'}^{\lambda,n}_{M,h}$ is given in the proof of Lemma 3.4 below. Combining equations (3.6), (3.7), (3.10), (3.11), and the moment bounds given in equations (3.12) and (3.15), and taking $h,\gamma\leq 1$ and $M\geq 1$, we have the result of the lemma. ∎

Lemma 3.4.

Under the conditions of Theorem 2.3, for all $t,\gamma>0$, $u\in(0,1)$, $N,M\in\mathbb{N}^{\star}$ and $\lambda>1$, if $h<\frac{1}{\frac{2}{1-u}+\gamma}$, then we have

$$\mathbb{E}\bigg[\Big|\frac{1}{N}\sum_{n=1}^{N}\ell(\widetilde{Z'}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}L(\widetilde{Z}^{\lambda,n}_{M,h})\Big|^{2}\bigg]\leq C^{\prime 3}_{(u,t)}h^{2}+\big(1+C^{\prime 4}_{(u,t)}\big)\frac{C}{N}+C^{\prime 5}_{(u,t,\lambda)}\gamma^{2},$$

for some constants $C^{\prime 3}_{(u,t)}$, $C^{\prime 4}_{(u,t)}$ and $C^{\prime 5}_{(u,t,\lambda)}$ given in equations (5.3) - (5.5).

Proof.

By the definition of LL and Jensen’s inequality, we have

𝔼[|1Nn=1N(Z~M,hλ,n)1Nn=1NL(Z~M,hλ,n)|2]\displaystyle\mathbb{E}\bigg{[}\Big{|}\frac{1}{N}\sum_{n=1}^{N}\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}L(\widetilde{Z}^{\lambda,n}_{M,h})\Big{|}^{2}\bigg{]}
=𝔼[|1Nn=1N{L~(Z~M,hλ,n)+γ2Z~M,hλ,n2~(Z~M,hλ,n)γ2Z~M,hλ,n2}|2]\displaystyle\quad=\mathbb{E}\Bigg{[}\bigg{|}\frac{1}{N}\sum_{n=1}^{N}\left\{\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})+\frac{\gamma}{2}\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}\big{\|}^{2}-\widetilde{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-\frac{\gamma}{2}\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{2}\right\}\bigg{|}^{2}\Bigg{]}
C(1Nn=1N𝔼[(L~(Z~M,hλ,n)~(Z~M,hλ,n))2]+γ2Nn=1N𝔼[(Z~M,hλ,n2Z~M,hλ,n2)2])\displaystyle\quad\leq C\bigg{(}\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}(\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{\ell}\big{(}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\big{)}^{2}\Big{]}+\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}\big{(}\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}\big{\|}^{2}-\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{2}\big{)}^{2}\Big{]}\bigg{)}
CNn=1N𝔼[(L~(Z~M,hλ,n)~(Z~M,hλ,n))2]+Cγ2Nn=1N(𝔼[Z~M,hλ,n4]+𝔼[Z~M,hλ,n4]),\displaystyle\quad\leq\frac{C}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}\big{(}\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\big{)}^{2}\Big{]}+C\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\Big{(}\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}\big{\|}^{4}\Big{]}+\mathbb{E}\Big{[}\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\Big{]}\Big{)}, (3.13)

where the last inequality follows by the Cauchy-Schwarz inequality.

Next, we will bound the fourth moments of Z~M,hλ,n\widetilde{Z}^{\lambda,n}_{M,h} and Z~M,hλ,n\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}. Recall Cd=(d2+1)(d2)C_{d}=\Big{(}\frac{d}{2}+1\Big{)}\Big{(}\frac{d}{2}\Big{)}. Using the definition of Z~m+1,hλ,n\widetilde{Z}^{\lambda,n}_{m+1,h}, we have

𝔼[Z~m,hλ,n4]\displaystyle\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m,h}\big{\|}^{4}\Big{]} =𝔼[Z~m1,hλ,nhL(Z~m1,hλ,n)+2λ1ΔWh~4]\displaystyle=\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m-1,h}-h\nabla L(\widetilde{Z}^{\lambda,n}_{m-1,h})+\sqrt{2\lambda^{-1}}\Delta\widetilde{W_{h}}\big{\|}^{4}\Big{]}
C(𝔼[Z~m1,hλ,n4]+h4𝔼[|L(Z~m1,hλ,n)|4]+h2λ2Cd)\displaystyle\leq C\bigg{(}\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m-1,h}\big{\|}^{4}\Big{]}+h^{4}\mathbb{E}\Big{[}|\nabla L(\widetilde{Z}^{\lambda,n}_{m-1,h})|^{4}\Big{]}+\frac{h^{2}}{\lambda^{2}}C_{d}\bigg{)}
C(𝔼[Z~m1,hλ,n4]+h4𝔼[21u+γZ~m1,hλ,n4]+h2λ2Cd)\displaystyle\leq C\bigg{(}\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m-1,h}\big{\|}^{4}\Big{]}+h^{4}\mathbb{E}\bigg{[}\Big{\|}\frac{2}{1-u}+\gamma\widetilde{Z}^{\lambda,n}_{m-1,h}\Big{\|}^{4}\bigg{]}+\frac{h^{2}}{\lambda^{2}}C_{d}\bigg{)}
C(𝔼[Z~m1,hλ,n4]+h4(1u)4+γ4h4𝔼[Z~m1,hλ,n4]+h2λ2Cd)\displaystyle\leq C\left(\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m-1,h}\big{\|}^{4}\Big{]}+\frac{h^{4}}{(1-u)^{4}}+\gamma^{4}h^{4}\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m-1,h}\big{\|}^{4}\Big{]}+\frac{h^{2}}{\lambda^{2}}C_{d}\right)
C((1+γ4h4)𝔼[Z~m1,hλ,n4]+h4(1u)4+h2λ2Cd),\displaystyle\leq C\left((1+\gamma^{4}h^{4})\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m-1,h}\big{\|}^{4}\Big{]}+\frac{h^{4}}{(1-u)^{4}}+\frac{h^{2}}{\lambda^{2}}C_{d}\right),

where the second inequality follows from the fact that $\widetilde{L}$ is $\frac{2}{1-u}$-Lipschitz. Using this recursive relationship, it can be checked by induction that for all $m\in\mathbb{N}$,

𝔼[Z~m,hλ,n4]\displaystyle\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{m,h}\big{\|}^{4}\Big{]} C((1+γ4h4)m𝔼[Z~0,hλ,n4]+(h4(1u)4+h2λ2Cd)i=1m(1+γ4h4)i)\displaystyle\leq C\bigg{(}(1+\gamma^{4}h^{4})^{m}\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{0,h}\big{\|}^{4}\Big{]}+\Big{(}\frac{h^{4}}{(1-u)^{4}}+\frac{h^{2}}{\lambda^{2}}C_{d}\Big{)}\sum_{i=1}^{m}(1+\gamma^{4}h^{4})^{i}\bigg{)}
C((1+γ4h4)m+(h4(1u)4+h2λ2Cd)(1+γ4h4)((1+γ4h4)m1)(1+γ4h4)1)\displaystyle\leq C\bigg{(}(1+\gamma^{4}h^{4})^{m}+\Big{(}\frac{h^{4}}{(1-u)^{4}}+\frac{h^{2}}{\lambda^{2}}C_{d}\Big{)}\frac{(1+\gamma^{4}h^{4})((1+\gamma^{4}h^{4})^{m}-1)}{(1+\gamma^{4}h^{4})-1}\bigg{)}
C((1+γ4h4)m+(h4(1u)4+h2λ2Cd)(1+γ4h4)m)\displaystyle\leq C\bigg{(}(1+\gamma^{4}h^{4})^{m}+\Big{(}\frac{h^{4}}{(1-u)^{4}}+\frac{h^{2}}{\lambda^{2}}C_{d}\Big{)}(1+\gamma^{4}h^{4})^{m}\bigg{)} (3.14)

where the second inequality follows by properties of geometric series. By the same argument, we also have

𝔼[Z~M,hλ,n4]C((1+γ4h4)M+(h4(1u)4+h2λ2Cd)(1+γ4h4)M).\displaystyle\mathbb{E}\Big{[}\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\Big{]}\leq C\bigg{(}(1+\gamma^{4}h^{4})^{M}+\Big{(}\frac{h^{4}}{(1-u)^{4}}+\frac{h^{2}}{\lambda^{2}}C_{d}\Big{)}(1+\gamma^{4}h^{4})^{M}\bigg{)}. (3.15)
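The induction behind (3.14) and (3.15) is the usual unrolling of a linear recursion. As a sketch, with the shorthand $a_{m}$, $\alpha$ and $b$ introduced here only for illustration (and ignoring the generic constant $C$), a recursion $a_{m}\leq\alpha a_{m-1}+b$ with $\alpha>1$ yields

```latex
a_{m}\;\leq\;\alpha^{m}a_{0}+b\sum_{i=0}^{m-1}\alpha^{i}\;=\;\alpha^{m}a_{0}+b\,\frac{\alpha^{m}-1}{\alpha-1}.
```

One may think of $a_{m}=\mathbb{E}\big[\|\widetilde{Z}^{\lambda,n}_{m,h}\|^{4}\big]$, $\alpha=1+\gamma^{4}h^{4}$ and $b=\frac{h^{4}}{(1-u)^{4}}+\frac{h^{2}}{\lambda^{2}}C_{d}$.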

Thus, we have

γ2Nn=1N(𝔼[Z~M,hλ,n4]+𝔼[Z~M,hλ,n4])γ2C(u,t,γ)5,\displaystyle\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\Big{(}\mathbb{E}\Big{[}\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}\big{\|}^{4}\Big{]}+\mathbb{E}\Big{[}\big{\|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\Big{]}\Big{)}\leq\gamma^{2}C^{{}^{\prime}5}_{(u,t,\gamma)}, (3.16)

where C(u,t,γ)5C^{{}^{\prime}5}_{(u,t,\gamma)} is given in (5.5).

Let us now turn to the first term on the right hand side of (3.13). Adding and subtracting L~(Z~M,hλ,n)\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}), we have

𝔼[(L~(Z~M,hλ,n)~(Z~M,hλ,n))2]2𝔼[(L~(Z~M,hλ,n)L~(Z~M,hλ,n))2]+2𝔼[(L~(Z~M,hλ,n)~(Z~M,hλ,n))2].\displaystyle\mathbb{E}\Big{[}\Big{(}\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\Big{)}^{2}\Big{]}\leq 2\mathbb{E}\Big{[}\Big{(}\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\Big{)}^{2}\Big{]}+2\mathbb{E}\Big{[}\Big{(}\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-\widetilde{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\Big{)}^{2}\Big{]}. (3.17)

Using that L~\widetilde{L} is 21u\frac{2}{1-u}–Lipschitz, it follows that

𝔼[(L~(Z~M,hλ,n)L~(Z~M,hλ,n))2]\displaystyle\mathbb{E}\Big{[}\Big{(}\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\Big{)}^{2}\Big{]} 4(1u)2𝔼[Z~M,hλ,nZ~M,hλ,n2]\displaystyle\leq\frac{4}{(1-u)^{2}}\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{2}\right]

Next, we will bound the fourth moment of the difference between Z~M,hλ,n\widetilde{Z}^{\lambda,n}_{M,h} and Z~M,hλ,n\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}. Using the definitions of Z~M,hλ,n\widetilde{Z}^{\lambda,n}_{M,h} and Z~M,hλ,n\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}, given in Equation 2.4 and Equation 2.6 respectively, we have

𝔼[Z~M,hλ,nZ~M,hλ,n4]=𝔼[Z~M1,hλ,nhL(Z~M1,hλ,n)Z~M1,hλ,n+h(Z~M1,hλ,n)4]\displaystyle\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\right]=\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M-1,h}-h\nabla L(\widetilde{Z}^{\lambda,n}_{M-1,h})-\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h}+h\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h})\big{\|}^{4}\right]
C𝔼[Z~M1,hλ,nZ~M1,hλ,n4]+Ch4𝔼[L(Z~M1,hλ,n)(Z~M1,hλ,n)4].\displaystyle\leq C\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M-1,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h}\big{\|}^{4}\right]+Ch^{4}\mathbb{E}\left[\big{\|}\nabla L(\widetilde{Z}^{\lambda,n}_{M-1,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{M-1,h})\big{\|}^{4}\right].

Using this recursive relationship, it can be checked by induction that we have

𝔼[Z~M,hλ,nZ~M,hλ,n4]Cm=0M1h4𝔼[L(Z~m,hλ,n)(Z~m,hλ,n)4].\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\right]\leq C\sum_{m=0}^{M-1}h^{4}\mathbb{E}\left[\big{\|}\nabla L(\widetilde{Z}^{\lambda,n}_{m,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{m,h})\big{\|}^{4}\right]. (3.18)

Let Z~M,hλ,n,d1\widetilde{Z}^{\lambda,n,d-1}_{M,h} (resp. Z~M,hλ,n,d1\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h}) denote the last d1d-1 coordinates of Z~M,hλ,n\widetilde{Z}^{\lambda,n}_{M,h} (resp. Z~M,hλ,n\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}). Define the vector 𝑺:=(1,S1,,Sd)\boldsymbol{S}:=(-1,S^{1},\cdots,S^{d}), and let e1de_{1}\in\mathbb{R}^{d} be a vector with 1+γm1+\gamma m in the first entry and 0 everywhere else. Adding and subtracting L(Z~M,hλ,n)\nabla L(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}), we get

𝔼[L(Z~M,hλ,n)(Z~M,hλ,n)4]\displaystyle\mathbb{E}\left[\big{\|}\nabla L(\widetilde{Z}^{\lambda,n}_{M,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\big{\|}^{4}\right]
\displaystyle\leq C𝔼[L(Z~M,hλ,n)L(Z~M,hλ,n)4]+C𝔼[L(Z~M,hλ,n)(Z~M,hλ,n)4]\displaystyle C\mathbb{E}\left[\big{\|}\nabla L(\widetilde{Z}^{\lambda,n}_{M,h})-\nabla L(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\big{\|}^{4}\right]+C\mathbb{E}\left[\big{\|}\nabla L(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-\nabla\ell(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\big{\|}^{4}\right]
\displaystyle\leq C(11u+γ)4𝔼[Z~M,hλ,nZ~M,hλ,n4]\displaystyle C\left(\frac{1}{1-u}+\gamma\right)^{4}\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\right]
+C𝔼[|(e1+11u𝔼[rf(Z~M,hλ,n,d1,S)1{f(Z~M,hλ,n,d1,S)Z~M,hλ,n}|Z~M,hλ,n])\displaystyle+C\mathbb{E}\bigg{[}\bigg{|}\Big{(}e_{1}+\frac{1}{1-u}\mathbb{E}\big{[}\nabla_{r}f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)1_{\{f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)\geq\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\}}|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{]}\Big{)} (3.19)
(e1+11u1Ni=1N(f(Z~M,hλ,n,d1,Si)1{f(Z~M,hλ,n,d1,Si)Z~M,hλ,n(1)}))|4]\displaystyle-\Big{(}e_{1}+\frac{1}{1-u}\frac{1}{N}\sum_{i=1}^{N}(\nabla f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{i})1_{\{f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{i})\geq\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}(1)\}})\Big{)}\bigg{|}^{4}\bigg{]}
\displaystyle\leq C(11u+γ)4𝔼[Z~M,hλ,nZ~M,hλ,n4]+C(11u)4,\displaystyle C\bigg{(}\frac{1}{1-u}+\gamma\bigg{)}^{4}\mathbb{E}\bigg{[}\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\bigg{]}+C\bigg{(}\frac{1}{1-u}\bigg{)}^{4}, (3.20)

where the last inequality follows by Assumption 2.2 $(iv)$. Putting equations (3.18) and (3.20) together, we have

𝔼[Z~M,hλ,nZ~M,hλ,n4]Ch2(11u+γ)4m=0M1𝔼[Z~m,hλ,nZ~m,hλ,n4]+CMh4(11u)4.\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\right]\leq Ch^{2}\left(\frac{1}{1-u}+\gamma\right)^{4}\sum_{m=0}^{M-1}\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{m,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{m,h}\big{\|}^{4}\right]+CMh^{4}\left(\frac{1}{1-u}\right)^{4}.

Using the discrete version of Grönwall's inequality [36, Proposition 5], we have

𝔼[Z~M,hλ,nZ~M,hλ,n4]CMh4(11u)4exp(CMh4(11u+γ)4).\mathbb{E}\left[\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\big{\|}^{4}\right]\leq CMh^{4}\left(\frac{1}{1-u}\right)^{4}\exp\left(CMh^{4}\left(\frac{1}{1-u}+\gamma\right)^{4}\right). (3.21)
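The Grönwall step leading to (3.21) can be checked numerically on a toy recursion. The sketch below assumes the simplest form of the discrete inequality, namely that $x_M \le b + a\sum_{m<M} x_m$ implies $x_M \le b\,e^{aM}$; the constants `a` and `b` are illustrative stand-ins for the two coefficients in the preceding display and are not the constants of [36, Proposition 5].

```python
import math

def worst_case_sequence(a, b, steps):
    """Sequence saturating x_M = b + a * sum_{m<M} x_m (the extremal case)."""
    xs = []
    running_sum = 0.0
    for _ in range(steps + 1):
        x = b + a * running_sum
        xs.append(x)
        running_sum += x
    return xs

def gronwall_bound(a, b, M):
    """Discrete Gronwall bound b * exp(a * M)."""
    return b * math.exp(a * M)

if __name__ == "__main__":
    a, b = 0.05, 1e-4  # hypothetical values, chosen only for illustration
    xs = worst_case_sequence(a, b, 50)
    for M in (0, 10, 50):
        print(M, xs[M], gronwall_bound(a, b, M))
```

The extremal sequence equals $b(1+a)^M$, so the exponential bound follows from $1+a \le e^{a}$.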

Therefore, it follows that

𝔼[(L~(Z~M,hλ,n)L~(Z~M,hλ,n))2]\displaystyle\mathbb{E}\Big{[}\Big{(}\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\Big{)}^{2}\Big{]} 4(1u)4M1/2h2exp(CMh4(11u+γ)4).\displaystyle\leq\frac{4}{(1-u)^{4}}M^{1/2}h^{2}\exp\left(CMh^{4}\left(\frac{1}{1-u}+\gamma\right)^{4}\right).

For the second term on the right hand side of (3.17), we use a law of large numbers type argument. In fact, we have

𝔼[(L~(Z~M,hλ,n)~(Z~M,hλ,n))2]=𝔼[𝔼[(L~(Z~M,hλ,n)~(Z~M,hλ,n))2|Z~M,hλ,n]]\displaystyle\mathbb{E}\Big{[}(\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-\widetilde{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}))^{2}\Big{]}=\mathbb{E}\bigg{[}\mathbb{E}\Big{[}(\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-\widetilde{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}))^{2}|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big{]}\bigg{]}
=𝔼[𝔼[(11u𝔼[(f(Z~M,hλ,n,d1,S)Z~M,hλ,n)+]11u1Ni=1N(f(Z~M,hλ,n,d1,Si)Z~M,hλ,n)+)2|Z~M,hλ,n]]\displaystyle=\mathbb{E}\left[\mathbb{E}\left[\left(\frac{1}{1-u}\mathbb{E}\Big{[}(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}\Big{]}-\frac{1}{1-u}\frac{1}{N}\sum_{i=1}^{N}(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{i})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}\right)^{2}\Bigg{|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\right]\right]
=(11u)21N2𝔼[𝔼[i,j=1N((f(Z~M,hλ,n,d1,Si)Z~M,hλ,n)+𝔼[(f(Z~M,hλ,n,d1,S)Z~M,hλ,n)+|Z~M,hλ,n])\displaystyle=\left(\frac{1}{1-u}\right)^{2}\frac{1}{N^{2}}\mathbb{E}\Bigg{[}\mathbb{E}\Bigg{[}\sum_{i,j=1}^{N}\left((f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{i})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}-\mathbb{E}\Big{[}(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big{]}\right)
((f(Z~M,hλ,n,d1,Sj)Z~M,hλ,n)+𝔼[(f(Z~M,hλ,n,d1,S)Z~M,hλ,n(1))+|Z~M,hλ,n])|Z~M,hλ,n]]\displaystyle\qquad\qquad\cdot\left((f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{j})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}-\mathbb{E}\Big{[}(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}(1))^{+}|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big{]}\right)\Big{|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Bigg{]}\Bigg{]} (3.22)

For $i\neq j$, using that $S^{i}$ and $S^{j}$ are independent, we have

\begin{align*}
&\mathbb{E}\Bigg[\mathbb{E}\Bigg[\Big((f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{i})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}-\mathbb{E}\Big[(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}\Big|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big]\Big)\\
&\qquad\cdot\Big((f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{j})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}-\mathbb{E}\Big[(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}\Big|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big]\Big)\Big|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Bigg]\Bigg]\\
={}&\mathbb{E}\Bigg[\mathbb{E}\Big[(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{i})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}-\mathbb{E}\Big[(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}(1))^{+}\Big|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big]\Big|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big]\Bigg]\\
&\cdot\mathbb{E}\Bigg[\mathbb{E}\Big[(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{j})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}(1))^{+}-\mathbb{E}\Big[(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}\Big|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big]\Big|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big]\Bigg]=0.
\end{align*}

Therefore, we can estimate the desired term as

𝔼[(L~(Z~M,hλ,n)~(Z~M,hλ,n))2]\displaystyle\mathbb{E}\Big{[}\Big{(}\widetilde{L}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})-\widetilde{\ell}(\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})\Big{)}^{2}\Big{]} (3.23)
=(11u)21N2i=1N𝔼[𝔼[((f(Z~M,hλ,n,d1,Si)Z~M,hλ,n)+𝔼[(f(Z~M,hλ,n,d1,S)Z~M,hλ,n)+|Z~M,hλ,n])2|Z~M,hλ,n]]\displaystyle=\left(\frac{1}{1-u}\right)^{2}\frac{1}{N^{2}}\sum_{i=1}^{N}\mathbb{E}\Bigg{[}\mathbb{E}\Big{[}\left((f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S^{i})-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}-\mathbb{E}\Big{[}(f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)-\widetilde{Z^{\prime}}^{\lambda,n}_{M,h})^{+}|\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big{]}\right)^{2}\Big{|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big{]}\Bigg{]}
=(11u)2CN𝔼[𝔼[|f(Z~M,hλ,n,d1,S)|2|Z~M,hλ,n]](11u)2CN𝔼[1+|S|2]CNf(1+C(u,t)4),\displaystyle=\left(\frac{1}{1-u}\right)^{2}\frac{C}{N}\mathbb{E}\bigg{[}\mathbb{E}\Big{[}\big{|}f(\widetilde{Z^{\prime}}^{\lambda,n,d-1}_{M,h},S)\big{|}^{2}\Big{|}\widetilde{Z^{\prime}}^{\lambda,n}_{M,h}\Big{]}\bigg{]}\leq\left(\frac{1}{1-u}\right)^{2}\frac{C}{N}\mathbb{E}\Big{[}1+|S|^{2}\Big{]}\leq\frac{C}{N}f\big{(}1+C^{{}^{\prime}4}_{(u,t)}\big{)}, (3.24)

with $C^{{}^{\prime}4}_{(u,t)}$ given in (5.4). Here the third to last step follows by Jensen's inequality and the tower property. The penultimate step follows by the boundedness of $\nabla_{r}f(r,S)$ and the Lipschitz continuity of $\nabla_{s}f(r,S)$. In the last step, we used that $S$ has finite fourth moment, and equation (3.2). Finally, putting (3.13), (3.16) and (3.24) together yields the lemma. ∎
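The variance computation in (3.22)-(3.24) relies on the cross terms with $i\neq j$ vanishing, so that the mean-squared error of the empirical average equals the single-sample variance divided by $N$. A minimal sketch, assuming a toy setting in which $S$ is uniform on a small finite set and the threshold `q` is held fixed (both are illustrative choices and not the setting of the paper), verifies this identity exactly by enumeration:

```python
from fractions import Fraction
from itertools import product

def tail_losses(values, q, u):
    """Values of (s - q)^+ / (1 - u) over the support of S."""
    return [max(Fraction(s) - q, Fraction(0)) / (1 - u) for s in values]

def single_sample_variance(g):
    """Variance of one draw that is uniform over the list g."""
    mean = sum(g) / len(g)
    return sum((x - mean) ** 2 for x in g) / len(g)

def exact_mse_of_sample_mean(g, n):
    """E[(sample mean - true mean)^2] over all len(g)**n equally likely samples."""
    mean = sum(g) / len(g)
    total = sum((sum(sample) / n - mean) ** 2 for sample in product(g, repeat=n))
    return total / len(g) ** n

if __name__ == "__main__":
    # Hypothetical support, threshold and level, chosen only for illustration.
    g = tail_losses([-1, 0, 2, 5], Fraction(1), Fraction(9, 10))
    for n in (1, 2, 3, 4):
        print(n, exact_mse_of_sample_mean(g, n), single_sample_variance(g) / n)
```

Exact rational arithmetic is used so that the two printed columns agree without rounding error.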

We now investigate the second term on the right hand side of (3.5).

Lemma 3.5.

Under the conditions of Theorem 2.3, for all $t,\gamma>0$, $u\in(0,1)$, $M,N\in\mathbb{N}^{\star}$ and $\lambda>1$, we have

𝔼[|1Nn=1NL(Z~M,hλ,n)1Nn=1NL(Z^tλ,n)|2]C(u)6h2+C(u,M,λ)7γ2,\displaystyle\mathbb{E}\bigg{[}\bigg{|}\frac{1}{N}\sum_{n=1}^{N}L(\widetilde{Z}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}L(\widehat{Z}^{\lambda,n}_{t})\bigg{|}^{2}\bigg{]}\leq C^{{}^{\prime}6}_{(u)}h^{2}+C^{{}^{\prime}7}_{(u,M,\lambda)}\gamma^{2},

where the constants $C^{{}^{\prime}6}_{(u)}$ and $C^{{}^{\prime}7}_{(u,M,\lambda)}$ are given in Equations (5.6) and (5.7).

Proof.

Following the same argument as in the proof of Lemma 3.3, we have

𝔼[|1Nn=1NL(Z~M,hλ,n)1Nn=1NL(Z^tλ,n)|2]C(1Nn=1N𝔼[(L~(Z~M,hλ,n)L~(Z^tλ,n))2]+γ2Nn=1N𝔼[(Z~M,hλ,n2Z^tλ,n2)2]).\displaystyle\mathbb{E}\bigg{[}\bigg{|}\frac{1}{N}\sum_{n=1}^{N}L(\widetilde{Z}^{\lambda,n}_{M,h})-\frac{1}{N}\sum_{n=1}^{N}L(\widehat{Z}^{\lambda,n}_{t})\bigg{|}^{2}\bigg{]}\leq C\bigg{(}\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}\Big{(}\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{L}(\widehat{Z}^{\lambda,n}_{t})\Big{)}^{2}\Big{]}+\frac{\gamma^{2}}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}\Big{(}\big{\|}\widetilde{Z}^{\lambda,n}_{M,h}\big{\|}^{2}-\big{\|}\widehat{Z}^{\lambda,n}_{t}\big{\|}^{2}\Big{)}^{2}\Big{]}\bigg{)}. (3.25)

Let $Z^{\lambda}(s)$ be a continuous-time approximation of the Euler-Maruyama scheme in (2.6). One way to define such an approximation is by setting

Zλ(s):=z0nsL(Zλ(r))dr+0ns2λ1dWr,Z^{\lambda}(s):=z-\int_{0}^{n_{s}}\nabla L(Z^{\lambda}(r))\,\mathrm{d}r+\int_{0}^{n_{s}}\sqrt{2\lambda^{-1}}\,\mathrm{d}W_{r},

with $n_{s}:=\max\{\frac{t}{M^{2}}n:\frac{t}{M^{2}}n\leq s,\,n\in\mathbb{Z}\}$. Note that for each $1\leq i\leq N$, $1\leq m\leq M$, and $t>0$, we have $Z^{\lambda}(s_{m})=\widetilde{Z}_{m,h}^{\lambda,i}$. In other words, $Z^{\lambda}(s)$ coincides with $\widetilde{Z}_{m,h}^{\lambda,i}$ at the time discretization points. For the first term in (3.25), we have

1Nn=1N𝔼[|L~(Z~M,hλ,n)L~(Z^tλ,n)|2]\displaystyle\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}|\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{L}(\widehat{Z}^{\lambda,n}_{t})|^{2}\Big{]} 4(1u)2Nn=1N𝔼[Z~M,tM2λ,nZ^tλ,n2]\displaystyle\leq\frac{4}{(1-u)^{2}N}\sum_{n=1}^{N}\mathbb{E}\Big{[}\big{\|}\widetilde{Z}_{M,\frac{t}{M^{2}}}^{\lambda,n}-\widehat{Z}_{t}^{\lambda,n}\big{\|}^{2}\Big{]}
4(1u)2Nn=1N𝔼[sup0stZλ(s)Z^sλ,n2],\displaystyle\leq\frac{4}{(1-u)^{2}N}\sum_{n=1}^{N}\mathbb{E}\Big{[}\sup_{0\leq s\leq t}\big{\|}Z^{\lambda}(s)-\widehat{Z}_{s}^{\lambda,n}\big{\|}^{2}\Big{]},

where we used that $\widetilde{L}$ is $\frac{2}{1-u}$-Lipschitz in the first inequality. By standard results on the error estimation for SDE approximations, see e.g. [43, Theorems 10.3.5 and 10.6.3], we have

𝔼[sup0st|Zλ(s)Z^sλ,n|4]Ch4\mathbb{E}\Big{[}\sup_{0\leq s\leq t}|Z^{\lambda}(s)-\widehat{Z}_{s}^{\lambda,n}|^{4}\Big{]}\leq Ch^{4} (3.26)

for some constant C>0C>0. Thus, we have

1Nn=1N𝔼[|L~(Z~M,hλ,n)L~(Z^tλ,n)|2]Ch2(1u)2.\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}\Big{[}|\widetilde{L}(\widetilde{Z}^{\lambda,n}_{M,h})-\widetilde{L}(\widehat{Z}^{\lambda,n}_{t})|^{2}\Big{]}\leq\frac{Ch^{2}}{(1-u)^{2}}. (3.27)
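For orientation, one Euler-Maruyama pass for the Langevin dynamics $\mathrm{d}Z_{t}=-\nabla L(Z_{t})\,\mathrm{d}t+\sqrt{2\lambda^{-1}}\,\mathrm{d}W_{t}$ can be sketched as follows. This is a one-dimensional illustration under assumptions: the update $z\mapsto z-h\nabla L(z)+\sqrt{2h/\lambda}\,\xi$ is the textbook Euler-Maruyama step and may differ in detail from the scheme (2.6), and the names `grad` and `noises` as well as the quadratic test potential are hypothetical.

```python
import math
import random

def euler_maruyama_path(grad, z0, h, lam, noises):
    """Iterate z <- z - h * grad(z) + sqrt(2 * h / lam) * xi over the given noises."""
    path = [z0]
    scale = math.sqrt(2.0 * h / lam)
    for xi in noises:
        z = path[-1]
        path.append(z - h * grad(z) + scale * xi)
    return path

if __name__ == "__main__":
    rng = random.Random(0)
    steps, h, lam = 1000, 0.01, 50.0
    noises = [rng.gauss(0.0, 1.0) for _ in range(steps)]
    # Hypothetical potential L(z) = z^2 / 2, so that grad L(z) = z.
    path = euler_maruyama_path(lambda z: z, 2.0, h, lam, noises)
    print(path[-1])
```

Passing the Gaussian draws in explicitly makes it easy to drive two discretizations with the same noise, which is the kind of comparison a strong error estimate such as (3.26) quantifies.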

For the second term on the right hand side of (3.25), by the Cauchy-Schwarz inequality, we have

\begin{align*}
\mathbb{E}\Big[\Big(\big\|\widetilde{Z}^{\lambda,n}_{M,h}\big\|^{2}-\big\|\widehat{Z}^{\lambda,n}_{t}\big\|^{2}\Big)^{2}\Big]
&\leq\mathbb{E}\Big[\big\|\widetilde{Z}^{\lambda,n}_{M,h}\big\|^{4}+\big\|\widehat{Z}^{\lambda,n}_{t}\big\|^{4}\Big]^{\frac{1}{2}}\mathbb{E}\Big[\big\|\widetilde{Z}^{\lambda,n}_{M,h}-\widehat{Z}^{\lambda,n}_{t}\big\|^{4}\Big]^{\frac{1}{2}}\\
&\leq\mathbb{E}\Big[\big\|\widetilde{Z}^{\lambda,n}_{M,h}\big\|^{4}+\big\|\widehat{Z}^{\lambda,n}_{t}\big\|^{4}\Big]^{\frac{1}{2}}\mathbb{E}\Big[\sup_{0\leq s\leq t}\big\|Z^{\lambda}(s)-\widehat{Z}^{\lambda,n}_{s}\big\|^{4}\Big]^{\frac{1}{2}}\\
&\leq C\,\mathbb{E}\Big[\big\|\widetilde{Z}^{\lambda,n}_{M,h}\big\|^{4}+\big\|\widehat{Z}^{\lambda,n}_{t}\big\|^{4}\Big]^{\frac{1}{2}}h^{2},\tag{3.28}
\end{align*}

where we used equation (3.26) in the last step.

It remains to control the fourth moment of $\widehat{Z}^{\lambda,n}_{t}$, since that of $\widetilde{Z}^{\lambda,n}_{M,h}$ was bounded in Equation (3.2). Since $\widehat{Z}^{\lambda,n}_{t}$ solves the SDE

dZ^tλ,n=L(Z^tλ,n)dt+2λ1dW~tn,\,\mathrm{d}\widehat{Z}^{\lambda,n}_{t}=-\nabla L(\widehat{Z}^{\lambda,n}_{t})\,\mathrm{d}t+\sqrt{2\lambda^{-1}}\,\mathrm{d}\widetilde{W}^{n}_{t},

with a linearly growing drift, the following bound on the fourth moment of the solution follows by standard arguments:

𝔼[Z^tλ,n4]\displaystyle\mathbb{E}\Big{[}\big{\|}\widehat{Z}^{\lambda,n}_{t}\big{\|}^{4}] C(1+t(1u)4+1λ2t2)eγ4Ct,\displaystyle\leq C\left(1+\frac{t}{(1-u)^{4}}+\frac{1}{\lambda^{2}}t^{2}\right)e^{\gamma^{4}Ct}, (3.29)

where $C$ is a constant depending only on the Lipschitz constant of $\nabla L$. We omit the proof. Putting equations (3.27), (3.28), (3.29) and (3.2) together, and recalling that $t=hM^{2}$, we have the result of the lemma. ∎

Now we move on to analyzing the third term in (3.5).

Lemma 3.6.

Under the conditions of Theorem 2.3, for all $h,t>0$, $u\in(0,1)$, $N\in\mathbb{N}^{\star}$, $0<\gamma<1$, and $\lambda>1$, we have

𝔼[|1Nn=1NL(Z^tλ,n)𝔼[L(Ztλ)]|2]1NC(u,M,λ,t)8\displaystyle\mathbb{E}\bigg{[}\bigg{|}\frac{1}{N}\sum_{n=1}^{N}L(\widehat{Z}^{\lambda,n}_{t})-\mathbb{E}[L(Z_{t}^{\lambda})]\bigg{|}^{2}\bigg{]}\leq\frac{1}{N}C^{{}^{\prime}8}_{(u,M,\lambda,t)}

for $C^{{}^{\prime}8}_{(u,M,\lambda,t)}$ given in Equation (5.8).

Proof.

Since $(\widehat{Z}^{\lambda,n})_{n\geq 1}$ are i.i.d. copies of $Z_{t}^{\lambda}$, a standard law of large numbers argument gives

𝔼[|1Nn=1NL(Z^tλ,n)𝔼[L(Ztλ)]|2]1NVar(L(Ztλ))\displaystyle\mathbb{E}\bigg{[}\bigg{|}\frac{1}{N}\sum_{n=1}^{N}L(\widehat{Z}^{\lambda,n}_{t})-\mathbb{E}[L(Z_{t}^{\lambda})]\bigg{|}^{2}\bigg{]}\leq\frac{1}{N}\mathrm{Var}(L(Z^{\lambda}_{t}))

where $\mathrm{Var}(L(Z^{\lambda}_{t}))$ is the variance of $L(Z^{\lambda}_{t})$. Since $\nabla L$ is $(\frac{1}{1-u}+\gamma)$-Lipschitz, it follows by [20, Corollary 5.11] that the law of $Z^{\lambda}_{t}$ satisfies the Poincaré inequality. That is,

Var(L(Ztλ))2λ(11u+γ)2𝔼[L(Ztλ)2].\mathrm{Var}(L(Z_{t}^{\lambda}))\leq\frac{2}{\lambda(\frac{1}{1-u}+\gamma)^{2}}\mathbb{E}\Big{[}\big{\|}\nabla L(Z_{t}^{\lambda})\big{\|}^{2}\Big{]}.

Now using that $\nabla\widetilde{L}$ is $\frac{2}{1-u}$-Lipschitz, we have $\big\|\nabla L(x)\big\|\leq\|\nabla\widetilde{L}(0)\|+(\frac{2}{1-u}+\gamma)\|x\|$. Thus, it follows that

Var(L(Ztλ))\displaystyle\mathrm{Var}(L(Z_{t}^{\lambda})) 2λ(11u+γ)2𝔼[|L~(0)+(21u+γ)Ztλ2|]\displaystyle\leq\frac{2}{\lambda(\frac{1}{1-u}+\gamma)^{2}}\mathbb{E}\bigg{[}\bigg{|}\|\nabla\widetilde{L}(0)\|+\Big{(}\frac{2}{1-u}+\gamma\Big{)}\big{\|}Z^{\lambda}_{t}\big{\|}^{2}\bigg{|}\bigg{]}
C(1λ(11u+γ)2+1λ(𝔼[Ztλ4])12)\displaystyle\leq C\bigg{(}\frac{1}{\lambda(\frac{1}{1-u}+\gamma)^{2}}+\frac{1}{\lambda}\Big{(}\mathbb{E}\Big{[}\big{\|}Z_{t}^{\lambda}\big{\|}^{4}\Big{]}\Big{)}^{\frac{1}{2}}\bigg{)}
C(1λ(1+γ)2+1λ(1+t(1u)2+tλ)eγ4Ct),\displaystyle\leq C\bigg{(}\frac{1}{\lambda(1+\gamma)^{2}}+\frac{1}{\lambda}\bigg{(}1+\frac{\sqrt{t}}{(1-u)^{2}}+\frac{t}{\lambda}\bigg{)}e^{\gamma^{4}Ct}\bigg{)},

where the last inequality follows from (3.29). Taking $\gamma\geq 0$ yields the result of the lemma. ∎

In the next lemma we analyze the fourth term in (3.5). Essentially, this concerns the rate of convergence of the law $\mu_{t}^{\lambda}$ of the solution of the Langevin equation to its invariant measure $\mu^{\lambda}_{\infty}$.

Lemma 3.7.

For all $t>0$, $u,\gamma\in(0,1)$, $\lambda>1$ and an initial position $Z^{\lambda}_{0}=z\in\mathbb{R}^{d+1}$, we have

|𝔼L(Ztλ)L(x)dμλ|2\displaystyle\bigg{|}\mathbb{E}L(Z^{\lambda}_{t})-\int L(x)\,\mathrm{d}\mu_{\infty}^{\lambda}\bigg{|}^{2} C(u,λ)4etC(λ)5+γ2C(λ,t)9,\displaystyle\leq C^{4}_{(u,\lambda)}e^{-tC^{5}_{(\lambda)}}+\gamma^{2}C^{{}^{\prime}9}_{(\lambda,t)},

where the constants $C^{4}_{(u,\lambda)}$, $C^{5}_{(\lambda)}$, and $C^{{}^{\prime}9}_{(\lambda,t)}$ are given in equations (5.9)-(5.11).

Proof.

The investigation of convergence rates to the invariant measure is an active research area, see e.g. [8]. In the present case with a non-convex potential function $L$, this follows from the so-called coupling by reflection arguments of Eberle [22]. In fact, by Assumption 2.2 $(iv)$, it follows from [22, Corollary 2.1] (see also [23, Corollary 2]) that there is a constant $C_{(d,\lambda)}>0$ depending only on $d$, $\gamma$ and $\lambda$ such that

𝒲1(μtλ,μλ)C(d,λ)etC(d,λ)𝒲1(δz,μλ)for all t>0.\mathcal{W}_{1}(\mu^{\lambda}_{t},\mu_{\infty}^{\lambda})\leq C_{(d,\lambda)}e^{-tC_{(d,\lambda)}}\mathcal{W}_{1}(\delta_{z},\mu^{\lambda}_{\infty})\quad\text{for all }t>0. (3.30)

It now remains to bound $|\mathbb{E}L(Z^{\lambda}_{t})-\int L(x)\,\mathrm{d}\mu_{\infty}^{\lambda}|$ by $\mathcal{W}_{1}(\mu^{\lambda}_{t},\mu^{\lambda}_{\infty})$. Let $\hat{\alpha}\in\Gamma(\mu_{\infty}^{\lambda},\mu_{t}^{\lambda})$ be an optimal coupling of $\mu_{\infty}^{\lambda}$ and $\mu_{t}^{\lambda}$, i.e. such that

𝔼α^XY=infαΓ(μλ,μtλ)𝔼αXY,\mathbb{E}_{\hat{\alpha}}\|X-Y\|=\inf_{\alpha\in\Gamma(\mu_{\infty}^{\lambda},\mu_{t}^{\lambda})}\mathbb{E}_{\alpha}\|X-Y\|,

see e.g. [60] for existence of $\hat{\alpha}$. Above, we denote by $\mathbb{E}_{\alpha}[\|X-Y\|]$ the expectation under $\alpha$ of $\|X-Y\|$, with $(X,Y)\sim\alpha$. By definition of $L$ and Lipschitz continuity of $\widetilde{L}$, it holds

|L(x)L(y)|21uxy+γ2(x2+y2).|L(x)-L(y)|\leq\frac{2}{1-u}\|x-y\|+\frac{\gamma}{2}(\|x\|^{2}+\|y\|^{2}).

Taking the expectation with respect to $\hat{\alpha}\in\Gamma(\mu_{\infty}^{\lambda},\mu_{t}^{\lambda})$ we have

𝔼α^[|L(X)L(Y)|]\displaystyle\mathbb{E}_{\hat{\alpha}}\Big{[}|L(X)-L(Y)|\Big{]} 21u𝔼α^[XY]+γ2𝔼α^[X2+Y2]\displaystyle\leq\frac{2}{1-u}\mathbb{E}_{\hat{\alpha}}\Big{[}\|X-Y\|\Big{]}+\frac{\gamma}{2}\mathbb{E}_{\hat{\alpha}}\Big{[}\|X\|^{2}+\|Y\|^{2}\Big{]}
21u𝒲1(μλ,μtλ)+γ2(𝔼[Zλ2]+𝔼[Ztλ2]),\displaystyle\leq\frac{2}{1-u}\mathcal{W}_{1}(\mu_{\infty}^{\lambda},\mu_{t}^{\lambda})+\frac{\gamma}{2}\Big{(}\mathbb{E}[\|Z^{\lambda}_{\infty}\|^{2}]+\mathbb{E}[\big{\|}Z^{\lambda}_{t}\big{\|}^{2}]\Big{)}, (3.31)

where $Z^{\lambda}_{\infty}\sim\mu^{\lambda}_{\infty}$. As in the proof of Lemma 3.5, see e.g. Equation 3.29, $Z^{\lambda}_{t}$ has second moment bounded by a constant $C_{(\lambda,t)}>0$. Concerning the term $\mathbb{E}[\|Z^{\lambda}_{\infty}\|^{2}]$, note that it holds

𝔼[Zλ2]=x2dμλ=x2eλL(x)eλL(a)dadx.\mathbb{E}[\|Z^{\lambda}_{\infty}\|^{2}]=\int\|x\|^{2}\,\mathrm{d}\mu_{\infty}^{\lambda}=\int\|x\|^{2}\frac{e^{-\lambda L(x)}}{\int e^{-\lambda L(a)}\,\mathrm{d}a}\,\mathrm{d}x.

Since $\widetilde{L}$ is $\frac{2}{1-u}$-Lipschitz, $|\widetilde{L}(x)|\leq C+\frac{2}{1-u}\|x\|$ for some constant $C$. We thus have

x2eλL(x)dx=x2eλ(L~(x)+γx2/2)dxx2eCλ+2λx/(1u)x2/2dx<.\displaystyle\int\|x\|^{2}e^{-\lambda L(x)}\,\mathrm{d}x=\int\|x\|^{2}e^{-\lambda(\widetilde{L}(x)+\gamma\|x\|^{2}/2)}\,\mathrm{d}x\leq\int\|x\|^{2}e^{C\lambda+2\lambda\|x\|/(1-u)-\|x\|^{2}/2}\,\mathrm{d}x<\infty. (3.32)

Using the same argument, $0<\int e^{-\lambda L(a)}\,\mathrm{d}a<\infty$. Therefore, we have $\mathbb{E}[\|Z^{\lambda}_{\infty}\|^{2}]<\infty$. Combining this with (3.30) and (3.31) yields the desired result. ∎
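The Lipschitz part of the estimate (3.31) is the usual control of expectations of Lipschitz functions by $\mathcal{W}_{1}$. A small sketch, assuming one-dimensional empirical measures with equally many atoms (for which an optimal coupling pairs the sorted samples; the sample values and the test function are illustrative), checks $|\mathbb{E}g(X)-\mathbb{E}g(Y)|\leq\mathrm{Lip}(g)\,\mathcal{W}_{1}$:

```python
def wasserstein1_empirical(xs, ys):
    """W1 between two 1-D empirical measures with the same number of atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

def mean_gap(g, xs, ys):
    """Absolute difference of the empirical means of g under the two samples."""
    return abs(sum(map(g, xs)) / len(xs) - sum(map(g, ys)) / len(ys))

if __name__ == "__main__":
    xs = [-1.0, 0.5, 2.0, 3.5]
    ys = [0.0, 0.25, 1.0, 5.0]
    lipschitz_constant = 4.0  # e.g. 2 / (1 - u) with the hypothetical level u = 0.5

    def g(z):
        return lipschitz_constant * abs(z - 1.0)

    print(mean_gap(g, xs, ys), lipschitz_constant * wasserstein1_empirical(xs, ys))
```

The quadratic term $\frac{\gamma}{2}\|x\|^{2}$ in $L$ is not Lipschitz, which is why the proof treats it separately through second moments.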

Remark 3.8.

If the function $f$ is convex, it follows that $L$ is strongly convex in the sense that $\nabla^{2}L\geq\gamma I_{d}$, where $I_{d}$ is the $d\times d$ identity matrix. In this case, the exponential convergence to equilibrium follows by standard arguments, see e.g. [8]. In fact, we have the following bound in the second order Wasserstein distance:

𝒲22(μtλ,μλ)e2γt𝒲22(δz,μλ).\mathcal{W}_{2}^{2}(\mu^{\lambda}_{t},\mu^{\lambda}_{\infty})\leq e^{-2\gamma t}\mathcal{W}_{2}^{2}(\delta_{z},\mu^{\lambda}_{\infty}). (3.33)

In this case, a slight modification of the above arguments allows us to obtain the bound

|𝔼L(Ztλ)L(x)dμλ|2\displaystyle\bigg{|}\mathbb{E}L(Z^{\lambda}_{t})-\int L(x)\,\mathrm{d}\mu_{\infty}^{\lambda}\bigg{|}^{2} C(1(1u)2+1)eγt𝒲22(δz,μλ).\displaystyle\leq C\left(\frac{1}{(1-u)^{2}}+1\right)e^{-\gamma t}\mathcal{W}_{2}^{2}(\delta_{z},\mu^{\lambda}_{\infty}).

The estimation of the fifth term in the decomposition (3.5) is an immediate consequence of the existence of the second moment of the invariant measure $\mu_{\infty}^{\lambda}$ obtained in the proof of the preceding lemma. In fact, by definition of $L$ and $\widetilde{L}$ we have

|L(x)dμλL~(x)dμλ|2=(γ2)2x2dμλCγ2,\bigg{|}\int L(x)\,\mathrm{d}\mu_{\infty}^{\lambda}-\int\widetilde{L}(x)\,\mathrm{d}\mu_{\infty}^{\lambda}\bigg{|}^{2}=\left(\frac{\gamma}{2}\right)^{2}\int\|x\|^{2}\,\mathrm{d}\mu_{\infty}^{\lambda}\leq C\gamma^{2}, (3.34)

for a constant $C>0$. We conclude the proof of the theorem with the following lemma estimating the last term on the right hand side of (3.5).

Lemma 3.9.

Under the conditions of Theorem 2.3, we have

|L~(x)dμλAVaR(f)¯|2Cγ2+C(u)9λ2,\displaystyle\bigg{|}\int\widetilde{L}(x)\,\mathrm{d}\mu_{\infty}^{\lambda}-\overline{\mathrm{AVaR}(f)}\bigg{|}^{2}\leq C\gamma^{2}+\frac{C^{{}^{\prime}9}_{(u)}}{\lambda^{2}}, (3.35)

where $C^{6}_{(u)}$ is given in Equation (5.13).

Proof.

First recall, see Equation 2.2 and Lemma 3.2, that $\overline{\mathrm{AVaR}}(f)=\widetilde{L}(z^{*})$ where $z^{*}=(r^{*},m^{*})$ is the optimizer in (2.2). Now, consider the differential entropy

μλlogμλdx\displaystyle-\int\mu_{\infty}^{\lambda}\log\mu_{\infty}^{\lambda}\,\mathrm{d}x =eλL(x)eλL(u)dulogeλL(x)eλL(u)dudx\displaystyle=-\int\frac{e^{-\lambda L(x)}}{\int e^{-\lambda L(u)}\,\mathrm{d}u}\log\frac{e^{-\lambda L(x)}}{\int e^{-\lambda L(u)}\,\mathrm{d}u}\,\mathrm{d}x
=eλL(x)eλL(u)du(λL(x)logeλL(u)du)dx\displaystyle=-\int\frac{e^{-\lambda L(x)}}{\int e^{-\lambda L(u)}\,\mathrm{d}u}\left(-\lambda L(x)-\log\int e^{-\lambda L(u)}\,\mathrm{d}u\right)\,\mathrm{d}x
=λL(x)dμλ+logeλL(x)dx\displaystyle=\lambda\int L(x)\,\mathrm{d}\mu_{\infty}^{\lambda}+\log\int e^{-\lambda L(x)}\,\mathrm{d}x
=λ(L~(x)+γ2x2)dμλ+logeλL~(x)λγ2x2dx.\displaystyle=\lambda\int\left(\widetilde{L}(x)+\frac{\gamma}{2}\|x\|^{2}\right)\,\mathrm{d}\mu_{\infty}^{\lambda}+\log\int e^{-\lambda\widetilde{L}(x)-\frac{\lambda\gamma}{2}\|x\|^{2}}\,\mathrm{d}x.

Rearranging the terms gives the following expression for the integral of $\widetilde{L}$:

L~(x)dμλ=γ2x2dμλ1λμλlogμλdx1λlogeλL~(x)λγ2x2dx.\int\widetilde{L}(x)\,\mathrm{d}\mu_{\infty}^{\lambda}=-\frac{\gamma}{2}\int\|x\|^{2}\,\mathrm{d}\mu_{\infty}^{\lambda}-\frac{1}{\lambda}\int\mu_{\infty}^{\lambda}\log\mu_{\infty}^{\lambda}\,\mathrm{d}x-\frac{1}{\lambda}\log\int e^{-\lambda\widetilde{L}(x)-\frac{\lambda\gamma}{2}\|x\|^{2}}\,\mathrm{d}x. (3.36)

Since, among all continuous distributions with a given second moment, the Gaussian distribution maximizes the differential entropy ([63, Theorem 10.48]), it holds

μλlogμλdx12log((2πe)d+1x2dμλ),-\int\mu_{\infty}^{\lambda}\log\mu_{\infty}^{\lambda}\,\mathrm{d}x\leq\frac{1}{2}\log\left(({2\pi e})^{d+1}\int\|x\|^{2}\,\mathrm{d}\mu_{\infty}^{\lambda}\right), (3.37)

and using that $\int\|x\|^{2}\,\mathrm{d}\mu_{\infty}^{\lambda}<\infty$ (see equation (3.32)) and subtracting $\widetilde{L}(z^{*})$ from both sides of (3.36), we have

L~(x)dμλL~(z)\displaystyle\int\widetilde{L}(x)\,\mathrm{d}\mu_{\infty}^{\lambda}-\widetilde{L}(z^{*}) C(γ+1λ1λlogeλL~(x)γ2x2dxL~(z))\displaystyle\leq C\left(-\gamma+\frac{1}{\lambda}-\frac{1}{\lambda}\log\int e^{-\lambda\widetilde{L}(x)-\frac{\gamma}{2}\|x\|^{2}}\,\mathrm{d}x-\widetilde{L}(z^{*})\right)
C(γ+1λ1λlog(eλL~(z)eλ(L~(x)L~(z)γλ2x2dx)L~(z))\displaystyle\leq C\left(\gamma+\frac{1}{\lambda}-\frac{1}{\lambda}\log\left(e^{-\lambda\widetilde{L}(z^{*})}\int e^{-\lambda(\widetilde{L}(x)-\widetilde{L}(z^{*})-\frac{\gamma\lambda}{2}\|x\|^{2}}\,\mathrm{d}x\right)-\widetilde{L}(z^{*})\right)
=C(γ+1λ1λlogeλ(L~(x)L~(z))λγ2x2dx).\displaystyle=C\left(\gamma+\frac{1}{\lambda}-\frac{1}{\lambda}\log\int e^{-\lambda(\widetilde{L}(x)-\widetilde{L}(z^{*}))-\frac{\lambda\gamma}{2}\|x\|^{2}}\,\mathrm{d}x\right).

Hence, using the fact that $\nabla\widetilde{L}$ is $\frac{1}{1-u}$-Lipschitz, we can estimate the exponent in the integral above as

\[
|\widetilde{L}(x)-\widetilde{L}(z^{*})-\nabla\widetilde{L}(z^{*})\cdot(x-z^{*})|\leq\frac{1}{2(1-u)}\|x-z^{*}\|^{2}.
\]

Thanks to the above inequality, using $\nabla\widetilde{L}(z^{*})=0$ we obtain

L~(x)dμλL(z)\displaystyle\int\widetilde{L}(x)\,\mathrm{d}\mu_{\infty}^{\lambda}-L(z^{*}) C(γ+1λ1λlogeλ2(1u)xz2γλ2x2dx)\displaystyle\leq C\bigg{(}\gamma+\frac{1}{\lambda}-\frac{1}{\lambda}\log\int e^{-\frac{\lambda}{2(1-u)}\|x-z^{*}\|^{2}-\frac{\gamma\lambda}{2}\|x\|^{2}}\,\mathrm{d}x\bigg{)}
=C(γ+1λ1λlog(2πλ(γ+1(1u))eγλz22(1u)(1/(1u)+γ)))\displaystyle=C\bigg{(}\gamma+\frac{1}{\lambda}-\frac{1}{\lambda}\log\bigg{(}\sqrt{\frac{2\pi}{\lambda(\gamma+\frac{1}{(1-u)})}}e^{-\frac{\gamma\lambda\|z^{*}\|^{2}}{2(1-u)(1/(1-u)+\gamma)}}\bigg{)}\bigg{)}
C(γ+1λ1λlog(2πλ(γ+11u))+γ(1u)(11u+γ)).\displaystyle\leq C\bigg{(}\gamma+\frac{1}{\lambda}-\frac{1}{\lambda}\log\bigg{(}\sqrt{\frac{2\pi}{\lambda(\gamma+\frac{1}{1-u})}}\bigg{)}+\frac{\gamma}{(1-u)(\frac{1}{1-u}+\gamma)}\bigg{)}.
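The closed form used for the Gaussian integral above comes from completing the square in the exponent. A quick numerical check in one dimension is sketched below; the parameter values, the integration window, and the midpoint rule are arbitrary choices made for illustration, and the integral in the proof is over $\mathbb{R}^{d+1}$ rather than $\mathbb{R}$.

```python
import math

def closed_form(lam, u, gamma, z_star):
    """sqrt(2 pi / (lam (gamma + a))) * exp(-gamma lam z^2 / (2 (1 - u) (a + gamma))), a = 1/(1-u)."""
    a = 1.0 / (1.0 - u)
    prefactor = math.sqrt(2.0 * math.pi / (lam * (gamma + a)))
    exponent = -gamma * lam * z_star ** 2 / (2.0 * (1.0 - u) * (a + gamma))
    return prefactor * math.exp(exponent)

def midpoint_integral(lam, u, gamma, z_star, lo=-10.0, hi=10.0, cells=20000):
    """Midpoint rule for exp(-lam/(2(1-u)) (x - z)^2 - gamma lam / 2 x^2) on [lo, hi]."""
    dx = (hi - lo) / cells
    total = 0.0
    for k in range(cells):
        x = lo + (k + 0.5) * dx
        total += math.exp(-lam / (2.0 * (1.0 - u)) * (x - z_star) ** 2
                          - gamma * lam / 2.0 * x ** 2)
    return total * dx

if __name__ == "__main__":
    params = dict(lam=5.0, u=0.5, gamma=0.5, z_star=1.0)  # hypothetical values
    print(midpoint_integral(**params), closed_form(**params))
```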

For the other side of the inequality, since $z^{*}$ is a minimizer of $\widetilde{L}$, we have

\[
\widetilde{L}(z^{*})-\int\widetilde{L}(x)\,\mathrm{d}\mu_{\infty}^{\lambda}\leq\widetilde{L}(z^{*})-\widetilde{L}(z^{*})\int\,\mathrm{d}\mu_{\infty}^{\lambda}=0.
\]

Using $0<\gamma<1$ and $\lambda\geq 1$ concludes the proof. ∎

4 Rate for general law invariant convex risk measures

We now focus on the estimation of the approximation error of general convex risk measures. As explained in Section 2, the main argument for the derivation of the rate is the representation of the (law invariant) convex risk measure with respect to $\mathrm{AVaR}$. One technical difficulty is that this representation involves an integral with respect to the risk aversion level $u$ of $\mathrm{AVaR}$. Notice that both the functions $L$ and $\widetilde{L}$ as well as the Markov chain $\widetilde{Z}^{\lambda,n}_{m,h}$ in the approximation scheme depend on $u$. We will make this dependence explicit in this section by writing $L^{u}$, $\widehat{Z}^{\lambda,n,u}_{t}$, and $\widetilde{Z}^{\lambda,n,u}_{M,h}$ for the function and processes defined in (2.3), (3.4) and (2.6) respectively.

4.1 Proof of Theorem 2.6

This subsection covers the proof of Theorem 2.6. Recall that we approximate a general law invariant convex risk measure by

ρ~δ(f):=esssupγ:β(γ)b(0δAVaR~u(f)γ(du)β(γ)),\widetilde{\rho}^{\delta}(f):=\operatorname*{ess\,sup}_{\gamma\in\mathcal{M}:\beta(\gamma)\leq b}\bigg{(}\int_{0}^{\delta}\widetilde{\text{AVaR}}_{u}(f)\gamma(\,\mathrm{d}u)-\beta(\gamma)\bigg{)},

with $\widetilde{\text{AVaR}}_{u}(f)=\frac{1}{N}\sum_{n=1}^{N}\overline{\ell^{u}}(\widetilde{Z}_{M,h}^{\prime,\lambda,n,u})$. Further define

ρδ(f):=infrAsupγ:β(γ)b(0δAVaRu(f(r,S))γ(du)β(γ)),δ(0,1).\rho^{\delta}(f):=\inf_{r\in A}\sup_{\gamma\in\mathcal{M}:\beta(\gamma)\leq b}\bigg{(}\int_{0}^{\delta}\text{AVaR}_{u}(f(r,S))\gamma(\,\mathrm{d}u)-\beta(\gamma)\bigg{)},\quad\delta\in(0,1). (4.1)

To begin, we decompose the approximation error into two parts:

𝔼[|ρ(f)ρ~δ(f)|2]2|ρ(f)ρδ(f)|2+2𝔼[|ρδ(f)ρ~δ(f)|2].\displaystyle\mathbb{E}\Big{[}|\rho(f)-\widetilde{\rho}^{\delta}(f)|^{2}\Big{]}\leq 2|\rho(f)-\rho^{\delta}(f)|^{2}+2\mathbb{E}\Big{[}|\rho^{\delta}(f)-\widetilde{\rho}^{\delta}(f)|^{2}\Big{]}. (4.2)

Let us estimate the first term. First, we will show that for all $\delta\in(0,1)$, $\rho^{\delta}$ satisfies

ρδ(f)\displaystyle\rho^{\delta}(f) =supγ:β(γ)b(0δinfrAAVaRu(f(r,S))γ(du)β(γ))\displaystyle=\sup_{\gamma\in\mathcal{M}:\beta(\gamma)\leq b}\bigg{(}\int_{0}^{\delta}\inf_{r\in A}\text{AVaR}_{u}(f(r,S))\gamma(\,\mathrm{d}u)-\beta(\gamma)\bigg{)} (4.3)
=supγ:β(γ)b(0δAVaR¯u(f)γ(du)β(γ)).\displaystyle=\sup_{\gamma\in\mathcal{M}:\beta(\gamma)\leq b}\bigg{(}\int_{0}^{\delta}\overline{\text{AVaR}}_{u}(f)\gamma(\,\mathrm{d}u)-\beta(\gamma)\bigg{)}.

Note that going from the definition in (4.1) to the expression in (4.3) requires interchanging the infimum and the supremum, and then the infimum and the integral. Since the supremum is taken over a compact set and the function $(r,\gamma)\mapsto\int_{0}^{\delta}\text{AVaR}_{u}(f(r,S))\gamma(du)-\beta(\gamma)$ is continuous in $r$ and upper semi-continuous and concave in $\gamma$, by Fan's minimax theorem, see [26, Theorem 2], we can interchange the supremum and the infimum. To interchange the infimum and the integral, we apply Rockafellar's interchange theorem [55]. Thus, using Equation 2.10 and Equation 4.3, it holds that

|ρ(f)ρδ(f)|2supγ:β(γ)bδ1|AVaR¯(f)|2γ(du).|\rho(f)-\rho^{\delta}(f)|^{2}\leq\sup_{\gamma\in\mathcal{M}:\beta(\gamma)\leq b}\int_{\delta}^{1}|\overline{\text{AVaR}}(f)|^{2}\gamma(\,\mathrm{d}u).

Let us partition the interval $[\delta,1)$ into $\bigcup_{n\geq 1}I_{n}$, where $I_{n}=\big[\delta+(1-2^{-n+1})(1-\delta),\,\delta+(1-2^{-n})(1-\delta)\big)$ for every $n\geq 1$. Defining

\[
\Gamma_{b}(I_{n}):=\sup_{\gamma\in\mathcal{M}\text{ s.t. }\beta(\gamma)\leq b}\gamma(I_{n}),
\]

we obtain the estimate

|ρ¯(f)ρδ(f)|2\displaystyle|\bar{\rho}(f)-\rho^{\delta}(f)|^{2} n1Γb(In)supuIn|AVaR¯u(f)|2.\displaystyle\leq\sum_{n\geq 1}\Gamma_{b}(I_{n})\sup_{u\in I_{n}}|\overline{\text{AVaR}}_{u}(f)|^{2}. (4.4)

Now by [4, Lemma 4.5 and Lemma 4.3], we have

Γb(In)C((1δ)2n)1/qand|AVaR¯u(f)|C(1u)1/p,\Gamma_{b}(I_{n})\leq C((1-\delta)2^{-n})^{1/q}\quad\text{and}\quad\left|\overline{\text{AVaR}}_{u}(f)\right|\leq\frac{C}{(1-u)^{1/p}}, (4.5)

for every $p\in(1,\infty)$. Therefore, for any given $p\in(1,\infty)$ and $n\in\mathbb{N}$, it holds

supuIn|AVaR¯u(f)|2C((1δ)2n)2/p.\sup_{u\in I_{n}}\left|\overline{\mathrm{AVaR}}_{u}(f)\right|^{2}\leq\frac{C}{((1-\delta)2^{-n})^{2/p}}. (4.6)

Choosing $p=4q$ and using (4.4), (4.5) and (4.6), we have

|ρ¯(f)ρδ(f)|2\displaystyle|\bar{\rho}(f)-\rho^{\delta}(f)|^{2} n1((1δ)2n)1/qC((1δ)2n)2/p\displaystyle\leq\sum_{n\geq 1}((1-\delta)2^{-n})^{1/q}\frac{C}{((1-\delta)2^{-n})^{2/p}}
C(1δ)1/q2/pn12n(2/p1/q)\displaystyle\leq C(1-\delta)^{1/q-2/p}\sum_{n\geq 1}2^{n(2/p-1/q)}
C(1δ)1/2qn12n/2q.\displaystyle\leq C(1-\delta)^{1/2q}\sum_{n\geq 1}2^{-n/2q}.

Since $q\in(1,\infty)$, it holds $\sum_{n\geq 1}2^{-n/2q}\leq C$ for some universal constant $C$, and thus

|ρ¯(f)ρδ(f)|2C(1δ)1/2q.|\bar{\rho}(f)-\rho^{\delta}(f)|^{2}\leq C(1-\delta)^{1/2q}. (4.7)
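The bookkeeping behind (4.7) can be checked mechanically. The sketch below, with hypothetical helper names and arbitrary values of $\delta$ and $q$, builds the endpoints of the intervals $I_{n}$, confirms that consecutive intervals share endpoints starting from $\delta$ and have length $(1-\delta)2^{-n}$, verifies that the choice $p=4q$ gives the exponent $1/q-2/p=1/(2q)$, and evaluates the geometric series $\sum_{n\geq 1}2^{-n/2q}$ in closed form.

```python
from fractions import Fraction

def interval(delta, n):
    """Endpoints of I_n = [edge(n-1), edge(n)) with edge(k) = delta + (1 - 2^-k)(1 - delta)."""
    def edge(k):
        return delta + (1 - Fraction(1, 2 ** k)) * (1 - delta)
    return edge(n - 1), edge(n)

def exponent(q, p):
    """The exponent 1/q - 2/p of (1 - delta) in the summand."""
    return Fraction(1) / q - Fraction(2) / p

def geometric_constant(q):
    """Closed form of sum_{n>=1} 2^{-n/(2q)} = 1 / (2^{1/(2q)} - 1)."""
    return 1.0 / (2.0 ** (1.0 / (2.0 * q)) - 1.0)

if __name__ == "__main__":
    delta, q = Fraction(9, 10), 2  # hypothetical values
    print([interval(delta, n) for n in (1, 2, 3)])
    print(exponent(q, 4 * q), geometric_constant(q))
```

The closed form shows that the constant multiplying $(1-\delta)^{1/2q}$ is finite for each fixed $q$, although it grows as $q$ increases.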

For the second term in equation (4.2), we use the error rate for the estimation of AVaR. Let $\mathcal{M}^{R}$ be the set of all random probability measures on $[0,1)$. We have

\begin{align*}
\mathbb{E}\Big[|\rho^{\delta}(f)-\widetilde{\rho}^{\delta}(f)|^{2}\Big]
&\leq\mathbb{E}\Big[\operatorname*{ess\,sup}_{\gamma\in\mathcal{M}}\int_{0}^{\delta}|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|^{2}\gamma(du)\Big]\\
&\leq\mathbb{E}\Big[\operatorname*{ess\,sup}_{\gamma\in\mathcal{M}^{R}}\int_{0}^{\delta}|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|^{2}\gamma(du)\Big].
\end{align*}

For each random measure $\gamma^{i}\in\mathcal{M}^{R}$, define the corresponding random variable $A^{i}:=\int_{0}^{\delta}|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|\gamma^{i}(du)$, and let $\mathcal{A}$ be the set of all such random variables. To make use of the error rate for the estimation of AVaR, we first show that $\mathcal{A}$ is directed upward, i.e. for any pair of random variables $A^{i},A^{j}\in\mathcal{A}$, there exists $\widetilde{A}\in\mathcal{A}$ with $\widetilde{A}\geq\max\{A^{i},A^{j}\}$. Then, by the following theorem, we can write $\operatorname*{ess\,sup}\mathcal{A}=\lim_{n}A^{n}$ for some increasing sequence $(A^{n})_{n}\subset\mathcal{A}$.

Theorem 4.1.

[29] If $\mathcal{A}$ is directed upward, there exists an increasing sequence $A^{1}\leq A^{2}\leq\cdots$ in $\mathcal{A}$ such that $\operatorname*{ess\,sup}\mathcal{A}=\lim_{n}A^{n}$ $\mathbb{P}$-almost surely.

To show that $\mathcal{A}$ is directed upward, for any $\gamma^{i},\gamma^{j}\in\mathcal{M}^{R}$, define the event

\[
B:=\Bigg\{\omega:\int_{0}^{\delta}|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|\gamma^{i}(\omega,du)\geq\int_{0}^{\delta}|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|\gamma^{j}(\omega,du)\Bigg\}, \tag{4.8}
\]

and the random measure

\[
\widetilde{\gamma}(\omega,du):=\gamma^{i}(\omega,du)\mathds{1}_{\omega\in B}+\gamma^{j}(\omega,du)\mathds{1}_{\omega\in B^{C}}. \tag{4.9}
\]

Then, we have $\widetilde{A}:=\int_{0}^{\delta}|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|\widetilde{\gamma}(\omega,du)\geq\max\{A^{i},A^{j}\}$ and $\widetilde{A}\in\mathcal{A}$. Therefore, $\mathcal{A}$ is directed upward. By Theorem 4.1, there exists an increasing sequence $(A^{n})_{n}$ in $\mathcal{A}$ with $\operatorname*{ess\,sup}\mathcal{A}=\lim_{n}A^{n}$ $\mathbb{P}$-almost surely, and we have

\[
\begin{aligned}
\mathbb{E}\Big[|\rho^{\delta}(f)-\widetilde{\rho}^{\delta}(f)|^{2}\Big]
&\leq\mathbb{E}\Big[\lim_{n}\int_{0}^{\delta}|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|^{2}\gamma^{n}(du)\Big]\\
&=\lim_{n}\int_{0}^{\delta}\mathbb{E}\Big[|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|^{2}\Big]\gamma^{n}(du)\\
&\leq\sup_{n\in\mathbb{N}}\int_{0}^{\delta}\mathbb{E}\Big[|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|^{2}\Big]\gamma^{n}(du)\\
&\leq\sup_{u\in[0,\delta]}\mathbb{E}\Big[|\widetilde{\text{AVaR}}_{u}-\text{AVaR}_{u}|^{2}\Big]\sup_{n\in\mathbb{N}}\int_{0}^{\delta}\gamma^{n}(du),
\end{aligned}
\]

where we used the monotone convergence theorem and Fubini’s theorem in the second line. Using Theorem 2.3 and taking the supremum over $u\in(0,\delta)$, we obtain the result of the theorem.
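The pasting construction in (4.8) and (4.9) can be illustrated on a finite toy model. The sketch below assumes a finite sample space and a finite grid of levels $u$ in $[0,\delta]$, represents each random measure as a list of weight vectors (one per outcome $\omega$), and uses hypothetical names (`integrate`, `paste`, `random_measure`) throughout; it is not the general measure-theoretic argument.

```python
import random


def integrate(err_row, gamma_row):
    """Discrete analogue of the integral of |error_u| against gamma(omega, du)."""
    return sum(e * g for e, g in zip(err_row, gamma_row))


def paste(err, gamma_i, gamma_j):
    """Build gamma_tilde as in (4.9): use gamma_i on the event B, gamma_j elsewhere."""
    gamma_tilde = []
    for err_row, gi, gj in zip(err, gamma_i, gamma_j):
        # omega belongs to B when the gamma_i integral dominates, as in (4.8).
        on_b = integrate(err_row, gi) >= integrate(err_row, gj)
        gamma_tilde.append(gi if on_b else gj)
    return gamma_tilde


def random_measure(rng, n_omega, n_u):
    """Toy random probability measure: one normalized weight vector per omega."""
    rows = []
    for _ in range(n_omega):
        weights = [rng.random() + 1e-12 for _ in range(n_u)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    n_omega, n_u = 6, 4  # illustrative sizes
    err = [[rng.random() for _ in range(n_u)] for _ in range(n_omega)]
    g_i = random_measure(rng, n_omega, n_u)
    g_j = random_measure(rng, n_omega, n_u)
    g_t = paste(err, g_i, g_j)
    for w in range(n_omega):
        print(integrate(err[w], g_i[w]), integrate(err[w], g_j[w]), integrate(err[w], g_t[w]))
```

In this toy setting the pasted integral equals the pointwise maximum of the two original integrals for every outcome, which is the inequality $\widetilde{A}\geq\max\{A^{i},A^{j}\}$ needed for upward directedness.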

5 Appendix

Here we give explicit formulas for constants in the proof of Theorem 2.3.

\begin{align}
C^{\prime 1}_{(u,t,\lambda)} &= C\bigg(\Big(\frac{1}{1-u}\Big)^{2}+1\bigg)\frac{1}{t}\bigg(2^{1/t}+\Big(\frac{1}{(1-u)^{4}}+\frac{C_{d}}{\lambda^{2}}\Big)\cdot 2^{1/t}+1\bigg)\exp\bigg(C\frac{1}{t}\Big(\frac{1}{(1-u)^{2}}+1\Big)\bigg) \tag{5.1}\\
C^{\prime 2}_{(u,t,\lambda)} &= C\bigg(3^{1/t}+\Big(\frac{1}{(1-u)^{4}}+1+\frac{C_{d}}{\lambda^{2}}\Big)\cdot 3^{1/t}+1\bigg) \tag{5.2}\\
C^{\prime 3}_{(u,t)} &= C\frac{1}{(1-u)^{4}}\frac{1}{t}\exp\bigg(\frac{C}{t}\Big(\frac{1}{1-u}+1\Big)^{4}\bigg) \tag{5.3}\\
C^{\prime 4}_{(u,t)} &= C\frac{1}{(1-u)^{2}}\bigg(2^{1/t}+\Big(\frac{1}{(1-u)^{4}}+\frac{1}{\lambda^{2}}C_{d}\Big)\cdot 2^{1/t}\bigg) \tag{5.4}\\
C^{\prime 5}_{(u,t,\lambda)} &= \bigg(2^{1/t}+\Big(\frac{1}{(1-u)^{4}}+\frac{1}{\lambda^{2}}C_{d}\Big)\cdot 2^{1/t}\bigg) \tag{5.5}\\
C^{\prime 6}_{(u)} &= C\frac{1}{(1-u)^{4}} \tag{5.6}\\
C^{\prime 7}_{(u,t,\lambda)} &= C\bigg(2^{1/t}+\Big(\frac{1}{(1-u)^{4}}+\frac{1}{\lambda^{2}}C_{d}\Big)\cdot 2^{1/t}+\Big(1+\frac{1}{t(1-u)^{4}}+\frac{1}{t\lambda^{2}}e^{1/t}\Big)^{1/2}\bigg) \tag{5.7}\\
C^{\prime 8}_{(u,t,\lambda)} &= C\bigg(\frac{1}{\lambda(\frac{1}{1-u})^{2}}+\frac{1}{\lambda(\frac{1}{1-u})^{2}}\Big(1+\frac{1}{t(1-u)^{4}}+\frac{1}{t^{2}\lambda^{2}}\Big)^{1/2}e^{Ct}\bigg) \tag{5.8}\\
C^{4}_{(u,\lambda)} &= \frac{C_{(\lambda)}}{(1-u)^{2}}\mathcal{W}^{2}_{1}(\delta_{z},\mu^{\lambda}_{\infty}) \tag{5.9}\\
C^{5}_{(\lambda)} &= 2C_{(\lambda)} \tag{5.10}\\
C^{\prime 9}_{(u,t,\lambda)} &= (C^{\prime 10})^{2}+C\bigg(1+\frac{1}{t(1-u)^{4}}+\frac{1}{t^{2}\lambda^{2}}\bigg)e^{C/t} \tag{5.11}\\
C^{\prime 10}_{(u,\lambda)} &= \int\|x\|^{2}e^{C\lambda+2\lambda\|x\|/(1-u)-\|x\|^{2}/2}\,\mathrm{d}x \tag{5.12}\\
C^{6}_{(u)} &= C\Big(\log\sqrt{2\pi(1-u)}+1\Big)^{2} \tag{5.13}\\
C^{1}_{(u,t,\lambda)} &= 1+C^{\prime 4}_{(u,t)}+C^{\prime 8}_{(u,t,\lambda)} \tag{5.14}\\
C^{2}_{(u,t,\lambda)} &= C^{\prime 2}_{(u,t,\lambda)}+C^{\prime 5}_{(u,t,\lambda)}+C^{\prime 7}_{(u,t,\lambda)}+C^{\prime 11}_{(u,t,\lambda)}+C \tag{5.15}\\
C^{3}_{(u,t,\lambda)} &= C^{\prime 1}_{(u,t,\lambda)}+C^{\prime 3}_{(u,t)}+C^{\prime 6}_{(u)} \tag{5.16}\\
C^{7}_{(\delta,t,\lambda)} &= C\frac{1}{(1-\delta)^{2}}\bigg(2^{1/t}+\Big(\frac{1}{(1-\delta)^{4}}+\frac{1}{\lambda^{2}}C_{d}\Big)\cdot 2^{1/t}\bigg)+C\bigg(\frac{1}{\lambda}+\frac{1}{\lambda}\Big(1+\frac{1}{t(1-\delta)^{4}}+\frac{1}{t^{2}\lambda^{2}}\Big)^{1/2}e^{Ct}\bigg). \tag{5.17}
\end{align}
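To get a feeling for how these constants behave as $u$ approaches 1, one can evaluate one of them on a logarithmic scale. The sketch below does this for $C^{\prime 3}_{(u,t)}$ from (5.3). The function name `log_c_prime_3` is hypothetical, and the generic constant $C$, whose value is not specified in the text, is replaced by an arbitrary placeholder `c = 1.0`, so only the qualitative growth is meaningful.

```python
import math


def log_c_prime_3(u, t, c=1.0):
    """Logarithm of C * (1-u)^{-4} * t^{-1} * exp((C/t) * (1/(1-u) + 1)^4), cf. (5.3).

    The value c = 1.0 is a placeholder for the unspecified generic constant C.
    Working with the logarithm avoids floating-point overflow in the exponential.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    if t <= 0.0 or c <= 0.0:
        raise ValueError("t and c must be positive")
    polynomial_part = math.log(c) - 4.0 * math.log(1.0 - u) - math.log(t)
    exponential_part = (c / t) * (1.0 / (1.0 - u) + 1.0) ** 4
    return polynomial_part + exponential_part


if __name__ == "__main__":
    for u in (0.0, 0.5, 0.9, 0.99):  # illustrative levels
        print(u, log_c_prime_3(u, t=1.0))
```

Under these assumptions the logarithm of the constant already grows like $(1-u)^{-4}$, which suggests that the bound is most informative for levels $u$ bounded away from 1.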

References

  • Ahmadi-Javid [2012] A. Ahmadi-Javid. Entropic value-at-risk: A new coherent risk measure. J Optimiz Theory App, 155(3):1105–1123, 2012.
  • Allen-Zhu [2018] Zeyuan Allen-Zhu. Natasha 2: Faster non-convex optimization than SGD. Advances in Neural Information Processing Systems, 31, 2018.
  • Artzner et al. [1999] Philippe Artzner, Freddy Delbaen, Jean Marc Eber, and David Heath. Coherent measures of risk. Math. Finance, 9:203–228, 1999.
  • Bartl and Tangpi [2022] Daniel Bartl and Ludovic Tangpi. Nonasymptotic convergence rates for the plug-in estimation of risk measures. Mathematics of Operations Research, 2022.
  • Bartl et al. [2020] Daniel Bartl, Samuel Drapeau, and Ludovic Tangpi. Computational aspects of robust optimized certainty equivalents and option pricing. Math. Finance, 30:287–309, 2020.
  • Belomestny and Krätschmer [2012] Denis Belomestny and Volker Krätschmer. Central limit theorems for law-invariant coherent risk measures. Journal of Applied Probability, 49(1):1–21, 2012.
  • Bhardwaj [2019] Chandrasekaran Anirudh Bhardwaj. Adaptively preconditioned stochastic gradient Langevin dynamics. arXiv preprint arXiv:1906.04324, 2019.
  • Bolley et al. [2012] François Bolley, Ivan Gentil, and Armand Guillin. Convergence to equilibrium in Wasserstein distance for Fokker–Planck equations. J. Funct. Anal., 263(8):2430–2457, 2012.
  • Bühler et al. [2019] Hans Bühler, Lukas Gonon, Josef Teichmann, and Ben Wood. Deep hedging. Quantitative Finance, 19(8):1271–1291, 2019.
  • Butler and Schachter [1997] JS Butler and Barry Schachter. Estimating value-at-risk with a precision measure by combining kernel estimation with historical simulation. Rev. Deriv. Res., 1:371–390, 1997.
  • Carmona and Delarue [2018] René Carmona and François Delarue. Probabilistic theory of mean field games with applications. I, volume 83 of Probability Theory and Stochastic Modelling. Springer, Cham, 2018. ISBN 978-3-319-56437-1; 978-3-319-58920-6. Mean field FBSDEs, control, and games.
  • Chen and Li [1989] Mu-Fa Chen and Shao-Fu Li. Coupling methods for multidimensional diffusion processes. The Ann. Probab., pages 151–177, 1989.
  • Chen [2008] Song Xi Chen. Nonparametric estimation of expected shortfall. Journal of Financial Econometrics, 6(1):87–107, 2008.
  • Chen et al. [2014] Tianqi Chen, Emily Fox, and Carlos Guestrin. Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, pages 1683–1691. PMLR, 2014.
  • Cheridito and Li [2009] Patrick Cheridito and Tianhui Li. Risk measures on Orlicz hearts. Math. Finance, 19(2):189–214, 2009.
  • Cheridito et al. [2006] Patrick Cheridito, Freddy Delbaen, and Michael Kupper. Dynamic monetary risk measures for bounded discrete-time processes. Electron. J. Probab., 11(3):57–106, 2006.
  • Cont et al. [2010] Rama Cont, Romain Deguest, and Giacomo Scandolo. Robustness and sensitivity analysis of risk measurement procedures. Quantitative finance, 10(6):593–606, 2010.
  • Delbaen [2012] Freddy Delbaen. Monetary Utility Functions. Osaka University Press, 2012.
  • Deng et al. [2020] Wei Deng, Guang Lin, and Faming Liang. A contour stochastic gradient Langevin dynamics algorithm for simulations of multi-modal distributions. Advances in Neural Information Processing Systems, 33:15725–15736, 2020.
  • Djellout et al. [2004] H. Djellout, A. Guillin, and L. Wu. Transportation cost-information inequalities and applications to random dynamical systems and diffusions. The Annals of Probability, 32(3B):2702–2732, 2004.
  • Durmus et al. [2019] Alain Durmus, Szymon Majewski, and Błażej Miasojedow. Analysis of Langevin Monte Carlo via convex optimization. The Journal of Machine Learning Research, 20(1):2666–2711, 2019.
  • Eberle [2011] Andreas Eberle. Reflection coupling and Wasserstein contractivity without convexity. C. R. Math, 349(19-20):1101–1104, 2011.
  • Eberle [2016] Andreas Eberle. Reflection couplings and contraction rates for diffusions. Probab. Theory Relat. Fields, 166(3):851–886, 2016.
  • Eckstein and Kupper [2021] Stephan Eckstein and Michael Kupper. Computation of optimal transport and related hedging problems via penalization and neural networks. Applied Mathematics & Optimization, 83:639–667, 2021.
  • Fan and Gu [2003] Jianqing Fan and Juan Gu. Semiparametric estimation of value at risk. Econom. J., 6(2):261–290, 2003.
  • Fan [1953] Ky Fan. Minimax theorems. PNAS USA, 39(1):42, 1953.
  • Föllmer and Knispel [2011] Hans Föllmer and T. Knispel. Entropic risk measures: Coherence vs. convexity, model ambiguity and robust large deviations. Stoch. Dyn., 11(02n03):333–351, 2011.
  • Föllmer and Schied [2002] Hans Föllmer and Alexander Schied. Convex measures of risk and trading constraint. Finance Stoch., 6(4):429–447, 2002.
  • Föllmer and Schied [2004] Hans Föllmer and Alexander Schied. Stochastic Finance: An Introduction in Discrete Time. Walter de Gruyter, Berlin, New York, 2 edition, 2004.
  • Föllmer and Schied [2004] Hans Föllmer and Alexander Schied. Stochastic Finance. An Introduction in Discrete Time. de Gruyter Studies in Mathematics. Walter de Gruyter, Berlin, New York, 2 edition, 2004.
  • Föllmer and Schied [2016] Hans Föllmer and Alexander Schied. Stochastic finance. de Gruyter, 2016.
  • Frittelli and Gianin [2005] Marco Frittelli and Emanuela Rosazza Gianin. Law invariant convex risk measures. Advances in Mathematical Economics, pages 33–46, 2005.
  • Frittelli and Rosazza Gianin [2002] Marco Frittelli and Emanuela Rosazza Gianin. Putting order in risk measures. Journal of Banking & Finance, 26(7):1473–1486, July 2002.
  • Gelfand and Mitter [1991] Saul B Gelfand and Sanjoy K Mitter. Recursive stochastic algorithms for global optimization in $\mathbb{R}^d$. SIAM J Control Optim, 29(5):999–1018, 1991.
  • Glasserman et al. [2000] Paul Glasserman, Philip Heidelberger, and Perwez Shahabuddin. Variance reduction techniques for estimating value-at-risk. Manag. Sci, 46(10):1349–1364, 2000.
  • Holte [2009] John M Holte. Discrete Gronwall lemma and applications. In MAA-NCS meeting at the University of North Dakota, volume 24, pages 1–7, 2009.
  • Hong and Liu [2011] L. Jeff Hong and Guangwu Liu. Monte Carlo estimation of value-at-risk, conditional value-at-risk and their sensitivities. In Proceedings of the 2011 Winter Simulation Conference (WSC). IEEE, 2011.
  • Hoogerheide and van Dijk [2010] Lennart Hoogerheide and Herman K van Dijk. Bayesian forecasting of value at risk and expected shortfall using adaptive importance sampling. Int. J. Forecast., 26(2):231–247, 2010.
  • Hu et al. [2020] Kaitong Hu, Shenjie Ren, David Siska, and Lukasz Szpruch. Mean–field Langevin dynamics and energy landscape of neural networks. Annales de l’Institut Henri Poincaré (B) Probabilités and Statistiques, to appear, 2020.
  • Hwang [1980] C. R. Hwang. Laplace’s method revisited: weak convergence of probability measures. Annals of Probability, pages 2189–2211, 1980.
  • Iyengar and Ma [2013] Garud Iyengar and Alfred Ka Chun Ma. Fast gradient descent method for mean-CVaR optimization. Ann. Oper. Res., 205(1):203–212, 2013.
  • Jouini et al. [2006] Elyès Jouini, Walter Schachermayer, and Nizar Touzi. Law invariant risk measures have the Fatou property. In Shigeo Kusuoka and Akira Yamazaki, editors, Advances in Mathematical Economics, volume 9 of Advances in Mathematical Economics, pages 49–71. Springer Japan, 2006.
  • Kloeden and Platen [2013] P. E. Kloeden and E. Platen. Numerical solution of stochastic differential equations. Springer Science and Business Media, 2013.
  • Krätschmer et al. [2014] Volker Krätschmer, Alexander Schied, and Henryk Zähle. Comparative and quantitative robustness for law-invariant risk measures. Finance Stoch., 2014.
  • Kupper and Schachermayer [2009] Michael Kupper and Walter Schachermayer. Representation results for law invariant time consistent functions. Math. Financ. Econ., 2(3):189–210, 2009.
  • Kusuoka [2001] Shigeo Kusuoka. On law invariant coherent risk measures. In Advances in mathematical economics, pages 83–95. Springer, 2001.
  • Lacker et al. [2020] D. Lacker, M. Shkolnikov, and J. Zhang. Inverting the Markovian projection, with an application to local stochastic volatility models. Annals of Probability, 48(5):2189–2211, 2020.
  • McNeil et al. [2015] Alexander J. McNeil, Rüdiger Frey, and Paul Embrechts. Quantitative Risk Management. Princeton University Press, 2015.
  • Nesterov [2005] Yu Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127–152, 2005.
  • Basel Committee on Banking Supervision [2014] Basel Committee on Banking Supervision. Fundamental review of the trading book: A revised market risk framework. Technical report, Bank for International Settlements, 2014.
  • Pichler and Schlotter [2020] Alois Pichler and R. Schlotter. Entropic based risk measures. European Journal on Operational Research, 285(1):223–236, 2020.
  • Pohl et al. [2020] Mathias Pohl, Alexander Ristig, Walter Schachermayer, and Ludovic Tangpi. Theoretical and empirical analysis of trading activity. Math. Program., 181(2):405–434, 2020.
  • Raginsky et al. [2017] Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. In COLT, pages 1674–1703. PMLR, 2017.
  • Reppen and Soner [2023] Anders Max Reppen and Halil Mete Soner. Deep empirical risk minimization in finance: looking into the future. Mathematical Finance, 33(1):116–145, 2023.
  • Rockafellar [1968] Ralph Rockafellar. Integrals which are convex functionals. Pacific journal of mathematics, 24(3):525–539, 1968.
  • Sabanis and Zhang [2020] Sotirios Sabanis and Ying Zhang. A fully data-driven approach to minimizing CVaR for portfolio of assets via SGLD with discontinuous updating. arXiv preprint arXiv:2007.01672, 2020.
  • Sekimoto [2010] Ken Sekimoto. Stochastic energetics, volume 799. Springer, 2010.
  • Soma and Yoshida [2020] Tasuku Soma and Yuichi Yoshida. Statistical learning with conditional value at risk. arXiv preprint arXiv:2002.05826, 2020.
  • Tamar et al. [2015] Aviv Tamar, Yonatan Glassner, and Shie Mannor. Optimizing the CVaR via sampling. In AAAI-15, 2015.
  • Villani [2009] C. Villani. Optimal Transport. Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, 2009.
  • Weber [2007] Stefan Weber. Distribution-invariant risk measures, Entropy, and large deviations. J. Appl. Prob., 44:16–40, 2007.
  • Welling and Teh [2011] Max Welling and Yee W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
  • Yeung [2008] R. W. Yeung. Information theory and network coding. Springer Science and Business Media, 2008.
  • Zhu and Zhou [2015] Helin Zhu and Enlu Zhou. Estimation of conditional value-at-risk for input uncertainty with budget allocation. In Proc. 2015 WSC, pages 655–666. IEEE, 2015.