
Random forward models and log-likelihoods in Bayesian inverse problems

H. C. Lie (Institute of Mathematics, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany; hlie@math.fu-berlin.de), T. J. Sullivan (Institute of Mathematics, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany; Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany; t.j.sullivan@fu-berlin.de, sullivan@zib.de), and A. L. Teckentrup (School of Mathematics, University of Edinburgh, UK; The Alan Turing Institute, 96 Euston Road, London, NW1 2DB, UK; a.teckentrup@ed.ac.uk)
(September 22, 2025)
Abstract

We consider the use of randomised forward models and log-likelihoods within the Bayesian approach to inverse problems. Such random approximations to the exact forward model or log-likelihood arise naturally when a computationally expensive model is approximated using a cheaper stochastic surrogate, as in Gaussian process emulation (kriging), or in the field of probabilistic numerical methods. We show that the Hellinger distance between the exact and approximate Bayesian posteriors is bounded by moments of the difference between the true and approximate log-likelihoods. Example applications of these stability results are given for randomised misfit models in large data applications and the probabilistic solution of ordinary differential equations.

Keywords:   Bayesian inverse problem, random likelihood, surrogate model, posterior consistency, uncertainty quantification, randomised misfit, probabilistic numerics.

2010 Mathematics Subject Classification:   62F15, 62G08, 65C99, 65D05, 65D30, 65J22, 68W20.

1 Introduction

Inverse problems are ubiquitous in the applied sciences and in recent years renewed attention has been paid to their mathematical and statistical foundations (Evans and Stark, 2002; Kaipio and Somersalo, 2005; Stuart, 2010). Questions of well-posedness — i.e. the existence, uniqueness, and stability of solutions — have been of particular interest for infinite-dimensional/non-parametric inverse problems because of the need to ensure stable and discretisation-independent inferences (Lassas and Siltanen, 2004) and develop algorithms that scale well with respect to high discretisation dimension (Cotter et al., 2013).

This paper considers the stability of the posterior distribution in a Bayesian inverse problem (BIP) when an accurate but computationally intractable forward model or likelihood is replaced by a random surrogate or emulator. Such stochastic surrogates arise often in practice. For example, an expensive forward model such as the solution of a PDE may be replaced by a kriging/Gaussian process (GP) model (Stuart and Teckentrup, 2017). In the realm of “big data” a residual vector of prohibitively high dimension may be randomly subsampled or orthogonally projected onto a randomly-chosen low-dimensional subspace (Le et al., 2017; Nemirovski et al., 2008). In the field of probabilistic numerical methods (Hennig et al., 2015), a deterministic dynamical system may be solved stochastically, with the stochasticity representing epistemic uncertainty about the behaviour of the system below the temporal or spatial grid scale (Conrad et al., 2016; Lie et al., 2017).

In each of the above-mentioned settings, the stochasticity in the forward model propagates to associated inverse problems, so that the Bayesian posterior becomes a random measure, $\mu_{N}^{\textup{S}}$, which we define precisely in (3.1). Alternatively, one may choose to average over the randomness to obtain a marginal posterior, $\mu_{N}^{\textup{M}}$, which we define precisely in (3.2). It is natural to ask in which sense the approximate posterior (either the random or the marginal version) is close to the ideal posterior of interest, $\mu$.

In earlier work, Stuart and Teckentrup (2017) examined the case in which the random surrogate was a GP. More precisely, the object subjected to GP emulation was either the forward model (i.e. the parameter-to-observation map) or the negative log-likelihood. The prior GP was assumed to be continuous, and was then conditioned upon finitely many observations (i.e. pointwise evaluations) of the parameter-to-observation map or negative log-likelihood as appropriate. That paper provided error bounds on the Hellinger distance between the BIP’s exact posterior distribution and various approximations based on the GP emulator, namely approximations based on the mean of the predictive (i.e. conditioned) GP, as well as approximations based on the full GP emulator. Those results showed that the Hellinger distance between the exact BIP posterior and its approximations can be bounded by moments of the error in the emulator.

In this paper, we extend the analysis of Stuart and Teckentrup (2017) to consider more general (i.e. non-Gaussian) random approximations to forward models and log-likelihoods, and quantify the impact upon the posterior measure in a BIP. After establishing some notation in Section 2, we state the main approximation theorems in Section 3. Section 4 gives an application of the general theory to random misfit models, in which high-dimensional data are rendered tractable by projection into a randomly-chosen low-dimensional subspace. Section 5 gives an application to the stochastic numerical solution of deterministic dynamical systems, in which the stochasticity is a device used to represent the impact of numerical discretisation uncertainty. The proofs of all theorems are deferred to an appendix located after the bibliographic references.

2 Setup and notation

2.1 Spaces of probability measures

Throughout, $(\Omega,\mathcal{F},\mathbb{P})$ is a fixed probability space that is rich enough to serve as a common domain for all random variables of interest.

The space of probability measures on the Borel $\sigma$-algebra of a topological space $\mathcal{U}$ will be denoted by $\mathcal{M}_{1}(\mathcal{U})$; in practice, $\mathcal{U}$ will be a separable Banach space.

When $\mu\in\mathcal{M}_{1}(\mathcal{U})$, integration of a measurable function (random variable) $f\colon\mathcal{U}\to\mathbb{R}$ will also be denoted by expectation, i.e. $\mathbb{E}_{\mu}[f]\coloneqq\int_{\mathcal{U}}f(u)\,\mathrm{d}\mu(u)$.

The space $\mathcal{M}_{1}(\mathcal{U})$ will be endowed with the Hellinger metric $d_{\textup{H}}\colon\mathcal{M}_{1}(\mathcal{U})^{2}\to\mathbb{R}_{\geq 0}$: for $\mu,\nu\in\mathcal{M}_{1}(\mathcal{U})$ that are both absolutely continuous with respect to a reference measure $\pi$,

\[
d_{\textup{H}}(\mu,\nu)^{2} \coloneqq \frac{1}{2}\int_{\mathcal{U}}\left|\sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\pi}}-\sqrt{\frac{\mathrm{d}\nu}{\mathrm{d}\pi}}\,\right|^{2}\mathrm{d}\pi = 1-\int_{\mathcal{U}}\sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\pi}\frac{\mathrm{d}\nu}{\mathrm{d}\pi}}\,\mathrm{d}\pi = 1-\mathbb{E}_{\nu}\left[\sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\nu}}\,\right]. \tag{2.1}
\]

The Hellinger distance is in fact independent of the choice of reference measure $\pi$ and defines a metric on $\mathcal{M}_{1}(\mathcal{U})$ (Bogachev, 2007, Lemma 4.7.35–36) with respect to which $\mathcal{M}_{1}(\mathcal{U})$ evidently has diameter at most $1$. The Hellinger topology coincides with the total variation topology (Kraft, 1955) and is strictly weaker than the Kullback–Leibler (relative entropy) topology (Pinsker, 1964); all these topologies are strictly stronger than the topology of weak convergence of measures.

As used in Sections 3–5, the Hellinger metric is useful for uncertainty quantification when assessing the similarity of Bayesian posterior probability distributions, since expected values of square-integrable functions are Lipschitz continuous with respect to the Hellinger metric:

\[
\bigl|\mathbb{E}_{\mu}[f]-\mathbb{E}_{\nu}[f]\bigr| \leq 2\sqrt{\mathbb{E}_{\mu}\bigl[|f|^{2}\bigr]+\mathbb{E}_{\nu}\bigl[|f|^{2}\bigr]}\;d_{\textup{H}}(\mu,\nu) \tag{2.2}
\]

when $f\in L^{2}_{\mu}(\mathcal{U})\cap L^{2}_{\nu}(\mathcal{U})$. In particular, for bounded $f$, $|\mathbb{E}_{\mu}[f]-\mathbb{E}_{\nu}[f]|\leq 2\sqrt{2}\,\|f\|_{\infty}\,d_{\textup{H}}(\mu,\nu)$.
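As a concrete illustration, the following sketch (an assumption-laden discretisation, not taken from the paper) approximates the Hellinger distance (2.1) for two Gaussian densities on a grid and checks the Lipschitz bound (2.2); the grid quadrature and the test function `f` are illustrative choices.

```python
import numpy as np

def hellinger(p, q, dx):
    """Hellinger distance between two densities sampled on a uniform grid."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)

def gauss(x, m, s):
    """Density of N(m, s^2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
p = gauss(x, 0.0, 1.0)   # density of mu
q = gauss(x, 0.5, 1.0)   # density of nu

# Check the Lipschitz bound (2.2) for a square-integrable test function.
f = np.sin(x)
lhs = abs(np.sum(f * p) * dx - np.sum(f * q) * dx)
rhs = 2.0 * np.sqrt(np.sum(f**2 * p) * dx + np.sum(f**2 * q) * dx) * hellinger(p, q, dx)
```

For equal-variance Gaussians the squared Hellinger distance has the closed form $1-\exp(-(m_1-m_2)^2/(8s^2))$, which the quadrature reproduces to high accuracy.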

2.2 Bayesian inverse problems

By an inverse problem we mean the recovery of $u\in\mathcal{U}$ from an imperfect observation $y\in\mathcal{Y}$ of $G(u)$, for a known forward operator $G\colon\mathcal{U}\to\mathcal{Y}$. In practice, the operator $G$ may arise as the composition $G=O\circ S$ of the solution operator $S\colon\mathcal{U}\to\mathcal{V}$ of a system of ordinary or partial differential equations with an observation operator $O\colon\mathcal{V}\to\mathcal{Y}$, and it is typically the case that $\mathcal{Y}=\mathbb{R}^{J}$ for some $J\in\mathbb{N}$, whereas $\mathcal{U}$ and $\mathcal{V}$ can have infinite dimension. For simplicity, we assume an additive noise model

\[
y = G(u) + \eta, \tag{2.3}
\]

where the statistics, but not the realisation, of $\eta$ are known. In the strict sense, this inverse problem is ill-posed: there may be no element $u\in\mathcal{U}$ for which $G(u)=y$, or there may be multiple such $u$, and these may depend sensitively on the observed data $y$.

The Bayesian perspective eases these problems by interpreting $u$, $y$, and $\eta$ all as random variables or fields. Through knowledge of the distribution of $\eta$, (2.3) defines the conditional distribution of $y|u$. After positing a prior probability distribution $\mu_{0}\in\mathcal{M}_{1}(\mathcal{U})$ for $u$, the Bayesian solution to the inverse problem is nothing other than the posterior distribution of the conditioned random variable $u|y$. This posterior measure, which we denote $\mu^{y}\in\mathcal{M}_{1}(\mathcal{U})$, is from the Bayesian point of view the proper synthesis of the prior information in $\mu_{0}$ with the observed data $y$. The same posterior $\mu^{y}$ can also be arrived at via the minimisation of penalised Kullback–Leibler, $\chi^{2}$, or Dirichlet energies (Dupuis and Ellis, 1997; Jordan and Kinderlehrer, 1996; Ohta and Takatsu, 2011), where the penalisation again expresses a compromise between fidelity to the prior and fidelity to the data.

The rigorous formulation of Bayes’ formula for this context requires careful treatment and some further notation (Stuart, 2010). The pair $(u,y)$ is assumed to be a well-defined random variable with values in $\mathcal{U}\times\mathcal{Y}$. The marginal distribution of $u$ is the Bayesian prior $\mu_{0}\in\mathcal{M}_{1}(\mathcal{U})$. The observational noise $\eta$ is distributed according to $\mathbb{Q}_{0}\in\mathcal{M}_{1}(\mathcal{Y})$, independently of $u$. The random variable $y|u$ is distributed according to $\mathbb{Q}_{u}$, the translate of $\mathbb{Q}_{0}$ by $G(u)$, which is assumed to be absolutely continuous with respect to $\mathbb{Q}_{0}$, with

\[
\frac{\mathrm{d}\mathbb{Q}_{u}}{\mathrm{d}\mathbb{Q}_{0}}(y) \propto \exp(-\Phi(u;y)).
\]

The function $\Phi\colon\mathcal{U}\times\mathcal{Y}\to\mathbb{R}$ is called the negative log-likelihood or simply potential. In the elementary setting of centred Gaussian noise, $\eta\sim\mathcal{N}(0,\Gamma)$ on $\mathcal{Y}=\mathbb{R}^{J}$, the potential is the non-negative quadratic misfit $\Phi(u;y)=\tfrac{1}{2}\bigl\|\Gamma^{-1/2}(y-G(u))\bigr\|_{\mathcal{Y}}^{2}$. (Hereafter, to reduce notational clutter, we write both $\|\cdot\|_{\mathcal{U}}$ and $\|\cdot\|_{\mathcal{Y}}$ simply as $\|\cdot\|$.) However, particularly for cases in which $\dim\mathcal{Y}=\infty$, it may be necessary to allow $\Phi$ to take negative values and even to be unbounded below (Stuart, 2010, Remark 3.8).

With this notation, Bayes’ theorem is then as follows (Dashti and Stuart, 2016, Theorem 3.4):

Theorem 2.1 (Generalised Bayesian formula).

Suppose that $\Phi\colon\mathcal{U}\times\mathcal{Y}\to\mathbb{R}$ is $\mu_{0}\otimes\mathbb{Q}_{0}$-measurable and that

\[
Z(y) \coloneqq \mathbb{E}_{\mu_{0}}\bigl[\exp(-\Phi(u;y))\bigr]
\]

satisfies $0<Z(y)<\infty$ for $\mathbb{Q}_{0}$-almost all $y\in\mathcal{Y}$. Then, for such $y$, the conditional distribution $\mu^{y}$ of $u|y$ exists and is absolutely continuous with respect to $\mu_{0}$ with density

\[
\frac{\mathrm{d}\mu^{y}}{\mathrm{d}\mu_{0}}(u) = \frac{\exp(-\Phi(u;y))}{Z(y)}. \tag{2.4}
\]

Note that, for (2.4) to make sense, it is essential to check that $0<Z(y)<\infty$. Hereafter, to save space, we regard the data $y$ as fixed, and hence write $\Phi(u)$ in place of $\Phi(u;y)$, $Z$ in place of $Z(y)$, and $\mu$ in place of $\mu^{y}$. In particular, we shall redefine the negative log-likelihood as a function $\Phi\colon\mathcal{U}\to\mathbb{R}$, instead of a function $\Phi\colon\mathcal{U}\times\mathcal{Y}\to\mathbb{R}$ as in Theorem 2.1 above.

From the perspective of numerical analysis, it is natural to ask about the well-posedness of the Bayesian posterior $\mu$: is it stable when the prior $\mu_{0}$, the potential $\Phi$, or the observed data $y$ are slightly perturbed, e.g. due to discretisation, truncation, or other numerical errors? For example, what is the impact of using an approximate numerical forward operator $G_{N}$ in place of $G$, and hence an approximate $\Phi_{N}\colon\mathcal{U}\to\mathbb{R}$ in place of $\Phi$? Here, we quantify stability in the Hellinger metric $d_{\textup{H}}$ from (2.1).

Stability of the posterior with respect to the observed data $y$ and the log-likelihood $\Phi$ was established for Gaussian priors by Stuart (2010) and for more general priors by many later contributions (Dashti et al., 2012; Hosseini, 2017; Hosseini and Nigam, 2017; Sullivan, 2017). (We note in passing that the stability of BIPs with respect to perturbation of the prior is possible but much harder to establish, particularly when the data $y$ are highly informative and the normalisation constant $Z(y)$ is close to zero; see e.g. the “brittleness” phenomenon of Owhadi and Scovel (2017) and Owhadi et al. (2015).) Typical approximation theorems for the replacement of the potential $\Phi$ by a deterministic approximate potential $\Phi_{N}$, leading to an approximate posterior $\mu_{N}$, aim to transfer the convergence rate of the forward problem to the inverse problem, i.e. to prove an implication of the form

\[
\bigl|\Phi(u)-\Phi_{N}(u)\bigr| \leq M(\|u\|)\,\psi(N) \implies d_{\textup{H}}\bigl(\mu,\mu_{N}\bigr) \leq C\,\psi(N),
\]

where $M\colon\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ is suitably well behaved, $\psi\colon\mathbb{N}\to\mathbb{R}_{\geq 0}$ quantifies the convergence rate of the forward problem, and $C$ is a constant. Following Stuart and Teckentrup (2017), the purpose of this article is to extend this paradigm and these approximation results to the case in which the approximation $\Phi_{N}$ is a random object.
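The transfer principle can be seen numerically in a toy discretised problem (the 1D prior, forward map, and perturbation below are all illustrative assumptions, not from the paper): a potential error of size $\psi(N)=1/N$ produces a Hellinger error of the same order.

```python
import numpy as np

# Discretised 1D Bayesian inverse problem: prior N(0,1), forward map tanh,
# Gaussian noise.  The perturbation psi(N) * cos(u) plays the role of a
# deterministic approximation error with |Phi - Phi_N| <= psi(N).
u = np.linspace(-5.0, 5.0, 4001)
du = u[1] - u[0]
prior = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

y, sigma = 1.0, 0.5
Phi = 0.5 * ((y - np.tanh(u)) / sigma) ** 2   # exact quadratic misfit

def posterior(Phi_vals):
    """Normalised posterior density for a given potential, as in (2.4)."""
    w = np.exp(-Phi_vals) * prior
    return w / (np.sum(w) * du)

def hellinger(p, q):
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * du)

mu = posterior(Phi)
dist = {}
for N in [10, 100, 1000]:
    psi = 1.0 / N                              # assumed forward-problem rate
    dist[N] = hellinger(mu, posterior(Phi + psi * np.cos(u)))
```

Each tenfold increase in $N$ reduces the Hellinger distance by roughly a factor of ten, as the implication above predicts.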

3 Well-posed Bayesian inverse problems with random likelihoods

In many practical applications, the negative log-likelihood $\Phi$ is computationally too expensive or impossible to evaluate exactly; one therefore often uses an approximation $\Phi_{N}$ of $\Phi$. This leads to an approximation $\mu_{N}$ of the exact posterior $\mu$, and a key desideratum is convergence, in a suitable sense, of $\mu_{N}$ to $\mu$ as the approximation error $\Phi_{N}-\Phi$ in the potential tends to zero.

The focus of this work is on random approximations $\Phi_{N}$. One particular example of such random approximations is the class of GP emulators analysed in Stuart and Teckentrup (2017); other examples include the randomised misfit models in Section 4 and the probabilistic numerical methods in Section 5. The present section extends the analysis of Stuart and Teckentrup (2017) from the case of GP approximations of forward models or log-likelihoods to more general non-Gaussian approximations. In doing so, more precise conditions are obtained for the exact Bayesian posterior to be well approximated by its random counterpart.

Now let $\Phi_{N}\colon\Omega\times\mathcal{U}\to\mathbb{R}$ be a measurable function that provides a random approximation to $\Phi\colon\mathcal{U}\to\mathbb{R}$, where we recall that the data $y$ have been fixed. Let $\nu_{N}$ be a probability measure on $\Omega$ such that the distribution of the inputs of $\Phi_{N}$ is given by $\nu_{N}\otimes\mu_{0}$; we sometimes abuse notation and think of $\Phi_{N}$ itself as being $\nu_{N}$-distributed. We assume throughout that the randomness in the approximation $\Phi_{N}$ of $\Phi$ is independent of the randomness in the parameters being inferred.

Replacing $\Phi$ by $\Phi_{N}$ in (2.4), we obtain the sample approximation $\mu_{N}^{\textup{S}}$, the random measure given by

\[
\frac{\mathrm{d}\mu_{N}^{\textup{S}}}{\mathrm{d}\mu_{0}}(\omega,u) \coloneqq \frac{\exp(-\Phi_{N}(\omega,u))}{Z_{N}^{\textup{S}}(\omega)}, \qquad Z_{N}^{\textup{S}}(\omega) \coloneqq \mathbb{E}_{\mu_{0}}\bigl[\exp(-\Phi_{N}(\omega,\cdot))\bigr] = \int_{\mathcal{U}}\exp(-\Phi_{N}(\omega,u'))\,\mathrm{d}\mu_{0}(u'). \tag{3.1}
\]

(Henceforth, we will omit the $\omega$ argument for brevity.) Thus, the measure $\mu$ is approximated by the random measure $\mu_{N}^{\textup{S}}\colon\Omega\to\mathcal{M}_{1}(\mathcal{U})$, and the normalisation constant $Z_{N}^{\textup{S}}\colon\Omega\to\mathbb{R}$ is a random variable. A deterministic approximation of the posterior distribution $\mu$ can now be obtained either by fixing $\omega$, i.e. by taking one particular realisation of the random posterior $\mu_{N}^{\textup{S}}$, or by taking the expected value of the random likelihood $\exp(-\Phi_{N}(u))$, i.e. by averaging over different realisations of $\mu_{N}^{\textup{S}}$. This yields the marginal approximation $\mu_{N}^{\textup{M}}$ defined by

\[
\frac{\mathrm{d}\mu_{N}^{\textup{M}}}{\mathrm{d}\mu_{0}}(u) \coloneqq \frac{\mathbb{E}_{\nu_{N}}\bigl[\exp(-\Phi_{N}(u))\bigr]}{\mathbb{E}_{\nu_{N}}\bigl[Z_{N}^{\textup{S}}\bigr]}, \tag{3.2}
\]

where $\mathbb{E}_{\nu_{N}}[Z_{N}^{\textup{S}}]=\int_{\Omega}Z_{N}^{\textup{S}}(\omega)\,\mathrm{d}\nu_{N}(\omega)$. We note that an alternative averaged, deterministic approximation can be obtained by taking the expected value of the density $(Z_{N}^{\textup{S}})^{-1}e^{-\Phi_{N}(u)}$ in (3.1) as a whole, i.e. by taking the expected value of the ratio rather than the ratio of expected values. A result very similar to Theorem 3.1, with slightly modified assumptions, holds also in this case, with the proof following the same steps. However, the marginal approximation presented here appears more intuitive and more amenable to applications. Firstly, the marginal approximation has a clear interpretation as the posterior distribution obtained by approximating the true data likelihood $\exp(-\Phi(u))$ by $\mathbb{E}_{\nu_{N}}\bigl[\exp(-\Phi_{N}(u))\bigr]$. Secondly, the marginal approximation is more amenable to sampling methods such as Markov chain Monte Carlo, with clear connections to the pseudo-marginal approach (Andrieu and Roberts, 2009; Beaumont, 2003).
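In the same discretised spirit (all concrete choices below are illustrative assumptions), one realisation of the sample posterior (3.1) and a Monte Carlo estimate of the marginal posterior (3.2) can be formed as follows; note that the marginal version averages the likelihood before normalising, i.e. it is a ratio of expectations, not an expectation of ratios.

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.linspace(-5.0, 5.0, 2001)
du = u[1] - u[0]
prior = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # mu_0 = N(0, 1)

Phi = 0.5 * ((1.0 - u) / 0.8) ** 2        # exact potential (illustrative)
eps = 0.2                                 # size of the random perturbation

def normalise(w):
    """Turn unnormalised density values on the grid into a density."""
    return w / (np.sum(w) * du)

def random_potential():
    """One draw Phi_N = Phi + eps * xi * sin(u), a toy random surrogate."""
    return Phi + eps * rng.standard_normal() * np.sin(u)

# Sample approximation mu_N^S: one realisation of the random posterior.
sample_post = normalise(np.exp(-random_potential()) * prior)

# Marginal approximation mu_N^M: average the likelihood over many
# realisations of Phi_N, then normalise once.
lik = np.mean([np.exp(-random_potential()) for _ in range(2000)], axis=0)
marginal_post = normalise(lik * prior)
```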

3.1 Random misfit models

This section considers the general setting in which the deterministic potential $\Phi$ is approximated by a random potential $\Phi_{N}\sim\nu_{N}$. Recall from (2.4) that $Z$ is the normalisation constant of $\mu$, and that for $\mu$ to be well-defined, we must have $0<Z<\infty$. The following two results, Theorems 3.1 and 3.2, extend Theorems 4.9 and 4.11 respectively of Stuart and Teckentrup (2017), in which the approximation is a GP model:

Theorem 3.1 (Deterministic convergence of the marginal posterior).

Suppose that there exist scalars $C_{1},C_{2},C_{3}\geq 0$, independent of $N$, such that, for the Hölder-conjugate exponent pairs $(p_{1},p_{1}')$, $(p_{2},p_{2}')$, and $(p_{3},p_{3}')$, we have

(a) $\min\left\{\bigl\|\mathbb{E}_{\nu_{N}}[\exp(-\Phi_{N})]^{-1}\bigr\|_{L^{p_{1}}_{\mu_{0}}(\mathcal{U})},\,\bigl\|\exp(\Phi)\bigr\|_{L^{p_{1}}_{\mu_{0}}(\mathcal{U})}\right\} \leq C_{1}(p_{1})$;

(b) $\left\|\mathbb{E}_{\nu_{N}}\bigl[\bigl(\exp(-\Phi)+\exp(-\Phi_{N})\bigr)^{p_{2}}\bigr]^{1/p_{2}}\right\|_{L^{2p_{1}'p_{3}}_{\mu_{0}}(\mathcal{U})} \leq C_{2}(p_{1},p_{2},p_{3})$;

(c) $C_{3}^{-1} \leq \mathbb{E}_{\nu_{N}}[Z_{N}^{\textup{S}}] \leq C_{3}$.

Then there exists $C=C(C_{1},C_{2},C_{3},Z)>0$, independent of $N$, such that

\[
d_{\textup{H}}\bigl(\mu,\mu_{N}^{\textup{M}}\bigr) \leq C\left\|\mathbb{E}_{\nu_{N}}\bigl[|\Phi-\Phi_{N}|^{p_{2}'}\bigr]^{1/p_{2}'}\right\|_{L^{2p_{1}'p_{3}'}_{\mu_{0}}(\mathcal{U})}, \tag{3.3a}
\]
\[
C(C_{1},C_{2},C_{3},Z) = \left(\frac{C_{1}(p_{1})}{Z}+C_{3}\max\left\{Z^{-3},C_{3}^{3}\right\}\right)C_{2}^{2}(p_{1},p_{2},p_{3}). \tag{3.3b}
\]

In the proof of Theorem 3.1, we show that hypothesis (a) arises as an upper bound on the quantity $\|(e^{-\Phi}+\mathbb{E}_{\nu_{N}}[e^{-\Phi_{N}}])^{-1}\|_{L^{p_{1}}_{\mu_{0}}(\mathcal{U})}$. In order for the conclusion of Theorem 3.1 to hold, we need the latter to be finite. Thus, hypothesis (a) is an exponential decay condition on the positive tails of either $\Phi$ or $\Phi_{N}$, with respect to the appropriate measures. Alternatively, by applying Jensen’s inequality to $\mathbb{E}_{\nu_{N}}[e^{-\Phi_{N}}]^{-1}$, one can strengthen hypothesis (a) into the hypothesis of exponential integrability of either $\Phi$ with respect to $\mu_{0}$ or $\Phi_{N}$ with respect to $\nu_{N}\otimes\mu_{0}$; this yields the same interpretation. Thus, the parameter $p_{1}$ quantifies the exponential decay of the positive tail of either $\Phi$ or $\Phi_{N}$.

By comparing the quantity $\|(e^{-\Phi}+\mathbb{E}_{\nu_{N}}[e^{-\Phi_{N}}])^{-1}\|_{L^{p_{1}}_{\mu_{0}}(\mathcal{U})}$ from hypothesis (a) with the quantity in hypothesis (b), it follows that hypothesis (b) is an exponential decay condition on the negative tails of both $\Phi$ and $\Phi_{N}$. The two new parameters in this decay condition arise because we apply Hölder’s inequality twice in order to develop the desired bound (3.3) on $d_{\textup{H}}(\mu,\mu_{N}^{\textup{M}})$. The key desideratum here is that the bound is multiplicative in some $L^{p'}_{\mu_{0}}(\mathcal{U})$-norm of $\mathbb{E}_{\nu_{N}}[|\Phi-\Phi_{N}|^{p_{2}'}]^{1/p_{2}'}$. The two new parameters $p_{2}$ and $p_{1}'p_{3}$ quantify the decay with respect to $\nu_{N}$ and $\mu_{0}$ respectively. Note that the interaction between hypotheses (a) and (b), as described by the conjugate exponent pair $(p_{1},p_{1}')$, implies that one can trade off faster exponential decay of one tail against slower exponential decay of the other.

The two-sided condition on $\mathbb{E}_{\nu_{N}}[Z_{N}^{\textup{S}}]$ in hypothesis (c) ensures that both tails of $\Phi_{N}$ with respect to $\nu_{N}\otimes\mu_{0}$ decay sufficiently quickly. This hypothesis ensures that the Radon–Nikodym derivative in (3.2) is well-defined.

Finally, we note that the quantity on the right-hand side of (3.3a) depends directly on the conjugate exponents of $p_{1}$, $p_{2}$, and $p_{3}$ appearing in hypotheses (a) and (b). The better behaved the quantities in these hypotheses are, the weaker the norm we can choose on the right-hand side of (3.3a).

Theorem 3.2 (Mean-square convergence of the sample posterior).

Suppose that there exist scalars $D_{1},D_{2}\geq 0$, independent of $N$, such that, for Hölder-conjugate exponent pairs $(q_{1},q_{1}')$ and $(q_{2},q_{2}')$, we have

(a) $\left\|\mathbb{E}_{\nu_{N}}\bigl[\bigl(e^{-\Phi/2}+e^{-\Phi_{N}/2}\bigr)^{2q_{1}}\bigr]^{1/q_{1}}\right\|_{L^{q_{2}}_{\mu_{0}}(\mathcal{U})} \leq D_{1}(q_{1},q_{2})$;

(b) $\left\|\mathbb{E}_{\nu_{N}}\left[\left(Z_{N}^{\textup{S}}\max\left\{Z^{-3},(Z_{N}^{\textup{S}})^{-3}\right\}\left(e^{-\Phi}+e^{-\Phi_{N}}\right)^{2}\right)^{q_{1}}\right]^{1/q_{1}}\right\|_{L^{q_{2}}_{\mu_{0}}(\mathcal{U})} \leq D_{2}(q_{1},q_{2})$.

Then

\[
\mathbb{E}_{\nu_{N}}\left[d_{\textup{H}}\bigl(\mu,\mu_{N}^{\textup{S}}\bigr)^{2}\right]^{1/2} \leq \left(D_{1}+D_{2}\right)\left\|\mathbb{E}_{\nu_{N}}\left[|\Phi-\Phi_{N}|^{2q_{1}'}\right]^{1/2q_{1}'}\right\|_{L^{2q_{2}'}_{\mu_{0}}(\mathcal{U})}. \tag{3.4}
\]

Hypothesis (a) of Theorem 3.2 arises during the proof as a result of developing an upper bound on $\mathbb{E}_{\nu_{N}}[(e^{-\Phi/2}-e^{-\Phi_{N}/2})^{2}]$ that is multiplicative in some $L^{p'}_{\mu_{0}}(\mathcal{U})$-norm of $\mathbb{E}_{\nu_{N}}[|\Phi-\Phi_{N}|^{2q_{1}'}]^{1/2q_{1}'}$. Thus, it describes an exponential decay condition on the negative tails of both $\Phi$ and $\Phi_{N}$; in particular, hypothesis (a) is always satisfied when the potentials $\Phi$ and $\Phi_{N}$ are non-negative, as is usually the case for finite-dimensional data. The exponents $q_{1}$ and $q_{2}$ arise from one application of Hölder’s inequality used to fulfil the desideratum of multiplicativity, and they quantify the decay with respect to $\nu_{N}$ and $\mu_{0}$ respectively.

Hypothesis (b) of Theorem 3.2 arises as a result of developing an upper bound on the quantity $\mathbb{E}_{\nu_{N}}[Z_{N}^{\textup{S}}(Z^{-1/2}-(Z_{N}^{\textup{S}})^{-1/2})^{2}]$ that fulfils the desideratum of multiplicativity mentioned above. The presence of both $Z_{N}^{\textup{S}}$ and its reciprocal indicates that hypothesis (b) is analogous to hypothesis (c) of Theorem 3.1, in that hypothesis (b) is a condition on the tails of $\Phi_{N}$ with respect to $\mu_{0}$. The difference between hypothesis (b) of Theorem 3.2 and hypothesis (c) of Theorem 3.1 arises from the fact that the Radon–Nikodym derivative in (3.1) features $Z_{N}^{\textup{S}}$ instead of $\mathbb{E}_{\nu_{N}}[Z_{N}^{\textup{S}}]$.

We now show that the assumptions of Theorems 3.1 and 3.2 are satisfied when the exact potential $\Phi$ and the quality of the approximation $\Phi_{N}\approx\Phi$ are suitably well behaved. Since $0<Z<\infty$, it follows that $C_{3}^{-1}<Z<C_{3}$ for some $0<C_{3}<\infty$.

Assumption 3.3.

There exists $C_{0}\in\mathbb{R}$, not depending on $N$, such that, for all $N\in\mathbb{N}$,

\[
\Phi \geq -C_{0} \quad\text{and}\quad \nu_{N}\left(\{\Phi_{N}\mid\Phi_{N}\geq-C_{0}\}\right)=1, \tag{3.5}
\]

and for any $0<C_{3}<\infty$ with the property that $C_{3}^{-1}<Z<C_{3}$, there exists $N^{\ast}(C_{3})\in\mathbb{N}$ such that, for all $N\geq N^{\ast}$,

\[
\mathbb{E}_{\mu_{0}}\left[\mathbb{E}_{\nu_{N}}\left[|\Phi_{N}-\Phi|\right]\right] \leq \frac{1}{2\exp(C_{0})}\min\left\{Z-\frac{1}{C_{3}},\,C_{3}-Z\right\}. \tag{3.6}
\]

The lower bound conditions in (3.5) ensure that the hypothesised exponential decay conditions on the negative tails of the true likelihood and the random likelihoods from Theorems 3.1 and 3.2 are satisfied. The uniform lower bound on $\Phi$ translates into a uniform upper bound on the Radon–Nikodym derivative of the posterior with respect to the prior, and is a very mild condition that is satisfied in many, if not most, BIPs. Given this fact, it is reasonable to demand that the $\Phi_{N}$ satisfy the same uniform lower bound, $\nu_{N}$-almost surely and for all $N\in\mathbb{N}$; this is the content of the second condition in (3.5). Condition (3.6) expresses the requirement that, by choosing $N$ sufficiently large, one can approximate $\Phi$ arbitrarily well using the random $\Phi_{N}$, with respect to the $L^{1}_{\mu_{0}\otimes\nu_{N}}$ topology. This assumption ensures that the stated aims of this work are reasonable.

Lemma 3.4.

Suppose that Assumption 3.3 holds with $C_{0}$ as in (3.5) and $C_{3}$ and $N^{\ast}(C_{3})$ as in (3.6), that $\exp(\Phi)\in L^{p^{\ast}}_{\mu_{0}}(\mathcal{U})$ for some $1\leq p^{\ast}\leq+\infty$ with conjugate exponent $(p^{\ast})'$, and that there exists some $C_{4}\in\mathbb{R}$, not depending on $N$, such that, for all $N\in\mathbb{N}$,

\[
\nu_{N}\left(\left\{\Phi_{N}\mid\mathbb{E}_{\mu_{0}}\left[\Phi_{N}\right]\leq C_{4}\right\}\right)=1. \tag{3.7}
\]

Then the hypotheses of Theorem 3.1 hold, with

\[
p_{1}=p^{\ast},\quad p_{2}=p_{3}=+\infty,\quad C_{1}=\|\exp(\Phi)\|_{L^{p^{\ast}}_{\mu_{0}}(\mathcal{U})},\quad C_{2}=2\exp(C_{0}),
\]

and $C_{3}$ as above. Moreover, the hypotheses of Theorem 3.2 hold, with

\[
q_{1}=q_{2}=\infty,\quad D_{1}=4\exp(C_{0}),\quad D_{2}=4\exp(3C_{0})\max\{C_{3}^{-3},\exp(3C_{4})\}.
\]

The uniform upper bound condition on $\Phi_{N}$ with respect to $\mu_{0}$ in (3.7) is rather strong; we use it to ensure that $Z_{N}^{\textup{S}}$ is bounded away from zero, uniformly with respect to $\Phi_{N}$ and $N\in\mathbb{N}$. Together with the condition on $\Phi_{N}$ in (3.5), this yields uniform lower and upper bounds on $Z_{N}^{\textup{S}}$; the latter imply that hypothesis (b) in Theorem 3.2 holds with the stated values of $q_{1}$ and $q_{2}$. A sufficient condition for (3.7) is that the $\Phi_{N}$ are themselves uniformly bounded. This condition is of interest when the misfit $\Phi$ is associated to a bounded forward model and the data take values in a bounded subset.

Lemma 3.5.

Suppose that Assumption 3.3 holds with $C_{0}$ as in (3.5) and $C_{3}$ and $N^{\ast}(C_{3})$ as in (3.6), and that there exists some $2<\rho^{\ast}<+\infty$ such that $\mathbb{E}_{\nu_{N}}[\exp(\rho^{\ast}\Phi_{N})]\in L^{1}_{\mu_{0}}(\mathcal{U})$. Then the hypotheses of Theorem 3.1 hold, with

\[
p_{1}=\rho^{\ast},\quad p_{2}=p_{3}=+\infty,\quad C_{1}=\|\mathbb{E}_{\nu_{N}}[\exp(\rho^{\ast}\Phi_{N})]\|_{L^{1}_{\mu_{0}}(\mathcal{U})}^{1/\rho^{\ast}},\quad C_{2}=2\exp(C_{0}),
\]

and $C_{3}$ as above. Moreover, the hypotheses of Theorem 3.2 hold, with

\[
q_{1}=\frac{\rho^{\ast}}{2},\quad q_{2}=+\infty,\quad D_{1}=4\exp(C_{0}),\quad D_{2}=4\exp(2C_{0})\left(C_{3}^{-3}\exp(C_{0})+\|\mathbb{E}_{\nu_{N}}[\exp(\rho^{\ast}\Phi_{N})]\|^{2/\rho^{\ast}}_{L^{1}_{\mu_{0}}(\mathcal{U})}\right).
\]

By comparing the hypotheses and conclusions of Lemma 3.4 and Lemma 3.5, we observe that, by reducing the exponent of integrability from $q_{1}=+\infty$ to $q_{1}=\rho^{\ast}/2$, we can replace the strong uniform upper bound condition (3.7) on $\Phi_{N}$ from Lemma 3.4 with the weaker condition that $\exp(\Phi_{N})\in L^{\rho^{\ast}}_{\mu_{0}}(\mathcal{U})$ in Lemma 3.5, and thus increase the scope of applicability of the conclusion.

In Lemmas 3.4 and 3.5 above, we have specified the largest possible values of the exponents that are compatible with the hypotheses. This is because later, in Theorem 3.9, we will want to use the smallest possible values of the corresponding conjugate exponents in the resulting inequalities (3.3a) and (3.4).

3.2 Random forward models in quadratic potentials

In many settings, the potentials Φ\Phi and ΦN\Phi_{N} have a common form and differ only in the parameter-to-observable map. In this section we shall assume that Φ\Phi and ΦN\Phi_{N} are quadratic misfits of the form

Φ(u)=12Γ1/2(G(u)y)2andΦN(u)=12Γ1/2(GN(u)y)2,\Phi(u)=\frac{1}{2}\bigl{\|}\Gamma^{-1/2}(G(u)-y)\bigr{\|}^{2}\quad\text{and}\quad\Phi_{N}(u)=\frac{1}{2}\bigl{\|}\Gamma^{-1/2}(G_{N}(u)-y)\bigr{\|}^{2}, (3.8)

corresponding to centred Gaussian observational noise with symmetric positive-definite covariance Γ\Gamma. Again, we assume that GG is deterministic while GNG_{N} is random. In this section, for this setting, we show how the quality of the approximation GNGG_{N}\approx G transfers to the approximation ΦNΦ\Phi_{N}\approx\Phi, and hence to the approximation μNμ\mu_{N}\approx\mu (for either the sample or marginal approximate posterior).

Pointwise in uu and ω\omega, the errors in the misfit and the forward model are related according to the following proposition.

Proposition 3.6.

Let Φ\Phi and ΦN\Phi_{N} be defined as in (3.8), where 𝒴=J\mathcal{Y}=\mathbb{R}^{J} for some JJ\in\mathbb{N} and the eigenvalues of the operator Γ\Gamma are bounded away from zero. Then, for some C=CΓ>0C=C_{\Gamma}>0, for all u𝒰u\in\mathcal{U}, and νN\nu_{N}-almost surely

|Φ(u)ΦN(u)|2CΓ(Φ(u)1/2G(u)GN(u)+G(u)GN(u)2).\bigl{|}\Phi(u)-\Phi_{N}(u)\bigr{|}\leq 2C_{\Gamma}\left(\Phi(u)^{1/2}\|G(u)-G_{N}(u)\|+\|G(u)-G_{N}(u)\|^{2}\right). (3.9)

Hence, for q[1,)q\in[1,\infty) and all u𝒰u\in\mathcal{U},

𝔼νN[|Φ(u)ΦN(u)|q]1/q\displaystyle\mathbb{E}_{\nu_{N}}\left[\bigl{|}\Phi(u)-\Phi_{N}(u)\bigr{|}^{q}\right]^{1/q} 4CΓ(Φ(u)q/2𝔼νN[G(u)GN(u)q]\displaystyle\leq 4C_{\Gamma}\Bigl{(}\Phi(u)^{q/2}\mathbb{E}_{\nu_{N}}\left[\|G(u)-G_{N}(u)\|^{q}\right] (3.10)
+𝔼νN[G(u)GN(u)2q])1/q.\displaystyle\phantom{=}\quad+\mathbb{E}_{\nu_{N}}\left[\|G(u)-G_{N}(u)\|^{2q}\right]\Bigr{)}^{1/q}.

By assuming that 𝒴=J\mathcal{Y}=\mathbb{R}^{J}, we assume that the data live in a finite-dimensional space. This is a standard assumption in the area, and it implies that the operator Γ\Gamma is simply a matrix. The assumption that the eigenvalues of Γ\Gamma are bounded away from zero is equivalent to assuming that Γ\Gamma is invertible, which follows immediately from the earlier assumption that Γ\Gamma is a symmetric, positive-definite covariance matrix.
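The pointwise bound (3.9) can be checked numerically. The sketch below is illustrative only: it stands in random vectors for G(u)G(u), GN(u)G_{N}(u), and yy, uses a hypothetical diagonal Γ\Gamma, and takes the (non-optimal but admissible) choice CΓ=max(1,λmin(Γ)1)C_{\Gamma}=\max(1,\lambda_{\min}(\Gamma)^{-1}).

```python
import numpy as np

rng = np.random.default_rng(0)
J = 5
# Diagonal covariance whose eigenvalues are bounded away from zero.
eig = rng.uniform(0.5, 2.0, size=J)
gamma_inv_sqrt = 1.0 / np.sqrt(eig)
C_gamma = max(1.0, 1.0 / eig.min())  # hypothetical admissible choice of C_Gamma

def misfit(g, y):
    # Quadratic misfit 0.5 * ||Gamma^{-1/2}(g - y)||^2 as in (3.8).
    r = gamma_inv_sqrt * (g - y)
    return 0.5 * r @ r

max_ratio = 0.0
for _ in range(1000):
    y = rng.normal(size=J)
    g = rng.normal(size=J)                    # stands in for G(u)
    g_n = g + rng.normal(scale=0.3, size=J)   # stands in for G_N(u)
    lhs = abs(misfit(g, y) - misfit(g_n, y))
    err = np.linalg.norm(g - g_n)
    rhs = 2.0 * C_gamma * (np.sqrt(misfit(g, y)) * err + err**2)
    max_ratio = max(max_ratio, lhs / rhs)

# The pointwise bound (3.9) should hold in every draw, i.e. max_ratio <= 1.
```

The inequality follows the proof's pattern: |Φ(u)ΦN(u)|\lvert\Phi(u)-\Phi_{N}(u)\rvert is a difference of squared norms, factored and bounded by the forward-model error and its square.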

Corollary 3.7.

Let 1qs1\leq q\leq s, and suppose that ΦLμ0s(𝒰)\Phi\in L^{s}_{\mu_{0}}(\mathcal{U}). If there exists an NN^{\ast}\in\mathbb{N} such that, for all NNN\geq N^{\ast},

𝔼νN[GGN2q]1/qLμ0s(𝒰)1,\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\|G-G_{N}\|^{2q}\bigr{]}^{1/q}\right\|_{L_{\mu_{0}}^{s}(\mathcal{U})}\leq 1,

then there exists some C=C(s)>0C=C(s)>0 that does not depend on NN such that, for all NNN\geq N^{\ast},

𝔼νN[|ΦΦN|q]1/qLμ0s(𝒰)\displaystyle\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{q}\bigr{]}^{1/q}\right\|_{L^{s}_{\mu_{0}}(\mathcal{U})} C𝔼νN[GGN2q]1/qLμ0s(𝒰)1/2\displaystyle\leq C\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\|G-G_{N}\|^{2q}\bigr{]}^{1/q}\right\|_{L^{s}_{\mu_{0}}(\mathcal{U})}^{1/2}

where C(s)=(8CΓ)(𝔼μ0[Φs]1/2+1)1/sC(s)=(8C_{\Gamma})\left(\mathbb{E}_{\mu_{0}}[\Phi^{s}]^{1/2}+1\right)^{1/s} and CΓC_{\Gamma} is as in Proposition 3.6.

The hypotheses ensure that the integrability of the misfit Φ\Phi determines the highest degree of integrability of the forward operators GNG_{N} and GG, and that, for sufficiently large NN, we may make the norm of the difference GGNG-G_{N} small enough in an appropriate topology. The hypothesis that 𝔼νN[GGN2q]1/qLμ0s(𝒰)1\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\|G-G_{N}\|^{2q}\bigr{]}^{1/q}\right\|_{L_{\mu_{0}}^{s}(\mathcal{U})}\leq 1 is used to combine the G(u)GN(u)\|G(u)-G_{N}(u)\| and G(u)GN(u)2\|G(u)-G_{N}(u)\|^{2} terms in (3.9). The resulting simplification ensures that we may apply Lemma 3.8.

Lemma 3.8.

Let Φ\Phi and ΦN\Phi_{N} be as in (3.8). If, for some q,s1q,s\geq 1,

limN𝔼νN[GGN2q]1/qLμ0s(𝒰)=0,\lim_{N\to\infty}\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\|G-G_{N}\|^{2q}\bigr{]}^{1/q}\right\|_{L^{s}_{\mu_{0}}(\mathcal{U})}=0, (3.11)

then Assumption 3.3 holds.

The lemma states that if the random forward model converges to the true forward model in the appropriate topology, then the conditions in Assumption 3.3 are satisfied by the corresponding random misfits. Since the misfits were assumed to be quadratic in (3.8), the key contribution of Lemma 3.8 is to ensure that the approximation quality condition (3.6) is satisfied.

We shall use the preceding results to obtain bounds on the Hellinger distance in terms of errors in the forward model, of the following form: for C,D>0C,D>0 and r1,r2,s1,s21r_{1},r_{2},s_{1},s_{2}\geq 1 that do not depend on NN,

dH(μ,μNM)\displaystyle d_{\textup{H}}\bigl{(}\mu,\mu_{N}^{\textup{M}}\bigr{)} C𝔼νN[GNG2r1]1/r1Lμ0r2(𝒰)1/2\displaystyle\leq C\bigl{\|}\mathbb{E}_{\nu_{N}}\left[\|G_{N}-G\|^{2r_{1}}\right]^{1/r_{1}}\bigr{\|}^{1/2}_{L^{r_{2}}_{\mu_{0}}(\mathcal{U})} (3.12)
𝔼νN[dH(μ,μNS)2]1/2\displaystyle\mathbb{E}_{\nu_{N}}\left[d_{\textup{H}}\bigl{(}\mu,\mu_{N}^{\textup{S}}\bigr{)}^{2}\right]^{1/2} D𝔼νN[GNG2s1]1/s1Lμ0s2(𝒰)1/2.\displaystyle\leq D\bigl{\|}\mathbb{E}_{\nu_{N}}\left[\|G_{N}-G\|^{2s_{1}}\right]^{1/s_{1}}\bigr{\|}^{1/2}_{L^{s_{2}}_{\mu_{0}}(\mathcal{U})}. (3.13)

For brevity and simplicity, the following result uses one pair q,s1q,s\geq 1 in (3.11) in order to obtain convergence statements for both μNM\mu_{N}^{\textup{M}} and μNS\mu^{\textup{S}}_{N}. If one is interested in only one of these measures, then one may optimise qq and ss accordingly.

Theorem 3.9 (Convergence of posteriors for randomised forward models in quadratic potentials).

Let Φ\Phi and ΦN\Phi_{N} be as in (3.8).

  1. (a)

    Suppose there exists some p>1p^{\ast}>1 with Hölder conjugate (p)(p^{\ast})^{\prime} such that exp(Φ)Lμ0p(𝒰)\exp(\Phi)\in L^{p^{\ast}}_{\mu_{0}}(\mathcal{U}), and suppose that (3.7) holds for some C4>0C_{4}>0. If GNGG_{N}\to G as in (3.11) with q=2q=2 and s=2p/(p1)s=2p^{\ast}/(p^{\ast}-1), then the following hold:

    1. (i)

      there exists some C>0C>0 that does not depend on NN, for which (3.12) holds with r1=1r_{1}=1 and r2=2p/(p1)r_{2}=2p^{\ast}/(p^{\ast}-1), and

    2. (ii)

      there exists some D>0D>0 that does not depend on NN, for which (3.13) holds with s1=2s_{1}=2 and s2=2s_{2}=2.

  2. (b)

    Suppose there exists some 2<ρ<2<\rho^{\ast}<\infty such that 𝔼νN[exp(ρΦN)]Lμ01\mathbb{E}_{\nu_{N}}[\exp(\rho^{\ast}\Phi_{N})]\in L^{1}_{\mu_{0}}. If GNGG_{N}\to G as in (3.11) with q=2ρ/(ρ2)q=2\rho^{\ast}/(\rho^{\ast}-2) and s=2ρ/(ρ1)s=2\rho^{\ast}/(\rho^{\ast}-1), then the following hold:

    1. (i)

      there exists some C>0C>0 that does not depend on NN, for which (3.12) holds with r1=1r_{1}=1 and r2=2ρ/(ρ1)r_{2}=2\rho^{\ast}/(\rho^{\ast}-1), and

    2. (ii)

      there exists some D>0D>0 that does not depend on NN, for which (3.13) holds with s1=2ρ/(ρ2)s_{1}=2\rho^{\ast}/(\rho^{\ast}-2) and s2=2s_{2}=2.

In both cases, μNM\mu^{\textup{M}}_{N} and μNS\mu^{\textup{S}}_{N} converge to μ\mu in the appropriate metrics given in (3.12) and (3.13) respectively.

The proof of Theorem 3.9 consists of tracking the dependence of the parameters through the sequential application of the preceding results, all of which are used.

Case (a) applies in the situation where the random approximations ΦN\Phi_{N} are uniformly bounded from above; as discussed earlier, this condition is satisfied in the case that the misfit Φ\Phi is associated to a bounded forward model and the data take values in a bounded subset of 𝒴=J\mathcal{Y}=\mathbb{R}^{J}. Note that the topology of the convergence of GNG_{N} to GG is quantified by ss and qq, and that ss depends on the parameter pp^{\ast} that quantifies the exponential μ0\mu_{0}-integrability of the misfit Φ\Phi. In particular, the faster the exponential decay of the positive tail of Φ\Phi (i.e. the larger the value of pp^{\ast}), the stronger the topology of convergence of GNG_{N} to GG.

In contrast to case (a), case (b) does not assume that the misfit Φ\Phi is exponentially integrable or that the random approximations ΦN\Phi_{N} are uniformly bounded from above νN\nu_{N}-almost surely. Instead, exponential integrability of the random misfit ΦN\Phi_{N} is required. Another difference is that the exponential integrability parameter ρ\rho^{\ast} determines the strength of the topology of convergence of the random forward models, not only with respect to the μ0\mu_{0}-topology but also with respect to the νN\nu_{N}-topology.

4 Application: randomised misfit models

This section considers a particular Monte Carlo approximation ΦN\Phi_{N} of a quadratic potential Φ\Phi, proposed by Nemirovski et al. (2008); Shapiro et al. (2009), and further applied and analysed in the context of BIPs by Le et al. (2017). This approximation is particularly useful when the data yJy\in\mathbb{R}^{J} has very high dimension, so that one does not wish to interrogate every component of the data vector yy, or evaluate every component of the model prediction G(u)G(u) and compare it with the corresponding component of yy.

Let σ\sigma be an J\mathbb{R}^{J}-valued random vector with mean zero and identity covariance, and let σ(1),,σ(N)\sigma^{(1)},\dots,\sigma^{(N)} be independent and identically distributed copies (samples) of σ\sigma. We then have the following approximation:

Φ(u)\displaystyle\Phi(u) 12Γ1/2(yG(u))2\displaystyle\coloneqq\frac{1}{2}\left\|\Gamma^{-1/2}(y-G(u))\right\|^{2}
=12(Γ1/2(yG(u)))𝚃𝔼[σσ𝚃](Γ1/2(yG(u)))\displaystyle=\frac{1}{2}\bigl{(}\Gamma^{-1/2}(y-G(u))\bigr{)}^{\mathtt{T}}\mathbb{E}[\sigma\sigma^{\mathtt{T}}]\bigl{(}\Gamma^{-1/2}(y-G(u))\bigr{)}
=12𝔼[|σ𝚃(Γ1/2(yG(u)))|2]\displaystyle=\frac{1}{2}\mathbb{E}\biggl{[}\bigl{|}\sigma^{\mathtt{T}}\bigl{(}\Gamma^{-1/2}(y-G(u))\bigr{)}\bigr{|}^{2}\biggr{]}
12Ni=1N|σ(i)𝚃(Γ1/2(yG(u)))|2\displaystyle\approx\frac{1}{2N}\sum_{i=1}^{N}\bigl{|}{\sigma^{(i)}}^{\mathtt{T}}\bigl{(}\Gamma^{-1/2}(y-G(u))\bigr{)}\bigr{|}^{2}
ΦN(u).\displaystyle\eqqcolon\Phi_{N}(u).
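A minimal sketch of this approximation, assuming Γ=I\Gamma=I and standard Gaussian σ\sigma for simplicity (both are simplifications, not requirements of the construction), illustrates that ΦN\Phi_{N} is an unbiased Monte Carlo estimator of Φ\Phi:

```python
import numpy as np

rng = np.random.default_rng(1)
J = 2000                              # high-dimensional data
y = rng.normal(size=J)                # observed data
g_u = rng.normal(size=J)              # stands in for G(u) at a fixed u
r = y - g_u                           # whitened residual (Gamma = I)

phi = 0.5 * r @ r                     # exact misfit Phi(u)

def phi_N(N):
    # sigma^{(i)} i.i.d. with mean zero and identity covariance,
    # so E[phi_N] = phi and the error decays like N^{-1/2}.
    sigma = rng.normal(size=(N, J))
    return 0.5 * np.mean((sigma @ r) ** 2)

rel_err = abs(phi_N(2000) - phi) / phi
```

Note that each term of the sum requires only a single inner product with the whitened residual, rather than a full pass over all JJ components of a general quadratic form.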

The analysis and numerical studies in Le et al. (2017, Sections 3–4) suggest that a good choice for the random vector σ\sigma would be one with independent and identically distributed (i.i.d.) entries from a sub-Gaussian probability distribution on \mathbb{R}. Examples of sub-Gaussian distributions considered include

  1. (a)

    the standard Gaussian distribution: σj𝒩(0,1)\sigma_{j}\sim\mathcal{N}(0,1), for j=1,,Jj=1,\dots,J; and

  2. (b)

    the \ell-sparse distribution: for [0,1)\ell\in[0,1), let s111s\coloneqq\frac{1}{1-\ell}\geq 1 and set, for j=1,,Jj=1,\dots,J,

    σjs{1,with probability 12s,0,with probability =11s,1,with probability 12s.\sigma_{j}\coloneqq\sqrt{s}\begin{cases}1,&\text{with probability $\frac{1}{2s}$,}\\ 0,&\text{with probability $\ell=1-\frac{1}{s}$,}\\ -1,&\text{with probability $\frac{1}{2s}$.}\end{cases}
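The \ell-sparse distribution is straightforward to sample, as in the sketch below (illustrative, with a hypothetical sparsity level =0.9\ell=0.9). It also shows the point of the construction: only the nonzero entries of each σ(i)\sigma^{(i)} contribute to the inner products in ΦN\Phi_{N}.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_ell_sparse(ell, size):
    # sigma_j = sqrt(s) * {+1, 0, -1} with probabilities 1/(2s), 1 - 1/s, 1/(2s),
    # where s = 1 / (1 - ell); mean zero and unit variance by construction.
    s = 1.0 / (1.0 - ell)
    u = rng.uniform(size=size)
    return np.sqrt(s) * (np.where(u < 0.5 / s, 1.0, 0.0)
                         - np.where((u >= 0.5 / s) & (u < 1.0 / s), 1.0, 0.0))

ell, J = 0.9, 1_000_000
sigma = sample_ell_sparse(ell, J)
zero_frac = (sigma == 0.0).mean()     # approximately ell of the entries vanish

# A sparse inner product touches only the nonzero indices:
v = rng.normal(size=J)
idx = np.nonzero(sigma)[0]
dot_sparse = sigma[idx] @ v[idx]      # cost ~ (1 - ell) * J instead of J
```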

The randomised misfit ΦN\Phi_{N} can provide computational benefits in two ways. Firstly, a single evaluation of ΦN\Phi_{N} can be made cheap by choosing the \ell-sparse distribution for σ\sigma, with large sparsity parameter \ell. This choice ensures that a large proportion of the entries of each sample σ(i)\sigma^{(i)} will be zero, significantly reducing the cost to compute the required inner products in ΦN\Phi_{N}, since there is no need to compute the components of the data or model vector that will be eliminated by the sparsity pattern. The value of NN of course also influences the computational cost. It is observed by Le et al. (2017) that, for large JJ and moderate N10N\approx 10, the random potential ΦN\Phi_{N} and the original potential Φ\Phi are already very similar, in particular having approximately the same minimisers and minimum values. Statistically, this means that the maximum likelihood estimators under Φ\Phi and ΦN\Phi_{N} are very similar; after weighting by a prior, the corresponding maximum a posteriori (MAP) estimators are similar as well.

The second benefit of the randomised misfit approach, and the main motivation for its use in Le et al. (2017), is the reduction in computational effort needed to compute the MAP estimate. This task involves the solution of a large-scale optimisation problem involving Φ\Phi in the objective function, which is typically done using inexact Newton methods. It is shown by Le et al. (2017) that the required number of evaluations of the forward model GG and its adjoint is drastically reduced when using the randomised misfit ΦN\Phi_{N} as opposed to using the true misfit Φ\Phi, approximately by a factor of JN\frac{J}{N}.

The aim of this section is to show that the use of the randomised misfit ΦN\Phi_{N} leads not only to the MAP estimate but in fact to the whole Bayesian posterior distribution being well approximated. Thus, the corresponding conjecture is that the ideal and deterministic posterior dμ(u)exp(Φ(u))dμ0(u)\mathrm{d}\mu(u)\propto\exp(-\Phi(u))\,\mathrm{d}\mu_{0}(u) is well approximated by the random posterior dμNS(u)exp(ΦN(u))dμ0(u)\mathrm{d}\mu_{N}^{\textup{S}}(u)\propto\exp(-\Phi_{N}(u))\,\mathrm{d}\mu_{0}(u). Indeed, via Theorem 3.2, we have the following convergence result for the case of a sparsifying distribution:

Proposition 4.1.

Suppose that the entries of σ\sigma are i.i.d. \ell-sparse, for some [0,1)\ell\in[0,1), and that ΦLμ02(𝒰)\Phi\in L^{2}_{\mu_{0}}(\mathcal{U}). Then there exists a constant CC, independent of NN, such that

(𝔼νN[dH(μ,μNS)2])1/2CN.\left(\mathbb{E}_{\nu_{N}}\bigl{[}d_{\textup{H}}\bigl{(}\mu,\mu_{N}^{\textup{S}}\bigr{)}^{2}\bigr{]}\right)^{1/2}\leq\frac{C}{\sqrt{N}}. (4.1)

(In this section, νN\nu_{N} plays the role of the distribution of σ(1),,σ(N)\sigma^{(1)},\dots,\sigma^{(N)}.) As the proof reveals, a valid choice of the constant CC in (4.1) is

C=(D1+D2)J3𝔼νN[σj4]31ΦLμ02(𝒰)=(D1+D2)J3s31ΦLμ02(𝒰),C=(D_{1}+D_{2})\sqrt{J^{3}\mathbb{E}_{\nu_{N}}[\sigma_{j}^{4}]^{3}-1}\|\Phi\|_{L^{2}_{\mu_{0}}(\mathcal{U})}\\ =(D_{1}+D_{2})\sqrt{J^{3}s^{3}-1}\|\Phi\|_{L^{2}_{\mu_{0}}(\mathcal{U})}, (4.2)

where the constant (D1+D2)(D_{1}+D_{2}) is as in Theorem 3.2. Thus, as one would expect, the accuracy of the approximation decreases as σ\sigma approaches the complete-sparsification limit 1\ell\to 1 or as the data dimension JJ increases, but always with the same convergence rate N1/2N^{-1/2} in terms of the approximation dimension NN.
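The rate in (4.1) can be observed empirically. The sketch below is a toy example, not the setting of the proof: it uses a one-dimensional parameter, a hypothetical linear forward model G(u)=AuG(u)=Au, quadrature on a grid to compute Hellinger distances, and an outer Monte Carlo loop over draws of the σ(i)\sigma^{(i)} to estimate 𝔼νN[dH(μ,μNS)2]1/2\mathbb{E}_{\nu_{N}}[d_{\textup{H}}(\mu,\mu_{N}^{\textup{S}})^{2}]^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(3)
J, ell = 50, 0.5
s = 1.0 / (1.0 - ell)

u_grid = np.linspace(-4.0, 4.0, 401)       # quadrature grid for a 1-d parameter
du = u_grid[1] - u_grid[0]
log_prior = -0.5 * u_grid**2               # standard Gaussian prior, unnormalised

A = rng.normal(size=J)                     # hypothetical linear model G(u) = A u
y = A * 1.0 + 0.1 * rng.normal(size=J)     # data generated from the truth u = 1
res = y[None, :] - np.outer(u_grid, A)     # residuals y - G(u) on the grid

def posterior(phi_vals):
    # Normalised posterior density on the grid; shifting phi is harmless.
    w = np.exp(log_prior - (phi_vals - phi_vals.min()))
    return w / (w.sum() * du)

def hellinger(p, q):
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q))**2) * du)

mu = posterior(0.5 * np.sum(res**2, axis=1))   # exact posterior

def rms_hellinger(N, trials=100):
    d2 = []
    for _ in range(trials):
        u01 = rng.uniform(size=(N, J))     # N i.i.d. l-sparse vectors sigma^{(i)}
        sigma = np.sqrt(s) * (np.where(u01 < 0.5 / s, 1.0, 0.0)
                              - np.where((u01 >= 0.5 / s) & (u01 < 1.0 / s), 1.0, 0.0))
        phi_N = 0.5 * np.mean((res @ sigma.T)**2, axis=1)
        d2.append(hellinger(mu, posterior(phi_N))**2)
    return np.sqrt(np.mean(d2))

e_coarse, e_fine = rms_hellinger(5), rms_hellinger(80)
# Proposition 4.1 predicts e_fine / e_coarse of order sqrt(5 / 80) = 1/4.
```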

Remark 4.2.

The proof of Proposition 4.1 can be modified to yield the same result for arbitrary i.i.d. σj\sigma_{j} with bounded support, though the sparsifying case is obviously the one with the easiest interpretation. However, extending Proposition 4.1 to the case of i.i.d. Gaussian random variables σj𝒩(0,1)\sigma_{j}\sim\mathcal{N}(0,1) appears to be problematic. In the proof, we crucially make use of the bound |σj|s|\sigma_{j}|\leq\sqrt{s} to verify Assumption (b) of Theorem 3.2. For Gaussian random variables, we would similarly need an NN-independent bound on the exponential moments of

max1iN1jJσj(i),\max_{\begin{subarray}{c}1\leq i\leq N\\ 1\leq j\leq J\end{subarray}}\sigma_{j}^{(i)},

which is not possible. We leave this as an interesting question for future work: would a different proof strategy yield convergence in the Gaussian case, or is the Gaussian setting genuinely one in which the MAP problem is well approximated but the BIP is not?

5 Application: probabilistic integration of dynamical systems

The data-based inference of initial conditions or governing parameters for dynamical problems arises frequently in scientific applications, a prime example being data assimilation in numerical weather prediction (Law et al., 2015; Reich and Cotter, 2015). In this setting, the Bayesian likelihood involves a solution of the mathematical model for the dynamics, which is typically an ODE or time-dependent PDE; we focus here on the ODE situation. Even when the governing ODE is deterministic, it may be profitable to perform a probabilistic numerical solution: possible motivations for doing so include the representation of model error (model inadequacy) in the ODE itself, and the impact of discretisation uncertainty. When such a probabilistic solver is used for the ODE, the likelihood becomes random in the sense considered in this paper.

Random approximate solution of deterministic ODEs is an old idea (Diaconis, 1988; Skilling, 1992) that has received renewed attention in recent years (Conrad et al., 2016; Hennig et al., 2015; Lie et al., 2017; Schober et al., 2014). As random forward models, these probabilistic ODE solvers are amenable to the analysis of Section 3. Let f:ddf\colon\mathbb{R}^{d}\to\mathbb{R}^{d} and consider the following parameter-dependent initial value problem for a fixed, parameter-independent duration T>0T>0:

ddtz(t;u)\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}z(t;u) =f(z(t;u);u),\displaystyle=f(z(t;u);u), for 0tT0\leq t\leq T, (5.1)
z(0;u)\displaystyle z(0;u) =z0(u).\displaystyle=z_{0}(u).

In the context of the BIP presented in Section 2, the unknown parameter uu will appear in the definition of the initial condition z0=z0(u)z_{0}=z_{0}(u) or the right-hand side f(z(t))=f(z(t);u)f(z(t))=f(z(t);u), resulting in the parameter-dependent solution (z(t;u))t[0,T](z(t;u))_{t\in[0,T]}. Define the solution operator

S:𝒰C([0,T];d),uS(u)(z(t;u))t[0,T],S\colon\mathcal{U}\to C([0,T];\mathbb{R}^{d}),\quad u\mapsto S(u)\coloneqq(z(t;u))_{t\in[0,T]}, (5.2)

where (z(t;u))t[0,T](z(t;u))_{t\in[0,T]} solves (5.1). We equip C([0,T];d)C([0,T];\mathbb{R}^{d}) with the supremum norm.

For notational convenience, we will for the majority of this section not indicate the dependence of z0z_{0} or ff on uu. We will, however, explicitly track the dependence on z0z_{0} and ff of the error analysis below.

Let Ft:ddF^{t}\colon\mathbb{R}^{d}\to\mathbb{R}^{d} be the flow map associated to the initial value problem (5.1), i.e. Ft(z0)z(t;u)=S(u)(t)F^{t}(z_{0})\coloneqq z(t;u)=S(u)(t). Fix a time step τ>0\tau>0 such that NT/τN\coloneqq T/\tau\in\mathbb{N}, and a time grid

tkkτ for k[N]{0,1,,N}.t_{k}\coloneqq k\tau\text{ for }k\in[N]\coloneqq\{0,1,\dotsc,N\}. (5.3)

We denote by zkz(tk)Fτ(zk1)z_{k}\coloneqq z(t_{k})\equiv F^{\tau}(z_{k-1}) the value of the exact solution to (5.1) at time tkt_{k}. We shall sometimes abuse notation and write [N]={0,1,,N1}[N]=\{0,1,\dotsc,N-1\} or [N]={1,2,,N}[N]=\{1,2,\dotsc,N\}.

To a single-step numerical integration method (e.g. a Runge–Kutta method of some order) we shall associate a numerical flow map Ψτ:dd\Psi^{\tau}\colon\mathbb{R}^{d}\to\mathbb{R}^{d}. The numerical flow map approximates the sequence (zk)k[N](z_{k})_{k\in[N]} by a sequence (Zk)k[N](Z^{\prime}_{k})_{k\in[N]}, where ZkΨτ(Zk1)Z^{\prime}_{k}\coloneqq\Psi^{\tau}(Z^{\prime}_{k-1}). A fundamental task in numerical analysis is to determine sufficient conditions for convergence of the sequence (Zk)k[N](Z^{\prime}_{k})_{k\in[N]} to (zk)k[N](z_{k})_{k\in[N]}. The investigations of Conrad et al. (2016) and Lie et al. (2017) concern a similar task in the context of uncertainty quantification. Given τ>0\tau>0, consider a collection (ξk)k[N](\xi_{k})_{k\in[N]} of stochastic processes ξk:Ω×[0,τ]d\xi_{k}\colon\Omega\times[0,\tau]\to\mathbb{R}^{d} having almost-surely continuous paths. Define a stochastic process (Zt)t[0,T](Z_{t})_{t\in[0,T]} in terms of a new randomised integrator

Z(tk+1;u)Ψτ(Z(tk;u))+ξk(τ).Z(t_{k+1};u)\coloneqq\Psi^{\tau}(Z(t_{k};u))+\xi_{k}(\tau). (5.4)

The stochastic processes (ξk)k[N](\xi_{k})_{k\in[N]} are intended to capture the effect of uncertainties, e.g. those that arise due to properties of the vector field that are not resolved by the time grid (5.3) associated to the time step τ\tau. We extend the definition (5.4) to continuous time via

Z(t;u)Ψttk(Z(tk;u))+ξk(ttk),for tk<t<tk+1.Z(t;u)\coloneqq\Psi^{t-t_{k}}(Z(t_{k};u))+\xi_{k}(t-t_{k}),\quad\text{for }t_{k}<t<t_{k+1}. (5.5)

We shall use the (ξk)k[N](\xi_{k})_{k\in[N]} to construct our random approximations to Φ\Phi. Note therefore that, in order to be consistent with our assumption (see the third paragraph of Section 3) that the randomness in the approximation of Φ\Phi is independent of the randomness in the parameter uu being inferred, we shall assume that the (ξk)k[N](\xi_{k})_{k\in[N]} do not depend on the parameter uu. However, the map Ψτ\Psi^{\tau} does depend on the parameter u𝒰u\in\mathcal{U}, because Ψτ\Psi^{\tau} involves the vector field f(;u)f(\hbox to5.71527pt{\hss$\cdot$\hss};u).
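A minimal sketch of the randomised integrator (5.4), assuming an explicit-Euler numerical flow Ψτ(z)=z+τf(z;u)\Psi^{\tau}(z)=z+\tau f(z;u) and i.i.d. Gaussian ξk(τ)\xi_{k}(\tau) of size proportional to τp+1/2\tau^{p+1/2} (an illustrative choice in the spirit of Assumption 5.3 below; the vector field and all constants here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

def f(z, u):
    return -u * z                    # hypothetical scalar test vector field

def randomised_euler(u, z0, T, N, p=1.5, scale=0.1):
    # One sample path of (5.4): Z_{k+1} = Psi^tau(Z_k) + xi_k(tau),
    # with Psi^tau(z) = z + tau * f(z; u) and Gaussian xi_k of size tau^{p+1/2}.
    tau = T / N
    Z = np.empty(N + 1)
    Z[0] = z0
    for k in range(N):
        xi_k = scale * tau ** (p + 0.5) * rng.normal()
        Z[k + 1] = Z[k] + tau * f(Z[k], u) + xi_k
    return Z

u, z0, T = 1.0, 1.0, 1.0
Z = randomised_euler(u, z0, T, N=200)
err = abs(Z[-1] - z0 * np.exp(-u * T))   # error at the final time
```

Each call produces one sample path; repeated calls with fresh randomness sample from the law of (Z(t;u))t[0,T](Z(t;u))_{t\in[0,T]}.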

Define the random solution operator associated to the randomised integrator (5.5):

SN:𝒰C([0,T];d),uSN(u)(Z(t;u))t[0,T],S_{N}\colon\mathcal{U}\to C([0,T];\mathbb{R}^{d}),\quad u\mapsto S_{N}(u)\coloneqq(Z(t;u))_{t\in[0,T]}, (5.6)

where (Z(t;u))t[0,T](Z(t;u))_{t\in[0,T]} satisfies (5.5), and is almost surely continuous.

Let TJ[0,T]T_{J}\subset[0,T] be a strictly increasing sequence of time points, indexed by a finite, nonempty index set JJ with cardinality |J||J|\in\mathbb{N}. Note that TJT_{J} may coincide with the time grid defined in (5.3); however, to increase the scope of the subsequent analysis, we allow TJT_{J} to differ from (5.3). Let 𝒴d|J|\mathcal{Y}\coloneqq\mathbb{R}^{d|J|}, and equip it with the topology induced by the standard Euclidean inner product. Define the observation operator

O:C([0,T];d)𝒴,z~O(z~)(z~(tj))tjTJ,O\colon C([0,T];\mathbb{R}^{d})\to\mathcal{Y},\quad\tilde{z}\mapsto O\left(\tilde{z}\right)\coloneqq(\tilde{z}(t_{j}))_{t_{j}\in T_{J}}, (5.7)

which projects some z~C([0,T];d)\tilde{z}\in C([0,T];\mathbb{R}^{d}) to a finite-dimensional vector in 𝒴\mathcal{Y} constructed by stacking the d\mathbb{R}^{d}-valued vectors that result from evaluating z~\tilde{z} at the time points in TJT_{J}. We take the norm on 𝒴\mathcal{Y} to be 2d|J|\|\hbox to5.71527pt{\hss$\cdot$\hss}\|_{\ell^{d|J|}_{2}}.

Given the operators SS, OO, and SNS_{N} defined in (5.2), (5.7), and (5.6), we define the forward operators G,GN:𝒰𝒴G,G_{N}\colon\mathcal{U}\to\mathcal{Y} by

GOS,GNOSN.G\coloneqq O\circ S,\quad G_{N}\coloneqq O\circ S_{N}. (5.8)

The associated likelihoods are the quadratic misfits given by (3.8) with some fixed, positive-definite matrix Γ\Gamma.

We define the continuous-time error process by

e(t;u)z(t;u)Z(t;u),0tT.e(t;u)\coloneqq z(t;u)-Z(t;u),\quad 0\leq t\leq T. (5.9)

Since every observation time in TJT_{J} lies in [0,T][0,T], it follows that

GN(u)G(u)|J|sup0tTe(t;u)2d.\|G_{N}(u)-G(u)\|\leq|J|\sup_{0\leq t\leq T}\|e(t;u)\|_{\ell^{d}_{2}}. (5.10)

This completes our formulation of the probabilistic numerical integration of the ODE (5.1) as a random likelihood model of the type considered in Section 3.

5.1 Convergence in continuous time for Lipschitz flows

In this section, we quote some assumptions and results from Lie et al. (2017). The vector field ff in (5.1) induces a flow Fτ:ddF^{\tau}\colon\mathbb{R}^{d}\to\mathbb{R}^{d} by

Fτ(a)=a+0τf(Ft(a))dt.F^{\tau}(a)=a+\int_{0}^{\tau}f(F^{t}(a))\,\mathrm{d}t. (5.11)
Assumption 5.1 (Assumption 3.1, Lie et al. (2017)).

The vector field ff admits 0<τ10<\tau^{\ast}\leq 1 and CF1C_{F}\geq 1, such that for 0<τ<τ0<\tau<\tau^{\ast}, the flow Fτ:ddF^{\tau}\colon\mathbb{R}^{d}\to\mathbb{R}^{d} defined by (5.11) is globally Lipschitz, with

Fτ(z0)Fτ(v0)(1+CFτ)z0v0,for all z0,v0d.\|F^{\tau}(z_{0})-F^{\tau}(v_{0})\|\leq(1+C_{F}\tau)\|z_{0}-v_{0}\|,\quad\text{for all $z_{0},v_{0}\in\mathbb{R}^{d}$.}

A globally Lipschitz vector field ff in (5.1) yields a flow map FtF^{t} that satisfies Assumption 5.1, but vector fields that satisfy only a one-sided Lipschitz condition have this property as well. Such vector fields have been studied in the numerical analysis literature, for both ordinary and stochastic differential equations, over the last four decades; see, e.g. (Butcher, 1975), and the references cited in Section 3.1 of (Higham et al., 2002).

Recall that Ψτ:dd\Psi^{\tau}\colon\mathbb{R}^{d}\to\mathbb{R}^{d} represents the numerical method that we use to integrate (5.1).

Assumption 5.2 (Assumption 3.2, Lie et al. (2017)).

The numerical method Ψτ\Psi^{\tau} has uniform local truncation error of order q+1q+1: for some constant CΨ1C_{\Psi}\geq 1 that does not depend on τ\tau,

supvdΨτ(v)Fτ(v)CΨτq+1.\sup_{v\in\mathbb{R}^{d}}\|\Psi^{\tau}(v)-F^{\tau}(v)\|\leq C_{\Psi}\tau^{q+1}.

The assumption above is satisfied by both single-step and multistep numerical methods applied to vector fields in Cq(d)C^{q}(\mathbb{R}^{d}), provided that the qthq^{\text{th}} derivatives are bounded; see Section III.2 of (Hairer et al., 2009). We emphasise that the above assumption is made to simplify the analysis, and that the convergence results below extend to the case where the uniform bound does not hold; see Section 4 of (Lie et al., 2017).

Now recall the collection (ξk(τ))k[N](\xi_{k}(\tau))_{k\in[N]} of random variables, where ξk(τ)\xi_{k}(\tau) is used in (5.4).

Assumption 5.3 (Assumption 5.1, Lie et al. (2017)).

The stochastic processes (ξk)k(\xi_{k})_{k\in\mathbb{N}} admit p1p\geq 1, R{+}R\in\mathbb{N}\cup\{+\infty\}, and Cξ,R1C_{\xi,R}\geq 1, independent of kk and τ\tau, such that for all 1rR1\leq r\leq R and all kk\in\mathbb{N},

𝔼νN[sup0<tT/Nξk(t)r](Cξ,R(TN)p+1/2)r.\mathbb{E}_{\nu_{N}}\left[\sup_{0<t\leq T/N}\|\xi_{k}(t)\|^{r}\right]\leq\left(C_{\xi,R}\left(\frac{T}{N}\right)^{p+1/2}\right)^{r}.

(In this section, νN\nu_{N} plays the role of the distribution of the ξk\xi_{k}’s.) The assumption above quantifies the regularity of the (ξk)k[N](\xi_{k})_{k\in[N]} by specifying how many moments each ξk(t)\xi_{k}(t) has and how quickly these decay with τ=T/N\tau=T/N. We do not require the (ξk(t))k[N](\xi_{k}(t))_{k\in[N]} to have zero mean, to be independent, or to be identically distributed. It is shown in Section 5.2 of (Lie et al., 2017) that, for example, the integrated Brownian motion process satisfies Assumption 5.3. The integrated Brownian motion process has been used as a state-independent model of the uncertainty in the off-grid behaviour of solutions to ODEs in (Conrad et al., 2016; Schober et al., 2014; Chkrebtii et al., 2016).

We now consider the following convergence theorem:

Theorem 5.4 (Theorem 5.2, Lie et al. (2017)).

Suppose that e0=0e_{0}=0, and suppose that Assumptions 5.1, 5.2, and 5.3 hold with parameters τ\tau^{\ast}, CFC_{F}, CΨC_{\Psi}, qq, Cξ,RC_{\xi,R}, pp, and RR. Let nn\in\mathbb{N}, with nRn\leq R. Then, for all T/τ<NT/\tau^{\ast}<N,

𝔼νN[sup0tTe(t;u)n]3n1((1+CFτ)nC¯+CΨn(τ)n+TCξ,Rn)(TN)n(q(p1/2)),\displaystyle\mathbb{E}_{\nu_{N}}\left[\sup_{0\leq t\leq T}\|e(t;u)\|^{n}\right]\leq 3^{n-1}\left(\left(1+C_{F}\tau^{\ast}\right)^{n}\overline{C}+C_{\Psi}^{n}(\tau^{\ast})^{n}+TC^{n}_{\xi,R}\right)\left(\frac{T}{N}\right)^{n(q\wedge(p-1/2))}, (5.12)

where

C¯\displaystyle\overline{C} 2Tmax{(4CΨ)n,(2Cξ,R)n}exp(TCF(n,τ))\displaystyle\coloneqq 2T\max\{(4C_{\Psi})^{n},(2C_{\xi,R})^{n}\}\exp\left(TC_{F}(n,\tau^{\ast})\right)
CF(n,τ)\displaystyle C_{F}(n,\tau^{\ast}) [(1+τ2n1)2(1+τCF)n1](τ)1.\displaystyle\coloneqq\left[(1+\tau^{\ast}2^{n-1})^{2}(1+\tau^{\ast}C_{F})^{n}-1\right](\tau^{\ast})^{-1}.

Note that the scalars C¯\overline{C} and CF(n,τ)C_{F}(n,\tau^{\ast}) depend on u𝒰u\in\mathcal{U}, since CFC_{F} and CΨC_{\Psi} depend on the vector field ff, which in turn depends on the parameter uu.

Recall that the random variable (Z(t;u))0tT(Z(t;u))_{0\leq t\leq T} defined in (5.5) is a random surrogate for the true solution of the ODE (5.1). The law of (Z(t;u))0tT(Z(t;u))_{0\leq t\leq T} is thus a probability measure on the space of continuous paths defined on the interval [0,T][0,T]. With this in mind, the interpretation of Theorem 5.4 is that the law of (Z(t;u))0tT(Z(t;u))_{0\leq t\leq T} contracts to the Dirac distribution located at the true solution (z(t;u))0tT(z(t;u))_{0\leq t\leq T} of (5.1), as the spacing T/NT/N in the time grid (5.3) decreases to zero. Equivalently, given the true solution operator SS and its random counterpart SNS_{N}, Theorem 5.4 implies that the random solution operator converges in the LnL^{n} topology to the true solution operator. Thus, Theorem 5.4 guarantees that by refining the time grid, one reduces the uncertainty over the solution of (5.1). This is a desirable feature for uncertainty quantification, since estimates of the solution uncertainty can also be fed forward to obtain estimates of the uncertainty of functionals of the solution, and since the probabilistic description allows for a more nuanced description of the uncertainty compared to the usual worst-case description that is common in the numerical analysis of deterministic methods.
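The contraction described above can be observed on a toy problem. The following sketch is illustrative only: it uses a hypothetical scalar linear vector field, the explicit Euler flow (so q=1q=1), and Gaussian perturbations with p=3/2p=3/2, for which the predicted rate from (5.12) with n=1n=1 is N(q(p1/2))=N1N^{-(q\wedge(p-1/2))}=N^{-1}.

```python
import numpy as np

rng = np.random.default_rng(5)
u, z0, T = 1.0, 1.0, 1.0
p, scale = 1.5, 1.0

def sup_error(N):
    # sup_t |e(t)| along one sample path of the randomised Euler scheme (5.4).
    tau = T / N
    t = np.linspace(0.0, T, N + 1)
    Z = np.empty(N + 1)
    Z[0] = z0
    for k in range(N):
        xi_k = scale * tau ** (p + 0.5) * rng.normal()
        Z[k + 1] = Z[k] - tau * u * Z[k] + xi_k     # Euler flow for f(z) = -u z
    return np.max(np.abs(Z - z0 * np.exp(-u * t)))  # compare with the exact flow

def mean_sup_error(N, M=200):
    # Monte Carlo estimate of E[sup_t |e(t)|] over M sample paths.
    return np.mean([sup_error(N) for _ in range(M)])

errs = {N: mean_sup_error(N) for N in (20, 80, 320)}
# Halving tau four times should reduce the mean sup-error roughly 16-fold.
```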

Corollary 5.5 (Corollary 5.3, Lie et al. (2017)).

Fix nn\in\mathbb{N}. Suppose that Assumptions 5.1 and 5.2 hold, and that Assumption 5.3 holds with R=+R=+\infty and p1/2p\geq 1/2. Then, for all 0<τ<τ0<\tau<\tau^{\ast},

𝔼νN[exp(ρsup0tTe(t)n)]<,for all ρ.\mathbb{E}_{\nu_{N}}\left[\exp\left(\rho\sup_{0\leq t\leq T}\|e(t)\|^{n}\right)\right]<\infty,\quad\text{for all }\rho\in\mathbb{R}. (5.13)

Since the exponential integrability of a random variable is related to the exponential concentration of its values about its mean or median, the above result shows that strong assumptions on the model of uncertainty translate to strong conclusions about the behaviour of the corresponding error. In the context of random approximations of BIPs, we shall use Corollary 5.5 in order to establish the convergence of the random approximations in the Hellinger sense.

5.2 Effect of probabilistic integration on Bayesian posterior distribution

Define the approximate posteriors μNM\mu^{\textup{M}}_{N} and μNS\mu^{\textup{S}}_{N} according to (3.2) and (3.1), using the quadratic misfits Φ\Phi and ΦN\Phi_{N} from (3.8) and the forward models G=OSG=O\circ S and GN=OSNG_{N}=O\circ S_{N} given in (5.2) and (5.6) respectively, where OO denotes the observation operator associated to a fixed, finite sequence TJT_{J} of observation times in [0,T][0,T].

As we saw in the last section, the results of Lie et al. (2017) guarantee convergence in the LnL^{n} topology of the random solution operator SNS_{N} to the true solution operator SS. It is of interest to determine whether one can use this result to guarantee that one can perform inference over uu using the probabilistic integrator. In particular, given that the probabilistic integrator provides a random approximation, it is of interest to determine whether one can obtain results that are not only reasonable, but that improve as the time resolution of the probabilistic integrator increases. The following result shows that this is indeed the case: as the time resolution increases, the random forward model GNG_{N} yields a random posterior over the parameter space that converges in the Hellinger topology to the true posterior at the expected rate.

Theorem 5.6.

Suppose that 𝒰\mathcal{U} is a compact subset of m\mathbb{R}^{m} for some mm\in\mathbb{N}, and suppose that S,SN:𝒰C([0,T];d)S,S_{N}\colon\mathcal{U}\to C([0,T];\mathbb{R}^{d}) are continuous maps. Let 2<ρ<2<\rho^{\ast}<\infty be arbitrary. Suppose that e0=0e_{0}=0, and that Assumptions 5.1, 5.2, and 5.3 hold with parameters τ\tau^{\ast}, CFC_{F}, CΨC_{\Psi}, qq, R=+R=+\infty, Cξ,RC_{\xi,R}, and pp, and that these parameters depend continuously on uu. Then, for NN\in\mathbb{N} such that T/τ<NT/\tau^{\ast}<N, the following hold:

  1. (a)

    there exists some C>0C>0 that does not depend on NN, such that (3.12) holds for r1=1r_{1}=1 and r2=2ρ/(ρ1)r_{2}=2\rho^{\ast}/(\rho^{\ast}-1), and

  2. (b)

    there exists some D>0D>0 that does not depend on NN, such that (3.13) holds for s1=2ρ/(ρ2)s_{1}=2\rho^{\ast}/(\rho^{\ast}-2) and s2=2s_{2}=2.

The parameter ρ\rho^{\ast} above plays the same role as ρ\rho^{\ast} in Theorem 3.9, case (b); ρ\rho^{\ast} quantifies the exponential decay of the random misfit ΦN\Phi_{N} with respect to νN\nu_{N}. For this reason, ρ\rho^{\ast} is constrained to the same range of values 2<ρ<2<\rho^{\ast}<\infty as given there, and determines the parameters r2r_{2} and s1s_{1} that partly describe the convergence rates in (3.12) and (3.13). As shown in the proof, ρ\rho^{\ast} plays no further role because of (5.10) and Corollary 5.5. In particular, since Assumption 5.3 holds with R=+R=+\infty, Corollary 5.5 ensures that the exponential decay parameter ρ\rho^{\ast} need not be constrained to any bounded interval.

The continuous dependence on uu of the parameters of Assumptions 5.1, 5.2 and 5.3 also allows for parameters that do not depend on uu, e.g. R=+R=+\infty. The assumed continuous dependence of the parameters on uu, together with the assumptions on 𝒰\mathcal{U}, ensures that the map u𝔼νN[exp(ρΦN(u))]u\mapsto\mathbb{E}_{\nu_{N}}[\exp(\rho^{\ast}\Phi_{N}(u))] is uniformly bounded by a scalar that depends only on 𝒰\mathcal{U}; it follows that the exponential integrability hypothesis on ΦN\Phi_{N} of Theorem 3.9(b) holds, and we can apply the corresponding conclusions. While these assumptions may appear strong, they simplify the analysis considerably and are therefore not uncommon in the literature on parameter inference for dynamical systems. We leave the investigation of weaker assumptions for future work.
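To make the setting concrete, the following sketch implements one simple instance of a random forward model GNG_{N}: an explicit Euler method in which every step is perturbed by an independent Gaussian draw of size O(h3/2)O(h^{3/2}), in the spirit of the randomised integrators of Conrad et al. (2016) and Lie et al. (2017). The logistic right-hand side, the parameter value, and the perturbation scale are illustrative choices only, not part of the theory above.

```python
import numpy as np

def randomised_euler(f, u, y0, T, N, p=1.5, rng=None):
    """Explicit Euler for x' = f(x, u), with an independent Gaussian
    perturbation of standard deviation h**p added at every step, so that
    each call returns one draw of the random solution S_N(u)."""
    rng = np.random.default_rng() if rng is None else rng
    h = T / N
    x = np.empty(N + 1)
    x[0] = y0
    for n in range(N):
        x[n + 1] = x[n] + h * f(x[n], u) + rng.normal(scale=h**p)
    return x

# Toy forward model: terminal value of logistic growth with unknown rate u.
f = lambda x, u: u * x * (1.0 - x)
rng = np.random.default_rng(42)
draws = [randomised_euler(f, u=2.0, y0=0.1, T=1.0, N=100, rng=rng)[-1]
         for _ in range(200)]
print(np.mean(draws), np.std(draws))   # draws cluster near the true value x(1)
```

Repeated calls return different trajectories, so evaluating the data misfit along such a trajectory gives a random log-likelihood ΦN\Phi_{N}; as NN grows, the spread of the solver output shrinks, which is the mechanism behind the Hellinger convergence asserted in Theorem 5.6.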

6 Concluding remarks

In this paper we have considered the impact upon a BIP of replacing the log-likelihood function Φ\Phi by a random function ΦN\Phi_{N}. Such approximations occur for example when a cheap stochastic emulator is used in place of an expensive exact log-likelihood, or when a probabilistic solver is used to simulate the forward model.

Our results show that such approximations are well-posed, with the approximate Bayesian posterior distribution converging to the true Bayesian posterior as the error between Φ\Phi and ΦN\Phi_{N}, measured in a suitable sense, goes to zero. More precisely, we have shown that the convergence rate of the random log-likelihood ΦN\Phi_{N} to Φ\Phi — as assessed in a nested LpL^{p} norm with respect to the distribution νN\nu_{N} of ΦN\Phi_{N} and the Bayesian prior distribution μ0\mu_{0} of the unknown uu — transfers to convergence of two natural approximations to the exact Bayesian posterior μ\mu, namely (a) the randomised posterior measure μNS\mu_{N}^{\textup{S}} that simply has ΦN\Phi_{N} in place of Φ\Phi, and (b) the deterministic pseudo-marginal posterior measure μNM\mu_{N}^{\textup{M}}, in which the likelihood function and marginal likelihood of μNS\mu_{N}^{\textup{S}} are individually averaged with respect to νN\nu_{N}.

Since the hypotheses that are required for these results operate directly at the level of finite-order moments of the error ΦNΦ\Phi_{N}-\Phi, the convergence results in this paper automatically apply to GP approximations, as previously considered by Stuart and Teckentrup (2017), which have moments of all orders. However, in a substantial generalisation, Theorems 3.1 and 3.2 show that, in the L2L^{2} case,

dH(μ,μNM)\displaystyle d_{\textup{H}}\bigl{(}\mu,\mu_{N}^{\textup{M}}\bigr{)} C𝔼νN[|ΦΦN|]Lμ02(𝒰),\displaystyle\leq C\;\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|\bigr{]}\right\|_{L^{2}_{\mu_{0}}(\mathcal{U})},
𝔼νN[dH(μ,μNS)2]1/2\displaystyle\mathbb{E}_{\nu_{N}}\left[d_{\textup{H}}\bigl{(}\mu,\mu_{N}^{\textup{S}}\bigr{)}^{2}\right]^{1/2} C𝔼νN[|ΦΦN|2]1/2Lμ02(𝒰),\displaystyle\leq C\;\left\|\mathbb{E}_{\nu_{N}}\left[|\Phi-\Phi_{N}|^{2}\right]^{1/2}\right\|_{L^{2}_{\mu_{0}}(\mathcal{U})},

for general approximations ΦN\Phi_{N}. These optimal bounds require that the random misfit ΦN\Phi_{N} allows pointwise bounds on exp(ΦN)\exp(-\Phi_{N}) and ZNSZ_{N}^{\textup{S}} with respect to the distribution νN\nu_{N} on ΦN\Phi_{N} and the Bayesian prior μ0\mu_{0} on the unknown uu. If the distribution of ΦN\Phi_{N} does not allow pointwise (LL^{\infty}) bounds on exp(ΦN)\exp(-\Phi_{N}) and ZNSZ_{N}^{\textup{S}}, but only bounds in LrL^{r} for some 1r<1\leq r<\infty, then the norms in the bounds above need to be strengthened to higher-order Lμ0q(𝒰)L^{q}_{\mu_{0}}(\mathcal{U}) norms and/or higher-order moments of the error ΦΦN\Phi-\Phi_{N}, resulting in the quantity 𝔼νN[|ΦΦN|p]1/pLμ0q(𝒰)\left\|\mathbb{E}_{\nu_{N}}\left[|\Phi-\Phi_{N}|^{p}\right]^{1/p}\right\|_{L^{q}_{\mu_{0}}(\mathcal{U})} appearing on the right-hand sides, for some q2q\geq 2 and p1p\geq 1 (respectively p2p\geq 2).
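These bounds are straightforward to observe numerically. The following sketch discretises a one-dimensional toy problem on a grid, perturbs the exact misfit by independent noise of size N1/2N^{-1/2}, and estimates 𝔼νN[dH(μ,μNS)2]1/2\mathbb{E}_{\nu_{N}}[d_{\textup{H}}(\mu,\mu_{N}^{\textup{S}})^{2}]^{1/2}. The Gaussian prior, quadratic misfit, and perturbation model are all illustrative assumptions, chosen so that 𝔼νN[|ΦΦN|2]1/2=N1/2\mathbb{E}_{\nu_{N}}[|\Phi-\Phi_{N}|^{2}]^{1/2}=N^{-1/2} exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
u = np.linspace(-5.0, 5.0, 2001)          # grid on the parameter space
du = u[1] - u[0]
prior = np.exp(-0.5 * u**2)
prior /= prior.sum() * du                 # standard Gaussian prior density

Phi = 0.5 * (u - 1.0)**2                  # exact misfit (illustrative)

def posterior(misfit):
    dens = np.exp(-misfit) * prior
    return dens / (dens.sum() * du)       # normalised posterior density

def hellinger(p, q):
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q))**2) * du)

mu = posterior(Phi)

def rms_hellinger(N, n_samples=100):
    """Monte Carlo estimate of E_{nu_N}[d_H(mu, mu_N^S)^2]^{1/2} for the
    random misfit Phi_N = Phi + N^{-1/2} * xi, with xi ~ N(0, 1) drawn
    independently at each grid point."""
    d2 = [hellinger(mu, posterior(Phi + rng.normal(size=u.size) / np.sqrt(N)))**2
          for _ in range(n_samples)]
    return np.sqrt(np.mean(d2))

e10, e1000 = rms_hellinger(10), rms_hellinger(1000)
print(e10, e1000, e10 / e1000)
```

Increasing NN by a factor of 100 reduces the estimated Hellinger error by roughly a factor of 10, matching the predicted O(N1/2)O(N^{-1/2}) rate.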

Our error bounds are explicit in the sense that the aforementioned exponents pp and qq can typically be calculated explicitly given the structure of ΦN\Phi_{N}. This is the case for the GP emulators considered in Stuart and Teckentrup (2017), and also for the randomised misfit models and probabilistic numerical solvers considered here. The constant CC in the error bounds, on the other hand, is typically not computable in advance; it involves quantities such as the normalising constants ZZ and 𝔼[ZNS]\mathbb{E}[Z_{N}^{\textup{S}}], which for most forward models GG are not known analytically and are very expensive to compute numerically. In a sense, this is similar to the everyday situation of using an ODE or PDE solver of known convergence order but unknown constant prefactor.

A significant open question in this work is the one highlighted at the end of Section 4: in contrast to randomised dimension reduction using bounded random variables, is the case of Gaussian randomly-projected misfits one in which the MAP problem and BIP genuinely have different convergence properties?

Acknowledgements

ALT is partially supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. HCL and TJS are partially supported by the Freie Universität Berlin within the Excellence Initiative of the German Research Foundation (DFG). This work was partially supported by the DFG through grant CRC 1114 “Scaling Cascades in Complex Systems”, and by the National Science Foundation (NSF) under grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute (SAMSI) and SAMSI’s QMC Working Group II “Probabilistic Numerics”. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the above-named funding agencies and institutions.

References

  • Andrieu and Roberts [2009] C. Andrieu and G. O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist., 37(2):697–725, 2009. 10.1214/07-AOS574.
  • Beaumont [2003] M. A. Beaumont. Estimation of population growth or decline in genetically monitored populations. Genetics, 164(3):1139–1160, 2003. URL http://www.genetics.org/content/164/3/1139.
  • Bogachev [2007] V. I. Bogachev. Measure Theory. Vol. I, II. Springer-Verlag, Berlin, 2007. 10.1007/978-3-540-34514-5.
  • Butcher [1975] J. C. Butcher. A stability property of implicit Runge–Kutta methods. BIT Num. Math., 15(4):358–361, Dec 1975. 10.1007/BF01931672.
  • Chkrebtii et al. [2016] O. A. Chkrebtii, D. A. Campbell, B. Calderhead, and M. A. Girolami. Bayesian solution uncertainty quantification for differential equations. Bayesian Anal., 11(4):1239–1267, 2016. 10.1214/16-BA1017.
  • Conrad et al. [2016] P. R. Conrad, M. Girolami, S. Särkkä, A. M. Stuart, and K. C. Zygalakis. Statistical analysis of differential equations: introducing probability measures on numerical solutions. Stat. Comput., 27(4):1065–1082, 2016. 10.1007/s11222-016-9671-0.
  • Cotter et al. [2013] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White. MCMC methods for functions: modifying old algorithms to make them faster. Statist. Sci., 28(3):424–446, 2013. 10.1214/13-STS421.
  • Dashti and Stuart [2016] M. Dashti and A. M. Stuart. The Bayesian approach to inverse problems. In R. Ghanem, D. Higdon, and H. Owhadi, editors, Handbook of Uncertainty Quantification, pages 311–428. Springer, 2016. 10.1007/978-3-319-11259-6_7-1.
  • Dashti et al. [2012] M. Dashti, S. Harris, and A. M. Stuart. Besov priors for Bayesian inverse problems. Inverse Probl. Imaging, 6(2):183–200, 2012. 10.3934/ipi.2012.6.183.
  • Diaconis [1988] P. Diaconis. Bayesian numerical analysis. In Statistical Decision Theory and Related Topics, IV, Vol. 1 (West Lafayette, Ind., 1986), pages 163–175. Springer, New York, 1988.
  • Dupuis and Ellis [1997] P. Dupuis and R. S. Ellis. A Weak Convergence Approach to the Theory of Large Deviations. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, 1997. 10.1002/9781118165904.
  • Evans and Stark [2002] S. N. Evans and P. B. Stark. Inverse problems as statistics. Inverse Probl., 18(4):R55–R97, 2002. 10.1088/0266-5611/18/4/201.
  • Hairer et al. [2009] E. Hairer, S. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems, volume 8 of Springer Series in Computational Mathematics. Springer-Verlag, New York, 2009. 10.1007/978-3-540-78862-1.
  • Hennig et al. [2015] P. Hennig, M. A. Osborne, and M. Girolami. Probabilistic numerics and uncertainty in computations. Proc. A., 471(2179):20150142, 17, 2015. 10.1098/rspa.2015.0142.
  • Higham et al. [2002] D. J. Higham, X. Mao, and A. M. Stuart. Strong convergence of Euler-type methods for nonlinear stochastic differential equations. SIAM J. Numer. Anal., 40(3):1041–1063, 2002. 10.1137/S0036142901389530.
  • Hosseini [2017] B. Hosseini. Well-posed Bayesian inverse problems with infinitely-divisible and heavy-tailed prior measures. SIAM/ASA J. Uncertain. Quantif., 5(1):1024–1060, 2017. 10.1137/16M1096372.
  • Hosseini and Nigam [2017] B. Hosseini and N. Nigam. Well-posed Bayesian inverse problems: priors with exponential tails. SIAM/ASA J. Uncertain. Quantif., 5(1):436–465, 2017. 10.1137/16M1076824.
  • Jordan and Kinderlehrer [1996] R. Jordan and D. Kinderlehrer. An extended variational principle. In Partial Differential Equations and Applications, volume 177 of Lecture Notes in Pure and Appl. Math., pages 187–200. Dekker, New York, 1996. 10.5006/1.3292113.
  • Kaipio and Somersalo [2005] J. Kaipio and E. Somersalo. Statistical and Computational Inverse Problems, volume 160 of Applied Mathematical Sciences. Springer-Verlag, New York, 2005. 10.1007/b138659.
  • Kraft [1955] C. H. Kraft. Some conditions for consistency and uniform consistency of statistical procedures. Univ. California Publ. Statist., 2:125–141, 1955.
  • Lassas and Siltanen [2004] M. Lassas and S. Siltanen. Can one use total variation prior for edge-preserving Bayesian inversion? Inverse Probl., 20(5):1537–1563, 2004. 10.1088/0266-5611/20/5/013.
  • Law et al. [2015] K. Law, A. Stuart, and K. Zygalakis. Data Assimilation: A Mathematical Introduction, volume 62 of Texts in Applied Mathematics. Springer, 2015. 10.1007/978-3-319-20325-6.
  • Le et al. [2017] E. B. Le, A. Myers, T. Bui-Thanh, and Q. P. Nguyen. A data-scalable randomized misfit approach for solving large-scale PDE-constrained inverse problems. Inverse Probl., 33(6):065003, 2017. 10.1088/1361-6420/aa6cbd.
  • Lie et al. [2017] H. C. Lie, A. M. Stuart, and T. J. Sullivan. Strong convergence rates of probabilistic integrators for ordinary differential equations, 2017. arXiv:1703.03680.
  • Nemirovski et al. [2008] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. Optim., 19(4):1574–1609, 2008. 10.1137/070704277.
  • Ohta and Takatsu [2011] S. Ohta and A. Takatsu. Displacement convexity of generalized relative entropies. Adv. Math., 228(3):1742–1787, 2011. 10.1016/j.aim.2011.06.029.
  • Owhadi and Scovel [2017] H. Owhadi and C. Scovel. Qualitative robustness in Bayesian inference. ESAIM Probab. Stat., 21:251–274, 2017. 10.1051/ps/2017014.
  • Owhadi et al. [2015] H. Owhadi, C. Scovel, and T. J. Sullivan. Brittleness of Bayesian inference under finite information in a continuous world. Electron. J. Stat., 9(1):1–79, 2015. 10.1214/15-EJS989.
  • Pinsker [1964] M. S. Pinsker. Information and Information Stability of Random Variables and Processes. Holden-Day, Inc., San Francisco, Calif.-London-Amsterdam, 1964.
  • Reich and Cotter [2015] S. Reich and C. Cotter. Probabilistic Forecasting and Bayesian Data Assimilation. Cambridge University Press, New York, 2015. 10.1017/CBO9781107706804.
  • Robert and Casella [1999] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer-Verlag, New York, 1999. 10.1007/978-1-4757-3071-5.
  • Schober et al. [2014] M. Schober, D. K. Duvenaud, and P. Hennig. Probabilistic ODE solvers with Runge–Kutta means. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 739–747. Curran Associates, Inc., 2014. https://papers.nips.cc/paper/5451-probabilistic-ode-solvers-with-runge-kutta-means.
  • Shapiro et al. [2009] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory, volume 9 of MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Programming Society (MPS), Philadelphia, PA, 2009. 10.1137/1.9780898718751.
  • Skilling [1992] J. Skilling. Bayesian solution of ordinary differential equations. In C. R. Smith, G. J. Erickson, and P. O. Neudorfer, editors, Maximum Entropy and Bayesian Methods, volume 50 of Fundamental Theories of Physics, pages 23–37. Springer, 1992. 10.1007/978-94-017-2219-3.
  • Stuart [2010] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta Numer., 19:451–559, 2010. 10.1017/S0962492910000061.
  • Stuart and Teckentrup [2017] A. M. Stuart and A. L. Teckentrup. Posterior consistency for Gaussian process approximations of Bayesian posterior distributions. Math. Comput., 2017. 10.1090/mcom/3244.
  • Sullivan [2017] T. J. Sullivan. Well-posed Bayesian inverse problems and heavy-tailed stable quasi-Banach space priors. Inverse Probl. Imaging, 11(5):857–874, 2017. 10.3934/ipi.2017040.

Appendix: Proofs of Results

The proofs in this section will make repeated use of the following inequalities, where (A.1) and (A.3) hold for all real aa and bb, and (A.2) holds for non-negative aa and bb with a+b>0a+b>0:

(ab)2\displaystyle(a-b)^{2} 2a2+2b2,\displaystyle\leq 2a^{2}+2b^{2}, (A.1)
(ab)2=(a2b2a+b)2\displaystyle(a-b)^{2}=\left(\frac{a^{2}-b^{2}}{a+b}\right)^{2} (a2b2)2a2+b2,\displaystyle\leq\frac{(a^{2}-b^{2})^{2}}{a^{2}+b^{2}}, (A.2)
|exp(a)exp(b)|\displaystyle|\exp(a)-\exp(b)| (exp(a)+exp(b))|ab|,\displaystyle\leq(\exp(a)+\exp(b))|a-b|, (A.3)
[(a+b)ab]1\displaystyle[(a+b)ab]^{-1} max{a3,b3}\displaystyle\leq\max\{a^{-3},b^{-3}\} for a,b>0a,b>0. (A.4)

We also have, for arbitrary NN\in\mathbb{N} and p1p\geq 1 (not necessarily integer-valued), by the triangle inequality and Jensen’s inequality,

|j=1Nsj|p\displaystyle\left|\sum^{N}_{j=1}s_{j}\right|^{p} Np(1Nj=1N|sj|)pNp1j=1N|sj|p.\displaystyle\leq N^{p}\left(\frac{1}{N}\sum^{N}_{j=1}|s_{j}|\right)^{p}\leq N^{p-1}\sum_{j=1}^{N}|s_{j}|^{p}. (A.5)
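As a quick sanity check, separate from the proofs themselves, the elementary inequalities (A.1)–(A.5) can be verified numerically on random inputs; the sampling ranges below are arbitrary, with positive aa and bb used for (A.2) and (A.4).

```python
import numpy as np

rng = np.random.default_rng(1)

for _ in range(1000):
    a, b = rng.normal(size=2)                                  # any real a, b
    assert (a - b)**2 <= 2*a**2 + 2*b**2 + 1e-9                # (A.1)
    assert abs(np.exp(a) - np.exp(b)) <= \
        (np.exp(a) + np.exp(b)) * abs(a - b) + 1e-9            # (A.3)

    a, b = rng.uniform(0.1, 10.0, size=2)                      # positive a, b
    assert (a - b)**2 <= (a**2 - b**2)**2 / (a**2 + b**2) + 1e-9   # (A.2)
    assert 1.0 / ((a + b) * a * b) <= max(a**-3, b**-3) + 1e-9     # (A.4)

    s = rng.normal(size=5)                                     # (A.5), N = 5
    p = 2.7
    assert abs(s.sum())**p <= 5**(p - 1) * np.sum(np.abs(s)**p) + 1e-9

print("all elementary inequalities hold on the sampled inputs")
```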
Proof of Theorem 3.1.

Using (2.4) and (3.1), we have

dμdμ0dμNMdμ0\displaystyle\sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\mu_{0}}}-\sqrt{\frac{\mathrm{d}\mu_{N}^{\textup{M}}}{\mathrm{d}\mu_{0}}} =exp(Φ(u))Z1/2𝔼νN[exp(ΦN(u))]𝔼νN[ZNS]1/2\displaystyle=\frac{\sqrt{\exp(-\Phi(u))}}{Z^{1/2}}-\frac{\sqrt{\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N}(u))\bigr{]}}}{\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}^{1/2}}
=exp(Φ(u))𝔼νN[exp(ΦN(u))]Z1/2\displaystyle=\frac{\sqrt{\exp(-\Phi(u))}-\sqrt{\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N}(u))\bigr{]}}}{Z^{1/2}}
+𝔼νN[exp(ΦN(u))](1Z1/21𝔼νN[ZNS]1/2).\displaystyle\phantom{=}\quad+\sqrt{\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N}(u))\bigr{]}}\left(\frac{1}{Z^{1/2}}-\frac{1}{\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}^{1/2}}\right).

Inequality (A.1) with a=Z1/2(eΦ(u)/2𝔼νN[eΦN(u)]1/2)a=Z^{-1/2}\bigl{(}e^{-\Phi(u)/2}-\mathbb{E}_{\nu_{N}}[e^{-\Phi_{N}(u)}]^{1/2}\bigr{)} and b=𝔼νN[ZNS]1/2(Z1/2𝔼νN[ZNS]1/2)b=\mathbb{E}_{\nu_{N}}[Z_{N}^{\textup{S}}]^{1/2}(Z^{-1/2}-\mathbb{E}_{\nu_{N}}[Z_{N}^{\textup{S}}]^{-1/2}) and the definition (2.1) of the Hellinger distance dHd_{\textup{H}} yield

2dH(μ,μNM)2\displaystyle 2\;d_{\textup{H}}\bigl{(}\mu,\mu_{N}^{\textup{M}}\bigr{)}^{2} =𝒰(dμdμ0(u)dμNMdμ0(u))2dμ0(u)\displaystyle=\int_{\mathcal{U}}\left(\sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\mu_{0}}}(u)-\sqrt{\frac{\mathrm{d}\mu_{N}^{\textup{M}}}{\mathrm{d}\mu_{0}}}(u)\right)^{2}\,\mathrm{d}\mu_{0}(u)
2Z(exp(Φ)𝔼νN[exp(ΦN)])2Lμ01\displaystyle\leq\frac{2}{Z}\left\|\left(\sqrt{\exp(-\Phi)}-\sqrt{\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}}\right)^{2}\right\|_{L^{1}_{\mu_{0}}}
+2𝔼νN[ZNS](Z1/2𝔼νN[ZNS]1/2)2I+II.\displaystyle\qquad+2\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}\left(Z^{-1/2}-\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}^{-1/2}\right)^{2}\eqqcolon I+II.

For the first term, we use inequality (A.2) with a=eΦ(u)/2a=e^{-\Phi(u)/2} and b=𝔼νN[exp(ΦN(u))]1/2b=\mathbb{E}_{\nu_{N}}[\exp(-\Phi_{N}(u))]^{1/2}, together with Hölder’s inequality with conjugate exponents p1p_{1} and p1p_{1}^{\prime}, to derive

Z2I\displaystyle\frac{Z}{2}I (exp(Φ)𝔼νN[exp(ΦN)])2(exp(Φ)+𝔼νN[exp(ΦN)])1Lμ01\displaystyle\leq\left\|\left(\exp(-\Phi)-\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}\right)^{2}\left(\exp(-\Phi)+\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}\right)^{-1}\right\|_{L^{1}_{\mu_{0}}}
(exp(Φ)𝔼νN[exp(ΦN)])2Lμ0p1(exp(Φ)+𝔼νN[exp(ΦN)])1Lμ0p1.\displaystyle\leq\left\|\left(\exp(-\Phi)-\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}\right)^{2}\right\|_{L^{p_{1}^{\prime}}_{\mu_{0}}}\left\|\left(\exp(-\Phi)+\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}\right)^{-1}\right\|_{L^{p_{1}}_{\mu_{0}}}. (A.6)

We estimate the second factor on the right-hand side of (A.6). Using the facts that x1/xx\mapsto 1/x is decreasing on (0,)(0,\infty), that (x+y)1min{x1,y1}(x+y)^{-1}\leq\min\{x^{-1},y^{-1}\} for all x,y>0x,y>0, and that both exp(Φ(u))\exp(-\Phi(u)) and 𝔼νN[exp(ΦN(u))]\mathbb{E}_{\nu_{N}}[\exp(-\Phi_{N}(u))] are strictly positive, we obtain

(exp(Φ)+𝔼νN[exp(ΦN)])1Lμ0p1min{exp(Φ),𝔼νN[exp(ΦN)]1}Lμ0p1.\left\|\left(\exp(-\Phi)+\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}\right)^{-1}\right\|_{L^{p_{1}}_{\mu_{0}}}\leq\left\|\min\left\{\exp(\Phi),\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}^{-1}\right\}\right\|_{L^{p_{1}}_{\mu_{0}}}.

For f,gLμ01(𝒰)f,g\in L^{1}_{\mu_{0}}(\mathcal{U}), the partition 𝒰={f<g}{fg}\mathcal{U}=\{f<g\}\uplus\{f\geq g\} and the corresponding integral inequalities on {f<g}\{f<g\} and {fg}\{f\geq g\} imply that min{f,g}Lμ01min{fLμ01,gLμ01}\|\min\{f,g\}\|_{L^{1}_{\mu_{0}}}\leq\min\{\|f\|_{L^{1}_{\mu_{0}}},\|g\|_{L^{1}_{\mu_{0}}}\}; since min{f,g}p1=min{fp1,gp1}\min\{f,g\}^{p_{1}}=\min\{f^{p_{1}},g^{p_{1}}\} for non-negative ff and gg, the same bound holds for Lμ0p1L^{p_{1}}_{\mu_{0}} norms. Hence,

min{eΦ,𝔼νN[eΦN]1}Lμ0p1min{eΦLμ0p1,𝔼νN[eΦN]1Lμ0p1}C1,\left\|\min\left\{e^{\Phi},\mathbb{E}_{\nu_{N}}\bigl{[}e^{-\Phi_{N}}\bigr{]}^{-1}\right\}\right\|_{L^{p_{1}}_{\mu_{0}}}\leq\min\left\{\|e^{\Phi}\|_{L^{p_{1}}_{\mu_{0}}},\left\|\mathbb{E}_{\nu_{N}}\bigl{[}e^{-\Phi_{N}}\bigr{]}^{-1}\right\|_{L^{p_{1}}_{\mu_{0}}}\right\}\leq C_{1}, (A.7)

where C1=C1(p1)C_{1}=C_{1}(p_{1}) is the constant specified in assumption (a). This completes our estimate for the second factor on the right-hand side of (A.6). For the first factor, the linearity of expectation, inequality (A.3), and Hölder’s inequality with conjugate exponents p2,p2p_{2},p_{2}^{\prime} with respect to νN\nu_{N} and p3,p3p_{3},p_{3}^{\prime} with respect to μ0\mu_{0} give

(exp(Φ)𝔼νN[exp(ΦN)])2Lμ0p1=𝔼νN[exp(Φ)exp(ΦN)]2Lμ0p1\displaystyle\left\|\left(\exp(-\Phi)-\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})\bigr{]}\right)^{2}\right\|_{L^{p_{1}^{\prime}}_{\mu_{0}}}=\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi)-\exp(-\Phi_{N})\bigr{]}^{2}\right\|_{L^{p_{1}^{\prime}}_{\mu_{0}}}
𝔼νN[|exp(Φ)+exp(ΦN)||ΦΦN|]2Lμ0p1\displaystyle\qquad\leq\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\exp(-\Phi)+\exp(-\Phi_{N})||\Phi-\Phi_{N}|\bigr{]}^{2}\right\|_{L^{p_{1}^{\prime}}_{\mu_{0}}}
𝔼νN[(exp(Φ)+exp(ΦN))p2]2/p2𝔼νN[|ΦΦN|p2]2/p2Lμ0p1\displaystyle\qquad\leq\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\big{(}\exp(-\Phi)+\exp(-\Phi_{N})\big{)}^{p_{2}}\bigr{]}^{2/p_{2}}\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{p_{2}^{\prime}}\bigr{]}^{2/p_{2}^{\prime}}\right\|_{L^{p_{1}^{\prime}}_{\mu_{0}}}
𝔼νN[(exp(Φ)+exp(ΦN))p2]1/p2Lμ02p1p32𝔼νN[|ΦΦN|p2]1/p2Lμ02p1p32\displaystyle\qquad\leq\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\big{(}\exp(-\Phi)+\exp(-\Phi_{N})\big{)}^{p_{2}}\bigr{]}^{1/p_{2}}\right\|^{2}_{L^{2p_{1}^{\prime}p_{3}}_{\mu_{0}}}\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{p_{2}^{\prime}}\bigr{]}^{1/p_{2}^{\prime}}\right\|^{2}_{L^{2p_{1}^{\prime}p_{3}^{\prime}}_{\mu_{0}}} (A.8)

Letting C2=C2(p1,p2,p3)C_{2}=C_{2}(p_{1}^{\prime},p_{2},p_{3}) be the constant in assumption (b), and using (A.7), it follows that

I2ZC1(p1)C22(p1,p2,p3)𝔼νN[|ΦΦN|p2]1/p2Lμ02p1p32.I\leq\frac{2}{Z}\cdot C_{1}(p_{1})\cdot C^{2}_{2}(p_{1}^{\prime},p_{2},p_{3})\cdot\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{p_{2}^{\prime}}\bigr{]}^{1/p_{2}^{\prime}}\right\|^{2}_{L^{2p_{1}^{\prime}p_{3}^{\prime}}_{\mu_{0}}}.

Now inequality (A.2) with a=𝔼νN[ZNS]1/2a=\mathbb{E}_{\nu_{N}}[Z^{\textup{S}}_{N}]^{-1/2} and b=Z1/2b=Z^{-1/2} and inequality (A.4) yield

12𝔼νN[ZNS]II\displaystyle\frac{1}{2\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}}II =(Z1/2(𝔼νN[ZNS])1/2)2\displaystyle=\left(Z^{-1/2}-\big{(}\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}\big{)}^{-1/2}\right)^{2}
=(𝔼νN[ZNS]ZZ𝔼νN[ZNS])2Z𝔼νN[ZNS]Z+𝔼νN[ZNS]\displaystyle=\left(\frac{\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}-Z}{Z\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}}\right)^{2}\frac{Z\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}}{Z+\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}}
(𝔼νN[ZNS]Z)2max{Z3,𝔼νN[ZNS]3}\displaystyle\leq\left(\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}-Z\right)^{2}\max\bigl{\{}Z^{-3},\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}^{-3}\bigr{\}}
(𝔼νN[ZNS]Z)2max{Z3,C33},\displaystyle\leq\left(\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}-Z\right)^{2}\max\bigl{\{}Z^{-3},C_{3}^{-3}\bigr{\}},

where the last inequality follows from assumption (c).

Using Tonelli’s theorem, Jensen’s inequality, inequality (A.3), and Hölder’s inequality with the same conjugate exponent pairs that we used to obtain (A.8),

(𝔼νN[ZNS]Z)2\displaystyle\left(\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}-Z\right)^{2}
=𝔼μ0[𝔼νN[exp(ΦN)exp(Φ)]]2p1/p1\displaystyle\quad=\mathbb{E}_{\mu_{0}}\bigl{[}\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi_{N})-\exp(-\Phi)\bigr{]}\bigr{]}^{2p_{1}^{\prime}/p_{1}^{\prime}}
𝔼νN[exp(Φ)exp(ΦN)]2Lμ0p1\displaystyle\quad\leq\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\exp(-\Phi)-\exp(-\Phi_{N})\bigr{]}^{2}\right\|_{L^{p_{1}^{\prime}}_{\mu_{0}}}
𝔼νN[(exp(Φ)+exp(ΦN))p2]1/p2Lμ02p1p32𝔼νN[|ΦΦN|p2]1/p2Lμ02p1p32\displaystyle\quad\leq\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\big{(}\exp(-\Phi)+\exp(-\Phi_{N})\big{)}^{p_{2}}\bigr{]}^{1/p_{2}}\right\|^{2}_{L^{2p_{1}^{\prime}p_{3}}_{\mu_{0}}}\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{p^{\prime}_{2}}\bigr{]}^{1/p^{\prime}_{2}}\right\|^{2}_{L^{2p_{1}^{\prime}p^{\prime}_{3}}_{\mu_{0}}}
C22(p1,p2,p3)𝔼νN[|ΦΦN|p2]1/p2Lμ02p1p32,\displaystyle\quad\leq C^{2}_{2}(p_{1}^{\prime},p_{2},p_{3})\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{p^{\prime}_{2}}\bigr{]}^{1/p^{\prime}_{2}}\right\|^{2}_{L^{2p_{1}^{\prime}p^{\prime}_{3}}_{\mu_{0}}},

where assumption (b) yields the last inequality. Combining the estimates for II and IIII yields (3.3). ∎

Proof of Theorem 3.2.

This proof is similar to the proof of Theorem 3.1. Since

dμdμ0dμNSdμ0=eΦ(u)/2eΦN(u)/2Z1/2eΦN(u)/2(1ZNS1Z1/2),\sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\mu_{0}}}-\sqrt{\frac{\mathrm{d}\mu_{N}^{\textup{S}}}{\mathrm{d}\mu_{0}}}=\frac{e^{-\Phi(u)/2}-e^{-\Phi_{N}(u)/2}}{Z^{1/2}}-e^{-\Phi_{N}(u)/2}\left(\frac{1}{\sqrt{Z^{\textup{S}}_{N}}}-\frac{1}{Z^{1/2}}\right),

Tonelli’s theorem, inequality (A.1), and Jensen’s inequality yield

𝔼νN[dH(μ,μNS)2]\displaystyle\mathbb{E}_{\nu_{N}}\bigl{[}d_{\textup{H}}\bigl{(}\mu,\mu_{N}^{\textup{S}}\bigr{)}^{2}\bigr{]} =12𝔼νN[(dμdμ0dμNSdμ0)2]Lμ01\displaystyle=\frac{1}{2}\left\|\mathbb{E}_{\nu_{N}}\left[\left(\sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\mu_{0}}}-\sqrt{\frac{\mathrm{d}\mu_{N}^{\textup{S}}}{\mathrm{d}\mu_{0}}}\right)^{2}\right]\right\|_{L^{1}_{\mu_{0}}}
1Z𝔼νN[(exp(Φ)exp(ΦN))2]Lμ01\displaystyle\leq\frac{1}{Z}\left\|\mathbb{E}_{\nu_{N}}\left[\left(\sqrt{\exp(-\Phi)}-\sqrt{\exp(-\Phi_{N})}\right)^{2}\right]\right\|_{L^{1}_{\mu_{0}}}
+𝔼νN[ZNS(Z1/2(ZNS)1/2)2]\displaystyle\phantom{=}\quad+\mathbb{E}_{\nu_{N}}\left[Z_{N}^{\textup{S}}\bigl{(}Z^{-1/2}-\bigl{(}Z_{N}^{\textup{S}}\bigr{)}^{-1/2}\bigr{)}^{2}\right]
I+II.\displaystyle\eqqcolon I+II.

For the first term II, inequality (A.3), and Hölder’s inequality with conjugate exponent pairs (q1,q1)(q_{1},q_{1}^{\prime}) and (q2,q2)(q_{2},q_{2}^{\prime}) give

ZI\displaystyle ZI =𝔼νN[(exp(Φ)exp(ΦN))2]Lμ01\displaystyle=\left\|\mathbb{E}_{\nu_{N}}\left[\left(\sqrt{\exp(-\Phi)}-\sqrt{\exp(-\Phi_{N})}\right)^{2}\right]\right\|_{L^{1}_{\mu_{0}}}
14𝔼νN[|exp(Φ/2)+exp(ΦN/2)|2|ΦΦN|2]Lμ01\displaystyle\leq\frac{1}{4}\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\exp(-\Phi/2)+\exp(-\Phi_{N}/2)|^{2}|\Phi-\Phi_{N}|^{2}\bigr{]}\right\|_{L^{1}_{\mu_{0}}}
𝔼νN[|exp(Φ/2)+exp(ΦN/2)|2q1]1/q1𝔼νN[|ΦΦN|2q1]1/q1Lμ01\displaystyle\leq\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\exp(-\Phi/2)+\exp(-\Phi_{N}/2)|^{2q_{1}}\bigr{]}^{1/q_{1}}\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{2q_{1}^{\prime}}\bigr{]}^{1/q_{1}^{\prime}}\right\|_{L^{1}_{\mu_{0}}}
𝔼νN[(exp(Φ/2)+exp(ΦN/2))2q1]1/q1Lμ0q2𝔼νN[|ΦΦN|2q1]1/2q1Lμ02q22.\displaystyle\leq\left\|\mathbb{E}_{\nu_{N}}\bigl{[}\big{(}\exp(-\Phi/2)+\exp(-\Phi_{N}/2)\big{)}^{2q_{1}}\bigr{]}^{1/q_{1}}\right\|_{L^{q_{2}}_{\mu_{0}}}\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{2q_{1}^{\prime}}\bigr{]}^{1/2q_{1}^{\prime}}\right\|^{2}_{L^{2q_{2}^{\prime}}_{\mu_{0}}}.

By (a), we may bound the first factor on the right-hand side of the last inequality by D1(q1,q2)D_{1}(q_{1},q_{2}). Now by (A.2) with a=Z1/2a=Z^{-1/2} and b=(ZNS)1/2b=(Z^{\textup{S}}_{N})^{-1/2}, and by inequality (A.4), we obtain (see the proof of Theorem 3.1 after (A.8)) that

II𝔼νN[ZNSmax{Z3,(ZNS)3}(ZZNS)2].\displaystyle II\leq\mathbb{E}_{\nu_{N}}\Bigl{[}Z_{N}^{\textup{S}}\max\bigl{\{}Z^{-3},\bigl{(}Z_{N}^{\textup{S}}\bigr{)}^{-3}\bigr{\}}\bigl{(}Z-Z_{N}^{\textup{S}}\bigr{)}^{2}\Bigr{]}.

Jensen’s inequality and another application of inequality (A.2) yield

(ZZNS)2\displaystyle\bigl{(}Z-Z_{N}^{\textup{S}}\bigr{)}^{2} exp(Φ)exp(ΦN)Lμ022(exp(Φ)+exp(ΦN))2(ΦΦN)2Lμ01.\displaystyle\leq\|\exp(-\Phi)-\exp(-\Phi_{N})\|^{2}_{L^{2}_{\mu_{0}}}\leq\bigl{\|}\bigl{(}\exp(-\Phi)+\exp(-\Phi_{N})\bigr{)}^{2}(\Phi-\Phi_{N})^{2}\bigr{\|}_{L^{1}_{\mu_{0}}}.

Combining the preceding two estimates, using Tonelli’s theorem and Hölder’s inequality with the same conjugate exponent pairs (q1,q1)(q_{1},q_{1}^{\prime}) and (q2,q2)(q_{2},q_{2}^{\prime}) as used in the bound for II, and using (b), we get

II\displaystyle II 𝔼νN[ZNSmax{Z3,(ZNS)3}(eΦ+eΦN)2(ΦΦN)2]Lμ01\displaystyle\leq\left\|\mathbb{E}_{\nu_{N}}\left[Z_{N}^{\textup{S}}\max\bigl{\{}Z^{-3},\bigl{(}Z_{N}^{\textup{S}}\bigr{)}^{-3}\bigr{\}}\left(e^{-\Phi}+e^{-\Phi_{N}}\right)^{2}(\Phi-\Phi_{N})^{2}\right]\right\|_{L^{1}_{\mu_{0}}}
𝔼νN[(ZNSmax{Z3,(ZNS)3}(eΦ+eΦN)2)q1]1q1𝔼νN[|ΦΦN|2q1]1q1Lμ01\displaystyle\leq\left\|\mathbb{E}_{\nu_{N}}\left[\left(Z_{N}^{\textup{S}}\max\bigl{\{}Z^{-3},\bigl{(}Z_{N}^{\textup{S}}\bigr{)}^{-3}\bigr{\}}\left(e^{-\Phi}+e^{-\Phi_{N}}\right)^{2}\right)^{q_{1}}\right]^{\tfrac{1}{q_{1}}}\mathbb{E}_{\nu_{N}}\left[|\Phi-\Phi_{N}|^{2q_{1}^{\prime}}\right]^{\tfrac{1}{q_{1}^{\prime}}}\right\|_{L^{1}_{\mu_{0}}}
D2(q1,q2)𝔼νN[|ΦΦN|2q1]1/2q1Lμ02q22.\displaystyle\leq D_{2}(q_{1},q_{2})\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|^{2q_{1}^{\prime}}\bigr{]}^{1/2q_{1}^{\prime}}\right\|^{2}_{L^{2q_{2}^{\prime}}_{\mu_{0}}}.

Combining the preceding estimates yields (3.4). ∎

Proof of Lemma 3.4.

Since exp(Φ)Lμ0p\exp(\Phi)\in L^{p^{\ast}}_{\mu_{0}}, examination of assumption (a) of Theorem 3.1 indicates that we may set p1=pp_{1}=p^{\ast} and C1exp(Φ)Lμ0pC_{1}\coloneqq\|\exp(\Phi)\|_{L^{p^{\ast}}_{\mu_{0}}}. By (3.5), it follows that 𝔼νN[exp(Φ)+exp(ΦN)]2exp(C0)\mathbb{E}_{\nu_{N}}[\exp(-\Phi)+\exp(-\Phi_{N})]\leq 2\exp(C_{0}); thus assumption (b) of Theorem 3.1 holds with p2=p3=+p_{2}=p_{3}=+\infty (so that 2p1p3=+2p_{1}^{\prime}p_{3}=+\infty) and C2=2exp(C0)C_{2}=2\exp(C_{0}). We now prove that assumption (c) of Theorem 3.1 holds. It follows by setting a=Φa=-\Phi and b=ΦNb=-\Phi_{N} in inequality (A.3) that |exp(Φ)exp(ΦN)|2exp(C0)|ΦΦN||\exp(-\Phi)-\exp(-\Phi_{N})|\leq 2\exp(C_{0})|\Phi-\Phi_{N}|. Thus

|ZNSZ|\displaystyle\bigl{|}Z_{N}^{\textup{S}}-Z\bigr{|} =|𝔼μ0[exp(ΦN)exp(Φ)]|\displaystyle=\bigl{|}\mathbb{E}_{\mu_{0}}\bigl{[}\exp(-\Phi_{N})-\exp(-\Phi)\bigr{]}\bigr{|}
𝔼μ0[|exp(ΦN)exp(Φ)|]\displaystyle\leq\mathbb{E}_{\mu_{0}}\bigl{[}|\exp(-\Phi_{N})-\exp(-\Phi)|\bigr{]}
2exp(C0)𝔼μ0[|ΦΦN|].\displaystyle\leq 2\exp(C_{0})\mathbb{E}_{\mu_{0}}\bigl{[}|\Phi-\Phi_{N}|\bigr{]}. (A.9)

Using Jensen’s inequality, (A.9), Tonelli’s theorem, and (3.6),

|𝔼νN[ZNS]Z|𝔼νN[|ZNSZ|]2eC0𝔼νN[|ΦΦN|]Lμ01min{Z1C3,C3Z}.\left|\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}-Z\right|\leq\mathbb{E}_{\nu_{N}}\bigl{[}\bigl{|}Z_{N}^{\textup{S}}-Z\bigr{|}\bigr{]}\leq 2e^{C_{0}}\left\|\mathbb{E}_{\nu_{N}}\bigl{[}|\Phi-\Phi_{N}|\bigr{]}\right\|_{L^{1}_{\mu_{0}}}\leq\min\biggl{\{}Z-\frac{1}{C_{3}},C_{3}-Z\biggr{\}}.

The last inequality implies that assumption (c) of Theorem 3.1 holds with the same C3C_{3} as in (3.6), since for any 0<C3<+0<C_{3}<+\infty that satisfies C31<Z<C3C_{3}^{-1}<Z<C_{3} and (3.6), we have

C31Z𝔼νN[ZNS]ZZC31C31𝔼νN[ZNS]C_{3}^{-1}-Z\leq\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}-Z\leq Z-C_{3}^{-1}\implies C_{3}^{-1}\leq\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}

and

ZC3𝔼νN[ZNS]ZC3Z𝔼νN[ZNS]C3,Z-C_{3}\leq\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}-Z\leq C_{3}-Z\implies\mathbb{E}_{\nu_{N}}\bigl{[}Z_{N}^{\textup{S}}\bigr{]}\leq C_{3},

and combining both the implied statements yields assumption (c) of Theorem 3.1; thus (3.3) holds, as desired.

Now note that (3.5) implies that assumption (a) of Theorem 3.2 holds with q1=q2=+q_{1}=q_{2}=+\infty and D1=4exp(C0)D_{1}=4\exp(C_{0}). Furthermore, (3.5) also implies that ZNS=𝔼μ0[exp(ΦN)]exp(C0)Z_{N}^{\textup{S}}=\mathbb{E}_{\mu_{0}}[\exp(-\Phi_{N})]\leq\exp(C_{0}) for all ΦN\Phi_{N}. Thus, given that ZZ is νN\nu_{N}-a.s. constant, and given that there exists some 0<C3<0<C_{3}<\infty such that C31<Z<C3C_{3}^{-1}<Z<C_{3},

𝔼νN[(ZNS)q1max{Z3,(ZNS)3}q1(exp(Φ(u))+exp(ΦN(u)))2q1]1/q1\displaystyle\mathbb{E}_{\nu_{N}}\left[\bigl{(}Z_{N}^{\textup{S}}\bigr{)}^{q_{1}}\max\bigl{\{}Z^{-3},\bigl{(}Z_{N}^{\textup{S}}\bigr{)}^{-3}\bigr{\}}^{q_{1}}\big{(}\exp(-\Phi(u))+\exp(-\Phi_{N}(u))\big{)}^{2q_{1}}\right]^{1/q_{1}}
4exp(3C0)𝔼νN[max{C33,(ZNS)3}q1]1/q1.\displaystyle\quad\leq 4\exp(3C_{0})\mathbb{E}_{\nu_{N}}\left[\max\bigl{\{}C_{3}^{-3},\bigl{(}Z_{N}^{\textup{S}}\bigr{)}^{-3}\bigr{\}}^{q_{1}}\right]^{1/q_{1}}. (A.10)

A necessary and sufficient condition for setting q1=+q_{1}=+\infty above (and therefore also in assumption (b) of Theorem 3.2) is that ZNSZ_{N}^{\textup{S}} is νN\nu_{N}-a.s. bounded away from zero by a constant that does not depend on NN. By the convexity and monotonicity of xexp(x)x\mapsto\exp(x),

ZNS=𝔼μ0[exp(ΦN)]exp(𝔼μ0[ΦN])exp(C4),Z_{N}^{\textup{S}}=\mathbb{E}_{\mu_{0}}\left[\exp(-\Phi_{N})\right]\geq\exp\left(\mathbb{E}_{\mu_{0}}\left[-\Phi_{N}\right]\right)\geq\exp(-C_{4}),

for C4C_{4} as in (3.7). In particular, if (3.7) holds, then so does assumption (b) of Theorem 3.2, with q1=q2=+q_{1}=q_{2}=+\infty and D2=4exp(3C0)max{C33,exp(3C4)}D_{2}=4\exp(3C_{0})\max\{C_{3}^{-3},\exp(3C_{4})\}, by inequality (A.10). ∎

Proof of Lemma 3.5.

The proof proceeds as for Lemma 3.4, except that we must show that the hypothesis $\mathbb{E}_{\nu_N}[\exp(\rho^{\ast}\Phi_N)]\in L^1_{\mu_0}$ for some $\rho^{\ast}>2$ implies that assumption (a) of Theorem 3.1 and assumption (b) of Theorem 3.2 hold with the stated parameters; the proof therefore concerns only these two assertions. Since $x\mapsto x^{-t}$ is strictly convex on $\mathbb{R}_{>0}$ for any $t>0$, Jensen's inequality yields
\[
\bigl\|\mathbb{E}_{\nu_N}[\exp(-\Phi_N)]^{-1}\bigr\|_{L^t_{\mu_0}}\leq\bigl\|\mathbb{E}_{\nu_N}[\exp(t\Phi_N)]\bigr\|^{1/t}_{L^1_{\mu_0}}.
\]
Setting $t=\rho^{\ast}$, we find that assumption (a) of Theorem 3.1 holds, with $p_1=\rho^{\ast}$ and $C_1=\|\mathbb{E}_{\nu_N}[\exp(\rho^{\ast}\Phi_N)]\|^{1/\rho^{\ast}}_{L^1_{\mu_0}}$. The inequality $\max\{x,y\}\leq x+y$ for $x,y\geq 0$ implies that
\begin{align*}
&\mathbb{E}_{\nu_N}\Bigl[\max\bigl\{Z_N^{\textup{S}}Z^{-3},\bigl(Z_N^{\textup{S}}\bigr)^{-2}\bigr\}^{q_1}\bigl(\exp(-\Phi(u))+\exp(-\Phi_N(u))\bigr)^{2q_1}\Bigr]^{1/q_1}\\
&\quad\leq 4\exp(2C_0)\Bigl(C_3^{-3}\exp(C_0)+\mathbb{E}_{\nu_N}\bigl[\bigl(Z^{\textup{S}}_N\bigr)^{-2q_1}\bigr]^{1/q_1}\Bigr),
\end{align*}
while Jensen's inequality, Tonelli's theorem, and the definition of the $L^1_{\mu_0}$-norm yield
\begin{align*}
\mathbb{E}_{\nu_N}\bigl[(Z^{\textup{S}}_N)^{-2q_1}\bigr]&\leq\mathbb{E}_{\nu_N}\bigl[\mathbb{E}_{\mu_0}\bigl[\exp(2q_1\Phi_N)\bigr]\bigr]\\
&=\mathbb{E}_{\mu_0}\bigl[\mathbb{E}_{\nu_N}\bigl[\exp(2q_1\Phi_N)\bigr]\bigr]\\
&=\bigl\|\mathbb{E}_{\nu_N}\bigl[\exp(2q_1\Phi_N)\bigr]\bigr\|_{L^1_{\mu_0}}.
\end{align*}
Since the last term is finite for $q_1\leq\rho^{\ast}/2$ by the hypothesis that $\mathbb{E}_{\nu_N}[\exp(\rho^{\ast}\Phi_N)]\in L^1_{\mu_0}$, assumption (b) of Theorem 3.2 holds with the parameters $q_1=\rho^{\ast}/2$, $q_2=+\infty$, and $D_2=4\exp(2C_0)\bigl(C_3^{-3}\exp(C_0)+\|\mathbb{E}_{\nu_N}[\exp(\rho^{\ast}\Phi_N)]\|^{2/\rho^{\ast}}_{L^1_{\mu_0}}\bigr)$. ∎

Proof of Proposition 3.6.

Recall (3.8), and fix an arbitrary $u\in\mathcal{U}$. We have
\[
\bigl|\Phi(u)-\Phi_N(u)\bigr|=\frac{1}{2}\bigl|\bigl\langle G(u)-y,\Gamma^{-1}(G(u)-y)\bigr\rangle-\bigl\langle G_N(u)-y,\Gamma^{-1}(G_N(u)-y)\bigr\rangle\bigr|.
\]
Adding and subtracting $\langle G_N(u)-y,\Gamma^{-1}(G(u)-y)\rangle$ inside the absolute value, rearranging terms, applying the Cauchy–Schwarz inequality, and letting $C_\Gamma$ denote the largest eigenvalue of $\Gamma^{-1}$ yields
\begin{align}
\bigl|\Phi(u)-\Phi_N(u)\bigr|&=\frac{1}{2}\bigl|\bigl\langle\Gamma^{-1}(G(u)-y),G(u)-G_N(u)\bigr\rangle+\bigl\langle\Gamma^{-1}(G_N(u)-y),G(u)-G_N(u)\bigr\rangle\bigr|\notag\\
&=\frac{1}{2}\bigl|\bigl\langle G(u)-y+G_N(u)-y,\Gamma^{-1}(G(u)-G_N(u))\bigr\rangle\bigr|\notag\\
&\leq C_\Gamma\,\|G(u)+G_N(u)-2y\|\,\|G(u)-G_N(u)\|.\tag{A.11}
\end{align}
By the triangle inequality,
\[
\|G(u)+G_N(u)-2y\|\leq 2\max\{\|G(u)-y\|,\|G_N(u)-y\|\}=2\max\{\Phi(u)^{1/2},\Phi_N(u)^{1/2}\},
\]
and the triangle inequality and (3.8) yield
\begin{align*}
\Phi_N(u)^{1/2}&=2^{-1/2}\|G_N(u)-y\|\\
&=2^{-1/2}\|G(u)-y+G_N(u)-G(u)\|\\
&\leq 2^{-1/2}\bigl(2^{1/2}\Phi(u)^{1/2}+\|G_N(u)-G(u)\|\bigr)\\
&=\Phi(u)^{1/2}+2^{-1/2}\|G_N(u)-G(u)\|.
\end{align*}
Together, these inequalities yield
\[
\|G(u)-y+G_N(u)-y\|\leq 2\bigl(\Phi(u)^{1/2}+2^{-1/2}\|G_N(u)-G(u)\|\bigr),
\]
and substituting the above into (A.11) yields
\[
\bigl|\Phi(u)-\Phi_N(u)\bigr|\leq 2C_\Gamma\bigl(\Phi(u)^{1/2}\|G_N(u)-G(u)\|+\|G(u)-G_N(u)\|^2\bigr),
\]
thus proving (3.9). Using (A.5) yields
\[
\bigl|\Phi(u)-\Phi_N(u)\bigr|^q\leq 2^{q-1}(2C_\Gamma)^q\bigl(\Phi(u)^{q/2}\|G_N(u)-G(u)\|^q+\|G(u)-G_N(u)\|^{2q}\bigr).
\]
Now take expectations with respect to $\nu_N$: since $G$ and $\Phi$ are constant with respect to $\nu_N$,
\[
\mathbb{E}_{\nu_N}\bigl[\bigl|\Phi(u)-\Phi_N(u)\bigr|^q\bigr]\leq(4C_\Gamma)^q\Bigl(\Phi(u)^{q/2}\,\mathbb{E}_{\nu_N}\bigl[\|G_N(u)-G(u)\|^q\bigr]+\mathbb{E}_{\nu_N}\bigl[\|G(u)-G_N(u)\|^{2q}\bigr]\Bigr),
\]
and taking the $q$th root of both sides proves (3.10). ∎
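The pointwise bound (3.9) can be sanity-checked numerically. In the sketch below, $\Gamma$ is taken to be the identity (so $C_\Gamma=1$) and $y$, $G(u)$, $G_N(u)$ are replaced by arbitrary random vectors; these choices are purely illustrative stand-ins, not the paper's forward model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomised sanity check of (3.9) with Gamma = I (so C_Gamma = 1) and
# Phi(u) = ||G(u) - y||^2 / 2; all vectors below are arbitrary stand-ins.
max_ratio = 0.0
for _ in range(1000):
    y, G, G_N = (rng.standard_normal(5) for _ in range(3))
    Phi = 0.5 * np.sum((G - y) ** 2)
    Phi_N = 0.5 * np.sum((G_N - y) ** 2)
    diff = np.linalg.norm(G_N - G)
    lhs = abs(Phi - Phi_N)
    rhs = 2.0 * (np.sqrt(Phi) * diff + diff**2)  # right-hand side of (3.9)
    max_ratio = max(max_ratio, lhs / rhs)

assert max_ratio <= 1.0   # the bound holds on every draw
```

The ratio stays below one on every draw, reflecting the slack factor $2^{-1/2}<1$ absorbed in the final step of the proof.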

Proof of Corollary 3.7.

Taking the $L^s_{\mu_0}$ norm of both sides of the second inequality in Proposition 3.6, and applying (A.5) with $s/q\geq 1$, we obtain
\begin{align*}
&\bigl\|\mathbb{E}_{\nu_N}\bigl[|\Phi-\Phi_N|^q\bigr]^{1/q}\bigr\|_{L^s_{\mu_0}}\\
&\quad\leq 4C_\Gamma\,\mathbb{E}_{\mu_0}\Bigl[\bigl(\Phi^{q/2}\,\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^q\bigr]+\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^{2q}\bigr]\bigr)^{s/q}\Bigr]^{1/s}\\
&\quad\leq 4C_\Gamma 2^{1/q-1/s}\Bigl(\mathbb{E}_{\mu_0}\Bigl[\Phi^{s/2}\,\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^q\bigr]^{s/q}\Bigr]+\mathbb{E}_{\mu_0}\Bigl[\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^{2q}\bigr]^{s/q}\Bigr]\Bigr)^{1/s}.
\end{align*}
By the Cauchy–Schwarz inequality and Jensen's inequality,
\begin{align*}
\mathbb{E}_{\mu_0}\Bigl[\Phi^{s/2}\,\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^q\bigr]^{s/q}\Bigr]&\leq\Bigl(\mathbb{E}_{\mu_0}\bigl[\Phi^s\bigr]\,\mathbb{E}_{\mu_0}\Bigl[\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^q\bigr]^{2s/q}\Bigr]\Bigr)^{1/2}\\
&\leq\Bigl(\mathbb{E}_{\mu_0}\bigl[\Phi^s\bigr]\,\mathbb{E}_{\mu_0}\Bigl[\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^{2q}\bigr]^{s/q}\Bigr]\Bigr)^{1/2}.
\end{align*}
Since $0\leq a\leq 1$ implies $a\leq a^{1/2}$, the hypotheses of the corollary and the preceding inequalities imply that
\[
\bigl\|\mathbb{E}_{\nu_N}\bigl[|\Phi-\Phi_N|^q\bigr]^{1/q}\bigr\|_{L^s_{\mu_0}}\leq 4C_\Gamma 2^{1/q-1/s}\bigl(\mathbb{E}_{\mu_0}[\Phi^s]^{1/2}+1\bigr)^{1/s}\bigl\|\mathbb{E}_{\nu_N}\bigl[\|G_N-G\|^{2q}\bigr]^{1/q}\bigr\|_{L^s_{\mu_0}}^{1/2}.
\]
Since $2^{1/q-1/s}\leq 2^{1/q}\leq 2$, the proof is complete. ∎

Proof of Lemma 3.8.

Given (3.8), we may choose the parameter $C_0$ in (3.5) to be $C_0=0$. By Jensen's inequality, (3.11) implies (3.6). ∎

Proof of Theorem 3.9.

We first verify that Assumption 3.3 holds. Since $\Phi$ and $\Phi_N$ satisfy (3.8), we may set $C_0=0$ in (3.5). Since we assume throughout that $0<Z=\mathbb{E}_{\mu_0}[\exp(-\Phi)]<\infty$, it follows that $\Phi$ has moments of all orders, and hence belongs to $L^s_{\mu_0}$ for all $s\in\mathbb{N}$. Therefore, given that (3.11) holds for $q,s\geq 1$, it follows from Jensen's inequality and Corollary 3.7 that $\|\mathbb{E}_{\nu_N}[|\Phi_N-\Phi|]\|_{L^1_{\mu_0}}$ can be made as small as desired. In particular, for any $0<C_3<+\infty$ satisfying $C_3^{-1}<Z<C_3$, there exists an $N^{\ast}(C_3)\in\mathbb{N}$ such that (3.6) holds for all $N\geq N^{\ast}(C_3)$.

The rest of the proof consists of applying Lemma 3.4 or 3.5, Corollary 3.7, and Lemma 3.8.

Case (a). The hypotheses in this case ensure that we may apply Lemma 3.4. Set $p_1=p^{\ast}$ and $p_2=p_3=+\infty$, so that $p_1'=(p^{\ast})'=p^{\ast}/(p^{\ast}-1)$ and $p_2'=p_3'=1$. Substituting these exponents into (3.3a) and applying Corollary 3.7 with $s=2p_1'p_3'=2p^{\ast}/(p^{\ast}-1)$ and $q=p_2'=1$ (note that $s\geq q\geq 1$), we obtain
\[
d_{\textup{H}}\bigl(\mu,\mu_N^{\textup{M}}\bigr)\leq C\bigl\|\mathbb{E}_{\nu_N}\bigl[|\Phi-\Phi_N|\bigr]\bigr\|_{L^{2p^{\ast}/(p^{\ast}-1)}_{\mu_0}(\mathcal{U})}\leq C\bigl\|\mathbb{E}_{\nu_N}\bigl[\|G-G_N\|^2\bigr]\bigr\|_{L^{2p^{\ast}/(p^{\ast}-1)}_{\mu_0}(\mathcal{U})}^{1/2},
\]
where $C>0$ changes value between inequalities. Thus we have shown that (3.12) holds with $r_1=1$ and $r_2=2p^{\ast}/(p^{\ast}-1)$.

To prove that (3.13) holds with the desired exponents, we again use Lemma 3.4 to set $q_1=q_2=+\infty$, so that $q_1'=q_2'=1$. Substituting these exponents into (3.4), and applying Corollary 3.7 with $s=2q_2'=2$ and $q=2q_1'=2$, we obtain
\[
\mathbb{E}_{\nu_N}\bigl[d_{\textup{H}}\bigl(\mu,\mu_N^{\textup{S}}\bigr)^2\bigr]^{1/2}\leq D\bigl\|\mathbb{E}_{\nu_N}\bigl[|\Phi-\Phi_N|^2\bigr]^{1/2}\bigr\|_{L^2_{\mu_0}}\leq D\bigl\|\mathbb{E}_{\nu_N}\bigl[\|G-G_N\|^4\bigr]^{1/2}\bigr\|_{L^2_{\mu_0}(\mathcal{U})}^{1/2},
\]
where $D>0$ changes value between inequalities. Thus we have shown that (3.13) holds with $s_1=s_2=2$.

It remains to ensure that both rightmost terms above converge to zero. Since (3.11) holds with $q=2$ and $s=2p^{\ast}/(p^{\ast}-1)$, the desired convergence follows from the nesting property of $L^p$-spaces on finite measure spaces. Therefore, both $\mu^{\textup{M}}_N$ and $\mu^{\textup{S}}_N$ converge to $\mu$ as claimed.

Case (b). Since the arguments in this case are the same as in the previous case, we record only the differences.

The hypotheses ensure that we may apply Lemma 3.5. Set $p_1=\rho^{\ast}$ and $p_2=p_3=+\infty$, so that $p_1'=\rho^{\ast}/(\rho^{\ast}-1)$ and $p_2'=p_3'=1$. Substituting these exponents into (3.3a) and applying Corollary 3.7 with $s=2p_1'p_3'=2\rho^{\ast}/(\rho^{\ast}-1)$ and $q=p_2'=1$, we obtain
\[
d_{\textup{H}}\bigl(\mu,\mu_N^{\textup{M}}\bigr)\leq C\bigl\|\mathbb{E}_{\nu_N}\bigl[|\Phi-\Phi_N|\bigr]\bigr\|_{L^{2\rho^{\ast}/(\rho^{\ast}-1)}_{\mu_0}(\mathcal{U})}\leq C\bigl\|\mathbb{E}_{\nu_N}\bigl[\|G-G_N\|^2\bigr]\bigr\|_{L^{2\rho^{\ast}/(\rho^{\ast}-1)}_{\mu_0}(\mathcal{U})}^{1/2},
\]
where $C>0$ changes value between inequalities. Thus we have shown that (3.12) holds with $r_1=1$ and $r_2=2\rho^{\ast}/(\rho^{\ast}-1)$.

To prove that (3.13) holds with the desired exponents, we again use Lemma 3.5 to set $q_1=\rho^{\ast}/2$ and $q_2=+\infty$, so that $q_1'=\rho^{\ast}/(\rho^{\ast}-2)$ and $q_2'=1$. Substituting these exponents into (3.4), and applying Corollary 3.7 with $s=2q_2'=2$ and $q=2q_1'=2\rho^{\ast}/(\rho^{\ast}-2)$, we obtain
\begin{align*}
\mathbb{E}_{\nu_N}\bigl[d_{\textup{H}}\bigl(\mu,\mu_N^{\textup{S}}\bigr)^2\bigr]^{1/2}&\leq D\bigl\|\mathbb{E}_{\nu_N}\bigl[|\Phi-\Phi_N|^{2\rho^{\ast}/(\rho^{\ast}-2)}\bigr]^{(\rho^{\ast}-2)/(2\rho^{\ast})}\bigr\|_{L^2_{\mu_0}}\\
&\leq D\bigl\|\mathbb{E}_{\nu_N}\bigl[\|G-G_N\|^{4\rho^{\ast}/(\rho^{\ast}-2)}\bigr]^{(\rho^{\ast}-2)/(2\rho^{\ast})}\bigr\|_{L^2_{\mu_0}(\mathcal{U})}^{1/2},
\end{align*}
where $D>0$ changes value between inequalities. Thus (3.13) holds with $s_1=2\rho^{\ast}/(\rho^{\ast}-2)$ and $s_2=2$. Since (3.11) holds with $q=2\rho^{\ast}/(\rho^{\ast}-2)$ and $s=2\rho^{\ast}/(\rho^{\ast}-1)$, it follows from the nesting property of $L^p$-spaces defined on finite measure spaces that both
\[
\bigl\|\mathbb{E}_{\nu_N}\bigl[\|G-G_N\|^2\bigr]\bigr\|_{L^{2\rho^{\ast}/(\rho^{\ast}-1)}_{\mu_0}(\mathcal{U})}^{1/2}\quad\text{and}\quad\bigl\|\mathbb{E}_{\nu_N}\bigl[\|G-G_N\|^{4\rho^{\ast}/(\rho^{\ast}-2)}\bigr]^{(\rho^{\ast}-2)/(2\rho^{\ast})}\bigr\|_{L^2_{\mu_0}(\mathcal{U})}^{1/2}
\]
converge to zero. ∎
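The nesting property of $L^p$-spaces invoked at the end of both cases — on a probability space, $\|f\|_{L^q_{\mu_0}}\leq\|f\|_{L^s_{\mu_0}}$ whenever $q\leq s$, by Jensen's inequality — can be illustrated with a quick Monte Carlo sketch; the Gaussian test function below is a hypothetical stand-in, and the inequality holds exactly even for the empirical measure of the samples.

```python
import numpy as np

rng = np.random.default_rng(2)

# On a probability space, ||f||_{L^q} <= ||f||_{L^s} for q <= s (Jensen).
# The Gaussian test function is a hypothetical stand-in.
f = rng.standard_normal(200_000)

def lp_norm(f, p):
    # Monte Carlo approximation of the L^p(mu_0) norm.
    return np.mean(np.abs(f) ** p) ** (1.0 / p)

assert lp_norm(f, 1) <= lp_norm(f, 2) <= lp_norm(f, 4)
```

This is the property that lets a single moment condition such as (3.11), stated for a large exponent, control all of the lower-order norms appearing in the bounds above.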

Proof of Proposition 4.1.

We start by verifying the assumptions of Theorem 3.2. First, since $\Phi(u)\geq 0$ for all $u\in\mathcal{U}$, and $\Phi_N(u)\geq 0$ for all $u\in\mathcal{U}$ and all $\{\sigma^{(i)}\}_{i=1}^N$, assumption (a) is satisfied with $q_1=q_2=\infty$. For assumption (b), we then have, for any $q_2\in[1,\infty]$,
\begin{align*}
&\Bigl\|\mathbb{E}_{\sigma}\Bigl[\bigl(Z_N^{\textup{S}}\bigr)^{q_1}\max\{Z^{-3},\bigl(Z_N^{\textup{S}}\bigr)^{-3}\}^{q_1}\bigl(\exp(-\Phi(u))+\exp(-\Phi_N(u))\bigr)^{2q_1}\Bigr]^{1/q_1}\Bigr\|_{L^{q_2}_{\mu_0}(\mathcal{U})}\\
&\quad\leq 4\,\mathbb{E}_{\sigma}\Bigl[\bigl(Z_N^{\textup{S}}\bigr)^{q_1}\max\{Z^{-3},\bigl(Z_N^{\textup{S}}\bigr)^{-3}\}^{q_1}\Bigr]^{1/q_1}\\
&\quad\leq 4\Bigl(Z^{-3q_1}\,\mathbb{E}_{\sigma}\bigl[\bigl(Z_N^{\textup{S}}\bigr)^{q_1}\bigr]+\mathbb{E}_{\sigma}\bigl[\bigl(Z_N^{\textup{S}}\bigr)^{-2q_1}\bigr]\Bigr)^{1/q_1}.
\end{align*}
Since $\Phi_N(u)\geq 0$ for all $u\in\mathcal{U}$ and all $\{\sigma^{(i)}\}_{i=1}^N$, we have, for any $q_1\in[1,\infty]$,
\begin{equation}
\mathbb{E}_{\sigma}\bigl[\bigl(Z_N^{\textup{S}}\bigr)^{q_1}\bigr]^{1/q_1}=\mathbb{E}_{\sigma}\biggl[\biggl(\int_{\mathcal{U}}\exp(-\Phi_N(u))\,\mathrm{d}\mu_0(u)\biggr)^{q_1}\biggr]^{1/q_1}\leq 1.\tag{A.12}
\end{equation}
Using the $\ell$-sparse distribution of $\sigma$, we further have $|\sigma^{(i)}_j|\leq\sqrt{s}$ and
\[
\Phi_N(u)=\frac{1}{2N}\sum_{i=1}^N\bigl|{\sigma^{(i)}}^{\mathtt{T}}\bigl(\Gamma^{-1/2}(y-G(u))\bigr)\bigr|^2\leq\frac{s}{2}\bigl\|\Gamma^{-1/2}(y-G(u))\bigr\|^2=s\Phi(u),
\]
which implies that $Z_N^{\textup{S}}\geq Z_s=\int_{\mathcal{U}}\exp(-s\Phi(u))\,\mathrm{d}\mu_0(u)$. It follows that, for any $q_1\in[1,\infty]$,
\[
\mathbb{E}_{\sigma}\bigl[\bigl(Z_N^{\textup{S}}\bigr)^{-2q_1}\bigr]^{1/q_1}\leq\mathbb{E}_{\sigma}\bigl[Z_s^{-2q_1}\bigr]^{1/q_1}=Z_s^{-2},
\]
and assumption (b) is hence also satisfied with $q_1=q_2=\infty$. Hence, by Theorem 3.2,
\[
\bigl(\mathbb{E}_{\sigma}\bigl[d_{\textup{H}}\bigl(\mu,\mu_N^{\textup{S}}\bigr)^2\bigr]\bigr)^{1/2}\leq C\Bigl\|\bigl(\mathbb{E}_{\sigma}\bigl[|\Phi(u)-\Phi_N(u)|^2\bigr]\bigr)^{1/2}\Bigr\|_{L^2_{\mu_0}(\mathcal{U})}.
\]
Using standard properties of Monte Carlo estimators (see e.g. Robert and Casella, 1999), we have
\[
\bigl(\mathbb{E}_{\sigma}\bigl[|\Phi(u)-\Phi_N(u)|^2\bigr]\bigr)^{1/2}=\sqrt{\frac{\mathbb{V}_{\sigma}\bigl[\frac{1}{2}\bigl|\sigma^{\mathtt{T}}\Gamma^{-1/2}(y-G(u))\bigr|^2\bigr]}{N}}.
\]
Now, using $\mathbb{V}[X]=\mathbb{E}[X^2]-\mathbb{E}[X]^2$, the inequality $\bigl(\sum_{j=1}^J x_j\bigr)^4\leq J^3\sum_{j=1}^J x_j^4$, the linearity of expectation, the $\ell$-sparse distribution of $\sigma$, and $\|x\|_4\leq\|x\|_2$, we have
\begin{align*}
0&\leq\mathbb{V}_{\sigma}\Bigl[\frac{1}{2}\bigl|\sigma^{\mathtt{T}}\Gamma^{-1/2}(y-G(u))\bigr|^2\Bigr]\\
&=\mathbb{E}_{\sigma}\Bigl[\frac{1}{4}\bigl|\sigma^{\mathtt{T}}\Gamma^{-1/2}(y-G(u))\bigr|^4\Bigr]-\mathbb{E}_{\sigma}\Bigl[\frac{1}{2}\bigl|\sigma^{\mathtt{T}}\Gamma^{-1/2}(y-G(u))\bigr|^2\Bigr]^2\\
&=\mathbb{E}_{\sigma}\Biggl[\frac{1}{4}\biggl|\sum_{j=1}^J\sigma_j\bigl(\Gamma^{-1/2}(y-G(u))\bigr)_j\biggr|^4\Biggr]-\frac{1}{4}\bigl\|\Gamma^{-1/2}(y-G(u))\bigr\|^4\\
&\leq\frac{1}{4}J^3\sum_{j=1}^J\mathbb{E}_{\sigma}[\sigma_j^4]\bigl(\Gamma^{-1/2}(y-G(u))\bigr)_j^4-\frac{1}{4}\bigl\|\Gamma^{-1/2}(y-G(u))\bigr\|^4\\
&=\frac{1}{4}J^3\,\mathbb{E}_{\sigma}[\sigma_j^4]\,\bigl\|\Gamma^{-1/2}(y-G(u))\bigr\|_4^4-\frac{1}{4}\bigl\|\Gamma^{-1/2}(y-G(u))\bigr\|^4\\
&\leq\bigl(J^3\,\mathbb{E}_{\sigma}[\sigma_j^4]-1\bigr)\Phi(u)^2.
\end{align*}
The claim (4.1) now follows, with the choice of constant as in (4.2). ∎
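The two facts driving the proof — unbiasedness of the Monte Carlo estimator $\Phi_N$ and the variance bound $\bigl(J^3\mathbb{E}_\sigma[\sigma_j^4]-1\bigr)\Phi(u)^2$ — can be illustrated with a small simulation. The sketch below uses Rademacher ($\pm 1$) entries for $\sigma$, a hypothetical simple choice with $\mathbb{E}[\sigma_j^2]=1$ and $\mathbb{E}[\sigma_j^4]=1$, rather than the $\ell$-sparse distribution of the text; the vector `x` stands in for $\Gamma^{-1/2}(y-G(u))$, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Randomised misfit sketch: sigma has i.i.d. mean-zero entries with
# E[sigma_j^2] = 1; Rademacher entries also give E[sigma_j^4] = 1.
J, N, M = 4, 50, 4000          # data dim., estimator size, number of draws
x = rng.standard_normal(J)     # stands in for Gamma^{-1/2} (y - G(u))
Phi = 0.5 * np.sum(x**2)       # exact misfit Phi(u)

sigma = rng.choice([-1.0, 1.0], size=(M, N, J))
Phi_N = 0.5 * np.mean((sigma @ x) ** 2, axis=1)   # M realisations of Phi_N(u)

# Unbiasedness: E_sigma[Phi_N] = Phi.
assert abs(Phi_N.mean() - Phi) < 0.05 * Phi
# Variance bound from the proof, with E[sigma_j^4] = 1 here:
# V[ |sigma^T x|^2 / 2 ] <= (J^3 E[sigma_j^4] - 1) Phi^2.
var_single = np.var(0.5 * (sigma[:, 0, :] @ x) ** 2)
assert var_single <= (J**3 * 1.0 - 1.0) * Phi**2
```

The empirical variance is typically far below the worst-case bound, which is deliberately crude (it uses $(\sum_j x_j)^4\leq J^3\sum_j x_j^4$).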

Proof of Theorem 5.6.

Recall that $T_J$ is a set of time points in $[0,T]$, indexed by an index set $J$ with cardinality $|J|\in\mathbb{N}$. In (5.10), we observed that
\[
\|G_N(u)-G(u)\|\leq|J|\sup_{0\leq t\leq T}\|e(t;u)\|_{\ell^d_2}.
\]
Fix $\rho^{\ast}>2$. Omitting the argument $u$ of $\Phi_N$, $\Phi$, $G_N$, and $G$, we have
\begin{align*}
\exp\bigl(\rho^{\ast}\Phi_N\bigr)&=\exp\bigl(\rho^{\ast}(\Phi_N-\Phi+\Phi)\bigr)\\
&\leq\exp\bigl(\rho^{\ast}|\Phi_N-\Phi|+\rho^{\ast}\Phi\bigr)\\
&=\exp\bigl(\rho^{\ast}|\Phi_N-\Phi|\bigr)\exp(\rho^{\ast}\Phi)\\
&\leq\exp\bigl(2\rho^{\ast}C_\Gamma\bigl(\Phi^{1/2}\|G_N-G\|+\|G-G_N\|^2\bigr)\bigr)\exp(\rho^{\ast}\Phi)\\
&\leq\frac{\exp(\rho^{\ast}\Phi)}{2}\bigl[\exp\bigl(4\rho^{\ast}C_\Gamma\Phi^{1/2}\|G_N-G\|\bigr)+\exp\bigl(4\rho^{\ast}C_\Gamma\|G-G_N\|^2\bigr)\bigr],
\end{align*}
where the last two inequalities follow from (3.9) and Young's inequality $ab\leq(a^2+b^2)/2$ for $a,b\geq 0$. Using (5.10), we therefore obtain
\begin{align*}
\exp(\rho^{\ast}\Phi_N)&\leq\frac{\exp(\rho^{\ast}\Phi)}{2}\biggl[\exp\biggl(4\rho^{\ast}C_\Gamma\Phi^{1/2}|J|\sup_{0\leq t\leq T}\|e(t)\|_{\ell^d_2}\biggr)\\
&\qquad\qquad+\exp\biggl(4\rho^{\ast}C_\Gamma|J|^2\sup_{0\leq t\leq T}\|e(t)\|^2_{\ell^d_2}\biggr)\biggr],
\end{align*}
where we suppress the $u$-dependence of $e(t;u)$ and simply write $e(t)$. Since $\mathcal{U}$ is compact and $S$ is continuous, it follows that $G$, and hence $\Phi$, is continuous on $\mathcal{U}$; by the extreme value theorem, $\Phi$ is bounded on $\mathcal{U}$, i.e. $\|\Phi\|_{L^\infty_{\mu_0}(\mathcal{U})}$ is finite. Using this fact and taking expectations with respect to $\nu_N$, we obtain
\begin{align*}
\mathbb{E}_{\nu_N}\bigl[\exp(\rho^{\ast}\Phi_N(u))\bigr]\leq\frac{\exp\bigl(\rho^{\ast}\|\Phi\|_{L^\infty_{\mu_0}(\mathcal{U})}\bigr)}{2}&\biggl(\mathbb{E}_{\nu_N}\biggl[\exp\biggl(4\rho^{\ast}C_\Gamma\|\Phi\|_{L^\infty_{\mu_0}(\mathcal{U})}^{1/2}|J|\sup_{0\leq t\leq T}\|e(t;u)\|_{\ell^d_2}\biggr)\biggr]\\
&\quad+\mathbb{E}_{\nu_N}\biggl[\exp\biggl(4\rho^{\ast}C_\Gamma|J|^2\sup_{0\leq t\leq T}\|e(t;u)\|^2_{\ell^d_2}\biggr)\biggr]\biggr).
\end{align*}
By Corollary 5.5, the two terms on the right-hand side are finite for every $u\in\mathcal{U}$. Given the continuous dependence of the parameters of Assumptions 5.1, 5.2, and 5.3 on $u$, and given that $\mathcal{U}$ is a compact subset of a finite-dimensional Euclidean space, the right-hand side can be bounded by a scalar that does not depend on $u$. Hence, the function $u\mapsto\mathbb{E}_{\nu_N}[\exp(\rho^{\ast}\Phi_N(u))]$ belongs to $L^\infty_{\mu_0}(\mathcal{U})\subset L^1_{\mu_0}(\mathcal{U})$, so that the first hypothesis of Theorem 3.9(b) holds. For the second hypothesis, observe that, since Assumption 5.3 holds for $R=+\infty$, (5.12) holds for any $n\in\mathbb{N}$, and thus (3.11) holds for any $q,s\geq 1$. Therefore the hypotheses of Theorem 3.9(b) are satisfied, and the desired conclusion follows from Theorem 3.9. ∎
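The Young-inequality step used in the chain of bounds above, $\exp(a)\exp(b)\leq\tfrac{1}{2}(\exp(2a)+\exp(2b))$, follows from $xy\leq(x^2+y^2)/2$ applied to $x=\exp(a)$, $y=\exp(b)$ and is valid for all real $a,b$; a quick randomised check (with purely illustrative sampling ranges):

```python
import numpy as np

rng = np.random.default_rng(4)

# exp(a) * exp(b) <= (exp(2a) + exp(2b)) / 2, from xy <= (x^2 + y^2)/2
# with x = exp(a), y = exp(b); holds for all real a, b.
a, b = rng.uniform(-3.0, 3.0, size=(2, 10_000))
assert np.all(np.exp(a + b) <= 0.5 * (np.exp(2 * a) + np.exp(2 * b)) + 1e-9)
```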