
Forward and Inverse Uncertainty Quantification using Multilevel Monte Carlo Algorithms for an Elliptic Nonlocal Equation

A. Jasra, Department of Statistics & Applied Probability, National University of Singapore, Singapore; K. J. H. Law, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA; Y. Zhou, Department of Statistics & Applied Probability, National University of Singapore, Singapore
Abstract

This paper considers uncertainty quantification for an elliptic nonlocal equation. In particular, it is assumed that the parameters which define the kernel in the nonlocal operator are uncertain and a priori distributed according to a probability measure. It is shown that the induced probability measure on some quantities of interest arising from functionals of the solution to the equation with random inputs is well-defined; as is the posterior distribution on parameters given observations. As the elliptic nonlocal equation cannot be solved exactly, approximate posteriors are constructed. The multilevel Monte Carlo (MLMC) and multilevel sequential Monte Carlo (MLSMC) sampling algorithms are used for a priori and a posteriori estimation, respectively, of quantities of interest. These algorithms reduce the amount of work required to estimate posterior expectations, for a given level of error, relative to Monte Carlo and i.i.d. sampling from the posterior at a given level of approximation of the solution of the elliptic nonlocal equation.


Key words: Uncertainty quantification, multilevel Monte Carlo, sequential Monte Carlo, nonlocal equations, Bayesian inverse problem
AMS subject classification: 82C80, 60K35.

1 Introduction

Anomalous diffusion, where the associated underlying stochastic process is not Brownian motion, has recently attracted considerable attention [2]. This case is interesting because there may be long-range correlations, among other reasons. Anomalous superdiffusion can be related to fractional Laplacian operators and/or so-called nonlocal operators [10], defined point-wise by their operation on a function u:\mathbb{R}^{d}\rightarrow\mathbb{R} as

\mathcal{L}(u)(x):=\int_{\mathbb{R}^{d}}B(x,x^{\prime})[u(x^{\prime})-u(x)]\,dx^{\prime}. (1.1)

The fractional Laplacian is actually a special case of this operator. The present work will focus on these operators, and in particular the associated stationary equation, analogous to the local elliptic equation. It should be noted that these nonlocal operators and the associated equations can be applied not only to problems of anomalous diffusion, but to a wide range of phenomena, including peridynamic models of continuum mechanics which allow crack nucleation and propagation [17, 10].

In this work it will be assumed that the kernel B appearing above is parametrized by a (possibly infinite-dimensional) parameter \lambda\in E; \lambda is assumed random. Furthermore, some partial noisy observations of the solution u will be available. In a probabilistic framework, a prior distribution is placed on \lambda\sim\mu and the posterior distribution \lambda|y\sim\eta results from conditioning on the observations. The joint distribution can often trivially be derived and evaluated in closed form for a given pair (\lambda,y), as \mathbb{P}(\lambda,y)=\mathbb{P}(y|\lambda)\mu(\lambda). Hence, the posterior for a given observed value of y can be evaluated, up to a normalizing constant. One aims to approximate quantities of interest q=\int_{E}Q(\lambda)\nu(d\lambda) for some Q:E\rightarrow\mathbb{R}, where \nu=\mu for the forward problem and \nu=\eta for the inverse problem. The likelihood \mathbb{P}(y|\lambda) is often concentrated in a small, possibly nonlinear, subspace of E. This posterior concentration generically precludes the naive application of standard forward approximation algorithms independently to the numerator, \int_{E}Q(\lambda)\mathbb{P}(y|\lambda)\mu(d\lambda), and denominator, \int_{E}\mathbb{P}(y|\lambda)\mu(d\lambda). It is noted that, typically, (1.1) will have to be approximated numerically, and this will lead to an approximate posterior density.

Monte Carlo (MC) and sequential Monte Carlo (SMC) methods are amongst the most widely used computational techniques in statistics, engineering, physics, finance and many other disciplines. In particular, if i.i.d. samples \lambda_{i}\sim\mu may be obtained, the MC sampler simply iterates this N times and approximates (1/N)\sum_{i=1}^{N}Q(\lambda_{i})\approx\int_{E}Q(\lambda)\mu(d\lambda). SMC samplers [8] are designed to approximate a sequence \{\eta_{\ell}\}_{\ell\geq 0} of probability distributions on a common space, whose densities are only known up to a normalising constant. The method uses N\geq 1 samples (or particles) that are generated in parallel, and are propagated with importance sampling (often) via Markov chain Monte Carlo (MCMC) and resampling methods. Several convergence results, as N grows, have been proved (see e.g. [7]).
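The plain MC estimator just described is trivial to state in code; the following is a minimal sketch, in which sample_mu (an i.i.d. prior sampler) and Q (the quantity of interest) are user-supplied placeholders, not objects from this paper.

```python
import numpy as np

def mc_estimate(sample_mu, Q, N):
    # Plain Monte Carlo: average Q over N i.i.d. draws from the prior mu.
    return np.mean([Q(sample_mu()) for _ in range(N)])
```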

For problems which must first be approximated at finite resolution, as is the case in this article, and must subsequently be sampled from using MC-based methods, a multilevel (ML) framework may be used. This can potentially reduce the cost of obtaining a given level of mean square error (MSE) [14, 11, 12], relative to performing i.i.d. sampling from the approximate posterior at a given (high) resolution. A telescopic sum of successively refined approximation increments is estimated instead of a single highly resolved approximation. The convergence of the refined approximation increments allows one to balance the cost between levels, and optimally obtain a cost of \mathcal{O}(\varepsilon^{-2}) for MSE \mathcal{O}(\varepsilon^{2}). This is shown in the context of the models in this article. Specifically, it is shown that the cost of MLMC, to provide a MSE of \mathcal{O}(\varepsilon^{2}), is less than that of i.i.d. sampling from the most accurate prior approximation.

SMC within the ML framework has been primarily developed in [3, 9, 16]. These methodologies, some of which consider a similar context to this article, have been introduced where the ML approach is thought to be beneficial, but i.i.d. sampling is not possible. Indeed, one often has to resort to advanced MCMC or SMC methods to implement the ML identity; see for instance [15] for MCMC. SMC, however, is a very sensible framework in which to implement the ML identity for nonlocal problems, as it will approximate a sequence of related probabilities of increasing complexity; in some cases, exact simulation from couplings of the probabilities is not possible. As a result, SMC is the computational framework which we will primarily pursue in this article. It is shown that the cost of MLSMC, to provide a MSE of \mathcal{O}(\varepsilon^{2}), is less than that of i.i.d. sampling from the most accurate posterior approximation. It is noted, however, that some developments, with regards to both theoretical analysis and implementation of the algorithm, must be carefully considered in order to successfully use MLSMC for nonlocal models.

This article is structured as follows. In Section 2 the nonlocal elliptic equations are given, along with the Bayesian inverse problem, and well-posedness of both is established. In particular, it is shown that the posterior has a well-defined Radon-Nikodym derivative w.r.t. the prior measure. In Section 3 we consider how one can approximate expectations w.r.t. the prior and the posterior, specifically using MLMC and MLSMC methods. Our complexity results are also given. Section 4 provides some numerical implementations of MLMC and MLSMC for the prior and posterior, respectively.

2 Nonlocal elliptic equations

2.1 Setup

Consider the following equation

\mathcal{L}(u)(x)=b(x),\quad\text{for }x\in\Omega\subset\mathbb{R}^{d}, (2.1)
u(x)=0,\quad\text{for }x\in\Gamma, (2.2)

where \mathcal{L} is given by (1.1), the domain \Omega is simply connected, and its “boundary” \Gamma is sufficiently regular and nonlocal, in the sense that it has non-zero volume in \mathbb{R}^{d}.

Under appropriate conditions on B, \mathcal{L}^{-1}:H^{-s}(\Omega\cup\Gamma)\rightarrow H^{s}_{c}(\Omega\cup\Gamma), for s\in[0,1), with s=1 only for the limiting local version in which B(x,x^{\prime})=(I-\partial^{2}/\partial x^{2})\delta(x^{\prime}-x) (or a similar uniformly elliptic form) [10]. Following [10], H^{s}_{c}=\{u\in H^{s};u|_{\Gamma}=0,\int_{\mathbb{R}^{d}}u(x)dx=0\} denotes the volume-constrained space of functions, and the fractional Sobolev space H^{s} is defined as follows, for s\in(0,1),

H^{s}:=\{u\in L^{2}(\Omega\cup\Gamma):\|u\|_{L^{2}(\Omega\cup\Gamma)}+|u|_{H^{s}(\Omega\cup\Gamma)}<\infty\}, (2.3)

where

|u|_{H^{s}(\Omega\cup\Gamma)}:=\int_{\Omega\cup\Gamma}\int_{\Omega\cup\Gamma}\frac{(u(x)-u(y))^{2}}{|x-y|^{d+2s}}\,dy\,dx.

We define H^{0}_{c}:=L^{2}_{c} for ease of notation.

For b\in V^{*}(\Omega\cup\Gamma), the dual with respect to L^{2} of some space V(\Omega\cup\Gamma), the weak formulation of (2.1)-(2.2) can be defined as follows. Integrate the equation against an arbitrary test function v\in V(\Omega\cup\Gamma). Now, find u\in V(\Omega\cup\Gamma) (satisfying the boundary conditions) such that

\mathcal{B}(u,v)=F(v),\quad\text{for all }v\in V(\Omega\cup\Gamma), (2.4)

where

\begin{split}\mathcal{B}(u,v)&:=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}[u(x^{\prime})-u(x)]B(x,x^{\prime})\,dx^{\prime}\,v(x)\,dx,\\ F(v)&:=\int_{\mathbb{R}^{d}}b(x)v(x)\,dx.\end{split}

2.2 Numerical methods for forward solution

Let h_{\ell} denote the maximum diameter of an element of V_{\ell}, where V_{\ell}\subset V_{\ell+1}\subset\cdots\subset V. The finite-dimensional approximation of (2.4) is stated as follows. Find u_{\ell}\in V_{\ell} such that

\mathcal{B}(u_{\ell},v_{\ell})=F(v_{\ell}),\quad\text{for all }v_{\ell}\in V_{\ell}. (2.5)

Assuming the spaces V_{\ell} are spanned by elements \{\phi^{k}_{\ell}\}_{k=1}^{M_{\ell}}, one can substitute the ansatz u_{\ell}=\sum_{k=1}^{M_{\ell}}u_{\ell}^{k}\phi^{k}_{\ell} into the above equation, resulting in the finite linear system

\sum_{k=1}^{M_{\ell}}\mathcal{B}(\phi_{\ell}^{j},\phi^{k}_{\ell})u_{\ell}^{k}=F(\phi_{\ell}^{j}),\quad\text{for }j=1,\dots,M_{\ell}. (2.6)

In particular, the spaces \{V_{\ell}\} will be composed of discontinuous elements, so that the method described is a discontinuous Galerkin finite element method (FEM) [4]. Piecewise polynomial discontinuous element spaces are dense in H^{s}_{c} for s\in[0,1/2), and are therefore conforming when u\in H_{c}^{s}=V for s\in[0,1/2), so there is no need to impose penalty terms at the boundaries, as one must for smoother problems in which the discontinuous elements are non-conforming.
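To make the assembly (2.6) concrete, the following is a minimal sketch in d=1 for piecewise-constant discontinuous elements with midpoint quadrature, using an illustrative truncated kernel of the form (4.1) with f=1. It is not the implementation used in Section 4; the sign convention follows (2.1) literally, and all names are ours.

```python
import numpy as np

def B_kernel(x, xp, alpha=0.5, delta=0.2):
    # Illustrative truncated kernel, cf. (4.1) with f = 1.
    r = abs(x - xp)
    if r <= 0.0 or r >= delta:
        return 0.0
    return 1.0 / (delta**2 * r**alpha)

def solve_nonlocal_1d(n=64, delta=0.2, alpha=0.5, b=lambda x: 5.0):
    # Piecewise-constant DG on Omega = [0,1], with nonlocal boundary
    # Gamma = [-delta,0) u (1,1+delta]; u = 0 on Gamma (volume constraint).
    h = 1.0 / n
    n_gam = int(np.ceil(delta / h))                  # cells covering each side of Gamma
    mids = (np.arange(-n_gam, n + n_gam) + 0.5) * h  # all cell midpoints
    interior = (mids > 0.0) & (mids < 1.0)
    idx = np.where(interior)[0]
    A = np.zeros((len(idx), len(idx)))
    F = np.zeros(len(idx))
    for row, j in enumerate(idx):
        diag = 0.0
        for i, xi in enumerate(mids):
            if i == j:
                continue
            # off-diagonal entry: B(phi_j, phi_k) ~ h^2 B(x_j, x_k)
            w = h * h * B_kernel(mids[j], xi, alpha, delta)
            diag -= w
            if interior[i]:
                A[row, np.searchsorted(idx, i)] += w
        A[row, row] = diag                           # minus the total interaction of cell j
        F[row] = h * b(mids[j])                      # F(phi_j) ~ h b(x_j)
    return mids[idx], np.linalg.solve(A, F)
```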

2.3 Forward UQ

The following assumption will be made for simplicity.

Assumption 2.1.

V=L^{2}(\Omega\cup\Gamma), with norm \|\cdot\| and inner product \langle\cdot,\cdot\rangle. B(x,x^{\prime})=B(|x^{\prime}-x|)\geq 0 is continuous with B(0)\geq c^{\prime}>0, and there exists K_{1}>0 such that for all x\in\Omega

\int_{\Omega\cup\Gamma}B(x,x^{\prime})\,dx^{\prime}\leq K_{1}. (2.7)

As shown in Section 6 of [13], this implies that for all u,v,b\in V:

  • F is continuous: |F(v)|\leq\|b\|\|v\|;

  • \mathcal{B} is continuous: |\mathcal{B}(u,v)|\leq 4K_{1}\|u\|\|v\|;

  • \mathcal{B} is coercive: |\mathcal{B}(u,u)|\geq K_{2}\|u\|^{2} for some K_{2}(\mathcal{B},\Omega)>0. This actually follows from the Poincaré-type inequality derived in Proposition 2.5 of [1].

Hence, the Lax-Milgram lemma ([5], Thm. 1.1.3) can be invoked, guaranteeing the existence of a unique solution u\in L^{2}(\Omega) such that

\|u\|\leq K_{2}^{-1}\|b\|.

In other words, the map b\mapsto u is continuous. The system (2.6) inherits solvability, since the bilinear form is a fortiori coercive on V_{\ell}, so that

\|u_{\ell}\|\leq K_{2}^{-1}\|b\|.

Let B_{\lambda} be a parametrization of B, where \lambda\sim\mu and \lambda\in E, with either E=\mathbb{R}^{p} or E\subset\mathbb{R}^{p} compact. Then the following theorem holds.

Theorem 2.2 (Well-posedness of forward UQ).

If Assumption 2.1 holds almost surely for \lambda\sim\mu and the map u\mapsto Q is continuous from L^{2} to \mathbb{R}, then the map \lambda\mapsto Q is almost surely continuous. Hence Q(\lambda)\in L^{\infty}\subset L^{p} for all p\geq 1, i.e. all moments exist. L^{p} here denotes the space of random variables X such that \int_{E}|X|^{p}\mu(d\lambda)<\infty.

Proof.

Since Assumption 2.1 holds almost surely for \lambda\sim\mu, \|u\|\leq u^{*}:=K_{2}^{-1}\|b\| uniformly. So, the quantity of interest Q(\lambda)=\|u(\lambda)\|\in L^{\infty}\subset L^{p} for all p\geq 1. Therefore, for any Q:L^{2}(\Omega)\rightarrow\mathbb{R} such that Q(\lambda):=Q(u(\lambda))\leq K\|u\|, the result follows, since then Q(\lambda)\leq Q^{*}:=Ku^{*} for all \lambda. ∎

Corollary 2.3 (Well-posedness of finite approximation).

If Assumption 2.1 holds almost surely for \lambda\sim\mu and the map u_{\ell}\mapsto Q_{\ell} is continuous from L^{2} to \mathbb{R}, then the map through the discrete system \lambda\mapsto Q_{\ell} is almost surely continuous and Q_{\ell}\in L^{\infty}.

2.4 Inverse UQ

For the inverse problem, let us assume that some data is given in the form

y=\mathcal{G}(\lambda)+\xi,\quad\lambda\perp\xi\sim N(0,\Sigma), (2.8)

where

\mathcal{G}(\lambda):=[\langle g_{1},u(\lambda)\rangle,\dots,\langle g_{M},u(\lambda)\rangle]^{\top},\quad g_{i}\in V. (2.9)

Then the following theorem holds.

Theorem 2.4 (Well-posedness of inverse UQ).

The posterior distribution of \lambda|y is well-defined and takes the form

\frac{d\eta^{y}}{d\mu}(\lambda)=\frac{1}{Z}\exp\Big\{-\frac{1}{2}|\Sigma^{-1/2}(\mathcal{G}(\lambda)-y)|^{2}\Big\}, (2.10)

with Z=\int_{E}\exp\{-\frac{1}{2}|\Sigma^{-1/2}(\mathcal{G}(\lambda)-y)|^{2}\}\mu(d\lambda).

Proof.

The form of the posterior is obtained by a change of variables to \{\lambda,\xi=y-\mathcal{G}(\lambda)\}, which are independent by assumption. Note the change of variables has Jacobian 1. So, changing variables back, this also gives the joint density. The posterior is obtained by normalizing for the observed value of y. The form of the observation operator guarantees Z>\exp\{-|\Sigma^{-1}|(|\mathcal{G}^{*}|^{2}+|y|^{2})\}, where \mathcal{G}^{*}=(\mathcal{G}_{1}^{*},\mathcal{G}_{2}^{*},\dots,\mathcal{G}_{M}^{*}), \mathcal{G}_{i}^{*} is defined as in the proof of Theorem 2.2, and |\Sigma^{-1}| is the matrix norm. ∎

Now define \mathcal{G}_{\ell}(\lambda):=[\langle g_{1},u_{\ell}(\lambda)\rangle,\dots,\langle g_{M},u_{\ell}(\lambda)\rangle]^{\top},\quad g_{i}\in L^{2}(\Omega).

Corollary 2.5 (Well-posedness of inverse UQ for finite problem).

The finite approximation of the posterior distribution of \lambda|y is well-defined and takes the form

\frac{d\eta^{y}_{\ell}}{d\mu}(\lambda)=\frac{1}{Z_{\ell}}\exp\Big\{-\frac{1}{2}|\Sigma^{-1/2}(\mathcal{G}_{\ell}(\lambda)-y)|^{2}\Big\}, (2.11)

with Z_{\ell}=\int_{E}\exp\{-\frac{1}{2}|\Sigma^{-1/2}(\mathcal{G}_{\ell}(\lambda)-y)|^{2}\}\mu(d\lambda).

Remark 2.6.

The forcing may also be taken as uncertain, although the uniformity will require it to be defined on a compact space. The parameter space E will be taken as compact for simplicity, as it is then easy to verify Assumption 2.1.

3 Approximation of expectations

The objective here is to approximate expectations of some functional \varphi:E\rightarrow\mathbb{R} with respect to a probability measure \eta (\mu or \eta^{y}), denoted \mathbb{E}_{\eta}(\varphi). The solution u of (2.1)-(2.2) above must be approximated by some u_{\ell}, with a degree of accuracy which depends on \ell. Indeed there exists a hierarchy of levels \ell=0,\dots,L (where L may be arbitrarily large) of increasing accuracy and increasing cost. For the inverse problem this manifests in a hierarchy of target probability measures \{\eta_{\ell}\}_{\ell=0}^{L} via the approximation of (2.10). For the forward problem \eta_{\ell}=\eta for all \ell, but it will be assumed that the function \varphi requires evaluation of the solution of (2.1)-(2.2), as in Theorem 2.2, and the corresponding approximations will be denoted by \{\varphi_{\ell}\}_{\ell=0}^{L}. One may then compute the estimator \hat{Y}_{L}^{N}=\frac{1}{N}\sum_{n=1}^{N}\varphi_{L}(\lambda_{L}^{n}), \lambda_{L}^{n}\sim\eta_{L}, where for the forward problem it may be that \eta_{L}\equiv\eta for all L. The mean square error (MSE) is given by

\mathbb{E}\left|\mathbb{E}_{\eta}[\varphi(\lambda)]-\hat{Y}_{L}^{N}\right|^{2}=\underbrace{\mathbb{E}\left\{\mathbb{E}_{\eta_{L}}[\varphi_{L}(\lambda)]-\hat{Y}_{L}^{N}\right\}^{2}}_{\rm variance}+\underbrace{\{\mathbb{E}_{\eta_{L}}[\varphi_{L}(\lambda)]-\mathbb{E}_{\eta}[\varphi(\lambda)]\}^{2}}_{\rm bias}\ . (3.1)

Now assume there is some discretization level, say of diameter h_{L}, which gives rise to an error estimate on the output of size \mathcal{O}(h_{L}^{\alpha}), for example arising from the deterministic numerical approximation of a spatio-temporal problem. This also translates to a number of degrees of freedom proportional to h_{L}^{-d}, where d is the spatio-temporal dimension. Now the complexity of typical forward solves will range from a dot product \mathcal{O}(h_{L}^{-d}) (linear) to a full Gaussian elimination \mathcal{O}(h_{L}^{-3d}) (cubic). In this problem one aims to find h_{L}^{\alpha}=\mathcal{O}(\varepsilon), so one would find the cost controlled by \mathcal{O}(\varepsilon^{-\zeta/\alpha}), for \zeta\in(d,3d). If one can obtain independent, identically distributed samples u^{n}_{L}, then the necessary number of samples to obtain a variance of size \mathcal{O}(\varepsilon^{2}) is N=\mathcal{O}(\varepsilon^{-2}). The total cost to obtain a mean-square error tolerance of \mathcal{O}(\varepsilon^{2}) is therefore \mathcal{O}(\varepsilon^{-2-\zeta/\alpha}).

3.1 Multilevel Monte Carlo

For the forward UQ problem, in which one can sample directly from \eta_{\ell}, one very popular methodology for improving the efficiency of solution to such problems is the multilevel Monte Carlo (MLMC) method [14, 11]. Indeed there has been an explosion of recent activity [12] since its introduction in [11]. In this methodology the simple estimator \hat{Y}_{L}^{N} above for a given desired L is replaced by a telescopic sum of unbiased increment estimators Y^{N_{\ell}}_{\ell}=N_{\ell}^{-1}\sum_{i=1}^{N_{\ell}}\{\varphi_{\ell}(\lambda_{\ell}^{(i)})-\varphi_{\ell-1}(\lambda_{\ell-1}^{(i)})\}, where \{\lambda_{\ell-1}^{(i)},\lambda_{\ell}^{(i)}\} are i.i.d. samples, with marginal laws \eta_{\ell-1}, \eta_{\ell}, respectively, carefully constructed on a joint probability space. This is repeated independently for 0\leq\ell\leq L. The overall multilevel estimator will be

\hat{Y}_{L,{\rm Multi}}=\sum_{\ell=0}^{L}Y^{N_{\ell}}_{\ell}\,, (3.2)

under the convention that \varphi_{-1}(\lambda_{-1}^{(i)})\equiv 0. A simple error analysis shows that the MSE in this case is given by

\mathbb{E}\{\hat{Y}_{L,{\rm Multi}}-\mathbb{E}_{\eta}[\varphi(\lambda)]\}^{2}=\underbrace{\sum_{\ell=0}^{L}\mathbb{E}\left\{Y^{N_{\ell}}_{\ell}-[\mathbb{E}_{\eta_{\ell}}\varphi_{\ell}(\lambda)-\mathbb{E}_{\eta_{\ell-1}}\varphi_{\ell-1}(\lambda)]\right\}^{2}}_{\rm variance}+\underbrace{\{\mathbb{E}_{\eta_{L}}[\varphi_{L}(\lambda)]-\mathbb{E}_{\eta}[\varphi(\lambda)]\}^{2}}_{\rm bias}\ . (3.3)

Notice that the bias is given by the finest level, whilst the variance is decomposed into a sum of variances of the increments. The variance of the \ell-th increment estimator has the form \mathcal{V}_{\ell}N_{\ell}^{-1}, where the terms \mathcal{V}_{\ell}=\mathbb{E}|\varphi_{\ell}(\lambda_{\ell})-\varphi_{\ell-1}(\lambda_{\ell-1})|^{2} decay, following from refinement of the approximation of \eta and/or \varphi. One can therefore balance the variance at a given level \mathcal{V}_{\ell} with the number of samples N_{\ell}. As the level increases, the corresponding cost increases, but the variance decreases, allowing fewer samples to achieve a given variance. This can be optimized, and results in a total cost of \mathcal{O}(\varepsilon^{-\max\{2,\zeta/\alpha\}}) in the optimal case.

To be explicit, denote by Q_{\ell}:=Q(u_{\ell}(\lambda_{\ell}),\lambda_{\ell}) the level-\ell approximation of the quantity of interest Q. Introduce the following assumption:

  • (A1)

    There exist \alpha,\beta,\zeta>0 and C>0 such that

    \begin{cases}|\mathbb{E}(Q_{L}-Q_{\infty})|\leq Ch_{L}^{\alpha};\\ \mathbb{E}|Q_{\ell}-Q_{\ell-1}|^{2}\leq Ch_{\ell}^{\beta};\\ {\rm C}(Q_{\ell})\leq Ch_{\ell}^{-\zeta},\end{cases} (3.4)

    where {\rm C}(Q_{\ell}) denotes the cost to evaluate Q_{\ell}.

We have the following classical MLMC Theorem [12].

Theorem 3.1.

Assume (A1) and \max\{\beta,\zeta\}\leq 2\alpha. Then for any \varepsilon>0, there exist L, \{N_{\ell}\}_{\ell=0}^{L} and C>0 such that

\mathbb{E}\Big[\Big(\hat{Y}_{L,{\rm Multi}}-\mathbb{E}_{\eta}[Q]\Big)^{2}\Big]\leq C\varepsilon^{2}, (3.5)

for the following cost

{\rm COST}\leq C\begin{cases}\varepsilon^{-2},&\text{if}\quad\beta>\zeta,\\ \varepsilon^{-2}|\log(\varepsilon)|^{2},&\text{if}\quad\beta=\zeta,\\ \varepsilon^{-\left(2+\frac{\zeta-\beta}{\alpha}\right)},&\text{if}\quad\beta<\zeta.\end{cases} (3.6)
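A minimal sketch of an MLMC driver consistent with Theorem 3.1 follows. The function coupled_increment(l) is a placeholder which must return one draw of \varphi_{\ell}(\lambda_{\ell})-\varphi_{\ell-1}(\lambda_{\ell-1}) under a common random input (and one draw of \varphi_{0}(\lambda_{0}) at \ell=0); the sample allocation N_{\ell}\propto\sqrt{\mathcal{V}_{\ell}/C_{\ell}} is taken from the proof of Theorem 3.3 below, under assumed rates \alpha,\beta,\zeta.

```python
import numpy as np

def mlmc(coupled_increment, eps, alpha=2.0, beta=4.0, zeta=1.0, h0=2**-3, M=2):
    # Choose L so that the bias C h_L^alpha is O(eps).
    L = max(1, int(np.ceil(np.log(1.0 / eps) / (alpha * np.log(M)))))
    h = h0 * M ** -np.arange(L + 1.0)
    # Allocation N_l ~ eps^-2 K_L h_l^{(beta+zeta)/2}, K_L = sum_l h_l^{(beta-zeta)/2}.
    K_L = np.sum(h ** ((beta - zeta) / 2.0))
    N = np.maximum(2, (eps**-2 * K_L * h ** ((beta + zeta) / 2.0)).astype(int))
    # Telescoping sum (3.2): independent averages of coupled increments.
    return sum(np.mean([coupled_increment(l) for _ in range(N[l])])
               for l in range(L + 1))
```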

3.2 Multilevel sequential Monte Carlo sampler

Now the inverse problem will be considered. There is a sequence of probability measures \{\eta_{\ell}\}_{\ell\geq 0} on a common measurable space (E,\mathcal{E}), and for each \ell there is a measure \gamma_{\ell}:\mathcal{E}\rightarrow\mathbb{R}^{+}, such that

\eta_{\ell}(d\lambda)=\frac{\gamma_{\ell}(d\lambda)}{Z_{\ell}} (3.7)

where the normalizing constant Z_{\ell}=\int_{E}\gamma_{\ell}(d\lambda) is unknown. The objective is to compute:

\mathbb{E}_{\eta_{\infty}}[\varphi(\lambda)]:=\int_{E}\varphi(\lambda)\eta_{\infty}(d\lambda)

for potentially many measurable \eta_{\infty}-integrable functions \varphi:E\rightarrow\mathbb{R}.

3.2.1 Notations

Let (E,\mathcal{E}) be a measurable space. The notation \mathcal{B}_{b}(E) denotes the class of bounded and measurable real-valued functions, and the supremum norm is written as \|f\|_{\infty}=\sup_{\lambda\in E}|f(\lambda)|. Consider non-negative operators K:E\times\mathcal{E}\rightarrow\mathbb{R}_{+} such that for each \lambda\in E the mapping A\mapsto K(\lambda,A) is a finite non-negative measure on \mathcal{E}, and for each A\in\mathcal{E} the function \lambda\mapsto K(\lambda,A) is measurable; the kernel K is Markovian if K(\lambda,dv) is a probability measure for every \lambda\in E. For a finite measure \mu on (E,\mathcal{E}), and a real-valued, measurable f:E\rightarrow\mathbb{R}, we define the operations:

\mu K:A\mapsto\int K(\lambda,A)\,\mu(d\lambda)\ ;\quad Kf:\lambda\mapsto\int f(v)\,K(\lambda,dv).

We also write \mu(f)=\int f(\lambda)\mu(d\lambda). In addition, \|\cdot\|_{r}, r\geq 1, denotes the L_{r}-norm, where the expectation is w.r.t. the law of the appropriate simulated algorithm.

3.2.2 Algorithm

As described in Section 1, the context of interest is when a sequence of densities \{\eta_{\ell}\}_{\ell\geq 0}, as in (3.7), is associated to an ‘accuracy’ parameter h_{\ell}, with h_{\ell}\rightarrow 0 as \ell\rightarrow\infty, such that \infty>h_{0}>h_{1}>\cdots>h_{\infty}=0. In practice one cannot treat h_{\infty}=0 and so must consider these distributions with h_{\ell}>0. The laws with large h_{\ell} are easy to sample from with low computational cost, but are very different from \eta_{\infty}; those distributions with small h_{\ell} are hard to sample from, with relatively high computational cost, but are closer to \eta_{\infty}. Thus, we choose a maximum level L\geq 1 and we will estimate

\mathbb{E}_{\eta_{L}}[\varphi(\lambda)]:=\int_{E}\varphi(\lambda)\eta_{L}(\lambda)\,d\lambda\ .

By the standard telescoping identity used in MLMC, one has

\mathbb{E}_{\eta_{L}}[\varphi(\lambda)]=\mathbb{E}_{\eta_{0}}[\varphi(\lambda)]+\sum_{\ell=1}^{L}\Big\{\mathbb{E}_{\eta_{\ell}}[\varphi(\lambda)]-\mathbb{E}_{\eta_{\ell-1}}[\varphi(\lambda)]\Big\}=\mathbb{E}_{\eta_{0}}[\varphi(\lambda)]+\sum_{\ell=1}^{L}\mathbb{E}_{\eta_{\ell-1}}\Big[\Big(\frac{\gamma_{\ell}(\lambda)Z_{\ell-1}}{\gamma_{\ell-1}(\lambda)Z_{\ell}}-1\Big)\varphi(\lambda)\Big]\ . (3.8)

Suppose now that one applies an SMC sampler [8] to obtain a collection of samples (particles) that sequentially approximate \eta_{0},\eta_{1},\ldots,\eta_{L}. We consider the case when one initializes the population of particles by sampling i.i.d. from \eta_{0}, then at every step resamples and applies an MCMC kernel to mutate the particles. We denote by (\lambda_{0}^{1:N_{0}},\dots,\lambda_{L-1}^{1:N_{L-1}}), with +\infty>N_{0}\geq N_{1}\geq\cdots\geq N_{L-1}\geq 1, the samples after mutation; one resamples \lambda_{\ell}^{1:N_{\ell}} according to the weights G_{\ell}(\lambda_{\ell}^{i})=(\gamma_{\ell+1}/\gamma_{\ell})(\lambda_{\ell}^{i}), for indices \ell\in\{0,\dots,L-1\}. We will denote by \{M_{\ell}\}_{1\leq\ell\leq L-1} the sequence of MCMC kernels used at stages 1,\dots,L-1, such that \eta_{\ell}M_{\ell}=\eta_{\ell}. For \varphi:E\rightarrow\mathbb{R}, \ell\in\{1,\dots,L\}, we have the following estimator of \mathbb{E}_{\eta_{\ell-1}}[\varphi(\lambda)]:

\eta_{\ell-1}^{N_{\ell-1}}(\varphi)=\frac{1}{N_{\ell-1}}\sum_{i=1}^{N_{\ell-1}}\varphi(\lambda_{\ell-1}^{i})\ .

We define

\eta_{\ell-1}^{N_{\ell-1}}(G_{\ell-1}M_{\ell}(d\lambda_{\ell}))=\frac{1}{N_{\ell-1}}\sum_{i=1}^{N_{\ell-1}}G_{\ell-1}(\lambda_{\ell-1}^{i})M_{\ell}(\lambda_{\ell-1}^{i},d\lambda_{\ell})\ .

The joint probability distribution for the SMC algorithm is

\prod_{i=1}^{N_{0}}\eta_{0}(d\lambda_{0}^{i})\prod_{\ell=1}^{L-1}\prod_{i=1}^{N_{\ell}}\frac{\eta_{\ell-1}^{N_{\ell-1}}(G_{\ell-1}M_{\ell}(d\lambda_{\ell}^{i}))}{\eta_{\ell-1}^{N_{\ell-1}}(G_{\ell-1})}\ .

If one considers one more step in the above procedure, which would deliver samples \{\lambda_{L}^{i}\}_{i=1}^{N_{L}}, a standard SMC sampler estimate of the quantity of interest in (3.8) is \eta_{L}^{N_{L}}(\varphi); the earlier samples are discarded. Within a multilevel context, a consistent SMC estimate of (3.8) is

\widehat{Y}=\eta_{0}^{N_{0}}(\varphi)+\sum_{\ell=1}^{L}\Big\{\frac{\eta_{\ell-1}^{N_{\ell-1}}(\varphi G_{\ell-1})}{\eta_{\ell-1}^{N_{\ell-1}}(G_{\ell-1})}-\eta_{\ell-1}^{N_{\ell-1}}(\varphi)\Big\}\ , (3.9)

and this will be proven to be superior to the standard estimator, under assumptions.
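A minimal sketch of the estimator (3.9) is given below, assuming user-supplied placeholders: sample_eta0() returns one i.i.d. draw from \eta_{0}, G(l, x) evaluates \gamma_{l+1}/\gamma_{l} at x, and mcmc(l, x) applies one \eta_{l}-invariant MCMC step; none of these names come from the paper.

```python
import numpy as np

def mlsmc(sample_eta0, G, mcmc, phi, N):
    # N = [N_0, ..., N_{L-1}], non-increasing particle numbers; L = len(N).
    L = len(N)
    x = [sample_eta0() for _ in range(N[0])]       # i.i.d. from eta_0
    est = np.mean([phi(xi) for xi in x])           # eta_0^{N_0}(phi)
    for l in range(1, L + 1):
        g = np.array([G(l - 1, xi) for xi in x])   # incremental weights G_{l-1}
        ph = np.array([phi(xi) for xi in x])
        # increment in (3.9): self-normalized mean minus unweighted mean
        est += np.sum(ph * g) / np.sum(g) - np.mean(ph)
        if l == L:
            break
        # multinomial resampling, then one eta_l-invariant mutation step
        anc = np.random.choice(len(x), size=N[l], p=g / np.sum(g))
        x = [mcmc(l, x[a]) for a in anc]
    return est
```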

The relevant MSE decomposition here is:

\mathbb{E}\big[\{\widehat{Y}-\mathbb{E}_{\eta_{\infty}}[\varphi(\lambda)]\}^{2}\big]\leq 2\,\mathbb{E}\big[\{\widehat{Y}-\mathbb{E}_{\eta_{L}}[\varphi(\lambda)]\}^{2}\big]+2\,\{\mathbb{E}_{\eta_{L}}[\varphi(\lambda)]-\mathbb{E}_{\eta_{\infty}}[\varphi(\lambda)]\}^{2}\ . (3.10)

3.2.3 Multilevel SMC

We will now restate an analytical result from [3] that controls the error term \mathbb{E}[\{\widehat{Y}-\mathbb{E}_{\eta_{L}}[\varphi(\lambda)]\}^{2}] in expression (3.10). For any \ell\in\{0,\dots,L\} and \varphi\in\mathcal{B}_{b}(E) we write \eta_{\ell}(\varphi):=\int_{E}\varphi(\lambda)\eta_{\ell}(\lambda)d\lambda. The following standard assumptions will be made; see [3, 7].

  • (A2)

    There exist 0<\underline{C}<\overline{C}<+\infty such that

    \sup_{\ell\geq 1}\sup_{\lambda\in E}G_{\ell}(\lambda)\leq\overline{C}\ ;
    \inf_{\ell\geq 1}\inf_{\lambda\in E}G_{\ell}(\lambda)\geq\underline{C}\ .
  • (A3)

    There exists a \rho\in(0,1) such that for any \ell\geq 1, (\lambda,v)\in E^{2}, A\in\mathcal{E}:

    \int_{A}M_{\ell}(\lambda,d\lambda^{\prime})\geq\rho\int_{A}M_{\ell}(v,d\lambda^{\prime})\ .

Under these assumptions the following Theorem is proven in [3].

Theorem 3.2.

Assume (A2)-(A3). There exists a C<+\infty such that for any \varphi\in\mathcal{B}_{b}(E) with \|\varphi\|_{\infty}=1,

\mathbb{E}\big[\{\widehat{Y}-\mathbb{E}_{\eta_{L}}[\varphi(\lambda)]\}^{2}\big]\leq C\,\bigg(\frac{1}{N_{0}}+\sum_{\ell=1}^{L}\bigg(\frac{\mathcal{V}_{\ell}}{N_{\ell-1}}+\frac{\mathcal{V}_{\ell}^{1/2}}{N_{\ell-1}^{1/2}}\sum_{q=\ell+1}^{L}\frac{\mathcal{V}_{q}^{1/2}}{N_{q-1}}\bigg)\bigg)\ ,

where \mathcal{V}_{\ell}:=\big\|\tfrac{Z_{\ell-1}}{Z_{\ell}}G_{\ell-1}-1\big\|_{\infty}^{2}.

The following additional assumption will now be made:

  • (A4)

    There exist \alpha,\beta,\zeta>0 and C>0 such that

    \begin{cases}\mathcal{V}_{\ell}\leq Ch_{\ell}^{\beta};\\ |(\eta_{L}-\eta_{\infty})(\varphi)|\leq Ch_{L}^{\alpha};\\ {\rm C}(G_{\ell})\leq Ch_{\ell}^{-\zeta},\end{cases} (3.11)

    where {\rm C}(G_{\ell}) denotes the cost to evaluate G_{\ell}.

The following Theorem may now be proven.

Theorem 3.3.

Assume (A2)-(A4) and \max\{\beta,\zeta\}\leq 2\alpha. Then for any \varepsilon>0, there exist L, \{N_{\ell}\}_{\ell=0}^{L} and C>0 such that

\mathbb{E}\Big[\Big(\widehat{Y}-\mathbb{E}_{\eta_{\infty}}[\varphi(\lambda)]\Big)^{2}\Big]\leq C\varepsilon^{2}, (3.12)

for the following cost

{\rm COST}\leq C\begin{cases}\varepsilon^{-2},&\text{if}\quad\beta>\zeta,\\ \varepsilon^{-2}|\log(\varepsilon)|^{2},&\text{if}\quad\beta=\zeta,\\ \varepsilon^{-\left(2+\frac{\zeta-\beta}{\alpha}\right)},&\text{if}\quad\beta<\zeta.\end{cases} (3.13)
Proof.

The MSE can be bounded using (3.10). Following from (A4)(ii), the second term requires that h_{L}^{\alpha}\eqsim\varepsilon, and assuming h_{L}=M^{-L} for some M\geq 2, this translates to L\eqsim|\log\varepsilon|. As in [3], the additional error term is dealt with by first ignoring it and optimizing {\rm COST}({\bf N}) for a given \mathcal{V}({\bf N})=\varepsilon^{2}, where {\bf N}:=(N_{1},\dots,N_{L}). This requires that N_{\ell}\propto\sqrt{\mathcal{V}_{\ell}/C_{\ell}}\eqsim h_{\ell}^{(\beta+\zeta)/2}. The constraint then requires that N_{\ell}=\varepsilon^{-2}K_{L}h_{\ell}^{(\beta+\zeta)/2}, where K_{L}=\sum_{\ell=1}^{L-1}h_{\ell}^{(\beta-\zeta)/2}, so one has

{\rm COST}({\bf N})=\sum_{\ell=0}^{L}N_{\ell}C_{\ell}=\varepsilon^{-2}K_{L}^{2}.

It was shown in [3] that the result follows, provided \zeta<2\alpha. In fact, re-examining the additional term as in Section 3.3 of [3], one has

\begin{split}\sum_{\ell=1}^{L}\tfrac{\mathcal{V}_{\ell}^{1/2}}{N_{\ell-1}^{1/2}}\sum_{q=\ell+1}^{L}\tfrac{\mathcal{V}_{q}^{1/2}}{N_{q-1}}&=\mathcal{O}\Big(\varepsilon^{2}\varepsilon^{1-\zeta/2\alpha}\sum_{\ell=1}^{L-1}h_{\ell}^{(\beta-\zeta)/4}K_{L}^{-3/2}\Big)\\ &=\mathcal{O}(\varepsilon^{2}\varepsilon^{1-\zeta/2\alpha}L^{1/2}K_{L}^{-1}),\end{split}

where the second line follows from the inequality (\sum_{\ell=0}^{L}a_{\ell})^{2}\leq L\sum_{\ell=0}^{L}a_{\ell}^{2} and the definition of K_{L}. Now substituting K_{L}=\mathcal{O}(1),\mathcal{O}(L),\mathcal{O}(\varepsilon^{-(\zeta-\beta)/2\alpha}) for the three cases, and recalling the assumption \max\{\beta,\zeta\}\leq 2\alpha, gives \mathcal{V}({\bf N})=\varepsilon^{2} for the costs in (3.13). ∎

3.2.4 Verification of assumptions

Assume a uniform prior \mu. Following from (2.8), the unnormalized measures will be given by

\gamma_{\ell}=\exp\{-\Phi(\mathcal{G}_{\ell}(\lambda))\}, (3.14)

where \Phi(\mathcal{G}_{\ell}(\lambda))=\frac{1}{2}|\Sigma^{-1/2}(\mathcal{G}_{\ell}(\lambda)-y)|^{2}, and

\mathcal{G}_{\ell}(\lambda):=[\langle g_{1},u_{\ell}(\lambda)\rangle,\dots,\langle g_{M},u_{\ell}(\lambda)\rangle]^{\top},\quad g_{i}\in L^{2}(\Omega),

and u_{\ell} is the solution of the numerical approximation of (2.1)-(2.2). Notice that these are uniformly bounded, over both \lambda\in E and over \ell, following from Corollary 2.5 for finite \ell, and Theorem 2.4 in the limit.

It is shown in [3] Section 4 that

|\Phi(\mathcal{G}_{\ell}(\lambda))-\Phi(\mathcal{G}_{\ell-1}(\lambda))|\leq C(|\mathcal{G}_{\ell}|,|\mathcal{G}_{\ell-1}|)|\mathcal{G}_{\ell}(\lambda)-\mathcal{G}_{\ell-1}(\lambda)|. (3.15)

One proceeds similarly to that paper, and finds that

|\mathcal{G}_{\ell}(\lambda)-\mathcal{G}_{\ell-1}(\lambda)|\leq C\|u_{\ell}(\lambda)-u_{\ell-1}(\lambda)\|_{V}. (3.16)

Note that G_{\ell}=\gamma_{\ell+1}/\gamma_{\ell}, so inserting the bound (3.15) into (3.14), and observing the boundedness of the \{\mathcal{G}_{\ell}\} noted above, one can see that Assumption (A2) holds. Now inserting (3.15) and (3.16) into (3.14), one can see that in order to establish Assumption (A4), it suffices to establish rates of convergence for \|u_{\ell}(\lambda)-u_{\ell-1}(\lambda)\|_{V}. In particular, Assumption (A2) and the C2 inequality (Minkowski plus Young’s) provide that

\begin{split}\mathcal{V}_{\ell}&=\Big\|\tfrac{Z_{\ell-1}}{Z_{\ell}}G_{\ell-1}-1\Big\|_{\infty}^{2}=\Big\|\frac{G_{\ell-1}}{\eta_{\ell-1}(G_{\ell-1})}-1\Big\|_{\infty}^{2}\\ &\leq C(\|G_{\ell-1}-1\|_{\infty}^{2}+\|\eta_{\ell-1}(G_{\ell-1})-1\|_{\infty}^{2})\leq 2C\|G_{\ell-1}-1\|_{\infty}^{2}\\ &\leq C^{\prime}\|u_{\ell}(\lambda)-u_{\ell-1}(\lambda)\|_{V}^{2}.\end{split}

Therefore, the rate of convergence of \|u_{\ell}(\lambda)-u_{\ell-1}(\lambda)\|_{V} is the required quantity for both the forward (for Lipschitz Q) and the inverse multilevel estimation problems. By the triangle inequality it suffices to consider the approximation of the truth, \|u_{\ell}(\lambda)-u(\lambda)\|_{V}, and by Céa’s lemma ([5], Thm. 2.4.1), Assumption 2.1 guarantees the existence of a C>0 such that

\|u_{\ell}(\lambda)-u(\lambda)\|_{V}\leq C\inf_{v_{\ell}\in V_{\ell}}\|v_{\ell}(\lambda)-u(\lambda)\|_{V}. (3.17)

Theorem 6.2 of [10] provides rates of convergence of the best approximation above, in the case that V=H^{s} and the FEM triangulation is shape regular and quasi-uniform. In particular, their Case 1 corresponds to the case of a singular kernel, i.e. not even integrable, and \mathcal{L}:H^{s}_{c}\rightarrow H^{-s}, for s\in(0,1), so the solution operator is smoothing. Their Case 2 corresponds to a slightly more regular kernel than ours, where in fact B(x,\cdot)\in L^{2}, rather than L^{1} as given by Assumption 2.1, and consequently \mathcal{L}:L^{2}_{c}\rightarrow L^{2}. It is shown that in Case 2, if the solution u\in H^{m+t}, for polynomial elements of order m and some t\in[0,1], then convergence with respect to the L^{2} norm is in fact \mathcal{O}(h^{m+t}). So, for linear elements, second-order convergence can be obtained, leading to \beta=4 in (3.11).

Recall that Assumption 2.1 ensures well-posedness of the solution map L^{2}\ni b\mapsto u\in L^{2}, and so discontinuities are allowed. This is actually a strength of nonlocal models and one impetus for their use. It is reasonable to expect that if the nodes of the discontinuous elements match up with the discontinuities, i.e. there is a node at each point of discontinuity for d=1 or there are nodes all along a curve/surface of discontinuity for d>1, and if the solution is sufficiently smooth in the subdomains, then the rates of convergence should match those of the smooth subproblems. For example, if there is a point discontinuity such that the domain can be separated at that point into \Omega_{1}\cup\Omega_{2}=\Omega and one has u|_{\Omega_{1}}\in H^{m+t}(\Omega_{1}) and u|_{\Omega_{2}}\in H^{m+t}(\Omega_{2}), where u|_{\Omega_{i}} is the restriction of u to the set \Omega_{i}, then one expects the convergence rate of m+t to be preserved. This is postulated and verified numerically in [4]. It is also illustrated numerically in that work that, even with discontinuous elements, if there is no node at the discontinuity then the convergence rate reverts to 1/2, so \beta=1 in (3.11).

4 Numerical examples

The particular nonlocal model which will be of interest here is that in which the kernel is given by

B_{\lambda}(x,x^{\prime})=f(x,x^{\prime},\theta)\frac{1}{\delta^{2}|x-x^{\prime}|^{\alpha}}{\mathbf{1}}_{\{|x-x^{\prime}|<\delta\}}, (4.1)

where \lambda:=(\theta,\alpha,\delta), {\mathbf{1}}_{A} is the characteristic function which takes the value 1 if (x,x^{\prime})\in A and zero otherwise, and f(x,x^{\prime})=f(x^{\prime},x). Notice that \alpha\in(0,d) is scalar, but (\theta,\delta) may be defined on either a finite-dimensional subspace of function-space, or in principle in the full infinite-dimensional space. This may be of interest to incorporate spatial dependence of material properties.

Following [6] we consider the following model. Let \Omega=[0,1] and \Gamma=[-\delta,0)\cup(1,1+\delta]. For (x,x^{\prime})\in\Omega^{2}, let z=(x+x^{\prime})/2. The kernel is defined by

f(x,x^{\prime},\theta)=\begin{cases}0.2+(z-0.625)^{2}&\text{if }z\in[0,0.625)\\ z+\theta&\text{if }z\in[0.625,0.75)\\ 14.4+(z-0.75)+2&\text{if }z\in[0.75,1)\end{cases}

b(x)=5.
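For concreteness, a direct transcription of this kernel follows; the third branch of f is kept verbatim from the text above, and the function names are illustrative only.

```python
def f_theta(x, xp, theta):
    # Piecewise definition above, with z = (x + x')/2; symmetric in (x, x').
    z = 0.5 * (x + xp)
    if z < 0.625:
        return 0.2 + (z - 0.625) ** 2
    elif z < 0.75:
        return z + theta
    else:
        return 14.4 + (z - 0.75) + 2   # branch kept verbatim from the text

def B_lambda(x, xp, theta, alpha, delta):
    # Kernel (4.1): truncated, with integrable singularity of order alpha.
    r = abs(x - xp)
    if r == 0.0 or r >= delta:
        return 0.0
    return f_theta(x, xp, theta) / (delta**2 * r**alpha)
```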

The following prior is used for the parameters,

\theta\sim\mathrm{Uniform}(1,2)
\alpha\sim\mathrm{Beta}(2,2)
\delta\sim\mathrm{Gamma}(1,1)\text{ truncated on }(0.125,1)

This is an extension of an example used in [6], which is identical to this model with \theta=1.25 and \alpha=1 (various values of \delta are used there). To solve the equation with FEM, a uniform partition of \Omega is used with h_{\ell}=2^{-(k+\ell)}, k=3. Thus for each level \ell, h_{\ell}<\delta. The same discontinuous Galerkin FEM as in [6] is applied to solve the system.
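Drawing from the prior above is straightforward; a sketch follows, in which the truncated Gamma(1,1) is sampled by inverting its CDF restricted to (0.125, 1) (the helper name sample_prior is ours).

```python
import numpy as np
from scipy import stats

def sample_prior(rng=np.random.default_rng()):
    theta = rng.uniform(1.0, 2.0)                    # Uniform(1,2)
    alpha = rng.beta(2.0, 2.0)                       # Beta(2,2)
    lo, hi = stats.gamma.cdf([0.125, 1.0], a=1.0)    # Gamma(1,1) CDF at the bounds
    delta = stats.gamma.ppf(lo + rng.uniform() * (hi - lo), a=1.0)
    return theta, alpha, delta
```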

Similar methods to those in [3] are used to estimate the convergence rates for both the forward and inverse problems. The estimated rates are \hat{\alpha}=2.005 and \hat{\beta}=4.271. The rates \alpha=2 and \beta=4 are used for the simulations.
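A sketch of such a rate fit follows, assuming increments[l] holds i.i.d. samples of Q_{\ell}-Q_{\ell-1} for \ell=1,\dots,L (entry 0 unused) and h[l] the corresponding mesh sizes: the slopes of log-log regressions of the increment means and second moments against h_{\ell} estimate \alpha and \beta of (A1)/(A4).

```python
import numpy as np

def fit_rates(increments, h):
    # |E(Q_l - Q_{l-1})| ~ h_l^alpha and E|Q_l - Q_{l-1}|^2 ~ h_l^beta.
    bias = [abs(np.mean(increments[l])) for l in range(1, len(h))]
    var = [np.mean(np.asarray(increments[l]) ** 2) for l in range(1, len(h))]
    alpha_hat = np.polyfit(np.log(h[1:]), np.log(bias), 1)[0]
    beta_hat = np.polyfit(np.log(h[1:]), np.log(var), 1)[0]
    return alpha_hat, beta_hat
```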

For the inverse problem, data is generated with \theta=2, \alpha=0.5 and \delta=0.2. The observations are y|u\sim N(m(u),\sigma^{2}), where \sigma^{2}=0.01 and m(u)=[u(a),u(b)]^{\top}. The quantity of interest for both the forward and inverse problems is u(0.5).
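In this setup the negative log-likelihood (3.14) reduces to a scaled squared misfit; a sketch follows, where forward(lam, l) is a placeholder returning the level-\ell observation vector [u_{\ell}(a), u_{\ell}(b)]. Note that the MLSMC weight is then G_{\ell}(\lambda)=\exp\{\Phi(\mathcal{G}_{\ell}(\lambda))-\Phi(\mathcal{G}_{\ell+1}(\lambda))\}.

```python
import numpy as np

def Phi(lam, y, forward, l, sigma2=0.01):
    # Phi(G_l(lambda)) = 0.5 |Sigma^{-1/2}(G_l(lambda) - y)|^2, Sigma = sigma2 * I.
    G = np.asarray(forward(lam, l))
    return 0.5 * np.sum((G - np.asarray(y)) ** 2) / sigma2
```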

The forward problem is solved in the standard multilevel fashion by generating coupled particles from the prior at each level. The main result, the cost vs. MSE of the estimates, is shown in Figure 1. The Bayesian inverse problem is also solved for this model, and the main result is shown in Figure 2.

Figure 1: Cost vs. MSE for MLMC
Figure 2: Cost vs. MSE for MLSMC

5 Summary

This is the first systematic treatment of UQ for nonlocal models, to the knowledge of the authors. Natural extensions include obtaining rigorous convergence results for piecewise smooth solutions, exploring higher-dimensional examples, spatial parameters, time-dependent models, and parameters defined on non-compact spaces.

Acknowledgements. KJHL was supported by the DARPA FORMULATE project. AJ & YZ were supported by Ministry of Education AcRF tier 2 grant, R-155-000-161-112. We express our gratitude to Marta D’Elia, Pablo Seleson, and Max Gunzburger for useful discussions.

References

  • [1] Andreu, F., Mazón, J. M., Rossi, J. D. & Toledo, J. (2009). A nonlocal p-Laplacian evolution equation with nonhomogeneous Dirichlet boundary conditions. SIAM Journal on Mathematical Analysis, 40, 1815–1851.
  • [2] Bakunin, O. G. (2008). Turbulence and diffusion: scaling versus equations. Springer Science & Business Media.
  • [3] Beskos, A., Jasra, A., Law, K. J. H, Tempone, R. & Zhou, Y. (2015). Multilevel sequential Monte Carlo samplers. arXiv preprint arXiv:1503.07259.
  • [4] Chen, X. & Gunzburger, M. (2011). Continuous and discontinuous finite element methods for a peridynamics model of mechanics. Computer Methods in Applied Mechanics and Engineering, 200, 1237–1250.
  • [5] Ciarlet, P. G. (2002). The finite element method for elliptic problems. SIAM.
  • [6] D’Elia, M. & Gunzburger, M. (2014). Optimal distributed control of nonlocal steady diffusion problems. SIAM Journal on Control and Optimization. 52, 243–273.
  • [7] Del Moral, P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer: New York.
  • [8] Del Moral, P., Doucet, A. & Jasra, A. (2006). Sequential Monte Carlo samplers. J. R. Statist. Soc. B, 68, 411–436.
  • [9] Del Moral, P., Jasra, A., Law, K. J. H, & Zhou, Y. (2016). Multilevel sequential Monte Carlo samplers for normalizing constants. arXiv preprint arXiv:1603.01136.
  • [10] Du, Q., Gunzburger, M., Lehoucq, R. & Zhou, K. (2012). Analysis and approximation of nonlocal diffusion problems with volume constraints. SIAM Review, 54, 667–696.
  • [11] Giles, M. B. (2008). Multilevel Monte Carlo path simulation. Op. Res. 56, 607-617.
  • [12] Giles, M. B (2015). Multilevel Monte Carlo methods. Acta Numerica, 24, 259-328.
  • [13] Gunzburger, M. & Lehoucq, R. B. (2010). A nonlocal vector calculus with application to nonlocal boundary value problems. Multiscale Modeling & Simulation, 8, 1581–1598.
  • [14] Heinrich, S. (2001). Multilevel Monte Carlo methods. In Large-Scale Scientific Computing, Springer Berlin Heidelberg, 58–67.
  • [15] Hoang, V., Schwab, C. & Stuart, A. (2013). Complexity analysis of accelerated MCMC methods for Bayesian inversion. Inverse Prob., 29, 085010.
  • [16] Jasra, A., Kamatani, K., Law, K. J., & Zhou, Y. (2015). Multilevel particle filter. arXiv preprint arXiv:1510.04977.
  • [17] Silling, S. A., Zimmermann, M & Abeyaratne, R. (2003). Deformation of a peridynamic bar. Journal of Elasticity. 73, 173–190.