
Multilevel lattice-based kernel approximation for elliptic PDEs with random coefficients

Alexander D. Gilbert1,  Michael B. Giles2,  Frances Y. Kuo1,  Ian H. Sloan1,  Abirami Srikumar1

1 School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia.
alexander.gilbert@unsw.edu.au,  f.kuo@unsw.edu.au,  i.sloan@unsw.edu.au,  a.srikumar@unsw.edu.au

2 Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK.
mike.giles@maths.ox.ac.uk

(September 1, 2025)
Abstract

This paper introduces a multilevel kernel-based approximation method to efficiently estimate solutions to elliptic partial differential equations (PDEs) with periodic random coefficients. Building upon the work of Kaarnioja, Kazashi, Kuo, Nobile and Sloan (Numer. Math., 2022) on kernel interpolation with quasi-Monte Carlo (QMC) lattice point sets, we leverage multilevel techniques to enhance computational efficiency while maintaining a given level of accuracy. In the function space setting with product-type weight parameters, the single-level approximation can achieve an accuracy of $\varepsilon>0$ at cost $\mathcal{O}(\varepsilon^{-\eta-\nu-\theta})$ for positive constants $\eta,\nu,\theta$ depending on the rates of convergence associated with dimension truncation, kernel approximation, and finite element approximation, respectively. Our multilevel approximation can achieve the same accuracy $\varepsilon$ at a reduced cost $\mathcal{O}(\varepsilon^{-\eta-\max(\nu,\theta)})$. Full regularity theory and error analysis are provided, followed by numerical experiments that validate the efficacy of the proposed multilevel approximation in comparison to the single-level approach.

1 Introduction

Kernel-based interpolation using a quasi-Monte Carlo (QMC) lattice design was first introduced in [48], where the authors analysed splines constructed from reproducing kernel functions on a lattice point set for periodic function approximation. They found that the special structure of a lattice point set, coupled with periodic kernel functions, leads to linear systems with a circulant matrix that can be solved efficiently via the fast Fourier transform (FFT). The recent paper [23] applies kernel interpolation to approximate the solutions of partial differential equations (PDEs) with periodic random coefficients over the parametric domain. In this paper, we seek to enhance the kernel-based approximation method for PDEs over the parametric domain by leveraging multilevel methods [17].

We are interested in the following parametric elliptic PDE

\[-\nabla\cdot(\Psi(\boldsymbol{x},\boldsymbol{y})\nabla u(\boldsymbol{x},\boldsymbol{y}))\,=\,f(\boldsymbol{x}),\qquad \boldsymbol{x}\in D,\tag{1.1}\]
\[u(\boldsymbol{x},\boldsymbol{y})\,=\,0,\qquad \boldsymbol{x}\in\partial D,\]

where the physical variable $\boldsymbol{x}$ belongs to a bounded convex domain $D\subset\mathbb{R}^{d}$, for $d=1,2,$ or $3$, and

\[\boldsymbol{y}\in\Omega\coloneqq[0,1]^{\mathbb{N}}\]

is a countable vector of parameters. It is assumed that the input field $\Psi(\cdot,\boldsymbol{y})$ in (1.1) is represented by the periodic model introduced recently in [24], which is periodic in $\boldsymbol{y}$ and given by the series expansion

\[\Psi(\boldsymbol{x},\boldsymbol{y})\,\coloneqq\,\psi_{0}(\boldsymbol{x})+\sum_{j=1}^{\infty}\sin(2\pi y_{j})\,\psi_{j}(\boldsymbol{x}).\tag{1.2}\]

Here $\boldsymbol{y}\coloneqq(y_{1},y_{2},\ldots)$, where each $y_{j}$ is independent and uniformly distributed on $[0,1]$. The functions $\psi_{j}\in L^{\infty}(D)$ are known and deterministic, and are such that $\Psi(\cdot,\boldsymbol{y})\in L^{\infty}(D)$ for all $\boldsymbol{y}\in\Omega$. Additional requirements on the $\psi_{j}$ will be introduced later as necessary.
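To make the periodic model concrete, here is a minimal sketch evaluating a dimension-truncated version of (1.2) in one spatial dimension; the choice $\psi_j(x)=j^{-2}\sin(j\pi x)$ and the constant $\psi_0$ are hypothetical, picked only so that the summability requirements introduced later hold.

```python
import numpy as np

# Minimal sketch: evaluate the truncated periodic coefficient (1.2),
# Psi_s(x, y) = psi_0(x) + sum_{j=1}^s sin(2*pi*y_j) * psi_j(x), in 1-D.
# The basis psi_j(x) = j^{-2} sin(j*pi*x) is hypothetical, chosen so that
# the norms ||psi_j||_inf are summable.
def coefficient(x, y, psi0=lambda x: 2.0):
    val = psi0(x)
    for j in range(1, len(y) + 1):
        val = val + np.sin(2 * np.pi * y[j - 1]) * j ** -2.0 * np.sin(j * np.pi * x)
    return val

x = np.linspace(0.0, 1.0, 5)
y = np.array([0.3, 0.7, 0.1])   # one draw of the first s = 3 parameters
print(coefficient(x, y))
```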

The goal is to efficiently approximate the solution $u(\boldsymbol{x},\boldsymbol{y})$ in both $\boldsymbol{x}\in D$ and $\boldsymbol{y}\in\Omega$ simultaneously. We follow a similar method to [23], where the approximation is based on discretising the spatial domain $D$ using finite elements and applying kernel approximation over the parametric domain $\Omega$ based on a lattice point set. This paper introduces a new multilevel approximation that spreads the work over a hierarchy of finite element meshes and kernel interpolants so that the overall cost is reduced.

The solution to (1.1) with the coefficient given by (1.2) lies in a periodic function space (with respect to the parametric domain) equipped with a reproducing kernel $\mathcal{K}(\cdot,\cdot)$. Given evaluations $u(\cdot,\boldsymbol{t}_{k})$ on a lattice point set $\{\boldsymbol{t}_{k}\}_{k=0}^{N-1}\subset\Omega$, the solution $u=u(\cdot,\boldsymbol{y})$ can be approximated over $\Omega$ using a kernel interpolant given by

\[I_{N}u(\cdot,\boldsymbol{y})\coloneqq\sum_{k=0}^{N-1}a_{k}\,\mathcal{K}(\boldsymbol{t}_{k},\boldsymbol{y})\qquad\text{for }\boldsymbol{y}\in\Omega,\tag{1.3}\]

where $I_{N}$ is the kernel interpolation operator on $\Omega$. The coefficients $a_{k}=a_{k}(\boldsymbol{x})$ for $k=0,\ldots,N-1$ are obtained by solving, via FFT, the linear system associated with interpolation at the lattice points. Combining (1.3) with a finite element discretisation leads to an approximation of $u$ over $D\times\Omega$. This is the method employed by [23], which we will refer to as the single-level kernel interpolant (see also Section 2.6 for details).

To introduce the multilevel kernel approximation, consider now a sequence of kernel interpolants $I_{\ell}\coloneqq I_{N_{\ell}}$, for $\ell=0,1,\ldots$, based on a sequence of embedded lattice point sets of nonincreasing size $N_{0}\geq N_{1}\geq\cdots$, and a sequence of nested finite element approximations $u_{\ell}(\cdot,\boldsymbol{y})$ for $\ell=0,1,\ldots$, where $u_{\ell}$ becomes increasingly accurate (and thus has increasing cost) as $\ell$ increases. Omitting the $\boldsymbol{x}$ and $\boldsymbol{y}$ dependence, the multilevel kernel approximation with maximum level $L\in\mathbb{N}$ is given by

\[I^{\mathrm{ML}}_{L}u\,\coloneqq\,I_{0}u_{0}+\sum_{\ell=1}^{L}I_{\ell}(u_{\ell}-u_{\ell-1}).\tag{1.4}\]

The motivation for such an algorithm is to reduce the overall computational cost compared to the single-level method while achieving the same level of accuracy. The cost savings come from interpolating the difference $u_{\ell}-u_{\ell-1}$, which converges to $0$ as $\ell$ increases, and thus requires an interpolant with fewer points to achieve a comparable accuracy. Indeed, as will be demonstrated later, to achieve an accuracy of $\mathcal{O}(\varepsilon)$ with the single-level approximation in the function space setting with product-type weight parameters, the computational cost is $\mathcal{O}(\varepsilon^{-\eta-\nu-\theta})$ for positive constants $\eta,\nu,\theta$ depending on the rates of convergence for dimension truncation, kernel approximation, and finite element approximation, respectively; whereas for the multilevel approximation, the cost is reduced to $\mathcal{O}(\varepsilon^{-\eta-\max(\nu,\theta)})$. The main contribution of this work is to introduce the multilevel kernel approximation along with a full error and cost analysis. A toy illustration of the mechanics of (1.4) is sketched below.
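In the following Python sketch, mock FE approximations $u_\ell$ with error $\mathcal{O}(h_\ell^2)$ are telescoped as in (1.4), and each difference is interpolated on fewer nodes. The target function, the mock FE error term, and the use of piecewise-linear periodic interpolation in place of the kernel interpolant are all hypothetical simplifications.

```python
import numpy as np

# Toy 1-D sketch of the multilevel approximation (1.4): the difference
# u_l - u_{l-1} shrinks like h_{l-1}^2, so it is interpolated on ever
# fewer nodes N_l. Everything here is a hypothetical stand-in.
u_exact = lambda y: np.exp(np.sin(2 * np.pi * y))

def u_level(y, l, h0=0.5):
    # mock FE approximation with error O(h_l^2), h_l = h0 * 2**(-l)
    return u_exact(y) + (h0 * 2.0 ** -l) ** 2 * np.cos(2 * np.pi * y)

L, N = 4, [64, 32, 16, 8, 4]            # nonincreasing interpolation sizes
y = np.linspace(0.0, 1.0, 257)
approx = np.zeros_like(y)
for l in range(L + 1):
    t = np.arange(N[l]) / N[l]          # interpolation nodes on level l
    diff = u_level(t, l) - (u_level(t, l - 1) if l > 0 else 0.0)
    approx += np.interp(y, t, diff, period=1.0)
print(np.max(np.abs(approx - u_exact(y))))  # combined interpolation + FE error
```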

Parametric PDEs of the form (1.1) can be used to model steady-state flow through porous media and have been thoroughly studied in the uncertainty quantification literature (see e.g., [2, 4, 5, 6, 44]). QMC methods have achieved much success in tackling parametric PDE problems, including evaluating expected values of quantities of interest (see e.g., [13, 14, 19, 33, 34]), as well as density estimation (see [14]). The two most common forms of random coefficient are the uniform model (see e.g., [6, 33, 34]) and the lognormal model (see e.g., [2, 4, 44]); as an alternative, the periodic model (1.2) was introduced in [24] to exploit fast Fourier methods.

Multivariate function approximation is another area where QMC methods have recently been successful. One example is trigonometric approximation in periodic spaces, where lattice rules are used to evaluate the coefficients in a finite (or truncated) Fourier expansion (see e.g., [1, 26, 27, 28, 29, 35, 36]). Fast component-by-component algorithms for constructing good lattice rules for trigonometric approximation in periodic spaces were presented and analysed in [7, 8].

There has also been research on kernel methods for approximation, such as radial basis approximation for interpolating scattered data, signal processing, and meshless methods for solving PDEs (see e.g., [11, 21, 37, 40, 45, 46]). Approximation in $L^2$ using kernel interpolation on a lattice in a periodic setting was first analysed in [48], which also highlighted the advantage that the linear systems involved can be solved efficiently via FFT; see also [47] for the $L^{\infty}$ case. The paper [23] introduced single-level kernel interpolation with lattice points for parametric PDEs and also provided a full analysis of the $L^2$ error on $D\times\Omega$.

Multilevel methods were first introduced in [22] for parametric integration and then developed further in [15, 16] for pricing financial options by computing the expected value of a pay-off depending on a stochastic differential equation, utilising paths simulated via Monte Carlo (MC). These papers showed that the overall computational cost of the multilevel estimator was lower than that of the direct estimate at the same level of error. Multilevel MC methods were extended to multilevel QMC in [18]. Subsequently, multilevel methods with MC and QMC have been successfully used in several papers to compute expectations of quantities of interest from parametric PDEs (see e.g., [2, 5, 13, 34]). The paper [43] employed a similar multilevel strategy to approximate the PDE solution on $D\times\Omega$, where instead of using kernel interpolation on the parameter domain (as in this paper) they used sparse grid stochastic collocation.

To the best of our knowledge, this paper is the first to use a multilevel approach with a QMC method to approximate the solution of a PDE as a function over both the spatial and parametric domains.

The structure of the paper is as follows. Section 2 summarises the problem setting and essential background on dimension truncation, finite element methods and kernel interpolation. Section 3 introduces the multilevel kernel approximation alongside a breakdown of the error and the corresponding multilevel cost analysis. Section 4 includes the regularity analysis required for the error analysis presented in Section 5. Practical details on implementing our multilevel approximation are covered in Section 6. Finally, Section 7 presents results of the numerical experiments. Any technical proofs not detailed in the main text are provided in the appendix.

2 Background

2.1 Notation

Let $\boldsymbol{\nu}=(\nu_{j})_{j\geq1}$ with $\nu_{j}\in\mathbb{N}_{0}$ be a multi-index, and let $|\boldsymbol{\nu}|\coloneqq\sum_{j\geq1}\nu_{j}$ and $\mathrm{supp}(\boldsymbol{\nu})\coloneqq\{j\geq1:\nu_{j}\neq0\}$. Define $\mathcal{F}$ to be the set of multi-indices with finite support, i.e., $\mathcal{F}\coloneqq\{\boldsymbol{\nu}\in\mathbb{N}_{0}^{\infty}\,:\,|\mathrm{supp}(\boldsymbol{\nu})|<\infty\}$. Multi-index notation is used for the parametric derivatives, i.e., the mixed partial derivative of order $\boldsymbol{\nu}\in\mathcal{F}$ with respect to $\boldsymbol{y}$ is denoted by $\partial^{\boldsymbol{\nu}}=\prod_{j\geq1}(\partial/\partial y_{j})^{\nu_{j}}$.

For $\boldsymbol{\nu},\boldsymbol{m}\in\mathcal{F}$, we define $\binom{\boldsymbol{\nu}}{\boldsymbol{m}}\coloneqq\prod_{j\geq1}\binom{\nu_{j}}{m_{j}}$ and interpret $\boldsymbol{m}\leq\boldsymbol{\nu}$ to mean $m_{j}\leq\nu_{j}$ for all $j\geq1$ (i.e., comparison between multi-indices is componentwise). For a given sequence $\boldsymbol{b}=(b_{j})_{j\geq1}$, we define $\boldsymbol{b}^{\boldsymbol{\nu}}\coloneqq\prod_{j\geq1}b_{j}^{\nu_{j}}$.

The Stirling numbers of the second kind are given by

\[S(0,0)\coloneqq1,\qquad S(n,k)\coloneqq\frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^{n}\quad\text{for }0\leq k\leq n.\tag{2.1}\]
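As a quick sanity check, (2.1) can be evaluated directly; the sketch below uses integer division since the sum is always divisible by $k!$.

```python
from math import comb, factorial

# Stirling numbers of the second kind via the explicit sum (2.1)
def stirling2(n, k):
    return sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1)) // factorial(k)

assert stirling2(0, 0) == 1
assert stirling2(4, 2) == 7   # {1,2,3,4} has 7 partitions into 2 blocks
```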

The notation $A\lesssim B$ means that there exists some $c>0$ such that $A\leq cB$, and $A\simeq B$ means that there exist constants $c_{1},c_{2}>0$ such that $A\leq c_{1}B$ and $B\leq c_{2}A$.

For $g\in L^{2}(D)\times L^{2}(\Omega)=L^{2}(D\times\Omega)$, we define the $L^{2}$-norm on $D\times\Omega$ by

\[\|g\|_{L^{2}(D\times\Omega)}=\bigg(\int_{\Omega}\int_{D}|g(\boldsymbol{x},\boldsymbol{y})|^{2}\,\mathrm{d}\boldsymbol{x}\,\mathrm{d}\boldsymbol{y}\bigg)^{1/2}=\bigg(\int_{\Omega}\|g(\cdot,\boldsymbol{y})\|_{L^{2}(D)}^{2}\,\mathrm{d}\boldsymbol{y}\bigg)^{1/2}.\]

Furthermore, note that all functions in this paper are measurable, so by Fubini's Theorem we can swap the order of integration to give $\|\cdot\|_{L^{2}(D\times\Omega)}\equiv\|\cdot\|_{L^{2}(\Omega\times D)}$.

2.2 Parametric variational formulation

Let $V\coloneqq H_{0}^{1}(D)$ denote the usual first-order Sobolev space of functions on $D$ that vanish on the boundary, with associated norm $\|v\|_{V}\coloneqq\|\nabla v\|_{L^{2}(D)}$. Multiplying both sides of (1.1) by a test function $v\in V$ and then integrating with respect to $\boldsymbol{x}$, using the divergence theorem, yields the variational equation: find $u(\cdot,\boldsymbol{y})\in V$ such that

\[\mathcal{A}(\boldsymbol{y};u(\cdot,\boldsymbol{y}),v)=\langle f,v\rangle\quad\text{for all }v\in V,\tag{2.2}\]

where $f\in V'$, with $V'\coloneqq H^{-1}(D)$ denoting the dual space of $V$, and $\mathcal{A}(\boldsymbol{y};\cdot,\cdot):V\times V\to\mathbb{R}$ is the parametric bilinear form defined by

\[\mathcal{A}(\boldsymbol{y};u,v)\coloneqq\int_{D}\Psi(\boldsymbol{x},\boldsymbol{y})\nabla u(\boldsymbol{x})\cdot\nabla v(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\quad\text{for }u,v\in V.\]

The $L^{2}(D)$ inner product $\langle\cdot,\cdot\rangle$ is extended continuously to the duality pairing on $V\times V'$.

The variational problem (2.2) is subject to the same assumptions as in [24]:

(A1) $\psi_{0}\in L^{\infty}(D)$ and $\sum_{j\geq1}\|\psi_{j}\|_{L^{\infty}(D)}<\infty$;

(A2) there are positive constants $\Psi_{\min}$ and $\Psi_{\max}$ such that $0<\Psi_{\min}\leq\Psi(\boldsymbol{x},\boldsymbol{y})\leq\Psi_{\max}<\infty$ for all $\boldsymbol{x}\in D$ and $\boldsymbol{y}\in\Omega$;

(A3) $\sum_{j\geq1}\|\psi_{j}\|^{p}_{L^{\infty}(D)}<\infty$ for some $0<p<1$;

(A4) $\psi_{0}\in W^{1,\infty}(D)$ and $\sum_{j\geq1}\|\psi_{j}\|_{W^{1,\infty}(D)}<\infty$;

(A5) $\|\psi_{1}\|_{L^{\infty}(D)}\geq\|\psi_{2}\|_{L^{\infty}(D)}\geq\cdots$; and

(A6) the physical domain $D\subset\mathbb{R}^{d}$, where $d=1,2$ or $3$, is a convex and bounded polyhedron with plane faces.

Here $W^{1,\infty}(D)$ is the Sobolev space of functions with essentially bounded first-order weak derivatives, equipped with the norm $\|v\|_{W^{1,\infty}(D)}\coloneqq\max\{\|v\|_{L^{\infty}(D)},\|\nabla v\|_{L^{\infty}(D)}\}$. For the new multilevel analysis, we will also require the additional assumptions:

(A7) $\sum_{j\geq1}\|\psi_{j}\|_{W^{1,\infty}(D)}^{q}<\infty$ for some $0<p<q\leq1$; and

(A8) $f\in L^{2}(D)$.

We additionally define

\[b_{j}\coloneqq\frac{\|\psi_{j}\|_{L^{\infty}(D)}}{\Psi_{\min}}\quad\text{and}\quad\overline{b}_{j}\coloneqq\frac{\|\psi_{j}\|_{W^{1,\infty}(D)}}{\Psi_{\min}}.\tag{2.3}\]

From the Lax-Milgram Lemma, (2.2) is uniquely solvable for all $\boldsymbol{y}\in\Omega$, and the solution satisfies the a priori bound

\[\|u(\cdot,\boldsymbol{y})\|_{V}\leq\frac{\|f\|_{V'}}{\Psi_{\min}}.\tag{2.4}\]

In addition, from [24, Theorem 2.3] the parametric derivatives are bounded by

\[\|\partial^{\boldsymbol{\nu}}u(\cdot,\boldsymbol{y})\|_{V}\leq\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}\prod_{i\geq1}S(\nu_{i},m_{i})\quad\text{for }\boldsymbol{\nu}\in\mathcal{F}.\tag{2.5}\]

2.3 Dimension truncation

To approximate $u(\cdot,\boldsymbol{y})$ in $\boldsymbol{y}$, the infinite-dimensional parameter domain $\Omega$ must first be truncated to a finite number of dimensions $s$. This is done by setting $y_{j}=0$ for $j>s$, where we define the truncated parameter $\boldsymbol{y}_{1:s}\coloneqq(y_{1},y_{2},\ldots,y_{s},0,\ldots)$, or equivalently by truncating the coefficient expansion (1.2) to $s$ terms. The dimension-truncated solution is denoted by $u^{s}(\cdot,\boldsymbol{y})\coloneqq u(\cdot,\boldsymbol{y}_{1:s})$, and it is obtained by solving the variational problem (2.2) at $\boldsymbol{y}=\boldsymbol{y}_{1:s}$. With a slight abuse of notation we treat $\boldsymbol{y}_{1:s}$ as a vector in the $s$-dimensional parameter domain $\Omega_{s}\coloneqq[0,1]^{s}$.

The dimension-truncated problem is subject to the same assumptions as the variational problem (2.2) (i.e., Assumptions (A1)–(A6) hold); hence, the a priori bound (2.4) and regularity bound (2.5) also hold here. Additionally, we have from [23, Theorem 4.1] that

\[\|u-u^{s}\|_{L^{2}(D\times\Omega)}\,\lesssim\,\|f\|_{V'}\,s^{-(\frac{1}{p}-\frac{1}{2})},\tag{2.6}\]

where the implied constant is independent of $s$ and $f$.

2.4 Finite element methods

The solution to the PDE (1.1) will be approximated by discretising in space using the finite element (FE) method. We consider piecewise linear FE methods; however, the multilevel method can be applied with more general discretisations. Denote by $V_{h}\subset V$ the space of continuous piecewise linear functions on a shape-regular triangulation of $D$ with mesh width $h>0$, and let $M\coloneqq\dim(V_{h})=\mathcal{O}(h^{-d})$. For $\boldsymbol{y}\in\Omega$, the FE approximation of $u(\cdot,\boldsymbol{y})$ from (2.2) is obtained by finding $u_{h}(\cdot,\boldsymbol{y})\in V_{h}$ such that

\[\mathcal{A}(\boldsymbol{y};u_{h}(\cdot,\boldsymbol{y}),v_{h})\,=\,\langle f,v_{h}\rangle\quad\text{for all }v_{h}\in V_{h}.\tag{2.7}\]

The FE basis functions for the space $V_{h}$ are denoted by $\phi_{h,i}$, $i=1,2,\ldots,M$, and the FE approximation with coefficients $[u_{h,i}(\boldsymbol{y})]_{i=1}^{M}$ can be represented as

\[u_{h}(\boldsymbol{x},\boldsymbol{y})\,=\,\sum_{i=1}^{M}u_{h,i}(\boldsymbol{y})\,\phi_{h,i}(\boldsymbol{x}).\tag{2.8}\]

Since $V_{h}\subset V$, the a priori bound (2.4) and the regularity bound (2.5) also hold for the FE approximation $u_{h}$, as well as for the FE approximation of the dimension-truncated solution, denoted $u_{h}^{s}$.
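To make the spatial discretisation concrete, the following is a minimal sketch of solving (2.7) in one dimension on $D=(0,1)$ with a uniform mesh and the coefficient frozen at a single parameter value; the midpoint quadrature, the lumped load vector, and all inputs are hypothetical simplifications.

```python
import numpy as np

# Minimal sketch: piecewise-linear FE solve of (2.7) on D = (0,1) with the
# coefficient frozen at one parameter y. Element midpoint quadrature gives
# a symmetric tridiagonal stiffness matrix, solved by the Thomas algorithm.
def fe_solve_1d(psi_at, f_at, M):
    h = 1.0 / (M + 1)
    mids = (np.arange(M + 1) + 0.5) * h      # one quadrature point per element
    c = psi_at(mids) / h                      # elementwise Psi / h
    diag = c[:-1] + c[1:]                     # A_ii
    off = -c[1:-1]                            # A_{i,i+1} = A_{i+1,i}
    nodes = np.arange(1, M + 1) * h
    rhs = h * f_at(nodes)                     # lumped load vector
    b, r = diag.copy(), rhs.copy()            # Thomas algorithm: forward sweep
    for i in range(1, M):
        w = off[i - 1] / b[i - 1]
        b[i] -= w * off[i - 1]
        r[i] -= w * r[i - 1]
    u = np.zeros(M)                           # back substitution
    u[-1] = r[-1] / b[-1]
    for i in range(M - 2, -1, -1):
        u[i] = (r[i] - off[i] * u[i + 1]) / b[i]
    return nodes, u

nodes, u = fe_solve_1d(lambda x: 1.0 + 0.3 * np.sin(2 * np.pi * x),
                       lambda x: np.ones_like(x), M=127)
```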

From [23, Theorem 4.3], we have that under Assumptions (A1), (A2), (A4), (A8), and (A6), the error of the FE approximation satisfies

\[\|u(\cdot,\boldsymbol{y})-u_{h}(\cdot,\boldsymbol{y})\|_{L^{2}(D)}\,\lesssim\,h^{2}\,\|f\|_{L^{2}(D)}\qquad\text{as }h\to0,\tag{2.9}\]

where the implied constant is independent of $h$, $f$ and $\boldsymbol{y}$. By Galerkin orthogonality, the FE error $u(\cdot,\boldsymbol{y})-u_{h}(\cdot,\boldsymbol{y})$ is orthogonal to $V_{h}$, i.e.,

\[\mathcal{A}(\boldsymbol{y};\,u(\cdot,\boldsymbol{y})-u_{h}(\cdot,\boldsymbol{y}),\,v_{h})=0\quad\text{for all }v_{h}\in V_{h}.\tag{2.10}\]

We also define $\mathsf{I}:V\to V$ to be the identity operator and $\mathsf{P}^{h}_{\boldsymbol{y}}:V\to V_{h}$ to be the parametric FE projection operator onto $V_{h}$, which is defined for $w\in V$ by

\[\mathcal{A}(\boldsymbol{y};\,(\mathsf{I}-\mathsf{P}^{h}_{\boldsymbol{y}})w,\,v_{h})=0\quad\text{for all }v_{h}\in V_{h}.\tag{2.11}\]

The approximation $u_{h}(\cdot,\boldsymbol{y})$ is the projection of $u$ onto $V_{h}$, i.e., $u_{h}=\mathsf{P}^{h}_{\boldsymbol{y}}u\in V_{h}$, and by the definition of a projection, $(\mathsf{P}^{h}_{\boldsymbol{y}})^{2}=\mathsf{P}^{h}_{\boldsymbol{y}}$ on $V_{h}$.

Under Assumptions (A1), (A2) and (A6), we have the following result from [3, Theorem 3.2.2], which holds for all $w\in H^{2}(D)\cap V$:

\[\|(\mathsf{I}-\mathsf{P}^{h}_{\boldsymbol{y}})w\|_{V}\lesssim h\,\|\Delta w\|_{L^{2}(D)}\quad\text{as }h\to0,\tag{2.12}\]

where the implied constant is independent of $h$ and $\boldsymbol{y}$.

2.5 Lattice-based kernel interpolation

The solution $u(\cdot,\boldsymbol{y})$ will be approximated via kernel interpolation in the dimension-truncated parametric domain $\Omega_{s}$. Consider the weighted Korobov space $\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$, which is the Hilbert space of one-periodic $L^{2}$ functions on $\Omega_{s}$ with absolutely convergent Fourier series and square-integrable mixed derivatives of order $\alpha$. We restrict $\alpha\geq1$ to be an integer smoothness parameter (in general, $\alpha$ need not be an integer, see e.g., [23, 42]; however, to have a simple, closed-form representation of the reproducing kernel and norm we restrict ourselves to integer $\alpha$) and include weight parameters $\boldsymbol{\gamma}=\{\gamma_{\mathfrak{u}}>0:\mathfrak{u}\subseteq\{1:s\}\}$ that model the relative importance of different groups of parametric variables. The space $\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ is a reproducing kernel Hilbert space, equipped with the norm

\[\|g\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})}^{2}\coloneqq\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\bigg|\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^{\alpha}}{\partial y_{j}^{\alpha}}\bigg)g(\boldsymbol{y})\,\mathrm{d}\boldsymbol{y}_{-\mathfrak{u}}\bigg|^{2}\,\mathrm{d}\boldsymbol{y}_{\mathfrak{u}},\tag{2.13}\]

where $\boldsymbol{y}_{\mathfrak{u}}\coloneqq(y_{j})_{j\in\mathfrak{u}}$ and $\boldsymbol{y}_{-\mathfrak{u}}\coloneqq(y_{j})_{j\in\{1:s\}\setminus\mathfrak{u}}$. The reproducing kernel for this space, $\mathcal{K}_{\alpha,\boldsymbol{\gamma}}:\Omega_{s}\times\Omega_{s}\to\mathbb{R}$, is given by

\[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{y},\boldsymbol{y}')\,\coloneqq\,\sum_{\mathfrak{u}\subseteq\{1:s\}}\gamma_{\mathfrak{u}}\prod_{j\in\mathfrak{u}}\bigg[(-1)^{\alpha+1}\frac{(2\pi)^{2\alpha}}{(2\alpha)!}B_{2\alpha}(|y_{j}-y'_{j}|)\bigg],\]

where $B_{2\alpha}$ is the Bernoulli polynomial of degree $2\alpha$.

Consider a set of lattice points $\{\boldsymbol{t}_{k}\}_{k=0}^{N-1}$ defined by

\[\boldsymbol{t}_{k}=\frac{k\boldsymbol{z}\bmod N}{N}\qquad\text{for }k=0,\ldots,N-1,\]

where $\boldsymbol{z}\in\mathbb{N}^{s}$ is a generating vector with components in $\{1,\ldots,N-1\}$ that are coprime to $N$. For $g\in\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$, the lattice-based kernel interpolant (defined using function values of $g$ evaluated at the points $\{\boldsymbol{t}_{k}\}_{k=0}^{N-1}$) is given by

\[I_{N}g(\boldsymbol{y})\,\coloneqq\,I^{s}_{N}g(\boldsymbol{y})=\sum_{k=0}^{N-1}a_{N,k}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{y}),\tag{2.14}\]

such that $I_{N}g(\boldsymbol{t}_{k'})=g(\boldsymbol{t}_{k'})$ for all $k'=0,\ldots,N-1$. The generating vector $\boldsymbol{z}$ is obtained using a component-by-component (CBC) construction algorithm similar to those described in [7, 8, 23], where the components of the vector are selected to minimise a bound on the worst-case error of approximation using the kernel method. To ensure that $I_{N}g$ interpolates $g$ at the lattice points, the coefficients $a_{N,k}$ are obtained by solving the linear system

\[K_{N,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{a}_{N}\,=\,\boldsymbol{g}_{N},\tag{2.15}\]

with $\boldsymbol{a}_{N}=[a_{N,k}]_{k=0}^{N-1}$, $K_{N,\alpha,\boldsymbol{\gamma}}=[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{t}_{k'})]_{k,k'=0}^{N-1}$ and $\boldsymbol{g}_{N}=[g(\boldsymbol{t}_{k})]_{k=0}^{N-1}$.

Due to the periodic and symmetric nature of the kernel, along with the properties of the lattice point set, the elements of $K_{N,\alpha,\boldsymbol{\gamma}}$ satisfy

\[[K_{N,\alpha,\boldsymbol{\gamma}}]_{k,k'}=\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{t}_{k'})=\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k}-\boldsymbol{t}_{k'},\boldsymbol{0})=\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{(k-k')\bmod N},\boldsymbol{0})\]

for $k,k'=0,\ldots,N-1$. This implies that $K_{N,\alpha,\boldsymbol{\gamma}}$ is a symmetric, circulant matrix uniquely determined by its first column, which can be diagonalised via FFT at cost $\mathcal{O}(N\log N)$. The kernel only needs to be evaluated at $\lceil N/2\rceil$ lattice points, since the first column is symmetric about its midpoint, and the linear system (2.15) can be solved after diagonalising using the FFT.
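The following sketch assembles the first column of $K_{N,\alpha,\boldsymbol{\gamma}}$ for $\alpha=1$ with product weights, for which the sum over subsets $\mathfrak{u}$ in the kernel factorises over dimensions, and then solves (2.15) in $\mathcal{O}(N\log N)$ via the FFT. The generating vector and weights below are hypothetical placeholders, not CBC-constructed.

```python
import numpy as np

# Rank-1 lattice points t_k = (k*z mod N)/N
def lattice_points(z, N):
    return np.outer(np.arange(N), z) % N / N

# Kernel for alpha = 1 with product weights: the sum over subsets u in the
# kernel factorises as prod_j (1 + gamma_j * 2*pi^2 * B_2(|y_j - y'_j|)),
# where B_2(x) = x^2 - x + 1/6 is the Bernoulli polynomial of degree 2.
def kernel(y, yp, gamma):
    B2 = lambda x: x * x - x + 1.0 / 6.0
    return np.prod(1.0 + gamma * 2.0 * np.pi ** 2 * B2(np.abs(y - yp)), axis=-1)

N, s = 64, 4
z = np.array([1, 19, 27, 11])           # hypothetical generating vector
gamma = 0.5 ** np.arange(1, s + 1)      # hypothetical product weights
T = lattice_points(z, N)

c = kernel(T, np.zeros(s), gamma)       # first column of the circulant matrix
g = np.prod(2.0 + np.sin(2 * np.pi * T), axis=1)     # data to interpolate
a = np.fft.ifft(np.fft.fft(g) / np.fft.fft(c)).real  # O(N log N) solve of (2.15)

# the interpolant reproduces g at every lattice point (dense check only)
K = kernel(T[:, None, :], T[None, :, :], gamma)
assert np.allclose(K @ a, g)
```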

A bound on the approximation error for the kernel interpolation of $g\in\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ using a CBC-generated lattice point set is given in [23, Theorem 3.3], which states that

\[\|(I-I_{N})g\|_{L^{2}(\Omega_{s})}\leq\frac{\kappa}{[\varphi(N)]^{\frac{1}{4\lambda}}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{\frac{1}{2\lambda}}\|g\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})}\tag{2.16}\]

for all $\lambda\in(\frac{1}{2\alpha},1]$, with $\kappa\coloneqq\sqrt{2}\,(2^{2\alpha\lambda+1}+1)^{\frac{1}{4\lambda}}$. Here $I\coloneqq I^{s}:\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})\to\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ is the identity operator, $\zeta$ is the Riemann zeta function defined by $\zeta(x)=\sum_{j=1}^{\infty}j^{-x}$, and $\varphi$ is the Euler totient function. Note that the original theorem presented in [23] requires the number of points $N$ to be prime; however, using [30, Theorem 3.4] the result above has been extended to non-prime $N$.

There are $2^{s}$ weight parameters $\{\gamma_{\mathfrak{u}}\}_{\mathfrak{u}\subseteq\{1:s\}}$, too many to specify individually in practice. Therefore, special forms of weights $\gamma_{\mathfrak{u}}$ have been considered, including:

• Product weights: $\gamma_{\mathfrak{u}}=\prod_{j\in\mathfrak{u}}\gamma_{j}$ for some positive sequence $(\gamma_{j})_{j\geq1}$;

• POD ("product and order dependent") weights: $\gamma_{\mathfrak{u}}=\Gamma_{|\mathfrak{u}|}\prod_{j\in\mathfrak{u}}\gamma_{j}$ for positive sequences $(\gamma_{j})_{j\geq1}$ and $(\Gamma_{j})_{j\geq0}$;

• SPOD ("smoothness-driven product and order dependent") weights: $\gamma_{\mathfrak{u}}=\sum_{\boldsymbol{v}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}\Gamma_{|\boldsymbol{v}_{\mathfrak{u}}|}\prod_{j\in\mathfrak{u}}\gamma_{j,v_{j}}$ for positive sequences $(\gamma_{j,v_{j}})_{j\geq1}$ and $(\Gamma_{j})_{j\geq0}$; a small code sketch of these three forms follows below.
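In the sketch, a subset $\mathfrak{u}$ is represented by a tuple of coordinate indices, and all sequences are hypothetical.

```python
import math
from itertools import product

# gamma_u for the three weight forms; u is a tuple of coordinate indices.
def product_weight(u, g):
    return math.prod(g[j] for j in u)

def pod_weight(u, g, G):
    return G[len(u)] * math.prod(g[j] for j in u)

def spod_weight(u, g, G, alpha):
    # sum over v_u in {1,...,alpha}^{|u|} of Gamma_{|v_u|} * prod_j g[j][v_j - 1]
    return sum(G[sum(v)] * math.prod(g[j][vj - 1] for j, vj in zip(u, v))
               for v in product(range(1, alpha + 1), repeat=len(u)))

g = {1: 0.5, 2: 0.25, 3: 0.125}    # hypothetical gamma_j
G = [1.0] * 10                     # hypothetical Gamma_k
print(product_weight((1, 3), g), pod_weight((1, 3), g, G))
```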

The error bound (2.16) holds for all forms of weights, but the computational cost differs; see the next subsection.

2.6 Single-level kernel interpolation for PDEs

Lattice-based kernel interpolation was first applied to PDE problems in [23]. Denoting the dimension-truncated FE solution of (2.2) by $u^{s}_{h}(\cdot,\boldsymbol{y})\in V_{h}$, the estimate (2.16) can be applied to the single-level kernel interpolant (see [23, Theorem 4.4]) to obtain

\[\|u^{s}_{h}-I_{N}u^{s}_{h}\|_{L^{2}(D\times\Omega)}\lesssim[\varphi(N)]^{-\frac{1}{4\lambda}}\,\|f\|_{V'}\,C_{s}(\lambda)\tag{2.17}\]

for all $\lambda\in(\frac{1}{2\alpha},1]$, where

\[[C_{s}(\lambda)]^{2\lambda}\coloneqq\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)\times\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}|\boldsymbol{m}_{\mathfrak{u}}|!\,\boldsymbol{b}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_{i})\bigg)^{2}\bigg)^{\lambda}.\]

In [23], the weights $\gamma_{\mathfrak{u}}$ are chosen to ensure that the constant $C_{s}(\lambda)$ can be bounded independently of the dimension $s$. Different forms of weights (SPOD, POD, product) achieve dimension-independent bounds with some concessions on the rate of convergence.

The single-level kernel interpolation methodology from [23] is summarised below:

1. Compute the first column of $K_{N,\alpha,\boldsymbol{\gamma}}$ in (2.15) (i.e., compute $[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{0})]_{k=0}^{N-1}$) at cost $\mathcal{O}(s^{\rho}\,\alpha^{\varsigma}N)$, where $\rho=2$ and $\varsigma=2$ for SPOD weights, $\rho=2$ and $\varsigma=0$ for POD weights, and $\rho=1$ and $\varsigma=0$ for product weights; see [23, Table 2].

2. Evaluate the coefficient (1.2) at each lattice point and FE node to set up the stiffness matrix, at cost $\mathcal{O}(s\,h^{-d}N)$.

3. Compute the FE solution $u^{s}_{h}$ (denoted by $g$ in (2.15)) for the PDE associated with each lattice point to construct $\boldsymbol{g}_{N}$ on the right-hand side of (2.15) for every FE node, at cost $\mathcal{O}(h^{-\tau}N)$ for some $\tau>d$, with $\tau\approx d$ for a linear-complexity FE solver (e.g., an algebraic multigrid solver).

4. Solve the circulant linear system (2.15) for the coefficients $\boldsymbol{a}_{N}$ of the interpolant for every FE node, at cost $\mathcal{O}(h^{-d}N\log N)$.

The total cost of construction for the single-level interpolant is therefore

\[\mathrm{cost}(I_{N}u_{h}^{s})\simeq s^{\rho}\alpha^{\varsigma}N+s\,h^{-d}N+h^{-\tau}N+h^{-d}N\log N.\tag{2.18}\]

There is a pre-computation cost (which varies with the form of weights) associated with the CBC construction of the lattice generating vector $\boldsymbol{z}$, and a post-computation cost to assemble the approximation.

From [23], the error satisfies

\[\mathrm{error}(I_{N}u^{s}_{h})\lesssim s^{-\kappa}+h^{\beta}+N^{-\mu},\]

see (2.6), (2.9), (2.17), with $\kappa=\frac{1}{p}-\frac{1}{2}$, $\beta=2$, and $\mu=\frac{1}{4\lambda}$ for $\lambda\in(\frac{1}{2\alpha},1]$. The cost can now be expressed in terms of the error $\varepsilon>0$ as follows. We demand that each of the three components of the error is bounded above and below by multiples of $\varepsilon$, i.e., $s\simeq\varepsilon^{-\frac{1}{\kappa}}$, $h\simeq\varepsilon^{\frac{1}{\beta}}$ and $N\simeq\varepsilon^{-\frac{1}{\mu}}$. It follows that there exists $C>0$ such that $N^{\mu}\leq Cs^{\kappa}$, which gives $\log N\leq\frac{1}{\mu}(\log C+\kappa\log s)\leq\frac{1}{\mu}(\log C+\kappa)\,s$ since $s\geq1$, and therefore $\log N\lesssim s$. Assuming that $d\leq\tau\leq d+\frac{\beta}{\kappa}$ and treating $\alpha^{\varsigma}$ as a constant, the cost (2.18) can be bounded further by

\[\mathrm{cost}(I_{N}u^{s}_{h})\lesssim s^{\rho}N+s\,Nh^{-d}\simeq\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}+\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}-\frac{d}{\beta}}\simeq\varepsilon^{-\max(\frac{\rho}{\kappa}+\frac{1}{\mu},\,\frac{1}{\kappa}+\frac{1}{\mu}+\frac{d}{\beta})}.\tag{2.19}\]

In the case of product weights, we have $\rho=1$ and $\mathrm{cost}(I_{N}u^{s}_{h})\lesssim\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}-\frac{d}{\beta}}$.

3 Multilevel kernel approximation

Consider a sequence of conforming FE spaces $\{V_{\ell}\}_{\ell=0}^{\infty}$, where each $V_{\ell}\coloneqq V_{h_{\ell}}\subset V$ corresponds to a shape-regular triangulation $\mathcal{T}_{\ell}$ of $D$ with mesh width $h_{\ell}\coloneqq\max\{\mathrm{diam}(\triangle):\triangle\in\mathcal{T}_{\ell}\}>0$ and $\dim(V_{\ell})=M_{\ell}<\infty$. Recall that $M_{\ell}=\mathcal{O}(h_{\ell}^{-d})$. Then for $\ell\in\mathbb{N}$, we denote the dimension-truncated FE approximation in the space $V_{\ell}$ by $u_{\ell}(\cdot,\boldsymbol{y})\coloneqq u^{s}_{h_{\ell}}(\cdot,\boldsymbol{y})=u^{s}_{h_{\ell}}(\cdot,\boldsymbol{y}_{1:s})\in V_{\ell}$.

For a maximum level $L\in\mathbb{N}$ and setting $u_{-1}\coloneqq0$, the multilevel kernel approximation is given by (1.4), where $\{I_{\ell}\}_{\ell\in\mathbb{N}}$ is a sequence of interpolation operators such that each $I_{\ell}\coloneqq I_{N_{\ell}}$ is a real-valued kernel interpolant based on $N_{\ell}$ lattice points $\{\boldsymbol{t}_{\ell,k}\}_{k=0}^{N_{\ell}-1}$ as defined in Section 2.5. The interpolants are ordered in terms of nonincreasing accuracy, or equivalently, nonincreasing numbers of interpolation points, i.e., $N_{0}\geq N_{1}\geq N_{2}\geq\cdots$. The intuition is that the magnitude of $u_{\ell}-u_{\ell-1}$ decreases with increasing $\ell$, thus requiring fewer interpolation points to achieve reasonable accuracy.

We assume that the FE solution at each level is approximated with increasingly fine meshes corresponding to mesh widths $h_{0}>h_{1}>\cdots$ (i.e., the approximation $u_{\ell}$ increases in accuracy as the level increases), so that approximations using large values of $N_{\ell}$ are compensated by coarser FE meshes, thus moderating the cost. To simplify the ML algorithm, we additionally assume that the FE spaces are nested, i.e., $V_{\ell}\subset V_{\ell+1}$ for $\ell\in\mathbb{N}$. If non-nested FE spaces are used, a "supermesh" (the mesh corresponding to the space spanned by the basis functions of both $V_{\ell}$ and $V_{\ell-1}$) must be considered, which adds to the computational cost (see e.g., [9, 10, 12]).

Similarly, we assume the lattice points on each level are nested (i.e., the points used on level $\ell$ form a subset of the points used on level $\ell-1$), which can be achieved using an embedded lattice rule (see [30]). Thus, the space spanned by the kernel basis functions on level $\ell$ is a subspace of the one spanned by the kernel basis functions on level $\ell-1$. This proves to be advantageous since the generating vector for the lattice only needs to be constructed once, and FE evaluations can be reused between levels; e.g., evaluations used to compute $I_{\ell-1}u_{\ell-1}$ can be reused to compute $I_{\ell}u_{\ell-1}$.

To compute the multilevel kernel approximation, on each level $\ell$ we compute the $N_{\ell}$-point interpolant $I_{\ell}$ of the difference between a FE approximation on a fine mesh, $u_{\ell}(\cdot,\boldsymbol{y})\in V_{\ell}$, and one on a coarse mesh, $u_{\ell-1}(\cdot,\boldsymbol{y})\in V_{\ell-1}$. As a result, the final approximation is not a direct interpolation of the solution $u$, and thus $I_{L}^{\mathrm{ML}}$ will be referred to exclusively as the multilevel kernel approximation.

3.1 Error decomposition for multilevel methods

The multilevel kernel approximation error can be expressed as (omitting the dependence on $\boldsymbol{x}$ and $\boldsymbol{y}$)

\[\begin{aligned}u-I^{\mathrm{ML}}_{L}u&=u-\sum_{\ell=0}^{L}I(u_{\ell}-u_{\ell-1})+\sum_{\ell=0}^{L}I(u_{\ell}-u_{\ell-1})-\sum_{\ell=0}^{L}I_{\ell}(u_{\ell}-u_{\ell-1})\\&=u-u_{L}+\sum_{\ell=0}^{L}(I-I_{\ell})(u_{\ell}-u_{\ell-1}),\end{aligned}\]

where $I:\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})\to\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ denotes the identity operator, and we define $u_{-1}\coloneqq0$. Following the methodology of [23], we take the $L^{2}(D)$ norm and the $L^{2}(\Omega)$ norm, then use the triangle inequality to obtain the error estimate

\[\mathrm{error}(I^{\mathrm{ML}}_{L}u)=\|u-I^{\mathrm{ML}}_{L}u\|_{L^{2}(\Omega\times D)}\leq\|u-u_{L}\|_{L^{2}(\Omega\times D)}+\sum_{\ell=0}^{L}\|(I-I_{\ell})(u_{\ell}-u_{\ell-1})\|_{L^{2}(\Omega\times D)}.\tag{3.1}\]

The first term in (3.1) is often referred to as the bias in the multilevel literature; it can be further separated into a dimension truncation error and a FE error

\[\|u-u_{L}\|_{L^{2}(\Omega\times D)}\leq\|u-u^{s}\|_{L^{2}(\Omega\times D)}+\|u^{s}-u^{s}_{h_{L}}\|_{L^{2}(\Omega\times D)},\tag{3.2}\]

where the two components can be bounded using (2.6) and (2.9). The bias is controlled by choosing $s$ and $h_{L}$ as necessary to obtain a prescribed error. The second term in (3.1) is the error associated with the multilevel scheme, which is controlled by the choice of the number of interpolation points $N_{\ell}$ and the fineness of the FE mesh $h_{\ell}$ at each level.

3.2 Cost analysis of multilevel methods

The cost of constructing the multilevel kernel approximation is similar to that of the single-level interpolant, except that the cost is now "spread" over multiple levels. Recall from Subsection 2.6 that the total cost of construction for the single-level kernel interpolant is given by (2.18). For the multilevel algorithm, the total cost comes from:

1. evaluating the kernel functions for the full lattice point set, $\mathcal{O}(s^{\rho}\,\alpha^{\varsigma}N_{0})$;

2. then, for each level $\ell$ with $N_{\ell}\leq N_{0}$:

   (a) evaluating the coefficient (1.2) at each lattice point and FE node to set up the stiffness matrix, $\mathcal{O}(s\,h^{-d}_{\ell}N_{\ell})$;

   (b) solving for the FE solution at each lattice point, $\mathcal{O}(h_{\ell}^{-\tau}N_{\ell})$; and

   (c) solving the linear system for the coefficients of the interpolant at every FE node, $\mathcal{O}(h_{\ell}^{-d}N_{\ell}\log N_{\ell})$.

Since the lattice points are nested, the total cost of computation is

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\simeq s^{\rho}\alpha^{\varsigma}N_{0}+\sum_{\ell=0}^{L}\big(s\,h_{\ell}^{-d}N_{\ell}+h_{\ell}^{-\tau}N_{\ell}+h_{\ell}^{-d}N_{\ell}\log N_{\ell}\big).\tag{3.3}\]

Since the kernel functions only need to be evaluated once for each lattice point and can be reused at each level as necessary, this cost, given by $s^{\rho}\alpha^{\varsigma}N_{0}$, is independent of $\ell$ and sits outside the summation. For details on practical implementation, see Section 6.
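As a quick illustration, the cost model (3.3) can be transcribed directly; the parameters $\rho,\varsigma,\tau$ and all absolute constants below are problem-dependent and hypothetical.

```python
import math

# Sketch of the multilevel cost model (3.3) for given level sizes N_l and
# mesh widths h_l; rho, varsigma, tau and all constants are hypothetical.
def ml_cost(N, h, s, d=2, tau=2.1, rho=1, alpha=1, varsigma=0):
    setup = s ** rho * alpha ** varsigma * N[0]
    levels = sum(s * h[l] ** -d * N[l] + h[l] ** -tau * N[l]
                 + h[l] ** -d * N[l] * math.log(N[l]) for l in range(len(N)))
    return setup + levels

h = [0.5 * 2 ** -l for l in range(5)]
N = [4096, 1024, 256, 64, 16]
print(ml_cost(N, h, s=100))
```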

3.3 Abstract complexity analysis

Theorem 1 below is an abstract complexity theorem for the error and cost of our multilevel approximation. It specifies a choice of $s$, $L$, $N_{\ell}$ and $h_{\ell}$ for $\ell=0,\ldots,L$ such that the total error of approximation can be bounded by some given $\varepsilon>0$. Assumption (M1) is motivated by the bias split (3.2) together with (2.6) and (2.9), and $s$ is chosen to balance the two terms. Assumption (M3) is motivated by the cost estimate (3.3), where $\alpha^{\varsigma}$ is treated as constant, and a bound on $\tau$ is assumed to simplify the cost. The error analysis in Section 5 will justify Assumption (M2) with precise values of the relevant constants.

Theorem 1.

Given $h_{0}\in(0,1)$ and $d\geq1$, define $h_{\ell}\coloneqq h_{0}\,2^{-\ell}$ for $\ell\geq0$, and suppose there are positive constants $\beta,\kappa,\mu,\rho$, and $\tau$ such that

(M1) $\|u-u_{L}\|_{L^{2}(\Omega\times D)}\lesssim s^{-\kappa}+h_{L}^{\beta}$,

(M2) $\|(I-I_{0})u_{0}\|_{L^{2}(\Omega\times D)}\lesssim N_{0}^{-\mu}$ and $\|(I-I_{\ell})(u_{\ell}-u_{\ell-1})\|_{L^{2}(\Omega\times D)}\lesssim N_{\ell}^{-\mu}h_{\ell-1}^{\beta}$ for $\ell=1,\ldots,L$, and

(M3) $\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,s^{\rho}N_{0}+\sum_{\ell=0}^{L}N_{\ell}\,(s\,h_{\ell}^{-d}+h_{\ell}^{-\tau}+h_{\ell}^{-d}\log N_{\ell})$.

Given $0<\varepsilon<\min(1,2h_{0}^{\beta})$, and assuming $d\leq\tau\leq d+\frac{\beta}{\kappa}$, we may choose integers $L$ given by (3.6), $s\simeq h_{L}^{-\frac{\beta}{\kappa}}$, and $N_{0},\ldots,N_{L}$ given by (3.14), such that $\mathrm{error}(I^{\mathrm{ML}}_{L}u)\lesssim\varepsilon$ and

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\lesssim\begin{cases}\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}&\text{when }\frac{d}{\beta}<\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}\,(\log\varepsilon^{-1})^{1+\frac{1}{\mu}}&\text{when }\frac{d}{\beta}=\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{1+\mu}(\frac{d}{\beta}+1)}&\text{when }\frac{1}{\mu}<\frac{d}{\beta}\leq\frac{1}{\mu}+(\frac{1}{\mu}+1)\frac{\rho-1}{\kappa},\\\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}}&\text{when }\frac{d}{\beta}>\frac{1}{\mu}+(\frac{1}{\mu}+1)\frac{\rho-1}{\kappa},\end{cases}\tag{3.4}\]

where the implied constants depend on $h_{0},d,\beta,\kappa,\mu,\rho$, and $\tau$.

Proof.

Substituting Assumptions (M1) and (M2) into (3.1) gives the error bound

\[\mathrm{error}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,s^{-\kappa}+h_{L}^{\beta}+N_{0}^{-\mu}+\sum_{\ell=1}^{L}N_{\ell}^{-\mu}h_{\ell-1}^{\beta}.\]

Choosing $s$ to balance the two components of the error in Assumption (M1), i.e., setting $s^{-\kappa}\simeq h_{L}^{\beta}$, the bound further simplifies to

\[\mathrm{error}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,h_{L}^{\beta}+\sum_{\ell=0}^{L}N_{\ell}^{-\mu}h_{\ell}^{\beta},\tag{3.5}\]

where the implied constant depends on a factor $\max(h_{0}^{-\beta},2^{\beta})$.

We require that $\mathrm{error}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,\varepsilon$, which holds if each of the two terms in (3.5) is bounded by $\varepsilon/2$. So from the first term we choose $L$ such that $h_{L}^{\beta}=h_{0}^{\beta}\,2^{-L\beta}\leq\varepsilon/2$, yielding the condition $L\geq\log_{2}(2h_{0}^{\beta}\,\varepsilon^{-1})/\beta$. Taking the smallest allowable value of $L$ with the ceiling function, we obtain

\[L\coloneqq\bigg\lceil\frac{\log_{2}(2h_{0}^{\beta}\,\varepsilon^{-1})}{\beta}\bigg\rceil,\quad\text{which implies}\quad 2^{L}\simeq\varepsilon^{-\frac{1}{\beta}}.\tag{3.6}\]

To ensure $L\geq1$, so that we are not in the trivial single-level case, we require that the value inside the ceiling function is positive, giving the condition $\varepsilon<2h_{0}^{\beta}$. For the second term in (3.5), we demand that

\[\sum_{\ell=0}^{L}N_{\ell}^{-\mu}h_{\ell}^{\beta}\leq\frac{\varepsilon}{2}.\tag{3.7}\]

Since we have assumed that $\tau\leq d+\frac{\beta}{\kappa}$, which follows from requiring $s\,h_{\ell}^{-d}\simeq h_{L}^{-\frac{\beta}{\kappa}}h_{\ell}^{-d}\geq h_{\ell}^{-\tau}$, Assumption (M3) simplifies to

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\lesssim s^{\rho}N_{0}+\sum_{\ell=0}^{L}N_{\ell}\,(s\,h_{\ell}^{-d}+h_{\ell}^{-d}\log N_{\ell}).\tag{3.8}\]

If $\log N_{\ell}\lesssim s$ for all $\ell=0,\ldots,L$, then, using $s^{-\kappa}\simeq h_{L}^{\beta}$, the cost (3.8) can be bounded by

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\lesssim h_{L}^{-\frac{\rho\beta}{\kappa}}N_{0}+h_{L}^{-\frac{\beta}{\kappa}}\sum_{\ell=0}^{L}N_{\ell}\,h_{\ell}^{-d},\tag{3.9}\]

where the first term represents a setup cost and the second term is the multilevel cost.

We now proceed to choose $N_{0},\ldots,N_{L}$ by minimising the multilevel cost term in (3.9) subject to the constraint (3.7), with equality instead of $\leq$. We will later verify that the condition $\log N_{\ell}\lesssim s$ indeed holds. The Lagrangian for this optimisation is

\[\mathcal{L}(\widehat{N}_{0},\ldots,\widehat{N}_{L},\chi)\coloneqq h_{L}^{-\frac{\beta}{\kappa}}\sum_{\ell=0}^{L}\widehat{N}_{\ell}\,h_{\ell}^{-d}+\chi\bigg(\sum_{\ell=0}^{L}\widehat{N}_{\ell}^{-\mu}h^{\beta}_{\ell}-\frac{\varepsilon}{2}\bigg),\]

where $\chi$ is the Lagrange multiplier and $\widehat{N}_{\ell}$ for $\ell=0,\ldots,L$ are continuous variables. This gives us the following first-order optimality conditions:

\[\frac{\partial\mathcal{L}}{\partial\widehat{N}_{\ell}}=h_{L}^{-\frac{\beta}{\kappa}}h_{\ell}^{-d}-\chi\,\mu\,\widehat{N}_{\ell}^{-\mu-1}h_{\ell}^{\beta}=0\quad\text{for }\ell=0,\ldots,L,\tag{3.10}\]
\[\frac{\partial\mathcal{L}}{\partial\chi}=\sum_{\ell=0}^{L}\widehat{N}_{\ell}^{-\mu}h_{\ell}^{\beta}-\frac{\varepsilon}{2}=0.\tag{3.11}\]

Rearranging (3.10) gives $\widehat{N}_{\ell}^{1+\mu}h_{\ell}^{-(d+\beta)}=\chi\,\mu\,h_{L}^{\frac{\beta}{\kappa}}$ for $\ell=0,\ldots,L$, noting that the right-hand side is independent of $\ell$. Thus, we have $\widehat{N}_{\ell}^{1+\mu}h_{\ell}^{-(d+\beta)}=\widehat{N}_{0}^{1+\mu}h_{0}^{-(d+\beta)}$ for $\ell=1,\ldots,L$, so

\[\widehat{N}_{\ell}=\widehat{N}_{0}\,\bigg(\frac{h_{\ell}}{h_{0}}\bigg)^{\frac{d+\beta}{1+\mu}}=\widehat{N}_{0}\,(2^{-\ell})^{\frac{d+\beta}{1+\mu}}\quad\text{for }\ell=1,\ldots,L.\tag{3.12}\]

Substituting (3.12) into (3.11) gives

\[\widehat{N}_{0}^{-\mu}h_{0}^{\beta}\sum_{\ell=0}^{L}(2^{\ell})^{\frac{\mu(d+\beta)}{1+\mu}-\beta}=\frac{\varepsilon}{2},\quad\text{which yields}\quad\widehat{N}_{0}=\bigg(2\,\varepsilon^{-1}h_{0}^{\beta}\sum_{\ell=0}^{L}2^{\frac{(d\mu-\beta)\ell}{1+\mu}}\bigg)^{\frac{1}{\mu}}.\tag{3.13}\]

To obtain integer values for $N_{\ell}$, we define

\[N_{\ell}\coloneqq\big\lceil\widehat{N}_{\ell}\big\rceil=\Big\lceil\widehat{N}_{0}\,2^{\frac{-(d+\beta)\ell}{1+\mu}}\Big\rceil\qquad\text{for }\ell=0,\ldots,L.\tag{3.14}\]

Since $N_{\ell}\geq\widehat{N}_{\ell}$, the bound (3.7) continues to hold for this choice of $N_{\ell}$.

Clearly, $N_{0}\geq N_{1}\geq\cdots\geq N_{L}$ as required. We now verify that $\log N_{0}\lesssim s$. Since $\varepsilon<2h_{0}^{\beta}$, we have $\widehat{N}_{0}>1$ from (3.13), and therefore

\[N_{0}<\widehat{N}_{0}+1<2\widehat{N}_{0}\leq2\bigg(2\,\varepsilon^{-1}h_{0}^{\beta}\,(L+1)\,2^{\frac{|d\mu-\beta|}{1+\mu}L}\bigg)^{\frac{1}{\mu}}\lesssim\varepsilon^{-(1+\frac{1}{\beta}+\frac{|d\mu-\beta|}{(1+\mu)\beta})\frac{1}{\mu}},\]

where we loosely overestimated the geometric series in (3.13) by taking $L+1$ times the largest possible term with absolute value in the exponent, and then used $L+1\leq2^{L}$ and $2^{L}\simeq\varepsilon^{-\frac{1}{\beta}}$ from (3.6). Thus $\log N_{0}\lesssim\log\varepsilon^{-1}$. On the other hand, we have $s^{-\kappa}\simeq h_{L}^{\beta}\leq\frac{\varepsilon}{2}$ and so $s\gtrsim\varepsilon^{-\frac{1}{\kappa}}$. Hence we have $\log N_{0}\lesssim s$ as required. We conclude that the results from the optimisation with respect to the simplified cost function (3.9) can be applied to the multilevel problem with cost given by Assumption (M3).

We now verify that the cost satisfies (3.4) by substituting $N_{0}\leq2\widehat{N}_{0}$, $N_{\ell}\leq\widehat{N}_{\ell}+1=\widehat{N}_{0}\,2^{-\frac{(d+\beta)\ell}{1+\mu}}+1$, (3.13), $h_{\ell}=h_{0}\,2^{-\ell}$, and $2^{L}\simeq\varepsilon^{-\frac{1}{\beta}}$ into (3.9), resulting in

\[\begin{aligned}\mathrm{cost}(I^{\mathrm{ML}}_{L}u)&\lesssim h_{0}^{-\frac{\rho\beta}{\kappa}}\varepsilon^{-\frac{\rho}{\kappa}}\widehat{N}_{0}+h_{0}^{-\frac{\beta}{\kappa}-d}\varepsilon^{-\frac{1}{\kappa}}\sum_{\ell=0}^{L}\Big(\widehat{N}_{0}\,2^{-\frac{(d+\beta)\ell}{1+\mu}}+1\Big)\,2^{d\ell}\\&=h_{0}^{-\frac{\rho\beta}{\kappa}}\varepsilon^{-\frac{\rho}{\kappa}}\Big(2\,\varepsilon^{-1}h_{0}^{\beta}\,E_{L}\Big)^{\frac{1}{\mu}}+h_{0}^{-\frac{\beta}{\kappa}-d}\varepsilon^{-\frac{1}{\kappa}}\Big(2\,\varepsilon^{-1}h_{0}^{\beta}\,E_{L}\Big)^{\frac{1}{\mu}}E_{L}+h_{0}^{-\frac{\beta}{\kappa}-d}\varepsilon^{-\frac{1}{\kappa}}\sum_{\ell=0}^{L}2^{d\ell}\\&\lesssim\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}\,E_{L}^{\frac{1}{\mu}}+\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}}\,E_{L}^{\frac{1}{\mu}+1}+\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}}\\&\simeq\begin{cases}\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}&\text{when }\frac{d}{\beta}<\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}(\log\varepsilon^{-1})^{\frac{1}{\mu}+1}&\text{when }\frac{d}{\beta}=\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{1+\mu}(\frac{d}{\beta}+1)}+\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}}&\text{when }\frac{d}{\beta}>\frac{1}{\mu},\end{cases}\end{aligned}\]

where the implied constant includes a factor $h_{0}^{-\max(\frac{\rho\beta}{\kappa},\frac{\beta}{\kappa}+d)}$, and we used

\[E_{L}\coloneqq\sum_{\ell=0}^{L}2^{\frac{(d\mu-\beta)\ell}{1+\mu}}\simeq\begin{cases}1&\text{when }d\mu<\beta\ \text{(i.e., }\tfrac{d}{\beta}<\tfrac{1}{\mu}\text{)},\\L\simeq\log\varepsilon^{-1}&\text{when }d\mu=\beta\ \text{(i.e., }\tfrac{d}{\beta}=\tfrac{1}{\mu}\text{)},\\2^{\frac{d\mu-\beta}{1+\mu}L}\simeq\varepsilon^{-\frac{d\mu-\beta}{\beta(1+\mu)}}&\text{when }d\mu>\beta\ \text{(i.e., }\tfrac{d}{\beta}>\tfrac{1}{\mu}\text{)}.\end{cases}\]

The final case in the cost (i.e., when $\frac{d}{\beta}>\frac{1}{\mu}$) is split into two cases based on which of the two terms dominates, resulting in the four cases in (3.4). ∎
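In practice, the parameter choices from the proof can be computed directly; the following sketch evaluates (3.6), (3.13) and (3.14) for hypothetical rates $\beta,\mu$ and mesh parameters.

```python
import math

# Sketch of the level choices in Theorem 1: L from (3.6), N_0 from (3.13)
# and N_ell from (3.14), with h_ell = h0 * 2**(-ell). The rates beta, mu
# and all inputs below are hypothetical placeholders.
def multilevel_parameters(eps, h0=0.5, beta=2.0, mu=0.5, d=2):
    L = math.ceil(math.log2(2 * h0 ** beta / eps) / beta)          # (3.6)
    E_L = sum(2 ** ((d * mu - beta) * l / (1 + mu)) for l in range(L + 1))
    N0 = (2 * h0 ** beta * E_L / eps) ** (1 / mu)                  # (3.13)
    N = [math.ceil(N0 * 2 ** (-(d + beta) * l / (1 + mu))) for l in range(L + 1)]
    return L, N                                                    # (3.14)

print(multilevel_parameters(1e-3))   # L = 5 and geometrically decaying N_ell
```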

The third case in (3.4) becomes obsolete when $\rho=1$, i.e., when we have product weights. In this scenario, to achieve an accuracy of $\mathcal{O}(\varepsilon)$, the cost for the multilevel approximation is $\mathcal{O}\big(\varepsilon^{-\frac{1}{\kappa}-\max(\frac{1}{\mu},\frac{d}{\beta})}\big)$, compared to $\mathcal{O}\big(\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}-\frac{d}{\beta}}\big)$ for the single-level approximation. This multilevel cost is near-optimal, since the cost of a single FE evaluation at the finest level is $\mathcal{O}(\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}})$ and the cost of interpolation at level $0$ at a single node is $\mathcal{O}(\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}})$.

We note that when $\rho>1$, we may encounter the scenario where the single-level and multilevel costs have the same order. This occurs when the setup cost $s^{\rho}N_{0}$ in Assumption (M3) dominates, either due to a large dimension $s$ or because the contribution to the cost from the FE solves is relatively small due to fast convergence of the FE error (i.e., $\beta$ is large). Such cases are exceptional, and it would be unnecessary to consider a multilevel approach in these situations.

4 Parametric regularity analysis

Analysis of the multilevel kernel approximation error for parametric PDEs requires bounds on the mixed derivatives with respect to both the physical variable $\boldsymbol{x}$ and the parametric variable $\boldsymbol{y}$ simultaneously. The proofs of all parametric regularity lemmas in this section are given in Appendix II.

The following lemma provides a bound on the Laplacian of the derivatives (with respect to the parametric variables) of the solution to (2.2). It shows that $u$ and its derivatives with respect to $\boldsymbol{y}$ possess sufficient spatial regularity to establish estimates for the FE error. Recall that the Stirling numbers are defined by (2.1).

Lemma 2.

Under Assumptions (A2), (A4), (A8) and (A6), for every $\boldsymbol{y}\in\Omega$, let $u(\cdot,\boldsymbol{y})\in V$ be the solution to the problem (2.2). Then for every $\boldsymbol{\nu}\in\mathcal{F}$ and all $\boldsymbol{y}\in\Omega$, we have that $\partial^{\boldsymbol{\nu}}u(\cdot,\boldsymbol{y})\in H^{2}(D)\cap V$ and

\[\|\Delta(\partial^{\boldsymbol{\nu}}u(\cdot,\boldsymbol{y}))\|_{L^{2}(D)}\lesssim\,\|f\|_{L^{2}(D)}\,(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(|\boldsymbol{m}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq1}S(\nu_{i},m_{i}),\tag{4.1}\]

where $\overline{\boldsymbol{b}}=(\overline{b}_{j})_{j\geq1}$ is defined in (2.3) and the implied constant is independent of $\boldsymbol{y}$.

The following lemma gives a bound on the derivatives with respect to the parametric variables of the FE error in $V$.

Lemma 3.

Under Assumptions (A2), (A4), (A8) and (A6), for every $\boldsymbol{y}\in\Omega$, let $u(\cdot,\boldsymbol{y})\in V$ be the solution to (2.2) and $u_{h}(\cdot,\boldsymbol{y})\in V_{h}$ be its piecewise linear FE approximation (2.7). Then for every $\boldsymbol{\nu}\in\mathcal{F}$, and sufficiently small $h>0$, we have

$$\|\partial^{\boldsymbol{\nu}}(u-u_h)(\cdot,\boldsymbol{y})\|_{V} \lesssim h\,\|f\|_{L^2(D)}\,(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\le\boldsymbol{\nu}}(|\boldsymbol{m}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\ge1}S(\nu_i,m_i), \tag{4.2}$$

where $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3) and the implied constant is independent of $h$ and $\boldsymbol{y}$.

We are interested in regularity estimates with respect to the $L^2$ norm. Thus, Lemma 4 presents a bound on the $L^2$ norm of the derivatives of the FE error, obtained via a duality argument.

Lemma 4.

Under Assumptions (A1), (A2), (A3) and (A6), for every $\boldsymbol{y}\in\Omega$, let $u(\cdot,\boldsymbol{y})\in V$ be the solution to (2.2) and let $u_h(\cdot,\boldsymbol{y})\in V_h$ be its piecewise linear FE approximation (2.7). Then for every $\boldsymbol{\nu}\in\mathcal{F}$, and sufficiently small $h>0$, we have

$$\|\partial^{\boldsymbol{\nu}}(u-u_h)(\cdot,\boldsymbol{y})\|_{L^2(D)} \lesssim h^2\,\|f\|_{L^2(D)}\,(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\le\boldsymbol{\nu}}(|\boldsymbol{m}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\ge1}S(\nu_i,m_i),$$

where $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3) and the implied constant is independent of $h$ and $\boldsymbol{y}$.

5 Multilevel error analysis

We are now ready to derive an estimate for the final component of the error in (3.1).

5.1 Estimating the multilevel FE error

Clearly, $\|g\|_{L^2(\Omega)} \equiv \|g\|_{L^2(\Omega_s)}$ for any $g$ that is a function solely of $\boldsymbol{y}_{1:s}$. Then, taking the $L^2(\Omega)$-norm and using the triangle inequality, we have from (2.16) the following pointwise bound for each $\boldsymbol{x}\in D$:

$$\|(I-I_\ell)(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{L^2(\Omega)} = \|(I-I_\ell)(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{L^2(\Omega_s)} \le \frac{\kappa}{[\varphi(N_\ell)]^{\frac{1}{4\lambda}}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{\frac{1}{2\lambda}}\|(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}, \tag{5.1}$$

where

$$\|(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)} \le \|(u^s-u^s_{h_\ell})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)} + \|(u^s-u^s_{h_{\ell-1}})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}. \tag{5.2}$$

It then follows that the sum over $\ell$ in (3.1) can be bounded by

$$\begin{aligned}
&\|(I-I_0)\,u_0\|_{L^2(\Omega_s\times D)} + \sum_{\ell=1}^{L}\|(I-I_\ell)(u_\ell-u_{\ell-1})\|_{L^2(\Omega_s\times D)}\\
&\le \frac{\|f\|_{V'}\,C_s(\lambda)}{[\varphi(N_0)]^{\frac{1}{4\lambda}}} + \sum_{\ell=1}^{L}\frac{\kappa}{[\varphi(N_\ell)]^{\frac{1}{4\lambda}}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{\frac{1}{2\lambda}}\\
&\qquad\times\bigg(\sqrt{\int_D \|(u^s-u^s_{h_\ell})(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}} + \sqrt{\int_D \|(u^s-u^s_{h_{\ell-1}})(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}}\,\bigg), \tag{5.3}
\end{aligned}$$

where we separate out the $\ell=0$ term because it is simply the standard single-level kernel interpolation error, for which the bound (2.17) applies. Combining (5.1) and (5.2) with the triangle inequality gives the last line.

Therefore, to bound the full multilevel kernel approximation error, it remains to estimate the individual finite element error terms in the expression above.

Theorem 5.

Under Assumptions (A1), (A2), (A3) and (A6), for $\alpha\ge1$ and weight parameters $(\gamma_{\mathfrak{u}})_{\mathfrak{u}\subset\mathbb{N}}$, let $u^s\in V$ be the solution to (2.2) and let $u^s_h\in V_h$ be the FE approximation (2.7) to $u^s$. Then the following estimate holds:

$$\sqrt{\int_D \|(u^s-u^s_h)(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}} \lesssim h^2\,\|f\|_{L^2(D)}\sqrt{\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^{\!2}},$$

where the implied constant is independent of $h$.

Proof.

From the definition of the $\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)$-norm given in (2.13) and the Cauchy–Schwarz inequality, it follows that for any $\boldsymbol{x}\in D$,

$$\begin{aligned}
&\|(u^s-u^s_h)(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\\
&= \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\bigg|\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\boldsymbol{x},\boldsymbol{y})\,\mathrm{d}\boldsymbol{y}_{-\mathfrak{u}}\bigg|^2\mathrm{d}\boldsymbol{y}_{\mathfrak{u}}\\
&\le \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg|\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\boldsymbol{x},\boldsymbol{y})\bigg|^2\,\mathrm{d}\boldsymbol{y}_{-\mathfrak{u}}\,\mathrm{d}\boldsymbol{y}_{\mathfrak{u}}.
\end{aligned}$$

Now, taking the $L^2$-norm with respect to $D$ and applying the Fubini–Tonelli theorem gives

$$\begin{aligned}
&\int_D \|(u^s-u^s_h)(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}\\
&\le \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^s}\int_D\bigg(\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\boldsymbol{x},\boldsymbol{y})\bigg)^2\,\mathrm{d}\boldsymbol{x}\,\mathrm{d}\boldsymbol{y}\\
&= \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^s}\bigg\|\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\cdot,\boldsymbol{y})\bigg\|^2_{L^2(D)}\,\mathrm{d}\boldsymbol{y}\\
&\lesssim \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{h^4\,\|f\|^2_{L^2(D)}}{\gamma_{\mathfrak{u}}}\int_{[0,1]^s}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^2\mathrm{d}\boldsymbol{y}\\
&= h^4\,\|f\|^2_{L^2(D)}\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^2.
\end{aligned}$$

In the step with $\lesssim$, we applied Lemma 4 for each $\mathfrak{u}$ with $\boldsymbol{\nu}$ given by $\nu_i=\alpha$ for $i\in\mathfrak{u}$ and $\nu_i=0$ for $i\notin\mathfrak{u}$; here $\boldsymbol{m}_{\mathfrak{u}}$ denotes the multi-index with $m_i=0$ for all $i\notin\mathfrak{u}$, and $\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\coloneqq\prod_{i\in\mathfrak{u}}\overline{b}_i^{\,m_i}$. The implied constant is independent of $h$. Taking the square root completes the proof. ∎

Applying Theorem 5 to (5.3), we derive the bound on the error of the multilevel kernel approximation of $u_L=u^s_{h_L}$, which is presented below. The constant $C_s(\lambda)$ in (2.17) is bounded from above by $\mathcal{D}_s(\lambda)$ defined in (5.4), since $|\boldsymbol{m}_{\mathfrak{u}}|!<(|\boldsymbol{m}_{\mathfrak{u}}|+5)!$ and $b_j\le\overline{b}_j$. We also use the fact that $\|f\|_{V'}\lesssim\|f\|_{L^2(D)}$ to include the first term in (5.3) in the summation as the $\ell=0$ term.

Theorem 6.

Under Assumptions (A1)–(A3) and (A6), for $\alpha\in\mathbb{N}$ and $\lambda\in(\frac{1}{2\alpha},1]$, the error of the multilevel kernel approximation of the dimension-truncated FE solution $u_L$ satisfies

$$\sum_{\ell=0}^{L}\|(I-I_\ell)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)} \lesssim \|f\|_{L^2(D)}\,\mathcal{D}_s(\lambda)\sum_{\ell=0}^{L}[\varphi(N_\ell)]^{-\frac{1}{4\lambda}}\,h_{\ell-1}^2,$$

with

$$[\mathcal{D}_s(\lambda)]^{2\lambda} \coloneqq \bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)\times\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^2\bigg)^{\lambda}, \tag{5.4}$$

where $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3) and $h_{-1}\coloneqq1$.

5.2 Choosing the weight parameters $\gamma_{\mathfrak{u}}$

We now choose the weights $\gamma_{\mathfrak{u}}$ to minimise $\mathcal{D}_s(\lambda)$, and select $\lambda$ to obtain the best possible convergence rate while ensuring that $\mathcal{D}_s(\lambda)$ is bounded independently of $s$.

Theorem 7.

Suppose that Assumptions (A1)–(A6) hold. The choice of weights

$$\gamma_{\mathfrak{u}} \coloneqq \Bigg(\frac{\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)}{\sqrt{|\mathfrak{u}|\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}}}\Bigg)^{\frac{2}{1+\lambda}} \tag{5.5}$$

for $\emptyset\neq\mathfrak{u}\subset\mathbb{N}$ with $|\mathfrak{u}|<\infty$, together with $\gamma_\emptyset\coloneqq1$, minimises $\mathcal{D}_s(\lambda)$ given in (5.4). Here $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3). In addition, if we take $\alpha\coloneqq\lfloor1/q+1/2\rfloor$ and $\lambda\coloneqq\frac{q}{2-q}$, then $\mathcal{D}_s(\lambda)$ can be bounded independently of the dimension $s$. Thus, the multilevel kernel approximation of the dimension-truncated FE solution of (2.2) satisfies

$$\sum_{\ell=0}^{L}\|(I-I_\ell)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)} \lesssim \|f\|_{L^2(D)}\sum_{\ell=0}^{L}[\varphi(N_\ell)]^{-(\frac{1}{2q}-\frac{1}{4})}\,h_{\ell-1}^2, \tag{5.6}$$

where we define $h_{-1}\coloneqq1$, and the implied constant is independent of $s$.

Proof.

The proof of this theorem follows closely that of [23, Theorem 4.5]. We seek to choose weights that bound the constant $\mathcal{D}_s(\lambda)$ independently of the dimension. Applying [33, Lemma 6.2] to (5.4), we find that the weights given by (5.5) minimise $[\mathcal{D}_s(\lambda)]^{2\lambda}$.

Substituting (5.5) into (5.4) and simplifying gives

$$[\mathcal{D}_s(\lambda)]^{\frac{2\lambda}{1+\lambda}} \le \sum_{\boldsymbol{m}\in\{0:\alpha\}^s}\bigg((|\boldsymbol{m}|+5)!\prod_{i=1}^{s}\beta_i^{m_i}\bigg)^{\frac{2\lambda}{1+\lambda}}, \tag{5.7}$$

where we use the bound $\max(|\mathfrak{u}|,1)\le[e^{1/e}]^{|\mathfrak{u}|}$ and define $\beta_i\coloneqq S_{\max}(\alpha)\,[2e^{1/e}\zeta(2\alpha\lambda)]^{\frac{1}{2\lambda}}\,\overline{b}_i$, with $S_{\max}(\alpha)\coloneqq\max_{1\le m\le\alpha}S(\alpha,m)$.

Defining a new sequence $d_j\coloneqq\beta_{\lceil j/\alpha\rceil}$ for $j\ge1$, we can write, for $\boldsymbol{m}\in\{0:\alpha\}^s$,

$$\prod_{i=1}^{s}\beta_i^{m_i} = \prod_{i\in\mathfrak{v}_{\boldsymbol{m}}} d_i,$$

where $\mathfrak{v}_{\boldsymbol{m}}\coloneqq\{1,2,\ldots,m_1,\ \alpha+1,\alpha+2,\ldots,\alpha+m_2,\ \ldots,\ (s-1)\alpha+1,\ldots,(s-1)\alpha+m_s\}$. The cardinality of $\mathfrak{v}_{\boldsymbol{m}}$ is $|\mathfrak{v}_{\boldsymbol{m}}|=\sum_{i=1}^{s}m_i=|\boldsymbol{m}|$. Then, following [23], the upper bound (5.7) can be further bounded above by

$$\sum_{\substack{\mathfrak{v}\subset\mathbb{N}\\ |\mathfrak{v}|<\infty}}\bigg((|\mathfrak{v}|+5)!\prod_{i\in\mathfrak{v}}d_i\bigg)^{\frac{2\lambda}{1+\lambda}} \le \sum_{\ell\ge0}[(\ell+5)!]^{\frac{2\lambda}{1+\lambda}}\,\frac{1}{\ell!}\bigg(\sum_{i\ge1}d_i^{\frac{2\lambda}{1+\lambda}}\bigg)^{\ell}. \tag{5.8}$$

Now, from Assumption (A6) we know that $\sum_{j\ge1}\overline{b}_j^{\,q}<\infty$, so we choose $\lambda$ such that $\frac{2\lambda}{1+\lambda}=q$, which gives $\lambda=\frac{q}{2-q}$. Then

$$\sum_{i\ge1}d_i^{\frac{2\lambda}{1+\lambda}} = \sum_{i\ge1}d_i^{\,q} = \alpha\sum_{i\ge1}\beta_i^{\,q} \le \alpha\,\max\big(1,\,S_{\max}(\alpha)\,[2e^{1/e}\zeta(2\alpha\lambda)]^{\frac{1}{2\lambda}}\big)^{q}\sum_{j\ge1}\overline{b}_j^{\,q} < \infty,$$

provided that $2\alpha\lambda>1$, which can be ensured by choosing $\alpha>1/q-1/2$. This condition is satisfied by taking $\alpha=\lfloor(\frac{1}{q}-\frac{1}{2})+1\rfloor=\lfloor\frac{1}{q}+\frac{1}{2}\rfloor$.

To show that the full series (5.8) converges, we apply the ratio test with the terms of the series given by $T_\ell\coloneqq[(\ell+5)!]^{q}\,\frac{1}{\ell!}\big(\sum_{i\ge1}d_i^{\,q}\big)^{\ell}$. This shows that $\mathcal{D}_s(\lambda)$ can be bounded from above independently of the dimension, since

$$\lim_{\ell\to\infty}\bigg|\frac{T_{\ell+1}}{T_\ell}\bigg| = \bigg(\sum_{i\ge1}d_i^{\,q}\bigg)\lim_{\ell\to\infty}\frac{(\ell+6)^{q}}{\ell+1} \le \bigg(\sum_{i\ge1}d_i^{\,q}\bigg)\lim_{\ell\to\infty}\frac{(\ell+1)^{q}+5^{q}}{\ell+1} = \bigg(\sum_{i\ge1}d_i^{\,q}\bigg)\lim_{\ell\to\infty}\bigg((\ell+1)^{q-1}+\frac{5^{q}}{\ell+1}\bigg) = 0,$$

where the inequality uses the fact that $(\sum_i a_i)^q\le\sum_i a_i^q$ for $q\in(0,1]$ and non-negative $a_i$.

We conclude that, for the choice of weights (5.5), $\mathcal{D}_s(\lambda)$ can be bounded independently of $s$. Equation (5.6) follows by applying this bound on $\mathcal{D}_s(\lambda)$ in Theorem 6. ∎

Remark 8.

The recent paper [41] demonstrates that it is possible to achieve double the convergence rate of the $L^2$ kernel approximation error for functions that are sufficiently smooth, by leveraging the orthogonal projection property of the kernel interpolant, using a method reminiscent of the Aubin–Nitsche trick. The theory from [41] can be directly applied to our theoretical results, i.e., one should expect a convergence rate of $\frac{1}{2\lambda}$ rather than the $\frac{1}{4\lambda}$ implied by (2.16), since the analytic dependence of the solution $u(\boldsymbol{x},\boldsymbol{y})$ of (2.2) on $\boldsymbol{y}$ implies that $u(\boldsymbol{x},\cdot)\in\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)$ for all $\alpha\ge1$. However, if we impose restrictions to ensure dimension independence of the constant $\mathcal{D}_s(\lambda)$, then we cannot simply double the convergence rate in (5.6). Instead, the convergence rate for the QMC error determined in Theorem 7 remains unmodified, while the choices of $\alpha$ and $\lambda$ change: we now choose $\alpha=\lfloor\frac{1}{2q}+\frac{3}{4}\rfloor$ and $\lambda=\frac{2q}{2-q}$, which means that $\alpha$ can be selected smaller than before whilst maintaining the same rate of convergence and dimension independence of the constant.

The main result for the total error of the multilevel kernel approximation method is presented below.

Theorem 9.

Under Assumptions (A1)–(A6), and with $\alpha$, $\lambda$ and the weights $\gamma_{\mathfrak{u}}$ chosen as in Theorem 7, the total error of the multilevel kernel approximation of $u$ satisfies

$$\|u-I_L^{\mathrm{ML}}u\|_{L^2(\Omega\times D)} \lesssim \|f\|_{L^2(D)}\bigg(s^{-(\frac{1}{p}-\frac{1}{2})} + h_L^2 + \sum_{\ell=0}^{L}[\varphi(N_\ell)]^{-(\frac{1}{2q}-\frac{1}{4})}\,h_{\ell-1}^2\bigg),$$

where $h_{-1}\coloneqq1$, and the implied constant is independent of the dimension $s$.

Proof.

The result follows by combining the dimension truncation and FE bounds given in (2.6) and (2.9), respectively, with the multilevel kernel approximation error bound (5.6), applied to the total error decomposition (3.1). ∎

Theorem 9 shows that the error assumptions of Theorem 1 hold with $\kappa=\frac{1}{p}-\frac{1}{2}$, $\mu=\frac{1}{2q}-\frac{1}{4}$ and $\beta=2$.

Remark 10.

A CBC construction with weights of the form (5.5) is too costly to implement. Following [23], it can be shown that Theorem 9 continues to hold for the SPOD weights

$$\gamma_{\mathfrak{u}} \coloneqq \sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}[(|\boldsymbol{m}_{\mathfrak{u}}|+5)!]^{\frac{2}{1+\lambda}}\prod_{i\in\mathfrak{u}}\bigg(\frac{\overline{b}_i^{\,m_i}\,S(\alpha,m_i)}{\sqrt{2e^{1/e}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}.$$

However, these SPOD weights still incur a cost that is quadratic in the dimension when evaluating the kernel basis functions to construct the approximation, compared to the linear cost of product weights. Instead, the paper [25] proposes serendipitous product weights, whereby the order-dependent part of the weights is simply omitted, giving in our case

$$\gamma_{\mathfrak{u}} \coloneqq \prod_{i\in\mathfrak{u}}\Bigg(\sum_{m=1}^{\alpha}\frac{\overline{b}_i^{\,m}\,S(\alpha,m)}{\sqrt{2e^{1/e}\zeta(2\alpha\lambda)}}\Bigg)^{\frac{2}{1+\lambda}}. \tag{5.9}$$

For these weights, the upper bound on the kernel approximation error continues to enjoy the same theoretical convergence rates as in Theorem 9, although the implied constant is no longer independent of $s$.
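For concreteness, here is a minimal Python sketch of the product weights (5.9); the decay sequence `bbar` is a hypothetical stand-in for $(\overline{b}_j)$, and the Riemann zeta value comes from SciPy.

```python
import numpy as np
from scipy.special import zeta

def stirling2(n, k):
    # Stirling numbers of the second kind via the standard recurrence.
    S = np.zeros((n + 1, k + 1))
    S[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i, j] = j * S[i - 1, j] + S[i - 1, j - 1]
    return S[n, k]

def serendipitous_gamma_j(bbar, alpha, lam):
    # gamma_u = prod_{j in u} gamma_j, with gamma_j the bracket in (5.9).
    denom = np.sqrt(2.0 * np.e**(1.0 / np.e) * zeta(2.0 * alpha * lam))
    inner = np.array([sum(b**m * stirling2(alpha, m) for m in range(1, alpha + 1))
                      for b in bbar]) / denom
    return inner ** (2.0 / (1.0 + lam))

bbar = 0.5 * np.arange(1, 11, dtype=float) ** (-2.0)  # hypothetical decay sequence
print(serendipitous_gamma_j(bbar, alpha=1, lam=0.6))
```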

The paper [25] demonstrates that serendipitous product weights offer comparable performance to SPOD weights for easier problems, and superior performance on problems where SPOD weights may fail completely. A possible explanation is that the theoretically derived SPOD weights are of poor quality due to overestimates in the bounds. A second, more practical explanation is that the magnitude of the SPOD weights increases substantially as the dimension grows, which leads to larger, more peaked kernel basis functions; the resulting approximations also tend to be very peaked, and hence less “smooth”. It is worth noting that integration problems are more robust to overestimates that may degrade the quality of the weights, since the weights do not appear directly in the computation of the integral. This is not true for the kernel approximation problem, where the weights appear explicitly in the kernel basis functions.

6 Implementing the ML kernel approximation

In this section, we outline how to compute the multilevel approximation for the PDE problem efficiently, using a matrix-vector representation of (1.4). Throughout, we consider a fixed truncation dimension and so omit the superscript $s$.

First, we outline how to construct the single-level kernel interpolant for the PDE problem. As in [23], the single-level kernel interpolant applied to the FE approximation of the PDE solution is given by (2.14) with $g=u_h(\boldsymbol{x},\cdot)$, for $\boldsymbol{x}\in D$. Each kernel interpolant coefficient $a_{N,h,k}\in V_h$ is now a FE function, which we can expand in the FE basis $\{\phi_{h,i}\}_{i=1}^{M}$, with $M=\dim(V_h)$, to write

$$I_N u_h(\boldsymbol{x},\boldsymbol{y}) = \sum_{k=0}^{N-1}a_{N,h,k}(\boldsymbol{x})\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{y}) = \sum_{i=1}^{M}\sum_{k=0}^{N-1}a_{N,h,k,i}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{y})\,\phi_{h,i}(\boldsymbol{x}). \tag{6.1}$$

To enforce interpolation of the FE solution, we equate the coefficients of the FE basis functions $\phi_{h,i}$ in (2.8) and (6.1) at each lattice point $\boldsymbol{y}=\boldsymbol{t}_{k'}$, leading to the requirement

$$u_{h,i}(\boldsymbol{t}_{k'}) = \sum_{k=0}^{N-1}a_{N,h,k,i}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{t}_{k'}) \quad\text{for all } k'=0,1,\ldots,N-1,\; i=1,2,\ldots,M.$$

Thus, the coefficients $\boldsymbol{A}_{N,h}\coloneqq[a_{N,h,k,i}]_{0\le k\le N-1,\,1\le i\le M}\in\mathbb{R}^{N\times M}$ are the solution to the matrix equation

$$K_{N,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_{N,h} = \boldsymbol{U}_{N,h},$$

where $K_{N,\alpha,\boldsymbol{\gamma}}=[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{t}_{k'})]_{k,k'=0}^{N-1}$, and $\boldsymbol{U}_{N,h}=[u_{h,i}(\boldsymbol{t}_k)]_{0\le k\le N-1,\,1\le i\le M}$ is the $N\times M$ matrix of FE nodal values at the lattice points.
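As an illustration, the following Python sketch solves the single-level system via the FFT by exploiting the circulant structure of the kernel matrix on a lattice point set. The Korobov-type kernel with $\alpha=1$ and product weights, with building block $\eta(t)=2\pi^2 B_2(t)$, is an assumption here, and the generating vector `z`, the weights `gamma` and the nodal values `U` are toy placeholders.

```python
import numpy as np

def kernel_first_column(z, N, gamma):
    # Lattice points t_k = frac(k z / N); by shift invariance of the periodic
    # kernel, K[k, k'] = c[(k - k') mod N], so column 1 determines the matrix.
    t = np.mod(np.outer(np.arange(N), z) / N, 1.0)
    eta = 2 * np.pi**2 * (t**2 - t + 1.0 / 6.0)   # Bernoulli B_2 building block
    return np.prod(1.0 + gamma * eta, axis=1)     # c[k] = K(t_k, t_0)

def circulant_solve(c, U):
    # Solve K A = U column by column: diagonalise K with one FFT of c.
    return np.real(np.fft.ifft(np.fft.fft(U, axis=0) / np.fft.fft(c)[:, None],
                               axis=0))

N, z = 16, np.array([1, 7, 5])         # toy lattice rule, s = 3
gamma = np.array([0.9, 0.5, 0.1])      # toy product weights
U = np.random.rand(N, 4)               # stand-in for the FE nodal values U_{N,h}
A = circulant_solve(kernel_first_column(z, N, gamma), U)
```

Only the first column of the kernel matrix is ever formed, so the solve costs $\mathcal{O}(MN\log N)$ rather than the $\mathcal{O}(MN^2)$ of a dense solve.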

Next, we present a similar matrix-vector representation for the multilevel kernel approximation (1.4). Since we are using an embedded lattice rule, $N_\ell$ is a factor of $N_{\ell-1}$ and the point sets across the levels are embedded, with an ordering such that

$$\boldsymbol{t}_{\ell,k} = \boldsymbol{t}_{\ell-1,\,k(N_{\ell-1}/N_\ell)} \quad\text{for } \ell=1,2,\ldots,L. \tag{6.2}$$

It follows that $\boldsymbol{t}_{\ell,k}=\boldsymbol{t}_{0,\,k(N_0/N_\ell)}$, and the kernel matrices for the different levels are nested with the same structure.
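A quick sketch verifying the nesting property (6.2) for rank-1 lattice points $\boldsymbol{t}_k=\{k\boldsymbol{z}/N\}$; the generating vector is again a toy example.

```python
import numpy as np

z = np.array([1, 7, 5])              # toy generating vector
N = [64, 16, 4]                      # N_0 > N_1 > N_2, each dividing the previous
pts = [np.mod(np.outer(np.arange(n), z) / n, 1.0) for n in N]
for level in range(1, len(N)):
    stride = N[level - 1] // N[level]
    # (6.2): the level-l points are every stride-th point of level l-1
    assert np.allclose(pts[level], pts[level - 1][::stride])
print("nesting property (6.2) verified")
```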

Since the FE spaces are also nested, we can expand the difference on each level $\ell\ge1$ in the FE basis for $V_\ell=V_{h_\ell}$:

$$u_\ell(\cdot,\boldsymbol{y}) - u_{\ell-1}(\cdot,\boldsymbol{y}) = \sum_{i=1}^{M_\ell}\big[u_{\ell,i}(\boldsymbol{y})-\overline{u}_{\ell-1,i}(\boldsymbol{y})\big]\,\phi_{h_\ell,i}\in V_\ell, \tag{6.3}$$

where $M_\ell\coloneqq\dim(V_\ell)$ and $[\overline{u}_{\ell-1,i}(\boldsymbol{y})]_{i=1}^{M_\ell}$ are the FE coefficients of $u_{\ell-1}(\cdot,\boldsymbol{y})$ after (exact) interpolation onto $V_\ell$.

Similar to (6.1), the kernel approximation on level $\ell$ is

$$I_\ell(u_\ell-u_{\ell-1})(\boldsymbol{x},\boldsymbol{y}) = \sum_{i=1}^{M_\ell}\sum_{k=0}^{N_\ell-1}a_{\ell,k,i}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{\ell,k},\boldsymbol{y})\,\phi_{h_\ell,i}(\boldsymbol{x}), \tag{6.4}$$

and we now choose the coefficients $a_{\ell,k,i}=a_{N_\ell,h_\ell,k,i}$ such that (6.4) interpolates the difference (6.3) at each lattice point $\boldsymbol{y}=\boldsymbol{t}_{\ell,k}$. This leads to the $L+1$ matrix equations

$$K_{N_\ell,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_\ell = \begin{cases}\boldsymbol{U}_0 & \text{for } \ell=0,\\ \boldsymbol{U}_\ell - \overline{\boldsymbol{U}}_{\ell-1} & \text{for } \ell=1,\ldots,L,\end{cases} \tag{6.5}$$

where now each coefficient matrix is $\boldsymbol{A}_\ell\coloneqq\boldsymbol{A}_{N_\ell,h_\ell}=[a_{\ell,k,i}]\in\mathbb{R}^{N_\ell\times M_\ell}$, $\boldsymbol{U}_\ell\coloneqq\boldsymbol{U}_{N_\ell,h_\ell}=[u_{\ell,i}(\boldsymbol{t}_{\ell,k})]\in\mathbb{R}^{N_\ell\times M_\ell}$ is the matrix of FE coefficients at the lattice points for level $\ell$, and $\overline{\boldsymbol{U}}_{\ell-1}=[\overline{u}_{\ell-1,i}(\boldsymbol{t}_{\ell,k})]\in\mathbb{R}^{N_\ell\times M_\ell}$ is the matrix of FE coefficients of $u_{\ell-1}(\cdot,\boldsymbol{t}_{\ell,k})\in V_{\ell-1}$ interpolated onto $V_\ell$ and then evaluated at the lattice points for level $\ell$.

To obtain each coefficient matrix $\boldsymbol{A}_\ell$, we exploit the circulant structure of the kernel matrices to solve the linear systems (6.5) using the FFT. As a result, we only require the first column of each $K_{N_\ell,\alpha,\boldsymbol{\gamma}}$; furthermore, since the lattice points are nested as in (6.2), we obtain this column by taking every $(N_0/N_\ell)$th entry (starting from the first) of the first column of $K_{N_0,\alpha,\boldsymbol{\gamma}}$. Similarly, since the FE spaces are nested, we obtain $\overline{\boldsymbol{U}}_{\ell-1}$ for $\ell\ge1$ from the matrix $\boldsymbol{U}_{\ell-1}$ of the previous level. Specifically, we take every $(N_{\ell-1}/N_\ell)$th row of $\boldsymbol{U}_{\ell-1}$ and interpolate it onto $V_\ell$ to give $\overline{\boldsymbol{U}}_{\ell-1}$. This procedure is given in Algorithm 1.

Algorithm 1 Constructing the ML approximation
1: Compute the 1st column of $K_{N_0,\alpha,\boldsymbol{\gamma}}$ ▷ kernel matrix for the entire lattice point set
2: Compute $\boldsymbol{U}_0$ ▷ FE solutions in $V_0$ at all lattice points
3: Solve $K_{N_0,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_0=\boldsymbol{U}_0$ using FFT ▷ solve for coefficients
4: for $\ell=1,\ldots,L$ do
5:  Compute the 1st column of $K_{N_\ell,\alpha,\boldsymbol{\gamma}}$ ▷ every $(N_0/N_\ell)$th entry of column 1 of $K_{N_0,\alpha,\boldsymbol{\gamma}}$
6:  Compute $\boldsymbol{U}_\ell$ ▷ FE solutions in $V_\ell$ at the $N_\ell$ lattice points
7:  Interpolate every $(N_{\ell-1}/N_\ell)$th row of $\boldsymbol{U}_{\ell-1}$ onto $V_\ell$ to obtain $\overline{\boldsymbol{U}}_{\ell-1}$
8:  Solve $K_{N_\ell,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_\ell=\boldsymbol{U}_\ell-\overline{\boldsymbol{U}}_{\ell-1}$ using FFT ▷ solve for coefficients
9: end for
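The following Python sketch mirrors Algorithm 1, reusing `kernel_first_column` and `circulant_solve` from the single-level sketch above; `fe_solve` and `prolong` are hypothetical helpers for the FE solve on $V_\ell$ and for the (exact) interpolation of FE coefficient rows from $V_{\ell-1}$ to $V_\ell$.

```python
import numpy as np

def build_ml_coefficients(z, Ns, gamma, fe_solve, prolong):
    # Ns = [N_0, ..., N_L], decreasing, each dividing N_0.
    c0 = kernel_first_column(z, Ns[0], gamma)                  # step 1
    t0 = np.mod(np.outer(np.arange(Ns[0]), z) / Ns[0], 1.0)
    A, U_prev = [], None
    for level in range(len(Ns)):
        stride = Ns[0] // Ns[level]
        c = c0[::stride]                                       # step 5: subsample column 1
        U = np.array([fe_solve(level, t) for t in t0[::stride]])  # steps 2 and 6
        if level == 0:
            rhs = U
        else:
            sub = Ns[level - 1] // Ns[level]
            rhs = U - prolong(level, U_prev[::sub])            # step 7: Ubar_{level-1}
        A.append(circulant_solve(c, rhs))                      # steps 3 and 8: FFT solve
        U_prev = U
    return A                                                   # coefficient matrices A_0, ..., A_L
```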

On completion of Algorithm 1, we have a collection of coefficient matrices $\{\boldsymbol{A}_\ell\}_{\ell=0}^{L}$ that we can use to compute $I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*)$, via (1.4) and (6.4), approximating the PDE solution at any $(\boldsymbol{x}^*,\boldsymbol{y}^*)\in D\times\Omega_s$. By using the embedded property of the lattice points and the local support of the FE basis, we can evaluate $I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*)$ efficiently at cost $\mathcal{O}(N_0)$, as we explain below.

We can write (6.4) as

$$\begin{aligned}
I_\ell(u_\ell-u_{\ell-1})(\boldsymbol{x}^*,\boldsymbol{y}^*) &= \sum_{k=0}^{N_\ell-1}\Bigg(\underbrace{\sum_{\substack{i=1\\ \boldsymbol{x}^*\in\mathrm{supp}(\phi_{h_\ell,i})}}^{M_\ell}a_{\ell,k,i}\,\phi_{h_\ell,i}(\boldsymbol{x}^*)}_{[\boldsymbol{a}_\ell(\boldsymbol{x}^*)]_k}\Bigg)\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{\ell,k},\boldsymbol{y}^*)\\
&= \sum_{k'=0}^{N_0-1}\mathbb{I}\big(k'N_\ell\equiv 0\ (\mathrm{mod}\ N_0)\big)\,[\boldsymbol{a}_\ell(\boldsymbol{x}^*)]_{\lfloor k'N_\ell/N_0\rfloor}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{0,k'},\boldsymbol{y}^*).
\end{aligned}$$

Effectively, for each $\ell$, we compute the sum over $i$ in (6.4) by “interpolating in space” along the columns of $\boldsymbol{A}_\ell$ to obtain a kernel coefficient vector at $\boldsymbol{x}^*$, denoted $\boldsymbol{a}_\ell(\boldsymbol{x}^*)\in\mathbb{R}^{N_\ell}$. Since the FE basis functions are locally supported, only a small number of indices $i$ in (6.4) yield a nonzero $\phi_{h_\ell,i}(\boldsymbol{x}^*)$. Multiplying the corresponding columns of $\boldsymbol{A}_\ell$ by $\phi_{h_\ell,i}(\boldsymbol{x}^*)$ and summing them then gives $\boldsymbol{a}_\ell(\boldsymbol{x}^*)$ at a cost of $\mathcal{O}(N_\ell)$. In the second equality above, we used the embedded property (6.2) of the lattice points to rewrite this as a sum over the entire lattice point set; the indicator function $\mathbb{I}$ accounts for the extra terms that were added.

Summing over $\ell=0,1,\ldots,L$ gives a single kernel coefficient vector $\boldsymbol{a}(\boldsymbol{x}^*)\in\mathbb{R}^{N_0}$,

$$[\boldsymbol{a}(\boldsymbol{x}^*)]_{k'} = [\boldsymbol{a}_0(\boldsymbol{x}^*)]_{k'} + \sum_{\ell=1}^{L}\mathbb{I}\big(k'N_\ell\equiv 0\ (\mathrm{mod}\ N_0)\big)\,[\boldsymbol{a}_\ell(\boldsymbol{x}^*)]_{\lfloor k'N_\ell/N_0\rfloor},$$

at a cost of $\mathcal{O}(N_0)$. Since the points are embedded, $\sum_{\ell=0}^{L}N_\ell=\mathcal{O}(N_0)$, and hence the total cost to construct the kernel coefficient vector $\boldsymbol{a}(\boldsymbol{x}^*)$ is $\mathcal{O}(N_0)$.

Similarly, we store all of the kernel evaluations at $\boldsymbol{y}^*$, anchored at each of the lattice points, in a single vector $\boldsymbol{\kappa}(\boldsymbol{y}^*)\coloneqq[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{y}^*)]_{k=0}^{N_0-1}$, which costs $\mathcal{O}(s^\rho\alpha^\varsigma N_0)$, where $\rho,\varsigma>0$ depend on the structure of the kernel and are specified in Section 2.6.

Finally, the ML kernel approximation of $u(\boldsymbol{x}^*,\boldsymbol{y}^*)$ is computed via the product

$$u(\boldsymbol{x}^*,\boldsymbol{y}^*) \approx I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*) = \boldsymbol{a}(\boldsymbol{x}^*)^{\top}\boldsymbol{\kappa}(\boldsymbol{y}^*).$$

Combining the costs above leads to a total cost of $\mathcal{O}(s^\rho\alpha^\varsigma N_0)$ for evaluating $I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*)$.
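A sketch of the evaluation step, again under the $\alpha=1$ product-weight kernel assumption from the earlier sketches; `phi_eval(level, x_star)` is a hypothetical helper returning the vector of FE basis values $\phi_{h_\ell,i}(\boldsymbol{x}^*)$ (sparse in practice, so each product touches only a few columns of $\boldsymbol{A}_\ell$).

```python
import numpy as np

def evaluate_ml(A, Ns, z, gamma, x_star, y_star, phi_eval):
    # Assemble a(x*) by scattering each level onto the full point set ...
    N0 = Ns[0]
    a = np.zeros(N0)
    for level, A_l in enumerate(A):
        a_l = A_l @ phi_eval(level, x_star)   # "interpolate in space", length N_level
        a[::N0 // Ns[level]] += a_l           # nonzero only where k' N_level = 0 mod N0
    # ... then form kappa(y*) for the assumed alpha = 1 product-weight kernel.
    t0 = np.mod(np.outer(np.arange(N0), z) / N0, 1.0)
    d = np.mod(t0 - y_star, 1.0)
    kappa = np.prod(1.0 + gamma * (2 * np.pi**2 * (d**2 - d + 1.0 / 6.0)), axis=1)
    return a @ kappa                          # I_L^ML u(x*, y*) at O(N0) cost
```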

7 Numerical experiments

In this section, we present the results of numerical experiments conducted using the high-performance computing cluster Katana [38].

7.1 Problem specification

The parametric PDE.

We consider the spatial domain $D=(0,1)^2$ with source term $f(\boldsymbol{x})=x_2$. For the parametric coefficient defined in (1.2), we choose $\psi_0\equiv1$ and

$$\psi_j(\boldsymbol{x}) = \frac{C}{\sqrt{6}}\,j^{-\vartheta}\sin(j\pi x_1)\sin(j\pi x_2) \qquad\text{for } j\ge1,$$

where $C>0$ is a scaling parameter used to vary the magnitude of the random coefficient, and $\vartheta>1$ is a decay parameter determining how quickly the importance of the successive random parameters decreases. The factor $\frac{1}{\sqrt{6}}$ is included for easier comparison with the experiments in [23] and [25]. For this choice, the bounds on the coefficient assumed in (A1) are given by $a_{\min}\coloneqq1-\frac{C}{\sqrt{6}}\zeta(\vartheta)$ and $a_{\max}\coloneqq1+\frac{C}{\sqrt{6}}\zeta(\vartheta)$. Thus, from (2.3) we have

$$b_j = \frac{C\,j^{-\vartheta}}{a_{\min}\sqrt{6}} \qquad\text{and}\qquad \overline{b}_j = \frac{C\pi\,j^{1-\vartheta}}{a_{\min}\sqrt{6}} \qquad\text{for } j\ge1.$$

Assumptions (A2) and (A6) hold provided that $1>p>\frac{1}{\vartheta}$ and $1>q>\frac{1}{\vartheta-1}$. Having chosen $p>\frac{1}{\vartheta}$, we can set $q=\frac{p}{1-p}$.

We present numerical results for two choices of parameters, $C=1.5$, $\vartheta=3.6$ and $C=0.2$, $\vartheta=1.2$, the first being easier than the second, and consider the truncated parametric domain $[0,1]^{64}$, i.e., $s=64$.
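The following small sketch tabulates $b_j$ and $\overline{b}_j$ from (2.3) for the two parameter sets, which makes visible why the second set is harder: $\overline{b}_j\sim j^{1-\vartheta}$ decays far more slowly for $\vartheta=1.2$.

```python
import numpy as np
from scipy.special import zeta

def coefficient_sequences(C, theta, jmax=64):
    j = np.arange(1, jmax + 1, dtype=float)
    a_min = 1.0 - C / np.sqrt(6.0) * zeta(theta)      # lower coefficient bound
    b = C * j**(-theta) / (a_min * np.sqrt(6.0))
    bbar = C * np.pi * j**(1.0 - theta) / (a_min * np.sqrt(6.0))
    return b, bbar

for C, theta in [(1.5, 3.6), (0.2, 1.2)]:
    b, bbar = coefficient_sequences(C, theta)
    print(f"C={C}, theta={theta}: bbar_1={bbar[0]:.3f}, bbar_64={bbar[-1]:.2e}")
```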

Smoothness, weights, and CBC construction.

The smoothness parameter of the kernel is fixed at $\alpha=1$. We refrain from using higher values of $\alpha$ to avoid stability issues stemming from inaccurate eigenvalue computation via the FFT in double precision. Such issues arise even for small values of $\alpha$ and relatively small $N$, necessitating the use of arbitrary-precision computation (see, e.g., [31]).

Following Remark 10, we use the serendipitous weights (5.9) in our experiments. We take $\alpha=1$ and $\lambda=\frac{1}{2\alpha}+0.1=0.6$ in (5.9), and use the CBC algorithm described in [30] to construct one embedded lattice rule for each parameter set, designed for the range $2^7\le N\le2^{17}$ and up to dimension $s=64$.

Convergence and cost.

According to (2.17) and Theorem 6, and ignoring the dependence of the constants on $s$ (which is fixed here), we theoretically have $\mathcal{O}(N_\ell^{-\mu})$ convergence in Assumption (M2) of Theorem 1 with $\mu=1/(4\lambda)\approx0.417$. As explained in Remark 8, since the PDE solution has a smoothness order higher than $\alpha=1$, the theoretical convergence rate doubles to $\mu=1/(2\lambda)\approx0.833$.

Recall that the computational cost to achieve an error $\varepsilon>0$ is given by (2.19) for the single-level approximation and by (3.4) for the multilevel approximation. Our spatial domain is the unit square, so $d=2$. The FE convergence rate in Theorem 1 is determined by Theorem 6 to be $\beta=2$, which also applies to the single-level FE error as indicated by (2.9). Since the serendipitous weights (5.9) are product weights, we have $\rho=1$. With a fixed dimension $s=64$, we may informally set $\kappa=\infty$ and $\tau\approx d$ in Theorem 1. The third case in (3.4) is not relevant when using product weights. Hence, we obtain the theoretical cost bounds

$$\begin{cases}\mathrm{cost}(I_N u_h^s) \lesssim \varepsilon^{-\frac{1}{\mu}-1} = \varepsilon^{-2\lambda-1} = \varepsilon^{-2.2},\\[2pt] \mathrm{cost}(I_L^{\mathrm{ML}}u) \lesssim \varepsilon^{-\max(\frac{1}{\mu},1)} = \varepsilon^{-\max(2\lambda,1)} = \varepsilon^{-1.2}.\end{cases} \tag{7.1}$$

7.2 Diagnostic plots

FE.

Figure 1 plots the computational cost, measured by CPU time in seconds, of assembling and solving the FE linear system required to obtain the PDE solution, together with the FE error
$$\|u^s_{h^*}-u^s_h\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{N}\sum_{k=0}^{N-1}\|u^s_{h^*}(\cdot,\boldsymbol{t}_k)-u^s_h(\cdot,\boldsymbol{t}_k)\|^2_{L^2(D)}}$$
for mesh widths $h\in\{2^{-3},\ldots,2^{-8}\}$, with respect to the reference solution with mesh width $h^*=2^{-9}$ and $s=64$. We use $N=2^{16}$ lattice points for the integral over $\Omega$, and the $L^2(D)$ norm is computed exactly from the FE coefficients. We observe an error convergence rate of $\mathcal{O}(h^2)$ and a cost of $\mathcal{O}(h^{-2})$, demonstrating that indeed $\beta=2$ and $\tau\approx d=2$ in Theorem 1.

Figure 1: Cost of the FE solve (left) and the FE error $\|u^s_{h^*}-u^s_h\|_{L^2(\Omega\times D)}$ (right), demonstrating $\beta=2$ and $\tau\approx d=2$ in Theorem 1.
Figure 2: Comparing dimension truncation error with FE error.

Dimension truncation.

For $s\in\{2^3,\ldots,2^7\}$ and reference values $s^*=2^8$ and $h=2^{-9}$, we compute the error $\|u^{s^*}_h-u^s_h\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{N}\sum_{k=0}^{N-1}\|u^{s^*}_h(\cdot,\boldsymbol{t}_k)-u^s_h(\cdot,\boldsymbol{t}_k)\|^2_{L^2(D)}}$ using $N=2^{16}$ lattice points. In Figure 2 we overlay the dimension truncation errors with the FE errors for $h\in\{2^{-3},\ldots,2^{-8}\}$ on the horizontal axis, with the vertical axis on the left in terms of $s$ (purple circles) and on the right in terms of $h$ (blue triangles). (The relative slope between the overlaid plots is arbitrary.) We observe $\mathcal{O}(s^{-\kappa})$ convergence (cf. the dimension truncation assumption of Theorem 1) with $\kappa=4.06$ for the easier problem $C=1.5$, $\vartheta=3.6$, and $\kappa=1.72$ for the harder problem $C=0.2$, $\vartheta=1.2$. For both problems with $s=2^6=64$, the dimension truncation errors are well below $10^{-6}$, i.e., smaller than all the FE errors for $h\in\{2^{-3},\ldots,2^{-8}\}$ in Figure 1.

Single-level kernel interpolation.

Following [23], we use an efficient formula to estimate the single-level kernel interpolation error

$$\|(I-I_N)u^s_{h^*}\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{RN}\sum_{r=1}^{R}\sum_{k=0}^{N-1}\|u^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k) - I_N u^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k)\|^2_{L^2(D)}},$$

where $\{\boldsymbol{y}_r\}_{r=1}^{R}$ is a sequence of Sobol points and $\{\boldsymbol{t}_k\}_{k=0}^{N-1}$ is a sequence of lattice points, with $h^*=2^{-9}$, $s=64$, $R=10$, and $N\in\{2^4,\ldots,2^{16}\}$. In Figure 3 we overlay the interpolation errors with the FE errors for $h\in\{2^{-3},\ldots,2^{-8}\}$ on the horizontal axis, with the vertical axis on the left in terms of $h$ (blue triangles) and on the right in terms of $N$ (red circles). We observe $\mathcal{O}(N^{-1.5})$ convergence for the easier problem $C=1.5$, $\vartheta=3.6$, and $\mathcal{O}(N^{-0.99})$ for the harder problem $C=0.2$, $\vartheta=1.2$.
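A sketch of this estimator; `ref_solve`, `interp_eval` and `l2_norm_sq` are hypothetical callables for evaluating the reference FE solution, evaluating the interpolant, and computing the exact squared $L^2(D)$ norm from FE coefficients (e.g., via the FE mass matrix).

```python
import numpy as np

def interpolation_error(z, N, shifts, ref_solve, interp_eval, l2_norm_sq):
    # Shifted-lattice estimator: average the squared L2(D) error of the
    # interpolant over R shifted copies of the N lattice points.
    t = np.mod(np.outer(np.arange(N), z) / N, 1.0)
    total = 0.0
    for shift in shifts:                  # R Sobol points, each in [0,1]^s
        for tk in t:
            y = np.mod(shift + tk, 1.0)   # periodic shift of the lattice
            total += l2_norm_sq(ref_solve(y) - interp_eval(y))
    return np.sqrt(total / (len(shifts) * N))
```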

Figure 3: Matching single-level kernel interpolation error and FE error.
ML level $\ell$: | 0 | 1 | 2 | 3 | 4 | 5
SL $h$ / ML $h_\ell$: | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$
$C=1.5,\ \vartheta=3.6$ (SL $N$ / ML $N_{L-\ell}$): | $2^{6}$ | $2^{7}$ | $2^{9}$ | $2^{10}$ | $2^{11}$ | $2^{13}$
$C=0.2,\ \vartheta=1.2$ (SL $N$ / ML $N_{L-\ell}$): | $2^{6}$ | $2^{8}$ | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$
Table 1: Combinations of FE mesh widths and number of lattice points used for the single-level and multilevel approximations.

We use Figure 3 to decide how to match the FE mesh width to the number of lattice points so as to give comparable errors. For each $h$ from the FE error data points in Figure 3, we trace upward at the same error level to the nearest interpolation error data point, in order to find a matching $N$ with a similar error. (Note that the relative slope between the overlaid plots is arbitrary and does not affect the outcome.) This yields the pairings $(h,N)$ for the single-level kernel interpolant for the two problems in Table 1. We also use Table 1 to form the pairings $(h_\ell,N_\ell)$ for our multilevel kernel approximation below. Notice the reverse labeling $N_{L-\ell}$ in Table 1, reflecting the fact that the values of $N_\ell$ decrease with increasing $\ell$. For example, when $L=2$, we have $h_0=2^{-3}$, $h_1=2^{-4}$, $h_2=2^{-5}$, and according to Table 1 for the case $C=1.5$, $\vartheta=3.6$, we take $N_0=2^9$, $N_1=2^7$, $N_2=2^6$.

Figure 4: Estimates of $\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)}$ with varying $N$ and $\ell$.

Interpolation error of the FE difference.

Figure 4 plots the estimates of

$$\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{RN}\sum_{r=1}^{R}\sum_{k=0}^{N-1}\|(u^s_{h_\ell}-u^s_{h_{\ell-1}})(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k) - I_N(u^s_{h_\ell}-u^s_{h_{\ell-1}})(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k)\|^2_{L^2(D)}},$$

for both parameter sets, with $s=64$, $R=10$, $h_\ell=2^{-\ell-3}$, and varying $N$ and $\ell$.

The top two plots in Figure 4 show that $\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)}$ decreases with increasing $N$ for each $\ell\in\{0,\ldots,4\}$. The case $\ell=0$ is exactly the single-level interpolation error $\|(I-I_N)u^s_{h_0}\|_{L^2(\Omega\times D)}$, and we observe a faster rate of convergence compared with the interpolation errors of the FE differences for $\ell\in\{1,\ldots,4\}$. For the easier problem $C=1.5$, $\vartheta=3.6$ on the left, we observe $\mathcal{O}(N^{-1.51})$ for $\ell=0$ and $\mathcal{O}(N^{-1.19})$ on average for the other $\ell$. For the harder problem $C=0.2$, $\vartheta=1.2$ on the right, we observe $\mathcal{O}(N^{-1.00})$ for $\ell=0$ and $\mathcal{O}(N^{-0.71})$ on average for the other $\ell$. (The rates for $\ell=0$ differ from those observed in Figure 3 because here $h_0=2^{-3}$, while $h^*=2^{-9}$ in Figure 3.) These observed rates correspond to the value of $\mu$ in Theorem 1.

The bottom plots in Figure 4 show that $\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)}$ decreases with increasing level $\ell$ for a range of values of $N$. (Larger values of $N$ are used for the harder problem $C=0.2$, $\vartheta=1.2$ on the right.) This illustrates how the FE error changes on each level. We observe an error reduction of roughly $\mathcal{O}(h_\ell^2)$, thus verifying the factor $h_{\ell-1}^2$ in Theorem 6, i.e., $\beta=2$.

7.3 Multilevel results

Figure 5 plots the computational cost, measured by CPU time in seconds, of the single-level and multilevel approximations against their respective errors for the two parameter sets. The errors are estimated by

$$\|(I-A)u^s_{h^*}\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{RN^*}\sum_{r=1}^{R}\sum_{k=0}^{N^*-1}\|u^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k) - Au^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k)\|^2_{L^2(D)}},$$

where $Au^s_{h^*}$ denotes either a single-level or a multilevel approximation of $u^s_{h^*}$, with $s=64$, $R=10$, $h^*=2^{-9}$, and $N^*$ the maximum number of lattice points used to construct the approximation (i.e., $N^*=N_0$ for a multilevel approximation).

Figure 5: Computation time for the single-level and multilevel approximations against their errors.

Reading from bottom right to top left within each plot in Figure 5, each data point of the single-level result corresponds to a decreasing FE mesh width $h\in\{2^{-3},\ldots,2^{-8}\}$ with the corresponding increasing number of lattice points $N$ from Table 1, while each data point of the multilevel result corresponds to an increasing total number of levels $L\in\{0,\ldots,5\}$ with the corresponding choices $\{(h_\ell,N_\ell):\ell=0,\ldots,L\}$ taken from Table 1. The $L=0$ data point for the multilevel error coincides with the single-level error, as expected.

Recall from (7.1) that the theoretical costs for the single-level and multilevel approximations to achieve an error $\varepsilon>0$ are $\mathcal{O}(\varepsilon^{-2.2})$ and $\mathcal{O}(\varepsilon^{-1.2})$, respectively. The triangles in the plots indicate gradients 1 and 2 as labeled. The observed costs shown in the legends of Figure 5 are mostly lower (better) than the theoretical costs. It may be that the theoretical convergence rate $\mu$ used in estimating the cost in (7.1) is overly pessimistic; we can instead estimate the cost using the observed rates for $\mu$ from Figure 4.

For the easier problem $C=1.5$, $\vartheta=3.6$ on the left of Figure 4, we observe $\mu=1.51$ and $\mu=1.19$ (on average) for the single-level and multilevel errors, respectively, giving expected costs of $\mathcal{O}(\varepsilon^{-1.66})$ and $\mathcal{O}(\varepsilon^{-1})$, respectively. These are closer to the corresponding observed costs of $\mathcal{O}(\varepsilon^{-1.55})$ and $\mathcal{O}(\varepsilon^{-1.06})$ on the left of Figure 5.

For the harder problem $C=0.2$, $\vartheta=1.2$ on the right of Figure 4, we observe $\mu=1.00$ and $\mu=0.71$ (on average) for the single-level and multilevel errors, respectively, giving expected costs of $\mathcal{O}(\varepsilon^{-2})$ and $\mathcal{O}(\varepsilon^{-1.41})$, respectively. These again are closer to the corresponding observed costs of $\mathcal{O}(\varepsilon^{-1.85})$ and $\mathcal{O}(\varepsilon^{-1.27})$ on the right of Figure 5.

These observations have been replicated for several other parameter sets, which are omitted for brevity.

8 Conclusion

This paper introduces a multilevel kernel method for approximating solutions to PDEs with periodic coefficients over the parametric domain. A theoretical framework is developed with full detail, including L2L^{2} error estimates for the multilevel method and a comprehensive cost comparison between the single-level and multilevel approaches. The construction of the multilevel approximation is also outlined and is supported by numerical experiments demonstrating the advantages of the multilevel approximation for different choices of parameters with varying levels of difficulty.

When applying multilevel methods to integration, randomisation can be used to compute an unbiased mean-square error estimate, and the number of levels and/or samples per level can be adjusted adaptively on the fly. In contrast, a practical challenge in applying the multilevel kernel approximation is that there is no efficient adaptive multilevel approximation strategy: the error cannot be evaluated effectively during the computation, since any estimate must be made with respect to some reference solution. This requires either solving the PDE again at several sample points or constructing an expensive single-level proxy of the true solution, defeating the purpose of the multilevel approximation. We leave investigations into adaptive multilevel kernel approximation for future research.

References

  • [1] G. Byrenheid, L. Kämmerer, T. Ullrich and T. Volkmer, Tight error bounds for rank-1 lattice sampling in spaces of hybrid mixed smoothness, Numer. Math. 136, 993–1034 (2017).
  • [2] J. Charrier, R. Scheichl and A. L. Teckentrup, Finite element error analysis of elliptic PDEs with random coefficients and its application to multilevel Monte Carlo methods, SIAM J. Numer. Anal. 51, 332–352 (2013).
  • [3] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, SIAM, Philadelphia, PA, USA (2002).
  • [4] K. A. Cliffe, I. G. Graham, R. Scheichl and L. Stals, Parallel computation of flow in heterogeneous media using mixed finite elements, J. Comput. Phys. 164, 258–282 (2000).
  • [5] K. A. Cliffe, M. B. Giles, R. Scheichl and A. L. Teckentrup, Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients, Comput. Vis. Sci. 14, 3–15 (2011).
  • [6] A. Cohen, R. DeVore and Ch. Schwab, Convergence rates of best N-term Galerkin approximations for a class of elliptic sPDEs, Found. Comput. Math. 10, 615–646 (2010).
  • [7] R. Cools, F. Y. Kuo, D. Nuyens and I. H. Sloan, Lattice algorithms for multivariate approximation in periodic spaces with general weight parameters, Contemp. Math. 754, 93–113 (2020).
  • [8] R. Cools, F. Y. Kuo, D. Nuyens and I. H. Sloan, Fast component-by-component construction of lattice algorithms for multivariate approximation with POD and SPOD weights, Math. Comp. 90, 787–812 (2021).
  • [9] M. Croci and P. E. Farrell, Complexity bounds on supermesh construction for quasi-uniform meshes, J. Comput. Phys. 414, 1–7 (2020).
  • [10] M. Croci, M. B. Giles, M. E. Rognes and P. E. Farrell, Efficient white noise sampling and coupling for multilevel Monte Carlo with nonnested meshes, SIAM/ASA J. Uncertain. Quantif. 6, 1630–1655 (2018).
  • [11] N. Dyn, Interpolation and approximation by radial and related functions, in: Approximation Theory VI (C. Chui, L. Schumaker and J. Ward, eds), Academic Press, New York, 211–223 (1989).
  • [12] P. E. Farrell, M. D. Piggott, C. C. Pain, G. J. Gorman and C. R. Wilson, Conservative interpolation between unstructured meshes via supermesh construction, Comput. Methods Appl. Mech. Engrg. 198, 2632–2642 (2009).
  • [13] A. D. Gilbert and R. Scheichl, Multilevel quasi-Monte Carlo for random elliptic eigenvalue problems I: regularity and error analysis, IMA J. Numer. Anal. 44, 466–503 (2024).
  • [14] A. D. Gilbert, F. Y. Kuo and A. Srikumar, Density estimation for elliptic PDE with random input by preintegration and quasi-Monte Carlo methods, to appear in SIAM J. Numer. Anal., arXiv:2402.11807 (2025).
  • [15] M. B. Giles, Improved multilevel Monte Carlo convergence using the Milstein scheme, in: Monte Carlo and Quasi-Monte Carlo Methods 2006 (A. Keller, S. Heinrich and H. Niederreiter, eds), Springer-Verlag, 343–358 (2007).
  • [16] M. B. Giles, Multilevel Monte Carlo path simulation, Oper. Res. 56, 607–617 (2008).
  • [17] M. B. Giles, Multilevel Monte Carlo methods, Acta Numer. 24, 259–328 (2015).
  • [18] M. B. Giles and B. J. Waterhouse, Multilevel quasi-Monte Carlo path simulation, Radon Series Comp. Appl. Math. 8, 1–18 (2009).
  • [19] I. G. Graham, F. Y. Kuo, J. A. Nichols, R. Scheichl, Ch. Schwab and I. H. Sloan, Quasi-Monte Carlo finite element methods for elliptic PDEs with lognormal random coefficients, Numer. Math. 131, 329–368 (2015).
  • [20] H. Hakula, H. Harbrecht, V. Kaarnioja, F. Y. Kuo and I. H. Sloan, Uncertainty quantification for random domains using periodic random variables, Numer. Math. 156, 273–317 (2024).
  • [21] R. L. Hardy, Multiquadric equations of topography and other irregular surfaces, J. Geophys. Res. 76, 1905–1915 (1971).
  • [22] S. Heinrich, Multilevel Monte Carlo methods, Lecture Notes in Comput. Sci., Vol. 2179, Springer, 3624–3651 (2001).
  • [23] V. Kaarnioja, Y. Kazashi, F. Y. Kuo, F. Nobile and I. H. Sloan, Fast approximation by periodic kernel-based lattice-point interpolation with application in uncertainty quantification, Numer. Math. 150, 33–77 (2022).
  • [24] V. Kaarnioja, F. Y. Kuo and I. H. Sloan, Uncertainty quantification using periodic random variables, SIAM J. Numer. Anal. 58, 1068–1091 (2020).
  • [25] V. Kaarnioja, F. Y. Kuo and I. H. Sloan, Lattice-based kernel approximation and serendipitous weights for parametric PDEs in very high dimensions, in: Monte Carlo and Quasi-Monte Carlo Methods 2022 (A. Hinrichs, P. Kritzer and F. Pillichshammer, eds), Springer-Verlag, 81–103 (2024).
  • [26] L. Kämmerer, Reconstructing hyperbolic cross trigonometric polynomials from sampling along rank-1 lattices, SIAM J. Numer. Anal. 51, 2773–2796 (2013).
  • [27] L. Kämmerer, Multiple rank-1 lattices as sampling schemes for multivariate trigonometric polynomials, J. Fourier Anal. Appl. 24, 17–44 (2018).
  • [28] L. Kämmerer, D. Potts and T. Volkmer, Approximation of multivariate periodic functions by trigonometric polynomials based on rank-1 lattice sampling, J. Complexity 31, 543–576 (2015).
  • [29] F. Y. Kuo, G. Migliorati, F. Nobile and D. Nuyens, Function integration, reconstruction and approximation using rank-1 lattices, Math. Comp. 90, 1861–1897 (2021).
  • [30] F. Y. Kuo, W. Mo and D. Nuyens, Constructing embedded lattice-based algorithms for multivariate function approximation with a composite number of points, Constr. Approx. 61, 81–113 (2025).
  • [31] F. Y. Kuo, W. Mo, D. Nuyens, I. H. Sloan and A. Srikumar, Comparison of two search criteria for lattice-based kernel approximation, in: Monte Carlo and Quasi-Monte Carlo Methods 2022 (A. Hinrichs, P. Kritzer and F. Pillichshammer, eds), Springer-Verlag, 413–429 (2024).
  • [32] F. Y. Kuo and D. Nuyens, Application of quasi-Monte Carlo methods to elliptic PDEs with random diffusion coefficients – a survey of analysis and implementation, Found. Comput. Math. 16, 1631–1696 (2016).
  • [33] F. Y. Kuo, Ch. Schwab and I. H. Sloan, Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients, SIAM J. Numer. Anal. 50, 3351–3374 (2012).
  • [34] F. Y. Kuo, Ch. Schwab and I. H. Sloan, Multilevel quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients, Found. Comput. Math. 15, 411–449 (2015).
  • [35] F. Y. Kuo, I. H. Sloan and H. Woźniakowski, Lattice rules for multivariate approximation in the worst case setting, in: Monte Carlo and Quasi-Monte Carlo Methods 2004 (H. Niederreiter and D. Talay, eds), Springer-Verlag, 289–330 (2006).
  • [36] F. Y. Kuo, I. H. Sloan and H. Woźniakowski, Lattice rule algorithms for multivariate approximation in the average case setting, J. Complexity 24, 283–323 (2008).
  • [37] F. J. Narcowich, J. D. Ward and H. Wendland, Refined error estimates for radial basis function interpolation, Constr. Approx. 19, 541–564 (2003).
  • [38] PVC (Research Infrastructure), UNSW Sydney, Katana (2010). doi:10.26190/669X-A286.
  • [39] J. Quaintance and H. W. Gould, Combinatorial Identities for Stirling Numbers: The Unpublished Notes of H. W. Gould, World Scientific Publishing Company, River Edge, NJ (2015).
  • [40] R. Schaback and H. Wendland, Kernel techniques: from machine learning to meshless methods, Acta Numer. 15, 543–639 (2006).
  • [41] I. H. Sloan and V. Kaarnioja, Doubling the rate: improved error bounds for orthogonal projection with application to interpolation, BIT Numer. Math. 65 (online) (2025).
  • [42] I. H. Sloan and H. Woźniakowski, Tractability of multivariate integration for weighted Korobov classes, J. Complexity 17, 697–721 (2001).
  • [43] A. L. Teckentrup, P. Jantsch, C. G. Webster and M. Gunzburger, A multilevel stochastic collocation method for partial differential equations with random input data, SIAM/ASA J. Uncertain. Quantif. 3, 1046–1074 (2015).
  • [44] A. L. Teckentrup, R. Scheichl, M. B. Giles and E. Ullmann, Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients, Numer. Math. 125, 569–600 (2013).
  • [45] H. Wendland, Scattered Data Approximation, Cambridge University Press, Cambridge (2005).
  • [46] Z. M. Wu and R. Schaback, Local error estimates for radial basis function interpolation of scattered data, IMA J. Numer. Anal. 13, 13–27 (1993).
  • [47] Z. Y. Zeng, P. Kritzer and F. J. Hickernell, Spline methods using integration lattices and digital nets, Constr. Approx. 30, 529–555 (2009).
  • [48] Z. Y. Zeng, K. T. Leung and F. J. Hickernell, Error analysis of splines for periodic problems using lattice designs, in: Monte Carlo and Quasi-Monte Carlo Methods 2004 (H. Niederreiter and D. Talay, eds), Springer-Verlag, 501–514 (2006).

Appendix I Combinatorial identities and proofs

This appendix provides some important combinatorial results that are required in the proofs of several regularity theorems. First, we state an identity from [39] that simplifies sums involving the Stirling numbers of the second kind (2.1):

\[
\sum_{k=1}^{m-\ell}{m\choose k}S(m-k,\ell)=(\ell+1)\,S(m,\ell+1)\qquad\text{for }\ell<m.
\tag{I.1}
\]
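As a quick sanity check of (I.1), which we add for the reader's convenience, take $m=3$ and $\ell=1$: since $S(2,1)=S(1,1)=1$ and $S(3,2)=3$,

\[
\sum_{k=1}^{2}{3\choose k}S(3-k,1)={3\choose 1}S(2,1)+{3\choose 2}S(1,1)=3+3=6=2\,S(3,2),
\]

in agreement with $(\ell+1)\,S(m,\ell+1)$.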
Lemma 11.

Let $c>0$ and let $\boldsymbol{b}=(b_j)_{j\geq 1}$, $(\mathbb{A}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ and $(\mathbb{B}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ be sequences of non-negative real numbers that satisfy the recurrence

\[
\mathbb{A}_{\boldsymbol{\nu}}=\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}c^k\,b_j\,\mathbb{A}_{\boldsymbol{\nu}-k\boldsymbol{e}_j}+\mathbb{B}_{\boldsymbol{\nu}}\qquad\text{for all }\boldsymbol{\nu}\in\mathcal{F},\text{ including }\boldsymbol{\nu}=\boldsymbol{0},
\tag{I.2}
\]

where $\boldsymbol{e}_j$ is the multi-index whose $j$th component is 1 and whose other components are all 0. Then

\[
\mathbb{A}_{\boldsymbol{\nu}}=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i),
\tag{I.3}
\]

where $S(m_i,\ell_i)$ for $i\geq 1$ are the Stirling numbers of the second kind given by (2.1). The result also holds with both equalities replaced by inequalities $\leq$.

Proof.

The statement is proved by induction on $|\boldsymbol{\nu}|$. It holds trivially for $\boldsymbol{\nu}=\boldsymbol{0}$ since $S(0,0)=1$. Assume that (I.3) holds for all multi-indices of order less than $|\boldsymbol{\nu}|$; in particular, if $1\leq k\leq\nu_j$ for some $j\geq 1$, then

\[
\mathbb{A}_{\boldsymbol{\nu}-k\boldsymbol{e}_j}=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i).
\tag{I.4}
\]

We now prove the statement for multi-indices of order $|\boldsymbol{\nu}|$. Substituting the induction hypothesis (I.4) into (I.2), we have

\[
\mathbb{A}_{\boldsymbol{\nu}}=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{j\geq 1}\Phi_j,
\tag{I.5}
\]

where

\[
\Phi_j\coloneqq\sum_{k=1}^{\nu_j}{\nu_j\choose k}c^k\,b_j\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i).
\]

We use a dash to denote multi-indices with the $j$th component removed, for example, $\boldsymbol{\nu}'=(\nu_1,\ldots,\nu_{j-1},\nu_{j+1},\ldots)$, and adopt the notation $\mathbb{B}_{\boldsymbol{\nu}}=\mathbb{B}_{\boldsymbol{\nu}',\nu_j}$. Then

\[
\begin{aligned}
\Phi_j={}&\sum_{k=1}^{\nu_j}{\nu_j\choose k}c^k\,b_j\Bigg[\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\sum_{m_j=0}^{\nu_j-k}c^{|\boldsymbol{m}'|+m_j}{\boldsymbol{\nu}'\choose\boldsymbol{m}'}{\nu_j-k\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-k-m_j}\\
&\qquad\qquad\times\sum_{\boldsymbol{\ell}'\leq\boldsymbol{m}'}\sum_{\ell_j=0}^{m_j}(|\boldsymbol{\ell}'|+\ell_j)!\,\boldsymbol{b}'^{\,\boldsymbol{\ell}'}\,b_j^{\ell_j}S(m_j,\ell_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(m_i,\ell_i)\Bigg]\\
={}&\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}c^{|\boldsymbol{m}'|}{\boldsymbol{\nu}'\choose\boldsymbol{m}'}\sum_{\boldsymbol{\ell}'\leq\boldsymbol{m}'}\boldsymbol{b}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(m_i,\ell_i)\bigg)\\
&\times\underbrace{\sum_{k=1}^{\nu_j}\sum_{m_j=0}^{\nu_j-k}c^{m_j+k}{\nu_j\choose k}{\nu_j-k\choose m_j}\,b_j\,\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-k-m_j}\sum_{\ell_j=0}^{m_j}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j}S(m_j,\ell_j)}_{\eqqcolon\,\Theta_1},
\end{aligned}
\tag{I.6}
\]

where we have rearranged the terms and swapped the order of the sums.

Recognising that

\[
{\nu_j\choose k}{\nu_j-k\choose m_j}={\nu_j\choose m_j+k}{m_j+k\choose k},
\]

we have

\[
\begin{aligned}
\Theta_1&=\sum_{k=1}^{\nu_j}\sum_{m_j=0}^{\nu_j-k}c^{m_j+k}{\nu_j\choose m_j+k}{m_j+k\choose k}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-(k+m_j)}\sum_{\ell_j=0}^{m_j}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}S(m_j,\ell_j)\\
&=\sum_{k=1}^{\nu_j}\sum_{m_j=k}^{\nu_j}c^{m_j}{\nu_j\choose m_j}{m_j\choose k}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\sum_{\ell_j=0}^{m_j-k}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}S(m_j-k,\ell_j)\\
&=\sum_{m_j=1}^{\nu_j}c^{m_j}{\nu_j\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\underbrace{\sum_{k=1}^{m_j}{m_j\choose k}\sum_{\ell_j=0}^{m_j-k}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}S(m_j-k,\ell_j)}_{\eqqcolon\,\Theta_2},
\end{aligned}
\tag{I.7}
\]

where we get to the second line by relabelling the summation index of the inner sum from $m_j=0,\ldots,\nu_j-k$ to $m_j=k,\ldots,\nu_j$, and to the third line by swapping the order of the sums over $k$ and $m_j$. Swapping the order of the summations over $k$ and $\ell_j$ in $\Theta_2$ gives

\[
\begin{aligned}
\Theta_2&=\sum_{\ell_j=0}^{m_j-1}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}\sum_{k=1}^{m_j-\ell_j}{m_j\choose k}S(m_j-k,\ell_j)\\
&=\sum_{\ell_j=0}^{m_j-1}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}(\ell_j+1)\,S(m_j,\ell_j+1)\\
&=\sum_{\ell_j=1}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j)
=\sum_{\substack{\ell_j=0\\ |\boldsymbol{\ell}'|+\ell_j\neq 0}}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j),
\end{aligned}
\tag{I.8}
\]

where the second equality follows from (I.1) and the third from a simple re-indexing of the sum. For the final equality, we may start the sum from $\ell_j=0$ because the added term vanishes (it carries a factor $\ell_j$), provided we also impose the condition $|\boldsymbol{\ell}'|+\ell_j\neq 0$ to ensure the factorial is well defined. Substituting (I.8) into (I.7) gives

\[
\Theta_1=\sum_{m_j=0}^{\nu_j}c^{m_j}{\nu_j\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\sum_{\substack{\ell_j=0\\ |\boldsymbol{\ell}'|+\ell_j\neq 0}}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j),
\]

where we have also added the terms corresponding to $m_j=0$ to the outer sum, since these vanish as well ($\ell_j=0$ whenever $m_j=0$). Substituting this formula for $\Theta_1$ back into (I.6) and rearranging gives

\[
\begin{aligned}
\Phi_j&=\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}c^{|\boldsymbol{m}'|}{\boldsymbol{\nu}'\choose\boldsymbol{m}'}\sum_{\boldsymbol{\ell}'\leq\boldsymbol{m}'}\boldsymbol{b}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(m_i,\ell_i)\bigg)\\
&\qquad\times\sum_{m_j=0}^{\nu_j}c^{m_j}{\nu_j\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\sum_{\substack{\ell_j=0\\ |\boldsymbol{\ell}'|+\ell_j\neq 0}}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j)\\
&=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{0}\neq\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|-1)!\,\ell_j\prod_{i\geq 1}S(m_i,\ell_i),
\end{aligned}
\]

where the last line follows by combining the sums over $\boldsymbol{m}'$ and $m_j$ into a single sum over the original index $\boldsymbol{m}$, and likewise combining the sums over $\boldsymbol{\ell}'$ and $\ell_j$ into a single sum over the index $\boldsymbol{\ell}$.

Now substituting this back into (I.5), we have

\[
\begin{aligned}
\mathbb{A}_{\boldsymbol{\nu}}&=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{j\geq 1}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{0}\neq\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|-1)!\,\ell_j\prod_{i\geq 1}S(m_i,\ell_i)\\
&=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{0}\neq\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,|\boldsymbol{\ell}|!\prod_{i\geq 1}S(m_i,\ell_i)\\
&=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\bigg(-\prod_{i\geq 1}S(m_i,0)+\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,|\boldsymbol{\ell}|!\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\\
&=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,|\boldsymbol{\ell}|!\,\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\prod_{i\geq 1}S(m_i,\ell_i),
\end{aligned}
\]

which is the desired result (I.3). We reach the second line by interchanging the order of the summations over $j$ and the remaining indices, using $\sum_{j\geq 1}\ell_j=|\boldsymbol{\ell}|$, and the third line by including the term with $\boldsymbol{\ell}=\boldsymbol{0}$ and subtracting the corresponding product $\prod_{i\geq 1}S(m_i,0)$ to maintain equality. Since $S(m_i,0)=0$ whenever $m_i\neq 0$, this product contributes only when $\boldsymbol{m}=\boldsymbol{0}$, and the resulting term $-\mathbb{B}_{\boldsymbol{\nu}}$ cancels with the leading $\mathbb{B}_{\boldsymbol{\nu}}$, giving the final line.

Since every step of the induction argument preserves inequalities, the statement continues to hold when both equalities in (I.2) and (I.3) are replaced by inequalities $\leq$. ∎
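As a quick consistency check of (I.3), which we add here and which is not part of the argument above, take $\boldsymbol{\nu}=\boldsymbol{e}_j$. The recurrence (I.2) gives $\mathbb{A}_{\boldsymbol{e}_j}=c\,b_j\,\mathbb{A}_{\boldsymbol{0}}+\mathbb{B}_{\boldsymbol{e}_j}$ with $\mathbb{A}_{\boldsymbol{0}}=\mathbb{B}_{\boldsymbol{0}}$, while (I.3) gives

\[
\mathbb{A}_{\boldsymbol{e}_j}=\mathbb{B}_{\boldsymbol{e}_j}+c\,{\boldsymbol{e}_j\choose\boldsymbol{e}_j}\mathbb{B}_{\boldsymbol{0}}\,\big(1!\,b_j\,S(1,1)\big)=\mathbb{B}_{\boldsymbol{e}_j}+c\,b_j\,\mathbb{B}_{\boldsymbol{0}},
\]

since the inner sum over $\boldsymbol{\ell}\leq\boldsymbol{m}$ contributes $S(1,0)=0$ for $\boldsymbol{\ell}=\boldsymbol{0}$ and $S(1,1)=1$ for $\boldsymbol{\ell}=\boldsymbol{e}_j$; the two expressions agree.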

From [20, Lemma A.3], we also have that, for any $\boldsymbol{\nu}\in\mathcal{F}$,

\[
\begin{aligned}
&\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{A}_{\boldsymbol{\ell}}\,\mathbb{B}_{\boldsymbol{k}}\prod_{i\geq 1}\Big(S(m_i,\ell_i)\,S(\nu_i-m_i,k_i)\Big)\\
&\qquad=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}\mathbb{A}_{\boldsymbol{\ell}}\,\mathbb{B}_{\boldsymbol{m}-\boldsymbol{\ell}}\bigg)\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\tag{I.9}
\]

where $(\mathbb{A}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ and $(\mathbb{B}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ are arbitrary sequences of real numbers.
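To see (I.9) in action, consider the univariate case $\boldsymbol{\nu}=(1)$ (a check we add for the reader's convenience). On the left-hand side, only the terms $(m,\ell,k)=(0,0,1)$ and $(1,1,0)$ survive, since $S(1,0)=0$, giving $\mathbb{A}_0\mathbb{B}_1+\mathbb{A}_1\mathbb{B}_0$; on the right-hand side, the $m=0$ term vanishes and the $m=1$ term gives $(\mathbb{A}_0\mathbb{B}_1+\mathbb{A}_1\mathbb{B}_0)\,S(1,1)$, so the two sides agree.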

Appendix II Parametric regularity proofs

  • Proof of Lemma 2.

We follow a similar strategy to the proof of [32, Lemma 6.2]. For $\boldsymbol{\nu}=\boldsymbol{0}$, we rewrite the strong formulation (1.1) using the product rule to obtain

\[
-\Psi(\boldsymbol{x},\boldsymbol{y})\,\Delta u(\boldsymbol{x},\boldsymbol{y})=\nabla\Psi(\boldsymbol{x},\boldsymbol{y})\cdot\nabla u(\boldsymbol{x},\boldsymbol{y})+f(\boldsymbol{x}),
\]

    from which it follows that

\[
\Psi_{\min}\|\Delta u(\cdot,\boldsymbol{y})\|_{L^2(D)}\leq\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}\|u(\cdot,\boldsymbol{y})\|_V+\|f\|_{L^2(D)},
\]

where we have used Assumption (A0) and the definition $\|u(\cdot,\boldsymbol{y})\|_V=\|\nabla u(\cdot,\boldsymbol{y})\|_{L^2(D)}$. Combining this with (2.4), we have

\[
\begin{aligned}
\|\Delta u(\cdot,\boldsymbol{y})\|_{L^2(D)}
&\leq\frac{1}{\Psi_{\min}}\bigg(\sup_{\boldsymbol{y}\in\Omega}\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}\frac{\|f\|_{V'}}{\Psi_{\min}}+\|f\|_{L^2(D)}\bigg)\\
&\leq\frac{1}{\Psi_{\min}}\bigg(\sup_{\boldsymbol{y}\in\Omega}\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}\frac{C_{\rm Poi}\|f\|_{L^2(D)}}{\Psi_{\min}}+\|f\|_{L^2(D)}\bigg)\\
&\leq C_1\|f\|_{L^2(D)},
\end{aligned}
\]

where

\[
C_1\coloneqq\frac{\max\{1,C_{\rm Poi}\}}{\Psi_{\min}}\bigg(\frac{\|\nabla\Psi\|_{L^\infty(D\times\Omega)}}{\Psi_{\min}}+1\bigg),
\tag{II.1}
\]

and $C_{\rm Poi}$ is the Poincaré constant of the embedding $V\hookrightarrow L^2(D)$. It follows from Assumption (A0) that $\|\nabla\Psi\|_{L^\infty(D\times\Omega)}\coloneqq\sup_{\boldsymbol{y}\in\Omega}\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}$ is finite. Hence $u(\cdot,\boldsymbol{y})\in H^2(D)$ and (4.1) holds for $\boldsymbol{\nu}=\boldsymbol{0}$.

Recall that the gradient $\nabla$ and Laplacian $\Delta$ are taken with respect to $\boldsymbol{x}$, while $\partial^{\boldsymbol{\nu}}$ is taken with respect to $\boldsymbol{y}$. For $\boldsymbol{\nu}\neq\boldsymbol{0}$, we take the $\boldsymbol{\nu}$th derivative of (1.1) using the Leibniz product rule. Since $f$ is independent of $\boldsymbol{y}$, the right-hand side vanishes, and we can rearrange the resulting expression to obtain, omitting the dependence on $\boldsymbol{x}$ and $\boldsymbol{y}$,

\[
\begin{aligned}
\nabla\cdot(\Psi\nabla\partial^{\boldsymbol{\nu}}u)
&=-\nabla\cdot\bigg(\sum_{\boldsymbol{0}\neq\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\big(\partial^{\boldsymbol{m}}\Psi\big)\big(\nabla\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\big)\bigg)\\
&=-\nabla\cdot\bigg(\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\psi_j\big(\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\big)\bigg),
\end{aligned}
\tag{II.2}
\]

where we use the fact that the mixed partial derivatives of $\Psi$ with respect to $\boldsymbol{y}$ are

\[
\partial^{\boldsymbol{m}}\Psi(\boldsymbol{x},\boldsymbol{y})=\begin{cases}\Psi(\boldsymbol{x},\boldsymbol{y})&\text{if }\boldsymbol{m}=\boldsymbol{0},\\[2pt](2\pi)^k\sin\Big(2\pi y_j+\dfrac{k\pi}{2}\Big)\psi_j(\boldsymbol{x})&\text{if }\boldsymbol{m}=k\boldsymbol{e}_j,\ k\geq 1,\\[2pt]0&\text{otherwise},\end{cases}
\tag{II.3}
\]

which follows since each differentiation with respect to $y_j$ multiplies by $2\pi$ and advances the phase by $\pi/2$, as $\frac{\mathrm{d}}{\mathrm{d}y}\sin(2\pi y+\theta)=2\pi\sin\big(2\pi y+\theta+\tfrac{\pi}{2}\big)$.

Expanding (II.2) using the product rule for $\nabla$ and rearranging gives

\[
\begin{aligned}
\Psi\Delta\partial^{\boldsymbol{\nu}}u&=-\nabla\Psi\cdot\nabla\partial^{\boldsymbol{\nu}}u\\
&\quad-\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\Big(\big(\nabla\psi_j\big)\cdot\big(\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\big)+\psi_j\big(\Delta\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\big)\Big).
\end{aligned}
\]

Now taking the $L^2(D)$ norm and applying the triangle inequality gives

\[
\begin{aligned}
&\Psi_{\min}\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}\leq\|\nabla\Psi\|_{L^\infty(D)}\|\nabla\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}\\
&\quad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\Big(\|\psi_j\|_{L^\infty(D)}\|\Delta\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_{L^2(D)}+\|\nabla\psi_j\|_{L^\infty(D)}\|\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_{L^2(D)}\Big),
\end{aligned}
\]

where we used $|\sin(x)|\leq 1$ for all real $x$. Formulating the above as a recursion gives

\[
\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}\leq\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,b_j\,\|\Delta\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_{L^2(D)}+B_{\boldsymbol{\nu}},
\tag{II.4}
\]

where $b_j$ and $\overline{b}_j$ are defined in (2.3) and we define

\[
B_{\boldsymbol{\nu}}\coloneqq\frac{\|\nabla\Psi\|_{L^\infty(D)}}{\Psi_{\min}}\|\partial^{\boldsymbol{\nu}}u\|_V+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,\overline{b}_j\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_V.
\]

Substituting (2.5) into the definition of $B_{\boldsymbol{\nu}}$, we can bound it by

\[
\begin{aligned}
B_{\boldsymbol{\nu}}&\leq\frac{\|\nabla\Psi\|_{L^\infty(D)}}{\Psi_{\min}}\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\\
&\qquad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,\overline{b}_j\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|-k}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}S(\nu_j-k,m_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\\
&=\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|}\bigg[\frac{\|\nabla\Psi\|_{L^\infty(D)}}{\Psi_{\min}}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\\
&\qquad+\underbrace{\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}\,\overline{b}_j\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}S(\nu_j-k,m_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)}_{\eqqcolon\,\Theta}\bigg].
\end{aligned}
\tag{II.5}
\]

We simplify $\Theta$ using the same strategy as in the proof of Lemma 11, by separating out the $j$th component of the sum over $\boldsymbol{m}$ and interchanging the order of the sums, to give

\[
\begin{aligned}
\Theta&=\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}\overline{b}_j\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\sum_{m_j=0}^{\nu_j-k}(|\boldsymbol{m}'|+m_j)!\,\boldsymbol{b}'^{\,\boldsymbol{m}'}b_j^{m_j}S(\nu_j-k,m_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\\
&=\sum_{j\geq 1}\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\boldsymbol{b}'^{\,\boldsymbol{m}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\bigg)\sum_{m_j=0}^{\nu_j-1}\sum_{k=1}^{\nu_j-m_j}{\nu_j\choose k}\,\overline{b}_j\,(|\boldsymbol{m}'|+m_j)!\,b_j^{m_j}S(\nu_j-k,m_j)\\
&\leq\sum_{j\geq 1}\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\boldsymbol{b}'^{\,\boldsymbol{m}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\bigg)\sum_{\substack{m_j=0\\ |\boldsymbol{m}'|+m_j\neq 0}}^{\nu_j}(|\boldsymbol{m}'|+m_j-1)!\,m_j\,\overline{b}_j^{m_j}\,S(\nu_j,m_j)\\
&\leq\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\]

where the second line swaps the order of the sums over $k$ and $m_j$, and the third line uses (I.1) together with the re-indexing $m_j+1\to m_j$. In the final line we may drop the condition $|\boldsymbol{m}'|+m_j\neq 0$: if $\boldsymbol{m}=\boldsymbol{0}$ and $\boldsymbol{\nu}\neq\boldsymbol{0}$, then there is some index $i$ with $\nu_i\neq 0$ and hence $S(\nu_i,m_i)=S(\nu_i,0)=0$. We have also replaced $\boldsymbol{b}$ by $\overline{\boldsymbol{b}}$, since $b_i\leq\overline{b}_i$ for all $i\geq 1$, and used $\sum_{j\geq 1}m_j=|\boldsymbol{m}|$.

Substituting the bound on $\Theta$ into (II.5) and using $\boldsymbol{b}\leq\overline{\boldsymbol{b}}$, we can bound $B_{\boldsymbol{\nu}}$ by

\[
\begin{aligned}
B_{\boldsymbol{\nu}}&\leq\|f\|_{L^2(D)}\frac{C_{\rm Poi}}{\Psi_{\min}}\bigg(\frac{\|\nabla\Psi\|_{L^\infty(D\times\Omega)}}{\Psi_{\min}}+1\bigg)(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\\
&\leq C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\eqqcolon\mathbb{B}_{\boldsymbol{\nu}},
\end{aligned}
\]

where $C_1$ is defined in (II.1). Thus, defining $\mathbb{A}_{\boldsymbol{\nu}}\coloneqq\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}$, we can write (II.4) as

\[
\mathbb{A}_{\boldsymbol{\nu}}\leq\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,b_j\,\mathbb{A}_{\boldsymbol{\nu}-k\boldsymbol{e}_j}+\mathbb{B}_{\boldsymbol{\nu}}.
\]

Noting that $\mathbb{A}_{\boldsymbol{0}}\leq\mathbb{B}_{\boldsymbol{0}}$, we can then apply Lemma 11 (we cannot apply Lemma 11 to (II.4) with $B_{\boldsymbol{\nu}}$, since it is not true that $\mathbb{A}_{\boldsymbol{0}}\leq B_{\boldsymbol{0}}$) to give

\[
\begin{aligned}
\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}&\leq\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(2\pi)^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\\
&\qquad\times C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}-\boldsymbol{m}|}\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}|\boldsymbol{k}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\\
&\leq C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\\
&\qquad\times\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\bigg(\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}|\boldsymbol{k}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\bigg).
\end{aligned}
\]

Then, using (I.9) and the identity $\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,|\boldsymbol{m}-\boldsymbol{\ell}|!=(|\boldsymbol{m}|+1)!$ from [32], we obtain

\[
\begin{aligned}
\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}&\leq C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}\bigg[\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,|\boldsymbol{m}-\boldsymbol{\ell}|!\bigg]\prod_{i\geq 1}S(\nu_i,m_i)\\
&=C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(|\boldsymbol{m}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\]

as required. □
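For a concrete check of the factorial identity used in the last step (our addition, not part of the proof), take $|\boldsymbol{m}|=2$ in one dimension: $\sum_{\ell=0}^{2}{2\choose\ell}\ell!\,(2-\ell)!={2\choose 0}2!+{2\choose 1}1!\,1!+{2\choose 2}2!=2+2+2=6=3!$, in agreement with $(|\boldsymbol{m}|+1)!$.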

  • Proof of Lemma 3.

We follow the proof strategy presented in [32, Lemma 6.3]. Let $f\in L^2(D)$, $\boldsymbol{y}\in\Omega$ and $\boldsymbol{\nu}\in\mathcal{F}$. Since $u_h$ depends analytically on $\boldsymbol{y}$ with values in $V_h$, we have $\partial^{\boldsymbol{\nu}}u_h\in V_h$ for every $\boldsymbol{\nu}\in\mathcal{F}$, and hence

\[
(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})(\partial^{\boldsymbol{\nu}}u_h(\cdot,\boldsymbol{y}))\equiv 0,
\]

where $\mathsf{P}^h_{\boldsymbol{y}}$ is the orthogonal projection defined by (2.11). It then follows that

\[
\begin{aligned}
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V&=\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)+(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\\
&\leq\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V+\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}u\|_V,
\end{aligned}
\tag{II.6}
\]

where we have omitted the dependence of $u$ on $\boldsymbol{x}$ and $\boldsymbol{y}$ for brevity.

Starting with the Galerkin orthogonality property given in (2.10), we take the $\partial^{\boldsymbol{\nu}}$ derivative with respect to $\boldsymbol{y}$ using the Leibniz product rule to give

\[
\int_D\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}(\partial^{\boldsymbol{m}}\Psi)\nabla\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(u-u_h)\cdot\nabla v_h\,\mathrm{d}\boldsymbol{x}=0\quad\text{for all }v_h\in V_h.
\tag{II.7}
\]

Next, separating out the term with $\boldsymbol{m}=\boldsymbol{0}$ and then substituting the mixed derivatives (II.3) of the coefficient into (II.7), we obtain

\[
\begin{aligned}
&\int_D\Psi\,\nabla\partial^{\boldsymbol{\nu}}(u-u_h)\cdot\nabla v_h\,\mathrm{d}\boldsymbol{x}\\
&\qquad=-\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\int_D\psi_j\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\cdot\nabla v_h\,\mathrm{d}\boldsymbol{x}
\end{aligned}
\]

for all $v_h\in V_h$. Now, taking $v_h=\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)$, we have

\[
\begin{aligned}
&\int_D\Psi\,\nabla\partial^{\boldsymbol{\nu}}(u-u_h)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}\\
&\quad=-\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\int_D\psi_j\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}.
\end{aligned}
\tag{II.8}
\]

Since $\partial^{\boldsymbol{\nu}}(u-u_h)\in V$, using (2.11) with $w=\partial^{\boldsymbol{\nu}}(u-u_h)$ and $v_h=\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)$ yields

\[
\int_D\Psi\nabla\big((\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}(u-u_h)\big)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}=0,
\]

    which can then be rearranged to give

\[
\begin{aligned}
\int_D\Psi\nabla\partial^{\boldsymbol{\nu}}(u-u_h)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}
&=\int_D\Psi\,|\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)|^2\,\mathrm{d}\boldsymbol{x}\\
&\geq\Psi_{\min}\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V^2,
\end{aligned}
\tag{II.9}
\]

where we have used Assumption (A0). Substituting the lower bound (II.9) for the left-hand side of (II.8), applying the Cauchy–Schwarz inequality to the right-hand side and then dividing through by $\Psi_{\min}\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V$ yields

\[
\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq\frac{1}{\Psi_{\min}}\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,\|\psi_j\|_{L^\infty(D)}\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\|_V.
\]

Then substituting this into (II.6) gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k b_j\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\|_V+\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}u\|_V,
\]

where $b_j$ is defined in (2.3).

Applying Lemma 11 with $\mathbb{A}_{\boldsymbol{\nu}}=\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V$, $\mathbb{B}_{\boldsymbol{\nu}}=\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}u\|_V$ and $c=2\pi$ to the above inequality gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(2\pi)^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_V.
\]

Since $\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\in H^2(D)$, we have $\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_V\leq C\,h\,\|\Delta\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_{L^2(D)}$ from (2.12) (with $C$ independent of $\boldsymbol{y}$), which in turn gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq C\,h\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(2\pi)^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\|\Delta\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_{L^2(D)}.
\]

Then using (4.1) from Lemma 2 with constant $C_1$, the bound $\boldsymbol{b}\leq\overline{\boldsymbol{b}}$, and defining $C_2\coloneqq C\,C_1/2$, we obtain

\[
\begin{aligned}
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V&\leq 2C_2\,h\,(2\pi)^{|\boldsymbol{\nu}|}\,\|f\|_{L^2(D)}\\
&\quad\times\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\bigg(\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}(|\boldsymbol{k}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\bigg)\\
&=2C_2\,h\,(2\pi)^{|\boldsymbol{\nu}|}\,\|f\|_{L^2(D)}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,(|\boldsymbol{m}-\boldsymbol{\ell}|+1)!\bigg)\prod_{i\geq 1}S(\nu_i,m_i)\\
&=C_2\,h\,(2\pi)^{|\boldsymbol{\nu}|}\,\|f\|_{L^2(D)}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(|\boldsymbol{m}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i).
\end{aligned}
\]

We arrive at the first equality using (I.9) with $\mathbb{A}_{\boldsymbol{\ell}}=|\boldsymbol{\ell}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}$ and $\mathbb{B}_{\boldsymbol{k}}=(|\boldsymbol{k}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}$, and then move to the last line using the identity from [32]

\[
\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,(|\boldsymbol{m}-\boldsymbol{\ell}|+1)!=\frac{(|\boldsymbol{m}|+2)!}{2},
\]

which gives the required result. The constant $C_2$ is independent of $h$ and $\boldsymbol{y}$. □
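For a concrete check of the identity above (our addition, not part of the proof), take $|\boldsymbol{m}|=1$ in one dimension: ${1\choose 0}0!\,2!+{1\choose 1}1!\,1!=2+1=3=3!/2$, in agreement with $(|\boldsymbol{m}|+2)!/2$.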

  • Proof of Lemma 4.

We use an Aubin–Nitsche duality argument. For a bounded linear functional $\mathcal{G}$ on $L^2(D)$, define $v^g\in V$ to be the solution to the dual problem

\[
\mathcal{A}(\boldsymbol{y};w,v^g(\cdot,\boldsymbol{y}))=\mathcal{G}(w)\quad\text{for all }w\in V,
\]

which, since $\mathcal{A}$ is symmetric, is equivalent to the parametric variational problem (2.2) with $f$ replaced by $g$, the representer of $\mathcal{G}$. Thus $v^g(\cdot,\boldsymbol{y})\in V$ inherits the regularity of the solution to (2.2), and the FE approximation $v^g_h(\cdot,\boldsymbol{y})\in V_h$ also satisfies (4.2).

Letting $w=u-u_h$ (and suppressing the dependence on $\boldsymbol{y}$), it follows from Galerkin orthogonality (2.10) that $\mathcal{A}(\boldsymbol{y};u-u_h,v^g_h)=0$, which leads to

\[
\mathcal{G}(u-u_h)=\mathcal{A}(\boldsymbol{y};u-u_h,v^g)=\mathcal{A}(\boldsymbol{y};u-u_h,v^g-v^g_h).
\]

Differentiating this with respect to $\boldsymbol{y}$ gives

\[
\mathcal{G}\big(\partial^{\boldsymbol{\nu}}(u-u_h)\big)=\int_D\partial^{\boldsymbol{\nu}}\big(\Psi\,\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\,\mathrm{d}\boldsymbol{x}.
\]

    Applying the Leibniz product rule, the integrand on the right becomes

\[
\begin{aligned}
&\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\big(\partial^{\boldsymbol{m}}\Psi\big)\,\partial^{\boldsymbol{\nu}-\boldsymbol{m}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\quad=\Psi\,\partial^{\boldsymbol{\nu}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)+\sum_{\boldsymbol{0}\neq\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\big(\partial^{\boldsymbol{m}}\Psi\big)\,\partial^{\boldsymbol{\nu}-\boldsymbol{m}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\quad=\Psi\,\partial^{\boldsymbol{\nu}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\qquad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\,\psi_j\,\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\quad=\Psi\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\nabla\partial^{\boldsymbol{m}}(u-u_h)\cdot\nabla\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(v^g-v^g_h)\\
&\qquad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\psi_j\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\nabla\partial^{\boldsymbol{\ell}}(u-u_h)\cdot\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\bigg),
\end{aligned}
\]

where we have substituted in the derivative formula (II.3) and applied the Leibniz product rule to $\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)$.

Now, taking the absolute value and using the Cauchy–Schwarz inequality gives

\[
\begin{aligned}
\big|\mathcal{G}\big(\partial^{\boldsymbol{\nu}}(u-u_h)\big)\big|&\leq\Psi_{\max}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\|\partial^{\boldsymbol{m}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(v^g-v^g_h)\|_V\\
&\quad+\Psi_{\min}\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k b_j\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\|\partial^{\boldsymbol{\ell}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\|_V,
\end{aligned}
\tag{II.10}
\]

where we have also used the definition of $b_j$ in (2.3).

The terms in the first sum in (II.10) can be bounded using (4.2) from Lemma 3 (applied once for $u-u_h$ with $f$ and once for $v^g-v^g_h$ with $g$) to give

\[
\begin{aligned}
&\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\|\partial^{\boldsymbol{m}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(v^g-v^g_h)\|_V\\
&\quad\leq C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\\
&\qquad\times\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}(|\boldsymbol{\ell}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\bigg(\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}(|\boldsymbol{k}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\bigg)\\
&\quad=\frac{C_2^2}{30}\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\tag{II.11}
\]

where $C_2$ denotes the constant factor from (4.2), and we obtain the last equality using (I.9) with $\mathbb{A}_{\boldsymbol{\ell}}=(|\boldsymbol{\ell}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}$ and $\mathbb{B}_{\boldsymbol{k}}=(|\boldsymbol{k}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}$, along with the identity from [32]

\[
\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}(|\boldsymbol{\ell}|+2)!\,(|\boldsymbol{m}-\boldsymbol{\ell}|+2)!=\frac{(|\boldsymbol{m}|+5)!}{30}.
\tag{II.12}
\]

Similarly, for the summation over the index $\boldsymbol{\ell}$ in (II.10), we again use (4.2) from Lemma 3 along with (I.9) and (II.12) to obtain

\[
\begin{aligned}
&\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\|\partial^{\boldsymbol{\ell}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\|_V\\
&\quad\leq C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|-k}\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|+5)!\,S(\nu_j-k,\ell_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i).
\end{aligned}
\]

Substituting this back into the sum indexed by $j$ in (II.10), we have

\[
\begin{aligned}
&\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k b_j\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\|\partial^{\boldsymbol{\ell}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\|_V\\
&\quad\leq C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{j\geq 1}\underbrace{\sum_{k=1}^{\nu_j}{\nu_j\choose k}b_j\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|+5)!\,S(\nu_j-k,\ell_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)}_{\eqqcolon\,\Theta_j}.
\end{aligned}
\tag{II.13}
\]

To bound $\Theta_j$, we use the same technique as in Lemma 11: we separate out the component $\ell_j$ from the innermost sum over $\boldsymbol{\ell}$, bound $b_j$ by $\overline{b}_j$, then swap the order of the sums over $k$ and $\ell_j$ so that (I.1) can be used to evaluate the sum over $k$. This gives

\[
\begin{aligned}
\Theta_j&=\sum_{k=1}^{\nu_j}{\nu_j\choose k}b_j\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\sum_{\ell_j=0}^{\nu_j-k}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\overline{b}_j^{\ell_j}\,(|\boldsymbol{\ell}'|+\ell_j+5)!\,S(\nu_j-k,\ell_j)\\
&\leq\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\sum_{\ell_j=0}^{\nu_j-1}\sum_{k=1}^{\nu_j-\ell_j}{\nu_j\choose k}\,\overline{b}_j^{\ell_j+1}\,(|\boldsymbol{\ell}'|+\ell_j+5)!\,S(\nu_j-k,\ell_j)\\
&=\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\sum_{\ell_j=0}^{\nu_j-1}\overline{b}_j^{\ell_j+1}\,(|\boldsymbol{\ell}'|+\ell_j+5)!\,(\ell_j+1)\,S(\nu_j,\ell_j+1)\\
&=\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\sum_{\ell_j=1}^{\nu_j}\overline{b}_j^{\ell_j}\,(|\boldsymbol{\ell}'|+\ell_j+4)!\,\ell_j\,S(\nu_j,\ell_j).
\end{aligned}
\]

We can add the terms with $\ell_j=0$ to the sum due to the presence of the factor $\ell_j$, and thus

\[
\Theta_j\leq\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}(|\boldsymbol{\ell}|+4)!\,\ell_j\prod_{i\geq 1}S(\nu_i,\ell_i),
\]

and, using $\sum_{j\geq 1}\ell_j=|\boldsymbol{\ell}|\leq|\boldsymbol{\ell}|+5$ so that $(|\boldsymbol{\ell}|+4)!\,|\boldsymbol{\ell}|\leq(|\boldsymbol{\ell}|+5)!$, we have

\[
\sum_{j\geq 1}\Theta_j\leq\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}(|\boldsymbol{\ell}|+5)!\prod_{i\geq 1}S(\nu_i,\ell_i).
\tag{II.14}
\]

Combining (II.14), (II.13), (II.11) and (II.10), we have

\[
\begin{aligned}
&\big|\mathcal{G}(\partial^{\boldsymbol{\nu}}(u-u_h))\big|\\
&\quad\leq\Psi_{\max}\frac{C_2^2}{30}\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i)\\
&\qquad+\Psi_{\min}\,C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}(|\boldsymbol{\ell}|+5)!\prod_{i\geq 1}S(\nu_i,\ell_i)\\
&\quad=C_2^2\Big(\frac{\Psi_{\max}}{30}+\Psi_{\min}\Big)\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i).
\end{aligned}
\]

Finally, we let $\mathcal{G}$ be the functional with representer $g=\partial^{\boldsymbol{\nu}}(u-u_h)(\cdot,\boldsymbol{y})$, i.e., $\mathcal{G}(v)=\langle\partial^{\boldsymbol{\nu}}(u-u_h),v\rangle$ for $v\in L^2(D)$, which gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|^2_{L^2(D)}\lesssim h^2\,\|f\|_{L^2(D)}\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i),
\]

where the implied constant is independent of $h$ and $\boldsymbol{y}$. Dividing through by $\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_{L^2(D)}$ yields the required result (4.3). □
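As a final sanity check of (II.12), which we add for the reader's convenience, take $|\boldsymbol{m}|=1$ in one dimension: ${1\choose 0}2!\,3!+{1\choose 1}3!\,2!=12+12=24$, which matches $(1+5)!/30=720/30=24$.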