
Multilevel lattice-based kernel approximation for elliptic PDEs with random coefficients

Alexander D. Gilbert1,  Michael B. Giles2,  Frances Y. Kuo1,  Ian H. Sloan1,  Abirami Srikumar1

1 School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia.
alexander.gilbert@unsw.edu.au,  f.kuo@unsw.edu.au,  i.sloan@unsw.edu.au,  a.srikumar@unsw.edu.au

2 Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK.
mike.giles@maths.ox.ac.uk

(September 1, 2025)
Abstract

This paper introduces a multilevel kernel-based approximation method to efficiently estimate solutions to elliptic partial differential equations (PDEs) with periodic random coefficients. Building upon the work of Kaarnioja, Kazashi, Kuo, Nobile and Sloan (Numer. Math., 2022) on kernel interpolation with quasi-Monte Carlo (QMC) lattice point sets, we leverage multilevel techniques to enhance computational efficiency while maintaining a given level of accuracy. In the function space setting with product-type weight parameters, the single-level approximation can achieve an accuracy of $\varepsilon>0$ at cost $\mathcal{O}(\varepsilon^{-\eta-\nu-\theta})$ for positive constants $\eta,\nu,\theta$ depending on the rates of convergence associated with dimension truncation, kernel approximation, and finite element approximation, respectively. Our multilevel approximation can achieve the same accuracy $\varepsilon$ at a reduced cost $\mathcal{O}(\varepsilon^{-\eta-\max(\nu,\theta)})$. Full regularity theory and error analysis are provided, followed by numerical experiments that validate the efficacy of the proposed multilevel approximation in comparison to the single-level approach.

1 Introduction

Kernel-based interpolation using a quasi-Monte Carlo (QMC) lattice design was first introduced in [48], where the authors analysed splines constructed from reproducing kernel functions on a lattice point set for periodic function approximation. They found that the special structure of a lattice point set, coupled with periodic kernel functions, leads to linear systems with a circulant matrix that can be solved efficiently via the fast Fourier transform (FFT). The recent paper [23] applies kernel interpolation to approximate the solutions of partial differential equations (PDEs) with periodic random coefficients over the parametric domain. In this paper, we seek to enhance the kernel-based approximation method for PDEs over the parametric domain by leveraging multilevel methods [17].

We are interested in the following parametric elliptic PDE

\[-\nabla\cdot(\Psi(\boldsymbol{x},\boldsymbol{y})\nabla u(\boldsymbol{x},\boldsymbol{y}))\,=\,f(\boldsymbol{x}),\qquad \boldsymbol{x}\in D,\tag{1.1}\]
\[u(\boldsymbol{x},\boldsymbol{y})\,=\,0,\qquad \boldsymbol{x}\in\partial D,\]

where the physical variable $\boldsymbol{x}$ belongs to a bounded convex domain $D\subset\mathbb{R}^{d}$, for $d=1,2,$ or $3$, and

\[\boldsymbol{y}\in\Omega\coloneqq[0,1]^{\mathbb{N}}\]

is a countable vector of parameters. It is assumed that the input field $\Psi(\cdot,\boldsymbol{y})$ in (1.1) is represented by the periodic model introduced recently in [24], which is periodic in $\boldsymbol{y}$ and given by the series expansion

\[\Psi(\boldsymbol{x},\boldsymbol{y})\,\coloneqq\,\psi_{0}(\boldsymbol{x})+\sum_{j=1}^{\infty}\sin(2\pi y_{j})\,\psi_{j}(\boldsymbol{x}).\tag{1.2}\]

Here $\boldsymbol{y}\coloneqq(y_{1},y_{2},\ldots)$, where each $y_{j}$ is independent and uniformly distributed on $[0,1]$. The functions $\psi_{j}\in L^{\infty}(D)$ are known and deterministic, and are such that $\Psi(\cdot,\boldsymbol{y})\in L^{\infty}(D)$ for all $\boldsymbol{y}\in\Omega$. Additional requirements on the $\psi_{j}$ will be introduced later as necessary.
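To make the periodic model concrete, here is a minimal sketch evaluating a dimension-truncated version of (1.2) in one spatial dimension; the choice $\psi_j(x)=j^{-2}\sin(j\pi x)$ and the constant $\psi_0$ are hypothetical, picked only so that the summability requirements introduced later hold.

```python
import numpy as np

# Minimal sketch: evaluate the truncated periodic coefficient (1.2),
# Psi_s(x, y) = psi_0(x) + sum_{j=1}^s sin(2*pi*y_j) * psi_j(x), in 1-D.
# The basis psi_j(x) = j^{-2} sin(j*pi*x) is hypothetical, chosen so that
# the norms ||psi_j||_inf are summable.
def coefficient(x, y, psi0=lambda x: 2.0):
    val = psi0(x)
    for j in range(1, len(y) + 1):
        val = val + np.sin(2 * np.pi * y[j - 1]) * j ** -2.0 * np.sin(j * np.pi * x)
    return val

x = np.linspace(0.0, 1.0, 5)
y = np.array([0.3, 0.7, 0.1])   # one draw of the first s = 3 parameters
print(coefficient(x, y))
```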

The goal is to efficiently approximate the solution $u(\boldsymbol{x},\boldsymbol{y})$ in both $\boldsymbol{x}\in D$ and $\boldsymbol{y}\in\Omega$ simultaneously. We follow a similar method to [23], where the approximation is based on discretising the spatial domain $D$ using finite elements and applying kernel approximation over the parametric domain $\Omega$ based on a lattice point set. This paper introduces a new multilevel approximation that spreads the work over a hierarchy of finite element meshes and kernel interpolants so that the overall cost is reduced.

The solution to (1.1) with the coefficient given by (1.2) lies in a periodic function space (with respect to the parametric domain) equipped with a reproducing kernel $\mathcal{K}(\cdot,\cdot)$. Given evaluations $u(\cdot,\boldsymbol{t}_{k})$ on a lattice point set $\{\boldsymbol{t}_{k}\}_{k=0}^{N-1}\subset\Omega$, the solution $u=u(\cdot,\boldsymbol{y})$ can be approximated over $\Omega$ using a kernel interpolant given by

\[I_{N}u(\cdot,\boldsymbol{y})\coloneqq\sum_{k=0}^{N-1}a_{k}\,\mathcal{K}(\boldsymbol{t}_{k},\boldsymbol{y})\qquad\text{for }\boldsymbol{y}\in\Omega,\tag{1.3}\]

where $I_{N}$ is the kernel interpolation operator on $\Omega$. The coefficients $a_{k}=a_{k}(\boldsymbol{x})$ for $k=0,\ldots,N-1$ are obtained by solving, via FFT, the linear system associated with interpolation at the lattice points. Combining (1.3) with a finite element discretisation leads to an approximation of $u$ over $D\times\Omega$. This is the method employed by [23], which we will refer to as the single-level kernel interpolant (see also Section 2.6 for details).

To introduce the multilevel kernel approximation, consider now a sequence of kernel interpolants $I_{\ell}\coloneqq I_{N_{\ell}}$, for $\ell=0,1,\ldots$, based on a sequence of embedded lattice point sets of nonincreasing size $N_{0}\geq N_{1}\geq\cdots$, and a sequence of nested finite element approximations $u_{\ell}(\cdot,\boldsymbol{y})$ for $\ell=0,1,\ldots$, where $u_{\ell}$ becomes increasingly accurate (and thus has increasing cost) as $\ell$ increases. Omitting the $\boldsymbol{x}$ and $\boldsymbol{y}$ dependence, the multilevel kernel approximation with maximum level $L\in\mathbb{N}$ is given by

\[I^{\mathrm{ML}}_{L}u\,\coloneqq\,I_{0}u_{0}+\sum_{\ell=1}^{L}I_{\ell}(u_{\ell}-u_{\ell-1}).\tag{1.4}\]

The motivation for such an algorithm is to reduce the overall computational cost compared to the single-level method while achieving the same level of accuracy. The cost savings come from interpolating the difference $u_{\ell}-u_{\ell-1}$, which converges to $0$ as $\ell$ increases, and thus requires an interpolant with fewer points to achieve a comparable accuracy. Indeed, as will be demonstrated later, to achieve an accuracy of $\mathcal{O}(\varepsilon)$ with the single-level approximation in the function space setting with product-type weight parameters, the computational cost is $\mathcal{O}(\varepsilon^{-\eta-\nu-\theta})$ for positive constants $\eta,\nu,\theta$ depending on the rates of convergence for dimension truncation, kernel approximation, and finite element approximation, respectively; whereas for the multilevel approximation, the cost is reduced to $\mathcal{O}(\varepsilon^{-\eta-\max(\nu,\theta)})$. The main contribution of this work is to introduce the multilevel kernel approximation along with a full error and cost analysis. A toy illustration of the mechanics of (1.4) is sketched below.
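In the following Python sketch, mock FE approximations $u_\ell$ with error $\mathcal{O}(h_\ell^2)$ are telescoped as in (1.4), and each difference is interpolated on fewer nodes. The target function, the mock FE error term, and the use of piecewise-linear periodic interpolation in place of the kernel interpolant are all hypothetical simplifications.

```python
import numpy as np

# Toy 1-D sketch of the multilevel approximation (1.4): the difference
# u_l - u_{l-1} shrinks like h_{l-1}^2, so it is interpolated on ever
# fewer nodes N_l. Everything here is a hypothetical stand-in.
u_exact = lambda y: np.exp(np.sin(2 * np.pi * y))

def u_level(y, l, h0=0.5):
    # mock FE approximation with error O(h_l^2), h_l = h0 * 2**(-l)
    return u_exact(y) + (h0 * 2.0 ** -l) ** 2 * np.cos(2 * np.pi * y)

L, N = 4, [64, 32, 16, 8, 4]            # nonincreasing interpolation sizes
y = np.linspace(0.0, 1.0, 257)
approx = np.zeros_like(y)
for l in range(L + 1):
    t = np.arange(N[l]) / N[l]          # interpolation nodes on level l
    diff = u_level(t, l) - (u_level(t, l - 1) if l > 0 else 0.0)
    approx += np.interp(y, t, diff, period=1.0)
print(np.max(np.abs(approx - u_exact(y))))  # combined interpolation + FE error
```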

Parametric PDEs of the form (1.1) can be used to model steady-state flow through porous media and have been thoroughly studied in the uncertainty quantification literature (see e.g., [2, 4, 5, 6, 44]). QMC methods have achieved much success in tackling parametric PDE problems, including evaluating expected values of quantities of interest (see e.g., [13, 14, 19, 33, 34]), as well as density estimation (see [14]). The two most common forms of random coefficient are the uniform model (see e.g., [6, 33, 34]) and the lognormal model (see e.g., [2, 4, 44]); as an alternative, the periodic model (1.2) was introduced in [24] to exploit fast Fourier methods.

Multivariate function approximation is another area where QMC methods have recently been successful. One example is trigonometric approximation in periodic spaces, where lattice rules are used to evaluate the coefficients in a finite (or truncated) Fourier expansion (see e.g., [1, 26, 27, 28, 29, 35, 36]). Fast component-by-component algorithms for constructing good lattice rules for trigonometric approximation in periodic spaces were presented and analysed in [7, 8].

There has also been research on kernel methods for approximation, such as radial basis approximation for interpolating scattered data, signal processing, and meshless methods for solving PDEs (see e.g., [11, 21, 37, 40, 45, 46]). Approximation in $L^2$ using kernel interpolation on a lattice in a periodic setting was first analysed in [48], which also highlighted the advantage that the linear systems involved can be solved efficiently via FFT; see also [47] for the $L^{\infty}$ case. The paper [23] introduced single-level kernel interpolation with lattice points for parametric PDEs and also provided a full analysis of the $L^2$ error on $D\times\Omega$.

Multilevel methods were first introduced in [22] for parametric integration and then developed further in [15, 16] for pricing financial options by computing the expected value of a pay-off depending on a stochastic differential equation, utilising paths simulated via Monte Carlo (MC). These papers showed that the overall computational cost of the multilevel estimator was lower than that of the direct estimate at the same level of error. Multilevel MC methods were extended to multilevel QMC in [18]. Subsequently, multilevel methods with MC and QMC have been successfully used in several papers to compute expectations of quantities of interest from parametric PDEs (see e.g., [2, 5, 13, 34]). The paper [43] employed a similar multilevel strategy to approximate the PDE solution on $D\times\Omega$, where instead of using kernel interpolation on the parameter domain (as in this paper) they used sparse grid stochastic collocation.

To the best of our knowledge, this paper is the first to use a multilevel approach with a QMC method to approximate the solution of a PDE as a function over both the spatial and parametric domains.

The structure of the paper is as follows. Section 2 summarises the problem setting and essential background on dimension truncation, finite element methods and kernel interpolation. Section 3 introduces the multilevel kernel approximation alongside a breakdown of the error and the corresponding multilevel cost analysis. Section 4 includes the regularity analysis required for the error analysis presented in Section 5. Practical details on implementing our multilevel approximation are covered in Section 6. Finally, Section 7 presents results of the numerical experiments. Any technical proofs not detailed in the main text are provided in the appendix.

2 Background

2.1 Notation

Let $\boldsymbol{\nu}=(\nu_{j})_{j\geq1}$ with $\nu_{j}\in\mathbb{N}_{0}$ be a multi-index, and let $|\boldsymbol{\nu}|\coloneqq\sum_{j\geq1}\nu_{j}$ and $\mathrm{supp}(\boldsymbol{\nu})\coloneqq\{j\geq1:\nu_{j}\neq0\}$. Define $\mathcal{F}$ to be the set of multi-indices with finite support, i.e., $\mathcal{F}\coloneqq\{\boldsymbol{\nu}\in\mathbb{N}_{0}^{\infty}\,:\,|\mathrm{supp}(\boldsymbol{\nu})|<\infty\}$. Multi-index notation is used for the parametric derivatives, i.e., the mixed partial derivative of order $\boldsymbol{\nu}\in\mathcal{F}$ with respect to $\boldsymbol{y}$ is denoted by $\partial^{\boldsymbol{\nu}}=\prod_{j\geq1}(\partial/\partial y_{j})^{\nu_{j}}$.

For $\boldsymbol{\nu},\boldsymbol{m}\in\mathcal{F}$, we define $\binom{\boldsymbol{\nu}}{\boldsymbol{m}}\coloneqq\prod_{j\geq1}\binom{\nu_{j}}{m_{j}}$ and interpret $\boldsymbol{m}\leq\boldsymbol{\nu}$ to mean $m_{j}\leq\nu_{j}$ for all $j\geq1$ (i.e., comparison between multi-indices is componentwise). For a given sequence $\boldsymbol{b}=(b_{j})_{j\geq1}$, we define $\boldsymbol{b}^{\boldsymbol{\nu}}\coloneqq\prod_{j\geq1}b_{j}^{\nu_{j}}$.

The Stirling numbers of the second kind are given by

\[S(0,0)\coloneqq1,\qquad S(n,k)\coloneqq\frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^{n}\quad\text{for }0\leq k\leq n.\tag{2.1}\]
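As a quick sanity check, (2.1) can be evaluated directly; the sketch below uses integer division since the sum is always divisible by $k!$.

```python
from math import comb, factorial

# Stirling numbers of the second kind via the explicit sum (2.1)
def stirling2(n, k):
    return sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1)) // factorial(k)

assert stirling2(0, 0) == 1
assert stirling2(4, 2) == 7   # {1,2,3,4} has 7 partitions into 2 blocks
```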

The notation $A\lesssim B$ means that there exists some $c>0$ such that $A\leq cB$, and $A\simeq B$ means that there exist constants $c_{1},c_{2}>0$ such that $A\leq c_{1}B$ and $B\leq c_{2}A$.

For $g\in L^{2}(D)\times L^{2}(\Omega)=L^{2}(D\times\Omega)$, we define the $L^{2}$-norm on $D\times\Omega$ by

\[\|g\|_{L^{2}(D\times\Omega)}=\bigg(\int_{\Omega}\int_{D}|g(\boldsymbol{x},\boldsymbol{y})|^{2}\,\mathrm{d}\boldsymbol{x}\,\mathrm{d}\boldsymbol{y}\bigg)^{1/2}=\bigg(\int_{\Omega}\|g(\cdot,\boldsymbol{y})\|_{L^{2}(D)}^{2}\,\mathrm{d}\boldsymbol{y}\bigg)^{1/2}.\]

Furthermore, note that all functions in this paper are measurable, so by Fubini's Theorem we can swap the order of integration to give $\|\cdot\|_{L^{2}(D\times\Omega)}\equiv\|\cdot\|_{L^{2}(\Omega\times D)}$.

2.2 Parametric variational formulation

Let $V\coloneqq H_{0}^{1}(D)$ denote the usual first-order Sobolev space of functions on $D$ that vanish on the boundary, with associated norm $\|v\|_{V}\coloneqq\|\nabla v\|_{L^{2}(D)}$. Multiplying both sides of (1.1) by a test function $v\in V$ and then integrating with respect to $\boldsymbol{x}$, using the divergence theorem, yields the variational equation: find $u(\cdot,\boldsymbol{y})\in V$ such that

\[\mathcal{A}(\boldsymbol{y};u(\cdot,\boldsymbol{y}),v)=\langle f,v\rangle\quad\text{for all }v\in V,\tag{2.2}\]

where $f\in V'$, with $V'\coloneqq H^{-1}(D)$ denoting the dual space of $V$, and $\mathcal{A}(\boldsymbol{y};\cdot,\cdot):V\times V\to\mathbb{R}$ is the parametric bilinear form defined by

\[\mathcal{A}(\boldsymbol{y};u,v)\coloneqq\int_{D}\Psi(\boldsymbol{x},\boldsymbol{y})\nabla u(\boldsymbol{x})\cdot\nabla v(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\quad\text{for }u,v\in V.\]

The $L^{2}(D)$ inner product $\langle\cdot,\cdot\rangle$ is extended continuously to the duality pairing on $V\times V'$.

The variational problem (2.2) is subject to the same assumptions as in [24]:

(A1) $\psi_{0}\in L^{\infty}(D)$ and $\sum_{j\geq1}\|\psi_{j}\|_{L^{\infty}(D)}<\infty$;

(A2) there are positive constants $\Psi_{\min}$ and $\Psi_{\max}$ such that $0<\Psi_{\min}\leq\Psi(\boldsymbol{x},\boldsymbol{y})\leq\Psi_{\max}<\infty$ for all $\boldsymbol{x}\in D$ and $\boldsymbol{y}\in\Omega$;

(A3) $\sum_{j\geq1}\|\psi_{j}\|^{p}_{L^{\infty}(D)}<\infty$ for some $0<p<1$;

(A4) $\psi_{0}\in W^{1,\infty}(D)$ and $\sum_{j\geq1}\|\psi_{j}\|_{W^{1,\infty}(D)}<\infty$;

(A5) $\|\psi_{1}\|_{L^{\infty}(D)}\geq\|\psi_{2}\|_{L^{\infty}(D)}\geq\cdots$; and

(A6) the physical domain $D\subset\mathbb{R}^{d}$, where $d=1,2$ or $3$, is a convex and bounded polyhedron with plane faces.

Here $W^{1,\infty}(D)$ is the Sobolev space of functions with essentially bounded first-order weak derivatives, equipped with the norm $\|v\|_{W^{1,\infty}(D)}\coloneqq\max\{\|v\|_{L^{\infty}(D)},\|\nabla v\|_{L^{\infty}(D)}\}$. For the new multilevel analysis, we will also require the additional assumptions:

(A7) $\sum_{j\geq1}\|\psi_{j}\|_{W^{1,\infty}(D)}^{q}<\infty$ for some $0<p<q\leq1$; and

(A8) $f\in L^{2}(D)$.

We additionally define

\[b_{j}\coloneqq\frac{\|\psi_{j}\|_{L^{\infty}(D)}}{\Psi_{\min}}\quad\text{and}\quad\overline{b}_{j}\coloneqq\frac{\|\psi_{j}\|_{W^{1,\infty}(D)}}{\Psi_{\min}}.\tag{2.3}\]

From the Lax-Milgram Lemma, (2.2) is uniquely solvable for all $\boldsymbol{y}\in\Omega$, and the solution satisfies the a priori bound

\[\|u(\cdot,\boldsymbol{y})\|_{V}\leq\frac{\|f\|_{V'}}{\Psi_{\min}}.\tag{2.4}\]

In addition, from [24, Theorem 2.3] the parametric derivatives are bounded by

\[\|\partial^{\boldsymbol{\nu}}u(\cdot,\boldsymbol{y})\|_{V}\leq\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}\prod_{i\geq1}S(\nu_{i},m_{i})\quad\text{for }\boldsymbol{\nu}\in\mathcal{F}.\tag{2.5}\]

2.3 Dimension truncation

To approximate $u(\cdot,\boldsymbol{y})$ in $\boldsymbol{y}$, the infinite-dimensional parameter domain $\Omega$ must first be truncated to a finite number of dimensions $s$. This is done by setting $y_{j}=0$ for $j>s$, where we define the truncated parameter $\boldsymbol{y}_{1:s}\coloneqq(y_{1},y_{2},\ldots,y_{s},0,\ldots)$, or equivalently by truncating the coefficient expansion (1.2) to $s$ terms. The dimension-truncated solution is denoted by $u^{s}(\cdot,\boldsymbol{y})\coloneqq u(\cdot,\boldsymbol{y}_{1:s})$, and it is obtained by solving the variational problem (2.2) at $\boldsymbol{y}=\boldsymbol{y}_{1:s}$. With a slight abuse of notation we treat $\boldsymbol{y}_{1:s}$ as a vector in the $s$-dimensional parameter domain $\Omega_{s}\coloneqq[0,1]^{s}$.

The dimension-truncated problem is subject to the same assumptions as the variational problem (2.2) (i.e., Assumptions (A1)–(A6) hold); hence, the a priori bound (2.4) and regularity bound (2.5) also hold here. Additionally, we have from [23, Theorem 4.1] that

\[\|u-u^{s}\|_{L^{2}(D\times\Omega)}\,\lesssim\,\|f\|_{V'}\,s^{-(\frac{1}{p}-\frac{1}{2})},\tag{2.6}\]

where the implied constant is independent of $s$ and $f$.

2.4 Finite element methods

The solution to the PDE (1.1) will be approximated by discretising in space using the finite element (FE) method. We consider piecewise linear FE methods; however, the multilevel method can be applied with more general discretisations. Denote by $V_{h}\subset V$ the space of continuous piecewise linear functions on a shape-regular triangulation of $D$ with mesh width $h>0$, and let $M\coloneqq\dim(V_{h})=\mathcal{O}(h^{-d})$. For $\boldsymbol{y}\in\Omega$, the FE approximation of $u(\cdot,\boldsymbol{y})$ from (2.2) is obtained by finding $u_{h}(\cdot,\boldsymbol{y})\in V_{h}$ such that

\[\mathcal{A}(\boldsymbol{y};u_{h}(\cdot,\boldsymbol{y}),v_{h})\,=\,\langle f,v_{h}\rangle\quad\text{for all }v_{h}\in V_{h}.\tag{2.7}\]

The FE basis functions for the space $V_{h}$ are denoted by $\phi_{h,i}$, $i=1,2,\ldots,M$, and the FE approximation with coefficients $[u_{h,i}(\boldsymbol{y})]_{i=1}^{M}$ can be represented as

\[u_{h}(\boldsymbol{x},\boldsymbol{y})\,=\,\sum_{i=1}^{M}u_{h,i}(\boldsymbol{y})\,\phi_{h,i}(\boldsymbol{x}).\tag{2.8}\]

Since $V_{h}\subset V$, the a priori bound (2.4) and the regularity bound (2.5) also hold for the FE approximation $u_{h}$, as well as for the FE approximation of the dimension-truncated solution, denoted $u_{h}^{s}$.
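To make the spatial discretisation concrete, the following is a minimal sketch of solving (2.7) in one dimension on $D=(0,1)$ with a uniform mesh and the coefficient frozen at a single parameter value; the midpoint quadrature, the lumped load vector, and all inputs are hypothetical simplifications.

```python
import numpy as np

# Minimal sketch: piecewise-linear FE solve of (2.7) on D = (0,1) with the
# coefficient frozen at one parameter y. Element midpoint quadrature gives
# a symmetric tridiagonal stiffness matrix, solved by the Thomas algorithm.
def fe_solve_1d(psi_at, f_at, M):
    h = 1.0 / (M + 1)
    mids = (np.arange(M + 1) + 0.5) * h      # one quadrature point per element
    c = psi_at(mids) / h                      # elementwise Psi / h
    diag = c[:-1] + c[1:]                     # A_ii
    off = -c[1:-1]                            # A_{i,i+1} = A_{i+1,i}
    nodes = np.arange(1, M + 1) * h
    rhs = h * f_at(nodes)                     # lumped load vector
    b, r = diag.copy(), rhs.copy()            # Thomas algorithm: forward sweep
    for i in range(1, M):
        w = off[i - 1] / b[i - 1]
        b[i] -= w * off[i - 1]
        r[i] -= w * r[i - 1]
    u = np.zeros(M)                           # back substitution
    u[-1] = r[-1] / b[-1]
    for i in range(M - 2, -1, -1):
        u[i] = (r[i] - off[i] * u[i + 1]) / b[i]
    return nodes, u

nodes, u = fe_solve_1d(lambda x: 1.0 + 0.3 * np.sin(2 * np.pi * x),
                       lambda x: np.ones_like(x), M=127)
```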

From [23, Theorem 4.3], we have that under Assumptions (A1), (A2), (A4), (A8), and (A6), the error of the FE approximation satisfies

\[\|u(\cdot,\boldsymbol{y})-u_{h}(\cdot,\boldsymbol{y})\|_{L^{2}(D)}\,\lesssim\,h^{2}\,\|f\|_{L^{2}(D)}\qquad\text{as }h\to0,\tag{2.9}\]

where the implied constant is independent of $h$, $f$ and $\boldsymbol{y}$. By Galerkin orthogonality, the FE error $u(\cdot,\boldsymbol{y})-u_{h}(\cdot,\boldsymbol{y})$ is orthogonal to $V_{h}$, i.e.,

\[\mathcal{A}(\boldsymbol{y};\,u(\cdot,\boldsymbol{y})-u_{h}(\cdot,\boldsymbol{y}),\,v_{h})=0\quad\text{for all }v_{h}\in V_{h}.\tag{2.10}\]

We also define $\mathsf{I}:V\to V$ to be the identity operator and $\mathsf{P}^{h}_{\boldsymbol{y}}:V\to V_{h}$ to be the parametric FE projection operator onto $V_{h}$, which is defined for $w\in V$ by

\[\mathcal{A}(\boldsymbol{y};\,(\mathsf{I}-\mathsf{P}^{h}_{\boldsymbol{y}})w,\,v_{h})=0\quad\text{for all }v_{h}\in V_{h}.\tag{2.11}\]

The approximation $u_{h}(\cdot,\boldsymbol{y})$ is the projection of $u$ onto $V_{h}$, i.e., $u_{h}=\mathsf{P}^{h}_{\boldsymbol{y}}u\in V_{h}$, and by the definition of a projection, $(\mathsf{P}^{h}_{\boldsymbol{y}})^{2}=\mathsf{P}^{h}_{\boldsymbol{y}}$ on $V_{h}$.

Under Assumptions (A1), (A2) and (A6), we have the following result from [3, Theorem 3.2.2], which holds for all $w\in H^{2}(D)\cap V$:

\[\|(\mathsf{I}-\mathsf{P}^{h}_{\boldsymbol{y}})w\|_{V}\lesssim h\,\|\Delta w\|_{L^{2}(D)}\quad\text{as }h\to0,\tag{2.12}\]

where the implied constant is independent of $h$ and $\boldsymbol{y}$.

2.5 Lattice-based kernel interpolation

The solution $u(\cdot,\boldsymbol{y})$ will be approximated via kernel interpolation in the dimension-truncated parametric domain $\Omega_{s}$. Consider the weighted Korobov space $\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$, which is the Hilbert space of one-periodic $L^{2}$ functions on $\Omega_{s}$ with absolutely convergent Fourier series and square-integrable mixed derivatives of order $\alpha$. We restrict $\alpha\geq1$ to be an integer smoothness parameter (in general, $\alpha$ need not be an integer, see e.g., [23, 42]; however, to have a simple, closed-form representation of the reproducing kernel and norm we restrict ourselves to integer $\alpha$) and include weight parameters $\boldsymbol{\gamma}=\{\gamma_{\mathfrak{u}}>0:\mathfrak{u}\subseteq\{1:s\}\}$ that model the relative importance of different groups of parametric variables. The space $\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ is a reproducing kernel Hilbert space, equipped with the norm

\[\|g\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})}^{2}\coloneqq\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\bigg|\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^{\alpha}}{\partial y_{j}^{\alpha}}\bigg)g(\boldsymbol{y})\,\mathrm{d}\boldsymbol{y}_{-\mathfrak{u}}\bigg|^{2}\,\mathrm{d}\boldsymbol{y}_{\mathfrak{u}},\tag{2.13}\]

where $\boldsymbol{y}_{\mathfrak{u}}\coloneqq(y_{j})_{j\in\mathfrak{u}}$ and $\boldsymbol{y}_{-\mathfrak{u}}\coloneqq(y_{j})_{j\in\{1:s\}\setminus\mathfrak{u}}$. The reproducing kernel for this space, $\mathcal{K}_{\alpha,\boldsymbol{\gamma}}:\Omega_{s}\times\Omega_{s}\to\mathbb{R}$, is given by

\[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{y},\boldsymbol{y}')\,\coloneqq\,\sum_{\mathfrak{u}\subseteq\{1:s\}}\gamma_{\mathfrak{u}}\prod_{j\in\mathfrak{u}}\bigg[(-1)^{\alpha+1}\frac{(2\pi)^{2\alpha}}{(2\alpha)!}B_{2\alpha}(|y_{j}-y'_{j}|)\bigg],\]

where $B_{2\alpha}$ is the Bernoulli polynomial of degree $2\alpha$.

Consider a set of lattice points $\{\boldsymbol{t}_{k}\}_{k=0}^{N-1}$ defined by

\[\boldsymbol{t}_{k}=\frac{k\boldsymbol{z}\bmod N}{N}\qquad\text{for }k=0,\ldots,N-1,\]

where $\boldsymbol{z}\in\mathbb{N}^{s}$ is a generating vector with components in $\{1,\ldots,N-1\}$ that are coprime to $N$. For $g\in\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$, the lattice-based kernel interpolant (defined using function values of $g$ evaluated at the points $\{\boldsymbol{t}_{k}\}_{k=0}^{N-1}$) is given by

\[I_{N}g(\boldsymbol{y})\,\coloneqq\,I^{s}_{N}g(\boldsymbol{y})=\sum_{k=0}^{N-1}a_{N,k}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{y}),\tag{2.14}\]

such that $I_{N}g(\boldsymbol{t}_{k'})=g(\boldsymbol{t}_{k'})$ for all $k'=0,\ldots,N-1$. The generating vector $\boldsymbol{z}$ is obtained using a component-by-component (CBC) construction algorithm similar to those described in [7, 8, 23], where the components of the vector are selected to minimise a bound on the worst-case error of approximation using the kernel method. To ensure that $I_{N}g$ interpolates $g$ at the lattice points, the coefficients $a_{N,k}$ are obtained by solving the linear system

\[K_{N,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{a}_{N}\,=\,\boldsymbol{g}_{N},\tag{2.15}\]

with $\boldsymbol{a}_{N}=[a_{N,k}]_{k=0}^{N-1}$, $K_{N,\alpha,\boldsymbol{\gamma}}=[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{t}_{k'})]_{k,k'=0}^{N-1}$ and $\boldsymbol{g}_{N}=[g(\boldsymbol{t}_{k})]_{k=0}^{N-1}$.

Due to the periodic and symmetric nature of the kernel, along with the properties of the lattice point set, the elements of $K_{N,\alpha,\boldsymbol{\gamma}}$ satisfy

\[[K_{N,\alpha,\boldsymbol{\gamma}}]_{k,k'}=\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{t}_{k'})=\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k}-\boldsymbol{t}_{k'},\boldsymbol{0})=\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{(k-k')\bmod N},\boldsymbol{0})\]

for $k,k'=0,\ldots,N-1$. This implies that $K_{N,\alpha,\boldsymbol{\gamma}}$ is a symmetric, circulant matrix uniquely determined by its first column, which can be diagonalised via FFT at cost $\mathcal{O}(N\log N)$. The kernel only needs to be evaluated at $\lceil N/2\rceil$ lattice points, since the first column is symmetric about its midpoint, and the linear system (2.15) can be solved after diagonalising using the FFT.
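The following sketch assembles the first column of $K_{N,\alpha,\boldsymbol{\gamma}}$ for $\alpha=1$ with product weights, for which the sum over subsets $\mathfrak{u}$ in the kernel factorises over dimensions, and then solves (2.15) in $\mathcal{O}(N\log N)$ via the FFT. The generating vector and weights below are hypothetical placeholders, not CBC-constructed.

```python
import numpy as np

# Rank-1 lattice points t_k = (k*z mod N)/N
def lattice_points(z, N):
    return np.outer(np.arange(N), z) % N / N

# Kernel for alpha = 1 with product weights: the sum over subsets u in the
# kernel factorises as prod_j (1 + gamma_j * 2*pi^2 * B_2(|y_j - y'_j|)),
# where B_2(x) = x^2 - x + 1/6 is the Bernoulli polynomial of degree 2.
def kernel(y, yp, gamma):
    B2 = lambda x: x * x - x + 1.0 / 6.0
    return np.prod(1.0 + gamma * 2.0 * np.pi ** 2 * B2(np.abs(y - yp)), axis=-1)

N, s = 64, 4
z = np.array([1, 19, 27, 11])           # hypothetical generating vector
gamma = 0.5 ** np.arange(1, s + 1)      # hypothetical product weights
T = lattice_points(z, N)

c = kernel(T, np.zeros(s), gamma)       # first column of the circulant matrix
g = np.prod(2.0 + np.sin(2 * np.pi * T), axis=1)     # data to interpolate
a = np.fft.ifft(np.fft.fft(g) / np.fft.fft(c)).real  # O(N log N) solve of (2.15)

# the interpolant reproduces g at every lattice point (dense check only)
K = kernel(T[:, None, :], T[None, :, :], gamma)
assert np.allclose(K @ a, g)
```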

A bound on the approximation error for the kernel interpolation of $g\in\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ using a CBC-generated lattice point set is given in [23, Theorem 3.3], which states that

\[\|(I-I_{N})g\|_{L^{2}(\Omega_{s})}\leq\frac{\kappa}{[\varphi(N)]^{\frac{1}{4\lambda}}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{\frac{1}{2\lambda}}\|g\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})}\tag{2.16}\]

for all $\lambda\in(\frac{1}{2\alpha},1]$, with $\kappa\coloneqq\sqrt{2}\,(2^{2\alpha\lambda+1}+1)^{\frac{1}{4\lambda}}$. Here $I\coloneqq I^{s}:\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})\to\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ is the identity operator, $\zeta$ is the Riemann zeta function defined by $\zeta(x)=\sum_{j=1}^{\infty}j^{-x}$, and $\varphi$ is the Euler totient function. Note that the original theorem presented in [23] requires the number of points $N$ to be prime; however, using [30, Theorem 3.4] the result above has been extended to non-prime $N$.

There are $2^{s}$ weight parameters $\{\gamma_{\mathfrak{u}}\}_{\mathfrak{u}\subseteq\{1:s\}}$, too many to specify individually in practice. Therefore, special forms of weights $\gamma_{\mathfrak{u}}$ have been considered, including:

• Product weights: $\gamma_{\mathfrak{u}}=\prod_{j\in\mathfrak{u}}\gamma_{j}$ for some positive sequence $(\gamma_{j})_{j\geq1}$;

• POD ("product and order dependent") weights: $\gamma_{\mathfrak{u}}=\Gamma_{|\mathfrak{u}|}\prod_{j\in\mathfrak{u}}\gamma_{j}$ for positive sequences $(\gamma_{j})_{j\geq1}$ and $(\Gamma_{j})_{j\geq0}$;

• SPOD ("smoothness-driven product and order dependent") weights: $\gamma_{\mathfrak{u}}=\sum_{\boldsymbol{v}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}\Gamma_{|\boldsymbol{v}_{\mathfrak{u}}|}\prod_{j\in\mathfrak{u}}\gamma_{j,v_{j}}$ for positive sequences $(\gamma_{j,v_{j}})_{j\geq1}$ and $(\Gamma_{j})_{j\geq0}$; a small code sketch of these three forms follows below.
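In the sketch, a subset $\mathfrak{u}$ is represented by a tuple of coordinate indices, and all sequences are hypothetical.

```python
import math
from itertools import product

# gamma_u for the three weight forms; u is a tuple of coordinate indices.
def product_weight(u, g):
    return math.prod(g[j] for j in u)

def pod_weight(u, g, G):
    return G[len(u)] * math.prod(g[j] for j in u)

def spod_weight(u, g, G, alpha):
    # sum over v_u in {1,...,alpha}^{|u|} of Gamma_{|v_u|} * prod_j g[j][v_j - 1]
    return sum(G[sum(v)] * math.prod(g[j][vj - 1] for j, vj in zip(u, v))
               for v in product(range(1, alpha + 1), repeat=len(u)))

g = {1: 0.5, 2: 0.25, 3: 0.125}    # hypothetical gamma_j
G = [1.0] * 10                     # hypothetical Gamma_k
print(product_weight((1, 3), g), pod_weight((1, 3), g, G))
```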

The error bound (2.16) holds for all forms of weights, but the computational cost differs; see the next subsection.

2.6 Single-level kernel interpolation for PDEs

Lattice-based kernel interpolation was first applied to PDE problems in [23]. Denoting the dimension-truncated FE solution of (2.2) by $u^{s}_{h}(\cdot,\boldsymbol{y})\in V_{h}$, the estimate (2.16) can be applied to the single-level kernel interpolant (see [23, Theorem 4.4]) to obtain

\[\|u^{s}_{h}-I_{N}u^{s}_{h}\|_{L^{2}(D\times\Omega)}\lesssim[\varphi(N)]^{-\frac{1}{4\lambda}}\,\|f\|_{V'}\,C_{s}(\lambda)\tag{2.17}\]

for all $\lambda\in(\frac{1}{2\alpha},1]$, where

\[[C_{s}(\lambda)]^{2\lambda}\coloneqq\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)\times\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}|\boldsymbol{m}_{\mathfrak{u}}|!\,\boldsymbol{b}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_{i})\bigg)^{2}\bigg)^{\lambda}.\]

In [23], the weights $\gamma_{\mathfrak{u}}$ are chosen to ensure that the constant $C_{s}(\lambda)$ can be bounded independently of the dimension $s$. Different forms of weights (SPOD, POD, product) achieve dimension-independent bounds with some concessions on the rate of convergence.

The single-level kernel interpolation methodology from [23] is summarised below:

1. Compute the first column of $K_{N,\alpha,\boldsymbol{\gamma}}$ in (2.15) (i.e., compute $[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{k},\boldsymbol{0})]_{k=0}^{N-1}$) at cost $\mathcal{O}(s^{\rho}\,\alpha^{\varsigma}N)$, where $\rho=2$ and $\varsigma=2$ for SPOD weights, $\rho=2$ and $\varsigma=0$ for POD weights, and $\rho=1$ and $\varsigma=0$ for product weights; see [23, Table 2].

2. Evaluate the coefficient (1.2) at each lattice point and FE node to set up the stiffness matrix, at cost $\mathcal{O}(s\,h^{-d}N)$.

3. Compute the FE solution $u^{s}_{h}$ (denoted by $g$ in (2.15)) for the PDE associated with each lattice point to construct $\boldsymbol{g}_{N}$ on the right-hand side of (2.15) for every FE node, at cost $\mathcal{O}(h^{-\tau}N)$ for some $\tau>d$, with $\tau\approx d$ for a linear-complexity FE solver (e.g., an algebraic multigrid solver).

4. Solve the circulant linear system (2.15) for the coefficients $\boldsymbol{a}_{N}$ of the interpolant for every FE node, at cost $\mathcal{O}(h^{-d}N\log N)$.

The total cost of construction for the single-level interpolant is therefore

\[\mathrm{cost}(I_{N}u_{h}^{s})\simeq s^{\rho}\alpha^{\varsigma}N+s\,h^{-d}N+h^{-\tau}N+h^{-d}N\log N.\tag{2.18}\]

There is a pre-computation cost (which varies with the form of weights) associated with the CBC construction of the lattice generating vector $\boldsymbol{z}$, and a post-computation cost to assemble the approximation.

From [23], the error satisfies

\[\mathrm{error}(I_{N}u^{s}_{h})\lesssim s^{-\kappa}+h^{\beta}+N^{-\mu},\]

see (2.6), (2.9), (2.17), with $\kappa=\frac{1}{p}-\frac{1}{2}$, $\beta=2$, and $\mu=\frac{1}{4\lambda}$ for $\lambda\in(\frac{1}{2\alpha},1]$. The cost can now be expressed in terms of the error $\varepsilon>0$ as follows. We demand that each of the three components of the error is bounded above and below by multiples of $\varepsilon$, i.e., $s\simeq\varepsilon^{-\frac{1}{\kappa}}$, $h\simeq\varepsilon^{\frac{1}{\beta}}$ and $N\simeq\varepsilon^{-\frac{1}{\mu}}$. It follows that there exists $C>0$ such that $N^{\mu}\leq Cs^{\kappa}$, which gives $\log N\leq\frac{1}{\mu}(\log C+\kappa\log s)\leq\frac{1}{\mu}(\log C+\kappa)\,s$ since $s\geq1$, and therefore $\log N\lesssim s$. Assuming that $d\leq\tau\leq d+\frac{\beta}{\kappa}$ and treating $\alpha^{\varsigma}$ as a constant, the cost (2.18) can be bounded further by

\[\mathrm{cost}(I_{N}u^{s}_{h})\lesssim s^{\rho}N+s\,Nh^{-d}\simeq\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}+\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}-\frac{d}{\beta}}\simeq\varepsilon^{-\max(\frac{\rho}{\kappa}+\frac{1}{\mu},\,\frac{1}{\kappa}+\frac{1}{\mu}+\frac{d}{\beta})}.\tag{2.19}\]

In the case of product weights, we have $\rho=1$ and $\mathrm{cost}(I_{N}u^{s}_{h})\lesssim\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}-\frac{d}{\beta}}$.

3 Multilevel kernel approximation

Consider a sequence of conforming FE spaces $\{V_{\ell}\}_{\ell=0}^{\infty}$, where each $V_{\ell}\coloneqq V_{h_{\ell}}\subset V$ corresponds to a shape-regular triangulation $\mathcal{T}_{\ell}$ of $D$ with mesh width $h_{\ell}\coloneqq\max\{\mathrm{diam}(\triangle):\triangle\in\mathcal{T}_{\ell}\}>0$ and $\dim(V_{\ell})=M_{\ell}<\infty$. Recall that $M_{\ell}=\mathcal{O}(h_{\ell}^{-d})$. Then for $\ell\in\mathbb{N}$, we denote the dimension-truncated FE approximation in the space $V_{\ell}$ by $u_{\ell}(\cdot,\boldsymbol{y})\coloneqq u^{s}_{h_{\ell}}(\cdot,\boldsymbol{y})=u^{s}_{h_{\ell}}(\cdot,\boldsymbol{y}_{1:s})\in V_{\ell}$.

For a maximum level $L\in\mathbb{N}$ and setting $u_{-1}\coloneqq0$, the multilevel kernel approximation is given by (1.4), where $\{I_{\ell}\}_{\ell\in\mathbb{N}}$ is a sequence of interpolation operators such that each $I_{\ell}\coloneqq I_{N_{\ell}}$ is a real-valued kernel interpolant based on $N_{\ell}$ lattice points $\{\boldsymbol{t}_{\ell,k}\}_{k=0}^{N_{\ell}-1}$ as defined in Section 2.5. The interpolants are ordered in terms of nonincreasing accuracy, or equivalently, nonincreasing numbers of interpolation points, i.e., $N_{0}\geq N_{1}\geq N_{2}\geq\cdots$. The intuition is that the magnitude of $u_{\ell}-u_{\ell-1}$ decreases with increasing $\ell$, thus requiring fewer interpolation points to achieve reasonable accuracy.

We assume that the FE solution at each level is approximated with increasingly fine meshes corresponding to mesh widths $h_{0}>h_{1}>\cdots$ (i.e., the approximation $u_{\ell}$ increases in accuracy as the level increases), so that approximations using large values of $N_{\ell}$ are compensated by coarser FE meshes, thus moderating the cost. To simplify the ML algorithm, we additionally assume that the FE spaces are nested, i.e., $V_{\ell}\subset V_{\ell+1}$ for $\ell\in\mathbb{N}$. If non-nested FE spaces are used, a "supermesh" (the mesh corresponding to the space spanned by the basis functions of both $V_{\ell}$ and $V_{\ell-1}$) must be considered, which adds to the computational cost (see e.g., [9, 10, 12]).

Similarly, we assume the lattice points on each level are nested (i.e., the points used on level $\ell$ form a subset of the points used on level $\ell-1$), which can be achieved using an embedded lattice rule (see [30]). Thus, the space spanned by the kernel basis functions on level $\ell$ is a subspace of the one spanned by the kernel basis functions on level $\ell-1$. This proves to be advantageous since the generating vector for the lattice only needs to be constructed once, and FE evaluations can be reused between levels; e.g., evaluations used to compute $I_{\ell-1}u_{\ell-1}$ can be reused to compute $I_{\ell}u_{\ell-1}$.

To compute the multilevel kernel approximation, on each level $\ell$ we compute the $N_{\ell}$-point interpolant $I_{\ell}$ of the difference between a FE approximation on a fine mesh, $u_{\ell}(\cdot,\boldsymbol{y})\in V_{\ell}$, and one on a coarse mesh, $u_{\ell-1}(\cdot,\boldsymbol{y})\in V_{\ell-1}$. As a result, the final approximation is not a direct interpolation of the solution $u$, and thus $I_{L}^{\mathrm{ML}}$ will be referred to exclusively as the multilevel kernel approximation.

3.1 Error decomposition for multilevel methods

The multilevel kernel approximation error can be expressed as (omitting the dependence on $\boldsymbol{x}$ and $\boldsymbol{y}$)

\[\begin{aligned}u-I^{\mathrm{ML}}_{L}u&=u-\sum_{\ell=0}^{L}I(u_{\ell}-u_{\ell-1})+\sum_{\ell=0}^{L}I(u_{\ell}-u_{\ell-1})-\sum_{\ell=0}^{L}I_{\ell}(u_{\ell}-u_{\ell-1})\\&=u-u_{L}+\sum_{\ell=0}^{L}(I-I_{\ell})(u_{\ell}-u_{\ell-1}),\end{aligned}\]

where $I:\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})\to\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_{s})$ denotes the identity operator, and we define $u_{-1}\coloneqq0$. Following the methodology of [23], we take the $L^{2}(D)$ norm and the $L^{2}(\Omega)$ norm, then use the triangle inequality to obtain the error estimate

\[\mathrm{error}(I^{\mathrm{ML}}_{L}u)=\|u-I^{\mathrm{ML}}_{L}u\|_{L^{2}(\Omega\times D)}\leq\|u-u_{L}\|_{L^{2}(\Omega\times D)}+\sum_{\ell=0}^{L}\|(I-I_{\ell})(u_{\ell}-u_{\ell-1})\|_{L^{2}(\Omega\times D)}.\tag{3.1}\]

The first term in (3.1) is often referred to as the bias in the multilevel literature; it can be further separated into a dimension truncation error and a FE error

\[\|u-u_{L}\|_{L^{2}(\Omega\times D)}\leq\|u-u^{s}\|_{L^{2}(\Omega\times D)}+\|u^{s}-u^{s}_{h_{L}}\|_{L^{2}(\Omega\times D)},\tag{3.2}\]

where the two components can be bounded using (2.6) and (2.9). The bias is controlled by choosing $s$ and $h_{L}$ as necessary to obtain a prescribed error. The second term in (3.1) is the error associated with the multilevel scheme, which is controlled by the choice of the number of interpolation points $N_{\ell}$ and the fineness of the FE mesh $h_{\ell}$ at each level.

3.2 Cost analysis of multilevel methods

The cost of constructing the multilevel kernel approximation is similar to that of the single-level interpolant, except that the cost is now "spread" over multiple levels. Recall from Subsection 2.6 that the total cost of construction for the single-level kernel interpolant is given by (2.18). For the multilevel algorithm, the total cost comes from:

1. evaluating the kernel functions for the full lattice point set, $\mathcal{O}(s^{\rho}\,\alpha^{\varsigma}N_{0})$;

2. then, for each level $\ell$ with $N_{\ell}\leq N_{0}$:

   (a) evaluating the coefficient (1.2) at each lattice point and FE node to set up the stiffness matrix, $\mathcal{O}(s\,h^{-d}_{\ell}N_{\ell})$;

   (b) solving for the FE solution at each lattice point, $\mathcal{O}(h_{\ell}^{-\tau}N_{\ell})$; and

   (c) solving the linear system for the coefficients of the interpolant at every FE node, $\mathcal{O}(h_{\ell}^{-d}N_{\ell}\log N_{\ell})$.

Since the lattice points are nested, the total cost of computation is

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\simeq s^{\rho}\alpha^{\varsigma}N_{0}+\sum_{\ell=0}^{L}\big(s\,h_{\ell}^{-d}N_{\ell}+h_{\ell}^{-\tau}N_{\ell}+h_{\ell}^{-d}N_{\ell}\log N_{\ell}\big).\tag{3.3}\]

Since the kernel functions only need to be evaluated once for each lattice point and can be reused at each level as necessary, this cost, given by $s^{\rho}\alpha^{\varsigma}N_{0}$, is independent of $\ell$ and sits outside the summation. For details on practical implementation, see Section 6.
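As a quick illustration, the cost model (3.3) can be transcribed directly; the parameters $\rho,\varsigma,\tau$ and all absolute constants below are problem-dependent and hypothetical.

```python
import math

# Sketch of the multilevel cost model (3.3) for given level sizes N_l and
# mesh widths h_l; rho, varsigma, tau and all constants are hypothetical.
def ml_cost(N, h, s, d=2, tau=2.1, rho=1, alpha=1, varsigma=0):
    setup = s ** rho * alpha ** varsigma * N[0]
    levels = sum(s * h[l] ** -d * N[l] + h[l] ** -tau * N[l]
                 + h[l] ** -d * N[l] * math.log(N[l]) for l in range(len(N)))
    return setup + levels

h = [0.5 * 2 ** -l for l in range(5)]
N = [4096, 1024, 256, 64, 16]
print(ml_cost(N, h, s=100))
```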

3.3 Abstract complexity analysis

Theorem 1 below is an abstract complexity theorem for the error and cost of our multilevel approximation. It specifies a choice of $s$, $L$, $N_{\ell}$ and $h_{\ell}$ for $\ell=0,\ldots,L$ such that the total error of approximation can be bounded by some given $\varepsilon>0$. Assumption (M1) is motivated by the bias split (3.2) together with (2.6) and (2.9), and $s$ is chosen to balance the two terms. Assumption (M3) is motivated by the cost estimate (3.3), where $\alpha^{\varsigma}$ is treated as constant, and a bound on $\tau$ is assumed to simplify the cost. The error analysis in Section 5 will justify Assumption (M2) with precise values of the relevant constants.

Theorem 1.

Given $h_{0}\in(0,1)$ and $d\geq1$, define $h_{\ell}\coloneqq h_{0}\,2^{-\ell}$ for $\ell\geq0$, and suppose there are positive constants $\beta,\kappa,\mu,\rho$, and $\tau$ such that

(M1) $\|u-u_{L}\|_{L^{2}(\Omega\times D)}\lesssim s^{-\kappa}+h_{L}^{\beta}$,

(M2) $\|(I-I_{0})u_{0}\|_{L^{2}(\Omega\times D)}\lesssim N_{0}^{-\mu}$ and $\|(I-I_{\ell})(u_{\ell}-u_{\ell-1})\|_{L^{2}(\Omega\times D)}\lesssim N_{\ell}^{-\mu}h_{\ell-1}^{\beta}$ for $\ell=1,\ldots,L$, and

(M3) $\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,s^{\rho}N_{0}+\sum_{\ell=0}^{L}N_{\ell}\,(s\,h_{\ell}^{-d}+h_{\ell}^{-\tau}+h_{\ell}^{-d}\log N_{\ell})$.

Given $0<\varepsilon<\min(1,2h_{0}^{\beta})$, and assuming $d\leq\tau\leq d+\frac{\beta}{\kappa}$, we may choose integers $L$ given by (3.6), $s\simeq h_{L}^{-\frac{\beta}{\kappa}}$, and $N_{0},\ldots,N_{L}$ given by (3.14), such that $\mathrm{error}(I^{\mathrm{ML}}_{L}u)\lesssim\varepsilon$ and

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\lesssim\begin{cases}\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}&\text{when }\frac{d}{\beta}<\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}\,(\log\varepsilon^{-1})^{1+\frac{1}{\mu}}&\text{when }\frac{d}{\beta}=\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{1+\mu}(\frac{d}{\beta}+1)}&\text{when }\frac{1}{\mu}<\frac{d}{\beta}\leq\frac{1}{\mu}+(\frac{1}{\mu}+1)\frac{\rho-1}{\kappa},\\\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}}&\text{when }\frac{d}{\beta}>\frac{1}{\mu}+(\frac{1}{\mu}+1)\frac{\rho-1}{\kappa},\end{cases}\tag{3.4}\]

where the implied constants depend on $h_{0},d,\beta,\kappa,\mu,\rho$, and $\tau$.

Proof.

Substituting Assumptions (M1) and (M2) into (3.1) gives the error bound

\[\mathrm{error}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,s^{-\kappa}+h_{L}^{\beta}+N_{0}^{-\mu}+\sum_{\ell=1}^{L}N_{\ell}^{-\mu}h_{\ell-1}^{\beta}.\]

Choosing $s$ to balance the two components of the error in Assumption (M1), i.e., setting $s^{-\kappa}\simeq h_{L}^{\beta}$, the bound further simplifies to

\[\mathrm{error}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,h_{L}^{\beta}+\sum_{\ell=0}^{L}N_{\ell}^{-\mu}h_{\ell}^{\beta},\tag{3.5}\]

where the implied constant depends on a factor $\max(h_{0}^{-\beta},2^{\beta})$.

We require that $\mathrm{error}(I^{\mathrm{ML}}_{L}u)\,\lesssim\,\varepsilon$, which holds if each of the two terms in (3.5) is bounded by $\varepsilon/2$. So from the first term we choose $L$ such that $h_{L}^{\beta}=h_{0}^{\beta}\,2^{-L\beta}\leq\varepsilon/2$, yielding the condition $L\geq\log_{2}(2h_{0}^{\beta}\,\varepsilon^{-1})/\beta$. Taking the smallest allowable value of $L$ with the ceiling function, we obtain

\[L\coloneqq\bigg\lceil\frac{\log_{2}(2h_{0}^{\beta}\,\varepsilon^{-1})}{\beta}\bigg\rceil,\quad\text{which implies}\quad 2^{L}\simeq\varepsilon^{-\frac{1}{\beta}}.\tag{3.6}\]

To ensure $L\geq1$, so that we are not in the trivial single-level case, we require that the value inside the ceiling function is positive, giving the condition $\varepsilon<2h_{0}^{\beta}$. For the second term in (3.5), we demand that

\[\sum_{\ell=0}^{L}N_{\ell}^{-\mu}h_{\ell}^{\beta}\leq\frac{\varepsilon}{2}.\tag{3.7}\]

Since we have assumed that $\tau\leq d+\frac{\beta}{\kappa}$, which follows from requiring $s\,h_{\ell}^{-d}\simeq h_{L}^{-\frac{\beta}{\kappa}}h_{\ell}^{-d}\geq h_{\ell}^{-\tau}$, Assumption (M3) simplifies to

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\lesssim s^{\rho}N_{0}+\sum_{\ell=0}^{L}N_{\ell}\,(s\,h_{\ell}^{-d}+h_{\ell}^{-d}\log N_{\ell}).\tag{3.8}\]

If $\log N_{\ell}\lesssim s$ for all $\ell=0,\ldots,L$, then, using $s^{-\kappa}\simeq h_{L}^{\beta}$, the cost (3.8) can be bounded by

\[\mathrm{cost}(I^{\mathrm{ML}}_{L}u)\lesssim h_{L}^{-\frac{\rho\beta}{\kappa}}N_{0}+h_{L}^{-\frac{\beta}{\kappa}}\sum_{\ell=0}^{L}N_{\ell}\,h_{\ell}^{-d},\tag{3.9}\]

where the first term represents a setup cost and the second term is the multilevel cost.

We now proceed to choose $N_{0},\ldots,N_{L}$ by minimising the multilevel cost term in (3.9) subject to the constraint (3.7), with equality instead of $\leq$. We will later verify that the condition $\log N_{\ell}\lesssim s$ indeed holds. The Lagrangian for this optimisation is

\[\mathcal{L}(\widehat{N}_{0},\ldots,\widehat{N}_{L},\chi)\coloneqq h_{L}^{-\frac{\beta}{\kappa}}\sum_{\ell=0}^{L}\widehat{N}_{\ell}\,h_{\ell}^{-d}+\chi\bigg(\sum_{\ell=0}^{L}\widehat{N}_{\ell}^{-\mu}h^{\beta}_{\ell}-\frac{\varepsilon}{2}\bigg),\]

where $\chi$ is the Lagrange multiplier and $\widehat{N}_{\ell}$ for $\ell=0,\ldots,L$ are continuous variables. This gives us the following first-order optimality conditions:

\[\frac{\partial\mathcal{L}}{\partial\widehat{N}_{\ell}}=h_{L}^{-\frac{\beta}{\kappa}}h_{\ell}^{-d}-\chi\,\mu\,\widehat{N}_{\ell}^{-\mu-1}h_{\ell}^{\beta}=0\quad\text{for }\ell=0,\ldots,L,\tag{3.10}\]
\[\frac{\partial\mathcal{L}}{\partial\chi}=\sum_{\ell=0}^{L}\widehat{N}_{\ell}^{-\mu}h_{\ell}^{\beta}-\frac{\varepsilon}{2}=0.\tag{3.11}\]

Rearranging (3.10) gives $\widehat{N}_{\ell}^{1+\mu}h_{\ell}^{-(d+\beta)}=\chi\,\mu\,h_{L}^{\frac{\beta}{\kappa}}$ for $\ell=0,\ldots,L$, noting that the right-hand side is independent of $\ell$. Thus, we have $\widehat{N}_{\ell}^{1+\mu}h_{\ell}^{-(d+\beta)}=\widehat{N}_{0}^{1+\mu}h_{0}^{-(d+\beta)}$ for $\ell=1,\ldots,L$, so

\[\widehat{N}_{\ell}=\widehat{N}_{0}\,\bigg(\frac{h_{\ell}}{h_{0}}\bigg)^{\frac{d+\beta}{1+\mu}}=\widehat{N}_{0}\,(2^{-\ell})^{\frac{d+\beta}{1+\mu}}\quad\text{for }\ell=1,\ldots,L.\tag{3.12}\]

Substituting (3.12) into (3.11) gives

\[\widehat{N}_{0}^{-\mu}h_{0}^{\beta}\sum_{\ell=0}^{L}(2^{\ell})^{\frac{\mu(d+\beta)}{1+\mu}-\beta}=\frac{\varepsilon}{2},\quad\text{which yields}\quad\widehat{N}_{0}=\bigg(2\,\varepsilon^{-1}h_{0}^{\beta}\sum_{\ell=0}^{L}2^{\frac{(d\mu-\beta)\ell}{1+\mu}}\bigg)^{\frac{1}{\mu}}.\tag{3.13}\]

To obtain integer values for $N_{\ell}$, we define

\[N_{\ell}\coloneqq\big\lceil\widehat{N}_{\ell}\big\rceil=\Big\lceil\widehat{N}_{0}\,2^{\frac{-(d+\beta)\ell}{1+\mu}}\Big\rceil\qquad\text{for }\ell=0,\ldots,L.\tag{3.14}\]

Since $N_{\ell}\geq\widehat{N}_{\ell}$, the bound (3.7) continues to hold for this choice of $N_{\ell}$.

Clearly, $N_{0}\geq N_{1}\geq\cdots\geq N_{L}$ as required. We now verify that $\log N_{0}\lesssim s$. Since $\varepsilon<2h_{0}^{\beta}$, we have $\widehat{N}_{0}>1$ from (3.13), and therefore

\[N_{0}<\widehat{N}_{0}+1<2\widehat{N}_{0}\leq2\bigg(2\,\varepsilon^{-1}h_{0}^{\beta}\,(L+1)\,2^{\frac{|d\mu-\beta|}{1+\mu}L}\bigg)^{\frac{1}{\mu}}\lesssim\varepsilon^{-(1+\frac{1}{\beta}+\frac{|d\mu-\beta|}{(1+\mu)\beta})\frac{1}{\mu}},\]

where we loosely overestimated the geometric series in (3.13) by taking $L+1$ times the largest possible term with absolute value in the exponent, and then used $L+1\leq2^{L}$ and $2^{L}\simeq\varepsilon^{-\frac{1}{\beta}}$ from (3.6). Thus $\log N_{0}\lesssim\log\varepsilon^{-1}$. On the other hand, we have $s^{-\kappa}\simeq h_{L}^{\beta}\leq\frac{\varepsilon}{2}$ and so $s\gtrsim\varepsilon^{-\frac{1}{\kappa}}$. Hence we have $\log N_{0}\lesssim s$ as required. We conclude that the results from the optimisation with respect to the simplified cost function (3.9) can be applied to the multilevel problem with cost given by Assumption (M3).

We now verify that the cost satisfies (3.4) by substituting $N_{0}\leq2\widehat{N}_{0}$, $N_{\ell}\leq\widehat{N}_{\ell}+1=\widehat{N}_{0}\,2^{-\frac{(d+\beta)\ell}{1+\mu}}+1$, (3.13), $h_{\ell}=h_{0}\,2^{-\ell}$, and $2^{L}\simeq\varepsilon^{-\frac{1}{\beta}}$ into (3.9), resulting in

\[\begin{aligned}\mathrm{cost}(I^{\mathrm{ML}}_{L}u)&\lesssim h_{0}^{-\frac{\rho\beta}{\kappa}}\varepsilon^{-\frac{\rho}{\kappa}}\widehat{N}_{0}+h_{0}^{-\frac{\beta}{\kappa}-d}\varepsilon^{-\frac{1}{\kappa}}\sum_{\ell=0}^{L}\Big(\widehat{N}_{0}\,2^{-\frac{(d+\beta)\ell}{1+\mu}}+1\Big)\,2^{d\ell}\\&=h_{0}^{-\frac{\rho\beta}{\kappa}}\varepsilon^{-\frac{\rho}{\kappa}}\Big(2\,\varepsilon^{-1}h_{0}^{\beta}\,E_{L}\Big)^{\frac{1}{\mu}}+h_{0}^{-\frac{\beta}{\kappa}-d}\varepsilon^{-\frac{1}{\kappa}}\Big(2\,\varepsilon^{-1}h_{0}^{\beta}\,E_{L}\Big)^{\frac{1}{\mu}}E_{L}+h_{0}^{-\frac{\beta}{\kappa}-d}\varepsilon^{-\frac{1}{\kappa}}\sum_{\ell=0}^{L}2^{d\ell}\\&\lesssim\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}\,E_{L}^{\frac{1}{\mu}}+\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}}\,E_{L}^{\frac{1}{\mu}+1}+\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}}\\&\simeq\begin{cases}\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}&\text{when }\frac{d}{\beta}<\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{\mu}}(\log\varepsilon^{-1})^{\frac{1}{\mu}+1}&\text{when }\frac{d}{\beta}=\frac{1}{\mu},\\\varepsilon^{-\frac{\rho}{\kappa}-\frac{1}{1+\mu}(\frac{d}{\beta}+1)}+\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}}&\text{when }\frac{d}{\beta}>\frac{1}{\mu},\end{cases}\end{aligned}\]

where the implied constant includes a factor $h_{0}^{-\max(\frac{\rho\beta}{\kappa},\frac{\beta}{\kappa}+d)}$, and we used

\[E_{L}\coloneqq\sum_{\ell=0}^{L}2^{\frac{(d\mu-\beta)\ell}{1+\mu}}\simeq\begin{cases}1&\text{when }d\mu<\beta\ \text{(i.e., }\tfrac{d}{\beta}<\tfrac{1}{\mu}\text{)},\\L\simeq\log\varepsilon^{-1}&\text{when }d\mu=\beta\ \text{(i.e., }\tfrac{d}{\beta}=\tfrac{1}{\mu}\text{)},\\2^{\frac{d\mu-\beta}{1+\mu}L}\simeq\varepsilon^{-\frac{d\mu-\beta}{\beta(1+\mu)}}&\text{when }d\mu>\beta\ \text{(i.e., }\tfrac{d}{\beta}>\tfrac{1}{\mu}\text{)}.\end{cases}\]

The final case in the cost (i.e., when $\frac{d}{\beta}>\frac{1}{\mu}$) is split into two cases based on which of the two terms dominates, resulting in the four cases in (3.4). ∎
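In practice, the parameter choices from the proof can be computed directly; the following sketch evaluates (3.6), (3.13) and (3.14) for hypothetical rates $\beta,\mu$ and mesh parameters.

```python
import math

# Sketch of the level choices in Theorem 1: L from (3.6), N_0 from (3.13)
# and N_ell from (3.14), with h_ell = h0 * 2**(-ell). The rates beta, mu
# and all inputs below are hypothetical placeholders.
def multilevel_parameters(eps, h0=0.5, beta=2.0, mu=0.5, d=2):
    L = math.ceil(math.log2(2 * h0 ** beta / eps) / beta)          # (3.6)
    E_L = sum(2 ** ((d * mu - beta) * l / (1 + mu)) for l in range(L + 1))
    N0 = (2 * h0 ** beta * E_L / eps) ** (1 / mu)                  # (3.13)
    N = [math.ceil(N0 * 2 ** (-(d + beta) * l / (1 + mu))) for l in range(L + 1)]
    return L, N                                                    # (3.14)

print(multilevel_parameters(1e-3))   # L = 5 and geometrically decaying N_ell
```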

The third case in (3.4) becomes obsolete when $\rho=1$, i.e., when we have product weights. In this scenario, to achieve an accuracy of $\mathcal{O}(\varepsilon)$, the cost for the multilevel approximation is $\mathcal{O}\big(\varepsilon^{-\frac{1}{\kappa}-\max(\frac{1}{\mu},\frac{d}{\beta})}\big)$, compared to $\mathcal{O}\big(\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}-\frac{d}{\beta}}\big)$ for the single-level approximation. This multilevel cost is near-optimal, since the cost of a single FE evaluation at the finest level is $\mathcal{O}(\varepsilon^{-\frac{1}{\kappa}-\frac{d}{\beta}})$ and the cost of interpolation at level $0$ at a single node is $\mathcal{O}(\varepsilon^{-\frac{1}{\kappa}-\frac{1}{\mu}})$.

We note that when $\rho>1$, we may encounter the scenario where the single-level and multilevel costs have the same order. This occurs when the setup cost $s^{\rho}N_{0}$ in Assumption (M3) dominates, either due to a large dimension $s$ or because the contribution to the cost from the FE solves is relatively small due to fast convergence of the FE error (i.e., $\beta$ is large). Such cases are exceptional, and it would be unnecessary to consider a multilevel approach in these situations.

4 Parametric regularity analysis

Analysis of the multilevel kernel approximation error for parametric PDEs requires bounds on the mixed derivatives with respect to both the physical variable $\boldsymbol{x}$ and the parametric variable $\boldsymbol{y}$ simultaneously. The proofs of all parametric regularity lemmas in this section are given in Appendix II.

The following lemma provides a bound on the Laplacian of the derivatives (with respect to the parametric variables) of the solution to (2.2). It shows that $u$ and its derivatives with respect to $\boldsymbol{y}$ possess sufficient spatial regularity to establish estimates for the FE error. Recall that the Stirling numbers are defined by (2.1).

Lemma 2.

Under Assumptions (A2), (A4), (A8) and (A6), for every $\boldsymbol{y}\in\Omega$, let $u(\cdot,\boldsymbol{y})\in V$ be the solution to the problem (2.2). Then for every $\boldsymbol{\nu}\in\mathcal{F}$ and all $\boldsymbol{y}\in\Omega$, we have that $\partial^{\boldsymbol{\nu}}u(\cdot,\boldsymbol{y})\in H^{2}(D)\cap V$ and

\[\|\Delta(\partial^{\boldsymbol{\nu}}u(\cdot,\boldsymbol{y}))\|_{L^{2}(D)}\lesssim\,\|f\|_{L^{2}(D)}\,(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(|\boldsymbol{m}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq1}S(\nu_{i},m_{i}),\tag{4.1}\]

where $\overline{\boldsymbol{b}}=(\overline{b}_{j})_{j\geq1}$ is defined in (2.3) and the implied constant is independent of $\boldsymbol{y}$.

The following lemma gives a bound on the derivatives with respect to the parametric variables of the FE error in $V$.

Lemma 3.

Under Assumptions (A2), (A4), (A8) and (A6), for every $\boldsymbol{y}\in\Omega$, let $u(\cdot,\boldsymbol{y})\in V$ be the solution to (2.2) and $u_{h}(\cdot,\boldsymbol{y})\in V_{h}$ be its piecewise linear FE approximation (2.7). Then for every $\boldsymbol{\nu}\in\mathcal{F}$, and sufficiently small $h>0$, we have

$$\|\partial^{\boldsymbol{\nu}}(u-u_h)(\cdot,\boldsymbol{y})\|_{V} \lesssim h\,\|f\|_{L^2(D)}\,(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\le\boldsymbol{\nu}}(|\boldsymbol{m}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\ge1}S(\nu_i,m_i), \tag{4.2}$$

where $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3) and the implied constant is independent of $h$ and $\boldsymbol{y}$.

We are interested in regularity estimates with respect to the $L^2$ norm. Thus, Lemma 4 presents a bound on the $L^2$ norm of the derivatives of the FE error, obtained via a duality argument.

Lemma 4.

Under Assumptions (A1), (A2), (A3) and (A6), for every $\boldsymbol{y}\in\Omega$, let $u(\cdot,\boldsymbol{y})\in V$ be the solution to (2.2) and let $u_h(\cdot,\boldsymbol{y})\in V_h$ be its piecewise linear FE approximation (2.7). Then for every $\boldsymbol{\nu}\in\mathcal{F}$, and sufficiently small $h>0$, we have

$$\|\partial^{\boldsymbol{\nu}}(u-u_h)(\cdot,\boldsymbol{y})\|_{L^2(D)} \lesssim h^2\,\|f\|_{L^2(D)}\,(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\le\boldsymbol{\nu}}(|\boldsymbol{m}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\ge1}S(\nu_i,m_i),$$

where $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3) and the implied constant is independent of $h$ and $\boldsymbol{y}$.

5 Multilevel error analysis

We are now ready to derive an estimate for the final component of the error in (3.1).

5.1 Estimating the multilevel FE error

Clearly, $\|g\|_{L^2(\Omega)} \equiv \|g\|_{L^2(\Omega_s)}$ for any $g$ that is a function solely of $\boldsymbol{y}_{1:s}$. Then, taking the $L^2(\Omega)$-norm and using the triangle inequality, we have from (2.16) the following pointwise bound for each $\boldsymbol{x}\in D$:

$$\|(I-I_\ell)(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{L^2(\Omega)} = \|(I-I_\ell)(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{L^2(\Omega_s)} \le \frac{\kappa}{[\varphi(N_\ell)]^{\frac{1}{4\lambda}}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{\frac{1}{2\lambda}}\|(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}, \tag{5.1}$$

where

$$\|(u_\ell-u_{\ell-1})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)} \le \|(u^s-u^s_{h_\ell})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)} + \|(u^s-u^s_{h_{\ell-1}})(\boldsymbol{x},\cdot)\|_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}. \tag{5.2}$$

It then follows that the sum over $\ell$ in (3.1) can be bounded by

$$\begin{aligned}
&\|(I-I_0)\,u_0\|_{L^2(\Omega_s\times D)} + \sum_{\ell=1}^{L}\|(I-I_\ell)(u_\ell-u_{\ell-1})\|_{L^2(\Omega_s\times D)}\\
&\le \frac{\|f\|_{V'}\,C_s(\lambda)}{[\varphi(N_0)]^{\frac{1}{4\lambda}}} + \sum_{\ell=1}^{L}\frac{\kappa}{[\varphi(N_\ell)]^{\frac{1}{4\lambda}}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{\frac{1}{2\lambda}}\\
&\qquad\times\bigg(\sqrt{\int_D \|(u^s-u^s_{h_\ell})(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}} + \sqrt{\int_D \|(u^s-u^s_{h_{\ell-1}})(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}}\,\bigg), \tag{5.3}
\end{aligned}$$

where we separate out the $\ell=0$ term because it is simply the standard single-level kernel interpolation error, for which the bound (2.17) applies. Combining (5.1) and (5.2) with the triangle inequality gives the last line.

Therefore, to bound the full multilevel kernel approximation error, it remains to estimate the individual finite element error terms in the expression above.

Theorem 5.

Under Assumptions (A1), (A2), (A3) and (A6), for $\alpha\ge1$ and weight parameters $(\gamma_{\mathfrak{u}})_{\mathfrak{u}\subset\mathbb{N}}$, let $u^s\in V$ be the solution to (2.2) and let $u^s_h\in V_h$ be the FE approximation (2.7) to $u^s$. Then the following estimate holds:

$$\sqrt{\int_D \|(u^s-u^s_h)(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}} \lesssim h^2\,\|f\|_{L^2(D)}\sqrt{\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^{\!2}},$$

where the implied constant is independent of $h$.

Proof.

From the definition of the $\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)$-norm given in (2.13) and the Cauchy–Schwarz inequality, it follows that for any $\boldsymbol{x}\in D$,

$$\begin{aligned}
&\|(u^s-u^s_h)(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\\
&= \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\bigg|\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\boldsymbol{x},\boldsymbol{y})\,\mathrm{d}\boldsymbol{y}_{-\mathfrak{u}}\bigg|^2\mathrm{d}\boldsymbol{y}_{\mathfrak{u}}\\
&\le \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg|\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\boldsymbol{x},\boldsymbol{y})\bigg|^2\,\mathrm{d}\boldsymbol{y}_{-\mathfrak{u}}\,\mathrm{d}\boldsymbol{y}_{\mathfrak{u}}.
\end{aligned}$$

Now, taking the $L^2$-norm with respect to $D$ and applying the Fubini–Tonelli theorem gives

$$\begin{aligned}
&\int_D \|(u^s-u^s_h)(\boldsymbol{x},\cdot)\|^2_{\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)}\,\mathrm{d}\boldsymbol{x}\\
&\le \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^s}\int_D\bigg(\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\boldsymbol{x},\boldsymbol{y})\bigg)^2\,\mathrm{d}\boldsymbol{x}\,\mathrm{d}\boldsymbol{y}\\
&= \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^s}\bigg\|\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^\alpha}{\partial y_j^\alpha}\bigg)(u^s-u^s_h)(\cdot,\boldsymbol{y})\bigg\|^2_{L^2(D)}\,\mathrm{d}\boldsymbol{y}\\
&\lesssim \sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{h^4\,\|f\|^2_{L^2(D)}}{\gamma_{\mathfrak{u}}}\int_{[0,1]^s}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^2\mathrm{d}\boldsymbol{y}\\
&= h^4\,\|f\|^2_{L^2(D)}\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^2.
\end{aligned}$$

In the step with $\lesssim$, we applied Lemma 4 for each $\mathfrak{u}$ with $\boldsymbol{\nu}$ given by $\nu_i=\alpha$ for $i\in\mathfrak{u}$ and $\nu_i=0$ for $i\notin\mathfrak{u}$; here $\boldsymbol{m}_{\mathfrak{u}}$ denotes the multi-index with $m_i=0$ for all $i\notin\mathfrak{u}$, and $\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\coloneqq\prod_{i\in\mathfrak{u}}\overline{b}_i^{\,m_i}$. The implied constant is independent of $h$. Taking the square root completes the proof. ∎

Applying Theorem 5 to (5.3), we derive the bound on the error of the multilevel kernel approximation of $u_L=u^s_{h_L}$, which is presented below. The constant $C_s(\lambda)$ in (2.17) is bounded from above by $\mathcal{D}_s(\lambda)$ defined in (5.4), since $|\boldsymbol{m}_{\mathfrak{u}}|!<(|\boldsymbol{m}_{\mathfrak{u}}|+5)!$ and $b_j\le\overline{b}_j$. We also use the fact that $\|f\|_{V'}\lesssim\|f\|_{L^2(D)}$ to include the first term in (5.3) in the summation as the $\ell=0$ term.

Theorem 6.

Under Assumptions (A1)–(A3) and (A6), for $\alpha\in\mathbb{N}$ and $\lambda\in(\frac{1}{2\alpha},1]$, the error of the multilevel kernel approximation of the dimension-truncated FE solution $u_L$ satisfies

$$\sum_{\ell=0}^{L}\|(I-I_\ell)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)} \lesssim \|f\|_{L^2(D)}\,\mathcal{D}_s(\lambda)\sum_{\ell=0}^{L}[\varphi(N_\ell)]^{-\frac{1}{4\lambda}}\,h_{\ell-1}^2,$$

with

$$[\mathcal{D}_s(\lambda)]^{2\lambda} \coloneqq \bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)\times\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)\bigg)^2\bigg)^{\lambda}, \tag{5.4}$$

where $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3) and $h_{-1}\coloneqq1$.

5.2 Choosing the weight parameters $\gamma_{\mathfrak{u}}$

We now choose the weights $\gamma_{\mathfrak{u}}$ to minimise $\mathcal{D}_s(\lambda)$, and select $\lambda$ to obtain the best possible convergence rate while ensuring that $\mathcal{D}_s(\lambda)$ is bounded independently of $s$.

Theorem 7.

Suppose that Assumptions (A1)–(A6) hold. The choice of weights

$$\gamma_{\mathfrak{u}} \coloneqq \Bigg(\frac{\sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\boldsymbol{m}_{\mathfrak{u}}|+5)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}_{\mathfrak{u}}}\prod_{i\in\mathfrak{u}}S(\alpha,m_i)}{\sqrt{|\mathfrak{u}|\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}}}\Bigg)^{\frac{2}{1+\lambda}} \tag{5.5}$$

for $\emptyset\neq\mathfrak{u}\subset\mathbb{N}$ with $|\mathfrak{u}|<\infty$, together with $\gamma_\emptyset\coloneqq1$, minimises $\mathcal{D}_s(\lambda)$ given in (5.4). Here $\overline{\boldsymbol{b}}=(\overline{b}_j)_{j\ge1}$ is defined in (2.3). In addition, if we take $\alpha\coloneqq\lfloor1/q+1/2\rfloor$ and $\lambda\coloneqq\frac{q}{2-q}$, then $\mathcal{D}_s(\lambda)$ can be bounded independently of the dimension $s$. Thus, the multilevel kernel approximation of the dimension-truncated FE solution of (2.2) satisfies

$$\sum_{\ell=0}^{L}\|(I-I_\ell)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)} \lesssim \|f\|_{L^2(D)}\sum_{\ell=0}^{L}[\varphi(N_\ell)]^{-(\frac{1}{2q}-\frac{1}{4})}\,h_{\ell-1}^2, \tag{5.6}$$

where we define $h_{-1}\coloneqq1$, and the implied constant is independent of $s$.

Proof.

The proof of this theorem follows closely that of [23, Theorem 4.5]. We seek to choose weights that bound the constant $\mathcal{D}_s(\lambda)$ independently of the dimension. Applying [33, Lemma 6.2] to (5.4), we find that the weights given by (5.5) minimise $[\mathcal{D}_s(\lambda)]^{2\lambda}$.

Substituting (5.5) into (5.4) and simplifying gives

$$[\mathcal{D}_s(\lambda)]^{\frac{2\lambda}{1+\lambda}} \le \sum_{\boldsymbol{m}\in\{0:\alpha\}^s}\bigg((|\boldsymbol{m}|+5)!\prod_{i=1}^{s}\beta_i^{m_i}\bigg)^{\frac{2\lambda}{1+\lambda}}, \tag{5.7}$$

where we use the bound $\max(|\mathfrak{u}|,1)\le[e^{1/e}]^{|\mathfrak{u}|}$ and define $\beta_i\coloneqq S_{\max}(\alpha)\,[2e^{1/e}\zeta(2\alpha\lambda)]^{\frac{1}{2\lambda}}\,\overline{b}_i$, with $S_{\max}(\alpha)\coloneqq\max_{1\le m\le\alpha}S(\alpha,m)$.

Defining a new sequence $d_j\coloneqq\beta_{\lceil j/\alpha\rceil}$ for $j\ge1$, we can write, for $\boldsymbol{m}\in\{0:\alpha\}^s$,

$$\prod_{i=1}^{s}\beta_i^{m_i} = \prod_{i\in\mathfrak{v}_{\boldsymbol{m}}} d_i,$$

where $\mathfrak{v}_{\boldsymbol{m}}\coloneqq\{1,2,\ldots,m_1,\ \alpha+1,\alpha+2,\ldots,\alpha+m_2,\ \ldots,\ (s-1)\alpha+1,\ldots,(s-1)\alpha+m_s\}$. The cardinality of $\mathfrak{v}_{\boldsymbol{m}}$ is $|\mathfrak{v}_{\boldsymbol{m}}|=\sum_{i=1}^{s}m_i=|\boldsymbol{m}|$. Then, following [23], the upper bound (5.7) can be further bounded above by

$$\sum_{\substack{\mathfrak{v}\subset\mathbb{N}\\ |\mathfrak{v}|<\infty}}\bigg((|\mathfrak{v}|+5)!\prod_{i\in\mathfrak{v}}d_i\bigg)^{\frac{2\lambda}{1+\lambda}} \le \sum_{\ell\ge0}[(\ell+5)!]^{\frac{2\lambda}{1+\lambda}}\,\frac{1}{\ell!}\bigg(\sum_{i\ge1}d_i^{\frac{2\lambda}{1+\lambda}}\bigg)^{\ell}. \tag{5.8}$$

Now, from Assumption (A6) we know that $\sum_{j\ge1}\overline{b}_j^{\,q}<\infty$, so we choose $\lambda$ such that $\frac{2\lambda}{1+\lambda}=q$, which gives $\lambda=\frac{q}{2-q}$. Then

$$\sum_{i\ge1}d_i^{\frac{2\lambda}{1+\lambda}} = \sum_{i\ge1}d_i^{\,q} = \alpha\sum_{i\ge1}\beta_i^{\,q} \le \alpha\,\max\big(1,\,S_{\max}(\alpha)\,[2e^{1/e}\zeta(2\alpha\lambda)]^{\frac{1}{2\lambda}}\big)^{q}\sum_{j\ge1}\overline{b}_j^{\,q} < \infty,$$

provided that $2\alpha\lambda>1$, which can be ensured by choosing $\alpha>1/q-1/2$. This condition is satisfied by taking $\alpha=\lfloor(\frac{1}{q}-\frac{1}{2})+1\rfloor=\lfloor\frac{1}{q}+\frac{1}{2}\rfloor$.

To show that the full series (5.8) converges, we apply the ratio test with the terms of the series given by $T_\ell\coloneqq[(\ell+5)!]^{q}\,\frac{1}{\ell!}\big(\sum_{i\ge1}d_i^{\,q}\big)^{\ell}$. This shows that $\mathcal{D}_s(\lambda)$ can be bounded from above independently of the dimension, since

$$\lim_{\ell\to\infty}\bigg|\frac{T_{\ell+1}}{T_\ell}\bigg| = \bigg(\sum_{i\ge1}d_i^{\,q}\bigg)\lim_{\ell\to\infty}\frac{(\ell+6)^{q}}{\ell+1} \le \bigg(\sum_{i\ge1}d_i^{\,q}\bigg)\lim_{\ell\to\infty}\frac{(\ell+1)^{q}+5^{q}}{\ell+1} = \bigg(\sum_{i\ge1}d_i^{\,q}\bigg)\lim_{\ell\to\infty}\bigg((\ell+1)^{q-1}+\frac{5^{q}}{\ell+1}\bigg) = 0,$$

where the inequality uses the fact that $(\sum_i a_i)^q\le\sum_i a_i^q$ for $q\in(0,1]$ and non-negative $a_i$.

We conclude that, for the choice of weights (5.5), $\mathcal{D}_s(\lambda)$ can be bounded independently of $s$. Equation (5.6) follows by applying this bound on $\mathcal{D}_s(\lambda)$ in Theorem 6. ∎

Remark 8.

The recent paper [41] demonstrates that it is possible to achieve double the convergence rate of the $L^2$ kernel approximation error for functions that are sufficiently smooth, by leveraging the orthogonal projection property of the kernel interpolant, using a method reminiscent of the Aubin–Nitsche trick. The theory from [41] can be directly applied to our theoretical results, i.e., one should expect a convergence rate of $\frac{1}{2\lambda}$ rather than the $\frac{1}{4\lambda}$ implied by (2.16), since the analytic dependence of the solution $u(\boldsymbol{x},\boldsymbol{y})$ of (2.2) on $\boldsymbol{y}$ implies that $u(\boldsymbol{x},\cdot)\in\mathcal{H}_{\alpha,\boldsymbol{\gamma}}(\Omega_s)$ for all $\alpha\ge1$. However, if we impose restrictions to ensure dimension independence of the constant $\mathcal{D}_s(\lambda)$, then we cannot simply double the convergence rate in (5.6). Instead, the convergence rate for the QMC error determined in Theorem 7 remains unmodified, while the choices of $\alpha$ and $\lambda$ change: we now choose $\alpha=\lfloor\frac{1}{2q}+\frac{3}{4}\rfloor$ and $\lambda=\frac{2q}{2-q}$, which means that $\alpha$ can be selected smaller than before whilst maintaining the same rate of convergence and dimension independence of the constant.

The main result for the total error of the multilevel kernel approximation method is presented below.

Theorem 9.

Under Assumptions (A1)–(A6), and with $\alpha$, $\lambda$ and the weights $\gamma_{\mathfrak{u}}$ chosen as in Theorem 7, the total error of the multilevel kernel approximation of $u$ satisfies

$$\|u-I_L^{\mathrm{ML}}u\|_{L^2(\Omega\times D)} \lesssim \|f\|_{L^2(D)}\bigg(s^{-(\frac{1}{p}-\frac{1}{2})} + h_L^2 + \sum_{\ell=0}^{L}[\varphi(N_\ell)]^{-(\frac{1}{2q}-\frac{1}{4})}\,h_{\ell-1}^2\bigg),$$

where $h_{-1}\coloneqq1$, and the implied constant is independent of the dimension $s$.

Proof.

The result follows by combining the dimension truncation and FE bounds given in (2.6) and (2.9), respectively, with the multilevel kernel approximation error bound (5.6), applied to the total error decomposition (3.1). ∎

Theorem 9 shows that the error assumptions of Theorem 1 hold with $\kappa=\frac{1}{p}-\frac{1}{2}$, $\mu=\frac{1}{2q}-\frac{1}{4}$ and $\beta=2$.

Remark 10.

A CBC construction with weights of the form (5.5) is too costly to implement. Following [23], it can be shown that Theorem 9 continues to hold for the SPOD weights

$$\gamma_{\mathfrak{u}} \coloneqq \sum_{\boldsymbol{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}[(|\boldsymbol{m}_{\mathfrak{u}}|+5)!]^{\frac{2}{1+\lambda}}\prod_{i\in\mathfrak{u}}\bigg(\frac{\overline{b}_i^{\,m_i}\,S(\alpha,m_i)}{\sqrt{2e^{1/e}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}.$$

However, these SPOD weights still incur a cost that is quadratic in the dimension when evaluating the kernel basis functions to construct the approximation, compared to the linear cost of product weights. Instead, the paper [25] proposes serendipitous product weights, whereby the order-dependent part of the weights is simply omitted, giving in our case

$$\gamma_{\mathfrak{u}} \coloneqq \prod_{i\in\mathfrak{u}}\Bigg(\sum_{m=1}^{\alpha}\frac{\overline{b}_i^{\,m}\,S(\alpha,m)}{\sqrt{2e^{1/e}\zeta(2\alpha\lambda)}}\Bigg)^{\frac{2}{1+\lambda}}. \tag{5.9}$$

For these weights, the upper bound on the kernel approximation error continues to enjoy the same theoretical convergence rates as in Theorem 9, although the implied constant is no longer independent of $s$.
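For concreteness, here is a minimal Python sketch of the product weights (5.9); the decay sequence `bbar` is a hypothetical stand-in for $(\overline{b}_j)$, and the Riemann zeta value comes from SciPy.

```python
import numpy as np
from scipy.special import zeta

def stirling2(n, k):
    # Stirling numbers of the second kind via the standard recurrence.
    S = np.zeros((n + 1, k + 1))
    S[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i, j] = j * S[i - 1, j] + S[i - 1, j - 1]
    return S[n, k]

def serendipitous_gamma_j(bbar, alpha, lam):
    # gamma_u = prod_{j in u} gamma_j, with gamma_j the bracket in (5.9).
    denom = np.sqrt(2.0 * np.e**(1.0 / np.e) * zeta(2.0 * alpha * lam))
    inner = np.array([sum(b**m * stirling2(alpha, m) for m in range(1, alpha + 1))
                      for b in bbar]) / denom
    return inner ** (2.0 / (1.0 + lam))

bbar = 0.5 * np.arange(1, 11, dtype=float) ** (-2.0)  # hypothetical decay sequence
print(serendipitous_gamma_j(bbar, alpha=1, lam=0.6))
```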

The paper [25] demonstrates that serendipitous product weights offer comparable performance to SPOD weights for easier problems, and superior performance on problems where SPOD weights may fail completely. A possible explanation is that the theoretically derived SPOD weights are of poor quality due to overestimates in the bounds. A second, more practical explanation is that the magnitude of the SPOD weights increases substantially as the dimension grows, which leads to larger, more peaked kernel basis functions; the resulting approximations also tend to be very peaked, and hence less “smooth”. It is worth noting that integration problems are more robust to overestimates that may degrade the quality of the weights, since the weights do not appear directly in the computation of the integral. This is not true for the kernel approximation problem, where the weights appear explicitly in the kernel basis functions.

6 Implementing the ML kernel approximation

In this section, we outline how to compute the multilevel approximation for the PDE problem efficiently, using a matrix-vector representation of (1.4). Throughout, we consider a fixed truncation dimension and so omit the superscript $s$.

First, we outline how to construct the single-level kernel interpolant for the PDE problem. As in [23], the single-level kernel interpolant applied to the FE approximation of the PDE solution is given by (2.14) with $g=u_h(\boldsymbol{x},\cdot)$, for $\boldsymbol{x}\in D$. Each kernel interpolant coefficient $a_{N,h,k}\in V_h$ is now a FE function, which we can expand in the FE basis $\{\phi_{h,i}\}_{i=1}^{M}$, with $M=\dim(V_h)$, to write

$$I_N u_h(\boldsymbol{x},\boldsymbol{y}) = \sum_{k=0}^{N-1}a_{N,h,k}(\boldsymbol{x})\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{y}) = \sum_{i=1}^{M}\sum_{k=0}^{N-1}a_{N,h,k,i}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{y})\,\phi_{h,i}(\boldsymbol{x}). \tag{6.1}$$

To enforce interpolation of the FE solution, we equate the coefficients of the FE basis functions $\phi_{h,i}$ in (2.8) and (6.1) at each lattice point $\boldsymbol{y}=\boldsymbol{t}_{k'}$, leading to the requirement

$$u_{h,i}(\boldsymbol{t}_{k'}) = \sum_{k=0}^{N-1}a_{N,h,k,i}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{t}_{k'}) \quad\text{for all } k'=0,1,\ldots,N-1,\; i=1,2,\ldots,M.$$

Thus, the coefficients $\boldsymbol{A}_{N,h}\coloneqq[a_{N,h,k,i}]_{0\le k\le N-1,\,1\le i\le M}\in\mathbb{R}^{N\times M}$ are the solution to the matrix equation

$$K_{N,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_{N,h} = \boldsymbol{U}_{N,h},$$

where $K_{N,\alpha,\boldsymbol{\gamma}}=[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{t}_{k'})]_{k,k'=0}^{N-1}$, and $\boldsymbol{U}_{N,h}=[u_{h,i}(\boldsymbol{t}_k)]_{0\le k\le N-1,\,1\le i\le M}$ is the $N\times M$ matrix of FE nodal values at the lattice points.
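As an illustration, the following Python sketch solves the single-level system via the FFT by exploiting the circulant structure of the kernel matrix on a lattice point set. The Korobov-type kernel with $\alpha=1$ and product weights, with building block $\eta(t)=2\pi^2 B_2(t)$, is an assumption here, and the generating vector `z`, the weights `gamma` and the nodal values `U` are toy placeholders.

```python
import numpy as np

def kernel_first_column(z, N, gamma):
    # Lattice points t_k = frac(k z / N); by shift invariance of the periodic
    # kernel, K[k, k'] = c[(k - k') mod N], so column 1 determines the matrix.
    t = np.mod(np.outer(np.arange(N), z) / N, 1.0)
    eta = 2 * np.pi**2 * (t**2 - t + 1.0 / 6.0)   # Bernoulli B_2 building block
    return np.prod(1.0 + gamma * eta, axis=1)     # c[k] = K(t_k, t_0)

def circulant_solve(c, U):
    # Solve K A = U column by column: diagonalise K with one FFT of c.
    return np.real(np.fft.ifft(np.fft.fft(U, axis=0) / np.fft.fft(c)[:, None],
                               axis=0))

N, z = 16, np.array([1, 7, 5])         # toy lattice rule, s = 3
gamma = np.array([0.9, 0.5, 0.1])      # toy product weights
U = np.random.rand(N, 4)               # stand-in for the FE nodal values U_{N,h}
A = circulant_solve(kernel_first_column(z, N, gamma), U)
```

Only the first column of the kernel matrix is ever formed, so the solve costs $\mathcal{O}(MN\log N)$ rather than the $\mathcal{O}(MN^2)$ of a dense solve.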

Next, we present a similar matrix-vector representation for the multilevel kernel approximation (1.4). Since we are using an embedded lattice rule, $N_\ell$ is a factor of $N_{\ell-1}$ and the point sets across the levels are embedded, with an ordering such that

$$\boldsymbol{t}_{\ell,k} = \boldsymbol{t}_{\ell-1,\,k(N_{\ell-1}/N_\ell)} \quad\text{for } \ell=1,2,\ldots,L. \tag{6.2}$$

It follows that $\boldsymbol{t}_{\ell,k}=\boldsymbol{t}_{0,\,k(N_0/N_\ell)}$, and the kernel matrices for the different levels are nested with the same structure.
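A quick sketch verifying the nesting property (6.2) for rank-1 lattice points $\boldsymbol{t}_k=\{k\boldsymbol{z}/N\}$; the generating vector is again a toy example.

```python
import numpy as np

z = np.array([1, 7, 5])              # toy generating vector
N = [64, 16, 4]                      # N_0 > N_1 > N_2, each dividing the previous
pts = [np.mod(np.outer(np.arange(n), z) / n, 1.0) for n in N]
for level in range(1, len(N)):
    stride = N[level - 1] // N[level]
    # (6.2): the level-l points are every stride-th point of level l-1
    assert np.allclose(pts[level], pts[level - 1][::stride])
print("nesting property (6.2) verified")
```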

Since the FE spaces are also nested, we can expand the difference on each level $\ell\ge1$ in the FE basis for $V_\ell=V_{h_\ell}$:

$$u_\ell(\cdot,\boldsymbol{y}) - u_{\ell-1}(\cdot,\boldsymbol{y}) = \sum_{i=1}^{M_\ell}\big[u_{\ell,i}(\boldsymbol{y})-\overline{u}_{\ell-1,i}(\boldsymbol{y})\big]\,\phi_{h_\ell,i}\in V_\ell, \tag{6.3}$$

where $M_\ell\coloneqq\dim(V_\ell)$ and $[\overline{u}_{\ell-1,i}(\boldsymbol{y})]_{i=1}^{M_\ell}$ are the FE coefficients of $u_{\ell-1}(\cdot,\boldsymbol{y})$ after (exact) interpolation onto $V_\ell$.

Similar to (6.1), the kernel approximation on level $\ell$ is

$$I_\ell(u_\ell-u_{\ell-1})(\boldsymbol{x},\boldsymbol{y}) = \sum_{i=1}^{M_\ell}\sum_{k=0}^{N_\ell-1}a_{\ell,k,i}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{\ell,k},\boldsymbol{y})\,\phi_{h_\ell,i}(\boldsymbol{x}), \tag{6.4}$$

and we now choose the coefficients $a_{\ell,k,i}=a_{N_\ell,h_\ell,k,i}$ such that (6.4) interpolates the difference (6.3) at each lattice point $\boldsymbol{y}=\boldsymbol{t}_{\ell,k}$. This leads to the $L+1$ matrix equations

$$K_{N_\ell,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_\ell = \begin{cases}\boldsymbol{U}_0 & \text{for } \ell=0,\\ \boldsymbol{U}_\ell - \overline{\boldsymbol{U}}_{\ell-1} & \text{for } \ell=1,\ldots,L,\end{cases} \tag{6.5}$$

where now each coefficient matrix is $\boldsymbol{A}_\ell\coloneqq\boldsymbol{A}_{N_\ell,h_\ell}=[a_{\ell,k,i}]\in\mathbb{R}^{N_\ell\times M_\ell}$, $\boldsymbol{U}_\ell\coloneqq\boldsymbol{U}_{N_\ell,h_\ell}=[u_{\ell,i}(\boldsymbol{t}_{\ell,k})]\in\mathbb{R}^{N_\ell\times M_\ell}$ is the matrix of FE coefficients at the lattice points for level $\ell$, and $\overline{\boldsymbol{U}}_{\ell-1}=[\overline{u}_{\ell-1,i}(\boldsymbol{t}_{\ell,k})]\in\mathbb{R}^{N_\ell\times M_\ell}$ is the matrix of FE coefficients of $u_{\ell-1}(\cdot,\boldsymbol{t}_{\ell,k})\in V_{\ell-1}$ interpolated onto $V_\ell$ and then evaluated at the lattice points for level $\ell$.

To obtain each coefficient matrix $\boldsymbol{A}_\ell$, we exploit the circulant structure of the kernel matrices to solve the linear systems (6.5) using the FFT. As a result, we only require the first column of each $K_{N_\ell,\alpha,\boldsymbol{\gamma}}$; furthermore, since the lattice points are nested as in (6.2), we obtain this column by taking every $(N_0/N_\ell)$th entry (starting from the first) of the first column of $K_{N_0,\alpha,\boldsymbol{\gamma}}$. Similarly, since the FE spaces are nested, we obtain $\overline{\boldsymbol{U}}_{\ell-1}$ for $\ell\ge1$ from the matrix $\boldsymbol{U}_{\ell-1}$ of the previous level. Specifically, we take every $(N_{\ell-1}/N_\ell)$th row of $\boldsymbol{U}_{\ell-1}$ and interpolate it onto $V_\ell$ to give $\overline{\boldsymbol{U}}_{\ell-1}$. This procedure is given in Algorithm 1.

Algorithm 1 Constructing the ML approximation
1: Compute the 1st column of $K_{N_0,\alpha,\boldsymbol{\gamma}}$ ▷ kernel matrix for the entire lattice point set
2: Compute $\boldsymbol{U}_0$ ▷ FE solutions in $V_0$ at all lattice points
3: Solve $K_{N_0,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_0=\boldsymbol{U}_0$ using FFT ▷ solve for coefficients
4: for $\ell=1,\ldots,L$ do
5:  Compute the 1st column of $K_{N_\ell,\alpha,\boldsymbol{\gamma}}$ ▷ every $(N_0/N_\ell)$th entry of column 1 of $K_{N_0,\alpha,\boldsymbol{\gamma}}$
6:  Compute $\boldsymbol{U}_\ell$ ▷ FE solutions in $V_\ell$ at the $N_\ell$ lattice points
7:  Interpolate every $(N_{\ell-1}/N_\ell)$th row of $\boldsymbol{U}_{\ell-1}$ onto $V_\ell$ to obtain $\overline{\boldsymbol{U}}_{\ell-1}$
8:  Solve $K_{N_\ell,\alpha,\boldsymbol{\gamma}}\,\boldsymbol{A}_\ell=\boldsymbol{U}_\ell-\overline{\boldsymbol{U}}_{\ell-1}$ using FFT ▷ solve for coefficients
9: end for
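The following Python sketch mirrors Algorithm 1, reusing `kernel_first_column` and `circulant_solve` from the single-level sketch above; `fe_solve` and `prolong` are hypothetical helpers for the FE solve on $V_\ell$ and for the (exact) interpolation of FE coefficient rows from $V_{\ell-1}$ to $V_\ell$.

```python
import numpy as np

def build_ml_coefficients(z, Ns, gamma, fe_solve, prolong):
    # Ns = [N_0, ..., N_L], decreasing, each dividing N_0.
    c0 = kernel_first_column(z, Ns[0], gamma)                  # step 1
    t0 = np.mod(np.outer(np.arange(Ns[0]), z) / Ns[0], 1.0)
    A, U_prev = [], None
    for level in range(len(Ns)):
        stride = Ns[0] // Ns[level]
        c = c0[::stride]                                       # step 5: subsample column 1
        U = np.array([fe_solve(level, t) for t in t0[::stride]])  # steps 2 and 6
        if level == 0:
            rhs = U
        else:
            sub = Ns[level - 1] // Ns[level]
            rhs = U - prolong(level, U_prev[::sub])            # step 7: Ubar_{level-1}
        A.append(circulant_solve(c, rhs))                      # steps 3 and 8: FFT solve
        U_prev = U
    return A                                                   # coefficient matrices A_0, ..., A_L
```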

On completion of Algorithm 1, we have a collection of coefficient matrices $\{\boldsymbol{A}_\ell\}_{\ell=0}^{L}$ that we can use to compute $I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*)$, via (1.4) and (6.4), approximating the PDE solution at any $(\boldsymbol{x}^*,\boldsymbol{y}^*)\in D\times\Omega_s$. By using the embedded property of the lattice points and the local support of the FE basis, we can evaluate $I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*)$ efficiently at cost $\mathcal{O}(N_0)$, as we explain below.

We can write (6.4) as

$$\begin{aligned}
I_\ell(u_\ell-u_{\ell-1})(\boldsymbol{x}^*,\boldsymbol{y}^*) &= \sum_{k=0}^{N_\ell-1}\Bigg(\underbrace{\sum_{\substack{i=1\\ \boldsymbol{x}^*\in\mathrm{supp}(\phi_{h_\ell,i})}}^{M_\ell}a_{\ell,k,i}\,\phi_{h_\ell,i}(\boldsymbol{x}^*)}_{[\boldsymbol{a}_\ell(\boldsymbol{x}^*)]_k}\Bigg)\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{\ell,k},\boldsymbol{y}^*)\\
&= \sum_{k'=0}^{N_0-1}\mathbb{I}\big(k'N_\ell\equiv 0\ (\mathrm{mod}\ N_0)\big)\,[\boldsymbol{a}_\ell(\boldsymbol{x}^*)]_{\lfloor k'N_\ell/N_0\rfloor}\,\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_{0,k'},\boldsymbol{y}^*).
\end{aligned}$$

Effectively, for each $\ell$, we compute the sum over $i$ in (6.4) by “interpolating in space” along the columns of $\boldsymbol{A}_\ell$ to obtain a kernel coefficient vector at $\boldsymbol{x}^*$, denoted $\boldsymbol{a}_\ell(\boldsymbol{x}^*)\in\mathbb{R}^{N_\ell}$. Since the FE basis functions are locally supported, only a small number of indices $i$ in (6.4) yield a nonzero $\phi_{h_\ell,i}(\boldsymbol{x}^*)$. Multiplying the corresponding columns of $\boldsymbol{A}_\ell$ by $\phi_{h_\ell,i}(\boldsymbol{x}^*)$ and summing them then gives $\boldsymbol{a}_\ell(\boldsymbol{x}^*)$ at a cost of $\mathcal{O}(N_\ell)$. In the second equality above, we used the embedded property (6.2) of the lattice points to rewrite this as a sum over the entire lattice point set; the indicator function $\mathbb{I}$ accounts for the extra terms that were added.

Summing over $\ell=0,1,\ldots,L$ gives a single kernel coefficient vector $\boldsymbol{a}(\boldsymbol{x}^*)\in\mathbb{R}^{N_0}$,

$$[\boldsymbol{a}(\boldsymbol{x}^*)]_{k'} = [\boldsymbol{a}_0(\boldsymbol{x}^*)]_{k'} + \sum_{\ell=1}^{L}\mathbb{I}\big(k'N_\ell\equiv 0\ (\mathrm{mod}\ N_0)\big)\,[\boldsymbol{a}_\ell(\boldsymbol{x}^*)]_{\lfloor k'N_\ell/N_0\rfloor},$$

at a cost of $\mathcal{O}(N_0)$. Since the points are embedded, $\sum_{\ell=0}^{L}N_\ell=\mathcal{O}(N_0)$, and hence the total cost to construct the kernel coefficient vector $\boldsymbol{a}(\boldsymbol{x}^*)$ is $\mathcal{O}(N_0)$.

Similarly, we store all of the kernel evaluations at $\boldsymbol{y}^*$, anchored at each of the lattice points, in a single vector $\boldsymbol{\kappa}(\boldsymbol{y}^*)\coloneqq[\mathcal{K}_{\alpha,\boldsymbol{\gamma}}(\boldsymbol{t}_k,\boldsymbol{y}^*)]_{k=0}^{N_0-1}$, which costs $\mathcal{O}(s^\rho\alpha^\varsigma N_0)$, where $\rho,\varsigma>0$ depend on the structure of the kernel and are specified in Section 2.6.

Finally, the ML kernel approximation of $u(\boldsymbol{x}^*,\boldsymbol{y}^*)$ is computed via the product

$$u(\boldsymbol{x}^*,\boldsymbol{y}^*) \approx I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*) = \boldsymbol{a}(\boldsymbol{x}^*)^{\top}\boldsymbol{\kappa}(\boldsymbol{y}^*).$$

Combining the costs above leads to a total cost of $\mathcal{O}(s^\rho\alpha^\varsigma N_0)$ for evaluating $I_L^{\mathrm{ML}}u(\boldsymbol{x}^*,\boldsymbol{y}^*)$.
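A sketch of the evaluation step, again under the $\alpha=1$ product-weight kernel assumption from the earlier sketches; `phi_eval(level, x_star)` is a hypothetical helper returning the vector of FE basis values $\phi_{h_\ell,i}(\boldsymbol{x}^*)$ (sparse in practice, so each product touches only a few columns of $\boldsymbol{A}_\ell$).

```python
import numpy as np

def evaluate_ml(A, Ns, z, gamma, x_star, y_star, phi_eval):
    # Assemble a(x*) by scattering each level onto the full point set ...
    N0 = Ns[0]
    a = np.zeros(N0)
    for level, A_l in enumerate(A):
        a_l = A_l @ phi_eval(level, x_star)   # "interpolate in space", length N_level
        a[::N0 // Ns[level]] += a_l           # nonzero only where k' N_level = 0 mod N0
    # ... then form kappa(y*) for the assumed alpha = 1 product-weight kernel.
    t0 = np.mod(np.outer(np.arange(N0), z) / N0, 1.0)
    d = np.mod(t0 - y_star, 1.0)
    kappa = np.prod(1.0 + gamma * (2 * np.pi**2 * (d**2 - d + 1.0 / 6.0)), axis=1)
    return a @ kappa                          # I_L^ML u(x*, y*) at O(N0) cost
```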

7 Numerical experiments

In this section, we present the results of numerical experiments conducted using the high-performance computing cluster Katana [38].

7.1 Problem specification

The parametric PDE.

We consider the spatial domain $D=(0,1)^2$ with source term $f(\boldsymbol{x})=x_2$. For the parametric coefficient defined in (1.2), we choose $\psi_0\equiv1$ and

$$\psi_j(\boldsymbol{x}) = \frac{C}{\sqrt{6}}\,j^{-\vartheta}\sin(j\pi x_1)\sin(j\pi x_2) \qquad\text{for } j\ge1,$$

where $C>0$ is a scaling parameter used to vary the magnitude of the random coefficient, and $\vartheta>1$ is a decay parameter determining how quickly the importance of the successive random parameters decreases. The factor $\frac{1}{\sqrt{6}}$ is included for easier comparison with the experiments in [23] and [25]. For this choice, the bounds on the coefficient assumed in (A1) are given by $a_{\min}\coloneqq1-\frac{C}{\sqrt{6}}\zeta(\vartheta)$ and $a_{\max}\coloneqq1+\frac{C}{\sqrt{6}}\zeta(\vartheta)$. Thus, from (2.3) we have

$$b_j = \frac{C\,j^{-\vartheta}}{a_{\min}\sqrt{6}} \qquad\text{and}\qquad \overline{b}_j = \frac{C\pi\,j^{1-\vartheta}}{a_{\min}\sqrt{6}} \qquad\text{for } j\ge1.$$

Assumptions (A2) and (A6) hold provided that $1>p>\frac{1}{\vartheta}$ and $1>q>\frac{1}{\vartheta-1}$. Having chosen $p>\frac{1}{\vartheta}$, we can set $q=\frac{p}{1-p}$.

We present numerical results for two choices of parameters, $C=1.5$, $\vartheta=3.6$ and $C=0.2$, $\vartheta=1.2$, the first being easier than the second, and consider the truncated parametric domain $[0,1]^{64}$, i.e., $s=64$.
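The following small sketch tabulates $b_j$ and $\overline{b}_j$ from (2.3) for the two parameter sets, which makes visible why the second set is harder: $\overline{b}_j\sim j^{1-\vartheta}$ decays far more slowly for $\vartheta=1.2$.

```python
import numpy as np
from scipy.special import zeta

def coefficient_sequences(C, theta, jmax=64):
    j = np.arange(1, jmax + 1, dtype=float)
    a_min = 1.0 - C / np.sqrt(6.0) * zeta(theta)      # lower coefficient bound
    b = C * j**(-theta) / (a_min * np.sqrt(6.0))
    bbar = C * np.pi * j**(1.0 - theta) / (a_min * np.sqrt(6.0))
    return b, bbar

for C, theta in [(1.5, 3.6), (0.2, 1.2)]:
    b, bbar = coefficient_sequences(C, theta)
    print(f"C={C}, theta={theta}: bbar_1={bbar[0]:.3f}, bbar_64={bbar[-1]:.2e}")
```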

Smoothness, weights, and CBC construction.

The smoothness parameter of the kernel is fixed at $\alpha=1$. We refrain from using higher values of $\alpha$ to avoid stability issues stemming from inaccurate eigenvalue computation via the FFT in double precision. Such issues arise even for small values of $\alpha$ and relatively small $N$, necessitating the use of arbitrary-precision computation (see, e.g., [31]).

Following Remark 10, we use the serendipitous weights (5.9) in our experiments. We take $\alpha=1$ and $\lambda=\frac{1}{2\alpha}+0.1=0.6$ in (5.9), and use the CBC algorithm described in [30] to construct one embedded lattice rule for each parameter set, designed for the range $2^7\le N\le2^{17}$ and up to dimension $s=64$.

Convergence and cost.

According to (2.17) and Theorem 6, and ignoring the dependence of the constants on $s$ (which is fixed here), we theoretically have $\mathcal{O}(N_\ell^{-\mu})$ convergence in Assumption (M2) of Theorem 1 with $\mu=1/(4\lambda)\approx0.417$. As explained in Remark 8, since the PDE solution has a smoothness order higher than $\alpha=1$, the theoretical convergence rate doubles to $\mu=1/(2\lambda)\approx0.833$.

Recall that the computational cost to achieve an error $\varepsilon>0$ is given by (2.19) for the single-level approximation and by (3.4) for the multilevel approximation. Our spatial domain is the unit square, so $d=2$. The FE convergence rate in Theorem 1 is determined by Theorem 6 to be $\beta=2$, which also applies to the single-level FE error as indicated by (2.9). Since the serendipitous weights (5.9) are product weights, we have $\rho=1$. With a fixed dimension $s=64$, we may informally set $\kappa=\infty$ and $\tau\approx d$ in Theorem 1. The third case in (3.4) is not relevant when using product weights. Hence, we obtain the theoretical cost bounds

$$\begin{cases}\mathrm{cost}(I_N u_h^s) \lesssim \varepsilon^{-\frac{1}{\mu}-1} = \varepsilon^{-2\lambda-1} = \varepsilon^{-2.2},\\[2pt] \mathrm{cost}(I_L^{\mathrm{ML}}u) \lesssim \varepsilon^{-\max(\frac{1}{\mu},1)} = \varepsilon^{-\max(2\lambda,1)} = \varepsilon^{-1.2}.\end{cases} \tag{7.1}$$

7.2 Diagnostic plots

FE.

Figure 1 plots the computational cost, measured by CPU time in seconds, of assembling and solving the FE linear system required to obtain the PDE solution, together with the FE error
$$\|u^s_{h^*}-u^s_h\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{N}\sum_{k=0}^{N-1}\|u^s_{h^*}(\cdot,\boldsymbol{t}_k)-u^s_h(\cdot,\boldsymbol{t}_k)\|^2_{L^2(D)}}$$
for mesh widths $h\in\{2^{-3},\ldots,2^{-8}\}$, with respect to the reference solution with mesh width $h^*=2^{-9}$ and $s=64$. We use $N=2^{16}$ lattice points for the integral over $\Omega$, and the $L^2(D)$ norm is computed exactly from the FE coefficients. We observe an error convergence rate of $\mathcal{O}(h^2)$ and a cost of $\mathcal{O}(h^{-2})$, demonstrating that indeed $\beta=2$ and $\tau\approx d=2$ in Theorem 1.

Figure 1: Cost of the FE solve (left) and the FE error $\|u^s_{h^*}-u^s_h\|_{L^2(\Omega\times D)}$ (right), demonstrating $\beta=2$ and $\tau\approx d=2$ in Theorem 1.
Figure 2: Comparing dimension truncation error with FE error.

Dimension truncation.

For $s\in\{2^3,\ldots,2^7\}$ and reference values $s^*=2^8$ and $h=2^{-9}$, we compute the error $\|u^{s^*}_h-u^s_h\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{N}\sum_{k=0}^{N-1}\|u^{s^*}_h(\cdot,\boldsymbol{t}_k)-u^s_h(\cdot,\boldsymbol{t}_k)\|^2_{L^2(D)}}$ using $N=2^{16}$ lattice points. In Figure 2 we overlay the dimension truncation errors with the FE errors for $h\in\{2^{-3},\ldots,2^{-8}\}$ on the horizontal axis, with the vertical axis on the left in terms of $s$ (purple circles) and on the right in terms of $h$ (blue triangles). (The relative slope between the overlaid plots is arbitrary.) We observe $\mathcal{O}(s^{-\kappa})$ convergence (cf. the dimension truncation assumption of Theorem 1) with $\kappa=4.06$ for the easier problem $C=1.5$, $\vartheta=3.6$, and $\kappa=1.72$ for the harder problem $C=0.2$, $\vartheta=1.2$. For both problems with $s=2^6=64$, the dimension truncation errors are well below $10^{-6}$, i.e., smaller than all the FE errors for $h\in\{2^{-3},\ldots,2^{-8}\}$ in Figure 1.

Single-level kernel interpolation.

Following [23], we use an efficient formula to estimate the single-level kernel interpolation error

$$\|(I-I_N)u^s_{h^*}\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{RN}\sum_{r=1}^{R}\sum_{k=0}^{N-1}\|u^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k) - I_N u^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k)\|^2_{L^2(D)}},$$

where $\{\boldsymbol{y}_r\}_{r=1}^{R}$ is a sequence of Sobol points and $\{\boldsymbol{t}_k\}_{k=0}^{N-1}$ is a sequence of lattice points, with $h^*=2^{-9}$, $s=64$, $R=10$, and $N\in\{2^4,\ldots,2^{16}\}$. In Figure 3 we overlay the interpolation errors with the FE errors for $h\in\{2^{-3},\ldots,2^{-8}\}$ on the horizontal axis, with the vertical axis on the left in terms of $h$ (blue triangles) and on the right in terms of $N$ (red circles). We observe $\mathcal{O}(N^{-1.5})$ convergence for the easier problem $C=1.5$, $\vartheta=3.6$, and $\mathcal{O}(N^{-0.99})$ for the harder problem $C=0.2$, $\vartheta=1.2$.
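A sketch of this estimator; `ref_solve`, `interp_eval` and `l2_norm_sq` are hypothetical callables for evaluating the reference FE solution, evaluating the interpolant, and computing the exact squared $L^2(D)$ norm from FE coefficients (e.g., via the FE mass matrix).

```python
import numpy as np

def interpolation_error(z, N, shifts, ref_solve, interp_eval, l2_norm_sq):
    # Shifted-lattice estimator: average the squared L2(D) error of the
    # interpolant over R shifted copies of the N lattice points.
    t = np.mod(np.outer(np.arange(N), z) / N, 1.0)
    total = 0.0
    for shift in shifts:                  # R Sobol points, each in [0,1]^s
        for tk in t:
            y = np.mod(shift + tk, 1.0)   # periodic shift of the lattice
            total += l2_norm_sq(ref_solve(y) - interp_eval(y))
    return np.sqrt(total / (len(shifts) * N))
```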

Figure 3: Matching single-level kernel interpolation error and FE error.
ML level $\ell$: | 0 | 1 | 2 | 3 | 4 | 5
SL $h$ / ML $h_\ell$: | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$
$C=1.5,\ \vartheta=3.6$ (SL $N$ / ML $N_{L-\ell}$): | $2^{6}$ | $2^{7}$ | $2^{9}$ | $2^{10}$ | $2^{11}$ | $2^{13}$
$C=0.2,\ \vartheta=1.2$ (SL $N$ / ML $N_{L-\ell}$): | $2^{6}$ | $2^{8}$ | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$
Table 1: Combinations of FE mesh widths and number of lattice points used for the single-level and multilevel approximations.

We use Figure 3 to decide how to match the FE mesh width to the number of lattice points so as to give comparable errors. For each $h$ from the FE error data points in Figure 3, we trace upward at the same error level to the nearest interpolation error data point, in order to find a matching $N$ with a similar error. (Note that the relative slope between the overlaid plots is arbitrary and does not affect the outcome.) This yields the pairings $(h,N)$ for the single-level kernel interpolant for the two problems in Table 1. We also use Table 1 to form the pairings $(h_\ell,N_\ell)$ for our multilevel kernel approximation below. Notice the reverse labeling $N_{L-\ell}$ in Table 1, reflecting the fact that the values of $N_\ell$ decrease with increasing $\ell$. For example, when $L=2$, we have $h_0=2^{-3}$, $h_1=2^{-4}$, $h_2=2^{-5}$, and according to Table 1 for the case $C=1.5$, $\vartheta=3.6$, we take $N_0=2^9$, $N_1=2^7$, $N_2=2^6$.

Figure 4: Estimates of $\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)}$ with varying $N$ and $\ell$.

Interpolation error of the FE difference.

Figure 4 plots the estimates of

$$\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{RN}\sum_{r=1}^{R}\sum_{k=0}^{N-1}\|(u^s_{h_\ell}-u^s_{h_{\ell-1}})(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k) - I_N(u^s_{h_\ell}-u^s_{h_{\ell-1}})(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k)\|^2_{L^2(D)}},$$

for both parameter sets, with $s=64$, $R=10$, $h_\ell=2^{-\ell-3}$, and varying $N$ and $\ell$.

The top two plots in Figure 4 show that $\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)}$ decreases with increasing $N$ for each $\ell\in\{0,\ldots,4\}$. The case $\ell=0$ is exactly the single-level interpolation error $\|(I-I_N)u^s_{h_0}\|_{L^2(\Omega\times D)}$, and we observe a faster rate of convergence compared with the interpolation errors of the FE differences for $\ell\in\{1,\ldots,4\}$. For the easier problem $C=1.5$, $\vartheta=3.6$ on the left, we observe $\mathcal{O}(N^{-1.51})$ for $\ell=0$ and $\mathcal{O}(N^{-1.19})$ on average for the other $\ell$. For the harder problem $C=0.2$, $\vartheta=1.2$ on the right, we observe $\mathcal{O}(N^{-1.00})$ for $\ell=0$ and $\mathcal{O}(N^{-0.71})$ on average for the other $\ell$. (The rates for $\ell=0$ differ from those observed in Figure 3 because here $h_0=2^{-3}$, while $h^*=2^{-9}$ in Figure 3.) These observed rates correspond to the value of $\mu$ in Theorem 1.

The bottom plots in Figure 4 show that $\|(I-I_N)(u_\ell-u_{\ell-1})\|_{L^2(\Omega\times D)}$ decreases with increasing level $\ell$ for a range of values of $N$. (Larger values of $N$ are used for the harder problem $C=0.2$, $\vartheta=1.2$ on the right.) This illustrates how the FE error changes on each level. We observe an error reduction of roughly $\mathcal{O}(h_\ell^2)$, thus verifying the factor $h_{\ell-1}^2$ in Theorem 6, i.e., $\beta=2$.

7.3 Multilevel results

Figure 5 plots the computational cost, measured by CPU time in seconds, of the single-level and multilevel approximations against their respective errors for the two parameter sets. The errors are estimated by

$$\|(I-A)u^s_{h^*}\|_{L^2(\Omega\times D)} \approx \sqrt{\frac{1}{RN^*}\sum_{r=1}^{R}\sum_{k=0}^{N^*-1}\|u^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k) - Au^s_{h^*}(\cdot,\boldsymbol{y}_r+\boldsymbol{t}_k)\|^2_{L^2(D)}},$$

where $Au^s_{h^*}$ denotes either a single-level or a multilevel approximation of $u^s_{h^*}$, with $s=64$, $R=10$, $h^*=2^{-9}$, and $N^*$ the maximum number of lattice points used to construct the approximation (i.e., $N^*=N_0$ for a multilevel approximation).

Figure 5: Computation time for the single-level and multilevel approximations against their errors.

Reading from bottom right to top left within each plot in Figure 5, each data point of the single-level result corresponds to a decreasing FE mesh width $h\in\{2^{-3},\ldots,2^{-8}\}$ with the corresponding increasing number of lattice points $N$ from Table 1, while each data point of the multilevel result corresponds to an increasing total number of levels $L\in\{0,\ldots,5\}$ with the corresponding choices $\{(h_\ell,N_\ell):\ell=0,\ldots,L\}$ taken from Table 1. The $L=0$ data point for the multilevel error coincides with the single-level error, as expected.

Recall from (7.1) that the theoretical costs for the single-level and multilevel approximations to achieve an error $\varepsilon>0$ are $\mathcal{O}(\varepsilon^{-2.2})$ and $\mathcal{O}(\varepsilon^{-1.2})$, respectively. The triangles in the plots indicate gradients 1 and 2 as labeled. The observed costs shown in the legends of Figure 5 are mostly lower (better) than the theoretical costs. It may be that the theoretical convergence rate $\mu$ used in estimating the cost in (7.1) is overly pessimistic; we can instead estimate the cost using the observed rates for $\mu$ from Figure 4.

For the easier problem $C=1.5$, $\vartheta=3.6$ on the left of Figure 4, we observe $\mu=1.51$ and $\mu=1.19$ (on average) for the single-level and multilevel errors, respectively, giving expected costs of $\mathcal{O}(\varepsilon^{-1.66})$ and $\mathcal{O}(\varepsilon^{-1})$, respectively. These are closer to the corresponding observed costs of $\mathcal{O}(\varepsilon^{-1.55})$ and $\mathcal{O}(\varepsilon^{-1.06})$ on the left of Figure 5.

For the harder problem $C=0.2$, $\vartheta=1.2$ on the right of Figure 4, we observe $\mu=1.00$ and $\mu=0.71$ (on average) for the single-level and multilevel errors, respectively, giving expected costs of $\mathcal{O}(\varepsilon^{-2})$ and $\mathcal{O}(\varepsilon^{-1.41})$, respectively. These again are closer to the corresponding observed costs of $\mathcal{O}(\varepsilon^{-1.85})$ and $\mathcal{O}(\varepsilon^{-1.27})$ on the right of Figure 5.

These observations have been replicated for several other parameter sets, which are omitted for brevity.

8 Conclusion

This paper introduces a multilevel kernel method for approximating solutions to PDEs with periodic coefficients over the parametric domain. A theoretical framework is developed with full detail, including L2L^{2} error estimates for the multilevel method and a comprehensive cost comparison between the single-level and multilevel approaches. The construction of the multilevel approximation is also outlined and is supported by numerical experiments demonstrating the advantages of the multilevel approximation for different choices of parameters with varying levels of difficulty.

When applying multilevel methods to integration, randomisation can be used to compute an unbiased mean-square error estimate, and the number of levels and/or samples per level can be adjusted adaptively on the fly. In contrast, a practical challenge in applying the multilevel kernel approximation is that there is no efficient adaptive multilevel approximation strategy: the error cannot be evaluated effectively during the computation, since any estimate must be made with respect to some reference solution. This requires either solving the PDE again at several sample points or constructing an expensive single-level proxy of the true solution, defeating the purpose of the multilevel approximation. We leave investigations into adaptive multilevel kernel approximation for future research.

References

  • [1] G. Byrenheid, L. Kämmerer, T. Ullrich and T. Volkmer, Tight error bounds for rank-1 lattice sampling in spaces of hybrid mixed smoothness, Numer. Math. 136, 993–1034 (2017).
  • [2] J. Charrier, R. Scheichl and A. L. Teckentrup, Finite element error analysis of elliptic PDEs with random coefficients and its application to multilevel Monte Carlo methods, SIAM J. Numer. Anal. 51, 332–352 (2013).
  • [3] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, SIAM, Philadelphia, PA, USA (2002).
  • [4] K. A. Cliffe, I. G. Graham, R. Scheichl and L. Stals, Parallel computation of flow in heterogeneous media using mixed finite elements, J. Comput. Phys. 164, 258–282 (2000).
  • [5] K. A. Cliffe, M. B. Giles, R. Scheichl and A. L. Teckentrup, Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients, Comput. Vis. Sci. 14, 3–15 (2011).
  • [6] A. Cohen, R. DeVore and Ch. Schwab, Convergence rates of best N-term Galerkin approximations for a class of elliptic sPDEs, Found. Comput. Math. 10, 615–646 (2010).
  • [7] R. Cools, F. Y. Kuo, D. Nuyens and I. H. Sloan, Lattice algorithms for multivariate approximation in periodic spaces with general weight parameters, Contemp. Math. 754, 93–113 (2020).
  • [8] R. Cools, F. Y. Kuo, D. Nuyens and I. H. Sloan, Fast component-by-component construction of lattice algorithms for multivariate approximation with POD and SPOD weights, Math. Comp. 90, 787–812 (2021).
  • [9] M. Croci and P. E. Farrell, Complexity bounds on supermesh construction for quasi-uniform meshes, J. Comput. Phys. 414, 1–7 (2020).
  • [10] M. Croci, M. B. Giles, M. E. Rognes and P. E. Farrell, Efficient white noise sampling and coupling for multilevel Monte Carlo with nonnested meshes, SIAM/ASA J. Uncertain. Quantif. 6, 1630–1655 (2018).
  • [11] N. Dyn, Interpolation and approximation by radial and related functions, in: Approximation Theory VI (C. Chui, L. Schumaker and J. Ward, eds), Academic Press, New York, 211–223 (1989).
  • [12] P. E. Farrell, M. D. Piggott, C. C. Pain, G. J. Gorman and C. R. Wilson, Conservative interpolation between unstructured meshes via supermesh construction, Comput. Methods Appl. Mech. Engrg. 198, 2632–2642 (2009).
  • [13] A. D. Gilbert and R. Scheichl, Multilevel quasi-Monte Carlo for random elliptic eigenvalue problems I: regularity and error analysis, IMA J. Numer. Anal. 44, 466–503 (2024).
  • [14] A. D. Gilbert, F. Y. Kuo and A. Srikumar, Density estimation for elliptic PDE with random input by preintegration and quasi-Monte Carlo methods, to appear in SIAM J. Numer. Anal., arXiv:2402.11807 (2025).
  • [15] M. B. Giles, Improved multilevel Monte Carlo convergence using the Milstein scheme, in: Monte Carlo and Quasi-Monte Carlo Methods 2006 (A. Keller, S. Heinrich and H. Niederreiter, eds), Springer-Verlag, 343–358 (2007).
  • [16] M. B. Giles, Multilevel Monte Carlo path simulation, Oper. Res. 56, 607–617 (2008).
  • [17] M. B. Giles, Multilevel Monte Carlo methods, Acta Numer. 24, 259–328 (2015).
  • [18] M. B. Giles and B. J. Waterhouse, Multilevel quasi-Monte Carlo path simulation, Radon Series Comp. Appl. Math. 8, 1–18 (2009).
  • [19] I. G. Graham, F. Y. Kuo, J. A. Nichols, R. Scheichl, Ch. Schwab and I. H. Sloan, Quasi-Monte Carlo finite element methods for elliptic PDEs with lognormal random coefficients, Numer. Math. 131, 329–368 (2015).
  • [20] H. Hakula, H. Harbrecht, V. Kaarnioja, F. Y. Kuo and I. H. Sloan, Uncertainty quantification for random domains using periodic random variables, Numer. Math. 156, 273–317 (2024).
  • [21] R. L. Hardy, Multiquadric equations of topography and other irregular surfaces, J. Geophys. Res. 76, 1905–1915 (1971).
  • [22] S. Heinrich, Multilevel Monte Carlo methods, Lecture Notes in Comput. Sci., Vol. 2179, Springer, 3624–3651 (2001).
  • [23] V. Kaarnioja, Y. Kazashi, F. Y. Kuo, F. Nobile and I. H. Sloan, Fast approximation by periodic kernel-based lattice-point interpolation with application in uncertainty quantification, Numer. Math. 150, 33–77 (2022).
  • [24] V. Kaarnioja, F. Y. Kuo and I. H. Sloan, Uncertainty quantification using periodic random variables, SIAM J. Numer. Anal. 58, 1068–1091 (2020).
  • [25] V. Kaarnioja, F. Y. Kuo and I. H. Sloan, Lattice-based kernel approximation and serendipitous weights for parametric PDEs in very high dimensions, in: Monte Carlo and Quasi-Monte Carlo Methods 2022 (A. Hinrichs, P. Kritzer and F. Pillichshammer, eds), Springer-Verlag, 81–103 (2024).
  • [26] L. Kämmerer, Reconstructing hyperbolic cross trigonometric polynomials from sampling along rank-1 lattices, SIAM J. Numer. Anal. 51, 2773–2796 (2013).
  • [27] L. Kämmerer, Multiple rank-1 lattices as sampling schemes for multivariate trigonometric polynomials, J. Fourier Anal. Appl. 24, 17–44 (2018).
  • [28] L. Kämmerer, D. Potts and T. Volkmer, Approximation of multivariate periodic functions by trigonometric polynomials based on rank-1 lattice sampling, J. Complexity 31, 543–576 (2015).
  • [29] F. Y. Kuo, G. Migliorati, F. Nobile and D. Nuyens, Function integration, reconstruction and approximation using rank-1 lattices, Math. Comp. 90, 1861–1897 (2021).
  • [30] F. Y. Kuo, W. Mo and D. Nuyens, Constructing embedded lattice-based algorithms for multivariate function approximation with a composite number of points, Constr. Approx. 61, 81–113 (2025).
  • [31] F. Y. Kuo, W. Mo, D. Nuyens, I. H. Sloan and A. Srikumar, Comparison of two search criteria for lattice-based kernel approximation, in: Monte Carlo and Quasi-Monte Carlo Methods 2022 (A. Hinrichs, P. Kritzer and F. Pillichshammer, eds), Springer-Verlag, 413–429 (2024).
  • [32] F. Y. Kuo and D. Nuyens, Application of quasi-Monte Carlo methods to elliptic PDEs with random diffusion coefficients – a survey of analysis and implementation, Found. Comput. Math. 16, 1631–1696 (2016).
  • [33] F. Y. Kuo, Ch. Schwab and I. H. Sloan, Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients, SIAM J. Numer. Anal. 50, 3351–3374 (2012).
  • [34] F. Y. Kuo, Ch. Schwab and I. H. Sloan, Multilevel quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients, Found. Comput. Math. 15, 411–449 (2015).
  • [35] F. Y. Kuo, I. H. Sloan and H. Woźniakowski, Lattice rules for multivariate approximation in the worst case setting, in: Monte Carlo and Quasi-Monte Carlo Methods 2004 (H. Niederreiter and D. Talay, eds), Springer-Verlag, 289–330 (2006).
  • [36] F. Y. Kuo, I. H. Sloan and H. Woźniakowski, Lattice rule algorithms for multivariate approximation in the average case setting, J. Complexity 24, 283–323 (2008).
  • [37] F. J. Narcowich, J. D. Ward and H. Wendland, Refined error estimates for radial basis function interpolation, Constr. Approx. 19, 541–564 (2003).
  • [38] PVC (Research Infrastructure), UNSW Sydney, Katana (2010). doi:10.26190/669X-A286.
  • [39] J. Quaintance and H. W. Gould, Combinatorial Identities for Stirling Numbers: The Unpublished Notes of H. W. Gould, World Scientific Publishing Company, River Edge, NJ (2015).
  • [40] R. Schaback and H. Wendland, Kernel techniques: from machine learning to meshless methods, Acta Numer. 15, 543–639 (2006).
  • [41] I. H. Sloan and V. Kaarnioja, Doubling the rate: improved error bounds for orthogonal projection with application to interpolation, BIT Numer. Math. 65 (online) (2025).
  • [42] I. H. Sloan and H. Woźniakowski, Tractability of multivariate integration for weighted Korobov classes, J. Complexity 17, 697–721 (2001).
  • [43] A. L. Teckentrup, P. Jantsch, C. G. Webster and M. Gunzburger, A multilevel stochastic collocation method for partial differential equations with random input data, SIAM/ASA J. Uncertain. Quantif. 3, 1046–1074 (2015).
  • [44] A. L. Teckentrup, R. Scheichl, M. B. Giles and E. Ullmann, Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients, Numer. Math. 125, 569–600 (2013).
  • [45] H. Wendland, Scattered Data Approximation, Cambridge University Press, Cambridge (2005).
  • [46] Z. M. Wu and R. Schaback, Local error estimates for radial basis function interpolation of scattered data, IMA J. Numer. Anal. 13, 13–27 (1993).
  • [47] Z. Y. Zeng, P. Kritzer and F. J. Hickernell, Spline methods using integration lattices and digital nets, Constr. Approx. 30, 529–555 (2009).
  • [48] Z. Y. Zeng, K. T. Leung and F. J. Hickernell, Error analysis of splines for periodic problems using lattice designs, in: Monte Carlo and Quasi-Monte Carlo Methods 2004 (H. Niederreiter and D. Talay, eds), Springer-Verlag, 501–514 (2006).

Appendix I Combinatorial identities and proofs

This appendix provides some important combinatorial results that are required in the proofs of several regularity theorems. First, we state an identity from [39] that simplifies sums involving the Stirling numbers of the second kind (2.1):

\[
\sum_{k=1}^{m-\ell}{m\choose k}S(m-k,\ell)=(\ell+1)\,S(m,\ell+1)\qquad\text{for }\ell<m.
\tag{I.1}
\]
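As a quick sanity check of (I.1), which we add for the reader's convenience, take $m=3$ and $\ell=1$: since $S(2,1)=S(1,1)=1$ and $S(3,2)=3$,

\[
\sum_{k=1}^{2}{3\choose k}S(3-k,1)={3\choose 1}S(2,1)+{3\choose 2}S(1,1)=3+3=6=2\,S(3,2),
\]

in agreement with $(\ell+1)\,S(m,\ell+1)$.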
Lemma 11.

Let $c>0$ and let $\boldsymbol{b}=(b_j)_{j\geq 1}$, $(\mathbb{A}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ and $(\mathbb{B}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ be sequences of non-negative real numbers that satisfy the recurrence

\[
\mathbb{A}_{\boldsymbol{\nu}}=\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}c^k\,b_j\,\mathbb{A}_{\boldsymbol{\nu}-k\boldsymbol{e}_j}+\mathbb{B}_{\boldsymbol{\nu}}\qquad\text{for all }\boldsymbol{\nu}\in\mathcal{F},\text{ including }\boldsymbol{\nu}=\boldsymbol{0},
\tag{I.2}
\]

where $\boldsymbol{e}_j$ is the multi-index whose $j$th component is 1 and whose other components are all 0. Then

\[
\mathbb{A}_{\boldsymbol{\nu}}=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i),
\tag{I.3}
\]

where $S(m_i,\ell_i)$ for $i\geq 1$ are the Stirling numbers of the second kind given by (2.1). The result also holds with both equalities replaced by inequalities $\leq$.

Proof.

The statement is proved by induction on $|\boldsymbol{\nu}|$. It holds trivially for $\boldsymbol{\nu}=\boldsymbol{0}$ since $S(0,0)=1$. Assume that (I.3) holds for all multi-indices of order less than $|\boldsymbol{\nu}|$; in particular, if $1\leq k\leq\nu_j$ for some $j\geq 1$, then

\[
\mathbb{A}_{\boldsymbol{\nu}-k\boldsymbol{e}_j}=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i).
\tag{I.4}
\]

We now prove the statement for multi-indices of order $|\boldsymbol{\nu}|$. Substituting the induction hypothesis (I.4) into (I.2), we have

\[
\mathbb{A}_{\boldsymbol{\nu}}=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{j\geq 1}\Phi_j,
\tag{I.5}
\]

where

\[
\Phi_j\coloneqq\sum_{k=1}^{\nu_j}{\nu_j\choose k}c^k\,b_j\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i).
\]

We use a dash to denote multi-indices with the $j$th component removed, for example, $\boldsymbol{\nu}'=(\nu_1,\ldots,\nu_{j-1},\nu_{j+1},\ldots)$, and adopt the notation $\mathbb{B}_{\boldsymbol{\nu}}=\mathbb{B}_{\boldsymbol{\nu}',\nu_j}$. Then

\[
\begin{aligned}
\Phi_j={}&\sum_{k=1}^{\nu_j}{\nu_j\choose k}c^k\,b_j\Bigg[\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\sum_{m_j=0}^{\nu_j-k}c^{|\boldsymbol{m}'|+m_j}{\boldsymbol{\nu}'\choose\boldsymbol{m}'}{\nu_j-k\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-k-m_j}\\
&\qquad\qquad\times\sum_{\boldsymbol{\ell}'\leq\boldsymbol{m}'}\sum_{\ell_j=0}^{m_j}(|\boldsymbol{\ell}'|+\ell_j)!\,\boldsymbol{b}'^{\,\boldsymbol{\ell}'}\,b_j^{\ell_j}S(m_j,\ell_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(m_i,\ell_i)\Bigg]\\
={}&\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}c^{|\boldsymbol{m}'|}{\boldsymbol{\nu}'\choose\boldsymbol{m}'}\sum_{\boldsymbol{\ell}'\leq\boldsymbol{m}'}\boldsymbol{b}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(m_i,\ell_i)\bigg)\\
&\times\underbrace{\sum_{k=1}^{\nu_j}\sum_{m_j=0}^{\nu_j-k}c^{m_j+k}{\nu_j\choose k}{\nu_j-k\choose m_j}\,b_j\,\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-k-m_j}\sum_{\ell_j=0}^{m_j}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j}S(m_j,\ell_j)}_{\eqqcolon\,\Theta_1},
\end{aligned}
\tag{I.6}
\]

where we have rearranged the terms and swapped the order of the sums.

Recognising that

\[
{\nu_j\choose k}{\nu_j-k\choose m_j}={\nu_j\choose m_j+k}{m_j+k\choose k},
\]

we have

\[
\begin{aligned}
\Theta_1&=\sum_{k=1}^{\nu_j}\sum_{m_j=0}^{\nu_j-k}c^{m_j+k}{\nu_j\choose m_j+k}{m_j+k\choose k}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-(k+m_j)}\sum_{\ell_j=0}^{m_j}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}S(m_j,\ell_j)\\
&=\sum_{k=1}^{\nu_j}\sum_{m_j=k}^{\nu_j}c^{m_j}{\nu_j\choose m_j}{m_j\choose k}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\sum_{\ell_j=0}^{m_j-k}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}S(m_j-k,\ell_j)\\
&=\sum_{m_j=1}^{\nu_j}c^{m_j}{\nu_j\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\underbrace{\sum_{k=1}^{m_j}{m_j\choose k}\sum_{\ell_j=0}^{m_j-k}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}S(m_j-k,\ell_j)}_{\eqqcolon\,\Theta_2},
\end{aligned}
\tag{I.7}
\]

where we get to the second line by relabelling the summation index of the inner sum from $m_j=0,\ldots,\nu_j-k$ to $m_j=k,\ldots,\nu_j$, and to the third line by swapping the order of the sums over $k$ and $m_j$. Swapping the order of the summations over $k$ and $\ell_j$ in $\Theta_2$ gives

\[
\begin{aligned}
\Theta_2&=\sum_{\ell_j=0}^{m_j-1}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}\sum_{k=1}^{m_j-\ell_j}{m_j\choose k}S(m_j-k,\ell_j)\\
&=\sum_{\ell_j=0}^{m_j-1}(|\boldsymbol{\ell}'|+\ell_j)!\,b_j^{\ell_j+1}(\ell_j+1)\,S(m_j,\ell_j+1)\\
&=\sum_{\ell_j=1}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j)
=\sum_{\substack{\ell_j=0\\ |\boldsymbol{\ell}'|+\ell_j\neq 0}}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j),
\end{aligned}
\tag{I.8}
\]

where the second equality follows from (I.1) and the third from a simple re-indexing of the sum. For the final equality, we may start the sum from $\ell_j=0$ because the added term vanishes (it carries a factor $\ell_j$), provided we also impose the condition $|\boldsymbol{\ell}'|+\ell_j\neq 0$ to ensure the factorial is well defined. Substituting (I.8) into (I.7) gives

\[
\Theta_1=\sum_{m_j=0}^{\nu_j}c^{m_j}{\nu_j\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\sum_{\substack{\ell_j=0\\ |\boldsymbol{\ell}'|+\ell_j\neq 0}}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j),
\]

where we have also added the terms corresponding to $m_j=0$ to the outer sum, since these vanish as well ($\ell_j=0$ whenever $m_j=0$). Substituting this formula for $\Theta_1$ back into (I.6) and rearranging gives

\[
\begin{aligned}
\Phi_j&=\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}c^{|\boldsymbol{m}'|}{\boldsymbol{\nu}'\choose\boldsymbol{m}'}\sum_{\boldsymbol{\ell}'\leq\boldsymbol{m}'}\boldsymbol{b}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(m_i,\ell_i)\bigg)\\
&\qquad\times\sum_{m_j=0}^{\nu_j}c^{m_j}{\nu_j\choose m_j}\mathbb{B}_{\boldsymbol{\nu}'-\boldsymbol{m}',\nu_j-m_j}\sum_{\substack{\ell_j=0\\ |\boldsymbol{\ell}'|+\ell_j\neq 0}}^{m_j}(|\boldsymbol{\ell}'|+\ell_j-1)!\,\ell_j\,b_j^{\ell_j}S(m_j,\ell_j)\\
&=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{0}\neq\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|-1)!\,\ell_j\prod_{i\geq 1}S(m_i,\ell_i),
\end{aligned}
\]

where the last line follows by combining the sums over $\boldsymbol{m}'$ and $m_j$ into a single sum over the original index $\boldsymbol{m}$, and likewise combining the sums over $\boldsymbol{\ell}'$ and $\ell_j$ into a single sum over the index $\boldsymbol{\ell}$.

Now substituting this back into (I.5), we have

\[
\begin{aligned}
\mathbb{A}_{\boldsymbol{\nu}}&=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{j\geq 1}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{0}\neq\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|-1)!\,\ell_j\prod_{i\geq 1}S(m_i,\ell_i)\\
&=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\sum_{\boldsymbol{0}\neq\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,|\boldsymbol{\ell}|!\prod_{i\geq 1}S(m_i,\ell_i)\\
&=\mathbb{B}_{\boldsymbol{\nu}}+\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\bigg(-\prod_{i\geq 1}S(m_i,0)+\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,|\boldsymbol{\ell}|!\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\\
&=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}c^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}\boldsymbol{b}^{\boldsymbol{\ell}}\,|\boldsymbol{\ell}|!\,\mathbb{B}_{\boldsymbol{\nu}-\boldsymbol{m}}\prod_{i\geq 1}S(m_i,\ell_i),
\end{aligned}
\]

which is the desired result (I.3). We reach the second line by interchanging the order of the summations over $j$ and the remaining indices, using $\sum_{j\geq 1}\ell_j=|\boldsymbol{\ell}|$, and the third line by including the term with $\boldsymbol{\ell}=\boldsymbol{0}$ and subtracting the corresponding product $\prod_{i\geq 1}S(m_i,0)$ to maintain equality. Since $S(m_i,0)=0$ whenever $m_i\neq 0$, this product contributes only when $\boldsymbol{m}=\boldsymbol{0}$, and the resulting term $-\mathbb{B}_{\boldsymbol{\nu}}$ cancels with the leading $\mathbb{B}_{\boldsymbol{\nu}}$, giving the final line.

Since every step of the induction argument preserves inequalities, the statement continues to hold when both equalities in (I.2) and (I.3) are replaced by inequalities $\leq$. ∎
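As a quick consistency check of (I.3), which we add here and which is not part of the argument above, take $\boldsymbol{\nu}=\boldsymbol{e}_j$. The recurrence (I.2) gives $\mathbb{A}_{\boldsymbol{e}_j}=c\,b_j\,\mathbb{A}_{\boldsymbol{0}}+\mathbb{B}_{\boldsymbol{e}_j}$ with $\mathbb{A}_{\boldsymbol{0}}=\mathbb{B}_{\boldsymbol{0}}$, while (I.3) gives

\[
\mathbb{A}_{\boldsymbol{e}_j}=\mathbb{B}_{\boldsymbol{e}_j}+c\,{\boldsymbol{e}_j\choose\boldsymbol{e}_j}\mathbb{B}_{\boldsymbol{0}}\,\big(1!\,b_j\,S(1,1)\big)=\mathbb{B}_{\boldsymbol{e}_j}+c\,b_j\,\mathbb{B}_{\boldsymbol{0}},
\]

since the inner sum over $\boldsymbol{\ell}\leq\boldsymbol{m}$ contributes $S(1,0)=0$ for $\boldsymbol{\ell}=\boldsymbol{0}$ and $S(1,1)=1$ for $\boldsymbol{\ell}=\boldsymbol{e}_j$; the two expressions agree.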

From [20, Lemma A.3], we also have that, for any $\boldsymbol{\nu}\in\mathcal{F}$,

\[
\begin{aligned}
&\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\mathbb{A}_{\boldsymbol{\ell}}\,\mathbb{B}_{\boldsymbol{k}}\prod_{i\geq 1}\Big(S(m_i,\ell_i)\,S(\nu_i-m_i,k_i)\Big)\\
&\qquad=\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}\mathbb{A}_{\boldsymbol{\ell}}\,\mathbb{B}_{\boldsymbol{m}-\boldsymbol{\ell}}\bigg)\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\tag{I.9}
\]

where $(\mathbb{A}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ and $(\mathbb{B}_{\boldsymbol{\nu}})_{\boldsymbol{\nu}\in\mathcal{F}}$ are arbitrary sequences of real numbers.
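To see (I.9) in action, consider the univariate case $\boldsymbol{\nu}=(1)$ (a check we add for the reader's convenience). On the left-hand side, only the terms $(m,\ell,k)=(0,0,1)$ and $(1,1,0)$ survive, since $S(1,0)=0$, giving $\mathbb{A}_0\mathbb{B}_1+\mathbb{A}_1\mathbb{B}_0$; on the right-hand side, the $m=0$ term vanishes and the $m=1$ term gives $(\mathbb{A}_0\mathbb{B}_1+\mathbb{A}_1\mathbb{B}_0)\,S(1,1)$, so the two sides agree.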

Appendix II Parametric regularity proofs

  • Proof of Lemma 2.

We follow a similar strategy to the proof of [32, Lemma 6.2]. For $\boldsymbol{\nu}=\boldsymbol{0}$, we rewrite the strong formulation (1.1) using the product rule to obtain

\[
-\Psi(\boldsymbol{x},\boldsymbol{y})\,\Delta u(\boldsymbol{x},\boldsymbol{y})=\nabla\Psi(\boldsymbol{x},\boldsymbol{y})\cdot\nabla u(\boldsymbol{x},\boldsymbol{y})+f(\boldsymbol{x}),
\]

    from which it follows that

\[
\Psi_{\min}\|\Delta u(\cdot,\boldsymbol{y})\|_{L^2(D)}\leq\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}\|u(\cdot,\boldsymbol{y})\|_V+\|f\|_{L^2(D)},
\]

where we have used Assumption (A0) and the definition $\|u(\cdot,\boldsymbol{y})\|_V=\|\nabla u(\cdot,\boldsymbol{y})\|_{L^2(D)}$. Combining this with (2.4), we have

\[
\begin{aligned}
\|\Delta u(\cdot,\boldsymbol{y})\|_{L^2(D)}
&\leq\frac{1}{\Psi_{\min}}\bigg(\sup_{\boldsymbol{y}\in\Omega}\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}\frac{\|f\|_{V'}}{\Psi_{\min}}+\|f\|_{L^2(D)}\bigg)\\
&\leq\frac{1}{\Psi_{\min}}\bigg(\sup_{\boldsymbol{y}\in\Omega}\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}\frac{C_{\rm Poi}\|f\|_{L^2(D)}}{\Psi_{\min}}+\|f\|_{L^2(D)}\bigg)\\
&\leq C_1\|f\|_{L^2(D)},
\end{aligned}
\]

where

\[
C_1\coloneqq\frac{\max\{1,C_{\rm Poi}\}}{\Psi_{\min}}\bigg(\frac{\|\nabla\Psi\|_{L^\infty(D\times\Omega)}}{\Psi_{\min}}+1\bigg),
\tag{II.1}
\]

and $C_{\rm Poi}$ is the Poincaré constant of the embedding $V\hookrightarrow L^2(D)$. It follows from Assumption (A0) that $\|\nabla\Psi\|_{L^\infty(D\times\Omega)}\coloneqq\sup_{\boldsymbol{y}\in\Omega}\|\nabla\Psi(\cdot,\boldsymbol{y})\|_{L^\infty(D)}$ is finite. Hence $u(\cdot,\boldsymbol{y})\in H^2(D)$ and (4.1) holds for $\boldsymbol{\nu}=\boldsymbol{0}$.

Recall that the gradient $\nabla$ and Laplacian $\Delta$ are taken with respect to $\boldsymbol{x}$, while $\partial^{\boldsymbol{\nu}}$ is taken with respect to $\boldsymbol{y}$. For $\boldsymbol{\nu}\neq\boldsymbol{0}$, we take the $\boldsymbol{\nu}$th derivative of (1.1) using the Leibniz product rule. Since $f$ is independent of $\boldsymbol{y}$, the right-hand side vanishes, and we can rearrange the resulting expression to obtain, omitting the dependence on $\boldsymbol{x}$ and $\boldsymbol{y}$,

\[
\begin{aligned}
\nabla\cdot(\Psi\nabla\partial^{\boldsymbol{\nu}}u)
&=-\nabla\cdot\bigg(\sum_{\boldsymbol{0}\neq\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\big(\partial^{\boldsymbol{m}}\Psi\big)\big(\nabla\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\big)\bigg)\\
&=-\nabla\cdot\bigg(\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\psi_j\big(\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\big)\bigg),
\end{aligned}
\tag{II.2}
\]

where we use the fact that the mixed partial derivatives of $\Psi$ with respect to $\boldsymbol{y}$ are

\[
\partial^{\boldsymbol{m}}\Psi(\boldsymbol{x},\boldsymbol{y})=\begin{cases}\Psi(\boldsymbol{x},\boldsymbol{y})&\text{if }\boldsymbol{m}=\boldsymbol{0},\\[2pt](2\pi)^k\sin\Big(2\pi y_j+\dfrac{k\pi}{2}\Big)\psi_j(\boldsymbol{x})&\text{if }\boldsymbol{m}=k\boldsymbol{e}_j,\ k\geq 1,\\[2pt]0&\text{otherwise},\end{cases}
\tag{II.3}
\]

which follows since each differentiation with respect to $y_j$ multiplies by $2\pi$ and advances the phase by $\pi/2$, as $\frac{\mathrm{d}}{\mathrm{d}y}\sin(2\pi y+\theta)=2\pi\sin\big(2\pi y+\theta+\tfrac{\pi}{2}\big)$.

Expanding (II.2) using the product rule for $\nabla$ and rearranging gives

\[
\begin{aligned}
\Psi\Delta\partial^{\boldsymbol{\nu}}u&=-\nabla\Psi\cdot\nabla\partial^{\boldsymbol{\nu}}u\\
&\quad-\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\Big(\big(\nabla\psi_j\big)\cdot\big(\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\big)+\psi_j\big(\Delta\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\big)\Big).
\end{aligned}
\]

Now taking the $L^2(D)$ norm and applying the triangle inequality gives

\[
\begin{aligned}
&\Psi_{\min}\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}\leq\|\nabla\Psi\|_{L^\infty(D)}\|\nabla\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}\\
&\quad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\Big(\|\psi_j\|_{L^\infty(D)}\|\Delta\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_{L^2(D)}+\|\nabla\psi_j\|_{L^\infty(D)}\|\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_{L^2(D)}\Big),
\end{aligned}
\]

where we used $|\sin(x)|\leq 1$ for all real $x$. Formulating the above as a recursion gives

\[
\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}\leq\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,b_j\,\|\Delta\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_{L^2(D)}+B_{\boldsymbol{\nu}},
\tag{II.4}
\]

where $b_j$ and $\overline{b}_j$ are defined in (2.3) and we define

\[
B_{\boldsymbol{\nu}}\coloneqq\frac{\|\nabla\Psi\|_{L^\infty(D)}}{\Psi_{\min}}\|\partial^{\boldsymbol{\nu}}u\|_V+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,\overline{b}_j\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}u\|_V.
\]

Substituting (2.5) into the definition of $B_{\boldsymbol{\nu}}$, we can bound it by

\[
\begin{aligned}
B_{\boldsymbol{\nu}}&\leq\frac{\|\nabla\Psi\|_{L^\infty(D)}}{\Psi_{\min}}\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\\
&\qquad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,\overline{b}_j\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|-k}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}S(\nu_j-k,m_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\\
&=\frac{\|f\|_{V'}}{\Psi_{\min}}(2\pi)^{|\boldsymbol{\nu}|}\bigg[\frac{\|\nabla\Psi\|_{L^\infty(D)}}{\Psi_{\min}}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\\
&\qquad+\underbrace{\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}\,\overline{b}_j\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}|\boldsymbol{m}|!\,\boldsymbol{b}^{\boldsymbol{m}}S(\nu_j-k,m_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)}_{\eqqcolon\,\Theta}\bigg].
\end{aligned}
\tag{II.5}
\]

We simplify $\Theta$ using the same strategy as in the proof of Lemma 11, by separating out the $j$th component of the sum over $\boldsymbol{m}$ and interchanging the order of the sums, to give

\[
\begin{aligned}
\Theta&=\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}\overline{b}_j\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\sum_{m_j=0}^{\nu_j-k}(|\boldsymbol{m}'|+m_j)!\,\boldsymbol{b}'^{\,\boldsymbol{m}'}b_j^{m_j}S(\nu_j-k,m_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\\
&=\sum_{j\geq 1}\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\boldsymbol{b}'^{\,\boldsymbol{m}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\bigg)\sum_{m_j=0}^{\nu_j-1}\sum_{k=1}^{\nu_j-m_j}{\nu_j\choose k}\,\overline{b}_j\,(|\boldsymbol{m}'|+m_j)!\,b_j^{m_j}S(\nu_j-k,m_j)\\
&\leq\sum_{j\geq 1}\sum_{\boldsymbol{m}'\leq\boldsymbol{\nu}'}\boldsymbol{b}'^{\,\boldsymbol{m}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,m_i)\bigg)\sum_{\substack{m_j=0\\ |\boldsymbol{m}'|+m_j\neq 0}}^{\nu_j}(|\boldsymbol{m}'|+m_j-1)!\,m_j\,\overline{b}_j^{m_j}\,S(\nu_j,m_j)\\
&\leq\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\]

where the second line swaps the order of the sums over $k$ and $m_j$, and the third line uses (I.1) together with the re-indexing $m_j+1\to m_j$. In the final line we may drop the condition $|\boldsymbol{m}'|+m_j\neq 0$: if $\boldsymbol{m}=\boldsymbol{0}$ and $\boldsymbol{\nu}\neq\boldsymbol{0}$, then there is some index $i$ with $\nu_i\neq 0$ and hence $S(\nu_i,m_i)=S(\nu_i,0)=0$. We have also replaced $\boldsymbol{b}$ by $\overline{\boldsymbol{b}}$, since $b_i\leq\overline{b}_i$ for all $i\geq 1$, and used $\sum_{j\geq 1}m_j=|\boldsymbol{m}|$.

Substituting the bound on $\Theta$ into (II.5) and using $\boldsymbol{b}\leq\overline{\boldsymbol{b}}$, we can bound $B_{\boldsymbol{\nu}}$ by

\[
\begin{aligned}
B_{\boldsymbol{\nu}}&\leq\|f\|_{L^2(D)}\frac{C_{\rm Poi}}{\Psi_{\min}}\bigg(\frac{\|\nabla\Psi\|_{L^\infty(D\times\Omega)}}{\Psi_{\min}}+1\bigg)(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\\
&\leq C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}|\boldsymbol{m}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i)\eqqcolon\mathbb{B}_{\boldsymbol{\nu}},
\end{aligned}
\]

where $C_1$ is defined in (II.1). Thus, defining $\mathbb{A}_{\boldsymbol{\nu}}\coloneqq\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}$, we can write (II.4) as

\[
\mathbb{A}_{\boldsymbol{\nu}}\leq\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,b_j\,\mathbb{A}_{\boldsymbol{\nu}-k\boldsymbol{e}_j}+\mathbb{B}_{\boldsymbol{\nu}}.
\]

Noting that $\mathbb{A}_{\boldsymbol{0}}\leq\mathbb{B}_{\boldsymbol{0}}$, we can then apply Lemma 11 (we cannot apply Lemma 11 to (II.4) with $B_{\boldsymbol{\nu}}$, since it is not true that $\mathbb{A}_{\boldsymbol{0}}\leq B_{\boldsymbol{0}}$) to give

\[
\begin{aligned}
\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}&\leq\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(2\pi)^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\\
&\qquad\times C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}-\boldsymbol{m}|}\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}|\boldsymbol{k}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\\
&\leq C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\\
&\qquad\times\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\bigg(\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}|\boldsymbol{k}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\bigg).
\end{aligned}
\]

Then, using (I.9) and the identity $\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,|\boldsymbol{m}-\boldsymbol{\ell}|!=(|\boldsymbol{m}|+1)!$ from [32], we obtain

\[
\begin{aligned}
\|\Delta\partial^{\boldsymbol{\nu}}u\|_{L^2(D)}&\leq C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}\bigg[\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,|\boldsymbol{m}-\boldsymbol{\ell}|!\bigg]\prod_{i\geq 1}S(\nu_i,m_i)\\
&=C_1\|f\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(|\boldsymbol{m}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\]

as required. □
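For a concrete check of the factorial identity used in the last step (our addition, not part of the proof), take $|\boldsymbol{m}|=2$ in one dimension: $\sum_{\ell=0}^{2}{2\choose\ell}\ell!\,(2-\ell)!={2\choose 0}2!+{2\choose 1}1!\,1!+{2\choose 2}2!=2+2+2=6=3!$, in agreement with $(|\boldsymbol{m}|+1)!$.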

  • Proof of Lemma 3.

We follow the proof strategy presented in [32, Lemma 6.3]. Let $f\in L^2(D)$, $\boldsymbol{y}\in\Omega$ and $\boldsymbol{\nu}\in\mathcal{F}$. Since $u_h$ depends analytically on $\boldsymbol{y}$ with values in $V_h$, we have $\partial^{\boldsymbol{\nu}}u_h\in V_h$ for every $\boldsymbol{\nu}\in\mathcal{F}$, and hence

\[
(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})(\partial^{\boldsymbol{\nu}}u_h(\cdot,\boldsymbol{y}))\equiv 0,
\]

where $\mathsf{P}^h_{\boldsymbol{y}}$ is the orthogonal projection defined by (2.11). It then follows that

\[
\begin{aligned}
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V&=\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)+(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\\
&\leq\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V+\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}u\|_V,
\end{aligned}
\tag{II.6}
\]

where we have omitted the dependence of $u$ on $\boldsymbol{x}$ and $\boldsymbol{y}$ for brevity.

Starting with the Galerkin orthogonality property given in (2.10), we take the $\partial^{\boldsymbol{\nu}}$ derivative with respect to $\boldsymbol{y}$ using the Leibniz product rule to give

\[
\int_D\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}(\partial^{\boldsymbol{m}}\Psi)\nabla\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(u-u_h)\cdot\nabla v_h\,\mathrm{d}\boldsymbol{x}=0\quad\text{for all }v_h\in V_h.
\tag{II.7}
\]

Next, separating out the term with $\boldsymbol{m}=\boldsymbol{0}$ and then substituting the mixed derivatives (II.3) of the coefficient into (II.7), we obtain

\[
\begin{aligned}
&\int_D\Psi\,\nabla\partial^{\boldsymbol{\nu}}(u-u_h)\cdot\nabla v_h\,\mathrm{d}\boldsymbol{x}\\
&\qquad=-\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\int_D\psi_j\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\cdot\nabla v_h\,\mathrm{d}\boldsymbol{x}
\end{aligned}
\]

for all $v_h\in V_h$. Now, taking $v_h=\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)$, we have

\[
\begin{aligned}
&\int_D\Psi\,\nabla\partial^{\boldsymbol{\nu}}(u-u_h)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}\\
&\quad=-\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\int_D\psi_j\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}.
\end{aligned}
\tag{II.8}
\]

Since $\partial^{\boldsymbol{\nu}}(u-u_h)\in V$, using (2.11) with $w=\partial^{\boldsymbol{\nu}}(u-u_h)$ and $v_h=\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)$ yields

\[
\int_D\Psi\nabla\big((\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}(u-u_h)\big)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}=0,
\]

    which can then be rearranged to give

\[
\begin{aligned}
\int_D\Psi\nabla\partial^{\boldsymbol{\nu}}(u-u_h)\cdot\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\,\mathrm{d}\boldsymbol{x}
&=\int_D\Psi\,|\nabla\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)|^2\,\mathrm{d}\boldsymbol{x}\\
&\geq\Psi_{\min}\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V^2,
\end{aligned}
\tag{II.9}
\]

where we have used Assumption (A0). Substituting the lower bound (II.9) for the left-hand side of (II.8), applying the Cauchy–Schwarz inequality to the right-hand side and then dividing through by $\Psi_{\min}\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V$ yields

\[
\|\mathsf{P}^h_{\boldsymbol{y}}\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq\frac{1}{\Psi_{\min}}\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\,\|\psi_j\|_{L^\infty(D)}\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\|_V.
\]

Then substituting this into (II.6) gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k b_j\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}(u-u_h)\|_V+\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}u\|_V,
\]

where $b_j$ is defined in (2.3).

Applying Lemma 11 with $\mathbb{A}_{\boldsymbol{\nu}}=\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V$, $\mathbb{B}_{\boldsymbol{\nu}}=\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}}u\|_V$ and $c=2\pi$ to the above inequality gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(2\pi)^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_V.
\]

Since $\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\in H^2(D)$, we have $\|(\mathsf{I}-\mathsf{P}^h_{\boldsymbol{y}})\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_V\leq C\,h\,\|\Delta\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_{L^2(D)}$ from (2.12) (with $C$ independent of $\boldsymbol{y}$), which in turn gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V\leq C\,h\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(2\pi)^{|\boldsymbol{m}|}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\boldsymbol{b}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\|\Delta\partial^{\boldsymbol{\nu}-\boldsymbol{m}}u\|_{L^2(D)}.
\]

Then using (4.1) from Lemma 2 with constant $C_1$, the bound $\boldsymbol{b}\leq\overline{\boldsymbol{b}}$, and defining $C_2\coloneqq C\,C_1/2$, we obtain

\[
\begin{aligned}
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_V&\leq 2C_2\,h\,(2\pi)^{|\boldsymbol{\nu}|}\,\|f\|_{L^2(D)}\\
&\quad\times\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}|\boldsymbol{\ell}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\bigg(\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}(|\boldsymbol{k}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\bigg)\\
&=2C_2\,h\,(2\pi)^{|\boldsymbol{\nu}|}\,\|f\|_{L^2(D)}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,(|\boldsymbol{m}-\boldsymbol{\ell}|+1)!\bigg)\prod_{i\geq 1}S(\nu_i,m_i)\\
&=C_2\,h\,(2\pi)^{|\boldsymbol{\nu}|}\,\|f\|_{L^2(D)}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}(|\boldsymbol{m}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{m}}\prod_{i\geq 1}S(\nu_i,m_i).
\end{aligned}
\]

We arrive at the first equality using (I.9) with $\mathbb{A}_{\boldsymbol{\ell}}=|\boldsymbol{\ell}|!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}$ and $\mathbb{B}_{\boldsymbol{k}}=(|\boldsymbol{k}|+1)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}$, and then move to the last line using the identity from [32]

\[
\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}|\boldsymbol{\ell}|!\,(|\boldsymbol{m}-\boldsymbol{\ell}|+1)!=\frac{(|\boldsymbol{m}|+2)!}{2},
\]

which gives the required result. The constant $C_2$ is independent of $h$ and $\boldsymbol{y}$. □
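For a concrete check of the identity above (our addition, not part of the proof), take $|\boldsymbol{m}|=1$ in one dimension: ${1\choose 0}0!\,2!+{1\choose 1}1!\,1!=2+1=3=3!/2$, in agreement with $(|\boldsymbol{m}|+2)!/2$.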

  • Proof of Lemma 4.

We use an Aubin–Nitsche duality argument. For a bounded linear functional $\mathcal{G}$ on $L^2(D)$, define $v^g\in V$ to be the solution to the dual problem

\[
\mathcal{A}(\boldsymbol{y};w,v^g(\cdot,\boldsymbol{y}))=\mathcal{G}(w)\quad\text{for all }w\in V,
\]

which, since $\mathcal{A}$ is symmetric, is equivalent to the parametric variational problem (2.2) with $f$ replaced by $g$, the representer of $\mathcal{G}$. Thus $v^g(\cdot,\boldsymbol{y})\in V$ inherits the regularity of the solution to (2.2), and the FE approximation $v^g_h(\cdot,\boldsymbol{y})\in V_h$ also satisfies (4.2).

Letting $w=u-u_h$ (and suppressing the dependence on $\boldsymbol{y}$), it follows from Galerkin orthogonality (2.10) that $\mathcal{A}(\boldsymbol{y};u-u_h,v^g_h)=0$, which leads to

\[
\mathcal{G}(u-u_h)=\mathcal{A}(\boldsymbol{y};u-u_h,v^g)=\mathcal{A}(\boldsymbol{y};u-u_h,v^g-v^g_h).
\]

Differentiating this with respect to $\boldsymbol{y}$ gives

\[
\mathcal{G}\big(\partial^{\boldsymbol{\nu}}(u-u_h)\big)=\int_D\partial^{\boldsymbol{\nu}}\big(\Psi\,\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\,\mathrm{d}\boldsymbol{x}.
\]

    Applying the Leibniz product rule, the integrand on the right becomes

\[
\begin{aligned}
&\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\big(\partial^{\boldsymbol{m}}\Psi\big)\,\partial^{\boldsymbol{\nu}-\boldsymbol{m}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\quad=\Psi\,\partial^{\boldsymbol{\nu}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)+\sum_{\boldsymbol{0}\neq\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\big(\partial^{\boldsymbol{m}}\Psi\big)\,\partial^{\boldsymbol{\nu}-\boldsymbol{m}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\quad=\Psi\,\partial^{\boldsymbol{\nu}}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\qquad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\,\psi_j\,\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)\\
&\quad=\Psi\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\nabla\partial^{\boldsymbol{m}}(u-u_h)\cdot\nabla\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(v^g-v^g_h)\\
&\qquad+\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k\sin\Big(2\pi y_j+\frac{k\pi}{2}\Big)\psi_j\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\nabla\partial^{\boldsymbol{\ell}}(u-u_h)\cdot\nabla\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\bigg),
\end{aligned}
\]

where we have substituted in the derivative formula (II.3) and applied the Leibniz product rule to $\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j}\big(\nabla(u-u_h)\cdot\nabla(v^g-v^g_h)\big)$.

Now, taking the absolute value and using the Cauchy–Schwarz inequality gives

\[
\begin{aligned}
\big|\mathcal{G}\big(\partial^{\boldsymbol{\nu}}(u-u_h)\big)\big|&\leq\Psi_{\max}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\|\partial^{\boldsymbol{m}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(v^g-v^g_h)\|_V\\
&\quad+\Psi_{\min}\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k b_j\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\|\partial^{\boldsymbol{\ell}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\|_V,
\end{aligned}
\tag{II.10}
\]

where we have also used the definition of $b_j$ in (2.3).

The terms in the first sum in (II.10) can be bounded using (4.2) from Lemma 3 (applied once for $u-u_h$ with $f$ and once for $v^g-v^g_h$ with $g$) to give

\[
\begin{aligned}
&\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\|\partial^{\boldsymbol{m}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-\boldsymbol{m}}(v^g-v^g_h)\|_V\\
&\quad\leq C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}{\boldsymbol{\nu}\choose\boldsymbol{m}}\\
&\qquad\times\bigg(\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}(|\boldsymbol{\ell}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\prod_{i\geq 1}S(m_i,\ell_i)\bigg)\bigg(\sum_{\boldsymbol{k}\leq\boldsymbol{\nu}-\boldsymbol{m}}(|\boldsymbol{k}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}\prod_{i\geq 1}S(\nu_i-m_i,k_i)\bigg)\\
&\quad=\frac{C_2^2}{30}\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i),
\end{aligned}
\tag{II.11}
\]

where $C_2$ denotes the constant factor from (4.2), and we obtain the last equality using (I.9) with $\mathbb{A}_{\boldsymbol{\ell}}=(|\boldsymbol{\ell}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}$ and $\mathbb{B}_{\boldsymbol{k}}=(|\boldsymbol{k}|+2)!\,\overline{\boldsymbol{b}}^{\boldsymbol{k}}$, along with the identity from [32]

\[
\sum_{\boldsymbol{\ell}\leq\boldsymbol{m}}{\boldsymbol{m}\choose\boldsymbol{\ell}}(|\boldsymbol{\ell}|+2)!\,(|\boldsymbol{m}-\boldsymbol{\ell}|+2)!=\frac{(|\boldsymbol{m}|+5)!}{30}.
\tag{II.12}
\]

Similarly, for the summation over the index $\boldsymbol{\ell}$ in (II.10), we again use (4.2) from Lemma 3 along with (I.9) and (II.12) to obtain

\[
\begin{aligned}
&\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\|\partial^{\boldsymbol{\ell}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\|_V\\
&\quad\leq C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|-k}\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|+5)!\,S(\nu_j-k,\ell_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i).
\end{aligned}
\]

Substituting this back into the sum indexed by $j$ in (II.10), we have

\[
\begin{aligned}
&\sum_{j\geq 1}\sum_{k=1}^{\nu_j}{\nu_j\choose k}(2\pi)^k b_j\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}{\boldsymbol{\nu}-k\boldsymbol{e}_j\choose\boldsymbol{\ell}}\|\partial^{\boldsymbol{\ell}}(u-u_h)\|_V\,\|\partial^{\boldsymbol{\nu}-k\boldsymbol{e}_j-\boldsymbol{\ell}}(v^g-v^g_h)\|_V\\
&\quad\leq C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{j\geq 1}\underbrace{\sum_{k=1}^{\nu_j}{\nu_j\choose k}b_j\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}-k\boldsymbol{e}_j}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}\,(|\boldsymbol{\ell}|+5)!\,S(\nu_j-k,\ell_j)\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)}_{\eqqcolon\,\Theta_j}.
\end{aligned}
\tag{II.13}
\]

To bound $\Theta_j$, we use the same technique as in Lemma 11: we separate out the component $\ell_j$ from the innermost sum over $\boldsymbol{\ell}$, bound $b_j$ by $\overline{b}_j$, then swap the order of the sums over $k$ and $\ell_j$ so that (I.1) can be used to evaluate the sum over $k$. This gives

\[
\begin{aligned}
\Theta_j&=\sum_{k=1}^{\nu_j}{\nu_j\choose k}b_j\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\sum_{\ell_j=0}^{\nu_j-k}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\overline{b}_j^{\ell_j}\,(|\boldsymbol{\ell}'|+\ell_j+5)!\,S(\nu_j-k,\ell_j)\\
&\leq\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\sum_{\ell_j=0}^{\nu_j-1}\sum_{k=1}^{\nu_j-\ell_j}{\nu_j\choose k}\,\overline{b}_j^{\ell_j+1}\,(|\boldsymbol{\ell}'|+\ell_j+5)!\,S(\nu_j-k,\ell_j)\\
&=\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\sum_{\ell_j=0}^{\nu_j-1}\overline{b}_j^{\ell_j+1}\,(|\boldsymbol{\ell}'|+\ell_j+5)!\,(\ell_j+1)\,S(\nu_j,\ell_j+1)\\
&=\sum_{\boldsymbol{\ell}'\leq\boldsymbol{\nu}'}\overline{\boldsymbol{b}}'^{\,\boldsymbol{\ell}'}\bigg(\prod_{\substack{i\geq 1\\ i\neq j}}S(\nu_i,\ell_i)\bigg)\sum_{\ell_j=1}^{\nu_j}\overline{b}_j^{\ell_j}\,(|\boldsymbol{\ell}'|+\ell_j+4)!\,\ell_j\,S(\nu_j,\ell_j).
\end{aligned}
\]

We can add the terms with $\ell_j=0$ to the sum due to the presence of the factor $\ell_j$, and thus

\[
\Theta_j\leq\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}(|\boldsymbol{\ell}|+4)!\,\ell_j\prod_{i\geq 1}S(\nu_i,\ell_i),
\]

and, using $\sum_{j\geq 1}\ell_j=|\boldsymbol{\ell}|\leq|\boldsymbol{\ell}|+5$ so that $(|\boldsymbol{\ell}|+4)!\,|\boldsymbol{\ell}|\leq(|\boldsymbol{\ell}|+5)!$, we have

\[
\sum_{j\geq 1}\Theta_j\leq\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}(|\boldsymbol{\ell}|+5)!\prod_{i\geq 1}S(\nu_i,\ell_i).
\tag{II.14}
\]

Combining (II.14), (II.13), (II.11) and (II.10), we have

\[
\begin{aligned}
&\big|\mathcal{G}(\partial^{\boldsymbol{\nu}}(u-u_h))\big|\\
&\quad\leq\Psi_{\max}\frac{C_2^2}{30}\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i)\\
&\qquad+\Psi_{\min}\,C_2^2\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{\ell}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{\ell}}(|\boldsymbol{\ell}|+5)!\prod_{i\geq 1}S(\nu_i,\ell_i)\\
&\quad=C_2^2\Big(\frac{\Psi_{\max}}{30}+\Psi_{\min}\Big)\,h^2\,\|f\|_{L^2(D)}\|g\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i).
\end{aligned}
\]

Finally, we let $\mathcal{G}$ be the functional with representer $g=\partial^{\boldsymbol{\nu}}(u-u_h)(\cdot,\boldsymbol{y})$, i.e., $\mathcal{G}(v)=\langle\partial^{\boldsymbol{\nu}}(u-u_h),v\rangle$ for $v\in L^2(D)$, which gives

\[
\|\partial^{\boldsymbol{\nu}}(u-u_h)\|^2_{L^2(D)}\lesssim h^2\,\|f\|_{L^2(D)}\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_{L^2(D)}(2\pi)^{|\boldsymbol{\nu}|}\sum_{\boldsymbol{m}\leq\boldsymbol{\nu}}\overline{\boldsymbol{b}}^{\boldsymbol{m}}(|\boldsymbol{m}|+5)!\prod_{i\geq 1}S(\nu_i,m_i),
\]

where the implied constant is independent of $h$ and $\boldsymbol{y}$. Dividing through by $\|\partial^{\boldsymbol{\nu}}(u-u_h)\|_{L^2(D)}$ yields the required result (4.3). □
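As a final sanity check of (II.12), which we add for the reader's convenience, take $|\boldsymbol{m}|=1$ in one dimension: ${1\choose 0}2!\,3!+{1\choose 1}3!\,2!=12+12=24$, which matches $(1+5)!/30=720/30=24$.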