
The Finite Neuron Method and Convergence Analysis

Jinchao Xu (xu@math.psu.edu), Department of Mathematics, Pennsylvania State University, University Park, PA, 16802, USA
(August 2020)
Abstract

We study a family of $H^{m}$-conforming piecewise polynomials based on artificial neural networks, called the finite neuron method (FNM), for the numerical solution of $2m$-th order partial differential equations in $\mathbb{R}^{d}$ for any $m,d\geq 1$, and we provide a convergence analysis for this method. Given a general domain $\Omega\subset\mathbb{R}^{d}$ and a partition $\mathcal{T}_{h}$ of $\Omega$, it is in general still an open problem how to construct a conforming finite element subspace of $H^{m}(\Omega)$ that has adequate approximation properties. Using techniques from artificial neural networks, we construct a family of $H^{m}$-conforming sets of functions consisting of piecewise polynomials of degree $k$ for any $k\geq m$, and we further obtain error estimates when they are applied to solve elliptic boundary value problems of any order in any dimension. For example, the following error estimate between the exact solution $u$ and the finite neuron approximation $u_{N}$ is obtained.

\|u-u_{N}\|_{H^{m}(\Omega)}=\mathcal{O}(N^{-{1\over 2}-{1\over d}}).

Discussions will also be given on the differences and relationship between the finite neuron method and finite element methods (FEM). For example, for the finite neuron method the underlying finite element grids are not given a priori, and the discrete solution can only be obtained by solving a non-linear and non-convex optimization problem. Despite the many desirable theoretical properties of the finite neuron method analyzed in this paper, its practical value is a subject of further investigation, since the aforementioned non-linear and non-convex optimization problem can be expensive and challenging to solve. For completeness and for the convenience of readers, some basic known results and their proofs are also included in this manuscript.

1 Introduction

This paper is devoted to the study of numerical methods for high order partial differential equations in any dimension using appropriate piecewise polynomial function classes. In this introduction, we briefly describe a class of elliptic boundary value problems of any order in any dimension, give an overview of some existing numerical methods for this model and other related problems, and finally explain the motivation and objective of this paper.

1.1 Model problem

Let $\Omega\subset\mathbb{R}^{d}$ be a bounded domain with a sufficiently smooth boundary $\partial\Omega$. For any integer $m\geq 1$, we consider the following model $2m$-th order partial differential equation with certain boundary conditions:

{Lu=fin Ω,Bk(u)=0on Ω(0km1),\left\{\begin{array}[]{rccl}\displaystyle Lu&=&f&\mbox{in }\Omega,\\ B^{k}(u)&=&0&\mbox{on }\partial\Omega\quad(0\leq k\leq m-1),\end{array}\right. (1.1)

where $L$ is the partial differential operator

Lu=|α|=m(1)mα(aα(x)αu)+a0(x)u,Lu=\sum_{|\alpha|=m}(-1)^{m}\partial^{\alpha}(a_{\alpha}(x)\,\partial^{\alpha}\,u)+a_{0}(x)u, (1.2)

and $\bm{\alpha}$ denotes a $d$-dimensional multi-index $\bm{\alpha}=(\alpha_{1},\cdots,\alpha_{d})$ with

|{\bm{\alpha}}|=\sum_{i=1}^{d}\alpha_{i},\quad\partial^{\bm{\alpha}}=\frac{\partial^{|{\bm{\alpha}}|}}{\partial x_{1}^{\alpha_{1}}\cdots\partial x_{d}^{\alpha_{d}}}.

For simplicity, we assume that the coefficients $a_{\alpha}$ (for $|\alpha|=m$) and $a_{0}$ are strictly positive and smooth functions on $\Omega$; namely, there exists $\alpha_{0}>0$ such that

a_{\alpha}(x),\ a_{0}(x)\geq\alpha_{0}\quad\forall x\in\Omega,\ |\alpha|=m. (1.3)

Given a nonnegative integer $k$ and a bounded domain $\Omega\subset\mathbb{R}^{d}$, let

H^{k}(\Omega):=\left\{v\in L^{2}(\Omega):\partial^{\alpha}v\in L^{2}(\Omega),\ |\alpha|\leq k\right\}

be the standard Sobolev space with norm and seminorm given respectively by

\|v\|_{k}:=\left(\sum_{|\alpha|\leq k}\|\partial^{\alpha}v\|_{0}^{2}\right)^{1/2},\quad|v|_{k}:=\left(\sum_{|\alpha|=k}\|\partial^{\alpha}v\|_{0}^{2}\right)^{1/2}.

For $k=0$, $H^{0}(\Omega)$ is the standard $L^{2}(\Omega)$ space, with the inner product denoted by $(\cdot,\cdot)$. Similarly, for any subset $K\subset\Omega$, the $L^{2}(K)$ inner product is denoted by $(\cdot,\cdot)_{0,K}$. We note that, under the assumption (1.3),

a(v,v)\gtrsim\|v\|^{2}_{m,\Omega}\quad\forall v\in H^{m}(\Omega). (1.4)

The boundary value problem (1.1) can be cast into an equivalent minimization or variational problem, as described below, for an appropriate subspace $V\subset H^{m}(\Omega)$.

Minimization Problem M:

Find $u\in V$ such that

J(u)=\min_{v\in V}J(v), (1.5)

or

Variational Problem V:

Find $u\in V$ such that

a(u,v)=\langle f,v\rangle\quad\forall v\in V. (1.6)

The bilinear form $a(\cdot,\cdot)$ in (1.6), the objective functional $J(\cdot)$ in (1.5) and the function space $V$ depend on the type of boundary condition in (1.1).

One popular type of boundary condition is the Dirichlet boundary condition, in which $B^{k}=B_{D}^{k}$ are given by the following Dirichlet-type trace operators

B_{D}^{k}(u):=\left.\frac{\partial^{k}u}{\partial\nu^{k}}\right|_{\partial\Omega}\quad(0\leq k\leq m-1), (1.7)

with $\nu$ being the outward unit normal vector of $\partial\Omega$.

For the aforementioned Dirichlet boundary condition, the elliptic boundary value problem (1.1) is equivalent to (1.5) or (1.6) with $V=H^{m}_{0}(\Omega)$ and

a(u,v):=|α|=m(aααu,αv)0,Ω+(a0u,v)u,vV,a(u,v):=\sum_{|\alpha|=m}(a_{\alpha}\partial^{\alpha}u,\partial^{\alpha}v)_{0,\Omega}+(a_{0}u,v)\quad\forall u,v\in V, (1.8)

and

J(v)=12a(v,v)Ωfv𝑑x.J(v)=\frac{1}{2}a(v,v)-\int_{\Omega}fvdx. (1.9)

Other boundary conditions, such as Neumann and mixed boundary conditions, are a bit more complicated to describe in the general case $m\geq 2$ and will be discussed later.
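
As a concrete illustration (added here only for orientation), consider the simplest case $m=1$ with $a_{\alpha}\equiv 1$ and Dirichlet boundary conditions; then (1.1) reads

-\Delta u+a_{0}u=f\ \mbox{ in }\Omega,\qquad u=0\ \mbox{ on }\partial\Omega,

with $V=H^{1}_{0}(\Omega)$, $a(u,v)=(\nabla u,\nabla v)_{0,\Omega}+(a_{0}u,v)$ and $J(v)=\frac{1}{2}a(v,v)-\int_{\Omega}fv\,dx$, while for $m=2$ one obtains a fourth-order problem of plate-bending type whose bilinear form (1.8) involves all second-order derivatives of $u$ and $v$.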

1.2 A brief overview of existing methods

Here we briefly review some classic finite element methods and other relevant methods for the numerical solution of the elliptic boundary value problem (1.1) for all $d,m\geq 1$.

Classic finite element methods use piecewise polynomial functions, based on a given subdivision of the domain (namely, a finite element grid), to discretize the variational problem. We will mainly review three different types of finite element methods: (1) conforming finite element methods; (2) nonconforming and discontinuous Galerkin methods; and (3) virtual element methods.

Conforming finite element method.

Given a finite element grid, this type of method constructs $V_{h}\subset V$ and finds $u_{h}\in V_{h}$ such that

J(uh)=minvhVhJ(vh).J(u_{h})=\min_{v_{h}\in V_{h}}J(v_{h}). (1.10)

It is well known that a piecewise polynomial space $V_{h}\subset H^{m}(\Omega)$ if and only if $V_{h}\subset C^{m-1}(\bar{\Omega})$. For $m=1$, piecewise linear finite element spaces $V_{h}\subset H^{1}(\Omega)$ are easily constructed on simplicial finite element grids in any dimension $d\geq 1$. The construction and analysis of the linear finite element method for $m=1$ and $d=2$ can be traced back to [Feng, 1965]. The situation becomes complicated when $m\geq 2$ and $d\geq 2$.

For example, it was proved that the construction of an $H^{2}$-conforming finite element space requires polynomials of degree at least five in two dimensions [Ženíšek, 1970] and degree nine in three dimensions [Lai and Schumaker, 2007]. We refer to [Argyris et al., 1968] for the classic quintic $H^{2}$ Argyris element in two dimensions and to [Zhang, 2009] for the ninth-degree $H^{2}$ element in three dimensions.

Many other efforts have been made in the literature on constructing $H^{m}$-conforming finite element spaces. [Bramble and Zlámal, 1970] proposed 2D simplicial $H^{m}$-conforming elements ($m\geq 1$) using polynomial spaces of degree $4m-3$, which generalize the $H^{2}$ Argyris element (cf. [Argyris et al., 1968, Ciarlet, 1978]) and the $H^{3}$ Ženíšek element (cf. [Ženíšek, 1970]). Again, the degree of the polynomials used is quite high. For (1.1), an alternative in 2D is to use mixed methods based on Helmholtz decompositions for tensor-valued functions (cf. [Schedensack, 2016]). However, the general construction of $H^{m}$-conforming elements in any dimension is still an open problem.

We note that the construction of a conforming finite element space depends on the structure of the underlying grid. For example, one can construct relatively low-order $H^{2}$ finite elements on grids with special structures. Examples include the (quadratic) Powell-Sabin and (cubic) Clough-Tocher elements in two dimensions [Powell and Sabin, 1977, Clough and Tocher, 1965], and the (quintic) Alfeld splits in three dimensions [Alfeld, 1984], for which full-order accuracy, namely $\mathcal{O}(h)$, $\mathcal{O}(h^{2})$ and $\mathcal{O}(h^{4})$, respectively, can be established. For more recent developments on Alfeld splits we refer to [Fu et al., 2020] and the references cited therein. But these constructions do not apply to general grids. For example, de Boor-DeVore-Höllig [Boor and Devore, 1983, Boor and Höllig, 1983] showed that the $H^{2}$ element consisting of piecewise cubic polynomials on a sequence of uniform grids does not provide full approximation accuracy. This hints that the structure of the underlying grid plays an important role in constructing $H^{m}$-conforming finite elements.

Nonconforming finite element and discontinuous Galerkin methods:

Given a finite element grid $\mathcal{T}_{h}$, in contrast to the conforming method, the nonconforming finite element method does not require that $V_{h}\subset V$; that is, $V_{h}\nsubseteq V$ is allowed. We find $u_{h}\in V_{h}$ such that

Jh(uh)=minvhVhJh(vh)J_{h}(u_{h})=\min_{v_{h}\in V_{h}}J_{h}(v_{h}) (1.11)

with

Jh(vh)=K𝒯hJK(vh)=K𝒯h12K|α|=maα|αvh|2+a0|vh|2dxKfvh𝑑x.\displaystyle J_{h}(v_{h})=\sum_{K\in\mathcal{T}_{h}}J_{K}(v_{h})=\sum_{K\in\mathcal{T}_{h}}\frac{1}{2}\int_{K}\sum_{|\alpha|=m}a_{\alpha}|\partial^{\alpha}v_{h}|^{2}+a_{0}|v_{h}|^{2}dx-\int_{K}fv_{h}dx.

One interesting example of a nonconforming element for (1.1) is the Morley element [Morley, 1967] for $m=d=2$, which uses piecewise quadratic polynomials. For $m\leq d$, Wang and Xu [Wang and Xu, 2013] provided a universal construction and analysis for a family of nonconforming finite elements consisting of piecewise polynomials of minimal order for (1.1) on simplicial grids in $\mathbb{R}^{d}$. The elements in [Wang and Xu, 2013], now known as MWX elements in the literature, give a natural generalization of the classic Morley element to the general case $1\leq m\leq d$. Recently, there have been a number of results extending the MWX elements. [Wu and Xu, 2019] enriched the $\mathcal{P}_{m}$ polynomial space by $\mathcal{P}_{m+1}$ bubble functions to obtain a family of $H^{m}$ nonconforming elements when $m=d+1$. [Hu and Zhang, 2017] used the full $\mathcal{P}_{4}$ polynomial space for the construction of a nonconforming element when $m=3,d=2$, which has three more local degrees of freedom than the element in [Wu and Xu, 2019]. They also used the full $\mathcal{P}_{2m-3}$ polynomial space for nonconforming finite element approximations when $m\geq 4,d=2$.

In addition to the aforementioned conforming and nonconforming finite element methods, discontinuous Galerkin (DG) methods, which make use of piecewise polynomial but globally discontinuous finite element functions, have also been used for solving high order partial differential equations, cf. [Baker, 1977]. The DG method requires many stabilization terms and parameters, and their number naturally grows as the order of the PDE grows. To reduce the amount of stabilization, one approach is to introduce some continuity and smoothness into the discrete space in place of totally discontinuous spaces. Examples of such an approach include the $C^{0}$ interior penalty DG methods for fourth order elliptic problems by Brenner and Sung [Brenner and Sung, 2005] and for sixth order elliptic equations by Gudi and Neilan [Gudi and Neilan, 2011]. More recently, Wu and Xu [Wu and Xu, 2017] provided a family of interior penalty nonconforming finite element methods for (1.1) in $\mathbb{R}^{d}$, for any $m\geq 0$, $d\geq 1$. This family of elements recovers the MWX elements of [Wang and Xu, 2013] when $m\leq d$, in which case no stabilization is required.

Virtual element method

The classic definition of finite element methods [Ciarlet, 1978] based on the finite element triple can be extended in many different ways. One successful extension is the virtual element method (VEM), in which general polygons or polyhedra are used as elements and non-polynomial functions are used as shape functions. For $m=1$, we refer to [Beirão da Veiga et al., 2013] and [Brezzi et al., 2014]. For $m=2$, we refer to [Brezzi and Marini, 2013] on conforming virtual element methods for plate bending problems, and to [Antonietti et al., 2018] on nonconforming virtual element methods for biharmonic problems. For general $m\geq 1$, we refer to [Chen and Huang, 2020] for nonconforming elements that extend the MWX elements of [Wang and Xu, 2013] from simplicial elements to polyhedral elements.

1.3 Objectives

Deep neural networks (DNNs) are a tool developed for machine learning [Goodfellow et al., 2016]. A DNN provides a very special function class that has been used for the numerical solution of partial differential equations, cf. [Lagaris et al., 1998]. By using different activation functions, such as sigmoidal functions, deep neural networks give rise to a very wide range of function classes that can be drastically different from the piecewise polynomial function classes used in classic finite element methods. One advantage of the DNN approach is that it is quite easy to obtain smooth, namely $H^{m}$-conforming for any $m\geq 1$, DNN functions by simply choosing smooth activation functions. These function classes, however, do not usually form a linear vector space, and hence the usual variational principle of classic finite element methods cannot be applied easily; instead, collocation-type methods are often used. Since DNNs are known to suffer much less from the "curse of dimensionality" than traditional function classes (such as polynomials or piecewise polynomials), DNN-based methods are potentially efficient for high dimensional problems and have been studied, for example, in [E et al., 2017] and [Sirignano and Spiliopoulos, 2018].

One main motivation of this paper is to explore DNN-type methods that are most closely related to traditional finite element methods. Namely, we are interested in DNN function classes that consist of piecewise polynomials. By exploring the relationship between DNNs and FEM, we hope, on the one hand, to expand or extend the traditional FEM approach by using new tools from DNNs, and, on the other hand, to gain and develop theoretical insights into and algorithmic tools for DNNs by combining the rich mathematical theories and techniques in FEM.

In an earlier work [He et al., 2020b], we studied the relationship between deep neural networks (DNNs) using ReLU as the activation function and continuous piecewise linear functions. One conclusion that can be drawn from [He et al., 2020b] is that any ReLU-DNN function is an $H^{1}$-conforming linear finite element function, and vice versa. The current work can be considered an extension of [He et al., 2020b] in that it uses ReLU$^{k}$-DNNs for high order partial differential equations. One focus of the current work is to provide error estimates when ReLU$^{k}$-DNNs are applied to solve high order partial differential equations. More specifically, we will study a special class of $H^{m}$-conforming generalized finite element methods (consisting of piecewise polynomials) based on artificial neural networks for (1.1) for any $m\geq 1$ and $d\geq 1$, and then provide a convergence analysis for this method. For this type of method, the underlying finite element grids are not given a priori, and the discrete solution is obtained by solving a non-linear and non-convex optimization problem. In the case that the boundary $\partial\Omega$ of $\Omega\subset\mathbb{R}^{d}$ is curved, it is often an issue how to construct a good finite element grid that accurately approximates $\partial\Omega$. As it turns out, this is not an issue for the finite neuron method, which is probably one of the advantages of the finite neuron method analyzed in this paper.

We note that the numerical method studied in this paper for elliptic boundary value problems is closely related to the classic finite element method; namely, it amounts to piecewise polynomials with respect to an implicitly defined grid. One can also argue that it can be viewed as a mesh-less or even vertex-less method. But in comparison with popular meshless methods, this method does correspond to some underlying grid, although this grid is not given a priori. The underlying grid is determined by the artificial neurons, which, mathematically speaking, refer to the hyperplanes $\omega_{i}\cdot x+b_{i}=0$, together with a given activation function. By combining the names of the finite element method and artificial neural networks, for convenience of exposition, we name the method studied in this paper the finite neuron method.

The rest of the paper is organized as follows. In Section 2, we describe the Monte Carlo sampling technique, the stratified sampling technique and the Barron space. In Section 3, we construct the finite neuron functions and prove their approximation properties. In Section 5, we propose the finite neuron method and provide its convergence analysis. Finally, in Section 6, we give some summary and discussion of the results of this paper.

Following [Xu, 1992], we use the notation "$x\lesssim y$" to denote "$x\leq Cy$" for some constant $C$ independent of crucial parameters such as the mesh size.

2 Preliminaries

In this section, for clarity of exposition, we present some standard material from statistics on Monte Carlo sampling and stratified sampling and their application to the analysis of the asymptotic approximation properties of neural network functions.

2.1 Monte-Carlo and stratified sampling techniques

Let $\lambda\geq 0$ be a probability density function on a domain $G\subset\mathbb{R}^{D}$ ($D\geq 1$), so that

Gλ(θ)𝑑θ=1.\int_{G}\lambda(\theta)d\theta=1. (2.1)

We define the expectation and variance as follows

𝔼g:=Gg(θ)λ(θ)𝑑θ,𝕍g:=𝔼((g𝔼g)2)=𝔼(g2)(𝔼g)2.\mathbb{E}g:=\int_{G}g(\theta)\lambda(\theta)d\theta,\qquad\mathbb{V}g:=\mathbb{E}((g-\mathbb{E}g)^{2})=\mathbb{E}(g^{2})-(\mathbb{E}g)^{2}. (2.2)

We note that

𝕍gmaxθ,θG(g(θ)g(θ))2.\displaystyle\mathbb{V}g\leq\max_{\theta,\theta^{\prime}\in G}(g(\theta)-g(\theta^{\prime}))^{2}.

For any subset $G_{i}\subset G$, let

λ(Gi)=Giλ(θ)𝑑θ,λi(θ)=λ(θ)λ(Gi).\lambda(G_{i})=\int_{G_{i}}\lambda(\theta)d\theta,\qquad\lambda_{i}(\theta)={\lambda(\theta)\over\lambda(G_{i})}.

It holds that

𝔼Gg=i=1Mλ(Gi)𝔼Gig.\mathbb{E}_{G}g=\sum_{i=1}^{M}\lambda(G_{i})\mathbb{E}_{G_{i}}g.

For any function $h(\theta_{1},\cdots,\theta_{N}):G_{1}\times G_{2}\times\cdots\times G_{N}\mapsto\mathbb{R}$, define

𝔼Gig=Gig(θ)λi(θ)𝑑θ\mathbb{E}_{G_{i}}g=\int_{G_{i}}g(\theta)\lambda_{i}(\theta)d\theta

and

𝔼Nh:=G1×G2××GNh(θ1,,θN)λ1(θ1)λ2(θ2)λN(θN)𝑑θ1𝑑θ2𝑑θN.\mathbb{E}_{N}h:=\int_{G_{1}\times G_{2}\times\ldots\times G_{N}}h(\theta_{1},\cdots,\theta_{N})\lambda_{1}(\theta_{1})\lambda_{2}(\theta_{2})\ldots\lambda_{N}(\theta_{N})d\theta_{1}d\theta_{2}\ldots d\theta_{N}. (2.3)

For the Monte Carlo method, let $G_{i}=G$ for all $1\leq i\leq N$; namely,

𝔼Nh:=G×G××Gh(θ1,,θN)λ(θ1)λ(θ2)λ(θN)𝑑θ1𝑑θ2𝑑θN.\mathbb{E}_{N}h:=\int_{G\times G\times\ldots\times G}h(\theta_{1},\cdots,\theta_{N})\lambda(\theta_{1})\lambda(\theta_{2})\ldots\lambda(\theta_{N})d\theta_{1}d\theta_{2}\ldots d\theta_{N}. (2.4)

The following result is standard [Rubinstein and Kroese, 2016], and its proof can be obtained by direct calculation.

Lemma 2.1.

For any $g\in L^{\infty}(G)$, we have

𝔼N(𝔼g1Ni=1Ng(ωi))2={1N𝕍(g)1Nsupω,ωG|g(ω)g(ω)|21N(𝔼(g2)(𝔼(g))2)1N𝔼(g2)1NgL2,\begin{split}\mathbb{E}_{N}\Big{(}\mathbb{E}g-\frac{1}{N}\sum_{i=1}^{N}g(\omega_{i})\Big{)}^{2}&=\left\{\begin{aligned} \frac{1}{N}\mathbb{V}(g)&\leq\frac{1}{N}\sup_{\omega,\omega^{\prime}\in G}|g(\omega)-g(\omega^{\prime})|^{2}\\ \frac{1}{N}\Big{(}\mathbb{E}(g^{2})-\big{(}\mathbb{E}(g)\big{)}^{2}\Big{)}&\leq\frac{1}{N}\mathbb{E}(g^{2})\leq\frac{1}{N}\|g\|^{2}_{L^{\infty}},\end{aligned}\right.\end{split} (2.5)
Proof.

First note that

(𝔼g1Ni=1Ng(ωi))2\displaystyle\left(\mathbb{E}g-\frac{1}{N}\sum_{i=1}^{N}g(\omega_{i})\right)^{2} =1N2(i=1N(𝔼gg(ωi)))2\displaystyle=\frac{1}{N^{2}}\left(\sum_{i=1}^{N}(\mathbb{E}g-g(\omega_{i}))\right)^{2} (2.6)
=1N2i,j=1N(𝔼gg(ωi))(𝔼gg(ωj))\displaystyle=\frac{1}{N^{2}}\sum_{i,j=1}^{N}(\mathbb{E}g-g(\omega_{i}))(\mathbb{E}g-g(\omega_{j}))
=I1N2+I2N2.\displaystyle=\frac{I_{1}}{N^{2}}+\frac{I_{2}}{N^{2}}.

with

I1=i=1N(𝔼gg(ωi))2,I2=ijN((𝔼g)2𝔼(g)(g(ωi)+g(ωj))+g(ωi)g(ωj))).I_{1}=\sum_{i=1}^{N}(\mathbb{E}g-g(\omega_{i}))^{2},\quad I_{2}=\sum_{i\neq j}^{N}\left((\mathbb{E}g)^{2}-\mathbb{E}(g)(g(\omega_{i})+g(\omega_{j}))+g(\omega_{i})g(\omega_{j}))\right). (2.7)

Consider I1I_{1}, for any ii,

𝔼N(𝔼gg(ωi))2=𝔼(𝔼gg)2=𝕍(g).\mathbb{E}_{N}(\mathbb{E}g-g(\omega_{i}))^{2}=\mathbb{E}(\mathbb{E}g-g)^{2}=\mathbb{V}(g).

Thus,

\mathbb{E}_{N}(I_{1})=N\mathbb{V}(g).

For I2I_{2}, note that

𝔼Ng(ωi)=𝔼Ng(ωj)=𝔼(g)\mathbb{E}_{N}g(\omega_{i})=\mathbb{E}_{N}g(\omega_{j})=\mathbb{E}(g)

and, for iji\neq j,

\mathbb{E}_{N}(g(\omega_{i})g(\omega_{j})) =\int_{G\times G\times\ldots\times G}g(\omega_{i})g(\omega_{j})\lambda(\omega_{1})\lambda(\omega_{2})\ldots\lambda(\omega_{N})d\omega_{1}d\omega_{2}\cdots d\omega_{N} (2.8)
=\int_{G\times G}g(\omega_{i})g(\omega_{j})\lambda(\omega_{i})\lambda(\omega_{j})d\omega_{i}d\omega_{j}
=\mathbb{E}(g(\omega_{i}))\,\mathbb{E}(g(\omega_{j}))=[\mathbb{E}(g)]^{2}.

Thus

𝔼N(I2)=𝔼N(ijN((𝔼g)2𝔼(g)(𝔼(g(ωi))+𝔼(g(ωj)))+𝔼(g(ωi)g(ωj))))=0.\mathbb{E}_{N}(I_{2})=\mathbb{E}_{N}\left(\sum_{i\neq j}^{N}((\mathbb{E}g)^{2}-\mathbb{E}(g)(\mathbb{E}(g(\omega_{i}))+\mathbb{E}(g(\omega_{j})))+\mathbb{E}(g(\omega_{i})g(\omega_{j})))\right)=0. (2.9)

Consequently, there exist the following two formulas for 𝔼N(𝔼g1Ni=1Ng(ωi))2\displaystyle\mathbb{E}_{N}\left(\mathbb{E}g-\frac{1}{N}\sum_{i=1}^{N}g(\omega_{i})\right)^{2}:

𝔼N(𝔼g1Ni=1Ng(ωi))2=1N2𝔼N(I1)={1N𝔼((𝔼gg)2)1N(𝔼(g2)(𝔼g)2).\mathbb{E}_{N}\left(\mathbb{E}g-\frac{1}{N}\sum_{i=1}^{N}g(\omega_{i})\right)^{2}=\frac{1}{N^{2}}\mathbb{E}_{N}(I_{1})=\left\{\begin{aligned} \frac{1}{N}\mathbb{E}\big{(}(\mathbb{E}g-g)^{2}\big{)}\\ \frac{1}{N}(\mathbb{E}(g^{2})-(\mathbb{E}g)^{2}).\end{aligned}\right. (2.10)

Based on the first formula above, since

|g(ω)𝔼g|=|G(g(ω)g(ω~))λ(ω~)𝑑ω~|supω,ωG|g(ω)g(ω)|,|g(\omega)-\mathbb{E}g|=|\int_{G}\big{(}g(\omega)-g(\tilde{\omega})\big{)}\lambda(\tilde{\omega})d\tilde{\omega}|\leq\sup_{\omega,\omega^{\prime}\in G}|g(\omega)-g(\omega^{\prime})|,

it holds that

𝔼N(𝔼g1Ni=1Ng(ωi))21Nsupω,ωG|g(ω)g(ω)|2.\mathbb{E}_{N}\left(\mathbb{E}g-{1\over N}\sum_{i=1}^{N}g(\omega_{i})\right)^{2}\leq\frac{1}{N}\sup_{\omega,\omega\in G}|g(\omega)-g(\omega^{\prime})|^{2}. (2.11)

Due to the second formula above,

𝔼N(𝔼g1Ni=1Ng(ωi))21N𝔼(g2)1NgL2\mathbb{E}_{N}\left(\mathbb{E}g-\frac{1}{N}\sum_{i=1}^{N}g(\omega_{i})\right)^{2}\leq\frac{1}{N}\mathbb{E}(g^{2})\leq\frac{1}{N}\|g\|^{2}_{L^{\infty}} (2.12)

which completes the proof. ∎
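
The following short Python script is a minimal numerical sketch of (2.5) (the target $g$, the domain $G=[0,1]$ with the uniform density and all constants are illustrative choices, not taken from the paper): the empirical mean squared error of the Monte Carlo average matches $\mathbb{V}(g)/N$.

import numpy as np

# Minimal sketch of Lemma 2.1: G = [0,1], lambda = 1, g(theta) = sin(2*pi*theta),
# so E g = 0 and V(g) = 1/2; the mean squared error of the Monte Carlo average decays like V(g)/N.
rng = np.random.default_rng(0)
g = lambda theta: np.sin(2 * np.pi * theta)
Eg, Vg = 0.0, 0.5

for N in (10, 100, 1000):
    trials = 5000
    omega = rng.random((trials, N))        # i.i.d. samples omega_i from the uniform density on G
    mc = g(omega).mean(axis=1)             # Monte Carlo averages (1/N) sum_i g(omega_i)
    mse = np.mean((Eg - mc) ** 2)          # empirical estimate of E_N (E g - (1/N) sum_i g(omega_i))^2
    print(N, mse, Vg / N)                  # the last two columns agree up to sampling noise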

Stratified sampling [Bickel and Freedman, 1984] gives a more refined version of the Monte Carlo method.

Lemma 2.2.

For any nonoverlapping decomposition $G=G_{1}\cup G_{2}\cup\cdots\cup G_{M}$ and any positive integer $n$, let $n_{i}=\lceil\lambda(G_{i})n\rceil$ be the smallest integer not less than $\lambda(G_{i})n$ and let $N=\sum_{i=1}^{M}n_{i}$. Let $\theta_{i,j}\in G_{i}$ ($1\leq j\leq n_{i}$) and

g_{N}=\sum_{i=1}^{M}\lambda(G_{i})g_{n_{i}}^{i}\quad\mbox{with}\quad g_{n_{i}}^{i}={1\over n_{i}}\sum_{j=1}^{n_{i}}g(\theta_{i,j}). (2.13)

It holds that

𝔼N(𝔼GggN)2=i=1Mλ2(Gi)ni𝔼Gi(g𝔼Gig)21nmax1iMsupθ,θGi|g(θ)g(θ)|2.\mathbb{E}_{N}(\mathbb{E}_{G}g-g_{N})^{2}=\sum_{i=1}^{M}{\lambda^{2}(G_{i})\over n_{i}}\mathbb{E}_{G_{i}}\big{(}g-\mathbb{E}_{G_{i}}g\big{)}^{2}\leq{1\over n}\max_{1\leq i\leq M}\sup_{\theta,\theta^{\prime}\in G_{i}}\big{|}g(\theta)-g(\theta^{\prime})\big{|}^{2}. (2.14)
Proof.

It follows from definition that

g(x,θ)=i=1Mλ(Gi)g(x,θ),𝔼Gg=i=1Mλ(Gi)Gigλi(θ)𝑑θ=i=1Mλ(Gi)𝔼Gig.g(x,\theta)=\sum_{i=1}^{M}\lambda(G_{i})g(x,\theta),\qquad\mathbb{E}_{G}g=\sum_{i=1}^{M}\lambda(G_{i})\int_{G_{i}}g\lambda_{i}(\theta)d\theta=\sum_{i=1}^{M}\lambda(G_{i})\mathbb{E}_{G_{i}}g. (2.15)

Thus, the difference g𝔼Ggg-\mathbb{E}_{G}g is a linear combination of g𝔼Gigg-\mathbb{E}_{G_{i}}g on each GiG_{i} as follows

g𝔼Gg=i=1Mλ(Gi)(g𝔼Gig).g-\mathbb{E}_{G}g=\sum_{i=1}^{M}\lambda(G_{i})\big{(}g-\mathbb{E}_{G_{i}}g\big{)}. (2.16)

It follows from

𝔼Gggn=i=1Mλ(Gi)(𝔼Giggnii)\mathbb{E}_{G}g-g_{n}=\sum_{i=1}^{M}\lambda(G_{i})(\mathbb{E}_{G_{i}}g-g^{i}_{n_{i}}) (2.17)

and (2.3) that

𝔼n(𝔼Gggn)2=i,j=1Mλ(Gi)λ(Gj)𝔼n((𝔼Giggnii)(𝔼Gjggnjj))=i,j=1Mλ(Gi)λ(Gj)Iij\mathbb{E}_{n}(\mathbb{E}_{G}g-g_{n})^{2}=\sum_{i,j=1}^{M}\lambda(G_{i})\lambda(G_{j})\mathbb{E}_{n}\big{(}(\mathbb{E}_{G_{i}}g-g^{i}_{n_{i}})(\mathbb{E}_{G_{j}}g-g^{j}_{n_{j}})\big{)}=\sum_{i,j=1}^{M}\lambda(G_{i})\lambda(G_{j})I_{ij} (2.18)

with

Iij=𝔼n((𝔼Giggnii)(𝔼Gjggnjj)).I_{ij}=\mathbb{E}_{n}\big{(}(\mathbb{E}_{G_{i}}g-g^{i}_{n_{i}})(\mathbb{E}_{G_{j}}g-g^{j}_{n_{j}})\big{)}.

By Lemma 2.1,

Iij=𝔼n((𝔼Giggnii)2)δij=1ni𝔼((𝔼Gigg)2)δij.I_{ij}=\mathbb{E}_{n}\big{(}(\mathbb{E}_{G_{i}}g-g^{i}_{n_{i}})^{2}\big{)}\delta_{ij}={1\over n_{i}}\mathbb{E}\big{(}(\mathbb{E}_{G_{i}}g-g)^{2}\big{)}\delta_{ij}. (2.19)

Thus,

𝔼n(𝔼Gggn)2=i=1Mλ2(Gi)ni𝔼Gi(g𝔼Gig)2,\mathbb{E}_{n}(\mathbb{E}_{G}g-g_{n})^{2}=\sum_{i=1}^{M}{\lambda^{2}(G_{i})\over n_{i}}\mathbb{E}_{G_{i}}\big{(}g-\mathbb{E}_{G_{i}}g\big{)}^{2}, (2.20)

which completes the proof. ∎

Lemma 2.1 and Lemma 2.2 are two simple identities with subsequent inequalities that can be verified by direct calculation. In fact, Lemma 2.1 is a special case of Lemma 2.2 with $M=1$. Lemmas 2.1 and 2.2 form the basis of Monte Carlo sampling and stratified sampling in statistics. In the presentation of this paper, we choose not to use any concepts related to random sampling.
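
A minimal numerical sketch comparing the two lemmas (again with the illustrative choices $G=[0,1]$, $\lambda\equiv 1$, $g(\theta)=\sin(2\pi\theta)$ and a uniform partition into $M$ cells): the stratified estimator has a much smaller mean squared error because only the within-cell oscillation of $g$ enters the bound (2.14).

import numpy as np

# Plain Monte Carlo (Lemma 2.1) versus stratified sampling (Lemma 2.2) for g(theta) = sin(2*pi*theta).
rng = np.random.default_rng(1)
g = lambda t: np.sin(2 * np.pi * t)

def plain_mc(N):
    return g(rng.random(N)).mean()

def stratified(n, M):
    ni = int(np.ceil(n / M))                               # n_i = ceil(lambda(G_i) * n), lambda(G_i) = 1/M
    edges = np.arange(M) / M
    samples = edges[:, None] + rng.random((M, ni)) / M     # theta_{i,j} in G_i = [i/M, (i+1)/M)
    return (g(samples).mean(axis=1) / M).sum()             # g_N = sum_i lambda(G_i) * g^i_{n_i}

trials, n, M = 5000, 100, 50
err_mc = np.mean([(0.0 - plain_mc(n)) ** 2 for _ in range(trials)])
err_st = np.mean([(0.0 - stratified(n, M)) ** 2 for _ in range(trials)])
print(err_mc, err_st)      # the stratified mean squared error is markedly smaller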

Given another domain $\Omega\subset\mathbb{R}^{d}$, we consider the case that $g(x,\theta)$ is a function of both $x\in\Omega$ and $\theta\in G$. Given any function $\rho\in L^{1}(G)$, we consider

u(x)=Gg(x,θ)ρ(θ)𝑑θu(x)=\int_{G}g(x,\theta)\rho(\theta)d\theta (2.21)

with $\|\rho\|_{L^{1}(G)}<\infty$. Let $\lambda(\theta)={\rho(\theta)\over\|\rho\|_{L^{1}(G)}}$. Thus,

u(x)=ρL1(G)Gg(x,θ)λ(θ)𝑑θu(x)=\|\rho\|_{L^{1}(G)}\int_{G}g(x,\theta)\lambda(\theta)d\theta (2.22)

with $\|\lambda\|_{L^{1}(G)}=1$.

We can apply the above two lemmas to the given function $u(x)$.

Lemma 2.3.

[Monte Carlo Sampling] Consider $u(x)$ in (2.21) with $0\leq\rho(\theta)\in L^{1}(G)$. For any $N\geq 1$, there exist $\theta_{i}^{*}\in G$ such that

\|u-u_{N}\|_{L^{2}(\Omega)}^{2}\leq\frac{\|\rho\|_{L^{1}(G)}}{N}\int_{G}\|g(\cdot,\theta)\|_{L^{2}(\Omega)}^{2}\rho(\theta)d\theta=\frac{\|\rho\|^{2}_{L^{1}(G)}}{N}\mathbb{E}(\|g(\cdot,\theta)\|_{L^{2}(\Omega)}^{2}),

where $\|g(\cdot,\theta)\|_{L^{2}(\Omega)}^{2}=\int_{\Omega}[g(x,\theta)]^{2}d\mu(x)$ and

u_{N}(x)=\frac{\|\rho\|_{L^{1}(G)}}{N}\sum_{i=1}^{N}g(x,\theta_{i}^{*}). (2.23)

Similarly, if $g(\cdot,\theta)\in H^{m}(\Omega)$, then for any $N\geq 1$ there exist $\theta_{i}^{*}\in G$, with $u_{N}$ given by (2.23), such that

\|u-u_{N}\|_{H^{m}(\Omega)}^{2}\leq\frac{\|\rho\|_{L^{1}(G)}}{N}\int_{G}\|g(\cdot,\theta)\|_{H^{m}(\Omega)}^{2}\rho(\theta)d\theta=\frac{\|\rho\|^{2}_{L^{1}(G)}}{N}\mathbb{E}(\|g(\cdot,\theta)\|_{H^{m}(\Omega)}^{2}). (2.24)
Proof.

Note that

u(x)=ρL1(G)𝔼(g).u(x)=\|\rho\|_{L^{1}(G)}\mathbb{E}(g). (2.25)

By Lemma 2.1,

𝔼n((𝔼(g(x,))1Ni=1Ng(x,θi)))2)1N𝔼(g2).\mathbb{E}_{n}\left(\bigg{(}\mathbb{E}(g(x,\cdot))-\frac{1}{N}\sum_{i=1}^{N}g(x,\theta_{i}))\bigg{)}^{2}\right)\leq{1\over N}\mathbb{E}(g^{2}).

Integrating with respect to $x$ on both sides, we get

𝔼n(h(θ1,θ2,,θN))1N𝔼(Ωg2𝑑μ(x)),\mathbb{E}_{n}\left(h(\theta_{1},\theta_{2},\cdots,\theta_{N})\right)\leq{1\over N}\mathbb{E}\Big{(}\int_{\Omega}g^{2}d\mu(x)\Big{)},

where

h(θ1,θ2,,θN)=Ω(𝔼(g(x,))1Ni=1Ng(x,θi)))2dμ(x).h(\theta_{1},\theta_{2},\cdots,\theta_{N})=\int_{\Omega}\bigg{(}\mathbb{E}(g(x,\cdot))-\frac{1}{N}\sum_{i=1}^{N}g(x,\theta_{i}))\bigg{)}^{2}d\mu(x).

Since $\mathbb{E}_{N}(1)=1$ and $\mathbb{E}_{N}(h)\leq{1\over N}\mathbb{E}\big(\int_{\Omega}g^{2}d\mu(x)\big)$, there exist $\theta_{i}^{*}\in G$ such that

h(θ1,θ2,,θN)1NΩ𝔼(g2)𝑑μ(x).h(\theta_{1}^{*},\theta_{2}^{*},\cdots,\theta_{N}^{*})\leq{1\over N}\int_{\Omega}\mathbb{E}(g^{2})d\mu(x).

This implies that

\mathbb{E}_{N}\|u-u_{N}\|_{L^{2}(\Omega)}^{2}\leq\frac{\|\rho\|^{2}_{L^{1}(G)}}{N}\int_{G}\|g(\cdot,\theta)\|_{L^{2}(\Omega)}^{2}\lambda(\theta)d\theta=\frac{\|\rho\|_{L^{1}(G)}}{N}\int_{G}\|g(\cdot,\theta)\|_{L^{2}(\Omega)}^{2}\rho(\theta)d\theta.

The proof of (2.24) is similar to the above $L^{2}$-error analysis, which completes the proof. ∎

Lemma 2.4.

[Stratified Sampling] For $u(x)$ in (2.21) with positive $\rho(\theta)\in L^{1}(G)$, given any positive integers $n$ and $M\leq n$ and any nonoverlapping decomposition $G=G_{1}\cup G_{2}\cup\cdots\cup G_{M}$, there exist $\{\theta_{i}^{\ast}\}_{i=1}^{N}$ with $n\leq N\leq 2n$ such that

uuNL2(Ω)N1/2ρL1(G)max1jMsupθj,θjGjg(x,θj)g(x,θj)L2(Ω)\|u-u_{N}\|_{L^{2}(\Omega)}\leq N^{-1/2}\|\rho\|_{L^{1}(G)}\max_{1\leq j\leq M}\sup_{\theta_{j},\theta_{j}^{\prime}\in G_{j}}\|g(x,\theta_{j})-g(x,\theta_{j}^{\prime})\|_{L^{2}(\Omega)} (2.26)

where

uN(x)=2ρL1(G)Ni=1Nβig(x,θi)u_{N}(x)={2\|\rho\|_{L^{1}(G)}\over N}\sum_{i=1}^{N}\beta_{i}g(x,\theta_{i}^{\ast})

and βi[0,1]\beta_{i}\in[0,1].

Proof.

Let $n_{j}=\lceil\lambda(G_{j})n\rceil$ and $\theta_{i,j}\in G_{j}$ ($1\leq i\leq n_{j}$). Define $N=\sum_{j=1}^{M}n_{j}$ and

u_{N}(x)=\|\rho\|_{L^{1}(G)}\sum_{j=1}^{M}\lambda(G_{j})g_{n_{j}}^{j}\quad\mbox{with}\quad g_{n_{j}}^{j}={1\over n_{j}}\sum_{i=1}^{n_{j}}g(x,\theta_{i,j}).

Since $\displaystyle u(x)=\|\rho\|_{L^{1}(G)}\sum_{j=1}^{M}\lambda(G_{j})\mathbb{E}_{G_{j}}g$, by Lemma 2.2,

𝔼nuuNL2(Ω)2=ρL1(G)2j=1Mλ2(Gj)nj𝔼Gj𝔼GjggL2(Ω)2ρL1(G)2j=1Mλ2(Gj)njsupθj,θjGjg(x,θj)g(x,θj)L2(Ω)2.\begin{split}\mathbb{E}_{n}\|u-u_{N}\|_{L^{2}(\Omega)}^{2}=&\|\rho\|_{L^{1}(G)}^{2}\sum_{j=1}^{M}{\lambda^{2}(G_{j})\over n_{j}}\mathbb{E}_{G_{j}}\|\mathbb{E}_{G_{j}}g-g\|^{2}_{L^{2}(\Omega)}\\ \leq&\|\rho\|_{L^{1}(G)}^{2}\sum_{j=1}^{M}{\lambda^{2}(G_{j})\over n_{j}}\sup_{\theta_{j},\theta_{j}^{\prime}\in G_{j}}\|g(x,\theta_{j})-g(x,\theta_{j}^{\prime})\|^{2}_{L^{2}(\Omega)}.\end{split} (2.27)

Since ${\lambda(G_{j})\over n_{j}}\leq{1\over n}$ and $\sum_{j=1}^{M}\lambda(G_{j})=1$,

𝔼nuuNL2(Ω)2n1ρL1(G)2max1jMsupθj,θjGjg(x,θj)g(x,θj)L2(Ω)2.\mathbb{E}_{n}\|u-u_{N}\|_{L^{2}(\Omega)}^{2}\leq n^{-1}\|\rho\|_{L^{1}(G)}^{2}\max_{1\leq j\leq M}\sup_{\theta_{j},\theta_{j}^{\prime}\in G_{j}}\|g(x,\theta_{j})-g(x,\theta_{j}^{\prime})\|^{2}_{L^{2}(\Omega)}. (2.28)

Hence there exist $\{\theta_{i,j}^{\ast}\}$ with $\theta_{i,j}^{\ast}\in G_{j}$ such that

uuNL2(Ω)2n1ρL1(G)2max1jMsupθj,θjGjg(x,θj)g(x,θj)L2(Ω)2.\|u-u_{N}\|_{L^{2}(\Omega)}^{2}\leq n^{-1}\|\rho\|_{L^{1}(G)}^{2}\max_{1\leq j\leq M}\sup_{\theta_{j},\theta_{j}^{\prime}\in G_{j}}\|g(x,\theta_{j})-g(x,\theta_{j}^{\prime})\|^{2}_{L^{2}(\Omega)}. (2.29)

Noting that $n\leq N\leq n+M\leq 2n$, we can write

uN(x)=2ρL1(G)Nj=1MNλ(Gj)2nji=1njg(x,θi,j)=2ρL1(G)Nj=1Mβi,ji=1njg(x,θi,j)u_{N}(x)={2\|\rho\|_{L^{1}(G)}\over N}\sum_{j=1}^{M}\frac{N\lambda(G_{j})}{2n_{j}}\sum_{i=1}^{n_{j}}g(x,\theta_{i,j}^{\ast})={2\|\rho\|_{L^{1}(G)}\over N}\sum_{j=1}^{M}\beta_{i,j}\sum_{i=1}^{n_{j}}g(x,\theta_{i,j}^{\ast})

with

βi,j=Nλ(Gj)2nj2λ(Gj)n2λ(Gj)n1,\beta_{i,j}=\frac{N\lambda(G_{j})}{2n_{j}}\leq\frac{2\lambda(G_{j})n}{2\lambda(G_{j})n}\leq 1, (2.30)

which completes the proof. ∎

2.2 Barron spectral space

Let us use a simple example to motivate the Barron space. Consider the Fourier transform of a real function $u:\mathbb{R}^{d}\mapsto\mathbb{R}$:

u^(ω)=(2π)d/2deiωxu(x)𝑑x.\hat{u}(\omega)=(2\pi)^{-d/2}\int_{\mathbb{R}^{d}}e^{-i\omega\cdot x}u(x)dx. (2.31)

This gives the following integral representation of $u$ in terms of the cosine function:

u(x)=Redeiωxu^(ω)𝑑ω=dcos(ωx+b(ω))|u^(ω)|𝑑ω,u(x)=Re\int_{\mathbb{R}^{d}}e^{i\omega\cdot x}\hat{u}(\omega)d\omega=\int_{\mathbb{R}^{d}}\cos(\omega\cdot x+b(\omega))|\hat{u}(\omega)|d\omega, (2.32)

where $\hat{u}(\omega)=e^{ib(\omega)}|\hat{u}(\omega)|$. Let

g(x,ω)=cos(ωx+b(ω)) and ρ(ω)=|u^(ω)|.g(x,\omega)=\cos(\omega\cdot x+b(\omega))\quad\mbox{ and }\quad\rho(\omega)=|\hat{u}(\omega)|. (2.33)

Thus,

u(x)=dg(x,ω)ρ(ω)𝑑ω,u(x)=\int_{\mathbb{R}^{d}}g(x,\omega)\rho(\omega)d\omega, (2.34)

If

d|u^(ω)|𝑑ω<,\int_{\mathbb{R}^{d}}|\hat{u}(\omega)|d\omega<\infty,

then $\|\rho\|_{L^{1}}<\infty$. By applying Lemma 2.3, there exist $\omega_{i}\in\mathbb{R}^{d}$ such that

uuN0,ΩN1/2u^L1(d).\|u-u_{N}\|_{0,\Omega}\leq N^{-1/2}\|\hat{u}\|_{L^{1}(\mathbb{R}^{d})}. (2.35)

where

uN(x)=u^L1(d)Ni=1Ncos(ωix+b(ωi))u_{N}(x)={\|\hat{u}\|_{L^{1}(\mathbb{R}^{d})}\over N}\sum_{i=1}^{N}\cos(\omega_{i}\cdot x+b(\omega_{i})) (2.36)

More generally, we consider the approximation property in the $H^{m}$-norm. By (2.32), writing $\cos^{(j)}$ for the $j$-th derivative of the cosine function,

\partial^{\alpha}u(x)=\int_{\mathbb{R}^{d}}\cos^{(|\alpha|)}(\omega\cdot x+b(\omega))\,\omega^{\alpha}|\hat{u}(\omega)|d\omega,\quad\forall\ |\alpha|\leq m. (2.37)

For any positive integer $m$, let

gm(x,ω)=cos(ωx+b(ω))(1+ω)mandρm(ω)=(1+ω)m|u^(ω)|,g_{m}(x,\omega)={\cos(\omega\cdot x+b(\omega))\over(1+\|\omega\|)^{m}}\quad\mbox{and}\quad\rho_{m}(\omega)=(1+\|\omega\|)^{m}|\hat{u}(\omega)|, (2.38)

where

ρmL1(d)=d(1+ω)m|u^(ω)|𝑑ω<.\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}=\int_{\mathbb{R}^{d}}(1+\|\omega\|)^{m}|\hat{u}(\omega)|d\omega<\infty.

Then $\displaystyle u(x)=\int_{\mathbb{R}^{d}}g_{m}(x,\omega)\rho_{m}(\omega)d\omega=\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}\,\mathbb{E}\,g_{m}(x,\omega)$. Define

uN(x)=ρmL1(d)Ni=1Ngm(x,ωi)=ρmL1(d)Ni=1Ncos(ωix+b(ωi))(1+ωi)m.u_{N}(x)={\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}\over N}\sum_{i=1}^{N}g_{m}(x,\omega_{i})={\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}\over N}\sum_{i=1}^{N}{\cos(\omega_{i}\cdot x+b(\omega_{i}))\over(1+\|\omega_{i}\|)^{m}}. (2.39)

It holds that

α(u(x)uN(x))=ρmL1(d)Ni=1N𝔼α(gm(x,ω)gm(x,ωi)).\partial^{\alpha}(u(x)-u_{N}(x))={\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}\over N}\sum_{i=1}^{N}\mathbb{E}\partial^{\alpha}(g_{m}(x,\omega)-g_{m}(x,\omega_{i})).

By Lemma 2.1,

\mathbb{E}_{N}\sum_{|\alpha|\leq m}\|\partial^{\alpha}(u(x)-u_{N}(x))\|_{0,\Omega}^{2} \leq\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}^{2}\sum_{|\alpha|\leq m}\int_{\Omega}\mathbb{E}_{N}\Big(\mathbb{E}\,\partial^{\alpha}g_{m}(x,\omega)-\frac{1}{N}\sum_{i=1}^{N}\partial^{\alpha}g_{m}(x,\omega_{i})\Big)^{2}dx (2.40)
\leq{\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}^{2}\over N}\sum_{|\alpha|\leq m}\int_{\Omega}\mathbb{E}\left(\partial^{\alpha}g_{m}(x,\omega)\right)^{2}dx. (2.41)

Note that the definitions of $g_{m}$ and $\rho_{m}$ in (2.38) guarantee that

|αgm(x,ω)|1.|\partial^{\alpha}g_{m}(x,\omega)|\leq 1.

Thus,

𝔼N|α|mα(u(x)uN(x))0,Ω2ρmL1(d)2N.\mathbb{E}_{N}\sum_{|\alpha|\leq m}\|\partial^{\alpha}(u(x)-u_{N}(x))\|_{0,\Omega}^{2}\lesssim{\|\rho_{m}\|_{L^{1}(\mathbb{R}^{d})}^{2}\over N}.

This implies that there exist $\omega_{i}\in\mathbb{R}^{d}$ such that

uuNHm(Ω)N1/2d(1+ω)m|u^(ω)|𝑑ω.\|u-u_{N}\|_{H^{m}(\Omega)}\lesssim N^{-1/2}\int_{\mathbb{R}^{d}}(1+\|\omega\|)^{m}|\hat{u}(\omega)|d\omega. (2.42)

Given $v\in L^{2}(\Omega)$, consider all possible extensions $v_{E}:\mathbb{R}^{d}\mapsto\mathbb{R}$ with $v_{E}|_{\Omega}=v$ and define the Barron spectral norm, for any $s\geq 1$, by

vBs(Ω)=infvE|Ω=vd(1+ω)s|v^E(ω)|𝑑ω\|v\|_{B^{s}(\Omega)}=\inf_{v_{E}|_{\Omega}=v}\int_{\mathbb{R}^{d}}(1+\|\omega\|)^{s}|\hat{v}_{E}(\omega)|d\omega (2.43)

and Barron spectral space

Bs(Ω)={vL2(Ω):vBs(Ω)<}.B^{s}(\Omega)=\{v\in L^{2}(\Omega):\|v\|_{B^{s}(\Omega)}<\infty\}. (2.44)

In summary, we have

uuNHm(Ω)N1/2uBm(Ω).\|u-u_{N}\|_{H^{m}(\Omega)}\lesssim N^{-1/2}\|u\|_{B^{m}(\Omega)}. (2.45)

The estimate (2.35), first obtained in [Jones, 1992] using a slightly different technique, appears to be the first asymptotic error estimate for artificial neural networks. [Barron, 1993] extended Jones's estimate (2.35) to sigmoidal activation functions in place of the cosine.

The above short discussion reflects the core idea in the analysis of the approximation properties of artificial neural networks: represent $u$ as an expectation with respect to some probability distribution, as in (2.34); a simple application of Monte Carlo sampling then leads to error estimates like (2.42) for the special neural network function (2.36) using $\sigma=\cos$ as the activation function. For a more general activation function $\sigma$, we just need to derive a corresponding representation like (2.34) with $g$ expressed in terms of $\sigma$. Quantitative estimates on the order of approximation are obtained for sigmoidal activation functions in [Barron, 1993] and for periodic activation functions in [Mhaskar and Micchelli, 1995, Mhaskar and Micchelli, 1994]. Error estimates in Sobolev norms for general activation functions can be found in [Hornik et al., 1994]. A review of a variety of known results, especially for networks with one hidden layer, can be found in [Pinkus, 1999]. More recently, these results have been improved by a factor of $N^{1/d}$ in [Klusowski and Barron, 2016] using the idea of stratified sampling, based in part on the techniques in [Makovoz, 1996]. [Siegel and Xu, 2020] provides an analysis for general activation functions under very weak assumptions, which applies to essentially all activation functions used in practice. In [E et al., 2019a, E et al., 2019b, E and Wojtowytsch, 2020], a more refined definition of the Barron norm is introduced to give sharper approximation error bounds for neural networks.
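
As a minimal illustration of this idea (the target function and sampling below are convenient choices, not taken from the cited references): the one-dimensional Gaussian $u(x)=e^{-x^{2}/2}$ satisfies $u(x)=\mathbb{E}_{\omega\sim N(0,1)}\cos(\omega x)$, since the standard normal density is proportional to $|\hat{u}|$; averaging $N$ sampled cosine neurons therefore realizes the construction (2.36) up to normalization and exhibits the $N^{-1/2}$ decay of (2.35).

import numpy as np

# Minimal sketch of the Monte Carlo argument behind (2.34)-(2.36) for u(x) = exp(-x^2/2) on [-1,1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 1001)
u = np.exp(-x ** 2 / 2)

def l2_error(N, trials=20):
    """Average L^2(-1,1) error of u_N(x) = (1/N) sum_i cos(omega_i x), omega_i ~ N(0,1)."""
    dx = x[1] - x[0]
    errs = []
    for _ in range(trials):
        omega = rng.standard_normal(N)                    # frequencies drawn from the density |u_hat|
        uN = np.cos(np.outer(x, omega)).mean(axis=1)      # Monte Carlo average of cosine neurons
        errs.append(np.sqrt(np.sum((u - uN) ** 2) * dx))
    return sum(errs) / len(errs)

for N in (10, 100, 1000):
    print(N, l2_error(N))      # errors decay roughly like N^{-1/2}, consistent with (2.35)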

The following lemma shows a relationship between the Sobolev norms and the Barron spectral norm.

Lemma 2.5.

Let $m\geq 0$ be an integer, $\Omega\subset\mathbb{R}^{d}$ a bounded domain and $\epsilon>0$. Then, for any Schwartz function $v$, we have

vHm(Ω)vBm(Ω)vHm+d2+ϵ(Ω).\|v\|_{H^{m}(\Omega)}\lesssim\|v\|_{{B}^{m}(\Omega)}\lesssim\|v\|_{H^{m+{d\over 2}+\epsilon}(\Omega)}. (2.46)
Proof.

The first inequality in (2.46) and its proof can be found in [Siegel and Xu, 2020]. A version of the second inequality in (2.46) and its proof can be found in [Barron, 1993]. Below is a proof: by the definition and the Cauchy-Schwarz inequality,

\|v\|_{B^{m}(\Omega)}^{2}= \inf_{v_{E}|_{\Omega}=v}\left(\int_{\mathbb{R}^{d}}(1+\|\omega\|)^{m}|\hat{v}_{E}(\omega)|d\omega\right)^{2} (2.47)
\leq \int_{\mathbb{R}^{d}}(1+\|\omega\|)^{-d-2\epsilon}d\omega\,\inf_{v_{E}|_{\Omega}=v}\int_{\mathbb{R}^{d}}(1+\|\omega\|)^{d+2m+2\epsilon}|\hat{v}_{E}(\omega)|^{2}d\omega (2.48)
\lesssim \inf_{v_{E}|_{\Omega}=v}\int_{\mathbb{R}^{d}}(1+\|\omega\|)^{d+2m+2\epsilon}|\hat{v}_{E}(\omega)|^{2}d\omega\lesssim\|v\|_{H^{m+{d\over 2}+\epsilon}(\Omega)}^{2}. (2.49)

3 Finite neuron functions and approximation properties

As mentioned before, for $m=1$ the finite element space for (5.1) can be taken to be piecewise linear functions in any dimension $d\geq 1$. As shown in [He et al., 2020b], linear finite element functions can be represented by deep neural networks with ReLU as the activation function. Here

ReLU(x)=x+=max(0,x).\mbox{ReLU}(x)=x_{+}=\max(0,x). (3.1)

In this paper, we consider powers of ReLU as activation functions:

ReLUk(x)=[x+]k=[max(0,x)]k.\mbox{ReLU}^{k}(x)=[x_{+}]^{k}=[\max(0,x)]^{k}. (3.2)

We will use the short-hand notation $x_{+}^{k}=[x_{+}]^{k}$ in the rest of the paper.

We consider the following neural network function class with one hidden layer:

VNk={i=1Nai(wix+bi)+k,ai,bi1,wi1×d}.V_{N}^{k}=\left\{\sum_{i=1}^{N}a_{i}(w_{i}x+b_{i})_{+}^{k},a_{i},b_{i}\in\mathbb{R}^{1},w_{i}\in\mathbb{R}^{1\times d}\right\}. (3.3)

We note that $V_{N}^{k}$ is not a linear vector space. The definition of neural network function classes such as (3.3) can be traced back to [McCulloch and Pitts, 1943], and their early mathematical analysis can be found in [Hornik et al., 1989, Cybenko, 1989, Funahashi, 1989].

The functions in $V^{k}_{N}$ defined in (3.3) will be called finite neuron functions in this paper.
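
A minimal sketch (illustration only) of evaluating a member of the class $V_{N}^{k}$ in (3.3), i.e. a one-hidden-layer network with ReLU$^{k}$ activation; the parameter values $a_{i},w_{i},b_{i}$ below are arbitrary placeholders, not values produced by the method of this paper.

import numpy as np

def finite_neuron_function(x, a, W, b, k):
    """Evaluate sum_i a_i * (w_i . x + b_i)_+^k at the rows of x (shape: n_points x d)."""
    z = x @ W.T + b                      # pre-activations w_i . x + b_i, shape (n_points, N)
    return np.maximum(z, 0.0) ** k @ a   # ReLU^k, then linear combination with weights a_i

d, N, k = 2, 8, 3                        # dimension, number of neurons, ReLU power
rng = np.random.default_rng(1)
a, W, b = rng.standard_normal(N), rng.standard_normal((N, d)), rng.standard_normal(N)
x = rng.random((5, d))                   # a few sample points in [0,1]^2
print(finite_neuron_function(x, a, W, b, k))   # a piecewise polynomial of degree k, in C^{k-1}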

Lemma 3.1.

For any $k\geq 1$, $V_{N}^{k}$ consists of functions that are piecewise polynomials of degree $k$ with respect to a grid whose element interfaces are contained in the hyperplanes

w_{i}x+b_{i}=0,\quad 1\leq i\leq N;

see Fig. 3.1-4.1. Furthermore, if $k\geq m$,

V_{N}^{k}(\Omega)\subset H^{k}(\Omega)\subset H^{m}(\Omega).
Figure 3.1: Hyperplanes with $\ell=1$

The main goal of this section is to prove that an error estimate of the following type holds for some $\delta\geq 0$:

infvNVNkuvNHm(Ω)N12δ.\inf_{v_{N}\in V_{N}^{k}}\|u-v_{N}\|_{H^{m}(\Omega)}\lesssim N^{-{1\over 2}-\delta}. (3.4)

We will use two different approaches to establish (3.4). The first approach, presented in §3.1, mainly follows [Hornik et al., 1994] and [Siegel and Xu, 2020], which give error estimates for a general class of activation functions. The second approach, presented in §3.2, follows [Klusowski and Barron, 2016], which gives error estimates specifically for the ReLU activation function.

We assume that $\Omega\subset\mathbb{R}^{d}$ is a given bounded domain. Thus,

T=maxxΩ¯x<.T=\max_{x\in\bar{\Omega}}\|x\|<\infty. (3.5)

3.1 B-spline as activation functions

The activation function ${\rm[ReLU]}^{k}$ in (3.2) is related to cardinal B-splines. The cardinal B-spline of degree $k\geq 0$, denoted by $b^{k}$, is defined by the convolution

bk(x)=(bk1b0)(x)=bk1(xt)b0(t)𝑑t,b^{k}(x)=(b^{k-1}*b^{0})(x)=\int_{\mathbb{R}}b^{k-1}(x-t)b^{0}(t)dt, (3.6)

where

b0(x)={1x[0,1),0otherwise.b^{0}(x)=\left\{\begin{array}[]{lr}1&x\in[0,1),\\ 0&\hbox{otherwise}.\end{array}\right. (3.7)
Figure 3.2: Plots of some B-spline basis functions

More explicitly (see [de Boor, 1971]), for any $x\in[0,k+1]$ and $k\geq 1$ we have

bk(x)=xkbk1(x)+k+1xkbk1(x1),b^{k}(x)=\frac{x}{k}b^{k-1}(x)+\frac{k+1-x}{k}b^{k-1}(x-1), (3.8)

or

bk(x)=(k+1)i=0k+1wi(ix)+k and wi=j=0,jik+11ij.b^{k}(x)=(k+1)\sum_{i=0}^{k+1}w_{i}(i-x)_{+}^{k}\hbox{~{}and~{}}w_{i}={\displaystyle\prod_{j=0,j\neq i}^{k+1}}\frac{1}{i-j}. (3.9)

We note that all $b^{k}$ are locally supported; see Fig. 3.2 for their plots.
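
A minimal sketch (illustration only) of the explicit ReLU$^{k}$ representation (3.9) of the cardinal B-spline $b^{k}$, checked numerically against the recursion (3.8) with $b^{0}$ the indicator of $[0,1)$.

import numpy as np

def bspline_relu(x, k):
    """b^k(x) via (3.9): (k+1) * sum_i w_i * (i - x)_+^k."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for i in range(k + 2):
        w_i = np.prod([1.0 / (i - j) for j in range(k + 2) if j != i])
        total += w_i * np.maximum(i - x, 0.0) ** k
    return (k + 1) * total

def bspline_recursive(x, k):
    """b^k(x) via the recursion (3.8), starting from the indicator function (3.7)."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return ((x >= 0) & (x < 1)).astype(float)
    return x / k * bspline_recursive(x, k - 1) + (k + 1 - x) / k * bspline_recursive(x - 1, k - 1)

x = np.linspace(0.0, 4.0, 9)
print(np.max(np.abs(bspline_relu(x, 3) - bspline_recursive(x, 3))))   # ~ 0 up to rounding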

For a uniform grid with mesh size $h=\frac{1}{n+1}$, we define

bj,hk(x)=bk(xhj).b^{k}_{j,h}(x)=b^{k}(\frac{x}{h}-j). (3.10)

Then the space of cardinal B-spline series of degree $k$ on the uniform grid is

Snk={v(x)=j=kncjbj,hk(x)}.S_{n}^{k}=\Big{\{}v(x)=\sum_{j=-k}^{n}c_{j}b^{k}_{j,h}(x)\Big{\}}. (3.11)
Lemma 3.2.

For $V_{N}^{k}$ and $S_{n}^{k}$ defined by (3.3) and (3.11), we have

SnkVn+k+1k.S_{n}^{k}\subset V_{n+k+1}^{k}. (3.12)

As a result, for any bounded domain $\Omega\subset\mathbb{R}^{1}$, we have

infvVNkuvm,ΩinfvSNk1kuvm,ΩNm(k+1)uk+1,Ω\inf_{v\in V_{N}^{k}}\|u-v\|_{m,\Omega}\leq\inf_{v\in S_{N-k-1}^{k}}\|u-v\|_{m,\Omega}\lesssim N^{m-(k+1)}\|u\|_{k+1,\Omega} (3.13)

Given an activation function $\sigma$, consider its Fourier transform:

σ^(a)=12πσ(t)eiat𝑑t.\hat{\sigma}(a)=\frac{1}{2\pi}\int_{\mathbb{R}}\sigma(t)e^{-iat}dt. (3.14)

For any $a\neq 0$ with $\hat{\sigma}(a)\neq 0$, making the change of variables $t=a^{-1}\omega\cdot x+b$, $dt=db$, we have

σ^(a)\displaystyle\hat{\sigma}(a) =12πσ(a1ωx+b)eia(a1ωx+b)𝑑b=eiωx12πσ(a1ωx+b)eiab𝑑b.\displaystyle=\frac{1}{2\pi}\int_{\mathbb{R}}\sigma(a^{-1}\omega\cdot x+b)e^{-ia(a^{-1}\omega\cdot x+b)}db=e^{-i\omega\cdot x}\frac{1}{2\pi}\int_{\mathbb{R}}\sigma(a^{-1}\omega\cdot x+b)e^{-iab}db. (3.15)

This implies that

eiωx=12πσ^(a)σ(a1ωx+b)eiab𝑑b.e^{i\omega\cdot x}=\frac{1}{2\pi\hat{\sigma}(a)}\int_{\mathbb{R}}\sigma(a^{-1}\omega\cdot x+b)e^{-iab}db. (3.16)

We write $\hat{u}(\omega)=e^{-i\theta(\omega)}|\hat{u}(\omega)|$ and then obtain the following integral representation:

u(x)=deiωxu^(ω)𝑑ω=d12πσ^(a)σ(a1ωx+b)|u^(ω)|ei(ab+θ(ω))𝑑b𝑑ωu(x)=\int_{\mathbb{R}^{d}}e^{i\omega\cdot x}\hat{u}(\omega)d\omega=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}}\frac{1}{2\pi\hat{\sigma}(a)}\sigma\left(a^{-1}\omega\cdot x+b\right)|\hat{u}(\omega)|e^{-i(ab+\theta(\omega))}dbd\omega (3.17)

Now we take the activation function $\sigma(x)=b^{k}(x)$ and let $\hat{\sigma}$ be the Fourier transform of $\sigma$. Note that, by (3.6) and (3.7),

σ^(a)=(1eiaia)k+1=(2asina2)k+1eia(k+1)2.\displaystyle\hat{\sigma}(a)=\left({1-e^{-ia}\over ia}\right)^{k+1}=\left({2\over a}\sin{a\over 2}\right)^{k+1}e^{-{ia(k+1)\over 2}}. (3.18)

We first take $a=\pi$ in (3.18). Thus,

σ^(π)=(2π)k+1eiπ(k+1)2.\hat{\sigma}(\pi)=\left({2\over\pi}\right)^{k+1}e^{-{i\pi(k+1)\over 2}}. (3.19)

Combining (3.17) and (3.19), we obtain

u(x)=14(π2)kdσ(π1ωx+b)|u^(ω)|ei(πb+π(k+1)2+θ(ω))𝑑b𝑑ωu(x)=\frac{1}{4}\left({\pi\over 2}\right)^{k}\int_{\mathbb{R}^{d}}\int_{\mathbb{R}}\sigma\left(\pi^{-1}\omega\cdot x+b\right)|\hat{u}(\omega)|e^{-i(\pi b+{\pi(k+1)\over 2}+\theta(\omega))}dbd\omega (3.20)

An application of the Monte Carlo method in Lemma 2.1 to the integral representation (3.20) leads to the following estimate.

Theorem 3.3.

For any $0\leq m\leq k$, there exist $\omega_{i}\in\mathbb{R}^{d}$, $b_{i}\in\mathbb{R}$ and $\beta_{i}\in\mathbb{R}$ such that

uuNHm(Ω)N12uBm+1(Ω)\left\|u-u_{N}\right\|_{H^{m}(\Omega)}\lesssim N^{-{1\over 2}}\|u\|_{{B}^{m+1}(\Omega)} (3.21)

with

uN(x)=i=1Nβibk(π1ωix+bi).u_{N}(x)=\sum_{i=1}^{N}\beta_{i}b^{k}\left(\pi^{-1}\omega_{i}\cdot x+b_{i}\right). (3.22)

Based on the integral representation (3.20), a stratified sampling argument similar to the one in [Siegel and Xu, 2020] leads to the following result.

Theorem 3.4.

For any $0\leq m\leq k$ and any $\epsilon>0$, there exist $\omega_{i}\in\mathbb{R}^{d}$, $b_{i}\in\mathbb{R}$ and $\beta_{i}\in\mathbb{R}$ such that

uuNHm(Ω)N12ϵ(d+1)(2+ϵ)uBm+1+ϵ(Ω)\left\|u-u_{N}\right\|_{H^{m}(\Omega)}\leq N^{-{1\over 2}-{\epsilon\over(d+1)(2+\epsilon)}}\|u\|_{{B}^{m+1+\epsilon}(\Omega)} (3.23)

with

uN(x)=i=1Nβibk(ω¯ix+bi).u_{N}(x)=\sum_{i=1}^{N}\beta_{i}b^{k}\left(\bar{\omega}_{i}\cdot x+b_{i}\right). (3.24)

Next, we improve the estimate (3.23). Again, we use (3.17). Take $\displaystyle a_{\omega}=4\pi\lceil{\|\omega\|\over 4\pi}\rceil+\pi$ in (3.18) and set $\displaystyle\bar{\omega}={\omega\over a_{\omega}}$. We have

σ^(aω)=(2aω)k+1,ω+πaωω+5π,ω¯1,\hat{\sigma}(a_{\omega})=\left({2\over a_{\omega}}\right)^{k+1},\quad\|\omega\|+\pi\leq a_{\omega}\leq\|\omega\|+5\pi,\quad\|\bar{\omega}\|\leq 1, (3.25)

which, together with (3.17), indicates that

u(x)=d12πσ(ω¯x+b)(aω2)k+1u^(ω)eiaω(b+k+12)𝑑b𝑑ω.\begin{split}u(x)=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}}\frac{1}{2\pi}\sigma\left(\bar{\omega}\cdot x+b\right)\left({a_{\omega}\over 2}\right)^{k+1}\hat{u}(\omega)e^{-ia_{\omega}(b+{k+1\over 2})}dbd\omega.\end{split} (3.26)
Theorem 3.5.

There exist $\bar{\omega}_{i}$ with $\|\bar{\omega}_{i}\|\leq 1$, $b_{i}$ with $|b_{i}|\leq T+k+1$, and $\beta_{i}\in\mathbb{R}$ such that

uuNHm(Ω)N121d+1uBk+1(Ω)\left\|u-u_{N}\right\|_{H^{m}(\Omega)}\lesssim N^{-{1\over 2}-{1\over d+1}}\|u\|_{{B}^{k+1}(\Omega)} (3.27)

with

uN(x)=i=1Nβibk(ω¯ix+bi).u_{N}(x)=\sum_{i=1}^{N}\beta_{i}b^{k}\left(\bar{\omega}_{i}\cdot x+b_{i}\right). (3.28)
Proof.

We write (3.26) as follows

u(x)=dg(x,b,ω)ρ(b,ω)𝑑b𝑑ω\displaystyle u(x)=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}}g(x,b,\omega)\rho(b,\omega)dbd\omega

with

u^(ω)=eiθ(ω)|u^(ω)|,θ~(ω)=θ(ω)+aω(b+k+12)\hat{u}(\omega)=e^{-i\theta(\omega)}|\hat{u}(\omega)|,\quad\tilde{\theta}(\omega)=\theta(\omega)+a_{\omega}(b+{k+1\over 2})

and

g(x,b,ω)=σ(ω¯x+b)sgn(cosθ~(ω)),g(x,b,\omega)=\sigma\left({\bar{\omega}}\cdot x+b\right)sgn(\cos\tilde{\theta}(\omega)), (3.29)
ρ(b,ω)=1(2π)d(aω2)k+1|u^(ω)||cosθ~(ω)|.\rho(b,\omega)=\frac{1}{(2\pi)^{d}}\left({a_{\omega}\over 2}\right)^{k+1}|\hat{u}(\omega)||\cos\tilde{\theta}(\omega)|. (3.30)

Note that

ω¯1,|b|T+k+1.\|\bar{\omega}\|\leq 1,\quad|b|\leq T+k+1. (3.31)

Let

G={(ω,b):ωd,|b|T+k+1},G~={(ω¯,b):ω¯1,|b|T+k+1}.G=\{(\omega,b):\omega\in\mathbb{R}^{d},\ |b|\leq T+k+1\},\ \tilde{G}=\{(\bar{\omega},b):\|\bar{\omega}\|\leq 1,\ |b|\leq T+k+1\}.

For any positive integer $n$, divide $\tilde{G}$ into $\tilde{M}$ ($\tilde{M}\leq{n\over 2}$) nonoverlapping subdomains, say $\tilde{G}=\tilde{G}_{1}\cup\tilde{G}_{2}\cup\cdots\cup\tilde{G}_{\tilde{M}}$, such that

|bb|n1d+1,|ω¯ω¯|n1d+1,(ω¯,b),(ω¯,b)G~i, 1iM~.|b-b^{\prime}|\lesssim n^{-{1\over d+1}},\quad|\bar{\omega}-\bar{\omega}^{\prime}|\lesssim n^{-{1\over d+1}},\quad(\bar{\omega},b),\ (\bar{\omega}^{\prime},b^{\prime})\in\tilde{G}_{i},\ 1\leq i\leq\tilde{M}. (3.32)

Define $M=2\tilde{M}$ and, for $1\leq i\leq\tilde{M}$,

Gi={(ω,b):(ω¯,b)G~i,cosθ~(ω)0},GM~+i={(ω,b):(ω¯,b)G~i,cosθ~(ω)0}.G_{i}=\{(\omega,b):(\bar{\omega},b)\in\tilde{G}_{i},\ \cos\tilde{\theta}(\omega)\geq 0\},\ G_{\tilde{M}+i}=\{(\omega,b):(\bar{\omega},b)\in\tilde{G}_{i},\ \cos\tilde{\theta}(\omega)\leq 0\}.

Thus, $G=G_{1}\cup G_{2}\cup\cdots\cup G_{M}$ with $G_{i}\cap G_{j}=\varnothing$ if $i\neq j$, and

|bb|n1d+1,|ω¯ω¯|n1d+1,sgn(cosθ~(ω))=sgn(cosθ~(ω)).|b-b^{\prime}|\lesssim n^{-{1\over d+1}},\quad|\bar{\omega}-\bar{\omega}^{\prime}|\lesssim n^{-{1\over d+1}},\quad sgn(\cos\tilde{\theta}(\omega))=sgn(\cos\tilde{\theta}(\omega^{\prime})). (3.33)

Let $n_{i}=\lceil\lambda(G_{i})n\rceil$, $\displaystyle N=\sum_{i=1}^{M}n_{i}$ and

uN(x)=ρL1(G)i=1Mλ(Gi)nij=1nig(x,θi,j).u_{N}(x)=\|\rho\|_{L^{1}(G)}\sum_{i=1}^{M}\frac{\lambda(G_{i})}{n_{i}}\sum_{j=1}^{n_{i}}g(x,\theta_{i,j}). (3.34)

It holds that

𝔼(uuNHm(Ω)2)ρL1(G)i=1Mλ2(Gi)nisupθi,θiGig(x,θi)g(x,θi)Hm(Ω)2\begin{split}\mathbb{E}\left(\left\|u-u_{N}\right\|_{H^{m}(\Omega)}^{2}\right)\leq&\|\rho\|_{L^{1}(G)}\sum_{i=1}^{M}\frac{\lambda^{2}(G_{i})}{n_{i}}\sup_{\theta_{i},\theta_{i}^{\prime}\in G_{i}}\|g(x,\theta_{i})-g(x,\theta_{i}^{\prime})\|^{2}_{H^{m}(\Omega)}\end{split} (3.35)

with $\theta=(b,\omega)$. For any $(b,\omega),\ (b^{\prime},\omega^{\prime})\in G_{i}$, $1\leq i\leq M$, if $k\geq m+1$,

|g(x,θ)g(x,θ)||bb|+|ωω|n1d+1|g(x,\theta)-g(x,\theta^{\prime})|\lesssim|b-b^{\prime}|+|\omega-\omega^{\prime}|\lesssim n^{-{1\over d+1}} (3.36)

Thus,

i=1Mλ2(Gi)nisupθi,θiGig(x,θi)g(x,θi)Hm(Ω)2n2d+1|Ω|.\sum_{i=1}^{M}\frac{\lambda^{2}(G_{i})}{n_{i}}\sup_{\theta_{i},\theta_{i}^{\prime}\in G_{i}}\|g(x,\theta_{i})-g(x,\theta_{i}^{\prime})\|^{2}_{H^{m}(\Omega)}\lesssim n^{-{2\over d+1}}|\Omega|. (3.37)

Thus,

𝔼(uuNHm(Ω)2)n12d+1|Ω|ρL1(G).\begin{split}\mathbb{E}\left(\left\|u-u_{N}\right\|_{H^{m}(\Omega)}^{2}\right)\lesssim&n^{-1-{2\over d+1}}|\Omega|\|\rho\|_{L^{1}(G)}.\end{split} (3.38)

Since $a_{\omega}\leq\|\omega\|+5\pi$,

ρL1(G)G(ω+1)k+1|u^(ω)|𝑑ω𝑑buBk+1(Ω).\|\rho\|_{L^{1}(G)}\lesssim\int_{G}(\|\omega\|+1)^{k+1}|\hat{u}(\omega)|d\omega db\lesssim\|u\|_{B^{k+1}(\Omega)}.

Note that $n\leq N\leq 2n$. Thus, there exist $\omega_{i}\in\mathbb{R}^{d}$, $\beta_{i}\in\mathbb{R}$ and $b_{i}\in\mathbb{R}$ such that

uuNHm(Ω)N121d+1uBk+1(Ω),\left\|u-u_{N}\right\|_{H^{m}(\Omega)}\lesssim N^{-{1\over 2}-{1\over d+1}}\|u\|_{B^{k+1}(\Omega)}, (3.39)

which completes the proof. ∎

The above analysis can also be applied to more general activation functions with compact support.

Theorem 3.6.

Suppose that $\sigma\in W^{m+1,\infty}(\mathbb{R})$ has compact support. If for any $a>0$ there exists $\tilde{a}>0$ such that

a~a,|σ^(a~)|a,\tilde{a}\gtrsim a,\quad|\hat{\sigma}(\tilde{a})|\gtrsim a^{-\ell}, (3.40)

then there exist $\omega_{i}\in\mathbb{R}^{d}$ and $b_{i}\in\mathbb{R}$ such that

uuNHm(Ω)N121d+1uB(Ω),\left\|u-u_{N}\right\|_{H^{m}(\Omega)}\lesssim N^{-{1\over 2}-{1\over d+1}}\|u\|_{B^{\ell}(\Omega)}, (3.41)

where

uN(x)=i=1Nβiσ(ω¯ix+bi).u_{N}(x)=\sum_{i=1}^{N}\beta_{i}\sigma\left(\bar{\omega}_{i}\cdot x+b_{i}\right). (3.42)

3.2 ${\rm[ReLU]}^{k}$ as activation functions

Rather than using the general Fourier transform representation (3.16) of $e^{i\omega\cdot x}$ in terms of $\sigma(\omega\cdot x+b)$, [Klusowski and Barron, 2016] gave a different way to represent $e^{i\omega\cdot x}$ in terms of $(\omega\cdot x+b)_{+}^{k}$ for $k=1$ and $2$. The following lemma generalizes this representation to all $k\geq 0$.

Lemma 3.7.

For any $k\geq 0$ and $x\in\Omega$,

eiωx=j=0k(iωx)jj!+ik+1k!ωk+10T[(ω¯xt)+keiωt+(1)k1(ω¯xt)+keiωt]𝑑t.e^{i\omega\cdot x}=\sum_{j=0}^{k}{(i\omega\cdot x)^{j}\over j!}+{i^{k+1}\over k!}\|\omega\|^{k+1}\int_{0}^{T}\left[(\bar{\omega}\cdot x-t)_{+}^{k}e^{i\|\omega\|t}+(-1)^{k-1}(-\bar{\omega}\cdot x-t)_{+}^{k}e^{-i\|\omega\|t}\right]dt. (3.43)
Proof.

For $|z|\leq c$, by the Taylor expansion with integral remainder,

eiz=j=0k(iz)jj!+ik+1k!0zeiu(zu)k𝑑u.e^{iz}=\sum_{j=0}^{k}{(iz)^{j}\over j!}+{i^{k+1}\over k!}\int_{0}^{z}e^{iu}(z-u)^{k}du. (3.44)

Note that

(zu)k=(zu)+k(uz)+k.(z-u)^{k}=(z-u)^{k}_{+}-(u-z)^{k}_{+}.

It follows that

\begin{split}\int_{0}^{z}(z-u)^{k}e^{iu}du=&\int_{0}^{z}(z-u)_{+}^{k}e^{iu}du+\int_{0}^{z}(-1)^{k}(u-z)_{+}^{k}e^{iu}du\\ =&\int_{0}^{z}(z-u)_{+}^{k}e^{iu}du+\int_{0}^{-z}(-1)^{k-1}(-u-z)_{+}^{k}e^{-iu}du\\ =&\int_{0}^{c}(z-u)_{+}^{k}e^{iu}du+(-1)^{k-1}\int_{0}^{c}(-z-u)_{+}^{k}e^{-iu}du.\end{split} (3.45)

Thus,

eizj=0k(iz)jj!=ik+1k!0c[(zu)+keiu+(1)k1(zu)+keiu]𝑑u.e^{iz}-\sum_{j=0}^{k}{(iz)^{j}\over j!}={i^{k+1}\over k!}\int_{0}^{c}\left[(z-u)_{+}^{k}e^{iu}+(-1)^{k-1}(-z-u)_{+}^{k}e^{-iu}\right]du. (3.46)

Let

z=ωx,u=ωt,ω¯=ωω.z=\omega\cdot x,\quad u=\|\omega\|t,\quad\bar{\omega}={\omega\over\|\omega\|}. (3.47)

Since $\|x\|\leq T$ and hence $|\bar{\omega}\cdot x|\leq T$, taking $c=T$ in (3.46) we obtain

eiωxj=0k(iωx)jj!=ik+1k!ωk+10T[(ω¯xt)+keiωt+(1)k1(ω¯xt)+keiωt]𝑑t,e^{i\omega\cdot x}-\sum_{j=0}^{k}{(i\omega\cdot x)^{j}\over j!}={i^{k+1}\over k!}\|\omega\|^{k+1}\int_{0}^{T}\left[(\bar{\omega}\cdot x-t)_{+}^{k}e^{i\|\omega\|t}+(-1)^{k-1}(-\bar{\omega}\cdot x-t)_{+}^{k}e^{-i\|\omega\|t}\right]dt, (3.48)

which completes the proof. ∎
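
A minimal numerical check (illustration only) of the truncated-power identity (3.46), with the particular values of $k$, $z$ and $c$ below chosen arbitrarily and the integral evaluated by a simple trapezoidal rule.

import math
import numpy as np

def taylor_remainder(z, k):
    """Left-hand side of (3.46): e^{iz} minus its degree-k Taylor polynomial."""
    return np.exp(1j * z) - sum((1j * z) ** j / math.factorial(j) for j in range(k + 1))

def truncated_power_integral(z, k, c, n=20001):
    """Right-hand side of (3.46), computed with a trapezoidal rule on [0, c]."""
    u = np.linspace(0.0, c, n)
    f = (np.maximum(z - u, 0.0) ** k * np.exp(1j * u)
         + (-1) ** (k - 1) * np.maximum(-z - u, 0.0) ** k * np.exp(-1j * u))
    h = u[1] - u[0]
    integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))
    return 1j ** (k + 1) / math.factorial(k) * integral

for k in (1, 2, 3):
    for z in (-1.3, 0.7, 2.0):
        print(k, z, abs(taylor_remainder(z, k) - truncated_power_integral(z, k, c=3.0)))
# the differences are at the level of the quadrature error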

Since $u(x)={1\over(2\pi)^{d}}\int_{\mathbb{R}^{d}}e^{i\omega\cdot x}\hat{u}(\omega)d\omega$ and $\partial^{\alpha}u(x)=\int_{\mathbb{R}^{d}}i^{|\alpha|}\omega^{\alpha}e^{i\omega\cdot x}\hat{u}(\omega)d\omega$, we have

αu(0)xα=di|α|ωαxαu^(ω)𝑑ω.\displaystyle\partial^{\alpha}u(0)x^{\alpha}=\int_{\mathbb{R}^{d}}i^{|\alpha|}\omega^{\alpha}x^{\alpha}\hat{u}(\omega)d\omega. (3.49)

Note that $\displaystyle(\omega\cdot x)^{j}=\sum_{|\alpha|=j}{j!\over\alpha!}\omega^{\alpha}x^{\alpha}$. It follows that

|α|=j1α!αu(0)xα=ij|α|=j1α!dωαxαu^(ω)𝑑ω=1j!d(iωx)ju^(ω)𝑑ω.\sum_{|\alpha|=j}{1\over\alpha!}\partial^{\alpha}u(0)x^{\alpha}=i^{j}\sum_{|\alpha|=j}{1\over\alpha!}\int_{\mathbb{R}^{d}}\omega^{\alpha}x^{\alpha}\hat{u}(\omega)d\omega={1\over j!}\int_{\mathbb{R}^{d}}(i\omega\cdot x)^{j}\hat{u}(\omega)d\omega. (3.50)

Let $\hat{u}(\omega)=|\hat{u}(\omega)|e^{ib(\omega)}$. Then $e^{i\|\omega\|t}\hat{u}(\omega)=|\hat{u}(\omega)|e^{i(\|\omega\|t+b(\omega))}$. By Lemma 3.7,

u(x)|α|k1α!αu(0)xα=d(eiωxj=0k1j!(iωx)j)u^(ω)𝑑ω.=Re(ik+1k!d0T[(ω¯xt)+keiωt+(1)k1(ω¯xt)+keiω1t]u^(ω)ωk+1𝑑t𝑑ω)=1k!{1,1}d0T(zω¯xt)+ks(zt,ω)|u^(ω)|ωk+1𝑑t𝑑ω𝑑z\begin{split}&u(x)-\sum_{|\alpha|\leq k}{1\over\alpha!}\partial^{\alpha}u(0)x^{\alpha}\\ =&\int_{\mathbb{R}^{d}}\big{(}e^{i\omega\cdot x}-\sum_{j=0}^{k}{1\over j!}(i\omega\cdot x)^{j}\big{)}\hat{u}(\omega)d\omega.\\ =&{\rm Re}\bigg{(}{i^{k+1}\over k!}\int_{\mathbb{R}^{d}}\int_{0}^{T}\left[(\bar{\omega}\cdot x-t)_{+}^{k}e^{i\|\omega\|t}+(-1)^{k-1}(-\bar{\omega}\cdot x-t)_{+}^{k}e^{-i\|\omega\|_{\ell_{1}}t}\right]\hat{u}(\omega)\|\omega\|^{k+1}dtd\omega\bigg{)}\\ =&{1\over k!}\int_{\{-1,1\}}\int_{\mathbb{R}^{d}}\int_{0}^{T}(z\bar{\omega}\cdot x-t)_{+}^{k}s(zt,\omega)|\hat{u}(\omega)|\|\omega\|^{k+1}dtd\omega dz\end{split} (3.51)

with {1,1}r(z)𝑑z=r(1)+r(1)\int_{\{-1,1\}}r(z)dz=r(-1)+r(1) and

s(zt,ω)={(1)k+12cos(zωt+b(ω))k is odd,(1)k+22sin(zωt+b(ω))k is even.s(zt,\omega)=\begin{cases}(-1)^{k+1\over 2}\cos(z\|\omega\|t+b(\omega))&k\text{ is odd},\\ (-1)^{k+2\over 2}\sin(z\|\omega\|t+b(\omega))&k\text{ is even}.\end{cases} (3.52)

Define G={1,1}×[0,T]×dG=\{-1,1\}\times[0,T]\times\mathbb{R}^{d}, θ=(z,t,ω)G\theta=(z,t,\omega)\in G,

g(x,θ)=(zω¯xt)+ksgns(zt,ω),ρ(θ)=1(2π)d|s(zt,ω)||u^(ω)|ωk+1,λ(θ)=ρ(θ)ρL1(G).g(x,\theta)=(z\bar{\omega}\cdot x-t)_{+}^{k}{\rm sgn}s(zt,\omega),\qquad\rho(\theta)={1\over(2\pi)^{d}}|s(zt,\omega)||\hat{u}(\omega)|\|\omega\|^{k+1},\quad\lambda(\theta)={\rho(\theta)\over\|\rho\|_{L^{1}(G)}}. (3.53)

Then (3.51) can be written as

u(x)=|α|k1α!Dαu(0)xα+νk!Gg(x,θ)λ(θ)𝑑θ,u(x)=\sum_{|\alpha|\leq k}{1\over\alpha!}D^{\alpha}u(0)x^{\alpha}+{\nu\over k!}\int_{G}g(x,\theta)\lambda(\theta)d\theta, (3.54)

with ν=Gρ(θ)𝑑θ\nu=\int_{G}\rho(\theta)d\theta. In summary, we have the following lemma.

Lemma 3.8.

It holds that

u(x)=|α|k1α!αu(0)xα+νk!rk(x),xΩu(x)=\sum_{|\alpha|\leq k}{1\over\alpha!}\partial^{\alpha}u(0)x^{\alpha}+{\nu\over k!}r_{k}(x),\qquad x\in\Omega (3.55)

with ν=Gρ(θ)𝑑θ\nu=\int_{G}\rho(\theta)d\theta and

rk(x)=Gg(x,θ)λ(θ)𝑑θ,G={1,1}×[0,T]×d,r_{k}(x)=\int_{G}g(x,\theta)\lambda(\theta)d\theta,\qquad G=\{-1,1\}\times[0,T]\times\mathbb{R}^{d}, (3.56)

and g(x,θ)g(x,\theta), ρ(θ)\rho(\theta) and λ(θ)\lambda(\theta) defined in (3.53).

According to (3.53), the main ingredient (z\bar{\omega}\cdot x-t)_{+}^{k} of g(x,\theta) depends on \omega only through its direction \bar{\omega}, which belongs to the bounded set \mathbb{S}^{d-1}. Thanks to the continuity of (z\bar{\omega}\cdot x-t)_{+}^{k} with respect to (z,\bar{\omega},t) and the boundedness of \mathbb{S}^{d-1}, applying stratified sampling to the residual term of the Taylor expansion leads to the approximation property in Theorem 3.9.
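To make the shallow expansion appearing in Theorem 3.9 below more concrete, the following small sketch (our own illustration, not the constructive stratified-sampling argument of Lemma 2.4) assembles an approximation of the form (3.57) for a toy target: the directions \bar{\omega}_{j} and thresholds t_{j} are drawn at random, the polynomial part is omitted, and the outer coefficients \beta_{j} are simply fitted by least squares. All names in the snippet are illustrative.

```python
# A minimal numerical sketch (not the proof's construction): approximate a smooth
# function on the ball of radius T by a sum of ReLU^k ridge functions
# (omega_bar_j . x - t_j)_+^k with random directions/thresholds and
# least-squares outer coefficients.
import numpy as np

rng = np.random.default_rng(0)
d, k, N, T = 2, 3, 200, 1.0            # dimension, ReLU power, #neurons, ||x|| <= T

def relu_k(z, k):
    return np.maximum(z, 0.0) ** k

# random unit directions omega_bar_j and thresholds t_j in [0, T]
omega = rng.normal(size=(N, d))
omega /= np.linalg.norm(omega, axis=1, keepdims=True)
t = rng.uniform(0.0, T, size=N)

def u(x):                               # a smooth test function
    return np.cos(np.pi * x[:, 0]) * np.exp(x[:, 1])

# training points in the ball of radius T
X = rng.uniform(-T, T, size=(2000, d))
X = X[np.linalg.norm(X, axis=1) <= T]

# design matrix with columns (omega_bar_j . x - t_j)_+^k
A = relu_k(X @ omega.T - t, k)
beta, *_ = np.linalg.lstsq(A, u(X), rcond=None)

err = np.sqrt(np.mean((A @ beta - u(X)) ** 2))
print(f"RMS error with N={N} ReLU^{k} ridge functions: {err:.3e}")
```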

Theorem 3.9.

Assume u\in B^{k+1}(\Omega). Then there exist \beta_{j}\in[-1,1], \|\bar{\omega}_{j}\|=1, and t_{j}\in[0,T] such that

u_{N}(x)=\sum_{|\alpha|\leq k}{1\over\alpha!}\partial^{\alpha}u(0)x^{\alpha}+{\nu\over k!N}\sum_{j=1}^{N}\beta_{j}(\bar{\omega}_{j}\cdot x-t_{j})_{+}^{k} (3.57)

with ν=Gρ(θ)𝑑θ\nu=\int_{G}\rho(\theta)d\theta and ρ(θ)\rho(\theta) defined in (3.53) satisfies the following estimate

uuNHm(Ω){N121duBk+1(Ω),m<k,N12uBk+1(Ω)m=k.\|u-u_{N}\|_{H^{m}(\Omega)}\lesssim\begin{cases}N^{-{1\over 2}-{1\over d}}\|u\|_{B^{k+1}(\Omega)},&m<k,\\ N^{-{1\over 2}}\|u\|_{B^{k+1}(\Omega)}&m=k.\end{cases} (3.58)
Proof.

Let

uN(x)=|α|k1α!αu(0)xα+νk!rk,N(x),rk,N(x)=1Nj=1Nβj(ω¯jxtj)+k.u_{N}(x)=\sum_{|\alpha|\leq k}{1\over\alpha!}\partial^{\alpha}u(0)x^{\alpha}+{\nu\over k!}r_{k,N}(x),\qquad r_{k,N}(x)={1\over N}\sum_{j=1}^{N}\beta_{j}(\bar{\omega}_{j}\cdot x-t_{j})_{+}^{k}.

Recall the representation of u(x)u(x) in (3.55) and rk(x)r_{k}(x) in (3.56). It holds that

u(x)-u_{N}(x)={\nu\over k!}(r_{k}(x)-r_{k,N}(x)). (3.59)

By Lemma 2.4, for any decomposition G=\cup_{i=1}^{N}G_{i}, there exist \{\theta_{i}\}_{i=1}^{N} and \{\beta_{i}\}_{i=1}^{N}\subset[0,1] such that

\|\partial_{x}^{\alpha}(u-u_{N})\|_{L^{2}(\Omega)}={\nu\over k!}\|\partial_{x}^{\alpha}(r_{k}-r_{k,N})\|_{L^{2}(\Omega)}\leq{\nu\over k!N^{1/2}}\max_{1\leq j\leq N}\sup_{\theta_{j},\theta_{j}^{\prime}\in G_{j}}\|\partial_{x}^{\alpha}\big{(}g(x,\theta_{j})-g(x,\theta_{j}^{\prime})\big{)}\|_{L^{2}(\Omega)}. (3.60)

Consider an \epsilon-covering decomposition G=\cup_{i=1}^{N}G_{i} such that

z=z,|tt|<ϵ,ω¯ω¯1<ϵθ=(z,t,ω),θ=(z,t,ω)Giz=z^{\prime},\ |t-t^{\prime}|<\epsilon,\ \|\bar{\omega}-\bar{\omega}^{\prime}\|_{\ell^{1}}<\epsilon\qquad\forall\theta=(z,t,\omega),\ \theta^{\prime}=(z^{\prime},t^{\prime},\omega^{\prime})\in G_{i} (3.61)

where ω¯\bar{\omega} is defined in (3.47). For any θi,θiGi\theta_{i},\theta^{\prime}_{i}\in G_{i},

|xα(g(x,θi)g(x,θi))|=k!(k|α|)!|gα(x,ω¯,t)gα(x,ω¯,t)||\partial_{x}^{\alpha}\big{(}g(x,\theta_{i})-g(x,\theta_{i}^{\prime})\big{)}|={k!\over(k-|\alpha|)!}|g_{\alpha}(x,\bar{\omega},t)-g_{\alpha}(x,\bar{\omega}^{\prime},t^{\prime})|

with

gα(x,ω¯,t)=(zω¯xt)+k|α|ω¯α.g_{\alpha}(x,\bar{\omega},t)=(z\bar{\omega}\cdot x-t)^{k-|\alpha|}_{+}\bar{\omega}^{\alpha}. (3.62)

Since

|\partial_{\bar{\omega}_{i}}g_{\alpha}|\leq(2T)^{k-|\alpha|-1}\big{(}(k-|\alpha|)|x_{i}|+2T\alpha_{i}\big{)},\qquad|\partial_{t}g_{\alpha}|\leq(k-|\alpha|)(2T)^{k-|\alpha|-1},

it follows that

|xα(g(x,θi)g(x,θi))|k!(k|α|)!(2T)k|α|1((k|α|)(|x|1+1)+2T|α|)ϵ.\big{|}\partial_{x}^{\alpha}\big{(}g(x,\theta_{i})-g(x,\theta_{i}^{\prime})\big{)}\big{|}\leq{k!\over(k-|\alpha|)!}(2T)^{k-|\alpha|-1}\bigg{(}(k-|\alpha|)(|x|_{\ell_{1}}+1)+2T|\alpha|\bigg{)}\epsilon. (3.63)

Thus, combining (3.60) and (3.63), if |\alpha|\leq m<k,

\|\partial_{x}^{\alpha}(u-u_{N})\|_{L^{2}(\Omega)}\leq{\nu|\Omega|^{1/2}\over(k-|\alpha|)!}(2T)^{k-|\alpha|-1}\bigg{(}(k-|\alpha|)(T+1)+2T|\alpha|\bigg{)}N^{-{1\over 2}}\epsilon. (3.64)

Note that \epsilon\sim N^{-{1\over d}}. Summing (3.64) over all |\alpha|\leq m, there exist \theta_{j} such that, for m<k,

\|u-u_{N}\|_{H^{m}(\Omega)}\leq C(m,k,\Omega)\,\nu\,N^{-{1\over 2}-{1\over d}} (3.65)

with νuBk+1(Ω)\nu\leq\|u\|_{B^{k+1}(\Omega)} and

C(m,k,\Omega)=|\Omega|^{1/2}\bigg{(}\sum_{|\alpha|\leq m}\Big{[}{1\over(k-|\alpha|)!}(2T)^{k-|\alpha|-1}\big{(}(k-|\alpha|)(T+1)+2T|\alpha|\big{)}\Big{]}^{2}\bigg{)}^{1/2}. (3.66)

If m=|α|=km=|\alpha|=k,

\max_{1\leq j\leq N}\sup_{\theta_{j},\theta_{j}^{\prime}\in G_{j}}\|\partial_{x}^{\alpha}\big{(}g(x,\theta_{j})-g(x,\theta_{j}^{\prime})\big{)}\|_{L^{2}(\Omega)}\lesssim 1.

This leads to

uuNHm(Ω)C(m,k,Ω)νN12for k=m.\|u-u_{N}\|_{H^{m}(\Omega)}\leq C(m,k,\Omega)\nu N^{-{1\over 2}}\quad\mbox{for }\ k=m. (3.67)

Note that uNu_{N} defined above can be written as

u_{N}(x)=\sum_{|\alpha|\leq k}{1\over\alpha!}\partial^{\alpha}u(0)x^{\alpha}+{\nu\over k!N}\sum_{j=1}^{N}\beta_{j}(\bar{\omega}_{j}\cdot x-t_{j})_{+}^{k}

with βj[1,1]\beta_{j}\in[-1,1], which completes the proof. ∎

Lemma 3.10.

There exist αi\alpha_{i}, ωi\omega_{i}, bib_{i} and N2(k+dk)N\leq 2\begin{pmatrix}k+d\\ k\end{pmatrix} such that

\sum_{|\alpha|\leq k}{1\over\alpha!}\partial^{\alpha}u(0)x^{\alpha}=\sum_{i=1}^{N}\alpha_{i}(\omega_{i}\cdot x+b_{i})_{+}^{k}

with xα=x1α1x2α2xdαd,α!=α1!α2!αd!.x^{\alpha}=x_{1}^{\alpha_{1}}x_{2}^{\alpha_{2}}\cdots x_{d}^{\alpha_{d}},\quad\alpha!=\alpha_{1}!\alpha_{2}!\cdots\alpha_{d}!.

The above result can be found in [He et al., 2020a].
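The following small check (our own illustration with explicitly chosen coefficients, not the general construction of [He et al., 2020a]) verifies a d=1, k=2 instance of this representation: on [-1,1] the monomials 1, x and x^2 can each be written as a combination of a few ReLU^2 ridge functions (\omega x+b)_{+}^{2}.

```python
# Numerical check of a d = 1, k = 2 instance of Lemma 3.10 on [-1, 1]:
# the arguments x + 2 and -x + 2 stay positive there, so the squared ReLU
# acts as a plain square and the algebra below is exact.
import numpy as np

def relu2(z):
    return np.maximum(z, 0.0) ** 2

x = np.linspace(-1.0, 1.0, 1001)

one = relu2(0.0 * x + 1.0)                        # = 1 on [-1, 1]
lin = (relu2(x + 2.0) - relu2(-x + 2.0)) / 8.0    # = x
quad = (relu2(x + 2.0) + relu2(-x + 2.0) - 8.0 * one) / 2.0   # = x^2

print(np.max(np.abs(one - 1.0)),
      np.max(np.abs(lin - x)),
      np.max(np.abs(quad - x ** 2)))              # all ~ machine precision
```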

A combination of Theorem 3.9 and the above lemma gives the following estimate in Theorem 3.11.

Theorem 3.11.

Suppose u\in B^{k+1}(\Omega). Then there exist \beta_{j},t_{j}\in\mathbb{R} and \omega_{j}\in\mathbb{R}^{d} such that

uN(x)=j=1Nβj(ω¯jxtj)+ku_{N}(x)=\sum_{j=1}^{N}\beta_{j}(\bar{\omega}_{j}\cdot x-t_{j})_{+}^{k} (3.68)

satisfies the following estimate

uuNHm(Ω){N121duBk+1(Ω),k>m,N12uBk+1(Ω),k=m,\|u-u_{N}\|_{H^{m}(\Omega)}\lesssim\begin{cases}N^{-{1\over 2}-{1\over d}}\|u\|_{B^{k+1}(\Omega)},\qquad k>m,\\ N^{-{1\over 2}}\|u\|_{B^{k+1}(\Omega)},\qquad k=m,\end{cases} (3.69)

where ω¯\bar{\omega} is defined in (3.47).

Remark 3.12.

We make the following comparisons between results in Section 3.1 and 3.2.

  1.

    The results in Section 3.1 are for the activation function \sigma=b_{k}, while the results in Section 3.2 are for the activation function \sigma={\rm ReLU}^{k}.

  2.

    By (3.9), the following relation obviously holds

    VN(bk)VN+k(ReLUk).V_{N}(b_{k})\subset V_{N+k}({\rm ReLU}^{k}).

    Thus, asymptotically speaking, the results that hold for σ=bk\sigma=b_{k} also hold for σ=ReLUk\sigma={\rm ReLU}^{k}.

  3.

    The results in Section 3.2 are in some cases asymptotically better than those in Section 3.1, but they require stronger regularity assumptions on u. For example, Theorem 3.3 only requires u\in B^{m}, whereas Theorem 3.9 requires u\in B^{k+1} even for m=0.

  4.

    The computational efficiency of solving the optimization problems (5.28) or (5.31) below may differ with the choice of activation function, namely \sigma=b_{k} or {\rm ReLU}^{k}.

4 Deep finite neuron functions, adaptivity and spectral accuracy

In this section, we will study deep finite neuron functions through the framework of deep neural networks and then discuss their adaptivity and spectral accuracy properties.

4.1 Deep finite neuron functions

Given d,+d,\ell\in\mathbb{N}^{+}, n1,,n with n0=d,n+1=1,n_{1},\dots,n_{\ell}\in\mathbb{N}\mbox{ with }n_{0}=d,n_{\ell+1}=1,

\theta^{i}(x)=\omega_{i}\cdot x+b_{i},\quad\omega_{i}\in\mathbb{R}^{n_{i+1}\times n_{i}},\ b_{i}\in\mathbb{R}^{n_{i+1}}, (4.1)

and the activation function ReLUk\mbox{ReLU}^{k}, define a deep finite neuron function u(x)u(x) from d\mathbb{R}^{d} to \mathbb{R} as follows:

f0(x)\displaystyle f^{0}(x) =θ0(x)\displaystyle=\theta^{0}(x)
fi(x)\displaystyle f^{i}(x) =[θiσ](fi1(x))i=1:\displaystyle=[\theta^{i}\circ\sigma](f^{i-1}(x))\quad i=1:\ell
f(x)\displaystyle f(x) =f(x).\displaystyle=f^{\ell}(x).

The following more concise notation is often used in computer science literature:

f(x)=θσθ1σθ1σθ0(x),f(x)=\theta^{\ell}\circ\sigma\circ\theta^{\ell-1}\circ\sigma\cdots\circ\theta^{1}\circ\sigma\circ\theta^{0}(x), (4.2)

here \theta^{i}:\mathbb{R}^{n_{i}}\to\mathbb{R}^{n_{i+1}} are the affine maps defined in (4.1). Such a deep neural network is an (\ell+1)-layer DNN, namely it has \ell hidden layers. The size of this deep neural network is n_{1}+\cdots+n_{\ell}.
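The following minimal sketch (our own illustration; the weights are random placeholders) spells out the composition (4.2) with \sigma={\rm ReLU}^{k}.

```python
# A minimal sketch of the forward map (4.2): f = theta^l o sigma o ... o theta^0
# with sigma = ReLU^k.  Parameter shapes follow (4.1); the random weights below
# are placeholders only meant to exercise the composition.
import numpy as np

def relu_k(z, k):
    return np.maximum(z, 0.0) ** k

def deep_finite_neuron(x, weights, biases, k):
    """x: (d,) input; weights[i]: (n_{i+1}, n_i); biases[i]: (n_{i+1},)."""
    f = weights[0] @ x + biases[0]              # theta^0(x)
    for W, b in zip(weights[1:], biases[1:]):
        f = W @ relu_k(f, k) + b                # theta^i(sigma(f^{i-1}))
    return f                                    # n_{l+1} = 1 -> scalar output

rng = np.random.default_rng(0)
d, widths, k = 3, [8, 8, 1], 2                  # n_0 = d, ..., n_{l+1} = 1
dims = [d] + widths
weights = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(len(widths))]
biases = [rng.normal(size=(dims[i + 1],)) for i in range(len(widths))]
print(deep_finite_neuron(rng.normal(size=d), weights, biases, k))
```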

Based on this notation, we define the deep finite neuron function class with activation function \sigma=\mbox{ReLU}^{k} by

{}_{n}{\mathcal{N}}^{k}(n_{1},n_{2},\ldots,n_{\ell})=\bigg{\{}f(x)=\theta^{\ell}\circ\sigma\circ\theta^{\ell-1}\circ\sigma\cdots\circ\theta^{0}(x),\mbox{ with }\omega_{i}\in\mathbb{R}^{n_{i+1}\times n_{i}},\ b_{i}\in\mathbb{R}^{n_{i+1}},\ i=0:\ell,\ n_{0}=d,\ n_{\ell+1}=1\bigg{\}} (4.3)

Generally, we can define the \ell-hidden layer neural network as:

{}_{n}{\mathcal{N}}_{\ell}^{k}:=\bigcup_{n_{1},n_{2},\cdots,n_{\ell}\geq 1}{}_{n}{\mathcal{N}}^{k}(n_{1},n_{2},\cdots,n_{\ell}). (4.4)

For \ell=2, functions in {}_{n}{\mathcal{N}}_{2}^{k} consist of piecewise polynomials of degree k^{2} on finite neuron grids whose element boundaries are level sets of quadratic polynomials, see Fig. 4.1.

Figure 4.1: Hyperplanes with =2\ell=2

4.2 Reproduction of polynomials and spectral accuracy

One interesting property of the ReLU^{k}-DNN is that it can reproduce polynomials of any given degree.

Lemma 4.1.

Given k2k\geq 2, q2q\geq 2, there exist 1\ell\geq 1, n1,,nn_{1},\cdots,n_{\ell} such that

\mathbb{P}_{q}\subset{}_{n}{\mathcal{N}}^{k}(n_{1},\cdots,n_{\ell}),

where q\mathbb{P}_{q} is the set of all polynomials with degree not larger than qq.

For a proof of the above result, we refer to [Li et al., 2019].
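As a concrete one-dimensional illustration of how compositions of ReLU^{k} layers generate higher-degree polynomials (our own example, not the construction of [Li et al., 2019]): since x^{2}={\rm ReLU}(x)^{2}+{\rm ReLU}(-x)^{2} and x^{2}\geq 0, one more ReLU^{2} layer gives x^{4}={\rm ReLU}(x^{2})^{2}, so a two-hidden-layer ReLU^{2} network represents x^{4} exactly.

```python
# Our own 1D illustration of the mechanism behind Lemma 4.1 with sigma = ReLU^2:
# first hidden layer gives (x)_+^2 and (-x)_+^2, whose sum is x^2; a second
# hidden layer applied to x^2 >= 0 gives (x^2)_+^2 = x^4 (theta^2 = identity).
import numpy as np

sigma = lambda z: np.maximum(z, 0.0) ** 2

x = np.linspace(-2.0, 2.0, 401)
h = sigma(np.stack([x, -x]))          # first hidden layer: (x)_+^2, (-x)_+^2
s = h.sum(axis=0)                     # theta^1: s = x^2
f = sigma(s)                          # second hidden layer: (x^2)_+^2 = x^4
print(np.max(np.abs(f - x ** 4)))     # ~ machine precision
```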

Theorem 4.2.

Let \mbox{ReLU}^{k} be the activation function and let {}_{n}{\mathcal{N}}_{\ell}^{k}(N) be the DNN model with \ell hidden layers and N neurons. There exists some \ell such that

\inf_{v_{N}\in{}_{n}{\mathcal{N}}_{\ell}^{k}(N)}\|u-v_{N}\|_{H^{m}(\Omega)}\lesssim\inf_{v_{N}\in\mathbb{P}_{k^{\ell}}}\|u-v_{N}\|_{H^{m}(\Omega)}. (4.5)

Estimate (4.5) indicates that deep finite neuron functions may provide spectral approximation accuracy.

4.3 Reproduction of linear finite element functions and adaptivity

Deep neural networks with the ReLU activation function have been much studied in the literature and are the most widely used in practice. One interesting fact is that ReLU-DNN functions are simply piecewise linear functions. More specifically, from [He et al., 2018], we have the following result:

Lemma 4.3.

Assume that {\cal T}_{h} is a simplicial finite element grid with N elements in which any union of simplexes sharing a common vertex is convex. Then any linear finite element function on this grid can be written as a ReLU-DNN with at most \mathcal{O}(d) hidden layers. The number of neurons is at most \mathcal{O}(\kappa^{d}N) for some constant \kappa\geq 2 depending on the shape regularity of \mathcal{T}_{h}. The number of non-zero parameters is at most \mathcal{O}(d\kappa^{d}N).

The above result indicates that deep finite neuron functions can reproduce any linear finite element function, as illustrated in one dimension by the sketch below. Given the adaptivity of finite element methods, we see that the finite neuron method can be at least as adaptive as the finite element method.
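The following one-dimensional sanity check (our own example, in the spirit of [He et al., 2020b]) verifies that a nodal hat basis function on a uniform grid is exactly a three-neuron ReLU combination.

```python
# On a uniform 1D grid with spacing h, the hat function at node x_i equals
#   phi_i(x) = [relu(x - x_{i-1}) - 2*relu(x - x_i) + relu(x - x_{i+1})] / h.
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

h, xi = 0.25, 0.5                      # grid spacing and the node x_i
x = np.linspace(0.0, 1.0, 2001)

hat_relu = (relu(x - (xi - h)) - 2.0 * relu(x - xi) + relu(x - (xi + h))) / h
hat_exact = np.maximum(0.0, 1.0 - np.abs(x - xi) / h)

print(np.max(np.abs(hat_relu - hat_exact)))   # ~ machine precision
```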

5 The finite neuron method for boundary value problems

In this section, we apply finite neuron functions to the numerical solution of (1.1). In §5.1, we first present some analytic results for (1.1). In §5.2, we obtain error estimates for the finite neuron method for (1.1) for both the Neumann and Dirichlet boundary conditions.

5.1 Elliptic boundary value problems of order 2m2m

As discussed in the introduction, let us rewrite the Dirichlet boundary value problem as follows:

{Lu=fin Ω,BDk(u)=0on Ω(0km1).\left\{\begin{array}[]{rccl}\displaystyle Lu&=&f&\mbox{in }\Omega,\\ B_{D}^{k}(u)&=&0&\mbox{on }\partial\Omega\quad(0\leq k\leq m-1).\end{array}\right. (5.1)

Here B_{D}^{k}(u) are given by (1.7). We next discuss the pure Neumann boundary conditions for the general PDE operator (1.2) when m\geq 2, beginning with the following simple result.

Lemma 5.1.

For each k=0,1,,m1k=0,1,\ldots,m-1, there exists a bounded linear differential operator of order 2mk12m-k-1:

BNk:H2m(Ω)L2(Ω)B_{N}^{k}:H^{2m}(\Omega)\mapsto L^{2}(\partial\Omega) (5.2)

such that the following identity holds

(Lu,v)=a(u,v)k=0m1BNk(u),BDk(v)0,Ω.(Lu,v)=a(u,v)-\sum_{k=0}^{m-1}\langle B_{N}^{k}(u),B_{D}^{k}(v)\rangle_{0,\partial\Omega}.

Namely

|α|=m(1)m(α(aααu),v)0,Ω=|𝜶|=m(aα𝜶u,𝜶v)0,Ωk=0m1BNk(u),BDk(v)0,Ω\sum_{|\alpha|=m}(-1)^{m}\left(\partial^{\alpha}(a_{\alpha}\,\partial^{\alpha}\,u),\,v\right)_{0,\Omega}=\sum_{|{\bm{\alpha}}|=m}\left(a_{\alpha}\partial^{\bm{\alpha}}\,u,\partial^{\bm{\alpha}}v\right)_{0,\Omega}-\sum_{k=0}^{m-1}\langle B_{N}^{k}(u),B_{D}^{k}(v)\rangle_{0,\partial\Omega} (5.3)

for all uH2m(Ω),vHm(Ω)u\in H^{2m}(\Omega),v\in H^{m}(\Omega). Furthermore,

k=0m1BDk(u)L2(Ω)+k=0m1BNk(u)L2(Ω)u2m,Ω.\sum_{k=0}^{m-1}\|B_{D}^{k}(u)\|_{L^{2}(\partial\Omega)}+\sum_{k=0}^{m-1}\|B_{N}^{k}(u)\|_{L^{2}(\partial\Omega)}\lesssim\|u\|_{2m,\Omega}. (5.4)

Lemma 5.1 can be proved by induction with respect to m. We refer to [Lions and Magenes, 2012] (Chapter 2) and [Chen and Huang, 2020] for a proof of a similar identity.

In general the explicit expression of BNkB_{N}^{k} can be quite complicated. Let us get some idea by looking at some simple examples with the following special operator:

Lu=(Δ)mu+u,Lu=(-\Delta)^{m}u+u, (5.5)

and

a(u,v)=|α|=m(aααu,αv)0,Ω+(a0u,v)u,vV.a(u,v)=\sum_{|\alpha|=m}(a_{\alpha}\partial^{\alpha}u,\partial^{\alpha}v)_{0,\Omega}+(a_{0}u,v)\quad\forall u,v\in V. (5.6)
  • For m=1m=1, it is easy to see that BN0u=uν|ΩB_{N}^{0}u=\frac{\partial u}{\partial\nu}|_{\partial\Omega}.

  • For m=2m=2 and d=2d=2, see [Chien, 1980]:

    BN0u=ν(Δu+2uτ2)τ(κτuτ)|ΩandBN1u=2uν2|Ω,B_{N}^{0}u=\frac{\partial}{\partial\nu}\left(\Delta u+\frac{\partial^{2}u}{\partial\tau^{2}}\right)-\frac{\partial}{\partial\tau}\left({\kappa_{\tau}}\frac{\partial u}{\partial\tau}\right)|_{\partial\Omega}~{}~{}\hbox{and}~{}~{}B_{N}^{1}u=\frac{\partial^{2}u}{\partial\nu^{2}}|_{\partial\Omega},

    with τ\tau being the anti-clockwise unit tangential vector, and κτ\kappa_{\tau} the curvature of Ω\partial\Omega.

We are now in a position to state the pure Neumann boundary value problem for the PDE operator (1.2) as follows.

{Lu=fin Ω,BNk(u)=0on Ω(0km1).\left\{\begin{array}[]{rccl}Lu&=&f&\mbox{in }\Omega,\\ B_{N}^{k}(u)&=&0&\mbox{on }\partial\Omega\quad(0\leq k\leq m-1).\end{array}\right. (5.7)

Combining the trace theorem for H^{m}(\Omega), see [Adams and Fournier, 2003], and Lemma 5.1, it is easy to see that (1.5) is equivalent to (5.7) with V=H^{m}(\Omega).

For a given parameter δ>0\delta>0, we next consider the following problem with mixed boundary condition:

{Luδ=fin Ω,BDk(uδ)+δBNk(uδ)=0, 0km1.\left\{\begin{aligned} Lu_{\delta}&=f\qquad\mbox{in }\Omega,\\ B_{D}^{k}(u_{\delta})+\delta B_{N}^{k}(u_{\delta})&=0,\ \ 0\leq k\leq m-1.\end{aligned}\right. (5.8)

It is easy to see that (5.8) is equivalent to the following problem: Find uδHm(Ω)u_{\delta}\in H^{m}(\Omega), such that

Jδ(uδ)=minvHm(Ω)Jδ(v).J_{\delta}(u_{\delta})=\min_{v\in H^{m}(\Omega)}J_{\delta}(v). (5.9)

where

Jδ(v)=12aδ(v,v)(f,v)J_{\delta}(v)={1\over 2}a_{\delta}(v,v)-(f,v) (5.10)

and

aδ(u,v)=a(u,v)+δ1k=0m1BDk(u),BDk(v)0,Ω.a_{\delta}(u,v)=a(u,v)+\delta^{-1}\sum_{k=0}^{m-1}\langle B_{D}^{k}(u),B_{D}^{k}(v)\rangle_{0,\partial\Omega}. (5.11)
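For instance, for m=1 and the model operator (5.5) (so that a(u,v)=\int_{\Omega}(\nabla u\cdot\nabla v+uv)\,dx and the only Dirichlet trace is B_{D}^{0}(u)=u|_{\partial\Omega}), the penalized form (5.11) reduces to

a_{\delta}(u,v)=\int_{\Omega}(\nabla u\cdot\nabla v+uv)\,dx+\delta^{-1}\int_{\partial\Omega}uv\,ds,

which is the classical penalty treatment of the homogeneous Dirichlet condition u=0 on \partial\Omega.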

In summary, we have

Lemma 5.2.

The following equivalences hold:

  1.

    u solves (5.1) or (5.7) if and only if u solves

    J(u)=minvVJ(v)J(u)=\min_{v\in V}J(v) (5.12)

    with V=H0m(Ω)V=H^{m}_{0}(\Omega) or V=Hm(Ω)V=H^{m}(\Omega),

  2.

    u_{\delta} solves (5.8) if and only if u_{\delta} solves

    Jδ(uδ)=minvVJδ(v)\displaystyle J_{\delta}(u_{\delta})=\min_{v\in V}J_{\delta}(v)

    with V=Hm(Ω)V=H^{m}(\Omega).

Lemma 5.3.

Assume that u\in V is the solution of (5.1) or (5.7) and u_{\delta}\in V is the solution of (5.9). Then the following identities hold:

{1\over 2}\|v-u\|_{a}^{2}=J(v)-J(u)\quad\forall v\in V, (5.13)

and

{1\over 2}\|v-u_{\delta}\|_{a,\delta}^{2}=J_{\delta}(v)-J_{\delta}(u_{\delta})\quad\forall v\in V. (5.14)

Here

va2=a(v,v),va,δ2=aδ(v,v).\|v\|_{a}^{2}=a(v,v),\quad\|v\|_{a,\delta}^{2}=a_{\delta}(v,v). (5.15)
Proof.

Let uu be the solution of (1.5). Given vVv\in V, consider the quadratic function of tt:

g(t)=J(u+t(vu)).g(t)=J(u+t(v-u)).

It is easy to see that

0=argmintg(t),g(0)=0,0=\arg\min_{t}g(t),\quad g^{\prime}(0)=0,

and

J(v)-J(u)=g(1)-g(0)=g^{\prime}(0)+{1\over 2}g^{\prime\prime}(0)={1\over 2}\|v-u\|_{a}^{2}.

This completes the proof of (5.13). The proof of (5.14) is similar. ∎

Lemma 5.4.

Let uu be the solution of (5.1) and uδu_{\delta} be the solution of (5.8). Then

uuδa,δδu2m,Ω.\displaystyle\|u-u_{\delta}\|_{a,\delta}\lesssim\sqrt{\delta}\|u\|_{2m,\Omega}. (5.16)
Proof.

Let w=uuδw=u-u_{\delta} and we have

{Lw=0in Ω,BDk(w)+δBNk(w)=δBNk(u), 0km1.\left\{\begin{aligned} Lw&=0\qquad\mbox{in }\Omega,\\ B_{D}^{k}(w)+\delta B_{N}^{k}(w)&=\delta B_{N}^{k}(u),\ \ 0\leq k\leq m-1.\end{aligned}\right. (5.17)

By Lemma 5.1, and (5.17), we have

0\displaystyle 0 =(Lw,w)\displaystyle=(Lw,w) (5.18)
=|α|=m(aααw,αw)k=0m1ΩBNk(w)BDk(w)𝑑s+(a0w,w)\displaystyle=\sum_{|\alpha|=m}(a_{\alpha}\partial^{\alpha}w,\partial^{\alpha}w)-\sum_{k=0}^{m-1}\int_{\partial\Omega}B_{N}^{k}(w)B_{D}^{k}(w)ds+(a_{0}w,w) (5.19)
=|α|=m(aααw,αw)+k=0m1Ω(δ1BDk(w)BNk(u))BDk(w)𝑑s+(a0w,w),\displaystyle=\sum_{|\alpha|=m}(a_{\alpha}\partial^{\alpha}w,\partial^{\alpha}w)+\sum_{k=0}^{m-1}\int_{\partial\Omega}(\delta^{-1}B_{D}^{k}(w)-B_{N}^{k}(u))B_{D}^{k}(w)ds+(a_{0}w,w), (5.20)

implying

a(w,w)+δ1k=0m1ΩBDk(w)2𝑑s=k=0m1ΩBNk(u)BDk(w)𝑑s.\displaystyle a(w,w)+\delta^{-1}\sum_{k=0}^{m-1}\int_{\partial\Omega}B_{D}^{k}(w)^{2}ds=\sum_{k=0}^{m-1}\int_{\partial\Omega}B_{N}^{k}(u)B_{D}^{k}(w)ds. (5.21)

By the Cauchy--Schwarz and Young inequalities, we have

a(w,w)+δ1k=0m1BDk(w)L2(Ω)2\displaystyle a(w,w)+\delta^{-1}\sum_{k=0}^{m-1}\|B_{D}^{k}(w)\|^{2}_{L^{2}(\partial\Omega)}\leq k=0m1BNk(u)L2(Ω)BDk(w)L2(Ω)\displaystyle\sum_{k=0}^{m-1}\|B_{N}^{k}(u)\|_{L^{2}(\partial\Omega)}\|B_{D}^{k}(w)\|_{L^{2}(\partial\Omega)} (5.22)
\displaystyle\leq 2δk=0m1BNk(u)L2(Ω)2+12δ1k=0m1BDk(w)L2(Ω)2,\displaystyle 2\delta\sum_{k=0}^{m-1}\|B_{N}^{k}(u)\|^{2}_{L^{2}(\partial\Omega)}+\frac{1}{2}\delta^{-1}\sum_{k=0}^{m-1}\|B_{D}^{k}(w)\|^{2}_{L^{2}(\partial\Omega)}, (5.23)

which implies

a(w,w)+12δ1k=0m1BDk(w)L2(Ω)22δk=0m1BNk(u)L2(Ω)2.\displaystyle a(w,w)+\frac{1}{2}\delta^{-1}\sum_{k=0}^{m-1}\|B_{D}^{k}(w)\|^{2}_{L^{2}(\partial\Omega)}\leq 2\delta\sum_{k=0}^{m-1}\|B_{N}^{k}(u)\|^{2}_{L^{2}(\partial\Omega)}. (5.24)

By the definition of a,δ\|\cdot\|_{a,\delta} and noting that w=uuδw=u-u_{\delta}, we have

uuδa,δ24δk=0m1BNk(u)L2(Ω)2.\displaystyle\|u-u_{\delta}\|_{a,\delta}^{2}\leq 4\delta\sum_{k=0}^{m-1}\|B_{N}^{k}(u)\|^{2}_{L^{2}(\partial\Omega)}. (5.25)

Combining this with (5.4) completes the proof. ∎

Lemma 5.5.

For any sms\geq-m and fHs(Ω)f\in H^{s}(\Omega), the solution uu of (5.1) or (5.7) satisfies uH2m+s(Ω)u\in H^{2m+s}(\Omega) and

u2m+s,Ωfs,Ω.\|u\|_{2m+s,\Omega}\lesssim\|f\|_{s,\Omega}. (5.26)

We refer to [Lions and Magenes, 2012] (Chapter 2, Theorem 5.1 therein) for a detailed proof.

Following from (5.26) and (2.46), we have

Lemma 5.6.

For any sms\geq-m, ϵ>0\epsilon>0, and fHs(Ω)f\in H^{s}(\Omega), the solution uu of (5.1) or (5.7) satisfies

\|u\|_{B^{m+1}(\Omega)}\lesssim\|f\|_{-m+\frac{d}{2}+1+\epsilon,\Omega}. (5.27)

5.2 The finite neuron method for (1.1) and error estimates

Let V_{N}\subset V be the subset of V defined by (3.3), which may not be a linear subspace. Consider the discrete problem of (5.12):

Find uNVN such that J(uN)=minvNVNJ(vN).\mbox{Find $u_{N}\in V_{N}$ such that }J(u_{N})=\min_{v_{N}\in V_{N}}J(v_{N}). (5.28)

It is easy to see that the solution to (5.28) always exists (for the deep neural network functions defined in § 4), but it may not be unique.
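To make the discrete problem (5.28) concrete, the following sketch (our own illustration; all variable and function names are hypothetical) evaluates the Ritz energy J(v_{N}) for the model case m=1, L=-\Delta+I with Neumann boundary conditions, using a shallow ReLU^{k} ansatz and plain Monte Carlo quadrature on \Omega=(0,1)^{2}. A gradient-based optimizer over the parameters (\beta,\omega,t) would then be applied to this energy; only its evaluation is shown, with \nabla v_{N} computed analytically.

```python
# Energy J(v_N) = 1/2 * int(|grad v_N|^2 + v_N^2) - int(f v_N) for a shallow
# ReLU^k ansatz v_N(x) = sum_j beta_j (omega_j . x - t_j)_+^k, estimated by
# Monte Carlo quadrature on the unit square.
import numpy as np

rng = np.random.default_rng(0)
d, k, N = 2, 2, 50

# parameters of the shallow network (randomly initialized)
beta = rng.normal(size=N) / N
omega = rng.normal(size=(N, d))
t = rng.uniform(0.0, 1.0, size=N)

def f(x):                                        # a sample right-hand side
    return np.cos(np.pi * x[:, 0]) * np.cos(np.pi * x[:, 1])

def vN_and_grad(x):                              # v_N(x) and grad v_N(x)
    z = x @ omega.T - t                          # (M, N)
    zk = np.maximum(z, 0.0) ** k
    zk1 = k * np.maximum(z, 0.0) ** (k - 1)
    v = zk @ beta                                # (M,)
    g = (zk1 * beta) @ omega                     # (M, d)
    return v, g

def energy(M=20000):                             # Monte Carlo estimate of J(v_N)
    x = rng.uniform(0.0, 1.0, size=(M, d))       # uniform points in Omega
    v, g = vN_and_grad(x)
    integrand = 0.5 * ((g ** 2).sum(axis=1) + v ** 2) - f(x) * v
    return integrand.mean()                      # |Omega| = 1

print("J(v_N) at the initial parameters:", energy())
```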

Theorem 5.7.

Let uVu\in V and uNVNu_{N}\in V_{N} be solutions to (5.7) and (5.28) respectively. Then

uuNa=infvNVNuvNa.\|u-u_{N}\|_{a}=\inf_{v_{N}\in V_{N}}\|u-v_{N}\|_{a}. (5.29)
Proof.

By Lemma 5.3, we have

{1\over 2}\|u_{N}-u\|_{a}^{2}=J(u_{N})-J(u)\leq J(v_{N})-J(u)={1\over 2}\|v_{N}-u\|_{a}^{2},\quad\forall v_{N}\in V_{N}.

The proof is completed. ∎

We obtain the following result.

Theorem 5.8.

Let uVu\in V and uNVNu_{N}\in V_{N} be solutions to (5.7) and (5.28) respectively. Then for arbitrary ϵ>0\epsilon>0, we have

uuNa(fL2(Ω)+fk+d2+1+ϵ){N121dm<k,N12m=k.\|u-u_{N}\|_{a}\lesssim(\|f\|_{L^{2}(\Omega)}+\|f\|_{-k+\frac{d}{2}+1+\epsilon})\begin{cases}N^{-{1\over 2}-{1\over d}}&m<k,\\ N^{-{1\over 2}}&m=k.\end{cases} (5.30)

The proof follows from (5.29), Theorem 3.9, the embedding of the Barron space into the Sobolev space (Lemma 2.5), and the regularity result (5.26).

Next we consider the discrete problem of (5.9):

Find uNVN such that Jδ(uN)=minvNVNJδ(vN).\mbox{Find $u_{N}\in V_{N}$ such that }J_{\delta}(u_{N})=\min_{v_{N}\in V_{N}}J_{\delta}(v_{N}). (5.31)
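Continuing the sketch given after (5.28) (again our own illustration, reusing the definitions vN_and_grad, energy and rng from that snippet), for m=1 the penalized energy J_{\delta} in (5.31) only adds the boundary term (2\delta)^{-1}\int_{\partial\Omega}v_{N}^{2}\,ds:

```python
# Penalized energy J_delta(v_N) = J(v_N) + (2*delta)^{-1} * int_{dOmega} v_N^2 ds
# for the unit square, with the boundary integral sampled uniformly on the
# four edges.  vN_and_grad, energy and rng come from the previous sketch.
import numpy as np

def boundary_points(M, rng):
    s = rng.uniform(0.0, 1.0, size=M)
    e = rng.integers(0, 4, size=M)               # which edge of the unit square
    x = np.empty((M, 2))
    x[e == 0] = np.column_stack([s[e == 0], np.zeros(np.sum(e == 0))])
    x[e == 1] = np.column_stack([s[e == 1], np.ones(np.sum(e == 1))])
    x[e == 2] = np.column_stack([np.zeros(np.sum(e == 2)), s[e == 2]])
    x[e == 3] = np.column_stack([np.ones(np.sum(e == 3)), s[e == 3]])
    return x

def energy_delta(delta, M=20000, Mb=5000):
    xb = boundary_points(Mb, rng)
    vb, _ = vN_and_grad(xb)
    perimeter = 4.0                               # |partial Omega| for (0,1)^2
    return energy(M) + (0.5 / delta) * perimeter * np.mean(vb ** 2)

print("J_delta(v_N) with delta = 1e-2:", energy_delta(1e-2))
```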
Lemma 5.9.

For any given \delta>0, let u_{\delta} be the solution of (5.8) and u_{N} be the solution of (5.31). We have

uNuδa,δ(1+δ12)infvNVNvNuδm,Ω.\|u_{N}-u_{\delta}\|_{a,\delta}\lesssim(1+\delta^{-\frac{1}{2}})\inf_{v_{N}\in V_{N}}\|v_{N}-u_{\delta}\|_{m,\Omega}. (5.32)
Proof.

First of all, by Lemma 5.3 and the variational property, it holds that

{1\over 2}\|u_{N}-u_{\delta}\|_{a,\delta}^{2}=J_{\delta}(u_{N})-J_{\delta}(u_{\delta})\leq J_{\delta}(v_{N})-J_{\delta}(u_{\delta})={1\over 2}\|v_{N}-u_{\delta}\|_{a,\delta}^{2},\quad\forall\,v_{N}\in V_{N}. (5.33)

Further, for any vNVNv_{N}\in V_{N}, by the definition of a,δ\|\cdot\|_{a,\delta} and trace inequality, we have

vNuδa,δvNuδm,Ω+δ12vNuδ0,Ω(1+δ12)vNuδm,Ω.\|v_{N}-u_{\delta}\|_{a,\delta}\lesssim\|v_{N}-u_{\delta}\|_{m,\Omega}+\delta^{-\frac{1}{2}}\|v_{N}-u_{\delta}\|_{0,\partial\Omega}\lesssim(1+\delta^{-\frac{1}{2}})\|v_{N}-u_{\delta}\|_{m,\Omega}.

This completes the proof. ∎

Lemma 5.10.

Let u be the solution of (5.1) and u_{N} the solution of (5.31). We have

uNua,δ(1+δ12)infvNVNvNum,Ω+δfL2(Ω).\|u_{N}-u\|_{a,\delta}\lesssim(1+\delta^{-\frac{1}{2}})\inf_{v_{N}\in V_{N}}\|v_{N}-u\|_{m,\Omega}+\sqrt{\delta}\|f\|_{L^{2}(\Omega)}. (5.34)
Proof.

First, by the triangle inequality and (5.33), for any v_{N}\in V_{N}, we have

uNua,δ\displaystyle\|u_{N}-u\|_{a,\delta} uNuδa,δ+uδua,δ\displaystyle\leq\|u_{N}-u_{\delta}\|_{a,\delta}+\|u_{\delta}-u\|_{a,\delta} (5.35)
vNuδa,δ+uδua,δ\displaystyle\leq\|v_{N}-u_{\delta}\|_{a,\delta}+\|u_{\delta}-u\|_{a,\delta} (5.36)
vNua,δ+2uδua,δ.\displaystyle\leq\|v_{N}-u\|_{a,\delta}+2\|u_{\delta}-u\|_{a,\delta}. (5.37)

Then by the definition of a,δ\|\cdot\|_{a,\delta}, trace inequality and (5.16), for any vNVNv_{N}\in V_{N}, we have

uNua,δ(1+δ12)vNum,Ω+δfL2(Ω).\|u_{N}-u\|_{a,\delta}\lesssim(1+\delta^{-\frac{1}{2}})\|v_{N}-u\|_{m,\Omega}+\sqrt{\delta}\|f\|_{L^{2}(\Omega)}. (5.38)

This completes the proof. ∎

Theorem 5.11.

Let u be the solution of (5.1) and u_{N} the solution of (5.31) with \delta\sim N^{-1/2-1/{d}}. Then

uuNa(fL2(Ω)+fk+d2+1+ϵ){N1412dm<k,N14m=k.\|u-u_{N}\|_{a}\lesssim(\|f\|_{L^{2}(\Omega)}+\|f\|_{-k+\frac{d}{2}+1+\epsilon})\begin{cases}N^{-{1\over 4}-{1\over{2d}}}&m<k,\\ N^{-{1\over 4}}&m=k.\end{cases} (5.39)
Proof.

Let us only consider the case that k>mk>m. By Theorem 3.9,

infvNVNuvNm,ΩN121duBm+1(Ω).\inf_{v_{N}\in V_{N}}\|u-v_{N}\|_{m,\Omega}\lesssim N^{-\frac{1}{2}-{1\over d}}\|u\|_{B^{m+1}(\Omega)}.

Thus, by (5.34),

\|u-u_{N}\|_{m,\Omega}\lesssim\|u_{N}-u\|_{a,\delta}\lesssim\delta^{-\frac{1}{2}}N^{-\frac{1}{2}-{1\over d}}\|u\|_{B^{m+1}(\Omega)}+\delta^{\frac{1}{2}}\|f\|_{L^{2}(\Omega)}\leq(\delta^{-\frac{1}{2}}N^{-\frac{1}{2}-{1\over d}}+\delta^{\frac{1}{2}})(\|u\|_{B^{m+1}(\Omega)}+\|f\|_{L^{2}(\Omega)}),\ \ \forall\,\delta>0. (5.40)

Set δN1/21/d\delta\sim N^{-1/2-1/{d}}, and it follows that

uuNm,ΩN1412d(uBm+1(Ω)+fL2(Ω)).\|u-u_{N}\|_{m,\Omega}\lesssim N^{-{1\over 4}-{1\over 2d}}(\|u\|_{B^{m+1}(\Omega)}+\|f\|_{L^{2}(\Omega)}). (5.41)

Now, by the embedding of the Barron space into the Sobolev space (Lemma 2.5) and the regularity result (5.26), the proof is completed. ∎

We note that (5.31) was studied in [E and Yu, 2018] for m=1 and k=3. The convergence analysis for (5.28) and (5.31) seems to be new in this paper. For other convergence analyses of DNN-based methods for numerical PDEs, we refer to [Shin et al., 2020] and [Mishra and Rusch, 2020, Mishra and Molinaro, 2020] for the convergence analysis of PINN (physics-informed neural networks).

6 Summary and discussions

In this paper, we consider a very special class of neural network functions based on ReLU^{k} as the activation function. This function class consists of piecewise polynomials which closely resemble finite element functions. For elliptic boundary value problems of 2m-th order in any dimension, it is still unknown in general how to construct H^{m}-conforming finite element spaces in the classic finite element setting. In contrast, it is rather straightforward to construct H^{m}-conforming piecewise polynomials using neural networks, known as the finite neuron method, and we further proved that the finite neuron method provides good approximation properties.

It is still a subject of debate and of further investigation whether it is practically efficient to use artificial neural networks for the numerical solution of partial differential equations. One major challenge for this type of method is that the resulting optimization problem is hard to solve, as we shall discuss below.

6.1 Solution of the non-convex optimization problem

Problem (5.28) or (5.31) is a highly nonlinear and non-convex optimization problem with respect to the parameters defining the functions in V_{N}, see (3.3). How to solve this type of optimization problem efficiently is a topic of intensive research in deep learning. For example, the stochastic gradient method is used in [E and Yu, 2018] to solve (5.31) for m=1 and k=3. The multi-scale deep neural network (MscaleDNN) [Liu et al., 2020] and the phase shift DNN (PhaseDNN) [Cai et al., 2019] are developed to convert high frequency components of the solution to low frequency ones before training. A randomized Newton's method is developed to train the neural network from a nonlinear computation point of view [Chen and Hao, 2019]. More refined algorithms still need to be developed to solve (5.28) or (5.31) with high accuracy so that the convergence order, (5.30) or (5.39), of the finite neuron method can be achieved.

6.2 Competitions between locality and global smoothness

One insight gained from the studies in this paper is that the challenge in constructing classic H^{m}-conforming finite element subspaces seems to lie in the competition between local d.o.f. (degrees of freedom) and global smoothness. In the classic finite element method, one needs to define d.o.f. on each element and then glue the local d.o.f. together to obtain a globally H^{m}-smooth function. This process has proven to be very difficult to realize in general when m\geq 2. But if we relax the locality, as in the Powell-Sabin element [Powell and Sabin, 1977], we can use piecewise polynomials of lower degree to construct globally smooth functions. The neural network approach studied in this paper can be considered as a global construction without any use of a grid in the first place (even though an implicitly defined grid exists). As a result, it is quite easy to construct globally smooth functions that are piecewise polynomials. It is quite remarkable that such a global construction leads to a function class that has very good approximation properties. This is an attractive property of the function classes from artificial neural networks. One natural question to ask is whether it is possible to develop finite element construction techniques that are more global than the classic finite element method but more local than the finite neuron method, which may be an interesting topic for further research.

| Local d.o.f. | Slightly more global | \cdots | Global |
| General grid | Special grid | \cdots | No grid |
| Conjecture: k=(m-1)2^{d}+1 | Powell-Sabin [Powell and Sabin, 1977], k=2 | \cdots | ReLU^{m}-DNN, k=m |
| True for d=1, m\geq 1 and d=2, m=2 (still open in general) | d=2, m=2 | \cdots | any d and m |

Observation: More global d.o.f. lead to easier construction of conforming elements for high order PDEs.

6.3 Piecewise PmP_{m} for Hm(Ω)H^{m}(\Omega): from finite element to finite neuron method

As noted above, in the classic finite element setting it is challenging to construct H^{m}-conforming finite element spaces for any m,d\geq 1. But if we relax the conformity, as shown in [Wang and Xu, 2013], it is possible to give a universal construction of convergent H^{m}-nonconforming finite elements consisting of piecewise polynomials of degree m. In the finite neuron method setting, by relaxing the constraints from the a priori given finite element grid, the construction of H^{m}-conforming piecewise polynomials of degree m becomes straightforward. In fact, the finite neuron method can be considered as a mesh-less method, or even a vertex-less method, although there is a hidden grid for any finite neuron function. This raises the question of whether it is possible to develop some "in-between" method that has the advantages of both the classic finite element method and the finite neuron method.

6.4 Adaptivity and spectral accuracy

One of the important properties of the traditional finite element method is its ability to locally adapt the finite element grids to provide accurate approximation of PDE solutions that may have local singularities (such as corner singularities and interface singularities). In contrast, the traditional spectral method (using high order polynomials) can provide very high order accuracy for solutions that are globally smooth. The finite neuron method analyzed in this paper seems to possess both the adaptivity feature of the traditional finite element method and the global spectral accuracy of traditional spectral methods. The adaptivity of the finite neuron method is expected since, as shown in § 4, the deep finite neuron method can recover locally adaptive finite element spaces for m=1. The spectral accuracy of the finite neuron method is illustrated in Theorem 4.2. As a result, it is conceivable that the finite neuron method may have both local and global adaptive features, or perhaps even adaptive features on all scales. Nevertheless, such highly adaptive features of the finite neuron method come at a potentially high price, namely the solution of a nonlinear and non-convex optimization problem.

6.5 Comparison with PINN

One important class of methods that is related to the FNM analyzed in this paper is the method of physics-informed neural networks (PINN) introduced in [Raissi et al., 2019]. By minimizing certain norms of the PDE residual together with penalizations of boundary conditions and other relevant quantities, PINN is a very general approach that can be directly applied to a wide range of problems. In comparison, the FNM can only be applied to special classes of problems that admit some special physical law such as the principle of energy minimization or the principle of least action, see [Feynman et al.]. Because of the special physical law associated with our underlying minimization problems, the Neumann boundary conditions are naturally enforced in the minimization problem and, unlike in the PINN method, no penalization is needed to enforce this type of boundary condition.

6.6 On the sharpness of the error estimates

In this paper, we provide a number of error estimates for our FNM, such as (3.23), (3.27) and (3.69), which give increasingly better asymptotic orders but also require more regularity. Even for a sufficiently regular solution u, the best asymptotic estimate (3.69) may still not be optimal. In the finite element method, piecewise polynomials of degree k usually give rise to increasingly better asymptotic errors as k increases. But the asymptotic rate in the estimate (3.69) does not improve as k increases. On the other hand, if k>j, the ReLU^{k}-DNN should conceivably give better accuracy than the ReLU^{j}-DNN, since ReLU^{j} can be approximated arbitrarily accurately by certain finite differences of ReLU^{k}. How to obtain better asymptotic estimates than (3.69) is still a subject of further investigation.

We also note that our error estimate (5.39) for the Dirichlet boundary condition is not as good as the estimate (5.30) for the Neumann boundary conditions. This is undesirable and may not be optimal. In comparison, the Nitsche trick does not suffer a loss of accuracy when used in the traditional finite element method.

6.7 Neural splines in multi-dimensions

The spline functions described in § 3.1 are widely used in scientific and engineering computing, but their generalization to multiple dimensions is non-trivial, especially when \Omega has a curved boundary. In [Hu and Zhang, 2015], using tensor products, the authors extended 1D splines to multiple dimensions on rectangular grids. Other approaches involve rational functions, such as NURBS [Cottrell et al., 2009]. But the generalization of {}_{n}{\mathcal{N}}_{1}^{k} or {}_{n}{\mathcal{N}}_{1}(b^{k}) to multiple dimensions is straightforward, and the resulting (nonlinear) space has very good approximation properties. It is conceivable that this neural network extension of B-splines to multiple dimensions, which is locally polynomial and globally smooth, may find useful applications in computer-aided design (CAD) and isogeometric analysis [Cottrell et al., 2009]. This is potentially an interesting research direction.

Acknowledgements

Main results in this manuscript were prepared for and reported in the "International Conference on Computational Mathematics and Scientific Computing" (August 17-20, 2020, http://lsec.cc.ac.cn/~iccmsc/Home.html), and the author is grateful for the invitation of the conference organizers and for the helpful feedback from the audience. The author also wishes to thank Limin Ma, Qingguo Hong and Shuo Zhang for their help in preparing this manuscript. This work was partially supported by the Verne M. William Professorship Fund from Penn State University and the National Science Foundation (Grant No. DMS-1819157).

References

  • [Adams and Fournier, 2003] Adams, R. A. and Fournier, J. (2003). Sobolev spaces, volume 140. Academic press.
  • [Alfeld, 1984] Alfeld, P. (1984). A trivariate Clough-Tocher scheme for tetrahedral data. Computer Aided Geometric Design, 1(2):169–181.
  • [Antonietti et al., 2018] Antonietti, P. F., Manzini, G., and Verani, M. (2018). The fully nonconforming virtual element method for biharmonic problems. Mathematical Models and Methods in Applied Sciences, 28(02):387–407.
  • [Argyris et al., 1968] Argyris, J. H., Fried, I., and Scharpf, D. W. (1968). The tuba family of plate elements for the matrix displacement method. The Aeronautical Journal, 72(692):701–709.
  • [Baker, 1977] Baker, G. A. (1977). Finite element methods for elliptic equations using nonconforming elements. Mathematics of Computation, 31(137):45–59.
  • [Barron, 1993] Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information theory, 39(3):930–945.
  • [Beirão da Veiga et al., 2013] Beirão da Veiga, L., Brezzi, F., Cangiani, A., Manzini, G., Marini, L. D., and Russo, A. (2013). Basic principles of virtual element methods. Mathematical Models and Methods in Applied Sciences, 23(01):199–214.
  • [Bickel and Freedman, 1984] Bickel, P. J. and Freedman, D. A. (1984). Asymptotic normality and the bootstrap in stratified sampling. The annals of statistics, pages 470–482.
  • [Boor and Devore, 1983] Boor, C. D. and Devore, R. (1983). Approximation by smooth multivariate splines. Transactions of the American Mathematical Society, 276:775–788.
  • [Boor and Höllig, 1983] Boor, C. D. and Höllig, K. (1983). Approximation order from bivariate c1c^{1}-cubics: A counterexample. Proceedings of the American Mathematical Society, 87(4):649–655.
  • [Bramble and Zlámal, 1970] Bramble, J. H. and Zlámal, M. (1970). Triangular elements in the finite element method. Mathematics of Computation, 24(112):809–820.
  • [Brenner and Sung, 2005] Brenner, S. C. and Sung, L. (2005). C0C^{0} interior penalty methods for fourth order elliptic boundary value problems on polygonal domains. Journal of Scientific Computing, 22(1-3):83–118.
  • [Brezzi et al., 2014] Brezzi, F., Falk, R. S., and Marini, L. D. (2014). Basic principles of mixed virtual element methods. ESAIM: Mathematical Modelling and Numerical Analysis, 48(4):1227–1240.
  • [Brezzi and Marini, 2013] Brezzi, F. and Marini, L. D. (2013). Virtual element methods for plate bending problems. Computer Methods in Applied Mechanics and Engineering, 253:455–462.
  • [Cai et al., 2019] Cai, W., Li, X., and Liu, L. (2019). A phase shift deep neural network for high frequency wave equations in inhomogeneous media. arXiv preprint arXiv:1909.11759.
  • [Chen and Huang, 2020] Chen, L. and Huang, X. (2020). Nonconforming virtual element method for 2m2mth order partial differential equations in Rn{R}^{n}. Mathematics of Computation, 89(324):1711–1744.
  • [Chen and Hao, 2019] Chen, Q. and Hao, W. (2019). A randomized Newton's method for solving differential equations based on the neural network discretization. arXiv preprint arXiv:1912.03196.
  • [Chien, 1980] Chien, W. Z. (1980). Variational methods and finite elements.
  • [Ciarlet, 1978] Ciarlet, P. G. (1978). The finite element method for elliptic problems. North-Holland.
  • [Clough and Tocher, 1965] Clough, R. and Tocher, J. (1965). Finite-element stiffness analysis of plate bending. In Proc. First Conf. Matrix Methods in Struct. Mech., Wright-Patterson AFB, Dayton (Ohio).
  • [Cottrell et al., 2009] Cottrell, J. A., Hughes, T. J., and Bazilevs, Y. (2009). Isogeometric analysis: toward integration of CAD and FEA. John Wiley & Sons.
  • [Cybenko, 1989] Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314.
  • [de Boor, 1971] de Boor, C. (1971). Subroutine package for calculating with B-splines. Los Alamos Scient. Lab. Report LA-4728-MS.
  • [E et al., 2017] E, W., Han, J., and Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349–380.
  • [E et al., 2019a] E, W., Ma, C., and Wu, L. (2019a). Barron spaces and the compositional function spaces for neural network models. arXiv preprint arXiv:1906.08039.
  • [E et al., 2019b] E, W., Ma, C., and Wu, L. (2019b). A priori estimates of the population risk for two-layer neural networks. Communications in Mathematical Sciences, 17(5):1407–1425.
  • [E and Wojtowytsch, 2020] E, W. and Wojtowytsch, S. (2020). Representation formulas and pointwise properties for Barron functions. arXiv preprint arXiv:2006.05982.
  • [E and Yu, 2018] E, W. and Yu, B. (2018). The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12.
  • [Feng, 1965] Feng, K. (1965). Finite difference schemes based on variational principles. Appl. Math. Comput. Math, 2:238–262.
  • [Feynman et al.] Feynman, R., Leighton, R., and Sands, M. The Feynman Lectures on Physics, 3 volumes, 1964, 1966. Library of Congress Catalog Card No. 63-20717.
  • [Fu et al., 2020] Fu, G., Guzmán, J., and Neilan, M. (2020). Exact smooth piecewise polynomial sequences on alfeld splits. Mathematics of Computation, 89(323):1059–1091.
  • [Funahashi, 1989] Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural networks. Neural networks, 2(3):183–192.
  • [Goodfellow et al., 2016] Goodfellow, I., Bengio, Y., Courville, A., and Bengio, Y. (2016). Deep learning, volume 1. MIT press Cambridge.
  • [Gudi and Neilan, 2011] Gudi, T. and Neilan, M. (2011). An interior penalty method for a sixth-order elliptic equation. IMA Journal of Numerical Analysis, 31(4):1734–1753.
  • [He et al., 2020a] He, J., Li, L., and Xu, J. (2020a). DNN with Heaviside, ReLU and ReQU activation functions. preprint.
  • [He et al., 2018] He, J., Li, L., Xu, J., and Zheng, C. (2018). Relu deep neural networks and linear finite elements. arXiv preprint arXiv:1807.03973.
  • [He et al., 2020b] He, J., Li, L., Xu, J., and Zheng, C. (2020b). ReLU deep neural networks and linear finite elements. Journal of Computational Mathematics, 38(3):502–527.
  • [Hornik et al., 1989] Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366.
  • [Hornik et al., 1994] Hornik, K., Stinchcombe, M., White, H., and Auer, P. (1994). Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Computation, 6(6):1262–1275.
  • [Hu and Zhang, 2015] Hu, J. and Zhang, S. (2015). The minimal conforming HkH^{k} finite element spaces on n\mathbb{R}^{n} rectangular grids. Mathematics of Computation, 84(292):563–579.
  • [Hu and Zhang, 2017] Hu, J. and Zhang, S. (2017). A canonical construction of HmH^{m}-nonconforming triangular finite elements. Annals of Applied Mathematics, 33(3):266–288.
  • [Jones, 1992] Jones, L. K. (1992). A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. The Annals of Statistics, 20(1):608–613.
  • [Klusowski and Barron, 2016] Klusowski, J. M. and Barron, A. R. (2016). Uniform approximation by neural networks activated by first and second order ridge splines. arXiv preprint arXiv:1607.07819.
  • [Lagaris et al., 1998] Lagaris, I. E., Likas, A., and Fotiadis, D. I. (1998). Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000.
  • [Lai and Schumaker, 2007] Lai, M. and Schumaker, L. L. (2007). Spline functions on triangulations. Number 110. Cambridge University Press.
  • [Li et al., 2019] Li, B., Tang, S., and Yu, H. (2019). Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. arXiv preprint arXiv:1903.05858.
  • [Lions and Magenes, 2012] Lions, J. L. and Magenes, E. (2012). Non-homogeneous boundary value problems and applications, volume 1. Springer Science & Business Media.
  • [Liu et al., 2020] Liu, Z., Cai, W., and Xu, Z. (2020). Multi-scale deep neural network (MscaleDNN) for solving the Poisson-Boltzmann equation in complex domains. arXiv preprint arXiv:2007.11207.
  • [Makovoz, 1996] Makovoz, Y. (1996). Random approximants and neural networks. Journal of Approximation Theory, 85(1):98–109.
  • [McCulloch and Pitts, 1943] McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133.
  • [Mhaskar and Micchelli, 1995] Mhaskar, H. and Micchelli, C. A. (1995). Degree of approximation by neural and translation networks with a single hidden layer. Advances in applied mathematics, 16(2):151–183.
  • [Mhaskar and Micchelli, 1994] Mhaskar, H. N. and Micchelli, C. A. (1994). Dimension-independent bounds on the degree of approximation by neural networks. IBM Journal of Research and Development, 38(3):277–284.
  • [Mishra and Molinaro, 2020] Mishra, S. and Molinaro, R. (2020). Estimates on the generalization error of physics informed neural networks (PINNs) for approximating PDEs II: A class of inverse problems. arXiv preprint arXiv:2007.01138.
  • [Mishra and Rusch, 2020] Mishra, S. and Rusch, T. K. (2020). Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences. arXiv preprint arXiv:2005.12564.
  • [Morley, 1967] Morley, L. S. D. (1967). The triangular equilibrium element in the solution of plate bending problems. RAE.
  • [Pinkus, 1999] Pinkus, A. (1999). Approximation theory of the MLP model in neural networks. Acta numerica, 8:143–195.
  • [Powell and Sabin, 1977] Powell, M. J. and Sabin, M. A. (1977). Piecewise quadratic approximations on triangles. ACM Transactions on Mathematical Software (TOMS), 3(4):316–325.
  • [Raissi et al., 2019] Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707.
  • [Rubinstein and Kroese, 2016] Rubinstein, R. Y. and Kroese, D. P. (2016). Simulation and the Monte Carlo method, volume 10. John Wiley & Sons.
  • [Schedensack, 2016] Schedensack, M. (2016). A new discretization for mmth-Laplace equations with arbitrary polynomial degrees. SIAM Journal on Numerical Analysis, 54(4):2138–2162.
  • [Siegel and Xu, 2020] Siegel, J. W. and Xu, J. (2020). Approximation rates for neural networks with general activation functions. Neural Networks.
  • [Sirignano and Spiliopoulos, 2018] Sirignano, J. and Spiliopoulos, K. (2018). DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364.
  • [Wang and Xu, 2013] Wang, M. and Xu, J. (2013). Minimal finite element spaces for 2m2m-th-order partial differential equations in n\mathbb{R}^{n}. Mathematics of Computation, 82(281):25–43.
  • [Wu and Xu, 2017] Wu, S. and Xu, J. (2017). PmP_{m} interior penalty nonconforming finite element methods for 2m-th order PDEs in n\mathbb{R}^{n}. arXiv preprint arXiv:1710.07678.
  • [Wu and Xu, 2019] Wu, S. and Xu, J. (2019). Nonconforming finite element spaces for 2m2m-th order partial differential equations on n\mathbb{R}^{n} simplicial grids when m=n+1m=n+1. Mathematics of Computation, 88(316):531–551.
  • [Xu, 1992] Xu, J. (1992). Iterative methods by space decomposition and subspace correction. SIAM review, 34(4):581–613.
  • [Ženíšek, 1970] Ženíšek, A. (1970). Interpolation polynomials on the triangle. Numerische Mathematik, 15(4):283–296.
  • [Zhang, 2009] Zhang, S. (2009). A family of 3D continuously differentiable finite elements on tetrahedral grids. Applied Numerical Mathematics, 59(1):219–233.