Email: cgduan.math@whu.edu.cn
Yuling Jiao: School of Mathematics and Statistics, and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, P.R. China. Email: yulingjiaomath@whu.edu.cn
Yanming Lai: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, P.R. China. Email: laiyanming@whu.edu.cn
Xiliang Lu: School of Mathematics and Statistics, and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, P.R. China. Email: xllv.math@whu.edu.cn
Qimeng Quan: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, P.R. China. Email: quanqm@whu.edu.cn
Jerry Zhijian Yang: School of Mathematics and Statistics, and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, P.R. China. Email: zjyang.math@whu.edu.cn
Analysis of Deep Ritz Methods for Laplace Equations with Dirichlet Boundary Condition
Abstract
Deep Ritz methods (DRM) have been proven numerically to be efficient in solving partial differential equations. In this paper, we present a convergence rate in the $H^1$ norm for deep Ritz methods for Laplace equations with Dirichlet boundary condition, where the error depends explicitly on the depth and width of the deep neural networks and on the number of samples. Furthermore, the depth and width of the networks can be chosen properly in terms of the number of training samples. The main idea of the proof is to decompose the total error of DRM into three parts: the approximation error, the statistical error and the error caused by the boundary penalty. We bound the approximation error in the $H^1$ norm with the employed networks and control the statistical error via Rademacher complexity. In particular, we derive a bound on the Rademacher complexity of the non-Lipschitz composition of gradient norm and network, which is of independent interest. We also analyze the error induced by the boundary penalty method and give a prior rule for tuning the penalty parameter.
Keywords: deep Ritz methods · convergence rate · Dirichlet boundary condition · approximation error · Rademacher complexity
MSC: 65C20
1 Introduction
Partial differential equations (PDEs) are one of the fundamental mathematical models for studying a variety of phenomena arising in science and engineering. Many conventional numerical methods have been successfully established for solving low-dimensional PDEs, in particular the finite element method brenner2007mathematical ; ciarlet2002finite ; Quarteroni2008Numerical ; Thomas2013Numerical ; Hughes2012the . However, one encounters difficulties in both theoretical analysis and numerical implementation when extending conventional numerical schemes to high-dimensional PDEs. The classical analysis of convergence, stability and other properties becomes troublesome due to the complex construction of the finite element space ciarlet2002finite ; brenner2007mathematical . Moreover, in terms of practical computation, the scale of the discrete problem increases exponentially with respect to the dimension.
Motivated by the well-known fact that deep learning methods for high-dimensional data analysis have achieved great success in discriminative, generative and reinforcement learning he2015delving ; Goodfellow2014Generative ; silver2016mastering , solving high-dimensional PDEs with deep neural networks has become a highly promising approach and has attracted much attention Cosmin2019Artificial ; Justin2018DGM ; DeepXDE ; raissi2019physics ; Weinan2017The ; Yaohua2020weak ; Berner2020Numerically ; Han2018solving . Roughly speaking, these works can be divided into three categories. The first category uses deep neural networks to improve classical numerical methods, see for example Kiwon2020Solver ; Yufei2020Learning ; hsieh2018learning ; Greenfeld2019Learning . In the second category, neural operators are introduced to learn mappings between infinite-dimensional spaces with neural networks Li2020Advances ; anandkumar2020neural ; li2021fourier . In the last category, one utilizes deep neural networks to approximate the solutions of PDEs directly, including physics-informed neural networks (PINNs) raissi2019physics , the deep Ritz method (DRM) Weinan2017The and weak adversarial networks (WAN) Yaohua2020weak . PINNs are based on residual minimization for solving PDEs Cosmin2019Artificial ; Justin2018DGM ; DeepXDE ; raissi2019physics . Proceeding from the variational form, Weinan2017The ; Yaohua2020weak ; Xu2020finite propose neural-network based methods related to the classical Ritz and Galerkin methods. In Yaohua2020weak , WAN are proposed, inspired by the Galerkin method. Based on the Ritz method, Weinan2017The proposes the DRM to solve variational problems corresponding to a class of PDEs.
1.1 Related works and contributions
The idea of using neural networks to solve PDEs goes back to the 1990s Isaac1998Artificial ; Dissanayake1994neural . Although there have been great empirical achievements in recent years, a challenging and interesting question is to provide a rigorous error analysis, as is available for the finite element method. Several recent efforts have been devoted to making progress along this line, see for example e2020observations ; Luo2020TwoLayerNN ; Mishra2020EstimatesOT ; Mller2021ErrorEF ; lu2021priori ; hong2021rademacher ; Shin2020ErrorEO ; Wang2020WhenAW ; e2021barron . In Luo2020TwoLayerNN , the least squares minimization method with two-layer neural networks is studied; the optimization error under the assumption of over-parametrization and the generalization error without the over-parametrization assumption are analyzed. In lu2021priori ; Xu2020finite , generalization error bounds for two-layer neural networks are derived by assuming that the exact solutions lie in a spectral Barron space.
The Dirichlet boundary condition corresponds to a constrained minimization problem, which may cause difficulties in computation. The penalty method has been applied in finite element methods and finite volume methods Babuska1973The ; Maury2009Numerical . It has also been used in deep PDE solvers Weinan2017The ; raissi2019physics ; Xu2020finite since it is not easy to construct a network with given values on the boundary. In this work, we also apply the penalty method to DRM with ReLU$^2$ activation functions and obtain the corresponding error estimates. The main contributions are listed as follows:
• We derive a bound on the approximation error of deep networks in the $H^1$ norm, which is of independent interest, see Theorem 3.2: for any prescribed accuracy, there exists a network, with depth and width depending explicitly on the accuracy and on the dimension, that attains this accuracy in the $H^1$ norm.
• We establish a bound on the statistical error in DRM with the tools of pseudo-dimension. In particular, we bound the Rademacher complexity of the non-Lipschitz composition of gradient norm and network by calculating the pseudo-dimension of networks with both ReLU and ReLU$^2$ activation functions, see Theorem 3.3. The technique used here is also helpful for bounding the statistical errors of other deep PDE solvers.
• We give an upper bound on the error caused by the Robin approximation without additional assumptions, i.e., we bound the error between the minimizer of the penalized formulation and the weak solution of the Laplace equation, see Theorem 3.4. This result improves the ones established in Mller2021ErrorEF ; muller2020deep ; hong2021rademacher .
• Based on the above error bounds we establish a nonasymptotic convergence rate of the deep Ritz method for the Laplace equation with Dirichlet boundary condition. We prove that if the depth, the width and the penalty parameter are chosen properly in terms of the number of training samples on both the domain and the boundary, the desired convergence rate holds. Our theory sheds light on how to choose the topological structure of the employed networks and how to tune the penalty parameter to achieve the desired convergence rate in terms of the number of training samples.
Recently, Mller2021ErrorEF ; muller2020deep also studied the convergence of DRM with Dirichlet boundary condition via the penalty method. However, the results derived in Mller2021ErrorEF ; muller2020deep are quite different from ours. Firstly, the approximation results in Mller2021ErrorEF ; muller2020deep are based on the approximation error of ReLU networks in Sobolev norms established in guhring2019error ; however, such networks may not be suitable for solving PDEs. In this work, we derive an upper bound on the approximation error of the employed networks in the $H^1$ norm, which is of independent interest. Secondly, to analyze the error caused by the penalty term, Mller2021ErrorEF ; muller2020deep assume some additional conditions, whereas we do not need such conditions to bound the error induced by the penalty. Lastly, we provide a convergence rate analysis involving the statistical error caused by the finite samples used in SGD training, while Mller2021ErrorEF ; muller2020deep do not consider the statistical error at all. Moreover, to bound the statistical error we need to control the Rademacher complexity of the non-Lipschitz composition of gradient norm and network; this technique can be useful for bounding the statistical errors of other deep PDE solvers.
The rest of this paper is organized as follows. In Section 2 we briefly describe the model problem and recall some standard properties of PDEs and variational problems. We also introduce some notation for deep Ritz methods as preliminaries. Section 3 is devoted to the detailed analysis of the convergence rate of the deep Ritz method with penalty, where the various error estimates are analyzed rigorously one by one and the main results on the convergence rate are presented. Some concluding remarks and discussions are given in Section 4.
2 Preliminaries
Consider the following elliptic equation with zero-boundary condition
(1)
where is a bounded open subset of , , and . Moreover, we suppose the coefficient satisfies a.e. Without loss of generality, we assume . Define the bilinear form
(2)
and the corresponding quadratic energy functional by
(3)
Lemma 1
Evans2010PartialDE The unique weak solution of (1) is the unique minimizer of over . Moreover, .
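For concreteness, the following is a minimal LaTeX sketch of the bilinear form, the energy and the statement of Lemma 1, written under the assumption that the model problem (1) has the form $-\Delta u + w\,u = f$ in $\Omega$ with $u = 0$ on $\partial\Omega$ and a bounded nonnegative coefficient $w$; this precise form and the symbols $\mathcal{B}$, $\mathcal{E}$ are illustrative assumptions.
\[
\mathcal{B}[u,v]=\int_{\Omega}\bigl(\nabla u\cdot\nabla v+w\,uv\bigr)\,dx,
\qquad
\mathcal{E}(u)=\tfrac{1}{2}\,\mathcal{B}[u,u]-\int_{\Omega}fu\,dx,
\]
\[
u^{*}=\operatorname*{arg\,min}_{v\in H^{1}_{0}(\Omega)}\mathcal{E}(v)
\quad\Longleftrightarrow\quad
\mathcal{B}[u^{*},v]=\int_{\Omega}fv\,dx\ \ \text{for all } v\in H^{1}_{0}(\Omega).
\]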
Now we introduce the Robin approximation of (1) with penalty parameter as below
(4)
Similarly, we define the bilinear form
and the corresponding quadratic energy functional with boundary penalty
(5)
where denotes the trace operator.
Lemma 2
The unique weak solution of (4) is the unique minimizer of over . Moreover, .
Proof
See Appendix A.1.
From the perspective of infinite dimensional optimization, can be seen as the penalized version of . The following lemma provides the relationship between the minimizers of them.
Lemma 3
The minimizer of the penalized problem (5) converges to in as .
Proof
This result follows from Proposition 2.1 in Maury2009Numerical directly.
The deep Ritz method can be divided into three steps. First, one uses a deep neural network to approximate the trial function. A deep neural network is defined by
where , and the activation functions may be different for different . The depth and the width of neural networks are defined as
is called the number of units of , and is called the free parameters of the network.
Definition 1
The class is the collection of neural networks satisfying the following conditions:
(i) the depth and width are and , respectively;
(ii) the function values and the squared norm of are bounded by ;
(iii) the activation functions are given by , where is the (multi-)index.
For example, is the class of networks with activation functions as , and is that with activation functions as and . We may simply use if there is no confusion.
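For concreteness, the following is a minimal sketch of a fully connected network with prescribed depth and width; the squared-ReLU activation ReQU and the helper name make_network are illustrative assumptions rather than notation from the text.

import torch
import torch.nn as nn

class ReQU(nn.Module):
    """Squared ReLU activation sigma(x) = max(0, x)^2 (an assumed choice)."""
    def forward(self, x):
        return torch.relu(x) ** 2

def make_network(dim_in: int, width: int, depth: int) -> nn.Sequential:
    """Fully connected network with `depth` hidden layers of size `width` and scalar output."""
    layers = [nn.Linear(dim_in, width), ReQU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), ReQU()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)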
Second, one uses the Monte Carlo method to discretize the energy functional. We rewrite (5) as
(6)
where and are the uniform distributions on the domain and on the boundary, respectively. We now introduce the discrete version of (5) and replace by a neural network , as follows
(7)
We denote the minimizer of (7) over as , that is
(8)
where the samples in the domain and on the boundary are drawn i.i.d. from the respective uniform distributions.
Finally, we choose an algorithm for solving the optimization problem, and denote by the solution returned by the optimizer .
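Putting the three steps together, the following sketch (reusing make_network from the sketch above) minimizes a Monte Carlo version of the penalized energy (7) with a stochastic optimizer. The model problem on the unit cube, the samplers, the penalty weight beta, the placeholder right-hand side f, and the omission of the volume and surface normalization constants are all illustrative assumptions.

import torch

def sample_interior(n, d):
    # uniform samples in (0, 1)^d; gradients with respect to x are needed for |grad u|^2
    return torch.rand(n, d, requires_grad=True)

def sample_boundary(n, d):
    # uniform samples on the boundary of (0, 1)^d: pick a face, then snap one coordinate
    x = torch.rand(n, d)
    face = torch.randint(0, d, (n,))
    x[torch.arange(n), face] = torch.randint(0, 2, (n,)).float()
    return x

def ritz_loss(net, f, w, beta, n_in, n_bd, d):
    x = sample_interior(n_in, d)
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    interior = (0.5 * (grad_u ** 2).sum(dim=1, keepdim=True)
                + 0.5 * w * u ** 2 - f(x) * u).mean()
    boundary = 0.5 * beta * (net(sample_boundary(n_bd, d)) ** 2).mean()
    return interior + boundary

d, w, beta = 2, 1.0, 100.0
net = make_network(d, width=20, depth=3)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
f = lambda x: torch.ones(x.shape[0], 1)   # placeholder data
for step in range(1000):
    opt.zero_grad()
    ritz_loss(net, f, w, beta, n_in=1024, n_bd=256, d=d).backward()
    opt.step()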
3 Error Analysis
In this section we carry out the convergence rate analysis for DRM with deep networks. The following theorem plays an important role in decoupling the total error into four types of errors.
Theorem 3.1
Proof
Given , we can decompose its distance to the weak solution of (1) using the triangle inequality:
(9)
First, we decouple the first term into three parts. For any , we have
Since can be any element in , we take the infimum of
(10)
For any , set , then
where the last equality comes from the fact that is the minimizer of (5). Therefore
that is
(11)
Combining (10) and (11), we obtain
(12)
Substituting (12) into (9), it is evident that the theorem holds.
The approximation error describes the expressive power of the networks in the $H^1$ norm; it corresponds to the approximation error in FEM controlled by Céa's lemma ciarlet2002finite . The statistical error is caused by the Monte Carlo discretization of (5) into (7). The optimization error indicates the performance of the solver we utilize; it corresponds to the error of solving linear systems in FEM. In this paper we consider the scenario of perfect training, in which the optimization error is zero. The error caused by the boundary penalty is the distance between the minimizer of the energy with zero boundary condition and the minimizer of the penalized energy.
3.1 Approximation error
Theorem 3.2
Assume ; then there exists a network with depth and width satisfying
such that
where is a generic constant and is a constant depending only on .
Proof
Our proof is based on some classical approximation results of B-splines schumaker2007spline ; de1978practical . Let us recall some notation and useful results. We denote by the dyadic partition of , i.e.,
where . The cardinal B-spline of order with respect to partition is defined by
which can be rewritten in the following equivalent form,
(13)
The multivariate cardinal B-spline of order is defined by the product of univariate cardinal B-splines of order , i.e.,
Denote
Then, the elements of are piecewise polynomial functions with respect to the partition , with each piece being of degree and belonging to . Since
We can further denote
The following approximation result for cardinal B-splines in Sobolev spaces, which is a direct consequence of Theorem 3.4 in schultz1969approximation , plays an important role in the proof of this theorem.
Lemma 4
Assume , there exists with such that
where is a constant depending only on .
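As a quick numerical companion to (13) and Lemma 4, the sketch below evaluates the univariate cardinal B-spline through the standard truncated-power form B_k(x) = (1/(k-1)!) * sum_{j=0}^{k} (-1)^j C(k, j) (x - j)_+^{k-1} and checks two basic properties; the exact normalization in (13) is assumed to match this standard definition from schumaker2007spline .

import math
import numpy as np

def cardinal_bspline(x, k):
    """Cardinal B-spline of order k, supported on [0, k], via truncated powers."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for j in range(k + 1):
        out += (-1) ** j * math.comb(k, j) * np.maximum(x - j, 0.0) ** (k - 1)
    return out / math.factorial(k - 1)

# sanity checks: nonnegativity and partition of unity of the integer translates
xs = np.linspace(-1.0, 5.0, 601)
assert np.all(cardinal_bspline(xs, 3) >= -1e-12)
ts = np.linspace(1.0, 2.0, 11)
assert np.allclose(sum(cardinal_bspline(ts - j, 3) for j in range(-3, 4)), 1.0)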
Lemma 5
The multivariate B-spline can be implemented exactly by a network with depth and width .
Proof
Denote
as the activation function of the network. By the definition in (13), it is clear that can be implemented by the network without any error with depth and width . On the other hand, the network can also realize multiplication without any error. In fact, for any ,
Hence the multivariate B-spline of order can be implemented exactly by the network with depth and width .
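The exact realization of squares and products used in this proof can be checked numerically. The sketch assumes the activation is the squared ReLU sigma(x) = max(0, x)^2, so that x^2 = sigma(x) + sigma(-x) and xy = ((x + y)^2 - x^2 - y^2)/2; the specific activation is an assumption consistent with the exact-multiplication claim above.

import numpy as np

def relu2(x):
    return np.maximum(x, 0.0) ** 2

def square(x):
    # x^2 realized with two ReLU^2 units
    return relu2(x) + relu2(-x)

def product(x, y):
    # x * y realized with six ReLU^2 units via polarization
    return 0.5 * (square(x + y) - square(x) - square(y))

rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)
assert np.allclose(square(x), x ** 2)
assert np.allclose(product(x, y), x * y)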
3.2 Statistical error
In this section, we bound the statistical error
For simplicity of presentation, we use to denote the upper bound of , and suppose , that is
First, we need to decompose the statistical error into four parts, and estimate each one.
Lemma 6
where
Proof
It is easy to verify this by the triangle inequality.
Given i.i.d. samples drawn from the underlying distributions, we need the following notion of Rademacher complexity to measure the capacity of a given function class restricted to the random samples.
Definition 2
The Rademacher complexity of a set is defined as
where the are i.i.d. Rademacher variables taking values ±1 with equal probability. The Rademacher complexity of a function class associated with the random samples is defined as
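Definition 2 can be made concrete with a small Monte Carlo experiment: for a finite function class evaluated on a fixed sample, the empirical Rademacher complexity is approximated by averaging the supremum of the signed empirical means over random sign vectors. The toy class of threshold functions and all names below are illustrative assumptions.

import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=0):
    """values[k, i] = f_k(X_i) on a fixed sample; estimates E_sigma sup_k (1/n) sum_i sigma_i f_k(X_i)."""
    rng = np.random.default_rng(seed)
    num_f, n = values.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)
        total += np.max(values @ sigma) / n
    return total / n_draws

# toy class: indicators 1{x <= t} evaluated on a uniform sample
n = 200
X = np.random.default_rng(1).uniform(size=n)
thresholds = np.linspace(0.0, 1.0, 51)
values = (X[None, :] <= thresholds[:, None]).astype(float)
print(empirical_rademacher(values))   # small, roughly of order sqrt(log(n)/n)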
For the sake of simplicity, we deal with the last three terms first.
Lemma 7
Suppose that , is -Lipschitz continuous on for all . Let be classes of functions on and . Then
Proof
Corollary 3.17 in ledoux2013probability .
Lemma 8
Proof
Suppose . Define
According to the symmetrization method, we have
(16)
We now turn to the most difficult term in Lemma 6. Since the gradient is not a Lipschitz operation, Lemma 7 does not apply and we cannot bound the Rademacher complexity in the same way.
Lemma 9
Proof
Based on the symmetrization method, we have
(17)
The proof of (17) is a direct consequence of the following claim.
Claim: Let be a function implemented by a network with depth and width . Then can be implemented by a - network with depth and width .
Denote and as and , respectively. As long as we show that each partial derivative can be implemented by a - network respectively, we can easily obtain the network we desire, since, and the square function can be implemented by .
Now we show that for any , can be implemented by a - network. We deal with the first two layers in detail, since they differ slightly from the other layers, and apply induction for layers . For the first layer, since , we have for any
Hence can be implemented by a - network with depth and width . For the second layer,
Since and can be implemented by two - subnetworks, respectively, and the multiplication can also be implemented by
we conclude that can be implemented by a - network. We have
and
Thus .
Now we apply induction for layers . For the third layer,
Since
and
we conclude that can be implemented by a - network and , .
We assume that can be implemented by a - network and , . For the th layer,
Since
and
we conclude that can be implemented by a - network and , .
Hence we derive that can be implemented by a - network and , . Finally we obtain that , .
We are now in a position to bound the Rademacher complexity of and . To obtain the estimates, we need to introduce the covering number, VC-dimension and pseudo-dimension, and to recall several of their properties.
Definition 3
Suppose that . For any , let be a -cover of with respect to the distance , that is, for any , there exists a such that , where is defined by
The covering number is defined to be the minimum cardinality among all -covers of with respect to the distance .
Definition 4
Suppose that is a class of functions from to . Given a sample , is defined by
The uniform covering number is defined by
Next we give an upper bound in terms of the covering number by using Dudley's entropy formula dudley . We first need the following lemma.
Lemma 10 (Massart’s finite class lemma boucheron2013concentration )
For any finite set with diameter , it holds that
We give an upper bound in terms of the covering number by using Dudley's entropy formula dudley .
Lemma 11 (Dudley’s entropy formula dudley )
Assume and the diameter of is less than , i.e., . Then
Proof
By definition
Thus, it suffices to show
by conditioning on . Given a positive integer , let , . Let be a -cover of whose cardinality equals the covering number . Then, by definition, there exists such that
Moreover, we denote by the best approximation of in with respect to . Then,
Since , and the diameter of is smaller than , we can choose such that the third term in the above display vanishes. By Hölder’s inequality, we deduce that the first term can be bounded by as follows.
Let . Then by definition, the number of elements in and satisfying
And the diameter of denoted as can be bounded as
Then,
where we use the triangle inequality in the first inequality and Lemma 10 in the second inequality. Putting all the above estimates together, we get
where the last inequality holds since, for , we can choose to be the largest integer such that ; in this case
Definition 5
Let be a set of functions from to . Suppose that . We say that is shattered by if for any , there exists a satisfying
Definition 6
The VC-dimension of , denoted as , is defined to be the maximum cardinality among all sets shattered by .
The VC-dimension reflects the capability of a class of functions to perform binary classification of points: the larger the VC-dimension is, the stronger this capability is. For more discussion of the VC-dimension, readers are referred to anthony2009neural .
For real-valued functions, the concept of VC-dimension can be generalized to the pseudo-dimension anthony2009neural .
Definition 7
Let be a set of functions from to . Suppose that . We say that is pseudo-shattered by if there exists such that for any , there exists a satisfying
and we say that witnesses the shattering.
Definition 8
The pseudo-dimension of , denoted as , is defined to be the maximum cardinality among all sets pseudo-shattered by .
The following lemma shows a relationship between the uniform covering number and the pseudo-dimension.
Lemma 12
Let be a set of real functions from a domain to the bounded interval . Let . Then
which is less than for .
Proof
See Theorem 12.2 in anthony2009neural .
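To see how Lemma 11 and Lemma 12 are combined later in the statistical error bound, the sketch below plugs a covering-number bound of the assumed form log N(eps) <= P * log(e*n*B/(eps*P)) (with P the pseudo-dimension and B the uniform bound, one common way of stating Theorem 12.2 in anthony2009neural ) into a numerical chaining integral; the constants in dudley_bound and the values of P and B are illustrative.

import numpy as np

def log_cover(eps, n, P, B):
    # assumed covering bound: log N(eps) <= P * log(e * n * B / (eps * P))
    return P * np.log(np.e * n * B / (eps * P))

def dudley_bound(n, P, B, num=2000):
    # chaining bound of the form inf_delta { 4*delta + (12/sqrt(n)) int_delta^{B/2} sqrt(log N(eps)) d eps }
    best = np.inf
    for delta in np.linspace(1e-3, B / 2, 200):
        eps = np.linspace(delta, B / 2, num)
        integrand = np.sqrt(np.maximum(log_cover(eps, n, P, B), 0.0))
        best = min(best, 4 * delta + 12.0 / np.sqrt(n) * np.trapz(integrand, eps))
    return best

for n in (10**3, 10**4, 10**5):
    print(n, dudley_bound(n, P=500, B=10.0))   # decays roughly like sqrt(P * log(n) / n)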
We now present the pseudo-dimension bounds for and .
Lemma 13
Let be polynomials with variables of degree at most . If , then
Proof
See Theorem 8.3 in anthony2009neural .
Lemma 14
Let be a set of functions such that
(i) each function can be implemented by a neural network with depth no more than and width no more than , and
(ii) the activation function in each unit is either the ReLU or the ReLU$^2$ function.
Then
Proof
The argument follows from the proof of Theorem 6 in bartlett2019nearly . The result stated here is somewhat stronger than Theorem 6 in bartlett2019nearly since .
We consider a new set of functions:
It is clear that . We now bound the VC-dimension of . Denoting by the total number of parameters (weights and biases) in the neural network implementing functions in , in our case we want to derive a uniform bound for
over all and . Actually the maximum of over all and is the growth function . In order to apply Lemma 13, we partition the parameter space into several subsets to ensure that in each subset is a polynomial with respect to without any breakpoints. In fact, our partition is exactly the same as the partition in bartlett2019nearly . Denote the partition as with some integer satisfying
(18)
where and denote the number of units at the th layer and the total number of parameters at the inputs to units in all the layers up to layer of the neural network implementing functions in , respectively. See bartlett2019nearly for the construction of the partition. Obviously we have
(19)
Note that is a polynomial with respect to with degree the same as the degree of , which is equal to as shown in bartlett2019nearly . Hence by Lemma 13, we have
(20)
Combining the above estimates yields
We then have
since the maximum of over all and is the growth function . By the same algebra as in the proof of Theorem 6 in bartlett2019nearly , we obtain
where refers to the number of units of the neural network implementing functions in .
With the help of the above preparations, the statistical error can be bounded by a tedious but straightforward calculation.
Theorem 3.3
Let and be the depth and width of the network respectively, then
where is the number of training samples on both the domain and the boundary.
Proof
In order to apply Lemma 11, we need to handle the term
where in the first inequality we use Lemma 12. Now we calculate the integral. Set
then . Denote , . And
Choosing , by Lemma 11 and the above display, we get for both and there holds
(21)
Then by Lemmas 6, 8, 9, 11 and equation (21), we have
Plugging the upper bound of derived in Lemma 12 into the above display and using the relationship of depth and width between and , we get
(22)
3.3 Error from the boundary penalty method
Although Lemma 3 shows the convergence of the Robin problem (4) as
it says nothing about the convergence rate. In this section, we consider the error from the boundary penalty method. Roughly speaking, we bound the distance between the minimizer and with respect to the penalty parameter .
Proof
Following the idea proposed in Maury2009Numerical (proof of Proposition 2.3), we proceed to prove this theorem. For , we introduce
(23)
Given such that , we set . Due to , it follows that
(24)
where depends only on , and . Hence (23) can be written as
where the second equality comes from the fact that
(25)
Since , is also the minimizer of over . Recalling (24), we obtain the estimate of
Now that is coercive, we arrive at
In hong2021rademacher , an error bound was proved that is suboptimal compared with the result derived here. In Mller2021ErrorEF ; muller2020deep , the bound was proved under some unverifiable conditions.
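The effect of the boundary penalty can also be observed on a one-dimensional toy computation: the sketch below discretizes a penalized Ritz energy by linear finite elements and compares its minimizer with a known zero-boundary solution as the penalty weight grows. The model problem -u'' + u = f on (0, 1), the manufactured solution sin(pi x), and all names are illustrative assumptions, not the setting of Theorem 3.4.

import numpy as np

def penalized_solution(N, beta):
    """Minimizer of int 1/2 (u'^2 + u^2) - f u dx + beta/2 (u(0)^2 + u(1)^2)
    for f = (pi^2 + 1) sin(pi x), whose zero-boundary solution is sin(pi x)."""
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    # P1 stiffness matrix with free (natural) boundary treatment
    K = np.diag(2.0 * np.ones(N + 1)) - np.diag(np.ones(N), 1) - np.diag(np.ones(N), -1)
    K[0, 0] = K[N, N] = 1.0
    K /= h
    # lumped mass for the zeroth-order term and the load vector
    m = h * np.ones(N + 1)
    m[0] = m[N] = h / 2.0
    f = (np.pi ** 2 + 1.0) * np.sin(np.pi * x)
    A = K + np.diag(m)
    A[0, 0] += beta
    A[N, N] += beta
    return x, np.linalg.solve(A, m * f)

for beta in (10.0, 100.0, 1000.0):
    x, u = penalized_solution(N=400, beta=beta)
    print(beta, np.max(np.abs(u - np.sin(np.pi * x))))   # error shrinks as beta grows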
3.4 Convergence rate
Note that the approximation error and the statistical error behave oppositely to the error from the penalty as the penalty parameter varies. Hence, there must be a trade-off in choosing a proper penalty parameter.
Theorem 3.5
Let be the weak solution of (1) with bounded , . Let be the minimizer of the discrete version of the associated Robin energy with penalty parameter . Given the number of training samples on the domain and the boundary, there is a network with depth and width given by
such that
Furthermore, for
it holds that
Proof
Combining Theorem 3.1, Theorem 3.2 and Theorem 3.3, we obtain by taking
Using Theorem 3.1 and Theorem 3.4, it holds that for all
(26)
We have derived the error estimate for fixed , and we are now in a position to find a proper and obtain the convergence rate. Since (26) holds for any , we take the infimum over :
By taking
we can obtain
4 Conclusions and Extensions
This paper provided a convergence rate analysis of deep Ritz methods for Laplace equations with Dirichlet boundary condition. Specifically, our study sheds light on how to set the depth and width of the networks and how to set the penalty parameter to achieve the desired convergence rate in terms of the number of training samples. The approximation error of deep networks is estimated in the $H^1$ norm. The statistical error is handled via the Rademacher complexity of the non-Lipschitz composition of gradient norm and network. We also analyzed the error from the boundary penalty method.
There are several interesting directions for further research. First, the current analysis can be extended to general second order elliptic equations with other boundary conditions. Second, the approximation and statistical error bounds derived here can be used for studying nonasymptotic convergence rates of residual-based methods such as PINNs. Finally, similar results may be applicable to deep Ritz methods for optimal control problems and inverse problems.
Acknowledgements.
Y. Jiao is supported in part by the National Science Foundation of China under Grant 11871474 and by the research fund of KLATASDSMOE of China. X. Lu is partially supported by the National Science Foundation of China (No. 11871385), the National Key Research and Development Program of China (No. 2018YFC1314600) and the Natural Science Foundation of Hubei Province (No. 2019CFA007), and by the research fund of KLATASDSMOE of China. J. Yang was supported by NSFC (Grant No. 12125103, 12071362), the National Key Research and Development Program of China (No. 2020YFA0714200) and the Natural Science Foundation of Hubei Province (No. 2019CFA007).
Appendix A
A.1 Proof of Lemma 2
We claim that is coercive on . In fact,
where is the constant from the Poincaré inequality Gilbarg1983Elliptic . Thus, there exists a unique weak solution such that
We can check that is the unique minimizer of by standard arguments.
We now study the regularity of weak solutions of (4). For the following discussion, we first introduce several useful classical results on second order elliptic equations from Evans2010PartialDE ; Gilbarg1983Elliptic .
Lemma 15
Assume , , and is sufficiently smooth. Suppose that is a weak solution of the elliptic boundary-value problem
Then and there exists a positive constant , depending only on and , such that
Proof
See Evans2010PartialDE .
Lemma 16
Assume , , and is sufficiently smooth. Suppose that is a weak solution of the elliptic boundary-value problem
Then and there exists a positive constant , depending only on and , such that
Proof
See Gilbarg1983Elliptic .
Lemma 17
Assume , , is sufficiently smooth and . Let be the weak solution of the following Robin problem
(27)
Then and there exists a positive constant independent of such that
Proof
We follow the idea proposed in Costabel1996ASP in a slightly different context. We first estimate the trace . We define the Dirichlet-to-Neumann map
where satisfies in , then
Now we are going to show that is a positive definite operator in . We notice that the variational formulation of (27) can be read as follows:
Taking , then we have
This means that is a positive definite operator in , and further, is bounded. We have the estimate
(28)
We rewrite the Robin problem (27) as follows
By Lemma 16 we have
(29)
With the help of the above lemmas, we now turn to proving the regularity properties of the weak solution.
Theorem A.1
Assume , . Suppose that is a weak solution of the boundary-value problem (4). If is sufficiently smooth, then , and we have the estimate
where the constant depends only on and .
Proof
We decompose (4) into two equations
(30)
(31)
and obtain the solution of (4)
Applying Lemma 15 to (30), we have
(32)
where depends on and . Using Lemma 17, it is easy to obtain
(33)
where the last inequality follows from the trace theorem. Combining (32) and (33), the desired estimation can be derived by triangle inequality.
References
- (1) Anandkumar, A., Azizzadenesheli, K., Bhattacharya, K., Kovachki, N., Li, Z., Liu, B., Stuart, A.: Neural operator: Graph kernel network for partial differential equations. In: ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (2020)
- (2) Anitescu, C., Atroshchenko, E., Alajlan, N., Rabczuk, T.: Artificial neural network methods for the solution of second order boundary value problems. Cmc-computers Materials & Continua 59(1), 345–359 (2019)
- (3) Anthony, M., Bartlett, P.L.: Neural network learning: Theoretical foundations. cambridge university press (2009)
- (4) Babuska, I.: The finite element method with penalty. Mathematics of Computation (1973)
- (5) Bartlett, P.L., Harvey, N., Liaw, C., Mehrabian, A.: Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks. J. Mach. Learn. Res. 20(63), 1–17 (2019)
- (6) Berner, J., Dablander, M., Grohs, P.: Numerically solving parametric families of high-dimensional kolmogorov partial differential equations via deep learning. In: H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 16615–16627. Curran Associates, Inc. (2020)
- (7) Boucheron, S., Lugosi, G., Massart, P.: Concentration inequalities: A nonasymptotic theory of independence. Oxford university press (2013)
- (8) Brenner, S., Scott, R.: The mathematical theory of finite element methods, vol. 15. Springer Science & Business Media (2007)
- (9) Ciarlet, P.G.: The finite element method for elliptic problems. SIAM (2002)
- (10) Costabel, M., Dauge, M.: A singularly perturbed mixed boundary value problem. Communications in Partial Differential Equations 21 (1996)
- (11) De Boor, C., De Boor, C.: A practical guide to splines, vol. 27. springer-verlag New York (1978)
- (12) Dissanayake, M., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Communications in Numerical Methods in Engineering 10(3), 195–201 (1994)
- (13) Dudley, R.: The sizes of compact subsets of hilbert space and continuity of gaussian processes. Journal of Functional Analysis 1(3), 290–330 (1967). DOI https://doi.org/10.1016/0022-1236(67)90017-1. URL https://www.sciencedirect.com/science/article/pii/0022123667900171
- (14) E, W., Ma, C., Wu, L.: The Barron space and the flow-induced function spaces for neural network models (2021)
- (15) E, W., Wojtowytsch, S.: Some observations on partial differential equations in Barron and multi-layer spaces (2020)
- (16) Evans, L.C.: Partial differential equations, second edition (2010)
- (17) Gühring, I., Kutyniok, G., Petersen, P.: Error bounds for approximations with deep relu neural networks in norms (2019)
- (18) Gilbarg, D., Trudinger, N.: Elliptic partial differential equations of second order, 2nd ed. Springer (1998)
- (19) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. Advances in Neural Information Processing Systems 3 (2014). DOI 10.1145/3422622
- (20) Greenfeld, D., Galun, M., Basri, R., Yavneh, I., Kimmel, R.: Learning to optimize multigrid PDE solvers. In: K. Chaudhuri, R. Salakhutdinov (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 2415–2423. PMLR (2019)
- (21) Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115(34), 8505–8510 (2018). DOI 10.1073/pnas.1718942115. URL https://www.pnas.org/content/115/34/8505
- (22) He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp. 1026–1034 (2015)
- (23) Hong, Q., Siegel, J.W., Xu, J.: Rademacher complexity and numerical quadrature analysis of stable neural networks with applications to numerical pdes (2021)
- (24) Hsieh, J.T., Zhao, S., Eismann, S., Mirabella, L., Ermon, S.: Learning neural pde solvers with convergence guarantees. In: International Conference on Learning Representations (2018)
- (25) Hughes, T.J.: The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Courier Corporation (2012)
- (26) Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Networks 9(5), 987–1000 (1998). URL https://doi.org/10.1109/72.712178
- (27) Ledoux, M., Talagrand, M.: Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media (2013)
- (28) Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Stuart, A., Bhattacharya, K., Anandkumar, A.: Multipole graph neural operator for parametric partial differential equations. In: H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 6755–6766. Curran Associates, Inc. (2020). URL https://proceedings.neurips.cc/paper/2020/file/4b21cf96d4cf612f239a6c322b10c8fe-Paper.pdf
- (29) Li, Z., Kovachki, N.B., Azizzadenesheli, K., liu, B., Bhattacharya, K., Stuart, A., Anandkumar, A.: Fourier neural operator for parametric partial differential equations. In: International Conference on Learning Representations (2021)
- (30) Lu, J., Lu, Y., Wang, M.: A priori generalization analysis of the deep ritz method for solving high dimensional elliptic equations (2021)
- (31) Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: Deepxde: A deep learning library for solving differential equations. CoRR abs/1907.04502 (2019). URL http://arxiv.org/abs/1907.04502
- (32) Luo, T., Yang, H.: Two-layer neural networks for partial differential equations: Optimization and generalization theory. ArXiv abs/2006.15733 (2020)
- (33) Maury, B.: Numerical analysis of a finite element/volume penalty method. Siam Journal on Numerical Analysis 47(2), 1126–1148 (2009)
- (34) Mishra, S., Molinaro, R.: Estimates on the generalization error of physics informed neural networks (pinns) for approximating pdes. ArXiv abs/2007.01138 (2020)
- (35) Müller, J., Zeinhofer, M.: Deep ritz revisited. In: ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (2020)
- (36) Müller, J., Zeinhofer, M.: Error estimates for the variational training of neural networks with boundary penalty. ArXiv abs/2103.01007 (2021)
- (37) Quarteroni, A., Valli, A.: Numerical Approximation of Partial Differential Equations, vol. 23. Springer Science & Business Media (2008)
- (38) Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707 (2019)
- (39) Schultz, M.H.: Approximation theory of multivariate spline functions in sobolev spaces. SIAM Journal on Numerical Analysis 6(4), 570–582 (1969)
- (40) Schumaker, L.: Spline functions: basic theory. Cambridge University Press (2007)
- (41) Shin, Y., Zhang, Z., Karniadakis, G.: Error estimates of residual minimization using neural networks for linear pdes. ArXiv abs/2010.08019 (2020)
- (42) Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of go with deep neural networks and tree search. nature 529(7587), 484–489 (2016)
- (43) Sirignano, J.A., Spiliopoulos, K.: Dgm: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, 1339–1364 (2018)
- (44) Thomas, J.: Numerical Partial Differential Equations: Finite Difference Methods, vol. 22. Springer Science & Business Media (2013)
- (45) Um, K., Brand, R., Fei, Y.R., Holl, P., Thuerey, N.: Solver-in-the-loop: Learning from differentiable physics to interact with iterative pde-solvers. In: H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 6111–6122. Curran Associates, Inc. (2020)
- (46) Wang, S., Yu, X., Perdikaris, P.: When and why pinns fail to train: A neural tangent kernel perspective. ArXiv abs/2007.14527 (2020)
- (47) Wang, Y., Shen, Z., Long, Z., Dong, B.: Learning to discretize: Solving 1d scalar conservation laws via deep reinforcement learning. Communications in Computational Physics 28(5), 2158–2179 (2020)
- (48) Weinan, E., Yu, B.: The deep ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 6(1), 1–12 (2017)
- (49) Xu, J.: Finite neuron method and convergence analysis. Communications in Computational Physics 28(5), 1707–1745 (2020)
- (50) Zang, Y., Bao, G., Ye, X., Zhou, H.: Weak adversarial networks for high-dimensional partial differential equations. Journal of Computational Physics 411, 109409 (2020)