Error Analysis of Physics-Informed Neural Networks for Approximating Dynamic PDEs of Second Order in Time
Abstract
We consider the approximation of a class of dynamic partial differential equations (PDE) of second order in time by the physics-informed neural network (PINN) approach, and provide an error analysis of PINN for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the tanh activation function, the PINN approximation errors for the solution field, its time derivative and its gradient field can be effectively bounded by the training loss and the number of training data points (quadrature points). Our analyses further suggest new forms for the training loss function, which contain certain residuals that are crucial to the error estimate but would be absent from the canonical PINN loss formulation. Adopting these new forms for the loss function leads to a variant PINN algorithm. We present ample numerical experiments with the new PINN algorithm for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation, which show that the method can capture the solution well.
Keywords: physics informed neural network; neural network; error estimate; PDE; scientific machine learning
1 Introduction
Deep neural networks (DNN) have achieved great success in a number of fields in science and engineering LeCun2015DP , such as natural language processing, robotics, computer vision, and speech and image recognition, to name but a few. This has inspired a great deal of research effort in the past few years to adapt such techniques to scientific computing. DNN-based techniques seem particularly promising for problems in higher dimensions, e.g. high-dimensional partial differential equations (PDE), since traditional numerical methods for high-dimensional problems can quickly become infeasible due to the exponential increase in computational effort (the so-called curse of dimensionality). Under these circumstances deep-learning algorithms can be helpful. In particular, the neural-network approach to PDE problems provides implicit regularization and can alleviate, and perhaps overcome, the curse of dimensionality Beck2019Machine ; Berner2020Analysis . This approach also provides a natural framework for estimating unknown parameters Fang2020NN ; Raissi2019pinn ; Raissi2018Hidden ; Thuerey2020Deep ; Wang2017pinn .
As deep neural networks are universal function approximators, it is natural to employ them as ansatz spaces for solutions of (ordinary or partial) differential equations. This paves the way for their use in physical modeling and scientific computing and gives rise to the field of scientific machine learning Karniadakisetal2021 ; SirignanoS2018 ; Raissi2019pinn ; EY2018 ; Lu2021DeepXDE . The physics-informed neural network (PINN) approach was introduced in Raissi2019pinn . It has been successfully applied to a variety of forward and inverse PDE problems and has become one of the most commonly-used methods in scientific machine learning (see e.g. Raissi2019pinn ; HeX2019 ; CyrGPPT2020 ; JagtapKK2020 ; WangL2020 ; JagtapK2020 ; CaiCLL2020 ; Tartakovskyetal2020 ; DongN2021 ; TangWL2021 ; DongL2021 ; CalabroFS2021 ; WanW2022 ; FabianiCRS2021 ; KrishnapriyanGZKM2021 ; DongY2022 ; DongY2022rm ; WangYP2022 ; Pateletal2022 ; DongW2022 ; Siegeletal2022 ; HuLWX2022 ; Penwardenetal2023 , among others). The references Karniadakisetal2021 ; Cuomo2022Scientific provide a comprehensive review of the literature on PINN and about the benefits and drawbacks of this approach.
The mathematical foundation for PINN aiming at the approximation of PDE solution is currently an active area of research. It is important to account for different components of the neural-network error: optimization error, approximation error, and estimation error Niyogi1999Generalization ; Shin2020On . Approximation error refers to the discrepancy between the exact functional map and the neural network mapping function on a given network architecture Calin2020Deep ; Elbrachter2021deep . Estimation error arises when the network is trained on a finite data set to get a mapping on the target domain. The generalization error is the combination of approximation and estimation errors and defines the accuracy of the neural-network predicted solution trained on the given set of data.
Theoretical understanding of PINN has been advanced by a number of recent works. In Shin2020On Shin et al. rigorously justify why PINN works and show its consistency for linear elliptic and parabolic PDEs under certain assumptions. These results are extended in Shin2010.08019 to a general abstract framework for analyzing PINN for linear problems, with the loss function formulated in terms of the strong or weak form of the equations. In Mishra2022Estimates Mishra and Molinaro provide an abstract framework for PINN applied to forward PDE problems, and estimate the generalization error by means of the training error and the number of training data points. This framework is extended in Mishra2022inverse to study several inverse PDE problems, including the Poisson, heat, wave and Stokes equations. Bai and Koley Bai2021PINN investigate the PINN approximation of nonlinear dispersive PDEs such as the KdV-Kawahara, Camassa-Holm and Benjamin-Ono equations. In Biswas2022Error Biswas et al. provide explicit error estimates (in suitable norms) and stability analyses for the incompressible Navier–Stokes equations. Zerbinati Zerbinati2022pinns presents PINN as an under-determined point-matching collocation method, reveals its connection with the Galerkin Least Squares (GALS) method, and establishes an a priori error estimate for elliptic problems.
An important theoretical result on the approximation errors from the recent work DeRyck2021On establishes that a feed-forward neural network with a tanh activation function and two hidden layers can approximate a function with an explicit error bound in a Sobolev norm.
Here , is the dimension of the problem, is the number of training points, and are explicitly known constants independent of . Based on this result, De Ryck et al. 2023_IMA_Mishra_NS have studied the PINN for the Navier–Stokes equations and shown that a small training error implies a small generalization error. Furthermore, Hu et al. Ruimeng2209.11929 provide higher-order (spatial Sobolev norm) error estimates for the primitive equations, which improve the existing results in the PINN literature that only involve $L^2$ errors. In DeRyck2022Estimates it has been shown that, with a sufficient number of randomly chosen training points, the total error can be bounded by the generalization error for Kolmogorov-type PDEs, which in turn is bounded by the training error. It is proved that the size of the PINN and the number of training samples only increase polynomially with the problem dimension, thus enabling PINN to overcome the curse of dimensionality in this case. In Mishra2021pinn the authors investigate the high-dimensional radiative transfer equation and prove that the generalization error is bounded by the training error and the number of training points, where the upper bound depends on the dimension only through a logarithmic factor. Hence PINN does not suffer from the curse of dimensionality, provided that the training errors do not depend on the underlying dimension.
Although PINN has been widely used for approximating PDEs, theoretical investigations of its convergence and errors are still quite limited and are largely confined to elliptic and parabolic PDEs. There seems to be little theoretical analysis of the convergence of PINN for hyperbolic-type PDEs. In this paper, we consider a class of dynamic PDEs of second order in time, which are hyperbolic in nature, and provide an analysis of the convergence and errors of the PINN algorithm applied to such problems. We have focused on the wave equation, the Sine-Gordon equation and the linear elastodynamic equation in our analyses. Building upon the results of DeRyck2021On ; 2023_IMA_Mishra_NS on neural networks with two hidden layers, we have shown that for these three kinds of PDEs:
• The underlying PDE residuals in PINN can be made arbitrarily small with neural networks having two hidden layers.
• The total error of the PINN approximation is bounded by the generalization error of PINN.
• The total error of the PINN approximations for the solution field, its time derivative and its gradient is bounded by the training error (training loss) of PINN and the number of quadrature points (training data points).
Furthermore, our theoretical analyses suggest PINN training loss functions for these PDEs that are somewhat different in form from the canonical PINN formulation. The differences lie in two aspects: (i) Our analyses require certain residual terms (such as the gradient of the initial condition, the time derivative of the boundary condition, or, in the case of the linear elastodynamic equation, the strain and divergence of the initial condition) in the training loss, which would be absent from the canonical PINN formulation of the loss function. (ii) Our analyses may require, depending on the type of boundary conditions, a norm other than the commonly-used $L^2$ norm for certain boundary residuals in the training loss, which differs from the canonical PINN formulation of the loss function.
These new forms for the training loss function suggested by the theoretical analyses lead to a variant PINN algorithm. We have implemented the PINN algorithm based on these new forms of the training loss function for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Ample numerical experiments based on this algorithm have been presented. The simulation results indicate that the method has captured the solution field reasonably well for these PDEs. The numerical results also to some extent corroborate the theoretical relation between the approximation error and the PINN training loss obtained from the error analysis.
The rest of this paper is organized as follows. In Section 2 we present an overview of PINN for dynamic PDEs of second order in time. In Sections 3, 4 and 5, we present an error analysis of the PINN algorithm for approximating the wave equation, Sine-Gordon equation, and the linear elastodynamic equation, respectively. Section 6 summarizes a set of numerical experiments with these three PDEs to supplement and support our theoretical analyses. Section 7 concludes the presentation with some closing remarks. Finally, the appendix (Section 8) recalls some auxiliary results for our analysis and provides the proofs of the main theorems in Sections 4 and 5.
2 Physics Informed Neural Networks (PINN) for Approximating PDEs
2.1 Generic PDE of Second Order in Time
Consider a compact domain ( being an integer), and let and denote the differential and boundary operators. We consider the following general form of an initial boundary value problem with a generic PDE of second order in time. For any , and ,
(1a)–(1c)
Here, is the unknown field solution, denotes the boundary data, and and are the initial distributions for and . We assume that the highest derivative with respect to the time variable appearing in the differential operator, if any, is of first order.
2.2 Neural Network Representation of a Function
Let denote an activation function that is at least twice continuously differentiable. For any and , we define , where () are the components of . We adopt the following formal definition for a feedforward neural network as given in 2023_IMA_Mishra_NS .
Definition 2.1 (2023_IMA_Mishra_NS ).
Let , and . Let be a twice differentiable function and define
(2)
For , we define and by for , and we define by
(5)
Denote the function that satisfies for all that
(6)
We set and for approximating the PDE problem (1).
as defined above is the neural-network representation of a parameterized function associated with the parameter . This neural network contains layers (), with widths for each layer. The input layer has a width , and the output layer has a width . The layers between the input/output layers are the hidden layers, with widths (). and are the weight/bias coefficients corresponding to layer for . From layer to layer the network logic represents an affine transform, followed by a function composition with the activation function . Note that no activation function is applied to the output layer. We refer to with (i.e. single hidden layer) as a shallow neural network, and with (i.e. multiple hidden layers) as a deeper or deep neural network.
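As a concrete illustration of Definition 2.1, the following is a minimal PyTorch sketch of a feed-forward network with two hidden tanh layers and a linear output layer; the input/output dimensions and layer widths shown here are placeholders rather than the values used in the experiments of Section 6.

```python
import torch
import torch.nn as nn

class FeedForwardNN(nn.Module):
    """Feed-forward network: two hidden tanh layers, linear output layer."""
    def __init__(self, d_in=2, d_out=2, width1=50, width2=50):
        super().__init__()
        self.hidden1 = nn.Linear(d_in, width1)    # affine map of the first hidden layer
        self.hidden2 = nn.Linear(width1, width2)  # affine map of the second hidden layer
        self.output = nn.Linear(width2, d_out)    # no activation on the output layer
        self.act = nn.Tanh()

    def forward(self, x):
        x = self.act(self.hidden1(x))
        x = self.act(self.hidden2(x))
        return self.output(x)

# Example: evaluate the network on a batch of (x, t) points.
net = FeedForwardNN(d_in=2, d_out=2)
xt = torch.rand(16, 2)   # 16 random space-time points in [0, 1]^2
print(net(xt).shape)     # torch.Size([16, 2])
```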
2.3 Physics Informed Neural Network for Initial/Boundary Value Problem
Let and be the spatial-temporal domain. We approximate the solution to the problem (1) by a neural network . With PINN we consider the residual functions of the initial/boundary value problem (1), defined for any sufficiently smooth function as, for any , and ,
(7a)–(7d)
These residuals characterize how well a given function satisfies the initial/boundary value problem (1). If is the exact solution, these residuals vanish identically.
To facilitate the subsequent analyses, we introduce an auxiliary function and rewrite as
(8)
We reformulate (1a) into two equations, thus separating the interior residual into the following two components:
(9)–(10)
With PINN, we seek a neural network to minimize the following quantity,
(11)
The different terms of (11) may be rescaled by different weights (penalty coefficients). For simplicity, we set all these weights to one in the analysis. The quantity defined above is often referred to as the generalization error. Because of the integrals involved therein, it can be hard to minimize directly. In practice, one approximates (11) by an appropriate numerical quadrature rule, as follows
(12)
where
(13a)–(13e)
The quadrature points in the spatial-temporal domain and on the spatial and temporal boundaries, , and , constitute the input data sets to the neural network. In the above equations, the quantity defined in (12) is referred to as the training error (or training loss), and are suitable quadrature weights for , and . PINN attempts to minimize the training error over the network parameters , and upon convergence of the optimization the trained network contains the approximation of the solution to the problem (1).
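To illustrate how a training loss of the form (12)–(13) is assembled in practice, the sketch below simply averages squared residual values over the interior, spatial-boundary and temporal-boundary training points (i.e. equal quadrature weights); the residual functions passed in are placeholders, not the paper's implementation.

```python
import torch

def training_loss(res_int, res_sb, res_tb, x_int, x_sb, x_tb):
    """Quadrature approximation of the generalization error:
    mean of squared residuals over each training point set (equal weights)."""
    loss_int = (res_int(x_int) ** 2).mean()  # interior residual term
    loss_sb = (res_sb(x_sb) ** 2).mean()     # spatial-boundary residual term
    loss_tb = (res_tb(x_tb) ** 2).mean()     # temporal-boundary (initial) residual term
    return loss_int + loss_sb + loss_tb

# Example with dummy residual functions on random training points.
x_int, x_sb, x_tb = torch.rand(1000, 3), torch.rand(200, 3), torch.rand(200, 3)
r = lambda z: z.sum(dim=1)  # placeholder residual function
print(training_loss(r, r, r, x_int, x_sb, x_tb))
```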
Remark 2.2.
The generalization error (11) (with the corresponding training error (12)) is the standard (canonical) PINN form if one introduces and reformulates (1a) into two equations. We would like to emphasize that our analyses below suggest alternative forms for the generalization error, e.g.
(14)
which differs from (11) in the terms , and the last term. The corresponding training error is,
(15)
where
(16)
The error analyses also suggest additional terms in the generalization error for different equations.
2.4 Numerical Quadrature Rules
As discussed above, we need to approximate the integrals of functions. The analysis in the subsequent sections requires well-known results on numerical quadrature rules as reviewed below.
Given and a function , we would like to approximate . A quadrature rule provides an approximation by
(17)
where () are the quadrature points and () denote the appropriate quadrature weights. The approximation accuracy is influenced by the type of quadrature rule, the number of quadrature points (), and the regularity of . For the mid-point rule, which is assumed in the analysis in the current work, the approximation accuracy is given by
(18)
where ( denotes ) and has been partitioned into cubes and () denote the midpoints of these cubes DavisR2007 . In this paper, we use to denote a universal constant, which may depend on and but not on . And we use the subscript to emphasize its dependence when necessary, e.g. is a constant depending only on .
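For reference, a straightforward implementation of the mid-point rule on the unit cube is sketched below; for a twice continuously differentiable integrand the quadrature error decays like $N^{-2/d}$ in the total number of points $N$, consistent with (18).

```python
import itertools
import numpy as np

def midpoint_rule(f, d, m):
    """Mid-point rule on [0,1]^d: partition into m^d cubes of side 1/m,
    evaluate f at the cube midpoints, and weight each point by 1/m^d."""
    centers = (np.arange(m) + 0.5) / m
    total = 0.0
    for idx in itertools.product(centers, repeat=d):
        total += f(np.array(idx))
    return total / m**d

# Example: integrate sin(pi*x)*sin(pi*y) over [0,1]^2 (exact value 4/pi^2).
f = lambda x: np.sin(np.pi * x[0]) * np.sin(np.pi * x[1])
print(midpoint_rule(f, d=2, m=20), 4 / np.pi**2)
```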
We focus on PDE problems in relatively low dimensions () in this paper and employ the standard quadrature rules. We note that in higher dimensions the standard quadrature rules may not be favorable. In this case the random training points or low-discrepancy training points Mishra2021Enhancing may be preferred.
In subsequent sections we focus on three representative dynamic equations of second order in time (the wave equation, Sine-Gordon equation, and the linear elastodynamic equation), and provide the error estimate for approximating these equations by PINN. We note that these analyses suggest alternative forms for the training loss function that are somewhat different from the standard PINN forms Raissi2019pinn . The PINN numerical results based on the standard form for the loss function, and based on the alternative forms as suggested by the error estimate, will be provided after the presentation of the theoretical analysis. In what follows, for brevity we adopt the notation of , (), for any sufficiently smooth function .
3 Physics Informed Neural Networks for Approximating Wave Equation
3.1 Wave Equation
Consider the following wave equations on the torus with periodic boundary conditions:
(19a)–(19f)
The regularity results for linear evolution equations of second order in time have been studied in the book Temam1997Infinite . When the self-adjoint operator takes , the linear evolution equations of second order in time reduce to the classical wave equations, and we can obtain the following regularity results.
Lemma 3.1.
Let , , and , then there exists a unique solution to the classical wave equations such that and .
Lemma 3.2.
Let , , and with , then there exists and a classical solution to the wave equations such that , , and .
Proof.
By Lemma 3.1, there exists and the solution to the wave equations such that , , and . As , is a Banach algebra.
For , since , and , we have and . Then, it implies that and .
For , by , we have and . Then, it implies that and .
Repeating the same argument, we have and . Then, applying the Sobolev embedding theorem and , it holds and for . Therefore, and . ∎
3.2 Physics Informed Neural Networks
We would like to approximate the solutions to the problem (19) with PINN. We seek deep neural networks and , parameterized by , that approximate the solution and of (19). Define residuals,
(20a)–(20f)
Note that for the exact solution . Let and be the space-time domain. With PINN, we minimize the following generalization error,
(21)
The form of different terms in this expression will become clearer below.
To complete the PINN formulation, we will choose the training set based on suitable quadrature points. We divide the full training set into the following three components:
• Interior training points for , with each .
• Spatial boundary training points for , with each .
• Temporal boundary training points for , with each .
We define the PINN training loss, , as follows,
(22)
where
(23a)–(23h)
Here the quadrature points in space-time constitute the data sets , and , and are suitable quadrature weights with denoting , or .
Let
denote the difference between the solution to the wave equations and the PINN approximation of the solution. We define the total error of the PINN approximation by
(24)
3.3 Error Analysis
In light of the wave equations (19) and the definitions for different residuals (20), we have
(25a)–(25f)
3.3.1 Bound on the Residuals
Theorem 3.3.
Let , , with . Let , and with . For every integer , there exist neural networks and , each with two hidden layers, of widths at most and , such that
(26a)–(26c)
3.3.2 Bounds on the Total Approximation Error
We next show that the total error is also small when the generalization error is small with the PINN approximation . Then we prove that the total error can be arbitrarily small, provided that the training error is sufficiently small and the sample set is sufficiently large.
Theorem 3.4.
Let , and be the classical solution to the wave equations (19). Let and denote the PINN approximation with parameter . Then the following relation holds,
(29)
where
Proof.
By taking the inner product of (25a) and (25b) with and and integrating over , respectively, we have
(30)–(31)
Here, we have used .
By adding (30) to (31), we have
(32)
Integrating (32) over for any and applying the Cauchy–Schwarz inequality, we obtain
We apply the integral form of the Grönwall inequality to the above inequality to get
where
Then, we integrate the above inequality over to yield (29). ∎
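For completeness, the integral form of Grönwall's inequality invoked in this proof can be stated as follows (a standard formulation; the particular functions it is applied to are those in the inequality above):

```latex
% Integral form of Gronwall's inequality: if y >= 0 satisfies
%   y(t) <= a(t) + \int_0^t b(s) y(s) ds  for t in [0, T],
% with b >= 0 integrable and a nondecreasing, then
\[
  y(t) \;\le\; a(t)\,\exp\!\left(\int_0^t b(s)\,\mathrm{d}s\right), \qquad t \in [0, T].
\]
```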
Remark 3.5.
For the wave equations (19) with periodic boundary conditions, we would like to mention below two other forms for the generalization error (and the related training loss). Compared with (21), they differ only in the terms on the spatial boundary , i.e.,
(33)
and
(34)
The related training loss functions are given by
(35)
or
(36)
These three forms for the generalization error result from different treatments of the boundary term in the proof of Theorem 3.4.
Our numerical experiments indicate that adopting the training loss (35) or (36) seems to lead to poorer simulation results. For the periodic boundary, both terms and may be needed to carry the periodicity information. We suspect that this may be why using only a single boundary term ( or ), as in (35) and (36), leads to poorer numerical results.
Theorem 3.6.
4 Physics Informed Neural Networks for Approximating the Sine-Gordon Equation
4.1 Sine-Gordon Equation
Let be an open connected bounded set with a boundary . We consider the following Sine-Gordon equation:
(38a)–(38e)
where and are the field functions to be solved for, is a source term, and , and denote the boundary/initial conditions. , and are constants. is a nonlinear term. We assume that the nonlinearity is globally Lipschitz, i.e., there exists a constant (independent of and ) such that
(39)
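Writing the nonlinearity as $g$ (a placeholder symbol), our reading of the global Lipschitz assumption (39) is

```latex
\[
  |g(v_1) - g(v_2)| \;\le\; L\,|v_1 - v_2| \qquad \text{for all } v_1,\, v_2 \in \mathbb{R},
\]
```

where $L > 0$ is the Lipschitz constant.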
Remark 4.1.
The existence and regularity of the solution to the Sine-Gordon equation with different nonlinear terms have been the subject of several studies in the literature; see Baoxiang1997Classical ; Kubota2001Global ; Shatah1982Global ; Shatah1985Normal ; Temam1997Infinite .
The book Temam1997Infinite provides the existence and regularity result of the following Sine-Gordon equation,
Let , be a function from to and satisfy certain assumptions. If , and , then there exists a unique solution to this Sine-Gordon equation such that and . Furthermore, , and , it holds and .
Let be a smooth function of degree 2. The following equation is studied in Shatah1985Normal ,
where it is reformulated as
in which , and . Set , with . Given and , there exists a depending on the size of the initial data and a unique solution .
The reference Baoxiang1997Classical provides the following result. Under certain conditions for the nonlinear term , with , , , and , there exists a unique solution of nonlinear Klein–Gordon equation.
The following result is due to Kubota2001Global . Under certain conditions for the nonlinear term , with , and with a positive constant , there exists a positive constant and a unique solution to the nonlinear wave equations with different speeds of propagation.
A survey of literature indicates that, while several works have touched on the regularity of the solution to the Sine-Gordon equations, none of them is comprehensive. To facilitate the subsequent analyses, we make the following assumption in light of Remark 4.1. Let , and be sufficiently smooth and bounded. Given and with , we assume that there exists and a classical solution and to the Sine-Gordon equations (38) such that and . Then, it follows that and based on the Sobolev embedding theorem.
4.2 Physics Informed Neural Networks
Let and be the space-time domain. We define the following residuals for the PINN approximation, and , for the Sine-Gordon equations (38):
(40a)–(40e)
where . Note that for the exact solution , . With PINN we minimize the following generalization error,
(41)
Let
where denotes the exact solution. We define the total error of the PINN approximation of the Sine-Gordon equations (38) as,
(42)
Then we choose the training set with , based on suitable quadrature points:
• Interior training points for , with each .
• Spatial boundary training points for , with each .
• Temporal boundary training points for , with each .
The integrals in (41) are approximated by a numerical quadrature rule, resulting in the training loss,
(43)
where
(44a)–(44g)
Here the quadrature points in space-time constitute the data sets , and , and are the quadrature weights with being , or .
4.3 Error Analysis
By subtracting the Sine-Gordon equations (38) from the residual equations (40), we get,
(45a)–(45e)
The results on the PINN approximations to the Sine-Gordon equations are summarized in the following theorems.
Theorem 4.2.
Let , , with . Assume that is Lipschitz continuous, and . Then for every integer , there exist neural networks and , each with two hidden layers, of widths at most and , such that
(46a)–(46c)
The proof of this theorem is provided in the Appendix 8.3.
Theorem 4.2 implies that the PINN residuals in (40) can be made arbitrarily small by choosing a sufficiently large . Therefore, the generalization error can be made arbitrarily small.
We next show that the PINN total approximation error can be controlled by the generalization error (Theorem 4.3 below), and by the training error (Theorem 4.4 below). The proofs for Theorem 4.3 and Theorem 4.4 are provided in the Appendix 8.3.
Theorem 4.3.
Let , and be the classical solution of the Sine-Gordon equation (38). Let denote the PINN approximation with parameter . Then the following relation holds,
(47)
where
and .
Theorem 4.4.
Let and , and let and be the classical solution to the Sine-Gordon equation (38). Let denote the PINN approximation with parameter . Then the following relation holds,
(48)
where the constant is defined by
It follows from Theorem 4.4 that the PINN approximation error can be arbitrarily small, provided that the training error is sufficiently small and the sample set is sufficiently large.
5 Physics Informed Neural Networks for Approximating Linear Elastodynamic Equation
5.1 Linear Elastodynamic Equation
Consider an elastic body occupying an open, bounded convex polyhedral domain . The boundary , with the outward unit normal vector , is assumed to be composed of two disjoint portions and , with . Given a suitable external load , and suitable initial/boundary data , and , we consider the linear elastodynamic equations,
(49a)–(49f)
In the above system, and denote the displacement and the velocity, respectively, and (with ) denotes the time domain. is the strain tensor, . The constants and are the first and the second Lamé parameters, respectively.
Combining (49a) and (49b), we can recover the classical linear elastodynamics equation:
(50)
The well-posedness of this equation is established in Hughes1978Classical .
Lemma 5.1 (Hughes1978Classical ; Yosida1980Functional ).
Let , and with . Then there exists a unique solution to the classical linear elastodynamic equation (50) such that , and with .
Lemma 5.2.
Let , , and with , then there exists and a classical solution to the elastodynamic equations (49) such that , , and .
Proof.
As , is a Banach algebra. By Lemma 5.1, there exists and the solution to the linear elastodynamics equations such that , , with and with .
Since and . By applying the Sobolev embedding theorem and , we obtain and for . Therefore, and . ∎
5.2 Physics Informed Neural Networks
We now consider the PINN approximation of the linear elastodynamic equations (49). Let , and denote the space-time domain. Define the following residuals for the PINN approximation and for the elastodynamic equations (49):
(51a)–(51f)
Note that for the exact solution , we have . With PINN we minimize the following generalization error,
(52)
Let
denote the difference between the solution to the elastodynamic equations (49) and the PINN approximation with parameter . We define the total error of the PINN approximation as,
(53)
We choose the training set based on suitable quadrature points. The full training set is defined by , and :
• Interior training points for , with each .
• Spatial boundary training points for , with each , and for , with each .
• Temporal boundary training points for , with each .
Then, the integrals in (52) can be approximated by a suitable numerical quadrature rule, resulting in the following training loss,
(54)
where,
(55a)–(55j)
Here the quadrature points in space-time constitute the data sets , , and . denote the suitable quadrature weights with being , , and .
5.3 Error Analysis
Subtracting the elastodynamic equations (49) from the residual equations (51), we obtain
(56a)–(56f)
The PINN approximation results are summarized in the following three theorems. The proofs of these theorems are provided in the Appendix 8.4.
Theorem 5.3.
Let , , with . Let , and with . For every integer , there exist neural networks and , with , each with two hidden layers, of widths at most and , such that
(57a)–(57d)
It follows from Theorem 5.3 that, by choosing a sufficiently large , one can make the PINN residuals in (51), and thus the generalization error in (5.2), arbitrarily small.
Theorem 5.4.
Let , and be the classical solution to the linear elastodynamic equation (49). Let denote the PINN approximation with the parameter . Then the following relation holds,
where
with and .
Theorem 5.4 shows that the total error of the PINN approximation can be controlled by the generalization error .
Theorem 5.5.
Let , and be the classical solution to the linear elastodynamic equation (49). Let denote the PINN approximation with the parameter . Then the following relation holds,
(58)
where
Theorem 5.5 shows that the PINN approximation error can be controlled by the training error with a large enough sample set .
6 Numerical Examples
The theoretical analyses from Sections 3 to 5 suggest several forms for the PINN loss function with the wave, Sine-Gordon and the linear elastodynamic equations. These forms contain certain non-standard terms, such as the square root of the residuals or the gradient terms on some boundaries, which would generally be absent from the canonical PINN formulation of the loss function. The presence of such non-standard terms is crucial to bounding the PINN approximation errors, as shown in the error analyses.
These non-standard forms of the loss function lead to a variant PINN algorithm. In this section we illustrate the performance of the variant PINN algorithm as suggested by the theoretical analysis, as well as the more standard PINN algorithm, using several numerical examples in one spatial dimension (1D) plus time for the wave equation and the Sine-Gordon equation, and in two spatial dimensions (2D) plus time for the linear elastodynamic equation.
The following settings are common to all the numerical simulations in this section. Let denote the spatial and temporal coordinates in the spatial-temporal domain, where and for one and two spatial dimensions, respectively. For the wave equation and the Sine-Gordon equation, the neural networks contain two nodes in the input layer (representing and ), two hidden layers with the number of nodes to be specified later, and two nodes in the output layer (representing the solution and its time derivative ). For the linear elastodynamic equation, three input nodes and four output nodes are employed in the neural network, as will be explained in more detail later. We employ the tanh (hyperbolic tangent) activation function for all the hidden nodes, and no activation function is applied to the output nodes (i.e. linear). For training the neural networks, we employ collocation points within the spatial-temporal domain drawn from a uniform random distribution, and also uniform random points on each spatial boundary and on the initial boundary. In the simulations the value of is varied systematically among , , , and . After the neural networks are trained, for the wave equation and the Sine-Gordon equation, we compare the PINN solution and the exact solution on a set of uniform spatial-temporal grid points (evaluation points) () that covers the problem domain and the boundaries. For the elastodynamic equation, we compare the PINN solution and the exact solution at different time instants, and at each time instant the corresponding solutions are evaluated at a uniform set of grid points in the spatial domain, ().
The PINN errors reported below are computed as follows. Let () denote the set of uniform grid points, where denote the number of evaluation points. The errors of PINN are defined by,
(59a)–(59b)
where denotes the PINN solution and denotes the exact solution.
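A sketch of how these error metrics can be computed on the evaluation grid is given below; it assumes the $l_2$ error is measured in a relative sense and the $l_\infty$ error as the maximum point-wise error, which is our reading of (59a)–(59b).

```python
import numpy as np

def l2_error(u_pred, u_exact):
    """Relative l2 error over the evaluation grid points (assumed normalization)."""
    return np.linalg.norm(u_pred - u_exact) / np.linalg.norm(u_exact)

def linf_error(u_pred, u_exact):
    """Maximum (l-infinity) point-wise error over the evaluation grid points."""
    return np.max(np.abs(u_pred - u_exact))

# Example on a dummy evaluation grid.
u_exact = np.sin(np.linspace(0, np.pi, 100))
u_pred = u_exact + 1e-3 * np.random.randn(100)
print(l2_error(u_pred, u_exact), linf_error(u_pred, u_exact))
```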
Our implementation of the PINN algorithm is based on the PyTorch library (pytorch.org). In all the following numerical examples, we combine the Adam kingma2014adam optimizer and the L-BFGS 2006_NumericalOptimization optimizer (in batch mode) to train the neural network. We first employ the Adam optimizer to train the network for 100 epochs/iterations, and then employ the L-BFGS optimizer to continue the network training for another 30000 iterations. We employ the default parameter values in Adam, with the learning rate , and . The initial learning rate is adopted in the L-BFGS optimizer.
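The two-stage Adam/L-BFGS training described above can be realized in PyTorch roughly as follows; the loss callable and the L-BFGS learning rate (set to 1.0 here) stand in for the actual choices used in the experiments.

```python
import torch

def train(net, compute_loss, adam_epochs=100, lbfgs_iters=30000):
    """Stage 1: Adam with learning rate 1e-3; stage 2: L-BFGS refinement."""
    adam = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(adam_epochs):
        adam.zero_grad()
        loss = compute_loss(net)
        loss.backward()
        adam.step()

    lbfgs = torch.optim.LBFGS(net.parameters(), lr=1.0,
                              max_iter=lbfgs_iters, history_size=50)
    def closure():
        lbfgs.zero_grad()
        loss = compute_loss(net)
        loss.backward()
        return loss
    lbfgs.step(closure)
    return net
```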
6.1 Wave Equation
We next test the PINN algorithm for solving the wave equation (19) in one spatial dimension (plus time), under a configuration in accordance with that of 2021_JCP_Dong_modifiedbatch . Consider the spatial-temporal domain, , and the initial-boundary value problem with the wave equation on this domain,
(60a)–(60c)
where is the wave field to be solved for, is the wave speed, is the initial peak location of the wave, is a constant that controls the width of the wave profile, and the periodic boundary conditions are imposed on and . In the simulations, we employ , , and . Then the above problem has the solution,
where mod refers to the modulo operation. The two terms in represent the leftward- and rightward-traveling waves, respectively.
We reformulate the problem (60) into the following system,
(61a)–(61c)
where is an auxiliary field given by the first equation in (61a).
To solve the system (61) with PINN, we employ and neurons in the first and the second hidden layers of the neural networks, respectively. We employ the following loss function in PINN in light of (21),
(62)
Note that in the simulations we have employed the same number of collocation points () within the domain and on each of the domain boundaries. The above loss function differs slightly from the training loss (22) in the error analysis, in several aspects. First, we have added a set of penalty coefficients () for the different loss terms in the numerical simulations. Second, the collocation points used in the simulations (e.g. , , , , ) are generated randomly within the domain or on the domain boundaries from a uniform distribution. In addition, the averaging used here does not exactly correspond to the numerical quadrature rule (mid-point rule) used in the theoretical analysis.
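For illustration, the sketch below evaluates the interior residual terms entering the loss (62), assuming the reformulated system takes the form $u_t = v$, $v_t = c^2 u_{xx}$ (our reading of (61a)); derivatives are obtained with automatic differentiation, and the boundary and initial terms are omitted for brevity. The wave speed value is a placeholder.

```python
import torch

def interior_residuals(net, x, t, c=1.0):
    """Interior residuals of the reformulated wave system (assumed form):
    r1 = u_t - v,  r2 = v_t - c^2 * u_xx, via automatic differentiation.
    x and t are column tensors of shape (N, 1); c is a placeholder wave speed."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    out = net(torch.cat([x, t], dim=1))
    u, v = out[:, 0:1], out[:, 1:2]
    grad = lambda f, z: torch.autograd.grad(f, z, torch.ones_like(f), create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    v_t = grad(v, t)
    r1 = u_t - v
    r2 = v_t - c**2 * u_xx
    return (r1 ** 2).mean() + (r2 ** 2).mean()
```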
We have also considered another form (given below) for the loss function, as suggested by the alternate analysis discussed in Remark 3.5,
(63)
The difference between this form and the form (62) lies in the last two terms, which are not squared here.
The loss function (62) will be referred to as loss form #1 in subsequent discussions, and (63) will be referred to as loss form #2. The PINN schemes that employ these two different loss forms will be referred to as PINN-F1 and PINN-F2, respectively.
Figure 1 shows distributions of the exact solutions, the PINN solutions, and the PINN point-wise absolute errors for and in the spatial-temporal domain. Here the PINN solution is computed by PINN-F1, in which penalty coefficients are given by . One can observe that the method has captured the wave fields for and reasonably well, with the error for notably smaller than that of .
Figures 2 and 3 provide a comparison of the solutions obtained using the two forms of the loss function. Figure 2 compares profiles of the PINN-F1 and PINN-F2 solutions, and the exact solution, for (top row) at three time instants (, , and ), as well as the error profiles (bottom row). Figure 3 shows the corresponding results for the field variable . These results are obtained by using training data points in the domain and on each of the domain boundaries. It is observed that both PINN schemes, with the loss functions given by (62) and (63) respectively, have captured the solution reasonably well. We further observe that PINN-F1 (with the loss form (62)) produces notably more accurate results than PINN-F2 (with the loss form (63)), especially for the field .
We have varied the number of training data points systematically and studied its effect on the PINN results. Figure 4 shows the loss histories of PINN-F1 and PINN-F2 corresponding to different numbers of training data points () in the simulations, with a total of training iterations. We can make two observations. First, the history curves with loss form #1 are generally smoother, indicating that the loss function decreases almost monotonically as the training progresses. On the other hand, significant fluctuations in the loss history can be observed with form #2. Second, the eventual loss values produced by loss form #1 are significantly smaller, by over an order of magnitude, than those produced by loss form #2.
Table 1 provides a further comparison between PINN-F1 and PINN-F2. Here the and errors of and computed by PINN-F1 and PINN-F2 corresponding to different numbers of training data points () are listed. There appears to be a general trend that the errors tend to decrease with an increasing number of training points, but the decrease is not monotonic. It can be observed that the errors are notably smaller than those for , as observed earlier in e.g. Figure 1. One can again observe that the PINN-F1 results are notably more accurate than those of PINN-F2 for the wave equation.
| method | training points | $l_2$-error ($u$) | $l_2$-error ($v$) | $l_\infty$-error ($u$) | $l_\infty$-error ($v$) |
|---|---|---|---|---|---|
| PINN-F1 | 1000 | 5.7013e-03 | 1.3531e-02 | 1.8821e-02 | 4.6631e-02 |
| PINN-F1 | 1500 | 2.1689e-03 | 4.1035e-03 | 6.7631e-03 | 1.5109e-02 |
| PINN-F1 | 2000 | 4.6896e-03 | 9.6417e-03 | 1.3828e-02 | 3.3063e-02 |
| PINN-F1 | 2500 | 3.7879e-03 | 9.8574e-03 | 1.2868e-02 | 3.3622e-02 |
| PINN-F1 | 3000 | 2.6588e-03 | 6.0746e-03 | 8.1457e-03 | 1.9860e-02 |
| PINN-F2 | 1000 | 4.7281e-02 | 9.2431e-02 | 1.4367e-01 | 3.2764e-01 |
| PINN-F2 | 1500 | 4.9087e-02 | 1.2438e-01 | 2.1525e-01 | 5.0601e-01 |
| PINN-F2 | 2000 | 1.8554e-02 | 4.9224e-02 | 6.0780e-02 | 1.6358e-01 |
| PINN-F2 | 2500 | 2.3526e-02 | 5.4266e-02 | 9.8690e-02 | 1.9467e-01 |
| PINN-F2 | 3000 | 1.4164e-02 | 3.7796e-02 | 5.3045e-02 | 1.4179e-01 |
Theorem 3.6 suggests that the solution errors for , , and approximately scale as the square root of the training loss function. Figure 5 provides some numerical evidence for this point. Here we plot the errors for , and from our simulations as a function of the training loss value, for PINN-F1 and PINN-F2, in logarithmic scales. It is evident that for PINN-F1 the scaling essentially follows the square-root relation. For PINN-F2 the relation between the error and the training loss appears to scale with a power somewhat larger than one half.
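The scaling exponent discussed here can be estimated by a least-squares fit of $\log(\text{error})$ against $\log(\text{loss})$; an exponent close to $1/2$ corresponds to the square-root relation. A minimal sketch, with synthetic arrays standing in for the recorded error/loss pairs:

```python
import numpy as np

def fit_scaling_exponent(losses, errors):
    """Least-squares fit of log(error) = p*log(loss) + b; returns the exponent p."""
    p, b = np.polyfit(np.log(losses), np.log(errors), deg=1)
    return p

# Placeholder arrays standing in for recorded (training loss, error) pairs.
losses = np.array([1e-2, 3e-3, 1e-3, 3e-4, 1e-4])
errors = 0.5 * np.sqrt(losses)   # synthetic data obeying the square-root relation
print(fit_scaling_exponent(losses, errors))  # approximately 0.5
```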
6.2 Sine-Gordon Equation
We test the PINN algorithm suggested by the theoretical analysis for the Sine-Gordon equation (38) in this subsection. Consider the spatial-temporal domain , and the following initial/boundary value problem on this domain,
(64a)–(64c)
In these equations, is the field function to be solved for, is a source term, and are the initial conditions, and and are the boundary conditions. The source term and the initial/boundary conditions are chosen appropriately to match the following exact solution,
(65)
To simulate this problem with PINN, we reformulate the problem as follows,
(66a)–(66d)
where is a variable defined by equation (66a).
In light of (41), we employ the following loss function in PINN,
(67)
where () are the penalty coefficients for different loss terms added in the PINN implementation. It should be noted that the loss terms with the coefficients and will be absent from the conventional PINN formulation (see Raissi2019pinn ). These terms in the training loss are necessary based on the error analysis in Section 4. It should also be noted that the loss terms are not squared, as dictated by the theoretical analysis of Section 4.
We have also implemented a PINN scheme with a variant form for the loss function,
(68)
The difference between (67) and (68) lies in these terms: they are squared in (68) but not in (67). We refer to the PINN scheme employing the loss function (67) as PINN-G1, and the scheme employing the loss function (68) as PINN-G2.
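The distinction between the two loss forms can be illustrated as follows: in PINN-G1 the terms in question enter the loss without squaring, which we realize here as an averaged absolute residual, while in PINN-G2 the same terms are squared like the others; the exact discrete form used in the paper may differ, and the residual values below are placeholders.

```python
import torch

def boundary_term_g1(residual):
    """Non-squared variant (as in PINN-G1): average of absolute residual values.
    One possible discrete realization; the paper's exact form may differ."""
    return residual.abs().mean()

def boundary_term_g2(residual):
    """Squared variant (as in PINN-G2): average of squared residual values."""
    return (residual ** 2).mean()

# Placeholder residual values at boundary collocation points.
r = torch.randn(100) * 1e-2
print(boundary_term_g1(r), boundary_term_g2(r))
```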
In the simulations we employ a feed-forward neural network with two input nodes (representing and ), two output nodes (representing and ), and two hidden layers, each having a width of nodes. The tanh activation function has been used for all the hidden nodes. We employ collocation points generated from a uniform random distribution within the domain, on each of the domain boundaries, and also on the initial boundary, where is varied systematically in the simulations. The penalty coefficients in the loss functions are taken to be .
Figure 6 shows distributions of and from the exact solution (left column) and the PINN solution (middle column), as well as the point-wise absolute errors of the PINN solution for these fields (right column). These results are obtained by PINN-G2 with random collocation points within the domain and on each of the domain boundaries. The PINN solution is in good agreement with the true solution.
Figures 7 and 8 compare the profiles of and between the exact solution, and the solutions obtained by PINN-G1 and PINN-G2, at several time instants (, and ). Profiles of the absolute errors of the PINN-G1/PINN-G2 solutions are also shown in these figures. We observe that both PINN-G1 and PINN-G2 have captured the solution for quite accurately, and to a lesser extent, also for . Comparison of the error profiles between PINN-G1 and PINN-G2 suggests that the PINN-G2 error in general appears to be somewhat smaller than that of PINN-G1. But this seems not to be true consistently in the entire domain.
| method | training points | $l_2$-error ($u$) | $l_2$-error ($v$) | $l_\infty$-error ($u$) | $l_\infty$-error ($v$) |
|---|---|---|---|---|---|
| PINN-G1 | 1000 | 3.0818e-03 | 4.3500e-03 | 9.6044e-03 | 1.8894e-02 |
| PINN-G1 | 1500 | 3.4335e-03 | 4.8035e-03 | 1.0566e-02 | 1.7050e-02 |
| PINN-G1 | 2000 | 2.1914e-03 | 3.0055e-03 | 7.5882e-03 | 1.1099e-02 |
| PINN-G1 | 2500 | 3.0172e-03 | 3.5698e-03 | 9.2515e-03 | 1.4645e-02 |
| PINN-G1 | 3000 | 2.5281e-03 | 4.4858e-03 | 7.2785e-03 | 1.6213e-02 |
| PINN-G2 | 1000 | 3.0674e-03 | 2.0581e-03 | 7.3413e-03 | 1.1323e-02 |
| PINN-G2 | 1500 | 1.0605e-03 | 1.4729e-03 | 2.2914e-03 | 6.2831e-03 |
| PINN-G2 | 2000 | 2.2469e-03 | 1.6072e-03 | 4.8842e-03 | 8.8320e-03 |
| PINN-G2 | 2500 | 6.6072e-04 | 6.0509e-04 | 1.4099e-03 | 4.3423e-03 |
| PINN-G2 | 3000 | 6.6214e-04 | 1.0830e-03 | 1.9697e-03 | 7.8866e-03 |
The effect of the collocation points on the PINN results has been studied by varying the number of training collocation points systematically between 1000 and 3000 within the domain and on each of the domain boundaries. The results are provided in Figure 9 and Table 2. Figure 9 shows histories of the loss function corresponding to different numbers of collocation points for PINN-G1 and PINN-G2. Table 2 provides the and errors of and versus the number of collocation points, computed by PINN-G1 and PINN-G2. The PINN errors in general tend to decrease with an increasing number of collocation points, but this trend is not monotonic. It can be observed that both PINN-G1 and PINN-G2 have captured the solutions quite accurately, with the errors from PINN-G2 in general slightly smaller.
Figure 10 provides some numerical evidence for the relation between the total error and the training loss as suggested by Theorem 4.4. Here we plot the errors for , and as a function of the training loss value obtained by PINN-G1 and PINN-G2. The results indicate that the total error scales approximately as the square root of the training loss, which in some sense corroborates the error-loss relation as expressed in Theorem 4.4.
6.3 Linear Elastodynamic Equation
In this subsection we look into the linear elastodynamic equation (in two spatial dimensions plus time) and test the PINN algorithm as suggested by the theoretical analysis in Section 5 using this equation. Consider the spatial-temporal domain , and the following initial/boundary value problem with the linear elastodynamics equation on :
(69a)–(69c)
where (, ) is the displacement field to be solved for, is a source term, and , and are material constants. is the Dirichlet boundary and is the Neumann boundary, with and , where is the outward-pointing unit normal vector. In our simulations we choose the left boundary () as the Dirichlet boundary, and the rest are Neumann boundaries. and are Dirichlet and Neumann boundary conditions, respectively. and are the initial conditions for the displacement and the velocity. We employ the material parameter values , and the following manufactured solution (2018_CMAME_DGelastodynamics ) to this problem,
(70)
The source term , the boundary/initial distributions , , and are chosen by the expression (70).
To simulate this problem using the PINN algorithm suggested by the theoretical analysis from Section 5, we reformulate (69) into the following system
(71a)–(71c)
where is an intermediate variable (representing the velocity) as given by (71a).
In light of (52), we employ the following loss function for PINN,
(72)
where we have added the penalty coefficients, (), for different loss terms in the implementation, and denotes the number of collocation points within the domain and on the domain boundaries. In the numerical tests we have also implemented another form for the loss function as follows,
(73)
The difference between these two forms for the loss function lies in the and terms. The and terms in (72) are not squared, in light of the error terms (55a)–(55j) from the theoretical analysis. In contrast, these terms are squared in (73). The PINN scheme utilizing the loss function (72) is henceforth referred to as PINN-H1, and the scheme that employs the loss function (73) shall be referred to as PINN-H2.
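As an illustration, the sketch below evaluates the interior residuals of the elastodynamic system with automatic differentiation, assuming the constitutive law $\sigma(u) = 2\mu\,\varepsilon(u) + \lambda\,\mathrm{tr}(\varepsilon(u))\,I$ and the first-order-in-time reformulation $u_t = v$, $v_t = \nabla\cdot\sigma(u) + f$ (our reading of (71)); the network interface and the source-term callables are placeholders.

```python
import torch

def grad(f, z):
    """d f / d z for a scalar field f evaluated at a batch of points (shape (N,1))."""
    return torch.autograd.grad(f, z, torch.ones_like(f), create_graph=True)[0]

def elasto_interior_residual(net, x, y, t, f1, f2, lam=1.0, mu=1.0):
    """Interior residuals of the assumed first-order elastodynamic system:
    r_u = u_t - v,  r_v = v_t - div(sigma(u)) - f,
    with sigma(u) = 2*mu*eps(u) + lam*tr(eps(u))*I in two spatial dimensions."""
    x, y, t = [z.clone().requires_grad_(True) for z in (x, y, t)]
    out = net(torch.cat([x, y, t], dim=1))
    u1, u2, v1, v2 = [out[:, i:i+1] for i in range(4)]

    # strain components eps_ij = (d_i u_j + d_j u_i) / 2
    e11, e22 = grad(u1, x), grad(u2, y)
    e12 = 0.5 * (grad(u1, y) + grad(u2, x))
    # stress components
    s11 = 2 * mu * e11 + lam * (e11 + e22)
    s22 = 2 * mu * e22 + lam * (e11 + e22)
    s12 = 2 * mu * e12
    # divergence of the stress tensor
    div1 = grad(s11, x) + grad(s12, y)
    div2 = grad(s12, x) + grad(s22, y)

    r_u1, r_u2 = grad(u1, t) - v1, grad(u2, t) - v2
    r_v1 = grad(v1, t) - div1 - f1(x, y, t)
    r_v2 = grad(v2, t) - div2 - f2(x, y, t)
    return sum((r ** 2).mean() for r in (r_u1, r_u2, r_v1, r_v2))
```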
In the simulations, we employ a feed-forward neural network with three input nodes, which represent and the time variable t, and four output nodes, which represent and . The neural network has two hidden layers, with widths of 90 and 60 nodes, respectively, and the tanh activation function for all the hidden nodes. For the network training, collocation points are generated from a uniform random distribution within the domain, on each of the domain boundaries, as well as on the initial boundary. This number is systematically varied in the simulations. We employ the penalty coefficients in the simulations.
In Figures 11 and 12 we compare the PINN-H1/PINN-H2 solutions with the exact solution and provide an overview of their errors. Figure 11 is a visualization of the deformed configuration of the domain. Here we have plotted the deformed field, , for a set of grid points at three time instants from the exact solution, the PINN-H1 and PINN-H2 solutions. Figure 12 shows distributions of the point-wise absolute error of the PINN-H1/PINN-H2 solutions, , at the same three time instants. Here denotes the PINN solution. While both PINN schemes capture the solution fairly well at and , at both schemes show larger deviations from the true solution. In general, the PINN-H1 scheme appears to produce a better approximation to the solution than PINN-H2.
| method | training points | $l_2$-error ($u_1$) | $l_2$-error ($u_2$) | $l_2$-error ($v_1$) | $l_2$-error ($v_2$) | $l_\infty$-error ($u_1$) | $l_\infty$-error ($u_2$) | $l_\infty$-error ($v_1$) | $l_\infty$-error ($v_2$) |
|---|---|---|---|---|---|---|---|---|---|
| PINN-H1 | 1000 | 4.8837e-02 | 6.0673e-02 | 4.7460e-02 | 5.1640e-02 | 1.7189e-01 | 2.1201e-01 | 6.9024e-01 | 6.1540e-01 |
| PINN-H1 | 1500 | 2.8131e-02 | 3.1485e-02 | 4.1104e-02 | 4.1613e-02 | 1.9848e-01 | 2.4670e-01 | 3.4716e-01 | 4.0582e-01 |
| PINN-H1 | 2000 | 2.7796e-02 | 4.0410e-02 | 3.5891e-02 | 4.6334e-02 | 1.4704e-01 | 1.7687e-01 | 4.0678e-01 | 5.0022e-01 |
| PINN-H1 | 2500 | 3.0909e-02 | 4.0215e-02 | 3.3966e-02 | 4.4024e-02 | 1.7589e-01 | 2.4211e-01 | 4.1403e-01 | 3.9570e-01 |
| PINN-H1 | 3000 | 2.6411e-02 | 3.5600e-02 | 4.3209e-02 | 5.2802e-02 | 1.4289e-01 | 1.3625e-01 | 5.1167e-01 | 5.3298e-01 |
| PINN-H2 | 1000 | 4.9869e-02 | 1.3451e-01 | 5.6327e-02 | 5.4796e-02 | 3.2314e-01 | 3.4978e-01 | 6.7624e-01 | 5.7277e-01 |
| PINN-H2 | 1500 | 5.4708e-02 | 1.3987e-01 | 4.5871e-02 | 5.1622e-02 | 2.8609e-01 | 5.2598e-01 | 4.9343e-01 | 2.3518e-01 |
| PINN-H2 | 2000 | 6.2114e-02 | 1.0190e-01 | 6.4477e-02 | 5.0011e-02 | 2.5745e-01 | 3.1642e-01 | 5.9057e-01 | 5.8411e-01 |
| PINN-H2 | 2500 | 3.7887e-02 | 6.0630e-02 | 5.4363e-02 | 5.0659e-02 | 2.2212e-01 | 2.4774e-01 | 5.3681e-01 | 3.5427e-01 |
| PINN-H2 | 3000 | 5.4862e-02 | 6.3407e-02 | 5.5208e-02 | 6.0082e-02 | 3.4102e-01 | 2.1308e-01 | 5.1894e-01 | 4.4995e-01 |
The effect of the number of collocation points () on the PINN results has been studied in Figure 13 and Table 3, where is systematically varied in the range 1000 to 3000. Figure 13 shows the histories of the loss function for training PINN-H1 and PINN-H2 under different numbers of collocation points. Table 3 lists the corresponding and errors of and obtained from PINN-H1 and PINN-H2. One can observe that the PINN errors in general tend to improve with an increasing number of collocation points. It can also be observed that the PINN-H1 errors in general appear better than those of PINN-H2 for this problem.
Figure 14 shows the errors of , , and as a function of the loss function value in the network training of PINN-H1 and PINN-H2. The data indicates that these errors approximately scale as the square root of the training loss, which is consistent with the relation as given by Theorem 5.5. This in a sense provides numerical evidence for the theoretical analysis in Section 5.
7 Concluding Remarks
In the present paper we have considered the approximation of a class of dynamic PDEs of second order in time by physics-informed neural networks (PINN). We provide an analysis of the convergence and the error of PINN for approximating the wave equation, the Sine-Gordon equation, and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the tanh activation function for all the hidden nodes, the PINN approximation errors for the solution field, its time derivative and its gradient can be bounded by the PINN training loss and the number of training data points (quadrature points).
Our theoretical analyses further suggest new forms for the PINN training loss function, which contain certain residuals that are crucial to the error estimate but would be absent from the canonical PINN formulation of the loss function. These typically include the gradient of the equation residual, the gradient of the initial-condition residual, and the time derivative of the boundary-condition residual. In addition, depending on the type of boundary conditions involved in the problem, our analyses suggest that a norm other than the commonly-used $L^2$ norm may be more appropriate for the boundary residuals in the loss function. Adopting these new forms of the loss function suggested by the theoretical analyses leads to a variant PINN algorithm. We have implemented the new algorithm and presented a number of numerical experiments on the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. The simulation results demonstrate that the method can capture the solution field well for these PDEs. The numerical data corroborate the theoretical analyses.
Declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Availability of data/code and material
Data will be made available on reasonable request.
Acknowledgements
The work was partially supported by the China Postdoctoral Science Foundation (No.2021M702747), Natural Science Foundation of Hunan Province (No.2022JJ40422), NSF of China (No.12101495), General Special Project of Education Department of Shaanxi Provincial Government (No.21JK0943), and the US National Science Foundation (DMS-2012415).
8 Appendix: Auxiliary Results and Proofs of Main Theorems from Sections 4 and 5
8.1 Notation
Let a -tuple of non-negative integers be a multi-index with . For two given multi-indices, we say that if and only if for all . We then denote
Let , for which it holds
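The standard multi-index conventions presumably intended here are the following:

```latex
\[
  |\alpha| = \sum_{i=1}^{d} \alpha_i, \qquad
  \alpha! = \prod_{i=1}^{d} \alpha_i!, \qquad
  x^{\alpha} = \prod_{i=1}^{d} x_i^{\alpha_i}, \qquad
  D^{\alpha} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}},
\]
```

and $\alpha \le \beta$ means $\alpha_i \le \beta_i$ for all $1 \le i \le d$.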
8.2 Some Auxiliary Results
Lemma 8.1.
Let with and be an open set. Every function has a continuous representative belonging to .
Lemma 8.2.
Let , and with , then
Lemma 8.3 (Multiplicative trace inequality, e.g. DeRyck2021On ).
Let , be a Lipschitz domain and let be the trace operator. Denote by the diameter of and by the radius of the largest -dimensional ball that can be inscribed into . Then it holds that
(74)
where .
Lemma 8.4 (2023_IMA_Mishra_NS ).
Let and let be a neural network with for , c.f. Definition 2.1. Assume that . Then it holds for that
(75)
Lemma 8.5 (2023_IMA_Mishra_NS ).
Let with for , and . Then for every with there exists a tanh neural network with two hidden layers, one of width at most and another of width at most , such that for it holds that
(76)
and where
Moreover, the weights of scale as with .
8.3 Proof of Main Theorems from Section 4: Sine-Gordon Equation
Theorem 4.2: Let , , with . Assume that is Lipschitz continuous, and . Then for every integer , there exist neural networks and , each having two hidden layers, of widths at most and , such that
Proof.
Theorem 4.3: Let , and be the classical solution to the Sine-Gordon equation (38). Let denote the PINN approximation with the parameter . Then the following relation holds,
where is defined in the proof.
Proof.
8.4 Proof of Main Theorems from Section 5: Linear Elastodynamic Equation
Theorem 5.3: Let , , with . Let , and with . For every integer , there exist neural networks and , with , each with two hidden layers, of widths at most and , such that
Proof.
Lemma 5.2 implies that,
Let and . Based on Lemma 8.5, there exist neural networks and , with , each having two hidden layers, of widths at most and , such that for every and ,
(81)–(82)
Let represent the derivative with respect to the -th dimension. For , we have
Using (81) and (82) and the above relations, we can now bound the PINN residuals,
∎
Theorem 5.4: Let , and be the classical solution to the linear elastodynamic equation (49). Let denote the PINN approximation with the parameter . Then the following relation holds,
where is given in the proof.
Proof.
Theorem 5.5: Let , and be the classical solution to the linear elastodynamic equation (49). Let denote the PINN approximation with the parameter . Then the following relation holds,
where is defined in the following proof.
References
- [1] P. F. Antonietti and I. Mazzieri. High-order discontinuous Galerkin methods for the elastodynamics equation on polygonal and polyhedral meshes. Comput. Methods Appl. Mech. Engrg., 342:414–437, 2018.
- [2] Genming Bai, Ujjwal Koley, Siddhartha Mishra, and Roberto Molinaro. Physics informed neural networks (PINNs) for approximating nonlinear dispersive PDEs. J. Comput. Math., 39(6):816–847, 2021.
- [3] Christian Beck, Weinan E, and Arnulf Jentzen. Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci., 29(4):1563–1619, 2019.
- [4] Julius Berner, Philipp Grohs, and Arnulf Jentzen. Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. SIAM J. Math. Data Sci., 2(3):631–657, 2020.
- [5] Animikh Biswas, Jing Tian, and Süleyman Ulusoy. Error estimates for deep learning methods in fluid dynamics. Numer. Math., 151(3):753–777, 2022.
- [6] Z. Cai, J. Chen, M. Liu, and X. Liu. Deep least-squares methods: an unsupervised learning-based numerical method for solving elliptic PDEs. J. Comput. Phys., 420:109707, 2020.
- [7] F. Calabro, G. Fabiani, and C. Siettos. Extreme learning machine collocation for the numerical solution of elliptic PDEs with sharp gradients. Comput. Methods Appl. Mech. Engrg., 387:114188, 2021.
- [8] Ovidiu Calin. Deep learning architectures–a mathematical approach. Springer Series in the Data Sciences. Springer, Cham, 2020.
- [9] Salvatore Cuomo, Vincenzo Schiano Di Cola, Fabio Giampaolo, Gianluigi Rozza, Maziar Raissi, and Francesco Piccialli. Scientific machine learning through physics-informed neural networks: where we are and what’s next. J. Sci. Comput., 92(3):Paper No. 88, 62, 2022.
- [10] E.C. Cyr, M.A. Gulian, R.G. Patel, M. Perego, and N.A. Trask. Robust training and initialization of deep neural networks: An adaptive basis viewpoint. Proceedings of Machine Learning Research, 107:512–536, 2020.
- [11] P.J. Davis and P. Rabinowitz. Methods of numerical integration. Dover Publications, Inc, 2007.
- [12] Tim De Ryck, Ameya D. Jagtap, and Siddhartha Mishra. Error estimates for physics-informed neural networks approximating the Navier–Stokes equations. IMA J. Numer. Anal., 0:1–37, 2023.
- [13] Tim De Ryck, Samuel Lanthaler, and Siddhartha Mishra. On the approximation of functions by tanh neural networks. Neural Networks, 143:732–750, 2021.
- [14] Tim De Ryck and Siddhartha Mishra. Estimates on the generalization error of physics-informed neural networks for approximating PDEs. Adv. Comput. Math., 48(79), 2022.
- [15] S. Dong and Z. Li. Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations. Comput. Methods Appl. Mech. Engrg., 387:114129, 2021. (also arXiv:2012.02895).
- [16] S. Dong and N. Ni. A method for representing periodic functions and enforcing exactly periodic boundary conditions with deep neural networks. J. Comput. Phys., 435:110242, 2021.
- [17] S. Dong and Y. Wang. A method for computing inverse parametric PDE problems with randomized neural networks. arXiv:2210.04338, 2022.
- [18] S. Dong and J. Yang. Numerical approximation of partial differential equations by a variable projection method with artificial neural networks. Comput. Methods Appl. Mech. Engrg., 398:115284, 2022. (also arXiv:2201.09989).
- [19] S. Dong and J. Yang. On computing the hyperparameter of extreme learning machines: algorithms and applications to computational PDEs, and comparison with classical and high-order finite elements. J. Comput. Phys., 463:111290, 2022. (also arXiv:2110.14121).
- [20] Suchuan Dong and Zongwei Li. A modified batch intrinsic plasticity method for pre-training the random coefficients of extreme learning machines. J. Comput. Phys., 445:Paper No. 110585, 31, 2021.
- [21] W. E and B. Yu. The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat., 6:1–12, 2018.
- [22] Dennis Elbrächter, Dmytro Perekrestenko, Philipp Grohs, and Helmut Bölcskei. Deep neural network approximation theory. IEEE Trans. Inform. Theory, 67(5):2581–2623, 2021.
- [23] G. Fabiani, F. Calabro, L. Russo, and C. Siettos. Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. J. Sci. Comput., 89:44, 2021.
- [24] Rui Fang, David Sondak, Pavlos Protopapas, and Sauro Succi. Neural network models for the anisotropic Reynolds stress tensor in turbulent channel flow. J. Turbul., 21(9-10):525–543, 2020.
- [25] J. He and J. Xu. MgNet: A unified framework for multigrid and convolutional neural network. Sci. China Math., 62:1331–1354, 2019.
- [26] Ruimeng Hu, Quyuan Lin, Alan Raydan, and Sui Tang. Higher-order error estimates for physics-informed neural networks approximating the primitive equations. arXiv:2209.11929.
- [27] Z. Hu, C. Liu, Y. Wang, and Z. Xu. Energetic variational neural network discretizations to gradient flows. arXiv:2206.07303, 2022.
- [28] Thomas J. R. Hughes and Jerrold E. Marsden. Classical elastodynamics as a linear symmetric hyperbolic system. J. Elasticity, 8(1):97–110, 1978.
- [29] A.D. Jagtap and G.E. Karniadakis. Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun. Comput. Phys., 28:2002–2041, 2020.
- [30] A.D. Jagtap, E. Kharazmi, and G.E. Karniadakis. Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput. Methods Appl. Mech. Engrg., 365:113028, 2020.
- [31] G.E. Karniadakis, G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang. Physics-informed machine learning. Nat. Rev. Phys., 3:422–440, 2021.
- [32] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
- [33] A.S. Krishnapriyan, A. Gholami, S. Zhe, R.M. Kirby, and M.W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems, 34:26548–26560, 2021.
- [34] Kôji Kubota and Kazuyoshi Yokoyama. Global existence of classical solutions to systems of nonlinear wave equations with different speeds of propagation. Japan. J. Math., 27(1):113–202, 2001.
- [35] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436–444, 2015.
- [36] Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: a deep learning library for solving differential equations. SIAM Rev., 63(1):208–228, 2021.
- [37] Siddhartha Mishra and Roberto Molinaro. Physics informed neural networks for simulating radiative transfer. J. Quant. Spectrosc. Radiat. Transfer, 270:107705, 2021.
- [38] Siddhartha Mishra and Roberto Molinaro. Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal., 42(2):981–1022, 2022.
- [39] Siddhartha Mishra and Roberto Molinaro. Estimates on the generalization error of physics-informed neural networks for approximating PDEs. IMA J. Numer. Anal., 0(2):1–43, 2022.
- [40] Siddhartha Mishra and T. Konstantin Rusch. Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences. SIAM J. Numer. Anal., 59(3):1811–1834, 2021.
- [41] Partha Niyogi and Federico Girosi. Generalization bounds for function approximation from scattered noisy data. Adv. Comput. Math., 10(1):51–80, 1999.
- [42] Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer, New York, second edition, 2006.
- [43] R.G. Patel, I. Manickam, N.A. Trask, M.A. Wood, M. Lee, I. Tomas, and E.C. Cyr. Thermodynamically consistent physics-informed neural networks for hyperbolic systems. J. Comput. Phys., 449:110754, 2022.
- [44] M. Penwarden, A.D. Jagtap, S. Zhe, G.E. Karniadakis, and R.M. Kirby. A unified scalable framework for causal sweeping strategies for physics-informed neural networks (PINNs) and their temporal decompositions. arXiv:2302.14227, 2023.
- [45] Maziar Raissi and George Em Karniadakis. Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys., 357:125–141, 2018.
- [46] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys., 378:686–707, 2019.
- [47] Jalal Shatah. Global existence of small solutions to nonlinear evolution equations. J. Differential Equations, 46(3):409–425, 1982.
- [48] Jalal Shatah. Normal forms and quadratic nonlinear Klein-Gordon equations. Comm. Pure Appl. Math., 38(5):685–696, 1985.
- [49] Yeonjong Shin, Jérôme Darbon, and George Em Karniadakis. On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs. Commun. Comput. Phys., 28(5):2042–2074, 2020.
- [50] Yeonjong Shin, Zhongqiang Zhang, and George Em Karniadakis. Error estimates of residual minimization using neural networks for linear PDEs. arXiv:2010.08019.
- [51] J.W. Siegel, Q. Hong, X. Jin, W. Hao, and J. Xu. Greedy training algorithms for neural networks and applications to PDEs. arXiv:2107.04466, 2022.
- [52] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys., 375:1339–1364, 2018.
- [53] K. Tang, X. Wan, and Q. Liao. Adaptive deep density estimation for Fokker-Planck equations. J. Comput. Phys., 457:111080, 2022.
- [54] A.M. Tartakovsky, C.O. Marrero, P. Perdikaris, G.D. Tartakovsky, and D. Barajas-Solano. Physics-informed deep neural networks for learning parameters and constitutive relationships in subsurface flow problems. Water Resour. Res., 56:e2019WR026731, 2020.
- [55] Roger Temam. Infinite-dimensional dynamical systems in mechanics and physics, volume 68 of Applied Mathematical Sciences. Springer-Verlag, New York, second edition, 1997.
- [56] Nils Thuerey, Konstantin Weißenow, Lukas Prantl, and Xiangyu Hu. Deep learning methods for Reynolds-averaged Navier–Stokes simulations of airfoil flows. AIAA J., 58(1):25–36, 2020.
- [57] X. Wan and S. Wei. VAE-KRnet and its applications to variational Bayes. Commun. Comput. Phys., 31:1049–1082, 2022.
- [58] Baoxiang Wang. Classical global solutions for non-linear Klein-Gordon-Schrödinger equations. Math. Methods Appl. Sci., 20(7):599–616, 1997.
- [59] Jianxun Wang, Jinlong Wu, and Heng Xiao. Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data. Phys. Rev. Fluids, 2(3):034603, 2017.
- [60] S. Wang, X. Yu, and P. Perdikaris. When and why PINNs fail to train: a neural tangent kernel perspective. J. Comput. Phys., 449:110768, 2022.
- [61] Y. Wang and G. Lin. Efficient deep learning techniques for multiphase flow simulation in heterogeneous porous media. J. Comput. Phys., 401:108968, 2020.
- [62] Kôsaku Yosida. Functional analysis, volume 123. Springer-Verlag, Berlin-New York, sixth edition, 1980.
- [63] Umberto Zerbinati. PINNs and GaLS: A priori error estimates for shallow physics informed neural networks applied to elliptic problems. IFAC-PapersOnLine, 55(20):61–66, 2022.