
Error Analysis of Physics-Informed Neural Networks for Approximating Dynamic PDEs of Second Order in Time

Yanxia Qiana, Yongchao Zhangb, Yunqing Huanga∗, Suchuan Dongc
aSchool of Mathematics and Computational Science, Xiangtan University,
Xiangtan, Hunan, 411105, P.R. China
bSchool of Mathematics, Northwest University,
Xi’an, Shaanxi 710069, P.R. China
cCenter for Computational and Applied Mathematics,
Department of Mathematics, Purdue University, USA
Corresponding authors. Emails: yxqian0520@xtu.edu.cn (Y. Qian), yoczhang@nwu.edu.cn (Y. Zhang), huangyq@xtu.edu.cn (Y. Huang), sdong@purdue.edu (S. Dong)
(March 21, 2023)
Abstract

We consider the approximation of a class of dynamic partial differential equations (PDE) of second order in time by the physics-informed neural network (PINN) approach, and provide an error analysis of PINN for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the $\tanh$ activation function, the PINN approximation errors for the solution field, its time derivative and its gradient field can be effectively bounded by the training loss and the number of training data points (quadrature points). Our analyses further suggest new forms for the training loss function, which contain certain residuals that are crucial to the error estimate but would be absent from the canonical PINN loss formulation. Adopting these new forms for the loss function leads to a variant PINN algorithm. We present ample numerical experiments with the new PINN algorithm for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation, which show that the method can capture the solution well.

Keywords: physics informed neural network; neural network; error estimate; PDE; scientific machine learning

1 Introduction

Deep neural networks (DNN) have achieved great success in a number of fields in science and engineering LeCun2015DP , such as natural language processing, robotics, computer vision, and speech and image recognition, to name but a few. This has inspired a great deal of research effort in the past few years to adapt such techniques to scientific computing. DNN-based techniques seem particularly promising for problems in higher dimensions, e.g. high-dimensional partial differential equations (PDE), since traditional numerical methods for high-dimensional problems quickly become infeasible due to the exponential increase in computational effort (the so-called curse of dimensionality). Under these circumstances deep-learning algorithms can be helpful. In particular, the neural-network approach for PDE problems provides implicit regularization and can alleviate, and perhaps overcome, the curse of dimensionality Beck2019Machine ; Berner2020Analysis . This approach also provides a natural framework for estimating unknown parameters Fang2020NN ; Raissi2019pinn ; Raissi2018Hidden ; Thuerey2020Deep ; Wang2017pinn .

As deep neural networks are universal function approximators, it is natural to employ them as ansatz spaces for solutions of (ordinary or partial) differential equations. This paves the way for their use in physical modeling and scientific computing and gives rise to the field of scientific machine learning Karniadakisetal2021 ; SirignanoS2018 ; Raissi2019pinn ; EY2018 ; Lu2021DeepXDE . The physics-informed neural network (PINN) approach was introduced in Raissi2019pinn . It has been successfully applied to a variety of forward and inverse PDE problems and has become one of the most commonly-used methods in scientific machine learning (see e.g. Raissi2019pinn ; HeX2019 ; CyrGPPT2020 ; JagtapKK2020 ; WangL2020 ; JagtapK2020 ; CaiCLL2020 ; Tartakovskyetal2020 ; DongN2021 ; TangWL2021 ; DongL2021 ; CalabroFS2021 ; WanW2022 ; FabianiCRS2021 ; KrishnapriyanGZKM2021 ; DongY2022 ; DongY2022rm ; WangYP2022 ; Pateletal2022 ; DongW2022 ; Siegeletal2022 ; HuLWX2022 ; Penwardenetal2023 , among others). The references Karniadakisetal2021 ; Cuomo2022Scientific provide a comprehensive review of the literature on PINN and about the benefits and drawbacks of this approach.

The mathematical foundation of PINN for approximating PDE solutions is currently an active area of research. It is important to account for the different components of the neural-network error: the optimization error, the approximation error, and the estimation error Niyogi1999Generalization ; Shin2020On . The approximation error refers to the discrepancy between the exact functional map and the neural-network mapping function on a given network architecture Calin2020Deep ; Elbrachter2021deep . The estimation error arises when the network is trained on a finite data set to obtain a mapping on the target domain. The generalization error is the combination of the approximation and estimation errors and quantifies the accuracy of the neural-network predicted solution trained on the given set of data.

Theoretical understanding of PINN has been advanced by a number of recent works. In Shin2020On Shin et al. rigorously justify why PINN works and show its consistency for linear elliptic and parabolic PDEs under certain assumptions. These results are extended in Shin2010.08019 to a general abstract framework for analyzing PINN for linear problems, with the loss function formulated in terms of the strong or weak form of the equations. In Mishra2022Estimates Mishra and Molinaro provide an abstract framework on PINN for forward PDE problems, and estimate the generalization error by means of the training error and the number of training data points. This framework is extended in Mishra2022inverse to study several inverse PDE problems, including the Poisson, heat, wave and Stokes equations. Bai and Koley Bai2021PINN investigate the PINN approximation of nonlinear dispersive PDEs such as the KdV-Kawahara, Camassa-Holm and Benjamin-Ono equations. In Biswas2022Error Biswas et al. provide explicit error estimates (in suitable norms) and stability analyses for the incompressible Navier–Stokes equations. Zerbinati Zerbinati2022pinns presents PINN as an under-determined point matching collocation method, reveals its connection with the Galerkin Least Squares (GALS) method, and establishes an a priori error estimate for elliptic problems.

An important theoretical result on the approximation errors from the recent work DeRyck2021On establishes that a feed-forward neural network $\hat{u}_{\theta}$ with the $\tanh$ activation function and two hidden layers may approximate a function $u$ with a bound in a Sobolev space,

\|\hat{u}_{\theta N}-u\|_{W^{k,\infty}}\leq C\ln(cN)^{k}/N^{s-k}.

Here $u\in W^{s,\infty}([0,1]^{d})$, $d$ is the dimension of the problem, $N$ is the number of training points, and $c,C>0$ are explicitly known constants independent of $N$. Based on this result, De Ryck et al. 2023_IMA_Mishra_NS have studied PINN for the Navier–Stokes equations and shown that a small training error implies a small generalization error. Hu et al. Ruimeng2209.11929 provide higher-order (spatial Sobolev norm) error estimates for the primitive equations, which improve the existing results in the PINN literature that only involve $L^{2}$ errors. In DeRyck2022Estimates it has been shown that, with a sufficient number of randomly chosen training points, the total $L^{2}$ error can be bounded by the generalization error for Kolmogorov-type PDEs, which in turn is bounded by the training error. It is also proved there that the size of the PINN and the number of training samples only increase polynomially with the problem dimension, thus enabling PINN to overcome the curse of dimensionality in this case. In Mishra2021pinn the authors investigate the high-dimensional radiative transfer equation and prove that the generalization error is bounded by the training error and the number of training points, where the upper bound depends on the dimension only through a logarithmic factor. Hence PINN does not suffer from the curse of dimensionality, provided that the training errors do not depend on the underlying dimension.

Although PINN has been widely used for approximating PDEs, theoretical investigations of its convergence and errors are still quite limited and are largely confined to elliptic and parabolic PDEs. There appears to be little theoretical analysis of the convergence of PINN for hyperbolic-type PDEs. In this paper, we consider a class of dynamic PDEs of second order in time, which are hyperbolic in nature, and provide an analysis of the convergence and errors of the PINN algorithm applied to such problems. We focus on the wave equation, the Sine-Gordon equation and the linear elastodynamic equation in our analyses. Building upon the results of DeRyck2021On ; 2023_IMA_Mishra_NS on $\tanh$ neural networks with two hidden layers, we show that for these three kinds of PDEs:

  • The underlying PDE residuals in PINN can be made arbitrarily small with $\tanh$ neural networks having two hidden layers.

  • The total error of the PINN approximation is bounded by the generalization error of PINN.

  • The total error of PINN approximations for the solution field, its time derivative and its gradient is bounded by the training error (training loss) of PINN and the number of quadrature points (training data points).

Furthermore, our theoretical analyses suggest PINN training loss functions for these PDEs that differ somewhat in form from the canonical PINN formulation. The differences lie in two aspects: (i) Our analyses require certain residual terms (such as the gradient of the initial condition, the time derivative of the boundary condition, or, in the case of the linear elastodynamic equation, the strain and divergence of the initial condition) in the training loss, which would be absent from the canonical PINN formulation of the loss function. (ii) Our analyses may require, depending on the type of boundary conditions, a norm other than the $L^{2}$ norm for certain boundary residuals in the training loss, which differs from the commonly-used $L^{2}$ norm in the canonical PINN formulation of the loss function.

These new forms for the training loss function suggested by the theoretical analyses lead to a variant PINN algorithm. We have implemented the PINN algorithm based on these new forms of the training loss function for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Ample numerical experiments based on this algorithm have been presented. The simulation results indicate that the method has captured the solution field reasonably well for these PDEs. The numerical results also to some extent corroborate the theoretical relation between the approximation error and the PINN training loss obtained from the error analysis.

The rest of this paper is organized as follows. In Section 2 we present an overview of PINN for dynamic PDEs of second order in time. In Sections 3, 4 and 5, we present an error analysis of the PINN algorithm for approximating the wave equation, Sine-Gordon equation, and the linear elastodynamic equation, respectively. Section 6 summarizes a set of numerical experiments with these three PDEs to supplement and support our theoretical analyses. Section 7 concludes the presentation with some closing remarks. Finally, the appendix (Section 8) recalls some auxiliary results for our analysis and provides the proofs of the main theorems in Sections 4 and 5.

2 Physics Informed Neural Networks (PINN) for Approximating PDEs

2.1 Generic PDE of Second Order in Time

Consider a compact domain $D\subset\mathbb{R}^{d}$ ($d>0$ being an integer), and let $\mathcal{D}$ and $\mathcal{B}$ denote the differential and boundary operators. We consider the following general form of an initial boundary value problem with a generic PDE of second order in time. For any $\bm{x}\in D$, $\bm{y}\in\partial D$ and $t\in[0,T]$,

\frac{\partial^{2}u}{\partial t^{2}}(\bm{x},t)+\mathcal{D}[u](\bm{x},t)=0, (1a)
\mathcal{B}u(\bm{y},t)=u_{d}(\bm{y},t), (1b)
u(\bm{x},0)=u_{in}(\bm{x}),\quad\frac{\partial u}{\partial t}(\bm{x},0)=v_{in}(\bm{x}). (1c)

Here, $u(\bm{x},t)$ is the unknown field solution, $u_{d}$ denotes the boundary data, and $u_{in}$ and $v_{in}$ are the initial distributions for $u$ and $\frac{\partial u}{\partial t}$. We assume that in $\mathcal{D}$ the highest derivative with respect to the time variable $t$, if any, is of first order.

2.2 Neural Network Representation of a Function

Let $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ denote an activation function that is at least twice continuously differentiable. For any $n\in\mathbb{N}$ and $z\in\mathbb{R}^{n}$, we define $\sigma(z):=(\sigma(z_{1}),\cdots,\sigma(z_{n}))$, where $z_{i}$ ($1\leq i\leq n$) are the components of $z$. We adopt the following formal definition for a feedforward neural network as given in 2023_IMA_Mishra_NS .

Definition 2.1 (2023_IMA_Mishra_NS ).

Let $R\in(0,\infty]$, $L,W\in\mathbb{N}$ and $l_{0},\cdots,l_{L}\in\mathbb{N}$. Let $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ be a twice differentiable function and define

\Theta=\Theta_{L,W,R}:=\bigcup_{L^{\prime}\in\mathbb{N},\,L^{\prime}\leq L}\ \bigcup_{l_{0},\cdots,l_{L}\in\{1,\cdots,W\}}\ \bigtimes_{k=1}^{L^{\prime}}\left([-R,R]^{l_{k}\times l_{k-1}}\times[-R,R]^{l_{k}}\right). (2)

For $\theta\in\Theta$, we define $\theta_{k}:=(W_{k},b_{k})$ and $\mathcal{A}_{k}^{\theta}:\mathbb{R}^{l_{k-1}}\rightarrow\mathbb{R}^{l_{k}}$ by $z\mapsto W_{k}z+b_{k}$ for $1\leq k\leq L$, and we define $f_{k}^{\theta}:\mathbb{R}^{l_{k-1}}\rightarrow\mathbb{R}^{l_{k}}$ by

f_{k}^{\theta}(z)=\left\{\begin{array}{ll}\mathcal{A}_{L}^{\theta}(z)&k=L,\\ (\sigma\circ\mathcal{A}_{k}^{\theta})(z)&1\leq k<L.\end{array}\right. (5)

Denote by $u_{\theta}:\mathbb{R}^{l_{0}}\rightarrow\mathbb{R}^{l_{L}}$ the function that satisfies, for all $z\in\mathbb{R}^{l_{0}}$,

u_{\theta}(z)=(f_{L}^{\theta}\circ f_{L-1}^{\theta}\circ\cdots\circ f_{1}^{\theta})(z),\qquad z\in\mathbb{R}^{l_{0}}. (6)

We set $z=(\bm{x},t)$ and $l_{0}=d+1$ for approximating the PDE problem (1).

The function $u_{\theta}$ defined above is the neural-network representation of a parameterized function associated with the parameter $\theta$. This neural network contains $(L+1)$ layers ($L\geq 2$), with widths $(l_{0},l_{1},\cdots,l_{L})$. The input layer has width $l_{0}$, and the output layer has width $l_{L}$. The $(L-1)$ layers between the input and output layers are the hidden layers, with widths $l_{k}$ ($1\leq k\leq L-1$). $W_{k}$ and $b_{k}$ are the weight/bias coefficients of layer $k$ for $1\leq k\leq L$. From one layer to the next, the network applies an affine transform followed by composition with the activation function $\sigma$. Note that no activation function is applied to the output layer. We refer to $u_{\theta}$ with $L=2$ (i.e. a single hidden layer) as a shallow neural network, and $u_{\theta}$ with $L\geq 3$ (i.e. multiple hidden layers) as a deeper or deep neural network.
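To make the construction concrete, the following is a minimal sketch of such a network with two hidden layers ($L=3$) and the $\tanh$ activation, assuming PyTorch; the class name, the widths and the spatial dimension below are illustrative only, not prescribed by the definition above.

```python
import torch
import torch.nn as nn

class TanhNet(nn.Module):
    """Feed-forward network u_theta: R^{l_0} -> R^{l_L} with two hidden layers,
    tanh activation on the hidden layers, and no activation on the output layer."""
    def __init__(self, dim_in, dim_out, width):
        super().__init__()
        self.A1 = nn.Linear(dim_in, width)   # affine map A_1, followed by sigma
        self.A2 = nn.Linear(width, width)    # affine map A_2, followed by sigma
        self.A3 = nn.Linear(width, dim_out)  # affine map A_3 (output layer, no sigma)

    def forward(self, z):
        z = torch.tanh(self.A1(z))
        z = torch.tanh(self.A2(z))
        return self.A3(z)

# For the problem (1) with spatial dimension d, the input z = (x, t) has width l_0 = d + 1.
d = 2
u_theta = TanhNet(dim_in=d + 1, dim_out=1, width=50)
```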

2.3 Physics Informed Neural Network for Initial/Boundary Value Problem

Let $\Omega=D\times[0,T]$ denote the spatial-temporal domain and $\Omega_{*}=\partial D\times[0,T]$ its spatial boundary. We approximate the solution $u$ of the problem (1) by a neural network $u_{\theta}:\Omega\rightarrow\mathbb{R}^{n}$. With PINN we consider the residual functions of the initial/boundary value problem (1), defined for any sufficiently smooth function $u:\Omega\rightarrow\mathbb{R}^{n}$ as, for any $\bm{x}\in D$, $\bm{y}\in\partial D$ and $t\in[0,T]$,

\mathcal{R}_{int}[u](\bm{x},t)=\frac{\partial^{2}u}{\partial t^{2}}(\bm{x},t)+\mathcal{D}[u](\bm{x},t), (7a)
\mathcal{R}_{sb}[u](\bm{y},t)=\mathcal{B}u(\bm{y},t)-u_{d}(\bm{y},t), (7b)
\mathcal{R}_{tb1}[u](\bm{x},0)=u(\bm{x},0)-u_{in}(\bm{x}), (7c)
\mathcal{R}_{tb2}[u](\bm{x},0)=\frac{\partial u}{\partial t}(\bm{x},0)-v_{in}(\bm{x}). (7d)

These residuals characterize how well a given function $u$ satisfies the initial/boundary value problem (1). If $u$ is the exact solution, then $\mathcal{R}_{int}[u]=\mathcal{R}_{sb}[u]=\mathcal{R}_{tb1}[u]=\mathcal{R}_{tb2}[u]=0$.

To facilitate the subsequent analyses, we introduce an auxiliary function $v=\frac{\partial u}{\partial t}$ and rewrite $\mathcal{R}_{tb2}$ as

\mathcal{R}_{tb2}[v](\bm{x},0)=v(\bm{x},0)-v_{in}(\bm{x}). (8)

We reformulate (1a) into two equations, thus separating the interior residual into the following two components:

\mathcal{R}_{int1}[u,v](\bm{x},t)=\frac{\partial u}{\partial t}(\bm{x},t)-v(\bm{x},t), (9)
\mathcal{R}_{int2}[u,v](\bm{x},t)=\frac{\partial v}{\partial t}(\bm{x},t)+\mathcal{D}[u](\bm{x},t). (10)

With PINN, we seek a neural network $(u_{\theta},v_{\theta})$ to minimize the following quantity,

\begin{split}\mathcal{E}_{G}(\theta)^{2}=&\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}\\ &+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{\Omega_{*}}|R_{sb}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t.\end{split} (11)

The different terms of (11) may be rescaled by different weights (penalty coefficients). For simplicity, we set all these weights to one in the analysis. $\mathcal{E}_{G}$ as defined above is often referred to as the generalization error. Because of the integrals involved therein, $\mathcal{E}_{G}$ can be hard to minimize. In practice, one approximates (11) by an appropriate numerical quadrature rule, as follows,

\mathcal{E}_{T}(\theta,\mathcal{S})^{2}=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb})^{2}, (12)

where

\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (13a)
\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int2}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (13b)
\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (13c)
\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb2}[v_{\theta}](\bm{x}_{tb}^{n})|^{2}, (13d)
\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb})^{2}=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb}[u_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (13e)

The quadrature points in the spatial-temporal domain and on the spatial and temporal boundaries, $\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}$, $\mathcal{S}_{sb}=\{(\bm{x}_{sb}^{n},t_{sb}^{n})\}_{n=1}^{N_{sb}}$ and $\mathcal{S}_{tb}=\{(\bm{x}_{tb}^{n},t_{tb}^{n}=0)\}_{n=1}^{N_{tb}}$, constitute the input data sets to the neural network. In the above equations $\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$ is referred to as the training error (or training loss), and $\omega_{\star}^{n}$ are suitable quadrature weights for $\star=int$, $sb$ and $tb$. PINN thus minimizes the training error $\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$ over the network parameters $\theta$, and upon convergence of the optimization the trained $u_{\theta}$ contains the approximation of the solution $u$ to the problem (1).
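As an illustration, the following is a minimal sketch of how the discrete loss (12)–(13) can be assembled with automatic differentiation, assuming PyTorch. The callables D_op, B_op, u_d, u_in and v_in, the uniform quadrature weights, and the helper names are assumptions made for the sketch, not part of the formulation above.

```python
import torch

def ddt(w, t):
    """Time derivative of a scalar network output w(x, t) via autograd."""
    return torch.autograd.grad(w, t, grad_outputs=torch.ones_like(w),
                               create_graph=True)[0]

def training_loss(u_theta, v_theta, D_op, B_op, u_d, u_in, v_in,
                  x_int, t_int, x_tb, x_sb, t_sb):
    # x_int and t_int are collocation tensors created with requires_grad=True.
    z_int = torch.cat([x_int, t_int], dim=1)
    u, v = u_theta(z_int), v_theta(z_int)
    # Interior residuals (13a)-(13b): R_int1 = u_t - v, R_int2 = v_t + D[u].
    r_int1 = ddt(u, t_int) - v
    r_int2 = ddt(v, t_int) + D_op(u, x_int)
    # Temporal-boundary residuals (13c)-(13d) at t = 0.
    t0 = torch.zeros(x_tb.shape[0], 1)
    r_tb1 = u_theta(torch.cat([x_tb, t0], dim=1)) - u_in(x_tb)
    r_tb2 = v_theta(torch.cat([x_tb, t0], dim=1)) - v_in(x_tb)
    # Spatial-boundary residual (13e).
    r_sb = B_op(u_theta, x_sb, t_sb) - u_d(x_sb, t_sb)
    # Uniform quadrature weights (e.g. mid-point rule on a uniform partition of a unit domain).
    return (r_int1.square().mean() + r_int2.square().mean()
            + r_tb1.square().mean() + r_tb2.square().mean()
            + r_sb.square().mean())
```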

Remark 2.2.

The generalization error (11) (with the corresponding training error (12)) is the standard (canonical) PINN form if one introduces $v=\frac{\partial u}{\partial t}$ and reformulates (1a) into two equations. We would like to emphasize that our analyses below suggest alternative forms for the generalization error, e.g.

\begin{split}\mathcal{E}_{G}(\theta)^{2}=&\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t\\ &+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}\\ &+\left(\int_{\Omega_{*}}|R_{sb}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}, \end{split} (14)

which differs from (11) in the terms involving $\nabla R_{int1}$ and $\nabla R_{tb1}$ and in the last term. The corresponding training error is,

\mathcal{E}_{T}(\theta,\mathcal{S})^{2}=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb}), (15)

where

\left\{\begin{split}&\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2},\\ &\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}.\end{split}\right. (16)

The error analyses also suggest additional terms in the generalization error for different equations.
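As a sketch of how these extra terms can be evaluated in practice (continuing the hypothetical PyTorch-based sketch above, with the same illustrative helper names), the gradient residuals in (16) are obtained by differentiating the already-computed residuals with respect to the spatial collocation coordinates:

```python
import torch

def extra_loss_terms(r_int1, r_tb1, r_sb, x_int, x_tb):
    """Additional terms suggested by the analysis, cf. (14)-(16): the gradients of
    R_int1 and R_tb1, and the boundary contribution entering without the square."""
    def grad_x(w, x):
        return torch.autograd.grad(w, x, grad_outputs=torch.ones_like(w),
                                   create_graph=True)[0]
    e_int3 = grad_x(r_int1, x_int).square().sum(dim=1).mean()  # E_T^{int3} in (16)
    e_tb3 = grad_x(r_tb1, x_tb).square().sum(dim=1).mean()     # E_T^{tb3} in (16)
    e_sb = r_sb.square().mean().sqrt()                          # boundary term to the power 1/2, cf. (14)-(15)
    return e_int3 + e_tb3 + e_sb
```

In the variant loss, the last term would replace, rather than supplement, the squared boundary term of the canonical loss.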

2.4 Numerical Quadrature Rules

As discussed above, we need to approximate the integrals of functions. The analysis in the subsequent sections requires well-known results on numerical quadrature rules as reviewed below.

Given $\Lambda\subset\mathbb{R}^{d}$ and a function $f\in L^{1}(\Lambda)$, we would like to approximate $\int_{\Lambda}f(z){\,\rm{d}}z$. A quadrature rule provides an approximation by

\int_{\Lambda}f(z){\,\rm{d}}{z}\approx\frac{1}{M}\sum_{n=1}^{M}\omega_{n}f(z_{n}), (17)

where $z_{n}\in\Lambda$ ($1\leq n\leq M$) are the quadrature points and $\omega_{n}$ ($1\leq n\leq M$) denote the appropriate quadrature weights. The approximation accuracy is influenced by the type of quadrature rule, the number of quadrature points ($M$), and the regularity of $f$. For the mid-point rule, which is assumed in the analysis in the current work, the approximation accuracy is given by

\left|\int_{\Lambda}f(z){\,\rm{d}}{z}-\frac{1}{M}\sum_{n=1}^{M}\omega_{n}f(z_{n})\right|\leq C_{f}M^{-2/d}, (18)

where $C_{f}\lesssim\|f\|_{C^{2}(D)}$ ($a\lesssim b$ denotes $a\leq Cb$) and $D$ has been partitioned into $M\sim N^{d}$ cubes, with $z_{n}$ ($1\leq n\leq M$) denoting the midpoints of these cubes DavisR2007 . In this paper, we use $C$ to denote a universal constant, which may depend on $k,d,T,u$ and $v$ but not on $N$; we use a subscript to emphasize its dependence when necessary, e.g. $C_{d}$ is a constant depending only on $d$.
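For concreteness, here is a small sketch of the mid-point rule (17)–(18) on the unit cube, written in Python with NumPy; the function name and the test integrand are illustrative only.

```python
import numpy as np

def midpoint_quadrature(f, d, N):
    """Mid-point rule on [0,1]^d: partition the cube into M = N^d sub-cubes and
    average f over their midpoints (uniform weights, unit volume), cf. (17)."""
    mids = (np.arange(N) + 0.5) / N                     # midpoints along one axis
    grids = np.meshgrid(*([mids] * d), indexing="ij")
    z = np.stack([g.ravel() for g in grids], axis=1)    # M x d array of quadrature points
    return f(z).mean()

# Example: integrate sin(pi x) sin(pi y) over [0,1]^2 (exact value 4/pi^2 ~ 0.4053);
# the error decays like M^{-2/d} = N^{-2}, consistent with (18).
approx = midpoint_quadrature(lambda z: np.sin(np.pi * z[:, 0]) * np.sin(np.pi * z[:, 1]),
                             d=2, N=64)
```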

We focus on PDE problems in relatively low dimensions ($d\leq 3$) in this paper and employ the standard quadrature rules. We note that in higher dimensions the standard quadrature rules may not be favorable; in that case random training points or low-discrepancy training points Mishra2021Enhancing may be preferred.

In the subsequent sections we focus on three representative dynamic equations of second order in time (the wave equation, the Sine-Gordon equation, and the linear elastodynamic equation), and provide error estimates for approximating these equations by PINN. We note that these analyses suggest alternative forms for the training loss function that are somewhat different from the standard PINN forms Raissi2019pinn . The PINN numerical results based on the standard form for the loss function, and based on the alternative forms suggested by the error estimates, will be provided after the presentation of the theoretical analysis. In what follows, for brevity we adopt the notation $\mathcal{F}_{\Xi}=\frac{\partial\mathcal{F}}{\partial\Xi}$ and $\mathcal{F}_{\Xi\Upsilon}=\frac{\partial^{2}\mathcal{F}}{\partial\Xi\partial\Upsilon}$ ($\Xi,\Upsilon\in\{t,x\}$), for any sufficiently smooth function $\mathcal{F}:\Omega\rightarrow\mathbb{R}^{n}$.

3 Physics Informed Neural Networks for Approximating the Wave Equation

3.1 Wave Equation

Consider the following wave equations on the torus $D=[0,1)^{d}\subset\mathbb{R}^{d}$, with periodic boundary conditions:

u_{t}-v=0\qquad\qquad\qquad\ \text{in}\ D\times[0,T], (19a)
v_{t}-\Delta u=f\qquad\qquad\ \ \ \text{in}\ D\times[0,T], (19b)
u(\bm{x},0)=\psi_{1}(\bm{x})\qquad\quad\ \text{in}\ D, (19c)
v(\bm{x},0)=\psi_{2}(\bm{x})\qquad\quad\ \text{in}\ D, (19d)
u(\bm{x},t)=u(\bm{x}+1,t)\qquad\ \text{in}\ \partial D\times[0,T], (19e)
\nabla u(\bm{x},t)=\nabla u(\bm{x}+1,t)\ \ \text{in}\ \partial D\times[0,T]. (19f)

Regularity results for linear evolution equations of second order in time have been studied in the book Temam1997Infinite . When the self-adjoint operator $\mathcal{A}$ therein is taken to be the Laplace operator, the linear evolution equations of second order in time become the classical wave equations, and we then obtain the following regularity results.

Lemma 3.1.

Let $r\geq 1$, $\psi_{1}\in H^{r}(D)$, $\psi_{2}\in H^{r-1}(D)$ and $f\in L^{2}([0,T];H^{r-1}(D))$. Then there exists a unique solution $u$ to the classical wave equations such that $u\in C([0,T];H^{r}(D))$ and $u_{t}\in C([0,T];H^{r-1}(D))$.

Lemma 3.2.

Let $k\in\mathbb{N}$, $\psi_{1}\in H^{r}(D)$, $\psi_{2}\in H^{r-1}(D)$ and $f\in C^{k-1}([0,T];H^{r-1}(D))$ with $r>\frac{d}{2}+k$. Then there exist $T>0$ and a classical solution $u$ to the wave equations such that $u(t=0)=\psi_{1}$, $u_{t}(t=0)=\psi_{2}$, $u\in C^{k}(D\times[0,T])$ and $v\in C^{k-1}(D\times[0,T])$.

Proof.

By Lemma 3.1, there exist $T>0$ and a solution $(u,v)$ to the wave equations such that $u(t=0)=\psi_{1}$, $v(t=0)=\psi_{2}$, $u\in C([0,T];H^{r}(D))$ and $v\in C([0,T];H^{r-1}(D))$. As $r>\frac{d}{2}+k$, $H^{r-k}(D)$ is a Banach algebra.

For $k=1$: since $u\in C([0,T];H^{r}(D))$, $v\in C([0,T];H^{r-1}(D))$ and $f\in C([0,T];H^{r-1}(D))$, we have $u_{t}=v\in C([0,T];H^{r-1}(D))$ and $v_{t}=\Delta u+f\in C([0,T];H^{r-2}(D))$. It follows that $u\in C^{1}([0,T];H^{r-1}(D))$ and $v\in C^{1}([0,T];H^{r-2}(D))$.

For $k=2$: since $f\in C^{1}([0,T];H^{r-1}(D))$, we have $u_{tt}=v_{t}\in C([0,T];H^{r-2}(D))$ and $v_{tt}=\Delta u_{t}+f_{t}\in C([0,T];H^{r-3}(D))$. It follows that $u\in C^{2}([0,T];H^{r-2}(D))$ and $v\in C^{2}([0,T];H^{r-3}(D))$.

Repeating the same argument, we have $u\in\cap_{l=0}^{k}C^{l}([0,T];H^{r-l}(D))$ and $v\in\cap_{l=0}^{k}C^{l}([0,T];H^{r-l-1}(D))$. Applying the Sobolev embedding theorem together with $r>\frac{d}{2}+k$, it holds that $H^{r-l}(D)\subset C^{k-l}(D)$ and $H^{r-l-1}(D)\subset C^{k-l-1}(D)$ for $0\leq l\leq k$. Therefore, $u\in C^{k}(D\times[0,T])$ and $v\in C^{k-1}(D\times[0,T])$. ∎

3.2 Physics Informed Neural Networks

We would like to approximate the solution of the problem (19) with PINN. We seek deep neural networks $u_{\theta}:D\times[0,T]\rightarrow\mathbb{R}$ and $v_{\theta}:D\times[0,T]\rightarrow\mathbb{R}$, parameterized by $\theta\in\Theta$, that approximate the solutions $u$ and $v$ of (19). Define the residuals,

R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)=u_{\theta t}-v_{\theta}, (20a)
R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)=v_{\theta t}-\Delta u_{\theta}-f, (20b)
R_{tb1}[u_{\theta}](\bm{x})=u_{\theta}(\bm{x},0)-\psi_{1}(\bm{x}), (20c)
R_{tb2}[v_{\theta}](\bm{x})=v_{\theta}(\bm{x},0)-\psi_{2}(\bm{x}), (20d)
R_{sb1}[v_{\theta}](\bm{x},t)=v_{\theta}(\bm{x},t)-v_{\theta}(\bm{x}+1,t), (20e)
R_{sb2}[u_{\theta}](\bm{x},t)=\nabla u_{\theta}(\bm{x},t)-\nabla u_{\theta}(\bm{x}+1,t). (20f)

Note that for the exact solution, $R_{int1}[u,v]=R_{int2}[u,v]=R_{tb1}[u]=R_{tb2}[v]=R_{sb1}[v]=R_{sb2}[u]=0$. Let $\Omega=D\times[0,T]$ denote the space-time domain and $\Omega_{*}=\partial D\times[0,T]$ its spatial boundary. With PINN, we minimize the following generalization error,

\begin{split}\mathcal{E}_{G}(\theta)^{2}&=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t\\ &+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}\\ &+\int_{\Omega_{*}}|R_{sb1}[v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t+\int_{\Omega_{*}}|R_{sb2}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t.\end{split} (21)

The form of different terms in this expression will become clearer below.

To complete the PINN formulation, we choose the training set $\mathcal{S}\subset\overline{D}\times[0,T]$ based on suitable quadrature points. We divide the full training set $\mathcal{S}=\mathcal{S}_{int}\cup\mathcal{S}_{sb}\cup\mathcal{S}_{tb}$ into the following three components:

  • Interior training points $\mathcal{S}_{int}=\{z_{n}\}$ for $1\leq n\leq N_{int}$, with each $z_{n}=(\bm{x},t)_{n}\in D\times(0,T)$.

  • Spatial boundary training points $\mathcal{S}_{sb}=\{z_{n}\}$ for $1\leq n\leq N_{sb}$, with each $z_{n}=(\bm{x},t)_{n}\in\partial D\times(0,T)$.

  • Temporal boundary training points $\mathcal{S}_{tb}=\{\bm{x}_{n}\}$ for $1\leq n\leq N_{tb}$, with each $\bm{x}_{n}\in D$.

We define the PINN training loss, $\theta\mapsto\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$, as follows,

\mathcal{E}_{T}(\theta,\mathcal{S})^{2}=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb})^{2}+\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb})^{2}, (22)

where

\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (23a)
\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int2}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (23b)
\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (23c)
\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (23d)
\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb2}[v_{\theta}](\bm{x}_{tb}^{n})|^{2}, (23e)
\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (23f)
\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb})^{2}=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb1}[v_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}, (23g)
\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb})^{2}=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb2}[u_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (23h)

Here the quadrature points in space-time constitute the data sets $\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}$, $\mathcal{S}_{tb}=\{\bm{x}_{tb}^{n}\}_{n=1}^{N_{tb}}$ and $\mathcal{S}_{sb}=\{(\bm{x}_{sb}^{n},t_{sb}^{n})\}_{n=1}^{N_{sb}}$, and $\omega_{\star}^{n}$ are suitable quadrature weights with $\star$ denoting $int$, $tb$ or $sb$.
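To illustrate how the residuals (20a)–(20f) entering this loss can be computed in practice, here is a minimal sketch using automatic differentiation, again assuming PyTorch; the helper names are hypothetical, and the shift by one period in (20e)–(20f) is written component-wise for simplicity.

```python
import torch

def wave_residuals(u_theta, v_theta, f, psi1, psi2, x_int, t_int, x_sb, t_sb, x_tb):
    """PINN residuals (20a)-(20f) for the wave system (19) on the torus [0,1)^d.
    The spatial/temporal collocation tensors are created with requires_grad=True."""
    def dt(w, t):
        return torch.autograd.grad(w, t, grad_outputs=torch.ones_like(w),
                                   create_graph=True)[0]
    def grad_x(w, x):
        return torch.autograd.grad(w, x, grad_outputs=torch.ones_like(w),
                                   create_graph=True)[0]
    def laplacian(w, x):
        g = grad_x(w, x)
        return sum(grad_x(g[:, i:i + 1], x)[:, i:i + 1] for i in range(x.shape[1]))

    z_int = torch.cat([x_int, t_int], dim=1)
    u, v = u_theta(z_int), v_theta(z_int)
    r_int1 = dt(u, t_int) - v                                      # (20a)
    r_int2 = dt(v, t_int) - laplacian(u, x_int) - f(x_int, t_int)  # (20b)

    t0 = torch.zeros(x_tb.shape[0], 1)
    r_tb1 = u_theta(torch.cat([x_tb, t0], dim=1)) - psi1(x_tb)     # (20c)
    r_tb2 = v_theta(torch.cat([x_tb, t0], dim=1)) - psi2(x_tb)     # (20d)

    u_b = u_theta(torch.cat([x_sb, t_sb], dim=1))
    u_b1 = u_theta(torch.cat([x_sb + 1.0, t_sb], dim=1))
    r_sb1 = (v_theta(torch.cat([x_sb, t_sb], dim=1))
             - v_theta(torch.cat([x_sb + 1.0, t_sb], dim=1)))      # (20e): periodicity of v
    r_sb2 = grad_x(u_b, x_sb) - grad_x(u_b1, x_sb)                 # (20f): periodicity of grad u
    return r_int1, r_int2, r_tb1, r_tb2, r_sb1, r_sb2
```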

Let

\hat{u}=u_{\theta}-u,\qquad\hat{v}=v_{\theta}-v,

denote the difference between the solution to the wave equations and the PINN approximation of the solution. We define the total error of the PINN approximation by

\mathcal{E}(\theta)^{2}=\int_{0}^{T}\int_{D}\left(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t. (24)
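When an exact solution is available (as in the manufactured-solution tests of the numerical experiments), the total error (24) can be estimated by quadrature on a set of space-time sample points. A brief sketch, with illustrative helper names and assuming PyTorch:

```python
import torch

def total_error(u_theta, v_theta, u_exact, v_exact, grad_u_exact, x, t):
    """Discrete estimate of the total error (24): average of
    |u_hat|^2 + |grad u_hat|^2 + |v_hat|^2 over space-time sample points (x, t)."""
    x = x.requires_grad_(True)
    z = torch.cat([x, t], dim=1)
    u_pred, v_pred = u_theta(z), v_theta(z)
    grad_u_pred = torch.autograd.grad(u_pred, x, torch.ones_like(u_pred))[0]
    u_hat = u_pred - u_exact(x, t)
    v_hat = v_pred - v_exact(x, t)
    grad_u_hat = grad_u_pred - grad_u_exact(x, t)
    return (u_hat.square() + grad_u_hat.square().sum(dim=1, keepdim=True)
            + v_hat.square()).mean()
```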

3.3 Error Analysis

In light of the wave equations (19) and the definitions for different residuals (20), we have

R_{int1}=\hat{u}_{t}-\hat{v}, (25a)
R_{int2}=\hat{v}_{t}-\Delta\hat{u}, (25b)
R_{tb1}=\hat{u}(\bm{x},0), (25c)
R_{tb2}=\hat{v}(\bm{x},0), (25d)
R_{sb1}=\hat{v}(\bm{x},t)-\hat{v}(\bm{x}+1,t), (25e)
R_{sb2}=\nabla\hat{u}(\bm{x},t)-\nabla\hat{u}(\bm{x}+1,t). (25f)

3.3.1 Bound on the Residuals

Theorem 3.3.

Let $d$, $r$, $k\in\mathbb{N}$ with $k\geq 3$. Let $\psi_{1}\in H^{r}(D)$, $\psi_{2}\in H^{r-1}(D)$ and $f\in C^{k-1}([0,T];H^{r-1}(D))$ with $r>\frac{d}{2}+k$. For every integer $N>5$, there exist $\tanh$ neural networks $u_{\theta}$ and $v_{\theta}$, each with two hidden layers, of widths at most $3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1)$ and $3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}$, such that

\|R_{int1}\|_{L^{2}(\Omega)},\ \|R_{tb1}\|_{L^{2}(D)}\lesssim\ln N\,N^{-k+1}, (26a)
\|R_{int2}\|_{L^{2}(\Omega)},\ \|\nabla R_{int1}\|_{L^{2}(\Omega)},\ \|\nabla R_{tb1}\|_{L^{2}(D)},\ \|R_{sb2}\|_{L^{2}(\partial D\times[0,t])}\lesssim\ln^{2}N\,N^{-k+2}, (26b)
\|R_{tb2}\|_{L^{2}(D)},\ \|R_{sb1}\|_{L^{2}(\partial D\times[0,t])}\lesssim\ln N\,N^{-k+2}. (26c)
Proof.

Based on Lemma 3.2, it holds that $u\in H^{k}(\Omega)$ and $v\in H^{k-1}(\Omega)$. In light of Lemma 8.5, there exist neural networks $u_{\theta}$ and $v_{\theta}$, each with two hidden layers and widths $3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1)$ and $3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}$, such that for every $0\leq l\leq 2$ and $0\leq s\leq 2$,

\|u_{\theta}-u\|_{H^{l}(\Omega)}\leq C_{l,k,d+1,u}\lambda_{l,u}(N)N^{-k+l}, (27)
\|v_{\theta}-v\|_{H^{s}(\Omega)}\leq C_{s,k-1,d+1,v}\lambda_{s,v}(N)N^{-k+1+s}, (28)

where $\lambda_{l,u}=2^{l}3^{d+1}(1+\sigma)\ln^{l}\left(\beta_{l,\sigma,d+1,u}N^{d+k+3}\right)$, $\sigma=\frac{1}{100}$, $\lambda_{s,v}=2^{s}3^{d+1}(1+\sigma)\ln^{s}\left(\beta_{s,\sigma,d+1,v}N^{d+k+2}\right)$, and the definitions of the other constants can be found in Lemma 8.5.

In light of Lemma 8.3, we can bound the PINN residual terms,

\|\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)},\qquad\|\hat{v}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
\|\Delta\hat{u}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)},\qquad\|\nabla\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)},
\|\nabla\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
\|\hat{u}\|_{L^{2}(D)}\leq\|\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)},
\|\hat{v}\|_{L^{2}(D)}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)},
\|\nabla\hat{u}\|_{L^{2}(D)}\leq\|\nabla\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)},
\|\hat{v}\|_{L^{2}(\partial D\times[0,t])}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)},
\|\nabla\hat{u}\|_{L^{2}(\partial D\times[0,t])}\leq\|\nabla\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)}.

By combining these relations with (27) and (28), we can obtain

Rint1L2(Ω)=u^tv^L2(Ω)u^H1(Ω)+v^L2(Ω)\displaystyle\|R_{int1}\|_{L^{2}(\Omega)}=\|\hat{u}_{t}-\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)}+\|\hat{v}\|_{L^{2}(\Omega)}
C1,k,d+1,uλ1,u(N)Nk+1+C0,k1,d+1,vλ0,v(N)Nk+1lnNNk+1,\displaystyle\qquad\leq C_{1,k,d+1,u}\lambda_{1,u}(N)N^{-k+1}+C_{0,k-1,d+1,v}\lambda_{0,v}(N)N^{-k+1}\lesssim{\rm ln}NN^{-k+1},
Rint2L2(Ω)=v^tΔu^L2(Ω)v^H1(Ω)+u^H2(Ω)\displaystyle\|R_{int2}\|_{L^{2}(\Omega)}=\|\hat{v}_{t}-\Delta\hat{u}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)}+\|\hat{u}\|_{H^{2}(\Omega)}
C2,k,d+1,uλ2,u(N)Nk+2+C1,k1,d+1,vλ1,v(N)Nk+2ln2NNk+2,\displaystyle\qquad\leq C_{2,k,d+1,u}\lambda_{2,u}(N)N^{-k+2}+C_{1,k-1,d+1,v}\lambda_{1,v}(N)N^{-k+2}\lesssim{\rm ln}^{2}NN^{-k+2},
Rint1L2(Ω)=(u^tv^)L2(Ω)u^H2(Ω)+v^H1(Ω)\displaystyle\|\nabla R_{int1}\|_{L^{2}(\Omega)}=\|\nabla(\hat{u}_{t}-\hat{v})\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)}+\|\hat{v}\|_{H^{1}(\Omega)}
C2,k,d+1,uλ2,u(N)Nk+2+C1,k1,d+1,vλ1,v(N)Nk+2ln2NNk+2,\displaystyle\qquad\leq C_{2,k,d+1,u}\lambda_{2,u}(N)N^{-k+2}+C_{1,k-1,d+1,v}\lambda_{1,v}(N)N^{-k+2}\lesssim{\rm ln}^{2}NN^{-k+2},
Rtb1L2(D)ChΩ,d+1,ρΩu^H1(Ω)lnNNk+1,\displaystyle\|R_{tb1}\|_{L^{2}(D)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
Rtb2L2(D),Rsb1L2(D×[0,t])ChΩ,d+1,ρΩv^H1(Ω)lnNNk+2,\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb1}\|_{L^{2}(\partial D\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
Rtb1L2(D),Rsb2L2(D×[0,t])ChΩ,d+1,ρΩu^H2(Ω)ln2NNk+2.\displaystyle\|\nabla R_{tb1}\|_{L^{2}(D)},\|R_{sb2}\|_{L^{2}(\partial D\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}.

Theorem 3.3 implies that one can make the PINN residuals (20) arbitrarily small by choosing $N$ to be sufficiently large. It follows that the generalization error $\mathcal{E}_{G}(\theta)^{2}$ in (21) can be made arbitrarily small.

3.3.2 Bounds on the Total Approximation Error

We next show that the total error $\mathcal{E}(\theta)^{2}$ is small whenever the generalization error $\mathcal{E}_{G}(\theta)^{2}$ of the PINN approximation $(u_{\theta},v_{\theta})$ is small. We then prove that the total error $\mathcal{E}(\theta)^{2}$ can be made arbitrarily small, provided that the training error $\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$ is sufficiently small and the sample set is sufficiently large.

Theorem 3.4.

Let $d\in\mathbb{N}$, and let $u\in C^{1}(\Omega)$ and $v\in C^{0}(\Omega)$ be the classical solution to the wave equations (19). Let $u_{\theta}$ and $v_{\theta}$ denote the PINN approximation with parameter $\theta$. Then the following relation holds,

\mathcal{E}(\theta)^{2}=\int_{0}^{T}\int_{D}\left(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp(2T), (29)

where

C_{G}=\int_{D}(|R_{tb1}|^{2}+|R_{tb2}|^{2}+|\nabla R_{tb1}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{0}^{T}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}){\,\rm{d}}t.
Proof.

Taking the inner product of (25a) with $\hat{u}$ and of (25b) with $\hat{v}$, and integrating over $D$, we obtain

d2dtD|u^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x} =Du^v^d𝒙+DRint1u^d𝒙D|u^|2d𝒙+12D|Rint1|2d𝒙+12D|v^|2d𝒙,\displaystyle=\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int1}\hat{u}{\,\rm{d}}\bm{x}\leq\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}, (30)
d2dtD|v^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x} =Du^v^d𝒙+Dv^u^𝒏ds(𝒙)+DRint2v^d𝒙\displaystyle=-\int_{D}\nabla\hat{u}\cdot\nabla\hat{v}{\,\rm{d}}\bm{x}+\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=Du^u^td𝒙+Du^Rint1d𝒙+Dv^u^𝒏ds(𝒙)+DRint2v^d𝒙\displaystyle=-\int_{D}\nabla\hat{u}\cdot\nabla\hat{u}_{t}{\,\rm{d}}\bm{x}+\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=d2dtD|u^|2d𝒙+Du^Rint1d𝒙+DRsb1Rsb2𝒏ds(𝒙)+DRint2v^d𝒙\displaystyle=-\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+\int_{\partial D}R_{sb1}R_{sb2}\cdot\bm{n}{\,\rm{d}}s(\bm{x})+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
d2dtD|u^|2d𝒙+12D|u^|2d𝒙+12D|Rint1|2d𝒙\displaystyle\leq-\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}
+12D(|Rsb1|2+|Rsb2|2)ds(𝒙)+12D|v^|2d𝒙+12D|Rint2|2d𝒙.\displaystyle\qquad+\frac{1}{2}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x})+\frac{1}{2}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}. (31)

Here, we have used $\hat{v}=\hat{u}_{t}-R_{int1}$.

By adding (30) to (31), we have

d2dtD|u^|2d𝒙+d2dtD|u^|2d𝒙+d2dtD|v^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}
D|u^|2d𝒙+12D|u^|2d𝒙+D|v^|2d𝒙+12D|Rint1|2d𝒙\displaystyle\qquad\leq\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}
+12D|Rint2|2d𝒙+12D|Rint1|2d𝒙+12D(|Rsb1|2+|Rsb2|2)ds(𝒙).\displaystyle\qquad+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}). (32)

Integrating (32) over $[0,\tau]$ for any $\tau\leq T$ and applying the Cauchy–Schwarz inequality, we obtain

D|u^(𝒙,τ)|2d𝒙+D|u^(𝒙,τ)|2d𝒙+D|v^(𝒙,τ)|2d𝒙\displaystyle\int_{D}|\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\hat{v}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}
D|Rtb1|2d𝒙+D|Rtb2|2d𝒙+D|Rtb1|2d𝒙+20τD(|u^|2+|u^|2+|v^|2)d𝒙dt\displaystyle\qquad\leq\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}+2\int_{0}^{\tau}\int_{D}\left(|\hat{u}|^{2}+|\nabla\hat{u}|^{2}+|\hat{v}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|Rint1|2+|Rint2|2+|Rint1|2)d𝒙dt+0TD(|Rsb1|2+|Rsb2|2)ds(𝒙)dt.\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|R_{int1}|^{2}+|R_{int2}|^{2}+|\nabla R_{int1}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{0}^{T}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}){\,\rm{d}}t.

We apply the integral form of the Grönwall inequality to the above relation to get

\int_{D}\left(|\hat{u}(\bm{x},\tau)|^{2}+|\nabla\hat{u}(\bm{x},\tau)|^{2}+|\hat{v}(\bm{x},\tau)|^{2}\right){\,\rm{d}}\bm{x}\leq C_{G}\exp(2T),

where

CG\displaystyle C_{G} =D(|Rtb1|2+|Rtb2|2+|Rtb1|2)d𝒙+0TD(|Rint1|2+|Rint2|2+|Rint1|2)d𝒙dt\displaystyle=\int_{D}(|R_{tb1}|^{2}+|R_{tb2}|^{2}+|\nabla R_{tb1}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|Rsb1|2+|Rsb2|2)ds(𝒙)dt.\displaystyle\qquad+\int_{0}^{T}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}){\,\rm{d}}t.

Then, we integrate the above inequality over $[0,T]$ to yield (29). ∎

Remark 3.5.

For the wave equations (19) with periodic boundary conditions, we would like to mention two other forms of the generalization error (and the related training loss). Compared with (21), they differ only in the terms on the spatial boundary $\Omega_{*}$, i.e.,

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint2[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt\displaystyle=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+D|Rtb1[uθ](𝒙)|2d𝒙+D|Rtb2[vθ](𝒙)|2d𝒙+D|Rtb1[uθ](𝒙)|2d𝒙\displaystyle+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(Ω|Rsb1[vθ](𝒙,t)|2ds(𝒙)dt)12,\displaystyle+\left(\int_{\Omega_{*}}|R_{sb1}[v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}, (33)

and

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint2[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt\displaystyle=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+D|Rtb1[uθ](𝒙)|2d𝒙+D|Rtb2[vθ](𝒙)|2d𝒙+D|Rtb1[uθ](𝒙)|2d𝒙\displaystyle+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(Ω|Rsb2[uθ](𝒙,t)|2ds(𝒙)dt)12.\displaystyle+\left(\int_{\Omega_{*}}|R_{sb2}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}. (34)

The related training loss functions are given by

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2+Ttb1(θ,𝒮tb)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}
+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Tsb1(θ,𝒮sb),\displaystyle+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb}), (35)

or

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2+Ttb1(θ,𝒮tb)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}
+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Tsb2(θ,𝒮sb).\displaystyle+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb}). (36)

These three forms of the generalization error result from different treatments of the boundary term $\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}\,{\rm d}s(\bm{x})$ in the proof of Theorem 3.4:

Dv^u^𝒏ds(𝒙)=DRsb1u^𝒏ds(𝒙)|D|12(uC1(D×[0,t])+uθC1(D×[0,t]))(D|Rsb1|2ds(𝒙))12,\displaystyle\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})=\int_{\partial D}R_{sb1}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})\leq|\partial D|^{\frac{1}{2}}(\|u\|_{C^{1}(\partial D\times[0,t])}+||u_{\theta}||_{C^{1}(\partial D\times[0,t])})\left(\int_{\partial D}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}},
Dv^u^𝒏ds(𝒙)=Dv^Rsb2𝒏ds(𝒙)|D|12(vC0(D×[0,t])+vθC0(D×[0,t]))(D|Rsb2|2ds(𝒙))12,\displaystyle\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})=\int_{\partial D}\hat{v}R_{sb2}\cdot\bm{n}{\,\rm{d}}s(\bm{x})\leq|\partial D|^{\frac{1}{2}}(\|v\|_{C^{0}(\partial D\times[0,t])}+||v_{\theta}||_{C^{0}(\partial D\times[0,t])})\left(\int_{\partial D}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}},
Dv^u^𝒏ds(𝒙)=DRsb1Rsb2𝒏ds(𝒙)12(D|Rsb1|2ds(𝒙)+D|Rsb2|2ds(𝒙)).\displaystyle\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})=\int_{\partial D}R_{sb1}R_{sb2}\cdot\bm{n}{\,\rm{d}}s(\bm{x})\leq\frac{1}{2}\left(\int_{\partial D}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x})+\int_{\partial D}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right).

Our numerical experiments indicate that adopting the training loss (35) or (36) seems to lead to poorer simulation results. For the periodic boundary, both terms $R_{sb1}$ and $R_{sb2}$ may be needed to carry the periodicity information. We suspect that this may be why using only a single boundary term ($R_{sb1}$ or $R_{sb2}$), as in (35) and (36), leads to poorer numerical results.

Theorem 3.6.

Let $d\in\mathbb{N}$ and $T>0$. Let $u\in C^{4}(\Omega)$ and $v\in C^{3}(\Omega)$ be the classical solution of the wave equations (19), and let $(u_{\theta},v_{\theta})$ denote the PINN approximation with parameter $\theta\in\Theta$. Then the total error satisfies

\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp(2T)=\mathcal{O}\big(\mathcal{E}_{T}(\theta,\mathcal{S})^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{2}{d}}\big). (37)

The constant $C_{T}$ is defined as

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+C(Rtb22)Mtb2d+𝒬MtbD(Rtb22)+C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2)\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+C(|Rint1|2)Mint2d+1\displaystyle+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+C_{(R_{int2}^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}
+𝒬MintΩ(|Rint1|2)+C(Rsb12)Msb2d+𝒬MsbΩ(Rsb12)+C(Rsb22)Msb2d+𝒬MsbΩ(Rsb22),\displaystyle+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+C_{({R_{sb1}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})+C_{({R_{sb2}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2}),

where

C(Rtb12)u^C22,C(Rtb22)v^C22,C(|Rtb1|2)u^C32,C(Rint12)u^C32+u^C22,\displaystyle C_{({R_{tb1}^{2}})}\lesssim\|\hat{u}\|_{C^{2}}^{2},\quad C_{({R_{tb2}^{2}})}\lesssim\|\hat{v}\|_{C^{2}}^{2},\quad C_{(|\nabla R_{tb1}|^{2})}\lesssim\|\hat{u}\|_{C^{3}}^{2},\quad C_{({R_{int1}^{2}})}\lesssim\|\hat{u}\|_{C^{3}}^{2}+\|\hat{u}\|_{C^{2}}^{2},
C(Rint22),C(|Rint1|2)u^C42+v^C32,C(Rsb12)v^C32,C(Rsb22)u^C42,\displaystyle\qquad C_{(R_{int2}^{2})},C_{(|\nabla R_{int1}|^{2})}\lesssim\|\hat{u}\|_{C^{4}}^{2}+\|\hat{v}\|_{C^{3}}^{2},\quad C_{({R_{sb1}^{2}})}\lesssim\|\hat{v}\|_{C^{3}}^{2},\quad C_{({R_{sb2}^{2}})}\lesssim\|\hat{u}\|_{C^{4}}^{2},

and the bounds on $\|u_{\theta}\|_{C^{n}}$ and $\|v_{\theta}\|_{C^{n}}$ ($n\in\mathbb{N}$) are given by Lemma 8.4.

Proof.

By combining Theorem 3.4 with the quadrature error formula (18), we have

D|Rtb1|2d𝒙\displaystyle\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(Rtb12)+𝒬MtbD(Rtb12)\displaystyle=\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})
C(Rtb12)Mtb2d+𝒬MtbD(Rtb12),\displaystyle\leq C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2}),
D|Rtb2|2d𝒙\displaystyle\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x} =D|Rtb2|2d𝒙𝒬MtbD(Rtb22)+𝒬MtbD(Rtb22)\displaystyle=\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})
C(Rtb22)Mtb2d+𝒬MtbD(Rtb22),\displaystyle\leq C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2}),
D|Rtb1|2d𝒙\displaystyle\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(|Rtb1|2)+𝒬MtbD(|Rtb1|2)\displaystyle=\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2),\displaystyle\leq C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(Rint12)+𝒬MintΩ(Rint12)\displaystyle=\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
C(Rint12)Mint2d+1+𝒬MintΩ(Rint12),\displaystyle\leq C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2}),
Ω|Rint2|2d𝒙dt\displaystyle\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint2|2d𝒙dt𝒬MintΩ(Rint22)+𝒬MintΩ(Rint22)\displaystyle=\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})
C(Rint22)Mint2d+1+𝒬MintΩ(Rint22),\displaystyle\leq C_{(R_{int2}^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(|Rint1|2)+𝒬MintΩ(|Rint1|2)\displaystyle=\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})
C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2),\displaystyle\leq C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2}),
Ω|Rsb1|2ds(𝒙)dt\displaystyle\int_{\Omega_{*}}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =Ω|Rsb1|2ds(𝒙)dt𝒬MsbΩ(Rsb12)+𝒬MsbΩ(Rsb12)\displaystyle=\int_{\Omega_{*}}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})
C(Rsb12)Msb2d+𝒬MsbΩ(Rsb12),\displaystyle\leq C_{({R_{sb1}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2}),
Ω|Rsb2|2ds(𝒙)dt\displaystyle\int_{\Omega_{*}}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =Ω|Rsb2|2ds(𝒙)dt𝒬MsbΩ(Rsb22)+𝒬MsbΩ(Rsb22)\displaystyle=\int_{\Omega_{*}}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2})+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2})
C(Rsb22)Msb2d+𝒬MsbΩ(Rsb22).\displaystyle\leq C_{({R_{sb2}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2}).

By the above inequalities and (29), it holds that

0TD(|u^(𝒙,t)|2+|u^(𝒙,t)|2+|v^(𝒙,t)|2)d𝒙dtCTTexp(2T),\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp(2T),

where

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+C(Rtb22)Mtb2d+𝒬MtbD(Rtb22)+C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2)\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+C(|Rint1|2)Mint2d+1\displaystyle+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+C_{(R_{int2}^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}
+𝒬MintΩ(|Rint1|2)+C(Rsb12)Msb2d+𝒬MsbΩ(Rsb12)+C(Rsb22)Msb2d+𝒬MsbΩ(Rsb22).\displaystyle+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+C_{({R_{sb1}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})+C_{({R_{sb2}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2}).

The complexities of the constants C(Rq2)C_{({R_{q}^{2}})} are given by Lemma 8.4, and we observe that for every residual RqR_{q}, it holds that Rq2Cn2nRqCn2\|R_{q}^{2}\|_{C^{n}}\leq 2^{n}\|R_{q}\|_{C^{n}}^{2} (nn\in\mathbb{N}) for Rq=Rtb1R_{q}=R_{tb1}, Rtb2R_{tb2}, Rtb1\nabla R_{tb1}, Rint1R_{int1}, Rint2R_{int2}, Rint1\nabla R_{int1} and Rsb2R_{sb2}. ∎

4 Physics Informed Neural Networks for Approximating the Sine-Gordon Equation

4.1 Sine-Gordon Equation

Let DdD\subset\mathbb{R}^{d} be an open connected bounded set with a boundary D\partial D. We consider the following Sine-Gordon equation:

utv=0inD×[0,T],\displaystyle u_{t}-v=0\ \qquad\qquad\qquad\qquad\qquad\qquad\quad\ \ \,\text{in}\ D\times[0,T], (38a)
ε2vt=a2Δuε12ug(u)+finD×[0,T],\displaystyle\varepsilon^{2}v_{t}=a^{2}\Delta u-\varepsilon_{1}^{2}u-g(u)+f\ \ \quad\qquad\qquad\text{in}\ D\times[0,T], (38b)
u(𝒙,0)=ψ1(𝒙)inD,\displaystyle u(\bm{x},0)=\psi_{1}(\bm{x})\qquad\qquad\qquad\qquad\qquad\qquad\text{in}\ D, (38c)
v(𝒙,0)=ψ2(𝒙)inD,\displaystyle v(\bm{x},0)=\psi_{2}(\bm{x})\qquad\qquad\qquad\qquad\qquad\qquad\,\text{in}\ D, (38d)
u(𝒙,t)|D=ud(t)inD×[0,T],\displaystyle u(\bm{x},t)|_{\partial D}=u_{d}(t)\qquad\qquad\qquad\qquad\qquad\ \ \ \,\text{in}\ \partial D\times[0,T], (38e)

where uu and vv are the field functions to be solved for, ff is a source term, and udu_{d}, ψ1\psi_{1} and ψ2\psi_{2} denote the boundary/initial conditions. ε>0\varepsilon>0, a>0a>0 and ε10\varepsilon_{1}\geq 0 are constants. g(u)g(u) is a nonlinear term. We assume that the nonlinearity is globally Lipschitz, i.e., there exists a constant LL (independent of vv and ww) such that

|g(v)g(w)|L|vw|,v,w.|g(v)-g(w)|\leq L|v-w|,\qquad\forall v,\,w\in\mathbb{R}. (39)
Remark 4.1.

The existence and regularity of the solution to the Sine-Gordon equation with different nonlinear terms have been the subject of several studies in the literature; see Baoxiang1997Classical ; Kubota2001Global ; Shatah1982Global ; Shatah1985Normal ; Temam1997Infinite .

The book Temam1997Infinite provides the existence and regularity result of the following Sine-Gordon equation,

utt+αutΔu+g(u)=f.u_{tt}+\alpha u_{t}-\Delta u+g(u)=f.

Let \alpha\in\mathbb{R}, and let g(u) be a C^{2} function from \mathbb{R} to \mathbb{R} satisfying certain assumptions. If f\in C([0,T];L^{2}(D)), \psi_{1}\in H^{1}(D) and \psi_{2}\in L^{2}(D), then there exists a unique solution u to this Sine-Gordon equation such that u\in C([0,T];H^{1}(D)) and u_{t}\in C([0,T];L^{2}(D)). Furthermore, if f^{\prime}\in C([0,T];L^{2}(D)), \psi_{1}\in H^{2}(D) and \psi_{2}\in H^{1}(D), then u\in C([0,T];H^{2}(D)) and u_{t}\in C([0,T];H^{1}(D)).

Let gg be a smooth function of degree 2. The following equation is studied in Shatah1985Normal ,

uttΔu+u+g(u,ut,utt)=0,u_{tt}-\Delta u+u+g(u,u_{t},u_{tt})=0,

where it is reformulated as

𝒖t=A𝒖+G(𝒖),\bm{u}_{t}=A\bm{u}+G(\bm{u}),

in which 𝐮=(uut)\bm{u}=\begin{pmatrix}u\\ u_{t}\end{pmatrix}, A=(01Δ10)A=\begin{pmatrix}0&1\\ \Delta-1&0\end{pmatrix} and G=(0,g(u,ut,utt))G=\begin{pmatrix}0,\\ -g(u,u_{t},u_{tt})\end{pmatrix}. Set X=Hk(n)Hk1(n)X=H^{k}(\mathbb{R}^{n})\bigoplus H^{k-1}(\mathbb{R}^{n}), k>n+2+2ak>n+2+2a with a>1a>1. Given 𝐮0=(ψ1ψ2)X\bm{u}_{0}=\begin{pmatrix}\psi_{1}\\ \psi_{2}\end{pmatrix}\in X and 𝐮0X=σ\|\bm{u}_{0}\|_{X}=\sigma, there exists a T0=T0(σ)T_{0}=T_{0}(\sigma) depending on the size of the initial data σ\sigma and a unique solution 𝐮C([0,T0],X)\bm{u}\in C([0,T_{0}],X).

The reference Baoxiang1997Classical provides the following result. Under certain conditions for the nonlinear term g(u)g(u), with f=0f=0, d5d\leq 5, kd2+1k\geq\frac{d}{2}+1, ψ1Hk(D)\psi_{1}\in H^{k}(D) and ψ2Hk1(D)\psi_{2}\in H^{k-1}(D), there exists a unique solution uC((0,);Hk(D))u\in C((0,\infty);H^{k}(D)) of nonlinear Klein–Gordon equation.

The following result is due to Kubota2001Global . Under certain conditions for the nonlinear term g(u)g(u), with f=0f=0, ψ1Hk(D)\psi_{1}\in H^{k}(D) and ψ2Hk1(D)\psi_{2}\in H^{k-1}(D) with a positive constant k4k\geq 4, there exists a positive constant TkT_{k} and a unique solution uC([0,Tk];Hk(D))C1([0,Tk];Hk1(D))C2([0,Tk];Hk2(D))u\in C([0,T_{k}];H^{k}(D))\cap C^{1}([0,T_{k}];H^{k-1}(D))\cap C^{2}([0,T_{k}];H^{k-2}(D)) to the nonlinear wave equations with different speeds of propagation.

A survey of the literature indicates that, while several works have touched on the regularity of the solution to the Sine-Gordon equations, none of them is comprehensive. To facilitate the subsequent analyses, we make the following assumption in light of Remark 4.1. Let k\geq 1, and let g(u) and f be sufficiently smooth and bounded. Given \psi_{1}\in H^{r}(D) and \psi_{2}\in H^{r-1}(D) with r\geq\frac{d}{2}+k, we assume that there exist T>0 and a classical solution (u,v) to the Sine-Gordon equations (38) such that u\in C([0,T];H^{k}(D)) and v\in C([0,T];H^{k-1}(D)). Then, it follows from the Sobolev embedding theorem that u\in C^{k}(D\times[0,T]) and v\in C^{k-1}(D\times[0,T]).

4.2 Physics Informed Neural Networks

Let Ω=D×[0,T]\Omega=D\times[0,T] and Ω=D×[0,T]\Omega_{*}=\partial D\times[0,T] be the space-time domain. We define the following residuals for the PINN approximation, uθ:Ωu_{\theta}:\Omega\rightarrow\mathbb{R} and vθ:Ωv_{\theta}:\Omega\rightarrow\mathbb{R}, for the Sine-Gordon equations (38):

Rint1[uθ,vθ](𝒙,t)=uθtvθ,\displaystyle R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)=u_{\theta t}-v_{\theta}, (40a)
Rint2[uθ,vθ](𝒙,t)=ε2vθta2Δuθ+ε12uθ+g(uθ)f,\displaystyle R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)=\varepsilon^{2}v_{\theta t}-a^{2}\Delta u_{\theta}+\varepsilon_{1}^{2}u_{\theta}+g(u_{\theta})-f, (40b)
Rtb1[uθ](𝒙)=uθ(𝒙,0)ψ1(𝒙),\displaystyle R_{tb1}[u_{\theta}](\bm{x})=u_{\theta}(\bm{x},0)-\psi_{1}(\bm{x}), (40c)
Rtb2[vθ](𝒙)=vθ(𝒙,0)ψ2(𝒙),\displaystyle R_{tb2}[v_{\theta}](\bm{x})=v_{\theta}(\bm{x},0)-\psi_{2}(\bm{x}), (40d)
Rsb[vθ](𝒙,t)=vθ(𝒙,t)|Dudt(t),\displaystyle R_{sb}[v_{\theta}](\bm{x},t)=v_{\theta}(\bm{x},t)|_{\partial D}-u_{dt}(t), (40e)

where udt=udtu_{dt}=\frac{\partial u_{d}}{\partial t}. Note that for the exact solution (u,v)(u,v), Rint1[u,v]=Rint2[u,v]=Rtb1[u]=Rtb2[v]=Rsb[v]=0R_{int1}[u,v]=R_{int2}[u,v]=R_{tb1}[u]=R_{tb2}[v]=R_{sb}[v]=0. With PINN we minimize the following generalization error,

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint2[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt\displaystyle=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+D|Rtb1[uθ](𝒙)|2d𝒙+D|Rtb2[vθ](𝒙)|2d𝒙+D|Rtb1[uθ](𝒙)|2d𝒙\displaystyle+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(Ω|Rsb[vθ](𝒙,t)|2ds(𝒙)dt)12.\displaystyle+\left(\int_{\Omega_{*}}|R_{sb}[v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}. (41)

Let

u^=uθu,v^=vθv,\hat{u}=u_{\theta}-u,\quad\hat{v}=v_{\theta}-v,

where (u,v)(u,v) denotes the exact solution. We define the total error of the PINN approximation of the Sine-Gordon equations (38) as,

(θ)2=Ω(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dt.\mathcal{E}(\theta)^{2}=\int_{\Omega}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t. (42)

Then we choose the training set 𝒮D¯×[0,T]\mathcal{S}\subset\overline{D}\times[0,T] with 𝒮=𝒮int𝒮sb𝒮tb\mathcal{S}=\mathcal{S}_{int}\cup\mathcal{S}_{sb}\cup\mathcal{S}_{tb}, based on suitable quadrature points:

  • Interior training points 𝒮int={zn}\mathcal{S}_{int}=\{{z}_{n}\} for 1nNint1\leq n\leq N_{int}, with each zn=(𝒙,t)nD×(0,T){z}_{n}=(\bm{x},t)_{n}\in D\times(0,T).

  • Spatial boundary training points 𝒮sb={zn}\mathcal{S}_{sb}=\{{z}_{n}\} for 1nNsb1\leq n\leq N_{sb}, with each zn=(𝒙,t)nD×(0,T){z}_{n}=(\bm{x},t)_{n}\in\partial D\times(0,T).

  • Temporal boundary training points 𝒮tb={𝒙n}\mathcal{S}_{tb}=\{\bm{x}_{n}\} for 1nNtb1\leq n\leq N_{tb} with each 𝒙nD\bm{x}_{n}\in D.

The integrals in (4.2) are approximated by a numerical quadrature rule, resulting in the training loss,

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}
+Ttb1(θ,𝒮tb)2+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Tsb(θ,𝒮sb),\displaystyle+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb}), (43)

where

Tint1(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|Rint1[uθ,vθ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (44a)
Tint2(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|Rint2[uθ,vθ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int2}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (44b)
Tint3(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|Rint1[uθ,vθ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (44c)
Ttb1(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|Rtb1[uθ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (44d)
Ttb2(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|Rtb2[vθ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb2}[v_{\theta}](\bm{x}_{tb}^{n})|^{2}, (44e)
Ttb3(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|Rtb1[uθ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (44f)
Tsb(θ,𝒮sb)2\displaystyle\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb})^{2} =n=1Nsbωsbn|Rsb[vθ](𝒙sbn,tsbn)|2.\displaystyle=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb}[v_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (44g)

Here the quadrature points in space-time constitute the data sets 𝒮int={(𝒙intn,tintn)}n=1Nint\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}, 𝒮tb={𝒙tbn)}n=1Ntb\mathcal{S}_{tb}=\{\bm{x}_{tb}^{n})\}_{n=1}^{N_{tb}} and 𝒮sb={(𝒙sbn,tsbn)}n=1Nsb\mathcal{S}_{sb}=\{(\bm{x}_{sb}^{n},t_{sb}^{n})\}_{n=1}^{N_{sb}}, and ωn\omega_{\star}^{n} are the quadrature weights with \star being intint, tbtb or sbsb.
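To make the above concrete, the following is a minimal PyTorch sketch of how the interior residuals (40a)-(40b), the gradient term of the first residual, and the quadrature-weighted interior loss terms (44a)-(44c) could be evaluated with automatic differentiation. It is written in one spatial dimension for simplicity and with g(u)=\sin(u); the network net (mapping (x,t) to (u_{\theta},v_{\theta})), the source-term callable f and the weight vector w are illustrative assumptions, not the authors' implementation.

```python
import torch

def sine_gordon_interior_loss(net, x, t, w, f, a=1.0, eps=1.0, eps1=1.0):
    """Quadrature-weighted interior loss terms (44a)-(44c) for the Sine-Gordon system (38),
    sketched in one spatial dimension with g(u) = sin(u)."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    out = net(torch.cat([x, t], dim=1))
    u, v = out[:, 0:1], out[:, 1:2]

    def grad(y, z):  # derivative of y w.r.t. z via autograd
        return torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]

    u_t, v_t, u_x = grad(u, t), grad(v, t), grad(u, x)
    u_xx = grad(u_x, x)

    R_int1 = u_t - v                                                            # residual (40a)
    R_int2 = eps**2 * v_t - a**2 * u_xx + eps1**2 * u + torch.sin(u) - f(x, t)  # residual (40b)
    R_int1_x = grad(R_int1, x)                                                  # gradient of (40a), cf. (44c)

    # w holds the quadrature weights (e.g. equal weights 1/N_int for random points)
    return (w * R_int1**2).sum() + (w * R_int2**2).sum() + (w * R_int1_x**2).sum()
```

The temporal and spatial boundary terms (44d)-(44g) would be assembled analogously on the corresponding point sets.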

4.3 Error Analysis

By subtracting the Sine-Gordon equations (38) from the residual equations (40), we get,

Rint1=u^tv^,\displaystyle R_{int1}=\hat{u}_{t}-\hat{v}, (45a)
Rint2=ε2v^ta2Δu^+ε12u^+g(uθ)g(u),\displaystyle R_{int2}=\varepsilon^{2}\hat{v}_{t}-a^{2}\Delta\hat{u}+\varepsilon_{1}^{2}\hat{u}+g(u_{\theta})-g(u), (45b)
Rtb1=u^(𝒙,0),\displaystyle R_{tb1}=\hat{u}(\bm{x},0), (45c)
Rtb2=v^(𝒙,0),\displaystyle R_{tb2}=\hat{v}(\bm{x},0), (45d)
Rsb=v^(𝒙,t)|D.\displaystyle R_{sb}=\hat{v}(\bm{x},t)|_{\partial D}. (45e)

The results on the PINN approximations to the Sine-Gordon equations are summarized in the following theorems.

Theorem 4.2.

Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Assume that g(u)g(u) is Lipschitz continuous, uCk(D×[0,T])u\in C^{k}(D\times[0,T]) and vCk1(D×[0,T])v\in C^{k-1}(D\times[0,T]). Then for every integer N>5N>5, there exist tanh\tanh neural networks uθu_{\theta} and vθv_{\theta}, each with two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

Rint1L2(Ω),Rtb1L2(D)lnNNk+1,\displaystyle\|R_{int1}\|_{L^{2}(\Omega)},\|R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}NN^{-k+1}, (46a)
Rint2L2(Ω),Rint1L2(Ω),Rtb1L2(D)ln2NNk+2,\displaystyle\|R_{int2}\|_{L^{2}(\Omega)},\|\nabla R_{int1}\|_{L^{2}(\Omega)},\|\nabla R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}^{2}NN^{-k+2}, (46b)
Rtb2L2(D),RsbL2(D×[0,t])lnNNk+2.\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb}\|_{L^{2}(\partial D\times[0,t])}\lesssim{\rm ln}NN^{-k+2}. (46c)

The proof of this theorem is provided in the Appendix 8.3.

Theorem 4.2 implies that the PINN residuals in (40) can be made arbitrarily small by choosing a sufficiently large NN. Therefore, the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2} can be made arbitrarily small.

We next show that the PINN total approximation error (θ)2\mathcal{E}(\theta)^{2} can be controlled by the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2} (Theorem 4.3 below), and by the training error T(θ,𝒮)2\mathcal{E}_{T}(\theta,\mathcal{S})^{2} (Theorem 4.4 below). The proofs for Theorem 4.3 and Theorem 4.4 are provided in the Appendix 8.3.

Theorem 4.3.

Let dd\in\mathbb{N}, uC1(Ω)u\in C^{1}(\Omega) and vC0(Ω)v\in C^{0}(\Omega) be the classical solution of the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with parameter θ\theta. Then the following relation holds,

(θ)2=0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCGTexp((2+ε12+L+a2)T),\mathcal{E}(\theta)^{2}=\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right), (47)

where

CG=D(|Rtb1|2+a2|Rtb1|2+ε2|Rtb2|2)d𝒙+0TD(|Rint1|2+|Rint2|2+a2|Rint1|2)d𝒙dt\displaystyle C_{G}=\int_{D}(|R_{tb1}|^{2}+a^{2}|\nabla R_{tb1}|^{2}+\varepsilon^{2}|R_{tb2}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+a^{2}|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2CD|T|12(0TD|Rsb|2ds(𝒙)dt)12,\displaystyle\qquad+2C_{\partial D}|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}},

and CD=a2|D|12(uC1(D×[0,t])+uθC1(D×[0,t]))C_{\partial D}=a^{2}|\partial D|^{\frac{1}{2}}(\|u\|_{C^{1}(\partial D\times[0,t])}+||u_{\theta}||_{C^{1}(\partial D\times[0,t])}).

Theorem 4.4.

Let dd\in\mathbb{N} and T>0T>0, and let uC4(Ω)u\in C^{4}(\Omega) and vC3(Ω)v\in C^{3}(\Omega) be the classical solution to the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with parameter θΘ\theta\in\Theta. Then the following relation holds,

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCTTexp((2+ε12+L+a2)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right)
=𝒪(T(θ,𝒮)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta,\mathcal{S})^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}), (48)

where the constant CTC_{T} is defined by

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+ε2(C(Rtb22)Mtb2d+𝒬MtbD(Rtb22))\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\varepsilon^{2}\left(C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})\right)
+a2(C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2))+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)\displaystyle+a^{2}\left(C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})\right)+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+a2(C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2)),\displaystyle+C_{({R_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+a^{2}\left(C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})\right),
+2CD|T|12(C(Rsb2)Msb2d+𝒬MsbΩ(Rsb2))12.\displaystyle+2C_{\partial D}|T|^{\frac{1}{2}}\left(C_{({R_{sb}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})\right)^{\frac{1}{2}}.

It follows from Theorem 4.4 that the PINN approximation error (θ)2\mathcal{E}(\theta)^{2} can be arbitrarily small, provided that the training error T(θ,𝒮)2\mathcal{E}_{T}(\theta,\mathcal{S})^{2} is sufficiently small and the sample set is sufficiently large.

5 Physics Informed Neural Networks for Approximating Linear Elastodynamic Equation

5.1 Linear Elastodynamic Equation

Consider an elastic body occupying an open, bounded convex polyhedral domain DdD\subset\mathbb{R}^{d}. The boundary D=ΓDΓN\partial D=\Gamma_{D}\cup\Gamma_{N}, with the outward unit normal vector 𝒏\bm{n}, is assumed to be composed of two disjoint portions ΓD\Gamma_{D}\neq\emptyset and ΓN\Gamma_{N}, with ΓDΓN=\Gamma_{D}\cap\Gamma_{N}=\emptyset. Given a suitable external load 𝒇L2((0,T];𝑳2(D))\bm{f}\in L^{2}((0,T];\bm{L}^{2}(D)), and suitable initial/boundary data 𝒈C1((0,T];𝑯12(ΓN))\bm{g}\in C^{1}((0,T];\bm{H}^{\frac{1}{2}}(\Gamma_{N})), 𝝍1𝑯0,ΓD12(D)\bm{\psi}_{1}\in\bm{H}_{0,\Gamma_{D}}^{\frac{1}{2}}(D) and 𝝍2𝑳2(D)\bm{\psi}_{2}\in\bm{L}^{2}(D), we consider the linear elastodynamic equations,

𝒖t𝒗=0inD×[0,T],\displaystyle\bm{u}_{t}-\bm{v}=0\ \quad\quad\qquad\qquad\qquad\qquad\qquad\quad\ \ \,\text{in}\ D\times[0,T], (49a)
ρ𝒗t2μ(𝜺¯(𝒖))λ(𝒖)=𝒇inD×[0,T],\displaystyle\rho\bm{v}_{t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\lambda\nabla(\nabla\cdot\bm{u})=\bm{f}\,\quad\qquad\text{in}\ D\times[0,T], (49b)
𝒖=𝒖dinΓD×[0,T],\displaystyle\bm{u}=\bm{u}_{d}\ \ \,\quad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\text{in}\ \Gamma_{D}\times[0,T], (49c)
2μ𝜺¯(𝒖)𝒏+λ(𝒖)𝒏=𝒈inΓN×[0,T],\displaystyle 2\mu\underline{\bm{\varepsilon}}(\bm{u})\bm{n}+\lambda(\nabla\cdot\bm{u})\bm{n}=\bm{g}\ \ \quad\qquad\qquad\qquad\text{in}\ \Gamma_{N}\times[0,T], (49d)
𝒖=𝝍1inD×{0},\displaystyle\bm{u}=\bm{\psi}_{1}\ \ \quad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\,\text{in}\ D\times\{0\}, (49e)
𝒗=𝝍2inD×{0}.\displaystyle\bm{v}=\bm{\psi}_{2}\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\text{in}\ D\times\{0\}. (49f)

In the above system, \bm{u}=(u_{1},u_{2},\cdots,u_{d}) and \bm{v}=(v_{1},v_{2},\cdots,v_{d}) denote the displacement and the velocity, respectively, and [0,T] (with T>0) denotes the time domain. \underline{\bm{\varepsilon}}(\bm{u}) is the strain tensor, \underline{\bm{\varepsilon}}(\bm{u})=\frac{1}{2}(\nabla\bm{u}+\nabla\bm{u}^{T}). The constants \lambda and \mu are the first and the second Lamé parameters, respectively.

Combining (49a) and (49b), we can recover the classical linear elastodynamics equation:

ρ𝒖tt2μ(𝜺¯(𝒖))λ(𝒖)=𝒇inD×[0,T].\rho\bm{u}_{tt}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\lambda\nabla(\nabla\cdot\bm{u})=\bm{f}\qquad\text{in}\ D\times[0,T]. (50)

The well-posedness of this equation is established in Hughes1978Classical .

Lemma 5.1 (Hughes1978Classical ; Yosida1980Functional ).

Let 𝛙1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝛙2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝐟Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r1r\geq 1. Then there exists a unique solution 𝐮\bm{u} to the classical linear elastodynamic equation (50) such that 𝐮(t=0)=𝛙1\bm{u}(t=0)=\bm{\psi}_{1}, 𝐮t(t=0)=𝛙2\bm{u}_{t}(t=0)=\bm{\psi}_{2} and 𝐮Cl([0,T];Hrl(D))\bm{u}\in C^{l}([0,T];H^{r-l}(D)) with 0lr0\leq l\leq r.

Lemma 5.2.

Let kk\in\mathbb{N}, 𝛙1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝛙2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝐟Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r>d2+kr>\frac{d}{2}+k, then there exists T>0T>0 and a classical solution (𝐮,𝐯)(\bm{u},\bm{v}) to the elastodynamic equations (49) such that 𝐮(t=0)=𝛙1\bm{u}(t=0)=\bm{\psi}_{1}, 𝐮t(t=0)=𝛙2\bm{u}_{t}(t=0)=\bm{\psi}_{2}, 𝐮Ck(D×[0,T])\bm{u}\in C^{k}(D\times[0,T]) and 𝐯Ck1(D×[0,T])\bm{v}\in C^{k-1}(D\times[0,T]).

Proof.

As r>\frac{d}{2}+k, H^{r-k}(D) is a Banach algebra. By Lemma 5.1, there exist T>0 and a solution (\bm{u},\bm{v}) to the linear elastodynamic equations such that \bm{u}(t=0)=\bm{\psi}_{1}, \bm{v}(t=0)=\bm{\psi}_{2}, \bm{u}\in C^{l}([0,T];H^{r-l}(D)) for 0\leq l\leq r and \bm{v}\in C^{l}([0,T];H^{r-1-l}(D)) for 0\leq l\leq r-1.

Since \bm{u}\in\cap_{l=0}^{k}C^{l}([0,T];H^{r-l}(D)) and \bm{v}\in\cap_{l=0}^{k-1}C^{l}([0,T];H^{r-l-1}(D)), applying the Sobolev embedding theorem together with r>\frac{d}{2}+k gives H^{r-l}(D)\subset C^{k-l}(D) for 0\leq l\leq k and H^{r-l-1}(D)\subset C^{k-l-1}(D) for 0\leq l\leq k-1. Therefore, \bm{u}\in C^{k}(D\times[0,T]) and \bm{v}\in C^{k-1}(D\times[0,T]). ∎

5.2 Physics Informed Neural Networks

We now consider the PINN approximation of the linear elastodynamic equations (49). Let \Omega=D\times[0,T], \Omega_{D}=\Gamma_{D}\times[0,T] and \Omega_{N}=\Gamma_{N}\times[0,T] denote the space-time domains. Define the following residuals for the PINN approximation \bm{u}_{\theta}:\Omega\rightarrow\mathbb{R}^{d} and \bm{v}_{\theta}:\Omega\rightarrow\mathbb{R}^{d} for the elastodynamic equations (49):

𝑹int1[𝒖θ,𝒗θ](𝒙,t)=𝒖θt𝒗θ,\displaystyle\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)=\bm{u}_{\theta t}-\bm{v}_{\theta}, (51a)
𝑹int2[𝒖θ,𝒗θ](𝒙,t)=ρ𝒗θt2μ(𝜺¯(𝒖θ))λ(𝒖θ)𝒇,\displaystyle\bm{R}_{int2}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)=\rho\bm{v}_{\theta t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}_{\theta}))-\lambda\nabla(\nabla\cdot\bm{u}_{\theta})-\bm{f}, (51b)
𝑹tb1[𝒖θ](𝒙)=𝒖θ(𝒙,0)𝝍1(𝒙),\displaystyle\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x})=\bm{u}_{\theta}(\bm{x},0)-\bm{\psi}_{1}(\bm{x}), (51c)
𝑹tb2[𝒗θ](𝒙)=𝒗θ(𝒙,0)𝝍2(𝒙),\displaystyle\bm{R}_{tb2}[\bm{v}_{\theta}](\bm{x})=\bm{v}_{\theta}(\bm{x},0)-\bm{\psi}_{2}(\bm{x}), (51d)
𝑹sb1[𝒗θ](𝒙,t)=𝒗θ|ΓD𝒖dt,\displaystyle\bm{R}_{sb1}[\bm{v}_{\theta}](\bm{x},t)=\bm{v}_{\theta}|_{\Gamma_{D}}-\bm{u}_{dt}, (51e)
𝑹sb2[𝒖θ](𝒙,t)=(2μ𝜺¯(𝒖θ)𝒏+λ(𝒖θ)𝒏)|ΓN𝒈.\displaystyle\bm{R}_{sb2}[\bm{u}_{\theta}](\bm{x},t)=(2\mu\underline{\bm{\varepsilon}}(\bm{u}_{\theta})\bm{n}+\lambda(\nabla\cdot\bm{u}_{\theta})\bm{n})|_{\Gamma_{N}}-\bm{g}. (51f)

Note that for the exact solution (\bm{u},\bm{v}), we have \bm{R}_{int1}[\bm{u},\bm{v}]=\bm{R}_{int2}[\bm{u},\bm{v}]=\bm{R}_{tb1}[\bm{u}]=\bm{R}_{tb2}[\bm{v}]=\bm{R}_{sb1}[\bm{v}]=\bm{R}_{sb2}[\bm{u}]=0. With PINN we minimize the following generalization error,

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|𝑹int1[𝒖θ,𝒗θ](𝒙,t)|2d𝒙dt+Ω|𝑹int2[𝒖θ,𝒗θ](𝒙,t)|2d𝒙dt+Ω|𝜺¯(𝑹int1[𝒖θ,𝒗θ](𝒙,t))|2d𝒙dt\displaystyle=\int_{\Omega}|\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\bm{R}_{int2}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\underline{\bm{\varepsilon}}(\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t))|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+Ω|(𝑹int1[𝒖θ,𝒗θ](𝒙,t))|2d𝒙dt+D|𝑹tb1[𝒖θ](𝒙)|2d𝒙+D|𝑹tb2[𝒗θ](𝒙)|2d𝒙\displaystyle+\int_{\Omega}|\nabla\cdot(\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t))|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{D}|\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\bm{R}_{tb2}[\bm{v}_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+D|𝜺¯(𝑹tb1[𝒖θ](𝒙))|2d𝒙+D|𝑹tb1[𝒖θ](𝒙)|2d𝒙\displaystyle+\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}))|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla\cdot\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(ΩD|𝑹sb1[𝒗θ](𝒙,t)|2ds(𝒙)dt)12+(ΩN|𝑹sb2[𝒖θ](𝒙,t)|2ds(𝒙)dt)12.\displaystyle+\left(\int_{\Omega_{D}}|\bm{R}_{sb1}[\bm{v}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+\left(\int_{\Omega_{N}}|\bm{R}_{sb2}[\bm{u}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}. (52)

Let

𝒖^=𝒖θ𝒖,𝒗^=𝒗θ𝒗\hat{\bm{u}}=\bm{u}_{\theta}-\bm{u},\quad\hat{\bm{v}}=\bm{v}_{\theta}-\bm{v}

denote the difference between the solution to the elastodynamic equations (49) and the PINN approximation with parameter θ\theta. We define the total error of the PINN approximation as,

(θ)2=Ω(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dt.\mathcal{E}(\theta)^{2}=\int_{\Omega}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t. (53)

We choose the training set 𝒮D¯×[0,T]\mathcal{S}\subset\overline{D}\times[0,T] based on suitable quadrature points. The full training set is defined by 𝒮=𝒮int𝒮sb𝒮tb\mathcal{S}=\mathcal{S}_{int}\cup\mathcal{S}_{sb}\cup\mathcal{S}_{tb}, and 𝒮sb=𝒮sb1𝒮sb2\mathcal{S}_{sb}=\mathcal{S}_{sb1}\cup\mathcal{S}_{sb2}:

  • Interior training points 𝒮int={zn}\mathcal{S}_{int}=\{{z}_{n}\} for 1nNint1\leq n\leq N_{int}, with each zn=(𝒙,t)nD×(0,T){z}_{n}=(\bm{x},t)_{n}\in D\times(0,T).

  • Spatial boundary training points 𝒮sb1={zn}\mathcal{S}_{sb1}=\{{z}_{n}\} for 1nNsb11\leq n\leq N_{sb1}, with each zn=(𝒙,t)nΓD×(0,T){z}_{n}=(\bm{x},t)_{n}\in\Gamma_{D}\times(0,T), and 𝒮sb2={zn}\mathcal{S}_{sb2}=\{{z}_{n}\} for 1nNsb21\leq n\leq N_{sb2}, with each zn=(𝒙,t)nΓN×(0,T){z}_{n}=(\bm{x},t)_{n}\in\Gamma_{N}\times(0,T).

  • Temporal boundary training points 𝒮tb={𝒙n}\mathcal{S}_{tb}=\{\bm{x}_{n}\} for 1nNtb1\leq n\leq N_{tb} with each 𝒙nD\bm{x}_{n}\in D.

Then, the integrals in (5.2) can be approximated by a suitable numerical quadrature, resulting in the following training loss,

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2+Tint4(θ,𝒮int)2+Ttb1(θ,𝒮tb)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int4}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}
+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Ttb4(θ,𝒮tb)2+Tsb1(θ,𝒮sb1)+Tsb2(θ,𝒮sb2),\displaystyle\quad+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb4}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb1})+\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb2}), (54)

where,

Tint1(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝑹int1[𝒖θ,𝒗θ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (55a)
Tint2(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝑹int2[𝒖θ,𝒗θ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\bm{R}_{int2}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (55b)
Tint3(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝜺¯(𝑹int1[𝒖θ,𝒗θ](𝒙intn,tintn))|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\underline{\bm{\varepsilon}}(\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n}))|^{2}, (55c)
Tint4(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int4}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝑹int1[𝒖θ,𝒗θ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla\cdot\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (55d)
Ttb1(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝑹tb1[𝒖θ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}_{tb}^{n})|^{2}, (55e)
Ttb2(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝑹tb2[𝒗θ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\bm{R}_{tb2}[\bm{v}_{\theta}](\bm{x}_{tb}^{n})|^{2}, (55f)
Ttb3(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝜺¯(𝑹tb1[𝒖θ](𝒙tbn))|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}_{tb}^{n}))|^{2}, (55g)
Ttb4(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb4}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝑹tb1[𝒖θ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla\cdot\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}_{tb}^{n})|^{2}, (55h)
Tsb1(θ,𝒮sb1)2\displaystyle\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb1})^{2} =n=1Nsb1ωsb1n|𝑹sb1[𝒗θ](𝒙sbn,tsbn)|2,\displaystyle=\sum_{n=1}^{N_{sb1}}\omega_{sb1}^{n}|\bm{R}_{sb1}[\bm{v}_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}, (55i)
Tsb2(θ,𝒮sb2)2\displaystyle\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb2})^{2} =n=1Nsb2ωsb2n|𝑹sb2[𝒖θ](𝒙sbn,tsbn)|2.\displaystyle=\sum_{n=1}^{N_{sb2}}\omega_{sb2}^{n}|\bm{R}_{sb2}[\bm{u}_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (55j)

Here the quadrature points in space-time constitute the data sets 𝒮int={(𝒙intn,tintn)}n=1Nint\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}, 𝒮tb={𝒙tbn)}n=1Ntb\mathcal{S}_{tb}=\{\bm{x}_{tb}^{n})\}_{n=1}^{N_{tb}}, 𝒮sb1={(𝒙sb1n,tsb1n)}n=1Nsb1\mathcal{S}_{sb1}=\{(\bm{x}_{sb1}^{n},t_{sb1}^{n})\}_{n=1}^{N_{sb1}} and 𝒮sb2={(𝒙sb2n,tsb2n)}n=1Nsb2\mathcal{S}_{sb2}=\{(\bm{x}_{sb2}^{n},t_{sb2}^{n})\}_{n=1}^{N_{sb2}}. ωn\omega_{\star}^{n} denote the suitable quadrature weights with \star being intint, tbtb, sb1sb1 and sb2sb2.
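As a concrete illustration of how the interior residuals (51a)-(51b) could be computed in two spatial dimensions with automatic differentiation, a minimal PyTorch sketch is given below. It uses the identity 2\mu\nabla\cdot\underline{\bm{\varepsilon}}(\bm{u})=\mu(\Delta\bm{u}+\nabla(\nabla\cdot\bm{u})); the network net (mapping (x,y,t) to (u_{1},u_{2},v_{1},v_{2})) and the body-force callable f are assumptions for illustration, not the authors' implementation.

```python
import torch

def elastodynamic_interior_residuals(net, xyt, f, rho=1.0, mu=1.0, lam=1.0):
    """Interior residuals (51a)-(51b) in 2D; uses 2*mu*div(eps(u)) = mu*(Lap u + grad(div u))."""
    xyt = xyt.clone().requires_grad_(True)          # columns: x, y, t
    out = net(xyt)
    u, v = out[:, 0:2], out[:, 2:4]                 # displacement and velocity components

    def grad(y):  # gradient of a scalar field w.r.t. (x, y, t)
        return torch.autograd.grad(y, xyt, torch.ones_like(y), create_graph=True)[0]

    du = [grad(u[:, i:i+1]) for i in range(2)]      # du[i][:, j] = d u_i / d(x, y, t)_j
    dv = [grad(v[:, i:i+1]) for i in range(2)]
    u_t = torch.stack([du[i][:, 2] for i in range(2)], dim=1)
    v_t = torch.stack([dv[i][:, 2] for i in range(2)], dim=1)

    R_int1 = u_t - v                                # residual (51a): u_t - v

    lap_u = torch.stack(
        [sum(grad(du[i][:, j:j+1])[:, j] for j in range(2)) for i in range(2)], dim=1)
    div_u = du[0][:, 0:1] + du[1][:, 1:2]
    grad_div_u = grad(div_u)[:, 0:2]
    # residual (51b): rho*v_t - 2*mu*div(eps(u)) - lam*grad(div u) - f
    R_int2 = rho * v_t - mu * lap_u - (mu + lam) * grad_div_u - f(xyt)
    return R_int1, R_int2
```

The boundary and initial residuals (51c)-(51f), and the quadrature-weighted loss terms (55a)-(55j), would be built from these quantities in the same way as in the scalar case.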

5.3 Error Analysis

Subtracting the elastodynamic equations (49) from the residual equations (51), we obtain

𝑹int1=𝒖^t𝒗^,\displaystyle\bm{R}_{int1}=\hat{\bm{u}}_{t}-\hat{\bm{v}}, (56a)
𝑹int2=ρ𝒗^t2μ(𝜺¯(𝒖^))λ(𝒖^),\displaystyle\bm{R}_{int2}=\rho\hat{\bm{v}}_{t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\hat{\bm{u}}))-\lambda\nabla(\nabla\cdot\hat{\bm{u}}), (56b)
𝑹tb1=𝒖^|t=0,\displaystyle\bm{R}_{tb1}=\hat{\bm{u}}|_{t=0}, (56c)
𝑹tb2=𝒗^|t=0,\displaystyle\bm{R}_{tb2}=\hat{\bm{v}}|_{t=0}, (56d)
𝑹sb1=𝒗^|ΓD,\displaystyle\bm{R}_{sb1}=\hat{\bm{v}}|_{\Gamma_{D}}, (56e)
𝑹sb2=(2μ𝜺¯(𝒖^)𝒏+λ(𝒖^)𝒏)|ΓN.\displaystyle\bm{R}_{sb2}=(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})|_{\Gamma_{N}}. (56f)

The PINN approximation results are summarized in the following three theorems. The proofs of these theorems are provided in the Appendix 8.4.

Theorem 5.3.

Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Let 𝛙1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝛙2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝐟Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r>d2+kr>\frac{d}{2}+k. For every integer N>5N>5, there exist tanh\tanh neural networks (𝐮j)θ(\bm{u}_{j})_{\theta} and (𝐯j)θ(\bm{v}_{j})_{\theta}, with j=1,2,,dj=1,2,\cdots,d, each with two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

𝑹int1L2(Ω),𝑹tb1L2(Ω)lnNNk+1,\displaystyle\|\bm{R}_{int1}\|_{L^{2}(\Omega)},\|\bm{R}_{tb1}\|_{L^{2}(\Omega)}\lesssim{\rm ln}NN^{-k+1}, (57a)
𝑹int2L2(Ω),𝜺¯(𝑹int1)L2(Ω),𝑹int1L2(Ω)ln2NNk+2,\displaystyle\|\bm{R}_{int2}\|_{L^{2}(\Omega)},\|\underline{\bm{\varepsilon}}(\bm{R}_{int1})\|_{L^{2}(\Omega)},\|\nabla\cdot\bm{R}_{int1}\|_{L^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}, (57b)
𝜺¯(𝑹tb1)L2(D),𝑹tb1L2(D),𝑹sb2L2(ΓN×[0,t])ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})\|_{L^{2}(D)},\|\nabla\cdot\bm{R}_{tb1}\|_{L^{2}(D)},\|\bm{R}_{sb2}\|_{L^{2}(\Gamma_{N}\times[0,t])}\lesssim{\rm ln}^{2}NN^{-k+2}, (57c)
𝑹tb2L2(D),𝑹sb1L2(ΓD×[0,t])lnNNk+2.\displaystyle\|\bm{R}_{tb2}\|_{L^{2}(D)},\|\bm{R}_{sb1}\|_{L^{2}(\Gamma_{D}\times[0,t])}\lesssim{\rm ln}NN^{-k+2}. (57d)

It follows from Theorem 5.3 that, by choosing a sufficiently large NN, one can make the PINN residuals in (51), and thus the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2} in (5.2), arbitrarily small.

Theorem 5.4.

Let dd\in\mathbb{N}, 𝐮C1(Ω)\bm{u}\in C^{1}(\Omega) and 𝐯C(Ω)\bm{v}\in C(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝐮θ,𝐯θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCGTexp((2+2μ+λ)T),\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+2\mu+\lambda)T\right),

where

CG=D|𝑹tb1|2d𝒙+D2μ|𝜺¯(𝑹tb1)|2d𝒙+Dλ|𝑹tb1|2d𝒙+ρD|𝑹tb2|2d𝒙\displaystyle C_{G}=\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}
+0TD(|𝑹int1|2+2μ|𝜺¯(𝑹int1)|2+λ|𝑹int1|2+|𝑹int2|2)d𝒙dt\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|\bm{R}_{int1}|^{2}+2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}+\lambda|\nabla\cdot\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2|T|12CΓD(0TΓD|𝑹sb1|2ds(𝒙)dt)12+2|T|12CΓN(0TΓN|𝑹sb2|2ds(𝒙)dt)12,\displaystyle\qquad+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(\int_{0}^{T}\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}},

with CΓD=(2μ+λ)|ΓD|12𝐮C1(ΓD×[0,T])+(2μ+λ)|ΓD|12𝐮θC1(ΓD×[0,T])C_{\Gamma_{D}}=(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}\|\bm{u}\|_{C^{1}(\Gamma_{D}\times[0,T])}+(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}||\bm{u}_{\theta}||_{C^{1}(\Gamma_{D}\times[0,T])} and CΓN=|ΓN|12(𝐯C(ΓN×[0,T])+𝐯θC(ΓN×[0,T]))C_{\Gamma_{N}}=|\Gamma_{N}|^{\frac{1}{2}}(\|\bm{v}\|_{C(\Gamma_{N}\times[0,T])}+||\bm{v}_{\theta}||_{C(\Gamma_{N}\times[0,T])}).

Theorem 5.4 shows that the total error of the PINN approximation (θ)2\mathcal{E}(\theta)^{2} can be controlled by the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2}.

Theorem 5.5.

Let dd\in\mathbb{N}, 𝐮C4(Ω)\bm{u}\in C^{4}(\Omega) and 𝐯C3(Ω)\bm{v}\in C^{3}(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝐮θ,𝐯θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCTTexp((2+2μ+λ)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+2\mu+\lambda)T\right)
=𝒪(T(θ)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta)^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}), (58)

where

CT=\displaystyle C_{T}= C(𝑹tb12)Mtb2d+𝒬MtbD(𝑹tb12)+ρ(C(𝑹tb22)Mtb2d+𝒬MtbD(𝑹tb22))+2μ(C(|𝜺¯(𝑹tb1)|2)Mtb2d+𝒬MtbD(|𝜺¯(𝑹tb1)|2))\displaystyle C_{({\bm{R}_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})+\rho\left(C_{({\bm{R}_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})\right)+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})\right)
+λ(C(|𝑹tb1|2)Mtb2d+𝒬MtbD(|𝑹tb1|2))+C(𝑹int12)Mint2d+1+𝒬MintΩ(𝑹int12)\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})\right)+C_{({\bm{R}_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})
+C(𝑹int22)Mint2d+1+𝒬MintΩ(𝑹int22)+2μ(C(|𝜺¯(𝑹int1)|2)Mint2d+1+𝒬MintΩ(|𝜺¯(𝑹int1)|2))\displaystyle+C_{({\bm{R}_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})\right)
+λ(C(|𝑹int1|2)Mint2d+1+𝒬MintΩ(|𝑹int1|2))+2|T|12CΓD(C(𝑹sb12)Msb12d+𝒬Msb1ΩD(𝑹sb12))12\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})\right)+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(C_{({\bm{R}_{sb1}^{2}})}M_{sb1}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})\right)^{\frac{1}{2}}
+2|T|12CΓN(C(𝑹sb22)Msb22d+𝒬Msb2ΩN(𝑹sb22))12.\displaystyle+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(C_{({\bm{R}_{sb2}^{2}})}M_{sb2}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})\right)^{\frac{1}{2}}.

Theorem 5.5 shows that the PINN approximation error (θ)2\mathcal{E}(\theta)^{2} can be controlled by the training error T(θ,𝒮)2\mathcal{E}_{T}(\theta,\mathcal{S})^{2} with a large enough sample set 𝒮\mathcal{S}.

6 Numerical Examples

The theoretical analyses from Sections 3 to 5 suggest several forms for the PINN loss function with the wave, Sine-Gordon and the linear elastodynamic equations. These forms contain certain non-standard terms, such as boundary residual terms that enter under a square root (rather than being squared) and gradient terms of some residuals in the interior and on the initial boundary, which would generally be absent from the canonical PINN formulation of the loss function. The presence of such non-standard terms is crucial to bounding the PINN approximation errors, as shown in the error analyses.

These non-standard forms of the loss function lead to a variant PINN algorithm. In this section we illustrate the performance of the variant PINN algorithm as suggested by the theoretical analysis, as well as the more standard PINN algorithm, using several numerical examples in one spatial dimension (1D) plus time for the wave equation and the Sine-Gordon equation, and in two spatial dimensions (2D) plus time for the linear elastodynamic equation.

The following settings are common to all the numerical simulations in this section. Let (\bm{x},t)\in D\times[0,T] denote the spatial and temporal coordinates in the spatial-temporal domain, where \bm{x}=x and \bm{x}=(x,y) for one and two spatial dimensions, respectively. For the wave equation and the Sine-Gordon equation, the neural networks contain two nodes in the input layer (representing x and t), two hidden layers with the number of nodes to be specified later, and two nodes in the output layer (representing the solution u and its time derivative v=\frac{\partial u}{\partial t}). For the linear elastodynamic equation, three input nodes and four output nodes are employed in the neural network, as will be explained in more detail later. We employ the \tanh (hyperbolic tangent) activation function for all the hidden nodes, and no activation function is applied to the output nodes (i.e. they are linear). For training the neural networks, we employ N collocation points within the spatial-temporal domain drawn from a uniform random distribution, and also N uniform random points on each spatial boundary and on the initial boundary. In the simulations the value of N is varied systematically among 1000, 1500, 2000, 2500 and 3000. After the neural networks are trained, for the wave equation and the Sine-Gordon equation, we compare the PINN solution and the exact solution on a set of N_{ev}=3000\times 3000 uniform spatial-temporal grid points (evaluation points) (x,t)_{n}\in D\times[0,T] (n=1,\cdots,N_{ev}) that covers the problem domain and the boundaries. For the elastodynamic equation, we compare the PINN solution and the exact solution at different time instants; at each time instant the solutions are evaluated on a uniform set of N_{ev}=1500\times 1500 grid points in the spatial domain, \bm{x}_{n}=(x,y)_{n}\in D (n=1,\cdots,N_{ev}).
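For reference, a minimal PyTorch sketch of the network architecture described above (two tanh hidden layers and a linear output layer) is given below; the hidden-layer widths shown are the ones adopted later for the wave equation (90 and 60 nodes) and are otherwise placeholders.

```python
import torch

class PINN(torch.nn.Module):
    """Feed-forward network with two tanh hidden layers and a linear output layer.
    For the wave/Sine-Gordon tests: 2 inputs (x, t) and 2 outputs (u, v)."""
    def __init__(self, n_in=2, hidden=(90, 60), n_out=2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_in, hidden[0]), torch.nn.Tanh(),
            torch.nn.Linear(hidden[0], hidden[1]), torch.nn.Tanh(),
            torch.nn.Linear(hidden[1], n_out),   # no activation on the output nodes
        )

    def forward(self, xt):
        return self.net(xt)
```

For the elastodynamic example one would instead set n_in=3 (for x, y, t) and n_out=4 (the two displacement and two velocity components).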

The PINN errors reported below are computed as follows. Let z_{n}=(\bm{x},t)_{n} ((\bm{x},t)_{n}\in D\times[0,T], n=1,\cdots,N_{ev}) denote the set of uniform grid points, where N_{ev} denotes the number of evaluation points. The errors of PINN are defined by,

l2-error=n=1Nev|u(zn)uθ(zn)|2n=1Nevu(zn)2=(n=1Nev|u(zn)uθ(zn)|2)/Nev(n=1Nevu(zn)2)/Nev,\displaystyle l_{2}\text{-error}=\frac{\sqrt{\sum_{n=1}^{N_{ev}}|u(z_{n})-u_{\theta}(z_{n})|^{2}}}{\sqrt{\sum_{n=1}^{N_{ev}}u(z_{n})^{2}}}=\frac{\sqrt{\left(\sum_{n=1}^{N_{ev}}|u(z_{n})-u_{\theta}(z_{n})|^{2}\right)/N_{ev}}}{\sqrt{\left(\sum_{n=1}^{N_{ev}}u(z_{n})^{2}\right)/N_{ev}}}, (59a)
l-error=max{|u(zn)uθ(zn)|}n=1Nev(n=1Nevu(zn)2)/Nev,\displaystyle l_{\infty}\text{-error}=\frac{\max\{|u(z_{n})-u_{\theta}(z_{n})|\}_{n=1}^{N_{ev}}}{\sqrt{\left(\sum_{n=1}^{N_{ev}}u(z_{n})^{2}\right)/N_{ev}}}, (59b)

where uθu_{\theta} denotes the PINN solution and uu denotes the exact solution.
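A minimal NumPy sketch of the error metrics (59a)-(59b), assuming the exact and PINN solutions have already been evaluated on the N_{ev} grid points:

```python
import numpy as np

def pinn_errors(u_exact, u_pinn):
    """Relative l2 and l_inf errors on the evaluation grid, cf. (59a)-(59b)."""
    diff = u_exact - u_pinn
    rms_exact = np.sqrt(np.mean(u_exact**2))        # sqrt( (sum u^2) / N_ev )
    l2_error = np.sqrt(np.mean(diff**2)) / rms_exact
    linf_error = np.max(np.abs(diff)) / rms_exact
    return l2_error, linf_error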

Our implementation of the PINN algorithm is based on the PyTorch library (pytorch.org). In all the following numerical examples, we combine the Adam kingma2014adam optimizer and the L-BFGS 2006_NumericalOptimization optimizer (in batch mode) to train the neural network. We first employ the Adam optimizer to train the network for 100 epochs/iterations, and then employ the L-BFGS optimizer to continue the network training for another 30000 iterations. We employ the default parameter values in Adam, with the learning rate 0.0010.001, β1=0.9\beta_{1}=0.9 and β2=0.99\beta_{2}=0.99. The initial learning rate 1.01.0 is adopted in the L-BFGS optimizer.
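The two-stage training procedure described above could be set up along the following lines. This is a sketch under the stated settings; the loss callable compute_loss, the L-BFGS history size and the line-search choice are our assumptions and are not specified in the text.

```python
import torch

def train(model, compute_loss, adam_epochs=100, lbfgs_iters=30000):
    # Stage 1: Adam with learning rate 0.001, beta1 = 0.9, beta2 = 0.99
    adam = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.99))
    for _ in range(adam_epochs):
        adam.zero_grad()
        loss = compute_loss(model)
        loss.backward()
        adam.step()

    # Stage 2: L-BFGS in batch mode with initial learning rate 1.0
    lbfgs = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=lbfgs_iters,
                              history_size=50, line_search_fn="strong_wolfe")
    def closure():
        lbfgs.zero_grad()
        loss = compute_loss(model)
        loss.backward()
        return loss
    lbfgs.step(closure)
    return model
```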

6.1 Wave Equation

   

Figure 1: Wave equation: Distributions of the true solutions, the PINN solutions and the PINN point-wise absolute errors for u and v in the spatial-temporal domain: (a) true solution for u, (b) PINN solution u_{\theta}, (c) PINN absolute error for u, (d) true solution for v, (e) PINN solution v_{\theta}, (f) PINN absolute error for v. N=2000 training points within the domain and on each of the domain boundaries.

We next test the PINN algorithm for solving the wave equation (19) in one spatial dimension (plus time), under a configuration in accordance with that of 2021_JCP_Dong_modifiedbatch . Consider the spatial-temporal domain, (x,t)D×[0,T]=[0,5]×[0,2](x,t)\in D\times[0,T]=[0,5]\times[0,2], and the initial-boundary value problem with the wave equation on this domain,

2ut2c22ux2=0,\displaystyle\frac{\partial^{2}u}{\partial t^{2}}-c^{2}\frac{\partial^{2}u}{\partial x^{2}}=0, (60a)
u(0,t)=u(5,t),ux(0,t)=ux(5,t),\displaystyle u(0,t)=u(5,t),\qquad\frac{\partial u}{\partial x}(0,t)=\frac{\partial u}{\partial x}(5,t), (60b)
u(x,0)=2sech3(3δ0(xx0)),ut(x,0)=0,\displaystyle u(x,0)=2\,{\rm sech}^{3}\left(\frac{3}{\delta_{0}}(x-x_{0})\right),\qquad\frac{\partial u}{\partial t}(x,0)=0, (60c)

where u(x,t)u(x,t) is the wave field to be solved for, cc is the wave speed, x0x_{0} is the initial peak location of the wave, δ0\delta_{0} is a constant that controls the width of the wave profile, and the periodic boundary conditions are imposed on x=0x=0 and 55. In the simulations, we employ c=2c=2, δ0=2\delta_{0}=2, and x0=3x_{0}=3. Then the above problem has the solution,

{u(x,t)=sech3(3δ0(2.5+ξ))+sech3(3δ0(2.5+η)),ξ=mod(xx0+ct+2.5,5),η=mod(xx0ct+2.5,5),\left\{\begin{split}&u(x,t)={\rm sech}^{3}\left(\frac{3}{\delta_{0}}\left(-2.5+\xi\right)\right)+{\rm sech}^{3}\left(\frac{3}{\delta_{0}}\left(-2.5+\eta\right)\right),\\ &\xi={\rm mod}\left(x-x_{0}+ct+2.5,5\right),\quad\eta={\rm mod}\left(x-x_{0}-ct+2.5,5\right),\end{split}\right.

where mod refers to the modulo operation. The two terms in u(x,t)u(x,t) represent the leftward- and rightward-traveling waves, respectively.
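For completeness, the exact solution above can be evaluated directly, e.g. with the following NumPy sketch (using L=5 for the domain length and the parameter values c=2, \delta_{0}=2, x_{0}=3 adopted in the simulations):

```python
import numpy as np

def exact_wave_solution(x, t, c=2.0, delta0=2.0, x0=3.0, L=5.0):
    """Exact solution of (60): two counter-propagating sech^3 pulses on a periodic domain."""
    xi = np.mod(x - x0 + c * t + L / 2, L)    # first (leftward-traveling) component
    eta = np.mod(x - x0 - c * t + L / 2, L)   # second (rightward-traveling) component
    sech = lambda z: 1.0 / np.cosh(z)
    return (sech(3.0 / delta0 * (xi - L / 2))**3
            + sech(3.0 / delta0 * (eta - L / 2))**3)
```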

We reformulate the problem (60) into the following system,

utv=0,vtc2uxx=0,\displaystyle u_{t}-v=0,\qquad v_{t}-c^{2}u_{xx}=0, (61a)
u(0,t)=u(5,t),ux(0,t)=ux(5,t),\displaystyle u(0,t)=u(5,t),\qquad u_{x}(0,t)=u_{x}(5,t), (61b)
u(x,0)=2sech3(3δ0(xx0)),v(x,0)=0,\displaystyle u(x,0)=2\,{\rm sech}^{3}\left(\frac{3}{\delta_{0}}(x-x_{0})\right),\qquad v(x,0)=0, (61c)

where v(x,t)v(x,t) is an auxiliary field given by the first equation in (61a).

To solve the system (61) with PINN, we employ 90 and 60 neurons in the first and the second hidden layers of the neural network, respectively. In light of (3.2), we employ the following loss function in PINN,

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W4Nn=1N[uθ(xtbn,0)2sech3(3δ0(xtbnx0))]2\displaystyle+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-2\,{\rm sech}^{3}\left(\frac{3}{\delta}_{0}(x_{tb}^{n}-x_{0})\right)\right]^{2}
+W5Nn=1N[vθ(xtbn,0)]2+W6Nn=1N[uθx(xtbn,0)+18sinh((3xtbn3x0)/δ0)δ0cosh4((3xtbn3x0)/δ0)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)+\frac{18\sinh((3x_{tb}^{n}-3x_{0})/\delta_{0})}{\delta_{0}\cosh^{4}((3x_{tb}^{n}-3x_{0})/\delta_{0})}\right]^{2}
+W7Nn=1N[vθ(0,tsbn)vθ(5,tsbn)]2+W8Nn=1N[uθx(0,tsbn)uθx(5,tsbn)]2.\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[v_{\theta}(0,t_{sb}^{n})-v_{\theta}(5,t_{sb}^{n})\right]^{2}+\frac{W_{8}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(0,t_{sb}^{n})-u_{\theta x}(5,t_{sb}^{n})\right]^{2}. (62)

Note that in the simulations we have employed the same number of collocation points (N) within the domain and on each of the domain boundaries. The above loss function differs slightly from the one in the error analysis (3.2) in several aspects. First, we have added a set of penalty coefficients W_{n}>0 (1\leq n\leq 8) for the different loss terms in the numerical simulations. Second, the collocation points used in the simulations (e.g. x_{int}^{n}, t_{int}^{n}, x_{sb}^{n}, t_{sb}^{n}, x_{tb}^{n}) are generated randomly within the domain or on the domain boundaries from a uniform distribution. In addition, the averaging used here does not exactly correspond to the numerical quadrature rule (mid-point rule) used in the theoretical analysis.
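To illustrate, a PyTorch sketch of assembling the loss form (62) with autograd is given below; the network net follows the architecture sketched earlier, and the collocation tensors are assumed to be column vectors of uniformly sampled random points. The factor c^{2} in the second residual follows the reformulated system (61a). This is an illustrative sketch, not the authors' code.

```python
import torch

def pinn_f1_loss(net, x_int, t_int, x_tb, t_sb, W, c=2.0, delta0=2.0, x0=3.0):
    """Loss form #1 in (62) for the wave system (61); W holds the penalty coefficients W_1..W_8."""
    def fields(x, t):
        x = x.clone().requires_grad_(True)
        t = t.clone().requires_grad_(True)
        out = net(torch.cat([x, t], dim=1))
        u, v = out[:, 0:1], out[:, 1:2]
        g = lambda y, z: torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]
        u_t, u_x, v_t, v_x = g(u, t), g(u, x), g(v, t), g(v, x)
        return u, v, u_t, u_x, g(u_x, x), g(u_t, x), v_t, v_x  # includes u_xx and u_tx

    mse = lambda r: (r**2).mean()

    # interior residuals of (61a) and the spatial gradient of the first residual
    u, v, u_t, u_x, u_xx, u_tx, v_t, v_x = fields(x_int, t_int)
    loss = W[0] * mse(u_t - v) + W[1] * mse(v_t - c**2 * u_xx) + W[2] * mse(u_tx - v_x)

    # initial conditions at t = 0 (value, velocity, and x-derivative of the value)
    u0, v0, _, u0_x, _, _, _, _ = fields(x_tb, torch.zeros_like(x_tb))
    s = 3.0 / delta0 * (x_tb - x0)
    loss += W[3] * mse(u0 - 2.0 / torch.cosh(s)**3)
    loss += W[4] * mse(v0)
    loss += W[5] * mse(u0_x + 18.0 * torch.sinh(s) / (delta0 * torch.cosh(s)**4))

    # periodic boundary conditions at x = 0 and x = 5
    _, v_l, _, ux_l, _, _, _, _ = fields(torch.zeros_like(t_sb), t_sb)
    _, v_r, _, ux_r, _, _, _, _ = fields(torch.full_like(t_sb, 5.0), t_sb)
    loss += W[6] * mse(v_l - v_r) + W[7] * mse(ux_l - ux_r)
    return loss
```

The loss form #2 in (63) would differ only in the last two terms, which would use mean absolute values instead of mean squares.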

We have also considered another form (given below) for the loss function, as suggested by an alternate analysis as discussed in Remark 3.5 (see equation (3.5)),

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W4Nn=1N[uθ(xtbn,0)2sech3(3δ0(xtbnx0))]2\displaystyle+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-2\,{\rm sech}^{3}\left(\frac{3}{\delta}_{0}(x_{tb}^{n}-x_{0})\right)\right]^{2}
+W5Nn=1N[vθ(xtbn,0)]2+W6Nn=1N[uθx(xtbn,0)+18sinh((3xtbn3x0)/δ0)δ0cosh4((3xtbn3x0)/δ0)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)+\frac{18\sinh((3x_{tb}^{n}-3x_{0})/\delta_{0})}{\delta_{0}\cosh^{4}((3x_{tb}^{n}-3x_{0})/\delta_{0})}\right]^{2}
+W7Nn=1N|vθ(0,tsbn)vθ(5,tsbn)|+W8Nn=1N|uθx(0,tsbn)uθx(5,tsbn)|.\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left|v_{\theta}(0,t_{sb}^{n})-v_{\theta}(5,t_{sb}^{n})\right|+\frac{W_{8}}{N}\sum_{n=1}^{N}\left|u_{\theta x}(0,t_{sb}^{n})-u_{\theta x}(5,t_{sb}^{n})\right|. (63)

The difference between this form and the form (62) lies in the last two terms, which are not squared here.

The loss function (62) will be referred to as the loss form #1 in subsequent discussions, and (63) will be referred to as the loss form #2. The PINN schemes that employ these two different loss forms will be referred to as PINN-F1 and PINN-F2, respectively.

Figure 1 shows distributions of the exact solutions, the PINN solutions, and the PINN point-wise absolute errors for uu and vv in the spatial-temporal domain. Here the PINN solution is computed by PINN-F1, in which penalty coefficients are given by 𝑾=(W1,,W8)=(0.8,0.8,0.8,0.5,0.5,0.5,0.9,0.9)\bm{W}=(W_{1},\dots,W_{8})=(0.8,0.8,0.8,0.5,0.5,0.5,0.9,0.9). One can observe that the method has captured the wave fields for uu and ut\frac{\partial u}{\partial t} reasonably well, with the error for uu notably smaller than that of ut\frac{\partial u}{\partial t}.

Figure 2: Wave equation: Comparison of profiles of u (top row) and its absolute error (bottom row) between the PINN solutions (loss forms #1 and #2) and the exact solution at time instants (a) t=0.5, (b) t=1.0, and (c) t=1.5. N=2000 training data points within the domain and on each of the domain boundaries (x=0 and 5, and t=0).
Figure 3: Wave equation: Comparison of the profiles of v=\frac{\partial u}{\partial t} (top row) and its absolute error (bottom row) between the PINN solutions (loss forms #1 and #2) and the exact solution at time instants (a) t=0.5, (b) t=1.0, and (c) t=1.5. N=2000 training data points within the domain and on each of the domain boundaries (x=0 and 5, and t=0).
Figure 4: Wave equation: Histories of the loss function versus the training iteration with (a) PINN-F1 and (b) PINN-F2, corresponding to different numbers of training data points (N).

Figures 2 and 3 provide a comparison of the solutions obtained using the two forms of loss functions. Figure 2 compares profiles of the PINN-F1 and PINN-F2 solutions, and the exact solution, for u (top row) at three time instants (t=0.5, 1.0, and 1.5), as well as the error profiles (bottom row). Figure 3 shows the corresponding results for the field variable v=\frac{\partial u}{\partial t}. These results are obtained by using N=2000 training data points in the domain and on each of the domain boundaries. It is observed that both PINN schemes, with the loss functions given by (62) and (63) respectively, have captured the solution reasonably well. We further observe that the PINN-F1 scheme (with the loss form (62)) produces notably more accurate results than PINN-F2 (with the loss form (63)), especially for the field \frac{\partial u}{\partial t}.

We have varied the number of training data points N systematically and studied its effect on the PINN results. Figure 4 shows the loss histories of PINN-F1 and PINN-F2 corresponding to different numbers of training data points (N) in the simulations, with a total of 30,000 training iterations. We can make two observations. First, the history curves with the loss function form #1 are generally smoother, indicating that the loss function decreases almost monotonically as the training progresses. On the other hand, significant fluctuations in the loss history can be observed with the form #2. Second, the eventual loss values produced by the loss form #1 are significantly smaller, by over an order of magnitude, than those produced by the loss form #2.

Table 1 provides a further comparison between PINN-F1 and PINN-F2. Here the l_{2} and l_{\infty} errors of u and v computed by PINN-F1 and PINN-F2 are listed for different numbers of training data points (N). There appears to be a general trend that the errors tend to decrease with an increasing number of training points, but the decrease is not monotonic. It can be observed that the u errors are notably smaller than those for v=\frac{\partial u}{\partial t}, as observed earlier in e.g. Figure 1. One can again observe that the PINN-F1 results are notably more accurate than those of PINN-F2 for the wave equation.

Table 1: Wave equation: The u and v errors versus the number of training data points N.

method | N | l_2-error (u_\theta) | l_2-error (v_\theta) | l_\infty-error (u_\theta) | l_\infty-error (v_\theta)
PINN-F1 | 1000 | 5.7013e-03 | 1.3531e-02 | 1.8821e-02 | 4.6631e-02
PINN-F1 | 1500 | 2.1689e-03 | 4.1035e-03 | 6.7631e-03 | 1.5109e-02
PINN-F1 | 2000 | 4.6896e-03 | 9.6417e-03 | 1.3828e-02 | 3.3063e-02
PINN-F1 | 2500 | 3.7879e-03 | 9.8574e-03 | 1.2868e-02 | 3.3622e-02
PINN-F1 | 3000 | 2.6588e-03 | 6.0746e-03 | 8.1457e-03 | 1.9860e-02
PINN-F2 | 1000 | 4.7281e-02 | 9.2431e-02 | 1.4367e-01 | 3.2764e-01
PINN-F2 | 1500 | 4.9087e-02 | 1.2438e-01 | 2.1525e-01 | 5.0601e-01
PINN-F2 | 2000 | 1.8554e-02 | 4.9224e-02 | 6.0780e-02 | 1.6358e-01
PINN-F2 | 2500 | 2.3526e-02 | 5.4266e-02 | 9.8690e-02 | 1.9467e-01
PINN-F2 | 3000 | 1.4164e-02 | 3.7796e-02 | 5.3045e-02 | 1.4179e-01
Figure 5: Wave equation: The l^{2} errors of u, \frac{\partial u}{\partial t}, and \frac{\partial u}{\partial x} as a function of the training loss value, for (a) PINN-F1 and (b) PINN-F2. N=2000 training data points.

Theorem 3.6 suggests the solution errors for uu, v=utv=\frac{\partial u}{\partial t}, and u\nabla u approximately scale as the square root of the training loss function. Figure 5 provides some numerical evidence for this point. Here we plot the l2l^{2} errors for uu, ut\frac{\partial u}{\partial t} and ux\frac{\partial u}{\partial x} from our simulations as a function of the training loss value for PINN-F1 and PINN-F2 in logarithmic scales. It is evident that for PINN-F1 the scaling essentially follows the square root relation. For PINN-F2 the relation between the error and the training loss appears to scale with a power somewhat larger than 12\frac{1}{2}.

6.2 Sine-Gordon Equation

(a) True solution for uu; (b) PINN solution for uu; (c) Solution error for uu; (d) True solution for vv; (e) PINN solution for vv; (f) Solution error for vv
Figure 6: Sine-Gordon equation: Distributions of the exact solution (left column), the PINN solution (middle column) and the PINN absolute error (right column) for uu (top row) and for v=utv=\frac{\partial u}{\partial t} (bottom row). N=2000N=2000 collocation points within the domain and on the domain boundaries.

In this subsection we test the PINN algorithm suggested by the theoretical analysis on the Sine-Gordon equation (38). Consider the spatial-temporal domain (x,t)Ω=D×[0,T]=[0,1]×[0,2](x,t)\in\Omega=D\times[0,T]=[0,1]\times[0,2], and the following initial/boundary value problem on this domain,

2ut22ux2+u+sin(u)=f(x,t),\displaystyle\frac{\partial^{2}u}{\partial t^{2}}-\frac{\partial^{2}u}{\partial x^{2}}+u+\sin(u)=f(x,t), (64a)
u(0,t)=ϕ1(t),u(1,t)=ϕ2(t),\displaystyle u({0},t)=\phi_{1}(t),\qquad u({1},t)=\phi_{2}(t), (64b)
u(x,0)=ψ1(x),ut(x,0)=ψ2(x).\displaystyle u({x},0)=\psi_{1}({x}),\qquad\frac{\partial u}{\partial t}({x},0)=\psi_{2}({x}). (64c)

In these equations, u(x,t)u(x,t) is the field function to be solved for, f(x,t)f(x,t) is a source term, ψ1\psi_{1} and ψ2\psi_{2} are the initial conditions, and ϕ1\phi_{1} and ϕ2\phi_{2} are the boundary conditions. The source term and the initial/boundary conditions are chosen appropriately to match the following exact solution,

u(x,t)=[2cos(πx+π5)+95cos(2πx+7π20)][2cos(πt+π5)+95cos(2πt+7π20)].\displaystyle u(x,t)=\left[2\cos\left(\pi x+\frac{\pi}{5}\right)+\frac{9}{5}\cos\left(2\pi x+\frac{7\pi}{20}\right)\right]\left[2\cos\left(\pi t+\frac{\pi}{5}\right)+\frac{9}{5}\cos\left(2\pi t+\frac{7\pi}{20}\right)\right]. (65)
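For reference, the source term and the initial/boundary data implied by this exact solution can be generated symbolically. The sketch below uses sympy; this is an implementation convenience assumed here, not a procedure prescribed by the paper.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)

# Exact solution (65).
u = (2*sp.cos(sp.pi*x + sp.pi/5) + sp.Rational(9, 5)*sp.cos(2*sp.pi*x + 7*sp.pi/20)) * \
    (2*sp.cos(sp.pi*t + sp.pi/5) + sp.Rational(9, 5)*sp.cos(2*sp.pi*t + 7*sp.pi/20))

# Source term from (64a): f = u_tt - u_xx + u + sin(u).
f = sp.diff(u, t, 2) - sp.diff(u, x, 2) + u + sp.sin(u)

# Initial data (64c) and boundary data (64b).
psi1, psi2 = u.subs(t, 0), sp.diff(u, t).subs(t, 0)
phi1, phi2 = u.subs(x, 0), u.subs(x, 1)

# Fast numerical callback for evaluating f at the collocation points.
f_num = sp.lambdify((x, t), f, 'numpy')
```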

To simulate this problem with PINN, we reformulate the problem as follows,

utv=0,\displaystyle u_{t}-v=0, (66a)
vtuxx+u+sin(u)=f(x,t),\displaystyle v_{t}-u_{xx}+u+\sin(u)=f(x,t), (66b)
u(0,t)=ϕ1(t),u(1,t)=ϕ2(t),\displaystyle u({0},t)=\phi_{1}(t),\qquad u({1},t)=\phi_{2}(t), (66c)
u(x,0)=ψ1(x),v(x,0)=ψ2(x),\displaystyle u({x},0)=\psi_{1}({x}),\qquad v({x},0)=\psi_{2}({x}), (66d)

where vv is a variable defined by equation (66a).

In light of (4.2), we employ the following loss function in PINN,

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)+uθ(xintn,tintn)+sin(uθ(xintn,tintn))f(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})+u_{\theta}(x_{int}^{n},t_{int}^{n})+\sin(u_{\theta}(x_{int}^{n},t_{int}^{n}))-f(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2+W4Nn=1N[uθ(xtbn,0)ψ1(xtbn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-\psi_{1}(x_{tb}^{n})\right]^{2}
+W5Nn=1N[vθ(xtbn,0)ψ2(xtbn)]2+W6Nn=1N[uθx(xtbn,0)ψ1x(xtbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)-\psi_{2}(x_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)-\psi_{1x}(x_{tb}^{n})\right]^{2}
+W7Nn=1N[|vθ(0,tsbn)ϕ1t(tsbn)|+|vθ(1,tsbn)ϕ2t(tsbn)|],\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[|v_{\theta}(0,t_{sb}^{n})-\phi_{1t}({t_{sb}^{n}})|+|v_{\theta}(1,t_{sb}^{n})-\phi_{2t}({t_{sb}^{n}})|\right], (67)

where Wn>0W_{n}>0 (1n71\leq n\leq 7) are the penalty coefficients for the different loss terms, added in the PINN implementation. It should be noted that the loss terms with the coefficients W3W_{3} and W6W_{6} will be absent from the conventional PINN formulation (see Raissi2019pinn ). These terms are required in the training loss according to the error analysis in Section 4. It should also be noted that the W7W_{7} loss terms are not squared, as dictated by the theoretical analysis of Section 4.
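To make the structure of (67) concrete, the sketch below assembles its residual terms with automatic differentiation in PyTorch. The names net, f, psi1, psi2, psi1_x, phi1_t and phi2_t are hypothetical (a network returning (u_theta, v_theta), and callables for the given data); this is a minimal illustration of the loss form, not the authors' implementation.

```python
import torch

def grad(out, var):
    # First derivative of a per-sample scalar output w.r.t. an input tensor.
    return torch.autograd.grad(out, var, grad_outputs=torch.ones_like(out),
                               create_graph=True)[0]

def pinn_g1_loss(net, x_int, t_int, x_tb, t_sb, f, psi1, psi2, psi1_x, phi1_t, phi2_t, W):
    # Interior residuals (the W1, W2, W3 terms of (67)).
    x_int.requires_grad_(True); t_int.requires_grad_(True)
    u, v = net(x_int, t_int)                      # u_theta, v_theta at interior points
    u_t, u_x = grad(u, t_int), grad(u, x_int)
    u_xx, u_tx = grad(u_x, x_int), grad(u_t, x_int)
    v_t, v_x = grad(v, t_int), grad(v, x_int)
    r1 = u_t - v
    r2 = v_t - u_xx + u + torch.sin(u) - f(x_int, t_int)
    r3 = u_tx - v_x                               # gradient of the first equation residual
    # Initial-condition residuals (the W4, W5, W6 terms).
    x_tb.requires_grad_(True)
    u0, v0 = net(x_tb, torch.zeros_like(x_tb))
    r4, r5 = u0 - psi1(x_tb), v0 - psi2(x_tb)
    r6 = grad(u0, x_tb) - psi1_x(x_tb)            # gradient of the initial-condition residual
    # Boundary residuals (the W7 terms); note these are NOT squared, per (67).
    _, v_left = net(torch.zeros_like(t_sb), t_sb)
    _, v_right = net(torch.ones_like(t_sb), t_sb)
    r7 = (v_left - phi1_t(t_sb)).abs() + (v_right - phi2_t(t_sb)).abs()
    msq = lambda r: (r**2).mean()
    return (W[0]*msq(r1) + W[1]*msq(r2) + W[2]*msq(r3) + W[3]*msq(r4)
            + W[4]*msq(r5) + W[5]*msq(r6) + W[6]*r7.mean())
```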

We have also implemented a PINN scheme with a variant form for the loss function,

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)+uθ(xintn,tintn)+sin(uθ(xintn,tintn))f(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})+u_{\theta}(x_{int}^{n},t_{int}^{n})+\sin(u_{\theta}(x_{int}^{n},t_{int}^{n}))-f(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2+W4Nn=1N[uθ(xtbn,0)ψ1(xtbn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-\psi_{1}(x_{tb}^{n})\right]^{2}
+W5Nn=1N[vθ(xtbn,0)ψ2(xtbn)]2+W6Nn=1N[uθx(xtbn,0)ψ1x(xtbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)-\psi_{2}(x_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)-\psi_{1x}(x_{tb}^{n})\right]^{2}
+W7Nn=1N[(vθ(0,tsbn)ϕ1t(tsbn))2+(vθ(1,tsbn)ϕ2t(tsbn))2].\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[(v_{\theta}(0,t_{sb}^{n})-\phi_{1t}({t_{sb}^{n}}))^{2}+(v_{\theta}(1,t_{sb}^{n})-\phi_{2t}({t_{sb}^{n}}))^{2}\right]. (68)

The difference between (67) and (68) lies in the W7W_{7} terms: these terms are squared in (68), while they are not squared in (67). We refer to the PINN scheme employing the loss function (67) as PINN-G1, and the scheme employing the loss function (68) as PINN-G2.

In the simulations we employ a feed-forward neural network with two input nodes (representing xx and tt), two output nodes (representing uu and vv), and two hidden layers, each having a width of 8080 nodes. The tanh\tanh activation function has been used for all the hidden nodes. We employ NN collocation points generated from a uniform random distribution within the domain, on each of the domain boundaries, and also on the initial boundary, where NN is varied systematically in the simulations. The penalty coefficients in the loss functions are taken to be 𝑾=(W1,,W7)=(0.5,0.4,0.5,0.6,0.6,0.6,0.8)\bm{W}=(W_{1},\dots,W_{7})=(0.5,0.4,0.5,0.6,0.6,0.6,0.8).
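For completeness, a minimal sketch of the network and collocation-point setup described above is given below (PyTorch, with the domain [0,1]x[0,2] of this subsection); the class and variable names are hypothetical, and the optimizer/training loop is omitted since those details are not specific to this subsection.

```python
import torch
import torch.nn as nn

class SineGordonNet(nn.Module):
    # Two inputs (x, t), two tanh hidden layers of width 80, two outputs (u_theta, v_theta).
    def __init__(self, width=80):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 2))

    def forward(self, x, t):
        uv = self.body(torch.cat([x, t], dim=-1))
        return uv[..., :1], uv[..., 1:]

net = SineGordonNet()
N = 2000
# Uniform random collocation points on (x, t) in [0, 1] x [0, 2].
x_int, t_int = torch.rand(N, 1), 2.0 * torch.rand(N, 1)   # interior points
x_tb = torch.rand(N, 1)                                    # initial boundary, t = 0
t_sb = 2.0 * torch.rand(N, 1)                              # spatial boundaries x = 0 and x = 1
W = (0.5, 0.4, 0.5, 0.6, 0.6, 0.6, 0.8)                    # penalty coefficients from the text
```

The loss sketched after (67) above can then be evaluated on these tensors and minimized with a standard gradient-based optimizer.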

(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 7: Sine-Gordon equation: Top row, comparison of profiles between the exact solution and PINN-G1/PINN-G2 solutions for uu at several time instants. Bottom row, profiles of the absolute error of the PINN-G1 and PINN-G2 solutions for uu. N=2000N=2000 training collocation points.
(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 8: Sine-Gordon equation: Top row, comparison of profiles between the exact solution and PINN-G1/PINN-G2 solutions for v=utv=\frac{\partial u}{\partial t} at several time instants. Bottom row, profiles of the absolute error of the PINN-G1 and PINN-G2 solutions for vv. N=2000N=2000 training collocation points.

Figure 6 shows distributions of u(x,t)u(x,t) and v=utv=\frac{\partial u}{\partial t} from the exact solution (left column) and the PINN solution (middle column), as well as the point-wise absolute errors of the PINN solution for these fields (right column). These results are obtained by PINN-G2 with N=2000N=2000 random collocation points within the domain and on each of the domain boundaries. The PINN solution is in good agreement with the true solution.

Figures 7 and 8 compare the profiles of uu and vv between the exact solution and the solutions obtained by PINN-G1 and PINN-G2, at several time instants (t=0.5t=0.5, 11 and 1.51.5). Profiles of the absolute errors of the PINN-G1/PINN-G2 solutions are also shown in these figures. We observe that both PINN-G1 and PINN-G2 have captured the solution for uu quite accurately, and to a lesser extent, also for vv. Comparison of the error profiles between PINN-G1 and PINN-G2 suggests that the PINN-G2 errors are in general somewhat smaller than those of PINN-G1, although this is not consistently the case over the entire domain.

(a) PINN-G1; (b) PINN-G2
Figure 9: Sine-Gordon equation: Loss histories of (a) PINN-G1 and (b) PINN-G2 corresponding to various numbers of training collocation points.
Table 2: Sine-Gordon equation: The l2l_{2} and ll_{\infty} errors for uu and vv versus the number of training collocation points NN corresponding to PINN-G1 and PINN-G2.
method | NN | l2l_{2}-error (uθu_{\theta}, vθv_{\theta}) | ll_{\infty}-error (uθu_{\theta}, vθv_{\theta})
PINN-G1 | 1000 | 3.0818e-03, 4.3500e-03 | 9.6044e-03, 1.8894e-02
PINN-G1 | 1500 | 3.4335e-03, 4.8035e-03 | 1.0566e-02, 1.7050e-02
PINN-G1 | 2000 | 2.1914e-03, 3.0055e-03 | 7.5882e-03, 1.1099e-02
PINN-G1 | 2500 | 3.0172e-03, 3.5698e-03 | 9.2515e-03, 1.4645e-02
PINN-G1 | 3000 | 2.5281e-03, 4.4858e-03 | 7.2785e-03, 1.6213e-02
PINN-G2 | 1000 | 3.0674e-03, 2.0581e-03 | 7.3413e-03, 1.1323e-02
PINN-G2 | 1500 | 1.0605e-03, 1.4729e-03 | 2.2914e-03, 6.2831e-03
PINN-G2 | 2000 | 2.2469e-03, 1.6072e-03 | 4.8842e-03, 8.8320e-03
PINN-G2 | 2500 | 6.6072e-04, 6.0509e-04 | 1.4099e-03, 4.3423e-03
PINN-G2 | 3000 | 6.6214e-04, 1.0830e-03 | 1.9697e-03, 7.8866e-03

The effect of the collocation points on the PINN results has been studied by varying the number of training collocation points systematically between N=1000N=1000 and N=3000N=3000 within the domain and on each of the domain boundaries. The results are provided in Figure 9 and Table 2. Figure 9 shows histories of the loss function corresponding to different numbers of collocation points for PINN-G1 and PINN-G2. Table 2 provides the l2l_{2} and ll_{\infty} errors of uu and vv versus the number of collocation points computed by PINN-G1 and PINN-G2. The PINN errors in general tend to decrease with an increasing number of collocation points, but this trend is not monotonic. It can be observed that both PINN-G1 and PINN-G2 have captured the solutions quite accurately, with the PINN-G2 errors in general slightly smaller.

(a) PINN-G1; (b) PINN-G2
Figure 10: Sine-Gordon equation: The l2l^{2} errors of uu, ut\frac{\partial u}{\partial t}, and ux\frac{\partial u}{\partial x} as a function of the training loss value.

Figure 10 provides some numerical evidence for the relation between the total error and the training loss as suggested by Theorem 4.4. Here we plot the l2l_{2} errors for uu, vv and ux\frac{\partial u}{\partial x} as a function of the training loss value obtained by PINN-G1 and PINN-G2. The results indicate that the total error scales approximately as the square root of the training loss, which in some sense corroborates the error-loss relation as expressed in Theorem 4.4.

6.3 Linear Elastodynamic Equation

In this subsection we consider the linear elastodynamic equation (in two spatial dimensions plus time) and use it to test the PINN algorithm suggested by the theoretical analysis in Section 5. Consider the spatial-temporal domain (x,y,t)Ω=D×[0,T]=[0,1]×[0,1]×[0,2](x,y,t)\in\Omega=D\times[0,T]=[0,1]\times[0,1]\times[0,2], and the following initial/boundary value problem with the linear elastodynamics equation on Ω\Omega:

ρ2𝒖t22μ(𝜺¯(𝒖))λ(𝒖)=𝒇(𝒙,t),\displaystyle\rho\frac{\partial^{2}\bm{u}}{\partial t^{2}}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\lambda\nabla(\nabla\cdot\bm{u})=\bm{f}(\bm{x},t), (69a)
𝒖|Γd=ϕd,(2μ𝜺¯(𝒖)+λ(𝒖))|Γn𝒏=ϕn,\displaystyle\bm{u}|_{\Gamma_{d}}=\bm{\phi}_{d},\qquad\Big{(}2\mu\underline{\bm{\varepsilon}}(\bm{u})+\lambda(\nabla\cdot\bm{u})\Big{)}|_{\Gamma_{n}}\bm{n}=\bm{\phi}_{n}, (69b)
𝒖(𝒙,0)=𝝍1,𝒖t(𝒙,0)=𝝍2,\displaystyle\bm{u}(\bm{x},0)=\bm{\psi}_{1},\qquad\frac{\partial\bm{u}}{\partial t}(\bm{x},0)=\bm{\psi}_{2}, (69c)

where 𝒖=(u1(𝒙,t),u2(𝒙,t))T\bm{u}=(u_{1}(\bm{x},t),u_{2}(\bm{x},t))^{T} (𝒙=(x,y)D\bm{x}=(x,y)\in D, t[0,T]t\in[0,T]) is the displacement field to be solved for, 𝒇(𝒙,t)\bm{f}(\bm{x},t) is a source term, and ρ\rho, μ\mu and λ\lambda are material constants. Γd\Gamma_{d} is the Dirichlet boundary and Γn\Gamma_{n} is the Neumann boundary, with D=ΓdΓn\partial D=\Gamma_{d}\cup\Gamma_{n} and ΓdΓn=\Gamma_{d}\cap\Gamma_{n}=\emptyset, where 𝒏\bm{n} is the outward-pointing unit normal vector. In our simulations we choose the left boundary (x=0x=0) as the Dirichlet boundary, and the rest are Neumann boundaries. ϕd\bm{\phi}_{d} and ϕn\bm{\phi}_{n} are Dirichlet and Neumann boundary conditions, respectively. 𝝍1\bm{\psi}_{1} and 𝝍2\bm{\psi}_{2} are the initial conditions for the displacement and the velocity. We employ the material parameter values μ=λ=ρ=1\mu=\lambda=\rho=1, and the following manufactured solution (2018_CMAME_DGelastodynamics ) to this problem,

𝒖(𝒙,t)=sin(2πt)[sin(πx)2sin(2πy)sin(2πx)sin(πy)2].\displaystyle\bm{u}(\bm{x},t)=\sin(\sqrt{2}\pi t)\begin{bmatrix}-\sin(\pi x)^{2}\sin(2\pi y)\\ \sin(2\pi x)\sin(\pi y)^{2}\end{bmatrix}. (70)

The source term 𝒇(𝒙,t)\bm{f}(\bm{x},t) and the boundary/initial distributions ϕd\bm{\phi}_{d}, ϕn\bm{\phi}_{n}, 𝝍1\bm{\psi}_{1} and 𝝍2\bm{\psi}_{2} are chosen according to the expression (70).
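As with the Sine-Gordon case, the source term implied by (70) can be generated symbolically. Below is a sketch using sympy, with mu = lambda = rho = 1 as specified above; it is an assumed implementation convenience rather than part of the method itself.

```python
import sympy as sp

x, y, t = sp.symbols('x y t', real=True)
X = (x, y)

# Manufactured solution (70).
u = sp.Matrix([-sp.sin(sp.pi*x)**2 * sp.sin(2*sp.pi*y),
               sp.sin(2*sp.pi*x) * sp.sin(sp.pi*y)**2]) * sp.sin(sp.sqrt(2)*sp.pi*t)

# Strain tensor eps(u)_ij = (d_i u_j + d_j u_i) / 2 and divergence of u.
eps = sp.Matrix(2, 2, lambda i, j: (sp.diff(u[i], X[j]) + sp.diff(u[j], X[i])) / 2)
div_u = sp.diff(u[0], x) + sp.diff(u[1], y)

# f = rho*u_tt - 2*mu*div(eps(u)) - lambda*grad(div u), with rho = mu = lambda = 1.
div_eps = sp.Matrix([sp.diff(eps[0, 0], x) + sp.diff(eps[0, 1], y),
                     sp.diff(eps[1, 0], x) + sp.diff(eps[1, 1], y)])
f = sp.simplify(sp.diff(u, t, 2) - 2*div_eps - sp.Matrix([sp.diff(div_u, x), sp.diff(div_u, y)]))
```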

To simulate this problem using the PINN algorithm suggested by the theoretical analysis from Section 5, we reformulate (69) into the following system

𝒖t𝒗=𝟎,𝒗t2(𝜺¯(𝒖))(𝒖)=𝒇(𝒙,t),\displaystyle\bm{u}_{t}-\bm{v}=\bm{0},\qquad\bm{v}_{t}-2\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\nabla(\nabla\cdot\bm{u})=\bm{f}(\bm{x},t), (71a)
𝒖|Γd=ϕd,(2𝜺¯(𝒖)+(𝒖))|Γn𝒏=ϕn,\displaystyle\bm{u}|_{\Gamma_{d}}=\bm{\phi}_{d},\qquad\Big{(}2\underline{\bm{\varepsilon}}(\bm{u})+(\nabla\cdot\bm{u})\Big{)}|_{\Gamma_{n}}\bm{n}=\bm{\phi}_{n}, (71b)
𝒖(𝒙,0)=𝝍1,𝒗(𝒙,0)=𝝍2,\displaystyle\bm{u}(\bm{x},0)=\bm{\psi}_{1},\qquad\bm{v}(\bm{x},0)=\bm{\psi}_{2}, (71c)

where 𝒗(𝒙,t)\bm{v}(\bm{x},t) is an intermediate variable (representing the velocity) as given by (71a).

In light of (5.2), we employ the following loss function for PINN,

Loss =W1Nn=1N[𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn)]2\displaystyle=\frac{W_{1}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[𝒗θt(𝒙intn,tintn)2(𝜺¯(𝒖θ(𝒙intn,tintn)))(𝒖θ(𝒙intn,tintn))𝒇(𝒙intn,tintn))]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-2\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})))-\nabla(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))-\bm{f}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W3Nn=1N[𝜺¯(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2+W4Nn=1N[(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W5Nn=1N[𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn)]2+W6Nn=1N[𝒗θ(𝒙tbn,0)𝝍2(𝒙tbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{2}(\bm{x}_{tb}^{n})\right]^{2}
+W7Nn=1N[𝜺¯(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2+W8Nn=1N[(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}+\frac{W_{8}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}
+W9Nn=1N|𝒗θ(𝒙sb1n,tsb1n)ϕdt(𝒙sb1n,tsb1n)|\displaystyle+\frac{W_{9}}{N}\sum_{n=1}^{N}|\bm{v}_{\theta}(\bm{x}_{sb1}^{n},t_{sb1}^{n})-\bm{\phi}_{dt}(\bm{x}_{sb1}^{n},t_{sb1}^{n})|
+W10Nn=1N|2𝜺¯(𝒖θ(𝒙sb2n,tsb2n))𝒏+(𝒖θ(𝒙sb2n,tsb2n))𝒏ϕn(𝒙sb2n,tsb2n)|,\displaystyle+\frac{W_{10}}{N}\sum_{n=1}^{N}|2\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}+(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}-\bm{\phi}_{n}(\bm{x}_{sb2}^{n},t_{sb2}^{n})|, (72)

where we have added the penalty coefficients, Wn>0W_{n}>0 (1n101\leq n\leq 10), for different loss terms in the implementation, and NN denotes the number of collocation points within the domain and on the domain boundaries. In the numerical tests we have also implemented another form for the loss function as follows,

Loss =W1Nn=1N[𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn)]2\displaystyle=\frac{W_{1}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[𝒗θt(𝒙intn,tintn)2(𝜺¯(𝒖θ(𝒙intn,tintn)))(𝒖θ(𝒙intn,tintn))𝒇(𝒙intn,tintn))]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-2\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})))-\nabla(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))-\bm{f}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W3Nn=1N[𝜺¯(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2+W4Nn=1N[(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W5Nn=1N[𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn)]2+W6Nn=1N[𝒗θ(𝒙tbn,0)𝝍2(𝒙tbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{2}(\bm{x}_{tb}^{n})\right]^{2}
+W7Nn=1N[𝜺¯(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2+W8Nn=1N[(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}+\frac{W_{8}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}
+W9Nn=1N[𝒗θ(𝒙sb1n,tsb1n)ϕdt(𝒙sb1n,tsb1n)]2\displaystyle+\frac{W_{9}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta}(\bm{x}_{sb1}^{n},t_{sb1}^{n})-\bm{\phi}_{dt}(\bm{x}_{sb1}^{n},t_{sb1}^{n})\right]^{2}
+W10Nn=1N[2𝜺¯(𝒖θ(𝒙sb2n,tsb2n))𝒏+(𝒖θ(𝒙sb2n,tsb2n))𝒏ϕn(𝒙sb2n,tsb2n)]2.\displaystyle+\frac{W_{10}}{N}\sum_{n=1}^{N}\left[2\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}+(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}-\bm{\phi}_{n}(\bm{x}_{sb2}^{n},t_{sb2}^{n})\right]^{2}. (73)

The difference between these two forms of the loss function lies in the W9W_{9} and W10W_{10} terms. It should be noted that the W9W_{9} and W10W_{10} terms in (72) are not squared, in light of the error terms (55a)-(55j) from the theoretical analysis; in contrast, these terms are squared in (73). The PINN scheme utilizing the loss function (72) is henceforth referred to as PINN-H1, and the scheme that employs the loss function (73) shall be referred to as PINN-H2.
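The residual terms involving the strain tensor and the divergence in these loss functions can likewise be formed by automatic differentiation. The following PyTorch sketch computes the W3 and W4 interior terms; net is a hypothetical network returning the two-component fields (u_theta, v_theta), and the code is illustrative rather than the authors' implementation.

```python
import torch

def jacobian_xy(w, x, y):
    # Spatial Jacobian of a two-component field w; rows are components, columns are (d/dx, d/dy).
    rows = []
    for i in range(2):
        wi = w[..., i:i+1]
        wx = torch.autograd.grad(wi, x, torch.ones_like(wi), create_graph=True)[0]
        wy = torch.autograd.grad(wi, y, torch.ones_like(wi), create_graph=True)[0]
        rows.append(torch.cat([wx, wy], dim=-1))
    return torch.stack(rows, dim=-2)                    # shape (N, 2, 2)

def w3_w4_terms(net, x, y, t):
    x.requires_grad_(True); y.requires_grad_(True); t.requires_grad_(True)
    u, v = net(x, y, t)                                 # each of shape (N, 2)
    u_t = torch.cat([torch.autograd.grad(u[..., i:i+1], t, torch.ones_like(u[..., i:i+1]),
                                         create_graph=True)[0] for i in range(2)], dim=-1)
    J = jacobian_xy(u_t - v, x, y)                      # Jacobian of the residual u_t - v
    strain = 0.5 * (J + J.transpose(-1, -2))            # epsilon(u_t - v)
    divergence = J[..., 0, 0] + J[..., 1, 1]            # div(u_t - v)
    return (strain**2).sum(dim=(-1, -2)).mean(), (divergence**2).mean()
```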

In the simulations, we employ a feed-forward neural network with three input nodes, which represent 𝒙=(x,y)\bm{x}=(x,y) and the time variable t, and four output nodes, which represent 𝒖=(u1,u2)\bm{u}=(u_{1},u_{2}) and 𝒗=(v1,v2)\bm{v}=(v_{1},v_{2}). The neural network has two hidden layers, with widths of 90 and 60 nodes, respectively, and the tanh\tanh activation function for all the hidden nodes. For the network training, NN collocation points are generated from a uniform random distribution within the domain, on each of the domain boundaries, as well as on the initial boundary. NN is systematically varied in the simulations. We employ the penalty coefficients 𝑾=(W1,,W10)=(0.9,0.9,0.9,0.9,0.5,0.5,0.5,0.5,0.9,0.9)\bm{W}=(W_{1},...,W_{10})=(0.9,0.9,0.9,0.9,0.5,0.5,0.5,0.5,0.9,0.9) in the simulations.

(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 11: Linear elastodynamic equation: Visualization of the deformed configuration at time instants (a) t=0.5t=0.5, (b) t=1.0t=1.0, and (c) t=1.5t=1.5 from the exact solution (top row), the PINN-H1 solution (middle row) and the PINN-H2 solution (bottom row). Plotted here are the deformed field, 𝒙+𝒖(𝒙,t)\bm{x}+\bm{u}(\bm{x},t), for a set of grid points 𝒙D=[0,1]×[0,1]\bm{x}\in D=[0,1]\times[0,1]. N=2000N=2000 training collocation points within domain and on the domain boundaries.
(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 12: Linear elastodynamic equation: Distributions of the point-wise absolute error, 𝒖θ𝒖\|\bm{u}_{\theta}-\bm{u}\|, of the PINN-H1 solution (top row) and the PINN-H2 solution (bottom row) at three time instants (a) t=0.5t=0.5, (b) t=1.0t=1.0, and (c) t=1.5t=1.5. N=2000N=2000 training collocation points within domain and on the domain boundaries.

In Figures 11 and 12 we compare the PINN-H1/PINN-H2 solutions with the exact solution and provide an overview of their errors. Figure 11 is a visualization of the deformed configuration of the domain. Here we have plotted the deformed field, 𝒙+𝒖(𝒙,t)\bm{x}+\bm{u}(\bm{x},t), for a set of grid points 𝒙D\bm{x}\in D at three time instants from the exact solution, the PINN-H1 and PINN-H2 solutions. Figure 12 shows distributions of the point-wise absolute error of the PINN-H1/PINN-H2 solutions, 𝒖θ𝒖=(uθ1(𝒙,t)u1(𝒙,t))2+(uθ2(𝒙,t)u2(𝒙,t))2\|\bm{u}_{\theta}-\bm{u}\|=\sqrt{(u_{\theta 1}(\bm{x},t)-u_{1}(\bm{x},t))^{2}+(u_{\theta 2}(\bm{x},t)-u_{2}(\bm{x},t))^{2}}, at the same three time instants. Here 𝒖θ=(uθ1,uθ2)\bm{u}_{\theta}=(u_{\theta 1},u_{\theta 2}) denotes the PINN solution. While both PINN schemes capture the solution fairly well at t=0.5t=0.5 and 11, at t=1.5t=1.5 both schemes show larger deviations from the true solution. In general, the PINN-H1 scheme appears to produce a better approximation to the solution than PINN-H2.

Table 3: Linear elastodynamic equation: The l2l_{2} and ll_{\infty} errors for 𝒖=(u1,u2)\bm{u}=(u_{1},u_{2}) and 𝒗=(v1,v2)\bm{v}=(v_{1},v_{2}) versus the number of training data points NN from the PINN-H1 and PINN-H2 solutions.
method | NN | l2l_{2}-error (uθ1u_{\theta 1}, uθ2u_{\theta 2}, vθ1v_{\theta 1}, vθ2v_{\theta 2}) | ll_{\infty}-error (uθ1u_{\theta 1}, uθ2u_{\theta 2}, vθ1v_{\theta 1}, vθ2v_{\theta 2})
PINN-H1 | 1000 | 4.8837e-02, 6.0673e-02, 4.7460e-02, 5.1640e-02 | 1.7189e-01, 2.1201e-01, 6.9024e-01, 6.1540e-01
PINN-H1 | 1500 | 2.8131e-02, 3.1485e-02, 4.1104e-02, 4.1613e-02 | 1.9848e-01, 2.4670e-01, 3.4716e-01, 4.0582e-01
PINN-H1 | 2000 | 2.7796e-02, 4.0410e-02, 3.5891e-02, 4.6334e-02 | 1.4704e-01, 1.7687e-01, 4.0678e-01, 5.0022e-01
PINN-H1 | 2500 | 3.0909e-02, 4.0215e-02, 3.3966e-02, 4.4024e-02 | 1.7589e-01, 2.4211e-01, 4.1403e-01, 3.9570e-01
PINN-H1 | 3000 | 2.6411e-02, 3.5600e-02, 4.3209e-02, 5.2802e-02 | 1.4289e-01, 1.3625e-01, 5.1167e-01, 5.3298e-01
PINN-H2 | 1000 | 4.9869e-02, 1.3451e-01, 5.6327e-02, 5.4796e-02 | 3.2314e-01, 3.4978e-01, 6.7624e-01, 5.7277e-01
PINN-H2 | 1500 | 5.4708e-02, 1.3987e-01, 4.5871e-02, 5.1622e-02 | 2.8609e-01, 5.2598e-01, 4.9343e-01, 2.3518e-01
PINN-H2 | 2000 | 6.2114e-02, 1.0190e-01, 6.4477e-02, 5.0011e-02 | 2.5745e-01, 3.1642e-01, 5.9057e-01, 5.8411e-01
PINN-H2 | 2500 | 3.7887e-02, 6.0630e-02, 5.4363e-02, 5.0659e-02 | 2.2212e-01, 2.4774e-01, 5.3681e-01, 3.5427e-01
PINN-H2 | 3000 | 5.4862e-02, 6.3407e-02, 5.5208e-02, 6.0082e-02 | 3.4102e-01, 2.1308e-01, 5.1894e-01, 4.4995e-01

The effect of the number of collocation points (NN) on the PINN results has been studied in Figure 13 and Table 3, where NN is systematically varied in the range N=1000N=1000 to N=3000N=3000. Figure 13 shows the histories of the loss function for training PINN-H1 and PINN-H2 with different numbers of collocation points. Table 3 lists the corresponding l2l_{2} and ll_{\infty} errors of 𝒖\bm{u} and 𝒗\bm{v} obtained from PINN-H1 and PINN-H2. One can observe that the PINN errors in general tend to improve with an increasing number of collocation points. It can also be observed that the PINN-H1 errors in general appear better than those of PINN-H2 for this problem.

Figure 14 shows the errors of 𝒖\bm{u}, 𝒖t\bm{u}_{t}, 𝜺¯(𝒖)\underline{\bm{\varepsilon}}(\bm{u}) and 𝒖\nabla\cdot\bm{u} as a function of the loss function value in the network training of PINN-H1 and PINN-H2. The data indicates that these errors approximately scale as the square root of the training loss, which is consistent with the relation as given by Theorem 5.5. This in a sense provides numerical evidence for the theoretical analysis in Section 5.

(a) PINN-H1; (b) PINN-H2
Figure 13: Linear elastodynamic equation: Training loss histories of PINN-H1 and PINN-H2 corresponding to different numbers of collocation points (NN) in the simulation.
(a) PINN-H1; (b) PINN-H2
Figure 14: Linear elastodynamic equation: The errors for 𝒖\bm{u}, 𝒖t\bm{u}_{t}, 𝜺¯(𝒖)\underline{\bm{\varepsilon}}(\bm{u}) and 𝒖\nabla\cdot\bm{u} versus the training loss value obtained by PINN-H1 and PINN-H2.

7 Concluding Remarks

In the present paper we have considered the approximation of a class of dynamic PDEs of second order in time by physics-informed neural networks (PINN). We provide an analysis of the convergence and the error of PINN for approximating the wave equation, the Sine-Gordon equation, and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the tanh\tanh activation function for all the hidden nodes, the PINN approximation errors for the solution field, its time derivative and its gradient can be bounded by the PINN training loss and the number of training data points (quadrature points).

Our theoretical analyses further suggest new forms for the PINN training loss function, which contain certain residuals that are crucial to the error estimate but would be absent from the canonical PINN formulation of the loss function. These typically include the gradient of the equation residual, the gradient of the initial-condition residual, and the time derivative of the boundary-condition residual. In addition, depending on the type of boundary conditions involved in the problem, our analyses suggest that a norm other than the commonly-used L2L^{2} norm may be more appropriate for the boundary residuals in the loss function. Adopting these new forms of the loss function suggested by the theoretical analyses leads to a variant PINN algorithm. We have implemented the new algorithm and presented a number of numerical experiments on the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. The simulation results demonstrate that the method can capture the solution field well for these PDEs. The numerical data corroborate the theoretical analyses.

Declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Availability of data/code and material

Data will be made available on reasonable request.

Acknowledgements

The work was partially supported by the China Postdoctoral Science Foundation (No.2021M702747), Natural Science Foundation of Hunan Province (No.2022JJ40422), NSF of China (No.12101495), General Special Project of Education Department of Shaanxi Provincial Government (No.21JK0943), and the US National Science Foundation (DMS-2012415).

8 Appendix: Auxiliary Results and Proofs of Main Theorems from Sections 4 and 5

8.1 Notation

Let α0d\alpha\in\mathbb{N}_{0}^{d} be a multi-index, i.e., a dd-tuple of non-negative integers, with dd\in\mathbb{N}. For two multi-indices α,β0d\alpha,\beta\in\mathbb{N}_{0}^{d}, we say that αβ\alpha\leq\beta if and only if αiβi\alpha_{i}\leq\beta_{i} for all i=1,,di=1,\cdots,d. We then denote

|α|=i=1dαi,α!=i=1dαi!,(αβ)=α!β!(αβ)!.|\alpha|=\sum_{i=1}^{d}\alpha_{i},\qquad\alpha!=\prod_{i=1}^{d}\alpha_{i}!,\qquad\begin{pmatrix}\alpha\\ \beta\end{pmatrix}=\frac{\alpha!}{\beta!(\alpha-\beta)!}.

Let Pm,n={α0n,|α|=m}P_{m,n}=\{\alpha\in\mathbb{N}_{0}^{n},|\alpha|=m\}, for which it holds

|Pm,n|=(m+n1m).|P_{m,n}|=\begin{pmatrix}m+n-1\\ m\end{pmatrix}.
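The cardinality formula above can be checked by direct enumeration; the short Python sketch below does so for a few small values of m and n.

```python
from itertools import product
from math import comb

def count_multi_indices(m, n):
    # Number of n-tuples of non-negative integers alpha with |alpha| = m.
    return sum(1 for alpha in product(range(m + 1), repeat=n) if sum(alpha) == m)

for m, n in [(2, 3), (3, 2), (4, 4)]:
    assert count_multi_indices(m, n) == comb(m + n - 1, m)
```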

8.2 Some Auxiliary Results

Lemma 8.1.

Let d,k,l0d\in\mathbb{N},k,l\in\mathbb{N}_{0} with k>l+d2k>l+\frac{d}{2} and Ωd\Omega\subset\mathbb{R}^{d} be an open set. Every function fHk(Ω)f\in H^{k}(\Omega) has a continuous representative belonging to Cl(Ω)C^{l}(\Omega).

Lemma 8.2.

Let d,k0d\in\mathbb{N},k\in\mathbb{N}_{0}, fHk(Ω)f\in H^{k}(\Omega) and gWk,(Ω)g\in W^{k,\infty}(\Omega) with Ωd\Omega\subset\mathbb{R}^{d}, then

fgHk(Ω)2kfHk(Ω)gWk,(Ω).\|fg\|_{H^{k}(\Omega)}\leq 2^{k}\|f\|_{H^{k}(\Omega)}\|g\|_{W^{k,\infty}(\Omega)}.
Lemma 8.3 (Multiplicative trace inequality, e.g. DeRyck2021On ).

Let d2d\geq 2, Ωd\Omega\subset\mathbb{R}^{d} be a Lipschitz domain and let γ0:H1(Ω)L2(Ω):uu|Ω\gamma_{0}:H^{1}(\Omega)\rightarrow L^{2}(\partial\Omega):u\mapsto u|_{\partial\Omega} be the trace operator. Denote by hΩh_{\Omega} the diameter of Ω\Omega and by ρΩ\rho_{\Omega} the radius of the largest dd-dimensional ball that can be inscribed into Ω\Omega. Then it holds that

γ0uL2(Ω)ChΩ,d,ρΩuH1(Ω),\|\gamma_{0}u\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d,\rho_{\Omega}}\|u\|_{H^{1}(\Omega)}, (74)

where ChΩ,d,ρΩ=2max{2hΩ,d}ρΩC_{h_{\Omega},d,\rho_{\Omega}}=\sqrt{\frac{2\max\{2h_{\Omega},d\}}{\rho_{\Omega}}}.

Lemma 8.4 (2023_IMA_Mishra_NS ).

Let d,n,L,Wd,n,L,W\in\mathbb{N} and let uθ:ddu_{\theta}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d} be a neural network with θΘ\theta\in\Theta for L2,R,W1L\geq 2,R,W\geq 1, c.f. Definition 2.1. Assume that σCn1\|\sigma\|_{C^{n}}\geq 1. Then it holds for 1jd1\leq j\leq d that

(uθ)jCn(Ω)16Ld2n(e2n4W3RnσCn(Ω))nL.\|(u_{\theta})_{j}\|_{C^{n}(\Omega)}\leq 16^{L}d^{2n}(e^{2}n^{4}W^{3}R^{n}\|\sigma\|_{C^{n}(\Omega)})^{nL}. (75)
Lemma 8.5 (2023_IMA_Mishra_NS ).

Let d2,m3,σ>0,ai,bid\geq 2,m\geq 3,\sigma>0,a_{i},b_{i}\in\mathbb{Z} with ai<bia_{i}<b_{i} for 1id1\leq i\leq d, Ω=i=1d[ai,bi]\Omega=\prod_{i=1}^{d}[a_{i},b_{i}] and fHm(Ω)f\in H^{m}(\Omega). Then for every NN\in\mathbb{N} with N>5N>5 there exists a tanh neural network f^N\hat{f}^{N} with two hidden layers, one of width at most 3[m2]|Pm1,d+1|+i=1d(biai)(N1)3[\frac{m}{2}]|P_{m-1,d+1}|+\sum_{i=1}^{d}(b_{i}-a_{i})(N-1) and another of width at most 3[d+22]|Pd+1,d+1|Ndi=1d(biai)3[\frac{d+2}{2}]|P_{d+1,d+1}|N^{d}\prod_{i=1}^{d}(b_{i}-a_{i}), such that for k=0,1,2k=0,1,2 it holds that

ff^NHk(Ω)2k3dCk,m,d,f(1+σ)lnk(βk,σ,d,fNd+m+2)Nm+k,\|f-\hat{f}^{N}\|_{H^{k}(\Omega)}\leq 2^{k}3^{d}C_{k,m,d,f}(1+\sigma){\rm ln}^{k}\left(\beta_{k,\sigma,d,f}N^{d+m+2}\right)N^{-m+k}, (76)

and where

Ck,m,d,f=max0lk(d+l1l)1/2((ml)!)1/2([mld]!)d/2(3dπ)ml|f|Hm(Ω),\displaystyle C_{k,m,d,f}=\max_{0\leq l\leq k}\left(\begin{array}[]{c}d+l-1\\ l\\ \end{array}\right)^{1/2}\frac{((m-l)!)^{1/2}}{([\frac{m-l}{d}]!)^{d/2}}\left(\frac{3\sqrt{d}}{\pi}\right)^{m-l}|f|_{H^{m}(\Omega)},
βk,σ,d,f=52kdmax{i=1d(biai),d}max{fWk,(Ω),1}3dσmin{1,Ck,m,d,f}.\displaystyle\beta_{k,\sigma,d,f}=\frac{5\cdot 2^{kd}\max\{\prod_{i=1}^{d}(b_{i}-a_{i}),d\}\max\{\|f\|_{W^{k,\infty}(\Omega)},1\}}{3^{d}\sigma\min\{1,C_{k,m,d,f}\}}.

Moreover, the weights of f^N\hat{f}^{N} scale as O(Nγ)O(N^{\gamma}) with γ=max{m2/2,d(1+m/2+d/2)}\gamma=\max\{m^{2}/2,d(1+m/2+d/2)\}.

8.3 Proof of Main Theorems from Section 4: Sine-Gordon Equation

Theorem 4.2: Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Assume that g(u)g(u) is Lipschitz continuous, uCk(D×[0,T])u\in C^{k}(D\times[0,T]) and vCk1(D×[0,T])v\in C^{k-1}(D\times[0,T]). Then for every integer N>5N>5, there exist tanh\tanh neural networks uθu_{\theta} and vθv_{\theta}, each having two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

Rint1L2(Ω),Rtb1L2(D)lnNNk+1,\displaystyle\|R_{int1}\|_{L^{2}(\Omega)},\|R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}NN^{-k+1},
Rint2L2(Ω),Rint1L2(Ω),Rtb1L2(D)ln2NNk+2,\displaystyle\|R_{int2}\|_{L^{2}(\Omega)},\|\nabla R_{int1}\|_{L^{2}(\Omega)},\|\nabla R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}^{2}NN^{-k+2},
Rtb2L2(D),RsbL2(D×[0,t])lnNNk+2.\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb}\|_{L^{2}(\partial D\times[0,t])}\lesssim{\rm ln}NN^{-k+2}.
Proof.

Based on uCk(D×[0,T])u\in C^{k}(D\times[0,T]), vCk1(D×[0,T])v\in C^{k-1}(D\times[0,T]) and Lemma 8.5, there exist neural networks uθu_{\theta} and vθv_{\theta}, with the same two hidden layers and widths 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that for every 0l20\leq l\leq 2 and 0s20\leq s\leq 2,

uθuHl(Ω)Cl,k,d+1,uλl,u(N)Nk+l,\displaystyle\|u_{\theta}-u\|_{H^{l}(\Omega)}\leq C_{l,k,d+1,u}\lambda_{l,u}(N)N^{-k+l},
vθvHs(Ω)Cs,k1,d+1,vλs,v(N)Nk+1+s.\displaystyle\|v_{\theta}-v\|_{H^{s}(\Omega)}\leq C_{s,k-1,d+1,v}\lambda_{s,v}(N)N^{-k+1+s}.

It is now straightforward to bound the PINN residuals.

u^tL2(Ω)u^H1(Ω),v^tL2(Ω)v^H1(Ω),\displaystyle\|\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)},\qquad\|\hat{v}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
Δu^L2(Ω)u^H2(Ω)u^tL2(Ω)u^H2(Ω),\displaystyle\|\Delta\hat{u}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)}\qquad\|\nabla\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)},
v^L2(Ω)v^H1(Ω),\displaystyle\|\nabla\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
u^L2(D)u^L2(Ω)ChΩ,d+1,ρΩu^H1(Ω),\displaystyle\|\hat{u}\|_{L^{2}(D)}\leq\|\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)},
v^L2(D)v^L2(Ω)ChΩ,d+1,ρΩv^H1(Ω),\displaystyle\|\hat{v}\|_{L^{2}(D)}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)},
u^L2(D)u^L2(Ω)ChΩ,d+1,ρΩu^H2(Ω),\displaystyle\|\nabla\hat{u}\|_{L^{2}(D)}\leq\|\nabla\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)},
v^L2(D×[0,t])v^L2(Ω)ChΩ,d+1,ρΩv^H1(Ω).\displaystyle\|\hat{v}\|_{L^{2}(\partial D\times[0,t])}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)}.

Similar to Theorem 3.3, we can obtain

Rint1L2(Ω)=u^tv^L2(Ω)u^H1(Ω)+v^L2(Ω)lnNNk+1,\displaystyle\|R_{int1}\|_{L^{2}(\Omega)}=\|\hat{u}_{t}-\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)}+\|\hat{v}\|_{L^{2}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
Rint2L2(Ω)=ε2v^ta2Δu^+ε12u^+g(uθ)g(u)L2(Ω)\displaystyle\|R_{int2}\|_{L^{2}(\Omega)}=\|\varepsilon^{2}\hat{v}_{t}-a^{2}\Delta\hat{u}+\varepsilon_{1}^{2}\hat{u}+g(u_{\theta})-g(u)\|_{L^{2}(\Omega)}
ε2v^H1(Ω)+a2u^H2(Ω)+ε12u^L2(Ω)+Lu^L2(Ω)ln2NNk+2,\displaystyle\qquad\leq\varepsilon^{2}\|\hat{v}\|_{H^{1}(\Omega)}+a^{2}\|\hat{u}\|_{H^{2}(\Omega)}+\varepsilon_{1}^{2}\|\hat{u}\|_{L^{2}(\Omega)}+L\|\hat{u}\|_{L^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
Rint1L2(Ω)=(u^tv^)L2(Ω)u^H2(Ω)+v^H1(Ω)ln2NNk+2,\displaystyle\|\nabla R_{int1}\|_{L^{2}(\Omega)}=\|\nabla(\hat{u}_{t}-\hat{v})\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)}+\|\hat{v}\|_{H^{1}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
Rtb1L2(D)ChΩ,d+1,ρΩu^H1(Ω)lnNNk+1,\displaystyle\|R_{tb1}\|_{L^{2}(D)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
Rtb2L2(D),RsbL2(D×[0,t])ChΩ,d+1,ρΩv^H1(Ω)lnNNk+2,\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb}\|_{L^{2}(\partial D\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
Rtb1L2(D)ChΩ,d+1,ρΩu^H2(Ω)ln2NNk+2.\displaystyle\|\nabla R_{tb1}\|_{L^{2}(D)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}.

Theorem 4.3: Let dd\in\mathbb{N}, uC1(Ω)u\in C^{1}(\Omega) and vC0(Ω)v\in C^{0}(\Omega) be the classical solution to the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCGTexp((2+ε12+L+a2)T),\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right),

where CGC_{G} is defined in the proof.

Proof.

By taking the inner product of (45a) and (45b) with u^\hat{u} and v^\hat{v} over DD, respectively, we have

d2dtD|u^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x} =Du^v^d𝒙+DRint1u^d𝒙D|u^|2d𝒙+12D|Rint1|2d𝒙+12D|v^|2d𝒙,\displaystyle=\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int1}\hat{u}{\,\rm{d}}\bm{x}\leq\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}, (77)
ε2d2dtD|v^|2d𝒙\displaystyle\varepsilon^{2}\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x} =a2Du^v^d𝒙+a2DRsbu^𝒏ds(𝒙)ε12Du^v^d𝒙\displaystyle=-a^{2}\int_{D}\nabla\hat{u}\cdot\nabla\hat{v}{\,\rm{d}}\bm{x}+a^{2}\int_{\partial D}R_{sb}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})-\varepsilon_{1}^{2}\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}
D(g(uθ)g(u))v^d𝒙+DRint2v^d𝒙\displaystyle\qquad-\int_{D}(g(u_{\theta})-g(u))\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=a2Du^u^td𝒙+a2Du^Rint1d𝒙+a2DRsbu^𝒏ds(𝒙)ε12Du^v^d𝒙\displaystyle=-a^{2}\int_{D}\nabla\hat{u}\cdot\nabla\hat{u}_{t}{\,\rm{d}}\bm{x}+a^{2}\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+a^{2}\int_{\partial D}R_{sb}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})-\varepsilon_{1}^{2}\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}
D(g(uθ)g(u))v^d𝒙+DRint2v^d𝒙\displaystyle\qquad-\int_{D}(g(u_{\theta})-g(u))\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=a2d2dtD|u^|2d𝒙+a2Du^Rint1d𝒙+a2DRsbu^𝒏ds(𝒙)ε12Du^v^d𝒙\displaystyle=-a^{2}\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+a^{2}\int_{\partial D}R_{sb}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})-\varepsilon_{1}^{2}\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}
D(g(uθ)g(u))v^d𝒙+DRint2v^d𝒙\displaystyle\qquad-\int_{D}(g(u_{\theta})-g(u))\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
a2d2dtD|u^|2d𝒙+a22D|u^|2d𝒙+a22D|Rint1|2d𝒙+CD(D|Rsb|2ds(𝒙))12\displaystyle\leq-a^{2}\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{a^{2}}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{a^{2}}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}+C_{\partial D}\left(\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}
+12(ε12+L)D|u^|2d𝒙+12(ε12+L+1)D|v^|2d𝒙+12D|Rint2|2d𝒙,\displaystyle\qquad+\frac{1}{2}(\varepsilon_{1}^{2}+L)\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}(\varepsilon_{1}^{2}+L+1)\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}, (78)

where CD=a2|D|12(uC1(D×[0,t])+uθC1(D×[0,t]))C_{\partial D}=a^{2}|\partial D|^{\frac{1}{2}}(\|u\|_{C^{1}(\partial D\times[0,t])}+||u_{\theta}||_{C^{1}(\partial D\times[0,t])}) and v^=u^tRint1\hat{v}=\hat{u}_{t}-R_{int1} have been used.

Adding (77) and (78), we get

d2dtD|u^|2d𝒙+a2d2dtD|u^|2d𝒙+ε2d2dtD|v^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+a^{2}\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}
12(ε12+L+2)D|u^|2d𝒙+a22D|u^|2d𝒙+12(ε12+L+2)D|v^|2d𝒙+12D|Rint1|2d𝒙+12D|Rint2|2d𝒙\displaystyle\qquad\leq\frac{1}{2}(\varepsilon_{1}^{2}+L+2)\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{a^{2}}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}(\varepsilon_{1}^{2}+L+2)\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}
+a22D|Rint1|2d𝒙+CD(D|Rsb|2ds(𝒙))12.\displaystyle\qquad+\frac{a^{2}}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}+C_{\partial D}\left(\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}. (79)

Integrating (79) over [0,τ][0,\tau] for any τT\tau\leq T and applying the Cauchy–Schwarz inequality, we obtain

D|u^(𝒙,τ)|2d𝒙+a2D|u^(𝒙,τ)|2d𝒙+ε2D|v^(𝒙,τ)|2d𝒙\displaystyle\int_{D}|\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}|\nabla\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\int_{D}|\hat{v}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}
D|Rtb1|2d𝒙+a2D|Rtb1|2d𝒙+ε2D|Rtb2|2d𝒙+(2+ε12+L+a2)0τD(|u^|2+|u^|2+|v^|2)d𝒙dt\displaystyle\qquad\leq\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}+(2+\varepsilon_{1}^{2}+L+a^{2})\int_{0}^{\tau}\int_{D}\left(|\hat{u}|^{2}+|\nabla\hat{u}|^{2}+|\hat{v}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|Rint1|2+a2|Rint1|2+|Rint2|2)d𝒙dt+2CD|T|12(0TD|Rsb|2ds(𝒙)dt)12.\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|R_{int1}|^{2}+a^{2}|\nabla R_{int1}|^{2}+|R_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t+2C_{\partial D}|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.

Applying the integral form of the Grönwall inequality to the above inequality leads to,

D|u^(𝒙,τ)|2d𝒙+a2D|u^(𝒙,τ)|2d𝒙+ε2D|v^(𝒙,τ)|2d𝒙CGexp((2+ε12+L+a2)T),\int_{D}|\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}|\nabla\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\int_{D}|\hat{v}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}\leq C_{G}\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right), (80)

where

CG=D(|Rtb1|2+a2|Rtb1|2+ε2|Rtb2|2)d𝒙+0TD(|Rint1|2+|Rint2|2+a2|Rint1|2)d𝒙dt\displaystyle C_{G}=\int_{D}(|R_{tb1}|^{2}+a^{2}|\nabla R_{tb1}|^{2}+\varepsilon^{2}|R_{tb2}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+a^{2}|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2CD|T|12(0TD|Rsb|2ds(𝒙)dt)12.\displaystyle\qquad+2C_{\partial D}|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.

Then, we integrate (80) over [0,T][0,T] to end the proof. ∎

Theorem 4.4: Let dd\in\mathbb{N} and T>0T>0. Let uC4(Ω)u\in C^{4}(\Omega) and vC3(Ω)v\in C^{3}(\Omega) be the classical solution to the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with the parameter θΘ\theta\in\Theta. Then the following relation holds,

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCTTexp((2+ε12+L+a2)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right)
=𝒪(T(θ)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta)^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}),

where the constant CTC_{T} is given in the proof.

Proof.

We can combine Theorem 4.3 with the quadrature error formula (18) to obtain the error estimate,

D|Rtb1|2d𝒙\displaystyle\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(Rtb12)+𝒬MtbD(Rtb12)\displaystyle=\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})
C(Rtb12)Mtb2d+𝒬MtbD(Rtb12),\displaystyle\leq C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2}),
D|Rtb2|2d𝒙\displaystyle\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x} =D|Rtb2|2d𝒙𝒬MtbD(Rtb22)+𝒬MtbD(Rtb22)\displaystyle=\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})
C(Rtb22)Mtb2d+𝒬MtbD(Rtb22),\displaystyle\leq C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2}),
D|Rtb1|2d𝒙\displaystyle\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(|Rtb1|2)+𝒬MtbD(|Rtb1|2)\displaystyle=\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2),\displaystyle\leq C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(Rint12)+𝒬MintΩ(Rint12)\displaystyle=\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
C(Rint12)Mint2d+1+𝒬MintΩ(Rint12),\displaystyle\leq C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2}),
Ω|Rint2|2d𝒙dt\displaystyle\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint2|2d𝒙dt𝒬MintΩ(Rint22)+𝒬MintΩ(Rint22)\displaystyle=\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})
C(Rint22)Mint2d+1+𝒬MintΩ(Rint22),\displaystyle\leq C_{({R_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(|Rint1|2)+𝒬MintΩ(|Rint1|2)\displaystyle=\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})
C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2),\displaystyle\leq C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2}),
Ω|Rsb|2ds(𝒙)dt\displaystyle\int_{\Omega_{*}}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =Ω|Rsb|2ds(𝒙)dt𝒬MsbΩ(Rsb2)+𝒬MsbΩ(Rsb2)\displaystyle=\int_{\Omega_{*}}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})
C(Rsb2)Msb2d+𝒬MsbΩ(Rsb2).\displaystyle\leq C_{({R_{sb}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2}).

In light of (80) and the above inequalities, we have

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtTCTexp((2+ε12+L+a2)T),\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq TC_{T}\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right),

where

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+ε2(C(Rtb22)Mtb2d+𝒬MtbD(Rtb22))\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\varepsilon^{2}\left(C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})\right)
+a2(C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2))+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)\displaystyle+a^{2}\left(C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})\right)+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+a2(C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2)),\displaystyle+C_{({R_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+a^{2}\left(C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})\right),
+2CD|T|12(C(Rsb2)Msb2d+𝒬MsbΩ(Rsb2))12,\displaystyle+2C_{\partial D}|T|^{\frac{1}{2}}\left(C_{({R_{sb}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})\right)^{\frac{1}{2}},

and

C(Rtb12)u^C22,C(Rtb22)v^C22,C(|Rtb1|2)u^C32,C(Rint12)u^C32+v^C22,\displaystyle C_{({R_{tb1}^{2}})}\lesssim\|\hat{u}\|_{C^{2}}^{2},\quad C_{({R_{tb2}^{2}})}\lesssim\|\hat{v}\|_{C^{2}}^{2},\quad C_{(|\nabla R_{tb1}|^{2})}\lesssim\|\hat{u}\|_{C^{3}}^{2},\quad C_{({R_{int1}^{2}})}\lesssim\|\hat{u}\|_{C^{3}}^{2}+\|\hat{v}\|_{C^{2}}^{2},
C(Rint22),C(|Rint1|2)u^C42+v^C32,C(Rsb2)v^C32.\displaystyle\qquad\qquad C_{({R_{int2}^{2}})},C_{(|\nabla R_{int1}|^{2})}\lesssim\|\hat{u}\|_{C^{4}}^{2}+\|\hat{v}\|_{C^{3}}^{2},\quad C_{({R_{sb}^{2}})}\lesssim\|\hat{v}\|_{C^{3}}^{2}.

Here, the bounds on uθCn\|u_{\theta}\|_{C^{n}} and vθCn\|v_{\theta}\|_{C^{n}} (nn\in\mathbb{N}) that enter the above constants can be obtained from Lemma 8.4, together with Rq2Cn2nRqCn2\|R_{q}^{2}\|_{C^{n}}\leq 2^{n}\|R_{q}\|_{C^{n}}^{2} for Rq=Rtb1R_{q}=R_{tb1}, Rtb2R_{tb2}, Rtb1\nabla R_{tb1}, Rint1R_{int1}, Rint2R_{int2}, Rint1\nabla R_{int1} and RsbR_{sb}. ∎

8.4 Proof of Main Theorems from Section 5: Linear Elastodynamic Equation

Theorem 5.3: Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Let 𝝍1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝝍2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝒇Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r>d2+kr>\frac{d}{2}+k. For every integer N>5N>5, there exist tanh\tanh neural networks (𝒖j)θ(\bm{u}_{j})_{\theta} and (𝒗j)θ(\bm{v}_{j})_{\theta}, with j=1,2,,dj=1,2,\cdots,d, each with two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

𝑹int1L2(Ω),𝑹tb1L2(D)lnNNk+1,\displaystyle\|\bm{R}_{int1}\|_{L^{2}(\Omega)},\|\bm{R}_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}NN^{-k+1},
𝑹int2L2(Ω),𝜺¯(𝑹int1)L2(Ω),𝑹int1L2(Ω)ln2NNk+2,\displaystyle\|\bm{R}_{int2}\|_{L^{2}(\Omega)},\|\underline{\bm{\varepsilon}}(\bm{R}_{int1})\|_{L^{2}(\Omega)},\|\nabla\cdot\bm{R}_{int1}\|_{L^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝜺¯(𝑹tb1)L2(D),𝑹tb1L2(D),𝑹sb2L2(ΓN×[0,t])ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})\|_{L^{2}(D)},\|\nabla\cdot\bm{R}_{tb1}\|_{L^{2}(D)},\|\bm{R}_{sb2}\|_{L^{2}(\Gamma_{N}\times[0,t])}\lesssim{\rm ln}^{2}NN^{-k+2},
𝑹tb2L2(D),𝑹sb1L2(ΓD×[0,t])lnNNk+2.\displaystyle\|\bm{R}_{tb2}\|_{L^{2}(D)},\|\bm{R}_{sb1}\|_{L^{2}(\Gamma_{D}\times[0,t])}\lesssim{\rm ln}NN^{-k+2}.
Proof.

Lemma 5.2 implies that,

𝒖Ck(D×[0,T]),𝒗Ck1(D×[0,T]).\bm{u}\in C^{k}(D\times[0,T]),\qquad\bm{v}\in C^{k-1}(D\times[0,T]).

Let 𝒖θ=((u1)θ,(u2)θ,,(ud)θ)\bm{u}_{\theta}=((u_{1})_{\theta},(u_{2})_{\theta},\cdots,(u_{d})_{\theta}) and 𝒗θ=((v1)θ,(v2)θ,,(vd)θ)\bm{v}_{\theta}=((v_{1})_{\theta},(v_{2})_{\theta},\cdots,(v_{d})_{\theta}). Based on Lemma 8.5, there exists tanh\tanh neural networks (ui)θ(u_{i})_{\theta} and (vi)θ(v_{i})_{\theta}, with i=1,2,,di=1,2,\cdots,d, each having two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that for every 0l20\leq l\leq 2 and 0s20\leq s\leq 2,

ui(ui)θHl(Ω)Cl,k,d+1,uiλl,ui(N)Nk+l,\displaystyle\|u_{i}-(u_{i})_{\theta}\|_{H^{l}(\Omega)}\leq C_{l,k,d+1,u_{i}}\lambda_{l,u_{i}}(N)N^{-k+l}, (81)
vi(vi)θHs(Ω)Cs,k1,d+1,viλs,vi(N)Nk+1+s.\displaystyle\|v_{i}-(v_{i})_{\theta}\|_{H^{s}(\Omega)}\leq C_{s,k-1,d+1,v_{i}}\lambda_{s,v_{i}}(N)N^{-k+1+s}. (82)

Let i\partial_{i} represent the derivative with respect to the ii-th dimension. For 1i,jd1\leq i,\ j\leq d, we have

(u^t)iL2(Ω)u^iH1(Ω),(v^t)iL2(Ω)v^iH1(Ω),\displaystyle\|(\hat{u}_{t})_{i}\|_{L^{2}(\Omega)}\leq\|\hat{u}_{i}\|_{H^{1}(\Omega)},\qquad\|(\hat{v}_{t})_{i}\|_{L^{2}(\Omega)}\leq\|\hat{v}_{i}\|_{H^{1}(\Omega)},
iju^iL2(Ω),iiu^iL2(Ω),jju^iL2(Ω)u^iH2(Ω),\displaystyle\|\partial_{i}\partial_{j}\hat{u}_{i}\|_{L^{2}(\Omega)},\|\partial_{i}\partial_{i}\hat{u}_{i}\|_{L^{2}(\Omega)},\|\partial_{j}\partial_{j}\hat{u}_{i}\|_{L^{2}(\Omega)}\leq\|\hat{u}_{i}\|_{H^{2}(\Omega)},
j(u^t)iL2(Ω)(u^t)iH1(Ω)u^iH2(Ω),jv^iL2(Ω)v^iH1(Ω),\displaystyle\|\partial_{j}(\hat{u}_{t})_{i}\|_{L^{2}(\Omega)}\leq\|(\hat{u}_{t})_{i}\|_{H^{1}(\Omega)}\leq\|\hat{u}_{i}\|_{H^{2}(\Omega)},\qquad\|\partial_{j}\hat{v}_{i}\|_{L^{2}(\Omega)}\leq\|\hat{v}_{i}\|_{H^{1}(\Omega)},
u^iL2(D)u^iL2(Ω)ChΩ,d+1,ρΩu^iH1(Ω),\displaystyle\|\hat{u}_{i}\|_{L^{2}(D)}\leq\|\hat{u}_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}_{i}\|_{H^{1}(\Omega)},
v^iL2(D)v^iL2(Ω)ChΩ,d+1,ρΩv^iH1(Ω),\displaystyle\|\hat{v}_{i}\|_{L^{2}(D)}\leq\|\hat{v}_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}_{i}\|_{H^{1}(\Omega)},
j(u^)iL2(D)j(u^)iL2(Ω)ChΩ,d+1,ρΩu^iH2(Ω),\displaystyle\|\partial_{j}(\hat{u})_{i}\|_{L^{2}(D)}\leq\|\partial_{j}(\hat{u})_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}_{i}\|_{H^{2}(\Omega)},
v^iL2(ΓD×[0,t])v^iL2(Ω)ChΩ,d+1,ρΩv^iH1(Ω),\displaystyle\|\hat{v}_{i}\|_{L^{2}(\Gamma_{D}\times[0,t])}\leq\|\hat{v}_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}_{i}\|_{H^{1}(\Omega)},
iu^iniL2(ΓN×[0,t]),ju^iniL2(ΓN×[0,t]),ju^injL2(ΓN×[0,t])ChΩ,d+1,ρΩu^iH2(Ω).\displaystyle\|\partial_{i}\hat{u}_{i}n_{i}\|_{L^{2}(\Gamma_{N}\times[0,t])},\|\partial_{j}\hat{u}_{i}n_{i}\|_{L^{2}(\Gamma_{N}\times[0,t])},\|\partial_{j}\hat{u}_{i}n_{j}\|_{L^{2}(\Gamma_{N}\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}_{i}\|_{H^{2}(\Omega)}.

Using (81) and (82) and the above relations, we can now bound the PINN residuals,

𝑹int1L2(Ω)𝒖^t𝒗^L2(Ω)𝒖^H1(Ω)+𝒗^L2(Ω)lnNNk+1,\displaystyle\|\bm{R}_{int1}\|_{L^{2}(\Omega)}\leq\|\hat{\bm{u}}_{t}-\hat{\bm{v}}\|_{L^{2}(\Omega)}\leq\|\hat{\bm{u}}\|_{H^{1}(\Omega)}+\|\hat{\bm{v}}\|_{L^{2}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
𝑹int2L2(Ω)ρ𝒗^t2μ(𝜺¯(𝒖^))λ(𝒖^)L2(Ω)\displaystyle\|\bm{R}_{int2}\|_{L^{2}(\Omega)}\leq\|\rho\hat{\bm{v}}_{t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\hat{\bm{u}}))-\lambda\nabla(\nabla\cdot\hat{\bm{u}})\|_{L^{2}(\Omega)}
𝒗^H1(Ω)+𝒖^H2(Ω)ln2NNk+2,\displaystyle\qquad\lesssim\|\hat{\bm{v}}\|_{H^{1}(\Omega)}+\|\hat{\bm{u}}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝜺¯(𝑹int1)L2(Ω),𝑹int1L2(Ω)𝒖^H2(Ω)+𝒗^H1(Ω)ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{int1})\|_{L^{2}(\Omega)},\|\nabla\cdot\bm{R}_{int1}\|_{L^{2}(\Omega)}\lesssim\|\hat{\bm{u}}\|_{H^{2}(\Omega)}+\|\hat{\bm{v}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝑹tb1L2(D)𝒖^L2(Ω)𝒖^H1(Ω)lnNNk+1,\displaystyle\|\bm{R}_{tb1}\|_{L^{2}(D)}\leq\|\hat{\bm{u}}\|_{L^{2}(\partial\Omega)}\lesssim\|\hat{\bm{u}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
𝑹tb2L2(D)𝒗^L2(Ω)𝒗^H1(Ω)lnNNk+2,\displaystyle\|\bm{R}_{tb2}\|_{L^{2}(D)}\leq\|\hat{\bm{v}}\|_{L^{2}(\partial\Omega)}\lesssim\|\hat{\bm{v}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
𝜺¯(𝑹tb1)L2(D),𝑹tb1L2(D)𝒖^H2(D)ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})\|_{L^{2}(D)},\|\nabla\cdot\bm{R}_{tb1}\|_{L^{2}(D)}\lesssim\|\hat{\bm{u}}\|_{H^{2}(D)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝑹sb1L2(ΓD×[0,t])𝒗^L2(Ω)𝒗^H1(Ω)lnNNk+2,\displaystyle\|\bm{R}_{sb1}\|_{L^{2}(\Gamma_{D}\times[0,t])}\leq\|\hat{\bm{v}}\|_{L^{2}(\partial\Omega)}\lesssim\|\hat{\bm{v}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
𝑹sb2L2(ΓN×[0,t])2μ𝜺¯(𝒖^)𝒏+λ(𝒖^)𝒏Ω𝒖^H2(Ω)ln2NNk+2.\displaystyle\|\bm{R}_{sb2}\|_{L^{2}(\Gamma_{N}\times[0,t])}\leq\|2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n}\|_{\partial\Omega}\lesssim\|\hat{\bm{u}}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}.

Theorem 5.4: Let dd\in\mathbb{N}, 𝒖C1(Ω)\bm{u}\in C^{1}(\Omega) and 𝒗C(Ω)\bm{v}\in C(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝒖θ,𝒗θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCGTexp((2+2μ+λ)T),\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+2\mu+\lambda)T\right),

where CGC_{G} is given in the proof.

Proof.

By taking the inner product of (56a) and (56b) with 𝒖^\hat{\bm{u}} and 𝒗^\hat{\bm{v}} and integrating over DD, respectively, we have

\displaystyle\frac{d}{2dt}\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}=\int_{D}\hat{\bm{u}}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}+\int_{D}\bm{R}_{int1}\cdot\hat{\bm{u}}{\,\rm{d}}\bm{x}\leq\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}, (83)
\displaystyle\rho\frac{d}{2dt}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}=-2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\hat{\bm{v}}{\,\rm{d}}\bm{x}-\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\hat{\bm{v}}){\,\rm{d}}\bm{x}+\int_{\partial D}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})
\displaystyle\qquad+\int_{D}\bm{R}_{int2}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}
\displaystyle=-2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\hat{\bm{u}}_{t}{\,\rm{d}}\bm{x}+2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}{\,\rm{d}}\bm{x}-\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\hat{\bm{u}}_{t}){\,\rm{d}}\bm{x}+\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\bm{R}_{int1}){\,\rm{d}}\bm{x}
\displaystyle\qquad+\int_{\Gamma_{D}}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\bm{R}_{sb1}{\,\rm{d}}s(\bm{x})+\int_{\Gamma_{N}}\bm{R}_{sb2}\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})+\int_{D}\bm{R}_{int2}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}
\displaystyle=-\frac{d}{dt}\int_{D}\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}-\frac{d}{dt}\int_{D}\frac{\lambda}{2}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}{\,\rm{d}}\bm{x}+\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\bm{R}_{int1}){\,\rm{d}}\bm{x}
\displaystyle\qquad+\int_{\Gamma_{D}}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\bm{R}_{sb1}{\,\rm{d}}s(\bm{x})+\int_{\Gamma_{N}}\bm{R}_{sb2}\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})+\int_{D}\bm{R}_{int2}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}
\displaystyle\leq-\frac{d}{dt}\int_{D}\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}-\frac{d}{dt}\int_{D}\frac{\lambda}{2}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}
\displaystyle\qquad+\frac{\lambda}{2}\int_{D}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{\lambda}{2}\int_{D}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\bm{R}_{int2}|^{2}{\,\rm{d}}\bm{x}
\displaystyle\qquad+C_{\Gamma_{D}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}+C_{\Gamma_{N}}\left(\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}. (84)
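In the last inequality the cross terms are bounded with the Cauchy–Schwarz and Young inequalities; for instance, since $\underline{\bm{\varepsilon}}(\hat{\bm{u}})$ is symmetric we have $\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}=\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\underline{\bm{\varepsilon}}(\bm{R}_{int1})$, and therefore

2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}{\,\rm{d}}\bm{x}\leq\mu\int_{D}|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x};

the terms involving $\lambda$, $\bm{R}_{int2}$ and the boundary residuals are treated in the same way.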

Here we have used 𝒗^=𝒖^t𝑹int1\hat{\bm{v}}=\hat{\bm{u}}_{t}-\bm{R}_{int1}, and the constants are given by CΓD=(2μ+λ)|ΓD|12𝒖C1(ΓD×[0,T])+(2μ+λ)|ΓD|12𝒖θC1(ΓD×[0,T])C_{\Gamma_{D}}=(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}\|\bm{u}\|_{C^{1}(\Gamma_{D}\times[0,T])}+(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}||\bm{u}_{\theta}||_{C^{1}(\Gamma_{D}\times[0,T])} and CΓN=|ΓN|12(𝒗C(ΓN×[0,T])+𝒗θC(ΓN×[0,T]))C_{\Gamma_{N}}=|\Gamma_{N}|^{\frac{1}{2}}(\|\bm{v}\|_{C(\Gamma_{N}\times[0,T])}+||\bm{v}_{\theta}||_{C(\Gamma_{N}\times[0,T])}).
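These boundary constants come from the Cauchy–Schwarz inequality on $\Gamma_{D}$ and $\Gamma_{N}$; schematically, and up to the precise conventions for the $C^{1}$ and $C^{0}$ norms,

\int_{\Gamma_{D}}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\bm{R}_{sb1}{\,\rm{d}}s(\bm{x})\leq(2\mu+\lambda)\big(\|\bm{u}\|_{C^{1}(\Gamma_{D}\times[0,T])}+\|\bm{u}_{\theta}\|_{C^{1}(\Gamma_{D}\times[0,T])}\big)|\Gamma_{D}|^{\frac{1}{2}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}=C_{\Gamma_{D}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}},

where the triangle inequality $\|\hat{\bm{u}}\|_{C^{1}}\leq\|\bm{u}\|_{C^{1}}+\|\bm{u}_{\theta}\|_{C^{1}}$ has been used; the bound with $C_{\Gamma_{N}}$ is obtained analogously from $\int_{\Gamma_{N}}\bm{R}_{sb2}\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})$ and the $C^{0}$ norms of $\bm{v}$ and $\bm{v}_{\theta}$.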

Adding (83) and (84), we get,

\displaystyle\frac{d}{2dt}\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\frac{d}{dt}\int_{D}\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\frac{d}{2dt}\int_{D}\lambda|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\rho\frac{d}{2dt}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}
\displaystyle\qquad\leq\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\frac{\lambda}{2}\int_{D}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}(|\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}){\,\rm{d}}\bm{x}
\displaystyle\qquad\qquad+\mu\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}+\frac{\lambda}{2}\int_{D}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}+C_{\Gamma_{D}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}+C_{\Gamma_{N}}\left(\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}. (85)

Integrating (85) over [0,τ][0,\tau] for any τT\tau\leq T and applying the Cauchy–Schwarz inequality, we obtain,

D|𝒖^(𝒙,τ)|2d𝒙+D2μ|𝜺¯(𝒖^(𝒙,τ))|2d𝒙+Dλ|𝒖^(𝒙,τ)|2d𝒙+ρD|𝒗^(𝒙,τ)|2d𝒙\displaystyle\int_{D}|\hat{\bm{u}}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},\tau))|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\hat{\bm{v}}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}
D|𝑹tb1|2d𝒙+D2μ|𝜺¯(𝑹tb1)|2d𝒙+Dλ|𝑹tb1|2d𝒙+ρD|𝑹tb2|2d𝒙\displaystyle\qquad\leq\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}
+(2+2μ+λ)0τD(|𝒖^|2+|𝜺¯(𝒖^)|2+|𝒖^|2+|𝒗^|2)d𝒙dt\displaystyle\qquad\qquad+(2+2\mu+\lambda)\int_{0}^{\tau}\int_{D}\left(|\hat{\bm{u}}|^{2}+|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}+|\nabla\cdot\hat{\bm{u}}|^{2}+|\hat{\bm{v}}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|𝑹int1|2+2μ|𝜺¯(𝑹int1)|2+λ|𝑹int1|2+|𝑹int2|2)d𝒙dt\displaystyle\qquad\qquad+\int_{0}^{T}\int_{D}\left(|\bm{R}_{int1}|^{2}+2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}+\lambda|\nabla\cdot\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2|T|12CΓD(0TΓD|𝑹sb1|2ds(𝒙)dt)12+2|T|12CΓN(0TΓN|𝑹sb2|2ds(𝒙)dt)12.\displaystyle\qquad\qquad+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(\int_{0}^{T}\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.
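Here the boundary contributions are handled with the Cauchy–Schwarz inequality in time, e.g.

\int_{0}^{\tau}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}{\,\rm{d}}t\leq|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}},

and the factors of $2$ arise from multiplying (85) by $2$ before integrating in time.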

By applying the integral form of the Grönwall inequality to the above inequality, we have

\int_{D}\big(|\hat{\bm{u}}(\bm{x},\tau)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},\tau))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},\tau)|^{2}+\rho|\hat{\bm{v}}(\bm{x},\tau)|^{2}\big){\,\rm{d}}\bm{x}\leq C_{G}\exp\left((2+2\mu+\lambda)T\right), (86)

where

CG=D|𝑹tb1|2d𝒙+D2μ|𝜺¯(𝑹tb1)|2d𝒙+Dλ|𝑹tb1|2d𝒙+ρD|𝑹tb2|2d𝒙\displaystyle C_{G}=\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}
+0TD(|𝑹int1|2+2μ|𝜺¯(𝑹int1)|2+λ|𝑹int1|2+|𝑹int2|2)d𝒙dt\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|\bm{R}_{int1}|^{2}+2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}+\lambda|\nabla\cdot\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2|T|12CΓD(0TΓD|𝑹sb1|2ds(𝒙)dt)12+2|T|12CΓN(0TΓN|𝑹sb2|2ds(𝒙)dt)12.\displaystyle\qquad+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(\int_{0}^{T}\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.
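The integral form of the Grönwall inequality used above can be stated as follows: if $y(\tau)\leq B+a\int_{0}^{\tau}y(t){\,\rm{d}}t$ for all $\tau\in[0,T]$ with constants $a,B\geq 0$, then $y(\tau)\leq B\exp(a\tau)$ for all $\tau\in[0,T]$; this produces (86) with $B=C_{G}$ and $a=2+2\mu+\lambda$.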

Then, we finish the proof by integrating (86) over [0,T][0,T]. ∎

Theorem 5.5: Let dd\in\mathbb{N}, 𝒖C4(Ω)\bm{u}\in C^{4}(\Omega) and 𝒗C3(Ω)\bm{v}\in C^{3}(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝒖θ,𝒗θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCTTexp((2+2μ+λ)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+2\mu+\lambda)T\right)
=𝒪(T(θ)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta)^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}),

where CTC_{T} is defined in the following proof.

Proof.

By the definitions of the different components of the training error (55) and applying the estimate (18) on the quadrature error, we have

D|𝑹tb1|2d𝒙\displaystyle\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|𝑹tb1|2d𝒙𝒬MtbD(𝑹tb12)+𝒬MtbD(𝑹tb12)\displaystyle=\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})
C(𝑹tb12)Mtb2d+𝒬MtbD(𝑹tb12),\displaystyle\leq C_{({\bm{R}_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2}),
D|𝑹tb2|2d𝒙\displaystyle\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x} =D|𝑹tb2|2d𝒙𝒬MtbD(𝑹tb22)+𝒬MtbD(𝑹tb22)\displaystyle=\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})
C(𝑹tb22)Mtb2d+𝒬MtbD(𝑹tb22),\displaystyle\leq C_{({\bm{R}_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2}),
D|𝜺¯(𝑹tb1)|2d𝒙\displaystyle\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x} =D|𝜺¯(𝑹tb1)|2d𝒙𝒬MtbD(|𝜺¯(𝑹tb1)|2)+𝒬MtbD(|𝜺¯(𝑹tb1)|2)\displaystyle=\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})
C(|𝜺¯(𝑹tb1)|2)Mtb2d+𝒬MtbD(|𝜺¯(𝑹tb1)|2),\displaystyle\leq C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}),
D|𝑹tb1|2d𝒙\displaystyle\int_{D}|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|𝑹tb1|2d𝒙𝒬MtbD(|𝑹tb1|2)+𝒬MtbD(|𝑹tb1|2)\displaystyle=\int_{D}|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})
C(|𝑹tb1|2)Mtb2d+𝒬MtbD(|𝑹tb1|2),\displaystyle\leq C_{(|\nabla\cdot\bm{R}_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2}),
Ω|𝑹int1|2d𝒙dt\displaystyle\int_{\Omega}|\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝑹int1|2d𝒙dt𝒬MintΩ(𝑹int12)+𝒬MintΩ(𝑹int12)\displaystyle=\int_{\Omega}|\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})
C(𝑹int12)Mint2d+1+𝒬MintΩ(𝑹int12),\displaystyle\leq C_{({\bm{R}_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2}),
Ω|𝑹int2|2d𝒙dt\displaystyle\int_{\Omega}|\bm{R}_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝑹int2|2d𝒙dt𝒬MintΩ(𝑹int22)+𝒬MintΩ(𝑹int22)\displaystyle=\int_{\Omega}|\bm{R}_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})
C(𝑹int22)Mint2d+1+𝒬MintΩ(𝑹int22),\displaystyle\leq C_{({\bm{R}_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2}),
Ω|𝜺¯(𝑹int1)|2d𝒙dt\displaystyle\int_{\Omega}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝜺¯(𝑹int1)|2d𝒙dt𝒬MintΩ(|𝜺¯(𝑹int1)|2)+𝒬MintΩ(|𝜺¯(𝑹int1)|2)\displaystyle=\int_{\Omega}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})
C(|𝜺¯(𝑹int1)|2)Mint2d+1+𝒬MintΩ(|𝜺¯(𝑹int1)|2),\displaystyle\leq C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}),
Ω|𝑹int1|2d𝒙dt\displaystyle\int_{\Omega}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝑹int1|2d𝒙dt𝒬MintΩ(|𝑹int1|2)+𝒬MintΩ(|𝑹int1|2)\displaystyle=\int_{\Omega}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})
C(|𝑹int1|2)Mint2d+1+𝒬MintΩ(|𝑹int1|2),\displaystyle\leq C_{(|\nabla\cdot\bm{R}_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2}),
ΩD|𝑹sb1|2ds(𝒙)dt\displaystyle\int_{\Omega_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =ΩD|𝑹sb1|2ds(𝒙)dt𝒬Msb1ΩD(𝑹sb12)+𝒬Msb1ΩD(𝑹sb12)\displaystyle=\int_{\Omega_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})
C(𝑹sb12)Msb12d+𝒬Msb1ΩD(𝑹sb12),\displaystyle\leq C_{({\bm{R}_{sb1}^{2}})}M_{sb1}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2}),
ΩN|𝑹sb2|2ds(𝒙)dt\displaystyle\int_{\Omega_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =ΩN|𝑹sb2|2ds(𝒙)dt𝒬Msb2ΩN(𝑹sb22)+𝒬Msb2ΩN(𝑹sb22)\displaystyle=\int_{\Omega_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})
C(𝑹sb22)Msb22d+𝒬Msb2ΩN(𝑹sb22).\displaystyle\leq C_{({\bm{R}_{sb2}^{2}})}M_{sb2}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2}).

In light of the above inequalities and (86), we obtain

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtTCTexp((2+2μ+λ)T),\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq TC_{T}\exp\left((2+2\mu+\lambda)T\right),

where

CT=\displaystyle C_{T}= C(𝑹tb12)Mtb2d+𝒬MtbD(𝑹tb12)+ρ(C(𝑹tb22)Mtb2d+𝒬MtbD(𝑹tb22))+2μ(C(|𝜺¯(𝑹tb1)|2)Mtb2d+𝒬MtbD(|𝜺¯(𝑹tb1)|2))\displaystyle C_{({\bm{R}_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})+\rho\left(C_{({\bm{R}_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})\right)+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})\right)
+λ(C(|𝑹tb1|2)Mtb2d+𝒬MtbD(|𝑹tb1|2))+C(𝑹int12)Mint2d+1+𝒬MintΩ(𝑹int12)\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})\right)+C_{({\bm{R}_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})
+C(𝑹int22)Mint2d+1+𝒬MintΩ(𝑹int22)+2μ(C(|𝜺¯(𝑹int1)|2)Mint2d+1+𝒬MintΩ(|𝜺¯(𝑹int1)|2))\displaystyle+C_{({\bm{R}_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})\right)
+λ(C(|𝑹int1|2)Mint2d+1+𝒬MintΩ(|𝑹int1|2))+2|T|12CΓD(C(𝑹sb12)Msb12d+𝒬Msb1ΩD(𝑹sb12))12\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})\right)+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(C_{({\bm{R}_{sb1}^{2}})}M_{sb1}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})\right)^{\frac{1}{2}}
+2|T|12CΓN(C(𝑹sb22)Msb22d+𝒬Msb2ΩN(𝑹sb22))12.\displaystyle+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(C_{({\bm{R}_{sb2}^{2}})}M_{sb2}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})\right)^{\frac{1}{2}}.
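The quadrature sums $\mathcal{Q}_{M_{tb}}^{D}(\cdot)$, $\mathcal{Q}_{M_{int}}^{\Omega}(\cdot)$, $\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\cdot)$ and $\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\cdot)$ appearing in $C_{T}$ correspond to the components of the training error in (55), while the remaining terms decay with the numbers of quadrature points; this yields the order relation stated in the theorem, the exponent $-\frac{1}{d}$ for $M_{sb}$ arising because the boundary quadrature terms enter $C_{T}$ through square roots.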

The boundedness of the constants $C_{(\bm{R}_{q}^{2})}$ follows from Lemma 8.4 together with the bound $\|\bm{R}_{q}^{2}\|_{C^{n}}\leq 2^{n}\|\bm{R}_{q}\|_{C^{n}}^{2}$, where $\bm{R}_{q}$ stands for $\bm{R}_{tb1}$, $\bm{R}_{tb2}$, $\underline{\bm{\varepsilon}}(\bm{R}_{tb1})$, $\nabla\cdot\bm{R}_{tb1}$, $\bm{R}_{int1}$, $\bm{R}_{int2}$, $\underline{\bm{\varepsilon}}(\bm{R}_{int1})$, $\nabla\cdot\bm{R}_{int1}$, $\bm{R}_{sb1}$ and $\bm{R}_{sb2}$. ∎
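The product bound $\|\bm{R}_{q}^{2}\|_{C^{n}}\leq 2^{n}\|\bm{R}_{q}\|_{C^{n}}^{2}$ invoked above is an instance of the Leibniz-rule estimate $\|fg\|_{C^{n}}\leq 2^{n}\|f\|_{C^{n}}\|g\|_{C^{n}}$: assuming, as we do here, that $\|f\|_{C^{n}}$ denotes the maximum of $\|D^{\alpha}f\|_{L^{\infty}}$ over $|\alpha|\leq n$, one has $\|D^{\alpha}(fg)\|_{L^{\infty}}\leq\sum_{\beta\leq\alpha}\binom{\alpha}{\beta}\|D^{\beta}f\|_{L^{\infty}}\|D^{\alpha-\beta}g\|_{L^{\infty}}\leq 2^{|\alpha|}\|f\|_{C^{n}}\|g\|_{C^{n}}$, and the bound follows by taking $f=g$ equal to (the components of) $\bm{R}_{q}$.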
