
Error Analysis of Physics-Informed Neural Networks for Approximating Dynamic PDEs of Second Order in Time

Yanxia Qiana, Yongchao Zhangb, Yunqing Huanga∗, Suchuan Dongc
aSchool of Mathematics and Computational Science, Xiangtan University,
Xiangtan, Hunan, 411105, P.R. China
bSchool of Mathematics, Northwest University,
Xi’an, Shaanxi 710069, P.R. China
cCenter for Computational and Applied Mathematics,
Department of Mathematics, Purdue University, USA
Corresponding authors. Emails: yxqian0520@xtu.edu.cn (Y. Qian), yoczhang@nwu.edu.cn (Y. Zhang), huangyq@xtu.edu.cn (Y. Huang), sdong@purdue.edu (S. Dong)
(March 21, 2023)
Abstract

We consider the approximation of a class of dynamic partial differential equations (PDE) of second order in time by the physics-informed neural network (PINN) approach, and provide an error analysis of PINN for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the $\tanh$ activation function, the PINN approximation errors for the solution field, its time derivative and its gradient field can be effectively bounded by the training loss and the number of training data points (quadrature points). Our analyses further suggest new forms for the training loss function, which contain certain residuals that are crucial to the error estimate but would be absent from the canonical PINN loss formulation. Adopting these new forms for the loss function leads to a variant PINN algorithm. We present ample numerical experiments with the new PINN algorithm for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation, which show that the method can capture the solution well.

Keywords: physics informed neural network; neural network; error estimate; PDE; scientific machine learning

1 Introduction

Deep neural networks (DNN) have achieved great success in a number of fields in science and engineering LeCun2015DP , such as natural language processing, robotics, computer vision, and speech and image recognition, to name but a few. This has inspired a great deal of research effort in the past few years to adapt such techniques to scientific computing. DNN-based techniques seem particularly promising for problems in higher dimensions, e.g. high-dimensional partial differential equations (PDE), since traditional numerical methods for high-dimensional problems quickly become infeasible due to the exponential increase in computational effort (the so-called curse of dimensionality). Under these circumstances deep-learning algorithms can be helpful. In particular, the neural-network approach for PDE problems provides implicit regularization and can alleviate, and perhaps overcome, the curse of dimensionality Beck2019Machine ; Berner2020Analysis . This approach also provides a natural framework for estimating unknown parameters Fang2020NN ; Raissi2019pinn ; Raissi2018Hidden ; Thuerey2020Deep ; Wang2017pinn .

As deep neural networks are universal function approximators, it is natural to employ them as ansatz spaces for solutions of (ordinary or partial) differential equations. This paves the way for their use in physical modeling and scientific computing and gives rise to the field of scientific machine learning Karniadakisetal2021 ; SirignanoS2018 ; Raissi2019pinn ; EY2018 ; Lu2021DeepXDE . The physics-informed neural network (PINN) approach was introduced in Raissi2019pinn . It has been successfully applied to a variety of forward and inverse PDE problems and has become one of the most commonly-used methods in scientific machine learning (see e.g. Raissi2019pinn ; HeX2019 ; CyrGPPT2020 ; JagtapKK2020 ; WangL2020 ; JagtapK2020 ; CaiCLL2020 ; Tartakovskyetal2020 ; DongN2021 ; TangWL2021 ; DongL2021 ; CalabroFS2021 ; WanW2022 ; FabianiCRS2021 ; KrishnapriyanGZKM2021 ; DongY2022 ; DongY2022rm ; WangYP2022 ; Pateletal2022 ; DongW2022 ; Siegeletal2022 ; HuLWX2022 ; Penwardenetal2023 , among others). The references Karniadakisetal2021 ; Cuomo2022Scientific provide a comprehensive review of the literature on PINN and about the benefits and drawbacks of this approach.

The mathematical foundation of PINN for approximating PDE solutions is currently an active area of research. It is important to account for the different components of the neural-network error: the optimization error, the approximation error, and the estimation error Niyogi1999Generalization ; Shin2020On . The approximation error refers to the discrepancy between the exact functional map and the neural-network mapping function on a given network architecture Calin2020Deep ; Elbrachter2021deep . The estimation error arises when the network is trained on a finite data set to obtain a mapping on the target domain. The generalization error is the combination of the approximation and estimation errors and quantifies the accuracy of the neural-network predicted solution trained on the given set of data.

Theoretical understanding of PINN has been advanced by a number of recent works. In Shin2020On Shin et al. rigorously justify why PINN works and show its consistency for linear elliptic and parabolic PDEs under certain assumptions. These results are extended in Shin2010.08019 to a general abstract framework for analyzing PINN for linear problems, with the loss function formulated in terms of the strong or weak form of the equations. In Mishra2022Estimates Mishra and Molinaro provide an abstract framework on PINN for forward PDE problems, and estimate the generalization error by means of the training error and the number of training data points. This framework is extended in Mishra2022inverse to study several inverse PDE problems, including the Poisson, heat, wave and Stokes equations. Bai and Koley Bai2021PINN investigate the PINN approximation of nonlinear dispersive PDEs such as the KdV-Kawahara, Camassa-Holm and Benjamin-Ono equations. In Biswas2022Error Biswas et al. provide explicit error estimates (in suitable norms) and stability analyses for the incompressible Navier–Stokes equations. Zerbinati Zerbinati2022pinns presents PINN as an under-determined point matching collocation method, reveals its connection with the Galerkin Least Squares (GALS) method, and establishes an a priori error estimate for elliptic problems.

An important theoretical result on the approximation errors from the recent work DeRyck2021On establishes that a feed-forward neural network $\hat{u}_{\theta}$ with the $\tanh$ activation function and two hidden layers may approximate a function $u$ with a bound in a Sobolev space,

\|\hat{u}_{\theta N}-u\|_{W^{k,\infty}}\leq C\ln(cN)^{k}/N^{s-k}.

Here $u\in W^{s,\infty}([0,1]^{d})$, $d$ is the dimension of the problem, $N$ is the number of training points, and $c,C>0$ are explicitly known constants independent of $N$. Based on this result, De Ryck et al. 2023_IMA_Mishra_NS have studied PINN for the Navier–Stokes equations and shown that a small training error implies a small generalization error. Hu et al. Ruimeng2209.11929 provide higher-order (spatial Sobolev norm) error estimates for the primitive equations, which improve the existing results in the PINN literature that only involve $L^{2}$ errors. In DeRyck2022Estimates it has been shown that, with a sufficient number of randomly chosen training points, the total $L^{2}$ error can be bounded by the generalization error for Kolmogorov-type PDEs, which in turn is bounded by the training error. It is also proved there that the size of the PINN and the number of training samples only increase polynomially with the problem dimension, thus enabling PINN to overcome the curse of dimensionality in this case. In Mishra2021pinn the authors investigate the high-dimensional radiative transfer equation and prove that the generalization error is bounded by the training error and the number of training points, where the upper bound depends on the dimension only through a logarithmic factor. Hence PINN does not suffer from the curse of dimensionality, provided that the training errors do not depend on the underlying dimension.

Although PINN has been widely used for approximating PDEs, theoretical investigations of its convergence and errors are still quite limited and are largely confined to elliptic and parabolic PDEs. There appears to be little theoretical analysis of the convergence of PINN for hyperbolic-type PDEs. In this paper, we consider a class of dynamic PDEs of second order in time, which are hyperbolic in nature, and provide an analysis of the convergence and errors of the PINN algorithm applied to such problems. We focus on the wave equation, the Sine-Gordon equation and the linear elastodynamic equation in our analyses. Building upon the results of DeRyck2021On ; 2023_IMA_Mishra_NS on $\tanh$ neural networks with two hidden layers, we show that for these three kinds of PDEs:

  • The underlying PDE residuals in PINN can be made arbitrarily small with $\tanh$ neural networks having two hidden layers.

  • The total error of the PINN approximation is bounded by the generalization error of PINN.

  • The total error of PINN approximations for the solution field, its time derivative and its gradient is bounded by the training error (training loss) of PINN and the number of quadrature points (training data points).

Furthermore, our theoretical analyses suggest PINN training loss functions for these PDEs that differ somewhat in form from the canonical PINN formulation. The differences lie in two aspects: (i) Our analyses require certain residual terms (such as the gradient of the initial condition, the time derivative of the boundary condition, or, in the case of the linear elastodynamic equation, the strain and divergence of the initial condition) in the training loss, which would be absent from the canonical PINN formulation of the loss function. (ii) Our analyses may require, depending on the type of boundary conditions, a norm other than the $L^{2}$ norm for certain boundary residuals in the training loss, which differs from the commonly-used $L^{2}$ norm in the canonical PINN formulation of the loss function.

These new forms for the training loss function suggested by the theoretical analyses lead to a variant PINN algorithm. We have implemented the PINN algorithm based on these new forms of the training loss function for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Ample numerical experiments based on this algorithm have been presented. The simulation results indicate that the method has captured the solution field reasonably well for these PDEs. The numerical results also to some extent corroborate the theoretical relation between the approximation error and the PINN training loss obtained from the error analysis.

The rest of this paper is organized as follows. In Section 2 we present an overview of PINN for dynamic PDEs of second order in time. In Sections 3, 4 and 5, we present an error analysis of the PINN algorithm for approximating the wave equation, Sine-Gordon equation, and the linear elastodynamic equation, respectively. Section 6 summarizes a set of numerical experiments with these three PDEs to supplement and support our theoretical analyses. Section 7 concludes the presentation with some closing remarks. Finally, the appendix (Section 8) recalls some auxiliary results for our analysis and provides the proofs of the main theorems in Sections 4 and 5.

2 Physics Informed Neural Networks (PINN) for Approximating PDEs

2.1 Generic PDE of Second Order in Time

Consider a compact domain $D\subset\mathbb{R}^{d}$ ($d>0$ being an integer), and let $\mathcal{D}$ and $\mathcal{B}$ denote the differential and boundary operators. We consider the following general form of an initial boundary value problem with a generic PDE of second order in time. For any $\bm{x}\in D$, $\bm{y}\in\partial D$ and $t\in[0,T]$,

\frac{\partial^{2}u}{\partial t^{2}}(\bm{x},t)+\mathcal{D}[u](\bm{x},t)=0, (1a)
\mathcal{B}u(\bm{y},t)=u_{d}(\bm{y},t), (1b)
u(\bm{x},0)=u_{in}(\bm{x}),\quad\frac{\partial u}{\partial t}(\bm{x},0)=v_{in}(\bm{x}). (1c)

Here, $u(\bm{x},t)$ is the unknown field solution, $u_{d}$ denotes the boundary data, and $u_{in}$ and $v_{in}$ are the initial distributions for $u$ and $\frac{\partial u}{\partial t}$. We assume that in $\mathcal{D}$ the highest derivative with respect to the time variable $t$, if any, is of first order.

2.2 Neural Network Representation of a Function

Let $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ denote an activation function that is at least twice continuously differentiable. For any $n\in\mathbb{N}$ and $z\in\mathbb{R}^{n}$, we define $\sigma(z):=(\sigma(z_{1}),\cdots,\sigma(z_{n}))$, where $z_{i}$ ($1\leq i\leq n$) are the components of $z$. We adopt the following formal definition for a feedforward neural network as given in 2023_IMA_Mishra_NS .

Definition 2.1 (2023_IMA_Mishra_NS ).

Let $R\in(0,\infty]$, $L,W\in\mathbb{N}$ and $l_{0},\cdots,l_{L}\in\mathbb{N}$. Let $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ be a twice differentiable function and define

\Theta=\Theta_{L,W,R}:=\bigcup_{L^{\prime}\in\mathbb{N},\,L^{\prime}\leq L}\ \bigcup_{l_{0},\cdots,l_{L}\in\{1,\cdots,W\}}\ \bigtimes_{k=1}^{L^{\prime}}\left([-R,R]^{l_{k}\times l_{k-1}}\times[-R,R]^{l_{k}}\right). (2)

For $\theta\in\Theta$, we define $\theta_{k}:=(W_{k},b_{k})$ and $\mathcal{A}_{k}^{\theta}:\mathbb{R}^{l_{k-1}}\rightarrow\mathbb{R}^{l_{k}}$ by $z\mapsto W_{k}z+b_{k}$ for $1\leq k\leq L$, and we define $f_{k}^{\theta}:\mathbb{R}^{l_{k-1}}\rightarrow\mathbb{R}^{l_{k}}$ by

f_{k}^{\theta}(z)=\left\{\begin{array}{ll}\mathcal{A}_{L}^{\theta}(z)&k=L,\\ (\sigma\circ\mathcal{A}_{k}^{\theta})(z)&1\leq k<L.\end{array}\right. (5)

Denote by $u_{\theta}:\mathbb{R}^{l_{0}}\rightarrow\mathbb{R}^{l_{L}}$ the function that satisfies, for all $z\in\mathbb{R}^{l_{0}}$,

u_{\theta}(z)=(f_{L}^{\theta}\circ f_{L-1}^{\theta}\circ\cdots\circ f_{1}^{\theta})(z),\qquad z\in\mathbb{R}^{l_{0}}. (6)

We set $z=(\bm{x},t)$ and $l_{0}=d+1$ for approximating the PDE problem (1).

The function $u_{\theta}$ defined above is the neural-network representation of a parameterized function associated with the parameter $\theta$. This neural network contains $(L+1)$ layers ($L\geq 2$), with widths $(l_{0},l_{1},\cdots,l_{L})$. The input layer has width $l_{0}$, and the output layer has width $l_{L}$. The $(L-1)$ layers between the input and output layers are the hidden layers, with widths $l_{k}$ ($1\leq k\leq L-1$). $W_{k}$ and $b_{k}$ are the weight/bias coefficients of layer $k$ for $1\leq k\leq L$. From one layer to the next, the network applies an affine transform followed by composition with the activation function $\sigma$. Note that no activation function is applied to the output layer. We refer to $u_{\theta}$ with $L=2$ (i.e. a single hidden layer) as a shallow neural network, and $u_{\theta}$ with $L\geq 3$ (i.e. multiple hidden layers) as a deeper or deep neural network.
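To make the construction concrete, the following is a minimal sketch of such a network with two hidden layers ($L=3$) and the $\tanh$ activation, assuming PyTorch; the class name, the widths and the spatial dimension below are illustrative only, not prescribed by the definition above.

```python
import torch
import torch.nn as nn

class TanhNet(nn.Module):
    """Feed-forward network u_theta: R^{l_0} -> R^{l_L} with two hidden layers,
    tanh activation on the hidden layers, and no activation on the output layer."""
    def __init__(self, dim_in, dim_out, width):
        super().__init__()
        self.A1 = nn.Linear(dim_in, width)   # affine map A_1, followed by sigma
        self.A2 = nn.Linear(width, width)    # affine map A_2, followed by sigma
        self.A3 = nn.Linear(width, dim_out)  # affine map A_3 (output layer, no sigma)

    def forward(self, z):
        z = torch.tanh(self.A1(z))
        z = torch.tanh(self.A2(z))
        return self.A3(z)

# For the problem (1) with spatial dimension d, the input z = (x, t) has width l_0 = d + 1.
d = 2
u_theta = TanhNet(dim_in=d + 1, dim_out=1, width=50)
```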

2.3 Physics Informed Neural Network for Initial/Boundary Value Problem

Let $\Omega=D\times[0,T]$ denote the spatial-temporal domain and $\Omega_{*}=\partial D\times[0,T]$ its spatial boundary. We approximate the solution $u$ of the problem (1) by a neural network $u_{\theta}:\Omega\rightarrow\mathbb{R}^{n}$. With PINN we consider the residual functions of the initial/boundary value problem (1), defined for any sufficiently smooth function $u:\Omega\rightarrow\mathbb{R}^{n}$ as, for any $\bm{x}\in D$, $\bm{y}\in\partial D$ and $t\in[0,T]$,

\mathcal{R}_{int}[u](\bm{x},t)=\frac{\partial^{2}u}{\partial t^{2}}(\bm{x},t)+\mathcal{D}[u](\bm{x},t), (7a)
\mathcal{R}_{sb}[u](\bm{y},t)=\mathcal{B}u(\bm{y},t)-u_{d}(\bm{y},t), (7b)
\mathcal{R}_{tb1}[u](\bm{x},0)=u(\bm{x},0)-u_{in}(\bm{x}), (7c)
\mathcal{R}_{tb2}[u](\bm{x},0)=\frac{\partial u}{\partial t}(\bm{x},0)-v_{in}(\bm{x}). (7d)

These residuals characterize how well a given function $u$ satisfies the initial/boundary value problem (1). If $u$ is the exact solution, then $\mathcal{R}_{int}[u]=\mathcal{R}_{sb}[u]=\mathcal{R}_{tb1}[u]=\mathcal{R}_{tb2}[u]=0$.

To facilitate the subsequent analyses, we introduce an auxiliary function $v=\frac{\partial u}{\partial t}$ and rewrite $\mathcal{R}_{tb2}$ as

\mathcal{R}_{tb2}[v](\bm{x},0)=v(\bm{x},0)-v_{in}(\bm{x}). (8)

We reformulate (1a) into two equations, thus separating the interior residual into the following two components:

\mathcal{R}_{int1}[u,v](\bm{x},t)=\frac{\partial u}{\partial t}(\bm{x},t)-v(\bm{x},t), (9)
\mathcal{R}_{int2}[u,v](\bm{x},t)=\frac{\partial v}{\partial t}(\bm{x},t)+\mathcal{D}[u](\bm{x},t). (10)

With PINN, we seek a neural network $(u_{\theta},v_{\theta})$ to minimize the following quantity,

\begin{split}\mathcal{E}_{G}(\theta)^{2}=&\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}\\ &+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{\Omega_{*}}|R_{sb}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t.\end{split} (11)

The different terms of (11) may be rescaled by different weights (penalty coefficients). For simplicity, we set all these weights to one in the analysis. $\mathcal{E}_{G}$ as defined above is often referred to as the generalization error. Because of the integrals involved therein, $\mathcal{E}_{G}$ can be hard to minimize. In practice, one approximates (11) by an appropriate numerical quadrature rule, as follows,

\mathcal{E}_{T}(\theta,\mathcal{S})^{2}=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb})^{2}, (12)

where

\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (13a)
\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int2}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (13b)
\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (13c)
\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb2}[v_{\theta}](\bm{x}_{tb}^{n})|^{2}, (13d)
\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb})^{2}=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb}[u_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (13e)

The quadrature points in the spatial-temporal domain and on the spatial and temporal boundaries, $\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}$, $\mathcal{S}_{sb}=\{(\bm{x}_{sb}^{n},t_{sb}^{n})\}_{n=1}^{N_{sb}}$ and $\mathcal{S}_{tb}=\{(\bm{x}_{tb}^{n},t_{tb}^{n}=0)\}_{n=1}^{N_{tb}}$, constitute the input data sets to the neural network. In the above equations $\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$ is referred to as the training error (or training loss), and $\omega_{\star}^{n}$ are suitable quadrature weights for $\star=int$, $sb$ and $tb$. PINN thus minimizes the training error $\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$ over the network parameters $\theta$, and upon convergence of the optimization the trained $u_{\theta}$ contains the approximation of the solution $u$ to the problem (1).
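As an illustration, the following is a minimal sketch of how the discrete loss (12)–(13) can be assembled with automatic differentiation, assuming PyTorch. The callables D_op, B_op, u_d, u_in and v_in, the uniform quadrature weights, and the helper names are assumptions made for the sketch, not part of the formulation above.

```python
import torch

def ddt(w, t):
    """Time derivative of a scalar network output w(x, t) via autograd."""
    return torch.autograd.grad(w, t, grad_outputs=torch.ones_like(w),
                               create_graph=True)[0]

def training_loss(u_theta, v_theta, D_op, B_op, u_d, u_in, v_in,
                  x_int, t_int, x_tb, x_sb, t_sb):
    # x_int and t_int are collocation tensors created with requires_grad=True.
    z_int = torch.cat([x_int, t_int], dim=1)
    u, v = u_theta(z_int), v_theta(z_int)
    # Interior residuals (13a)-(13b): R_int1 = u_t - v, R_int2 = v_t + D[u].
    r_int1 = ddt(u, t_int) - v
    r_int2 = ddt(v, t_int) + D_op(u, x_int)
    # Temporal-boundary residuals (13c)-(13d) at t = 0.
    t0 = torch.zeros(x_tb.shape[0], 1)
    r_tb1 = u_theta(torch.cat([x_tb, t0], dim=1)) - u_in(x_tb)
    r_tb2 = v_theta(torch.cat([x_tb, t0], dim=1)) - v_in(x_tb)
    # Spatial-boundary residual (13e).
    r_sb = B_op(u_theta, x_sb, t_sb) - u_d(x_sb, t_sb)
    # Uniform quadrature weights (e.g. mid-point rule on a uniform partition of a unit domain).
    return (r_int1.square().mean() + r_int2.square().mean()
            + r_tb1.square().mean() + r_tb2.square().mean()
            + r_sb.square().mean())
```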

Remark 2.2.

The generalization error (11) (with the corresponding training error (12)) is the standard (canonical) PINN form if one introduces $v=\frac{\partial u}{\partial t}$ and reformulates (1a) into two equations. We would like to emphasize that our analyses below suggest alternative forms for the generalization error, e.g.

\begin{split}\mathcal{E}_{G}(\theta)^{2}=&\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t\\ &+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}\\ &+\left(\int_{\Omega_{*}}|R_{sb}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}, \end{split} (14)

which differs from (11) in the terms involving $\nabla R_{int1}$ and $\nabla R_{tb1}$ and in the last term. The corresponding training error is,

\mathcal{E}_{T}(\theta,\mathcal{S})^{2}=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb}), (15)

where

\left\{\begin{split}&\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2},\\ &\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}.\end{split}\right. (16)

The error analyses also suggest additional terms in the generalization error for different equations.
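As a sketch of how these extra terms can be evaluated in practice (continuing the hypothetical PyTorch-based sketch above, with the same illustrative helper names), the gradient residuals in (16) are obtained by differentiating the already-computed residuals with respect to the spatial collocation coordinates:

```python
import torch

def extra_loss_terms(r_int1, r_tb1, r_sb, x_int, x_tb):
    """Additional terms suggested by the analysis, cf. (14)-(16): the gradients of
    R_int1 and R_tb1, and the boundary contribution entering without the square."""
    def grad_x(w, x):
        return torch.autograd.grad(w, x, grad_outputs=torch.ones_like(w),
                                   create_graph=True)[0]
    e_int3 = grad_x(r_int1, x_int).square().sum(dim=1).mean()  # E_T^{int3} in (16)
    e_tb3 = grad_x(r_tb1, x_tb).square().sum(dim=1).mean()     # E_T^{tb3} in (16)
    e_sb = r_sb.square().mean().sqrt()                          # boundary term to the power 1/2, cf. (14)-(15)
    return e_int3 + e_tb3 + e_sb
```

In the variant loss, the last term would replace, rather than supplement, the squared boundary term of the canonical loss.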

2.4 Numerical Quadrature Rules

As discussed above, we need to approximate the integrals of functions. The analysis in the subsequent sections requires well-known results on numerical quadrature rules as reviewed below.

Given $\Lambda\subset\mathbb{R}^{d}$ and a function $f\in L^{1}(\Lambda)$, we would like to approximate $\int_{\Lambda}f(z){\,\rm{d}}z$. A quadrature rule provides an approximation by

\int_{\Lambda}f(z){\,\rm{d}}{z}\approx\frac{1}{M}\sum_{n=1}^{M}\omega_{n}f(z_{n}), (17)

where $z_{n}\in\Lambda$ ($1\leq n\leq M$) are the quadrature points and $\omega_{n}$ ($1\leq n\leq M$) denote the appropriate quadrature weights. The approximation accuracy is influenced by the type of quadrature rule, the number of quadrature points ($M$), and the regularity of $f$. For the mid-point rule, which is assumed in the analysis in the current work, the approximation accuracy is given by

\left|\int_{\Lambda}f(z){\,\rm{d}}{z}-\frac{1}{M}\sum_{n=1}^{M}\omega_{n}f(z_{n})\right|\leq C_{f}M^{-2/d}, (18)

where $C_{f}\lesssim\|f\|_{C^{2}(D)}$ ($a\lesssim b$ denotes $a\leq Cb$) and $D$ has been partitioned into $M\sim N^{d}$ cubes, with $z_{n}$ ($1\leq n\leq M$) denoting the midpoints of these cubes DavisR2007 . In this paper, we use $C$ to denote a universal constant, which may depend on $k,d,T,u$ and $v$ but not on $N$; we use a subscript to emphasize its dependence when necessary, e.g. $C_{d}$ is a constant depending only on $d$.
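For concreteness, here is a small sketch of the mid-point rule (17)–(18) on the unit cube, written in Python with NumPy; the function name and the test integrand are illustrative only.

```python
import numpy as np

def midpoint_quadrature(f, d, N):
    """Mid-point rule on [0,1]^d: partition the cube into M = N^d sub-cubes and
    average f over their midpoints (uniform weights, unit volume), cf. (17)."""
    mids = (np.arange(N) + 0.5) / N                     # midpoints along one axis
    grids = np.meshgrid(*([mids] * d), indexing="ij")
    z = np.stack([g.ravel() for g in grids], axis=1)    # M x d array of quadrature points
    return f(z).mean()

# Example: integrate sin(pi x) sin(pi y) over [0,1]^2 (exact value 4/pi^2 ~ 0.4053);
# the error decays like M^{-2/d} = N^{-2}, consistent with (18).
approx = midpoint_quadrature(lambda z: np.sin(np.pi * z[:, 0]) * np.sin(np.pi * z[:, 1]),
                             d=2, N=64)
```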

We focus on PDE problems in relatively low dimensions ($d\leq 3$) in this paper and employ the standard quadrature rules. We note that in higher dimensions the standard quadrature rules may not be favorable; in that case random training points or low-discrepancy training points Mishra2021Enhancing may be preferred.

In the subsequent sections we focus on three representative dynamic equations of second order in time (the wave equation, the Sine-Gordon equation, and the linear elastodynamic equation), and provide error estimates for approximating these equations by PINN. We note that these analyses suggest alternative forms for the training loss function that are somewhat different from the standard PINN forms Raissi2019pinn . The PINN numerical results based on the standard form for the loss function, and based on the alternative forms suggested by the error estimates, will be provided after the presentation of the theoretical analysis. In what follows, for brevity we adopt the notation $\mathcal{F}_{\Xi}=\frac{\partial\mathcal{F}}{\partial\Xi}$ and $\mathcal{F}_{\Xi\Upsilon}=\frac{\partial^{2}\mathcal{F}}{\partial\Xi\partial\Upsilon}$ ($\Xi,\Upsilon\in\{t,x\}$), for any sufficiently smooth function $\mathcal{F}:\Omega\rightarrow\mathbb{R}^{n}$.

3 Physics Informed Neural Networks for Approximating the Wave Equation

3.1 Wave Equation

Consider the following wave equations on the torus $D=[0,1)^{d}\subset\mathbb{R}^{d}$, with periodic boundary conditions:

u_{t}-v=0\qquad\qquad\qquad\ \text{in}\ D\times[0,T], (19a)
v_{t}-\Delta u=f\qquad\qquad\ \ \ \text{in}\ D\times[0,T], (19b)
u(\bm{x},0)=\psi_{1}(\bm{x})\qquad\quad\ \text{in}\ D, (19c)
v(\bm{x},0)=\psi_{2}(\bm{x})\qquad\quad\ \text{in}\ D, (19d)
u(\bm{x},t)=u(\bm{x}+1,t)\qquad\ \text{in}\ \partial D\times[0,T], (19e)
\nabla u(\bm{x},t)=\nabla u(\bm{x}+1,t)\ \ \text{in}\ \partial D\times[0,T]. (19f)

Regularity results for linear evolution equations of second order in time have been studied in the book Temam1997Infinite . When the self-adjoint operator $\mathcal{A}$ therein is taken to be the Laplace operator, the linear evolution equations of second order in time become the classical wave equations, and we then obtain the following regularity results.

Lemma 3.1.

Let $r\geq 1$, $\psi_{1}\in H^{r}(D)$, $\psi_{2}\in H^{r-1}(D)$ and $f\in L^{2}([0,T];H^{r-1}(D))$. Then there exists a unique solution $u$ to the classical wave equations such that $u\in C([0,T];H^{r}(D))$ and $u_{t}\in C([0,T];H^{r-1}(D))$.

Lemma 3.2.

Let $k\in\mathbb{N}$, $\psi_{1}\in H^{r}(D)$, $\psi_{2}\in H^{r-1}(D)$ and $f\in C^{k-1}([0,T];H^{r-1}(D))$ with $r>\frac{d}{2}+k$. Then there exist $T>0$ and a classical solution $u$ to the wave equations such that $u(t=0)=\psi_{1}$, $u_{t}(t=0)=\psi_{2}$, $u\in C^{k}(D\times[0,T])$ and $v\in C^{k-1}(D\times[0,T])$.

Proof.

By Lemma 3.1, there exist $T>0$ and a solution $(u,v)$ to the wave equations such that $u(t=0)=\psi_{1}$, $v(t=0)=\psi_{2}$, $u\in C([0,T];H^{r}(D))$ and $v\in C([0,T];H^{r-1}(D))$. As $r>\frac{d}{2}+k$, $H^{r-k}(D)$ is a Banach algebra.

For $k=1$: since $u\in C([0,T];H^{r}(D))$, $v\in C([0,T];H^{r-1}(D))$ and $f\in C([0,T];H^{r-1}(D))$, we have $u_{t}=v\in C([0,T];H^{r-1}(D))$ and $v_{t}=\Delta u+f\in C([0,T];H^{r-2}(D))$. It follows that $u\in C^{1}([0,T];H^{r-1}(D))$ and $v\in C^{1}([0,T];H^{r-2}(D))$.

For $k=2$: since $f\in C^{1}([0,T];H^{r-1}(D))$, we have $u_{tt}=v_{t}\in C([0,T];H^{r-2}(D))$ and $v_{tt}=\Delta u_{t}+f_{t}\in C([0,T];H^{r-3}(D))$. It follows that $u\in C^{2}([0,T];H^{r-2}(D))$ and $v\in C^{2}([0,T];H^{r-3}(D))$.

Repeating the same argument, we have $u\in\cap_{l=0}^{k}C^{l}([0,T];H^{r-l}(D))$ and $v\in\cap_{l=0}^{k}C^{l}([0,T];H^{r-l-1}(D))$. Applying the Sobolev embedding theorem together with $r>\frac{d}{2}+k$, it holds that $H^{r-l}(D)\subset C^{k-l}(D)$ and $H^{r-l-1}(D)\subset C^{k-l-1}(D)$ for $0\leq l\leq k$. Therefore, $u\in C^{k}(D\times[0,T])$ and $v\in C^{k-1}(D\times[0,T])$. ∎

3.2 Physics Informed Neural Networks

We would like to approximate the solution of the problem (19) with PINN. We seek deep neural networks $u_{\theta}:D\times[0,T]\rightarrow\mathbb{R}$ and $v_{\theta}:D\times[0,T]\rightarrow\mathbb{R}$, parameterized by $\theta\in\Theta$, that approximate the solutions $u$ and $v$ of (19). Define the residuals,

R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)=u_{\theta t}-v_{\theta}, (20a)
R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)=v_{\theta t}-\Delta u_{\theta}-f, (20b)
R_{tb1}[u_{\theta}](\bm{x})=u_{\theta}(\bm{x},0)-\psi_{1}(\bm{x}), (20c)
R_{tb2}[v_{\theta}](\bm{x})=v_{\theta}(\bm{x},0)-\psi_{2}(\bm{x}), (20d)
R_{sb1}[v_{\theta}](\bm{x},t)=v_{\theta}(\bm{x},t)-v_{\theta}(\bm{x}+1,t), (20e)
R_{sb2}[u_{\theta}](\bm{x},t)=\nabla u_{\theta}(\bm{x},t)-\nabla u_{\theta}(\bm{x}+1,t). (20f)

Note that for the exact solution, $R_{int1}[u,v]=R_{int2}[u,v]=R_{tb1}[u]=R_{tb2}[v]=R_{sb1}[v]=R_{sb2}[u]=0$. Let $\Omega=D\times[0,T]$ denote the space-time domain and $\Omega_{*}=\partial D\times[0,T]$ its spatial boundary. With PINN, we minimize the following generalization error,

\begin{split}\mathcal{E}_{G}(\theta)^{2}&=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t\\ &+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}\\ &+\int_{\Omega_{*}}|R_{sb1}[v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t+\int_{\Omega_{*}}|R_{sb2}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t.\end{split} (21)

The form of different terms in this expression will become clearer below.

To complete the PINN formulation, we choose the training set $\mathcal{S}\subset\overline{D}\times[0,T]$ based on suitable quadrature points. We divide the full training set $\mathcal{S}=\mathcal{S}_{int}\cup\mathcal{S}_{sb}\cup\mathcal{S}_{tb}$ into the following three components:

  • Interior training points $\mathcal{S}_{int}=\{z_{n}\}$ for $1\leq n\leq N_{int}$, with each $z_{n}=(\bm{x},t)_{n}\in D\times(0,T)$.

  • Spatial boundary training points $\mathcal{S}_{sb}=\{z_{n}\}$ for $1\leq n\leq N_{sb}$, with each $z_{n}=(\bm{x},t)_{n}\in\partial D\times(0,T)$.

  • Temporal boundary training points $\mathcal{S}_{tb}=\{\bm{x}_{n}\}$ for $1\leq n\leq N_{tb}$, with each $\bm{x}_{n}\in D$.

We define the PINN training loss, $\theta\mapsto\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$, as follows,

\mathcal{E}_{T}(\theta,\mathcal{S})^{2}=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb})^{2}+\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb})^{2}, (22)

where

\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (23a)
\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int2}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (23b)
\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (23c)
\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (23d)
\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb2}[v_{\theta}](\bm{x}_{tb}^{n})|^{2}, (23e)
\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (23f)
\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb})^{2}=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb1}[v_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}, (23g)
\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb})^{2}=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb2}[u_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (23h)

Here the quadrature points in space-time constitute the data sets $\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}$, $\mathcal{S}_{tb}=\{\bm{x}_{tb}^{n}\}_{n=1}^{N_{tb}}$ and $\mathcal{S}_{sb}=\{(\bm{x}_{sb}^{n},t_{sb}^{n})\}_{n=1}^{N_{sb}}$, and $\omega_{\star}^{n}$ are suitable quadrature weights with $\star$ denoting $int$, $tb$ or $sb$.
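To illustrate how the residuals (20a)–(20f) entering this loss can be computed in practice, here is a minimal sketch using automatic differentiation, again assuming PyTorch; the helper names are hypothetical, and the shift by one period in (20e)–(20f) is written component-wise for simplicity.

```python
import torch

def wave_residuals(u_theta, v_theta, f, psi1, psi2, x_int, t_int, x_sb, t_sb, x_tb):
    """PINN residuals (20a)-(20f) for the wave system (19) on the torus [0,1)^d.
    The spatial/temporal collocation tensors are created with requires_grad=True."""
    def dt(w, t):
        return torch.autograd.grad(w, t, grad_outputs=torch.ones_like(w),
                                   create_graph=True)[0]
    def grad_x(w, x):
        return torch.autograd.grad(w, x, grad_outputs=torch.ones_like(w),
                                   create_graph=True)[0]
    def laplacian(w, x):
        g = grad_x(w, x)
        return sum(grad_x(g[:, i:i + 1], x)[:, i:i + 1] for i in range(x.shape[1]))

    z_int = torch.cat([x_int, t_int], dim=1)
    u, v = u_theta(z_int), v_theta(z_int)
    r_int1 = dt(u, t_int) - v                                      # (20a)
    r_int2 = dt(v, t_int) - laplacian(u, x_int) - f(x_int, t_int)  # (20b)

    t0 = torch.zeros(x_tb.shape[0], 1)
    r_tb1 = u_theta(torch.cat([x_tb, t0], dim=1)) - psi1(x_tb)     # (20c)
    r_tb2 = v_theta(torch.cat([x_tb, t0], dim=1)) - psi2(x_tb)     # (20d)

    u_b = u_theta(torch.cat([x_sb, t_sb], dim=1))
    u_b1 = u_theta(torch.cat([x_sb + 1.0, t_sb], dim=1))
    r_sb1 = (v_theta(torch.cat([x_sb, t_sb], dim=1))
             - v_theta(torch.cat([x_sb + 1.0, t_sb], dim=1)))      # (20e): periodicity of v
    r_sb2 = grad_x(u_b, x_sb) - grad_x(u_b1, x_sb)                 # (20f): periodicity of grad u
    return r_int1, r_int2, r_tb1, r_tb2, r_sb1, r_sb2
```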

Let

\hat{u}=u_{\theta}-u,\qquad\hat{v}=v_{\theta}-v,

denote the difference between the solution to the wave equations and the PINN approximation of the solution. We define the total error of the PINN approximation by

\mathcal{E}(\theta)^{2}=\int_{0}^{T}\int_{D}\left(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t. (24)
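When an exact solution is available (as in the manufactured-solution tests of the numerical experiments), the total error (24) can be estimated by quadrature on a set of space-time sample points. A brief sketch, with illustrative helper names and assuming PyTorch:

```python
import torch

def total_error(u_theta, v_theta, u_exact, v_exact, grad_u_exact, x, t):
    """Discrete estimate of the total error (24): average of
    |u_hat|^2 + |grad u_hat|^2 + |v_hat|^2 over space-time sample points (x, t)."""
    x = x.requires_grad_(True)
    z = torch.cat([x, t], dim=1)
    u_pred, v_pred = u_theta(z), v_theta(z)
    grad_u_pred = torch.autograd.grad(u_pred, x, torch.ones_like(u_pred))[0]
    u_hat = u_pred - u_exact(x, t)
    v_hat = v_pred - v_exact(x, t)
    grad_u_hat = grad_u_pred - grad_u_exact(x, t)
    return (u_hat.square() + grad_u_hat.square().sum(dim=1, keepdim=True)
            + v_hat.square()).mean()
```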

3.3 Error Analysis

In light of the wave equations (19) and the definitions for different residuals (20), we have

R_{int1}=\hat{u}_{t}-\hat{v}, (25a)
R_{int2}=\hat{v}_{t}-\Delta\hat{u}, (25b)
R_{tb1}=\hat{u}(\bm{x},0), (25c)
R_{tb2}=\hat{v}(\bm{x},0), (25d)
R_{sb1}=\hat{v}(\bm{x},t)-\hat{v}(\bm{x}+1,t), (25e)
R_{sb2}=\nabla\hat{u}(\bm{x},t)-\nabla\hat{u}(\bm{x}+1,t). (25f)

3.3.1 Bound on the Residuals

Theorem 3.3.

Let $d$, $r$, $k\in\mathbb{N}$ with $k\geq 3$. Let $\psi_{1}\in H^{r}(D)$, $\psi_{2}\in H^{r-1}(D)$ and $f\in C^{k-1}([0,T];H^{r-1}(D))$ with $r>\frac{d}{2}+k$. For every integer $N>5$, there exist $\tanh$ neural networks $u_{\theta}$ and $v_{\theta}$, each with two hidden layers, of widths at most $3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1)$ and $3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}$, such that

\|R_{int1}\|_{L^{2}(\Omega)},\ \|R_{tb1}\|_{L^{2}(D)}\lesssim\ln N\,N^{-k+1}, (26a)
\|R_{int2}\|_{L^{2}(\Omega)},\ \|\nabla R_{int1}\|_{L^{2}(\Omega)},\ \|\nabla R_{tb1}\|_{L^{2}(D)},\ \|R_{sb2}\|_{L^{2}(\partial D\times[0,t])}\lesssim\ln^{2}N\,N^{-k+2}, (26b)
\|R_{tb2}\|_{L^{2}(D)},\ \|R_{sb1}\|_{L^{2}(\partial D\times[0,t])}\lesssim\ln N\,N^{-k+2}. (26c)
Proof.

Based on Lemma 3.2, it holds that $u\in H^{k}(\Omega)$ and $v\in H^{k-1}(\Omega)$. In light of Lemma 8.5, there exist neural networks $u_{\theta}$ and $v_{\theta}$, each with two hidden layers and widths $3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1)$ and $3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}$, such that for every $0\leq l\leq 2$ and $0\leq s\leq 2$,

\|u_{\theta}-u\|_{H^{l}(\Omega)}\leq C_{l,k,d+1,u}\lambda_{l,u}(N)N^{-k+l}, (27)
\|v_{\theta}-v\|_{H^{s}(\Omega)}\leq C_{s,k-1,d+1,v}\lambda_{s,v}(N)N^{-k+1+s}, (28)

where $\lambda_{l,u}=2^{l}3^{d+1}(1+\sigma)\ln^{l}\left(\beta_{l,\sigma,d+1,u}N^{d+k+3}\right)$, $\sigma=\frac{1}{100}$, $\lambda_{s,v}=2^{s}3^{d+1}(1+\sigma)\ln^{s}\left(\beta_{s,\sigma,d+1,v}N^{d+k+2}\right)$, and the definitions of the other constants can be found in Lemma 8.5.

In light of Lemma 8.3, we can bound the PINN residual terms,

\|\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)},\qquad\|\hat{v}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
\|\Delta\hat{u}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)},\qquad\|\nabla\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)},
\|\nabla\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
\|\hat{u}\|_{L^{2}(D)}\leq\|\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)},
\|\hat{v}\|_{L^{2}(D)}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)},
\|\nabla\hat{u}\|_{L^{2}(D)}\leq\|\nabla\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)},
\|\hat{v}\|_{L^{2}(\partial D\times[0,t])}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)},
\|\nabla\hat{u}\|_{L^{2}(\partial D\times[0,t])}\leq\|\nabla\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)}.

By combining these relations with (27) and (28), we can obtain

Rint1L2(Ω)=u^tv^L2(Ω)u^H1(Ω)+v^L2(Ω)\displaystyle\|R_{int1}\|_{L^{2}(\Omega)}=\|\hat{u}_{t}-\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)}+\|\hat{v}\|_{L^{2}(\Omega)}
C1,k,d+1,uλ1,u(N)Nk+1+C0,k1,d+1,vλ0,v(N)Nk+1lnNNk+1,\displaystyle\qquad\leq C_{1,k,d+1,u}\lambda_{1,u}(N)N^{-k+1}+C_{0,k-1,d+1,v}\lambda_{0,v}(N)N^{-k+1}\lesssim{\rm ln}NN^{-k+1},
Rint2L2(Ω)=v^tΔu^L2(Ω)v^H1(Ω)+u^H2(Ω)\displaystyle\|R_{int2}\|_{L^{2}(\Omega)}=\|\hat{v}_{t}-\Delta\hat{u}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)}+\|\hat{u}\|_{H^{2}(\Omega)}
C2,k,d+1,uλ2,u(N)Nk+2+C1,k1,d+1,vλ1,v(N)Nk+2ln2NNk+2,\displaystyle\qquad\leq C_{2,k,d+1,u}\lambda_{2,u}(N)N^{-k+2}+C_{1,k-1,d+1,v}\lambda_{1,v}(N)N^{-k+2}\lesssim{\rm ln}^{2}NN^{-k+2},
Rint1L2(Ω)=(u^tv^)L2(Ω)u^H2(Ω)+v^H1(Ω)\displaystyle\|\nabla R_{int1}\|_{L^{2}(\Omega)}=\|\nabla(\hat{u}_{t}-\hat{v})\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)}+\|\hat{v}\|_{H^{1}(\Omega)}
C2,k,d+1,uλ2,u(N)Nk+2+C1,k1,d+1,vλ1,v(N)Nk+2ln2NNk+2,\displaystyle\qquad\leq C_{2,k,d+1,u}\lambda_{2,u}(N)N^{-k+2}+C_{1,k-1,d+1,v}\lambda_{1,v}(N)N^{-k+2}\lesssim{\rm ln}^{2}NN^{-k+2},
Rtb1L2(D)ChΩ,d+1,ρΩu^H1(Ω)lnNNk+1,\displaystyle\|R_{tb1}\|_{L^{2}(D)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
Rtb2L2(D),Rsb1L2(D×[0,t])ChΩ,d+1,ρΩv^H1(Ω)lnNNk+2,\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb1}\|_{L^{2}(\partial D\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
Rtb1L2(D),Rsb2L2(D×[0,t])ChΩ,d+1,ρΩu^H2(Ω)ln2NNk+2.\displaystyle\|\nabla R_{tb1}\|_{L^{2}(D)},\|R_{sb2}\|_{L^{2}(\partial D\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}.

Theorem 3.3 implies that one can make the PINN residuals (20) arbitrarily small by choosing $N$ to be sufficiently large. It follows that the generalization error $\mathcal{E}_{G}(\theta)^{2}$ in (21) can be made arbitrarily small.

3.3.2 Bounds on the Total Approximation Error

We next show that the total error $\mathcal{E}(\theta)^{2}$ is small whenever the generalization error $\mathcal{E}_{G}(\theta)^{2}$ of the PINN approximation $(u_{\theta},v_{\theta})$ is small. We then prove that the total error $\mathcal{E}(\theta)^{2}$ can be made arbitrarily small, provided that the training error $\mathcal{E}_{T}(\theta,\mathcal{S})^{2}$ is sufficiently small and the sample set is sufficiently large.

Theorem 3.4.

Let $d\in\mathbb{N}$, and let $u\in C^{1}(\Omega)$ and $v\in C^{0}(\Omega)$ be the classical solution to the wave equations (19). Let $u_{\theta}$ and $v_{\theta}$ denote the PINN approximation with parameter $\theta$. Then the following relation holds,

\mathcal{E}(\theta)^{2}=\int_{0}^{T}\int_{D}\left(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp(2T), (29)

where

C_{G}=\int_{D}(|R_{tb1}|^{2}+|R_{tb2}|^{2}+|\nabla R_{tb1}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{0}^{T}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}){\,\rm{d}}t.
Proof.

Taking the inner product of (25a) with $\hat{u}$ and of (25b) with $\hat{v}$, and integrating over $D$, we obtain

d2dtD|u^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x} =Du^v^d𝒙+DRint1u^d𝒙D|u^|2d𝒙+12D|Rint1|2d𝒙+12D|v^|2d𝒙,\displaystyle=\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int1}\hat{u}{\,\rm{d}}\bm{x}\leq\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}, (30)
d2dtD|v^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x} =Du^v^d𝒙+Dv^u^𝒏ds(𝒙)+DRint2v^d𝒙\displaystyle=-\int_{D}\nabla\hat{u}\cdot\nabla\hat{v}{\,\rm{d}}\bm{x}+\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=Du^u^td𝒙+Du^Rint1d𝒙+Dv^u^𝒏ds(𝒙)+DRint2v^d𝒙\displaystyle=-\int_{D}\nabla\hat{u}\cdot\nabla\hat{u}_{t}{\,\rm{d}}\bm{x}+\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=d2dtD|u^|2d𝒙+Du^Rint1d𝒙+DRsb1Rsb2𝒏ds(𝒙)+DRint2v^d𝒙\displaystyle=-\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+\int_{\partial D}R_{sb1}R_{sb2}\cdot\bm{n}{\,\rm{d}}s(\bm{x})+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
d2dtD|u^|2d𝒙+12D|u^|2d𝒙+12D|Rint1|2d𝒙\displaystyle\leq-\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}
+12D(|Rsb1|2+|Rsb2|2)ds(𝒙)+12D|v^|2d𝒙+12D|Rint2|2d𝒙.\displaystyle\qquad+\frac{1}{2}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x})+\frac{1}{2}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}. (31)

Here, we have used $\hat{v}=\hat{u}_{t}-R_{int1}$.

By adding (30) to (31), we have

d2dtD|u^|2d𝒙+d2dtD|u^|2d𝒙+d2dtD|v^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}
D|u^|2d𝒙+12D|u^|2d𝒙+D|v^|2d𝒙+12D|Rint1|2d𝒙\displaystyle\qquad\leq\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}
+12D|Rint2|2d𝒙+12D|Rint1|2d𝒙+12D(|Rsb1|2+|Rsb2|2)ds(𝒙).\displaystyle\qquad+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}). (32)

Integrating (32) over $[0,\tau]$ for any $\tau\leq T$ and applying the Cauchy–Schwarz inequality, we obtain

D|u^(𝒙,τ)|2d𝒙+D|u^(𝒙,τ)|2d𝒙+D|v^(𝒙,τ)|2d𝒙\displaystyle\int_{D}|\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\hat{v}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}
D|Rtb1|2d𝒙+D|Rtb2|2d𝒙+D|Rtb1|2d𝒙+20τD(|u^|2+|u^|2+|v^|2)d𝒙dt\displaystyle\qquad\leq\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}+2\int_{0}^{\tau}\int_{D}\left(|\hat{u}|^{2}+|\nabla\hat{u}|^{2}+|\hat{v}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|Rint1|2+|Rint2|2+|Rint1|2)d𝒙dt+0TD(|Rsb1|2+|Rsb2|2)ds(𝒙)dt.\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|R_{int1}|^{2}+|R_{int2}|^{2}+|\nabla R_{int1}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{0}^{T}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}){\,\rm{d}}t.

We apply the integral form of the Grönwall inequality to the above relation to get

\int_{D}\left(|\hat{u}(\bm{x},\tau)|^{2}+|\nabla\hat{u}(\bm{x},\tau)|^{2}+|\hat{v}(\bm{x},\tau)|^{2}\right){\,\rm{d}}\bm{x}\leq C_{G}\exp(2T),

where

CG\displaystyle C_{G} =D(|Rtb1|2+|Rtb2|2+|Rtb1|2)d𝒙+0TD(|Rint1|2+|Rint2|2+|Rint1|2)d𝒙dt\displaystyle=\int_{D}(|R_{tb1}|^{2}+|R_{tb2}|^{2}+|\nabla R_{tb1}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|Rsb1|2+|Rsb2|2)ds(𝒙)dt.\displaystyle\qquad+\int_{0}^{T}\int_{\partial D}(|R_{sb1}|^{2}+|R_{sb2}|^{2}){\,\rm{d}}s(\bm{x}){\,\rm{d}}t.

Then, we integrate the above inequality over $[0,T]$ to yield (29). ∎

Remark 3.5.

For the wave equations (19) with periodic boundary conditions, we would like to mention two other forms of the generalization error (and the related training loss). Compared with (21), they differ only in the terms on the spatial boundary $\Omega_{*}$, i.e.,

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint2[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt\displaystyle=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+D|Rtb1[uθ](𝒙)|2d𝒙+D|Rtb2[vθ](𝒙)|2d𝒙+D|Rtb1[uθ](𝒙)|2d𝒙\displaystyle+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(Ω|Rsb1[vθ](𝒙,t)|2ds(𝒙)dt)12,\displaystyle+\left(\int_{\Omega_{*}}|R_{sb1}[v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}, (33)

and

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint2[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt\displaystyle=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+D|Rtb1[uθ](𝒙)|2d𝒙+D|Rtb2[vθ](𝒙)|2d𝒙+D|Rtb1[uθ](𝒙)|2d𝒙\displaystyle+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(Ω|Rsb2[uθ](𝒙,t)|2ds(𝒙)dt)12.\displaystyle+\left(\int_{\Omega_{*}}|R_{sb2}[u_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}. (34)

The related training loss functions are given by

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2+Ttb1(θ,𝒮tb)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}
+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Tsb1(θ,𝒮sb),\displaystyle+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb}), (35)

or

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2+Ttb1(θ,𝒮tb)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}
+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Tsb2(θ,𝒮sb).\displaystyle+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb}). (36)

These three forms of the generalization error result from different treatments of the boundary term $\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}\,{\rm d}s(\bm{x})$ in the proof of Theorem 3.4:

Dv^u^𝒏ds(𝒙)=DRsb1u^𝒏ds(𝒙)|D|12(uC1(D×[0,t])+uθC1(D×[0,t]))(D|Rsb1|2ds(𝒙))12,\displaystyle\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})=\int_{\partial D}R_{sb1}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})\leq|\partial D|^{\frac{1}{2}}(\|u\|_{C^{1}(\partial D\times[0,t])}+||u_{\theta}||_{C^{1}(\partial D\times[0,t])})\left(\int_{\partial D}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}},
Dv^u^𝒏ds(𝒙)=Dv^Rsb2𝒏ds(𝒙)|D|12(vC0(D×[0,t])+vθC0(D×[0,t]))(D|Rsb2|2ds(𝒙))12,\displaystyle\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})=\int_{\partial D}\hat{v}R_{sb2}\cdot\bm{n}{\,\rm{d}}s(\bm{x})\leq|\partial D|^{\frac{1}{2}}(\|v\|_{C^{0}(\partial D\times[0,t])}+||v_{\theta}||_{C^{0}(\partial D\times[0,t])})\left(\int_{\partial D}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}},
Dv^u^𝒏ds(𝒙)=DRsb1Rsb2𝒏ds(𝒙)12(D|Rsb1|2ds(𝒙)+D|Rsb2|2ds(𝒙)).\displaystyle\int_{\partial D}\hat{v}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})=\int_{\partial D}R_{sb1}R_{sb2}\cdot\bm{n}{\,\rm{d}}s(\bm{x})\leq\frac{1}{2}\left(\int_{\partial D}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x})+\int_{\partial D}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right).

Our numerical experiments indicate that adopting the training loss (35) or (36) seems to lead to poorer simulation results. For the periodic boundary, both terms $R_{sb1}$ and $R_{sb2}$ may be needed to carry the periodicity information. We suspect that this may be why using only a single boundary term ($R_{sb1}$ or $R_{sb2}$), as in (35) and (36), leads to poorer numerical results.

Theorem 3.6.

Let $d\in\mathbb{N}$ and $T>0$. Let $u\in C^{4}(\Omega)$ and $v\in C^{3}(\Omega)$ be the classical solution of the wave equations (19), and let $(u_{\theta},v_{\theta})$ denote the PINN approximation with parameter $\theta\in\Theta$. Then the total error satisfies

\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp(2T)=\mathcal{O}\big(\mathcal{E}_{T}(\theta,\mathcal{S})^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{2}{d}}\big). (37)

The constant $C_{T}$ is defined as

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+C(Rtb22)Mtb2d+𝒬MtbD(Rtb22)+C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2)\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+C(|Rint1|2)Mint2d+1\displaystyle+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+C_{(R_{int2}^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}
+𝒬MintΩ(|Rint1|2)+C(Rsb12)Msb2d+𝒬MsbΩ(Rsb12)+C(Rsb22)Msb2d+𝒬MsbΩ(Rsb22),\displaystyle+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+C_{({R_{sb1}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})+C_{({R_{sb2}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2}),

where

C(Rtb12)u^C22,C(Rtb22)v^C22,C(|Rtb1|2)u^C32,C(Rint12)u^C32+u^C22,\displaystyle C_{({R_{tb1}^{2}})}\lesssim\|\hat{u}\|_{C^{2}}^{2},\quad C_{({R_{tb2}^{2}})}\lesssim\|\hat{v}\|_{C^{2}}^{2},\quad C_{(|\nabla R_{tb1}|^{2})}\lesssim\|\hat{u}\|_{C^{3}}^{2},\quad C_{({R_{int1}^{2}})}\lesssim\|\hat{u}\|_{C^{3}}^{2}+\|\hat{u}\|_{C^{2}}^{2},
C(Rint22),C(|Rint1|2)u^C42+v^C32,C(Rsb12)v^C32,C(Rsb22)u^C42,\displaystyle\qquad C_{(R_{int2}^{2})},C_{(|\nabla R_{int1}|^{2})}\lesssim\|\hat{u}\|_{C^{4}}^{2}+\|\hat{v}\|_{C^{3}}^{2},\quad C_{({R_{sb1}^{2}})}\lesssim\|\hat{v}\|_{C^{3}}^{2},\quad C_{({R_{sb2}^{2}})}\lesssim\|\hat{u}\|_{C^{4}}^{2},

and the bounds on $\|u_{\theta}\|_{C^{n}}$ and $\|v_{\theta}\|_{C^{n}}$ ($n\in\mathbb{N}$) are given by Lemma 8.4.

Proof.

By combining Theorem 3.4 with the quadrature error formula (18), we have

D|Rtb1|2d𝒙\displaystyle\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(Rtb12)+𝒬MtbD(Rtb12)\displaystyle=\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})
C(Rtb12)Mtb2d+𝒬MtbD(Rtb12),\displaystyle\leq C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2}),
D|Rtb2|2d𝒙\displaystyle\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x} =D|Rtb2|2d𝒙𝒬MtbD(Rtb22)+𝒬MtbD(Rtb22)\displaystyle=\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})
C(Rtb22)Mtb2d+𝒬MtbD(Rtb22),\displaystyle\leq C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2}),
D|Rtb1|2d𝒙\displaystyle\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(|Rtb1|2)+𝒬MtbD(|Rtb1|2)\displaystyle=\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2),\displaystyle\leq C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(Rint12)+𝒬MintΩ(Rint12)\displaystyle=\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
C(Rint12)Mint2d+1+𝒬MintΩ(Rint12),\displaystyle\leq C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2}),
Ω|Rint2|2d𝒙dt\displaystyle\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint2|2d𝒙dt𝒬MintΩ(Rint22)+𝒬MintΩ(Rint22)\displaystyle=\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})
C(Rint22)Mint2d+1+𝒬MintΩ(Rint22),\displaystyle\leq C_{(R_{int2}^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(|Rint1|2)+𝒬MintΩ(|Rint1|2)\displaystyle=\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})
C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2),\displaystyle\leq C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2}),
Ω|Rsb1|2ds(𝒙)dt\displaystyle\int_{\Omega_{*}}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =Ω|Rsb1|2ds(𝒙)dt𝒬MsbΩ(Rsb12)+𝒬MsbΩ(Rsb12)\displaystyle=\int_{\Omega_{*}}|R_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})
C(Rsb12)Msb2d+𝒬MsbΩ(Rsb12),\displaystyle\leq C_{({R_{sb1}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2}),
Ω|Rsb2|2ds(𝒙)dt\displaystyle\int_{\Omega_{*}}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =Ω|Rsb2|2ds(𝒙)dt𝒬MsbΩ(Rsb22)+𝒬MsbΩ(Rsb22)\displaystyle=\int_{\Omega_{*}}|R_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2})+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2})
C(Rsb22)Msb2d+𝒬MsbΩ(Rsb22).\displaystyle\leq C_{({R_{sb2}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2}).

By the above inequalities and (29), it holds that

0TD(|u^(𝒙,t)|2+|u^(𝒙,t)|2+|v^(𝒙,t)|2)d𝒙dtCTTexp(2T),\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+|\nabla\hat{u}(\bm{x},t)|^{2}+|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp(2T),

where

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+C(Rtb22)Mtb2d+𝒬MtbD(Rtb22)+C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2)\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+C(|Rint1|2)Mint2d+1\displaystyle+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+C_{(R_{int2}^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}
+𝒬MintΩ(|Rint1|2)+C(Rsb12)Msb2d+𝒬MsbΩ(Rsb12)+C(Rsb22)Msb2d+𝒬MsbΩ(Rsb22).\displaystyle+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+C_{({R_{sb1}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb1}^{2})+C_{({R_{sb2}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb2}^{2}).

The complexities of the constants C(Rq2)C_{({R_{q}^{2}})} are given by Lemma 8.4, and we observe that for every residual RqR_{q}, it holds that Rq2Cn2nRqCn2\|R_{q}^{2}\|_{C^{n}}\leq 2^{n}\|R_{q}\|_{C^{n}}^{2} (nn\in\mathbb{N}) for Rq=Rtb1R_{q}=R_{tb1}, Rtb2R_{tb2}, Rtb1\nabla R_{tb1}, Rint1R_{int1}, Rint2R_{int2}, Rint1\nabla R_{int1} and Rsb2R_{sb2}. ∎

4 Physics Informed Neural Networks for Approximating the Sine-Gordon Equation

4.1 Sine-Gordon Equation

Let DdD\subset\mathbb{R}^{d} be an open connected bounded set with a boundary D\partial D. We consider the following Sine-Gordon equation:

utv=0inD×[0,T],\displaystyle u_{t}-v=0\ \qquad\qquad\qquad\qquad\qquad\qquad\quad\ \ \,\text{in}\ D\times[0,T], (38a)
ε2vt=a2Δuε12ug(u)+finD×[0,T],\displaystyle\varepsilon^{2}v_{t}=a^{2}\Delta u-\varepsilon_{1}^{2}u-g(u)+f\ \ \quad\qquad\qquad\text{in}\ D\times[0,T], (38b)
u(𝒙,0)=ψ1(𝒙)inD,\displaystyle u(\bm{x},0)=\psi_{1}(\bm{x})\qquad\qquad\qquad\qquad\qquad\qquad\text{in}\ D, (38c)
v(𝒙,0)=ψ2(𝒙)inD,\displaystyle v(\bm{x},0)=\psi_{2}(\bm{x})\qquad\qquad\qquad\qquad\qquad\qquad\,\text{in}\ D, (38d)
u(𝒙,t)|D=ud(t)inD×[0,T],\displaystyle u(\bm{x},t)|_{\partial D}=u_{d}(t)\qquad\qquad\qquad\qquad\qquad\ \ \ \,\text{in}\ \partial D\times[0,T], (38e)

where uu and vv are the field functions to be solved for, ff is a source term, and udu_{d}, ψ1\psi_{1} and ψ2\psi_{2} denote the boundary/initial conditions. ε>0\varepsilon>0, a>0a>0 and ε10\varepsilon_{1}\geq 0 are constants. g(u)g(u) is a nonlinear term. We assume that the nonlinearity is globally Lipschitz, i.e., there exists a constant LL (independent of vv and ww) such that

|g(v)g(w)|L|vw|,v,w.|g(v)-g(w)|\leq L|v-w|,\qquad\forall v,\,w\in\mathbb{R}. (39)
Remark 4.1.

The existence and regularity of the solution to the Sine-Gordon equation with different nonlinear terms have been the subject of several studies in the literature; see Baoxiang1997Classical ; Kubota2001Global ; Shatah1982Global ; Shatah1985Normal ; Temam1997Infinite .

The book Temam1997Infinite provides the existence and regularity result of the following Sine-Gordon equation,

utt+αutΔu+g(u)=f.u_{tt}+\alpha u_{t}-\Delta u+g(u)=f.

Let \alpha\in\mathbb{R}, and let g(u) be a C^{2} function from \mathbb{R} to \mathbb{R} satisfying certain assumptions. If f\in C([0,T];L^{2}(D)), \psi_{1}\in H^{1}(D) and \psi_{2}\in L^{2}(D), then there exists a unique solution u to this Sine-Gordon equation such that u\in C([0,T];H^{1}(D)) and u_{t}\in C([0,T];L^{2}(D)). Furthermore, if f^{\prime}\in C([0,T];L^{2}(D)), \psi_{1}\in H^{2}(D) and \psi_{2}\in H^{1}(D), then u\in C([0,T];H^{2}(D)) and u_{t}\in C([0,T];H^{1}(D)).

Let gg be a smooth function of degree 2. The following equation is studied in Shatah1985Normal ,

uttΔu+u+g(u,ut,utt)=0,u_{tt}-\Delta u+u+g(u,u_{t},u_{tt})=0,

where it is reformulated as

𝒖t=A𝒖+G(𝒖),\bm{u}_{t}=A\bm{u}+G(\bm{u}),

in which 𝐮=(uut)\bm{u}=\begin{pmatrix}u\\ u_{t}\end{pmatrix}, A=(01Δ10)A=\begin{pmatrix}0&1\\ \Delta-1&0\end{pmatrix} and G=(0,g(u,ut,utt))G=\begin{pmatrix}0,\\ -g(u,u_{t},u_{tt})\end{pmatrix}. Set X=Hk(n)Hk1(n)X=H^{k}(\mathbb{R}^{n})\bigoplus H^{k-1}(\mathbb{R}^{n}), k>n+2+2ak>n+2+2a with a>1a>1. Given 𝐮0=(ψ1ψ2)X\bm{u}_{0}=\begin{pmatrix}\psi_{1}\\ \psi_{2}\end{pmatrix}\in X and 𝐮0X=σ\|\bm{u}_{0}\|_{X}=\sigma, there exists a T0=T0(σ)T_{0}=T_{0}(\sigma) depending on the size of the initial data σ\sigma and a unique solution 𝐮C([0,T0],X)\bm{u}\in C([0,T_{0}],X).

The reference Baoxiang1997Classical provides the following result. Under certain conditions for the nonlinear term g(u)g(u), with f=0f=0, d5d\leq 5, kd2+1k\geq\frac{d}{2}+1, ψ1Hk(D)\psi_{1}\in H^{k}(D) and ψ2Hk1(D)\psi_{2}\in H^{k-1}(D), there exists a unique solution uC((0,);Hk(D))u\in C((0,\infty);H^{k}(D)) of nonlinear Klein–Gordon equation.

The following result is due to Kubota2001Global . Under certain conditions for the nonlinear term g(u)g(u), with f=0f=0, ψ1Hk(D)\psi_{1}\in H^{k}(D) and ψ2Hk1(D)\psi_{2}\in H^{k-1}(D) with a positive constant k4k\geq 4, there exists a positive constant TkT_{k} and a unique solution uC([0,Tk];Hk(D))C1([0,Tk];Hk1(D))C2([0,Tk];Hk2(D))u\in C([0,T_{k}];H^{k}(D))\cap C^{1}([0,T_{k}];H^{k-1}(D))\cap C^{2}([0,T_{k}];H^{k-2}(D)) to the nonlinear wave equations with different speeds of propagation.

A survey of the literature indicates that, while several works have touched on the regularity of the solution to the Sine-Gordon equations, none of them is comprehensive. To facilitate the subsequent analyses, we make the following assumption in light of Remark 4.1. Let k\geq 1, and let g(u) and f be sufficiently smooth and bounded. Given \psi_{1}\in H^{r}(D) and \psi_{2}\in H^{r-1}(D) with r\geq\frac{d}{2}+k, we assume that there exist T>0 and a classical solution (u,v) to the Sine-Gordon equations (38) such that u\in C([0,T];H^{k}(D)) and v\in C([0,T];H^{k-1}(D)). Then, it follows from the Sobolev embedding theorem that u\in C^{k}(D\times[0,T]) and v\in C^{k-1}(D\times[0,T]).

4.2 Physics Informed Neural Networks

Let Ω=D×[0,T]\Omega=D\times[0,T] and Ω=D×[0,T]\Omega_{*}=\partial D\times[0,T] be the space-time domain. We define the following residuals for the PINN approximation, uθ:Ωu_{\theta}:\Omega\rightarrow\mathbb{R} and vθ:Ωv_{\theta}:\Omega\rightarrow\mathbb{R}, for the Sine-Gordon equations (38):

Rint1[uθ,vθ](𝒙,t)=uθtvθ,\displaystyle R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)=u_{\theta t}-v_{\theta}, (40a)
Rint2[uθ,vθ](𝒙,t)=ε2vθta2Δuθ+ε12uθ+g(uθ)f,\displaystyle R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)=\varepsilon^{2}v_{\theta t}-a^{2}\Delta u_{\theta}+\varepsilon_{1}^{2}u_{\theta}+g(u_{\theta})-f, (40b)
Rtb1[uθ](𝒙)=uθ(𝒙,0)ψ1(𝒙),\displaystyle R_{tb1}[u_{\theta}](\bm{x})=u_{\theta}(\bm{x},0)-\psi_{1}(\bm{x}), (40c)
Rtb2[vθ](𝒙)=vθ(𝒙,0)ψ2(𝒙),\displaystyle R_{tb2}[v_{\theta}](\bm{x})=v_{\theta}(\bm{x},0)-\psi_{2}(\bm{x}), (40d)
Rsb[vθ](𝒙,t)=vθ(𝒙,t)|Dudt(t),\displaystyle R_{sb}[v_{\theta}](\bm{x},t)=v_{\theta}(\bm{x},t)|_{\partial D}-u_{dt}(t), (40e)

where udt=udtu_{dt}=\frac{\partial u_{d}}{\partial t}. Note that for the exact solution (u,v)(u,v), Rint1[u,v]=Rint2[u,v]=Rtb1[u]=Rtb2[v]=Rsb[v]=0R_{int1}[u,v]=R_{int2}[u,v]=R_{tb1}[u]=R_{tb2}[v]=R_{sb}[v]=0. With PINN we minimize the following generalization error,

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint2[uθ,vθ](𝒙,t)|2d𝒙dt+Ω|Rint1[uθ,vθ](𝒙,t)|2d𝒙dt\displaystyle=\int_{\Omega}|R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|R_{int2}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+D|Rtb1[uθ](𝒙)|2d𝒙+D|Rtb2[vθ](𝒙)|2d𝒙+D|Rtb1[uθ](𝒙)|2d𝒙\displaystyle+\int_{D}|R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|R_{tb2}[v_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla R_{tb1}[u_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(Ω|Rsb[vθ](𝒙,t)|2ds(𝒙)dt)12.\displaystyle+\left(\int_{\Omega_{*}}|R_{sb}[v_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}. (41)

Let

u^=uθu,v^=vθv,\hat{u}=u_{\theta}-u,\quad\hat{v}=v_{\theta}-v,

where (u,v)(u,v) denotes the exact solution. We define the total error of the PINN approximation of the Sine-Gordon equations (38) as,

(θ)2=Ω(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dt.\mathcal{E}(\theta)^{2}=\int_{\Omega}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t. (42)

Then we choose the training set 𝒮D¯×[0,T]\mathcal{S}\subset\overline{D}\times[0,T] with 𝒮=𝒮int𝒮sb𝒮tb\mathcal{S}=\mathcal{S}_{int}\cup\mathcal{S}_{sb}\cup\mathcal{S}_{tb}, based on suitable quadrature points:

  • Interior training points 𝒮int={zn}\mathcal{S}_{int}=\{{z}_{n}\} for 1nNint1\leq n\leq N_{int}, with each zn=(𝒙,t)nD×(0,T){z}_{n}=(\bm{x},t)_{n}\in D\times(0,T).

  • Spatial boundary training points 𝒮sb={zn}\mathcal{S}_{sb}=\{{z}_{n}\} for 1nNsb1\leq n\leq N_{sb}, with each zn=(𝒙,t)nD×(0,T){z}_{n}=(\bm{x},t)_{n}\in\partial D\times(0,T).

  • Temporal boundary training points 𝒮tb={𝒙n}\mathcal{S}_{tb}=\{\bm{x}_{n}\} for 1nNtb1\leq n\leq N_{tb} with each 𝒙nD\bm{x}_{n}\in D.

The integrals in (4.2) are approximated by a numerical quadrature rule, resulting in the training loss,

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}
+Ttb1(θ,𝒮tb)2+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Tsb(θ,𝒮sb),\displaystyle+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb}), (43)

where

Tint1(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|Rint1[uθ,vθ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (44a)
Tint2(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|Rint2[uθ,vθ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|R_{int2}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (44b)
Tint3(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|Rint1[uθ,vθ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla R_{int1}[u_{\theta},v_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (44c)
Ttb1(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|Rtb1[uθ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (44d)
Ttb2(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|Rtb2[vθ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|R_{tb2}[v_{\theta}](\bm{x}_{tb}^{n})|^{2}, (44e)
Ttb3(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|Rtb1[uθ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla R_{tb1}[u_{\theta}](\bm{x}_{tb}^{n})|^{2}, (44f)
Tsb(θ,𝒮sb)2\displaystyle\mathcal{E}_{T}^{sb}(\theta,\mathcal{S}_{sb})^{2} =n=1Nsbωsbn|Rsb[vθ](𝒙sbn,tsbn)|2.\displaystyle=\sum_{n=1}^{N_{sb}}\omega_{sb}^{n}|R_{sb}[v_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (44g)

Here the quadrature points in space-time constitute the data sets 𝒮int={(𝒙intn,tintn)}n=1Nint\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}, 𝒮tb={𝒙tbn)}n=1Ntb\mathcal{S}_{tb}=\{\bm{x}_{tb}^{n})\}_{n=1}^{N_{tb}} and 𝒮sb={(𝒙sbn,tsbn)}n=1Nsb\mathcal{S}_{sb}=\{(\bm{x}_{sb}^{n},t_{sb}^{n})\}_{n=1}^{N_{sb}}, and ωn\omega_{\star}^{n} are the quadrature weights with \star being intint, tbtb or sbsb.
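To make the above concrete, the following is a minimal PyTorch sketch of how the interior residuals (40a)-(40b), the gradient term of the first residual, and the quadrature-weighted interior loss terms (44a)-(44c) could be evaluated with automatic differentiation. It is written in one spatial dimension for simplicity and with g(u)=\sin(u); the network net (mapping (x,t) to (u_{\theta},v_{\theta})), the source-term callable f and the weight vector w are illustrative assumptions, not the authors' implementation.

```python
import torch

def sine_gordon_interior_loss(net, x, t, w, f, a=1.0, eps=1.0, eps1=1.0):
    """Quadrature-weighted interior loss terms (44a)-(44c) for the Sine-Gordon system (38),
    sketched in one spatial dimension with g(u) = sin(u)."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    out = net(torch.cat([x, t], dim=1))
    u, v = out[:, 0:1], out[:, 1:2]

    def grad(y, z):  # derivative of y w.r.t. z via autograd
        return torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]

    u_t, v_t, u_x = grad(u, t), grad(v, t), grad(u, x)
    u_xx = grad(u_x, x)

    R_int1 = u_t - v                                                            # residual (40a)
    R_int2 = eps**2 * v_t - a**2 * u_xx + eps1**2 * u + torch.sin(u) - f(x, t)  # residual (40b)
    R_int1_x = grad(R_int1, x)                                                  # gradient of (40a), cf. (44c)

    # w holds the quadrature weights (e.g. equal weights 1/N_int for random points)
    return (w * R_int1**2).sum() + (w * R_int2**2).sum() + (w * R_int1_x**2).sum()
```

The temporal and spatial boundary terms (44d)-(44g) would be assembled analogously on the corresponding point sets.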

4.3 Error Analysis

By subtracting the Sine-Gordon equations (38) from the residual equations (40), we get,

Rint1=u^tv^,\displaystyle R_{int1}=\hat{u}_{t}-\hat{v}, (45a)
Rint2=ε2v^ta2Δu^+ε12u^+g(uθ)g(u),\displaystyle R_{int2}=\varepsilon^{2}\hat{v}_{t}-a^{2}\Delta\hat{u}+\varepsilon_{1}^{2}\hat{u}+g(u_{\theta})-g(u), (45b)
Rtb1=u^(𝒙,0),\displaystyle R_{tb1}=\hat{u}(\bm{x},0), (45c)
Rtb2=v^(𝒙,0),\displaystyle R_{tb2}=\hat{v}(\bm{x},0), (45d)
Rsb=v^(𝒙,t)|D.\displaystyle R_{sb}=\hat{v}(\bm{x},t)|_{\partial D}. (45e)

The results on the PINN approximations to the Sine-Gordon equations are summarized in the following theorems.

Theorem 4.2.

Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Assume that g(u)g(u) is Lipschitz continuous, uCk(D×[0,T])u\in C^{k}(D\times[0,T]) and vCk1(D×[0,T])v\in C^{k-1}(D\times[0,T]). Then for every integer N>5N>5, there exist tanh\tanh neural networks uθu_{\theta} and vθv_{\theta}, each with two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

Rint1L2(Ω),Rtb1L2(D)lnNNk+1,\displaystyle\|R_{int1}\|_{L^{2}(\Omega)},\|R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}NN^{-k+1}, (46a)
Rint2L2(Ω),Rint1L2(Ω),Rtb1L2(D)ln2NNk+2,\displaystyle\|R_{int2}\|_{L^{2}(\Omega)},\|\nabla R_{int1}\|_{L^{2}(\Omega)},\|\nabla R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}^{2}NN^{-k+2}, (46b)
Rtb2L2(D),RsbL2(D×[0,t])lnNNk+2.\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb}\|_{L^{2}(\partial D\times[0,t])}\lesssim{\rm ln}NN^{-k+2}. (46c)

The proof of this theorem is provided in the Appendix 8.3.

Theorem 4.2 implies that the PINN residuals in (40) can be made arbitrarily small by choosing a sufficiently large NN. Therefore, the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2} can be made arbitrarily small.

We next show that the PINN total approximation error (θ)2\mathcal{E}(\theta)^{2} can be controlled by the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2} (Theorem 4.3 below), and by the training error T(θ,𝒮)2\mathcal{E}_{T}(\theta,\mathcal{S})^{2} (Theorem 4.4 below). The proofs for Theorem 4.3 and Theorem 4.4 are provided in the Appendix 8.3.

Theorem 4.3.

Let dd\in\mathbb{N}, uC1(Ω)u\in C^{1}(\Omega) and vC0(Ω)v\in C^{0}(\Omega) be the classical solution of the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with parameter θ\theta. Then the following relation holds,

(θ)2=0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCGTexp((2+ε12+L+a2)T),\mathcal{E}(\theta)^{2}=\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right), (47)

where

CG=D(|Rtb1|2+a2|Rtb1|2+ε2|Rtb2|2)d𝒙+0TD(|Rint1|2+|Rint2|2+a2|Rint1|2)d𝒙dt\displaystyle C_{G}=\int_{D}(|R_{tb1}|^{2}+a^{2}|\nabla R_{tb1}|^{2}+\varepsilon^{2}|R_{tb2}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+a^{2}|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2CD|T|12(0TD|Rsb|2ds(𝒙)dt)12,\displaystyle\qquad+2C_{\partial D}|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}},

and CD=a2|D|12(uC1(D×[0,t])+uθC1(D×[0,t]))C_{\partial D}=a^{2}|\partial D|^{\frac{1}{2}}(\|u\|_{C^{1}(\partial D\times[0,t])}+||u_{\theta}||_{C^{1}(\partial D\times[0,t])}).

Theorem 4.4.

Let dd\in\mathbb{N} and T>0T>0, and let uC4(Ω)u\in C^{4}(\Omega) and vC3(Ω)v\in C^{3}(\Omega) be the classical solution to the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with parameter θΘ\theta\in\Theta. Then the following relation holds,

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCTTexp((2+ε12+L+a2)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right)
=𝒪(T(θ,𝒮)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta,\mathcal{S})^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}), (48)

where the constant CTC_{T} is defined by

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+ε2(C(Rtb22)Mtb2d+𝒬MtbD(Rtb22))\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\varepsilon^{2}\left(C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})\right)
+a2(C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2))+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)\displaystyle+a^{2}\left(C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})\right)+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+a2(C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2)),\displaystyle+C_{({R_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+a^{2}\left(C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})\right),
+2CD|T|12(C(Rsb2)Msb2d+𝒬MsbΩ(Rsb2))12.\displaystyle+2C_{\partial D}|T|^{\frac{1}{2}}\left(C_{({R_{sb}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})\right)^{\frac{1}{2}}.

It follows from Theorem 4.4 that the PINN approximation error (θ)2\mathcal{E}(\theta)^{2} can be arbitrarily small, provided that the training error T(θ,𝒮)2\mathcal{E}_{T}(\theta,\mathcal{S})^{2} is sufficiently small and the sample set is sufficiently large.

5 Physics Informed Neural Networks for Approximating Linear Elastodynamic Equation

5.1 Linear Elastodynamic Equation

Consider an elastic body occupying an open, bounded convex polyhedral domain DdD\subset\mathbb{R}^{d}. The boundary D=ΓDΓN\partial D=\Gamma_{D}\cup\Gamma_{N}, with the outward unit normal vector 𝒏\bm{n}, is assumed to be composed of two disjoint portions ΓD\Gamma_{D}\neq\emptyset and ΓN\Gamma_{N}, with ΓDΓN=\Gamma_{D}\cap\Gamma_{N}=\emptyset. Given a suitable external load 𝒇L2((0,T];𝑳2(D))\bm{f}\in L^{2}((0,T];\bm{L}^{2}(D)), and suitable initial/boundary data 𝒈C1((0,T];𝑯12(ΓN))\bm{g}\in C^{1}((0,T];\bm{H}^{\frac{1}{2}}(\Gamma_{N})), 𝝍1𝑯0,ΓD12(D)\bm{\psi}_{1}\in\bm{H}_{0,\Gamma_{D}}^{\frac{1}{2}}(D) and 𝝍2𝑳2(D)\bm{\psi}_{2}\in\bm{L}^{2}(D), we consider the linear elastodynamic equations,

𝒖t𝒗=0inD×[0,T],\displaystyle\bm{u}_{t}-\bm{v}=0\ \quad\quad\qquad\qquad\qquad\qquad\qquad\quad\ \ \,\text{in}\ D\times[0,T], (49a)
ρ𝒗t2μ(𝜺¯(𝒖))λ(𝒖)=𝒇inD×[0,T],\displaystyle\rho\bm{v}_{t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\lambda\nabla(\nabla\cdot\bm{u})=\bm{f}\,\quad\qquad\text{in}\ D\times[0,T], (49b)
𝒖=𝒖dinΓD×[0,T],\displaystyle\bm{u}=\bm{u}_{d}\ \ \,\quad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\text{in}\ \Gamma_{D}\times[0,T], (49c)
2μ𝜺¯(𝒖)𝒏+λ(𝒖)𝒏=𝒈inΓN×[0,T],\displaystyle 2\mu\underline{\bm{\varepsilon}}(\bm{u})\bm{n}+\lambda(\nabla\cdot\bm{u})\bm{n}=\bm{g}\ \ \quad\qquad\qquad\qquad\text{in}\ \Gamma_{N}\times[0,T], (49d)
𝒖=𝝍1inD×{0},\displaystyle\bm{u}=\bm{\psi}_{1}\ \ \quad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\,\text{in}\ D\times\{0\}, (49e)
𝒗=𝝍2inD×{0}.\displaystyle\bm{v}=\bm{\psi}_{2}\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\text{in}\ D\times\{0\}. (49f)

In the above system, \bm{u}=(u_{1},u_{2},\cdots,u_{d}) and \bm{v}=(v_{1},v_{2},\cdots,v_{d}) denote the displacement and the velocity, respectively, and [0,T] (with T>0) denotes the time domain. \underline{\bm{\varepsilon}}(\bm{u}) is the strain tensor, \underline{\bm{\varepsilon}}(\bm{u})=\frac{1}{2}(\nabla\bm{u}+\nabla\bm{u}^{T}). The constants \lambda and \mu are the first and the second Lamé parameters, respectively.

Combining (49a) and (49b), we can recover the classical linear elastodynamics equation:

ρ𝒖tt2μ(𝜺¯(𝒖))λ(𝒖)=𝒇inD×[0,T].\rho\bm{u}_{tt}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\lambda\nabla(\nabla\cdot\bm{u})=\bm{f}\qquad\text{in}\ D\times[0,T]. (50)

The well-posedness of this equation is established in Hughes1978Classical .

Lemma 5.1 (Hughes1978Classical ; Yosida1980Functional ).

Let 𝛙1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝛙2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝐟Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r1r\geq 1. Then there exists a unique solution 𝐮\bm{u} to the classical linear elastodynamic equation (50) such that 𝐮(t=0)=𝛙1\bm{u}(t=0)=\bm{\psi}_{1}, 𝐮t(t=0)=𝛙2\bm{u}_{t}(t=0)=\bm{\psi}_{2} and 𝐮Cl([0,T];Hrl(D))\bm{u}\in C^{l}([0,T];H^{r-l}(D)) with 0lr0\leq l\leq r.

Lemma 5.2.

Let kk\in\mathbb{N}, 𝛙1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝛙2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝐟Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r>d2+kr>\frac{d}{2}+k, then there exists T>0T>0 and a classical solution (𝐮,𝐯)(\bm{u},\bm{v}) to the elastodynamic equations (49) such that 𝐮(t=0)=𝛙1\bm{u}(t=0)=\bm{\psi}_{1}, 𝐮t(t=0)=𝛙2\bm{u}_{t}(t=0)=\bm{\psi}_{2}, 𝐮Ck(D×[0,T])\bm{u}\in C^{k}(D\times[0,T]) and 𝐯Ck1(D×[0,T])\bm{v}\in C^{k-1}(D\times[0,T]).

Proof.

As r>\frac{d}{2}+k, H^{r-k}(D) is a Banach algebra. By Lemma 5.1, there exist T>0 and a solution (\bm{u},\bm{v}) to the linear elastodynamic equations such that \bm{u}(t=0)=\bm{\psi}_{1}, \bm{v}(t=0)=\bm{\psi}_{2}, \bm{u}\in C^{l}([0,T];H^{r-l}(D)) for 0\leq l\leq r and \bm{v}\in C^{l}([0,T];H^{r-1-l}(D)) for 0\leq l\leq r-1.

Since \bm{u}\in\cap_{l=0}^{k}C^{l}([0,T];H^{r-l}(D)) and \bm{v}\in\cap_{l=0}^{k-1}C^{l}([0,T];H^{r-l-1}(D)), applying the Sobolev embedding theorem together with r>\frac{d}{2}+k gives H^{r-l}(D)\subset C^{k-l}(D) for 0\leq l\leq k and H^{r-l-1}(D)\subset C^{k-l-1}(D) for 0\leq l\leq k-1. Therefore, \bm{u}\in C^{k}(D\times[0,T]) and \bm{v}\in C^{k-1}(D\times[0,T]). ∎

5.2 Physics Informed Neural Networks

We now consider the PINN approximation of the linear elastodynamic equations (49). Let \Omega=D\times[0,T], \Omega_{D}=\Gamma_{D}\times[0,T] and \Omega_{N}=\Gamma_{N}\times[0,T] denote the space-time domains. Define the following residuals for the PINN approximation \bm{u}_{\theta}:\Omega\rightarrow\mathbb{R}^{d} and \bm{v}_{\theta}:\Omega\rightarrow\mathbb{R}^{d} for the elastodynamic equations (49):

𝑹int1[𝒖θ,𝒗θ](𝒙,t)=𝒖θt𝒗θ,\displaystyle\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)=\bm{u}_{\theta t}-\bm{v}_{\theta}, (51a)
𝑹int2[𝒖θ,𝒗θ](𝒙,t)=ρ𝒗θt2μ(𝜺¯(𝒖θ))λ(𝒖θ)𝒇,\displaystyle\bm{R}_{int2}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)=\rho\bm{v}_{\theta t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}_{\theta}))-\lambda\nabla(\nabla\cdot\bm{u}_{\theta})-\bm{f}, (51b)
𝑹tb1[𝒖θ](𝒙)=𝒖θ(𝒙,0)𝝍1(𝒙),\displaystyle\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x})=\bm{u}_{\theta}(\bm{x},0)-\bm{\psi}_{1}(\bm{x}), (51c)
𝑹tb2[𝒗θ](𝒙)=𝒗θ(𝒙,0)𝝍2(𝒙),\displaystyle\bm{R}_{tb2}[\bm{v}_{\theta}](\bm{x})=\bm{v}_{\theta}(\bm{x},0)-\bm{\psi}_{2}(\bm{x}), (51d)
𝑹sb1[𝒗θ](𝒙,t)=𝒗θ|ΓD𝒖dt,\displaystyle\bm{R}_{sb1}[\bm{v}_{\theta}](\bm{x},t)=\bm{v}_{\theta}|_{\Gamma_{D}}-\bm{u}_{dt}, (51e)
𝑹sb2[𝒖θ](𝒙,t)=(2μ𝜺¯(𝒖θ)𝒏+λ(𝒖θ)𝒏)|ΓN𝒈.\displaystyle\bm{R}_{sb2}[\bm{u}_{\theta}](\bm{x},t)=(2\mu\underline{\bm{\varepsilon}}(\bm{u}_{\theta})\bm{n}+\lambda(\nabla\cdot\bm{u}_{\theta})\bm{n})|_{\Gamma_{N}}-\bm{g}. (51f)

Note that for the exact solution (\bm{u},\bm{v}), we have \bm{R}_{int1}[\bm{u},\bm{v}]=\bm{R}_{int2}[\bm{u},\bm{v}]=\bm{R}_{tb1}[\bm{u}]=\bm{R}_{tb2}[\bm{v}]=\bm{R}_{sb1}[\bm{v}]=\bm{R}_{sb2}[\bm{u}]=0. With PINN we minimize the following generalization error,

G(θ)2\displaystyle\mathcal{E}_{G}(\theta)^{2} =Ω|𝑹int1[𝒖θ,𝒗θ](𝒙,t)|2d𝒙dt+Ω|𝑹int2[𝒖θ,𝒗θ](𝒙,t)|2d𝒙dt+Ω|𝜺¯(𝑹int1[𝒖θ,𝒗θ](𝒙,t))|2d𝒙dt\displaystyle=\int_{\Omega}|\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\bm{R}_{int2}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{\Omega}|\underline{\bm{\varepsilon}}(\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t))|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t
+Ω|(𝑹int1[𝒖θ,𝒗θ](𝒙,t))|2d𝒙dt+D|𝑹tb1[𝒖θ](𝒙)|2d𝒙+D|𝑹tb2[𝒗θ](𝒙)|2d𝒙\displaystyle+\int_{\Omega}|\nabla\cdot(\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x},t))|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t+\int_{D}|\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\bm{R}_{tb2}[\bm{v}_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+D|𝜺¯(𝑹tb1[𝒖θ](𝒙))|2d𝒙+D|𝑹tb1[𝒖θ](𝒙)|2d𝒙\displaystyle+\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}))|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\nabla\cdot\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x})|^{2}{\,\rm{d}}\bm{x}
+(ΩD|𝑹sb1[𝒗θ](𝒙,t)|2ds(𝒙)dt)12+(ΩN|𝑹sb2[𝒖θ](𝒙,t)|2ds(𝒙)dt)12.\displaystyle+\left(\int_{\Omega_{D}}|\bm{R}_{sb1}[\bm{v}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+\left(\int_{\Omega_{N}}|\bm{R}_{sb2}[\bm{u}_{\theta}](\bm{x},t)|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}. (52)

Let

𝒖^=𝒖θ𝒖,𝒗^=𝒗θ𝒗\hat{\bm{u}}=\bm{u}_{\theta}-\bm{u},\quad\hat{\bm{v}}=\bm{v}_{\theta}-\bm{v}

denote the difference between the solution to the elastodynamic equations (49) and the PINN approximation with parameter θ\theta. We define the total error of the PINN approximation as,

(θ)2=Ω(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dt.\mathcal{E}(\theta)^{2}=\int_{\Omega}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t. (53)

We choose the training set 𝒮D¯×[0,T]\mathcal{S}\subset\overline{D}\times[0,T] based on suitable quadrature points. The full training set is defined by 𝒮=𝒮int𝒮sb𝒮tb\mathcal{S}=\mathcal{S}_{int}\cup\mathcal{S}_{sb}\cup\mathcal{S}_{tb}, and 𝒮sb=𝒮sb1𝒮sb2\mathcal{S}_{sb}=\mathcal{S}_{sb1}\cup\mathcal{S}_{sb2}:

  • Interior training points 𝒮int={zn}\mathcal{S}_{int}=\{{z}_{n}\} for 1nNint1\leq n\leq N_{int}, with each zn=(𝒙,t)nD×(0,T){z}_{n}=(\bm{x},t)_{n}\in D\times(0,T).

  • Spatial boundary training points 𝒮sb1={zn}\mathcal{S}_{sb1}=\{{z}_{n}\} for 1nNsb11\leq n\leq N_{sb1}, with each zn=(𝒙,t)nΓD×(0,T){z}_{n}=(\bm{x},t)_{n}\in\Gamma_{D}\times(0,T), and 𝒮sb2={zn}\mathcal{S}_{sb2}=\{{z}_{n}\} for 1nNsb21\leq n\leq N_{sb2}, with each zn=(𝒙,t)nΓN×(0,T){z}_{n}=(\bm{x},t)_{n}\in\Gamma_{N}\times(0,T).

  • Temporal boundary training points 𝒮tb={𝒙n}\mathcal{S}_{tb}=\{\bm{x}_{n}\} for 1nNtb1\leq n\leq N_{tb} with each 𝒙nD\bm{x}_{n}\in D.

Then, the integrals in (5.2) can be approximated by a suitable numerical quadrature, resulting in the following training loss,

T(θ,𝒮)2\displaystyle\mathcal{E}_{T}(\theta,\mathcal{S})^{2} =Tint1(θ,𝒮int)2+Tint2(θ,𝒮int)2+Tint3(θ,𝒮int)2+Tint4(θ,𝒮int)2+Ttb1(θ,𝒮tb)2\displaystyle=\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{int4}(\theta,\mathcal{S}_{int})^{2}+\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2}
+Ttb2(θ,𝒮tb)2+Ttb3(θ,𝒮tb)2+Ttb4(θ,𝒮tb)2+Tsb1(θ,𝒮sb1)+Tsb2(θ,𝒮sb2),\displaystyle\quad+\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{tb4}(\theta,\mathcal{S}_{tb})^{2}+\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb1})+\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb2}), (54)

where,

Tint1(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int1}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝑹int1[𝒖θ,𝒗θ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (55a)
Tint2(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int2}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝑹int2[𝒖θ,𝒗θ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\bm{R}_{int2}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (55b)
Tint3(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int3}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝜺¯(𝑹int1[𝒖θ,𝒗θ](𝒙intn,tintn))|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\underline{\bm{\varepsilon}}(\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n}))|^{2}, (55c)
Tint4(θ,𝒮int)2\displaystyle\mathcal{E}_{T}^{int4}(\theta,\mathcal{S}_{int})^{2} =n=1Nintωintn|𝑹int1[𝒖θ,𝒗θ](𝒙intn,tintn)|2,\displaystyle=\sum_{n=1}^{N_{int}}\omega_{int}^{n}|\nabla\cdot\bm{R}_{int1}[\bm{u}_{\theta},\bm{v}_{\theta}](\bm{x}_{int}^{n},t_{int}^{n})|^{2}, (55d)
Ttb1(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb1}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝑹tb1[𝒖θ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}_{tb}^{n})|^{2}, (55e)
Ttb2(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb2}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝑹tb2[𝒗θ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\bm{R}_{tb2}[\bm{v}_{\theta}](\bm{x}_{tb}^{n})|^{2}, (55f)
Ttb3(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb3}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝜺¯(𝑹tb1[𝒖θ](𝒙tbn))|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}_{tb}^{n}))|^{2}, (55g)
Ttb4(θ,𝒮tb)2\displaystyle\mathcal{E}_{T}^{tb4}(\theta,\mathcal{S}_{tb})^{2} =n=1Ntbωtbn|𝑹tb1[𝒖θ](𝒙tbn)|2,\displaystyle=\sum_{n=1}^{N_{tb}}\omega_{tb}^{n}|\nabla\cdot\bm{R}_{tb1}[\bm{u}_{\theta}](\bm{x}_{tb}^{n})|^{2}, (55h)
Tsb1(θ,𝒮sb1)2\displaystyle\mathcal{E}_{T}^{sb1}(\theta,\mathcal{S}_{sb1})^{2} =n=1Nsb1ωsb1n|𝑹sb1[𝒗θ](𝒙sbn,tsbn)|2,\displaystyle=\sum_{n=1}^{N_{sb1}}\omega_{sb1}^{n}|\bm{R}_{sb1}[\bm{v}_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}, (55i)
Tsb2(θ,𝒮sb2)2\displaystyle\mathcal{E}_{T}^{sb2}(\theta,\mathcal{S}_{sb2})^{2} =n=1Nsb2ωsb2n|𝑹sb2[𝒖θ](𝒙sbn,tsbn)|2.\displaystyle=\sum_{n=1}^{N_{sb2}}\omega_{sb2}^{n}|\bm{R}_{sb2}[\bm{u}_{\theta}](\bm{x}_{sb}^{n},t_{sb}^{n})|^{2}. (55j)

Here the quadrature points in space-time constitute the data sets 𝒮int={(𝒙intn,tintn)}n=1Nint\mathcal{S}_{int}=\{(\bm{x}_{int}^{n},t_{int}^{n})\}_{n=1}^{N_{int}}, 𝒮tb={𝒙tbn)}n=1Ntb\mathcal{S}_{tb}=\{\bm{x}_{tb}^{n})\}_{n=1}^{N_{tb}}, 𝒮sb1={(𝒙sb1n,tsb1n)}n=1Nsb1\mathcal{S}_{sb1}=\{(\bm{x}_{sb1}^{n},t_{sb1}^{n})\}_{n=1}^{N_{sb1}} and 𝒮sb2={(𝒙sb2n,tsb2n)}n=1Nsb2\mathcal{S}_{sb2}=\{(\bm{x}_{sb2}^{n},t_{sb2}^{n})\}_{n=1}^{N_{sb2}}. ωn\omega_{\star}^{n} denote the suitable quadrature weights with \star being intint, tbtb, sb1sb1 and sb2sb2.
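As a concrete illustration of how the interior residuals (51a)-(51b) could be computed in two spatial dimensions with automatic differentiation, a minimal PyTorch sketch is given below. It uses the identity 2\mu\nabla\cdot\underline{\bm{\varepsilon}}(\bm{u})=\mu(\Delta\bm{u}+\nabla(\nabla\cdot\bm{u})); the network net (mapping (x,y,t) to (u_{1},u_{2},v_{1},v_{2})) and the body-force callable f are assumptions for illustration, not the authors' implementation.

```python
import torch

def elastodynamic_interior_residuals(net, xyt, f, rho=1.0, mu=1.0, lam=1.0):
    """Interior residuals (51a)-(51b) in 2D; uses 2*mu*div(eps(u)) = mu*(Lap u + grad(div u))."""
    xyt = xyt.clone().requires_grad_(True)          # columns: x, y, t
    out = net(xyt)
    u, v = out[:, 0:2], out[:, 2:4]                 # displacement and velocity components

    def grad(y):  # gradient of a scalar field w.r.t. (x, y, t)
        return torch.autograd.grad(y, xyt, torch.ones_like(y), create_graph=True)[0]

    du = [grad(u[:, i:i+1]) for i in range(2)]      # du[i][:, j] = d u_i / d(x, y, t)_j
    dv = [grad(v[:, i:i+1]) for i in range(2)]
    u_t = torch.stack([du[i][:, 2] for i in range(2)], dim=1)
    v_t = torch.stack([dv[i][:, 2] for i in range(2)], dim=1)

    R_int1 = u_t - v                                # residual (51a): u_t - v

    lap_u = torch.stack(
        [sum(grad(du[i][:, j:j+1])[:, j] for j in range(2)) for i in range(2)], dim=1)
    div_u = du[0][:, 0:1] + du[1][:, 1:2]
    grad_div_u = grad(div_u)[:, 0:2]
    # residual (51b): rho*v_t - 2*mu*div(eps(u)) - lam*grad(div u) - f
    R_int2 = rho * v_t - mu * lap_u - (mu + lam) * grad_div_u - f(xyt)
    return R_int1, R_int2
```

The boundary and initial residuals (51c)-(51f), and the quadrature-weighted loss terms (55a)-(55j), would be built from these quantities in the same way as in the scalar case.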

5.3 Error Analysis

Subtracting the elastodynamic equations (49) from the residual equations (51), we obtain

𝑹int1=𝒖^t𝒗^,\displaystyle\bm{R}_{int1}=\hat{\bm{u}}_{t}-\hat{\bm{v}}, (56a)
𝑹int2=ρ𝒗^t2μ(𝜺¯(𝒖^))λ(𝒖^),\displaystyle\bm{R}_{int2}=\rho\hat{\bm{v}}_{t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\hat{\bm{u}}))-\lambda\nabla(\nabla\cdot\hat{\bm{u}}), (56b)
𝑹tb1=𝒖^|t=0,\displaystyle\bm{R}_{tb1}=\hat{\bm{u}}|_{t=0}, (56c)
𝑹tb2=𝒗^|t=0,\displaystyle\bm{R}_{tb2}=\hat{\bm{v}}|_{t=0}, (56d)
𝑹sb1=𝒗^|ΓD,\displaystyle\bm{R}_{sb1}=\hat{\bm{v}}|_{\Gamma_{D}}, (56e)
𝑹sb2=(2μ𝜺¯(𝒖^)𝒏+λ(𝒖^)𝒏)|ΓN.\displaystyle\bm{R}_{sb2}=(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})|_{\Gamma_{N}}. (56f)

The PINN approximation results are summarized in the following three theorems. The proofs of these theorems are provided in the Appendix 8.4.

Theorem 5.3.

Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Let 𝛙1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝛙2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝐟Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r>d2+kr>\frac{d}{2}+k. For every integer N>5N>5, there exist tanh\tanh neural networks (𝐮j)θ(\bm{u}_{j})_{\theta} and (𝐯j)θ(\bm{v}_{j})_{\theta}, with j=1,2,,dj=1,2,\cdots,d, each with two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

𝑹int1L2(Ω),𝑹tb1L2(Ω)lnNNk+1,\displaystyle\|\bm{R}_{int1}\|_{L^{2}(\Omega)},\|\bm{R}_{tb1}\|_{L^{2}(\Omega)}\lesssim{\rm ln}NN^{-k+1}, (57a)
𝑹int2L2(Ω),𝜺¯(𝑹int1)L2(Ω),𝑹int1L2(Ω)ln2NNk+2,\displaystyle\|\bm{R}_{int2}\|_{L^{2}(\Omega)},\|\underline{\bm{\varepsilon}}(\bm{R}_{int1})\|_{L^{2}(\Omega)},\|\nabla\cdot\bm{R}_{int1}\|_{L^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}, (57b)
𝜺¯(𝑹tb1)L2(D),𝑹tb1L2(D),𝑹sb2L2(ΓN×[0,t])ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})\|_{L^{2}(D)},\|\nabla\cdot\bm{R}_{tb1}\|_{L^{2}(D)},\|\bm{R}_{sb2}\|_{L^{2}(\Gamma_{N}\times[0,t])}\lesssim{\rm ln}^{2}NN^{-k+2}, (57c)
𝑹tb2L2(D),𝑹sb1L2(ΓD×[0,t])lnNNk+2.\displaystyle\|\bm{R}_{tb2}\|_{L^{2}(D)},\|\bm{R}_{sb1}\|_{L^{2}(\Gamma_{D}\times[0,t])}\lesssim{\rm ln}NN^{-k+2}. (57d)

It follows from Theorem 5.3 that, by choosing a sufficiently large NN, one can make the PINN residuals in (51), and thus the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2} in (5.2), arbitrarily small.

Theorem 5.4.

Let dd\in\mathbb{N}, 𝐮C1(Ω)\bm{u}\in C^{1}(\Omega) and 𝐯C(Ω)\bm{v}\in C(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝐮θ,𝐯θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCGTexp((2+2μ+λ)T),\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+2\mu+\lambda)T\right),

where

CG=D|𝑹tb1|2d𝒙+D2μ|𝜺¯(𝑹tb1)|2d𝒙+Dλ|𝑹tb1|2d𝒙+ρD|𝑹tb2|2d𝒙\displaystyle C_{G}=\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}
+0TD(|𝑹int1|2+2μ|𝜺¯(𝑹int1)|2+λ|𝑹int1|2+|𝑹int2|2)d𝒙dt\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|\bm{R}_{int1}|^{2}+2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}+\lambda|\nabla\cdot\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2|T|12CΓD(0TΓD|𝑹sb1|2ds(𝒙)dt)12+2|T|12CΓN(0TΓN|𝑹sb2|2ds(𝒙)dt)12,\displaystyle\qquad+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(\int_{0}^{T}\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}},

with CΓD=(2μ+λ)|ΓD|12𝐮C1(ΓD×[0,T])+(2μ+λ)|ΓD|12𝐮θC1(ΓD×[0,T])C_{\Gamma_{D}}=(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}\|\bm{u}\|_{C^{1}(\Gamma_{D}\times[0,T])}+(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}||\bm{u}_{\theta}||_{C^{1}(\Gamma_{D}\times[0,T])} and CΓN=|ΓN|12(𝐯C(ΓN×[0,T])+𝐯θC(ΓN×[0,T]))C_{\Gamma_{N}}=|\Gamma_{N}|^{\frac{1}{2}}(\|\bm{v}\|_{C(\Gamma_{N}\times[0,T])}+||\bm{v}_{\theta}||_{C(\Gamma_{N}\times[0,T])}).

Theorem 5.4 shows that the total error of the PINN approximation (θ)2\mathcal{E}(\theta)^{2} can be controlled by the generalization error G(θ)2\mathcal{E}_{G}(\theta)^{2}.

Theorem 5.5.

Let dd\in\mathbb{N}, 𝐮C4(Ω)\bm{u}\in C^{4}(\Omega) and 𝐯C3(Ω)\bm{v}\in C^{3}(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝐮θ,𝐯θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCTTexp((2+2μ+λ)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+2\mu+\lambda)T\right)
=𝒪(T(θ)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta)^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}), (58)

where

CT=\displaystyle C_{T}= C(𝑹tb12)Mtb2d+𝒬MtbD(𝑹tb12)+ρ(C(𝑹tb22)Mtb2d+𝒬MtbD(𝑹tb22))+2μ(C(|𝜺¯(𝑹tb1)|2)Mtb2d+𝒬MtbD(|𝜺¯(𝑹tb1)|2))\displaystyle C_{({\bm{R}_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})+\rho\left(C_{({\bm{R}_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})\right)+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})\right)
+λ(C(|𝑹tb1|2)Mtb2d+𝒬MtbD(|𝑹tb1|2))+C(𝑹int12)Mint2d+1+𝒬MintΩ(𝑹int12)\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})\right)+C_{({\bm{R}_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})
+C(𝑹int22)Mint2d+1+𝒬MintΩ(𝑹int22)+2μ(C(|𝜺¯(𝑹int1)|2)Mint2d+1+𝒬MintΩ(|𝜺¯(𝑹int1)|2))\displaystyle+C_{({\bm{R}_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})\right)
+λ(C(|𝑹int1|2)Mint2d+1+𝒬MintΩ(|𝑹int1|2))+2|T|12CΓD(C(𝑹sb12)Msb12d+𝒬Msb1ΩD(𝑹sb12))12\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})\right)+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(C_{({\bm{R}_{sb1}^{2}})}M_{sb1}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})\right)^{\frac{1}{2}}
+2|T|12CΓN(C(𝑹sb22)Msb22d+𝒬Msb2ΩN(𝑹sb22))12.\displaystyle+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(C_{({\bm{R}_{sb2}^{2}})}M_{sb2}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})\right)^{\frac{1}{2}}.

Theorem 5.5 shows that the PINN approximation error (θ)2\mathcal{E}(\theta)^{2} can be controlled by the training error T(θ,𝒮)2\mathcal{E}_{T}(\theta,\mathcal{S})^{2} with a large enough sample set 𝒮\mathcal{S}.

6 Numerical Examples

The theoretical analyses from Sections 3 to 5 suggest several forms for the PINN loss function with the wave, Sine-Gordon and the linear elastodynamic equations. These forms contain certain non-standard terms, such as boundary residual terms that enter under a square root (rather than being squared) and gradient terms of some residuals in the interior and on the initial boundary, which would generally be absent from the canonical PINN formulation of the loss function. The presence of such non-standard terms is crucial to bounding the PINN approximation errors, as shown in the error analyses.

These non-standard forms of the loss function lead to a variant PINN algorithm. In this section we illustrate the performance of the variant PINN algorithm as suggested by the theoretical analysis, as well as the more standard PINN algorithm, using several numerical examples in one spatial dimension (1D) plus time for the wave equation and the Sine-Gordon equation, and in two spatial dimensions (2D) plus time for the linear elastodynamic equation.

The following settings are common to all the numerical simulations in this section. Let (\bm{x},t)\in D\times[0,T] denote the spatial and temporal coordinates in the spatial-temporal domain, where \bm{x}=x and \bm{x}=(x,y) for one and two spatial dimensions, respectively. For the wave equation and the Sine-Gordon equation, the neural networks contain two nodes in the input layer (representing x and t), two hidden layers with the number of nodes to be specified later, and two nodes in the output layer (representing the solution u and its time derivative v=\frac{\partial u}{\partial t}). For the linear elastodynamic equation, three input nodes and four output nodes are employed in the neural network, as will be explained in more detail later. We employ the \tanh (hyperbolic tangent) activation function for all the hidden nodes, and no activation function is applied to the output nodes (i.e. they are linear). For training the neural networks, we employ N collocation points within the spatial-temporal domain drawn from a uniform random distribution, and also N uniform random points on each spatial boundary and on the initial boundary. In the simulations the value of N is varied systematically among 1000, 1500, 2000, 2500 and 3000. After the neural networks are trained, for the wave equation and the Sine-Gordon equation, we compare the PINN solution and the exact solution on a set of N_{ev}=3000\times 3000 uniform spatial-temporal grid points (evaluation points) (x,t)_{n}\in D\times[0,T] (n=1,\cdots,N_{ev}) that covers the problem domain and the boundaries. For the elastodynamic equation, we compare the PINN solution and the exact solution at different time instants; at each time instant the solutions are evaluated on a uniform set of N_{ev}=1500\times 1500 grid points in the spatial domain, \bm{x}_{n}=(x,y)_{n}\in D (n=1,\cdots,N_{ev}).
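For reference, a minimal PyTorch sketch of the network architecture described above (two tanh hidden layers and a linear output layer) is given below; the hidden-layer widths shown are the ones adopted later for the wave equation (90 and 60 nodes) and are otherwise placeholders.

```python
import torch

class PINN(torch.nn.Module):
    """Feed-forward network with two tanh hidden layers and a linear output layer.
    For the wave/Sine-Gordon tests: 2 inputs (x, t) and 2 outputs (u, v)."""
    def __init__(self, n_in=2, hidden=(90, 60), n_out=2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_in, hidden[0]), torch.nn.Tanh(),
            torch.nn.Linear(hidden[0], hidden[1]), torch.nn.Tanh(),
            torch.nn.Linear(hidden[1], n_out),   # no activation on the output nodes
        )

    def forward(self, xt):
        return self.net(xt)
```

For the elastodynamic example one would instead set n_in=3 (for x, y, t) and n_out=4 (the two displacement and two velocity components).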

The PINN errors reported below are computed as follows. Let z_{n}=(\bm{x},t)_{n} ((\bm{x},t)_{n}\in D\times[0,T], n=1,\cdots,N_{ev}) denote the set of uniform grid points, where N_{ev} denotes the number of evaluation points. The errors of PINN are defined by,

l2-error=n=1Nev|u(zn)uθ(zn)|2n=1Nevu(zn)2=(n=1Nev|u(zn)uθ(zn)|2)/Nev(n=1Nevu(zn)2)/Nev,\displaystyle l_{2}\text{-error}=\frac{\sqrt{\sum_{n=1}^{N_{ev}}|u(z_{n})-u_{\theta}(z_{n})|^{2}}}{\sqrt{\sum_{n=1}^{N_{ev}}u(z_{n})^{2}}}=\frac{\sqrt{\left(\sum_{n=1}^{N_{ev}}|u(z_{n})-u_{\theta}(z_{n})|^{2}\right)/N_{ev}}}{\sqrt{\left(\sum_{n=1}^{N_{ev}}u(z_{n})^{2}\right)/N_{ev}}}, (59a)
l-error=max{|u(zn)uθ(zn)|}n=1Nev(n=1Nevu(zn)2)/Nev,\displaystyle l_{\infty}\text{-error}=\frac{\max\{|u(z_{n})-u_{\theta}(z_{n})|\}_{n=1}^{N_{ev}}}{\sqrt{\left(\sum_{n=1}^{N_{ev}}u(z_{n})^{2}\right)/N_{ev}}}, (59b)

where uθu_{\theta} denotes the PINN solution and uu denotes the exact solution.
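A minimal NumPy sketch of the error metrics (59a)-(59b), assuming the exact and PINN solutions have already been evaluated on the N_{ev} grid points:

```python
import numpy as np

def pinn_errors(u_exact, u_pinn):
    """Relative l2 and l_inf errors on the evaluation grid, cf. (59a)-(59b)."""
    diff = u_exact - u_pinn
    rms_exact = np.sqrt(np.mean(u_exact**2))        # sqrt( (sum u^2) / N_ev )
    l2_error = np.sqrt(np.mean(diff**2)) / rms_exact
    linf_error = np.max(np.abs(diff)) / rms_exact
    return l2_error, linf_error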

Our implementation of the PINN algorithm is based on the PyTorch library (pytorch.org). In all the following numerical examples, we combine the Adam kingma2014adam optimizer and the L-BFGS 2006_NumericalOptimization optimizer (in batch mode) to train the neural network. We first employ the Adam optimizer to train the network for 100 epochs/iterations, and then employ the L-BFGS optimizer to continue the network training for another 30000 iterations. We employ the default parameter values in Adam, with the learning rate 0.0010.001, β1=0.9\beta_{1}=0.9 and β2=0.99\beta_{2}=0.99. The initial learning rate 1.01.0 is adopted in the L-BFGS optimizer.
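The two-stage training procedure described above could be set up along the following lines. This is a sketch under the stated settings; the loss callable compute_loss, the L-BFGS history size and the line-search choice are our assumptions and are not specified in the text.

```python
import torch

def train(model, compute_loss, adam_epochs=100, lbfgs_iters=30000):
    # Stage 1: Adam with learning rate 0.001, beta1 = 0.9, beta2 = 0.99
    adam = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.99))
    for _ in range(adam_epochs):
        adam.zero_grad()
        loss = compute_loss(model)
        loss.backward()
        adam.step()

    # Stage 2: L-BFGS in batch mode with initial learning rate 1.0
    lbfgs = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=lbfgs_iters,
                              history_size=50, line_search_fn="strong_wolfe")
    def closure():
        lbfgs.zero_grad()
        loss = compute_loss(model)
        loss.backward()
        return loss
    lbfgs.step(closure)
    return model
```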

6.1 Wave Equation

   

Figure 1: Wave equation: Distributions of the true solutions, the PINN solutions and the PINN point-wise absolute errors for u and v in the spatial-temporal domain: (a) true solution for u, (b) PINN solution u_{\theta}, (c) PINN absolute error for u, (d) true solution for v, (e) PINN solution v_{\theta}, (f) PINN absolute error for v. N=2000 training points within the domain and on each of the domain boundaries.

We next test the PINN algorithm for solving the wave equation (19) in one spatial dimension (plus time), under a configuration in accordance with that of 2021_JCP_Dong_modifiedbatch . Consider the spatial-temporal domain, (x,t)D×[0,T]=[0,5]×[0,2](x,t)\in D\times[0,T]=[0,5]\times[0,2], and the initial-boundary value problem with the wave equation on this domain,

2ut2c22ux2=0,\displaystyle\frac{\partial^{2}u}{\partial t^{2}}-c^{2}\frac{\partial^{2}u}{\partial x^{2}}=0, (60a)
u(0,t)=u(5,t),ux(0,t)=ux(5,t),\displaystyle u(0,t)=u(5,t),\qquad\frac{\partial u}{\partial x}(0,t)=\frac{\partial u}{\partial x}(5,t), (60b)
u(x,0)=2sech3(3δ0(xx0)),ut(x,0)=0,\displaystyle u(x,0)=2\,{\rm sech}^{3}\left(\frac{3}{\delta_{0}}(x-x_{0})\right),\qquad\frac{\partial u}{\partial t}(x,0)=0, (60c)

where u(x,t)u(x,t) is the wave field to be solved for, cc is the wave speed, x0x_{0} is the initial peak location of the wave, δ0\delta_{0} is a constant that controls the width of the wave profile, and the periodic boundary conditions are imposed on x=0x=0 and 55. In the simulations, we employ c=2c=2, δ0=2\delta_{0}=2, and x0=3x_{0}=3. Then the above problem has the solution,

{u(x,t)=sech3(3δ0(2.5+ξ))+sech3(3δ0(2.5+η)),ξ=mod(xx0+ct+2.5,5),η=mod(xx0ct+2.5,5),\left\{\begin{split}&u(x,t)={\rm sech}^{3}\left(\frac{3}{\delta_{0}}\left(-2.5+\xi\right)\right)+{\rm sech}^{3}\left(\frac{3}{\delta_{0}}\left(-2.5+\eta\right)\right),\\ &\xi={\rm mod}\left(x-x_{0}+ct+2.5,5\right),\quad\eta={\rm mod}\left(x-x_{0}-ct+2.5,5\right),\end{split}\right.

where mod refers to the modulo operation. The two terms in u(x,t)u(x,t) represent the leftward- and rightward-traveling waves, respectively.
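For completeness, the exact solution above can be evaluated directly, e.g. with the following NumPy sketch (using L=5 for the domain length and the parameter values c=2, \delta_{0}=2, x_{0}=3 adopted in the simulations):

```python
import numpy as np

def exact_wave_solution(x, t, c=2.0, delta0=2.0, x0=3.0, L=5.0):
    """Exact solution of (60): two counter-propagating sech^3 pulses on a periodic domain."""
    xi = np.mod(x - x0 + c * t + L / 2, L)    # first (leftward-traveling) component
    eta = np.mod(x - x0 - c * t + L / 2, L)   # second (rightward-traveling) component
    sech = lambda z: 1.0 / np.cosh(z)
    return (sech(3.0 / delta0 * (xi - L / 2))**3
            + sech(3.0 / delta0 * (eta - L / 2))**3)
```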

We reformulate the problem (60) into the following system,

utv=0,vtc2uxx=0,\displaystyle u_{t}-v=0,\qquad v_{t}-c^{2}u_{xx}=0, (61a)
u(0,t)=u(5,t),ux(0,t)=ux(5,t),\displaystyle u(0,t)=u(5,t),\qquad u_{x}(0,t)=u_{x}(5,t), (61b)
u(x,0)=2sech3(3δ0(xx0)),v(x,0)=0,\displaystyle u(x,0)=2\,{\rm sech}^{3}\left(\frac{3}{\delta_{0}}(x-x_{0})\right),\qquad v(x,0)=0, (61c)

where v(x,t)v(x,t) is an auxiliary field given by the first equation in (61a).

To solve the system (61) with PINN, we employ 90 and 60 neurons in the first and the second hidden layers of the neural network, respectively. In light of (3.2), we employ the following loss function in PINN,

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W4Nn=1N[uθ(xtbn,0)2sech3(3δ0(xtbnx0))]2\displaystyle+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-2\,{\rm sech}^{3}\left(\frac{3}{\delta}_{0}(x_{tb}^{n}-x_{0})\right)\right]^{2}
+W5Nn=1N[vθ(xtbn,0)]2+W6Nn=1N[uθx(xtbn,0)+18sinh((3xtbn3x0)/δ0)δ0cosh4((3xtbn3x0)/δ0)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)+\frac{18\sinh((3x_{tb}^{n}-3x_{0})/\delta_{0})}{\delta_{0}\cosh^{4}((3x_{tb}^{n}-3x_{0})/\delta_{0})}\right]^{2}
+W7Nn=1N[vθ(0,tsbn)vθ(5,tsbn)]2+W8Nn=1N[uθx(0,tsbn)uθx(5,tsbn)]2.\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[v_{\theta}(0,t_{sb}^{n})-v_{\theta}(5,t_{sb}^{n})\right]^{2}+\frac{W_{8}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(0,t_{sb}^{n})-u_{\theta x}(5,t_{sb}^{n})\right]^{2}. (62)

Note that in the simulations we have employed the same number of collocation points (N) within the domain and on each of the domain boundaries. The above loss function differs slightly from the one in the error analysis (3.2) in several aspects. First, we have added a set of penalty coefficients W_{n}>0 (1\leq n\leq 8) for the different loss terms in the numerical simulations. Second, the collocation points used in the simulations (e.g. x_{int}^{n}, t_{int}^{n}, x_{sb}^{n}, t_{sb}^{n}, x_{tb}^{n}) are generated randomly within the domain or on the domain boundaries from a uniform distribution. In addition, the averaging used here does not exactly correspond to the numerical quadrature rule (mid-point rule) used in the theoretical analysis.
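To illustrate, a PyTorch sketch of assembling the loss form (62) with autograd is given below; the network net follows the architecture sketched earlier, and the collocation tensors are assumed to be column vectors of uniformly sampled random points. The factor c^{2} in the second residual follows the reformulated system (61a). This is an illustrative sketch, not the authors' code.

```python
import torch

def pinn_f1_loss(net, x_int, t_int, x_tb, t_sb, W, c=2.0, delta0=2.0, x0=3.0):
    """Loss form #1 in (62) for the wave system (61); W holds the penalty coefficients W_1..W_8."""
    def fields(x, t):
        x = x.clone().requires_grad_(True)
        t = t.clone().requires_grad_(True)
        out = net(torch.cat([x, t], dim=1))
        u, v = out[:, 0:1], out[:, 1:2]
        g = lambda y, z: torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]
        u_t, u_x, v_t, v_x = g(u, t), g(u, x), g(v, t), g(v, x)
        return u, v, u_t, u_x, g(u_x, x), g(u_t, x), v_t, v_x  # includes u_xx and u_tx

    mse = lambda r: (r**2).mean()

    # interior residuals of (61a) and the spatial gradient of the first residual
    u, v, u_t, u_x, u_xx, u_tx, v_t, v_x = fields(x_int, t_int)
    loss = W[0] * mse(u_t - v) + W[1] * mse(v_t - c**2 * u_xx) + W[2] * mse(u_tx - v_x)

    # initial conditions at t = 0 (value, velocity, and x-derivative of the value)
    u0, v0, _, u0_x, _, _, _, _ = fields(x_tb, torch.zeros_like(x_tb))
    s = 3.0 / delta0 * (x_tb - x0)
    loss += W[3] * mse(u0 - 2.0 / torch.cosh(s)**3)
    loss += W[4] * mse(v0)
    loss += W[5] * mse(u0_x + 18.0 * torch.sinh(s) / (delta0 * torch.cosh(s)**4))

    # periodic boundary conditions at x = 0 and x = 5
    _, v_l, _, ux_l, _, _, _, _ = fields(torch.zeros_like(t_sb), t_sb)
    _, v_r, _, ux_r, _, _, _, _ = fields(torch.full_like(t_sb, 5.0), t_sb)
    loss += W[6] * mse(v_l - v_r) + W[7] * mse(ux_l - ux_r)
    return loss
```

The loss form #2 in (63) would differ only in the last two terms, which would use mean absolute values instead of mean squares.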

We have also considered another form (given below) for the loss function, as suggested by an alternate analysis as discussed in Remark 3.5 (see equation (3.5)),

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W4Nn=1N[uθ(xtbn,0)2sech3(3δ0(xtbnx0))]2\displaystyle+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-2\,{\rm sech}^{3}\left(\frac{3}{\delta}_{0}(x_{tb}^{n}-x_{0})\right)\right]^{2}
+W5Nn=1N[vθ(xtbn,0)]2+W6Nn=1N[uθx(xtbn,0)+18sinh((3xtbn3x0)/δ0)δ0cosh4((3xtbn3x0)/δ0)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)+\frac{18\sinh((3x_{tb}^{n}-3x_{0})/\delta_{0})}{\delta_{0}\cosh^{4}((3x_{tb}^{n}-3x_{0})/\delta_{0})}\right]^{2}
+W7Nn=1N|vθ(0,tsbn)vθ(5,tsbn)|+W8Nn=1N|uθx(0,tsbn)uθx(5,tsbn)|.\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left|v_{\theta}(0,t_{sb}^{n})-v_{\theta}(5,t_{sb}^{n})\right|+\frac{W_{8}}{N}\sum_{n=1}^{N}\left|u_{\theta x}(0,t_{sb}^{n})-u_{\theta x}(5,t_{sb}^{n})\right|. (63)

The difference between this form and the form (62) lies in the last two terms, which are not squared here.

The loss function (62) will be referred to as the loss form #1 in subsequent discussions, and (63) will be referred to as the loss form #2. The PINN schemes that employ these two different loss forms will be referred to as PINN-F1 and PINN-F2, respectively.

Figure 1 shows distributions of the exact solutions, the PINN solutions, and the PINN point-wise absolute errors for uu and vv in the spatial-temporal domain. Here the PINN solution is computed by PINN-F1, in which penalty coefficients are given by 𝑾=(W1,,W8)=(0.8,0.8,0.8,0.5,0.5,0.5,0.9,0.9)\bm{W}=(W_{1},\dots,W_{8})=(0.8,0.8,0.8,0.5,0.5,0.5,0.9,0.9). One can observe that the method has captured the wave fields for uu and ut\frac{\partial u}{\partial t} reasonably well, with the error for uu notably smaller than that of ut\frac{\partial u}{\partial t}.

Figure 2: Wave equation: Comparison of profiles of u (top row) and its absolute error (bottom row) between the PINN solutions (loss forms #1 and #2) and the exact solution at time instants (a) t=0.5, (b) t=1.0, and (c) t=1.5. N=2000 training data points within the domain and on each of the domain boundaries (x=0 and 5, and t=0).
Figure 3: Wave equation: Comparison of the profiles of v=\frac{\partial u}{\partial t} (top row) and its absolute error (bottom row) between the PINN solutions (loss forms #1 and #2) and the exact solution at time instants (a) t=0.5, (b) t=1.0, and (c) t=1.5. N=2000 training data points within the domain and on each of the domain boundaries (x=0 and 5, and t=0).
Figure 4: Wave equation: Histories of the loss function versus the training iteration with (a) PINN-F1 and (b) PINN-F2, corresponding to different numbers of training data points (N).

Figures 2 and 3 provide a comparison of the solutions obtained using the two forms of loss functions. Figure 2 compares profiles of the PINN-F1 and PINN-F2 solutions, and the exact solution, for u (top row) at three time instants (t=0.5, 1.0, and 1.5), as well as the error profiles (bottom row). Figure 3 shows the corresponding results for the field variable v=\frac{\partial u}{\partial t}. These results are obtained by using N=2000 training data points in the domain and on each of the domain boundaries. It is observed that both PINN schemes, with the loss functions given by (62) and (63) respectively, have captured the solution reasonably well. We further observe that the PINN-F1 scheme (with the loss form (62)) produces notably more accurate results than PINN-F2 (with the loss form (63)), especially for the field \frac{\partial u}{\partial t}.

We have varied the number of training data points N systematically and studied its effect on the PINN results. Figure 4 shows the loss histories of PINN-F1 and PINN-F2 corresponding to different numbers of training data points (N) in the simulations, with a total of 30,000 training iterations. We can make two observations. First, the history curves with the loss function form #1 are generally smoother, indicating that the loss function decreases almost monotonically as the training progresses. On the other hand, significant fluctuations in the loss history can be observed with the form #2. Second, the eventual loss values produced by the loss form #1 are significantly smaller, by over an order of magnitude, than those produced by the loss form #2.

Table 1 provides a further comparison between PINN-F1 and PINN-F2. Here the l_{2} and l_{\infty} errors of u and v computed by PINN-F1 and PINN-F2 are listed for different numbers of training data points (N). There appears to be a general trend that the errors tend to decrease with an increasing number of training points, but the decrease is not monotonic. It can be observed that the u errors are notably smaller than those for v=\frac{\partial u}{\partial t}, as observed earlier in e.g. Figure 1. One can again observe that the PINN-F1 results are notably more accurate than those of PINN-F2 for the wave equation.

Table 1: Wave equation: The u and v errors versus the number of training data points N.

method | N | l_2-error (u_\theta) | l_2-error (v_\theta) | l_\infty-error (u_\theta) | l_\infty-error (v_\theta)
PINN-F1 | 1000 | 5.7013e-03 | 1.3531e-02 | 1.8821e-02 | 4.6631e-02
PINN-F1 | 1500 | 2.1689e-03 | 4.1035e-03 | 6.7631e-03 | 1.5109e-02
PINN-F1 | 2000 | 4.6896e-03 | 9.6417e-03 | 1.3828e-02 | 3.3063e-02
PINN-F1 | 2500 | 3.7879e-03 | 9.8574e-03 | 1.2868e-02 | 3.3622e-02
PINN-F1 | 3000 | 2.6588e-03 | 6.0746e-03 | 8.1457e-03 | 1.9860e-02
PINN-F2 | 1000 | 4.7281e-02 | 9.2431e-02 | 1.4367e-01 | 3.2764e-01
PINN-F2 | 1500 | 4.9087e-02 | 1.2438e-01 | 2.1525e-01 | 5.0601e-01
PINN-F2 | 2000 | 1.8554e-02 | 4.9224e-02 | 6.0780e-02 | 1.6358e-01
PINN-F2 | 2500 | 2.3526e-02 | 5.4266e-02 | 9.8690e-02 | 1.9467e-01
PINN-F2 | 3000 | 1.4164e-02 | 3.7796e-02 | 5.3045e-02 | 1.4179e-01
Figure 5: Wave equation: The l^{2} errors of u, \frac{\partial u}{\partial t}, and \frac{\partial u}{\partial x} as a function of the training loss value, for (a) PINN-F1 and (b) PINN-F2. N=2000 training data points.

Theorem 3.6 suggests the solution errors for uu, v=utv=\frac{\partial u}{\partial t}, and u\nabla u approximately scale as the square root of the training loss function. Figure 5 provides some numerical evidence for this point. Here we plot the l2l^{2} errors for uu, ut\frac{\partial u}{\partial t} and ux\frac{\partial u}{\partial x} from our simulations as a function of the training loss value for PINN-F1 and PINN-F2 in logarithmic scales. It is evident that for PINN-F1 the scaling essentially follows the square root relation. For PINN-F2 the relation between the error and the training loss appears to scale with a power somewhat larger than 12\frac{1}{2}.

6.2 Sine-Gordon Equation

(a) True solution for uu; (b) PINN solution for uu; (c) Solution error for uu; (d) True solution for vv; (e) PINN solution for vv; (f) Solution error for vv
Figure 6: Sine-Gordon equation: Distributions of the exact solution (left column), the PINN solution (middle column) and the PINN absolute error (right column) for uu (top row) and for v=utv=\frac{\partial u}{\partial t} (bottom row). N=2000N=2000 collocation points within the domain and on the domain boundaries.

In this subsection we test the PINN algorithm suggested by the theoretical analysis on the Sine-Gordon equation (38). Consider the spatial-temporal domain (x,t)Ω=D×[0,T]=[0,1]×[0,2](x,t)\in\Omega=D\times[0,T]=[0,1]\times[0,2], and the following initial/boundary value problem on this domain,

2ut22ux2+u+sin(u)=f(x,t),\displaystyle\frac{\partial^{2}u}{\partial t^{2}}-\frac{\partial^{2}u}{\partial x^{2}}+u+\sin(u)=f(x,t), (64a)
u(0,t)=ϕ1(t),u(1,t)=ϕ2(t),\displaystyle u({0},t)=\phi_{1}(t),\qquad u({1},t)=\phi_{2}(t), (64b)
u(x,0)=ψ1(x),ut(x,0)=ψ2(x).\displaystyle u({x},0)=\psi_{1}({x}),\qquad\frac{\partial u}{\partial t}({x},0)=\psi_{2}({x}). (64c)

In these equations, u(x,t)u(x,t) is the field function to be solved for, f(x,t)f(x,t) is a source term, ψ1\psi_{1} and ψ2\psi_{2} are the initial conditions, and ϕ1\phi_{1} and ϕ2\phi_{2} are the boundary conditions. The source term and the initial/boundary conditions are chosen appropriately to match the following exact solution,

u(x,t)=[2cos(πx+π5)+95cos(2πx+7π20)][2cos(πt+π5)+95cos(2πt+7π20)].\displaystyle u(x,t)=\left[2\cos\left(\pi x+\frac{\pi}{5}\right)+\frac{9}{5}\cos\left(2\pi x+\frac{7\pi}{20}\right)\right]\left[2\cos\left(\pi t+\frac{\pi}{5}\right)+\frac{9}{5}\cos\left(2\pi t+\frac{7\pi}{20}\right)\right]. (65)
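For reference, the source term and the initial/boundary data implied by this exact solution can be generated symbolically. The sketch below uses sympy; this is an implementation convenience assumed here, not a procedure prescribed by the paper.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)

# Exact solution (65).
u = (2*sp.cos(sp.pi*x + sp.pi/5) + sp.Rational(9, 5)*sp.cos(2*sp.pi*x + 7*sp.pi/20)) * \
    (2*sp.cos(sp.pi*t + sp.pi/5) + sp.Rational(9, 5)*sp.cos(2*sp.pi*t + 7*sp.pi/20))

# Source term from (64a): f = u_tt - u_xx + u + sin(u).
f = sp.diff(u, t, 2) - sp.diff(u, x, 2) + u + sp.sin(u)

# Initial data (64c) and boundary data (64b).
psi1, psi2 = u.subs(t, 0), sp.diff(u, t).subs(t, 0)
phi1, phi2 = u.subs(x, 0), u.subs(x, 1)

# Fast numerical callback for evaluating f at the collocation points.
f_num = sp.lambdify((x, t), f, 'numpy')
```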

To simulate this problem with PINN, we reformulate the problem as follows,

utv=0,\displaystyle u_{t}-v=0, (66a)
vtuxx+u+sin(u)=f(x,t),\displaystyle v_{t}-u_{xx}+u+\sin(u)=f(x,t), (66b)
u(0,t)=ϕ1(t),u(1,t)=ϕ2(t),\displaystyle u({0},t)=\phi_{1}(t),\qquad u({1},t)=\phi_{2}(t), (66c)
u(x,0)=ψ1(x),v(x,0)=ψ2(x),\displaystyle u({x},0)=\psi_{1}({x}),\qquad v({x},0)=\psi_{2}({x}), (66d)

where vv is a variable defined by equation (66a).

In light of (4.2), we employ the following loss function in PINN,

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)+uθ(xintn,tintn)+sin(uθ(xintn,tintn))f(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})+u_{\theta}(x_{int}^{n},t_{int}^{n})+\sin(u_{\theta}(x_{int}^{n},t_{int}^{n}))-f(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2+W4Nn=1N[uθ(xtbn,0)ψ1(xtbn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-\psi_{1}(x_{tb}^{n})\right]^{2}
+W5Nn=1N[vθ(xtbn,0)ψ2(xtbn)]2+W6Nn=1N[uθx(xtbn,0)ψ1x(xtbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)-\psi_{2}(x_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)-\psi_{1x}(x_{tb}^{n})\right]^{2}
+W7Nn=1N[|vθ(0,tsbn)ϕ1t(tsbn)|+|vθ(1,tsbn)ϕ2t(tsbn)|],\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[|v_{\theta}(0,t_{sb}^{n})-\phi_{1t}({t_{sb}^{n}})|+|v_{\theta}(1,t_{sb}^{n})-\phi_{2t}({t_{sb}^{n}})|\right], (67)

where Wn>0W_{n}>0 (1n71\leq n\leq 7) are the penalty coefficients for the different loss terms, added in the PINN implementation. It should be noted that the loss terms with the coefficients W3W_{3} and W6W_{6} will be absent from the conventional PINN formulation (see Raissi2019pinn ). These terms are required in the training loss according to the error analysis in Section 4. It should also be noted that the W7W_{7} loss terms are not squared, as dictated by the theoretical analysis of Section 4.
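To make the structure of (67) concrete, the sketch below assembles its residual terms with automatic differentiation in PyTorch. The names net, f, psi1, psi2, psi1_x, phi1_t and phi2_t are hypothetical (a network returning (u_theta, v_theta), and callables for the given data); this is a minimal illustration of the loss form, not the authors' implementation.

```python
import torch

def grad(out, var):
    # First derivative of a per-sample scalar output w.r.t. an input tensor.
    return torch.autograd.grad(out, var, grad_outputs=torch.ones_like(out),
                               create_graph=True)[0]

def pinn_g1_loss(net, x_int, t_int, x_tb, t_sb, f, psi1, psi2, psi1_x, phi1_t, phi2_t, W):
    # Interior residuals (the W1, W2, W3 terms of (67)).
    x_int.requires_grad_(True); t_int.requires_grad_(True)
    u, v = net(x_int, t_int)                      # u_theta, v_theta at interior points
    u_t, u_x = grad(u, t_int), grad(u, x_int)
    u_xx, u_tx = grad(u_x, x_int), grad(u_t, x_int)
    v_t, v_x = grad(v, t_int), grad(v, x_int)
    r1 = u_t - v
    r2 = v_t - u_xx + u + torch.sin(u) - f(x_int, t_int)
    r3 = u_tx - v_x                               # gradient of the first equation residual
    # Initial-condition residuals (the W4, W5, W6 terms).
    x_tb.requires_grad_(True)
    u0, v0 = net(x_tb, torch.zeros_like(x_tb))
    r4, r5 = u0 - psi1(x_tb), v0 - psi2(x_tb)
    r6 = grad(u0, x_tb) - psi1_x(x_tb)            # gradient of the initial-condition residual
    # Boundary residuals (the W7 terms); note these are NOT squared, per (67).
    _, v_left = net(torch.zeros_like(t_sb), t_sb)
    _, v_right = net(torch.ones_like(t_sb), t_sb)
    r7 = (v_left - phi1_t(t_sb)).abs() + (v_right - phi2_t(t_sb)).abs()
    msq = lambda r: (r**2).mean()
    return (W[0]*msq(r1) + W[1]*msq(r2) + W[2]*msq(r3) + W[3]*msq(r4)
            + W[4]*msq(r5) + W[5]*msq(r6) + W[6]*r7.mean())
```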

We have also implemented a PINN scheme with a variant form for the loss function,

Loss=\displaystyle\text{Loss}= W1Nn=1N[uθt(xintn,tintn)vθ(xintn,tintn)]2\displaystyle\frac{W_{1}}{N}\sum_{n=1}^{N}\left[u_{\theta t}(x_{int}^{n},t_{int}^{n})-v_{\theta}(x_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[vθt(xintn,tintn)uθxx(xintn,tintn)+uθ(xintn,tintn)+sin(uθ(xintn,tintn))f(xintn,tintn)]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[v_{\theta t}(x_{int}^{n},t_{int}^{n})-u_{\theta xx}(x_{int}^{n},t_{int}^{n})+u_{\theta}(x_{int}^{n},t_{int}^{n})+\sin(u_{\theta}(x_{int}^{n},t_{int}^{n}))-f(x_{int}^{n},t_{int}^{n})\right]^{2}
+W3Nn=1N[uθtx(xintn,tintn)vθx(xintn,tintn)]2+W4Nn=1N[uθ(xtbn,0)ψ1(xtbn)]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[u_{\theta tx}(x_{int}^{n},t_{int}^{n})-v_{\theta x}(x_{int}^{n},t_{int}^{n})\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[u_{\theta}(x_{tb}^{n},0)-\psi_{1}(x_{tb}^{n})\right]^{2}
+W5Nn=1N[vθ(xtbn,0)ψ2(xtbn)]2+W6Nn=1N[uθx(xtbn,0)ψ1x(xtbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[v_{\theta}(x_{tb}^{n},0)-\psi_{2}(x_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[u_{\theta x}(x_{tb}^{n},0)-\psi_{1x}(x_{tb}^{n})\right]^{2}
+W7Nn=1N[(vθ(0,tsbn)ϕ1t(tsbn))2+(vθ(1,tsbn)ϕ2t(tsbn))2].\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[(v_{\theta}(0,t_{sb}^{n})-\phi_{1t}({t_{sb}^{n}}))^{2}+(v_{\theta}(1,t_{sb}^{n})-\phi_{2t}({t_{sb}^{n}}))^{2}\right]. (68)

The difference between (67) and (68) lies in the W7W_{7} terms: these terms are squared in (68), while they are not squared in (67). We refer to the PINN scheme employing the loss function (67) as PINN-G1, and the scheme employing the loss function (68) as PINN-G2.

In the simulations we employ a feed-forward neural network with two input nodes (representing xx and tt), two output nodes (representing uu and vv), and two hidden layers, each having a width of 8080 nodes. The tanh\tanh activation function has been used for all the hidden nodes. We employ NN collocation points generated from a uniform random distribution within the domain, on each of the domain boundaries, and also on the initial boundary, where NN is varied systematically in the simulations. The penalty coefficients in the loss functions are taken to be 𝑾=(W1,,W7)=(0.5,0.4,0.5,0.6,0.6,0.6,0.8)\bm{W}=(W_{1},\dots,W_{7})=(0.5,0.4,0.5,0.6,0.6,0.6,0.8).
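For completeness, a minimal sketch of the network and collocation-point setup described above is given below (PyTorch, with the domain [0,1]x[0,2] of this subsection); the class and variable names are hypothetical, and the optimizer/training loop is omitted since those details are not specific to this subsection.

```python
import torch
import torch.nn as nn

class SineGordonNet(nn.Module):
    # Two inputs (x, t), two tanh hidden layers of width 80, two outputs (u_theta, v_theta).
    def __init__(self, width=80):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 2))

    def forward(self, x, t):
        uv = self.body(torch.cat([x, t], dim=-1))
        return uv[..., :1], uv[..., 1:]

net = SineGordonNet()
N = 2000
# Uniform random collocation points on (x, t) in [0, 1] x [0, 2].
x_int, t_int = torch.rand(N, 1), 2.0 * torch.rand(N, 1)   # interior points
x_tb = torch.rand(N, 1)                                    # initial boundary, t = 0
t_sb = 2.0 * torch.rand(N, 1)                              # spatial boundaries x = 0 and x = 1
W = (0.5, 0.4, 0.5, 0.6, 0.6, 0.6, 0.8)                    # penalty coefficients from the text
```

The loss sketched after (67) above can then be evaluated on these tensors and minimized with a standard gradient-based optimizer.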

(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 7: Sine-Gordon equation: Top row, comparison of profiles between the exact solution and PINN-G1/PINN-G2 solutions for uu at several time instants. Bottom row, profiles of the absolute error of the PINN-G1 and PINN-G2 solutions for uu. N=2000N=2000 training collocation points.
(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 8: Sine-Gordon equation: Top row, comparison of profiles between the exact solution and PINN-G1/PINN-G2 solutions for v=utv=\frac{\partial u}{\partial t} at several time instants. Bottom row, profiles of the absolute error of the PINN-G1 and PINN-G2 solutions for vv. N=2000N=2000 training collocation points.

Figure 6 shows distributions of u(x,t)u(x,t) and v=utv=\frac{\partial u}{\partial t} from the exact solution (left column) and the PINN solution (middle column), as well as the point-wise absolute errors of the PINN solution for these fields (right column). These results are obtained by PINN-G2 with N=2000N=2000 random collocation points within the domain and on each of the domain boundaries. The PINN solution is in good agreement with the true solution.

Figures 7 and 8 compare the profiles of uu and vv between the exact solution and the solutions obtained by PINN-G1 and PINN-G2, at several time instants (t=0.5t=0.5, 11 and 1.51.5). Profiles of the absolute errors of the PINN-G1/PINN-G2 solutions are also shown in these figures. We observe that both PINN-G1 and PINN-G2 have captured the solution for uu quite accurately, and to a lesser extent, also for vv. Comparison of the error profiles between PINN-G1 and PINN-G2 suggests that the PINN-G2 errors are in general somewhat smaller than those of PINN-G1, although this is not consistently the case over the entire domain.

(a) PINN-G1; (b) PINN-G2
Figure 9: Sine-Gordon equation: Loss histories of (a) PINN-G1 and (b) PINN-G2 corresponding to various numbers of training collocation points.
Table 2: Sine-Gordon equation: The l2l_{2} and ll_{\infty} errors for uu and vv versus the number of training collocation points NN corresponding to PINN-G1 and PINN-G2.
method | NN | l2l_{2}-error (uθu_{\theta}, vθv_{\theta}) | ll_{\infty}-error (uθu_{\theta}, vθv_{\theta})
PINN-G1 | 1000 | 3.0818e-03, 4.3500e-03 | 9.6044e-03, 1.8894e-02
PINN-G1 | 1500 | 3.4335e-03, 4.8035e-03 | 1.0566e-02, 1.7050e-02
PINN-G1 | 2000 | 2.1914e-03, 3.0055e-03 | 7.5882e-03, 1.1099e-02
PINN-G1 | 2500 | 3.0172e-03, 3.5698e-03 | 9.2515e-03, 1.4645e-02
PINN-G1 | 3000 | 2.5281e-03, 4.4858e-03 | 7.2785e-03, 1.6213e-02
PINN-G2 | 1000 | 3.0674e-03, 2.0581e-03 | 7.3413e-03, 1.1323e-02
PINN-G2 | 1500 | 1.0605e-03, 1.4729e-03 | 2.2914e-03, 6.2831e-03
PINN-G2 | 2000 | 2.2469e-03, 1.6072e-03 | 4.8842e-03, 8.8320e-03
PINN-G2 | 2500 | 6.6072e-04, 6.0509e-04 | 1.4099e-03, 4.3423e-03
PINN-G2 | 3000 | 6.6214e-04, 1.0830e-03 | 1.9697e-03, 7.8866e-03

The effect of the collocation points on the PINN results has been studied by varying the number of training collocation points systematically between N=1000N=1000 and N=3000N=3000 within the domain and on each of the domain boundaries. The results are provided in Figure 9 and Table 2. Figure 9 shows histories of the loss function corresponding to different numbers of collocation points for PINN-G1 and PINN-G2. Table 2 provides the l2l_{2} and ll_{\infty} errors of uu and vv versus the number of collocation points computed by PINN-G1 and PINN-G2. The PINN errors in general tend to decrease with an increasing number of collocation points, but this trend is not monotonic. It can be observed that both PINN-G1 and PINN-G2 have captured the solutions quite accurately, with the PINN-G2 errors in general slightly smaller.

(a) PINN-G1; (b) PINN-G2
Figure 10: Sine-Gordon equation: The l2l^{2} errors of uu, ut\frac{\partial u}{\partial t}, and ux\frac{\partial u}{\partial x} as a function of the training loss value.

Figure 10 provides some numerical evidence for the relation between the total error and the training loss as suggested by Theorem 4.4. Here we plot the l2l_{2} errors for uu, vv and ux\frac{\partial u}{\partial x} as a function of the training loss value obtained by PINN-G1 and PINN-G2. The results indicate that the total error scales approximately as the square root of the training loss, which in some sense corroborates the error-loss relation as expressed in Theorem 4.4.

6.3 Linear Elastodynamic Equation

In this subsection we consider the linear elastodynamic equation (in two spatial dimensions plus time) and use it to test the PINN algorithm suggested by the theoretical analysis in Section 5. Consider the spatial-temporal domain (x,y,t)Ω=D×[0,T]=[0,1]×[0,1]×[0,2](x,y,t)\in\Omega=D\times[0,T]=[0,1]\times[0,1]\times[0,2], and the following initial/boundary value problem with the linear elastodynamics equation on Ω\Omega:

ρ2𝒖t22μ(𝜺¯(𝒖))λ(𝒖)=𝒇(𝒙,t),\displaystyle\rho\frac{\partial^{2}\bm{u}}{\partial t^{2}}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\lambda\nabla(\nabla\cdot\bm{u})=\bm{f}(\bm{x},t), (69a)
𝒖|Γd=ϕd,(2μ𝜺¯(𝒖)+λ(𝒖))|Γn𝒏=ϕn,\displaystyle\bm{u}|_{\Gamma_{d}}=\bm{\phi}_{d},\qquad\Big{(}2\mu\underline{\bm{\varepsilon}}(\bm{u})+\lambda(\nabla\cdot\bm{u})\Big{)}|_{\Gamma_{n}}\bm{n}=\bm{\phi}_{n}, (69b)
𝒖(𝒙,0)=𝝍1,𝒖t(𝒙,0)=𝝍2,\displaystyle\bm{u}(\bm{x},0)=\bm{\psi}_{1},\qquad\frac{\partial\bm{u}}{\partial t}(\bm{x},0)=\bm{\psi}_{2}, (69c)

where 𝒖=(u1(𝒙,t),u2(𝒙,t))T\bm{u}=(u_{1}(\bm{x},t),u_{2}(\bm{x},t))^{T} (𝒙=(x,y)D\bm{x}=(x,y)\in D, t[0,T]t\in[0,T]) is the displacement field to be solved for, 𝒇(𝒙,t)\bm{f}(\bm{x},t) is a source term, and ρ\rho, μ\mu and λ\lambda are material constants. Γd\Gamma_{d} is the Dirichlet boundary and Γn\Gamma_{n} is the Neumann boundary, with D=ΓdΓn\partial D=\Gamma_{d}\cup\Gamma_{n} and ΓdΓn=\Gamma_{d}\cap\Gamma_{n}=\emptyset, where 𝒏\bm{n} is the outward-pointing unit normal vector. In our simulations we choose the left boundary (x=0x=0) as the Dirichlet boundary, and the rest are Neumann boundaries. ϕd\bm{\phi}_{d} and ϕn\bm{\phi}_{n} are Dirichlet and Neumann boundary conditions, respectively. 𝝍1\bm{\psi}_{1} and 𝝍2\bm{\psi}_{2} are the initial conditions for the displacement and the velocity. We employ the material parameter values μ=λ=ρ=1\mu=\lambda=\rho=1, and the following manufactured solution (2018_CMAME_DGelastodynamics ) to this problem,

𝒖(𝒙,t)=sin(2πt)[sin(πx)2sin(2πy)sin(2πx)sin(πy)2].\displaystyle\bm{u}(\bm{x},t)=\sin(\sqrt{2}\pi t)\begin{bmatrix}-\sin(\pi x)^{2}\sin(2\pi y)\\ \sin(2\pi x)\sin(\pi y)^{2}\end{bmatrix}. (70)

The source term 𝒇(𝒙,t)\bm{f}(\bm{x},t) and the boundary/initial distributions ϕd\bm{\phi}_{d}, ϕn\bm{\phi}_{n}, 𝝍1\bm{\psi}_{1} and 𝝍2\bm{\psi}_{2} are chosen according to the expression (70).
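As with the Sine-Gordon case, the source term implied by (70) can be generated symbolically. Below is a sketch using sympy, with mu = lambda = rho = 1 as specified above; it is an assumed implementation convenience rather than part of the method itself.

```python
import sympy as sp

x, y, t = sp.symbols('x y t', real=True)
X = (x, y)

# Manufactured solution (70).
u = sp.Matrix([-sp.sin(sp.pi*x)**2 * sp.sin(2*sp.pi*y),
               sp.sin(2*sp.pi*x) * sp.sin(sp.pi*y)**2]) * sp.sin(sp.sqrt(2)*sp.pi*t)

# Strain tensor eps(u)_ij = (d_i u_j + d_j u_i) / 2 and divergence of u.
eps = sp.Matrix(2, 2, lambda i, j: (sp.diff(u[i], X[j]) + sp.diff(u[j], X[i])) / 2)
div_u = sp.diff(u[0], x) + sp.diff(u[1], y)

# f = rho*u_tt - 2*mu*div(eps(u)) - lambda*grad(div u), with rho = mu = lambda = 1.
div_eps = sp.Matrix([sp.diff(eps[0, 0], x) + sp.diff(eps[0, 1], y),
                     sp.diff(eps[1, 0], x) + sp.diff(eps[1, 1], y)])
f = sp.simplify(sp.diff(u, t, 2) - 2*div_eps - sp.Matrix([sp.diff(div_u, x), sp.diff(div_u, y)]))
```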

To simulate this problem using the PINN algorithm suggested by the theoretical analysis from Section 5, we reformulate (69) into the following system

𝒖t𝒗=𝟎,𝒗t2(𝜺¯(𝒖))(𝒖)=𝒇(𝒙,t),\displaystyle\bm{u}_{t}-\bm{v}=\bm{0},\qquad\bm{v}_{t}-2\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}))-\nabla(\nabla\cdot\bm{u})=\bm{f}(\bm{x},t), (71a)
𝒖|Γd=ϕd,(2𝜺¯(𝒖)+(𝒖))|Γn𝒏=ϕn,\displaystyle\bm{u}|_{\Gamma_{d}}=\bm{\phi}_{d},\qquad\Big{(}2\underline{\bm{\varepsilon}}(\bm{u})+(\nabla\cdot\bm{u})\Big{)}|_{\Gamma_{n}}\bm{n}=\bm{\phi}_{n}, (71b)
𝒖(𝒙,0)=𝝍1,𝒗(𝒙,0)=𝝍2,\displaystyle\bm{u}(\bm{x},0)=\bm{\psi}_{1},\qquad\bm{v}(\bm{x},0)=\bm{\psi}_{2}, (71c)

where 𝒗(𝒙,t)\bm{v}(\bm{x},t) is an intermediate variable (representing the velocity) as given by (71a).

In light of (5.2), we employ the following loss function for PINN,

Loss =W1Nn=1N[𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn)]2\displaystyle=\frac{W_{1}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[𝒗θt(𝒙intn,tintn)2(𝜺¯(𝒖θ(𝒙intn,tintn)))(𝒖θ(𝒙intn,tintn))𝒇(𝒙intn,tintn))]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-2\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})))-\nabla(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))-\bm{f}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W3Nn=1N[𝜺¯(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2+W4Nn=1N[(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W5Nn=1N[𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn)]2+W6Nn=1N[𝒗θ(𝒙tbn,0)𝝍2(𝒙tbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{2}(\bm{x}_{tb}^{n})\right]^{2}
+W7Nn=1N[𝜺¯(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2+W8Nn=1N[(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}+\frac{W_{8}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}
+W9Nn=1N|𝒗θ(𝒙sb1n,tsb1n)ϕdt(𝒙sb1n,tsb1n)|\displaystyle+\frac{W_{9}}{N}\sum_{n=1}^{N}|\bm{v}_{\theta}(\bm{x}_{sb1}^{n},t_{sb1}^{n})-\bm{\phi}_{dt}(\bm{x}_{sb1}^{n},t_{sb1}^{n})|
+W10Nn=1N|2𝜺¯(𝒖θ(𝒙sb2n,tsb2n))𝒏+(𝒖θ(𝒙sb2n,tsb2n))𝒏ϕn(𝒙sb2n,tsb2n)|,\displaystyle+\frac{W_{10}}{N}\sum_{n=1}^{N}|2\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}+(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}-\bm{\phi}_{n}(\bm{x}_{sb2}^{n},t_{sb2}^{n})|, (72)

where we have added the penalty coefficients, Wn>0W_{n}>0 (1n101\leq n\leq 10), for different loss terms in the implementation, and NN denotes the number of collocation points within the domain and on the domain boundaries. In the numerical tests we have also implemented another form for the loss function as follows,

Loss =W1Nn=1N[𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn)]2\displaystyle=\frac{W_{1}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})\right]^{2}
+W2Nn=1N[𝒗θt(𝒙intn,tintn)2(𝜺¯(𝒖θ(𝒙intn,tintn)))(𝒖θ(𝒙intn,tintn))𝒇(𝒙intn,tintn))]2\displaystyle+\frac{W_{2}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-2\nabla\cdot(\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n})))-\nabla(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))-\bm{f}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W3Nn=1N[𝜺¯(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2+W4Nn=1N[(𝒖θt(𝒙intn,tintn)𝒗θ(𝒙intn,tintn))]2\displaystyle+\frac{W_{3}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}+\frac{W_{4}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta t}(\bm{x}_{int}^{n},t_{int}^{n})-\bm{v}_{\theta}(\bm{x}_{int}^{n},t_{int}^{n}))\right]^{2}
+W5Nn=1N[𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn)]2+W6Nn=1N[𝒗θ(𝒙tbn,0)𝝍2(𝒙tbn)]2\displaystyle+\frac{W_{5}}{N}\sum_{n=1}^{N}\left[\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n})\right]^{2}+\frac{W_{6}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{2}(\bm{x}_{tb}^{n})\right]^{2}
+W7Nn=1N[𝜺¯(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2+W8Nn=1N[(𝒖θ(𝒙tbn,0)𝝍1(𝒙tbn))]2\displaystyle+\frac{W_{7}}{N}\sum_{n=1}^{N}\left[\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}+\frac{W_{8}}{N}\sum_{n=1}^{N}\left[\nabla\cdot(\bm{u}_{\theta}(\bm{x}_{tb}^{n},0)-\bm{\psi}_{1}(\bm{x}_{tb}^{n}))\right]^{2}
+W9Nn=1N[𝒗θ(𝒙sb1n,tsb1n)ϕdt(𝒙sb1n,tsb1n)]2\displaystyle+\frac{W_{9}}{N}\sum_{n=1}^{N}\left[\bm{v}_{\theta}(\bm{x}_{sb1}^{n},t_{sb1}^{n})-\bm{\phi}_{dt}(\bm{x}_{sb1}^{n},t_{sb1}^{n})\right]^{2}
+W10Nn=1N[2𝜺¯(𝒖θ(𝒙sb2n,tsb2n))𝒏+(𝒖θ(𝒙sb2n,tsb2n))𝒏ϕn(𝒙sb2n,tsb2n)]2.\displaystyle+\frac{W_{10}}{N}\sum_{n=1}^{N}\left[2\underline{\bm{\varepsilon}}(\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}+(\nabla\cdot\bm{u}_{\theta}(\bm{x}_{sb2}^{n},t_{sb2}^{n}))\bm{n}-\bm{\phi}_{n}(\bm{x}_{sb2}^{n},t_{sb2}^{n})\right]^{2}. (73)

The difference between these two forms of the loss function lies in the W9W_{9} and W10W_{10} terms. It should be noted that the W9W_{9} and W10W_{10} terms in (72) are not squared, in light of the error terms (55a)-(55j) from the theoretical analysis; in contrast, these terms are squared in (73). The PINN scheme utilizing the loss function (72) is henceforth referred to as PINN-H1, and the scheme that employs the loss function (73) shall be referred to as PINN-H2.
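The residual terms involving the strain tensor and the divergence in these loss functions can likewise be formed by automatic differentiation. The following PyTorch sketch computes the W3 and W4 interior terms; net is a hypothetical network returning the two-component fields (u_theta, v_theta), and the code is illustrative rather than the authors' implementation.

```python
import torch

def jacobian_xy(w, x, y):
    # Spatial Jacobian of a two-component field w; rows are components, columns are (d/dx, d/dy).
    rows = []
    for i in range(2):
        wi = w[..., i:i+1]
        wx = torch.autograd.grad(wi, x, torch.ones_like(wi), create_graph=True)[0]
        wy = torch.autograd.grad(wi, y, torch.ones_like(wi), create_graph=True)[0]
        rows.append(torch.cat([wx, wy], dim=-1))
    return torch.stack(rows, dim=-2)                    # shape (N, 2, 2)

def w3_w4_terms(net, x, y, t):
    x.requires_grad_(True); y.requires_grad_(True); t.requires_grad_(True)
    u, v = net(x, y, t)                                 # each of shape (N, 2)
    u_t = torch.cat([torch.autograd.grad(u[..., i:i+1], t, torch.ones_like(u[..., i:i+1]),
                                         create_graph=True)[0] for i in range(2)], dim=-1)
    J = jacobian_xy(u_t - v, x, y)                      # Jacobian of the residual u_t - v
    strain = 0.5 * (J + J.transpose(-1, -2))            # epsilon(u_t - v)
    divergence = J[..., 0, 0] + J[..., 1, 1]            # div(u_t - v)
    return (strain**2).sum(dim=(-1, -2)).mean(), (divergence**2).mean()
```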

In the simulations, we employ a feed-forward neural network with three input nodes, which represent 𝒙=(x,y)\bm{x}=(x,y) and the time variable t, and four output nodes, which represent 𝒖=(u1,u2)\bm{u}=(u_{1},u_{2}) and 𝒗=(v1,v2)\bm{v}=(v_{1},v_{2}). The neural network has two hidden layers, with widths of 90 and 60 nodes, respectively, and the tanh\tanh activation function for all the hidden nodes. For the network training, NN collocation points are generated from a uniform random distribution within the domain, on each of the domain boundaries, as well as on the initial boundary. NN is systematically varied in the simulations. We employ the penalty coefficients 𝑾=(W1,,W10)=(0.9,0.9,0.9,0.9,0.5,0.5,0.5,0.5,0.9,0.9)\bm{W}=(W_{1},...,W_{10})=(0.9,0.9,0.9,0.9,0.5,0.5,0.5,0.5,0.9,0.9) in the simulations.

(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 11: Linear elastodynamic equation: Visualization of the deformed configuration at time instants (a) t=0.5t=0.5, (b) t=1.0t=1.0, and (c) t=1.5t=1.5 from the exact solution (top row), the PINN-H1 solution (middle row) and the PINN-H2 solution (bottom row). Plotted here are the deformed field, 𝒙+𝒖(𝒙,t)\bm{x}+\bm{u}(\bm{x},t), for a set of grid points 𝒙D=[0,1]×[0,1]\bm{x}\in D=[0,1]\times[0,1]. N=2000N=2000 training collocation points within domain and on the domain boundaries.
(a) t=0.5t=0.5; (b) t=1t=1; (c) t=1.5t=1.5
Figure 12: Linear elastodynamic equation: Distributions of the point-wise absolute error, 𝒖θ𝒖\|\bm{u}_{\theta}-\bm{u}\|, of the PINN-H1 solution (top row) and the PINN-H2 solution (bottom row) at three time instants (a) t=0.5t=0.5, (b) t=1.0t=1.0, and (c) t=1.5t=1.5. N=2000N=2000 training collocation points within domain and on the domain boundaries.

In Figures 11 and 12 we compare the PINN-H1/PINN-H2 solutions with the exact solution and provide an overview of their errors. Figure 11 is a visualization of the deformed configuration of the domain. Here we have plotted the deformed field, 𝒙+𝒖(𝒙,t)\bm{x}+\bm{u}(\bm{x},t), for a set of grid points 𝒙D\bm{x}\in D at three time instants from the exact solution, the PINN-H1 and PINN-H2 solutions. Figure 12 shows distributions of the point-wise absolute error of the PINN-H1/PINN-H2 solutions, 𝒖θ𝒖=(uθ1(𝒙,t)u1(𝒙,t))2+(uθ2(𝒙,t)u2(𝒙,t))2\|\bm{u}_{\theta}-\bm{u}\|=\sqrt{(u_{\theta 1}(\bm{x},t)-u_{1}(\bm{x},t))^{2}+(u_{\theta 2}(\bm{x},t)-u_{2}(\bm{x},t))^{2}}, at the same three time instants. Here 𝒖θ=(uθ1,uθ2)\bm{u}_{\theta}=(u_{\theta 1},u_{\theta 2}) denotes the PINN solution. While both PINN schemes capture the solution fairly well at t=0.5t=0.5 and 11, at t=1.5t=1.5 both schemes show larger deviations from the true solution. In general, the PINN-H1 scheme appears to produce a better approximation to the solution than PINN-H2.

Table 3: Linear elastodynamic equation: The l2l_{2} and ll_{\infty} errors for 𝒖=(u1,u2)\bm{u}=(u_{1},u_{2}) and 𝒗=(v1,v2)\bm{v}=(v_{1},v_{2}) versus the number of training data points NN from the PINN-H1 and PINN-H2 solutions.
method | NN | l2l_{2}-error (uθ1u_{\theta 1}, uθ2u_{\theta 2}, vθ1v_{\theta 1}, vθ2v_{\theta 2}) | ll_{\infty}-error (uθ1u_{\theta 1}, uθ2u_{\theta 2}, vθ1v_{\theta 1}, vθ2v_{\theta 2})
PINN-H1 | 1000 | 4.8837e-02, 6.0673e-02, 4.7460e-02, 5.1640e-02 | 1.7189e-01, 2.1201e-01, 6.9024e-01, 6.1540e-01
PINN-H1 | 1500 | 2.8131e-02, 3.1485e-02, 4.1104e-02, 4.1613e-02 | 1.9848e-01, 2.4670e-01, 3.4716e-01, 4.0582e-01
PINN-H1 | 2000 | 2.7796e-02, 4.0410e-02, 3.5891e-02, 4.6334e-02 | 1.4704e-01, 1.7687e-01, 4.0678e-01, 5.0022e-01
PINN-H1 | 2500 | 3.0909e-02, 4.0215e-02, 3.3966e-02, 4.4024e-02 | 1.7589e-01, 2.4211e-01, 4.1403e-01, 3.9570e-01
PINN-H1 | 3000 | 2.6411e-02, 3.5600e-02, 4.3209e-02, 5.2802e-02 | 1.4289e-01, 1.3625e-01, 5.1167e-01, 5.3298e-01
PINN-H2 | 1000 | 4.9869e-02, 1.3451e-01, 5.6327e-02, 5.4796e-02 | 3.2314e-01, 3.4978e-01, 6.7624e-01, 5.7277e-01
PINN-H2 | 1500 | 5.4708e-02, 1.3987e-01, 4.5871e-02, 5.1622e-02 | 2.8609e-01, 5.2598e-01, 4.9343e-01, 2.3518e-01
PINN-H2 | 2000 | 6.2114e-02, 1.0190e-01, 6.4477e-02, 5.0011e-02 | 2.5745e-01, 3.1642e-01, 5.9057e-01, 5.8411e-01
PINN-H2 | 2500 | 3.7887e-02, 6.0630e-02, 5.4363e-02, 5.0659e-02 | 2.2212e-01, 2.4774e-01, 5.3681e-01, 3.5427e-01
PINN-H2 | 3000 | 5.4862e-02, 6.3407e-02, 5.5208e-02, 6.0082e-02 | 3.4102e-01, 2.1308e-01, 5.1894e-01, 4.4995e-01

The effect of the number of collocation points (NN) on the PINN results has been studied in Figure 13 and Table 3, where NN is systematically varied in the range N=1000N=1000 to N=3000N=3000. Figure 13 shows the histories of the loss function for training PINN-H1 and PINN-H2 with different numbers of collocation points. Table 3 lists the corresponding l2l_{2} and ll_{\infty} errors of 𝒖\bm{u} and 𝒗\bm{v} obtained from PINN-H1 and PINN-H2. One can observe that the PINN errors in general tend to improve with an increasing number of collocation points. It can also be observed that the PINN-H1 errors in general appear better than those of PINN-H2 for this problem.

Figure 14 shows the errors of 𝒖\bm{u}, 𝒖t\bm{u}_{t}, 𝜺¯(𝒖)\underline{\bm{\varepsilon}}(\bm{u}) and 𝒖\nabla\cdot\bm{u} as a function of the loss function value in the network training of PINN-H1 and PINN-H2. The data indicates that these errors approximately scale as the square root of the training loss, which is consistent with the relation as given by Theorem 5.5. This in a sense provides numerical evidence for the theoretical analysis in Section 5.

(a) PINN-H1; (b) PINN-H2
Figure 13: Linear elastodynamic equation: Training loss histories of PINN-H1 and PINN-H2 corresponding to different numbers of collocation points (NN) in the simulation.
(a) PINN-H1; (b) PINN-H2
Figure 14: Linear elastodynamic equation: The errors for 𝒖\bm{u}, 𝒖t\bm{u}_{t}, 𝜺¯(𝒖)\underline{\bm{\varepsilon}}(\bm{u}) and 𝒖\nabla\cdot\bm{u} versus the training loss value obtained by PINN-H1 and PINN-H2.

7 Concluding Remarks

In the present paper we have considered the approximation of a class of dynamic PDEs of second order in time by physics-informed neural networks (PINN). We provide an analysis of the convergence and the error of PINN for approximating the wave equation, the Sine-Gordon equation, and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the tanh\tanh activation function for all the hidden nodes, the PINN approximation errors for the solution field, its time derivative and its gradient can be bounded by the PINN training loss and the number of training data points (quadrature points).

Our theoretical analyses further suggest new forms for the PINN training loss function, which contain certain residuals that are crucial to the error estimate but would be absent from the canonical PINN formulation of the loss function. These typically include the gradient of the equation residual, the gradient of the initial-condition residual, and the time derivative of the boundary-condition residual. In addition, depending on the type of boundary conditions involved in the problem, our analyses suggest that a norm other than the commonly-used L2L^{2} norm may be more appropriate for the boundary residuals in the loss function. Adopting these new forms of the loss function suggested by the theoretical analyses leads to a variant PINN algorithm. We have implemented the new algorithm and presented a number of numerical experiments on the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. The simulation results demonstrate that the method can capture the solution field well for these PDEs. The numerical data corroborate the theoretical analyses.

Declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Availability of data/code and material

Data will be made available on reasonable request.

Acknowledgements

The work was partially supported by the China Postdoctoral Science Foundation (No.2021M702747), Natural Science Foundation of Hunan Province (No.2022JJ40422), NSF of China (No.12101495), General Special Project of Education Department of Shaanxi Provincial Government (No.21JK0943), and the US National Science Foundation (DMS-2012415).

8 Appendix: Auxiliary Results and Proofs of Main Theorems from Sections 4 and 5

8.1 Notation

Let α0d\alpha\in\mathbb{N}_{0}^{d} be a multi-index, i.e., a dd-tuple of non-negative integers, with dd\in\mathbb{N}. For two multi-indices α,β0d\alpha,\beta\in\mathbb{N}_{0}^{d}, we say that αβ\alpha\leq\beta if and only if αiβi\alpha_{i}\leq\beta_{i} for all i=1,,di=1,\cdots,d. We then denote

|α|=i=1dαi,α!=i=1dαi!,(αβ)=α!β!(αβ)!.|\alpha|=\sum_{i=1}^{d}\alpha_{i},\qquad\alpha!=\prod_{i=1}^{d}\alpha_{i}!,\qquad\begin{pmatrix}\alpha\\ \beta\end{pmatrix}=\frac{\alpha!}{\beta!(\alpha-\beta)!}.

Let Pm,n={α0n,|α|=m}P_{m,n}=\{\alpha\in\mathbb{N}_{0}^{n},|\alpha|=m\}, for which it holds

|Pm,n|=(m+n1m).|P_{m,n}|=\begin{pmatrix}m+n-1\\ m\end{pmatrix}.
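The cardinality formula above can be checked by direct enumeration; the short Python sketch below does so for a few small values of m and n.

```python
from itertools import product
from math import comb

def count_multi_indices(m, n):
    # Number of n-tuples of non-negative integers alpha with |alpha| = m.
    return sum(1 for alpha in product(range(m + 1), repeat=n) if sum(alpha) == m)

for m, n in [(2, 3), (3, 2), (4, 4)]:
    assert count_multi_indices(m, n) == comb(m + n - 1, m)
```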

8.2 Some Auxiliary Results

Lemma 8.1.

Let d,k,l0d\in\mathbb{N},k,l\in\mathbb{N}_{0} with k>l+d2k>l+\frac{d}{2} and Ωd\Omega\subset\mathbb{R}^{d} be an open set. Every function fHk(Ω)f\in H^{k}(\Omega) has a continuous representative belonging to Cl(Ω)C^{l}(\Omega).

Lemma 8.2.

Let d,k0d\in\mathbb{N},k\in\mathbb{N}_{0}, fHk(Ω)f\in H^{k}(\Omega) and gWk,(Ω)g\in W^{k,\infty}(\Omega) with Ωd\Omega\subset\mathbb{R}^{d}, then

fgHk(Ω)2kfHk(Ω)gWk,(Ω).\|fg\|_{H^{k}(\Omega)}\leq 2^{k}\|f\|_{H^{k}(\Omega)}\|g\|_{W^{k,\infty}(\Omega)}.
Lemma 8.3 (Multiplicative trace inequality, e.g. DeRyck2021On ).

Let d2d\geq 2, Ωd\Omega\subset\mathbb{R}^{d} be a Lipschitz domain and let γ0:H1(Ω)L2(Ω):uu|Ω\gamma_{0}:H^{1}(\Omega)\rightarrow L^{2}(\partial\Omega):u\mapsto u|_{\partial\Omega} be the trace operator. Denote by hΩh_{\Omega} the diameter of Ω\Omega and by ρΩ\rho_{\Omega} the radius of the largest dd-dimensional ball that can be inscribed into Ω\Omega. Then it holds that

γ0uL2(Ω)ChΩ,d,ρΩuH1(Ω),\|\gamma_{0}u\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d,\rho_{\Omega}}\|u\|_{H^{1}(\Omega)}, (74)

where ChΩ,d,ρΩ=2max{2hΩ,d}ρΩC_{h_{\Omega},d,\rho_{\Omega}}=\sqrt{\frac{2\max\{2h_{\Omega},d\}}{\rho_{\Omega}}}.

Lemma 8.4 (2023_IMA_Mishra_NS ).

Let d,n,L,Wd,n,L,W\in\mathbb{N} and let uθ:ddu_{\theta}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d} be a neural network with θΘ\theta\in\Theta for L2,R,W1L\geq 2,R,W\geq 1, c.f. Definition 2.1. Assume that σCn1\|\sigma\|_{C^{n}}\geq 1. Then it holds for 1jd1\leq j\leq d that

(uθ)jCn(Ω)16Ld2n(e2n4W3RnσCn(Ω))nL.\|(u_{\theta})_{j}\|_{C^{n}(\Omega)}\leq 16^{L}d^{2n}(e^{2}n^{4}W^{3}R^{n}\|\sigma\|_{C^{n}(\Omega)})^{nL}. (75)
Lemma 8.5 (2023_IMA_Mishra_NS ).

Let d2,m3,σ>0,ai,bid\geq 2,m\geq 3,\sigma>0,a_{i},b_{i}\in\mathbb{Z} with ai<bia_{i}<b_{i} for 1id1\leq i\leq d, Ω=i=1d[ai,bi]\Omega=\prod_{i=1}^{d}[a_{i},b_{i}] and fHm(Ω)f\in H^{m}(\Omega). Then for every NN\in\mathbb{N} with N>5N>5 there exists a tanh neural network f^N\hat{f}^{N} with two hidden layers, one of width at most 3[m2]|Pm1,d+1|+i=1d(biai)(N1)3[\frac{m}{2}]|P_{m-1,d+1}|+\sum_{i=1}^{d}(b_{i}-a_{i})(N-1) and another of width at most 3[d+22]|Pd+1,d+1|Ndi=1d(biai)3[\frac{d+2}{2}]|P_{d+1,d+1}|N^{d}\prod_{i=1}^{d}(b_{i}-a_{i}), such that for k=0,1,2k=0,1,2 it holds that

ff^NHk(Ω)2k3dCk,m,d,f(1+σ)lnk(βk,σ,d,fNd+m+2)Nm+k,\|f-\hat{f}^{N}\|_{H^{k}(\Omega)}\leq 2^{k}3^{d}C_{k,m,d,f}(1+\sigma){\rm ln}^{k}\left(\beta_{k,\sigma,d,f}N^{d+m+2}\right)N^{-m+k}, (76)

and where

Ck,m,d,f=max0lk(d+l1l)1/2((ml)!)1/2([mld]!)d/2(3dπ)ml|f|Hm(Ω),\displaystyle C_{k,m,d,f}=\max_{0\leq l\leq k}\left(\begin{array}[]{c}d+l-1\\ l\\ \end{array}\right)^{1/2}\frac{((m-l)!)^{1/2}}{([\frac{m-l}{d}]!)^{d/2}}\left(\frac{3\sqrt{d}}{\pi}\right)^{m-l}|f|_{H^{m}(\Omega)},
βk,σ,d,f=52kdmax{i=1d(biai),d}max{fWk,(Ω),1}3dσmin{1,Ck,m,d,f}.\displaystyle\beta_{k,\sigma,d,f}=\frac{5\cdot 2^{kd}\max\{\prod_{i=1}^{d}(b_{i}-a_{i}),d\}\max\{\|f\|_{W^{k,\infty}(\Omega)},1\}}{3^{d}\sigma\min\{1,C_{k,m,d,f}\}}.

Moreover, the weights of f^N\hat{f}^{N} scale as O(Nγ)O(N^{\gamma}) with γ=max{m2/2,d(1+m/2+d/2)}\gamma=\max\{m^{2}/2,d(1+m/2+d/2)\}.

8.3 Proof of Main Theorems from Section 4: Sine-Gordon Equation

Theorem 4.2: Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Assume that g(u)g(u) is Lipschitz continuous, uCk(D×[0,T])u\in C^{k}(D\times[0,T]) and vCk1(D×[0,T])v\in C^{k-1}(D\times[0,T]). Then for every integer N>5N>5, there exist tanh\tanh neural networks uθu_{\theta} and vθv_{\theta}, each having two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

Rint1L2(Ω),Rtb1L2(D)lnNNk+1,\displaystyle\|R_{int1}\|_{L^{2}(\Omega)},\|R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}NN^{-k+1},
Rint2L2(Ω),Rint1L2(Ω),Rtb1L2(D)ln2NNk+2,\displaystyle\|R_{int2}\|_{L^{2}(\Omega)},\|\nabla R_{int1}\|_{L^{2}(\Omega)},\|\nabla R_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}^{2}NN^{-k+2},
Rtb2L2(D),RsbL2(D×[0,t])lnNNk+2.\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb}\|_{L^{2}(\partial D\times[0,t])}\lesssim{\rm ln}NN^{-k+2}.
Proof.

Based on uCk(D×[0,T])u\in C^{k}(D\times[0,T]), vCk1(D×[0,T])v\in C^{k-1}(D\times[0,T]) and Lemma 8.5, there exist neural networks uθu_{\theta} and vθv_{\theta}, with the same two hidden layers and widths 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that for every 0l20\leq l\leq 2 and 0s20\leq s\leq 2,

uθuHl(Ω)Cl,k,d+1,uλl,u(N)Nk+l,\displaystyle\|u_{\theta}-u\|_{H^{l}(\Omega)}\leq C_{l,k,d+1,u}\lambda_{l,u}(N)N^{-k+l},
vθvHs(Ω)Cs,k1,d+1,vλs,v(N)Nk+1+s.\displaystyle\|v_{\theta}-v\|_{H^{s}(\Omega)}\leq C_{s,k-1,d+1,v}\lambda_{s,v}(N)N^{-k+1+s}.

It is now straightforward to bound the PINN residuals.

u^tL2(Ω)u^H1(Ω),v^tL2(Ω)v^H1(Ω),\displaystyle\|\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)},\qquad\|\hat{v}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
Δu^L2(Ω)u^H2(Ω)u^tL2(Ω)u^H2(Ω),\displaystyle\|\Delta\hat{u}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)}\qquad\|\nabla\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)},
v^L2(Ω)v^H1(Ω),\displaystyle\|\nabla\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{v}\|_{H^{1}(\Omega)},
u^L2(D)u^L2(Ω)ChΩ,d+1,ρΩu^H1(Ω),\displaystyle\|\hat{u}\|_{L^{2}(D)}\leq\|\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)},
v^L2(D)v^L2(Ω)ChΩ,d+1,ρΩv^H1(Ω),\displaystyle\|\hat{v}\|_{L^{2}(D)}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)},
u^L2(D)u^L2(Ω)ChΩ,d+1,ρΩu^H2(Ω),\displaystyle\|\nabla\hat{u}\|_{L^{2}(D)}\leq\|\nabla\hat{u}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)},
v^L2(D×[0,t])v^L2(Ω)ChΩ,d+1,ρΩv^H1(Ω).\displaystyle\|\hat{v}\|_{L^{2}(\partial D\times[0,t])}\leq\|\hat{v}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)}.

Similar to Theorem 3.3, we can obtain

Rint1L2(Ω)=u^tv^L2(Ω)u^H1(Ω)+v^L2(Ω)lnNNk+1,\displaystyle\|R_{int1}\|_{L^{2}(\Omega)}=\|\hat{u}_{t}-\hat{v}\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{1}(\Omega)}+\|\hat{v}\|_{L^{2}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
Rint2L2(Ω)=ε2v^ta2Δu^+ε12u^+g(uθ)g(u)L2(Ω)\displaystyle\|R_{int2}\|_{L^{2}(\Omega)}=\|\varepsilon^{2}\hat{v}_{t}-a^{2}\Delta\hat{u}+\varepsilon_{1}^{2}\hat{u}+g(u_{\theta})-g(u)\|_{L^{2}(\Omega)}
ε2v^H1(Ω)+a2u^H2(Ω)+ε12u^L2(Ω)+Lu^L2(Ω)ln2NNk+2,\displaystyle\qquad\leq\varepsilon^{2}\|\hat{v}\|_{H^{1}(\Omega)}+a^{2}\|\hat{u}\|_{H^{2}(\Omega)}+\varepsilon_{1}^{2}\|\hat{u}\|_{L^{2}(\Omega)}+L\|\hat{u}\|_{L^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
Rint1L2(Ω)=(u^tv^)L2(Ω)u^H2(Ω)+v^H1(Ω)ln2NNk+2,\displaystyle\|\nabla R_{int1}\|_{L^{2}(\Omega)}=\|\nabla(\hat{u}_{t}-\hat{v})\|_{L^{2}(\Omega)}\leq\|\hat{u}\|_{H^{2}(\Omega)}+\|\hat{v}\|_{H^{1}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
Rtb1L2(D)ChΩ,d+1,ρΩu^H1(Ω)lnNNk+1,\displaystyle\|R_{tb1}\|_{L^{2}(D)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
Rtb2L2(D),RsbL2(D×[0,t])ChΩ,d+1,ρΩv^H1(Ω)lnNNk+2,\displaystyle\|R_{tb2}\|_{L^{2}(D)},\|R_{sb}\|_{L^{2}(\partial D\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
Rtb1L2(D)ChΩ,d+1,ρΩu^H2(Ω)ln2NNk+2.\displaystyle\|\nabla R_{tb1}\|_{L^{2}(D)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}.

Theorem 4.3: Let dd\in\mathbb{N}, uC1(Ω)u\in C^{1}(\Omega) and vC0(Ω)v\in C^{0}(\Omega) be the classical solution to the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCGTexp((2+ε12+L+a2)T),\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right),

where CGC_{G} is defined in the proof.

Proof.

By taking the inner product of (45a) and (45b) with u^\hat{u} and v^\hat{v} over DD, respectively, we have

d2dtD|u^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x} =Du^v^d𝒙+DRint1u^d𝒙D|u^|2d𝒙+12D|Rint1|2d𝒙+12D|v^|2d𝒙,\displaystyle=\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int1}\hat{u}{\,\rm{d}}\bm{x}\leq\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}, (77)
ε2d2dtD|v^|2d𝒙\displaystyle\varepsilon^{2}\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x} =a2Du^v^d𝒙+a2DRsbu^𝒏ds(𝒙)ε12Du^v^d𝒙\displaystyle=-a^{2}\int_{D}\nabla\hat{u}\cdot\nabla\hat{v}{\,\rm{d}}\bm{x}+a^{2}\int_{\partial D}R_{sb}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})-\varepsilon_{1}^{2}\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}
D(g(uθ)g(u))v^d𝒙+DRint2v^d𝒙\displaystyle\qquad-\int_{D}(g(u_{\theta})-g(u))\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=a2Du^u^td𝒙+a2Du^Rint1d𝒙+a2DRsbu^𝒏ds(𝒙)ε12Du^v^d𝒙\displaystyle=-a^{2}\int_{D}\nabla\hat{u}\cdot\nabla\hat{u}_{t}{\,\rm{d}}\bm{x}+a^{2}\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+a^{2}\int_{\partial D}R_{sb}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})-\varepsilon_{1}^{2}\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}
D(g(uθ)g(u))v^d𝒙+DRint2v^d𝒙\displaystyle\qquad-\int_{D}(g(u_{\theta})-g(u))\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
=a2d2dtD|u^|2d𝒙+a2Du^Rint1d𝒙+a2DRsbu^𝒏ds(𝒙)ε12Du^v^d𝒙\displaystyle=-a^{2}\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}\nabla\hat{u}\cdot\nabla R_{int1}{\,\rm{d}}\bm{x}+a^{2}\int_{\partial D}R_{sb}\nabla\hat{u}\cdot\bm{n}{\,\rm{d}}s(\bm{x})-\varepsilon_{1}^{2}\int_{D}\hat{u}\hat{v}{\,\rm{d}}\bm{x}
D(g(uθ)g(u))v^d𝒙+DRint2v^d𝒙\displaystyle\qquad-\int_{D}(g(u_{\theta})-g(u))\hat{v}{\,\rm{d}}\bm{x}+\int_{D}R_{int2}\hat{v}{\,\rm{d}}\bm{x}
a2d2dtD|u^|2d𝒙+a22D|u^|2d𝒙+a22D|Rint1|2d𝒙+CD(D|Rsb|2ds(𝒙))12\displaystyle\leq-a^{2}\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{a^{2}}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{a^{2}}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}+C_{\partial D}\left(\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}
+12(ε12+L)D|u^|2d𝒙+12(ε12+L+1)D|v^|2d𝒙+12D|Rint2|2d𝒙,\displaystyle\qquad+\frac{1}{2}(\varepsilon_{1}^{2}+L)\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}(\varepsilon_{1}^{2}+L+1)\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}, (78)

where CD=a2|D|12(uC1(D×[0,t])+uθC1(D×[0,t]))C_{\partial D}=a^{2}|\partial D|^{\frac{1}{2}}(\|u\|_{C^{1}(\partial D\times[0,t])}+||u_{\theta}||_{C^{1}(\partial D\times[0,t])}) and v^=u^tRint1\hat{v}=\hat{u}_{t}-R_{int1} have been used.

Adding (77) and (78), we get

d2dtD|u^|2d𝒙+a2d2dtD|u^|2d𝒙+ε2d2dtD|v^|2d𝒙\displaystyle\frac{d}{2dt}\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+a^{2}\frac{d}{2dt}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\frac{d}{2dt}\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}
12(ε12+L+2)D|u^|2d𝒙+a22D|u^|2d𝒙+12(ε12+L+2)D|v^|2d𝒙+12D|Rint1|2d𝒙+12D|Rint2|2d𝒙\displaystyle\qquad\leq\frac{1}{2}(\varepsilon_{1}^{2}+L+2)\int_{D}|\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{a^{2}}{2}\int_{D}|\nabla\hat{u}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}(\varepsilon_{1}^{2}+L+2)\int_{D}|\hat{v}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|R_{int2}|^{2}{\,\rm{d}}\bm{x}
+a22D|Rint1|2d𝒙+CD(D|Rsb|2ds(𝒙))12.\displaystyle\qquad+\frac{a^{2}}{2}\int_{D}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}+C_{\partial D}\left(\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}. (79)

Integrating (79) over [0,τ][0,\tau] for any τT\tau\leq T and applying the Cauchy–Schwarz inequality, we obtain

D|u^(𝒙,τ)|2d𝒙+a2D|u^(𝒙,τ)|2d𝒙+ε2D|v^(𝒙,τ)|2d𝒙\displaystyle\int_{D}|\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}|\nabla\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\int_{D}|\hat{v}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}
D|Rtb1|2d𝒙+a2D|Rtb1|2d𝒙+ε2D|Rtb2|2d𝒙+(2+ε12+L+a2)0τD(|u^|2+|u^|2+|v^|2)d𝒙dt\displaystyle\qquad\leq\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}+(2+\varepsilon_{1}^{2}+L+a^{2})\int_{0}^{\tau}\int_{D}\left(|\hat{u}|^{2}+|\nabla\hat{u}|^{2}+|\hat{v}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|Rint1|2+a2|Rint1|2+|Rint2|2)d𝒙dt+2CD|T|12(0TD|Rsb|2ds(𝒙)dt)12.\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|R_{int1}|^{2}+a^{2}|\nabla R_{int1}|^{2}+|R_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t+2C_{\partial D}|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.

Applying the integral form of the Grönwall inequality to the above inequality leads to,

D|u^(𝒙,τ)|2d𝒙+a2D|u^(𝒙,τ)|2d𝒙+ε2D|v^(𝒙,τ)|2d𝒙CGexp((2+ε12+L+a2)T),\int_{D}|\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+a^{2}\int_{D}|\nabla\hat{u}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\varepsilon^{2}\int_{D}|\hat{v}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}\leq C_{G}\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right), (80)

where

CG=D(|Rtb1|2+a2|Rtb1|2+ε2|Rtb2|2)d𝒙+0TD(|Rint1|2+|Rint2|2+a2|Rint1|2)d𝒙dt\displaystyle C_{G}=\int_{D}(|R_{tb1}|^{2}+a^{2}|\nabla R_{tb1}|^{2}+\varepsilon^{2}|R_{tb2}|^{2}){\,\rm{d}}\bm{x}+\int_{0}^{T}\int_{D}(|R_{int1}|^{2}+|R_{int2}|^{2}+a^{2}|\nabla R_{int1}|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2CD|T|12(0TD|Rsb|2ds(𝒙)dt)12.\displaystyle\qquad+2C_{\partial D}|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\partial D}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.

Then, we integrate (80) over [0,T][0,T] to end the proof. ∎

Theorem 4.4: Let dd\in\mathbb{N} and T>0T>0. Let uC4(Ω)u\in C^{4}(\Omega) and vC3(Ω)v\in C^{3}(\Omega) be the classical solution to the Sine-Gordon equation (38). Let (uθ,vθ)(u_{\theta},v_{\theta}) denote the PINN approximation with the parameter θΘ\theta\in\Theta. Then the following relation holds,

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtCTTexp((2+ε12+L+a2)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right)
=𝒪(T(θ)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta)^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}),

where the constant CTC_{T} is given in the proof.

Proof.

We can combine Theorem 4.3 with the quadrature error formula (18) to obtain the error estimate,

D|Rtb1|2d𝒙\displaystyle\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(Rtb12)+𝒬MtbD(Rtb12)\displaystyle=\int_{D}|R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})
C(Rtb12)Mtb2d+𝒬MtbD(Rtb12),\displaystyle\leq C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2}),
D|Rtb2|2d𝒙\displaystyle\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x} =D|Rtb2|2d𝒙𝒬MtbD(Rtb22)+𝒬MtbD(Rtb22)\displaystyle=\int_{D}|R_{tb2}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})
C(Rtb22)Mtb2d+𝒬MtbD(Rtb22),\displaystyle\leq C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2}),
D|Rtb1|2d𝒙\displaystyle\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|Rtb1|2d𝒙𝒬MtbD(|Rtb1|2)+𝒬MtbD(|Rtb1|2)\displaystyle=\int_{D}|\nabla R_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})
C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2),\displaystyle\leq C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(Rint12)+𝒬MintΩ(Rint12)\displaystyle=\int_{\Omega}|R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
C(Rint12)Mint2d+1+𝒬MintΩ(Rint12),\displaystyle\leq C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2}),
Ω|Rint2|2d𝒙dt\displaystyle\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint2|2d𝒙dt𝒬MintΩ(Rint22)+𝒬MintΩ(Rint22)\displaystyle=\int_{\Omega}|R_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})
C(Rint22)Mint2d+1+𝒬MintΩ(Rint22),\displaystyle\leq C_{({R_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2}),
Ω|Rint1|2d𝒙dt\displaystyle\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|Rint1|2d𝒙dt𝒬MintΩ(|Rint1|2)+𝒬MintΩ(|Rint1|2)\displaystyle=\int_{\Omega}|\nabla R_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})
C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2),\displaystyle\leq C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2}),
Ω|Rsb|2ds(𝒙)dt\displaystyle\int_{\Omega_{*}}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =Ω|Rsb|2ds(𝒙)dt𝒬MsbΩ(Rsb2)+𝒬MsbΩ(Rsb2)\displaystyle=\int_{\Omega_{*}}|R_{sb}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})
C(Rsb2)Msb2d+𝒬MsbΩ(Rsb2).\displaystyle\leq C_{({R_{sb}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2}).

In light of (80) and the above inequalities, we have

0TD(|u^(𝒙,t)|2+a2|u^(𝒙,t)|2+ε2|v^(𝒙,t)|2)d𝒙dtTCTexp((2+ε12+L+a2)T),\int_{0}^{T}\int_{D}(|\hat{u}(\bm{x},t)|^{2}+a^{2}|\nabla\hat{u}(\bm{x},t)|^{2}+\varepsilon^{2}|\hat{v}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq TC_{T}\exp\left((2+\varepsilon_{1}^{2}+L+a^{2})T\right),

where

CT=\displaystyle C_{T}= C(Rtb12)Mtb2d+𝒬MtbD(Rtb12)+ε2(C(Rtb22)Mtb2d+𝒬MtbD(Rtb22))\displaystyle C_{({R_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb1}^{2})+\varepsilon^{2}\left(C_{({R_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(R_{tb2}^{2})\right)
+a2(C(|Rtb1|2)Mtb2d+𝒬MtbD(|Rtb1|2))+C(Rint12)Mint2d+1+𝒬MintΩ(Rint12)\displaystyle+a^{2}\left(C_{(|\nabla R_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla R_{tb1}|^{2})\right)+C_{({R_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int1}^{2})
+C(Rint22)Mint2d+1+𝒬MintΩ(Rint22)+a2(C(|Rint1|2)Mint2d+1+𝒬MintΩ(|Rint1|2)),\displaystyle+C_{({R_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(R_{int2}^{2})+a^{2}\left(C_{(|\nabla R_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla R_{int1}|^{2})\right),
+2CD|T|12(C(Rsb2)Msb2d+𝒬MsbΩ(Rsb2))12,\displaystyle+2C_{\partial D}|T|^{\frac{1}{2}}\left(C_{({R_{sb}^{2}})}M_{sb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb}}^{\Omega_{*}}(R_{sb}^{2})\right)^{\frac{1}{2}},

and

C(Rtb12)u^C22,C(Rtb22)v^C22,C(|Rtb1|2)u^C32,C(Rint12)u^C32+v^C22,\displaystyle C_{({R_{tb1}^{2}})}\lesssim\|\hat{u}\|_{C^{2}}^{2},\quad C_{({R_{tb2}^{2}})}\lesssim\|\hat{v}\|_{C^{2}}^{2},\quad C_{(|\nabla R_{tb1}|^{2})}\lesssim\|\hat{u}\|_{C^{3}}^{2},\quad C_{({R_{int1}^{2}})}\lesssim\|\hat{u}\|_{C^{3}}^{2}+\|\hat{v}\|_{C^{2}}^{2},
C(Rint22),C(|Rint1|2)u^C42+v^C32,C(Rsb2)v^C32.\displaystyle\qquad\qquad C_{({R_{int2}^{2}})},C_{(|\nabla R_{int1}|^{2})}\lesssim\|\hat{u}\|_{C^{4}}^{2}+\|\hat{v}\|_{C^{3}}^{2},\quad C_{({R_{sb}^{2}})}\lesssim\|\hat{v}\|_{C^{3}}^{2}.

Here, the bounds on uθCn\|u_{\theta}\|_{C^{n}} and vθCn\|v_{\theta}\|_{C^{n}} (nn\in\mathbb{N}) that enter the above constants can be obtained from Lemma 8.4, together with Rq2Cn2nRqCn2\|R_{q}^{2}\|_{C^{n}}\leq 2^{n}\|R_{q}\|_{C^{n}}^{2} for Rq=Rtb1R_{q}=R_{tb1}, Rtb2R_{tb2}, Rtb1\nabla R_{tb1}, Rint1R_{int1}, Rint2R_{int2}, Rint1\nabla R_{int1} and RsbR_{sb}. ∎

8.4 Proof of Main Theorems from Section 5: Linear Elastodynamic Equation

Theorem 5.3: Let dd, rr, kk\in\mathbb{N} with k3k\geq 3. Let 𝝍1Hr(D)\bm{\psi}_{1}\in H^{r}(D), 𝝍2Hr1(D)\bm{\psi}_{2}\in H^{r-1}(D) and 𝒇Hr1(D×[0,T])\bm{f}\in H^{r-1}(D\times[0,T]) with r>d2+kr>\frac{d}{2}+k. For every integer N>5N>5, there exist tanh\tanh neural networks (𝒖j)θ(\bm{u}_{j})_{\theta} and (𝒗j)θ(\bm{v}_{j})_{\theta}, with j=1,2,,dj=1,2,\cdots,d, each with two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that

𝑹int1L2(Ω),𝑹tb1L2(D)lnNNk+1,\displaystyle\|\bm{R}_{int1}\|_{L^{2}(\Omega)},\|\bm{R}_{tb1}\|_{L^{2}(D)}\lesssim{\rm ln}NN^{-k+1},
𝑹int2L2(Ω),𝜺¯(𝑹int1)L2(Ω),𝑹int1L2(Ω)ln2NNk+2,\displaystyle\|\bm{R}_{int2}\|_{L^{2}(\Omega)},\|\underline{\bm{\varepsilon}}(\bm{R}_{int1})\|_{L^{2}(\Omega)},\|\nabla\cdot\bm{R}_{int1}\|_{L^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝜺¯(𝑹tb1)L2(D),𝑹tb1L2(D),𝑹sb2L2(ΓN×[0,t])ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})\|_{L^{2}(D)},\|\nabla\cdot\bm{R}_{tb1}\|_{L^{2}(D)},\|\bm{R}_{sb2}\|_{L^{2}(\Gamma_{N}\times[0,t])}\lesssim{\rm ln}^{2}NN^{-k+2},
𝑹tb2L2(D),𝑹sb1L2(ΓD×[0,t])lnNNk+2.\displaystyle\|\bm{R}_{tb2}\|_{L^{2}(D)},\|\bm{R}_{sb1}\|_{L^{2}(\Gamma_{D}\times[0,t])}\lesssim{\rm ln}NN^{-k+2}.
Proof.

Lemma 5.2 implies that,

𝒖Ck(D×[0,T]),𝒗Ck1(D×[0,T]).\bm{u}\in C^{k}(D\times[0,T]),\qquad\bm{v}\in C^{k-1}(D\times[0,T]).

Let 𝒖θ=((u1)θ,(u2)θ,,(ud)θ)\bm{u}_{\theta}=((u_{1})_{\theta},(u_{2})_{\theta},\cdots,(u_{d})_{\theta}) and 𝒗θ=((v1)θ,(v2)θ,,(vd)θ)\bm{v}_{\theta}=((v_{1})_{\theta},(v_{2})_{\theta},\cdots,(v_{d})_{\theta}). Based on Lemma 8.5, there exists tanh\tanh neural networks (ui)θ(u_{i})_{\theta} and (vi)θ(v_{i})_{\theta}, with i=1,2,,di=1,2,\cdots,d, each having two hidden layers, of widths at most 3k2|Pk1,d+2|+NT+d(N1)3\lceil\frac{k}{2}\rceil|P_{k-1,d+2}|+\lceil NT\rceil+d(N-1) and 3d+32|Pd+2,d+2|NTNd3\lceil\frac{d+3}{2}\rceil|P_{d+2,d+2}|\lceil NT\rceil N^{d}, such that for every 0l20\leq l\leq 2 and 0s20\leq s\leq 2,

ui(ui)θHl(Ω)Cl,k,d+1,uiλl,ui(N)Nk+l,\displaystyle\|u_{i}-(u_{i})_{\theta}\|_{H^{l}(\Omega)}\leq C_{l,k,d+1,u_{i}}\lambda_{l,u_{i}}(N)N^{-k+l}, (81)
vi(vi)θHs(Ω)Cs,k1,d+1,viλs,vi(N)Nk+1+s.\displaystyle\|v_{i}-(v_{i})_{\theta}\|_{H^{s}(\Omega)}\leq C_{s,k-1,d+1,v_{i}}\lambda_{s,v_{i}}(N)N^{-k+1+s}. (82)

Let i\partial_{i} represent the derivative with respect to the ii-th dimension. For 1i,jd1\leq i,\ j\leq d, we have

(u^t)iL2(Ω)u^iH1(Ω),(v^t)iL2(Ω)v^iH1(Ω),\displaystyle\|(\hat{u}_{t})_{i}\|_{L^{2}(\Omega)}\leq\|\hat{u}_{i}\|_{H^{1}(\Omega)},\qquad\|(\hat{v}_{t})_{i}\|_{L^{2}(\Omega)}\leq\|\hat{v}_{i}\|_{H^{1}(\Omega)},
iju^iL2(Ω),iiu^iL2(Ω),jju^iL2(Ω)u^iH2(Ω),\displaystyle\|\partial_{i}\partial_{j}\hat{u}_{i}\|_{L^{2}(\Omega)},\|\partial_{i}\partial_{i}\hat{u}_{i}\|_{L^{2}(\Omega)},\|\partial_{j}\partial_{j}\hat{u}_{i}\|_{L^{2}(\Omega)}\leq\|\hat{u}_{i}\|_{H^{2}(\Omega)},
j(u^t)iL2(Ω)(u^t)iH1(Ω)u^iH2(Ω),jv^iL2(Ω)v^iH1(Ω),\displaystyle\|\partial_{j}(\hat{u}_{t})_{i}\|_{L^{2}(\Omega)}\leq\|(\hat{u}_{t})_{i}\|_{H^{1}(\Omega)}\leq\|\hat{u}_{i}\|_{H^{2}(\Omega)},\qquad\|\partial_{j}\hat{v}_{i}\|_{L^{2}(\Omega)}\leq\|\hat{v}_{i}\|_{H^{1}(\Omega)},
u^iL2(D)u^iL2(Ω)ChΩ,d+1,ρΩu^iH1(Ω),\displaystyle\|\hat{u}_{i}\|_{L^{2}(D)}\leq\|\hat{u}_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}_{i}\|_{H^{1}(\Omega)},
v^iL2(D)v^iL2(Ω)ChΩ,d+1,ρΩv^iH1(Ω),\displaystyle\|\hat{v}_{i}\|_{L^{2}(D)}\leq\|\hat{v}_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}_{i}\|_{H^{1}(\Omega)},
j(u^)iL2(D)j(u^)iL2(Ω)ChΩ,d+1,ρΩu^iH2(Ω),\displaystyle\|\partial_{j}(\hat{u})_{i}\|_{L^{2}(D)}\leq\|\partial_{j}(\hat{u})_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}_{i}\|_{H^{2}(\Omega)},
v^iL2(ΓD×[0,t])v^iL2(Ω)ChΩ,d+1,ρΩv^iH1(Ω),\displaystyle\|\hat{v}_{i}\|_{L^{2}(\Gamma_{D}\times[0,t])}\leq\|\hat{v}_{i}\|_{L^{2}(\partial\Omega)}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{v}_{i}\|_{H^{1}(\Omega)},
iu^iniL2(ΓN×[0,t]),ju^iniL2(ΓN×[0,t]),ju^injL2(ΓN×[0,t])ChΩ,d+1,ρΩu^iH2(Ω).\displaystyle\|\partial_{i}\hat{u}_{i}n_{i}\|_{L^{2}(\Gamma_{N}\times[0,t])},\|\partial_{j}\hat{u}_{i}n_{i}\|_{L^{2}(\Gamma_{N}\times[0,t])},\|\partial_{j}\hat{u}_{i}n_{j}\|_{L^{2}(\Gamma_{N}\times[0,t])}\leq C_{h_{\Omega},d+1,\rho_{\Omega}}\|\hat{u}_{i}\|_{H^{2}(\Omega)}.

Using (81) and (82) and the above relations, we can now bound the PINN residuals,

𝑹int1L2(Ω)𝒖^t𝒗^L2(Ω)𝒖^H1(Ω)+𝒗^L2(Ω)lnNNk+1,\displaystyle\|\bm{R}_{int1}\|_{L^{2}(\Omega)}\leq\|\hat{\bm{u}}_{t}-\hat{\bm{v}}\|_{L^{2}(\Omega)}\leq\|\hat{\bm{u}}\|_{H^{1}(\Omega)}+\|\hat{\bm{v}}\|_{L^{2}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
𝑹int2L2(Ω)ρ𝒗^t2μ(𝜺¯(𝒖^))λ(𝒖^)L2(Ω)\displaystyle\|\bm{R}_{int2}\|_{L^{2}(\Omega)}\leq\|\rho\hat{\bm{v}}_{t}-2\mu\nabla\cdot(\underline{\bm{\varepsilon}}(\hat{\bm{u}}))-\lambda\nabla(\nabla\cdot\hat{\bm{u}})\|_{L^{2}(\Omega)}
𝒗^H1(Ω)+𝒖^H2(Ω)ln2NNk+2,\displaystyle\qquad\lesssim\|\hat{\bm{v}}\|_{H^{1}(\Omega)}+\|\hat{\bm{u}}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝜺¯(𝑹int1)L2(Ω),𝑹int1L2(Ω)𝒖^H2(Ω)+𝒗^H1(Ω)ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{int1})\|_{L^{2}(\Omega)},\|\nabla\cdot\bm{R}_{int1}\|_{L^{2}(\Omega)}\lesssim\|\hat{\bm{u}}\|_{H^{2}(\Omega)}+\|\hat{\bm{v}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝑹tb1L2(D)𝒖^L2(Ω)𝒖^H1(Ω)lnNNk+1,\displaystyle\|\bm{R}_{tb1}\|_{L^{2}(D)}\leq\|\hat{\bm{u}}\|_{L^{2}(\partial\Omega)}\lesssim\|\hat{\bm{u}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+1},
𝑹tb2L2(D)𝒗^L2(Ω)𝒗^H1(Ω)lnNNk+2,\displaystyle\|\bm{R}_{tb2}\|_{L^{2}(D)}\leq\|\hat{\bm{v}}\|_{L^{2}(\partial\Omega)}\lesssim\|\hat{\bm{v}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
𝜺¯(𝑹tb1)L2(D),𝑹tb1L2(D)𝒖^H2(D)ln2NNk+2,\displaystyle\|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})\|_{L^{2}(D)},\|\nabla\cdot\bm{R}_{tb1}\|_{L^{2}(D)}\lesssim\|\hat{\bm{u}}\|_{H^{2}(D)}\lesssim{\rm ln}^{2}NN^{-k+2},
𝑹sb1L2(ΓD×[0,t])𝒗^L2(Ω)𝒗^H1(Ω)lnNNk+2,\displaystyle\|\bm{R}_{sb1}\|_{L^{2}(\Gamma_{D}\times[0,t])}\leq\|\hat{\bm{v}}\|_{L^{2}(\partial\Omega)}\lesssim\|\hat{\bm{v}}\|_{H^{1}(\Omega)}\lesssim{\rm ln}NN^{-k+2},
𝑹sb2L2(ΓN×[0,t])2μ𝜺¯(𝒖^)𝒏+λ(𝒖^)𝒏Ω𝒖^H2(Ω)ln2NNk+2.\displaystyle\|\bm{R}_{sb2}\|_{L^{2}(\Gamma_{N}\times[0,t])}\leq\|2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n}\|_{\partial\Omega}\lesssim\|\hat{\bm{u}}\|_{H^{2}(\Omega)}\lesssim{\rm ln}^{2}NN^{-k+2}.

Theorem 5.4: Let dd\in\mathbb{N}, 𝒖C1(Ω)\bm{u}\in C^{1}(\Omega) and 𝒗C(Ω)\bm{v}\in C(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝒖θ,𝒗θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCGTexp((2+2μ+λ)T),\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{G}T\exp\left((2+2\mu+\lambda)T\right),

where CGC_{G} is given in the proof.

Proof.

By taking the inner product of (56a) and (56b) with 𝒖^\hat{\bm{u}} and 𝒗^\hat{\bm{v}} and integrating over DD, respectively, we have

\displaystyle\frac{d}{2dt}\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}=\int_{D}\hat{\bm{u}}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}+\int_{D}\bm{R}_{int1}\cdot\hat{\bm{u}}{\,\rm{d}}\bm{x}\leq\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}, (83)
\displaystyle\rho\frac{d}{2dt}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}=-2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\hat{\bm{v}}{\,\rm{d}}\bm{x}-\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\hat{\bm{v}}){\,\rm{d}}\bm{x}+\int_{\partial D}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})
\displaystyle\qquad+\int_{D}\bm{R}_{int2}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}
\displaystyle=-2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\hat{\bm{u}}_{t}{\,\rm{d}}\bm{x}+2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}{\,\rm{d}}\bm{x}-\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\hat{\bm{u}}_{t}){\,\rm{d}}\bm{x}+\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\bm{R}_{int1}){\,\rm{d}}\bm{x}
\displaystyle\qquad+\int_{\Gamma_{D}}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\bm{R}_{sb1}{\,\rm{d}}s(\bm{x})+\int_{\Gamma_{N}}\bm{R}_{sb2}\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})+\int_{D}\bm{R}_{int2}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}
\displaystyle=-\frac{d}{dt}\int_{D}\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}-\frac{d}{dt}\int_{D}\frac{\lambda}{2}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}{\,\rm{d}}\bm{x}+\lambda\int_{D}(\nabla\cdot\hat{\bm{u}})(\nabla\cdot\bm{R}_{int1}){\,\rm{d}}\bm{x}
\displaystyle\qquad+\int_{\Gamma_{D}}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\bm{R}_{sb1}{\,\rm{d}}s(\bm{x})+\int_{\Gamma_{N}}\bm{R}_{sb2}\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})+\int_{D}\bm{R}_{int2}\cdot\hat{\bm{v}}{\,\rm{d}}\bm{x}
\displaystyle\leq-\frac{d}{dt}\int_{D}\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}-\frac{d}{dt}\int_{D}\frac{\lambda}{2}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}
\displaystyle\qquad+\frac{\lambda}{2}\int_{D}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}+\frac{\lambda}{2}\int_{D}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}|\bm{R}_{int2}|^{2}{\,\rm{d}}\bm{x}
\displaystyle\qquad+C_{\Gamma_{D}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}+C_{\Gamma_{N}}\left(\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}. (84)
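In the last inequality the cross terms are bounded with the Cauchy–Schwarz and Young inequalities; for instance, since $\underline{\bm{\varepsilon}}(\hat{\bm{u}})$ is symmetric we have $\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}=\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\underline{\bm{\varepsilon}}(\bm{R}_{int1})$, and therefore

2\mu\int_{D}\underline{\bm{\varepsilon}}(\hat{\bm{u}}):\nabla\bm{R}_{int1}{\,\rm{d}}\bm{x}\leq\mu\int_{D}|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x};

the terms involving $\lambda$, $\bm{R}_{int2}$ and the boundary residuals are treated in the same way.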

Here we have used 𝒗^=𝒖^t𝑹int1\hat{\bm{v}}=\hat{\bm{u}}_{t}-\bm{R}_{int1}, and the constants are given by CΓD=(2μ+λ)|ΓD|12𝒖C1(ΓD×[0,T])+(2μ+λ)|ΓD|12𝒖θC1(ΓD×[0,T])C_{\Gamma_{D}}=(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}\|\bm{u}\|_{C^{1}(\Gamma_{D}\times[0,T])}+(2\mu+\lambda)|\Gamma_{D}|^{\frac{1}{2}}||\bm{u}_{\theta}||_{C^{1}(\Gamma_{D}\times[0,T])} and CΓN=|ΓN|12(𝒗C(ΓN×[0,T])+𝒗θC(ΓN×[0,T]))C_{\Gamma_{N}}=|\Gamma_{N}|^{\frac{1}{2}}(\|\bm{v}\|_{C(\Gamma_{N}\times[0,T])}+||\bm{v}_{\theta}||_{C(\Gamma_{N}\times[0,T])}).
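These boundary constants come from the Cauchy–Schwarz inequality on $\Gamma_{D}$ and $\Gamma_{N}$; schematically, and up to the precise conventions for the $C^{1}$ and $C^{0}$ norms,

\int_{\Gamma_{D}}(2\mu\underline{\bm{\varepsilon}}(\hat{\bm{u}})\bm{n}+\lambda(\nabla\cdot\hat{\bm{u}})\bm{n})\cdot\bm{R}_{sb1}{\,\rm{d}}s(\bm{x})\leq(2\mu+\lambda)\big(\|\bm{u}\|_{C^{1}(\Gamma_{D}\times[0,T])}+\|\bm{u}_{\theta}\|_{C^{1}(\Gamma_{D}\times[0,T])}\big)|\Gamma_{D}|^{\frac{1}{2}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}=C_{\Gamma_{D}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}},

where the triangle inequality $\|\hat{\bm{u}}\|_{C^{1}}\leq\|\bm{u}\|_{C^{1}}+\|\bm{u}_{\theta}\|_{C^{1}}$ has been used; the bound with $C_{\Gamma_{N}}$ is obtained analogously from $\int_{\Gamma_{N}}\bm{R}_{sb2}\cdot\hat{\bm{v}}{\,\rm{d}}s(\bm{x})$ and the $C^{0}$ norms of $\bm{v}$ and $\bm{v}_{\theta}$.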

Adding (83) and (84), we get,

\displaystyle\frac{d}{2dt}\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\frac{d}{dt}\int_{D}\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\frac{d}{2dt}\int_{D}\lambda|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\rho\frac{d}{2dt}\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}
\displaystyle\qquad\leq\int_{D}|\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\mu\int_{D}|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}{\,\rm{d}}\bm{x}+\frac{\lambda}{2}\int_{D}|\nabla\cdot\hat{\bm{u}}|^{2}{\,\rm{d}}\bm{x}+\int_{D}|\hat{\bm{v}}|^{2}{\,\rm{d}}\bm{x}+\frac{1}{2}\int_{D}(|\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}){\,\rm{d}}\bm{x}
\displaystyle\qquad\qquad+\mu\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}+\frac{\lambda}{2}\int_{D}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}+C_{\Gamma_{D}}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}+C_{\Gamma_{N}}\left(\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}. (85)

Integrating (85) over [0,τ][0,\tau] for any τT\tau\leq T and applying the Cauchy–Schwarz inequality, we obtain,

D|𝒖^(𝒙,τ)|2d𝒙+D2μ|𝜺¯(𝒖^(𝒙,τ))|2d𝒙+Dλ|𝒖^(𝒙,τ)|2d𝒙+ρD|𝒗^(𝒙,τ)|2d𝒙\displaystyle\int_{D}|\hat{\bm{u}}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},\tau))|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\hat{\bm{v}}(\bm{x},\tau)|^{2}{\,\rm{d}}\bm{x}
D|𝑹tb1|2d𝒙+D2μ|𝜺¯(𝑹tb1)|2d𝒙+Dλ|𝑹tb1|2d𝒙+ρD|𝑹tb2|2d𝒙\displaystyle\qquad\leq\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}
+(2+2μ+λ)0τD(|𝒖^|2+|𝜺¯(𝒖^)|2+|𝒖^|2+|𝒗^|2)d𝒙dt\displaystyle\qquad\qquad+(2+2\mu+\lambda)\int_{0}^{\tau}\int_{D}\left(|\hat{\bm{u}}|^{2}+|\underline{\bm{\varepsilon}}(\hat{\bm{u}})|^{2}+|\nabla\cdot\hat{\bm{u}}|^{2}+|\hat{\bm{v}}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+0TD(|𝑹int1|2+2μ|𝜺¯(𝑹int1)|2+λ|𝑹int1|2+|𝑹int2|2)d𝒙dt\displaystyle\qquad\qquad+\int_{0}^{T}\int_{D}\left(|\bm{R}_{int1}|^{2}+2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}+\lambda|\nabla\cdot\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2|T|12CΓD(0TΓD|𝑹sb1|2ds(𝒙)dt)12+2|T|12CΓN(0TΓN|𝑹sb2|2ds(𝒙)dt)12.\displaystyle\qquad\qquad+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(\int_{0}^{T}\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.
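Here the boundary contributions are handled with the Cauchy–Schwarz inequality in time, e.g.

\int_{0}^{\tau}\left(\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x})\right)^{\frac{1}{2}}{\,\rm{d}}t\leq|T|^{\frac{1}{2}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}},

and the factors of $2$ arise from multiplying (85) by $2$ before integrating in time.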

By applying the integral form of the Grönwall inequality to the above inequality, we have

\int_{D}\big(|\hat{\bm{u}}(\bm{x},\tau)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},\tau))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},\tau)|^{2}+\rho|\hat{\bm{v}}(\bm{x},\tau)|^{2}\big){\,\rm{d}}\bm{x}\leq C_{G}\exp\left((2+2\mu+\lambda)T\right), (86)

where

CG=D|𝑹tb1|2d𝒙+D2μ|𝜺¯(𝑹tb1)|2d𝒙+Dλ|𝑹tb1|2d𝒙+ρD|𝑹tb2|2d𝒙\displaystyle C_{G}=\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\int_{D}2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}+\int_{D}\lambda|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}+\rho\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}
+0TD(|𝑹int1|2+2μ|𝜺¯(𝑹int1)|2+λ|𝑹int1|2+|𝑹int2|2)d𝒙dt\displaystyle\qquad+\int_{0}^{T}\int_{D}\left(|\bm{R}_{int1}|^{2}+2\mu|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}+\lambda|\nabla\cdot\bm{R}_{int1}|^{2}+|\bm{R}_{int2}|^{2}\right){\,\rm{d}}\bm{x}{\,\rm{d}}t
+2|T|12CΓD(0TΓD|𝑹sb1|2ds(𝒙)dt)12+2|T|12CΓN(0TΓN|𝑹sb2|2ds(𝒙)dt)12.\displaystyle\qquad+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(\int_{0}^{T}\int_{\Gamma_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(\int_{0}^{T}\int_{\Gamma_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t\right)^{\frac{1}{2}}.
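The integral form of the Grönwall inequality used above can be stated as follows: if $y(\tau)\leq B+a\int_{0}^{\tau}y(t){\,\rm{d}}t$ for all $\tau\in[0,T]$ with constants $a,B\geq 0$, then $y(\tau)\leq B\exp(a\tau)$ for all $\tau\in[0,T]$; this produces (86) with $B=C_{G}$ and $a=2+2\mu+\lambda$.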

Then, we finish the proof by integrating (86) over [0,T][0,T]. ∎

Theorem 5.5: Let dd\in\mathbb{N}, 𝒖C4(Ω)\bm{u}\in C^{4}(\Omega) and 𝒗C3(Ω)\bm{v}\in C^{3}(\Omega) be the classical solution to the linear elastodynamic equation (49). Let (𝒖θ,𝒗θ)(\bm{u}_{\theta},\bm{v}_{\theta}) denote the PINN approximation with the parameter θ\theta. Then the following relation holds,

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtCTTexp((2+2μ+λ)T)\displaystyle\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq C_{T}T\exp\left((2+2\mu+\lambda)T\right)
=𝒪(T(θ)2+Mint2d+1+Mtb2d+Msb1d),\displaystyle\qquad=\mathcal{O}(\mathcal{E}_{T}(\theta)^{2}+M_{int}^{-\frac{2}{d+1}}+M_{tb}^{-\frac{2}{d}}+M_{sb}^{-\frac{1}{d}}),

where CTC_{T} is defined in the following proof.

Proof.

By the definitions of the different components of the training error (55) and applying the estimate (18) on the quadrature error, we have

D|𝑹tb1|2d𝒙\displaystyle\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|𝑹tb1|2d𝒙𝒬MtbD(𝑹tb12)+𝒬MtbD(𝑹tb12)\displaystyle=\int_{D}|\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})
C(𝑹tb12)Mtb2d+𝒬MtbD(𝑹tb12),\displaystyle\leq C_{({\bm{R}_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2}),
D|𝑹tb2|2d𝒙\displaystyle\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x} =D|𝑹tb2|2d𝒙𝒬MtbD(𝑹tb22)+𝒬MtbD(𝑹tb22)\displaystyle=\int_{D}|\bm{R}_{tb2}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})
C(𝑹tb22)Mtb2d+𝒬MtbD(𝑹tb22),\displaystyle\leq C_{({\bm{R}_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2}),
D|𝜺¯(𝑹tb1)|2d𝒙\displaystyle\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x} =D|𝜺¯(𝑹tb1)|2d𝒙𝒬MtbD(|𝜺¯(𝑹tb1)|2)+𝒬MtbD(|𝜺¯(𝑹tb1)|2)\displaystyle=\int_{D}|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})
C(|𝜺¯(𝑹tb1)|2)Mtb2d+𝒬MtbD(|𝜺¯(𝑹tb1)|2),\displaystyle\leq C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2}),
D|𝑹tb1|2d𝒙\displaystyle\int_{D}|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x} =D|𝑹tb1|2d𝒙𝒬MtbD(|𝑹tb1|2)+𝒬MtbD(|𝑹tb1|2)\displaystyle=\int_{D}|\nabla\cdot\bm{R}_{tb1}|^{2}{\,\rm{d}}\bm{x}-\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})
C(|𝑹tb1|2)Mtb2d+𝒬MtbD(|𝑹tb1|2),\displaystyle\leq C_{(|\nabla\cdot\bm{R}_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2}),
Ω|𝑹int1|2d𝒙dt\displaystyle\int_{\Omega}|\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝑹int1|2d𝒙dt𝒬MintΩ(𝑹int12)+𝒬MintΩ(𝑹int12)\displaystyle=\int_{\Omega}|\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})
C(𝑹int12)Mint2d+1+𝒬MintΩ(𝑹int12),\displaystyle\leq C_{({\bm{R}_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2}),
Ω|𝑹int2|2d𝒙dt\displaystyle\int_{\Omega}|\bm{R}_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝑹int2|2d𝒙dt𝒬MintΩ(𝑹int22)+𝒬MintΩ(𝑹int22)\displaystyle=\int_{\Omega}|\bm{R}_{int2}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})
C(𝑹int22)Mint2d+1+𝒬MintΩ(𝑹int22),\displaystyle\leq C_{({\bm{R}_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2}),
Ω|𝜺¯(𝑹int1)|2d𝒙dt\displaystyle\int_{\Omega}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝜺¯(𝑹int1)|2d𝒙dt𝒬MintΩ(|𝜺¯(𝑹int1)|2)+𝒬MintΩ(|𝜺¯(𝑹int1)|2)\displaystyle=\int_{\Omega}|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})
C(|𝜺¯(𝑹int1)|2)Mint2d+1+𝒬MintΩ(|𝜺¯(𝑹int1)|2),\displaystyle\leq C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2}),
Ω|𝑹int1|2d𝒙dt\displaystyle\int_{\Omega}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t =Ω|𝑹int1|2d𝒙dt𝒬MintΩ(|𝑹int1|2)+𝒬MintΩ(|𝑹int1|2)\displaystyle=\int_{\Omega}|\nabla\cdot\bm{R}_{int1}|^{2}{\,\rm{d}}\bm{x}{\,\rm{d}}t-\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})
C(|𝑹int1|2)Mint2d+1+𝒬MintΩ(|𝑹int1|2),\displaystyle\leq C_{(|\nabla\cdot\bm{R}_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2}),
ΩD|𝑹sb1|2ds(𝒙)dt\displaystyle\int_{\Omega_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =ΩD|𝑹sb1|2ds(𝒙)dt𝒬Msb1ΩD(𝑹sb12)+𝒬Msb1ΩD(𝑹sb12)\displaystyle=\int_{\Omega_{D}}|\bm{R}_{sb1}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})
C(𝑹sb12)Msb12d+𝒬Msb1ΩD(𝑹sb12),\displaystyle\leq C_{({\bm{R}_{sb1}^{2}})}M_{sb1}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2}),
ΩN|𝑹sb2|2ds(𝒙)dt\displaystyle\int_{\Omega_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t =ΩN|𝑹sb2|2ds(𝒙)dt𝒬Msb2ΩN(𝑹sb22)+𝒬Msb2ΩN(𝑹sb22)\displaystyle=\int_{\Omega_{N}}|\bm{R}_{sb2}|^{2}{\,\rm{d}}s(\bm{x}){\,\rm{d}}t-\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})
C(𝑹sb22)Msb22d+𝒬Msb2ΩN(𝑹sb22).\displaystyle\leq C_{({\bm{R}_{sb2}^{2}})}M_{sb2}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2}).

In light of the above inequalities and (86), we obtain

0TD(|𝒖^(𝒙,t)|2+2μ|𝜺¯(𝒖^(𝒙,t))|2+λ|𝒖^(𝒙,t)|2+ρ|𝒗^(𝒙,t)|2)d𝒙dtTCTexp((2+2μ+λ)T),\int_{0}^{T}\int_{D}(|\hat{\bm{u}}(\bm{x},t)|^{2}+2\mu|\underline{\bm{\varepsilon}}(\hat{\bm{u}}(\bm{x},t))|^{2}+\lambda|\nabla\cdot\hat{\bm{u}}(\bm{x},t)|^{2}+\rho|\hat{\bm{v}}(\bm{x},t)|^{2}){\,\rm{d}}\bm{x}{\,\rm{d}}t\leq TC_{T}\exp\left((2+2\mu+\lambda)T\right),

where

CT=\displaystyle C_{T}= C(𝑹tb12)Mtb2d+𝒬MtbD(𝑹tb12)+ρ(C(𝑹tb22)Mtb2d+𝒬MtbD(𝑹tb22))+2μ(C(|𝜺¯(𝑹tb1)|2)Mtb2d+𝒬MtbD(|𝜺¯(𝑹tb1)|2))\displaystyle C_{({\bm{R}_{tb1}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb1}^{2})+\rho\left(C_{({\bm{R}_{tb2}^{2}})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(\bm{R}_{tb2}^{2})\right)+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\underline{\bm{\varepsilon}}(\bm{R}_{tb1})|^{2})\right)
+λ(C(|𝑹tb1|2)Mtb2d+𝒬MtbD(|𝑹tb1|2))+C(𝑹int12)Mint2d+1+𝒬MintΩ(𝑹int12)\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{tb1}|^{2})}M_{tb}^{-\frac{2}{d}}+\mathcal{Q}_{M_{tb}}^{D}(|\nabla\cdot\bm{R}_{tb1}|^{2})\right)+C_{({\bm{R}_{int1}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int1}^{2})
+C(𝑹int22)Mint2d+1+𝒬MintΩ(𝑹int22)+2μ(C(|𝜺¯(𝑹int1)|2)Mint2d+1+𝒬MintΩ(|𝜺¯(𝑹int1)|2))\displaystyle+C_{({\bm{R}_{int2}^{2}})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(\bm{R}_{int2}^{2})+2\mu\left(C_{(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\underline{\bm{\varepsilon}}(\bm{R}_{int1})|^{2})\right)
+λ(C(|𝑹int1|2)Mint2d+1+𝒬MintΩ(|𝑹int1|2))+2|T|12CΓD(C(𝑹sb12)Msb12d+𝒬Msb1ΩD(𝑹sb12))12\displaystyle+\lambda\left(C_{(|\nabla\cdot\bm{R}_{int1}|^{2})}M_{int}^{-\frac{2}{d+1}}+\mathcal{Q}_{M_{int}}^{\Omega}(|\nabla\cdot\bm{R}_{int1}|^{2})\right)+2|T|^{\frac{1}{2}}C_{\Gamma_{D}}\left(C_{({\bm{R}_{sb1}^{2}})}M_{sb1}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\bm{R}_{sb1}^{2})\right)^{\frac{1}{2}}
+2|T|12CΓN(C(𝑹sb22)Msb22d+𝒬Msb2ΩN(𝑹sb22))12.\displaystyle+2|T|^{\frac{1}{2}}C_{\Gamma_{N}}\left(C_{({\bm{R}_{sb2}^{2}})}M_{sb2}^{-\frac{2}{d}}+\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\bm{R}_{sb2}^{2})\right)^{\frac{1}{2}}.
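The quadrature sums $\mathcal{Q}_{M_{tb}}^{D}(\cdot)$, $\mathcal{Q}_{M_{int}}^{\Omega}(\cdot)$, $\mathcal{Q}_{M_{sb1}}^{\Omega_{D}}(\cdot)$ and $\mathcal{Q}_{M_{sb2}}^{\Omega_{N}}(\cdot)$ appearing in $C_{T}$ correspond to the components of the training error in (55), while the remaining terms decay with the numbers of quadrature points; this yields the order relation stated in the theorem, the exponent $-\frac{1}{d}$ for $M_{sb}$ arising because the boundary quadrature terms enter $C_{T}$ through square roots.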

The boundedness of the constants $C_{(\bm{R}_{q}^{2})}$ follows from Lemma 8.4 together with the bound $\|\bm{R}_{q}^{2}\|_{C^{n}}\leq 2^{n}\|\bm{R}_{q}\|_{C^{n}}^{2}$, where $\bm{R}_{q}$ stands for $\bm{R}_{tb1}$, $\bm{R}_{tb2}$, $\underline{\bm{\varepsilon}}(\bm{R}_{tb1})$, $\nabla\cdot\bm{R}_{tb1}$, $\bm{R}_{int1}$, $\bm{R}_{int2}$, $\underline{\bm{\varepsilon}}(\bm{R}_{int1})$, $\nabla\cdot\bm{R}_{int1}$, $\bm{R}_{sb1}$ and $\bm{R}_{sb2}$. ∎
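The product bound $\|\bm{R}_{q}^{2}\|_{C^{n}}\leq 2^{n}\|\bm{R}_{q}\|_{C^{n}}^{2}$ invoked above is an instance of the Leibniz-rule estimate $\|fg\|_{C^{n}}\leq 2^{n}\|f\|_{C^{n}}\|g\|_{C^{n}}$: assuming, as we do here, that $\|f\|_{C^{n}}$ denotes the maximum of $\|D^{\alpha}f\|_{L^{\infty}}$ over $|\alpha|\leq n$, one has $\|D^{\alpha}(fg)\|_{L^{\infty}}\leq\sum_{\beta\leq\alpha}\binom{\alpha}{\beta}\|D^{\beta}f\|_{L^{\infty}}\|D^{\alpha-\beta}g\|_{L^{\infty}}\leq 2^{|\alpha|}\|f\|_{C^{n}}\|g\|_{C^{n}}$, and the bound follows by taking $f=g$ equal to (the components of) $\bm{R}_{q}$.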
