
Emails: hailong2019@iscas.ac.cn (H. Sheng), chao_yang@pku.edu.cn (C. Yang)

PFNN-2: A Domain Decomposed Penalty-Free Neural Network Method for Solving Partial Differential Equations

Hailong Sheng 1,3 and Chao Yang 2 (corresponding author)
1 Institute of Software, Chinese Academy of Sciences, Beijing 100190, P.R. China.
2 School of Mathematical Sciences, Peking University, Beijing 100871, P.R. China.
3 University of Chinese Academy of Sciences, Beijing 100190, P.R. China.
Abstract

A new penalty-free neural network method, PFNN-2, is presented for solving partial differential equations, as a subsequent improvement of our previously proposed PFNN method [1]. PFNN-2 inherits all advantages of PFNN in handling the smoothness constraints and essential boundary conditions of self-adjoint problems with complex geometries, and extends the application to a broader range of non-self-adjoint time-dependent differential equations. In addition, PFNN-2 introduces an overlapping domain decomposition strategy to substantially improve the training efficiency without sacrificing accuracy. Experimental results on a series of partial differential equations are reported, demonstrating that PFNN-2 can outperform state-of-the-art neural network methods in various aspects such as numerical accuracy, convergence speed, and parallel scalability.

Keywords: neural network, penalty-free method, domain decomposition, initial-boundary value problem, partial differential equation.

AMS subject classifications: 65M55, 68T99, 76R99

1 Introduction

In recent years, neural network methods have become an attractive alternative for solving partial differential equations (PDEs) arising from applications such as fluid dynamics [2, 3, 4], quantum mechanics [5, 6, 7], molecular dynamics [8], material sciences [9] and geophysics [10, 11]. In contrast to most traditional numerical approaches, methods based on neural networks are naturally mesh-free and intrinsically nonlinear; they can therefore be applied without the cumbersome step of mesh generation and are potentially better suited to complicated nonlinear problems. These advantages have enabled neural network methods to draw increasing attention, both in early studies using shallow neural networks [12, 13, 14, 15, 16, 17, 18] and in recent works following the advent of deep learning technology [19, 20, 21, 5, 22, 23, 24, 25].

Despite the tremendous efforts to improve the performance of neural network methods for solving PDEs, there are still issues that require further study. The first issue is related to the accuracy of neural network methods. It is still not fully understood, in either theory or practice, how well a neural network can approximate the solution of a PDE. Recent investigations were carried out to reveal some preliminary approximation properties of neural networks for simplified problems [26, 27, 28, 29], or to improve the accuracy of neural network methods from various aspects, such as by introducing weak form loss functions to relax the smoothness constraints [21, 30, 31, 32, 33, 34], by modifying the solution structure to automatically satisfy the initial-boundary conditions [15, 17, 18, 35, 36], by revising the network structure to enhance its representative capability [37, 38, 39, 40], and by introducing advanced sampling strategies to reduce the statistical error [41, 42]. These improvements are usually effective in their respective settings, but are not general enough to adapt to different types of PDEs defined on complex geometries.

Another difficulty for solving PDEs with neural network methods is related to efficiency. It is widely known that training a neural network is usually much more costly than solving a linear system resulting from a traditional discretization of the PDE [43, 44, 45]. To improve the efficiency, it is natural, and has been extensively considered [4, 46], to utilize distributed training techniques [47, 48] based on data or model parallelism, by which the training task is split into a number of small sub-tasks according to the partitioned datasets or model parameters so that multiple processors can be exploited. Although such distributed training is general and successful in handling many machine learning tasks [49, 50, 51, 52], it is not the most effective choice for solving PDEs because no specific knowledge of the original problem is exploited throughout the training process.

Inspired by the idea of classical domain decomposition methods [53], it was proposed to introduce domain decomposition strategies into neural network based PDE solvers [54, 55, 56, 57, 58, 59, 60, 61, 62] by dividing the learning task into training a series of sub-networks related to the solutions on sub-domains. This is more natural than plain distributed training because, with domain decomposition, the training of each sub-network only requires the small part of the dataset related to the corresponding sub-domain, which can significantly decrease the computational cost. However, these methods may still suffer from low convergence speed and poor parallel scalability, due in large part to the straightforward treatment of the artificial sub-domain boundaries.

In this paper, we present PFNN-2, a new penalty-free neural network method that is a subsequent improvement of our previously proposed PFNN method [1]. Inheriting all advantages of PFNN in handling the smoothness constraints and essential boundary conditions of self-adjoint PDEs, PFNN-2 extends the application to a broader range of non-self-adjoint time-dependent problems and introduces a domain decomposition strategy to improve the training efficiency. In particular, PFNN-2 employs a Galerkin variational principle to transform the non-self-adjoint time-dependent equation into a weak form, and adopts compactly supported test functions that are convenient to build on complex geometries. To further reduce the difficulty of the training process and improve the accuracy, PFNN-2 separates the penalty terms related to the initial-boundary constraints from the loss function, with a dedicated neural network learning the initial and essential boundary conditions, and the true solution on the rest of the domain approximated by another network that is totally disentangled from the former one. On top of the above, PFNN-2 introduces an overlapping domain decomposition strategy to speed up multi-processor training without sacrificing accuracy, thanks to the penalty-free treatment of the initial-boundary conditions of the decomposed sub-problems and the weak form loss function with reduced smoothness requirement. We carry out a series of numerical experiments to demonstrate that PFNN-2 is advantageous over existing state-of-the-art algorithms in terms of both numerical accuracy and computational efficiency.

The remainder of the paper is organized as follows. In Section 2 we present the basic algorithms employed in PFNN-2 to handle the smoothness constraints and initial-boundary conditions. Following that the domain decomposition strategy adopted by PFNN-2 for parallel computing is introduced in Section 3. We then discuss and compare PFNN-2 with several related state-of-the-art neural network methods in Section 4. Numerical experiment results on a series of linear and nonlinear, time-independent and time-dependent, initial-boundary value problems on simple and complex geometries are reported in Section 5. The paper is concluded in Section 6.

2 The PFNN-2 method

For the ease of discussion, we first consider the case of a single domain $\Omega\subset\mathbb{R}^{d_{\bm{x}}}$ ($d_{\bm{x}}\in\mathbb{N}_{+}$), on which we try to solve the following initial-boundary value problem:

\left\{\begin{array}{r l l}
\dfrac{\partial u}{\partial t}-\nabla\cdot\left(\mathcal{A}\nabla u-\mathcal{B}\right)+\mathcal{C} &= 0, & \mbox{in}\ \Omega\times(0,T],\\
u &= r_{D}, & \mbox{on}\ \Gamma_{D}\times(0,T],\\
\left(\mathcal{A}\nabla u\right)\cdot\bm{n} &= r_{N}, & \mbox{on}\ \Gamma_{N}\times(0,T],\\
u &= r_{0}, & \mbox{on}\ \overline{\Omega}\times\{0\},
\end{array}\right. \qquad (1)

where $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ are dependent on the solution $u$ as well as the spatial and temporal coordinates $\bm{x}$ and $t$, and $\bm{n}$ is the outward unit normal. Both Dirichlet and Neumann boundary conditions are considered, with the corresponding boundaries satisfying $\Gamma_{D}\cup\Gamma_{N}=\partial\Omega$ and $\Gamma_{D}\cap\Gamma_{N}=\varnothing$.

To reduce the smoothness requirement, we introduce the following weak form

\int_{0}^{T}\!\int_{\Omega}\left(\mathcal{A}\nabla u\cdot\nabla v+\Big(\frac{\partial u}{\partial t}+\nabla\cdot\mathcal{B}+\mathcal{C}\Big)v\right)d\bm{x}\,dt-\int_{0}^{T}\!\int_{\Gamma_{N}}r_{N}v\,d\bm{x}\,dt=0,\quad\forall v\in\mathcal{V}, \qquad (2)

with the solution $u$ belonging to a trial (hypothesis) space $\mathcal{H}$ and satisfying the initial-boundary conditions

u=r_{D}\ \mbox{on}\ \Gamma_{D}\times(0,T]\quad\mbox{and}\quad u=r_{0}\ \mbox{on}\ \overline{\Omega}\times\{0\},

where $\mathcal{V}$ is a certain test space, in which any function $v$ satisfies $v=0$ on $\Gamma_{D}\times(0,T]$.

In PFNN-2, the approximate solution of problem (2) is found within a hypothesis space $\mathcal{H}$ formed by neural networks. Unlike most of the existing methods [19, 20, 21, 30, 31, 32, 34] that merely apply a single network to construct $\mathcal{H}$, PFNN-2 adopts two neural networks: one network $g_{\bm{\theta}_{1}}$ learns the solution on the initial and essential boundary parts of the domain accurately and swiftly, while another network $f_{\bm{\theta}_{2}}$ approximates the solution on the rest of the domain, where $\bm{\theta}_{1}$ and $\bm{\theta}_{2}$ denote the sets of weights and biases of the two networks, respectively. To eliminate the influence of $f_{\bm{\theta}_{2}}$ on the initial condition and essential boundary condition, a length factor function $\ell$ is introduced to impose the appropriate restriction, which satisfies

\left\{\begin{array}{l l}\ell=0, & \mbox{on}\ (\overline{\Omega}\times\{0\})\cup(\Gamma_{D}\times(0,T]),\\ \ell>0, & \mbox{otherwise}.\end{array}\right. \qquad (3)

With the help of the length factor function, any function in the hypothesis space $\mathcal{H}$ is constructed as follows:

w_{\bm{\theta}}:=g_{\bm{\theta}_{1}}+\ell f_{\bm{\theta}_{2}}, \qquad (4)

where $\bm{\theta}=\{\bm{\theta}_{1},\bm{\theta}_{2}\}$.
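As a concrete illustration of the construction (4), the following sketch shows how the two networks are combined so that only the term multiplied by the length factor affects the interior solution. It assumes a PyTorch implementation; the simple fully connected networks and the toy `length_factor` below are placeholders for the ResNet-type networks and the spline-based factor actually used in PFNN-2.

```python
# A minimal sketch of the two-network solution structure (4), assuming PyTorch.
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small fully connected network; a placeholder for the ResNet blocks used in the paper."""
    def __init__(self, dim_in, width, dim_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, dim_out))

    def forward(self, xt):
        return self.net(xt)

g_net = MLP(dim_in=3, width=20, dim_out=1)   # learns the initial/essential boundary data
f_net = MLP(dim_in=3, width=20, dim_out=1)   # learns the solution on the rest of the domain

def length_factor(xt):
    # placeholder for ell(x, t): here simply t, which vanishes at t = 0;
    # the actual PFNN-2 factor also vanishes on Gamma_D and follows formula (6)
    return xt[:, 2:3]

def w(xt):
    # approximate solution (4): w_theta = g_{theta_1} + ell * f_{theta_2}
    return g_net(xt) + length_factor(xt) * f_net(xt)
```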

The length factor function $\ell$ is established by utilizing spline functions, which are more flexible in adapting to various complicated geometries than the analytical functions proposed in previous works [15, 35, 36]. To be specific, we divide the essential boundary $\Gamma_{D}$ into several segments $\{\gamma_{j}\}_{j=1}^{n_{D}}$, $n_{D}\in\mathbb{N}_{+}$. For each segment $\gamma_{j}$, we select another segment of the boundary as its companion $\gamma_{j_{o}}$ (note that $\gamma_{j_{o}}$ should not be adjacent to $\gamma_{j}$). Then, for each $\gamma_{j}$, a spline function $l_{j}$ is built, which satisfies

\left\{\begin{array}{l l}l_{j}(\bm{x})=0, & \bm{x}\in\gamma_{j},\\ l_{j}(\bm{x})=1, & \bm{x}\in\gamma_{j_{o}},\\ 0<l_{j}(\bm{x})<1, & \mbox{otherwise}.\end{array}\right. \qquad (5)

The details of constructing $l_{j}$ can be found in our previous work on PFNN [1]. Utilizing all the $l_{j}$ functions, the length factor function $\ell$, which depends on both space and time, is established as:

\ell(\bm{x},t):=\frac{\widetilde{\ell}(\bm{x},t)}{\max\limits_{(\widehat{\bm{x}},\widehat{t})\in\Omega\times[0,T]}\widetilde{\ell}(\widehat{\bm{x}},\widehat{t})},\quad\mbox{where}\quad\widetilde{\ell}(\bm{x},t)=\frac{t}{T}\prod_{j=1}^{n_{D}}\left[1-(1-l_{j}(\bm{x}))^{\mu}\right], \qquad (6)

where $\mu>0$ is a parameter for adjusting the shape of $\ell$. To avoid the value of $\ell$ dropping dramatically as $n_{D}$ grows, we suggest setting $\mu=n_{D}$.
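To make (6) concrete, the sketch below evaluates $\ell(\bm{x},t)$ from a given set of segment functions $l_{j}$; the $l_{j}$ passed in are placeholders (in PFNN-2 they are the splines built as in [1]), and the maximum in the denominator is approximated on a user-supplied sample of points.

```python
import numpy as np

def ell(x, t, l_funcs, T, mu=None, x_ref=None, t_ref=None):
    """Evaluate the length factor (6).

    l_funcs : list of callables l_j(x) with values in [0, 1] (splines in PFNN-2;
              arbitrary placeholders here)
    mu      : shape parameter; defaults to n_D = len(l_funcs) as suggested in the text
    x_ref, t_ref : sample points used to approximate the normalizing maximum
    """
    n_D = len(l_funcs)
    mu = n_D if mu is None else mu

    def ell_tilde(xx, tt):
        prod = np.ones_like(tt, dtype=float)
        for l_j in l_funcs:
            prod = prod * (1.0 - (1.0 - l_j(xx)) ** mu)
        return (tt / T) * prod

    # the maximum over Omega x [0, T] is approximated on the reference samples
    return ell_tilde(x, t) / np.max(ell_tilde(x_ref, t_ref))
```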

[Figure 1: A demonstration of the construction of the length factor function. Panels: (a) boundary, (b) $l_{1}$, (c) $l_{2}$, (d) $\ell$ ($\mu=2$).]

Figure 1 gives an example of how to construct the function $\ell$, in which the essential boundary $\Gamma_{D}$ is divided into two parts, $\gamma_{1}$ and $\gamma_{2}$. Correspondingly, another two segments, $\gamma_{3}$ and $\gamma_{4}$, are taken from the Neumann boundary $\Gamma_{N}$ as the companions $\gamma_{1_{o}}$ and $\gamma_{2_{o}}$, respectively. Then two spline functions $l_{1}$ and $l_{2}$ satisfying (5) are built according to the method in PFNN [1], and the length factor function is established according to (6).

To improve the flexibility with respect to complex geometries and to reduce the cost of evaluating the residual of (2), instead of the commonly utilized orthogonal polynomials [32, 33], a set of compactly supported test functions

v_{s}(\bm{x},t;\widehat{\bm{x}}_{s},\widehat{t}_{s},h_{s}):=\max\left\{1-\frac{|t-\widehat{t}_{s}|}{h_{s}},0\right\}\prod_{j=1}^{d_{\bm{x}}}\max\left\{1-\frac{|x_{j}-\widehat{x}_{s,j}|}{h_{s}},0\right\},\quad s=1,2,\cdots,n_{v},\ n_{v}\in\mathbb{N}_{+}, \qquad (7)

are employed to span the test space $\mathcal{V}$, where $(\widehat{\bm{x}}_{s},\widehat{t}_{s})$ and $h_{s}$ are the center and radius of the support $G_{s}=\bigcap_{j=1}^{d_{\bm{x}}}\{(\bm{x},t):|x_{j}-\widehat{x}_{s,j}|<h_{s}\}\cap\{(\bm{x},t):|t-\widehat{t}_{s}|<h_{s}\}$ of $v_{s}$, respectively, and $x_{j}$ and $\widehat{x}_{s,j}$ are the $j$th components of $\bm{x}$ and $\widehat{\bm{x}}_{s}$, respectively. The center $(\widehat{\bm{x}}_{s},\widehat{t}_{s})$ is sampled on $(\Omega\cup\Gamma_{N})\times[0,T]$, while the radius $h_{s}$ is kept small so that the residual only needs to be evaluated on a small region instead of the whole domain. To achieve this, the radius $h_{s}$ is set to:

h_{s}:=\min\{\delta(\widehat{\bm{x}}_{s},\Gamma_{D}),\ \widehat{t}_{s},\ T-\widehat{t}_{s},\ \overline{h}\}, \qquad (8)

where $\delta(\widehat{\bm{x}}_{s},\Gamma_{D})$ is the minimum distance from $\widehat{\bm{x}}_{s}$ to $\Gamma_{D}$ so that $v_{s}=0$ on $\Gamma_{D}$, and $\overline{h}$ is the upper bound of $h_{s}$, which is defined as

\overline{h}=\left(\frac{|\Omega|T}{n_{v}}\right)^{\frac{1}{d_{\bm{x}}+1}}.

Such a configuration keeps $\sum_{s=1}^{n_{v}}|G_{s}|\approx 2^{d_{\bm{x}}+1}|\Omega|T$ and guarantees that the supports of all the test functions cover most of the computational domain.
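A possible realization of the test functions (7) and the radius rule (8) is sketched below; the helper `dist_to_gamma_d`, which returns $\delta(\widehat{\bm{x}}_{s},\Gamma_{D})$, is a hypothetical placeholder that depends on the geometry at hand.

```python
import numpy as np

def radius(x_hat, t_hat, dist_to_gamma_d, T, vol_omega, n_v, d_x):
    # radius rule (8): stay away from Gamma_D and the temporal end points,
    # and never exceed the upper bound h_bar defined from |Omega|, T and n_v
    h_bar = (vol_omega * T / n_v) ** (1.0 / (d_x + 1))
    return min(dist_to_gamma_d(x_hat), t_hat, T - t_hat, h_bar)

def test_function(x, t, x_hat, t_hat, h):
    # compactly supported tensor-product hat function (7);
    # x has shape (N, d_x), t has shape (N,), x_hat is the center, h the radius
    v = np.maximum(1.0 - np.abs(t - t_hat) / h, 0.0)
    for j in range(len(x_hat)):
        v = v * np.maximum(1.0 - np.abs(x[:, j] - x_hat[j]) / h, 0.0)
    return v
```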

Since the two neural networks $g_{\bm{\theta}_{1}}$ and $f_{\bm{\theta}_{2}}$ are decoupled by the length factor function $\ell$, they can be trained separately. First, $g_{\bm{\theta}_{1}}$ is trained to learn the initial and essential boundary conditions by minimizing the loss function

\Psi[g_{\bm{\theta}_{1}}]:=\sum\limits_{(\bm{x},t)\in S(\Gamma_{D}\times(0,T])}\big(g_{\bm{\theta}_{1}}(\bm{x},t)-r_{D}(\bm{x},t)\big)^{2}+\sum\limits_{(\bm{x},t)\in S(\overline{\Omega}\times\{0\})}\big(g_{\bm{\theta}_{1}}(\bm{x},t)-r_{0}(\bm{x},t)\big)^{2}, \qquad (9)

where $S(\Box)$ represents a set of points sampled on $\Box$. Then, let $g_{\bm{\theta}_{1}^{\ast}}$ be the network obtained after minimizing (9) and $\widetilde{w}_{\bm{\theta}_{2}}=g_{\bm{\theta}_{1}^{\ast}}+\ell f_{\bm{\theta}_{2}}$. An approximate solution $u^{\ast}$ of problem (2) can then be obtained by minimizing the loss $L$ as follows:

u^{\ast}=\arg\min\limits_{\widetilde{w}_{\bm{\theta}_{2}}}L[\widetilde{w}_{\bm{\theta}_{2}}]=\arg\min\limits_{\widetilde{w}_{\bm{\theta}_{2}}}\sum_{s=1}^{n_{v}}R^{2}[\widetilde{w}_{\bm{\theta}_{2}};v_{s}], \qquad (10)

where

\begin{array}{r l}
R[\widetilde{w}_{\bm{\theta}_{2}};v_{s}] &:= \dfrac{|\Omega|T}{\#S(\Omega\times(0,T])}\sum\limits_{(\bm{x},t)\in S(\Omega\times(0,T])}\Big[\mathcal{A}\nabla\widetilde{w}_{\bm{\theta}_{2}}\cdot\nabla v_{s}+\Big(\dfrac{\partial\widetilde{w}_{\bm{\theta}_{2}}}{\partial t}+\nabla\cdot\mathcal{B}+\mathcal{C}\Big)v_{s}\Big](\bm{x},t)\\
&\ \ -\dfrac{|\Gamma_{N}|T}{\#S(\Gamma_{N}\times(0,T])}\sum\limits_{(\bm{x},t)\in S(\Gamma_{N}\times(0,T])}\big[r_{N}v_{s}\big](\bm{x},t)
\end{array} \qquad (11)

is the estimate of the residual of the governing equation (2), and $\#S(\Box)$ denotes the size of $S(\Box)$. Eq. (10) is taken as the loss function for training $f_{\bm{\theta}_{2}}$. It can be seen that neither of the loss functions (9) and (10) involves any penalty term, and they are therefore easier to minimize than the penalized loss functions frequently found in various neural network methods for solving partial differential equations. Besides, the training of each network only needs data samples on part of the computing domain, which incurs less computational cost than training a single network with data samples from the whole domain.
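The two-stage training just described can be sketched as follows, reusing `g_net`, `f_net` and `w` from the earlier sketch. The example assumes PyTorch with the L-BFGS optimizer and, for brevity, the special case $\mathcal{A}=I$, $\mathcal{B}=0$ with no Neumann boundary; the random sample tensors, the coefficient `C_term` and the single dummy test-function batch are placeholders rather than the actual PFNN-2 sampling.

```python
import torch

# hypothetical sample data (random placeholders so the sketch executes)
xt_bc = torch.rand(64, 3); r_bc = torch.zeros(64, 1)                     # Gamma_D x (0,T] samples
xt_ic = torch.rand(64, 3); xt_ic[:, 2] = 0.0; r_ic = torch.zeros(64, 1)  # t = 0 samples
C_term = lambda xt: torch.zeros(len(xt), 1)                              # placeholder for C

# ---- stage 1: fit g to the initial and essential boundary data, loss (9) ----
opt_g = torch.optim.LBFGS(g_net.parameters(), max_iter=200)
def closure_g():
    opt_g.zero_grad()
    loss = ((g_net(xt_bc) - r_bc) ** 2).sum() + ((g_net(xt_ic) - r_ic) ** 2).sum()
    loss.backward()
    return loss
opt_g.step(closure_g)
for p in g_net.parameters():          # g is frozen before f is trained
    p.requires_grad_(False)

# ---- stage 2: train f on the sum of squared weak residuals (10)-(11) ----
def weak_residual(xt, v_s, grad_v_s, vol):
    # Monte Carlo estimate of R[w; v_s] for A = I, B = 0 and no Neumann term:
    #   vol * mean[ grad_x w . grad_x v_s + (dw/dt + C) v_s ]
    xt = xt.clone().requires_grad_(True)
    grads = torch.autograd.grad(w(xt).sum(), xt, create_graph=True)[0]
    grad_x_w, dw_dt = grads[:, :-1], grads[:, -1:]
    integrand = (grad_x_w * grad_v_s).sum(1, keepdim=True) + (dw_dt + C_term(xt)) * v_s
    return vol * integrand.mean()

# one dummy test-function batch: points in the support, v_s and grad v_s values, |G_s|
batches = [(torch.rand(128, 3), torch.rand(128, 1), torch.rand(128, 2), 1.0)]

opt_f = torch.optim.LBFGS(f_net.parameters(), max_iter=200)
def closure_f():
    opt_f.zero_grad()
    loss = sum(weak_residual(x, v, gv, vol) ** 2 for (x, v, gv, vol) in batches)
    loss.backward()
    return loss
opt_f.step(closure_f)
```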

3 Domain decomposition for PFNN-2

To enable parallel computing in PFNN-2, we first partition the computational domain into a group of non-overlapping sub-domains $\{\widetilde{\Omega}_{i}\}_{i=1}^{m}$, and then extend each sub-domain to obtain a series of overlapping sub-domains $\{\Omega_{i}\}_{i=1}^{m}$, where $m\in\mathbb{N}_{+}$ is the number of sub-domains. Figure 2 shows an example of the overlapping domain decomposition for the case of $m=9$. Starting with a suitable initial guess $u^{0}$, problem (1) can be solved by constructing a solution sequence $u^{k}$ that converges to the true solution, where $k=1,2,\cdots$ is the iteration count. The approximate solution at each iteration is formed by solving a series of sub-domain problems concurrently on parallel computers. In particular, at the $k$-th iteration the sub-domain problem defined on $\Omega_{i}$ is

\left\{\begin{array}{r l l}
\dfrac{\partial u_{i}^{k}}{\partial t}-\nabla\cdot\left(\mathcal{A}\nabla u_{i}^{k}-\mathcal{B}\right)+\mathcal{C} &= 0, & \mbox{in}\ \Omega_{i}\times(0,T],\\
u_{i}^{k} &= u^{k-1}, & \mbox{on}\ \Gamma_{i}\times(0,T],\\
u_{i}^{k} &= r_{D}, & \mbox{on}\ \Gamma_{i,D}\times(0,T],\\
\left(\mathcal{A}\nabla u_{i}^{k}\right)\cdot\bm{n} &= r_{N}, & \mbox{on}\ \Gamma_{i,N}\times(0,T],\\
u_{i}^{k} &= r_{0}, & \mbox{on}\ \overline{\Omega}_{i}\times\{0\},
\end{array}\right. \qquad (12)

where $\Gamma_{i,D}=\partial\Omega_{i}\cap\Gamma_{D}$, $\Gamma_{i,N}=\partial\Omega_{i}\cap\Gamma_{N}$, and $\Gamma_{i}=\partial\Omega_{i}\backslash(\Gamma_{D}\cup\Gamma_{N})$. After solving the sub-domain problems for iteration $k$, the approximate solution defined on the whole domain is composed as $u^{k}\big|_{\widetilde{\Omega}_{i}}=u_{i}^{k}$.
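For a rectangular domain, the overlapping sub-domains can be generated directly from a tensor-product partition; the sketch below is one such construction, with the box representation and the overlap width `delta` being illustrative choices rather than the exact partition used in the experiments.

```python
import numpy as np

def overlapping_boxes(xmin, xmax, ymin, ymax, nx, ny, delta):
    """Return (non_overlapping, overlapping) lists of boxes (x0, x1, y0, y1)."""
    xs = np.linspace(xmin, xmax, nx + 1)
    ys = np.linspace(ymin, ymax, ny + 1)
    tilde_omega, omega = [], []
    for i in range(nx):
        for j in range(ny):
            # non-overlapping cell tilde_Omega_i
            tilde_omega.append((xs[i], xs[i + 1], ys[j], ys[j + 1]))
            # extend each cell by delta (clipped to the global domain) to get Omega_i
            omega.append((max(xmin, xs[i] - delta), min(xmax, xs[i + 1] + delta),
                          max(ymin, ys[j] - delta), min(ymax, ys[j + 1] + delta)))
    return tilde_omega, omega

# example: the 3 x 3 = 9 sub-domain case of Figure 2 on [-1, 1]^2
tilde_omega, omega = overlapping_boxes(-1.0, 1.0, -1.0, 1.0, 3, 3, delta=0.1)
```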

[Figure 2: The domain partitioning, network structures and training process of PFNN-2. The computational domain is decomposed into several overlapping sub-domains, and each sub-domain comes with a combination of two sub-networks for approximating the corresponding local solution. Each iteration of the training process is comprised of steps ①-⑤.]

To construct the solution sequence, instead of solving the sub-domain problem (12) directly, PFNN-2 finds an approximate solution to the corresponding sub-domain problem in the weak form

\int_{0}^{T}\!\int_{\Omega_{i}}\left(\mathcal{A}\nabla u_{i}^{k}\cdot\nabla v+\Big(\frac{\partial u_{i}^{k}}{\partial t}+\nabla\cdot\mathcal{B}+\mathcal{C}\Big)v\right)d\bm{x}\,dt-\int_{0}^{T}\!\int_{\Gamma_{i,N}}r_{N}v\,d\bm{x}\,dt=0,\quad\forall v\in\mathcal{V}_{i}, \qquad (13)

where uiku_{i}^{k} belongs to a hypothesis space i\mathcal{H}_{i} and satisfies the initial-boundary conditions

\left\{\begin{array}{r l l}
u_{i}^{k} &= u^{k-1}, & \mbox{on}\ \Gamma_{i}\times(0,T],\\
u_{i}^{k} &= r_{D}, & \mbox{on}\ \Gamma_{i,D}\times(0,T],\\
u_{i}^{k} &= r_{0}, & \mbox{on}\ \overline{\Omega}_{i}\times\{0\},
\end{array}\right.

and $\mathcal{V}_{i}$ is a certain test space.

Since the weak form (13) is similar to (2), we can utilize the method presented in the previous section to establish the hypothesis space $\mathcal{H}_{i}$ and test space $\mathcal{V}_{i}$, for all $i=1,2,\cdots,m$. Specifically, on sub-domain $\Omega_{i}$, $\mathcal{V}_{i}$ is spanned by a set of compactly supported functions $\{v_{i,s}\}_{s=1}^{n_{v}^{i}}$ ($n_{v}^{i}\in\mathbb{N}_{+}$), and any function $w_{i}\in\mathcal{H}_{i}$ is constructed according to

w_{i}=g_{i}+\ell_{i}f_{i}, \qquad (14)

where $g_{i}$ and $f_{i}$ are two sub-networks that play roles similar to $g_{\bm{\theta}_{1}}$ and $f_{\bm{\theta}_{2}}$, and $\ell_{i}$ is a length factor function that decouples the two sub-networks. Note that here and hereafter, we drop the subscripts representing the learnable parameters for brevity.

Starting from a proper initial state $w_{i}^{0}=g_{i}^{0}+\ell_{i}f_{i}^{0}$, each local approximate solution $w_{i}$ is updated by a dedicated processor to produce a sequence $w_{i}^{k}$ converging to the corresponding local true solution, where $g_{i}^{0}$ and $f_{i}^{0}$ are the initial states of the sub-networks $g_{i}$ and $f_{i}$, and $k=1,2,\cdots$ is the iteration count. In the $k$-th iteration, $w_{i}^{k}$ is acquired by solving the corresponding sub-problem (13). Following that, all the $w_{i}^{k}$ are merged together to form the approximate solution $w^{k}$ on the whole domain according to $w^{k}\big|_{\widetilde{\Omega}_{i}}=w_{i}^{k}$. To illustrate the solving process accurately and intuitively, Algorithm 1 and Figure 2 are provided, which show that each iteration is comprised of two stages:

  • Online stage (lines 4-13 of Algorithm 1): As demonstrated in step ① of Figure 2, each processor communicates with its immediate neighbors to acquire the current interior boundary condition, where $w_{i}$ satisfies $w_{i}^{k}=w^{k-1}=w_{j}^{k-1}$ on $(\Gamma_{i}\cap\Omega_{j})\times(0,T]$ ($j\neq i$).

  • Offline stage (lines 15-23 of Algorithm 1): First, as shown in steps ②-③ of Figure 2, each sub-network $g_{i}$ is trained on the corresponding processor to learn the local solution by minimizing the loss function

    \Psi_{i}^{k}[g_{i}]:=\sum\limits_{(\bm{x},t)\in S(\Gamma_{i}\times(0,T])}\big(g_{i}(\bm{x},t)-w^{k-1}(\bm{x},t)\big)^{2}+\sum\limits_{(\bm{x},t)\in S(\Gamma_{i,D}\times(0,T])}\big(g_{i}(\bm{x},t)-r_{D}(\bm{x},t)\big)^{2}+\sum\limits_{(\bm{x},t)\in S(\overline{\Omega}_{i}\times\{0\})}\big(g_{i}(\bm{x},t)-r_{0}(\bm{x},t)\big)^{2}. \qquad (15)

    Let $g_{i}^{k}$ be the sub-network acquired after minimizing (15) and $\widetilde{w}_{i}^{k}=g_{i}^{k}+\ell_{i}f_{i}$. Then, as shown in steps ④-⑤ of Figure 2, each sub-network $f_{i}$ is trained on the corresponding processor to approximate the local solution on the remainder of the sub-domain by minimizing

    L_{i}^{k}[f_{i}]:=\sum_{s=1}^{n_{v}^{i}}\Bigg[\dfrac{|\Omega_{i}|T}{\#S(\Omega_{i}\times(0,T])}\sum\limits_{(\bm{x},t)\in S(\Omega_{i}\times(0,T])}\Big[\mathcal{A}\nabla\widetilde{w}_{i}^{k}\cdot\nabla v_{i,s}+\Big(\dfrac{\partial\widetilde{w}_{i}^{k}}{\partial t}+\nabla\cdot\mathcal{B}+\mathcal{C}\Big)v_{i,s}\Big](\bm{x},t)-\dfrac{|\Gamma_{i,N}|T}{\#S(\Gamma_{i,N}\times(0,T])}\sum\limits_{(\bm{x},t)\in S(\Gamma_{i,N}\times(0,T])}\big[r_{N}v_{i,s}\big](\bm{x},t)\Bigg]^{2}. \qquad (16)

    Afterwards, the obtained sub-network $f_{i}^{k}$ is utilized to form the local approximate solution $w_{i}^{k}=g_{i}^{k}+\ell_{i}f_{i}^{k}$. Finally, the approximate solution on the whole domain is composed as $w^{k}\big|_{\widetilde{\Omega}_{i}}=w_{i}^{k}$.

Input: $i$, $m$, $g_{i}^{0}$, $f_{i}^{0}$, $\ell_{i}$, $K_{o}$ (number of outer iterations), $K_{i}$ (number of inner iterations).
Output: $w_{i}^{K_{o}}$.
1   Let $w_{i}^{0}=g_{i}^{0}+\ell_{i}f_{i}^{0}$.
2   for $k=1,2,\cdots,K_{o}$ do
3       /* Online stage: lines 4 to 13 */
4       for $j=1,2,\cdots,m$ do
5           if $i<j$ then
6               Send $w^{k-1}\big|_{\Gamma_{j}\cap\Omega_{i}}=w_{i}^{k-1}$ to processor $j$.
7               Receive $w^{k-1}\big|_{\Gamma_{i}\cap\Omega_{j}}=w_{j}^{k-1}$ from processor $j$.
8           end if
9           if $i>j$ then
10              Receive $w^{k-1}\big|_{\Gamma_{i}\cap\Omega_{j}}=w_{j}^{k-1}$ from processor $j$.
11              Send $w^{k-1}\big|_{\Gamma_{j}\cap\Omega_{i}}=w_{i}^{k-1}$ to processor $j$.
12          end if
13      end for
14      /* Offline stage: lines 15 to 23 */
15      for $t=1,2,\cdots,K_{i}$ do
16          Forward propagation: evaluate the loss function $\Psi_{i}^{k}[g_{i}]$ according to (15).
17          Backward propagation: calculate the gradients of $\Psi_{i}^{k}[g_{i}]$ and minimize $\Psi_{i}^{k}[g_{i}]$ to get the updated $g_{i}^{k}$.
18      end for
19      for $t=1,2,\cdots,K_{i}$ do
20          Forward propagation: evaluate the loss function $L_{i}^{k}[f_{i}]$ according to (16).
21          Backward propagation: calculate the gradients of $L_{i}^{k}[f_{i}]$ and minimize $L_{i}^{k}[f_{i}]$ to get the updated $f_{i}^{k}$.
22      end for
23      Let $w_{i}^{k}=g_{i}^{k}+\ell_{i}f_{i}^{k}$.
24  end for
Algorithm 1: Computing the $i$-th local approximate solution in PFNN-2.
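A serial Python sketch of one outer iteration of Algorithm 1 is given below. In PFNN-2 the sub-domains run on separate processors and exchange interface values by message passing (lines 4-13); here the exchange is emulated by in-memory lookups, and the callables `train_g` and `train_f` stand for the two offline training stages that minimize (15) and (16).

```python
def outer_iteration(sub_ids, w_prev, interface_pts, train_g, train_f, ell):
    """w_prev[i]        : callable previous local solution w_i^{k-1}
       interface_pts[i] : list of (j, points) pairs, the samples on Gamma_i inside Omega_j
       ell[i]           : length factor function of sub-domain i"""
    # online stage: gather w^{k-1} on the artificial interfaces from the neighbours
    interface_data = {i: [(pts, w_prev[j](pts)) for (j, pts) in interface_pts[i]]
                      for i in sub_ids}
    # offline stage: the sub-problems are now independent and can run in parallel
    w_new = {}
    for i in sub_ids:
        g_i = train_g(i, interface_data[i])     # stage 1: minimize (15)
        f_i = train_f(i, g_i)                   # stage 2: minimize (16)
        w_new[i] = lambda xt, g=g_i, f=f_i, li=ell[i]: g(xt) + li(xt) * f(xt)
    return w_new
```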

4 Discussion and comparison with other related methods

In this section, we review several state-of-the-art neural network methods for solving PDEs and compare them with PFNN-2. For simplicity we consider a model boundary value problem

\left\{\begin{array}{r l}-\Delta u=r_{\Omega}, & \mbox{in}\ \Omega,\\ u=r_{\Gamma}, & \mbox{on}\ \Gamma:=\partial\Omega.\end{array}\right. \qquad (17)

The most commonly applied neural network algorithms are based on the idea of least squares; examples can be found in e.g., Refs. [14, 15, 17, 19, 2, 63]. In a least-squares based method, an approximate solution of (17) is found by minimizing the loss function in the least-squares form as follows:

L[w]:=\dfrac{1}{\#S(\Omega)}\sum\limits_{\bm{x}\in S(\Omega)}\big(-\Delta w(\bm{x})-r_{\Omega}(\bm{x})\big)^{2}+\dfrac{\beta}{\#S(\Gamma)}\sum\limits_{\bm{x}\in S(\Gamma)}\big(w(\bm{x})-r_{\Gamma}(\bm{x})\big)^{2}, \qquad (18)

where $\beta>0$ is a penalty coefficient. Despite being straightforward to implement, this approach may encounter issues of low convergence speed and poor accuracy, primarily because the complicated loss function requires simultaneously approximating the high-order derivatives of the solution and the additional initial and boundary conditions, which imposes extreme difficulty on the optimization. In addition, the training of the highly nonlinear neural network may also demand a large computational cost and lead to low computational efficiency.
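For reference, a minimal sketch of the least-squares loss (18) is given below, assuming PyTorch; `w` is a network mapping $\bm{x}$ to a scalar, the sample tensors are placeholders, and the Laplacian is formed by automatic differentiation, which is exactly the high-order-derivative evaluation that the weak form loss of PFNN-2 avoids.

```python
import torch

def least_squares_loss(w, x_int, r_omega, x_bnd, r_gamma, beta):
    # interior term of (18): PDE residual -Delta w - r_Omega at the interior samples
    x = x_int.clone().requires_grad_(True)
    u = w(x)
    grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = sum(torch.autograd.grad(grad[:, j].sum(), x, create_graph=True)[0][:, j:j + 1]
              for j in range(x.shape[1]))
    interior = ((-lap - r_omega) ** 2).mean()
    # boundary term of (18): penalized mismatch of the Dirichlet data, weighted by beta
    boundary = ((w(x_bnd) - r_gamma) ** 2).mean()
    return interior + beta * boundary
```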

4.1 Accuracy improvement

Several neural network methods were proposed to enhance the accuracy by lowering the smoothness requirement so that high-order differential operators involved in the loss function are eliminated. These methods can be roughly classified into two categories, involving weak forms based on the Ritz and Galerkin variational principles, respectively.

In approaches based on the Ritz variational principle such as the ones presented in Refs. [21, 30, 64, 65], the original problem in the strong form (17) is transformed to an energy minimization problem

u=\arg\min_{w\in\mathcal{H}}\int_{\Omega}\left(\dfrac{1}{2}|\nabla w|^{2}-r_{\Omega}w\right)d\bm{x}, \qquad (19)

where $\mathcal{H}$ is a certain hypothesis space built by neural networks. In this way, high-order derivatives are avoided in the loss function. However, the scope of application of these methods is usually relatively narrow, confined to self-adjoint elliptic problems.
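For comparison with the sketch of (18) above, a Ritz-type loss (19) only needs first derivatives; a hedged PyTorch sketch is given below, where `w` maps $\bm{x}$ to a scalar and `x_int`, `r_omega` are interior Monte Carlo samples (placeholders).

```python
import torch

def ritz_loss(w, x_int, r_omega, vol_omega):
    # Deep Ritz style energy (19): 0.5*|grad w|^2 - r_Omega * w, averaged over samples
    x = x_int.clone().requires_grad_(True)
    u = w(x)
    grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    energy_density = 0.5 * (grad ** 2).sum(dim=1, keepdim=True) - r_omega * u
    return vol_omega * energy_density.mean()
```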

In methods based on the Galerkin variational principle, an approximate solution is found within a hypothesis space \mathcal{H} satisfying

\int_{\Omega}\big(\nabla u\cdot\nabla v-r_{\Omega}v\big)d\bm{x}=0,\quad\forall v\in\mathcal{V}, \qquad (20)

where $\mathcal{V}$ is a certain test function space. Although having wider applicability than Ritz-type approaches, existing Galerkin-based neural network methods usually lack flexibility and efficiency. For example, orthogonal polynomials can be employed as the test functions [32, 33], but the resulting methods are only applicable to simple geometries for which orthogonal polynomials are known. It is also possible to take compactly supported piecewise linear functions to construct the test space with low cost [31], which however requires a grid to arrange the locations of the test functions and is again hard to adapt to complex geometries. In some recent works [34, 66], an adversarial network was introduced to evaluate the maximum weighted loss in an adaptive manner, but the method is often hard to converge because of the difficulty in balancing the efforts devoted to training the generative network for the approximate solution and the adversarial network for the test function.

Apart from relaxing the smoothness constraints, there is another family of neural network methods that aim to improve the accuracy by removing the boundary constraints [15, 17, 18, 35, 36]. The basic idea is to establish the approximate solution in the form of

w:=\mathcal{G}+\mathcal{L}f, \qquad (21)

where $\mathcal{G}$ is manufactured to satisfy the boundary conditions, $f$ is a network for learning the true solution within the domain, and $\mathcal{L}$ is introduced to eliminate the influence of the network $f$ on the boundary. The main disadvantage of these methods is the lack of a universal and flexible strategy to build the functions $\mathcal{G}$ and $\mathcal{L}$. For a specific problem on a particular domain, most of the works [15, 35, 36] rely on a dedicated function in analytical form, which usually cannot be applied to any other case. Although some efforts have been made [17, 18] to introduce spline functions so that the method can be applied to a broader range of computing domains, the flexibility is still very limited.

The proposed PFNN-2 method integrates the advantages of all the aforementioned methods. With the introduction of the weak form loss function, the burdensome task of approximating high-order derivatives is avoided. By separating the task of learning the initial and essential boundary conditions from the other training tasks, the constraints are dealt with in a more effective manner, and the subsequent training process is eased because the penalty terms in the corresponding loss function are eliminated. The structure of the approximate solution, comprised of neural networks and spline functions, together with the compactly supported test functions, offers high flexibility, which extends the applicability of the method to a wide range of PDEs.

4.2 Domain decomposition

In a classical numerical method for solving PDEs, the idea of domain decomposition can be introduced by transforming the solution of the original problem into solving a number of simpler sub-problems defined on a series of sub-domains $\{\Omega_{i}\}_{i=1}^{m}$ ($m\in\mathbb{N}_{+}$). Starting from an appropriate initial guess $u^{0}$, a solution sequence $u^{k}$ ($k=0,1,2,\cdots$) is built to approach the true solution of the original problem. In particular, for problem (17), at the $k$-th iteration, $u^{k}$ is formed by composing a set of local approximate solutions $\{u_{i}^{k}\}_{i=1}^{m}$, which are acquired by solving the sub-domain problems

\left\{\begin{array}{l l}\mathfrak{D}_{i}(u_{i}^{k})=0, & \mbox{in}\ \Omega_{i},\\ \mathfrak{B}_{i}(u_{i}^{k};u^{k-1})=0, & \mbox{on}\ \Gamma_{i}:=\partial\Omega_{i},\end{array}\right.\qquad i=1,2,\cdots,m, \qquad (22)

in parallel, where $\mathfrak{D}_{i}$ and $\mathfrak{B}_{i}$ are certain differential operators, whose forms depend on the specific method employed.

Previously proposed domain decomposition methods for neural network based solution of PDEs can be categorized into two major classes: overlapping and non-overlapping methods. Figure 2 has already illustrated the difference between the two. Among existing works, DeepDDM [54] and D3M [56] are overlapping methods, while cPINNs [57, 59] and XPINNs [58, 59] are non-overlapping ones. Depending on the partition strategies adopted, as well as other characteristics, the differential operators $\mathfrak{D}_{i}$ and $\mathfrak{B}_{i}$ in these methods take various forms, as briefly listed below.

  • DeepDDM:

    \mathfrak{D}_{i}(u_{i}^{k}):=-\Delta u_{i}^{k}-r_{\Omega}\quad\mbox{and}\quad\mathfrak{B}_{i}(u_{i}^{k}):=\left\{\begin{array}{l l}u_{i}^{k}-r_{\Gamma}, & \mbox{on}\ \Gamma_{i}\cap\Gamma,\\ u_{i}^{k}-u^{k-1}, & \mbox{on}\ \Gamma_{i}\backslash\Gamma.\end{array}\right. \qquad (23)
  • D3M:

    \mathfrak{D}_{i}(u_{i}^{k}):=\left[\begin{array}{l}\bm{\tau}_{i}^{k}+\nabla u_{i}^{k}\\ \nabla\cdot\bm{\tau}_{i}^{k}-r_{\Omega}\end{array}\right]\quad\mbox{and}\quad\mathfrak{B}_{i}(u_{i}^{k}):=\left\{\begin{array}{l l}u_{i}^{k}-r_{\Gamma}, & \mbox{on}\ \Gamma_{i}\cap\Gamma,\\ u_{i}^{k}-u^{k-1}, & \mbox{on}\ \Gamma_{i}\backslash\Gamma.\end{array}\right. \qquad (24)

    where $\bm{\tau}_{i}^{k}$ is an additional variable introduced to approximate the gradient of the solution $u_{i}^{k}$.

  • cPINNs:

    \mathfrak{D}_{i}(u_{i}^{k}):=-\Delta u_{i}^{k}-r_{\Omega}\quad\mbox{and}\quad\mathfrak{B}_{i}(u_{i}^{k}):=\left\{\begin{array}{l l}u_{i}^{k}-r_{\Gamma}, & \mbox{on}\ \Gamma_{i}\cap\Gamma,\\ \left[\begin{array}{l}u_{i}^{k}-u^{k-1}\\ \nabla(u_{i}^{k}-u^{k-1})\cdot\bm{n}\end{array}\right], & \mbox{on}\ \Gamma_{i}\backslash\Gamma.\end{array}\right. \qquad (25)
  • XPINNs:

    \mathfrak{D}_{i}(u_{i}^{k}):=-\Delta u_{i}^{k}-r_{\Omega}\quad\mbox{and}\quad\mathfrak{B}_{i}(u_{i}^{k}):=\left\{\begin{array}{l l}u_{i}^{k}-r_{\Gamma}, & \mbox{on}\ \Gamma_{i}\cap\Gamma,\\ \left[\begin{array}{l}u_{i}^{k}-u^{k-1}\\ \Delta u_{i}^{k}-\Delta u^{k-1}\end{array}\right], & \mbox{on}\ \Gamma_{i}\backslash\Gamma.\end{array}\right. \qquad (26)

In general, the aforementioned methods solve the sub-problems (22) by introducing $m$ sub-networks $\{w_{i}\}_{i=1}^{m}$ to approximate the local solutions on $\{\Omega_{i}\}_{i=1}^{m}$, respectively. After being initialized properly to form the initial states $\{w_{i}^{0}\}_{i=1}^{m}$, the $m$ sub-networks are trained in parallel to generate solution sequences $\{w_{i}^{k}\}_{i=1}^{m}$ converging to the $m$ local true solutions, where $k=1,2,\cdots$ is the iteration count, and $w_{i}^{k}$ is acquired by minimizing the loss function

L_{i}^{k}[w_{i}]:=\dfrac{1}{\#S(\Omega_{i})}\sum\limits_{\bm{x}\in S(\Omega_{i})}\big\|\left[\mathfrak{D}_{i}(w_{i})\right](\bm{x})\big\|_{2}^{2}+\dfrac{\beta}{\#S(\Gamma_{i})}\sum\limits_{\bm{x}\in S(\Gamma_{i})}\big\|\left[\mathfrak{B}_{i}(w_{i};w^{k-1})\right](\bm{x})\big\|_{2}^{2}. \qquad (27)

Correspondingly, a solution sequence $w^{k}$ defined on the whole domain is formed by combining all the updated sub-networks $\{w_{i}^{k}\}_{i=1}^{m}$ together.
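To make the interface treatment in (27) concrete, the hedged sketch below evaluates the artificial-interface contributions for the model problem (17), contrasting the value-matching term of a DeepDDM-style operator (23) with the additional normal-flux term of a cPINNs-style operator (25). It assumes PyTorch; `w_i` is the local network, and the interface samples, normals and neighbour data are placeholders.

```python
import torch

def interface_terms(w_i, x_ifc, normals, w_prev_vals, flux_prev_vals):
    # evaluate w_i and its normal derivative on the artificial interface samples
    x = x_ifc.clone().requires_grad_(True)
    u = w_i(x)
    grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    normal_flux = (grad * normals).sum(dim=1, keepdim=True)
    value_term = ((u - w_prev_vals) ** 2).mean()               # DeepDDM-style matching (23)
    flux_term = ((normal_flux - flux_prev_vals) ** 2).mean()   # extra cPINNs-style term (25)
    return value_term, flux_term
```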

In contrast to the existing domain decomposition based neural network methods that utilize loss functions in strong form (27), PFNN-2 employs weak form loss functions to avoid the explicit computation of high order derivatives of the approximate solution, and therefore has a higher computing efficiency. In addition to that, thanks to the penalty-free property, PFNN-2 is able to reduce the difficulty of the network training process and improve the sustained accuracy. As demonstrated in later numerical experiments, all these advantages make PFNN-2 a more suitable candidate for the case of multiple sub-domains, especially when the number of sub-domains is large.

5 Numerical Experiments

In this section, we carry out a series of numerical experiments to investigate the performance of the proposed PFNN-2 method, and compare it with several existing state-of-the-art approaches. Four test cases are utilized to examine the accuracy and efficiency of different numerical methods. In all tests involving neural networks, the ResNet model [67] with sinusoid activation function is utilized for constructing the network, in which each ResNet block is comprised of two fully connected layers and a residual connection. Note that PFNN-2 utilizes two neural networks instead of one. To make a fair comparison between the various methods, we adjust the number of ResNet blocks and the width of the hidden layers so that the total number of degrees of freedom (i.e., parameters including weights and biases for neural network methods, and unknowns for traditional methods) is kept at a similar level. The L-BFGS optimizer [68] is applied for training the neural networks. All the experiments are conducted on a computer equipped with Intel Xeon Gold 6130 CPUs, and all calculations are done in 32-bit floating point arithmetic. The relative $\ell_{2}$-error is taken to measure the accuracy of the approximate solutions. All the results obtained with neural network methods are reported based on ten independent runs.
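For reference, the relative $\ell_{2}$-error reported below can be evaluated on a set of validation points as in the simple NumPy sketch that follows (the choice of the validation points themselves is problem dependent and not specified here).

```python
import numpy as np

def relative_l2_error(u_approx, u_exact):
    # relative l2-error between the approximate and exact solution values on validation points
    u_approx, u_exact = np.asarray(u_approx), np.asarray(u_exact)
    return np.linalg.norm(u_approx - u_exact) / np.linalg.norm(u_exact)
```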

5.1 Anisotropic convection-diffusion equation on an L-shaped domain

The first test case is an anisotropic convection-diffusion equation

-\nabla\cdot\mathcal{A}\nabla u+\mathcal{B}\cdot\nabla u+\mathcal{C}=0 \qquad (28)

defined on an L-shaped domain $[-2,2]^{2}\backslash[0,2]^{2}$, where

\mathcal{A}=\left[\begin{array}{c c}-(x_{1}+x_{2})^{2}+16 & x_{1}^{2}-x_{2}^{2}\\ -x_{1}^{2}+x_{2}^{2} & -(x_{1}-x_{2})^{2}+16\end{array}\right],\qquad\mathcal{B}=\left[\begin{array}{c}(x_{1}-x_{1})^{3}\\ (x_{1}+x_{2})^{3}\end{array}\right].

We set the solution to be $u(r,\theta)=r(r-1)(r-2)\sin(2\theta/3-\pi/3)$, where $(r,\theta)$ are the polar coordinates, and let $\mathcal{C}$ be derived accordingly. We apply the Neumann boundary condition on $\Gamma_{N}=(\{-2\}\times[-2,2])\cup([-2,2]\times\{-2\})$ and the Dirichlet boundary condition on the rest of the boundary $\Gamma_{D}=\partial\Omega\backslash\Gamma_{N}$. The main difficulty in solving this problem is the singularity near the inner corner of the L-shaped domain.

We run the test with only one processor to examine the performance of various methods without domain decomposition. It should be noted that not all neural network methods based on variational principles can be applied to this problem. In particular, some methods such as Deep Ritz [21] and Deep Nitsche [30] are only suitable for self-adjoint problems, while others such as VPINNs [32] and hp-VPINNs [33] require orthogonal polynomials to form the test spaces, which are hard to build on irregular domains. Therefore, in this test, we only examine the traditional linear finite element method, the least-squares neural network method, a variational neural network method called VarNet [31] (which takes compactly supported piecewise linear functions as test functions) and PFNN-2. For PFNN-2, we sample 1200 points in $\Omega$ and 80 points on $\Gamma_{N}$ as the centers to form the test functions according to the strategy specified in Section 2, and sample 160 points on $\Gamma_{D}$. For the other neural network methods, training sets of similar size are adopted. The maximum number of iterations for training the neural networks is set to 6000 epochs.

Table 1: Accuracy of various methods for solving the anisotropic convection-diffusion equation on an L-shaped domain.

Method         | Network                | DOFs | Relative error
finite element | -                      | 1242 | 2.32e-03
least-squares  | 3 blocks (width: 15)   | 1261 | β=10: 1.38e-02±6.12e-03;  β=100: 6.64e-03±3.02e-03;  β=1000: 1.06e-02±4.93e-03
VarNet         | 3 blocks (width: 15)   | 1261 | β=10: 1.56e-02±3.43e-03;  β=100: 6.71e-03±1.38e-03;  β=1000: 6.82e-03±1.98e-03
PFNN-2         | 2×3 blocks (width: 10) | 1182 | 9.18e-04±3.37e-04
[Figure 3: Error distribution of the approximate solutions obtained with various methods for solving the anisotropic convection-diffusion equation on an L-shaped domain. Panels: (a) finite element, (b) least-squares (β=100), (c) VarNet (β=100), (d) PFNN-2.]

The relative errors of the approximate solutions obtained by the tested algorithms are reported in Table 1, in which the influence of the penalty factor on the least-squares neural network method and VarNet is also investigated. It can be seen from the table that the two penalty-based methods are inferior to the classical finite element method in terms of accuracy for this test case, and their performance is seriously affected by the choice of penalty factor. In contrast, PFNN-2 is free of any penalty term and is not subject to such influence, and thus outperforms all the other methods in terms of sustained accuracy. Further, we draw the distribution of the solution errors of the tested algorithms in Figure 3, from which we observe that significant errors occur near the corner of the L-shaped domain when the finite element method and the penalty-based neural network approaches are used, indicating an ineffective treatment of the corner singularity. In addition, the two penalty-based neural network methods also exhibit considerable errors near the essential boundary. PFNN-2, on the other hand, is free of such singularity and boundary effects, and sustains the most accurate result among the tested methods, with the error uniformly distributed both within the domain and on the boundary.

5.2 Allen-Cahn equation on a periodic domain

The next test case is the well-known Allen-Cahn equation [69] for phase field simulation:

\dfrac{\partial u}{\partial t}-\epsilon^{2}\Delta u+u^{3}-u=0, \qquad (29)

defined on a periodic domain $[0,2\pi]^{2}$. We set the width of the transition layer to be $\epsilon=0.2$ and set the initial condition to be

u(\bm{x},0)=\tanh\dfrac{\sqrt{(x_{1}-\pi)^{2}+(x_{2}-\pi)^{2}}-2}{\epsilon\sqrt{2}}.

For this time-dependent problem, we conduct the simulation from $t=0$ to $t=10$.

The true solution of the above problem has sharp gradients on the transition layer. For dealing with this problem, domain decomposition based methods tend to be effective, since each network only needs to learn one part of the solution. To examine the effectiveness of the domain decomposition strategies of various neural network methods, including cPINNs, XPINNs, DeepDDM, D3M and PFNN-2, four groups of experiments are conducted, in which the computational domain is decomposed into 1×1, 3×3, 5×5 and 7×7 sub-domains, respectively. Configurations of the network structures for the various tested methods are listed in Table 2, where we try to maintain a similar amount of learnable parameters for each group of the test. We sample 16000 points on the whole computational domain to form the test function sets. The numbers of outer and inner iterations are set to 150 and 40, respectively, resulting in a total of 6000 iterations. A numerical solution obtained by the Fourier spectral method with 1024 modes and a second-order explicit Runge-Kutta temporal integrator with time step $\Delta t=0.001$ is taken as the ground truth for evaluating the solution error. Different penalty coefficients, including 1, 10, 100 and 1000, are examined for the penalty-based methods, and the value with the optimal performance is picked out for further analysis.

Table 2: Network structures of various methods for solving the Allen-Cahn equation.

m  | cPINNs/XPINNs/DeepDDM              | D3M                                | PFNN-2
1  | 1×4 blocks (width: 32), 7553 DOFs  | 1×4 blocks (width: 32), 7619 DOFs  | 1×2×4 blocks (width: 22), 7306 DOFs
9  | 9×2 blocks (width: 15), 7164 DOFs  | 9×2 blocks (width: 15), 7452 DOFs  | 9×2×2 blocks (width: 10), 6858 DOFs
25 | 25×1 blocks (width: 14), 7025 DOFs | 25×1 blocks (width: 14), 7775 DOFs | 25×2×1 blocks (width: 9), 6800 DOFs
49 | 49×1 blocks (width: 10), 7889 DOFs | 49×1 blocks (width: 10), 8967 DOFs | 49×2×1 blocks (width: 6), 7154 DOFs
[Figure 4: Relative errors of various domain decomposition based neural network methods for solving the Allen-Cahn equation.]

Figure 4 provides an intuitive display of the numerical accuracy of the different tested methods, in which box-plots are drawn to describe the five-number summary, including the minimum, first quartile, median, third quartile and maximum, for each set of tests. It can be seen that not all domain decomposition strategies can improve the accuracy. As the number of sub-domains increases, the achievable accuracy of most of the existing domain decomposition based neural network methods except DeepDDM gets worse, which reveals their drawbacks in handling multiple sub-problems with constraints. Although DeepDDM can maintain a relatively satisfactory accuracy, PFNN-2 performs even better, primarily due to its capability of dealing with the smoothness constraints and boundary conditions.

5.3 Nonlinear anisotropic convection-diffusion equation on an irregular domain

In this test case, we consider a nonlinear anisotropic convection-diffusion equation

\dfrac{\partial u}{\partial t}-\nabla\cdot(\mathcal{A}\nabla u-\mathcal{B})+\mathcal{C}=0,\quad t\in[0,1], \qquad (30)

defined on an irregular domain with the shape of Antarctica, as demonstrated in Table 3. We set the exact solution to be $u=\cos(\pi x_{1})\cos(\pi x_{2})\exp(-\pi t(x_{1}+x_{2}))$, which satisfies the Dirichlet boundary condition, and set the coefficients to be

\mathcal{A}=\left[\begin{array}{c c}u^{2}+1 & 2u^{2}\\ -2u^{2} & u^{2}+1\end{array}\right],\quad\mathcal{B}=\left[\begin{array}{c}u^{2}+2u\\ -2u^{2}+u\end{array}\right],

and $\mathcal{C}$ is derived according to (30).

Table 3: Network structures of various neural network methods for solving the nonlinear anisotropic convection-diffusion equation on an irregular domain with domain decomposition.

m                           | 1                                   | 14                                  | 26                                  | 42
Network (cPINNs/DeepDDM)    | 1×4 blocks (width: 30), 6661 DOFs   | 14×2 blocks (width: 11), 6328 DOFs  | 26×1 blocks (width: 13), 6448 DOFs  | 42×1 blocks (width: 10), 6762 DOFs
Network (PFNN-2)            | 1×2×4 blocks (width: 20), 6082 DOFs | 14×2×2 blocks (width: 7), 5712 DOFs | 26×2×1 blocks (width: 8), 5876 DOFs | 42×2×1 blocks (width: 6), 6132 DOFs
Decomposition of the domain | [image]                             | [image]                             | [image]                             | [image]

To investigate the effectiveness of the domain decomposition strategies of different neural network methods, we examine the performance of PFNN-2 and compare it with cPINNs and DeepDDM, which are selected as representative non-overlapping and overlapping domain decomposition neural network methods, respectively. We conduct four groups of experiments, in which the computational domain is decomposed into 1, 14, 26 and 42 sub-domains, respectively. In each group of experiments, we set the network structure of each method to have a similar amount of learnable parameters, as illustrated in Table 3. A total of 6000 points are sampled on the whole computational domain to form the test function set. Other experiment configurations are the same as in the previous experiment.

[Figure 5: Error evolution histories of various methods during the training process in solving the nonlinear anisotropic convection-diffusion equation on an Antarctica-shaped domain. Panels: (a) cPINNs and DeepDDM, (b) PFNN-2.]

Figure 5 demonstrates the error evolution histories of the various methods. It can be seen from the figure that it is hard for cPINNs and DeepDDM to converge to the true solution in the cases where multiple sub-domains are employed, which exposes their defects in tackling problems with strong nonlinearity. By comparison, thanks to the avoidance of penalty terms, the reduction of the smoothness constraints and the flexible treatment of the complex geometry, PFNN-2 is more suitable for solving this problem and achieves more accurate and stable results.

5.4 Viscous Burgers equation on a 3D cube

The last test case is a nonlinear equation system – the viscous Burgers equation

\dfrac{\partial\bm{u}}{\partial t}-\dfrac{1}{\mbox{Re}}\Delta\bm{u}+\bm{u}\cdot\nabla\bm{u}=\bm{0}, \qquad (31)

defined on the domain $[0,1]^{3}$, with the Dirichlet boundary condition imposed on $\Gamma_{D}=[0,1]^{2}\times\{0,1\}$ and the Neumann boundary condition imposed on $\Gamma_{N}=\Gamma\backslash\Gamma_{D}$, where the Reynolds number is $\mbox{Re}=100$. We set the true solution to be

\bm{u}=\left[\begin{array}{l}u_{1}(\bm{x},t)\\ u_{2}(\bm{x},t)\\ u_{3}(\bm{x},t)\end{array}\right]=\left[\begin{array}{l}-\dfrac{2}{\mbox{Re}}\left(\dfrac{1+\pi\cos(\pi x_{1})\sin(\pi x_{2})\sin(\pi x_{3})\exp(-3\pi^{2}t/\mbox{Re})}{1+x_{1}+\sin(\pi x_{1})\sin(\pi x_{2})\sin(\pi x_{3})\exp(-3\pi^{2}t/\mbox{Re})}\right)\\ -\dfrac{2}{\mbox{Re}}\left(\dfrac{\pi\sin(\pi x_{1})\cos(\pi x_{2})\sin(\pi x_{3})\exp(-3\pi^{2}t/\mbox{Re})}{1+x_{1}+\sin(\pi x_{1})\sin(\pi x_{2})\sin(\pi x_{3})\exp(-3\pi^{2}t/\mbox{Re})}\right)\\ -\dfrac{2}{\mbox{Re}}\left(\dfrac{\pi\sin(\pi x_{1})\sin(\pi x_{2})\cos(\pi x_{3})\exp(-3\pi^{2}t/\mbox{Re})}{1+x_{1}+\sin(\pi x_{1})\sin(\pi x_{2})\sin(\pi x_{3})\exp(-3\pi^{2}t/\mbox{Re})}\right)\end{array}\right],

and conduct the simulation from $t=0$ to $t=1.0$.

In this test case, we examine both the accuracy and the efficiency of various methods, including cPINNs, DeepDDM and PFNN-2. Firstly, the weak scaling efficiency of the tested methods is investigated. We conduct four sets of experiments, in which the computational domain is decomposed into 1, $2\times 2\times 2$, $4\times 2\times 2$ and $4\times 4\times 2$ sub-domains, respectively. We keep the scale of the task assigned to each processor similar. Specifically, we sample 5120 points on each sub-domain to form the test function set, and adopt networks of similar scale to approximate the local solution on each sub-domain, as illustrated in Table 4. Other experiment configurations are the same as in the previous experiment. The accuracy of the various methods is reported in Table 4. Not surprisingly, the accuracy of all the methods improves as the number of sub-domains increases, since the representative capability of the approximate solution also grows stronger. In particular, PFNN-2 acquires the most accurate approximate solutions in all the test sets. Moreover, Figure 6 shows the runtime for 400 iterations, which indicates that all the tested algorithms can maintain a relatively stable efficiency. In all the experiment groups, PFNN-2 always achieves the highest efficiency.

Table 4: Accuracy of various domain decomposition based neural network methods for solving the viscous Burgers equations in the weak scaling efficiency test.

Method  | Network structure         | DOFs   | Errors (u1)       | Errors (u2)       | Errors (u3)
cPINNs  | 1×2 blocks (width: 15)    | 1×828  | 7.31e-02±2.57e-02 | 1.31e-01±3.64e-02 | 1.34e-01±4.08e-02
cPINNs  | 8×2 blocks (width: 15)    | 8×828  | 3.09e-02±8.38e-03 | 4.54e-02±1.33e-02 | 5.39e-02±1.63e-02
cPINNs  | 16×2 blocks (width: 15)   | 16×828 | 2.64e-02±7.40e-03 | 3.92e-02±1.02e-02 | 4.58e-02±1.36e-02
cPINNs  | 32×2 blocks (width: 15)   | 32×828 | 2.33e-02±6.92e-03 | 3.31e-02±8.19e-03 | 3.72e-02±8.76e-03
DeepDDM | 1×2 blocks (width: 15)    | 1×828  | 7.31e-02±2.57e-02 | 1.31e-01±3.64e-02 | 1.34e-01±4.08e-02
DeepDDM | 8×2 blocks (width: 15)    | 8×828  | 3.38e-02±1.15e-02 | 5.55e-02±1.58e-02 | 6.45e-02±2.12e-02
DeepDDM | 16×2 blocks (width: 15)   | 16×828 | 2.46e-02±6.74e-03 | 3.68e-02±1.13e-02 | 4.74e-02±1.48e-02
DeepDDM | 32×2 blocks (width: 15)   | 32×828 | 1.64e-02±5.41e-03 | 2.38e-02±7.20e-03 | 2.95e-02±7.87e-03
PFNN-2  | 1×2×2 blocks (width: 10)  | 1×806  | 5.06e-02±1.36e-02 | 8.02e-02±2.14e-02 | 7.67e-02±2.05e-02
PFNN-2  | 8×2×2 blocks (width: 10)  | 8×806  | 2.23e-02±4.94e-03 | 2.76e-02±5.46e-03 | 3.22e-02±6.44e-03
PFNN-2  | 16×2×2 blocks (width: 10) | 16×806 | 9.54e-03±2.27e-03 | 1.43e-02±3.69e-03 | 1.51e-02±3.83e-03
PFNN-2  | 32×2×2 blocks (width: 10) | 32×806 | 6.72e-03±1.80e-03 | 9.72e-03±2.58e-03 | 9.81e-03±2.28e-03
[Figure 6: Efficiency of various domain decomposition based neural network methods in solving the viscous Burgers equations in the weak scaling efficiency test.]

Next, the strong scaling efficiency of the tested approaches is examined. Four sets of experiments are carried out, in which the computational domain is split into $1^{3}$, $2^{3}$, $3^{3}$ and $4^{3}$ sub-domains, respectively. In this test, the overall scale of the problem is kept similar across the different test groups. To be specific, we sample 160,000 points on the whole computational domain to form the test function set, and keep the amounts of learnable parameters of the different approaches in each group within a similar range, as shown in Table 5. Other experiment configurations are the same as in the previous experiment. Table 5 also reports the solution errors of the various tested algorithms, which once more demonstrates the advantage of PFNN-2 with respect to numerical accuracy. To further compare the performance in terms of computing time, we draw the runtime cost for 400 iterations in Figure 7, from which it can be seen that, as the number of processors increases, the time-to-solution of all the methods is reduced significantly. Among all the tested algorithms, PFNN-2 is always the most efficient.

Table 5: Accuracy of various domain decomposition based neural network methods for solving the viscous Burgers equations in the strong scaling efficiency test.

Method  | Network structure         | DOFs  | Errors (u1)       | Errors (u2)       | Errors (u3)
cPINNs  | 1×5 blocks (width: 72)    | 47883 | 6.51e-02±2.63e-02 | 1.21e-01±3.77e-02 | 1.25e-01±3.85e-02
cPINNs  | 8×3 blocks (width: 33)    | 47016 | 3.36e-02±8.34e-03 | 4.78e-02±1.19e-02 | 5.39e-02±1.37e-02
cPINNs  | 27×2 blocks (width: 22)   | 45819 | 2.89e-02±6.73e-03 | 3.77e-02±8.84e-03 | 4.05e-02±1.05e-02
cPINNs  | 64×2 blocks (width: 14)   | 47680 | 2.70e-02±6.48e-03 | 4.03e-02±9.32e-03 | 3.92e-02±8.39e-03
DeepDDM | 1×5 blocks (width: 72)    | 47883 | 6.51e-02±2.63e-02 | 1.21e-01±3.77e-02 | 1.25e-01±3.85e-02
DeepDDM | 8×3 blocks (width: 33)    | 47016 | 3.81e-02±7.81e-03 | 4.76e-02±1.25e-02 | 6.13e-02±1.42e-02
DeepDDM | 27×2 blocks (width: 22)   | 45819 | 1.60e-02±4.25e-03 | 2.31e-02±5.30e-03 | 2.58e-02±5.22e-03
DeepDDM | 64×2 blocks (width: 14)   | 47680 | 1.21e-02±3.26e-03 | 1.75e-02±4.82e-03 | 1.72e-02±3.87e-03
PFNN-2  | 1×2×5 blocks (width: 50)  | 46706 | 4.59e-02±1.30e-02 | 6.73e-02±1.46e-02 | 5.92e-02±1.32e-02
PFNN-2  | 8×2×3 blocks (width: 23)  | 47152 | 1.15e-02±3.26e-03 | 1.72e-02±4.56e-03 | 1.67e-02±4.30e-03
PFNN-2  | 27×2×2 blocks (width: 15) | 45522 | 6.49e-03±1.94e-03 | 9.37e-03±2.86e-03 | 9.22e-03±1.77e-03
PFNN-2  | 64×2×1 blocks (width: 15) | 46464 | 5.12e-03±1.63e-03 | 7.84e-03±2.03e-03 | 7.68e-03±1.96e-03
Figure 7: Efficiency of the various domain decomposition based neural network methods for solving the viscous Burgers equations on a 3D cube in the strong scaling efficiency test.

6 Conclusion

In this paper, a new penalty-free neural network method, PFNN-2, is presented for solving PDEs, including non-self-adjoint time-dependent ones. PFNN-2 employs a Galerkin variational principle to convert the original equation into a weak form with low smoothness requirements, splits the problem with complex initial-boundary constraints into two simpler unconstrained ones, and adopts two neural networks to approximate their respective solutions, which are then combined to form the approximate solution on the whole domain. A domain decomposition strategy is introduced in PFNN-2 to accelerate the training while maintaining the numerical accuracy. Experiment results demonstrate that PFNN-2 can surpass existing state-of-the-art neural network methods in terms of accuracy, efficiency and scalability. Possible future work on PFNN-2 includes improving the network structure, investigating error estimates, and tackling more challenging problems such as those with strong singularities.
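For reference, a minimal sketch of the penalty-free two-network construction summarized above is given below (assuming, for illustration only, one spatial dimension plus time with essential conditions imposed at t = 0 and x = 0; the network sizes and the particular length-factor function are placeholders, not the configuration used in the experiments): one network learns the initial-boundary data, the other is trained on the unconstrained weak form, and their combination satisfies the essential constraints by construction, so no penalty term enters the loss.

```python
# Two-network, penalty-free ansatz: u = g + l * f, where g fits the essential
# initial-boundary data and the length factor l vanishes on those portions of
# the space-time boundary, so u meets the constraints exactly.
import torch
import torch.nn as nn

def mlp(d_in, width, depth, d_out):
    layers, d = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.Tanh()]
        d = width
    layers.append(nn.Linear(d, d_out))
    return nn.Sequential(*layers)

g_net = mlp(2, 10, 2, 1)   # trained first to match the initial-boundary data
f_net = mlp(2, 10, 2, 1)   # trained afterwards on the unconstrained weak form

def length_factor(xt):
    # Vanishes exactly where the essential conditions are imposed
    # (x = 0 and t = 0 in this toy setting) and is positive elsewhere.
    x, t = xt[:, :1], xt[:, 1:]
    return x * t

def u(xt):
    return g_net(xt) + length_factor(xt) * f_net(xt)

xt = torch.rand(5, 2)       # columns: spatial coordinate x and time t
print(u(xt).shape)          # torch.Size([5, 1])
```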

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (grant no. 12131002).

References

  • [1] H. Sheng, C. Yang, PFNN: A penalty-free neural network method for solving a class of second-order boundary-value problems on complex geometries, Journal of Computational Physics, 428 (2021) 110085.
  • [2] M. Raissi, A. Yazdani, G. E. Karniadakis, Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations, Science, 367 (2020) 1026-1030.
  • [3] X. Jin, S. Cai, H. Li, G. E. Karniadakis, NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations, Journal of Computational Physics, 426 (2021) 109951.
  • [4] O. Hennigh, S. Narasimhan, M. A. Nabian, A. Subramaniam, K. Tangsali, Z. Fang, M. Rietmann, W. Byeon, S. Choudhry, NVIDIA SimNet™: An AI-accelerated multi-physics simulation framework, in: International Conference on Computational Science, Springer, 2021, pp. 447-461.
  • [5] J. Han, A. Jentzen, W. E, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences, 115 (2018) 8505-8510.
  • [6] C. Beck, W. E, A. Jentzen, Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations, Journal of Nonlinear Science, 29 (2019) 1563-1619.
  • [7] D. Pfau, J. S. Spencer, A. G. Matthews, W. M. C. Foulkes, Ab initio solution of the many-electron Schrödinger equation with deep neural networks, Physical Review Research, 2 (2020) 033429.
  • [8] T. M. Razakh, B. Wang, S. Jackson, R. K. Kalia, A. Nakano, K.-i. Nomura, P. Vashishta, PND: Physics-informed neural-network software for molecular dynamics applications, SoftwareX, 15 (2021) 100789.
  • [9] K. Shukla, P. C. Di Leoni, J. Blackshire, D. Sparkman, G. E. Karniadakis, Physics-informed neural network for ultrasound nondestructive quantification of surface breaking cracks, Journal of Nondestructive Evaluation, 39 (2020) 1-20.
  • [10] D. Li, K. Xu, J. M. Harris, E. Darve, Coupled time-lapse full-waveform inversion for subsurface flow problems using intrusive automatic differentiation, Water Resources Research, 56 (2020) e2019WR027032.
  • [11] W. Zhu, K. Xu, E. Darve, G. C. Beroza, A general approach to seismic inversion with automatic differentiation, Computers & Geosciences, 151 (2021) 104751.
  • [12] H. Lee, I. S. Kang, Neural algorithm for solving differential equations, Journal of Computational Physics, 91 (1990) 110-131.
  • [13] A. J. Meade Jr, A. A. Fernandez, The numerical solution of linear ordinary differential equations by feedforward neural networks, Mathematical and Computer Modelling, 19 (1994) 1-25.
  • [14] B. P. van Milligen, V. Tribaldos, J. Jiménez, Neural network differential equation and plasma equilibrium solver, Physical Review Letters, 75 (1995) 3594.
  • [15] I. E. Lagaris, A. Likas, D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Transactions on Neural Networks, 9 (1998) 987-1000.
  • [16] I. E. Lagaris, A. C. Likas, D. G. Papageorgiou, Neural-network methods for boundary value problems with irregular boundaries, IEEE Transactions on Neural Networks, 11 (2000) 1041-1049.
  • [17] K. S. McFall, J. R. Mahan, Artificial neural network method for solution of boundary value problems with exact satisfaction of arbitrary boundary conditions, IEEE Transactions on Neural Networks, 20 (2009) 1221-1233.
  • [18] K. S. McFall, Automated design parameter selection for neural networks solving coupled partial differential equations with discontinuities, Journal of the Franklin Institute, 350 (2013) 300-317.
  • [19] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019) 686-707.
  • [20] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Physics-informed machine learning, Nature Reviews Physics, (2021) 1-19.
  • [21] W. E, B. Yu, The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics, 6 (2018) 1-12.
  • [22] Z. Long, Y. Lu, B. Dong, PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network, Journal of Computational Physics, 399 (2019) 108925.
  • [23] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier neural operator for parametric partial differential equations, arXiv preprint, arXiv:2010.08895 (2020).
  • [24] Y. Chen, B. Dong, J. Xu, Meta-mgnet: Meta multigrid networks for solving parameterized partial differential equations, arXiv preprint, arXiv:2010.14088 (2020).
  • [25] L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence, 3 (2021) 218-229.
  • [26] J. He, L. Li, J. Xu, C. Zheng, ReLU deep neural networks and linear finite elements, Journal of Computational Mathematics (2019).
  • [27] W. E, Machine learning and computational mathematics, arXiv preprint, arXiv:2009.14596 (2020).
  • [28] J. A. Opschoor, P. C. Petersen, C. Schwab, Deep ReLU networks and high-order finite element methods, Analysis and Applications, 18 (2020) 715-770.
  • [29] G. Kutyniok, P. Petersen, M. Raslan, R. Schneider, A theoretical analysis of deep neural networks and parametric PDEs, Constructive Approximation, (2021) 1-53.
  • [30] P. Ming, Y. Liao, Deep Nitsche method: Deep Ritz method with essential boundary conditions, Communications in Computational Physics, 29 (2021) 1365-1384.
  • [31] R. Khodayi-Mehr, M. Zavlanos, VarNet: Variational neural networks for the solution of partial differential equations, in: Learning for Dynamics and Control, 2020, pp. 298-307.
  • [32] E. Kharazmi, Z. Zhang, G. E. Karniadakis, Variational physics-informed neural networks for solving partial differential equations, arXiv preprint, arXiv:1912.00873 (2019).
  • [33] E. Kharazmi, Z. Zhang, G. E. Karniadakis, hp-VPINNs: Variational physics-informed neural networks with domain decomposition, arXiv preprint, arXiv:2003.05385 (2020).
  • [34] Y. Zang, G. Bao, X. Ye, H. Zhou, Weak adversarial networks for high-dimensional partial differential equations, Journal of Computational Physics (2020) 109409.
  • [35] Z. Liu, Y. Yang, Q.-D. Cai, Solving differential equation with constrained multilayer feedforward network, arXiv preprint, arXiv:1904.06619 (2019).
  • [36] L. Lyu, K. Wu, R. Du, J. Chen, Enforcing exact boundary and initial conditions in the deep mixed residual method, arXiv preprint, arXiv:2008.01491 (2020).
  • [37] S. Wang, Y. Teng, P. Perdikaris, Understanding and mitigating gradient pathologies in physics-informed neural networks, arXiv preprint, arXiv:2001.04536 (2020).
  • [38] A. D. Jagtap, K. Kawaguchi, G. E. Karniadakis, Adaptive activation functions accelerate convergence in deep and physics-informed neural networks, Journal of Computational Physics, 404 (2020) 109136.
  • [39] H. Gao, L. Sun, J.-X. Wang, PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain, Journal of Computational Physics, 428 (2021) 110079.
  • [40] H. Gao, L. Sun, J.-X. Wang, Super-resolution and denoising of fluid flow using physics-informed convolutional neural networks without high-resolution labels, Physics of Fluids, 33 (2021) 073603.
  • [41] M. A. Nabian, R. J. Gladstone, H. Meidani, Efficient training of physics-informed neural networks via importance sampling, Computer-Aided Civil and Infrastructure Engineering, (2021).
  • [42] C. L. Wight, J. Zhao, Solving Allen-Cahn and Cahn-Hilliard equations using the adaptive physics informed neural networks, arXiv preprint, arXiv:2007.04542 (2020).
  • [43] J. C. Strikwerda, Finite difference schemes and partial differential equations, SIAM, 2004.
  • [44] K.-J. Bathe, Finite element procedures, Klaus-Jurgen Bathe, 2006.
  • [45] R. Eymard, T. Gallouët, R. Herbin, Finite volume methods, Handbook of numerical analysis, 7 (2000) 713-1018.
  • [46] L. D. McClenny, M. A. Haile, U. M. Braga-Neto, TensorDiffEq: Scalable multi-GPU forward and inverse solvers for physics informed neural networks, arXiv preprint, arXiv:2103.16034 (2021).
  • [47] T. Ben-Nun, T. Hoefler, Demystifying parallel and distributed deep learning: An in-depth concurrency analysis, ACM Computing Surveys (CSUR), 52 (2019) 1-43.
  • [48] A. Farkas, G. Kertész, R. Lovas, Parallel and distributed training of deep neural networks: A brief overview, in: 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES), IEEE, 2020, pp. 165-170.
  • [49] K. Chang, N. Balachandar, C. Lam, D. Yi, J. Brown, A. Beers, B. Rosen, D. L. Rubin, J. Kalpathy-Cramer, Distributed deep learning networks among institutions for medical imaging, Journal of the American Medical Informatics Association, 25 (2018) 945-954.
  • [50] J. Chen, K. Li, Q. Deng, K. Li, S. Y. Philip, Distributed deep learning model for intelligent video surveillance systems with edge computing, IEEE Transactions on Industrial Informatics, (2019).
  • [51] W. Zhang, X. Cui, U. Finkler, B. Kingsbury, G. Saon, D. Kung, M. Picheny, Distributed deep learning strategies for automatic speech recognition, in: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 5706-5710.
  • [52] H. Wu, Z. Zhang, C. Guan, K. Wolter, M. Xu, Collaborate edge and cloud computing with distributed deep learning for smart city internet of things, IEEE Internet of Things Journal, 7 (2020) 8099-8110.
  • [53] B. F. Smith, P. E. Bjørstad, W. D. Gropp, Domain decomposition: Parallel multilevel methods for elliptic partial differential equations, 1996.
  • [54] W. Li, X. Xiang, Y. Xu, Deep domain decomposition method: Elliptic problems, in: Mathematical and Scientific Machine Learning, PMLR, 2020, pp. 269-286.
  • [55] V. Mercier, S. Gratton, P. Boudier, A coarse space acceleration of Deep-DDM, arXiv preprint, arXiv:2112.03732 (2021).
  • [56] K. Li, K. Tang, T. Wu, Q. Liao, D3M: A deep domain decomposition method for partial differential equations, IEEE Access (2019).
  • [57] A. D. Jagtap, E. Kharazmi, G. E. Karniadakis, Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems, Computer Methods in Applied Mechanics and Engineering, 365 (2020) 113028.
  • [58] A. D. Jagtap, G. E. Karniadakis, Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations, Communications in Computational Physics, 28 (2020) 2002-2041.
  • [59] K. Shukla, A. D. Jagtap, G. E. Karniadakis, Parallel physics-informed neural networks via domain decomposition, arXiv preprint, arXiv:2104.10013 (2021).
  • [60] Z. Hu, A. D. Jagtap, G. E. Karniadakis, K. Kawaguchi, When do extended physics-informed neural networks (XPINNs) improve generalization?, arXiv preprint, arXiv:2109.09444 (2021).
  • [61] A. D. Jagtap, Z. Mao, N. Adams, G. E. Karniadakis, Physics-informed neural networks for inverse problems in supersonic flows, arXiv preprint, arXiv:2202.11821 (2022).
  • [62] T. De Ryck, A. D. Jagtap, S. Mishra, Error estimates for physics informed neural networks approximating the Navier-Stokes equations, arXiv preprint, arXiv:2203.09346 (2022).
  • [63] J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of Computational Physics, 375 (2018) 1339-1364.
  • [64] Z. Wang, Z. Zhang, A mesh-free method for interface problems using the deep learning approach, Journal of Computational Physics, 400 (2020) 108963.
  • [65] J. Huang, H. Wang, T. Zhou, An augmented Lagrangian deep learning method for variational problems with essential boundary conditions, arXiv preprint, arXiv:2106.14348 (2021).
  • [66] G. Bao, X. Ye, Y. Zang, H. Zhou, Numerical solution of inverse problems by weak adversarial networks, arXiv preprint, arXiv:2002.11340 (2020).
  • [67] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
  • [68] J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation, 35 (1980) 773-782.
  • [69] J. M. Church, Z. Guo, P. K. Jimack, A. Madzvamuse, K. Promislow, B. Wetton, S. M. Wise, F. Yang, High accuracy benchmark problems for Allen-Cahn and Cahn-Hilliard dynamics, Communications in Computational Physics, 26 (2019).