
Luis Kaiser: Oden Institute for Computational Engineering and Sciences, University of Texas at Austin; 201 E 24th St, Austin, TX 78712; e-mail: lkaiser@utexas.edu
Richard Tsai: Oden Institute for Computational Engineering and Sciences, University of Texas at Austin; 201 E 24th St, Austin, TX 78712; e-mail: ytsai@math.utexas.edu
Christian Klingenberg: Department of Mathematics, University of Wuerzburg; Emil-Fischer-Straße 40, 97074 Wuerzburg, Germany; e-mail: klingen@mathematik.uni-wuerzburg.de

Efficient Numerical Wave Propagation Enhanced By An End-to-End Deep Learning Model

Luis Kaiser (ORCID: 0009-0006-4576-6650), Richard Tsai (ORCID: 0000-0001-8441-3678), and Christian Klingenberg (ORCID: 0000-0003-2033-8204)
Abstract

In a variety of scientific and engineering domains, the need for high-fidelity and efficient solutions for high-frequency wave propagation holds great significance. Recent advances in wave modeling use sufficiently accurate fine solver outputs to train a neural network that enhances the accuracy of a fast but inaccurate coarse solver. In this paper we build upon the work of Nguyen and Tsai (2023) and present a novel unified system that integrates a numerical solver with a deep learning component into an end-to-end framework. In the proposed setting, we investigate refinements to the network architecture and data generation algorithm. A stable and fast solver further allows the use of Parareal, a parallel-in-time algorithm, to correct high-frequency wave components. Our results show that the cohesive structure improves performance without sacrificing speed, and demonstrate the importance of temporal dynamics, as well as Parareal, for accurate wave propagation.

1 Introduction

Wave propagation computations form the forward part of a numerical method for solving the inverse problem of geophysical inversion. This involves solving the wave equation with highly varying sound speed many times, as efficiently as possible. For instance, by accurately analyzing the reflections and transmissions generated by complex media discontinuities, it becomes possible to characterize underground formations when searching for natural gas. However, traditional numerical computations often demand a computationally expensive fine grid to guarantee stability.

Aside from physics-informed neural networks (PINNs) moseley2020solving ; MENG2020113250 and neural operators kovachki2023neural ; li2020fourier , convolutional neural network (CNN) approaches yield remarkable results in improving the efficiency of wave simulations nguyen2022numerical ; rizzhelm ; mlfluid , but demand prior media analysis and tuning of inputs. Furthermore, these methods often avoid numerical solvers entirely to prioritize speed RAISSI2019686 ; especially over extended time periods, they can diverge.

Therefore, combining a classical numerical solver with a neural network to solve the second-order linear wave equation efficiently across a variety of wave speed profiles is a central point of our research. We take a first step by expanding the method of Nguyen and Tsai nguyen2022numerical and build an end-to-end model that enhances a fast numerical solver through deep learning. Thus, component interplay is optimized, and training methods can involve multiple steps to account for temporal wave dynamics. Similarly, while other Parareal-based datasets nguyen2022numerical ; ibrahim2023parareal are limited to single time-steps to add back missing high-frequency components, a cohesive system can handle Parareal for sequential time intervals.

Approach and Contribution.

An efficient numerical solver $\mathcal{G}_{\Delta t}\mathfrak{u}\equiv\mathcal{G}_{\Delta t}[\mathfrak{u},c]$ is used to propagate a wave $\mathfrak{u}(x,t)=(u,\partial_{t}u)$ for a time step to $t+\Delta t$ on a medium described by the piecewise smooth wave speed $c(x)$ for $x\in[-1,1]^{2}$. This method is computationally cheap since the advancements are computed on a coarse grid using a large time step within the limitation of numerical stability; however, it is consistently less accurate than an expensive fine solver $\mathcal{F}_{\Delta t}\mathfrak{u}\equiv\mathcal{F}_{\Delta t}[\mathfrak{u},c]$. Consequently, the solutions $\mathcal{G}_{\Delta t}\mathfrak{u}$ exhibit numerical dispersion errors and miss high-fidelity details. In a supervised learning framework, we aim to reduce this discrepancy using the outputs from $\mathcal{F}_{\Delta t}$ as the examples.

We define a restriction operator $\mathcal{R}$ which transforms functions from a fine grid to a coarse grid. Additionally, for mapping coarse grid functions to a fine grid, we integrate a neural network $\mathcal{I}^{\theta}$ to augment the under-resolved wave field. We can now define a neural propagator $\Psi_{\Delta t}[\mathfrak{u},c,\theta]\equiv\Psi_{\Delta t}^{\theta}$ that takes a wave field $\mathfrak{u}$ defined on the fine grid, propagates it on a coarser grid, and returns the enhanced wave field on the fine grid,

$$\mathfrak{u}_{n+1}\coloneqq\mathfrak{u}(x,t+\Delta t)=\mathcal{F}_{\Delta t}\mathfrak{u}_{n}\approx\Psi_{\Delta t}^{\theta}\mathfrak{u}_{n}\coloneqq\mathcal{I}^{\theta}\mathcal{G}_{\Delta t}\mathcal{R}\mathfrak{u}_{n}. \tag{1}$$

The models are parameterized by the family of initial wave fields $\mathfrak{F}_{\mathfrak{u}_{0}}$ and wave speeds $\mathfrak{F}_{c}$.
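To make the composition in Eq. 1 concrete, the following is a minimal PyTorch-style sketch of a neural propagator that chains restriction, one coarse step, and a learned upsampling network. The names coarse_solver, enhancer, and the fixed scale factor are illustrative assumptions, not the exact implementation used in our experiments.

```python
import torch
import torch.nn.functional as F

class NeuralPropagator(torch.nn.Module):
    """Sketch of Eq. 1: u_{n+1} ~ I^theta G_dt R u_n (component names are illustrative)."""

    def __init__(self, coarse_solver, enhancer, scale=2):
        super().__init__()
        self.coarse_solver = coarse_solver  # G_dt: cheap propagator acting on the coarse grid
        self.enhancer = enhancer            # I^theta: learned upsampling / correction network
        self.scale = scale                  # ratio between fine and coarse grid spacing

    def forward(self, u_fine, c_fine):
        # R: restrict the fine-grid wave field and wave speed to the coarse grid
        u_coarse = F.interpolate(u_fine, scale_factor=1.0 / self.scale, mode="bilinear")
        c_coarse = F.interpolate(c_fine, scale_factor=1.0 / self.scale, mode="bilinear")
        # G_dt: advance the restricted state by one coarse time step
        u_coarse = self.coarse_solver(u_coarse, c_coarse)
        # I^theta: map back to the fine grid and restore under-resolved details
        return self.enhancer(u_coarse, c_fine)
```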

2 Finite-Difference-Based Wave Propagators

We consider smooth initial conditions and absorbing or periodic boundary conditions that lead to well-posed initial boundary value problems. Since we are interested in seismic exploration applications, both boundary conditions can be used to simulate the propagation of wave fields with initial energy distributed inside a compact domain. Following the setup in nguyen2022numerical , let $Q_{h}u$ denote a numerical approximation of $\Delta u$ with discretized spatial and temporal domains, i.e.,

$$\partial_{tt}u(x,t)\approx c^{2}(x)Q_{h}u(x,t). \tag{2}$$

For the spatial ($\Delta x,\delta x$) and temporal spacing ($\Delta t,\delta t$) on uniform Cartesian grids, the approximation $(u,u_{t})_{t}\approx(u_{t},c^{2}Q_{h}u)$ can be solved by a time integrator:

  • Coarse solver $\mathcal{G}_{\Delta t^{\star}}\coloneqq(\mathcal{S}_{\Delta x,\Delta t}^{Q_{h}})^{M}$ with $\Delta t^{\star}=M\Delta t$, which operates on the lower resolution grid, $\Delta x\mathbb{Z}^{2}\times\Delta t\mathbb{Z}^{+}$. $Q_{h}$ is characterized by the velocity Verlet algorithm with absorbing boundary conditions enq (a simplified sketch is given after this list).

  • Fine solver $\mathcal{F}_{\Delta t^{\star}}\coloneqq(\mathcal{S}_{\delta x,\delta t}^{Q_{h}})^{m}$ with $\Delta t^{\star}=m\delta t$, which operates on the higher resolution grid, $\delta x\mathbb{Z}^{2}\times\delta t\mathbb{Z}^{+}$, and is sufficiently accurate for the wave speed. We shall use the explicit fourth-order Runge-Kutta (RK4) pseudo-spectral method runge-kutta . Since this approach is only suitable for PDEs with periodic boundary conditions, we first apply $\mathcal{F}_{\Delta t^{\star}}$ to a larger domain and then crop the result.
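As referenced in the list above, the following NumPy sketch illustrates one possible realization of the coarse time stepper: a velocity Verlet step for $u_{tt}=c^{2}\Delta u$ with a five-point Laplacian. For brevity the stencil wraps around periodically, so the absorbing boundary treatment of enq and the exact stencil order are omitted; the snippet is an illustration under these assumptions, not the solver used in the paper.

```python
import numpy as np

def laplacian(u, dx):
    """Five-point Laplacian with periodic wrap-around (illustrative only; the paper's
    coarse solver uses absorbing boundary conditions)."""
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
            np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u) / dx**2

def velocity_verlet_step(u, ut, c, dx, dt):
    """One velocity Verlet step for u_tt = c^2 * Lap(u), acting on the state (u, u_t)."""
    a0 = c**2 * laplacian(u, dx)            # acceleration at time t
    u_new = u + dt * ut + 0.5 * dt**2 * a0
    a1 = c**2 * laplacian(u_new, dx)        # acceleration at time t + dt
    ut_new = ut + 0.5 * dt * (a0 + a1)
    return u_new, ut_new

def coarse_solver(u, ut, c, dx, dt, n_steps):
    """G_{dt*} = (S_{dx,dt})^M: apply M small steps to cover one coarse step dt*."""
    for _ in range(n_steps):
        u, ut = velocity_verlet_step(u, ut, c, dx, dt)
    return u, ut
```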

Model Components.

As the two solvers operate on different Cartesian grids, with $\delta x<\Delta x$ and $\delta t<\Delta t$, we define the restriction operator $\mathcal{R}$, which transforms functions from a fine to a coarse grid, and the prolongation operator $\mathcal{I}$, which performs the inverse mapping. The enhanced variants consist of (a) bilinear interpolations denoted as $\mathcal{R}$ and $\mathcal{I}^{0}$, where $\mathcal{I}^{0}\mathcal{R}\mathfrak{u}\neq\mathfrak{u}$, and (b) neural network components denoted as $\mathcal{I}^{\theta}\equiv\Lambda^{\dagger}\mathcal{I}_{\Delta t^{\star}}^{\theta}\Lambda$, where the lower index indicates that the neural networks are trained when the step size $\Delta t^{\star}$ is used. For improved neural network inference, we use the transition operator $\Lambda$ to transform physical wave fields $(u,u_{t})$ to energy component representations $(\nabla u,c^{-2}u_{t})$, with $\Lambda^{\dagger}$ as the pseudo-inverse (see also nguyen2022numerical ; energy1 ). Figure 1 provides a schema visualizing the wave argument transitions.
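A minimal sketch of the transition operator $\Lambda$ is given below, assuming finite-difference gradients on a uniform grid; the pseudo-inverse $\Lambda^{\dagger}$, which reconstructs $(u,u_{t})$ from the energy components, is not shown.

```python
import numpy as np

def to_energy_components(u, ut, c, dx):
    """Lambda: map the physical wave field (u, u_t) to the energy component
    representation (du/dx, du/dy, c^{-2} u_t) used as neural network inputs."""
    ux, uy = np.gradient(u, dx, dx)   # finite-difference gradient of u
    return np.stack([ux, uy, ut / c**2])
```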

Figure 1: Detailed schematic of E2E-JNet3 (adapted from nguyen2022numerical ). Each convolutional block (blue) encompasses a 3x3 convolutional layer ($\text{groups}=3$, $\text{padding}=1$), followed by a batch normalization and a ReLU activation function. Connectivity within the network is depicted by arrows, with the dashed arrow specifically indicating a single application of $\mathcal{G}_{\Delta t^{\star}}$.

Variants of the neural propagators.

A simple model with bilinear interpolation (E2E-V, $\mathcal{I}^{0}\Psi_{\Delta t^{\star}}^{\theta}\mathcal{R}$) is used as a baseline. Each variant changes the baseline by exactly one aspect. This allows us to isolate the effect of each architecture modification. The four investigated end-to-end models $\Psi_{\Delta t^{\star}}^{\theta}\coloneqq\mathcal{I}_{\Delta t^{\star}}^{\theta}\mathcal{G}_{\Delta t^{\star}}\mathcal{R}$ are:

  • E2E-JNet3: E2E 3-level JNet (Figure 1)
  • E2E-JNet5: E2E 5-level JNet
  • E2E-Tira: Tiramisu JNet tiramisu
  • E2E-Trans: Transformer JNet petit2021unet

The second baseline is taken from nguyen2022numerical and denoted as the modular, not end-to-end 3-level JNet (NE2E-JNet3), $\mathcal{I}_{\Delta t^{\star}}^{\theta}(\mathcal{G}_{\Delta t^{\star}}\mathcal{R})$, in which the outputs of $\mathcal{G}_{\Delta t^{\star}}$ are used to separately train the E2E-JNet3 upsampling component.

3 Data Generation Approaches

For optimal results, the training horizon must be long enough to contain sufficiently representative wave patterns that develop in the propagation from the chosen distribution of initial wave fields. Yet the number of iterations must remain small to maintain similarities across different wave speeds. Similar to nguyen2022numerical , we chose to generate the dataset in the following way:

  1. An initial wave field $\mathfrak{u}_{0}=(u_{0},p_{0})\in\mathfrak{F}_{u_{0}}$ is sampled from a Gaussian pulse (a minimal sampling sketch is given after this list),

     $$u_{0}=e^{-\frac{|x+\tau|^{2}}{\sigma^{2}}},\qquad p_{0}\equiv\partial_{t}u_{0}=0 \tag{3}$$

     with $x\in[-1,1]^{2}$, $\frac{1}{\sigma^{2}}\sim\mathcal{N}(250,10)$ and the initial velocity field $p_{0}$. Here $\tau\in[-0.5,0.5]^{2}$ is the displacement of the Gaussian pulse's location from the center.

  2. Every $\mathfrak{u}_{0}\in\mathfrak{F}_{u_{0}}$ is then propagated for eight time steps of $\Delta t^{\star}=0.06$ by $\mathcal{F}_{\Delta t^{\star}}$. We adopt the fine grid settings for the spatial ($\delta x=\frac{2}{128}$) and temporal ($\delta t=\frac{1}{1280}$) resolution from nguyen2022numerical .
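As mentioned in step 1, the initial wave fields of Eq. 3 can be sampled as in the following sketch; the grid size and the random number generator are assumptions, while $1/\sigma^{2}\sim\mathcal{N}(250,10)$ and $\tau\in[-0.5,0.5]^{2}$ follow the text.

```python
import numpy as np

def sample_initial_condition(n=128, rng=None):
    """Sample u_0 = exp(-|x + tau|^2 / sigma^2) and p_0 = 0 on [-1, 1]^2 (Eq. 3)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    inv_sigma2 = rng.normal(250.0, 10.0)          # 1/sigma^2 ~ N(250, 10)
    tau = rng.uniform(-0.5, 0.5, size=2)          # pulse displacement from the center
    u0 = np.exp(-inv_sigma2 * ((X + tau[0])**2 + (Y + tau[1])**2))
    p0 = np.zeros_like(u0)                        # initial velocity field p_0 = 0
    return u0, p0
```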

The wave trajectories $\mathfrak{u}_{n+1}=\mathcal{F}_{\Delta t^{\star}}\mathfrak{u}_{n}$ provide the input and output data for the supervised learning algorithm, which aims to learn the solution map $\Psi_{\Delta t^{\star}}^{\theta}:X\mapsto Y$:

$$X\coloneqq\{(\nabla u_{n},c^{-2}(u_{n})_{t},c)\},\qquad Y\coloneqq\{(\nabla u_{n+1},c^{-2}(u_{n+1})_{t})\}, \tag{4}$$

where $\mathcal{D}=\{(x,y)\}$ with $x\in X$, $y\in Y$. $\mathcal{D}$ is modified to create $\mathcal{D}^{m}$, $\mathcal{D}^{w,m}$ (Subsection 3.1), and $\mathcal{D}^{p}$ (Subsection 3.2). For brevity, the dataset is only specified if the model is trained on a modified version; e.g., E2E-JNet3 ($\mathcal{D}^{m}$) is the E2E-JNet3 model trained on $\mathcal{D}^{m}$.

Wave Speeds.

$c\sim\mathfrak{F}_{c}$ are sampled from randomly chosen subregions of two synthetic geological structures, Marmousi marmousi and BP bp , that are mapped onto the spatial grid $h\mathbb{Z}^{2}\cap[-1,1]^{2}$ (see Figure 2). Four manually modified media (cf. nguyen2022numerical ) are added during testing to examine rapid variations in wave speed.
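A possible cropping routine is sketched below: it draws a random square subregion from a given velocity model and resamples it onto the computational grid. The crop size and the use of bilinear resampling via scipy.ndimage.zoom are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import zoom

def sample_wave_speed(velocity_model, grid_size=128, crop=256, rng=None):
    """Crop a random square subregion of a velocity model (e.g. Marmousi or BP)
    and resample it onto the computational grid (the crop size is illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    ny, nx = velocity_model.shape
    i = rng.integers(0, ny - crop)
    j = rng.integers(0, nx - crop)
    patch = velocity_model[i:i + crop, j:j + crop]
    return zoom(patch, grid_size / crop, order=1)   # bilinear resampling to the grid
```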

Figure 2: Velocity profiles. Brighter colors indicate higher velocity, while randomly chosen subregions are shown in red squares. Marmousi and BP profiles are drawn with a probability of 30% each, and the other velocity profiles with a probability of 10% each.

3.1 Multi-Step Training

During evaluation, the end-to-end model $\Psi_{\Delta t}^{\theta}$ is applied repeatedly to its own output to iteratively advance waves over the duration $\Delta t$. It comes naturally to include such longer-term dependencies also in our training dataset. For $k$ time steps, we therefore introduce a multi-step training strategy that modifies Eq. 1:

$$\mathfrak{u}_{n+k}\coloneqq\mathfrak{u}(t_{n}+k\Delta t)=(\mathcal{F}_{\Delta t})^{k}\mathfrak{u}_{n}\approx(\Psi_{\Delta t}^{\theta})^{k}\mathfrak{u}_{n}. \tag{5}$$

By computing the gradient with respect to the sum of consecutive losses, the gradient flows through the entire computation graph across multiple time steps.

For each initial condition $\mathfrak{u}_{0}$, $\mathcal{F}_{\Delta t}$ is applied $N$ times, with solutions denoted as $\mathfrak{u}_{n}$ for all $n\in\mathcal{U}_{1}\coloneqq\{0,\ldots,N\}$. In random order, $\Psi_{\Delta t}^{\theta}$ is applied to every $\mathfrak{u}_{n}$ for a random number of steps $k\in\mathcal{U}_{2}\coloneqq\{n+1,\ldots,N-n\}$. Formally, the optimization problem can therefore be described as:

$$\min_{\theta\in\mathbb{R}^{m\times n}}\mathcal{L}(\Psi_{\Delta t}^{\theta};\mathcal{D})=\min_{\theta\in\mathbb{R}^{m\times n}}\frac{1}{|\mathcal{D}|}\sum_{\mathfrak{u}_{0},c}\;\sum_{n\sim\mathcal{U}_{1}\setminus\{N\}}\;\sum_{\substack{k\sim\mathcal{U}_{2}\\ n<k\leq N-n}}\lVert(\Psi_{\Delta t}^{\theta})^{k}\mathfrak{u}_{n}-(\mathcal{F}_{\Delta t})^{k}\mathfrak{u}_{n}\rVert_{E_{h}}^{2}. \tag{6}$$

The norm $\lVert\cdot\rVert_{E_{h}}^{2}$ is the discretized energy semi-norm MSE as detailed in nguyen2022numerical . We draw both $n$ and $k$ from the uniform random distributions, i.e., $n\sim\mathcal{U}_{1}$ and $k\sim\mathcal{U}_{2}$, respectively. The novel dataset is denoted as $\mathcal{D}^{m}=\{(\nabla u_{n}^{k},c^{-2}(u_{n}^{k})_{t},c;\nabla u_{n+1},c^{-2}(u_{n+1})_{t})\}$.
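The following sketch shows one way to realize a single term of the multi-step objective in Eq. 6, assuming a differentiable neural propagator model, precomputed fine-solver reference states trajectory, and a hypothetical energy_mse helper implementing the discretized energy semi-norm; the sampling ranges for n and k are simplified relative to the text.

```python
def multi_step_loss(model, trajectory, c, n, k, energy_mse):
    """One summand of Eq. 6: roll the neural propagator k steps from the fine-solver
    state u_n and compare against the reference u_{n+k}; gradients flow through all
    k applications of the model."""
    u = trajectory[n]
    for _ in range(k):
        u = model(u, c)                      # (Psi^theta)^k applied end to end
    return energy_mse(u, trajectory[n + k], c)

def training_step(model, optimizer, trajectory, c, N, energy_mse, rng):
    """Draw n and k at random (simplified ranges), accumulate the loss, update theta."""
    n = int(rng.integers(0, N))              # n sampled from {0, ..., N-1}
    k = int(rng.integers(1, N - n + 1))      # k sampled so that n + k <= N
    loss = multi_step_loss(model, trajectory, c, n, k, energy_mse)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())
```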

Weighted Approach.

Since feature variations can be extreme in the model's initial, untrained phase and may lead to imprecise gradient estimations, we aim to accelerate convergence by weighting individual losses. Therefore, rather than drawing $k\sim\mathcal{U}_{2}$ from a uniform distribution, we select values according to a truncated normal distribution $\mathrm{TN}(\mu,\sigma,a,b)$ with truncation bounds $-\infty<a<b<\infty$. This focuses on minimizing the impact of errors in the early training stage. After every third epoch, the mean $\mu$ is increased by one to account for long-term dependencies. We refer to this dataset as $\mathcal{D}^{w,m}$.
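A minimal sketch of the shifted truncated-normal sampling of the rollout length $k$, assuming scipy.stats.truncnorm, a lower truncation bound of one, and an assumed standard deviation:

```python
from scipy.stats import truncnorm

def sample_k(epoch, k_max, sigma=1.0):
    """Draw the rollout length k from TN(mu, sigma, a=1, b=k_max), where the mean mu
    grows by one after every third epoch (sigma and a are assumed values)."""
    mu = 1.0 + epoch // 3
    a, b = (1.0 - mu) / sigma, (k_max - mu) / sigma   # standardized truncation bounds
    return int(round(truncnorm.rvs(a, b, loc=mu, scale=sigma)))
```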

3.2 Parareal Algorithm

Identical to nguyen2022numerical , our implemented scheme iteratively refines the solution using the difference between $\mathcal{F}_{\Delta t}$ and $\mathcal{G}_{\Delta t}$ for each subinterval $\Delta t$. In particular, high-frequency components go missing due to the transition to a lower grid resolution or an overly simple numerical algorithm. Therefore, a more elaborate model $\Psi_{\Delta t}^{\theta}$ is required for convergence. Formally, we rearrange Eq. 1 for the time stepping of $\mathcal{F}_{\Delta t}$, and replace $\mathcal{F}_{\Delta t}(\mathcal{I}\mathcal{R})\mathfrak{u}_{n}$ by the computationally cheaper end-to-end strategy $\Psi_{\Delta t}^{\theta}$:

$$\mathfrak{u}_{n+1}^{k+1}\coloneqq\Psi_{\Delta t}^{\theta}\mathfrak{u}_{n}^{k+1}+[\mathcal{F}_{\Delta t}\mathfrak{u}_{n}^{k}-\Psi_{\Delta t}^{\theta}\mathfrak{u}_{n}^{k}],\qquad k=0,\dots,K-1 \tag{7}$$
$$\mathfrak{u}_{n+1}^{0}\coloneqq\Psi_{\Delta t}^{\theta}\mathfrak{u}_{n}^{0},\qquad n=0,\dots,N-1. \tag{8}$$

We observe that the computationally expensive $\mathcal{F}_{\Delta t}\mathfrak{u}_{n}^{k}$ on the right-hand side of Eq. 7 can be performed in parallel for each iteration in $k$.
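A compact sketch of the Parareal iteration in Eqs. 7 and 8 is given below; here psi stands for the neural propagator $\Psi_{\Delta t}^{\theta}$ and fine for $\mathcal{F}_{\Delta t}$, both assumed to act on array-valued wave states, and the parallel dispatch of the fine solves is only indicated by a comment.

```python
def parareal(u0, psi, fine, N, K):
    """Sketch of Eqs. 7 and 8: psi plays the role of Psi^theta and fine of F_dt."""
    # Eq. 8: initial serial sweep with the cheap neural propagator
    u = [u0]
    for n in range(N):
        u.append(psi(u[n]))
    # Eq. 7: K correction sweeps
    for _ in range(K):
        # the expensive fine solves depend only on the previous iterate
        # and could be dispatched in parallel across n
        corrections = [fine(u[n]) - psi(u[n]) for n in range(N)]
        u_new = [u0]
        for n in range(N):
            u_new.append(psi(u_new[n]) + corrections[n])
        u = u_new
    return u
```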

Parareal iterations alter a given initial sequence of wave fields $\mathfrak{u}_{n}^{0}$ to $\mathfrak{u}_{n}^{k}$ for $n\in\mathbb{N}$. This means that neural operators should be trained to map $\mathfrak{u}_{n}^{k}$ to $\mathcal{F}_{\Delta t}\mathfrak{u}_{n}^{k}$. Therefore, appropriate training patterns for this setup would naturally differ from those found in $\mathcal{D}$, and the dataset for use with Parareal should be sampled from a different distribution, denoted as $\mathcal{D}^{p}$.

4 Evaluation Setup

The parameters for $\mathcal{G}_{\Delta t^{\star}}$ are set to $\Delta x=\frac{2}{64}$ and $\Delta t=\frac{1}{600}$, with a bilinear interpolation scale factor of two.

Experiment 1: Architecture Preselection.

The average training time of each variant is approximately 73 CPU core hours. Due to resource constraints, we therefore limit our main analysis to one end-to-end variant. Here, we selected the most promising approach from four deep learning architectures trained on $\mathcal{D}$.

Experiment 2: Multi-Step Training.

We train the chosen end-to-end variant from experiment 1 on $\mathcal{D}^{m}$ using an equal number of training points as in $\mathcal{D}$. The test set is consistent with $\mathcal{D}$ to enable comparison with other experiments.

Experiment 3: Weighted Multi-Step Training.

The setup follows experiment 2, except that the models are trained on $\mathcal{D}^{w,m}$.

Experiment 4: Parareal Optimization.

We explore improvements to our variants using the Parareal scheme with two datasets:
A. Comprehensive Training ($\mathcal{D}^{p}_{\text{train}}$): The models are trained according to the Parareal scheme in Eq. 7 and Eq. 8 with $K=4$, using a random sample that constitutes a quarter of the original dataset $\mathcal{D}$ for fair comparisons. The gradients are determined by summing the losses of a Parareal iteration.
B. Fine-tuning ($\mathcal{D}^{p}_{\text{refine}}$): Rather than employing an untrained model, we deploy variants that were pre-trained on a random subset containing half of $\mathcal{D}$. Then, for another subset that constitutes an eighth of $\mathcal{D}$, $\Psi_{\Delta t}^{\theta}$ is applied with Parareal.

5 Discussion

Each of the total 72 runs required an average of 72.8 GPU core hours on one NVIDIA A100 Tensor Core GPU to complete, while the E2E-JNet3 was trained almost 41% faster and the E2E-Tira three times slower than the average. This sums up to a total runtime on a single GPU of just over 5,241 hours.

The best trial on the test set was achieved by E2E-Tira with an energy MSE of 0.0109, which is well below the 0.0462 from the previously published model, NE2E-JNet3. Our most efficient variant is E2E-JNet3 trained on $\mathcal{D}^{w,m}$ with an energy MSE of 0.0169, which is close to the results of more extensive models such as E2E-Tira and E2E-Trans, but is more than five times faster.

Figure 3: Total performance of all hyperparameter search trials on the validation set. The boxes represent the range between the 25th and 75th percentile of values, while the whiskers indicate 1.5 times the interquartile range. The blue line illustrates the result of the baseline E2E-V. The red dot shows the mean and the black line marks the median of the data. The grey histograms in the background present the average training time of the respective variant in hours on a single GPU.

End-to-end Structure.

The first important observation based on Figure 3 is that integrating NE2E-JNet3 into a single, end-to-end system (E2E-JNet3) improved the average accuracy on the validation set by more than 46%, and on the test set by ca. 53%. The ability to include the loss of both the coarse solver and the downsampling layer also led to a lower standard deviation and fewer outliers, as indicated by the mean lying well above the median for NE2E-JNet3 but not for E2E-JNet3.

Multi-Step Training.

Introducing a multi-step training loss enhanced the benefits of an end-to-end architecture even further (cf. E2E-JNet3 ($\mathcal{D}^{m}$) in Figure 3) without increasing the number of model parameters. Figure 4 depicts how all end-to-end models had a much lower increase in relative energy MSE (cf. nguyen2022numerical ), particularly for the first three time steps on the test set. Hence, we conclude that connecting wave states to incorporate temporal propagation dynamics in the training data appears to be especially important for the early stages of wave advancement. Additionally, by taking fewer steps through sampling from a normal distribution that is shifted along the x-axis over the course of training (cf. E2E-JNet3 ($\mathcal{D}^{w,m}$)), we successfully avoid high performance fluctuations when the model is only partially trained. Figure 5 visualizes the correction of the low-fidelity solution of $\mathcal{G}_{\Delta t^{\star}}$ by E2E-JNet3 ($\mathcal{D}^{w,m}$).

Figure 4: Comparing the NE2E-JNet3 model and three end-to-end JNet3 variants that differ in their training algorithms. Initial conditions and velocity profiles are sampled from $\mathcal{D}$ and the relative energy MSE results of 10 runs are averaged. As expected, all models show a bounded growth as the waves vanish due to absorbing boundary conditions.
Figure 5: Visualizing the wave field correction of E2E-JNet3 ($\mathcal{D}^{w,m}$) in the energy semi-norm after four time steps of $\Delta t^{\star}$. The initial condition and velocity profile are sampled from $\mathcal{D}$.

Upsampling Architecture.

An overview of the upsampling architecture performances can be found in Table 1. As expected, the larger networks (E2E-Tira and E2E-Trans) performed slightly better compared to the 3-level JNet architecture, but for the ResNet architecture (E2E-JNet5), more weights did not increase accuracy by much. Consequently, we theorize that the ResNet design may be insufficient for capturing high-fidelity wave patterns, while especially highly-connected layers with an optimized feature and gradient flow (E2E-Tira) are better suited. Given that E2E-JNet3 ($\mathcal{D}^{w,m}$) had only a slightly worse average energy MSE on the test set, we generally advise against using the expensive models in our setup.

Table 1: Upsampling variants performance averaged over 10 runs using a batch size of 64.

variant | number of parameters | GPU time (sec) | test energy MSE
$\mathcal{F}_{\Delta t^{\star}}$ | - | 57.96749 | -
E2E-V | - | 2.40421 | 0.07437
E2E-JNet3 | 40,008 | 2.88331 | 0.02496
E2E-JNet5 | 640,776 | 10.84893 | 0.02379
E2E-Tira | 123,427 | 13.57449 | 0.01274
E2E-Trans | 936,816 | 15.67633 | 0.01743

Parareal.

While models trained with $\mathcal{D}^{p}_{\text{train}}$ showed unstable training progress and a diverging loss, applying E2E-JNet3 ($\mathcal{D}^{p}_{\text{refine}}$) within the Parareal scheme showed better accuracy than E2E-JNet3 with Parareal (cf. Figure 6). As this training method improved the stability of Parareal, sampling training data that reflects the causal structure of concurrently solved time intervals is an efficient enhancement to our end-to-end structure.

Figure 6: Energy MSE of E2E-JNet3 and E2E-JNet3 ($\mathcal{D}^{p}_{\text{refine}}$) averaged over 10 runs.

6 Conclusion

In this paper we enhanced the method proposed by Nguyen and Tsai nguyen2022numerical , and reported the results of a large-scale study on different variants that investigate the efficacy of these enhancements.

All end-to-end variants, including the variants with training modifications, outperformed the modular framework of nguyen2022numerical . In particular, the lightweight end-to-end 3-level JNet (E2E-JNet3) performed reasonably well given its low computational cost, and was further improved through a weighted, multi-step training scheme ($\mathcal{D}^{w,m}$) to feature time-dependent wave dynamics without adding complexity to the model or substantially extending the training duration. Similarly, the Parareal iterations using the neural propagator trained by the Parareal-based data showed significant performance improvements over E2E-JNet3 without extensive additional computational cost, thanks to parallelization.

As expected, certain expensive upsampling architectures, such as those that intensify the interconnections between feature and gradient flows (Tiramisu JNet), significantly increased the accuracy. However, their high computational demand makes their application mostly impractical in modern engineering workflows.

Acknowledgements.
Tsai is partially supported by National Science Foundation Grants DMS-2110895 and DMS-2208504. The authors also thank Texas Advanced Computing Center (TACC) for the computing resources.

References

  • (1) Moseley, B., Markham, A., Nissen-Meyer, T.: Solving the Wave Equation with Physics-Informed Deep Learning. ArXiv eprint:2006.11894 (2020)
  • (2) Meng, X., Li, Z., Zhang, D., Karniadakis, G.E.: PPINN: Parareal Physics-Informed Neural Network for Time-Dependent PDEs. Computer Methods in Applied Mechanics and Engineering 370 (2020)
  • (3) Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., Anandkumar, A.: Neural Operator: Learning Maps between Function Spaces with Applications to PDEs. Journal of Machine Learning Research 24(89), pp. 1–97 (2023)
  • (4) Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., Anandkumar, A.: Fourier Neural Operator for Parametric PDEs. ArXiv preprint:2010.08895 (2020)
  • (5) Nguyen, H., Tsai, R.: Numerical Wave Propagation Aided by Deep Learning. Journal of Computational Physics 475 (2023)
  • (6) Rizzuti, G., Siahkoohi, A., Herrmann, F.J.: Learned Iterative Solvers for the Helmholtz Equation. 81st EAGE Conference and Exhibition, pp. 1–5 (2019)
  • (7) Kochkov, D., Smith, J., Alieva, A., Wang, Q., Brenner, M., Hoyer, S.: Machine Learning–Accelerated Computational Fluid Dynamics. PNAS 118(21), pp. 89–97 (2021)
  • (8) Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. Journal of Computational Physics 378, pp. 686–707 (2019)
  • (9) Ibrahim, A. Q., Götschel, S., Ruprecht, D.: Parareal with a Physics-Informed Neural Network as Coarse Propagator. ISBN: 978-3-031-39698-4, pp. 649–66 (2023)
  • (10) Engquist, B., Majda, A.: Absorbing Boundary Conditions for Numerical Simulation of Waves. PNAS 74(5), pp. 1765–1766 (1977)
  • (11) Runge, C.: Ueber die Numerische Aufloesung von Differentialgleichungen. Mathematische Annalen 46, pp. 167–178 (1895)
  • (12) Rocha, D., Sava, P.: Elastic Least-Squares Reverse Time Migration Using the Energy Norm. Geophysics 83(3), pp. 5MJ–Z13 (2018)
  • (13) Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., Bengio, Y.: The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1175–1183 (2017)
  • (14) Petit, O., Thome, N., Rambour, C., Soler, L.: U-Net Transformer: Self and Cross Attention for Medical Image Segmentation. Machine Learning in Medical Imaging, Springer International Publishing, pp. 267–276 (2021)
  • (15) Brougois, A., Bourget, M., Lailly, P., Poulet, M., Ricarte, P., Versteeg, R.: Marmousi, model and data. Conference: EAEG Workshop - Practical Aspects of Seismic Data Inversion (1990)
  • (16) Billette, F., Brandsberg-Dahl, S.: The 2004 BP Velocity Benchmark. European Association of Geoscientists & Engineers (67th EAGE Conference & Exhibition) (2005)