
Parameter Estimation and Adaptive Solution of the Leray-Burgers Equation using Physics-Informed Neural Networks

Bong-Sik Kim (Department of Mathematics and Physics, American University of Ras Al Khaimah, UAE; e-mail: bkim@aurak.ac.ae), Yuncherl Choi (Ingenium College of Liberal Arts, Kwangwoon University, Seoul 01891, Korea; e-mail: yuncherl@kw.ac.kr), and Doo Seok Lee (Department of Undergraduate Studies, Daegu Gyeongbuk Institute of Science and Technology, Daegu 42988, Korea; e-mail: dslee@dgist.ac.kr)
(May 7, 2025)
Abstract

This study presents a unified framework that integrates physics-informed neural networks (PINNs) to address both the inverse and forward problems of the one-dimensional Leray-Burgers equation. First, we investigate the inverse problem by empirically determining the characteristic wavelength parameter α at which the Leray-Burgers solutions closely approximate those of the inviscid Burgers equation. Through PINN-based computational experiments on inviscid Burgers data, we identify a physically consistent range for α, between 0.01 and 0.05 for continuous initial conditions and between 0.01 and 0.03 for discontinuous profiles, demonstrating the dependence of α on the initial data. Next, we solve the forward problem using a PINN architecture in which α is dynamically optimized during training via a dedicated subnetwork, Alpha2Net. Crucially, Alpha2Net enforces α to remain within the bounds derived from the inverse problem, ensuring physical fidelity while jointly optimizing the network parameters (weights and biases). This integrated approach effectively captures complex dynamics, such as shock and rarefaction waves. This study also highlights the effectiveness and efficiency of the Leray-Burgers equation in real practical problems, specifically Traffic State Estimation.

1 Introduction

In this study, we explore the one-dimensional Leray-Burgers (LB) equation (2.1), a regularized model of the inviscid Burgers equation, which introduces a wavelength parameter α to prevent finite-time blow-ups while aiming to preserve the essential dynamics of the inviscid case, such as shock waves. However, selecting an appropriate α is a non-trivial task, as its value significantly influences the fidelity of the LB solutions to those of the inviscid Burgers equation. To address this challenge, we employ Physics-Informed Neural Networks (PINNs) [15, 14] in a two-step process that bridges the inverse and forward problems.

First, we tackle the inverse problem, empirically determining the range of α that aligns LB solutions with inviscid Burgers entropy solutions under various initial conditions (Section 4). We find that the choice of α depends on the initial data. For continuous initial profiles, the practical range of α is 0.01 to 0.05, whereas for discontinuous initial profiles it is 0.01 to 0.03 (Section 4.2). This step establishes a critical foundation by identifying the bounds within which α yields physically meaningful results.

Next, we turn to the forward inference problem in Section 5, where PINNs are utilized to solve the LB equation under different initial and boundary conditions. Unlike traditional approaches where α might be arbitrarily fixed, we treat α as a learnable parameter depending on t, optimized during training alongside the standard PINN parameters (weights and biases). This optimization is facilitated by a dedicated subnetwork, Alpha2Net, integrated into the PINN architecture. To ensure that the trained α remains physically relevant, Alpha2Net constrains its values to the range determined from the inverse problem. This constraint links the two problems directly: the inverse problem informs the forward inference by providing a predetermined range that guides the optimization process, ensuring that the resulting solutions are accurate and consistent with the physical insights gained earlier.

We also apply the LB equation to traffic state estimation to highlight its effectiveness and efficiency in real practical problems in Section 6. We present a brief investigation of two variants of the Lighthill-Whitham-Richards (LWR) traffic flow model, namely LWR-α (based on the Leray-Burgers equation) and LWR-ε (based on the viscous Burgers equation), in the context of Traffic State Estimation (TSE). The result demonstrates the efficacy of the LWR-α model as a suitable alternative for traffic state estimation, outperforming the diffusion-based LWR-ε model in terms of computational efficiency.

2 Background

2.1 Leray-Burgers Equation

We consider a problem of computing the solution v:[0,T]\times\Omega\to\mathbb{R} of an evolution equation

v_{t}(t,x)+\mathcal{N}_{\alpha}[v](t,x)=0, \quad \forall(t,x)\in[0,T]\times\Omega, \qquad (2.1)
v(0,x)=v_{0}(x), \quad \forall x\in\Omega,

where \mathcal{N}_{\alpha} is a nonlinear differential operator acting on v with a small constant parameter \alpha>0,

\mathcal{N}_{\alpha}[v]=vv_{x}+\alpha^{2}v_{x}v_{xx}. \qquad (2.2)

Here, the subscripts of v mean partial derivatives in t and x, \Omega\subset\mathbb{R} is a bounded domain, T denotes the final time, and v_{0}:\Omega\to\mathbb{R} is the prescribed initial data. Although the methodology allows for different types of boundary conditions, we restrict our discussion to Dirichlet or periodic cases and prescribe the boundary data as

v_{b}(t,x)=v(t,x), \quad \forall(t,x)\in[0,T]\times\partial\Omega,

where \partial\Omega denotes the boundary of the domain \Omega.

Equation (2.1) is called the Leray-Burgers equation (LB). It is also known in the literature as the Burgers-α equation, the convectively filtered Burgers equation, a Leray regularized reduced order model, etc. Bhat and Fetecau [2] introduced (2.1) as a regularized approximation to the inviscid Burgers equation

v_{t}+vv_{x}=0. \qquad (2.3)

They considered a special smoothing kernel associated with the Green function of the Helmholtz operator

u_{\alpha}=\mathcal{H}^{-1}_{\alpha}v=(I-\alpha^{2}\partial_{x}^{2})^{-1}v, \quad (I=\text{identity}),

where α>0 is interpreted as the characteristic wavelength scale below which the smaller physical phenomena are averaged out (see, for example, [8]), and it accelerates energy decay. Applying the smoothing kernel to the convective term in (2.3) yields

v_{t}+u_{\alpha}v_{x}=0, \qquad (2.4)

where v=v(t,x) is a vector field and u_{\alpha} is the filtered vector field. The filtered vector u_{\alpha} is smoother than v, and equation (2.4) is a nonlinear Leray-type regularization [12] of the inviscid Burgers equation. Here and in the following, we abuse notation and write u for the filtered vector u_{\alpha}. If we express equation (2.4) in the filtered vector u, it becomes a quasilinear evolution equation that consists of the inviscid Burgers equation plus \mathcal{O}(\alpha^{2}) nonlinear terms [2, 3, 4]:

u_{t}+uu_{x}=\alpha^{2}(u_{txx}+uu_{xxx}). \qquad (2.5)

In this paper, we follow Zhao and Mohseni [21] and expand the inverse Helmholtz operator in α to higher orders of the Laplacian operator:

(1-\alpha^{2}\Delta)^{-1}=1+\alpha^{2}\Delta+\alpha^{4}\Delta^{2}+\cdots \quad \text{if}\ \alpha\lambda_{\mathrm{max}}<1,

where \lambda_{\mathrm{max}} is the highest eigenvalue of the discretized operator \Delta. We can then write (2.4) in the unfiltered vector field v to obtain equations (2.1)-(2.2) with an \mathcal{O}(\alpha^{4}) truncation error.
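To illustrate the truncation numerically, the following minimal sketch (ours, not taken from the cited works) compares the discrete inverse Helmholtz operator with its second-order Neumann-series truncation on a small periodic grid; the grid size and the value of α are arbitrary choices satisfying the convergence condition.

```python
import numpy as np

# Discrete check of (1 - alpha^2 D2)^{-1} ~ 1 + alpha^2 D2 + alpha^4 D2^2
# on a periodic grid; valid when the spectral radius of alpha^2 D2 is below 1.
n, alpha = 64, 0.02
h = 2.0 * np.pi / n
D2 = (-2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / h**2
D2[0, -1] = D2[-1, 0] = 1.0 / h**2            # periodic wrap-around

exact = np.linalg.inv(np.eye(n) - alpha**2 * D2)
truncated = np.eye(n) + alpha**2 * D2 + alpha**4 * (D2 @ D2)
print(np.max(np.abs(exact - truncated)))       # neglected terms are O((alpha^2 D2)^3)
```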

For smooth initial data v(0,x) that decrease at at least one point (so there exists y such that v_{x}(0,y)<0), the classical solution v(t,x) of the inviscid Burgers equation (the case α=0) fails to exist beyond a specific finite break time T_{s}>0. This is because the characteristics of the inviscid equation intersect in finite time. The Leray-Burgers equation bends the characteristics so that they do not intersect each other, avoiding any finite-time intersection and remedying the finite-time breakdown [2, 4]. Consequently, for α>0 the Leray-Burgers equation possesses a classical solution globally in time for smooth initial data [2]:

Theorem 1.

Given initial data v_{0}\in W^{2,1}(\mathbb{R})=\{u\in L^{1}(\mathbb{R}):D^{s}u\in L^{1}(\mathbb{R})\ \text{for all}\ |s|\leq 2\}, the Leray-Burgers equation (2.4) possesses a unique solution v(t,x)\in W^{2,1}(\mathbb{R}) for all t>0.

Furthermore, the Leray-Burgers solution u_{\alpha}(t,x) with initial data u_{\alpha}(0,x)=\mathcal{H}^{-1}_{\alpha}v_{0}(x) for v_{0}\in W^{2,1}(\mathbb{R}) converges strongly, as \alpha\rightarrow 0^{+}, to a global weak solution v(t,x) of the following initial value problem for the inviscid Burgers equation (Theorem 2 in [2]):

v_{t}+\frac{1}{2}\left(v^{2}\right)_{x}=0 \quad \text{with} \quad v(0,x)=v_{0}(x).

Bhat and Fetecau [2] found numerical evidence that the weak solution selected in the zero-α limit satisfies the Oleinik entropy inequality, making the solution physically appropriate. The proof relies on uniform estimates of the unfiltered velocity v rather than the filtered velocity u, which made possible the strong convergence of the Leray-Burgers solution to the correct entropy solution of the inviscid Burgers equation. In the context of the filtered velocity u_{\alpha}, they also showed that the Leray-Burgers equation captures the correct shock solution of the inviscid Burgers equation for Riemann data consisting of a single decreasing jump [4]. However, since u_{\alpha} captures an unphysical solution for Riemann data comprised of a single increasing jump, it was necessary to control the behavior of the regularized equation by introducing an arbitrary mollification of the Riemann data to capture the correct rarefaction solution of the inviscid Burgers equation. With that modification, they extended the existence results to the case of discontinuous initial data u_{\alpha}\in L^{\infty}. However, the case of initial data v_{0}\in L^{\infty} remains an open problem. In [7], Guelmame et al. derived a regularized equation similar to (2.5):

u_{t}+uu_{x}=\alpha^{2}(u_{txx}+2u_{x}u_{xx}+uu_{xxx}), \qquad (2.6)

which has an additional term 2u_{x}u_{xx} on the right-hand side. Notice that u in this equation is the filtered vector field in (2.4). When establishing the existence of the entropy solution, Guelmame et al. resorted to altering equation (2.6), just as Bhat and Fetecau had to modify the initial data for their proof in [4]. Analysis in the context of the filtered vector field u appears to require additional modification of the equations to achieve the desired results. Working with the actual vector field v may avoid such arbitrary changes.

Equation (2.4) and related models have previously appeared in the literature. We refer to [1, 2, 3, 4, 6, 7, 11, 13, 17, 20, 18] for more properties related to the Leray-Burgers equation. The paper [17] explores the role of α in regularizing Proper Orthogonal Decomposition (POD)-Galerkin models for the Kuramoto-Sivashinsky (KS) equation. The α-regularization is introduced to enhance the stability and accuracy of these models by applying Helmholtz filtering to the eigenmodes of the quadratic terms. This filtering controls energy transfer between modes, specifically reducing the impact of high-wavenumber modes that contribute to instability, while preserving the system’s key dynamical features. The link between regularization procedures such as Helmholtz regularization and numerical schemes, for example, had been studied in [6, 13]. They argued that, in numerical computations, the parameter α^2 cannot be interpreted solely as a length scale because it also depends on the numerical discretization scheme chosen. They observed that the choice of α depends on a relation between α and the mesh size that preserves stability and consistency with conservation conditions for the chosen numerical scheme [2, 6, 13]. Also, they found that, for a fixed number of grid points, there is a particular value α ≈ 0.02 below which the solution becomes oscillatory (even with continuous initial profiles).

2.2 Leray Regularization and Conserved Quantities

The Leray-type regularization was first introduced by Jean Leray for the Navier-Stokes equations governing incompressible fluid flow [12]. This regularization scheme enhances stability and accuracy by applying a Helmholtz filter to the convective term, whose effect is evident in the Fourier domain:

\widehat{\mathcal{H}^{-1}v}=\frac{\widehat{v}}{1+\alpha^{2}k^{2}}.

This filtering mechanism regulates the energy transfer among different modes by attenuating high-wavenumber contributions, which are typically associated with instability. The parameter α thus serves as a regularization constant that sets the length scale of filtering, suppressing smaller spatial scales while preserving the dominant dynamical behavior of the flow.
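As an illustration of this attenuation, the following minimal sketch (ours, not from the cited works) applies the filter symbol 1/(1+α²k²) to a sharp profile using the FFT on a periodic grid; the grid and the test profile are arbitrary choices.

```python
import numpy as np

# Apply the Helmholtz filter u_alpha = (I - alpha^2 d^2/dx^2)^{-1} v on a
# periodic grid: in Fourier space this is multiplication by 1/(1 + alpha^2 k^2).
def helmholtz_filter(v, dx, alpha):
    k = 2.0 * np.pi * np.fft.fftfreq(v.size, d=dx)    # angular wavenumbers
    return np.real(np.fft.ifft(np.fft.fft(v) / (1.0 + alpha**2 * k**2)))

x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
v = np.where(x < np.pi, 1.0, 0.0)                      # sharp step profile
u = helmholtz_filter(v, x[1] - x[0], alpha=0.05)       # step smoothed on scales ~ alpha
```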

In the realm of stochastic partial differential equations, Leray regularization has proven effective in enhancing the stability of reduced order models (ROMs), particularly for convection-dominated systems. Iliescu et al., [11], explored this in their study of a stochastic Burgers equation driven by linear multiplicative noise. They found that standard Galerkin ROMs (G-ROMs) produce spurious numerical oscillations in convection-dominated regimes, a problem exacerbated by increasing noise amplitude. To counter this, they applied an explicit spatial filter to the convective term creating the Leray ROM (L-ROM). This approach significantly mitigates oscillations, yielding more accurate and stable solutions compared to the G-ROM, especially under stochastic perturbations. The L-ROM’s robustness to noise variations suggests that Leray regularization may help preserve statistical properties or conserved quantities, such as energy or moments of the solution, in a stochastic context. This extends the utility of Leray regularization beyond deterministic settings, offering a practical tool for modeling complex stochastic dynamics while maintaining numerical fidelity.

Lemma 2.1.

(Conservation of Energy and Mass) For the Leray-Burgers equation (a\leq x\leq b,\ 0\leq t)

v_{t}+\left(\tfrac{1}{2}v^{2}\right)_{x}+\alpha^{2}v_{x}v_{xx}=0

with periodic boundary conditions, both energy and mass are conserved if the regularization parameter α(t,x) is constant with respect to x, that is, α = α(t).

Proof.

Using the identity \alpha^{2}v_{x}v_{xx}=\frac{\partial}{\partial x}\left(\frac{1}{2}\alpha^{2}v_{x}^{2}\right)-\frac{1}{2}(\alpha^{2})_{x}v_{x}^{2}, the Leray-Burgers equation can be transformed to the form

v_{t}+\frac{\partial}{\partial x}\left(\frac{1}{2}v^{2}+\frac{1}{2}\alpha^{2}v_{x}^{2}\right)-\frac{1}{2}(\alpha^{2})_{x}v_{x}^{2}=0.

To preserve energy conservation, the source term must vanish, i.e.,

\frac{1}{2}(\alpha^{2})_{x}v_{x}^{2}=0.

This condition is satisfied if (\alpha^{2})_{x}=0, implying that \alpha^{2} is constant with respect to x. Then the source term vanishes, and the equation takes the conservation form:

v_{t}+\frac{\partial}{\partial x}\left(\frac{1}{2}v^{2}+\frac{1}{2}\alpha^{2}v_{x}^{2}\right)=0,

which ensures energy conservation.

For mass conservation, we want:

\frac{d}{dt}\int_{a}^{b}v(t,x)\,\mathrm{d}x=\int_{a}^{b}v_{t}(t,x)\,\mathrm{d}x=-\int_{a}^{b}\frac{\partial}{\partial x}\left(\frac{1}{2}v^{2}+\frac{1}{2}\alpha^{2}v_{x}^{2}\right)\,\mathrm{d}x=0.

This holds because, with α constant in x, the Leray-Burgers equation is in pure flux form and the boundary terms cancel by periodicity. Hence mass conservation also holds. ∎

3 PINN Structure for Inverse and Forward Problems

Figure 1: The PINN architecture for solving the Leray-Burgers equation: the diagram illustrates the surrogate neural network for predicting v(t,x), the Alpha2Net subnetwork for optimizing α, and the loss components enforcing the PDE, initial, and boundary conditions.

We employ a Physics-Informed Neural Network (PINN) to address both the inverse and forward problems for the Leray-Burgers (LB) equation, as depicted in Figure 1. The LB equation, given by

v_{t}+vv_{x}+\alpha^{2}v_{x}v_{xx}=0,

is solved in various initial and boundary condition scenarios, with the characteristic wavelength parameter α either fixed (α = constant) or adaptively optimized (α = α(t)). The PINN architecture, shown in Figure 1, consists of two primary components: a main neural network to approximate the solution v(t,x) and a subnetwork, Alpha2Net, to learn the parameter α. The main network is a fully connected feed-forward neural network (multilayer perceptron, MLP) with eight hidden layers, each containing 20 neurons, and employs tanh activation functions. It takes spatio-temporal coordinates t and x as inputs and outputs the predicted solution v(t,x). To enforce the physics of the LB equation, automatic differentiation is used to compute the derivatives \hat{v}_{t}, \hat{v}_{x}, and \hat{v}_{xx}, which are then used to evaluate the PDE residual.

The Alpha2Net subnetwork, highlighted in the upper part of Figure 1, is designed to adaptively learn the parameter α as a function of time, i.e., α(t). This subnetwork is a smaller MLP with three hidden layers, each containing 10 neurons, and also uses tanh activation functions. It takes the time coordinate t as its sole input and outputs α^2(t), for use in the PDE residual. To ensure physical consistency, Alpha2Net constrains α^2(t) to lie within the range [10^{-4}, 0.01], slightly broader than the practical range identified in the inverse problem in Section 4. This slight extension allows computational flexibility while maintaining alignment with the physically meaningful bounds established earlier. The constraint is implemented by applying a sigmoid activation at the output layer of Alpha2Net, scaled to map the output to the desired range:

\alpha^{2}(t)=10^{-4}+(0.01-10^{-4})\cdot\mathrm{sigmoid}(z),

where z is the raw output of the subnetwork. This ensures that α(t) remains within the specified bounds during training, preventing the network from converging to nonphysical values.
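A minimal sketch of how such a constrained subnetwork could be written (layer sizes follow the description above; the code itself is an illustrative reconstruction, not the authors' implementation):

```python
import tensorflow as tf

ALPHA2_MIN, ALPHA2_MAX = 1e-4, 1e-2   # bounds on alpha^2(t) tied to the inverse problem

# Three hidden layers of 10 tanh units mapping t -> raw output z.
def build_alpha2net():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="tanh"),
        tf.keras.layers.Dense(10, activation="tanh"),
        tf.keras.layers.Dense(10, activation="tanh"),
        tf.keras.layers.Dense(1),
    ])

# Scaled sigmoid keeps alpha^2(t) inside [ALPHA2_MIN, ALPHA2_MAX].
def alpha_squared(alpha2net, t):
    z = alpha2net(t)                  # t has shape (N, 1)
    return ALPHA2_MIN + (ALPHA2_MAX - ALPHA2_MIN) * tf.sigmoid(z)

# Example: alpha^2 at three times (values stay inside the prescribed bounds).
a2 = alpha_squared(build_alpha2net(), tf.constant([[0.0], [0.5], [1.0]]))
```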

The outputs of the main network (\hat{v},\hat{v}_{t},\hat{v}_{x},\hat{v}_{xx}) and of Alpha2Net (α) are combined to compute the PDE residual, as shown in the right part of Figure 1. The loss function is designed to enforce the physics of the LB equation and consists of three key components:

  • Residual Loss (enforcing the PDE):

    \mathcal{L}_{\text{r}}=\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\left(\frac{\partial\hat{v}}{\partial t}+\hat{v}\frac{\partial\hat{v}}{\partial x}+\alpha^{2}\frac{\partial\hat{v}}{\partial x}\frac{\partial^{2}\hat{v}}{\partial x^{2}}\right)^{2},

    where N_{r} is the number of collocation points sampled across the spatiotemporal domain.

  • Initial Condition Loss:

    \mathcal{L}_{\text{0}}=\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}\left(\hat{v}(0,x_{i})-v_{0}(x_{i})\right)^{2},

    where N_{0} is the number of points sampled along the initial condition at t=0.

  • Boundary Condition Loss:

    \mathcal{L}_{\text{b}}=\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}\left(\hat{v}(t_{i},x_{b})-v_{b}(t_{i},x_{b})\right)^{2},

    where N_{b} is the number of points sampled along the boundary x=x_{b}.

These terms are combined into a total loss

\mathcal{L}=w_{r}\mathcal{L}_{\text{r}}+w_{0}\mathcal{L}_{\text{0}}+w_{b}\mathcal{L}_{\text{b}},

where the weights w_{r}, w_{0}, w_{b} are either fixed (e.g., set to 1 for equal weighting) or tuned dynamically during training to balance the contributions of each loss term. The training points are adaptively sampled using Latin Hypercube Sampling, with a focus on regions exhibiting high PDE residuals or steep solution gradients, such as shock regions near discontinuities in the initial conditions.

The model is optimized using either the ADAM or the Limited-Memory BFGS (L-BFGS) optimizer with a decaying learning rate schedule over the training epochs. During training, the parameters of both the main network (weights and biases) and Alpha2Net are updated simultaneously to minimize the total loss \mathcal{L}. Performance is assessed by computing the L^2-error between the PINN predictions and the analytical solutions of the inviscid Burgers equation, obtained via the method of characteristics.
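The following sketch indicates how the residual and total loss could be assembled with TensorFlow automatic differentiation; it is an illustrative reconstruction, with v_net assumed to be the main MLP mapping (t, x) to v̂ and the unit loss weights assumed for simplicity.

```python
import tensorflow as tf

# Residual of the LB equation, v_t + v v_x + alpha^2 v_x v_xx, via nested tapes.
def lb_residual(v_net, t, x, alpha2):
    with tf.GradientTape() as outer:
        outer.watch(x)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([t, x])
            v = v_net(tf.concat([t, x], axis=1))
        v_t, v_x = inner.gradient(v, t), inner.gradient(v, x)
        del inner
    v_xx = outer.gradient(v_x, x)
    return v_t + v * v_x + alpha2 * v_x * v_xx

# Total loss L = w_r L_r + w_0 L_0 + w_b L_b with equal (unit) weights.
def total_loss(v_net, colloc, init, bdry, alpha2):
    (t_r, x_r), (x_0, v_0), (t_b, x_b, v_b) = colloc, init, bdry
    l_r = tf.reduce_mean(tf.square(lb_residual(v_net, t_r, x_r, alpha2)))
    l_0 = tf.reduce_mean(tf.square(v_net(tf.concat([tf.zeros_like(x_0), x_0], 1)) - v_0))
    l_b = tf.reduce_mean(tf.square(v_net(tf.concat([t_b, x_b], 1)) - v_b))
    return l_r + l_0 + l_b
```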

This architecture, as illustrated in Figure 1, effectively integrates the physical constraints of the LB equation into the neural network framework, allowing for both the approximation of the solution v(t,x) and the adaptive optimization of the regularization parameter α. The use of Alpha2Net to learn α^2(t) within a constrained range ensures that the solutions remain physically meaningful, bridging the inverse and forward problems seamlessly.

4 Inverse Problem for the Estimation of Parameter α

We set up the computational framework for the governing system (2.1) as

v_{t}+\lambda_{1}vv_{x}+\lambda_{2}v_{x}v_{xx}=0, \quad t\in[0,T],\ x\in\Omega \qquad (4.1)
v(0,x)=f(x), \quad x\in\Omega \qquad (4.2)
v(t,x)=g(t,x), \quad t\in[0,T],\ x\in\partial\Omega, \qquad (4.3)

where \Omega\subset\mathbb{R} is a bounded domain, \partial\Omega is the boundary of \Omega, f(x) is an initial distribution, and g(t,x) is boundary data. We intentionally introduced a new parameter λ_1 and set λ_2 = α^2. During the training process, the PINN will learn λ_1 to determine the validity of the obtained α for the inviscid Burgers equation (λ_1 = 1) along with the relative errors. We use numerical or analytical solutions of the exact inviscid and viscous Burgers equations to generate training data sets D with different initial and boundary conditions:

D=\left\{(t_{i},x_{i},v_{i}),\ i=1,\dots,N_{d}\right\},

where v_{i}=v(t_{i},x_{i}) denotes the output value at position x_{i}\in\Omega and time 0<t_{i}\leq T with the final time T. N_{d} refers to the number of training data. Our goal is to estimate the effective range of α such that the neural network v_{\theta} satisfies equations (4.1)-(4.3) and v_{\theta}(t_{i},x_{i})\approx v_{i}. The selected training models represent a range of initial conditions, from continuous initial data to discontinuous data, displaying shock and rarefaction waves.

4.1 PINN for Inverse Problem

Following the original work of Raissi et al. [15, 14], we use a Physics-Informed Neural Network (PINN) to determine physically meaningful α-values closely approximating the entropy solutions to the inviscid Burgers equation. For the inverse problem, Alpha2Net in Figure 1 is not used because we are looking for a fixed value of α. The PINN enforces the physical constraint

\mathcal{F}(t,x):=v_{t}+\lambda_{1}vv_{x}+\lambda_{2}v_{x}v_{xx}

on the MLP surrogate \hat{v}(t,x)=v_{\theta}(t,x;\xi), where \theta=\theta(W,b) denotes all parameters of the network (weights W and biases b) and \xi=(\lambda_{1},\lambda_{2}) the physical parameters in (4.1), acting directly in the loss function

\mathcal{L}(\theta,\xi)=\mathcal{L}_{d}(\theta,\xi)+\mathcal{L}_{r}(\theta,\xi), \qquad (4.4)

where \mathcal{L}_{d} is the loss on the available measurement data set, consisting of the mean-squared error (MSE) between the MLP’s predictions and the training data, and \mathcal{L}_{r} is the additional residual term quantifying the discrepancy of the neural network surrogate v_{\theta} with respect to the underlying differential operator in (4.1). Note that \mathcal{L}_{d}=w_{0}\mathcal{L}_{0}+w_{b}\mathcal{L}_{b} with w_{0}=w_{b}=1 in Figure 1. We define the data residual at (t_{i},x_{i},v_{i}) in D:

\mathcal{R}_{d,\theta}(t_{i},x_{i};\xi):=v_{\theta}(t_{i},x_{i};\xi)-v_{i},

and the PDE residual at (t_{i},x_{i}) in D:

\mathcal{R}_{r,\theta}(t_{i},x_{i};\xi):=\partial_{t}v_{\theta}+\lambda_{1}v_{\theta}\partial_{x}v_{\theta}+\lambda_{2}\partial_{x}v_{\theta}\partial_{xx}v_{\theta},

where v_{\theta}=v_{\theta}(t,x;\xi). Then, the data loss and residual loss functions in (4.4) can be written as

\mathcal{L}_{d}(\theta,\xi)=\frac{1}{N_{d}}\sum_{i=1}^{N_{d}}\Big{|}\mathcal{R}_{d,\theta}(t_{i},x_{i};\xi)\Big{|}^{2},
\mathcal{L}_{r}(\theta,\xi)=\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\Big{|}\mathcal{R}_{r,\theta}(t_{i},x_{i};\xi)\Big{|}^{2}.

The goal is to find the network and physical parameters \theta and \xi that minimize the loss function (4.4):

(\theta^{*},\xi^{*})=\underset{\theta\in\Theta,\,\xi\in\Xi}{\arg\min}\,\mathcal{L}(\theta,\xi)

over admissible sets \Theta and \Xi of the network parameters \theta and the physical parameters \xi, respectively.

In practice, given the set of scattered data v_{i}=v(t_{i},x_{i}), the MLP takes the coordinates (t_{i},x_{i}) as input and produces output vectors v_{\theta}(t_{i},x_{i};\xi) that have the same dimension as v_{i}. The PDE residual \mathcal{R}_{r,\theta}(t,x;\xi) forces the output vector v_{\theta} to comply with the physics imposed by the LB equation. The PDE residual network takes its derivatives with respect to the input variables t and x by applying the chain rule to differentiate the compositions of functions using the automatic differentiation integrated into TensorFlow. The residual of the underlying differential equation is evaluated using these gradients. The data loss and the residual loss are trained using input from across the entire domain of interest.
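For concreteness, a sketch of how the trainable physical parameters and the two loss terms might be set up (illustrative only; variable names and initial guesses are ours, and the derivatives are assumed to come from the surrogate network and automatic differentiation as sketched in Section 3):

```python
import tensorflow as tf

# Trainable physical parameters xi = (lambda1, lambda2); alpha = sqrt(lambda2).
lambda1 = tf.Variable(1.0, dtype=tf.float32)
lambda2 = tf.Variable(1e-3, dtype=tf.float32)

# Loss (4.4) = data MSE + PDE-residual MSE at the training points.
def inverse_loss(v_pred, v_data, v_t, v_x, v_xx):
    data_mse = tf.reduce_mean(tf.square(v_pred - v_data))
    residual = v_t + lambda1 * v_pred * v_x + lambda2 * v_x * v_xx
    return data_mse + tf.reduce_mean(tf.square(residual))
```

Minimizing this loss jointly over the network weights and (λ_1, λ_2) should drive λ_1 toward 1 whenever the recovered α = √λ_2 is physically consistent with the data.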

4.2 Experiment 1: Inviscid with Riemann Initial Data

We consider the inviscid Burgers equation (2.3) with some standard Riemann initial data of the form

v_{0}(x)=\begin{cases}v_{L},&x\leq 0\\ v_{M},&0<x\leq 1\\ v_{R},&x>1\end{cases}

We used a conservative upwind difference scheme to generate training data. For each initial profile, we computed 256 × 101 = 25856 data points throughout the entire spatiotemporal domain. We modified the code in [14] and, for each case, performed ten computational simulations with 2000 training data randomly sampled for each computation. We adopted the Limited-Memory BFGS (L-BFGS) optimizer with a learning rate of 0.01 to minimize the MSE (4.4). When the L-BFGS optimizer diverged, we preprocessed with the ADAM optimizer and finalized the optimization with L-BFGS. We manually trained the algorithm with random sets of hyperparameters and selected the set that best fit our objective: 8 hidden layers, 20 units per layer, and 10000 epochs. We trained the other models with the same parameters, which might not be optimal for them but are a reasonable fit. One remark is that our problem is identifying the model parameter α rather than inferring solutions, so it is unnecessary to consider physical causality in our loss function (4.4), as pointed out in [19].
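For reference, a minimal version of such a scheme (ours, assuming nonnegative Riemann states so the upwind direction is always to the left; grid sizes and CFL number are illustrative, not the exact settings used to produce the data):

```python
import numpy as np

# Conservative upwind scheme for v_t + (v^2/2)_x = 0 with nonnegative states.
def upwind_burgers(v0, dx, dt, n_steps):
    v = v0.copy()
    for _ in range(n_steps):
        f = 0.5 * v**2                                   # flux F(v) = v^2/2
        v[1:] = v[1:] - (dt / dx) * (f[1:] - f[:-1])     # upwind (left) differences
    return v

x = np.linspace(-1.0, 2.0, 301)
dx = x[1] - x[0]
v0 = np.where(x <= 0.0, 1.0, 0.0)                        # Riemann data, profile (II)
dt = 0.4 * dx                                            # CFL: dt <= dx / max|v|
v1 = upwind_burgers(v0, dx, dt, n_steps=int(1.0 / dt))   # jump near x = 0.5 at t ~ 1
```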

Upon training, the network is calibrated to predict the entire solution v(t,x), as well as the unknown parameters \theta and \xi. Along with the relative L^2-norm of the difference between the exact solution and the corresponding trial solution,

E_{r}(=E_{r}(\hat{v})):=\frac{||v-\hat{v}||_{2}}{||v||_{2}},

we used the absolute error of \lambda_{1},

\epsilon(\lambda_{1})=|1-\lambda_{1}|

in determining the validity of each computational result.

The practical range of α was determined by ensuring that the relative L^2 error E_r remained below 10^{-2} while ε(λ_1) < 0.01, aligning the LB solution with the inviscid Burgers entropy solution. The results show that the α value depends on the initial data, with the effective range of α being between 0.01 and 0.05 for continuous initial profiles and between 0.01 and 0.03 for discontinuous initial profiles.

When appropriate, we will also measure the averaged relative L^2 error in time,

\bar{E}_{r}=\frac{1}{T}\int_{0}^{T}\frac{||v-\hat{v}||_{2}}{||v||_{2}}\,dt.
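These error measures translate directly into a few lines of code; in the sketch below the time integral is approximated by an average over the stored snapshots (an assumption of uniform time sampling).

```python
import numpy as np

# v_exact, v_pred: arrays of shape (n_times, n_space) on the same grid.
def relative_l2(v_exact, v_pred):
    return np.linalg.norm(v_exact - v_pred, axis=1) / np.linalg.norm(v_exact, axis=1)

def averaged_relative_l2(v_exact, v_pred):
    return np.mean(relative_l2(v_exact, v_pred))   # approximates (1/T) * integral of E_r dt

def eps_lambda1(lambda1):
    return abs(1.0 - lambda1)                      # validity check for the inferred PDE
```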

4.2.1 Shock Waves

We consider two different initial profiles that develop shocks:

v_{t}+vv_{x}=0, \quad x\in\mathbb{R},\ t\in[0,1)

with the initial data,

\mathrm{(I)}\quad v(0,x)=\begin{cases}1&\text{if }x\leq 0\\ 1-x&\text{if }0<x<1\\ 0&\text{if }x\geq 1\end{cases} \qquad (4.5)
\mathrm{or}\quad \mathrm{(II)}\quad v(0,x)=\begin{cases}1&\text{if }x\leq 0\\ 0&\text{if }x>0.\end{cases}

The exact entropy solutions corresponding to the initial data (I) and (II) in (4.5) are

\mathrm{(I^{\prime})}\quad v(t,x)=\begin{cases}1&\text{if }x\leq t\\ \frac{1-x}{1-t}&\text{if }t<x<1\\ 0&\text{if }x\geq 1\end{cases} \qquad (4.6)
\mathrm{and}\quad \mathrm{(II^{\prime})}\quad v(t,x)=\begin{cases}1&\text{if }x\leq\frac{t}{2}\\ 0&\text{if }x>\frac{t}{2},\end{cases}

respectively. The initial profile (I) in (4.5) represents a ramp function with a slope of −1, which creates a wave that travels faster on the left-hand side of x than on the right-hand side. The faster wave overtakes the slow wave, causing a discontinuity when t=1, as we can see from the exact solution (I′) in (4.6). The second initial data (II) in (4.5) contains a discontinuity at x=0. Its solution needs a shock fitting from the very beginning.

No. Initial Profile (I) Initial Profile (II)
λ_2 = α^2  ε(λ_1)  E_r  λ_2 = α^2  ε(λ_1)  E_r
1 1.11e-3 3.4e-3 5.73e-3 6.76e-4 6.18e-2 7.16e-3
2 1.28e-3 6.6e-3 5.91e-3 4.46e-4 2.32e-2 3.70e-3
3 1.54e-3 1.24e-2 5.17e-3 4.85e-4 1.70e-2 5.31e-3
4 1.31e-3 1.33e-2 5.62e-3 5.69e-4 7.2e-3 5.92e-3
5 1.47e-3 6.2e-3 5.45e-3 6.53e-4 6.5e-3 7.77e-3
6 1.32e-3 6.1e-3 5.09e-3 8.41e-4 2.04e-2 1.04e-2
7 6.02e-4 1.03e-2 7.02e-3 8.76e-4 5.91e-2 1.09e-2
8 1.94e-3 6.7e-3 5.29e-3 8.77e-4 4.5e-3 1.23e-2
9 7.32e-4 5.6e-3 6.36e-3 9.17e-4 1.51e-2 1.76e-2
10 1.81e-4 1.24e-2 5.33e-3 7.64e-4 4.00e-4 9.83e-3
Avg 1.31e-3 8.30e-3 5.70e-3 7.11e-4 2.12e-2 9.09e-3
√Avg 3.62e-2 2.67e-2
Table 1: Ten simulation results with N_d = 2000 training data randomly sampled for each computation.

Based on the Rankine-Hugoniot condition, the discontinuity must travel at speed x'(t) = 1/2, which we can observe in the analytical solution (II′) in (4.6). The solution also satisfies the entropy condition, which guarantees that it is the unique weak solution for the problem. Table 1 shows ten computational results.

In both cases, the average ε(λ_1) is within 3×10^{-2}, indicating that the inferred PDE residual reflects the actual Leray-Burgers solutions within an acceptable range. The average value of α with the initial profile (I) was 0.0362 with E_r = 5.7×10^{-3}. Figure 2 shows a plot example. We can see that the Leray-Burgers solution captures the shock wave well and maintains the discontinuity at x=1 as t evolves to 1. Computations with the initial profile (II) resulted in α ≈ 0.0267 on average with E_r = 9.1×10^{-3}. Figure 2 shows that the Leray-Burgers equation captures the shock wave as well as its speed of 1/2 per unit time. Increasing the training data (N_d ≥ 4000) did not change the value of α significantly.

Figure 2: Top: Evolution with the initial profile (I). Bottom: Evolution with the initial profile (II).

4.2.2 Rarefaction Waves

We generate a training data set from the inviscid Burgers equation

v_{t}+vv_{x}=0, \quad x\in\mathbb{R},\ t\in[0,2]

with the initial data

\mathrm{(III)}\quad v(0,x)=\begin{cases}0&\text{if }x\leq 0\\ x&\text{if }0<x<1\\ 1&\text{if }x\geq 1\end{cases} \qquad (4.7)
\mathrm{or}\quad \mathrm{(IV)}\quad v(0,x)=\begin{cases}0&\text{if }x\leq 0\\ 1&\text{if }x>0.\end{cases}

The rarefaction waves are continuous self-similar solutions, which are

\mathrm{(III^{\prime})}\quad v(t,x)=\begin{cases}0&\text{if }x\leq 0\\ \frac{x}{1+t}&\text{if }0<x<1+t\\ 1&\text{if }x\geq 1+t\end{cases}
\mathrm{and}\quad \mathrm{(IV^{\prime})}\quad v(t,x)=\begin{cases}0&\text{if }x\leq 0\\ \frac{x}{t}&\text{if }0<x<t\\ 1&\text{if }x\geq t\end{cases}

corresponding to the initial data (III) and (IV) in (4.7), respectively.

In both cases, ε(λ_1) is within 10^{-2}, indicating that the inferred PDE residual reflects the Leray-Burgers equation within an acceptable range. The average values of α are 0.0488 with E_r = 1.99×10^{-3} for the continuous initial profile (III) and α ≈ 0.0276 with E_r = 7.5×10^{-3} for the discontinuous initial profile (IV). Figure 3 shows that the LB equation captures the rarefaction waves well.

Figure 3: Top: Evolution with the initial profile (III). Bottom: Evolution with the initial profile (IV).

4.2.3 Shock and Rarefaction Waves

We combine the shock and rarefaction profiles:

\mathrm{(V)}\quad v(0,x)=\begin{cases}0&\text{if }x\leq-1\\ 1+x&\text{if }-1<x\leq 0\\ 1-x&\text{if }0<x\leq 1\\ 0&\text{if }x>1\end{cases}
\mathrm{or}\quad \mathrm{(VI)}\quad v(0,x)=\begin{cases}0&\text{if }x<0\\ 1&\text{if }0\leq x\leq 1\\ 0&\text{if }x>1\end{cases}

In both cases, ε(λ_1) is within 10^{-2}, and the mean values of α are 0.0348 with E_r = 1.95×10^{-2} for the continuous initial profile (V) and α ≈ 0.0316 with E_r ≈ 3.17×10^{-2} for the discontinuous initial profile (VI). Figure 4 shows that the LB equation captures both shock and rarefaction waves well.

Figure 4: Top: Evolution with the initial profile (V). Bottom: Evolution with the initial profile (VI).

4.3 Experiment 2: Viscid Cases

In this section, we consider the following viscous Burgers equation for a training data set:

v_{t}+vv_{x}=\nu v_{xx}, \quad \forall(t,x)\in(0,T]\times\Omega \qquad (4.8)

with

\mathrm{(A)}\quad \begin{cases}\nu=\frac{0.01}{\pi},\ T=1,\ \Omega=[-1,1]\\ v(0,x)=-\sin(\pi x),\ \forall x\in\Omega\\ v(t,-1)=v(t,1)=0,\ \forall t\in[0,1]\end{cases}
\mathrm{or}\quad \mathrm{(B)}\quad \begin{cases}\nu=0.07,\ T\approx 0.4327,\ \Omega=[0,2\pi]\\ v(0,x)=-2\nu\frac{\phi^{\prime}(x)}{\phi(x)}+4,\ \forall x\in\Omega\\ \phi(x)=\exp\left(\frac{-x^{2}}{4\nu}\right)+\exp\left(\frac{-(x-2\pi)^{2}}{4\nu}\right).\end{cases}

The corresponding LB equation is

v_{t}+vv_{x}=-\alpha^{2}v_{x}v_{xx}.

For the initial and boundary data (A), Rudy et al. [16] proposed a data set from which the viscous Burgers equation can be correctly identified solely from time series data. It contains 101 time snapshots of a solution to the Burgers equation with a Gaussian initial condition propagating into a traveling wave. Each snapshot has 256 uniform spatial grid points. For our experiment, we adopt the data set prepared by Raissi et al. in [15, 14] based on [16], 101 × 256 = 25856 data points, generated from the exact solution to (4.8). For training, N_d = 2000 collocation points are randomly sampled and we use the L-BFGS optimizer with a learning rate of 0.8. The average of ten experiments is α = 0.0158 with E_r ≈ 3.8×10^{-2}. The computational simulation shows that the equation develops a shock properly (Figure 5). Note that ν = 0.01/π ≈ 12.7α^2.

Figure 5: Example with Nd=2000N_{d}=2000 for the initial profile (A)

For the initial and periodic boundary condition (B), we generate 256 × 500 = 128000 training data from the exact solution formula for the whole dynamics over time. With N_d = 2000 training data, the PINN diverges frequently. We therefore experiment with 4000 or more data points to determine an appropriate number of training data. The L^2 error remains around 10^{-2} in all cases, which does not provide a clear cutoff, so we use the absolute error of λ_1 to determine the appropriate number of training data. For each value of N_d, we perform the computation 5 to 10 times (Table 2).

N_d 4000 6000 8000 10000 12000 14000
ε(λ_1) 0.0207 0.0113 0.01059 0.00965 0.00908 0.00848
N_d 16000 18000 20000 25000 30000
ε(λ_1) 0.0046 0.0059 0.00604 0.00671 0.00611
Table 2: ε(λ_1) = |1−λ_1| for various N_d with the initial profile (B).

As N_d increases, the results improve until they reach an upper limit. Errors for N_d between 14000 and 18000 are better than in other ranges, and using more than 18000 points does not seem to improve the results further. We therefore choose N_d = 16000 (12.5% of the total data); larger values give nearly identical results. The average of ten computations is α ≈ 0.0894 with E_r = 1.64×10^{-2}. Observe that ν = 0.07 ≈ 8.8α^2.

Figure 6: Example with Nd=16000N_{d}=16000 for the initial profile (B).

Every part of the solution for (B) moves to the right at the same speed, which differs from (A) (Fig. 6). In (A), the left side of a peak moves faster than the right side, developing a steeper middle. This results in a higher value of α for (B) than for (A).

In summary, we observe that ν = 0.01/π ≈ 12.7α^2 with the profile (A) and ν = 0.07 ≈ 8.8α^2 with the profile (B). These results demonstrate that the LB equation can capture nonlinear interactions at significantly smaller length scales compared to the viscous Burgers equation. Notably, numerical schemes for the viscous Burgers equation become unstable at lower ν values, whereas the LB equation maintains stability and convergence under these conditions. This observation will become clearer when we compare the forward inferred solutions of the two equations in Section 5 (Part C).

4.4 Experiment 3: The Filtered Vector u

We write Equation (2.1) in the filtered vector u_{\alpha} = u, which gives a quasilinear evolution equation consisting of the inviscid Burgers equation plus \mathcal{O}(\alpha^{2}) nonlinear terms [2, 3, 4]:

u_{t}+uu_{x}=\alpha^{2}(u_{txx}+uu_{xxx}). \qquad (4.9)

We compute the equation with the same conditions as in the previous corresponding experiments. The results show that the filtered equation (4.9) also tends to depend on the continuity of the initial profile as shown in Table 3.

IC Continuous IC Discontinuous
I 0.0279 II 0.0004
III 0.0469 IV 0.0127
V 0.0469 VI 0.0277
Table 3: Averaged α values for the filtered vector u, with N_d = 2000 and 10000 epochs except for case II. (IC = Initial Condition as in Section 4.)

When initial profiles contain discontinuities, the α values are much smaller than those with continuous initial profiles. Compared to the unfiltered equation (2.1), the α values for the filtered equation (4.9) are smaller, which may cause more oscillation in forward inference.

With the initial profile (II), the parameter λ_1 for the filtered velocity is not close to 1, with ε(λ_1) ≈ 0.1076 on average. By increasing the number of epochs from 10000 to 50000 we get a better result: λ_1 gets closer to 1 with ε(λ_1) ≈ 0.0531, and the relative error and loss improve slightly, which makes the solution better at later times. The oscillation near the discontinuity is reduced. This verifies that u needs very small α values to approximate the inviscid Burgers solution.

Figure 7: Examples with N_d = 2000 for the initial profile (II). Top: epochs = 10000. Bottom: epochs = 50000.

Having established α\alpha’s practical range, we next explore its application in forward inference.

5 Data-Driven Solutions of the Leray-Burgers Equation

In this section, we solve the LB equation across multiple initial and boundary condition scenarios:

v_{t}+vv_{x}+\alpha^{2}v_{x}v_{xx}=0, \quad x\in\mathbb{R},\ t\in(0,1)

with

\mathrm{(I)}\quad v(0,x)=\begin{cases}1&\text{if }x\leq 0\\ 1-x&\text{if }0<x<1\\ 0&\text{if }x\geq 1\end{cases}
\mathrm{or}\quad \mathrm{(II)}\quad v(0,x)=\begin{cases}1&\text{if }x\leq 0\\ 0&\text{if }x>0.\end{cases}

Training utilizes N_0 = 5000 initial condition points, N_b = 5000 boundary condition points, and N_r = 20000 collocation points. These points are adaptively sampled using Latin Hypercube Sampling, with an emphasis on regions exhibiting high PDE residuals or steep solution gradients, particularly in shock regions near discontinuities identified in the initial condition. Our computational focuses are as follows:

  1. Convergence in α: whether the PINN solutions converge to those of the inviscid Burgers equation as α → 0^+.

  2. Forward inference with adaptive α(t): whether the PINN solutions capture the shock and rarefaction waves well and whether the trained α values are within the physically valid range.

  3. Scaling effect of the α parameter relative to the inviscid and viscous Burgers equations.

5.1 The Convergence of the Leray-Burgers Solutions as α → 0^+

Figure 8 demonstrates that the Leray-Burgers equation effectively captures the shock formation with the continuous initial profile (I) within the range 0 < α < 0.05. As α → 0^+, the LB solution converges to the inviscid Burgers solution (the last graph in Figure 8).

Figure 8: PINN solution with the initial profile (I) with N_0 = 1000, N_b = 1000, and N_r = 10000 with epochs = 20000. α = 0.05, 0.04, 0.03, 0.01, 0.0 from the top, with α = 0 (inviscid Burgers). The corresponding errors are \bar{E}_r = 1.1×10^{-2}, 7.9×10^{-3}, 5.9×10^{-3}, 2.6×10^{-3}, and 3.2×10^{-3} for α = 0, respectively.
Figure 9: PINN solution with the initial profile (II) with N_0 = 1000, N_b = 1000, and N_r = 10000 with epochs = 20000. α = 0.03, 0.025, 0.015, 0.01, 0.0 from the top, with α = 0 (inviscid Burgers). The corresponding errors are \bar{E}_r = 4.7×10^{-2}, 4.4×10^{-2}, 3.4×10^{-2}, 9.4×10^{-2}, and 10.0×10^{-2} for α = 0, respectively.

With the discontinuous initial profile (II), the Leray-Burgers equation still accurately captures the shock formation within the range 0.01 < α < 0.03 (Figure 9). However, the MLP-based PINN generates spurious oscillations near the discontinuity at the beginning. Although the network quickly recovers and fits the oscillations as time progresses, the oscillations worsen and nonlinear instability arises as the α scale becomes smaller than 0.01, which leads to the deviation of the network solution from the actual inviscid Burgers solution (the last graph in Figure 9).

5.2 Forward Inference with Adaptively Optimized α > 0

In this section, we employ the MLP-based Physics-Informed Neural Network (MLP-PINN) to effectively learn the nonlinear operator \mathcal{N}_{\alpha}[v], wherein v represents the primary variable and α denotes a parameter. Coutinho et al. [5] introduced the idea of an adaptive artificial viscosity that can be learned during the training procedure and does not depend on an a priori choice of artificial viscosity coefficient. Instead of incorporating the parameter α in place of the artificial viscosity as in [5], we set up a dedicated subnetwork, Alpha2Net, depicted in Figure 1, to find the optimal α(t) value. The integration of the subnetwork into the main PINN architecture lets the PINN train both v and α to achieve a robust fit with the LB equation. Two examples highlight the ability of the LB equation to capture shock and rarefaction waves as well as the corresponding optimal values of α, which are presented in Figure 10.

Figure 10: Optimal values: (II) α = 0.0169 with \bar{E}_r = 3.5×10^{-2} (top) and (IV) α = 0.0032 with \bar{E}_r = 7.9×10^{-3} (bottom).

For the computations, we generated 100 × 1000 = 100000 training data in the domain [0,2]×[−2,4] from the corresponding analytical solution for each case, with N_0 = N_b = 1000, N_r = 10000, and epochs = 20000. The first graph presents computational snapshots of the system’s evolution with the initial profile (II) over the time interval [0, 2]. The computational outputs are \mathcal{L} ≈ 4.1×10^{-4}, and the averaged relative L^2 error in time is around 3.5×10^{-2} with α ≈ 0.0169. Note that α is the average of the trained values of α(t) over time. The second graph illustrates snapshots of the evolution of a rarefaction wave with the initial profile (IV). The computational outputs are \mathcal{L} ≈ 1.9×10^{-4}, and the averaged relative L^2 error in time is around 7.9×10^{-3} with a time-averaged α = 0.0032.

5.3 The Effect of the α Scale in Relation to the Inviscid and Viscous Burgers Equations

When comparing the Leray-Burgers equation (2.1) with the viscous Burgers equation (4.8), the term α^2 v_x v_xx in (2.1) serves as a nonlinear regularization mechanism, acting as a substitute for the linear diffusion term in the viscous Burgers equation. Unlike linear diffusion, the α term in Equation (2.1) depends on both the first derivative v_x and the second derivative v_xx, suggesting that its smoothing effect is more pronounced in regions with high gradients, modulated by the parameter α. Thus, it is valuable to assess the performance of these two equations in relation to the inviscid Burgers equation.

Both equations are solved using PINNs with consistent training configurations: 20,000 epochs, fixed weights, and identical network architectures (8 hidden layers, 20 neurons per layer). The key metric for comparison is the L^2 error, which quantifies the difference between the predicted and exact solutions, with lower values indicating better accuracy. Computations provide L^2 errors for both equations across different values of α, with ν set equal to α^2 in the viscous Burgers equation (4.8). The averaged L^2 errors over time are summarized in Table 4.

α  ν = α^2  LB Average L^2  VB Average L^2
0.025  0.000625  2.8025×10^{-2}  9.1788×10^{-2}
0.030  0.0009  2.9933×10^{-2}  1.2258×10^{-1}
0.032  0.001024  3.2280×10^{-2}  7.8998×10^{-1}
0.033  0.001089  3.1334×10^{-2}  8.6701×10^{-2}
0.035  0.001225  3.2484×10^{-2}  1.3176×10^{-2}
Table 4: Averaged L^2 errors over time for the Leray-Burgers (LB) and viscous Burgers (VB) equations.

For α values ranging from 0.025 to 0.033 (ν from 0.000625 to 0.001089), the LB equation consistently outperforms the viscous Burgers equation in the averaged L^2 error. The averaged L^2 error for the LB equation remains relatively stable, ranging from 2.8025×10^{-2} to 3.2280×10^{-2}. In contrast, the Burgers equation exhibits higher errors, ranging from 7.8998×10^{-2} to 1.2258×10^{-1}. There is no clear monotonic trend, indicating variability in the neural network’s ability to approximate the solution.

These results indicate that for small values of ν\nu, the viscous Burgers equation is prone to developing shocks due to its hyperbolic nature. PINNs may struggle to accurately capture these discontinuities. In contrast, the LB equation, through its nonlinear regularization effect (dependent on α\alpha), likely smooths these discontinuities, leading to improved accuracy.

The data also suggest a tipping point around α = 0.032 (ν = 0.001024), where the performance of the two models begins to shift. A more definitive transition appears to occur between α = 0.033 and α = 0.035 (ν = 0.001225), at which point the viscous Burgers equation begins to outperform the LB equation. This transition is illustrated in Figure 11.

Figure 11: Top two rows: Leray-Burgers equation with α = 0.032 and 0.035, with \bar{E}_r = 5.1×10^{-2} and 4.8×10^{-2}, respectively. Bottom two rows: viscous Burgers equation with ν = 0.001024 and 0.001225, with \bar{E}_r = 1.3×10^{-1} and 3.9×10^{-2}, respectively.

To further examine the differences in solution behavior, Figure 12 presents heatmaps of the difference between the solutions of the LB equation and the inviscid Burgers equation, as well as the difference between the viscous Burgers equation and the inviscid Burgers equation, for α = 0.025 (ν = 0.000625). The LB equation exhibits a more gradual transition in error distribution, while the viscous Burgers equation shows sharper localized discrepancies along the shock region. This suggests that the nonlinear regularization in the LB equation helps mitigate sharp discontinuities, leading to improved prediction accuracy.

Figure 12: Heatmaps of the difference between exact and predicted solutions for α = 0.025. Top: Leray-Burgers vs. inviscid Burgers. Bottom: Viscous Burgers vs. inviscid Burgers.

In summary, the parameter α (through ν = α^2) controls the regularization strength. Smaller values of α correspond to finer scales where regularization enhances accuracy, while larger values increase ν, potentially leading to over-smoothing compared to the standard viscous Burgers equation. In practice, the LB equation may be preferable for smaller length scales (low α), while the viscous Burgers equation may be more suitable for larger scales (higher α). The transition appears to occur near α = 0.035. These findings emphasize the interplay between physical regularization, viscosity, and the numerical approximation capabilities of PINNs.

6 Application to Traffic State Estimation

This section demonstrates the practical utility of forward inference with the LB equation, using the estimated α range to model traffic dynamics efficiently. Huang et al. [9] applied PINNs to tackle the challenge of data sparsity and sensor noise in traffic state estimation (TSE). The main goal of TSE is to obtain and provide a reliable description of traffic conditions in real time. In Case Study-I in [9], they prepared a test bed of a 5000-meter road segment for 300 seconds ((t,x) ∈ [0,300]×[0,5000]). The spatial resolution of the dataset is 5 meters and the temporal resolution is 1 second. The case study was designed to utilize the trajectory information from Connected and Autonomous Vehicles (CAVs) as captured by Roadside Units (RSUs), which were deployed every 1000 meters on the road segment (6 RSUs on the 5000-meter road from x = 0). The communication range of an RSU was assumed to be 300 meters, meaning that vehicle information broadcast by CAVs at x ∈ [0,300] can be captured by the first RSU, the second RSU can log CAV data transmitted at x ∈ [700,1300], etc. More details on data acquisition and description can be found in [9, 10].

In this section, we switch to the differential notation \frac{d\rho}{dt},\frac{\partial\rho}{\partial x},\frac{\partial^{2}\rho}{\partial x^{2}} to avoid confusion with constant parameter notations such as ρ_m. Let q(t,x) denote the flow rate, indicating the number of vehicles that pass a set location in a unit of time, and ρ(t,x) the flow density, representing the number of vehicles in a unit road of space. Then, the Lighthill-Whitham-Richards (LWR) traffic model [9] is, for (t,x) ∈ \mathbb{R}^{+}×\mathbb{R},

\frac{\partial\rho(t,x)}{\partial t}+\frac{\partial q(t,x)}{\partial x}=0, \qquad (6.1)

where \rho(t,x)=-\frac{\partial N(t,x)}{\partial x} and q(t,x)=\frac{\partial N(t,x)}{\partial t}. Here N(t,x) is the cumulative flow, which depicts the number of vehicles that have passed location x by time t. Huang et al. [9] adopted the Greenshields fundamental diagram to set the relationship between the traffic states density ρ, flow q, and speed v:

q(\rho)=\rho v_{f}\left(1-\frac{\rho}{\rho_{m}}\right) \qquad (6.2)
v(\rho)=v_{f}\left(1-\frac{\rho}{\rho_{m}}\right),

where ρ_m is the jam density (maximum density) and v_f is the free-flow speed. Substituting the relationship (6.2) into (6.1) transforms the LWR model into the LWR-Greenshield model

v_{f}\left(1-\frac{2\rho(t,x)}{\rho_{m}}\right)\frac{\partial\rho(t,x)}{\partial x}+\frac{\partial\rho(t,x)}{\partial t}=0. \qquad (6.3)

We will simply call it the LWR model. Equation (6.3) is a hyperbolic PDE, and a second-order diffusive term can be added as follows to make the PDE parabolic and secure a strong solution:

v_{f}\left(1-\frac{2\rho(t,x)}{\rho_{m}}\right)\frac{\partial\rho(t,x)}{\partial x}+\frac{\partial\rho(t,x)}{\partial t}=\epsilon\frac{\partial^{2}\rho}{\partial x^{2}}. \qquad (6.4)

We will call the equation (6.4) the LWR-ϵ\epsilon model. The second-order diffusion term ensures that the solution of PDE is continuous and differentiable, avoiding breakdown and discontinuity in the solution. Following the same structural idea from (6.3) to (6.4) we add a regularization term to (6.3) instead of the diffusion term in (6.4):

v_{f}\left(1-\frac{2\rho(t,x)}{\rho_{m}}\right)\frac{\partial\rho(t,x)}{\partial x}+\frac{\partial\rho(t,x)}{\partial t}=-\alpha^{2}\frac{\partial\rho}{\partial x}\frac{\partial^{2}\rho}{\partial x^{2}}. \qquad (6.5)

We call Equation (6.5) the LWR-$\alpha$ model. We set up the same PINN architecture for computational comparisons of the three models, LWR, LWR-$\epsilon$, and LWR-$\alpha$, with $v_{f}=25\,\mathrm{m/s}$ and $\rho_{m}=0.15\,\mathrm{vehicles/m}$. Figure 13 visualizes the computational results.
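For reference, the following PyTorch-style sketch shows how the residuals of (6.3)-(6.5) could be encoded in the physics term of a PINN loss. It is a minimal illustration under our own assumptions: rho_net is any network mapping $(t,x)$ to $\rho$, the grad helper wraps torch.autograd.grad, and the inputs are column tensors of collocation points; this is not the exact architecture behind Figure 13.

import torch

V_F, RHO_M = 25.0, 0.15   # free-flow speed (m/s) and jam density (vehicles/m)

def grad(y, x):
    # derivative of y with respect to x, keeping the graph for higher derivatives
    return torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                               create_graph=True)[0]

def residuals(rho_net, t, x, eps=0.025, alpha=0.025):
    # t, x: (N, 1) leaf tensors of collocation points
    t = t.requires_grad_(True)
    x = x.requires_grad_(True)
    rho = rho_net(torch.cat([t, x], dim=1))
    rho_t, rho_x = grad(rho, t), grad(rho, x)
    rho_xx = grad(rho_x, x)

    advection = V_F * (1.0 - 2.0 * rho / RHO_M) * rho_x + rho_t
    r_lwr = advection                                   # LWR, Eq. (6.3)
    r_eps = advection - eps * rho_xx                    # LWR-epsilon, Eq. (6.4)
    r_alpha = advection + alpha**2 * rho_x * rho_xx     # LWR-alpha, Eq. (6.5)
    return r_lwr, r_eps, r_alpha

The physics loss for each model would then be the mean squared residual over the collocation points, added to the data-misfit loss evaluated on the RSU observations.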

Figure 13: Traffic state estimation: reference speed $v$, LWR-$\alpha$ estimate ($\alpha=0.025$), LWR-$\epsilon$ estimate ($\epsilon=0.025$), and LWR estimate, counter-clockwise from the top left. Relative error $\approx 10^{-1}$; LWR-$\alpha$ training is slightly slower than LWR but much faster than LWR-$\epsilon$.

Our empirical calculations demonstrate that both the LWR-$\alpha$ and LWR-$\epsilon$ models provide reasonable approximations of the reference speed $v$, as illustrated in Figure 13. Both models exhibit accuracy comparable to the standard LWR model, validating their potential for traffic state estimation applications.

The experiments highlight the critical role of nonlinear characteristics in traffic data for accurate state estimation. The LWR-$\alpha$ model emerges as the more practical choice due to its superior ability to capture the inherent nonlinear behavior of traffic flow. While the LWR-$\epsilon$ model offers a reasonable approximation, its poor computational performance for small diffusion coefficients (below 0.025) restricts its utility in real-time applications.

In our traffic state estimation (TSE) application, we employed the LWR-$\alpha$ model with $\alpha=0.025$. This parameter value directly addresses our primary objectives of determining the practical range of $\alpha$ in the Leray-Burgers equation and evaluating the effectiveness of Physics-Informed Neural Networks (PINNs) in solving the forward inference problem. The chosen value $\alpha=0.025$ lies within the range estimated from the inverse problem: 0.01 to 0.05 for continuous initial profiles and 0.01 to 0.03 for discontinuous profiles.

The successful application of the LWR-$\alpha$ model in accurately capturing the dynamics of traffic flow validates the physical relevance and practicality of our estimated $\alpha$ range. Moreover, the robust performance of PINNs in precisely estimating traffic states using this model demonstrates their effectiveness in solving the forward inference problem for the Leray-Burgers equation. Consequently, our TSE results substantiate both the accuracy of our $\alpha$ estimation and the capabilities of PINNs, thereby reinforcing the core findings of our study and affirming their potential in real-world applications.

7 Discussion

The relationship between the inverse and forward problems is a cornerstone of our approach to solving the Leray-Burgers (LB) equation with Physics-Informed Neural Networks (PINNs). In the inverse problem, we determine the practical range of the characteristic wavelength parameter $\alpha$ that ensures that the LB equation closely approximates the inviscid Burgers solution. This range, derived from the training of PINNs on inviscid Burgers data, reflects the values of $\alpha$ that maintain the physical fidelity of LB solutions under a variety of initial conditions.

This estimation is not an isolated step, but directly informs the forward inference process. When training PINNs to solve the LB equation, we do not prescribe a fixed $\alpha$. Instead, $\alpha$ is treated as a trainable parameter, optimized concurrently with the standard PINN parameters (weights and biases) through a subnetwork called Alpha2Net. To ensure that the optimized $\alpha$ remains physically meaningful, Alpha2Net enforces a constraint: $\alpha$ must be within the range established by the inverse problem. This restriction serves a dual purpose: it prevents the network from converging to nonphysical or suboptimal values of $\alpha$, and it leverages the prior knowledge gained from the inverse problem to improve the accuracy and stability of the forward solutions. For example, if the inverse problem indicates that $\alpha$ should range between 0.01 and 0.05 for certain profiles, Alpha2Net ensures that the $\alpha$ learned during forward inference adheres to these bounds, as in the sketch below. This linkage guarantees that the solutions to the LB equation not only capture complex phenomena like shocks and rarefactions but also remain consistent with the physical constraints established earlier. Thus, the inverse problem provides an essential scaffold that supports and refines forward inference, creating a unified framework for parameter estimation and PDE solution.
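A minimal sketch of this bound-enforcing idea, assuming a small fully connected subnetwork (the layer sizes and the name BoundedAlphaNet are illustrative, not the exact Alpha2Net of Section 5), is the following.

import torch
import torch.nn as nn

class BoundedAlphaNet(nn.Module):
    # Maps t to alpha(t); a sigmoid output rescaled into [alpha_min, alpha_max]
    # keeps the learned alpha inside the range estimated by the inverse problem.
    def __init__(self, alpha_min=0.01, alpha_max=0.05, hidden=20):
        super().__init__()
        self.alpha_min, self.alpha_max = alpha_min, alpha_max
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t):
        s = torch.sigmoid(self.net(t))          # s lies in (0, 1)
        return self.alpha_min + (self.alpha_max - self.alpha_min) * s

Because the sigmoid output lies strictly in (0, 1), the affine rescaling keeps $\alpha(t)$ inside the prescribed interval at every optimization step, so the bound is enforced architecturally rather than through an additional penalty term.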

8 Conclusion

Computational experiments show that the $\alpha$-values depend on the initial data. Specifically, the practical range of $\alpha$ spans 0.01 to 0.05 for continuous initial profiles and narrows to 0.01 to 0.03 for discontinuous profiles. We also note that the Leray-Burgers equation formulated in the filtered vector field $u$ does not produce reliable estimates of $\alpha$. To approximate the filtered solution $u$ with acceptable precision, the MLP-PINN requires a larger dataset, and the admissible range of $\alpha$ for $u$ is much narrower, roughly 0.0001 to 0.005. Moreover, the MLP-PINN trained on $u$ struggles to converge to the true Burgers solutions. Thus, the equation formulated in the unfiltered vector field $v$ offers a better approximation to the exact Burgers equation.

In practical terms, treating $\alpha$ as an unknown variable is a prudent strategy. By endowing $\alpha$ with learnable attributes alongside the network parameters, MLP-PINNs can be structured to recover $\alpha$ within a valid range during training, potentially improving accuracy. Nevertheless, the MLP-PINN does generate spurious oscillations near the discontinuities inherent in shock-inducing initial profiles. This phenomenon prevents the PINN solution from converging to an exact inviscid Burgers solution as $\alpha\rightarrow 0^{+}$.

This study also demonstrates the effectiveness of the LWR-$\alpha$ model as a viable alternative for traffic state estimation. Surpassing the diffusion-based LWR-$\epsilon$ model in computational efficiency, the LWR-$\alpha$ model aligns with the nonlinear nature of traffic data.

Statements and Declarations

Competing Interests: On behalf of all authors, the corresponding author states that there is no conflict of interest.

Data Availability: The code and data we used to train and evaluate our models are available at https://github.com/bkimo/PINN-LB. The data for traffic state estimation generated by Huang et al. [9, 10] is available at https://github.com/arjhuang/pise.

Acknowledgment

The second author was supported by the Research Grant of Kwangwoon University in 2022 and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1F1A1058696). The third author gratefully acknowledges the Advanced Technology and Artificial Intelligence Center at the American University of Ras Al Khaimah for providing high-performance GPU computing resources to support this research.

References

  • [1] Raul K. C. Araújo, Enrique Fernández-Cara, and Diego A. Souza. On the uniform controllability for a family of non-viscous and viscous Burgers-$\alpha$ systems. ESAIM: Control, Optimisation and Calculus of Variations, 27(78), 2021.
  • [2] H. S. Bhat and R. C. Fetecau. A Hamiltonian regularization of the Burgers equation. Journal of Nonlinear Science, 16:615–638, 2006.
  • [3] H. S. Bhat and R. C. Fetecau. Stability of fronts for a regularization of the Burgers equation. Quarterly of Applied Mathematics, 66:473–496, 2008.
  • [4] H. S. Bhat and R. C. Fetecau. The Riemann problem for the Leray-Burgers equation. Journal of Differential Equations, 246:3957–3979, 2009.
  • [5] Emilio Jose Rocha Coutinho, Marcelo Dall’Aqua, Levi McClenny, Ming Zhong, Ulisses Braga-Neto, and Eduardo Gildin. Physics-informed neural networks with adaptive localized artificial viscosity. Journal of Computational Physics, 489(112265), 2023.
  • [6] Georg A. Gottwald. Dispersive regularizations and numerical discretizations for the inviscid Burgers equation. Journal of Physics A: Mathematical and Theoretical, 40(49), 2007.
  • [7] Billel Guelmame, Stéphane Junca, Didier Clamond, and Robert Pego. Global weak solutions of a Hamiltonian regularised Burgers equation. Journal of Dynamics and Differential Equations, 2022.
  • [8] Darryl D. Holm, Chris Jeffery, Susan Kurien, Daniel Livescu, Mark A. Taylor, and Beth A. Wingate. The LANS-$\alpha$ model for computing turbulence: Origins, results, and open problems. Los Alamos Science, 19, 2005.
  • [9] Archie J. Huang and Shaurya Agarwal. Physics-informed deep learning for traffic state estimation: Illustrations with LWR and CTM models. IEEE Open Journal of Intelligent Transportation Systems, 3, 2022.
  • [10] Archie J. Huang and Shaurya Agarwal. On the limitations of physics-informed deep learning: Illustrations using first-order hyperbolic conservation law-based traffic flow model. IEEE Open Journal of Intelligent Transportation Systems, 4, 2023.
  • [11] Traian Iliescu, Honghu Liu, and Xuping Xie. Regularized reduced order models for a stochastic Burgers equation. International Journal of Numerical Analysis and Modeling, 15(4-5):594–607, 2018.
  • [12] J. Leray. Sur le mouvement d’un liquide visqueux emplissant l’espace. Acta Math., 63:193–248, 1934.
  • [13] Yekaterina S. Pavlova. Convergence of the Leray $\alpha$-regularization scheme for discontinuous entropy solutions of the inviscid Burgers equation. The UCI Undergraduate Research Journal, pages 27–42, 2006.
  • [14] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics informed deep learning (Part II): Data-driven discovery of nonlinear partial differential equations. 2017. Preprint at https://arxiv.org/abs/1711.10566.
  • [15] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [16] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz. Data-driven discovery of partial differential equations. Science Advances, 3, 2017.
  • [17] Feriedoun Sabetghadam and Alireza Jafarpour. $\alpha$ regularization of the POD-Galerkin dynamical systems of the Kuramoto–Sivashinsky equation. Applied Mathematics and Computation, 218:6012–6025, 2012.
  • [18] John Villavert and Kamran Mohseni. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [19] Sifan Wang, Paris Perdikaris, and Shyam Sankaran. Respecting causality for training physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 421(116813), 2024.
  • [20] Ting Zhang and Chun Shen. Regularization of the shock wave solution to the Riemann problem for the relativistic Burgers equation. Abstract and Applied Analysis, 2014, 2014.
  • [21] Hongwu Zhao and Kamran Mohseni. A dynamic model for the Lagrangian-averaged Navier-Stokes-$\alpha$ equations. Physics of Fluids, 17(075106), 2005.