This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

\contourlength

1.4pt

Unsupervised physics-informed neural network in reaction-diffusion biology models (Ulcerative colitis and Crohn’s disease cases) A preliminary study

Ahmed Rebai AI Factory, Value Digital Services, Tunisia, "value.com.tn". Louay Boukhris AI Factory, Value Digital Services, Tunisia, "value.com.tn". Radhi Toujani AI Factory, Value Digital Services, Tunisia, "value.com.tn". Ahmed Gueddiche AI Factory, Value Digital Services, Tunisia, "value.com.tn". Fayad Ali Banna AI Factory, Value Digital Services, Tunisia, "value.com.tn". Fares Souissi AI Factory, Value Digital Services, Tunisia, "value.com.tn". Ahmed Lasram AI Factory, Value Digital Services, Tunisia, "value.com.tn". Elyes Ben Rayana AI Factory, Value Digital Services, Tunisia, "value.com.tn". Hatem Zaag Université Sorbonne Paris Nord, LAGA, CNRS (UMR 7539), F-93430, Villetaneuse, France.
Abstract

We propose to explore the potential of physics-informed neural networks (PINNs) in solving a class of partial differential equations (PDEs) used to model the propagation of chronic inflammatory bowel diseases, such as Crohn’s disease and ulcerative colitis. An unsupervised approach was privileged during the deep neural network training. Given the complexity of the underlying biological system, characterized by intricate feedback loops and limited availability of high-quality data, the aim of this study is to explore the potential of PINNs in solving PDEs. In addition to providing this exploratory assessment, we also aim to emphasize the principles of reproducibility and transparency in our approach, with a specific focus on ensuring the robustness and generalizability through the use of artificial intelligence. We will quantify the relevance of the PINN method with several linear and non-linear PDEs in relation to biology. However, it is important to note that the final solution is dependent on the initial conditions, chosen boundary conditions, and neural network architectures. 111This work was carried out under the scientific direction of the mathematician Pr. Hatem Zaag. 222Corresponding author: Ahmed Rebai, ahmed.rebai@value.com.tn

Keywords: Unsupervised PINN, Deep Neural Networks, Coupled Nonlinear PDEs, IBD (Inflammatory Bowel Diseases), Ulcerative Colitis, Crohn’s Disease, Computer Vision, Machine Learning Classification, AI Reproducibility.

1 Introduction

The current work focuses on a new multidisciplinary field at the intersection of three disciplines: artificial intelligence (AI) via deep learning, applied mathematics via partial differential equations, and the biology of the inflammatory bowel diseases (as illustrated in Figure 1). While writing this paper, we encountered several difficulties due to the unique nature of this multidisciplinary subject, which is both innovative and AI hyped, resulting in a genuine debate in the community between partisans who are optimistic about the potential of this new technique [1, 2] and non-partisans who point out its limitations [3, 4]. For this, we believe it is best to begin with a brief overview of each discipline before discussing progress at the various intersections between these disciplines, followed by a discussion of the resulting controversy.

Biology: * Inflammatory process * Biological system complex feedback loops * No real good data to compare the behavior of the algorithm with that of an inflammatory process Deep Learning: * Neural networks architectures * Universal approximation theorem Applied mathematics: * First-order ODEs * Diffusion equations * Reaction diffusion non linear coupled equations * Turing mechanism system [5] Actual study
Figure 1: This figure illustrates the interplay between deep neural networks and differential equation models in this multidisciplinary study. Deep neural networks are capable of handling large amounts of data but can be computationally moderate, while differential equation models on the other hand, require less data but may be computationally intensive. The aim of the study is to leverage the advantages of both approaches to solve partial differential equations modeling biological phenomena using deep learning techniques, minimizing data requirements and obtaining a computationally efficient model.

2 PINN in biology: crossroads of several disciplines

2.1 Artificial intelligence

In its general definition, artificial intelligence allows computers to partially or totally perform intelligent tasks usually associated with human intelligence. Nowadays, artificial intelligence learns and generalizes patterns in high-dimensional and highly non-linear spaces without being specifically guided [6, 7]. This learning processes is based on various types of data (tabular data, image data, sound data, text data…) and is leading to success in various fields such as nuclear energy where AI has been used recently to control a fusion reactor [8] and Earth science where AI allowed to predict the weather in short-term within the "nowcasting" project with a deep reinforcement learning algorithm [9]. Continuing with this progress, in this paper we will see how neural networks could also learn the dynamics of a complex biological system from the structure of the partial differential equations describing these dynamics.

2.2 Partial differential equations

Many engineering fields use partial differential equations as models, including combustion theory, weather prediction, financial markets and industrial machine design. A partial differential equation or a system of partial differential equations can be solved analytically [10], numerically [11, 12] and now with artificial intelligence using techniques such as deep neural networks (DNNs) [1, 13]. In practice, the analytical method works for some simple equations, but its application is difficult in most cases of coupled and nonlinear PDE systems. The numerical resolution technique is preferred for this and often requires the use of expensive commercial numerical solvers such as finite element method (FEM) or finite difference method (FDM). It can be summarized in five steps: modeling, meshing, discretizing, numerical computing and post-processing:

  • Modeling: Mathematical modeling of physical or chemical or biological processes.

  • Mesh: The creation of a mesh or a grid called also computational domain which consists of equivalent system of multiple sub-domains (finite differences or elements or volumes). Given the mesh, the basis functions are predetermined. This step is characterized by its great temporal complexity. At this level, the PINN method could offer a solution to reduce the execution time since it only requires faster random sampling of the working domain.

  • Discretization: Discretize the governing equations by turning it into a system of equations simply by approximating the derivatives. The used functions are linear (i.e polynomial functions) which sometimes does not capture the non-linear character of the underlying phenomena. Given the non-linear nature of neural networks, they could help overcome this deficiency.

  • Solution: Solving the set of linear equations by numerical computing using extensive parallel IT ressources like CPUs, GPUs and large RAMs.

  • Postprocess: Finding the desired quantities like position or velocity by analyzing the obtained data. The use of machine learning models can make this analysis more refined and robust.

As previously stated, numerical methods have some drawbacks such as high time consumption, repetitiveness and lack of autonomy. In fact, creating a mesh to simulate an airplane turbo-reactor can take months in some industrial cases. Also, numerical solving is repetitive because the 5 steps must be reproduced each time the domain is changed. Furthermore, unlike AI models, this procedure does not learn from previous trials even if we keep the same domain or the same grid. However, the basis functions do not always allow for the reflection of non-linear phenomena that cause real or artificial blow-up phenomena such as numerical explosion [14]. Finally, when we consider how difficult it is to reduce human intervention, it is clear that this technique lacks the autonomy sought during the normal use of artificial intelligence. As a result, several unanswered questions may arise, such as: Is it possible for solvers to learn the basis functions from partial differential equations automatically? Is it feasible to develop autonomous flow solvers for fluid mechanics?

2.3 Biology of the inflammatory bowel diseases

Now let’s move to biology which is the science of living organisms extending from the molecular level to the mesoscopic ecosystems. In this paper, we focus on the modelling of the inflammatory process hitting the bowel. Crohn’s disease and ulcerative colitis are both inflammatory bowel diseases but they are different indeed [15, 16]. Ulcerative colitis (UC) is a chronic inflammatory bowel disease resulting from an overreaction of the natural defenses of the digestive immune system, with an estimated prevalence of 1 in 1500 people with an annual incidence of 6 to 8 new cases per 100,000 inhabitants in Australia [17], Western Europe and the United States. In Tunisia, the incidence is estimated at 2.11 per 100,000 inhabitants per year [18, 19]. UC is not a rare disease in tunisian adults, but in children. It is characterized by smooth ulceration of the inner lining of the colon. The inflammation begins in the lower region of the colon, just above the anus, and progresses upward at varying distances. One of the most important indicators of the severity of this disease is the spatial distribution of the intestinal lesions associated with an introduced gastro-enterologist’s severity score. While individuals with moderate to high severity scores have a concentration of lesions around the rectum, those with low severity scores frequently have a homogeneous spatial distribution of colonic lesions. UC appears in lesions such as bleeding rectal and colon ulcers. It is a currently incurable disease characterized by varying intensities of inflammatory relapse with interspersed remission periods. This puts the patient at higher risk of colon cancer than the general population thus the potential removal of the organ (colectomy). Currently available treatments aim to control pain, reduce the frequency and duration of relapses, and thereby relieve symptoms. Crohn’s disease is a type of painful inflammatory bowel disease (IBD) that is not well understood. In Tunisia, this serious disease affects both children over 10 and adults [18]. It consists of the appearance of several asymmetrical segments of deep lesions separated by intact areas. In the worst cases, these areas can turn into fissures or even holes in the wall of the intestine. Unlike other IBDs, it affects any part of the gastrointestinal tract, from top (the mouth) to bottom (the anus), in contiguous or isolated parts. The inflammation can affect the inner lining and even go beyond the entire thickness of the intestinal wall; It is manifested by a blood vessels dilation and tissues fluid loss. It is usually present in the lower part of the small intestine that connects to the colon. The inflamed portion of the intestine affects the deep panniculus and is not adjacent to it, but rather is distributed throughout the gastrointestinal tract, with an erratic inflammation pattern. The diagnosis of this disease requires advanced technological tools which present difficulties in the collection of data to predict the spread. For that, the mathematical modeling has been increasingly utilized as a tool to understand the complex and dynamic processes involved in both diseases as shown in [5, 20].

In the evaluation and management of both Crohn’s disease and ulcerative colitis, doctors typically use a combination of biological, clinical, and spatial indicators to assess a patient’s condition, predict its progression, and determine the most appropriate treatment. Clinical indicators may include a physician’s examination and questioning of the patient, as well as video examination of the colon through colonoscopy. Biopsy samples taken during colonoscopy can also provide valuable histological images. In addition, biological or chemical indicators such as the measurement of calprotectin levels in stool (as an indicator of inflammation) and analysis of the intestinal microbiota through DNA and RNA analysis can provide important insights into the disease. Additionally, analysis of RNA expression in the intestine can also be used as an indicator.

Data type Data requirements Tasks
Clinical data Doctor’s questionnaires Classification
Biological data Physico-clinical analysis Scoring and Classification
Images and videos data Computer vision treatment Classification and PDE
Table 1: A summary of clinical, biological, and image/video data characterization in IBDs medical tasks.

2.4 The importance of spatial information

Gastro-enterologists and surgeons are hindered from having spatial information on anatomical sites since these indicators are not spatial, and the provided information is never localized in a specific position. The diagnosis of these diseases is based on the analysis of colonoscopy videos. Thus, physicians assess the severity of the disease according to the presence of inflammation, bleeding or ulcers on the intestinal wall which requires an advanced level of expertise. In the same way, the extent of the lesions is currently ignored in medical practice, for lack of a validated method for analyzing this information. This same remark applies to other indicators like numerical score of severity [21], the speed of inflammation propagation, the choice of treatment [22]. Gastroenterologists recognize the significance of spatial information in the development of complications such as esophageal and colon cancer in patients with Crohn’s disease and ulcerative colitis. However, current guidelines fail to fully consider the quantity and distribution of lesions, often focusing solely on the most severe lesion identified. This is due to the scarcity of software tools and scientific literature. Additionally, the intricate feedback loops and technical challenges in collecting high-quality data for the calibration of numerical and mathematical models (see figure 2) further highlights the need for innovative methods. This necessity is the driving force behind our current study, which aims to address the limitations in current approaches and provide a more comprehensive understanding of the disease.

Refer to caption
Figure 2: The figure shows how Crohn’s disease differs from ulceratives colitis in terms of propagation, effect on the human body and mathematical modeling of both diseases. In the right, the system of partial differential equations is derived from this paper [5]. In the left, the Fisher kpp equation is used for the ulcerative colitis disease as shown in this study [23].

Endoscopic video analysis [24] plays a crucial role in evaluating the severity of ulcerative colitis and monitoring the progression of the disease. Colonoscopy is widely used as the reference examination to assess the intensity of the disease and the extent of intestinal lesions. During this routine procedure, a gastroenterologist inserts a camera-equipped endoscope into the colon to visualize the inner lining and take biopsies if necessary. It should be noted that this technique has a very strong impact on the quality of life of the patients.

Wireless capsule endoscopy (WCE) [24] is another commonly used technique in which patients swallow a small, intelligent capsule that contains a camera and a light source. The capsule sends images of the intestinal mucosa to a wearable sensor, making it a less invasive alternative to colonoscopy. This method is especially useful for accessing regions of the small intestine that are difficult to reach with endoscopy. However, it is more expensive as the capsule can only be used once. Unfortunately, this technique has been abandoned in Tunisian hospitals due to its cost being considered expensive.

Both colonoscopy and WCE allow for the detection of important lesions in the videos, such as: Loss of visibility of the vascular framework, which is indicated by the disappearance of blood vessels and the formation of fibrous tissue that impedes nutrient absorption and inflammation and bleeding, which appear as red areas on the intestinal wall and ulcers and indentations in the wall that appear white or gray. The precise collection and examination of the endoscopic video data is essential not only for identifying the disease presence and advancement, but also for categorizing the different types of IBDs and classifying the subtypes within the same disease.

2.5 PIML: A new and growing discipline with challenges

Physics-Informed Machine Learning (PIML), also known as Physics-Informed Neural Networks (PINN), is an emerging discipline that merges the concepts of physics with the advanced techniques of machine learning and neural networks. The objective of PIML is to harness the laws of physics to increase the precision of machine learning models, particularly in cases where the systems being modeled are governed by partial differential equations. In PIML, the governing equations of a physical system are integrated into the training process of a machine learning model, resulting in predictions that are not only accurate, but also physically meaningful and interpretable. Overfitting is prevented through this approach. PIML has been successfully applied in diverse fields such as fluid dynamics, structural mechanics, quantum mechanics, cosmology and quantitative finance. With many advancements and applications yet to be discovered, PIML is a rapidly growing field that promises exciting new possibilities. According to the Gartner AI Hype Cycle diagrams for 2021 [25] and 2022 [26], PINN and PIML are currently in the innovation trigger phase, gaining increasing attention and applications in the scientific community. With continued growth and development, these technologies are expected to reach the peak of inflated expectations before settling into a plateau of productivity. As PINN and PIML become more widely adopted, we can expect to see their use in solving real-world problems in biology and medicine (see the figure 3).

The establishment of neural networks in mathematics can be traced back to the seminal work of Cybenko in 1989 [27]. In this paper, Cybenko presented the concept of universal approximation, which demonstrated that a single hidden layer feedforward neural network with a sigmoid activation function is capable of approximating any continuous bounded function with a sufficient number of hidden units. This foundational work was further advanced by the studies of Hornik and Barron [28, 29], which provided additional insights into the concept of universal approximation. Various architectures have been developed, starting with original PINN and followed by DeepFNet, DeepONet, DeepM&MNet.

  • DeepFNet [30] is a neural network architecture that is well-suited for functional approximation tasks because it is flexible, able to model complex relationships, and scalable. Its hierarchical structure allows it to learn and represent multi-scale features in the data, improving its ability to generalize and make accurate predictions on unseen data. Generally, it requires significantly fewer neurons than shallow networks to achieve a given degree of function approximation.

  • DeepONet [31]: uses a deep learning approach to learn nonlinear operators. The advantage is that it can capture complex relationships between variables that are not easily modeled using linear techniques. DeepONet uses seq2seq and fractional algorithms. Seq2seq (or "sequence-to-sequence") is a type of algorithm that is used to map input sequences to output sequences. Data with long-range relationships are analyzed using fractional (or "fractionally-differentiated") approaches.

    Refer to caption
    Figure 3: For the second consecutive year, the 2022 Gartner’s hype cycle for artificial intelligence evokes the physics-informed AI. We notice that the PIML is actually in the innovation trigger regime with an improved outlook because the plateau of productivity will be reached between 2 to 5 years instead of 5 to 10 years.
  • DeepM&MNet [32, 33] is a neural network framework for simulating complex, multiphysics systems. It uses pre-trained neural networks to make predictions about the different fields in a coupled system, such as the flow, electric and concentration fields. The framework is designed to be fast and efficient, and can be used to build models with very little data. DeepM and MNet are versatile algorithms for modeling complex, multiphysics and multiscale dynamic systems. DeepM uses a multilayer perceptron architecture, while MNet uses a combination of convolutional and long short-term memory networks. Both algorithms are able to capture intricate patterns and trends in time series data.

However, the best approach for using PINN in a particular biological system depends on the available data and knowledge of the physics of the system. We will explore three possible scenarios in which PINN can be applied in biology.

  • First scenario: In this case, a Physics-Informed Neural Network (PINN) is used to make predictions about a system based on both data and known physics information. The neural network is trained on the data and also incorporates the known physics through the use of constraints or regularization terms in the loss function. This allows the network to make predictions that are consistent with the known physics and improve accuracy by utilizing the available data. Fluid dynamics represents a classic example where the neural network is trained on experimental or numerical data of the fluid flow and incorporates the governing equations of fluid dynamics as constraints or regularization terms in the loss function.

  • Second scenario: In this scenario, there is a large amount of data available, but there is no physical model to describe the dynamics. For example, consider the physics of jets produced by terrestrial accelerators in heavy ion experiments such as ALICE at the LHC-CERN accelerator. The lack of a physical model can make it difficult to understand the underlying dynamics of the system. This is a good use case for traditional machine learning techniques, as there is no physics information to incorporate into the model.

  • Third scenario: In this case, data is limited and the system is described by several physical models. The limited data and the presence of multiple physical models can make it challenging to determine which model is most appropriate for describing the system. In this case, it may be necessary to use a combination of approaches, such as combining physical models with machine learning techniques, to gain a more complete understanding of the system. It is also important to carefully validate the results and ensure that the chosen model is able to accurately describe the observed behavior. This is the case considered in this article, in which we attempt to model a biological phenomenon caused by loops of reactions and counter-reactions between bacteria and immune cells.

Having discussed the various techniques and scenarios involved in PINNs, it is now important to evaluate and quantify the performance of the model. From a general point of view, the neural network performance can be characterized into three main types:

  1. 1.

    Approximation error to ground truth function.

  2. 2.

    Generalization to unseen data.

  3. 3.

    Trainability of the model.

In fact, the universal function approximation theorem only considers the approximation error of a neural network to the ground truth function. However, it does not consider other important factors, such as the generalization error and the model trainability. This generalization measures a model’s ability to make accurate predictions on new, unseen data. Meanwhile, the model trainability is determined by factors such as its size, complexity, amount and quality of training data. These factors can influence the model’s ability to be effectively trained, as larger and more complex models may require more resources and may be harder to converge. For that this theorm could not ensure the generalization and the trainability in complex biological process [34] or in the modeling of nonlinear two-phase transport in porous media [4]. These limitations include the availability and quality of data and the potential for the models to fail to capture the full range of possible behaviors or phenomena. In the following points, we aim to shed light on the challenges faced in our modeling efforts.

  • Complexity of underlying processes: Biological processes are often characterized by complex interactions and dynamics, making it challenging to accurately model them using traditional mathematical or physical approaches. The non-linear nature of the differential equations involved, including the presence of non-linear terms such as square or cubic terms, only adds to the difficulty. These non-linear terms can even lead to blow-up phenomena, a common challenge for mathematicians working with PDEs [5]. This can also make it challenging for PINNs to learn and represent the underlying patterns and relationships in the data.

  • Data availability and quality: The data used to train PINNs may be limited in quantity or quality, or may not be representative of the full range of behaviors or phenomena. This can affect the model’s ability to learn and generalize, and can reduce its accuracy and reliability. Spatial data requires exhaustive examination, which can be expensive, and hospitals are often reluctant to share their data. Our attempts to contact digestive disease institutes for data resulted in a refusal to collaborate. However, we were able to find a more accessible data set called Kvasir that we plan to use for our study [35].

  • Limited range of behaviors : PINNs may not be able to capture the full range of behaviors and phenomena that can occur in biological systems. This is because the models are typically trained on a limited set of data and are not able to capture the full range of possible behaviors or situations that can arise.

  • Necessity for simple and parsimonious PINN models: The complexity of the PINN model itself can also be a limitation. These models can be computationally expensive to train and may require a large amount of data and computational resources. This can make their use challenging in certain contexts, such as when data is limited or when computational resources are constrained, as in the case of automatic lesion detection techniques where the gastroenterologist manipulates the patient using the colonoscope and works on the software in real-time [36, 37].

2.6 Our approach

The discussion above have provided the necessary foundation for our main work. We are attempting to predict the evolution of two bowel diseases with poor quality data collected only on the edges of the domain and by integrating physical constraints from nonlinear PDE with the simplest deep neural networks. We are inspired by the first work in PINN done by M. Raissi et al. [38]. In this paper, the authors dealt with 4 equations:

  • Two-dimensional Navier-Stokes equation system

  • Shrödinger equation

  • Korteweg-De Vries equation

  • Burgers’ equation

Then, we will start with the simplest PDE equations and gradually add more complexity, including nonlinear terms. By following this progression, we hope to build a comprehensive model for predicting the evolution of these diseases. Therefore, we will apply this approch in these equations:

  • Simple partial differential equation (a Toy model)

  • Diffusion equation or heat equation in 2D domain

  • Fisher-KPP equation in 1D domain

  • Korteweg-De Vries equations

  • Traditional Turing System: nonlinear coupled system for Reaction Diffusion Equations

  • Turing mechanism System for Crohn’s disease presented in this paper [5].

3 Benchmarking our approach with some chosen PDEs

This benchmarking refers to the process of evaluating the performance and accuracy of the DNN against a series of PDEs with a minimum of data. We begin by testing the DNN on a simple PDE and observe that it is able to accurately solve it with a high degree of accuracy. We will see in the next section that this resolution relates to the determination of the severity score for IBDs, incorporating the crucial aspect of spatial information distribution. Next, we apply the DNN to the Burger’s equation, which is a more complex nonlinear PDE. The DNN is still able to solve this equation with a good level of accuracy. We then move on to the heat equation, which is a linear PDE and therefore relatively easier to solve. Finally, we test the DNN on a nonlinear system of Korteweg-De Vries equations.

3.1 Neural networks architecture

Choosing the hyperparameters of a DNN is an important step in its design and training. These parameters are the values that control the overall behaviour of the DNN, such as the number of layers, the number of neurons per layer, the learning rate and the regularization strength. Then, it is important to consider the nature of the problem being solved and the available data. For example, if the data is limited or noisy, it may be necessary to use a smaller or simpler network to avoid overfitting. The choice of optimization algorithm is another important factor in the training of the network. There are many different optimization algorithms available, each with its own strengths and weaknesses. Two commonly used optimization algorithms are L-BFGS and Adam.

Refer to caption
Figure 4: Illustration of a DNN designed to approximate the concentration of phagocytes and bacteria in the digestive system. The network features an input layer of 2 neurons for capturing spatial and temporal information, followed by 5 hidden layers of 5 neurons each. The output layer consists of a single neuron that provides the final prediction. This architecture exhibits a fully connected structure.

We can control overfitting using the dropout concept which works by randomly "dropping out" a fraction of the neurons during training, then a proportion of neurons are temporarily excluded from the network and do not contribute to the forward or backward passes. This has the effect of reducing the complexity of the DNN and forcing the remaining neurons to learn more robust and generalizable features. The notion of parsimony is important in deep learning because it helps to ensure that the models we build are as simple as possible while still being able to effectively capture the underlying patterns in the data. By using parsimony, we can avoid overfitting and build models that are more likely to perform well on new, unseen data.

3.1.1 Adam vs LBFGS

Stochastic gradient descent (SGD) is an optimization algorithm for finding model parameters that minimize the loss function, which measures the difference between the expected and actual output of a model [39]. There are many variations of SGD, including Adagrad, RMSprop, and Adam. Adam optimization is a stochastic gradient descent method that adaptively estimates first-order and second-order moments [40].

In Adam optimization, the parameter update is given by:

mwt+1=β1mwt+(1β1)wLtm_{w}^{t+1}=\beta_{1}m_{w}^{t}+(1-\beta_{1})\nabla_{w}L^{t}
vwt+1=β2vwt+(1β2)(wLt)2v_{w}^{t+1}=\beta_{2}v_{w}^{t}+(1-\beta_{2})(\nabla_{w}L^{t})^{2}
m^w=mwt+11β1t\hat{m}_{w}=\frac{m_{w}^{t+1}}{1-\beta_{1}^{t}}
v^w=vwt+11β2t\hat{v}_{w}=\frac{v_{w}^{t+1}}{1-\beta_{2}^{t}}
wt+1=wtηm^wv^wϵw^{t+1}=w^{t}-\eta\frac{\hat{m}_{w}}{\sqrt{\hat{v}_{w}}-\epsilon}

where w(t)w(t) are the model parameters, LtL^{t} is the loss function, tt is the current training iteration, β1\beta_{1} and β2\beta_{2} are the forgetting factors for gradients and second-order gradient moments, respectively.

On the other hand, the BFGS algorithm is a Quasi-Newton method for optimization, which approximates the Hessian matrix using a series of updates [41, 42]. One of the most widely used Quasi-Newton methods is L-BFGS (Limited-memory BFGS [43, 44]), which is more memory efficient than BFGS, as it only stores a few vectors representing the approximation of the Hessian matrix instead of the full matrix. L-BFGS is more computationally demanding than Adam, but can be faster in situations where the second-order moments are known analytically. Instead of storing the full n x n estimation of the inverse Hessian, L-BFGS stores only a few vectors that represent the approximation implicitly which make it more practical for usage in ML settings with small-mid sized datasets. This method is computationally more demanding. However when the second order is known analytically the optimization method moves faster towards the minimum. The reason lies in the fact that methods based on the first order approximate the error function at a point with a tangent hyperplane, while methods of the second order, with a quadratic hypersurface: this allows us to move closer to the error surface when we update the weights at each iteration.
The L-BFGS algorithm is as follows:

Algorithm 1 L-BFGS
for i=1,2,,ni=1,2,\ldots,n do
     Obtain a direction PkP_{k} by solving BkPk=f(Xk)B_{k}P_{k}=-\nabla f(X_{k})
     Perform a one-dimensional optimization to find an acceptable stepsize αk\alpha_{k} in the direction
     found in the first step. In an exact line search, αk=argminf(Xk+αPk)\alpha_{k}=\operatorname{argmin}f(X_{k}+\alpha P_{k}).
     In practice, an inexact line search usually suffices, with acceptable αk\alpha_{k} satisfying the Wolfe
     conditions.
     Set Sk=αkPkS_{k}=\alpha_{k}P_{k} and update Xk+1=Xk+SkX_{k+1}=X_{k}+S_{k}
     yk=f(Xk+1)f(Xk)y_{k}=\nabla f(X_{k+1})-\nabla f(X_{k})
     Bk+1=Bk+YkYkTskYkT+BkskskTBkTBkskskTB_{k+1}=B_{k}+\frac{Y_{k}Y_{k}^{T}}{s_{k}Y_{k}^{T}}+\frac{B_{k}s_{k}s_{k}^{T}B_{k}^{T}}{B_{k}s_{k}s_{k}^{T}}
end for

where XkX_{k} are the model parameters, ff is the loss function, BkB_{k} is an approximation of the Hessian matrix, and SkS_{k} and YkY_{k} are the differences between the current and previous values of XX and f(X)\nabla f(X), respectively. The L-BFGS and ADAM optimization algorithms are both commonly used in the context of training neural networks, including PINN. Here is a comparison of some key features of these algorithms:

  • Convergence rate: L-BFGS typically has a faster convergence rate than ADAM. However, ADAM can still be effective in practice and may be preferred in some cases due to its simplicity and ability to adapt to changing data [40].

  • Memory requirements: L-BFGS requires storing a set of past gradients in memory, which can be costly for large datasets or networks. ADAM, on the other hand, only requires storing the average of the past gradients, which is typically much cheaper in terms of memory usage [41].

  • Sensitivity to hyperparameters: L-BFGS has relatively few hyperparameters, which can make it easier to tune. ADAM has more hyperparameters, such as the learning rate and the momentum parameters, which can be more sensitive to the choice of values and may require more careful tuning [41, 42].

  • Robustness: L-BFGS can be sensitive to the initialization of the parameters, and may require a good initial guess to find the optimal solution. ADAM can be more robust to the initialization, but may be less sensitive to the true global minimum as shown in [42].

Both L-BFGS and ADAM can be effective in training PINN models, and it may be useful to try both algorithms and compare their performance to determine which is the best fit for a particular problem. In our study, we found that the ADAM give best results.

Optimization algorithms
L-BFGS ADAM
xk+1=xk+αkdkx_{k+1}=x_{k}+\alpha_{k}d_{k} mk=β1mk1+(1β1)gkm_{k}=\beta_{1}m_{k-1}+(1-\beta_{1})g_{k}
sk=xk+1xks_{k}=x_{k+1}-x_{k} vk=β2vk1+(1β2)gk2v_{k}=\beta_{2}v_{k-1}+(1-\beta_{2})g_{k}^{2}
yk=gk+1gky_{k}=g_{k+1}-g_{k} mk=mk1β1km_{k}^{\prime}=\frac{m_{k}}{1-\beta_{1}^{k}}
αk=skTykykTHkyk\alpha_{k}=\frac{s_{k}^{T}y_{k}}{y_{k}^{T}H_{k}y_{k}} vk=vk1β2kv_{k}^{\prime}=\frac{v_{k}}{1-\beta_{2}^{k}}
xk+1=xk+αkdkx_{k+1}=x_{k}+\alpha_{k}d_{k} xk+1=xkηvk+ϵmkx_{k+1}=x_{k}-\frac{\eta}{\sqrt{v_{k}^{\prime}}+\epsilon}m_{k}^{\prime}
Table 2: xkx_{k} represents the current estimate of the solution, dkd_{k} is the search direction, αk\alpha_{k} is the step size, sks_{k} and yky_{k} are the difference vectors, gkg_{k} is the gradient of the objective function at xkx_{k}, mkm_{k} and vkv_{k} are the first and second moments of the gradient estimates, β1\beta_{1} and β2\beta_{2} are the decay rates for the moving averages, η\eta is the learning rate, ϵ\epsilon is a small constant to prevent division by zero, and HkH_{k} is the approximate Hessian matrix. Note that the formulas for L-BFGS involve the computation of the Hessian matrix and its inverse, which can be computationally expensive for large datasets or networks. In contrast, the formulas for ADAM do not involve the Hessian matrix and can be more computationally efficient.

3.1.2 Root Mean Square Error(RMSE) Loss

In the context of a biology process where the values are continuous and expected to be situated in a small range, using the mean squared error (MSE) or root mean squared error (RMSE) loss function can help ensure that the model is able to accurately predict the values within that range. Both MSE and RMSE measure the difference between the predicted values and the true values, but MSE is simply the average squared difference while RMSE is the square root of the average squared difference. Both of these loss functions penalize large errors more heavily than small errors, which can be useful for preventing the DNN model from making large errors in its predictions.

RMSE=i=1n(xi^xi)2nRMSE=\sqrt{\frac{\sum_{i=1}^{n}(\widehat{x_{i}}-x_{i})^{2}}{n}}

With: xix_{i} = the ithi^{th} observed value, xi^\widehat{x_{i}} = the ithi^{th} predicted value and n is the number of available observations.

3.1.3 RELU vs Sigmoid vs Tanh

In the context of tuning a DNN, the choice of activation function can significantly impact the performance. The Rectified Linear Unit (ReLU) activation function is widely used due to its simplicity and ability to converge faster than other activation functions. It is defined as f(x)=max(0,x)f(x)=max(0,x) and is generally used in the hidden layers of the network. One disadvantage of ReLU is that it can suffer from the "dying ReLU" problem, where the weights of the neurons become very small and the activation function becomes inactive. The Sigmoid activation function is defined as f(x)=11+exf(x)=\frac{1}{1+e^{-x}} and is often used in the output layer of binary classification problems. However, it has a slow convergence rate and can suffer from vanishing gradients, where the gradients of the weights become very small, hindering the model’s ability to learn. The Hyperbolic Tangent (Tanh) activation function is defined as f(x)=tanh(x)=2σ(2x)1f(x)=tanh(x)=2\sigma(2x)-1, where σ(x)\sigma(x) is the Sigmoid function. It is often used in the hidden layers of the network and can perform well in certain tasks, but it can also suffer from the vanishing gradients problem. The activation function choice can also depend on the range of the values being predicted, especially in a biology context. For example, if the predicted values are expected to be within a small range, such as between 0 and 1, the Sigmoid or Tanh activation functions may be more appropriate. However, if the predicted values are expected to have a larger range, the ReLU activation function may be more suitable. It is important to keep in mind that the choice of activation function is just one of many hyperparameters that can impact the performance of the model and should be tuned accordingly. In this work, we experimented with three activation functions: RELU, Sigmoid and Hyperbolic tangent. Our models converged for Sigmoid and Hyperbolic tangent but we couldn’t obtain satisfying result with RELU.

3.2 Toy model: simple partial differential equation

Let’s consider this first-order partial differential equation:

au(x,t)x+bu(x,t)t+cu(x,t)=0a*\frac{\partial u(x,t)}{\partial x}+b*\frac{\partial u(x,t)}{\partial t}+c*u(x,t)=0

and let’s fix the values of the constants a, b, and c to be a = 1, b = -2, and c = -1. The modified equation becomes:

u(x,t)x2u(x,t)tu(x,t)=0\frac{\partial u(x,t)}{\partial x}-2*\frac{\partial u(x,t)}{\partial t}-u(x,t)=0

defined for the temporal interval t[0,1]t\in[0,1] and the spatial interval x[0,2]x\in[0,2]. The initial condition is given by:

u(x,0)=6e3xu(x,0)=6*e^{-3*x}

and the PDE boundary conditions are given by:

u(x=0,t)=g1(t)u(x=0,t)=g_{1}(t)
u(x=2,t)=g2(t)u(x=2,t)=g_{2}(t)

(Note that during the numerical and PINN resolutions, we will choose g1(t)=6e2tg_{1}(t)=6*e^{-2*t} and g2(t)=6e62tg_{2}(t)=6*e^{-6-2*t}).

To solve this partial differential equation analytically, the method of separation of variables is used. Start by separating the variables x and t and this involves writing the solution in the form:

u(x,t)=X(x)T(t)u(x,t)=X(x)*T(t)

Substituting this expression into the PDE gives:

X(x)T(t)2X(x)T(t)X(x)T(t)=0X^{\prime}(x)*T(t)-2*X(x)*T^{\prime}(t)-X(x)*T(t)=0
(X(x)X(x))T(t)=2X(x)T(t)(X^{\prime}(x)-X(x))*T(t)=2*X(x)*T^{\prime}(t)
X(x)X(x)X(x)=2T(t)T(t)=C0\frac{X^{\prime}(x)-X(x)}{X(x)}=2*\frac{T^{\prime}(t)}{T(t)}=C_{0}

with C0C_{0} is an arbitrary constant.

Solving for X(x) and T(t) separately gives:

Let’s solve the left member,

X(x)X(x)X(x)=C0\frac{X^{\prime}(x)-X(x)}{X(x)}=C_{0}

The integration gives:

X(x)=C1e(C0+1)xX(x)=C_{1}*e^{(C_{0}+1)*x}

The right member is written

T(t)T(t)=C02\frac{T^{\prime}(t)}{T(t)}=\frac{C_{0}}{2}

The integration gives:

T(t)=C2eC0t2T(t)=C_{2}*e^{\frac{C_{0}*t}{2}}

The general solution is then given by:

u(x,t)=X(x)T(t)=C1C2e(C0+1)x+C0t2u(x,t)=X(x)*T(t)=C_{1}*C_{2}*e^{(C_{0}+1)*x+\frac{C_{0}*t}{2}}

Using the initial condition at t=0t=0:

u(x,0)=C1C2e(C0+1)x=6e3xu(x,0)=C_{1}*C_{2}*e^{(C_{0}+1)*x}=6*e^{-3*x}

Then C1C2=6C_{1}*C_{2}=6 and C0=4C_{0}=-4

Therefore, the solution is:

u(x,t)=6e3x2tu(x,t)=6*e^{-3*x-2*t}

Analytical solution

u(x,t)=6e3x2t,x[0,2],t[0,1]u(x,t)=6*e^{-3*x-2*t},\hskip 14.22636ptx\in[0,2],\hskip 14.22636ptt\in[0,1]

Numerical schema

Now, we are interested in numerically solving the previous PDE by using the centered finite difference method. We need to discretize the spatial and temporal domains and approximate the derivatives using finite differences. Here is an outline of the steps involved in the numerical scheme:

  • Choose a spatial discretization step size Δx\Delta x, and a temporal discretization step size Δt\Delta t.

  • Define the grid points xix_{i} and tjt_{j} as follows:

xi=iΔxtj=jΔtx_{i}=i\Delta x\hskip 28.45274ptt_{j}=j\Delta t
uxui+1,jui1,j2Δx\frac{\partial u}{\partial x}\approx\frac{u_{i+1,j}-u_{i-1,j}}{2\Delta x}
utui,j+1ui,j12Δt\frac{\partial u}{\partial t}\approx\frac{u_{i,j+1}-u_{i,j-1}}{2\Delta t}
ui,j=ui+1,jui1,j2Δx2ui,j+1ui,j12Δtui,ju_{i,j}=\frac{u_{i+1,j}-u_{i-1,j}}{2\Delta x}-2\frac{u_{i,j+1}-u_{i,j-1}}{2\Delta t}-u_{i,j}
ui,j=11+2ΔtΔx(ui+1,j2ui,j+ui1,j2ΔtΔx(ui,j+1ui,j1))u_{i,j}=\frac{1}{1+2\frac{\Delta t}{\Delta x}}(u_{i+1,j}-2u_{i,j}+u_{i-1,j}-2\frac{\Delta t}{\Delta x}(u_{i,j+1}-u_{i,j-1}))

Substitute the initial condition u(xi,0)=6exp(3xi)u(x_{i},0)=6*\exp(-3*x_{i}) to find ui,0u_{i,0}.

Substitute the boundary conditions u(0,tj)=6e2tju(0,t_{j})=6*e^{-2*t_{j}} and u(2,tj)=6e62tju(2,t_{j})=6*e^{-6-2*t_{j}} to find u0,ju_{0,j} and uN,ju_{N,j}, where NN is the number of grid points in the spatial domain.

1import numpy as np
2import matplotlib.pyplot as plt
3
4
5def solve_pde(a, b, c, Delta_x, Delta_t):
6 # Number of grid points in spatial and temporal domains
7 N = int(2 / Delta_x)
8 M = int(1 / Delta_t)
9
10 # Define grid points
11 x = np.linspace(0, 2, N+1)
12 t = np.linspace(0, 1, M+1)
13
14 # Initialize solution array
15 u = np.zeros((N+1, M+1))
16
17 # Set initial condition
18 u[:, 0] = 6 * np.exp(-3 * x)
19
20 # Set boundary conditions
21 u[0, :] = 6 * np.exp(-2 * t)
22 u[N, :] = 6 * np.exp(-6 - 2 * t)
23
24 # Iterate over temporal domain
25 for j in range(M):
26 # Iterate over spatial domain
27 for i in range(1, N):
28 # Use finite difference equation to update solution
29 u[i, j+1] = (1 / (1 + 2 * Delta_t / Delta_x)) * (u[i+1, j] - 2 * u[i, j] + u[i-1, j] - 2 * Delta_t / Delta_x * (u[i, j+1] - u[i, j-1]))
30
31 # Return solution
32 return u
33
34# Define function to calculate analytical solution
35def analytical_solution(x, t):
36 return 6 * np.exp(-2 * t - 3 * x)
37
38# Set discretization step sizes
39Delta_x = 0.1
40Delta_t = 0.01
41
42# Calculate numerical solution
43solution = solve_pde(2, 1, -1, Delta_x, Delta_t)
44
45# Define grid points for plotting
46x_plot = np.linspace(0, 2, int(2 / Delta_x) + 1)
47t_plot = np.linspace(0, 1, int(1 / Delta_t) + 1)
48X, T = np.meshgrid(x_plot, t_plot)
49
50# Calculate analytical solution for plotting
51analytical = analytical_solution(X, T)
52
53# Plot numerical and analytical solutions
54plt.figure()
55plt.title("Numerical and analytical solutions")
56plt.pcolor(X, T, solution, cmap="coolwarm")
57plt.colorbar()
58plt.contour(X, T, analytical, colors="k")
59plt.xlabel("x")
60plt.ylabel("t")
61plt.show()
Listing 1: The code used to compute the numerical solution and the analytical one. The results are visualized using the matplotlib library.

PINN approach

In the case of numerical methods, the approach is to convert the PDE into numerical schemes, ensuring properties such as numerical convergence, consistency, and numerical stability. In the PINN approach, the problem is reformulated as a numerical optimization problem the goal is to end up with a loss function that contains most of the system’s dynamic information. To achieve this, the equation is first rearranged such that all its terms are gathered on one side, which forms the first term of the cost function. The remaining two terms of the loss function consist of the initial and boundary conditions, respectively. Then, the function is the sum of three positive terms that must be minimized through an optimization algorithm. Secondly, the data is generated randomly sampled from the phase space, with points located at the boundary, within the interior of the domain, and subject to the initial conditions.

Let us define f(x,t)f(x,t) as :

f(x,t)=u(x,t)x2u(x,t)tu(x,t)x[0,2],t[0,1]f(x,t)=\frac{\partial u(x,t)}{\partial x}-2*\frac{\partial u(x,t)}{\partial t}-u(x,t)\hskip 28.45274ptx\in[0,2],t\in[0,1]

In this case, u(x,t)u(x,t) will be approximated by a neural network. The latter has as an objective to minimize the following loss :

MSE=MSE0+MSEb+MSEfMSE=MSE_{0}+MSE_{b}+MSE_{f}

where

MSE0=1N0i=1N0|u(x0i,0)u0i|2MSE_{0}=\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}|u(x_{0}^{i},0)-u_{0}^{i}|^{2}
MSEb=1Nbi=1Nb(|u(0,tbi)ubi|2+|u(2,tbi)ubi|2)MSE_{b}=\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}(|u(0,t_{b}^{i})-u_{b}^{i}|^{2}+|u(2,t_{b}^{i})-u_{b}^{i}|^{2})
MSEf=1Nfi=1Nf|f(xfi,tfi)|2MSE_{f}=\frac{1}{N_{f}}\sum_{i=1}^{N_{f}}|f(x_{f}^{i},t_{f}^{i})|^{2}

{x0i,u0i}\{x_{0}^{i},u_{0}^{i}\} denotes the initial data at t=0t=0, {tbi,ubi}\{t_{b}^{i},u_{b}^{i}\} denotes to the boundary data and {xfi,tfi}\{x_{f}^{i},t_{f}^{i}\} corresponds to collocation points on f(x,t)f(x,t).

1
2import numpy as np
3import torch
4import torch.nn as nn
5from torch.autograd import Variable
6
7device = torch.device(’cuda:0’ if torch.cuda.is_available() else ’cpu’)
8
9class Net(nn.Module):
10 def __init__(self):
11 super(Net, self).__init__()
12 self.hidden_layer1 = nn.Linear(2,5)
13 self.hidden_layer2 = nn.Linear(5,5)
14 self.hidden_layer3 = nn.Linear(5,5)
15 self.hidden_layer4 = nn.Linear(5,5)
16 self.hidden_layer5 = nn.Linear(5,5)
17 self.output_layer = nn.Linear(5,1)
18
19 def forward(self, x, t):
20 inputs = torch.cat([x,t], axis=1)
21 layer1_out = torch.sigmoid(self.hidden_layer1(inputs))
22 layer2_out = torch.sigmoid(self.hidden_layer2(layer1_out))
23 layer3_out = torch.sigmoid(self.hidden_layer3(layer2_out))
24 layer4_out = torch.sigmoid(self.hidden_layer4(layer3_out))
25 layer5_out = torch.sigmoid(self.hidden_layer5(layer4_out))
26 output = self.output_layer(layer5_out)
27
28 return output
29
30def f(x, t, net):
31 u = net(x, t)
32 u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
33 u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
34
35 pde = u_x - 2*u_t - u
36 return pde
37
38def get_boundary_data():
39 x_bc = np.random.uniform(low=0.0, high=2.0, size=(500,1))
40 t_bc = np.zeros((500,1))
41
42 u_bc = 6*np.exp(-3*x_bc)
43 return x_bc, t_bc, u_bc
44
45def get_collocation_data():
46 x_collocation = np.random.uniform(low=0.0, high=2.0, size=(500,1))
47 t_collocation = np.random.uniform(low=0.0, high=1.0, size=(500,1))
48 all_zeros = np.zeros((500,1))
49 return x_collocation, t_collocation, all_zeros
50
51def train_model(net, cost_function, optimizer, num_iterations):
52 previous_validation_loss = 99999999.0
53
54 for epoch in range(num_iterations):
55 optimizer.zero_grad()
56
57 # Loss based on boundaries conditions
58 x_bc, t_bc, u_bc = get_boundary_data()
59 pt_x_bc = Variable(torch.from_numpy(x_bc).float(), requires_grad=False).to(device)
60 pt_t_bc = Variable(torch.from_numpy(t_bc).float(), requires_grad=False).to(device)
61 pt_u_bc = Variable(torch.from_numpy(u_bc).float(), requires_grad=False).to(device)
62
63 net_bc_out = net(pt_x_bc, pt_t_bc)
64 mse_u = cost_function(net_bc_out, pt_u_bc)
65
66 # Loss based on PDE
67 x_collocation, t_collocation, all_zeros = get_collocation_data()
68 pt_x_collocation = Variable(torch.from_numpy(x_collocation).float(), requires_grad=True).to(device)
69 pt_t_collocation = Variable(torch.from_numpy(t_collocation).float(), requires_grad=True).to(device)
70 pt_all_zeros = Variable(torch.from_numpy(all_zeros).float(), requires_grad=False).to(device)
71
72 f_out = f(pt_x_collocation, pt_t_collocation, net)
73 mse_pde = cost_function(f_out, pt_all_zeros)
74
75 # Total loss
76 total_loss = mse_u + mse_pde
77 total_loss.backward()
78 optimizer.step()
79
80 # Validation
81 if epoch % 1000 == 0:
82 x_validation = np.random.uniform(low=0.0, high=2.0, size=(500,1))
83 t_validation = np.random.uniform(low=0.0, high=1.0, size=(500,1))
84 u_validation = 6*np.exp(-3*x_validation)
85
86 pt_x_validation = Variable(torch.from_numpy(x_validation).float(), requires_grad=False).to(device)
87 pt_t_validation = Variable(torch.from_numpy(t_validation).float(), requires_grad=False).to(device)
88 pt_u_validation = Variable(torch.from_numpy(u_validation).float(), requires_grad=False).to(device)
89
90 net_validation_out = net(pt_x_validation, pt_t_validation)
91 mse_validation_loss = cost_function(net_validation_out, pt_u_validation)
Listing 2: An implementation of PINN for the toy model.

Comparison between PINN, analytical solutions and finite difference method

Refer to caption
Refer to caption
Refer to caption
Figure 5: PINN, analytical and finite difference solutions side by side.
Refer to caption
Refer to caption
Refer to caption
Figure 6: PINN, analytical and finite difference at t=0.1 s, t=0.5 s and t = 0.9 s.
Analytical solution Finite difference solution
8 16 32 8 16 32
Layers 2 0.000472 0.000416 0.000746 0.000702 0.000332 0.000572
4 0.000366 0.000465 0.000412 0.000614 0.000710 0.000644
8 0.000790 0.001780 0.001695 0.000979 0.001777 0.001548
Table 3: RMSE between PINN and analytical solution / and PINN and finite difference solution for the toy model.
Neurons
8 16 32
Layers 2 2:05 2:02 2:55
4 3:58 3:46 3:43
8 4:44 5:16 5:26
Table 4: Execution time according to the number of layers and neurons for the toy model.

3.3 Burger’s equation

The Burger’s equation is a nonlinear PDE that is often used to model a variety of physical, biological, and chemical phenomena, including incompressible fluid flow, population dynamics, and chemical reactions. It expresses the balance between the fluid convective transport and the diffusive transport due to viscosity. Solving the Burger’s equation allows to determine the fluid velocity field at a given time and spatial location. This equation has several applications in biology like modeling blood flow in the cardiovascular system [45], modeling pattern formation biological systems and modeling population dynamics [46]. Two types of Burgers equations are considered, those without viscosity term which can be obtained by considering the particles non-interaction and whose solution can be realized with the help of the finite method, including the derivatives approximation means of Taylor developments based on the discretization of the phase space with viscosity term. The solution of the nonlinear viscosity problem based on the Cole-Hopf transform. Concerning numerical resolution done here either by a finite difference method. The first limitation of the Burgers equation is that it is a simplified model that makes certain assumptions about the system being studied. For example, it may assume that the fluid is incompressible or that the reaction rate is constant, which may not hold true in all cases. As a result, the Burgers equation may not be able to accurately capture the full complexity of a heterogeneous system. The second limitation is that the equation is a deterministic model, not take into account the inherent randomness that is often present in biological systems.

Equation

The Burger’s equation is the following:

u(x,t)t+u(x,t)u(x,t)xν2u(x,t)x2=0x[0,1],t[0,0.1]\frac{\partial u(x,t)}{\partial t}+u(x,t)\frac{\partial u(x,t)}{\partial x}-\nu\frac{\partial^{2}u(x,t)}{\partial x^{2}}=0\hskip 42.67912ptx\in[0,1],t\in[0,0.1]

with the initial condition:

u(x,0)=sin(πx)u(x,0)=\sin(\pi x)

and the boundary conditions:

u(0,t)=u(1,t)=0u(0,t)=u(1,t)=0

where ν\nu = 1 is the coefficient of kinematic viscosity

Using the Hopf-Cole transformation

u(x,t)=2νΘxΘu(x,t)=-2\nu\frac{\Theta_{x}}{\Theta}

The burger’s equation transforms to the linear heat equation:

Θ(x,t)t=ν2Θ(x,t)x2x[0,1],t[0,0.1]\frac{\partial\Theta(x,t)}{\partial t}=\nu\frac{\partial^{2}\Theta(x,t)}{\partial x^{2}}\hskip 42.67912ptx\in[0,1],t\in[0,0.1]

with the initial condition

Θ(x,0)=exp(cos(πx)12πν)\Theta(x,0)=\exp(\frac{\cos(\pi x)-1}{2\pi\nu})

and the boundary conditions

Θx(0,t)=Θx(1,t)=0\Theta_{x}(0,t)=\Theta_{x}(1,t)=0

Analytical solution

The Fourier solution to the burger’s equation is given by

u(x,t)=2πνn=1anexp(n2π2νt)nsin(nπx)a0+n=1anexp(n2π2νt)cos(nπx)u(x,t)=2\pi\nu\frac{\sum_{n=1}^{\infty}a_{n}\exp(-n^{2}\pi^{2}\nu t)n\sin(n\pi x)}{a_{0}+\sum_{n=1}^{\infty}a_{n}\exp(-n^{2}\pi^{2}\nu t)\cos(n\pi x)}

with

a0=01exp(cos(πx)12πν)𝑑xa_{0}=\int_{0}^{1}\exp(\frac{\cos(\pi x)-1}{2\pi\nu})\,dx

an=01exp(cos(πx)12πν)cos(nπx)𝑑xn>0a_{n}=\int_{0}^{1}\exp(\frac{\cos(\pi x)-1}{2\pi\nu})\cos(n\pi x)\,dx\hskip 14.22636ptn>0

1
2import numpy as np
3from scipy.integrate import quad
4import matplotlib.pyplot as plt
5from tqdm import tqdm
6
7def A_0(x, v_d):
8 return np.exp((np.cos(np.pi*x) - 1)/(2*np.pi*v_d))
9
10def A_n(x, v_d, n):
11 return 2*np.exp((np.cos(np.pi*x) - 1)/(2*np.pi*v_d))*np.cos(n*np.pi*x)
12
13def fct_u(x,t,v_d):
14 v = v_d
15 Num = 0
16 Den = 0
17 for n in range(1,101):
18 a = quad(A_n, 0, 1, args=(v,n))[0]*np.exp(-v*t*(n**2)*(np.pi**2))
19 Num += a*n*np.sin(n*np.pi*x)
20 Den += a*np.cos(n*np.pi*x)
21
22 return (2*np.pi*v*Num)/(quad(A_0, 0, 1, args=(v))[0] + Den)
23
24for i in range(0,10):
25 plt.plot(fct_u(np.arange(0,1,0.001), i/100,1), label=’i = + str(i/100))
26 plt.legend()
Listing 3: The code calculates the various coefficients and plots the analytic solution at different temporal instants.

Numerical schema

The solution domain (x, t) :x[0,1]x\in[0,1], t[0,0.1]t\in[0,0.1] is discretized into cells described by the node set (xi,tj)(x_{i},t_{j}) in which xjx_{j} =ih, tjt_{j} =jk (i = 0,1,…,N; j = 0,1,…,J, Nh= 1 and Jk = 0.1) h=Δx\Delta x is a spatial mesh size, k=Δt\Delta t is the time step.

A standard explicit finite difference approximation of the heat equation is

Θi,j+1=(12r)Θi,j+2rΘi+1,j,i=0\Theta_{i,j+1}=(1-2r)\Theta_{i,j}+2r\Theta_{i+1,j},\hskip 14.22636pti=0

Θi,j+1=rΘi1,j+(12r)Θi,j+rΘi+1,j,i=1,2,,N1\Theta_{i,j+1}=r\Theta_{i-1,j}+(1-2r)\Theta_{i,j}+r\Theta_{i+1,j},\hskip 14.22636pti=1,2,...,N-1

Θi,j+1=2rΘi1,j+(12r)Θi,j,i=N\Theta_{i,j+1}=2r\Theta_{i-1,j}+(1-2r)\Theta_{i,j},\hskip 14.22636pti=N

Using the Hopf-Cole transformation

u(xi,tj)=νh(Θi+1,jΘi1,jΘi,j),i=1,,N1,j=0,,Ju(x_{i},t_{j})=-\frac{\nu}{h}(\frac{\Theta_{i+1,j}-\Theta_{i-1,j}}{\Theta_{i,j}}),\hskip 14.22636pti=1,...,N-1,\hskip 5.69046ptj=0,...,J
1
2v = 1
3r = 1/4
4t_f = 0.1
5delta_x = 0.01
6delta_t = (r*delta_x**2)/v
7N = int(1/delta_x)
8J = int(t_f/delta_t)
9
10print(’delta_x =’, delta_x,’delta_t =’,delta_t,’N =’,N,’J =’,J)
11
12x = np.arange(0, 1, delta_x)
13t = np.arange(0, t_f, delta_t)
14
15theta_df = np.zeros((N,J))
16
17#Initial condition
18theta_df[:,0] = A_0(x, v)
19
20for j in tqdm(range(0,J-1)):
21 theta_df[0,j+1] = (1 - 2*r)*theta_df[0,j] + 2*r*theta_df[1,j]
22 for i in range(1,N-1):
23 theta_df[i,j+1] = r*theta_df[i-1,j] + (1 - 2*r)*theta_df[i,j] + r*theta_df[i+1,j]
24
25 theta_df[N-1,j+1] = 2*r*theta_df[N-2,j] + (1 - 2*r)*theta_df[N-1,j]
26
27u_df = np.zeros((N-2, J))
28
29for j in tqdm(range(0,J)):
30 for i in range(1,N-2):
31 u_df[i,j] = -(v/delta_x)*(theta_df[i+1,j] - theta_df[i-1,j])/theta_df[i,j]
32
33print(u_df.shape)
34
35for i in range(0,J,500):
36 plt.plot(u_df[:,i], label = ’i =’ +str(i*delta_t))
37 plt.legend()
38
39for i in range(0,J,500):
40 plt.plot(fct_u(x, i*delta_t,1), label = ’i =’ +str(i*delta_t))
41 plt.legend()
Listing 4: An implementation of the finite difference method for solving partial differential equations.

PINN approach

Let us define f(x,t)f(x,t) as :

f=u(x,t)t+u(x,t)u(x,t)xν2u(x,t)x2x[0,1],t[0,0.1]f=\frac{\partial u(x,t)}{\partial t}+u(x,t)\frac{\partial u(x,t)}{\partial x}-\nu\frac{\partial^{2}u(x,t)}{\partial x^{2}}\hskip 42.67912ptx\in[0,1],t\in[0,0.1]

In this case, u(x,t)u(x,t) will be approximated by a neural network. The latter has as an objective to minimize the following loss :

MSE=MSE0+MSEb+MSEfMSE=MSE_{0}+MSE_{b}+MSE_{f}

where

MSE0=1N0i=1N0|u(x0i,0)u0i|2MSE_{0}=\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}|u(x_{0}^{i},0)-u_{0}^{i}|^{2}
MSEb=1Nbi=1Nb(|u(0,tbi)ubi|2+|u(1,tbi)ubi|2)MSE_{b}=\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}(|u(0,t_{b}^{i})-u_{b}^{i}|^{2}+|u(1,t_{b}^{i})-u_{b}^{i}|^{2})
MSEf=1Nfi=1Nf|f(xfi,tfi)|2MSE_{f}=\frac{1}{N_{f}}\sum_{i=1}^{N_{f}}|f(x_{f}^{i},t_{f}^{i})|^{2}

{x0i,u0i}\{x_{0}^{i},u_{0}^{i}\} denotes the initial data at t=0t=0, {tbi,ubi}\{t_{b}^{i},u_{b}^{i}\} denotes to the boundary data and {xfi,tfi}\{x_{f}^{i},t_{f}^{i}\} corresponds to collocation points on f(x,t)f(x,t).

1
2import torch
3import torch.nn as nn
4import numpy as np
5from random import uniform
6device = torch.device(’cuda:0’ if torch.cuda.is_available() else ’cpu’)
7
8torch.set_default_tensor_type(’torch.cuda.FloatTensor’)
9
10N_u = 4000
11N_f = 10000
12
13#Make X_uv_train
14#BC x=-1 & tt t > 0
15x_left = np.ones((N_u//4,1), dtype=float)*(0)
16t_left = np.random.uniform(low=10**(-6), high=0.1, size=(N_u//4,1))
17X_left = np.hstack((x_left, t_left))
18
19#BC x=1 & tt t > 0
20x_right = np.ones((N_u//4,1), dtype=float)*(1)
21t_right = np.random.uniform(low=10**(-6), high=0.1, size=(N_u//4,1))
22X_right = np.hstack((x_right, t_right))
23
24#IC t=0 & tt x,y in [-1,1]
25t_zero = np.zeros((N_u//2,1), dtype=float)
26x_zero = np.random.uniform(low=0.0, high=1.0, size=(N_u//2,1))
27X_zero = np.hstack((x_zero, t_zero))
28
29X_u_train = np.vstack((X_left, X_right, X_zero))
30# shuffling
31index=np.arange(0,N_u)
32np.random.shuffle(index)
33X_u_train=X_u_train[index,:]
34
35#Make u_train
36u_left = np.zeros((N_u//4,1), dtype=float)
37u_right = np.zeros((N_u//4,1), dtype=float)
38u_initial = np.sin(np.pi * x_zero)
39u_train = np.vstack((u_left, u_right, u_initial))
40
41# ==========================================
42u_train=u_train[index,:]
43# ==========================================
44# make X_f_train
45X_f_train=np.zeros((N_f,2),dtype=float)
46for row in range(N_f):
47 x=uniform(0,1)
48 t=uniform(10**(-6),0.1)
49 X_f_train[row,0]=x
50 X_f_train[row,1]=t
51
52X_f_train=np.vstack((X_f_train, X_u_train))
53
54class PhysicsInformedNN():
55 def __init__(self,X_u,u,X_f):
56 # x & t from boundary conditions
57 self.x_u = torch.tensor(X_u[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
58 self.t_u = torch.tensor(X_u[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
59 # x & t from collocation points
60 self.x_f = torch.tensor(X_f[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
61 self.t_f = torch.tensor(X_f[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
62 # boundary solution
63 self.u = torch.tensor(u, dtype=torch.float32)
64 # null vector to test against f:
65 self.null = torch.zeros((self.x_f.shape[0], 1))
66 # initialize net:
67 self.create_net()
68
69 self.optimizer = torch.optim.Adam(self.net.parameters(),lr=0.001)
70 # typical MSE loss (this is a function):
71 self.loss = nn.MSELoss()
72 # loss :
73 self.ls = 0
74 # iteration number:
75 self.iter = 0
76
77 def create_net(self):
78 self.net=nn.Sequential(
79 nn.Linear(2,32), nn.Sigmoid(),
80 nn.Linear(32, 32), nn.Sigmoid(),
81 nn.Linear(32, 32), nn.Sigmoid(),
82 nn.Linear(32, 32), nn.Sigmoid(),
83 nn.Linear(32, 1))
84 def net_u(self,x,t):
85 u=self.net(torch.hstack((x,t)))
86 return u
87 def net_f(self,x,t):
88
89 vega = 1
90 u=self.net_u(x,t)
91 u = u.to(device)
92
93 u_t=torch.autograd.grad(u,t,grad_outputs=torch.ones_like(u),create_graph=True)[0]
94 u_x=torch.autograd.grad(u,x,grad_outputs=torch.ones_like(u),create_graph=True)[0]
95 u_xx=torch.autograd.grad(u_x,x,grad_outputs=torch.ones_like(u),create_graph=True)[0]
96 u_t = u_t.to(device)
97 u_x = u_x.to(device)
98 u_xx = u_xx.to(device)
99
100
101 f = u_t + u*u_x - vega*u_xx
102 f = f.to(device)
103
104 return f
105
106 def closure(self):
107 # reset gradients to zero
108 self.optimizer.zero_grad()
109 # u & f predictions
110 u_prediction = self.net_u(self.x_u, self.t_u)
111 #
112 f_prediction = self.net_f(self.x_f,self.t_f)
113 #
114 # losses:
115 u_loss_x = self.loss(u_prediction, self.u)
116 f_loss = self.loss(f_prediction, self.null)
117 self.ls = u_loss_x + f_loss
118 # derivative with respect to net’s weights:
119 self.ls.backward()
120 # increase iteration count:
121 self.iter += 1
122 # print report:
123 if not self.iter % 100:
124 print(’Epoch: {0:}, Loss: {1:6.8f}’.format(self.iter, self.ls))
125 return self.ls
126
127 def train(self):
128 """
129 training loop
130 """
131 for epoch in range(15000):
132 self.net.train()
133 self.optimizer.step(self.closure)
134
135# pass data sets to the PINN:
136pinn = PhysicsInformedNN(X_u_train, u_train, X_f_train)
137pinn.train()
138
139import matplotlib.pyplot as plt
140from mpl_toolkits.axes_grid1 import make_axes_locatable
141#
142x = torch.linspace(0, 1, 100)
143t = torch.linspace( 0, 0.1, 4000)
144# x & t grids:
145X,T = torch.meshgrid(x,t)
146# x & t columns:
147xcol = X.reshape(-1, 1)
148tcol = T.reshape(-1, 1)
149# one large column:
150usol = pinn.net_u(xcol,tcol)
151# reshape solution:
152U = usol.reshape(x.numel(),t.numel())
153# transform to numpy:
154xnp = x.cpu().numpy()
155tnp = t.cpu().numpy()
156Unp = U.cpu().detach().numpy()
157
158print(Unp.shape)
159
160for i in range(0,4000,500):
161 plt.plot(Unp[1:99,i], label = ’i =’ +str(i*2.5*10**(-5)) )
162 plt.legend()
Listing 5: An implementation of the PINN for Burger’s equation.

Comparison between PINN, analytical solution and finite difference method

Refer to caption
Refer to caption
Refer to caption
Figure 7: PINN, analytical and finite difference solution side by side
Refer to caption
Refer to caption
Refer to caption
Figure 8: PINN, analytical and finite difference at t=0.0125 s, t=0.05 s and t = 0.0875
1
2#RMSE
3print("RMSE PINN vs Finite difference = ", np.sqrt(np.mean( (Unp[1:99,:] - u_df)**2) ) )
4print("RMSE PINN vs Analytical solution = ", np.sqrt(np.mean((Unp[1:99,:] - u_an[1:99,:])**2) ) )
5print("RMSE Finite difference vs Analytical solution = ", np.sqrt(np.mean((u_an[1:99,:] - u_df)**2)) )
6
7#RMSE PINN vs Finite difference = 0.03899286511045842
8#RMSE PINN vs Analytical solution = 0.033786413268837606
9#RMSE Finite difference vs Analytical solution = 0.011699786048706274
Listing 6: Implementation of the comparison between the 3 methods of resolution.
Analytical solution Finite difference
8 16 32 8 16 32
Layers 2 0.020769 0.014624 0.011578 0.202538 0.021090 0.018698
4 0.010308 0.029504 0.033058 0.018142 0.034560 0.036699
8 0.212536 0.212404 0.009440 0.209136 0.209136 0.018677
Table 5: RSME between PINN and analytical solution for burger’s equation
Neurons
8 16 32
Layers 2 2:00 2:03 2:38
4 2:37 4:32 5:02
8 4:39 8:04 9:52
Table 6: Execution time according to the number of layers and neurons for the burger’equation

3.4 Heat equation: good for diffusion but poor for spatial heterogeneity

In order to compare between the three resolution approaches: the analytical approach, the numerical approach with finite differences and the PINN; we propose to start by the analytical method. We solve the heat equation by considering a constant α\alpha parameter.

ut=α(2ux2+2uy2)x[10,10],y[10,10],t[0,0.25]\frac{\partial u}{\partial t}=\alpha(\frac{\partial^{2}u}{\partial x^{2}}+\frac{\partial^{2}u}{\partial y^{2}})\hskip 5.69046ptx\in[-10,10],\hskip 5.69046pty\in[-10,10],\hskip 5.69046ptt\in[0,0.25]

with the initial condition

u(x,y,0)=ex2y2u(x,y,0)=e^{-x^{2}-y^{2}}

and the boundary conditions

u(0,y,t)=u(x,0,t)=0u(0,y,t)=u(x,0,t)=0

We take α=2\alpha=2.

Analytical solution

The analytical solution is

u(x,y,t)=2σ4Dt+2σex2+y24Dtu(x,y,t)=\frac{\sqrt{2\sigma}}{\sqrt{4Dt+2\sigma}}e^{-\frac{x^{2}+y^{2}}{4Dt}}

With the following Fourier transform applied:

u(kx,ky)=12π+u(x,y)ei(kxx+kyy)u(k_{x},k_{y})=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}u(x,y)e^{-i(k_{x}x+k_{y}y)}

Numerical schema

The solution domain is discretized into cells described by the node set (xi,yj,tn)(x_{i},y_{j},t_{n}) in which xj=ihx_{j}=ih, yj=jly_{j}=jl, tn=nkt_{n}=nk (i = 0,1,…,I; i = 0,1,…,J ; n = 0,1,…,N) h=Δxh=\Delta x and l=Δyl=\Delta y are a spatial mesh size, k=Δtk=\Delta t is the time step and u(xi,yj,tn)=ui,jnu(x_{i},y_{j},t_{n})=u_{i,j}^{n}

ui,jn+1=ui,jn+αΔtΔxui+1,jn2ui,jnui1,jn(Δx)2+αΔtΔyui,j+1n2ui,jnui,j1n(Δy)2u_{i,j}^{n+1}=u_{i,j}^{n}+\alpha\frac{\Delta t}{\Delta x}\frac{u_{i+1,j}^{n}-2u_{i,j}^{n}u_{i-1,j}^{n}}{(\Delta x)^{2}}+\alpha\frac{\Delta t}{\Delta y}\frac{u_{i,j+1}^{n}-2u_{i,j}^{n}u_{i,j-1}^{n}}{(\Delta y)^{2}}

PINN approach

Let us define f(x,y,t)f(x,y,t) as :

f:=utα(2ux2+2uy2)f:=\frac{\partial u}{\partial t}-\alpha(\frac{\partial^{2}u}{\partial x^{2}}+\frac{\partial^{2}u}{\partial y^{2}})

In this case, u(x,y,t)u(x,y,t) will be approximated by a neural network. The latter has as an objective to minimize the following loss :

MSE=MSE0+MSEb+MSEfMSE=MSE_{0}+MSE_{b}+MSE_{f}

where

MSE0=1N0n=1N0|u(x0i,y0i,0)u0i|2MSE_{0}=\frac{1}{N_{0}}\sum_{n=1}^{N_{0}}|u(x_{0}^{i},y_{0}^{i},0)-u_{0}^{i}|^{2}
MSEb=1Nbn=1Nb(|u(0,ybi,tbi)ubi|2+|u(xbi,0,tbi)ubi|2)MSE_{b}=\frac{1}{N_{b}}\sum_{n=1}^{N_{b}}(|u(0,y_{b}^{i},t_{b}^{i})-u_{b}^{i}|^{2}+|u(x_{b}^{i},0,t_{b}^{i})-u_{b}^{i}|^{2})
MSEf=1Nfn=1Nf|f(xfi,yfi,tfi)|2MSE_{f}=\frac{1}{N_{f}}\sum_{n=1}^{N_{f}}|f(x_{f}^{i},y_{f}^{i},t_{f}^{i})|^{2}

{x0i,y0i,u0i}\{x_{0}^{i},y_{0}^{i},u_{0}^{i}\} denotes the initial data at t=0t=0, {tbi,xbi,ybi,ubi}\{t_{b}^{i},x_{b}^{i},y_{b}^{i},u_{b}^{i}\} denotes to the boundary data and {xfi,yfi,tfi}\{x_{f}^{i},y_{f}^{i},t_{f}^{i}\} corresponds to collocation points on f(x,t)f(x,t).

Comparison between PINN, analytical solution and finite difference method

Refer to caption
Refer to caption
Figure 9: Colonoscopy images showing a moderate colitis. The figure shows the initial state of the diffusion process solved by an analytical approach using a Fourier transform. The image is extracted from this paper [47].
Refer to caption
Refer to caption
Figure 10: Colonoscopy images showing a severe colitis. The figure shows the final state of the diffusion process solved by an analytical approach using a Fourier transform. The image is extracted from this paper [47]. The heat equation model may not be able to accurately predict the spatial distribution of the disease as shown by the two arrows.
Analytical method Finite difference
Number of neurons
8 16 32 8 16 32
Layers 2 0.01104 0.02231 0.00507 0.01102 0.02230 0.00506
4 0.00289 0.02391 0.00601 0.00289 0.02391 0.00599
8 0.04588 0.04613 0.04307 0.04586 0.04612 0.04308
Table 7: Comparaison of the RMSE between the (analytical method and the finite difference) and the PINN using different parameters
Time of execution
Layers
2 4 8
Neurons 8 1:56 3:22 5:40
16 1:39 3:19 5:58
32 1:34 2:17 9:48
Table 8: Time of training by minutes for the PINN model using different parameters
Refer to caption
Figure 11: An illustration showing the comparison of three different methods for solving the heat equation, including an analytical solution, PINN and a finite difference method at time t=0.
Refer to caption
Figure 12: An illustration showing the comparison of three different methods for solving the heat equation, including an analytical solution, PINN, and a finite difference method at time t=1.

The RMSE between the analytical solution and the solution obtained from the PINN is found to be on average of the order of 10210^{-2}. This indicates that the PINN method is able to approximate the analytical solution with a certain level of accuracy. Furthermore, the RMSE between the finite difference solution and the PINN solution is also found to be of the same order. However, a direct correlation between the RMSE and the increase in the number of layers and neurons per layer in the PINN is not observed. Nevertheless, some configurations of the PINN are able to approximate the analytical solution as well as the finite difference, demonstrating the effectiveness of this method for solving the heat equation.

3.5 System of Korteweg-De Vries Equations

The Korteweg-de Vries (KDV) equation is a non-linear partial differential equation that describes solitary waves, also known as solitons, and arises from the Navier-Stokes equation, by neglecting the dispersion term. Solitons, are responsible for tsunamis and tidal bores, which can cause significant damage and threaten marine safety. To prevent such disasters, geophysicists study the production process and formation conditions of these waves. Solitary waves are rare maritime occurrences that were first observed by John Scott Russell in 1834. They are exceedingly dangerous due to their unexpected nature and amplitude, which far exceeds the height of swell waves. Characterizing the propagation mode of solitons can help identify them, as they propagate while preserving nearly all of the initial energy. Solitons can be explained using nonlinear models, as they do not conform to linear approaches like the Airy model. The speed of a solitary wave is proportional to its amplitude, a phenomenon known as wave front stiffening, which causes its nonlinear action. The Korteweg de Vries equation is derived from the assumption that the non-linear effect is added to the linear effect reflecting the wave’s dispersion, and the soliton is the compensation of the two effects. Soliton solutions can be obtained using numerical analysis, such as the finite difference method or the Lax-Wendrof approach, which can solve the KDV and Burger’s equations. Solitons can be found in various fields of physics, including solid mechanics and optics, and have real-world applications such as data transmission in telecommunications. PINN approach can be used to solve such equations [48].

Equation

The coupled Korteweg-De Vries Equations are given by

{ut=6auux2bvvxa3ux3,x[250,250],t[0,10]vt=3auvx+3vx3\large{\begin{cases}\frac{\partial u}{\partial t}=6au\frac{\partial u}{\partial x}-2bv\frac{\partial v}{\partial x}-a\frac{\partial^{3}u}{\partial x^{3}},\hskip 8.5359ptx\in[-250,250],\hskip 11.38109ptt\in[0,10]\\ \frac{\partial v}{\partial t}=-3au\frac{\partial v}{\partial x}+\frac{\partial^{3}v}{\partial x^{3}}\end{cases}}

with the initial conditions:

u(x,0)=F(x),v(x,0)=G(x)with x[250,250]u(x,0)=F(x),\hskip 11.38109ptv(x,0)=G(x)\hskip 5.69054pt\text{with x}\in[-250,250]

and the boundary conditions:

u(250,t)=v(250,t)=0u(-250,t)=v(-250,t)=0

u(250,t)=v(250,t)=0u(250,t)=v(250,t)=0

Analytical solution

The analytical solution of the equation is the couple (u,v) with

u(x,t)=2λ2sech2(ξ)u(x,t)=2\lambda^{2}sech^{2}(\xi)
v(x,t)=12w1/2sech(ξ)v(x,t)=\frac{1}{2w^{1/2}}sech(\xi)

where:

ξ=λ(xλ2t)+12log(w)\xi=\lambda(x-\lambda^{2}t)+\frac{1}{2\log(w)}

w=b8(4a+1)λ4w=\frac{-b}{8(4a+1)\lambda^{4}}

In this paper, we take a=18a=-\frac{1}{8}, b=-3, λ=12\lambda=\frac{1}{2}

Numerical schema

The solution domain is discretized into cells described by the node set (xi,tn)(x_{i},t_{n}) in which xj=ihx_{j}=ih, tn=nkt_{n}=nk (i = 0,1,…,I; n = 0,1,…,N) h=Δxh=\Delta x is a spatial mesh size, k=Δtk=\Delta t is the time step and u(xi,tn)=uinu(x_{i},t_{n})=u_{i}^{n}

uin+1=uin34ΔtΔxuin(uinui1n)+6ΔtΔxvin(vinvi1n)+116Δt(Δx)3(ui+2n2ui+1n+2ui1kui2k)u_{i}^{n+1}=u_{i}^{n}-\frac{3}{4}\frac{\Delta t}{\Delta x}u_{i}^{n}(u_{i}^{n}-u_{i-1}^{n})+6\frac{\Delta t}{\Delta x}v_{i}^{n}(v_{i}^{n}-v_{i-1}^{n})+\frac{1}{16}\frac{\Delta t}{(\Delta x)^{3}}(u_{i+2}^{n}-2u_{i+1}^{n}+2u_{i-1}^{k}-u_{i-2}^{k})
vin+1=vin3ΔtΔxuin(vikvi1k)+12Δt(Δx)3(vi+2k2vi+1k+2vi1kvi2k)v_{i}^{n+1}=v_{i}^{n}-3\frac{\Delta t}{\Delta x}u_{i}^{n}(v_{i}^{k}-v_{i-1}^{k})+\frac{1}{2}\frac{\Delta t}{(\Delta x)^{3}}(v_{i+2}^{k}-2v_{i+1}^{k}+2v_{i-1}^{k}-v_{i-2}^{k})
1
2import torch
3import numpy as np
4from tqdm import tqdm
5import matplotlib.pyplot as plt
6import torch.nn as nn
7from random import uniform
8device = torch.device(’cuda:0’ if torch.cuda.is_available() else ’cpu’)
9torch.set_default_tensor_type(’torch.cuda.FloatTensor’)
10print(device)
11
12def omega(a,b,lambd):
13 return -b/(8*(4*a+1)*lambd**4)
14
15def xi(x,t,a,b,lambd):
16 return lambd*(x-t*lambd**2) + 1/(2*np.log(omega(a,b,lambd)))
17
18def fct_u(x,t,a,b,lambd):
19 return (2*lambd**2)/(np.cosh(xi(x,t,a,b,lambd))**2)
20
21def fct_v(x,t,a,b,lambd):
22 return 1/(2*np.sqrt(omega(a,b,lambd))*np.cosh(xi(x,t,a,b,lambd)))
23
24a = -1/8
25b = -3
26lambd = 0.5
27
28N_uv = 1000
29N_f = 15000
30
31#Make X_uv_train
32#BC x=-10 & tt t > 0
33x_left = np.ones((N_uv//4,1), dtype=float)*(-250)
34t_left = np.random.uniform(low=0.001, high=10.0, size=(N_uv//4,1))
35X_left = np.hstack((x_left, t_left))
36
37#BC x=10 & tt t > 0
38x_right = np.ones((N_uv//4,1), dtype=float)*(250)
39t_right = np.random.uniform(low=0.001, high=10.0, size=(N_uv//4,1))
40X_right = np.hstack((x_right, t_right))
41
42#IC t=0 & tt x,y in [-10,10]
43t_zero = np.zeros((N_uv//2,1), dtype=float)
44x_zero = np.random.uniform(low=-250.0, high=250.0, size=(N_uv//2,1))
45X_zero = np.hstack((x_zero, t_zero))
46
47X_uv_train = np.vstack((X_left, X_right, X_zero))
48# shuffling
49index=np.arange(0,N_uv)
50np.random.shuffle(index)
51X_uv_train=X_uv_train[index,:]
52
53#Make u_train
54u_left = np.zeros((N_uv//4,1), dtype=float)
55u_right = np.zeros((N_uv//4,1), dtype=float)
56u_initial = fct_u(x_zero,0,a,b,lambd)
57u_train = np.vstack((u_left, u_right, u_initial))
58
59#Make v_train
60v_left = np.zeros((N_uv//4,1), dtype=float)
61v_right = np.zeros((N_uv//4,1), dtype=float)
62v_initial = fct_v(x_zero,0,a,b,lambd)
63v_train = np.vstack((v_left, v_right, v_initial))
64
65# ==========================================
66u_train=u_train[index,:]
67v_train=v_train[index,:]
68# ==========================================
69# make X_f_train
70X_f_train=np.zeros((N_f,2),dtype=float)
71for row in range(N_f):
72 x=uniform(-250,250)
73 t=uniform(0,10)
74 X_f_train[row,0]=x
75 X_f_train[row,1]=t
76
77X_f_train=np.vstack((X_f_train, X_uv_train))
78
79class PhysicsInformedNN():
80 def __init__(self,X_uv,u,v,X_f):
81 # x & t from boundary conditions
82 self.x_uv = torch.tensor(X_uv[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
83 self.t_uv = torch.tensor(X_uv[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
84 # x & t from collocation points
85 self.x_f = torch.tensor(X_f[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
86 self.t_f = torch.tensor(X_f[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
87 # boundary solution
88 self.u = torch.tensor(u, dtype=torch.float32)
89 self.v = torch.tensor(v, dtype=torch.float32)
90 # null vector to test against f:
91 self.null = torch.zeros((self.x_f.shape[0], 1))
92 # initialize net:
93 self.create_net()
94
95 self.optimizer = torch.optim.Adam(self.net.parameters(),lr=0.001)
96 # typical MSE loss (this is a function):
97 self.loss = nn.MSELoss()
98 # loss :
99 self.ls = 0
100 # iteration number:
101 self.iter = 0
102
103 def create_net(self):
104 self.net=nn.Sequential(
105 nn.Linear(2,32), nn.Sigmoid(),
106 nn.Linear(32, 32), nn.Sigmoid(),
107 nn.Linear(32, 32), nn.Sigmoid(),
108 nn.Linear(32, 32), nn.Sigmoid(),
109 nn.Linear(32, 32), nn.Sigmoid(),
110 nn.Linear(32, 32), nn.Sigmoid(),
111 nn.Linear(32, 32), nn.Sigmoid(),
112
113 nn.Linear(32, 2))
114 def net_uv(self,x,t):
115 uv=self.net(torch.hstack((x,t)))
116 return uv
117
118
119 def net_fg(self,x,t):
120 uv = self.net_uv(x,t)
121 u = uv[:,0].reshape(-1,1).to(device)
122 v = uv[:,1].reshape(-1,1).to(device)
123
124 #u partial derivatives
125 u_t = torch.autograd.grad(u,t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
126 u_x = torch.autograd.grad(u,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
127 u_xx = torch.autograd.grad(u_x,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
128 u_xxx = torch.autograd.grad(u_xx,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
129
130 #v partial derivatives
131 v_t = torch.autograd.grad(v,t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
132 v_x = torch.autograd.grad(v,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
133 v_xx = torch.autograd.grad(v_x,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
134 v_xxx = torch.autograd.grad(v_xx,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
135
136 f = u_t - 6*a*u*u_x -2*b*v*v_x - a*u_xxx
137 g = v_t +3*u*v_x + v_xxx
138
139 f = f.to(device)
140 g = g.to(device)
141
142 return f,g
143
144 def closure(self):
145
146 # reset gradients to zero
147 self.optimizer.zero_grad()
148
149 # u & f predictions
150 uv_prediction = self.net_uv(self.x_uv, self.t_uv)
151 u_prediction = uv_prediction[:,0].reshape(-1,1)
152 v_prediction = uv_prediction[:,1].reshape(-1,1)
153 #
154 f_prediction_u, f_prediction_v = self.net_fg(self.x_f, self.t_f)
155
156 #
157 # losses:
158 u_loss = self.loss(u_prediction, self.u)
159 v_loss = self.loss(v_prediction, self.v)
160 f_loss_u = self.loss(f_prediction_u, self.null)
161 f_loss_v = self.loss(f_prediction_v, self.null)
162
163 self.ls = u_loss + v_loss + f_loss_u + f_loss_v
164
165 # derivative with respect to net’s weights:
166 self.ls.backward()
167
168 # increase iteration count:
169 self.iter += 1
170
171 # print report:
172 if not self.iter % 100:
173 print(’Epoch: {0:}, Loss: {1:6.8f}’.format(self.iter, self.ls))
174 return self.ls
175
176 def train(self):
177 for epoch in range(15000):
178 self.net.train()
179 self.optimizer.step(self.closure)
180
181pinn = PhysicsInformedNN(X_uv_train,u_train,v_train, X_f_train)
182pinn.train()
183
184import matplotlib.pyplot as plt
185from mpl_toolkits.axes_grid1 import make_axes_locatable
186#
187x = torch.linspace(-250, 250, 500)
188t = torch.linspace( 0, 10, 500)
189# x & t grids:
190X,T = torch.meshgrid(x,t)
191# x & t columns:
192xcol = X.reshape(-1, 1)
193tcol = T.reshape(-1, 1)
194# one large column:
195sol = pinn.net_uv(xcol, tcol)
196usol = sol[:,0].reshape(-1,1)
197vsol = sol[:,1].reshape(-1,1)
198# reshape solution:
199U = usol.reshape(x.numel(), t.numel())
200V = vsol.reshape(x.numel(), t.numel())
201# transform to numpy:
202xnp = x.cpu().numpy()
203tnp = t.cpu().numpy()
204Unp = U.cpu().detach().numpy()
205Vnp = V.cpu().detach().numpy()
206
207plt.plot(np.linspace(-250,250,500), fct_u(np.linspace(-250,250,500),100*0.02,a,b,lambd), label=’analytic’, color="red")
208plt.plot(np.linspace(-250,250,500),Unp[:,100], label=’pinn’, linestyle=’dashed’)
209plt.xlim(-30,30)
210plt.legend()
211
212plt.plot(np.linspace(-250,250,500), fct_u(np.linspace(-250,250,500),0.02*300,a,b,lambd), label=’analytic’, color=’red’)
213plt.plot(np.linspace(-250,250,500),Unp[:,300], label=’pinn’, linestyle=’dashed’)
214plt.xlim(-50,50)
215plt.legend()
216
217plt.plot(np.linspace(-250,250,500), fct_v(np.linspace(-250,250,500),25*0.02,a,b,lambd), label=’analytic’, color=’red’)
218plt.plot(np.linspace(-250,250,500),Vnp[:,25], label=’pinn’, linestyle=’dashed’)
219plt.xlim(-50,50)
220plt.legend()
221
222x = np.linspace(-250,250,500)
223t = np.linspace(0,10,500)
224
225ms_x, ms_t = np.meshgrid(x,t)
226
227U_an = fct_u(ms_x, ms_t, a,b, lambd)
228V_an = fct_v(ms_x, ms_t, a,b, lambd)
229
230#RMSE analytic & PINN
231np.sqrt(np.mean((Unp-U_an.T)**2))
232# 0.00152934398676765
233
234#RMSE analytic & PINN
235np.sqrt(np.mean((Vnp-V_an.T)**2))
236#0.001979420025691176
237
238a = -1/8
239b = -3
240lambd = 0.5
241
242delta_x = 1
243delta_t = 0.02
244
245x = np.arange(-250,250,delta_x)
246t = np.arange(0,10,delta_t)
247
248alpha = delta_t/delta_x
249beta = delta_t/(delta_x**3)
250
251n = len(x)
252m = len(t)
253
254u_df = np.zeros((n,m))
255v_df = np.zeros((n,m))
256
257#For u
258#BC
259u_df[0,:] = 0
260u_df[-1,:] = 0
261
262#IC
263u_df[:,0] = fct_u(x,0,a,b,lambd)
264
265#For v
266#BC
267v_df[0,:] = 0
268v_df[-1,:] = 0
269
270#IC
271v_df[:,0] = fct_v(x,0,a,b,lambd)
272
273for k in tqdm(range(0,m-1)):
274 for i in range(2,n-3):
275 u_df[i,k+1] = u_df[i,k] + 6*a*alpha*u_df[i,k]*(u_df[i,k]-u_df[i-1,k]) - 2*b*alpha*v_df[i,k]*(v_df[i,k]-v_df[i-1,k]) - 0.5*a*beta*(u_df[i+2,k]-2*u_df[i+1,k]+2*u_df[i-1,k]-u_df[i-2,k])
276 v_df[i,k+1] = v_df[i,k] - 3*alpha*u_df[i,k]*(v_df[i,k]-v_df[i-1,k]) + 0.5*beta*(v_df[i+2,k]-2*v_df[i+1,k]+2*v_df[i-1,k]-v_df[i-2,k])
277
278#RMSE PINN & finite difference
279np.sqrt(np.mean((Unp-u_df)**2))
280# 0.010952436812997277
281
282#RMSE analytic & finite difference
283np.sqrt(np.mean((U_an.T-u_df)**2))
284# 0.010989140837703614
285
286#RMSE PINN & finite difference
287np.sqrt(np.mean((Vnp-v_df)**2))
288# 0.006990164623215043
289
290#RMSE analytic & finite difference
291np.sqrt(np.mean((V_an-v_df)**2))
292# 0.015797690668562524
Listing 7: An implementation of the PINN resolution for the KdV system.

Comparison beetween PINN, analytical solution and finite difference method

Refer to caption
(a)
Refer to caption
(b)
Figure 13: The figure on the left shows ’u’ while the right shows ’v’ at t=2. Both solutions are plotted on the analytical method
Refer to caption
(a)
Refer to caption
(b)
Figure 14: The figure on the left shows ’u’ while the right shows ’v’ at t=6. Both solutions are plotted on the analytical method.
Analytical Finite difference
Layers Layers
2 4 8 2 4 8
u v u v u v u v u v u v
N 8 0.007 0.002 0.008 0.0037 0.036 0.012 0.01 0.006 0.0101 0.005 0.01 0.009
16 0.005 0.0019 0.0028 0.007 0.036 0.012 0.009 0.006 0.0102 0.006 0.033 0.009
32 0.005 0.0063 0.0029 0.0012 0.037 0.0065 0.01 0.006 0.0107 0.006 0.033 0.009
Table 9: Comparaison of the RMSE between the (analytical method and the finite difference) and the PINN using different parameters for the Korteweg-De Vries Equations.

The value of the RMSE between the analytical solution and the finite difference is approximately 0.022733 and 0.011562 for u and v, respectively. The RMSE in relation to the number of layers and neurons is on average of the order of 10310^{-3} for u and v. The neural network is able to approximate the finite difference solution of the coupled KDV equation. The RMSE value does not vary linearly with the increase or decrease of the number of layers and neurons per layer.

4 PINN framework for the IBDs spread modeling by PDEs

In this section, we will discuss the combination of the concepts from image analysis and mathematical modeling. Using the PINN method, both of these areas are treated. Indeed, The figure 15 shows a global approach that unifies the mathematical models which exploit the spatial information of the two diseases.

  • The estimation of a score of the disease severity with the use of first-order ODE for spatial distribution. PINN is used to solve these equations and to estimate the score.

  • A transfer learning model for the classification of the visual appearance of the different type of lesions. The deep learning model can use an image dataset annotated by gastroenterologists.

  • Based on the spatial information extracted from images, PINN is used to predict the evolution and the spread of the two diseases.

  • PINN is used to predict the lesions distributions along the colon.

Refer to caption
Figure 15: The experimental framework studied in our paper.

4.1 First-order ODE for spatial distribution

In this part, an ordinary differential equation (ODE) has been used to describe the spatial distribution of colonic lesions in individuals with ulcerative colitis. This model is found effective in describing the disease state and there parameters were correlated with severity assessments provided by gastroenterologists [49, 50]. Our contribution consists in finding an PINN architecture allowing to solve this equation by finding the two parameters the growth rate α\alpha and the abundance cc for the following first-order ODE model:

z(s)=αz(s)z^{\prime}(s)=\alpha*z(s)

with s[0,1]s\in[0,1]

then the analytical solution is

z(s)=ceαsz(s)=c*e^{\alpha*s}
1import torch
2import torch.nn as nn
3import torch.optim as optim
4
5# Define the neural network architecture
6class Net(nn.Module):
7 def __init__(self, n_input, n_hidden_1, n_hidden_2, n_output):
8 super(Net, self).__init__()
9 self.fc1 = nn.Linear(n_input, n_hidden_1)
10 self.fc2 = nn.Linear(n_hidden_1, n_hidden_2)
11 self.fc3 = nn.Linear(n_hidden_2, n_output)
12
13 def forward(self, x):
14 x = self.fc1(x)
15 x = self.fc2(x)
16 x = self.fc3(x)
17 return x
18
19# Define the training loss
20def loss(predicted_z, target_z):
21 return torch.nn.functional.mse_loss(predicted_z, target_z)
22
23# Define the physics-informed constraints
24def ode_constraint(predicted_z, alpha):
25 return predicted_z[1] - alpha * predicted_z[0]
26
27def initial_condition_constraint(predicted_z, initial_z):
28 return predicted_z[0] - initial_z
29
30def train(target_z, initial_z, n_hidden_1, n_hidden_2, learning_rate, n_iter):
31 # Create the neural network
32 net = Net(2, n_hidden_1, n_hidden_2, 1)
33
34 # Create the Adam optimizer
35 optimizer = optim.Adam(net.parameters(), lr=learning_rate)
36
37 # Train the neural network
38 for i in range(n_iter):
39 # Zero the gradients
40 optimizer.zero_grad()
41
42 # Forward pass
43 predicted_z = net(t)
44
45 # Compute the loss
46 l = loss(predicted_z, target_z)
47
48 # Compute the gradients of the loss
49 l.backward()
50
51 # Update the weights
52 optimizer.step()
53
54 # Compute the physics-informed constraints
55 ode_error = ode_constraint(predicted_z, alpha)
56 initial_condition_error = initial_condition_constraint(predicted_z, initial_z)
57
58 # Compute the total error
59 total_error = l + ode_error + initial_condition_error
60
61 # Print the error every 100 iterations
62 if i % 100 == 0:
63 print(total_error)
64
65# Set the hyperparameters
66target_z = ...
67initial_z = ...
68n_hidden_1 = ...
69n_hidden_2 = ...
70learning_rate = ...
71n_iter = ...
72
73# Train the neural network
74train(target_z, initial_z, n_hidden_1, n_hidden_2, learning_rate, n_iter)
Listing 8: A pytorch code predicting the solution of the problem.

4.2 A possible explanation of the Crohn’s disease with a Turing mechanism

4.2.1 A first Turing mechanism presented in [5]

Crohn’s disease is a chronic inflammatory bowel disease characterized by patchy inflammation throughout the gastrointestinal tract. In this study [5], the authors propose a reaction-diffusion system that uses bacteria and phagocytic cells to model the dysfunctional immune responses that cause IBD. They demonstrate that, under specific conditions, the system can generate activator-inhibitor dynamics that lead to the formation of spatially periodic and time-indelible Turing patterns. This is the first time Turing patterns have been applied to an inflammation model, and the study compares the model parameters with realistic parameters from the literature.

The model represents the intestine as an interval on the real axis, taking into account only two components: external bacteria (microbiota, pathogens, or antigens) and immune cells (phagocytes). In a healthy gut, the immune response controls the inflammatory response, while in disease states, the intestinal immune system becomes unbalanced, resulting in excessive migration of immune cells to damaged areas and increased epithelial barrier permeability. These changes allow for further infiltration of the microbiota, which can exacerbate inflammation. A complex network of interactions between these factors initiates the inflammatory cascade that leads to the Crohn’s disease.

β(t,x,y)tdbΔβ(t,x,y)=rb(1β(t,x,y)bi)β(t,x,y)aβ(t,x,y)γ(t,x,y)sb+β(t,x,y)+fe(1β(t,x,y)bi)γ(t,x,y)\frac{\partial\beta(t,x,y)}{\partial t}-d_{b}\Delta\beta(t,x,y)=r_{b}(1-\frac{\beta(t,x,y)}{b_{i}})\beta(t,x,y)-\frac{a\beta(t,x,y)\gamma(t,x,y)}{s_{b}+\beta(t,x,y)}+f_{e}(1-\frac{\beta(t,x,y)}{b_{i}})\gamma(t,x,y)
γ(t,x,y)tdcΔγ(t,x,y)=fbβ(t,x,y)rcγ(t,x,y)\frac{\partial\gamma(t,x,y)}{\partial t}-d_{c}\Delta\gamma(t,x,y)=f_{b}\beta(t,x,y)-r_{c}\gamma(t,x,y)

With the following initial conditions

β(0,x,y)=β0(x,y)\beta(0,x,y)=\beta_{0}(x,y)

and

γ(0,x,y)=γ0(x,y)\gamma(0,x,y)=\gamma_{0}(x,y)

In our study, the boundary conditions of Neumann have been considered. In the paper, a periodic boundary conditions have been considered instead.

parameter description value
rbr_{b} The bacteria reproduction rate per minute 0.0347min10.0347\;min^{-1}
rcr_{c} The phagocytes intrinsic death rate per minute 0.02min10.02\;min^{-1}
dbd_{b} The bacteria diffusion rate per minute 1013m2min110^{-13}\;m^{2}*min^{-1}
dcd_{c} The phagocytes diffusion rate per minute 1010m2min110^{-10}\;m^{2}*min^{-1}
bib_{i} The bacteria density in the lumen 1017m310^{17}\;m^{3}
fbf_{b} The rate of the immune response par minute 0.002min10.002\;min^{-1}
aa the product between sbs_{b} and pcp_{c} 0.3129min10.3129\;min^{-1}
sbs_{b} a cofficient of proportionality between pcp_{c} and aa 1015m310^{15}\;m^{3}
fef_{e} The rate per minute of the epithelium porosity 0.0856min10.0856\;min^{-1}
Table 10: The table shows the value of the different parameters used in the Turing system. The different values are extracted from [5].
1import numpy as np
2import matplotlib.pyplot as plt
3
4def reac_diff_solver_1D():
5 L = 3000
6 N = 3000
7 T = 200000
8 dt = 1
9
10 x = np.linspace(0, L, N)
11 dx = x[1] - x[0]
12 t = np.arange(0, T+dt, dt)
13 t_num = len(t)
14
15 d_b = 10**0
16 d_c = 10**5
17
18 # r_c = 0.0002 # To get a Turing process
19 r_c = 2 # This value kill the Turing process
20 b_i = 10**17 # To get a Turing process
21 r_b = 0.0347
22 k = 0.1 # (c*=k*b*)
23 f_b = k*r_c
24
25 s_b = 10**15
26 theta = 0.3 # (b*=theta*b_i)
27 alpha = 0.3129
28
29 f_e = alpha*theta*b_i/((s_b+theta*b_i)*(1-theta))-r_b/k
30
31 Kb = dt/(dx**2)*d_b
32 Kc = dt/(dx**2)*d_c
33
34 b0 = 1*10**15*(1495 < x)*(x < 1505)
35
36 c0 = np.zeros_like(b0)
37
38 Ab = np.diag((1+2*Kb)*np.ones(N)) - np.diag(Kb*np.ones(N-1), 1) - np.diag(Kb*np.ones(N-1), -1)
39 Ab[0, 0] = 1 + Kb
40 Ab[-1, -1] = 1 + Kb
41
42 Ac = np.diag((1+2*Kc+dt*r_c)*np.ones(N)) - np.diag(Kc*np.ones(N-1), 1) - np.diag(Kc*np.ones(N-1), -1)
43 Ac[0, 0] = 1 + Kc + dt*r_c
44 Ac[-1, -1] = 1 + Kc + dt*r_c
45
46 B = np.zeros((t_num, N))
47 B[0, :] = b0
48
49 C = np.zeros((t_num, N))
50 C[0, :] = c0
51
52 source_b = np.zeros(N)
53 source_c = np.zeros(N)
54 MD_b = np.zeros(N)
55 MD_c = np.zeros(N)
56
57 plt.figure()
58 plt.plot(10**(-6)*x, B[0, :], "blue", 10**(-6)*x, C[0, :], "red")
59 for j in range(1, t_num):
60
61 MD_b = B[j-1, :] + dt*f_e*C[j-1, :]*(1 - B[j-1, :]/b_i) + dt*r_b*(1 - B[j-1, :]/b_i)*B[j-1, :] - dt*alpha*B[j-1, :]*C[j-1, :]/(s_b + B[j-1, :])
62 Ab_extended = Ab
63 B[j, :] = np.linalg.solve(Ab_extended, MD_b)
64 MD_c = C[j-1, :] + dt*f_b*B[j, :]
65 C[j, :] = np.linalg.solve(Ac, MD_c)
66
67 plt.plot(10**(-6)*x, B[j, :], "blue", 10**(-6)*x, C[j, :], "red")
68 plt.xlim([0, 3*10**(-3)])
69 plt.ylim([0, 10**(17)])
70 plt.legend(["bacteria at time t="+str(t[j])])
71 plt.pause(0.00001)
72
73 plt.figure()
74 plt.plot(10**(-6)*x, B[-1, :], "blue", 10**(-6)*x, 100*C[-1, :], "red", 10**(-6)*x, B[0, :], "yellow")
75 plt.xlabel("spatial variable $x$ (meters)")
76 plt.ylabel("temporal variables")
77 plt.legend(["bacteria", "chemoattractant", "initial bacteria"])
78
79reac_diff_solver_1D()
Listing 9: The code exposes the resolution of the 1D system by a finite difference schema.
1import numpy as np
2import matplotlib.pyplot as plt
3from mpl_toolkits.mplot3d import Axes3D
4
5def reac_diff_solver_2D():
6 L = 3000
7 N = 100
8 T = 20000
9
10 x = np.linspace(0, L, N)
11 y = x
12 dx = np.max(np.abs(np.diff(x)))
13 dt = 1
14 R = 0.5
15 tempIt = int(T/dt)
16
17 Xi, Yi = np.meshgrid(x, y)
18
19 d_b = 10**0
20 d_c = 10**3
21 r_c = 0.02
22 b_i = 10**17
23 r_b = 0.0347
24 k = 0.1
25 f_b = k*r_c
26 s_b = 10**15
27 theta = 0.3
28 alpha = 0.3129
29
30 f_e = alpha*theta*b_i/((s_b+theta*b_i)*(1-theta))-r_b/k
31
32 Rb = dt/(2*dx**2)*d_b
33 Rc = dt/(2*dx**2)*d_c
34
35 b0 = 10**15*np.double(np.abs(Xi-L/2)<50)
36 c0 = 0*b0
37
38 X = np.reshape(Xi, (N)**2, 1)
39 Y = np.reshape(Yi, (N)**2, 1)
40
41 B_aux = np.array([b0[:,0], b0, b0[:, -1]])
42 B_aux = np.vstack([B_aux[0,:], B_aux, B_aux[-1,:]])
43 B_vector = np.reshape(B_aux, (N+2)**2, 1)
44 C_aux = np.array([c0[:,0], c0, c0[:, -1]])
45 C_aux = np.vstack([C_aux[0,:], C_aux, C_aux[-1,:]])
46 C_vector = np.reshape(C_aux, (N+2)**2, 1)
47
48 t = 0
49
50 Mb = np.diag((1+2*Rb)*np.ones((N+2)**2,1)) - np.diag(Rb*np.ones((N+2)**2-1,1), -1) - np.diag(Rb*np.ones((N+2)**2-1,1), 1)
51 Mb = np.asarray(Mb).tolist()
52 Mc = np.diag((1+2*Rc+dt*r_c)*np.ones((N+2)**2,1)) - np.diag(Rc*np.ones((N+2)**2-1,1), -1) - np.diag(Rc*np.ones((N+2)**2-1,1), 1)
53 Mc = np.asarray(Mc).tolist()
54 Ab = np.diag((1-2*Rb)*np.ones((N+2)**2,1)) + np.diag(Rb*np.ones((N+2)**2-(N+2),1), (N+2)) + np.diag(Rb*np.ones((N+2)**2-(N+2),1), -(N+2))
55 Ab = np.asarray(Ab).tolist()
56 Ac = np.diag((1-2*Rc)*np.ones((N+2)**2,1)) + np.diag(Rc*np.ones((N+2)**2-(N+2),1), (N+2)) + np.diag(Rc*np.ones((N+2)**2-(N+2),1), -(N+2))
57 Ac = np.asarray(Ac).tolist()
58
59 Mbinv = np.linalg.inv(Mb)
60 Gb = Mbinv*Ab
61 Mcinv = np.linalg.inv(Mc)
62 Gc = Mcinv*Ac
63
64 fig = plt.figure(1)
65 ax1 = fig.add_subplot(121, projection=’3d’)
66 bsurf = ax1.plot_surface(Xi, Yi, b0)
67 ax2 = fig.add_subplot(122, projection=’3d’)
68 csurf = ax2.plot_surface(Xi, Yi, c0)
69 ptit = plt.title(f’t = {t}’)
70 plt.xlim([0, L])
71 plt.ylim([0, L])
72
73 for it in range(1, tempIt+1):
74 t = t+dt
75 B_vector = Gb*B_vector
76 C_vector = Gc*C_vector
77
78 B_aux = np.reshape(B_vector, (N+2, N+2))
79 C_aux = np.reshape(C_vector, (N+2, N+2))
80
81 B_aux[1:N+1, 1:N+1] = B_aux[1:N+1, 1:N+1] + dt*(f_b*C_aux[1:N+1, 1:N+1] - theta*B_aux[1:N+1, 1:N+1] + b_i*(1-theta))
82 C_aux[1:N+1, 1:N+1] = C_aux[1:N+1, 1:N+1] + dt*(r_c*C_aux[1:N+1, 1:N+1] + f_e*B_aux[1:N+1, 1:N+1])
83
84 B_vector = np.reshape(B_aux, (N+2)**2, 1)
85 C_vector = np.reshape(C_aux, (N+2)**2, 1)
86
87 if it % 50 == 0:
88 bsurf.set_zdata(np.reshape(B_vector, (N+2, N+2))[1:N+1, 1:N+1])
89 csurf.set_zdata(np.reshape(C_vector, (N+2, N+2))[1:N+1, 1:N+1])
90 ptit.set_text(f’t = {t}’)
91 plt.draw()
92 plt.pause(0.001)
93 plt.show()
Listing 10: The code exposes the resolution of the 2D system ny a finite difference method.
1import torch
2import numpy as np
3import torch.nn as nn
4from random import uniform, random
5import random
6import matplotlib.pyplot as plt
7device = torch.device(’cuda:0’ if torch.cuda.is_available() else ’cpu’)
8torch.set_default_tensor_type(’torch.cuda.FloatTensor’)
9print(device)
10
11rb = 0.0347
12rc = 2
13db = 10**(0)
14dc = 10**(5)
15bi = 10**(7)
16k=0.1
17fb = k*rc
18alpha = 0.3129
19sb = 10**(5)
20theta = 0.3
21fe = alpha*theta*bi/((sb+theta*bi)*(1-theta))-rb/k
22
23N_uv = 14000
24N_f = 20000
25
26#Make X_uv_train
27#BC x=-10 & tt t > 0
28x_left = np.zeros((N_uv//4,1), dtype=float)
29t_left = np.random.uniform(low=0.0, high=1500.0, size=(N_uv//4,1))
30X_left = np.hstack((x_left, t_left))
31
32#BC x=10 & tt t > 0
33x_right = np.ones((N_uv//4,1), dtype=float)*(3000)*10**(-6)
34t_right = np.random.uniform(low=0.0, high=1500.0, size=(N_uv//4,1))
35X_right = np.hstack((x_right, t_right))
36
37#IC t=0 & tt x,y in [-10,10]
38t_zero = np.zeros((N_uv//2,1), dtype=float)
39x_zero = np.random.uniform(low=0.0, high=3000*10**(-6), size=(N_uv//2,1))
40X_zero = np.hstack((x_zero, t_zero))
41
42X_uv_train = np.vstack((X_left, X_right, X_zero))
43# shuffling
44index=np.arange(0,N_uv)
45np.random.shuffle(index)
46X_uv_train=X_uv_train[index,:]
47
48#Make u_train
49u_left = np.zeros((N_uv//4,1), dtype=float)
50u_right = np.zeros((N_uv//4,1), dtype=float)
51#u_initial = list_indicator(np.sort(x_zero, axis=0),1400*10**(-6),1600*10**(-6)).reshape(N_uv//2,1)*10**(5)
52#u_initial = np.zeros((N_uv//2,1), dtype=float)
53#u_initial[N_uv//4 -50: N_uv//4 +50] = 10**(5)
54u_initial = np.exp(-((x_zero*10**(6) - N_uv//4)**2)/10000)*10**(5)
55u_train = np.vstack((u_left, u_right, u_initial))
56
57
58#Make v_train
59v_left = np.zeros((N_uv//4,1), dtype=float)
60v_right = np.zeros((N_uv//4,1), dtype=float)
61v_initial = np.zeros((N_uv//2,1), dtype=float)
62v_train = np.vstack((v_left, v_right, v_initial))
63
64# ==========================================
65u_train=u_train[index,:]
66v_train=v_train[index,:]
67# ==========================================
68# make X_f_train
69X_f_train=np.zeros((N_f,2),dtype=float)
70for row in range(N_f):
71 #x=random.randint(0,3000)*10**(-6)
72 x = uniform(0,3000*10**(-6))
73 #t=random.randint(0,5000)
74 t = uniform(0,1500)
75 X_f_train[row,0]=x
76 X_f_train[row,1]=t
77
78X_f_train=np.vstack((X_f_train, X_uv_train))
79
80class PhysicsInformedNN():
81 def __init__(self,X_uv,u,v,X_f):
82 # x & t from boundary conditions
83 self.x_uv = torch.tensor(X_uv[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
84 self.t_uv = torch.tensor(X_uv[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
85 # x & t from collocation points
86 self.x_f = torch.tensor(X_f[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
87 self.t_f = torch.tensor(X_f[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
88 # boundary solution
89 self.u = torch.tensor(u, dtype=torch.float32)
90 self.v = torch.tensor(v, dtype=torch.float32)
91 # null vector to test against f:
92 self.null = torch.zeros((self.x_f.shape[0], 1))
93 # initialize net:
94 self.create_net()
95
96 self.optimizer = torch.optim.Adam(self.net.parameters(),lr=0.001)
97 # typical MSE loss (this is a function):
98 self.loss = nn.MSELoss()
99 # loss :
100 self.ls = 0
101 # iteration number:
102 self.iter = 0
103
104 def create_net(self):
105 self.net=nn.Sequential(
106 nn.Linear(2,16), nn.Sigmoid(),
107 nn.Linear(16, 16), nn.Sigmoid(),
108 nn.Linear(16, 16), nn.Sigmoid(),
109 nn.Linear(32, 32), nn.Sigmoid(),
110 nn.Linear(32, 32), nn.Sigmoid(),
111 nn.Linear(32, 32), nn.Sigmoid(),
112 nn.Linear(32, 32), nn.Sigmoid(),
113 nn.Linear(32, 2))
114
115 def net_uv(self,x,t):
116 uv=self.net(torch.hstack((x,t)))
117 return uv
118
119
120 def net_fg(self,x,t):
121
122 rb = 0.0347
123 rc = 2
124 db = 10**(0)
125 dc = 10**(5)
126 bi = 10**(7)
127 k=0.1
128 fb = k*rc
129 alpha = 0.3129
130 sb = 10**(5)
131 theta = 0.3
132 fe = alpha*theta*bi/((sb+theta*bi)*(1-theta))-rb/k
133
134 uv = self.net_uv(x,t)
135 u = uv[:,0].reshape(-1,1).to(device)
136 v = uv[:,1].reshape(-1,1).to(device)
137
138 #u partial derivatives
139 u_t = torch.autograd.grad(u,t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
140 u_x = torch.autograd.grad(u,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
141 u_xx = torch.autograd.grad(u_x,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
142
143
144 #v partial derivatives
145 v_t = torch.autograd.grad(v,t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
146 v_x = torch.autograd.grad(v,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
147 v_xx = torch.autograd.grad(v_x,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
148
149
150 f = u_t - db*u_xx - rb*(1 - u/bi)*u + (alpha*u*v)/(sb + u) - fe*(1 - u/bi)*v
151 g = v_t - dc*v_xx - fb*u + rc*v
152
153 f = f.to(device)
154 g = g.to(device)
155
156 return f,g
157
158 def closure(self):
159
160 # reset gradients to zero
161 self.optimizer.zero_grad()
162
163 # u & f predictions
164 uv_prediction = self.net_uv(self.x_uv, self.t_uv)
165 u_prediction = uv_prediction[:,0].reshape(-1,1)
166 v_prediction = uv_prediction[:,1].reshape(-1,1)
167 #
168 f_prediction_u, f_prediction_v = self.net_fg(self.x_f, self.t_f)
169
170 #
171 # losses:
172 u_loss = self.loss(u_prediction, self.u)
173 v_loss = self.loss(v_prediction, self.v)
174 f_loss_u = self.loss(f_prediction_u, self.null)
175 f_loss_v = self.loss(f_prediction_v, self.null)
176
177 self.ls = u_loss + v_loss + f_loss_u + f_loss_v
178
179 # derivative with respect to net’s weights:
180 self.ls.backward()
181
182 # increase iteration count:
183 self.iter += 1
184
185 # print report:
186 if not self.iter % 100:
187 print(’Epoch: {0:}, Loss: {1:6.12f}’.format(self.iter, self.ls))
188 return self.ls
189
190 def train(self):
191 for epoch in range(15000):
192 self.net.train()
193 self.optimizer.step(self.closure)
Listing 11: The python code contains the PINN implementation for the first Turing system.

4.2.2 A second Turing mechanism with cubic term

Let’s discuss another activator-inhibitor dynamic in a Turing system, where the diffusion rate of the activator is much slower than that of the inhibitor. In this system, the inhibitor suppresses the production of both components, while the activator component must increase its own production. The system is subject to small perturbations that encourage the emergence of large-scale patterns, according to the Turing model.

The system is described by two coupled non-linear partial differential equations. The first equation describes the time evolution of the activator component, u, as a function of time t and two spatial dimensions x and y. The left-hand side of the equation represents the rate of change of u with respect to time (irreversible over time), while the right-hand side consists of four terms. The first term represents the diffusion of u with a diffusion coefficient a, while the second term represents the self-enhancement of u (a non-linear term with a cubic dependency). The third term represents the inhibition of u by v, and the fourth term is a constant offset, c. The second equation describes the time evolution of the inhibitor component, v, with an additional time constant, τ\tau. The left-hand side of the equation represents the rate of change of v with respect to time, while the right-hand side consists of three terms. The first term represents the diffusion of v with a diffusion coefficient b, while the second term represents the production of v by u. The third term represents the inhibition of v by itself. The coupling of the two equations via the inhibition term of the activator and the production term of the inhibitor results in the formation of spatial patterns in the system.

u(t,x,y)t=aΔu(t,x,y)+u(t,x,y)u3(t,x,y)v(t,x,y)+c\frac{\partial u(t,x,y)}{\partial t}=a*\Delta u(t,x,y)+u(t,x,y)-u^{3}(t,x,y)-v(t,x,y)+c
τv(t,x,y)t=bΔv(t,x,y)+u(t,x,y)v(t,x,y)\tau\frac{\partial v(t,x,y)}{\partial t}=b*\Delta v(t,x,y)+u(t,x,y)-v(t,x,y)
1import numpy as np
2import matplotlib.pyplot as plt
3
4# Constants
5a = 2.8e-4
6b = 5e-3
7tau = .1
8k = -.005
9
10# Grid size and step sizes
11size = 100
12dx = 2. / size
13
14# Total time and number of iterations
15T = 15.0
16dt = .001
17n = int(T / dt)
18
19# Initialize U and V as random arrays
20U = np.random.rand(size, size)
21V = np.random.rand(size, size)
22
23# Function to compute the Laplacian of an array
24def laplacian(Z):
25 Ztop = Z[0:-2, 1:-1]
26 Zleft = Z[1:-1, 0:-2]
27 Zbottom = Z[2:, 1:-1]
28 Zright = Z[1:-1, 2:]
29 Zcenter = Z[1:-1, 1:-1]
30 return (Ztop + Zleft + Zbottom + Zright - 4 * Zcenter) / dx**2
31
32# Function to plot the patterns
33def show_patterns(U, ax=None):
34 ax.imshow(U, cmap=plt.cm.Reds,
35 interpolation=’bilinear’,
36 extent=[-1, 1, -1, 1])
37 ax.set_axis_off()
38
39# Set up the plot
40fig, axes = plt.subplots(3, 3, figsize=(8, 8))
41step_plot = n // 15
42
43# Iterate and update the variables
44for i in range(n):
45 # Compute the Laplacians of U and V
46 deltaU = laplacian(U)
47 deltaV = laplacian(V)
48 # Take the values of U and V inside the grid
49 Uc = U[1:-1, 1:-1]
50 Vc = V[1:-1, 1:-1]
51 # Update U and V
52 U[1:-1, 1:-1], V[1:-1, 1:-1] = (
53 Uc + dt * (a * deltaU + Uc - Uc**3 - Vc + k),
54 Vc + dt * (b * deltaV + Uc - Vc) / tau
55 )
56 # Neumann conditions: set derivatives at the edges to zero
57 for Z in (U, V):
58 Z[0, :] = Z[1, :]
59 Z[-1, :] = Z[-2, :]
60 Z[:, 0] = Z[:, 1]
61 Z[:, -1] = Z[:, -2]
62
63 # Plot the state of the system at 15 different times
64 if i % step_plot == 0 and i < 15 * step_plot:
65 ax = axes.flat[i // step_plot]
66 show_patterns(U, ax=ax)
67 ax.set_title(f’$t={i * dt:.2f}$’)
Listing 12: The resolution of the system of equations by a finite difference method. The code is modified from [51].
Refer to caption
Figure 16: The Crohn’s disease structure formation with a finite difference method. In this simulation, the Neumann boundary conditions have been considered (The solution normal derivative values at the domain boundaries are fixed to zero). The time intervals used do not necessarily correspond to the real elapsed time in the real case, and are only used to describe the dynamics of the systems.
Refer to caption
Refer to caption
Figure 17: The two figures show a strong similarity between the real disease pattern (on the left) and the simulated pattern (on the right). This suggests that the Turing nonlinear system of partial differential equations, solved using the finite difference schema, was able to accurately capture the behavior of the real disease (the underlying dynamics including the interactions between different factors such as the concentration of bacteria and the presence of phagocytes and the spread of the disease in this case).
1import torch
2import numpy as np
3import torch.nn as nn
4from random import uniform
5import matplotlib.pyplot as plt
6from mpl_toolkits.axes_grid1 import make_axes_locatable
7device = torch.device(’cuda:0’ if torch.cuda.is_available() else ’cpu’)
8torch.set_default_tensor_type(’torch.cuda.FloatTensor’)
9print(device)
10#
11N_uv = 400
12N_f = 10000
13
14#Make X_uv_train
15#BC x=-1, tt y, t>0
16x_left = np.ones((N_uv//8,1), dtype=float)*(-1)
17y_left = np.random.uniform(low=-1.0, high=1.0, size=(N_uv//8,1))
18t_left = np.random.uniform(low=0.0001, high=10.0, size=(N_uv//8,1))
19X_left = np.hstack((x_left, y_left, t_left))
20
21#BC x=1, tt y, t>0
22x_right = np.ones((N_uv//8,1), dtype=float)*(1)
23y_right = np.random.uniform(low=-1.0, high=1.0, size=(N_uv//8,1))
24t_right = np.random.uniform(low=0.0001, high=10.0, size=(N_uv//8,1))
25X_right = np.hstack((x_right, y_right, t_right))
26
27#BC y=1, tt x, t>0
28x_upper = np.random.uniform(low=-1.0, high=1.0, size=(N_uv//8,1))
29y_upper = np.ones((N_uv//8, 1), dtype=float)*(1)
30t_upper = np.random.uniform(low=0.0001, high=10.0, size=(N_uv//8,1))
31X_upper = np.hstack((x_upper, y_upper, t_upper))
32
33#BC y=-1, tt x, t>0
34x_lower = np.random.uniform(low=-1.0, high=1.0, size=(N_uv//8,1))
35y_lower = np.ones((N_uv//8,1), dtype=float)*(-1)
36t_lower = np.random.uniform(low=0.0001, high=10.0, size=(N_uv//8,1))
37X_lower = np.hstack((x_lower, y_lower, t_lower))
38
39#IC t=0, tt x,y
40x_zero = np.random.uniform(low=-1.0, high=1.0, size=(N_uv//2,1))
41y_zero = np.random.uniform(low=-1.0, high=1.0, size=(N_uv//2,1))
42t_zero = np.zeros((N_uv//2,1), dtype=float)
43X_zero = np.hstack((x_zero, y_zero, t_zero))
44
45X_uv_train = np.vstack((X_left, X_right, X_upper, X_lower, X_zero))
46
47#Shuffling
48index = np.arange(0,N_uv)
49np.random.shuffle(index)
50X_uv_train = X_uv_train[index,:]
51
52#Make u_train
53u_left = np.zeros((N_uv//8,1), dtype=float)
54u_right = np.zeros((N_uv//8,1), dtype=float)
55u_upper = np.zeros((N_uv//8,1), dtype=float)
56u_lower = np.zeros((N_uv//8,1), dtype=float)
57u_initial = np.random.uniform(low=0.0, high=1.0, size=(N_uv//2,1))
58#u_initial = np.exp(-x_zero**2 - y_zero**2)
59u_train = np.vstack((u_left, u_right, u_upper, u_lower, u_initial))
60
61#Make v_train
62v_left = np.zeros((N_uv//8,1), dtype=float)
63v_right = np.zeros((N_uv//8,1), dtype=float)
64v_upper = np.zeros((N_uv//8,1), dtype=float)
65v_lower = np.zeros((N_uv//8,1), dtype=float)
66v_initial = np.random.uniform(low=0.0, high=1.0, size=(N_uv//2,1))
67#v_initial = np.exp(-x_zero**2 - y_zero**2)
68v_train = np.vstack((v_left, v_right, v_upper, v_lower, v_initial))
69
70#shuffling
71u_train = u_train[index,:]
72v_train = v_train[index,:]
73
74#Make X_f_train
75X_f_train = np.zeros((N_f,3), dtype=float)
76for i in range(N_f):
77 x=uniform(-1,1)
78 y=uniform(-1,1)
79 t=uniform(0,10)
80 X_f_train[i,0] = x
81 X_f_train[i,1] = y
82 X_f_train[i,2] = t
83
84X_f_train = np.vstack((X_f_train, X_uv_train))
85#
86class PhysicsInformedNN():
87 def __init__(self,X_uv,u,v,X_f):
88 # x & t from boundary conditions
89 self.x_uv = torch.tensor(X_uv[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
90 self.y_uv = torch.tensor(X_uv[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
91 self.t_uv = torch.tensor(X_uv[:, 2].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
92 # x & t from collocation points
93 self.x_f = torch.tensor(X_f[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
94 self.y_f = torch.tensor(X_f[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
95 self.t_f = torch.tensor(X_f[:, 2].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
96 # boundary solution
97 self.u = torch.tensor(u, dtype=torch.float32)
98 self.v = torch.tensor(v, dtype=torch.float32)
99 # null vector to test against f:
100 self.null = torch.zeros((self.x_f.shape[0], 1))
101 # initialize net:
102 self.create_net()
103
104 self.optimizer = torch.optim.Adam(self.net.parameters(),lr=0.001)
105 # typical MSE loss (this is a function):
106 self.loss = nn.MSELoss()
107 # loss :
108 self.ls = 0
109 # iteration number:
110 self.iter = 0
111
112 def create_net(self):
113 self.net=nn.Sequential(
114 nn.Linear(3,16), nn.Sigmoid(),
115 nn.Linear(16, 16), nn.Sigmoid(),
116 nn.Linear(16, 16), nn.Sigmoid(),
117 nn.Linear(16, 16), nn.Sigmoid(),
118 nn.Linear(20, 20), nn.Sigmoid(),
119 nn.Linear(32, 32), nn.Sigmoid(),
120 nn.Linear(32, 32), nn.Sigmoid(),
121 nn.Linear(32, 2))
122
123 def net_uv(self,x,y,t):
124 uv=self.net(torch.hstack((x,y,t)))
125 return uv
126
127
128 def net_fg(self,x,y,t):
129
130 a = 2.8e-4
131 b = 5e-3
132 c = -0.005
133 tau=0.1
134
135 uv = self.net_uv(x,y,t)
136 u = uv[:,0].reshape(-1,1).to(device)
137 v = uv[:,1].reshape(-1,1).to(device)
138
139
140 #u partial derivatives
141 u_t = torch.autograd.grad(u,t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
142 u_x = torch.autograd.grad(u,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
143 u_xx = torch.autograd.grad(u_x,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
144 u_y = torch.autograd.grad(u,y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
145 u_yy = torch.autograd.grad(u_y,y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
146
147
148 #v partial derivatives
149 v_t = torch.autograd.grad(v,t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
150 v_x = torch.autograd.grad(v,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
151 v_xx = torch.autograd.grad(v_x,x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
152 v_y = torch.autograd.grad(v,y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
153 v_yy = torch.autograd.grad(v_y,y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
154
155
156 f = u_t - a*(u_xx + u_yy) - u + u**3 + v - c
157 g = v_t - (b*(v_xx + v_yy) + u - v)/tau
158
159 f = f.to(device)
160 g = g.to(device)
161
162 return f,g
163
164 def closure(self):
165
166 # reset gradients to zero
167 self.optimizer.zero_grad()
168
169 # u & f predictions
170 uv_prediction = self.net_uv(self.x_uv, self.y_uv, self.t_uv)
171 u_prediction = uv_prediction[:,0].reshape(-1,1)
172 v_prediction = uv_prediction[:,1].reshape(-1,1)
173 #
174 f_prediction_u, f_prediction_v = self.net_fg(self.x_f, self.y_f, self.t_f)
175
176 #
177 # losses:
178 u_loss = self.loss(u_prediction, self.u)
179 v_loss = self.loss(v_prediction, self.v)
180 f_loss_u = self.loss(f_prediction_u, self.null)
181 f_loss_v = self.loss(f_prediction_v, self.null)
182
183 self.ls = u_loss + v_loss + f_loss_u + f_loss_v
184
185 # derivative with respect to net’s weights:
186 self.ls.backward()
187
188 # increase iteration count:
189 self.iter += 1
190
191 # print report:
192 if not self.iter % 100:
193 print(’Epoch: {0:}, Loss: {1:6.8f}’.format(self.iter, self.ls))
194 return self.ls
195
196 def train(self):
197 for epoch in range(6000):
198 self.net.train()
199 self.optimizer.step(self.closure)
200
201pinn = PhysicsInformedNN(X_uv_train,u_train,v_train, X_f_train)
202pinn.train()
203#
204x = torch.linspace(-1, 1, 200)
205y = torch.linspace(-1, 1, 200)
206t = torch.linspace( 0, 10, 200)
207# x & t grids:
208X,Y,T = torch.meshgrid(x,y,t)
209# x & t columns:
210xcol = X.reshape(-1, 1)
211ycol = Y.reshape(-1, 1)
212tcol = T.reshape(-1, 1)
213# one large column:
214sol = pinn.net_uv(xcol, ycol, tcol)
215usol = sol[:,0].reshape(-1,1)
216vsol = sol[:,1].reshape(-1,1)
217# reshape solution:
218U = usol.reshape(x.numel(), y.numel(), t.numel())
219V = vsol.reshape(x.numel(), y.numel(), t.numel())
220# transform to numpy:
221xnp = x.cpu().numpy()
222ynp = y.cpu().numpy()
223tnp = t.cpu().numpy()
224Unp = U.cpu().detach().numpy()
225Vnp = V.cpu().detach().numpy()
226
227# Plot a heat map for the final result
228#plt.imshow(Unp[:,:,-1])
229#plt.imshow(Vnp[:,:,-1])
Listing 13: The code used implements the PINN architecture aiming to solve the second Turing system.
Finite difference solution
Neurons
8 16 32
U V U V U V
Layers 2 0.987 1.233 0.994 1.165 0.857 1.137
4 0.993 1.195 0.993 1.092 1.115 1.238
8 0.989 1.132 1.003 1.120 1.238 1.125
Table 11: RSME between the PINN and the finite difference solution for the Turing reaction-diffusion system.
Refer to caption
Figure 18: Solution using finite difference method of the Turing system at t=0 and t=10.
Refer to caption
Figure 19: The figure shows that the PINN architecture fail to model the complex dynamic of the Crohn’s disease.

The root mean squared error (RMSE) between the solution obtained by using the physics-informed neural network (PINN) and the finite difference solution tends towards 1 for u and v. At first glance, this low RMSE might indicate that the neural network has converged to a solution. However, as evidenced in figure 19, this is not the case in practice. This equation models the formation of patterns as a result of interaction between different pigments, but the two solutions do not show the same patterns. Therefore, in this case, the PINN diverges and fails to approximate the numerical solution. Continued research aims to uncover the root causes behind the inability of PINNs to adequately capture the intricate explosion phenomena in IBDs.

4.3 Fisher-KPP equation: modeling the bacteria/phagocytes couple with one PDE

The Fisher [52] Kolmogorov-Petrovsky-Piskunov [53] equation is a reaction-diffusion equation that plays a significant role in describing various chemical, physical and biological phenomena. It is often used to model the propagation of a single wave, such as the spread of a disease or the front of a chemical reaction. In the case of ulcerative colitis and Crohn’s disease, the Fisher KPP equation can be used to model the spread of these intestinal diseases through the incorporation of factors such as the concentration of bacteria and the presence of phagocytes. One advantage of using the Fisher-KPP equation to model the IBDSs spread is that it is a relatively simple equation that can be solved analytically or numerically. This makes it a good choice for modeling the spread of these diseases, especially in cases where there is limited data available [34].

{u(x,t)=bacteria(x,t)bacteria(x,t)+phagocytes(x,t)bacteria(x,t)+phagocytes(x,t)=N\begin{cases}u(x,t)=\frac{bacteria(x,t)}{bacteria(x,t)+phagocytes(x,t)}\\ bacteria(x,t)+phagocytes(x,t)=N\end{cases}

Fisher-KPP Equation

u(x,t)t=D2u(x,t)x2+ru(x,t)(1u(x,t)),x[50,50]t[0,10],\frac{\partial{u(x,t)}}{\partial t}=D\frac{\partial^{2}{u(x,t)}}{\partial{x^{2}}}+ru(x,t)(1-u(x,t)),\hskip 14.22636ptx\in[-50,50]\hskip 5.69046ptt\in[0,10],\hskip 5.69046pt
u(x,0)={1if x00if x>0u(x,0)=\begin{cases}1&\text{if }x\leq 0\\ 0&\text{if }x>0\end{cases}

And the boundary conditions u(-50,t)=1 and u(50,t)=0 with D=1 and r=1. The diffusion term: D2u(x,t)x2D\frac{\partial^{2}{u(x,t)}}{\partial{x^{2}}} represents the rate at which bacteria in the medium migrate through a linear diffusion process with diffusivity D. The reaction term: ru(x,t)(1u(x,t))ru(x,t)(1-u(x,t)) represents the bacteria proliferation rate in the medium, which is assumed to be proportional to the u(x,t)u(x,t), and the remaining carrying capacity of the environment, (1u(x,t))(1-u(x,t)). The parameter r represents the growth rate and the quantity ru(x,t)(1u(x,t))ru(x,t)(1-u(x,t)) models the bacteria logistic growth. The term (1u(x,t))(1-u(x,t)) represents the limiting factor, which means that the growth rate decreases as both bacteria and phagocytes approaches the carrying capacity of the biological medium.

Numerical schema

The solution domain is discretized into cells described by the node set (xi,tn)(x_{i},t_{n}) in which xj=ihx_{j}=ih, tn=nkt_{n}=nk (i = 0,1,…,I; n = 0,1,…,N) h=Δxh=\Delta x is a spatial mesh size, k=Δtk=\Delta t is the time step and u(xi,tn)=uinu(x_{i},t_{n})=u_{i}^{n}

uin+1=uin+DΔt(Δx)2(ui+1n2uin+ui1n)+rΔtuin(1uin)u_{i}^{n+1}=u_{i}^{n}+D\frac{\Delta t}{(\Delta x)^{2}}(u_{i+1}^{n}-2u_{i}^{n}+u_{i-1}^{n})+r\Delta tu_{i}^{n}(1-u_{i}^{n})
1import numpy as np
2from tqdm import tqdm
3import matplotlib.pyplot as plt
4from mpl_toolkits.mplot3d import Axes3D
5from matplotlib import cm
6from matplotlib.ticker import LinearLocator, FormatStrFormatter
7from mpl_toolkits.axes_grid1 import make_axes_locatable
8
9D=1
10r=1
11delta_x = 0.1
12delta_t = 0.001
13alpha = D*delta_t/(delta_x**2)
14beta = r*delta_t
15x = np.arange(-50,50,delta_x)
16t = np.arange(0,1,delta_t)
17m=len(x)
18n=len(t)
19u_df = np.zeros((len(t), len(x)))
20plt.plot(x, np.heaviside(-x,0))
21#Initial condition
22u_df[0,:] = np.heaviside(-x,0)
23
24#Bondaries conditions
25u_df[:,0] = 1
26u_df[:,-1] = 0
27
28#Filling u matrix for finite difference schema
29for k in tqdm(range(0,n-1)):
30 for i in range(1,m-2):
31 u_df[k+1,i] = u_df[k,i] + alpha*(u_df[k,i+1] - 2*u_df[k,i] + u_df[k,i-1]) + beta*u_df[k,i]*(1-u_df[k,i])
32
33plt.plot(x, u_df[100,:])
34
35#Plotting u for finite difference
36for i in range(0,1000,200):
37 plt.plot(x, u_df[i,:], label=’t=’+str(i*0.001)+’s’)
38 plt.xlim(left=-10, right=10)
39 plt.legend()
40 plt.xlabel(’x’)
41 plt.ylabel(’u(x,t)’)
42
43
44# 3D plotting in the phase space
45fig = plt.figure()
46ax = fig.gca(projection=’3d’)
47
48x = np.arange(-50,50, delta_x)
49t = np.arange(0,1, delta_t)
50
51ms_x, ms_t = np.meshgrid(x,t)
52
53
54surf = ax.plot_surface(ms_x, ms_t, u_df , cmap=cm.coolwarm, linewidth=0, antialiased=False)
55
56ax.zaxis.set_major_locator(LinearLocator(10))
57ax.zaxis.set_major_formatter(FormatStrFormatter(’%.02f’))
58
59ax.set_xlabel(’x’)
60ax.set_ylabel(’t’)
61#ax.set_zlabel(’u(x,t)’)
62
63fig.colorbar(surf, shrink=0.5, aspect=5)
64
65plt.show()
Listing 14: Implementation of the Fisher-KPP resolution with a finite difference schema.
Refer to caption
Figure 20: Solution of the Fisher-KPP equation using a finite difference method in the phase space.

Comparison

Refer to caption
Refer to caption
Figure 21: PINN, analytical and finite difference solution side by side
Refer to caption
Refer to caption
Refer to caption
Figure 22: PINN and finite difference solution at t=1s, t=5s and t=9s
Analytical solution
8 16 32
Layers 2 0.028321 0.074042 0.040836
4 0.037587 0.009196 0.027296
8 0.043136 0.022766 0.013606
Table 12: RSME between PINN and analytical solution for Fisher-KPP equation
Neurons
8 16 32
Layers 2 1:39 1:43 2:06
4 2:41 2:39 4:02
8 4:24 4:48 8:00
Table 13: Execution time according to the number of layers and neurons for the Fisher-KPP equation

The lower the RMSE, the better the approximation. In our case, the RMSE in relation to the number of layers and neurons is on average of the order of 10210^{-2}. Thus, the PINN is able to approximate the finite difference solution of the Fisher-KPP equation. Note that increasing or decreasing the number of layers and neurons does not systematically improve the RMSE.

Refer to caption
Figure 23: The figure shows the general architecture of the resolution with PINN for the Fisher-KPP equation.
1import torch
2import torch.nn as nn
3from random import uniform
4device = torch.device(’cuda:0’ if torch.cuda.is_available() else ’cpu’)
5torch.set_default_tensor_type(’torch.cuda.FloatTensor’)
6N_u =2000
7N_f =15000
8
9#Make X_u_train
10#Boundary condition: x=-10 , tt t>0
11x_left = np.ones((N_u//4,1))*(-10)
12t_left = np.random.uniform(low=0, high=1, size=(N_u//4,1))
13X_left = np.hstack((x_left, t_left))
14
15#Boundary condition: x=10 , tt t>0
16x_right = np.ones((N_u//4,1))*(10)
17t_right = np.random.uniform(low=0, high=1, size=(N_u//4,1))
18X_right = np.hstack((x_right, t_right))
19
20#Initial condition: t=0 , tt x in [-10,10]
21x_zero = np.random.uniform(low=-10, high=10, size=(N_u//2,1))
22t_zero = np.zeros((N_u//2,1))
23X_zero = np.hstack((x_zero, t_zero))
24
25X_u_train=np.vstack((X_right,X_left,X_zero))
26# shuffling
27index=np.arange(0,N_u)
28np.random.shuffle(index)
29X_u_train=X_u_train[index,:]
30
31#Make u_train
32# make u_train
33u_right=np.zeros((N_u//4,1),dtype=float) # boundary condition x=-10
34u_left=np.ones((N_u//4,1),dtype=float) # boundary condition x=10
35u_initial=np.heaviside(-x_zero,0) # initial condition about the function
36u_train=np.vstack((u_right,u_left,u_initial)) # u_train tensor preparation
37u_train=u_train[index,:]
38
39# make X_f_train
40X_f_train=np.zeros((N_f,2),dtype=float)
41for row in range(N_f):
42 x=uniform(-10,10)
43 t=uniform(0,1)
44 X_f_train[row,0]=x
45 X_f_train[row,1]=t
46X_f_train=np.vstack((X_f_train, X_u_train))
47
48class PhysicsInformedNN():
49 def __init__(self,X_u,u,X_f):
50 # x & t from boundary conditions
51 self.x_u = torch.tensor(X_u[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
52 self.t_u = torch.tensor(X_u[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
53 # x & t from collocation points
54 self.x_f = torch.tensor(X_f[:, 0].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
55 self.t_f = torch.tensor(X_f[:, 1].reshape(-1, 1),dtype=torch.float32,requires_grad=True)
56 # boundary solution
57 self.u = torch.tensor(u, dtype=torch.float32)
58 # null vector to test against f:
59 self.null = torch.zeros((self.x_f.shape[0], 1))
60 # initialize net:
61 self.create_net()
62
63 self.optimizer = torch.optim.Adam(self.net.parameters(),lr=0.001)
64 # typical MSE loss (this is a function):
65 self.loss = nn.MSELoss()
66 # loss :
67 self.ls = 0
68 # iteration number:
69 self.iter = 0
70
71 def create_net(self):
72 self.net=nn.Sequential(
73 nn.Linear(2,32), nn.Sigmoid(),
74
75 nn.Linear(32, 32), nn.Sigmoid(),
76
77 nn.Linear(32, 32), nn.Sigmoid(),
78 nn.Linear(32, 32), nn.Sigmoid(),
79 nn.Linear(32, 1))
80 def net_u(self,x,t):
81 u=self.net(torch.hstack((x,t)))
82 return u
83 def net_f(self,x,t):
84 u=self.net_u(x,t)
85 u = u.to(device)
86
87 u_t=torch.autograd.grad(u,t,grad_outputs=torch.ones_like(u),create_graph=True)[0]
88 u_x=torch.autograd.grad(u,x,grad_outputs=torch.ones_like(u),create_graph=True)[0]
89 u_xx=torch.autograd.grad(u_x,x,grad_outputs=torch.ones_like(u),create_graph=True)[0]
90 u_t = u_t.to(device)
91 u_x = u_x.to(device)
92 u_xx = u_xx.to(device)
93
94
95 f = u_t - u_xx - u*(1-u)
96 f = f.to(device)
97
98 return f
99
100 def closure(self):
101 # reset gradients to zero
102 self.optimizer.zero_grad()
103 # u & f predictions
104 u_prediction = self.net_u(self.x_u, self.t_u)
105 #
106 f_prediction = self.net_f(self.x_f,self.t_f)
107 #
108 # losses:
109 u_loss_x = self.loss(u_prediction, self.u)
110 f_loss = self.loss(f_prediction, self.null)
111 self.ls = u_loss_x + f_loss
112 # derivative with respect to net’s weights:
113 self.ls.backward()
114 # increase iteration count:
115 self.iter += 1
116 # print report:
117 if not self.iter % 100:
118 print(’Epoch: {0:}, Loss: {1:6.8f}’.format(self.iter, self.ls))
119 return self.ls
120
121 def train(self):
122 """
123 training loop
124 """
125 for epoch in range(20000):
126 self.net.train()
127 self.optimizer.step(self.closure)
128
129# pass data sets to the PINN:
130pinn = PhysicsInformedNN(X_u_train, u_train, X_f_train)
131pinn.train()
132
133import matplotlib.pyplot as plt
134from mpl_toolkits.axes_grid1 import make_axes_locatable
135#
136x = torch.linspace(-50, 50, 1000)
137t = torch.linspace( 0, 1, 1000)
138# x & t grids:
139X,T = torch.meshgrid(x,t)
140# x & t columns:
141xcol = X.reshape(-1, 1)
142tcol = T.reshape(-1, 1)
143# one large column:
144usol = pinn.net_u(xcol,tcol)
145# reshape solution:
146U = usol.reshape(x.numel(),t.numel())
147# transform to numpy:
148xnp = x.cpu().numpy()
149tnp = t.cpu().numpy()
150Unp = U.cpu().detach().numpy()
151
152#Plotting u for PINN method
153for i in range(0,1000,200):
154 plt.plot(np.arange(-50,50,0.1),Unp[:,i], label=’t=’+str(i*0.001)+’s’)
155 plt.xlim(left=-10, right=10)
156 plt.legend()
157 plt.xlabel(’x’)
158 plt.ylabel(’u(x,t)’)
159
160
161# A comparison between PINN and the finite difference method
162plt.plot(np.arange(-50,50,0.1),Unp[:,500], label=’PINN’, linestyle=’dashed’)
163plt.plot(np.arange(-50,50,0.1), u_df[500,:], label=’Finite Difference’, color=’red’)
164plt.xlim(left=-10, right=10)
165plt.legend()
166plt.xlabel(’x’)
167plt.ylabel(’u(x,t)’)
Listing 15: The PINN architecture code for the Fisher-KPP equation.
Phenomenon Modelling Remarks
unidimensional modelling First order differential equation

Simplistic approach allows to have a score using a low quality data

Diffusion Modelling Heat equation

Unique/ works well without data/Does not take into consideration the pigmentation

Modelling of viscous diffusion Burgers equation Works well without data
Front wave transport Modelling Fisher KPP Works well without data

M

odelling with non linear terms
Kordweg de Vries Works well without data

Turing pattern Modelling

Turing equation

Numerical instability problem

Table 14: The table shows the different partial differential equations studied in this paper.

5 Spatial distribution extraction by computer vision

The extraction of the spatial distribution of ulcerative colitis and Crohn’s disease using computer vision can be very useful for our study. By using image processing algorithms, it is possible to accurately map out the areas affected by the two diseases. This can help doctors better understand the progression of the disease and adjust treatment accordingly. In addition, it can also allows for faster and more accurate evaluation of treatment response, which can be particularly useful in the context of testing new medications. The traditional detection methods based on the expertise of gastroenterologists are time and resource intensive. As a result, early detection and treatment of these diseases can reduce casualties and improve the patient’s quality of life later on. With recent advances in Deep Learning, powerful approaches for both detection and classification that can handle complex environments have been developed. In this paper, we propose a deep learning based architecture for object classification in the context of Crohn disease. The goal is to assist the PINN framework developed above with input data for the PDEs initial conditions and boundary conditions. The proposed solution combines deep learning and tweaked transfer learning models for object classification and detection with balanced data for each image class. It can operate in a more complex environment and takes into consideration the state of the input. Its aim is to automatically detect damages, locate them and classify the disease type.

5.1 Transfer learning for image transformation

Transfer learning is a deep learning technique that allows the use of pre-trained models to perform image transformation tasks. The pre-trained models have already learned general features from a large dataset, making them suitable for transfer learning to related tasks. In the context of spatial distribution extraction, transfer learning can be used to improve the accuracy of feature extraction for specific image classes. The residual network ResNet [54] has been used for this transfer learning taks. Its architecture is available in many different forms, each with a different number of layers, a variant with 50 layers of neural networks is referred known as Resnet50. The Resnet50 and Resnet34 architectures differ significantly in one key area. In this case, the building block was changed into a bottleneck design due to worries about the amount of time needed to train the layers. In order to create the Resnet 50 architecture, each of the Resnet34’s 2-layer bottleneck blocks was changed to a 3-layer bottleneck block. Comparing this model to the 34-layer ResNet model, the accuracy of this model is noticeably higher (as shown in the figure 24).

On the other hand, the discrete gradient algorithm known as inpainting and the clustering techniques can then be used to fill in missing data and group similar data, respectively. Principal component analysis (PCA) can also be applied to reduce the dimensionality of the extracted features, enabling more efficient processing of the data. Ultimately, these image transformation techniques enable accurate and efficient computer vision solutions for analyzing complex biological images.

Refer to caption
Figure 24: Resnet architecture variants.

The data used to test this approach is the Kvasir dataset [35] which is a collection of annotated medical images of the gastrointestinal (GI) tract, designed for the purpose of computer-aided disease detection. The dataset is important for research in the medical domain of detection and retrieval, especially for single- and multi-disease computer aided detection. It contains images classified into three important anatomical landmarks and three clinically significant findings, as well as two categories of images related to endoscopic polyp removal. The sorting and annotation of the dataset are done by medical doctors. The Kvasir dataset may improve medical practice and refine health care systems globally since it includes sufficient numbers of images to be used for different tasks such as machine learning, deep learning, image retrieval, and transfer learning. The Kvasir dataset is collected using endoscopic equipment at Vestre Viken Hospital Trust in Norway and is carefully annotated by medical experts from VV and the Cancer Registry of Norway (CRN). The CRN is responsible for the national cancer screening programmes with the goal to prevent cancer death by discovering cancers or pre-cancerous lesions as early as possible. The Kvasir dataset is containing 8 classes of 2000 images per class. As a first step we opted to work with binary classification by feeding the model with both the normal images and the images having the disease.

1import os
2import random
3import numpy as np
4import torch
5import torchvision
6import torchvision.transforms as transforms
7import matplotlib.pyplot as plt
8
9def train_val_split(dataset, val_split=0.2, shuffle=True, random_seed=None):
10 if shuffle:
11 if random_seed is not None:
12 random.seed(random_seed)
13 dataset_size = len(dataset)
14 indices = list(range(dataset_size))
15 random.shuffle(indices)
16 else:
17 indices = list(range(len(dataset)))
18
19 split = int(np.floor(val_split * len(dataset)))
20 train_indices, val_indices = indices[split:], indices[:split]
21
22 train_sampler = torch.utils.data.SubsetRandomSampler(train_indices)
23 val_sampler = torch.utils.data.SubsetRandomSampler(val_indices)
24
25 return train_sampler, val_sampler
26
27def get_dataloaders(data_dir, image_size=(512, 512), batch_size=16, val_split=0.2, shuffle=True, random_seed=None):
28 data_transforms = {
29 ’train’: transforms.Compose([
30 transforms.Resize(image_size),
31 transforms.ToTensor()
32 ]),
33 ’val’: transforms.Compose([
34 transforms.Resize(image_size),
35 transforms.ToTensor()
36 ])
37 }
38
39 image_datasets = {
40 x: torchvision.datasets.ImageFolder(
41 os.path.join(data_dir, x),
42 transform=data_transforms[x]
43 )
44 for x in [’train’, ’val’]
45 }
46
47 train_sampler, val_sampler = train_val_split(image_datasets[’train’], val_split, shuffle, random_seed)
48
49 dataloaders = {
50 ’train’: torch.utils.data.DataLoader(
51 image_datasets[’train’], batch_size=batch_size, sampler=train_sampler
52 ),
53 ’val’: torch.utils.data.DataLoader(
54 image_datasets[’val’], batch_size=batch_size, sampler=val_sampler
55 )
56 }
57
58 return dataloaders
59
60def visualize_random_images(dataloader, classes):
61 # Get a batch of training data
62 inputs, labels = next(iter(dataloader))
63
64 # Make a grid from batch
65 out = torchvision.utils.make_grid(inputs)
66
67 plt.imshow(out.permute(1, 2, 0))
68 plt.title([classes[label] for label in labels])
69 plt.show()
70
71dataloaders = get_dataloaders(data_dir)
72visualize_random_images(dataloaders[’train’], dataloaders[’train’].dataset.classes)
73
74device = torch.device("cuda" if torch.cuda.is_available()
75 else "cpu")
76model = models.resnet50(pretrained=True)
77print(model)
78
79for param in model.parameters():
80 param.requires_grad = False
81
82model.fc = nn.Sequential(nn.Linear(2048, 1024),
83 nn.ReLU(),
84 nn.Dropout(0.2),
85 nn.Linear(1024, 2),
86 nn.LogSoftmax(dim=1))
87criterion = nn.NLLLoss()
88optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
89model.to(device)
90
91epochs = 100
92steps = 0
93running_loss = 0
94print_every = 10
95train_losses, test_losses = [], []
96for epoch in range(epochs):
97 for inputs, labels in trainloader:
98 steps += 1
99 inputs, labels = inputs.to(device), labels.to(device)
100 optimizer.zero_grad()
101 logps = model.forward(inputs)
102 #print(logps, labels.view(-1, 1))
103 loss = criterion(logps, labels)
104 loss.backward()
105 optimizer.step()
106 running_loss += loss.item()
107
108 if steps % print_every == 0:
109 test_loss = 0
110 accuracy = 0
111 model.eval()
112 with torch.no_grad():
113 for inputs, labels in testloader:
114 inputs, labels = inputs.to(device),labels.to(device)
115 logps = model.forward(inputs)
116 batch_loss = criterion(logps, labels)
117 test_loss += batch_loss.item()
118
119 ps = torch.exp(logps)
120 top_p, top_class = ps.topk(1, dim=1)
121 equals = top_class == labels.view(*top_class.shape)
122 accuracy +=torch.mean(equals.type(torch.FloatTensor)).item()
123 train_losses.append(running_loss/len(trainloader))
124 test_losses.append(test_loss/len(testloader))
125 print(f"Epoch {epoch+1}/{epochs}.. "
126 f"Train loss: {running_loss/print_every:.3f}.. "
127 f"Test loss: {test_loss/len(testloader):.3f}.. "
128 f"Test accuracy: {accuracy/len(testloader):.3f}")
129 running_loss = 0
130 model.train()
131torch.save(model, ’sicknessmodel.pth’)
132
133plt.plot(train_losses, label=’Training loss’)
134plt.plot(test_losses, label=’Validation loss’)
135plt.legend(frameon=False)
136plt.show()
137
138from sklearn.metrics import f1_score
139f1_score(y_true, y_pred, average=None)
Listing 16: The classificastion code using a transfer learning process with the resnet50 pretrained module.
Refer to caption
Figure 25: Resnet50 loss evolution in function of the number of epochs. The loss function used is the cross-entropy loss.

5.2 Inpainting and clustering

One major issue our data had is the the endoscopic camera’s light in work. In order to tackle this problem we tried firstly to inpaint the images. We have considered the areas with light on them as having missing values, so the inpainting idea, which is the task of reconstructing missing regions in an image, comes into play. It is an important problem in computer vision and an essential functionality in many imaging and graphics applications, e.g. object removal, image restoration, manipulation, re-targeting, compositing, and image-based rendering. The technique consisted of finding the white pixels in the images and dilute the surrounding pixels using a certain threshold on the pixels’ values by working with a mask between 221 and 255, in order to conserve the maximum amount of information. After inpainting the images, we noticed that they still contain some noise in them. So in order to tackle this problem, we tried kmeans [55] for clustering. Clustering algorithms are unsupervised algorithms, meaning they don’t use labelled data. They are used to assign data points from a population to different groups where data points belonging to the same group have similar traits. In our instance, clustering the image enabled us to blend the several image segments together in order to lessen the noise that persisted even after the inpainting.

1import cv2
2import numpy as np
3import matplotlib.pyplot as plt
4
5def inpaint_image(img_path, morph_size=(7,7)):
6 # Load the image
7 img = cv2.imread(img_path)
8 if img is None:
9 raise ValueError("Failed to read image from the given path")
10
11 # Convert the image to grayscale
12 gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
13
14 # Threshold the grayscale image to create a mask
15 mask = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)[1]
16
17 # Perform morphological closing on the mask
18 kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, morph_size)
19 mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
20
21 # Inpaint the image using the mask
22 result = cv2.inpaint(img, mask, 21, cv2.INPAINT_TELEA)
23
24 # Convert the result to RGB
25 image = cv2.cvtColor(result, cv2.COLOR_BGR2RGB)
26
27 return image
28
29# Specify the path to the input image
30img_path = ’./kvasirpytorch/kvasir-dataset/ulcerative-colitis/049c2045-5259-47d8-8f9e-bbbecd81789f.jpg’
31
32# Inpaint the image
33image = inpaint_image(img_path)
34
35# Display the result
36plt.imshow(image)
37plt.axis(’off’)
Listing 17: Implementation of the morphological mask and the inpainting transformation.
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 26: In order from top and from left to right, the figures present: the original and real image extracted directly from the Kvasir database, the image after application of a segmentation, after application of the algorithm of inpainting and finally after applying segmentation.

5.3 Image classification with PCA

Image classification using principal component analysis (PCA) was performed in this study to explore an unconventional approach to solving computer vision problems. While deep learning is commonly used to address these challenges, we opted to apply gradient boosting. Initially, we extracted the palettes of each image to obtain a correspondence table of selected colors in the RGB color space. We extracted the first five dominant color sets and transformed the images into vectors, which were arranged in a dataframe. However, before proceeding with the classification process, we applied PCA to reduce the dimensionality of the data and ensure that our points were represented optimally, thus increasing the chances of successful classification.

Refer to caption
Figure 27: PCA plot of the points

To identify the optimal number of palettes for the images, we plotted the accuracy of the model against the number of palettes used. Based on the results, we selected the optimal number of palettes for further tests. We then evaluated our gradient boosting method, specifically XGBoost [56], a distributed tree boosting algorithm that utilizes the gradient boosting framework to create efficient, flexible, and portable machine learning algorithms. Overall, this study demonstrates the potential of using gradient boosting and PCA to solve computer vision problems and offers new insights into the classification of image data. This table shows the confusion matrix of our results.

P/A Positive Negative
Positive 66 10
Negative 6 68
Refer to caption
Figure 28: Results of the classification algorithm.

5.4 Computer vision solution perspectives

In this part, we are discussing how the previous subsections are going to introduce the complete architecture of the solution. While the partial differential equations gives us a solution depending on time, classification models only provide the current state of the disease. The proposed solution would be a tailored deep learning architecture having both the regression and the classification tasks in order to get the full picture on the evolution of the disease. The regression part of the solution consists of a convolutional long short term memory (ConvLSTM). This network is provided a sequence of images depending on time and its output is the next sequence of the disease meaning the next state. With this architecture we get both the current state and its evolution both on time and surface.

Refer to caption
Figure 29: Computer vision proposed architecture

While the previous propositions might seem a bit shallow, we propose another approach which is image segmentation. This could be treated also as a regression problem. We apply an image segmentation model on our images that would return the surface that has the disease on it. The information we get is later treated as a tabular data and with sufficient data we could apply a state of the art regression model in order to determine the evolution of the surface. Keep in mind that the information provided by the segmentation model should be treated carefully because of the amount of disturbance that could be provided by the mask on the detected object.

6 Conclusion

This study was carried out with the ultimate goal of being producible and usable by the scientific community in a challenging Tunisian context. Digestive cancers are a significant public health concern due to their high mortality rates. Besides, the mortality depends on the stage at the time of diagnosis: high mortality for late stage detection vs survival rate for pre-cancerous stage detection. Ulcerative Colitis and Crohn’s disease may indicate a potential evolution towards cancer. However, the current diagnostic methods for these diseases are complicated: requires expert gastroenterologists, expensive equipments, and has a major impact on the quality of life of patients (invasive) making early detection challenging. Which brings us to try to answer several open questions: Can we improve on current methods, make it less expensive and with less heavy intervention then improving the patient’s life quality?

This paper aimed to demonstrate the potential of AI in providing a more cost-effective and less invasive solution for detection by solving partial differential equations that model the propagation of these diseases. Our findings provide open-source data and codes, promoting transparency and encouraging further research. Our preliminary study used unsupervised learning to model ulcerative colitis and Crohn’s disease due to the lack of good quality data. Further investigations may be necessary to better understand the modeling of these diseases and the potential to combine computer vision techniques and regression models. We have shown that is possible to solve the Fisher-KPPk equation with low quality data and with a deep neural network. The difficulty was noticed rather in comparison with the Turing non-linear and coupled PDEs system. For that, an investigation is necessary to understand the phenomenon of explosion badly modeled. Finally, we believe that this work was an opportunity to bring together three communities: gastro-enterologists, mathematicians, and AI specialists through our proposed experimental framework 15.

References

  • [1] S. Cuomo, V.S. Di Cola, F. Giampaolo, et al. Scientific machine learning through physics–informed neural networks: Where we are and what’s next. J Sci Comput, 92:88, 2022.
  • [2] Kevrekidis I.G. Lu L. Perdikaris P. Karniadakis, G.E. et al. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
  • [3] Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, et al. Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems, 34:26548–26560, 2021.
  • [4] Tchelepi H.A. Fuks O. Limitations of physics informed machine learning for nonlinear two-phase transport in porous media. Journal of Machine Learning for Modeling and Computing, 1(1):19–37, 2020.
  • [5] Grégoire Nadin, Eric Ogier-Denis, Ana I. Toledo, and Hatem Zaag. A turing mechanism in order to explain the patchy nature of crohn’s disease. Journal of Mathematical Biology, 83(2):12, 2021.
  • [6] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • [7] Hao-nan Wang, Ning Liu, Yi-yun Zhang, Da-wei Feng, Feng Huang, Dong-sheng Li, and Yi-ming Zhang. Deep reinforcement learning: a survey. Frontiers of Information Technology & Electronic Engineering, 21(12):1726–1744, 2020.
  • [8] Jonas Degrave, Federico Felici, Jonas Buchli, et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 602(7897):414–419, 2022.
  • [9] Suman Ravuri, Karel Lenc, Matthew Willson, et al. Skilful precipitation nowcasting using deep generative models of radar. Nature, 597(7878):672–677, 09 2021.
  • [10] L.C. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics. AMS, 2nd edition, 2010.
  • [11] Jean-Pierre Demailly. Analyse numérique et équations différentielles : nouvelle édition avec exercices corrigés. Grenoble Sciences. EDP Sciences, 05 2016.
  • [12] Sören Bartels. Numerical Methods for Nonlinear Partial Differential Equations. Springer Series in Computational Mathematics. Springer Cham, 1 edition, 2015.
  • [13] Jan Blechschmidt and Oliver G. Ernst. Three ways to solve partial differential equations with neural networks — a review. Gamm, 44(2):e202100006, 06 2021. "Special Issue: Scientific Machine Learning - Part II".
  • [14] Frank Merle and Hatem Zaag. On degenerate blow-up profiles for the subcritical semilinear heat equation. arXiv, 2021.
  • [15] Catherine Le Berre, Ashwin N Ananthakrishnan, Silvio Danese, et al. Ulcerative colitis and crohn’s disease have similar burden and goals for treatment. Clin Gastroenterol Hepatol, 18(1):14–23, Jan 2020.
  • [16] Johan Burisch. Crohn’s disease and ulcerative colitis. occurrence, course and prognosis during the first year of disease in a european population-based inception cohort. Danish medical journal, 61(1):B4778, Jan 2014.
  • [17] Doreen Busingye et al. Prevalence of inflammatory bowel disease in the australian general practice population: A cross-sectional study. PloS one, 16(5):e0252458, 2021.
  • [18] M. Mosli, S. Alawadhi, F. Hasan, et al. Incidence, prevalence, and clinical epidemiology of inflammatory bowel disease in the arab world: A systematic review and meta-analysis. Inflamm Intest Dis, 6(3):123–131, 2021.
  • [19] Sami Karoui and et al. Frequence et facteurs predictifs de colectomie et de coloproctectomie au cours de la rectocolite hemorragique. La tunisie Medicale, 87(02):115–119, 2009.
  • [20] Maxime Collard. Les mathématiques au secours de la biologie. Société Nationale Française de Colo-Proctologie (SNFCP), Nov 2021.
  • [21] S Kraszewski, W Szczurek, J Szymczak, et al. Machine learning prediction model for inflammatory bowel disease based on laboratory markers. working model in a discovery cohort study. J Clin Med, 10(20):4745, Oct 16 2021.
  • [22] R. Makkar and S. Bo. Colonoscopic perforation in inflammatory bowel disease. Gastroenterol Hepatol (N Y), 9(9):573–83, 2013.
  • [23] Ana Isis Toledo Marrero. Reaction-diffusion equations and applications to biological control of dengue and inflammation. PhD thesis, Université Paris-Nord - Paris XIII, 2021.
  • [24] Safaa Al Ali. Mathematical modelling of chronic inflammatory bowel diseases. PhD thesis, Université Paris-Nord - Paris XIII, 2021.
  • [25] Gartner. Gartner hype cycle for artificial intelligence, 2021.
  • [26] Gartner. Gartner hype cycle for artificial intelligence, 2022.
  • [27] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
  • [28] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
  • [29] Andrew R. Barron. Approximation and estimation bounds for artificial neural networks. Machine Learning, 14(1):115–133, January 1994.
  • [30] Shiyu Liang and R. Srikant. Why deep neural networks for function approximation? In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
  • [31] Lu Lu, Pengzhan Jin, Guofei Pang, et al. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
  • [32] Zhiping Mao, Lu Lu, Olaf Marxen, et al. Deepm mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators. Journal of Computational Physics, 447:110698, 2021.
  • [33] Shengze Cai, Zhicheng Wang, Lu Lu, et al. Deepm mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks. Journal of Computational Physics, 436:110296, 2021.
  • [34] John H. Lagergren, John T. Nardini, Michael Lavigne G., Erica M. Rutter, and Kevin B. Flores. Learning partial differential equations for biological transport models from noisy spatio-temporal data. Proc. R. Soc. A., 476(20190800), 2020.
  • [35] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, et al. Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In Proceedings of the 8th ACM on Multimedia Systems Conference, MMSys’17, pages 164–169, New York, NY, USA, 2017. ACM.
  • [36] J. Y. Lee, J. Jeong, E. M. Song, et al. Real-time detection of colon polyps during colonoscopy using deep learning: systematic validation with four independent datasets. Scientific Reports, 10:8379, 2020.
  • [37] Marco Chierici, Nicolae Puica, Matteo Pozzi, et al. Automatically detecting crohn’s disease and ulcerative colitis from endoscopic imaging. BMC Medical Informatics and Decision Making, 22(6):300, 2022.
  • [38] M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [39] L. Bottou. Online Algorithms and Stochastic Approximations. Cambridge University Press, Cambridge, UK, 1998.
  • [40] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  • [41] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York, NY, 2 edition, July 2006.
  • [42] James Martens. Deep learning via hessian-free optimization. In International Conference on Machine Learning, 2010.
  • [43] Dong C. Liu and Jorge Nocedal. On the limited memory bfgs method for large scale optimization. Mathematical Programming, 45(1):503–528, August 1989.
  • [44] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(61):2121–2159, 2011.
  • [45] Tony Lyons. The 2-component dispersionless burgers equation arising in the modelling of blood flow. Communications on Pure and Applied Analysis, 11(4):1563–1576, 2012.
  • [46] Matylda Jabłońska, Robert Sitarz, and Andrzej Kraslawski. Forecasting research trends using population dynamics model with burgers’ type interaction. 32:619–624, 2013.
  • [47] Mahesh Gajendran, Priyadarshini Loganathan, Guillermo Jimenez, et al. A comprehensive review and update on ulcerative colitis,. Disease-a-Month, 65(12):100851, 2019. A Comprehensive Review and Update on Ulcerative Colitis.
  • [48] Chaohao Xiao, Xiaoqian Zhu, Fukang Yin, and Xiaoqun Cao. Physics-informed neural network for solving coupled korteweg-de vries equations. Journal of Physics: Conference Series, 2031:012056, 2021.
  • [49] Safaa Al-Ali, John Chaussard, Sébastien Li-Thiao-Té, et al. Automatic bleeding and ulcer detection from limited quality annotations in ulcerative colitis. Inflammatory Bowel Diseases, 28(Supplement 1):S19 S20, 01 2022.
  • [50] Safaa Al-Ali, Sébastien Li-Thiao-Té, John Chaussard, and Hatem Zaag. Mathematical modeling of the spatial distribution of lesions in inflammatory bowel disease. feb 2020.
  • [51] Cyrille Rossant. IPython Interactive Computing and Visualization Cookbook, Second Edition. Packt Publishing, 2018.
  • [52] R. A. FISHER. The wave of advance of advantageous genes. Annals of Eugenics, 7(4):355–369, 1937.
  • [53] A.N. Kolmogorov, I.G. Petrovsky, and N.S. Piskunov. Investigation of the equation of diffusion combined with increasing of the substance and its application to a biology problem. Bulletin of Moscow State University Series A: Mathematics and Mechanics, 1:1–25, 1937.
  • [54] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
  • [55] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’07), pages 1027–1035. Society for Industrial and Applied Mathematics, 01 2007.
  • [56] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), pages 785–794. ACM, 08 2016.