
Learning Deep Input-Output Stable Dynamics

Ryosuke Kojima
Graduate School of Medicine
Kyoto University
Kyoto, 606-8501
kojima.ryosuke.8e@kyoto-u.ac.jp
Yuji Okamoto
Graduate School of Medicine
Kyoto University
Kyoto, 606-8501
okamoto.yuji.2c@kyoto-u.ac.jp
Equal contribution.
Abstract

Learning stable dynamics from observed time-series data is an essential problem in robotics, physical modeling, and systems biology. Many of these dynamics are represented as input-output systems to communicate with the external environment. In this study, we focus on input-output stable systems, which exhibit robustness against unexpected stimuli and noise. We propose a method to learn nonlinear systems guaranteeing the input-output stability. Our proposed method utilizes the differentiable projection onto the space satisfying the Hamilton-Jacobi inequality to realize the input-output stability. The problem of finding this projection can be formulated as a quadratically constrained quadratic programming (QCQP) problem, and we derive the particular solution analytically. Also, we apply our method to a toy bistable model and to the task of training a benchmark generated from a glucose-insulin simulator. The results show that nonlinear systems with neural networks trained by our method achieve the input-output stability, unlike naive neural networks. Our code is available at https://github.com/clinfo/DeepIOStability.

1 Introduction

Learning dynamics from time-series data has many applications, such as industrial robot systems [1], physical systems [2], and biological systems [3, 4]. Many of these real-world systems are equipped with inputs and outputs to interact with each other; such systems are called input-output systems [5]. For example, biological systems sustain life by obtaining energy from the external environment through their inputs. Such real-world systems have various properties, such as stability, controllability, and observability, which provide clues to analyze these complex systems.

Our purpose is to learn a complex system with “desired properties” from a dataset consisting of pairs of input and output signals. To represent the target system, this paper considers the following nonlinear dynamics:

$$\begin{aligned} \dot{x} &= f(x)+G(x)u, \quad x(0)=x_0, \\ y &= h(x), \end{aligned} \qquad (1)$$

where the inner state $x$, the input $u$, and the output $y$ belong to a signal space that maps the time interval $[0, \infty)$ to a Euclidean space. We denote the dimensions of $x$, $u$, and $y$ as $n$, $m$, and $l$, respectively.

Recently, with the development of deep learning, many methods to learn systems from time-series data using neural networks have been proposed [6, 7, 8]. By representing the maps $(f, G, h)$ in Eq. (1) as neural networks, complex systems can be modeled and trained from a given dataset. However, guaranteeing that a trained system has the desired properties is challenging.

Figure 1: (A) The model prediction of the neural networks and (B) the reaction of the trained model. These results are min-max normalized.

A naively trained system fits the input signals contained in the training dataset but does not always generalize to new input signals. For example, Figure 1 shows our preliminary experiments, where we naively learned neural networks $(f, G, h)$ in Eq. (1) from input and output signals. The trained neural networks provided small predictive errors for an input signal in the training dataset (Figure 1 (A)). In contrast, the trained system computed unbounded output signals for new step input signals (Figure 1 (B)). A likely reason is that the magnitude (integral value) of this input signal was larger than that of the input signals in the training dataset.

Internal stability is one of the attractive properties that real-world dynamical systems should often satisfy. Conventional methods to train internally stable systems consisting of neural networks adopt Lyapunov-based approaches [9, 10, 11]. These methods focus on the internal system $\dot{x} = f(x)$ of the input-output system (1). Thus, how to learn the entire system (1) with a desired property related to the influence of the input signal is still challenging.

We propose a novel method to learn a dynamical system consisting of neural networks that accounts for the input-output stability. The notion of input-output stability is often used together with the Hamilton-Jacobi inequality for controller design in the field of control theory. The Hamilton-Jacobi inequality is a sufficient condition for the input-output stability. The feature of this condition is that the variable for an input signal $u$ does not appear in the expression, i.e., we do not need to evaluate the condition for unknown inputs [12]. To the best of our knowledge, this is the first work that establishes a learning method for dynamical systems consisting of neural networks using the Hamilton-Jacobi inequality.

The contributions of this paper are as follows:

  • This paper derives differentiable projections onto the space satisfying the Hamilton-Jacobi inequality.

  • This paper presents a learning method for input-output systems that are proven to always satisfy the Hamilton-Jacobi inequality.

  • This paper also provides a loss function derived from the Hamilton-Jacobi inequality. By combining this loss function with the projection described above, efficient learning can be expected.

  • This paper presents experiments using two types of benchmarks to evaluate our method.

2 Background

This section describes the $\mathcal{L}_2$ stability, a standard definition of the input-output stability, and the Hamilton-Jacobi inequality.

First, we define the $\mathcal{L}_2$ stability of the nonlinear dynamical system (1). If the norm ratio of the output signal to the input signal is bounded, the system is $\mathcal{L}_2$ stable, and this norm ratio is called the $\mathcal{L}_2$ gain. The $\mathcal{L}_2$ norm on the input and output signal space is used for the definition of the $\mathcal{L}_2$ stability. To deal with the $\mathcal{L}_2$ stability of the nonlinear system (1), we assume that the origin $x \equiv 0$ of the internal system $\dot{x} = f(x)$ is an asymptotically stable equilibrium point and $h(0) = 0$. Since any asymptotically stable equilibrium point can be translated to the origin, this assumption can be satisfied without loss of generality.

Definition 1.

The $\mathcal{L}_2$ norm is defined as $\|x\|_{\mathcal{L}_2} := \sqrt{\int_0^\infty \|x(t)\|^2 \, dt}$. If there exist $\gamma \geq 0$ and a function $\beta$ on a domain $D \subset \mathbb{R}^n$ such that

$$\|y\|_{\mathcal{L}_2} \leq \gamma \|u\|_{\mathcal{L}_2} + \beta(x_0), \qquad (2)$$

then the system (1) is $\mathcal{L}_2$ stable, where the function $\beta(\cdot)$ is non-negative and $\beta(0) = 0$. Furthermore, the minimum $\gamma$ satisfying (2) is called the $\mathcal{L}_2$ gain of the nonlinear system (1).

Next, we describe a sufficient condition for the $\mathcal{L}_2$ stability using a Lyapunov function $V: D \rightarrow \mathbb{R}$, where the function $V$ is positive semi-definite, i.e., $V(x) \geq 0$ and $V(0) = 0$. The Hamilton-Jacobi inequality is an input-independent sufficient condition.

Proposition 1 ([5, Theorem 5.5]).

Let $f$ be locally Lipschitz and let $G$ and $h$ be continuous. If there exist a constant $\gamma > 0$ and a continuously differentiable positive semi-definite function $V: D \subset \mathbb{R}^n \rightarrow \mathbb{R}$ such that

$$\nabla V^{\mathrm{T}}(x) f(x) + \frac{1}{2\gamma^2}\|G^{\mathrm{T}}(x)\nabla V(x)\|^2 + \frac{1}{2}\|h(x)\|^2 \leq 0, \quad \forall x \in D \setminus \{0\}, \qquad (3)$$

then the system (1) is $\mathcal{L}_2$ stable and the $\mathcal{L}_2$ gain is less than or equal to $\gamma$. This condition is called the Hamilton-Jacobi inequality.

The above proposition can be generalized to allow more complicated situations, such as limit-cycle and bistable cases, where the domain $D$ contains multiple asymptotically stable equilibrium points. The equilibrium point assumed in this proposition can be replaced with positive invariant sets by extending the $\mathcal{L}_2$ norm [13]. Furthermore, by mixing multiple Lyapunov functions, this proposition can be generalized around multiple isolated equilibrium points.

3 Method

The goal of this paper is to learn an $\mathcal{L}_2$ stable system represented by neural networks $(f, G, h)$. The Hamilton-Jacobi inequality, which implies the $\mathcal{L}_2$ stability, is expressed in terms of $(f, G, h)$. We present a method to project $(f, G, h)$ onto the space where the Hamilton-Jacobi inequality holds.

3.1 Modification of nonlinear systems

Suppose $f_{\bf n}: \mathbb{R}^n \rightarrow \mathbb{R}^n$, $G_{\bf n}: \mathbb{R}^n \rightarrow \mathbb{R}^{n \times m}$, and $h_{\bf n}: \mathbb{R}^n \rightarrow \mathbb{R}^l$; we call the triplet of maps $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ the nominal dynamics. Introducing a distance on such triplets, we call the triplet nearest to the nominal dynamics $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ that satisfies the Hamilton-Jacobi inequality the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$. The purpose of this section is to describe the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ associated with the nominal dynamics $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ by analytically deriving a projection onto the space satisfying the Hamilton-Jacobi inequality.

The problem of finding the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ can be written as a quadratically constrained quadratic programming (QCQP) problem for the nominal dynamics $(f_{\bf n}, G_{\bf n}, h_{\bf n})$. Since there is generally no analytical solution for QCQP problems, we aim to find a particular solution by adjusting the distance on the triplets.

To prepare for the following theorem, we define the ramp and the clamp functions as

$$\mathrm{R}(x) \triangleq \begin{cases} 0, & x \leq 0 \\ x, & x > 0 \end{cases}, \qquad \mathrm{C}(x; a, b) \triangleq \begin{cases} a, & x \leq a \\ x, & a < x \leq b \\ b, & x > b, \end{cases}$$

and define the Hamilton-Jacobi function as

$$\mathrm{HJ}(f, G, h) \triangleq \nabla V^{\mathrm{T}} f + \frac{1}{2\gamma^2}\|G^{\mathrm{T}}\nabla V\|^2 + \frac{1}{2}\|h\|^2, \qquad (4)$$

where $V$ is a given positive definite function. A way to design $V$ is to determine the desired positive invariant set and design an increasing function around this set. For example, if the target system has a unique stable point, $x = 0$ can be regarded as the positive invariant set and the increasing function $V$ can be designed as $V(x) = \tfrac{1}{2}x^2$.

Theorem 1.

Consider the following optimization problem:

$$\underset{f_{\bf m}, G_{\bf m}, h_{\bf m}}{\text{minimize}} \quad \frac{1-k}{\|\nabla V\|}\|f_{\bf m}-f_{\bf n}\| + \frac{k}{2\gamma^2}\|G_{\bf m}-G_{\bf n}\|^2 + \frac{k}{\|\nabla V\|^2}\|h_{\bf m}-h_{\bf n}\|^2 \qquad (5a)$$
$$\text{subject to} \quad \mathrm{HJ}(f_{\bf m}, G_{\bf m}, h_{\bf m}) \leq 0, \qquad (5b)$$

where $k \in [0, 1]$. The solution of (5) is given by

$$\begin{aligned} f_{\bf m} &= f_{\bf n} - \frac{1}{\|\nabla V\|^2}\mathrm{R}\left(V_f + k^2 V_{Gh}\right)\nabla V, \\ G_{\bf m} &= G_{\bf n} - \left(1 - \sqrt{\mathrm{C}\left(-\tfrac{V_f}{V_{Gh}}; k^2, 1\right)}\right) P_V G_{\bf n}, \\ h_{\bf m} &= \sqrt{\mathrm{C}\left(-\tfrac{V_f}{V_{Gh}}; k^2, 1\right)}\, h_{\bf n}, \end{aligned}$$

where

$$V_f \triangleq \nabla V^{\mathrm{T}} f_{\bf n}, \quad V_{Gh} \triangleq \frac{1}{2\gamma^2}\|G_{\bf n}^{\mathrm{T}}\nabla V\|^2 + \frac{1}{2}\|h_{\bf n}\|^2, \quad P_V \triangleq \frac{\nabla V \nabla V^{\mathrm{T}}}{\|\nabla V\|^2}.$$
Proof.

See Appendix A. ∎

The objective function (5a) is a new distance between the nominal dynamics $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ and the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$. This new distance allows the derivation of analytical solutions by combining three distances on $f$, $G$, and $h$.

Focusing on the objective function (5a), the constant $k$ represents the ratio of the distance scale of $f$ to those of $G$ and $h$. When $k = 0$, the solution of problem (5) is consistent with the projection of the conventional method that guarantees internal stability [9]. As the constant $k$ approaches $1$, the modification of Theorem 1 moves $G_{\bf m}$ and $h_{\bf m}$ closer to $G_{\bf n}$ and $h_{\bf n}$, respectively. In this case, the objective function (5a) becomes the distance between $f_{\bf m}$ and $f_{\bf n}$. Therefore, the following corollary holds.
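To make the projection concrete, the following sketch (our illustration in NumPy, not the paper's released implementation) evaluates the modified dynamics of Theorem 1 at a single state $x$; it assumes $\nabla V(x) \neq 0$ and $V_{Gh} > 0$, and all function names are ours.

```python
import numpy as np

def project_hj(f_n, G_n, h_n, grad_V, gamma=1.0, k=0.5):
    """Theorem 1 projection at one state.

    f_n: (n,) nominal drift, G_n: (n, m) nominal input map,
    h_n: (l,) nominal output, grad_V: (n,) gradient of V at x.
    Assumes grad_V != 0 and V_Gh > 0.
    """
    v2 = grad_V @ grad_V                                   # ||nabla V||^2
    GtV = G_n.T @ grad_V                                   # G_n^T nabla V
    V_f = grad_V @ f_n                                     # nabla V^T f_n
    V_Gh = (GtV @ GtV) / (2.0 * gamma**2) + 0.5 * (h_n @ h_n)

    # Shared scaling: sqrt(C(-V_f / V_Gh; k^2, 1)).
    c = np.sqrt(np.clip(-V_f / V_Gh, k**2, 1.0))

    f_m = f_n - max(V_f + k**2 * V_Gh, 0.0) / v2 * grad_V  # ramp R(.)
    P_V = np.outer(grad_V, grad_V) / v2                    # projector onto span(nabla V)
    G_m = G_n - (1.0 - c) * (P_V @ G_n)
    h_m = c * h_n
    return f_m, G_m, h_m

# Example with V(x) = 0.5 * ||x||^2, so grad_V(x) = x.
x = np.array([0.5, -1.0])
f_m, G_m, h_m = project_hj(f_n=np.array([0.3, 0.2]),
                           G_n=np.eye(2), h_n=np.array([0.5, -1.0]),
                           grad_V=x, gamma=1.0, k=0.5)
```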

Corollary 1.

The solution of

$$\underset{f_{\bf m}}{\text{minimize}} \quad \|f_{\bf m} - f_{\bf n}\| \qquad (6a)$$
$$\text{subject to} \quad \mathrm{HJ}(f_{\bf m}, G_{\bf n}, h_{\bf n}) \leq 0, \qquad (6b)$$

is given by

$$f_{\bf m} = f_{\bf n} - \frac{1}{\|\nabla V\|^2}\mathrm{R}\left(\mathrm{HJ}(f_{\bf n}, G_{\bf n}, h_{\bf n})\right)\nabla V.$$
Proof.

This solution is easily derived from Theorem 1. ∎

Corollary 1 gives the solution of a problem with a linear constraint rather than a full QCQP problem. Replacing the Hamilton-Jacobi function $\mathrm{HJ}(f_{\bf m}, G_{\bf n}, h_{\bf n})$ with the time derivative of a positive definite function, $\nabla V^{\mathrm{T}} f$, this corollary matches the result of the conventional study [9].
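For comparison, a sketch of the Corollary 1 projection, under the same assumptions as the previous snippet: only $f$ is shifted, along $-\nabla V$, by the ramped Hamilton-Jacobi value.

```python
import numpy as np

def project_f_only(f_n, G_n, h_n, grad_V, gamma=1.0):
    """Corollary 1: shift only f when HJ(f_n, G_n, h_n) > 0."""
    GtV = G_n.T @ grad_V
    hj = grad_V @ f_n + (GtV @ GtV) / (2.0 * gamma**2) + 0.5 * (h_n @ h_n)
    return f_n - max(hj, 0.0) / (grad_V @ grad_V) * grad_V
```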

When the map $h_{\bf m}$ is fixed to $h_{\bf n}$, a solution similar to Theorem 1 can be derived. Although Corollary 1 is obtained from Theorem 1 by changing $k$, the modified dynamics with fixed $h_{\bf n}$ are not; we prove this case in a similar way to Theorem 1.

Corollary 2.

Consider the following problem:

$$\underset{f_{\bf m}, G_{\bf m}}{\text{minimize}} \quad \frac{1-k}{\|\nabla V\|}\|f_{\bf m}-f_{\bf n}\| + \frac{k}{2\gamma^2}\|G_{\bf m}-G_{\bf n}\|^2 \qquad (7a)$$
$$\text{subject to} \quad \mathrm{HJ}(f_{\bf m}, G_{\bf m}, h_{\bf n}) \leq 0, \qquad (7b)$$

where $k \in [0, 1]$. The solution of (7) is given by

$$f_{\bf m} = f_{\bf n} - \frac{1}{\|\nabla V\|^2}\mathrm{R}\left(V_{fh} + k^2 V_G\right)\nabla V, \quad G_{\bf m} = G_{\bf n} - \left(1 - \sqrt{\mathrm{C}\left(-\tfrac{V_{fh}}{V_G}; k^2, 1\right)}\right) P_V G_{\bf n}, \qquad (8)$$

where

$$V_{fh} \triangleq \nabla V^{\mathrm{T}} f_{\bf n} + \frac{1}{2}\|h_{\bf n}\|^2, \quad V_G \triangleq \frac{1}{2\gamma^2}\|G_{\bf n}^{\mathrm{T}}\nabla V\|^2.$$
Proof.

See Appendix A. ∎

3.2 Loss function

Figure 2: Sketches of our method: (A) minimizing the prediction error (the first term of the loss function (9)) in the blue region, (B) moving the nominal dynamics to the blue region (the second term), and (C) reducing the blue region while keeping the same level of the prediction error (the last term).

We represent $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ as neural networks and denote by $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ the $\mathcal{L}_2$ stable dynamics modified by Theorem 1, Corollary 1, or Corollary 2. Note that the modification depends on the candidate $\mathcal{L}_2$ gain $\gamma$. Figure 2 (A) shows a sketch of this modification, where the blue region satisfies the Hamilton-Jacobi inequality.

Since the nonlinear system of the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ is represented as ordinary differential equations (ODEs) consisting of the differentiable functions $f_{\bf n}$, $G_{\bf n}$, and $h_{\bf n}$, the techniques for training neural ODEs can be applied [14]. Once a loss function is designed, the parameters of the neural networks in the modified system can be learned from given data. We design the loss function as

$$\mathrm{Loss} = \mathrm{E}_{(x_0, u, y) \in \mathcal{D}}\left[\|y - \hat{y}\|_{\mathcal{L}_2}^2\right] + \lambda L_{\mathrm{HJ}} + \alpha \gamma^2, \qquad (9)$$

where $\lambda$ and $\alpha$ are non-negative coefficients.

The first term is the prediction error of the output signal $y$ (Figure 2 (A)). A dataset $\mathcal{D}$ consists of tuples $(x_0, u, y)$ of the initial value $x_0$, the input signal $u$, and the output signal $y$. The predicted output $\hat{y}$ is calculated from $x_0$, $u$, and the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$.

The second term aims to move the nominal dynamics $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ closer to the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ and is defined as

$$L_{\mathrm{HJ}} = \mathrm{E}_x[\mathrm{R}(\mathrm{HJ}(f_{\bf n}, G_{\bf n}, h_{\bf n})(x) + \varepsilon)],$$

where $\varepsilon$ is a positive constant (Figure 2 (B)). Since this term is a form of hinge loss, $\varepsilon$ represents the magnitude of the penalty for violating the Hamilton-Jacobi inequality. To evaluate this inequality for any $x \in D$, we introduce a distribution of $x$ over the domain $D$. In our experiments, this distribution is chosen as a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, where the mean $\mu$ is placed at the asymptotically stable point and the variance $\sigma^2$ is an experimental parameter. Without this second term of the loss function, there are degrees of freedom in the nominal dynamics, i.e., multiple nominal dynamics give the same loss after the projection, which negatively affects parameter learning.

The modification parameter $\gamma$ can be manually designed for the application or automatically trained from data by introducing the last term. This training explores a smaller $\gamma$ while keeping the same level of the prediction error (Figure 2 (C)).
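A minimal sketch of how the hinge term $L_{\rm HJ}$ could be estimated by Monte-Carlo sampling, assuming callables nominal(x) and grad_V(x) that we introduce for illustration:

```python
import numpy as np

def hj_value(f_n, G_n, h_n, grad_V, gamma):
    """Hamilton-Jacobi function (4) evaluated at one state."""
    GtV = G_n.T @ grad_V
    return grad_V @ f_n + (GtV @ GtV) / (2.0 * gamma**2) + 0.5 * (h_n @ h_n)

def hinge_hj_loss(nominal, grad_V, gamma, mu, sigma, n_samples=64,
                  eps=0.1, rng=np.random.default_rng(0)):
    """Monte-Carlo estimate of L_HJ with x ~ N(mu, sigma^2 I).

    nominal(x) returns (f_n(x), G_n(x), h_n(x)); grad_V(x) returns the
    gradient of the designed V. Both callables are illustrative.
    """
    xs = rng.normal(mu, sigma, size=(n_samples, np.size(mu)))
    vals = [max(hj_value(*nominal(x), grad_V(x), gamma) + eps, 0.0) for x in xs]
    return float(np.mean(vals))

# The total loss (9) then combines the prediction error, this hinge
# term, and the gain penalty: loss = mse + lam * L_HJ + alpha * gamma**2.
```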

4 Related work

Estimating parameters of a given system is traditionally studied as system identification. In the field of system identification, much research on the identification of linear systems has been done, where the maps $(f, G, h)$ in Eq. (1) are assumed to be linear [15]. Identification methods for linear state-space models include impulse-response-based methods such as the Eigensystem Realization Algorithm (ERA) [16] and state-space-representation-based methods such as Multivariable Output Error State sPace (MOESP) [17] and the ORThogonal decomposition method (ORT) [18]. System identification methods for non-state-space models, unlike our target system, include Dynamic Mode Decomposition with control (DMDc) [19] and its nonlinear version, Sparse Identification of Nonlinear DYnamics with control (SINDYc) [20]. Also, nonlinear models such as the nonlinear ARX model [21] and the Hammerstein-Wiener model [22] have been developed and are often trained by error minimization. For example, the system identification method using a piece-wise ARX model allows more complicated functions by relaxing the assumption of linearity [21].

These traditional methods often concern gray-box systems, where the maps $(f, G, h)$ in Eq. (1) are partially known [23]. This paper deals with the case of black-box systems, where $f$, $G$, and $h$ in the system (1) are represented by neural networks. Our method can be used regardless of whether all of the functions $f$, $G$, and $h$ are parameterized using neural networks, so it can easily be applied to application-dependent gray-box systems as long as the functions $f$, $G$, and $h$ are differentiable with respect to their parameters.

Because the input-output system (1) can be regarded as a differential equation, our study is closely related to methods combining neural networks and ODEs [7, 14]. These techniques have been improved in recent years in terms of discretization error and computational complexity. Although we used the Euler method for simplicity, we can expect that the learning efficiency would be further improved by using these techniques.

Internal stability is a fundamental property of ODEs, and learning a system with this property using neural networks plays an important role in the identification of real-world systems [24]. In particular, the first method to guarantee the internal stability of a trained system was proposed in [9]. Furthermore, another method [25] extends this approach to positive invariant sets, e.g., limit cycles and line attractors. Encouraged by these methods based on Lyapunov functions, our method further generalizes them using the Hamilton-Jacobi inequality to guarantee the input-output stability.

In this paper, the Lyapunov function $V$ is considered to be given, but this function can also be learned from data. Since a pair of the dynamics (1) and $V$ has redundant degrees of freedom, additional assumptions are required to determine $V$ uniquely. In [9], this is realized by limiting the dynamics to the internal system and restricting $V$ to a convex function [26].

Lyapunov functions are also used to design controllers, where the whole system should satisfy the stability condition. A method for learning such a controller using neural networks has been proposed [27]. This method deals with optimization problems over the space that satisfies the stability condition, which is similar to our method. Whereas we solve the QCQP problem to derive the projection onto this space, this method uses a Satisfiability Modulo Theories (SMT) solver to satisfy the condition. The method has also been extended to unknown systems [28].

Although this paper deals with deterministic systems, neural networks for stochastic dynamics with variational inference are well studied [6, 29, 30, 31]. Guaranteeing the internal stability of trained stochastic dynamics is important for noise filtering and robust controller design [10].

5 Experiments

We conduct two experiments to evaluate our proposed method. The first experiment uses a benchmark dataset generated from a nonlinear model with multiple asymptotically stable equilibrium points. In the second experiment, we apply our method to a biological system using a simulator of the glucose-insulin system.

5.1 Experimental setting

Before describing the results of the experiments, this section explains the evaluation metrics and our experimental setting.

For evaluation metrics, we define the root mean square error (RMSE) and the average $\mathcal{L}_2$ gain (GainIO) for given input and output signals as follows:

$$\text{RMSE} \triangleq \sqrt{\frac{1}{N}\sum_{i=1}^N \|y_i - \hat{y}_i\|^2_{\mathcal{L}_2}}, \qquad \text{GainIO} \triangleq \frac{1}{N}\sum_{i=1}^N \frac{\|\hat{y}_i\|_{\mathcal{L}_2}}{\|u_i\|_{\mathcal{L}_2}},$$

where $N$ is the number of signals in the dataset, and $u_i(\cdot)$ and $y_i(\cdot)$ are the input and output signals at the $i$-th index, respectively. The prediction signal $\hat{y}_i(\cdot)$ is computed from $u_i(\cdot)$, the trained dynamics, and the initial state. Note that the integral contained in the $\mathcal{L}_2$ norm is approximated by a finite summation. The RMSE is a metric of the prediction error on the output signal, and the GainIO is a metric of the $\mathcal{L}_2$ stability property. The GainIO can be calculated on a given finite-size dataset whether or not the target system satisfies the $\mathcal{L}_2$ stability. The GainIO error is defined as the absolute error between the GainIO of the test dataset and that of the prediction.
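A small sketch of these metrics under the finite-summation approximation of the $\mathcal{L}_2$ norm (array shapes are our assumption):

```python
import numpy as np

def l2_norm(sig, dt):
    """Discrete approximation of the L2 norm: sqrt(sum_t ||s(t)||^2 * dt)."""
    return np.sqrt(np.sum(sig**2) * dt)

def rmse(y_true, y_pred, dt):
    """y_true, y_pred: lists of arrays of shape (T, l)."""
    sq = [l2_norm(yt - yp, dt)**2 for yt, yp in zip(y_true, y_pred)]
    return float(np.sqrt(np.mean(sq)))

def gain_io(u, y_pred, dt):
    """Average ratio of the predicted-output norm to the input norm."""
    return float(np.mean([l2_norm(yp, dt) / l2_norm(ui, dt)
                          for ui, yp in zip(u, y_pred)]))
```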

In our experiments, 90% of the dataset is used for training and the remaining 10% is used for testing. We repeat all experiments five times and report the means and standard deviations of the metrics.

For simplicity, in our experiments the sampling step $\Delta t$ for the output $y$ is set as constant and the Euler method is used to solve the ODEs. The initial state $x_0$ is put at an asymptotically stable point for each benchmark and is known. To prevent the state from diverging during the learning of the dynamics, a clipping operation is used so that the absolute values of the states are less than ten.

For comparative methods, we use vanilla neural networks, ARX, ORT [18], MOESP [17], and piece-wise ARX [21]. In the vanilla neural network method, the maps $(f, G, h)$ in the nonlinear system (1) are represented by three neural networks, i.e., this method is the one used in Figure 1. To determine the hyperparameters of the comparative methods except for the neural networks, grid search is used. Note that these comparative methods only consider the prediction errors.

For training each method with neural networks, an NVIDIA Tesla T4 GPU was used. In total, our experiments ran on 20 GPUs over about three days.

5.2 Bistable model benchmark

Figure 3: Results of the bistable model benchmark. The upper part shows (A) the RMSE and (B) the GainIO error of the vanilla neural networks (gray), our proposed methods (red), and the conventional methods (blue). The lower part shows (C) the RMSE and (D) the GainIO error for different dataset sizes.

The first experiment is carried out using a bistable model, which is known as a bounded system with multiple asymptotically stable equilibrium points. This bistable model is defined as

$$\dot{x} = x(1 - x^2) + u, \quad x(0) = -1, \quad y = x.$$

The internal system of this model has two asymptotically stable equilibrium points, $x \equiv 1$ and $x \equiv -1$.

We generate 1000 input and output signals for this experiment. To construct this dataset, we prepare input signals using positive and negative pulse-wave signals whose pulse widths are changed at random. The input and output signals on the period $[0, 10]$ are sampled with an interval $\Delta t = 0.1$. In this benchmark, we set the number of dimensions of the internal system to one and use a fixed function $V(x) = \min((x-1)^2, (x+1)^2)$, a mixture of two positive definite functions.
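A sketch of this data generation, assuming unit pulse amplitudes and the width range below, which the paper does not specify exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, steps = 0.1, 100          # period [0, 10] sampled at dt = 0.1

def pulse_input():
    """Pulse of random width and random sign; amplitude 1 is our assumption."""
    width = rng.integers(5, steps // 2)
    start = rng.integers(0, steps - width)
    u = np.zeros(steps)
    u[start:start + width] = rng.choice([-1.0, 1.0])
    return u

def simulate(u):
    """Euler rollout of the bistable model with x(0) = -1 and y = x."""
    x, ys = -1.0, np.empty(steps)
    for t in range(steps):
        ys[t] = x
        x += dt * (x * (1.0 - x**2) + u[t])
    return ys

dataset = [(u, simulate(u)) for u in (pulse_input() for _ in range(1000))]

# The designed V is the mixture of two positive definite functions:
V = lambda x: min((x - 1.0)**2, (x + 1.0)**2)
```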

In the results of our experiments, we refer to the proposed methods modified by Theorem 1, Corollary 1, and Corollary 2 as DIOS-fgh, DIOS-f, and DIOS-fg, respectively. In these methods, the parameters of our loss function are set as $\lambda = 0$ and $\alpha = 0.01$. Also, DIOS-fgh+ uses $\lambda = 0.01$ and $\alpha = 0.01$ under the same conditions as DIOS-fgh. For this example (1000 input signals), training one model took about one hour on one GPU.

The results for the RMSE and the GainIO error in this experiment are shown in Figure 3. Figures 3 (A) and (B) demonstrate that a piece-wise ARX model (PWARX) gives very low RMSE but a high GainIO error. Our proposed methods achieve a small GainIO error while keeping the RMSE low. Note that linear models such as MOESP and ORT can only approach one point although the bistable model has two asymptotically stable equilibrium points. Figures 3 (C) and (D) display the effect of the dataset size. When the dataset size was varied over 100, 1000, and 3000, larger datasets provided smaller RMSE for all methods. The vanilla neural networks do not consider the $\mathcal{L}_2$ gain, so their GainIO error was large even in the cases with low RMSE.

Figure 4: Plots of $x$ versus $f(x)$ displaying the internal dynamics trained from the bistable datasets of different sizes. The sign of $f(x)$ in the true bistable model is shown at the bottom of each figure.

Figure 4 shows the relationship between $x$ and $f(x)$ in the system (1) trained by the vanilla neural networks, DIOS-fgh, and DIOS-fgh+. DIOS-fgh and DIOS-fgh+ successfully find the two stable points, i.e., the trained function $f(x)$ has roots at the two stable points $x = \pm 1$ and the unstable point $x = 0$. In particular, these stable points were robustly estimated by the system trained with our presented loss function (DIOS-fgh+) even when the dataset size was small. The vanilla neural networks failed to obtain the two stable points in all cases.

5.3 Glucose-insulin benchmark

This section addresses an example of the identification of biological systems. We consider the task of learning the glucose-insulin system using a simulator [32], constructing responses for various inputs and evaluating the robustness of the proposed method against unexpected inputs. This simulator outputs the concentrations of plasma glucose $y_1$ and insulin $y_2$ for the appearance of plasma glucose per minute $u$. To determine a realistic input $u$, we adopt another model, an oral glucose absorption model [33].

Using this simulator, 1000 input and output signals are synthesized for this experiment. The input and output signals are sampled with a sampling interval $\Delta t = 1$ and 1000 steps for each sequence. In this benchmark, we set the number of dimensions of the internal system to six, fix a positive definite function $V(x) = x^2$, and use our loss function with $\lambda = 0.001$ and $\alpha = 0.001$. Training one model for this example (1000 input signals) took about 7.5 hours on one GPU. The hyperparameters, including the number of layers in the neural networks, the learning rate, the optimizer, and the weight decay, are determined using the tree-structured Parzen estimator (TPE) implemented in Optuna [34].

Figure 5: The input and output signals of the glucose-insulin simulator and the predicted output.
Figure 6: The step reaction of the trained systems. The color of each line indicates the magnitude of the step input.

The RMSE values of the vanilla neural networks and our proposed method are 0.0103 and 0.0050, respectively; from the perspective of RMSE, these methods achieve comparable performance. Figure 5 shows input and output signals in the test dataset and the outputs predicted by the vanilla neural networks and our method (DIOS-fgh+). The $\mathcal{L}_2$ stability of the system using the vanilla neural networks is not guaranteed. Since the proposed method guarantees the $\mathcal{L}_2$ stability, the output signals of DIOS-fgh+ are bounded even if the input signals are unexpectedly large.

To demonstrate this, we conducted an additional experiment using the trained systems. Figure 6 shows the transition of the output behavior as the magnitude of the step input signal changes from 2 to 10. Note that the maximum magnitude in the training dataset is one. In this experiment, $\Delta t$ was changed to $0.01$ and the clipping operation was removed to deal with large state values.

This result shows that the output of the vanilla neural networks quickly diverged for an unexpectedly large input. In contrast, the output of our proposed method is always bounded. This confirms in practice that our proposed method satisfies the $\mathcal{L}_2$ stability.

6 Conclusion

This paper proposed a learning method for nonlinear dynamical systems guaranteeing the $\mathcal{L}_2$ stability. By theoretically deriving the projection of a triplet $(f, G, h)$ onto the space satisfying the Hamilton-Jacobi inequality, our proposed method realizes the $\mathcal{L}_2$ stability of trained systems. Also, we introduced a loss function to empirically achieve a smaller $\mathcal{L}_2$ gain while reducing prediction errors. We conducted two experiments to learn dynamical systems consisting of neural networks. The first experiment used a nonlinear model with multiple asymptotically stable equilibrium points; its results showed that our proposed method can robustly estimate a system with multiple stable points. In the second experiment, we applied our method to a biological system using a simulator of the glucose-insulin system. It was confirmed that the proposed method can successfully learn a proper system that works well under unexpectedly large inputs thanks to the $\mathcal{L}_2$ stability. A limitation is that our method cannot be applied to systems without the $\mathcal{L}_2$ stability. Future work will extend the $\mathcal{L}_2$ stability-based method to a more general learning method based on dissipativity and apply our approach to stochastic systems.

Acknowledgements

This research was supported by JST Moonshot R&D Grant Number JPMJMS2021 and JPMJMS2024. This work was also supported by JSPS KAKENHI Grant No.21H04905 and CREST Grant Number JPMJCR22D3, Japan. This paper was also based on a part of results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

References

  • [1] Jan Swevers, Walter Verdonck, and Joris De Schutter. Dynamic model identification for industrial robots. IEEE control systems magazine, 27(5):58–71, 2007.
  • [2] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences, 2016.
  • [3] Timothy S. Gardner, Diego Di Bernardo, David Lorenz, and James J. Collins. Inferring genetic networks and identifying compound mode of action via expression profiling. Science, 301(5629):102–105, 2003.
  • [4] Geoffrey Roeder, Paul K Grant, Andrew Phillips, Neil Dalchau, and Edwards Meeds. Efficient amortised bayesian inference for hierarchical and nonlinear dynamical systems. Proceeding of International Conference on Machine Learning (ICML), 2019.
  • [5] Hassan K. Khalil. Nonlinear systems. Prentice-Hall, 3rd edition, 2002.
  • [6] Rahul G. Krishnan, Uri Shalit, and David Sontag. Structured inference networks for nonlinear state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, page 2101–2109, 2017.
  • [7] Ricky T.Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018.
  • [8] Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. Symplectic ode-net: Learning hamiltonian dynamics with control. In Proceeding of International Conference on Learning Representations, 2019.
  • [9] Gaurav Manek and J. Zico Kolter. Learning stable deep dynamics models. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  • [10] Nathan Lawrence, Philip Loewen, Michael Forbes, Johan Backstrom, and Bhushan Gopaluni. Almost surely stable deep dynamics. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, 2020.
  • [11] Andreas Schlaginhaufen, Philippe Wenk, Andreas Krause, and Florian Dörfler. Learning stable deep dynamics models for partially observed or delayed dynamical systems. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, 2021.
  • [12] Arjan van der Schaft. $\mathcal{L}_2$-Gain and Passivity Techniques in Nonlinear Control. Springer Publishing Company, Incorporated, 3rd edition, 2016.
  • [13] W.M. Haddad and V.S. Chellaboina. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton University Press, 2008.
  • [14] Xinshi Chen. Review: Ordinary differential equations for deep learning. CoRR, abs/1911.00502, 2019.
  • [15] Tohru Katayama. Subspace Method for System Identification. Springer, 2005.
  • [16] Jer-Nan Juang and Richard S Pappa. An eigensystem realization algorithm for modal parameter identification and model reduction. Journal of guidance, control, and dynamics, 8(5):620–627, 1985.
  • [17] Michel Verhaegen and Patrick Dewilde. Subspace model identification part 1. the output-error state-space model identification class of algorithms. International journal of control, 56(5):1187–1210, 1992.
  • [18] Tohru Katayama, Hideyuki Tanaka, and Takeya Enomoto. A simple subspace identification method of closed-loop systems using orthogonal decomposition. IFAC Proceedings Volumes, 38(1):512–517, 2005.
  • [19] Joshua L Proctor, Steven L Brunton, and J Nathan Kutz. Dynamic mode decomposition with control. SIAM Journal on Applied Dynamical Systems, 15(1):142–161, 2016.
  • [20] Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Sparse identification of nonlinear dynamics with control (SINDYc). IFAC-PapersOnLine, 49(18):710–715, 2016.
  • [21] Andrea Garulli, Simone Paoletti, and Antonio Vicino. A survey on switched and piecewise affine system identification. In IFAC Proceedings Volumes, volume 45, pages 344–355. IFAC, 2012.
  • [22] Adrian Wills, Thomas B. Schon, Lennart Ljung, and Brett Ninness. Identification of hammerstein-wiener models. Automatica, 49(1):70–81, 2013.
  • [23] Lennart Ljung. System identification. In Signal analysis and prediction, pages 163–173. Springer, 1998.
  • [24] Martin Braun and Martin Golubitsky. Differential equations and their applications. Springer, 1983.
  • [25] Naoya Takeishi and Yoshinobu Kawahara. Learning dynamics models with stable invariant sets. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9782–9790, 2021.
  • [26] Brandon Amos, Lei Xu, and J Zico Kolter. Input convex neural networks. In International Conference on Machine Learning, pages 146–155, 2017.
  • [27] Ya-Chien Chang, Nima Roohi, and Sicun Gao. Neural lyapunov control. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  • [28] Vrushabh Zinage and Efstathios Bakolas. Neural koopman lyapunov control. arXiv:2201.05098, 2022.
  • [29] Daniel Gedon, Niklas Wahlström, Thomas B. Schön, and Lennart Ljung. Deep state space models for nonlinear system identification. In Proceedings of the 19th IFAC Symposium on System Identification (SYSID), 2021.
  • [30] Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C. Courville, and Yoshua Bengio. A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems (NeurIPS), volume 28, 2015.
  • [31] Justin Bayer and Christian Osendorfer. Learning stochastic recurrent networks. ArXiv, abs/1411.7610, 2014.
  • [32] A. De Gaetano and O. Arino. A statistical approach to the determination of stability for dynamical systems modelling physiological processes. Mathematical and Computer Modelling, 31(4):41–51, 2000.
  • [33] Chiara Dalla Man, Robert A. Rizza, and Claudio Cobelli. Meal simulation model of the glucose-insulin system. IEEE Transactions on Biomedical Engineering, 54(10):1740–1749, 2007.
  • [34] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.

Appendix A Proof of Theorem 1 and Corollary 2

To prove Theorem 1 and Corollary 2, we use the following particular solution of QCQP problems.

Lemma 1.

Suppose that $A \in \mathbb{R}^{n \times n}$ is a positive definite matrix, $b \in \mathbb{R}^n$ is a vector, $c$ is a scalar, and $k_{\rm x}$, $k_{\rm y}$ are positive constants. The solution of the problem

$$\text{minimize} \quad k_{\rm x} x^{\mathrm{T}} A x + k_{\rm y}|y|$$
$$\text{subject to} \quad y \geq x^{\mathrm{T}} A x - 2b^{\mathrm{T}} x + c$$

is given by

$$x^* = \left(1 - \sqrt{\mathrm{C}(1 - \tilde{c}; \tilde{k}_{\rm x}^2, 1)}\right) A^{-1} b, \qquad y^* = \mathrm{R}\left(c - (1 - \tilde{k}_{\rm x}^2) b^{\mathrm{T}} A^{-1} b\right),$$

where

$$\tilde{c} = \frac{c}{b^{\mathrm{T}} A^{-1} b}, \qquad \tilde{k}_{\rm x} = \frac{k_{\rm x}}{k_{\rm x} + k_{\rm y}}.$$
Proof.

See Appendix B. ∎

To keep the notation short in the following proofs, we define

$$\beta \triangleq \nabla V.$$

Proof of Theorem 1: In the Hamilton-Jacobi function $\mathrm{HJ}(f_{\bf m}, G_{\bf m}, h_{\bf m})$ of the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$, the map $f_{\bf m}$ and each row of the map $G_{\bf m}$ do not depend on the components orthogonal to $\beta$. Hence, $f_{\bf m}$ and $G_{\bf m}$ can be written as

$$f_{\bf m} = f_{\bf n} - \beta p, \qquad G_{\bf m} = G_{\bf n} - \beta q^{\mathrm{T}},$$

where $p$ is a scalar and $q \in \mathbb{R}^m$. The map $h_{\bf m}$ depends only on its norm in $\mathrm{HJ}(f_{\bf m}, G_{\bf m}, h_{\bf m})$, i.e.,

$$h_{\bf m} = (1 - r) h_{\bf n},$$

where $r$ is a scalar.

The objective function (5a) is rewritten in terms of $p$, $q$, and $r$:

$$\frac{k_1}{\|\beta\|}\|f_{\bf m}-f_{\bf n}\| + \frac{k_2}{2\gamma^2}\|G_{\bf m}-G_{\bf n}\|^2 + \frac{k_2}{2\|\beta\|^2}\|h_{\bf m}-h_{\bf n}\|^2 = k_1|p| + k_2\frac{\|\beta\|^2}{2\gamma^2}\|q\|^2 + k_2\frac{\|h_{\bf n}\|^2}{2\|\beta\|^2}r^2.$$

Also, the constraint (5b) is rewritten as

$$\begin{aligned} \mathrm{HJ}_\beta(f_{\bf m}, G_{\bf m}, h_{\bf m}) &= \beta^{\mathrm{T}} f_{\bf m} + \frac{1}{2\gamma^2}\|G_{\bf m}^{\mathrm{T}}\beta\|^2 + \frac{1}{2}\|h_{\bf m}\|^2 \\ &= \beta^{\mathrm{T}}(f_{\bf n} - \beta p) + \frac{1}{2\gamma^2}\|(G_{\bf n} - \beta q^{\mathrm{T}})^{\mathrm{T}}\beta\|^2 + \frac{1}{2}\|(1 - r) h_{\bf n}\|^2 \\ &= -\|\beta\|^2 p + \frac{1}{2\gamma^2}\|\beta\|^4\|q\|^2 + \frac{1}{2}\|h_{\bf n}\|^2 r^2 - \frac{1}{\gamma^2}\|\beta\|^2\beta^{\mathrm{T}} G_{\bf n} q - \|h_{\bf n}\|^2 r + \mathrm{HJ}_\beta(f_{\bf n}, G_{\bf n}, h_{\bf n}) \\ &\leq 0. \end{aligned}$$

If $x \triangleq [q^{\mathrm{T}}, r]^{\mathrm{T}}$, $y \triangleq p$, and
$$A \triangleq \begin{bmatrix} \frac{\|\beta\|^2}{2\gamma^2} I_m & 0 \\ 0 & \frac{\|h_{\bf n}\|^2}{2\|\beta\|^2} \end{bmatrix}, \quad b \triangleq \begin{bmatrix} \frac{1}{2\gamma^2} G_{\bf n}^{\mathrm{T}}\beta \\ \frac{\|h_{\bf n}\|^2}{2\|\beta\|^2} \end{bmatrix}, \quad c \triangleq \frac{1}{\|\beta\|^2}\mathrm{HJ}_\beta(f_{\bf n}, G_{\bf n}, h_{\bf n}),$$

then the optimization problem (5) of this theorem becomes the problem of Lemma 1. The optimal points of $p$, $q$, and $r$ are given by

$$\begin{aligned} p^* &= \frac{1}{\|\beta\|^2}\mathrm{R}\left(V_f + k^2 V_{Gh}\right), \\ q^* &= \frac{1}{\|\beta\|^2}\left(1 - \sqrt{\mathrm{C}\left(-\tfrac{V_f}{V_{Gh}}; k^2, 1\right)}\right) G_{\bf n}^{\mathrm{T}}\beta, \\ r^* &= 1 - \sqrt{\mathrm{C}\left(-\tfrac{V_f}{V_{Gh}}; k^2, 1\right)}. \end{aligned}$$

Therefore, Theorem 1 is derived.

Proof of Corollary 2:

The optimal maps $f_{\bf m}$ and $G_{\bf m}$ are again written as

$$f_{\bf m} = f_{\bf n} - \beta p, \qquad G_{\bf m} = G_{\bf n} - \beta q^{\mathrm{T}}.$$

Hence, the objective function (7a) is rewritten in terms of $p$ and $q$:

$$\frac{k_1}{\|\beta\|}\|f_{\bf m}-f_{\bf n}\| + \frac{k_2}{2\gamma^2}\|G_{\bf m}-G_{\bf n}\|^2 = k_1|p| + k_2\frac{\|\beta\|^2}{2\gamma^2}\|q\|^2.$$

Also, the constraint (7b) is rewritten as

$$\begin{aligned} \mathrm{HJ}_\beta(f_{\bf m}, G_{\bf m}, h_{\bf n}) &= \beta^{\mathrm{T}} f_{\bf m} + \frac{1}{2\gamma^2}\|G_{\bf m}^{\mathrm{T}}\beta\|^2 + \frac{1}{2}\|h_{\bf n}\|^2 \\ &= \beta^{\mathrm{T}}(f_{\bf n} - \beta p) + \frac{1}{2\gamma^2}\|(G_{\bf n} - \beta q^{\mathrm{T}})^{\mathrm{T}}\beta\|^2 + \frac{1}{2}\|h_{\bf n}\|^2 \\ &= -\|\beta\|^2 p + \frac{1}{2\gamma^2}\|\beta\|^4\|q\|^2 - \frac{1}{\gamma^2}\|\beta\|^2\beta^{\mathrm{T}} G_{\bf n} q + \mathrm{HJ}_\beta(f_{\bf n}, G_{\bf n}, h_{\bf n}) \\ &\leq 0. \end{aligned}$$

If $x \triangleq q$, $y \triangleq p$, and
$$A \triangleq \frac{\|\beta\|^2}{2\gamma^2} I_m, \quad b \triangleq \frac{1}{2\gamma^2} G_{\bf n}^{\mathrm{T}}\beta, \quad c \triangleq \frac{1}{\|\beta\|^2}\mathrm{HJ}_\beta(f_{\bf n}, G_{\bf n}, h_{\bf n}),$$

then the optimization problem (7) becomes the problem of Lemma 1. The optimal points of $p$ and $q$ are given by

$$p^* = \frac{1}{\|\beta\|^2}\mathrm{R}\left(V_{fh} + k^2 V_G\right), \qquad q^* = \frac{1}{\|\beta\|^2}\left(1 - \sqrt{\mathrm{C}\left(-\tfrac{V_{fh}}{V_G}; k^2, 1\right)}\right) G_{\bf n}^{\mathrm{T}}\beta.$$

Therefore, the solution of problem (7) becomes Eq. (8).

Appendix B Proof of the particular solution of QCQP

This section presents the proof of the following QCQP problem.

$$\text{minimize} \quad k_{\rm x} x^{\mathrm{T}} A x + k_{\rm y}|y| \qquad (10a)$$
$$\text{subject to} \quad y \geq x^{\mathrm{T}} A x - 2b^{\mathrm{T}} x + c, \qquad (10b)$$

where $A$ is a positive definite matrix. We classify this problem according to the parameters $A$, $b$, and $c$.

First, the solution of this optimization problem depends on the sign of $c$.

Lemma 2.

If $c \leq 0$, then the solution of Eq. (10) is $(x, y) = (0, 0)$. If $c > 0$, the solution $x^*$ of Eq. (10) equals the solution of the following problem:

$$\text{minimize} \quad k_x x^{\mathrm{T}} A x + k_y|x^{\mathrm{T}} A x - 2b^{\mathrm{T}} x + c|. \qquad (11)$$

Furthermore, the solution $y^*$ is given by

$$y^* = x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c. \qquad (12)$$
Proof.

The objective function (10a) is strictly convex and its minimum point is $(x, y) = (0, 0)$. If $c \leq 0$, the origin $(x, y) = (0, 0)$ satisfies the constraint (10b). Therefore, the solution of Eq. (10) is $(x, y) = (0, 0)$.

If $c > 0$, $(x, y) = (0, 0)$ does not satisfy the constraint (10b), and the optimal solution belongs to the boundary of the region satisfying (10b). Therefore, the solution point $(x^*, y^*)$ satisfies Eq. (12). ∎

Also, the solution of the new optimization problem (11) depends on the ratio between $c$ and $b^{\mathrm{T}} A^{-1} b$.

Lemma 3.

If $\tilde{c} > 1 - \tilde{k}_x^2$, then the solution $x^*$ of the optimization problem (11) is given by

$$x^* = (1 - \tilde{k}_x) A^{-1} b, \qquad (13)$$

where

$$\tilde{c} = \frac{c}{b^{\mathrm{T}} A^{-1} b}, \qquad \tilde{k}_x = \frac{k_x}{k_x + k_y}.$$

If $\tilde{c} \leq 1 - \tilde{k}_x^2$, the solution $x^*$ equals the solution of the following QCQP problem:

$$\text{minimize} \quad x^{\mathrm{T}} A x \qquad (14a)$$
$$\text{subject to} \quad x^{\mathrm{T}} A x - 2b^{\mathrm{T}} x + c = 0. \qquad (14b)$$
Proof.

The optimization problem (11) is split according to the sign of $x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c$.

Case $x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c < 0$:

The optimization problem in this case is written as

$$\text{minimize} \quad (k_x - k_y) x^{\mathrm{T}} A x + 2k_y b^{\mathrm{T}} x - k_y c.$$

If $k_x - k_y \leq 0$, there is no optimal point. Otherwise ($k_x - k_y > 0$), the optimal point is written as

$$x^* = -\frac{k_y}{k_x - k_y} A^{-1} b.$$

This optimal point does not satisfy the condition $x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c < 0$, because

$$x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c = \frac{k_y^2}{(k_x - k_y)^2} b^{\mathrm{T}} A^{-1} b + \frac{2k_y}{k_x - k_y} b^{\mathrm{T}} A^{-1} b + c > 0,$$

since $A^{-1}$ is a positive definite matrix and $c > 0$.

Case $x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c > 0$:

The problem (11) is written as

$$\text{minimize} \quad (k_x + k_y) x^{\mathrm{T}} A x - 2k_y b^{\mathrm{T}} x + k_y c.$$

Hence, this optimal point is written as

$$x^* = \frac{k_y}{k_x + k_y} A^{-1} b = (1 - \tilde{k}_x) A^{-1} b.$$

The condition $x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c > 0$ is rewritten using this optimal solution:

$$\begin{aligned} x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c &= (1 - \tilde{k}_x)^2 b^{\mathrm{T}} A^{-1} b - 2(1 - \tilde{k}_x) b^{\mathrm{T}} A^{-1} b + c \\ &= -\left(1 - \tilde{k}_x^2\right) b^{\mathrm{T}} A^{-1} b + c > 0 \\ &\Leftrightarrow \tilde{c} > 1 - \tilde{k}_x^2. \end{aligned}$$

Case $x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c = 0$: The problem (11) becomes the QCQP problem (14). ∎

The solution of the simple QCQP problem (14) is easily derived using the method of Lagrange multipliers.

Lemma 4.

The solution of the simple QCQP problem (14) is given by

$$x^* = \left(1 - \sqrt{1 - \tilde{c}}\right) A^{-1} b.$$
Proof.

Supposing a Lagrange multiplier $\lambda > 0$, the Lagrangian is written as

$$L(x, \lambda) = x^{\mathrm{T}} A x + \lambda(x^{\mathrm{T}} A x - 2b^{\mathrm{T}} x + c).$$

The KKT conditions are given by

$$\frac{\partial L(x^*, \lambda)}{\partial x} = 2(1 + \lambda) A x^* - 2\lambda b = 0, \qquad (15)$$
$$\frac{\partial L(x^*, \lambda)}{\partial \lambda} = x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c = 0. \qquad (16)$$

Eq. (15) gives

$$x^* = \frac{\lambda}{1 + \lambda} A^{-1} b,$$

and Eq. (16) becomes

$$\begin{aligned} & x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c = 0 \\ \Leftrightarrow\ & \frac{\lambda^2}{(1+\lambda)^2} b^{\mathrm{T}} A^{-1} b - \frac{2\lambda}{1+\lambda} b^{\mathrm{T}} A^{-1} b + c = 0 \\ \Leftrightarrow\ & -\frac{\lambda^2 + 2\lambda}{(1+\lambda)^2} b^{\mathrm{T}} A^{-1} b + c = 0 \\ \Leftrightarrow\ & -(\lambda^2 + 2\lambda) b^{\mathrm{T}} A^{-1} b + (1+\lambda)^2 c = 0 \\ \Leftrightarrow\ & (c - b^{\mathrm{T}} A^{-1} b)\lambda^2 + 2(c - b^{\mathrm{T}} A^{-1} b)\lambda + c = 0 \\ \Leftrightarrow\ & \lambda^2 + 2\lambda + \frac{\tilde{c}}{\tilde{c} - 1} = 0 \\ \Leftrightarrow\ & \lambda = -1 \pm \sqrt{1 - \frac{\tilde{c}}{\tilde{c} - 1}} = -1 \pm \sqrt{\frac{1}{1 - \tilde{c}}}. \end{aligned}$$

Since $\lambda > 0$ and $\tilde{c} > 0$, the Lagrange multiplier is written as

$$\lambda = -1 + \sqrt{\frac{1}{1 - \tilde{c}}}.$$

Therefore, the optimal point $x^*$ is given by

$$x^* = \frac{\lambda}{1+\lambda} A^{-1} b = \frac{-1 + \sqrt{\frac{1}{1-\tilde{c}}}}{\sqrt{\frac{1}{1-\tilde{c}}}} A^{-1} b = \left(1 - \sqrt{1 - \tilde{c}}\right) A^{-1} b.$$

Finally, we combine Lemmas 2-4 to solve the QCQP problem (10).

Lemma 1.

The solution of the problem (10) is given by

$$x^* = \left(1 - \sqrt{\mathrm{C}\left(1 - \tilde{c}; \tilde{k}_x^2, 1\right)}\right) A^{-1} b, \qquad (17a)$$
$$y^* = \mathrm{R}\left(c - (1 - \tilde{k}_x^2) b^{\mathrm{T}} A^{-1} b\right). \qquad (17b)$$
Proof.

Lemmas 2-4 split the solution $x^*$ into three cases:

$$x^* = \begin{cases} (1 - \tilde{k}_x) A^{-1} b & 1 - \tilde{k}_x^2 < \tilde{c} \\ \left(1 - \sqrt{1 - \tilde{c}}\right) A^{-1} b & 0 < \tilde{c} \leq 1 - \tilde{k}_x^2 \\ 0 & c \leq 0 \end{cases} = \begin{cases} \left(1 - \sqrt{\tilde{k}_x^2}\right) A^{-1} b & 1 - \tilde{c} < \tilde{k}_x^2 \\ \left(1 - \sqrt{1 - \tilde{c}}\right) A^{-1} b & \tilde{k}_x^2 \leq 1 - \tilde{c} < 1 \\ \left(1 - \sqrt{1}\right) A^{-1} b & 1 \leq 1 - \tilde{c} \end{cases} = \left(1 - \sqrt{\mathrm{C}\left(1 - \tilde{c}; \tilde{k}_x^2, 1\right)}\right) A^{-1} b.$$

Furthermore, the solution $y^*$ is written as

$$y^* = \begin{cases} x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c & 1 - \tilde{k}_x^2 < \tilde{c} \\ x^{*\mathrm{T}} A x^* - 2b^{\mathrm{T}} x^* + c & 0 < \tilde{c} \leq 1 - \tilde{k}_x^2 \\ 0 & c \leq 0 \end{cases} = \begin{cases} c - (1 - \tilde{k}_x^2) b^{\mathrm{T}} A^{-1} b & 1 - \tilde{k}_x^2 < \tilde{c} \\ 0 & 0 < \tilde{c} \leq 1 - \tilde{k}_x^2 \\ 0 & c \leq 0 \end{cases} = \mathrm{R}\left(c - (1 - \tilde{k}_x^2) b^{\mathrm{T}} A^{-1} b\right).$$

Therefore, the solution becomes Eq. (17). ∎
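As an informal numerical check of Eq. (17) (our own verification, not part of the proof), the following sketch compares the closed-form solution against random perturbations of the objective partially minimized over $y$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.diag([2.0, 0.5])
b = np.array([0.6, -0.4])
c, kx, ky = 0.3, 1.0, 2.0

Ainv_b = np.linalg.solve(A, b)
bAb = b @ Ainv_b
c_t = c / bAb                       # c tilde
k_t = kx / (kx + ky)                # k_x tilde

x_star = (1.0 - np.sqrt(np.clip(1.0 - c_t, k_t**2, 1.0))) * Ainv_b  # Eq. (17a)
y_star = max(c - (1.0 - k_t**2) * bAb, 0.0)                         # Eq. (17b)

def partial_obj(x):
    """Objective after minimizing over y: y = R(x^T A x - 2 b^T x + c)."""
    rhs = x @ A @ x - 2.0 * b @ x + c
    return kx * (x @ A @ x) + ky * max(rhs, 0.0)

# No random perturbation of x* should achieve a lower objective value.
best = min(partial_obj(x_star + rng.normal(0.0, 0.5, 2)) for _ in range(10000))
assert partial_obj(x_star) <= best + 1e-9
```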

Appendix C Overall schematic of the learning process

Algorithm 1 shows the overall schematic of the learning process. The first line defines the modified dynamics $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ from the nominal dynamics $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ represented by neural networks, where $\phi$ is the set of parameters of the nominal dynamics. Lines 2-7 represent the training loop, where gradient-based optimization methods can be applied via the forward and backward computations. Note that an ODE solver is used for the forward computation; Algorithm 2 shows the forward computation when the Euler method is used. For simplicity, mini-batch computation is omitted in this schematic.

Algorithm 1 Training process
0:  $x_0$: initial state, $u$: input signal, $y$: output signal, $(f_{\bf n}, G_{\bf n}, h_{\bf n})$: nominal dynamics, $V$: a designed function
1:  define modified functions $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ from $(f_{\bf n}, G_{\bf n}, h_{\bf n})$ and $V$
2:  for $1$ to #iterations do
3:     $\hat{y} \leftarrow$ ODE with $(f_{\bf m}, G_{\bf m}, h_{\bf m})$ from $x_0, u$ (Algorithm 2)
4:     forward computation of the $\mathrm{Loss}$ function (9) from $y$ and $\hat{y}$
5:     $\nabla_\phi \mathrm{Loss} \leftarrow$ backward computation with $\mathrm{Loss}$
6:     $\phi \leftarrow$ Optimizer($\phi$, $\nabla_\phi \mathrm{Loss}$)
7:  end for

Algorithm 2 Forward computation for the dynamics Eq. (1)
0:  $x_0$: initial state, $u$: input signal, $(f_{\bf m}, G_{\bf m}, h_{\bf m})$: dynamics
0:  $\hat{y}$: output signal
1:  for $t \leftarrow 0$ to $T$ do
2:     $x_{t+1} \leftarrow x_t + \Delta t\,(f_{\bf m}(x_t) + G_{\bf m}(x_t) u_t)$
3:     $\hat{y}_t \leftarrow h_{\bf m}(x_t)$
4:  end for
5:  return $\hat{y}$
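A minimal NumPy sketch of Algorithm 2, with the modified dynamics passed in as callables; in practice this rollout runs inside an automatic-differentiation framework so that Algorithm 1 can backpropagate through it.

```python
import numpy as np

def forward_euler(x0, u_seq, f_m, G_m, h_m, dt):
    """Algorithm 2: explicit-Euler rollout of the modified dynamics.

    x0: (n,) initial state; u_seq: (T, m) input signal.
    Returns the predicted outputs y_hat, shape (T, l).
    """
    x = np.asarray(x0, dtype=float)
    y_hat = []
    for u_t in u_seq:
        y_hat.append(h_m(x))                      # y_t = h_m(x_t)
        x = x + dt * (f_m(x) + G_m(x) @ u_t)      # x_{t+1}
    return np.stack(y_hat)
```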

Appendix D Neural network architecture and hyper parameters

Table 1: The search space of Bayesian optimization
parameter name | range | type
learning rate | $10^{-5}$ to $10^{-3}$ | log scale
weight decay | $10^{-10}$ to $10^{-6}$ | log scale
batch size | 10 to 100 | integer
optimizer | {AdamW, Adam, RMSProp} | categorical
#layers for $f_{\bf n}$ | 0 to 3 | integer
#layers for $G_{\bf n}$ | 0 to 3 | integer
#layers for $h_{\bf n}$ | 0 to 3 | integer
#dim. for a hidden layer of $f_{\bf n}$ | 8 to 32 | integer
#dim. for a hidden layer of $G_{\bf n}$ | 8 to 32 | integer
#dim. for a hidden layer of $h_{\bf n}$ | 8 to 32 | integer
$\epsilon$ | 0 to 1.0 | log scale
Initial scale parameter for $f_{\bf n}$ | $10^{-5}$ to 0.1 | log scale
Stop gradient for projection | true, false | boolean

This section details how the neural network architecture is determined. The architecture and hyperparameters of the neural networks were determined by Bayesian optimization using the validation dataset.

Table 1 shows the search space of the Bayesian optimization. The first three parameters (learning rate, weight decay, and batch size) are parameters for training the neural networks. Also, an optimizer is selected from AdamW, Adam, and RMSProp. The structure of each neural network is determined by the number of intermediate layers and the dimensions of each layer. One layer in our setting consists of a fully connected layer with a ReLU activation. The last three rows represent parameters related to our proposed methods. $\epsilon$ is a parameter of the loss function $L_{\rm HJ}$. The initial scale parameter is multiplied with the output of $f_{\bf n}$ to prevent the value of $f_{\bf n}(x)$ from becoming large in the initial stages of learning. When $f_{\bf n}(x)$, which determines the behavior of the internal system, outputs a large value, the state diverges under time evolution and the learning of the entire system may not progress. Therefore, it is empirically preferable to start with small values of $f_{\bf n}(x)$ at the initial stage of learning. When the flag 'stop gradient for projection' is false, the backward computation related to the second term of the modification of $f_{\bf m}$ and $G_{\bf m}$ is disabled. Note that the modifications of $f_{\bf m}$ and $G_{\bf m}$ consist of two terms (see Theorem 1). Setting this parameter to false resulted in better performance in all our experiments.
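A rough sketch of how such a TPE search could be set up with Optuna, covering a subset of Table 1; train_and_validate is a hypothetical stand-in for the actual training routine, and the lower bound for the log-scaled $\epsilon$ is our assumption since a log scale cannot start at 0.

```python
import optuna

def train_and_validate(params):
    # Placeholder for the actual training run; returns the validation loss.
    return 0.0

def objective(trial):
    params = {
        "lr": trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-10, 1e-6, log=True),
        "batch_size": trial.suggest_int("batch_size", 10, 100),
        "optimizer": trial.suggest_categorical("optimizer",
                                               ["AdamW", "Adam", "RMSprop"]),
        "n_layers_f": trial.suggest_int("n_layers_f", 0, 3),
        "hidden_dim_f": trial.suggest_int("hidden_dim_f", 8, 32),
        "epsilon": trial.suggest_float("epsilon", 1e-3, 1.0, log=True),
        "init_scale_f": trial.suggest_float("init_scale_f", 1e-5, 0.1, log=True),
        "stop_grad_projection": trial.suggest_categorical(
            "stop_grad_projection", [True, False]),
    }
    return train_and_validate(params)

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=300)
```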

We ran 300 trials of the Bayesian optimization for the bistable model benchmark and the glucose-insulin benchmark with the above settings; the hyperparameters obtained are shown in Table 2, where the numbers of dimensions of the hidden layers are shown as tuples ordered from the layer closest to the input.

Hyperparameters of the comparative methods were determined by grid search using the validation dataset. ARX and PWARX have an order parameter $n$ of the autoregressive model, which is searched in the range $1$ to $5$. The number of iterations was set to 10000 so that the optimization of PWARX converges sufficiently. MOESP and ORT have an internal dimension $n$ ($1 \leq n \leq 20$) and a number of subsequences used for estimation $k$ ($2n < k \leq 20$).

Table 2: Selected parameters for each benchmark
parameter name | bistable | glucose insulin
learning rate | $3.01 \times 10^{-4}$ | $3.28 \times 10^{-4}$
weight decay | $4.76 \times 10^{-9}$ | $2.28 \times 10^{-9}$
batch size | 100 | 100
optimizer | RMSProp | RMSProp
#layers for $f_{\bf n}$ | 3 | 1
#layers for $G_{\bf n}$ | 1 | 2
#layers for $h_{\bf n}$ | 3 | 2
#dim. for hidden layers of $f_{\bf n}$ | (17, 10, 22) | (8)
#dim. for hidden layers of $G_{\bf n}$ | (34) | (27, 29)
#dim. for hidden layers of $h_{\bf n}$ | (10, 62, 58) | (35, 18)
$\epsilon$ | 0.63 | 0.75
Initial scale parameter for $f_{\bf n}$ | $9.64 \times 10^{-2}$ | $8.94 \times 10^{-2}$
Stop gradient for projection | false | false

Appendix E Glucose-Insulin system

Glucose concentration in the blood is modeled as a time-delay system regulated by the insulin concentration (see Fig. 7 (A)) [32]. Suppose that $G$, $I$, and $X$ are the glucose, insulin, and accumulated glucose plasma concentrations (in [mg/100ml], [$\mu$UI/ml], and [min mg/100ml], respectively) and $u$ is the amount of ingested glucose per minute [min$^{-1}$ mg/100ml]. The dynamics of each concentration are given by

$$\begin{aligned} \dot{G}(t) &= -k_1 G(t) - k_2 G(t) I(t) + g_0 + u(t), \\ \dot{I}(t) &= -k_3 I(t) + \frac{k_4}{\tau} X(t), \\ \dot{X}(t) &= G(t) - G(t - \tau), \\ y(t) &= [G(t), I(t)]^{\mathrm{T}}, \end{aligned}$$

where $k_1$ is the spontaneous glucose disappearance rate, $k_2$ is the insulin-dependent glucose disappearance rate, $g_0$ is a constant increase in the plasma glucose concentration, $k_3$ is the insulin disappearance rate, and $k_4$ is the insulin release rate per average glucose concentration within the last $\tau$ minutes.

This system has a unique asymptotically stable equilibrium point $(G, I, X) \equiv (G^*, I^*, X^*)$ given by

$$\begin{aligned} G^* &= \frac{-k_1 k_3 + \sqrt{(k_1 k_3)^2 + 4 k_2 k_3 k_4 g_0}}{2 k_2 k_4}, \\ I^* &= \frac{-k_1 k_3 + \sqrt{(k_1 k_3)^2 + 4 k_2 k_3 k_4 g_0}}{2 k_2 k_3}, \\ X^* &= \frac{-k_1 k_3 + \sqrt{(k_1 k_3)^2 + 4 k_2 k_3 k_4 g_0}}{2 k_2 k_4}\tau. \end{aligned}$$

Here we set the initial state of this model as $G(0) = G^*$, $I(0) = I^*$, $X(0) = X^*$ and set the model parameters as shown in Table 3. Furthermore, we adopt the output of the oral glucose absorption system [33] as $u$ (see Fig. 7 (B)), and the glucose absorption amount $u$ is normalized based on the human blood volume per body weight (0.80 [100 ml/kg]).

Table 3: Model parameters.
Parameter | Value | Unit
$k_1$ | $3.35 \times 10^{-2}$ | 1/min
$k_2$ | $5.22 \times 10^{-5}$ | 1/(min ($\mu$UI/ml))
$k_3$ | $1.055$ | 1/min
$k_4$ | $0.293$ | ($\mu$UI/ml)/(min (mg/100ml))
$g_0$ | $3.13$ | (mg/100ml)/min
$\tau$ | $6$ | min
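Putting the equations and Table 3 together, a minimal Euler sketch of this delay system (our illustration; it assumes the system sits at equilibrium for $t \leq 0$ so that the delayed term has a well-defined history):

```python
import numpy as np
from collections import deque

# Parameters from Table 3.
k1, k2, k3, k4 = 3.35e-2, 5.22e-5, 1.055, 0.293
g0, tau = 3.13, 6.0
dt, steps = 1.0, 1000
delay = int(tau / dt)

# Equilibrium (G*, I*, X*) from the closed-form expressions above.
root = -k1 * k3 + np.sqrt((k1 * k3)**2 + 4.0 * k2 * k3 * k4 * g0)
G_eq, I_eq, X_eq = root / (2 * k2 * k4), root / (2 * k2 * k3), root / (2 * k2 * k4) * tau

def simulate(u):
    """Euler rollout; G(t - tau) is read from a history buffer."""
    G, I, X = G_eq, I_eq, X_eq
    hist = deque([G_eq] * delay, maxlen=delay)   # G(t - tau), ..., G(t - 1)
    ys = np.empty((steps, 2))
    for t in range(steps):
        ys[t] = (G, I)
        G_del = hist[0]                          # delayed glucose G(t - tau)
        hist.append(G)                           # record G(t), drop oldest
        dG = -k1 * G - k2 * G * I + g0 + u[t]
        dI = -k3 * I + (k4 / tau) * X
        dX = G - G_del
        G, I, X = G + dt * dG, I + dt * dI, X + dt * dX
    return ys
```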
Figure 7: (A) Overview and (B) input and output behavior of the glucose-insulin system.