
Koopman operator-based discussion on partial observation in stochastic systems

Jun Ohkubo Graduate School of Science and Engineering, Saitama University,
255 Shimo-Okubo, Sakura-ku, Saitama 338-8570, Japan
johkubo@mail.saitama-u.ac.jp
Abstract

It is sometimes difficult to achieve complete observation of a full set of observables, and partial observations are then necessary. For deterministic systems, the Mori-Zwanzig formalism provides a theoretical framework for handling partial observations. Recently, data-driven algorithms based on Koopman operator theory have made significant progress, and there have been discussions connecting the Mori-Zwanzig formalism with Koopman operator theory. In this work, we discuss the effects of partial observation in stochastic systems using Koopman operator theory. The discussion clarifies the importance of distinguishing the state space from the function space in stochastic systems. Even in stochastic systems, the delay embedding technique is beneficial for partial observation, and several numerical experiments show a power-law behavior of the accuracy with respect to the amplitude of the additive noise. We also discuss the relation between the exponent of the power-law behavior and the effects of partial observation.

1 Introduction

In a classical dynamical system, the behavior is deterministic if we can observe complete information about the system. However, it is often difficult to perform a perfect observation, and partial observation causes stochasticity. A simple example is the movement of a ball; if only the coordinate is measurable but not the velocity, it is impossible to predict in which direction and how far the ball will move next. The effect of partial observation has also been discussed in statistical physics; constructing coarse-grained models from high-dimensional microscopic ones leads to time-evolution equations for a smaller set of relevant variables of interest, and the unobserved variables can play the role of noise. In the recent development of data-driven approaches, the effects of partial observation are crucial because it can be difficult to observe all variables. Hence, it is desirable to obtain even a portion of the overall information from partial observation.

One of the best-known frameworks for handling partial observations is the Mori-Zwanzig formalism [1, 2, 3]. Mori and Zwanzig developed projection methods to express the effect of unobserved variables in terms of observed ones. The time-evolution equation for the observed variables is called the generalized Langevin equation. The crucial points of the generalized Langevin equation are as follows:

  • Although the original dynamics is deterministic and Markovian, the generalized Langevin equation has a memory term that depends on the history.

  • There is a term that could be interpreted as noise; the noise term represents the orthogonal dynamics and depends on the unknown initial conditions of the unobserved variables.

There are many works on the Mori-Zwanzig formalism. For example, some employed the formalism to improve prediction accuracy [4, 5, 6, 7, 8, 9]; for further related works, see the references in, e.g., [9]. Some works focused on data-driven approaches to the Mori-Zwanzig formalism [10, 11, 12, 13, 14], and some discussed applications in neural networks [15, 16].

Recently, a data-driven approach based on the Koopman operator has been connected to the discussion of the Mori-Zwanzig formalism [17, 18, 19]. The Koopman operator [20] enables us to deal with nonlinear dynamical systems in terms of linear algebra. Well-known methods for incorporating data into the Koopman operator framework include dynamic mode decomposition (DMD) [21, 22], extended dynamic mode decomposition (EDMD) [23], and Hankel DMD [24, 25]. For details of the Koopman approach, see the book [26] and the reviews [27, 28, 29, 30, 31]. In [18], a discussion based on the Koopman approach naturally leads to the generalized Langevin equation and to algorithms for estimating its key components from time-series datasets. There are also other works on this topic; [17] focused on the Wiener projection, and [19] employed regression-based projections.

Here, we note that previous discussions of the Koopman-based approach to partial observations have essentially been restricted to deterministic dynamical systems. Of course, the original Mori-Zwanzig formalism targeted deterministic systems, and it is natural to consider the Koopman-based approach in deterministic cases. However, the Koopman approach is not restricted to deterministic cases and is also applicable to stochastic systems; for example, see [23, 32, 33]. In some cases, one would take time-evolution equations with noise as the starting point; note that the noise term in the generalized Langevin equation is not an actual noise. Stochastic differential equations with additive Wiener noise are widely used in statistical physics, and it remains unclear how to treat partial observations in stochastic systems.

In the present paper, we discuss the effects of partial observation in stochastic systems. Here, we focus on stochastic differential equations with additive Wiener noise. To discuss partial observation, the Koopman operator approach is employed. In this approach, it is crucial to distinguish the state space from the function space; it will be clarified that the generalized Langevin equation in the state space is derived only in deterministic cases. The discussion of stochastic systems clarifies the effect of delay embedding, which improves estimation accuracy over the use of higher-order basis functions. In addition, numerical experiments for the noisy van der Pol system and the noisy Lorenz system show a power-law dependence of the accuracy on the noise amplitude in the partial-observation settings.

This article is organized as follows. In Sec. 2, we review the Koopman operator approach. Section 3 explains the previous discussions of the Mori-Zwanzig formalism within the Koopman operator approach, where we emphasize the role of the deterministic feature. Then, Sec. 4 presents discussions of partial observations in stochastic systems and the effects of delay embedding. We also present results from numerical experiments, in which the noisy van der Pol system and the noisy Lorenz system are employed as toy examples.

2 Preliminaries on Koopman operator approach

In the present paper, it is crucial to distinguish state vectors from the observable functions that yield the values of state vectors. We therefore review the Koopman operator approach with an emphasis on this point. For further details, refer to [23].

2.1 Distinction between state space and function space

Consider a state space $\mathcal{M}\subseteq\mathbb{R}^{D}$ and an adequate function space $\mathcal{F}$; an element of $\mathcal{F}$ is called an observable function, $\phi:\mathcal{M}\to\mathbb{R}$. To emphasize the distinction between the state space and the function space, we add underlines to denote state vectors on $\mathcal{M}$, e.g., $\underline{\bm{x}}$. The $d$-th element of $\underline{\bm{x}}$ is denoted as $\underline{x_{d}}\in\mathbb{R}$. By contrast, the observable function that yields the value of the $d$-th element $\underline{x_{d}}$ is denoted as $x_{d}(\underline{\bm{x}})=\underline{x_{d}}$. Note that it is common in many papers on the Koopman operator approach to use abbreviations, e.g., $x_{d}\in\mathcal{F}$, for notational simplicity. While the abbreviations are convenient, we again emphasize the importance of distinguishing between elements of the state space and those of the function space. Hence, we employ the underlined notation for the state space.

2.2 Deterministic systems

Let $\underline{\bm{x}(t)}$ be the state vector of the system at time $t$. The deterministic time-evolution equation for $\underline{\bm{x}(t)}$ is given by

\frac{\rmd}{\rmd t}\underline{\bm{x}(t)}=\bm{a}(\underline{\bm{x}(t)}), \qquad (1)

where $\bm{a}(\underline{\bm{x}})$ is a vector of functions giving the time evolution of each coordinate $\underline{x_{d}(t)}$. While it would be possible to include the time variable $t$ in these functions, we assume that $\bm{a}(\underline{\bm{x}})$ is time-independent for simplicity.

It is beneficial to view the coupled ordinary differential equations in (1) as a dynamical system with discrete time steps. Setting a time interval for observation as $\Delta t_{\mathrm{obs}}$, the state vector evolves in time as follows:

\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})}=\bm{F}_{\Delta t_{\mathrm{obs}}}(\underline{\bm{x}(t)}), \qquad (2)

where $\bm{F}_{\Delta t_{\mathrm{obs}}}:\mathcal{M}\to\mathcal{M}$ yields the state vector after the time evolution over $\Delta t_{\mathrm{obs}}$.

In the Koopman operator approach, we consider the time evolution of observable functions instead of that of the state vector. Consider the observable function $x_{d}:\mathcal{M}\to\mathbb{R}$, which yields the $d$-th element of the state vector. Then, we consider a function that gives the $d$-th element of the state vector after the time evolution over $\Delta t_{\mathrm{obs}}$. As we will see later, it is possible to obtain such a function by employing a map $\mathcal{K}_{\Delta t_{\mathrm{obs}}}:\mathcal{F}\to\mathcal{F}$:

\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}=x_{d}\circ\bm{F}_{\Delta t_{\mathrm{obs}}}, \qquad (3)

which leads to

x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})=\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}(\underline{\bm{x}(t)}). \qquad (4)

Note that the Koopman operator is linear even if the original system is nonlinear.

Figure 1: The role of the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$. We consider an example with $D=2$. The change from $\underline{x_{1}(t)}$ to $\underline{x_{1}(t+\Delta t_{\mathrm{obs}})}$ on the state space corresponds to the change from $x_{1}(\underline{\bm{x}(t)})$ to $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}(\underline{\bm{x}(t)})$ on the function space. Note that the Koopman operator maps the flat plane $x_{1}$ to a curved surface, which corresponds to the map $\mathcal{K}_{\Delta t_{\mathrm{obs}}}:\mathcal{F}\to\mathcal{F}$ on the function space.

Figure 1 shows the correspondence between the state space $\mathcal{M}$ and the function space $\mathcal{F}$. Although the observable function $x_{1}$ is a simple flat plane, the action of the Koopman operator yields the curved surface $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}$. The function $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}:\mathcal{M}\to\mathbb{R}$ gives the coordinate value after the time evolution, i.e., $\underline{x_{1}(t+\Delta t_{\mathrm{obs}})}$.

Note that while it is possible to consider various types of observable functions, we focus only on the simple observables $\{x_{d}\}$ in the present work. The collection of the observable functions for the coordinates is called the full-state observable $\bm{g}:\mathcal{M}\to\mathbb{R}^{D}$, which is a vector-valued observable function:

\bm{g}=[x_{1}\quad x_{2}\quad\dots\quad x_{D}]^{\top}. \qquad (5)

Again, we note that it is crucial to distinguish $\bm{g}$ from $\underline{\bm{x}}=[\underline{x_{1}}\;\underline{x_{2}}\;\dots\;\underline{x_{D}}]^{\top}\in\mathbb{R}^{D}$.

2.3 Extended dynamic mode decomposition

As discussed in [23], we can obtain an approximate representation of the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ by a data-driven approach. That is, introducing a set of basis functions, called a dictionary, leads to a Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ for the time interval $\Delta t_{\mathrm{obs}}$. The data-driven method is called the EDMD algorithm.

The EDMD algorithm requires a data set of snapshot pairs. The snapshot pairs are denoted as $\{(\underline{\bm{x}_{n}},\underline{\bm{y}_{n}})\}_{n=1}^{N_{\mathrm{data}}}$, where

\underline{\bm{x}_{n}}=[\underline{x_{n1}(t)}\quad\underline{x_{n2}(t)}\quad\cdots\quad\underline{x_{nD}(t)}]^{\top} \qquad (6)

and

\underline{\bm{y}_{n}}=\bm{F}_{\Delta t_{\mathrm{obs}}}(\underline{\bm{x}_{n}})=[\underline{x_{n1}(t+\Delta t_{\mathrm{obs}})}\quad\underline{x_{n2}(t+\Delta t_{\mathrm{obs}})}\quad\cdots\quad\underline{x_{nD}(t+\Delta t_{\mathrm{obs}})}]^{\top}. \qquad (7)

$N_{\mathrm{data}}$ represents the number of snapshot pairs. The dictionary $\bm{\psi}$ consists of $N_{\mathrm{dic}}$ basis functions, i.e.,

\bm{\psi}(\underline{\bm{x}})=[\psi_{1}(\underline{\bm{x}})\quad\psi_{2}(\underline{\bm{x}})\quad\cdots\quad\psi_{N_{\mathrm{dic}}}(\underline{\bm{x}})]^{\top}, \qquad (8)

where $\psi_{i}:\mathcal{M}\to\mathbb{R}$ for $i=1,\dots,N_{\mathrm{dic}}$. For example, a dictionary with monomial functions for $D=2$ is given as

\bm{\psi}(\underline{\bm{x}})=[\underline{1}\quad x_{1}(\underline{\bm{x}})\quad x_{2}(\underline{\bm{x}})\quad x_{1}^{2}(\underline{\bm{x}})\quad x_{1}x_{2}(\underline{\bm{x}})\quad x_{2}^{2}(\underline{\bm{x}})\quad x_{1}^{3}(\underline{\bm{x}})\quad\dots\quad x_{2}^{5}(\underline{\bm{x}})]^{\top}, \qquad (9)

where the maximum degree of the monomial functions is $5$, and $\underline{1}$ denotes the constant function with the scalar value $1$. The dictionary spans a subspace $\mathcal{F}_{\bm{\psi}}\subseteq\mathcal{F}$.
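As a concrete illustration, the following Python sketch enumerates such a monomial dictionary for $D=2$ up to degree $5$; the function name and the ordering of the basis functions are our own illustrative choices.

```python
import itertools
import numpy as np

def monomial_dictionary(max_degree, D=2):
    """Exponent tuples and an evaluator for a monomial dictionary as in (9)."""
    # Exponent tuples (n_1, ..., n_D) with total degree <= max_degree,
    # ordered by total degree; (0, ..., 0) gives the constant function 1.
    exponents = [e for deg in range(max_degree + 1)
                 for e in itertools.product(range(deg + 1), repeat=D)
                 if sum(e) == deg]

    def psi(x):
        x = np.asarray(x, dtype=float)
        return np.array([np.prod(x ** np.array(e)) for e in exponents])

    return exponents, psi

exponents, psi = monomial_dictionary(max_degree=5, D=2)
print(len(exponents))        # 21 basis functions for D = 2, degree <= 5
print(psi([0.5, -1.0])[:6])  # values of [1, x2, x1, x2^2, x1*x2, x1^2]
```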

Note that the dictionary with the monomial functions contains the full-state observable $\bm{g}$; see the second and third elements on the right-hand side of (9). However, the action of the Koopman operator yields the function $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}$, which may not lie in the spanned subspace $\mathcal{F}_{\bm{\psi}}$. In this case, we have the following expansion:

\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}(\underline{\bm{x}})\simeq\sum_{i=1}^{N_{\mathrm{dic}}}c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}}_{i}\psi_{i}(\underline{\bm{x}}), \qquad (10)

where $\{c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}}_{i}\}_{i=1}^{N_{\mathrm{dic}}}$ are the expansion coefficients. Similarly, we can consider the action on an arbitrary function $\phi\in\mathcal{F}$, i.e., $\mathcal{K}_{\Delta t_{\mathrm{obs}}}\phi$, and its coefficients $\{c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}\phi}_{i}\}_{i=1}^{N_{\mathrm{dic}}}$. Note that the function $\phi$ before the time evolution may not lie in the spanned subspace $\mathcal{F}_{\bm{\psi}}$ either, and $\phi$ is also approximately written as follows:

\phi(\underline{\bm{x}})\simeq\sum_{i=1}^{N_{\mathrm{dic}}}c^{\phi}_{i}\psi_{i}(\underline{\bm{x}}). \qquad (11)

Since a linear combination of the dictionary functions approximately yields an arbitrary function and its time-evolved counterpart, it suffices to consider the action of the Koopman operator on the dictionary. Then, the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ is approximated by a finite-size matrix $K_{\Delta t_{\mathrm{obs}}}$ as follows:

\left[\begin{array}{c}\mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{1}(\underline{\bm{x}})\\ \mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{2}(\underline{\bm{x}})\\ \vdots\\ \mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{N_{\mathrm{dic}}}(\underline{\bm{x}})\end{array}\right]=\mathcal{K}_{\Delta t_{\mathrm{obs}}}\bm{\psi}(\underline{\bm{x}})\simeq K_{\Delta t_{\mathrm{obs}}}\bm{\psi}(\underline{\bm{x}}). \qquad (16)

It is easy to determine the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ from the dataset; we solve the following least squares problem numerically:

K_{\Delta t_{\mathrm{obs}}}=\mathop{\mathrm{arg\,min}}\limits_{\widetilde{K}_{\Delta t_{\mathrm{obs}}}}\sum_{n=1}^{N_{\mathrm{data}}}\left\|\bm{\psi}(\underline{\bm{y}_{n}})-\widetilde{K}_{\Delta t_{\mathrm{obs}}}\bm{\psi}(\underline{\bm{x}_{n}})\right\|^{2}. \qquad (17)

Note that a comparison of (10) with (17) immediately yields the element at the $d$-th row and the $i$-th column of the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ as follows:

\left[K_{\Delta t_{\mathrm{obs}}}\right]_{di}=c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d-1}}_{i}\quad(\textrm{for }2\leq d\leq D+1),
\left[K_{\Delta t_{\mathrm{obs}}}\right]_{di}=c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{d}}_{i}\quad(\textrm{for }d>D+1). \qquad (18)

Since the first dictionary function $\psi_{1}(\underline{\bm{x}})=1(\underline{\bm{x}})$ is time-invariant, the first row of $K_{\Delta t_{\mathrm{obs}}}$ is $[1\;0\;\cdots\;0]$, i.e., $[K_{\Delta t_{\mathrm{obs}}}]_{1i}=\delta_{1i}$.

In short, one can consider the linear action of $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ on the function space instead of the nonlinear time evolution $\bm{F}_{\Delta t_{\mathrm{obs}}}$ on the state space in (2). This linearity naturally leads to the use of a linear combination of basis functions, which facilitates estimation from data via the least squares method. This use of linearity is one of the benefits of the Koopman operator approach.
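As a minimal illustration of the least-squares problem (17), the following Python sketch estimates a Koopman matrix for a toy two-dimensional map. The stand-in map (a crude explicit-Euler step of the noise-free van der Pol flow), the degree-2 dictionary, and all parameter values are our own illustrative choices, not the settings used in the later experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x, dt=0.1, mu=1.0):
    """Cheap stand-in for F_{Delta t_obs}: one explicit-Euler step."""
    x1, x2 = x
    return np.array([x1 + dt * x2,
                     x2 + dt * (mu * (1.0 - x1**2) * x2 - x1)])

def psi(x):
    """Monomial dictionary up to degree 2 for D = 2."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

# Snapshot pairs {(x_n, y_n)} with y_n = F(x_n).
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
Y = np.array([F(x) for x in X])

Psi_x = np.array([psi(x) for x in X])   # N_data x N_dic
Psi_y = np.array([psi(y) for y in Y])

# Least squares (17): K minimizes sum_n || psi(y_n) - K psi(x_n) ||^2.
C, *_ = np.linalg.lstsq(Psi_x, Psi_y, rcond=None)
K = C.T                                 # N_dic x N_dic Koopman matrix

x0 = np.array([0.3, -0.2])
print(K @ psi(x0))   # approximately psi(F(x0))
print(psi(F(x0)))
```

The first row of the estimated matrix is close to $[1\;0\;\cdots\;0]$, in agreement with the remark above on the constant dictionary function.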

2.4 Stochastic systems

As noted in Sec. 1, the Koopman operator approach is also applicable to stochastic systems; for example, see [23, 32, 33]. In stochastic cases, the action of the Koopman operator yields the expected value of a statistic of the state variable rather than the state variable itself. For example,

\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}(\underline{\bm{x}_{\mathrm{ini}}})=\mathbb{E}[x_{1}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})\,|\,\underline{\bm{x}(t)}=\underline{\bm{x}_{\mathrm{ini}}}]. \qquad (19)

In the following, we briefly explain why the Koopman operator approach yields the expectation values from the viewpoint of the Fokker-Planck and the backward Kolmogorov equations.

Consider a $D$-dimensional vector of stochastic variables, $\underline{\bm{X}}\in\mathcal{M}$, and assume that the vector $\underline{\bm{X}(t)}$ at time $t$ obeys the following stochastic differential equation:

\rmd\underline{\bm{X}(t)}=\bm{a}(\underline{\bm{X}(t)})\,\rmd t+B(\underline{\bm{X}(t)})\,\rmd\underline{\bm{W}(t)}, \qquad (20)

where $\bm{a}(\underline{\bm{X}})$ is a vector of drift coefficient functions, $B(\underline{\bm{X}})$ is a matrix of diffusion coefficient functions, and $\underline{\bm{W}(t)}$ is a vector of Wiener processes $\{\underline{W_{i}(t)}\}$. The Wiener processes satisfy

\mathbb{E}[\underline{W_{i}(t)}]=0,\qquad\mathbb{E}[(\underline{W_{i}(t)}-\underline{W_{i}(s)})(\underline{W_{j}(t)}-\underline{W_{j}(s)})]=(t-s)\delta_{ij}, \qquad (21)

for $0\leq s\leq t$. Let $\underline{\bm{X}(t)}=\underline{\bm{x}_{\mathrm{ini}}}$ be the initial condition for the stochastic differential equation. The stochastic differential equation in (20) has a corresponding Fokker-Planck equation [34], which describes the time evolution of the probability density function $p(\underline{\bm{x}},t)$:

\frac{\partial}{\partial t}p(\underline{\bm{x}},t)=\mathcal{L}^{\dagger}\,p(\underline{\bm{x}},t), \qquad (22)

where

\mathcal{L}^{\dagger}=-\sum_{d}\frac{\partial}{\partial\underline{x_{d}}}a_{d}(\underline{\bm{x}})+\frac{1}{2}\sum_{i,j}\frac{\partial^{2}}{\partial\underline{x_{i}}\partial\underline{x_{j}}}\left[B(\underline{\bm{x}})B(\underline{\bm{x}})^{\top}\right]_{ij} \qquad (23)

is the time-evolution operator for the Fokker-Planck equation. The initial condition at time $t$ is

p(\underline{\bm{x}},t)=\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}}), \qquad (24)

where $\delta(\cdot)$ is the Dirac delta function.

Note that the time-evolved probability density function, $p(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})$, is formally written as

p(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})=\rme^{\mathcal{L}^{\dagger}\Delta t_{\mathrm{obs}}}p(\underline{\bm{x}},t). \qquad (25)

Then, the expectation of the observable function $x_{d}$ is given as follows:

\mathbb{E}[x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})\,|\,\underline{\bm{x}(t)}=\underline{\bm{x}_{\mathrm{ini}}}]
=\int_{\mathcal{M}}x_{d}(\underline{\bm{x}})\,p(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})\,\rmd\underline{\bm{x}}
=\int_{\mathcal{M}}x_{d}(\underline{\bm{x}})\left(\rme^{\mathcal{L}^{\dagger}\Delta t_{\mathrm{obs}}}\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\right)\rmd\underline{\bm{x}}
=\int_{\mathcal{M}}\left(\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}x_{d}(\underline{\bm{x}})\right)\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\,\rmd\underline{\bm{x}}
=\int_{\mathcal{M}}\varphi_{d}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})\,\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\,\rmd\underline{\bm{x}}
=\varphi_{d}(\underline{\bm{x}_{\mathrm{ini}}},t+\Delta t_{\mathrm{obs}}), \qquad (26)

where

\mathcal{L}=\sum_{d}a_{d}(\bm{x})\frac{\partial}{\partial x_{d}}+\frac{1}{2}\sum_{i,j}\left[B(\bm{x})B(\bm{x})^{\top}\right]_{ij}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}, \qquad (27)

and $\varphi_{d}(\underline{\bm{x}},t)$ is the solution of the following partial differential equation, the so-called backward Kolmogorov equation:

\frac{\partial}{\partial t}\varphi_{d}(\underline{\bm{x}},t)=\mathcal{L}\varphi_{d}(\underline{\bm{x}},t), \qquad (28)

which leads to

\varphi_{d}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})=\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}\varphi_{d}(\underline{\bm{x}},t). \qquad (29)

Note that $\varphi_{d}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})\in\mathcal{F}$, and the initial condition is

\varphi_{d}(\underline{\bm{x}},t)=x_{d}(\underline{\bm{x}}). \qquad (30)

Here, $\mathcal{L}^{\dagger}$ in (23) yields the time evolution of the probability density function on the state space $\mathcal{M}$. By contrast, $\mathcal{L}$ in (27) corresponds to the time evolution of functions in the function space $\mathcal{F}$. In addition, (29) indicates that

\mathcal{K}_{\Delta t_{\mathrm{obs}}}=\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}, \qquad (31)

and it follows immediately that the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ yields the expectation after the time evolution.

Note that the above discussion also applies to deterministic cases. When there is no diffusion, i.e., $B(\underline{\bm{x}})=0$, the time-evolution operator $\mathcal{L}^{\dagger}$ contains only first-order derivatives. While the second-order derivatives broaden the probability density function, the first-order derivatives correspond to a shift of the coordinates. Hence, in the zero-diffusion case an initial Dirac delta density remains a Dirac delta function: $\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\to\delta(\underline{\bm{x}}-\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})$. The expectation then yields the value of $\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})}$, and the previous discussion of deterministic cases is recovered.
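The statement (19) can be checked directly by Monte Carlo simulation. The following sketch uses the one-dimensional Ornstein-Uhlenbeck process $\rmd X=-X\,\rmd t+\sigma\,\rmd W$, for which the conditional mean is known exactly, $\mathbb{E}[X(t+\Delta t_{\mathrm{obs}})\,|\,X(t)=x_{\mathrm{ini}}]=x_{\mathrm{ini}}\rme^{-\Delta t_{\mathrm{obs}}}$; the process and all parameter values are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma, dt_obs, dt = 0.5, 0.1, 1e-3
x_ini, n_samples = 1.2, 200_000
n_steps = int(dt_obs / dt)

# Euler-Maruyama for dX = -X dt + sigma dW, all samples in parallel.
x = np.full(n_samples, x_ini)
for _ in range(n_steps):
    x += -x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_samples)

# (19): the Koopman operator applied to x_1, evaluated at x_ini,
# is the conditional expectation after the time evolution.
print(x.mean())                 # Monte Carlo estimate
print(x_ini * np.exp(-dt_obs))  # exact conditional mean
```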

3 Revisiting partial observations in deterministic systems

3.1 Use of time-dependent basis

Here, we follow [18] and revisit the effects of partial observations in deterministic systems. Note that the following discussion differs slightly from that in [18]. As discussed below, the absence of the diffusion coefficient $B(\bm{x})$ is crucial for connecting the Koopman operator approach with the conventional treatment of partial observations, i.e., the Mori-Zwanzig formalism.

Assume that only some of the coordinates are observable; we denote the index set by $\mathcal{O}$ and the partial observables by $\bm{g}_{\mathcal{O}}$, whose elements are $\{x_{d}\,|\,d\in\mathcal{O}\}$; the number of partial observables is $N_{\mathrm{obs}}$. For example, when we observe $\underline{x_{1}}$ and $\underline{x_{3}}$ in a $D=3$ case, $\mathcal{O}=\{1,3\}$, $\bm{g}_{\mathcal{O}}=[x_{1}\;x_{3}]^{\top}$, and $N_{\mathrm{obs}}=2$. We also introduce a dictionary $\bm{\psi}_{\overline{\mathcal{O}}}$ that includes many basis functions $\psi_{i}\in\mathcal{F}$ related to the unobserved variables; we denote the number of dictionary functions by $N_{\mathrm{dic}}^{\overline{\mathcal{O}}}$. The discussion on the function space and the introduction of basis functions, i.e., the dictionary, lead to a linear-algebraic approach for nonlinear systems; this linearity is characteristic of the Koopman operator approach.

We discuss the case where the basis functions are time-dependent; this corresponds to the discussion in [18]. In this case, an arbitrary function $\phi\in\mathcal{F}$ is expressed approximately as

\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}x_{d}(\underline{\bm{x}},t)+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}\psi_{i}(\underline{\bm{x}},t). \qquad (32)

Then, we derive an explicit representation of the time-evolution operator $\mathcal{L}$ in (27). Using the same basis functions as above, we approximate the operator $\mathcal{L}$ by matrices; it is enough to consider its action on $\{x_{d}\}$ and $\{\psi_{i}\}$, and we have

\mathcal{L}x_{d}(\underline{\bm{x}},t)\simeq\sum_{d^{\prime}\in\mathcal{O}}L^{\mathcal{O}\mathcal{O}}_{dd^{\prime}}x_{d^{\prime}}(\underline{\bm{x}},t)+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}L^{\mathcal{O}\overline{\mathcal{O}}}_{di}\psi_{i}(\underline{\bm{x}},t), \qquad (33)

and

\mathcal{L}\psi_{i}(\underline{\bm{x}},t)\simeq\sum_{d\in\mathcal{O}}L^{\overline{\mathcal{O}}\mathcal{O}}_{id}x_{d}(\underline{\bm{x}},t)+\sum_{i^{\prime}=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}_{ii^{\prime}}\psi_{i^{\prime}}(\underline{\bm{x}},t). \qquad (34)

Then, using the matrices $L^{\mathcal{O}\mathcal{O}}\in\mathbb{R}^{N_{\mathrm{obs}}\times N_{\mathrm{obs}}}$, $L^{\mathcal{O}\overline{\mathcal{O}}}\in\mathbb{R}^{N_{\mathrm{obs}}\times N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}$, $L^{\overline{\mathcal{O}}\mathcal{O}}\in\mathbb{R}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}\times N_{\mathrm{obs}}}$, and $L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\in\mathbb{R}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}\times N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}$, the time-evolution equation is given as

\frac{\rmd}{\rmd t}\left[\begin{array}{c}\bm{g}_{\mathcal{O}}(t)\\ \bm{\psi}_{\overline{\mathcal{O}}}(t)\end{array}\right]=\left[\begin{array}{cc}L^{\mathcal{O}\mathcal{O}}&L^{\mathcal{O}\overline{\mathcal{O}}}\\ L^{\overline{\mathcal{O}}\mathcal{O}}&L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{g}_{\mathcal{O}}(t)\\ \bm{\psi}_{\overline{\mathcal{O}}}(t)\end{array}\right]. \qquad (41)

As shown in [18], it is possible to derive a closed time-evolution equation for $\bm{g}_{\mathcal{O}}(t)$. First, we solve (41) for $\bm{\psi}_{\overline{\mathcal{O}}}(t)$ implicitly:

\bm{\psi}_{\overline{\mathcal{O}}}(t)=\int_{0}^{t}\rme^{(t-s)L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\mathcal{O}}\bm{g}_{\mathcal{O}}(s)\,\rmd s+\rme^{tL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{\psi}_{\overline{\mathcal{O}}}(0). \qquad (42)

Then, inserting (42) into (41), we have

\frac{\rmd}{\rmd t}\bm{g}_{\mathcal{O}}(t)=L^{\mathcal{O}\mathcal{O}}\bm{g}_{\mathcal{O}}(t)+L^{\mathcal{O}\overline{\mathcal{O}}}\int_{0}^{t}\rme^{(t-s)L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\mathcal{O}}\bm{g}_{\mathcal{O}}(s)\,\rmd s+L^{\mathcal{O}\overline{\mathcal{O}}}\rme^{tL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{\psi}_{\overline{\mathcal{O}}}(0). \qquad (43)

Replacing $L^{\mathcal{O}\mathcal{O}}$ with $M$ and $-L^{\mathcal{O}\overline{\mathcal{O}}}\rme^{sL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\mathcal{O}}$ with $K_{\mathrm{mem}}(s)$, we have

\frac{\rmd}{\rmd t}\bm{g}_{\mathcal{O}}(t)=M\bm{g}_{\mathcal{O}}(t)-\int_{0}^{t}K_{\mathrm{mem}}(t-s)\bm{g}_{\mathcal{O}}(s)\,\rmd s+\bm{f}(t), \qquad (44)

where $\bm{f}(t)=L^{\mathcal{O}\overline{\mathcal{O}}}\rme^{tL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{\psi}_{\overline{\mathcal{O}}}(0)$. $M$ is called the Markov transition matrix, which quantifies the interactions among the observables $\bm{g}_{\mathcal{O}}(t)$. $K_{\mathrm{mem}}(s)$ is called the memory kernel and depends on the history of the observed quantities; (43) indicates that the memory effect stems from the path through the unobserved quantities. $\bm{f}(t)$ is referred to as noise since it depends on the initial values of the unobserved quantities $\bm{\psi}_{\overline{\mathcal{O}}}(0)$; there is no information about their explicit values, and hence we cannot predict them.
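For a finite-dimensional linear system, the three terms of (44) can be computed explicitly with matrix exponentials. The sketch below partitions a matrix $L$ into the four blocks of (41) and evaluates $M$, $K_{\mathrm{mem}}(s)$, and $\bm{f}(t)$; the block sizes, the random stable matrix, and the initial unobserved values are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

n_obs, n_unobs = 2, 6
n = n_obs + n_unobs
L = rng.standard_normal((n, n)) - 2.0 * np.eye(n)  # shifted to be stable

L_oo = L[:n_obs, :n_obs]   # L^{OO}
L_ou = L[:n_obs, n_obs:]   # L^{O Obar}
L_uo = L[n_obs:, :n_obs]   # L^{Obar O}
L_uu = L[n_obs:, n_obs:]   # L^{Obar Obar}

M = L_oo                                          # Markov transition matrix
K_mem = lambda s: -L_ou @ expm(s * L_uu) @ L_uo   # memory kernel in (44)
f = lambda t, psi0: L_ou @ expm(t * L_uu) @ psi0  # "noise" term in (44)

psi0 = rng.standard_normal(n_unobs)  # unknown initial unobserved values
print(M.shape, K_mem(0.3).shape, f(0.3, psi0).shape)  # (2, 2) (2, 2) (2,)
```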

3.2 Feature of deterministic systems

Note that (44) is an equation in the function space $\mathcal{F}$. It is crucial to distinguish the function space $\mathcal{F}$ from the state space $\mathcal{M}$; an element $x_{d}$ of $\bm{g}_{\mathcal{O}}(t)$ differs, in general, from the element $\underline{x_{d}}$ of the state vector $\underline{\bm{x}}$. However, deterministic systems have a unique feature: one can interpret (44) as an equation for the state vector.

First, we focus on the time-evolution equation in (26); the final expression $\varphi_{d}(\underline{\bm{x}_{\mathrm{ini}}},t+\Delta t_{\mathrm{obs}})$ yields the expectation value. Although the expectation value $\mathbb{E}[x_{d}]$ is not the state $\underline{x_{d}}$ in general, in the absence of the noise term an initial Dirac-delta-type density function remains unchanged in shape under the time evolution. That is,

\rme^{\mathcal{L}^{\dagger}\Delta t_{\mathrm{obs}}}\delta(\underline{\bm{x}}-\underline{\bm{x}(t)})=\delta(\underline{\bm{x}}-\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})}). \qquad (45)

Then, the expectation value is equal to the corresponding state:

\mathbb{E}[x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})\,|\,\underline{\bm{x}(t)}=\underline{\bm{x}_{\mathrm{ini}}}]=\underline{x_{d}(t+\Delta t_{\mathrm{obs}})}. \qquad (46)

This fact enables us to equate the observable function $x_{d}$ with the state $\underline{x_{d}}$.

Second, we introduce a different perspective on the time-evolution equation. In deterministic systems, $B(\bm{x})=0$, i.e., there is no noise. Hence, the second term on the right-hand side of (27) is absent, and the following operator yields the time evolution in the function space:

\mathcal{L}=\sum_{d}a_{d}(\bm{x})\frac{\partial}{\partial x_{d}}. \qquad (47)

When we consider the time evolution of the observable function $x_{d}$, the action of $\mathcal{L}$ leads to

\mathcal{L}x_{d}=a_{d}(\bm{x}), \qquad (48)

and then, we have

\frac{\rmd}{\rmd t}x_{d}=a_{d}(\bm{x}). \qquad (49)

Note that (49) has the same form as the original time-evolution equation in the state space:

\frac{\rmd}{\rmd t}\underline{x_{d}}=a_{d}(\underline{\bm{x}}). \qquad (50)

This fact also allows us to equate the time evolution of $x_{d}$ in the function space $\mathcal{F}$ with that of $\underline{x_{d}}$ in the state space $\mathcal{M}$.

In short, although we should distinguish the function space from the state space, this feature of deterministic systems enables us to regard the derived equation (44) in the function space as one in the state space. This feature justifies the coarse-grained approach based on the Mori-Zwanzig formalism; we can construct time-evolution equations for a smaller set of measurable macroscopic variables, and the other, unmeasured degrees of freedom play the role of noise.

4 Partial observations in stochastic systems

4.1 Discussion with time-independent basis functions

As discussed in Sec. 3, the derived equation (44) lives in the function space. Note that the function at time $t+\Delta t_{\mathrm{obs}}$ yields the expectation values in (26). Due to the presence of the noise term, even if the initial density function is a Dirac delta function, the values of $\mathbb{E}[x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})]$ and $\underline{x_{d}(t+\Delta t_{\mathrm{obs}})}$ differ because the density function spreads out. Hence, the generalized Langevin equation (44) is not directly related to the state vector $\underline{\bm{x}}$.

Here, we discuss the generalized Langevin equation in the function space. Although the time-dependent basis of Sec. 3 is beneficial due to its connection with the state vector, it is preferable to fix the basis functions in the discussion below. Hence, we employ the following basis expansion:

\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}(t)x_{d}(\underline{\bm{x}})+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}(t)\psi_{i}(\underline{\bm{x}}), \qquad (51)

where the expansion coefficients $\{c^{\phi,\mathcal{O}}_{d}(t)\}$ and $\{c^{\phi,\overline{\mathcal{O}}}_{i}(t)\}$ are time-dependent, which differs from (32).

Then, we derive the time-evolution equations for the expansion coefficients $\{c^{\phi,\mathcal{O}}_{d}(t)\}$ and $\{c^{\phi,\overline{\mathcal{O}}}_{i}(t)\}$. In the derivation, we employ bra-ket notation for notational simplicity. Note that the notation is essentially the same as in the Doi-Peliti method [35, 36, 37]; see the discussions in [38, 39, 40].

In the function space, we introduce the following ket states $\{|x_{d}\rangle\}$ and $\{|\psi_{i}\rangle\}$:

|x_{d}\rangle=x_{d}\qquad\textrm{for }d=1,\dots,D, \qquad (52)
|\psi_{i}\rangle=\psi_{i}\qquad\textrm{for }i=1,\dots,N_{\mathrm{dic}}^{\overline{\mathcal{O}}}, \qquad (53)

where we omit the argument $\underline{\bm{x}}$ of the functions. By contrast, the bra states $\{\langle x_{d}|\}$ and $\{\langle\psi_{i}|\}$ are defined as follows:

\langle x_{d}|=\int\prod_{d^{\prime}=1}^{D}\rmd x_{d^{\prime}}\,\delta(x_{d^{\prime}})\left(\frac{\partial}{\partial x_{d}}\right)(\cdot), \qquad (54)
\langle\psi_{i}|=\int\rmd\bm{x}\,\rho(\bm{x})\psi_{i}(\bm{x})(\cdot), \qquad (55)

where $\rho(\bm{x})$ is a suitable measure yielding $L^{2}$ functions. From (52) and (54), we have the following orthonormality relation:

\langle x_{d}|x_{d^{\prime}}\rangle=\int\rmd x_{d}\,\delta(x_{d})\left(\frac{\partial}{\partial x_{d}}\right)x_{d^{\prime}}=\delta_{d,d^{\prime}}, \qquad (56)

where $\delta_{d,d^{\prime}}$ is the Kronecker delta.

It is easier to explain with a concrete example; hence, we assume here that there are only two observable variables, $\mathcal{O}=\{1,2\}$. Using the above bra-ket notation, (51) is rewritten as

\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}(t)|x_{d}\rangle+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}(t)|\psi_{i}\rangle. \qquad (57)

The time-derivative of (57) leads to

\frac{\rmd}{\rmd t}\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}\frac{\rmd}{\rmd t}\left(c^{\phi,\mathcal{O}}_{d}(t)\right)|x_{d}\rangle+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\frac{\rmd}{\rmd t}\left(c^{\phi,\overline{\mathcal{O}}}_{i}(t)\right)|\psi_{i}\rangle. \qquad (58)

Next, we act with the bra states on all terms from the left. Similar to the discussion in Sec. 3.1, we have the following time-evolution equation:

Z\frac{\rmd}{\rmd t}\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right]=\left[\begin{array}{cc}\widetilde{L}^{\mathcal{O}\mathcal{O}}&\widetilde{L}^{\mathcal{O}\overline{\mathcal{O}}}\\ \widetilde{L}^{\overline{\mathcal{O}}\mathcal{O}}&\widetilde{L}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right], \qquad (65)

where we define

Z=\left[\begin{array}{cccccc}1&0&\langle x_{1}|\psi_{1}\rangle&\langle x_{1}|\psi_{2}\rangle&\cdots&\langle x_{1}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\\ 0&1&\langle x_{2}|\psi_{1}\rangle&\langle x_{2}|\psi_{2}\rangle&\cdots&\langle x_{2}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\\ \langle\psi_{1}|x_{1}\rangle&\langle\psi_{1}|x_{2}\rangle&\langle\psi_{1}|\psi_{1}\rangle&\langle\psi_{1}|\psi_{2}\rangle&\cdots&\langle\psi_{1}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ \langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|x_{1}\rangle&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|x_{2}\rangle&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|\psi_{1}\rangle&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|\psi_{2}\rangle&\cdots&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\end{array}\right] \qquad (71)

and

\langle x_{d}|\mathcal{L}|x_{d^{\prime}}\rangle=\widetilde{L}^{\mathcal{O}\mathcal{O}}_{dd^{\prime}},\quad\langle x_{d}|\mathcal{L}|\psi_{i}\rangle=\widetilde{L}^{\mathcal{O}\overline{\mathcal{O}}}_{di},\quad\langle\psi_{i}|\mathcal{L}|x_{d^{\prime}}\rangle=\widetilde{L}^{\overline{\mathcal{O}}\mathcal{O}}_{id^{\prime}},\quad\langle\psi_{i}|\mathcal{L}|\psi_{i^{\prime}}\rangle=\widetilde{L}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}_{ii^{\prime}}. \qquad (72)

Although the matrix $Z$ is not diagonal in general, the following assumption on the dictionary functions $\{|\psi_{i}\rangle\}$ simplifies the discussion: the functions $\{|\psi_{i}\rangle\}$ consist of monomial basis functions. Then, an element of $\{|\psi_{i}\rangle\}$ has the following form:

|\bm{n}\rangle\equiv\prod_{d=1}^{D}x_{d}^{n_{d}}, \qquad (73)

where $\bm{n}=\{n_{1},n_{2},\dots,n_{D}\}$. The corresponding bra state is defined as

\langle\bm{n}|\equiv\int\prod_{d=1}^{D}\rmd x_{d}\,\delta(x_{d})\left(\frac{\partial}{\partial x_{d}}\right)^{n_{d}}(\cdot). \qquad (74)

Then, we have the following orthogonality relation:

\langle\bm{n}|\bm{m}\rangle=\int\prod_{d=1}^{D}\rmd x_{d}\,\delta(x_{d})\left(\frac{\partial}{\partial x_{d}}\right)^{n_{d}}x_{d}^{m_{d}}=\prod_{d=1}^{D}n_{d}!\,\delta_{n_{d},m_{d}}. \qquad (75)
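As a quick sanity check of (75), note that in one dimension $\langle n|m\rangle$ reduces to the $n$-th derivative of $x^{m}$ evaluated at $x=0$, which equals $n!\,\delta_{n,m}$; the multi-dimensional case factorizes over the coordinates. A minimal symbolic verification:

```python
import sympy as sp

x = sp.symbols('x')
for n in range(4):
    for m in range(4):
        # <n|m> in 1D: integrate delta(x) * d^n/dx^n x^m, i.e., the
        # n-th derivative of x^m evaluated at x = 0.
        bracket = sp.diff(x**m, x, n).subs(x, 0)
        expected = sp.factorial(n) if n == m else 0
        assert bracket == expected
print("<n|m> = n! delta_{n,m} verified for n, m = 0, ..., 3")
```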

Note that the monomial functions for the observable states,

|\{1,0,\dots,0\}\rangle=|x_{1}\rangle\quad\textrm{and}\quad|\{0,1,\dots,0\}\rangle=|x_{2}\rangle, \qquad (76)

should not be included in $\{|\psi_{i}\rangle\}$. Under the above assumption on the dictionary functions, the matrix $Z$ in (71) becomes

Z=\left[\begin{array}{cccccc}1&0&0&0&\cdots&0\\ 0&1&0&0&\cdots&0\\ 0&0&z_{1}&0&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&0&\cdots&z_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\end{array}\right], \qquad (82)

where we denote the diagonal entries by $\{z_{i}\,|\,i=1,\dots,N_{\mathrm{dic}}^{\overline{\mathcal{O}}}\}$. Since the matrix $Z$ is diagonal, $Z^{-1}$ is also diagonal, which leads to

\frac{\rmd}{\rmd t}\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right]=Z^{-1}\left[\begin{array}{cc}\widetilde{L}^{\mathcal{O}\mathcal{O}}&\widetilde{L}^{\mathcal{O}\overline{\mathcal{O}}}\\ \widetilde{L}^{\overline{\mathcal{O}}\mathcal{O}}&\widetilde{L}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right] \qquad (89)
=\left[\begin{array}{cc}\widetilde{L}^{\mathcal{O}\mathcal{O}}&\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\\ \widetilde{L^{\prime}}^{\overline{\mathcal{O}}\mathcal{O}}&\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right]. \qquad (94)

Note that multiplication by the diagonal matrix $Z^{-1}$ does not change the relationship between observed and unobserved functions. Since (94) has the same form as (41), we finally obtain the following equation, similar to (44):

\frac{\rmd}{\rmd t}\bm{c}^{\phi,\mathcal{O}}(t)=\widetilde{L}^{\mathcal{O}\mathcal{O}}\bm{c}^{\phi,\mathcal{O}}(t)+\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\int_{0}^{t}\rme^{(t-s)\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\mathcal{O}}\bm{c}^{\phi,\mathcal{O}}(s)\,\rmd s+\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\rme^{t\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{c}^{\phi,\overline{\mathcal{O}}}(0). \qquad (95)

Although the form of (95) is similar to that of (44), there are differences. First, (95) is a vector equation for the coefficients of a single function $\phi$. Second, the initial values of the coefficients $\bm{c}^{\phi,\mathcal{O}}(0)$ are one-hot because we focus only on the observable functions $\{x_{d}\,|\,d\in\mathcal{O}\}$. For example, assume that we focus on $x_{1}$. Then we have the following equation for $\phi=x_{1}$ at the initial time $t=0$:

x_{1}=\phi(\underline{\bm{x}},0)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}(0)|x_{d}\rangle+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}(0)|\psi_{i}\rangle, \qquad (96)

which indicates that only the first element, $c^{\phi,\mathcal{O}}_{1}(0)$, is $1$, and the other coefficients are zero because the left-hand side is the function $x_{1}$. Then, in contrast to the case of time-dependent basis functions, the third term on the right-hand side of (95), i.e., the noise term, vanishes:

\frac{\rmd}{\rmd t}\bm{c}^{\phi,\mathcal{O}}(t)=\widetilde{M}\bm{c}^{\phi,\mathcal{O}}(t)-\int_{0}^{t}\widetilde{K}_{\mathrm{mem}}(t-s)\bm{c}^{\phi,\mathcal{O}}(s)\,\rmd s, \qquad (97)

where $\widetilde{M}\equiv\widetilde{L}^{\mathcal{O}\mathcal{O}}$ and $\widetilde{K}_{\mathrm{mem}}(s)\equiv-\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\rme^{s\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\mathcal{O}}$. Note that, in contrast to (44), it is not immediately clear that the use of time-delayed states is beneficial; the second term on the right-hand side of (97) involves the expansion coefficients, which do not directly correspond to delayed states.

4.2 Effectiveness of delay embedding

As mentioned above, it is crucial to distinguish the state space from the function space. The discussions in Secs. 3.1 and 4.1 are based on the Koopman operator approach, which focuses on the function space. The discussion in Sec. 3.2 focuses on the feature of deterministic systems that connects the state space with the function space.

It is well known that delay embedding is effective in deterministic systems [24, 25, 33, 41, 42]. The effectiveness of delay embedding can be understood from the viewpoint of the Takens embedding theorem [43]; see also the discussions in [18]. However, in stochastic systems, the discussion based on the Takens theorem does not apply straightforwardly because we must consider the function space. Here, we discuss the effectiveness of delay embedding in stochastic systems.

As discussed in Sec. 2, the introduction of the dictionary enables us to deal with nonlinearity within the Koopman operator approach. However, for monomial basis functions, the use of higher-order basis functions suffers from the so-called curse of dimensionality, which exponentially increases the size of the dictionary. In the discussion in Sec. 4.1, the basis functions affect the second term on the right-hand sides of (95) and (97). The approximation by a finite dictionary can therefore significantly reduce the accuracy.

Here, note that the Koopman operator approach leads to

x_{d}(t)=\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}(t-\Delta t_{\mathrm{obs}})=\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}x_{d}(t-\Delta t_{\mathrm{obs}}) \qquad (98)

for an observable function $x_{d}$. Then, we rewrite (98) formally as follows:

x_{d}(t-\Delta t_{\mathrm{obs}})=\rme^{-\mathcal{L}\Delta t_{\mathrm{obs}}}x_{d}(t)=\sum_{k=0}^{\infty}\frac{(-\Delta t_{\mathrm{obs}})^{k}}{k!}\left(\sum_{d^{\prime}=1}^{D}a_{d^{\prime}}(\bm{x})\frac{\partial}{\partial x_{d^{\prime}}}+\frac{1}{2}\sum_{i,j}\left[B(\bm{x})B(\bm{x})^{\top}\right]_{ij}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}\right)^{k}x_{d}(t). \qquad (99)

Since $a_{d}(\bm{x})$ and $B(\bm{x})$ contain various functions, the exponential in (99) yields various basis functions that are not included in a finite dictionary. The delayed observables are therefore suitable for representing the subspace spanned by the time-evolution operator, and we can expect the delay embedding to work well. In the next section, we demonstrate the effectiveness of the delay embedding; empirical insights into the characteristics of the noise effects are also discussed.
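The growth of the basis in (99) can be made explicit symbolically. The following sketch applies the generator $\mathcal{L}$ of the noisy van der Pol system (introduced in Sec. 5.1 below) repeatedly to the observable $x_{1}$ and reports the polynomial degree after each application; the parameter values are placeholders.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
mu, sigma = 1, sp.Rational(1, 2)

a = [x2, mu * (1 - x1**2) * x2 - x1]   # van der Pol drift

def L(f):
    """Generator (27) with additive noise B = sigma * I."""
    drift = a[0] * sp.diff(f, x1) + a[1] * sp.diff(f, x2)
    diffusion = sp.Rational(1, 2) * sigma**2 * (sp.diff(f, x1, 2)
                                                + sp.diff(f, x2, 2))
    return sp.expand(drift + diffusion)

f = x1
for k in range(1, 5):
    f = L(f)
    # The cubic drift raises the total degree by up to two per application,
    # so e^{-L dt} x_1 involves monomials of ever higher degree.
    print(k, sp.Poly(f, x1, x2).total_degree())   # 1, 3, 5, 7
```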

5 Numerical experiments on the noise effects

5.1 Two examples

To investigate how stochasticity affects partial observation, we perform numerical experiments. In [18], a data-driven approach enables us to obtain the Markov transition matrix and the memory kernel in (44) for deterministic cases, and the relationship between the memory kernel and the delay embedding is described. In stochastic cases, as discussed above, it is not straightforward to connect the memory kernel in (97) with the delay embedding. Hence, we investigate the effects of the delay embedding with the data-driven approach based on the EDMD algorithm of Sec. 2.3.

For deterministic systems, it is common that there are many variables and we observe only a few of them. For the stochastic systems considered here, we assume that the noise term includes most of the unobserved effects. As discussed below, it is necessary to consider higher-order dictionary functions to evaluate higher-order statistics in stochastic systems, which leads to high computational costs. Hence, as demonstrations, we examine the following two examples and the effects of partial observation.

The first one is the van der Pol system [44] with additive noise:

\rmd\underline{X_{1}}=\underline{X_{2}}\,\rmd t+\sigma_{1}\,\rmd\underline{W_{1}(t)}, \qquad (100)
\rmd\underline{X_{2}}=\left(\mu\left(1-\underline{X_{1}^{2}}\right)\underline{X_{2}}-\underline{X_{1}}\right)\rmd t+\sigma_{2}\,\rmd\underline{W_{2}(t)}. \qquad (101)

The second one is the Lorenz system [45] with additive noise:

\rmd\underline{X_{1}}=\nu\left(\underline{X_{2}}-\underline{X_{1}}\right)\rmd t+\sigma_{1}\,\rmd\underline{W_{1}(t)}, \qquad (102)
\rmd\underline{X_{2}}=\left(\underline{X_{1}}\left(\rho-\underline{X_{3}}\right)-\underline{X_{2}}\right)\rmd t+\sigma_{2}\,\rmd\underline{W_{2}(t)}, \qquad (103)
\rmd\underline{X_{3}}=\left(\underline{X_{1}}\,\underline{X_{2}}-\beta\underline{X_{3}}\right)\rmd t+\sigma_{3}\,\rmd\underline{W_{3}(t)}. \qquad (104)

In all the numerical experiments, we assume that the noise amplitude for each variable is common: $\sigma\equiv\sigma_{1}=\sigma_{2}$ for the van der Pol system and $\sigma\equiv\sigma_{1}=\sigma_{2}=\sigma_{3}$ for the Lorenz system. The system parameters are set to $\mu=1$ for the van der Pol system, and $\nu=10$, $\beta=8/3$, and $\rho=28$ or $13$ for the Lorenz system; we tried two values of the parameter $\rho$, as discussed below.
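A sketch of the trajectory generation used below: the Euler-Maruyama scheme for (100)-(101) with the inner step $\Delta t=10^{-3}$ and observations every $\Delta t_{\mathrm{obs}}=0.1$. Details such as the initial condition and the trajectory length here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

mu, sigma = 1.0, 0.5
dt, dt_obs, n_obs = 1e-3, 0.1, 200
stride = int(dt_obs / dt)

x = np.array([0.5, 0.0])  # illustrative initial condition
obs = [x.copy()]
for _ in range(n_obs):
    for _ in range(stride):  # Euler-Maruyama steps for (100)-(101)
        dW = np.sqrt(dt) * rng.standard_normal(2)
        drift = np.array([x[1], mu * (1.0 - x[0]**2) * x[1] - x[0]])
        x = x + drift * dt + sigma * dW
    obs.append(x.copy())  # keep only the coarser observation grid
obs = np.array(obs)

# Snapshot pairs for the EDMD algorithm of Sec. 2.3.
X, Y = obs[:-1], obs[1:]
print(X.shape, Y.shape)  # (200, 2) (200, 2)
```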

Figure 2: Examples of sampled trajectories. (a) The van der Pol system with noise amplitude $\sigma=0.5$. (b) and (c) The Lorenz system with noise amplitude $\sigma=5$; we set $\rho=28$ for (b) and $\rho=13$ for (c). The markers indicate the observed data used in the EDMD algorithm. The intervals of the snapshot pairs are $\Delta t_{\mathrm{obs}}=0.1$ for (a) and $\Delta t_{\mathrm{obs}}=0.01$ for (b) and (c).

Figure 2 shows examples of sampled trajectories, generated with the Euler-Maruyama approximation [46]; $\Delta t=10^{-3}$ for the van der Pol system and $\Delta t=10^{-4}$ for the Lorenz system. Figure 2(a) corresponds to the van der Pol system with noise amplitude $\sigma=0.5$; we see noisy behavior along the limit cycle. Figures 2(b) and 2(c) correspond to the Lorenz system with noise amplitude $\sigma=5$; the difference between them lies in the system parameter $\rho$: $\rho=28$ in Fig. 2(b) and $\rho=13$ in Fig. 2(c). Although the Monte Carlo simulations are performed with the small time interval $\Delta t$, we construct the snapshot pairs from the generated data with larger observation intervals; here, we set $\Delta t_{\mathrm{obs}}=0.1$ for the van der Pol system and $\Delta t_{\mathrm{obs}}=0.01$ for the Lorenz system. The markers in Fig. 2 are the observed data points.

5.2 Evaluation of approximate values of exact solutions

In deterministic systems, it is easy to obtain numerically exact solutions, which are necessary to discuss the accuracy under partial observation; numerical time integration, such as the Runge-Kutta method, yields a good approximation of the exact solutions. By contrast, the noise in a stochastic system prevents such direct numerical time integration. Although the Monte Carlo algorithm based on the Euler-Maruyama approximation is available for time integration, we would have to generate many sample trajectories to estimate the exact statistics. Since it is computationally impractical to obtain high-accuracy solutions with the Monte Carlo method, we directly derive the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ for the stochastic system. The numerical procedure is essentially the same as in [39, 40, 47]; we explain it briefly in the context of the present paper.

For example, consider a three-dimensional case such as the Lorenz system. Then, we define the dictionary of Sec. 2.3 as follows:

\bm{\psi}=[1,\;x_{1},\;x_{2},\;x_{3},\;x_{1}^{2},\;x_{1}x_{2},\;x_{1}x_{3},\;x_{2}^{2},\;x_{2}x_{3},\;x_{3}^{2},\;\dots]^{\top}. \qquad (105)

Note that (94) yields the expansion coefficients for the function $\phi$. Hence, when we can observe all variables, i.e., $\mathcal{O}=\{1,2,3\}$, we have the following expansion:

x_{1}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})=c_{1}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})1(\underline{\bm{x}})+c_{2}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})x_{1}(\underline{\bm{x}})+c_{3}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})x_{2}(\underline{\bm{x}})+c_{4}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})x_{3}(\underline{\bm{x}})+\sum_{i\geq 5}c_{i}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})\psi_{i}(\underline{\bm{x}}), \qquad (106)

where $\{c_{d}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})\}$ and $\{c_{i}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})\}$ are obtained via the time integration of (94) with $\phi=x_{1}$. Next, we perform the time integration of (94) with $\phi=x_{2}$, $\phi=x_{3}$, and so on. Here, note that the action of $K_{\Delta t_{\mathrm{obs}}}$ on $\bm{\psi}(\underline{\bm{x}},t)$ yields $\bm{\psi}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})$; this indicates that the linear combination in (106) is directly related to the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$. That is, the elements of the second row of the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ correspond to the expansion coefficients $\{c_{d}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})\}$ and $\{c_{i}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})\}$; the third and fourth rows are for $x_{2}$ and $x_{3}$, respectively. Repeating this procedure, it is possible to evaluate the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ numerically.

Here, we employ the Crank-Nicolson method to solve (94); the discrete time interval for the numerical time integration is set to $\Delta t=10^{-3}$ for the van der Pol system and $\Delta t=10^{-4}$ for the Lorenz system. As noted above, the observation time interval for the snapshot pairs is $\Delta t_{\mathrm{obs}}=0.1$ for the van der Pol system and $\Delta t_{\mathrm{obs}}=0.01$ for the Lorenz system. The maximum degrees of the dictionary are $8$ for the van der Pol system and $6$ for the Lorenz system.

If there is no noise, we can compare the numerical results obtained with the evaluated Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ against conventional numerical time integration based on the fourth-order Runge-Kutta method. For randomly selected initial coordinates ($[-1,+1]^{2}$ for the van der Pol system and $[-10,+10]^{3}$ for the Lorenz system), the mean absolute errors over $1000$ data points are less than $10^{-7}$. From these results, we confirmed that the finite dictionaries with the specified maximum degrees yield sufficient accuracy. Hence, we employ the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ evaluated from (94) as the true solution.
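The following sketch illustrates the construction of such a reference Koopman matrix for the noisy van der Pol system: the generator (27) is represented on a truncated monomial dictionary with sympy, and the matrix exponential (31) is used in place of the paper's Crank-Nicolson integration; the maximum degree is reduced from $8$ for brevity, and truncating monomials beyond that degree is exactly the finite-dictionary approximation discussed in Sec. 4.2.

```python
import itertools
import numpy as np
import sympy as sp
from scipy.linalg import expm

x1, x2 = sp.symbols('x1 x2')
mu, sigma, dt_obs, max_deg = 1.0, 0.5, 0.1, 4

# Monomial dictionary up to max_deg.
exps = [e for d in range(max_deg + 1)
        for e in itertools.product(range(d + 1), repeat=2) if sum(e) == d]
basis = [x1**i * x2**j for i, j in exps]
index = {e: k for k, e in enumerate(exps)}

a = [x2, mu * (1 - x1**2) * x2 - x1]

def L(f):
    """Generator (27) with B = sigma * I."""
    return sp.expand(a[0] * sp.diff(f, x1) + a[1] * sp.diff(f, x2)
                     + sigma**2 / 2 * (sp.diff(f, x1, 2) + sp.diff(f, x2, 2)))

# Matrix of L on the dictionary; row k expands L psi_k, matching (16).
n = len(basis)
L_mat = np.zeros((n, n))
for k, f in enumerate(basis):
    Lf = L(f)
    if Lf == 0:
        continue
    poly = sp.Poly(Lf, x1, x2)
    for e, c in zip(poly.monoms(), poly.coeffs()):
        if e in index:           # monomials beyond max_deg are truncated
            L_mat[k, index[e]] = float(c)

K = expm(L_mat * dt_obs)  # reference Koopman matrix via (31)
print(K.shape)
```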

5.3 Settings for the EDMD algorithm for partial observation

To investigate the effects of partial observation, we employ the EDMD algorithm with delay embedding. The EDMD algorithm is data-driven, and we generate the dataset as follows:

  1. Randomly generate initial coordinates. (For the van der Pol system, a uniform density on [-1,+1]^{2} was used; for the Lorenz system, a uniform density on [-10,+10]^{3}.)

  2. Simulate the time integration via the Monte Carlo method with the Euler-Maruyama approximation. The time intervals for time evolution and observation are the same as those in Sec. 5.1.

  3. After a relaxation time of 100\times\Delta t_{\mathrm{obs}}, take 101+M_{\mathrm{max}} data points, where M_{\mathrm{max}} is the maximum number of delay embeddings.

  4. Repeat the above procedure 100 times.

We prepare 10,000 snapshot pairs in total and construct the Koopman matrix from this dataset. Although data points for all coordinates are generated, we consider partial observation with only one variable; a minimal simulation sketch follows.
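The sketch below mirrors this data-generation loop for the noisy van der Pol system. For simplicity, additive noise is applied to both components here, which is an assumption of the sketch rather than the paper's exact setting, and all parameter names are illustrative.

```python
import numpy as np

def vdp_drift(x, mu=1.0):
    """Drift of the van der Pol system."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])

def generate_trajectories(sigma=0.5, dt=1e-3, dt_obs=0.1,
                          n_traj=100, n_keep=101, m_max=8, seed=0):
    """Euler-Maruyama sampling following the recipe in the text.

    Returns an array of shape (n_traj, n_keep + m_max, 2): for each
    trajectory, 100 observation intervals of relaxation are discarded
    and 101 + M_max points are kept.
    """
    rng = np.random.default_rng(seed)
    n_sub = int(round(dt_obs / dt))  # EM steps per observation interval
    data = np.empty((n_traj, n_keep + m_max, 2))
    for r in range(n_traj):
        x = rng.uniform(-1.0, 1.0, size=2)
        for _ in range(100 * n_sub):  # relaxation
            x = x + vdp_drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
        for k in range(n_keep + m_max):
            data[r, k] = x
            for _ in range(n_sub):
                x = x + vdp_drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
    return data
```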

We should comment on the dictionary in the EDMD algorithm. In deterministic systems, it is common to use a naive delay embedding of the past coordinates, such as x_{d}(t-\Delta t_{\mathrm{obs}}). Although this setting is sufficient to evaluate the coordinate values themselves, higher-order basis functions are needed to evaluate higher-order statistics. That is, \mathbb{E}[\underline{X_{d}^{2}}]\neq(\mathbb{E}[\underline{X_{d}}])^{2} in stochastic systems, and the evaluation of \mathbb{E}[\underline{X_{d}^{2}}] requires monomial functions whose maximum degree is at least two.
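The point is elementary but easy to verify numerically; a toy example with Gaussian samples:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.5, size=100_000)  # toy stochastic state
print(np.mean(x) ** 2)   # ~1.00 : (E[X])^2
print(np.mean(x ** 2))   # ~1.25 : E[X^2] = (E[X])^2 + Var[X]
```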

To combine the higher-order basis functions with the delay embedding, we rewrite the variables \underline{\bm{x}} in Sec. 2 as \underline{\bm{z}}. When we observe the d-th variable, we set

\left[\begin{array}{c} \underline{z_{1}(t)} \\ \underline{z_{2}(t)} \\ \vdots \\ \underline{z_{M+1}(t)} \end{array}\right] = \left[\begin{array}{c} \underline{x_{d}(t)} \\ \underline{x_{d}(t-\Delta t_{\mathrm{obs}})} \\ \vdots \\ \underline{x_{d}(t-M\Delta t_{\mathrm{obs}})} \end{array}\right], (115)

where M is the number of delay embeddings. Then, we introduce higher-order monomial basis functions for \{z_{i}(t)\} to evaluate higher-order statistics. Since we focus on the mean values and the variances, the maximum degree of the dictionary functions for \{z_{i}(t)\} is set to two, which leads to the following dictionary \bm{\psi}:

\bm{\psi} = [1, z_{1}, \dots, z_{M+1}, z_{1}^{2}, z_{1}z_{2}, z_{1}z_{3}, \dots, z_{M}z_{M+1}, z_{M+1}^{2}]
= [1, x_{d}(t), \dots, x_{d}(t-M\Delta t_{\mathrm{obs}}), x_{d}(t)^{2}, x_{d}(t)x_{d}(t-\Delta t_{\mathrm{obs}}), x_{d}(t)x_{d}(t-2\Delta t_{\mathrm{obs}}), \dots, x_{d}(t-(M-1)\Delta t_{\mathrm{obs}})x_{d}(t-M\Delta t_{\mathrm{obs}}), x_{d}(t-M\Delta t_{\mathrm{obs}})^{2}], (116)

where x_{d}(t-m\Delta t_{\mathrm{obs}}) denotes the function that yields the coordinate value m\Delta t_{\mathrm{obs}} in the past:

x_{d}(t-m\Delta t_{\mathrm{obs}})(\underline{\bm{z}(t)}) = \underline{x_{d}(t-m\Delta t_{\mathrm{obs}})}. (117)
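A sketch of the dictionary construction (116) from a scalar time series of the observed variable; the helper name and layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import combinations_with_replacement

def delay_dictionary(x_d, M):
    """Evaluate the dictionary (116) on a scalar time series.

    `x_d` has shape (T,).  Row t of the returned array stacks
    [1, z_1, ..., z_{M+1}, all degree-2 monomials z_i z_j], where
    z_i(t) = x_d(t - (i-1) dt_obs), i.e., the current value and M
    delayed copies.  The first M times are dropped because their
    history is incomplete.
    """
    T = len(x_d)
    # z[:, i] = x_d shifted back by i observation steps
    z = np.stack([x_d[M - i : T - i] for i in range(M + 1)], axis=1)
    cols = [np.ones(len(z))]                    # constant function
    cols += [z[:, i] for i in range(M + 1)]     # first-order terms
    cols += [z[:, i] * z[:, j]                  # second-order terms
             for i, j in combinations_with_replacement(range(M + 1), 2)]
    return np.stack(cols, axis=1)
```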

5.4 Comparison with higher-order dictionary functions and delay embedding

Figure 3: Accuracy of the statistics estimated by the Koopman matrix under partial observation in the van der Pol system with the noise amplitude \sigma=0.5. (a) Dependence on the choice of dictionary functions without delay embedding. (b) Dependence on the number of delay embeddings. For visibility, the results for \mathbb{E}[\underline{X_{2}}] and \mathbb{E}[\underline{X_{2}^{2}}] are drawn slightly shifted.

As discussed in Sec. 4.2, the delay embedding would improve the accuracy of the statistics in the Koopman operator approach. Here, we present the numerical results for the van der Pol system with the noise amplitude \sigma=0.5.

Figure 3 shows the accuracy in the partial observation setting, where only the second variable, \underline{x_{2}}, is measurable. In Fig. 3(a), we do not use the delay embedding, and the maximum degree of the basis functions is varied. A sudden change occurs when the maximum degree is increased from 1 to 2; this means that monomial functions of degree 2 are necessary to evaluate the second-order statistics.

Figure 3(b) shows the dependence of the accuracy on the number of delay embeddings. Note that the dictionary introduced in Sec. 5.3 estimates \mathbb{E}[\underline{X_{2}^{2}}] adequately. The numerical results show that increasing the number of embeddings improves the accuracy, although there is a certain saturation point.

Comparing Figs. 3(a) and 3(b), we confirmed that the delay embedding is more effective than the use of higher-order dictionary functions; this behavior is common to all the other cases with other noise amplitudes and to the Lorenz system. Hence, the discussion in Sec. 4.2 would be valid: the delay embeddings are effective even in the partial observation cases in stochastic systems. A minimal EDMD sketch with the delay-embedded dictionary follows.
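For reference, the EDMD estimation step on the delay-embedded dictionary could be sketched as follows; `delay_dictionary` refers to the illustrative helper after (117), and the least-squares form is the standard EDMD estimate, not code from this work.

```python
import numpy as np

def edmd_koopman(Psi_now, Psi_next):
    """Least-squares EDMD estimate of the Koopman matrix.

    `Psi_now` and `Psi_next` have shape (n_snapshots, n_dict): the
    dictionary evaluated on snapshot pairs separated by dt_obs.  The
    solution minimizes ||Psi_next - Psi_now @ K.T||_F, so that
    psi(t + dt_obs) ~ K psi(t) holds in the least-squares sense.
    """
    KT, *_ = np.linalg.lstsq(Psi_now, Psi_next, rcond=None)
    return KT.T

# Usage sketch: consecutive rows of the delay-embedded dictionary
# matrix form the snapshot pairs.
# Psi = delay_dictionary(x_series, M=8)
# K = edmd_koopman(Psi[:-1], Psi[1:])
# K[1, :] @ Psi[t]   # one-step prediction of E[X_d]
```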

5.5 Effects of higher-order dictionary functions

Figure 4: Accuracy of the first-order statistics in the partial observation settings for the Lorenz system with the noise amplitude \sigma=3. In (a) and (d), only the first variable, \underline{x_{1}}, is observable; \underline{x_{2}} in (b) and (e), and \underline{x_{3}} in (c) and (f). (a), (b), and (c) correspond to the cases with the system parameter \rho=28; (d), (e), and (f) to \rho=13.

As explained in Sec. 4.2, higher-order basis functions are necessary for the evaluation of higher-order statistics. Here, we investigate their effects on the first-order statistics, i.e., the mean values. In the numerical experiments, we show the results for the Lorenz system; the same behavior was observed for the van der Pol system. The noise amplitude for the Lorenz system is set to \sigma=3.

Figure 4 shows the accuracy of the first-order statistics in the partial observation settings. We compare two dictionaries for the delay embeddings: one consists of the second-order monomial basis functions for \{z_{i}\} in (116), and the other of the first-order monomial basis functions. Only in the case of \mathbb{E}[\underline{X_{3}}], shown in Fig. 4(c), is there a slight variation in accuracy between the different dictionaries. The reason would be as follows. In the Lorenz system, \underline{X_{3}} differs from \underline{X_{1}} and \underline{X_{2}}: (102) does not include a term related to \underline{X_{3}}, while each of the three equations includes \underline{X_{1}} and \underline{X_{2}}. Hence, the subspace spanned by the delay embedding with \underline{X_{3}} would be smaller than those with \underline{X_{1}} and \underline{X_{2}}, and the difference in the dictionary then affects the partial observation of \underline{X_{3}}. As \rho becomes smaller, the effects of noise through \underline{X_{2}} become weaker, and hence the effects of partial observation also become smaller; compared to Fig. 4(c), the difference in accuracy between the dictionaries in Fig. 4(f) is smaller. From these results, we conclude that the first-order basis functions with delay embeddings contain enough information for the time evolution in the function space. The discussion in Sec. 4.2 justifies this result: the simple delayed function contains the information of the time evolution; see (99).

Of course, note that the evaluation of second-order statistics, such as \mathbb{E}[\underline{X_{d}^{2}}], requires the second-order monomial basis functions.

5.6 Power-law behavior for the change of noise amplitude

Figure 5: Dependence of the accuracy on the noise amplitude in the Lorenz system with \rho=28. The dashed lines correspond to fits with the functional form in (118).

In general, external parameters, such as temperature, control the amplitude of the additive noise. Here, we investigate the effects of the noise amplitude. As a result, we find a power-law dependence of the accuracy on the noise amplitude.

Figure 5 depicts the dependence of the accuracy on the noise amplitude \sigma in the Lorenz system. The plotted markers correspond to the accuracy for the delay embeddings with M=8; for all cases in Fig. 4, the accuracy appears to have stopped declining by M=8. We therefore judged that the M=8 cases yield sufficiently accurate estimations within the finite dictionary functions.

Note that Fig. 5 uses log-log plots. From the plotted results, we assume the following functional form for the accuracy as a function of the noise amplitude:

f(\sigma) = \alpha_{2}\sigma^{\alpha_{1}} + \alpha_{3}. (118)

Note that even in the noiseless case, i.e., for deterministic systems, there is a difference between the true statistics and the estimated ones because of the partial observation; hence, we introduced \alpha_{3} in (118). The fitted results are also shown in Fig. 5, and the fits appear to be good.
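A minimal fitting sketch with SciPy; synthetic stand-in data are used here only so the snippet runs, whereas the actual fits use the measured accuracies behind Fig. 5.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(sigma, a1, a2, a3):
    """Fitting form (118): f(sigma) = a2 * sigma**a1 + a3."""
    return a2 * sigma ** a1 + a3

# Stand-in data mimicking accuracy-vs-noise measurements.
sigmas = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
errors = power_law(sigmas, 1.0, 0.02, 1e-3) \
    * (1 + 0.05 * np.random.default_rng(0).standard_normal(5))

# p0 is an illustrative initial guess; a3 absorbs the noiseless offset.
popt, pcov = curve_fit(power_law, sigmas, errors, p0=[1.0, 0.01, 1e-3])
alpha1, alpha1_err = popt[0], np.sqrt(pcov[0, 0])  # exponent and its std
```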

Table 1: The power-law exponent \alpha_{1} in (118) for the Lorenz system.

          | \alpha_{1} for \mathbb{E}[\underline{X_{1}}] | \alpha_{1} for \mathbb{E}[\underline{X_{2}}] | \alpha_{1} for \mathbb{E}[\underline{X_{3}}]
\rho=28   | 0.808 \pm 0.012 | 0.849 \pm 0.022 | 1.007 \pm 0.013
\rho=13   | 0.773 \pm 0.036 | 0.661 \pm 0.024 | 0.820 \pm 0.022

In Table 1, we summarize the power-law exponent \alpha_{1} in (118) for the Lorenz system. Here, we repeated the same procedure as for Fig. 5 five times and evaluated the means and standard deviations of the exponent \alpha_{1}. Although it is difficult to capture the meaning of the exponent \alpha_{1}, it would reflect the effect of the unobserved part in the partial observation. The reasons are as follows:

  • Although the EDMD algorithm can adequately deal with noise effects, partial observation causes a decrease in accuracy due to the noise effects; the decrease occurs when the noise effects are orthogonal to the space spanned by the observable functions.

  • As discussed in Sec. 5.5, \underline{X_{3}} is the most significantly affected by partial observation. Accordingly, the exponent \alpha_{1} for \mathbb{E}[\underline{X_{3}}] is the largest in Table 1. Note that the accuracy values for \mathbb{E}[\underline{X_{3}}] are the smallest in Fig. 5; the increase in the exponent due to partial observation is not directly related to the actual accuracy values.

  • For the \rho=13 case, all the exponents \alpha_{1} become smaller. The reason is that the term \rho\underline{X_{1}} in (103) becomes smaller, and the effects of the partial observation are also smaller. In particular, the effects of noise in \underline{X_{1}} are reduced directly via the smaller term \rho\underline{X_{1}}, so \underline{X_{2}} is the most significantly affected.

From the above discussion, we expect that the power-law form and its exponent reflect part of the effects of the partial observation. If the ignored term has a strong nonlinearity, the impact of the partial observation should be considerable. We therefore perform additional numerical experiments for the following modified van der Pol system, in which we replace (101) with

\rmd\underline{X_{2}} = \left(\mu\left(1-\underline{X_{1}^{2}}\right)\underline{X_{2}} - h(\underline{X_{1}})\right)\rmd t + \sigma_{2}\,\rmd\underline{W_{2}(t)}, (119)

where h(\underline{X_{1}}) is a nonlinear function. In the original van der Pol system, h(\underline{X_{1}})=\underline{X_{1}}. Since this is an odd function, we consider the two cases h(\underline{X_{1}})=\underline{X_{1}^{3}} and h(\underline{X_{1}})=\underline{X_{1}^{5}}; a drift sketch follows.
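The modification only changes the drift; reusing the illustrative Euler-Maruyama generator sketched in Sec. 5.3, the replaced drift could read as follows (function names are, again, illustrative).

```python
import numpy as np

def modified_vdp_drift(x, mu=1.0, h=lambda u: u ** 3):
    """Drift of the modified van der Pol system (119); `h` is the
    nonlinearity replacing the original h(x1) = x1.  Pass
    h=lambda u: u ** 5 for the quintic case."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - h(x[0])])
```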

Table 2: The power-law exponent \alpha_{1} in (118) for the modified van der Pol system.

                                             | \alpha_{1} for \mathbb{E}[\underline{X_{1}}] | \alpha_{1} for \mathbb{E}[\underline{X_{2}}]
h(\underline{X_{1}})=\underline{X_{1}}       | 0.489 \pm 0.027 | 0.596 \pm 0.017
h(\underline{X_{1}})=\underline{X_{1}^{3}}   | 0.567 \pm 0.030 | 0.652 \pm 0.029
h(\underline{X_{1}})=\underline{X_{1}^{5}}   | 0.794 \pm 0.012 | 1.197 \pm 0.102

Table 2 shows the numerical results. As expected from the above discussion, the stronger the nonlinearity, the larger the exponent.

6 Discussion

In this work, we focused on partial observations in stochastic systems. The Koopman operator approach is available there as well, and it is crucial to note the difference between state variables and observable functions: the Koopman operator approach yields an expected value, not the value of the state variable itself. The deterministic case is special in that the expected value coincides with the value of the state variable, i.e., \mathbb{E}[\underline{X_{d}}]=\underline{x_{d}}. Although stochasticity prevents us from using this simple connection, the delay embedding is effective even in stochastic systems. Note the difference from the deterministic case: higher-order dictionary functions for the embedded states are necessary to evaluate higher-order statistics. In addition, the numerical experiments clarified that the dependence of the accuracy on the noise amplitude exhibits a power-law behavior, and the exponent could reflect the characteristics of the partial observations.

There are some remaining tasks. First, it is not clear why the dependence of the accuracy on the noise amplitude takes the power-law form in (118). Second, the theoretical connection between the exponent \alpha_{1} in the power-law form and the information loss in partial observations should be investigated. While we performed several other numerical experiments, we have not yet found a quantitative correspondence.

As mentioned above, there has been little discussion of partial observations in stochastic systems, even though modeling based on stochastic differential equations is popular. Hence, further studies on partial observations are desirable from a practical standpoint. We believe that this study will serve as a starting point for future research.

This work was supported by JST FOREST Program (Grant Number JPMJFR216K, Japan).

References

  • [1] Zwanzig R 1961 Phys. Rev. 124 983
  • [2] Mori H 1965 Prog. Theor. Phys. 33 423
  • [3] Zwanzig R 1973 J. Stat. Phys. 9 215
  • [4] Chorin A J, Hald O H and Kupferman R 2000 Proc. Nat. Acad. Sci. 97 2968
  • [5] Chorin A J, Hald O H and Kupferman R 2002 Physica D 166 239
  • [6] Chorin A J and Stinis P 2006 Comm. Appl. Math. Comp. Sci. 1 1
  • [7] Gouasmi A, Parish E J and Duraisamy K 2017 Proc. R. Soc. A 473 20170385
  • [8] te Vrugt M, Topp L, Wittkowski R and Heuer A 2024 J. Chem. Phys. 161 094904
  • [9] Netz R R 2024 Phys. Rev. E 110 014123
  • [10] Chorin A J and Lu F 2015 Proc. Nat. Acad. Sci. 112 9804
  • [11] Meyer H, Pelagejcev P and Schilling T 2019 Europhys. Lett. 128 40001
  • [12] Maeyama S and Watanabe T H 2020 J. Phys. Soc. Jpn. 89 024401
  • [13] González D, Chinesta F and Cueto E 2021 J. Comp. Phys. 428 109982
  • [14] Tian Y, Lin Y T, Anghel M and Livescu D 2021 Phys. Fluids 33 125118
  • [15] Venturi D and Li X 2023 Res. Math. Sci. 10 23
  • [16] Gupta P, Schmid P, Sipp D, Sayadi T and Rigas G 2025 Proc. R. Soc. A 481 20240259
  • [17] Lin K K and Lu F 2021 J. Comp. Phys. 424 109864
  • [18] Lin Y T, Tian Y, Livescu D and Anghel M 2021 SIAM J. Appl. Dyn. Syst. 20 2558
  • [19] Lin Y T, Tian Y, Perez D and Livescu D 2023 SIAM J. Appl. Dyn. Syst. 22 2890
  • [20] Koopman B O 1931 Proc. Natl. Acad. Sci. 17 315
  • [21] Rowley C W, Mezić I, Bagheri S, Schlatter P and Henningson D S 2009 J. Fluid Mech. 641 115
  • [22] Schmid P J 2010 J. Fluid Mech. 656 5
  • [23] Williams M O, Kevrekidis I G and Rowley C W 2015 J. Nonlinear Sci. 25 1307
  • [24] Arbabi H and Mezić I 2017 SIAM J. Appl. Dyn. Syst. 16 2096
  • [25] Brunton S L, Brunton B W, Proctor J L, Kaiser E and Kutz J N 2017 Nature Comm. 8 19
  • [26] Mauroy A, Mezić I and Susuki Y (ed) 2020 The Koopman Operator in Systems and Control: Concepts, Methodologies, and Applications (Cham: Springer)
  • [27] Budišić M, Mohr R and Mezić I 2012 Chaos 22 596
  • [28] Mezić I 2013 Annu. Rev. Fluid Mech. 45 357
  • [29] Rowley C W and Dawson S T M 2017 Annu. Rev. Fluid Mech. 49 387
  • [30] Mezić I 2021 Notices of Amer. Math. Soc. 68 1087
  • [31] Brunton S L, Budišić M, Kaiser E and Kutz J N 2022 SIAM Rev. 64 229
  • [32] Črnjarić-Žic N, Maćešić S and Mezić I 2020 J. Nonlinear Sci. 30 2007
  • [33] Wanner M and Mezić I 2022 SIAM J. Appl. Dyn. Syst. 21 1930
  • [34] Gardiner C 2009 Stochastic methods: A handbook for the natural and social sciences, 4th edition (Berlin Heidelberg: Springer)
  • [35] Doi M 1976 J. Phys. A: Math. Gen. 9 1465
  • [36] Doi M 1976 J. Phys. A: Math. Gen. 9 1479
  • [37] Peliti L 1985 J. Physique 46 1469
  • [38] Ohkubo J 2013 J. Phys. A: Math. Theor. 46 375004
  • [39] Ohkubo J and Arai Y 2019 J. Stat. Mech. 063202
  • [40] Takahashi S and Ohkubo J 2023 J. Stat. Phys. 190 159
  • [41] Clainche S L and Vega J M 2017 SIAM J. Appl. Dyn. Syst. 16 882
  • [42] Kamb M, Kaiser E, Brunton S L and Kutz J N 2020 SIAM J. Appl. Dyn. Syst. 19 886
  • [43] Takens F 1981 Lect. Notes in Math. 898 366
  • [44] Van der Pol B 1926 The London, Edinburgh and Dublin Phil. Mag. & J. of Sci. 2 978
  • [45] Lorenz E N 1963 J. Atmos. Sci. 20 130
  • [46] Kloeden P E and Platen E 1992 Numerical Solution of Stochastic Differential Equations (Berlin: Springer)
  • [47] Ohkubo J 2021 J. Stat. Mech. 2021 013401