
Koopman operator-based discussion on partial observation in stochastic systems

Jun Ohkubo Graduate School of Science and Engineering, Saitama University,
255 Shimo-Okubo, Sakura-ku, Saitama 338-8570, Japan
johkubo@mail.saitama-u.ac.jp
Abstract

It is sometimes difficult to achieve complete observation of a full set of observables, and partial observations are then necessary. For deterministic systems, the Mori-Zwanzig formalism provides a theoretical framework for handling partial observations. Recently, data-driven algorithms based on Koopman operator theory have made significant progress, and there have been discussions connecting the Mori-Zwanzig formalism with Koopman operator theory. In this work, we discuss the effects of partial observation in stochastic systems using Koopman operator theory. The discussion clarifies the importance of distinguishing the state space from the function space in stochastic systems. Even in stochastic systems, the delay embedding technique is beneficial for partial observation, and several numerical experiments show a power-law behavior of the accuracy with respect to the amplitude of the additive noise. We also discuss the relation between the exponent of the power-law behavior and the effects of partial observation.

1 Introduction

In a classical dynamical system, the behavior is deterministic if we can observe complete information about the system. However, it is often difficult to perform a perfect observation, and partial observation causes stochasticity. A simple example is the movement of a ball; if only the coordinate is measurable but not the velocity, it is impossible to predict in which direction and how far the ball will move next. The effect of partial observation has also been discussed in statistical physics; constructing coarse-grained models from high-dimensional microscopic ones leads to time-evolution equations for a smaller set of relevant variables of interest, and the unobserved variables can play the role of noise. In the recent development of data-driven approaches, the effects of partial observation are crucial because it can be difficult to observe all variables. Hence, it is desirable to obtain even a portion of the overall information from partial observation.

One of the best-known frameworks for handling partial observations is the Mori-Zwanzig formalism [1, 2, 3]. Mori and Zwanzig developed projection methods to express the effect of unobserved variables in terms of observed ones. The time-evolution equation for the observed variables is called the generalized Langevin equation. The crucial points of the generalized Langevin equation are as follows:

  • Although the original dynamics is deterministic and Markovian, the generalized Langevin equation has a memory term that depends on the history.

  • There is a term that could be interpreted as noise; the noise term represents the orthogonal dynamics and depends on the unknown initial conditions of the unobserved variables.

There are many works on the Mori-Zwanzig formalism. For example, some employed the formalism to improve prediction accuracy [4, 5, 6, 7, 8, 9]; for further related works, see the references in, e.g., [9]. Some works focused on data-driven approaches to the Mori-Zwanzig formalism [10, 11, 12, 13, 14], and some discussed applications in neural networks [15, 16].

Recently, a data-driven approach based on the Koopman operator has been connected to the discussion of the Mori-Zwanzig formalism [17, 18, 19]. The Koopman operator [20] enables us to deal with nonlinear dynamical systems in terms of linear algebra. Well-known methods for incorporating data into the Koopman operator framework include dynamic mode decomposition (DMD) [21, 22], extended dynamic mode decomposition (EDMD) [23], and Hankel DMD [24, 25]. For details of the Koopman approach, see the book [26] and the reviews [27, 28, 29, 30, 31]. In [18], a discussion based on the Koopman approach naturally leads to the generalized Langevin equation and to algorithms for estimating its key components from time-series datasets. There are also other works on this topic; [17] focused on the Wiener projection, and [19] employed regression-based projections.

Here, we note that previous discussions of the Koopman-based approach to partial observations have essentially been restricted to deterministic dynamical systems. Of course, the original Mori-Zwanzig formalism targeted deterministic systems, and it is natural to consider the Koopman-based approach in deterministic cases. However, the Koopman approach is not restricted to deterministic cases and is also applicable to stochastic systems; for example, see [23, 32, 33]. In some cases, one would take time-evolution equations with noise as the starting point; note that the noise term in the generalized Langevin equation is not an actual noise. Stochastic differential equations with additive Wiener noise are widely used in statistical physics, and it remains unclear how to treat partial observations in stochastic systems.

In the present paper, we discuss the effects of partial observation in stochastic systems. Here, we focus on stochastic differential equations with additive Wiener noise. To discuss partial observation, the Koopman operator approach is employed. In this approach, it is crucial to distinguish the state space from the function space; it will be clarified that the generalized Langevin equation in the state space is derived only in deterministic cases. The discussion of stochastic systems clarifies the effect of delay embedding, which improves estimation accuracy over the use of higher-order basis functions. In addition, numerical experiments for the noisy van der Pol system and the noisy Lorenz system show a power-law dependence of the accuracy on the noise amplitude in the partial-observation settings.

This article is organized as follows. In Sec. 2, we review the Koopman operator approach. Section 3 explains the previous discussions of the Mori-Zwanzig formalism within the Koopman operator approach, where we emphasize the role of the deterministic feature. Then, Sec. 4 presents discussions of partial observations in stochastic systems and the effects of delay embedding. We also present results from numerical experiments, in which the noisy van der Pol system and the noisy Lorenz system are employed as toy examples.

2 Preliminaries on Koopman operator approach

In the present paper, it is crucial to distinguish state vectors from the observable functions that yield the values of state vectors. We therefore review the Koopman operator approach with an emphasis on this point. For further details, refer to [23].

2.1 Distinction between state space and function space

Consider a state space $\mathcal{M}\subseteq\mathbb{R}^{D}$ and an adequate function space $\mathcal{F}$; an element of $\mathcal{F}$ is called an observable function, $\phi:\mathcal{M}\to\mathbb{R}$. To emphasize the distinction between the state space and the function space, we add underlines to denote state vectors on $\mathcal{M}$, e.g., $\underline{\bm{x}}$. The $d$-th element of $\underline{\bm{x}}$ is denoted as $\underline{x_{d}}\in\mathbb{R}$. By contrast, the observable function that yields the value of the $d$-th element $\underline{x_{d}}$ is denoted as $x_{d}(\underline{\bm{x}})=\underline{x_{d}}$. Note that it is common in many papers on the Koopman operator approach to use abbreviations, e.g., $x_{d}\in\mathcal{F}$, for notational simplicity. While the abbreviations are convenient, we again emphasize the importance of distinguishing between elements of the state space and those of the function space. Hence, we employ the underlined notation for the state space.

2.2 Deterministic systems

Let $\underline{\bm{x}(t)}$ be the state vector of the system at time $t$. The deterministic time-evolution equation for $\underline{\bm{x}(t)}$ is given by

\frac{\rmd}{\rmd t}\underline{\bm{x}(t)}=\bm{a}(\underline{\bm{x}(t)}), \qquad (1)

where $\bm{a}(\underline{\bm{x}})$ is a vector of functions giving the time evolution of each coordinate $\underline{x_{d}(t)}$. While it would be possible to include the time variable $t$ in these functions, we assume that $\bm{a}(\underline{\bm{x}})$ is time-independent for simplicity.

It is beneficial to view the coupled ordinary differential equations in (1) as a dynamical system with discrete time steps. Setting a time interval for observation as $\Delta t_{\mathrm{obs}}$, the state vector evolves in time as follows:

\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})}=\bm{F}_{\Delta t_{\mathrm{obs}}}(\underline{\bm{x}(t)}), \qquad (2)

where $\bm{F}_{\Delta t_{\mathrm{obs}}}:\mathcal{M}\to\mathcal{M}$ yields the state vector after the time evolution over $\Delta t_{\mathrm{obs}}$.

In the Koopman operator approach, we consider the time evolution of observable functions instead of that of the state vector. Consider the observable function $x_{d}:\mathcal{M}\to\mathbb{R}$, which yields the $d$-th element of the state vector. Then, we consider a function that gives the $d$-th element of the state vector after the time evolution over $\Delta t_{\mathrm{obs}}$. As we will see later, it is possible to obtain such a function by employing a map $\mathcal{K}_{\Delta t_{\mathrm{obs}}}:\mathcal{F}\to\mathcal{F}$:

\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}=x_{d}\circ\bm{F}_{\Delta t_{\mathrm{obs}}}, \qquad (3)

which leads to

x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})=\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}(\underline{\bm{x}(t)}). \qquad (4)

Note that the Koopman operator is linear even if the original system is nonlinear.

Figure 1: The role of the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$. We consider an example with $D=2$. The change from $\underline{x_{1}(t)}$ to $\underline{x_{1}(t+\Delta t_{\mathrm{obs}})}$ on the state space corresponds to the change from $x_{1}(\underline{\bm{x}(t)})$ to $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}(\underline{\bm{x}(t)})$ on the function space. Note that the Koopman operator maps the flat plane $x_{1}$ to a curved surface, which corresponds to the map $\mathcal{K}_{\Delta t_{\mathrm{obs}}}:\mathcal{F}\to\mathcal{F}$ on the function space.

Figure 1 shows the correspondence between the state space $\mathcal{M}$ and the function space $\mathcal{F}$. Although the observable function $x_{1}$ is a simple flat plane, the action of the Koopman operator yields the curved surface $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}$. The function $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}:\mathcal{M}\to\mathbb{R}$ gives the coordinate value after the time evolution, i.e., $\underline{x_{1}(t+\Delta t_{\mathrm{obs}})}$.

Note that while it is possible to consider various types of observable functions, we focus only on the simple observables $\{x_{d}\}$ in the present work. The collection of the observable functions for the coordinates is called the full-state observable $\bm{g}:\mathcal{M}\to\mathbb{R}^{D}$, which is a vector-valued observable function:

\bm{g}=[x_{1}\quad x_{2}\quad\dots\quad x_{D}]^{\top}. \qquad (5)

Again, we note that it is crucial to distinguish $\bm{g}$ from $\underline{\bm{x}}=[\underline{x_{1}}\;\underline{x_{2}}\;\dots\;\underline{x_{D}}]^{\top}\in\mathbb{R}^{D}$.

2.3 Extended dynamic mode decomposition

As discussed in [23], we can obtain an approximate representation of the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ by a data-driven approach. That is, introducing a set of basis functions, called a dictionary, leads to a Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ for the time interval $\Delta t_{\mathrm{obs}}$. The data-driven method is called the EDMD algorithm.

The EDMD algorithm requires a data set of snapshot pairs. The snapshot pairs are denoted as $\{(\underline{\bm{x}_{n}},\underline{\bm{y}_{n}})\}_{n=1}^{N_{\mathrm{data}}}$, where

\underline{\bm{x}_{n}}=[\underline{x_{n1}(t)}\quad\underline{x_{n2}(t)}\quad\cdots\quad\underline{x_{nD}(t)}]^{\top} \qquad (6)

and

\underline{\bm{y}_{n}}=\bm{F}_{\Delta t_{\mathrm{obs}}}(\underline{\bm{x}_{n}})=[\underline{x_{n1}(t+\Delta t_{\mathrm{obs}})}\quad\underline{x_{n2}(t+\Delta t_{\mathrm{obs}})}\quad\cdots\quad\underline{x_{nD}(t+\Delta t_{\mathrm{obs}})}]^{\top}. \qquad (7)

$N_{\mathrm{data}}$ represents the number of snapshot pairs. The dictionary $\bm{\psi}$ consists of $N_{\mathrm{dic}}$ basis functions, i.e.,

\bm{\psi}(\underline{\bm{x}})=[\psi_{1}(\underline{\bm{x}})\quad\psi_{2}(\underline{\bm{x}})\quad\cdots\quad\psi_{N_{\mathrm{dic}}}(\underline{\bm{x}})]^{\top}, \qquad (8)

where $\psi_{i}:\mathcal{M}\to\mathbb{R}$ for $i=1,\dots,N_{\mathrm{dic}}$. For example, a dictionary with monomial functions for $D=2$ is given as

\bm{\psi}(\underline{\bm{x}})=[\underline{1}\quad x_{1}(\underline{\bm{x}})\quad x_{2}(\underline{\bm{x}})\quad x_{1}^{2}(\underline{\bm{x}})\quad x_{1}x_{2}(\underline{\bm{x}})\quad x_{2}^{2}(\underline{\bm{x}})\quad x_{1}^{3}(\underline{\bm{x}})\quad\dots\quad x_{2}^{5}(\underline{\bm{x}})]^{\top}, \qquad (9)

where the maximum degree of the monomial functions is $5$, and $\underline{1}$ denotes the constant function with the scalar value $1$. The dictionary spans a subspace $\mathcal{F}_{\bm{\psi}}\subseteq\mathcal{F}$.
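As a concrete illustration, the following Python sketch enumerates such a monomial dictionary for $D=2$ up to degree $5$; the function name and the ordering of the basis functions are our own illustrative choices.

```python
import itertools
import numpy as np

def monomial_dictionary(max_degree, D=2):
    """Exponent tuples and an evaluator for a monomial dictionary as in (9)."""
    # Exponent tuples (n_1, ..., n_D) with total degree <= max_degree,
    # ordered by total degree; (0, ..., 0) gives the constant function 1.
    exponents = [e for deg in range(max_degree + 1)
                 for e in itertools.product(range(deg + 1), repeat=D)
                 if sum(e) == deg]

    def psi(x):
        x = np.asarray(x, dtype=float)
        return np.array([np.prod(x ** np.array(e)) for e in exponents])

    return exponents, psi

exponents, psi = monomial_dictionary(max_degree=5, D=2)
print(len(exponents))        # 21 basis functions for D = 2, degree <= 5
print(psi([0.5, -1.0])[:6])  # values of [1, x2, x1, x2^2, x1*x2, x1^2]
```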

Note that the dictionary with the monomial functions contains the full-state observable $\bm{g}$; see the second and third elements on the right-hand side of (9). However, the action of the Koopman operator yields the function $\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}$, which may not lie in the spanned subspace $\mathcal{F}_{\bm{\psi}}$. In this case, we have the following expansion:

\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}(\underline{\bm{x}})\simeq\sum_{i=1}^{N_{\mathrm{dic}}}c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}}_{i}\psi_{i}(\underline{\bm{x}}), \qquad (10)

where $\{c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}}_{i}\}_{i=1}^{N_{\mathrm{dic}}}$ are the expansion coefficients. Similarly, we can consider the action on an arbitrary function $\phi\in\mathcal{F}$, i.e., $\mathcal{K}_{\Delta t_{\mathrm{obs}}}\phi$, and its coefficients $\{c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}\phi}_{i}\}_{i=1}^{N_{\mathrm{dic}}}$. Note that the function $\phi$ before the time evolution may not lie in the spanned subspace $\mathcal{F}_{\bm{\psi}}$ either, and $\phi$ is also approximately written as follows:

\phi(\underline{\bm{x}})\simeq\sum_{i=1}^{N_{\mathrm{dic}}}c^{\phi}_{i}\psi_{i}(\underline{\bm{x}}). \qquad (11)

Since a linear combination of the dictionary functions approximately yields an arbitrary function and its time-evolved counterpart, it suffices to consider the action of the Koopman operator on the dictionary. Then, the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ is approximated by a finite-size matrix $K_{\Delta t_{\mathrm{obs}}}$ as follows:

\left[\begin{array}{c}\mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{1}(\underline{\bm{x}})\\ \mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{2}(\underline{\bm{x}})\\ \vdots\\ \mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{N_{\mathrm{dic}}}(\underline{\bm{x}})\end{array}\right]=\mathcal{K}_{\Delta t_{\mathrm{obs}}}\bm{\psi}(\underline{\bm{x}})\simeq K_{\Delta t_{\mathrm{obs}}}\bm{\psi}(\underline{\bm{x}}). \qquad (16)

It is easy to determine the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ from the dataset; we solve the following least squares problem numerically:

K_{\Delta t_{\mathrm{obs}}}=\mathop{\mathrm{arg\,min}}\limits_{\widetilde{K}_{\Delta t_{\mathrm{obs}}}}\sum_{n=1}^{N_{\mathrm{data}}}\left\|\bm{\psi}(\underline{\bm{y}_{n}})-\widetilde{K}_{\Delta t_{\mathrm{obs}}}\bm{\psi}(\underline{\bm{x}_{n}})\right\|^{2}. \qquad (17)

Note that a comparison of (10) with (17) immediately yields the element at the $d$-th row and the $i$-th column of the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ as follows:

\left[K_{\Delta t_{\mathrm{obs}}}\right]_{di}=c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d-1}}_{i}\quad(\textrm{for }2\leq d\leq D+1),
\left[K_{\Delta t_{\mathrm{obs}}}\right]_{di}=c^{\mathcal{K}_{\Delta t_{\mathrm{obs}}}\psi_{d}}_{i}\quad(\textrm{for }d>D+1). \qquad (18)

Since the first dictionary function $\psi_{1}(\underline{\bm{x}})=1(\underline{\bm{x}})$ is time-invariant, the first row of $K_{\Delta t_{\mathrm{obs}}}$ is $[1\;0\;\cdots\;0]$, i.e., $[K_{\Delta t_{\mathrm{obs}}}]_{1i}=\delta_{1i}$.

In short, one can consider the linear action of $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ on the function space instead of the nonlinear time evolution $\bm{F}_{\Delta t_{\mathrm{obs}}}$ on the state space in (2). This linearity naturally leads to the use of a linear combination of basis functions, which facilitates estimation from data via the least squares method. This use of linearity is one of the benefits of the Koopman operator approach.
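As a minimal illustration of the least-squares problem (17), the following Python sketch estimates a Koopman matrix for a toy two-dimensional map. The stand-in map (a crude explicit-Euler step of the noise-free van der Pol flow), the degree-2 dictionary, and all parameter values are our own illustrative choices, not the settings used in the later experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x, dt=0.1, mu=1.0):
    """Cheap stand-in for F_{Delta t_obs}: one explicit-Euler step."""
    x1, x2 = x
    return np.array([x1 + dt * x2,
                     x2 + dt * (mu * (1.0 - x1**2) * x2 - x1)])

def psi(x):
    """Monomial dictionary up to degree 2 for D = 2."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

# Snapshot pairs {(x_n, y_n)} with y_n = F(x_n).
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
Y = np.array([F(x) for x in X])

Psi_x = np.array([psi(x) for x in X])   # N_data x N_dic
Psi_y = np.array([psi(y) for y in Y])

# Least squares (17): K minimizes sum_n || psi(y_n) - K psi(x_n) ||^2.
C, *_ = np.linalg.lstsq(Psi_x, Psi_y, rcond=None)
K = C.T                                 # N_dic x N_dic Koopman matrix

x0 = np.array([0.3, -0.2])
print(K @ psi(x0))   # approximately psi(F(x0))
print(psi(F(x0)))
```

The first row of the estimated matrix is close to $[1\;0\;\cdots\;0]$, in agreement with the remark above on the constant dictionary function.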

2.4 Stochastic systems

As noted in Sec. 1, the Koopman operator approach is also applicable to stochastic systems; for example, see [23, 32, 33]. In stochastic cases, the action of the Koopman operator yields the expected value of a statistic of the state variable rather than the state variable itself. For example,

\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{1}(\underline{\bm{x}_{\mathrm{ini}}})=\mathbb{E}[x_{1}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})\,|\,\underline{\bm{x}(t)}=\underline{\bm{x}_{\mathrm{ini}}}]. \qquad (19)

In the following, we briefly explain why the Koopman operator approach yields the expectation values from the viewpoint of the Fokker-Planck and the backward Kolmogorov equations.

Consider a $D$-dimensional vector of stochastic variables, $\underline{\bm{X}}\in\mathcal{M}$, and assume that the vector $\underline{\bm{X}(t)}$ at time $t$ obeys the following stochastic differential equation:

\rmd\underline{\bm{X}(t)}=\bm{a}(\underline{\bm{X}(t)})\,\rmd t+B(\underline{\bm{X}(t)})\,\rmd\underline{\bm{W}(t)}, \qquad (20)

where $\bm{a}(\underline{\bm{X}})$ is a vector of drift coefficient functions, $B(\underline{\bm{X}})$ is a matrix of diffusion coefficient functions, and $\underline{\bm{W}(t)}$ is a vector of Wiener processes $\{\underline{W_{i}(t)}\}$. The Wiener processes satisfy

\mathbb{E}[\underline{W_{i}(t)}]=0,\qquad\mathbb{E}[(\underline{W_{i}(t)}-\underline{W_{i}(s)})(\underline{W_{j}(t)}-\underline{W_{j}(s)})]=(t-s)\delta_{ij}, \qquad (21)

for $0\leq s\leq t$. Let $\underline{\bm{X}(t)}=\underline{\bm{x}_{\mathrm{ini}}}$ be the initial condition for the stochastic differential equation. The stochastic differential equation in (20) has a corresponding Fokker-Planck equation [34], which describes the time evolution of the probability density function $p(\underline{\bm{x}},t)$:

\frac{\partial}{\partial t}p(\underline{\bm{x}},t)=\mathcal{L}^{\dagger}\,p(\underline{\bm{x}},t), \qquad (22)

where

\mathcal{L}^{\dagger}=-\sum_{d}\frac{\partial}{\partial\underline{x_{d}}}a_{d}(\underline{\bm{x}})+\frac{1}{2}\sum_{i,j}\frac{\partial^{2}}{\partial\underline{x_{i}}\partial\underline{x_{j}}}\left[B(\underline{\bm{x}})B(\underline{\bm{x}})^{\top}\right]_{ij} \qquad (23)

is the time-evolution operator for the Fokker-Planck equation. The initial condition at time $t$ is

p(\underline{\bm{x}},t)=\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}}), \qquad (24)

where $\delta(\cdot)$ is the Dirac delta function.

Note that the time-evolved probability density function, $p(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})$, is formally written as

p(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})=\rme^{\mathcal{L}^{\dagger}\Delta t_{\mathrm{obs}}}p(\underline{\bm{x}},t). \qquad (25)

Then, the expectation of the observable function $x_{d}$ is given as follows:

\mathbb{E}[x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})\,|\,\underline{\bm{x}(t)}=\underline{\bm{x}_{\mathrm{ini}}}]
=\int_{\mathcal{M}}x_{d}(\underline{\bm{x}})\,p(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})\,\rmd\underline{\bm{x}}
=\int_{\mathcal{M}}x_{d}(\underline{\bm{x}})\left(\rme^{\mathcal{L}^{\dagger}\Delta t_{\mathrm{obs}}}\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\right)\rmd\underline{\bm{x}}
=\int_{\mathcal{M}}\left(\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}x_{d}(\underline{\bm{x}})\right)\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\,\rmd\underline{\bm{x}}
=\int_{\mathcal{M}}\varphi_{d}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})\,\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\,\rmd\underline{\bm{x}}
=\varphi_{d}(\underline{\bm{x}_{\mathrm{ini}}},t+\Delta t_{\mathrm{obs}}), \qquad (26)

where

\mathcal{L}=\sum_{d}a_{d}(\bm{x})\frac{\partial}{\partial x_{d}}+\frac{1}{2}\sum_{i,j}\left[B(\bm{x})B(\bm{x})^{\top}\right]_{ij}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}, \qquad (27)

and $\varphi_{d}(\underline{\bm{x}},t)$ is the solution of the following partial differential equation, the so-called backward Kolmogorov equation:

\frac{\partial}{\partial t}\varphi_{d}(\underline{\bm{x}},t)=\mathcal{L}\varphi_{d}(\underline{\bm{x}},t), \qquad (28)

which leads to

\varphi_{d}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})=\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}\varphi_{d}(\underline{\bm{x}},t). \qquad (29)

Note that $\varphi_{d}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})\in\mathcal{F}$, and the initial condition is

\varphi_{d}(\underline{\bm{x}},t)=x_{d}(\underline{\bm{x}}). \qquad (30)

Here, $\mathcal{L}^{\dagger}$ in (23) yields the time evolution of the probability density function on the state space $\mathcal{M}$. By contrast, $\mathcal{L}$ in (27) corresponds to the time evolution of functions in the function space $\mathcal{F}$. In addition, (29) indicates that

\mathcal{K}_{\Delta t_{\mathrm{obs}}}=\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}, \qquad (31)

and it follows immediately that the Koopman operator $\mathcal{K}_{\Delta t_{\mathrm{obs}}}$ yields the expectation after the time evolution.

Note that the above discussion also applies to deterministic cases. When there is no diffusion, i.e., $B(\underline{\bm{x}})=0$, the time-evolution operator $\mathcal{L}^{\dagger}$ contains only first-order derivatives. While the second-order derivatives broaden the probability density function, the first-order derivatives correspond to a shift of the coordinates. Hence, in the zero-diffusion case an initial Dirac delta density remains a Dirac delta function: $\delta(\underline{\bm{x}}-\underline{\bm{x}_{\mathrm{ini}}})\to\delta(\underline{\bm{x}}-\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})$. The expectation then yields the value of $\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})}$, and the previous discussion of deterministic cases is recovered.
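The statement (19) can be checked directly by Monte Carlo simulation. The following sketch uses the one-dimensional Ornstein-Uhlenbeck process $\rmd X=-X\,\rmd t+\sigma\,\rmd W$, for which the conditional mean is known exactly, $\mathbb{E}[X(t+\Delta t_{\mathrm{obs}})\,|\,X(t)=x_{\mathrm{ini}}]=x_{\mathrm{ini}}\rme^{-\Delta t_{\mathrm{obs}}}$; the process and all parameter values are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma, dt_obs, dt = 0.5, 0.1, 1e-3
x_ini, n_samples = 1.2, 200_000
n_steps = int(dt_obs / dt)

# Euler-Maruyama for dX = -X dt + sigma dW, all samples in parallel.
x = np.full(n_samples, x_ini)
for _ in range(n_steps):
    x += -x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_samples)

# (19): the Koopman operator applied to x_1, evaluated at x_ini,
# is the conditional expectation after the time evolution.
print(x.mean())                 # Monte Carlo estimate
print(x_ini * np.exp(-dt_obs))  # exact conditional mean
```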

3 Revisiting partial observations in deterministic systems

3.1 Use of time-dependent basis

Here, we follow [18] and revisit the effects of partial observations in deterministic systems. Note that the following discussion differs slightly from that in [18]. As discussed below, the absence of the diffusion coefficient $B(\bm{x})$ is crucial for connecting the Koopman operator approach with the conventional treatment of partial observations, i.e., the Mori-Zwanzig formalism.

Assume that only some of the coordinates are observable; we denote the index set by $\mathcal{O}$ and the partial observables by $\bm{g}_{\mathcal{O}}$, whose elements are $\{x_{d}\,|\,d\in\mathcal{O}\}$; the number of partial observables is $N_{\mathrm{obs}}$. For example, when we observe $\underline{x_{1}}$ and $\underline{x_{3}}$ in a $D=3$ case, $\mathcal{O}=\{1,3\}$, $\bm{g}_{\mathcal{O}}=[x_{1}\;x_{3}]^{\top}$, and $N_{\mathrm{obs}}=2$. We also introduce a dictionary $\bm{\psi}_{\overline{\mathcal{O}}}$ that includes many basis functions $\psi_{i}\in\mathcal{F}$ related to the unobserved variables; we denote the number of dictionary functions by $N_{\mathrm{dic}}^{\overline{\mathcal{O}}}$. The discussion on the function space and the introduction of basis functions, i.e., the dictionary, lead to a linear-algebraic approach for nonlinear systems; this linearity is characteristic of the Koopman operator approach.

We discuss the case where the basis functions are time-dependent; this corresponds to the discussion in [18]. In this case, an arbitrary function $\phi\in\mathcal{F}$ is expressed approximately as

\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}x_{d}(\underline{\bm{x}},t)+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}\psi_{i}(\underline{\bm{x}},t). \qquad (32)

Then, we derive an explicit representation of the time-evolution operator $\mathcal{L}$ in (27). Using the same basis functions as above, we approximate the operator $\mathcal{L}$ by matrices; it is enough to consider its action on $\{x_{d}\}$ and $\{\psi_{i}\}$, and we have

\mathcal{L}x_{d}(\underline{\bm{x}},t)\simeq\sum_{d^{\prime}\in\mathcal{O}}L^{\mathcal{O}\mathcal{O}}_{dd^{\prime}}x_{d^{\prime}}(\underline{\bm{x}},t)+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}L^{\mathcal{O}\overline{\mathcal{O}}}_{di}\psi_{i}(\underline{\bm{x}},t), \qquad (33)

and

\mathcal{L}\psi_{i}(\underline{\bm{x}},t)\simeq\sum_{d\in\mathcal{O}}L^{\overline{\mathcal{O}}\mathcal{O}}_{id}x_{d}(\underline{\bm{x}},t)+\sum_{i^{\prime}=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}_{ii^{\prime}}\psi_{i^{\prime}}(\underline{\bm{x}},t). \qquad (34)

Then, using the matrices $L^{\mathcal{O}\mathcal{O}}\in\mathbb{R}^{N_{\mathrm{obs}}\times N_{\mathrm{obs}}}$, $L^{\mathcal{O}\overline{\mathcal{O}}}\in\mathbb{R}^{N_{\mathrm{obs}}\times N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}$, $L^{\overline{\mathcal{O}}\mathcal{O}}\in\mathbb{R}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}\times N_{\mathrm{obs}}}$, and $L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\in\mathbb{R}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}\times N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}$, the time-evolution equation is given as

\frac{\rmd}{\rmd t}\left[\begin{array}{c}\bm{g}_{\mathcal{O}}(t)\\ \bm{\psi}_{\overline{\mathcal{O}}}(t)\end{array}\right]=\left[\begin{array}{cc}L^{\mathcal{O}\mathcal{O}}&L^{\mathcal{O}\overline{\mathcal{O}}}\\ L^{\overline{\mathcal{O}}\mathcal{O}}&L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{g}_{\mathcal{O}}(t)\\ \bm{\psi}_{\overline{\mathcal{O}}}(t)\end{array}\right]. \qquad (41)

As shown in [18], it is possible to derive a closed time-evolution equation for $\bm{g}_{\mathcal{O}}(t)$. First, we solve (41) for $\bm{\psi}_{\overline{\mathcal{O}}}(t)$ implicitly:

\bm{\psi}_{\overline{\mathcal{O}}}(t)=\int_{0}^{t}\rme^{(t-s)L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\mathcal{O}}\bm{g}_{\mathcal{O}}(s)\,\rmd s+\rme^{tL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{\psi}_{\overline{\mathcal{O}}}(0). \qquad (42)

Then, inserting (42) into (41), we have

\frac{\rmd}{\rmd t}\bm{g}_{\mathcal{O}}(t)=L^{\mathcal{O}\mathcal{O}}\bm{g}_{\mathcal{O}}(t)+L^{\mathcal{O}\overline{\mathcal{O}}}\int_{0}^{t}\rme^{(t-s)L^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\mathcal{O}}\bm{g}_{\mathcal{O}}(s)\,\rmd s+L^{\mathcal{O}\overline{\mathcal{O}}}\rme^{tL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{\psi}_{\overline{\mathcal{O}}}(0). \qquad (43)

Replacing $L^{\mathcal{O}\mathcal{O}}$ with $M$ and $-L^{\mathcal{O}\overline{\mathcal{O}}}\rme^{sL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}L^{\overline{\mathcal{O}}\mathcal{O}}$ with $K_{\mathrm{mem}}(s)$, we have

\frac{\rmd}{\rmd t}\bm{g}_{\mathcal{O}}(t)=M\bm{g}_{\mathcal{O}}(t)-\int_{0}^{t}K_{\mathrm{mem}}(t-s)\bm{g}_{\mathcal{O}}(s)\,\rmd s+\bm{f}(t), \qquad (44)

where $\bm{f}(t)=L^{\mathcal{O}\overline{\mathcal{O}}}\rme^{tL^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{\psi}_{\overline{\mathcal{O}}}(0)$. $M$ is called the Markov transition matrix, which quantifies the interactions among the observables $\bm{g}_{\mathcal{O}}(t)$. $K_{\mathrm{mem}}(s)$ is called the memory kernel and depends on the history of the observed quantities; (43) indicates that the memory effect stems from the path through the unobserved quantities. $\bm{f}(t)$ is referred to as noise since it depends on the initial values of the unobserved quantities $\bm{\psi}_{\overline{\mathcal{O}}}(0)$; there is no information about their explicit values, and hence we cannot predict them.
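For a finite-dimensional linear system, the three terms of (44) can be computed explicitly with matrix exponentials. The sketch below partitions a matrix $L$ into the four blocks of (41) and evaluates $M$, $K_{\mathrm{mem}}(s)$, and $\bm{f}(t)$; the block sizes, the random stable matrix, and the initial unobserved values are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

n_obs, n_unobs = 2, 6
n = n_obs + n_unobs
L = rng.standard_normal((n, n)) - 2.0 * np.eye(n)  # shifted to be stable

L_oo = L[:n_obs, :n_obs]   # L^{OO}
L_ou = L[:n_obs, n_obs:]   # L^{O Obar}
L_uo = L[n_obs:, :n_obs]   # L^{Obar O}
L_uu = L[n_obs:, n_obs:]   # L^{Obar Obar}

M = L_oo                                          # Markov transition matrix
K_mem = lambda s: -L_ou @ expm(s * L_uu) @ L_uo   # memory kernel in (44)
f = lambda t, psi0: L_ou @ expm(t * L_uu) @ psi0  # "noise" term in (44)

psi0 = rng.standard_normal(n_unobs)  # unknown initial unobserved values
print(M.shape, K_mem(0.3).shape, f(0.3, psi0).shape)  # (2, 2) (2, 2) (2,)
```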

3.2 Feature of deterministic systems

Note that (44) is an equation in the function space $\mathcal{F}$. It is crucial to distinguish the function space $\mathcal{F}$ from the state space $\mathcal{M}$; an element $x_{d}$ of $\bm{g}_{\mathcal{O}}(t)$ differs, in general, from the element $\underline{x_{d}}$ of the state vector $\underline{\bm{x}}$. However, deterministic systems have a unique feature: one can interpret (44) as an equation for the state vector.

First, we focus on the time-evolution equation in (26); the final expression $\varphi_{d}(\underline{\bm{x}_{\mathrm{ini}}},t+\Delta t_{\mathrm{obs}})$ yields the expectation value. Although the expectation value $\mathbb{E}[x_{d}]$ is not the state $\underline{x_{d}}$ in general, in the absence of the noise term an initial Dirac-delta-type density function remains unchanged in shape under the time evolution. That is,

\rme^{\mathcal{L}^{\dagger}\Delta t_{\mathrm{obs}}}\delta(\underline{\bm{x}}-\underline{\bm{x}(t)})=\delta(\underline{\bm{x}}-\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})}). \qquad (45)

Then, the expectation value is equal to the corresponding state:

\mathbb{E}[x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})\,|\,\underline{\bm{x}(t)}=\underline{\bm{x}_{\mathrm{ini}}}]=\underline{x_{d}(t+\Delta t_{\mathrm{obs}})}. \qquad (46)

This fact enables us to equate the observable function $x_{d}$ with the state $\underline{x_{d}}$.

Second, we introduce a different perspective on the time-evolution equation. In deterministic systems, $B(\bm{x})=0$, i.e., there is no noise. Hence, the second term on the right-hand side of (27) is absent, and the following operator yields the time evolution in the function space:

\mathcal{L}=\sum_{d}a_{d}(\bm{x})\frac{\partial}{\partial x_{d}}. \qquad (47)

When we consider the time evolution of the observable function $x_{d}$, the action of $\mathcal{L}$ leads to

\mathcal{L}x_{d}=a_{d}(\bm{x}), \qquad (48)

and then, we have

\frac{\rmd}{\rmd t}x_{d}=a_{d}(\bm{x}). \qquad (49)

Note that (49) has the same form as the original time-evolution equation in the state space:

\frac{\rmd}{\rmd t}\underline{x_{d}}=a_{d}(\underline{\bm{x}}). \qquad (50)

This fact also allows us to equate the time evolution of $x_{d}$ in the function space $\mathcal{F}$ with that of $\underline{x_{d}}$ in the state space $\mathcal{M}$.

In short, although we should distinguish the function space from the state space, this feature of deterministic systems enables us to regard the derived equation (44) in the function space as one in the state space. This feature justifies the coarse-grained approach based on the Mori-Zwanzig formalism; we can construct time-evolution equations for a smaller set of measurable macroscopic variables, and the other, unmeasured degrees of freedom play the role of noise.

4 Partial observations in stochastic systems

4.1 Discussion with time-independent basis functions

As discussed in Sec. 3, the derived equation (44) lives in the function space. Note that the function at time $t+\Delta t_{\mathrm{obs}}$ yields the expectation values in (26). Due to the presence of the noise term, even if the initial density function is a Dirac delta function, the values of $\mathbb{E}[x_{d}(\underline{\bm{x}(t+\Delta t_{\mathrm{obs}})})]$ and $\underline{x_{d}(t+\Delta t_{\mathrm{obs}})}$ differ because the density function spreads out. Hence, the generalized Langevin equation (44) is not directly related to the state vector $\underline{\bm{x}}$.

Here, we discuss the generalized Langevin equation in the function space. Although the time-dependent basis of Sec. 3 is beneficial due to its connection with the state vector, it is preferable to fix the basis functions in the discussion below. Hence, we employ the following basis expansion:

\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}(t)x_{d}(\underline{\bm{x}})+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}(t)\psi_{i}(\underline{\bm{x}}), \qquad (51)

where the expansion coefficients $\{c^{\phi,\mathcal{O}}_{d}(t)\}$ and $\{c^{\phi,\overline{\mathcal{O}}}_{i}(t)\}$ are time-dependent, which differs from (32).

Then, we derive the time-evolution equations for the expansion coefficients $\{c^{\phi,\mathcal{O}}_{d}(t)\}$ and $\{c^{\phi,\overline{\mathcal{O}}}_{i}(t)\}$. In the derivation, we employ bra-ket notation for notational simplicity. Note that the notation is essentially the same as in the Doi-Peliti method [35, 36, 37]; see the discussions in [38, 39, 40].

In the function space, we introduce the following ket states $\{|x_{d}\rangle\}$ and $\{|\psi_{i}\rangle\}$:

|x_{d}\rangle=x_{d}\qquad\textrm{for }d=1,\dots,D, \qquad (52)
|\psi_{i}\rangle=\psi_{i}\qquad\textrm{for }i=1,\dots,N_{\mathrm{dic}}^{\overline{\mathcal{O}}}, \qquad (53)

where we omit the argument $\underline{\bm{x}}$ of the functions. By contrast, the bra states $\{\langle x_{d}|\}$ and $\{\langle\psi_{i}|\}$ are defined as follows:

\langle x_{d}|=\int\prod_{d^{\prime}=1}^{D}\rmd x_{d^{\prime}}\,\delta(x_{d^{\prime}})\left(\frac{\partial}{\partial x_{d}}\right)(\cdot), \qquad (54)
\langle\psi_{i}|=\int\rmd\bm{x}\,\rho(\bm{x})\psi_{i}(\bm{x})(\cdot), \qquad (55)

where $\rho(\bm{x})$ is a suitable measure yielding $L^{2}$ functions. From (52) and (54), we have the following orthonormality relation:

\langle x_{d}|x_{d^{\prime}}\rangle=\int\rmd x_{d}\,\delta(x_{d})\left(\frac{\partial}{\partial x_{d}}\right)x_{d^{\prime}}=\delta_{d,d^{\prime}}, \qquad (56)

where $\delta_{d,d^{\prime}}$ is the Kronecker delta.

It is easier to explain with a concrete example; hence, we assume here that there are only two observable variables, $\mathcal{O}=\{1,2\}$. Using the above bra-ket notation, (51) is rewritten as

\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}(t)|x_{d}\rangle+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}(t)|\psi_{i}\rangle. \qquad (57)

The time-derivative of (57) leads to

\frac{\rmd}{\rmd t}\phi(\underline{\bm{x}},t)=\sum_{d\in\mathcal{O}}\frac{\rmd}{\rmd t}\left(c^{\phi,\mathcal{O}}_{d}(t)\right)|x_{d}\rangle+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\frac{\rmd}{\rmd t}\left(c^{\phi,\overline{\mathcal{O}}}_{i}(t)\right)|\psi_{i}\rangle. \qquad (58)

Next, we act with the bra states on all terms from the left. Similar to the discussion in Sec. 3.1, we have the following time-evolution equation:

Z\frac{\rmd}{\rmd t}\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right]=\left[\begin{array}{cc}\widetilde{L}^{\mathcal{O}\mathcal{O}}&\widetilde{L}^{\mathcal{O}\overline{\mathcal{O}}}\\ \widetilde{L}^{\overline{\mathcal{O}}\mathcal{O}}&\widetilde{L}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right], \qquad (65)

where we define

Z=\left[\begin{array}{cccccc}1&0&\langle x_{1}|\psi_{1}\rangle&\langle x_{1}|\psi_{2}\rangle&\cdots&\langle x_{1}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\\ 0&1&\langle x_{2}|\psi_{1}\rangle&\langle x_{2}|\psi_{2}\rangle&\cdots&\langle x_{2}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\\ \langle\psi_{1}|x_{1}\rangle&\langle\psi_{1}|x_{2}\rangle&\langle\psi_{1}|\psi_{1}\rangle&\langle\psi_{1}|\psi_{2}\rangle&\cdots&\langle\psi_{1}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ \langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|x_{1}\rangle&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|x_{2}\rangle&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|\psi_{1}\rangle&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|\psi_{2}\rangle&\cdots&\langle\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}|\psi_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\rangle\end{array}\right] \qquad (71)

and

\langle x_{d}|\mathcal{L}|x_{d^{\prime}}\rangle=\widetilde{L}^{\mathcal{O}\mathcal{O}}_{dd^{\prime}},\quad\langle x_{d}|\mathcal{L}|\psi_{i}\rangle=\widetilde{L}^{\mathcal{O}\overline{\mathcal{O}}}_{di},\quad\langle\psi_{i}|\mathcal{L}|x_{d^{\prime}}\rangle=\widetilde{L}^{\overline{\mathcal{O}}\mathcal{O}}_{id^{\prime}},\quad\langle\psi_{i}|\mathcal{L}|\psi_{i^{\prime}}\rangle=\widetilde{L}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}_{ii^{\prime}}. \qquad (72)

Although the matrix $Z$ is not diagonal in general, the following assumption on the dictionary functions $\{|\psi_{i}\rangle\}$ simplifies the discussion: the functions $\{|\psi_{i}\rangle\}$ consist of monomial basis functions. Then, an element of $\{|\psi_{i}\rangle\}$ has the following form:

|\bm{n}\rangle\equiv\prod_{d=1}^{D}x_{d}^{n_{d}}, \qquad (73)

where $\bm{n}=\{n_{1},n_{2},\dots,n_{D}\}$. The corresponding bra state is defined as

\langle\bm{n}|\equiv\int\prod_{d=1}^{D}\rmd x_{d}\,\delta(x_{d})\left(\frac{\partial}{\partial x_{d}}\right)^{n_{d}}(\cdot). \qquad (74)

Then, we have the following orthogonality relation:

\langle\bm{n}|\bm{m}\rangle=\int\prod_{d=1}^{D}\rmd x_{d}\,\delta(x_{d})\left(\frac{\partial}{\partial x_{d}}\right)^{n_{d}}x_{d}^{m_{d}}=\prod_{d=1}^{D}n_{d}!\,\delta_{n_{d},m_{d}}. \qquad (75)
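As a quick sanity check of (75), note that in one dimension $\langle n|m\rangle$ reduces to the $n$-th derivative of $x^{m}$ evaluated at $x=0$, which equals $n!\,\delta_{n,m}$; the multi-dimensional case factorizes over the coordinates. A minimal symbolic verification:

```python
import sympy as sp

x = sp.symbols('x')
for n in range(4):
    for m in range(4):
        # <n|m> in 1D: integrate delta(x) * d^n/dx^n x^m, i.e., the
        # n-th derivative of x^m evaluated at x = 0.
        bracket = sp.diff(x**m, x, n).subs(x, 0)
        expected = sp.factorial(n) if n == m else 0
        assert bracket == expected
print("<n|m> = n! delta_{n,m} verified for n, m = 0, ..., 3")
```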

Note that the monomial functions for the observable states,

|\{1,0,\dots,0\}\rangle=|x_{1}\rangle\quad\textrm{and}\quad|\{0,1,\dots,0\}\rangle=|x_{2}\rangle, \qquad (76)

should not be included in $\{|\psi_{i}\rangle\}$. Under the above assumption on the dictionary functions, the matrix $Z$ in (71) becomes

Z=\left[\begin{array}{cccccc}1&0&0&0&\cdots&0\\ 0&1&0&0&\cdots&0\\ 0&0&z_{1}&0&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&0&\cdots&z_{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}\end{array}\right], \qquad (82)

where we denote the diagonal entries by $\{z_{i}\,|\,i=1,\dots,N_{\mathrm{dic}}^{\overline{\mathcal{O}}}\}$. Since the matrix $Z$ is diagonal, $Z^{-1}$ is also diagonal, which leads to

\frac{\rmd}{\rmd t}\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right]=Z^{-1}\left[\begin{array}{cc}\widetilde{L}^{\mathcal{O}\mathcal{O}}&\widetilde{L}^{\mathcal{O}\overline{\mathcal{O}}}\\ \widetilde{L}^{\overline{\mathcal{O}}\mathcal{O}}&\widetilde{L}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right] \qquad (89)
=\left[\begin{array}{cc}\widetilde{L}^{\mathcal{O}\mathcal{O}}&\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\\ \widetilde{L^{\prime}}^{\overline{\mathcal{O}}\mathcal{O}}&\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}\end{array}\right]\left[\begin{array}{c}\bm{c}^{\phi,\mathcal{O}}(t)\\ \bm{c}^{\phi,\overline{\mathcal{O}}}(t)\end{array}\right]. \qquad (94)

Note that multiplication by the diagonal matrix $Z^{-1}$ does not change the relationship between observed and unobserved functions. Since (94) has the same form as (41), we finally obtain the following equation, similar to (44):

\frac{\rmd}{\rmd t}\bm{c}^{\phi,\mathcal{O}}(t)=\widetilde{L}^{\mathcal{O}\mathcal{O}}\bm{c}^{\phi,\mathcal{O}}(t)+\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\int_{0}^{t}\rme^{(t-s)\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\mathcal{O}}\bm{c}^{\phi,\mathcal{O}}(s)\,\rmd s+\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\rme^{t\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\bm{c}^{\phi,\overline{\mathcal{O}}}(0). \qquad (95)

Although the form of (95) is similar to that of (44), there are differences. First, (95) is a vector equation for the coefficients of a single function $\phi$. Second, the initial values of the coefficients $\bm{c}^{\phi,\mathcal{O}}(0)$ are one-hot because we focus only on the observable functions $\{x_{d}\,|\,d\in\mathcal{O}\}$. For example, assume that we focus on $x_{1}$. Then we have the following equation for $\phi=x_{1}$ at the initial time $t=0$:

x_{1}=\phi(\underline{\bm{x}},0)=\sum_{d\in\mathcal{O}}c^{\phi,\mathcal{O}}_{d}(0)|x_{d}\rangle+\sum_{i=1}^{N_{\mathrm{dic}}^{\overline{\mathcal{O}}}}c^{\phi,\overline{\mathcal{O}}}_{i}(0)|\psi_{i}\rangle, \qquad (96)

which indicates that only the first element, $c^{\phi,\mathcal{O}}_{1}(0)$, is $1$, and the other coefficients are zero because the left-hand side is the function $x_{1}$. Then, in contrast to the case of time-dependent basis functions, the third term on the right-hand side of (95), i.e., the noise term, vanishes:

\frac{\rmd}{\rmd t}\bm{c}^{\phi,\mathcal{O}}(t)=\widetilde{M}\bm{c}^{\phi,\mathcal{O}}(t)-\int_{0}^{t}\widetilde{K}_{\mathrm{mem}}(t-s)\bm{c}^{\phi,\mathcal{O}}(s)\,\rmd s, \qquad (97)

where $\widetilde{M}\equiv\widetilde{L}^{\mathcal{O}\mathcal{O}}$ and $\widetilde{K}_{\mathrm{mem}}(s)\equiv-\widetilde{L^{\prime}}^{\mathcal{O}\overline{\mathcal{O}}}\rme^{s\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\overline{\mathcal{O}}}}\widetilde{L^{\prime}}^{\overline{\mathcal{O}}\mathcal{O}}$. Note that, in contrast to (44), it is not immediately clear that the use of time-delayed states is beneficial; the second term on the right-hand side of (97) involves the expansion coefficients, which do not directly correspond to delayed states.

4.2 Effectiveness of delay embedding

As mentioned above, it is crucial to distinguish the state space from the function space. The discussions in Secs. 3.1 and 4.1 are based on the Koopman operator approach, which focuses on the function space. The discussion in Sec. 3.2 focuses on the feature of deterministic systems that connects the state space with the function space.

It is well known that delay embedding is effective in deterministic systems [24, 25, 33, 41, 42]. The effectiveness of delay embedding can be understood from the viewpoint of the Takens embedding theorem [43]; see also the discussions in [18]. However, in stochastic systems, the discussion based on the Takens theorem does not apply straightforwardly because we must consider the function space. Here, we discuss the effectiveness of delay embedding in stochastic systems.

As discussed in Sec. 2, the introduction of the dictionary enables us to deal with nonlinearity within the Koopman operator approach. However, for monomial basis functions, the use of higher-order basis functions suffers from the so-called curse of dimensionality, which exponentially increases the size of the dictionary. In the discussion in Sec. 4.1, the basis functions affect the second term on the right-hand sides of (95) and (97). The approximation by a finite dictionary can therefore significantly reduce the accuracy.

Here, note that the Koopman operator approach leads to

x_{d}(t)=\mathcal{K}_{\Delta t_{\mathrm{obs}}}x_{d}(t-\Delta t_{\mathrm{obs}})=\rme^{\mathcal{L}\Delta t_{\mathrm{obs}}}x_{d}(t-\Delta t_{\mathrm{obs}}) \qquad (98)

for an observable function $x_{d}$. Then, we rewrite (98) formally as follows:

x_{d}(t-\Delta t_{\mathrm{obs}})=\rme^{-\mathcal{L}\Delta t_{\mathrm{obs}}}x_{d}(t)=\sum_{k=0}^{\infty}\frac{(-\Delta t_{\mathrm{obs}})^{k}}{k!}\left(\sum_{d^{\prime}=1}^{D}a_{d^{\prime}}(\bm{x})\frac{\partial}{\partial x_{d^{\prime}}}+\frac{1}{2}\sum_{i,j}\left[B(\bm{x})B(\bm{x})^{\top}\right]_{ij}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}\right)^{k}x_{d}(t). \qquad (99)

Since $a_{d}(\bm{x})$ and $B(\bm{x})$ contain various functions, the exponential in (99) yields various basis functions that are not included in a finite dictionary. The delayed observables are therefore suitable for representing the subspace spanned by the time-evolution operator, and we can expect the delay embedding to work well. In the next section, we demonstrate the effectiveness of the delay embedding; empirical insights into the characteristics of the noise effects are also discussed.
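The growth of the basis in (99) can be made explicit symbolically. The following sketch applies the generator $\mathcal{L}$ of the noisy van der Pol system (introduced in Sec. 5.1 below) repeatedly to the observable $x_{1}$ and reports the polynomial degree after each application; the parameter values are placeholders.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
mu, sigma = 1, sp.Rational(1, 2)

a = [x2, mu * (1 - x1**2) * x2 - x1]   # van der Pol drift

def L(f):
    """Generator (27) with additive noise B = sigma * I."""
    drift = a[0] * sp.diff(f, x1) + a[1] * sp.diff(f, x2)
    diffusion = sp.Rational(1, 2) * sigma**2 * (sp.diff(f, x1, 2)
                                                + sp.diff(f, x2, 2))
    return sp.expand(drift + diffusion)

f = x1
for k in range(1, 5):
    f = L(f)
    # The cubic drift raises the total degree by up to two per application,
    # so e^{-L dt} x_1 involves monomials of ever higher degree.
    print(k, sp.Poly(f, x1, x2).total_degree())   # 1, 3, 5, 7
```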

5 Numerical experiments on the noise effects

5.1 Two examples

To investigate how stochasticity affects partial observation, we perform numerical experiments. In [18], a data-driven approach enables us to obtain the Markov transition matrix and the memory kernel in (44) for deterministic cases, and the relationship between the memory kernel and the delay embedding is described. In stochastic cases, as discussed above, it is not straightforward to connect the memory kernel in (97) with the delay embedding. Hence, we investigate the effects of the delay embedding with the data-driven approach based on the EDMD algorithm of Sec. 2.3.

For deterministic systems, it is common that there are many variables and we observe only a few of them. For the stochastic systems considered here, we assume that the noise term includes most of the unobserved effects. As discussed below, it is necessary to consider higher-order dictionary functions to evaluate higher-order statistics in stochastic systems, which leads to high computational costs. Hence, as demonstrations, we examine the following two examples and the effects of partial observation.

The first one is the van der Pol system [44] with additive noise:

\rmd\underline{X_{1}}=\underline{X_{2}}\,\rmd t+\sigma_{1}\,\rmd\underline{W_{1}(t)}, \qquad (100)
\rmd\underline{X_{2}}=\left(\mu\left(1-\underline{X_{1}^{2}}\right)\underline{X_{2}}-\underline{X_{1}}\right)\rmd t+\sigma_{2}\,\rmd\underline{W_{2}(t)}. \qquad (101)

The second one is the Lorenz system [45] with additive noise:

\rmd\underline{X_{1}}=\nu\left(\underline{X_{2}}-\underline{X_{1}}\right)\rmd t+\sigma_{1}\,\rmd\underline{W_{1}(t)}, \qquad (102)
\rmd\underline{X_{2}}=\left(\underline{X_{1}}\left(\rho-\underline{X_{3}}\right)-\underline{X_{2}}\right)\rmd t+\sigma_{2}\,\rmd\underline{W_{2}(t)}, \qquad (103)
\rmd\underline{X_{3}}=\left(\underline{X_{1}}\,\underline{X_{2}}-\beta\underline{X_{3}}\right)\rmd t+\sigma_{3}\,\rmd\underline{W_{3}(t)}. \qquad (104)

In all the numerical experiments, we assume that the noise amplitude for each variable is common: $\sigma\equiv\sigma_{1}=\sigma_{2}$ for the van der Pol system and $\sigma\equiv\sigma_{1}=\sigma_{2}=\sigma_{3}$ for the Lorenz system. The system parameters are set to $\mu=1$ for the van der Pol system, and $\nu=10$, $\beta=8/3$, and $\rho=28$ or $13$ for the Lorenz system; we tried two values of the parameter $\rho$, as discussed below.
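A sketch of the trajectory generation used below: the Euler-Maruyama scheme for (100)-(101) with the inner step $\Delta t=10^{-3}$ and observations every $\Delta t_{\mathrm{obs}}=0.1$. Details such as the initial condition and the trajectory length here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

mu, sigma = 1.0, 0.5
dt, dt_obs, n_obs = 1e-3, 0.1, 200
stride = int(dt_obs / dt)

x = np.array([0.5, 0.0])  # illustrative initial condition
obs = [x.copy()]
for _ in range(n_obs):
    for _ in range(stride):  # Euler-Maruyama steps for (100)-(101)
        dW = np.sqrt(dt) * rng.standard_normal(2)
        drift = np.array([x[1], mu * (1.0 - x[0]**2) * x[1] - x[0]])
        x = x + drift * dt + sigma * dW
    obs.append(x.copy())  # keep only the coarser observation grid
obs = np.array(obs)

# Snapshot pairs for the EDMD algorithm of Sec. 2.3.
X, Y = obs[:-1], obs[1:]
print(X.shape, Y.shape)  # (200, 2) (200, 2)
```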

Figure 2: Examples of sampled trajectories. (a) The van der Pol system with noise amplitude $\sigma=0.5$. (b) and (c) The Lorenz system with noise amplitude $\sigma=5$; we set $\rho=28$ for (b) and $\rho=13$ for (c). The markers indicate the observed data used in the EDMD algorithm. The intervals of the snapshot pairs are $\Delta t_{\mathrm{obs}}=0.1$ for (a) and $\Delta t_{\mathrm{obs}}=0.01$ for (b) and (c).

Figure 2 shows examples of sampled trajectories, generated with the Euler-Maruyama approximation [46]; $\Delta t=10^{-3}$ for the van der Pol system and $\Delta t=10^{-4}$ for the Lorenz system. Figure 2(a) corresponds to the van der Pol system with noise amplitude $\sigma=0.5$; we see noisy behavior along the limit cycle. Figures 2(b) and 2(c) correspond to the Lorenz system with noise amplitude $\sigma=5$; the difference between them lies in the system parameter $\rho$: $\rho=28$ in Fig. 2(b) and $\rho=13$ in Fig. 2(c). Although the Monte Carlo simulations are performed with the small time interval $\Delta t$, we construct the snapshot pairs from the generated data with larger observation intervals; here, we set $\Delta t_{\mathrm{obs}}=0.1$ for the van der Pol system and $\Delta t_{\mathrm{obs}}=0.01$ for the Lorenz system. The markers in Fig. 2 are the observed data points.

5.2 Evaluation of approximate values of exact solutions

In deterministic systems, it is easy to obtain numerically exact solutions, which are necessary to discuss the accuracy under partial observation; numerical time integration, such as the Runge-Kutta method, yields a good approximation of the exact solutions. By contrast, the noise in a stochastic system prevents such direct numerical time integration. Although the Monte Carlo algorithm based on the Euler-Maruyama approximation is available for time integration, we would have to generate many sample trajectories to estimate the exact statistics. Since it is computationally impractical to obtain high-accuracy solutions with the Monte Carlo method, we directly derive the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ for the stochastic system. The numerical procedure is essentially the same as in [39, 40, 47]; we explain it briefly in the context of the present paper.

For example, consider a three-dimensional case such as the Lorenz system. Then, we define the dictionary of Sec. 2.3 as follows:

\bm{\psi}=[1,\;x_{1},\;x_{2},\;x_{3},\;x_{1}^{2},\;x_{1}x_{2},\;x_{1}x_{3},\;x_{2}^{2},\;x_{2}x_{3},\;x_{3}^{2},\;\dots]^{\top}. \qquad (105)

Note that (94) yields the expansion coefficients for the function $\phi$. Hence, when we can observe all variables, i.e., $\mathcal{O}=\{1,2,3\}$, we have the following expansion:

x_{1}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})=c_{1}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})1(\underline{\bm{x}})+c_{2}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})x_{1}(\underline{\bm{x}})+c_{3}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})x_{2}(\underline{\bm{x}})+c_{4}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})x_{3}(\underline{\bm{x}})+\sum_{i\geq 5}c_{i}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})\psi_{i}(\underline{\bm{x}}), \qquad (106)

where $\{c_{d}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})\}$ and $\{c_{i}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})\}$ are obtained via the time integration of (94) with $\phi=x_{1}$. Next, we perform the time integration of (94) with $\phi=x_{2}$, $\phi=x_{3}$, and so on. Here, note that the action of $K_{\Delta t_{\mathrm{obs}}}$ on $\bm{\psi}(\underline{\bm{x}},t)$ yields $\bm{\psi}(\underline{\bm{x}},t+\Delta t_{\mathrm{obs}})$; this indicates that the linear combination in (106) is directly related to the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$. That is, the elements of the second row of the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ correspond to the expansion coefficients $\{c_{d}^{x_{1},\mathcal{O}}(t+\Delta t_{\mathrm{obs}})\}$ and $\{c_{i}^{x_{1},\overline{\mathcal{O}}}(t+\Delta t_{\mathrm{obs}})\}$; the third and fourth rows are for $x_{2}$ and $x_{3}$, respectively. Repeating this procedure, it is possible to evaluate the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ numerically.

Here, we employ the Crank-Nicolson method to solve (94); the discrete time interval for the numerical time integration is set to $\Delta t=10^{-3}$ for the van der Pol system and $\Delta t=10^{-4}$ for the Lorenz system. As noted above, the observation time interval for the snapshot pairs is $\Delta t_{\mathrm{obs}}=0.1$ for the van der Pol system and $\Delta t_{\mathrm{obs}}=0.01$ for the Lorenz system. The maximum degrees of the dictionary are $8$ for the van der Pol system and $6$ for the Lorenz system.

If there is no noise, we can compare the numerical results obtained with the evaluated Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ against conventional numerical time integration based on the fourth-order Runge-Kutta method. For randomly selected initial coordinates ($[-1,+1]^{2}$ for the van der Pol system and $[-10,+10]^{3}$ for the Lorenz system), the mean absolute errors over $1000$ data points are less than $10^{-7}$. From these results, we confirmed that the finite dictionaries with the specified maximum degrees yield sufficient accuracy. Hence, we employ the Koopman matrix $K_{\Delta t_{\mathrm{obs}}}$ evaluated from (94) as the true solution.
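The following sketch illustrates the construction of such a reference Koopman matrix for the noisy van der Pol system: the generator (27) is represented on a truncated monomial dictionary with sympy, and the matrix exponential (31) is used in place of the paper's Crank-Nicolson integration; the maximum degree is reduced from $8$ for brevity, and truncating monomials beyond that degree is exactly the finite-dictionary approximation discussed in Sec. 4.2.

```python
import itertools
import numpy as np
import sympy as sp
from scipy.linalg import expm

x1, x2 = sp.symbols('x1 x2')
mu, sigma, dt_obs, max_deg = 1.0, 0.5, 0.1, 4

# Monomial dictionary up to max_deg.
exps = [e for d in range(max_deg + 1)
        for e in itertools.product(range(d + 1), repeat=2) if sum(e) == d]
basis = [x1**i * x2**j for i, j in exps]
index = {e: k for k, e in enumerate(exps)}

a = [x2, mu * (1 - x1**2) * x2 - x1]

def L(f):
    """Generator (27) with B = sigma * I."""
    return sp.expand(a[0] * sp.diff(f, x1) + a[1] * sp.diff(f, x2)
                     + sigma**2 / 2 * (sp.diff(f, x1, 2) + sp.diff(f, x2, 2)))

# Matrix of L on the dictionary; row k expands L psi_k, matching (16).
n = len(basis)
L_mat = np.zeros((n, n))
for k, f in enumerate(basis):
    Lf = L(f)
    if Lf == 0:
        continue
    poly = sp.Poly(Lf, x1, x2)
    for e, c in zip(poly.monoms(), poly.coeffs()):
        if e in index:           # monomials beyond max_deg are truncated
            L_mat[k, index[e]] = float(c)

K = expm(L_mat * dt_obs)  # reference Koopman matrix via (31)
print(K.shape)
```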

5.3 Settings for the EDMD algorithm for partial observation

To investigate the effects of partial observation, we employ the EDMD algorithm with delay embedding. The EDMD algorithm is data-driven, and we generate the dataset as follows:

  1. Randomly generate initial coordinates. (For the van der Pol system, a uniform density on [-1,+1]^{2} was used; for the Lorenz system, a uniform density on [-10,+10]^{3}.)

  2. Simulate the time integration via the Monte Carlo method with the Euler-Maruyama approximation. The time intervals for time evolution and observation are the same as those in Sec. 5.1.

  3. After a relaxation time of 100\times\Delta t_{\mathrm{obs}}, take 101+M_{\mathrm{max}} data points, where M_{\mathrm{max}} is the maximum number of delay embeddings.

  4. Repeat the above procedure 100 times.

We prepare 10,000 snapshot pairs in total and construct the Koopman matrix from this dataset. Although data points for all coordinates are generated, we consider partial observation with only one variable; a minimal simulation sketch follows.
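The sketch below mirrors this data-generation loop for the noisy van der Pol system. For simplicity, additive noise is applied to both components here, which is an assumption of the sketch rather than the paper's exact setting, and all parameter names are illustrative.

```python
import numpy as np

def vdp_drift(x, mu=1.0):
    """Drift of the van der Pol system."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])

def generate_trajectories(sigma=0.5, dt=1e-3, dt_obs=0.1,
                          n_traj=100, n_keep=101, m_max=8, seed=0):
    """Euler-Maruyama sampling following the recipe in the text.

    Returns an array of shape (n_traj, n_keep + m_max, 2): for each
    trajectory, 100 observation intervals of relaxation are discarded
    and 101 + M_max points are kept.
    """
    rng = np.random.default_rng(seed)
    n_sub = int(round(dt_obs / dt))  # EM steps per observation interval
    data = np.empty((n_traj, n_keep + m_max, 2))
    for r in range(n_traj):
        x = rng.uniform(-1.0, 1.0, size=2)
        for _ in range(100 * n_sub):  # relaxation
            x = x + vdp_drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
        for k in range(n_keep + m_max):
            data[r, k] = x
            for _ in range(n_sub):
                x = x + vdp_drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
    return data
```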

We should comment on the dictionary in the EDMD algorithm. In deterministic systems, it is common to use a naive delay embedding of the past coordinates, such as x_{d}(t-\Delta t_{\mathrm{obs}}). Although this setting is sufficient to evaluate the coordinate values themselves, higher-order basis functions are needed to evaluate higher-order statistics. That is, \mathbb{E}[\underline{X_{d}^{2}}]\neq(\mathbb{E}[\underline{X_{d}}])^{2} in stochastic systems, and the evaluation of \mathbb{E}[\underline{X_{d}^{2}}] requires monomial functions whose maximum degree is at least two.
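The point is elementary but easy to verify numerically; a toy example with Gaussian samples:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.5, size=100_000)  # toy stochastic state
print(np.mean(x) ** 2)   # ~1.00 : (E[X])^2
print(np.mean(x ** 2))   # ~1.25 : E[X^2] = (E[X])^2 + Var[X]
```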

To combine the higher-order basis functions with the delay embedding, we rewrite the variables \underline{\bm{x}} in Sec. 2 as \underline{\bm{z}}. When we observe the d-th variable, we set

\left[\begin{array}{c} \underline{z_{1}(t)} \\ \underline{z_{2}(t)} \\ \vdots \\ \underline{z_{M+1}(t)} \end{array}\right] = \left[\begin{array}{c} \underline{x_{d}(t)} \\ \underline{x_{d}(t-\Delta t_{\mathrm{obs}})} \\ \vdots \\ \underline{x_{d}(t-M\Delta t_{\mathrm{obs}})} \end{array}\right], (115)

where M is the number of delay embeddings. Then, we introduce higher-order monomial basis functions for \{z_{i}(t)\} to evaluate higher-order statistics. Since we focus on the mean values and the variances, the maximum degree of the dictionary functions for \{z_{i}(t)\} is set to two, which leads to the following dictionary \bm{\psi}:

\bm{\psi} = [1, z_{1}, \dots, z_{M+1}, z_{1}^{2}, z_{1}z_{2}, z_{1}z_{3}, \dots, z_{M}z_{M+1}, z_{M+1}^{2}]
= [1, x_{d}(t), \dots, x_{d}(t-M\Delta t_{\mathrm{obs}}), x_{d}(t)^{2}, x_{d}(t)x_{d}(t-\Delta t_{\mathrm{obs}}), x_{d}(t)x_{d}(t-2\Delta t_{\mathrm{obs}}), \dots, x_{d}(t-(M-1)\Delta t_{\mathrm{obs}})x_{d}(t-M\Delta t_{\mathrm{obs}}), x_{d}(t-M\Delta t_{\mathrm{obs}})^{2}], (116)

where x_{d}(t-m\Delta t_{\mathrm{obs}}) denotes the function that yields the coordinate value m\Delta t_{\mathrm{obs}} in the past:

x_{d}(t-m\Delta t_{\mathrm{obs}})(\underline{\bm{z}(t)}) = \underline{x_{d}(t-m\Delta t_{\mathrm{obs}})}. (117)
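A sketch of the dictionary construction (116) from a scalar time series of the observed variable; the helper name and layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import combinations_with_replacement

def delay_dictionary(x_d, M):
    """Evaluate the dictionary (116) on a scalar time series.

    `x_d` has shape (T,).  Row t of the returned array stacks
    [1, z_1, ..., z_{M+1}, all degree-2 monomials z_i z_j], where
    z_i(t) = x_d(t - (i-1) dt_obs), i.e., the current value and M
    delayed copies.  The first M times are dropped because their
    history is incomplete.
    """
    T = len(x_d)
    # z[:, i] = x_d shifted back by i observation steps
    z = np.stack([x_d[M - i : T - i] for i in range(M + 1)], axis=1)
    cols = [np.ones(len(z))]                    # constant function
    cols += [z[:, i] for i in range(M + 1)]     # first-order terms
    cols += [z[:, i] * z[:, j]                  # second-order terms
             for i, j in combinations_with_replacement(range(M + 1), 2)]
    return np.stack(cols, axis=1)
```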

5.4 Comparison with higher-order dictionary functions and delay embedding

Figure 3: Accuracy of the statistics estimated by the Koopman matrix under partial observation in the van der Pol system with the noise amplitude \sigma=0.5. (a) Dependence on the choice of dictionary functions without delay embedding. (b) Dependence on the number of delay embeddings. For visibility, the results for \mathbb{E}[\underline{X_{2}}] and \mathbb{E}[\underline{X_{2}^{2}}] are drawn slightly shifted.

As discussed in Sec. 4.2, the delay embedding would improve the accuracy of the statistics in the Koopman operator approach. Here, we present the numerical results for the van der Pol system with the noise amplitude \sigma=0.5.

Figure 3 shows the accuracy in the partial observation setting, where only the second variable, \underline{x_{2}}, is measurable. In Fig. 3(a), we do not use the delay embedding, and the maximum degree of the basis functions is varied. A sudden change occurs when the maximum degree is increased from 1 to 2; this means that monomial functions of degree 2 are necessary to evaluate the second-order statistics.

Figure 3(b) shows the dependence of the accuracy on the number of delay embeddings. Note that the dictionary introduced in Sec. 5.3 estimates \mathbb{E}[\underline{X_{2}^{2}}] adequately. The numerical results show that increasing the number of embeddings improves the accuracy, although there is a certain saturation point.

Comparing Figs. 3(a) and 3(b), we confirmed that the delay embedding is more effective than the use of higher-order dictionary functions; this behavior is common to all the other cases with other noise amplitudes and to the Lorenz system. Hence, the discussion in Sec. 4.2 would be valid: the delay embeddings are effective even in the partial observation cases in stochastic systems. A minimal EDMD sketch with the delay-embedded dictionary follows.
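For reference, the EDMD estimation step on the delay-embedded dictionary could be sketched as follows; `delay_dictionary` refers to the illustrative helper after (117), and the least-squares form is the standard EDMD estimate, not code from this work.

```python
import numpy as np

def edmd_koopman(Psi_now, Psi_next):
    """Least-squares EDMD estimate of the Koopman matrix.

    `Psi_now` and `Psi_next` have shape (n_snapshots, n_dict): the
    dictionary evaluated on snapshot pairs separated by dt_obs.  The
    solution minimizes ||Psi_next - Psi_now @ K.T||_F, so that
    psi(t + dt_obs) ~ K psi(t) holds in the least-squares sense.
    """
    KT, *_ = np.linalg.lstsq(Psi_now, Psi_next, rcond=None)
    return KT.T

# Usage sketch: consecutive rows of the delay-embedded dictionary
# matrix form the snapshot pairs.
# Psi = delay_dictionary(x_series, M=8)
# K = edmd_koopman(Psi[:-1], Psi[1:])
# K[1, :] @ Psi[t]   # one-step prediction of E[X_d]
```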

5.5 Effects of higher-order dictionary functions

Figure 4: Accuracy of the first-order statistics in the partial observation settings for the Lorenz system with the noise amplitude \sigma=3. In (a) and (d), only the first variable, \underline{x_{1}}, is observable; \underline{x_{2}} in (b) and (e), and \underline{x_{3}} in (c) and (f). (a), (b), and (c) correspond to the cases with the system parameter \rho=28; (d), (e), and (f) to \rho=13.

As explained in Sec. 4.2, higher-order basis functions are necessary for the evaluation of higher-order statistics. Here, we investigate their effects on the first-order statistics, i.e., the mean values. In the numerical experiments, we show the results for the Lorenz system; the same behavior was observed for the van der Pol system. The noise amplitude for the Lorenz system is set to \sigma=3.

Figure 4 shows the accuracy of the first-order statistics in the partial observation settings. We compare two dictionaries for the delay embeddings: one consists of the second-order monomial basis functions for \{z_{i}\} in (116), and the other of the first-order monomial basis functions. Only in the case of \mathbb{E}[\underline{X_{3}}], shown in Fig. 4(c), is there a slight variation in accuracy between the different dictionaries. The reason would be as follows. In the Lorenz system, \underline{X_{3}} differs from \underline{X_{1}} and \underline{X_{2}}: (102) does not include a term related to \underline{X_{3}}, while each of the three equations includes \underline{X_{1}} and \underline{X_{2}}. Hence, the subspace spanned by the delay embedding with \underline{X_{3}} would be smaller than those with \underline{X_{1}} and \underline{X_{2}}, and the difference in the dictionary then affects the partial observation of \underline{X_{3}}. As \rho becomes smaller, the effects of noise through \underline{X_{2}} become weaker, and hence the effects of partial observation also become smaller; compared to Fig. 4(c), the difference in accuracy between the dictionaries in Fig. 4(f) is smaller. From these results, we conclude that the first-order basis functions with delay embeddings contain enough information for the time evolution in the function space. The discussion in Sec. 4.2 justifies this result: the simple delayed function contains the information of the time evolution; see (99).

Of course, note that the evaluation of second-order statistics, such as \mathbb{E}[\underline{X_{d}^{2}}], requires the second-order monomial basis functions.

5.6 Power-law behavior for the change of noise amplitude

Figure 5: Dependence of the accuracy on the noise amplitude in the Lorenz system with \rho=28. The dashed lines correspond to fits with the functional form in (118).

In general, external parameters, such as temperature, control the amplitude of the additive noise. Here, we investigate the effects of the noise amplitude. As a result, we find a power-law dependence of the accuracy on the noise amplitude.

Figure 5 depicts the dependence of the accuracy on the noise amplitude \sigma in the Lorenz system. The plotted markers correspond to the accuracy for the delay embeddings with M=8; for all cases in Fig. 4, the accuracy appears to have stopped declining by M=8. We therefore judged that the M=8 cases yield sufficiently accurate estimations within the finite dictionary functions.

Note that Fig. 5 uses log-log plots. From the plotted results, we assume the following functional form for the accuracy as a function of the noise amplitude:

f(\sigma) = \alpha_{2}\sigma^{\alpha_{1}} + \alpha_{3}. (118)

Note that even in the noiseless case, i.e., for deterministic systems, there is a difference between the true statistics and the estimated ones because of the partial observation; hence, we introduced \alpha_{3} in (118). The fitted results are also shown in Fig. 5, and the fits appear to be good.
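A minimal fitting sketch with SciPy; synthetic stand-in data are used here only so the snippet runs, whereas the actual fits use the measured accuracies behind Fig. 5.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(sigma, a1, a2, a3):
    """Fitting form (118): f(sigma) = a2 * sigma**a1 + a3."""
    return a2 * sigma ** a1 + a3

# Stand-in data mimicking accuracy-vs-noise measurements.
sigmas = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
errors = power_law(sigmas, 1.0, 0.02, 1e-3) \
    * (1 + 0.05 * np.random.default_rng(0).standard_normal(5))

# p0 is an illustrative initial guess; a3 absorbs the noiseless offset.
popt, pcov = curve_fit(power_law, sigmas, errors, p0=[1.0, 0.01, 1e-3])
alpha1, alpha1_err = popt[0], np.sqrt(pcov[0, 0])  # exponent and its std
```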

Table 1: The power-law exponent \alpha_{1} in (118) for the Lorenz system.

          | \alpha_{1} for \mathbb{E}[\underline{X_{1}}] | \alpha_{1} for \mathbb{E}[\underline{X_{2}}] | \alpha_{1} for \mathbb{E}[\underline{X_{3}}]
\rho=28   | 0.808 \pm 0.012 | 0.849 \pm 0.022 | 1.007 \pm 0.013
\rho=13   | 0.773 \pm 0.036 | 0.661 \pm 0.024 | 0.820 \pm 0.022

In Table 1, we summarize the power-law exponent \alpha_{1} in (118) for the Lorenz system. Here, we repeated the same procedure as for Fig. 5 five times and evaluated the means and standard deviations of the exponent \alpha_{1}. Although it is difficult to capture the meaning of the exponent \alpha_{1}, it would reflect the effect of the unobserved part in the partial observation. The reasons are as follows:

  • Although the EDMD algorithm can adequately deal with noise effects, partial observation causes a decrease in accuracy due to the noise effects; the decrease occurs when the noise effects are orthogonal to the space spanned by the observable functions.

  • As discussed in Sec. 5.5, \underline{X_{3}} is the most significantly affected by partial observation. Accordingly, the exponent \alpha_{1} for \mathbb{E}[\underline{X_{3}}] is the largest in Table 1. Note that the accuracy values for \mathbb{E}[\underline{X_{3}}] are the smallest in Fig. 5; the increase in the exponent due to partial observation is not directly related to the actual accuracy values.

  • For the \rho=13 case, all the exponents \alpha_{1} become smaller. The reason is that the term \rho\underline{X_{1}} in (103) becomes smaller, and the effects of the partial observation are also smaller. In particular, the effects of noise in \underline{X_{1}} are reduced directly via the smaller term \rho\underline{X_{1}}, so \underline{X_{2}} is the most significantly affected.

From the above discussion, we expect that the power-law form and its exponent reflect part of the effects of the partial observation. If the ignored term has a strong nonlinearity, the impact of the partial observation should be considerable. We therefore perform additional numerical experiments for the following modified van der Pol system, in which we replace (101) with

\rmd\underline{X_{2}} = \left(\mu\left(1-\underline{X_{1}^{2}}\right)\underline{X_{2}} - h(\underline{X_{1}})\right)\rmd t + \sigma_{2}\,\rmd\underline{W_{2}(t)}, (119)

where h(\underline{X_{1}}) is a nonlinear function. In the original van der Pol system, h(\underline{X_{1}})=\underline{X_{1}}. Since this is an odd function, we consider the two cases h(\underline{X_{1}})=\underline{X_{1}^{3}} and h(\underline{X_{1}})=\underline{X_{1}^{5}}; a drift sketch follows.
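The modification only changes the drift; reusing the illustrative Euler-Maruyama generator sketched in Sec. 5.3, the replaced drift could read as follows (function names are, again, illustrative).

```python
import numpy as np

def modified_vdp_drift(x, mu=1.0, h=lambda u: u ** 3):
    """Drift of the modified van der Pol system (119); `h` is the
    nonlinearity replacing the original h(x1) = x1.  Pass
    h=lambda u: u ** 5 for the quintic case."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - h(x[0])])
```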

Table 2: The power-law exponent \alpha_{1} in (118) for the modified van der Pol system.

                                             | \alpha_{1} for \mathbb{E}[\underline{X_{1}}] | \alpha_{1} for \mathbb{E}[\underline{X_{2}}]
h(\underline{X_{1}})=\underline{X_{1}}       | 0.489 \pm 0.027 | 0.596 \pm 0.017
h(\underline{X_{1}})=\underline{X_{1}^{3}}   | 0.567 \pm 0.030 | 0.652 \pm 0.029
h(\underline{X_{1}})=\underline{X_{1}^{5}}   | 0.794 \pm 0.012 | 1.197 \pm 0.102

Table 2 shows the numerical results. As expected from the above discussion, the stronger the nonlinearity, the larger the exponent.

6 Discussion

In this work, we focused on partial observations in stochastic systems. The Koopman operator approach is available there as well, and it is crucial to note the difference between state variables and observable functions: the Koopman operator approach yields an expected value, not the value of the state variable itself. The deterministic case is special in that the expected value coincides with the value of the state variable, i.e., \mathbb{E}[\underline{X_{d}}]=\underline{x_{d}}. Although stochasticity prevents us from using this simple connection, the delay embedding is effective even in stochastic systems. Note the difference from the deterministic case: higher-order dictionary functions for the embedded states are necessary to evaluate higher-order statistics. In addition, the numerical experiments clarified that the dependence of the accuracy on the noise amplitude exhibits a power-law behavior, and the exponent could reflect the characteristics of the partial observations.

There are some remaining tasks. First, it is not clear why the dependence of the accuracy on the noise amplitude takes the power-law form in (118). Second, the theoretical connection between the exponent \alpha_{1} in the power-law form and the information loss in partial observations should be investigated. While we performed several other numerical experiments, we have not yet found a quantitative correspondence.

As mentioned above, there has been little discussion of partial observations in stochastic systems, even though modeling based on stochastic differential equations is popular. Hence, further studies on partial observations are desirable from a practical standpoint. We believe that this study will serve as a starting point for future research.

This work was supported by JST FOREST Program (Grant Number JPMJFR216K, Japan).

References

  • [1] Zwanzig R 1961 Phys. Rev. 124 983
  • [2] Mori H 1965 Prog. Theor. Phys. 33 423
  • [3] Zwanzig R 1973 J. Stat. Phys. 9 215
  • [4] Chorin A J, Hald O H and Kupferman R 2000 Proc. Nat. Acad. Sci. 97 2968
  • [5] Chorin A J, Hald O H and Kupferman R 2002 Physica D 166 239
  • [6] Chorin A J and Stinis P 2006 Comm. Appl. Math. Comp. Sci. 1 1
  • [7] Gouasmi A, Parish E J and Duraisamy K 2017 Proc. R. Soc. A 473 20170385
  • [8] te Vrugt M, Topp L, Wittkowski R and Heuer A 2024 J. Chem. Phys. 161 094904
  • [9] Netz R R 2024 Phys. Rev. E 110 014123
  • [10] Chorin A J and Lu F 2015 Proc. Nat. Acad. Sci. 112 9804
  • [11] Meyer H, Pelagejcev P and Schilling T 2019 Europhys. Lett. 128 40001
  • [12] Maeyama S and Watanabe T H 2020 J. Phys. Soc. Jpn. 89 024401
  • [13] González D, Chinesta F and Cueto E 2021 J. Comp. Phys. 428 109982
  • [14] Tian Y, Lin Y T, Anghel M and Livescu D 2021 Phys. Fluids 33 125118
  • [15] Venturi D and Li X 2023 Res. Math. Sci. 10 23
  • [16] Gupta P, Schmid P, Sipp D, Sayadi T and Rigas G 2025 Proc. R. Soc. A 481 20240259
  • [17] Lin K K and Lu F 2021 J. Comp. Phys. 424 109864
  • [18] Lin Y T, Tian Y, Livescu D and Anghel M 2021 SIAM J. Appl. Dyn. Syst. 20 2558
  • [19] Lin Y T, Tian Y, Perez D and Livescu D 2023 SIAM J. Appl. Dyn. Syst. 22 2890
  • [20] Koopman B O 1931 Proc. Natl. Acad. Sci. 17 315
  • [21] Rowley C W, Mezić I, Bagheri S, Schlatter P and Henningson D S 2009 J. Fluid Mech. 641 115
  • [22] Schmid P J 2010 J. Fluid Mech. 656 5
  • [23] Williams M O, Kevrekidis I G and Rowley C W 2015 J. Nonlinear Sci. 25 1307
  • [24] Arbabi H and Mezić I 2017 SIAM J. Appl. Dyn. Syst. 16 2096
  • [25] Brunton S L, Brunton B W, Proctor J L, Kaiser E and Kutz J N 2017 Nature Comm. 8 19
  • [26] Mauroy A, Mezić I and Susuki Y (ed) 2020 The Koopman Operator in Systems and Control: Concepts, Methodologies, and Applications (Cham: Springer)
  • [27] Budišić M, Mohr R and Mezić I 2012 Chaos 22 596
  • [28] Mezić I 2013 Annu. Rev. Fluid Mech. 45 357
  • [29] Rowley C W and Dawson S T M 2017 Annu. Rev. Fluid Mech. 49 387
  • [30] Mezić I 2021 Notices of Amer. Math. Soc. 68 1087
  • [31] Brunton S L, Budišić M, Kaiser E and Kutz J N 2022 SIAM Rev. 64 229
  • [32] Črnjarić-Žic N, Maćešić S and Mezić I 2020 J. Nonlinear Sci. 30 2007
  • [33] Wanner M and Mezić I 2022 SIAM J. Appl. Dyn. Syst. 21 1930
  • [34] Gardiner C 2009 Stochastic methods: A handbook for the natural and social sciences, 4th edition (Berlin Heidelberg: Springer)
  • [35] Doi M 1976 J. Phys. A: Math. Gen. 9 1465
  • [36] Doi M 1976 J. Phys. A: Math. Gen. 9 1479
  • [37] Peliti L 1985 J. Physique 46 1469
  • [38] Ohkubo J 2013 J. Phys. A: Math. Theor. 46 375004
  • [39] Ohkubo J and Arai Y 2019 J. Stat. Mech. 063202
  • [40] Takahashi S and Ohkubo J 2023 J. Stat. Phys. 190 159
  • [41] Clainche S L and Vega J M 2017 SIAM J. Appl. Dyn. Syst. 16 882
  • [42] Kamb M, Kaiser E, Brunton S L and Kutz J N 2020 SIAM J. Appl. Dyn. Syst. 19 886
  • [43] Takens F 1981 Lect. Notes in Math. 898 366
  • [44] Van der Pol B 1926 The London, Edinburgh and Dublin Phil. Mag. & J. of Sci. 2 978
  • [45] Lorenz E N 1963 J. Atmos. Sci. 20 130
  • [46] Kloeden P E and Platen E 1992 Numerical Solution of Stochastic Differential Equations (Berlin: Springer)
  • [47] Ohkubo J 2021 J. Stat. Mech. 2021 013401