This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Dynamic Structure Estimation from Bandit Feedback using Nonvanishing Exponential Sums

Motoya Ohnishi* mohnishi@cs.washington.edu
Paul G. Allen School of Computer Science & Engineering
University of Washington
Isao Ishikawa* ishikawa.isao.zx@ehime-u.ac.jp
Ehime University
RIKEN Center for Advanced Intelligence Project
Yuko Kuroki yuko.kuroki@centai.eu
CENTAI Institute
Masahiro Ikeda masahiro.ikeda@riken.jp
RIKEN Center for Advanced Intelligence Project
Keio University
Abstract

This work tackles the dynamic structure estimation problems for periodically behaved discrete dynamical system in the Euclidean space. We assume the observations become sequentially available in a form of bandit feedback contaminated by a sub-Gaussian noise. Under such fairly general assumptions on the noise distribution, we carefully identify a set of recoverable information of periodic structures. Our main results are the (computation and sample) efficient algorithms that exploit asymptotic behaviors of exponential sums to effectively average out the noise effect while preventing the information to be estimated from vanishing. In particular, the novel use of the Weyl sum, a variant of exponential sums, allows us to extract spectrum information for linear systems. We provide sample complexity bounds for our algorithms, and we experimentally validate our theoretical claims on simulations of toy examples, including Cellular Automata.

* Equal contributions by MO and II. Project page: https://sites.google.com/view/dsefbf/

1 Introduction

System identification has been of great interest in controls, economics, and statistical machine learning (cf. Tsiamis & Pappas (2019); Tsiamis et al. (2020); Lale et al. (2020); Lee (2022); Kakade et al. (2020); Ohnishi et al. (2024); Mania et al. (2022); Simchowitz & Foster (2020); Curi et al. (2020); Hazan et al. (2018); Simchowitz et al. (2019); Lee & Zhang (2020)). In particular, estimations of periodic information, including eigenstructures for linear systems, under noisy and partially observable environments, are essential to a variety of applications such as biological data analysis (e.g., Hughes et al. (2017); Sokolove & Bushell (1978); Zielinski et al. (2014); also see Furusawa & Kaneko (2012) for how gene oscillation affects differentiation of cells), earthquake analysis (e.g., Rathje et al. (1998); Sabetta & Pugliese (1996); Wolfe (2006); see Allen & Kanamori (2003) for the connections of the frequencies and magnitude of earthquakes), chemical/asteroseismic analysis (e.g., Aerts et al. (2018)), and communication and information systems (e.g., Couillet & Debbah (2011); Derevyanko et al. (2016)), just to name a few.

In this paper, we focus on the periodic structure estimation problem for nearly periodically behaved discrete dynamical systems (cf. Arnold (1998); specifically, we mainly study theoretical aspects of recovering structural information under a novel set of model assumptions. We allow systems that are not exactly periodic with sequentially available bandit feedback. Due to the presence of noise and partial observability, our problem setups do not permit the recovery of the full set of period/eigenvalues information in general; as such we particularly ask the following question: what subset of information on dynamic structures can be statistically efficiently estimated? This work successfully answers this question by identifying and mathematically defining recoverable information, and proposes algorithms for efficiently extracting such information.

The technical novelty of our approach is highlighted by the careful adoption of the asymptotic bounds on the exponential sums that effectively cancel out noise effects while preserving the information to be estimated. When the dynamics is driven by a linear system, the use of the Weyl sum Weyl (1916), a variant of exponential sums, enables us to extract more detailed information. To our knowledge, this is the first attempt of employing asymptotic results of the Weyl sum for statistical estimation problems, and further studies on the relations between statistical estimation theory and exponential sums (or even other number theoretical results) are of independent interests.

With this summary of our work in place, we present our dynamical system model below followed by a brief introduction of the properties of the exponential sums used in this work.

Dynamic structure in bandit feedback. We define DdD\subset\mathbb{R}^{d} as a (finite or infinite) collection of arms to be pulled. Let (ηt)t=1(\eta_{t})_{t=1}^{\infty} be a noise sequence. Let Θd\Theta\subset\mathbb{R}^{d} be a set of latent parameters. We assume that there exists a dynamical system ff on Θ\Theta, equivalently, a map f:ΘΘf:\Theta\rightarrow\Theta. At each time step t{1,2,}t\in\{1,2,\ldots\}, a learner pulls an arm xtDx_{t}\in D and observes a reward

rt(xt)\displaystyle r_{t}(x_{t}) :=ft(θ)xt+ηt,\displaystyle:=f^{t}(\theta)^{\top}x_{t}+\eta_{t},

for some θΘ\theta\in\Theta. In other words, the hidden parameters for the rewards may vary over time but follow only a rule ff with initial value θ\theta. The function rtr_{t} could be viewed as the specific instance of partial observation (cf. Ljung (2010)).

Brief overview of the properties of the exponential sums used in this work. Exponential sums, also known as trigonometric sums, have developed as a significant area of study in number theory, employing various methods from analytical and algebraic number theory (see Arkhipov et al. (2004) for an overview). An exponential sum consists of a finite sum of complex numbers, each with an absolute value of one, and its absolute value can trivially be bounded by the number of terms in the sum. However, due to the cancellation among terms, nontrivial upper and lower bounds can sometimes be established. Several classes of exponential sums with such nontrivial bounds are known, and in this study, we apply bounds from a class known as Weyl sums to extract information from dynamical systems using bandit feedback. In the mathematical community, these bounds are valuable not only in themselves but also for applications in fields like analysis within mathematics (Katz, 1990). This research represents an application of these bounds in the context of machine learning (statistical learning/estimation problem) and is cast as one of the first attempts to open the applications of number theoretic results to learning theory.

In fact, even the standard results utilized in discrete Fourier transform can be leveraged, with their (non)asymptotic properties, to provably handle noisy observations to extract (nearly) periodic information. This process is illustrated in Figure 1 (the one for period estimation); where we show that one can effectively average out the noise through the exponential sum (which is denoted by \mathcal{R} in Figure 1) while preventing the correct period estimation from vanishing. By multiplying the (scalar) observations by certain exponential value, the bounds of the Weyl sum (which is denoted by 𝒲\mathscr{W} in Figure 1) can be applied to guarantee the survival of desirable target eigenvalue information; this is illustrated in Figure 1 (the one for eigenstructure estimation).

Refer to caption
Figure 1: Overview of how the exponential sum techniques are used in the work. For (nearly) period estimation problem, it is an application of discrete Fourier transform in our statistical settings, which ensures that the correct estimate remains while noise effect or wrong estimates are properly suppressed. When the system follows linear dynamics in addition to the (nearly) periodicity, our application of the Weyl sum, a variant of exponential sums, preserves some set of the eigenstructure information.

Our contributions. The contributions of this work are three folds: First, we mathematically identify and define a recoverable set of periodic/eigenvalues information when the observations are available in a form of bandit feedback. The feedback is contaminated by a sub-Gaussian noise, which is more general than those usually considered in system identification work. Second, we present provably correct algorithms for efficiently estimating such information; this constitutes the first attempt of adopting asymptotic results on the Weyl sum. Lastly, we implemented our algorithms for toy and simulated examples to experimentally validate our claims.

Summary of technical contributions and challenges. Here, we showcase a summary of technical contributions and challenges to be overcome:

  • Exploiting the exponential sum techniques (including the discrete Fourier transform) studied in number theory community as a filtering within the context of statistical estimation problems.

  • Adapt the Weyl sum technique to deal with eigenvalues by extending it to matrix sum.

  • Defining novel mathematical concepts: (aliquot) nearly period and (θ,k)(\theta,k)-distinct eigenvalues.

  • Concurrently applying a set of filterings to statistically identify correct periodic structure (we call this as concurrent application of filterings in Algorithm 1).

  • System recovery through maintenance of isomorphic structure in observation (we call it as isomorphicity maintenance lemma; see Lemma C.1).

Notations. Throughout this paper, \mathbb{R}, 0\mathbb{R}_{\geq 0}, \mathbb{N}, >0\mathbb{Z}_{>0}, \mathbb{Q}, >0\mathbb{Q}_{>0}, and \mathbb{C} denote the set of the real numbers, the nonnegative real numbers, the natural numbers ({0,1,2,}\{0,1,2,\ldots\}), the positive integers, the rational numbers, the positive rational numbers, and the complex numbers, respectively. Also, [T]:={1,2,T}[T]:=\{1,2,\ldots T\} for T>0T\in\mathbb{Z}_{>0}. The Euclidean norm is given by xd=x,xd=xx\|x\|_{\mathbb{R}^{d}}=\sqrt{\left<x,x\right>_{\mathbb{R}^{d}}}=\sqrt{x^{\top}x} for xdx\in\mathbb{R}^{d}, where ()(\cdot)^{\top} stands for transposition. M\|M\| and MF\|M\|_{F} are the spectral norm and Frobenius norm of a matrix MM respectively, and (M)\mathscr{I}(M) and 𝒩(M)\mathscr{N}(M) are the image space and the null space of MM, respectively. If aa is a divisor of bb, it is denoted by a|ba|b. The floor and the ceiling of a real number aa is denoted by a\lfloor a\rfloor and a\lceil a\rceil, respectively. Finally, the least common multiple and the greatest common divisor of a set \mathcal{L} of positive integers are denoted by lcm(){\rm lcm}(\mathcal{L}) and gcd(){\rm gcd}(\mathcal{L}), respectively.

2 Motivations

As our main body of the work is largely theoretical with a new problem formulation, this section is intended to articulate the motivations behind this work from scientific (theoretical) and practical perspectives followed by the combined aspects for our formulation.

2.1 Scientific (theoretical) motivation

When dealing with noisy observation, it is advised to use concentrations of measure to extract the true target value by averaging out the noise effect over sufficiently many samples. On the other hand, when the samples are generated by a dynamical system, averaging over the samples will have an effect of averaging over time, which could lose some information on the evolution of dynamics.

Dynamic information recovery from noisy observation.

Consider for example a fixed point attractor and a limit cycle attractor dynamics as depicted in Figure 2; when averaging over samples generated by sufficiently long time horizon, both cases would output values that are close to (0,0)(0,0) which ambiguates the dynamical properties. In particular, properties of long-time average of the dynamics have been studied in ergodic theory (cf. Walters (2000)), and studying dynamical systems from both time and measure is of independent interest.

Refer to caption
Figure 2: Illustration of loss of dynamic information through concentration of measure. Left: An example of the law of large numbers showing the sample mean of the numbers given by throwing dice many times will converge towards the expected value as the number of throws increases. Right: An intuitive understanding of how the concentration of measures or process of averaging out the observations may erase not only the noise effect but also the dynamics information; in this case, the dynamics on the left side is a fixed point attractor and the right one is a limit cycle attractor, both of which may return (0,0)(0,0) when averaging the states over long time.

Dynamic information recovery from scalar observations.

Also, linear bandit feedback system is a special case of partially observable systems, and specifically considers scalar (noisy) observations.

For reconstruction of attractor dynamics from noiseless scalar observations, as will be mentioned in Section 3, Takens’ theorem (or delay embedding theorem; Takens (1981)) is famous, where an observation function is used to construct the embedding. It identifies the dynamical system properties that are preserved through the reconstruction as well.

If the noise is also present, the problems become extensively complicated and some attempts on the analysis have been made (Casdagli et al., 1991); which discusses the trade-off between the distortion and estimation errors.

2.2 Practical motivation

Practically, periodic information estimation is of great interest as we have mentioned in Section 1; in addition, eigenvalue estimation problems are particularly important for analyzing linear systems.

Physical constant estimation.

Some of the important linear systems include electric circuits and vibration (or oscillating) systems which are typically used to model an effect of earthquakes over buildings. Eigenvalue estimation problems in such cases reduce to the estimation of physical constants. Also, they have been studied for channel estimation (Van De Beek et al., 1995) as well.

Dynamic mode decomposition.

At the same time, analyzing the modes of a target dynamical system by the technique named (extended) dynamic mode decomposition ((E)DMD; cf. Tu (2013); Kutz et al. (2016); Williams et al. (2015)) has been extensively researched. The results give spectrum information about the dynamics.

Attractor reconstruction.

With Takens’ theorem, obtaining dynamic structure information of attractors in a finite-dimensional space from a finite number of scalar observations is useful in practice as well; see (Bakarji et al., 2022) for an application in deep autoencoder. Our problems are not dealing with general nonlinear attractor dynamics; however, the use of exponential sums in the estimation process would become a foundation for the future applications for modeling target dynamics.

Communication capacity.

There has been an interest of studying novel paradigms for coding, transmitting, and processing information sent through optical communication systems (Turitsyn et al., 2017); when encoding information in (nearly) periodic signals to transmit, computing communication capacity that properly takes communication errors into account is of fundamental interest. If the observation is made as a mix of the outputs of multiple communication channels, the problem becomes relevant to bandit feedback as well.

Bandit feedback – single-pixel camera.

As widely studied in a context of compressed sensing (cf. Donoho (2006); Eldar & Kutyniok (2012); Candes et al. (2006)), reconstruction of the target image from the single-pixel camera (e.g., Duarte et al. (2008)) observations relies on a random observation matrix (or vector) to obtain a single pixel value for each observation. It is advantageous to concentrate light to one pixel for the cases where the light intensity is very low. Bandit feedback naturally coincides with a random observation matrix that the user can design. The observation through a random matrix coupled with compressed sensing has been studied for channel estimation, magnetic resonance imaging, visualization of black holes, and vibration analysis as well.

Bandit feedback – recommendation system.

Reward maximization problems from bandit feedback have been applied to recommendation system (e.g., Li et al. (2010)) for example, where the observations are in scalar form. Other applications of bandit problems include healthcare, dynamic pricing, dialogue systems and telecommunications. Although our problem formulation is not meant to maximize any reward, it is seamlessly connected to the bandit problems as described below.

Connection to reward maximization problems.

As we will briefly describe in Appendix E, thanks to the problem formulations in this work, one could use our estimation algorithms for downstream bandit problems (i.e., reward maximization problem with bandit feedback). A naive approach is an explore-then-commit type algorithm (cf. Robbins (1952); Anscombe (1963)) where the “explore” stage utilizes our estimation mechanism to identify the dynamics structure behind the system parameter. Devising efficient dynamic structure estimation algorithms for other types of information than the periodicity and eigenstructure under bandit feedback scenarios may also be an interesting direction of research.

2.3 Combined aspects

In our work, we essentially consider periodic (and eigenstructure) information recovery from observation data, which as mentioned is crucial in some application domains. In particular, our formulation cares about the situation where the observation is made by a bandit feedback which produces a sequence of scalar observations. We mention that the bandit feedback is essentially just a (noisy) scalar observation where (possibly random) observation vectors are selected by users.

One possible example scenario in which bandit feedback and dynamic structure estimation meet is illustrated in Figure 3 where the target system has some periodicity and the noisy observation is made through a single-pixel camera equipped with a random filter to be used to determine the periodic structure of the target system.

Also, as mentioned, we emphasize again that reconstruction of dynamical systems characteristics such as their attractor dynamics through (noisy, user-selected) scalar observation is an important area of research; we believe the use of exponential sums in the reconstruction of dynamics opens up further research directions and applications in a similar manner to Takens’ theorem (see Section 3 as well).

Refer to caption
Figure 3: Illustration of the example practical case where our techniques may be applied; in this example, the target object is following nearly periodic motion, and requires concentration of light to be captured. A random filter could be employed and we observe a single value sequentially to predict its periodic information.

3 Related work

As we have seen in Section 2, our broad motivation stems from the reconstruction of dynamical system information from noisy and partial observations. The most relevant lines of work to this problem are (1) that of Takens (Takens, 1981) and its extensions, and (2) system identifications for partially observed dynamical systems.

While the former studies more general nonlinear attractor dynamics from a sequence of scalar observations made by certain observation function having desirable properties, it is originally for the noiseless case. Although there exist some attempts on extending it to noisy situations (e.g., Casdagli et al. (1991)), provable estimation of some dynamic structure information with sample complexity guarantees (or nonasymptotic result) under fairly general noise case is not elaborated. On the other hand, our work may be viewed as a special instance of system identifications for partially observed dynamical systems. Existing works for sample complexity analysis of partially observed linear systems (e.g., Menda et al. (2020); Hazan et al. (2018); Simchowitz et al. (2019); Lee & Zhang (2020); Tsiamis & Pappas (2019); Tsiamis et al. (2020); Lale et al. (2020); Lee (2022); Adams et al. (2021); Bhouri & Perdikaris (2021); Ouala et al. (2020); Uy & Peherstorfer (2021); Subramanian et al. (2022); Bennett & Kallus (2021); Lee et al. (2020)) typically consider additive Gaussian noise and make controllability and/or observability assumptions (for autonomous case, transition with Gaussian noise with positive definite covariance is required). While (Mhammedi et al., 2020) considers nonlinear observation, it still assumes Gaussian noise and controllability. The work (Hazan et al., 2018) considers adversarial noise but with limited budget; we mention its wave-filtering approach is interesting and our use of exponential sums could also be viewed as filtering. Also, the work (Simchowitz et al., 2019) considers control inputs and bounded semi-adversarial noise, which is another set of strong assumptions.

Our work is based on the different set of assumptions and aims at estimating a specific set of information; i.e., while we restrict ourselves to the case of (nearly) periodic systems and allow observations in the form of bandit feedback, our goal is to estimate (nearly) periodic structure of the target system. In this regard, we are not proposing “better” algorithms than the existing lines of work but are considering different problem setups that we believe are well motivated from both theoretical and practical perspectives (see Section 2). We mention that there exist many period estimation methods (e.g., Moore et al. (2014); Tenneti & Vaidyanathan (2015)), and in the case of zero noise, this becomes a trivial problem. For intuitive understandings of how our work may be placed in the literature, we show the illustration in Figure 4.

Refer to caption
Figure 4: Illustration of the position of our work as a study of reconstruction of dynamical system properties. We assume sub-Gaussian noise, bandit feedback (user-defined linear feedback contaminated by noise), autonomous system, and (nearly) periodicity which is the information to be extracted. To our knowledge, our assumptions do not subsume or are subsumed by those of other work as a problem of provably estimating periodic structure. In comparison to existing system identification techniques for partially observed dynamical systems, we do not assume Gaussian (or bounded) noise, limited budget for adversarial noise, controllability, observability or noisy transition, while we assume bandit feedback to identify a recoverable set of dynamic structure information.

We also mention that our model of bandit feedback is commonly studied within stochastic linear bandit literature (cf. Abe & Long (1999); Auer (2003); Dani et al. (2008); Abbasi-Yadkori et al. (2011)). Also, as we consider the dynamically changing system states (or reward vectors), it is closely related to adversarial bandit problems (e.g., Bubeck & Cesa-Bianchi (2012); Hazan (2016)). Recently, some studies on non-stationary rewards have been made (cf. Auer et al. (2019); Besbes et al. (2014); Chen et al. (2019); Cheung et al. (2022); Luo et al. (2018); Russac et al. (2019); Trovo et al. (2020); Wu et al. (2018)) although they do not deal with periodically behaved dynamical system properly (see discussions in (Cai et al., 2021) as well). For discrete action settings, (Oh et al., 2019) proposed the periodic bandit, which aims at minimizing the total regret. Also, if the period is known, Gaussian process bandit for periodic reward functions was proposed (Cai et al., 2021) under Gaussian noise assumption. While our results could be extended to the regret minimization problems by employing our algorithms for estimating the periodic information before committing to arms in a certain way, we emphasize that our primary goal is to estimate such periodic information in provably efficient ways. We thus mention that our work is orthogonal to the recent studies on regret minimization problems for non-stationary environments (or in particular, periodic/seasonal environments). Refer to (Lattimore & Szepesvári, 2020) for bandit algorithms that are not covered here.

4 Problem definition

In this section, we describe our problem setting. In particular, we introduce some definitions on the properties of dynamical system.

4.1 Nearly periodic sequence

First, we define a general notion of nearly periodic sequence:

Definition 4.1 (Nearly periodic sequence).

Let (𝒳,d)(\mathcal{X},d) be a metric space. Let μ0\mu\geq 0 and let L>0L\in\mathbb{Z}_{>0}. We say a sequence (yt)t=1𝒳(y_{t})_{t=1}^{\infty}\subset\mathcal{X} is μ\mu-nearly periodic of length LL if d(ys+Lt,ys)<μd(y_{s+Lt},y_{s})<\mu for any s,t>0s,t\in\mathbb{Z}_{>0}. We also call LL the length of the μ\mu-nearly period.

Intuitively, there exist LL balls of diameter μ\mu in 𝒳\mathcal{X} and the sequence y1,y2,y_{1},y_{2},\dots moves in the balls in order if (yt)t=1(y_{t})_{t=1}^{\infty} is μ\mu-nearly periodic of length LL. Obviously, nearly periodic sequence of length LL is also nearly periodic sequence of length mLmL for any m>0m\in\mathbb{Z}_{>0}. We say a sequence is periodic if it is 0-nearly periodic. We introduce a notion of aliquot nearly period to treat estimation problems of period:

Definition 4.2 (Aliquot nearly period).

Let (𝒳,d)(\mathcal{X},d) be a metric space. Let ρ>0\rho>0 and λ1\lambda\geq 1. Assume a sequence {yt}t=1𝒳\{y_{t}\}_{t=1}^{\infty}\subset\mathcal{X} is μ\mu-nearly periodic of length LL for some μ0\mu\geq 0 and L>0L\in\mathbb{Z}_{>0}. A positive integer \ell is a (ρ,λ)(\rho,\lambda)-aliquot nearly period ((ρ,λ)(\rho,\lambda)-anp) of (yt)t=1(y_{t})_{t=1}^{\infty} if |L\ell|L and the sequence (yt)t=1(y_{t})_{t=1}^{\infty} is (ρ+2λμ)(\rho+2\lambda\mu)-nearly periodic.

We may identify the (ρ,λ)(\rho,\lambda)-anp with a 2λμ2\lambda\mu-nearly period under an error margin ρ\rho. When we estimate the length LL of the nearly period of unknown sequence (yt)t(y_{t})_{t}, we sometimes cannot determine the LL itself, but an aliquot nearly period.

Example 4.1.

A trajectory of finite dynamical system is always periodic and it is the most simple but important example of (nearly) periodic sequence. We also emphasize that if we know the upper bound of the number of underlying space, the period is bounded above by the upper bound as well. These facts are summarized in Proposition 4.1. The cellular automata on finite cells is a specific example of finite dynamical systems. We will treat LifeGame Conway et al. (1970), a special cellular automata, in our simulation experiment (see Section 6).

Proposition 4.1.

Let f:ΘΘf:\Theta\to\Theta be a map on a set Θ\Theta. If |Θ|<|\Theta|<\infty, then for any t|Θ|t\geq|\Theta| and θΘ\theta\in\Theta, ft+L(θ)=ft(θ)f^{t+L}(\theta)=f^{t}(\theta) for some 1L|Θ|1\leq L\leq|\Theta|.

Proof.

Since |{θ,f(θ),,f|Θ|(θ)}|>|Θ||\{\theta,f(\theta),\dots,f^{|\Theta|}(\theta)\}|>|\Theta|, there exist 0i<j|Θ|0\leq i<j\leq|\Theta| such that fi(θ)=fj(θ)f^{i}(\theta)=f^{j}(\theta) by the pigeon hole principal. Thus, ft(θ)=fji+t(θ)f^{t}(\theta)=f^{j-i+t}(\theta) for all t|Θ|t\geq|\Theta|. ∎

If a linear dynamical system generates a nearly periodic sequence, we can show the linear system has a specific structure as follows:

Proposition 4.2.

Let M:ddM:\mathbb{R}^{d}\to\mathbb{R}^{d} be a linear map. Let d=αVα\mathbb{C}^{d}=\oplus_{\alpha}V_{\alpha} be the decomposition via generalized eigenspaces of MM, where α\alpha runs over the eigenvalues of MM and Vα:=𝒩((αIM)d)V_{\alpha}:=\mathscr{N}((\alpha I-M)^{d}). Assume that there exists μ0\mu\geq 0, for any θd\theta\in\mathbb{R}^{d}, (Mtθ)t=t0(M^{t}\theta)_{t=t_{0}}^{\infty} is μ\mu-nearly periodic for some t0t_{0}\in\mathbb{N}. Let θ=αθααVα\theta=\sum_{\alpha}\theta_{\alpha}\in\oplus_{\alpha}V_{\alpha}. Then, each eigenvalue α\alpha such that θα0\theta_{\alpha}\neq 0 satisfies |α|1|\alpha|\leq 1, in addition, if |α|=1|\alpha|=1 and θα0\theta_{\alpha}\neq 0, Mθα=αθαM\theta_{\alpha}=\alpha\theta_{\alpha}.

Proof.

We note that {Mtv}t0\{M^{t}v\}_{t\geq 0} is bounded for any vdv\in\mathbb{R}^{d} by the assumption on MM. Thus, MM cannot have an eigenvalue of magnitude greater than 11. We show that α\alpha is in the form of α=ei2πq\alpha=e^{i2\pi q} for some qq\in\mathbb{Q} if |α|=1|\alpha|=1. Suppose that α=ei2πγ\alpha=e^{i2\pi\gamma} for an irrational number γ\gamma\in\mathbb{R}. Then, for an eigenvector ww for θα\theta_{\alpha} with wd>μ\|w\|_{\mathbb{R}^{d}}>\mu, {Mtw}t>t0\{M^{t}w\}_{t>t_{0}} cannot become a μ\mu-periodic sequence. Thus, we conclude α=ei2πq\alpha=e^{i2\pi q} for some qq\in\mathbb{Q}. Next, we show Mθα=αθαM\theta_{\alpha}=\alpha\theta_{\alpha} if |α|=1|\alpha|=1 and θα0\theta_{\alpha}\neq 0. Suppose (MαI)θα0(M-\alpha I)\theta_{\alpha}\neq 0. Since (MαI)dθα=0(M-\alpha I)^{d}\theta_{\alpha}=0, there exists 1d<d1\leq d^{\prime}<d such that (MαI)d+1θα=0(M-\alpha I)^{d^{\prime}+1}\theta_{\alpha}=0 but (MαI)dθα0(M-\alpha I)^{d^{\prime}}\theta_{\alpha}\neq 0. Let w:=(MαI)d1θαw^{\prime}:=(M-\alpha I)^{d^{\prime}-1}\theta_{\alpha}. Then, we see that (MαI)2w=0(M-\alpha I)^{2}w^{\prime}=0 but (MαI)w0(M-\alpha I)w^{\prime}\neq 0. By direct computation, we see that

Mtwd=(MαI+αI)twdt(MαI)wdwd.\|M^{t}w^{\prime}\|_{\mathbb{R}^{d}}=\|(M-\alpha I+\alpha I)^{t}w^{\prime}\|_{\mathbb{R}^{d}}\geq t\|(M-\alpha I)w^{\prime}\|_{\mathbb{R}^{d}}-\|w^{\prime}\|_{\mathbb{R}^{d}}.

Thus, we have Mtwd\|M^{t}w^{\prime}\|_{\mathbb{R}^{d}}\to\infty as tt\to\infty, which contradicts the fact that {Mtw}t0\{M^{t}w^{\prime}\}_{t\geq 0} is a bounded sequence. The last statement is obvious. ∎

Let WW be a linear subspace of d\mathbb{C}^{d} generated by the trajectory {Mθ,M2θ,}\{M\theta,M^{2}\theta,\dots\} and denote dim(W)\dim(W) by d0d_{0}. Note that restriction of MM to WW induces a linear map from WW to WW. We denote by MθM_{\theta} the induced linear map from WW to WW. Let W=αΛWαW=\oplus_{\alpha\in\Lambda}W_{\alpha} be the decomposition via the generalized eigenspaces of MθM_{\theta}, where Λ\Lambda is the set of eigenvalues of MθM_{\theta} and Wα:=𝒩((αIMθ)d0)W_{\alpha}:=\mathscr{N}((\alpha I-M_{\theta})^{d_{0}}). We define

W=1:=|α|=1Wα,\displaystyle W_{=1}:=\oplus_{|\alpha|=1}W_{\alpha},
W<1:=|α|<1Wα.\displaystyle W_{<1}:=\oplus_{|\alpha|<1}W_{\alpha}.

Then, we have the following statement as a corollary of Proposition 4.2:

Corollary 4.3.

There exist linear maps M1,M<1:WWM_{1},M_{<1}:W\to W such that

  1. 1.

    Mθ=M1+M<1M_{\theta}=M_{1}+M_{<1},

  2. 2.

    M1M<1=M<1M1=OM_{1}M_{<1}=M_{<1}M_{1}=O,

  3. 3.

    M1M_{1} is diagonalizable and any eigenvalue of M1M_{1} is of magnitude 11, and

  4. 4.

    any eigenvalue of M<1M_{<1} is of magnitude smaller than 11.

Proof.

Let p:WW=1p:W\to W_{=1} be the projection and let i:W=1Wi:W_{=1}\to W be the inclusion map. We define M1:=iMθpM_{1}:=iM_{\theta}p. We can construct M<1M_{<1} in the similar manner and these matrices are desired ones. ∎

Example 4.2.

Let GG be a finite group and let ρ\rho be a finite dimensional representation of GG, namely, a group homomorphism ρ:GGLm()\rho:G\to\mathrm{GL}_{m}(\mathbb{C}), where GLm()\mathrm{GL}_{m}(\mathbb{C}) is the set of complex regular matrices of size mm. Fix gGg\in G. Let Bn×nB\in\mathbb{C}^{n\times n} be a matrix whose eigenvalues have magnitudes smaller than 11. We define a matrix of size m+nm+n by

M:=(ρ(g)00B).M:=\left(\begin{array}[]{cc}\rho(g)&0\\ 0&B\end{array}\right).

Then, (Mtx)t=t0(M^{t}x)_{t=t_{0}}^{\infty} is a μ\mu-nearly periodic sequence for any μ>0\mu>0, xmx\in\mathbb{C}^{m}, and sufficiently large t0t_{0}. Moreover, we know that the length of the nearly period is |G||G|. We treat the permutation of variables in d\mathbb{R}^{d} in the simulation experiment (see Section 6), namely the case where GG is the symmetric group 𝔖d\mathfrak{S}_{d} and ρ\rho is a homomorphism from GG to GLd()\mathrm{GL}_{d}(\mathbb{C}) defined by ρ(g)((xj)j=1d):=(xg(j))j=1d\rho(g)((x_{j})_{j=1}^{d}):=(x_{g(j)})_{j=1}^{d}, which is the permutation of variable via gg.

4.2 Problem setting

Here, we state our problem settings. We use the notation introduced in the previous sections. First, we summarize our technical assumptions as follows:

Assumption 1 (Conditions on arms).

The set of arms DD contains the unit hypersphere.

Assumption 2 (Assumptions on noise).

The noise sequence {ηt}t=1\{\eta_{t}\}_{t=1}^{\infty} is conditionally RR-sub-Gaussian (R0R\in\mathbb{R}_{\geq 0}), i.e., given tt,

λ,𝔼[eληt|t1]eλ2R22,\displaystyle\forall\lambda\in\mathbb{R},~{}\mathbb{E}\left[e^{\lambda\eta_{t}}|\mathcal{F}_{t-1}\right]\leq e^{\frac{\lambda^{2}R^{2}}{2}},

and 𝔼[ηt|t1]=0\mathbb{E}[\eta_{t}|\mathcal{F}_{t-1}]=0, Var[ηt|t1]R2\mathrm{Var}[\eta_{t}|\mathcal{F}_{t-1}]\leq R^{2}, where {τ}τ\{\mathcal{F}_{\tau}\}_{\tau\in\mathbb{N}} is an ascending family and we assume that x1,,xτ+1,η1,,ητx_{1},\ldots,x_{\tau+1},\eta_{1},\ldots,\eta_{\tau} are measurable with respect to τ\mathcal{F}_{\tau}.

Assumption 3 (Assumptions on dynamical systems).

There exists μ>0\mu>0 such that for any θΘ\theta\in\Theta, the sequence (ft(θ))t=t0(f^{t}(\theta))_{t=t_{0}}^{\infty} is μ\mu-nearly periodic of length LL for some t0t_{0}\in\mathbb{N}. We denote by BθB_{\theta} the radius of the smallest ball containing {ft(θ)}t=0\{f^{t}(\theta)\}_{t=0}^{\infty}.

Remark 4.4.

Assumption 1 excludes the lower bound arguments of the minimally required samples for our work since taking sufficiently large vector (arm) makes the noise effect negligible. Considering more restrictive conditions for discussing lower bounds is out of scope of this work.

Then, our questions are described as follows:

  • Can we estimate the length LL from the collection of rewards (rt(xt))t=1T(r_{t}(x_{t}))_{t=1}^{T} efficiently ?

  • If we assume the dynamical system is linear, can we further obtain the eigenvalues of ff from a collection of rewards ?

  • How many samples do we need to provably estimate the length LL or eigenvalues of ff ?

We will answer these questions in the following sections and via simulation experiments.

5 Algorithms and theory

With the above settings in place, we present a (computationally efficient) algorithm for each presented problem, and show its sample complexity for estimating certain information.

5.1 Period estimation

Here, we describe an algorithm for period estimation followed by its theoretical analysis. The overall procedure is summarized in Algorithm 1. Line 8 executes concurrent application of filterings.

Input: Current time t0>0t_{0}\in\mathbb{Z}_{>0}; TpT_{p}; ϵ>0\epsilon>0; Lmax>1L_{\max}>1; orthogonal basis {𝐮1,,𝐮d}\{\mathbf{u}_{1},\dots,\mathbf{u}_{d}\} of d\mathbb{R}^{d}
      Output: Estimated length L^\hat{L}

1:1\ell\leftarrow 1; β1\beta\leftarrow 1
2:for m=1,,dm=1,\dots,d do
3:     for t=t0,t0+1,,t0+Tp1t=t_{0},t_{0}+1,\ldots,t_{0}+T_{p}-1 do
4:         Sample arm 𝐮m\mathbf{u}_{m} and observe rt(𝐮m)r_{t}(\mathbf{u}_{m})
5:     end for
6:     while βLmax\ell\cdot\beta\leq L_{\max} do
7:         +1\ell\leftarrow\ell+1
8:         for (s,b)=(0,1),(0,2),,(0,1),(1,1),(1,2),,(β1,1)(s,b)=(0,1),(0,2),\dots,(0,\ell-1),(1,1),(1,2),\dots,(\beta-1,\ell-1) do
9:              if |((rt0+s+βt(𝐮m))t=1Tp/β;b/)}|2>ϵ\left|\mathcal{R}\left((r_{t_{0}+s+\beta t}(\mathbf{u}_{m}))_{t=1}^{\lfloor T_{p}/\beta\rfloor};b/\ell\right)\}\right|^{2}>\epsilon then
10:                  ββ\beta\leftarrow\beta\ell
11:                  1\ell\leftarrow 1
12:                  Break
13:              end if
14:         end for
15:     end while
16:     t0t0+Tpt_{0}\leftarrow t_{0}+T_{p}
17:end for
18:L^β\hat{L}\leftarrow\beta
Algorithm 1 Period estimation (DFT)

To analyze the sample complexity of this estimation algorithm, we first introduce an exponential sum that plays a key role:

Definition 5.1.

For a positive rational number q>0q\in\mathbb{Q}_{>0} and TT complex numbers a1,,aTa_{1},\dots,a_{T}\in\mathbb{C}, we define

((at)t=1T;q)\displaystyle\mathcal{R}\big{(}(a_{t})_{t=1}^{T};q) :=1Tj=1Tajei2πqj.\displaystyle:=\frac{1}{T}\sum_{j=1}^{T}a_{j}e^{i2\pi qj}.

For a μ\mu-nearly periodic sequence 𝐚:=(at)t=1\mathbf{a}:=(a_{t})_{t=1}^{\infty} of length LL, we define the supremum of the standard deviations of the LL sequential data of 𝐚\mathbf{a}:

σL(𝐚):=supt011Lt=t0t0+L1|at1Lj=t0t0+L1aj|2.\displaystyle\sigma_{L}(\mathbf{a}):=\sup_{t_{0}\geq 1}\sqrt{\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}\left|a_{t}-\frac{1}{L}\sum_{j=t_{0}}^{t_{0}+L-1}a_{j}\right|^{2}}.

The exponential sum (;)\mathcal{R}(\cdot;\cdot) can extract a divisor of the nearly period of a μ\mu-nearly periodic sequence if μ\mu is “sufficiently smaller” than the variance of the sequence even when the sequence is contaminated by noise; more precisely, we have the following lemma:

Lemma 5.1.

Let 𝐚:=(aj)j=1\mathbf{a}:=(a_{j})_{j=1}^{\infty} be a μ\mu-nearly periodic sequence of length LL. Then, we have the following statements:

  1. 1.

    if L>1L>1, then there exists s>0s\in\mathbb{Z}_{>0} with s<Ls<L such that

    |((aj)j=1T;s/L)|>σ22μσLμLsupt1|at|T,\displaystyle\left|\mathcal{R}\left((a_{j})_{j=1}^{T};s/L\right)\right|>\sqrt{\frac{\sigma^{2}-2\mu\sigma}{L}}-\mu-\frac{L\sup_{t\geq 1}|a_{t}|}{T}, (5.1)
  2. 2.

    if β\beta is not a divisor of LL, then for any α>0\alpha\in\mathbb{Z}_{>0},

    |((aj)j=1T;α/β)|<μ+L2𝒞0(μ+supt1|at|)T,\displaystyle\left|\mathcal{R}\left((a_{j})_{j=1}^{T};\alpha/\beta\right)\right|<\mu+\frac{L^{2}\mathcal{C}_{0}(\mu+\sup_{t\geq 1}|a_{t}|)}{T}, (5.2)

    where 𝒞0:=1+2/2π(3/4)π2/6=1.72257196806914\mathcal{C}_{0}:=1+{2}/{\sqrt{2}\pi(3/4)^{\pi^{2}/6}}=1.72257196806914....

Proof.

As (at)t=1(a_{t})_{t=1}^{\infty} is μ\mu-almost periodic, there exist (bt)t=1(b_{t})_{t=1}^{\infty} of period LL and (ct)t=1(c_{t})_{t=1}^{\infty} with supt1|ct|<μ\sup_{t\geq 1}|c_{t}|<\mu such that at=bt+cta_{t}=b_{t}+c_{t}.

First, we prove (5.1). Let

b~:=1Lt=1Lbt,b^q:=1Lt=1Lbtei2πtq,(q).\displaystyle\tilde{b}:=\frac{1}{L}\sum_{t=1}^{L}b_{t},~{}~{}\hat{b}_{q}:=\frac{1}{L}\sum_{t=1}^{L}b_{t}e^{i2\pi tq},\quad(q\in\mathbb{Q}).

We claim that

Lsup{|b^s/L|2}s=1L1>s=1L1|b^s/L|2=1Lt=1L|btb~|2σ22μσ.\displaystyle L\cdot\sup\{|\hat{b}_{s/L}|^{2}\}_{s=1}^{L-1}>\sum_{s=1}^{L-1}|\hat{b}_{s/L}|^{2}=\frac{1}{L}\sum_{t=1}^{L}|b_{t}-\tilde{b}|^{2}\geq\sigma^{2}-2\mu\sigma. (5.3)

In fact, the first inequality is obvious. The equality follows from the Plancherel formula for a finite abelian group (see, for example, (Serre, 1977, Excercise 6.2)). As for the last inequality, take arbitrary t0>0t_{0}\in\mathbb{Z}_{>0} and define a~=L1t=t0t0+L1at\tilde{a}=L^{-1}\sum_{t=t_{0}}^{t_{0}+L-1}a_{t} and c~=L1t=t0t0+L1ct\tilde{c}=L^{-1}\sum_{t=t_{0}}^{t_{0}+L-1}c_{t}. Then, we have

1Lt=1L|btb~|2\displaystyle\frac{1}{L}\sum_{t=1}^{L}|b_{t}-\tilde{b}|^{2} 1Lt=t0t0+L1|ata~|2+1Lt=t0t0+L1|ctc~|22Lt=t0t0+L1|ata~||ctc~|\displaystyle\geq\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|a_{t}-\tilde{a}|^{2}+\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|c_{t}-\tilde{c}|^{2}-\frac{2}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|a_{t}-\tilde{a}|\cdot|c_{t}-\tilde{c}|
1Lt=t0t0+L1|ata~|2+1Lt=t0t0+L1|ctc~|2\displaystyle\geq\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|a_{t}-\tilde{a}|^{2}+\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|c_{t}-\tilde{c}|^{2}
21Lt=t0t0+L1|ata~|21Lt=t0t0+L1|ctc~|2\displaystyle\qquad-2\sqrt{\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|a_{t}-\tilde{a}|^{2}}\cdot\sqrt{\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|c_{t}-\tilde{c}|^{2}}
1Lt=t0t0+L1|ata~|22μσ.\displaystyle\geq\frac{1}{L}\sum_{t=t_{0}}^{t_{0}+L-1}|a_{t}-\tilde{a}|^{2}-2\mu\sigma.

Here, we used the Cauchy-Schwartz inequality in the second inequality. Since t0t_{0} is arbitrary, we have (5.3). Let sargmaxs=1,,L1|b^s/L|2s\in\mathrm{argmax}_{s=1,\dots,L-1}|\hat{b}_{s/L}|^{2} and let q:=s/Lq:=s/L. Let T=LT+γT=LT^{\prime}+\gamma for some T,γT^{\prime},\gamma\in\mathbb{N} with 0γ<L0\leq\gamma<L. Then, we have

|((aj)j=1T;q)|\displaystyle\left|\mathcal{R}\left((a_{j})_{j=1}^{T};q\right)\right| |1Tt=0T1[1Ls=1LaLt+sei2πqs]|Lsupt1|at|T\displaystyle\geq\left|\frac{1}{T^{\prime}}\sum_{t=0}^{T^{\prime}-1}\left[\frac{1}{L}\sum_{s=1}^{L}a_{Lt+s}e^{i2\pi qs}\right]\right|-\frac{L\sup_{t\geq 1}|a_{t}|}{T}
>|b^q|μLsupt1|at|T\displaystyle>|\hat{b}_{q}|-\mu-\frac{L\sup_{t\geq 1}|a_{t}|}{T}
σ22μσLμLsupt1|at|T.\displaystyle\geq\sqrt{\frac{\sigma^{2}-2\mu\sigma}{L}}-\mu-\frac{L\sup_{t\geq 1}|a_{t}|}{T}.

Next, we prove (5.2). As in the same computation as above, we have

|((aj)j=1T;α/β)|\displaystyle\left|\mathcal{R}\left((a_{j})_{j=1}^{T};\alpha/\beta\right)\right| LTT|1Tt=0T1ei2παLt/β[1Ls=1LaLt+sei2παs/β]|+Lsupt>0|at|T\displaystyle\leq\frac{LT^{\prime}}{T}\left|\frac{1}{T^{\prime}}\sum_{t=0}^{T^{\prime}-1}e^{i2\pi\alpha Lt/\beta}\left[\frac{1}{L}\sum_{s=1}^{L}a_{Lt+s}e^{i2\pi\alpha s/\beta}\right]\right|+\frac{L\sup_{t>0}|a_{t}|}{T}
<LTT|1Tt=0T1ei2παLt/β||1Ls=1Lbsei2παs/β|+μ+Lsupt>0|at|T\displaystyle<\frac{LT^{\prime}}{T}\left|\frac{1}{T^{\prime}}\sum_{t=0}^{T^{\prime}-1}e^{i2\pi\alpha Lt/\beta}\right|\cdot\left|\frac{1}{L}\sum_{s=1}^{L}b_{s}e^{i2\pi\alpha s/\beta}\right|+\mu+\frac{L\sup_{t>0}|a_{t}|}{T}
<2|1ei2παL/β|(μ+supt1|at|)LT+μ+Lsupt>0|at|T.\displaystyle<\frac{2}{\left|1-e^{i2\pi\alpha L/\beta}\right|}\cdot\left(\mu+\sup_{t\geq 1}|a_{t}|\right)\cdot\frac{L}{T}+\mu+\frac{L\sup_{t>0}|a_{t}|}{T}.

Since |1ei2πa|2(1a2)π2/6a|1-e^{i2\pi a}|\geq\sqrt{2}(1-a^{2})^{\pi^{2}/6}a for a(0,1)a\in(0,1) by (Chesneau & Bagul, 2020, Proposition 3.2), we have

|((aj)j=1T;α/β)|\displaystyle\left|\mathcal{R}\left((a_{j})_{j=1}^{T};\alpha/\beta\right)\right| <(μ+supt1|at|)2βL2(3/4)π2/6T+μ+Lsupt1|at|T\displaystyle<\left(\mu+\sup_{t\geq 1}|a_{t}|\right)\frac{2\beta L}{\sqrt{2}(3/4)^{\pi^{2}/6}T}+\mu+\frac{L\sup_{t\geq 1}|a_{t}|}{T}
<μ+Lβ𝒞0(μ+supt1|at|)T.\displaystyle<\mu+\frac{L\beta\mathcal{C}_{0}(\mu+\sup_{t\geq 1}|a_{t}|)}{T}.

Then, we obtain the explicit lower bound of the samples for period estimation:

Proposition 5.2.

Let 𝐚:=(at)t=1\mathbf{a}:=(a_{t})_{t=1}^{\infty} be a μ\mu-nearly periodic sequence of length LL. Fix a positive integer Lmax>1L_{\max}>1 with LLmaxL\leq L_{\max}, δ(0,1)\delta\in(0,1), ξ(0,1)\xi\in(0,1), and σ0>0\sigma_{0}>0. Let (ηt)t=1(\eta_{t})_{t=1}^{\infty} be a noise sequence satisfying Assumption 2. Put γ:=1/(1+4Lmax+1)\gamma:=1/(1+\sqrt{4L_{\max}+1}) and λ:=μ/(σ0γ)\lambda:=\mu/(\sigma_{0}\gamma). We define

ε\displaystyle\varepsilon :=σ0γξ.\displaystyle:=\sigma_{0}\gamma\xi.

If μ/(γξ)<σ0σL(𝐚)\mu/(\gamma\xi)<\sigma_{0}\leq\sigma_{L}(\mathbf{a}), then, for any

T8LmaxR2log(4/δ)σ02(ξλ)2+36Lmax5/2supt1|at|σ0(ξλ),T\geq\frac{8L_{\max}R^{2}\log(4/\delta)}{\sigma_{0}^{2}(\xi-\lambda)^{2}}+\frac{36L_{\max}^{5/2}\sup_{t\geq 1}|a_{t}|}{\sigma_{0}(\xi-\lambda)},

the set of rational numbers

ST,ε:={q(0,1):qL>0 and |((at+ηt)t=1T;q)|>ε}\displaystyle S_{T,\varepsilon}:=\left\{q\in\mathbb{Q}\cap(0,1):qL\in\mathbb{Z}_{>0}\text{ and }\left|\mathcal{R}\left((a_{t}+\eta_{t})_{t=1}^{T};q\right)\right|>\varepsilon\right\}

is non-empty with probability at least 1δ1-\delta.

If we apply several collections of rewards (rt(xt))t=1T(r_{t}(x_{t}))_{t=1}^{T} for sufficiently large TT indicated in Proposition 5.2, we obtain various divisors of LL. Finally, we provide the precise inputs and output of Algorithm 1 in the following Theorem:

Theorem 5.3.

Suppose Assumptions 1, 2 and 3 hold. Let r[0,1)r\in[0,1) be a non-negative real number, and suppose ρ>0\rho>0 and δ(0,1)\delta\in(0,1) are given. Fix a positive integer Lmax>1L_{\max}>1 with LLmaxL\leq L_{\max}. We define

ε\displaystyle\varepsilon :=ρ6dLmax.\displaystyle:=\frac{\rho}{6\sqrt{d}L_{\max}}. (5.4)

Assume that rεμr\varepsilon\geq\mu. Let TpT_{p} be an integer satisfying

Tp72dALmax2ρ2(1r)2+108BθdLmax3ρ(1r),\displaystyle T_{p}\geq\frac{72dAL_{\max}^{2}}{\rho^{2}(1-r)^{2}}+\frac{108B_{\theta}\sqrt{d}L_{\max}^{3}}{\rho(1-r)}, (5.5)

where A:=R2log(4dLmax2logLmax/δ)A:=R^{2}\log(4dL_{\max}^{2}\log L_{\max}/\delta). Then, the output L^\hat{L} of Algorithm 1 is a (ρ,d)(\rho,\sqrt{d})-anp of (ft(θ))t=t0(f^{t}(\theta))_{t=t_{0}}^{\infty} with probability at least 1δ1-\delta.

If μ\mu is sufficiently small, we may set rr as a small positive number, in particular r=0r=0 if the system is periodic.

Remark 5.4.

If random arm selection is adopted rather than the orthogonal basis, it may underestimate an error margin on some dimensions, which could lead to the nearly period with much larger error margin than expected; considering failure probability of such a case may potentially produce a variant of our algorithm.

5.2 Eigenvalue estimation

If the underlying system has certain structures, more detailed information about the system is expected to be obtained. In this section, we assume the following condition, linearity of the underlying dynamical system ff on Θ\Theta, in addition to Assumption 1, 2, and 3:

Assumption 4 (Linear dynamical systems).

The dynamical system f:ΘΘf:\Theta\to\Theta is linear and is represented by a matrix Md×dM\in\mathbb{R}^{d\times d}.

Let d=αVα\mathbb{C}^{d}=\oplus_{\alpha}V_{\alpha} be the decomposition via generalized eigenspaces of MM, where α\alpha runs over the eigenvalues of MM and Vα:=𝒩((αIM)d)V_{\alpha}:=\mathscr{N}((\alpha I-M)^{d}). We describe θ=αθα\theta=\sum_{\alpha}\theta_{\alpha} with θαVα\theta_{\alpha}\in V_{\alpha}. We remark that an eigenvalue α\alpha of MM such that θα0\theta_{\alpha}\neq 0 is in the form of e2πi/Le^{2\pi i\ell/L} unless |α|<1|\alpha|<1 by Proposition 4.2.

Our objective is to estimate some of, if not all of, the eigenvalues of MM with high probability within some error that decreases by the sample size. To this end, we define the meaningful subset of eigenvalues of MM.

Definition 5.2 ((θ,k)(\theta,k)-distinct eigenvalues).

For a vector θd\theta\in\mathbb{C}^{d} and k>0k\in\mathbb{Z}_{>0}, we define a (θ,k)(\theta,k)-distinct eigenvalue by an eigenvalue β\beta of MkM^{k} such that |β|=1|\beta|=1 and θβ0\theta_{\beta}\neq 0.

In our case, starting from a vector θ\theta, the effect of the eigenvalues that are not of (θ,d)(\theta,d)-distinct eigenvalues of MM may not be observable. Basically, once being able to ignore the effects of eigenvalues of magnitudes less than 11, the system becomes nearly periodic and we aim at estimating (θ,d)(\theta,d)-distinct eigenvalues as we obtain more samples.

Input: Effective sample size N>0N\in\mathbb{Z}_{>0}; threshold γ(N)\gamma(N)
      Output: Matrix A1(N)A^0(N)A_{1}(N)\hat{A}_{0}(N)^{\dagger}

1:Independently draw random unit vectors x~m,m[d]\tilde{x}_{m},~{}~{}m\in[d], from uniform distribution over the unit sphere in d\mathbb{R}^{d}.
2:Wait NN time steps.
3:for t=N+1,,N+2Nd2t=N+1,\cdots,N+2Nd^{2} do
4:     m0{(tN1)mod2d2}+1m_{0}\leftarrow\{(t-N-1)\mod 2d^{2}\}+1
5:     mm0/2dm\leftarrow\lceil m_{0}/2d\rceil
6:     Sample arm xt=x~mx_{t}=\tilde{x}_{m} and observe r~t:=rt(x~m)\tilde{r}_{t}:=r_{t}(\tilde{x}_{m}).
7:end for
8:Construct matrices A0(N)A_{0}(N) and A1(N)A_{1}(N) as in (5.6), respectively.
9:Obtain the low rank approximation A^0(N)\hat{A}_{0}(N) of A0(N)A_{0}(N) via SVD with the threshold γ(N)\gamma(N).
10:Output A1(N)A^0(N)A_{1}(N)\hat{A}_{0}(N)^{\dagger}.
Algorithm 2 Eigenvalue estimation

Our eigenvalue estimation algorithm is summarized in Algorithm 2; it maintains the following matrices. For N>0N\in\mathbb{Z}_{>0}, dd random unit vectors x~1,,x~d\tilde{x}_{1},\dots,\tilde{x}_{d}, and s=0,1s=0,1, we define the matrix As(N)d×dA_{s}(N)\in\mathbb{C}^{d\times d} so that its (k,)(k,\ell) element is given by

j=0N1r2(k1)d+sd+2d2j+N+(x~k)ei2πj24L.\displaystyle\sum_{j=0}^{N-1}r_{2(k-1)d+sd+2d^{2}j+N+\ell}(\tilde{x}_{k})e^{\frac{i2\pi j^{2}}{4L}}. (5.6)

That is, after NN steps, the reward multiplied by exp(i2πj2/4L)\exp(i2\pi j^{2}/4L) is placed from the top row of A0A_{0} and then the top row of A1A_{1}, followed by the second rows of them, and so on. Then, those values are summed up for every 2d22d^{2} steps or every jjth cycle. Here, after throwing away NN samples, the effects of eigenvalues of magnitude less than 11 become negligible, and the trajectory becomes nearly periodically behaved under Assumption 4. The rest of the samples is used to average out the observation noise while maintaining some meaningful information about MM.

The aforementioned exponential sum can be characterized by the Weyl-type sum of matrices, a key machinery for our algorithm, which we define below:

Definition 5.3.

Let WW be a linear space and let M1,,MNM_{1},\dots,M_{N} for N>0N\in\mathbb{Z}_{>0} be linear maps on WW. For L>0L\in\mathbb{Z}_{>0}, we define

𝒲((M1,,MN)):=1Nj=0N1Mj+1ei2πj24L.\mathscr{W}\left((M_{1},\dots,M_{N})\right):=\frac{1}{N}\sum_{j=0}^{N-1}M_{j+1}e^{\frac{i2\pi j^{2}}{4L}}.
Remark 5.5.

Let Es,n:=(η2di+N+sd+j+1+2d2n)i,j=0,,d1E_{s,n}:=(\eta_{2di+N+sd+j+1+2d^{2}n})_{i,j=0,\dots,d-1} be a noise matrix for s=0,1s=0,1. Let

X:=(x~1M1x2M2d+1x~dM2(d1)d+1) and K:=(Mθ,,Mdθ).\displaystyle X:=\left(\begin{array}[]{c}\tilde{x}_{1}^{\top}M^{1}\\ x_{2}^{\top}M^{2d+1}\\ \vdots\\ \tilde{x}_{d}^{\top}M^{2(d-1)d+1}\end{array}\right)\text{ and }K:=(M\theta,\dots,M^{d}\theta).

Then, As(N)A_{s}(N) has an alternative description as follows:

As(N)=X𝒲((M2d2j)j=0N1)Msd+N1K+𝒲((Es,j)j=0N1).\displaystyle A_{s}(N)=X\mathscr{W}\left((M^{2d^{2}j})_{j=0}^{N-1}\right)M^{sd+N-1}K+\mathscr{W}\left((E_{s,j})_{j=0}^{N-1}\right). (5.7)

As in Proposition 5.7 below, the Weyl-type sum has a crucial property. Define κ1\kappa\geq 1 by

κ:=infP{PP1:P1MP=JM},\displaystyle\kappa:={\inf_{P}{\left\{\|P\|\|P^{-1}\|:P^{-1}MP=J_{M}\right\}}},

where JMJ_{M} is a Jordan normal form of MM. Also, we define Δ(0,1]\Delta\in(0,1] to be a value such that, for any eigenvalue α\alpha of MM satisfying |α|<1|\alpha|<1, |α|1Δ|\alpha|\leq 1-\Delta (define Δ=1\Delta=1 if no such eigenvalue exists). Note by Proposition 4.2, the existence of such spectral gap is guaranteed without any further assumptions.

Roughly speaking, Algorithm 2 estimates “A1(N)A0(N)1=XMdX1A_{1}(N)A_{0}(N)^{-1}=XM^{d}X^{-1}”. Of course, the formula in “…” is not valid as XX, KK, and the Weyl-type sum are not necessarily invertible and we cannot recover full information of MdM^{d} in general. However, we can still reconstruct information of MdM^{d} restricted on the eigenspaces for (θ,d)(\theta,d)-distinct eigenvalues.

To see this, we introduce the Weyl sum and its lower bound:

Lemma 5.6 (Lower bound on the Weyl sum Bourgain (1993); Oh ).

Define the Weyl sum by

𝒲(N,b,q):=j=0Nei2π(j2b4q+j214q),\displaystyle\mathscr{W}(N,b,q):=\sum_{j=0}^{N}e^{i2\pi\left(j\frac{2b}{4q}+j^{2}\frac{1}{4q}\right)},

for some b,q>0,b<qb\in\mathbb{N},~{}q\in\mathbb{Z}_{>0},~{}b<q and N>0N\in\mathbb{Z}_{>0}. Then, for N16q2N\geq 16q^{2}, it holds that

|𝒲(N,b,q)|Ω(Nq).\displaystyle|\mathscr{W}(N,b,q)|\in\Omega\left(\frac{N}{\sqrt{q}}\right).
Proof.

It is immediate from Proposition 3.1 in Oh because gcd(1,4q)=1{\rm gcd}(1,4q)=1, 4q0mod44q\equiv 0\mod 4, and 2b2b is even. ∎

We define Cws(L)>0C_{\rm ws}(L)>0 by

Cws(L):=inf{C>0:C1<|1N𝒲(N,b,L)|<C for any N16L2,0b<L}.C_{\rm ws}(L):=\inf\left\{C>0:C^{-1}<\left|\frac{1}{N}\mathscr{W}(N,b,L)\right|<C\text{ for any }N\geq 16L^{2},0\leq b<L\right\}.

Then, we have the following proposition:

Proposition 5.7.

Let M1M_{1} and M<1M_{<1} be matrices as in Corollary 4.3. We define linear maps on WW by

QN(M1)\displaystyle Q_{N}(M_{1}) :=𝒲((M12d2j)j=0N1)\displaystyle:=\mathscr{W}\left((M_{1}^{2d^{2}j})_{j=0}^{N-1}\right)
QN(M<1)\displaystyle Q_{N}(M_{<1}) :=𝒲((M<12d2j)j=0N1).\displaystyle:=\mathscr{W}\left((M_{<1}^{2d^{2}j})_{j=0}^{N-1}\right).

Then, we have the following statements:

  1. 1.

    𝒲((Mθ2d2j)j=0N1)=QN(M1)+QN(M<1)\mathscr{W}\left((M_{\theta}^{2d^{2}j})_{j=0}^{N-1}\right)=Q_{N}(M_{1})+Q_{N}(M_{<1}),

  2. 2.

    for any N16L2N\geq 16L^{2}, QN(M1)Q_{N}(M_{1}) is invertible on W=1W_{=1},

  3. 3.

    for any r0r\geq 0 and for any N16L2N\geq 16L^{2}, M1rQN(M1)|W=1,(M1rQN(M1))|W=11κCws(L)\|M_{1}^{r}Q_{N}(M_{1})|_{W_{=1}}\|,\|(M_{1}^{r}Q_{N}(M_{1}))|_{W_{=1}}^{-1}\|\leq\kappa C_{\rm ws}(L),

  4. 4.

    for any r2(d1)r\geq 2(d-1), we have

    M<1rQN(M<1)d2κeΔ(rd+1)NΔd1.\|M_{<1}^{r}Q_{N}(M_{<1})\|\leq\frac{d^{2}\kappa e^{-\Delta(r-d+1)}}{N\Delta^{d-1}}.
Proof.

We prove 1. By the properties 1 and 2 in Corollary 4.3, we have 𝒲((M2d2j)j=0N1)=QN(M1)+QN(M<1)\mathscr{W}((M^{2d^{2}j})_{j=0}^{N-1})=Q_{N}(M_{1})+Q_{N}(M_{<1}). Next, we prove 2. When we regard QN(M1)|W=1Q_{N}(M_{1})|_{W_{=1}} as a linear map on W=1W_{=1}, it is represented as a diagonal matrix diag(𝒲(N,b1,L),,𝒲(N,bm,L)){\rm diag}(\mathscr{W}(N,b_{1},L),\dots,\mathscr{W}(N,b_{m},L)), where m=dimW=1m={\rm dim}W_{=1}. Therefore, by Lemma 5.6, QN(M1)Q_{N}(M_{1}) is a bijective linear map on W=1W_{=1}. Next, we prove 3. We estimate M1rQN(M1)\|M_{1}^{r}Q_{N}(M_{1})\|. Let p:dWp:\mathbb{C}^{d}\to W be the orthogonal projection and let i:Wdi:W\to\mathbb{C}^{d} be the inclusion map. Let M~1:=iM1pd×d\tilde{M}_{1}:=iM_{1}p\in\mathbb{C}^{d\times d}. Then, we have

QN(M1)=QN(M~1)κCws(L).\|Q_{N}(M_{1})\|=\|Q_{N}(\tilde{M}_{1})\|\leq\kappa C_{\rm ws}(L).

Next, we estimate M<1rQN(M<1)\|M_{<1}^{r}Q_{N}(M_{<1})\|. Let M~<1:=iM<1p\tilde{M}_{<1}:=iM_{<1}p. Then, iM<1rQN(M<1)p=𝒲((M~<12d2j+r)j=0N1)iM_{<1}^{r}Q_{N}(M_{<1})p=\mathscr{W}((\tilde{M}_{<1}^{2d^{2}j+r})_{j=0}^{N-1}) and iM<1rQN(M<1)p=M<1rQN(M<1)\|iM_{<1}^{r}Q_{N}(M_{<1})p\|=\|M_{<1}^{r}Q_{N}(M_{<1})\|. Let PP be a regular matrix such that

M~<1=PJP1,\tilde{M}_{<1}=PJP^{-1},

where JJ is the Jordan normal form. Then, for r2(d1)r\geq 2(d-1), we see that

M<1rQN(M<1)\displaystyle\|M_{<1}^{r}Q_{N}(M_{<1})\| =iM<1rQN(M<1)p\displaystyle=\|iM_{<1}^{r}Q_{N}(M_{<1})p\|
κNj=0N1J2d2j+r<d2κNj=0(2d2j+rd1)(1Δ)2d2j+rd+1\displaystyle\leq\frac{\kappa}{N}\sum_{j=0}^{N-1}\|J^{2d^{2}j+r}\|<\frac{d^{2}\kappa}{N}\sum_{j=0}^{\infty}\binom{2d^{2}j+r}{d-1}(1-\Delta)^{2d^{2}j+r-d+1}
d2κ(1Δ)rd+1NΔd1d2κeΔ(rd+1)NΔd1.\displaystyle\leq\frac{d^{2}\kappa(1-\Delta)^{r-d+1}}{N\Delta^{d-1}}\leq\frac{d^{2}\kappa e^{-\Delta(r-d+1)}}{N\Delta^{d-1}}.

The second last inequality is proved as follows: for |a|<1|a|<1, n1n\geq 1 and rm0r\geq m\geq 0,

|j=0(nj+rm)anjm+r|\displaystyle\left|\sum_{j=0}^{\infty}\binom{nj+r}{m}a^{nj-m+r}\right| =|1m!dmdxmj=0xnj+r|x=a|\displaystyle=\left|\left.\frac{1}{m!}\frac{d^{m}}{dx^{m}}\sum_{j=0}^{\infty}x^{nj+r}\right|_{x=a}\right|
1m!j=0m(mj)r!(rm+j)!|a|rm+j|djdxj11xn|x=a|\displaystyle\leq\frac{1}{m!}\sum_{j=0}^{m}\binom{m}{j}\frac{r!}{(r-m+j)!}|a|^{r-m+j}\left|\left.\frac{d^{j}}{dx^{j}}\frac{1}{1-x^{n}}\right|_{x=a}\right|
1m!j=0m(mj)r!(rm+j)!j!|a|rm+j(1|a|)j=j=0m(rj)|a|rm+j(1|a|)j,\displaystyle\leq\frac{1}{m!}\sum_{j=0}^{m}\binom{m}{j}\frac{r!}{(r-m+j)!}\frac{j!|a|^{r-m+j}}{(1-|a|)^{j}}=\sum_{j=0}^{m}\binom{r}{j}\frac{|a|^{r-m+j}}{(1-|a|)^{j}},

where, the last inequality follows from

|dmdxm11xn|=|ζn=1m!ζ1nn(ζx)m|m!(1|x|)m.\left|\frac{d^{m}}{dx^{m}}\frac{1}{1-x^{n}}\right|=\left|\sum_{\zeta^{n}=1}\frac{m!\zeta^{1-n}}{n(\zeta-x)^{m}}\right|\leq\frac{m!}{(1-|x|)^{m}}.

Thus, we have

j=0m(rj)|a|rm+j(1|a|)j|a|rmj=0r(rj)(|a|1|a|)j|a|rm(1|a|)m.\sum_{j=0}^{m}\binom{r}{j}\frac{|a|^{r-m+j}}{(1-|a|)^{j}}\leq|a|^{r-m}\sum_{j=0}^{r}\binom{r}{j}\left(\frac{|a|}{1-|a|}\right)^{j}\leq\frac{|a|^{r-m}}{(1-|a|)^{m}}.

Proposition 5.7 plays an essential role in our analysis and guarantees that the information in As(N)A_{s}(N) about (θ,d)(\theta,d)-distinct eigenvalues does not vanish while noise effects are canceled out. Now, we state our main theoretical result for the eigenvalue estimation algorithm. Before stating the theorem, we introduce lower rank approximation via the singular value threshold.

Definition 5.4.

Let An×mA\in\mathbb{C}^{n\times m} be a matrix. Let A=U(DO)VA=U\left(D~{}O\right)V^{*} be a singular valued decomposition of AA where Un×nU\in\mathbb{C}^{n\times n} and Vm×mV\in\mathbb{C}^{m\times m} are unitary matrices and D:=diag(σ1,,σr,𝐨)D:={\rm diag}(\sigma_{1},\dots,\sigma_{r},\mathbf{o}^{\top}) is a diagonal matrix with nonnegative components. Let γ>0\gamma>0. We define a low rank approximation AγA_{\gamma} of AA via the singular value threshold γ\gamma by the matrix AγA_{\gamma} defined by Aγ=U(DγO)VA_{\gamma}=U\left(D_{\gamma}~{}O\right)V^{*}, where, Dγ:=diag(𝟏[γ,)(σ1)σ1,,𝟏[γ,)(σr)σr,𝐨)D_{\gamma}:={\rm diag}(\mathbf{1}_{[\gamma,\infty)}(\sigma_{1})\sigma_{1},\dots,\mathbf{1}_{[\gamma,\infty)}(\sigma_{r})\sigma_{r},\mathbf{o}^{\top}) and 𝟏I\mathbf{1}_{I} is defined to be the characteristic function supported on II\subset\mathbb{R}.

Given this definition, we are ready to present the following main result:

Theorem 5.8.

Suppose Assumptions 1, 2, 3, and 4 hold. Given δ(0,1]\delta\in(0,1], let the effective sample size

Nmax{16L2,(d1)logΔΔ+log(Bθκ2)+d+6Δ+d},\displaystyle N\geq\max\left\{16L^{2},\frac{-(d-1)\log\Delta}{\Delta}+\frac{\log(B_{\theta}\kappa^{2})+d+6}{\Delta}+d\right\}, (5.8)

and γ(N)=(4d2R2log(4d2/δ)+1)/N\gamma(N)={(\sqrt{4d^{2}R^{2}\log\left(4d^{2}/\delta\right)}+1)}/{\sqrt{N}}. Then, there exists a matrix AA whose eigenvalues are zeros except for (θ,d)(\theta,d)-distinct eigenvalues of MM, such that the output of Algorithm 2, i.e. A1(N)A^0(N)A_{1}(N)\hat{A}_{0}(N)^{\dagger}, satisfies, with probability at least 1δ1-\delta, that

AA1(N)A^0(N)C(R2(log(1/δ)+1)+1N).\displaystyle\left\|A-A_{1}(N)\hat{A}_{0}(N)^{\dagger}\right\|\leq C\left(\frac{R^{2}\left(\log\left(1/\delta\right)+1\right)+1}{\sqrt{N}}\right). (5.9)

Here, A^0(N)\hat{A}_{0}(N)^{\dagger} is the Moore-Penrose pseudo inverse of a lower rank approximation of A0(N)A_{0}(N) via the singular value threshold γ(N)\gamma(N). The constant C>0C>0 depends on θ\theta, MM, dd, (x~m)m=1d(\tilde{x}_{m})_{m=1}^{d}, and Cws(L)C_{\rm ws}(L).

We mention that by using the results shown in Song (2002), the bound on spectral norm (5.9) can be translated to the bounds on eigenvalues, where the constant depends on the form of AA. As described in Theorem 5.8, the constant is not the absolute constant for any problem instance but depends on several factors; however, for the same execution, this rate is useful to judge how many samples one collects to estimate eigenvalues.

Remark 5.9.

We note that we can reconstruct the (θ,1)(\theta,1)-distinct eigenvalues of MM via Algorithm 2 using the following trick: Fix non-negative integer r0r\geq 0. Take d+rd+r random unit vectors x~1,,x~d+r\tilde{x}_{1},\dots,\tilde{x}_{d+r}. Then, for s=0,1s=0,1, we may define a matrix As(N;r)(d+r)×(d+r)A_{s}(N;r)\in\mathbb{C}^{(d+r)\times(d+r)} so that its (k,)(k,\ell) element is given by

j=0N1r2(k1)(d+r)+s(d+r)+2(d+r)2j+N+(x~k)ei2πj24L.\sum_{j=0}^{N-1}r_{2(k-1)(d+r)+s(d+r)+2(d+r)^{2}j+N+\ell}(\tilde{x}_{k})e^{\frac{i2\pi j^{2}}{4L}}.

Then, we see that Algorithm 2 outputs a matrix A~(r)\tilde{A}(r) that well approximates the (θ,d+r)(\theta,d+r)-distinct eigenvalues of MM since the matrix As(N;r)A_{s}(N;r) coincides with As(N)A_{s}(N) in the case when we replace MM and θ\theta with M(r)M(r) and θ(r)\theta(r) defined by

M(r):=(MOOO)(d+r)×(d+r),θ(r):=(θ𝐨)d+r.\displaystyle M(r):=\left(\begin{array}[]{cc}M&O\\ O&O\end{array}\right)\in\mathbb{C}^{(d+r)\times(d+r)},\theta(r):=\left(\begin{array}[]{c}\theta\\ \mathbf{o}\\ \end{array}\right)\in\mathbb{C}^{d+r}.

Let rr be an integer such that r+dr+d is prime to LL and fix a positive integer mm such that m(r+d)1modLm(r+d)\equiv 1\mod L. Then, the eigenvalues of A~(r)m\tilde{A}(r)^{m} is close to those of (θ,1)(\theta,1)-distinct eigenvalues if we take sufficiently large NN.

6 Simulated experiments

In this section, we present simulated experiments that complement the theoretical claims. In particular, we conducted period estimations for an instance of LifeGame Conway et al. (1970), which is a special case of cellular automata Von Neumann et al. (1966), and for a nearly periodic toy system, and an eigenvalue estimation for a linear system, where some dimensions are for permutations and the rest is for shrinking.

Period estimation: LifeGame.

Refer to caption
Figure 5: Left: Illustration of a period eight instance of LifeGame; (top) original transitions. (down) an instance of noisy observation. Right: μ\mu-nearly periodic dynamics (6.1).

We use a specific instance of LifeGame which is illustrated in Figure 5. As shown on the top eight pictures, starting from certain configuration of cells, it shows transitions of period eight. The sample size is computed as the smallest integer satisfying (5.5), and the threshold ε\varepsilon is given by (5.4). To prevent the dimension from becoming too large, we used five cells that correctly display period eight; that is d=5d=5. Noise ηt\eta_{t} is given by i.i.d. Gaussian with proxy R=0.3R=0.3, and the down eight pictures of Figure 5 are some instances of noisy observations. We tested 50 different random seeds (i.e., 1, 51, 101, 151, … , 2451), and computed the error rate (the number of runs producing a wrong estimate other than the fundamental period eight, which is divided by 50); and it was zero.

Period estimation: Simple μ\mu-nearly periodic system. We consider the following μ\mu-nearly periodic system that circulates over a circle with small variations:

rt+1=μ(αrt1μαrt1μ)+1,θt+1=θt+2πL,\displaystyle r_{t+1}=\mu\left(\alpha\frac{r_{t}-1}{\mu}-\lceil\alpha\frac{r_{t}-1}{\mu}\rceil\right)+1,~{}~{}~{}~{}~{}\theta_{t+1}=\theta_{t}+\frac{2\pi}{L}, (6.1)

where rr and θ\theta are the radius and angle, and α\alpha\notin\mathbb{Q}. We use μ=0.001\mu=0.001, L=5L=5, and α=π\alpha=\pi. Noise ηt\eta_{t} is drawn i.i.d. from the uniform distribution within [R,R][-R,R] for R=0.3R=0.3. We tested 50 different random seeds (i.e., 1, 51, 101, 151, … , 2451), and computed the error rate; and it was zero.

Eigenvalue estimation: Permutation and shrink. We use d=5d=5, and M5×5M\in\mathbb{R}^{5\times 5} is made such that 1) the first four dimensions are for permutation (i.e., each of row and column of 4×44\times 4 sub-matrix has only one nonzero element that is one.), and 2) the last dimension is simply shrinking; we gave 0.70.7 for (5,5)(5,5)-element of MM. Initial vector θ0\theta_{0} and each arm x~m,m[d],\tilde{x}_{m},~{}m\in[d], are uniformly sampled from the unit sphere in 5\mathbb{R}^{5}. The value LL is computed by 4!=244!=24. We used the smallest integer NN that satisfies (5.8), multiplied by Csim>0C_{\rm sim}>0. The results are shown in Table 1; it is observed that the more samples we use the more accurate the estimates become to (θ0,5)(\theta_{0},5)-distinct eigenvalues of MM. Noise ηt\eta_{t} is drawn i.i.d. from the uniform distribution within [R,R][-R,R] for R=0.3R=0.3, and the Table 1 is of the random seed 1234. We also tested 50 different random seeds (i.e., 1, 51, 101, 151, … , 2451) for Csim=30C_{\rm sim}=30, and computed the mean absolute error between the true (θ0,5)(\theta_{0},5)-distinct eigenvalues and their nearest estimated values; it was 0.00190.0019, which is sufficiently small.

Table 1: Results for the eigenvalue estimation. The top-most row shows the true eigenvalues of M5M^{5}, and the second row shows its (θ0,5)(\theta_{0},5)-distinct eigenvalues. From the third to sixth row, it shows the estimated eigenvalues for different values of CsimC_{\rm sim}.
eigenvalues of M5M^{5} 1.0001.000 1.0001.000 0.5000.866i-0.500-0.866i 0.500+0.866i-0.500+0.866i 0.1680.168
(θ0,5)(\theta_{0},5) 1.0001.000 0 0.5000.866i-0.500-0.866i 0.500+0.866i-0.500+0.866i 0
when Csim=1C_{\rm sim}=1 1.000+0.008i1.000+0.008i 0 0.4890.860i-0.489-0.860i 0.510+0.869i-0.510+0.869i 0
when Csim=5C_{\rm sim}=5 0.9990.000i0.999-0.000i 0 0.4950.867i-0.495-0.867i 0.501+0.862i-0.501+0.862i 0
when Csim=10C_{\rm sim}=10 1.000+0.003i1.000+0.003i 0 0.4970.867i-0.497-0.867i 0.501+0.864i-0.501+0.864i 0
when Csim=30C_{\rm sim}=30 1.000+0.001i1.000+0.001i 0 0.5000.866i-0.500-0.866i 0.500+0.864i-0.500+0.864i 0

Attempt on applications to more realistic data: Worm simulation. Finally, we present an attempt on applications to more realistic situation, which will demonstrate some of the important aspects that need to be taken into account when naively applying our algorithms to the real world problems. In particular, we use OpenWorm (https://openworm.org/) (Szigeti et al., 2014) to simulate a worm motion (we ran more than around 1010 hours for simulations); example images from this simulation is shown in Figure 6 (left). It is known that the behavior of worm has some periodicity (e.g., Ahamed et al. (2021)). We resized the images to 6×66\times 6 (i.e., d=36d=36), and downsampled the video by extracting one per eight images. Also, we repeat the same video in the way that the entire video still shows smooth flow (i.e., cut the video so that the end image is similar to the image in the beginning, and concatenate the same videos) so that we can sample a large number of data.

Refer to caption
Figure 6: Left: OpenWorm (https://openworm.org/) simulation of a worm motion. Middle: The extracted sequence of images with 6×66\times 6 resolution for our experiment. Right: Diagram of how the worm motion is viewed as a (nearly) periodic motion.

First, we assume we have no access to reasonable values for the hyperparameters, and show what happens when we use hyperparameters that do not reflect the target system. With such a choice of parameters (see Appendix F.4), the algorithm outputs an estimate that contradicts to the hyperparemters; with the value Lmax=25L_{\rm max}=25, it outputs the estimate 4040. In particular, there exist dimensions that output the estimates 88 and 2020, from which the final estimate becomes 4040. Note that, in this experiment, we use a fixed number of samples TpT_{p} since we have no access to the reasonable hyperparameters. On the other hand, we increased the margin ρ\rho significantly, and then the output becomes 20Lmax20\leq L_{\rm max} where each dimension shows 11 or 2020 in this case. In fact, one segment of our video data (recall we concatenate the same video to create a sequence of an indefinite number of images) consists of 120120 images and contains 66 periods of sequence.

We mention that the aforementioned fact implies that the choice of hyperparameters may be practically validated by confirming that the outputs are consistent to the parameters.

Next, we assume that the system is linear and run the DMD over this canonical basis of the state space. Let MDMDM_{\rm DMD} be the solution to the exact DMD over the video data, and we compute the eigenvalues of MDMDdM^{d}_{\rm DMD} and of the output of Algorithm 2. We tested the algorithm with the fixed hyperparameters given in Appendix F.4 and with varying number of NN, namely, 10310^{3}, 10410^{4}, and 10510^{5}. For the cases of 10310^{3} and 10410^{4}, we only obtained one eigenvalue which is close to 1+0i1+0i and the others were almost zero. For N=105N=10^{5}, we obtained three nonzero eigenvalues which are plotted in Figure 7 against the top four eigenvalues of MDMDdM^{d}_{\rm DMD}. The actual nonzero eigenvalues and the estimated errors for all three cases are given in Table 2.

From the results, we argue that, for some real applications where the system is not exactly linear, the eigenvalue estimation algorithm may not work well although it does not output insane values.

[Uncaptioned image]
Figure 7: Estimated eigenvalues having large magnitudes (red) and the top four DMD eigenvalues (blue).
NN Nonzero eigenvalues Errors
10310^{3} 1.000+0.002i1.000+0.002i 0.1990.199
10410^{4} 1.000+0.002i1.000+0.002i 0.1990.199
1.001+0.001i1.001+0.001i
10510^{5} 0.288+0.884i0.288+0.884i 0.1720.172
0.290+0.867i0.290+0.867i
Table 2: Estimated nonzero eigenvalues for N=103,104,105N=10^{3},10^{4},10^{5}, and the total errors to the true DMD matrix MDMDdM^{d}_{\rm DMD}.

7 Discussion and conclusion

This work proposed novel algorithms for estimating periodic information about the dynamical systems from bandit feedback contaminated by a sub-Gaussian noise. Since we consider a rather novel problem setup, the setting itself would be seen as a limitation of our work, and we present a potentially important list of future works below.

Choice of hyperparameters such as Lmax,NL_{\rm max},N: If those hyperparameters are correctly chosen for nearly periodic systems, the outputs of our algorithms are consistent and reasonable across runs, which follows the theoretical insights; therefore, practically, one can check if the hyperparameters are properly chosen by running the algorithms. Refer to Section 6 for related discussions; in short, inappropriate choice of hyperparameters leads to contradicting outputs (e.g., period estimate larger than the assumed maximum value). On the other hand, studying if it is possible to theoretically ensure correctness of hyperparameters (e.g., by standard doubling trick (Cesa-Bianchi et al., 1997) for efficient guessing) or to identify non-periodic systems is an important future work.

Extension to general spectrum information estimation problems: We only impose nearly periodicity on the dynamical systems on top of the linearity for the eigenvalue estimation problem. Nevertheless, additional studies are needed to allow other forms of observations (not limited to bandit feedback) and non-periodic systems to consider eigenvalues with arguments of 2π2\pi times irrational numbers. Actually, the asymptotic bounds of the Weyl sum themselves are still valid when it is sufficiently well approximated by a rational number (Bourgain, 1993; Oh, ). Moreover, allowing control inputs would make eigenvalues of magnitudes smaller than one efficiently recoverable.

Extensions to random dynamical systems: This work studied deterministic dynamical systems; however, we conjecture that, under the condition that the variance of trajectories generated by a random dynamical system (RDS) (cf. Arnold (1998)) is sufficiently small, the similar estimation procedure is adopted for such RDSs. It is important to study if the estimation problem becomes easier when the system is driven by a particular noise as it could be treated as a random control input. Also, studying other dynamic structures such as the Lyapunov exponents for nonlinear systems is an interesting future work.

Study of statistical estimation leveraged by other number theoretical results: The proper use of exponential sums enables us to average out the noise while preserving particular information. Studying when this separation is feasible for different sets of information, noise, and class of problems should be important.

Optimality of the results: Although we gave a sufficient number of samples for provably guaranteeing (approximate) correctness of the estimates, it is unclear if our sample complexity is what one can best achieve under particular problem settings. Recall that the current problem settings without an assumption on boundedness of the set of arms DD exclude the lower bound arguments of the minimally required samples as mentioned in Remark 4.4.

As such, instead of arguing the tightness, to justify the complexity of our algorithms, we hence mention that the very naive approach for (nearly) period estimation would cost the order of Lmax!L_{\rm max}! sample complexity. To see this, suppose we know the maximum possible (nearly) period LmaxL_{\rm max} then it follows that Lmax!L_{\rm max}! is a true (nearly) period as a multiple of the fundamental (nearly) period. As such, one could employ the concentration of measures to average out the sample. With sufficiently many sequences of length Lmax!L_{\rm max}! depending on the rate of concentration, one could approximately reconstruct the original (noiseless) sequence of length Lmax!L_{\rm max}! from which one could estimate a desired LLmaxL\leq L_{\rm max} which is an (aliquot) nearly period.

Lower bound of the minimally required samples may be discussed by using similar arguments to the best arm identification problems (cf. Kaufmann et al. (2016)); however, our problems consider dynamical systems, and we have no clue on potential technical tools to deal with the arguments at this point.

Acknowledgments

We thank the constructive comments by anonymous reviewers for improving this work. Motoya Ohnishi thanks Sham Kakade, Yoshinobu Kawahara, Emanuel Todorov, Jacob Sacks, Tomoharu Iwata, and Hiroya Nakao for valuable discussions and thanks Sham Kakade for computational supports. This work of Motoya Ohnishi, Isao Ishikawa, and Masahiro Ikeda was supported by JST CREST Grant, Number JPMJCR1913, including AIP challenge program, Japan. Also, Motoya Ohnishi is supported in part by Funai Overseas Fellowship. Isao Ishikawa is supported by JST ACT-X Grant, Number JPMJAX2004, Japan. This work of Yuko Kuroki was supported by Microsoft Research Asia and JST ACT-X JPMJAX200E.

References

  • Abbasi-Yadkori et al. (2011) Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Improved algorithms for linear stochastic bandits. Advances in neural information processing systems, 24, 2011.
  • Abe & Long (1999) N. Abe and P. M. Long. Associative reinforcement learning using linear probabilistic concepts. In ICML, pp.  3–11, 1999.
  • Adams et al. (2021) J. Adams, N. Hansen, and K. Zhang. Identification of partially observed linear causal models: Graphical conditions for the non-Gaussian and heterogeneous cases. Advances in Neural Information Processing Systems, 34, 2021.
  • Aerts et al. (2018) C. Aerts, G. Molenberghs, M. Michielsen, MG. Pedersen, R. Björklund, C. Johnston, JSG. Mombarg, DM. Bowman, B. Buysschaert, PI. Pápics, et al. Forward asteroseismic modeling of stars with a convective core from gravity-mode oscillations: parameter estimation and stellar model selection. The Astrophysical Journal Supplement Series, 237(1):15, 2018.
  • Ahamed et al. (2021) T. Ahamed, A. C. Costa, and G. J. Stephens. Capturing the continuous complexity of behaviour in Caenorhabditis elegans. Nature Physics, 17(2):275–283, 2021.
  • Allen & Kanamori (2003) R. M. Allen and H. Kanamori. The potential for earthquake early warning in southern California. Science, 300(5620):786–789, 2003.
  • Anscombe (1963) F. J. Anscombe. Sequential medical trials. Journal of the American Statistical Association, 58(302):365–383, 1963.
  • Arkhipov et al. (2004) G. I. Arkhipov, V. N. Chubarikov, and A. A. Karatsuba. Trigonometric Sums in Number Theory and Analysis. De Gruyter, Berlin, New York, 2004.
  • Arnold (1998) L. Arnold. Random dynamical systems. Springer-Verlag, 1998.
  • Auer (2003) P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397–422, 2003.
  • Auer et al. (2019) P. Auer, Y. Chen, P. Gajane, C. Lee, H. Luo, R. Ortner, and C. Wei. Achieving optimal dynamic regret for non-stationary bandits without prior information. In Conference on Learning Theory, pp.  159–163. PMLR, 2019.
  • Bakarji et al. (2022) J. Bakarji, K. Champion, J. N. Kutz, and S. L. Brunton. Discovering governing equations from partial measurements with deep delay autoencoders. arXiv preprint arXiv:2201.05136, 2022.
  • Bennett & Kallus (2021) A. Bennett and N. Kallus. Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes. arXiv preprint arXiv:2110.15332, 2021.
  • Besbes et al. (2014) O. Besbes, Y. Gur, and A. Zeevi. Stochastic multi-armed-bandit problem with non-stationary rewards. Advances in neural information processing systems, 27, 2014.
  • Bezanson et al. (2017) J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to numerical computing. SIAM review, 59(1):65–98, 2017.
  • Bhouri & Perdikaris (2021) M. A. Bhouri and P. Perdikaris. Gaussian processes meet NeuralODEs: A Bayesian framework for learning the dynamics of partially observed systems from scarce and noisy data. arXiv preprint arXiv:2103.03385, 2021.
  • Bourgain (1993) J. Bourgain. Fourier transform restriction phenomena for certain lattice subsets and applications to nonlinear evolution equations. Geometric & Functional Analysis GAFA, 3(3):209–262, 1993.
  • Bubeck & Cesa-Bianchi (2012) S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning, 5:1–122, 2012.
  • Cai et al. (2021) H. Cai, Z. Cen, L. Leng, and R. Song. Periodic-GP: Learning periodic world with Gaussian process bandits. arXiv preprint arXiv:2105.14422, 2021.
  • Candes et al. (2006) E. J. Candes, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
  • Casdagli et al. (1991) M. Casdagli, S. Eubank, J. D. Farmer, and J. Gibson. State space reconstruction in the presence of noise. Physica D: Nonlinear Phenomena, 51(1-3):52–98, 1991.
  • Cesa-Bianchi et al. (1997) N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.
  • Chen et al. (2019) Y. Chen, C. Lee, H. Luo, and C. Wei. A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free. In Conference on Learning Theory, pp.  696–726. PMLR, 2019.
  • Chesneau & Bagul (2020) C. Chesneau and Y. J. Bagul. A note on some new bounds for trigonometric functions using infinite products. Malaysian Journal of Mathematical Sciences, 14:273–283, 07 2020.
  • Cheung et al. (2022) W. Cheung, D. Simchi-Levi, and R. Zhu. Hedging the drift: Learning to optimize under nonstationarity. Management Science, 68(3):1696–1713, 2022.
  • Conway et al. (1970) J. Conway et al. The game of life. Scientific American, 223(4):4, 1970.
  • Couillet & Debbah (2011) R. Couillet and M. Debbah. Random matrix methods for wireless communications. Cambridge University Press, 2011.
  • Curi et al. (2020) S. Curi, F. Berkenkamp, and A. Krause. Efficient model-based reinforcement learning through optimistic policy search and planning. Advances in Neural Information Processing Systems, 33:14156–14170, 2020.
  • Dani et al. (2008) V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. In 21st Annual Conference on Learning Theory, pp.  355–366, 2008.
  • Derevyanko et al. (2016) S. A. Derevyanko, J. E. Prilepsky, and S. K. Turitsyn. Capacity estimates for optical transmission based on the nonlinear Fourier transform. Nature communications, 7(1):12710, 2016.
  • Donoho (2006) D. L. Donoho. Compressed sensing. IEEE Trans. Information Theory, 52(4):1289–1306, 2006.
  • Duarte et al. (2008) M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2):83–91, 2008.
  • Eldar & Kutyniok (2012) Y. C. Eldar and G. Kutyniok. Compressed sensing: theory and applications. Cambridge university press, 2012.
  • Furusawa & Kaneko (2012) C. Furusawa and K. Kaneko. A dynamical-systems view of stem cell biology. Science, 338(6104):215–217, 2012.
  • Hazan (2016) E. Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
  • Hazan et al. (2018) E. Hazan, H. Lee, K. Singh, C. Zhang, and Y. Zhang. Spectral filtering for general linear dynamical systems. Advances in Neural Information Processing Systems, 31, 2018.
  • Hughes et al. (2017) M. E. Hughes, K. C. Abruzzi, R. Allada, R. Anafi, A. B. Arpat, G. Asher, P. Baldi, C. De Bekker, D. Bell-Pedersen, J. Blau, et al. Guidelines for genome-scale analysis of biological rhythms. Journal of Biological Rhythms, 32(5):380–393, 2017.
  • Kakade et al. (2020) S. Kakade, A. Krishnamurthy, K. Lowrey, M. Ohnishi, and W. Sun. Information theoretic regret bounds for online nonlinear control. Advances in Neural Information Processing Systems, 33:15312–15325, 2020.
  • Katz (1990) N. M. Katz. Exponential Sums and Differential Equations. (AM-124). Princeton University Press, 1990.
  • Kaufmann et al. (2016) E. Kaufmann, O. Cappé, and A. Garivier. On the complexity of best-arm identification in multi-armed bandit models. The Journal of Machine Learning Research, 17(1):1–42, 2016.
  • Kutz et al. (2016) J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor. Dynamic mode decomposition: data-driven modeling of complex systems. SIAM, 2016.
  • Lale et al. (2020) S. Lale, K. Azizzadenesheli, B. Hassibi, and A. Anandkumar. Logarithmic regret bound in partially observable linear dynamical systems. Advances in Neural Information Processing Systems, 33:20876–20888, 2020.
  • Lattimore & Szepesvári (2020) T. Lattimore and C. Szepesvári. Bandit algorithms. Cambridge University Press, 2020.
  • Lee et al. (2020) A. X. Lee, A. Nagabandi, P. Abbeel, and S. Levine. Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model. Advances in Neural Information Processing Systems, 33:741–752, 2020.
  • Lee (2022) H. Lee. Improved rates for prediction and identification of partially observed linear dynamical systems. In International Conference on Algorithmic Learning Theory, pp.  668–698. PMLR, 2022.
  • Lee & Zhang (2020) H. Lee and C. Zhang. Robust guarantees for learning an autoregressive filter. In Algorithmic Learning Theory, pp.  490–517. PMLR, 2020.
  • Li et al. (2010) L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proc. International Conference on World Wide Web, pp. 661–670, 2010.
  • Ljung (2010) L. Ljung. Perspectives on system identification. Annual Reviews in Control, 34(1):1–12, 2010.
  • Luo et al. (2018) H. Luo, C. Wei, A. Agarwal, and J. Langford. Efficient contextual bandits in non-stationary worlds. In Conference on Learning Theory, pp.  1739–1776. PMLR, 2018.
  • Mania et al. (2022) H. Mania, M. I. Jordan, and B. Recht. Active learning for nonlinear system identification with guarantees. Journal of Machine Learning Research, 23(32):1–30, 2022.
  • Menda et al. (2020) K. Menda, J. De B., J. Gupta, I. Kroo, M. Kochenderfer, and Z. Manchester. Scalable identification of partially observed systems with certainty-equivalent EM. In ICML, pp.  6830–6840, 2020.
  • Meng & Zheng (2010) L. Meng and B. Zheng. The optimal perturbation bounds of the Moore-Penrose inverse under the Frobenius norm. Linear algebra and its applications, 432(4):956–963, 2010.
  • Mhammedi et al. (2020) Z. Mhammedi, D. J. Foster, M. Simchowitz, D. Misra, W. Sun, A. Krishnamurthy, A. Rakhlin, and J. Langford. Learning the linear quadratic regulator from nonlinear observations. Advances in Neural Information Processing Systems, 33:14532–14543, 2020.
  • Moore et al. (2014) A. Moore, T. Zielinski, and A. J. Millar. Online period estimation and determination of rhythmicity in circadian data, using the BioDare data infrastructure. In Plant Circadian Networks, pp.  13–44. Springer, 2014.
  • Oh et al. (2019) S. Oh, A. M. Appavoo, and S. Gilbert. Periodic bandits and wireless network selection. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), pp.  149:1–149:15, 2019.
  • (56) T. Oh. Note on a lower bound of the Weyl sum in Bourgain’s NLS paper (GAFA’93).
  • Ohnishi et al. (2024) M. Ohnishi, I. Ishikawa, K. Lowrey, M. Ikeda, S. M. Kakade, and Y. Kawahara. Koopman spectrum nonlinear regulators and efficient online learning. Transactions on Machine Learning Research, 2024.
  • Ouala et al. (2020) S. Ouala, D. Nguyen, L. Drumetz, B. Chapron, A. Pascual, F. Collard, L. Gaultier, and R. Fablet. Learning latent dynamics for partially observed chaotic systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30, 2020.
  • Rathje et al. (1998) E. M. Rathje, N. A. Abrahamson, and J. D. Bray. Simplified frequency content estimates of earthquake ground motions. Journal of Geotechnical and Geoenvironmental Engineering, 124(2):150–159, 1998.
  • Robbins (1952) H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527–535, 1952.
  • Russac et al. (2019) Y. Russac, C. Vernade, and O. Cappé. Weighted linear bandits for non-stationary environments. Advances in Neural Information Processing Systems, 32, 2019.
  • Sabetta & Pugliese (1996) F. Sabetta and A. Pugliese. Estimation of response spectra and simulation of nonstationary earthquake ground motions. Bulletin of the Seismological Society of America, 86(2):337–352, 1996.
  • Serre (1977) J.-P. Serre. Linear Representations of Finite Groups. Springer New York, NY, 1977. doi: https://doi.org/10.1007/978-1-4684-9458-7.
  • Simchowitz & Foster (2020) M. Simchowitz and D. Foster. Naive exploration is optimal for online LQR. In ICML, pp.  8937–8948. PMLR, 2020.
  • Simchowitz et al. (2019) M. Simchowitz, R. Boczar, and B. Recht. Learning linear dynamical systems with semi-parametric least squares. In Conference on Learning Theory, pp.  2714–2802. PMLR, 2019.
  • Sokolove & Bushell (1978) P. G. Sokolove and W. N. Bushell. The Chi square periodogram: its utility for analysis of circadian rhythms. Journal of Theoretical Biology, 72(1):131–160, 1978.
  • Song (2002) Y. Song. A note on the variation of the spectrum of an arbitrary matrix. Linear algebra and its applications, 342(1-3):41–46, 2002.
  • Subramanian et al. (2022) J. Subramanian, A. Sinha, R. Seraj, and A. Mahajan. Approximate information state for approximate planning and reinforcement learning in partially observed systems. Journal of Machine Learning Research, 23(12):1–83, 2022.
  • Summers et al. (2020) C. Summers, K. Lowrey, A. Rajeswaran, S. Srinivasa, and E. Todorov. Lyceum: An efficient and scalable ecosystem for robot learning. In Learning for Dynamics and Control, pp.  793–803. PMLR, 2020.
  • Szigeti et al. (2014) B. Szigeti, P. Gleeson, M. Vella, S. Khayrulin, A. Palyanov, J. Hokanson, M. Currie, M. Cantarelli, G. Idili, and S. Larson. OpenWorm: an open-science approach to modeling Caenorhabditis elegans. Frontiers in Computational Neuroscience, 8:137, 2014.
  • Takens (1981) F. Takens. Detecting strange attractors in turbulence. In Dynamical systems and turbulence, pp.  366–381. Springer, 1981.
  • Tenneti & Vaidyanathan (2015) S. V. Tenneti and P. Vaidyanathan. Nested periodic matrices and dictionaries: New signal representations for period estimation. IEEE Transactions on Signal Processing, 63(14):3736–3750, 2015.
  • Trovo et al. (2020) F. Trovo, S. Paladino, M. Restelli, and N. Gatti. Sliding-window Thompson sampling for non-stationary settings. Journal of Artificial Intelligence Research, 68:311–364, 2020.
  • Tsiamis & Pappas (2019) A. Tsiamis and G. J. Pappas. Finite sample analysis of stochastic system identification. In IEEE CDC, pp.  3648–3654, 2019.
  • Tsiamis et al. (2020) A. Tsiamis, N. Matni, and G. Pappas. Sample complexity of Kalman filtering for unknown systems. In Learning for Dynamics and Control, pp.  435–444. PMLR, 2020.
  • Tu (2013) J. H. Tu. Dynamic mode decomposition: Theory and applications. PhD thesis, Princeton University, 2013.
  • Turitsyn et al. (2017) S. K. Turitsyn, J. E. Prilepsky, S. T. Le, S. Wahls, L. L. Frumin, M. Kamalian, and S. A. Derevyanko. Nonlinear Fourier transform for optical data processing and transmission: advances and perspectives. Optica, 4(3):307–322, 2017.
  • Uy & Peherstorfer (2021) W. I. T. Uy and B. Peherstorfer. Operator inference of non-Markovian terms for learning reduced models from partially observed state trajectories. Journal of Scientific Computing, 88(3):1–31, 2021.
  • Van De Beek et al. (1995) J. Van De Beek, O. Edfors, M. Sandell, S. K. Wilson, and P. O. Borjesson. On channel estimation in ofdm systems. In IEEE 45th Vehicular Technology Conference. Countdown to the Wireless Twenty-First Century, volume 2, pp.  815–819, 1995.
  • Von Neumann et al. (1966) J. Von Neumann, A. W. Burks, et al. Theory of self-reproducing automata. IEEE Transactions on Neural Networks, 5(1):3–14, 1966.
  • Walters (2000) P. Walters. An introduction to ergodic theory, volume 79. Springer Science & Business Media, 2000.
  • Wedin (1973) P. Wedin. Perturbation theory for pseudo-inverses. BIT Numerical Mathematics, 13(2):217–232, 1973.
  • Weyl (1916) H. Weyl. Über die gleichverteilung von zahlen mod eins. Math. Ann., 77:31–352, 1916.
  • Williams et al. (2015) M. O. Williams, I. G. Kevrekidis, and C. W. Rowley. A data–driven approximation of the Koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25:1307–1346, 2015.
  • Wolfe (2006) C. J. Wolfe. On the properties of predominant-period estimators for earthquake early warning. Bulletin of the Seismological Society of America, 96(5):1961–1965, 2006.
  • Wu et al. (2018) Q. Wu, N. Iyer, and H. Wang. Learning contextual bandits in a non-stationary environment. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp.  495–504, 2018.
  • Zielinski et al. (2014) T. Zielinski, A. M. Moore, E. Troup, K. J. Halliday, and A. J. Millar. Strengths and limitations of period estimation methods for circadian data. PloS one, 9(5):e96462, 2014.

Appendix A Proof of Proposition 5.2

Proof.
B\displaystyle B :=Lmaxsupt>0|at|,\displaystyle:=L_{\max}\sup_{t>0}|a_{t}|,
C\displaystyle C :=4R2log(4/δ),\displaystyle:=\sqrt{4R^{2}\log(4/\delta)},
D\displaystyle D :=Lmax2𝒞0(μ+supt1|at|).\displaystyle:=L_{\max}^{2}\mathcal{C}_{0}\left(\mu+\sup_{t\geq 1}|a_{t}|\right).

Then, T>0\sqrt{T}>0 satisfies the following inequality

Tmax{C+C2+4(εμ)B2(εμ),C+C2+4(εμ)D2(εμ)}\displaystyle\sqrt{T}\geq\max\left\{\frac{C+\sqrt{C^{2}+4(\varepsilon-\mu)B}}{2(\varepsilon-\mu)},\frac{C+\sqrt{C^{2}+4(\varepsilon-\mu)D}}{2(\varepsilon-\mu)}\right\}

if and only if

ε\displaystyle\varepsilon 2εμLmaxsupt>0|at|T4R2log(4/δ)T, and\displaystyle\leq 2\varepsilon-\mu-\frac{L_{\max}\sup_{t>0}|a_{t}|}{T}-\sqrt{\frac{4R^{2}\log(4/\delta)}{T}},\text{ and}
ε\displaystyle\varepsilon μ+Lmax2𝒞0(μ+supt1|at|)T+4R2log(4/δ)T.\displaystyle\geq\mu+\frac{L_{\max}^{2}\mathcal{C}_{0}(\mu+\sup_{t\geq 1}|a_{t}|)}{T}+\sqrt{\frac{4R^{2}\log(4/\delta)}{T}}.

Since Lmax2L_{\max}\geq 2, DBD\geq B, μ<γsupt1|at|\mu<\gamma\sup_{t\geq 1}|a_{t}|, 𝒞0(1+γ)<3\mathcal{C}_{0}(1+\gamma)<3, and

εμ\displaystyle\varepsilon-\mu =σ0(1λ)LmaxLmax1+4Lmax+1\displaystyle=\frac{\sigma_{0}(1-\lambda)}{\sqrt{L_{\max}}}\cdot\frac{\sqrt{L_{\max}}}{1+\sqrt{4L_{\max}+1}}
σ0(1λ)22Lmax,\displaystyle\geq\frac{\sigma_{0}(1-\lambda)}{2\sqrt{2}\sqrt{L_{\max}}},
max{C+C2+4(εμ)B2(εμ),C+C2+4(εμ)D2(εμ)}\displaystyle\max\left\{\frac{C+\sqrt{C^{2}+4(\varepsilon-\mu)B}}{2(\varepsilon-\mu)},\frac{C+\sqrt{C^{2}+4(\varepsilon-\mu)D}}{2(\varepsilon-\mu)}\right\}
=C+C2+4(εμ)D2(εμ)\displaystyle=\frac{C+\sqrt{C^{2}+4(\varepsilon-\mu)D}}{2(\varepsilon-\mu)}
2LmaxC24(εμ)2+D(εμ)\displaystyle\leq 2\sqrt{\frac{L_{\max}C^{2}}{4(\varepsilon-\mu)^{2}}+\frac{D}{(\varepsilon-\mu)}}
22Lmax2C2σ02(ξλ)2+62Lmax5/2supt1|at|σ0(ξλ)\displaystyle\leq 2\sqrt{\frac{2L_{\max}^{2}C^{2}}{\sigma_{0}^{2}(\xi-\lambda)^{2}}+\frac{6\sqrt{2}L_{\max}^{5/2}\sup_{t\geq 1}|a_{t}|}{\sigma_{0}(\xi-\lambda)}}
22Lmax2C2σ02(ξλ)2+9Lmax5/2supt1|at|σ0(ξλ),\displaystyle\leq 2\sqrt{\frac{2L_{\max}^{2}C^{2}}{\sigma_{0}^{2}(\xi-\lambda)^{2}}+\frac{9L_{\max}^{5/2}\sup_{t\geq 1}|a_{t}|}{\sigma_{0}(\xi-\lambda)}},

where we used 12γ=2γLmax\sqrt{1-2\gamma}=2\gamma\sqrt{L_{\max}}. Since 2ε(σ022μσ0)/Lmax2\varepsilon\leq\sqrt{(\sigma_{0}^{2}-2\mu\sigma_{0})/L_{\max}}, the statement follows from Lemma 5.1. ∎

Appendix B Proof of Theorem 5.3

Proof.

Let 𝒦\mathscr{K} be the set of all combinations (m,β,s,q)(m,\beta,s,q) such that m[d]m\in[d], β[Lmax]\beta\in[L_{\rm max}], s{0,1,,β1}s\in\{0,1,\dots,\beta-1\} and {q=α/:α{1,,1}}\{q=\alpha/\ell:\alpha\in\{1,\dots,\ell-1\}\}. We remark that

|𝒦|\displaystyle|\mathscr{K}| m=1dβ=1LmaxLmax/βα=11s=0β11=dβ=1LmaxLmax/ββ(1)\displaystyle\leq\sum_{m=1}^{d}\sum_{\beta=1}^{L_{\max}}\sum_{\ell\leq L_{\max}/\beta}\sum_{\alpha=1}^{\ell-1}\sum_{s=0}^{\beta-1}1=d\sum_{\beta=1}^{L_{\max}}\sum_{\ell\leq L_{\max}/\beta}\beta(\ell-1)
dβ=1LmaxLmax2(Lmaxβ1)\displaystyle\leq d\sum_{\beta=1}^{L_{\max}}\frac{L_{\max}}{2}\left(\frac{L_{\max}}{\beta}-1\right)
dLmax2logLmax2.\displaystyle\leq\frac{dL_{\max}^{2}\log L_{\max}}{2}.

Also, let κTp\mathcal{E}^{T_{p}}_{\kappa} be the event such that, for the combination κ𝒦\kappa\in\mathscr{K},

|1Tp/βj=0Tp/β1{ηt0+βj+sei2παj}|4R2log(4dLmax2logLmax/δ)Tp/β.\displaystyle\left|\frac{1}{\lfloor T_{p}/\beta\rfloor}\sum_{j=0}^{\lfloor T_{p}/\beta\rfloor-1}\left\{\eta_{t_{0}+\beta j+s}e^{\frac{i2\pi\alpha j}{\ell}}\right\}\right|\leq\sqrt{\frac{4R^{2}\log\left(4dL_{\rm max}^{2}\log L_{\max}/\delta\right)}{\lfloor T_{p}/\beta\rfloor}}.

Because the error sequence satisfies conditionally RR-sub-Gaussian, using Lemma D.1 and from the fact that any subsequence of the filtration {τ}\{\mathcal{F}_{\tau}\} is again a filtration, we obtain, for each κ𝒦\kappa\in\mathscr{K},

Pr[κTp]1δdLmax2logLmax.\displaystyle\mathrm{Pr}\left[\mathcal{E}^{T_{p}}_{\kappa}\right]\geq 1-\frac{\delta}{dL_{\rm max}^{2}\log L_{\max}}.

Define Tp:=κ𝒦κTp\mathcal{E}^{T_{p}}:=\bigcap_{\kappa\in\mathscr{K}}\mathcal{E}^{T_{p}}_{\kappa}. Then, it follows from the Fréchet inequality that

Pr[Tp]\displaystyle\mathrm{Pr}\left[\mathcal{E}^{T_{p}}\right] =Pr[κ𝒦κTp]|𝒦|(1δdLmax2logLmax)(|𝒦|1)\displaystyle=\mathrm{Pr}\left[\bigcap_{\kappa\in\mathscr{K}}\mathcal{E}^{T_{p}}_{\kappa}\right]\geq|\mathscr{K}|\left(1-\frac{\delta}{dL_{\rm max}^{2}\log L_{\max}}\right)-(|\mathscr{K}|-1)
=1δ|𝒦|dLmax2logLmax1δ.\displaystyle=1-\frac{\delta|\mathscr{K}|}{dL_{\rm max}^{2}\log L_{\max}}\geq 1-\delta.

Let L^\widehat{L} be the output of Algorithm 1. We show that, with probability 1δ1-\delta, L^\widehat{L} is (ρ,d)(\rho,\sqrt{d})-anp of LL. In fact, suppose L^\widehat{L} is not (ρ,d)(\rho,\sqrt{d})-anp. We note that L^<L\widehat{L}<L. There exists s{0,,L^1}s\in\{0,\dots,\widehat{L}-1\} and t1,t2>0t_{1},t_{2}\in\mathbb{Z}_{>0},

fs+L^t1(θ)fs+L^t2(θ)d>ρ+2dμ.\displaystyle\|f^{s+\widehat{L}t_{1}}(\theta)-f^{s+\widehat{L}t_{2}}(\theta)\|_{\mathbb{R}^{d}}>\rho+2\sqrt{d}\mu.

Let margmaxm=1,,d|fs+L^t1(θ)𝐮mfs+L^t2(θ)𝐮m|m^{\prime}\in\mathrm{argmax}_{m=1,\dots,d}\left|f^{s+\widehat{L}t_{1}}(\theta)^{\top}\mathbf{u}_{m}-f^{s+\widehat{L}t_{2}}(\theta)^{\top}\mathbf{u}_{m}\right|. Put at:=fs+L^t(θ)𝐮ma_{t}:=f^{s+\widehat{L}t}(\theta)^{\top}\mathbf{u}_{m^{\prime}}. Then, for any t,t>0t,t^{\prime}\in\mathbb{Z}_{>0}, we have

|atat|\displaystyle|a_{t}-a_{t^{\prime}}| |at1at2|2μ\displaystyle\geq|a_{t_{1}}-a_{t_{2}}|-2\mu
fs+L^t1(θ)fs+L^t2(θ)dd2μ\displaystyle\geq\frac{\|f^{s+\widehat{L}t_{1}}(\theta)-f^{s+\widehat{L}t_{2}}(\theta)\|_{\mathbb{R}^{d}}}{\sqrt{d}}-2\mu
>ρd.\displaystyle>\frac{\rho}{\sqrt{d}}.

Thus, by definition, we have

σL/L^((at)t)ρ2dL/L^ρ2dLmax.\sigma_{L/\widehat{L}}((a_{t})_{t})\geq\frac{\rho}{2\sqrt{dL/\widehat{L}}}\geq\frac{\rho}{2\sqrt{dL_{\max}}}.

Let σ0=ρ/2dLmax\sigma_{0}=\rho/2\sqrt{dL_{\max}} and ξ:=γ1/3Lmax\xi:=\gamma^{-1}/3\sqrt{L_{\max}}. Then, μ/(γξ)<σ0σL/L^((at)t)\mu/(\gamma\xi)<\sigma_{0}\leq\sigma_{L/\widehat{L}}((a_{t})_{t}), and

8LmaxR2log(4/δ)σ02(ξλ)2+36Lmax5/2supt1|at|σ0(ξλ)\displaystyle\frac{8L_{\max}R^{2}\log(4/\delta)}{\sigma_{0}^{2}(\xi-\lambda)^{2}}+\frac{36L_{\max}^{5/2}\sup_{t\geq 1}|a_{t}|}{\sigma_{0}(\xi-\lambda)}
32ξ2dLmax2R2log(4/δ)ρ2(1r)2+72ξ1dLmax3supt1|at|ρ(1r)\displaystyle\leq\frac{32\xi^{-2}dL_{\max}^{2}R^{2}\log(4/\delta)}{\rho^{2}(1-r)^{2}}+\frac{72\xi^{-1}\sqrt{d}L_{\max}^{3}\sup_{t\geq 1}|a_{t}|}{\rho(1-r)}
72dLmax2R2log(4/δ)ρ2(1r)2+108dLmax3supt1|at|ρ(1r).\displaystyle\leq\frac{72dL_{\max}^{2}R^{2}\log(4/\delta)}{\rho^{2}(1-r)^{2}}+\frac{108\sqrt{d}L_{\max}^{3}\sup_{t\geq 1}|a_{t}|}{\rho(1-r)}.

The last inequality follows from ξ2/3\xi\geq 2/3. Thus, by Proposition 5.2, the algorithm finds β>L^\beta>\widehat{L} in mm^{\prime}-th loop, and the output becomes an integer larger than L^\widehat{L}, which is contradiction. ∎

Appendix C Proof of Theorem 5.8

Here, we provide the proof of Theorem 5.8.

C.1 Proof

Let

K\displaystyle K :=(Mθ,,Mdθ):dW,\displaystyle:=(M\theta,\dots,M^{d}\theta):\mathbb{C}^{d}\to W,
Es(N)\displaystyle E_{s}(N) :=𝒲((Es,j)j=0N1):dd.\displaystyle:=\mathscr{W}\left((E_{s,j})_{j=0}^{N-1}\right):\mathbb{C}^{d}\to\mathbb{C}^{d}.

For a linear map :WW\mathcal{M}:W\to W, we define linear maps:

X()\displaystyle X(\mathcal{M}) :=(x~11x~22d+1x~d2(d1)d+1):Wd,\displaystyle:=\left(\begin{array}[]{c}\tilde{x}_{1}^{\top}\mathcal{M}^{1}\\ \tilde{x}_{2}^{\top}\mathcal{M}^{2d+1}\\ \vdots\\ \tilde{x}_{d}^{\top}\mathcal{M}^{2(d-1)d+1}\end{array}\right):W\to\mathbb{C}^{d},
QN()\displaystyle Q_{N}(\mathcal{M}) :=𝒲((2d2j)j=0N1):WW.\displaystyle:=\mathscr{W}\left((\mathcal{M}^{2d^{2}j})_{j=0}^{N-1}\right):W\to W.

For s=0,1s=0,1, we define linear maps on d\mathbb{C}^{d} by

As(N;)\displaystyle A_{s}(N;\mathcal{M}) :=X()QN()sd+N1K+Es(N),\displaystyle:=X(\mathcal{M})Q_{N}(\mathcal{M})\mathcal{M}^{sd+N-1}K+E_{s}(N),
Bs(N;)\displaystyle B_{s}(N;\mathcal{M}) :=X()QN()sd+N1K.\displaystyle:=X(\mathcal{M})Q_{N}(\mathcal{M})\mathcal{M}^{sd+N-1}K.

We note that As(N;Mθ)A_{s}(N;M_{\theta}) is identical to As(N)A_{s}(N) defined in (5.7). We impose the following assumption on X(Mθ)X(M_{\theta}):

Assumption 5.

The kernel of the linear map X(M1)X(M_{1}) is the same as 𝒩(M1)\mathcal{N}(M_{1}).

Note that this assumption holds with probability 1 if we randomly choose x~1,,x~d\tilde{x}_{1},\dots,\tilde{x}_{d} (see Lemma C.7).

The following isomorphicity maintenance lemma provides an explicit description of B0(N;M1)B_{0}(N;M_{1})^{\dagger}.

Lemma C.1 (Isomorphicity maintenance lemma).

Suppose Assumption 5 holds. Assume N16L2N\geq 16L^{2}. Let

U1\displaystyle U_{1} :=(B0(N;M1))d,\displaystyle:=\mathscr{I}(B_{0}(N;M_{1}))\subset\mathbb{C}^{d},
U2\displaystyle U_{2} :=𝒩(B0(N;M1))d,\displaystyle:=\mathscr{N}(B_{0}(N;M_{1}))\subset\mathbb{C}^{d},
U3\displaystyle U_{3} :=(M1)=W=1W.\displaystyle:=\mathscr{I}(M_{1})=W_{=1}\subset W.

Let i:U1di:U_{1}\to\mathbb{C}^{d} be the inclusion map and p:dU2p:\mathbb{C}^{d}\to U_{2}^{\perp} the orthogonal projection. Then, restriction of X(M1)X(M_{1}) (resp. M1KM_{1}K) to U3U_{3} (resp. U2U_{2}^{\perp}) induces an isomorphism onto U1U_{1} (resp. U3U_{3}). If we denote the isomorphism by X~\tilde{X} (resp, K~\tilde{K}). Then, B0(N;M1)B_{0}(N;M_{1})^{\dagger} is given by

B0(N;M1)=pK~1M1|U32NQN(M1)|U31X~1i.\displaystyle B_{0}(N;M_{1})^{\dagger}=p^{*}\tilde{K}^{-1}M_{1}|_{U_{3}}^{2-N}Q_{N}(M_{1})|_{U_{3}}^{-1}\tilde{X}^{-1}i^{*}.
Proof.

Since we have 𝒩(M1)=𝒩(M1r+1)\mathscr{N}(M_{1})=\mathscr{N}(M_{1}^{r+1}) for all r0r\geq 0 and Assumption 5, surjectivity of KK by Proposition C.5, and bijectivity of QN(M1)Q_{N}(M_{1}) on U3U_{3} by Proposition 5.7, we have

U1\displaystyle U_{1} =(X(M1)|U3),\displaystyle=\mathscr{I}(X(M_{1})|_{U_{3}}),
U2\displaystyle U_{2} =𝒩(M1K).\displaystyle=\mathscr{N}(M_{1}K).

Thus, the restriction of X(M1)X(M_{1}) (resp. KK) to U3U_{3} (resp. U2U_{2}^{\perp}) induces an isomorphism onto U1U_{1} (resp. U3U_{3}). Let us denote the isomorphism by X~\tilde{X} (resp. K~\tilde{K}). Then, the last statement follows from Proposition C.4. ∎

Lemma C.2.

Assume N16L2N\geq 16L^{2}. Let

A:=B1(N;M1)B0(N;M1).\displaystyle A:=B_{1}(N;M_{1})B_{0}(N;M_{1})^{\dagger}.

Then, AA is independent of NN and its eigenvalues are zeros except for (θ,d)(\theta,d)-distinct eigenvalues of MM.

Proof.

We use the notation as in Lemma C.1. By Lemma C.1, we obtain

B1(N;M1)B0(N;M1)=iX~M1dX~1i,B_{1}(N;M_{1})B_{0}(N;M_{1})^{\dagger}=i\tilde{X}M_{1}^{d}\tilde{X}^{-1}i^{*},

which is independent of NN and its eigenvalues are zeros except for (θ,d)(\theta,d)-distinct eigenvalues of MM. ∎

The following result will be used in the proof of Theorem 5.8 (but not essential).

Lemma C.3.

We have

X(M1)\displaystyle\|X(M_{1})\| κd,\displaystyle\leq\kappa\sqrt{d},
X(M<1)\displaystyle\|X(M_{<1})\| κ(d+1)2d.\displaystyle\leq\kappa(d+1)2^{d}.
Proof.

Considering the Jordan normal form, the first inequality is obvious. As for the second inequality, we define Jd×dJ\in\mathbb{R}^{d\times d} by the nilpotent matrix

J:=(01O1O0).J:=\left(\begin{array}[]{cccc}0&1&&O\\ &\ddots&\ddots&\\ &&&1\\ O&&&0\end{array}\right).

Then, we have

κ1X(M1)\displaystyle\kappa^{-1}\|X(M_{1})\| r=1d(I+J)1\displaystyle\leq\sum_{r=1}^{d}\|(I+J)\|_{1}
=r=1dj=0d(rj)(dj)\displaystyle=\sum_{r=1}^{d}\sum_{j=0}^{d}\binom{r}{j}(d-j)
=j=0d(dj)r=1d(rj)\displaystyle=\sum_{j=0}^{d}(d-j)\sum_{r=1}^{d}\binom{r}{j}
=d+j=1d(dj)r=1d(rj)\displaystyle=d+\sum_{j=1}^{d}(d-j)\sum_{r=1}^{d}\binom{r}{j}
=d+j=1d(dj)(r+1j+1)\displaystyle=d+\sum_{j=1}^{d}(d-j)\binom{r+1}{j+1}
=d+(d+1)2d(d+1)d(d+1)\displaystyle=d+(d+1)2^{d}-(d+1)d-(d+1)
<(d+1)2d.\displaystyle<(d+1)2^{d}.

Proof of Theorem 5.8.

Let AA be the matrix introduced in Lemma C.2. Let

γ(N):=4d2R2log(4d2/δ)+1N.\gamma(N):=\frac{\sqrt{4d^{2}R^{2}\log(4d^{2}/\delta)}+1}{\sqrt{N}}.

Let M1M_{1} and M<1M_{<1} be matrices as in Corollary 4.3. Then, by Proposition 4.2, we see that

As(N,M)\displaystyle A_{s}(N,M) =As(N,M1)+As(N,M<1),\displaystyle=A_{s}(N,M_{1})+A_{s}(N,M_{<1}),
Bs(N,M)\displaystyle B_{s}(N,M) =Bs(N,M1)+Bs(N,M<1).\displaystyle=B_{s}(N,M_{1})+B_{s}(N,M_{<1}).

We denote by A^s(N;M)\hat{A}_{s}(N;M) (resp, B^s(N;M)\hat{B}_{s}(N;M)) the low rank approximation via the singular value threshold γ(N)\gamma(N) (see Definition 5.4). By direct computations, we have

AA1(N;M)A^0(N;M)\displaystyle\|A-A_{1}(N;M)\hat{A}_{0}(N;M)^{\dagger}\|
A1(N;M)A^0(N;M)B0(N;M1)+A1(N;M)B1(N;M1)B0(N;M1)\displaystyle\leq\|A_{1}(N;M)\|\cdot\|\hat{A}_{0}(N;M)^{\dagger}-B_{0}(N;M_{1})^{\dagger}\|+\|A_{1}(N;M)-B_{1}(N;M_{1})\|\cdot\|B_{0}(N;M_{1})^{\dagger}\|
(B1(N;M1)+B1(N;M<1)+E1(N))A^0(N;M)B0(N;M1)\displaystyle\leq(\|B_{1}(N;M_{1})\|+\|B_{1}(N;M_{<1})\|+\|E_{1}(N)\|)\cdot\|\hat{A}_{0}(N;M)^{\dagger}-B_{0}(N;M_{1})^{\dagger}\|
+B1(N;M<1)+E1(N)B0(N;M1).\displaystyle\qquad+\|B_{1}(N;M_{<1})+E_{1}(N)\|\cdot\|B_{0}(N;M_{1})^{\dagger}\|.

By Proposition 5.7, for s=0,1s=0,1 and for Nmax{2d,16L2}N\geq\max\{2d,16L^{2}\}, we have

Bs(N;M1)\displaystyle\|B_{s}(N;M_{1})\| X(M1)M1N+sd1QN(M1)K\displaystyle\leq\|X(M_{1})\|\cdot\|M_{1}^{N+sd-1}Q_{N}(M_{1})\|\cdot\|K\|
κCws(L)X(M1)K\displaystyle\leq\kappa C_{\rm ws}(L)\|X(M_{1})\|\|K\|
Bdκ2Cws(L)\displaystyle\leq Bd\kappa^{2}C_{\rm ws}(L)
Bs(N;M<1)\displaystyle\|B_{s}(N;M_{<1})\| X(M<1)M<1N+sd1QN(M<1)K\displaystyle\leq\|X(M_{<1})\|\cdot\|M_{<1}^{N+sd-1}Q_{N}(M_{<1})\|\cdot\|K\|
X(M<1)Kd2κeΔ(N+sdd)NΔd1\displaystyle\leq\|X(M_{<1})\|\cdot\|K\|\cdot\frac{d^{2}\kappa e^{-\Delta(N+sd-d)}}{N\Delta^{d-1}}
Bκd(d+1)2dd2κeΔ(N+sdd)NΔd1\displaystyle\leq B\kappa\sqrt{d}(d+1)2^{d}\cdot\frac{d^{2}\kappa e^{-\Delta(N+sd-d)}}{N\Delta^{d-1}}
Bκ2ed+6Δ(N+sdd)NΔd1,\displaystyle\leq\frac{B\kappa^{2}e^{d+6-\Delta(N+sd-d)}}{N\Delta^{d-1}},

where we used KdB\|K\|\leq\sqrt{d}B (Assumption 3), and X(M<1)κ(d+1)2d\|X(M_{<1})\|\leq\kappa(d+1)2^{d} (Lemma C.3), and d(d+1)d22d<ed+6\sqrt{d}(d+1)d^{2}2^{d}<e^{d+6}. By Lemma C.1 with Proposition 5.7, we see that

B0(N;M1)κCws(L)X~1K~1.\displaystyle\|B_{0}(N;M_{1})^{\dagger}\|\leq\kappa C_{\rm ws}(L)\|\tilde{X}^{-1}\|\cdot\|\tilde{K}^{-1}\|.

By using Lemma D.1 and union bounds, we obtain

max(E0(N)F,E1(N)F)γ(N)1N,\displaystyle\max(\|E_{0}(N)\|_{F},\|E_{1}(N)\|_{F})\leq\gamma(N)-\frac{1}{\sqrt{N}},

with probability at least 1δ1-\delta. Assume that

N(d1)logΔΔ+log(Bκ2)+d+6Δ+d.N\geq\frac{-(d-1)\log\Delta}{\Delta}+\frac{\log(B\kappa^{2})+d+6}{\Delta}+d.

Then, we see that Bs(N;M<1)1/N\|B_{s}(N;M_{<1})\|\leq 1/N. Thus, by Lemma C.6, with probability at least 1δ1-\delta, we have

A^0(N;M)B0(N;M1)\displaystyle\|\hat{A}_{0}(N;M)^{\dagger}-B_{0}(N;M_{1})^{\dagger}\| 8(E0(N)+B0(N;M<1))B0(N,M1)2(d+1)\displaystyle\leq 8(\|E_{0}(N)\|+\|B_{0}(N;M_{<1})\|)\cdot\|B_{0}(N,M_{1})^{\dagger}\|^{2}(\sqrt{d}+1)
8γ(N)B0(N,M1)2(d+1)\displaystyle\leq 8\gamma(N)\cdot\|B_{0}(N,M_{1})^{\dagger}\|^{2}(\sqrt{d}+1)
8(1+d)κ2Cws(L)2X~12K~12γ(N).\displaystyle\leq 8(1+\sqrt{d})\kappa^{2}C_{\rm ws}(L)^{2}\|\tilde{X}^{-1}\|^{2}\cdot\|\tilde{K}^{-1}\|^{2}\gamma(N).

Therefore, there exists C>0C>0 depending on X(M)\|X(M)\|, K\|K\|, κ\kappa, Cws(L)C_{\rm ws}(L), X~1\|\tilde{X}^{-1}\|, K~1\|\tilde{K}^{-1}\|, and dd such that

AA1(N;M)A^0(N;M)<C(R2(log(1/δ)+1)+1N),\|A-A_{1}(N;M)\hat{A}_{0}(N;M)^{\dagger}\|<C\left(\frac{R^{2}\left(\log\left(1/\delta\right)+1\right)+1}{\sqrt{N}}\right),

with probability at least 1δ1-\delta. ∎

C.2 Miscellaneous

We provide an expression of Moore-Penrose pseudo inverse:

Proposition C.4.

Let A:mnA:\mathbb{C}^{m}\to\mathbb{C}^{n} be a linear map. Let i:(A)ni:\mathscr{I}(A)\to\mathbb{C}^{n} be the inclusion map and let p:m𝒩(A)p:\mathbb{C}^{m}\to\mathscr{N}(A)^{\perp} be the orthogonal projection. Let A~:=A|𝒩(A):𝒩(A)(A)\tilde{A}:=A|_{\mathscr{N}(A)}:\mathscr{N}(A)^{\perp}\to\mathscr{I}(A) be an isomorphism. Then, the Moore-Penrose pseudo inverse AA^{\dagger} coincides with iA~1pi^{*}\tilde{A}^{-1}p^{*}.

Proof.

Let B:=pA~1iB:=p^{*}\tilde{A}^{-1}i^{*}. We remark that A=iA~pA=i\tilde{A}p, ii=id,pp=idi^{*}i={\rm id},pp^{*}={\rm id}, we see that ABA=AABA=A, BAB=BBAB=B, (AB)=AB(AB)^{*}=AB, and BA=(BA)BA=(BA)^{*}. By the uniqueness of Moore-Penrose pseudo inverse, B=AB=A^{\dagger}. ∎

Proposition C.5.

Let Ad×dA\in\mathbb{C}^{d\times d} be a matrix and let vdv\in\mathbb{C}^{d} be a vector. Let VdV\subset\mathbb{C}^{d} be a linear subspace generated by {Ajv}j=1\{A^{j}v\}_{j=1}^{\infty}. Then, V=((Av,A2v,,Adv))V=\mathscr{I}\left((Av,A^{2}v,\dots,A^{d}v)\right).

Proof.

Put W=((Av,A2v,,Adv))W=\mathscr{I}\left((Av,A^{2}v,\dots,A^{d}v)\right). The inclusion WVW\subset V is obvious, we prove the opposite inclusion. It suffices to show that AjvWA^{j}v\in W for any positive integer j>dj>d. By the Cayley-Hamilton theorem, Ad=j=1dcjAdjA^{d}=\sum_{j=1}^{d}c_{j}A^{d-j} for some cjc_{j}\in\mathbb{C}. Thus, by induction AjA^{j} is a linear combination of A,A2,,AdA,A^{2},\dots,A^{d}, namely, AjvWA^{j}v\in W. ∎

We give several lemmas here.

Lemma C.6 (Perturbation bounds of the Moore-Penrose inverse).

Suppose Ad×dA\in\mathbb{C}^{d\times d} is a matrix. Let Ed×dE\in\mathbb{C}^{d\times d} be a matrix satisfying

C>0,N>0,EC1N,\displaystyle\exists C>0,~{}~{}\forall N\in\mathbb{Z}_{>0},~{}~{}\|E\|\leq C\frac{1}{\sqrt{N}},

and let A^Ed×d\hat{A}_{E}\in\mathbb{C}^{d\times d} be the low rank approximation of A+EA+E via SVD with the singular value threshold C/NC/\sqrt{N}. Then, we obtain

AA^E8CA2(d+1)N.\displaystyle\left\|A^{\dagger}-\hat{A}_{E}^{\dagger}\right\|\leq\frac{8C\|A^{\dagger}\|^{2}\left(\sqrt{d}+1\right)}{\sqrt{N}}.
Proof.

Let σmin=A1\sigma_{\min}=\|A^{\dagger}\|^{-1} be the minimal singular value of AA. From (Meng & Zheng, 2010, Theorem 1.1) (or originally Wedin (1973)) and from the fact

A^EAE+dCN,\displaystyle\left\|\hat{A}_{E}-A\right\|\leq\|E\|+\frac{\sqrt{d}C}{\sqrt{N}},

we obtain

AA^E1+52max{A2,A^E2}(d+1)CN.\displaystyle\left\|A^{\dagger}-\hat{A}_{E}^{\dagger}\right\|\leq\frac{1+\sqrt{5}}{2}\max\left\{\left\|A^{\dagger}\right\|^{2},\left\|\hat{A}_{E}^{\dagger}\right\|^{2}\right\}\left(\sqrt{d}+1\right)\frac{C}{\sqrt{N}}.

For NN such that 1N4C2/σmin21\leq N\leq 4C^{2}/\sigma_{\rm min}^{2}, we have A^EN/C2/σmin\left\|\hat{A}_{E}^{\dagger}\right\|\leq\sqrt{N}/C\leq 2/\sigma_{\rm min}. Suppose singular values σk,k[d]\sigma_{k},~{}k\in[d], of AA, and σ^k,k[d]\hat{\sigma}_{k},~{}k\in[d], of A+EA+E are sorted in descending order. Then, it holds that

|σkσ^k|E.\displaystyle|\sigma_{k}-\hat{\sigma}_{k}|\leq\left\|E\right\|.

Therefore, for N>4C2/σmin2N>4C^{2}/\sigma_{\rm min}^{2}, the minimum singular value σ^d\hat{\sigma}_{d} is greater than σmin/2\sigma_{\rm min}/2. In this case, A^E=A+E\hat{A}_{E}=A+E, and it follows that A^E2/σmin\left\|\hat{A}_{E}^{\dagger}\right\|\leq 2/\sigma_{\rm min}. Hence, for all N1N\geq 1, we obtain

A^E2σmin,\displaystyle\left\|\hat{A}_{E}^{\dagger}\right\|\leq\frac{2}{\sigma_{\rm min}},

from which, it follows that

AA^E2C(1+5)(d+1)σmin2N8CA2(d+1)N.\displaystyle\left\|A^{\dagger}-\hat{A}_{E}^{\dagger}\right\|\leq\frac{2C\left(1+\sqrt{5}\right)\left(\sqrt{d}+1\right)}{\sigma_{\rm min}^{2}\sqrt{N}}\leq\frac{8C\|A^{\dagger}\|^{2}\left(\sqrt{d}+1\right)}{\sqrt{N}}.

Lemma C.7 (Null space of random matrix).

Suppose Mkd×d,k[d]M_{k}\in\mathbb{R}^{d\times d},~{}k\in[d], have the same null space. Then, the null space of

X\displaystyle X :=(x1M1x2M2xdMd),\displaystyle:=\left(\begin{array}[]{c}x_{1}^{\top}M_{1}\\ x_{2}^{\top}M_{2}\\ \vdots\\ x_{d}^{\top}M_{d}\\ \end{array}\right), (C.5)

where xk,k[d]x_{k},~{}k\in[d], are unit vectors independently drawn from the uniform distribution over the unit hypersphere, is the same as those of MkM_{k} with probability one.

Proof.

Given any k1k-1 dimensional linear subspace in 𝒩(Mk)\mathscr{N}(M_{k})^{\perp} for any k[d]k\in[d], it holds that the probability that xkMkx_{k}^{\top}M_{k} lies on that space is zero. Therefore, by union bound, and by the fact that the row space is the orthogonal complement of the null space, we obtain the result. ∎

Appendix D Azuma-Hoeffding inequality for exponential sum

Lemma D.1.

Let {Xj}j=1n\{X_{j}\}_{j=1}^{n} for n>0n\in\mathbb{Z}_{>0} be sub-Gaussian martingale difference with variance proxy R2R^{2} and a filtration {j}\{\mathcal{F}_{j}\}. Also, let {aj}\{a_{j}\}\subset\mathbb{C} be a sequence of complex numbers satisfying |aj|1|a_{j}|\leq 1 for all j[n]j\in[n]. Then, the followings hold, where []*[\cdot] stands for 𝔢[]\mathfrak{Re}[\cdot] or 𝔪[]\mathfrak{Im}[\cdot].

Pr[1n|[j=1najXj]|2R2log(2/δ)n]1δ,\displaystyle\mathrm{Pr}\left[\frac{1}{n}\left|*\left[\sum_{j=1}^{n}a_{j}X_{j}\right]\right|\leq\sqrt{\frac{2R^{2}\log\left(2/\delta\right)}{n}}\right]\geq 1-\delta,
Pr[1n|j=1najXj|4R2log(4/δ)n]1δ.\displaystyle\mathrm{Pr}\left[\frac{1}{n}\left|\sum_{j=1}^{n}a_{j}X_{j}\right|\leq\sqrt{\frac{4R^{2}\log\left(4/\delta\right)}{n}}\right]\geq 1-\delta.
Proof.

For a filtration {i}in\{\mathcal{F}_{i}\}_{i\leq n}, we have

𝔼[eλ𝔢[j=1najXj]]𝔼[eλ𝔢[j=1n1ajXj]𝔼[eλ𝔢[anXn]|n1]]eλ2R22𝔼[eλ𝔢[j=1n1ajXj]],\displaystyle\mathbb{E}\left[e^{\lambda\mathfrak{Re}\left[\sum_{j=1}^{n}a_{j}X_{j}\right]}\right]\leq\mathbb{E}\left[e^{\lambda\mathfrak{Re}\left[\sum_{j=1}^{n-1}a_{j}X_{j}\right]}\mathbb{E}\left[e^{\lambda\mathfrak{Re}\left[a_{n}X_{n}\right]}\big{|}\mathcal{F}_{n-1}\right]\right]\leq e^{\frac{\lambda^{2}R^{2}}{2}}\mathbb{E}\left[e^{\lambda\mathfrak{Re}\left[\sum_{j=1}^{n-1}a_{j}X_{j}\right]}\right],

where the first inequality follows from the assumption of filtration, and the second inequality follows from

𝔼[eλ𝔢[anXn]|n1]=𝔼[eλXn𝔢[an]|n1]eλ2R22.\displaystyle\mathbb{E}\left[e^{\lambda\mathfrak{Re}\left[a_{n}X_{n}\right]}\big{|}\mathcal{F}_{n-1}\right]=\mathbb{E}\left[e^{\lambda X_{n}\mathfrak{Re}\left[a_{n}\right]}\big{|}\mathcal{F}_{n-1}\right]\leq e^{\frac{\lambda^{2}R^{2}}{2}}.

By induction, we obtain

𝔼[eλ𝔢[j=1najXj]]enλ2R22.\displaystyle\mathbb{E}\left[e^{\lambda\mathfrak{Re}\left[\sum_{j=1}^{n}a_{j}X_{j}\right]}\right]\leq e^{\frac{n\lambda^{2}R^{2}}{2}}.

By using Markov inequality and the union bound, it follows that

Pr[1n|𝔢[j=1najXj]|>ϵ]2enϵ22R2.\displaystyle\mathrm{Pr}\left[\frac{1}{n}\left|\mathfrak{Re}\left[\sum_{j=1}^{n}a_{j}X_{j}\right]\right|>\epsilon\right]\leq 2e^{-\frac{n\epsilon^{2}}{2R^{2}}}.

Similarly, we have

Pr[1n|𝔪[j=1najXj]|>ϵ]2enϵ22R2.\displaystyle\mathrm{Pr}\left[\frac{1}{n}\left|\mathfrak{Im}\left[\sum_{j=1}^{n}a_{j}X_{j}\right]\right|>\epsilon\right]\leq 2e^{-\frac{n\epsilon^{2}}{2R^{2}}}.

Therefore, we obtain

Pr\displaystyle\mathrm{Pr} [1n|j=1najXj|4R2log(4/δ)n]\displaystyle\left[\frac{1}{n}\left|\sum_{j=1}^{n}a_{j}X_{j}\right|\leq\sqrt{\frac{4R^{2}\log\left(4/\delta\right)}{n}}\right]
Pr[1n|𝔢[j=1najXj]|2R2log(4/δ)n]\displaystyle\geq\mathrm{Pr}\left[\frac{1}{n}\left|\mathfrak{Re}\left[\sum_{j=1}^{n}a_{j}X_{j}\right]\right|\leq\sqrt{\frac{2R^{2}\log\left(4/\delta\right)}{n}}\right]
+Pr[1n|𝔪[j=1najXj]|2R2log(4/δ)n]1\displaystyle~{}~{}~{}~{}+\mathrm{Pr}\left[\frac{1}{n}\left|\mathfrak{Im}\left[\sum_{j=1}^{n}a_{j}X_{j}\right]\right|\leq\sqrt{\frac{2R^{2}\log\left(4/\delta\right)}{n}}\right]-1
(1δ2)+(1δ2)11δ,\displaystyle\geq\left(1-\frac{\delta}{2}\right)+\left(1-\frac{\delta}{2}\right)-1\geq 1-\delta,

where the first inequality follows from the Fréchet inequality. ∎

Appendix E Applications to bandit problems

We briefly cover the applicability of our proposed algorithms to bandit problems (e.g., regret minimization) which is mentioned in the introduction.

A naive approach is an explore-then-commit type algorithm (cf. Robbins (1952); Anscombe (1963)). One employs our algorithm to estimate a nearly period, followed by a certain periodic bandit algorithm such as the work in Cai et al. (2021) to obtain an asymptotic order of regret. Caveat here is, because our estimate is only an aliquot nearly period, one may need to take into account the regret caused by this misspecification when running bandit algorithms (e.g., ρ\rho and μ\mu may lead to (small) linear regret). Avoiding this small linear regret would require the system to be 0-nearly periodic and that there exists a sufficiently large gap ensuring μ\mu-nearly period with sufficiently small μ\mu implies 0-nearly period.

If one aims at designing an anytime algorithm, the straightforward application of our algorithms may not give near optimal asymptotic rate of expected regret because the failure probability of periodic structure estimations cannot be adjusted later. To remedy this, one can employ our algorithm repetitively, and gradually increase the span of such procedure. Importantly, samples from separated spans can contribute to the estimate together when the surplus beyond a multiple of period is properly dealt with. Since failure probability decreases exponentially with respect to sample size, we conjecture that increasing the span for bandit algorithm by a certain order will lead to the same rate (up to logarithm) of expected regret of the adopted bandit algorithm.

Appendix F Simulation setups and results

Throughout, we used the following version of Julia Bezanson et al. (2017); for each experiment, the running time was less than a few minutes.

Julia Version 1.6.3
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
Environment:
JULIA_NUM_THREADS = 12

We also used some tools and functionalities of Lyceum Summers et al. (2020). The licenses of Julia and Lyceum are [The MIT License; Copyright (c) 2009-2021: Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and other contributors: https://github.com/JuliaLang/julia/contributors], and [The MIT License; Copyright (c) 2019 Colin Summers, The Contributors of Lyceum], respectively.

In this section, we provide simulation setups, including the details of parameter settings.

F.1 Period estimation: LifeGame

The hyperparameters of LifeGame environment and the algorithm are summarized in Table 3. Note μ=0\mu=0 because it is a periodic transition. Here, we used 12×1212\times 12 blocks of cells and we focused on the five blocks surrounded by the red rectangle in Figure 8. The transition rule is given by

  1. 1.

    If the cell is alive and two or three of its surrounding eight cells are alive, then the cell remains alive.

  2. 2.

    If the cell is alive and more than three or less than two of its surrounding eight cells are alive, then the cell dies.

  3. 3.

    If the cell is dead and exactly three of its surrounding eight cells are alive, then the cell is revived.

Table 3: Hyperparameters used for period estimation of LifeGame.
LifeGame hyperparameter Value Algorithm hyperparameter Value
height 1212 accuracy for estimation ρ\rho 0.980.98
width 1212 failure probability bound δ\delta 0.20.2
observed dimension 55 maximum possible period LmaxL_{\rm max} 1010
observation noise proxy RR 0.30.3
ball radius BB 5\sqrt{5}
Refer to caption
Figure 8: The area we focus on for the cellular automata experiment.

F.2 Period estimation: Simple μ\mu-nearly periodic system

The dynamical system

rt+1=μ(αrt1μαrt1μ)+1,θt+1=θt+2πL,\displaystyle r_{t+1}=\mu\left(\alpha\frac{r_{t}-1}{\mu}-\lceil\alpha\frac{r_{t}-1}{\mu}\rceil\right)+1,~{}~{}~{}~{}~{}\theta_{t+1}=\theta_{t}+\frac{2\pi}{L},

is μ\mu-nearly periodic. See Figure 9 for the illustrations when μ=0.2,L=5,α=π\mu=0.2,~{}L=5,~{}\alpha=\pi. It is observed that there are five clusters. We mention that this system is not exactly periodic. The hyperparameters of this system and the algorithm are summarized in Table 4.

Table 4: Hyperparameters used for μ\mu-nearly periodic system.
System hyperparameter Value Algorithm hyperparameter Value
dimension 22 accuracy for estimation ρ\rho 0.30.3
μ\mu 0.0010.001 failure probability bound δ\delta 0.20.2
α\alpha π\pi maximum possible nearly period LmaxL_{\rm max} 88
observation noise proxy RR 0.30.3
ball radius BB 22
Refer to caption
Figure 9: An example of μ\mu-nearly periodic system.

F.3 Eigenvalue estimation

We used the matrix MM given by

M:=(0001001000100000010000000.7).\displaystyle M:=\left(\begin{array}[]{ccccc}0&0&0&1&0\\ 0&1&0&0&0\\ 1&0&0&0&0\\ 0&0&1&0&0\\ 0&0&0&0&0.7\\ \end{array}\right). (F.6)

The first 4×44\times 4 block matrix is for permutation. After NN steps, it is expected that the last dimension shrinks so that the system becomes nearly periodic. It follows that 4!=244!=24 is a multiple of the length LL. Eigenvalues of M5M^{5} are given by 1.000,1.000,0.5000.866i,0.500+0.866i,0.1681.000,~{}1.000,~{}-0.500-0.866i,~{}-0.500+0.866i,~{}0.168, and the (θ0,5)(\theta_{0},5)-distinct eigenvalues are 1.000,0.5000.866i,0.500+0.866i1.000,~{}-0.500-0.866i,~{}-0.500+0.866i.

The hyperparameters of the environment and the algorithm are summarized in Table 5. Note we don’t necessarily need κ\kappa, BB, and Δ\Delta to run the algorithm as long as the effective sample size is sufficiently large; we used the values (satisfying the conditions) in Table 5 for simplicity.

Table 5: Hyperparameters used for eigenvalue estimation.
Hyperparameter Value Hyperparameter Value
κ\kappa 66 a nearly period LL 2424
Δ\Delta 0.10.1 failure probability bound δ\delta 0.20.2
dimension 55 observation noise proxy RR 0.30.3
ball radius BB 11

F.4 Realistic simulation

The hyperparameters used for period estimation that outputs an invalid estimate are summarized in Table 6 while those for a reasonable output are summarized in Table 7. Also, the hyperparameters used for the eigenvalue estimation are summarized in Table 8. Note for all of them, we used the fixed number of sample sizes.

Table 6: Hyperparameters used for worm simulation that outputs incorrect value.
System hyperparameter Value Algorithm hyperparameter Value
dimension 3636 accuracy for estimation ρ\rho 1.01.0
sample number TpT_{p} 3×1053\times 10^{5} maximum possible nearly period LmaxL_{\rm max} 2525
observation noise proxy RR 0
Table 7: Hyperparameters used for worm simulation that outputs reasonable value.
System hyperparameter Value Algorithm hyperparameter Value
dimension 3636 accuracy for estimation ρ\rho 20.020.0
sample number TpT_{p} 3×1053\times 10^{5} maximum possible nearly period LmaxL_{\rm max} 2525
observation noise proxy RR 0
Table 8: Hyperparameters used for eigenvalue estimation of worm simulation.
Hyperparameter Value Hyperparameter Value
sample number NN 103,104,10510^{3},10^{4},10^{5} failure probability bound δ\delta 0.20.2
dimension 3636 observation noise proxy RR 1.01.0
a nearly period LL 2020