
Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems

Lyudmila Grigoryeva1 and Juan-Pablo Ortega2,3
Abstract

A new class of non-homogeneous state-affine systems is introduced for use in reservoir computing. Sufficient conditions are identified that guarantee first, that the associated reservoir computers with linear readouts are causal, time-invariant, and satisfy the fading memory property and second, that a subset of this class is universal in the category of fading memory filters with stochastic almost surely uniformly bounded inputs. This means that any discrete-time filter that satisfies the fading memory property with random inputs of that type can be uniformly approximated by elements in the non-homogeneous state-affine family.


Key Words: reservoir computing, universality, state-affine systems, SAS, echo state networks, ESN, echo state affine systems, machine learning, fading memory property, linear training, stochastic signal treatment.

1 Department of Mathematics and Statistics. Universität Konstanz. Box 146. D-78457 Konstanz. Germany. Lyudmila.Grigoryeva@uni-konstanz.de
2 Universität Sankt Gallen. Faculty of Mathematics and Statistics. Bodanstrasse 6. CH-9000 Sankt Gallen. Switzerland. Juan-Pablo.Ortega@unisg.ch
3 Centre National de la Recherche Scientifique (CNRS). France.

1 Introduction

A reservoir computer (RC) [Jaeg 10, Jaeg 04, Maas 02, Maas 11, Croo 07, Vers 07, Luko 09] or RC system is a specific type of recurrent neural network determined by two maps, namely a reservoir $F:\mathbb{R}^{N}\times\mathbb{R}^{n}\longrightarrow\mathbb{R}^{N}$, $n,N\in\mathbb{N}$, and a readout map $h:\mathbb{R}^{N}\rightarrow\mathbb{R}$, that under certain hypotheses transform (or filter) an infinite discrete-time input ${\bf z}=(\ldots,{\bf z}_{-1},{\bf z}_{0},{\bf z}_{1},\ldots)\in(\mathbb{R}^{n})^{\mathbb{Z}}$ into an output signal ${\bf y}\in\mathbb{R}^{\mathbb{Z}}$ of the same type using the state-space transformation given by:

\begin{align}
\mathbf{x}_{t} &=F(\mathbf{x}_{t-1},{\bf z}_{t}), \tag{1.1}\\
y_{t} &=h(\mathbf{x}_{t}), \tag{1.2}
\end{align}

where $t\in\mathbb{Z}$ and the dimension $N\in\mathbb{N}$ of the state vectors $\mathbf{x}_{t}\in\mathbb{R}^{N}$ will be referred to as the number of virtual neurons of the system. The expressions (1.1)-(1.2) determine a nonlinear state-space system, and many of its dynamical properties (stability, controllability) have been studied for decades in the literature from that point of view.
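As a minimal sketch of how the recursion (1.1)-(1.2) is evaluated on a finite input window, the following Python fragment iterates a small reservoir. The matrix `A`, the input vector `c`, the `tanh` nonlinearity, and the readout weights are hypothetical choices made only for illustration; they are not prescribed by the text.

```python
import math

def run_reservoir(F, h, inputs, x0):
    """Iterate x_t = F(x_{t-1}, z_t), y_t = h(x_t) over a finite input window."""
    x = list(x0)
    outputs = []
    for z in inputs:
        x = F(x, z)           # state update, equation (1.1)
        outputs.append(h(x))  # readout, equation (1.2)
    return outputs

# Hypothetical 2-neuron reservoir (N = 2) with scalar inputs (n = 1).
A = [[0.5, 0.1], [0.0, 0.3]]
c = [1.0, -1.0]

def F(x, z):
    # Assumed reservoir map: componentwise tanh of an affine function of (x, z).
    return [math.tanh(sum(A[i][j] * x[j] for j in range(2)) + c[i] * z)
            for i in range(2)]

def h(x):
    # Assumed linear readout.
    return 0.7 * x[0] - 0.2 * x[1]

ys = run_reservoir(F, h, [0.1, -0.4, 0.3, 0.0], x0=[0.0, 0.0])
```

Starting the recursion at a fixed `x0` is a finite-time device; the filter studied in the text is defined on bi-infinite inputs and requires no initial condition.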

This notion of reservoir computer (also known as liquid state machine) is a significant generalization of the definitions found in the literature, where the readout map $h$ is consistently taken to be linear. In many supervised machine learning applications, the reservoir map is randomly generated (see, for instance, the echo state networks in [Jaeg 10, Jaeg 04]) and the memoryless readout is trained so that the output matches a given teaching signal that we denote by $\mathbf{d}\in\mathbb{R}^{\mathbb{Z}}$. Two important advantages of this approach lie in the fact that it reduces the training of a dynamic task to a static problem and, moreover, if the reservoir map is rich enough, good performance can indeed be attained with just linear readouts that are trained via a (possibly regularized) linear regression that minimizes the Euclidean distance between the output ${\bf y}$ and the teaching signal $\mathbf{d}$. These features circumvent well-known difficulties in the training of generic recurrent neural networks having to do with bifurcation phenomena [Doya 92] and that, despite recent progress in the regularization and training of deep RNN structures (see, for instance [Grav 13, Pasc 13, Zare 14], and references therein), render classical gradient descent methods non-convergent.
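The reduction of training to a static regression can be sketched as follows. A randomly generated reservoir is kept fixed, its states are collected, and only a linear readout `W` is fitted by ridge (Tikhonov-regularized) least squares. The reservoir size, the weight ranges, the delay task $d_t = z_{t-1}$, and the regularization constant `alpha` are assumptions chosen for illustration only.

```python
import math
import random

def solve(M, b):
    """Solve M w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][k] * w[k] for k in range(i + 1, n))
        w[i] = s / aug[i][i]
    return w

def ridge_readout(states, targets, alpha):
    """Minimize sum_t (W.x_t - d_t)^2 + alpha*|W|^2 via the normal equations."""
    n = len(states[0])
    gram = [[sum(x[i] * x[j] for x in states) + (alpha if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(x[i] * d for x, d in zip(states, targets)) for i in range(n)]
    return solve(gram, rhs)

rng = random.Random(0)
N = 5                                                     # hypothetical reservoir size
A = [[rng.uniform(-0.3, 0.3) for _ in range(N)] for _ in range(N)]  # random, never trained
c = [rng.uniform(-1, 1) for _ in range(N)]
zs = [rng.uniform(-1, 1) for _ in range(300)]             # scalar input signal

x, states = [0.0] * N, []
for z in zs:
    x = [math.tanh(sum(A[i][j] * x[j] for j in range(N)) + c[i] * z) for i in range(N)]
    states.append(x)

# Assumed teaching signal: the one-step delayed input, d_t = z_{t-1}.
X, d = states[1:], zs[:-1]
W = ridge_readout(X, d, alpha=1e-6)
mse = sum((sum(w * xi for w, xi in zip(W, x)) - t) ** 2 for x, t in zip(X, d)) / len(d)
baseline = sum(t * t for t in d) / len(d)                 # error of the zero readout
```

Only `W` is optimized, and the optimization is a convex quadratic problem; the recurrent weights `A` and `c` are never touched, which is what avoids gradient descent through the recurrence.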

Interest in reservoir computing in both the machine learning and the signal processing communities has grown strongly in recent years. One of the main reasons for this fact is that some RC implementations are based on the computational capacities of certain non-neural dynamical systems [Crut 10], which opens the door to physical (optical or optoelectronic) realizations that have already been built using dedicated hardware (see, for instance, [Jaeg 07, Atiy 00, Appe 11, Roda 11, Vand 11, Larg 12, Paqu 12, Brun 13, Vand 14, Vinc 15]) and that have shown unprecedented information processing speeds.

There are two central questions that need to be addressed when designing a machine learning paradigm, namely, the capacity and the universality problems. The capacity problem concerns generically the estimation of the error that is going to be committed in the execution of a specific task. In statistical learning and in the approximation theoretical treatment of static neural networks, this estimation has taken the form of generic bounds that incorporate various architecture parameters of the system like in [Pisi 81, Jone 92, Barr 93, Kurk 05]. In the specific context of reservoir computing, and in dynamic learning in general, one is interested in various notions of memory capacity that have been the subject of much research [Jaeg 02, Whit 04, Gang 08, Herm 10, Damb 12, Grig 15, Coui 16, Grig 16a].

The universality problem consists in showing that the set of input/output functionals that can be generated with a specific architecture is dense in a sufficiently rich class, like the one containing, for example, all continuous or even all measurable functionals. For classical machine learning paradigms like neural networks, this question has given rise to well-known results [Kolm 56, Arno 57, Spre 65, Spre 96, Spre 97, Cybe 89, Horn 89, Rusc 98] that show that they can be considered as universal approximators in a static and deterministic setup.

There is no general recipe that allows one to conclude the universality of a given machine learning approach. The proof strategy depends much on the specific paradigm and, more importantly, on the nature of the inputs and the outputs. In the context of reservoir computing there are several situations for which universality has been established when the inputs/outputs are deterministic. There are two features that influence significantly the level of mathematical sophistication that is needed to conclude universality: first, the compactness of the time domain under consideration and second, if one works in continuous or discrete time. In the following paragraphs we briefly review the results that have already been obtained and, in passing, we present and put in context the contributions contained in this paper.

The compactness of the time domain is crucial because, as we will see later on, universality can be obtained as a consequence of various versions of the Stone-Weierstrass Theorem, which are invariably formulated for functions defined on a compact metric space. When the time domain is compact, this property is naturally inherited by the spaces relevant in the proofs. However, when it is not, it can still be secured using functionals that satisfy a condition introduced in [Boyd 85] known as the fading memory property. The distinction between continuous and discrete time inputs is justified by the availability in the continuous setup of different tools coming from functional analysis that do not exist for discrete time.

Reservoir computing universality for compact time domains is a corollary of classical results in systems theory. Indeed, in the continuous time setup, it can be established [Flie 76, Suss 76] for linear systems using polynomial readouts and for bilinear systems using linear readouts. In the discrete-time setup, the situation is more convoluted when the readout is linear and required the introduction in [Flie 80] of the so-called (homogeneous) state-affine systems (SAS) (see also [Sont 79a, Sont 79b]). The extension of these results to continuous non-compact time intervals was carried out in [Boyd 85] for fading memory filters using exponentially stable linear RCs with polynomial readouts and their bilinear counterparts with linear readouts (see also [Maas 00, Maas 02, Maas 04, Maas 07]). An extension to the non-compact discrete-time setup based on the Stone-Weierstrass theorem is, to our knowledge, not available in the literature and it is one of the main contributions of this paper. This problem has only been tackled from an internal approximation point of view, which consists in uniformly approximating the reservoir and readout maps in (1.1)-(1.2) in order to obtain an approximation of the resulting filter; this strategy has been introduced in [Matt 92, Matt 93] for absolutely summable systems. The proofs in those works were unfortunately based on an invalid compactness assumption. Even though corrections were proposed in [Perr 96, Stub 97a], this approach yields, in the best of cases, universality only within the reservoir filter category, while we aim at proving that statement in the much larger category of fading memory filters.

The paper is structured in three sections:

  • All the notation and main definitions which are used later on in the paper are provided in Section 2. Important concepts like filters, reservoir filters, and the fading memory property are discussed.

  • Section 3 contains two different universality results. The first one in Subsection 3.1 shows that the entire family of fading memory RCs itself is universal, as well as the much smaller one containing all the linear reservoirs with polynomial readouts, when certain spectral restrictions are imposed on the reservoir matrices (see below for details). The second universality result is contained in Subsection 3.2 and is one of the main contributions of the paper. Here we restrict ourselves to reservoir computers with linear readouts which are closer to the type of RCs used in applications. We introduce a non-homogeneous variant of the state-affine systems in [Flie 80] and identify sufficient conditions that guarantee that the associated reservoir computers with linear readouts are causal, time-invariant, and satisfy the echo state and the fading memory properties. Finally, we state a universality result for a subset of this class which is shown to be universal in the category of fading memory filters with uniformly bounded inputs.

  • These universality statements are generalized to the stochastic setup for almost surely uniformly bounded inputs in Section 4. In particular, it is shown that any discrete-time filter that has the fading memory property with almost surely uniformly bounded stochastic inputs can be uniformly approximated by elements in the non-homogeneous state-affine family.

Despite some preexisting work on the uniform approximation in probability of stochastic systems with finite memory [Perr 96, Perr 97, Stub 97b], the universality result in the stochastic setup is, to our knowledge, the first of its type in the literature and opens the door to new developments in the learning of stochastic processes and their obvious applications to forecasting [Galt 14]. In the deterministic setup, RC has been very successful (see for instance [Jaeg 04, Path 17, Path 18]) in the learning of the attractors of various dynamical systems. This approach is used for forecasting by path continuation of synthetically learnt proxies, which has led to several orders of magnitude accuracy improvements with respect to most standard dynamical systems forecasting techniques based on Takens’ Theorem [Take 81]. We expect that the results in this paper should lead to comparable improvements in the density forecasting of stochastic processes.

2 Notation, definitions, and preliminary discussions

Vector and matrix notations. Polynomials.

A column vector is denoted by a bold lower case symbol like $\mathbf{r}$ and $\mathbf{r}^{\top}$ indicates its transpose. Given a vector $\mathbf{v}\in\mathbb{R}^{n}$, we denote its entries by $v_{i}$ or $v^{i}$, depending on the context, with $i\in\left\{1,\dots,n\right\}$; we also write $\mathbf{v}=(v_{i})_{i\in\left\{1,\dots,n\right\}}$. We denote by $\mathbb{M}_{n,m}$ the space of real $n\times m$ matrices with $m,n\in\mathbb{N}$. When $n=m$, we use the symbol $\mathbb{M}_{n}$ to refer to the space of square matrices of order $n$. $\mathbb{D}_{n}\subset\mathbb{M}_{n}$ is the set of diagonal matrices of order $n$ and $\mathbb{D}$ denotes the set of diagonal matrices of any order. Given a vector $\mathbf{v}\in\mathbb{R}^{n}$, we denote by ${\rm diag}(\mathbf{v})$ the diagonal matrix in $\mathbb{M}_{n}$ with the elements of $\mathbf{v}$ as diagonal entries. $\mathbb{N}{\rm il}_{n}^{k}\subset\mathbb{M}_{n}$ is the set of nilpotent matrices in $\mathbb{M}_{n}$ of index $k\leq n$, that is, $A\in\mathbb{N}{\rm il}_{n}^{k}$ if and only if $A\in\mathbb{M}_{n}$, $A^{k}=0$, and $A^{l}\neq 0$ for any $l<k$. $\mathbb{N}{\rm il}$ denotes the set of nilpotent matrices of any order and any index. Given a matrix $A\in\mathbb{M}_{n,m}$, we denote its components by $A_{ij}$ and we write $A=(A_{ij})$, with $i\in\left\{1,\dots,n\right\}$, $j\in\left\{1,\dots,m\right\}$. Given a vector $\mathbf{v}\in\mathbb{R}^{n}$, the symbol $\|\mathbf{v}\|$ stands for its Euclidean norm. For any $A\in\mathbb{M}_{n,m}$, $\|A\|_{2}$ denotes its matrix norm induced by the Euclidean norms in $\mathbb{R}^{m}$ and $\mathbb{R}^{n}$, and satisfies [Horn 13, Example 5.6.6] that $\|A\|_{2}=\sigma_{{\rm max}}(A)$, with $\sigma_{{\rm max}}(A)$ the largest singular value of $A$. $\|A\|_{2}$ is sometimes referred to as the spectral norm of $A$ [Horn 13].

Let $V_{1},V_{2},W_{1},W_{2}$ be vector spaces. The symbols $V_{1}\oplus V_{2}$ and $V_{1}\otimes V_{2}$ denote the corresponding direct sum and tensor product vector spaces [Hung 74], respectively, of $V_{1}$ and $V_{2}$. Given any ${\bf v}_{1}\in V_{1}$ and ${\bf v}_{2}\in V_{2}$, the vectors ${\bf v}_{1}\oplus{\bf v}_{2}\in V_{1}\oplus V_{2}$ and ${\bf v}_{1}\otimes{\bf v}_{2}\in V_{1}\otimes V_{2}$ are the direct sum and the (pure) tensor product of ${\bf v}_{1}$ and ${\bf v}_{2}$, respectively. Given two linear maps $A_{1}:V_{1}\longrightarrow W_{1}$ and $A_{2}:V_{2}\longrightarrow W_{2}$, we denote by $A_{1}\oplus A_{2}:V_{1}\oplus V_{2}\longrightarrow W_{1}\oplus W_{2}$ and $A_{1}\otimes A_{2}:V_{1}\otimes V_{2}\longrightarrow W_{1}\otimes W_{2}$ the associated direct sum and tensor product maps, respectively, defined by $A_{1}\oplus A_{2}\left({\bf v}_{1}\oplus{\bf v}_{2}\right):=A_{1}\left({\bf v}_{1}\right)\oplus A_{2}\left({\bf v}_{2}\right)$ and $A_{1}\otimes A_{2}\left({\bf v}_{1}\otimes{\bf v}_{2}\right):=A_{1}\left({\bf v}_{1}\right)\otimes A_{2}\left({\bf v}_{2}\right)$. The matrix representation of $A_{1}\oplus A_{2}$ is obtained by concatenating in a block diagonal matrix the matrix representations of $A_{1}$ and $A_{2}$. The matrix representation of $A_{1}\otimes A_{2}$ is obtained via the Kronecker product of the matrix representations of $A_{1}$ and $A_{2}$ [Horn 13].
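The two matrix constructions can be checked on a small case. The sketch below builds the block diagonal matrix for the direct sum and the Kronecker product for the tensor product, and verifies the defining identities on one pair of vectors. The helper names and the numerical entries are illustrative assumptions, and pure tensors are represented in the usual lexicographic basis ordering.

```python
def direct_sum(A, B):
    """Block diagonal matrix representing A (+) B."""
    m, q = len(A[0]), len(B[0])
    return [row + [0] * q for row in A] + [[0] * m + row for row in B]

def kron(A, B):
    """Kronecker product representing A (x) B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def tensor(v1, v2):
    """Coordinates of the pure tensor v1 (x) v2."""
    return [a * b for a in v1 for b in v2]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A1 = [[1, 2], [3, 4]]
A2 = [[0, 1], [1, 0]]
v1, v2 = [1, 2], [3, -1]

# (A1 (+) A2)(v1 (+) v2) = A1 v1 (+) A2 v2, with (+) on vectors being concatenation.
lhs_sum = matvec(direct_sum(A1, A2), v1 + v2)
rhs_sum = matvec(A1, v1) + matvec(A2, v2)

# (A1 (x) A2)(v1 (x) v2) = A1 v1 (x) A2 v2.
lhs_ten = matvec(kron(A1, A2), tensor(v1, v2))
rhs_ten = tensor(matvec(A1, v1), matvec(A2, v2))
```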

Given an element ${\bf z}\in\mathbb{R}^{n}$, we denote by $\mathbb{R}[{\bf z}]$ the real-valued multivariate polynomials on ${\bf z}$ with real coefficients. Analogously, ${\rm Pol}(\mathbb{R}^{n},\mathbb{R})$ will denote the set of real-valued polynomials on $\mathbb{R}^{n}$. When $z\in\mathbb{R}$ and $m,n\in\mathbb{N}$, we define the set $\mathbb{M}_{m,n}[z]$ of $\mathbb{M}_{m,n}$-valued polynomials on $z$ with coefficients in $\mathbb{M}_{m,n}$ as

\[
\mathbb{M}_{m,n}[z]:=\{A_{0}+zA_{1}+z^{2}A_{2}+\cdots+z^{r}A_{r}\mid r\in\mathbb{N},A_{0},A_{1},A_{2},\ldots,A_{r}\in\mathbb{M}_{m,n}\}. \tag{2.1}
\]

$\mathbb{N}{\rm il}_{n}^{k}[z]\subset\mathbb{M}_{n}[z]$ is the set of nilpotent $\mathbb{M}_{n}$-valued polynomials on $z$ of index $k$, that is, $p(z)\in\mathbb{N}{\rm il}_{n}^{k}[z]$ whenever $k$ is the smallest natural number for which $p(z)^{k}={\bf 0}$, for all $z\in\mathbb{R}$. $\mathbb{N}{\rm il}[z]$ is the set of matrix-valued nilpotent polynomials on $z$ of any order and any index.

Sequence spaces.

$\mathbb{N}$ denotes the set of natural numbers with the zero element included. $\mathbb{Z}$ (respectively, $\mathbb{Z}_{+}$ and $\mathbb{Z}_{-}$) are the integers (respectively, the positive and the negative integers). The symbol $(\mathbb{R}^{n})^{\mathbb{Z}}$ denotes the set of infinite real sequences of the form ${\bf z}=(\ldots,{\bf z}_{-1},{\bf z}_{0},{\bf z}_{1},\ldots)$, ${\bf z}_{i}\in\mathbb{R}^{n}$, $i\in\mathbb{Z}$; $(\mathbb{R}^{n})^{\mathbb{Z}_{-}}$ and $(\mathbb{R}^{n})^{\mathbb{Z}_{+}}$ are the subspaces consisting of, respectively, left and right infinite sequences: $(\mathbb{R}^{n})^{\mathbb{Z}_{-}}=\{{\bf z}=(\ldots,{\bf z}_{-2},{\bf z}_{-1},{\bf z}_{0})\mid{\bf z}_{i}\in\mathbb{R}^{n},i\in\mathbb{Z}_{-}\}$, $(\mathbb{R}^{n})^{\mathbb{Z}_{+}}=\{{\bf z}=({\bf z}_{0},{\bf z}_{1},{\bf z}_{2},\ldots)\mid{\bf z}_{i}\in\mathbb{R}^{n},i\in\mathbb{Z}_{+}\}$. Analogously, $(D_{n})^{\mathbb{Z}}$, $(D_{n})^{\mathbb{Z}_{-}}$, and $(D_{n})^{\mathbb{Z}_{+}}$ stand for (semi-)infinite sequences with elements in the subset $D_{n}\subset\mathbb{R}^{n}$. In most cases we shall use in these infinite product spaces either the product topology (see [Munk 14, Chapter 2]) or the topology induced by the supremum norm $\|{\bf z}\|_{\infty}:={\rm sup}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}\|\right\}$. The symbols $\ell^{\infty}(\mathbb{R}^{n})$ and $\ell_{\pm}^{\infty}(\mathbb{R}^{n})$ will be used to denote the Banach spaces formed by the elements in those infinite product spaces that have a finite supremum norm $\|\cdot\|_{\infty}$. The symbol $B_{n}({\bf v},M)\subset\mathbb{R}^{n}$ denotes the open ball of radius $M>0$ and center ${\bf v}\in{\mathbb{R}}^{n}$ with respect to the Euclidean norm. The bars over sets stand for topological closures; in particular, $\overline{B_{n}({\bf v},M)}$ is the closed ball.

Filters.

We will refer to the maps of the type $U:(D_{n})^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}}$ as filters or operators and to those like $H:(D_{n})^{\mathbb{Z}}\longrightarrow\mathbb{R}$ (or $H:(D_{n})^{\mathbb{Z}_{\pm}}\longrightarrow\mathbb{R}$) as functionals. A filter $U$ is called causal when for any two elements ${\bf z},\mathbf{w}\in(D_{n})^{\mathbb{Z}}$ that satisfy that ${\bf z}_{\tau}=\mathbf{w}_{\tau}$ for all $\tau\leq t$, for any given $t\in\mathbb{Z}$, we have that $U({\bf z})_{t}=U({\bf w})_{t}$. Let $U_{\tau}:(D_{n})^{\mathbb{Z}}\longrightarrow(D_{n})^{\mathbb{Z}}$, $\tau\in\mathbb{Z}$, be the time delay operator defined by $U_{\tau}({\bf z})_{t}:={\bf z}_{t-\tau}$. The filter $U$ is called time-invariant when it commutes with the time delay operator, that is, $U_{\tau}\circ U=U\circ U_{\tau}$ (in this expression, the two time delay operators $U_{\tau}$ have to be understood as defined in the appropriate sequence spaces). We recall (see for instance [Boyd 85]) that there is a bijection between causal time-invariant filters and functionals on $(D_{n})^{\mathbb{Z}_{-}}$. Indeed, given a causal time-invariant filter $U$, we can associate to it a functional $H_{U}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ via the assignment $H_{U}({\bf z}):=U({\bf z}^{e})_{0}$, where ${\bf z}^{e}\in(D_{n})^{\mathbb{Z}}$ is an arbitrary extension of ${\bf z}\in(D_{n})^{\mathbb{Z}_{-}}$ to $(D_{n})^{\mathbb{Z}}$. Conversely, for any functional $H:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$, we can define a time-invariant causal filter $U_{H}:(D_{n})^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}}$ by $U_{H}({\bf z})_{t}:=H((\mathbb{P}_{\mathbb{Z}_{-}}\circ U_{-t})({\bf z}))$, where $U_{-t}$ is the $(-t)$-time delay operator and $\mathbb{P}_{\mathbb{Z}_{-}}:(D_{n})^{\mathbb{Z}}\longrightarrow(D_{n})^{\mathbb{Z}_{-}}$ is the natural projection.
It is easy to verify that:

\begin{align*}
H_{U_{H}} &= H,\quad\mbox{for any functional}\quad H:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R},\\
U_{H_{U}} &= U,\quad\mbox{for any causal time-invariant filter}\quad U:(D_{n})^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}}.
\end{align*}

Additionally, let $H_{1},H_{2}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ and $\lambda\in\mathbb{R}$; then $U_{H_{1}+\lambda H_{2}}({\bf z})=U_{H_{1}}({\bf z})+\lambda U_{H_{2}}({\bf z})$, for any ${\bf z}\in(\mathbb{R}^{n})^{\mathbb{Z}}$. In the following pages, whenever the discussion takes place in a causal and time-invariant setup, we will use the term filter to denote interchangeably the associated functional and the filter itself.
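The correspondence between functionals and causal time-invariant filters can be sketched in a few lines of Python by modelling a sequence as a function of an integer time index. The representation by callables, the zero extension used for ${\bf z}^{e}$, and the sample functional `H` are assumptions made for illustration.

```python
import math

def delay(z, tau):
    """Time delay operator: U_tau(z)_t = z_{t - tau}."""
    return lambda t: z(t - tau)

def filter_from_functional(H):
    """U_H(z)_t := H(P(U_{-t}(z))): apply H to the past of z as seen from time t."""
    def U(z):
        return lambda t: H(lambda s: z(t + s))  # H only evaluates indices s <= 0
    return U

def functional_from_filter(U):
    """H_U(z) := U(z^e)_0, where z^e extends the left-infinite z by zeros.
    For a causal U any other extension gives the same value."""
    def H(z_past):
        extended = lambda t: z_past(t) if t <= 0 else 0.0
        return U(extended)(0)
    return H

# Hypothetical finite-memory functional, used only as an example.
def H(z_past):
    return z_past(0) + 0.5 * z_past(-1) ** 2

U_H = filter_from_functional(H)
z = lambda t: math.sin(0.3 * t)       # sample input indexed by all integers

# Time invariance: filtering a delayed input equals delaying the filtered output.
lhs = U_H(delay(z, 3))(5)
rhs = delay(U_H(z), 3)(5)

# Round trip H_{U_H} = H, evaluated on the past of z.
round_trip = functional_from_filter(U_H)(z)
```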

Reservoir filters.

Consider now the RC system determined by (1.1)–(1.2). It is worth mentioning that, unlike in those expressions, the reservoir and the readout maps are in general defined only on subsets $D_{N},D^{\prime}_{N}\subset\mathbb{R}^{N}$ and $D_{n}\subset\mathbb{R}^{n}$ and not on the entire Euclidean spaces ${\mathbb{R}}^{N}$ and ${\mathbb{R}}^{n}$, that is, $F:D_{N}\times D_{n}\longrightarrow D^{\prime}_{N}$ and $h:D^{\prime}_{N}\rightarrow\mathbb{R}$. Reservoir systems determine a filter when the following existence and uniqueness property holds: for each ${\bf z}\in\left(D_{n}\right)^{\mathbb{Z}}$ there exists a unique ${\bf x}\in\left(D_{N}\right)^{\mathbb{Z}}$ such that for each $t\in\mathbb{Z}$, the relation (1.1) holds. This condition is known in the literature as the echo state property [Jaeg 10, Yild 12] and has received much attention in the context of echo state networks [Jaeg 04, Bueh 06, Bai 12, Wain 16, Manj 13]. The echo state property formulated for infinite (or semi-infinite) inputs guarantees that the output of the filter at any given time does not depend on initial conditions. We emphasize that this is a genuine condition that is not automatically satisfied by all RC systems.
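A finite-time proxy for this independence of initial conditions can be observed numerically. The sketch below uses a hypothetical scalar reservoir map $F(x,z)=\tanh(0.5x+z)$, which is a contraction in $x$ with constant $0.5$ because $\tanh$ is 1-Lipschitz. Contractivity is one sufficient mechanism assumed here for illustration; it is not a characterization of the echo state property.

```python
import math

def step(x, z):
    # Assumed scalar reservoir map; |step(x, z) - step(x', z)| <= 0.5 * |x - x'|.
    return math.tanh(0.5 * x + z)

def run(x0, inputs):
    """Final state after driving the reservoir with `inputs` from initial state x0."""
    x = x0
    for z in inputs:
        x = step(x, z)
    return x

inputs = [math.sin(0.7 * t) for t in range(40)]

# Two very different initial conditions, same input: the gap shrinks by at
# least a factor 0.5 per step, so it is at most 2 * 0.5**40 after 40 steps
# (up to floating-point rounding).
gap = abs(run(-1.0, inputs) - run(1.0, inputs))
```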

We will denote by $U^{F}:(D_{n})^{\mathbb{Z}}\longrightarrow(D_{N})^{\mathbb{Z}}$ the filter determined by the reservoir map via (1.1), that is, $U^{F}({\bf z})_{t}:=\mathbf{x}_{t}\in D_{N}$, and by $U^{F}_{h}:(D_{n})^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}}$ the one determined by the entire reservoir system, that is, $U^{F}_{h}({\bf z})_{t}:=h\left(U^{F}({\bf z})_{t}\right)=y_{t}$. $U^{F}_{h}$ will be called the reservoir filter associated to the RC system (1.1)–(1.2). The filters $U^{F}$ and $U^{F}_{h}$ are causal by construction and it can also be shown that they are necessarily time-invariant [Grig 18]. We can hence associate to $U^{F}_{h}$ a reservoir functional $H^{F}_{h}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ determined by $H^{F}_{h}:=H_{U^{F}_{h}}$.

Weighted norms and the fading memory property (FMP).

Let $w:\mathbb{N}\longrightarrow(0,1]$ be a decreasing sequence with zero limit. We define the weighted norm $\|\cdot\|_{w}$ on $(\mathbb{R}^{n})^{\mathbb{Z}_{-}}$ associated to the weighting sequence $w$ as the map:

\[
\begin{array}{cccc}\|\cdot\|_{w}:&(\mathbb{R}^{n})^{\mathbb{Z}_{-}}&\longrightarrow&\overline{\mathbb{R}^{+}}\\ &{\bf z}&\longmapsto&\|{\bf z}\|_{w}:=\sup_{t\in\mathbb{Z}_{-}}\{\|{\bf z}_{t}w_{-t}\|\},\end{array}
\]

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^{n}$. It is worth noting that the space

\[
\ell^{\infty}_{w}({\mathbb{R}}^{n}):=\left\{{\bf z}\in\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}}\mid\|{\bf z}\|_{w}<\infty\right\}, \tag{2.3}
\]

endowed with the weighted norm $\|\cdot\|_{w}$ forms a Banach space [Grig 18].
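To make the definition concrete, the weighted norm can be evaluated on a truncated past. In the sketch below the truncation `horizon`, the exponential weights $w_k=0.5^k$, and the two sample sequences are assumptions for illustration. The second sequence grows without bound into the past and still has a finite weighted norm, which shows that $\ell^{\infty}_{w}(\mathbb{R}^{n})$ can contain sequences with infinite supremum norm.

```python
import math

def weighted_norm(z_past, w, horizon):
    """Truncated sup over t <= 0 of ||z_t|| * w_{-t}.
    z_past(k) returns the vector z_{-k}; w(k) returns the weight w_k."""
    return max(math.sqrt(sum(c * c for c in z_past(k))) * w(k)
               for k in range(horizon))

w = lambda k: 0.5 ** k             # decreasing, values in (0, 1], zero limit

bounded = lambda k: (3.0, 4.0)     # constant sequence in R^2 with Euclidean norm 5
growing = lambda k: (1.5 ** k,)    # unbounded as k grows, i.e. into the far past

norm_bounded = weighted_norm(bounded, w, 50)   # attained at k = 0
norm_growing = weighted_norm(growing, w, 50)   # terms are 0.75**k, largest at k = 0
```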

Throughout the paper, we will work with uniformly bounded families of sequences, both in the deterministic and the stochastic setups. The two main properties of these subspaces in relation to the weighted norms are spelled out in the following two lemmas.

Lemma 2.1

Let $M>0$ and let $K_{M}$ be the set of elements in $\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}}$ which are uniformly bounded by $M$, that is,

\[
K_{M}:=\left\{{\bf z}\in\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}}\mid\|{\bf z}_{t}\|\leq M\quad\mbox{for all}\quad t\in\mathbb{Z}_{-}\right\}=\overline{B_{n}({\bf 0},M)}^{\mathbb{Z}_{-}}, \tag{2.4}
\]

with $\overline{B_{n}({\bf 0},M)}\subset\mathbb{R}^{n}$ the closed ball of radius $M$ and center ${\bf 0}$ in ${\mathbb{R}}^{n}$ with respect to the Euclidean norm. Then, for any weighting sequence $w$ and ${\bf z}\in K_{M}$, we have that $\|{\bf z}\|_{w}<\infty$.

Additionally, let $\lambda,\rho\in(0,1)$ and let $w,w^{\rho},w^{1-\rho}$ be the weighting sequences given by $w_{t}:=\lambda^{t}$, $w_{t}^{\rho}:=\lambda^{\rho t}$, $w_{t}^{1-\rho}:=\lambda^{(1-\rho)t}$, $t\in\mathbb{N}$. Then, the series $\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|w_{t}$ is absolutely convergent and satisfies the inequalities:

\begin{align}
\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|w_{t} &= \sum_{t=0}^{\infty}\|{\bf z}_{-t}\|\lambda^{t}\leq\|{\bf z}\|_{w^{1-\rho}}\frac{1}{1-\lambda^{\rho}}, \tag{2.5}\\
\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|w_{t} &= \sum_{t=0}^{\infty}\|{\bf z}_{-t}\|\lambda^{t}\leq\|{\bf z}\|_{w^{\rho}}\frac{1}{1-\lambda^{1-\rho}}. \tag{2.6}
\end{align}
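One way to see where the bound (2.5) comes from is to split the weight as $\lambda^{t}=\lambda^{(1-\rho)t}\lambda^{\rho t}$, bound the first factor by the weighted norm, and sum the remaining geometric series. The following chain is a sketch of that step, using only the definitions above:

```latex
\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|\lambda^{t}
  =\sum_{t=0}^{\infty}\left(\|{\bf z}_{-t}\|\lambda^{(1-\rho)t}\right)\lambda^{\rho t}
  \leq \sup_{s\in\mathbb{N}}\left\{\|{\bf z}_{-s}\|\lambda^{(1-\rho)s}\right\}\sum_{t=0}^{\infty}\lambda^{\rho t}
  =\|{\bf z}\|_{w^{1-\rho}}\frac{1}{1-\lambda^{\rho}}.
```

The inequality (2.6) follows in the same way after exchanging the roles of $\rho$ and $1-\rho$.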

The following result is a discrete-time version of Lemma 1 in [Boyd 85] that is easily obtained by noticing that in the discrete-time setup all functions are trivially continuous if we consider the discrete topology for their domains and, moreover, all families of functions are equicontinuous. A proof is given in the appendices for the sake of completeness.

Lemma 2.2

Let $M>0$ and let $K_{M}$ be as in (2.4). Let $w:\mathbb{N}\longrightarrow(0,1]$ be a weighting sequence. Then $K_{M}$ is a compact topological space when endowed with the relative topology inherited from the norm topology in the Banach space $(\ell^{\infty}_{w}({\mathbb{R}}^{n}),\left\|\cdot\right\|_{w})$.

Definition 2.3

Let $D_{n}\subset\mathbb{R}^{n}$ and let $H_{U}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ be the functional associated to the causal and time-invariant filter $U:(D_{n})^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}}$. We say that $U$ has the fading memory property (FMP) whenever there exists a weighting sequence $w:\mathbb{N}\longrightarrow(0,1]$ such that the map $H_{U}:((D_{n})^{\mathbb{Z}_{-}},\|\cdot\|_{w})\longrightarrow\mathbb{R}$ is continuous. This means that for any ${\bf z}\in(D_{n})^{\mathbb{Z}_{-}}$ and any $\epsilon>0$, there exists a $\delta(\epsilon)>0$ such that for any ${\bf s}\in(D_{n})^{\mathbb{Z}_{-}}$ that satisfies that

\[
\|{\bf z}-{\bf s}\|_{w}=\sup_{t\in\mathbb{Z}_{-}}\{\|({\bf z}_{t}-{\bf s}_{t})w_{-t}\|\}<\delta(\epsilon),\quad\mbox{then}\quad|H_{U}({\bf z})-H_{U}({\bf s})|<\epsilon.
\]

If the weighting sequence $w$ is such that $w_{t}=\lambda^{t}$, for some $\lambda\in(0,1)$ and all $t\in\mathbb{N}$, then $U$ is said to have the $\lambda$-exponential fading memory property.
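As an illustration of the definition, consider the hypothetical functional $H({\bf z})=\sum_{t\geq 0}a^{t}z_{-t}$ on scalar sequences with $0<a<\lambda<1$. Since $a^{t}=(a/\lambda)^{t}\lambda^{t}$, one gets $|H({\bf z})-H({\bf s})|\leq\|{\bf z}-{\bf s}\|_{w}/(1-a/\lambda)$ for $w_{t}=\lambda^{t}$, so $H$ is Lipschitz, hence continuous, in the weighted norm. The sketch below checks this bound numerically on a truncated past; the constants and the truncation length are assumptions.

```python
import random

A_DECAY, LAM, T = 0.5, 0.8, 60   # hypothetical constants; T truncates the infinite past

def H(z):
    """Truncated functional sum_{t=0}^{T-1} A_DECAY**t * z_{-t}, with z[t] = z_{-t}."""
    return sum(A_DECAY ** t * z[t] for t in range(T))

def weighted_dist(z, s):
    """Truncated ||z - s||_w for the weighting sequence w_t = LAM**t."""
    return max(abs(z[t] - s[t]) * LAM ** t for t in range(T))

rng = random.Random(1)
z = [rng.uniform(-1, 1) for _ in range(T)]
s = [rng.uniform(-1, 1) for _ in range(T)]

lipschitz = 1.0 / (1.0 - A_DECAY / LAM)
lhs = abs(H(z) - H(s))
rhs = lipschitz * weighted_dist(z, s)

# Fading memory in action: a unit perturbation 50 steps in the past
# changes the output by roughly A_DECAY**50, which is negligible.
far = z[:]
far[50] += 1.0
far_effect = abs(H(z) - H(far))
```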

Remark 2.4

This formulation of the fading memory property is due to Boyd and Chua [Boyd 85] and it is the key concept that allowed these authors to extend to non-compact time intervals the first filter universality results formulated in the classical works of Fréchet [Frec 10], Volterra, and Wiener [Wien 58, Bril 58, Geor 59], always under compactness assumptions on the input space and the time interval in which inputs are defined.

Remark 2.5

In the context of reservoir filters, the fading memory property is on some occasions related to the Lyapunov stability of the autonomous system associated to the reservoir map by setting the input sequence equal to zero. This connection has been made explicit, for example, for discrete-time nonlinear state-space models that are affine in their inputs and have a direct feed-through term in the output [Zang 04], or for time delay reservoirs [Grig 16b].

Remark 2.6

Time-invariant fading memory filters always have the bounded input, bounded output (BIBO) property. Indeed, if for simplicity we consider functionals $H_{U}$ that map the zero input to zero, that is $H_{U}({\bf 0})=0$, and we want bounded outputs such that $|H_{U}({\bf z})|<k$, for a given constant $k>0$, by Definition 2.3 it suffices to consider inputs ${\bf z}\in(\mathbb{R}^{n})^{\mathbb{Z}_{-}}$ such that $\|{\bf z}\|_{\infty}:=\sup_{t\in\mathbb{Z}_{-}}\{\|{\bf z}_{t}\|\}<\delta(k)$. Indeed, if $H_{U}$ has the FMP with respect to a weighting sequence $w$, then $\|{\bf z}\|_{w}\leq\|{\bf z}\|_{\infty}<\delta(k)$ and hence $|H_{U}({\bf z})|<k$, as required. Another important dynamical implication of the fading memory property is the uniqueness of steady states or, equivalently, the asymptotic independence of the dynamics on the initial conditions. See Theorem 6 in [Boyd 85] for details about this fact.

The following lemma, which will be used later on in the paper, spells out how the FMP depends on the weighting sequence used to define it.

Lemma 2.7

Let $D_{n}\subset\mathbb{R}^{n}$ and let $H_{U}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ be the functional associated to the causal and time-invariant filter $U:(D_{n})^{\mathbb{Z}}\longrightarrow(\mathbb{R})^{\mathbb{Z}}$. If $H_{U}$ has the FMP with respect to a given weighting sequence $w$, then it also has it with respect to any other weighting sequence $w^{\prime}$ which satisfies

\[
\frac{w_{t}}{w^{\prime}_{t}}<\lambda,\quad\mbox{for a fixed}\quad\lambda>0\quad\mbox{and for all}\quad t\in\mathbb{N}.
\]

In particular, the thesis of the lemma holds when $w^{\prime}$ dominates $w$, that is when $\lambda=1$.
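The mechanism behind this lemma can be sketched in one line. The hypothesis $w_{t}<\lambda w^{\prime}_{t}$ makes the $w$-norm controlled by the $w^{\prime}$-norm:

```latex
\|{\bf z}-{\bf s}\|_{w}
  =\sup_{t\in\mathbb{Z}_{-}}\left\{\|{\bf z}_{t}-{\bf s}_{t}\|\,w_{-t}\right\}
  \leq \lambda\sup_{t\in\mathbb{Z}_{-}}\left\{\|{\bf z}_{t}-{\bf s}_{t}\|\,w^{\prime}_{-t}\right\}
  =\lambda\,\|{\bf z}-{\bf s}\|_{w^{\prime}}.
```

Hence, if $\delta(\epsilon)$ is the constant given by the FMP with respect to $w$, any ${\bf s}$ with $\|{\bf z}-{\bf s}\|_{w^{\prime}}<\delta(\epsilon)/\lambda$ satisfies $\|{\bf z}-{\bf s}\|_{w}<\delta(\epsilon)$ and therefore $|H_{U}({\bf z})-H_{U}({\bf s})|<\epsilon$, so $\delta^{\prime}(\epsilon):=\delta(\epsilon)/\lambda$ works for $w^{\prime}$.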

It can be shown [Grig 18] that when in this lemma the set $(D_{n})^{\mathbb{Z}_{-}}$ is made of uniformly bounded sequences, that is, $(D_{n})^{\mathbb{Z}_{-}}=K_{M}$, with $K_{M}$ as in (2.4), then, if a filter has the FMP with respect to a given weighting sequence, it necessarily has the same property with respect to any other weighting sequence.

3 Universality results in the deterministic setup

The goal of this section is identifying families of reservoir filters that are able to uniformly approximate any time-invariant, causal, and fading memory filter with deterministic inputs with any desired degree of accuracy. Such families of reservoir computers are said to be universal.

The main mathematical tool that we use is the Stone-Weierstrass theorem for polynomial subalgebras of real-valued functions defined on compact metric spaces. This approach provides us with universal families of filters as long as we can prove that, roughly speaking, their elements form polynomial algebras using a product defined in the space of functionals. More specifically, if $D_{n}\subset\mathbb{R}^{n}$ and $H_{U_{1}},H_{U_{2}}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ are the functionals associated to the causal and time-invariant filters $U_{1},U_{2}:(D_{n})^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}}$, we readily define their product $H_{U_{1}}\cdot H_{U_{2}}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ and linear combination $H_{U_{1}}+\lambda H_{U_{2}}:(D_{n})^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$, $\lambda\in\mathbb{R}$, as

\[
(H_{U_{1}}\cdot H_{U_{2}})\left({\bf z}\right):=H_{U_{1}}\left({\bf z}\right)\cdot H_{U_{2}}\left({\bf z}\right),\quad(H_{U_{1}}+\lambda H_{U_{2}})\left({\bf z}\right):=H_{U_{1}}\left({\bf z}\right)+\lambda H_{U_{2}}\left({\bf z}\right),\quad{\bf z}\in(D_{n})^{\mathbb{Z}_{-}}. \tag{3.1}
\]

This section contains two different universality results. The first one shows that polynomial algebras of filters generated by reservoir systems using the operations in (3.1) that have the fading memory property and that separate points, are able to approximate any fading memory filter. Two important consequences of this result are that the entire family of fading memory RCs itself is universal, as well as the one containing all the linear reservoirs with polynomial readouts, when certain spectral restrictions are imposed on the reservoir matrices (see below for details). In the second result, we restrict ourselves to reservoir computers with linear readouts and introduce the non-homogeneous state-affine family in order to be able to obtain a similar universality statement. The linearity restriction on the readouts makes this universality statement closer to the type of RCs used in applications and to the standard notion of reservoir system that one commonly finds in the literature [Luko 09].

The first result can be seen as a discrete-time version of the one in [Boyd 85] for continuous-time filters, while the second one is an extension to infinite time intervals of the main approximation result in [Flie 80], which was originally formulated for compact time intervals using homogeneous state-affine systems.

3.1 Universality for fading memory RCs with non-linear readouts

The following statement is a direct consequence of the compactness result in Lemma 2.4 and the Stone-Weierstrass theorem, as formulated in Theorem 7.3.1 in [Dieu 69]. See Appendix 6.4 for a detailed proof.

All along this subsection, we work with reservoir filters with uniformly bounded inputs in a set $K_{M}\subset(\mathbb{R}^{n})^{\mathbb{Z}_{-}}$, as defined in (2.4). These filters are generated by reservoir systems $F:D_{N}\times\overline{B_{n}({\bf 0},M)}\longrightarrow D_{N}$ and $h:D_{N}\rightarrow\mathbb{R}$, for some $n,N\in\mathbb{N}$, $M>0$, and $D_{N}\subset\mathbb{R}^{N}$.

Theorem 3.1

Let $K_{M}\subset(\mathbb{R}^{n})^{\mathbb{Z}_{-}}$ be a subset of the type defined in (2.4), $I$ an index set, and let

\[
\mathcal{R}:=\{H_{h_{i}}^{F_{i}}:K_{M}\longrightarrow\mathbb{R}\mid h_{i}\in C^{\infty}(D_{N_{i}}),F_{i}:D_{N_{i}}\times\overline{B_{n}({\bf 0},M)}\longrightarrow D_{N_{i}},i\in I,N_{i}\in\mathbb{N}\} \tag{3.2}
\]

be a set of reservoir filters defined on $K_{M}$ that have the FMP with respect to a given weighted norm $\|\cdot\|_{w}$. Let $\mathcal{A}(\mathcal{R})$ be the polynomial algebra generated by $\mathcal{R}$, that is, the set formed by finite products and linear combinations of elements in $\mathcal{R}$ according to the operations defined in (3.1). If the algebra $\mathcal{A}(\mathcal{R})$ contains the constant functionals and separates the points in $K_{M}$, then any causal, time-invariant fading memory filter $H:K_{M}\longrightarrow\mathbb{R}$ can be uniformly approximated by elements in $\mathcal{A}(\mathcal{R})$, that is, $\mathcal{A}(\mathcal{R})$ is dense in the set $(C^{0}(K_{M}),\|\cdot\|_{w})$ of real-valued continuous functions on $(K_{M},\|\cdot\|_{w})$. More explicitly, this implies that for any fading memory filter $H$ and any $\epsilon>0$, there exist a finite set of indices $\{i_{1},\ldots,i_{r}\}\subset I$ and a polynomial $p:\mathbb{R}^{r}\longrightarrow\mathbb{R}$ such that

\[
\|H-H_{h}^{F}\|_{\infty}:=\sup_{{\bf z}\in K_{M}}\{|H({\bf z})-H_{h}^{F}({\bf z})|\}<\epsilon\quad\mbox{with}\quad h:=p(h_{i_{1}},\ldots,h_{i_{r}})\quad\mbox{and}\quad F:=(F_{i_{1}},\ldots,F_{i_{r}}).
\]

An important fact is that the polynomial algebra $\mathcal{A}(\mathcal{R})$ generated by a set formed by fading memory reservoir filters is made of fading memory reservoir filters. Indeed, let $h_{i}\in C^{\infty}(D_{N_{i}})$, $F_{i}:D_{N_{i}}\times\overline{B_{n}({\bf 0},M)}\longrightarrow D_{N_{i}}$, $i\in\{1,2\}$, and $\lambda\in\mathbb{R}$. Then, the product $H_{h_{1}}^{F_{1}}\cdot H_{h_{2}}^{F_{2}}$ and the linear combination $H_{h_{1}}^{F_{1}}+\lambda H_{h_{2}}^{F_{2}}$ filters, as they were defined in (3.1), are such that

\begin{align}
H_{h_{1}}^{F_{1}}\cdot H_{h_{2}}^{F_{2}} &= H_{h}^{F},\quad\mbox{with}\quad h:=h_{1}\cdot h_{2}\in C^{\infty}(D_{N_{1}}\times D_{N_{2}}), \tag{3.3}\\
H_{h_{1}}^{F_{1}}+\lambda H_{h_{2}}^{F_{2}} &= H_{h^{\prime}}^{F},\quad\mbox{with}\quad h^{\prime}:=h_{1}+\lambda h_{2}\in C^{\infty}(D_{N_{1}}\times D_{N_{2}}), \tag{3.4}
\end{align}

and where $F:(D_{N_{1}}\times D_{N_{2}})\times\overline{B_{n}({\bf 0},M)}\longrightarrow(D_{N_{1}}\times D_{N_{2}})$ is given by

\[
F(((\mathbf{x}_{1})_{t},(\mathbf{x}_{2})_{t}),{\bf z}_{t}):=\left(F_{1}((\mathbf{x}_{1})_{t},{\bf z}_{t}),F_{2}((\mathbf{x}_{2})_{t},{\bf z}_{t})\right), \tag{3.5}
\]

for any $((\mathbf{x}_{1})_{t},(\mathbf{x}_{2})_{t})\in D_{N_{1}}\times D_{N_{2}}$, ${\bf z}_{t}\in\overline{B_{n}({\bf 0},M)}$, and $t\in\mathbb{Z}_{-}$. We emphasize that the functionals $H_{h}^{F}$ and $H_{h^{\prime}}^{F}$ in (3.3) and (3.4) are well defined because if the reservoir maps $F_{1}$ and $F_{2}$ satisfy the echo state property then so does $F$. Indeed, if $\mathbf{x}_{1}\in\left(D_{N_{1}}\right)^{\mathbb{Z}}$ and $\mathbf{x}_{2}\in\left(D_{N_{2}}\right)^{\mathbb{Z}}$ are the solutions of the reservoir equation (1.1) for $F_{1}$ and $F_{2}$ associated to the input ${\bf z}\in K_{M}$, then so is $(\mathbf{x}_{1},\mathbf{x}_{2})\in\left(D_{N_{1}}\times D_{N_{2}}\right)^{\mathbb{Z}}$, defined by $(\mathbf{x}_{1},\mathbf{x}_{2})_{t}:=((\mathbf{x}_{1})_{t},(\mathbf{x}_{2})_{t})$, for $F$ in (3.5).
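The construction in (3.3)-(3.5) can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the reservoir maps `F1`, `F2` and the readouts `h1`, `h2` are arbitrary illustrative choices, the input is scalar, and the helper name `run_reservoir` is hypothetical. It runs two reservoirs separately, runs the stacked reservoir of (3.5), and compares the readouts.

```python
import numpy as np

def run_reservoir(F, x0, inputs):
    """Iterate x_t = F(x_{t-1}, z_t) over the inputs and return the final state."""
    x = x0
    for z in inputs:
        x = F(x, z)
    return x

# Two illustrative reservoir maps and smooth readouts (assumptions, not from the paper).
F1 = lambda x, z: 0.5 * np.tanh(x) + z
F2 = lambda x, z: 0.3 * x + z ** 2
h1 = lambda x: float(np.sum(x))
h2 = lambda x: float(np.sum(x ** 2))

def F_stacked(x12, z):
    """Reservoir map (3.5): both reservoirs are driven by the same input."""
    x1, x2 = x12
    return (F1(x1, z), F2(x2, z))

h_prod = lambda x12: h1(x12[0]) * h2(x12[1])               # readout in (3.3)
h_sum = lambda x12, lam: h1(x12[0]) + lam * h2(x12[1])     # readout in (3.4)

rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=200)

x1 = run_reservoir(F1, np.zeros(1), z)
x2 = run_reservoir(F2, np.zeros(1), z)
x12 = run_reservoir(F_stacked, (np.zeros(1), np.zeros(1)), z)

lam = 2.5
prod_separate = h1(x1) * h2(x2)
sum_separate = h1(x1) + lam * h2(x2)
```

In this sketch `h_prod(x12)` and `h_sum(x12, lam)` coincide with `prod_separate` and `sum_separate`, which is the finite-history counterpart of the identities (3.3) and (3.4).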

This observation has as a consequence that the set formed by all the RC systems that have the echo state property and the FMP with respect to a given weighted norm $\|\cdot\|_{w}$ forms a polynomial algebra that contains the constant functions (they can be obtained by using as readouts constant elements in $C^{\infty}(D_{N_{i}})$) and separates points (take the trivial reservoir map $F(\mathbf{x},{\bf z})={\bf z}$ and use the separation property of $C^{\infty}(D_{N_{i}})$ together with time-invariance). This remark and Theorem 3.1 yield the following corollary.

Corollary 3.2

Let $K_{M}\subset(\mathbb{R}^{n})^{\mathbb{Z}_{-}}$ be a subset as defined in (2.4) and let

\[
\mathcal{R}_{w}:=\{H_{h}^{F}:K_{M}\longrightarrow\mathbb{R}\mid h\in C^{\infty}(D_{N}),F:D_{N}\times\overline{B_{n}({\bf 0},M)}\longrightarrow D_{N},N\in\mathbb{N}\} \tag{3.6}
\]

be the set of all reservoir filters with uniformly bounded inputs in $K_{M}$ and that have the FMP with respect to a given weighted norm $\|\cdot\|_{w}$. Then $\mathcal{R}_{w}$ is universal, that is, it is dense in the set $(C^{0}(K_{M}),\|\cdot\|_{w})$ of real-valued continuous functions on $(K_{M},\|\cdot\|_{w})$.

Remark 3.3

The stability of reservoir filters under products and linear combinations in (3.3)-(3.4) is a feature that allows us, in Corollary 3.2 and in some of the results that follow later on, to identify families of reservoir filters that are able to approximate any fading memory filter. This fact is a requirement for the application of the Stone-Weierstrass theorem but does not mean that we have to carry those operations out in the construction of approximating filters, which would indeed be difficult to implement in specific applications.

According to the previous corollary, reservoir filters that have the FMP are able to approximate any time-invariant fading memory filter. We now show that actually a much smaller family of reservoirs suffices to do that, namely, certain classes of linear reservoirs with polynomial readouts. Consider the RC system determined by the expressions

\begin{align}
{\bf x}_{t} &=A\mathbf{x}_{t-1}+{\bf c}{\bf z}_{t},\quad A\in\mathbb{M}_{N},\ {\bf c}\in\mathbb{M}_{N,n}, \tag{3.7}\\
y_{t} &=h(\mathbf{x}_{t}),\quad h\in\mathbb{R}[\mathbf{x}]. \tag{3.8}
\end{align}

If this system has a reservoir filter associated (the next result provides a sufficient condition for this to happen) we denote by $H^{A,{\bf c}}_{h}:K_{M}\longrightarrow\mathbb{R}$ the associated functional and we refer to it as the linear reservoir functional determined by $A$, ${\bf c}$, and the polynomial $h$. These filters exhibit the following universality property that is proved in Appendix 6.5.

Corollary 3.4

Let $K_{M}\subset(\mathbb{R}^{n})^{\mathbb{Z}_{-}}$ be a subset of the type defined in (2.4) and let $0<\epsilon<1$. Consider the set $\mathcal{L}_{\epsilon}$ formed by all the linear reservoir systems as in (3.7)-(3.8) determined by matrices $A\in\mathbb{M}_{N}$ such that $\sigma_{{\rm max}}(A)<1-\epsilon$. Then, the elements in $\mathcal{L}_{\epsilon}$ generate $\lambda_{\rho}$-exponential fading memory reservoir functionals, with $\lambda_{\rho}:=(1-\epsilon)^{\rho}$, for any $\rho\in(0,1)$. Equivalently, $\mathcal{L}_{\epsilon}\subset\mathcal{R}_{w^{\rho}}$, with $w^{\rho}_{t}:=\lambda_{\rho}^{t}$, and $\mathcal{R}_{w^{\rho}}$ as in (3.6). These functionals can be explicitly written as:

\[
H^{A,{\bf c}}_{h}({\bf z})=h\left(\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{-i}\right),\quad\mbox{for any}\quad{\bf z}\in K_{M}. \tag{3.9}
\]

This family is dense in $(C^{0}(K_{M}),\|\cdot\|_{w^{\rho}})$.

The same universality result can be stated for the following two smaller subfamilies of ϵ\mathcal{L}_{\epsilon}:

(i)

The family $\mathcal{DL}_{\epsilon}\subset\mathcal{L}_{\epsilon}$ formed by the linear reservoir systems in $\mathcal{L}_{\epsilon}$ determined by diagonal matrices $A\in\mathbb{D}$ such that $\sigma_{{\rm max}}(A)<1-\epsilon$.

(ii)

The family $\mathcal{NL}\subset\mathcal{L}_{\epsilon}$ formed by the linear reservoir systems determined by nilpotent matrices $A\in\mathbb{N}{\rm il}$.

Remark 3.5

The elements in the family $\mathcal{NL}$ belong automatically to $\mathcal{L}_{\epsilon}$ because the eigenvalues of a nilpotent matrix are always zero. This implies that if a linear reservoir system is determined by a nilpotent matrix $A\in\mathbb{N}{\rm il}_{N}^{k}$ of index $k\leq N$, then the reservoir functional $H^{A,{\bf c}}_{h}$ is automatically well-defined and given by a finite version of (3.9), that is,

\[
H^{A,{\bf c}}_{h}({\bf z})=h\left(\sum_{i=0}^{k-1}A^{i}{\bf c}{\bf z}_{-i}\right),\quad\mbox{for any}\quad{\bf z}\in K_{M}. \tag{3.10}
\]
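The explicit expressions (3.9) and (3.10) can be checked against the recursion (3.7) on a finite input history. The sketch below is only illustrative: it assumes scalar inputs ($n=1$), a zero initial state, an arbitrary polynomial readout `h`, and hypothetical helper names `series_state` and `recursion_state`.

```python
import numpy as np

def series_state(A, c, z_past, n_terms):
    """Partial sum sum_{i=0}^{n_terms-1} A^i c z_{-i}, where z_past[i] holds z_{-i}."""
    x = np.zeros(A.shape[0])
    Ai_c = c.astype(float)
    for i in range(n_terms):
        x = x + Ai_c * z_past[i]
        Ai_c = A @ Ai_c
    return x

def recursion_state(A, c, z_past):
    """Iterate x_t = A x_{t-1} + c z_t from the zero state, oldest input first."""
    x = np.zeros(A.shape[0])
    for z in z_past[::-1]:
        x = A @ x + c * z
    return x

h = lambda x: x[0] ** 2 + 3.0 * x[1] - 1.0    # an arbitrary polynomial readout

rng = np.random.default_rng(1)
N, T = 4, 300
z_past = rng.uniform(-1.0, 1.0, size=T)       # z_past[i] plays the role of z_{-i}
c = rng.standard_normal(N)

# Contracting case: sigma_max(A) = 0.7 < 1 - eps with eps = 0.2.
G = rng.standard_normal((N, N))
A = 0.7 * G / np.linalg.norm(G, 2)
y_series = h(series_state(A, c, z_past, T))
y_recursion = h(recursion_state(A, c, z_past))

# Nilpotent case: the shift matrix satisfies A_nil^N = 0, so N terms suffice.
A_nil = np.diag(np.ones(N - 1), k=-1)
y_nil_finite = h(series_state(A_nil, c, z_past, N))
y_nil_full = h(recursion_state(A_nil, c, z_past))
```

For the nilpotent shift matrix the readout depends only on the last $N$ inputs, which is the finite-memory behaviour described by (3.10).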

3.2 State-affine systems and universality for fading memory RCs with linear readouts

As explained in the introduction, the standard notion of reservoir computing that one finds in the literature concerns architectures with linear readouts. It is particularly convenient to work with RCs that have this feature in machine learning applications, since in that case the training reduces to solving a linear regression problem. That makes training feasible when a large number of neurons is needed, as happens in many cases. This point makes relevant the identification of families of reservoirs that are universal when the readout is restricted to be linear, which is the subject of this subsection. In order to simplify the presentation, we restrict ourselves in this case to one-dimensional input signals, that is, all along this section we set $n=1$. The extension to the general case is straightforward, even though more convoluted to write down (see Remark 3.15).
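The reduction of training to linear regression can be sketched as follows. This is a hedged illustration under stated assumptions: the reservoir is a hypothetical contracting linear one, the teacher signal is synthetic, and the helper names `collect_states` and `train_linear_readout` are not from the paper. Only the readout vector is fitted; the reservoir itself is left untouched.

```python
import numpy as np

def collect_states(A, c, inputs):
    """Run the illustrative reservoir x_t = A x_{t-1} + c z_t from x_0 = 0
    and stack the visited states row by row."""
    x = np.zeros(A.shape[0])
    states = []
    for z in inputs:
        x = A @ x + c * z
        states.append(x)
    return np.array(states)

def train_linear_readout(states, targets):
    """Least-squares fit of W in y_t ~ W^T x_t (ordinary linear regression)."""
    W, *_ = np.linalg.lstsq(states, targets, rcond=None)
    return W

rng = np.random.default_rng(0)
N = 5
G = rng.standard_normal((N, N))
A = 0.6 * G / np.linalg.norm(G, 2)       # spectral norm 0.6 < 1
c = rng.standard_normal(N)
z = rng.uniform(-1.0, 1.0, size=500)

states = collect_states(A, c, z)
W_true = rng.standard_normal(N)           # synthetic teacher readout (assumption)
targets = states @ W_true
W_hat = train_linear_readout(states, targets)
```

Because the fit is linear in the readout vector, its cost is governed by a single least-squares solve, whatever the reservoir dynamics are.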

Definition 3.6

Let $N\in\mathbb{N}$, ${\bf W}\in\mathbb{R}^{N}$, and let $p(z)\in\mathbb{M}_{N}[z]$ and $q(z)\in\mathbb{M}_{N,1}[z]$ be two polynomials on the variable $z$ with matrix coefficients, as they were introduced in (2.1). The non-homogeneous state-affine system (SAS) associated to $p$, $q$ and ${\bf W}$ is the reservoir system determined by the state-space transformation:

\begin{align}
\mathbf{x}_{t} &=p(z_{t})\mathbf{x}_{t-1}+q(z_{t}), \tag{3.11}\\
y_{t} &={\bf W}^{\top}\mathbf{x}_{t}. \tag{3.12}
\end{align}

Our next result spells out a sufficient condition that guarantees that the SAS reservoir system (3.11)-(3.12) has the echo state property. Moreover, it provides an explicit expression for the unique causal and time-invariant solution associated to a given input.

Proposition 3.7

Consider a non-homogeneous state-affine system as in (3.11)-(3.12) determined by polynomials $p$, $q$, and a vector ${\bf W}$, with inputs defined on $I^{\mathbb{Z}}$, $I:=[-1,1]$. Assume that

\[
K_{1}:=\max_{z\in I}\|p(z)\|_{2}=\max_{z\in I}\sigma_{{\rm max}}(p(z))<1. \tag{3.13}
\]

Then, the reservoir system (3.11)-(3.12) has the echo state property and for each input ${\bf z}\in I^{\mathbb{Z}}$ it has a unique causal and time-invariant solution given by

\begin{align}
\mathbf{x}_{t} &=\sum_{j=0}^{\infty}\left(\prod_{k=0}^{j-1}p(z_{t-k})\right)q(z_{t-j}), \tag{3.14}\\
y_{t} &={\bf W}^{\top}\mathbf{x}_{t}, \tag{3.15}
\end{align}

where

\[
\prod_{k=0}^{j-1}p(z_{t-k}):=p(z_{t})\cdot p(z_{t-1})\cdots p(z_{t-j+1}).
\]

Let now $K_{2}:=\max_{z\in I}\|q(z)\|_{2}$. Then,

\[
\left\|\mathbf{x}_{t}\right\|\leq\frac{K_{2}}{1-K_{1}},\quad\mbox{for all}\quad t\in\mathbb{Z}. \tag{3.16}
\]

We will denote by $U_{{\bf W}}^{p,q}:I^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}}$ and $H_{{\bf W}}^{p,q}:I^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ the corresponding SAS reservoir filter and SAS functional, respectively.
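The SAS recursion (3.11)-(3.12), the forgetting of initial conditions behind the echo state property, and the bound (3.16) can be explored with a short simulation. This is a minimal sketch under stated assumptions: the coefficients are random, the helper names `poly_eval` and `sas_run` are hypothetical, and the constants $K_{1}$ and $K_{2}$ are replaced by the cruder upper bounds given by the sums of the coefficient norms, which are valid because $|z|\leq 1$ on $I$.

```python
import numpy as np

def poly_eval(coeffs, z):
    """Evaluate sum_i z**i * coeffs[i] for matrix- or vector-valued coefficients."""
    return sum((z ** i) * C for i, C in enumerate(coeffs))

def sas_run(p_coeffs, q_coeffs, W, inputs, x0):
    """Iterate x_t = p(z_t) x_{t-1} + q(z_t), y_t = W^T x_t; return states and outputs."""
    x = np.asarray(x0, dtype=float)
    states, outputs = [], []
    for z in inputs:
        x = poly_eval(p_coeffs, z) @ x + poly_eval(q_coeffs, z)
        states.append(x)
        outputs.append(W @ x)
    return np.array(states), np.array(outputs)

rng = np.random.default_rng(2)
N = 3
# Each A_i has spectral norm 0.3, so ||p(z)||_2 <= 0.9 on I = [-1, 1].
p_coeffs = [0.3 * G / np.linalg.norm(G, 2) for G in rng.standard_normal((3, N, N))]
q_coeffs = [rng.standard_normal(N) for _ in range(2)]
W = rng.standard_normal(N)
z = rng.uniform(-1.0, 1.0, size=400)

K1_upper = sum(np.linalg.norm(A, 2) for A in p_coeffs)   # upper bound for K_1
K2_upper = sum(np.linalg.norm(b) for b in q_coeffs)      # upper bound for K_2

states_a, y_a = sas_run(p_coeffs, q_coeffs, W, z, np.zeros(N))
states_b, y_b = sas_run(p_coeffs, q_coeffs, W, z, 10.0 * np.ones(N))

gap = np.linalg.norm(states_a[-1] - states_b[-1])        # distance after 400 steps
max_norm = np.linalg.norm(states_a, axis=1).max()        # to compare with (3.16)
```

In this sketch the two trajectories become numerically indistinguishable, and the states started at zero stay below `K2_upper / (1 - K1_upper)`, in line with (3.16).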

The next result presents two alternative conditions that imply the hypothesis $\max_{z\in I}\|p(z)\|_{2}<1$ in the previous proposition and that are easier to verify in practice.

Lemma 3.8

Let $p(z)\in\mathbb{M}_{N}[z]$ be the polynomial given by

\[
p(z):=A_{0}+zA_{1}+z^{2}A_{2}+\cdots+z^{n_{1}}A_{n_{1}},\quad{n_{1}}\in\mathbb{N}.
\]

Suppose that $z\in I$ and consider the following three conditions:

(i)

There exists a constant $0<\lambda<1$, such that $\|A_{i}\|_{2}=\sigma_{{\rm max}}(A_{i})<\lambda$, for any $i\in\{0,1,\ldots,{n_{1}}\}$, and that at the same time satisfies that $\lambda({n_{1}}+1)<1$.

(ii)

$B_{p}:=\|A_{0}\|_{2}+\|A_{1}\|_{2}+\cdots+\|A_{n_{1}}\|_{2}<1$.

(iii)

$M_{p}:=\max_{z\in I}\|p(z)\|_{2}<1$.

Then, condition (i) implies (ii) and condition (ii) implies (iii).
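The three conditions of the lemma can be evaluated numerically for a given polynomial. In the sketch below, which is only illustrative, the function name `lemma_3_8_conditions` is hypothetical, the coefficients are random, and $M_{p}$ is only estimated from below on a finite grid of $I=[-1,1]$, so the grid value is an approximation rather than the exact maximum.

```python
import numpy as np

def lemma_3_8_conditions(coeffs, grid_size=2001):
    """Return (cond_i, B_p, M_p_grid) for p(z) = sum_i z**i * coeffs[i]."""
    norms = [np.linalg.norm(A, 2) for A in coeffs]
    n1 = len(coeffs) - 1
    # (i) holds iff some lambda fits between max_i ||A_i||_2 and 1 / (n1 + 1).
    cond_i = max(norms) * (n1 + 1) < 1
    # (ii) compares the sum of the spectral norms with one.
    B_p = sum(norms)
    # (iii) is estimated on a grid; the true maximum can only be larger.
    grid = np.linspace(-1.0, 1.0, grid_size)
    M_p_grid = max(
        np.linalg.norm(sum((z ** i) * A for i, A in enumerate(coeffs)), 2)
        for z in grid
    )
    return cond_i, B_p, M_p_grid

rng = np.random.default_rng(3)
# Three 4x4 coefficients, each rescaled to spectral norm 0.2.
coeffs = [0.2 * G / np.linalg.norm(G, 2) for G in rng.standard_normal((3, 4, 4))]
cond_i, B_p, M_p_grid = lemma_3_8_conditions(coeffs)
```

Here condition (i) holds, so `B_p < 1` and the grid estimate of $M_{p}$ does not exceed `B_p`, which mirrors the chain of implications (i) implies (ii) implies (iii).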

We emphasize that, since Proposition 3.7 was proved using condition (iii) in the previous lemma, any of the three conditions in that statement implies the echo state property for (3.14)-(3.15) and the time-invariance of the corresponding solutions. The next result shows that the same holds for the fading memory property.

Proposition 3.9

Consider a non-homogeneous state-affine system as in (3.11)-(3.12) determined by polynomials $p$, $q$, and a vector ${\bf W}$, with inputs defined on $I^{\mathbb{Z}}$, $I:=[-1,1]$. If the polynomial $p$ satisfies any of the three conditions in Lemma 3.8 then the corresponding reservoir filter has the fading memory property. More specifically, if $p$ satisfies condition (i) in Lemma 3.8, then $H_{{\bf W}}^{p,q}:(I^{\mathbb{Z}_{-}},\|\cdot\|_{w^{\rho}})\longrightarrow\mathbb{R}$ is continuous with $w^{\rho}_{t}:=(n_{1}+1)^{\rho t}\lambda^{\rho t}$ and $\rho\in(0,1)$ arbitrary. The same conclusion holds for conditions (ii) and (iii) with $w^{\rho}_{t}:=B_{p}^{\rho t}$ and $w^{\rho}_{t}:=M_{p}^{\rho t}$, respectively.

The importance of SAS in relation to the universality problem has to do with the fact that they form a polynomial algebra which allows us, under certain conditions, to use the Stone-Weierstrass theorem to prove a density statement. Before we show that, we observe that for any two polynomials $p_{1}(z)\in\mathbb{M}_{N_{1},M_{1}}[z]$ and $p_{2}(z)\in\mathbb{M}_{N_{2},M_{2}}[z]$ given by

\begin{align}
p_{1}(z) &:=A_{0}^{1}+zA_{1}^{1}+z^{2}A_{2}^{1}+\cdots+z^{n_{1}}A_{n_{1}}^{1}, \tag{3.17}\\
p_{2}(z) &:=A_{0}^{2}+zA_{1}^{2}+z^{2}A_{2}^{2}+\cdots+z^{n_{2}}A_{n_{2}}^{2}, \tag{3.18}
\end{align}

with $n_{1},n_{2}\in\mathbb{N}$, their direct sum and their tensor product are also polynomials in $z$ with matrix coefficients. More explicitly, $p_{1}\oplus p_{2}(z)\in\mathbb{M}_{N_{1}+N_{2},M_{1}+M_{2}}[z]$ and is written as

\[
p_{1}\oplus p_{2}(z)=A_{0}^{1}\oplus A_{0}^{2}+zA_{1}^{1}\oplus A_{1}^{2}+z^{2}A_{2}^{1}\oplus A_{2}^{2}+\cdots+z^{n_{2}}A_{n_{2}}^{1}\oplus A_{n_{2}}^{2}+z^{n_{2}+1}A_{n_{2}+1}^{1}\oplus{\bf 0}+\cdots+z^{n_{1}}A_{n_{1}}^{1}\oplus{\bf 0}, \tag{3.19}
\]

where we assumed that $n_{2}\leq n_{1}$. Analogously, their tensor product $p_{1}\otimes p_{2}(z)\in\mathbb{M}_{N_{1}\cdot N_{2},M_{1}\cdot M_{2}}[z]$ and is written as

\[
p_{1}\otimes p_{2}(z)=\sum_{i=0}^{n_{1}}\sum_{j=0}^{n_{2}}z^{i+j}A_{i}^{1}\otimes A_{j}^{2}. \tag{3.20}
\]

The next result shows that the products and the linear combinations of SAS reservoir functionals are SAS reservoir functionals. Additionally, it makes explicit the polynomials that determine the corresponding SAS reservoir systems.

Proposition 3.10

Let $H_{{\bf W}_{1}}^{p_{1},q_{1}},H_{{\bf W}_{2}}^{p_{2},q_{2}}:I^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ be two SAS reservoir functionals associated to two corresponding time-invariant SAS reservoir systems. Assume that the two polynomials with matrix coefficients $p_{1}(z)\in\mathbb{M}_{N_{1}}[z]$ and $p_{2}(z)\in\mathbb{M}_{N_{2}}[z]$ satisfy that $\|p_{1}(z)\|_{2}<1-\epsilon$ and $\|p_{2}(z)\|_{2}<1-\epsilon$ for all $z\in I:=[-1,1]$ and a given $0<\epsilon<1$. Then, with the notation introduced in the expressions (3.19) and (3.20), we have that:

(i)

For any $\lambda\in\mathbb{R}$, the linear combination of the SAS reservoir functionals $H_{{\bf W}_{1}}^{p_{1},q_{1}}+\lambda H_{{\bf W}_{2}}^{p_{2},q_{2}}$ is a SAS reservoir functional and:

\[
H_{{\bf W}_{1}}^{p_{1},q_{1}}+\lambda H_{{\bf W}_{2}}^{p_{2},q_{2}}=H_{{\bf W}_{1}\oplus\lambda{\bf W}_{2}}^{p_{1}\oplus p_{2},q_{1}\oplus q_{2}}. \tag{3.21}
\]

(ii)

The product of the SAS reservoir functionals $H_{{\bf W}_{1}}^{p_{1},q_{1}}\cdot H_{{\bf W}_{2}}^{p_{2},q_{2}}$ is a SAS reservoir functional and:

\[
H_{{\bf W}_{1}}^{p_{1},q_{1}}\cdot H_{{\bf W}_{2}}^{p_{2},q_{2}}=H_{{\bf 0}\oplus{\bf 0}\oplus\left({\bf W}_{1}\otimes{\bf W}_{2}\right)}^{p,q_{1}\oplus q_{2}\oplus\left(q_{1}\otimes q_{2}\right)}, \tag{3.22}
\]

where $p(z)\in\mathbb{M}_{N_{12}}[z]$, $N_{12}:=N_{1}+N_{2}+N_{1}\cdot N_{2}$, is the polynomial with matrix coefficients in $\mathbb{M}_{N_{12}}$ whose block-matrix expression for the three summands in $\mathbb{R}^{N_{1}}\oplus\mathbb{R}^{N_{2}}\oplus\left(\mathbb{R}^{N_{1}}\otimes\mathbb{R}^{N_{2}}\right)$ is:

\[
p(z):=\left(\begin{array}{ccc}p_{1}(z)&{\bf 0}&{\bf 0}\\ {\bf 0}&p_{2}(z)&{\bf 0}\\ p_{1}\otimes q_{2}(z)&q_{1}\otimes p_{2}(z)&p_{1}\otimes p_{2}(z)\end{array}\right). \tag{3.23}
\]

The expression $p_{1}\otimes p_{2}(z)\in\mathbb{M}_{N_{1}\cdot N_{2}}[z]$ denotes the element defined in (3.20). The symbol $p_{1}\otimes q_{2}(z)$ (respectively, $q_{1}\otimes p_{2}(z)$) denotes the matrix of the linear map from $\mathbb{R}^{N_{1}}$ (respectively, $\mathbb{R}^{N_{2}}$) to $\mathbb{R}^{N_{1}}\otimes\mathbb{R}^{N_{2}}$ that associates to any ${\bf v}_{1}\in\mathbb{R}^{N_{1}}$ the element $(p_{1}(z){\bf v}_{1})\otimes q_{2}(z)$ (respectively, $q_{1}(z)\otimes(p_{2}(z){\bf v}_{2})$, with ${\bf v}_{2}\in\mathbb{R}^{N_{2}}$). When all the polynomials in (3.23) are written in terms of monomials using the conventions that we just mentioned and we factor out the different powers of the variable $z$, we obtain a polynomial with matrix coefficients in $\mathbb{M}_{N_{12}}$ and with degree ${\rm deg}(p)$ equal to

\[
{\rm deg}(p)=\max\left\{{\rm deg}(p_{1})+{\rm deg}(q_{2}),{\rm deg}(q_{1})+{\rm deg}(p_{2}),{\rm deg}(p_{1})+{\rm deg}(p_{2})\right\},
\]

since, by (3.20), the degrees of the factors add up under the tensor product.

The equalities (3.21) and (3.22) show that the SAS family forms a polynomial algebra.
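The product formula (3.22)-(3.23) can be verified numerically on a finite input sequence. The sketch below is an illustration under stated assumptions rather than the construction used in the proof: the SAS coefficients are random, the helper names `random_sas` and `product_blocks` are hypothetical, the block matrix (3.23) is assembled directly at each input value instead of being expanded into monomial coefficients, and all systems start from the zero state. Tensor products are represented with `np.kron`.

```python
import numpy as np

def poly_eval(coeffs, z):
    """Evaluate sum_i z**i * coeffs[i] for matrix- or vector-valued coefficients."""
    return sum((z ** i) * C for i, C in enumerate(coeffs))

def random_sas(N, rng, scale=0.2):
    """Hypothetical helper: quadratic p (spectral norms sum to 3*scale), linear q."""
    p = [scale * G / np.linalg.norm(G, 2) for G in rng.standard_normal((3, N, N))]
    q = [rng.standard_normal(N) for _ in range(2)]
    return p, q, rng.standard_normal(N)

def product_blocks(P1, q1, P2, q2):
    """Block matrix (3.23) and offset q1 (+) q2 (+) (q1 (x) q2) at one input value."""
    N1, N2 = len(q1), len(q2)
    top = np.hstack([P1, np.zeros((N1, N2 + N1 * N2))])
    mid = np.hstack([np.zeros((N2, N1)), P2, np.zeros((N2, N1 * N2))])
    bot = np.hstack([np.kron(P1, q2[:, None]),    # v1 -> (P1 v1) (x) q2
                     np.kron(q1[:, None], P2),    # v2 -> q1 (x) (P2 v2)
                     np.kron(P1, P2)])
    return np.vstack([top, mid, bot]), np.concatenate([q1, q2, np.kron(q1, q2)])

rng = np.random.default_rng(4)
(p1, q1, W1), (p2, q2, W2) = random_sas(2, rng), random_sas(3, rng)
W12 = np.concatenate([np.zeros(2), np.zeros(3), np.kron(W1, W2)])

x1, x2, x12 = np.zeros(2), np.zeros(3), np.zeros(2 + 3 + 6)
y_product, y_single = [], []
for z in rng.uniform(-1.0, 1.0, size=100):
    P1, b1 = poly_eval(p1, z), poly_eval(q1, z)
    P2, b2 = poly_eval(p2, z), poly_eval(q2, z)
    P12, b12 = product_blocks(P1, b1, P2, b2)
    x1, x2 = P1 @ x1 + b1, P2 @ x2 + b2           # the two original SAS
    x12 = P12 @ x12 + b12                         # the single product SAS
    y_product.append((W1 @ x1) * (W2 @ x2))
    y_single.append(W12 @ x12)
```

The third block of `x12` tracks $\mathbf{x}_{1}\otimes\mathbf{x}_{2}$, so the linear readout `W12` of the larger system reproduces the product of the two original outputs step by step.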

Remark 3.11

Notice that the linear reservoir equation (3.7) is a particular case of the SAS reservoir equation (3.11) that is obtained by taking for $p$ and $q$ polynomials of degree zero and one, respectively. Regarding that specific case, Proposition 3.10 shows that linear reservoirs with linear readouts do not form a polynomial algebra. Indeed, as can be seen in (3.22), the product of two SAS filters involves the tensor product $q_{1}\otimes q_{2}$ which, when $q_{1}$ and $q_{2}$ come from a linear filter, has degree two and is hence not compatible with a linear reservoir filter.

Theorem 3.12 (Universality of SAS reservoir computers)

Let $I^{\mathbb{Z}_{-}}\subset\mathbb{R}^{\mathbb{Z}_{-}}$ be the subset of real uniformly bounded sequences in $I:=[-1,1]$ as in (2.4), that is,

\[
I^{\mathbb{Z}_{-}}:=\{{\bf z}\in\mathbb{R}^{\mathbb{Z}_{-}}\mid z_{t}\in[-1,1],\ \mbox{for all}\quad t\leq 0\},
\]

and let $\mathcal{S}_{\epsilon}$ be the family of functionals $H_{{\bf W}}^{p,q}:I^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ induced by the state-affine systems defined in (3.11)-(3.12) that satisfy that $M_{p}:=\max_{z\in I}\|p(z)\|_{2}<1-\epsilon$ and $M_{q}:=\max_{z\in I}\|q(z)\|_{2}<1-\epsilon$. The family $\mathcal{S}_{\epsilon}$ forms a polynomial subalgebra of $\mathcal{R}_{w^{\rho}}$ (as defined in (3.6)) with $w^{\rho}_{t}:=(1-\epsilon)^{\rho t}$ and $\rho\in(0,1)$ arbitrary, made of fading memory reservoir filters that contains the constant functions and separates points. The subfamily $\mathcal{S}_{\epsilon}$ is hence dense in the set $(C^{0}(I^{\mathbb{Z}_{-}}),\|\cdot\|_{w^{\rho}})$ of real-valued continuous functions on $(I^{\mathbb{Z}_{-}},\|\cdot\|_{w^{\rho}})$.

This statement implies that any causal, time-invariant fading memory filter $H:I^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ can be uniformly approximated by elements in $\mathcal{S}_{\epsilon}$. More specifically, for any fading memory filter $H$ and any $\epsilon>0$, there exist a natural number $N\in\mathbb{N}$, polynomials $p(z)\in\mathbb{M}_{N}[z]$, $q(z)\in\mathbb{M}_{N,1}[z]$ with $M_{p},M_{q}<1-\epsilon$, and a vector $\mathbf{W}\in\mathbb{R}^{N}$ such that

\[
\|H-H_{\bf W}^{p,q}\|_{\infty}:=\sup_{{z}\in I^{\mathbb{Z}_{-}}}\{|H({z})-H_{\bf W}^{p,q}({z})|\}<\epsilon.
\]

The same universality result can be stated for the smaller subfamily $\mathcal{NS}_{\epsilon}\subset\mathcal{S}_{\epsilon}$ formed by SAS reservoir systems determined by nilpotent polynomials $p(z)\in\mathbb{N}{\rm il}[z]$.

Remark 3.13

As stated in Theorem 3.12, it is condition (iii) in Lemma 3.8 that yields a universal family of SAS fading memory reservoirs. As can be deduced from its proof (available in Appendix 6.10), the families determined by conditions (i) or (ii) in that lemma contain SAS fading memory reservoirs but do not form a polynomial algebra. In such cases, and according to Theorem 3.1, it is the algebras generated by them and not the families themselves that are universal.

Remark 3.14

A continuous-time analog of the universality result that we just proved can be obtained using the bilinear systems considered in Section 5.3 of [Boyd 85]. In discrete time, but only when the number of time steps is finite, this universal approximation property is exhibited [Flie 80] by homogeneous state-affine systems, that is, by setting $q(z)={\bf 0}$ in (3.11)-(3.12).

Remark 3.15

Generalization to multidimensional signals. When the input signal is defined in $I_{n}^{\mathbb{Z}}$, with $I_{n}:=[-1,1]^{n}$, a SAS family with the same universality properties can be defined by replacing the polynomials $p$ and $q$ in Definition 3.6 by polynomials of degree $r$ and $s$ of the form:

\begin{align*}
p({\bf z}) &= \sum_{\substack{i_{1},\ldots,i_{n}\in\left\{0,\ldots,r\right\}\\ i_{1}+\cdots+i_{n}\leq r}}z_{1}^{i_{1}}\cdots z_{n}^{i_{n}}A_{{i_{1},\ldots,i_{n}}},\quad A_{{i_{1},\ldots,i_{n}}}\in\mathbb{M}_{N},\quad{\bf z}\in I_{n},\\
q({\bf z}) &= \sum_{\substack{i_{1},\ldots,i_{n}\in\left\{0,\ldots,s\right\}\\ i_{1}+\cdots+i_{n}\leq s}}z_{1}^{i_{1}}\cdots z_{n}^{i_{n}}B_{{i_{1},\ldots,i_{n}}},\quad B_{{i_{1},\ldots,i_{n}}}\in\mathbb{M}_{N,1},\quad{\bf z}\in I_{n}.
\end{align*}

Remark 3.16

SAS with trigonometric polynomials. An analogous construction can be carried out using trigonometric polynomials of the type:

\begin{align*}
p({\bf z}) &= \sum_{\substack{i_{1},\ldots,i_{n}\in\left\{0,\ldots,r\right\}\\ i_{1}+\cdots+i_{n}\leq r}}\cos\left(i_{1}\cdot z_{1}+\cdots+i_{n}\cdot z_{n}\right)A_{{i_{1},\ldots,i_{n}}},\quad A_{{i_{1},\ldots,i_{n}}}\in\mathbb{M}_{N},\quad{\bf z}\in I_{n},\\
q({\bf z}) &= \sum_{\substack{i_{1},\ldots,i_{n}\in\left\{0,\ldots,s\right\}\\ i_{1}+\cdots+i_{n}\leq s}}\cos\left(i_{1}\cdot z_{1}+\cdots+i_{n}\cdot z_{n}\right)B_{{i_{1},\ldots,i_{n}}},\quad B_{{i_{1},\ldots,i_{n}}}\in\mathbb{M}_{N,1},\quad{\bf z}\in I_{n}.
\end{align*}

In this case, it is easy to establish that the resulting SAS family forms a polynomial algebra by invoking Proposition 3.10 and by reformulating the expressions (3.19) and (3.20) using the trigonometric identity

\[
\cos(\theta)\cos(\phi)=\frac{1}{2}\left(\cos(\theta-\phi)+\cos(\theta+\phi)\right).
\]

Additionally, the corresponding SAS family includes the linear family and hence the point separation property can be established as in the proof of Theorem 3.12 in the Appendix 6.10.

4 Reservoir universality results in the stochastic setup

This section extends the previously stated deterministic universality results to a setup in which the reservoir inputs and outputs are stochastic, that is, the reservoir is not driven anymore by infinite sequences but by discrete-time stochastic processes. We emphasize that we restrict our discussion to reservoirs that are deterministic and hence the only source of randomness in the systems considered is the stochastic nature of the input.

The results that follow are mainly based on the observation that if we adopt a uniform approximation criterion and we assume that the random inputs satisfy almost surely the uniform boundedness that we used as hypothesis in Section 3, then important features like the fading memory property or universality are naturally inherited in the stochastic setup from the deterministic case. This fact is what we call the deterministic-stochastic transfer principle and it is contained in the statement of Theorem 4.4 below. In particular, this result can be easily applied to show that all the universal families with deterministic inputs introduced in Section 3 are also universal in the stochastic setup when the input processes considered produce paths that, up to a set of measure zero, are uniformly bounded.

The stochastic setup.

All along this section we work on a probability space $(\Omega,\mathcal{A},\mathbb{P})$. If a condition defined on this probability space holds everywhere except for a set with zero measure, we will say that the relation is true almost surely. Let ${\bf X}:\Omega\longrightarrow B$ be a random variable with $(B,\left\|\cdot\right\|_{B})$ a normed space endowed with a $\sigma$-algebra (for example, but not necessarily, its Borel $\sigma$-algebra). Let

\[
\left\|{\bf X}\right\|_{L^{\infty}}:=\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\{\left\|{\bf X}(\omega)\right\|_{B}\}=\inf\left\{\rho\in\overline{\mathbb{R}^{+}}\mid\left\|{\bf X}\right\|_{B}\leq\rho\quad\mbox{almost surely}\right\}. \tag{4.1}
\]

We denote by $L^{\infty}(\Omega,B)$ the classes of $B$-valued almost surely equal random variables whose norms have a finite essential supremum or that, equivalently, have almost surely bounded norms, that is,

\[
L^{\infty}(\Omega,B):=S_{B}/\sim_{B}, \tag{4.2}
\]

where

\[
S_{B}:=\left\{{\bf X}:\Omega\longrightarrow B\ \mbox{ random variable}\mid\left\|{\bf X}\right\|_{L^{\infty}}<\infty\right\}, \tag{4.3}
\]

and $\sim_{B}$ is the equivalence relation defined on $S_{B}$ as follows: two random variables ${\bf Y}$ and ${\bf Z}$ with finite $\left\|\cdot\right\|_{L^{\infty}}$ norm are $\sim_{B}$-equivalent if and only if $\mathbb{P}\left(\left\{\omega\in\Omega:{\bf Y}(\omega)\neq{\bf Z}(\omega)\right\}\right)=0$. As it is customary in the literature, we will not make a distinction in what follows between the elements in $S_{B}$ and the classes in the quotient $L^{\infty}(\Omega,B)$. Using this identification we recall, for example, that $L^{\infty}(\Omega,B)$ is a vector space with the operations

\[
({\bf X}+\lambda{\bf Y})(\omega):={\bf X}(\omega)+\lambda{\bf Y}(\omega) \tag{4.4}
\]

for any ${\bf X},{\bf Y}\in L^{\infty}(\Omega,B)$, $\lambda\in\mathbb{R}$, $\omega\in\Omega$. Moreover, $(L^{\infty}(\Omega,B),\left\|\cdot\right\|_{L^{\infty}})$ is a normed space. We emphasize that $L^{\infty}(\Omega,B)$ is in general not a Banach space (see [Ledo 91, pages 42 and 46]). It can be shown that whenever $B$ is finite dimensional or, more generally, a separable Banach space, then the space $L^{\infty}(\Omega,B)$ is also a Banach space [Pisi 16].

Given an element ${\bf X}\in L^{\infty}(\Omega,B)$, we denote by ${\rm E}\left[{\bf X}\right]$ its expectation. The following lemma collects some elementary results that will be needed later on and shows, in particular, that the expectation ${\rm E}\left[{\bf X}\right]$ as well as that of all the powers $\left\|{\bf X}\right\|_{B}^{k}$ of its norm are finite for all the elements ${\bf X}\in L^{\infty}(\Omega,B)$.

Lemma 4.1

Let ${\bf X}\in L^{\infty}(\Omega,B)$ and let $C\in\overline{\mathbb{R}^{+}}$. Then:

(i)

$\left\|{\bf X}\right\|_{B}\leq\left\|{\bf X}\right\|_{L^{\infty}}$ almost surely.

(ii)

$\left\|{\bf X}\right\|_{L^{\infty}}\leq C$ if and only if $\left\|{\bf X}\right\|_{B}\leq C$ almost surely.

(iii)

$\left\|{\bf X}\right\|_{B}\leq C$ almost surely if and only if ${\rm E}\left[\|{\bf X}\|_{B}^{k}\right]\leq C^{k}$ for any $k\in\mathbb{N}$.

(iv)

Let $B=\mathbb{R}^{n}$, then the components $X_{i}$ of ${\bf X}$, $i\in\left\{1,\ldots,n\right\}$, are such that ${\rm E}\left[X_{i}\right]\leq\left\|{\bf X}\right\|_{L^{\infty}}$.

The first point in this lemma explains why we will refer to the elements of L(Ω,B)L^{\infty}(\Omega,B) as almost surely bounded random variables.

Stochastic inputs and outputs. The filters that we will consider in this section have almost surely bounded stochastic processes as inputs and outputs. Recall that a discrete-time stochastic process is a map of the type:

𝐳:×Ωn(t,ω)𝐳t(ω),\begin{array}[]{cccc}{\bf z}:&\mathbb{Z}\times\Omega&\longrightarrow&\mathbb{R}^{n}\\ &(t,\omega)&\longmapsto&{\bf z}_{t}(\omega),\end{array} (4.5)

such that, for each tt\in\mathbb{Z}, the assignment 𝐳t:Ωn{\bf z}_{t}:\Omega\longrightarrow{\mathbb{R}}^{n} is a random variable. For each ωΩ\omega\in\Omega, we will denote by 𝐳(ω):={𝐳t(ω)nt}{\bf z}(\omega):=\{{\bf z}_{t}(\omega)\in\mathbb{R}^{n}\mid t\in\mathbb{Z}\} the realization or the sample path of the process 𝐳{\bf z}. The results that follow are presented for stochastic processes indexed by \mathbb{Z} but are equally valid for +\mathbb{Z}_{+} and \mathbb{Z}_{-}.

Recall that a map of the type (4.5) is an n\mathbb{R}^{n}-valued stochastic process if and only if the corresponding map 𝐳:Ω(n){\bf z}:\Omega\longrightarrow\left(\mathbb{R}^{n}\right)^{\mathbb{Z}} into path space (designated with the same symbol) is a random variable when in (n)\left(\mathbb{R}^{n}\right)^{\mathbb{Z}} we consider the product sigma algebra generated by cylinder sets [Come 06, Chapter 1]. Then, the space of n\mathbb{R}^{n}-valued stochastic processes can be made into a vector space with the same operations as in (4.4) and we can define in this space a norm L\|\cdot\|_{L^{\infty}} using the same prescription as in (4.1) by considering (n)\left(\mathbb{R}^{n}\right)^{\mathbb{Z}} as a normed space with the supremum norm \|\cdot\|_{\infty}, that is,

𝐳L:=esssupωΩ{𝐳(ω)}=esssupωΩ{supt{𝐳t(ω)}}.\left\|{\bf z}\right\|_{L^{\infty}}:=\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\{\left\|{\bf z}(\omega)\right\|_{\infty}\}=\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}. (4.6)

The following lemma provides an alternative characterization of the norm L\|\cdot\|_{L^{\infty}} that will be very useful in the proofs of the results that follow and in which the supremum and the essential supremum have been interchanged. The last statement contained in it complements part (ii) of Lemma 4.1 for processes.

Lemma 4.2

Let 𝐳:Ω(n){\bf z}:\Omega\longrightarrow\left({\mathbb{R}}^{n}\right)^{\mathbb{Z}} be a stochastic process. Then,

𝐳L:=esssupωΩ{supt{𝐳t(ω)}}=supt{esssupωΩ{𝐳t(ω)}}.\left\|{\bf z}\right\|_{L^{\infty}}:=\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}=\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}. (4.7)

Equivalently, using the notation in (4.1),

𝐳L:=supt{𝐳t(ω)}L=supt{𝐳tL}.\left\|{\bf z}\right\|_{L^{\infty}}:=\left\|\sup_{t\in\mathbb{Z}}\{\|{\bf z}_{t}(\omega)\|\}\right\|_{L^{\infty}}=\sup_{t\in\mathbb{Z}}\{\left\|{\bf z}_{t}\right\|_{L^{\infty}}\}. (4.8)

These equalities imply that for any non-negative real number C0C\geq 0, the process 𝐳{\bf z} satisfies that 𝐳LC\|{\bf z}\|_{L^{\infty}}\leq C if and only if 𝐳tLC\|{\bf z}_{t}\|_{L^{\infty}}\leq C for all tt\in\mathbb{Z} or, equivalently, if and only if supt{𝐳tL}C\sup_{t\in\mathbb{Z}}\{\|{\bf z}_{t}\|_{L^{\infty}}\}\leq C.
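The interchange of the supremum and the essential supremum in (4.7) can be checked by hand on a toy example. The following sketch assumes a hypothetical finite sample space with three outcomes, one of which has probability zero, and scalar paths on a finite time window; none of these choices come from the paper. On a finite space the essential supremum reduces to a maximum over the outcomes of positive probability.

```python
# Finite toy model (an assumption for illustration): Omega = {0, 1, 2},
# where outcome 2 has probability zero.
prob = [0.5, 0.5, 0.0]

# paths[omega][t] is the scalar value z_t(omega) on a finite time window.
paths = [
    [0.2, -0.7, 0.4],
    [-0.9, 0.1, 0.3],
    [5.0, 5.0, 5.0],  # null outcome, ignored by the essential supremum
]


def ess_sup(values, prob):
    """Essential supremum on a finite space: max over outcomes with positive probability."""
    return max(v for v, p in zip(values, prob) if p > 0)


times = range(len(paths[0]))

# Left-hand side of (4.7): ess sup over omega of the pathwise supremum norm.
lhs = ess_sup([max(abs(x) for x in path) for path in paths], prob)

# Right-hand side of (4.7): sup over t of the L-infinity norm of z_t.
rhs = max(ess_sup([abs(path[t]) for path in paths], prob) for t in times)

print(lhs, rhs)
```

Both sides ignore the null outcome, so the large values on its path do not affect either norm.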

Consider now the space L(Ω,(n))L^{\infty}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}\right) of processes with finite L\left\|\cdot\right\|_{L^{\infty}} norm. We refer to the elements of L(Ω,(n))L^{\infty}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}\right) as almost surely bounded time series. Additionally, consider the space L(Ω,(n))L^{\infty}\left(\Omega,\ell^{\infty}({\mathbb{R}}^{n})\right) of processes whose paths are all uniformly bounded, that is, they lie in the Banach space ((n),)(\ell^{\infty}({\mathbb{R}}^{n}),\|\cdot\|_{\infty}). According to the definition in (4.2), we have for both these spaces that

L(Ω,(n)):=S(n)/(n),L(Ω,(n)):=S(n)/(n)L^{\infty}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}\right):=S_{(\mathbb{R}^{n})^{\mathbb{Z}}}/\sim_{(\mathbb{R}^{n})^{\mathbb{Z}}},\quad L^{\infty}\left(\Omega,\ell^{\infty}({\mathbb{R}}^{n})\right):=S_{\ell^{\infty}({\mathbb{R}}^{n})}/\sim_{\ell^{\infty}({\mathbb{R}}^{n})}

with

S(n):={𝐳:×Ωn stochastic process, 𝐳(ω)(n),for allωΩ𝐳L<},S_{(\mathbb{R}^{n})^{\mathbb{Z}}}:=\left\{{\bf z}:\mathbb{Z}\times\Omega\longrightarrow\mathbb{R}^{n}\ \mbox{ stochastic process, }{\bf z}(\omega)\in(\mathbb{R}^{n})^{\mathbb{Z}},\ \mbox{for all}\ \omega\in\Omega\mid\left\|{\bf z}\right\|_{L^{\infty}}<\infty\right\},
S(n):={𝐳:×Ωn stochastic process, 𝐳(ω)(n),for allωΩ𝐳L<},S_{\ell^{\infty}({\mathbb{R}}^{n})}:=\left\{{\bf z}:\mathbb{Z}\times\Omega\longrightarrow\mathbb{R}^{n}\ \mbox{ stochastic process, }{\bf z}(\omega)\in\ell^{\infty}({\mathbb{R}}^{n}),\ \mbox{for all}\ \omega\in\Omega\mid\left\|{\bf z}\right\|_{L^{{\infty}}}<\infty\right\},

and with the almost sure equality equivalence relations (n)\sim_{\ell^{\infty}({\mathbb{R}}^{n})} and (n)\sim_{(\mathbb{R}^{n})^{\mathbb{Z}}} between stochastic processes with paths in (n)\ell^{\infty}({\mathbb{R}}^{n}) and (n)(\mathbb{R}^{n})^{\mathbb{Z}}, respectively. The following result shows that the normed spaces L(Ω,(n))L^{\infty}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}\right) and L(Ω,(n))L^{\infty}\left(\Omega,\ell^{\infty}({\mathbb{R}}^{n})\right) are isomorphic.

Lemma 4.3

In the setup that we just introduced, the inclusion ι:S(n)S(n)\iota:S_{\ell^{\infty}({\mathbb{R}}^{n})}\hookrightarrow S_{(\mathbb{R}^{n})^{\mathbb{Z}}} is equivariant with respect to the equivalence relations (n)\sim_{\ell^{\infty}({\mathbb{R}}^{n})} and (n)\sim_{(\mathbb{R}^{n})^{\mathbb{Z}}} and drops to an isomorphism of normed spaces ϕ:(L(Ω,(n)),L)(L(Ω,(n)),L)\phi:(L^{\infty}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}\right),\left\|\cdot\right\|_{L^{\infty}})\longrightarrow(L^{\infty}\left(\Omega,\ell^{\infty}({\mathbb{R}}^{n})\right),\left\|\cdot\right\|_{L^{\infty}}). Equivalently, the diagram formed by these maps commutes, that is, \phi\circ\Pi_{\sim_{({\mathbb{R}}^{n})^{\mathbb{Z}}}}\circ\iota=\Pi_{\sim_{\ell^{\infty}({\mathbb{R}}^{n})}}, where Π(n)\Pi_{\sim_{\ell^{\infty}({\mathbb{R}}^{n})}} and Π(n)\Pi_{\sim_{({\mathbb{R}}^{n})^{\mathbb{Z}}}} are the canonical projections.

Let now ww be a weighting sequence and let w\|\cdot\|_{w} be the associated weighted norm in (n)\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}}. If we replace in (4.6) the \ell^{\infty} norm \|\cdot\|_{\infty} by the weighted norm w\|\cdot\|_{w}, we obtain a weighted norm Lw\|\cdot\|_{L^{\infty}_{w}} in the space of processes 𝐳:×Ωn{\bf z}:\mathbb{Z}_{-}\times\Omega\longrightarrow\mathbb{R}^{n} indexed by \mathbb{Z}_{-} as:

𝐳Lw:=esssupωΩ{𝐳(ω)w}=esssupωΩ{supt{𝐳t(ω)wt}}.\left\|{\bf z}\right\|_{L^{\infty}_{w}}:=\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\{\left\|{\bf z}(\omega)\right\|_{w}\}=\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}_{-}}\left\{\|{\bf z}_{t}(\omega)\|w_{-t}\right\}\right\}. (4.9)

We will denote by Lw(Ω,(n))L^{\infty}_{w}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}_{-}}\right) the space of processes with finite Lw\left\|\cdot\right\|_{L^{\infty}_{w}} norm. A result similar to Lemma 4.3 shows that the normed spaces (Lw(Ω,(n)),Lw)(L^{\infty}_{w}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}_{-}}\right),\left\|\cdot\right\|_{L^{\infty}_{w}}) and (L(Ω,w(n)),Lw)(L^{\infty}\left(\Omega,\ell^{\infty}_{w}({\mathbb{R}}^{n})\right),\left\|\cdot\right\|_{L^{\infty}_{w}}) are isomorphic. Additionally, as in Lemma 4.2, we have that for any 𝐳Lw(Ω,(n)){\bf z}\in L^{\infty}_{w}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}_{-}}\right):

𝐳Lw:=esssupωΩ{supt{𝐳t(ω)wt}}=supt{esssupωΩ{𝐳t(ω)wt}}.\left\|{\bf z}\right\|_{L^{\infty}_{w}}:=\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}_{-}}\left\{\|{\bf z}_{t}(\omega)\|w_{-t}\right\}\right\}=\mathop{{\rm sup}}_{t\in\mathbb{Z}_{-}}\left\{\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|w_{-t}\right\}\right\}. (4.10)

Deterministic filters in a stochastic setup. As we already pointed out, we consider filters UU that have almost surely bounded processes as inputs and outputs. The same conventions as in the deterministic setup are used in the identification of the different signals, namely, 𝐳{\bf z} denotes the filter input process and the symbol yy is reserved for the output process. Let now DnnD_{n}\subset{\mathbb{R}}^{n} and let DnLL(Ω,(n))D_{n}^{L^{\infty}_{\mathbb{Z}}}\subset L^{\infty}(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}) be a subset formed by processes whose paths take values in DnD_{n} almost surely. In the sequel we will restrict our attention to intrinsically deterministic filters U:DnLL(Ω,)U:D_{n}^{L^{\infty}_{\mathbb{Z}}}\longrightarrow L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}}) that are obtained by presenting almost surely bounded stochastic inputs 𝐳DnLL(Ω,(n)){\bf z}\in D_{n}^{L^{\infty}_{\mathbb{Z}}}\subset L^{\infty}(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}) to filters U:(Dn)U:\left(D_{n}\right)^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}} similar to those introduced in the previous section, which explains why we use the same symbol for both. This is explicitly carried out by defining the output process U(𝐳)L(Ω,)U({\bf z})\in L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}}) using the convention

(U(𝐳))(ω):=U(𝐳(ω)),ωΩ,(U({\bf z}))(\omega):=U({\bf z}(\omega)),\quad\omega\in\Omega, (4.11)

where on the right hand side it is the filter U:(Dn)U:\left(D_{n}\right)^{\mathbb{Z}}\longrightarrow\mathbb{R}^{\mathbb{Z}} which is applied to the paths 𝐳(ω):={𝐳t(ω)nt}(Dn){\bf z}(\omega):=\{{\bf z}_{t}(\omega)\in\mathbb{R}^{n}\mid t\in\mathbb{Z}\}\in\left(D_{n}\right)^{\mathbb{Z}} of the process 𝐳{\bf z}. We call these filters deterministic because, in view of (4.11), the dependence of the image process (U(𝐳))(ω)L(Ω,)(U({\bf z}))({\omega})\in L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}}) on the probability space takes place exclusively through the dependence 𝐳(ω){\bf z}({\omega}) in the input. In this section we reserve the symbol UU to denote deterministic filters U:DnLL(Ω,)U:D_{n}^{L^{\infty}_{\mathbb{Z}}}\longrightarrow L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}}). We draw attention to the fact that assuming that the filters map into almost surely bounded processes is a genuine hypothesis that needs to be verified in each specific case considered.
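The pathwise convention (4.11) can be illustrated with a minimal sketch. We assume here a hypothetical two-outcome sample space, paths restricted to a finite time window, and a hypothetical causal filter given by an exponentially discounted sum; these are illustrative choices and not objects defined in the paper. The output process is obtained by applying the same deterministic map to each sample path.

```python
def U(path, decay=0.5):
    """Hypothetical causal filter on a finite path: y_t = sum_j decay**j * z_{t-j}."""
    out, state = [], 0.0
    for z in path:
        # Recursive form of the discounted sum; only past and present inputs are used.
        state = decay * state + z
        out.append(state)
    return out


# A process on a two-point sample space, stored as one finite path per outcome.
process = {
    "omega_1": [1.0, 0.0, -1.0],
    "omega_2": [0.5, 0.5, 0.5],
}

# Convention (4.11): (U(z))(omega) := U(z(omega)), one outcome at a time.
output_process = {omega: U(path) for omega, path in process.items()}

for omega, path in output_process.items():
    print(omega, path)
```

The randomness of the output enters only through the input paths, since the map `U` itself does not depend on the outcome.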

The concepts of causality and time-invariance are defined as in the deterministic case by replacing equalities by almost sure equalities in the corresponding identities. More explicitly, we say that the filter U:DnLL(Ω,)U:D_{n}^{L^{\infty}_{\mathbb{Z}}}\longrightarrow L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}}) is time-invariant when for any τ\tau\in\mathbb{Z} and any 𝐳DnL{\bf z}\in D_{n}^{L^{\infty}_{\mathbb{Z}}}, we have that

(UτU)(𝐳)=(UUτ)(𝐳),almost surely.(U_{\tau}\circ U)({\bf z})=(U\circ U_{\tau})({\bf z}),\quad\mbox{almost surely.}

Analogously, we say that the filter is causal with stochastic inputs when for any two elements 𝐳,𝐰DnL{\bf z},\mathbf{w}\in D_{n}^{L^{\infty}_{\mathbb{Z}}} that satisfy that 𝐳τ=𝐰τ{\bf z}_{\tau}=\mathbf{w}_{\tau} almost surely, for any τt\tau\leq t and for a given tt\in\mathbb{Z}, we have that U(𝐳)t=U(𝐰)tU({\bf z})_{t}=U({\bf w})_{t}, almost surely. Causal and time-invariant deterministic filters produce almost surely causal and time-invariant filters when stochastic inputs are presented to them.

In this setup, there is also a correspondence between causal and time-invariant filters U:DnLL(Ω,)U:D_{n}^{L^{\infty}_{\mathbb{Z}}}\longrightarrow L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}}) and functionals HU:DnLL(Ω,)H_{U}:D_{n}^{L^{\infty}_{\mathbb{Z}_{-}}}\longrightarrow L^{\infty}(\Omega,\mathbb{R}), where DnL:=(DnL)D_{n}^{L^{\infty}_{\mathbb{Z}_{-}}}:=\mathbb{P}_{\mathbb{Z}_{-}}\left(D_{n}^{L^{\infty}_{\mathbb{Z}}}\right).

Given a weighting sequence w:(0,1]w:\mathbb{N}\longrightarrow(0,1] and a time-invariant filter U:DnLL(Ω,)U:D_{n}^{L^{\infty}_{\mathbb{Z}_{-}}}\longrightarrow L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}}) with stochastic inputs, we will say that UU has the fading memory property with respect to the weighting sequence ww when the corresponding functional HU:(DnL,Lw)L(Ω,)H_{U}:\left(D_{n}^{L^{\infty}_{\mathbb{Z}_{-}}},\|\cdot\|_{L^{\infty}_{w}}\right)\longrightarrow L^{\infty}(\Omega,\mathbb{R}) is a continuous map.

Let M>0M>0 and define, using Lemma 4.2,

KML:={𝐳L(Ω,(n))𝐳LM}={𝐳L(Ω,(n))𝐳tLM,for allt}.K^{L^{\infty}}_{M}:=\left\{{\bf z}\in L^{\infty}(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}_{-}})\mid\|{\bf z}\|_{L^{\infty}}\leq M\right\}=\left\{{\bf z}\in L^{\infty}(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}_{-}})\mid\|{\bf z}_{t}\|_{L^{\infty}}\leq M,\ \mbox{for all}\ t\in\mathbb{Z}_{-}\right\}. (4.12)

The sets KMLK^{L^{\infty}}_{M} are the stochastic counterparts of the sets KMK_{M} in the deterministic setup; we will say that KMLK^{L^{\infty}}_{M} is a set of almost surely uniformly bounded processes. A stochastic analog of Lemma  2.1 can be formulated for them with KMK_{M} replaced by KMLK^{L^{\infty}}_{M}, the norm \left\|\cdot\right\| by L\left\|\cdot\right\|_{L^{\infty}}, and the weighted norm w\left\|\cdot\right\|_{w} by Lw\left\|\cdot\right\|_{L^{\infty}_{w}}. Indeed, the following result shows that the fading memory and the universality properties are naturally inherited by deterministic filters with almost surely uniformly bounded inputs. We call this fact the deterministic-stochastic transfer principle.

Theorem 4.4 (Deterministic-stochastic transfer principle)

Let M>0M>0 and let KMK_{M} and KMLK_{M}^{L^{\infty}} be the sets of deterministic and stochastic inputs defined in  (2.4) and  (4.12), respectively. The following properties hold true:

(i)

Let H:(KM,w)H:(K_{M},\left\|\cdot\right\|_{w})\longrightarrow\mathbb{R} be a causal and time-invariant filter. Then HH has the fading memory property if and only if the corresponding filter with almost surely uniformly bounded inputs has almost surely bounded outputs, that is, H:(KML,Lw)L(Ω,)H:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R}), and it has the fading memory property.

(ii)

Let 𝒯:={Hi:(KM,w)iI}\mathcal{T}:=\left\{H_{i}:(K_{M},\left\|\cdot\right\|_{w})\longrightarrow\mathbb{R}\mid i\in I\right\} be a family of causal and time-invariant fading memory filters. Then, 𝒯\mathcal{T} is dense in the set (C0(KM),w)(C^{0}(K_{M}),\|\cdot\|_{w}) if and only if the corresponding family with inputs in KMLK^{L^{\infty}}_{M} is universal in the set of continuous maps of the type H:(KML,Lw)L(Ω,)H:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R}).

A first universality result using RC systems.

The following paragraphs contain a stochastic analog of Theorem 3.1 which shows that any fading memory filter with almost surely uniformly bounded inputs can be approximated using the elements of a polynomial algebra of reservoir filters with the same kind of inputs, provided that it contains the constant functionals and has the separation property. We note that, as in the deterministic case, the existence of the reservoir filter associated to a reservoir system like (1.1)-(1.2) is guaranteed only in the presence of the echo state property. The next lemma shows that this property is inherited by deterministic fading memory reservoir filters with almost surely bounded inputs.

Lemma 4.5

Consider a reservoir system determined by the relations (1.1)–(1.2) and the maps F:DN×Bn(𝟎,M)¯DNF:D_{N}\times\overline{B_{n}({\bf 0},M)}\longrightarrow D_{N} and h:DNh:D_{N}\rightarrow\mathbb{R}, for some n,Nn,N\in\mathbb{N}, M>0M>0, and DNND_{N}\subset\mathbb{R}^{N}. If this reservoir system has the echo state and the fading memory properties then so does the corresponding system with stochastic inputs in KMLK^{L^{\infty}}_{M} which, additionally, has an associated reservoir functional HhF:(KML,Lw)L(Ω,)H_{h}^{F}:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R}) with almost surely bounded outputs that satisfies the fading memory property.

Theorem 4.6

Let M>0M>0 and let KMLK_{M}^{L^{\infty}} be the set of almost surely uniformly bounded processes introduced in (4.12). Consider the set \mathcal{R}

:={HhiFi:KMLL(Ω,)hiPol(Ni,),Fi:Ni×nNi,iI,Ni}\mathcal{R}:=\{H_{h_{i}}^{F_{i}}:K_{M}^{L^{\infty}}\longrightarrow L^{\infty}(\Omega,\mathbb{R})\mid h_{i}\in{\rm Pol}(\mathbb{R}^{N_{i}},\mathbb{R}),F_{i}:\mathbb{R}^{N_{i}}\times\mathbb{R}^{n}\longrightarrow\mathbb{R}^{N_{i}},i\in I,N_{i}\in\mathbb{N}\} (4.13)

formed by deterministic fading memory reservoir filters with respect to a given weighted norm w\|\cdot\|_{w} and driven by stochastic inputs in KMLK_{M}^{L^{\infty}}. Let 𝒜()\mathcal{A}(\mathcal{R}) be the polynomial algebra generated by \mathcal{R}. If the algebra 𝒜()\mathcal{A}(\mathcal{R}) has the separation property and contains all the constant functionals, then any deterministic, causal, time-invariant fading memory filter H:(KML,Lw)L(Ω,)H:(K_{M}^{L^{\infty}},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R}) can be uniformly approximated by elements in 𝒜()\mathcal{A}(\mathcal{R}), that is, for any ϵ>0\epsilon>0, there exist a finite set of indices {i1,,ir}I\{i_{1},\ldots,i_{r}\}\subset I and a polynomial p:rp:\mathbb{R}^{r}\longrightarrow\mathbb{R} such that

HHhF:=sup𝐳KML{H(𝐳)HhF(𝐳)L}<ϵwithh:=p(hi1,,hir)andF:=(Fi1,,Fir).\|H-H_{h}^{F}\|_{\infty}:=\sup_{{\bf z}\in K_{M}^{L^{\infty}}}\{\|H({\bf z})-H_{h}^{F}({\bf z})\|_{L^{\infty}}\}<\epsilon\quad\mbox{with}\quad h:=p(h_{i_{1}},\ldots,h_{i_{r}})\quad\mbox{and}\quad F:=(F_{i_{1}},\ldots,F_{i_{r}}).

In the next paragraphs we identify, as in the deterministic case, families of reservoirs that satisfy the hypotheses of this theorem and where we will eventually impose linearity constraints on the readout function. The following corollary to Theorem 4.6 is the stochastic analog of Corollary 3.2.

Corollary 4.7

Let M>0M>0 and let KMLK_{M}^{L^{\infty}} be the set of almost surely uniformly bounded processes introduced in (4.12). Let

w:={HhF:KMLL(Ω,)hPol(N,),F:N×nN,N}\mathcal{R}_{w}:=\{H_{h}^{F}:K_{M}^{L^{\infty}}\longrightarrow L^{\infty}(\Omega,\mathbb{R})\mid h\in{\rm Pol}(\mathbb{R}^{N},\mathbb{R}),F:\mathbb{R}^{N}\times\mathbb{R}^{n}\longrightarrow\mathbb{R}^{N},N\in\mathbb{N}\} (4.14)

be the set of all the reservoir filters defined on KMLK_{M}^{L^{\infty}} that have the FMP with respect to a given weighted norm Lw\|\cdot\|_{L^{\infty}_{w}}. Then w\mathcal{R}_{w} is universal, that is, for any time-invariant fading memory filter H:(KML,Lw)L(Ω,)H:(K_{M}^{L^{\infty}},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R}) and any ϵ>0\epsilon>0, there exists a reservoir filter HhFwH_{h}^{F}\in\mathcal{R}_{w} such that HHhF:=sup𝐳KML{H(𝐳)HhF(𝐳)L}<ϵ.\|H-H_{h}^{F}\|_{\infty}:=\sup_{{\bf z}\in K_{M}^{L^{\infty}}}\{\|H({\bf z})-H_{h}^{F}({\bf z})\|_{L^{\infty}}\}<\epsilon.

Linear reservoir computers with stochastic inputs are universal.

As was the case in the deterministic setup, we can prove in the stochastic case that the linear RC family introduced in (3.7)-(3.8) suffices to achieve universality. The following statement is a direct consequence of Corollary  3.4 and Theorem  4.4.

Corollary 4.8

Let M>0M>0 and let KMLK_{M}^{L^{\infty}} be the set of almost surely uniformly bounded processes introduced in (4.12). Let ϵ\mathcal{L}_{\epsilon} be the family introduced in Corollary 3.4 and formed by all the linear reservoir filters HpA,𝐜H^{A,{\bf c}}_{p} determined by matrices A𝕄NA\in\mathbb{M}_{N} such that σmax(A)<1ϵ\sigma_{{\rm max}}(A)<1-\epsilon. The elements in ϵ\mathcal{L}_{\epsilon} map KMLK_{M}^{L^{\infty}} into L(Ω,)L^{\infty}(\Omega,\mathbb{R}) and are time-invariant fading memory filters with respect to the weighted norm Lwρ\|\cdot\|_{L^{\infty}_{w^{\rho}}} associated to wtρ:=(1ϵ)ρtw_{t}^{\rho}:=(1-\epsilon)^{\rho t}, for any ρ(0,1)\rho\in(0,1). Moreover, they are universal, that is, for any time-invariant and causal fading memory filter H:(KML,Lwρ)L(Ω,)H:(K_{M}^{L^{\infty}},\|\cdot\|_{L^{\infty}_{w^{\rho}}})\longrightarrow L^{\infty}(\Omega,\mathbb{R}) and any ε>0\varepsilon>0, there exists HpA,𝐜ϵH^{A,{\bf c}}_{p}\in\mathcal{L}_{\epsilon} such that HHpA,𝐜:=sup𝐳KML{H(𝐳)HpA,𝐜(𝐳)L}<ε.\|H-H^{A,{\bf c}}_{p}\|_{\infty}:=\sup_{{\bf z}\in K_{M}^{L^{\infty}}}\{\|H({\bf z})-H^{A,{\bf c}}_{p}({\bf z})\|_{L^{\infty}}\}<\varepsilon.

The same universality result can be stated for the subfamily 𝒟ϵϵ\mathcal{DL}_{\epsilon}\subset\mathcal{L}_{\epsilon}, formed by the linear reservoir systems in ϵ\mathcal{L}_{\epsilon} determined by diagonal matrices, and for 𝒩ϵ\mathcal{NL}\subset\mathcal{L}_{\epsilon}, formed by the linear reservoir systems determined by nilpotent matrices.
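To make the diagonal subfamily concrete, the following sketch simulates one sample path of a linear reservoir with a polynomial readout. It assumes that the system (3.7)-(3.8) has the form x_t = A x_{t-1} + c z_t, y_t = p(x_t), that the semi-infinite past can be replaced by a zero initial state at a finite starting time, and that the inputs are uniform draws in [-M, M]; the matrix entries, the vector c, and the readout polynomial are hypothetical values chosen for illustration.

```python
import random


def linear_reservoir(zs, a_diag, c, readout):
    """Run x_t = A x_{t-1} + c z_t with diagonal A (assumed form), then apply the readout."""
    x = [0.0] * len(a_diag)  # zero initial state stands in for the infinite past
    ys = []
    for z in zs:
        x = [a * xi + ci * z for a, xi, ci in zip(a_diag, x, c)]
        ys.append(readout(x))
    return ys


random.seed(0)
M = 1.0
# One sample path of an almost surely uniformly bounded input process.
zs = [random.uniform(-M, M) for _ in range(200)]

a_diag = [0.5, -0.3]  # diagonal entries of A, all of modulus below 1
c = [1.0, 0.5]


def readout(x):
    """Hypothetical polynomial readout p(x)."""
    return x[0] ** 2 + x[0] * x[1] - 0.5 * x[1]


ys = linear_reservoir(zs, a_diag, c, readout)
print(max(abs(y) for y in ys))
```

Because the diagonal entries have modulus below one and the inputs are bounded by M, the states and hence the outputs remain bounded along every path, in line with the requirement that the filter maps into almost surely bounded processes.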

Remark 4.9

The linear reservoir filters in 𝒩\mathcal{NL} determined by nilpotent matrices have been used in [Gono 18] to formulate an LpL^{p} version of these universality results.

Remark 4.10

The previous corollary has interesting consequences in the realm of time series analysis. Indeed, many well-known parametric time series models consist of autoregressive relations, possibly nonlinear, driven by independent or uncorrelated innovations. The parameter constraints that are imposed on them in order to ensure that they have (second order) stationary solutions imply, in many situations, that the resulting filter has the FMP. In those cases, Corollary 4.8 allows us to conclude that when those models are driven by almost surely uniformly bounded innovations, they can be arbitrarily well approximated by a polynomial function of a vector autoregressive model (VAR) of order 1. This statement applies, for example, to any stationary ARMA [Box 76, Broc 06] or GARCH [Engl 82, Boll 86, Fran 10] model driven by almost surely uniformly bounded innovations.

State-affine reservoir computers with almost surely uniformly bounded inputs are universal.

As was the case in the deterministic setup, non-homogeneous SAS are universal time-invariant fading memory filters in the stochastic framework with almost surely uniformly bounded inputs. Their advantage with respect to the families proposed in the previous corollary is that they use a linear readout, which is of major importance in practical implementations. More specifically, the following result holds true as a direct consequence of Theorem 3.12 and the equivalence stated in Theorem  4.4.

Theorem 4.11

(Universality of SAS reservoir computers with almost surely uniformly bounded inputs) Let KILL(Ω,)K^{L^{\infty}}_{I}\subset L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}_{-}}) be the set of almost surely and uniformly bounded processes in the interval I=[1,1]I=[-1,1], that is,

KIL:={zL(Ω,)ztL1,for allt}.K^{L^{\infty}}_{I}:=\left\{{z}\in L^{\infty}(\Omega,\mathbb{R}^{\mathbb{Z}_{-}})\mid\|{z}_{t}\|_{L^{\infty}}\leq 1,\quad\mbox{for all}\quad t\in\mathbb{Z}_{-}\right\}.

Let 𝒮ϵ\mathcal{S}_{\epsilon} be the family of functionals H𝐖p,q:KILL(Ω,)H_{{\bf W}}^{p,q}:K^{L^{\infty}}_{I}\longrightarrow L^{\infty}(\Omega,\mathbb{R}) induced by the state-affine systems defined in (3.11)-(3.12) and that satisfy Mp:=maxzIp(z)<1ϵM_{p}:=\max_{z\in I}\|p(z)\|<1-\epsilon and Mq:=maxzIq(z)<1ϵM_{q}:=\max_{z\in I}\|q(z)\|<1-\epsilon. The family 𝒮ϵ\mathcal{S}_{\epsilon} forms a polynomial subalgebra of wρ\mathcal{R}_{w^{\rho}} (as defined in (4.14)) with wtρ:=(1ϵ)ρtw^{\rho}_{t}:=(1-\epsilon)^{\rho t}, made of fading memory reservoir filters that map into L(Ω,)L^{\infty}(\Omega,\mathbb{R}).

Moreover, for any time-invariant and causal fading memory filter H:(KIL,Lwρ)L(Ω,)H:(K^{L^{\infty}}_{I},\|\cdot\|_{L^{\infty}_{w^{\rho}}})\longrightarrow L^{\infty}(\Omega,\mathbb{R}) and any ϵ>0\epsilon>0, there exist a natural number NN\in\mathbb{N}, polynomials p(z)𝕄N,N[z],q(z)𝕄N,1[z]p(z)\in\mathbb{M}_{N,N}[z],q(z)\in\mathbb{M}_{N,1}[z] with Mp,Mq<1ϵM_{p},M_{q}<1-\epsilon, and a vector 𝐖N\mathbf{W}\in\mathbb{R}^{N} such that

HH𝐖p,q:=supzKIL{H(z)H𝐖p,q(z)L}<ϵ.\|H-H_{\bf W}^{p,q}\|_{\infty}:=\sup_{{z}\in K^{L^{\infty}}_{I}}\{\|H({z})-H_{\bf W}^{p,q}({z})\|_{L^{\infty}}\}<\epsilon.

The same universality result can be stated for the smaller subfamily 𝒩𝒮ϵ𝒮ϵ\mathcal{NS}_{\epsilon}\subset\mathcal{S}_{\epsilon} formed by SAS reservoir systems determined by nilpotent polynomials p(z)il[z]p(z)\in\mathbb{N}{\rm il}[z].
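The following sketch simulates one sample path of a state-affine system with a linear readout. It assumes that the system (3.11)-(3.12) has the form x_t = p(z_t) x_{t-1} + q(z_t), y_t = W^T x_t, and it uses hypothetical degree-one polynomials p(z) = A0 + z A1 and q(z) = b0 + z b1 with N = 2, a zero initial state in place of the semi-infinite past, and uniform inputs in I = [-1, 1]. The coefficients are illustrative and are chosen so that the row sums of |p(z)| stay at most 0.5 on I.

```python
import random

# Hypothetical coefficients of p(z) = A0 + z*A1 and q(z) = b0 + z*b1.
A0 = [[0.3, 0.1], [0.0, 0.2]]
A1 = [[0.1, 0.0], [0.1, 0.1]]
b0 = [0.2, 0.1]
b1 = [0.3, -0.2]
W = [1.0, -2.0]  # linear readout vector


def sas_step(x, z):
    """One step x_t = p(z_t) x_{t-1} + q(z_t) (assumed form of the SAS)."""
    n = len(x)
    return [
        sum((A0[i][j] + z * A1[i][j]) * x[j] for j in range(n)) + b0[i] + z * b1[i]
        for i in range(n)
    ]


def sas_filter(zs):
    """Drive the SAS with the input path zs and return the linear readout y_t = W^T x_t."""
    x = [0.0, 0.0]  # zero initial state stands in for the infinite past
    ys = []
    for z in zs:
        x = sas_step(x, z)
        ys.append(sum(w * xi for w, xi in zip(W, x)))
    return ys


random.seed(1)
zs = [random.uniform(-1.0, 1.0) for _ in range(300)]
ys = sas_filter(zs)
print(ys[-1])
```

Since p(z) is a contraction for every z in I, two input paths that coincide over a long recent window produce nearly identical current outputs, which is the fading memory behavior that the theorem relies on.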

5 Conclusion

This paper studies and proposes solutions for the universality problem in the approximation of fading memory filters using reservoir computer (RC) systems. RCs are a particular type of recurrent neural networks that have important applications both in machine learning and in signal processing, where they exhibit excellent information processing performance. Their importance is also linked to the possibility of building highly efficient hardware realizations. RC systems are in general defined as nonlinear state-space systems determined by a reservoir and a readout map. In many supervised machine learning applications the readout is chosen to be linear and the reservoir map is randomly generated, which reduces the training of a dynamic task to a static regression problem and makes it possible to circumvent well-known difficulties in the training of generic recurrent neural networks.

The universality question that we addressed consists in finding families of RCs as simple as possible such that the set of input/output functionals that can be generated with them is dense in a sufficiently rich class. The work presented here is the dynamic counterpart of a statement of this type for neural networks in a static and deterministic setup in which they have been proved to be universal approximators.

The RC universality results stated in the paper correspond to two different situations in which the inputs are either deterministic and uniformly bounded or stochastic and almost surely uniformly bounded. In both cases we proved two different universality statements. First, we showed that the family of fading memory RCs is universal in the much larger fading memory filters category. The same applies to the much smaller RC family containing just linear reservoirs with polynomial readouts, when certain spectral restrictions are imposed on the reservoir maps. The second result concerns exclusively reservoir computers with linear readouts, which are closer to the type of RCs used in applications and hardware implementations. More specifically, we introduced the family of what we called non-homogeneous state-affine systems and identified sufficient conditions that guarantee that the associated reservoir computers with linear readouts are causal, time-invariant, and satisfy the echo state and the fading memory properties. Finally, we stated a universality result for a subset of this class which was shown to be universal in the same fading memory filters category as above. These universality statements are then generalized to the stochastic setup for almost surely uniformly bounded inputs. In particular, we showed that any discrete-time filter that has the fading memory property with almost surely uniformly bounded stochastic inputs can be uniformly approximated by elements in the non-homogeneous state-affine family. All the density statements in the paper are formulated with respect to natural uniform approximation norms that appear in each of the different cases considered.

Despite preexisting work, these universality results are, to our knowledge, the first of their type in the semi-infinite discrete-time inputs setup. In the stochastic case they open the door to new developments in the learning theory of stochastic processes.

6 Appendices

6.1 Proof of Lemma 2.1

Let w:(0,1]w:\mathbb{N}\longrightarrow(0,1] be an arbitrary weighting sequence. Then, for any 𝐳KM{\bf z}\in K_{M}:

𝐳w:=supt{𝐳twt}=supt{𝐳twt}M1=M<.\|{\bf z}\|_{w}:=\sup_{t\in\mathbb{Z}_{-}}\{\|{\bf z}_{t}w_{-t}\|\}=\sup_{t\in\mathbb{Z}_{-}}\{\|{\bf z}_{t}\|w_{-t}\}\leq M\cdot 1=M<\infty.

Regarding the inequalities  (2.5) and  (2.6), notice that if wt=λtw_{t}=\lambda^{t} then:

t=0𝐳twt=t=0𝐳tλt=t=0𝐳t(λ1ρλρ)t=t=0𝐳tλ(1ρ)tλρtt=0supi{𝐳iλ(1ρ)i}λρt=supi{𝐳iλ(1ρ)i}t=0λρt=𝐳w1ρ11λρ,\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|w_{t}=\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|\lambda^{t}=\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|(\lambda^{1-\rho}\lambda^{\rho})^{t}=\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|\lambda^{(1-\rho)t}\lambda^{\rho t}\\ \leq\sum_{t=0}^{\infty}\sup_{i\in\mathbb{N}}\left\{\|{\bf z}_{-i}\|\lambda^{(1-\rho)i}\right\}\lambda^{\rho t}=\sup_{i\in\mathbb{N}}\left\{\|{\bf z}_{-i}\|\lambda^{(1-\rho)i}\right\}\sum_{t=0}^{\infty}\lambda^{\rho t}=\|{\bf z}\|_{w^{1-\rho}}\frac{1}{1-\lambda^{\rho}},

which proves (2.5). The proof of (2.6) is similar and follows from noticing that:

\sum_{t=0}^{\infty}\|{\bf z}_{-t}\|\lambda^{(1-\rho)t}\lambda^{\rho t}\leq\sum_{t=0}^{\infty}\sup_{i\in\mathbb{N}}\left\{\|{\bf z}_{-i}\|\lambda^{\rho i}\right\}\lambda^{(1-\rho)t}=\sup_{i\in\mathbb{N}}\left\{\|{\bf z}_{-i}\|\lambda^{\rho i}\right\}\sum_{t=0}^{\infty}\lambda^{(1-\rho)t}=\|{\bf z}\|_{w^{\rho}}\frac{1}{1-\lambda^{1-\rho}}.\quad\blacksquare
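The two bounds (2.5) and (2.6) can be checked numerically on a simple input. The sketch below assumes a hypothetical scalar input that vanishes outside a finite window, so that all sums and suprema are finite maxima, together with the illustrative values lambda = 0.8 and rho = 0.5; none of these values come from the paper.

```python
lam, rho = 0.8, 0.5

# z[t] stands for |z_{-t}|, t = 0, ..., 7; the input is taken to be zero further in the past.
z = [1.0, 0.3, 0.9, 0.0, 1.0, 0.6, 0.2, 1.0]

# Left-hand side shared by both inequalities: sum_t |z_{-t}| * lambda**t.
weighted_sum = sum(a * lam**t for t, a in enumerate(z))

# Weighted norms with respect to w^{1-rho} and w^{rho}, where w_t = lambda**t.
norm_w_1_minus_rho = max(a * lam ** ((1 - rho) * t) for t, a in enumerate(z))
norm_w_rho = max(a * lam ** (rho * t) for t, a in enumerate(z))

# Right-hand sides of (2.5) and (2.6).
bound_25 = norm_w_1_minus_rho / (1 - lam**rho)
bound_26 = norm_w_rho / (1 - lam ** (1 - rho))

print(weighted_sum, bound_25, bound_26)
```

With rho = 0.5 the two bounds coincide; other values of rho in (0, 1) trade a smaller weighted norm against a larger geometric factor.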

6.2 Proof of Lemma 2.2

We recall first that by Lemma  2.1 we have that 𝐳w<\|{\bf z}\|_{w}<\infty, for any 𝐳KM{\bf z}\in K_{M}. Second, since (w(n),w)(\ell^{\infty}_{w}({\mathbb{R}}^{n}),\left\|\cdot\right\|_{w}) is a Banach space [Grig 18], it is hence metrizable and therefore so is (KM,w)(K_{M},\left\|\cdot\right\|_{w}) when endowed with the relative topology (see, for instance, [Munk 14, Exercise 1, Chapter 2, §21]). We will then conclude the compactness of (KM,w)(K_{M},\left\|\cdot\right\|_{w}) by showing that this space is sequentially compact (see, for example [Munk 14, Theorem 28.2]). We proceed by using the strategy in the proof of Lemma 1 in [Boyd 85].

For any mm\in\mathbb{N}, let KMmK_{M}^{m} be the set obtained by projecting into (n){m,,1,0}\left(\mathbb{R}^{n}\right)^{\{-m,\ldots,-1,0\}} the elements of KM(n)K_{M}\subset({\mathbb{R}}^{n})^{\mathbb{Z}_{-}}. Given an element 𝐳KM{\bf z}\in K_{M}, we will denote by 𝐳(m):=(𝐳m,,𝐳0){\bf z}^{(m)}:=({\bf z}_{-m},\ldots,{\bf z}_{0}) its projection into KMmK_{M}^{m}. Additionally, notice that KMm=Bn(𝟎,M)¯m+1K_{M}^{m}=\overline{B_{n}({\bf 0},M)}^{m+1} is compact (and hence sequentially compact) with the product topology, since it is a product of closed balls Bn(𝟎,M)¯n\overline{B_{n}({\bf 0},M)}\subset\mathbb{R}^{n} which are compact.

Let {𝐳(n)}nKM\{{\bf z}(n)\}_{n\in\mathbb{N}}\subset K_{M} be a sequence of elements in KMK_{M}. The argument that we just stated proves that for any kk\in\mathbb{N}, there is a subset k\mathbb{N}_{k}\subset\mathbb{N} and an element 𝐳(k)KMk{\bf z}^{(k)}\in K_{M}^{k} such that

maxt{k,,0}𝐳t(n)𝐳t(k)0,as nnk.\max_{t\in\left\{-k,\ldots,0\right\}}\left\|{\bf z}_{t}(n)-{\bf z}^{(k)}_{t}\right\|\longrightarrow 0,\quad\mbox{as $n\rightarrow\infty$,\, $n\in\mathbb{N}_{k}$}.

Moreover, the sets k\mathbb{N}_{k} can be constructed so that 12\mathbb{N}\supset\mathbb{N}_{1}\supset\mathbb{N}_{2}\supset\cdots and so that 𝐳(k){\bf z}^{(k)} extends 𝐳(l){\bf z}^{(l)} when klk\geq l. This implies the existence of an element 𝐳KM{\bf z}\in K_{M} such that, for each kk\in\mathbb{N},

maxt{k,,0}𝐳t(n)𝐳t0,as nnk,\max_{t\in\left\{-k,\ldots,0\right\}}\left\|{\bf z}_{t}(n)-{\bf z}_{t}\right\|\longrightarrow 0,\quad\mbox{as $n\rightarrow\infty$,\, $n\in\mathbb{N}_{k}$},

and hence there exists an increasing subsequence nkn_{k} such that nkkn_{k}\in\mathbb{N}_{k} and that for each k0k_{0},

maxt{k0,,0}𝐳t(nk)𝐳t0,as k.\max_{t\in\left\{-k_{0},\ldots,0\right\}}\left\|{\bf z}_{t}(n_{k})-{\bf z}_{t}\right\|\longrightarrow 0,\quad\mbox{as $k\longrightarrow\infty$}. (6.1)

We conclude by showing that the sequence {𝐳(nk)}k\left\{{\bf z}(n_{k})\right\}_{k\in\mathbb{N}} converges in (KM,w)(K_{M},\left\|\cdot\right\|_{w}) to the element 𝐳KM{\bf z}\in K_{M}. First, given that wt0w_{t}\longrightarrow 0 as tt\longrightarrow\infty, then for any ε>0\varepsilon>0 there exists k0k_{0} such that wk<ε/2Mw_{k}<\varepsilon/2M, for any kk0k\geq k_{0}. Additionally, since 𝐳(nk),𝐳KM{\bf z}(n_{k}),{\bf z}\in K_{M} for any kk\in\mathbb{N}, we have that

suptk0{𝐳t(nk)𝐳twt}2Mwk0<ε.\sup_{t\leq-k_{0}}\{\left\|{\bf z}_{t}(n_{k})-{\bf z}_{t}\right\|w_{-t}\}\leq 2Mw_{k_{0}}<\varepsilon. (6.2)

Now, by (6.1) there exists k1k_{1} such that for any kk1k\geq k_{1}

\[
\sup_{t\in\left\{-k_{0},\ldots,0\right\}}\{\left\|{\bf z}_{t}(n_{k})-{\bf z}_{t}\right\|w_{-t}\}\leq\sup_{t\in\left\{-k_{0},\ldots,0\right\}}\{\left\|{\bf z}_{t}(n_{k})-{\bf z}_{t}\right\|\}<\varepsilon. \tag{6.3}
\]

Consequently, (6.2) and (6.3) imply that for any k>max{k0,k1}k>\max\{k_{0},k_{1}\}, 𝐳(nk)𝐳w<ε\left\|{\bf z}(n_{k})-{\bf z}\right\|_{w}<\varepsilon, as required.  \blacksquare

6.3 Proof of Lemma 2.7

Let δw(ϵ)\delta^{w}(\epsilon) be the epsilon-delta relation for the FMP associated to the weighting sequence ww. We now show that HUH_{U} has the FMP with respect to ww^{\prime} via the epsilon-delta relation given by δw(ϵ):=δw(ϵ)/λ\delta^{w^{\prime}}(\epsilon):=\delta^{w}(\epsilon)/\lambda. Indeed, for any ϵ>0\epsilon>0 and any 𝐳,𝐬K{\bf z},{\bf s}\in K such that 𝐳𝐬w<δw(ϵ)\|{\bf z}-{\bf s}\|_{w^{\prime}}<\delta^{w^{\prime}}(\epsilon), we have that

\[
\|{\bf z}-{\bf s}\|_{w}=\sup_{t\in\mathbb{Z}_{-}}\{\|{\bf z}_{t}-{\bf s}_{t}\|w_{-t}\}=\sup_{t\in\mathbb{Z}_{-}}\left\{\|{\bf z}_{t}-{\bf s}_{t}\|\frac{w_{-t}}{w^{\prime}_{-t}}w^{\prime}_{-t}\right\}\leq\lambda\sup_{t\in\mathbb{Z}_{-}}\{\|{\bf z}_{t}-{\bf s}_{t}\|w^{\prime}_{-t}\}=\lambda\|{\bf z}-{\bf s}\|_{w^{\prime}}<\lambda\delta^{w^{\prime}}(\epsilon)=\delta^{w}(\epsilon),
\]

and consequently, since HUH_{U} has the FMP with respect to the weighting sequence ww, we can conclude that |HU(𝐳)HU(𝐬)|<ϵ|H_{U}({\bf z})-H_{U}({\bf s})|<\epsilon. This shows that the implication

𝐳𝐬w<δw(ϵ)|HU(𝐳)HU(𝐬)|<ϵ\|{\bf z}-{\bf s}\|_{w^{\prime}}<\delta^{w^{\prime}}(\epsilon)\Longrightarrow|H_{U}({\bf z})-H_{U}({\bf s})|<\epsilon

holds, as required.  \blacksquare

6.4 Proof of Theorem 3.1

Since the elements in \mathcal{R} have the FMP with respect to a given weighted norm w\|\cdot\|_{w}, then so do those in 𝒜()\mathcal{A}(\mathcal{R}) since polynomial combinations of continuous elements of the form HhiFi:(KM,w)H_{h_{i}}^{F_{i}}:(K_{M},\|\cdot\|_{w})\longrightarrow\mathbb{R} are also continuous. Therefore, under that hypothesis, 𝒜()\mathcal{A}(\mathcal{R}) is a polynomial subalgebra of the algebra (C0(KM),w)(C^{0}(K_{M}),\|\cdot\|_{w}) of real-valued continuous functions on (KM,w)(K_{M},\|\cdot\|_{w}). Since by hypothesis 𝒜()\mathcal{A}(\mathcal{R}) contains the constant functionals and separates the points in KMK_{M} and, by Lemma 2.2, the set (KM,w)(K_{M},\|\cdot\|_{w}) is compact, the Stone-Weierstrass theorem (Theorem 7.3.1 in [Dieu 69]) implies that 𝒜()\mathcal{A}(\mathcal{R}) is dense in (C0(KM),w)(C^{0}(K_{M}),\|\cdot\|_{w}), which concludes the proof.  \blacksquare

6.5 Proof of Corollary 3.4

In order to show that the reservoir systems in ϵ\mathcal{L}_{\epsilon} induce reservoir filters, we first show that they have the echo state property by using the following lemma, whose proof can be found in [Grig 18].

Lemma 6.1

Let DNND_{N}\subset\mathbb{R}^{N} and DnnD_{n}\subset{\mathbb{R}}^{n} and let F:DN×DnDNF:D_{N}\times D_{n}\longrightarrow D_{N} be a continuous reservoir map. Suppose that FF is a contraction map with contraction constant 0<r<10<r<1, that is:

F(𝐱,𝐳)F(𝐲,𝐳)r𝐱𝐲,for all 𝐱,𝐲DN and all 𝐳Dn,\left\|F(\mathbf{x},{\bf z})-F(\mathbf{y},{\bf z})\right\|\leq r\left\|\mathbf{x}-{\bf y}\right\|,\quad\mbox{for all $\mathbf{x},{\bf y}\in D_{N}$ and all ${\bf z}\in D_{n}$},

then the corresponding reservoir system has the echo state property.

We start now by noting that the condition σmax(A)<1ϵ<1\sigma_{{\rm max}}(A)<1-\epsilon<1 implies that the reservoir map F(𝐱,𝐳):=A𝐱+𝐜𝐳F(\mathbf{x},{\bf z}):=A\mathbf{x}+{\bf c}{\bf z} associated to (3.7) is a contracting map with constant σmax(A)\sigma_{{\rm max}}(A) which, by hypothesis, is smaller than one. Indeed,

F(𝐱,𝐳)F(𝐲,𝐳)=A(𝐱𝐲)σmax(A)𝐱𝐲for all 𝐱,𝐲DN and all 𝐳Dn.\left\|F(\mathbf{x},{\bf z})-F(\mathbf{y},{\bf z})\right\|=\left\|A(\mathbf{x}-{\bf y})\right\|\leq\sigma_{{\rm max}}(A)\left\|\mathbf{x}-{\bf y}\right\|\quad\mbox{for all $\mathbf{x},{\bf y}\in D_{N}$ and all ${\bf z}\in D_{n}$}.

By Lemma 6.1 we can conclude that this reservoir system has an associated reservoir filter, which we now show is given explicitly by (3.9). We start by proving that the condition $\sigma_{{\rm max}}(A)<1-\epsilon<1$, together with the fact that the elements in $K_{M}$ are uniformly bounded by a constant $M$, implies that the infinite sum in (3.9) converges. Let $n,m\in\mathbb{N}$ be such that $n<m$ and let $S_{n}:=\sum_{i=0}^{n}A^{i}{\bf c}{\bf z}_{-i}$. Then:

\[
\left\|S_{n}-S_{m}\right\|=\left\|\sum_{i=n+1}^{m}A^{i}{\bf c}{\bf z}_{-i}\right\|\leq\sum_{i=n+1}^{m}\left\|A\right\|_{2}^{i}\left\|{\bf c}\right\|_{2}\|{\bf z}_{-i}\|\leq M\left\|{\bf c}\right\|_{2}\sum_{i=n+1}^{m}\sigma_{{\rm max}}(A)^{i}\leq M\left\|{\bf c}\right\|_{2}\sum_{i=n+1}^{\infty}\sigma_{{\rm max}}(A)^{i}=M\left\|{\bf c}\right\|_{2}\frac{\sigma_{{\rm max}}(A)^{n+1}}{1-\sigma_{{\rm max}}(A)}.
\]

The condition σmax(A)<1ϵ<1\sigma_{{\rm max}}(A)<1-\epsilon<1 implies that M𝐜2σmax(A)n+11σmax(A)=Mσmax(𝐜)σmax(A)n+11σmax(A)0M\left\|{\bf c}\right\|_{2}\frac{\sigma_{{\rm max}}(A)^{n+1}}{1-\sigma_{{\rm max}}(A)}=M\frac{\sigma_{{\rm max}}({\bf c})\sigma_{{\rm max}}(A)^{n+1}}{1-\sigma_{{\rm max}}(A)}\rightarrow 0 as nn\rightarrow\infty and hence {Sn}n\left\{S_{n}\right\}_{n\in\mathbb{N}} is a Cauchy sequence in N\mathbb{R}^{N} that consequently converges.

The fact that the filter determined by the expression (3.9) is a solution of the recursions (3.7)-(3.8) is a straightforward verification. In order to carry it out, it suffices to use that the filter UhA,𝐜(𝐳)U^{A,{\bf c}}_{h}({\bf z}) associated to the functional HhA,𝐜(𝐳)H^{A,{\bf c}}_{h}({\bf z}) is given by

UhA,𝐜(𝐳)t=h(i=0Ai𝐜𝐳ti),U^{A,{\bf c}}_{h}({\bf z})_{t}=h\left(\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{t-i}\right),

and that the time series 𝐱t~\widetilde{{\bf x}_{t}} defined by 𝐱t~:=i=0Ai𝐜𝐳ti\widetilde{{\bf x}_{t}}:=\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{t-i} satisfies the recursion relation (3.7).
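As an informal numerical illustration of this verification, the following sketch compares the state obtained by running the recursion $\mathbf{x}_{t}=A\mathbf{x}_{t-1}+{\bf c}{\bf z}_{t}$ with the truncated series $\sum_{i=0}^{T}A^{i}{\bf c}{\bf z}_{-i}$. The matrix `A`, the mask `c`, the input sequence and the truncation length `T` are hypothetical choices made only for this example; the input is scalar ($n=1$) and the recursion is started from the zero state, which corresponds to a zero input before time $-T$.

```python
# Sketch: recursion x_t = A x_{t-1} + c z_t versus the truncated series
# sum_{i=0}^{T} A^i c z_{-i}. All numerical values are illustrative assumptions.

def matvec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def scale(x, s):
    return [a * s for a in x]

A = [[0.5, 0.1], [0.0, 0.4]]   # hypothetical matrix with sigma_max(A) < 1
c = [1.0, -2.0]                # input mask for a scalar input (n = 1)
T = 60
# z[k] stands for z_{-k}; the values are bounded by M = 1
z = [((-1) ** k) * (k % 3) / 2.0 for k in range(T + 1)]

def state_by_recursion(A, c, z):
    """Iterate the state equation from the oldest input up to time 0."""
    x = [0.0] * len(c)
    for k in reversed(range(len(z))):
        x = add(matvec(A, x), scale(c, z[k]))
    return x

def state_by_series(A, c, z):
    """Accumulate the terms A^i c z_{-i} for i = 0, ..., T."""
    total = [0.0] * len(c)
    power_c = list(c)          # holds A^i c
    for zi in z:
        total = add(total, scale(power_c, zi))
        power_c = matvec(A, power_c)
    return total

x_rec = state_by_recursion(A, c, z)
x_ser = state_by_series(A, c, z)
gap = max(abs(a - b) for a, b in zip(x_rec, x_ser))
print("largest componentwise gap:", gap)
```

Both routines evaluate the same finite sum in a different order, so the gap should be at the level of floating point rounding.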

We now verify by hand that the filters UhA,𝐜U^{A,{\bf c}}_{h} are time-invariant. Let 𝐳KM{\bf z}\in K_{M} and t,τt,\tau\in\mathbb{N} arbitrary and let UτU_{\tau} be the corresponding time delay operator, then:

(UhA,𝐜Uτ)(𝐳)t=(UhA,𝐜(Uτ(𝐳)))t=h(i=0Ai𝐜Uτ(𝐳)ti)=h(i=0Ai𝐜𝐳tiτ)\left(U^{A,{\bf c}}_{h}\circ U_{\tau}\right)({\bf z})_{t}=\left(U^{A,{\bf c}}_{h}\left(U_{\tau}({\bf z})\right)\right)_{t}=h\left(\sum_{i=0}^{\infty}A^{i}{\bf c}U_{\tau}({\bf z})_{t-i}\right)=h\left(\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{t-i-\tau}\right) (6.4)

At the same time,

(UτUhA,𝐜)(𝐳)t=(Uτ(UhA,𝐜(𝐳)))t=UhA,𝐜(𝐳)tτ=h(i=0Ai𝐜𝐳tτi),\left(U_{\tau}\circ U^{A,{\bf c}}_{h}\right)({\bf z})_{t}=\left(U_{\tau}\left(U^{A,{\bf c}}_{h}({\bf z})\right)\right)_{t}=U^{A,{\bf c}}_{h}({\bf z})_{t-\tau}=h\left(\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{t-\tau-i}\right),

which coincides with  (6.4) and proves the time-invariance of UhA,𝐜U^{A,{\bf c}}_{h}.

The next step consists in showing that the elements in ϵ\mathcal{L}_{\epsilon} are λρ\lambda_{\rho}-exponential fading memory filters, with λρ:=(1ϵ)ρ\lambda_{\rho}:=(1-\epsilon)^{\rho}, for any ρ(0,1)\rho\in(0,1), that is, ϵwρ\mathcal{L}_{\epsilon}\subset\mathcal{R}_{w^{\rho}}, with wρ:(0,1]w^{\rho}:\mathbb{N}\rightarrow(0,1] the sequence given by wtρ:=(1ϵ)ρtw_{t}^{\rho}:=(1-\epsilon)^{\rho t}. Let wρ\|\cdot\|_{w^{\rho}} be the associated weighted norm in KMK_{M} and let 𝐳KM{\bf z}\in K_{M} be an arbitrary element. We start by noting that the continuity of the readout map h:DNh:D_{N}\rightarrow\mathbb{R} implies that for any ε>0\varepsilon>0 there exists an element δ(ε)>0\delta(\varepsilon)>0 such that for any 𝐯DN\mathbf{v}\in D_{N} that satisfies

𝐯i=0Ai𝐜𝐳ti<δ(ε),then|h(𝐯)h(i=0Ai𝐜𝐳ti)|<ε.\left\|\mathbf{v}-\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{t-i}\right\|<\delta(\varepsilon),\quad\mbox{then}\quad\left|h(\mathbf{v})-h\left(\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{t-i}\right)\right|<\varepsilon. (6.5)

We now show that for any 𝐬KM{\bf s}\in K_{M} such that

𝐬𝐳wρ<δ(ε)(1(1ϵ)1ρ)σmax(𝐜),then|HhA,𝐜(𝐬)HhA,𝐜(𝐳)|<ε.\|{\bf s}-{\bf z}\|_{w^{\rho}}<\frac{\delta(\varepsilon)\left(1-(1-\epsilon)^{1-\rho}\right)}{\sigma_{{\rm max}}({\bf c})},\quad\mbox{then}\quad\left|H^{A,{\bf c}}_{h}({\bf s})-H^{A,{\bf c}}_{h}({\bf z})\right|<\varepsilon. (6.6)

Indeed,

i=0Ai𝐜𝐬tii=0Ai𝐜𝐳ti=i=0Ai𝐜(𝐬ti𝐳ti)i=0Ai𝐜(𝐬ti𝐳ti)i=0σmax(Ai)𝐜(𝐬ti𝐳ti)i=0σmax(A)i𝐜(𝐬ti𝐳ti)i=0(1ϵ)i𝐜(𝐬ti𝐳ti).\left\|\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf s}_{t-i}-\sum_{i=0}^{\infty}A^{i}{\bf c}{\bf z}_{t-i}\right\|=\left\|\sum_{i=0}^{\infty}A^{i}{\bf c}({\bf s}_{t-i}-{\bf z}_{t-i})\right\|\leq\sum_{i=0}^{\infty}\left\|A^{i}{\bf c}({\bf s}_{t-i}-{\bf z}_{t-i})\right\|\\ \leq\sum_{i=0}^{\infty}\sigma_{{\rm max}}(A^{i})\left\|{\bf c}({\bf s}_{t-i}-{\bf z}_{t-i})\right\|\leq\sum_{i=0}^{\infty}\sigma_{{\rm max}}(A)^{i}\left\|{\bf c}({\bf s}_{t-i}-{\bf z}_{t-i})\right\|\leq\sum_{i=0}^{\infty}(1-\epsilon)^{i}\left\|{\bf c}({\bf s}_{t-i}-{\bf z}_{t-i})\right\|.

If we now use (2.6) in Lemma 2.1 and the hypothesis in (6.6), we can conclude that

i=0(1ϵ)i𝐜(𝐬ti𝐳ti)σmax(𝐜)i=0(1ϵ)i(𝐬ti𝐳ti)σmax(𝐜)𝐬𝐳wρ1(1ϵ)1ρ<δ(ε),\sum_{i=0}^{\infty}(1-\epsilon)^{i}\left\|{\bf c}({\bf s}_{t-i}-{\bf z}_{t-i})\right\|\leq\sigma_{{\rm max}}({\bf c})\sum_{i=0}^{\infty}(1-\epsilon)^{i}\left\|({\bf s}_{t-i}-{\bf z}_{t-i})\right\|\leq\frac{\sigma_{{\rm max}}({\bf c})\|{\bf s}-{\bf z}\|_{w^{\rho}}}{1-(1-\epsilon)^{1-\rho}}<\delta(\varepsilon),

which proves the continuity of the map HhA,𝐜:(KM,wρ)H^{A,{\bf c}}_{h}:(K_{M},\|\cdot\|_{w^{\rho}})\longrightarrow\mathbb{R} and hence shows that HhA,𝐜H^{A,{\bf c}}_{h} is a λρ\lambda_{\rho}-exponential fading memory filter.

In order to establish the universality statement in the corollary we will proceed, as in the proof of Theorem 3.1, by showing that ϵ\mathcal{L}_{\epsilon} is a polynomial algebra that contains the constant functionals and separates the points in KMK_{M} and then by invoking the Stone-Weierstrass theorem using the compactness of (KM,wρ)(K_{M},\|\cdot\|_{w^{\rho}}).

In order to show that (ϵ,wρ)(\mathcal{L}_{\epsilon},\|\cdot\|_{w^{\rho}}) is a polynomial algebra, notice first that if A1,A2𝕄NA_{1},A_{2}\in\mathbb{M}_{N} are such that σmax(A1),σmax(A2)<1ϵ\sigma_{{\rm max}}(A_{1}),\sigma_{{\rm max}}(A_{2})<1-\epsilon, then

σmax(A1A2)=max(σmax(A1),σmax(A2))<1ϵ.\sigma_{{\rm max}}(A_{1}\oplus A_{2})=\max\left(\sigma_{{\rm max}}(A_{1}),\sigma_{{\rm max}}(A_{2})\right)<1-\epsilon. (6.7)

If we now take 𝐜i𝕄Ni,n{\bf c}_{i}\in\mathbb{M}_{N_{i},n}, i{1,2}i\in\{1,2\} and h1,h2h_{1},h_{2} two real-valued polynomials in N1N_{1} and N2N_{2} variables, respectively, we have by the first part of the corollary that we just proved that the filter functionals Hh1A1,𝐜1H^{A_{1},{\bf c}_{1}}_{h_{1}} and Hh2A2,𝐜2H^{A_{2},{\bf c}_{2}}_{h_{2}} are well defined. Additionally, by (3.3)-(3.4) so are the combinations Hh1A1,𝐜1Hh2A2,𝐜2H^{A_{1},{\bf c}_{1}}_{h_{1}}\cdot H^{A_{2},{\bf c}_{2}}_{h_{2}} and Hh1A1,𝐜1+λHh2A2,𝐜2H^{A_{1},{\bf c}_{1}}_{h_{1}}+\lambda H^{A_{2},{\bf c}_{2}}_{h_{2}} that satisfy:

Hh1A1,𝐜1Hh2A2,𝐜2=Hh1h2A1A2,𝐜1𝐜2,Hh1A1,𝐜1+λHh2A2,𝐜2=Hh1λh2A1A2,𝐜1𝐜2,λ.H^{A_{1},{\bf c}_{1}}_{h_{1}}\cdot H^{A_{2},{\bf c}_{2}}_{h_{2}}=H^{A_{1}\oplus A_{2},{\bf c}_{1}\oplus{\bf c}_{2}}_{h_{1}\cdot h_{2}},\quad H^{A_{1},{\bf c}_{1}}_{h_{1}}+\lambda H^{A_{2},{\bf c}_{2}}_{h_{2}}=H^{A_{1}\oplus A_{2},{\bf c}_{1}\oplus{\bf c}_{2}}_{h_{1}\oplus\lambda h_{2}},\quad\lambda\in\mathbb{R}. (6.8)

Using the relations (6.8) and (6.7), we can conclude that both $H^{A_{1},{\bf c}_{1}}_{h_{1}}\cdot H^{A_{2},{\bf c}_{2}}_{h_{2}}$ and $H^{A_{1},{\bf c}_{1}}_{h_{1}}+\lambda H^{A_{2},{\bf c}_{2}}_{h_{2}}$ belong to $\mathcal{L}_{\epsilon}\subset\mathcal{R}_{w^{\rho}}$. This implies that $(\mathcal{L}_{\epsilon},\|\cdot\|_{w^{\rho}})$ is a polynomial subalgebra of $(\mathcal{R}_{w^{\rho}},\|\cdot\|_{w^{\rho}})$.
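The relations (6.8) can be checked numerically on a small example. The sketch below runs two linear reservoirs separately and then as a single block-diagonal system $A_{1}\oplus A_{2}$ with stacked mask ${\bf c}_{1}\oplus{\bf c}_{2}$, and compares the product and the linear combination of the readouts. The matrices, masks, polynomial readouts `h1`, `h2`, the scalar `lam` and the input are hypothetical values chosen for illustration, and the states are computed from a finite input history started at the zero state.

```python
import math

# Sketch: direct-sum construction behind (6.8). Illustrative values only.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def direct_sum(A1, A2):
    """Block-diagonal matrix with blocks A1 and A2."""
    n1, n2 = len(A1), len(A2)
    top = [list(row) + [0.0] * n2 for row in A1]
    bottom = [[0.0] * n1 + list(row) for row in A2]
    return top + bottom

def final_state(A, c, z):
    """State at time 0 of x_t = A x_{t-1} + c z_t; z[k] stands for z_{-k}."""
    x = [0.0] * len(c)
    for k in reversed(range(len(z))):
        Ax = matvec(A, x)
        x = [Ax[i] + c[i] * z[k] for i in range(len(c))]
    return x

A1, c1 = [[0.3, 0.2], [0.0, 0.5]], [1.0, 0.5]
A2, c2 = [[-0.4]], [2.0]
h1 = lambda x: x[0] * x[1] + 1.0      # polynomial readout in two variables
h2 = lambda x: x[0] ** 2              # polynomial readout in one variable
lam = -0.7
z = [math.sin(k) for k in range(40)]  # bounded scalar input

x1 = final_state(A1, c1, z)
x2 = final_state(A2, c2, z)
x12 = final_state(direct_sum(A1, A2), c1 + c2, z)

product_separate = h1(x1) * h2(x2)
product_joint = h1(x12[:2]) * h2(x12[2:])
sum_separate = h1(x1) + lam * h2(x2)
sum_joint = h1(x12[:2]) + lam * h2(x12[2:])
print(product_separate, product_joint, sum_separate, sum_joint)
```

The joint state is the concatenation of the two separate states, which is why a readout acting on the first block and another acting on the second block reproduce the product and the sum.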

Since ϵ\mathcal{L}_{\epsilon} contains the constant functionals (just take constant readout maps hh), in order to conclude the proof, it is enough to show that the elements in ϵ\mathcal{L}_{\epsilon} separate points in KMK_{M}. In the proof of this statement we need the following elementary fact about analytic functions.

Lemma 6.2

Let M>0M>0 and let 𝐳[M,M]{\bf z}\in[-M,M]^{\mathbb{Z}_{-}}. Define the real valued function f𝐳(x):=j=0zjxjf_{{\bf z}}(x):=\sum_{j=0}^{\infty}z_{-j}x^{j}. This function is real analytic in the interval (1,1)(-1,1). Moreover, if 𝐳𝟎{\bf z}\neq{\bf 0}, then there exists a point x0(1,1)x_{0}\in(-1,1) such that f𝐳(x0)0f_{{\bf z}}(x_{0})\neq 0.

Proof of the lemma.   We note first that for any x(1,1)x\in(-1,1) and any ss\in\mathbb{N} we have that

|j=0szjxj|j=0s|zj||xj|Mj=0s|x|jM1|x|.\left|\sum_{j=0}^{s}z_{-j}x^{j}\right|\leq\sum_{j=0}^{s}\left|z_{-j}\right|\left|x^{j}\right|\leq M\sum_{j=0}^{s}\left|x\right|^{j}\leq\frac{M}{1-|x|}.

Taking the limit ss\rightarrow\infty, we obtain that

|f𝐳(x)|M1|x|,for all x(1,1),\left|f_{{\bf z}}(x)\right|\leq\frac{M}{1-|x|},\quad\mbox{for all $x\in(-1,1)$,}

which shows that the power series converges absolutely on $(-1,1)$ and hence proves the first claim in the lemma. Now, by the uniqueness theorem for the representation of analytic functions by power series (see [Brow 09, page 217]), the series $\sum_{j=0}^{\infty}z_{-j}x^{j}$ is the Taylor expansion around $0$ of $f_{{\bf z}}(x)$. Since ${\bf z}\neq{\bf 0}$ by hypothesis, at least one of the derivatives of $f_{{\bf z}}$ at $0$ is non-zero and hence this function cannot be identically zero, which implies that there exists a point $x_{0}\in(-1,1)$ such that $f_{{\bf z}}(x_{0})\neq 0$. \blacktriangledown

We now show that the elements in ϵ\mathcal{L}_{\epsilon} separate points in KMK_{M}. Take 𝐳1,𝐳2KM(n){\bf z}_{1},{\bf z}_{2}\in K_{M}\subset\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}} such that 𝐳1𝐳2{\bf z}_{1}\neq{\bf z}_{2} and let A𝕄(n,n)A\in\mathbb{M}(n,n), with σmax(A)<1ϵ\sigma_{{\rm max}}(A)<1-\epsilon, and 𝐜:=𝕀n{\bf c}:=\mathbb{I}_{n}. Let UA,𝐜:KM(n)U^{A,{\bf c}}:K_{M}\longrightarrow\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}} be the linear filter associated to AA and 𝐜{\bf c} via the recursion  (3.7). Using the preceding arguments we have that

UA,𝐜(𝐳)t=j=0Aj𝐳tj.U^{A,{\bf c}}({\bf z})_{t}=\sum_{j=0}^{\infty}A^{j}{\bf z}_{t-j}. (6.9)

Since ${\bf z}_{1}\neq{\bf z}_{2}$, there exist an index $i\in\left\{1,\ldots,n\right\}$ and $t\in\mathbb{Z}_{-}$ such that $\left(z_{1}^{i}\right)_{t}\neq\left(z_{2}^{i}\right)_{t}$. Let now $b\in(-1+\epsilon,1-\epsilon)$ and let $A_{b}:={\rm diag}\left(0,\ldots,0,b,0,\ldots,0\right)\in\mathbb{D}_{n}$ be the matrix that has the element $b$ in the $i$-th entry. It is easy to see using (6.9) that

UAb,𝐜(𝐳)t=(0,,0,j=0bjztji,0,,0),withj=0bjztjiin the i-th entry.U^{A_{b},{\bf c}}({\bf z})_{t}=\left(0,\ldots,0,\sum_{j=0}^{\infty}b^{j}z^{i}_{t-j},0,\ldots,0\right)^{\top},\quad\mbox{with}\quad\sum_{j=0}^{\infty}b^{j}z^{i}_{t-j}\quad\mbox{in the $i$-th entry.} (6.10)

Let ${\bf s}:={\bf z}_{1}-{\bf z}_{2}\neq{\bf 0}$. Notice that by (6.10) we have that $U^{A_{b},{\bf c}}({\bf s})_{0}=\left(0,\ldots,0,\sum_{j=0}^{\infty}b^{j}s^{i}_{-j},0,\ldots,0\right)^{\top}$. Given that the vector ${\bf s}^{i}\in\mathbb{R}^{\mathbb{Z}_{-}}$ is non-zero and bounded by $2M$, Lemma 6.2 shows that the real analytic function $b\mapsto\sum_{j=0}^{\infty}b^{j}s^{i}_{-j}$ is not identically zero on $(-1,1)$. Since the zeros of such a function are isolated, it cannot vanish on the whole subinterval $(-1+\epsilon,1-\epsilon)$, and hence there exists an element $b_{0}\in(-1+\epsilon,1-\epsilon)$ such that $U^{A_{b_{0}},{\bf c}}({\bf s})_{0}\neq{\bf 0}$, which is equivalent to $U^{A_{b_{0}},{\bf c}}({\bf z}_{1})_{0}\neq U^{A_{b_{0}},{\bf c}}({\bf z}_{2})_{0}$. Using the polynomial $h(\mathbf{x}):=x_{i}\in\mathbb{R}$, the previous relation implies that $U^{A_{b_{0}},{\bf c}}_{h}({\bf z}_{1})_{0}\neq U_{h}^{A_{b_{0}},{\bf c}}({\bf z}_{2})_{0}$ or, equivalently,

HhAb0,𝐜(𝐳1)HhAb0,𝐜(𝐳2),as required.H^{A_{b_{0}},{\bf c}}_{h}\left({\bf z}_{1}\right)\neq H^{A_{b_{0}},{\bf c}}_{h}\left({\bf z}_{2}\right),\quad\mbox{as required}.
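The role of Lemma 6.2 in this argument can be illustrated with a toy computation. The sketch below takes a hypothetical difference sequence `s` with finitely many non-zero entries, for which $f_{{\bf s}}(b)=1-2b$ vanishes at $b=0.5$, and scans a grid inside $(-1+\epsilon,1-\epsilon)$ for a value `b0` at which the series does not vanish. The value of `eps`, the grid and the sequence are assumptions made for the example.

```python
# Sketch: locating b0 in (-1 + eps, 1 - eps) with sum_j b0^j s_{-j} != 0.
# The sequence s and the grid are illustrative assumptions.

def f(s, b):
    """Finite series sum_j b^j s_{-j}; s[j] stands for s_{-j}."""
    return sum((b ** j) * sj for j, sj in enumerate(s))

eps = 0.1
s = [1.0, -2.0] + [0.0] * 8        # here f(b) = 1 - 2 b, with a zero at b = 0.5

# Interior grid points of the interval (-1 + eps, 1 - eps)
grid = [(-1 + eps) + k * (2 - 2 * eps) / 41 for k in range(1, 41)]
b0 = next(b for b in grid if abs(f(s, b)) > 1e-9)

print("f at 0.5:", f(s, 0.5))      # a parameter that fails to separate
print("b0:", b0, "f at b0:", f(s, b0))
```

A single bad parameter such as $b=0.5$ does not matter: the argument only needs one point of the interval at which the series is non-zero.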

We conclude the proof by establishing the universality of the families $\mathcal{DL}_{\epsilon}$ and $\mathcal{NL}$ formed by the linear reservoir filters generated by diagonal and nilpotent matrices, respectively. First, in the case of $\mathcal{DL}_{\epsilon}$, the statement is a consequence of (6.8) and of the fact that when the matrices $A_{1}$ and $A_{2}$ are diagonal, then the matrix associated to the linear map $A_{1}\oplus A_{2}$ is also diagonal. Additionally, notice that the point separation property for $\mathcal{L}_{\epsilon}$ has been proved using diagonal matrices in (6.10) and hence it also holds for $\mathcal{DL}_{\epsilon}$. The claim follows from the Stone-Weierstrass theorem.

Finally, in the case of $\mathcal{NL}$, the proof also follows from (6.8) since it is straightforward to see that when the matrices $A_{1}$ and $A_{2}$ are nilpotent, then the matrix associated to the linear map $A_{1}\oplus A_{2}$ is also nilpotent. It is only the point separation property of $\mathcal{NL}$ that requires a separate argument, which we provide in the following lines. Let ${\bf z}_{1},{\bf z}_{2}\in K_{M}$ be such that ${\bf z}_{1}\neq{\bf z}_{2}$ and let $t_{0}\in\mathbb{N}$ be the first time index for which $\left({\bf z}_{1}\right)_{-t_{0}}\neq\left({\bf z}_{2}\right)_{-t_{0}}$, that is, $\left({\bf z}_{1}\right)_{-t}=\left({\bf z}_{2}\right)_{-t}$ for all $t\in\{0,1,\ldots,t_{0}-1\}$. Let now $i_{0}\in\left\{1,\ldots,n\right\}$ be such that $\left(z_{1}^{i_{0}}\right)_{-t_{0}}\neq\left(z_{2}^{i_{0}}\right)_{-t_{0}}$. Let $A_{t_{0}+1}\in\mathbb{N}{\rm il}_{t_{0}+1}^{t_{0}+1}$ be the upper shift matrix in dimension $t_{0}+1$, that is, $A_{t_{0}+1}\in\mathbb{M}_{t_{0}+1}$ is by definition the matrix with ones on the superdiagonal and zeros elsewhere, and construct an element ${\bf c}\in\mathbb{M}_{t_{0}+1,n}$ whose last row is a vector of zeros with the exception of a one in the entry $i_{0}$. The nilpotency of $A_{t_{0}+1}$ implies

UAt0+1,𝐜(𝐳)0=j=0t0At0+1j𝐜𝐳j.U^{A_{t_{0}+1},{\bf c}}({\bf z})_{0}=\sum_{j=0}^{t_{0}}A_{t_{0}+1}^{j}{\bf c}{\bf z}_{-j}.

When we apply this expression to 𝐳1{\bf z}_{1} and 𝐳2{\bf z}_{2}, since (𝐳1)t=(𝐳2)t\left({\bf z}_{1}\right)_{-t}=\left({\bf z}_{2}\right)_{-t}, for all t{0,1,,t01}t\in\{0,1,\ldots,t_{0}-1\}, we obtain that

\[
U^{A_{t_{0}+1},{\bf c}}({\bf z}_{1}-{\bf z}_{2})_{0}=A_{t_{0}+1}^{t_{0}}{\bf c}({\bf z}_{1}-{\bf z}_{2})_{-t_{0}}=\left(\left(z_{1}^{i_{0}}\right)_{-t_{0}}-\left(z_{2}^{i_{0}}\right)_{-t_{0}},0,\ldots,0\right)^{\top}\neq{\bf 0},
\]

since the power $A_{t_{0}+1}^{t_{0}}$ of the upper shift matrix moves the last entry of a vector to the first position and annihilates all the other entries. Using the polynomial $h(\mathbf{x}):=x_{1}$, this relation implies that $U^{A_{t_{0}+1},{\bf c}}_{h}({\bf z}_{1})_{0}\neq U_{h}^{A_{t_{0}+1},{\bf c}}({\bf z}_{2})_{0}$ or, equivalently, $H^{A_{t_{0}+1},{\bf c}}_{h}\left({\bf z}_{1}\right)\neq H^{A_{t_{0}+1},{\bf c}}_{h}\left({\bf z}_{2}\right)$, as required. \blacksquare
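The mechanism behind the nilpotent construction can be traced in a short sketch. It builds the shift matrix with ones on the superdiagonal, a mask `c` whose last row selects the input component `i0`, and two hypothetical input histories that agree at lags $0,\ldots,t_{0}-1$ and differ at lag $t_{0}$. Under the superdiagonal convention used in this sketch, the input at lag $t_{0}$ is carried by the first state component at time $0$. The indices are zero-based and all values are illustrative assumptions.

```python
# Sketch: a shift-register reservoir that exposes the input at lag t0.
# Zero-based indices; matrices and inputs are illustrative assumptions.

def upper_shift(d):
    """d x d matrix with ones on the superdiagonal, zeros elsewhere."""
    return [[1.0 if col == row + 1 else 0.0 for col in range(d)]
            for row in range(d)]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def state_at_zero(A, c, z):
    """x_0 for x_t = A x_{t-1} + c z_t, zero state before the oldest input.
    z[k] is the input vector z_{-k}."""
    x = [0.0] * len(A)
    for k in reversed(range(len(z))):
        x = [a + b for a, b in zip(matvec(A, x), matvec(c, z[k]))]
    return x

t0, i0, n = 3, 1, 2
A = upper_shift(t0 + 1)
# Only the last row of c is non-zero: it picks component i0 of the input.
c = [[0.0] * n for _ in range(t0)] + [[1.0 if col == i0 else 0.0
                                       for col in range(n)]]

z1 = [[0.1, 0.2], [0.3, -0.4], [0.0, 0.5], [0.7, 0.6], [0.2, 0.2]]
z2 = [list(v) for v in z1]
z2[t0] = [0.7, -0.1]               # first difference occurs at lag t0

x1 = state_at_zero(A, c, z1)
x2 = state_at_zero(A, c, z2)
print(x1)   # components hold z^{i0} at lags t0, t0 - 1, ..., 0
print(x2)
```

In this example the two states differ only in their first component, so a coordinate readout on that component separates the two inputs, while inputs older than lag $t_{0}$ are shifted out by nilpotency.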

6.6 Proof of Proposition 3.7

We start by noting, as we did in the proof of Corollary 3.4, that the condition (3.13) implies that the reservoir map associated to (3.11) is a contraction and hence, by Lemma 6.1, it satisfies the echo state property and has a well-defined associated filter.

We now prove that the condition  (3.13) implies the convergence of the series in the expression  (3.14). Let K1:=maxzIp(z)2=maxzIσmax(p(z))<1K_{1}:=\max_{z\in I}\|p(z)\|_{2}=\max_{z\in I}\sigma_{{\rm max}}(p(z))<1 and K2:=maxzIq(z)2=maxzIσmax(q(z))K_{2}:=\max_{z\in I}\|q(z)\|_{2}=\max_{z\in I}\sigma_{{\rm max}}(q(z)); notice that K1K_{1} and K2K_{2} are well-defined due to the compactness of II. Let now n,mn,m\in\mathbb{N} be such that n<mn<m and let Sn:=j=0n(k=0j1p(ztk))q(ztj)NS_{n}:=\sum_{j=0}^{n}\left(\prod_{k=0}^{j-1}p(z_{t-k})\right)q(z_{t-j})\in\mathbb{R}^{N}. Then,

\begin{eqnarray*}
\left\|S_{n}-S_{m}\right\| &=& \left\|\sum_{j=n+1}^{m}\left(\prod_{k=0}^{j-1}p(z_{t-k})\right)q(z_{t-j})\right\|\leq\sum_{j=n+1}^{m}\left\|\prod_{k=0}^{j-1}p(z_{t-k})\right\|_{2}\left\|q(z_{t-j})\right\|\\
&\leq& \sum_{j=n+1}^{m}\prod_{k=0}^{j-1}\left\|p(z_{t-k})\right\|_{2}\left\|q(z_{t-j})\right\|\leq K_{2}\sum_{j=n+1}^{m}K_{1}^{j}\leq K_{2}\sum_{j=n+1}^{\infty}K_{1}^{j}=\frac{K_{2}K_{1}^{n+1}}{1-K_{1}}.
\end{eqnarray*}

The condition K1<1K_{1}<1 implies that K2K1n+11K10\frac{K_{2}K_{1}^{n+1}}{1-K_{1}}\rightarrow 0 as nn\rightarrow\infty and hence {Sn}n\left\{S_{n}\right\}_{n\in\mathbb{N}} is a Cauchy sequence in N\mathbb{R}^{N} that consequently converges. This proves the convergence of the infinite series in (3.14) and the causal character of the filter that it defines. The time-invariance can also be easily established by mimicking the verification that we carried out in the proof of Corollary 3.4. We now prove that (3.14) is indeed a solution of  (3.11):

p(zt)𝐱t1+q(zt)=p(zt)(j=0(k=0j1p(zt1k))q(zt1j))+q(zt)=q(zt)+p(zt)q(zt1)+p(zt)p(zt1)q(zt2)+p(zt)p(zt1)p(zt2)q(zt3)+=j=0(k=0j1p(ztk))q(ztj)=𝐱t.p(z_{t})\mathbf{x}_{t-1}+q({z}_{t})=p(z_{t})\left(\sum_{j=0}^{\infty}\left(\prod_{k=0}^{j-1}p(z_{t-1-k})\right)q(z_{t-1-j})\right)+q({z}_{t})=q(z_{t})+p(z_{t})q(z_{t-1})\\ +p(z_{t})p(z_{t-1})q(z_{t-2})+p(z_{t})p(z_{t-1})p(z_{t-2})q(z_{t-3})+\cdots=\sum_{j=0}^{\infty}\left(\prod_{k=0}^{j-1}p(z_{t-k})\right)q(z_{t-j})=\mathbf{x}_{t}.

We conclude by proving the inequality in (3.16). Note first that for any mm\in\mathbb{N},

j=0m(k=0j1p(ztk))q(ztj)j=0mk=0j1p(ztk)2q(ztj)j=0mk=0j1p(ztk)2q(ztj)K2(1K1m+1)1K1,\left\|\sum_{j=0}^{m}\left(\prod_{k=0}^{j-1}p(z_{t-k})\right)q(z_{t-j})\right\|\leq\sum_{j=0}^{m}\left\|\prod_{k=0}^{j-1}p(z_{t-k})\right\|_{2}\left\|q(z_{t-j})\right\|\\ \leq\sum_{j=0}^{m}\prod_{k=0}^{j-1}\left\|p(z_{t-k})\right\|_{2}\left\|q(z_{t-j})\right\|\leq\frac{K_{2}\left(1-K_{1}^{m+1}\right)}{1-K_{1}},

and hence, by the continuity of the norm and for any tt\in\mathbb{Z}:

𝐱t=limmj=0m(k=0j1p(ztk))q(ztj)limmK2(1K1m+1)1K1=K21K1.\left\|\mathbf{x}_{t}\right\|=\lim_{m\rightarrow\infty}\left\|\sum_{j=0}^{m}\left(\prod_{k=0}^{j-1}p(z_{t-k})\right)q(z_{t-j})\right\|\leq\lim_{m\rightarrow\infty}\frac{K_{2}\left(1-K_{1}^{m+1}\right)}{1-K_{1}}=\frac{K_{2}}{1-K_{1}}.\quad\blacksquare
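The bound $\|\mathbf{x}_{t}\|\leq K_{2}/(1-K_{1})$ can be observed on a small simulated system. The sketch below uses hypothetical degree-one polynomials $p(z)=A_{0}+zA_{1}$ and $q(z)=B_{0}+zB_{1}$ with inputs in $[-1,1]$. Instead of computing $K_{1}$ and $K_{2}$ exactly, it uses the upper estimates $\|A_{0}\|_{F}+\|A_{1}\|_{F}\geq K_{1}$ and $\|B_{0}\|+\|B_{1}\|\geq K_{2}$, which is an assumption of this example and yields a weaker but still valid bound.

```python
import math

# Sketch: state bound for x_t = p(z_t) x_{t-1} + q(z_t) with |z_t| <= 1.
# Coefficients are illustrative assumptions; K1, K2 are replaced by upper
# estimates based on Frobenius and Euclidean norms.

A0 = [[0.2, 0.1], [0.0, 0.3]]
A1 = [[0.1, 0.0], [0.2, -0.1]]
B0 = [1.0, 0.0]
B1 = [0.5, -1.0]

def frob(A):
    return math.sqrt(sum(a * a for row in A for a in row))

def norm(v):
    return math.sqrt(sum(a * a for a in v))

K1_upper = frob(A0) + frob(A1)     # >= max over |z| <= 1 of ||p(z)||_2
K2_upper = norm(B0) + norm(B1)     # >= max over |z| <= 1 of ||q(z)||
bound = K2_upper / (1.0 - K1_upper)

def step(x, z):
    """One iteration of the state-affine recursion."""
    P = [[a0 + z * a1 for a0, a1 in zip(r0, r1)] for r0, r1 in zip(A0, A1)]
    q = [b0 + z * b1 for b0, b1 in zip(B0, B1)]
    return [sum(P[i][k] * x[k] for k in range(2)) + q[i] for i in range(2)]

x = [0.0, 0.0]
largest = 0.0
for t in range(300):
    x = step(x, math.cos(0.7 * t))  # bounded input in [-1, 1]
    largest = max(largest, norm(x))

print("largest state norm:", largest, "bound:", bound)
```

The recursion is started at the zero state, so every iterate is a partial sum of the series in (3.14) and the estimate applies to each of them.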

6.7 Proof of Lemma 3.8

(i) \Longrightarrow (ii): A02+A12++An12<i=0n1λ=λ(n1+1)<1\|A_{0}\|_{2}+\|A_{1}\|_{2}+\cdots+\|A_{n_{1}}\|_{2}<\sum_{i=0}^{n_{1}}\lambda=\lambda(n_{1}+1)<1.

(ii) $\Longrightarrow$ (iii): $\|p(z)\|_{2}=\|A_{0}+zA_{1}+z^{2}A_{2}+\cdots+z^{n_{1}}A_{n_{1}}\|_{2}\leq\|A_{0}\|_{2}+|z|\|A_{1}\|_{2}+|z^{2}|\|A_{2}\|_{2}+\cdots+|z^{n_{1}}|\|A_{n_{1}}\|_{2}\leq\|A_{0}\|_{2}+\|A_{1}\|_{2}+\cdots+\|A_{n_{1}}\|_{2}<1$.  \blacksquare

6.8 Proof of Proposition 3.9

We start by formulating and proving an elementary result that will be needed later on.

Lemma 6.3

Let 𝐟:Un𝕄m{\bf f}:U\subset\mathbb{R}^{n}\longrightarrow\mathbb{M}_{m} be a differentiable function defined on the convex set UU. For any 𝐳U{\bf z}\in U denote by i𝐟(𝐳)𝕄m\partial_{i}{\bf f}({\bf z})\in\mathbb{M}_{m} the matrix containing the partial derivatives of the components of 𝐟{\bf f} with respect to their ith-entry, i{1,,n}i\in\left\{1,\ldots,n\right\}. Then, for any 𝐱,𝐲U\mathbf{x},{\bf y}\in U we have:

𝐟(𝐲)𝐟(𝐱)2nmmaxi{1,,n}(sup𝐳U{i𝐟(𝐳)2})𝐱𝐲.\left\|{\bf f}({\bf y})-{\bf f}(\mathbf{x})\right\|_{2}\leq\sqrt{nm}\max_{i\in\left\{1,\ldots,n\right\}}\left(\sup_{{\bf z}\in U}\{\left\|\partial_{i}{\bf f}({\bf z})\right\|_{2}\}\right)\left\|\mathbf{x}-{\bf y}\right\|. (6.11)

Proof.   Given $A=(A_{i,j})\in\mathbb{M}_{n,m}$, let $\left\|A\right\|_{F}$ be its Frobenius norm, defined by $\left\|A\right\|_{F}^{2}:=\mbox{tr}\left(A^{\top}A\right)=\sum_{i=1}^{n}\sum_{j=1}^{m}A_{i,j}^{2}$. Recall (see Theorem 5.6.34 and Exercise 5.6.P24 in [Horn 13]) that

A2AFrA2,\left\|A\right\|_{2}\leq\left\|A\right\|_{F}\leq\sqrt{r}\left\|A\right\|_{2}, (6.12)

where rr is the rank of AA. Consider now 𝐱,𝐲U\mathbf{x},{\bf y}\in U arbitrary and let D𝐟(𝐳):n𝕄mD{\bf f}({\bf z}):{\mathbb{R}}^{n}\longrightarrow\mathbb{M}_{m} be the differential of 𝐟{\bf f} evaluated at 𝐳U{\bf z}\in U. The convexity of UU implies that the Mean Value Inequality holds (see Theorem 2.4.8 in [Abra 88]) and hence:

𝐟(𝐲)𝐟(𝐱)Fsupt[0,1]{D𝐟((1t)𝐱+t𝐲)2}𝐱𝐲.\left\|{\bf f}({\bf y})-{\bf f}(\mathbf{x})\right\|_{F}\leq\sup_{t\in[0,1]}\{\left\|D{\bf f}((1-t){\bf x}+t{\bf y})\right\|_{2}\}\left\|\mathbf{x}-{\bf y}\right\|. (6.13)

The first inequality in  (6.12) and (6.13) imply that

𝐟(𝐲)𝐟(𝐱)2sup𝐳U{D𝐟(𝐳)2}𝐱𝐲.\left\|{\bf f}({\bf y})-{\bf f}(\mathbf{x})\right\|_{2}\leq\sup_{{\bf z}\in U}\{\left\|D{\bf f}({\bf z})\right\|_{2}\}\left\|\mathbf{x}-{\bf y}\right\|. (6.14)

At the same time, notice that by  (6.12)

\[
\left\|D{\bf f}({\bf z})\right\|_{2}^{2}\leq\left\|D{\bf f}({\bf z})\right\|_{F}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{m}\left(\partial_{i}f_{jk}({\bf z})\right)^{2}=\sum_{i=1}^{n}\left\|\partial_{i}{\bf f}({\bf z})\right\|^{2}_{F}\leq m\sum_{i=1}^{n}\left\|\partial_{i}{\bf f}({\bf z})\right\|^{2}_{2}\leq mn\max_{i\in\left\{1,\ldots,n\right\}}\left(\left\|\partial_{i}{\bf f}({\bf z})\right\|^{2}_{2}\right).
\]

This inequality, together with (6.14), implies the statement (6.11) since the maximum and the supremum can be trivially exchanged.  \blacktriangledown

We now carry out the proof of the proposition under the hypothesis (iii) in Lemma  3.8 which is implied by the other two. The modifications necessary to establish the result under the other two hypotheses are straightforward. Consider two arbitrary elements 𝐳,𝐬I{\bf z},{\bf s}\in I^{\mathbb{Z}_{-}}. Then, by the Cauchy-Schwarz and Minkowski inequalities:

|H𝐖p,q(𝐳)H𝐖p,q(𝐬)|=|𝐖[j=0((k=0j1p(zk))q(zj)(k=0j1p(sk))q(sj))]|𝐖j=0aj(zj+1¯)q(zj)aj(sj+1¯)q(sj),whereaj(zj+1¯):=k=0j1p(zk).\left|H_{{\bf W}}^{p,q}({\bf z})-H_{{\bf W}}^{p,q}({\bf s})\right|=\left|{\bf W}^{\top}\left[\sum_{j=0}^{\infty}\left(\left(\prod_{k=0}^{j-1}p(z_{-k})\right)q(z_{-j})-\left(\prod_{k=0}^{j-1}p(s_{-k})\right)q(s_{-j})\right)\right]\right|\\ \leq\left\|{\bf W}\right\|\sum_{j=0}^{\infty}\left\|a_{j}(\underline{z_{-j+1}})q(z_{-j})-a_{j}(\underline{s_{-j+1}})q(s_{-j})\right\|,\quad\mbox{where}\quad a_{j}(\underline{z_{-j+1}}):=\prod_{k=0}^{j-1}p(z_{-k}). (6.15)

We now bound the right hand side of  (6.15) as follows:

j=0aj(zj+1¯)q(zj)aj(sj+1¯)q(sj)=j=0aj(zj+1¯)q(zj)+aj(zj+1¯)q(sj)aj(zj+1¯)q(sj)aj(sj+1¯)q(sj)j=0aj(zj+1¯)2q(zj)q(sj)+aj(zj+1¯)aj(sj+1¯)2q(sj)\sum_{j=0}^{\infty}\left\|a_{j}(\underline{z_{-j+1}})q(z_{-j})-a_{j}(\underline{s_{-j+1}})q(s_{-j})\right\|\\ =\sum_{j=0}^{\infty}\left\|a_{j}(\underline{z_{-j+1}})q(z_{-j})+a_{j}(\underline{z_{-j+1}})q(s_{-j})-a_{j}(\underline{z_{-j+1}})q(s_{-j})-a_{j}(\underline{s_{-j+1}})q(s_{-j})\right\|\\ \leq\sum_{j=0}^{\infty}\left\|a_{j}(\underline{z_{-j+1}})\right\|_{2}\left\|q(z_{-j})-q(s_{-j})\right\|+\left\|a_{j}(\underline{z_{-j+1}})-a_{j}(\underline{s_{-j+1}})\right\|_{2}\left\|q(s_{-j})\right\| (6.16)

If LqL_{q} is a Lipschitz constant of q:INq:I\longrightarrow\mathbb{R}^{N} then

aj(zj+1¯)2q(zj)q(sj)MpjLq|zjsj|,\left\|a_{j}(\underline{z_{-j+1}})\right\|_{2}\left\|q(z_{-j})-q(s_{-j})\right\|\leq M_{p}^{j}L_{q}\left|z_{-j}-s_{-j}\right|, (6.17)

which inserted in  (6.16) and in  (6.15) implies that

|H𝐖p,q(𝐳)H𝐖p,q(𝐬)|𝐖Lq[j=0Mpj|zjsj|+j=0aj(zj+1¯)aj(sj+1¯)2]\left|H_{{\bf W}}^{p,q}({\bf z})-H_{{\bf W}}^{p,q}({\bf s})\right|\leq\left\|{\bf W}\right\|L_{q}\left[\sum_{j=0}^{\infty}M_{p}^{j}\left|z_{-j}-s_{-j}\right|+\sum_{j=0}^{\infty}\left\|a_{j}(\underline{z_{-j+1}})-a_{j}(\underline{s_{-j+1}})\right\|_{2}\right] (6.18)

We now bound above the second summand in (6.18) using the inequality (6.11) in the statement of Lemma 6.3 as well as the following identity:

aj(zj+1¯)aj(sj+1¯)=l=0j1(p(s0)p(s(l1))p(zl)p(z(l+1))p(z(j1))p(s0)p(s(l1))p(sl)p(z(l+1))p(z(j1))).a_{j}(\underline{z_{-j+1}})-a_{j}(\underline{s_{-j+1}})=\sum_{l=0}^{j-1}(p(s_{0})\cdots p(s_{-(l-1)})\cdot p(z_{-l})\cdot p(z_{-(l+1)})\cdots p(z_{-(j-1)})\\ -p(s_{0})\cdots p(s_{-(l-1)})\cdot p(s_{-l})\cdot p(z_{-(l+1)})\cdots p(z_{-(j-1)})). (6.19)

This equality simply follows from writing:

\begin{align*}
a_{j}(\underline{z_{-j+1}})-a_{j}(\underline{s_{-j+1}}) &= \prod_{l=0}^{j-1}p(z_{-l})-\prod_{l=0}^{j-1}p(s_{-l})\\
&= p(z_{0})p(z_{-1})\cdots p(z_{-(j-1)})-p(s_{0})p(s_{-1})\cdots p(s_{-(j-1)})\\
&\quad+\Bigg\{p(s_{0})p(z_{-1})\cdots p(z_{-(j-1)})-p(s_{0})p(z_{-1})\cdots p(z_{-(j-1)})\\
&\quad+p(s_{0})p(s_{-1})p(z_{-2})\cdots p(z_{-(j-1)})-p(s_{0})p(s_{-1})p(z_{-2})\cdots p(z_{-(j-1)})\\
&\quad+\cdots+p(s_{0})\cdots p(s_{-(l-1)})p(z_{-l})p(z_{-(l+1)})\cdots p(z_{-(j-1)})-p(s_{0})\cdots p(s_{-(l-1)})p(z_{-l})p(z_{-(l+1)})\cdots p(z_{-(j-1)})\\
&\quad+\cdots+p(s_{0})\cdots p(s_{-(j-2)})p(z_{-(j-1)})-p(s_{0})\cdots p(s_{-(j-2)})p(z_{-(j-1)})\Bigg\}\\
&= \sum_{l=0}^{j-1}\Big(p(s_{0})\cdots p(s_{-(l-1)})\cdot p(z_{-l})\cdot p(z_{-(l+1)})\cdots p(z_{-(j-1)})\\
&\qquad\quad-p(s_{0})\cdots p(s_{-(l-1)})\cdot p(s_{-l})\cdot p(z_{-(l+1)})\cdots p(z_{-(j-1)})\Big),
\end{align*}

where the $2(j-1)$ summands inside the braces are obtained by adding and subtracting polynomials recursively constructed out of $a_{j}(\underline{z_{-j+1}})$ by changing the variables of the first $k$ factors, $k\in\{1,\ldots,j-1\}$. We then combine all the $(2k-1)$-th with the $(2k+2)$-th summands of the resulting expression in order to obtain the first $j-1$ terms in the sum in (6.19). Then the last $j$-th term results from combining the second with the one before last summands, that is, $p(s_{0})p(s_{-1})\cdots p(s_{-(j-1)})$ and $p(s_{0})\cdots p(s_{-(j-2)})p(z_{-(j-1)})$, respectively.
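The telescoping identity (6.19) holds for arbitrary, not necessarily commuting, matrix factors and can be checked numerically. The sketch below evaluates both sides for a hypothetical matrix polynomial $p(z)=A_{0}+zA_{1}$ of size $2\times 2$ and two short input histories; the coefficients and the inputs are assumptions made for illustration.

```python
# Sketch: numerical check of the telescoping identity
#   prod_l p(z_{-l}) - prod_l p(s_{-l})
#     = sum_l p(s_0)...p(s_{-(l-1)}) (p(z_{-l}) - p(s_{-l})) p(z_{-(l+1)})...
# Coefficients and inputs are illustrative assumptions.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def chain(mats, dim):
    """Ordered product of a list of matrices (identity for an empty list)."""
    out = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for m in mats:
        out = matmul(out, m)
    return out

A0 = [[0.2, 0.1], [0.0, 0.3]]
A1 = [[0.1, 0.0], [0.2, -0.1]]

def p(z):
    return [[a0 + z * a1 for a0, a1 in zip(r0, r1)] for r0, r1 in zip(A0, A1)]

zs = [0.9, -0.3, 0.5, 0.1]     # z_0, z_{-1}, z_{-2}, z_{-3}
ss = [0.2, 0.7, -0.6, 0.4]     # s_0, s_{-1}, s_{-2}, s_{-3}
j = len(zs)
pz = [p(z) for z in zs]
ps = [p(s) for s in ss]

lhs = matsub(chain(pz, 2), chain(ps, 2))
rhs = [[0.0, 0.0], [0.0, 0.0]]
for l in range(j):
    middle = matsub(pz[l], ps[l])
    term = matmul(matmul(chain(ps[:l], 2), middle), chain(pz[l + 1:], 2))
    rhs = matadd(rhs, term)

err = max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))
print("largest entrywise discrepancy:", err)
```

Each term of the sum replaces one more factor $p(z_{-l})$ by $p(s_{-l})$, reading from the left, so consecutive terms cancel except for the two full products.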

Using the relation (6.19) we can write:

aj(zj+1¯)aj(sj+1¯)2l=0j1p(s0)p(s(l1))(p(zl)p(sl))p(z(l+1))p(z(j1))2l=0j1p(s0)2p(s(l1))2p(zl)p(sl)2p(z(l+1))2p(z(j1))2Mpj1NsupzI{p(z)2}l=1j|zj+lsj+l|,\left\|a_{j}(\underline{z_{-j+1}})-a_{j}(\underline{s_{-j+1}})\right\|_{2}\leq\sum_{l=0}^{j-1}\left\|p(s_{0})\cdots p(s_{-(l-1)})\cdot(p(z_{-l})-p(s_{-l}))\cdot p(z_{-(l+1)})\cdots p(z_{-(j-1)})\right\|_{2}\\ \leq\sum_{l=0}^{j-1}\left\|p(s_{0})\right\|_{2}\cdots\left\|p(s_{-(l-1)})\right\|_{2}\cdot\left\|p(z_{-l})-p(s_{-l})\right\|_{2}\cdot\left\|p(z_{-(l+1)})\right\|_{2}\cdots\left\|p(z_{-(j-1)})\right\|_{2}\\ \leq M_{p}^{j-1}\sqrt{N}\sup_{z\in I}\left\{\left\|p^{\prime}(z)\right\|_{2}\right\}\sum_{l=1}^{j}\left|z_{-j+l}-s_{-j+l}\right|,

where the last inequality is a consequence of (6.11). Let Mp:=NsupzI{p(z)2}M_{p^{\prime}}:=\sqrt{N}\sup_{z\in I}\left\{\left\|p^{\prime}(z)\right\|_{2}\right\}, then

aj(zj+1¯)aj(sj+1¯)2MpMpMpjl=1j|zj+lsj+l|=MpMpl=1jMplMpjl|z(jl)s(jl)|\left\|a_{j}(\underline{z_{-j+1}})-a_{j}(\underline{s_{-j+1}})\right\|_{2}\leq\frac{M_{p^{\prime}}}{M_{p}}M_{p}^{j}\sum_{l=1}^{j}\left|z_{-j+l}-s_{-j+l}\right|=\frac{M_{p^{\prime}}}{M_{p}}\sum_{l=1}^{j}M_{p}^{l}M_{p}^{j-l}\left|z_{-(j-l)}-s_{-(j-l)}\right|

Since the last term in this inequality is one summand of the Cauchy product of the series with general terms $M_{p}^{j}$ and $M_{p}^{j}\left|z_{-j}-s_{-j}\right|$, and these two series are absolutely convergent (recall the statement (2.5)), we can conclude (see for instance [Apos 74, §8.24]) that

$$
\begin{aligned}
\sum_{j=0}^{\infty}\left\|a_{j}(\underline{z_{-j+1}})-a_{j}(\underline{s_{-j+1}})\right\|_{2}&\leq\frac{M_{p^{\prime}}}{M_{p}}\sum_{j=0}^{\infty}\sum_{l=1}^{j}M_{p}^{l}M_{p}^{j-l}\left|z_{-(j-l)}-s_{-(j-l)}\right|\\
&\leq\frac{M_{p^{\prime}}}{M_{p}}\frac{1}{1-M_{p}}\sum_{j=0}^{\infty}M_{p}^{j}\left|z_{-j}-s_{-j}\right|.
\end{aligned}
$$

If we now substitute this relation in (6.18) and use Lemma 2.1 with the weighting sequences $w_{t}^{\rho}:=M_{p}^{\rho t}$, for any $\rho\in(0,1)$, we obtain:

$$
\begin{aligned}
\left|H_{{\bf W}}^{p,q}({\bf z})-H_{{\bf W}}^{p,q}({\bf s})\right|&\leq\left\|{\bf W}\right\|L_{q}\left(1+\frac{M_{p^{\prime}}}{M_{p}}\frac{1}{1-M_{p}}\right)\sum_{j=0}^{\infty}M_{p}^{j}\left|z_{-j}-s_{-j}\right|\\
&\leq\left\|{\bf W}\right\|L_{q}\left(1+\frac{M_{p^{\prime}}}{M_{p}}\frac{1}{1-M_{p}}\right)\left(\frac{1}{1-M_{p}^{1-\rho}}\right)\left\|{\bf z}-{\bf s}\right\|_{w^{\rho}},
\end{aligned}
$$

which proves the continuity of the map $H_{{\bf W}}^{p,q}:(I^{\mathbb{Z}_{-}},\|\cdot\|_{w^{\rho}})\longrightarrow\mathbb{R}$, as required. $\blacksquare$
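The telescoping estimate used in this proof can be checked numerically. The following is only a minimal sketch under stated assumptions: the polynomial $p(z)=A_{0}+zA_{1}$, the matrices `A0` and `A1`, the dimension `N` and the length `j` are hypothetical choices made for illustration. For such a degree-one polynomial the constant $\sqrt{N}\sup_{z\in I}\|p^{\prime}(z)\|_{2}$ can be replaced by the sharper Lipschitz constant $\|A_{1}\|_{2}$, which is what the code uses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, j = 4, 6  # state dimension and product length (illustrative choices)

# Hypothetical degree-one matrix polynomial p(z) = A0 + z * A1, rescaled so
# that its spectral norm stays below 1 on I = [-1, 1].
A0 = rng.standard_normal((N, N))
A1 = rng.standard_normal((N, N))
scale = 0.9 / (np.linalg.norm(A0, 2) + np.linalg.norm(A1, 2))
A0, A1 = scale * A0, scale * A1


def p(z):
    return A0 + z * A1


def a(zs):
    """Ordered product p(z_0) p(z_{-1}) ... p(z_{-(j-1)})."""
    out = np.eye(N)
    for z in zs:
        out = out @ p(z)
    return out


# For a degree-one polynomial, z -> ||p(z)||_2 is convex, so its supremum
# over [-1, 1] is attained at one of the two endpoints.
M_p = max(np.linalg.norm(p(-1.0), 2), np.linalg.norm(p(1.0), 2))
lip = np.linalg.norm(A1, 2)  # ||p(z) - p(s)||_2 = lip * |z - s| here

z = rng.uniform(-1, 1, size=j)  # z[l] plays the role of z_{-l}
s = rng.uniform(-1, 1, size=j)

lhs = np.linalg.norm(a(z) - a(s), 2)
rhs = M_p ** (j - 1) * lip * np.sum(np.abs(z - s))
print(lhs <= rhs, lhs, rhs)
```

The comparison only illustrates the inequality for one random draw; it is not a substitute for the argument above.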

6.9 Proof of Proposition 3.10

We first recall that, since by hypothesis the reservoir functionals $H_{{\bf W}_{1}}^{p_{1},q_{1}}$ and $H_{{\bf W}_{2}}^{p_{2},q_{2}}$ are well-defined, so are, by the comments that follow (3.5), $H_{{\bf W}_{1}}^{p_{1},q_{1}}+\lambda H_{{\bf W}_{2}}^{p_{2},q_{2}}$ and $H_{{\bf W}_{1}}^{p_{1},q_{1}}\cdot H_{{\bf W}_{2}}^{p_{2},q_{2}}$.

The proof of (i) is a straightforward verification. As to (ii), denote first by $y_{t}^{1},y_{t}^{2}$ and $\mathbf{x}_{t}^{1},\mathbf{x}_{t}^{2}$ the outputs and the state variables, respectively, of the SAS corresponding to the two functionals that we are considering. We note first that, by (3.12),

$$
y_{t}^{1}\cdot y_{t}^{2}={\bf W}_{1}^{\top}\mathbf{x}^{1}_{t}\cdot{\bf W}_{2}^{\top}\mathbf{x}^{2}_{t}=\left({\bf W}_{1}\otimes{\bf W}_{2}\right)^{\top}(\mathbf{x}^{1}_{t}\otimes\mathbf{x}^{2}_{t}).
$$

Using (3.11) it can be readily verified that the time evolution of the tensor product $\mathbf{x}^{1}_{t}\otimes\mathbf{x}^{2}_{t}$ is given by

$$
\begin{aligned}
\mathbf{x}^{1}_{t}\otimes\mathbf{x}^{2}_{t}&=(p_{1}(z_{t})\otimes p_{2}(z_{t}))(\mathbf{x}^{1}_{t-1}\otimes\mathbf{x}^{2}_{t-1})+p_{1}(z_{t})\mathbf{x}^{1}_{t-1}\otimes q_{2}(z_{t})+q_{1}(z_{t})\otimes p_{2}(z_{t})\mathbf{x}^{2}_{t-1}+q_{1}(z_{t})\otimes q_{2}(z_{t})\\
&=(p_{1}\otimes p_{2})(z_{t})(\mathbf{x}^{1}_{t-1}\otimes\mathbf{x}^{2}_{t-1})+p_{1}(z_{t})\mathbf{x}^{1}_{t-1}\otimes q_{2}(z_{t})+q_{1}(z_{t})\otimes p_{2}(z_{t})\mathbf{x}^{2}_{t-1}+(q_{1}\otimes q_{2})(z_{t}),
\end{aligned}
$$

which proves (3.23) and hence (3.22).
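The two identities used in this step (the product of two linear readouts written as a single linear readout of the tensor product state, and the recursion for that tensor product) can be checked numerically with Kronecker products. This is a sketch only: the helper `random_sas`, the degree-one polynomials it returns, and the dimensions `N1`, `N2` are hypothetical and merely stand in for generic SAS of the form $\mathbf{x}_{t}=p(z_{t})\mathbf{x}_{t-1}+q(z_{t})$.

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2 = 2, 3  # state dimensions of the two systems (illustrative)


def random_sas(N):
    """Hypothetical SAS with p(z) = B0 + z*B1 and q(z) = b0 + z*b1."""
    B0, B1 = 0.3 * rng.standard_normal((2, N, N))
    b0, b1 = rng.standard_normal((2, N))
    return (lambda z: B0 + z * B1), (lambda z: b0 + z * b1)


p1, q1 = random_sas(N1)
p2, q2 = random_sas(N2)
x1, x2 = rng.standard_normal(N1), rng.standard_normal(N2)
W1, W2 = rng.standard_normal(N1), rng.standard_normal(N2)

state_ok, readout_ok = [], []
for z in rng.uniform(-1, 1, size=5):
    P1, P2, c1, c2 = p1(z), p2(z), q1(z), q2(z)
    # Right-hand side of the recursion, built from the previous states.
    rhs = (np.kron(P1, P2) @ np.kron(x1, x2)
           + np.kron(P1 @ x1, c2)
           + np.kron(c1, P2 @ x2)
           + np.kron(c1, c2))
    # Advance both systems one step: x_t = p(z_t) x_{t-1} + q(z_t).
    x1, x2 = P1 @ x1 + c1, P2 @ x2 + c2
    state_ok.append(np.allclose(np.kron(x1, x2), rhs))
    # Product of the two readouts versus the readout of the tensor state.
    readout_ok.append(
        np.isclose((W1 @ x1) * (W2 @ x2), np.kron(W1, W2) @ np.kron(x1, x2))
    )

print(all(state_ok), all(readout_ok))
```

Both checks rely on the mixed-product property $(A\mathbf{x})\otimes(B\mathbf{y})=(A\otimes B)(\mathbf{x}\otimes\mathbf{y})$, which `np.kron` satisfies for matrices and one-dimensional arrays alike.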

In order to show that the reservoir functionals on the right hand side of (3.21) and (3.22) are well-defined we prove the following lemma.

Lemma 6.4

Let $p_{1}(z)\in\mathbb{M}_{N_{1},M_{1}}[z]$ and $p_{2}(z)\in\mathbb{M}_{N_{2},M_{2}}[z]$ be two polynomials with matrix coefficients and assume that $\|p_{1}(z)\|_{2}<1-\epsilon$ and $\|p_{2}(z)\|_{2}<1-\epsilon$ for all $z\in I:=[-1,1]$ and a given $0<\epsilon<1$. Then:

(i) $\|p_{1}\oplus p_{2}(z)\|_{2}<1-\epsilon$,

(ii) $\|p_{1}\otimes p_{2}(z)\|_{2}<1-\epsilon$,

for all $z\in I:=[-1,1]$.

Proof of the lemma. Let $\mathbf{x}=\mathbf{x}_{1}\oplus\mathbf{x}_{2}\in\mathbb{R}^{M_{1}}\oplus\mathbb{R}^{M_{2}}$. Then, in order to prove part (i), note that

$$
\begin{aligned}
\|(p_{1}\oplus p_{2})(z)\cdot\mathbf{x}\|^{2}&=\|(p_{1}(z)\cdot\mathbf{x}_{1},p_{2}(z)\cdot\mathbf{x}_{2})\|^{2}=\|p_{1}(z)\cdot\mathbf{x}_{1}\|^{2}+\|p_{2}(z)\cdot\mathbf{x}_{2}\|^{2}\\
&\leq\|p_{1}(z)\|^{2}_{2}\|\mathbf{x}_{1}\|^{2}+\|p_{2}(z)\|^{2}_{2}\|\mathbf{x}_{2}\|^{2}\leq(1-\epsilon)^{2}\left(\|\mathbf{x}_{1}\|^{2}+\|\mathbf{x}_{2}\|^{2}\right)=(1-\epsilon)^{2}\|\mathbf{x}\|^{2}.
\end{aligned}
$$

This inequality implies that

$$
\|p_{1}\oplus p_{2}(z)\|_{2}=\sup_{\mathbf{x}\neq{\bf 0}}\left\{\frac{\|(p_{1}\oplus p_{2})(z)\cdot\mathbf{x}\|}{\|\mathbf{x}\|}\right\}\leq\sup_{\mathbf{x}\neq{\bf 0}}\left\{\frac{(1-\epsilon)\|\mathbf{x}\|}{\|\mathbf{x}\|}\right\}=1-\epsilon,\quad\mbox{as required.}
$$

As to the statement in part (ii):

$$
\|p_{1}\otimes p_{2}(z)\|_{2}=\sigma_{{\rm max}}(p_{1}\otimes p_{2}(z))=\sigma_{{\rm max}}(p_{1}(z))\,\sigma_{{\rm max}}(p_{2}(z))=\|p_{1}(z)\|_{2}\|p_{2}(z)\|_{2}<(1-\epsilon)^{2}<(1-\epsilon).\ \blacktriangledown
$$
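Both parts of Lemma 6.4 rest on standard facts about the spectral norm: the norm of a direct sum is the maximum of the norms of its blocks, and the norm of a Kronecker product is the product of the norms. A minimal numerical illustration follows, assuming randomly generated matrices that stand in for the values $p_{1}(z)$ and $p_{2}(z)$ at one fixed $z$; the helper `contraction` and the value of `eps` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 0.1  # illustrative value of epsilon in (0, 1)


def contraction(rows, cols, bound):
    """Random matrix rescaled to have spectral norm 0.95 * bound < bound."""
    A = rng.standard_normal((rows, cols))
    return 0.95 * bound * A / np.linalg.norm(A, 2)


# Stand-ins for p1(z) and p2(z) at a fixed z, with norms below 1 - eps.
P1 = contraction(3, 2, 1 - eps)
P2 = contraction(2, 4, 1 - eps)

# Direct sum as a block-diagonal matrix, tensor product as a Kronecker product.
direct_sum = np.block([
    [P1, np.zeros((3, 4))],
    [np.zeros((2, 2)), P2],
])
tensor = np.kron(P1, P2)

n1, n2 = np.linalg.norm(P1, 2), np.linalg.norm(P2, 2)
norm_sum = np.linalg.norm(direct_sum, 2)
norm_tensor = np.linalg.norm(tensor, 2)

print(norm_sum, max(n1, n2))   # the two values agree up to rounding
print(norm_tensor, n1 * n2)    # the two values agree up to rounding
```

Here `np.linalg.norm(A, 2)` returns the largest singular value of `A`, which is the norm $\|\cdot\|_{2}$ used in the lemma.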

Now, the first part of this lemma and Proposition 3.7 guarantee that $H_{{\bf W}_{1}\oplus\lambda{\bf W}_{2}}^{p_{1}\oplus p_{2},q_{1}\oplus q_{2}}$ is well-defined. The same conclusion holds for $H_{{\bf 0}\oplus{\bf 0}\oplus\left({\bf W}_{1}\otimes{\bf W}_{2}\right)}^{p,q_{1}\oplus q_{2}\oplus\left(q_{1}\otimes q_{2}\right)}$ because, due to the block diagonal character of (3.23), $\sigma_{{\rm max}}(p(z))=\sigma_{{\rm max}}\left(p_{1}(z)\oplus p_{2}(z)\oplus\left(p_{1}\otimes p_{2}\right)(z)\right)=\|p_{1}(z)\oplus p_{2}(z)\oplus\left(p_{1}\otimes p_{2}\right)(z)\|_{2}$. By parts (i) and (ii) in Lemma 6.4 we can conclude that $\|p(z)\|_{2}<1-\epsilon$ for all $z\in[-1,1]$ and, again by Proposition 3.7, the reservoir functional $H_{{\bf 0}\oplus{\bf 0}\oplus\left({\bf W}_{1}\otimes{\bf W}_{2}\right)}^{p,q_{1}\oplus q_{2}\oplus\left(q_{1}\otimes q_{2}\right)}$ is well-defined. $\blacksquare$

6.10 Proof of Theorem 3.12

Note first that the hypothesis $M_{p}<1-\epsilon<1$ on the polynomials $p$ associated to the elements in $\mathcal{S}_{\epsilon}$ implies, by Propositions 3.7 and 3.9, that this family is made of time-invariant reservoir filters that have the FMP with respect to weighting sequences of the form $w^{p}_{t}:=M_{p}^{\rho t}$, $\rho\in(0,1)$. Additionally, using Lemma 2.7 and the hypothesis $M_{p}<1-\epsilon$ for a fixed $\epsilon\in(0,1)$, we can conclude that all the reservoir filters in $\mathcal{S}_{\epsilon}$ have the FMP with the common weighting sequence $w_{t}^{\rho}:=(1-\epsilon)^{\rho t}$, $\rho\in(0,1)$.

The elements in $\mathcal{S}_{\epsilon}$ form a polynomial algebra as a consequence of Lemma 6.4 and Proposition 3.10. Moreover, the family $\mathcal{S}_{\epsilon}$ has the point separation property and contains all the constant functionals. Indeed, $\mathcal{S}_{\epsilon}$ includes the linear family $\mathcal{L}_{\epsilon}$, and we recall that in Appendix 6.5 we proved that, given ${\bf z}_{1},{\bf z}_{2}\in K_{M}\subset\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}}$ such that ${\bf z}_{1}\neq{\bf z}_{2}$, there exist $A\in\mathbb{M}(n,n)$, with $\sigma_{{\rm max}}(A)<1-\epsilon$, and ${\bf c}:=\mathbb{I}_{n}$ such that $U^{A,{\bf c}}({\bf z}_{1})_{0}\neq U^{A,{\bf c}}({\bf z}_{2})_{0}$. The point separation property follows from choosing any vector ${\bf W}\in\mathbb{R}^{N}$ such that ${\bf W}^{\top}(U^{A,{\bf c}}({\bf z}_{1}))_{0}\neq{\bf W}^{\top}(U^{A,{\bf c}}({\bf z}_{2}))_{0}$, which implies that $U^{A,{\bf c}}_{{\bf W}}({\bf z}_{1})_{0}\neq U^{A,{\bf c}}_{{\bf W}}({\bf z}_{2})_{0}$ and hence $H_{U^{A,{\bf c}}_{{\bf W}}}({\bf z}_{1})\neq H_{U^{A,{\bf c}}_{{\bf W}}}({\bf z}_{2})$, as required.

All the constant functionals can be obtained by taking for $p$ the zero polynomial and for $q$ the constant polynomials ($q$ has degree zero). In that case, the state variables form a constant sequence $\mathbf{x}_{t}=q$ and the associated functional is the constant map $H^{0,q}_{{\bf W}}({\bf z})={\bf W}^{\top}q$, for all ${\bf z}\in K_{M}$.

The universality result hence follows from the Stone-Weierstrass Theorem and the compactness of $(I^{\mathbb{Z}_{-}},\|\cdot\|_{w^{\rho}})$ established in Lemma 2.2.

Finally, we prove the statement regarding the family $\mathcal{NS}_{\epsilon}$ determined by nilpotent polynomials $p$. First, by expressions (3.21), (3.22), and (3.23), it is easy to show that this family is a polynomial algebra. The only point that requires some detail is the fact that the $k$-th power of the polynomial $p$ in (3.23) that is obtained in the product of the two SAS reservoir functionals $H_{{\bf W}_{1}}^{p_{1},q_{1}}$ and $H_{{\bf W}_{2}}^{p_{2},q_{2}}$ is given by

$$
p^{k}(z):=\left(\begin{array}{ccc}p_{1}^{k}(z)&{\bf 0}&{\bf 0}\\ {\bf 0}&p_{2}^{k}(z)&{\bf 0}\\ p_{1}^{k}\otimes q_{2}^{k-1}(z)&q_{1}^{k-1}\otimes p_{2}^{k}(z)&p_{1}^{k}\otimes p_{2}^{k}(z)\end{array}\right),
$$

which shows that if $p_{1}$ and $p_{2}$ are nilpotent then so is the associated polynomial $p$. The point separation property is, again, inherited from the proof of the linear case provided in Appendix 6.5. $\blacksquare$
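The nilpotency claim can be illustrated numerically for one fixed value of $z$. The sketch below is built on assumptions: the block lower triangular matrix `P` is assembled from the recursion for $\mathbf{x}^{1}_{t}\otimes\mathbf{x}^{2}_{t}$ derived in Section 6.9, acting on the stacked state $(\mathbf{x}^{1},\mathbf{x}^{2},\mathbf{x}^{1}\otimes\mathbf{x}^{2})$, and the strictly upper triangular matrices `P1`, `P2` and the vectors `c1`, `c2` are hypothetical stand-ins for $p_{1}(z)$, $p_{2}(z)$, $q_{1}(z)$, $q_{2}(z)$.

```python
import numpy as np

rng = np.random.default_rng(3)
N1, N2 = 2, 3

# Strictly upper triangular matrices are nilpotent.
P1 = np.triu(rng.standard_normal((N1, N1)), k=1)
P2 = np.triu(rng.standard_normal((N2, N2)), k=1)
c1 = rng.standard_normal((N1, 1))  # stand-in for q1(z), as a column
c2 = rng.standard_normal((N2, 1))  # stand-in for q2(z), as a column

# Block lower triangular matrix acting on (x1, x2, x1 (x) x2). The last block
# row encodes (P1 x1) (x) c2, c1 (x) (P2 x2) and (P1 (x) P2)(x1 (x) x2).
P = np.block([
    [P1, np.zeros((N1, N2)), np.zeros((N1, N1 * N2))],
    [np.zeros((N2, N1)), P2, np.zeros((N2, N1 * N2))],
    [np.kron(P1, c2), np.kron(c1, P2), np.kron(P1, P2)],
])

dim = P.shape[0]
# A nilpotent matrix of size dim satisfies P^dim = 0.
P_power = np.linalg.matrix_power(P, dim)
is_nilpotent = np.allclose(P_power, 0.0)
print(dim, is_nilpotent)
```

Since `P` is block triangular, its eigenvalues are those of its diagonal blocks, all of which vanish here, so the power of order `dim` is expected to be the zero matrix.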

6.11 Proof of Lemma 4.1

(i) Let $A:=\left\{\rho\in\overline{\mathbb{R}_{+}}\mid\left\|{\bf X}\right\|_{B}\leq\rho\quad\mbox{almost surely}\right\}$. It suffices to show that $\left\|{\bf X}\right\|_{L^{\infty}}:=\inf A\in A$, which implies that $\left\|{\bf X}\right\|_{B}\leq\left\|{\bf X}\right\|_{L^{\infty}}$ almost surely. Indeed, consider the sequence $\left\|{\bf X}\right\|_{L^{\infty}}+1/j$, $j\in\mathbb{N}$. By the approximation property of the infimum, there exists a decreasing sequence of numbers $\{\rho_{j}\}_{j\in\mathbb{N}}\subset A$ satisfying $\left\|{\bf X}\right\|_{L^{\infty}}\leq\rho_{j}<\left\|{\bf X}\right\|_{L^{\infty}}+1/j$ for all $j\in\mathbb{N}$. Define $F:=\left\{\omega\in\Omega\mid\left\|{\bf X}(\omega)\right\|_{B}>\left\|{\bf X}\right\|_{L^{\infty}}\right\}$ and $F_{j}:=\left\{\omega\in\Omega\mid\left\|{\bf X}(\omega)\right\|_{B}>\rho_{j}\right\}$. It is easy to see that $F_{j}\subset F_{j+1}$, $j\in\mathbb{N}$, and that $\lim_{j\rightarrow\infty}F_{j}=F$ and, consequently (see [Grim 01, Lemma 5, page 7]), $\lim_{j\rightarrow\infty}\mathbb{P}(F_{j})=\mathbb{P}(F)$. Since by construction $\mathbb{P}(F_{j})=0$ for all $j\in\mathbb{N}$, necessarily $\mathbb{P}(F)=0$, which shows that $\left\|{\bf X}\right\|_{L^{\infty}}\in A$, as required.

(ii) If $\left\|{\bf X}\right\|_{L^{\infty}}\leq C$ then, by part (i), $\left\|{\bf X}\right\|_{B}\leq\left\|{\bf X}\right\|_{L^{\infty}}\leq C$ almost surely. Conversely, if $\left\|{\bf X}\right\|_{B}\leq C$ almost surely, then $C\in A=\left\{\rho\in\overline{\mathbb{R}_{+}}\mid\left\|{\bf X}\right\|_{B}\leq\rho\quad\mbox{almost surely}\right\}$. Consequently, $\left\|{\bf X}\right\|_{L^{\infty}}=\inf A\leq C$, as required.

(iii) Suppose first that $\left\|{\bf X}\right\|_{B}\leq C$ almost surely and define $F:=\left\{\omega\in\Omega\mid\left\|{\bf X}(\omega)\right\|_{B}>C\right\}$. By hypothesis, we have that $\mathbb{P}(F)=0$ and $\mathbb{P}(\Omega\setminus F)=1$. Then,

$$
\begin{aligned}
{\rm E}\left[\left\|{\bf X}\right\|_{B}^{k}\right]&=\int_{\Omega}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}=\int_{\Omega\setminus F}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}+\int_{F}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}\\
&=\int_{\Omega\setminus F}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}\leq\int_{\Omega\setminus F}C^{k}d\mathbb{P}=C^{k}\mathbb{P}(\Omega\setminus F)=C^{k},
\end{aligned}
$$

as required. Conversely, assume that ${\rm E}\left[\left\|{\bf X}\right\|_{B}^{k}\right]\leq C^{k}$, for any $k\in\mathbb{N}$, and define

$$
F_{n}:=\left\{\omega\in\Omega\mid\left\|{\bf X}(\omega)\right\|_{B}>C+\frac{1}{n}\right\},
$$

for all $n\geq 1$. It is easy to see that $F_{n}\subset F_{n+1}$ and that $\lim_{n\rightarrow\infty}F_{n}=F$ and, consequently (see [Grim 01, Lemma 5, page 7]), $\lim_{n\rightarrow\infty}\mathbb{P}(F_{n})=\mathbb{P}(F)$. Now,

$$
\begin{aligned}
C^{k}&\geq{\rm E}\left[\left\|{\bf X}\right\|_{B}^{k}\right]=\int_{\Omega}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}=\int_{\Omega\setminus F_{n}}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}+\int_{F_{n}}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}\\
&\geq\int_{F_{n}}\left\|{\bf X}\right\|_{B}^{k}d\mathbb{P}\geq\int_{F_{n}}\left(C+\frac{1}{n}\right)^{k}d\mathbb{P}=\left(C+\frac{1}{n}\right)^{k}\mathbb{P}(F_{n}),
\end{aligned}
$$

which implies that $\mathbb{P}(F_{n})\leq C^{k}/\left(C+\frac{1}{n}\right)^{k}$ for any $k\in\mathbb{N}$ and hence, by taking the limit $k\rightarrow\infty$, we can conclude that $\mathbb{P}(F_{n})=0$. Consequently, $\mathbb{P}(F)=\lim_{n\rightarrow\infty}\mathbb{P}(F_{n})=0$, which shows that $\left\|{\bf X}\right\|_{B}\leq C$ almost surely.

(iv) Let $\|\cdot\|$ denote the Euclidean norm on $\mathbb{R}^{n}$. Since $\left|X_{i}\right|\leq\left\|{\bf X}\right\|$ always and, by part (i), $\left\|{\bf X}\right\|\leq\left\|{\bf X}\right\|_{L^{\infty}}$ almost surely, we can conclude that $\left|X_{i}\right|\leq\left\|{\bf X}\right\|_{L^{\infty}}$ almost surely. This implies that $X_{i}\in L^{\infty}(\Omega,\mathbb{R})$ and hence the statement follows from part (iii). $\blacksquare$
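The limit $k\rightarrow\infty$ taken in the converse direction of part (iii) works because the ratio $C/(C+1/n)$ is strictly smaller than one, so the moment bound on $\mathbb{P}(F_{n})$ decays geometrically in $k$. A small sketch with made-up values of `C` and `n` (both are assumptions chosen only for illustration):

```python
C, n = 1.0, 10  # illustrative constants, not taken from the paper


def tail_bound(k):
    """Upper bound C^k / (C + 1/n)^k on P(F_n) obtained from the k-th moment."""
    return (C / (C + 1.0 / n)) ** k


bounds = [tail_bound(k) for k in (1, 10, 100, 1000)]
print(bounds)
```

Each bound holds simultaneously for the same event $F_{n}$, which is why the probability has to vanish.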

6.12 Proof of Lemma 4.2

We start by proving by contradiction that

$$
\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}\geq\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}. \tag{6.20}
$$

Indeed, suppose that

$$
\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}<\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}. \tag{6.21}
$$

By the approximation property of the supremum [Apos 74, Theorem 1.14], there exists $t_{0}\in\mathbb{Z}$ such that

$$
\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}<\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t_{0}}(\omega)\|\right\}\leq\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}. \tag{6.22}
$$

However, $\|{\bf z}_{t_{0}}(\omega)\|\leq\sup_{t\in\mathbb{Z}}\{\|{\bf z}_{t}(\omega)\|\}$ for all $\omega\in\Omega$ and hence, by part (i) in Lemma 4.1,

$$
\|{\bf z}_{t_{0}}(\omega)\|\leq\sup_{t\in\mathbb{Z}}\{\|{\bf z}_{t}(\omega)\|\}\leq\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\},\quad\mbox{almost surely.}
$$

Now, by part (ii) in Lemma 4.1, this implies that

$$
\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t_{0}}(\omega)\|\right\}\leq\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}.
$$

However, this expression contradicts the first inequality in (6.22) and hence the assumption (6.21) cannot be correct. This argument implies that the inequality (6.20) holds.

We now prove the reverse inequality, that is,

$$
\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}\leq\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}. \tag{6.23}
$$

By part (ii) of Lemma 4.1, this inequality holds if and only if

$$
\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\leq\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\},\quad\mbox{almost surely.} \tag{6.24}
$$

Now, by part (i) in Lemma 4.1, we have that $\|{\bf z}_{t}(\omega)\|\leq\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}$ almost surely, for each fixed $t\in\mathbb{Z}$. Let $A_{t}\subset\Omega$ be the zero-measure set such that $\|{\bf z}_{t}(\omega)\|>\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\}$ for all $\omega\in A_{t}$. Let $A:=\bigcup_{t\in\mathbb{Z}}A_{t}$. Notice that $\mathbb{P}(A)=\mathbb{P}\left(\bigcup_{t\in\mathbb{Z}}A_{t}\right)\leq\sum_{t\in\mathbb{Z}}\mathbb{P}(A_{t})=0$ and hence $B:=A^{\mathsf{c}}$ has measure one and

$$
\|{\bf z}_{t}(\omega)\|\leq\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\|{\bf z}_{t}(\omega)\|\right\},\quad\mbox{for all $\omega\in B$ and all $t\in\mathbb{Z}$.}
$$

Since $B$ has measure one, this inequality is equivalent to (6.24), which guarantees that (6.23) holds. The inequalities (6.20) and (6.23) that we just proved imply that the equality (4.7) holds true. $\blacksquare$
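The role played by null sets in this interchange can be seen on a toy example. The sketch below assumes a finite sample space with three atoms, one of which has probability zero, and a process observed at three time indices; all numerical values are made up. With finitely many atoms and times the equality is elementary, so the code only illustrates the statement, while the substance of the proof lies in the countable union of null sets.

```python
# Finite sample space with one atom of probability zero.
probs = [0.5, 0.5, 0.0]
# z[omega][t]: norms ||z_t(omega)|| at three time indices (made-up values).
z = [
    [0.2, 0.9, 0.4],
    [0.7, 0.1, 0.3],
    [5.0, 8.0, 9.0],  # large values that only occur on the null atom
]


def ess_sup(values):
    """Essential supremum on a finite space: ignore zero-probability atoms."""
    return max(v for v, pr in zip(values, probs) if pr > 0)


# ess sup over omega of the sup over t.
lhs = ess_sup([max(path) for path in z])
# sup over t of the ess sup over omega.
rhs = max(ess_sup([z[w][t] for w in range(3)]) for t in range(3))
print(lhs, rhs)
```

The plain supremum over all atoms would be 9.0 on both sides; discarding the null atom is what brings both quantities down to the same essential bound.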

6.13 Proof of Lemma 4.3

It is obvious that $S_{\ell^{\infty}({\mathbb{R}}^{n})}\subset S_{({\mathbb{R}}^{n})^{\mathbb{Z}}}$ and hence the inclusion map

$$
\iota:S_{\ell^{\infty}({\mathbb{R}}^{n})}\hookrightarrow S_{(\mathbb{R}^{n})^{\mathbb{Z}}}, \tag{6.25}
$$

is well-defined. The equivariance with respect to the equivalence relations $\sim_{\ell^{\infty}({\mathbb{R}}^{n})}$ and $\sim_{(\mathbb{R}^{n})^{\mathbb{Z}}}$ follows trivially from noticing that if ${\bf z}_{1},{\bf z}_{2}\in S_{\ell^{\infty}({\mathbb{R}}^{n})}$ are such that ${\bf z}_{1}\sim_{\ell^{\infty}({\mathbb{R}}^{n})}{\bf z}_{2}$, then one obviously has that $\iota({\bf z}_{1})\sim_{(\mathbb{R}^{n})^{\mathbb{Z}}}\iota({\bf z}_{2})$. This shows the existence of the projected map $\phi$ that makes the diagram formed by $\iota$, $\phi$, and the two projections commutative, that is, $\phi\circ\Pi_{\sim_{\ell^{\infty}({\mathbb{R}}^{n})}}=\Pi_{\sim_{({\mathbb{R}}^{n})^{\mathbb{Z}}}}\circ\iota$, where $\Pi_{\sim_{\ell^{\infty}({\mathbb{R}}^{n})}}$ and $\Pi_{\sim_{({\mathbb{R}}^{n})^{\mathbb{Z}}}}$ map the elements in $S_{\ell^{\infty}({\mathbb{R}}^{n})}$ and $S_{(\mathbb{R}^{n})^{\mathbb{Z}}}$ onto their corresponding equivalence classes with respect to the associated equivalence relations. It is a straightforward exercise to verify that $\phi$ is injective and preserves the norm $\left\|\cdot\right\|_{L^{\infty}}$. In order to show that $\phi$ is surjective, let ${\bf z}\in L^{\infty}\left(\Omega,({\mathbb{R}}^{n})^{\mathbb{Z}}\right)$. Given that $\left\|{\bf z}\right\|_{L^{\infty}}<\infty$ or, equivalently, $\mathop{{\rm ess\,sup}}_{\omega\in\Omega}\left\{\mathop{{\rm sup}}_{t\in\mathbb{Z}}\left\{\|{\bf z}_{t}(\omega)\|\right\}\right\}<\infty$, part (i) in Lemma 4.1 implies that

$$
\sup_{t\in\mathbb{Z}}\{\|{\bf z}_{t}(\omega)\|\}<\infty,\quad\mbox{almost surely.} \tag{6.26}
$$

Since the elements in the spaces $L^{\infty}\left(\Omega,\ell^{\infty}({\mathbb{R}}^{n})\right)$ and $L^{\infty}\left(\Omega,({\mathbb{R}}^{n})^{\mathbb{Z}}\right)$ are equivalence classes containing almost surely equal random variables, we can take another representative ${\bf z}^{\ast}:\Omega\longrightarrow({\mathbb{R}}^{n})^{\mathbb{Z}}$ for the class containing ${\bf z}\in L^{\infty}\left(\Omega,({\mathbb{R}}^{n})^{\mathbb{Z}}\right)$, defined as

$$
{\bf z}^{\ast}(\omega):=\left\{\begin{array}{cc}{\bf z}(\omega),&\quad\mbox{when}\quad\sup_{t\in\mathbb{Z}}\{\|{\bf z}_{t}(\omega)\|\}<\infty,\\ 0,&\quad\mbox{otherwise}.\end{array}\right.
$$

Since, by (6.26), the processes ${\bf z}$ and ${\bf z}^{\ast}$ differ only in a set of zero measure, they are equal in $L^{\infty}\left(\Omega,(\mathbb{R}^{n})^{\mathbb{Z}}\right)$ but, this time, ${\bf z}^{\ast}\in L^{\infty}\left(\Omega,\ell^{\infty}({\mathbb{R}}^{n})\right)$ and $\phi({\bf z}^{\ast})={\bf z}$, as required. $\blacksquare$

6.14 Proof of Theorem 4.4

Proof of part (i). Throughout this proof we denote the elements in $K_{M}$ with lowercase bold letters (${\bf z}\in K_{M}$) and those in $K_{M}^{L^{\infty}}$ with uppercase bold letters (${\bf Z}\in K_{M}^{L^{\infty}}$).

We first assume that the functional $H:(K_{M},\left\|\cdot\right\|_{w})\longrightarrow\mathbb{R}$ has the fading memory property. This means that $H$ is a continuous map and, since by Lemma 2.2 the space $(K_{M},\left\|\cdot\right\|_{w})$ is compact, so is the image $H(K_{M})$ as a subset of the real line. This implies that there exists a finite real number $L>0$ such that $H(K_{M})\subset[-L,L]$. Let now ${\bf Z}\in K^{L^{\infty}}_{M}$; the condition $\|{\bf Z}\|_{L^{\infty}}\leq M$ is equivalent to $\left\|{\bf Z}_{t}\right\|\leq M$, for all $t\in\mathbb{Z}_{-}$, almost surely, and hence implies that $H\left({\bf Z}\right)\in[-L,L]$ almost surely or, equivalently, that $\left\|H\left({\bf Z}\right)\right\|_{L^{\infty}}\leq L$. This, in turn, implies that $H({\bf Z})\in L^{\infty}(\Omega,\mathbb{R})$ for any ${\bf Z}\in K^{L^{\infty}}_{M}$, as required.

We now show that $H:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R})$ has the FMP. The FMP hypothesis on $H:(K_{M},\left\|\cdot\right\|_{w})\longrightarrow\mathbb{R}$ implies that for any ${\bf z}\in K_{M}$ and any $\epsilon>0$ there exists a $\delta(\epsilon)>0$ such that for any ${\bf s}\in K_{M}$ that satisfies

$$
\|{\bf z}-{\bf s}\|_{w}=\sup_{t\in\mathbb{Z}_{-}}\{\|({\bf z}_{t}-{\bf s}_{t})w_{-t}\|\}<\delta(\epsilon),\quad\mbox{then}\quad|H({\bf z})-H({\bf s})|<\epsilon. \tag{6.27}
$$

Moreover, since by Lemma 2.2 the space $(K_{M},\left\|\cdot\right\|_{w})$ is compact, the Uniform Continuity Theorem [Munk 14, Theorem 7.3] guarantees that the relation $\delta(\epsilon)$ does not depend on the point ${\bf z}\in K_{M}$.

We now prove the statement by showing that for any $\epsilon>0$ and ${\bf Z}\in K_{M}^{L^{\infty}}$, we have $\|H({\bf Z})-H({\bf S})\|_{L^{\infty}}<\epsilon$ for all ${\bf S}\in K_{M}^{L^{\infty}}$ such that $\left\|{\bf Z}-{\bf S}\right\|_{L^{\infty}_{w}}<\delta(\epsilon)$. Indeed, the inequality $\left\|{\bf Z}-{\bf S}\right\|_{L^{\infty}_{w}}<\delta(\epsilon)$ holds if and only if $\sup_{t\in\mathbb{Z}_{-}}\{\left\|{\bf Z}_{t}-{\bf S}_{t}\right\|_{L^{\infty}}w_{-t}\}<\delta(\epsilon)$. Given that for any $l\in\mathbb{Z}_{-}$ we have that $\left\|{\bf Z}_{l}-{\bf S}_{l}\right\|_{L^{\infty}}w_{-l}\leq\sup_{t\in\mathbb{Z}_{-}}\{\left\|{\bf Z}_{t}-{\bf S}_{t}\right\|_{L^{\infty}}w_{-t}\}<\delta(\epsilon)$, part (ii) in Lemma 4.1 implies that $\left\|{\bf Z}_{l}-{\bf S}_{l}\right\|w_{-l}<\delta(\epsilon)$ almost surely for any $l\in\mathbb{Z}_{-}$ and hence $\sup_{t\in\mathbb{Z}_{-}}\{\left\|{\bf Z}_{t}-{\bf S}_{t}\right\|w_{-t}\}=\left\|{\bf Z}-{\bf S}\right\|_{w}<\delta(\epsilon)$, almost surely. This implies, using (6.27), that $|H({\bf Z})-H({\bf S})|<\epsilon$ almost surely, which by part (ii) in Lemma 4.1 implies that $\|H({\bf Z})-H({\bf S})\|_{L^{\infty}}<\epsilon$, as required.

Conversely, if $H:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R})$ has the fading memory property then so does $H:(K_{M},\left\|\cdot\right\|_{w})\longrightarrow\mathbb{R}$, because $K_{M}\subset K^{L^{\infty}}_{M}$ and $\left\|{\bf z}\right\|=\left\|{\bf z}\right\|_{L^{\infty}}$ for the elements ${\bf z}\in K_{M}$.

Proof of part (ii). We suppose first that $\mathcal{T}$ is dense in the set $(C^{0}(K_{M}),\|\cdot\|_{w})$ and show that the corresponding family with inputs in $K^{L^{\infty}}_{M}$ is universal. Let $H:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R})$ be an arbitrary causal and time-invariant FMP filter and, for a given $\epsilon>0$, let $H_{S}\in\mathcal{T}$ be such that $\sup_{{\bf z}\in K_{M}}\{\|H({\bf z})-H_{S}({\bf z})\|_{L^{\infty}}\}<\epsilon$. The existence of $H_{S}$ is ensured by the density hypothesis on $\mathcal{T}$. We show that this ensures that $\sup_{{\bf Z}\in K_{M}^{L^{\infty}}}\{\|H({\bf Z})-H_{S}({\bf Z})\|_{L^{\infty}}\}<\epsilon$. Indeed, this conclusion is true if $\|H({\bf Z})-H_{S}({\bf Z})\|_{L^{\infty}}<\epsilon$ for any ${\bf Z}\in K_{M}^{L^{\infty}}$ which, by part (ii) in Lemma 4.1, is equivalent to $|H({\bf Z})-H_{S}({\bf Z})|<\epsilon$ almost surely, for any ${\bf Z}\in K_{M}^{L^{\infty}}$. This condition is in turn true because, as ${\bf Z}\in K_{M}^{L^{\infty}}$, then $\left\|{\bf Z}_{t}\right\|\leq M$ almost surely for all $t\in\mathbb{Z}_{-}$ and hence ${\bf Z}\in K_{M}$ almost surely. Since $H_{S}$ approximates $H$ for deterministic inputs, we have that $|H({\bf Z})-H_{S}({\bf Z})|<\epsilon$ almost surely, as required.

Conversely, if the family $\mathcal{T}$ with inputs in $K^{L^{\infty}}_{M}$ is universal in the set of continuous maps of the type $H:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R})$, we can easily show that $\mathcal{T}$ is dense in $(C^{0}(K_{M}),\|\cdot\|_{w})$. Let $H\in(C^{0}(K_{M}),\|\cdot\|_{w})$ and let $H_{S}:(K^{L^{\infty}}_{M},\|\cdot\|_{L^{\infty}_{w}})\longrightarrow L^{\infty}(\Omega,\mathbb{R})$ be the element that, for a given $\epsilon>0$, satisfies $\|H-H_{S}\|_{L^{\infty}}=\sup_{{\bf Z}\in K_{M}^{L^{\infty}}}\{\|H({\bf Z})-H_{S}({\bf Z})\|_{L^{\infty}}\}<\epsilon$. Given that, as we pointed out, $K_{M}\subset K^{L^{\infty}}_{M}$ and $\left\|{\bf z}\right\|=\left\|{\bf z}\right\|_{L^{\infty}}$ for the elements ${\bf z}\in K_{M}$, we have

$$
\left\|H-H_{S}\right\|=\sup_{{\bf z}\in K_{M}}\{\|H({\bf z})-H_{S}({\bf z})\|\}=\sup_{{\bf z}\in K_{M}}\{\|H({\bf z})-H_{S}({\bf z})\|_{L^{\infty}}\}\leq\sup_{{\bf Z}\in K_{M}^{L^{\infty}}}\{\|H({\bf Z})-H_{S}({\bf Z})\|_{L^{\infty}}\}<\epsilon.\quad\blacksquare
$$
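The mechanism behind part (ii), namely that a uniform error bound over deterministic inputs in $K_{M}$ carries over to every realization of an almost surely bounded random input, can be illustrated with a simulation. Everything in the sketch is an assumption made for illustration: the target functional `H`, its truncated approximant `H_S`, the finite memory `depth`, and the uniform input distribution are hypothetical and are not taken from the paper.

```python
import random

random.seed(0)
M, depth = 1.0, 40  # input bound and (finite) memory depth, both illustrative


def H(z):
    """Target functional: geometrically fading sum; z[j] stands for z_{-j}."""
    return sum(0.5 ** j * z[j] for j in range(depth))


def H_S(z):
    """Approximant: the same sum truncated after 8 lags."""
    return sum(0.5 ** j * z[j] for j in range(8))


# Uniform error bound over all deterministic inputs with |z_{-j}| <= M.
eps = M * sum(0.5 ** j for j in range(8, depth))

# Random inputs that are bounded by M with probability one.
errors = []
for _ in range(1000):
    Z = [random.uniform(-M, M) for _ in range(depth)]
    errors.append(abs(H(Z) - H_S(Z)))

print(max(errors), eps)
```

Every sampled path lies in the deterministic input set, so each pathwise error stays below the deterministic bound `eps`, which is the content of the almost sure statement used in the proof.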

6.15 Proof of Lemma 4.5

As we pointed out in Section 2, if the reservoir system determined by $F:D_{N}\times\overline{B_{n}({\bf 0},M)}\longrightarrow D_{N}$ and $h:D_{N}\rightarrow\mathbb{R}$ has the echo state property, a result in [Grig 18] guarantees that the associated filter is automatically causal and time-invariant. This implies the existence of a functional $H_{h}^{F}:\left(\mathbb{R}^{n}\right)^{\mathbb{Z}_{-}}\longrightarrow\mathbb{R}$ that, by hypothesis, has the fading memory property. The rest of the statement is a consequence of part (i) in Theorem 4.4. $\blacksquare$

6.16 Proof of Theorem 4.6

We first notice that the polynomial algebra $\mathcal{A}(\mathcal{R})$ is, by Theorem 3.1 and the first part of Theorem 4.4, made of fading memory reservoir filters that map into $L^{\infty}(\Omega,\mathbb{R})$. Using the other hypotheses in the statement we can easily conclude that the family $\mathcal{A}(\mathcal{R})$ satisfies the thesis of Theorem 3.1 and is hence universal in the deterministic setup. The result follows from the second part of Theorem 4.4. $\blacksquare$

Acknowledgments: We thank Philipp Harms and Herbert Jaeger for carefully looking at early versions of this work and for making suggestions that have significantly improved some of our results. We thank Josef Teichmann for fruitful discussions. We also thank the editor and two remarkable anonymous referees whose input has significantly improved the presentation and the contents of the paper. The authors acknowledge partial financial support of the French ANR “BIPHOPROC” project (ANR-14-OHRI-0002-02) as well as the hospitality of the Centre Interfacultaire Bernoulli of the Ecole Polytechnique Fédérale de Lausanne during the program “Stochastic Dynamical Models in Mathematical Finance, Econometrics, and Actuarial Sciences” that made possible the collaboration that led to some of the results included in this paper. LG acknowledges partial financial support of the Graduate School of Decision Sciences and the Young Scholar Fund AFF of the Universität Konstanz. JPO acknowledges partial financial support coming from the Research Commission of the Universität Sankt Gallen and the Swiss National Science Foundation (grant number 200021_175801/1).

References

  • [Abra 88] R. Abraham, J. E. Marsden, and T. S. Ratiu. Manifolds, Tensor Analysis, and Applications. Vol. 75, Applied Mathematical Sciences. Springer-Verlag, 1988.
  • [Apos 74] T. Apostol. Mathematical Analysis. Addison Wesley, second Ed., 1974.
  • [Appe 11] L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer. “Information processing using a single dynamical node as complex system”. Nature Communications, Vol. 2, p. 468, jan 2011.
  • [Arno 57] V. I. Arnold. “On functions of three variables”. Proceedings of the USSR Academy of Sciences, Vol. 114, pp. 679–681, 1957.
  • [Atiy 00] A. F. Atiya and A. G. Parlos. “New results on recurrent network training: unifying the algorithms and accelerating convergence”. IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council, Vol. 11, No. 3, pp. 697–709, jan 2000.
  • [Bai  12] Bai Zhang, D. J. Miller, and Yue Wang. “Nonlinear system modeling with random matrices: echo state networks revisited”. IEEE Transactions on Neural Networks and Learning Systems, Vol. 23, No. 1, pp. 175–182, jan 2012.
  • [Barr 93] A. Barron. “Universal approximation bounds for superpositions of a sigmoidal function”. IEEE Transactions on Information Theory, Vol. 39, No. 3, pp. 930–945, may 1993.
  • [Boll 86] T. Bollerslev. “Generalized autoregressive conditional heteroskedasticity”. Journal of Econometrics, Vol. 31, No. 3, pp. 307–327, 1986.
  • [Box 76] G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, 1976.
  • [Boyd 85] S. Boyd and L. Chua. “Fading memory and the problem of approximating nonlinear operators with Volterra series”. IEEE Transactions on Circuits and Systems, Vol. 32, No. 11, pp. 1150–1161, nov 1985.
  • [Bril 58] M. B. Brilliant. “Theory of the analysis of nonlinear systems”. Tech. Rep., Massachusetts Institute of Technology, Research Laboratory of Electronics, 1958.
  • [Broc 06] P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods. Springer-Verlag, 2006.
  • [Brow 09] J. W. Brown and R. V. Churchill. Complex Variables and Applications. McGraw-Hill, eighth Ed., 2009.
  • [Brun 13] D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer. “Parallel photonic information processing at gigabyte per second data rates using transient states”. Nature Communications, Vol. 4, No. 1364, 2013.
  • [Bueh 06] M. Buehner and P. Young. “A tighter bound for the echo state property”. IEEE Transactions on Neural Networks, Vol. 17, No. 3, pp. 820–824, 2006.
  • [Come 06] F. Comets and T. Meyre. Calcul Stochastique et Modèles de Diffusions. Dunod, Paris, 2006.
  • [Coui 16] R. Couillet, G. Wainrib, H. Sevi, and H. T. Ali. “The asymptotic performance of linear echo state neural networks”. Journal of Machine Learning Research, Vol. 17, No. 178, pp. 1–35, 2016.
  • [Croo 07] N. Crook. “Nonlinear transient computation”. Neurocomputing, Vol. 70, pp. 1167–1176, 2007.
  • [Crut 10] J. P. Crutchfield, W. L. Ditto, and S. Sinha. “Introduction to focus issue: intrinsic and designed computation: information processing in dynamical systems-beyond the digital hegemony”. Chaos (Woodbury, N.Y.), Vol. 20, No. 3, p. 037101, sep 2010.
  • [Cybe 89] G. Cybenko. “Approximation by superpositions of a sigmoidal function”. Mathematics of Control, Signals, and Systems, Vol. 2, No. 4, pp. 303–314, dec 1989.
  • [Damb 12] J. Dambre, D. Verstraeten, B. Schrauwen, and S. Massar. “Information processing capacity of dynamical systems”. Scientific Reports, Vol. 2, No. 514, 2012.
  • [Dieu 69] J. Dieudonné. Foundations of Modern Analysis. Academic Press, 1969.
  • [Doya 92] K. Doya. “Bifurcations in the learning of recurrent neural networks”. In: Proceedings of IEEE International Symposium on Circuits and Systems, pp. 2777–2780, IEEE, 1992.
  • [Engl 82] R. F. Engle. “Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation”. Econometrica, Vol. 50, No. 4, pp. 987–1007, 1982.
  • [Flie 76] M. Fliess. “Un outil algébrique : les séries formelles non commutatives”. In: G. Marchesini and S. K. Mitter, Eds., Mathematical Systems Theory, pp. 122–148, Springer Verlag, 1976.
  • [Flie 80] M. Fliess and D. Normand-Cyrot. “Vers une approche algébrique des systèmes non linéaires en temps discret”. In: A. Bensoussan and J. Lions, Eds., Analysis and Optimization of Systems. Lecture Notes in Control and Information Sciences, vol. 28, Springer Berlin Heidelberg, 1980.
  • [Fran 10] C. Francq and J.-M. Zakoian. GARCH Models: Structure, Statistical Inference and Financial Applications. Wiley, 2010.
  • [Frec 10] M. Fréchet. “Sur les fonctionnelles continues”. Annales scientifiques de l’École Normale Supérieure. 3ème série., Vol. 27, pp. 193–216, 1910.
  • [Galt 14] M. N. Galtier, C. Marini, G. Wainrib, and H. Jaeger. “Relative entropy minimizing noisy non-linear neural network to approximate stochastic processes”. Neural Networks, Vol. 56, pp. 10–21, 2014.
  • [Gang 08] S. Ganguli, D. Huh, and H. Sompolinsky. “Memory traces in dynamical systems”. Proceedings of the National Academy of Sciences of the United States of America, Vol. 105, No. 48, pp. 18970–18975, dec 2008.
  • [Geor 59] D. A. George. “Continuous nonlinear systems”. Tech. Rep., Massachusetts Institute of Technology, Research Laboratory of Electronics, 1959.
  • [Gono 18] L. Gonon and J.-P. Ortega. “Reservoir computing universality with stochastic inputs”. Preprint, 2018.
  • [Grav 13] A. Graves, A.-R. Mohamed, and G. Hinton. “Speech recognition with deep recurrent neural networks”. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649, IEEE, may 2013.
  • [Grig 15] L. Grigoryeva, J. Henriques, L. Larger, and J.-P. Ortega. “Optimal nonlinear information processing capacity in delay-based reservoir computers”. Scientific Reports, Vol. 5, No. 12858, pp. 1–11, 2015.
  • [Grig 16a] L. Grigoryeva, J. Henriques, L. Larger, and J.-P. Ortega. “Nonlinear memory capacity of parallel time-delay reservoir computers in the processing of multidimensional signals”. Neural Computation, Vol. 28, pp. 1411–1451, 2016.
  • [Grig 16b] L. Grigoryeva, J. Henriques, and J.-P. Ortega. “Reservoir computing: information processing of stationary signals”. In: Proceedings of the 19th IEEE International Conference on Computational Science and Engineering, pp. 496–503, 2016.
  • [Grig 18] L. Grigoryeva and J.-P. Ortega. “Echo state networks are universal”. (under revision in Neural Networks), 2018.
  • [Grim 01] G. Grimmett and D. Stirzaker. Probability and Random Processes. Oxford University Press, 2001.
  • [Herm 10] M. Hermans and B. Schrauwen. “Memory in linear recurrent neural networks in continuous time”. Neural Networks, Vol. 23, No. 3, pp. 341–355, apr 2010.
  • [Horn 13] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, second Ed., 2013.
  • [Horn 89] K. Hornik, M. Stinchcombe, and H. White. “Multilayer feedforward networks are universal approximators”. Neural Networks, Vol. 2, No. 5, pp. 359–366, 1989.
  • [Hung 74] T. W. Hungerford. Algebra. Springer New York, 1974.
  • [Jaeg 02] H. Jaeger. “Short term memory in echo state networks”. Fraunhofer Institute for Autonomous Intelligent Systems. Technical Report., Vol. 152, 2002.
  • [Jaeg 04] H. Jaeger and H. Haas. “Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication”. Science, Vol. 304, No. 5667, pp. 78–80, 2004.
  • [Jaeg 07] H. Jaeger, M. Lukoševičius, D. Popovici, and U. Siewert. “Optimization and applications of echo state networks with leaky-integrator neurons”. Neural Networks, Vol. 20, No. 3, pp. 335–352, 2007.
  • [Jaeg 10] H. Jaeger. “The ‘echo state’ approach to analysing and training recurrent neural networks with an erratum note”. Tech. Rep., German National Research Center for Information Technology, 2010.
  • [Jone 92] L. K. Jones. “A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training”. The Annals of Statistics, Vol. 20, No. 1, pp. 608–613, 1992.
  • [Kolm 56] A. N. Kolmogorov. “On the representation of continuous functions of several variables as superpositions of functions of smaller number of variables”. Soviet Math. Dokl, Vol. 108, pp. 179–182, 1956.
  • [Kurk 05] V. Kurkova and M. Sanguineti. “Learning with generalization capability by kernel methods of bounded complexity”. Journal of Complexity, Vol. 21, No. 3, pp. 350–367, 2005.
  • [Larg 12] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer. “Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing”. Optics Express, Vol. 20, No. 3, p. 3241, jan 2012.
  • [Ledo 91] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer-Verlag, 1991.
  • [Luko 09] M. Lukoševičius and H. Jaeger. “Reservoir computing approaches to recurrent neural network training”. Computer Science Review, Vol. 3, No. 3, pp. 127–149, 2009.
  • [Maas 00] W. Maass and E. D. Sontag. “Neural Systems as Nonlinear Filters”. Neural Computation, Vol. 12, No. 8, pp. 1743–1772, aug 2000.
  • [Maas 02] W. Maass, T. Natschläger, and H. Markram. “Real-time computing without stable states: a new framework for neural computation based on perturbations”. Neural Computation, Vol. 14, pp. 2531–2560, 2002.
  • [Maas 04] W. Maass, T. Natschläger, and H. Markram. “Fading memory and kernel properties of generic cortical microcircuit models”. Journal of Physiology Paris, Vol. 98, No. 4–6, pp. 315–330, 2004.
  • [Maas 07] W. Maass, P. Joshi, and E. D. Sontag. “Computational aspects of feedback in neural circuits”. PLoS Computational Biology, Vol. 3, No. 1, p. e165, 2007.
  • [Maas 11] W. Maass. “Liquid state machines: motivation, theory, and applications”. In: S. B. Cooper and A. Sorbi, Eds., Computability In Context: Computation and Logic in the Real World, Chap. 8, pp. 275–296, 2011.
  • [Manj 13] G. Manjunath and H. Jaeger. “Echo state property linked to an input: exploring a fundamental characteristic of recurrent neural networks”. Neural Computation, Vol. 25, No. 3, pp. 671–696, 2013.
  • [Matt 92] M. B. Matthews. On the Uniform Approximation of Nonlinear Discrete-Time Fading-Memory Systems Using Neural Network Models. PhD thesis, ETH Zürich, 1992.
  • [Matt 93] M. B. Matthews. “Approximating nonlinear fading-memory operators using neural network models”. Circuits, Systems, and Signal Processing, Vol. 12, No. 2, pp. 279–307, jun 1993.
  • [Munk 14] J. Munkres. Topology. Pearson, second Ed., 2014.
  • [Paqu 12] Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar. “Optoelectronic reservoir computing”. Scientific Reports, Vol. 2, p. 287, jan 2012.
  • [Pasc 13] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio. “How to construct deep recurrent neural networks”. arXiv, dec 2013.
  • [Path 17] J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott. “Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data”. Chaos, Vol. 27, No. 12, 2017.
  • [Path 18] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott. “Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach”. Physical Review Letters, Vol. 120, No. 2, p. 24102, 2018.
  • [Perr 96] P. C. Perryman. Approximation Theory for Deterministic and Stochastic Nonlinear Systems. PhD thesis, University of California, Irvine, 1996.
  • [Perr 97] P. Perryman and A. Stubberud. “Uniform, in-probability approximation of stochastic systems”. In: Conference Record of The Thirtieth Asilomar Conference on Signals, Systems and Computers, pp. 146–150, IEEE Comput. Soc. Press, 1997.
  • [Pisi 16] G. Pisier. Martingales in Banach Spaces. Cambridge University Press, 2016.
  • [Pisi 81] G. Pisier. “Remarques sur un résultat non publié de B. Maurey”. Séminaire d’analyse fonctionnelle École Polytechnique, pp. 1–12, 1981.
  • [Roda 11] A. Rodan and P. Tino. “Minimum complexity echo state network”. IEEE Transactions on Neural Networks, Vol. 22, No. 1, pp. 131–144, jan 2011.
  • [Rusc 98] L. Rüschendorf and W. Thomsen. “Closedness of sum spaces and the generalized Schrödinger problem”. Theory of Probability & Its Applications, Vol. 42, No. 3, pp. 483–494, jan 1998.
  • [Sont 79a] E. Sontag. “Realization theory of discrete-time nonlinear systems: Part I-The bounded case”. IEEE Transactions on Circuits and Systems, Vol. 26, No. 5, pp. 342–356, may 1979.
  • [Sont 79b] E. D. Sontag. “Polynomial Response Maps”. In: Lecture Notes in Control and Information Sciences. Vol. 13, Springer Verlag, 1979.
  • [Spre 65] D. A. Sprecher. “A representation theorem for continuous functions of several variables”. Proceedings of the American Mathematical Society, Vol. 16, No. 2, p. 200, apr 1965.
  • [Spre 96] D. A. Sprecher. “A numerical implementation of Kolmogorov’s superpositions”. Neural Networks, Vol. 9, No. 5, pp. 765–772, 1996.
  • [Spre 97] D. A. Sprecher. “A numerical implementation of Kolmogorov’s superpositions II”. Neural Networks, Vol. 10, No. 3, pp. 447–457, 1997.
  • [Stub 97a] A. Stubberud and P. Perryman. “Current state of system approximation for deterministic and stochastic systems”. In: Conference Record of The Thirtieth Asilomar Conference on Signals, Systems and Computers, pp. 141–145, IEEE Comput. Soc. Press, 1997.
  • [Stub 97b] A. Stubberud and P. Perryman. “State of system approximation for stochastic systems”. In: Proceedings of 13th International Conference on Digital Signal Processing, pp. 711–714, IEEE, 1997.
  • [Suss 76] H. J. Sussmann. “Semigroup representations, bilinear approximations of input-output maps, and generalized inputs”. In: G. Marchesini and S. K. Mitter, Eds., Mathematical Systems Theory, pp. 172–191, Springer Verlag, 1976.
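  • [Take 81] F. Takens. “Detecting strange attractors in turbulence”. In: D. Rand and L.-S. Young, Eds., Dynamical Systems and Turbulence, Warwick 1980. Lecture Notes in Mathematics, vol. 898, pp. 366–381, Springer Berlin Heidelberg, 1981.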
  • [Vand 11] K. Vandoorne, J. Dambre, D. Verstraeten, B. Schrauwen, and P. Bienstman. “Parallel reservoir computing using optical amplifiers”. IEEE Transactions on Neural Networks, Vol. 22, No. 9, pp. 1469–1481, sep 2011.
  • [Vand 14] K. Vandoorne, P. Mechet, T. Van Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman. “Experimental demonstration of reservoir computing on a silicon photonics chip”. Nature Communications, Vol. 5, p. 3541, mar 2014.
  • [Vers 07] D. Verstraeten, B. Schrauwen, M. D’Haene, and D. Stroobandt. “An experimental unification of reservoir computing methods”. Neural Networks, Vol. 20, pp. 391–403, 2007.
  • [Vinc 15] Q. Vinckier, F. Duport, A. Smerieri, K. Vandoorne, P. Bienstman, M. Haelterman, and S. Massar. “High-performance photonic reservoir computer based on a coherently driven passive cavity”. Optica, Vol. 2, No. 5, pp. 438–446, 2015.
  • [Wain 16] G. Wainrib and M. N. Galtier. “A local echo state property through the largest Lyapunov exponent”. Neural Networks, Vol. 76, pp. 39–45, apr 2016.
  • [Whit 04] O. White, D. Lee, and H. Sompolinsky. “Short-Term Memory in Orthogonal Neural Networks”. Physical Review Letters, Vol. 92, No. 14, p. 148102, apr 2004.
  • [Wien 58] N. Wiener. Nonlinear Problems in Random Theory. The Technology Press of MIT, 1958.
  • [Yild 12] I. B. Yildiz, H. Jaeger, and S. J. Kiebel. “Re-visiting the echo state property”. Neural Networks, Vol. 35, pp. 1–9, nov 2012.
  • [Zang 04] G. Zang and P. A. Iglesias. “Fading memory and stability”. Journal of the Franklin Institute, Vol. 340, No. 6-7, pp. 489–502, 2004.
  • [Zare 14] W. Zaremba, I. Sutskever, and O. Vinyals. “Recurrent neural network regularization”. arXiv, sep 2014.