
Joint estimation and model order selection for one dimensional ARMA models via convex optimization: a nuclear norm penalization approach

Stéphane Chrétien, Tianwen Wei and Basad Ali Hussain Al-sarray
Abstract.

The problem of estimating ARMA models is computationally challenging due to the nonconcavity of the log-likelihood function. Recent approaches have been based on convex minimization. Joint model selection using penalization by a convex norm, e.g. the nuclear norm of a certain matrix related to the state space formulation, has been studied extensively from a computational viewpoint. The goal of the present short note is to present a theoretical study of a nuclear norm penalization based variant of the method of [2, 3] under the assumption of a Gaussian noise process.

Laboratoire de Mathématiques, UMR 6623, Université de Franche-Comté, 16 route de Gray, 25030 Besancon, France. Email: stephane.chretien@univ-fcomte.fr

Keywords: ARMA models, Time series, Low rank model, Prediction, Nuclear norm penalization.


1. Introduction

The Auto-Regressive Moving Average (ARMA) model is central to the field of time series analysis and has been studied since the early thirties in the field of econometrics [12]. ARMA time series are sequences $(x_t)_{t\in\mathbb{N}}$ satisfying the recursion

(1.1) $x_t = \sum_{i=1}^{p} a_i x_{t-i} + \sum_{j=1}^{q} b_j e_{t-j} + e_t$

for all $t \geqslant \max\{p,q\}$, and we focus on the case where $(e_t)_{t\in\mathbb{N}}$ is a sequence of zero-mean, independent, identically distributed Gaussian random variables with variance denoted by $\sigma_{\varepsilon}^2$ for simplicity. (Extensions to more sophisticated models for the noise $(\varepsilon_t)$, designed to accommodate more applications, have also been studied extensively in recent years, but the Gaussian case is already a challenge from the algorithmic perspective, as will be discussed below.) As is well known [12], time series models are adequate for a wide range of phenomena in economics, engineering, social science, epidemiology, ecology, signal processing, etc. They can also serve as building blocks in more complicated models such as GARCH models, which are particularly useful in financial time series analysis.
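For concreteness, a minimal simulation sketch of recursion (1.1) is given below (Python with NumPy; the function name simulate_arma, the default values and the chosen coefficients are ours, for illustration only, with zero initial values).

```python
import numpy as np

def simulate_arma(a, b, T, sigma=1.0, seed=0):
    """Simulate x_t = sum_i a_i x_{t-i} + sum_j b_j e_{t-j} + e_t  (recursion (1.1))
    with i.i.d. Gaussian noise e_t ~ N(0, sigma^2) and zero initial values.
    Illustrative sketch only; function name and defaults are ours."""
    rng = np.random.default_rng(seed)
    p, q = len(a), len(b)
    e = rng.normal(0.0, sigma, T)
    x = np.zeros(T)
    for t in range(T):
        ar = sum(a[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(b[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = ar + ma + e[t]
    return x, e

# Example: a sample path of length 500 from an ARMA(2, 1) model.
x, e = simulate_arma(a=[0.5, -0.25], b=[0.4], T=500)
```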

Two problems are to be addressed when studying ARMA time series:

  1. estimate $p$ and $q$, the intrinsic orders of the model;

  2. estimate $a=(a_1,a_2,\ldots,a_p)$ and $b=(b_1,b_2,\ldots,b_q)$.

In the case where $q=0$, the convention is to write (1.1) as:

(1.2) $x_t = \sum_{i=1}^{p} a_i x_{t-i} + e_t$

and $x_t$ is then simply called an AR process. Estimation of $a$ is often performed using the conditional likelihood approach, given $x_0,\ldots,x_{p-1}$, yielding the standard Yule-Walker equations (a minimal numerical sketch is given after this paragraph). Model order selection, on the other hand, is often performed using a penalized log-likelihood criterion such as AIC or BIC, although the plain likelihood may also be used. We refer the reader to the standard text of Brockwell and Davis for more details on these standard problems. Turning back to the full ARMA model, it is well known that the log-likelihood is not a concave function and that multiple stationary points exist, which can lead to severe bias when local optimization routines such as gradient or Newton-type methods are used for the joint estimation of $a$ and $b$. In Shumway and Stoffer [12], an iterative procedure resembling the EM algorithm is proposed, which seems more appropriate for the ARMA model than standard optimization algorithms; however, no convergence guarantee towards a global maximizer is provided. Concerning the model selection problem, penalties play a prominent role in modern statistical theory and practice, in particular since the recent successes of the LASSO in regression and its many generalizations. Nuclear norm penalization has played an important role in many problems in engineering, machine learning and statistics, such as matrix completion. Application of nuclear norm penalization to state space model estimation and model order selection, using a moment-like estimator in a convex optimization framework, is proposed in [6]. The approach of [6] is a remarkable contribution, since convex model selection and state space estimation were combined for the first time for time series. However, the approach of [6] is not yet supported by theoretical guarantees. Another approach to State Space model estimation was proposed in [2, 3], where good practical performance is reported and an asymptotic analysis is provided. This method, as well as the unpenalized version of the method in [6], can be recast into the family of subspace methods; see [15]. In such subspace-type methods, model order selection and model estimation are decoupled, and it is natural to wonder whether the approach of [2] can be refined to incorporate joint model selection using a nuclear norm penalty as in [6].
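As an illustration of the AR case ($q=0$) mentioned above, the following minimal sketch forms and solves the Yule-Walker system from sample autocovariances (Python with NumPy; the function name yule_walker and the biased autocovariance estimator are our illustrative choices).

```python
import numpy as np

def yule_walker(x, p):
    """Solve the Yule-Walker equations for an AR(p) series:
    gamma(k) = sum_i a_i gamma(k - i), k = 1, ..., p.
    Minimal illustration using biased sample autocovariances."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    gamma = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, gamma[1:p + 1])

# Example (for a pure AR(2) series x_ar):  a_hat = yule_walker(x_ar, p=2)
```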

Motivated by the practical efficiency of subspace-type methods [15], our goal in the present note is to propose a theoretical study of a nuclear norm penalized version of the subspace method from [2] which incorporates the main ideas of [6].

2. The subspace method

2.1. A recap of the subspace approach

A real valued random discrete dynamical system $(x_t)_{t\in\mathbb{N}}$ admits a State Space representation if there exists a discrete time process $(s_t)_{t\in\mathbb{N}}$ such that

$s_{t+1} = A s_t + K e_t$
$x_t = B s_t + e_t$

where $(e_t)_{t\in\mathbb{N}}$ is the noise, and $A\in\mathbb{R}^{p\times p}$, $B\in\mathbb{R}^{1\times p}$, $K\in\mathbb{R}^{p\times 1}$ are parameter matrices. It is well known that ARMA processes admit a State Space representation and vice versa [12].
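A minimal simulation sketch of this innovations-form State Space model is given below (Python with NumPy; the matrices and the function name are arbitrary illustrative choices of ours, not estimated from data).

```python
import numpy as np

def simulate_state_space(A, B, K, T, sigma=1.0, seed=0):
    """Simulate s_{t+1} = A s_t + K e_t,  x_t = B s_t + e_t,
    with e_t i.i.d. N(0, sigma^2) and s_0 = 0.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    s = np.zeros(A.shape[0])
    x = np.zeros(T)
    for t in range(T):
        e_t = rng.normal(0.0, sigma)
        x[t] = float(B @ s) + e_t
        s = A @ s + K * e_t
    return x

# Example with a 2-dimensional state.
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([1.0, 0.0])       # 1 x p row, stored as a vector
K = np.array([0.7, 0.1])       # p x 1 column, stored as a vector
x_ss = simulate_state_space(A, B, K, T=500)
```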

2.2. Prediction

The problem of predicting $x_{t+j}$ for $j\geqslant 0$ based on the knowledge of $x_{t'}$, $t'<t$, and $s_t$ can be solved easily following the approach by Bauer [2, 3]. For given initial values $x_0$, $e_0$, the State Space representation gives

$x_{t+h} = e_{t+h} + \sum_{j=1}^{h} B A^{j-1} K e_{t+h-j} + B A^{h} s_t$

On the other hand, the State Space representation implies that

$s_t = A s_{t-1} + K e_{t-1} = A s_{t-1} + K\left(x_{t-1} - B s_{t-1}\right) = \left(A - KB\right) s_{t-1} + K x_{t-1} = \cdots$

Thus, we obtain

$s_t = \left(A-KB\right)^t s_0 + \sum_{j=0}^{t-1} \left(A-KB\right)^j K x_{t-1-j}.$

In what follows, we will assume that we observe $x_0,\ldots,x_T$ and that $t>0$ is such that $T-2t+1>0$.
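The last display can be checked numerically: the sketch below (Python with NumPy; the function name is ours, for illustration) reconstructs $s_t$ from $s_0$ and the past observations $x_0,\ldots,x_{t-1}$, and can be compared with the forward recursion $s_t = (A-KB)s_{t-1} + K x_{t-1}$.

```python
import numpy as np

def state_from_past(A, B, K, x_past, s0):
    """Evaluate s_t = (A - K B)^t s_0 + sum_{j=0}^{t-1} (A - K B)^j K x_{t-1-j}.
    Here x_past = [x_0, ..., x_{t-1}]; B and K are stored as 1-d arrays."""
    Abar = A - np.outer(K, B)                      # A - K B
    t = len(x_past)
    s = np.linalg.matrix_power(Abar, t) @ s0
    for j in range(t):
        s = s + np.linalg.matrix_power(Abar, j) @ (K * x_past[t - 1 - j])
    return s
```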

2.3. Prediction with Hankel matrices

We will rewrite the prediction problem in terms of some Hankel matrices. For this purpose, define

$\bar{A} = A - KB, \quad \bar{\mathcal{A}}_0 = \left[\bar{A}^{t} s_0,\ \bar{A}^{t+1} s_0,\ \ldots,\ \bar{A}^{T-t+1} s_0\right], \quad \mathcal{K} = \left[\bar{A}^{t-1}K, \cdots, \bar{A}^{2}K, \bar{A}K, K\right],$

$\mathcal{O} = \begin{bmatrix} B \\ BA \\ \vdots \\ BA^{t-1} \end{bmatrix} \quad\text{ and }\quad \mathcal{N} = \begin{bmatrix} 1 & 0 & \cdots & \cdots & \cdots & 0 \\ BK & 1 & 0 & \cdots & \cdots & 0 \\ \vdots & & \ddots & & & \vdots \\ BA^{t-2}K & BA^{t-3}K & \cdots & \cdots & BK & 1 \end{bmatrix}.$

Then, we have

(2.10) $\begin{bmatrix} x_t \\ \vdots \\ x_{2t-1} \end{bmatrix} = \mathcal{O} s_t + \mathcal{N} \begin{bmatrix} e_t \\ \vdots \\ e_{2t-1} \end{bmatrix}$

and

(2.14) $s_t = \mathcal{K} \begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix} + \left(A-KB\right)^t s_0.$

Combining (2.10) and (2.14), we thus obtain

$\begin{bmatrix} x_t \\ \vdots \\ x_{2t-1} \end{bmatrix} = \mathcal{O}\mathcal{K} \begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix} + \mathcal{O}\left(A-KB\right)^t s_0 + \mathcal{N} \begin{bmatrix} e_t \\ \vdots \\ e_{2t-1} \end{bmatrix}.$

Now, define

$X_{past} = \begin{bmatrix} x_0 & x_1 & \cdots & x_{T-2t+1} \\ x_1 & x_2 & \cdots & x_{T-2t+2} \\ \vdots & \vdots & & \vdots \\ x_{t-1} & x_t & \cdots & x_{T-t} \end{bmatrix} \quad\text{ and }\quad X_{future} = \begin{bmatrix} x_t & x_{t+1} & \cdots & x_{T-t+1} \\ x_{t+1} & x_{t+2} & \cdots & x_{T-t+2} \\ \vdots & \vdots & & \vdots \\ x_{2t-1} & x_{2t} & \cdots & x_{T} \end{bmatrix}.$

Both matrices are Hankel matrices. The first one collects the past values and the second one the future values. Define also the noise matrix

$E = \begin{bmatrix} e_t & e_{t+1} & \cdots & e_{T-t+1} \\ e_{t+1} & e_{t+2} & \cdots & e_{T-t+2} \\ \vdots & \vdots & & \vdots \\ e_{2t-1} & e_{2t} & \cdots & e_{T} \end{bmatrix}.$

All these Hankel matrices are related by the following equation

$X_{future} = \mathcal{O}\mathcal{K}\, X_{past} + \mathcal{O}\bar{\mathcal{A}}_0 + \mathcal{N}E.$
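For concreteness, the two Hankel matrices can be assembled as follows (Python with NumPy; the function name hankel_blocks is ours and the code simply mirrors the definitions above).

```python
import numpy as np

def hankel_blocks(x, t):
    """Build X_past and X_future from observations x_0, ..., x_T.
    Row s of X_past   is  x_s, ..., x_{s+T-2t+1}        (s = 0, ..., t-1);
    row s of X_future is  x_{t+s}, ..., x_{t+s+T-2t+1}  (s = 0, ..., t-1)."""
    T = len(x) - 1
    n_cols = T - 2 * t + 2                 # columns indexed 0, ..., T-2t+1
    X_past = np.array([x[s:s + n_cols] for s in range(t)])
    X_future = np.array([x[t + s:t + s + n_cols] for s in range(t)])
    return X_past, X_future

# Example (reusing the simulated series x from the sketch in the Introduction):
# with T + 1 = 500 observations and t = 10, both matrices are 10 x 481.
X_past, X_future = hankel_blocks(x, t=10)
```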

3. The estimation problem

In the last section, we showed that the matrices $A$, $B$ and $K$ of the State Space model enter nicely into an equation allowing prediction of future values of the dynamical system from past values. Our goal is now to use this equation to estimate the matrices $A$, $B$ and $K$. One interesting feature of our procedure is that the dimension $p$ of the State Space model can be estimated jointly with the matrices themselves.

3.1. Estimating $\mathcal{O}\mathcal{K}$

The matrix $\mathcal{O}\mathcal{K}$ can be estimated using a least squares approach corresponding to solving

(3.18) $\min_{L\in\mathbb{R}^{t\times t}} \frac{1}{2}\|X_{future} - L\, X_{past}\|_F^2.$

This procedure will make sense if the term $\mathcal{O}\bar{\mathcal{A}}_0$ is small. This can indeed be justified if $t$ is large and if $\|\bar{A}\|$ is small. Let us call $\hat{L}$ a solution of (3.18).
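A least squares solution of (3.18) can be computed directly with a standard solver; the following minimal sketch continues the Hankel-matrix example above (NumPy's lstsq returns the minimum-norm solution of the transposed system).

```python
import numpy as np

# Solve  min_L (1/2) || X_future - L X_past ||_F^2  by transposing:
#   X_future^T  ~  X_past^T L^T,   so  L^T = lstsq(X_past^T, X_future^T).
# X_past and X_future are the t x (T-2t+2) matrices built above.
L_hat = np.linalg.lstsq(X_past.T, X_future.T, rcond=None)[0].T
```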

3.2. Nuclear norm penalized least squares for low rank estimation

An interesting property of the matrix $\mathcal{O}\mathcal{K}$ is that its rank equals the state dimension $p$ when $A$ has full rank. Moreover, $\mathcal{O}\mathcal{K}$ has small rank compared to $t$ when $t$ is large compared to $p$. Therefore, one is tempted to penalize the least squares problem (3.18) with a low-rank promoting penalty.

One option is to try to solve

(3.19) $\min_{L\in\mathbb{R}^{t\times t}} \frac{1}{2}\|X_{future} - L\, X_{past}\|_F^2 + \lambda\,{\rm rank}\left(L\right).$

The main drawback of this approach is that the rank function is neither continuous nor convex, which renders the optimization problem intractable in practice. Fortunately, the rank function admits a well known convex surrogate, the nuclear norm, i.e. the sum of the singular values, denoted by $\|\cdot\|_*$.
Thus, a nice convex relaxation of (3.19) is given by

(3.20) $\min_{L\in\mathbb{R}^{t\times t}} \frac{1}{2}\|X_{future} - L\, X_{past}\|_F^2 + \lambda\,\|L\|_*.$

It has been observed in practice that nuclear norm penalized least squares provide low rank solutions for many interesting estimation problems [11].
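As an illustration, problem (3.20) can be solved by a standard proximal gradient scheme whose proximal step is singular value thresholding; the sketch below (Python with NumPy) is a minimal implementation with a fixed step size and no stopping test, and the function names are ours, not taken from [6] or [11].

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def nuclear_penalized_ls(X_future, X_past, lam, n_iter=500):
    """Proximal gradient for  min_L (1/2)||X_future - L X_past||_F^2 + lam ||L||_*.
    Step size 1 / ||X_past X_past^T||_2; minimal sketch, no convergence check."""
    t = X_past.shape[0]
    L = np.zeros((t, t))
    step = 1.0 / np.linalg.norm(X_past @ X_past.T, 2)
    for _ in range(n_iter):
        grad = (L @ X_past - X_future) @ X_past.T
        L = svt(L - step * grad, step * lam)
    return L

# Example:  L_hat = nuclear_penalized_ls(X_future, X_past, lam=10.0)
```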

4. Main results

The penalized least-squares problem (3.20) can be transformed into the following constrained problem

(4.21) $\min_{L\in\mathbb{R}^{t\times t}} \|L\|_* \quad\text{ subject to }\quad \|X_{future} - L\, X_{past}\|_F \leq \eta,$

for some appropriate choice of $\eta$.
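In practice, once an estimate $\hat{L}$ of $\mathcal{O}\mathcal{K}$ has been computed from (3.20) or (4.21), its numerical rank provides the joint model order estimate and a truncated SVD gives one possible factorization into estimates of $\mathcal{O}$ and $\mathcal{K}$. The sketch below illustrates this; the threshold and the balanced splitting of the singular values are our illustrative choices, not prescribed by the method.

```python
import numpy as np

# Order selection and factor extraction from an estimate L_hat of OK
# (continuing from the penalized estimate computed above).
U, s, Vt = np.linalg.svd(L_hat)
p_hat = int(np.sum(s > 1e-2 * s[0]))                 # crude numerical-rank estimate of p
O_hat = U[:, :p_hat] * np.sqrt(s[:p_hat])            # t x p_hat
K_hat = np.sqrt(s[:p_hat])[:, None] * Vt[:p_hat, :]  # p_hat x t
```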

Let $\Sigma$ denote the covariance matrix of $[x_0,\ldots,x_{t-1}]^t$ and let $\Sigma^{\pm\frac12}$ denote the square root of $\Sigma^{\pm1}$. Let $H$ be the random matrix whose components are given by

$H_{s,r} = \sum_{s'=0}^{T-2t+1} \varepsilon_{s,s'} z_{s'+r},$

where $\varepsilon_{s,s'}$, $s=0,\ldots,t-1$, $s'=0,\ldots,T-2t+1$, are independent Rademacher random variables which are independent of $z_{s'}$, $s'=0,\ldots,T-t$ (the whitened variables $z_{s'}$ are defined in Section 4.3 below). Let $\Sigma^H$ denote the covariance matrix of ${\rm vec}(H)$. Let $\mathcal{M}$ denote the operator defined by

(4.22) $\mathcal{M} = {\rm Mat}\left(\Sigma^{H^{-1/2}}{\rm vec}(\cdot)\right)$

and let $\mathcal{M}^{-*}$ denote the adjoint of the inverse of $\mathcal{M}$. The fact that $\mathcal{M}$ is invertible is easily obtained (see Section 6.3.3) and follows from the fact that $\Sigma^H$ has all its eigenvalues equal to $T-2t+1$, according to Section 6.3.2. Let $\mathcal{S}$ be the operator defined by

$\mathcal{S}(\cdot) = \mathcal{M}^{-*}\left(\cdot\right)\Sigma^{-1/2},$

and let $\mathcal{T}$ be the mapping

$\mathcal{T}(\cdot) = \frac{1}{\sqrt{t\,(T-2t+2)}}\,\mathcal{M}^{-1}\left(\cdot\;\Sigma^{\frac12}\right).$

Our main result is the following theorem.

Theorem 4.1.

Let $\xi$ be any positive real number. Assume that $\eta$ is such that

(4.23) $\|\mathcal{O}\bar{\mathcal{A}}_0 + \mathcal{N}E\| \geq \eta$

with probability less than or equal to $e^{-\nu^2/2}$ for some $\nu>0$. Then, with probability greater than or equal to $1-e^{-\nu^2/2}$,

(4.24) $\|\mathcal{O}\mathcal{K} - \hat{L}\|_F \leq \frac{2\eta}{\Lambda},$

where

$\Lambda \geq \xi\sqrt{t(T-2t+1)}\left(1 - \frac{4\xi}{\sqrt{\pi}}\left(\frac{e}{2}\right)^{\frac14}\sigma_{\max}\left(\Sigma^{1/2}\right)\sqrt{t}\right) - 2\sqrt{2}\,\sqrt{\frac{t}{T-2t+1}}\,\sqrt{\frac{\sigma_{\max}(\Sigma)}{\sigma_{\min}(\Sigma)}}\,\sqrt{\left(\left(2ct+1\right)+c\sqrt{t}\right)\sqrt{t}\,\frac{{\rm rank}(\mathcal{O}\mathcal{K})}{c\sqrt{\sigma_{\min}(\Sigma)}} + 2t} - \nu\xi.$

In the remainder of this section, we introduce the results, notations and tools for proving this theorem. The proof is given in Section 4.6.

4.1. Some notations

For all $s=0,\ldots,t-1$ and $s'=0,\ldots,T-2t+1$, let $\mathcal{A}_{s,s'}$ denote the operator defined by

(4.25) $\mathcal{A}_{s,s'}(L) = \sum_{r=0}^{t-1} L_{s,r} x_{s'+r}$

and let $\mathcal{A}$ denote the operator

(4.26) $L \mapsto \left(\mathcal{A}_{s,s'}(L)\right)_{s=0,\ldots,t-1,\; s'=0,\ldots,T-2t+1}.$

The descent cone of the nuclear norm at $\mathcal{O}\mathcal{K}$, denoted by $\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})$, is defined by

(4.27) $\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K}) = \bigcup_{\tau>0}\left\{D\in\mathbb{R}^{t\times t} \mid \|\mathcal{O}\mathcal{K} + \tau D\|_* \leq \|\mathcal{O}\mathcal{K}\|_*\right\}.$

4.2. A deterministic inequality

The following result will be the key to our analysis.

Theorem 4.2.

[14] Assume that

(4.28) $\|\mathcal{O}\bar{\mathcal{A}}_0 + \mathcal{N}E\| \leq \eta.$

Let $\hat{L}$ denote any solution of (4.21). Then,

(4.29) $\|\mathcal{O}\mathcal{K} - \hat{L}\|_F \leq \frac{2\eta}{\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right)},$

where

(4.30) $\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right) = \min_{D\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K}),\ \|D\|_F=1} \|\mathcal{A}(D)\|_F.$

4.3. A lower bound on $\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right)$

We will closely follow the approach of Tropp based on Mendelson's bound. For this purpose, we will need the definition of the Gaussian mean width $w_G(\mathfrak{X})$ of a set $\mathfrak{X}\subset\mathbb{R}^d$,

$w_G(\mathfrak{X}) = \mathbb{E}\left[\sup_{x\in\mathfrak{X}} \langle G, x\rangle\right],$

where the expectation is taken with respect to a standard Gaussian random vector $G$ taking values in $\mathbb{R}^d$. The statistical dimension $\delta(\mathfrak{X})$ of a cone $\mathfrak{X}$ (see e.g. [1]) is the quantity $\delta(\mathfrak{X}) = \mathbb{E}\left[\|\Pi_{\mathfrak{X}}(G)\|_2^2\right]$, where $\Pi_{\mathfrak{X}}$ denotes the Euclidean projection onto $\mathfrak{X}$; it is related to the Gaussian mean width in (4.33) below. Let us also denote by $Q_\xi$ the quantity

$Q_\xi(D) = \frac{1}{t}\sum_{s=0}^{t-1} \mathbb{P}\left(\left|\sum_{r=0}^{t-1} D_{s,r} z_{s'+r}\right| \geq \xi\right),$

which, as one may easily check, does not depend on $s'$. Recall that $\Sigma$ is the covariance matrix of $[x_0,\ldots,x_{t-1}]^t$ and that $\Sigma^{\pm\frac12}$ denotes the square root of $\Sigma^{\pm1}$. Thus,

$\begin{bmatrix} z_0 \\ \vdots \\ z_{t-1} \end{bmatrix} := \Sigma^{-\frac12} \begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix}$

follows the standard Gaussian distribution $\mathcal{N}(0, I)$. Let $\tilde{D} = D\Sigma^{\frac12}$. We now state Tropp's result.

Lemma 4.3.

Define

$K = \frac{1}{\sqrt{t\,(T-2t+2)}}\,\mathcal{M}^{-1}\left(\Sigma^{-1/2}\,\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right).$

We have

$\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right) \geq \xi\sqrt{t(T-2t-2)}\inf_{\|\tilde{D}\Sigma^{1/2}\|_F=1} Q_{2\xi}(\tilde{D}) - \frac{2}{\sigma_{\min}(\mathcal{S})}\,w_G(K) - \nu\xi$

with probability greater than or equal to $1-\exp(-\nu^2/2)$.

Proof.

See Section 6.1. ∎

4.4. A lower bound on $\inf_{\|\tilde{D}\Sigma^{1/2}\|_F=1} Q_{2\xi}(\tilde{D})$

Since

$Z = \sum_{r=0}^{t-1}\tilde{D}_{s,r} z_{s'+r}$

follows the law $\mathcal{N}\left(0, \sum_{r=0}^{t-1}\tilde{D}_{s,r}^2\right)$, using Lemma 6.2 from the Appendix, we get

$\mathbb{P}\left(Z^2 \leq \left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}^2\right)\sqrt{u}\right) \leq \frac{2}{\sqrt{\pi}}\left(e\,\frac{u}{2}\right)^{\frac14}.$

Thus, setting

$u = \frac{\xi^4}{\left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}^2\right)^2},$

we obtain

$\mathbb{P}\left(\left|\sum_{r=0}^{t-1}\tilde{D}_{s,r} z_{s'+r}\right| \geq \xi\right) \geq 1 - \frac{2}{\sqrt{\pi}}\left(\frac{e}{2}\right)^{\frac14}\frac{\xi}{\sqrt{\sum_{r=0}^{t-1}\tilde{D}_{s,r}^2}}.$

This finally gives

$Q_{2\xi}(\tilde{D}) \geq 1 - \frac{4\xi}{\sqrt{\pi}}\left(\frac{e}{2}\right)^{\frac14}\frac{1}{t}\sum_{s=0}^{t-1}\frac{1}{\sqrt{\sum_{r=0}^{t-1}\tilde{D}_{s,r}^2}}.$

Let us now compute a lower bound on the infimum of this quantity over the set of $\tilde{D}$ satisfying $\|\tilde{D}\Sigma^{1/2}\|_F = 1$. For this purpose, first note that

$\inf_{\|\tilde{D}\Sigma^{1/2}\|_F=1} Q_{2\xi}(\tilde{D}) \geq 1 - \sup_{\|\tilde{D}\|_F \geq \sigma_{\max}\left(\Sigma^{1/2}\right)^{-1}} \frac{4\xi}{\sqrt{\pi}}\left(\frac{e}{2}\right)^{\frac14}\frac{1}{t}\sum_{s=0}^{t-1}\frac{1}{\sqrt{\sum_{r=0}^{t-1}\tilde{D}_{s,r}^2}}.$

On the other hand, simple manipulations of the optimality conditions using symmetry prove that

$\sup_{\|A\|_F\leq 1} \frac{1}{t}\sum_{s=0}^{t-1}\frac{1}{\sqrt{\sum_{r=0}^{t-1}A_{s,r}^2}} = \sqrt{t}.$

Therefore,

(4.32) $\inf_{\|\tilde{D}\Sigma^{1/2}\|_F=1} Q_{2\xi}(\tilde{D}) \geq 1 - \frac{4\xi}{\sqrt{\pi}}\left(\frac{e}{2}\right)^{\frac14}\sigma_{\max}\left(\Sigma^{1/2}\right)\sqrt{t}.$

4.5. The Gaussian mean width of $K$

The Gaussian mean width of a set $\mathfrak{X}$ and its statistical dimension are related by

(4.33) $w_G(\mathfrak{X})^2 \leq \delta(\mathfrak{X}) \leq w_G(\mathfrak{X})^2 + 1.$

See [1, Proposition 10.2] for a proof. In this subsection, we estimate the Gaussian mean width of $K$ using its statistical dimension.

4.5.1. The descent cone $\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})$

The descent cone of the nuclear norm satisfies [14, Eq. (4.1)], which we recall now:

(4.34) $\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})^{\circ} = \overline{{\rm cone}}\left(\partial\|\cdot\|_*(\mathcal{O}\mathcal{K})\right).$

4.5.2. Computation of $K^{\circ}$

Using Proposition 4.2 in [14], we obtain

(4.35) $\sup_{\tilde{D}^*\in K,\ \|\tilde{D}^*\|_F=1} \left\langle \tilde{D}^*, \tilde{H}\right\rangle \leq {\rm dist}\left(\tilde{H}, K^{\circ}\right).$

We now have to compute the polar cone of $K$. We have

$K^{\circ} = \left\{\Delta \mid \langle\Delta, D\rangle \leq 0 \quad \forall\, D \in K\right\} = \left\{\Delta \;\Big|\; \left\langle \frac{1}{\sqrt{t\,(T-2t+2)}}\,\mathcal{M}^{-1}\left(\Delta\,\Sigma^{\frac12}\right), D\right\rangle \leq 0 \quad \forall\, D \in \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right\}.$

Recall that $\mathcal{T}$ is the mapping

$\Delta \mapsto \frac{1}{\sqrt{t\,(T-2t+2)}}\,\mathcal{M}^{-1}\left(\Delta\,\Sigma^{\frac12}\right).$

Then, we obtain that

$K^{\circ} = \mathcal{T}^{-1}\left(\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})^{\circ}\right).$

4.5.3. An upper bound on the statistical dimension of $K$

Let us write the singular value decomposition of $\mathcal{O}\mathcal{K}$ as

$\mathcal{O}\mathcal{K} = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\begin{bmatrix} {\rm diag}(\sigma_{\mathcal{O}\mathcal{K}}) & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} V_1 & V_2 \end{bmatrix}^t$

where $\sigma_{\mathcal{O}\mathcal{K}}$ is the vector of the singular values of $\mathcal{O}\mathcal{K}$. Moreover, the subdifferential of the nuclear norm at $\mathcal{O}\mathcal{K}$ is given by

$\partial\|\cdot\|_*(\mathcal{O}\mathcal{K}) = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\left\{\begin{bmatrix} I & 0 \\ 0 & Y \end{bmatrix} \;\Big|\; \|Y\| \leq 1\right\}\begin{bmatrix} V_1 & V_2 \end{bmatrix}^t.$

Therefore, using (4.35), we obtain that

$\mathbb{E}\left[\left(\sup_{\tilde{D}^*\in K,\ \|\tilde{D}^*\|_F=1} \left\langle \tilde{D}^*, \tilde{H}\right\rangle\right)^2\right] \leq \mathbb{E}\left[\min_{\tau>0,\ \|Y\|\leq 1}\left\|\mathcal{T}^{-1}\left(\tau\begin{bmatrix} U_1V_1^t & 0 \\ 0 & U_2YV_2^t \end{bmatrix}\right) - \tilde{H}\right\|_F^2\right].$

Thus, we get

$\mathbb{E}\left[\left(\sup_{\tilde{D}^*\in K,\ \|\tilde{D}^*\|_F=1} \left\langle \tilde{D}^*, \tilde{H}\right\rangle\right)^2\right] \leq \mathbb{E}\left[\min_{\tau>0,\ \|Y\|\leq 1}\|\mathcal{T}^{-1}\|\left(\tau^2\|U_1V_1^t\|_F^2 + \|\tau U_2YV_2^t - \mathcal{T}_{2,2}(\tilde{H})\|_F^2 + \|\mathcal{T}_{1,2}(\tilde{H})\|_F^2 + \|\mathcal{T}_{2,1}(\tilde{H})\|_F^2\right)\right]$

where

$\mathcal{T} = \begin{bmatrix} \mathcal{T}_{1,1} & \mathcal{T}_{1,2} \\ \mathcal{T}_{2,1} & \mathcal{T}_{2,2} \end{bmatrix},$

the dimension of $\mathcal{T}_{1,1}$ is ${\rm rank}(\mathcal{O}\mathcal{K})\times{\rm rank}(\mathcal{O}\mathcal{K})$, and the dimensions of the other blocks $\mathcal{T}_{j,j'}$ are easily deduced from the dimension of $\mathcal{T}$. Taking $\tau = \|U_2^t\,\mathcal{T}_{2,2}(\tilde{H})\,V_2\|$, this gives

$\mathbb{E}\left[\left(\sup_{\tilde{D}^*\in K,\ \|\tilde{D}^*\|_F=1} \left\langle \tilde{D}^*, \tilde{H}\right\rangle\right)^2\right] \leq \sigma_{\min}(\mathcal{T})^{-1}\left(\mathbb{E}\left[\tau^2\right]{\rm rank}(\mathcal{O}\mathcal{K}) + \left(\sigma_{\max}(\mathcal{T}_{1,2})^2 + \sigma_{\max}(\mathcal{T}_{2,1})^2\right)\mathbb{E}\left[\|\tilde{H}\|_F^2\right]\right).$

Note that

$\tau \leq \|\mathcal{T}_{2,2}\|\,\|\tilde{H}\|.$

By Gordon's theorem [16, Theorem 10.2], $\mathbb{E}\left[\|\tilde{H}\|\right] \leq 2\sqrt{t}$. Moreover, by Lemma 6.1 in the Appendix,

$\mathbb{E}\left[\|\tilde{H}\|^2\right] \leq \frac{2}{c}\left(2ct+1\right) + 2\sqrt{t}.$

On the other hand, $\mathbb{E}\left[\|\tilde{H}\|_F^2\right] = 2t$. Therefore, we obtain that

$\delta(K) = \mathbb{E}\left[\left(\sup_{\tilde{D}^*\in K,\ \|\tilde{D}^*\|_F=1} \left\langle \tilde{D}^*, \tilde{H}\right\rangle\right)^2\right] \leq 2\,\sigma_{\min}(\mathcal{T})^{-1}\left(\frac{1}{c}\|\mathcal{T}_{2,2}\|^2\left(\left(2ct+1\right)+c\sqrt{t}\right){\rm rank}(\mathcal{O}\mathcal{K}) + 2\left(\sigma_{\max}(\mathcal{T}_{1,2})^2 + \sigma_{\max}(\mathcal{T}_{2,1})^2\right)t\right).$

Using (4.33), we obtain that

(4.39) $w_G(K) \leq \sqrt{2\,\sigma_{\min}(\mathcal{T})^{-1}\left(\frac{1}{c}\|\mathcal{T}_{2,2}\|^2\left(\left(2ct+1\right)+c\sqrt{t}\right){\rm rank}(\mathcal{O}\mathcal{K}) + 2\left(\sigma_{\max}(\mathcal{T}_{1,2})^2 + \sigma_{\max}(\mathcal{T}_{2,1})^2\right)t\right)}.$

4.6. Proof of Theorem 4.1

Combining Lemma 4.3 with (4.32) and (4.39), we obtain that

$\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right) \geq t\sqrt{T-2t-2}\;\frac{4\xi^2}{\sqrt{\pi}}\left(\frac{e}{2}\right)^{\frac14}\sigma_{\min}\left(\Sigma^{1/2}\right) - \frac{2\sqrt{2}}{\sigma_{\min}(\mathcal{S})}\sqrt{\frac{1}{c}\|\mathcal{T}_{2,2}\|^2\left(\left(2ct+1\right)+c\sqrt{t}\right)\frac{{\rm rank}(\mathcal{O}\mathcal{K})}{\sigma_{\min}(\mathcal{T})} + \left(\sigma_{\max}(\mathcal{T}_{1,2})^2 + \sigma_{\max}(\mathcal{T}_{2,1})^2\right)t} - \nu\xi.$

Using that

$\|\mathcal{T}_{2,2}\|^2 \leq \|\mathcal{T}\|^2$

and

$\sigma_{\max}(\mathcal{T}_{1,2})^2 + \sigma_{\max}(\mathcal{T}_{2,1})^2 \leq 2\|\mathcal{T}\|^2,$

and combining this last inequality with Theorem 4.2, we obtain the following proposition.

Proposition 4.4.

Let $\xi$ be any positive real number. Assume that $\eta$ is such that

(4.40) $\|\mathcal{O}\bar{\mathcal{A}}_0 + \mathcal{N}E\| \geq \eta$

with probability less than or equal to $e^{-\nu^2/2}$ for some $\nu>0$. Then, with probability greater than or equal to $1-e^{-\nu^2/2}$,

(4.41) $\|\mathcal{O}\mathcal{K} - \hat{L}\|_F \leq \frac{2\eta}{\Lambda},$

where

$\Lambda \geq \xi\sqrt{t(T-2t+1)}\left(1 - \frac{4\xi}{\sqrt{\pi}}\left(\frac{e}{2}\right)^{\frac14}\sigma_{\max}\left(\Sigma^{1/2}\right)\sqrt{t}\right) - \frac{2\sqrt{2}\,\|\mathcal{T}\|}{\sigma_{\min}(\mathcal{S})}\sqrt{\left(\left(2ct+1\right)+c\sqrt{t}\right)\frac{{\rm rank}(\mathcal{O}\mathcal{K})}{c\,\sigma_{\min}(\mathcal{T})} + 2t} - \nu\xi.$

Combining this result with the bounds from Section 6.3.3, the proof is completed.

5. Conclusion

The goal of the present note is to show that the performance of nuclear norm penalized subspace-type methods can be studied theoretically. We concentrated on a specific approach due to Bauer [2]. Our approach can easily be extended to the case of the method promoted in [6]. Our next objective is to address the case of more general noise sequences such as in [9].

6. Appendix: Technical intermediate results

In this section, we gather some technical results used in the proof of Theorem 4.1.

6.1. Proof of Lemma 4.3

6.1.1. First step

We have

(6.42) $\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right) = \min_{D\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K}),\ \|D\|_F=1} \|\mathcal{A}(D)\|_F$

(6.43) $\phantom{\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right)} = \min_{D\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K}),\ \|D\|_F=1} \left(\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(\sum_{r=0}^{t-1}D_{s,r}x_{s'+r}\right)^2\right)^{\frac12}$

Recall that $\Sigma$ is the covariance matrix of $[x_0,\ldots,x_{t-1}]^t$ and that $\Sigma^{\pm\frac12}$ denotes the square root of $\Sigma^{\pm1}$. Thus,

$\begin{bmatrix} z_0 \\ \vdots \\ z_{t-1} \end{bmatrix} := \Sigma^{-\frac12}\begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix}$

follows the standard Gaussian distribution $\mathcal{N}(0,I)$. Recall also that $\tilde{D} = D\Sigma^{\frac12}$. Then, we have

(6.45) $\lambda_{\min}\left(\mathcal{A}, \mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\right) = \min_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1} \left(\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right)^2\right)^{\frac12}$

Now, we have

$\left(\frac{1}{t\,(T-2t+1)}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right)^2\right)^{\frac12} \geq \frac{1}{t\,(T-2t+1)}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left|\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right|$

which gives, by Markov's inequality,

$\left(\frac{1}{t\,(T-2t+1)}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right)^2\right)^{\frac12} \geq \frac{\xi}{t\,(T-2t+1)}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\mathds{1}\left\{\left|\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right|\geq\xi\right\}.$

Thus, we obtain

$\left(\frac{1}{t\,(T-2t+1)}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right)^2\right)^{\frac12} \geq \xi Q_{2\xi}(\tilde{D}) - \frac{\xi}{t\,(T-2t+1)}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(Q_{2\xi}(\tilde{D}) - \mathds{1}\left\{\left|\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right|\geq\xi\right\}\right).$

6.1.2. Second step

Let

$f(z_0,\ldots,z_{T-t}) = \sup_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1} \sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(Q_{2\xi}(\tilde{D}) - \mathds{1}\left\{\left|\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right|\geq\xi\right\}\right).$

We will now use the bounded difference inequality to control this quantity. For this purpose, notice that

$|f(\zeta_0,\ldots,\zeta_s,\ldots,\zeta_{T-t}) - f(\zeta_0,\ldots,\zeta'_s,\ldots,\zeta_{T-t})| \leq 2\,t\,(T-2t+2)$

for all $(\zeta_0,\ldots,\zeta_s,\ldots,\zeta_{T-t})$ in $\mathbb{R}^{T-t+1}$ and $\zeta'_s\in\mathbb{R}$. Thus,

$f(z_0,\ldots,z_{T-t}) - \mathbb{E}\left[f(z_0,\ldots,z_{T-t})\right] \leq \nu\sqrt{t\,(T-2t+2)},$

with probability $1-e^{-\nu^2/2}$ for all $\nu\in\mathbb{R}_+$. Now, the expected supremum can be bounded in the same manner as in [14, Equation 5.6]:

$\mathbb{E}\left[f(z_0,\ldots,z_{T-t})\right] \leq \frac{2}{\xi}\,\mathbb{E}\left[\sup_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1} \sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\varepsilon_{s,s'}\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right]$

where $\varepsilon_{s,s'}$, $s=0,\ldots,t-1$, $s'=0,\ldots,T-2t+1$, are independent Rademacher random variables which are independent of $z_{s'}$, $s'=0,\ldots,T-t$. Therefore, we obtain

$\inf_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1}\left(\frac{1}{t\,(T-2t+1)}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right)^2\right)^{\frac12} \geq \xi Q_{2\xi}(\tilde{D}) - \frac{\xi}{t\,(T-2t+1)}\left(\frac{2}{\xi}\,\mathbb{E}\left[\sup_{\tilde{D}\in\Sigma^{-1/2}\,\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})} \sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\varepsilon_{s,s'}\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right] + \nu\sqrt{t\,(T-2t+2)}\right),$

which gives

$\inf_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1}\left(\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\left(\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right)^2\right)^{\frac12} \geq \xi\sqrt{t\,(T-2t+2)}\,Q_{2\xi}(\tilde{D}) - 2\,\mathbb{E}\left[\sup_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1}\frac{1}{\sqrt{t\,(T-2t+2)}}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\varepsilon_{s,s'}\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right] - \nu\xi.$

Let us denote by $W$ the quantity

$W = \mathbb{E}\left[\sup_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1}\frac{1}{\sqrt{t\,(T-2t+2)}}\sum_{s=0}^{t-1}\sum_{s'=0}^{T-2t+1}\varepsilon_{s,s'}\sum_{r=0}^{t-1}\tilde{D}_{s,r}z_{s'+r}\right].$

Then, we have

$W = \mathbb{E}\left[\sup_{\tilde{D}\in\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\Sigma^{1/2},\ \|\tilde{D}\Sigma^{-1/2}\|_F=1}\frac{1}{\sqrt{t\,(T-2t+2)}}\,\langle\tilde{D}, H\rangle\right],$

where we recall that $H$ is the random matrix whose components are given by

$H_{s,r} = \sum_{s'=0}^{T-2t+1}\varepsilon_{s,s'}z_{s'+r}$

and $\Sigma^H$ denotes the covariance matrix of ${\rm vec}(H)$. Let $\tilde{H} = \mathcal{M}(H)$, where $\mathcal{M}$ denotes the operator defined by

$\mathcal{M}(\cdot) = {\rm Mat}\left(\Sigma^{H^{-1/2}}{\rm vec}(\cdot)\right).$

Then $\tilde{H}$ is a Gaussian matrix with i.i.d. components with law $\mathcal{N}(0,1)$. Using the invertibility of $\mathcal{M}$ proved in Section 6.3.3, we get

$W = \mathbb{E}\left[\sup_{\tilde{D}^*\in K,\ \|\mathcal{M}^{-*}\left(\tilde{D}^*\,\Sigma^{-1/2}\right)\|_F=1}\left\langle\tilde{D}^*, \tilde{H}\right\rangle\right],$

where

$K = \frac{1}{\sqrt{t\,(T-2t+2)}}\,\mathcal{M}^{-*}\left(\mathcal{D}(\|\cdot\|_*, \mathcal{O}\mathcal{K})\,\Sigma^{\frac12}\right),$

where we recall that \mathcal{M}^{-*} is the adjoint of the inverse of \mathcal{M}. Moreover, we have

$\sup_{\tilde{D}^*\in K,\ \|\mathcal{M}^{-*}\left(\tilde{D}^*\right)\Sigma^{-1/2}\|_F=1}\left\langle\tilde{D}^*, \tilde{H}\right\rangle \leq \frac{1}{\sigma_{\min}(\mathcal{S})}\sup_{\tilde{D}^*\in K,\ \|\tilde{D}^*\|_F=1}\left\langle\tilde{D}^*, \tilde{H}\right\rangle$

where $\sigma_{\min}(\mathcal{S})$ is the smallest singular value of the operator $\mathcal{S}$ defined by

$\mathcal{S}(\cdot) = \mathcal{M}^{-*}\left(\cdot\right)\Sigma^{-1/2}.$

Thus,

$W \leq \frac{w_G(K)}{\sigma_{\min}(\mathcal{S})}$

and the proof is completed.

6.2. Control of $\mathbb{E}\left[\|\tilde{H}\|^2\right]$

Lemma 6.1.

We have

$\mathbb{E}\left[\|\tilde{H}\|^2\right] \leq \left(1+\frac{1}{2ct}\right)\mathbb{E}\left[\|\tilde{H}\|\right]^2 + \mathbb{E}\left[\|\tilde{H}\|\right].$
Proof.

By Gaussian concentration [13, Proposition 4] and the fact that the spectral (operator) norm is 1-Lipschitz, we obtain that for all $u>0$,

$\mathbb{P}\left(\|\tilde{H}\| \geq \mathbb{E}\left[\|\tilde{H}\|\right] + u\right) \leq e^{-cu^2}$

for some absolute positive constant $c$. Taking $u = \delta\,\mathbb{E}\left[\|\tilde{H}\|\right]$, we obtain that

$\mathbb{P}\left(\|\tilde{H}\| \geq (1+\delta)\,\mathbb{E}\left[\|\tilde{H}\|\right]\right) \leq e^{-4c\delta^2 t}.$

Thus,

$\mathbb{E}\left[\|\tilde{H}\|^2\right] = \int_0^{+\infty}\mathbb{P}\left(\|\tilde{H}\|^2\geq s\right)ds = \int_0^{\mathbb{E}\left[\|\tilde{H}\|\right]^2}\mathbb{P}\left(\|\tilde{H}\|^2\geq s\right)ds + \int_{\mathbb{E}\left[\|\tilde{H}\|\right]^2}^{+\infty}\mathbb{P}\left(\|\tilde{H}\|^2\geq s\right)ds \leq \mathbb{E}\left[\|\tilde{H}\|\right]^2 + \int_{\mathbb{E}\left[\|\tilde{H}\|\right]^2}^{+\infty}\mathbb{P}\left(\|\tilde{H}\|\geq\sqrt{s}\right)ds \leq \mathbb{E}\left[\|\tilde{H}\|\right]^2 + \int_{\mathbb{E}\left[\|\tilde{H}\|\right]^2}^{+\infty}\exp\left(-4c\left(\frac{\sqrt{s}-\mathbb{E}[\|\tilde{H}\|]}{\mathbb{E}[\|\tilde{H}\|]}\right)^2 t\right)ds$

and, making the change of variable $r = (\sqrt{s}-\mathbb{E}[\|\tilde{H}\|])^2$, we obtain

$\mathbb{E}\left[\|\tilde{H}\|^2\right] \leq \mathbb{E}\left[\|\tilde{H}\|\right]^2 + \int_0^{+\infty}\exp\left(-4\frac{ct}{\mathbb{E}[\|\tilde{H}\|]^2}r\right)\left(1+\frac{1}{\sqrt{r}}\right)dr \leq \mathbb{E}\left[\|\tilde{H}\|\right]^2 + 2\int_0^{+\infty}\exp\left(-4\frac{ct}{\mathbb{E}[\|\tilde{H}\|]^2}r\right)dr + \int_0^{\mathbb{E}[\|\tilde{H}\|]^2}\frac{1}{\sqrt{r}}\,dr.$

Thus, we obtain

$\mathbb{E}\left[\|\tilde{H}\|^2\right] \leq \mathbb{E}\left[\|\tilde{H}\|\right]^2 + \mathbb{E}[\|\tilde{H}\|] - \frac{\mathbb{E}[\|\tilde{H}\|]^2}{2ct}\left[\exp\left(-4\frac{ct}{\mathbb{E}[\|\tilde{H}\|]^2}r\right)\right]_0^{+\infty} \leq \left(1+\frac{1}{2ct}\right)\mathbb{E}\left[\|\tilde{H}\|\right]^2 + \mathbb{E}[\|\tilde{H}\|].$

This completes the proof. ∎
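As a quick numerical sanity check of Gordon's bound $\mathbb{E}\left[\|\tilde{H}\|\right]\leq 2\sqrt{t}$ used in Section 4.5.3 (for a $t\times t$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries), one may run the following Monte Carlo sketch; the parameter values are arbitrary and the sketch is illustrative only.

```python
import numpy as np

# Monte Carlo estimate of E[||H_tilde||] for a t x t standard Gaussian matrix,
# compared with Gordon's bound 2 * sqrt(t).  Illustrative only.
rng = np.random.default_rng(0)
t, n_samples = 20, 2000
norms = [np.linalg.norm(rng.standard_normal((t, t)), 2) for _ in range(n_samples)]
print(np.mean(norms), 2 * np.sqrt(t))   # the empirical mean should not exceed the bound
```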

6.3. Some properties of $\Sigma$, $\Sigma^H$, $\mathcal{M}$, $\mathcal{S}$ and $\mathcal{T}$

6.3.1. The spectrum of $\Sigma$

The spectrum of $\Sigma$ can be studied using the methods of Grenander and Szego [8]. In [5], the classical results are extended to the case of generalized fractional processes. In particular, it was shown by Grenander and Szego in [8, Chapter 5] that $2\pi m \leq \lambda \leq 2\pi M$ for any eigenvalue $\lambda$ of $\Sigma$, where $m$ and $M$ are the essential infimum and supremum of the spectral density function $f$ of the process. For ARMA processes, this function is just

$f(\nu) = \frac{\sigma_{\varepsilon}^2}{2\pi}\left|\frac{\theta(e^{i\nu})}{\phi(e^{i\nu})}\right|^2$

where

$\phi(z) = 1 - a_1 z - \cdots - a_p z^p \quad\text{ and }\quad \theta(z) = 1 + b_1 z + \cdots + b_q z^q.$
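The Grenander-Szego bounds $2\pi m \leq \lambda \leq 2\pi M$ can be evaluated numerically from this spectral density; the following minimal sketch (Python with NumPy; the grid size and the ARMA coefficients are arbitrary illustrative choices) computes $f$ on a frequency grid and the resulting eigenvalue bounds for $\Sigma$.

```python
import numpy as np

def arma_spectral_density(a, b, sigma2, nu):
    """f(nu) = sigma^2 / (2 pi) * |theta(e^{i nu}) / phi(e^{i nu})|^2 with
    phi(z) = 1 - a_1 z - ... - a_p z^p and theta(z) = 1 + b_1 z + ... + b_q z^q."""
    z = np.exp(1j * nu)
    phi = 1.0 - sum(a_i * z ** (i + 1) for i, a_i in enumerate(a))
    theta = 1.0 + sum(b_j * z ** (j + 1) for j, b_j in enumerate(b))
    return sigma2 / (2 * np.pi) * np.abs(theta / phi) ** 2

# Evaluate f on a grid; 2*pi*min(f) and 2*pi*max(f) bound the eigenvalues of Sigma.
nu_grid = np.linspace(0.0, 2 * np.pi, 2048, endpoint=False)
f_vals = arma_spectral_density(a=[0.5, -0.25], b=[0.4], sigma2=1.0, nu=nu_grid)
lam_lower, lam_upper = 2 * np.pi * f_vals.min(), 2 * np.pi * f_vals.max()
```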

6.3.2. The spectrum of $\Sigma^H$

Recall that $H$ is the random matrix whose components are given by

$H_{s,r} = \sum_{s'=0}^{T-2t+1}\varepsilon_{s,s'}z_{s'+r},$

where $\varepsilon_{s,s'}$, $s=0,\ldots,t-1$, $s'=0,\ldots,T-2t+1$, are independent Rademacher random variables which are independent of $z_{s'}$, $s'=0,\ldots,T-t$.

Using matrix representation, we have

$H = \varepsilon z = \begin{pmatrix}\varepsilon_{0,1} & \varepsilon_{0,2} & \cdots & \varepsilon_{0,T-2t+1} \\ \varepsilon_{1,1} & \varepsilon_{1,2} & \cdots & \varepsilon_{1,T-2t+1} \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_{t-1,1} & \varepsilon_{t-1,2} & \cdots & \varepsilon_{t-1,T-2t+1}\end{pmatrix}\begin{pmatrix} z_0 & z_1 & \cdots & z_{t-1} \\ z_1 & z_2 & \cdots & z_t \\ \vdots & \vdots & \ddots & \vdots \\ z_{T-2t+1} & z_{T-2t+2} & \cdots & z_{T-t}\end{pmatrix}.$

Let $Z_p$ be the $(p+1)$-th column of $z$. Then

${\rm vec}(H) = \begin{pmatrix}\varepsilon Z_0 \\ \varepsilon Z_1 \\ \vdots \\ \varepsilon Z_{t-1}\end{pmatrix}.$

The $(p,q)$-th block of $\Sigma^H$ is given by

$\Sigma^H_{[p,q]} = \mathbb{E}\left[\varepsilon\,\mathbb{E}[Z_p Z_q^t]\,\varepsilon^t\right],$

where for $p<q$

$\mathbb{E}[Z_p Z_q^t] = \begin{pmatrix} 0 & 0 \\ I_{T-2t+1-(q-p)} & 0\end{pmatrix}.$

Here, $I_{T-2t+1-(q-p)}$ denotes the identity matrix of dimension $T-2t+1-(q-p)$.

Partitioning $\varepsilon$ appropriately as

$\varepsilon = \begin{pmatrix}\varepsilon_{[1,1]} & \varepsilon_{[1,2]} \\ \varepsilon_{[2,1]} & \varepsilon_{[2,2]}\end{pmatrix},$

we deduce that

$\Sigma^H_{[p,q]} = \mathbb{E}\left[\begin{pmatrix}\varepsilon_{[1,1]} & \varepsilon_{[1,2]} \\ \varepsilon_{[2,1]} & \varepsilon_{[2,2]}\end{pmatrix}\begin{pmatrix} 0 & 0 \\ I_{T-2t+1-(q-p)} & 0\end{pmatrix}\begin{pmatrix}\varepsilon_{[1,1]}^t & \varepsilon_{[2,1]}^t \\ \varepsilon_{[1,2]}^t & \varepsilon_{[2,2]}^t\end{pmatrix}\right] = \mathbb{E}\begin{pmatrix}\varepsilon_{[1,2]}\varepsilon_{[1,1]}^t & \varepsilon_{[1,2]}\varepsilon_{[2,1]}^t \\ \varepsilon_{[2,2]}\varepsilon_{[1,1]}^t & \varepsilon_{[2,2]}\varepsilon_{[2,1]}^t\end{pmatrix} = 0$

for $p<q$. Similarly, we can show that $\Sigma^H_{[p,q]} = 0$ for $p>q$. As for $p=q$, we have $\mathbb{E}[Z_p Z_p^t] = I_{T-2t+1}$. Thus

$\Sigma^H_{[p,p]} = \mathbb{E}[\varepsilon\varepsilon^t] = (T-2t+1)I_t.$

It then follows that $\Sigma^H = (T-2t+1)\,I_{t(T-2t+1)}$.

6.3.3. Consequences for $\mathcal{M}$, $\mathcal{S}$ and $\mathcal{T}$

Recall that $\mathcal{M}$ denotes the operator defined by

(6.46) $\mathcal{M} = {\rm Mat}\left(\Sigma^{H^{-1/2}}{\rm vec}(\cdot)\right)$

and $\mathcal{M}^{-*}$ denotes the adjoint of the inverse of $\mathcal{M}$. Using the results of Section 6.3.2, we obtain that

(6.47) $\mathcal{M} = \frac{1}{\sqrt{T-2t+1}}\,{\rm Id}$

and

(6.48) $\mathcal{M}^{-*} = \sqrt{T-2t+1}\,{\rm Id}.$

Using these results, we obtain that $\mathcal{S}$ is the operator defined by

$\mathcal{S}(\cdot) = \frac{1}{\sqrt{T-2t+1}}\;\cdot\;\Sigma^{-1/2}$

and $\mathcal{T}$ is the mapping

$\mathcal{T}(\cdot) = \frac{1}{\sqrt{t}}\;\cdot\;\Sigma^{\frac12}.$

We thus have the following results on $\mathcal{T}$:

$\|\mathcal{T}\| \leq \frac{\sqrt{\sigma_{\max}(\Sigma)}}{\sqrt{t}}$

and

$\sigma_{\min}(\mathcal{T}) \geq \frac{\sqrt{\sigma_{\min}(\Sigma)}}{\sqrt{t}}.$

We also obtain that

$\sigma_{\min}(\mathcal{S}) \geq \frac{\sqrt{\sigma_{\min}(\Sigma)}}{\sqrt{T-2t+1}}.$

6.4. Some properties of the $\chi^2$ distribution

We recall the following useful bounds for the $\chi^2(\nu)$ distribution with $\nu$ degrees of freedom.

Lemma 6.2.

[4, Lemma B.1] The following bounds hold:

$\mathbb{P}\left(\chi(\nu) \geq \sqrt{\nu} + \sqrt{2t}\right) \leq \exp(-t)$
$\mathbb{P}\left(\chi(\nu) \leq \sqrt{u\nu}\right) \leq \frac{2}{\sqrt{\pi\nu}}\left(u\,e/2\right)^{\frac{\nu}{4}}.$
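A quick Monte Carlo check of the lower-tail bound of Lemma 6.2 is given below ($\chi(\nu)$ is the norm of a standard Gaussian vector in $\mathbb{R}^{\nu}$; the parameter values are arbitrary and the sketch is illustrative only).

```python
import numpy as np

rng = np.random.default_rng(0)
nu, u, n_samples = 4, 0.1, 200_000
chi = np.sqrt(rng.chisquare(nu, n_samples))          # samples of chi(nu)
empirical = np.mean(chi <= np.sqrt(u * nu))          # P(chi(nu) <= sqrt(u * nu))
bound = 2.0 / np.sqrt(np.pi * nu) * (u * np.e / 2.0) ** (nu / 4.0)
print(empirical, bound)                              # empirical tail should not exceed the bound
```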

References

  • [1] Amelunxen, D., Lotz, M., McCoy, M.B. and Tropp, J., Living on the edge: Phase transitions in convex programs with random data, http://arxiv.org/abs/1303.6672.
  • [2] Bauer, D., Asymptotic properties of subspace estimators. Automatica, 41, no. 3, (2005), 359-376.
  • [3] Bauer, D. (2005). Estimating linear dynamical systems using subspace methods. Econometric Theory, 21(01), 181-211.
  • [4] Chretien, S., and Darses, S. (2014). Sparse recovery with unknown variance: a LASSO-type approach. Information Theory, IEEE Transactions on, 60(7), 3970-3988.
  • [5] Palma, W., and Bondon, P., On the eigenstructure of generalized fractional processes. Statistics & Probability Letters 65 (2003) 93–101.
  • [6] Fazel, M., Pong, T. K. and Sun, D. (2009), Hankel matrix rank minimization with applications to system identification and realization.
  • [7] Gordon, Y., Some inequalities for Gaussian processes and applications, Israel Journal of Mathematics. 11/1985; 50(4):265-289.
  • [8] Grenander, U., Szego, G., Toeplitz forms and their applications. California Monographs in Mathematical Sciences. University of California Press, Berkeley, CA (1958).
  • [9] Francq, C., and Zakoïan, J. M. (1998). Estimating linear representations of nonlinear processes. Journal of Statistical Planning and Inference, 68(1), 145-165.
  • [10] Van Overschee, P. and De Moor, B. (1994), N4SID*: Subspace algorithms for the identification of combined deterministic stochastic system, Automatica, 27, 75–93.
  • [11] Recht, B., Fazel, M. and Parrilo, P., Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Review, 200?.
  • [12] Shumway, R.H., and Stoffer, D.S., Time series analysis and its applications, EZ - Third edition http://www.stat.ualberta.ca/ wiens/stat479/tsa3EZ.pdf
  • [13] Tao, T., https://terrytao.wordpress.com/2009/06/09/talagrands-concentration-inequality.
  • [14] Tropp, J., Convex recovery of a structured signal from independent random linear measurements, http://arxiv.org/abs/1405.1102.
  • [15] Verhaegen, M. and Verdult, V. (2007), Filtering and system identification a least squares approach, Cambridge University Press.
  • [16] Vershynin, Estimation in high dimensions: a geometric perspective, http://arxiv.org/abs/1405.5103.