
Sequential Low-Rank Change Detection

Yao Xie H. Milton Stewart School of Industrial
and Systems Engineering
Georgia Institute of Technology, GA
Email: yao.xie@isye.gatech.edu
   Lee Seversky Air Force Research Lab,
Information Directorate, Rome, NY
Email: lee.seversky@us.af.mil
Abstract

Detecting the emergence of a low-rank signal from high-dimensional data is an important problem arising in many applications, such as camera surveillance and swarm monitoring using sensors. We consider a procedure based on the largest eigenvalue of the sample covariance matrix over a sliding window to detect the change. To achieve dimensionality reduction, we present a sketching-based approach for rank change detection using low-dimensional linear sketches of the original high-dimensional observations. The premise is that when the sketching matrix is a random Gaussian matrix and the dimension of the sketching vector is sufficiently large, the rank of the sample covariance matrix for these sketches equals the rank of the original sample covariance matrix with high probability. Hence, we may be able to detect the low-rank change using the sample covariance matrices of the sketches without having to recover the original covariance matrix. We characterize the performance of the largest eigenvalue statistic in terms of the false-alarm rate and the expected detection delay, and present an efficient online implementation via subspace tracking.

I Introduction

Detecting the emergence of a low-rank structure is a problem arising in many high-dimensional streaming data applications, such as video surveillance, financial time series, and sensor networks. The subspace structure may represent an anomaly or novelty that we would like to detect as soon as possible once it appears. One such example is swarm behavior monitoring. Biological swarms consist of many simple individuals following basic rules to form complex collective behaviors [1]. Examples include flocks of birds, schools of fish, and colonies of bacteria. The collective behavior and movement patterns of swarms have inspired much recent research into designing robotic swarms consisting of many agents that use simple algorithms to collectively accomplish complicated tasks, e.g., a swarm of UAVs [2]. Early detection of an emerging or transient pattern that leads to a specific form of collective behavior is very important for applications in swarm monitoring and control. One key observation, as shown in [1] for behavior classification purposes, is that many forms of swarm behavior are represented as low-dimensional linear subspaces. This leads to a low-rank change in the covariance structure of the observations.

Figure 1: Sensors jointly monitor locations (and possibly speeds) of a swarm.
Figure 2: Detection of the emergence of a low-rank signal. The observations are the locations of the particles. Left: the locations of the particles are random and the sample covariance matrix is due to noise. Right: the locations of the swarms lie near a circle in three-dimensional space, and the sample covariance exhibits a strong rank-two component.

In this paper, we propose a sequential change-point detection procedure based on the maximum eigenvalue of the sample covariance matrix, which is a natural approach to detecting the emergence of a low-rank signal. We characterize the performance of the detection procedure using two standard performance metrics: the average run length (ARL), which is related to the false-alarm rate, and the expected detection delay. The distribution of the largest eigenvalue of the sample covariance matrix when the data are Gaussian is presented in [3], and corresponds to the Tracy-Widom law of order one. However, it is intractable to directly analyze the ARL using the Tracy-Widom law. Instead, we use a simple $\varepsilon$-net argument to decompose the detection statistic into a set of $\chi^2$-CUSUM procedures, whose performance is well understood in the literature.

Moreover, we present a new approach to detecting low-rank changes based on sketching of the observation vectors. Sketching corresponds to a linear projection of the original observations: $y_t = A x_t$, with $A \in \mathbb{R}^{m \times p}$. We may use the sample covariance matrix of the sketches to detect the emergence of a low-rank signal component: it can be shown that for a random projection, when $m$ is greater than the rank of the signal covariance matrix, the sample covariance matrix of the linear sketches has the same rank as the signal covariance matrix. We show that the minimum number of sketches $m$ is related to properties of the projection $A$ and the signal subspace. One key difference between sketching for low-rank change detection and covariance sketching for low-rank covariance matrix recovery [4] is that here we do not have to recover the covariance matrix, which may require a larger number of sketches. Hence, sketching is a natural fit for the detection problem.

The low-rank change is related to the so-called spiked covariance model [3], which assumes that a small number of directions explain most of the variance. Such an assumption is also made by sparse PCA, where the low-rank component is further assumed to be sparse. The goal of sparse PCA is to estimate such sparse subspaces (see, e.g., [5]). A fixed-sample hypothesis test to determine whether or not there exists a sparse and low-rank component, based on the largest eigenvalue statistic, is studied in [6], where it is shown to be asymptotically minimax optimal. Another test statistic for such problems, the so-called Kac-Rice statistic, has been studied in [7]. The Kac-Rice statistic is the conditional survival function of the largest observed singular value conditioned on all the other observed singular values, and it has a simple null distribution: it is uniformly distributed on $[0,1]$. However, the statistic involves an integral over the entire real line, which may not be easy to evaluate, and the test requires computing all eigenvalues of the sample covariance matrix instead of only the largest eigenvalue.

II Largest eigenvalue procedure

Assume we observe a sequence of $p$-dimensional vectors $x_1, x_2, \ldots, x_t$, $t = 1, 2, \ldots$. There may be a change-point at time $\tau$ such that the distribution of the data stream changes. Our goal is to detect such a change as quickly as possible. Formally, the problem can be stated as the following hypothesis test:

$$\begin{aligned}
H_0:&\quad x_1, x_2, \ldots, x_t \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_0^2 I_p)\\
H_1:&\quad x_1, x_2, \ldots, x_\tau \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_0^2 I_p),\\
&\quad x_{\tau+1}, \ldots, x_t \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_0^2 I_p + \Sigma).
\end{aligned}$$

Here $\sigma_0^2$ is the noise variance, which is assumed to be known or to have been estimated from training data. Assume the signal covariance matrix $\Sigma$ is low-rank, meaning $\mathrm{rank}(\Sigma) < p$. The signal covariance matrix is unknown.
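As a concrete illustration, the following minimal NumPy sketch simulates a stream from this model; the rank-3 signal covariance and all parameter values are our own illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_stream(p, tau, t_max, sigma0_sq, Sigma):
    """x_1,...,x_tau ~ N(0, sigma0^2 I_p); afterwards N(0, sigma0^2 I_p + Sigma)."""
    pre = rng.multivariate_normal(np.zeros(p), sigma0_sq * np.eye(p), size=tau)
    post = rng.multivariate_normal(np.zeros(p), sigma0_sq * np.eye(p) + Sigma,
                                   size=t_max - tau)
    return np.vstack([pre, post])

# illustrative rank-3 signal covariance
p, s = 100, 3
U = np.linalg.qr(rng.standard_normal((p, s)))[0]    # orthonormal basis
Sigma = U @ np.diag([2.0, 1.5, 1.0]) @ U.T          # rank-3 PSD signal covariance
x = simulate_stream(p, tau=500, t_max=1000, sigma0_sq=1.0, Sigma=Sigma)
```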

One may construct a maximum likelihood ratio statistic. However, since the signal covariance matrix is unknown, we would have to form the generalized likelihood ratio statistic, which replaces the covariance matrix with the sample covariance. This may cause an issue, since the statistic involves inversion of the sample covariance matrix, whose numerical properties (such as the condition number) are usually poor when $p$ is large.

Alternatively, we consider the largest eigenvalue of the sample covariance matrix, which is a natural detection statistic for detecting a low-rank signal. We adopt a scanning-window approach: we estimate the sample covariance matrix using samples in a time window $[t-w, t]$, where $w$ is the window size. Assume $w$ is chosen sufficiently large, greater than the anticipated longest detection delay. Using the ratio of the largest eigenvalue of the sample covariance matrix relative to the noise variance, we form the maximum eigenvalue procedure, which is a stopping time given by:

$$T = \inf\left\{t : \max_{t-w < k < t} (t-k)\left[\frac{1}{\sigma_0^2}\lambda_1(\widehat{\Sigma}_{t,k}) - d\right] \geq b\right\},$$

where $d$ is a pre-set drift parameter, $b > 0$ is the threshold, and $\lambda_1(\Sigma)$ denotes the largest eigenvalue of a matrix $\Sigma$. Here the index $k$ represents the possible change-point location; hence, samples in $[k+1, t]$ correspond to post-change samples. The sample covariance matrix for the post-change samples up to time $t$ is given by:

$$\widehat{\Sigma}_{t,k} = \frac{1}{t-k}\sum_{i=k+1}^{t} x_i x_i^{\intercal}.$$

The maximization over $k$ corresponds to searching for the unknown change-point location. An alarm is fired whenever the detection statistic exceeds the threshold $b$.
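The stopping rule can be computed directly by brute force. Below is a minimal, unoptimized NumPy sketch of the procedure $T$; the function names and the per-window loop are our own, and a practical implementation would update the eigenvalue incrementally rather than recomputing it for every candidate $k$.

```python
import numpy as np

def max_eig_statistic(window, sigma0_sq, d):
    """max over k of (t-k) [ lambda_1(Sigma_hat_{t,k}) / sigma0^2 - d ]
    for one window of observations (rows indexed by time, most recent last)."""
    stat = -np.inf
    for n in range(1, window.shape[0] + 1):   # n = t - k, # of post-change samples
        xs = window[-n:]                       # samples in [k+1, t]
        lam1 = np.linalg.eigvalsh(xs.T @ xs / n)[-1]   # largest eigenvalue
        stat = max(stat, n * (lam1 / sigma0_sq - d))
    return stat

def stopping_time(x, w, sigma0_sq, d, b):
    """Stopping time T: the first t at which the windowed statistic crosses b."""
    for t in range(1, x.shape[0] + 1):
        if max_eig_statistic(x[max(0, t - w):t], sigma0_sq, d) >= b:
            return t
    return None
```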

III Performance bounds

In this section, we characterize the performance of the procedure in terms of two standard performance metrics: the expected value of the stopping time when there is no change, called the average run length (ARL), and the expected detection delay (EDD), which is the expected time to stop in the extreme case when the change occurs immediately at $\kappa = 0$.

Our approach is to decompose the maximum eigenvalue statistic into a set of $\chi^2$-CUSUM procedures. Our argument is based on the $\varepsilon$-net, which provides a convenient way to discretize the unit sphere in our case. The minimal cardinality of an $\varepsilon$-net is called the covering number, $\mathcal{C}(X, \varepsilon)$. The covering number of the unit sphere is bounded as follows.

Lemma III.1 (Lemma 5.2, [9]).

The unit Euclidean sphere $S^{p-1}$ equipped with the Euclidean metric satisfies, for every $\varepsilon > 0$,

$$\mathcal{C}(S^{p-1}, \varepsilon) \leq \left(1 + \frac{2}{\varepsilon}\right)^{p}.$$

The $\varepsilon$-net helps to derive an upper bound on the largest eigenvalue, as stated in the following lemma.

Lemma III.2 (Lemma 5.4, [9]).

Let $A$ be a symmetric $p \times p$ matrix, and let $\mathcal{C}_\varepsilon$ be an $\varepsilon$-net of $S^{p-1}$ for some $\varepsilon \in [0, 1/2)$. Then

$$\lambda_1(A) \leq (1-2\varepsilon)^{-1} \sup_{q \in \mathcal{C}_\varepsilon} |q^{\intercal} A q|.$$

Our main theoretical results are the following: a lower bound on the ARL and an approximation to the EDD of the largest eigenvalue procedure.

Theorem III.3 (Lower bound on ARL).

When $b \rightarrow \infty$,

$$\mathbb{E}^{\infty}[T] \gtrsim \frac{e^{b(1/2-\varepsilon)(1-\theta)}}{\left(\frac{1-\theta}{2}+\frac{1}{2}\log\theta\right)\left(1+\frac{2}{\varepsilon}\right)^{p}},$$

where $\theta \in (0,1)$ is the root of the equation $\log\theta + d(1-\theta)(1-2\varepsilon) = 0$, and $\varepsilon \in (0, \frac{1}{2})$.

The lower bound above can be used to control the false-alarm rate of the procedure: given a target ARL, we may choose the corresponding threshold $b$.

Let $\|\Sigma\|$ denote the spectral norm of a matrix $\Sigma$, which corresponds to the largest eigenvalue. We have the following approximation:

Theorem III.4 (Approximation to EDD).

When $b \rightarrow \infty$,

$$\mathbb{E}^{1}[T] = \frac{b + e^{-b} - 1}{[2(1+\rho^2/\sigma_0^2)]^{-1} + \frac{1}{2}\log\left(1+\rho/\sigma_0\right)}(1+o(1)), \quad (1)$$

where $\rho^2 = \|\Sigma\|$.

In (1), $\rho^2/\sigma_0^2$ represents the signal-to-noise ratio. Note that the right-hand side of (1) is a decreasing function of $\rho^2/\sigma_0^2$, which is expected, since the detection delay should be smaller when the signal-to-noise ratio is larger.
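For intuition, one can evaluate the right-hand side of (1) numerically and verify this monotonicity. The short sketch below does exactly that; the parameter values are illustrative.

```python
import numpy as np

def edd_approx(b, rho, sigma0):
    """Right-hand side of (1), dropping the (1 + o(1)) factor."""
    num = b + np.exp(-b) - 1.0
    den = 1.0 / (2.0 * (1.0 + rho**2 / sigma0**2)) \
          + 0.5 * np.log(1.0 + rho / sigma0)
    return num / den

# EDD decreases as the signal-to-noise ratio rho^2 / sigma0^2 grows
for rho in [0.5, 1.0, 2.0, 4.0]:
    print(rho, edd_approx(b=50.0, rho=rho, sigma0=1.0))
```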

The following informal derivation justifies the theorems. First, from Lemma III.2, we have that the detection statistic satisfies

$$\lambda_1(\widehat{\Sigma}_{t,k}) \leq (1-2\varepsilon)^{-1} \max_{q \in \mathcal{C}_\varepsilon} |q^{\intercal} \widehat{\Sigma}_{t,k} q|.$$

For each $q \in \mathcal{C}_\varepsilon$, $\|q\| = 1$, we have

$$\frac{(t-k)}{\sigma_0^2}\,|q^{\intercal}\widehat{\Sigma}_{t,k}\, q| = \sum_{i=k+1}^{t}\frac{(q^{\intercal}x_i)^2}{\sigma_0^2}.$$

Note that under $H_0$: $x_i \sim \mathcal{N}(0, \sigma_0^2 I_p)$, hence $q^{\intercal}x_i/\sigma_0 \sim \mathcal{N}(0, 1)$, and

$$(q^{\intercal}x_i)^2/\sigma_0^2 \sim \chi^2(1).$$

Alternatively, under $H_1$: $x_i \sim \mathcal{N}(0, \sigma_0^2 I_p + \Sigma)$, hence $q^{\intercal}x_i/\sigma_0 \sim \mathcal{N}(0, 1 + q^{\intercal}\Sigma q/\sigma_0^2)$, and hence

$$(q^{\intercal}x_i)^2/\sigma_0^2 \sim \left(1 + \frac{q^{\intercal}\Sigma q}{\sigma_0^2}\right)\chi^2(1).$$

Hence, the distribution before the change is a $\chi^2$ random variable, and after the change-point it is a scaled $\chi^2$ random variable. Now define a set of procedures, one for each $q \in \mathcal{C}_\varepsilon$:

$$\widetilde{T}_q = \inf\left\{t : \max_{k<t}\sum_{i=k+1}^{t}\left((q^{\intercal}x_i)^2/\sigma_0^2 - w^{\prime}\right) \geq b^{\prime}\right\}.$$

Note that each of these procedures is a $\chi^2$-CUSUM procedure [8]. Now if we set

$$w^{\prime} = (1-2\varepsilon)d, \quad b^{\prime} = (1-2\varepsilon)b,$$

then, due to the above relations, we may bound the average run length of the original procedure in terms of these $\chi^2$ procedures:

$$\mathbb{E}^{\infty}[T] \geq \mathbb{E}^{\infty}\left[\min_{q}\widetilde{T}_q\right].$$

Since each $\widetilde{T}_q$ is a $\chi^2$-CUSUM procedure, whose properties are well understood, we may obtain a lower bound on the ARL of the maximum eigenvalue procedure.
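Each $\widetilde{T}_q$ admits the standard CUSUM recursion $S_t = \max(0,\, S_{t-1} + (q^{\intercal}x_t)^2/\sigma_0^2 - w^{\prime})$, with $w^{\prime}$ and $b^{\prime}$ as above. A minimal sketch of one such procedure (function name is our own):

```python
import numpy as np

def chi2_cusum(x, q, sigma0_sq, w_prime, b_prime):
    """One chi^2-CUSUM procedure along direction q, via the standard
    recursion S_t = max(0, S_{t-1} + (q^T x_t)^2 / sigma0^2 - w').
    Returns the first t with S_t >= b', or None if it never stops."""
    S = 0.0
    for t, xt in enumerate(x, start=1):
        S = max(0.0, S + (q @ xt) ** 2 / sigma0_sq - w_prime)
        if S >= b_prime:
            return t
    return None
```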

Figure 3: Obtain $M$ sketches of each $p$-dimensional signal vector.

IV Sketching for rank change detection

When $p$ is large, a common practice is to use a linear projection $A$ to reduce the data dimensionality. The linear projection maps the original $p$-dimensional vector into a lower $M$-dimensional vector. We refer to such a linear projection as sketching. One implementation of sketching is illustrated in Fig. 3, where each signal vector $x_t$ is projected by $M$ vectors, $a_1, \ldots, a_M$. The sketches correspond to linear projections of the original vector:

$$y_{it} = a_i^{\intercal} x_t.$$

Define a vector of observations $y_t = [y_{1,t}, \ldots, y_{M,t}]^{\intercal} \in \mathbb{R}^{M}$, and

$$A = [a_1, \ldots, a_M] \in \mathbb{R}^{p \times M}.$$

Then we have $y_t = A^{\intercal} x_t$, $t = 1, 2, \ldots$.

One intriguing question is whether we may perform detection of the low-rank change using the linear projections. The answer is yes, as we show in the following. We first show that each linear sketch corresponds to a bilinear projection of the original covariance matrix. Define the sample covariance matrix of the sketches:

$$\begin{aligned}
\widehat{\Sigma}^{y}_{t,k} &= \frac{1}{t-k}\sum_{i=k+1}^{t} y_i y_i^{\intercal}
= \frac{1}{t-k}\sum_{i=k+1}^{t}
\begin{bmatrix}
y_{1i}^2 & y_{1i}y_{2i} & \cdots & y_{1i}y_{Mi}\\
y_{2i}y_{1i} & y_{2i}^2 & \cdots & \vdots\\
\vdots & & \ddots & \\
y_{Mi}y_{1i} & y_{Mi}y_{2i} & \cdots & y_{Mi}^2
\end{bmatrix}\\
&= \frac{1}{t-k}\sum_{i=k+1}^{t}
\begin{bmatrix}
a_1^{\intercal} x_i x_i^{\intercal} a_1 & a_1^{\intercal} x_i x_i^{\intercal} a_2 & \cdots & a_1^{\intercal} x_i x_i^{\intercal} a_M\\
a_2^{\intercal} x_i x_i^{\intercal} a_1 & a_2^{\intercal} x_i x_i^{\intercal} a_2 & \cdots & \vdots\\
\vdots & & \ddots & \\
a_M^{\intercal} x_i x_i^{\intercal} a_1 & a_M^{\intercal} x_i x_i^{\intercal} a_2 & \cdots & a_M^{\intercal} x_i x_i^{\intercal} a_M
\end{bmatrix}
= A^{\intercal}\widehat{\Sigma}_{t,k} A. \quad (2)
\end{aligned}$$

A key observation is that for certain choices of $A$, e.g., a Gaussian random matrix, when $M > \mathrm{rank}(\Sigma)$, the rank of $\widehat{\Sigma}^{y}_{t,k}$ is equal to the rank of $\widehat{\Sigma}_{t,k}$ with probability one. Hence, we may detect the change using the largest eigenvalue of the sample covariance matrix of $y_t$.
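A quick numerical check of this observation, using the identity (2) and a noiseless rank-$s$ example; all parameter values here are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, s, M, n = 100, 3, 10, 50                  # M > rank(Sigma) = s

U = np.linalg.qr(rng.standard_normal((p, s)))[0]
X = rng.standard_normal((n, s)) @ U.T        # noiseless rank-s samples (rows)
Sigma_hat = X.T @ X / n                      # p x p sample covariance, rank s

A = rng.standard_normal((p, M))              # Gaussian sketching matrix
Sigma_hat_y = A.T @ Sigma_hat @ A            # covariance of the sketches, by (2)

print(np.linalg.matrix_rank(Sigma_hat),      # -> 3
      np.linalg.matrix_rank(Sigma_hat_y))    # -> 3, with probability one
```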

A related line of work is covariance sketching, where the goal is to recover a low-rank covariance matrix using quadratic sketches $(a_i^{\intercal}x_t)^2$, which are the squares of the linear sketches. This corresponds to using only the diagonal entries of the sample covariance matrix (2) of the sketches.

Figure 4: EDD (upper) and logarithm of EDD (lower) versus $M$ (the number of sketches), for various SNRs obtained by varying $\rho$. Here $p = 100$, $\sigma_0^2 = 1$, and the post-change covariance matrix is generated as $\sigma_0^2 I_p + \rho\,\times$ a Gaussian random matrix with rank 3.

The sketches $y_t$ are still multivariate Gaussian distributed, and the projection changes the covariance. For the sketches, we may consider the following hypothesis test, based on the original one:

$$\begin{aligned}
H_0:&\quad y_1, y_2, \ldots, y_t \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_0^2 A^{\intercal}A)\\
H_1:&\quad y_1, y_2, \ldots, y_\tau \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_0^2 A^{\intercal}A),\\
&\quad y_{\tau+1}, \ldots, y_t \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma_0^2 A^{\intercal}A + A^{\intercal}\Sigma A).
\end{aligned}$$

Without loss of generality, to preserve the noise property (or equivalently, to avoid amplifying the noise), we choose the projection to be orthonormal, i.e., $A^{\intercal}A = I_M$. Moreover, due to the Gaussian form, the analysis of the ARL and EDD of the procedure in Section III still holds, with $\rho^2$ replaced by $\|A^{\intercal}\Sigma A\|$. Hence, it can be seen from this analysis that the projection affects the signal-to-noise ratio; not surprisingly, although in principle we only need $M$ to be greater than the rank of $\Sigma$, we cannot choose $M$ to be arbitrarily small.
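The following short sketch illustrates this point numerically: an orthonormal $A$ (generated here via a QR factorization, an implementation choice of ours) leaves the noise covariance as $\sigma_0^2 I_M$, while the effective signal strength $\|A^{\intercal}\Sigma A\|$ is reduced relative to $\|\Sigma\|$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, M, s = 100, 20, 3

A = np.linalg.qr(rng.standard_normal((p, M)))[0]   # orthonormal: A^T A = I_M
U = np.linalg.qr(rng.standard_normal((p, s)))[0]
Sigma = U @ np.diag([2.0, 1.5, 1.0]) @ U.T          # illustrative rank-3 signal

print(np.allclose(A.T @ A, np.eye(M)))              # noise stays sigma0^2 I_M
print(np.linalg.norm(Sigma, 2),                     # rho^2 with the full data
      np.linalg.norm(A.T @ Sigma @ A, 2))           # reduced signal strength
```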

Fig. 4 illustrates this effect. In this example, we use a random subspace $A$ of dimension $p$-by-$M$ and vary the number of sketches $M$; the post-change covariance matrix of $x_t$ is generated as $\sigma_0^2 I_p$ plus a Gaussian random matrix with rank 3. We then plot the EDD of the largest eigenvalue procedure when sketches are used. We calibrate the thresholds in each setting so that the ARL is fixed at 5000. Note that when the number of sketches is sufficiently large (and greater than $\mathrm{rank}(\Sigma)$), the EDD approaches a small number.
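Threshold calibration of this kind can be done by Monte Carlo simulation under $H_0$. A rough sketch is below; it reuses the stopping_time helper from the Section II sketch, the parameter values are illustrative, and the run sizes are kept deliberately small since the brute-force statistic is slow.

```python
import numpy as np

def estimate_arl(b, p, w, sigma0_sq, d, n_runs=10, t_max=5000, seed=0):
    """Monte Carlo estimate of E^infty[T] for a given threshold b, using
    stopping_time() from the Section II sketch. Runs that never stop are
    censored at t_max, so the estimate is a lower bound."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(n_runs):
        x = np.sqrt(sigma0_sq) * rng.standard_normal((t_max, p))  # H0 data
        T = stopping_time(x, w, sigma0_sq, d, b)
        total += T if T is not None else t_max
    return total / n_runs

# increase b until the estimated ARL reaches the target (e.g., 5000);
# all parameter values below are illustrative only
print(estimate_arl(b=30.0, p=20, w=20, sigma0_sq=1.0, d=3.0))
```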

The following analysis may shed some light on how small we may choose $M$ to be. Let $s = \mathrm{rank}(\Sigma)$, which is smaller than the ambient dimension $p$. The EDD (1) is a decreasing function of the SNR, which is proportional to $\|A^{\intercal}\Sigma A\|$. Denote the eigendecomposition of $\Sigma$ by $U\Lambda U^{\intercal}$. Clearly, the SNR for the sketching case will depend on $A^{\intercal}U$, which depends on the principal angles between the two subspaces. Recall that we have required $A$ to be orthonormal, $A^{\intercal}A = I_M$; hence the SNR will depend on the principal angles between the random projection and the signal subspace $U$. The principal angles between two random subspaces are studied in [10], where the squared cosines of the principal angles are shown to be jointly Beta distributed. Based on this fact, the following lemma is obtained in [11], which helps us to understand the behavior of the procedure. It characterizes the $\ell_2$ norm of a fixed unit-norm vector projected by a random subspace.

Lemma IV.1 (Theorem 3 [11]).

Let $A$ be a random subspace. Then for any fixed vector $u$ with $\|u\| = 1$, $\|A^{\intercal}u\|^2 \sim \mathrm{Beta}(m/2, (p-m)/2)$; when $p \rightarrow \infty$ with $\delta = \lim_{p\rightarrow\infty} m/p$, for $0 < \varepsilon < \min(\delta, 1-\delta)$,

$$\mathbb{P}(\delta - \varepsilon < \|A^{\intercal}u\|^2 < \delta + \varepsilon) \rightarrow 1.$$

Using this lemma in our setting, we may obtain a simple lower bound for $\|A^{\intercal}U\|$:

$$\|A^{\intercal}U\| = \left\|[A^{\intercal}u_1 \cdots A^{\intercal}u_s]\right\| = \max_{z:\|z\|=1}\left\|\sum_i A^{\intercal}u_i z_i\right\| \geq \max_{i=1}^{s}\|A^{\intercal}u_i\| \gtrsim \frac{M}{p}\log s, \quad (3)$$

with high probability, where the last inequality is due to the maximum of a set of Beta random variables. Hence, the number of sketches $M$ should scale as $p/\log s$.
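A small simulation in the spirit of Fig. 5, with illustrative dimensions of our own choosing, shows how $\|A^{\intercal}U\|$ grows with $M$:

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 500, 5
U = np.linalg.qr(rng.standard_normal((p, s)))[0]      # signal subspace

for M in [10, 50, 100, 250]:
    A = np.linalg.qr(rng.standard_normal((p, M)))[0]  # random orthonormal A
    print(M, np.linalg.norm(A.T @ U, 2))              # grows with M, cf. (3)
```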

Figure 5: The spectral norm $\|A^{\intercal}U\|$ versus $M$, as the number of sketches $M$ increases. Here $A$ is a random subspace of dimension $p$-by-$M$, and $U$ is an orthonormal matrix of dimension $p$-by-$s$. Note that $\|A^{\intercal}U\|$ increases slowly as $s$ increases, and it also increases as $M$ increases, which is consistent with our theory in (3).

V Online implementation via subspace tracking

Figure 6: Detection statistics computed using subspace tracking via GROUSE [12]: (a) $\max_i |[\beta_t]_i|$; (b) $\|\beta_t\|^2$. There is a change-point at $\kappa = 500$. In this example, $p = 100$, $s = 10$, and $\sigma_0^2 = 0.01$. This example considers missing data: only about $70\%$ of the entries are observed, i.e., at each time we only randomly observe a subset of the entries of $x_t$.

There is a connection between the low-rank covariance and a subspace model. Assume a rank-$s$ post-change covariance matrix $\Sigma$ with eigendecomposition $U\Lambda U^{\intercal}$. Then we may express each observation vector as $x_t = U\beta_t + w_t$, where $w_t$ is a $p$-dimensional Gaussian random vector with covariance matrix $\sigma_0^2 I_p$. Before the change, $\beta_t = 0$. After the change, $\beta_t \neq 0$ and is an $s$-dimensional Gaussian random vector with covariance matrix $\Lambda$. Hence, we may detect the low-rank change by detecting a nonzero $\beta_t$ if $U$ is known. When $U$ is unknown, we may perform online subspace estimation from a sequence of data.
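When $U$ is known, this reduces to monitoring $\widehat{\beta}_t = U^{\intercal}x_t$. A minimal simulation of this matched-subspace statistic follows; the dimensions match the setting of Fig. 6, but the data generation (with $\Lambda = I_s$) is our own illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
p, s, tau, t_max, sigma0 = 100, 10, 500, 1000, 0.1   # sigma0^2 = 0.01

U = np.linalg.qr(rng.standard_normal((p, s)))[0]     # subspace, known here
stats = []
for t in range(t_max):
    beta = rng.standard_normal(s) if t >= tau else np.zeros(s)  # 0 pre-change
    x = U @ beta + sigma0 * rng.standard_normal(p)   # x_t = U beta_t + w_t
    beta_hat = U.T @ x                               # projection onto subspace
    stats.append(beta_hat @ beta_hat)                # ||beta_t||^2 statistic
# stats[t] hovers near s * sigma0^2 before tau and jumps up after the change
```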

Based on this connection, we may develop an efficient way to compute the detection statistic online via subspace tracking. This is related to the so-called matched-subspace detection [13], and here we further combine matched-subspace detection with online subspace estimation. Starting with an initialization of the subspace $U_0$, given the sequence of observations $x_1, x_2, \ldots$, we may update the subspace using stochastic gradient descent on the Grassmannian manifold, e.g., via the GROUSE algorithm [12]. We then project the next vector, $U_{t-1}^{\intercal}x_t$, which gives an estimate of $\beta_t$. We may claim a change when either $\max_i |[\beta_t]_i|$ (mimicking the largest eigenvalue) or the squared norm $\|\beta_t\|^2$ (mimicking the sum of the eigenvalues) becomes large. Since subspace tracking algorithms such as GROUSE can even handle missing data, this approach allows us to compute the detection statistic even when we can only observe a subset of the entries of $x_t$ at each time. Fig. 6 demonstrates the detection statistic computed via GROUSE subspace tracking when only about $70\%$ of the entries are observed at random locations. There is a change-point at time $500$. Clearly, the detection statistic computed via subspace tracking detects the change by exhibiting a large peak.
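For reference, here is a simplified full-observation variant of the GROUSE rank-one update (following [12] in spirit; the step-size handling and function name are our own, and the missing-data case would instead use least-squares weights on the observed entries and an imputed residual as in [12]).

```python
import numpy as np

def grouse_step(U, x, eta):
    """One GROUSE-style rank-one update of an orthonormal basis U (p x s)
    from a fully observed vector x. Returns the updated basis and the
    weight vector w, which plays the role of the estimate of beta_t."""
    w = U.T @ x                          # least-squares weights (U orthonormal)
    pvec = U @ w                         # projection of x onto current subspace
    r = x - pvec                         # residual, orthogonal to the subspace
    sigma = np.linalg.norm(r) * np.linalg.norm(pvec)
    if sigma < 1e-12:                    # nothing to rotate toward
        return U, w
    theta = eta * sigma                  # step along the geodesic
    step = ((np.cos(theta) - 1.0) * pvec / np.linalg.norm(pvec)
            + np.sin(theta) * r / np.linalg.norm(r))
    return U + np.outer(step, w / np.linalg.norm(w)), w
```

Feeding each new $x_t$ through grouse_step and monitoring $\|w\|^2$ (or $\max_i |w_i|$) reproduces, in spirit, the statistics plotted in Fig. 6.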

VI Conclusion

We have presented a sequential change-point detection procedure based on the largest eigenvalue statistic for detecting a low-rank change in the signal covariance matrix. It is related to the so-called spiked covariance model. We present a lower bound on the average run length (ARL) of the procedure when there is no change, which can be used to control the false-alarm rate. We also present an approximation to the expected detection delay (EDD) of the proposed procedure, which characterizes the dependence of the EDD on the spectral norm of the post-change covariance matrix $\Sigma$. Our theoretical results are obtained using an $\varepsilon$-net argument, which leads to a decomposition of the proposed procedure into a set of $\chi^2$-CUSUM procedures. We further present a sketching procedure, which linearly projects the original observations into lower-dimensional sketches and performs the low-rank change detection using these sketches. We demonstrate that when the number of sketches is sufficiently large (on the order of $p/\log s$, with $s$ being the rank of the post-change covariance), the sketching procedure can detect the change with little performance loss relative to the original procedure using the full data. Finally, we present an approach to compute the detection statistic online via subspace tracking.

Acknowledgement

The author would like to thank Dake Zhang at Tsinghua University, Lauren Hue-Seversky and Matthew Berger at the Air Force Research Lab, Information Directorate, for stimulating discussions, and Shuang Li at Georgia Tech for help with numerical examples. This work is partially supported by an AFRI Visiting Faculty Fellowship.

References

  • [1] M. Berger, L. Seversky, and D. Brown, “Classifying swarm behavior via compressive subspace learning,” in IEEE International Conference on Robotics and Automation, 2016.
  • [2] D. Hambling, “U.S. Navy plans to fly first drone swarm this summer.” http://www.defensetech.org/2016/01/04/u-s-navy-plans-to-fly-first-drone-swarm-this-summer/, Jan. 2016.
  • [3] I. M. Johnstone, “On the distribution of the largest eigenvalue in principal component analysis,” Ann. Statist., vol. 29, pp. 295–327, 2001.
  • [4] Y. Chen, Y. Chi, and A. J. Goldsmith, “Exact and stable covariance estimation from quadratic sampling via convex programming,” IEEE Trans. on Info. Theory, vol. 61, pp. 4034–4059, July 2015.
  • [5] T. Cai, Z. Ma, and Y. Wu, “Optimal estimation and rank detection for sparse spiked covariance matrices,” arXiv:1305.3235, 2016.
  • [6] Q. Berthet and P. Rigollet, “Optimal detection of sparse principal components in high dimension,” Ann. Statist., vol. 41, no. 1, pp. 1780–1815, 2013.
  • [7] J. Taylor, J. Loftus, and R. Tibshirani, “Inference in adaptive regression via the Kac-Rice formula,” Ann. Statist., vol. 44, no. 2, pp. 743–770, 2016.
  • [8] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, pp. 100–115, June 1954.
  • [9] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” arXiv:1011.2037, 2011.
  • [10] P.-A. Absil, A. Edelman, and P. Koev, “On the largest principal angle between random subspaces,” Linear Algebra and Appl., 2006.
  • [11] Y. Cao, A. Thompson, M. Wang, and Y. Xie, “Sketching for sequential change-point detection,” arXiv:1505.06770, 2015.
  • [12] D. Zhang and L. Balzano, “Global convergence of a Grassmannian gradient descent algorithm for subspace estimation,” arXiv:1506.07405, 2015.
  • [13] L. L. Scharf and B. Friedlander, “Matched subspace detectors,” IEEE Trans. Signal Processing, vol. 42, no. 8, pp. 2146–2157, 1994.