Tests for principal eigenvalues and eigenvectors
Abstract
We establish central limit theorems for principal eigenvalues and eigenvectors under a large factor model setting, and develop two-sample tests of both principal eigenvalues and principal eigenvectors. One important application is to detect structural breaks in large factor models. Compared with existing methods for detecting structural breaks, our tests provide unique insights into the source of structural breaks because they can attribute a break to individual principal eigenvalues and/or eigenvectors. We demonstrate the application by comparing the principal eigenvalues and principal eigenvectors of S&P500 Index constituents’ daily returns over different years.
Keywords: Factor model; principal eigenvalues; principal eigenvectors; central limit theorem; two-sample test
1 Introduction
Factor models have been widely adopted in many disciplines, most notably economics and finance. Some of the most famous examples include the capital asset pricing model (CAPM, Sharpe (1964)), arbitrage pricing theory (Ross (1976)), the approximate factor model (Chamberlain and Rothschild (1983)), the Fama-French three-factor model (Fama and French (1992)) and the more recent five-factor model (Fama and French (2015)).
Statistically, the analysis of factor models is closely related to principal component analysis (PCA). For example, finding the number of factors is equivalent to determining the number of principal eigenvalues (Bai and Ng (2002); Onatski (2010); Ahn and Horenstein (2013)); estimating factor loadings as well as factors relies on principal eigenvectors (Stock and Watson (1998, 2002); Bai (2003); Bai and Ng (2006); Fan et al. (2011, 2013); Wang and Fan (2017)).
A factor model typically reads as follows:
$$y_{it} = \mathbf{b}_i^{\top}\mathbf{f}_t + u_{it}, \qquad i = 1,\dots,p,\ t = 1,\dots,n, \tag{1}$$
where $y_{it}$ is the observation from the $i$th subject at time $t$, $\mathbf{f}_t$ is a set of $k$ factors, $\mathbf{b}_i$ is the corresponding loading vector, and $u_{it}$ is the idiosyncratic component. The number of factors, $k$, is small compared with the dimension $p$, and is assumed to be fixed throughout the paper. The factor model (1) can be put in a matrix form as
$$\mathbf{y}_t = \mathbf{B}\mathbf{f}_t + \mathbf{u}_t,$$
where $\mathbf{y}_t = (y_{1t},\dots,y_{pt})^{\top}$, $\mathbf{B} = (\mathbf{b}_1,\dots,\mathbf{b}_p)^{\top}$ and $\mathbf{u}_t = (u_{1t},\dots,u_{pt})^{\top}$. It follows that the covariance matrix $\boldsymbol{\Sigma}$ of $\mathbf{y}_t$ satisfies
$$\boldsymbol{\Sigma} = \mathbf{B}\,\mathrm{Cov}(\mathbf{f}_t)\,\mathbf{B}^{\top} + \boldsymbol{\Sigma}_u,$$
where $\boldsymbol{\Sigma}_u$ is the covariance matrix of $\mathbf{u}_t$.
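To fix ideas, the following minimal simulation sketch (our own code, with illustrative sizes and loadings; not from the paper) generates data from model (1) with $\mathrm{Cov}(\mathbf{f}_t) = \mathbf{I}_k$ and checks that the sample covariance matrix approaches $\mathbf{B}\mathbf{B}^{\top} + \boldsymbol{\Sigma}_u$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 100, 2000, 3

B = rng.normal(size=(p, k))                  # factor loading matrix (illustrative)
sigma_u = 0.5                                # idiosyncratic standard deviation

F = rng.normal(size=(k, n))                  # factors with Cov(f_t) = I_k
U = sigma_u * rng.normal(size=(p, n))        # idiosyncratic components
Y = B @ F + U                                # matrix form of model (1)

Sigma_pop = B @ B.T + sigma_u**2 * np.eye(p)  # B Cov(f) B' + Sigma_u
Sigma_hat = Y @ Y.T / n                       # sample covariance (mean-zero data)

# The relative spectral-norm error is small when n is large relative to p.
err = np.linalg.norm(Sigma_hat - Sigma_pop, 2) / np.linalg.norm(Sigma_pop, 2)
print(f"relative error: {err:.3f}")
```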
The factors in some situations are taken to be observable. Examples include the market factor in CAPM and the Fama-French three factors. In some other situations, factors are latent and hence unobservable. In this paper, we focus on the latent factor case.
Factor models provide a parsimonious way to describe the dynamics of large dimensional variables. In the study of factor models, time invariance of factor loadings is a standard assumption. For example, in order to apply PCA, the loadings need to be time invariant, or at least roughly so; otherwise the estimation will be inconsistent. However, parameter instability has been a pervasive phenomenon in time series data. Such instability could be due to policy regime switches, changes in economic/financial fundamentals, etc. For this reason, caution has to be exercised about potential structural changes in real data. Statistical analysis of structural change in large factor models is challenging because the factors are unobserved and the factor loadings have to be estimated.
There is some existing work on detecting structural breaks. Typically, the setup is as follows: suppose there are two time periods, one from time $1$ to $T_1$, the second from $T_1+1$ to $T$, where the two period lengths are not necessarily equal. The first period has loading matrix $\mathbf{B}_1$, and the second period has loading matrix $\mathbf{B}_2$. One then tests whether $\mathbf{B}_1$ equals $\mathbf{B}_2$. Specifically, one considers the following model:
$$\mathbf{y}_t = \begin{cases} \mathbf{B}_1\mathbf{f}_t + \mathbf{u}_t, & 1 \le t \le T_1, \\ \mathbf{B}_2\mathbf{f}_t + \mathbf{u}_t, & T_1 < t \le T, \end{cases}$$
and tests the following hypothesis for detecting structural breaks:
$$H_0:\ \mathbf{B}_1 = \mathbf{B}_2 \quad \text{versus} \quad H_1:\ \mathbf{B}_1 \neq \mathbf{B}_2.$$
Existing works include Stock and Watson (2009), Breitung and Eickmeier (2011), Chen et al. (2014) and Han and Inoue (2015), among others.
Let us connect the factor loadings with principal eigenvalues and eigenvectors. Recall that $\boldsymbol{\Sigma}$ stands for the covariance matrix of $\mathbf{y}_t$. Write its spectral decomposition as
$$\boldsymbol{\Sigma} = \boldsymbol{\Gamma}\boldsymbol{\Lambda}\boldsymbol{\Gamma}^{\top},$$
where the diagonal matrix $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1,\dots,\lambda_p)$ consists of the eigenvalues in descending order, and $\boldsymbol{\Gamma} = (\boldsymbol{\gamma}_1,\dots,\boldsymbol{\gamma}_p)$ consists of the corresponding eigenvectors. Under the convention that $\mathrm{Cov}(\mathbf{f}_t) = \mathbf{I}_k$, the factor loading matrix is determined, up to an orthogonal rotation and up to the perturbation caused by $\boldsymbol{\Sigma}_u$, by the $k$ principal eigenvalues and eigenvectors.
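To see why, consider the idealized case $\boldsymbol{\Sigma}_u = \mathbf{0}$ (a simplification we introduce here purely for intuition). With $\mathrm{Cov}(\mathbf{f}_t) = \mathbf{I}_k$ and the factorization $\mathbf{B} = \boldsymbol{\Gamma}_1\boldsymbol{\Lambda}_1^{1/2}\mathbf{R}$, where $\boldsymbol{\Gamma}_1 \in \mathbb{R}^{p \times k}$ has orthonormal columns, $\boldsymbol{\Lambda}_1$ is diagonal and $\mathbf{R}$ is orthogonal, we have
$$\boldsymbol{\Sigma} = \mathbf{B}\mathbf{B}^{\top} = \boldsymbol{\Gamma}_1\boldsymbol{\Lambda}_1^{1/2}\mathbf{R}\mathbf{R}^{\top}\boldsymbol{\Lambda}_1^{1/2}\boldsymbol{\Gamma}_1^{\top} = \boldsymbol{\Gamma}_1\boldsymbol{\Lambda}_1\boldsymbol{\Gamma}_1^{\top},$$
so the nonzero eigenpairs of $\boldsymbol{\Sigma}$ are exactly the columns of $\boldsymbol{\Gamma}_1$ with eigenvalues in $\boldsymbol{\Lambda}_1$, and $\mathbf{B} = \boldsymbol{\Gamma}_1\boldsymbol{\Lambda}_1^{1/2}\mathbf{R}$ is recovered from the principal eigen-structure up to the rotation $\mathbf{R}$.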
Therefore structural breaks can be due to changes in
- (i) one or more principal eigenvalues, or
- (ii) one or more principal eigenvectors, or
- (iii) both.
The economic and/or financial implications of these possibilities are, however, completely different. If a structural break is only due to changes in eigenvalues, then in many applications the structural break has no essential impact. For example, from a dimension-reduction point of view, if the principal eigenvalues change while the principal eigenvectors do not, then projecting onto the principal eigenvectors remains valid. In contrast, if a structural break is caused by eigenvectors, then it may indicate a much more fundamental change, possibly associated with important economic or market condition changes, to which one should be alerted.
Such observations bring up the aim of this paper: instead of testing whether the whole matrix is the same during two periods, we want to detect changes in individual principal eigenvalues and eigenvectors. By doing so, we can pinpoint the source of structural changes. Specifically, when a structural break occurs, we can determine whether it is caused by a change in a principal eigenvalue, a change in a principal eigenvector, or perhaps changes in both principal eigenvalues and eigenvectors.
To be more specific, we consider the following three tests; a numerical sketch of the quantities involved is given after the list. Let $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ be the population covariance matrices for the two periods under study. For any symmetric matrix $\mathbf{A}$ and any integer $j$, we let $\lambda_j(\mathbf{A})$ denote the $j$th largest eigenvalue of $\mathbf{A}$, $\boldsymbol{\gamma}_j(\mathbf{A})$ the corresponding eigenvector, and $\mathrm{tr}(\mathbf{A})$ its trace.
- (i) Test equality of principal eigenvalues: for each $j = 1,\dots,k$, we test
$$H_0^{(1)}:\ \lambda_j(\boldsymbol{\Sigma}_1) = \lambda_j(\boldsymbol{\Sigma}_2) \quad \text{versus} \quad H_1^{(1)}:\ \lambda_j(\boldsymbol{\Sigma}_1) \neq \lambda_j(\boldsymbol{\Sigma}_2),$$
where $k$ is the number of principal eigenvalues.
- (ii) Considering that the total variation may vary, we test equality of the ratios of principal eigenvalues: for each $j = 1,\dots,k$, test
$$H_0^{(2)}:\ \frac{\lambda_j(\boldsymbol{\Sigma}_1)}{\mathrm{tr}(\boldsymbol{\Sigma}_1)} = \frac{\lambda_j(\boldsymbol{\Sigma}_2)}{\mathrm{tr}(\boldsymbol{\Sigma}_2)} \quad \text{versus} \quad H_1^{(2)}:\ \frac{\lambda_j(\boldsymbol{\Sigma}_1)}{\mathrm{tr}(\boldsymbol{\Sigma}_1)} \neq \frac{\lambda_j(\boldsymbol{\Sigma}_2)}{\mathrm{tr}(\boldsymbol{\Sigma}_2)}.$$
- (iii) Most importantly, we test equality of principal eigenvectors: for each $j = 1,\dots,k$, test
$$H_0^{(3)}:\ \left\langle \boldsymbol{\gamma}_j(\boldsymbol{\Sigma}_1),\, \boldsymbol{\gamma}_j(\boldsymbol{\Sigma}_2) \right\rangle = 1 \quad \text{versus} \quad H_1^{(3)}:\ \left\langle \boldsymbol{\gamma}_j(\boldsymbol{\Sigma}_1),\, \boldsymbol{\gamma}_j(\boldsymbol{\Sigma}_2) \right\rangle \neq 1,$$
where the eigenvectors are normalized to have unit length and a common sign convention, and $\langle \mathbf{a}, \mathbf{b} \rangle$ denotes the inner product of two vectors $\mathbf{a}$ and $\mathbf{b}$.
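As a concrete illustration, the raw ingredients of the three tests can be computed as in the following sketch (our own code; the paper's actual test statistics, with their proper standardizations, are defined in Section 4):

```python
import numpy as np

def eigen_ingredients(Y1, Y2, k):
    """Compute, for j = 1..k, the leading sample eigenvalues, the
    eigenvalue/trace ratios, and the inner products of the matched leading
    sample eigenvectors from two p x n (mean-zero) data matrices."""
    def leading(Y):
        S = Y @ Y.T / Y.shape[1]              # sample covariance (mean zero)
        vals, vecs = np.linalg.eigh(S)        # eigh returns ascending order
        return vals[::-1], vecs[:, ::-1]      # flip to descending order
    vals1, vecs1 = leading(Y1)
    vals2, vecs2 = leading(Y2)
    # Column-wise inner products |<g1_j, g2_j>|, sign-invariant.
    inner = np.abs(np.sum(vecs1[:, :k] * vecs2[:, :k], axis=0))
    return {
        "eigenvalues": (vals1[:k], vals2[:k]),
        "ratios": (vals1[:k] / vals1.sum(), vals2[:k] / vals2.sum()),
        "inner_products": inner,
    }
```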
In this paper, we establish central limit theorems (CLTs) for principal eigenvalues, eigenvalue ratios, and eigenvectors. We then develop two-sample tests based on these CLTs.
Due to the wide application of PCA, much work has been devoted to investigating principal eigenvalues. However, the study of principal eigenvectors remains very limited. This paper represents a significant advancement in this direction.
We remark that there is an independent work, Bao et al. (2022), that studies similar questions. Nevertheless, there are several significant differences between Bao et al. (2022) and our paper. First, the non-principal eigenvalues are assumed to be equal in Bao et al. (2022); see equation (1.2) therein. This is an unrealistic assumption in many applications. In our paper, we allow the non-principal eigenvalues to follow an arbitrary distribution, rendering our results readily applicable in practice. Second, in Bao et al. (2022), the ratio of the dimension to the sample size needs to stay away from one; see Assumption 2.4 therein. We do not impose such a restriction in our paper. Third, Bao et al. (2022) only consider the one-sample situation and study the projection of sample leading eigenvectors onto a given direction. In our paper, we establish two-sample CLTs, where the projection of a principal eigenvector onto a random direction is considered. Establishing such a result presents a significant challenge. In summary, the setting of our paper is practically appropriate, and the results are of significant importance.
The organization of the paper is as follows. Theoretical results are presented in Sections 2–4. Simulation and empirical studies are presented in Sections 5 and 6, respectively. Proofs are collected in the Appendix.
Notation: we use the following notation in addition to what has been introduced above. For a $p \times p$ symmetric matrix $\mathbf{A}$, its empirical spectral distribution (ESD) is defined as
$$F^{\mathbf{A}}(x) = \frac{1}{p}\sum_{j=1}^{p} \mathbf{1}\{\lambda_j(\mathbf{A}) \le x\},$$
where $\mathbf{1}\{\cdot\}$ is the indicator function. The limit of the ESD as $p \to \infty$, if it exists, is referred to as the limiting spectral distribution, or LSD for short. For any vector $\mathbf{v}$, let $v_i$ be its $i$th entry. We use $\Rightarrow$ to denote weak convergence.
2 Setting and Assumptions
We assume that $\mathbf{y}_1,\dots,\mathbf{y}_n$ is a sequence of i.i.d. $p$-dimensional random vectors with mean zero and covariance matrix $\boldsymbol{\Sigma}$. Let $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ be the eigenvalues of $\boldsymbol{\Sigma}$ in descending order, and $\boldsymbol{\gamma}_1,\dots,\boldsymbol{\gamma}_p$ be the corresponding eigenvectors. Write $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1,\dots,\lambda_p)$ and $\boldsymbol{\Gamma} = (\boldsymbol{\gamma}_1,\dots,\boldsymbol{\gamma}_p)$. Then the spectral decomposition of $\boldsymbol{\Sigma}$ is given by $\boldsymbol{\Sigma} = \boldsymbol{\Gamma}\boldsymbol{\Lambda}\boldsymbol{\Gamma}^{\top}$.
We make the following assumptions.
Assumption A: The eigenvalues of $\boldsymbol{\Sigma}$ satisfy:
- (A.i) for the principal part, $\lambda_j / p \to d_j > 0$ for $j = 1,\dots,k$, where $k$ is a fixed integer and the $d_j$'s are distinct;
- (A.ii) for the non-principal part, there exists a constant $C > 0$ such that $\lambda_j \le C$ for $j = k+1,\dots,p$, and the empirical distribution of $\{\lambda_j : j > k\}$ tends to a distribution $H$.
Remark 1.
Assumption (A.i) implies that the factors are strong. When the factors are weak, say $\lambda_j \asymp p^{\alpha}$ for some $\alpha \in (0,1)$, the convergence of sample principal components still holds, with the convergence rate depending on $\alpha$. In this paper, we focus on the strong factor case and leave the study of weak factors for future work.
Assumption B:
The observations can be written as $\mathbf{y}_t = \boldsymbol{\Gamma}\boldsymbol{\Lambda}^{1/2}\mathbf{z}_t$, where
$\mathbf{z}_t = (z_{1t},\dots,z_{pt})^{\top}$, $t = 1,\dots,n$, are i.i.d. random vectors whose entries $z_{it}$ are independent random variables with zero mean, unit variance and finite fourth moments.
Remark 2.
Assumption B covers the multivariate normal case and coincides with the idea of PCA. Specifically, if $\mathbf{y}_t$ follows a multivariate normal distribution, then $\mathbf{z}_t = \boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Gamma}^{\top}\mathbf{y}_t$ is a $p$-dimensional standard normal random vector and Assumption B holds naturally. On the other hand, under the orthogonal basis $\boldsymbol{\gamma}_1,\dots,\boldsymbol{\gamma}_p$, the coordinates of $\mathbf{y}_t$ are $\boldsymbol{\gamma}_i^{\top}\mathbf{y}_t = \sqrt{\lambda_i}\, z_{it}$. Assumption B says that these coordinate variables are independent with mean zero and variance $\lambda_i$.
Assumption C: The dimension $p$ and sample size $n$ are such that $p/n \to c \in (0,\infty)$ as $n \to \infty$.
3 One-sample Asymptotics
Let $\hat{\boldsymbol{\Sigma}}$ be the sample covariance matrix defined as
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{t=1}^{n} \mathbf{y}_t\mathbf{y}_t^{\top}.$$
Denote its eigenvalues by $\hat\lambda_1 \ge \hat\lambda_2 \ge \dots \ge \hat\lambda_p$, and let $\hat{\boldsymbol{\gamma}}_1,\dots,\hat{\boldsymbol{\gamma}}_p$ be the corresponding eigenvectors.
Theorem 1.
Under Assumptions A–C, the principal eigenvalues converge weakly to a multivariate normal distribution:
$$\sqrt{n}\left(\frac{\hat\lambda_j}{\lambda_j} - 1\right)_{j=1,\dots,k} \Rightarrow N_k(\mathbf{0}, \boldsymbol{\Theta}), \tag{2}$$
where $\boldsymbol{\Theta}$ is a diagonal matrix with diagonal entries $\theta_1,\dots,\theta_k$.
Remark 3.
Remark 4.
By Theorem 3 below, the variance can be consistently estimated, hence a feasible CLT is readily available.
Theorem 2.
Under Assumptions A–C, for each $j = 1,\dots,k$, we have
$$\sqrt{n}\left(\frac{\hat\lambda_j/\mathrm{tr}(\hat{\boldsymbol{\Sigma}})}{\lambda_j/\mathrm{tr}(\boldsymbol{\Sigma})} - 1\right) \Rightarrow N(0, \sigma_j^2), \tag{3}$$
where $\sigma_j^2$ denotes the corresponding asymptotic variance.
Remark 5.
Theorem 3.
Under Assumptions A–C, for each , the principal sample eigenvector satisfies
where , which can be consistently estimated by
and ’s are i.i.d. standard normal random variables.
Remark 6.
The convergence rate of the principal sample eigenvectors was established in Theorem 3.2 of Wang and Fan (2017). We derive the corresponding limiting distribution at the boundary of the parameter space, which is much more difficult to prove.
4 Two-sample Tests
We now discuss how to conduct the three tests mentioned in the Introduction.
Suppose that we have two groups of observations of the same dimension $p$:
$$\mathbf{y}_1^{(1)},\dots,\mathbf{y}_{n_1}^{(1)} \quad \text{and} \quad \mathbf{y}_1^{(2)},\dots,\mathbf{y}_{n_2}^{(2)},$$
which are drawn independently from two populations with mean zero and covariance matrices $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$. We assume that Assumption B holds for each group of observations. Moreover, analogous to Assumption C, we assume $p/n_1 \to c_1 > 0$ and $p/n_2 \to c_2 > 0$.
Finally, analogous to Assumption A, write the spectral decompositions of $\boldsymbol{\Sigma}_1, \boldsymbol{\Sigma}_2$ as
$$\boldsymbol{\Sigma}_i = \boldsymbol{\Gamma}^{(i)}\boldsymbol{\Lambda}^{(i)}\boldsymbol{\Gamma}^{(i)\top}, \qquad i = 1,2,$$
and assume that, for each population, there are $k$ principal eigenvalues satisfying the growth condition in (A.i), while the remaining eigenvalues are uniformly bounded and have a limiting empirical distribution. The two limiting empirical distributions for the two populations can be different.
Naturally, our tests will be based on the sample covariance matrices
$$\hat{\boldsymbol{\Sigma}}_i = \frac{1}{n_i}\sum_{t=1}^{n_i} \mathbf{y}_t^{(i)}\mathbf{y}_t^{(i)\top}, \qquad i = 1,2.$$
Write their spectral decompositions as
$$\hat{\boldsymbol{\Sigma}}_i = \hat{\boldsymbol{\Gamma}}^{(i)}\hat{\boldsymbol{\Lambda}}^{(i)}\hat{\boldsymbol{\Gamma}}^{(i)\top}, \qquad i = 1,2.$$
4.1 Testing equality of principal eigenvalues
To test $H_0^{(1)}:\ \lambda_j(\boldsymbol{\Sigma}_1) = \lambda_j(\boldsymbol{\Sigma}_2)$, we use the following test statistic
where
Theorem 4.
Under the null hypothesis $H_0^{(1)}$ and Assumptions A–C, the proposed test statistic converges weakly to the standard normal distribution.
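As a rough illustration only, here is a sketch of a studentized statistic of the kind Theorem 4 describes; the log-difference form and the plug-in variance inputs `theta1_hat`, `theta2_hat` are our own assumptions, not necessarily the paper's exact construction:

```python
import numpy as np

def eigenvalue_test_stat(lam1_hat, lam2_hat, theta1_hat, theta2_hat, n1, n2):
    """Hypothetical studentized statistic for H0: lambda_j(Sigma_1) = lambda_j(Sigma_2).

    Motivation: a CLT of the form sqrt(n) * (hat_lambda_j / lambda_j - 1) => N(0, theta_j)
    implies, via the delta method, that log(hat_lambda_j) is asymptotically normal
    around log(lambda_j) with variance theta_j / n. Under H0 the difference of logs
    across the two independent samples is then approximately
    N(0, theta_1j/n1 + theta_2j/n2), so the ratio below is approximately N(0, 1).
    """
    se = np.sqrt(theta1_hat / n1 + theta2_hat / n2)
    return (np.log(lam1_hat) - np.log(lam2_hat)) / se
```

One would reject the null at the 5% level when the absolute value of such a statistic exceeds 1.96.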
4.2 Testing equality of ratios of principal eigenvalues
The null hypothesis
$$H_0^{(2)}:\ \frac{\lambda_j(\boldsymbol{\Sigma}_1)}{\mathrm{tr}(\boldsymbol{\Sigma}_1)} = \frac{\lambda_j(\boldsymbol{\Sigma}_2)}{\mathrm{tr}(\boldsymbol{\Sigma}_2)}$$
is equivalent to
$$\lambda_j(\boldsymbol{\Sigma}_1)\,\mathrm{tr}(\boldsymbol{\Sigma}_2) = \lambda_j(\boldsymbol{\Sigma}_2)\,\mathrm{tr}(\boldsymbol{\Sigma}_1).$$
Based on this observation, we propose the following test statistic
where
Theorem 5.
Under the null hypothesis $H_0^{(2)}$ and Assumptions A–C, the test statistic converges weakly to the standard normal distribution.
4.3 Testing equality of principal eigenvectors
To test $H_0^{(3)}:\ \langle \boldsymbol{\gamma}_j(\boldsymbol{\Sigma}_1), \boldsymbol{\gamma}_j(\boldsymbol{\Sigma}_2)\rangle = 1$, we propose the following test statistic:
(4)
Theorem 6.
Under the null hypothesis $H_0^{(3)}$ and Assumptions A–C, suppose further that the relevant limits exist. Then the proposed test statistic converges weakly as follows:
where follows a multivariate normal distribution with mean zero and covariance matrix
and is the matrix obtained by deleting the th row and th column of . Furthermore, and can be consistently estimated by
respectively.
Corollary 1.
Under the stronger null hypothesis that for all , we have , and the proposed test statistic converges as follows:
where ’s are i.i.d. standard normal random variables.
5 Simulation Studies
5.1 Design
We consider five population covariance matrices as follows:
where , are two random orthogonal matrices, and
The observations are simulated as follows: for a given $\boldsymbol{\Sigma}$, which will be one of the five covariance matrices above, write its spectral decomposition as $\boldsymbol{\Sigma} = \boldsymbol{\Gamma}\boldsymbol{\Lambda}\boldsymbol{\Gamma}^{\top}$. Then we simulate observations with covariance matrix $\boldsymbol{\Sigma}$ by $\mathbf{y}_t = \boldsymbol{\Gamma}\boldsymbol{\Lambda}^{1/2}\mathbf{z}_t$, where $\mathbf{z}_t$ consists of i.i.d. standardized Student's $t$ random variables.
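A minimal sketch of this data-generating step (our own code; the degrees of freedom are illustrative, since the paper's exact value is not shown here, and any value above 4 keeps the fourth moment finite):

```python
import numpy as np

def simulate_sample(Sigma, n, df=8, rng=None):
    """Draw n observations with covariance Sigma via y_t = Gamma Lambda^{1/2} z_t,
    where z_t has i.i.d. standardized Student-t entries (mean 0, variance 1)."""
    rng = np.random.default_rng(rng)
    p = Sigma.shape[0]
    vals, vecs = np.linalg.eigh(Sigma)             # spectral decomposition
    vals, vecs = vals[::-1], vecs[:, ::-1]         # descending order
    Z = rng.standard_t(df, size=(p, n))
    Z /= np.sqrt(df / (df - 2.0))                  # standardize to unit variance
    return vecs @ (np.sqrt(vals)[:, None] * Z)     # p x n data matrix
```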
Theorems 4, 5 and 6 are associated with different null hypotheses. When evaluating the sizes of the tests proposed in these theorems, we adopt the following settings:
- For Theorems 4 and 5, the two samples of observations are simulated from populations satisfying the corresponding null hypotheses.
- For Theorem 6, the two samples of observations are simulated with two of the covariance matrices above as their respective population covariance matrices; the two matrices share the same eigenvectors but have different eigenvalues.
On the other hand, when evaluating powers, we use the following design (see the sketch after this list for the eigenvector case):
- For testing equality of eigenvalues/eigenvalue ratios, the two samples of observations are simulated with two covariance matrices that differ in their principal eigenvalues as the respective population covariance matrices;
- For testing equality of principal eigenvectors, we simulate two samples of observations whose population covariance matrices differ in their principal eigenvectors. The difference between the principal eigenvectors of the two matrices is a function of an angle $\theta$. We vary the value of $\theta$ to see how the power changes as a function of $\theta$.
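For instance, a covariance matrix whose two leading eigenvectors are rotated by an angle $\theta$ within their span can be built as in the following sketch (our illustrative construction, not necessarily the paper's exact design):

```python
import numpy as np

def rotate_leading_pair(Sigma, theta):
    """Return a covariance matrix with the same eigenvalues as Sigma but with
    its two leading eigenvectors rotated by angle theta within their span."""
    vals, vecs = np.linalg.eigh(Sigma)
    vals, vecs = vals[::-1], vecs[:, ::-1]       # descending order
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(Sigma.shape[0])
    R[:2, :2] = [[c, -s], [s, c]]                # plane rotation of leading pair
    vecs_rot = vecs @ R.T
    return (vecs_rot * vals) @ vecs_rot.T        # Gamma_rot Lambda Gamma_rot^T
```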
5.2 Visual check
We first visually examine Theorems 4, 5 and 6 by comparing the empirical distributions of the test statistics with their respective asymptotic distributions under the null hypotheses.
For Theorem 4, the asymptotic distribution of the test statistic is the standard normal distribution. This is clearly supported by Figure 1, which gives the normal Q-Q plot and histogram of the test statistic based on 5,000 replications.
[Figure 1 about here: normal Q-Q plot and histogram of the test statistic for Theorem 4.]
For Theorem 5, the asymptotic distribution of the test statistic is again the standard normal distribution. This is supported by Figure 2.
[Figure 2 about here: normal Q-Q plot and histogram of the test statistic for Theorem 5.]
For Theorem 6, the asymptotic distribution of the test statistic is a generalized $\chi^2$-distribution, which does not have an explicit density formula. To examine the asymptotics, we compare the empirical distribution of the test statistic with that of Monte Carlo samples from the asymptotic distribution. The comparison is conducted via both a Q-Q plot and density estimation. The results are given in Figure 3. We can see that the empirical distribution of the test statistic matches the asymptotic distribution well.
[Figure 3 about here: Q-Q plot and density comparison between the empirical distribution of the test statistic for Theorem 6 and Monte Carlo samples from its asymptotic distribution.]
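Because the generalized $\chi^2$ limit has no closed-form quantiles, its critical values must be simulated. A minimal sketch (our own code; the weights stand in for the plug-in estimates appearing in Theorem 6):

```python
import numpy as np

def mc_critical_value(weights, level=0.95, reps=100_000, rng=None):
    """Monte Carlo quantile of sum_i w_i * N(0,1)^2, a weighted chi-square
    limit with no explicit density. `weights` are plug-in estimates."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights)
    draws = (w * rng.standard_normal((reps, w.size)) ** 2).sum(axis=1)
    return np.quantile(draws, level)
```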
5.3 Size and power evaluation
Table 1 reports the empirical sizes of the three tests at the 5% significance level for different combinations of the dimension and the two sample sizes. The tests involve the number of factors, which is unknown in practice. There are several estimators available, including those given in Bai and Ng (2002) and Ahn and Horenstein (2013). We evaluate the sizes for a given estimated number of factors, as specified in the table. We see that for the first two sets of tests, across different estimated numbers of factors and different dimension and sample sizes, the empirical sizes are close to the nominal level of 5%. For the third set of tests, the size approaches 5% as the dimension and sample sizes get larger.
|  | (true) |  |  | (true) |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | 0.052 | 0.050 | 0.050 | 0.083 | 0.086 | 0.086 |
| 300 | 0.051 | 0.049 | 0.049 | 0.060 | 0.063 | 0.063 |
| 500 | 0.053 | 0.051 | 0.051 | 0.051 | 0.055 | 0.055 |
|  | (true) |  |  | (true) |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | 0.048 | 0.047 | 0.047 | 0.102 | 0.108 | 0.109 |
| 300 | 0.055 | 0.054 | 0.054 | 0.057 | 0.063 | 0.063 |
| 500 | 0.053 | 0.052 | 0.052 | 0.048 | 0.055 | 0.055 |
|  | (true) |  |  | (true) |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | NA | 0.049 | 0.049 | NA | 0.096 | 0.099 |
| 300 | NA | 0.052 | 0.052 | NA | 0.058 | 0.059 |
| 500 | NA | 0.055 | 0.055 | NA | 0.057 | 0.057 |
Power evaluation results are given in Table 2. We see that the powers are in general quite high, especially as the dimension and sample sizes get larger.
|  | (true) |  |  | (true) |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | 0.146 | 0.138 | 0.138 | 0.494 | 0.508 | 0.509 |
| 300 | 0.457 | 0.450 | 0.450 | 0.909 | 0.913 | 0.914 |
| 500 | 0.705 | 0.697 | 0.697 | 0.987 | 0.988 | 0.988 |
|  | (true) |  |  | (true) |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | 0.160 | 0.158 | 0.158 | 0.407 | 0.428 | 0.430 |
| 300 | 0.360 | 0.355 | 0.355 | 0.833 | 0.844 | 0.844 |
| 500 | 0.522 | 0.520 | 0.520 | 0.970 | 0.974 | 0.974 |
|  | (true) |  |  | (true) |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | NA | 0.145 | 0.145 | NA | NA | NA |
| 300 | NA | 0.303 | 0.303 | NA | NA | NA |
| 500 | NA | 0.446 | 0.446 | NA | NA | NA |
Finally, in Figure 4, we evaluate the power of the eigenvector test as a function of $\theta$. For the three values of $\theta$ tested, the larger the value of $\theta$, the larger the difference between the principal eigenvectors of the two populations, and the higher the power. Moreover, even for the smallest value of $\theta$, the power quickly increases to nearly 1 as the dimension and sample sizes get larger.
[Figure 4 about here: power of the eigenvector test as a function of the angle $\theta$.]
6 Empirical Studies
In this section, we conduct empirical studies based on daily returns of S&P500 Index constituents from January 2000 to December 2020. The objective is to test, between two consecutive years, whether the principal eigenvalues, eigenvalue ratios and principal eigenvectors are equal or not.
6.1 Tests about principal eigenvalues
We plot in Figure 5 the values of the test statistic, together with the critical values at the 5% significance level based on Theorem 4.
[Figure 5 about here: test statistics and 5% critical values for testing equality of the first, second and third principal eigenvalues between consecutive years.]
We see from Figure 5 that for testing equality of the first principal eigenvalue, the test result is statistically significant for more than half of the pairs of consecutive years, suggesting that the first principal eigenvalue tends to change over time. The second and third principal eigenvalues appear somewhat more stable.
6.2 Tests on eigenvalue ratios
We plot in Figure 6 the results of testing equality of eigenvalue ratios.
[Figure 6 about here: test statistics and 5% critical values for testing equality of eigenvalue ratios between consecutive years.]
An interesting observation is that, in sharp contrast with the tests about eigenvalues, the rejection rate for testing equality of eigenvalue ratios is much lower. This contrast suggests an interesting difference between the absolute sizes of the principal eigenvalues and their relative sizes: while the absolute size appears to change frequently over time, the relative size is more stable.
6.3 Tests about principal eigenvectors
Figure 7 reports the results of the tests about principal eigenvectors.
[Figure 7 about here: test statistics and simulated 5% critical values for testing equality of the first, second and third principal eigenvectors between consecutive years.]
Notice that in this case, the asymptotic distribution under the null hypothesis is a complicated generalized $\chi^2$ distribution, and there is no explicit formula for computing the critical value. To solve this issue, we simulate a large number of observations from the limiting distribution, based on which we estimate the 95% quantile. That leads to the red dotted curve in the plots. Note that the critical values change over time; the reason is that the limiting distribution involves both population principal eigenvalues and eigenvectors, which are themselves subject to change over time. The black curves report the test statistic values.
For the test about the first principal eigenvector, we see that for all pairs of consecutive years, the value of the test statistic is well above the 95% quantile, so we reject the null hypothesis that the first principal eigenvector is the same between two consecutive years. For the tests about the second and third principal eigenvectors, we also reject the corresponding null hypothesis for most pairs of consecutive years. These findings have a significant implication for factor modeling. In particular, the results show that structural breaks due to principal eigenvectors occur more often than one would have guessed based on stock market condition changes.
6.4 Summary of the three test results
Figure 8 summarizes the results of the three tests.
[Figure 8 about here: summary of the rejection results of the three tests across consecutive years.]
Figure 8 reveals that testing for equality of principal eigenvectors between two adjacent years results in more rejections than testing for equality of principal eigenvalues or eigenvalue ratios. Moreover, the tests about the first principal eigenvalue and eigenvector are more likely to be rejected than those about the second and third principal components. Let us point out that while it could indeed be the case that the first principal eigenvalue and eigenvector change more frequently than the second or third ones, the difference could also be due to the fact that the first principal component is the strongest, so that the related tests are the most powerful.
7 Conclusion
We establish both one-sample and two-sample central limit theorems for principal eigenvalues and eigenvectors under large factor models. Based on these CLTs, we develop three tests to detect structural changes in large factor models. Our tests can reveal whether the change is in principal eigenvalues or eigenvectors or both. Numerically, these tests are found to have good finite sample performance. Applying these tests to daily returns of the S&P500 Index constituent stocks, we find that, between two consecutive years, the principal eigenvalues, eigenvalue ratios and principal eigenvectors all exhibit frequent changes.
References
- Ahn and Horenstein (2013) Ahn, Seung C. and Horenstein, Alex R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3), 1203–1227.
- Anderson (1963) Anderson, T. W. (1963). Asymptotic theory for principal component analysis. The Annals of Mathematical Statistics, 34, 122–148.
- Bai (2003) Bai, Jushan. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1), 135–171.
- Bai and Ng (2002) Bai, Jushan and Ng, Serena. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1), 191–221.
- Bai and Ng (2006) Bai, Jushan and Ng, Serena. (2006). Evaluating latent and observed factors in macroeconomics and finance. J. Econometrics, 131(1-2), 507–537.
- Bai and Silverstein (2010) Bai, Zhidong and Silverstein, Jack W. (2010). Spectral analysis of large dimensional random matrices. Springer Series in Statistics, second edn, Springer, New York.
- Bai and Yao (2008) Bai, Zhidong and Yao, Jianfeng. (2008). Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. Henri Poincaré Probab. Stat., 44(3), 447–474.
- Bao et al. (2022) Bao, Zhigang, Ding, Xiucai, Wang, Jingming and Wang, Ke. (2022). Statistical inference for principal components of spiked covariance matrices. The Annals of Statistics, 50(2), 1144–1169.
- Breitung and Eickmeier (2011) Breitung, Jörg and Eickmeier, Sandra. (2011). Testing for structural breaks in dynamic factor models. J. Econometrics, 163(1), 71–84.
- Cai et al. (2020) Cai, T. Tony and Han, Xiao and Pan, Guangming. (2020). Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices. Annals of Statistics, 48(3), 1255–1280.
- Chamberlain and Rothschild (1983) Chamberlain, Gary and Rothschild, Michael. (1983). Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets. Econometrica, 51(5), 1281–1304.
- Chen et al. (2014) Chen, Liang and Dolado, Juan J. and Gonzalo, Jesús. (2014). Detecting big structural breaks in large factor models. J. Econometrics, 180(1), 30–48.
- Fama and French (1992) Fama, Eugene F. and French, Kenneth R. (1992). The Cross-Section of Expected Stock Returns. The Journal of Finance, 47(2), 427–465.
- Fama and French (2015) Fama, Eugene F. and French, Kenneth R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1), 1–22.
- Fan et al. (2011) Fan, Jianqing and Liao, Yuan and Mincheva, Martina. (2011). High-dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39(6), 3320–3356.
- Fan et al. (2013) Fan, Jianqing and Liao, Yuan and Mincheva, Martina. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B. Stat. Methodol., 75(4), 603–680.
- Han and Inoue (2015) Han, Xu and Inoue, Atsushi. (2015). Tests for parameter instability in dynamic factor models. Econometric Theory, 31(5), 1117–1152.
- Onatski (2010) Onatski, Alexei. (2010). Determining the Number of Factors from Empirical Distribution of Eigenvalues. The Review of Economics and Statistics, 92(4), 1004–1016.
- Ross (1976) Ross, Stephen. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3), 341–360.
- Sharpe (1964) Sharpe, William. (1964). Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of Risk. Journal of Finance, 19(3), 425–442.
- Silverstein and Bai (1995) Silverstein, J. W. and Bai, Z. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. Journal of Multivariate Analysis, 54(2), 175–192.
- Stock and Watson (1998) Stock, James H. and Watson, Mark W. (1998). Diffusion Indexes. Working Paper.
- Stock and Watson (2002) Stock, James H. and Watson, Mark W. (2002). Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc., 97(460), 1167–1179.
- Stock and Watson (2009) Stock, James H. and Watson, Mark W. (2009). Forecasting in dynamic factor models subject to structural instability. In The Methodology and Practice of Econometrics, Oxford Univ. Press, Oxford, 173–205.
- Wang et al. (2014) Wang, Qinwen and Su, Zhonggen and Yao, Jianfeng (2014). Joint CLT for several random sesquilinear forms with applications to large-dimensional spiked population models. Electron. J. Probab., 19, 1–28.
- Wang and Fan (2017) Wang, Weichen and Fan, Jianqing (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. Ann. Statist., 45(3), 1342–1374.
- Zheng et al. (2015) Zheng, Shurong and Bai, Zhidong and Yao, Jianfeng (2015). Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. Ann. Statist., 43(2), 546–591.
SUPPLEMENTARY MATERIAL
The supplementary material includes the proofs of Theorems 1, 2, 3 and 6, and Corollary 1 in the main text.
S1. Notations
Recall the spectral decomposition of , where the orthogonal matrix consists of the eigenvectors of , and with eigenvalues . Write , where
Define . Then , and the eigenvectors of are the unit vectors , where is the unit vector with 1 in the th entry and zeros elsewhere. Let , whose eigenvalues are denoted by with corresponding eigenvectors . To resolve the ambiguity in the direction of an eigenvector, we specify the direction such that for all , namely, the th coordinate of the th eigenvector is nonnegative (although when the th coordinate is zero, the direction is still not specified, in which case we take an arbitrary direction). Note that the eigenvalues of are the same as , and the eigenvectors are . It follows that
In what follows, we focus on the analysis of the principal eigenvalues and eigenvectors of .
Notation: For any square matrix $\mathbf{A}$, $\mathrm{tr}(\mathbf{A})$ denotes its trace, $\det(\mathbf{A})$ its determinant, and $\|\mathbf{A}\|$ its spectral norm. For any vector $\mathbf{v}$, $\|\mathbf{v}\|$ stands for its Euclidean norm. Write $A_{ij}$ for the $(i,j)$th entry of a matrix $\mathbf{A}$ and $v_i$ for the $i$th entry of a vector $\mathbf{v}$. Use $\lambda_j(\mathbf{A})$ to denote the $j$th largest eigenvalue of a matrix $\mathbf{A}$. The notation $\stackrel{p}{\to}$ stands for convergence in probability, $\stackrel{d}{\to}$ represents convergence in law, $o_p(1)$ means convergence to zero in probability, and $O_p(1)$ means that the sequence is tight. Write $a_n \asymp b_n$ if $c\, b_n \le a_n \le C\, b_n$ for some constants $0 < c \le C < \infty$. For any sequence of random matrices with fixed dimension, write $o_p(1)$ or $O_p(1)$ if all the entries of the matrices are $o_p(1)$ or $O_p(1)$, respectively. We say an event holds with high probability if its complement has probability $O(n^{-D})$ for any constant $D > 0$. Let $\mathbf{e}_j$ be the unit vector with 1 in the $j$th coordinate and zeros elsewhere, of the appropriate dimension. We use $\mathbf{I}$ to denote the identity matrix and $\mathbf{1}\{\cdot\}$ to denote the indicator function. Denote $\mathbb{C}^{+} = \{z \in \mathbb{C} : \Im(z) > 0\}$, where $\Im(z)$ is the imaginary part of a complex number $z$. For any probability distribution $G$, its Stieltjes transform is defined by
$$m_G(z) = \int \frac{1}{x - z}\, dG(x), \qquad z \in \mathbb{C}^{+}.$$
Throughout the sequel, $C$ is a generic constant whose value may vary from place to place.
S2. Proof of Theorem 1
Proof of Theorem 1.
Recall that . Write
Let , for , , and . Then
Write
and
Write
Define the companion matrix of as
where
Further denote the companion matrices of and as
Define the event for some constant . By Weyl's theorem, Assumption (A.ii) and Theorem 9.13 of Bai and Silverstein (2010), for any , we can choose a sufficiently large such that
(5) |
Note that the non-zero eigenvalues of and are the same as their companion matrices and , respectively. For any principal eigenvalues , , the matrix
is in the low-dimensional situation as considered in Theorem 1 of Anderson (1963), by which one has for . Because , by Weyl's theorem that
(6) |
we get
(7) |
In particular, for .
Next, we derive the central limit theorem of for .
Write , where , . Further denote
The sample covariance matrix can be decomposed as
Under Assumptions (A.ii), B and C, by Silverstein and Bai (1995), the ESD of almost surely converges to a non-random probability distribution whose Stieltjes transform is the unique solution in the domain to the equation
By definition, each principal sample eigenvalue solves the equation
(8) |
where
(9) |
and
Further define
then
(10) |
We first give three lemmas, which will be repeatedly used in the following proofs. The first lemma is about the random matrix , and the second and third ones are about the limiting distributions of . The proofs of these lemmas are postponed to the end of this subsection.
Lemma 1.
Under Assumptions A–C, for , we have
and | (11) |
Lemma 2.
Under Assumptions A–C and assume that , the random matrix converges weakly to a symmetric Gaussian random matrix with zero-mean and the following covariance function:
Lemma 3.
Under Assumptions A–C and , the block diagonal random matrix converges weakly to a symmetric Gaussian block diagonal random matrix with zero-mean and the following covariance function, for any ,
We now return to the analysis of principal sample eigenvalues . Noting that the principal eigenvalues of go to infinity and the estimate (5), without loss of generality, we can assume that for large enough, is not an eigenvalue of . It follows that is the th eigenvalue of matrix .
Note that
where
By the elementary formulae,
it follows that
where the last step comes from the fact that eigenvalues of are and an analysis similar to that of in the proof of Lemma 2 below. It follows from (7) that
(14) |
Recall that is the th largest eigenvalue of matrix . Denote matrix . From Assumption A, Lemmas 1, 2, and equations (7), (10) and (14), it follows that
for ,
(15) |
and
Let $\det(\mathbf{A})$ denote the determinant of a matrix $\mathbf{A}$; then
Using (15) we then obtain that
In particular,
(17) |
By Lemma 2, we obtain
Similarly, the joint convergence of follows from (17) and Lemma 3. ∎
Proof of Lemma 1.
By the estimate (5), it suffices to prove Lemma 1 for and . Under Assumption A, we have
and
To prove , it suffices to show that
(18) |
Note that are identically distributed for different , hence
∎
Proof of Lemma 2.
Recall that . We have
Consider the random vector of dimension :
For any , there exist two pairs and , such that
and
By Lemma 1, we have
By Corollary 7.1 of Bai and Yao (2008), the random vector converges weakly to a -dimensional Gaussian vector with mean zero and covariance matrix satisfying , where
The result follows. ∎
Proof of Lemma 3.
Consider the block diagonal random matrix
as an dimensional vector
By Assumption A–C and Lemma 1, the block diagonal random matrix converges weakly to a symmetric Gaussian block diagonal random matrix
with mean zero and covariance function as follows: for any ,
The conclusion follows. ∎
S3. Proof of Theorem 2
Proof.
Write
To prove Theorem 2, we first show that has a faster convergence rate than , that is,
(19) |
Decompose , where
Note that by Theorem 2.1 of Zheng et al. (2015).
Next we analyze . Note that
Hence
By inequality (6), we have
By Assumption (A.ii) and Assumption C, we have . Hence, (19) holds.
We can then rewrite the result of Theorem 1 as follows:
where . For any , by considering the function
and using the Delta method, we get that
where
Finally, defined in Theorem 2 is a consistent estimator of by Theorem 1 and (19). ∎
S4. Some preliminary results for proving Theorems 3 and 6
We first derive some preliminary results in preparation for the proofs of Theorems 3 and 6.
Recall that we write as the eigenvector of corresponding to the eigenvalue , where and are of dimensions and , respectively. Also recall that . Further denote by the dimensional vector with 1 in the th coordinate and 0’s elsewhere.
Proposition 1.
Under Assumptions A–C, for , we have
where and .
Remark 7.
As a corollary, we have
Proposition 2.
Under Assumptions A–C,
-
(i)
for , we have
where
-
(ii)
for any fixed -dimensional vectors , , if there exist such that , then
-
(iii)
for , the th principal eigenvalue and the th principal eigenvector are asymptotically independent.
Remark 8.
The conclusion in (i) coincides with Theorem 3.2 in Wang and Fan (2017), which was proved there under a sub-Gaussian assumption.
Proposition 3.
Under Assumptions A–C, for , we have
where and the function is a distribution function whose Stieltjes transform, , is the unique solution in the set to the equation
where is given in Assumption (A.ii).
Remark 9.
A. Proof of Proposition 1
Proof.
By definition,
Writing gives us that
(21) | |||||
(22) |
Solving (22) for yields
(23) |
Replacing in (21) with (23) gives
in other words, .
To prove Proposition 1, we first show that , where . It follows from the definition of that
where
Because and are both , and and , we have . Thus, by Theorem 1 and Assumption A, we get
For the first term, consider the following decomposition:
where the last step comes from Assumption A and Lemma 1. Hence,
(24) |
Subtracting on both sides of (24) yields
(25) | |||||
Further define
It is easy to see that
Left-multiplying on both sides of (25) yields
(26) | |||||
By Lemma 2 and Theorem 1, we get
It follows that .
Replacing and on the right hand side of equation (24) with and , respectively, yields
Rewrite the above equation as
Multiplying on both sides yields
(27) | |||||
where the last step comes from the facts that and . Write
Then
Notice that and are both unit vectors, thus
(28) |
From (27) and the fact that , we get
(29) | |||||
(30) | |||||
By Lemma 2, the conclusion follows. ∎
B. Proof of Proposition 2
C. Proof of Proposition 3
Proof.
Recall that . We have
Dividing both sides by yields
(33) |
Hence,
(34) | |||||
where
To derive the CLT of , we need to analyze the term . We first study the difference when replacing with in . We have
where
Define
By Theorem 7.1 of Bai and Yao (2008), converges weakly to a symmetric Gaussian random matrix with zero mean and finite covariance functions. Using the definitions of and , we can rewrite as
where the last two steps follow from Assumption A and the facts that , and . Similarly, we obtain
In addition, we have
where
By Theorem 7.1 of Bai and Yao (2008), converges weakly to a symmetric Gaussian random matrix with zero mean and finite covariance functions. Combining the results above, we obtain
(35) | ||||
It follows that , and
(36) |
Next, we derive the limit of . By Assumption A and (35), we have
Replacing with and using Proposition 2, we get
(37) | ||||
Under Assumptions A–C, we have
where is the LSD of . According to Theorem 1.1 of Silverstein and Bai (1995), the Stieltjes transform of , , is the unique solution in the set to the equation
Therefore, .
We now consider the limiting distribution of . By (37) and (17), we get
Notice that
where
To finish the proof, we need the following lemma, which will be proved at the end of this subsection.
Lemma 4.
Under Assumptions A–C, if , then converges weakly to a zero-mean Gaussian vector with covariance matrix where
where is the LSD of .
Based on the lemma above, we conclude that
where
Further, from Assumption A that for , and the boundedness of the eigenvalues of and with high probability, it follows that
and
Recall that
which has been shown in the proof of (19). Therefore, Proposition 3 follows.
∎
At last, we prove Lemma 4.
Proof of Lemma 4.
By Theorem 2.1 of Wang et al. (2014), converges weakly to a zero-mean Gaussian vector with covariance matrix where
with
where denotes the Hadamard product of two symmetric matrices and , i.e. .
To prove Lemma 4, we need to compute the values of and , . We start with ’s. From the definitions of and , it is easy to check that
and
Next, we calculate the values of , . Denote by the matrix obtained from by deleting its th column. Then
Recall that for any invertible matrix and vector , one has
and
By Assumption A, we have
(38) | |||||
and
(39) | |||||
It is easy to see that
and
Hence
and , which implies that the family of random variables and are uniformly integrable. Together with (38) and (39) and the fact that with high probability, we get
and
Thus, and . Moreover, noting that in the event , and , , are uniformly bounded and that , we obtain
Therefore,
and
In summary,
where
∎
S5. Proof of Theorem 3
Proof.
Recall that , where are the th principal eigenvector of and th principal sample eigenvector of , respectively, and is the th eigenvector of the sample covariance matrix . Under Assumptions A–C, by Propositions 1 and 3, we get
where are i.i.d. standard normal random variables.
It remains to prove that
Rewrite the term as
By Assumptions A–C and Theorem 1, the term converges to zero in probability. ∎
S6. Proofs of Theorem 6 and Corollary 1
Proof of Theorem 6.
By (31), we have
(40) |
Hence, when ,
(41) |
Similarly, by (30),
(42) |
Proposition 3 implies that
(43) |
Write the two population eigen-matrices as
and define
where , and are of sizes and , respectively.
Under the null hypothesis , the th row and th column of are zero except that the th diagonal entry is one. To prove the theorem, note that
We start with the first term , and will show later that
(44) |
Because the entries in the th row and th column of are zero except that , we have
(45) | |||||
For the first term, we have
(46) | ||||
where
By Theorem 3 and Proposition 1, both and are .
Combining (46) and (47) and using (41) and (42), we obtain
For , define two vectors and as
Under the assumptions of Theorem 6, by Lemma 2, we have
(48) |
where
and
Let be the matrix obtained by removing the th row and th column of . Then by (45), (43) and (48), we get
where with
It remains to prove (44). By (23), we have
where
(49) |
Write
where
For term , note that is the th row of which is zero, hence .
Next, we prove that . Write
By Proposition 2 and that , we get . As to , by (40), Assumption A and the facts that and , we obtain
Using the independence between and , , Assumption A and that , we have
Therefore, .
We now analyze . For , because , by Proposition 3 and that , we get . Similarly, we get .
To sum up, we have shown that
(50) |
Next, we prove that . We have
Following the same proof strategy as for (50) and applying Theorem 1, we get . For , using Assumption (A.i), Theorem 1 and that , we get .
To sum up, we have
Using the same argument we get
Finally, we show that By (23), we have
where the last step follows from Proposition 3. Note that by equation (40), we have
Similarly,
Therefore,
Note further that by Theorem 1,
Similarly,
It follows that
Note that
Using the independence among and , Assumption A and that , one can show that the last term is . It follows that
which completes the proof of Theorem 6. ∎
Proof of Corollary 1.
If , then . Denote where , and and are independent. Therefore, the limiting distribution becomes
where are i.i.d. standard normal random variables.
∎