
Tests for principal eigenvalues and eigenvectors

Jianqing Fan
Operations Research and Financial Engineering
Princeton University
Yingying Li   
Department of Information Systems, Business Statistics and Operations Management
Hong Kong University of Science and Technology
Ningning Xia   
School of Statistics and Management
Shanghai University of Finance and Economics
Xinghua Zheng   
Department of Information Systems, Business Statistics and Operations Management
Hong Kong University of Science and Technology
Abstract

We establish central limit theorems for principal eigenvalues and eigenvectors under a large factor model setting, and develop two-sample tests of both principal eigenvalues and principal eigenvectors. One important application is to detect structural breaks in large factor models. Compared with existing methods for detecting structural breaks, our tests provide unique insights into the source of structural breaks because they can attribute a break to changes in individual principal eigenvalues and/or eigenvectors. We demonstrate the application by comparing the principal eigenvalues and principal eigenvectors of S&P500 Index constituents' daily returns over different years.


Keywords: Factor model; principal eigenvalues; principal eigenvectors; central limit theorem; two-sample test

1 Introduction

Factor models have been widely adopted in many disciplines, most notably, economics and finance. Some of the most famous examples include the capital asset pricing model (CAPM, Sharpe (1964)), arbitrage pricing theory (Ross (1976)), approximate factor model (Chamberlain and Rothschild (1983)), Fama-French three factor model (Fama and French (1992)) and the more recent five-factor model (Fama and French (2015)).

Statistically, the analysis of factor models is closely related to principal component analysis (PCA). For example, finding the number of factors is equivalent to determining the number of principal eigenvalues (Bai and Ng (2002); Onatski (2010); Ahn and Horenstein (2013)); estimating factor loadings as well as factors relies on principal eigenvectors (Stock and Watson (1998, 2002); Bai (2003); Bai and Ng (2006); Fan et al. (2011, 2013); Wang and Fan (2017)).

A factor model typically reads as follows:

y_{it}={\bf b}_{i}^{\mathrm{T}}{\bf f}_{t}+\varepsilon_{it},\quad i=1,2,\ldots,N,\ t=1,2,\ldots,T, \qquad (1)

where $y_{it}$ is the observation from the $i$th subject at time $t$, ${\bf f}_{t}$ is a set of factors, and $\varepsilon_{it}$ is the idiosyncratic component. The number of factors, $r=\dim({\bf f}_{t})$, is small compared with the dimension $N$, and is assumed to be fixed throughout the paper. The factor model (1) can be put in matrix form as

{\bf y}_{t}={\bf B}{\bf f}_{t}+{\boldsymbol{\varepsilon}}_{t},\quad t=1,2,\ldots,T,

where ${\bf y}_{t}=(y_{1t},\ldots,y_{Nt})^{\mathrm{T}}$, ${\bf B}=({\bf b}_{1},\ldots,{\bf b}_{N})^{\mathrm{T}}$ and ${\boldsymbol{\varepsilon}}_{t}=(\varepsilon_{1t},\ldots,\varepsilon_{Nt})^{\mathrm{T}}$. It follows that the covariance matrix ${\boldsymbol{\Sigma}}$ of ${\bf y}_{t}$ satisfies

{\boldsymbol{\Sigma}}={\bf B}\operatorname{Cov}({\bf f}_{t}){\bf B}^{\mathrm{T}}+{\boldsymbol{\Sigma}}_{\varepsilon},

where ${\boldsymbol{\Sigma}}_{\varepsilon}$ is the covariance matrix of $({\boldsymbol{\varepsilon}}_{t})$.
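The covariance identity above can be illustrated numerically; the following is a minimal sketch under illustrative assumptions (Gaussian factors with identity covariance, identity idiosyncratic covariance, and arbitrary sizes), not the paper's estimation procedure.

```python
import numpy as np

# Sketch of model (1) and the identity Sigma = B Cov(f_t) B^T + Sigma_eps.
# All sizes and the Gaussian design are illustrative assumptions.
rng = np.random.default_rng(0)
N, T, r = 50, 2000, 3

B = rng.normal(size=(N, r))        # factor loadings b_i stacked as rows
F = rng.normal(size=(r, T))        # factors with Cov(f_t) = I_r
E = rng.normal(size=(N, T))        # idiosyncratic noise, Sigma_eps = I_N
Y = B @ F + E                      # observations y_t are the columns of Y

Sigma = B @ B.T + np.eye(N)        # population covariance
Sigma_hat = Y @ Y.T / T            # sample covariance (mean is zero)
rel_err = np.linalg.norm(Sigma_hat - Sigma) / np.linalg.norm(Sigma)
print(rel_err)
```

For large $T$ the sample covariance is close to the population one, so the relative error printed above is small.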

The factors $({\bf f}_{t})$ in some situations are taken to be observable. Examples include the market factor in CAPM and the Fama-French three factors. In other situations, factors are latent and hence unobservable. In this paper, we focus on the latent factor case.

Factor models provide a parsimonious way to describe the dynamics of large dimensional variables. In the study of factor models, time invariance of factor loadings is a standard assumption. For example, in order to apply PCA, the loadings need to be time invariant, or at least roughly so; otherwise the estimation will be inconsistent. However, parameter instability is a pervasive phenomenon in time series data. Such instability could be due to policy regime switches, changes in economic/financial fundamentals, etc. For this reason, caution has to be exercised about potential structural changes in real data. Statistical analysis of structural changes in large factor models is challenging because the factors are unobserved and the factor loadings have to be estimated.

There is some existing work on detecting structural breaks. Typically, the setup is as follows: suppose there are two time periods, one from time $1$ to $T_{1}$, the second from $T_{1}+1$ to $T_{1}+T_{2}$, where $T_{1}$ and $T_{2}$ are not necessarily equal. The first period has loading matrix ${\bf B}_{1}$, and the second period has loading matrix ${\bf B}_{2}$. One then tests whether ${\bf B}_{1}$ equals ${\bf B}_{2}$. Specifically, one considers the following model:

{\bf y}_{t} = {\bf B}_{1}{\bf f}_{t}+{\boldsymbol{\varepsilon}}_{t},\quad t=1,2,\ldots,T_{1},
{\bf y}_{t} = {\bf B}_{2}{\bf f}_{t}+{\boldsymbol{\varepsilon}}_{t},\quad t=T_{1}+1,\ldots,T_{1}+T_{2},

and tests the following hypothesis for detecting structural breaks

H_{0}:\ {\bf B}_{1}={\bf B}_{2}\quad\textrm{vs.}\quad H_{a}:\ {\bf B}_{1}\neq{\bf B}_{2}.

Existing works include Stock and Watson (2009); Breitung and Eickmeier (2011); Chen et al. (2020); Han and Inoue (2015), among others.

Let us connect the factor loadings with principal eigenvalues and eigenvectors. Recall that ${\boldsymbol{\Sigma}}$ stands for the covariance matrix of $({\bf y}_{t})$. Write its spectral decomposition as

{\boldsymbol{\Sigma}}={\bf V}{\boldsymbol{\Lambda}}{\bf V}^{\mathrm{T}},

where

{\bf V}=({\bf v}_{1},\ldots,{\bf v}_{N}),\quad\mbox{and}\quad{\boldsymbol{\Lambda}}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{N}).

The diagonal matrix ${\boldsymbol{\Lambda}}$ consists of the eigenvalues in descending order, and ${\bf V}$ consists of the corresponding eigenvectors. Under the convention that $\operatorname{Cov}({\bf f}_{t})={\bf I}$, the factor loading matrix is

{\bf B}=\left(\sqrt{\lambda_{1}}{\bf v}_{1},\ldots,\sqrt{\lambda_{r}}{\bf v}_{r}\right).

Therefore structural breaks can be due to changes in

  1. (i)

    one or more $\lambda_{i}$, or

  2. (ii)

    one or more ${\bf v}_{i}$, or

  3. (iii)

    both.
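The loading–eigenstructure relation ${\bf B}=(\sqrt{\lambda_{1}}{\bf v}_{1},\ldots,\sqrt{\lambda_{r}}{\bf v}_{r})$ can be checked numerically. The sketch below is under illustrative assumptions (orthogonal loading columns plus a small isotropic noise term, so the relation holds up to a small perturbation); it is not the paper's estimator.

```python
import numpy as np

# Check B ~ (sqrt(lambda_k) v_k)_{k<=r} under Cov(f_t) = I; the design
# (orthogonal strong loadings, isotropic 0.1*I noise) is an assumption.
rng = np.random.default_rng(1)
N, r = 40, 2
Q = np.linalg.qr(rng.normal(size=(N, r)))[0]     # orthonormal columns
B = Q * np.sqrt(np.array([2.0 * N, 1.0 * N]))    # strong, distinct spikes
Sigma = B @ B.T + 0.1 * np.eye(N)

lam, V = np.linalg.eigh(Sigma)
lam, V = lam[::-1], V[:, ::-1]                   # descending order
B_rec = V[:, :r] * np.sqrt(lam[:r])              # sqrt(lambda_k) v_k, k <= r

# Columns agree with B up to the usual sign ambiguity of eigenvectors
max_err = 0.0
for k in range(r):
    s = np.sign(B[:, k] @ B_rec[:, k])
    max_err = max(max_err, np.max(np.abs(B[:, k] - s * B_rec[:, k])))
print(max_err)
```

With the small noise term, the recovered columns match the true loadings to high accuracy; with a general $\boldsymbol{\Sigma}_\varepsilon$, the relation holds approximately for strong factors.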

The economic and/or financial implications of these possibilities are, however, completely different. If a structural break is due only to changes in eigenvalues, then in many applications it has no essential impact. For example, from a dimension-reduction point of view, if the principal eigenvalues change while the principal eigenvectors do not, then projecting onto the principal eigenvectors remains valid. In contrast, if a structural break is caused by changes in eigenvectors, then it may indicate a much more fundamental change, possibly associated with important economic or market condition changes, to which one should be alerted.

These observations bring up the aim of this paper: instead of testing whether the whole matrix ${\bf B}$ is the same during two periods, we want to detect changes in individual principal eigenvalues and eigenvectors. By doing so, we can pinpoint the source of structural changes. Specifically, when a structural break occurs, we can determine whether it is caused by a change in a principal eigenvalue, a change in a principal eigenvector, or changes in both.

To be more specific, we consider the following three tests. Let ${\boldsymbol{\Sigma}}^{(1)}$ and ${\boldsymbol{\Sigma}}^{(2)}$ be the population covariance matrices for the two periods under study. For any symmetric matrix ${\bf A}$ and any integer $k$, we let $\lambda_{k}({\bf A})$ denote the $k$th largest eigenvalue of ${\bf A}$, ${\bf v}_{k}({\bf A})$ the corresponding eigenvector, and $\operatorname{tr}({\bf A})$ its trace.

  1. (i)

    Test equality of principal eigenvalues: for each $k=1,\ldots,r$, we test

    H_{0}^{(I,k)}:\ \lambda_{k}^{(1)}=\lambda_{k}^{(2)}\quad\textrm{vs.}\quad H_{a}^{(I,k)}:\ \lambda_{k}^{(1)}\neq\lambda_{k}^{(2)},

    where $\lambda_{k}^{(i)}:=\lambda_{k}({\boldsymbol{\Sigma}}^{(i)}),\ i=1,2$.

  2. (ii)

    Considering that the total variation may vary, we test equality of the ratios of principal eigenvalues: for each $k=1,\ldots,r$, test

    H_{0}^{(II,k)}:\ \dfrac{\lambda_{k}^{(1)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(1)})}=\dfrac{\lambda_{k}^{(2)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(2)})}\quad\textrm{vs.}\quad H_{a}^{(II,k)}:\ \dfrac{\lambda_{k}^{(1)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(1)})}\neq\dfrac{\lambda_{k}^{(2)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(2)})}.
  3. (iii)

    Most importantly, we test equality of principal eigenvectors: for each $k=1,2,\ldots,r$, test

    H_{0}^{(III,k)}:\ |\langle{\bf v}_{k}^{(1)},{\bf v}_{k}^{(2)}\rangle|=1\quad\textrm{vs.}\quad H_{a}^{(III,k)}:\ |\langle{\bf v}_{k}^{(1)},{\bf v}_{k}^{(2)}\rangle|<1,

where ${\bf v}_{k}^{(i)}:={\bf v}_{k}({\boldsymbol{\Sigma}}^{(i)}),\ i=1,2$, and $\langle{\bf a},{\bf b}\rangle$ denotes the inner product of two vectors ${\bf a}$ and ${\bf b}$.

In this paper, we establish central limit theorems (CLTs) for principal eigenvalues, eigenvalue ratios, and eigenvectors. We then develop two-sample tests based on these CLTs.

Due to the wide application of PCA, considerable work has been devoted to investigating principal eigenvalues. The study of principal eigenvectors, however, remains very limited. This paper represents a significant advancement in this direction.

We remark that there is an independent work, Bao et al. (2022), that studies similar questions. Nevertheless, there are several significant differences between Bao et al. (2022) and our paper. First, the non-principal eigenvalues are assumed to be equal in Bao et al. (2022); see equation (1.2) therein. This is an unrealistic assumption in many applications. In our paper, we allow the non-principal eigenvalues to follow an arbitrary distribution, rendering our results readily applicable in practice. Second, in Bao et al. (2022), the ratio of the dimension to the sample size needs to be away from one; see Assumption 2.4 therein. We do not impose such a restriction. Third, Bao et al. (2022) only consider the one-sample situation and study the projection of sample leading eigenvectors onto a given direction. In our paper, we establish a two-sample CLT, where the projection of a principal eigenvector onto a random direction is considered. Establishing such a result presents a significant challenge. In summary, the setting of our paper is practically appropriate, and the results are of significant importance.

The organization of the paper is as follows. Theoretical results are presented in Sections 2-4. Simulation and empirical studies are presented in Sections 5 and 6, respectively. Proofs are collected in the Appendix.

Notation: we use the following notation in addition to what has been introduced above. For a symmetric $N\times N$ matrix ${\bf A}$, its empirical spectral distribution (ESD) is defined as

F^{{\bf A}}(x)=\dfrac{1}{N}\sum_{j=1}^{N}{\mathbf{1}}(\lambda_{j}({\bf A})\leq x),\quad x\in{\mathbb{R}},

where ${\mathbf{1}}(\cdot)$ is the indicator function. The limit of the ESD as $N\rightarrow\infty$, if it exists, is referred to as the limiting spectral distribution, or LSD for short. For any vector ${\bf a}$, let ${\bf a}[k]$ be its $k$th entry. We use "$\stackrel{\mathcal{D}}{\rightarrow}$" to denote weak convergence.
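The ESD is straightforward to compute; a minimal sketch (the helper name is ours):

```python
import numpy as np

# Sketch of the empirical spectral distribution F^A(x) of a symmetric matrix.
def esd(A, x):
    """F^A(x): fraction of eigenvalues of the symmetric matrix A that are <= x."""
    lam = np.linalg.eigvalsh(A)
    return np.mean(lam <= x)

A = np.diag([1.0, 2.0, 3.0, 4.0])
print(esd(A, 2.5))  # 0.5: two of the four eigenvalues are <= 2.5
```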

2 Setting and Assumptions

We assume that $({\bf y}_{t})_{t=1}^{T}$ is a sequence of i.i.d. $N$-dimensional random vectors with mean zero and covariance matrix ${\boldsymbol{\Sigma}}$. Let $\lambda_{1},\ldots,\lambda_{N}$ be the eigenvalues of ${\boldsymbol{\Sigma}}$ in descending order, and ${\bf v}_{1},\ldots,{\bf v}_{N}$ be the corresponding eigenvectors. Write ${\boldsymbol{\Lambda}}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{N})$ and ${\bf V}=({\bf v}_{1},\ldots,{\bf v}_{N})$. Then the spectral decomposition of ${\boldsymbol{\Sigma}}$ is given by ${\boldsymbol{\Sigma}}={\bf V}{\boldsymbol{\Lambda}}{\bf V}^{\mathrm{T}}$.

We make the following assumptions.

Assumption A:

  1. The eigenvalues $\lambda_{1}>\lambda_{2}>\ldots>\lambda_{r}>\lambda_{r+1}\geq\ldots\geq\lambda_{N}$ satisfy:

    1. (A.i)

      for the principal part, one has $\lim_{N\rightarrow\infty}\lambda_{k}/N=\theta_{k}\in(0,+\infty)$ for $1\leq k\leq r$, where $r>1$ is a fixed integer and the $\theta_{k}$'s are distinct.

    2. (A.ii)

      for the non-principal part, there exists a $C_{0}<\infty$ such that $\lambda_{j}\leq C_{0}$ for $j>r$, and the empirical distribution of $\{\lambda_{r+1},\ldots,\lambda_{N}\}$ tends to a distribution $H$.

Remark 1.

Assumption (A.i) implies that the factors are strong. When the factors are weak, say $\lambda_{i}\asymp N^{\alpha}$ for some $\alpha\in(0,1)$, the convergence of sample principal components still holds, with the convergence rate depending on $\alpha$. In this paper, we focus on the strong factor case and leave the study of weak factors for future work.

Assumption B: The observations $({\bf y}_{i})_{i=1}^{T}$ can be written as ${\bf y}_{i}={\bf V}{\boldsymbol{\Lambda}}^{1/2}{\bf z}_{i}$, where $\{{\bf z}_{i}=({\bf z}_{i}[1],{\bf z}_{i}[2],\ldots,{\bf z}_{i}[N])^{\mathrm{T}},\ i=1,2,\ldots,T\}$ are i.i.d. random vectors, and ${\bf z}_{i}[\ell],\ \ell=1,\ldots,N$, are independent random variables with zero mean, unit variance, and satisfying $\sup_{N}\max_{1\leq\ell\leq N}\operatorname{E}({\bf z}_{1}[\ell])^{4}<\infty$.

Remark 2.

Assumption B covers the multivariate normal case and coincides with the idea of PCA. Specifically, if ${\bf y}_{i}$ follows a multivariate normal distribution, then ${\bf z}_{i}={\boldsymbol{\Lambda}}^{-1/2}{\bf V}^{\mathrm{T}}{\bf y}_{i}$ is an $N$-dimensional standard normal random vector and Assumption B holds naturally. On the other hand, under the orthogonal basis ${\bf V}=({\bf v}_{1},\ldots,{\bf v}_{N})$, the coordinates of ${\bf y}_{i}$ are $(\sqrt{\lambda_{1}}{\bf z}_{i}[1],\ldots,\sqrt{\lambda_{N}}{\bf z}_{i}[N])$. Assumption B says that the coordinate variables are independent with mean zero and variances $\lambda_{i},\ i=1,\ldots,N$.
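Data satisfying Assumption B are easy to generate; a minimal sketch with an illustrative spectrum and a random orthogonal basis (both are assumptions of this example):

```python
import numpy as np

# Sketch of data generation under Assumption B: y_i = V Lambda^{1/2} z_i.
rng = np.random.default_rng(2)
N, T = 30, 2000
lam = np.sort(rng.uniform(1.0, 3.0, size=N))[::-1]   # eigenvalues, descending
V = np.linalg.qr(rng.normal(size=(N, N)))[0]         # random orthogonal basis
Z = rng.standard_normal(size=(N, T))                 # i.i.d., mean 0, variance 1
Y = V @ (np.sqrt(lam)[:, None] * Z)                  # columns y_i, Cov = V Lam V^T

# In the basis V, the l-th coordinate of y_i is sqrt(lambda_l) z_i[l],
# so the coordinate variances should be close to the eigenvalues:
coord_var = np.var(V.T @ Y, axis=1)
print(np.max(np.abs(coord_var - lam)))
```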

Assumption C: The dimension $N$ and sample size $T$ are such that $\rho_{N}:=N/T\rightarrow\rho\in(0,+\infty)$ as $N\rightarrow\infty$.

3 One-sample Asymptotics

Let $\widehat{{\boldsymbol{\Sigma}}}_{N}$ be the sample covariance matrix defined as

\widehat{{\boldsymbol{\Sigma}}}_{N}=\dfrac{1}{T}\sum_{t=1}^{T}{\bf y}_{t}{\bf y}_{t}^{\mathrm{T}}.

Denote its eigenvalues by $\widehat{\lambda}_{1}\geq\cdots\geq\widehat{\lambda}_{N}$, and let $\widehat{{\bf v}}_{1},\ldots,\widehat{{\bf v}}_{N}$ be the corresponding eigenvectors.

Theorem 1.

Under Assumptions A–C, the normalized principal eigenvalues converge jointly to a multivariate normal distribution:

\sqrt{T}\begin{pmatrix}\widehat{\lambda}_{1}/\lambda_{1}-1\\ \vdots\\ \widehat{\lambda}_{r}/\lambda_{r}-1\end{pmatrix}\ \stackrel{\mathcal{D}}{\rightarrow}\ N(0,{\boldsymbol{\Sigma}}_{\lambda}), \qquad (2)

where ${\boldsymbol{\Sigma}}_{\lambda}=\operatorname{diag}(\sigma_{\lambda_{1}}^{2},\ldots,\sigma_{\lambda_{r}}^{2})$ is a diagonal matrix with $\sigma_{\lambda_{k}}^{2}=\operatorname{E}({\bf z}_{1}[k])^{4}-1$.

Remark 3.

The marginal convergence in (2) has been established in Wang and Fan (2017) under a sub-Gaussian assumption. We generalize their result to joint convergence and under a weaker moment assumption. See also Cai et al. (2020) for a related result under a different setting.

Remark 4.

By Theorem 3 below, the variance $\sigma_{\lambda_{k}}^{2}$ can be consistently estimated by

\widehat{\sigma_{\lambda_{k}}^{2}}=\dfrac{1}{T(\widehat{\lambda}_{k})^{2}}\sum_{t=1}^{T}(\widehat{{\bf v}}_{k}^{\mathrm{T}}{\bf y}_{t})^{4}-1,

hence a feasible CLT is readily available.
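The feasible CLT can be sketched in code: estimate $\sigma_{\lambda_k}^2$ by the plug-in formula above and form a normal confidence interval for $\lambda_k$ from (2). The one-spike Gaussian design below is an illustrative assumption.

```python
import numpy as np

# Sketch: plug-in variance estimate and a 95% CI for the top eigenvalue.
rng = np.random.default_rng(3)
N, T = 100, 400
lam_true = np.concatenate([[5.0 * N], rng.uniform(1.0, 3.0, size=N - 1)])
V = np.linalg.qr(rng.normal(size=(N, N)))[0]
Y = V @ (np.sqrt(lam_true)[:, None] * rng.standard_normal((N, T)))

S = Y @ Y.T / T                                  # sample covariance
lam_hat, V_hat = np.linalg.eigh(S)
lam_hat, V_hat = lam_hat[::-1], V_hat[:, ::-1]   # descending order

k = 0
proj = V_hat[:, k] @ Y                           # v_hat_k^T y_t, t = 1..T
sig2_hat = np.mean(proj**4) / lam_hat[k]**2 - 1  # estimates E z^4 - 1 (= 2 for Gaussian)
half = 1.96 * lam_hat[k] * np.sqrt(sig2_hat / T) # CI half-width implied by (2)
print(lam_hat[k] - half, lam_hat[k] + half)
```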

Theorem 2.

Under Assumptions A–C, for each $1\leq k\leq r$, we have

\sqrt{\dfrac{T}{\widehat{\sigma_{-k}^{2}}}}\left(\dfrac{\widehat{\lambda}_{k}}{\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N})-\widehat{\lambda}_{k}}-\dfrac{\lambda_{k}}{\operatorname{tr}({\boldsymbol{\Sigma}})-\lambda_{k}}\right)\ \stackrel{\mathcal{D}}{\rightarrow}\ N(0,1), \qquad (3)

where

\widehat{\sigma_{-k}^{2}}=\dfrac{\widehat{\lambda}_{k}^{2}}{\left(\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N})-\widehat{\lambda}_{k}\right)^{2}}\left[\widehat{\sigma_{\lambda_{k}}^{2}}+\dfrac{\sum_{j\neq k,j=1}^{r}\widehat{\lambda}_{j}^{2}\,\widehat{\sigma_{\lambda_{j}}^{2}}}{\left(\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N})-\widehat{\lambda}_{k}\right)^{2}}\right].
Remark 5.

Theorem 2 can be used to construct a confidence interval for the ratio $\varrho_{k}:=\lambda_{k}/\operatorname{tr}({\boldsymbol{\Sigma}})$. This follows easily from (3) and the fact that, if we write $\widetilde{\varrho}_{k}=\lambda_{k}/(\operatorname{tr}({\boldsymbol{\Sigma}})-\lambda_{k})$, then $\varrho_{k}=\widetilde{\varrho}_{k}/(1+\widetilde{\varrho}_{k})$, which is a strictly increasing function of $\widetilde{\varrho}_{k}$.

Theorem 3.

Under Assumptions A–C, for each $1\leq k\leq r$, the principal sample eigenvector $\widehat{{\bf v}}_{k}$ satisfies

T\left(1-\langle{\bf v}_{k},\widehat{{\bf v}}_{k}\rangle^{2}-\dfrac{1}{T\widehat{\lambda}_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\widehat{\lambda}_{k})^{2}}\right)\ \stackrel{\mathcal{D}}{\rightarrow}\ \sum_{i\neq k,i=1}^{r}\omega_{ki}Z_{i}^{2},

where $\omega_{ki}=\theta_{k}\theta_{i}/(\theta_{k}-\theta_{i})^{2}$, which can be consistently estimated by

\widehat{\omega_{ki}}=\dfrac{\widehat{\lambda}_{k}\widehat{\lambda}_{i}}{(\widehat{\lambda}_{k}-\widehat{\lambda}_{i})^{2}},

and the $Z_{i}$'s are i.i.d. standard normal random variables.

Remark 6.

The convergence rate of $\langle{\bf v}_{k},\widehat{{\bf v}}_{k}\rangle^{2}$ has been established in Theorem 3.2 of Wang and Fan (2017). We derive the corresponding limiting distribution at the boundary of the parameter space, which is much more difficult to prove.

The proofs of Theorems 1, 2 and 3 are given in the supplementary material.

4 Two-sample Tests

We now discuss how to conduct the three tests mentioned in the Introduction.

Suppose that we have two groups of observations of the same dimension $N$:

{\bf y}_{1}^{(1)},\ldots,{\bf y}_{T_{1}}^{(1)},\quad\textrm{and}\quad{\bf y}_{1}^{(2)},\ldots,{\bf y}_{T_{2}}^{(2)},

which are drawn independently from two populations with mean zero and covariance matrices ${\boldsymbol{\Sigma}}^{(1)}$ and ${\boldsymbol{\Sigma}}^{(2)}$, respectively. We assume that Assumption B holds for each group of observations. Moreover, analogous to Assumption C, we have

\lim_{N\rightarrow\infty}\frac{N}{T_{i}}=\rho_{i}\in(0,+\infty),\quad i=1,2.

Finally, analogous to Assumption A, with the spectral decompositions of ${\boldsymbol{\Sigma}}^{(i)},\ i=1,2$, given by

{\boldsymbol{\Sigma}}^{(i)} = {\bf V}^{(i)}{\boldsymbol{\Lambda}}^{(i)}({\bf V}^{(i)})^{\mathrm{T}},

where ${\boldsymbol{\Lambda}}^{(i)}=\operatorname{diag}(\lambda_{1}^{(i)},\ldots,\lambda_{r}^{(i)},\lambda_{r+1}^{(i)},\ldots,\lambda_{N}^{(i)})$ and ${\bf V}^{(i)}=({\bf v}_{1}^{(i)},\ldots,{\bf v}_{r}^{(i)},{\bf v}_{r+1}^{(i)},\ldots,{\bf v}_{N}^{(i)})$, we assume that, for each population, there are $r$ principal eigenvalues, which satisfy

\lim_{N\rightarrow\infty}\dfrac{\lambda_{k}^{(i)}}{N}=\theta_{k}^{(i)}\in(0,+\infty),\quad\textrm{for }1\leq k\leq r,\ i=1,2;

while the remaining eigenvalues $\lambda_{j}^{(i)},\ r+1\leq j\leq N$, are uniformly bounded and have a limiting empirical distribution. The two limiting empirical distributions for the two populations can be different.

Naturally, our tests will be based on the sample covariance matrices

\widehat{{\boldsymbol{\Sigma}}}_{N}^{(i)}=\frac{1}{T_{i}}\sum_{j=1}^{T_{i}}{\bf y}_{j}^{(i)}({\bf y}_{j}^{(i)})^{\mathrm{T}},\quad i=1,2.

Write their spectral decompositions as

\widehat{{\boldsymbol{\Sigma}}}_{N}^{(i)}=\widehat{{\bf V}}^{(i)}\widehat{{\boldsymbol{\Lambda}}}^{(i)}(\widehat{{\bf V}}^{(i)})^{\mathrm{T}},

where

\widehat{{\boldsymbol{\Lambda}}}^{(i)}=\operatorname{diag}(\widehat{\lambda}_{1}^{(i)},\ldots,\widehat{\lambda}_{N}^{(i)}),\quad\widehat{{\bf V}}^{(i)}=(\widehat{{\bf v}}_{1}^{(i)},\ldots,\widehat{{\bf v}}_{N}^{(i)}).

4.1 Testing equality of principal eigenvalues

To test $H_{0}^{(I,k)}:\ \lambda_{k}^{(1)}=\lambda_{k}^{(2)}$, we use the following test statistic:

T_{\lambda k}=\sqrt{\dfrac{T_{1}T_{2}}{T_{1}(\widehat{\sigma_{\lambda_{k}}^{2}})^{(2)}+T_{2}(\widehat{\sigma_{\lambda_{k}}^{2}})^{(1)}}}\cdot\left(\dfrac{\widehat{\lambda}_{k}^{(1)}}{\widehat{\lambda}_{k}^{(2)}}-1\right),

where

(\widehat{\sigma_{\lambda_{k}}^{2}})^{(i)}=\dfrac{1}{(\widehat{\lambda}_{k}^{(i)})^{2}\,T_{i}}\sum_{j=1}^{T_{i}}\left((\widehat{{\bf v}}_{k}^{(i)})^{\mathrm{T}}{\bf y}_{j}^{(i)}\right)^{4}-1,\quad i=1,2.
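The statistic $T_{\lambda k}$ and its normal p-value can be sketched as follows; the helper names are ours, and the one-spike design in the toy usage is an illustrative assumption.

```python
import numpy as np
from math import erf, sqrt

# Sketch of the two-sample eigenvalue test based on T_{lambda k}.
def eig_stats(Y, k):
    """Return (lambda_hat_k, sigma2_hat_k) for a mean-zero sample Y (N x T)."""
    T = Y.shape[1]
    lam, V = np.linalg.eigh(Y @ Y.T / T)
    lam, V = lam[::-1], V[:, ::-1]
    proj = V[:, k] @ Y
    return lam[k], np.mean(proj**4) / lam[k]**2 - 1

def eigenvalue_test(Y1, Y2, k):
    """Two-sided test of H0: lambda_k^(1) = lambda_k^(2) (k is 0-based)."""
    l1, s1 = eig_stats(Y1, k)
    l2, s2 = eig_stats(Y2, k)
    T1, T2 = Y1.shape[1], Y2.shape[1]
    stat = sqrt(T1 * T2 / (T1 * s2 + T2 * s1)) * (l1 / l2 - 1)
    pval = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(stat) / sqrt(2.0))))  # normal limit
    return stat, pval

# Toy usage under the null: both samples share one strong spike of size 2N
rng = np.random.default_rng(4)
N, T1, T2 = 80, 300, 450
lam = np.concatenate([[2.0 * N], np.ones(N - 1)])
Y1 = np.sqrt(lam)[:, None] * rng.standard_normal((N, T1))
Y2 = np.sqrt(lam)[:, None] * rng.standard_normal((N, T2))
stat, pval = eigenvalue_test(Y1, Y2, 0)
print(stat, pval)
```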
Theorem 4.

Under the null hypothesis $H_{0}^{(I,k)}$ and Assumptions A–C, the proposed test statistic $T_{\lambda k}$ converges weakly to the standard normal distribution.

Theorem 4 follows directly from Theorem 1 and the Delta method.

4.2 Testing equality of ratios of principal eigenvalues

The null hypothesis

H_{0}^{(II,k)}:\ \dfrac{\lambda_{k}^{(1)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(1)})}=\dfrac{\lambda_{k}^{(2)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(2)})}

is equivalent to

H_{0}^{(II,k)^{\prime}}:\ \dfrac{\lambda_{k}^{(1)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(1)})-\lambda_{k}^{(1)}}=\dfrac{\lambda_{k}^{(2)}}{\operatorname{tr}({\boldsymbol{\Sigma}}^{(2)})-\lambda_{k}^{(2)}}.

Based on this observation, we propose the following test statistic:

T_{ek}:=\dfrac{\sqrt{N}\left(\dfrac{\widehat{\lambda}_{k}^{(1)}}{\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N}^{(1)})-\widehat{\lambda}_{k}^{(1)}}-\dfrac{\widehat{\lambda}_{k}^{(2)}}{\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N}^{(2)})-\widehat{\lambda}_{k}^{(2)}}\right)}{\sqrt{\dfrac{N}{T_{1}}\,\widehat{\sigma_{-k}^{2}}^{(1)}+\dfrac{N}{T_{2}}\,\widehat{\sigma_{-k}^{2}}^{(2)}}},

where

\widehat{\sigma_{-k}^{2}}^{(i)}=\dfrac{(\widehat{\lambda}_{k}^{(i)})^{2}}{\left(\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N}^{(i)})-\widehat{\lambda}_{k}^{(i)}\right)^{2}}\left(\widehat{\sigma_{\lambda_{k}}^{2}}^{(i)}+\dfrac{\sum_{j\neq k,j=1}^{r}(\widehat{\lambda}_{j}^{(i)})^{2}\,\widehat{\sigma_{\lambda_{j}}^{2}}^{(i)}}{\left(\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N}^{(i)})-\widehat{\lambda}_{k}^{(i)}\right)^{2}}\right),\quad i=1,2.
Theorem 5.

Under the null hypothesis $H_{0}^{(II,k)}$ and Assumptions A–C, the test statistic $T_{ek}$ converges weakly to the standard normal distribution.

Theorem 5 is a straightforward consequence of Theorem 2.

4.3 Testing equality of principal eigenvectors

To test $H_{0}^{(III,k)}:\ |\langle{\bf v}_{k}^{(1)},{\bf v}_{k}^{(2)}\rangle|=1$, we propose the following test statistic:

T_{vk}:=2N\left(1-|\langle\widehat{{\bf v}}_{k}^{(1)},\widehat{{\bf v}}_{k}^{(2)}\rangle|\right)-\dfrac{N^{2}}{T_{1}(N-r)\widehat{\lambda}_{k}^{(1)}}\sum_{j=r+1}^{N}\widehat{\lambda}_{j}^{(1)}-\dfrac{N^{2}}{T_{2}(N-r)\widehat{\lambda}_{k}^{(2)}}\sum_{j=r+1}^{N}\widehat{\lambda}_{j}^{(2)}. \qquad (4)
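The statistic in (4) translates directly into code. A minimal sketch, assuming $r$ is known and using a toy two-spike design in the usage below:

```python
import numpy as np

# Sketch of the eigenvector test statistic T_{vk} of (4).
def tvk_statistic(Y1, Y2, k, r):
    """T_{vk}; k is the 0-based index of the tested eigenvector, k < r."""
    N = Y1.shape[0]
    spectra = []
    for Y in (Y1, Y2):
        T = Y.shape[1]
        lam, V = np.linalg.eigh(Y @ Y.T / T)
        spectra.append((T, lam[::-1], V[:, ::-1]))      # descending order
    (T1, l1, V1), (T2, l2, V2) = spectra
    stat = 2.0 * N * (1.0 - abs(V1[:, k] @ V2[:, k]))   # overlap term
    for T, lam in ((T1, l1), (T2, l2)):
        # bias-correction terms built from the non-principal eigenvalues
        stat -= N**2 * lam[r:].sum() / (T * (N - r) * lam[k])
    return stat

# Toy usage: two independent samples sharing the same diagonal covariance
rng = np.random.default_rng(5)
N, T1, T2, r = 60, 240, 300, 2
lam = np.concatenate([[3.0 * N, 1.5 * N], np.ones(N - 2)])
Y1 = np.sqrt(lam)[:, None] * rng.standard_normal((N, T1))
Y2 = np.sqrt(lam)[:, None] * rng.standard_normal((N, T2))
print(tvk_statistic(Y1, Y2, 0, r))
```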
Theorem 6.

Under the null hypothesis $H_{0}^{(III,k)}$ and Assumptions A–C, suppose further that $\boldsymbol{\Xi}_{11}^{*}:=\left(\lim_{N\rightarrow\infty}\langle{\bf v}_{s}^{(1)},{\bf v}_{t}^{(2)}\rangle\right)_{s,t=1,\ldots,r}$ exists. Then the proposed test statistic $T_{vk}$ converges weakly as follows:

T_{vk}\ \stackrel{\mathcal{D}}{\rightarrow}\ {\bf q}_{k}^{\mathrm{T}}\begin{pmatrix}{\bf I}_{r-1}&-\boldsymbol{\Xi}_{11,-k}^{*}\\ -(\boldsymbol{\Xi}_{11,-k}^{*})^{\mathrm{T}}&{\bf I}_{r-1}\end{pmatrix}{\bf q}_{k},

where ${\bf q}_{k}$ follows a multivariate normal distribution with mean zero and covariance matrix

{\bf D}_{k}=\operatorname{diag}\left(\rho_{1}\omega_{k1}^{(1)},\ldots,\rho_{1}\omega_{k(k-1)}^{(1)},\rho_{1}\omega_{k(k+1)}^{(1)},\ldots,\rho_{1}\omega_{kr}^{(1)},\ \rho_{2}\omega_{k1}^{(2)},\ldots,\rho_{2}\omega_{k(k-1)}^{(2)},\rho_{2}\omega_{k(k+1)}^{(2)},\ldots,\rho_{2}\omega_{kr}^{(2)}\right),

with

\omega_{kj}^{(i)}=\dfrac{\theta_{k}^{(i)}\theta_{j}^{(i)}}{(\theta_{k}^{(i)}-\theta_{j}^{(i)})^{2}},\quad\textrm{for }i=1,2,\ j=1,\ldots,k-1,k+1,\ldots,r,

and $\boldsymbol{\Xi}_{11,-k}^{*}$ is the matrix obtained by deleting the $k$th row and $k$th column of $\boldsymbol{\Xi}_{11}^{*}$. Furthermore, $\omega_{kj}^{(i)}$ and $\boldsymbol{\Xi}_{11}^{*}$ can be consistently estimated by

\widehat{\omega_{kj}^{(i)}}=\dfrac{\widehat{\lambda}_{k}^{(i)}\widehat{\lambda}_{j}^{(i)}}{(\widehat{\lambda}_{k}^{(i)}-\widehat{\lambda}_{j}^{(i)})^{2}},\quad\mbox{and}\quad\widehat{\boldsymbol{\Xi}_{11}^{*}}=\left(\langle\widehat{{\bf v}}_{s}^{(1)},\widehat{{\bf v}}_{t}^{(2)}\rangle\right)_{s,t=1,\ldots,r},

respectively.

Corollary 1.

Under the stronger null hypothesis that $|\langle{\bf v}_{k}^{(1)},{\bf v}_{k}^{(2)}\rangle|=1$ for all $k=1,\ldots,r$, we have $\boldsymbol{\Xi}_{11}^{*}={\bf I}_{r}$, and the proposed test statistic $T_{vk}$ converges as follows:

T_{vk}\ \stackrel{\mathcal{D}}{\rightarrow}\ \sum_{j\neq k,j=1}^{r}\left(\rho_{1}\omega_{kj}^{(1)}+\rho_{2}\omega_{kj}^{(2)}\right)Z_{j}^{2},

where the $Z_{j}$'s are i.i.d. standard normal random variables.
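Since the limit is a weighted sum of independent $\chi^2(1)$ variables, critical values are easy to obtain by Monte Carlo; a minimal sketch (in practice the weights would be the plug-in estimates $\rho_i\,\widehat{\omega_{kj}^{(i)}}$):

```python
import numpy as np

# Sketch: Monte-Carlo quantile of a weighted sum of chi-square(1) variables,
# i.e. the null limit of T_{vk} in Corollary 1.
def weighted_chi2_quantile(weights, q=0.95, n_mc=200000, seed=0):
    rng = np.random.default_rng(seed)
    Z2 = rng.standard_normal((n_mc, len(weights)))**2    # chi-square(1) draws
    return np.quantile(Z2 @ np.asarray(weights), q)

# With a single unit weight this reproduces the chi-square(1) quantile
crit = weighted_chi2_quantile([1.0])
print(crit)  # approximately 3.84, the chi-square(1) 95% quantile
```

A test would reject $H_{0}^{(III,k)}$ at level $5\%$ when $T_{vk}$ exceeds the corresponding quantile.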

Theorem 6 and Corollary 1 are proved in the supplementary material.

5 Simulation Studies

5.1 Design

We consider five population covariance matrices as follows:

{\boldsymbol{\Sigma}}_{1} = {\bf V}^{(1)}{\boldsymbol{\Lambda}}^{(1)}({\bf V}^{(1)})^{\mathrm{T}},\quad{\boldsymbol{\Sigma}}_{2}={\bf V}^{(1)}{\boldsymbol{\Lambda}}^{(2)}({\bf V}^{(1)})^{\mathrm{T}},\quad{\boldsymbol{\Sigma}}_{3}={\bf V}^{(2)}{\boldsymbol{\Lambda}}^{(1)}({\bf V}^{(2)})^{\mathrm{T}},
{\boldsymbol{\Sigma}}_{4} = {\boldsymbol{\Lambda}}^{(1)},\quad{\boldsymbol{\Sigma}}_{5}={\bf V}^{(5)}{\boldsymbol{\Lambda}}^{(1)}({\bf V}^{(5)})^{\mathrm{T}},

where ${\bf V}^{(1)}$ and ${\bf V}^{(2)}$ are two random orthogonal matrices, and

{\boldsymbol{\Lambda}}^{(1)} = \operatorname{diag}(5N/2,\,N,\,N/2,\,\lambda_{4}^{(1)},\ldots,\lambda_{N}^{(1)}),\quad\lambda_{j}^{(1)}\sim_{\mathrm{i.i.d.}}\mbox{Unif}(1,3),
{\boldsymbol{\Lambda}}^{(2)} = \operatorname{diag}(7N/2,\,2N,\,N,\,\lambda_{4}^{(2)},\ldots,\lambda_{N}^{(2)}),\quad\lambda_{j}^{(2)}\sim_{\mathrm{i.i.d.}}\mbox{Unif}(2,5),
{\bf V}^{(5)} = ({\bf v}_{1}^{(5)},{\bf v}_{2}^{(5)},{\bf e}_{3},\ldots,{\bf e}_{N}),
{\bf v}_{1}^{(5)} = (\cos\theta,\sin\theta,0,\ldots,0)^{\mathrm{T}},
{\bf v}_{2}^{(5)} = (-\sin\theta,\cos\theta,0,\ldots,0)^{\mathrm{T}},\quad\theta\in[0,\pi/2].
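The rotated design can be sketched in a few lines: ${\bf V}^{(5)}$ rotates the first two coordinates by $\theta$, so ${\boldsymbol{\Sigma}}_{4}$ and ${\boldsymbol{\Sigma}}_{5}$ share eigenvalues while their top two eigenvectors differ by the rotation. The helper name and the small $N$ are illustrative.

```python
import numpy as np

# Sketch of Sigma_5 = V^(5) Lambda^(1) (V^(5))^T with the 2x2 rotation block.
def make_sigma5(lam, theta):
    N = len(lam)
    V5 = np.eye(N)
    c, s = np.cos(theta), np.sin(theta)
    V5[:2, :2] = [[c, -s],
                  [s, c]]    # columns: v1 = (cos, sin, 0...), v2 = (-sin, cos, 0...)
    return V5 @ np.diag(lam) @ V5.T

N = 10
lam = np.concatenate([[5 * N / 2, float(N), N / 2], np.full(N - 3, 2.0)])
S5 = make_sigma5(lam, np.pi / 6)

# The rotation is orthogonal, so the spectrum is unchanged...
same_spectrum = np.allclose(np.sort(np.linalg.eigvalsh(S5)), np.sort(lam))
# ...but the top eigenvector is (cos(theta), sin(theta), 0, ...) up to sign
v_top = np.linalg.eigh(S5)[1][:, -1]
print(same_spectrum, abs(v_top[0]))
```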

The observations are simulated as follows: for a given ${\boldsymbol{\Sigma}}$, which is one of the five covariance matrices above, write its spectral decomposition as ${\boldsymbol{\Sigma}}={\bf V}{\boldsymbol{\Lambda}}{\bf V}^{\mathrm{T}}$. We then simulate observations with covariance matrix ${\boldsymbol{\Sigma}}$ as ${\bf V}{\boldsymbol{\Lambda}}^{1/2}{\bf z}_{t}$, where ${\bf z}_{t}=({\bf z}_{t}[1],{\bf z}_{t}[2],\ldots,{\bf z}_{t}[N])^{\mathrm{T}}$ consists of i.i.d. standardized Student $t(8)$ random variables.
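Drawing the standardized $t(8)$ innovations amounts to dividing $t(\nu)$ draws by their standard deviation $\sqrt{\nu/(\nu-2)}$; a minimal sketch:

```python
import numpy as np

# Sketch: standardized t(8) entries have mean 0 and variance 1 after dividing
# by sqrt(nu/(nu-2)), the standard deviation of a t(nu) random variable.
rng = np.random.default_rng(7)
nu = 8
Z = rng.standard_t(nu, size=(5, 100000)) / np.sqrt(nu / (nu - 2))
print(Z.mean(), Z.var())  # both close to 0 and 1, respectively
```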

Theorems 4, 5 and 6 are associated with different null hypotheses. When evaluating the sizes of the tests proposed in these theorems, we adopt the following setting:

  • For both Theorems 4 and 5, we simulate two samples of observations with ${\boldsymbol{\Sigma}}_{1}$ and ${\boldsymbol{\Sigma}}_{3}$ as their respective population covariance matrices. Note that ${\boldsymbol{\Sigma}}_{1}$ and ${\boldsymbol{\Sigma}}_{3}$ share the same eigenvalues but have different eigenvectors.

  • For Theorem 6, the two samples of observations are simulated with ${\boldsymbol{\Sigma}}_{1}$ and ${\boldsymbol{\Sigma}}_{2}$ as their respective population covariance matrices. The two matrices share the same eigenvectors but have different eigenvalues.

On the other hand, when evaluating powers, we use the following design:

  • For testing equality of eigenvalues/eigenvalue ratios, the two samples of observations are simulated with ${\boldsymbol{\Sigma}}_{1}$ and ${\boldsymbol{\Sigma}}_{2}$ as their respective population covariance matrices;

  • For testing equality of principal eigenvectors, we simulate two samples of observations with ${\boldsymbol{\Sigma}}_{4}$ and ${\boldsymbol{\Sigma}}_{5}$ as their respective population covariance matrices. The difference between the principal eigenvectors of the two matrices is a function of the angle $\theta$; we vary $\theta$ to see how the power changes with it.

5.2 Visual check

We first visually examine Theorems 4, 5 and 6 by comparing the empirical distributions of the test statistics with their respective asymptotic distributions under the null hypotheses.

For Theorem 4, the asymptotic distribution of the test statistic Tλ1T_{\lambda 1} is the standard normal distribution. This is clearly supported by Figure 1, which gives the normal Q-Q plot and histogram of Tλ1T_{\lambda 1} based on 5,000 replications.
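This kind of visual check can also be carried out numerically: under the null, the sorted replications of the statistic should line up with the standard normal quantiles. A minimal sketch, using standard normal draws as a stand-in for the actual replications of Tλ1:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
t_stats = rng.standard_normal(5000)     # stand-in for 5,000 replications of T_{lambda 1}

# Q-Q comparison: empirical order statistics vs. N(0,1) quantiles
probs = (np.arange(1, 5001) - 0.5) / 5000
qq_theory = np.array([NormalDist().inv_cdf(p) for p in probs])
qq_empirical = np.sort(t_stats)

# agreement should be tight over the central quantiles
central = slice(500, 4500)
max_dev = np.max(np.abs(qq_empirical[central] - qq_theory[central]))
```

A small `max_dev` over the central range is what a straight Q-Q plot conveys visually; the extreme tails are noisier and are trimmed here.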

Figure 1: Normal Q-Q plot and histogram of Tλ1T_{\lambda 1} based on 5,000 replications with N=500,T1=500N=500,T_{1}=500 and T2=750T_{2}=750.

For Theorem 5, the asymptotic distribution of the test statistic Te1T_{e1} is again the standard normal distribution. This is supported by Figure 2.

Figure 2: Normal Q-Q plot and histogram of Te1T_{e1} based on 5,000 replications with N=500,T1=500N=500,T_{1}=500 and T2=750T_{2}=750.

For Theorem 6, the asymptotic distribution of the test statistic Tv1T_{v1} is a generalized χ2\chi^{2}-distribution, which does not have an explicit density formula. To examine the asymptotics, we compare the empirical distribution of the test statistic Tv1T_{v1} with that of Monte-Carlo samples from the asymptotic distribution. The comparison is conducted via both a Q-Q plot and density estimation. The results are given in Figure 3. We can see that the empirical distribution of the test statistic Tv1T_{v1} matches well with the asymptotic distribution.
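A generalized χ2 variable is a weighted sum of independent χ2(1) variables, so it is straightforward to sample from even though it has no closed-form density. The following sketch illustrates the Monte-Carlo comparison; the weights below are hypothetical, whereas in Theorem 6 they are determined by the population eigenvalues and eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(2)
weights = np.array([2.0, 1.0, 0.5])     # hypothetical weights of the generalized chi^2

def gen_chi2(n):
    """Sample sum_i w_i * chi2_1, i.e. a generalized chi^2 with the weights above."""
    return rng.chisquare(df=1, size=(n, len(weights))) @ weights

ref = np.sort(gen_chi2(100_000))        # Monte-Carlo reference for the asymptotic law
emp = np.sort(gen_chi2(5_000))          # stand-in for the 5,000 replications of T_{v1}

# Q-Q comparison: empirical order statistics vs. reference quantiles
probs = (np.arange(1, 5_001) - 0.5) / 5_000
ref_q = np.quantile(ref, probs)
```

Plotting `emp` against `ref_q` gives the Q-Q comparison of Figure 3; a kernel density estimate of `ref` plays the role of the density overlay.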

Figure 3: Comparisons of the empirical distribution of the test statistic Tv1T_{v1} with the asymptotic distribution when N=500,T1=500N=500,T_{1}=500 and T2=750T_{2}=750. Left: Q-Q plot of Tv1T_{v1} versus Monte-Carlo samples from the asymptotic distribution; right: histogram of Tv1T_{v1} versus the kernel density estimate of the asymptotic distribution.

5.3 Size and power evaluation

In this subsection, we evaluate the sizes and powers of the three tests in Theorems 4, 5 and 6.

Table 1 reports the empirical sizes of the three tests based on Tλk,TekT_{\lambda k},T_{ek} and TvkT_{vk}, k=1,2,3,k=1,2,3, at the 5% significance level for different combinations of NN, T1T_{1} and T2T_{2}. Tests based on TekT_{ek} and TvkT_{vk}, k=1,2,3,k=1,2,3, involve the number of factors, which is unknown in practice. There are several estimators available, including those given in Bai and Ng (2002) and Ahn and Horenstein (2013). We evaluate the sizes based on a given estimated number of factors, specified by r^\widehat{r} in the table. We see that for the first two sets of tests, for different estimated numbers of factors and different NN and T1,T2T_{1},T_{2}, the empirical sizes are close to the nominal level of 5%. For the third set of tests based on TvkT_{vk}, k=1,2,3,k=1,2,3, the size approaches 5% as the dimension NN and sample sizes T1,T2T_{1},T_{2} get larger.
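Each entry of Table 1 is a rejection frequency across replications. For a statistic that is asymptotically N(0,1) under the null, such as Tλk or Tek, the computation can be sketched as follows (with standard normal draws standing in for the simulated statistics):

```python
import numpy as np

rng = np.random.default_rng(3)

def empirical_size(n_rep, crit=1.96):
    """Fraction of two-sided rejections at the 5% level for a statistic
    that is asymptotically N(0,1) under the null; standard normal draws
    stand in for the simulated test statistics."""
    t = rng.standard_normal(n_rep)
    return np.mean(np.abs(t) > crit)

size = empirical_size(5_000)            # should be close to the nominal 0.05
```

Replacing the standard normal draws with statistics simulated under an alternative yields the empirical powers of Table 2 by the same formula.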

N     Tλ1     Te1                         Tv1
              r̂=2    r̂=3 (true)   r̂=4   r̂=2    r̂=3 (true)   r̂=4
100   0.052   0.052   0.050        0.050  0.083   0.086        0.086
300   0.052   0.051   0.049        0.049  0.060   0.063        0.063
500   0.053   0.053   0.051        0.051  0.051   0.055        0.055

N     Tλ2     Te2                         Tv2
              r̂=2    r̂=3 (true)   r̂=4   r̂=2    r̂=3 (true)   r̂=4
100   0.057   0.048   0.047        0.047  0.102   0.108        0.109
300   0.053   0.055   0.054        0.054  0.057   0.063        0.063
500   0.056   0.053   0.052        0.052  0.048   0.055        0.055

N     Tλ3     Te3                         Tv3
              r̂=2    r̂=3 (true)   r̂=4   r̂=2    r̂=3 (true)   r̂=4
100   0.065   NA      0.049        0.049  NA      0.096        0.099
300   0.052   NA      0.052        0.052  NA      0.058        0.059
500   0.062   NA      0.055        0.055  NA      0.057        0.057
Table 1: Empirical sizes based on 5,000 replications of Tλk,TekT_{\lambda k},T_{ek} and TvkT_{vk}, k=1,2,3,k=1,2,3, at 5% significance level with N/T1=1N/T_{1}=1 and N/T2=2/3N/T_{2}=2/3.

Power evaluation results are given in Table 2. We see that the powers are in general quite high, especially as the dimension NN and sample sizes T1,T2T_{1},T_{2} all get larger.

N     Tλ1     Te1                         Tv1
              r̂=2    r̂=3 (true)   r̂=4   r̂=2    r̂=3 (true)   r̂=4
100   0.196   0.146   0.138        0.138  0.494   0.508        0.509
300   0.617   0.457   0.450        0.450  0.909   0.913        0.914
500   0.842   0.705   0.697        0.697  0.987   0.988        0.988

N     Tλ2     Te2                         Tv2
              r̂=2    r̂=3 (true)   r̂=4   r̂=2    r̂=3 (true)   r̂=4
100   0.712   0.160   0.158        0.158  0.407   0.428        0.430
300   0.989   0.360   0.355        0.355  0.833   0.844        0.844
500   0.996   0.522   0.520        0.520  0.970   0.974        0.974

N     Tλ3     Te3                         Tv3
              r̂=2    r̂=3 (true)   r̂=4   r̂=2    r̂=3 (true)   r̂=4
100   0.705   NA      0.145        0.145  NA      NA           NA
300   0.990   NA      0.303        0.303  NA      NA           NA
500   1.000   NA      0.446        0.446  NA      NA           NA
Table 2: Empirical powers based on 5,000 replications of Tλk,TekT_{\lambda k},T_{ek} and TvkT_{vk} (for θ=π/9\theta=\pi/9), k=1,2,3,k=1,2,3, at 5% significance level with N/T1=1N/T_{1}=1 and N/T2=2/3N/T_{2}=2/3.

Finally, in Figure 4, we evaluate the power of the eigenvector tests Tvk,k=1,2,T_{vk},k=1,2, as a function of θ\theta. For the three θ\theta values tested, iπ/9,i=1,2,3,i\pi/9,i=1,2,3, the larger the angle, the bigger the difference between the principal eigenvectors of the two populations, and the higher the power. Moreover, even for the smallest angle π/9\pi/9, the power quickly increases to close to 1 as the dimension NN and sample sizes T1,T2T_{1},T_{2} get larger.

Figure 4: Empirical power of Tvi,i=1,2,T_{v_{i}},i=1,2, as a function of θ\theta at 5% significance level for different NN and T1,T2T_{1},T_{2} with N/T1=1N/T_{1}=1 and N/T2=2/3N/T_{2}=2/3. Left: powers for Tv1T_{v_{1}}; right: powers for Tv2T_{v_{2}}.

6 Empirical Studies

In this section, we conduct empirical studies based on daily returns of S&P500 Index constituents from January 2000 to December 2020. The objective is to test, between two consecutive years, whether the principal eigenvalues, eigenvalue ratios and principal eigenvectors are equal or not.

6.1 Tests about principal eigenvalues

We plot in Figure 5 the values of the test statistics Tλk,k=1,2,3,T_{\lambda k},k=1,2,3, together with the critical values at the 5% significance level based on Theorem 4.

Figure 5: Results of testing for equality of the first three principal eigenvalues between two consecutive years during 2000-2020. From top to bottom: testing equality of the first principal eigenvalue, the second principal eigenvalue, and the third principal eigenvalue, respectively.

We see from Figure 5 that for testing equality of the first principal eigenvalue, the test result is statistically significant for more than half of the pairs of consecutive years, suggesting that the first principal eigenvalue tends to change over time. The second and third principal eigenvalues appear somewhat more stable.

6.2 Tests about eigenvalue ratios

We plot in Figure 6 the results of testing equality of eigenvalue ratios.

Figure 6: Results of testing for equality of the first three principal eigenvalue ratios between two consecutive years during 2000-2020. From top to bottom: testing equality of the first principal eigenvalue ratio, the second principal eigenvalue ratio, and the third principal eigenvalue ratio, respectively.

An interesting observation is that, in sharp contrast with the tests about eigenvalues, the rejection rate for testing equality of eigenvalue ratios is much lower. This contrast suggests an interesting difference between the absolute sizes of principal eigenvalues and their relative sizes: while the absolute size appears to change frequently over time, the relative size is more stable.

6.3 Tests about principal eigenvectors

Figure 7 reports the results of tests about principal eigenvectors.

Figure 7: Results of testing for equality of the first three principal eigenvectors between two consecutive years during 2000-2020. From top to bottom: testing equality of the first principal eigenvector, the second principal eigenvector, and the third principal eigenvector, respectively.

Notice that in this case, the asymptotic distribution under the null hypothesis is a complicated generalized χ2\chi^{2} distribution, for which there is no explicit formula for computing the critical value. To address this issue, we simulate a large number of observations from the limiting distribution, based on which we estimate the 95% quantile. That leads to the red dotted curve in the plots. Note that the critical values change over time because the limiting distribution involves both population principal eigenvalues and eigenvectors, which are themselves subject to change over time. The black curves report the test statistic values.
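The Monte-Carlo critical value described above can be sketched as follows; the weights below are hypothetical placeholders for the eigenvalue- and eigenvector-dependent quantities that enter the limiting law:

```python
import numpy as np

rng = np.random.default_rng(4)
weights = np.array([2.0, 1.0, 0.5])   # hypothetical weights in the limiting law

# a large Monte-Carlo sample from the limiting generalized chi^2 distribution
mc = rng.chisquare(df=1, size=(200_000, len(weights))) @ weights

crit_95 = np.quantile(mc, 0.95)       # estimated 95% quantile = 5%-level critical value
```

Because the weights change with the estimated eigenvalues and eigenvectors of each year pair, this quantile, and hence the critical value, varies over time.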

For the test about the first principal eigenvector, we see that for all pairs of consecutive years, the value of the test statistic is well above the 95% quantile, so we reject the null hypothesis that the first principal eigenvector is the same between two consecutive years. For the tests about the second and third principal eigenvectors, we also reject the corresponding null hypothesis for most pairs of consecutive years. These findings have significant implications for factor modeling. In particular, the results show that structural breaks due to principal eigenvectors occur more often than one would expect based on changes in stock market conditions.

6.4 Summary of the three test results

Figure 8 summarizes the results of the three tests.

Figure 8: Summary of the results of the three tests. Colors represent test results: red indicates rejection at the 5% level, yellow indicates rejection at the 10% (but not the 5%) level, and green indicates no rejection at the 10% level.

Figure 8 reveals that testing for equality of principal eigenvectors between two adjacent years results in more rejections than testing for equality of principal eigenvalues or eigenvalue ratios. Moreover, the tests about the first principal eigenvalue and eigenvector are more likely to be rejected than those about the second and third principal components. Let us point out that while it could indeed be the case that the first principal eigenvalue and eigenvector change more frequently than the second or third ones, the difference could also be due to the fact that the first principal component is the strongest, so the related tests are the most powerful.

7 Conclusion

We establish both one-sample and two-sample central limit theorems for principal eigenvalues and eigenvectors under large factor models. Based on these CLTs, we develop three tests to detect structural changes in large factor models. Our tests can reveal whether the change is in principal eigenvalues or eigenvectors or both. Numerically, these tests are found to have good finite sample performance. Applying these tests to daily returns of the S&P500 Index constituent stocks, we find that, between two consecutive years, the principal eigenvalues, eigenvalue ratios and principal eigenvectors all exhibit frequent changes.

References

  • Ahn and Horenstein (2013) Ahn, Seung C. and Horenstein, Alex R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3), 1203–1227.
  • Anderson (1963) Anderson, T. W. (1963). Asymptotic theory for principal component analysis. The Annals of Mathematical Statistics, 34, 122–148.
  • Bai (2003) Bai, Jushan. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1), 135–171.
  • Bai and Ng (2002) Bai, Jushan and Ng, Serena. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1), 191–221.
  • Bai and Ng (2006) Bai, Jushan and Ng, Serena. (2006). Evaluating latent and observed factors in macroeconomics and finance. J. Econometrics, 131(1-2), 507–537.
  • Bai and Silverstein (2010) Bai, Zhidong and Silverstein, Jack W. (2010). Spectral analysis of large dimensional random matrices. Springer Series in Statistics, second edn, Springer, New York.
  • Bai and Yao (2008) Bai, Zhidong and Yao, Jianfeng. (2008). Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. Henri Poincaré Probab. Stat., 44(3), 447–474.
  • Bao et al. (2022) Bao, Zhigang, Ding, Xiucai, Wang, Jingming and Wang, Ke. (2022). Statistical inference for principal components of spiked covariance matrices. The Annals of Statistics, 50(2), 1144–1169.
  • Breitung and Eickmeier (2011) Breitung, Jörg and Eickmeier, Sandra. (2011). Testing for structural breaks in dynamic factor models. J. Econometrics, 163(1), 71–84.
  • Cai et al. (2020) Cai, T. Tony and Han, Xiao and Pan, Guangming. (2020). Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices. Annals of Statistics, 48(3), 1255–1280.
  • Chamberlain and Rothschild (1983) Chamberlain, Gary and Rothschild, Michael. (1983). Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets. Econometrica, 51(5), 1281–1304.
  • Chen et al. (2014) Chen, Liang and Dolado, Juan J. and Gonzalo, Jesús. (2014). Detecting big structural breaks in large factor models. J. Econometrics, 180(1), 30–48.
  • Fama and French (1992) Fama, Eugene F. and French, Kenneth R. (1992). The Cross-Section of Expected Stock Returns. The Journal of Finance, 47(2), 427–465.
  • Fama and French (2015) Fama, Eugene F. and French, Kenneth R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1), 1–22.
  • Fan et al. (2011) Fan, Jianqing and Liao, Yuan and Mincheva, Martina. (2011). High-dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39(6), 3320–3356.
  • Fan et al. (2013) Fan, Jianqing and Liao, Yuan and Mincheva, Martina. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B. Stat. Methodol., 75(4), 603–680.
  • Han and Inoue (2015) Han, Xu and Inoue, Atsushi. (2015). Tests for parameter instability in dynamic factor models. Econometric Theory, 31(5), 1117–1152.
  • Onatski (2010) Onatski, Alexei. (2010). Determining the Number of Factors from Empirical Distribution of Eigenvalues. The Review of Economics and Statistics, 92(4), 1004–1016.
  • Ross (1976) Ross, Stephen. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3), 341–360.
  • Sharpe (1964) Sharpe, William. (1964). Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of Risk. Journal of Finance, 19(3), 425–442.
  • Silverstein and Bai (1995) Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. Journal of Multivariate Analysis, 54(2), 175–192.
  • Stock and Watson (1998) Stock, James H. and Watson, Mark W. (1998). Diffusion Indexes. Working Paper.
  • Stock and Watson (2002) Stock, James H. and Watson, Mark W. (2002). Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc., 97(460), 1167–1179.
  • Stock and Watson (2009) Stock, James H. and Watson, Mark W. (2009). Forecasting in dynamic factor models subject to structural instability. The methodology and practice of econometrics, Oxford Univ. Press, Oxford, 173–205.
  • Wang et al. (2014) Wang, Qinwen and Su, Zhonggen and Yao, Jianfeng (2014). Joint CLT for several random sesquilinear forms with applications to large-dimensional spiked population models. Electron. J. Probab., 19, 1–28.
  • Wang and Fan (2017) Wang, Weichen and Fan, Jianqing (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. Ann. Statist., 45(3), 1342–1374.
  • Zheng et al. (2015) Zheng, Shurong and Bai, Zhidong and Yao, Jianfeng (2015). Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. Ann. Statist., 43(2), 546–591.

SUPPLEMENTARY MATERIAL

The supplementary material includes the proofs of Theorems 1, 2, 3 and 6, and Corollary 1 in the main text.

S1. Notations

Recall the spectral decomposition of 𝚺=𝐕𝚲𝐕T{\boldsymbol{\Sigma}}={\bf V}{\boldsymbol{\Lambda}}{\bf V}^{\mathrm{T}}, where the orthogonal matrix 𝐕=(𝐯1,,𝐯N){\bf V}=({\bf v}_{1},\ldots,{\bf v}_{N}) consists of the eigenvectors of 𝚺{\boldsymbol{\Sigma}}, and 𝚲=diag(λ1,,λN){\boldsymbol{\Lambda}}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{N}) with eigenvalues λ1λN\lambda_{1}\geq\ldots\geq\lambda_{N}. Write 𝚲=diag(𝚲A,𝚲B){\boldsymbol{\Lambda}}=\operatorname{diag}({\boldsymbol{\Lambda}}_{A},{\boldsymbol{\Lambda}}_{B}), where

𝚲A=diag(λ1,,λr)and𝚲B=diag(λr+1,,λN).{\boldsymbol{\Lambda}}_{A}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{r})~{}~{}~{}~{}~{}\textrm{and}~{}~{}~{}~{}{\boldsymbol{\Lambda}}_{B}=\operatorname{diag}(\lambda_{r+1},\ldots,\lambda_{N}).

Define 𝐱t=𝐕T𝐲t{\bf x}_{t}={\bf V}^{\mathrm{T}}{\bf y}_{t}. Then Cov(𝐱t)=𝚲\operatorname{Cov}({\bf x}_{t})={\boldsymbol{\Lambda}}, and the eigenvectors of 𝚲{\boldsymbol{\Lambda}} are the unit vectors 𝐞1,…,𝐞N{\bf e}_{1},\ldots,{\bf e}_{N}, where 𝐞k{\bf e}_{k} is the unit vector with 1 in the kkth entry and zeros elsewhere. Let 𝐒N=1/T⋅∑t=1T𝐱t𝐱tT{\bf S}_{N}=1/T\cdot\sum_{t=1}^{T}{\bf x}_{t}{\bf x}_{t}^{\mathrm{T}}, whose eigenvalues are denoted by λ^1≥λ^2≥…≥λ^N\widehat{\lambda}_{1}\geq\widehat{\lambda}_{2}\geq\ldots\geq\widehat{\lambda}_{N} with corresponding eigenvectors 𝐮1,𝐮2,…,𝐮N{\bf u}_{1},{\bf u}_{2},\ldots,{\bf u}_{N}. To resolve the ambiguity in the direction of an eigenvector, we specify the direction such that 𝐮k[k]≥0{\bf u}_{k}[k]\geq 0 for all 1≤k≤N1\leq k\leq N, namely, the kkth coordinate of the kkth eigenvector is nonnegative (when the kkth coordinate is zero, the direction remains unspecified, in which case we take an arbitrary direction). Note that the eigenvalues of 𝚺^N=1T∑t=1T𝐲t𝐲tT\widehat{{\boldsymbol{\Sigma}}}_{N}=\dfrac{1}{T}\sum_{t=1}^{T}{\bf y}_{t}{\bf y}_{t}^{\mathrm{T}} are the same as those of 𝐒N{\bf S}_{N}, and the eigenvectors are 𝐯^k=𝐕𝐮k\widehat{{\bf v}}_{k}={\bf V}{\bf u}_{k}. It follows that

𝐯k,𝐯^k=𝐯k,𝐕𝐮k=𝐯kT𝐕𝐮k=𝐞k,𝐮k.\langle{\bf v}_{k},\widehat{{\bf v}}_{k}\rangle=\langle{\bf v}_{k},{\bf V}{\bf u}_{k}\rangle={\bf v}_{k}^{\mathrm{T}}{\bf V}{\bf u}_{k}=\langle{\bf e}_{k},{\bf u}_{k}\rangle.

In what follows, we focus on the analysis of the principal eigenvalues and eigenvectors of 𝐒N{\bf S}_{N}.
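The direction convention 𝐮k[k] ≥ 0 above can be imposed mechanically after any eigen-decomposition; a minimal numpy sketch:

```python
import numpy as np

def sorted_eig(S):
    """Eigen-decomposition with eigenvalues in decreasing order and the
    direction convention u_k[k] >= 0 (flip the sign of any column whose
    k-th coordinate is negative)."""
    lam, U = np.linalg.eigh(S)          # eigh returns ascending order
    lam, U = lam[::-1], U[:, ::-1]      # reverse to decreasing order
    for k in range(U.shape[1]):
        if U[k, k] < 0:
            U[:, k] = -U[:, k]
    return lam, U

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 100))
lam, U = sorted_eig(X @ X.T / 100)
```

Sign flips leave both the eigenvalues and the orthogonality of the eigenvector matrix unchanged, so this convention is harmless for all subsequent analysis.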

Notation: For any square matrix 𝐀{\bf A}, tr(𝐀)\operatorname{tr}({\bf A}) denotes its trace, |𝐀||{\bf A}| its determinant, and 𝐀\|{\bf A}\| its spectral norm. For any vector 𝐯{\bf v}, 𝐯\|{\bf v}\| stands for its ℓ2\ell_{2} norm. Write the (i,j)(i,j)th entry of any matrix 𝐖{\bf W} as [𝐖]ij[{\bf W}]_{ij} and 𝐯[k]{\bf v}[k] as the kkth entry of a vector 𝐯{\bf v}. Use γj(𝐀)\gamma_{j}({\bf A}) to denote the jjth largest eigenvalue of matrix 𝐀{\bf A}. The notation p\,{\buildrel p\over{\longrightarrow}}\, stands for convergence in probability, 𝒟\stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}} represents convergence in law, Yn=op(f(n))Y_{n}=o_{p}(f(n)) means that Yn/f(n)p 0Y_{n}/f(n)\,{\buildrel p\over{\longrightarrow}}\,0, and Yn=Op(f(n))Y_{n}=O_{p}(f(n)) means that the sequence (Yn/f(n))(Y_{n}/f(n)) is tight. Write an≍bna_{n}\asymp b_{n} if c1bn≤an≤c2bnc_{1}b_{n}\leq a_{n}\leq c_{2}b_{n} for some constants c1,c2>0c_{1},c_{2}>0. For any sequence of random matrices (𝐖N)({\bf W}_{N}) with fixed dimension, write 𝐖N=op(1){\bf W}_{N}=o_{p}(1) or Op(1)O_{p}(1) if all the entries of 𝐖N{\bf W}_{N} are op(1)o_{p}(1) or Op(1)O_{p}(1), respectively. We say an event 𝒜n{\mathcal{A}}_{n} holds with high probability if P(𝒜n)≥1−O(n−ℓ)P({\mathcal{A}}_{n})\geq 1-O(n^{-\ell}) for any constant ℓ>0\ell>0. Let 𝐞k,𝐞~kA,𝐞~kB{\bf e}_{k},\widetilde{{\bf e}}_{kA},\widetilde{{\bf e}}_{kB} be the unit vectors with 1 in the kkth coordinate and zeros elsewhere of dimensions N,r,(N−r)N,r,(N-r), respectively. We use 𝐈{\bf I} to denote the identity matrix and 𝟏(⋅){\mathbf{1}}(\cdot) to denote the indicator function. Denote ℂ+={z∈ℂ:Im(z)>0}{\mathbb{C}}^{+}=\{z\in{\mathbb{C}}:\operatorname{Im}(z)>0\}, where Im(z)\operatorname{Im}(z) is the imaginary part of a complex number zz. For any probability distribution G(x)G(x), its Stieltjes transform mG(z)m_{G}(z) is defined by

mG(z)=1xz𝑑G(x),z+.m_{G}(z)=\int\dfrac{1}{x-z}\ dG(x),~{}~{}~{}~{}~{}z\in{\mathbb{C}}^{+}.

In the sequel, CC is a generic constant whose value may vary from place to place.

S2. Proof of Theorem 1

Proof of Theorem 1.

Recall that 𝐱t=𝐕T𝐲t{\bf x}_{t}={\bf V}^{\mathrm{T}}{\bf y}_{t}. Write

𝐗N×T=(𝐱1,,𝐱T):=(𝐱(1)T𝐱(N)T).{\bf X}_{N\times T}=({\bf x}_{1},\ldots,{\bf x}_{T}):=\begin{pmatrix}{{\bf x}_{(1)}}^{\mathrm{T}}\\ \vdots\\ {{\bf x}_{(N)}}^{\mathrm{T}}\end{pmatrix}.

Let 𝐳t=𝚲1/2𝐱t{\bf z}_{t}={\boldsymbol{\Lambda}}^{-1/2}{\bf x}_{t}, 𝐳()=λ1/2𝐱(){\bf z}_{(\ell)}=\lambda_{\ell}^{-1/2}{\bf x}_{(\ell)} for t=1,,Tt=1,\ldots,T, =1,,N\ell=1,\ldots,N, and 𝐙=(𝐳1,,𝐳T)=(𝐳(1),,𝐳(N))T{\bf Z}=({\bf z}_{1},\ldots,{\bf z}_{T})=({\bf z}_{(1)},\ldots,{\bf z}_{(N)})^{\mathrm{T}}. Then

𝐗=𝚲1/2𝐙=(λ1𝐳(1)TλN𝐳(N)T)=(𝚲1/2𝐳1,,𝚲1/2𝐳T).\displaystyle{\bf X}={\boldsymbol{\Lambda}}^{1/2}{\bf Z}=\begin{pmatrix}\sqrt{\lambda_{1}}{{\bf z}_{(1)}}^{\mathrm{T}}\\ \vdots\\ \sqrt{\lambda_{N}}{{\bf z}_{(N)}}^{\mathrm{T}}\end{pmatrix}=({\boldsymbol{\Lambda}}^{1/2}{\bf z}_{1},\ldots,{\boldsymbol{\Lambda}}^{1/2}{\bf z}_{T}).

Write

𝐙A=(𝐙A)r×T=(𝐳(1)T𝐳(r)T)=(𝐳1(A),,𝐳T(A)),{\bf Z}_{A}=({\bf Z}_{A})_{r\times T}=\begin{pmatrix}{{\bf z}_{(1)}}^{\mathrm{T}}\\ \vdots\\ {{\bf z}_{(r)}}^{\mathrm{T}}\end{pmatrix}=({\bf z}_{1}^{(A)},\ldots,{\bf z}_{T}^{(A)}),

and

𝐙B=(𝐙B)(Nr)×T=(𝐳(r+1)T𝐳(N)T)=(𝐳1(B),,𝐳T(B)).{\bf Z}_{B}=({\bf Z}_{B})_{(N-r)\times T}=\begin{pmatrix}{{\bf z}_{(r+1)}}^{\mathrm{T}}\\ \vdots\\ {{\bf z}_{(N)}}^{\mathrm{T}}\end{pmatrix}=({\bf z}_{1}^{(B)},\ldots,{\bf z}_{T}^{(B)}).

Write

𝐳t=(𝐳t(A)𝐳t(B)),𝐳t(A):r×1,𝐳t(B):(Nr)×1.{\bf z}_{t}=\begin{pmatrix}{\bf z}_{t}^{(A)}\\ {\bf z}_{t}^{(B)}\end{pmatrix},~{}~{}~{}~{}~{}{\bf z}_{t}^{(A)}:r\times 1,~{}~{}{\bf z}_{t}^{(B)}:(N-r)\times 1.

Define the companion matrix of 𝐒N{\bf S}_{N} as

𝐒¯N\displaystyle\underline{{\bf S}}_{N} :=\displaystyle:= 1T𝐗T𝐗=1T𝐙T𝚲𝐙=1Tj=1Nλj𝐳(j)𝐳(j)T\displaystyle\dfrac{1}{T}{\bf X}^{\mathrm{T}}{\bf X}=\dfrac{1}{T}{\bf Z}^{\mathrm{T}}{\boldsymbol{\Lambda}}{\bf Z}=\dfrac{1}{T}\sum_{j=1}^{N}\lambda_{j}{\bf z}_{(j)}{\bf z}_{(j)}^{\mathrm{T}}
=\displaystyle= 1Tj=1rλj𝐳(j)𝐳(j)T+1Tj=r+1Nλj𝐳(j)𝐳(j)T\displaystyle\dfrac{1}{T}\sum_{j=1}^{r}\lambda_{j}{\bf z}_{(j)}{\bf z}_{(j)}^{\mathrm{T}}+\dfrac{1}{T}\sum_{j=r+1}^{N}\lambda_{j}{\bf z}_{(j)}{\bf z}_{(j)}^{\mathrm{T}}
=:\displaystyle=: 𝐒¯11+𝐒¯22,\displaystyle\underline{{\bf S}}_{11}+\underline{{\bf S}}_{22},

where

𝐒¯11\displaystyle\underline{{\bf S}}_{11} =\displaystyle= 1Tj=1rλj𝐳(j)𝐳(j)T=1T𝐙AT𝚲A𝐙A,\displaystyle\dfrac{1}{T}\sum_{j=1}^{r}\lambda_{j}{\bf z}_{(j)}{\bf z}_{(j)}^{\mathrm{T}}=\dfrac{1}{T}{\bf Z}_{A}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}{\bf Z}_{A},
𝐒¯22\displaystyle\underline{{\bf S}}_{22} =\displaystyle= 1Tj=r+1Nλj𝐳(j)𝐳(j)T=1T𝐙BT𝚲B𝐙B.\displaystyle\dfrac{1}{T}\sum_{j=r+1}^{N}\lambda_{j}{\bf z}_{(j)}{\bf z}_{(j)}^{\mathrm{T}}=\dfrac{1}{T}{\bf Z}_{B}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{B}{\bf Z}_{B}.

Further denote the companion matrices of 𝐒¯11\underline{{\bf S}}_{11} and 𝐒¯22\underline{{\bf S}}_{22} as

𝐒11=1T𝚲A1/2𝐙A𝐙AT𝚲A1/2,𝐒22=1T𝚲B1/2𝐙B𝐙BT𝚲B1/2.{\bf S}_{11}=\dfrac{1}{T}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf Z}_{A}{\bf Z}_{A}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2},~{}~{}~{}~{}~{}~{}~{}{\bf S}_{22}=\dfrac{1}{T}{\boldsymbol{\Lambda}}_{B}^{1/2}{\bf Z}_{B}{\bf Z}_{B}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{B}^{1/2}.

Define the event s={𝐒22Cs}{\mathcal{F}}_{s}=\{\|{\bf S}_{22}\|\leq C_{s}\} for some constant Cs(0,+)C_{s}\in(0,+\infty). By Weyl's Theorem, Assumption (A.ii) and Theorem 9.13 of Bai and Silverstein (2010), for any >0\ell>0, we can choose a CsC_{s} sufficiently large such that

P(sc)=o(T).P({\mathcal{F}}_{s}^{c})=o(T^{-\ell}). (5)

Note that the non-zero eigenvalues of 𝐒¯11\underline{{\bf S}}_{11} and 𝐒¯22\underline{{\bf S}}_{22} are the same as those of their companion matrices 𝐒11{\bf S}_{11} and 𝐒22{\bf S}_{22}, respectively. For any principal eigenvalue λk\lambda_{k}, k=1,,rk=1,\ldots,r, the matrix

𝐒11λk=1T(𝚲Aλk)1/2𝐙A𝐙AT(𝚲Aλk)1/2\dfrac{{\bf S}_{11}}{\lambda_{k}}=\dfrac{1}{T}\left(\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\right)^{1/2}{\bf Z}_{A}{\bf Z}_{A}^{\mathrm{T}}\left(\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\right)^{1/2}

is in the low-dimensional situation considered in Theorem 1 of Anderson (1963), by which one has γj(𝐒11)λkλjλk0\dfrac{\gamma_{j}({\bf S}_{11})}{\lambda_{k}}-\dfrac{\lambda_{j}}{\lambda_{k}}\rightarrow 0 for 1j,kr1\leq j,k\leq r. Because 𝐒22=Op(1)\|{\bf S}_{22}\|=O_{p}(1), by Weyl's Theorem,

γj(𝐒11)+γT(𝐒22)λ^jγj(𝐒11)+γ1(𝐒22),1jr,\displaystyle\gamma_{j}({\bf S}_{11})+\gamma_{T}({\bf S}_{22})\leq\widehat{\lambda}_{j}\leq\gamma_{j}({\bf S}_{11})+\gamma_{1}({\bf S}_{22}),~{}~{}~{}~{}1\leq j\leq r, (6)

we get

λ^jλkλjλkp  0for1j,kr.\displaystyle\dfrac{\widehat{\lambda}_{j}}{\lambda_{k}}-\dfrac{\lambda_{j}}{\lambda_{k}}\ \,{\buildrel p\over{\longrightarrow}}\,\ 0~{}~{}~{}~{}~{}~{}\textrm{for}~{}1\leq j,k\leq r. (7)

In particular, λ^k/λkp 1{\widehat{\lambda}_{k}}/{\lambda_{k}}\,{\buildrel p\over{\longrightarrow}}\,1 for k=1,,rk=1,\ldots,r.

Next, we derive the central limit theorem of λ^k/λk\widehat{\lambda}_{k}/\lambda_{k} for 1kr1\leq k\leq r.

Write 𝐱t=(𝐱t(A)𝐱t(B)){\bf x}_{t}=\begin{pmatrix}{\bf x}_{t}^{(A)}\\ {\bf x}_{t}^{(B)}\end{pmatrix}, where 𝐱t(A)=𝚲A1/2𝐳t(A){\bf x}_{t}^{(A)}={\boldsymbol{\Lambda}}_{A}^{1/2}{\bf z}_{t}^{(A)}, 𝐱t(B)=𝚲B1/2𝐳t(B){\bf x}_{t}^{(B)}={\boldsymbol{\Lambda}}_{B}^{1/2}{\bf z}_{t}^{(B)}. Further denote

𝐗A=(𝐱1(A),,𝐱T(A))=𝚲A1/2𝐙A,and𝐗B=(𝐱1(B),,𝐱T(B))=𝚲B1/2𝐙B.{\bf X}_{A}=({\bf x}_{1}^{(A)},\ldots,{\bf x}_{T}^{(A)})={\boldsymbol{\Lambda}}_{A}^{1/2}{\bf Z}_{A},~{}~{}~{}~{}\textrm{and}~{}~{}~{}~{}{\bf X}_{B}=({\bf x}_{1}^{(B)},\ldots,{\bf x}_{T}^{(B)})={\boldsymbol{\Lambda}}_{B}^{1/2}{\bf Z}_{B}.

The sample covariance matrix 𝐒N{\bf S}_{N} can be decomposed as

𝐒N\displaystyle{\bf S}_{N} =\displaystyle= 1Tt=1T𝐱t𝐱tT=1T(t=1T𝐱t(A)𝐱t(A)Tt=1T𝐱t(A)𝐱t(B)Tt=1T𝐱t(B)𝐱t(A)Tt=1T𝐱t(B)𝐱t(B)T)\displaystyle\dfrac{1}{T}\sum_{t=1}^{\mathrm{T}}{\bf x}_{t}{\bf x}_{t}^{\mathrm{T}}=\dfrac{1}{T}\begin{pmatrix}\sum_{t=1}^{\mathrm{T}}{\bf x}_{t}^{(A)}{{\bf x}_{t}^{(A)}}^{\mathrm{T}}&\sum_{t=1}^{\mathrm{T}}{\bf x}_{t}^{(A)}{{\bf x}_{t}^{(B)}}^{\mathrm{T}}\\ \sum_{t=1}^{\mathrm{T}}{\bf x}_{t}^{(B)}{{\bf x}_{t}^{(A)}}^{\mathrm{T}}&\sum_{t=1}^{\mathrm{T}}{\bf x}_{t}^{(B)}{{\bf x}_{t}^{(B)}}^{\mathrm{T}}\end{pmatrix}
=\displaystyle= 1T(𝐗A𝐗AT𝐗A𝐗BT𝐗B𝐗AT𝐗B𝐗BT)=1T(𝚲A1/2𝐙A𝐙AT𝚲A1/2𝚲A1/2𝐙A𝐙BT𝚲B1/2𝚲B1/2𝐙B𝐙AT𝚲A1/2𝚲B1/2𝐙B𝐙BT𝚲B1/2)\displaystyle\dfrac{1}{T}\begin{pmatrix}{\bf X}_{A}{\bf X}_{A}^{\mathrm{T}}&{\bf X}_{A}{\bf X}_{B}^{\mathrm{T}}\\ {\bf X}_{B}{\bf X}_{A}^{\mathrm{T}}&{\bf X}_{B}{\bf X}_{B}^{\mathrm{T}}\end{pmatrix}=\dfrac{1}{T}\begin{pmatrix}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf Z}_{A}{\bf Z}_{A}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}&{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf Z}_{A}{\bf Z}_{B}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{B}^{1/2}\\ {\boldsymbol{\Lambda}}_{B}^{1/2}{\bf Z}_{B}{\bf Z}_{A}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}&{\boldsymbol{\Lambda}}_{B}^{1/2}{\bf Z}_{B}{\bf Z}_{B}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{B}^{1/2}\end{pmatrix}
=\displaystyle= (𝐒11𝐒12𝐒21𝐒22).\displaystyle\begin{pmatrix}{\bf S}_{11}&{\bf S}_{12}\\ {\bf S}_{21}&{\bf S}_{22}\end{pmatrix}.

Under Assumptions (A.ii), B and C, by Silverstein and Bai (1995), the ESD of 𝐒22{\bf S}_{22} almost surely converges to a non-random probability distribution FF whose Stieltjes transform m(z)m(z) is the unique solution in the domain +{\mathbb{C}}^{+} to the equation

m(z)=1t(1ρρzm(z))z𝑑H(t),for all z+.m(z)=\int\dfrac{1}{t(1-\rho-\rho zm(z))-z}\ dH(t),~{}~{}~{}~{}\mbox{for all }z\in{\mathbb{C}}^{+}.
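As an aside, when HH is a point mass at 1 (i.e., 𝚲B=𝐈{\boldsymbol{\Lambda}}_{B}={\bf I}), the equation above reduces to the Marchenko–Pastur case and can be solved numerically by damped fixed-point iteration for z ∈ ℂ+. A sketch under that simplifying assumption:

```python
import numpy as np

def stieltjes_mp(z, rho, tol=1e-12, max_iter=5000):
    """Solve m = 1 / (1 - rho - rho*z*m - z), i.e. the self-consistent
    equation when H is a point mass at 1, by damped fixed-point
    iteration; valid for z in the upper half plane C^+."""
    m = -1.0 / z                  # initial guess: Stieltjes transform of a point mass at 0
    for _ in range(max_iter):
        m_new = 0.5 * m + 0.5 / (1.0 - rho - rho * z * m - z)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

z, rho = 1.0 + 1.0j, 0.5
m = stieltjes_mp(z, rho)
residual = abs(m - 1.0 / (1.0 - rho - rho * z * m - z))
```

At the fixed point, m solves the quadratic ρzm² − (1−ρ−z)m + 1 = 0, which provides an independent check on the iteration; a valid Stieltjes transform must also satisfy Im m(z) > 0 for z ∈ ℂ+.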

By definition, each principal sample eigenvalue λ^k\widehat{\lambda}_{k} solves the equation

0=|λ𝐈𝐒N|=|λ𝐈𝐒22||λ𝐈𝐊~N(λ)|,0=|\lambda{\bf I}-{\bf S}_{N}|=|\lambda{\bf I}-{\bf S}_{22}|\cdot|\lambda{\bf I}-\widetilde{{\bf K}}_{N}(\lambda)|, (8)

where

𝐊~N(λ)=𝐒11+𝐒12(λ𝐈𝐒22)1𝐒21=1T𝐗A(𝐈+𝐀N)𝐗AT=𝚲A1/2𝐊N(λ)𝚲A1/2,\widetilde{{\bf K}}_{N}(\lambda)={\bf S}_{11}+{\bf S}_{12}(\lambda{\bf I}-{\bf S}_{22})^{-1}{\bf S}_{21}=\dfrac{1}{T}{\bf X}_{A}({\bf I}+{\bf A}_{N}){\bf X}_{A}^{\mathrm{T}}={\boldsymbol{\Lambda}}_{A}^{1/2}{\bf K}_{N}(\lambda){\boldsymbol{\Lambda}}_{A}^{1/2}, (9)

and

𝐀N=𝐀N(λ)=1T𝐗BT(λ𝐈𝐒22)1𝐗B,𝐊N(λ)=1T𝐙A(𝐈+𝐀N)𝐙AT.{\bf A}_{N}={\bf A}_{N}(\lambda)=\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}(\lambda{\bf I}-{\bf S}_{22})^{-1}{\bf X}_{B},~{}~{}~{}~{}~{}~{}~{}{\bf K}_{N}(\lambda)=\dfrac{1}{T}{\bf Z}_{A}({\bf I}+{\bf A}_{N}){\bf Z}_{A}^{\mathrm{T}}.

Further define

𝐑N=𝐑N(λ)=1T(𝐙A(𝐈+𝐀N)𝐙ATtr(𝐈+𝐀N)𝐈),{\bf R}_{N}={\bf R}_{N}(\lambda)=\dfrac{1}{\sqrt{T}}\left({\bf Z}_{A}({\bf I}+{\bf A}_{N}){\bf Z}_{A}^{\mathrm{T}}-\operatorname{tr}({\bf I}+{\bf A}_{N})\cdot{\bf I}\right),

then

𝐊N(λ)=1T𝐑N(λ)+1Ttr(𝐈+𝐀N(λ))𝐈.\displaystyle{\bf K}_{N}(\lambda)=\dfrac{1}{\sqrt{T}}{\bf R}_{N}(\lambda)+\dfrac{1}{T}\operatorname{tr}({\bf I}+{\bf A}_{N}(\lambda))\cdot{\bf I}. (10)

We first give three lemmas, which will be repeatedly used in the following proofs. The first lemma is about the random matrix 𝐀N{\bf A}_{N}, and the second and third ones are about the limiting distributions of (𝐑N(λ))({\bf R}_{N}(\lambda)). The proofs of these lemmas are postponed to the end of this subsection.

Lemma 1.

Under Assumptions A–C, for i,j=1,,ri,j=1,\ldots,r, we have

1Ttr𝐀N(λi)=Op(N1),1Ttr(𝐀N(λi)𝐀N(λj))=Op(N2),\displaystyle\dfrac{1}{T}\operatorname{tr}{\bf A}_{N}(\lambda_{i})=O_{p}(N^{-1}),~{}~{}~{}~{}~{}\dfrac{1}{T}\operatorname{tr}\left({\bf A}_{N}(\lambda_{i}){\bf A}_{N}(\lambda_{j})\right)=O_{p}(N^{-2}),~{}~{}~{}~{}
and 1T=1T([𝐀N(λi)][𝐀N(λj)])=Op(N2).\displaystyle~{}~{}~{}~{}\dfrac{1}{T}\sum_{\ell=1}^{\mathrm{T}}\left([{\bf A}_{N}(\lambda_{i})]_{\ell\ell}[{\bf A}_{N}(\lambda_{j})]_{\ell\ell}\right)=O_{p}(N^{-2}). (11)
Lemma 2.

Under Assumptions A–C and assume that λN\lambda\asymp N, the random matrix 𝐑N(λ){\bf R}_{N}(\lambda) converges weakly to a symmetric Gaussian random matrix 𝐑=([𝐑]ij){\bf R}=([{\bf R}]_{ij}) with zero-mean and the following covariance function:

Cov([𝐑]ij,[𝐑]ij)={0,ifii,orjj,Var([𝐑]ij)=1,ifi=ij=j,Var([𝐑]ii)=E(𝐳1[i])41,ifi=i=j=j.\displaystyle\operatorname{Cov}\left([{\bf R}]_{ij},[{\bf R}]_{i^{\prime}j^{\prime}}\right)=\left\{\begin{array}[]{ll}0,&\textrm{if}~{}i\neq i^{\prime},~{}\textrm{or}~{}j\neq j^{\prime},\\ \operatorname{Var}\left([{\bf R}]_{ij}\right)=1,&\textrm{if}~{}i=i^{\prime}\neq j=j^{\prime},\\ \operatorname{Var}\left([{\bf R}]_{ii}\right)=\operatorname{E}({\bf z}_{1}[i])^{4}-1,&\textrm{if}~{}i=i^{\prime}=j=j^{\prime}.\end{array}\right.
Lemma 3.

Under Assumptions A–C and λN\lambda\asymp N, the block diagonal random matrix 𝐑JN=diag(𝐑N(λ1),,𝐑N(λr)){\bf R}_{J_{N}}=\operatorname{diag}({\bf R}_{N}(\lambda_{1}),\ldots,{\bf R}_{N}(\lambda_{r})) converges weakly to a symmetric Gaussian block diagonal random matrix 𝐑J=diag(𝐑1,,𝐑r){\bf R}_{J}=\operatorname{diag}({\bf R}_{1},\ldots,{\bf R}_{r}) with zero-mean and the following covariance function, for any 1m,mr1\leq m,m^{\prime}\leq r,

Cov([𝐑m]ij,[𝐑m]ij)={0,ifii,orjj,1,ifi=ij=j,E(𝐳1[i])41,ifi=i=j=j.\displaystyle\operatorname{Cov}\left([{\bf R}_{m}]_{ij},[{\bf R}_{m^{\prime}}]_{i^{\prime}j^{\prime}}\right)=\left\{\begin{array}[]{ll}0,&\textrm{if}~{}i\neq i^{\prime},~{}\textrm{or}~{}j\neq j^{\prime},\\ 1,&\textrm{if}~{}i=i^{\prime}\neq j=j^{\prime},\\ \operatorname{E}({\bf z}_{1}[i])^{4}-1,&\textrm{if}~{}i=i^{\prime}=j=j^{\prime}.\end{array}\right.

We now return to the analysis of the principal sample eigenvalues \widehat{\lambda}_{k}. Since the principal eigenvalues of {\bf S}_{N} tend to infinity while, by the estimate (5), the eigenvalues of {\bf S}_{22} remain bounded in probability, we may assume without loss of generality that for N large enough, \widehat{\lambda}_{k} is not an eigenvalue of {\bf S}_{22}. It follows that \widehat{\lambda}_{k} is the kth eigenvalue of the matrix \widetilde{{\bf K}}_{N}(\widehat{\lambda}_{k}).

Note that

\displaystyle{\bf K}_{N}(\lambda_{k})-{\bf K}_{N}(\widehat{\lambda}_{k})=\dfrac{1}{T^{2}}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}\left((\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}-(\lambda_{k}{\bf I}-{\bf S}_{22})^{-1}\right){\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}=\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right){\bf Q}_{N},

where

{\bf Q}_{N}=\dfrac{1}{T^{2}\widehat{\lambda}_{k}}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}\left({\bf I}-\widehat{\lambda}_{k}^{-1}{\bf S}_{22}\right)^{-1}\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-1}{\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}.

By the elementary formulae

{\bf X}^{\mathrm{T}}({\bf I}-{\bf X}{\bf X}^{\mathrm{T}})^{-1}=({\bf I}-{\bf X}^{\mathrm{T}}{\bf X})^{-1}{\bf X}^{\mathrm{T}},\qquad({\bf I}-{\bf X}{\bf X}^{\mathrm{T}})^{-1}{\bf X}={\bf X}({\bf I}-{\bf X}^{\mathrm{T}}{\bf X})^{-1},

it follows that

\displaystyle{\bf Q}_{N}=\dfrac{1}{T^{2}\widehat{\lambda}_{k}}{\bf Z}_{A}\left({\bf I}-\dfrac{1}{T\widehat{\lambda}_{k}}{\bf X}_{B}^{\mathrm{T}}{\bf X}_{B}\right)^{-1}{\bf X}_{B}^{\mathrm{T}}{\bf X}_{B}\left({\bf I}-\dfrac{1}{T\lambda_{k}}{\bf X}_{B}^{\mathrm{T}}{\bf X}_{B}\right)^{-1}{\bf Z}_{A}^{\mathrm{T}}=O_{p}(1/N),

where the last step comes from the fact that the eigenvalues of {\bf S}_{22} are O_{p}(1) and an analysis similar to that of {\bf R}_{N} in the proof of Lemma 2 below. It follows from (7) that

\displaystyle{\bf K}_{N}(\lambda_{k})-{\bf K}_{N}(\widehat{\lambda}_{k})=o_{p}(1/N). (14)
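The push-through identity {\bf X}^{\mathrm{T}}({\bf I}-{\bf X}{\bf X}^{\mathrm{T}})^{-1}=({\bf I}-{\bf X}^{\mathrm{T}}{\bf X})^{-1}{\bf X}^{\mathrm{T}} used in this step holds whenever both inverses exist; a minimal numerical check with NumPy (hypothetical sizes, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4
X = 0.1 * rng.standard_normal((T, N))   # scaled so that I - X X^T is invertible

lhs = X.T @ np.linalg.inv(np.eye(T) - X @ X.T)
rhs = np.linalg.inv(np.eye(N) - X.T @ X) @ X.T
assert np.allclose(lhs, rhs)

# the companion identity (I - X X^T)^{-1} X = X (I - X^T X)^{-1}
assert np.allclose(np.linalg.inv(np.eye(T) - X @ X.T) @ X,
                   X @ np.linalg.inv(np.eye(N) - X.T @ X))
```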

Recall that \widehat{\lambda}_{k} is the kth largest eigenvalue of the matrix \widetilde{{\bf K}}_{N}(\widehat{\lambda}_{k}). Define the matrix {\bf D}=\left(\widehat{\lambda}_{k}{\bf I}-\widetilde{{\bf K}}_{N}(\widehat{\lambda}_{k})\right)/\lambda_{k}. From Assumption A, Lemmas 1 and 2, and equations (7), (10) and (14), it follows that

[{\bf D}]_{kk}=\left(\frac{\widehat{\lambda}_{k}}{\lambda_{k}}-1\right)-\dfrac{1}{\sqrt{T}}[{\bf R}_{N}(\lambda_{k})]_{kk}+O_{p}(1/T),

for i\neq k,

[{\bf D}]_{ii}=\left(\frac{\widehat{\lambda}_{k}}{\lambda_{k}}-\dfrac{\lambda_{i}}{\lambda_{k}}\right)-\dfrac{\lambda_{i}}{\lambda_{k}}\cdot\dfrac{1}{\sqrt{T}}[{\bf R}_{N}(\lambda_{k})]_{ii}+O_{p}(1/T)\,{\buildrel p\over{\longrightarrow}}\,1-\theta_{i}/\theta_{k}\neq 0, (15)

and

[{\bf D}]_{ij}=-\dfrac{\sqrt{\lambda_{i}\lambda_{j}}}{\lambda_{k}}\cdot\dfrac{1}{\sqrt{T}}[{\bf R}_{N}(\lambda_{k})]_{ij}+o_{p}(1/N)=O_{p}(1/\sqrt{T}).

Let \det({\bf A}) denote the determinant of a matrix {\bf A}. Then

\displaystyle 0=\det\left(\left(\widehat{\lambda}_{k}{\bf I}-\widetilde{{\bf K}}_{N}(\widehat{\lambda}_{k})\right)/\lambda_{k}\right)=\det\left(\begin{array}{ccccc}[{\bf D}]_{11}&&&&\\ &\ddots&&&O_{p}(1/\sqrt{T})\\ &&[{\bf D}]_{kk}&&\\ O_{p}(1/\sqrt{T})&&&\ddots&\\ &&&&[{\bf D}]_{rr}\end{array}\right)
=\left(\prod_{i=1,i\neq k}^{r}[{\bf D}]_{ii}\right)\cdot\left[\left(\frac{\widehat{\lambda}_{k}}{\lambda_{k}}-1\right)-\frac{1}{\sqrt{T}}[{\bf R}_{N}]_{kk}\right]+O_{p}(1/T)\left[\left(\frac{\widehat{\lambda}_{k}}{\lambda_{k}}-1\right)-\frac{1}{\sqrt{T}}[{\bf R}_{N}]_{kk}\right]+O_{p}(1/T).

Using (15) we then obtain

\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}=1+\dfrac{1}{\sqrt{T}}[{\bf R}_{N}(\lambda_{k})]_{kk}+O_{p}(1/T).

In particular,

\displaystyle\sqrt{T}\left(\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}-1\right)-[{\bf R}_{N}(\lambda_{k})]_{kk}\,{\buildrel p\over{\longrightarrow}}\,0. (17)

By Lemma 2, we obtain

\sqrt{T}\left(\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}-1\right)\stackrel{\mathcal{D}}{\rightarrow}N(0,\operatorname{E}({\bf z}_{1}[k])^{4}-1).

Similarly, the joint convergence of \sqrt{T}\left({\widehat{\lambda}_{1}}/{\lambda_{1}}-1,\ldots,{\widehat{\lambda}_{r}}/{\lambda_{r}}-1\right) follows from (17) and Lemma 3. ∎
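As a numerical sanity check on this CLT (not part of the proof), one can simulate a one-spike model with Gaussian entries, for which \operatorname{E}({\bf z}_{1}[k])^{4}-1=2; a minimal Monte Carlo sketch with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, reps = 100, 100, 300
theta = 2.0
lam1 = theta * N                                       # one principal eigenvalue, lambda_1 of order N
sigma_diag = np.concatenate(([lam1], np.ones(N - 1)))  # Sigma = diag(lambda_1, 1, ..., 1)

stats = []
for _ in range(reps):
    X = np.sqrt(sigma_diag)[:, None] * rng.standard_normal((N, T))
    lam1_hat = np.linalg.eigvalsh(X @ X.T / T)[-1]     # largest sample eigenvalue
    stats.append(np.sqrt(T) * (lam1_hat / lam1 - 1))
stats = np.asarray(stats)

# For Gaussian z the limit is N(0, 2): sample mean near 0, sample variance near 2.
print(round(stats.mean(), 2), round(stats.var(), 2))
```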

We now prove Lemmas 1–3.

Proof of Lemma 1.

By the estimate (5), it suffices to prove Lemma 1 for {\bf A}_{N}(\lambda_{i}){\mathbf{1}}({\mathcal{F}}_{s}) and {\bf A}_{N}(\lambda_{j}){\mathbf{1}}({\mathcal{F}}_{s}). Under Assumption A, we have

\dfrac{1}{T}\operatorname{tr}{\bf A}_{N}(\lambda_{i}){\mathbf{1}}({\mathcal{F}}_{s})=\dfrac{1}{T}\operatorname{tr}\left(\left(\lambda_{i}{\bf I}-{\bf S}_{22}\right)^{-1}{\bf S}_{22}\right){\mathbf{1}}({\mathcal{F}}_{s})=O_{p}(N^{-1}),

and

\displaystyle\dfrac{1}{T}\operatorname{tr}\left({\bf A}_{N}(\lambda_{i}){\bf A}_{N}(\lambda_{j})\right){\mathbf{1}}({\mathcal{F}}_{s})=\dfrac{1}{T}\operatorname{tr}\left((\lambda_{i}{\bf I}-{\bf S}_{22})^{-1}{\bf S}_{22}(\lambda_{j}{\bf I}-{\bf S}_{22})^{-1}{\bf S}_{22}\right){\mathbf{1}}({\mathcal{F}}_{s})=O_{p}\left(N^{-2}\right).

To prove (1/T)\sum_{\ell=1}^{T}[{\bf A}_{N}(\lambda_{i})]_{\ell\ell}[{\bf A}_{N}(\lambda_{j})]_{\ell\ell}{\mathbf{1}}({\mathcal{F}}_{s})=O_{p}(N^{-2}), it suffices to show that

\displaystyle\max_{i}\max_{\ell}\operatorname{E}\left([{\bf A}_{N}(\lambda_{i})]_{\ell\ell}^{2}{\mathbf{1}}({\mathcal{F}}_{s})\right)=O(N^{-2}). (18)

Note that the diagonal entries [{\bf A}_{N}]_{\ell\ell}=T^{-1}{{\bf z}_{\ell}^{(B)}}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{B}^{1/2}(\lambda{\bf I}-{\bf S}_{22})^{-1}{\boldsymbol{\Lambda}}_{B}^{1/2}{\bf z}_{\ell}^{(B)} are identically distributed across \ell; hence

\displaystyle\operatorname{E}[{\bf A}_{N}(\lambda_{i})]_{\ell\ell}^{2}{\mathbf{1}}({\mathcal{F}}_{s})=\dfrac{1}{T}\sum_{\ell=1}^{T}\operatorname{E}\left([{\bf A}_{N}(\lambda_{i})]_{\ell\ell}^{2}{\mathbf{1}}({\mathcal{F}}_{s})\right)\leq\dfrac{1}{T}\operatorname{E}\left(\operatorname{tr}\left({\bf A}_{N}(\lambda_{i})\right)^{2}{\mathbf{1}}({\mathcal{F}}_{s})\right)=\dfrac{1}{T}\operatorname{E}\left(\operatorname{tr}\left((\lambda_{i}{\bf I}-{\bf S}_{22})^{-1}{\bf S}_{22}\right)^{2}{\mathbf{1}}({\mathcal{F}}_{s})\right)\leq\dfrac{C}{N^{2}}. ∎

Proof of Lemma 2.

Recall that {\bf Z}_{A}=({\bf z}_{(1)},\ldots,{\bf z}_{(r)})^{\mathrm{T}}. We have

[{\bf Z}_{A}({\bf I}+{\bf A}_{N}(\lambda)){\bf Z}_{A}^{\mathrm{T}}]_{ij}={\bf z}_{(i)}^{\mathrm{T}}({\bf I}+{\bf A}_{N}(\lambda)){\bf z}_{(j)}.

Consider the random vector of dimension K=\dfrac{1}{2}r(r+1):

{\bf W}_{N}={\bf W}_{N}(\lambda):=\dfrac{1}{\sqrt{T}}\left({\bf z}_{(i)}^{\mathrm{T}}({\bf I}+{\bf A}_{N}(\lambda)){\bf z}_{(j)}-\operatorname{tr}({\bf I}+{\bf A}_{N}(\lambda))\cdot\operatorname{E}({\bf z}_{(i)}[1]{\bf z}_{(j)}[1])\right)_{1\leq i\leq j\leq r}.

For any 1\leq\ell,\ell^{\prime}\leq K, there exist two pairs (i,j) and (i^{\prime},j^{\prime}), 1\leq i\leq j\leq r,\ 1\leq i^{\prime}\leq j^{\prime}\leq r, such that

{\bf W}_{N}[\ell]=\dfrac{1}{\sqrt{T}}\left({\bf z}_{(i)}^{\mathrm{T}}({\bf I}+{\bf A}_{N}){\bf z}_{(j)}-\operatorname{tr}({\bf I}+{\bf A}_{N})\cdot\operatorname{E}({\bf z}_{(i)}[1]{\bf z}_{(j)}[1])\right),

and

{\bf W}_{N}[\ell^{\prime}]=\dfrac{1}{\sqrt{T}}\left({\bf z}_{(i^{\prime})}^{\mathrm{T}}({\bf I}+{\bf A}_{N}){\bf z}_{(j^{\prime})}-\operatorname{tr}({\bf I}+{\bf A}_{N})\cdot\operatorname{E}({\bf z}_{(i^{\prime})}[1]{\bf z}_{(j^{\prime})}[1])\right).

By Lemma 1, we have

\lim_{T\rightarrow\infty}\dfrac{1}{T}\sum_{\ell=1}^{T}\left([{\bf I}+{\bf A}_{N}]_{\ell\ell}\right)^{2}=1,\qquad\textrm{and}\qquad\lim_{T\rightarrow\infty}\dfrac{1}{T}\operatorname{tr}({\bf I}+{\bf A}_{N})^{2}=1.

By Corollary 7.1 of Bai and Yao (2008), the random vector {\bf W}_{N} converges weakly to a K-dimensional Gaussian vector with mean zero and covariance matrix {\boldsymbol{\Gamma}}_{W} satisfying [{\boldsymbol{\Gamma}}_{W}]_{\ell\ell^{\prime}}=\rho_{(i,j)(i^{\prime},j^{\prime})}, where

\displaystyle\rho_{(i,j)(i^{\prime},j^{\prime})}=\operatorname{E}\left({\bf z}_{(i)}[1]\,{\bf z}_{(j)}[1]\,{\bf z}_{(i^{\prime})}[1]\,{\bf z}_{(j^{\prime})}[1]\right)-\operatorname{E}\left({\bf z}_{(i)}[1]\,{\bf z}_{(j)}[1]\right)\operatorname{E}\left({\bf z}_{(i^{\prime})}[1]\,{\bf z}_{(j^{\prime})}[1]\right)=\operatorname{E}\left({\bf z}_{1}[i]\,{\bf z}_{1}[j]\,{\bf z}_{1}[i^{\prime}]\,{\bf z}_{1}[j^{\prime}]\right)-\operatorname{E}\left({\bf z}_{1}[i]\,{\bf z}_{1}[j]\right)\operatorname{E}\left({\bf z}_{1}[i^{\prime}]\,{\bf z}_{1}[j^{\prime}]\right).

The result follows. ∎

Proof of Lemma 3.

Consider the block diagonal random matrix

{\bf R}_{J_{N}}=\operatorname{diag}\left({\bf R}_{N}(\lambda_{1}),\ldots,{\bf R}_{N}(\lambda_{r})\right)

as an M=r\times\dfrac{1}{2}r(r+1) dimensional vector

\displaystyle\dfrac{1}{\sqrt{T}}\left(\left({\bf z}_{(i)}^{\mathrm{T}}({\bf I}+{\bf A}_{N}(\lambda_{1})){\bf z}_{(j)}-\operatorname{tr}({\bf I}+{\bf A}_{N}(\lambda_{1}))\cdot\operatorname{E}({\bf z}_{(i)}[1]{\bf z}_{(j)}[1])\right)_{1\leq i\leq j\leq r},\ldots,\left({\bf z}_{(i)}^{\mathrm{T}}({\bf I}+{\bf A}_{N}(\lambda_{r})){\bf z}_{(j)}-\operatorname{tr}({\bf I}+{\bf A}_{N}(\lambda_{r}))\cdot\operatorname{E}({\bf z}_{(i)}[1]{\bf z}_{(j)}[1])\right)_{1\leq i\leq j\leq r}\right).

By Assumptions A–C and Lemma 1, the block diagonal random matrix {\bf R}_{J_{N}} converges weakly to a symmetric Gaussian block diagonal random matrix

{\bf R}_{J}=\operatorname{diag}({\bf R}_{1},\ldots,{\bf R}_{r})

with mean zero and covariance function as follows: for any 1\leq m,m^{\prime}\leq r,

\operatorname{Cov}\left([{\bf R}_{m}]_{ij},[{\bf R}_{m^{\prime}}]_{i^{\prime}j^{\prime}}\right)=\operatorname{E}\left({\bf z}_{1}[i]\,{\bf z}_{1}[j]\,{\bf z}_{1}[i^{\prime}]\,{\bf z}_{1}[j^{\prime}]\right)-\operatorname{E}\left({\bf z}_{1}[i]\,{\bf z}_{1}[j]\right)\cdot\operatorname{E}\left({\bf z}_{1}[i^{\prime}]\,{\bf z}_{1}[j^{\prime}]\right).

The conclusion follows. ∎

S3. Proof of Theorem 2

Proof.

Write

\displaystyle\operatorname{tr}({\boldsymbol{\Sigma}}_{-r})=\sum_{j=r+1}^{N}\lambda_{j},\qquad\operatorname{tr}({\bf S}_{-r})=\sum_{j=r+1}^{N}\widehat{\lambda}_{j}.

To prove Theorem 2, we first show that \operatorname{tr}({\bf S}_{-r})/N converges to \operatorname{tr}({\boldsymbol{\Sigma}}_{-r})/N at a rate faster than 1/\sqrt{T}, that is,

\displaystyle\Delta:=\sqrt{T}\left(\dfrac{1}{N}\operatorname{tr}({\bf S}_{-r})-\dfrac{1}{N}\operatorname{tr}({\boldsymbol{\Sigma}}_{-r})\right)\,{\buildrel p\over{\longrightarrow}}\,0. (19)

Decompose \Delta=\Delta_{1}+\Delta_{2}, where

\displaystyle\Delta_{1}=\sqrt{T}\left(\dfrac{1}{N}\operatorname{tr}({\bf S}_{-r})-\dfrac{1}{N}\operatorname{tr}({\bf S}_{22})\right),\qquad\Delta_{2}=\sqrt{T}\left(\dfrac{1}{N}\operatorname{tr}({\bf S}_{22})-\dfrac{1}{N}\operatorname{tr}({\boldsymbol{\Sigma}}_{-r})\right).

Note that \Delta_{2}\,{\buildrel p\over{\longrightarrow}}\,0 by Theorem 2.1 of Zheng et al. (2015).

Next we analyze \Delta_{1}. Note that

\operatorname{tr}({\bf S}_{N})=\operatorname{tr}({\bf S}_{11})+\operatorname{tr}({\bf S}_{22})=\sum_{j=1}^{r}\widehat{\lambda}_{j}+\operatorname{tr}({\bf S}_{-r}).

Hence

\Delta_{1}=\sum_{j=1}^{r}\sqrt{T}\left(\gamma_{j}({\bf S}_{11}/N)-\widehat{\lambda}_{j}/N\right).

By inequality (6), we have

\dfrac{r\sqrt{T}}{N}\,\gamma_{T}({\bf S}_{22})\leq\sum_{j=1}^{r}\sqrt{T}\left(\dfrac{\widehat{\lambda}_{j}}{N}-\gamma_{j}({\bf S}_{11}/N)\right)\leq\dfrac{r\sqrt{T}}{N}\,\gamma_{1}({\bf S}_{22}).

By Assumption (A.ii) and Assumption C, we have \Delta_{1}\,{\buildrel p\over{\longrightarrow}}\,0. Hence, (19) holds.

We can then rewrite the result of Theorem 1 as follows:

\sqrt{T}\left[\begin{pmatrix}\widehat{\lambda}_{1}/N\\ \vdots\\ \widehat{\lambda}_{r}/N\\ \operatorname{tr}({\bf S}_{-r})/N\end{pmatrix}-\begin{pmatrix}\lambda_{1}/N\\ \vdots\\ \lambda_{r}/N\\ \operatorname{tr}({\boldsymbol{\Sigma}}_{-r})/N\end{pmatrix}\right]\rightarrow N(0,{\boldsymbol{\Sigma}}_{J}),

where {\boldsymbol{\Sigma}}_{J}=\operatorname{diag}(\theta_{1}^{2}\sigma_{\lambda_{1}}^{2},\ldots,\theta_{r}^{2}\sigma_{\lambda_{r}}^{2},0). For any k=1,\ldots,r, by considering the function

f({\bf x})=f(x_{1},\ldots,x_{r+1})=\dfrac{x_{k}}{\sum_{i=1,i\neq k}^{r+1}x_{i}},

and using the Delta method, we get that

\sqrt{T}\left(\dfrac{\widehat{\lambda}_{k}}{\operatorname{tr}(\widehat{{\boldsymbol{\Sigma}}}_{N})-\widehat{\lambda}_{k}}-\dfrac{\lambda_{k}}{\operatorname{tr}({\boldsymbol{\Sigma}})-\lambda_{k}}\right)\stackrel{\mathcal{D}}{\rightarrow}N(0,\sigma_{-k}^{2}),

where

\displaystyle\sigma_{-k}^{2}=\dfrac{\theta_{k}^{2}}{\left(\sum_{i\neq k,i=1}^{r}\theta_{i}+\int t\,dH(t)\right)^{2}}\left[\sigma_{\lambda_{k}}^{2}+\dfrac{\sum_{j\neq k,j=1}^{r}\theta_{j}^{2}\,\sigma_{\lambda_{j}}^{2}}{\left(\sum_{i\neq k,i=1}^{r}\theta_{i}+\int t\,dH(t)\right)^{2}}\right].

Finally, \widehat{\sigma_{-k}^{2}} defined in Theorem 2 is a consistent estimator of \sigma_{-k}^{2} by Theorem 1 and (19). ∎
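The delta method here only requires the gradient of f; with D denoting the denominator, \partial f/\partial x_{k}=1/D and \partial f/\partial x_{j}=-x_{k}/D^{2} for j\neq k. A quick finite-difference check of this gradient (with hypothetical numbers standing in for (\lambda_{1}/N,\ldots,\lambda_{r}/N,\operatorname{tr}({\boldsymbol{\Sigma}}_{-r})/N)):

```python
import numpy as np

def f(x, k):
    # f(x) = x_k / sum_{i != k} x_i
    return x[k] / (x.sum() - x[k])

x = np.array([5.0, 3.0, 2.0, 1.5])    # hypothetical: r = 3 spikes plus the trace term
k, h = 0, 1e-6
D = x.sum() - x[k]

grad = np.full_like(x, -x[k] / D**2)  # df/dx_j = -x_k / D^2 for j != k
grad[k] = 1.0 / D                     # df/dx_k = 1 / D

E = np.eye(len(x))
num = np.array([(f(x + h * E[j], k) - f(x - h * E[j], k)) / (2 * h)
                for j in range(len(x))])
assert np.allclose(grad, num, atol=1e-6)
```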

S4. Some preliminary results for proving Theorems 3 and 6

We first derive some preliminary results in preparation for the proofs of Theorems 3 and 6.

Recall that we write {\bf u}_{k}=({\bf u}_{kA}^{\mathrm{T}},{\bf u}_{kB}^{\mathrm{T}})^{\mathrm{T}} for the eigenvector of {\bf S}_{N} corresponding to the eigenvalue \widehat{\lambda}_{k}, where {\bf u}_{kA} and {\bf u}_{kB} are of dimensions r and N-r, respectively. Also recall that \widetilde{{\bf u}}_{kA}={\bf u}_{kA}/\|{\bf u}_{kA}\|. Further denote by \widetilde{{\bf e}}_{kA} the r-dimensional vector with 1 in the kth coordinate and 0's elsewhere.

Proposition 1.

Under Assumptions A–C, for 1\leq k\leq r, we have

T\left(1-\left(\widetilde{{\bf u}}_{kA}[k]\right)^{2}\right)\stackrel{\mathcal{D}}{\rightarrow}\sum_{i\neq k,i=1}^{r}\omega_{ki}\cdot Z_{i}^{2},

where \omega_{ki}=\dfrac{\theta_{k}\theta_{i}}{(\theta_{k}-\theta_{i})^{2}} and Z_{i}\stackrel{iid}{\sim}N(0,1).

Remark 7.

As a corollary, we have

N\left(1-|\widetilde{{\bf u}}_{kA}[k]|\right)=N\left(1-\dfrac{|{\bf u}_{k}[k]|}{\|{\bf u}_{kA}\|}\right)\stackrel{\mathcal{D}}{\rightarrow}\dfrac{\rho}{2}\sum_{i\neq k,i=1}^{r}\omega_{ki}\cdot Z_{i}^{2}.
Proposition 2.

Under Assumptions A–C,

  1. (i)

    for 1\leq k\leq r, we have

    \sqrt{T}(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA})\stackrel{\mathcal{D}}{\rightarrow}N(0,{\boldsymbol{\Sigma}}_{k}),

    where

    {\boldsymbol{\Sigma}}_{k}=\sum_{i\neq k,i=1}^{r}\omega_{ki}\widetilde{{\bf e}}_{iA}\widetilde{{\bf e}}_{iA}^{\mathrm{T}},\qquad\textrm{and}\qquad\omega_{ki}=\dfrac{\theta_{k}\theta_{i}}{(\theta_{k}-\theta_{i})^{2}};
  2. (ii)

    for any fixed r-dimensional vectors {\bf c}_{k}, k=1,\ldots,r, if there exist i\neq j such that {\bf c}_{i}[j]\neq{\bf c}_{j}[i], then

    \sqrt{T}\sum_{k=1}^{r}{\bf c}_{k}^{\mathrm{T}}(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA})\stackrel{\mathcal{D}}{\rightarrow}N\left(0,\sum_{k=1}^{r-1}\sum_{i=k+1}^{r}\omega_{ki}({\bf c}_{k}[i]-{\bf c}_{i}[k])^{2}\right);
  3. (iii)

    for 1\leq\ell,k\leq r, the \ellth principal eigenvalue \sqrt{T}(\widehat{\lambda}_{\ell}/\lambda_{\ell}-1) and the kth principal eigenvector \sqrt{T}(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}) are asymptotically independent.

Remark 8.

The conclusion in (i) coincides with Theorem 3.2 in Wang and Fan (2017), which is proved under the sub-Gaussian assumption.

Proposition 3.

Under Assumptions A–C, for 1\leq k\leq r, we have

\sqrt{T}\left(\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right)-\dfrac{1}{T}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\lambda_{k})^{2}}\right)\stackrel{\mathcal{D}}{\rightarrow}N(0,\sigma_{kA}^{2}),

where \sigma_{kA}^{2}=\left(\operatorname{E}({\bf z}_{1}[k])^{4}-3\right)\cdot\rho^{2}\left(\int x\,dF(x)\right)^{2}+2\rho\int x^{2}\,dF(x) and the function F(\cdot) is a distribution function whose Stieltjes transform, m_{F}, is the unique solution in the set \{m_{F}\in{\mathbb{C}}^{+}:-(1-\rho)/z+\rho m_{F}\in{\mathbb{C}}^{+}\} to the equation

m_{F}=\int\dfrac{dH(\tau)}{\tau(1-\rho-\rho zm_{F})-z},

where H is given in Assumption (A.ii).

Remark 9.

Under Assumptions A–C, Proposition 3 can be rewritten as

N^{3/2}\left(1-\|{\bf u}_{kA}\|^{2}-\dfrac{1}{T\lambda_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\lambda_{k})^{2}}\right)\stackrel{\mathcal{D}}{\rightarrow}N(0,\rho\sigma_{kA}^{2}/\theta_{k}^{2}),

or

\displaystyle N^{3/2}\left(1-\|{\bf u}_{kA}\|-\dfrac{1}{2T\lambda_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\lambda_{k})^{2}}\right)\stackrel{\mathcal{D}}{\rightarrow}N(0,\rho\sigma_{kA}^{2}/(4\theta_{k}^{2})). (20)

In particular, N\left(1-\|{\bf u}_{kA}\|\right)\,{\buildrel p\over{\longrightarrow}}\,\dfrac{\rho}{2\theta_{k}}\int x\,dF(x).

A. Proof of Proposition 1

Proof.

By definition,

{\bf S}_{N}{\bf u}_{k}=\widehat{\lambda}_{k}{\bf u}_{k}.

Writing {\bf u}_{k}=({\bf u}_{kA}^{\mathrm{T}},{\bf u}_{kB}^{\mathrm{T}})^{\mathrm{T}} gives us that

\displaystyle{\bf S}_{11}{\bf u}_{kA}+{\bf S}_{12}{\bf u}_{kB}=\widehat{\lambda}_{k}{\bf u}_{kA}, (21)
\displaystyle{\bf S}_{21}{\bf u}_{kA}+{\bf S}_{22}{\bf u}_{kB}=\widehat{\lambda}_{k}{\bf u}_{kB}. (22)

Solving (22) for {\bf u}_{kB} yields

\displaystyle{\bf u}_{kB}=(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}{\bf S}_{21}{\bf u}_{kA}. (23)

Replacing {\bf u}_{kB} in (21) with (23) gives

\left({\bf S}_{11}+{\bf S}_{12}(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}{\bf S}_{21}\right){\bf u}_{kA}=\widehat{\lambda}_{k}{\bf u}_{kA},

in other words, \widetilde{{\bf K}}_{N}(\widehat{\lambda}_{k}){\bf u}_{kA}=\widehat{\lambda}_{k}{\bf u}_{kA}.
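This Schur-complement reduction holds for any symmetric matrix whose chosen eigenvalue is not an eigenvalue of the lower-right block; a minimal numerical check on a generic symmetric matrix (hypothetical dimensions, not the paper's {\bf S}_{N}):

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 8, 2
M = rng.standard_normal((N, N))
S = M @ M.T / N                           # generic symmetric PSD matrix standing in for S_N
lam, U = np.linalg.eigh(S)
lam_k, u = lam[-1], U[:, -1]              # top eigenpair
uA, uB = u[:r], u[r:]
S11, S12, S21, S22 = S[:r, :r], S[:r, r:], S[r:, :r], S[r:, r:]

# (23): u_B is recovered from u_A
uB_rec = np.linalg.solve(lam_k * np.eye(N - r) - S22, S21 @ uA)
assert np.allclose(uB, uB_rec)

# Schur-complement eigen-equation for u_A at the same eigenvalue
K = S11 + S12 @ np.linalg.solve(lam_k * np.eye(N - r) - S22, S21)
assert np.allclose(K @ uA, lam_k * uA)
```

The top eigenvalue of S strictly exceeds the spectrum of the principal submatrix S22 (Cauchy interlacing), so the resolvent above is well defined.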

To prove Proposition 1, we first show that \widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\,{\buildrel p\over{\longrightarrow}}\,0, where \widetilde{{\bf u}}_{kA}={\bf u}_{kA}/\|{\bf u}_{kA}\|. It follows from the definition of {\bf K}_{N} that

\displaystyle\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}}\widetilde{{\bf K}}_{N}(\widehat{\lambda}_{k})\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}}\widetilde{{\bf K}}_{N}(\lambda_{k})\widetilde{{\bf u}}_{kA}+\dfrac{1}{\widehat{\lambda}_{k}}{\boldsymbol{\Lambda}}_{A}^{1/2}\left({\bf K}_{N}(\widehat{\lambda}_{k})-{\bf K}_{N}(\lambda_{k})\right){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}}\widetilde{{\bf K}}_{N}(\lambda_{k})\widetilde{{\bf u}}_{kA}+\dfrac{1}{\widehat{\lambda}_{k}^{2}}\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right){\boldsymbol{\Lambda}}_{A}^{1/2}{\bf M}_{N}{\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA},

where

{\bf M}_{N}=\dfrac{1}{T^{2}}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}({\bf I}-\widehat{\lambda}_{k}^{-1}{\bf S}_{22})^{-1}({\bf I}-\lambda_{k}^{-1}{\bf S}_{22})^{-1}{\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}.

Because \|{\bf S}_{22}\| and \|(1/T){\bf Z}_{A}{\bf Z}_{A}^{\mathrm{T}}\| are both O_{p}(1), and \lambda_{k}\asymp N and \widehat{\lambda}_{k}=O_{p}(N), we have \|{\bf M}_{N}\|=O_{p}(1). Thus, by Theorem 1 and Assumption A, we get

\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}}\widetilde{{\bf K}}_{N}(\lambda_{k})\widetilde{{\bf u}}_{kA}+O_{p}(N^{-3/2}).

For the first term, consider the following decomposition:

\displaystyle\dfrac{1}{\widehat{\lambda}_{k}}\widetilde{{\bf K}}_{N}(\lambda_{k})\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf K}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}}{\boldsymbol{\Lambda}}_{A}^{1/2}\left(\dfrac{1}{\sqrt{T}}{\bf R}_{N}(\lambda_{k})+\dfrac{1}{T}\operatorname{tr}({\bf I}+{\bf A}_{N}(\lambda_{k}))\right){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}
=\dfrac{1}{\widehat{\lambda}_{k}\sqrt{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}+\dfrac{{\boldsymbol{\Lambda}}_{A}}{\widehat{\lambda}_{k}}\widetilde{{\bf u}}_{kA}+\dfrac{1}{T}\operatorname{tr}{\bf A}_{N}\cdot\dfrac{{\boldsymbol{\Lambda}}_{A}}{\widehat{\lambda}_{k}}\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}\sqrt{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}+\dfrac{{\boldsymbol{\Lambda}}_{A}}{\widehat{\lambda}_{k}}\widetilde{{\bf u}}_{kA}+O_{p}(1/N),

where the last step comes from Assumption A and Lemma 1. Hence,

\displaystyle\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}\sqrt{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}+\dfrac{{\boldsymbol{\Lambda}}_{A}}{\widehat{\lambda}_{k}}\widetilde{{\bf u}}_{kA}+O_{p}(1/N). (24)

Subtracting ({\boldsymbol{\Lambda}}_{A}/\lambda_{k})\widetilde{{\bf u}}_{kA} from both sides of (24) yields

\displaystyle\left({\bf I}-\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\right)\widetilde{{\bf u}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}\sqrt{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}+\dfrac{{\boldsymbol{\Lambda}}_{A}}{\widehat{\lambda}_{k}}\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right)\widetilde{{\bf u}}_{kA}+O_{p}(1/N). (25)

Further define

{\mathcal{O}}_{N,k}=\sum_{i\neq k,i=1}^{r}\dfrac{\lambda_{k}}{\lambda_{k}-\lambda_{i}}\,\widetilde{{\bf e}}_{iA}\widetilde{{\bf e}}_{iA}^{\mathrm{T}}.

It is easy to see that

{\mathcal{O}}_{N,k}\left({\bf I}-\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\right)=\sum_{i\neq k,i=1}^{r}\widetilde{{\bf e}}_{iA}\widetilde{{\bf e}}_{iA}^{\mathrm{T}}={\bf I}-\widetilde{{\bf e}}_{kA}\widetilde{{\bf e}}_{kA}^{\mathrm{T}}.
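Indeed, for i\neq k the diagonal factors multiply to (\lambda_{k}/(\lambda_{k}-\lambda_{i}))\cdot(1-\lambda_{i}/\lambda_{k})=1, while the kth diagonal entry of {\mathcal{O}}_{N,k} is 0. A short numerical confirmation with hypothetical distinct eigenvalues:

```python
import numpy as np

lams = np.array([8.0, 5.0, 2.0])   # hypothetical distinct principal eigenvalues
r, k = len(lams), 1
e = np.eye(r)
O = sum(lams[k] / (lams[k] - lams[i]) * np.outer(e[i], e[i])
        for i in range(r) if i != k)
lhs = O @ (np.eye(r) - np.diag(lams) / lams[k])
assert np.allclose(lhs, np.eye(r) - np.outer(e[k], e[k]))
```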

Left-multiplying both sides of (25) by {\mathcal{O}}_{N,k} yields

\displaystyle\widetilde{{\bf u}}_{kA}-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle\widetilde{{\bf e}}_{kA}=\dfrac{1}{\widehat{\lambda}_{k}\sqrt{T}}{\mathcal{O}}_{N,k}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}+\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right)\dfrac{1}{\widehat{\lambda}_{k}}{\mathcal{O}}_{N,k}{\boldsymbol{\Lambda}}_{A}\widetilde{{\bf u}}_{kA}+O_{p}(1/N). (26)

By Lemma 2 and Theorem 1, we get

\widetilde{{\bf u}}_{kA}-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle\widetilde{{\bf e}}_{kA}=O_{p}(1/\sqrt{T}).

It follows that \widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\,{\buildrel p\over{\longrightarrow}}\,0.

Replacing \widehat{\lambda}_{k} and \widetilde{{\bf u}}_{kA} on the right-hand side of equation (24) with \lambda_{k} and \widetilde{{\bf e}}_{kA}, respectively, yields

\displaystyle\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}=\dfrac{1}{\lambda_{k}\sqrt{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf e}}_{kA}+\left(\dfrac{\lambda_{k}}{\widehat{\lambda}_{k}}-1\right)\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\widetilde{{\bf e}}_{kA}+\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\left(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\right)+\left(\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}-{\bf I}\right)\widetilde{{\bf e}}_{kA}+o_{p}(1/\sqrt{T}).

Rewrite the above equation as

\displaystyle\sqrt{T}\left({\bf I}-\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\right)\left(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\right)=\dfrac{1}{\lambda_{k}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf e}}_{kA}+\sqrt{T}\left(\dfrac{\lambda_{k}}{\widehat{\lambda}_{k}}-1\right)\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\widetilde{{\bf e}}_{kA}+\sqrt{T}\left(\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}-{\bf I}\right)\widetilde{{\bf e}}_{kA}+o_{p}(1).

Multiplying both sides by {\mathcal{O}}_{N,k} yields

\displaystyle\sqrt{T}\left(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\right)=\dfrac{1}{\lambda_{k}}{\mathcal{O}}_{N,k}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf e}}_{kA}+\sqrt{T}\left(\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle-1\right)\widetilde{{\bf e}}_{kA}+o_{p}(1), (27)

where the last step comes from the facts that {\mathcal{O}}_{N,k}{\boldsymbol{\Lambda}}_{A}\widetilde{{\bf e}}_{kA}=0 and {\mathcal{O}}_{N,k}\widetilde{{\bf e}}_{kA}=0. Write

{\bf W}_{k}^{\perp}=\sum_{i\neq k,i=1}^{r}\widetilde{{\bf e}}_{iA}\widetilde{{\bf e}}_{iA}^{\mathrm{T}}.

Then

\widetilde{{\bf u}}_{kA}=\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle\widetilde{{\bf e}}_{kA}+{\bf W}_{k}^{\perp}\widetilde{{\bf u}}_{kA}.

Notice that \widetilde{{\bf u}}_{kA} and \widetilde{{\bf e}}_{kA} are both unit vectors; thus

\displaystyle 1=\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle^{2}+\|{\bf W}_{k}^{\perp}\widetilde{{\bf u}}_{kA}\|^{2}. (28)

From (27) and the fact that {\bf W}_{k}^{\perp}\widetilde{{\bf e}}_{kA}=0, we get

\displaystyle{\bf W}_{k}^{\perp}\widetilde{{\bf u}}_{kA}={\bf W}_{k}^{\perp}\left(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\right)=\dfrac{1}{\lambda_{k}\sqrt{T}}{\bf W}_{k}^{\perp}{\mathcal{O}}_{N,k}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf e}}_{kA}+o_{p}(1/\sqrt{T}). (29)

Combining (28) and (29) gives

\displaystyle T\left(1-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle^{2}\right)=T\|{\bf W}_{k}^{\perp}\widetilde{{\bf u}}_{kA}\|^{2}=\dfrac{1}{\lambda_{k}^{2}}\,\widetilde{{\bf e}}_{kA}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}{\mathcal{O}}_{N,k}{\bf W}_{k}^{\perp}{\mathcal{O}}_{N,k}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf e}}_{kA}+o_{p}(1)
=\sum_{i\neq k,i=1}^{r}\dfrac{\lambda_{k}^{2}}{(\lambda_{k}-\lambda_{i})^{2}}\left[\dfrac{1}{\lambda_{k}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\right]_{ki}^{2}+o_{p}(1)=\sum_{i\neq k,i=1}^{r}\dfrac{\lambda_{k}\lambda_{i}}{(\lambda_{k}-\lambda_{i})^{2}}[{\bf R}_{N}(\lambda_{k})]_{ki}^{2}+o_{p}(1). (30)

By Lemma 2, the conclusion follows. ∎

B. Proof of Proposition 2

Proof.

By (27), we obtain

\displaystyle\sqrt{T}\left(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\right)=\dfrac{1}{\lambda_{k}}{\mathcal{O}}_{N,k}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf e}}_{kA}+\sqrt{T}\left(\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle-1\right)\widetilde{{\bf e}}_{kA}+o_{p}(1)
=\sum_{i\neq k,i=1}^{r}\dfrac{1}{\lambda_{k}-\lambda_{i}}\left[{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf R}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\right]_{ki}\widetilde{{\bf e}}_{iA}-\sqrt{T}\left(1-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle\right)\widetilde{{\bf e}}_{kA}+o_{p}(1)
=\sum_{i\neq k,i=1}^{r}\dfrac{\sqrt{\lambda_{k}\lambda_{i}}}{\lambda_{k}-\lambda_{i}}[{\bf R}_{N}(\lambda_{k})]_{ki}\widetilde{{\bf e}}_{iA}-\sqrt{T}\left(1-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle\right)\widetilde{{\bf e}}_{kA}+o_{p}(1). (31)

By Lemma 2, the conclusion in Part (i) follows.
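The expansion (31) mirrors the classical first-order perturbation formula for eigenvectors of a symmetric matrix with simple eigenvalues. As a sanity check of that classical formula (using a small illustrative matrix, not the model of this paper), one can verify numerically that perturbing a diagonal matrix moves its kkth eigenvector by ik(𝐞iT𝐄𝐞k)/(λkλi)𝐞i\sum_{i\neq k}({\bf e}_{i}^{\mathrm{T}}{\bf E}{\bf e}_{k})/(\lambda_{k}-\lambda_{i})\,{\bf e}_{i} to first order:

```python
import numpy as np

# Classical first-order perturbation: for symmetric A with simple eigenvalues,
# the k-th eigenvector of A + eps*E equals
# e_k + eps * sum_{i != k} (e_i^T E e_k)/(lam_k - lam_i) e_i + O(eps^2).
rng = np.random.default_rng(0)
lam = np.array([10.0, 6.0, 3.0, 1.0])            # well-separated eigenvalues
r = len(lam)
A = np.diag(lam)                                 # eigenvectors are the standard basis
E = rng.standard_normal((r, r))
E = (E + E.T) / 2                                # symmetric perturbation
eps = 1e-5

vals, vecs = np.linalg.eigh(A + eps * E)         # eigh returns ascending order
vecs = vecs[:, np.argsort(vals)[::-1]]           # reorder to descending

k = 0
delta = np.array([E[i, k] / (lam[k] - lam[i]) if i != k else 0.0
                  for i in range(r)])
u_k = vecs[:, k] * np.sign(vecs[k, k])           # fix the sign convention
err = np.linalg.norm(u_k - (np.eye(r)[:, k] + eps * delta))
print(err)                                       # second order in eps
```

The residual is of order eps squared, confirming the first-order formula underlying (31).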

Next, for any fixed vectors 𝐜k,k=1,,r{\bf c}_{k},\ k=1,\ldots,r, if there exist iji\neq j such that 𝐜i[j]𝐜j[i]{\bf c}_{i}[j]\neq{\bf c}_{j}[i], then by (31) and (30) and Lemma 2, we have

Tk=1r𝐜kT(𝐮~kA𝐞~kA)=I11TI2+op(1),\displaystyle\sqrt{T}\sum_{k=1}^{r}{\bf c}_{k}^{\mathrm{T}}\left(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\right)=I_{1}-\dfrac{1}{\sqrt{T}}I_{2}+o_{p}(1), (32)

where

I1\displaystyle I_{1} =\displaystyle= k=1rik,i=1rλkλiλkλi[𝐑N(λk)]ki𝐜k[i]\displaystyle\sum_{k=1}^{r}\sum_{i\neq k,i=1}^{r}\dfrac{\sqrt{\lambda_{k}\lambda_{i}}}{\lambda_{k}-\lambda_{i}}\ [{\bf R}_{N}(\lambda_{k})]_{ki}\ {\bf c}_{k}[i]
=\displaystyle= k=1r1i=k+1rλkλiλkλi(𝐜k[i]𝐜i[k])[𝐑N(λk)]ki\displaystyle\sum_{k=1}^{r-1}\sum_{i=k+1}^{r}\dfrac{\sqrt{\lambda_{k}\lambda_{i}}}{\lambda_{k}-\lambda_{i}}\left({\bf c}_{k}[i]-{\bf c}_{i}[k]\right)\ [{\bf R}_{N}(\lambda_{k})]_{ki}
𝒟\displaystyle\stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}} N(0,k=1r1i=k+1rωki(𝐜k[i]𝐜i[k])2),\displaystyle N\left(0,\sum_{k=1}^{r-1}\sum_{i=k+1}^{r}\omega_{ki}({\bf c}_{k}[i]-{\bf c}_{i}[k])^{2}\right),

and

I2\displaystyle I_{2} =\displaystyle= k=1r𝐜k[k]T(1𝐮~kA,𝐞~kA)\displaystyle\sum_{k=1}^{r}{\bf c}_{k}[k]\cdot T\left(1-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle\right)
=\displaystyle= 12k=1rik,i=1rλkλi(λkλi)2𝐜k[k][𝐑N(λk)]ki2+op(1)\displaystyle\dfrac{1}{2}\sum_{k=1}^{r}\sum_{i\neq k,i=1}^{r}\dfrac{\lambda_{k}\lambda_{i}}{(\lambda_{k}-\lambda_{i})^{2}}{\bf c}_{k}[k][{\bf R}_{N}(\lambda_{k})]_{ki}^{2}+o_{p}(1)
=\displaystyle= 12k=1r1i=k+1rλkλi(λkλi)2(𝐜k[k]+𝐜i[i])[𝐑N(λk)]ki2+op(1)\displaystyle\dfrac{1}{2}\sum_{k=1}^{r-1}\sum_{i=k+1}^{r}\dfrac{\lambda_{k}\lambda_{i}}{(\lambda_{k}-\lambda_{i})^{2}}\left({\bf c}_{k}[k]+{\bf c}_{i}[i]\right)[{\bf R}_{N}(\lambda_{k})]_{ki}^{2}+o_{p}(1)
𝒟\displaystyle\stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}} 12k=1r1i=k+1rωki(𝐜k[k]+𝐜i[i])Zki2,\displaystyle\dfrac{1}{2}\sum_{k=1}^{r-1}\sum_{i=k+1}^{r}\omega_{ki}\left({\bf c}_{k}[k]+{\bf c}_{i}[i]\right)Z_{ki}^{2},

where ZkiiidN(0,1)Z_{ki}\stackrel{{\scriptstyle iid}}{{\sim}}N(0,1). The conclusion in Part (ii) follows.

Finally, by (17) and (31), Proposition 1 and Lemma 2, T(λ^/λ1)\sqrt{T}(\widehat{\lambda}_{\ell}/\lambda_{\ell}-1) and T(𝐮~kA𝐞~kA)\sqrt{T}\left(\widetilde{{\bf u}}_{kA}-\widetilde{{\bf e}}_{kA}\right) are asymptotically independent. ∎

C. Proof of Proposition 3

Proof.

Recall that 𝐮kB=(λ^k𝐈𝐒22)1𝐒21𝐮kA{\bf u}_{kB}=\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-1}{\bf S}_{21}{\bf u}_{kA}. We have

𝐮kA2=1𝐮kBT𝐮kB=1𝐮kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮kA.\|{\bf u}_{kA}\|^{2}=1-{\bf u}_{kB}^{\mathrm{T}}{\bf u}_{kB}=1-{\bf u}_{kA}^{\mathrm{T}}{\bf S}_{12}(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-2}{\bf S}_{21}{\bf u}_{kA}.

Dividing both sides by 𝐮kA2\|{\bf u}_{kA}\|^{2} yields

1=1𝐮kA2𝐮~kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮~kA.\displaystyle 1=\dfrac{1}{\|{\bf u}_{kA}\|^{2}}-\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}\widetilde{{\bf u}}_{kA}. (33)

Hence,

1𝐮kA2\displaystyle 1-\|{\bf u}_{kA}\|^{2} =\displaystyle= 111+𝐮~kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮~kA\displaystyle 1-\dfrac{1}{1+\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}\widetilde{{\bf u}}_{kA}} (34)
=\displaystyle= 𝐮~kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮~kA+ε~kA,\displaystyle\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}\widetilde{{\bf u}}_{kA}+\widetilde{\varepsilon}_{kA},

where

ε~kA=(𝐮~kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮~kA)21+𝐮~kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮~kA.\widetilde{\varepsilon}_{kA}=\dfrac{-\left(\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}\widetilde{{\bf u}}_{kA}\right)^{2}}{1+\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}\widetilde{{\bf u}}_{kA}}.

To derive the CLT of 𝐮kA2\|{\bf u}_{kA}\|^{2}, we need to analyze the term 𝐒12(λ^k𝐈𝐒22)2𝐒21{\bf S}_{12}(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-2}{\bf S}_{21}. We first quantify the difference incurred by replacing λ^k\widehat{\lambda}_{k} with λk\lambda_{k} in this term. We have

𝐒12((λ^k𝐈𝐒22)2(λk𝐈𝐒22)2)𝐒21\displaystyle{\bf S}_{12}\left((\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-2}-(\lambda_{k}{\bf I}-{\bf S}_{22})^{-2}\right){\bf S}_{21}
=\displaystyle= 𝐒12((λ^k𝐈𝐒22)1(λk𝐈𝐒22)1)((λ^k𝐈𝐒22)1+(λk𝐈𝐒22)1)𝐒21\displaystyle{\bf S}_{12}\left((\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}-(\lambda_{k}{\bf I}-{\bf S}_{22})^{-1}\right)\left((\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}+(\lambda_{k}{\bf I}-{\bf S}_{22})^{-1}\right){\bf S}_{21}
=\displaystyle= L1+L2,\displaystyle L_{1}+L_{2},

where

L1\displaystyle L_{1} =\displaystyle= (λkλ^k)𝐒12(λ^k𝐈𝐒22)1(λk𝐈𝐒22)2𝐒21,\displaystyle(\lambda_{k}-\widehat{\lambda}_{k}){\bf S}_{12}(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}(\lambda_{k}{\bf I}-{\bf S}_{22})^{-2}{\bf S}_{21},
L2\displaystyle L_{2} =\displaystyle= (λkλ^k)𝐒12(λ^k𝐈𝐒22)2(λk𝐈𝐒22)1𝐒21.\displaystyle(\lambda_{k}-\widehat{\lambda}_{k}){\bf S}_{12}(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-2}(\lambda_{k}{\bf I}-{\bf S}_{22})^{-1}{\bf S}_{21}.

Define

𝐐~N(λk)=1T{1T𝐙A𝐗BT(𝐈λk1𝐒22)3𝐗B𝐙ATtr((𝐈λk1𝐒22)3𝐒22)𝐈}.\widetilde{{\bf Q}}_{N}(\lambda_{k})=\dfrac{1}{\sqrt{T}}\left\{\dfrac{1}{T}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}({\bf I}-\lambda_{k}^{-1}{\bf S}_{22})^{-3}{\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}-\operatorname{tr}\left(({\bf I}-\lambda_{k}^{-1}{\bf S}_{22})^{-3}{\bf S}_{22}\right)\cdot{\bf I}\right\}.

By Theorem 7.1 of Bai and Yao (2008), 𝐐~N(λk)\widetilde{{\bf Q}}_{N}(\lambda_{k}) converges weakly to a symmetric Gaussian random matrix 𝐐{\bf Q}^{*} with zero mean and finite covariance functions. Using the definitions of 𝐒12{\bf S}_{12} and 𝐒21{\bf S}_{21}, we can rewrite L1L_{1} as

L1\displaystyle L_{1} =\displaystyle= (λkλ^k)𝐒12(λk𝐈𝐒22)3𝐒21\displaystyle(\lambda_{k}-\widehat{\lambda}_{k})\ {\bf S}_{12}(\lambda_{k}{\bf I}-{\bf S}_{22})^{-3}{\bf S}_{21}
+(λkλ^k)2𝐒12(λ^k𝐈𝐒22)1(λk𝐈𝐒22)3𝐒21\displaystyle+(\lambda_{k}-\widehat{\lambda}_{k})^{2}\ {\bf S}_{12}(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}(\lambda_{k}{\bf I}-{\bf S}_{22})^{-3}{\bf S}_{21}
=\displaystyle= (λkλ^k)1T2𝚲A1/2𝐙A𝐗BT(λk𝐈𝐒22)3𝐗B𝐙AT𝚲A1/2\displaystyle(\lambda_{k}-\widehat{\lambda}_{k})\dfrac{1}{T^{2}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}(\lambda_{k}{\bf I}-{\bf S}_{22})^{-3}{\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}
+(λkλ^k)21T2𝚲A1/2𝐙A𝐗BT(λ^k𝐈𝐒22)1(λk𝐈𝐒22)3𝐗B𝐙AT𝚲A1/2\displaystyle+(\lambda_{k}-\widehat{\lambda}_{k})^{2}\dfrac{1}{T^{2}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22})^{-1}(\lambda_{k}{\bf I}-{\bf S}_{22})^{-3}{\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}
=\displaystyle= 1λk(1λ^kλk)𝚲Aλk1Ttr((𝐈λk1𝐒22)3𝐒22)+Op(1/N2),\displaystyle\dfrac{1}{\lambda_{k}}\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right)\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\dfrac{1}{T}\operatorname{tr}\left(({\bf I}-\lambda_{k}^{-1}{\bf S}_{22})^{-3}{\bf S}_{22}\right)+O_{p}(1/N^{2}),

where the last two steps follow from Assumption A and the facts that 1λ^k/λk=Op(1/T)1-\widehat{\lambda}_{k}/\lambda_{k}=O_{p}(1/\sqrt{T}), 𝐐~N(λk)=Op(1)\widetilde{{\bf Q}}_{N}(\lambda_{k})=O_{p}(1) and 𝐒22=Op(1)\|{\bf S}_{22}\|=O_{p}(1). Similarly, we obtain

L2=1λk(1λ^kλk)𝚲Aλk1Ttr((𝐈λk1𝐒22)3𝐒22)+Op(1/N2).L_{2}=\dfrac{1}{\lambda_{k}}\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right)\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\dfrac{1}{T}\operatorname{tr}\left(({\bf I}-\lambda_{k}^{-1}{\bf S}_{22})^{-3}{\bf S}_{22}\right)+O_{p}(1/N^{2}).

In addition, we have

𝐒12(λk𝐈𝐒22)2𝐒21\displaystyle{\bf S}_{12}\left(\lambda_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21} =\displaystyle= 1T2𝚲A1/2𝐙A𝐗BT(λk𝐈𝐒22)2𝐗B𝐙AT𝚲A1/2\displaystyle\dfrac{1}{T^{2}}{\boldsymbol{\Lambda}}_{A}^{1/2}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}\left(\lambda_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}
=\displaystyle= 1λk2T𝚲A1/2𝐑~N(λk)𝚲A1/2+𝚲Aλk21Ttr((𝐈λk1𝐒22)2𝐒22),\displaystyle\dfrac{1}{\lambda_{k}^{2}\sqrt{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf R}}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}+\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}^{2}}\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right),

where

𝐑~N(λk)=1T{1T𝐙A𝐗BT(𝐈λk1𝐒22)2𝐗B𝐙ATtr((𝐈λk1𝐒22)2𝐒22)𝐈}.\widetilde{{\bf R}}_{N}(\lambda_{k})=\dfrac{1}{\sqrt{T}}\left\{\dfrac{1}{T}{\bf Z}_{A}{\bf X}_{B}^{\mathrm{T}}\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf X}_{B}{\bf Z}_{A}^{\mathrm{T}}-\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)\cdot{\bf I}\right\}.

By Theorem 7.1 of Bai and Yao (2008), 𝐑~N(λk)\widetilde{{\bf R}}_{N}(\lambda_{k}) converges weakly to a symmetric Gaussian random matrix 𝐑~\widetilde{{\bf R}} with zero mean and finite covariance functions. Combining the results above, we obtain

𝐒12(λ^k𝐈𝐒22)2𝐒21\displaystyle{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21} (35)
=\displaystyle= 𝐒12(λk𝐈𝐒22)2𝐒21+𝐒12((λ^k𝐈𝐒22)2(λk𝐈𝐒22)2)𝐒21\displaystyle{\bf S}_{12}\left(\lambda_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}+{\bf S}_{12}\left(\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}-\left(\lambda_{k}{\bf I}-{\bf S}_{22}\right)^{-2}\right){\bf S}_{21}
=\displaystyle= 1λk2T𝚲A1/2𝐑~N(λk)𝚲A1/2+𝚲Aλk21Ttr((𝐈λk1𝐒22)2𝐒22)\displaystyle\dfrac{1}{\lambda_{k}^{2}\sqrt{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf R}}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}+\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}^{2}}\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)
+2λk(1λ^kλk)𝚲Aλk1Ttr((𝐈λk1𝐒22)3𝐒22)+Op(1/N2)\displaystyle+\dfrac{2}{\lambda_{k}}\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right)\dfrac{{\boldsymbol{\Lambda}}_{A}}{\lambda_{k}}\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-3}{\bf S}_{22}\right)+O_{p}(1/N^{2})
=\displaystyle= Op(1λk).\displaystyle O_{p}\left(\dfrac{1}{\lambda_{k}}\right).

It follows that ε~kA=Op(1/λk2)\widetilde{\varepsilon}_{kA}=O_{p}(1/\lambda_{k}^{2}), and

1𝐮kA2=𝐮~kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮~kA+Op(1/λk2).1-\|{\bf u}_{kA}\|^{2}=\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}\widetilde{{\bf u}}_{kA}+O_{p}(1/\lambda_{k}^{2}). (36)

Next, we derive the limit of λk(1𝐮kA2)\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right). By Assumption A and (35), we have

λk(1𝐮kA2)=λk𝐮~kAT𝐒12(λ^k𝐈𝐒22)2𝐒21𝐮~kA+Op(1/λk)\displaystyle\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right)=\lambda_{k}\cdot\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\bf S}_{12}\left(\widehat{\lambda}_{k}{\bf I}-{\bf S}_{22}\right)^{-2}{\bf S}_{21}\widetilde{{\bf u}}_{kA}+O_{p}(1/\lambda_{k})
=\displaystyle= 1λkT𝐮~kAT𝚲A1/2𝐑~N(λk)𝚲A1/2𝐮~kA+1Ttr((𝐈λk1𝐒22)2𝐒22)𝐮~kAT𝚲A𝐮~kAλk\displaystyle\dfrac{1}{\lambda_{k}\sqrt{T}}\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf R}}_{N}(\lambda_{k}){\boldsymbol{\Lambda}}_{A}^{1/2}\widetilde{{\bf u}}_{kA}+\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)\cdot\dfrac{\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}\widetilde{{\bf u}}_{kA}}{\lambda_{k}}
+2(1λ^kλk)1Ttr((𝐈λk1𝐒22)3𝐒22)𝐮~kAT𝚲A𝐮~kAλk+op(1/T).\displaystyle+2\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right)\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-3}{\bf S}_{22}\right)\cdot\dfrac{\widetilde{{\bf u}}_{kA}^{\mathrm{T}}{\boldsymbol{\Lambda}}_{A}\widetilde{{\bf u}}_{kA}}{\lambda_{k}}+o_{p}(1/\sqrt{T}).

Replacing 𝐮~kA\widetilde{{\bf u}}_{kA} with 𝐞~kA\widetilde{{\bf e}}_{kA} and using Proposition 2, we get

λk(1𝐮kA2)=\displaystyle\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right)= 1T[𝐑~N(λk)]kk+1Ttr((𝐈λk1𝐒22)2𝐒22)\displaystyle\dfrac{1}{\sqrt{T}}[\widetilde{{\bf R}}_{N}(\lambda_{k})]_{kk}+\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right) (37)
+2(1λ^kλk)1Ttr((𝐈λk1𝐒22)3𝐒22)+op(1/T).\displaystyle+2\left(1-\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}\right)\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-3}{\bf S}_{22}\right)+o_{p}(1/\sqrt{T}).

Under Assumptions A–C, we have

1Ttr((𝐈λk1𝐒22)2𝐒22)pρx𝑑F(x),\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)\ \,{\buildrel p\over{\longrightarrow}}\,\ \rho\int xdF(x),

where F()F(\cdot) is the LSD of 𝐒22{\bf S}_{22}. According to Theorem 1.1 of Silverstein and Bai (1995), the Stieltjes transform of FF, mFm_{F}, is the unique solution in the set {mF+:(1ρ)/z+ρmF+}\{m_{F}\in{\mathbb{C}}^{+}:-(1-\rho)/z+\rho m_{F}\in{\mathbb{C}}^{+}\} to the equation

mF=dH(τ)τ(1ρρzmF)z.m_{F}=\int\dfrac{dH(\tau)}{\tau(1-\rho-\rho zm_{F})-z}.

Therefore, λk(1𝐮kA2)pρx𝑑F(x)\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right)\,{\buildrel p\over{\longrightarrow}}\,\rho\int xdF(x).
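The limits above can be illustrated by a small simulation in the special case of standardized i.i.d. entries, where H=δ1H=\delta_{1} and FF is the Marchenko–Pastur law, so that x𝑑F(x)=1\int x\,dF(x)=1 and the trace limit equals ρ\rho. The sketch below (dimensions and the spiked value lam_k are illustrative choices, not from the paper) checks both the trace limit and the Silverstein–Bai fixed-point equation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 400, 1000
rho = N / T
X_B = rng.standard_normal((N, T))     # standardized noise: F is Marchenko-Pastur
S22 = X_B @ X_B.T / T
eigs = np.linalg.eigvalsh(S22)

# (1/T) tr((I - S22/lam_k)^{-2} S22) for a spiked lam_k of order N (Assumption A);
# for standardized entries int x dF(x) = 1, so the limit rho * int x dF(x) = rho
lam_k = 50.0 * N
trace_term = np.sum(eigs / (1 - eigs / lam_k) ** 2) / T
print(abs(trace_term - rho))          # small

# empirical Stieltjes transform of F at a real point outside the support
z = -1.0
m = np.mean(1.0 / (eigs - z))
# Silverstein-Bai equation with H = delta_1: m = 1 / ((1 - rho - rho*z*m) - z)
fp_residual = abs(m - 1.0 / ((1 - rho - rho * z * m) - z))
print(fp_residual)                    # small
```

Both residuals shrink as NN and TT grow, consistent with the stated convergence in probability.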

We now consider the limiting distribution of λk(1𝐮kA2)\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right). By (37) and (17), we get

T{λk(1𝐮kA2)1Ttr((𝐈λk1𝐒22)2𝐒22)}\displaystyle\sqrt{T}\left\{\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right)-\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)\right\}
=\displaystyle= [𝐑~N(λk)]kk2[𝐑N(λk)]kk1Ttr((𝐈λk1𝐒22)3𝐒22)+op(1).\displaystyle[\widetilde{{\bf R}}_{N}(\lambda_{k})]_{kk}-2[{\bf R}_{N}(\lambda_{k})]_{kk}\cdot\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-3}{\bf S}_{22}\right)+o_{p}(1).

Notice that

([𝐑N(λ)]kk[𝐑~N(λ)]kk)=(1T[𝐙A𝐂~N𝐙ATtr(𝐂~N)𝐈]kk1T[𝐙A𝐃~N𝐙ATtr(𝐃~N)𝐈]kk)=(1T(𝐳(k)T𝐂~N𝐳(k)tr(𝐂~N))1T(𝐳(k)T𝐃~N𝐳(k)tr(𝐃~N))),\displaystyle\begin{pmatrix}[{\bf R}_{N}(\lambda)]_{kk}\\ [\widetilde{{\bf R}}_{N}(\lambda)]_{kk}\end{pmatrix}=\begin{pmatrix}\dfrac{1}{\sqrt{T}}[{\bf Z}_{A}\widetilde{{\bf C}}_{N}{\bf Z}_{A}^{\mathrm{T}}-\operatorname{tr}(\widetilde{{\bf C}}_{N})\cdot{\bf I}]_{kk}\\ \dfrac{1}{\sqrt{T}}[{\bf Z}_{A}\widetilde{{\bf D}}_{N}{\bf Z}_{A}^{\mathrm{T}}-\operatorname{tr}(\widetilde{{\bf D}}_{N})\cdot{\bf I}]_{kk}\end{pmatrix}=\begin{pmatrix}\dfrac{1}{\sqrt{T}}\left({\bf z}_{(k)}^{\mathrm{T}}\widetilde{{\bf C}}_{N}{\bf z}_{(k)}-\operatorname{tr}(\widetilde{{\bf C}}_{N})\right)\\ \dfrac{1}{\sqrt{T}}\left({\bf z}_{(k)}^{\mathrm{T}}\widetilde{{\bf D}}_{N}{\bf z}_{(k)}-\operatorname{tr}(\widetilde{{\bf D}}_{N})\right)\end{pmatrix},

where

𝐂~N\displaystyle\widetilde{{\bf C}}_{N} =\displaystyle= 𝐂~N(λ)=𝐈+1T𝐗BT(λ𝐈𝐒22)1𝐗B,\displaystyle\widetilde{{\bf C}}_{N}(\lambda)={\bf I}+\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}\left(\lambda{\bf I}-{\bf S}_{22}\right)^{-1}{\bf X}_{B},
𝐃~N\displaystyle\widetilde{{\bf D}}_{N} =\displaystyle= 𝐃~N(λ)=1T𝐗BT(𝐈λ1𝐒22)2𝐗B.\displaystyle\widetilde{{\bf D}}_{N}(\lambda)=\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}\left({\bf I}-\lambda^{-1}{\bf S}_{22}\right)^{-2}{\bf X}_{B}.

To finish the proof, we need the following lemma, which will be proved at the end of this subsection.

Lemma 4.

Under Assumptions A–C, if λT\lambda\asymp T, then ([𝐑N(λ)]kk[𝐑~N(λ)]kk)\begin{pmatrix}[{\bf R}_{N}(\lambda)]_{kk}\\ [\widetilde{{\bf R}}_{N}(\lambda)]_{kk}\end{pmatrix} converges weakly to a zero-mean Gaussian vector with covariance matrix 𝛀k=(𝛀k,11𝛀k,12𝛀k,21𝛀k,22),{\boldsymbol{\Omega}}_{k}=\begin{pmatrix}{\boldsymbol{\Omega}}_{k,11}&{\boldsymbol{\Omega}}_{k,12}\\ {\boldsymbol{\Omega}}_{k,21}&{\boldsymbol{\Omega}}_{k,22}\end{pmatrix}, where

𝛀k,11\displaystyle{\boldsymbol{\Omega}}_{k,11} =\displaystyle= E(𝐳1[k])41,\displaystyle\operatorname{E}({\bf z}_{1}[k])^{4}-1,
𝛀k,22\displaystyle{\boldsymbol{\Omega}}_{k,22} =\displaystyle= (E(𝐳1[k])43)ρ2(xdF(x))2+2ρx2dF(x),\displaystyle\left(\operatorname{E}({\bf z}_{1}[k])^{4}-3\right)\cdot\rho^{2}\left(\int xdF(x)\right)^{2}+2\rho\int x^{2}dF(x),
𝛀k,12\displaystyle{\boldsymbol{\Omega}}_{k,12} =\displaystyle= (E(𝐳1[k])41)ρxdF(x),\displaystyle\left(\operatorname{E}({\bf z}_{1}[k])^{4}-1\right)\cdot\rho\int xdF(x),

where F()F(\cdot) is the LSD of 𝐒22{\bf S}_{22}.

Based on the lemma above, we conclude that

T{λk(1𝐮kA2)1Ttr((𝐈λk1𝐒22)2𝐒22)}𝒟N(0,σkA2),\sqrt{T}\left\{\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right)-\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)\right\}\ \stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}}\ N(0,\sigma_{kA}^{2}),

where

σkA2\displaystyle\sigma_{kA}^{2} =\displaystyle= (2ρx𝑑F(x),1)(𝛀k,11𝛀k,12𝛀k,12𝛀k,22)(2ρx𝑑F(x)1)\displaystyle\left(-2\rho\int xdF(x),~{}~{}1\right)\begin{pmatrix}{\boldsymbol{\Omega}}_{k,11}&{\boldsymbol{\Omega}}_{k,12}\\ {\boldsymbol{\Omega}}_{k,12}&{\boldsymbol{\Omega}}_{k,22}\end{pmatrix}\begin{pmatrix}-2\rho\int xdF(x)\\ 1\end{pmatrix}
=\displaystyle= (E(𝐳1[k])43)ρ2(xdF(x))2+2ρx2dF(x).\displaystyle\left(\operatorname{E}({\bf z}_{1}[k])^{4}-3\right)\cdot\rho^{2}\left(\int xdF(x)\right)^{2}+2\rho\int x^{2}dF(x).
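As an immediate special case of this variance formula (not stated in the paper, but a direct consequence): for Gaussian data E(𝐳1[k])4=3\operatorname{E}({\bf z}_{1}[k])^{4}=3, so the first term vanishes; if moreover FF is the Marchenko–Pastur law with ratio ρ\rho (standardized entries), then x2𝑑F(x)=1+ρ\int x^{2}dF(x)=1+\rho, and hence

```latex
\sigma_{kA}^{2}
  = \left(\operatorname{E}({\bf z}_{1}[k])^{4}-3\right)\rho^{2}\left(\int x\,dF(x)\right)^{2}
    + 2\rho\int x^{2}\,dF(x)
  \ \overset{\text{Gaussian}}{=}\ 2\rho\int x^{2}\,dF(x)
  \ \overset{\text{MP}}{=}\ 2\rho(1+\rho).
```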

Further, by Assumption A, λk=Op(N)\lambda_{k}=O_{p}(N) for krk\leq r; combined with the boundedness, with high probability, of the eigenvalues of 𝐒22{\bf S}_{22} and of maxj>rλ^j\max_{j>r}\widehat{\lambda}_{j}, it follows that

T[1Ttr((𝐈λk1𝐒22)2𝐒22)1Ttr(𝐒22)]p 0,\sqrt{T}\left[\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda_{k}^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)-\dfrac{1}{T}\operatorname{tr}({\bf S}_{22})\right]\,{\buildrel p\over{\longrightarrow}}\,0,

and

T[1Tj=r+1Nλ^j(1λ^j/λk)21Tj=r+1Nλ^j]p 0.\sqrt{T}\left[\dfrac{1}{T}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\lambda_{k})^{2}}-\dfrac{1}{T}\sum_{j=r+1}^{N}\widehat{\lambda}_{j}\right]\,{\buildrel p\over{\longrightarrow}}\,0.

Recall that

T[1Ttr(𝐒22)1Tj=r+1Nλ^j]p 0,\sqrt{T}\left[\dfrac{1}{T}\operatorname{tr}({\bf S}_{22})-\dfrac{1}{T}\sum_{j=r+1}^{N}\widehat{\lambda}_{j}\right]\,{\buildrel p\over{\longrightarrow}}\,0,

which has been shown in the proof of (19). Therefore, Proposition 3 follows.

Finally, we prove Lemma 4.

Proof of Lemma 4.

By Theorem 2.1 of Wang et al. (2014), ([𝐑N(λ)]kk[𝐑~N(λ)]kk)\begin{pmatrix}[{\bf R}_{N}(\lambda)]_{kk}\\ [\widetilde{{\bf R}}_{N}(\lambda)]_{kk}\end{pmatrix} converges weakly to a zero-mean Gaussian vector with covariance matrix 𝛀k=(𝛀k,11𝛀k,12𝛀k,21𝛀k,22),{\boldsymbol{\Omega}}_{k}=\begin{pmatrix}{\boldsymbol{\Omega}}_{k,11}&{\boldsymbol{\Omega}}_{k,12}\\ {\boldsymbol{\Omega}}_{k,21}&{\boldsymbol{\Omega}}_{k,22}\end{pmatrix}, where

𝛀k,11\displaystyle{\boldsymbol{\Omega}}_{k,11} =\displaystyle= ω1A1+(τ1ω1)(A2+A3),\displaystyle\omega_{1}A_{1}+(\tau_{1}-\omega_{1})(A_{2}+A_{3}),
𝛀k,22\displaystyle{\boldsymbol{\Omega}}_{k,22} =\displaystyle= ω2A1+(τ2ω2)(A2+A3),\displaystyle\omega_{2}A_{1}+(\tau_{2}-\omega_{2})(A_{2}+A_{3}),
𝛀k,12\displaystyle{\boldsymbol{\Omega}}_{k,12} =\displaystyle= ω3A1+(τ3ω3)(A2+A3),\displaystyle\omega_{3}A_{1}+(\tau_{3}-\omega_{3})(A_{2}+A_{3}),

with

A1=E(𝐳(k)[1])41=E(𝐳1[k])41,A2=1,A3=1,A_{1}=\operatorname{E}({\bf z}_{(k)}[1])^{4}-1=\operatorname{E}({\bf z}_{1}[k])^{4}-1,~{}~{}~{}~{}~{}~{}~{}A_{2}=1,~{}~{}~{}~{}~{}~{}~{}~{}A_{3}=1,
τ1=limN1Ttr(𝐂~N2),τ2=limN1Ttr(𝐃~N2),τ3=limN1Ttr(𝐂~N𝐃~N),\tau_{1}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf C}}_{N}^{2}\right),~{}~{}~{}~{}\tau_{2}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf D}}_{N}^{2}\right),~{}~{}~{}~{}\tau_{3}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf C}}_{N}\widetilde{{\bf D}}_{N}\right),
ω1=limN1Ttr(𝐂~N𝐂~N),ω2=limN1Ttr(𝐃~N𝐃~N),ω3=limN1Ttr(𝐂~N𝐃~N),\omega_{1}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf C}}_{N}\circ\widetilde{{\bf C}}_{N}\right),~{}~{}~{}~{}\omega_{2}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf D}}_{N}\circ\widetilde{{\bf D}}_{N}\right),~{}~{}~{}~{}\omega_{3}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf C}}_{N}\circ\widetilde{{\bf D}}_{N}\right),

where 𝐀𝐁{\bf A}\circ{\bf B} denotes the Hadamard product of two symmetric matrices 𝐀{\bf A} and 𝐁{\bf B}, i.e. [𝐀𝐁]ij=[𝐀]ij[𝐁]ij[{\bf A}\circ{\bf B}]_{ij}=[{\bf A}]_{ij}\cdot[{\bf B}]_{ij}.
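Note that the trace of a Hadamard product involves only the diagonal entries: tr(𝐀𝐁)=i[𝐀]ii[𝐁]ii\operatorname{tr}({\bf A}\circ{\bf B})=\sum_{i}[{\bf A}]_{ii}[{\bf B}]_{ii}, which is why the ωi\omega_{i}'s below reduce to averages of diagonal entries. A one-line numerical check (arbitrary small matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
B = rng.standard_normal((5, 5)); B = (B + B.T) / 2
lhs = np.trace(A * B)                   # '*' is the entrywise (Hadamard) product
rhs = np.sum(np.diag(A) * np.diag(B))   # only diagonal entries survive the trace
print(abs(lhs - rhs))                   # zero up to floating point
```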

To prove Lemma 4, we need to compute the values of τi\tau_{i} and ωi\omega_{i}, i=1,2,3i=1,2,3. We start with τi\tau_{i}’s. From the definitions of 𝐂~N\widetilde{{\bf C}}_{N} and 𝐃~N\widetilde{{\bf D}}_{N}, it is easy to check that

τ1\displaystyle\tau_{1} =\displaystyle= limN1Ttr(𝐂~N2)\displaystyle\lim_{N}\dfrac{1}{T}\operatorname{tr}(\widetilde{{\bf C}}_{N}^{2})
=\displaystyle= limN1Ttr(𝐈+2T𝐗BT(λ𝐈𝐒22)1𝐗B+1T𝐗BT(λ𝐈𝐒22)1𝐗B1T𝐗BT(λ𝐈𝐒22)1𝐗B)\displaystyle\lim_{N}\dfrac{1}{T}\operatorname{tr}\left({\bf I}+\dfrac{2}{T}{\bf X}_{B}^{\mathrm{T}}\left(\lambda{\bf I}-{\bf S}_{22}\right)^{-1}{\bf X}_{B}+\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}(\lambda{\bf I}-{\bf S}_{22})^{-1}{\bf X}_{B}\cdot\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}\left(\lambda{\bf I}-{\bf S}_{22}\right)^{-1}{\bf X}_{B}\right)
=\displaystyle= 1+2limN1Ttr((λ𝐈𝐒22)1𝐒22)+limN1Ttr((λ𝐈𝐒22)1𝐒22(λ𝐈𝐒22)1𝐒22)\displaystyle 1+2\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\left(\lambda{\bf I}-{\bf S}_{22}\right)^{-1}{\bf S}_{22}\right)+\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\left(\lambda{\bf I}-{\bf S}_{22}\right)^{-1}{\bf S}_{22}\left(\lambda{\bf I}-{\bf S}_{22}\right)^{-1}{\bf S}_{22}\right)
=\displaystyle= 1;\displaystyle 1;
τ2\displaystyle\tau_{2} =\displaystyle= limN1Ttr(𝐃~N2)=limN1Ttr(1T𝐗BT(𝐈λ1𝐒22)2𝐗B1T𝐗BT(𝐈λ1𝐒22)2𝐗B)\displaystyle\lim_{N}\dfrac{1}{T}\operatorname{tr}(\widetilde{{\bf D}}_{N}^{2})=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf X}_{B}\cdot\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf X}_{B}\right)
=\displaystyle= limN1Ttr(1T𝐗BT(𝐈λ1𝐒22)2𝐒22(𝐈λ1𝐒22)2𝐒22)\displaystyle\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf S}_{22}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf S}_{22}\right)
=\displaystyle= ρx2𝑑F(x);\displaystyle\rho\int x^{2}dF(x);

and

τ3\displaystyle\tau_{3} =\displaystyle= limN1Ttr(𝐂~N𝐃~N)\displaystyle\lim_{N}\dfrac{1}{T}\operatorname{tr}(\widetilde{{\bf C}}_{N}\widetilde{{\bf D}}_{N})
=\displaystyle= limN1Ttr(1T𝐗BT(𝐈λ1𝐒22)2𝐗B+1T𝐗BT(λ𝐈𝐒22)1𝐗B1T𝐗BT(𝐈λ1𝐒22)2𝐗B)\displaystyle\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf X}_{B}+\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}(\lambda{\bf I}-{\bf S}_{22})^{-1}{\bf X}_{B}\cdot\dfrac{1}{T}{\bf X}_{B}^{\mathrm{T}}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf X}_{B}\right)
=\displaystyle= limN1Ttr((𝐈λ1𝐒22)2𝐒22)+limN1Ttr((λ𝐈𝐒22)1𝐒22(𝐈λ1𝐒22)2𝐒22)\displaystyle\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\left({\bf I}-\lambda^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)+\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\left(\lambda{\bf I}-{\bf S}_{22}\right)^{-1}{\bf S}_{22}\left({\bf I}-\lambda^{-1}{\bf S}_{22}\right)^{-2}{\bf S}_{22}\right)
=\displaystyle= ρx𝑑F(x).\displaystyle\rho\int x\,dF(x).

Next, we calculate the values of ωi\omega_{i}, i=1,2,3i=1,2,3. Denote by 𝐗Bi{\bf X}_{Bi} the matrix obtained from 𝐗B{\bf X}_{B} by deleting its iith column. Then

𝐒22=1T𝐗Bi𝐗BiT+1T𝐱i(B)𝐱i(B)T.{\bf S}_{22}=\dfrac{1}{T}{\bf X}_{Bi}{\bf X}_{Bi}^{\mathrm{T}}+\dfrac{1}{T}{\bf x}_{i}^{(B)}{{\bf x}_{i}^{(B)}}^{\mathrm{T}}.

Recall that for any invertible matrix 𝐀{\bf A} and vector 𝐫{\bf r}, one has

(𝐀+𝐫𝐫T)1=𝐀1𝐀1𝐫𝐫T𝐀11+𝐫T𝐀1𝐫,({\bf A}+{\bf r}{\bf r}^{\mathrm{T}})^{-1}={\bf A}^{-1}-\dfrac{{\bf A}^{-1}{\bf r}{\bf r}^{\mathrm{T}}{\bf A}^{-1}}{1+{\bf r}^{\mathrm{T}}{\bf A}^{-1}{\bf r}},

and

𝐫T(𝐀+𝐫𝐫T)2𝐫=𝐫T𝐀2𝐫(1+𝐫T𝐀1𝐫)2.{\bf r}^{\mathrm{T}}({\bf A}+{\bf r}{\bf r}^{\mathrm{T}})^{-2}{\bf r}=\dfrac{{\bf r}^{\mathrm{T}}{\bf A}^{-2}{\bf r}}{\left(1+{\bf r}^{\mathrm{T}}{\bf A}^{-1}{\bf r}\right)^{2}}.
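Both rank-one identities are instances of the Sherman–Morrison formula; the second follows from the first since (𝐀+𝐫𝐫T)1𝐫=𝐀1𝐫/(1+𝐫T𝐀1𝐫)({\bf A}+{\bf r}{\bf r}^{\mathrm{T}})^{-1}{\bf r}={\bf A}^{-1}{\bf r}/(1+{\bf r}^{\mathrm{T}}{\bf A}^{-1}{\bf r}). A quick numerical verification with an arbitrary well-conditioned test matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite, invertible
r = rng.standard_normal(n)
Ainv = np.linalg.inv(A)
denom = 1 + r @ Ainv @ r

# identity 1: (A + r r^T)^{-1} = A^{-1} - A^{-1} r r^T A^{-1} / (1 + r^T A^{-1} r)
lhs1 = np.linalg.inv(A + np.outer(r, r))
rhs1 = Ainv - (Ainv @ np.outer(r, r) @ Ainv) / denom
e1 = np.abs(lhs1 - rhs1).max()

# identity 2: r^T (A + r r^T)^{-2} r = r^T A^{-2} r / (1 + r^T A^{-1} r)^2
lhs2 = r @ lhs1 @ lhs1 @ r
rhs2 = (r @ Ainv @ Ainv @ r) / denom ** 2
e2 = abs(lhs2 - rhs2)
print(e1, e2)                          # both at machine-precision level
```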

By Assumption A, we have

[𝐃~N]ii\displaystyle[\widetilde{{\bf D}}_{N}]_{ii} =\displaystyle= 1T𝐱i(B)T(λ1𝐒22𝐈)2𝐱i(B)\displaystyle\dfrac{1}{T}{{\bf x}_{i}^{(B)}}^{\mathrm{T}}(\lambda^{-1}{\bf S}_{22}-{\bf I})^{-2}{{\bf x}_{i}^{(B)}} (38)
=\displaystyle= 1T𝐱i(B)T(1/(Tλ)𝐗Bi𝐗BiT𝐈)2𝐱i(B)(1+1T𝐱i(B)T(T1𝐗Bi𝐗BiTλ𝐈)1𝐱i(B))2\displaystyle\dfrac{\dfrac{1}{T}{{\bf x}_{i}^{(B)}}^{\mathrm{T}}\left(1/(T\lambda){\bf X}_{Bi}{\bf X}_{Bi}^{\mathrm{T}}-{\bf I}\right)^{-2}{\bf x}_{i}^{(B)}}{\left(1+\dfrac{1}{T}{{\bf x}_{i}^{(B)}}^{\mathrm{T}}\left(T^{-1}{\bf X}_{Bi}{\bf X}_{Bi}^{\mathrm{T}}-\lambda{\bf I}\right)^{-1}{\bf x}_{i}^{(B)}\right)^{2}}
p\displaystyle\,{\buildrel p\over{\longrightarrow}}\, ρx𝑑F(x),\displaystyle\rho\int xdF(x),

and

[𝐂~N]ii\displaystyle[\widetilde{{\bf C}}_{N}]_{ii} =\displaystyle= 11T𝐱i(B)T(𝐒22λ𝐈)1𝐱i(B)\displaystyle 1-\dfrac{1}{T}{{\bf x}_{i}^{(B)}}^{\mathrm{T}}({\bf S}_{22}-\lambda{\bf I})^{-1}{{\bf x}_{i}^{(B)}} (39)
=\displaystyle= 11T𝐱i(B)T(T1𝐗Bi𝐗BiTλ𝐈)1𝐱i(B)1+1T𝐱i(B)T(T1𝐗Bi𝐗BiTλ𝐈)1𝐱i(B)\displaystyle 1-\dfrac{\dfrac{1}{T}{{\bf x}_{i}^{(B)}}^{\mathrm{T}}\left(T^{-1}{\bf X}_{Bi}{\bf X}_{Bi}^{\mathrm{T}}-\lambda{\bf I}\right)^{-1}{\bf x}_{i}^{(B)}}{1+\dfrac{1}{T}{{\bf x}_{i}^{(B)}}^{\mathrm{T}}\left(T^{-1}{\bf X}_{Bi}{\bf X}_{Bi}^{\mathrm{T}}-\lambda{\bf I}\right)^{-1}{\bf x}_{i}^{(B)}}
p\displaystyle\,{\buildrel p\over{\longrightarrow}}\, 1.\displaystyle 1.

It is easy to see that

limN1TEtr(T1𝐗BT(𝐈λ1𝐒22)2𝐗B)4𝟏(s)<,\lim_{N}\dfrac{1}{T}\operatorname{E}\operatorname{tr}\left(T^{-1}{\bf X}_{B}^{T}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf X}_{B}\right)^{4}{\mathbf{1}}({\mathcal{F}}_{s})<\infty,

and

limN1TEtr(T1𝐗BT(λ𝐈𝐒22)1𝐗B)4𝟏(s)=0.\lim_{N}\dfrac{1}{T}\operatorname{E}\operatorname{tr}\left(T^{-1}{\bf X}_{B}^{T}(\lambda{\bf I}-{\bf S}_{22})^{-1}{\bf X}_{B}\right)^{4}{\mathbf{1}}({\mathcal{F}}_{s})=0.

Hence

supNE([𝐃~N]ii𝟏(s))4supN1TEtr(T1𝐗BT(𝐈λ1𝐒22)2𝐗B)4𝟏(s)<\sup_{N}\operatorname{E}([\widetilde{{\bf D}}_{N}]_{ii}{\mathbf{1}}({\mathcal{F}}_{s}))^{4}\leq\sup_{N}\dfrac{1}{T}\operatorname{E}\operatorname{tr}\left(T^{-1}{\bf X}_{B}^{T}({\bf I}-\lambda^{-1}{\bf S}_{22})^{-2}{\bf X}_{B}\right)^{4}{\mathbf{1}}({\mathcal{F}}_{s})<\infty

and supNE([𝐂~N]ii𝟏(s))4<\sup_{N}\operatorname{E}([\widetilde{{\bf C}}_{N}]_{ii}{\mathbf{1}}({\mathcal{F}}_{s}))^{4}<\infty, which implies that the families of random variables {[𝐃~N]ii2𝟏(s)}\{[\widetilde{{\bf D}}_{N}]_{ii}^{2}{\mathbf{1}}({\mathcal{F}}_{s})\} and {[𝐂~N]ii2𝟏(s)}\{[\widetilde{{\bf C}}_{N}]_{ii}^{2}{\mathbf{1}}({\mathcal{F}}_{s})\} are uniformly integrable. Together with (38) and (39) and the fact that 𝟏(s)=1{\mathbf{1}}({\mathcal{F}}_{s})=1 with high probability, we get

E|1Ti=1T[𝐃~N]ii2𝟏(s)(ρx𝑑F(x))2|E|[𝐃~N]112𝟏(s)(ρx𝑑F(x))2|0,\operatorname{E}\left|\dfrac{1}{T}\sum_{i=1}^{T}[\widetilde{{\bf D}}_{N}]_{ii}^{2}{\mathbf{1}}({\mathcal{F}}_{s})-\left(\rho\int xdF(x)\right)^{2}\right|\leq\operatorname{E}\left|[\widetilde{{\bf D}}_{N}]_{11}^{2}{\mathbf{1}}({\mathcal{F}}_{s})-\left(\rho\int xdF(x)\right)^{2}\right|\rightarrow 0,

and

E|1Tt=1T[𝐂~N]tt2𝟏(s)1|0.\operatorname{E}\left|\dfrac{1}{T}\sum_{t=1}^{T}[\widetilde{{\bf C}}_{N}]_{tt}^{2}{\mathbf{1}}({\mathcal{F}}_{s})-1\right|\rightarrow 0.

Thus, 1Tt=1T[𝐃~N]tt2𝟏(s)p(ρx𝑑F(x))2\dfrac{1}{T}\sum_{t=1}^{T}[\widetilde{{\bf D}}_{N}]_{tt}^{2}{\mathbf{1}}({\mathcal{F}}_{s})\,{\buildrel p\over{\longrightarrow}}\,\left(\rho\int xdF(x)\right)^{2} and 1Tt=1T[𝐂~N]tt2𝟏(s)p 1\dfrac{1}{T}\sum_{t=1}^{T}[\widetilde{{\bf C}}_{N}]_{tt}^{2}{\mathbf{1}}({\mathcal{F}}_{s})\,{\buildrel p\over{\longrightarrow}}\,1. Moreover, noting that on the event s{\mathcal{F}}_{s}, [𝐂~N]ii[\widetilde{{\bf C}}_{N}]_{ii} and [𝐃~N]ii[\widetilde{{\bf D}}_{N}]_{ii}, i=1,,Ti=1,\ldots,T, are uniformly bounded and that P(s)1P({\mathcal{F}}_{s})\rightarrow 1, we obtain

1Ti=1T[𝐃~N]ii2p(ρx𝑑F(x))2,and1Ti=1T[𝐂~N]ii2p 1\dfrac{1}{T}\sum_{i=1}^{T}[\widetilde{{\bf D}}_{N}]_{ii}^{2}\,{\buildrel p\over{\longrightarrow}}\,\left(\rho\int xdF(x)\right)^{2},~{}~{}\textrm{and}~{}\dfrac{1}{T}\sum_{i=1}^{T}[\widetilde{{\bf C}}_{N}]_{ii}^{2}\,{\buildrel p\over{\longrightarrow}}\,1.

Therefore,

ω1=limN1Ttr(𝐂~N𝐂~N)=1,\displaystyle\omega_{1}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf C}}_{N}\circ\widetilde{{\bf C}}_{N}\right)=1,
ω2=limN1Ttr(𝐃~N𝐃~N)=ρ2(x𝑑F(x))2,\displaystyle\omega_{2}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf D}}_{N}\circ\widetilde{{\bf D}}_{N}\right)=\rho^{2}\ \left(\int xdF(x)\right)^{2},

and

ω3=limN1Ttr(𝐂~N𝐃~N)=ρx𝑑F(x).\displaystyle\omega_{3}=\lim_{N}\dfrac{1}{T}\operatorname{tr}\left(\widetilde{{\bf C}}_{N}\circ\widetilde{{\bf D}}_{N}\right)=\rho\ \int xdF(x).

In summary,

([𝐑N(λ)]kk[𝐑~N(λ)]kk)𝒟N(0,𝛀k=(𝛀k,11𝛀k,12𝛀k,12𝛀k,22)),\begin{pmatrix}[{\bf R}_{N}(\lambda)]_{kk}\\ [\widetilde{{\bf R}}_{N}(\lambda)]_{kk}\end{pmatrix}~{}~{}\stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}}~{}~{}N\left(0,{\boldsymbol{\Omega}}_{k}=\begin{pmatrix}{\boldsymbol{\Omega}}_{k,11}&{\boldsymbol{\Omega}}_{k,12}\\ {\boldsymbol{\Omega}}_{k,12}&{\boldsymbol{\Omega}}_{k,22}\end{pmatrix}\right),

where

𝛀k,11\displaystyle{\boldsymbol{\Omega}}_{k,11} =\displaystyle= E(𝐳1[k])41\displaystyle\operatorname{E}({\bf z}_{1}[k])^{4}-1
𝛀k,22\displaystyle{\boldsymbol{\Omega}}_{k,22} =\displaystyle= (E(𝐳1[k])43)ρ2(xdF(x))2+2ρx2dF(x),and\displaystyle\left(\operatorname{E}({\bf z}_{1}[k])^{4}-3\right)\cdot\rho^{2}\ \left(\int xdF(x)\right)^{2}+2\rho\int x^{2}dF(x),\mbox{and }
𝛀k,12\displaystyle{\boldsymbol{\Omega}}_{k,12} =\displaystyle= (E(𝐳1[k])41)ρxdF(x).\displaystyle\left(\operatorname{E}({\bf z}_{1}[k])^{4}-1\right)\cdot\rho\int xdF(x).

S5. Proof of Theorem 3

Proof.

Recall that 𝐯k,𝐯^k=𝐮k,𝐞k\langle{\bf v}_{k},\widehat{{\bf v}}_{k}\rangle=\langle{\bf u}_{k},{\bf e}_{k}\rangle, where 𝐯k,𝐯^k{\bf v}_{k},\widehat{{\bf v}}_{k} are the kkth principal eigenvector of 𝚺{\boldsymbol{\Sigma}} and the kkth principal sample eigenvector of 𝚺^N\widehat{{\boldsymbol{\Sigma}}}_{N}, respectively, and 𝐮k{\bf u}_{k} is the kkth eigenvector of the sample covariance matrix 𝐒N{\bf S}_{N}. Under Assumptions A–C, by Propositions 1 and 3, we get

T(1𝐮k,𝐞k21Tλkj=r+1Nλ^j(1λ^j/λk)2)\displaystyle T\left(1-\langle{\bf u}_{k},{\bf e}_{k}\rangle^{2}-\dfrac{1}{T\lambda_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\lambda_{k})^{2}}\right)
=\displaystyle= T(1𝐮kA2+𝐮kA2(1𝐮~kA,𝐞~kA2)1Tλkj=r+1Nλ^j(1λ^j/λk)2)\displaystyle T\left(1-\|{\bf u}_{kA}\|^{2}+\|{\bf u}_{kA}\|^{2}\left(1-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle^{2}\right)-\dfrac{1}{T\lambda_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\lambda_{k})^{2}}\right)
=\displaystyle= Tλk(λk(1𝐮kA2)1Tj=r+1Nλ^j(1λ^j/λk)2)+T𝐮kA2(1𝐮~kA,𝐞~kA2)\displaystyle\dfrac{T}{\lambda_{k}}\left(\lambda_{k}\left(1-\|{\bf u}_{kA}\|^{2}\right)-\dfrac{1}{T}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{(1-\widehat{\lambda}_{j}/\lambda_{k})^{2}}\right)+T\|{\bf u}_{kA}\|^{2}\left(1-\langle\widetilde{{\bf u}}_{kA},\widetilde{{\bf e}}_{kA}\rangle^{2}\right)
𝒟\displaystyle\stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}} ik,i=1rωkiZi2,\displaystyle\sum_{i\neq k,i=1}^{r}\omega_{ki}\ Z_{i}^{2},

where ZiZ_{i} are i.i.d. standard normal random variables.

It remains to prove that

1λkj=r+1Nλ^j(1λ^j/λk)21λ^kj=r+1Nλ^j(1λ^j/λ^k)2p 0.\dfrac{1}{\lambda_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{\left(1-\widehat{\lambda}_{j}/\lambda_{k}\right)^{2}}-\dfrac{1}{\widehat{\lambda}_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{\left(1-\widehat{\lambda}_{j}/\widehat{\lambda}_{k}\right)^{2}}\,{\buildrel p\over{\longrightarrow}}\,0.

Rewrite the term as

1λkj=r+1Nλ^j(1λ^j/λk)21λ^kj=r+1Nλ^j(1λ^j/λk)2\displaystyle\dfrac{1}{\lambda_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{\left(1-\widehat{\lambda}_{j}/\lambda_{k}\right)^{2}}-\dfrac{1}{\widehat{\lambda}_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{\left(1-\widehat{\lambda}_{j}/\lambda_{k}\right)^{2}}
+1λ^kj=r+1Nλ^j(1λ^j/λk)21λ^kj=r+1Nλ^j(1λ^j/λ^k)2\displaystyle+\dfrac{1}{\widehat{\lambda}_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{\left(1-\widehat{\lambda}_{j}/\lambda_{k}\right)^{2}}-\dfrac{1}{\widehat{\lambda}_{k}}\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{\left(1-\widehat{\lambda}_{j}/\widehat{\lambda}_{k}\right)^{2}}
=\displaystyle= 1λ^k(λ^kλk1)j=r+1Nλ^j(1λ^j/λk)2\displaystyle\dfrac{1}{\widehat{\lambda}_{k}}\left(\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}-1\right)\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}}{\left(1-\widehat{\lambda}_{j}/\lambda_{k}\right)^{2}}
\displaystyle+\dfrac{1}{\widehat{\lambda}_{k}^{2}}\left(\dfrac{\widehat{\lambda}_{k}}{\lambda_{k}}-1\right)\sum_{j=r+1}^{N}\dfrac{\widehat{\lambda}_{j}^{2}\left(2-\widehat{\lambda}_{j}/\lambda_{k}-\widehat{\lambda}_{j}/\widehat{\lambda}_{k}\right)}{\left(1-\widehat{\lambda}_{j}/\widehat{\lambda}_{k}\right)^{2}\left(1-\widehat{\lambda}_{j}/\lambda_{k}\right)^{2}}.

By Assumptions A–C and Theorem 1, both terms converge to zero in probability. ∎

S6. Proofs of Theorem 6 and Corollary 1

Proof of Theorem 6.

By (31), we have

N(𝐮~kA(i)𝐞~kA)=NTik,=1rλk(i)λ(i)λk(i)λ(i)[𝐑N(i)(λk(i))]k𝐞~A+op(1),i=1,2.\sqrt{N}\left(\widetilde{{\bf u}}_{kA}^{(i)}-\widetilde{{\bf e}}_{kA}\right)=\sqrt{\frac{N}{T_{i}}}\sum_{\ell\neq k,\ell=1}^{r}\frac{\sqrt{\lambda_{k}^{{(i)}}\lambda_{\ell}^{{(i)}}}}{\lambda_{k}^{{(i)}}-\lambda_{\ell}^{{(i)}}}[{\bf R}_{N}^{{(i)}}(\lambda_{k}^{{(i)}})]_{k\ell}\ \widetilde{{\bf e}}_{\ell A}+o_{p}(1),\quad i=1,2. (40)

Hence, when k\ell\neq k,

\sqrt{N}\ \dfrac{u_{k}^{{(i)}}[\ell]}{\|{\bf u}_{kA}^{{(i)}}\|}=\sqrt{\frac{N}{T_{i}}}\cdot\dfrac{\sqrt{\lambda_{k}^{{(i)}}\lambda_{\ell}^{{(i)}}}}{\lambda_{k}^{{(i)}}-\lambda_{\ell}^{{(i)}}}\ [{\bf R}_{N}^{{(i)}}(\lambda_{k}^{{(i)}})]_{k\ell}\ +o_{p}(1),\quad i=1,2. (41)

Similarly, by (30),

N(1|uk(i)[k]|𝐮kA(i))=N2Tik,=1rλk(i)λ(i)(λk(i)λ(i))2[𝐑N(i)(λk(i))]k2+op(1),i=1,2.N\left(1-\dfrac{|u_{k}^{{(i)}}[k]|}{\|{\bf u}_{kA}^{{(i)}}\|}\right)=\dfrac{N}{2T_{i}}\cdot\sum_{\ell\neq k,\ell=1}^{r}\dfrac{\lambda_{k}^{{(i)}}\lambda_{\ell}^{{(i)}}}{(\lambda_{k}^{{(i)}}-\lambda_{\ell}^{{(i)}})^{2}}[{\bf R}_{N}^{{(i)}}(\lambda_{k}^{{(i)}})]_{k\ell}^{2}+o_{p}(1),\quad i=1,2. (42)

Proposition 3 implies that

N(1𝐮kA(i))pρi2θk(i)x𝑑F(i)(x),i=1,2.N(1-\|{\bf u}_{kA}^{{(i)}}\|)\ \,{\buildrel p\over{\longrightarrow}}\,\ \dfrac{\rho_{i}}{2\theta_{k}^{{(i)}}}\int xdF^{{(i)}}(x),\quad i=1,2. (43)

Write the two population eigen-matrices 𝐕(1),𝐕(2){{\bf V}}^{(1)},{{\bf V}}^{(2)} as

𝐕(i)=(𝐯1(i),,𝐯N(i)),i=1,2,{{\bf V}}^{(i)}=({{\bf v}}_{1}^{(i)},\ldots,{{\bf v}}_{N}^{(i)}),~{}~{}~{}~{}~{}~{}i=1,2,

and define

\boldsymbol{\Xi}=([\boldsymbol{\Xi}]_{ij})={{\bf V}}^{(1)}{}^{\mathrm{T}}{{\bf V}}^{(2)}=\begin{pmatrix}\boldsymbol{\Xi}_{11}&\boldsymbol{\Xi}_{12}\\ \boldsymbol{\Xi}_{21}&\boldsymbol{\Xi}_{22}\end{pmatrix},

where [𝚵]ij=𝐯i(1)𝐯j(2)T[\boldsymbol{\Xi}]_{ij}={{\bf v}}_{i}^{(1)}{}^{\mathrm{T}}{{\bf v}}_{j}^{(2)}, and 𝚵11,𝚵12,𝚵21,𝚵22\boldsymbol{\Xi}_{11},\boldsymbol{\Xi}_{12},\boldsymbol{\Xi}_{21},\boldsymbol{\Xi}_{22} are of sizes r×r,r×(Nr),(Nr)×rr\times r,r\times(N-r),(N-r)\times r and (Nr)×(Nr)(N-r)\times(N-r), respectively.

Under the null hypothesis H_{0}^{(III,k)}:|\langle{\bf v}_{k}^{(1)},{\bf v}_{k}^{(2)}\rangle|=1, the kth row and kth column of \boldsymbol{\Xi} are zero except that the kth diagonal entry is one. To prove the theorem, note that

𝐯^k(1),𝐯^k(2)\displaystyle\langle\widehat{{\bf v}}_{k}^{(1)},\widehat{{\bf v}}_{k}^{(2)}\rangle =\displaystyle= 𝐕(1)𝐮k(1),𝐕(2)𝐮k(2)=𝐮k(1)𝚵T𝐮k(2)\displaystyle\langle{{\bf V}}^{(1)}{\bf u}_{k}^{(1)},{{\bf V}}^{(2)}{\bf u}_{k}^{(2)}\rangle={\bf u}_{k}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}{\bf u}_{k}^{(2)}
=\displaystyle= 𝐮kA(1)𝚵11T𝐮kA(2)+𝐮kA(1)𝚵12T𝐮kB(2)+𝐮kB(1)𝚵21T𝐮kA(2)+𝐮kB(1)𝚵22T𝐮kB(2).\displaystyle{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{11}{\bf u}_{kA}^{(2)}+{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}{\bf u}_{kB}^{(2)}+{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{21}{\bf u}_{kA}^{(2)}+{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{22}{\bf u}_{kB}^{(2)}.

We start with the first term 𝐮kA(1)𝚵11T𝐮kA(2){\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{11}{\bf u}_{kA}^{(2)}, and will show later that

N𝐮kA(1)𝚵12T𝐮kB(2)=op(1),N𝐮kB(1)𝚵21T𝐮kA(2)=op(1), and N𝐮kB(1)𝚵22T𝐮kB(2)=op(1).N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}{\bf u}_{kB}^{(2)}=o_{p}(1),~{}~{}N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{21}{\bf u}_{kA}^{(2)}=o_{p}(1),\mbox{ and }N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{22}{\bf u}_{kB}^{(2)}=o_{p}(1). (44)

Because the entries in the kkth row and kkth column of 𝚵11\boldsymbol{\Xi}_{11} are zero except that [𝚵11]kk=1[\boldsymbol{\Xi}_{11}]_{kk}=1, we have

N(1𝐮kA(1)𝚵11T𝐮kA(2))\displaystyle N\left(1-{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{11}{\bf u}_{kA}^{(2)}\right) (45)
=\displaystyle= N(1uk(1)[k]uk(2)[k])i,j=1,ik,jkrN[𝚵11]ijuk(1)[i]uk(2)[j].\displaystyle N\left(1-u_{k}^{(1)}[k]\cdot u_{k}^{(2)}[k]\right)-\sum_{i,j=1,i\neq k,j\neq k}^{r}N\ [\boldsymbol{\Xi}_{11}]_{ij}\cdot u_{k}^{(1)}[i]\cdot u_{k}^{(2)}[j].

For the first term, we have

N(1uk(1)[k]uk(2)[k])=N(1uk(1)[k])+uk(1)[k]N(1uk(2)[k])\displaystyle N\left(1-u_{k}^{(1)}[k]\cdot u_{k}^{(2)}[k]\right)=N(1-u_{k}^{(1)}[k])+u_{k}^{(1)}[k]\cdot N(1-u_{k}^{(2)}[k]) (46)
=:\displaystyle=: N(1uk(1)[k])+N(1uk(2)[k])+ε1\displaystyle N(1-u_{k}^{(1)}[k])+N(1-u_{k}^{(2)}[k])+\varepsilon_{1}
=\displaystyle= N(1uk(1)[k]𝐮kA(1))+Nuk(1)[k]𝐮kA(1)(1𝐮kA(1))+N(1uk(2)[k]𝐮kA(2))\displaystyle N\left(1-\dfrac{u_{k}^{(1)}[k]}{\|{\bf u}_{kA}^{(1)}\|}\right)+N\dfrac{u_{k}^{(1)}[k]}{\|{\bf u}_{kA}^{(1)}\|}(1-\|{\bf u}_{kA}^{(1)}\|)+N\left(1-\dfrac{u_{k}^{(2)}[k]}{\|{\bf u}_{kA}^{(2)}\|}\right)
+Nuk(2)[k]𝐮kA(2)(1𝐮kA(2))+ε1\displaystyle+N\dfrac{u_{k}^{(2)}[k]}{\|{\bf u}_{kA}^{(2)}\|}(1-\|{\bf u}_{kA}^{(2)}\|)+\varepsilon_{1}
=:\displaystyle=: N(1uk(1)[k]𝐮kA(1))+N(1𝐮kA(1))+N(1uk(2)[k]𝐮kA(2))\displaystyle N\left(1-\dfrac{u_{k}^{(1)}[k]}{\|{\bf u}_{kA}^{(1)}\|}\right)+N(1-\|{\bf u}_{kA}^{(1)}\|)+N\left(1-\dfrac{u_{k}^{(2)}[k]}{\|{\bf u}_{kA}^{(2)}\|}\right)
+N(1𝐮kA(2))+ε2+ε1,\displaystyle+N(1-\|{\bf u}_{kA}^{(2)}\|)+\varepsilon_{2}+\varepsilon_{1},

where

ε1\displaystyle\varepsilon_{1} =\displaystyle= N(1uk(2)[k])(uk(1)[k]1),and\displaystyle N(1-u_{k}^{(2)}[k])(u_{k}^{(1)}[k]-1),~{}~{}~{}~{}\textrm{and}
ε2\displaystyle\varepsilon_{2} =\displaystyle= N(1𝐮kA(1))(uk(1)[k]𝐮kA(1)1)+N(1𝐮kA(2))(uk(2)[k]𝐮kA(2)1).\displaystyle N(1-\|{\bf u}_{kA}^{(1)}\|)\left(\dfrac{u_{k}^{(1)}[k]}{\|{\bf u}_{kA}^{(1)}\|}-1\right)+N(1-\|{\bf u}_{kA}^{(2)}\|)\left(\dfrac{u_{k}^{(2)}[k]}{\|{\bf u}_{kA}^{(2)}\|}-1\right).

By Theorem 3 and Proposition 1, both ε1\varepsilon_{1} and ε2\varepsilon_{2} are op(1)o_{p}(1).

For the second term on the right-hand side of (45), by Proposition 3, we have

Ni,jk,i,j=1r[𝚵11]ijuk(1)[i]uk(2)[j]\displaystyle N\sum_{i,j\neq k,i,j=1}^{r}[\boldsymbol{\Xi}_{11}]_{ij}\cdot u_{k}^{(1)}[i]\cdot u_{k}^{(2)}[j] (47)
=\displaystyle= N𝐮kA(1)𝐮kA(2)i,jk,i,j=1r[𝚵11]ijuk(1)[i]𝐮kA(1)uk(2)[j]𝐮kA(2)\displaystyle N\|{\bf u}_{kA}^{(1)}\|\cdot\|{\bf u}_{kA}^{(2)}\|\cdot\sum_{i,j\neq k,i,j=1}^{r}[\boldsymbol{\Xi}_{11}]_{ij}\cdot\dfrac{u_{k}^{(1)}[i]}{\|{\bf u}_{kA}^{(1)}\|}\cdot\dfrac{u_{k}^{(2)}[j]}{\|{\bf u}_{kA}^{(2)}\|}
=\displaystyle= Nik,i=1rjk,j=1r[𝚵11]ijuk(1)[i]𝐮kA(1)uk(2)[j]𝐮kA(2)+op(1).\displaystyle N\sum_{i\neq k,i=1}^{r}\sum_{j\neq k,j=1}^{r}[\boldsymbol{\Xi}_{11}]_{ij}\cdot\dfrac{u_{k}^{(1)}[i]}{\|{\bf u}_{kA}^{(1)}\|}\cdot\dfrac{u_{k}^{(2)}[j]}{\|{\bf u}_{kA}^{(2)}\|}+o_{p}(1).

Combining (46) and (47) and using (41) and (42), we obtain

N(1𝐮kA(1)𝚵11T𝐮kA(2))\displaystyle N\left(1-{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{11}{\bf u}_{kA}^{(2)}\right)
=\displaystyle= N(1𝐮kA(1))+N(1𝐮kA(2))+N(1uk(1)[k]𝐮kA(1))+N(1uk(2)[k]𝐮kA(2))\displaystyle N\left(1-\|{\bf u}_{kA}^{(1)}\|\right)+N\left(1-\|{\bf u}_{kA}^{(2)}\|\right)+N\left(1-\dfrac{u_{k}^{(1)}[k]}{\|{\bf u}_{kA}^{(1)}\|}\right)+N\left(1-\dfrac{u_{k}^{(2)}[k]}{\|{\bf u}_{kA}^{(2)}\|}\right)
Nik,i=1rjk,j=1r[𝚵11]ijuk(1)[i]𝐮kA(1)uk(2)[j]𝐮kA(2)+op(1)\displaystyle-N\sum_{i\neq k,i=1}^{r}\sum_{j\neq k,j=1}^{r}[\boldsymbol{\Xi}_{11}]_{ij}\cdot\dfrac{u_{k}^{(1)}[i]}{\|{\bf u}_{kA}^{(1)}\|}\cdot\dfrac{u_{k}^{(2)}[j]}{\|{\bf u}_{kA}^{(2)}\|}+o_{p}(1)
=\displaystyle= N(1𝐮kA(1))+N(1𝐮kA(2))\displaystyle N\left(1-\|{\bf u}_{kA}^{(1)}\|\right)+N\left(1-\|{\bf u}_{kA}^{(2)}\|\right)
\displaystyle+\dfrac{N}{2T_{1}}\sum_{i\neq k,i=1}^{r}\dfrac{\lambda_{k}^{(1)}\lambda_{i}^{(1)}}{(\lambda_{k}^{(1)}-\lambda_{i}^{(1)})^{2}}[{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{ki}^{2}+\dfrac{N}{2T_{2}}\sum_{j\neq k,j=1}^{r}\dfrac{\lambda_{k}^{(2)}\lambda_{j}^{(2)}}{(\lambda_{k}^{(2)}-\lambda_{j}^{(2)})^{2}}[{\bf R}_{N}^{(2)}(\lambda_{k}^{(2)})]_{kj}^{2}
N2T1T2ik,i=1rjk,j=1rλk(1)λi(1)λk(1)λi(1)λk(2)λj(2)λk(2)λj(2)[𝐑N(1)(λk(1))]ki[𝐑N(2)(λk(2))]kj(𝚵11)ij\displaystyle-\sqrt{\dfrac{N^{2}}{T_{1}T_{2}}}\sum_{i\neq k,i=1}^{r}\sum_{j\neq k,j=1}^{r}\dfrac{\sqrt{\lambda_{k}^{(1)}\lambda_{i}^{(1)}}}{\lambda_{k}^{(1)}-\lambda_{i}^{(1)}}\dfrac{\sqrt{\lambda_{k}^{(2)}\lambda_{j}^{(2)}}}{\lambda_{k}^{(2)}-\lambda_{j}^{(2)}}[{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{ki}\cdot[{\bf R}_{N}^{(2)}(\lambda_{k}^{(2)})]_{kj}\cdot(\boldsymbol{\Xi}_{11})_{ij}
+op(1).\displaystyle+o_{p}(1).

For k=1,,rk=1,\ldots,r, define two (r1)×1(r-1)\times 1 vectors 𝐚k{\bf a}_{k} and 𝐛k{\bf b}_{k} as

𝐚k=(λk(1)λ1(1)λk(1)λ1(1)[𝐑N(1)(λk(1))]k1λk(1)λk1(1)λk(1)λk1(1)[𝐑N(1)(λk(1))]k(k1)λk(1)λk+1(1)λk(1)λk+1(1)[𝐑N(1)(λk(1))]k(k+1)λk(1)λr(1)λk(1)λr(1)[𝐑N(1)(λk(1))]kr),𝐛k=(λk(2)λ1(2)λk(2)λ1(2)[𝐑N(2)(λk(2))]k1λk(2)λk1(2)λk(2)λk1(2)[𝐑N(2)(λk(2))]k(k1)λk(2)λk+1(2)λk(2)λk+1(2)[𝐑N(2)(λk(2))]k(k+1)λk(2)λr(2)λk(2)λr(2)[𝐑N(2)(λk(2))]kr).\displaystyle{\bf a}_{k}=\begin{pmatrix}\dfrac{\sqrt{\lambda_{k}^{(1)}\lambda_{1}^{(1)}}}{\lambda_{k}^{(1)}-\lambda_{1}^{(1)}}\ [{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{k1}\\ \vdots\\ \dfrac{\sqrt{\lambda_{k}^{(1)}\lambda_{k-1}^{(1)}}}{\lambda_{k}^{(1)}-\lambda_{k-1}^{(1)}}\ [{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{k(k-1)}\\ \dfrac{\sqrt{\lambda_{k}^{(1)}\lambda_{k+1}^{(1)}}}{\lambda_{k}^{(1)}-\lambda_{k+1}^{(1)}}\ [{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{k(k+1)}\\ \vdots\\ \dfrac{\sqrt{\lambda_{k}^{(1)}\lambda_{r}^{(1)}}}{\lambda_{k}^{(1)}-\lambda_{r}^{(1)}}\ [{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{kr}\end{pmatrix},~{}~{}~{}~{}~{}~{}~{}{\bf b}_{k}=\begin{pmatrix}\dfrac{\sqrt{\lambda_{k}^{(2)}\lambda_{1}^{(2)}}}{\lambda_{k}^{(2)}-\lambda_{1}^{(2)}}\ [{\bf R}_{N}^{(2)}(\lambda_{k}^{(2)})]_{k1}\\ \vdots\\ \dfrac{\sqrt{\lambda_{k}^{(2)}\lambda_{k-1}^{(2)}}}{\lambda_{k}^{(2)}-\lambda_{k-1}^{(2)}}\ [{\bf R}_{N}^{(2)}(\lambda_{k}^{(2)})]_{k(k-1)}\\ \dfrac{\sqrt{\lambda_{k}^{(2)}\lambda_{k+1}^{(2)}}}{\lambda_{k}^{(2)}-\lambda_{k+1}^{(2)}}\ [{\bf R}_{N}^{(2)}(\lambda_{k}^{(2)})]_{k(k+1)}\\ \vdots\\ \dfrac{\sqrt{\lambda_{k}^{(2)}\lambda_{r}^{(2)}}}{\lambda_{k}^{(2)}-\lambda_{r}^{(2)}}\ [{\bf R}_{N}^{(2)}(\lambda_{k}^{(2)})]_{kr}\end{pmatrix}.

Under the assumptions of Theorem 6, by Lemma 2, we have

𝐚k𝒟N(0,𝐃ak),𝐛k𝒟N(0,𝐃bk),\displaystyle{\bf a}_{k}\ \stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}}\ N(0,{\bf D}_{a_{k}}),~{}~{}~{}~{}~{}{\bf b}_{k}\ \stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}}\ N(0,{\bf D}_{b_{k}}), (48)

where

𝐃ak\displaystyle{\bf D}_{a_{k}} =\displaystyle= diag(ωk1(1),,ωk(k1)(1),ωk(k+1)(1),,ωkr(1)),\displaystyle\operatorname{diag}(\omega_{k1}^{(1)},\ldots,\omega_{k(k-1)}^{(1)},\omega_{k(k+1)}^{(1)},\ldots,\omega_{kr}^{(1)}),
𝐃bk\displaystyle{\bf D}_{b_{k}} =\displaystyle= diag(ωk1(2),,ωk(k1)(2),ωk(k+1)(2),,ωkr(2)),\displaystyle\operatorname{diag}(\omega_{k1}^{(2)},\ldots,\omega_{k(k-1)}^{(2)},\omega_{k(k+1)}^{(2)},\ldots,\omega_{kr}^{(2)}),

and

ωkj(i)=θk(i)θj(i)(θk(i)θj(i))2,fori=1,2,1jkr.\omega_{kj}^{(i)}=\dfrac{\theta_{k}^{(i)}\theta_{j}^{(i)}}{(\theta_{k}^{(i)}-\theta_{j}^{(i)})^{2}},~{}~{}~{}~{}~{}~{}\textrm{for}~{}~{}~{}i=1,2,~{}~{}~{}1\leq j\neq k\leq r.
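When tabulating the limit in simulations, the variance weights above reduce to a simple function of the normalized spikes. A minimal sketch, with hypothetical θ values (the function name and inputs are our choices):

```python
# Sketch: weights omega_kj = theta_k * theta_j / (theta_k - theta_j)^2 from the
# display above, computed for a hypothetical vector of distinct spikes theta.
def omega_weights(theta, k):
    """Return [omega_kj for j != k], with theta 0-indexed and entries distinct."""
    return [theta[k] * t / (theta[k] - t) ** 2
            for j, t in enumerate(theta) if j != k]

# Close pairs of spikes produce large weights, well-separated pairs small ones.
print(omega_weights([4.0, 2.0, 1.0], 0))
```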

Let 𝚵11,k\boldsymbol{\Xi}_{11,-k} be the matrix obtained by removing the kkth row and kkth column of 𝚵11\boldsymbol{\Xi}_{11}. Then by (45), (43) and (48), we get

N(1𝐮kA(1)𝚵11T𝐮kA(2))\displaystyle N\left(1-{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{11}{\bf u}_{kA}^{(2)}\right)
=\displaystyle= N(1𝐮kA(1))+N(1𝐮kA(2))\displaystyle N\left(1-\|{\bf u}_{kA}^{(1)}\|\right)+N\left(1-\|{\bf u}_{kA}^{(2)}\|\right)
+N2T1𝐚kT𝐚k+N2T2𝐛kT𝐛kN2T1T2𝐚kT𝚵11,k𝐛k+op(1)\displaystyle+\dfrac{N}{2T_{1}}{\bf a}_{k}^{\mathrm{T}}{\bf a}_{k}+\dfrac{N}{2T_{2}}{\bf b}_{k}^{\mathrm{T}}{\bf b}_{k}-\sqrt{\dfrac{N^{2}}{T_{1}T_{2}}}{\bf a}_{k}^{\mathrm{T}}\boldsymbol{\Xi}_{11,-k}{\bf b}_{k}+o_{p}(1)
=\displaystyle= N(1𝐮kA(1))+N(1𝐮kA(2))\displaystyle N\left(1-\|{\bf u}_{kA}^{(1)}\|\right)+N\left(1-\|{\bf u}_{kA}^{(2)}\|\right)
+12(NT1𝐚kNT2𝐛k)T(𝐈r1𝚵11,k𝚵11,kT𝐈r1)(NT1𝐚kNT2𝐛k)+op(1)\displaystyle+\dfrac{1}{2}\begin{pmatrix}\sqrt{\dfrac{N}{T_{1}}}{\bf a}_{k}\\ \sqrt{\dfrac{N}{T_{2}}}{\bf b}_{k}\end{pmatrix}^{\mathrm{T}}\begin{pmatrix}{\bf I}_{r-1}&-\boldsymbol{\Xi}_{11,-k}\\ -\boldsymbol{\Xi}_{11,-k}^{\mathrm{T}}&{\bf I}_{r-1}\end{pmatrix}\begin{pmatrix}\sqrt{\dfrac{N}{T_{1}}}{\bf a}_{k}\\ \sqrt{\dfrac{N}{T_{2}}}{\bf b}_{k}\end{pmatrix}+o_{p}(1)
𝒟\displaystyle\stackrel{{\scriptstyle\mathcal{D}}}{{\rightarrow}} ρ12θk(1)x𝑑F(1)+ρ22θk(2)x𝑑F(2)+12𝐪kT(𝐈r1𝚵11,k𝚵11,kT𝐈r1)𝐪k,\displaystyle\dfrac{\rho_{1}}{2\theta_{k}^{(1)}}\int xdF^{(1)}+\dfrac{\rho_{2}}{2\theta_{k}^{(2)}}\int xdF^{(2)}+\dfrac{1}{2}{\bf q}_{k}^{\mathrm{T}}\begin{pmatrix}{\bf I}_{r-1}&-\boldsymbol{\Xi}_{11,-k}^{*}\\ -{\boldsymbol{\Xi}_{11,-k}^{*}}^{\mathrm{T}}&{\bf I}_{r-1}\end{pmatrix}{\bf q}_{k},

where 𝐪kN(0,𝐃k){\bf q}_{k}\sim N(0,{\bf D}_{k}) with 𝐃k=(ρ1𝐃ak00ρ2𝐃bk).{\bf D}_{k}=\begin{pmatrix}\rho_{1}{\bf D}_{a_{k}}&0\\ 0&\rho_{2}{\bf D}_{b_{k}}\end{pmatrix}.

Combining (19), (44) and the convergence

N2Ti(Nr)λ^k(i)j=r+1Nλ^j(i)pρiθk(i)x𝑑F(i),i=1,2,\dfrac{N^{2}}{T_{i}(N-r)\widehat{\lambda}_{k}^{(i)}}\sum_{j=r+1}^{N}\widehat{\lambda}_{j}^{(i)}\ \,{\buildrel p\over{\longrightarrow}}\,\ \dfrac{\rho_{i}}{\theta_{k}^{(i)}}\int xdF^{(i)},~{}~{}~{}~{}i=1,2,

we get that our test statistic

Tvk=2N(1𝐯^k(1),𝐯^k(2))N2T1(Nr)λ^k(1)j=r+1Nλ^j(1)N2T2(Nr)λ^k(2)j=r+1Nλ^j(2)T_{vk}=2N\left(1-\langle\widehat{{\bf v}}_{k}^{(1)},\widehat{{\bf v}}_{k}^{(2)}\rangle\right)-\dfrac{N^{2}}{T_{1}(N-r)\widehat{\lambda}_{k}^{(1)}}\cdot\sum_{j=r+1}^{N}\widehat{\lambda}_{j}^{(1)}-\dfrac{N^{2}}{T_{2}(N-r)\widehat{\lambda}_{k}^{(2)}}\cdot\sum_{j=r+1}^{N}\widehat{\lambda}_{j}^{(2)}

converges weakly to

𝐪kT(𝐈r1𝚵11,k𝚵11,kT𝐈r1)𝐪k.{\bf q}_{k}^{T}\begin{pmatrix}{\bf I}_{r-1}&-\boldsymbol{\Xi}_{11,-k}^{*}\\ -{\boldsymbol{\Xi}_{11,-k}^{*}}^{\mathrm{T}}&{\bf I}_{r-1}\end{pmatrix}{\bf q}_{k}.
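For reference, the statistic T_{vk} above can be assembled directly from the two eigendecompositions. The following is a hedged sketch, not the authors' code: the function name, input layout, and the absolute-value sign alignment of the two eigenvectors are our choices.

```python
import numpy as np

# Sketch of the two-sample statistic T_vk. Inputs: lam1, lam2 are the sample
# eigenvalues of the two panels in descending order; v1, v2 the k-th sample
# principal eigenvectors; r the number of factors; T1, T2 the sample sizes.
def T_vk(v1, v2, lam1, lam2, k, r, T1, T2):
    N = len(lam1)
    inner = abs(float(np.dot(v1, v2)))   # eigenvectors are defined up to sign
    bias1 = N**2 / (T1 * (N - r) * lam1[k]) * lam1[r:].sum()
    bias2 = N**2 / (T2 * (N - r) * lam2[k]) * lam2[r:].sum()
    return 2 * N * (1 - inner) - bias1 - bias2

# Toy call: identical eigenvectors, so only the bias corrections remain.
lam = np.array([10.0, 5.0, 0.1, 0.1, 0.1])
e1 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
stat = T_vk(e1, e1, lam, lam, k=0, r=2, T1=100, T2=100)
```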

It remains to prove (44). By (23), we have

N𝐮kA(1)𝚵12T𝐮kB(2)=N𝐮kA(1)𝚵12T(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮kA(2)+ε3,\displaystyle N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}{\bf u}_{kB}^{(2)}=N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}+\varepsilon_{3},

where

ε3=N𝐮kA(1)𝚵12T[(λ^k(2)𝐈𝐒22(2))1(λk(2)𝐈𝐒22(2))1]𝐒21(2)𝐮kA(2).\varepsilon_{3}=N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}[(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}-(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}]{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}. (49)

Write

N𝐮kA(1)𝚵12T(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮kA(2)=I1+I2+ε4,N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}=I_{1}+I_{2}+\varepsilon_{4},

where

I1\displaystyle I_{1} =\displaystyle= N𝐞~kAT𝚵12(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮~kA(2),\displaystyle N\ \widetilde{{\bf e}}_{kA}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf u}}_{kA}^{(2)},
I2\displaystyle I_{2} =\displaystyle= N(𝐮~kA(1)𝐞~kA)T𝚵12(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮~kA(2),and\displaystyle N(\widetilde{{\bf u}}_{kA}^{(1)}-\widetilde{{\bf e}}_{kA})^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf u}}_{kA}^{(2)},\quad\mbox{and}
ε4\displaystyle\varepsilon_{4} =\displaystyle= N(𝐮kA(1)𝐮~kA(1))T𝚵12(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮kA(2)\displaystyle N({\bf u}_{kA}^{(1)}-\widetilde{{\bf u}}_{kA}^{(1)})^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}
+N𝐮~kA(1)𝚵12T(λk(2)𝐈𝐒22(2))1𝐒21(2)(𝐮kA(2)𝐮~kA(2))\displaystyle+N\widetilde{{\bf u}}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}({\bf u}_{kA}^{(2)}-\widetilde{{\bf u}}_{kA}^{(2)})
=:\displaystyle=: ε41+ε42.\displaystyle\varepsilon_{41}+\varepsilon_{42}.

For the term I_{1}, note that \widetilde{{\bf e}}_{kA}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12} is the kth row of \boldsymbol{\Xi}_{12}, which is zero; hence I_{1}=0.

Next, we prove that I2=op(1)I_{2}=o_{p}(1). Write

I2\displaystyle I_{2} =\displaystyle= N(𝐮~kA(1)𝐞~kA)T𝚵12(λk(2)𝐈𝐒22(2))1𝐒21(2)(𝐮~kA(2)𝐞~kA)\displaystyle N(\widetilde{{\bf u}}_{kA}^{(1)}-\widetilde{{\bf e}}_{kA})^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}(\widetilde{{\bf u}}_{kA}^{(2)}-\widetilde{{\bf e}}_{kA})
+N(𝐮~kA(1)𝐞~kA)T𝚵12(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐞~kA\displaystyle+N(\widetilde{{\bf u}}_{kA}^{(1)}-\widetilde{{\bf e}}_{kA})^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf e}}_{kA}
=:\displaystyle=: I21+I22.\displaystyle I_{21}+I_{22}.

By Proposition 2 and the fact that \lambda_{k}^{(2)}=O(N), we get I_{21}=o_{p}(1). As for I_{22}, by (40), Assumption A and the facts that \|{\bf S}_{22}^{(2)}\|=O_{p}(1) and \|{\bf S}_{21}^{(2)}\|=O_{p}(\sqrt{N}), we obtain

I22\displaystyle I_{22} =\displaystyle= NT1k,=1rλk(1)λ(1)λk(1)λ(1)[𝐑N(1)(λk(1))]k\displaystyle\dfrac{N}{\sqrt{T_{1}}}\sum_{\ell\neq k,\ell=1}^{r}\dfrac{\sqrt{\lambda_{k}^{(1)}\lambda_{\ell}^{(1)}}}{\lambda_{k}^{(1)}-\lambda_{\ell}^{(1)}}[{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{k\ell}
×𝐞~AT𝚵12(λk(2)𝐈𝐒22(2))11T2𝐗B(2)𝐙A(2)T𝚲A(2)1/2𝐞~kA+op(1)\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\times\widetilde{{\bf e}}_{\ell A}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}\dfrac{1}{T_{2}}{\bf X}_{B}^{(2)}{{\bf Z}_{A}^{(2)}}^{\mathrm{T}}{{\boldsymbol{\Lambda}}_{A}^{(2)}}^{1/2}\widetilde{{\bf e}}_{kA}+o_{p}(1)
=\displaystyle= NT1λk(2)k,=1rλk(1)λ(1)λk(1)λ(1)[𝐑N(1)(λk(1))]k\displaystyle\dfrac{N}{\sqrt{T_{1}\lambda_{k}^{(2)}}}\sum_{\ell\neq k,\ell=1}^{r}\dfrac{\sqrt{\lambda_{k}^{(1)}\lambda_{\ell}^{(1)}}}{\lambda_{k}^{(1)}-\lambda_{\ell}^{(1)}}[{\bf R}_{N}^{(1)}(\lambda_{k}^{(1)})]_{k\ell}
×1T2𝐞~AT𝚵12(𝐈1/λk(2)𝐒22(2))1𝐗B(2)𝐳(k)(2)+op(1).\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\times\dfrac{1}{T_{2}}\widetilde{{\bf e}}_{\ell A}^{\mathrm{T}}\boldsymbol{\Xi}_{12}({\bf I}-1/\lambda_{k}^{(2)}\cdot{\bf S}_{22}^{(2)})^{-1}{\bf X}_{B}^{(2)}{\bf z}_{(k)}^{(2)}+o_{p}(1).

Using the independence between {\bf z}_{(k)}^{(2)} and ({\bf X}_{B}^{(2)},{\bf S}_{22}^{(2)}), Assumption A, and the fact that \|{\bf S}_{22}^{(2)}\|=O_{p}(1), we have

1T2𝐞~AT𝚵12(𝐈1/λk(2)𝐒22(2))1𝐗B(2)𝐳(k)(2)=op(1).\dfrac{1}{T_{2}}\widetilde{{\bf e}}_{\ell A}^{\mathrm{T}}\boldsymbol{\Xi}_{12}({\bf I}-1/\lambda_{k}^{(2)}\cdot{\bf S}_{22}^{(2)})^{-1}{\bf X}_{B}^{(2)}{\bf z}_{(k)}^{(2)}=o_{p}(1).

Therefore, I22=op(1)I_{22}=o_{p}(1).

We now analyze ε4\varepsilon_{4}. For ε41\varepsilon_{41}, because N(𝐮kA(1)1)=Op(1)N(\|{\bf u}_{kA}^{(1)}\|-1)=O_{p}(1), by Proposition 3 and that 𝐒21(2)=Op(N)||{\bf S}_{21}^{(2)}||=O_{p}(\sqrt{N}), we get ε41=op(1)\varepsilon_{41}=o_{p}(1). Similarly, we get ε42=op(1)\varepsilon_{42}=o_{p}(1).

To sum up, we have shown that

N𝐮kA(1)𝚵12T(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮kA(2)=op(1).N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}=o_{p}(1). (50)

Next, we prove that ε3=op(1)\varepsilon_{3}=o_{p}(1). We have

ε3\displaystyle\varepsilon_{3} =\displaystyle= N(λk(2)λ^k(2))𝐮kA(1)𝚵12T(λ^k(2)𝐈𝐒22(2))1(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮kA(2)\displaystyle N(\lambda_{k}^{(2)}-\widehat{\lambda}_{k}^{(2)}){\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}
=\displaystyle= N(λk(2)λ^k(2))𝐮kA(1)𝚵12T(λk(2)𝐈𝐒22(2))2𝐒21(2)𝐮kA(2)\displaystyle N(\lambda_{k}^{(2)}-\widehat{\lambda}_{k}^{(2)}){\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-2}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}
+N(λk(2)λ^k(2))𝐮kA(1)𝚵12T[(λ^k(2)𝐈𝐒22(2))1(λk(2)𝐈𝐒22(2))1](λk(2)𝐈𝐒22(2))1𝐒21(2)𝐮kA(2)\displaystyle+N(\lambda_{k}^{(2)}-\widehat{\lambda}_{k}^{(2)}){\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}[(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}-(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}](\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}
=\displaystyle= Nλk(2)(1λ^k(2)λk(2))𝐮kA(1)𝚵12T(𝐈1/λk(2)𝐒22(2))2𝐒21(2)𝐮kA(2)\displaystyle\dfrac{N}{\lambda_{k}^{(2)}}\left(1-\dfrac{\widehat{\lambda}_{k}^{(2)}}{\lambda_{k}^{(2)}}\right){\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}({\bf I}-1/\lambda_{k}^{(2)}\cdot{\bf S}_{22}^{(2)})^{-2}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}
+Nλ^k(2)(1λ^k(2)λk(2))2𝐮kA(1)𝚵12T(𝐈1/λ^k(2)𝐒22(2))1(𝐈1/λk(2)𝐒22(2))2𝐒21(2)𝐮kA(2)\displaystyle+\dfrac{N}{\widehat{\lambda}_{k}^{(2)}}\left(1-\dfrac{\widehat{\lambda}_{k}^{(2)}}{\lambda_{k}^{(2)}}\right)^{2}{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}({\bf I}-1/\widehat{\lambda}_{k}^{(2)}\cdot{\bf S}_{22}^{(2)})^{-1}({\bf I}-1/\lambda_{k}^{(2)}\cdot{\bf S}_{22}^{(2)})^{-2}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}
=:\displaystyle=: ε31+ε32.\displaystyle\varepsilon_{31}+\varepsilon_{32}.

Following the same proof strategy as for (50) and applying Theorem 1, we get \varepsilon_{31}=o_{p}(1). For \varepsilon_{32}, using Assumption (A.i), Theorem 1 and the fact that \|{\bf S}_{21}^{(2)}\|=O_{p}(\sqrt{N}), we get \varepsilon_{32}=o_{p}(1).

To sum up, we have

N𝐮kA(1)𝚵12T𝐮kB(2)=op(1).N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{12}{\bf u}_{kB}^{(2)}=o_{p}(1).

Using the same argument we get N𝐮kB(1)𝚵21T𝐮kA(2)=op(1).N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{21}{\bf u}_{kA}^{(2)}=o_{p}(1).

Finally, we show that N𝐮kB(1)𝚵22T𝐮kB(2)=op(1).N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{22}{\bf u}_{kB}^{(2)}=o_{p}(1). By (23), we have

N𝐮kB(1)𝚵22T𝐮kB(2)\displaystyle N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{22}{\bf u}_{kB}^{(2)}
=\displaystyle= N𝐮kA(1)𝐒12(1)T(λ^k(1)𝐈𝐒22(1))1𝚵22(λ^k(2)𝐈𝐒22(2))1𝐒21(2)𝐮kA(2)\displaystyle N{\bf u}_{kA}^{(1)}{}^{\mathrm{T}}{\bf S}_{12}^{(1)}(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}{\bf u}_{kA}^{(2)}
=\displaystyle= N𝐮kA(1)𝐮kA(2)𝐮~kA(1)𝐒12(1)T(λ^k(1)𝐈𝐒22(1))1𝚵22(λ^k(2)𝐈𝐒22(2))1𝐒21(2)𝐮~kA(2)\displaystyle N\|{\bf u}_{kA}^{(1)}\|\cdot\|{\bf u}_{kA}^{(2)}\|\cdot\widetilde{{\bf u}}_{kA}^{(1)}{}^{\mathrm{T}}{\bf S}_{12}^{(1)}(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf u}}_{kA}^{(2)}
=\displaystyle= N𝐮~kA(1)𝐒12(1)T(λ^k(1)𝐈𝐒22(1))1𝚵22(λ^k(2)𝐈𝐒22(2))1𝐒21(2)𝐮~kA(2)+op(1),\displaystyle N\widetilde{{\bf u}}_{kA}^{(1)}{}^{\mathrm{T}}{\bf S}_{12}^{(1)}(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf u}}_{kA}^{(2)}+o_{p}(1),

where the last step follows from Proposition 3. Note that by equation (40), we have

N(𝐮~kA(1)𝐞~kA)T𝐒12(1)(λ^k(1)𝐈𝐒22(1))1𝚵22(λ^k(2)𝐈𝐒22(2))1𝐒21(2)𝐮~kA(2)\displaystyle N(\widetilde{{\bf u}}_{kA}^{(1)}-\widetilde{{\bf e}}_{kA})^{\mathrm{T}}{\bf S}_{12}^{(1)}(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf u}}_{kA}^{(2)}
=\displaystyle= Op(Nλk(1)1λk(1)1λk(2)λk(2))=op(1).\displaystyle O_{p}\left(\sqrt{N}\cdot\sqrt{\lambda_{k}^{(1)}}\cdot\dfrac{1}{\lambda_{k}^{(1)}}\cdot\dfrac{1}{\lambda_{k}^{(2)}}\cdot\sqrt{\lambda_{k}^{(2)}}\right)=o_{p}(1).

Similarly,

N𝐞~kAT𝐒12(1)(λ^k(1)𝐈𝐒22(1))1𝚵22(λ^k(2)𝐈𝐒22(2))1𝐒21(2)(𝐮~kA(2)𝐞~kA)=op(1).N\ \widetilde{{\bf e}}_{kA}^{T}{\bf S}_{12}^{(1)}(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}(\widetilde{{\bf u}}_{kA}^{(2)}-\widetilde{{\bf e}}_{kA})=o_{p}(1).

Therefore,

N𝐮kB(1)𝚵22T𝐮kB(2)=N𝐞~kAT𝐒12(1)(λ^k(1)𝐈𝐒22(1))1𝚵22(λ^k(2)𝐈𝐒22(2))1𝐒21(2)𝐞~kA+op(1).N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{22}{\bf u}_{kB}^{(2)}=N\ \widetilde{{\bf e}}_{kA}^{\mathrm{T}}{\bf S}_{12}^{(1)}(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf e}}_{kA}+o_{p}(1).

Note further that by Theorem 1,

N𝐞~kAT𝐒12(1)[(λ^k(1)𝐈𝐒22(1))1(λk(1)𝐈𝐒22(1))1]𝚵22(λ^k(2)𝐈𝐒22(2))1𝐒21(2)𝐞~kA\displaystyle N\ \widetilde{{\bf e}}_{kA}^{\mathrm{T}}{\bf S}_{12}^{(1)}\left[(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}-(\lambda_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\right]\boldsymbol{\Xi}_{22}(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf e}}_{kA}
=\displaystyle= Nλ^k(1)λ^k(2)(1λ^k(1)λk(1))\displaystyle\dfrac{N}{\widehat{\lambda}_{k}^{(1)}\widehat{\lambda}_{k}^{(2)}}\left(1-\dfrac{\widehat{\lambda}_{k}^{(1)}}{\lambda_{k}^{(1)}}\right)
×𝐞~kAT𝐒12(1)(𝐈1/λ^k(1)𝐒22(1))1(𝐈1/λk(1)𝐒22(1))1𝚵22(𝐈1/λ^k(2)𝐒22(2))1𝐒21(2)𝐞~kA\displaystyle\times\ \widetilde{{\bf e}}_{kA}^{\mathrm{T}}{\bf S}_{12}^{(1)}({\bf I}-1/\widehat{\lambda}_{k}^{(1)}\cdot{\bf S}_{22}^{(1)})^{-1}({\bf I}-1/\lambda_{k}^{(1)}\cdot{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}({\bf I}-1/\widehat{\lambda}_{k}^{(2)}\cdot{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf e}}_{kA}
=\displaystyle= op(1).\displaystyle o_{p}(1).

Similarly,

N𝐞~kAT𝐒12(1)(λ^k(1)𝐈𝐒22(1))1𝚵22[(λ^k(2)𝐈𝐒22(2))1(λk(2)𝐈𝐒22(2))1]𝐒21(2)𝐞~kA=op(1).N\ \widetilde{{\bf e}}_{kA}^{\mathrm{T}}{\bf S}_{12}^{(1)}(\widehat{\lambda}_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}\left[(\widehat{\lambda}_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}-(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}\right]{\bf S}_{21}^{(2)}\widetilde{{\bf e}}_{kA}=o_{p}(1).

It follows that

N𝐮kB(1)𝚵22T𝐮kB(2)=N𝐞~kAT𝐒12(1)(λk(1)𝐈𝐒22(1))1𝚵22(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐞~kA+op(1).\displaystyle N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{22}{\bf u}_{kB}^{(2)}=N\widetilde{{\bf e}}_{kA}^{\mathrm{T}}{\bf S}_{12}^{(1)}(\lambda_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf e}}_{kA}+o_{p}(1).

Note that

N𝐞~kAT𝐒12(1)(λk(1)𝐈𝐒22(1))1𝚵22(λk(2)𝐈𝐒22(2))1𝐒21(2)𝐞~kA\displaystyle N\widetilde{{\bf e}}_{kA}^{\mathrm{T}}{\bf S}_{12}^{(1)}(\lambda_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf S}_{21}^{(2)}\widetilde{{\bf e}}_{kA}
=\displaystyle= NT1T2𝐞~kAT𝚲A(1)1/2𝐙A(1)𝐗B(1)T(λk(1)𝐈𝐒22(1))1𝚵22(λk(2)𝐈𝐒22(2))1𝐗B(2)𝐙A(2)T𝚲A(2)1/2𝐞~kA\displaystyle\dfrac{N}{T_{1}T_{2}}\widetilde{{\bf e}}_{kA}^{\mathrm{T}}{{\boldsymbol{\Lambda}}_{A}^{(1)}}^{1/2}{\bf Z}_{A}^{(1)}{{\bf X}_{B}^{(1)}}^{\mathrm{T}}(\lambda_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf X}_{B}^{(2)}{{\bf Z}_{A}^{(2)}}^{\mathrm{T}}{{\boldsymbol{\Lambda}}_{A}^{(2)}}^{1/2}\widetilde{{\bf e}}_{kA}
=\displaystyle= NT1T2λk(1)λk(2)𝐳(k)(1)T𝐗B(1)T(λk(1)𝐈𝐒22(1))1𝚵22(λk(2)𝐈𝐒22(2))1𝐗B(2)𝐳(k)(2).\displaystyle\dfrac{N}{T_{1}T_{2}}\sqrt{\lambda_{k}^{(1)}\lambda_{k}^{(2)}}\cdot{{\bf z}_{(k)}^{(1)}}^{\mathrm{T}}{{\bf X}_{B}^{(1)}}^{\mathrm{T}}(\lambda_{k}^{(1)}{\bf I}-{\bf S}_{22}^{(1)})^{-1}\boldsymbol{\Xi}_{22}(\lambda_{k}^{(2)}{\bf I}-{\bf S}_{22}^{(2)})^{-1}{\bf X}_{B}^{(2)}{\bf z}_{(k)}^{(2)}.

Using the independence among {\bf z}_{(k)}^{(1)},{\bf z}_{(k)}^{(2)},{\bf X}_{B}^{(1)} and {\bf X}_{B}^{(2)}, Assumption A, and the fact that \|{\bf S}_{22}^{(i)}\|=O_{p}(1) for i=1,2, one can show that the last term is o_{p}(1). It follows that

N𝐮kB(1)𝚵22T𝐮kB(2)=op(1),N{\bf u}_{kB}^{(1)}{}^{\mathrm{T}}\boldsymbol{\Xi}_{22}{\bf u}_{kB}^{(2)}=o_{p}(1),

which completes the proof of Theorem 6. ∎

Proof of Corollary 1.

If ({{\bf v}}_{1}^{(1)},\ldots,{{\bf v}}_{r}^{(1)})=({{\bf v}}_{1}^{(2)},\ldots,{{\bf v}}_{r}^{(2)}), then \boldsymbol{\Xi}_{11,-k}={\bf I}_{r-1}. Denote {\bf q}_{k}=\begin{pmatrix}{\bf q}_{kA}\\ {\bf q}_{kB}\end{pmatrix}, where {\bf q}_{kA}\sim N(0,\rho_{1}{\bf D}_{a_{k}}), {\bf q}_{kB}\sim N(0,\rho_{2}{\bf D}_{b_{k}}), and {\bf q}_{kA} and {\bf q}_{kB} are independent. Therefore, the limiting distribution becomes

\displaystyle{\bf q}_{k}^{\mathrm{T}}\begin{pmatrix}{\bf I}_{r-1}&-{\bf I}_{r-1}\\ -{\bf I}_{r-1}&{\bf I}_{r-1}\end{pmatrix}{\bf q}_{k} = {\bf q}_{kA}^{\mathrm{T}}{\bf q}_{kA}-2\,{\bf q}_{kA}^{\mathrm{T}}{\bf q}_{kB}+{\bf q}_{kB}^{\mathrm{T}}{\bf q}_{kB}
= \sum_{j\neq k,j=1}^{r}\left(q_{kA}[j]-q_{kB}[j]\right)^{2}
\stackrel{d}{=} \sum_{j\neq k,j=1}^{r}\left(\rho_{1}\omega_{kj}^{(1)}+\rho_{2}\omega_{kj}^{(2)}\right)\cdot Z_{j}^{2},

where ZjZ_{j} are i.i.d. standard normal random variables.
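Because this limit has no closed-form quantiles in general, a critical value for the test can be approximated by Monte Carlo. The sketch below uses hypothetical placeholder values for the ρ_i and the weights; in practice these would be replaced by their consistent estimates.

```python
import numpy as np

# Sketch: simulate sum_j (rho1*omega1[j] + rho2*omega2[j]) * Z_j^2 and read off
# an approximate 5% critical value; all parameter values here are hypothetical.
rng = np.random.default_rng(1)
rho1, rho2 = 0.5, 0.5
omega1 = np.array([1.00, 0.25])          # placeholder weights omega_kj^(1), j != k
omega2 = np.array([0.90, 0.30])          # placeholder weights omega_kj^(2), j != k
w = rho1 * omega1 + rho2 * omega2
draws = (rng.standard_normal((200_000, w.size)) ** 2) @ w
crit95 = np.quantile(draws, 0.95)        # reject H0 when the statistic exceeds this
```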