
The Quadratic Optimization Bias of Large Covariance Matrices

Hubeyb Gurdogan (hgurdogan@math.ucla.edu), Department of Mathematics, University of California, Los Angeles, CA. Alex Shkolnik (shkolnik@ucsb.edu), Department of Statistics and Applied Probability, University of California, Santa Barbara, CA.
Abstract

We describe a puzzle involving the interactions between an optimization of a multivariate quadratic function and a “plug-in” estimator of a spiked covariance matrix. When the largest eigenvalues (i.e., the spikes) diverge with the dimension, the gap between the true and the out-of-sample optima typically also diverges. We show how to “fine-tune” the plug-in estimator in a precise way to avoid this outcome. Central to our description is a “quadratic optimization bias” function, the roots of which determine this fine-tuning property. We derive an estimator of this root from a finite number of observations of a high dimensional vector. This leads to a new covariance estimator designed specifically for applications involving quadratic optimization. Our theoretical results have further implications for improving low dimensional representations of data, and principal component analysis in particular.

MSC subject classifications: 62H12, 62H25.

Keywords: Covariance estimation, optimization, sample eigenvector correction, dimension reduction, spectral methods, principal component analysis.

1 Introduction

Optimization with a “plug-in” model as an ingredient is routine practice in modern statistical problems in engineering and the sciences. Yet the interactions between the optimization procedure and the errors in an estimated model are often not well understood. Natural questions in this context include the following: “Does the optimizer amplify or reduce the statistical errors in the model? How does one leverage that information if it is known? Which components of the model should be estimated more precisely, and which can afford less accuracy?” We explore these questions for the optimization of a multivariate quadratic function that is specified in terms of a large covariance model. This setup is canonical for many problems that are encountered in the areas of finance, signal-noise processing, operations research and statistics.

Large covariance estimation occupies an important place in high-dimensional statistics and is fundamental to multivariate data analysis (e.g., Yao, Zheng and Bai (2015), Fan, Liao and Liu (2016) and Lam (2020)). A covariance model generalizes the classical setting of independence by introducing pairwise correlations. A parsimonious way to prescribe such correlations for many variables is through the use of a relatively small number of factors, which are high-dimensional vectors that govern all or most of the correlations in the observed data. This leads to a particular type of covariance matrix, a so-called "spiked model" in which a small number of (spiked) eigenvalues separate themselves with a larger magnitude from the remaining (bulk) spectrum (Wang and Fan, 2017). Imposing this factor structure may also be viewed as a form of regularization which replaces the problem of estimating $p^{2}$ unknown parameters of a $p\times p$ covariance matrix $\Sigma$ with the estimation of a few "structured" components of this matrix. Determining the components that require the most attention in a setting that entails optimization is a central motivation of our work.

1.1 Motivation

To motivate the study of the interplay between optimization and model estimation error, we consider a quadratic function in $p$ variables. Let,

\[
Q(x) = c_{0} + c_{1}\langle x,\zeta\rangle - \frac{1}{2}\langle x,\Sigma x\rangle \qquad (x\in\bbR^{p}) \tag{1}
\]

for an inner product $\langle\cdot,\cdot\rangle$, constants $c_{0},c_{1}\in\bbR$ and a vector $\zeta\in\bbR^{p}$. The $p\times p$ matrix $\Sigma$ is assumed to be symmetric and positive definite. The maximization of (1) is encountered in many classical contexts within statistics and probability including least-squares regression, maximum a posteriori estimation, saddle-point approximations, and Legendre-Fenchel transforms in moderate/large deviations theory. Some related and highly influential applications include Markowitz's portfolio construction in finance (Markowitz, 1952), Capon beamforming in signal processing (Capon, 1969) and optimal fingerprinting in climate science (Hegerl et al., 1996). In optimization theory, quadratic functions form a key ingredient for more general (black-box) minimization techniques such as trust-region methods (e.g., Maggiar et al. (2018)). (Footnote 1: In this setting the covariance matrix corresponds to an estimated Hessian matrix.) Since any number of linear equality constraints may be put into the unconstrained Lagrangian form in (1), our setting is more general than it first appears. Moreover, the maximization of (1) is the starting point for numerous applications of quadratic programming where nonlinear constraints are often added. (Footnote 2: To give one example, interpreting $\Sigma$ as a graph adjacency matrix and adding simple bound constraints to (1) leads to approximations of graph properties such as the maximum independent set (Hager and Hungerford, 2015). While $\Sigma$ is no longer interpreted as a spiked covariance matrix in a graph theory setting, its mathematical properties are similar owing to the celebrated Cheeger's inequality.)

The maximizer of $Q(\cdot)$ is given by $c_{1}\Sigma^{-1}\zeta$, which attains the objective value

\[
\max_{x\in\bbR^{p}} Q(x) = c_{0} + \frac{c_{1}^{2}\mu^{2}_{p}}{2} \qquad \big(\mu^{2}_{p} = \langle\zeta,\Sigma^{-1}\zeta\rangle\big), \tag{2}
\]

but in practice, the maximization of $Q(\cdot)$ is performed with an estimate $\hat\Sigma$ replacing the unknown $\Sigma$. This "plug-in" step is well known to yield a perplexing problem (see Section 1.3). In essence, the optimizer chases the errors in $\hat\Sigma$ to produce a systematic bias in the computed maximum. This bias is then amplified as the dimension grows.

Consider a high-dimensional limit $p\uparrow\infty$ and a sequence of symmetric positive definite $\Sigma=\Sigma_{p\times p}$ with a fixed number $q$ of spiked eigenvalues diverging in $p$ and all remaining eigenvalues bounded in $(0,\infty)$. Let $\hat{x}$ be the maximizer of $\hat{Q}(\cdot)$, defined by replacing $\Sigma$ in (1) by estimates $\hat\Sigma=\hat\Sigma_{p\times p}$ with the same eigenvalue properties. The estimated objective is $\hat{Q}(\hat{x})$, but a more relevant quantity is the realized objective,

\[
Q(\hat{x}) = c_{0} + c_{1}\langle\hat{x},\zeta\rangle - \frac{1}{2}\langle\hat{x},\Sigma\hat{x}\rangle = c_{0} + \frac{c_{1}^{2}\hat{\mu}^{2}_{p}}{2}\,\hat{D}_{p} \tag{3}
\]

where $\hat{\mu}^{2}_{p}=\langle\zeta,\hat\Sigma^{-1}\zeta\rangle$ and $\hat{D}_{p}$ is a discrepancy (relative to (2)) that can grow rapidly as the dimension increases. Precluding edge cases where $\hat{\mu}^{2}_{p}/\langle\zeta,\zeta\rangle$ or $\langle\zeta,\zeta\rangle$ vanish, unless $\hat\Sigma$ is fine-tuned in a calculated way, the following puzzling behavior ensues.

  1.

    The discrepancy $\hat{D}_{p}$ tends to $-\infty$ as $p\uparrow\infty$ and consequently, the realized maximum $Q(\hat{x})$ tends to $-\infty$ while the true maximum (2) tends to $+\infty$.

The asymptotic behavior above is fully determined by a certain $\bbR^{q}$-valued function $\scrE_{p}(\cdot)$ which we derive and call the quadratic optimization bias. The way in which this bias depends on the entries of $\hat\Sigma$ characterizes the sought-after interplay between the optimizer and the error in the estimated covariance model. Mitigating the discrepancy between the realized and true quadratic optima (2) and (3) reduces to the problem of approximating the roots of $\scrE_{p}(\cdot)$. We remark that by parametrizing the constants $c_{0}$ and $c_{1}$ in $p$, one can arrive at alternative limits for (2) and (3), but practical scalings preserve the large disparity between the true and realized objective values. We examine (in Section 2.2) the scaling $c_{1}=1/p$ in particular, due to its applicability to portfolio theory, robust beamforming and optimal fingerprinting.
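For concreteness, the following numpy sketch simulates the puzzle under purely illustrative assumptions (a single spike $q=1$, $\Gamma=I$, $n=60$ observations, and a small ridge added to the sample covariance so that it is invertible); none of these choices come from the later sections, and the printed values typically fall far below the optimal value of $1$ as $p$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def realized_discrepancy(p, n=60, c1=1.0):
    # single spike (q = 1): Sigma = b b^T + I, so the top eigenvalue grows like p
    b = np.ones(p)
    zeta = rng.standard_normal(p)

    # simulate Y = b x^T + E and use a ridged sample covariance as the plug-in
    x = rng.standard_normal(n)
    E = rng.standard_normal((p, n))
    Y = np.outer(b, x) + E
    Sigma_hat = Y @ Y.T / n + 1e-2 * np.eye(p)

    x_hat = c1 * np.linalg.solve(Sigma_hat, zeta)       # plug-in maximizer of Q-hat
    mu2_hat = zeta @ np.linalg.solve(Sigma_hat, zeta)   # estimated mu_p^2
    quad = (b @ x_hat) ** 2 + x_hat @ x_hat             # <x_hat, Sigma x_hat> for this Sigma
    return 2.0 - quad / (c1 ** 2 * mu2_hat)             # the discrepancy D_hat of (3)

for p in (250, 1000, 2000):
    print(p, realized_discrepancy(p))
```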

1.2 Summary of results & organization

The illustration above reflects that, in statistical settings, solutions to estimated quadratic optimization problems exhibit very poor properties out-of-sample. Section 2 answers the question of which components of $\Sigma$ must be estimated accurately to reduce the discrepancy $\hat{D}_{p}$ in (3). The size of $|\hat{D}_{p}|$ is amplified by the growth rate $r_{p}$ of the $q$ spiked eigenvalues, but is fully determined by the precision of the estimate $\scrH$ of the associated $p\times q$ matrix of eigenvectors $\scrB$ of $\Sigma$. In particular, $\hat{D}_{p}=-|\scrE_{p}(\scrH)|^{2}\,O(r_{p})$ where $\scrE_{p}(\scrH)$ is given by,

\[
\scrE_{p}(\scrH) = \frac{\scrB^{\top}z - (\scrB^{\top}\scrH)(\scrH^{\top}z)}{\sqrt{1-|\scrH^{\top}z|^{2}}} \qquad \Big(z=\frac{\zeta}{|\zeta|}\Big), \tag{4}
\]

for the Euclidean length $|\cdot|$. Theorem 1 gives sharp asymptotics for $\hat{D}_{p}$ in terms of $\scrE_{p}(\scrH)$ and the other estimates/parameters. Remarkably, the accuracy of the estimates of the eigenvalues of $\Sigma$ is secondary relative to ensuring that $\scrH$ is such that $\scrE_{p}(\scrH)$ is small for large $p$. This is noteworthy in view of the large literature on bias correction of sample eigenvalues (or "eigenvalue shrinkage": see Ollila, Palomar and Pascal (2020), Ledoit and Wolf (2021), Ledoit and Wolf (2022) and Donoho, Gavish and Romanov (2023) for a sampling of recent work). Instead, for the discrepancy $\hat{D}_{p}$, the estimation of the eigenvectors of the spikes is what matters most. We remark that while $\scrH=\scrB$ forms a root of the map $\scrE_{p}\colon\bbR^{p\times q}\to\bbR^{q}$ (i.e., $\scrE_{p}(\scrB)=0_{q}$), it is not the only root. We refer to $\scrE_{p}(\cdot)$ as the quadratic optimization bias (function), which was first identified in Goldberg, Papanicolaou and Shkolnik (2022) in the context of portfolio theory and for the special covariance $\Sigma$ with a single spike ($q=1$) and identical remaining eigenvalues. (Footnote 3: We state a more general definition in Section 2, but $\scrH$ must have orthonormal columns in (4).)
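For reference, a minimal numpy helper that evaluates (4) (a sketch only; `B_eigvecs` and `H` are assumed to be $p\times q$ matrices with orthonormal columns and `zeta` a $p$-vector, with names of our choosing):

```python
import numpy as np

def quad_opt_bias(B_eigvecs, H, zeta):
    """Evaluate the quadratic optimization bias (4) at an orthonormal H.

    B_eigvecs : (p, q) orthonormal eigenvectors of B B^T (the matrix scrB)
    H         : (p, q) orthonormal candidate eigenvectors (the matrix scrH)
    zeta      : (p,)   the vector defining the linear term of Q
    """
    z = zeta / np.linalg.norm(zeta)
    Hz = H.T @ z                                     # scrH^T z
    num = B_eigvecs.T @ z - (B_eigvecs.T @ H) @ Hz   # scrB^T z - (scrB^T scrH)(scrH^T z)
    return num / np.sqrt(1.0 - Hz @ Hz)              # divide by sqrt(1 - |scrH^T z|^2)
```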

Section 3 considers a $p\times p$ sample covariance matrix $S$ and its spectral decomposition $S=\scrH\calS_{p}^{2}\scrH^{\top}+G$, for a diagonal $q\times q$ matrix of eigenvalues $\calS_{p}^{2}$, the associated $p\times q$ matrix $\scrH$ of eigenvectors ($\scrH^{\top}\scrH=I_{q}$) and a residual $G$. It is assumed that $r_{p}$ is $O(p)$ and that the sequence $S=S_{p\times p}$ is based on a fixed number of observations of a high dimensional vector. Our Theorem 3 proves that $\scrE_{p}(\scrH)$ is almost surely bounded away from zero (in $\bbR^{q}$) eventually in $p$. This has material implications for the use of principal component analysis for problems motivated by Section 1.1.

Section 5 develops the following correction to the sample eigenvectors $\scrH$. For the $q\times q$ diagonal matrix $\Psi$ satisfying $\Psi^{2}=I_{q}-\mathrm{tr}(G)\,\calS_{p}^{-2}/n_{q}$ for $n_{q}\geq 1$, the difference between the number of nonzero sample eigenvalues and $q$, we compute

\[
\scrH\Psi + \frac{z-\scrH\scrH^{\top}z}{1-|\scrH^{\top}z|^{2}}\,z^{\top}\scrH\,(\Psi^{-1}-\Psi). \tag{5}
\]

Theorem 9 proves the $p\times q$ matrix of left singular vectors of (5), denoted $\scrH_{\sharp}$, has

\[
\scrE_{p}(\scrH_{\sharp}) \to 0_{q} \qquad (p\uparrow\infty), \tag{6}
\]

almost surely. The matrix $\scrH_{\sharp}$ constitutes a set of corrected principal component loadings and is the basis of our covariance estimator $\hat\Sigma_{\sharp}$. This matrix, owing to (6), yields an improved plug-in estimator $x_{\sharp}=c_{1}\hat\Sigma_{\sharp}^{-1}\zeta$ for the maximizer of (1). Thus, our work also has implications for the estimation of the precision matrix $\Sigma^{-1}$. Theorem 9 also proves that the columns of $\scrH_{\sharp}$ have a larger projection (than $\scrH$) onto the column space of $\scrB$. Recent literature has remarked on the difficulty (or even impossibility) of correcting such bias in eigenvectors (e.g., Ledoit and Wolf (2017), Wang and Fan (2017) and Jung (2022)). That projection is strictly better when $z$ in (4) has $|\scrB^{\top}z|$ bounded away from zero, i.e., captures information about that subspace. But (6) holds regardless, highlighting that the choice of the "loss" function (in our case (3)) matters. (Footnote 4: See also Donoho, Gavish and Johnstone (2018) for another illustration of this phenomenon.)
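A sketch of the correction (5) in numpy (the input names are ours: the orthonormal sample eigenvectors `H`, the diagonal of $\Psi$, and `zeta`; the left singular vectors are extracted with a thin SVD):

```python
import numpy as np

def corrected_eigenvectors(H, psi_diag, zeta):
    """Left singular vectors of the correction (5) applied to sample eigenvectors H.

    H        : (p, q) orthonormal sample eigenvectors (scrH)
    psi_diag : (q,)   diagonal of the signal-to-noise matrix Psi, entries in (0, 1)
    zeta     : (p,)   the vector zeta from the quadratic objective
    """
    z = zeta / np.linalg.norm(zeta)
    Hz = H.T @ z
    residual = (z - H @ Hz) / (1.0 - Hz @ Hz)          # (z - scrH scrH^T z) / (1 - |scrH^T z|^2)
    correction = np.outer(residual, Hz * (1.0 / psi_diag - psi_diag))
    M = H * psi_diag + correction                      # the p x q matrix in (5)
    U, _, _ = np.linalg.svd(M, full_matrices=False)    # thin SVD
    return U                                           # scrH_sharp
```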

In Section 4, we prove an impossibility theorem (Theorem 8) which shows that, without very strong assumptions, one cannot obtain an estimator of $\scrB^{\top}\scrH$ asymptotically in the dimension when $q>1$. This has negative implications for obtaining estimates of $\scrE_{p}(\scrH)$ in (4), where $\scrB^{\top}\scrH$ is one of the unknowns. The latter contains all $q^{2}$ inner products between the sample and population eigenvectors, and its estimation from the observed data is an interesting theoretical problem in its own right. Our negative result adds to the literature on high dimension and low sample size (hdlss) asymptotics, as inspired by Hall, Marron and Neeman (2005) and Ahn et al. (2007). (Footnote 5: Aoshima et al. (2018) survey much of the literature since.) We remark that the hdlss regime is highly relevant for real-world data as a small sample size is often imposed by experimental constraints, or by the lack of long-range stationarity of time series. The content of Theorem 8 also points to a key feature that distinguishes our work from Goldberg, Papanicolaou and Shkolnik (2022), who fix $q=1$. Another aspect making our setting substantially more challenging is that we find roots of a multivariate function $\scrE_{p}(\cdot)$ (which is univariate when $q=1$).

Figure 1: Discrepancy $\hat{D}_{p}$ (with two standard deviation error bars) for two covariance models estimated from the simulated data sets of Section 6. The first (solid line) is based on (5) and the resulting corrected eigenvectors $\scrH_{\sharp}$. The second (dashed line) is based on the raw sample eigenvectors $\scrH$ (pca). The optimal $\hat{D}_{p}$ equals 1.

In terms of applications, our results generalize those of Goldberg, Papanicolaou and Shkolnik (2022) to covariance models that hold wide acceptance in the empirical literature on financial asset returns (i.e., the Arbitrage Pricing Theory of Ross (1976), Huberman (1982), Chamberlain and Rothschild (1983) and others). Section 6 investigates the problem of minimum variance investing with numerical simulation, and demonstrates that the estimator $\scrH_{\sharp}$ results in vanishing asymptotic portfolio risk and a bounded discrepancy $\hat{D}_{p}$ (see Figure 1). Appendix E summarizes other applications including signal-noise processing and climate science as related to Section 1.1.

1.3 Limitations & related literature

Our findings in Section 1.1 form a starting point for important extensions and applications. Extending the estimator in (5) to general quadratic programming with inequality constraints would greatly expand its scope. In terms of covariance models, we require spikes that diverge linearly with the dimension, which excludes several alternative frameworks in the literature. (Footnote 6: This includes the Johnstone spike model, in which all eigenvalues remain bounded as the dimension grows, and its extensions (e.g., Johnstone (2001), Paul (2007), Johnstone and Lu (2009) and Bai and Yao (2012)). Further generalizations include slowly growing spiked eigenvalue models as in De Mol, Giannone and Reichlin (2008), Onatski (2012), Shen et al. (2016) and Bai and Ng (2023).) Likewise, the asymptotics of the data matrix aspect ratio $p/n$ differ across applications. We also do not address the important setting in which the number of spikes $q$ is misspecified. (Footnote 7: There is a large literature on the estimation of the number of spikes/factors/principal components. Most relevant to our setup (high dimension and low sample size) is Jung, Lee and Ahn (2018).) Finally, the established convergence in (6) leaves the question of rates unanswered. This is particularly important for problems requiring the discrepancy $\hat{D}_{p}$ to not grow too quickly. We offer no theoretical treatment of convergence rates but our numerical results suggest this quantity remains bounded as $p$ grows (c.f., Figure 1).

The work we build on directly was initiated in Goldberg, Papanicolaou and Shkolnik (2022). We refer to their proposal as the GPS estimator and derive it in Section 5.1. Important extensions are developed in Gurdogan and Kercheval (2022) and Goldberg, Gurdogan and Kercheval (2023). The GPS estimator was shown to be mathematically equivalent to a James-Stein estimation of the leading eigenvector of a sample covariance matrix in Shkolnik (2022). These results share much in common with the ideas found in Casella and Hwang (1982). For a survey of the above literature, focusing on connections to the James-Stein estimator, see Goldberg and Kercheval (2023). The GPS estimator is explained in terms of regularization in Lee and Shkolnik (2024a), and Lee and Shkolnik (2024b) derive central limit theorems for this estimator as relevant for the convergence of the discrepancy $\hat{D}_{p}$. Some numerical exploration of the case of more than one spike is found in Goldberg et al. (2020).

The spiked covariance models we consider, and the application of pca for their estimation, are rooted in the literature on approximate factor models and "asymptotic principal components" originating with Chamberlain and Rothschild (1983) and Connor and Korajczyk (1986). Recent work in this direction is well represented by Bai and Ng (2008), Fan, Liao and Mincheva (2013), Bai and Ng (2023) and Fan, Masini and Medeiros (2023). In this literature, the work that most closely resembles ours, by focusing on improved estimation of sample eigenvectors, is Fan, Liao and Wang (2016), Fan and Zhong (2018) and Lettau and Pelger (2020). Fan, Liao and Wang (2016) project the data onto a space generated by some externally observed covariates, improving the resulting sample eigenvectors when the covariates have sufficient explanatory power. Fan and Zhong (2018) apply a linear transformation to the sample eigenvectors in an approach that is most closely related to formula (5). We also apply a linear transformation, but the eigenspace is first augmented by the vector $\zeta$ in (1). (Footnote 8: We remark that with a single spike/factor (i.e., $q=1$), a linear transformation of the eigenvector(s) adjusts only the eigenvalue, not the eigenvector itself, due to its unit length normalization.) Further differences with Fan, Liao and Wang (2016) arise in the estimation of the optimal linear transformation. Lettau and Pelger (2020) extract principal components from a rank-one updated sample covariance matrix. This update is based on insight from asset pricing theory and it is unclear how the resulting sample eigenvectors are related to formula (5). The same applies to the very closely related literature on sample covariance matrix shrinkage (e.g., Ledoit and Wolf (2004a), Fisher and Sun (2011), Lancewicki and Aladjem (2014) and Wang and Zhang (2024)). (Footnote 9: This takes the form $\hat\Sigma = a\,S + (1-a)F$ for some $a\in[0,1]$ and matrix $F$. Targets $F\neq I$ adjust eigenvectors but in ways that may be difficult to quantify via closed-form expressions (c.f. (5)).)

The vast majority of the literature on approximate factor models and covariance estimation assumes the data matrix aspect ratio $p/n$ tends to a finite constant asymptotically. (Footnote 10: This may be due to the outsized influence of random matrix theory (e.g., Marchenko and Pastur (1967)). Another reason may be the consistency of the sample eigenvectors that can be achieved in this regime (see Yata and Aoshima (2009), Shen, Shen and Marron (2016) and Wang and Fan (2017)).) In contrast, our analysis of a finite sample in the high dimensional limit draws on the work on pca in Jung and Marron (2009), Jung, Sen and Marron (2012), Shen et al. (2016) and others. In the latter, the hdlss asymptotics for the matrix $\scrB^{\top}\scrH$, appearing in (4), have already been worked out (but see Section 4 for our impossibility theorem). Our main focus is on correcting the biases that the asymptotics of $\scrB^{\top}\scrH$ reveal. For approaches to correcting the finite sample bias in eigenvalues and principal component scores, see Yata and Aoshima (2012), Yata and Aoshima (2013), Jung (2022) and our Remark 7. Shen, Shen and Marron (2013) apply regularization in the presence of sparsity in the population eigenvectors to correct finite sample bias in the principal components. It is unclear how their estimators are related to the update in (5), but we do not impose such sparsity assumptions.

Several other strands of the pca literature are relevant as their aims coincide with improved sample eigenvector estimation. In one direction is the literature on sparse and low-rank matrix decompositions (e.g., Chandrasekaran, Parrilo and Willsky (2012), Saunderson et al. (2012), Bai and Ng (2019), Farnè and Montanari (2024) and Li and Shkolnik (2024)). These convex relaxations aim to find more accurate low-dimensional representations of the data and are sometimes referred to as forms of robust pca (Candès et al., 2011). In a related direction is the recent work on robust pca for heteroskedastic noise (e.g., Cai et al. (2021), Zhang, Cai and Wu (2022), Yan, Chen and Fan (2021), Agterberg, Lubberts and Priebe (2022) and Zhou and Chen (2023)). These efforts provide (finite $p,n$) bounds on generalized angles between the true and the estimated subspaces and complement our asymptotic pca results in Sections 3 & 5. Perturbations of eigenvectors have also been recently revisited in Fan, Wang and Zhong (2018), Abbe, Fan and Wang (2022) and Li et al. (2022). The latter use these bounds to construct estimators that "de-bias" linear forms such as $\scrB^{\top}z$ appearing in (4). These results can likely supply alternative proofs to ours (or even convergence rates), but our focus is on limit theorems only.

Lastly, we emphasize the area of mean-variance portfolio optimization. As the literature on this topic is quite vast, we mention only a few strands related to Section 1.1. Examples of early influential work in this direction include Michaud (1989) and Best and Grauer (1991). For numerical simulations that illustrate the impact on practically motivated models and metrics see Bianchi, Goldberg and Rosenberg (2017). A random matrix theory perspective on the behavior of objectives related to (3) may be found in Pafka and Kondor (2003), Bai, Liu and Wong (2009), El Karoui (2010), El Karoui (2013), Bun, Bouchaud and Potters (2017) and Bodnar, Okhrin and Parolya (2022). Highly relevant recent work in econometrics using latent factor models includes Ding, Li and Zheng (2021) who consider a portfolio risk measure closely tied to (3). Bayesian approaches to mean-variance optimization include Lai et al. (2011) and Bauder et al. (2021). These estimators are closely related to Ledoit-Wolf shrinkage (Ledoit and Wolf (2003) and Ledoit and Wolf (2004b)) which itself has undergone numerous improvements (e.g., Ledoit and Wolf (2018) and Ledoit and Wolf (2020a)). In tandem, shrinkage methods have been known to impart effects akin to extra constraints in the portfolio optimization as early as Jagannathan and Ma (2003). An insightful example of such robust portfolio optimization that relates (3) to the convergence of the covariance matrix estimator is developed in Fan, Zhang and Yu (2012). More advanced robust portfolio optimizations have also been proposed (e.g., Boyd et al. (2024)). Alternatively, constraints are often applied in the covariance matrix estimation process as an optimization in itself. For example, Won et al. (2013) apply a condition number constraint that leads to non-linear adjustments of sample eigenvalues (c.f., Ledoit and Wolf (2020b)), but leaves the sample eigenvectors unchanged. Bongiorno and Challet (2023) document the difficulty of relying solely on eigenvalue correction, especially for small sample sizes. Cai et al. (2020) apply sparsity constraints (on the precision matrix) and analyze optimality properties related to (3). We emphasize that the impact of such constraints on eigenvectors is difficult (or impossible) to quantify, in contrast to formula (5). (Footnote 11: It should be noted that another interesting approach to mean-variance portfolio optimization concerns the direct shrinkage of the portfolio weights (i.e., akin to shrinking $\hat{x}$ in (3)); e.g., Bodnar, Parolya and Schmid (2018), Bodnar, Okhrin and Parolya (2022) and Bodnar, Parolya and Thorsén (2023).)

1.4 Notation

Let $\textsc{col}(A)$ denote the column span of the matrix $A$ and let $\zeta_{A}$ denote the orthogonal projection of the vector $\zeta$ on $\textsc{col}(A)$, e.g.,

\[
\zeta_{A} = AA^{\dagger}\zeta \tag{7}
\]

where $A^{\dagger}=(A^{\top}A)^{-1}A^{\top}$, the Moore-Penrose inverse of a full column rank matrix $A$. We use $I$ to denote an identity matrix and $I_{q}$ when highlighting its dimensions, $q\times q$.

Write $\langle u,v\rangle$ for the scalar product of $u,v\in\bbR^{m}$, let $|u|=\sqrt{\langle u,u\rangle}$ and let $|A|$ be the induced (spectral) norm of a matrix $A$. We denote by $\nu_{m\times q}(\cdot)$ a function that, given an $m\times\ell$ matrix $A$, uniquely selects (see Appendix D) an enumeration of its singular values $|A|=\Lambda_{11}\geq\cdots\geq\Lambda_{\min(\ell,m)}$ and outputs an $m\times q$ matrix $\nu_{m\times q}(A)$ of left singular vectors with the values $\Lambda_{11},\dots,\Lambda_{qq}$ in columns $1,\dots,q\leq\min(m,\ell)$. That is,

\[
\nu_{m\times q}(A) = \nu_{m\times q}(AA^{\top}) = \scrL \;:\; \scrL^{\top}A = \Lambda\scrR^{\top}, \quad \scrL^{\top}\scrL = I_{q} = \scrR^{\top}\scrR, \tag{8}
\]

where $\Lambda$ is the $q\times q$ diagonal with entries $\Lambda_{11},\dots,\Lambda_{qq}$, and $\scrR$ is the $\ell\times q$ matrix of right singular vectors of $A$. The $m\times q$ matrix $\nu_{m\times q}(A)$ also corresponds to some unique choice of eigenvectors of $AA^{\top}$ with the $q$ largest eigenvalues $\Lambda_{11}^{2}\geq\cdots\geq\Lambda^{2}_{qq}$.
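A minimal numpy sketch of this selection (it relies on numpy's ordering of singular values and does not reproduce the tie-breaking rule of Appendix D):

```python
import numpy as np

def nu(A, q):
    """Return the m x q matrix of left singular vectors of A associated with its
    q largest singular values (a sketch of nu_{m x q}; ties are not handled as in
    Appendix D)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # singular values in descending order
    return U[:, :q]
```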

We take $0_{q}=(0,\dots,0)\in\bbR^{q}$ and $1_{q}=(1,\dots,1)\in\bbR^{q}$. Lastly, $\varliminf_{p\uparrow\infty}$ and $\varlimsup_{p\uparrow\infty}$ denote the limit inferior and superior as $p\uparrow\infty$, and $A=A_{p\times m}$, a sequence of matrices with dimensions $p\times m$ when at least one of $p$ or $m$ grows to infinity.

2 Quadratic Optimization Bias

We begin with a $p\times p$ covariance matrix $\Sigma$ which has the decomposition,

\[
\Sigma = BB^{\top} + \Gamma \tag{9}
\]

for a $p\times q$ full rank matrix $B$ and some $p\times p$ invertible, symmetric matrix $\Gamma$.

The covariance decomposition in (9) is often associated with assuming a factor model (e.g., see Fan, Fan and Lv (2008)). In the context of large covariance matrix estimation, the following approximate factor model framework is by now standard. (Footnote 12: These conditions originate with Chamberlain and Rothschild (1983) and Assumption 1 closely mirrors theirs as well as those of later work such as Fan, Fan and Lv (2008) and Fan, Liao and Mincheva (2013).)

Assumption 1.

The matrices $B=B_{p\times q}$ and $\Gamma=\Gamma_{p\times p}$ satisfy the following.

  1. (a)

    $0 < \varliminf_{p\uparrow\infty}\inf_{|v|=1}\langle v,\Gamma v\rangle < \varlimsup_{p\uparrow\infty}\sup_{|v|=1}\langle v,\Gamma v\rangle < \infty$.

  2. (b)

    $\lim_{p\uparrow\infty}(B^{\top}B)/p$ exists as an invertible $q\times q$ matrix with fixed $q\geq 1$.

In the literature on factor analysis, the entries of a column of $B$ are called loadings, or exposures to a risk factor corresponding to that column. Condition (b) of Assumption 1 states that all $q$ risk factors are persistent as the dimension grows and implies that the $q$ largest eigenvalues of $\Sigma=\Sigma_{p\times p}$ grow linearly in $p$. Condition (a) states that all remaining variance (or risk) vanishes in the high dimensional limit and the bulk ($p-q$ smallest) eigenvalues of $\Sigma_{p\times p}$ are bounded in $(0,\infty)$ eventually. The $\Gamma$ matrix is associated with covariances of idiosyncratic errors, but can have alternative interpretations (e.g., covariance of the specific return of financial assets). Assumption 1 implies $\scrB^{\top}\nu_{p\times q}(\Sigma)\to I_{q}$ as $p\uparrow\infty$ for the $p\times q$ sequence $\scrB=\nu_{p\times q}(B)$ of eigenvectors of $BB^{\top}$ with nonzero eigenvalues. The latter implication motivates the frequent reference to $\scrB=\scrB_{p\times q}$ as the asymptotic principal components of $\Sigma_{p\times p}$.

In practice, $\Sigma$ is unknown and an estimated model $\hat\Sigma$ is used instead. Let,

\[
\hat\Sigma = HH^{\top} + \hat\gamma^{2}I, \tag{10}
\]

for a full rank $p\times q$ matrix $H$ and a number $\hat\gamma>0$. We assume $q$ is known and allow for $\hat\gamma$ to depend on $p$ provided this sequence is bounded in $(0,\infty)$. We do not pursue alternative (to $\hat\gamma^{2}I$) estimates of $\Gamma$ because, as pointed out below, accurate estimation of the matrix $\Gamma$ is of secondary concern relative to the accuracy of the estimate $H$.

For $\zeta\in\bbR^{p}$, the eigenvectors $\nu_{p\times q}(B)=\scrB$ and $z_{H}=HH^{\dagger}z$ per (7), define

\[
\scrE_{p}(H) = \frac{\scrB^{\top}(z-z_{H})}{|z-z_{H}|} \qquad \Big(z=\frac{\zeta}{|\zeta|}\Big) \tag{11}
\]

assuming $|z-z_{H}|\neq 0$. We note $|\scrE_{p}(H)|\leq 1$ and that (11) is a precursor to the quadratic optimization bias function $\scrE_{p}(\cdot)$ in (4), but the $H$ in (11) need not have orthonormal columns (fn. 3). These two definitions are equated in Section 3.

All results in this section continue to hold with (11) redefined with any $\scrB$ such that $BB^{\top}=\scrB\Lambda_{p}^{2}\scrB^{\top}$ with $|\scrB|$ bounded in $p$ and a diagonal $q\times q$ matrix $\Lambda^{2}_{p}$, not necessarily the eigenvalues. This alternative may be useful for some applications.

2.1 Discrepancy of quadratic optima in high dimensions

Returning to the optimization setting of Section 1.1, for constants $c_{0},c_{1}\in\bbR$ and $\zeta\in\bbR^{p}$, we consider

\[
\hat{Q}(x) = c_{0} + c_{1}\langle x,\zeta\rangle - \frac{1}{2}\langle x,\hat\Sigma x\rangle \tag{12}
\]

which attains $\max_{x\in\bbR^{p}}\hat{Q}(x)=\hat{Q}(\hat{x})=c_{0}+\frac{c_{1}^{2}\hat{\mu}_{p}^{2}}{2}$ at the maximizer $\hat{x}\in\bbR^{p}$, analogously to (2) but with $\hat{\mu}_{p}^{2}=\langle\zeta,\hat\Sigma^{-1}\zeta\rangle$. Because $\hat{Q}(\cdot)$ is not the true objective function $Q(\cdot)$ in (1), we are interested in the realized objective $Q(\hat{x})$. Now,

\[
Q(\hat{x}) = c_{0} + \frac{c_{1}^{2}\hat{\mu}^{2}_{p}}{2}\Big(2 - \frac{\langle\hat{x},\Sigma\hat{x}\rangle}{c_{1}^{2}\hat{\mu}_{p}^{2}}\Big) = c_{0} + \frac{c_{1}^{2}\hat{\mu}^{2}_{p}}{2}\,\hat{D}_{p}, \tag{13}
\]

which identifies the discrepancy $\hat{D}_{p}$ in (3) relative to both $\hat{Q}(\hat{x})$ and (2).

To avoid division by zero in (11), we prevent $\zeta\in\bbR^{p}$ from vanishing and from residing entirely in $\textsc{col}(H)$ asymptotically (i.e., $|z-z_{H}|^{2}=1-|\zeta_{H}|^{2}/|\zeta|^{2}$). (Footnote 13: This edge case must be treated separately from our analysis and we do not pursue it.) The entries of $\zeta$ may be viewed as the first $p$ entries of an infinite sequence or as rows of a triangular array. We further assume the estimate $H$ has properties consistent with $B$ in view of Assumption 1(b).

Assumption 2.

Suppose $H=H_{p\times q}$ and $\zeta=\zeta_{p\times 1}$ satisfy $\varlimsup_{p\uparrow\infty}|\zeta_{H}|/|\zeta|<1$ and $\varliminf_{p\uparrow\infty}|\zeta|\neq 0$. Also, $\lim_{p\uparrow\infty}(H^{\top}H)/p$ exists as a $q\times q$ invertible matrix.

We address the asymptotics of the discrepancy $\hat{D}_{p}=2-\langle\hat{x},\Sigma\hat{x}\rangle/(c_{1}\hat{\mu}_{p})^{2}$ in (13), letting $BB^{\top}=\scrB\Lambda_{p}^{2}\scrB^{\top}$ as above, with $\scrB=\nu_{p\times q}(B)$ the canonical choice.

Theorem 1.

Suppose Assumptions 1 and 2 hold. Then, for $u_{H}=\frac{z-z_{H}}{|z-z_{H}|}$,

\[
\hat{D}_{p} = -\frac{|\Lambda_{p}\,\scrE_{p}(H)|^{2}}{\hat\gamma^{2}} + \Big(2 - \frac{\langle u_{H},\Gamma u_{H}\rangle}{\hat\gamma^{2}}\Big) + O\Big(|\scrE_{p}(H)| + |\scrE_{p}(H)|^{2} + \frac{1}{p}\Big).
\]
Remark 3.

The proof (see Appendix A) has a more general statement by relaxing the rate of growth of the eigenvalues of $\Lambda^{2}_{p}$ to a sequence $r=r_{p}$ (rather than $p$). That is, we only assume the limits of $B^{\top}B/r_{p}$ and $H^{\top}H/r_{p}$ are invertible matrices. In this case, $O(1/p)$ is replaced by $O(1/r_{p})$ above. This shows $|\hat{D}_{p}|$ is in $O(r_{p}\,|\scrE_{p}(H)|^{2})$.

Theorem 1 reveals that $\hat{D}_{p}$ diverges to $-\infty$ unless we find roots of $\scrE_{p}(\cdot)$, perhaps asymptotically. Note that $\scrE_{p}(B)=0$, but other roots exist (see Section 5).

Lemma 2.

For any full rank $p\times q$ matrix $H$ with $|\zeta_{H}|<|\zeta|$ and any $q\times q$ invertible matrix $K$, we have $\scrE_{p}(H)=\scrE_{p}(HK)$.

Proof.

This follows by a direct verification using the definition in (7).

\begin{align*}
\zeta_{HK} &= (HK)(HK)^{\dagger}\zeta = (HK)\big((HK)^{\top}(HK)\big)^{-1}(HK)^{\top}\zeta \\
&= (HK)K^{-1}(H^{\top}H)^{-1}K^{-\top}(HK)^{\top}\zeta \\
&= H(H^{\top}H)^{-1}H^{\top}\zeta = HH^{\dagger}\zeta = \zeta_{H}
\end{align*}

and with the definition of $\scrE_{p}(\cdot)$ in (11) we obtain the desired result. ∎

Lemma 2 pinpoints what constitutes a poor "plug-in" covariance estimator $\hat\Sigma$. For example, the column lengths of $H$ have no effect on the quadratic optimization bias $\scrE_{p}(H)$. For the eigenvalue decomposition $HH^{\top}=\scrH\calS_{p}^{2}\scrH^{\top}$ (with $K=\calS_{p}^{-1}$ in Lemma 2), we see that $\scrE_{p}(H)=\scrE_{p}(\scrH)$. Thus, to fine-tune $\hat\Sigma$ for quadratic optimization, one need correct only the basis $\textsc{col}(H)$. This amounts to finding the (asymptotic) roots of the function $\scrE_{p}(\cdot)$. If the convergence to a root is sufficiently rapid, one may then estimate $\langle u_{H},\Gamma u_{H}\rangle$ closely by $\hat\gamma^{2}$ to bring the discrepancy $\hat{D}_{p}$ to one per Theorem 1. We conclude this section by showing that for many applications the rate of convergence of $\scrE_{p}(H)$ is less important than Theorem 1 suggests.
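A quick numerical check of this invariance (a sketch; the dimensions are arbitrary and the definition (11) is implemented directly for a full rank, not necessarily orthonormal, $H$):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 500, 3

B = rng.standard_normal((p, q))                      # population loadings
scrB, _, _ = np.linalg.svd(B, full_matrices=False)   # eigenvectors of B B^T (scrB)
zeta = rng.standard_normal(p)
H = rng.standard_normal((p, q))                      # some full rank estimate
K = rng.standard_normal((q, q)) + 5 * np.eye(q)      # an (almost surely) invertible q x q matrix

def bias_general(B_eigvecs, H, zeta):
    # definition (11): scrB^T (z - z_H) / |z - z_H| with z_H the projection of z onto col(H)
    z = zeta / np.linalg.norm(zeta)
    zH = H @ np.linalg.solve(H.T @ H, H.T @ z)
    return B_eigvecs.T @ (z - zH) / np.linalg.norm(z - zH)

print(np.allclose(bias_general(scrB, H, zeta), bias_general(scrB, H @ K, zeta)))  # True
```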

2.2 Applications

To illustrate some important examples in practice, we consider the following canonical, constrained optimization problem.

\[
\begin{aligned}
\min_{w\in\bbR^{p}} \quad & \hat\sigma^{2} \\
\hat\sigma^{2} &= \langle w,\hat\Sigma w\rangle \\
\langle w,\zeta\rangle &= 1
\end{aligned}
\tag{14}
\]

Now, $\hat{Q}(\cdot)$ in (12) is the Lagrangian for (14) with $c_{0}=0$ and $c_{1}=\langle\zeta,\hat\Sigma^{-1}\zeta\rangle^{-1}$, which decays as $1/p$ under Assumption 2. The minimizer $\hat{w}\in\bbR^{p}$ of (14) corresponds to the weights of a minimum variance portfolio of financial assets with $\zeta=1_{p}$ implementing the "full-investment" constraint. Minimum variance and the more general mean-variance optimized portfolios are widely used in finance. Here, the $p$ entries of a column of $B$ in (9) represent the exposures of $p$ assets to that risk factor, e.g., market risk (bull/bear market), industry risk (energy, automotive, etc.), climate risk (migration, drought, etc.), innovation risk (ChatGPT, etc.). Similar formulations based on (14) arise in signal-noise processing and climate science (see Appendix E). (Footnote 14: In signal-noise processing, (14) maximizes the signal-to-noise ratio of a beamformer with $\zeta$ referred to as the "steering vector". The same is done for optimal fingerprinting in climate science with $\zeta$ called the "guess pattern". We review this literature with emphasis on estimation of $\Sigma$ in Appendix E.)

Continuing with the above example, the minimum $\hat\sigma^{2}$ of (14) corresponds to the variance of the estimated portfolio $\hat{w}$, while the expected out-of-sample variance is,

\[
\scrV_{p}^{2} = \langle\hat{w},\Sigma\hat{w}\rangle. \tag{15}
\]

We have $\hat{D}_{p}=2-\hat{\mu}_{p}^{2}\scrV_{p}^{2}$ (see Appendix A) and, under the conditions of Theorem 1,

\[
\scrV_{p}^{2} = \frac{|\Lambda_{p}\scrE_{p}(H)|^{2}}{p\,|z-z_{H}|^{2}} + O(1/p) \tag{16}
\]

because $|\zeta|^{2}=|1_{p}|^{2}=p$. Because $|\Lambda_{p}|^{2}/p$ converges in $(0,\infty)$ as $p\uparrow\infty$ under our Assumption 1(b), we achieve (in expectation) an asymptotically riskless portfolio provided $|\scrE_{p}(H)|\to 0$, irrespective of the rate of this convergence.
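As an illustration, a minimal sketch of the plug-in solution of (14) and the out-of-sample variance (15), assuming an estimate `Sigma_hat` and the true `Sigma` are available as numpy arrays (names of our choosing):

```python
import numpy as np

def min_variance_weights(Sigma_hat, zeta):
    """Minimizer of <w, Sigma_hat w> subject to <w, zeta> = 1, i.e., problem (14)."""
    x = np.linalg.solve(Sigma_hat, zeta)    # Sigma_hat^{-1} zeta
    return x / (zeta @ x)                   # normalize so that <w, zeta> = 1

def out_of_sample_variance(w, Sigma):
    """Expected out-of-sample variance (15) of the estimated portfolio."""
    return w @ Sigma @ w

# usage sketch: zeta = 1_p implements the full-investment constraint
# w_hat = min_variance_weights(Sigma_hat, np.ones(p))
# risk  = out_of_sample_variance(w_hat, Sigma)
```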

3 Principal Component Analysis

Let $Y$ denote a $p\times n$ data matrix of $p$ variables observed at $n$ dates which, for a random $n\times q$ matrix $\calX$ and a random $p\times n$ matrix $\calE$, follows the linear model,

\[
Y = B\calX^{\top} + \calE. \tag{17}
\]

The $p\times q$ matrix $B$ forms the unknown to be estimated, and only $Y$ is observed, while $\calX$ is a matrix of latent variables and the matrix $\calE$ represents an additive noise.

The pca estimate $H$ of $B$ may be derived from the $q\geq 1$ leading terms of the spectral decomposition of the $p\times p$ sample covariance matrix $S$ (see Remark 4), i.e.,

\[
S = \sum\nolimits_{(\scrs^{2},\,h)} \scrs^{2} h h^{\top} = HH^{\top} + G \tag{18}
\]

where the sum is over all eigenvalue/eigenvector pairs $(\scrs^{2},h)$ for $h\in\bbR^{p}$ of unit length (i.e., $|h|=1$). The $j$th column $\eta$ of the $p\times q$ matrix $H$ in (18) is taken as $\eta=\scrs h$ where $\scrs^{2}$ is the $j$th largest eigenvalue of $S$. The matrix $G=S-HH^{\top}$ forms the residual. Ordering the eigenvalues of $S$ as $\scrs^{2}_{1,p}\geq\scrs^{2}_{2,p}\geq\cdots\geq\scrs^{2}_{p,p}$, we have

\[
\scrH = H\calS^{-1}_{p}; \qquad H^{\top}H = \calS_{p}^{2}, \tag{19}
\]

where $\calS_{p}^{2}$ is a $q\times q$ diagonal matrix with entries $(\calS_{p}^{2})_{jj}=\scrs^{2}_{j,p}$ and the columns of the matrix $\scrH$ are the associated sample eigenvectors $h$ in (18) with $\scrH^{\top}\scrH=I_{q}$.

Since data is often centered in practice, in addition to (17), we consider the eigenvectors $\scrH$ of the transformed $p\times n$ data matrix $YJ$ where, for any $g\in\bbR^{n}$,

\[
J = I - \frac{g\,g^{\top}}{|g|^{2}} \qquad \big(\scrH = \nu_{p\times q}(YJ) = \nu_{p\times q}(S)\big), \tag{20}
\]

and the sample covariance in (18) is given by $S=YJY^{\top}/n$ since $JJ^{\top}=J$. Centering the $n$ columns of $Y$ entails the choice $g=1_{n}$ in (20), but we allow $J=I$.

Remark 4.

The identity $\mathrm{E}(S)=\Sigma=BB^{\top}+\Gamma$ is the aim of centering and holds under well-known conditions, e.g., $Y$ has i.i.d. columns, $\mathrm{E}(\calX^{\top}J\calX)/n=I$ with the $\calX$ and $\calE$ uncorrelated. We do not require that $\mathrm{E}(S)=\Sigma$ for the results of this section.

Our results require the following signal-to-noise ratio (diagonal) matrix $\Psi$, where the "noise" is specified in terms of the average of the bulk eigenvalues, $\kappa^{2}_{p}$ (c.f., (25)).

\[
\Psi^{2} = I_{q} - \kappa^{2}_{p}\,\calS_{p}^{-2}; \qquad \kappa^{2}_{p} = \frac{\sum_{j>q}\scrs^{2}_{j,p}}{n_{+}-q} \qquad (n_{+}>q), \tag{21}
\]

where $n_{+}$ is the number of nonzero eigenvalues of $S$. When $p>n$, ensuring per (21) that $n_{+}>q$ implies, for $Y$ of full rank, that $n_{+}=n$ for $J=I$ and $n_{+}=n-1$ otherwise. For $p>n$, the eigenvectors $\nu_{n\times q}(JY^{\top})$ and eigenvalues $\calS_{p}^{2}$ may also be computed more efficiently using the smaller $n\times n$ matrix $JY^{\top}YJ/n$, which shares its nonzero eigenvalues with $S$. This computation is represented as follows (c.f. (20)).

\[
\scrH = Y\,\nu_{n\times q}(JY^{\top})\,\calS_{p}^{-1}/\sqrt{n} \tag{22}
\]

The pca-estimated model for $\Sigma=BB^{\top}+\Gamma$ takes $H=\scrH\calS_{p}$ in (19) and our estimator is $\hat\Sigma=HH^{\top}+\hat\gamma^{2}I$ for the simple choice $\hat\gamma^{2}=n\kappa^{2}_{p}/p$, which suffices in view of Section 2. We prove (Theorem 4) that $\hat\gamma^{2}$ consistently estimates the average idiosyncratic variance $\mathrm{tr}(\Gamma)/p$ as $p\uparrow\infty$, under our upcoming Assumption 6. (Footnote 15: The residual $G=S-HH^{\top}$ is typically regularized to form a robust estimate of $\Gamma$. Examples include zeroing out all but the diagonal of this matrix, and the POET estimator of Fan, Liao and Mincheva (2013).)
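A compact sketch of this pca pipeline in numpy (the inputs are a $p\times n$ data matrix `Y`, the number of factors `q` and a centering flag; the outputs correspond to $\scrH$, $\calS_{p}^{2}$, $\kappa_{p}^{2}$, the diagonal of $\Psi$ and $\hat\gamma^{2}$ of (18)–(21); the function name is ours):

```python
import numpy as np

def pca_model(Y, q, center=True):
    """Sample-eigenvector model of Section 3: scrH, the top-q eigenvalues, the bulk
    average kappa^2, the diagonal of Psi, and gamma_hat^2 = n kappa^2 / p."""
    p, n = Y.shape
    if center:
        J = np.eye(n) - np.ones((n, n)) / n       # centering projection (g = 1_n)
        n_plus = n - 1                             # nonzero eigenvalues for full rank Y
    else:
        J = np.eye(n)
        n_plus = n
    S = Y @ J @ Y.T / n                            # sample covariance (p x p)
    evals, evecs = np.linalg.eigh(S)               # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]     # reorder to descending
    S2 = evals[:q]                                 # diagonal of calS_p^2
    scrH = evecs[:, :q]                            # sample eigenvectors
    kappa2 = evals[q:n_plus].sum() / (n_plus - q)  # bulk average, equation (21)
    psi_diag = np.sqrt(np.clip(1.0 - kappa2 / S2, 0.0, None))  # diagonal of Psi
    gamma2_hat = n * kappa2 / p                    # estimate of tr(Gamma)/p
    return scrH, S2, kappa2, psi_diag, gamma2_hat
```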

Sections 3.1–3.2 below define $\scrB=\nu_{p\times q}(B)$, the $q$ eigenvectors of $BB^{\top}$ associated with the largest $q$ eigenvalues (as other choices were possible for Theorem 1 and the definition of $\scrE_{p}(H)$ in (11)). We do not require $\mathrm{E}(S)=\Sigma$ per Remark 4.

3.1 Norm of the optimization bias for pca

We analyze the asymptotics of $\scrE_{p}(H)$ for the pca estimate $H$. Lemma 2 with $K=\calS_{p}$ in (19) and $\scrH^{\top}\scrH=I$ imply that $z_{H}=HH^{\dagger}z=\scrH\scrH^{\top}z=z_{\scrH}$ and $\langle z,z_{\scrH}\rangle=|z_{\scrH}|^{2}$, which reduces (11) to

\[
\scrE_{p}(H) = \scrE_{p}(\scrH) = \frac{\scrB^{\top}(z-z_{\scrH})}{|z-z_{\scrH}|} = \frac{\scrB^{\top}z - (\scrB^{\top}\scrH)(\scrH^{\top}z)}{\sqrt{1-|\scrH^{\top}z|^{2}}}. \tag{23}
\]

The unknowns in (23) are $\scrB^{\top}z$ and $\scrB^{\top}\scrH$, and in Section 4 we provide theoretical evidence that they cannot be estimated from data without very strong assumptions. Here, we nevertheless obtain an estimate of the length $|\scrE_{p}(H)|$. The following addresses a division by zero in (23). Recall that $z=\frac{\zeta}{|\zeta|}$ and $\zeta_{B}=BB^{\dagger}\zeta$ per (7).

Assumption 5.

$\varlimsup_{p\uparrow\infty}|\zeta_{B}|/|\zeta|<1$ and $\varliminf_{p\uparrow\infty}|\zeta|\neq 0$ for $\zeta=\zeta_{p\times 1}$.

Our next assumption concerns the matrices $\calX$ and $\calE$ in (17). These guarantee that almost all realizations of the data $Y$ have full rank for sufficiently large $p$, allowing us to treat $n_{+}$ in (21) as $n_{+}=n$ when $J=I$ and $n_{+}=n-1$ otherwise.

Assumption 6.

Assumption 1 on the matrices $B=B_{p\times q}$ and $\Gamma=\Gamma_{p\times p}$ holds and the following conditions hold for $\calX$ and the sequences $\calE=\calE_{p\times n}$ and $\zeta=\zeta_{p\times 1}$.

  1. (a)

    Only $Y$ is observed (the variables $\calX,\calE$ in (17) are latent).

  2. (b)

    The true number of factors $q$ is known and $n_{+}>q$ (with $n$ fixed).

  3. (c)

    $\calX^{\top}J\calX$ is ($q\times q$) invertible almost surely (and does not depend on $p$).

  4. (d)

    $\lim_{p\uparrow\infty}\calE^{\top}\calE/p=\gamma^{2}I$ almost surely for some constant $\gamma>0$.

  5. (e)

    $\varlimsup_{p\uparrow\infty}\|J\calE^{\top}B\|/p=0$ almost surely for some matrix norm $\|\cdot\|$ on $\bbR^{n\times q}$.

  6. (f)

    $\varlimsup_{p\uparrow\infty}|J\calE^{\top}z|/\sqrt{p}=0$ almost surely where $z=\zeta/|\zeta|$.

These conditions are discussed below. Our fundamental result on pca (in conjunction with Theorem 1) may now be stated. Its proof is deferred to Appendix B.

Theorem 3.

Suppose Assumptions 5 & 6 hold. Then, almost surely,

\[
\lim_{p\uparrow\infty}\Big(|\scrE_{p}(\scrH)| - \frac{|\Pi\scrH^{\top}z|}{|z-z_{\scrH}|}\Big) = 0, \tag{24}
\]

where $\Pi=(\Psi^{-1}-\Psi)$. Moreover, the length of the pca optimization bias $|\scrE_{p}(\scrH)|$ is eventually in $(0,\infty)$ almost surely, provided $\lim_{p\uparrow\infty}\scrH^{\top}z\neq 0_{q}$ (see Corollary 6).

We remark that $\varphi=\frac{\Pi\scrH^{\top}z}{|z-z_{\scrH}|}\in\bbR^{q}$ is computable solely from the data $Y$ with almost every $|\varphi|$ bounded in $[0,\infty)$ eventually. Theorem 3 demonstrates that pca, and the sample eigenvectors $\scrH$ in particular, lead to poor "plug-in" covariance estimators for quadratic optimization unless every column of $\scrH$ is eventually orthogonal to $\zeta$. So typically, the discrepancy $\hat{D}_{p}$ in (13) between the estimated and realized optima diverges to $-\infty$ as $p$ grows, and at a linear rate. In the portfolio application of Section 2.2, this covariance results in strictly positive expected (out-of-sample) portfolio risk $\scrV_{p}$ per (15)–(16) asymptotically, which may be approximated by using $|\varphi|$.
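Since $\varphi$ is computable from the data alone, the estimate of $|\scrE_{p}(\scrH)|$ suggested by (24) takes a few lines (a sketch reusing the outputs of the hypothetical `pca_model` helper above):

```python
import numpy as np

def pca_bias_norm_estimate(scrH, psi_diag, zeta):
    """|phi| = |(Psi^{-1} - Psi) scrH^T z| / |z - z_scrH|, estimating |scrE_p(scrH)| per (24)."""
    z = zeta / np.linalg.norm(zeta)
    Hz = scrH.T @ z
    phi = (1.0 / psi_diag - psi_diag) * Hz / np.linalg.norm(z - scrH @ Hz)
    return np.linalg.norm(phi)
```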

We make some remarks on Assumption 6. Conditions (a)–(c) are straightforward, but we mention that the invertibility of $\calX^{\top}J\calX$ is closely related to the requirement that $n_{+}>q$ in condition (b). Condition (c) fails when $g$ in (20) lies in $\textsc{col}(\calX)$, but such a case is dealt with by rewriting the data in (17) as $Y=\alpha g^{\top}+B_{\alpha}\calX^{\top}_{\alpha}+\calE$ for some $B_{\alpha}$ and $\calX_{\alpha}$ of $q-1$ columns each, and some mean vector $\alpha\in\bbR^{p}$. Then, we have $YJ=B_{\alpha}\calX^{\top}_{\alpha}J+\calE J$, and it only remains to check if condition (c) holds with the matrix $\calX_{\alpha}$ replacing $\calX$. Conditions (d)–(f) require that strong laws of large numbers hold for the columns of the sequence $\calE=\calE_{p\times n}$. These roughly state that the columns of $\calE$ are stationary with weakly dependent entries having bounded fourth moments. All three are easily verified for $\calE_{p\times n}$ populated by i.i.d. Gaussian random entries. Lastly, we remark that if conditions (e) and (f) hold for $J=I$, they hold for any $J$.

Since in practice both $n$ and $p$ are finite, we can make some refinements to the definitions in (21) based on some classical random matrix theory. In particular, it is well known that when the aspect ratio $n/p$ converges in $[0,\infty)$ (in our case to zero), the eigenvalues of $\calE^{\top}\calE/p$ have support that is approximately between $\gamma^{2}(1-\sqrt{n/p})^{2}$ and $\gamma^{2}(1+\sqrt{n/p})^{2}$ for the constant $\gamma^{2}$ in condition (d). We can then define,

\[
\kappa_{p}^{2} = \bigg(\frac{1+n/p}{n_{+}-q+n/p}\bigg)\sum_{j>q}\scrs^{2}_{j,p} \tag{25}
\]

which is a Marchenko-Pastur type adjustment to $\kappa_{p}^{2}$ (and $\Psi^{2}$) defined in (21). When the eigenvalues of $\calE^{\top}\calE/p$ obey the Marchenko-Pastur law, this $\kappa^{2}_{p}$ is advisable.
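In code, the adjustment (25) replaces the bulk average computed in the hypothetical `pca_model` sketch above (here `evals` denotes all sample eigenvalues in descending order):

```python
import numpy as np

def kappa2_mp_adjusted(evals, q, n_plus, n, p):
    """Marchenko-Pastur type adjustment (25) to the bulk average kappa_p^2."""
    bulk_sum = evals[q:n_plus].sum()                 # sum of the bulk eigenvalues
    return (1.0 + n / p) / (n_plus - q + n / p) * bulk_sum
```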

3.2 hdlss results for pca

Theorem 3 is essentially a corollary of our next result, which is of independent theoretical interest for the hdlss literature.

Theorem 4.

Suppose Assumption 6 holds. Then, the following hold almost surely.

  1. (a)

    $\lim_{p\uparrow\infty}\Psi\calS_{p}K_{p}^{-1}=I$ where $K_{p}$ is a $q\times q$ diagonal matrix with $(K^{2}_{p})_{jj}$, the $j$th largest eigenvalue of the $p\times p$ matrix $BV_{n}B^{\top}$ where $V_{n}=\calX^{\top}J\calX/n$.

  2. (b)

    $\lim_{p\uparrow\infty}\frac{n\kappa^{2}_{p}}{p\gamma^{2}}=1$ for the constant $\gamma$ of Assumption 6(d).

  3. (c)

    $\lim_{p\uparrow\infty}|\scrH^{\top}\scrB\scrB^{\top}\scrH-\Psi^{2}|=0$ and every $\Psi^{2}_{jj}$ is eventually in $(0,1)$.

  4. (d)

    $\lim_{p\uparrow\infty}|\scrH^{\top}z-(\scrH^{\top}\scrB)\scrB^{\top}z|=0$.

The proof is deferred to Appendix B. Parts (a)–(b) should not surprise those well versed in the hdlss literature. Nevertheless, these limit theorems for eigenvalues provide new content by supplying estimators, not just asymptotic descriptions.

Remark 7.

Parts (a)–(b) of Theorem 4 supply improved eigenvalue estimates for the pca covariance model when $\mathrm{E}(S)=\Sigma$, and while these have no effect on the optimization bias $\scrE_{p}(\scrH)$, we summarize them. Part (a) implies $H\Psi^{2}H^{\top}=\scrH(\calS_{p}\Psi)^{2}\scrH^{\top}$ is an improved estimator (relative to $HH^{\top}$) of the population matrix $BB^{\top}$. Part (b) implies that $\hat\gamma^{2}=n\kappa_{p}^{2}/p$ is an asymptotic estimator of $\mathrm{tr}(\Gamma)/p$ where $\Gamma=\mathrm{E}(\calE J\calE^{\top}/n)$. To see this, w.l.o.g. take $J=I$, and note that the trace $\mathrm{tr}$ and the expectation $\mathrm{E}$ commute. Then, $\mathrm{tr}(\Gamma)=\mathrm{tr}(\mathrm{E}(\calE\calE^{\top}/n))=(p/n)\,\mathrm{E}(\mathrm{tr}(\calE^{\top}\calE/p))$ and since $\calE^{\top}\calE/p\to\gamma^{2}I_{n\times n}$ (Assumption 6(d)), provided $\calE^{\top}\calE/p$ is uniformly integrable, $\mathrm{tr}(\Gamma)/p$ converges to $\gamma^{2}$.

The limits in parts (c)–(d) of Theorem 4 are new and noteworthy. They supply estimators for the quantities $\scrH^{\top}\scrB\scrB^{\top}\scrH$ and $\scrH^{\top}z_{B}=(\scrH^{\top}\scrB)\scrB^{\top}z$ from data. While these are not enough to estimate $\scrE_{p}(\scrH)$ (for that we need both $\scrB^{\top}\scrH$ and $\scrB^{\top}z$), they suffice for the task of estimating the norm $|\scrE_{p}(\scrH)|$ from the data $Y$.

The convergence in part (c) has an interpretation. By direct calculation we have that $\scrH^{\top}\scrB\scrB^{\top}\scrH=(BB^{\dagger}\scrH)^{\top}(BB^{\dagger}\scrH)$ (e.g., see (87) in Appendix B), which implies that for columns $j,j'$ of $\scrH$, say $h$ and $h'$, the $jj'$th entry of $\scrH^{\top}\scrB\scrB^{\top}\scrH$ is $\langle h_{B},h'_{B}\rangle$, i.e., the inner product of $h$ and $h'$ projected onto $\textsc{col}(B)$. This is in contrast to the $jj'$th entry $\langle h,b'\rangle$ of $\scrH^{\top}\scrB$ where $b'$ is the $j'$th column of $\scrB$. Part (c) states that,

\[
\begin{aligned}
\lim_{p\uparrow\infty}\langle h_{B},h'_{B}\rangle &= 0 \qquad (h\neq h') \\
\lim_{p\uparrow\infty}|h_{B}|\,\Psi^{-1}_{jj} &= 1 \qquad (j=j')
\end{aligned}
\tag{26}
\]

almost surely, where $\Psi_{jj}$ is itself a random sequence eventually in $(0,1)$. That is, sample eigenvectors remain orthogonal in $\textsc{col}(B)$, but their norms are less than the maximal unit length, i.e., columns of $\scrH$ are inconsistent estimators of columns of $\scrB$.

The following elegant characterization is an artifact of the fact that square matrices with orthonormal rows must also have orthonormal columns.

Corollary 5.

Let $\calH=\scrH\Psi^{-1}$. Under the hypotheses of Theorem 4, the $q\times q$ matrices $\calH^{\top}\scrB$ and $\scrB^{\top}\calH$ are asymptotic inverses of one another, i.e., almost surely,

\[
\lim_{p\uparrow\infty}|\calH^{\top}\scrB\scrB^{\top}\calH-I| = \lim_{p\uparrow\infty}|\scrB^{\top}\calH\calH^{\top}\scrB-I| = 0. \tag{27}
\]

Applying this to \lim_{p\uparrow\infty}|\calH^{\top}z-\calH^{\top}\scrB\scrB^{\top}z|=0, which holds per Theorem 4(d), yields

limp|zB||\calHz|=1\displaystyle\lim_{p\uparrow\infty}\frac{|z_{B}|}{|\calH^{\top}z|}=1 (28)

provided Assumption 5 holds; without it, both |z_{B}|=|\scrB^{\top}z| and |\calH^{\top}z| converge to zero. Thus, Theorem 4 implies we can asymptotically recover the length |z_{B}| (the norm of the projection of z onto col(B)). Further, as all diagonal entries of \Psi are eventually smaller than one, |\calH^{\top}z|=|\Psi^{-1}\scrH^{\top}z|\geq|\scrH^{\top}z|=|z_{H}|, and we deduce that the projection of z onto col(B) is eventually (in p) larger than its projection onto col(H). We conclude with a simple consequence of Theorem 4(d) relevant for Theorem 3.

Corollary 6.

Suppose that Assumption 6 holds. Then, limp\scrBz=0q\lim_{p\uparrow\infty}\scrB^{\top}z=0_{q} implies limp\scrHz=0q\lim_{p\uparrow\infty}\scrH^{\top}z=0_{q} almost surely.

4 An Impossibility Theorem

The problem of estimating the unknown \scrB\scrH\scrB^{\top}\scrH and \scrBz\scrB^{\top}z appearing in (23)(\ref{optbias_gps}) encounters significant challenges for q>1q>1. It is related to, but separate from, the problem called “unidentifiability” that arises in the context of factor analysis (e.g., Shapiro (1985)). Here, we prove an “impossibility” result. To give an interpretation of \scrB\scrH\scrB^{\top}\scrH, we now require E(S)=Σ\textup{{E}}\hskip 0.5pt(S)=\Sigma and \scrB=νp×q(B)\scrB=\nu_{p\times q}(B), so that \scrH\scrH and \scrB\scrB may be regarded as the sample and the population eigenvectors (or asymptotic principal components).

With \scrBΛp\scrW\scrB\Lambda_{p}\scrW^{\top} denoting the singular value decomposition of B\bbRp×qB\in\bbR^{p\times q} in (17)(\ref{data}), and similarly (1/n)Y\scrU=\scrH\calSp(1/\sqrt{n})\hskip 1.0ptY\scrU=\scrH\calS_{p} with \scrU\bbRn×q\scrU\in\bbR^{n\times q}, we find (see Appendix B),

limp\scrB\scrH=limpΛp\scrW\calX\scrU\scrSp1/n\lim_{p\uparrow\infty}\scrB^{\top}\scrH=\lim_{p\uparrow\infty}\Lambda_{p}\scrW^{\top}\calX^{\top}\scrU\scrS_{p}^{-1}/\sqrt{n} (29)

which holds almost surely under Assumption 6. This limit relation has been studied in the hdlss literature under various conditions and modes of convergence (e.g., Jung, Sen and Marron (2012) and Shen et al. (2016)). But these authors do not derive estimators for the right side of (29)(\ref{BHlim_text}) (i.e., the Λp\scrW\calX\Lambda_{p}\scrW^{\top}\calX^{\top} is not observed).

We prove that it is not possible, without very strong assumptions on \calX, to develop asymptotic estimators of the inner product matrix \scrB^{\top}\scrH. Given this, it is also reasonable to conjecture the same for \scrE_{p}(\scrH)\in\bbR^{q}. While this problem is motivated by our study of the quadratic optimization bias, the estimation of the entries of \scrB^{\top}\scrH, and hence of the angles between the sample and population eigenvectors, is an interesting (and to our knowledge, uninvestigated) problem in its own right.

We remark that the problem of “unidentifiability” amounts to the observation that replacing B and \calX by B\scrO and \calX\scrO for any orthogonal matrix \scrO does not alter the observed data matrix Y=B\calX^{\top}+\calE, rendering B unidentifiable (i.e., B or B\scrO?). However, the quantity of interest in our work is \scrB^{\top}\scrH, which bypasses this type of unidentifiability: \scrB is defined via the identity \nu_{p\times q}(B)=\nu_{p\times q}(B\scrO) and is a population quantity encoding the uniquely selected q eigenvectors of E(S)=\Sigma. Hence, the unidentifiability of B is related to, but not the same as, the problem we formulate.

We work in a setting where the noise \calE\calE in (17)(\ref{data}) is null and the matrices B=Bp×qB=B_{p\times q} have additional regularity over Assumption 1. The presumption here is that these simplifications can make our stated estimation problem for \scrB\scrH\scrB^{\top}\scrH only easier.

Condition 8.

The data matrices Y=Y_{p\times n} with n>q fixed have Y=B\calX^{\top} for a sequence B=B_{p\times q} and \calX\in\bbR^{n\times q} satisfying the following.

  1. (a)

    The \calX\calX is a random variable on a probability space (Ω,\calF,P)(\Omega,\calF,\textup{{P}}\hskip 0.5pt) with Vn2=\calX\calX/nV^{2}_{n}=\calX^{\top}\calX/n almost surely invertible and such that E(Vn2)=Iq\textup{{E}}\hskip 0.5pt(V^{2}_{n})=I_{q}.

  2. (b)

    The B=Bp×qB=B_{p\times q} (for all pp) satisfies B/p=\scrBΛ\scrWB/\sqrt{p}=\scrB\Lambda\scrW^{\top} for \scrB=νp×q(B)\scrB=\nu_{p\times q}(B), a fixed q×qq\times q orthogonal \scrW\scrW and fixed q×qq\times q diagonal Λ\Lambda with ΛiiΛjj\Lambda_{ii}\neq\Lambda_{jj} for all iji\neq j.

Any (B,\calX)(B,\calX) of Condition 8 has B=Bp×qB=B_{p\times q} obeying Assumption 1(b) with limpBB/p=\scrWΛ2\scrW\lim_{p\uparrow\infty}B^{\top}B/p=\scrW\Lambda^{2}\scrW^{\top}, and \calX\calX for which the sample covariance S=YY/nS=YY^{\top}/n satisfies E(S)=Σ=BB=\scrBΛp2\scrB\textup{{E}}\hskip 0.5pt(S)=\Sigma=BB^{\top}=\scrB\Lambda_{p}^{2}\scrB^{\top} for the eigenvalue matrix Λp2=pΛ2\Lambda_{p}^{2}=p\Lambda^{2}.

For B=Bp×qB=B_{p\times q} satisfying Condition 8(b), we define a set of orthogonal transformations which non-trivially change the eigenvectors \scrB=νp×q(BB)\scrB=\nu_{p\times q}(BB^{\top}). Let,

\bbOB={\scrO\bbRq×q:νp×q(\scrBoΛ\scrW)=\scrBo=\scrB\scrO for all p}.\displaystyle\bbO_{B}=\big{\{}\scrO\in\bbR^{q\times q}\hskip 0.5pt:\hskip 0.5pt\nu_{p\times q}(\scrB_{o}\Lambda\scrW^{\top})=\scrB_{o}=\scrB\scrO\text{ for all $p$}\hskip 1.0pt\big{\}}\hskip 1.0pt. (30)

Every element \scrO\in\bbO_{B} induces the data Y=B_{o}\calX^{\top} with B_{o}=\scrB_{o}\Lambda\scrW^{\top}\sqrt{p} and \scrB_{o}=\scrB\scrO, and this data set is uniquely identified by the orthogonal matrix \scrO. The data set built in this way again satisfies Condition 8 (with B_{o} in place of the B of that condition). We remark that the only diagonal element of \bbO_{B} is the identity matrix I_{q} (i.e., flipping the signs of any of the columns of \scrB does not result in a different set of eigenvectors \nu_{p\times q}(B)). Indeed, if we partition all q\times q orthogonal matrices by the equivalence relation that sets two matrices equivalent when their columns differ only by a sign, then the set \bbO_{B} selects exactly one element from each equivalence class. Since each equivalence class is finite, the set \bbO_{B} has the same cardinality as the set of all q\times q orthogonal matrices.

We now consider \bbG\subseteq\bbO_{B} and a sequence of (nonrandom) measurable functions f_{p}:\bbR^{p\times n}\to\bbR^{q\times q} that, together with the notation Y_{B}=B\calX^{\top}, define,

Af,\bbG(\scrB)={ωΩ:limp|\scrH\scrBofp(YBo)|(ω)=0,\scrO\bbG{Iq}}.\displaystyle A_{f,\bbG}(\scrB)=\{\omega\in\Omega\hskip 0.5pt:\hskip 0.5pt\lim_{p\uparrow\infty}|\scrH^{\top}\scrB_{o}-f_{p}(Y_{B_{o}})|(\omega)=0,\hskip 2.0pt\scrO\in\bbG\cup\{I_{q}\}\}\hskip 1.0pt. (31)

The event A_{f,\bbG}(\scrB) consists of all outcomes for which the f_{p}(Y_{B_{o}}) consistently estimate \scrH^{\top}\scrB_{o} as p\uparrow\infty for every \scrO\in\bbG\cup\{I_{q}\}. The following lemma may be used to generate bounds on the probability of the event A_{f,\bbG}(\scrB) in many examples.

Lemma 7.

Suppose Condition 8 holds. Then, for any f:\bbR^{p\times n}\to\bbR^{q\times q} and the corresponding function f^{\nu} given by f^{\nu}(Y/\sqrt{p})=\nu_{p\times q}(Y)f(Y)=\scrH f(Y), we have

|\scrBfν(Y/p)||\scrH\scrBf(Y)|.\displaystyle|\scrB-\hskip 1.0ptf^{\nu}(Y/\sqrt{p})|\leq|\scrH^{\top}\scrB-f(Y)|\hskip 1.0pt. (32)
Proof.

Since \calX almost surely has linearly independent columns, col(Y)=col(B)=col(\scrB) is spanned by the columns of \scrH, so that \scrB=\scrH\scrH^{\top}\scrB. Using that |\scrH|=1 and that f^{\nu}(Y/\sqrt{p})=\scrH f(Y) yields,

|\scrBfν(Y/p)|\displaystyle|\scrB-f^{\nu}(Y/\sqrt{p})| =|\scrB\scrHf(Y)|=|\scrH\scrH\scrB\scrHf(Y)|\displaystyle=|\scrB-\scrH f(Y)|=|\scrH\scrH^{\top}\scrB-\scrH f(Y)|
|\scrH||\scrH\scrBf(Y)|=|\scrH\scrBf(Y)|\displaystyle\leq|\scrH||\scrH^{\top}\scrB-f(Y)|=|\scrH^{\top}\scrB-f(Y)|

The next example is a good warm-up for our main result (Theorem 8) below.

Example 9.

Let m\bbRn×qm\in\bbR^{n\times q} be nonrandom and M=\calX\scrWΛM=\calX\scrW\Lambda, a random matrix with φm=P(M=m)\varphi_{m}=\textup{{P}}\hskip 0.5pt(M=m) and P(M=m\scrO)=1φm\textup{{P}}\hskip 0.5pt(M=m\scrO)=1-\varphi_{m} for some \scrO\bbOB{Iq}\scrO\in\bbO_{B}\setminus\{I_{q}\}. By taking \bbG={Iq,\scrO}\bbG=\{I_{q},\scrO\}, the event Af,\bbG(\scrB)A_{f,\bbG}(\scrB) contains the outcomes for which \scrH\scrBo\scrH^{\top}\scrB_{o} admits a consistent estimator for two data sets corresponding to the \scrB\scrB and \scrBo=\scrB\scrO\scrB_{o}=\scrB\scrO.

If P(Af,\bbG(\scrB))>φm\textup{{P}}\hskip 0.5pt(A_{f,\bbG}(\scrB))>\varphi_{m}, then Af,\bbG(\scrB)A_{f,\bbG}(\scrB) contains outcomes corresponding to each possible realization of MM which implies by Lemma 7 that both |\scrBfpν(\scrBm)||\scrB-f^{\nu}_{p}(\scrB m^{\top})| and |\scrBofpν(\scrBm)||\scrB_{o}-f^{\nu}_{p}(\scrB m^{\top})| converge to zero. Since this is a contradiction, P(Af,\bbG(\scrB))φm\textup{{P}}\hskip 0.5pt(A_{f,\bbG}(\scrB))\leq\varphi_{m}.

This stylized example may be substantially generalized by requiring a certain distributional property of the random variable M=\calX\scrWΛM=\calX\scrW\Lambda.

Definition 10.

We say a random variable M\bbRn×qM\in\bbR^{n\times q} is \bbG\bbG-distributable if there exists a collection \bbG\bbOB{Iq}\bbG\subseteq\bbO_{B}\setminus\{I_{q}\} such that for any measurable G\bbRn×qG\subseteq\bbR^{n\times q},

P(MG)P(M\scrO\bbGG\scrO)(G\scrO={m\scrO:mG}).\hskip 32.0pt\textup{{P}}\hskip 0.5pt(M\in G)\hskip 1.0pt\leq\hskip 1.0pt\textup{{P}}\hskip 0.5pt(M\in\cup_{\scrO\in\bbG}G\scrO)\hskip 16.0pt\big{(}G\scrO=\{m\scrO\hskip 0.5pt:\hskip 0.5ptm\in G\}\big{)}\hskip 1.0pt.

Clearly, an M with mean-zero i.i.d. Gaussian entries is \bbG-distributable for a \bbG with just one element, but we expect many random matrices M to have this property. Our main result shows that even when restricting to a smaller set of covariance models, the chances of estimating the matrix \scrH^{\top}\scrB are no better than a coin flip.

Theorem 8.

Suppose Condition 8 holds and M=\calX\scrW\Lambda is \bbG-distributable with \bbG\subseteq\bbO_{B}\setminus\{I_{q}\}. Then, for this \bbG and any sequence of (nonrandom) measurable functions f_{p}:\bbR^{p\times n}\to\bbR^{q\times q}, the A_{f,\bbG}(\scrB) in (31) has P(A_{f,\bbG}(\scrB))\leq 1/2.

Proof.

By Lemma 7, we have Af,\bbG(\scrB)Af,\bbGν(\scrB)A_{f,\bbG}(\scrB)\subseteq A^{\nu}_{f,\bbG}(\scrB) where

Af,\bbGν(\scrB)={ωΩ:limp|\scrBofpν(\scrBoM)|(ω)=0,\scrO\bbG{Iq}}A^{\nu}_{f,\bbG}(\scrB)=\{\omega\in\Omega\hskip 0.5pt:\hskip 0.5pt\lim_{p\uparrow\infty}|\scrB_{o}-f^{\nu}_{p}(\scrB_{o}M^{\top})|(\omega)=0,\hskip 2.0pt\hskip 1.0pt\forall\scrO\in\bbG\cup\{I_{q}\}\}

for \scrBoMp=YBo=Y\scrB_{o}M^{\top}\sqrt{p}=Y_{B_{o}}=Y, the data matrix per (31)(\ref{AfB}), \scrBo=\scrB\scrO\scrB_{o}=\scrB\scrO and M=\calX\scrWΛM=\calX\scrW\Lambda, after recalling the definition fν(Y/p)=νp×q(Y)f(Y)=\scrHf(Y)f^{\nu}(Y/\sqrt{p})=\nu_{p\times q}(Y)f(Y)=\scrH f(Y).

Note that the \calF\calF-measurability of the set Af,\bbGν(\scrB)A^{\nu}_{f,\bbG}(\scrB) is granted by the measurability of each fpνf^{\nu}_{p} (i.e., each fpf_{p} is measurable and so is each νp×q\nu_{p\times q} (Acker, 1974)).

Letting G=M(Af,\bbGν(\scrB))={M(ω)\bbRn×q:ωAf,\bbGν(\scrB)}G=M(A^{\nu}_{f,\bbG}(\scrB))=\{M(\omega)\in\bbR^{n\times q}\hskip 0.5pt:\hskip 0.5pt\omega\in A_{f,\bbG}^{\nu}(\scrB)\}, we see that

limp|\scrBfpν(\scrBm)|\displaystyle\hskip 32.0pt\lim_{p\uparrow\infty}|\scrB-f^{\nu}_{p}(\scrB m^{\top})| =0mG,\displaystyle=0\hskip 32.0pt\forall m\in G, (33)

by taking \scrO=Iq\scrO=I_{q}. Analogously, for \scrBo=\scrB\scrO\scrB_{o}=\scrB\scrO for any \scrO\bbG\scrO\in\bbG, we have

limp|\scrB\scrOfpν(\scrB\scrOm)|\displaystyle\hskip 10.0pt\lim_{p\uparrow\infty}|\scrB\scrO-f^{\nu}_{p}(\scrB\scrO m^{\top})| =0mG.\displaystyle=0\hskip 32.0pt\forall m\in G. (34)

Letting G=\scrO\bbGG\scrOG^{\prime}=\underset{\scrO\in\bbG}{\cup}G\scrO, we claim that GG and GG^{\prime} are disjoint. To see this, note that if m1GGm_{1}\in G\cap G^{\prime}, then m1=m2\scrOm_{1}=m_{2}\scrO for m2Gm_{2}\in G, \scrO\bbG\scrO\in\bbG. Substituting m1=m2\scrOGm_{1}=m_{2}\scrO\in G for mm in relation (34)(\ref{fvBO}), and substituting m2Gm_{2}\in G for mm in relation (33)(\ref{fvB}), yields

limp|\scrB\scrOfpν(\scrBm2)|=0 and limp|\scrBfpν(\scrBm2)|=0,\displaystyle\lim_{p\uparrow\infty}|\scrB\scrO-f^{\nu}_{p}(\scrB m_{2}^{\top})|=0\hskip 4.0pt\text{ and }\hskip 4.0pt\lim_{p\uparrow\infty}|\scrB-f^{\nu}_{p}(\scrB m_{2}^{\top})|=0\hskip 1.0pt,

a contradiction, as both cannot hold simultaneously. Thus, GG and GG^{\prime} are disjoint.

Consequently {MG}\{M\in G\} and {MG}\{M\in G^{\prime}\} are disjoint and moreover, the \bbG\bbG-distributability of MM implies P(MG)P(MG)\textup{{P}}\hskip 0.5pt(M\in G)\leq\textup{{P}}\hskip 0.5pt(M\in G^{\prime}). This along with the fact that Af,\bbG(\scrB)Af,\bbGν(\scrB){MG}A_{f,\bbG}(\scrB)\subseteq A_{f,\bbG}^{\nu}(\scrB)\subseteq\{M\in G\} implies the desired result, i.e.,

1\displaystyle 1 P(MG)+P(MG)2P(MG)2P(Af,\bbGν(\scrB))\displaystyle\geq\textup{{P}}\hskip 0.5pt(M\in G^{\prime})+\textup{{P}}\hskip 0.5pt(M\in G)\geq 2\hskip 1.0pt\textup{{P}}\hskip 0.5pt(M\in G)\geq 2\hskip 1.0pt\textup{{P}}\hskip 0.5pt(A^{\nu}_{f,\bbG}(\scrB))
2P(Af,\bbG(\scrB)).\displaystyle\geq 2\hskip 1.0pt\textup{{P}}\hskip 0.5pt(A_{f,\bbG}(\scrB))\hskip 1.0pt.

5 Optimization Bias Free Covariance Estimator

Let \scrH\scrH be the p×qp\times q matrix of eigenvectors in (22)(\ref{efficient}) of the sample covariance SS. Recalling the variables Π=(Ψ1Ψ)\Pi=({{\Psi}}^{-1}-{{\Psi}}) and z\bbRpz\in\bbR^{p} of Theorem 3, we define

z_{\perp\scrH}=\frac{z-z_{\scrH}}{|z-z_{\scrH}|}\in\bbR^{p}\hskip 1.0pt,\hskip 16.0pt\varphi=\frac{\Pi\scrH^{\top}z}{|z-z_{\scrH}|}\in\bbR^{q}\hskip 1.0pt. (35)

Theorem 3 proved that \lim_{p\uparrow\infty}(|\scrE_{p}(\scrH)|-|\varphi|)=0 with |\varphi| eventually in [0,\infty) almost surely. From the observable \varphi and z_{\perp\scrH}, we now construct a p\times q matrix \scrH_{\sharp} with \scrE_{p}(\scrH_{\sharp})\to 0 as p\uparrow\infty. To this end, consider the eigenvalue decomposition

Ψ2+φφ=\scrMΦ2\scrM,\displaystyle{{\Psi}}^{2}+\varphi\varphi^{\top}=\scrM\Phi^{2}\scrM^{\top}\hskip 1.0pt, (36)

for eigenvectors \scrM=νq×q(Ψ2+φφ)\scrM=\nu_{q\times q}({{\Psi}}^{2}+\varphi\varphi^{\top}) and diagonal q×qq\times q matrix of eigenvalues Φ2\Phi^{2}. The estimator \scrH\scrH_{\sharp} is computed as the eigenvectors \scrH=νp×q(\scrHΨ+z\scrHφ)\scrH_{\sharp}=\nu_{p\times q}(\scrH\Psi+z_{\perp\scrH}\varphi^{\top}), i.e.,

\scrH=(\scrHΨ+z\scrHφ)\scrMΦ1,\displaystyle\scrH_{\sharp}=(\scrH\Psi+z_{\perp\scrH}\hskip 1.0pt\varphi^{\top})\scrM\Phi^{-1}, (37)

where the diagonal Φ\Phi is invertible for pp sufficiently large under our assumptions.

Theorem 9.

Suppose Assumptions 5 & 6 hold. Then, almost surely,

limp\scrEp(\scrH)=0q.\displaystyle\lim_{p\uparrow\infty}\scrE_{p}(\scrH_{\sharp})=0_{q}\hskip 1.0pt. (38)

Moreover, \scrH\scrH=I\scrH_{\sharp}^{\top}\scrH_{\sharp}=I and limp|\scrH\scrB\scrB\scrHΦ2|=0\lim_{p\uparrow\infty}|\scrH_{\sharp}^{\top}\scrB\scrB^{\top}\scrH_{\sharp}-\Phi^{2}|=0 almost surely.

The proof is deferred to Appendix C but we sketch the derivation of (37)(\ref{Hsharp}) and give a geometrical interpretation in Sections 5.1 and 5.2. We take H=\scrHΨ\calSpH_{\sharp}=\scrH_{\sharp}\Psi\calS_{p} to combine the eigenvector correction \scrH\scrH_{\sharp} with that for the eigenvalues (see Remark 7). Note that \scrEp(\scrH)=\scrEp(H)\scrE_{p}(\scrH_{\sharp})=\scrE_{p}(H_{\sharp}) by Lemma 2. We let ^Σ=HH+γ^2I\bm{\hat{}}{\Sigma}_{\sharp}=H_{\sharp}H_{\sharp}^{\top}+\hat{\gamma}^{2}I be our covariance estimator where, identically to pca, we take γ^2=nκp2/p\hat{\gamma}^{2}=n\kappa_{p}^{2}\hskip 1.0pt/p with κp2\kappa_{p}^{2} in (21)(\ref{snr-bulk}) or (25)(\ref{bulk2}).

Theorem 9 provides theoretical guarantees for many applications; in particular, the estimator \hat{\Sigma}_{\sharp} is now demonstrated to yield minimum variance portfolios (i.e., solutions of (14)) with zero asymptotic variance (see \scrV^{2}_{p} in (16)). Addressing the convergence rate of (38) is outside of our scope, but we study this rate numerically in Section 6, which shows, at least for Gaussian data, that the rate is O(1/\sqrt{p}). This suggests that \scrH_{\sharp} yields a bounded discrepancy \hat{D}_{p} of Theorem 1 under some conditions.
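
For concreteness, the following minimal numpy sketch assembles \scrH_{\sharp} and \hat{\Sigma}_{\sharp} from the quantities in (35)–(37); the function and variable names are hypothetical, and the inputs \scrH, \Psi, \calS_{p} (passed as vectors of diagonal entries), the unit constraint vector z and the bulk estimate \hat{\gamma}^{2} are assumed to have been computed already (e.g., as in Section 6).

```python
import numpy as np

def sharp_eigenvectors(H, Psi, S_p, z, gamma2):
    """Minimal sketch of (35)-(37). Inputs (hypothetical names): H is the p x q matrix
    of sample eigenvectors, Psi and S_p hold the diagonal entries of the q x q matrices
    Psi and calS_p, z is the unit constraint vector, gamma2 is the bulk estimate."""
    z_H = H @ (H.T @ z)                                  # projection of z onto col(H)
    denom = np.linalg.norm(z - z_H)
    z_perp = (z - z_H) / denom                           # z_{perp H} in (35)
    phi = (1.0 / Psi - Psi) * (H.T @ z) / denom          # phi in (35)

    Phi2, M = np.linalg.eigh(np.diag(Psi**2) + np.outer(phi, phi))   # (36)
    order = np.argsort(Phi2)[::-1]                       # descending eigenvalues
    Phi2, M = Phi2[order], M[:, order]

    H_sharp = (H * Psi + np.outer(z_perp, phi)) @ M / np.sqrt(Phi2)  # (37)

    H_big = H_sharp * (Psi * S_p)                        # H_sharp Psi calS_p (Remark 7)
    Sigma_sharp = H_big @ H_big.T + gamma2 * np.eye(len(z))  # dense p x p, for illustration only
    return H_sharp, Sigma_sharp
```

Up to floating point error, the returned matrix of corrected eigenvectors has orthonormal columns, matching \scrH_{\sharp}^{\top}\scrH_{\sharp}=I in Theorem 9.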

The last part of Theorem 9 concerns the inner products of the columns of \scrH\scrH_{\sharp} projected onto col(B)\textsc{col}{\hskip 0.5pt(B)}. This is in direct comparison to Theorem 4(c) which shows that the sample eigenvectors \scrH\scrH are orthogonal in col(B)\textsc{col}{\hskip 0.5pt(B)} and the same is true for the columns of \scrH\scrH_{\sharp} since Φ2\Phi^{2} is diagonal. Selecting the jjth column hh_{\sharp} of \scrH\scrH_{\sharp} we have,

limp|hB|Φjj1=1,\lim_{p\uparrow\infty}|\hskip 1.0pth_{\sharp B}|\hskip 1.0pt\Phi^{-1}_{jj}=1\hskip 1.0pt,

as compared with \lim_{p\uparrow\infty}|h_{B}|\Psi^{-1}_{jj}=1 in (26). Note that \Psi^{2}_{jj}\leq\Phi^{2}_{jj} eventually, with a strict inequality when |\varphi| is bounded away from zero (in p) due to (36). Thus the length |h_{\sharp B}| of h_{\sharp} projected onto col(B) is eventually at least as large as that of its counterpart h, the jth column of \scrH.

5.1 Remarks on the GPS program

The special case q=1q=1 was considered by Goldberg, Papanicolaou and Shkolnik (2022) (henceforth gps) who apply their results to portfolio theory. We summarize the relevant parts of the gps program making adjustments for greater generality and compatibility with our solution in Section 5.2.

Here, (9) takes the form \Sigma=\beta\beta^{\top}+\Gamma where \beta\in\bbR^{p}, with Assumption 1 requiring a sequence \beta=\beta_{p\times 1} for which \langle\beta,\beta\rangle/p converges in (0,\infty). The sample covariance matrix may be written as S=\eta\eta^{\top}+G where \scrs^{2}=\langle\eta,\eta\rangle=\max_{v\in\bbR^{p}}\langle v,Sv\rangle/\langle v,v\rangle is the largest eigenvalue with eigenvector h=\eta/|\eta|, and the matrix G contains the remaining spectrum per (18). Setting b=\beta/|\beta| yields,

\scrEp(h)\displaystyle\scrE_{p}(h) =b,zb,hh,z1h,z2\displaystyle=\frac{\langle b,z\rangle-\langle b,h\rangle\langle h,z\rangle}{\sqrt{1-\langle h,z\rangle^{2}}} (39)

for the quadratic optimization bias (23) in the case q=1. Our (39) uses a different denominator than GPS, but this difference is not essential. Our \Gamma generalizes the choice of a scalar matrix in GPS, and our Assumption 6 relaxes their conditions.

The GPS program assumes (w.l.o.g.) that b,z0\langle b,z\rangle\geq 0 and h,z0\langle h,z\rangle\geq 0, enforces Assumption 5 so that lim¯pb,z<1\varlimsup_{p\uparrow\infty}\langle b,z\rangle<1, and takes the following steps.

  1. (1)

    Find asymptotic estimators for unknowns b,z\langle b,z\rangle and h,b\langle h,b\rangle in (39)(\ref{Eph}). To this end, for the observed ψ2=1tr(G)/\scrs2(n+1)\psi^{2}=1-\frac{\hskip 1.0pt\textup{\text{tr}}(G)/\scrs^{2}}{(n_{+}-1)} (c.f., (21)(\ref{snr-bulk})), under Assumption 6 almost surely,

    limp|b,hψ|=0<limpψ<1 and limp|h,zh,bb,z|=0.\displaystyle\lim_{p\uparrow\infty}|\langle b,h\rangle-\psi|=0<\lim_{p\uparrow\infty}\psi<1\text{ and }\lim_{p\uparrow\infty}|\langle h,z\rangle-\langle h,b\rangle\langle b,z\rangle|=0\hskip 1.0pt. (40)
  2. (2)

    Consider the estimator hzt=h+tz|h+tz|h_{z}t=\frac{h+tz}{|h+tz|} parametrized by t\bbRt\in\bbR so that hzt,z\langle h_{z}t,z\rangle increases in t0t\geq 0. This construction is motivated by the h,bh,z\langle h,b\rangle\langle h,z\rangle in the numerator of (39)(\ref{Eph}) becoming eventually less than b,z\langle b,z\rangle almost surely, per (40)(\ref{gps}).

  3. (3)

    Solve \scrEp(hzt)=0\scrE_{p}(h_{z}t)=0 for t=τt=\tau_{*} as a function of the unknowns h,b\langle h,b\rangle and b,z\langle b,z\rangle. Leveraging (40)(\ref{gps}), construct an observable τ\tau such that |ττ|0|\tau_{*}-\tau|\to 0 as pp\uparrow\infty and prove a uniform continuity of \scrEp()\scrE_{p}(\hskip 1.0pt\cdot\hskip 1.0pt) to establish that limp\scrEp(hzτ)=0\lim_{p\uparrow\infty}\scrE_{p}(h_{z}\tau)=0.

These steps cannot be easily extended to the setting of general qq. Step (1) is no longer possible in view of Theorem 8, and indeed, the “sign” conventions b,z0\langle b,z\rangle\geq 0 and h,z0\langle h,z\rangle\geq 0 cannot be appropriated from the univariate case given that result. Step (2) is difficult to extend because its intuition becomes obscure for general \bbRq\bbR^{q} where the vector \scrEp(\scrH)\scrE_{p}(\scrH) resides. Step (3) relies on basic calculations to determine the root of a univariate function. Determining roots in \bbRq\bbR^{q}, especially without the right parametrization in step (2), appears difficult given the definition of \scrEp()\scrE_{p}(\hskip 1.0pt\cdot\hskip 1.0pt) in (23)(\ref{optbias_gps}).

We make some adjustments to prime our approach in Section 5.2. First, write

\lim_{p\uparrow\infty}|\langle b,h\rangle^{2}-\psi^{2}|=\lim_{p\uparrow\infty}|\langle h,z\rangle-\langle h,b\rangle\langle b,z\rangle|=0\hskip 1.0pt, (41)

as a replacement for (40)(\ref{gps}). This drops the sign conventions on b,z,h,z\langle b,z\rangle,\langle h,z\rangle to reformulate step (1) for compatibility with the findings of Theorem 4 parts (c)(d).

Our adjustment to step (2) sacrifices its intuition for additional degrees of freedom. In particular, for zh=zzh|zzh|z_{\perp h}=\frac{z-z_{h}}{|z-z_{h}|} and zh=h,zhz_{h}=\langle h,z\rangle h (c.f., (35)(\ref{zpH})), set

hzt=t1h+t2zh(t1,t2\bbR:t12+t22=1).\displaystyle\hskip 32.0pth_{z}t=t_{1}\hskip 1.0pth+t_{2}\hskip 1.0ptz_{\perp h}\hskip 32.0pt(t_{1},t_{2}\in\bbR\hskip 0.5pt:\hskip 0.5ptt_{1}^{2}+t^{2}_{2}=1)\hskip 1.0pt. (42)

This two-parameter estimator hzth_{z}t parametrizes the quadratic optimization bias as,

\scrEp(hzt)\displaystyle\scrE_{p}(h_{z}\hskip 1.0ptt) =b,z(t1b,h+t2\scrEp(h))(t1h,z+t21h,z2)1|hztz|2.\displaystyle=\frac{\langle b,z\rangle-(t_{1}\langle b,h\rangle+t_{2}\scrE_{p}(h))(t_{1}\langle h,z\rangle+t_{2}\sqrt{1-\langle h,z\rangle^{2}})}{\sqrt{1-|h_{z}tz|^{2}}}\hskip 1.0pt. (43)

It is not difficult to verify that setting t1h,bt_{1}\propto\langle h,b\rangle and t2\scrEp(h)t_{2}\propto\scrE_{p}(h) in the above display leads to the identity \scrEp(hzt)=0\scrE_{p}(h_{z}t_{*})=0 with t=(h,b,\scrEp(h))h,b2+\scrEp2(h)t_{*}=\frac{(\langle h,b\rangle,\scrE_{p}(h))}{\sqrt{\langle h,b\rangle^{2}+\scrE_{p}^{2}(h)}}. Finally, the parameter

tK=th,b|h,b|=(h,b2,h,b\scrEp(h))h,b4+h,b2\scrEp2(h)(K=h,b|h,b|)\displaystyle\hskip 32.0ptt_{*}K=\frac{t_{*}\langle h,b\rangle}{|\langle h,b\rangle|}=\frac{(\langle h,b\rangle^{2},\hskip 1.0pt\langle h,b\rangle\scrE_{p}(h))}{\sqrt{\langle h,b\rangle^{4}+\langle h,b\rangle^{2}\scrE_{p}^{2}(h)}}\hskip 32.0pt\Big{(}K=\frac{\langle h,b\rangle}{|\langle h,b\rangle|}\Big{)} (44)

also has the property that \scrEp(hztK)=0\scrE_{p}(h_{z}t_{*}K)=0 but tKt_{*}K admits asymptotic estimators via the replacement (41)(\ref{gpsq}) of (40)(\ref{gps}). This modifies step (3) of the GPS program to find an asymptotic root of \scrEp()\scrE_{p}(\hskip 1.0pt\cdot\hskip 1.0pt) without any sign conventions on b,z,h,z\langle b,z\rangle,\langle h,z\rangle. While these changes are somewhat trivial for the case q=1q=1, our understanding of them is informed by the case q>1q>1 and initiated by the impossibility result in Theorem 8.
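
To make the modified steps concrete for q=1, the following minimal numpy sketch computes the corrected leading eigenvector h_{z}t using only observables: the plug-in \psi for \langle h,b\rangle per (41), and the estimate \varphi of \scrE_{p}(h) from Theorem 3, replace the unknowns in (44). All function and variable names are hypothetical.

```python
import numpy as np

def corrected_leading_eigenvector(h, z, psi):
    """Sketch of the q = 1 correction (42)/(44). Inputs (hypothetical names): h is the
    leading sample eigenvector, z the unit constraint vector, and psi the observable
    estimate of <h,b> per (41)."""
    hz = float(h @ z)
    denom = np.sqrt(1.0 - hz**2)               # equals |z - z_h| for unit h and z
    z_perp = (z - hz * h) / denom               # z_{perp h}
    phi = (1.0 / psi - psi) * hz / denom        # plug-in estimate of E_p(h) (Theorem 3)

    t = np.array([psi**2, psi * phi])           # proportional to t_* K in (44) after plug-in
    t /= np.linalg.norm(t)                      # restore t_1^2 + t_2^2 = 1 per (42)
    return t[0] * h + t[1] * z_perp             # the corrected vector h_z t
```

The same plug-in quantities drive the general-q construction sketched in Section 5.2.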

5.2 Sketch of the derivation of \scrH\scrH_{\sharp}

We begin by defining a matrix \scrHz\scrH_{z} composed of (q+1)(q+1) orthonormal columns, derived from \scrH\scrH in (20)(\ref{Jc}) and z\scrHz_{\perp\scrH} in (35)(\ref{zpH}), i.e.,

\scrHz=(\scrHz\scrH)=(\scrHzz\scrH|zz\scrH|),\displaystyle\scrH_{z}=\big{(}\scrH\hskip 8.0ptz_{\perp\scrH}\big{)}=\Big{(}\scrH\hskip 16.0pt\frac{z-z_{\scrH}}{|z-z_{\scrH}|}\Big{)}\hskip 1.0pt, (45)

so that col(\scrHz)\textsc{col}{\hskip 0.5pt(\scrH_{z})} expands col(\scrH)\textsc{col}{\hskip 0.5pt(\scrH)} by the vector z=ζ|ζ|z=\frac{\zeta}{|\zeta|}. We introduce a parametrized estimator \scrHzT\scrH_{z}T for a full rank matrix TT, derive a root TT_{*} of the map T\scrEp(\scrHzT)T\mapsto\scrE_{p}(\scrH_{z}T), and construct an asymptotic estimator of TT_{*} by applying Theorem 4(c)(d).

We consider the following family of estimators with (42)(\ref{hzt}) as a special case.

{\scrHzT:T\bbR(q+1)×q with TT\bbRq×q invertible}.\displaystyle\big{\{}\hskip 1.0pt\scrH_{z}T\hskip 0.5pt:\hskip 0.5ptT\in\bbR^{(q+1)\times q}\text{ with }T^{\top}T\in\bbR^{q\times q}\text{ invertible}\big{\}}\hskip 1.0pt. (46)

Any \scrHzT\scrH_{z}T in this family is a p×qp\times q matrix of full rank. We have T=(t1t2)T=(t_{1}\hskip 4.0ptt_{2})^{\top} for q=1q=1, but the constraint on TT=t12+t22T^{\top}T=t_{1}^{2}+t_{2}^{2} imposed by (42)(\ref{hzt}) is relaxed in (46)(\ref{Tfamily}).

Substituting \scrHzT\scrH_{z}T into the optimization bias function in (11)(\ref{optbias}), we obtain

\scrEp(\scrHzT)=\scrBz\scrB\scrHz(TT)\scrHzz1|z\scrHzT|2\displaystyle\scrE_{p}(\scrH_{z}T)=\frac{\scrB^{\top}z-\scrB^{\top}\scrH_{z}(TT^{\dagger})\scrH_{z}^{\top}z}{1-|z_{\scrH_{z}T}|^{2}} (47)

where we have used that \scrH_{z}^{\top}\scrH_{z}=I. The expression (47) is unwieldy, but we note that T^{\top}(TT^{\dagger})=T^{\top}, which suggests the simplification obtained by setting T^{\top}_{*}=\scrB^{\top}\scrH_{z}, i.e.,

\scrEp(\scrHzT)\displaystyle\scrE_{p}(\scrH_{z}T_{*}) =\scrBzT\scrHzz1|z\scrHzT|2=\scrBz\scrB\scrHz\scrHzz1|z\scrHz\scrHz\scrB|2=0,\displaystyle=\frac{\scrB^{\top}z-T_{*}^{\top}\scrH_{z}^{\top}z}{1-|z_{\scrH_{z}T_{*}}|^{2}}=\frac{\scrB^{\top}z-\scrB^{\top}\scrH_{z}\scrH_{z}^{\top}z}{1-|z_{\scrH_{z}\scrH_{z}^{\top}\scrB}|^{2}}=0\hskip 1.0pt, (48)

provided T^{\top}_{*}T_{*} is invertible and |z_{\scrH_{z}T_{*}}|<1 (see Appendix C). For the last equality we use that \scrH_{z}\scrH_{z}^{\top}z=z_{\scrH_{z}}=z (i.e., the projection of z onto col(\scrH_{z}) is z itself). We remark that the matrix formalism of (48) has advantages even over the special case in (43). Figure 2 illustrates the geometry of the transformation T\mapsto\scrH_{z}T at T=T_{*}.

Figure 2: Left panel: Illustration of col(\scrH) relative to z_{\perp\scrH} in (35). The angles \vec{\theta}_{\scrH} between z_{\perp\scrH} and col(\scrB) are the arc cosines of the entries of \scrE_{p}(\scrH)=\scrB^{\top}z_{\perp\scrH}. Right panel: The optimal T_{*}=\scrH_{z}^{\top}\scrB yields a basis \scrH_{*} that spans col(\scrH_{z}T_{*}) and gives \scrB^{\top}z_{\perp\scrH_{*}}=\scrE_{p}(\scrH_{*})=0, setting each entry of \vec{\theta}_{\scrH_{*}} to 90^{\circ}.

The slick calculation above does not constitute our original derivation, which is heavy-handed and now superfluous. The advantage of (48) lies in its brevity and its quick bridge to the GPS program. Yet, (48) is not sufficient in view of Theorem 8, i.e., the optimal point T_{*}=\scrH_{z}^{\top}\scrB is not observed, nor can it be estimated from the observed data. To this end, we seek an invertible matrix K for which (similarly to (44)),

TK=\scrHz\scrBKT_{*}K=\scrH_{z}^{\top}\scrB K

may be estimated solely from Y, and then use Lemma 2 to conclude that \scrE_{p}(\scrH_{z}T_{*}K)=0 provided K is invertible. The choice of K is motivated by the fact that, as p\uparrow\infty,

\scrHz\scrB\scrB\scrH=(\scrH\scrB\scrB\scrH\scrEp(\scrH)\scrB\scrH)limp(Ψφ)Ψ\displaystyle\scrH_{z}^{\top}\scrB\scrB^{\top}\scrH=\left(\begin{array}[]{c}\scrH^{\top}\scrB\scrB^{\top}\scrH\\ \scrE^{\top}_{p}(\scrH)\scrB^{\top}\scrH\end{array}\right)\to\lim_{p\uparrow\infty}\left(\begin{array}[]{c}{{\Psi}}\\ \varphi^{\top}\end{array}\right){{\Psi}} (53)

where we used \scrB^{\top}z_{\perp\scrH}=\scrE_{p}(\scrH) in (23) and applied Theorem 4 to obtain the stated almost sure convergence (see Appendix C for details). The variables in the limit are computable from the data Y, and it is again notable that while we are unable to estimate \scrE_{p}(\scrH), the quantity \scrE^{\top}_{p}(\scrH)(\scrB^{\top}\scrH) admits an estimator, as did |\scrE_{p}(\scrH)|.

Our estimator (37)(\ref{Hsharp}) is now easily seen to take the following form.

\scrH=\scrHzT,T=(Ψφ)\scrMΦ1\displaystyle\hskip 32.0pt\scrH_{\sharp}=\scrH_{z}T_{\sharp}\hskip 1.0pt,\hskip 32.0ptT_{\sharp}=\Big{(}\begin{array}[]{c}{{\Psi}}\\ \varphi^{\top}\end{array}\Big{)}\scrM\Phi^{-1} (56)

This suggests taking K=\scrB\scrHΨ1\scrMΦ1K=\scrB^{\top}\scrH{{\Psi}}^{-1}\scrM\Phi^{-1} because (53)(\ref{HzBBH}) now implies that

limp|TKT|=0.\displaystyle\hskip 32.0pt\lim_{p\uparrow\infty}|T_{*}K-T_{\sharp}|=0\hskip 1.0pt. (57)

Appendix C proves that K is eventually invertible and applies (57) to deduce that \scrE_{p}(\scrH_{z}T_{*}K)=0 implies the desired conclusion \lim_{p\uparrow\infty}\scrE_{p}(\scrH_{z}T_{\sharp})=0.

6 A Numerical Example

We illustrate our results on a numerical example to provide a verification of Theorems 3, 4 and 9. We also study the convergence rates of various estimators, which are not supplied by our theory. Consider i.i.d. observations of y\in\bbR^{p_{\max}} where,

y=α+Bx+ϵ,\displaystyle y=\alpha+Bx+\epsilon\hskip 1.0pt, (58)

with \alpha\in\bbR^{p_{\max}} and a p_{\max}\times q matrix B, where the randomness comes from uncorrelated x\in\bbR^{q} and \epsilon\in\bbR^{p_{\max}} with var(x)=I_{q} and var(\epsilon)=\Gamma. Then, var(y)=\Sigma=BB^{\top}+\Gamma. Fixing n=120, q=7, p_{\max}=128{,}000, we simulate p_{\max}\times n data matrices Y with observations of y as their columns. The parameters \alpha,B,\Gamma are calibrated in Section 6.2 with the minimum variance portfolio problem described in Section 2.2 in mind.

We simulate 10510^{5} data matrices Ypmax×nY_{p_{\max}\times n}, selecting subsets Yp×nY_{p\times n} of size p×np\times n by taking p=500,2000,8000,32000,128000p=500,2000,8000,32000,128000 to study the asymptotics of three estimators. All three are based on a centered sample covariance S=YJY/nS=YJY^{\top}/n (see (20)(\ref{Jc})), the spectrum of which equals that of JYYJ/nJY^{\top}YJ/n and is computed from this n×nn\times n matrix. This results in a 7×77\times 7 diagonal matrix \calSp2\calS^{2}_{p} with the 77 largest eigenvalues of SS, as well as the Ψ2{{\Psi}}^{2} in (21)(\ref{snr-bulk}) and κp2\kappa^{2}_{p} in (25)(\ref{bulk2}). Our three estimators have the form,

^Σ=\scrH(\calSpΨ)2\scrH+γ^2I,(\scrH{\scrH,\scrH,\scrH}),\displaystyle\hskip 32.0pt\bm{\hat{}}{\Sigma}_{\natural}=\scrH_{\natural}(\calS_{p}{{\Psi}})^{2}\scrH^{\top}_{\natural}+\hat{\gamma}^{2}I\hskip 1.0pt,\hskip 16.0pt(\scrH_{\natural}\in\{\scrH,\scrH_{\flat},\scrH_{\sharp}\}), (59)

where γ^2=nκp2/p\hat{\gamma}^{2}=n\kappa_{p}^{2}/p and \scrH\scrH_{\natural} is one of three p×7p\times 7 matrices of orthonormal columns.

  1. (\scrH\scrH)

    The sample eigenvectors \scrH=\nu_{p\times 7}(S) are computed per (22) using the n\times n matrix JY^{\top}YJ/n, as sketched below this list. These vectors correspond to a pca covariance model in (59).

  2. (\scrH\scrH_{\flat})

    The matrix \scrH_{\flat} uses the gps estimator of Section 5.1 to issue a correction to only the first column of \scrH. In particular, we let \scrK equal \scrH in columns 2–7 and replace its first column by h_{z}t/|h_{z}t| with h_{z}t in (42) and (t_{1},t_{2}) given by (44). Finally, we set \scrH_{\flat}=\nu_{p\times 7}(\scrK\calS_{p}\Psi), computed analogously to (22) for efficiency.

  3. (\scrH\scrH_{\sharp})

    The corrected sample eigenvectors \scrH\scrH_{\sharp} are computed using (35)(\ref{zpH})(37)(\ref{Hsharp}).
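
The eigenvector computation referenced in item (\scrH) above can be sketched as follows: the leading eigenpairs of S=YJY^{\top}/n are recovered from the n\times n matrix JY^{\top}YJ/n so that no p\times p array is ever formed. The function name and inputs are hypothetical.

```python
import numpy as np

def leading_eigenpairs(Y, q):
    """Sketch: top-q eigenvalues and eigenvectors of S = Y J Y^T / n, computed
    from the n x n matrix J Y^T Y J / n (J is the centering projection)."""
    p, n = Y.shape
    J = np.eye(n) - np.ones((n, n)) / n
    G = J @ (Y.T @ Y) @ J / n                     # shares its nonzero spectrum with S
    vals, vecs = np.linalg.eigh(G)
    idx = np.argsort(vals)[::-1][:q]              # q largest eigenvalues
    S2, U = vals[idx], vecs[:, idx]
    H = (Y @ J) @ U / np.sqrt(n * S2)             # corresponding p x q eigenvectors of S
    return S2, H                                  # S2 = diagonal of calS_p^2, H = nu_{p x q}(S)
```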

To assess the performance of the three covariance estimators in (59)(\ref{nSig}) we test them on several metrics. With respect to the minimum variance portfolio application, for Σ=var(y)\Sigma=\textup{{var}}(y) and y\bbRpy\in\bbR^{p} in (58)(\ref{model}), the returns to pp financial assets, we compute

σmin2=min1p,w=1w,Σw(w\bbRp),\displaystyle\hskip 32.0pt\sigma^{2}_{\min}=\min_{\langle 1_{p},w\rangle=1}\langle w,\Sigma w\rangle\hskip 32.0pt(w\in\bbR^{p})\hskip 1.0pt, (60)

the true minimum variance. We compare the volatility σmin\sigma_{\min} to the realized volatility, \scrVp(\scrH)=w^,Σw^\scrV_{p}(\scrH_{\natural})=\sqrt{\langle\hat{w}_{\natural},\Sigma\hat{w}_{\natural}\rangle} (see (15)(\ref{tru})) of w^\bbRp\hat{w}_{\natural}\in\bbR^{p} that minimizes (14)(\ref{minvar}) with ^Σ=^Σ\bm{\hat{}}{\Sigma}=\bm{\hat{}}{\Sigma}_{\natural}.

We also study the length |\scrEp(\scrH)||\scrE_{p}(\scrH_{\natural})| of the quadratic optimization bias, the true and realized quadratic optima (taking c0=c1=1c_{0}=c_{1}=1 in (2)(\ref{conv}) and (3)(\ref{Qhx})) of Section 1.1,

maxx\bbRpQ(x)=1+μp22,Q(x^)=1+μ^p22D^p,\displaystyle\max_{x\in\bbR^{p}}Q(x)=1+\frac{\mu^{2}_{p}}{2}\hskip 1.0pt,\hskip 32.0ptQ(\hat{x})=1+\frac{\hat{\mu}^{2}_{p}}{2}\hat{D}_{p}\hskip 1.0pt, (61)

and their discrepancy D^p=2μ^p2\scrVp2\hat{D}_{p}=2-\hat{\mu}_{p}^{2}\scrV^{2}_{p}. The μp2=1p,Σ11p\mu^{2}_{p}=\langle 1_{p},\Sigma^{-1}1_{p}\rangle and μ^p2=1p,^Σ11p\hat{\mu}^{2}_{p}=\langle 1_{p},\bm{\hat{}}{\Sigma}^{-1}_{\natural}1_{p}\rangle (as well as x^=^Σ11p\hat{x}_{\natural}=\bm{\hat{}}{\Sigma}_{\natural}^{-1}1_{p}, and minimizers of (14)(\ref{minvar}), (60)(\ref{sigmin})) are efficiently computed via the Woodbury identity to obtain the inverses of the covariance matrices Σ\Sigma and ^Σ\bm{\hat{}}{\Sigma}_{\natural}.
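
A minimal sketch of this Woodbury-based computation is given below; it assumes the factored form \hat{\Sigma}_{\natural}=H_{\natural}H_{\natural}^{\top}+\hat{\gamma}^{2}I and uses only q\times q linear algebra (function and variable names are hypothetical).

```python
import numpy as np

def woodbury_solve(H, gamma2, v):
    """Sketch: returns (H H^T + gamma2 I)^{-1} v without forming a p x p inverse."""
    q = H.shape[1]
    A = gamma2 * np.eye(q) + H.T @ H                  # q x q system
    return (v - H @ np.linalg.solve(A, H.T @ v)) / gamma2

def min_variance_weights(H, gamma2):
    """Minimizer of (14): w = Sigma^{-1} 1_p / <1_p, Sigma^{-1} 1_p>."""
    ones = np.ones(H.shape[0])
    x = woodbury_solve(H, gamma2, ones)               # x_hat = Sigma^{-1} 1_p
    return x / (ones @ x)                             # ones @ x equals mu_hat_p^2
```

A diagonal (rather than scalar) residual covariance, such as the true \Gamma, can be handled analogously by replacing the factor 1/\hat{\gamma}^{2} with \Gamma^{-1} in the Woodbury formula.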

Figure 3: Realized minimum variance portfolio volatility versus portfolio size pp with lines, the sample means of Table 1, and two standard deviation confidence intervals.
Figure 4: Discrepancy D^p\hat{D}_{p} versus dimension pp with lines, the sample means found in Table 2 and Table 4, and two standard deviation confidence intervals, (the confidence intervals for the estimator \scrH\scrH_{\sharp} are too narrow to be clearly visible).
p        \sigma_{\min}   E \scrV_{p}(\scrH)   E \scrV_{p}(\scrH_{\flat})   E \scrV_{p}(\scrH_{\sharp})
500      7.69            11.43                10.82                        10.75
2000     4.09            9.55                 7.61                         6.39
8000     2.08            8.97                 6.15                         3.35
32000    1.04            8.77                 5.71                         1.68
128000   0.52            8.69                 5.54                         0.84
Table 1: Realized minimum variance portfolio volatilities (square roots of \scrV_{p}^{2}) computed using the three covariance estimators \hat{\Sigma} (pca), \hat{\Sigma}_{\flat} and \hat{\Sigma}_{\sharp}. Sample mean estimates for the expectation E X of the column variable X based on 10^{5} simulations.
p        \max_{x}Q(x)   E Q(\hat{x})   E \hat{D}_{p}(\scrH)   E Q(\hat{x}_{\sharp})   E \hat{D}_{p}(\scrH_{\sharp})
500      1.01           0.99           -1.16                  1.0                     1.22
2000     1.03           0.64           -7.11                  1.01                    0.93
8000     1.12           -4.98          -30.04                 1.04                    0.85
32000    1.47           -97.01         -121.81                1.18                    0.86
128000   2.88           -1572.9        -486.92                1.7                     0.87
Table 2: Realized maximum Q(\hat{x}) and discrepancy \hat{D}_{p}(\scrH) of pca for growing p are compared with Q(\hat{x}_{\sharp}) and \hat{D}_{p}(\scrH_{\sharp}), computed using the covariance \hat{\Sigma}_{\sharp}. The true maximum in column 2 applies the true covariance matrix \Sigma. Sample mean estimates for the expectation E X of the column variable X based on 10^{5} simulations.
p        E |\varphi|   E |\scrE_{p}(\scrH)|   E |\scrE_{p}(\scrH_{\sharp})|   p E |\scrE_{p}(\scrH)|^{2}   p E |\scrE_{p}(\scrH_{\sharp})|^{2}
500      0.237         0.298                  0.19                            44.5                         18.05
2000     0.219         0.26                   0.132                           135.6                        34.88
8000     0.222         0.237                  0.051                           447.8                        21.06
32000    0.225         0.228                  0.018                           1663.9                       10.03
128000   0.227         0.228                  0.008                           6629.1                       7.68
Table 3: Quadratic optimization bias length |\scrE_{p}(\scrH)| for pca, its estimator |\varphi| of Theorem 3 and |\scrE_{p}(\scrH_{\sharp})| are shown for growing p. Scaled variables p|\scrE_{p}(\scrH)|^{2} and p|\scrE_{p}(\scrH_{\sharp})|^{2} are provided to illustrate the convergence rates. Sample mean estimates for the expectation E X of the column variable X based on 10^{5} simulations.
p        \max_{x}Q(x)   E Q(\hat{x}_{\flat})   E \hat{D}_{p}(\scrH_{\flat})   E |\scrE_{p}(\scrH_{\flat})|   p E |\scrE_{p}(\scrH_{\flat})|^{2}
500      1.01           1.0                    0.52                           0.278                          38.7
2000     1.03           0.97                   -0.97                          0.236                          111.8
8000     1.12           0.37                   -5.92                          0.211                          354.5
32000    1.47           -10.27                 -26.19                         0.201                          1292.0
128000   2.88           -179.42                -104.76                        0.2                            5138.0
Table 4: Discrepancy and optimization bias metrics reported in Tables 2 & 3, recomputed with the vectors \scrH_{\flat} and the corresponding covariance estimator. Sample mean estimates for the expectation E X of the column variable X based on 10^{5} simulations.
p        E |\scrH_{\sharp}^{\top}\scrB\scrB^{\top}\scrH_{\sharp}-\Phi^{2}|   E |\Phi^{2}|   E |\scrH^{\top}\scrB\scrB^{\top}\scrH-\Psi^{2}|   E |\Psi^{2}|
500      0.502                                                               0.9805         0.5023                                           0.962
2000     0.2938                                                              0.9807         0.2952                                           0.9661
8000     0.0839                                                              0.9812         0.0875                                           0.9677
32000    0.0251                                                              0.9812         0.0262                                           0.9679
128000   0.0096                                                              0.9811         0.0099                                           0.9678
Table 5: The inner products of the columns of \scrH (pca) and \scrH_{\sharp} after projection onto col(\scrB), and their estimators \Psi^{2} and \Phi^{2}. The norms |\Psi^{2}| and |\Phi^{2}| estimate the largest squared projected lengths of the columns of \scrH and \scrH_{\sharp} respectively. Sample mean estimates for the expectation E X of the column variable X based on 10^{5} simulations.

6.1 Discussion of the results

Table 1 and Figure 3 summarize the simulations for our minimum variance portfolio application. Volatilities are quoted in annualized percent units (see Section 6.2), and only portfolio sizes p=500,2000,8000 should be considered practically relevant. Three sets of portfolio weights (\hat{w},\hat{w}_{\flat},\hat{w}_{\sharp}\in\bbR^{p}) constructed with the covariance models in (59) are tested. As predicted by (16), the realized portfolio volatility \scrV_{p}(\scrH) for the pca-model weights \hat{w} in the third column of Table 1 remains bounded away from zero (on average). The same holds for the partially corrected estimator \scrH_{\flat}, which nevertheless substantially decreases the volatility of the pca weights for larger portfolios. This estimator was also tested for q=4 in Goldberg et al. (2020), but for a model in which (\scrB^{\top}z)_{j}\to 0 as p\uparrow\infty for j=2,3,4. In this special case, the estimators \scrH_{\flat} and \scrH_{\sharp} coincide asymptotically. In our more realistic model of Section 6.2, all sample eigenvectors require correction, as is evident from comparing \scrV_{p}(\scrH_{\flat}) and \scrV_{p}(\scrH_{\sharp}) in Table 1. The latter portfolio volatility decays at a rate of roughly 1/\sqrt{p}. The true volatility \sigma_{\min} (second column of Table 1) also decays at this rate. Figure 3 depicts the much larger deviations about the average that the estimators \scrH and \scrH_{\flat} produce on the portfolio volatility metric relative to \scrH_{\sharp}. Surprisingly, \scrH_{\flat} produced the largest such deviations.

Table 2 and Figure 4 compare the pca-model \hat{\Sigma} to our optimization-bias free estimator \hat{\Sigma}_{\sharp} on the quadratic function objectives in (61). As predicted in Section 1.1, and by Theorem 1 in particular, the true objective value (the second column of Table 2) increases in p while the realized objective Q(\hat{x}) decreases rapidly. The (expected) discrepancy \hat{D}_{p}(\scrH) of the pca-model is shown to diverge to negative infinity linearly with the dimension, as predicted by Theorem 1 (i.e., the largest q eigenvalues of the covariance model of Section 6.2 diverge in p). The last two columns of Table 2 confirm that the realized maximum and discrepancy produced by the corrected eigenvectors \scrH_{\sharp} behave in a more desirable way. The discrepancy \hat{D}_{p}(\scrH_{\sharp}) appears to converge to a neighborhood of the optimal value one, while the realized maximum Q(\hat{x}_{\sharp}) has a trend similar to that of the true maximum. Figure 4 shows the large uncertainty of the average behaviour summarized in Table 2 that results from using the sample eigenvectors \scrH or their partially corrected version \scrH_{\flat} (see Table 4 for the averages). The uncertainty produced by the corrected eigenvectors \scrH_{\sharp} is negligible by comparison.

Table 3 summarizes our numerical results on the length of the quadratic optimization bias |\scrE_{p}(\hskip 1.0pt\cdot\hskip 1.0pt)| for the sample eigenvectors \scrH and the corrected vectors \scrH_{\sharp}. Table 4 supplies the same for the partially corrected eigenvectors \scrH_{\flat}. The first three columns of Table 3 confirm the findings of Theorem 3, i.e., the length |\scrE_{p}(\scrH)| of the optimization bias for pca may be accurately estimated from observable data in higher dimensions. We find that the expected length |\scrE_{p}(\scrH)| converges away from zero, and that p|\scrE_{p}(\scrH)|^{2} diverges in expectation. This is predicted by Theorem 3 since \scrH^{\top}z\in\bbR^{7} does not vanish as p grows. Table 4 presents similar findings for \scrH_{\flat}, which we have not analyzed theoretically. Column 4 of Table 3 confirms the predictions of Theorem 9, i.e., the corrected bias length |\scrE_{p}(\scrH_{\sharp})| vanishes as p grows. Our numerical findings expand on this by also demonstrating that p|\scrE_{p}(\scrH_{\sharp})|^{2} appears to be bounded (in expectation). This suggests a convergence rate of O(1/\sqrt{p}) for the corrected bias |\scrE_{p}(\scrH_{\sharp})|. The latter is consistent with the asymptotics of \hat{D}_{p}(\scrH_{\sharp}) in Table 2, which Theorem 1 forecasts to behave as O(1)(1-p|\scrE_{p}(\scrH_{\sharp})|^{2}).

Table 5 provides support for Theorem 4(c) and Theorem 9, which concern the projection of the estimated eigenvectors onto the population subspace col(\scrB). The convergence verified in columns two and four shows that the vectors in \scrH and \scrH_{\sharp} remain orthogonal after projection onto col(\scrB) because \Phi^{2} and \Psi^{2} are diagonal matrices. The largest elements of these matrices (presented as averages in columns three and five) estimate the largest squared lengths of the columns of \scrH and \scrH_{\sharp} in col(\scrB), respectively. This confirms that \scrH_{\sharp} has a larger such projection than does \scrH.

Figure 5: Left panel: Histograms of exposures to the market and style risk factors (i.e., the first three columns of the risk factor exposures matrix Ξ\Xi). Right panel: Histogram of asset specific volatilities (i.e., square roots of the diagonal entries of Γ\Gamma).
Figure 6: Visualization of the asset industry memberships matrix \calM=j=47ξjξj\calM=\sum_{j=4}^{7}\xi_{j}\xi^{\top}_{j} for ξj\xi_{j}, the jjth column of the risk factor exposure matrix Ξ\Xi (for a random sample of 500500 assets). The left panel shows \calM\calM and the right panel shows its Cuthill–McKee ordering. The block structure corresponds to industry groupings.

6.2 Population covariance model

Our covariance matrix calibration loosely follows the specification of the Barra US equity risk model (see Menchero, Orr and Wang (2011) and Blin, Guerard and Mark (2022)). To this end, we introduce a (random) vector of factor returns f\bbR7f\in\bbR^{7} and a pmax×7p_{\max}\times 7 exposure matrix Ξ\Xi which satisfy,

var(f)=\left(\begin{array}{rrrrrrr}
250 & 0 & 0 & 55 & 44 & 68 & -22\\
0 & 64 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 16 & 0 & 0 & 0 & 0\\
55 & 0 & 0 & 481 & 192 & -108 & 0\\
44 & 0 & 0 & 192 & 260 & -8 & 22\\
68 & 0 & 0 & -108 & -8 & 160 & -44\\
-22 & 0 & 0 & 0 & 22 & -44 & 121
\end{array}\right)\hskip 1.0pt,\hskip 16.0pt Bx=\Xi f\hskip 1.0pt, (69)

with Bx in (58) such that f=Ax and var(f)=AA^{\top}. The factor returns f are Gaussian with mean zero and the covariance in (69). The units are chosen so that the factor volatilities (square roots of the diagonal of var(f)) are in annualized percent. The columns of \Xi are exposures to the (q=7) fundamental risk factors (market risk, two style risk factors and four industry risk factors), and are generated as follows.

  1. The entries of the first column (exposures to market risk) of Ξ\Xi are drawn as i.i.d. normal with mean 1.01.0 and standard deviation 0.250.25. The second and third columns of Ξ\Xi (style risk factors) have i.i.d. entries that are normal with mean zero and standard deviations 0.50.5 and 1.01.0 respectively for those columns.

    The last four columns of \Xi are initialized to zero and, for each row i, independently of all other rows, we select two industries I_{1} and I_{2} from \{1,2,3,4\} uniformly at random and with replacement. Then, drawing U_{1} and U_{2} that are independent and uniform in (0,1), we add U_{1} to \Xi_{iI_{1}} and U_{2} to \Xi_{iI_{2}}, as sketched below.
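
A minimal numpy sketch of this exposure construction follows; it adopts the reading in which the two industry draws may coincide (consistent with the interpretation in the next paragraph), and all names are hypothetical.

```python
import numpy as np

def make_exposures(p_max, rng):
    """Sketch: the p_max x 7 exposure matrix Xi described above.
    Column 0: market; columns 1-2: styles; columns 3-6: the four industries."""
    Xi = np.zeros((p_max, 7))
    Xi[:, 0] = rng.normal(1.0, 0.25, size=p_max)    # market exposures
    Xi[:, 1] = rng.normal(0.0, 0.50, size=p_max)    # first style factor
    Xi[:, 2] = rng.normal(0.0, 1.00, size=p_max)    # second style factor
    for i in range(p_max):
        i1, i2 = rng.integers(3, 7, size=2)         # two industry draws (may coincide)
        Xi[i, i1] += rng.uniform()                  # adds U_1 to column I_1
        Xi[i, i2] += rng.uniform()                  # adds U_2 to column I_2
    return Xi
```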

The left panel of Figure 5 contains histograms of the first three columns of \Xi. This calibration of the market and style risk factors is similar to that in Goldberg et al. (2020), who do not consider industry risk, and compare the estimators \scrH and \scrH_{\flat} in simulation. The entries of the last four columns, which correspond to industry risk factors, have the following interpretation. Each asset chooses two industries for membership with an exposure of 0.5 to each on average. When the chosen industries are the same, that exposure is 1.0 on average (i.e., U_{1}+U_{2}). Figure 6 supplies a visual illustration of the structure of these industry memberships. The industry risk factors drive the poor performance of the estimator \scrH_{\flat} in our simulations due to the nonzero projection that the corresponding four columns of \Xi have onto col(z). The latter translates to components of the optimization bias vector \scrE_{p}(\scrH_{\flat}) that materially deviate from zero, of which only the first is (suboptimally) corrected.

The asset specific returns \epsilon\in\bbR^{p_{\max}} in (58) are drawn from a mean-zero Gaussian distribution with a diagonal covariance matrix var(\epsilon)=\Gamma. We take \Gamma_{ii}=\gamma_{i}^{2}, for asset specific volatilities \gamma_{i} drawn as independent copies of 25+75\times Z where Z is a \text{Beta}(4,16) distributed random variable. These are quoted in annualized percent units, and we refer the reader to Clarke, De Silva and Thorley (2011) for typical values that are estimated in practice. Lastly, the expected return vector \alpha\in\bbR^{p_{\max}} in (58) is taken as \alpha=\Xi\sigma_{f} for \sigma_{f}=\sqrt{\text{diag}(var(f))}\approx(15.81,8,4,21.93,16.12,12.65,11).
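
The remaining pieces of the calibration can be sketched as follows; the function assumes an exposure matrix Xi built as above, and every identifier in it is hypothetical.

```python
import numpy as np

def calibrate_and_simulate(Xi, n, rng):
    """Sketch: given Xi (p_max x 7), build B, the specific vols and alpha of
    Section 6.2, and draw one data matrix Y with columns distributed as (58)."""
    var_f = np.array([                                 # factor covariance (69)
        [250,  0,  0,   55,  44,   68, -22],
        [  0, 64,  0,    0,   0,    0,   0],
        [  0,  0, 16,    0,   0,    0,   0],
        [ 55,  0,  0,  481, 192, -108,   0],
        [ 44,  0,  0,  192, 260,   -8,  22],
        [ 68,  0,  0, -108,  -8,  160, -44],
        [-22,  0,  0,    0,  22,  -44, 121]], float)

    w, V = np.linalg.eigh(var_f)
    A = V * np.sqrt(np.clip(w, 0.0, None))             # var(f) = A A^T, so f = A x
    B = Xi @ A                                         # B x = Xi f per (69)

    p_max = Xi.shape[0]
    gammas = 25 + 75 * rng.beta(4, 16, size=p_max)     # specific vols (annual %)
    alpha = Xi @ np.sqrt(np.diag(var_f))               # alpha = Xi sigma_f

    X = rng.normal(size=(n, 7))                        # factor scores, var(x) = I_7
    eps = gammas[:, None] * rng.normal(size=(p_max, n))
    Y = alpha[:, None] + B @ X.T + eps                 # p_max x n data matrix
    return Y, B, gammas, alpha
```

For instance, with rng = np.random.default_rng(0), calling calibrate_and_simulate(make_exposures(128_000, rng), 120, rng) produces one simulated p_{\max}\times n data matrix of the kind used in Section 6.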

Appendix A Proofs for Section 2

By direct computation based on the definition in (7)(\ref{eH}) we obtain,

z,zH=Hz,(HH)1Hz=HHz,HHz=|zH|2\displaystyle\langle z,z_{H}\rangle=\langle H^{\top}z,(H^{\top}H)^{-1}H^{\top}z\rangle=\langle HH^{\dagger}z,HH^{\dagger}z\rangle=|z_{H}|^{2} (70)

and, recalling that z=ζ|ζ|z=\frac{\zeta}{|\zeta|}, the above yields the following useful identities.

|zzH|2=1|zH|2=z,zzH=1|ζH|2/|ζ|2\displaystyle|z-z_{H}|^{2}=1-|z_{H}|^{2}=\langle z,z-z_{H}\rangle=1-|\zeta_{H}|^{2}/|\zeta|^{2} (71)

The right side is bounded away from zero in pp under Assumption 2. Throughout, we regard γ^\hat{\gamma} as a sequence in pp that is bounded in (0,)(0,\infty). We also introduce an auxiliary sequence r=rpr=r_{p}\uparrow\infty to generalize the rates in Assumptions 1 and 2 so that both BB/rpB^{\top}B/r_{p} and HH/rpH^{\top}H/r_{p} converge to invertible q×qq\times q matrices.

We begin by expanding on some of the calculations in Section 1.1 and Section 2. Starting with (12)(\ref{hQx}), the maximizer of Q^()\hat{Q}(\hskip 1.0pt\cdot\hskip 1.0pt) is easily calculated as x^=c1^Σ1ζ\hat{x}=c_{1}\bm{\hat{}}{\Sigma}^{-1}\zeta, and

maxx\bbRpQ^(x)\displaystyle\hskip 32.0pt\max_{x\in\bbR^{p}}\hat{Q}(x) =c0+c12ζ,^Σ1ζc122ζ,^Σ1ζ\displaystyle=c_{0}+c_{1}^{2}\hskip 1.0pt\langle\zeta,\bm{\hat{}}{\Sigma}^{-1}\zeta\rangle-\frac{c_{1}^{2}}{2}\hskip 1.0pt\langle\zeta,\bm{\hat{}}{\Sigma}^{-1}\zeta\rangle
=c0+c12μ^p22\displaystyle=c_{0}+\frac{c_{1}^{2}\hat{\mu}^{2}_{p}}{2}

justifying the expression for Q^(x^)\hat{Q}(\hat{x}) below (12)(\ref{hQx}) with μ^p2=ζ,^Σ1ζ\hat{\mu}_{p}^{2}=\langle\zeta,\bm{\hat{}}{\Sigma}^{-1}\zeta\rangle as well as (2)(\ref{conv}).

Define w^=^Σ1ζ/μ^p2=x^/(c1μ^p2)\hat{w}=\bm{\hat{}}{\Sigma}^{-1}\zeta\hskip 1.0pt/\hat{\mu}_{p}^{2}=\hat{x}\hskip 1.0pt/(c_{1}\hat{\mu}_{p}^{2}) and set \scrVp2=w^,Σw^\scrV^{2}_{p}=\langle\hat{w},\Sigma\hat{w}\rangle per (15)(\ref{tru}). Then,

Q(x^)\displaystyle{Q}(\hat{x}) =c0+c12μ^p212x^,Σx^\displaystyle=c_{0}+c_{1}^{2}\hskip 1.0pt\hat{\mu}_{p}^{2}-\frac{1}{2}\langle\hat{x},\Sigma\hat{x}\rangle
=c0+c12μ^p22(2x^,Σx^c12μ^p2)\displaystyle=c_{0}+\frac{c_{1}^{2}\hskip 1.0pt\hat{\mu}^{2}_{p}}{2}\Big{(}2-\frac{\langle\hat{x},\Sigma\hat{x}\rangle}{c_{1}^{2}\hat{\mu}^{2}_{p}}\Big{)}
=c0+c12μ^p22(2μ^p2\scrVp2)\displaystyle=c_{0}+\frac{c_{1}^{2}\hskip 1.0pt\hat{\mu}^{2}_{p}}{2}\hskip 1.0pt\big{(}2-\hat{\mu}^{2}_{p}\scrV_{p}^{2}\big{)}

which is identical to (13)(\ref{QhxD}) with D^p=2μ^p2\scrVp2\hat{D}_{p}=2-\hat{\mu}^{2}_{p}\hskip 1.0pt\scrV_{p}^{2}.

Lastly, we recognize w^\hat{w} as the (unique) solution of (14)(\ref{minvar}). The following provides a useful decomposition of these solutions.

Lemma 10.

Suppose H=H_{p\times q} is such that \lim_{p\uparrow\infty}H^{\top}H/r_{p} exists and is an invertible q\times q matrix. Then, for vectors v\in\bbR^{p} with \sup_{p}|v|<\infty, the minimizer \hat{w} of (14) has

w^=ζζHζ,ζζH+v|ζ|rp=1|ζ|(zzH|zzH|2+vrp).\hat{w}=\frac{\zeta-\zeta_{H}}{\langle\zeta,\zeta-\zeta_{H}\rangle}+\frac{v}{|\zeta|r_{p}}=\frac{1}{|\zeta|}\Big{(}\frac{z-z_{H}}{\hskip 1.0pt|z-z_{H}|^{2}}+\frac{v}{r_{p}}\Big{)}\hskip 1.0pt. (72)
Proof.

We begin with an expression of ^Σ1\bm{\hat{}}{\Sigma}^{-1} via the Woodbury identity.

^Σ1=1γ^2(Ipγ^2H(Iq+γ^2HH)1H)\displaystyle\bm{\hat{}}{\Sigma}^{-1}=\frac{1}{\hat{\gamma}^{2}}\hskip 1.0pt\Big{(}I_{p}-\hat{\gamma}^{-2}H\big{(}I_{q}+\hat{\gamma}^{-2}H^{\top}H\big{)}^{-1}H^{\top}\Big{)}

Next, consider the singular value decomposition H=\scrH\calSp\scrUH=\scrH\calS_{p}\scrU^{\top} where \scrH\bbRp×q\scrH\in\bbR^{p\times q} and \scrUq×q\scrU\in\mathbb{R}^{q\times q} have orthonormal columns and \calSpq×q\calS_{p}\in\mathbb{R}^{q\times q} is diagonal. Then,

^Σ1\displaystyle\bm{\hat{}}{\Sigma}^{-1} =1γ^2(Ip\scrH\calSp(γ^2Iq+\calSp2)1\calSp\scrH)\displaystyle=\frac{1}{\hat{\gamma}^{2}}\hskip 1.0pt\big{(}I_{p}-\scrH\calS_{p}\hskip 1.0pt\big{(}\hat{\gamma}^{2}I_{q}+\calS^{2}_{p}\big{)}^{-1}\calS_{p}\scrH^{\top}\big{)}
=1γ^2(Ip\scrH\scrH+γ^2\scrH(γ^2Iq+\calSp2)1\scrH)\displaystyle=\frac{1}{\hat{\gamma}^{2}}\hskip 1.0pt\big{(}I_{p}-\scrH\scrH^{\top}+\hat{\gamma}^{2}\scrH\big{(}\hat{\gamma}^{2}I_{q}+\calS_{p}^{2}\big{)}^{-1}\scrH^{\top}\big{)} (73)

where at the last step we utilized that d2γ^2+d2=1γ^2γ^2+d2\frac{d^{2}}{\hat{\gamma}^{2}+d^{2}}=1-\frac{\hat{\gamma}^{2}}{\hat{\gamma}^{2}+d^{2}} and that \calSp\calS_{p} is diagonal.

Starting with the expression w^=^Σ1ζ/μ^p2\hat{w}=\bm{\hat{}}{\Sigma}^{-1}\zeta\hskip 1.0pt/\hskip 1.0pt\hat{\mu}_{p}^{2}, we define

Cp=γ^2\scrH(γ^2I+\calSp2)1\scrHz,\displaystyle C_{p}=\hat{\gamma}^{2}\hskip 1.0pt\scrH(\hat{\gamma}^{2}I+\calS_{p}^{2})^{-1}\scrH^{\top}z\hskip 1.0pt, (74)

substitute (73)(\ref{eq:portcalc1}) and use that 11+δ=1δ1+δ\frac{1}{1+\delta}=1-\frac{\delta}{1+\delta} and ζH=HHζ=\scrH\scrHζ\zeta_{H}=HH^{\dagger}\zeta=\scrH\scrH^{\top}\zeta, to obtain

w^\displaystyle\hat{w} =ζζH+|ζ|Cpζ,ζζH+|ζ|ζ,Cp=(ζζHζ,ζζH+|ζ|Cpζ,ζζH)(1δp1+δp)\displaystyle=\frac{\zeta-\zeta_{H}+|\zeta|\hskip 1.0ptC_{p}}{\langle\zeta,\zeta-\zeta_{H}\rangle+|\zeta|\langle\zeta,C_{p}\rangle}=\Big{(}\frac{\zeta-\zeta_{H}}{\langle\zeta,\zeta-\zeta_{H}\rangle}+\frac{|\zeta|\hskip 1.0ptC_{p}}{\langle\zeta,\zeta-\zeta_{H}\rangle}\Big{)}\Big{(}1-\frac{\delta_{p}}{1+\delta_{p}}\Big{)}

for δp=|ζ|ζ,Cpζ,ζζH=z,Cp|zzH|2\delta_{p}=\frac{|\zeta|\langle\zeta,C_{p}\rangle}{\langle\zeta,\zeta-\zeta_{H}\rangle}=\frac{\langle z,C_{p}\rangle}{|z-z_{H}|^{2}} per (71)(\ref{zmzH}). This identifies vv in (72)(\ref{hwu}) via the relation,

v|ζ|rp\displaystyle\frac{v}{|\zeta|r_{p}} =|ζ|Cpζ,ζζH(1δp1+δp)ζζHζ,ζζH(δp1+δp)\displaystyle=\frac{|\zeta|\hskip 1.0ptC_{p}}{\langle\zeta,\zeta-\zeta_{H}\rangle}\Big{(}1-\frac{\delta_{p}}{1+\delta_{p}}\Big{)}-\frac{\zeta-\zeta_{H}}{\langle\zeta,\zeta-\zeta_{H}\rangle}\Big{(}\frac{\delta_{p}}{1+\delta_{p}}\Big{)}
=|ζ|Cpζ,ζζH(11+δp)ζζHζ,ζζH(δp1+δp)\displaystyle=\frac{|\zeta|\hskip 1.0ptC_{p}}{\langle\zeta,\zeta-\zeta_{H}\rangle}\Big{(}\frac{1}{1+\delta_{p}}\Big{)}-\frac{\zeta-\zeta_{H}}{\langle\zeta,\zeta-\zeta_{H}\rangle}\Big{(}\frac{\delta_{p}}{1+\delta_{p}}\Big{)}
=|ζ|Cp(ζζH)δp(1+δp)ζ,ζζH\displaystyle=\frac{|\zeta|C_{p}-(\zeta-\zeta_{H})\hskip 1.0pt\delta_{p}}{(1+\delta_{p})\langle\zeta,\zeta-\zeta_{H}\rangle}
=1|ζ|(Cp(zzH)δp(1+δp)|zzH|2)\displaystyle=\frac{1}{|\zeta|}\bigg{(}\frac{C_{p}-(z-z_{H})\hskip 1.0pt\delta_{p}}{(1+\delta_{p})\hskip 1.0pt|z-z_{H}|^{2}}\bigg{)}\hskip 1.0pt.

Because 0δp|Cp|/|zzH|20\leq\delta_{p}\leq|C_{p}|\hskip 1.0pt/|z-z_{H}|^{2}, to conclude the proof, it now suffices to show that |Cp||C_{p}| is O(1/rp)O(1/r_{p}) so that also supp|v|<\sup_{p}|v|<\infty as required. We have,

|Cp|\displaystyle|C_{p}| γ^2|\scrH(γ^2I+\calSp2)1\scrH|\displaystyle\leq\hat{\gamma}^{2}|\scrH(\hat{\gamma}^{2}I+\calS_{p}^{2})^{-1}\scrH^{\top}|
=maxjqγ^2(γ^2I+\calSp2)jj\displaystyle=\max_{j\leq q}\hskip 1.0pt\frac{\hat{\gamma}^{2}}{(\hat{\gamma}^{2}I+\calS^{2}_{p})_{jj}}
maxjqγ^2/(\calSp2)jj\displaystyle\leq\max_{j\leq q}\hskip 1.0pt\hat{\gamma}^{2}/(\calS_{p}^{2})_{jj}
=γ^2|\scrU\calSp2\scrU|\displaystyle=\hat{\gamma}^{2}\hskip 1.0pt|\scrU\calS_{p}^{-2}\scrU^{\top}|
=(γ^2/rp)|(HH)1rp|.\displaystyle=(\hat{\gamma}^{2}/r_{p})\hskip 1.0pt|(H^{\top}H)^{-1}r_{p}|\hskip 1.0pt. (75)

Since the spectral norm |||\hskip 1.0pt\cdot\hskip 1.0pt| and matrix inversion (on the set of invertible matrices) are continuous functions, our assumption on limpHH/rp\lim_{p\uparrow\infty}H^{\top}H/r_{p} implies that rp|(HH)1|r_{p}\hskip 1.0pt|(H^{\top}H)^{-1}| converges to a finite number. This, together with (75)(\ref{eq: portcalc2}), completes the proof. ∎
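The exact representation of w^\hat{w} obtained above (the display preceding the definition of δp\delta_{p}) can also be verified numerically. The sketch below assumes the plug-in covariance HH^⊤ + γ̂²I_p, ŵ = Σ̂^{-1}ζ/μ̂_p² with μ̂_p² = ⟨ζ, Σ̂^{-1}ζ⟩, z = ζ/|ζ| and ζ_H = HH^†ζ, consistent with the notation of the proof; all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, gamma2 = 300, 4, 0.8                      # illustrative values

H = rng.standard_normal((p, q))
zeta = rng.standard_normal(p)
Sigma_hat = H @ H.T + gamma2 * np.eye(p)        # plug-in covariance

# hat{w} = Sigma_hat^{-1} zeta / mu_p^2 with mu_p^2 = <zeta, Sigma_hat^{-1} zeta>
Sinv_zeta = np.linalg.solve(Sigma_hat, zeta)
w_hat = Sinv_zeta / (zeta @ Sinv_zeta)

# Pieces appearing in the proof of Lemma 10
z = zeta / np.linalg.norm(zeta)
zeta_H = H @ np.linalg.lstsq(H, zeta, rcond=None)[0]      # H H^dagger zeta
scrH, s, _ = np.linalg.svd(H, full_matrices=False)
C_p = gamma2 * scrH @ ((scrH.T @ z) / (gamma2 + s**2))    # C_p of (74)

# The exact identity in the display preceding the definition of delta_p
numer = zeta - zeta_H + np.linalg.norm(zeta) * C_p
denom = zeta @ (zeta - zeta_H) + np.linalg.norm(zeta) * (zeta @ C_p)
assert np.allclose(w_hat, numer / denom)
```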

Proof of Theorem 1.

Continuing from the expression D^p=2μ^p2\scrVp2\hat{D}_{p}=2-\hat{\mu}^{2}_{p}\hskip 1.0pt\scrV_{p}^{2}, we first address the asymptotics of \scrVp2=w^,Σw^\scrV_{p}^{2}=\langle\hat{w},\Sigma\hat{w}\rangle, and using (9)(\ref{bSig}) this yields,

\scrVp2=|Bw^|2+w^,Γw^.\scrV_{p}^{2}=|B^{\top}\hat{w}|^{2}+\langle\hat{w},\Gamma\hat{w}\rangle. (76)

Applying (72)(\ref{hwu}) of Lemma 10 and the positive definiteness of Γ\Gamma per Assumption 1,

0w^,Γw^=1|ζ|2(uH,ΓuH|zzH|2+2v,ΓuHrp|zzH|+v,Γvrp2)\displaystyle 0\leq\langle\hat{w},\Gamma\hat{w}\rangle=\frac{1}{|\zeta|^{2}}\Big{(}\frac{\langle u_{H},\Gamma u_{H}\rangle}{|z-z_{H}|^{2}}+\frac{2\langle v,\Gamma u_{H}\rangle}{r_{p}|z-z_{H}|}+\frac{\langle v,\Gamma v\rangle}{r_{p}^{2}}\Big{)} (77)

where uH=zzH|zzH|u_{H}=\frac{z-z_{H}}{|z-z_{H}|} and supp|v|<\sup_{p}|v|<\infty. Turning our attention to the first term in (76)(\ref{V2}) by letting w^H=ζζHζ,ζζH=1|ζ|zzH|zzH|2\hat{w}_{H}=\frac{\zeta-\zeta_{H}}{\langle\zeta,\zeta-\zeta_{H}\rangle}=\frac{1}{|\zeta|}\frac{z-z_{H}}{|z-z_{H}|^{2}}, we again apply Lemma 10 to deduce that

|Bw^|2\displaystyle|B^{\top}\hat{w}|^{2} =|B(w^H+v|ζ|rp)|2=|Bw^H|2+2Bw^H,Bv|ζ|rp+(|Bv||ζ|rp)2.\displaystyle=\Big{|}B^{\top}\big{(}\hat{w}_{H}+\frac{v}{|\zeta|r_{p}}\big{)}\Big{|}^{2}=|B^{\top}\hat{w}_{H}|^{2}+2\frac{\langle B^{\top}\hat{w}_{H},B^{\top}v\rangle}{|\zeta|r_{p}}+\bigg{(}\frac{|B^{\top}v|}{|\zeta|r_{p}}\bigg{)}^{2}.

Since \scrBw^H=\scrEp(H)|ζ||zzH|\scrB^{\top}\hat{w}_{H}=\frac{\scrE_{p}(H)}{|\zeta||z-z_{H}|} per (11)(\ref{optbias}), the decomposition BB=\scrBΛp2\scrBBB^{\top}=\scrB\Lambda^{2}_{p}\scrB^{\top} yields

|Bw^|2\displaystyle|B^{\top}\hat{w}|^{2} =|Λp\scrEp(H)|2|ζ|2|zzH|2+2Λp2\scrEp(H),\scrBvrp|ζ|2|zzH|+(|Λp\scrBv||ζ|rp)2.\displaystyle=\frac{|\Lambda_{p}\hskip 1.0pt\scrE_{p}(H)|^{2}}{|\zeta|^{2}|z-z_{H}|^{2}}+2\frac{\langle\Lambda^{2}_{p}\hskip 1.0pt\scrE_{p}(H),\scrB^{\top}v\rangle}{r_{p}\hskip 1.0pt|\zeta|^{2}|z-z_{H}|}+\bigg{(}\frac{|\Lambda_{p}\scrB^{\top}v|}{|\zeta|r_{p}}\bigg{)}^{2}. (78)

Considering D^p\hat{D}_{p}, we examine μ^p2=ζ,^Σ1ζ\hat{\mu}_{p}^{2}=\langle\zeta,\bm{\hat{}}{\Sigma}^{-1}\zeta\rangle for pp large. Using (73)(\ref{eq:portcalc1}) and (71)(\ref{zmzH}),

μ^p2=z,^Σ1z|ζ|2=(|ζ|γ^)2(|zzH|2+z,Cp)\displaystyle\hat{\mu}_{p}^{2}=\langle z,\bm{\hat{}}{\Sigma}^{-1}z\rangle|\zeta|^{2}=\bigg{(}\frac{|\zeta|}{\hat{\gamma}}\bigg{)}^{2}\hskip 2.0pt\Big{(}|z-z_{H}|^{2}+\langle z,C_{p}\rangle\Big{)} (79)

where CpC_{p}, in Lemma 10, was shown to have z,Cp\langle z,C_{p}\rangle in O(1/rp)O(1/r_{p}). We have |Λp2||\Lambda_{p}^{2}| in O(rp)O(r_{p}) for our modification of Assumption 1 and assuming |ζ|/rp|\zeta|/r_{p} vanishes,

μ^p2w^,Γw^\displaystyle\hat{\mu}_{p}^{2}\langle\hat{w},\Gamma\hat{w}\rangle =uH,ΓuHγ^2+uH,ΓuHz,Cpγ^2|zzH|2+op(Γ)+op(Γ)z,Cp|zzH|2\displaystyle=\frac{\langle u_{H},\Gamma u_{H}\rangle}{\hat{\gamma}^{2}}+\frac{\langle u_{H},\Gamma u_{H}\rangle\langle z,C_{p}\rangle}{\hat{\gamma}^{2}|z-z_{H}|^{2}}+o_{p}(\Gamma)+\frac{o_{p}(\Gamma)\langle z,C_{p}\rangle}{|z-z_{H}|^{2}}

for op(Γ)=2v,ΓuH|zzH|rp+v,Γv|zzH|2γ^2rp2o_{p}(\Gamma)=\frac{2\langle v,\Gamma u_{H}\rangle|z-z_{H}|\hskip 1.0ptr_{p}+\langle v,\Gamma v\rangle|z-z_{H}|^{2}}{\hat{\gamma}^{2}r_{p}^{2}}, which is in O(1/rp)O(1/r_{p}) since the eigenvalues of Γp×p\Gamma_{p\times p} are bounded in pp. So, the last three terms in the above display are in O(1/rp)O(1/r_{p}).

Similarly, combining (78)(\ref{Bhw2}) and (79)(\ref{mhp2}), we obtain

μ^p2|Bw^|2\displaystyle\hat{\mu}_{p}^{2}|B^{\top}\hat{w}|^{2} =|Λp\scrEp(H)|2γ^2+|Λp\scrEp(H)|2z,Cpγ^2|zzH|2+op(B)+op(B)z,Cp|zzH|2\displaystyle=\frac{|\Lambda_{p}\hskip 1.0pt\scrE_{p}(H)|^{2}}{\hat{\gamma}^{2}}+\frac{|\Lambda_{p}\hskip 1.0pt\scrE_{p}(H)|^{2}\langle z,C_{p}\rangle}{\hat{\gamma}^{2}|z-z_{H}|^{2}}+o_{p}(B)+\frac{o_{p}(B)\langle z,C_{p}\rangle}{|z-z_{H}|^{2}}

where the 2nd term is in O(|\scrEp(H)|2)O(|\scrE_{p}(H)|^{2}) and op(B)o_{p}(B) is in O(|\scrEp(H)|+1/rp)O(|\scrE_{p}(H)|+1/r_{p}) as

op(B)\displaystyle o_{p}(B) =2Λp2\scrEp(H),\scrBv|zzH|rpγ^2+1rp(|Λp\scrBv||zzH|γ^rp)2\displaystyle=\frac{2\langle\Lambda^{2}_{p}\hskip 1.0pt\scrE_{p}(H),\scrB^{\top}v\rangle|z-z_{H}|}{r_{p}\hskip 1.0pt\hat{\gamma}^{2}}+\frac{1}{r_{p}}\bigg{(}\frac{|\Lambda_{p}\scrB^{\top}v||z-z_{H}|}{\hat{\gamma}\sqrt{r_{p}}}\bigg{)}^{2} (80)

where we note that |\scrB||\scrB| and |v||v| are bounded in pp. The claim now follows. ∎

Appendix B Proofs for Section 3

Essential for our proofs is Weyl’s inequality for eigenvalue perturbations of a matrix (Weyl, 1912). In particular, for symmetric m×mm\times m matrices AA and NN,

maxj|αjαj||N|\max_{j}|\alpha_{j}-\alpha_{j}^{\prime}|\leq|N|

where αj\alpha_{j} and αj\alpha_{j}^{\prime} denote the jjth largest eigenvalues of AA and A+NA+N respectively.
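As a quick numerical illustration of Weyl's inequality (the sizes and perturbation scale below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 50
A = rng.standard_normal((m, m)); A = (A + A.T) / 2        # symmetric A
N = 0.1 * rng.standard_normal((m, m)); N = (N + N.T) / 2  # symmetric perturbation

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]              # eigenvalues of A, descending
alpha_prime = np.sort(np.linalg.eigvalsh(A + N))[::-1]    # eigenvalues of A + N

# Weyl: eigenvalues move by at most the spectral norm of the perturbation
assert np.max(np.abs(alpha - alpha_prime)) <= np.linalg.norm(N, 2) + 1e-12
```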

Define Wp=JYYJ\bbRn×nW_{p}=JY^{\top}Y\hskip 1.0ptJ\in\bbR^{n\times n} with J=Igg|g|2J=I-\frac{\hskip 2.0ptgg^{\top}}{|g|^{2}} in (20)(\ref{Jc}) and YY in (17)(\ref{data}).

Wp=J\calXBB\calXJ+J\calE\calEJ+J\calXB\calEJ+(J\calXB\calEJ)\displaystyle W_{p}=J\calX B^{\top}B\calX^{\top}J+J\calE^{\top}\calE J+J\calX B^{\top}\calE J+(J\calX B^{\top}\calE J)^{\top} (81)

By Assumption 1(b) the following q×qq\times q limit matrix exists, where the right side is its eigenvalue decomposition, with q×qq\times q orthogonal \scrW\scrW and invertible, diagonal Λ\Lambda.

limpBBp=\scrWΛ2\scrW\displaystyle\lim_{p\uparrow\infty}\frac{B^{\top}B}{p}=\scrW\Lambda^{2}\scrW^{\top} (82)

For γ2\gamma^{2} of Assumption 6(d), define the n×qn\times q matrix MM and n×nn\times n matrix LL as,

L=MM+γ2J,M=J\calX\scrWΛ.\displaystyle L=MM^{\top}+\gamma^{2}\hskip 1.0ptJ\hskip 1.0pt,\hskip 32.0ptM=J\calX\scrW\Lambda\hskip 1.0pt. (83)

Let λj,n2(M)\lambda^{2}_{j,n}(M) denote the jjth largest eigenvalue of MMMM^{\top} (also, MMM^{\top}M for jqj\leq q) associated with the jjth column of νn×n(M)\nu_{n\times n}(M), the eigenvectors of MMMM^{\top}. By Assumption 6(c), we have λj,n2(M)>0\lambda^{2}_{j,n}(M)>0 for jqj\leq q and λj,n2(M)=0\lambda^{2}_{j,n}(M)=0 otherwise.

Lemma 11.

Under Assumptions 1 & 6, almost surely, limpWp/p=L\lim_{p\uparrow\infty}W_{p}/p=L and,

limp\scrsj,p2/p={(λj,n2(M)+γ2)/n1j<n,γ2/nj=n,J=I,0otherwise.\displaystyle\lim_{p\uparrow\infty}\scrs^{2}_{j,p}/p=\left\{\begin{array}[]{cc}(\lambda^{2}_{j,n}(M)+\gamma^{2})\hskip 1.0pt/n&1\leq j<n\hskip 1.0pt,\\ \gamma^{2}/n&j=n,\hskip 1.0ptJ=I\hskip 1.0pt,\\ 0&\text{otherwise.}\end{array}\right.

As relevant for Assumption 6(b), the above implies that n+n_{+} (see (21)(\ref{snr-bulk})) almost surely converges to nn whenever J=IJ=I and n+n_{+} converges to n1n-1 otherwise, as pp\uparrow\infty.

Proof.

We address the convergence of Wp/pW_{p}/p as follows. The sum of the first and second terms in (81)(\ref{Lp}) (scaled by 1/p1/p) converges to LL due to Assumption 1(b) and Assumption 6(d). The last two terms in (81)(\ref{Lp}) (scaled by 1/p1/p) vanish by Assumption 6(e). Since the nonzero eigenvalues of WpW_{p} are those of YJJY=YJYYJJ^{\top}Y^{\top}=YJY^{\top}, almost surely, \scrsj,p2/p\scrs^{2}_{j,p}/p converges to the jjth eigenvalue of LL by Weyl’s inequality.

It remains to find the eigenvalues of LL. For J=IJ=I, it is easy to check that the eigenvalues of LL are just λj,n2(M)+γ2\lambda^{2}_{j,n}(M)+\gamma^{2} for j=1,,nj=1,\dots,n with eigenvectors νn×n(M)\nu_{n\times n}(M). When JIJ\neq I, we have Lg=0Lg=0 so that for any other eigenvector vv of LL, we have v,g=0\langle v,g\rangle=0 and consequently Jv=vJv=v. It follows that, when JIJ\neq I, the eigenvalues of LL are given by λj,n2(M)+γ2\lambda^{2}_{j,n}(M)+\gamma^{2} for j<nj<n and zero otherwise. This concludes the proof. ∎
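The convergence Wp/pLW_{p}/p\to L in Lemma 11 can be illustrated by simulation. The sketch below uses one simple data-generating process chosen only to be consistent with the assumptions (Y = B\calX^⊤ + \calE with B^⊤B/p → Λ², i.i.d. Gaussian noise of variance γ², and g taken to be the all-ones vector in J); it is not the general setting of the lemma, and the tolerance is generous for the finite p used.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, q, gamma2 = 100_000, 20, 2, 0.5

# Illustrative data-generating process: B^T B / p -> diag(4, 1), i.e. scrW = I and
# Lambda = diag(2, 1); calE has i.i.d. N(0, gamma^2) entries.
Lam = np.diag([2.0, 1.0])
B = rng.standard_normal((p, q)) @ Lam
calX = rng.standard_normal((n, q))
calE = np.sqrt(gamma2) * rng.standard_normal((p, n))
Y = B @ calX.T + calE                          # data matrix, as decomposed in (85)

g = np.ones(n)                                 # an illustrative choice of g in (20)
J = np.eye(n) - np.outer(g, g) / (g @ g)

W_p = J @ Y.T @ Y @ J                          # (81)
M = J @ calX @ Lam                             # M of (83) with scrW = I
L = M @ M.T + gamma2 * J                       # L of (83)

# W_p / p should be close to L for large p (Lemma 11); the tolerance is generous.
err = np.max(np.abs(np.linalg.eigvalsh(W_p / p) - np.linalg.eigvalsh(L)))
assert err < 0.1 * np.max(np.linalg.eigvalsh(L))
```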

Proof of Theorem 4.

Taking (b) first, κp2\kappa_{p}^{2} in (21)(\ref{snr-bulk}) is the average of \scrsj,p2\scrs^{2}_{j,p} for q+1jn+q+1\leq j\leq n_{+}. By Lemma 11, for such jj we have \scrsj,p2/pγ2/n\scrs^{2}_{j,p}/p\to\gamma^{2}/n (i.e., λj,n(M)=0\lambda_{j,n}(M)=0 for j>qj>q). Therefore, κp2/pγ2/n\kappa_{p}^{2}/p\to\gamma^{2}/n almost surely and part (b) holds.

Turning to part (a) we have (Ψ\calSp)2=\calSp2κp2I(\Psi\calS_{p})^{2}=\calS_{p}^{2}-\kappa^{2}_{p}\hskip 1.0ptI for Ψ2\Psi^{2} in (21)(\ref{snr-bulk}). By part (b) and Lemma 11, (Ψ\calSp)jj2/pλj,n2(M)/n(\Psi\calS_{p})^{2}_{jj}/p\to\lambda^{2}_{j,n}(M)/n for jqj\leq q. For Kp2K^{2}_{p} in (a), (n/p)(Kp2)jj(n/p)(K^{2}_{p})_{jj} is the jjth largest eigenvalue of B\calXJ\calXB/pB\hskip 1.0pt\calX^{\top}J\calX B^{\top}/p, which equals that of J\calXBB\calXJ/pJ\calX B^{\top}B\calX^{\top}J\hskip 1.0pt/p. The latter n×nn\times n matrix converges to MMMM^{\top} by Assumption 1(b) and so, by Weyl’s inequality, (n/p)(Kp2)jjλj,n2(M)(n/p)(K^{2}_{p})_{jj}\to\lambda^{2}_{j,n}(M) almost surely. Dividing by nn finishes the proof.

Henceforth, and in view of the above, we work with the assumption that YY has rank n+>qn_{+}>q since for any outcome there is a pp sufficiently large to ensure this.

For part (c), let \calW=\scrU\calSp1p/n\calW=\scrU\calS_{p}^{-1}\sqrt{p/n} where \scrU\scrU is the n×qn\times q matrix of right singular vectors of YJYJ corresponding to its left singular vectors \scrH=νp×q(YJ)\scrH=\nu_{p\times q}(YJ). We have,

J\calW=\calW,\calWWp\calW=pI\displaystyle J\calW=\calW\hskip 1.0pt,\hskip 32.0pt\calW^{\top}W_{p}\calW=p\hskip 1.0ptI (84)

where the first identity is due to gg being a right singular vector of YJYJ with value zero (i.e., \scrUg=0q\scrU^{\top}g=0_{q}). The second identity comes from the singular value decomposition which implies that YJ\scrU/n=\scrH\calSpYJ\scrU/\sqrt{n}=\scrH\calS_{p}. The latter further yields that,

\scrH=YJ\calWp=Y\calWp=1pB\calX\calW+1p\calE\calW.\displaystyle\scrH=\frac{YJ\calW}{\sqrt{p}}=\frac{Y\calW}{\sqrt{p}}=\frac{1}{\sqrt{p}}B\calX^{\top}\calW+\frac{1}{\sqrt{p}}\calE\calW. (85)

Multiplying this by BBBB^{\dagger} yields that for Zp=B\calX+BB\calEZ_{p}=B\calX^{\top}+BB^{\dagger}\calE,

BB\scrH=1pZp\calW,\displaystyle BB^{\dagger}\scrH=\frac{1}{\sqrt{p}}Z_{p}\calW\hskip 1.0pt,\hskip 32.0pt (86)

and we expand ZpZpZ_{p}^{\top}Z_{p} to obtain (using that (BB)B=BBB=B(BB^{\dagger})^{\top}B=BB^{\dagger}B=B),

ZpZp\displaystyle Z_{p}^{\top}Z_{p} =\calXBB\calX+\calEB\calX+\calXB\calE+\calEBB\calE\displaystyle=\calX B^{\top}B\calX^{\top}+\calE^{\top}B\calX^{\top}+\calX B^{\top}\calE+\calE^{\top}BB^{\dagger}\calE
=YY\calE\calE+\calEBB\calE\displaystyle=Y^{\top}Y-\calE^{\top}\calE+\calE^{\top}BB^{\dagger}\calE

Since the matrix BB is full rank, BB=IB^{\dagger}B=I and also BB=\scrB\scrBBB^{\dagger}=\scrB\scrB^{\top}. Therefore,

\scrH\scrB\scrB\scrH=\scrHBB\scrH=\scrHBBBB\scrH=(BB\scrH)(BB\scrH)\displaystyle\scrH^{\top}\scrB\scrB^{\top}\scrH=\scrH^{\top}BB^{\dagger}\scrH=\scrH^{\top}BB^{\dagger}BB^{\dagger}\scrH=(BB^{\dagger}\scrH)^{\top}(BB^{\dagger}\scrH) (87)

where at the last step we used that BBBB^{\dagger} is symmetric.

Combining (86)(\ref{BBH}) and (87)(\ref{HBBH}) with (84)(\ref{JcW}), with WpW_{p} in (81)(\ref{Lp}) for which \calW\calW=\calSp2p/n\calW^{\top}\calW=\calS_{p}^{-2}p/n, adding and subtracting γ^2I\hat{\gamma}^{2}I (where γ^2=nκp2/p\hat{\gamma}^{2}=n\kappa_{p}^{2}/p) and recalling that Ψ2=Iqκp2\calSp2\Psi^{2}=I_{q}-\kappa_{p}^{2}\calS_{p}^{-2},

\scrH\scrB\scrB\scrH\displaystyle\scrH^{\top}\scrB\scrB^{\top}\scrH =\calW(Wp\calE\calE+J\calEBB\calEJ)\calW/p\displaystyle=\calW^{\top}\big{(}W_{p}-\calE^{\top}\calE+J\calE^{\top}BB^{\dagger}\calE J\big{)}\calW\hskip 1.0pt/p
=Iq+\calW(γ^2Iγ^2I\calE\calE/p)\calW+\calWJ\calEBB\calEJ\calW/p\displaystyle=I_{q}+\calW^{\top}\big{(}\hskip 1.0pt\hat{\gamma}^{2}I-\hat{\gamma}^{2}I-\calE^{\top}\calE/p\big{)}\calW+\calW^{\top}J\calE^{\top}BB^{\dagger}\calE J\calW\hskip 1.0pt/p
=Ψ2+\calW(γ^2I\calE\calE/p)\calW+\calWJ\calEBB\calEJ\calW/p.\displaystyle=\Psi^{2}+\calW^{\top}\big{(}\hskip 1.0pt\hat{\gamma}^{2}I-\calE^{\top}\calE/p\big{)}\calW+\calW^{\top}J\calE^{\top}BB^{\dagger}\calE J\calW\hskip 1.0pt/p\hskip 1.0pt.

From the above, we obtain the following bound.

|\scrH\scrB\scrB\scrHΨ2||\calW|2(|γ^2I\calE\calE/p|+|J\calEBB\calEJ|)/p.\displaystyle|\scrH^{\top}\scrB\scrB^{\top}\scrH-{{\Psi}}^{2}|\leq\hskip 1.0pt|\calW|^{2}\hskip 1.0pt\big{(}\hskip 1.0pt|\hskip 1.0pt\hat{\gamma}^{2}I-\calE^{\top}\calE/p|+|J\calE^{\top}BB^{\dagger}\calE J|\hskip 1.0pt\big{)}\hskip 1.0pt/p\hskip 1.0pt. (88)

We have |\calW|2=|\scrU\calSp1|2(p/n)|\scrU|2minjqn\scrsj,p2/p|\calW|^{2}=|\scrU\calS_{p}^{-1}|^{2}\hskip 1.0pt(p/n)\leq\frac{|\scrU|^{2}}{\min_{j\leq q}n\scrs^{2}_{j,p}/p} and thus,

lim¯p|\calW|2<\displaystyle\varlimsup_{p\uparrow\infty}|\calW|^{2}<\infty (89)

almost surely by Lemma 11 and using that |\scrU|1|\scrU|\leq 1.

By part (b) and Assumption 6(d), we also have that almost surely,

limp|γ^2I\calE\calE/p|=0.\displaystyle\lim_{p\uparrow\infty}|\hskip 1.0pt\hat{\gamma}^{2}I-\calE^{\top}\calE/p|=0\hskip 1.0pt. (90)

Since |BB\calEJ|2=|(BB\calEJ)BB\calEJ|=|J\calEBB\calEJ||BB^{\dagger}\calE J|^{2}=|(BB^{\dagger}\calE J)^{\top}BB^{\dagger}\calE J|=|J\calE^{\top}BB^{\dagger}\calE J|, it suffices to prove that

limp|BB\calEJ|/p=0\displaystyle\lim_{p\uparrow\infty}|BB^{\dagger}\calE J|\hskip 1.0pt/\sqrt{p}=0 (91)

almost surely. In that regard, we have

|BB\calEJ|2/p\displaystyle|BB^{\dagger}\calE J\hskip 1.0pt|^{2}/p =|J\calEBB\calEJ|/p\displaystyle=|\hskip 1.0ptJ\calE^{\top}BB^{\dagger}\calE J\hskip 1.0pt|\hskip 1.0pt/p
=|J\calEB(BB)1B\calEJ|/p\displaystyle=|\hskip 1.0ptJ\calE^{\top}B(B^{\top}B)^{-1}B^{\top}\calE J\hskip 1.0pt|\hskip 1.0pt/p
=|(p1BB)1/2B\calEJ|2/p2.\displaystyle=|\hskip 1.0pt(p^{-1}B^{\top}B)^{-1/2}B^{\top}\calE J\hskip 1.0pt|^{2}/p^{2}\hskip 1.0pt.

Applying Assumption 1(b), and in particular (82)(\ref{BBlim}), yields

limp|BB\calEJ|2/p|Λ2|(limp|B\calEJ|/p)2.\lim_{p\uparrow\infty}|BB^{\dagger}\calE J|^{2}/p\leq|\Lambda^{-2}|\hskip 1.0pt(\lim_{p\uparrow\infty}|B^{\top}\calE J|\hskip 1.0pt/p)^{2}\hskip 1.0pt.

Assumption 6(e) and the fact that all matrix norms on \bbRq×n\bbR^{q\times n} are equivalent conclude the proof of (91)(\ref{BBE}). Part (c) now follows by combining (88)(\ref{HBBHbound})–(91)(\ref{BBE}) and observing that each (Ψ2)jj=1κp2/\scrsj,p2(\Psi^{2})_{jj}=1-\kappa_{p}^{2}/\scrs^{2}_{j,p} for jqj\leq q is eventually in (0,1)(0,1) due to parts (a) and (b).

Lastly, for part (d), we again use that BB=\scrB\scrBBB^{\dagger}=\scrB\scrB^{\top} is symmetric and that J\calW=\calWJ\calW=\calW; computing \scrHz\scrH^{\top}z from (85)(\ref{Hkey}) while considering (86)(\ref{BBH}) then yields that, almost surely,

limp|\scrHz\scrH\scrB\scrBz|\displaystyle\lim_{p\uparrow\infty}|\scrH^{\top}z-\scrH^{\top}\scrB\scrB^{\top}z| =limp|\scrHz(BB\scrH)z|\displaystyle=\lim_{p\uparrow\infty}|\scrH^{\top}z-(BB^{\dagger}\scrH)^{\top}z|
=limp1p|\calW\calXBz+\calW\calEz\calWZpz|\displaystyle=\lim_{p\uparrow\infty}\frac{1}{\sqrt{p}}\big{|}\calW^{\top}\calX B^{\top}z+\calW^{\top}\calE^{\top}z-\calW^{\top}Z_{p}^{\top}z\big{|}
=limp1p|\calWJ\calEz(BB\calEJ\calW)z|\displaystyle=\lim_{p\uparrow\infty}\frac{1}{\sqrt{p}}|\hskip 1.0pt\calW^{\top}J\calE^{\top}z-(BB^{\dagger}\calE J\calW)^{\top}z|
limp|\calW|(1p|J\calEz|+1p|BB\calEJ|)=0\displaystyle\leq\lim_{p\uparrow\infty}|\calW^{\top}|\hskip 1.0pt\Big{(}\frac{1}{\sqrt{p}}|J\calE^{\top}z|+\frac{1}{\sqrt{p}}|BB^{\dagger}\calE J|\Big{)}=0

by applying (89)(\ref{lsupW}), (91)(\ref{BBE}) and Assumption 6(f). This concludes the proof. ∎

Proof of Theorem 3.

We first prove that the norm of the numerator \scrB(zz\scrH)\scrB^{\top}(z-z_{\scrH}) of \scrEp(\scrH)\scrE_{p}(\scrH) in (23)(\ref{optbias_gps}) converges to |Φ\scrHz||\Phi\scrH^{\top}z|. Using that z\scrH=\scrH\scrHzz_{\scrH}=\scrH\scrH^{\top}z yields,

|\scrB(zz\scrH)|2\displaystyle|\scrB^{\top}(z-z_{\scrH})|^{2} =\scrBz,\scrBz2\scrBz,\scrBz\scrH+\scrBz\scrH,\scrBz\scrH\displaystyle=\langle\scrB^{\top}z,\scrB^{\top}z\rangle-2\langle\scrB^{\top}z,\scrB^{\top}z_{\scrH}\rangle+\langle\scrB^{\top}z_{\scrH},\scrB^{\top}z_{\scrH}\rangle
=|\scrBz|22\scrH\scrB\scrBz,\scrHz+|\scrB\scrH\scrHz|2.\displaystyle=|\scrB^{\top}z|^{2}-2\langle\scrH^{\top}\scrB\scrB^{\top}z,\scrH^{\top}z\rangle+|\scrB^{\top}\scrH\scrH^{\top}z|^{2}\hskip 1.0pt. (92)

Considering the last term in (92)(\ref{numerEH2}), we obtain

|\scrB\scrH\scrHz|2\displaystyle|\scrB^{\top}\scrH\scrH^{\top}z|^{2} =\scrHz,(\scrH\scrB\scrB\scrHΨ2)\scrHz+z\scrHΨ2\scrHz.\displaystyle=\langle\scrH^{\top}z,(\scrH^{\top}\scrB\scrB^{\top}\scrH-\Psi^{2})\hskip 1.0pt\scrH^{\top}z\rangle+z^{\top}\scrH\Psi^{2}\scrH^{\top}z\hskip 1.0pt.

and because |\scrHz,(\scrH\scrB\scrB\scrHΨ2)\scrHz||\scrHz|2|\scrH\scrB\scrB\scrHΨ2||\langle\scrH^{\top}z,(\scrH^{\top}\scrB\scrB^{\top}\scrH-\Psi^{2})\hskip 1.0pt\scrH^{\top}z\rangle|\leq|\scrH^{\top}z|^{2}|\scrH^{\top}\scrB\scrB^{\top}\scrH-\Psi^{2}| as well as |\scrHz|1|\scrH^{\top}z|\leq 1, we have by part (c) of Theorem 4 that

limp|\scrB\scrH\scrHz|2=limp|Ψ\scrHz|2.\displaystyle\lim_{p\uparrow\infty}|\scrB^{\top}\scrH\scrH^{\top}z|^{2}=\lim_{p\uparrow\infty}|\Psi\scrH^{\top}z|^{2}\hskip 1.0pt. (93)

The second term in (92)(\ref{numerEH2}) has,

\scrH\scrB\scrBz,\scrHz=\scrH\scrB\scrBz\scrHz,\scrHz+|\scrHz|2\displaystyle\langle\scrH^{\top}\scrB\scrB^{\top}z,\scrH^{\top}z\rangle=\langle\scrH^{\top}\scrB\scrB^{\top}z-\scrH^{\top}z,\scrH^{\top}z\rangle+|\scrH^{\top}z|^{2}

so that by part (d) of Theorem 4, we have

limp\scrH\scrB\scrBz,\scrHz=limp|\scrHz|2\displaystyle\lim_{p\uparrow\infty}\langle\scrH^{\top}\scrB\scrB^{\top}z,\scrH^{\top}z\rangle=\lim_{p\uparrow\infty}|\scrH^{\top}z|^{2} (94)

For the first term in (92)(\ref{numerEH2}), due to Corollary 5 and (28)(\ref{Hzlen}) in particular,

limp|\scrBz|2=limp|zB|2=limp|Ψ1\scrH\scrB\scrBz|2\displaystyle\lim_{p\uparrow\infty}|\scrB^{\top}z|^{2}=\lim_{p\uparrow\infty}|z_{B}|^{2}=\lim_{p\uparrow\infty}|\Psi^{-1}\scrH^{\top}\scrB\scrB^{\top}z|^{2} (95)

where we used that |zB|=|\scrBz||z_{B}|=|\scrB^{\top}z|. Since |Ψ1|<|\Psi^{-1}|<\infty almost surely due to part (c) of Theorem 4, applying part (d) of the same theorem now yields,

limp|\scrBz|2=limp|Ψ1\scrHz|2\displaystyle\lim_{p\uparrow\infty}|\scrB^{\top}z|^{2}=\lim_{p\uparrow\infty}|\Psi^{-1}\scrH^{\top}z|^{2} (96)

Now, we rewrite the term |Φ\scrHz|2|\Phi\scrH^{\top}z|^{2} by substituting Φ=Ψ1Ψ\Phi={{\Psi}}^{-1}-{{\Psi}} as,

|Φ\scrHz|2\displaystyle|\Phi\scrH^{\top}z|^{2} =z\scrH(Ψ1Ψ)(Ψ1Ψ)\scrHz\displaystyle=z^{\top}\scrH(\Psi^{-1}-\Psi)(\Psi^{-1}-\Psi)\scrH^{\top}z
=z\scrHΨ2\scrHz2z\scrH\scrHz+z\scrHΨ2\scrHz\displaystyle=z^{\top}\scrH\Psi^{-2}\scrH^{\top}z-2z^{\top}\scrH\scrH^{\top}z+z^{\top}\scrH\Psi^{2}\scrH^{\top}z
=|Ψ1\scrHz|22|\scrHz|2+|Ψ\scrHz|2\displaystyle=|\Psi^{-1}\scrH^{\top}z|^{2}-2\hskip 1.0pt|\scrH^{\top}z|^{2}+|\Psi\scrH^{\top}z|^{2}

which confirms that limp(|\scrB(zz\scrH)||Φ\scrHz|)=0\lim_{p\uparrow\infty}\big{(}|\scrB^{\top}(z-z_{\scrH})|-|\Phi\scrH^{\top}z|\big{)}=0 almost surely, after taking the limit of (92)(\ref{numerEH2}) and substituting (93)(\ref{numer1}), (94)(\ref{numer2}) and (96)(\ref{numer3}). Lastly, from (94)(\ref{numer2}),

lim¯p|\scrHz|\displaystyle\varlimsup_{p\uparrow\infty}|\scrH^{\top}z| lim¯p|\scrH\scrB||\scrBz|lim¯p|Ψ||\scrBz|<lim¯p|\scrBz|\displaystyle\leq\varlimsup_{p\uparrow\infty}|\scrH^{\top}\scrB||\scrB^{\top}z|\leq\varlimsup_{p\uparrow\infty}|\Psi|\hskip 1.0pt|\scrB^{\top}z|<\varlimsup_{p\uparrow\infty}|\scrB^{\top}z| (97)

where we used that |\scrH\scrB|2=|\scrH\scrB\scrB\scrH||\scrH^{\top}\scrB|^{2}=|\scrH^{\top}\scrB\scrB^{\top}\scrH| and part (c) of Theorem 4 which shows the limit of the latter is |Ψ2||\Psi^{2}| with every Ψjj2\Psi^{2}_{jj} eventually in (0,1)(0,1), so that the limit of |\scrH\scrB||\scrH^{\top}\scrB| is |Ψ||\Psi|. From this, lim¯p|Φ\scrHz||Φ|<\varlimsup_{p\uparrow\infty}|\Phi\scrH^{\top}z|\leq|\Phi|<\infty and 1|\scrHz|2\sqrt{1-|\scrH^{\top}z|^{2}} (denominator of \scrEp(\scrH)\scrE_{p}(\scrH) in (23)(\ref{optbias_gps})) is eventually in (0,1)(0,1) almost surely. We now deduce that |\scrEp(\scrH)||Φ\scrHz|1|\scrHz|2|\scrE_{p}(\scrH)|-\frac{|\Phi\scrH^{\top}z|}{\sqrt{1-|\scrH^{\top}z|^{2}}} vanishes and |\scrEp(\scrH)||\scrE_{p}(\scrH)| is eventually in [0,)[0,\infty) almost surely. Lastly, |\scrEp(\scrH)||\scrE_{p}(\scrH)| converges to zero only if limp\scrHz=0\lim_{p\uparrow\infty}\scrH^{\top}z=0 (i.e., when |φ||\varphi| vanishes), concluding the proof.

Appendix C Proofs for Section 5

We begin with the following auxiliary result which requires Assumption 6. As usual, \scrB=\scrBp×q=νp×q(B)\scrB=\scrB_{p\times q}=\nu_{p\times q}(B) with B=Bp×qB=B_{p\times q} satisfying Assumption 5.

Lemma 12.

The matrix \scrB\scrH\scrB^{\top}\scrH is eventually invertible almost surely.

Proof.

Considering the determinant of \scrB\scrH\scrB^{\top}\scrH, almost surely,

limp(det(\scrB\scrH))2\displaystyle\lim_{p\uparrow\infty}(\det\hskip 1.0pt(\scrB^{\top}\scrH))^{2} =limpdet(\scrH\scrB\scrB\scrH)\displaystyle=\lim_{p\uparrow\infty}\det\hskip 1.0pt(\scrH^{\top}\scrB\scrB^{\top}\scrH)
=limpdet(Ψ2)\displaystyle=\lim_{p\uparrow\infty}\det({{\Psi}}^{2})

where we applied Theorem 4(c) and the continuity of the determinant. Moreover, the diagonal matrix Ψ2{{\Psi}}^{2} has every Ψjj2{{\Psi}}^{2}_{jj} eventually in (0,1)(0,1) almost surely. Thus, (det(\scrB\scrH))2(\det(\scrB^{\top}\scrH))^{2} almost surely converges to a positive limit, i.e., \scrB\scrH\scrB^{\top}\scrH is eventually invertible. ∎

Proof of Theorem 9.

For \scrHz\scrH_{z} in (45)(\ref{Hz}) and z\scrH,φz_{\perp\scrH},\varphi in (35)(\ref{zpH}), define

\scrH+=\scrHΨ+z\scrHφ=\scrHzT+,T+=(Ψφ)\displaystyle\scrH_{+}=\scrH{{\Psi}}+z_{\perp\scrH}\varphi^{\top}=\scrH_{z}T_{+}\hskip 1.0pt,\hskip 32.0ptT_{+}=\Big{(}\begin{array}[]{c}{{\Psi}}\\ \varphi^{\top}\end{array}\Big{)} (100)

where T+\bbR(q+1)×qT_{+}\in\bbR^{(q+1)\times q} was first encountered in (56)(\ref{HsharpMat}). We compute,

T+T+=Ψ2+φφ=\scrMΦ2\scrM,\scrM=νq×q(Ψ2+φφ),\displaystyle\hskip 32.0ptT_{+}^{\top}T_{+}={{\Psi}}^{2}+\varphi\varphi^{\top}=\scrM\hskip 1.0pt\Phi^{2}\scrM^{\top}\hskip 1.0pt,\hskip 32.0pt\scrM=\nu_{q\times q}({{\Psi}}^{2}+\varphi\varphi^{\top})\hskip 1.0pt, (101)

with the eigenvalue decomposition per (36)(\ref{Phi}). The singular value decomposition,

T+=\scrTΦ\scrM,\scrT=ν(q+1)×q(T+)=(τ1τjτq),\displaystyle\hskip 32.0ptT_{+}=\scrT\Phi\scrM^{\top}\hskip 1.0pt,\hskip 32.0pt\scrT=\nu_{(q+1)\times q}(T_{+})=(\hskip 1.0pt\tau_{1}\hskip 4.0pt\cdots\tau_{j}\cdots\hskip 4.0pt\tau_{q}\hskip 1.0pt)\hskip 1.0pt, (102)

has τj\bbRq+1\tau_{j}\in\bbR^{q+1} denoting the jjth left singular vector with |τj|=1|\tau_{j}|=1 and value Φjj\Phi_{jj}. We can write the final estimator \scrH\scrH_{\sharp} in (37)(\ref{Hsharp}) in the form \scrH=\scrHzT\scrH_{\sharp}=\scrH_{z}T_{\sharp} (c.f. (56)(\ref{HsharpMat})) where

T=T+\scrMΦ1=\scrT,(\scrH=\scrHz\scrT=\scrH+\scrMΦ1).\displaystyle\hskip 32.0ptT_{\sharp}=T_{+}\scrM\Phi^{-1}=\scrT\hskip 1.0pt,\hskip 32.0pt(\scrH_{\sharp}=\scrH_{z}\scrT=\scrH_{+}\scrM\Phi^{-1}). (103)
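The construction (100)–(103) is straightforward to assemble numerically. The sketch below takes illustrative inputs (\scrH with orthonormal columns, a unit vector z, and a diagonal Ψ with entries in (0,1)), forms z_{⊥\scrH} and φ as in (35) (assuming z_{⊥\scrH}=(z−z_\scrH)/|z−z_\scrH| and φ=(Ψ^{-1}−Ψ)\scrH^⊤z/|z−z_\scrH|), and checks that \scrH_♯ has orthonormal columns; the eigendecomposition below ignores the ordering and sign conventions of the selection map ν, which do not affect these checks.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 500, 3

# Illustrative inputs standing in for the quantities above.
scrH, _ = np.linalg.qr(rng.standard_normal((p, q)))
z = rng.standard_normal(p); z /= np.linalg.norm(z)
Psi = np.diag(rng.uniform(0.3, 0.9, size=q))

z_H = scrH @ (scrH.T @ z)                               # z_scrH
z_perp = (z - z_H) / np.linalg.norm(z - z_H)            # z_{perp scrH}, as in (35)
phi = (np.linalg.inv(Psi) - Psi) @ (scrH.T @ z) / np.linalg.norm(z - z_H)

# (100): scrH_+ = scrH Psi + z_perp phi^T = scrH_z T_+
scrH_z = np.column_stack([scrH, z_perp])
T_plus = np.vstack([Psi, phi[None, :]])
H_plus = scrH @ Psi + np.outer(z_perp, phi)
assert np.allclose(H_plus, scrH_z @ T_plus)

# (101)-(103): T_+^T T_+ = M Phi^2 M^T and H_sharp = H_+ M Phi^{-1} = scrH_z scrT
Phi2, M = np.linalg.eigh(T_plus.T @ T_plus)             # eigenvalues Phi^2 (ascending)
H_sharp = H_plus @ M @ np.diag(1.0 / np.sqrt(Phi2))
assert np.allclose(H_sharp.T @ H_sharp, np.eye(q))      # orthonormal columns
```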

We prove the last part first. Using (100)(\ref{Hplus}) and multiplying on the left by \scrB\scrB^{\top},

\scrB\scrH+=(\scrB\scrH)Ψ+\scrEp(\scrH)φ\displaystyle\scrB^{\top}\scrH_{+}=(\scrB^{\top}\scrH){{\Psi}}+\scrE_{p}(\scrH)\varphi^{\top}

where we used that \scrBz\scrH=\scrEp(\scrH)\scrB^{\top}z_{\perp\scrH}=\scrE_{p}(\scrH). Applying Corollary 5 yields,

limp\scrB\scrH+=limp((\scrB\scrH)Ψ+(\scrB\scrHΨ2\scrH\scrB)\scrEp(\scrH)φ).\displaystyle\lim_{p\uparrow\infty}\scrB^{\top}\scrH_{+}=\lim_{p\uparrow\infty}\big{(}(\scrB^{\top}\scrH){{\Psi}}+(\scrB^{\top}\scrH{{\Psi}}^{-2}\scrH^{\top}\scrB)\scrE_{p}(\scrH)\varphi^{\top}\big{)}\hskip 1.0pt. (104)

Using the identity \scrBz\scrH=\scrEp(\scrH)\scrB^{\top}z_{\perp\scrH}=\scrE_{p}(\scrH) and applying Theorem 4 parts (c) and (d),

limpΨ1(\scrH\scrB)\scrEp(\scrH)\displaystyle\lim_{p\uparrow\infty}{{\Psi}}^{-1}(\scrH^{\top}\scrB)\hskip 1.0pt\scrE_{p}(\scrH) =limpΨ1(\scrH\scrB)\scrBz(\scrH\scrB\scrB\scrH)\scrHz|zz\scrH|\displaystyle=\lim_{p\uparrow\infty}{{\Psi}}^{-1}\frac{(\scrH^{\top}\scrB)\scrB^{\top}z-(\scrH^{\top}\scrB\scrB^{\top}\scrH)\scrH^{\top}z}{|z-z_{\scrH}|}
=limpΨ1(IΨ2)\scrHz|zz\scrH|\displaystyle=\lim_{p\uparrow\infty}{{\Psi}}^{-1}\frac{(I-{{\Psi}}^{2})\scrH^{\top}z}{|z-z_{\scrH}|}
=limpΠ\scrHz|zz\scrH|\displaystyle=\lim_{p\uparrow\infty}\frac{\Pi\scrH^{\top}z}{|z-z_{\scrH}|} (105)

so that limpΨ1(\scrH\scrB)\scrEp(\scrH)=limpφ\lim_{p\uparrow\infty}{{\Psi}}^{-1}(\scrH^{\top}\scrB)\hskip 1.0pt\scrE_{p}(\scrH)=\lim_{p\uparrow\infty}\varphi per (35)(\ref{zpH}) with Π=Ψ1Ψ\Pi={{\Psi}}^{-1}-{{\Psi}}. This justifies the nontrivial part of the limit statement in (53)(\ref{HzBBH}). Continuing from (104)(\ref{limBHplus1}),

limp\scrB\scrH+=limp(\scrB\scrH)(Ψ+Ψ1φφ)=limp(\scrB\scrH)Ψ1(T+T+).\displaystyle\lim_{p\uparrow\infty}\scrB^{\top}\scrH_{+}=\lim_{p\uparrow\infty}(\scrB^{\top}\scrH)\big{(}{{\Psi}}+{{\Psi}}^{-1}\varphi\varphi^{\top}\big{)}=\lim_{p\uparrow\infty}(\scrB^{\top}\scrH){{\Psi}}^{-1}(T_{+}^{\top}T_{+})\hskip 1.0pt.

Combining this with \scrB\scrH=\scrB\scrH+\scrMΦ1\scrB^{\top}\scrH_{\sharp}=\scrB^{\top}\scrH_{+}\scrM\Phi^{-1} per (103)(\ref{Tsharp}) and (101)(\ref{TpTp}) leads to the relation limp\scrB\scrH=limp(\scrB\scrH)Ψ1\scrMΦ\lim_{p\uparrow\infty}\scrB^{\top}\scrH_{\sharp}=\lim_{p\uparrow\infty}(\scrB^{\top}\scrH){{\Psi}}^{-1}\scrM\Phi. Therefore,

limp\scrH\scrB\scrB\scrH=limpΦ\scrMΨ1(\scrH\scrB\scrB\scrH)Ψ1\scrMΦ\lim_{p\uparrow\infty}\scrH_{\sharp}^{\top}\scrB\scrB^{\top}\scrH_{\sharp}=\lim_{p\uparrow\infty}\Phi\scrM^{\top}{{\Psi}}^{-1}(\scrH^{\top}\scrB\scrB^{\top}\scrH){{\Psi}}^{-1}\scrM\Phi

and the right side evaluates to limpΦ2\lim_{p\uparrow\infty}\Phi^{2} by Theorem 4(c) and that \scrM\scrM=I\scrM^{\top}\scrM=I.

Finally, we have \scrH\scrH=\scrT\scrHz\scrHz\scrT=I\scrH^{\top}_{\sharp}\scrH_{\sharp}=\scrT^{\top}\scrH_{z}^{\top}\scrH_{z}\scrT=I using (103)(\ref{Tsharp}) and the fact that both matrices \scrHz\scrH_{z} and \scrT\scrT have orthonormal columns.

We now move to proving that \scrEp(\scrH)0\scrE_{p}(\scrH_{\sharp})\to 0 for \scrH=\scrHzT\scrH_{\sharp}=\scrH_{z}T_{\sharp} per (103)(\ref{Tsharp}) with,

\scrEp(\scrH)=\scrB(zz\scrH)1|z\scrH|2\displaystyle\scrE_{p}(\scrH_{\sharp})=\frac{\scrB^{\top}(z-z_{\scrH_{\sharp}})}{\sqrt{1-|z_{\scrH_{\sharp}}|^{2}}} (106)

replacing \scrH\scrH in (23)(\ref{optbias_gps}) with \scrH\scrH_{\sharp} and applying (70)(\ref{zzH}). We prove the desired result in two steps below. In step 1 we show the denominator in (106)(\ref{EHsharp}) is bounded away from zero eventually. In step 2 we prove that the numerator in (106)(\ref{EHsharp}) converges to zero.

Step 1. We prove |z\scrH|<1|z_{\scrH_{\sharp}}|<1 eventually in pp almost surely. Note that,

|z\scrHzT|2\displaystyle|z_{\scrH_{z}T}|^{2} =|\scrHzT(\scrHzT)z|2=|\scrHzTT\scrHzz|2=z\scrHzTT\scrHz\scrHzTT\scrHzz\displaystyle=|\scrH_{z}T(\scrH_{z}T)^{\dagger}z|^{2}=|\scrH_{z}TT^{\dagger}\scrH_{z}^{\top}z|^{2}=z^{\top}\scrH_{z}TT^{\dagger}\scrH_{z}^{\top}\scrH_{z}TT^{\dagger}\scrH_{z}^{\top}z
=z\scrHzTT\scrHzz\displaystyle=z^{\top}\scrH_{z}TT^{\dagger}\scrH_{z}^{\top}z

for any element \scrHzT\scrH_{z}T in the family (46)(\ref{Tfamily}) and where we have used that \scrHz\scrHz=I\scrH_{z}^{\top}\scrH_{z}=I and that TT=IT^{\dagger}T=I. Starting with (102)(\ref{Tplus}) and (103)(\ref{Tsharp}), we have the spectral decomposition

TT=\scrT\scrT=\scrT\scrT=j=1qτjτj.\displaystyle T_{\sharp}T_{\sharp}^{\dagger}=\scrT\scrT^{\dagger}=\scrT\scrT^{\top}={\textstyle\sum\nolimits}_{j=1}^{q}\tau_{j}\tau_{j}^{\top}\hskip 1.0pt. (107)

Using this and \scrH=\scrHzT\scrH_{\sharp}=\scrH_{z}T_{\sharp}, we write

|z\scrH|2=|z\scrHzT|2=z\scrHzTT\scrHzz=j=1q\scrHzz,τj2.\displaystyle|z_{\scrH_{\sharp}}|^{2}=|z_{\scrH_{z}T_{\sharp}}|^{2}=z^{\top}\scrH_{z}T_{\sharp}T_{\sharp}^{\top}\scrH^{\top}_{z}z={\textstyle\sum\nolimits}_{j=1}^{q}\langle\scrH^{\top}_{z}z,\tau_{j}\rangle^{2}\hskip 1.0pt. (108)

Next, since (Ψ1φ1)T+=φφΨ1Ψ=0q\Big{(}\begin{array}[]{c}-{{\Psi}}^{-1}\varphi\\ 1\end{array}\Big{)}^{\top}T_{+}=\varphi^{\top}-\varphi^{\top}{{\Psi}}^{-1}{{\Psi}}=0_{q} with T+T_{+} in (100)(\ref{Hplus}), the vector

τq+1=(Ψ1φ1)11+|Ψ1φ|2\displaystyle\tau_{q+1}=\Big{(}\begin{array}[]{c}-{{\Psi}}^{-1}\varphi\\ 1\end{array}\Big{)}\frac{1}{\sqrt{1+|{{\Psi}}^{-1}\varphi|^{2}}}

is in the null space of T+T_{+}^{\top} and therefore in that of TT_{\sharp}^{\top} per (103)(\ref{Tsharp}). Since the column spaces of TT_{\sharp} and TTT_{\sharp}T_{\sharp}^{\dagger} are identical, the vectors τ1,,τq,τq+1\bbRq+1\tau_{1},\dots,\tau_{q},\tau_{q+1}\in\bbR^{q+1} form an orthonormal basis for \bbRq+1\bbR^{q+1}.

Observing that |\scrHzz|2=|\scrHz|2+|zz\scrH|2=|\scrHz|2+1|z\scrH|2=1|\scrH_{z}^{\top}z|^{2}=|\scrH^{\top}z|^{2}+|z-z_{\scrH}|^{2}=|\scrH^{\top}z|^{2}+1-|z_{\scrH}|^{2}=1,

1=|\scrHzz|2=j=1q+1\scrHzz,τj2=|z\scrH|2+\scrHzz,τq+121=|\scrH_{z}^{\top}z|^{2}={\textstyle\sum\nolimits}_{j=1}^{q+1}\langle\scrH^{\top}_{z}z,\tau_{j}\rangle^{2}=|z_{\scrH_{\sharp}}|^{2}+\langle\scrH_{z}^{\top}z,\tau_{q+1}\rangle^{2}

since τ1,,τq+1\tau_{1},\dots,\tau_{q+1} form an orthonormal basis for \bbRq+1\bbR^{q+1} and applying (108)(\ref{zHsharp2}). Consequently,

|z\scrH|2=1\scrHzz,τq+12.|z_{\scrH_{\sharp}}|^{2}=1-\langle\scrH^{\top}_{z}z,\tau_{q+1}\rangle^{2}\hskip 1.0pt.

It now suffices to show that \scrHzz,τq+12>0\langle\scrH^{\top}_{z}z,\tau_{q+1}\rangle^{2}>0 eventually in pp. For φ\varphi in (35)(\ref{zpH}),

|Ψ1φ|2=|Ψ1Π\scrHz|zz\scrH||2=|(Ψ2I)\scrHz|2|zz\scrH|2,|{{\Psi}}^{-1}\varphi|^{2}=\Big{|}\frac{{{\Psi}}^{-1}\Pi\scrH^{\top}z}{|z-z_{\scrH}|}\Big{|}^{2}=\frac{|({{\Psi}}^{-2}-I)\scrH^{\top}z|^{2}}{|z-z_{\scrH}|^{2}},

and z\scrHz_{\perp\scrH} in (35)(\ref{zpH}) has z\scrH,z=1z\scrH,z|zz\scrH|=|zz\scrH|\langle z_{\perp\scrH},z\rangle=\frac{1-\langle z_{\scrH},z\rangle}{|z-z_{\scrH}|}=|z-z_{\scrH}| by (70)(\ref{zzH}) and (71)(\ref{zmzH}). Thus,

\scrHzz,τq+12\displaystyle\langle\scrH^{\top}_{z}z,\tau_{q+1}\rangle^{2} =((\scrHz|zz\scrH|)(Ψ1φ1))211+|Ψ1φ|2\displaystyle=\bigg{(}\left(\begin{array}[]{c}\scrH^{\top}z\\ |z-z_{\scrH}|\end{array}\right)^{\top}\left(\begin{array}[]{c}-\Psi^{-1}\varphi\\ 1\end{array}\right)\bigg{)}^{2}\frac{1}{1+|\Psi^{-1}\varphi|^{2}} (113)
=(|zz\scrH|2z\scrH(Ψ2I)\scrHz)2|zz\scrH|2+|(Ψ2I)\scrHz|2\displaystyle=\frac{\big{(}|z-z_{\scrH}|^{2}-z^{\top}\scrH(\Psi^{-2}-I)\scrH^{\top}z\big{)}^{2}}{|z-z_{\scrH}|^{2}+|(\Psi^{-2}-I)\scrH^{\top}z|^{2}}
=(1|z\scrH|2+|z\scrH|2z\scrHΨ2\scrHz)2|zz\scrH|2+|(Ψ2I)\scrHz|2\displaystyle=\frac{\big{(}1-|z_{\scrH}|^{2}+|z_{\scrH}|^{2}-z^{\top}\scrH\Psi^{-2}\scrH^{\top}z\big{)}^{2}}{|z-z_{\scrH}|^{2}+|({{\Psi}}^{-2}-I)\scrH^{\top}z|^{2}}
=(1z\scrHΨ2\scrHz)2|zz\scrH|2+|(Ψ2I)\scrHz|2.\displaystyle=\frac{\big{(}1-z^{\top}\scrH\Psi^{-2}\scrH^{\top}z\big{)}^{2}}{|z-z_{\scrH}|^{2}+|(\Psi^{-2}-I)\scrH^{\top}z|^{2}}. (114)
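The chain (113)–(114) is exact algebra and can be verified directly, under the same illustrative conventions for z_{⊥\scrH}, φ and \scrH_z as in the previous sketch; the inputs below are again arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 400, 3

scrH, _ = np.linalg.qr(rng.standard_normal((p, q)))
z = rng.standard_normal(p); z /= np.linalg.norm(z)
Psi = np.diag(rng.uniform(0.3, 0.9, size=q))

Hz = scrH.T @ z
r = np.linalg.norm(z - scrH @ Hz)                      # |z - z_scrH|
z_perp = (z - scrH @ Hz) / r
phi = (np.linalg.inv(Psi) - Psi) @ Hz / r              # varphi of (35)
scrH_z = np.column_stack([scrH, z_perp])

Psi_inv_phi = np.linalg.solve(Psi, phi)
tau = np.append(-Psi_inv_phi, 1.0) / np.sqrt(1.0 + Psi_inv_phi @ Psi_inv_phi)

lhs = ((scrH_z.T @ z) @ tau) ** 2                      # <scrH_z^T z, tau_{q+1}>^2
Psi_inv2 = np.linalg.inv(Psi @ Psi)                    # Psi^{-2}
rhs = (1.0 - Hz @ Psi_inv2 @ Hz) ** 2 / (
    r**2 + np.linalg.norm((Psi_inv2 - np.eye(q)) @ Hz) ** 2)
assert np.allclose(lhs, rhs)                           # the identity (114)
```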

By (28)(\ref{Hzlen}) and the fact that |zB|=|\scrBz|1|z_{B}|=|\scrB^{\top}z|\leq 1 we have,

lim¯pz\scrHΨ2\scrHz=lim¯p|\scrBz|2\displaystyle\varlimsup_{p\uparrow\infty}z^{\top}\scrH{{\Psi}}^{-2}\scrH^{\top}z=\varlimsup_{p\uparrow\infty}|\scrB^{\top}z|^{2}

under Assumption 5 which also guarantees lim¯p|\scrBz|<1\varlimsup_{p\uparrow\infty}|\scrB^{\top}z|<1. We deduce that the numerator of (114)(\ref{Hztau}) is eventually strictly positive almost surely. The denominator is finite as |zz\scrH|21|z-z_{\scrH}|^{2}\leq 1 and the eigenvalues of Ψ2\Psi^{-2} are finite by Theorem 4(c).

Thus, \scrHzz,τq+12\langle\scrH^{\top}_{z}z,\tau_{q+1}\rangle^{2} is almost surely bounded away from zero eventually.

Step 2. We prove that the numerator of (106)(\ref{EHsharp}) almost surely eventually vanishes. We omit the “almost surely” clause for brevity below. Recall that (48)(\ref{slick}) supplies that

\scrB(zz\scrHzT)=0\displaystyle\scrB^{\top}(z-z_{\scrH_{z}T_{*}})=0 (115)

provided TTT^{\top}_{*}T_{*} is invertible for T=\scrHz\scrBT_{*}=\scrH_{z}^{\top}\scrB. To establish the latter, we directly compute T=(\scrB\scrH\scrEp(\scrH))T_{*}^{\top}=\big{(}\begin{array}[]{cc}\scrB^{\top}\scrH&\scrE_{p}(\scrH)\end{array}\big{)} using \scrBz\scrH=\scrEp(\scrH)\scrB^{\top}z_{\perp\scrH}=\scrE_{p}(\scrH) with \scrEp(\scrH)\scrE_{p}(\scrH) in (23)(\ref{optbias_gps}). Then,

TT=\scrB\scrHz\scrHz\scrB=\scrB\scrH\scrH\scrB+\scrEp(\scrH)\scrEp(\scrH)\displaystyle T^{\top}_{*}T_{*}=\scrB^{\top}\scrH_{z}\scrH_{z}^{\top}\scrB=\scrB^{\top}\scrH\scrH^{\top}\scrB+\scrE_{p}(\scrH)\hskip 1.0pt\scrE^{\top}_{p}(\scrH) (116)

and we deduce that |TT||T^{\top}_{*}T_{*}| is eventually bounded since the columns of \scrB\scrB and \scrH\scrH have unit length and |\scrEp(\scrH)||\scrE_{p}(\scrH)| is eventually finite by Theorem 3. Since both terms in (116)(\ref{TsTs}) are positive semidefinite and \scrB\scrH\scrB^{\top}\scrH is eventually invertible by Lemma 12, all eigenvalues of (116)(\ref{TsTs}) are eventually strictly positive. Hence, TTT_{*}^{\top}T_{*} is eventually invertible.

In view of (115)(\ref{Numzero}), it suffices to prove that the difference between z\scrH=z\scrHzTz_{\scrH_{\sharp}}=z_{\scrH_{z}T_{\sharp}} and z\scrHzTz_{\scrH_{z}T_{*}} vanishes in some norm. By Lemma 2 with K=\scrB\scrHΨ1\scrMΦ1K=\scrB^{\top}\scrH{{\Psi}}^{-1}\scrM\Phi^{-1},

lim¯p|z\scrHzTz\scrHzT|=lim¯p|z\scrHzTz\scrHzTK|,\displaystyle\varlimsup_{p\uparrow\infty}|z_{\scrH_{z}T_{\sharp}}-z_{\scrH_{z}T_{*}}|=\varlimsup_{p\uparrow\infty}|z_{\scrH_{z}T_{\sharp}}-z_{\scrH_{z}T_{*}K}|\hskip 1.0pt, (117)

owing to Lemma 12 and Theorem 4(c) which guarantee that \scrB\scrH\scrB^{\top}\scrH and Ψ1\scrMΦ1{{\Psi}}^{-1}\scrM\Phi^{-1} (and hence KK) are eventually invertible. Substituting T=T+\scrMΦ1T_{\sharp}=T_{+}\scrM\Phi^{-1}, we have

limp|TTK|\displaystyle\lim_{p\uparrow\infty}|T_{\sharp}-T_{*}K| limp|(Ψφ)\scrHz\scrB\scrB\scrHΨ1||\scrMΦ1|=0\displaystyle\leq\lim_{p\uparrow\infty}\Big{|}\Big{(}\begin{array}[]{c}{{\Psi}}\\ \varphi^{\top}\end{array}\Big{)}-\scrH_{z}^{\top}\scrB\scrB^{\top}\scrH{{\Psi}}^{-1}\Big{|}\hskip 1.0pt|\scrM\Phi^{-1}|=0

which confirms (57)(\ref{Tprime}) and applies (53)(\ref{HzBBH}), justified above (see (105)(\ref{PHBEH})). Since the mapping Tz\scrHzTT\rightarrow z_{\scrH_{z}T} on the domain of real (q+1)×q(q+1)\times q full column rank matrices is continuous, we have via (117)(\ref{limproj}) that limp|z\scrHzTz\scrHzT|=0\lim\limits_{p\rightarrow\infty}|z_{\scrH_{z}T_{\sharp}}-z_{\scrH_{z}T_{*}}|=0 as required.
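The invariance supplied by Lemma 2, namely that z_{\scrH_zT} depends on T only through the column space of \scrH_zT and is therefore unchanged when T is replaced by TK for an invertible K, can be illustrated with randomly generated inputs as in the minimal sketch below.

```python
import numpy as np

rng = np.random.default_rng(6)
p, q = 300, 4

scrH_z, _ = np.linalg.qr(rng.standard_normal((p, q + 1)))  # orthonormal columns
z = rng.standard_normal(p); z /= np.linalg.norm(z)
T = rng.standard_normal((q + 1, q))                        # full column rank (a.s.)
K = rng.standard_normal((q, q))                            # invertible (a.s.)

def proj(A, x):
    """Orthogonal projection of x onto the column space of A."""
    return A @ np.linalg.lstsq(A, x, rcond=None)[0]

# z_{scrH_z T} depends on T only through the column space of scrH_z T,
# hence it is unchanged if T is replaced by T K with K invertible.
assert np.allclose(proj(scrH_z @ T, z), proj(scrH_z @ (T @ K), z))
```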

We remark that since lim¯p|z\scrHzT|<1\varlimsup_{p\uparrow\infty}|z_{\scrH_{z}T_{\sharp}}|<1, we now have lim¯p|z\scrHzT|<1\varlimsup_{p\uparrow\infty}|z_{\scrH_{z}T_{*}}|<1. This also proves that \scrEp(\scrHzT)=0\scrE_{p}(\scrH_{z}T_{*})=0 eventually in pp (see comments below (48)(\ref{slick})).

Appendix D The Eigenvector Selection Function

For any matrix Am×A\in\mathbb{R}^{m\times\ell}, we enumerate qmin(,m)q\leq\min(\ell,m) singular values (in descending order) and their left singular vectors in a well-defined way. We start by ordering all dd distinct singular values of AA as λ1>λ2>>λd\lambda_{1}>\lambda_{2}>\cdots>\lambda_{d} (c.f. Λjj\Lambda_{jj} in Section 1.4) and uniquely identifying the linear subspaces \calK1,\calK2,,\calKd\calK_{1},\calK_{2},\dots,\calK_{d} formed by the associated left singular vectors. Given that the first k1k-1 left singular vectors v1,v2,,vk1v_{1},v_{2},\dots,v_{k-1} have been selected, we select the kkth left singular vector vkv_{k} by taking the following steps.

  1.

    Identify the unique j{1,2,,d}j\in\{1,2,\dots,d\} for which,

    k(a=1j1dim(𝒦a),a=1jdim(\calKa)].\displaystyle k\in\Big{(}\sum_{a=1}^{j-1}\dim(\mathcal{K}_{a}),\sum_{a=1}^{j}\dim(\calK_{a})\Big{]}\hskip 1.0pt. (118)
  2.

    Let \calKj,k\calKj\calK_{j,k}\subseteq\calK_{j} denote the orthogonal complement of the subspace formed by the subset of vectors v1,,vk1v_{1},\dots,v_{k-1} corresponding to λj\lambda_{j}, where the orthogonal complement is taken within 𝒦j\mathcal{K}_{j}. For the standard basis elements \rme1,\rme2,,\rmem\rme_{1},\rme_{2},\dots,\rme_{m} of m\mathbb{R}^{m}, we identify the unique \rmes\rme_{s} as the first one that is not orthogonal to \calKj,k\calK_{j,k}.

  3.

    Set vkv_{k} as the orthogonal projection of \rmes\rme_{s} onto \calKj,k\calK_{j,k} normalized to |vk|=1|v_{k}|=1.

Implementing this process sequentially for k=1,2,,mk=1,2,\dots,m assembles a list of left singular vectors v1,,vmv_{1},\dots,v_{m} with associated singular values in decreasing order. We define νm×q(A)\nu_{m\times q}(A) as the m×qm\times q matrix carrying v1,v2,,vqv_{1},v_{2},\dots,v_{q} as its columns.
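A minimal implementation of this selection procedure is sketched below (Python/NumPy). The tolerance deciding when two singular values coincide, and the padding by the zero singular value when m > ℓ, are implementation choices not fixed by the description above.

```python
import numpy as np

def nu(A, q, tol=1e-10):
    """A sketch of the selection map nu_{m x q}(A) described above.

    Returns an m x q matrix whose columns are left singular vectors of A,
    associated with singular values in decreasing order, selected within
    each singular subspace by projecting the first standard basis vector
    that is not orthogonal to the remaining subspace.
    """
    m = A.shape[0]
    U, s, _ = np.linalg.svd(A)                      # full SVD; U is m x m
    if len(s) < m:                                  # pad with the zero singular value
        s = np.concatenate([s, np.zeros(m - len(s))])

    # Group the columns of U by numerically distinct singular values.
    groups, start = [], 0
    for i in range(1, m + 1):
        if i == m or abs(s[i] - s[start]) > tol:
            groups.append(U[:, start:i])            # orthonormal basis of K_j
            start = i

    vectors = []
    for K in groups:                                # distinct values, descending
        selected = []
        for _ in range(K.shape[1]):
            P = K @ K.T                             # projector onto K_j ...
            for v in selected:                      # ... minus already-selected vectors
                P = P - np.outer(v, v)
            for e in np.eye(m):                     # first e_s not orthogonal to K_{j,k}
                w = P @ e
                if np.linalg.norm(w) > tol:
                    selected.append(w / np.linalg.norm(w))
                    break
        vectors.extend(selected)
    return np.column_stack(vectors[:q])
```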

Remark 11.

In the second step to define the kkth left singular vector, note that the subspace 𝒦j,k\mathcal{K}_{j,k} is of non-zero dimension by (118)(\ref{eq:definitionofj}). Moreover, the uniquely defined standard basis element \rmes\rme_{s} must exist: if it did not, the whole space m\mathbb{R}^{m} would be orthogonal to 𝒦j,k\mathcal{K}_{j,k}, implying that 𝒦j,k\mathcal{K}_{j,k} is of zero dimension, which contradicts the previous assertion.

Example 12.

We illustrate the above procedure with A=ImA=I_{m}. The matrix ImI_{m} has λ1=1\lambda_{1}=1 as the sole singular value which corresponds to the subspace of left singular vectors \calK1=\bbRm\calK_{1}=\bbR^{m}. For k=1k=1 in the algorithm introduced above, we obtain the corresponding jj determined as 11 by (118)(\ref{eq:definitionofj}). The subspace \calKj,k=\calK1,1\calK_{j,k}=\calK_{1,1} equals \calK1=m\calK_{1}=\mathbb{R}^{m} as there has not been any selection yet. Then the first of \rme1,,\rmem\rme_{1},\dots,\rme_{m} that is not orthogonal to \calK1,1=m\calK_{1,1}=\mathbb{R}^{m} would clearly be \rme1\rme_{1}. Hence, v1=\rme1v_{1}=\rme_{1} is the normalized orthogonal projection of \rme1\rme_{1} onto \calK1,1=m\calK_{1,1}=\mathbb{R}^{m}. Next, we assume as an induction hypothesis that v1=\rme1,v2=\rme2,,vk1=\rmek1v_{1}=\rme_{1},v_{2}=\rme_{2},\dots,v_{k-1}=\rme_{k-1} and implement the kkth step of the algorithm to show vk=ekv_{k}=e_{k}. Clearly, jj defined by (118)(\ref{eq:definitionofj}) corresponding to kk is 11. Moreover, \calK1,k\calK_{1,k}, the orthogonal complement of the subspace formed by the vectors previously selected for the singular value λ1=1\lambda_{1}=1, is spanned by \rmek,\rmek+1,,\rmem\rme_{k},\rme_{k+1},\dots,\rme_{m}. Hence, the first of \rme1,,\rmem\rme_{1},\dots,\rme_{m} that is not orthogonal to \calK1,k\calK_{1,k} is \rmek\rme_{k}. That sets vk=\rmekv_{k}=\rme_{k}. As a result, we obtain νm×q(Im)\nu_{m\times q}(I_{m}) assembled as [\rme1,\rme2,,\rmeq][\rme_{1},\rme_{2},\dots,\rme_{q}] so that its iith column contains the coordinate vector \rmei\rme_{i}.
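Reusing the nu sketch given after the selection procedure above, Example 12 can be reproduced directly; the dimensions below are illustrative.

```python
import numpy as np

# With A = I_m the only singular value is 1 and K_1 = R^m, so the procedure
# selects the standard basis vectors in order (Example 12).
m, q = 5, 3
V = nu(np.eye(m), q)                    # nu as sketched in this appendix
assert np.allclose(V, np.eye(m)[:, :q])
```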

Appendix E Capon Beamforming

One important illustration of the pathological behaviour described below (3)(\ref{Qhx}) concerns robust (Capon) beamforming (see Cox, Zeskind and Owen (1987), Li and Stoica (2005) and Vorobyov (2013)). Some recent work that applies spectral methods for robust beamforming may be found in Zhu, Xu and Ye (2020), Luo et al. (2023) and Chen, Qiu and Sheng (2024), who survey related work. The importance of the covariance estimation aspect of robust beamforming is also well-recognized (e.g., Abrahamsson, Selen and Stoica (2007), Chen et al. (2010) and Xie et al. (2021)). In particular, the LW shrinkage estimator developed in Ledoit and Wolf (2004b) has had noteworthy influence on this literature, despite being originally proposed for portfolio selection in finance (Ledoit and Wolf, 2003). Typical application of this estimator employs the identity matrix as the “shrinkage target”, which leaves the eigenvectors of the sample covariance matrix unchanged (fn. 9). However, the estimation error in the sample eigenvectors (especially for small sample/snapshot sizes as in our setting) is known to have material impact (Cox, 2002). One (rare) example of robust beamforming work that attempts to “de-noise” sample eigenvectors directly is Quijano and Zurk (2015). But their analysis does not overlap with our (4)(\ref{obf})–(6)(\ref{EHslim}).

Acknowledgments

We thank Haim Bar and Alec Kercheval for useful feedback on the motivating optimization example (Section 1.1) in our introduction. We thank Lisa Goldberg for an insightful discussion that led to the main theorem of Section 4. We thank Kay Giesecke for many helpful comments on an earlier draft of this paper. We thank the participants of the 2023 SIAM Conference on Financial Mathematics in Philadelphia, PA, the UCLA Seminar on Financial and Actuarial Mathematics, Los Angeles, CA, the CDAR Risk Seminar, UC Berkeley and the AFT Lab Seminar, Stanford CA for their comments and feedback on many of the ideas that led to this manuscript.

References

  • Abbe, Fan and Wang (2022) {barticle}[author] \bauthor\bsnmAbbe, \bfnmEmmanuel\binitsE., \bauthor\bsnmFan, \bfnmJianqing\binitsJ. and \bauthor\bsnmWang, \bfnmKaizheng\binitsK. (\byear2022). \btitleAn p\ell_{p} theory of PCA and spectral clustering. \bjournalThe Annals of Statistics \bvolume50 \bpages2359–2385. \endbibitem
  • Abrahamsson, Selen and Stoica (2007) {binproceedings}[author] \bauthor\bsnmAbrahamsson, \bfnmRichard\binitsR., \bauthor\bsnmSelen, \bfnmYngve\binitsY. and \bauthor\bsnmStoica, \bfnmPetre\binitsP. (\byear2007). \btitleEnhanced covariance matrix estimators in adaptive beamforming. In \bbooktitle2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07 \bvolume2 \bpagesII–969. \bpublisherIEEE. \endbibitem
  • Acker (1974) {barticle}[author] \bauthor\bsnmAcker, \bfnmAndrew F\binitsA. F. (\byear1974). \btitleAbsolute continuity of eigenvectors of time-varying operators. \bjournalProceedings of the American Mathematical Society \bvolume42 \bpages198–201. \endbibitem
  • Agterberg, Lubberts and Priebe (2022) {barticle}[author] \bauthor\bsnmAgterberg, \bfnmJoshua\binitsJ., \bauthor\bsnmLubberts, \bfnmZachary\binitsZ. and \bauthor\bsnmPriebe, \bfnmCarey E.\binitsC. E. (\byear2022). \btitleEntrywise Estimation of Singular Vectors of Low-Rank Matrices With Heteroskedasticity and Dependence. \bjournalIEEE Transactions on Information Theory \bvolume68 \bpages4618–4650. \bdoi10.1109/TIT.2022.3159085 \endbibitem
  • Ahn et al. (2007) {barticle}[author] \bauthor\bsnmAhn, \bfnmJeongyoun\binitsJ., \bauthor\bsnmMarron, \bfnmJS\binitsJ., \bauthor\bsnmMuller, \bfnmKeith M\binitsK. M. and \bauthor\bsnmChi, \bfnmYueh-Yun\binitsY.-Y. (\byear2007). \btitleThe high-dimension, low-sample-size geometric representation holds under mild conditions. \bjournalBiometrika \bvolume94 \bpages760–766. \endbibitem
  • Aoshima et al. (2018) {barticle}[author] \bauthor\bsnmAoshima, \bfnmMakoto\binitsM., \bauthor\bsnmShen, \bfnmDan\binitsD., \bauthor\bsnmShen, \bfnmHaipeng\binitsH., \bauthor\bsnmYata, \bfnmKazuyoshi\binitsK., \bauthor\bsnmZhou, \bfnmYi-Hui\binitsY.-H. and \bauthor\bsnmMarron, \bfnmJS\binitsJ. (\byear2018). \btitleA survey of high dimension low sample size asymptotics. \bjournalAustralian & New Zealand journal of statistics \bvolume60 \bpages4–19. \endbibitem
  • Bai, Liu and Wong (2009) {barticle}[author] \bauthor\bsnmBai, \bfnmZhidong\binitsZ., \bauthor\bsnmLiu, \bfnmHuixia\binitsH. and \bauthor\bsnmWong, \bfnmWing-Keung\binitsW.-K. (\byear2009). \btitleEnhancement of the applicability of Markowitz’s portfolio optimization by utilizing random matrix theory. \bjournalMathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics \bvolume19 \bpages639–667. \endbibitem
  • Bai and Ng (2008) {barticle}[author] \bauthor\bsnmBai, \bfnmJushan\binitsJ. and \bauthor\bsnmNg, \bfnmSerena\binitsS. (\byear2008). \btitleLarge dimensional factor analysis. \bjournalFoundations and Trends in Econometrics \bvolume3 \bpages89–163. \endbibitem
  • Bai and Ng (2019) {barticle}[author] \bauthor\bsnmBai, \bfnmJushan\binitsJ. and \bauthor\bsnmNg, \bfnmSerena\binitsS. (\byear2019). \btitleRank regularized estimation of approximate factor models. \bjournalJournal of Econometrics \bvolume212 \bpages78–96. \endbibitem
  • Bai and Ng (2023) {barticle}[author] \bauthor\bsnmBai, \bfnmJushan\binitsJ. and \bauthor\bsnmNg, \bfnmSerena\binitsS. (\byear2023). \btitleApproximate factor models with weaker loadings. \bjournalJournal of Econometrics. \endbibitem
  • Bai and Yao (2012) {barticle}[author] \bauthor\bsnmBai, \bfnmZhidong\binitsZ. and \bauthor\bsnmYao, \bfnmJianfeng\binitsJ. (\byear2012). \btitleOn sample eigenvalues in a generalized spiked population model. \bjournalJournal of Multivariate Analysis \bvolume106 \bpages167–177. \endbibitem
  • Bauder et al. (2021) {barticle}[author] \bauthor\bsnmBauder, \bfnmDavid\binitsD., \bauthor\bsnmBodnar, \bfnmTaras\binitsT., \bauthor\bsnmParolya, \bfnmNestor\binitsN. and \bauthor\bsnmSchmid, \bfnmWolfgang\binitsW. (\byear2021). \btitleBayesian mean–variance analysis: optimal portfolio selection under parameter uncertainty. \bjournalQuantitative Finance \bvolume21 \bpages221–242. \endbibitem
  • Best and Grauer (1991) {barticle}[author] \bauthor\bsnmBest, \bfnmMichael J\binitsM. J. and \bauthor\bsnmGrauer, \bfnmRobert R\binitsR. R. (\byear1991). \btitleOn the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. \bjournalThe review of financial studies \bvolume4 \bpages315–342. \endbibitem
  • Bianchi, Goldberg and Rosenberg (2017) {barticle}[author] \bauthor\bsnmBianchi, \bfnmStephen W\binitsS. W., \bauthor\bsnmGoldberg, \bfnmLisa R\binitsL. R. and \bauthor\bsnmRosenberg, \bfnmAllan\binitsA. (\byear2017). \btitleThe impact of estimation error on latent factor model forecasts of portfolio risk. \bjournalThe Journal of Portfolio Management \bvolume43 \bpages147–156. \endbibitem
  • Blin, Guerard and Mark (2022) {bincollection}[author] \bauthor\bsnmBlin, \bfnmJohn\binitsJ., \bauthor\bsnmGuerard, \bfnmJohn\binitsJ. and \bauthor\bsnmMark, \bfnmAndrew\binitsA. (\byear2022). \btitleA History of Commercially Available Risk Models. In \bbooktitleEncyclopedia of Finance \bpages1–39. \bpublisherSpringer. \endbibitem
  • Bodnar, Okhrin and Parolya (2022) {barticle}[author] \bauthor\bsnmBodnar, \bfnmTaras\binitsT., \bauthor\bsnmOkhrin, \bfnmYarema\binitsY. and \bauthor\bsnmParolya, \bfnmNestor\binitsN. (\byear2022). \btitleOptimal shrinkage-based portfolio selection in high dimensions. \bjournalJournal of Business & Economic Statistics \bvolume41 \bpages140–156. \endbibitem
  • Bodnar, Parolya and Schmid (2018) {barticle}[author] \bauthor\bsnmBodnar, \bfnmTaras\binitsT., \bauthor\bsnmParolya, \bfnmNestor\binitsN. and \bauthor\bsnmSchmid, \bfnmWolfgang\binitsW. (\byear2018). \btitleEstimation of the global minimum variance portfolio in high dimensions. \bjournalEuropean Journal of Operational Research \bvolume266 \bpages371–390. \endbibitem
  • Bodnar, Parolya and Thorsén (2023) {barticle}[author] \bauthor\bsnmBodnar, \bfnmTaras\binitsT., \bauthor\bsnmParolya, \bfnmNestor\binitsN. and \bauthor\bsnmThorsén, \bfnmErik\binitsE. (\byear2023). \btitleDynamic Shrinkage Estimation of the High-Dimensional Minimum-Variance Portfolio. \bjournalIEEE Transactions on Signal Processing \bvolume71 \bpages1334-1349. \bdoi10.1109/TSP.2023.3263950 \endbibitem
  • Bongiorno and Challet (2023) {barticle}[author] \bauthor\bsnmBongiorno, \bfnmChristian\binitsC. and \bauthor\bsnmChallet, \bfnmDamien\binitsD. (\byear2023). \btitleNon-linear shrinkage of the price return covariance matrix is far from optimal for portfolio optimization. \bjournalFinance Research Letters \bvolume52 \bpages103383. \bdoihttps://doi.org/10.1016/j.frl.2022.103383 \endbibitem
  • Boyd et al. (2024) {barticle}[author] \bauthor\bsnmBoyd, \bfnmStephen\binitsS., \bauthor\bsnmJohansson, \bfnmKasper\binitsK., \bauthor\bsnmKahn, \bfnmRonald\binitsR., \bauthor\bsnmSchiele, \bfnmPhilipp\binitsP. and \bauthor\bsnmSchmelzer, \bfnmThomas\binitsT. (\byear2024). \btitleMarkowitz Portfolio Construction at Seventy. \bjournalarXiv preprint arXiv:2401.05080. \endbibitem
  • Bun, Bouchaud and Potters (2017) {barticle}[author] \bauthor\bsnmBun, \bfnmJoël\binitsJ., \bauthor\bsnmBouchaud, \bfnmJean-Philippe\binitsJ.-P. and \bauthor\bsnmPotters, \bfnmMarc\binitsM. (\byear2017). \btitleCleaning large correlation matrices: tools from random matrix theory. \bjournalPhysics Reports \bvolume666 \bpages1–109. \endbibitem
  • Cai et al. (2020) {barticle}[author] \bauthor\bsnmCai, \bfnmT. Tony\binitsT. T., \bauthor\bsnmHu, \bfnmJianchang\binitsJ., \bauthor\bsnmLi, \bfnmYingying\binitsY. and \bauthor\bsnmZheng, \bfnmXinghua\binitsX. (\byear2020). \btitleHigh-dimensional minimum variance portfolio estimation based on high-frequency data. \bjournalJournal of Econometrics \bvolume214 \bpages482-494. \bdoihttps://doi.org/10.1016/j.jeconom.2019.04.039 \endbibitem
  • Cai et al. (2021) {barticle}[author] \bauthor\bsnmCai, \bfnmChangxiao\binitsC., \bauthor\bsnmLi, \bfnmGen\binitsG., \bauthor\bsnmChi, \bfnmYuejie\binitsY., \bauthor\bsnmPoor, \bfnmH. Vincent\binitsH. V. and \bauthor\bsnmChen, \bfnmYuxin\binitsY. (\byear2021). \btitleSubspace estimation from unbalanced and incomplete data matrices: {2,infty}\ell_{\{2,infty\}} statistical guarantees. \bjournalThe Annals of Statistics \bvolume49 \bpages944–967. \bdoi10.1214/20-AOS1986 \endbibitem
  • Candès et al. (2011) {barticle}[author] \bauthor\bsnmCandès, \bfnmEmmanuel J\binitsE. J., \bauthor\bsnmLi, \bfnmXiaodong\binitsX., \bauthor\bsnmMa, \bfnmYi\binitsY. and \bauthor\bsnmWright, \bfnmJohn\binitsJ. (\byear2011). \btitleRobust principal component analysis? \bjournalJournal of the ACM (JACM) \bvolume58 \bpages1–37. \endbibitem
  • Capon (1969) {barticle}[author] \bauthor\bsnmCapon, \bfnmJack\binitsJ. (\byear1969). \btitleHigh-resolution frequency-wavenumber spectrum analysis. \bjournalProceedings of the IEEE \bvolume57 \bpages1408–1418. \endbibitem
  • Casella and Hwang (1982) {barticle}[author] \bauthor\bsnmCasella, \bfnmGeorge\binitsG. and \bauthor\bsnmHwang, \bfnmJiunn Tzon\binitsJ. T. (\byear1982). \btitleLimit expressions for the risk of james-stein estimators. \bjournalCanadian Journal of Statistics \bvolume10 \bpages305–309. \endbibitem
  • Chamberlain and Rothschild (1983) {barticle}[author] \bauthor\bsnmChamberlain, \bfnmGary\binitsG. and \bauthor\bsnmRothschild, \bfnmMichael\binitsM. (\byear1983). \btitleArbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets. \bjournalEconometrica: Journal of the Econometric Society \bpages1281–1304. \endbibitem
  • Chandrasekaran, Parrilo and Willsky (2012) {barticle}[author] \bauthor\bsnmChandrasekaran, \bfnmVenkat\binitsV., \bauthor\bsnmParrilo, \bfnmPablo A\binitsP. A. and \bauthor\bsnmWillsky, \bfnmAlan S\binitsA. S. (\byear2012). \btitleLATENT VARIABLE GRAPHICAL MODEL SELECTION VIA CONVEX OPTIMIZATION. \bjournalThe Annals of Statistics \bpages1935–1967. \endbibitem
  • Chen, Qiu and Sheng (2024) {barticle}[author] \bauthor\bsnmChen, \bfnmXiangwei\binitsX., \bauthor\bsnmQiu, \bfnmShuang\binitsS. and \bauthor\bsnmSheng, \bfnmWeixing\binitsW. (\byear2024). \btitleImproved eigenspace-based method for robust adaptive beamforming with dimension search. \bjournalSignal Processing \bvolume218 \bpages109366. \endbibitem
  • Chen et al. (2010) {barticle}[author] \bauthor\bsnmChen, \bfnmYilun\binitsY., \bauthor\bsnmWiesel, \bfnmAmi\binitsA., \bauthor\bsnmEldar, \bfnmYonina C\binitsY. C. and \bauthor\bsnmHero, \bfnmAlfred O\binitsA. O. (\byear2010). \btitleShrinkage algorithms for MMSE covariance estimation. \bjournalIEEE transactions on signal processing \bvolume58 \bpages5016–5029. \endbibitem
  • Clarke, De Silva and Thorley (2011) {barticle}[author] \bauthor\bsnmClarke, \bfnmRoger\binitsR., \bauthor\bsnmDe Silva, \bfnmHarindra\binitsH. and \bauthor\bsnmThorley, \bfnmSteven\binitsS. (\byear2011). \btitleMinimum-Variance Portfolio Composition. \bjournalJournal of Portfolio Management \bvolume2 \bpages31–45. \endbibitem
  • Connor and Korajczyk (1986) {barticle}[author] \bauthor\bsnmConnor, \bfnmGregory\binitsG. and \bauthor\bsnmKorajczyk, \bfnmRobert A.\binitsR. A. (\byear1986). \btitlePerformance measurement with the arbitrage pricing theory: A new framework for analysis. \bjournalJournal of financial economics \bvolume15 \bpages373–394. \endbibitem
  • Cox (2002) {binproceedings}[author] \bauthor\bsnmCox, \bfnmHenry\binitsH. (\byear2002). \btitleAdaptive beamforming in non-stationary environments. In \bbooktitleConference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002. \bvolume1 \bpages431–438. \bpublisherIEEE. \endbibitem
  • Cox, Zeskind and Owen (1987) {barticle}[author] \bauthor\bsnmCox, \bfnmHenry\binitsH., \bauthor\bsnmZeskind, \bfnmRobertm\binitsR. and \bauthor\bsnmOwen, \bfnmMarkm\binitsM. (\byear1987). \btitleRobust adaptive beamforming. \bjournalIEEE Transactions on Acoustics, Speech, and Signal Processing \bvolume35 \bpages1365–1376. \endbibitem
  • De Mol, C., Giannone, D. and Reichlin, L. (2008). Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics 146, 318–328.
  • Ding, Y., Li, Y. and Zheng, X. (2021). High dimensional minimum variance portfolio estimation under statistical factor models. Journal of Econometrics 222, 502–515. Annals Issue: Financial Econometrics in the Age of the Digital Economy. doi:10.1016/j.jeconom.2020.07.013.
  • Donoho, D. L., Gavish, M. and Johnstone, I. M. (2018). Optimal shrinkage of eigenvalues in the spiked covariance model. The Annals of Statistics 46, 1742.
  • Donoho, D., Gavish, M. and Romanov, E. (2023). ScreeNOT: Exact MSE-optimal singular value thresholding in correlated noise. The Annals of Statistics 51, 122–148.
  • El Karoui, N. (2010). High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation.
  • El Karoui, N. (2013). On the realized risk of high-dimensional Markowitz portfolios. SIAM Journal on Financial Mathematics 4, 737–783.
  • Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics 147, 186–197.
  • Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society Series B: Statistical Methodology 75, 603–680.
  • Fan, J., Liao, Y. and Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. The Econometrics Journal 19, C1–C32.
  • Fan, J., Liao, Y. and Wang, W. (2016). Projected principal component analysis in factor models. The Annals of Statistics 44, 219.
  • Fan, J., Masini, R. P. and Medeiros, M. C. (2023). Bridging factor and sparse models. The Annals of Statistics 51, 1692–1717.
  • Fan, J., Wang, W. and Zhong, Y. (2018). An $\ell_{\infty}$ eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research 18, 1–42.
  • Fan, J., Zhang, J. and Yu, K. (2012). Vast portfolio selection with gross-exposure constraints. Journal of the American Statistical Association 107, 592–606.
  • Fan, J. and Zhong, Y. (2018). Optimal subspace estimation using overidentifying vectors via generalized method of moments. arXiv preprint arXiv:1805.02826.
  • Farnè, M. and Montanari, A. (2024). Large factor model estimation by nuclear norm plus $\ell_{1}$ norm penalization. Journal of Multivariate Analysis 199, 105244.
  • Fisher, T. J. and Sun, X. (2011). Improved Stein-type shrinkage estimators for the high-dimensional multivariate normal covariance matrix. Computational Statistics & Data Analysis 55, 1909–1918.
  • Goldberg, L. R., Gurdogan, H. and Kercheval, A. (2023). Portfolio optimization via strategy-specific eigenvector shrinkage.
  • Goldberg, L. R. and Kercheval, A. N. (2023). James–Stein for the leading eigenvector. Proceedings of the National Academy of Sciences 120, e2207046120.
  • Goldberg, L. R., Papanicolaou, A. and Shkolnik, A. (2022). The dispersion bias. SIAM Journal on Financial Mathematics 13, 521–550.
  • Goldberg, L. R., Papanicolaou, A., Shkolnik, A. and Ulucam, S. (2020). Better betas. The Journal of Portfolio Management 47, 119–136.
  • Gurdogan, H. and Kercheval, A. (2022). Multiple Anchor Point Shrinkage for the Sample Covariance Matrix. SIAM Journal on Financial Mathematics 13, 1112–1143.
  • Hager, W. W. and Hungerford, J. T. (2015). Continuous quadratic programming formulations of optimization problems on graphs. European Journal of Operational Research 240, 328–337.
  • Hall, P., Marron, J. S. and Neeman, A. (2005). Geometric representation of high dimension, low sample size data. Journal of the Royal Statistical Society Series B: Statistical Methodology 67, 427–444.
  • Hegerl, G. C., von Storch, H., Hasselmann, K., Santer, B. D., Cubasch, U. and Jones, P. D. (1996). Detecting greenhouse-gas-induced climate change with an optimal fingerprint method. Journal of Climate 9, 2281–2306.
  • Huberman, G. (1982). A simple approach to arbitrage pricing theory. Journal of Economic Theory 28, 183–191.
  • Jagannathan, R. and Ma, T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraints helps. The Journal of Finance 58, 1651–1683.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics 29, 295–327.
  • Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association 104, 682–693.
  • Jung, S. (2022). Adjusting systematic bias in high dimensional principal component scores. Statistica Sinica 32, 939–959.
  • Jung, S., Lee, M. H. and Ahn, J. (2018). On the number of principal components in high dimensions. Biometrika 105, 389–402.
  • Jung, S. and Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. The Annals of Statistics 37, 4104–4130.
  • Jung, S., Sen, A. and Marron, J. S. (2012). Boundary behavior in high dimension, low sample size asymptotics of PCA. Journal of Multivariate Analysis 109, 190–203.
  • Lai, T. L., Xing, H., Chen, Z. et al. (2011). Mean–variance portfolio optimization when means and covariances are unknown. The Annals of Applied Statistics 5, 798–823.
  • Lam, C. (2020). High-dimensional covariance matrix estimation. Wiley Interdisciplinary Reviews: Computational Statistics 12, e1485.
  • Lancewicki, T. and Aladjem, M. (2014). Multi-Target Shrinkage Estimation for Covariance Matrices. IEEE Transactions on Signal Processing 62, 6380–6390. doi:10.1109/TSP.2014.2364784.
  • Ledoit, O. and Wolf, M. (2003). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance 10, 603–621.
  • Ledoit, O. and Wolf, M. (2004a). Honey, I Shrunk the Sample Covariance Matrix. Journal of Portfolio Management 30, 110.
  • Ledoit, O. and Wolf, M. (2004b). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88, 365–411.
  • Ledoit, O. and Wolf, M. (2017). Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. The Review of Financial Studies 30, 4349–4388.
  • Ledoit, O. and Wolf, M. (2018). Optimal estimation of a large-dimensional covariance matrix under Stein’s loss.
  • Ledoit, O. and Wolf, M. (2020a). The Power of (Non-)Linear Shrinking: A Review and Guide to Covariance Matrix Estimation. Journal of Financial Econometrics 20, 187–218. doi:10.1093/jjfinec/nbaa007.
  • Ledoit, O. and Wolf, M. (2020b). Analytical nonlinear shrinkage of large-dimensional covariance matrices. The Annals of Statistics 48, 3043–3065. doi:10.1214/19-AOS1921.
  • Ledoit, O. and Wolf, M. (2021). Shrinkage estimation of large covariance matrices: Keep it simple, statistician? Journal of Multivariate Analysis 186, 104796.
  • Ledoit, O. and Wolf, M. (2022). Quadratic shrinkage for large covariance matrices. Bernoulli 28, 1519–1547.
  • Lee, Y. and Shkolnik, A. (2024a). James-Stein regularization of the first principal component.
  • Lee, Y. and Shkolnik, A. (2024b). Central Limit Theorems of a Strongly Spiked Eigenvector and its James-Stein Estimator.
  • Lettau, M. and Pelger, M. (2020). Estimating latent asset-pricing factors. Journal of Econometrics 218, 1–31.
  • Li, C. Y. and Shkolnik, A. (2024). On Minimum Trace Factor Analysis–An Old Song Sung to a New Tune.
  • Li, J. and Stoica, P. (2005). Robust Adaptive Beamforming. John Wiley & Sons.
  • Li, G., Cai, C., Poor, H. V. and Chen, Y. (2022). Minimax estimation of linear functions of eigenvectors in the face of small eigen-gaps. arXiv preprint arXiv:2104.03298.
  • Luo, T., Chen, P., Cao, Z., Zheng, L. and Wang, Z. (2023). URGLQ: An Efficient Covariance Matrix Reconstruction Method for Robust Adaptive Beamforming. IEEE Transactions on Aerospace and Electronic Systems.
  • Maggiar, A., Wachter, A., Dolinskaya, I. S. and Staum, J. (2018). A derivative-free trust-region algorithm for the optimization of functions smoothed via Gaussian convolution using adaptive multiple importance sampling. SIAM Journal on Optimization 28, 1478–1507.
  • Marchenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik 114, 507–536.
  • Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7, 77–91.
  • Menchero, J., Orr, D. and Wang, J. (2011). The Barra US equity model (USE4), methodology notes. MSCI Barra.
  • Michaud, R. O. (1989). The Markowitz optimization enigma: Is ‘optimized’ optimal? Financial Analysts Journal 45, 31–42.
  • Ollila, E., Palomar, D. P. and Pascal, F. (2020). Shrinking the eigenvalues of M-estimators of covariance matrix. IEEE Transactions on Signal Processing 69, 256–269.
  • Onatski, A. (2012). Asymptotics of the principal components estimator of large factor models with weakly influential factors. Journal of Econometrics 168, 244–258.
  • Pafka, S. and Kondor, I. (2003). Noisy covariance matrices and portfolio optimization II. Physica A: Statistical Mechanics and its Applications 319, 487–494. doi:10.1016/S0378-4371(02)01499-1.
  • Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 1617–1642.
  • Quijano, J. E. and Zurk, L. M. (2015). Eigenvector pruning method for high resolution beamforming. The Journal of the Acoustical Society of America 138, 2152–2160.
  • Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–360.
  • Saunderson, J., Chandrasekaran, V., Parrilo, P. A. and Willsky, A. S. (2012). Diagonal and Low-Rank Matrix Decompositions, Correlation Matrices, and Ellipsoid Fitting. SIAM Journal on Matrix Analysis and Applications 33, 1395–1416. doi:10.1137/120872516.
  • Shapiro, A. (1985). Identifiability of factor analysis: Some results and open problems. Linear Algebra and its Applications 70, 1–7.
  • Shen, D., Shen, H. and Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis 115, 317–333.
  • Shen, D., Shen, H. and Marron, J. S. (2016). A general framework for consistency of principal component analysis. Journal of Machine Learning Research 17, 1–34.
  • Shen, D., Shen, H., Zhu, H. and Marron, J. S. (2016). The statistics and mathematics of high dimensional low sample size asymptotics. Statistica Sinica 26, 1747–1770.
  • Shkolnik, A. (2022). James-Stein estimation of the first principal component. Stat, forthcoming.
  • Vorobyov, S. A. (2013). Principles of minimum variance robust adaptive beamforming design. Signal Processing 93, 3264–3277.
  • Wang, W. and Fan, J. (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics 45, 1342–1374.
  • Wang, X. and Zhang, B. (2024). Target selection in shrinkage estimation of covariance matrix: A structural similarity approach. Statistics & Probability Letters, 110048.
  • Weyl, H. (1912). Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Mathematische Annalen 71, 441–479.
  • Won, J.-H., Lim, J., Kim, S.-J. and Rajaratnam, B. (2013). Condition-number-regularized covariance estimation. Journal of the Royal Statistical Society Series B: Statistical Methodology 75, 427–450.
  • Xie, L., He, Z., Tong, J., Li, J. and Xi, J. (2021). Cross-validated tuning of shrinkage factors for MVDR beamforming based on regularized covariance matrix estimation. arXiv preprint arXiv:2104.01909.
  • Yan, Y., Chen, Y. and Fan, J. (2021). Inference for Heteroskedastic PCA with Missing Data. arXiv:2107.12365 [cs, math, stat]. doi:10.48550/arXiv.2107.12365.
  • Yao, J., Zheng, S. and Bai, Z. D. (2015). Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge University Press, New York.
  • Yata, K. and Aoshima, M. (2009). PCA consistency for non-Gaussian data in high dimension, low sample size context. Communications in Statistics - Theory and Methods 38, 2634–2652.
  • Yata, K. and Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis 105, 193–215.
  • Yata, K. and Aoshima, M. (2013). PCA consistency for the power spiked model in high-dimensional settings. Journal of Multivariate Analysis 122, 334–354.
  • Zhang, A. R., Cai, T. T. and Wu, Y. (2022). Heteroskedastic PCA: Algorithm, optimality, and applications. The Annals of Statistics 50. doi:10.1214/21-AOS2074.
  • Zhou, Y. and Chen, Y. (2023). Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA. arXiv:2303.06198 [cs, math, stat].
  • Zhu, X., Xu, X. and Ye, Z. (2020). Robust adaptive beamforming via subspace for interference covariance matrix reconstruction. Signal Processing 167, 107289.