The Quadratic Optimization Bias of Large Covariance Matrices
Abstract
We describe a puzzle involving the interactions between an optimization of a multivariate quadratic function and a “plug-in” estimator of a spiked covariance matrix. When the largest eigenvalues (i.e., the spikes) diverge with the dimension, the gap between the true and the out-of-sample optima typically also diverges. We show how to “fine-tune” the plug-in estimator in a precise way to avoid this outcome. Central to our description is a “quadratic optimization bias” function, the roots of which determine this fine-tuning property. We derive an estimator of this root from a finite number of observations of a high dimensional vector. This leads to a new covariance estimator designed specifically for applications involving quadratic optimization. Our theoretical results have further implications for improving low dimensional representations of data, and principal component analysis in particular.
1 Introduction
Optimization with a “plug-in” model as an ingredient is routine practice in modern statistical problems in engineering and the sciences. Yet the interactions between the optimization procedure and the errors in an estimated model are often not well understood. Natural questions in this context include the following: “Does the optimizer amplify or reduce the statistical errors in the model? How does one leverage that information if it is known? Which components of the model should be estimated more precisely, and which can afford less accuracy?” We explore these questions for the optimization of a multivariate quadratic function that is specified in terms of a large covariance model. This setup is canonical for many problems that are encountered in the areas of finance, signal-noise processing, operations research and statistics.
Large covariance estimation occupies an important place in high-dimensional statistics and is fundamental to multivariate data analysis (e.g., Yao, Zheng and Bai (2015), Fan, Liao and Liu (2016) and Lam (2020)). A covariance model generalizes the classical setting of independence by introducing pairwise correlations. A parsimonious way to prescribe such correlations for many variables is through the use of a relatively small number of factors, which are high-dimensional vectors that govern all or most of the correlations in the observed data. This leads to a particular type of covariance matrix, a so called “spiked-model” in which a small number of (spiked) eigenvalues separate themselves with a larger magnitude from the remaining (bulk) spectrum (Wang and Fan, 2017). Imposing this factor structure may also be viewed as a form of regularization which replaces the problem of estimating unknown parameters of a covariance matrix with the estimation of a few “structured” components of this matrix. Determining the components that require the most attention in a setting that entails optimization is a central motivation of our work.
1.1 Motivation
To motivate the study of the interplay between optimization and model estimation error, we consider a quadratic function in variables. Let,
(1) |
for an inner product , constants and a vector . The matrix is assumed to be symmetric and positive definite. The maximization of is encountered in many classical contexts within statistics and probability including least-squares regression, maximum a posteriori estimation, saddle-point approximations, and Legendre-Fenchel transforms in moderate/large deviations theory. Some related and highly influential applications include Markowitz’s portfolio construction in finance (Markowitz, 1952), Capon beamforming in signal processing (Capon, 1969) and optimal fingerprinting in climate science (Hegerl et al., 1996). In optimization theory, quadratic functions form a key ingredient for more general (black-box) minimization techniques such as trust-region methods (e.g., Maggiar et al. (2018)). [Footnote 1: In this setting the covariance matrix corresponds to an estimated Hessian matrix.] Since any number of linear equality constraints may be put into the unconstrained Lagrangian form in , our setting is more general than it first appears. Moreover, the maximization of is the starting point for numerous applications of quadratic programming where nonlinear constraints are often added. [Footnote 2: To give one example, interpreting as a graph adjacency matrix and adding simple bound constraints to leads to approximations of graph properties such as the maximum independent set (Hager and Hungerford, 2015). While is no longer interpreted as a spiked covariance matrix in a graph theory setting, its mathematical properties are similar owing to the celebrated Cheeger inequality.]
The maximizer of is given by which attains the objective value
(2) |
but in practice, the maximization of is performed with an estimate replacing the unknown . This “plug-in” step is well known to yield a perplexing problem (see Section 1.3). In essence, the optimizer chases the errors in to produce a systematic bias in the computed maximum. This bias is then amplified by a higher dimension.
Consider a high-dimensional limit and a sequence of symmetric positive definite with a fixed number of spiked eigenvalues diverging in and all remaining eigenvalues bounded in . Let be the maximizer of , defined by replacing in by estimates with the same eigenvalue properties. The estimated objective is , but a more relevant quantity is the realized objective,
(3) |
where and is a discrepancy (relative to ) that can grow rapidly as the dimension increases. Precluding edge cases where or vanish, unless is fine-tuned in a calculated way, the following puzzling behavior ensues.
-
The discrepancy tends to as and consequently, the realized maximum tends to while the true maximum tends to .
The asymptotic behavior above is fully determined by a certain -valued function which we derive and call the quadratic optimization bias. The way in which this bias depends on the entries of characterizes the sought-after interplay between the optimizer and the error in the estimated covariance model. Mitigating the discrepancy between the realized and true quadratic optima and reduces to the problem of approximating the roots of . We remark that by parametrizing the constants and in , one can arrive at alternative limits for and , but practical scalings preserve the large disparity between the true and realized objective values. We examine (in Section 2.2) the scaling in particular, due to its applicability to portfolio theory, robust beamforming and optimal fingerprinting.
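Since the displays above do not survive in this extraction, a small self-contained simulation may help fix ideas. The sketch below (Python; all names and numerical values are ours and chosen only for illustration, and the quadratic is taken in the standard form a'x − ½ x'Σx) builds a spiked covariance, fits a one-factor PCA plug-in from a short sample, and compares the true, estimated and realized objective values.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, m = 500, 60, 1            # dimension, sample size, number of spikes (illustrative)

# Spiked population covariance: one factor with O(1) loadings plus isotropic noise
beta = rng.normal(1.0, 0.5, size=p)
Sigma = np.outer(beta, beta) + 0.25 * np.eye(p)
a = np.ones(p)                  # linear term of the quadratic (hypothetical choice)

# Plug-in model: PCA on the sample covariance of n observations
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n).T
S = X @ X.T / n
evals, evecs = np.linalg.eigh(S)
H, lam = evecs[:, -m:], evals[-m:]                  # leading sample eigenvector(s)
psi2 = evals[:-m][evals[:-m] > 1e-12].mean()        # average bulk eigenvalue as noise level
Sigma_hat = (H * lam) @ H.T + psi2 * np.eye(p)

def value(x, S_):               # quadratic objective a'x - (1/2) x' S_ x
    return a @ x - 0.5 * x @ S_ @ x

x_true = np.linalg.solve(Sigma, a)       # true maximizer
x_plug = np.linalg.solve(Sigma_hat, a)   # plug-in maximizer

print("true optimum       :", value(x_true, Sigma))
print("estimated optimum  :", value(x_plug, Sigma_hat))  # what the optimizer reports
print("realized objective :", value(x_plug, Sigma))      # typically far below both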
1.2 Summary of results & organization
The illustration above reflects that, in statistical settings, solutions to estimated quadratic optimization problems exhibit very poor properties out-of-sample. Section 2 answers the question of which components of must be estimated accurately to reduce the discrepancy in . The size of is amplified by the growth rate of the spiked eigenvalues, but is fully determined by the precision of the estimate of the associated matrix of eigenvectors of . In particular, where is given by,
(4) |
for the Euclidean length . Theorem 1 gives sharp asymptotics for in and the other estimates/parameters. Remarkably, the accuracy of the estimates of eigenvalues of is secondary relative to ensuring that is such that is small for large . This is noteworthy in view of the large literature on bias correction of sample eigenvalues (or “eigenvalue shrinkage”: see Ollila, Palomar and Pascal (2020), Ledoit and Wolf (2021), Ledoit and Wolf (2022) and Donoho, Gavish and Romanov (2023) for a sampling of recent work). Instead, for the discrepancy , the estimation of the eigenvectors of the spikes is what matters most. We remark that while forms a root of the map (i.e., ), it is not the only root. We refer to as the quadratic optimization bias (function) which was first identified in Goldberg, Papanicolaou and Shkolnik (2022) in the context of portfolio theory and for the special covariance with a single spike and identical remaining eigenvalues. [Footnote 3: We state a more general definition in Section 2, but must have orthonormal columns in .]
Section 3 considers a sample covariance matrix and its spectral decomposition , for a diagonal matrix of eigenvalues , the associated matrix of eigenvectors () and a residual . It is assumed that is and that the sequence is based on a fixed number of observations of a high dimensional vector. Our Theorem 3 proves that is almost surely bounded away from zero (in ) eventually in . This has material implications for the use of principal component analysis for problems motivated by Section 1.1.
Section 5 develops the following correction to the sample eigenvectors . For the diagonal matrix satisfying for , the difference between the number of nonzero sample eigenvalues and , we compute
(5) |
Theorem 9 proves the matrix of left singular vectors of , denoted , has
(6) |
almost surely. The matrix constitutes a set of corrected principal component loadings and is the basis of our covariance estimator . This matrix, owing to , yields an improved plug-in estimator for the maximizer of . Thus, our work also has implications for the estimation of the precision matrix . Theorem 9 also proves that the columns of have a larger projection (than ) onto the column space of . Recent literature has remarked on the difficulty (or even impossibility) of correcting such bias in eigenvectors (e.g., Ledoit and Wolf (2017), Wang and Fan (2017) and Jung (2022)). That projection is strictly better when in has bounded away from zero, i.e., captures information about that subspace. But holds regardless, highlighting that the choice of the “loss” function (in our case ) matters. [Footnote 4: See also Donoho, Gavish and Johnstone (2018) for another illustration of this phenomenon.]
In Section 4, we prove an impossibility theorem (Theorem 8) that shows that without very strong assumptions one cannot obtain an estimator of asymptotically in the dimension if . This has negative implications for obtaining estimates of in where is one of the unknowns. The latter contains all inner products between the sample and population eigenvectors, and its estimation from the observed data is an interesting theoretical problem in its own right. Our negative result adds to the literature on high dimension and low sample size (hdlss) asymptotics, as inspired by Hall, Marron and Neeman (2005) and Ahn et al. (2007). [Footnote 5: Aoshima et al. (2018) survey much of the literature since.] We remark that the hdlss regime is highly relevant for real-world data as a small sample size is often imposed by experimental constraints, or by the lack of long-range stationarity of time series. The content of Theorem 8 also points to a key feature that distinguishes our work from Goldberg, Papanicolaou and Shkolnik (2022), who fix . Another aspect making our setting substantially more challenging is that we find roots of a multivariate function (which is univariate when ).
In terms of applications, our results generalize those of Goldberg, Papanicolaou and Shkolnik (2022) to covariance models that hold wide acceptance in the empirical literature on financial asset returns (i.e., the Arbitrage Pricing Theory of Ross (1976), Huberman (1982), Chamberlain and Rothschild (1983) and others). Section 6 investigates the problem of minimum variance investing with numerical simulation, and demonstrates that the estimator results in vanishing asymptotic portfolio risk and a bounded discrepancy (see Figure 1). Appendix E summarizes other applications including signal-noise processing and climate science as related to Section 1.1.
1.3 Limitations & related literature
Our findings in Section 1.1 form a starting point for important extensions and applications. Extending the estimator in to general quadratic programming with inequality constraints would greatly expand its scope. In terms of covariance models, we require spikes that diverge linearly with the dimension, which excludes several alternative frameworks in the literature. [Footnote 6: This includes the Johnstone spike model, in which all eigenvalues remain bounded as the dimension grows, and its extensions (e.g., Johnstone (2001), Paul (2007), Johnstone and Lu (2009) and Bai and Yao (2012)). Further generalizations include slowly growing spiked eigenvalue models as in De Mol, Giannone and Reichlin (2008), Onatski (2012), Shen et al. (2016) and Bai and Ng (2023).] Likewise, the asymptotics of the data matrix aspect ratio differs across applications. We also do not address the important setting in which the number of spikes is misspecified. [Footnote 7: There is a large literature on the estimation of the number of spikes/factors/principal components. Most relevant to our setup (high dimension and low sample size) is Jung, Lee and Ahn (2018).] Finally, the established convergence in leaves the question of rates unanswered. This is particularly important for problems requiring the discrepancy to not grow too quickly. We offer no theoretical treatment of convergence rates but our numerical results suggest this quantity remains bounded as grows (c.f., Figure 1).
The work we build on directly was initiated in Goldberg, Papanicolaou and Shkolnik (2022). We refer to their proposal as the GPS estimator and derive it in Section 5.1. Important extensions are developed in Gurdogan and Kercheval (2022) and Goldberg, Gurdogan and Kercheval (2023). The GPS estimator was shown to be mathematically equivalent to a James-Stein estimation of the leading eigenvector of a sample covariance matrix in Shkolnik (2022). These results share much in common with the ideas found in Casella and Hwang (1982). For a survey of the above literature, focusing on connections to the James-Stein estimator, see Goldberg and Kercheval (2023). The GPS estimator is explained in terms of regularization in Lee and Shkolnik (2024a), and Lee and Shkolnik (2024b) derive central limit theorems for this estimator as relevant for the convergence of the discrepancy . Some numerical exploration of the case of more than one spike is found in Goldberg et al. (2020).
The spiked covariance models we consider, and the application of pca for their estimation, are rooted in the literature on approximate factor models and “asymptotic principal components” originating with Chamberlain and Rothschild (1983) and Connor and Korajczyk (1986). Recent work in this direction is well represented by Bai and Ng (2008), Fan, Liao and Mincheva (2013), Bai and Ng (2023) and Fan, Masini and Medeiros (2023). In this literature, the work that most closely resembles ours, by focusing on improved estimation of sample eigenvectors, is Fan, Liao and Wang (2016), Fan and Zhong (2018) and Lettau and Pelger (2020). Fan, Liao and Wang (2016) project the data onto a space generated by some externally observed covariates, improving the resulting sample eigenvectors when the covariates have sufficient explanatory power. Fan and Zhong (2018) apply a linear transformation to the sample eigenvectors in an approach that is most closely related to formula . We also apply a linear transformation, but the eigenspace is first augmented by the vector in . [Footnote 8: We remark that with a single spike/factor (i.e., ), a linear transformation of the eigenvector(s) adjusts only the eigenvalue, not the eigenvector itself, due to its unit length normalization.] Further differences with Fan, Liao and Wang (2016) arise in the estimation of the optimal linear transformation. Lettau and Pelger (2020) extract principal components from a rank-one updated sample covariance matrix. This update is based on insight from asset pricing theory and it is unclear how the resulting sample eigenvectors are related to formula . The same applies to the very closely related literature on sample covariance matrix shrinkage (e.g., Ledoit and Wolf (2004a), Fisher and Sun (2011), Lancewicki and Aladjem (2014) and Wang and Zhang (2024)). [Footnote 9: This takes the form for some and matrix . Targets adjust eigenvectors but in ways that may be difficult to quantify via closed-form expressions (c.f. ).]
The vast majority of the literature on approximate factor models and covariance estimation assumes the data matrix aspect ratio tends to a finite constant asymptotically. [Footnote 10: This may be due to the outsized influence of random matrix theory (e.g., Marchenko and Pastur (1967)). Another reason may be the consistency of the sample eigenvectors that can be achieved in this regime (see Yata and Aoshima (2009), Shen, Shen and Marron (2016) and Wang and Fan (2017)).] In contrast, our analysis of a finite sample in the high dimensional limit draws on the work on pca in Jung and Marron (2009), Jung, Sen and Marron (2012) and Shen et al. (2016) and others. In the latter, the hdlss asymptotics for the matrix , appearing in , have already been worked out (but see Section 4 for our impossibility theorem). Our main focus is on correcting the biases that the asymptotics of reveal. For approaches to correcting the finite sample bias in eigenvalues and principal component scores, see Yata and Aoshima (2012), Yata and Aoshima (2013), Jung (2022) and our Remark 7. Shen, Shen and Marron (2013) apply regularization in the presence of sparsity in the population eigenvectors to correct finite sample bias in the principal components. It is unclear how their estimators are related to the update in , but we do not impose such sparsity assumptions.
Several other strands of the pca literature are relevant as their aims coincide with improved sample eigenvector estimation. In one direction is the literature on sparse and low-rank matrix decompositions (e.g., Chandrasekaran, Parrilo and Willsky (2012), Saunderson et al. (2012), Bai and Ng (2019), Farnè and Montanari (2024) and Li and Shkolnik (2024)). These convex relaxations aim to find more accurate low-dimensional representations of the data and are sometimes referred to as forms of robust pca (Candès et al., 2011). In a related direction is the recent work on robust pca for heteroskedastic noise (e.g., Cai et al. (2021), Zhang, Cai and Wu (2022), Yan, Chen and Fan (2021), Agterberg, Lubberts and Priebe (2022) and Zhou and Chen (2023)). These efforts provide ( finite) bounds on generalized angles between the true and the estimated subspaces and complement our asymptotic pca results in Sections 3 & 5. Perturbations of eigenvectors have also been recently revisited in Fan, Wang and Zhong (2018), Abbe, Fan and Wang (2022) and Li et al. (2022). The latter use these bounds to construct estimators that “de-bias” linear forms such as appearing in . These results can likely supply alternative proofs to ours (or even convergence rates), but our focus is on limit theorems only.
Lastly, we emphasize the area of mean-variance portfolio optimization. As the literature on this topic is quite vast, we mention only a few strands related to Section 1.1. Examples of early influential work in this direction include Michaud (1989) and Best and Grauer (1991). For numerical simulations that illustrate the impact on practically motivated models and metrics see Bianchi, Goldberg and Rosenberg (2017). A random matrix theory perspective on the behavior of objectives related to may be found in Pafka and Kondor (2003), Bai, Liu and Wong (2009), El Karoui (2010), El Karoui (2013), Bun, Bouchaud and Potters (2017) and Bodnar, Okhrin and Parolya (2022). Highly relevant recent work in econometrics using latent factor models includes Ding, Li and Zheng (2021) who consider a portfolio risk measure closely tied to . Bayesian approaches to mean-variance optimization include Lai et al. (2011) and Bauder et al. (2021). These estimators are closely related to Ledoit-Wolf shrinkage (Ledoit and Wolf (2003) and Ledoit and Wolf (2004b)) which itself has undergone numerous improvements (e.g., Ledoit and Wolf (2018) and Ledoit and Wolf (2020a)). In tandem, shrinkage methods have been known to impart effects akin to extra constraints in the portfolio optimization as early as Jagannathan and Ma (2003). An insightful example of such robust portfolio optimization that relates to the convergence of the covariance matrix estimator is developed in Fan, Zhang and Yu (2012). More advanced robust portfolio optimizations have also been proposed (e.g., Boyd et al. (2024)). Alternatively, constraints are often applied in the covariance matrix estimation process as an optimization in itself. For example, Won et al. (2013) apply a condition number constraint that leads to non-linear adjustments of sample eigenvalues (c.f., Ledoit and Wolf (2020b)), but leaves the sample eigenvectors unchanged. Bongiorno and Challet (2023) document the difficulty with relying solely on eigenvalue correction, especially for small sample sizes. Cai et al. (2020) apply sparsity constraints (on the precision matrix) and analyze optimality properties related to . We emphasize that the impact of such constraints on eigenvectors is difficult (or impossible) to quantify, in contrast to formula . [Footnote 11: It should be noted that another interesting approach to mean-variance portfolio optimization concerns the direct shrinkage of the portfolio weights (i.e., akin to shrinkage in , e.g., Bodnar, Parolya and Schmid (2018), Bodnar, Okhrin and Parolya (2022) and Bodnar, Parolya and Thorsén (2023)).]
1.4 Notation
Let denote the column span of the matrix and let denote the orthogonal projection of the vector on , e.g.,
(7) |
where , the Moore-Penrose inverse of a full column rank matrix . We use to denote an identity matrix and when highlighting its dimensions, .
Write for the scalar product of , let and be the induced (spectral) norm of a matrix . We denote by a function that given a matrix , uniquely selects (see Appendix D) an enumeration of its singular values and outputs a matrix of left singular vectors with the values in columns . That is,
(8) |
where is the diagonal with entries , and is the matrix of right singular vectors of . The matrix also corresponds to some unique choice of eigenvectors of with largest eigenvalues .
We take and . Lastly, and denote the limit inferior and superior as , and , a sequence of matrices with dimensions when at least one of or grows to infinity.
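As a computational aside, the left-singular-vector operator introduced above can be realized as follows. This is a minimal numpy sketch; the sign convention used here is only a stand-in for the unique selection rule of Appendix D, which is not reproduced in this extraction.

```python
import numpy as np

def lsv(A, m):
    """Return the m leading left singular vectors of A, columns ordered by
    decreasing singular value. The sign of each column is fixed by making its
    largest-magnitude entry positive (a placeholder for the selection rule
    referenced in the text)."""
    U, s, _ = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    U = U[:, :m]
    signs = np.sign(U[np.abs(U).argmax(axis=0), np.arange(U.shape[1])])
    signs[signs == 0] = 1.0
    return U * signs
```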
2 Quadratic Optimization Bias
We begin with a covariance matrix which has the decomposition,
(9) |
for a full rank matrix and some invertible, symmetric matrix .
The covariance decomposition in is often associated with assuming a factor model (e.g., see Fan, Fan and Lv (2008)). In the context of large covariance matrix estimation, the following approximate factor model framework is by now standard. [Footnote 12: These conditions originate with Chamberlain and Rothschild (1983) and Assumption 1 closely mirrors theirs as well as those of later work such as Fan, Fan and Lv (2008) and Fan, Liao and Mincheva (2013).]
Assumption 1.
The matrices and satisfy the following.
-
(a)
.
-
(b)
exists as an invertible matrix with fixed .
In the literature on factor analysis, the entries of a column of are called loadings, or exposures to a risk factor corresponding to that column. Condition (b) of Assumption 1 states that all risk factors are persistent as the dimension grows and implies the largest eigenvalues of grow linearly in . Condition (a) states that all remaining variance (or risk) vanishes in the high dimensional limit and the bulk ( smallest) eigenvalues of are bounded in eventually. The matrix is associated with covariances of idiosyncratic errors, but can have alternative interpretation (e.g., covariance of the specific return of financial assets). Assumption 1 implies for the sequence of eigenvectors of with nonzero eigenvalues. The latter implication motivates the frequent reference to the as the asymptotic principal components of .
In practice, is unknown and an estimated model is used instead. Let,
(10) |
for a full rank matrix and a number . We assume is known and allow for to depend on provided this sequence is bounded in . We do not pursue alternative (to ) estimates of because, as pointed out below, accurate estimation of the matrix is of secondary concern relative to the accuracy of the estimate .
For , the eigenvectors and per , define
(11) |
assuming . We note and that is a precursor to the quadratic optimization bias function in , but the in need not have orthonormal columns (fn. 3). These two definitions are equated in Section 3.
All results in this section continue to hold with redefined with any such that with bounded in and diagonal matrix , not necessarily the eigenvalues. This alternative may be useful for some applications.
2.1 Discrepancy of quadratic optima in high dimensions
Returning to the optimization setting of Section 1.1, for constants and , we consider
(12) |
which attains at the maximizer analogously to but with . Because is not the true objective function in , we are interested in the realized objective . Now,
(13) |
which identifies the discrepancy in relative to both and .
To avoid division by zero in , we prevent from vanishing and residing entirely in asymptotically (i.e., ). [Footnote 13: This edge case must be treated separately from our analysis and we do not pursue it.] The entries of may be viewed as the first entries of an infinite sequence or as rows of a triangular array. We further assume the estimate has properties consistent with Assumption 1(b).
Assumption 2.
Suppose and satisfy and . Also, exists as an invertible matrix,
We address the asymptotics of the discrepancy in , letting as above, with the canonical choice.
Remark 3.
The proof (see Appendix A) has a more general statement by relaxing the rate of growth of the eigenvalues of to a sequence (rather than ). That is, we only assume the limits of and are invertible matrices. In this case, is replaced by above. This shows is in .
Theorem 1 reveals that diverges to unless we find roots of , perhaps asymptotically. Note that , but other roots exist (see Section 5).
Lemma 2.
For any full rank matrix with and any invertible matrix , we have .
Proof.
This follows by a direct verification using the definition in .
and with the definition of in we obtain the desired result. ∎
Lemma 2 pinpoints what constitutes a poor “plug-in” covariance estimator . For example, the column lengths of have no effect on the quadratic optimization bias . For the eigenvalue decomposition (with in Lemma 2), we see that . Thus, to fine-tune for quadratic optimization, one need correct only the basis . This amounts to finding the (asymptotic) roots of the function . If the convergence to a root is sufficiently rapid, one may then estimate closely by to bring the discrepancy to one per Theorem 1. We conclude this section by showing that for many applications the rate of convergence of is less important than Theorem 1 suggests.
2.2 Applications
To illustrate some important examples in practice, we consider the following canonical, constrained optimization problem.
(14) | ||||
Now, in is the Lagrangian for with and , which decays as under Assumption 2. The minimizer of corresponds to the weights of a minimum variance portfolio of financial assets with implementing the “full-investment” constraint. Minimum variance and the more general mean-variance optimized portfolios are widely used in finance. Here, the entries of a column of in represent the exposures of assets to that risk factor, e.g., market risk (bull/bear market), industry risk (energy, automotive, etc.), climate risk (migration, drought, etc.), innovation risk (ChatGPT, etc.). Similar formulations based on arise in signal-noise processing and climate science (see Appendix E). [Footnote 14: In signal-noise processing, maximizes the signal-to-noise ratio of a beamformer with referred to as the “steering vector”. The same is done for optimal fingerprinting in climate science with called the “guess pattern”. We review this literature with emphasis on estimation of in Appendix E.]
Continuing with the above example, the minimum of corresponds to the variance of the estimated portfolio , while the expected out-of-sample variance is,
(15) |
We have (see Appendix A) and, under the conditions of Theorem 1,
(16) |
because . Because converges in as under our Assumption 1(b), we achieve (in expectation) an asymptotically riskless portfolio provided the convergence and irrespective of its rate.
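To make the portfolio example concrete, here is a short sketch (Python; the covariance model, sample sizes and the ridge term are illustrative choices of ours, not taken from the paper) of the plug-in minimum variance weights under the full-investment constraint, together with the estimated, out-of-sample and oracle variances that the surrounding discussion compares.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 1000, 120                       # assets and observations (illustrative values)

# One-factor spiked covariance (arbitrary units)
beta = rng.normal(1.0, 0.5, size=p)
Sigma = 0.04 * np.outer(beta, beta) + 0.09 * np.eye(p)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n).T
Sigma_hat = X @ X.T / n + 1e-6 * np.eye(p)   # small ridge so the plug-in solve is well posed

ones = np.ones(p)

def min_var_weights(S):
    w = np.linalg.solve(S, ones)
    return w / (ones @ w)                    # full-investment constraint: weights sum to one

w_hat = min_var_weights(Sigma_hat)           # plug-in minimum variance portfolio
w_opt = min_var_weights(Sigma)               # oracle portfolio

print("estimated variance (in-sample) :", w_hat @ Sigma_hat @ w_hat)
print("expected out-of-sample variance:", w_hat @ Sigma @ w_hat)   # typically much larger
print("oracle minimum variance        :", w_opt @ Sigma @ w_opt)
```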
3 Principal Component Analysis
Let denote a data matrix of variables observed at dates which, for a random matrix and random matrix , follows the linear model,
(17) |
The matrix forms the unknown to be estimated, and only is observed, while is a matrix of latent variables and the matrix represents an additive noise.
The pca estimate of may be derived from leading terms of the spectral decomposition of the sample covariance matrix (see Remark 4), i.e.,
(18) |
where the sum is over all eigenvalue/eigenvector pairs for of unit length (i.e., ). The th column of the matrix in is taken as where is the th largest eigenvalue of . The matrix forms the residual. Ordering the eigenvalues of as , we have
(19) |
where is a diagonal matrix with entries and the columns of the matrix are the associated sample eigenvectors in with .
Since data is often centered in practice, in addition to , we consider the eigenvectors of the transformed data matrix where for any ,
(20) |
and the sample covariance in is given by since . Centering the columns of entails the choice in but we allow .
Remark 4.
The identity is the aim of centering and holds under well-known conditions, e.g., has i.i.d. columns, with the and uncorrelated. We do not require that for the results of this section.
Our results require the following signal-to-noise ratio (diagonal) matrix , where the “noise” is specified in terms of the average of the bulk eigenvalues, (c.f., ).
(21) |
where is the number of nonzero eigenvalues of . When , ensuring per that implies, for of full rank, that for and otherwise. For , the eigenvectors and eigenvalues may also be computed more efficiently using the smaller matrix which shares its nonzero eigenvalues with . This computation is represented as follows (c.f. ).
(22) |
The pca–estimated model for takes in and our estimator for the simple choice which suffices in view of Section 2. We prove (Theorem 4) that consistently estimates the average idiosyncratic variance as , under our upcoming Assumption 6. [Footnote 15: The residual is typically regularized to form a robust estimate of . Examples include zeroing out all but the diagonal of this matrix, and the POET estimator of Fan, Liao and Mincheva (2013).]
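A compact implementation of this computation is sketched below (Python). It assumes the number of components m is given, skips centering, and uses the average of the nonzero bulk eigenvalues as the noise level, which is one plausible reading of the (not reproduced) display above; the eigenvectors are obtained through the smaller Gram matrix, as described in the text.

```python
import numpy as np

def pca_factor_model(X, m):
    """PCA estimate of an m-spike covariance model from a p x n data matrix X
    (centering omitted). The eigenvectors are obtained through the n x n Gram
    matrix X'X/n, which shares its nonzero eigenvalues with X X'/n and is
    cheaper to decompose when n << p."""
    p, n = X.shape
    G = X.T @ X / n
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1]
    lam, V = evals[order[:m]], evecs[:, order[:m]]
    H = (X @ V) / np.sqrt(n * lam)               # p x m sample eigenvectors, unit length
    bulk = evals[order[m:]]
    psi2 = bulk[bulk > 1e-12].mean() if (bulk > 1e-12).any() else 0.0
    Sigma_pca = (H * lam) @ H.T + psi2 * np.eye(p)
    return H, lam, psi2, Sigma_pca
```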
Sections 3.1–3.2 below define , the eigenvectors of associated with the largest eigenvalues (as other choices were possible for Theorem 1 and the definition of in ). We do not require per Remark 4.
3.1 Norm of the optimization bias for pca
We analyze the asymptotics for the pca estimate . Lemma 2 with in and imply that and , which reduces to
(23) |
The unknowns in are and and we provide theoretical evidence that they cannot be estimated from data in Section 4 without very strong assumptions. Here, we nevertheless obtain an estimate of the length . The following addresses a division by zero in . Recall that and per .
Assumption 5.
and for .
Our next assumption concerns the matrices and in . These guarantee that almost all realizations of the data have full rank for sufficiently large , allowing us to treat in as when and otherwise.
Assumption 6.
Assumption 1 on the matrices and holds and the following conditions hold for and sequences and .
-
(a)
Only is observed (the variables in are latent).
-
(b)
The true number of factors is known and (with fixed).
-
(c)
is () invertible almost surely (and does not depend on ).
-
(d)
almost surely for some constant .
-
(e)
almost surely for some matrix norm on .
-
(f)
almost surely where .
These conditions are discussed below. Our fundamental result on pca (in conjunction with Theorem 1) may now be stated. Its proof is deferred to Appendix B.
Theorem 3.
We remark that is computable solely from the data with almost every bounded in eventually. Theorem 3 demonstrates that pca, and sample eigenvectors in particular, lead to poor “plug-in” covariance estimators for quadratic optimization unless every column of is eventually orthogonal to . So typically, the discrepancy in between the estimated and realized optima diverges to as grows and at a linear rate. In the portfolio application of Section 2.2, this covariance results in strictly positive expected (out-of-sample) portfolio risk per – asymptotically, which may be approximated by using .
We make some remarks on Assumption 6. Conditions (a)–(c) are straightforward, but we mention that the invertibility of is closely related to the requirement that in condition (b). Condition (c) fails when in lies in but such a case is dealt with by rewriting the data in as for some and of columns each, and some mean vector . Then, we have , and it only remains to check if condition (c) holds with the matrix replacing . Conditions (d)–(f) require that strong laws of large numbers hold for the columns of the sequence . These roughly state that the columns of are stationary with weakly dependent entries having bounded fourth moments. All three are easily verified for the populated by i.i.d. Gaussian random entries. Lastly, we remark that if conditions (e) and (f) hold for they hold for any .
Since in practice both and are finite, we can make some refinements to the definitions in based on some classical random matrix theory. In particular, it is well known that when the aspect ratio converges in (in our case to zero), the eigenvalues of have support that is approximately between and for the constant in condition (d). We can then define,
(25) |
which is a Marchenko-Pastur type adjustment to (and ) defined in . When the eigenvalues of obey the Marchenko-Pastur law, this is advisable.
3.2 hdlss results for pca
Theorem 3 is essentially a corollary of our next result, which is of independent theoretical interest for the hdlss literature.
Theorem 4.
The proof is deferred to Appendix B. Parts (a)–(b) should not surprise those well versed in the hdlss literature. Nevertheless, these limit theorems for eigenvalues provide new content by supplying estimators, not just asymptotic descriptions.
Remark 7.
Parts (a)–(b) of Theorem 4 supply improved eigenvalue estimates for the pca covariance model when , and while these have no effect on the optimization bias , we summarize them. Part (a) implies is an improved estimator (relative to ) of the population matrix . Part (b) implies that is an asymptotic estimator of where . To see this, w.l.o.g. take , and note that the trace tr and the expectation E commute. Then, and since (Assumption 6(d)) provided is uniformly integrable, converges to .
The limits in parts (c)–(d) of Theorem 4 are new and noteworthy. They supply estimators for the quantities and from data. While these are not enough to estimate (for that we need both and ), they suffice for the task of estimating the norm from the data .
The convergence in part (c) has an interpretation. By direct calculation we have that (e.g., see in Appendix B), which implies that for columns of , say and , the th entry of is , i.e., the inner product of and projected onto . This is in contrast to the th entry of where is the th column of . Part (c) states that,
(26) | ||||
almost surely, where is itself a random sequence eventually in . That is, sample eigenvectors remain orthogonal in , but their norms are less than the maximal unit length, i.e., columns of are inconsistent estimators of columns of .
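The following quick experiment (Python, with illustrative parameters of our own choosing) shows this effect numerically: as the dimension grows with the sample size fixed, the Gram matrix of the sample eigenvectors projected onto the population factor span stays nearly diagonal, but its diagonal entries settle strictly below one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 40, 2                                    # fixed sample size and number of spikes

for p in (200, 1000, 5000):
    B = np.zeros((p, m))
    B[:, 0] = rng.normal(1.0, 0.3, p)           # factor loadings giving O(p) spiked eigenvalues
    B[:, 1] = rng.normal(0.0, 1.0, p)
    F = rng.normal(size=(m, n))                  # latent factors
    E = rng.normal(scale=0.5, size=(p, n))       # idiosyncratic noise
    X = B @ F + E

    Hpop = np.linalg.qr(B)[0]                    # orthonormal basis of the factor span
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Hsam = U[:, :m]                              # leading sample eigenvectors

    P = Hpop.T @ Hsam                            # inner products with the population basis
    print(f"p={p:5d}  Gram matrix of projected sample eigenvectors:")
    print(np.round(P.T @ P, 3))                  # ~diagonal with entries strictly below 1
```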
The following elegant characterization is an artifact of the fact that square matrices with orthonormal rows must also have orthonormal columns.
Corollary 5.
Let . Under the hypotheses of Theorem 4 the matrices and are asymptotic inverses of one another, i.e., almost surely,
(27) |
Applying this to per Theorem 4(d), yields
(28) |
provisionally on Assumption 5 and without it, both and converge to zero. Thus, Theorem 4 implies we can asymptotically know the length (the norm of the projection of in ). Further, as all diagonal entries of are eventually smaller than one and , we deduce that has larger projection onto than does eventually in . We conclude with a simple consequence of Theorem 4(d) relevant for Theorem 3.
Corollary 6.
Suppose that Assumption 6 holds. Then, implies almost surely.
4 An Impossibility Theorem
The problem of estimating the unknown and appearing in encounters significant challenges for . It is related to, but separate from, the problem called “unidentifiability” that arises in the context of factor analysis (e.g., Shapiro (1985)). Here, we prove an “impossibility” result. To give an interpretation of , we now require and , so that and may be regarded as the sample and the population eigenvectors (or asymptotic principal components).
With denoting the singular value decomposition of in , and similarly with , we find (see Appendix B),
(29) |
which holds almost surely under Assumption 6. This limit relation has been studied in the hdlss literature under various conditions and modes of convergence (e.g., Jung, Sen and Marron (2012) and Shen et al. (2016)). But these authors do not derive estimators for the right side of (i.e., the is not observed).
We prove that it is not possible, without very strong assumptions on , to develop asymptotic estimators of the inner product matrix . Given this, it is also reasonable to conjecture the same for . While this problem is motivated by our study of the quadratic optimization bias, the estimation of the entries of , and hence the estimation of angles between the sample and population eigenvectors, is an interesting (and to our knowledge, uninvestigated) problem in its own right.
We remark that the problem of “unidentifiability” amounts to the observation that replacing and by and for any orthogonal matrix does not alter the observed data matrix, rendering unidentifiable (i.e., or ?). However, the quantity of interest in our work is , which bypasses this type of unidentifiability as is defined via the identity which is a population quantity encoding the uniquely selected eigenvectors of . Hence, the unidentifiability of is related to but not the same as the problem we formulate.
We work in a setting where the noise in is null and the matrices have additional regularity over Assumption 1. The presumption here is that these simplifications can make our stated estimation problem for only easier.
Condition 8.
The data matrices with fixed have for a sequence and satisfying the following.
-
(a)
The is a random variable on a probability space with almost surely invertible and such that .
-
(b)
The (for all ) satisfies for , a fixed orthogonal and fixed diagonal with for all .
Any of Condition 8 has obeying Assumption 1(b) with , and for which the sample covariance satisfies for the eigenvalue matrix .
For satisfying Condition 8(b), we define a set of orthogonal transformations which non-trivially change the eigenvectors . Let,
(30) |
Every element induces the data with and , which is uniquely identified by the orthogonal matrix . The new data set built in this way satisfies Condition 8 for of that condition. We remark that the only diagonal element of is the identity matrix (i.e., flipping the signs of any of the columns of does not result in a different set of eigenvectors ). Indeed, if we partition all orthogonal matrices by the equivalence relation that sets two matrices equivalent when their columns differ only by a sign, then the set selects exactly one element from each equivalence class. Since the number of elements in each equivalence class is finite, we have established that the set has the same cardinality as the set of all orthogonal matrices with dimensions .
We now consider and a sequence of (nonrandom) measurable functions that together with the notation define,
(31) |
The event consists of all outcomes for which the consistently estimate as for every . The following lemma may be used to generate bounds on the probability of event for many examples.
Lemma 7.
Suppose Condition 8. Then, for any and corresponding function given by , we have
(32) |
Proof.
Since almost surely, has linearly independent columns, it is easy to see that . Using that and that yields,
∎
The next example is a good warm-up for our main result (Theorem 8) below.
Example 9.
Let be nonrandom and , a random matrix with and for some . By taking , the event contains the outcomes for which admits a consistent estimator for two data sets corresponding to the and .
If , then contains outcomes corresponding to each possible realization of which implies by Lemma 7 that both and converge to zero. Since this is a contradiction, .
This stylized example may be substantially generalized by requiring a certain distributional property of the random variable .
Definition 10.
We say a random variable is -distributable if there exists a collection such that for any measurable ,
Clearly, that has mean zero i.i.d. Gaussian entries is -distributable for with just one element, but we expect many random matrices to have this property. Our main result shows that even when restricting to a smaller set of covariance models, the chances of estimating the matrix are no better than a coin flip.
Theorem 8.
Suppose Condition 8 holds and is -distributable with . Then, for this and any sequence of (nonrandom) measurable functions , the in has .
Proof.
Note that the -measurability of the set is granted by the measurability of each (i.e., each is measurable and so is each (Acker, 1974)).
Letting , we see that
(33) |
by taking . Analogously, for for any , we have
(34) |
Letting , we claim that and are disjoint. To see this, note that if , then for , . Substituting for in relation , and substituting for in relation , yields
a contradiction, as both cannot hold simultaneously. Thus, and are disjoint.
Consequently and are disjoint and moreover, the -distributability of implies . This along with the fact that implies the desired result, i.e.,
∎
5 Optimization Bias Free Covariance Estimator
Let be the matrix of eigenvectors in of the sample covariance . Recalling the variables and of Theorem 3, we define
(35) |
Theorem 3 proved that with eventually in almost surely. From the observable and , we now construct an matrix with as . To this end, consider the eigenvalue decomposition
(36) |
for eigenvectors and diagonal matrix of eigenvalues . The estimator is computed as the eigenvectors , i.e.,
(37) |
where the diagonal is invertible for sufficiently large under our assumptions.
The proof is deferred to Appendix C but we sketch the derivation of and give a geometrical interpretation in Sections 5.1 and 5.2. We take to combine the eigenvector correction with that for the eigenvalues (see Remark 7). Note that by Lemma 2. We let be our covariance estimator where, identically to pca, we take with in or .
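The overall shape of this construction can be sketched as follows (Python). This is only a structural sketch of the pipeline described above and in Section 5.2: augment the sample eigenbasis by the unit residual of the constraint vector off its span, apply a full-rank linear map, and take the leading left singular vectors. The specific, data-driven weighting matrix that makes the optimization bias vanish is developed in the text and is not reproduced here; it appears below only as an explicit placeholder argument.

```python
import numpy as np

def corrected_loadings(H, z, W):
    """Structural sketch of the eigenvector correction: augment the sample
    eigenbasis H (p x m, orthonormal columns) by the unit residual of z off
    span(H), apply a full-rank (m+1) x m map W, and return the m leading
    left singular vectors of the result. The particular W that removes the
    quadratic optimization bias is constructed in the paper and is NOT
    reproduced here; W is a placeholder argument."""
    resid = z - H @ (H.T @ z)                  # component of z orthogonal to span(H)
    z_perp = resid / np.linalg.norm(resid)
    H_aug = np.column_stack([H, z_perp])       # p x (m+1), orthonormal columns
    U, _, _ = np.linalg.svd(H_aug @ W, full_matrices=False)
    return U[:, :H.shape[1]]                   # corrected orthonormal loadings
```

The returned columns lie in the span of the augmented basis, which is the geometric point of the expansion by z; only the choice of the placeholder map determines whether the optimization bias is actually driven to zero.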
Theorem 9 provides theoretical guarantees for many applications, including that the estimator is now demonstrated to yield minimum variance portfolios (i.e., solutions of ) with zero asymptotic variance (see in ). Addressing the convergence rate of is outside of our scope, but we study this rate numerically in Section 6, which shows, at least for Gaussian data, that the rate is . This suggests that yields a bounded discrepancy of Theorem 1 under some conditions.
The last part of Theorem 9 concerns the inner products of the columns of projected onto . This is in direct comparison to Theorem 4(c) which shows that the sample eigenvectors are orthogonal in and the same is true for the columns of since is diagonal. Selecting the th column of we have,
as compared with in . Note, eventually with a strict inequality when is bounded away from zero (in ) due to . Thus, the length of projected onto is at least as large as for its counterpart.
5.1 Remarks on the GPS program
The special case was considered by Goldberg, Papanicolaou and Shkolnik (2022) (henceforth gps) who apply their results to portfolio theory. We summarize the relevant parts of the gps program making adjustments for greater generality and compatibility with our solution in Section 5.2.
Here, takes the form where , with Assumption 1 requiring a sequence for which converges in . The sample covariance matrix may be written as where is the largest eigenvalue with eigenvector and the matrix contains the remaining spectrum per . Setting yields,
(39) |
for the quadratic optimization bias in the case . Our uses a different denominator than GPS, but this difference is not essential. Our generalizes the choice of a scalar matrix in GPS and our Assumption 6 relaxes their conditions.
The GPS program assumes (w.l.o.g.) that and , enforces Assumption 5 so that , and takes the following steps.
-
(1)
Find asymptotic estimators for unknowns and in . To this end, for the observed (c.f., ), under Assumption 6 almost surely,
(40) -
(2)
Consider the estimator parametrized by so that increases in . This construction is motivated by the in the numerator of becoming eventually less than almost surely, per .
-
(3)
Solve for as a function of the unknowns and . Leveraging , construct an observable such that as and prove a uniform continuity of to establish that .
These steps cannot be easily extended to the setting of general . Step (1) is no longer possible in view of Theorem 8, and indeed, the “sign” conventions and cannot be appropriated from the univariate case given that result. Step (2) is difficult to extend because its intuition becomes obscure for general where the vector resides. Step (3) relies on basic calculations to determine the root of a univariate function. Determining roots in , especially without the right parametrization in step (2), appears difficult given the definition of in .
We make some adjustments to prime our approach in Section 5.2. First, write
(41) |
as a replacement for . This drops the sign conventions on to reformulate step (1) for compatibility with the findings of Theorem 4 parts (c)–(d).
Our adjustment to step (2) sacrifices its intuition for additional degrees of freedom. In particular, for and (c.f., ), set
(42) |
This two-parameter estimator parametrizes the quadratic optimization bias as,
(43) |
It is not difficult to verify that setting and in the above display leads to the identity with . Finally, the parameter
(44) |
also has the property that but admits asymptotic estimators via the replacement of . This modifies step (3) of the GPS program to find an asymptotic root of without any sign conventions on . While these changes are somewhat trivial for the case , our understanding of them is informed by the case and initiated by the impossibility result in Theorem 8.
5.2 Sketch of the derivation of
We begin by defining a matrix composed of orthonormal columns, derived from in and in , i.e.,
(45) |
so that expands by the vector . We introduce a parametrized estimator for a full rank matrix , derive a root of the map , and construct an asymptotic estimator of by applying Theorem 4(c)–(d).
We consider the following family of estimators with as a special case.
(46) |
Any in this family is a matrix of full rank. We have for , but the constraint on imposed by is relaxed in .
Substituting into the optimization bias function in , we obtain
(47) |
where we have used that . The expression is obscure, but we note that which suggests a simplification post , i.e.,
(48) |
provided is invertible and (see Appendix C). For the last equality we use that (i.e., the projection of onto is itself). We remark that the matrix formalism of has advantages even over the special case in . Figure 2 illustrates the geometry of the transformation at .
The slick calculation above does not constitute our original derivation which is heavy-handed and superfluous. The advantage of lies in its brevity and its quick bridge to the GPS program. Yet, is not sufficient in view of Theorem 8, i.e., the optimal point is not observed nor can it be estimated from the observed data. To this end, we seek an invertible matrix for which (similarly to ),
may be estimated solely from and use Lemma 2 to conclude that provided is invertible. The choice is motivated by the fact that as ,
(53) |
where we used in and applied Theorem 4 to obtain the stated almost sure convergence (see Appendix C for details). The variables in the limit are computable from the data and it is again notable that while we are unable to estimate , the quantity admits an estimator as did .
Our estimator is now easily seen to take the following form.
(56) |
This suggests taking because now implies that
(57) |
Appendix C proves the is eventually invertible and applies to deduce that implies the desired conclusion .
6 A Numerical Example
We illustrate our results on a numerical example to provide a verification of Theorems 4, 3 and 9. We also study the convergence rates of various estimators, which are not supplied by our theory. Consider i.i.d. observations of where,
(58) |
with and a matrix , that are realized over uncorrelated and with and . Then, . Fixing , , , we simulate data matrices with observations of as its columns. The parameters are calibrated in Section 6.2 with the minimum variance portfolio problem described in Section 2.2 in mind.
We simulate data matrices , selecting subsets of size by taking to study the asymptotics of three estimators. All three are based on a centered sample covariance (see ), the spectrum of which equals that of and is computed from this matrix. This results in a diagonal matrix with the largest eigenvalues of , as well as the in and in . Our three estimators have the form,
(59) |
where and is one of three matrices of orthonormal columns.
-
()
The sample eigenvectors are computed per using the matrix . These vectors correspond to a pca covariance model in .
-
()
The matrix will use the gps estimator of Section 5.1 to issue a correction to only the first column of . In particular, we let equal in columns – and replace its first columns by with in and given by . Finally, we set computed analogously to for efficiency.
-
()
The corrected sample eigenvectors are computed using –.
To assess the performance of the three covariance estimators in we test them on several metrics. With respect to the minimum variance portfolio application, for and in , the returns to financial assets, we compute
(60) |
the true minimum variance. We compare the volatility to the realized volatility, (see ) of that minimizes with .
We also study the length of the quadratic optimization bias, the true and realized quadratic optima (taking in and ) of Section 1.1,
(61) |
and their discrepancy . The and (as well as , and minimizers of , ) are efficiently computed via the Woodbury identity to obtain the inverses of the covariance matrices and .
6.1 Discussion of the results
Table 1 and Figure 3 summarize the simulations for our minimum variance portfolio application. Volatilities are quoted in percent annualized units (see Section 6.2) and only portfolio sizes should be considered as practically relevant. Three sets of portfolio weights () constructed with the covariance models in are tested. As predicted by , the realized portfolio volatility for the pca-model weights in the third column of Table 1 remains bounded away from zero (on average). The same holds for the partially corrected estimator , which substantially decreases the volatility of the pca weights for larger portfolios. This estimator was also tested for in Goldberg et al. (2020), but for a model in which as for . In this special case, the estimators and coincide asymptotically. In our more realistic model of Section 6.2, all sample eigenvectors require correction as is evident by comparing and in Table 1. The latter portfolio volatility decays at the rate of roughly . The true volatility (second column of Table 1) also decays at this rate. Figure 3 depicts the much larger deviations about the average that the estimators and produce on the portfolio volatility metric relative to . Surprisingly, produced the largest such deviations.
Table 2 and Figure 4 compare the pca-model to our optimization-bias free estimator on the quadratic function objectives in . As predicted in Section 1.1 and Theorem 1 in particular, the true objective value (the second column of Table 2) increases in while the realized objective decreases rapidly. The (expected) discrepancy of the pca-model is shown to diverge to negative infinity linearly with the dimension as predicted by Theorem 1 (i.e., largest eigenvalues of the covariance model of Section 6.2 diverge in ). The last two columns of Table 2 confirm that the realized maximum and discrepancy produced by the corrected eigenvectors behave in a more desirable way. The discrepancy appears to converge in a neighborhood of the optimal value one, while the realized maximum has a trend similar to that of the true maximum. Figure 4 shows the large uncertainty of the average behavior summarized in Table 2 that results from using the sample eigenvectors or their partially corrected version, (see Table 4 for the averages). The uncertainty produced by the corrected eigenvectors is negligible by comparison.
Table 3 summarizes our numerical results on the length of the quadratic optimization bias for the sample eigenvectors and the corrected vectors . Table 4 supplies the same for the partially corrected eigenvectors . The first three columns of Table 3 confirm the findings of Theorem 3, i.e., the length of the optimization bias for pca may be accurately estimated from observable data in higher dimensions. We find that the expected length converges away from zero, and that diverges in expectation. This is predicted by Theorem 3 since does not vanish as grows. Table 4 presents similar findings for , which we have not analyzed theoretically. Column of Table 3 confirms the predictions of Theorem 9, i.e., the corrected bias length vanishes as grows. Our numerical findings expand on this by also demonstrating that appears to be bounded (in expectation). This suggests a convergence rate of for the corrected bias . The latter is consistent with the asymptotics of in Table 2 which Theorem 1 forecasts to behave as .
Table 5 provides support for Theorem 4(c) and Theorem 9, which concern the projection of the estimated eigenvectors onto the population subspace . The convergence verified in columns two and four shows that the vectors in and remain orthogonal after projection onto because and are diagonal matrices. The largest elements of these matrices (presented as averages in columns three and five) estimate the largest squared length of the columns of and in respectively. This confirms has a larger such projection than does .
6.2 Population covariance model
Our covariance matrix calibration loosely follows the specification of the Barra US equity risk model (see Menchero, Orr and Wang (2011) and Blin, Guerard and Mark (2022)). To this end, we introduce a (random) vector of factor returns and an exposure matrix which satisfy,
(69) |
with in such that and . The factor returns are Gaussian with mean zero and covariance in . The unit are chosen so that the factor volatilities (square roots of the diagonal of ) are in units of annualized percent. The columns of are exposures to () fundamental risk factors (market risk, two style risk factors and four industry risk factors), and are generated as follows.
-
–
The entries of the first column (exposures to market risk) of are drawn as i.i.d. normal with mean and standard deviation . The second and third columns of (style risk factors) have i.i.d. entries that are normal with mean zero and standard deviations and respectively for those columns.
-
–
The last four columns of are initialized to be zero and for each row , independently of all other rows, we select two industries and from uniformly at random and without replacement. Then, drawing and that are independent and uniform in , we set and .
The left panel of Figure 5 contains histograms of the first three columns of . This calibration of market and style risk factors is similar to that in Goldberg et al. (2020), who do not consider industry risk, and compare the estimators and in simulation. The entries of the last four columns, which correspond to industry risk factors, have the following interpretation. Each asset chooses two industries for membership with an exposure of to each on average. When the chosen industries are the same, that exposure is on average (i.e., ). Figure 6 supplies a visual illustration of the structure of these industry memberships. The industry risk factors drive the poor performance of the estimator in our simulations due to the nonzero projection that the corresponding four columns of have in . The latter translates to components of the optimization bias vector that materially deviate from zero, and the first that is suboptimally corrected.
The asset-specific returns in are drawn from a mean-zero Gaussian distribution with a diagonal covariance matrix . We take , for asset-specific volatilities , drawn as independent copies of , where is a distributed random variable. These are quoted in annualized percent units, and we refer the reader to Clarke, De Silva and Thorley (2011) for typical values that are estimated in practice. Lastly, the expected return vector in is taken as for .
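To make the recipe above concrete, the following is a minimal simulation sketch of this calibration. Only the structure (one market column, two style columns and four industry columns in the exposure matrix, Gaussian factor returns with a diagonal factor covariance, and a diagonal asset-specific covariance) follows the description above; every numerical constant, as well as the function and variable names, is an illustrative placeholder of our own choosing rather than the calibrated value used in our experiments.

```python
import numpy as np

def simulate_calibration(p=500, n=252, seed=0):
    """Minimal sketch of the Section 6.2 calibration (placeholder constants)."""
    rng = np.random.default_rng(seed)
    k = 7  # number of factors

    # Exposure matrix (p x 7).
    B = np.zeros((p, k))
    B[:, 0] = rng.normal(1.0, 0.3, size=p)   # market exposures (placeholder mean/sd)
    B[:, 1] = rng.normal(0.0, 1.0, size=p)   # style factor 1 (placeholder sd)
    B[:, 2] = rng.normal(0.0, 0.5, size=p)   # style factor 2 (placeholder sd)
    for i in range(p):
        # Two distinct industries per asset, chosen uniformly at random.
        j1, j2 = rng.choice(4, size=2, replace=False)
        a1, a2 = rng.uniform(0.0, 1.0, size=2)   # placeholder exposure range
        B[i, 3 + j1], B[i, 3 + j2] = a1, a2

    # Diagonal factor covariance; volatilities in annualized percent (placeholders).
    factor_vols = np.array([16.0, 8.0, 6.0, 10.0, 10.0, 10.0, 10.0])
    F = np.diag(factor_vols ** 2)

    # Diagonal asset-specific covariance (placeholder volatility distribution).
    specific_vols = rng.uniform(20.0, 60.0, size=p)
    Delta = np.diag(specific_vols ** 2)

    # Population covariance and n simulated observations: return = B f + specific return.
    Sigma = B @ F @ B.T + Delta
    f = rng.multivariate_normal(np.zeros(k), F, size=n)    # factor returns (n x k)
    eps = rng.normal(0.0, specific_vols, size=(n, p))      # specific returns (n x p)
    Y = f @ B.T + eps                                      # observations (n x p)
    return Y, Sigma, B
```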
Appendix A Proofs for Section 2
By direct computation based on the definition in we obtain,
(70) |
and, recalling that , the above yields the following useful identities.
(71) |
The right side is bounded away from zero in under Assumption 2. Throughout, we regard as a sequence in that is bounded in . We also introduce an auxiliary sequence to generalize the rates in Assumptions 1 and 2 so that both and converge to invertible matrices.
We begin by expanding on some of the calculations in Section 1.1 and Section 2. Starting with , the maximizer of is easily calculated as , and
justifying the expression for below with as well as .
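For the reader's convenience, this calculation may be summarized in generic notation (the symbols below are ours, chosen only for illustration and not those of the paper): for a symmetric positive definite matrix $\Sigma$, a vector $\mu$ and a constant $\gamma > 0$, the strictly concave quadratic
\[
q(w) \;=\; \langle \mu, w\rangle \;-\; \tfrac{\gamma}{2}\,\langle w, \Sigma w\rangle
\]
has gradient $\nabla q(w) = \mu - \gamma\,\Sigma w$, so that its unique maximizer and optimal value are
\[
w^{*} \;=\; \tfrac{1}{\gamma}\,\Sigma^{-1}\mu,
\qquad
q(w^{*}) \;=\; \tfrac{1}{2\gamma}\,\langle \mu, \Sigma^{-1}\mu\rangle .
\]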
Define and set per . Then,
which is identical to with .
Lastly, we recognize as the (unique) solution of . The following provides a useful decomposition of these solutions.
Lemma 10.
Suppose has as an invertible matrix. Then, for vectors with , the minimizer of has
(72) |
Proof.
We begin with an expression of via the Woodbury identity.
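For reference, the Woodbury identity invoked here reads, in generic notation (the letters below are placeholders and not the matrices of the lemma), for conformable matrices with the indicated inverses existing,
\[
(A + UCV)^{-1} \;=\; A^{-1} \;-\; A^{-1}U\bigl(C^{-1} + V A^{-1} U\bigr)^{-1} V A^{-1}.
\]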
Next, consider the singular value decomposition where and have orthonormal columns and is diagonal. Then,
(73) |
where at the last step we utilized that and that is diagonal.
Starting with the expression , we define
(74) |
substitute and use that and , to obtain
for per . This identifies in via the relation,
Because , to conclude the proof, it now suffices to show that is so that also as required. We have,
(75) |
Since the spectral norm and the inverse of a matrix over invertible matrices are continuous functions, our assumption on implies that converges to a finite number. This, together with completes the proof. ∎
Proof of Theorem 1.
Continuing from the expression , we first address the asymptotics of , and using this yields,
(76) |
Applying of Lemma 10 and the positive definiteness of per Assumption 1,
(77) |
where and . Turning our attention to the first term in by letting , we again apply Lemma 10 to deduce that
Since per , the decomposition , yields
(78) |
Considering , we examine for large. Using and ,
(79) |
where , in Lemma 10, was shown to have in . We have in for our modification of Assumption 1 and assuming vanishes,
for is in as the eigenvalues of are bounded in . So, the last three terms in the above display are in .
Similarly, combining and , we obtain
where the 2nd term is in and is in as
(80) |
where we note that and are bounded in . The claim now follows. ∎
Appendix B Proofs for Section 3
Essential for our proofs is Weyl’s inequality for eigenvalue perturbations of a matrix (Weyl, 1912). In particular, for symmetric matrices and ,
where and denote the th largest eigenvalues of and respectively.
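In generic notation (again, the symbols here are ours), the perturbation form of the inequality used throughout these proofs is
\[
\bigl|\lambda_i(A + E) - \lambda_i(A)\bigr| \;\le\; \lVert E \rVert,
\qquad i = 1,\dots,p,
\]
for symmetric $A, E \in \mathbb{R}^{p\times p}$, where $\lambda_i(\cdot)$ denotes the $i$th largest eigenvalue and $\lVert\cdot\rVert$ the spectral norm.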
Define with in and in .
(81) |
By Assumption 1(b) the following limit matrix exists, with the right side the eigenvalue decomposition with orthogonal and invertible, diagonal .
(82) |
Let denote the th largest eigenvalue of (also, for ) associated with the th column of , the eigenvectors of . By Assumption 6(c), we have for and otherwise.
As relevant for Assumption 6(b), above implies (see ) almost surely converges to whenever and converges to otherwise, as .
Proof.
We address the convergence of as follows. The sum of the first and second terms in (scaled by ) converges to due to Assumption 1(b) and Assumption 6(d). The last two terms in (scaled by ) vanish by Assumption 6(e). Since the nonzero eigenvalues of are those of , almost surely, converges to the th eigenvalue of by Weyl's inequality.
It remains to find the eigenvalues of . For , it is easy to check that the eigenvalues of are just for with eigenvectors . When , we have so that for any other eigenvector of , we have and consequently . It follows that, when , the eigenvalues of are given by for and zero otherwise. This concludes the proof. ∎
Proof of Theorem 4.
Taking (b) first, in is the average of for . By Lemma 11, for such we have (i.e., for ). Therefore, almost surely and part (b) holds.
Turning to part (a), we have for in . By part (b) and Lemma 11, for . For in (a), note that is the th largest eigenvalue of and equals that of . The latter matrix converges to by Assumption 1(b) and now, by Weyl's inequality, almost surely. Dividing by finishes the proof.
Henceforth, and in view of the above, we work with the assumption that has rank since for any outcome there is a sufficiently large to ensure this.
For part (c), let where is the matrix of right singular vectors of corresponding to its left singular vectors . We have,
(84) |
where the first identity is due to being a right singular vector of with value zero (i.e., ). The second identity comes from the singular value decomposition which implies that . The latter further yields that,
(85) |
Multiplying this by yields that for ,
(86) |
and we expand on to obtain (using that ),
Since the matrix is full rank, and also . Therefore,
(87) |
where at the last step we used that is symmetric.
Combining and with with in for which , adding and subtracting (where ) and recalling that ,
From the above, we obtain the following bound.
(88) |
We have and thus,
(89) |
almost surely by Lemma 11 and using that .
By part (b) and Assumption 6(d), we also have that almost surely,
(90) |
Since , it suffices to prove that
(91) |
almost surely. In that regard, we have
Applying Assumption 1(b), and in particular , yields
Assumption 6(e) and the fact that all matrix norms on are equivalent conclude the proof of . Part (c) now follows by combining – and observing that each for is eventually in due to parts (a) and (b).
Proof of Theorem 3.
We first prove that the norm of the numerator of in converges to . Using that yields,
(92) |
Considering the last term in , we obtain
and because as well as , we have by part (c) of Theorem 4 that
(93) |
For the first term in , due to Corollary 5 and in particular,
(95) |
where we used that . Since almost surely due to part (c) of Theorem 4, applying part (d) of the same theorem now yields,
(96) |
Now, we rewrite the term by substituting as,
which confirms that almost surely, and after taking the limit of and substituting , and . Lastly, from ,
(97) |
where we used that and part (c) of Theorem 4, which shows the limit of the latter is with every eventually in . From this, and (the denominator of in ) is eventually in almost surely. We now deduce that vanishes and is eventually in almost surely. Lastly, converges to zero only if (when vanishes), concluding the proof.
∎
Appendix C Proofs for Section 5
We begin with the following auxiliary result which requires Assumption 6. As usual, with satisfying Assumption 5.
Lemma 12.
The matrix is eventually invertible almost surely.
Proof.
Proof of Theorem 9.
For in and in , define
(100) |
where was first encountered in . We compute,
(101) |
with the eigenvalue decomposition per . The singular value decomposition,
(102) |
has denoting the th left singular vector with and value . We can write the final estimator in in the form (c.f. ) where
(103) |
We prove the last part first. Using and multiplying from the right by ,
where we used that . Applying Corollary 5 yields,
(104) |
Using the identity and applying Theorem 4 parts (c)–(d),
(105) |
so that per with . This justifies the nontrivial part of the limit statement in . Continuing from ,
Combining this with per and leads to the relation . Therefore,
Finally, we have using and the fact that both matrices and have orthonormal columns.
We now move to proving that for per with,
(106) |
replacing in with and applying . We prove the desired result in two steps below. In step 1 we show the denominator in is bounded away from zero eventually. In step 2 we prove that the numerator in converges to zero.
Step 1. We prove eventually in almost surely. Note that,
for any element in the family and where we have used that and that . Starting with and , we have the spectral decomposition
(107) |
using which and , we write
(108) |
Next, since with in , the vector
is in the null space of and therefore in that of per . Since the column spaces of and are identical, we have that forms a basis for .
Observing that ,
since forms a basis for and applying . Consequently,
It now only suffices to show that eventually in . For in ,
and in has by and . Thus,
(113) | ||||
(114) |
By and the fact that we have,
under Assumption 5 which also guarantees . We deduce that the numerator of is eventually strictly positive almost surely. The denominator is finite as and the eigenvalues of are finite by Theorem 4(c).
Thus, is almost surely bounded away from zero eventually.
Step 2. We prove that the numerator of almost surely eventually vanishes. We omit the “almost surely” clause for brevity below. Recall that supplies that
(115) |
provided is invertible for . To establish the latter, we directly compute using with in . Then,
(116) |
and we deduce that is eventually bounded since the columns of and have unit length and is eventually finite by Theorem 3. Since both terms in are positive semidefinite and eventually invertible by Lemma 12, all eigenvalues of are strictly positive. Hence, is eventually invertible.
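The eigenvalue bound behind this step is elementary and worth recording in generic notation (our symbols): for symmetric matrices $M_1, M_2 \in \mathbb{R}^{q\times q}$,
\[
\lambda_{\min}(M_1 + M_2) \;\ge\; \lambda_{\min}(M_1) + \lambda_{\min}(M_2),
\]
so the sum of a positive semidefinite matrix and an invertible positive semidefinite (hence positive definite) matrix is positive definite.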
In view of , it only suffices to prove that the difference between and vanishes in some norm. By Lemma 2 with ,
(117) |
owing to Lemma 12 and Theorem 4(c) which guarantee that and (and hence ) are eventually invertible. Substituting , we have
which confirms and applies which was justified above (see ). Since the mapping from the domain of real , full column rank matrices is continuous, we have via that as required.
We remark that since , we now have . This also proves that eventually in (see comments below ).
∎
Appendix D The Eigenvector Selection Function
For any matrix , we enumerate singular values (in descending order) and their left singular vectors in a well-defined way. We start by ordering all distinct singular values of as (cf. in Section 1.4) and uniquely identifying the linear subspaces formed by the associated left singular vectors. Given that the first left singular vectors have been selected, we select the th left singular vector by taking the following steps.
1. Identify the unique for which,
(118)
2. Let denote the orthogonal complement of the subspace formed by the subset of vectors corresponding to , where the orthogonal complement is taken within . For the standard basis elements of , we identify the unique as the first one that is not orthogonal to .
3. Set as the orthogonal projection of onto normalized to .
Implementing this process sequentially on assembles a list of left singular vectors with associated singular values in decreasing order. We define as an matrix carrying at its columns.
Remark 11.
In the second step to define the th left singular vector, note that the subspace is of non-zero dimension by . Moreover, the uniquely defined standard basis element has to exist: if it did not, the whole space would be orthogonal to , implying that is of zero dimension, which contradicts our previous assertion.
Example 12.
We illustrate the above procedure with . The matrix has as the sole singular value which corresponds to the subspace of left singular vectors . For in the algorithm introduced above, we obtain the corresponding determined as by . The subspace equals as there has not been any selection yet. Then the first of that is not orthogonal to would clearly be . Hence, is the normalized orthogonal projection of onto . Next, we assume as an induction hypothesis that and implement the th step of the algorithm to show . Clearly, defined by corresponding to is . Moreover, , the orthogonal complement of the subspace formed by the vectors previously selected for the singular value , is spanned by . Hence, the first of that is not orthogonal to is . That sets . As a result, we obtain assembled as so that its th column contains the coordinate vector .
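The procedure above is deterministic and straightforward to implement. The following is a minimal sketch under our own naming conventions; the numerical tolerance used to group (nearly) equal singular values is an implementation detail not specified in the text, and only the leading min(m, n) left singular vectors are enumerated here.

```python
import numpy as np

def select_left_singular_vectors(A, tol=1e-10):
    """Sketch of the deterministic selection procedure of this appendix.

    Left singular vectors are enumerated with singular values in descending
    order; within the subspace of each (numerically) distinct singular value,
    vectors are obtained by projecting the standard basis vectors, in order,
    onto the part of that subspace not yet spanned by earlier selections.
    """
    m = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)

    # Group columns of U by distinct singular value (descending order).
    groups, start = [], 0
    for end in range(1, len(s) + 1):
        if end == len(s) or abs(s[end] - s[start]) > tol:
            groups.append(U[:, start:end])
            start = end

    selected = []
    for basis in groups:
        chosen = np.zeros((m, 0))       # vectors already selected for this value
        for _ in range(basis.shape[1]):
            # Step 2: projector onto the orthogonal complement, within
            # span(basis), of the previously selected vectors.
            P = basis @ basis.T - chosen @ chosen.T
            # First standard basis vector e_j not orthogonal to that subspace.
            for j in range(m):
                w = P[:, j]             # equals P @ e_j
                if np.linalg.norm(w) > tol:
                    break
            # Step 3: its normalized projection is the next selected vector.
            w = w / np.linalg.norm(w)
            selected.append(w)
            chosen = np.column_stack([chosen, w])
    return np.column_stack(selected)    # columns ordered by decreasing singular value
```

Consistent with Example 12, when a single singular value corresponds to the whole space of left singular vectors, this routine returns the coordinate vectors in order.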
Appendix E Capon Beamforming
One important illustration of the pathological behaviour described below concerns robust (Capon) beamforming (see Cox, Zeskind and Owen (1987), Li and Stoica (2005) and Vorobyov (2013)). Some recent work that applies spectral methods for robust beamforming may be found in Zhu, Xu and Ye (2020), Luo et al. (2023) and Chen, Qiu and Sheng (2024), who survey related work. The importance of the covariance estimation aspect of robust beamforming is also well recognized (e.g., Abrahamsson, Selen and Stoica (2007), Chen et al. (2010) and Xie et al. (2021)). In particular, the LW shrinkage estimator developed in Ledoit and Wolf (2004b) has had a noteworthy influence on this literature, despite being originally proposed for portfolio selection in finance (Ledoit and Wolf, 2003). Typical applications of this estimator employ the identity matrix as the "shrinkage target", which leaves the eigenvectors of the sample covariance matrix unchanged (fn. 9). However, the estimation error in the sample eigenvectors (especially for small sample/snapshot sizes, as in our setting) is known to have a material impact (Cox, 2002). One (rare) example of robust beamforming work that attempts to "de-noise" sample eigenvectors directly is Quijano and Zurk (2015). But their analysis does not overlap with our –.
[Acknowledgments] We thank Haim Bar and Alec Kercheval for useful feedback on the motivating optimization example (Section 1.1) in our introduction. We thank Lisa Goldberg for an insightful discussion that led to the main theorem of Section 4. We thank Kay Giesecke for many helpful comments on an earlier draft of this paper. We thank the participants of the 2023 SIAM Conference on Financial Mathematics in Philadelphia, PA, the UCLA Seminar on Financial and Actuarial Mathematics, Los Angeles, CA, the CDAR Risk Seminar, UC Berkeley and the AFT Lab Seminar, Stanford CA for their comments and feedback on many of the ideas that led to this manuscript.
References
- Abbe, E., Fan, J. and Wang, K. (2022). An $\ell_p$ theory of PCA and spectral clustering. The Annals of Statistics 50, 2359–2385.
- Abrahamsson, R., Selen, Y. and Stoica, P. (2007). Enhanced covariance matrix estimators in adaptive beamforming. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07) 2, II-969. IEEE.
- Acker, A. F. (1974). Absolute continuity of eigenvectors of time-varying operators. Proceedings of the American Mathematical Society 42, 198–201.
- Agterberg, J., Lubberts, Z. and Priebe, C. E. (2022). Entrywise Estimation of Singular Vectors of Low-Rank Matrices With Heteroskedasticity and Dependence. IEEE Transactions on Information Theory 68, 4618–4650. doi:10.1109/TIT.2022.3159085.
- Ahn, J., Marron, J. S., Muller, K. M. and Chi, Y.-Y. (2007). The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika 94, 760–766.
- Aoshima, M., Shen, D., Shen, H., Yata, K., Zhou, Y.-H. and Marron, J. S. (2018). A survey of high dimension low sample size asymptotics. Australian & New Zealand Journal of Statistics 60, 4–19.
- Bai, Z., Liu, H. and Wong, W.-K. (2009). Enhancement of the applicability of Markowitz's portfolio optimization by utilizing random matrix theory. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics 19, 639–667.
- Bai, J. and Ng, S. (2008). Large dimensional factor analysis. Foundations and Trends in Econometrics 3, 89–163.
- Bai, J. and Ng, S. (2019). Rank regularized estimation of approximate factor models. Journal of Econometrics 212, 78–96.
- Bai, J. and Ng, S. (2023). Approximate factor models with weaker loadings. Journal of Econometrics.
- Bai, Z. and Yao, J. (2012). On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis 106, 167–177.
- Bauder, D., Bodnar, T., Parolya, N. and Schmid, W. (2021). Bayesian mean–variance analysis: optimal portfolio selection under parameter uncertainty. Quantitative Finance 21, 221–242.
- Best, M. J. and Grauer, R. R. (1991). On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. The Review of Financial Studies 4, 315–342.
- Bianchi, S. W., Goldberg, L. R. and Rosenberg, A. (2017). The impact of estimation error on latent factor model forecasts of portfolio risk. The Journal of Portfolio Management 43, 147–156.
- Blin, J., Guerard, J. and Mark, A. (2022). A History of Commercially Available Risk Models. In Encyclopedia of Finance, 1–39. Springer.
- Bodnar, T., Okhrin, Y. and Parolya, N. (2022). Optimal shrinkage-based portfolio selection in high dimensions. Journal of Business & Economic Statistics 41, 140–156.
- Bodnar, T., Parolya, N. and Schmid, W. (2018). Estimation of the global minimum variance portfolio in high dimensions. European Journal of Operational Research 266, 371–390.
- Bodnar, T., Parolya, N. and Thorsén, E. (2023). Dynamic Shrinkage Estimation of the High-Dimensional Minimum-Variance Portfolio. IEEE Transactions on Signal Processing 71, 1334–1349. doi:10.1109/TSP.2023.3263950.
- Bongiorno, C. and Challet, D. (2023). Non-linear shrinkage of the price return covariance matrix is far from optimal for portfolio optimization. Finance Research Letters 52, 103383. doi:10.1016/j.frl.2022.103383.
- Boyd, S., Johansson, K., Kahn, R., Schiele, P. and Schmelzer, T. (2024). Markowitz Portfolio Construction at Seventy. arXiv preprint arXiv:2401.05080.
- Bun, J., Bouchaud, J.-P. and Potters, M. (2017). Cleaning large correlation matrices: tools from random matrix theory. Physics Reports 666, 1–109.
- Cai, T. T., Hu, J., Li, Y. and Zheng, X. (2020). High-dimensional minimum variance portfolio estimation based on high-frequency data. Journal of Econometrics 214, 482–494. doi:10.1016/j.jeconom.2019.04.039.
- Cai, C., Li, G., Chi, Y., Poor, H. V. and Chen, Y. (2021). Subspace estimation from unbalanced and incomplete data matrices: statistical guarantees. The Annals of Statistics 49, 944–967. doi:10.1214/20-AOS1986.
- Candès, E. J., Li, X., Ma, Y. and Wright, J. (2011). Robust principal component analysis? Journal of the ACM (JACM) 58, 1–37.
- Capon, J. (1969). High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE 57, 1408–1418.
- Casella, G. and Hwang, J. T. (1982). Limit expressions for the risk of James-Stein estimators. Canadian Journal of Statistics 10, 305–309.
- Chamberlain, G. and Rothschild, M. (1983). Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets. Econometrica: Journal of the Econometric Society, 1281–1304.
- Chandrasekaran, V., Parrilo, P. A. and Willsky, A. S. (2012). Latent variable graphical model selection via convex optimization. The Annals of Statistics, 1935–1967.
- Chen, X., Qiu, S. and Sheng, W. (2024). Improved eigenspace-based method for robust adaptive beamforming with dimension search. Signal Processing 218, 109366.
- Chen, Y., Wiesel, A., Eldar, Y. C. and Hero, A. O. (2010). Shrinkage algorithms for MMSE covariance estimation. IEEE Transactions on Signal Processing 58, 5016–5029.
- Clarke, R., De Silva, H. and Thorley, S. (2011). Minimum-Variance Portfolio Composition. Journal of Portfolio Management 2, 31–45.
- Connor, G. and Korajczyk, R. A. (1986). Performance measurement with the arbitrage pricing theory: A new framework for analysis. Journal of Financial Economics 15, 373–394.
- Cox, H. (2002). Adaptive beamforming in non-stationary environments. In Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002, 1, 431–438. IEEE.
- Cox, H., Zeskind, R. and Owen, M. (1987). Robust adaptive beamforming. IEEE Transactions on Acoustics, Speech, and Signal Processing 35, 1365–1376.
- De Mol, C., Giannone, D. and Reichlin, L. (2008). Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics 146, 318–328.
- Ding, Y., Li, Y. and Zheng, X. (2021). High dimensional minimum variance portfolio estimation under statistical factor models. Journal of Econometrics 222, 502–515 (Annals Issue: Financial Econometrics in the Age of the Digital Economy). doi:10.1016/j.jeconom.2020.07.013.
- Donoho, D. L., Gavish, M. and Johnstone, I. M. (2018). Optimal shrinkage of eigenvalues in the spiked covariance model. Annals of Statistics 46, 1742.
- Donoho, D., Gavish, M. and Romanov, E. (2023). ScreeNOT: Exact MSE-optimal singular value thresholding in correlated noise. The Annals of Statistics 51, 122–148.
- El Karoui, N. (2010). High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation.
- El Karoui, N. (2013). On the realized risk of high-dimensional Markowitz portfolios. SIAM Journal on Financial Mathematics 4, 737–783.
- Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics 147, 186–197.
- Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society Series B: Statistical Methodology 75, 603–680.
- Fan, J., Liao, Y. and Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. The Econometrics Journal 19, C1–C32.
- Fan, J., Liao, Y. and Wang, W. (2016). Projected principal component analysis in factor models. Annals of Statistics 44, 219.
- Fan, J., Masini, R. P. and Medeiros, M. C. (2023). Bridging factor and sparse models. The Annals of Statistics 51, 1692–1717.
- Fan, J., Wang, W. and Zhong, Y. (2018). An eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research 18, 1–42.
- Fan, J., Zhang, J. and Yu, K. (2012). Vast portfolio selection with gross-exposure constraints. Journal of the American Statistical Association 107, 592–606.
- Fan, J. and Zhong, Y. (2018). Optimal subspace estimation using overidentifying vectors via generalized method of moments. arXiv preprint arXiv:1805.02826.
- Farnè, M. and Montanari, A. (2024). Large factor model estimation by nuclear norm plus $\ell_1$ norm penalization. Journal of Multivariate Analysis 199, 105244.
- Fisher, T. J. and Sun, X. (2011). Improved Stein-type shrinkage estimators for the high-dimensional multivariate normal covariance matrix. Computational Statistics & Data Analysis 55, 1909–1918.
- Goldberg, L. R., Gurdogan, H. and Kercheval, A. (2023). Portfolio optimization via strategy-specific eigenvector shrinkage.
- Goldberg, L. R. and Kercheval, A. N. (2023). James–Stein for the leading eigenvector. Proceedings of the National Academy of Sciences 120, e2207046120.
- Goldberg, L. R., Papanicolaou, A. and Shkolnik, A. (2022). The dispersion bias. SIAM Journal on Financial Mathematics 13, 521–550.
- Goldberg, L. R., Papanicolaou, A., Shkolnik, A. and Ulucam, S. (2020). Better betas. The Journal of Portfolio Management 47, 119–136.
- Gurdogan, H. and Kercheval, A. (2022). Multiple Anchor Point Shrinkage for the Sample Covariance Matrix. SIAM Journal on Financial Mathematics 13, 1112–1143.
- Hager, W. W. and Hungerford, J. T. (2015). Continuous quadratic programming formulations of optimization problems on graphs. European Journal of Operational Research 240, 328–337.
- Hall, P., Marron, J. S. and Neeman, A. (2005). Geometric representation of high dimension, low sample size data. Journal of the Royal Statistical Society Series B: Statistical Methodology 67, 427–444.
- Hegerl, G. C., Hasselmann, K., Santer, B. D., Cubasch, U. and Jones, P. D. (1996). Detecting greenhouse-gas-induced climate change with an optimal fingerprint method. Journal of Climate 9, 2281–2306.
- Huberman, G. (1982). A simple approach to arbitrage pricing theory. Journal of Economic Theory 28, 183–191.
- Jagannathan, R. and Ma, T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraints helps. The Journal of Finance 58, 1651–1683.
- Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics 29, 295–327.
- Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association 104, 682–693.
- Jung, S. (2022). Adjusting systematic bias in high dimensional principal component scores. Statistica Sinica 32, 939–959.
- Jung, S., Lee, M. H. and Ahn, J. (2018). On the number of principal components in high dimensions. Biometrika 105, 389–402.
- Jung, S. and Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. The Annals of Statistics 37, 4104–4130.
- Jung, S., Sen, A. and Marron, J. S. (2012). Boundary behavior in high dimension, low sample size asymptotics of PCA. Journal of Multivariate Analysis 109, 190–203.
- Lai, T. L., Xing, H., Chen, Z. et al. (2011). Mean–variance portfolio optimization when means and covariances are unknown. The Annals of Applied Statistics 5, 798–823.
- Lam, C. (2020). High-dimensional covariance matrix estimation. Wiley Interdisciplinary Reviews: Computational Statistics 12, e1485.
- Lancewicki, T. and Aladjem, M. (2014). Multi-Target Shrinkage Estimation for Covariance Matrices. IEEE Transactions on Signal Processing 62, 6380–6390. doi:10.1109/TSP.2014.2364784.
- Ledoit, O. and Wolf, M. (2003). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance 10, 603–621.
- Ledoit, O. and Wolf, M. (2004a). Honey, I Shrunk the Sample Covariance Matrix. Journal of Portfolio Management 30, 110.
- Ledoit, O. and Wolf, M. (2004b). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88, 365–411.
- Ledoit, O. and Wolf, M. (2017). Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. The Review of Financial Studies 30, 4349–4388.
- Ledoit, O. and Wolf, M. (2018). Optimal estimation of a large-dimensional covariance matrix under Stein's loss.
- Ledoit, O. and Wolf, M. (2020a). The Power of (Non-)Linear Shrinking: A Review and Guide to Covariance Matrix Estimation. Journal of Financial Econometrics 20, 187–218. doi:10.1093/jjfinec/nbaa007.
- Ledoit, O. and Wolf, M. (2020b). Analytical nonlinear shrinkage of large-dimensional covariance matrices. The Annals of Statistics 48, 3043–3065. doi:10.1214/19-AOS1921.
- Ledoit, O. and Wolf, M. (2021). Shrinkage estimation of large covariance matrices: Keep it simple, statistician? Journal of Multivariate Analysis 186, 104796.
- Ledoit, O. and Wolf, M. (2022). Quadratic shrinkage for large covariance matrices. Bernoulli 28, 1519–1547.
- Lee, Y. and Shkolnik, A. (2024a). James-Stein regularization of the first principal component.
- Lee, Y. and Shkolnik, A. (2024b). Central Limit Theorems of a Strongly Spiked Eigenvector and its James-Stein estimator.
- Lettau, M. and Pelger, M. (2020). Estimating latent asset-pricing factors. Journal of Econometrics 218, 1–31.
- Li, C. Y. and Shkolnik, A. (2024). On Minimum Trace Factor Analysis–An Old Song Sung to a New Tune.
- Li, J. and Stoica, P. (2005). Robust Adaptive Beamforming. John Wiley & Sons.
- Li, G., Cai, C., Poor, H. V. and Chen, Y. (2022). Minimax estimation of linear functions of eigenvectors in the face of small eigen-gaps. arXiv preprint arXiv:2104.03298.
- Luo, T., Chen, P., Cao, Z., Zheng, L. and Wang, Z. (2023). URGLQ: An Efficient Covariance Matrix Reconstruction Method for Robust Adaptive Beamforming. IEEE Transactions on Aerospace and Electronic Systems.
- Maggiar, A., Wachter, A., Dolinskaya, I. S. and Staum, J. (2018). A derivative-free trust-region algorithm for the optimization of functions smoothed via Gaussian convolution using adaptive multiple importance sampling. SIAM Journal on Optimization 28, 1478–1507.
- Marchenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik 114, 507–536.
- Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7, 77–91.
- Menchero, J., Orr, D. and Wang, J. (2011). The Barra US equity model (USE4), methodology notes. MSCI Barra.
- Michaud, R. O. (1989). The Markowitz optimization enigma: Is 'optimized' optimal? Financial Analysts Journal 45, 31–42.
- Ollila, E., Palomar, D. P. and Pascal, F. (2020). Shrinking the eigenvalues of M-estimators of covariance matrix. IEEE Transactions on Signal Processing 69, 256–269.
- Onatski, A. (2012). Asymptotics of the principal components estimator of large factor models with weakly influential factors. Journal of Econometrics 168, 244–258.
- Pafka, S. and Kondor, I. (2003). Noisy covariance matrices and portfolio optimization II. Physica A: Statistical Mechanics and its Applications 319, 487–494. doi:10.1016/S0378-4371(02)01499-1.
- Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 1617–1642.
- Quijano, J. E. and Zurk, L. M. (2015). Eigenvector pruning method for high resolution beamforming. The Journal of the Acoustical Society of America 138, 2152–2160.
- Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–360.
- Saunderson, J., Chandrasekaran, V., Parrilo, P. A. and Willsky, A. S. (2012). Diagonal and Low-Rank Matrix Decompositions, Correlation Matrices, and Ellipsoid Fitting. SIAM Journal on Matrix Analysis and Applications 33, 1395–1416. doi:10.1137/120872516.
- Shapiro, A. (1985). Identifiability of factor analysis: Some results and open problems. Linear Algebra and its Applications 70, 1–7.
- Shen, D., Shen, H. and Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis 115, 317–333.
- Shen, D., Shen, H. and Marron, J. S. (2016). A general framework for consistency of principal component analysis. Journal of Machine Learning Research 17, 1–34.
- Shen, D., Shen, H., Zhu, H. and Marron, J. S. (2016). The statistics and mathematics of high dimensional low sample size asymptotics. Statistica Sinica 26, 1747–1770.
- Shkolnik, A. (2022). James-Stein estimation of the first principal component. Stat, forthcoming.
- Vorobyov, S. A. (2013). Principles of minimum variance robust adaptive beamforming design. Signal Processing 93, 3264–3277.
- Wang, W. and Fan, J. (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics 45, 1342–1374.
- Wang, X. and Zhang, B. (2024). Target selection in shrinkage estimation of covariance matrix: A structural similarity approach. Statistics & Probability Letters, 110048.
- Weyl, H. (1912). Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Mathematische Annalen 71, 441–479.
- Won, J.-H., Lim, J., Kim, S.-J. and Rajaratnam, B. (2013). Condition-number-regularized covariance estimation. Journal of the Royal Statistical Society Series B: Statistical Methodology 75, 427–450.
- Xie, L., He, Z., Tong, J., Li, J. and Xi, J. (2021). Cross-validated tuning of shrinkage factors for MVDR beamforming based on regularized covariance matrix estimation. arXiv preprint arXiv:2104.01909.
- Yan, Y., Chen, Y. and Fan, J. (2021). Inference for Heteroskedastic PCA with Missing Data. arXiv preprint arXiv:2107.12365. doi:10.48550/arXiv.2107.12365.
- Yao, J., Zheng, S. and Bai, Z. (2015). Sample covariance matrices and high-dimensional data analysis. Cambridge University Press, New York.
- Yata, K. and Aoshima, M. (2009). PCA consistency for non-Gaussian data in high dimension, low sample size context. Communications in Statistics-Theory and Methods 38, 2634–2652.
- Yata, K. and Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis 105, 193–215.
- Yata, K. and Aoshima, M. (2013). PCA consistency for the power spiked model in high-dimensional settings. Journal of Multivariate Analysis 122, 334–354.
- Zhang, A. R., Cai, T. T. and Wu, Y. (2022). Heteroskedastic PCA: Algorithm, optimality, and applications. The Annals of Statistics 50. doi:10.1214/21-AOS2074.
- Zhou, Y. and Chen, Y. (2023). Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA. arXiv preprint arXiv:2303.06198.
- Zhu, X., Xu, X. and Ye, Z. (2020). Robust adaptive beamforming via subspace for interference covariance matrix reconstruction. Signal Processing 167, 107289.