A unified approach for covariance matrix estimation under Stein loss
Abstract
In this paper, we address the problem of estimating the covariance matrix of a multivariate Gaussian distribution under a Stein loss function, from a decision-theoretic point of view. We treat, in a unified approach, both the case where the covariance matrix is invertible and the case where it is non-invertible.
Keywords: Orthogonally invariant estimators, singular covariance matrix, statistical decision theory, high-dimensional statistics.

MSC (2010): 62H12, 62F10, 62C99.
Affiliations: INSA Rouen, Normandie Univ, UNIROUEN, UNIHAVRE, LITIS and LMI, avenue de l’Université, BP 8, 76801 Saint-Étienne-du-Rouvray, France (Mohamed.haddouche@insa-rouen.fr); INSA Rouen, Normandie Univ, UNIROUEN, UNIHAVRE, LMI, avenue de l’Université, BP 8, 76801 Saint-Étienne-du-Rouvray, France (wei.lu@insa-rouen.fr).
1 Introduction
Let be an observed matrix of the form
(1)
where is a matrix of unknown parameters, with , and is a random matrix. Assume that is known and that the columns of are independently and identically distributed as the -dimensional multivariate normal distribution . Then the columns of are independently and identically distributed as the -dimensional multivariate normal , where is the unknown covariance matrix with
It follows that the sample covariance matrix has a singular Wishart distribution (see Srivastava (2003)) such that
with probability one. We denote in the following by and the Moore-Penrose inverses of and respectively.
We consider the problem of estimating the covariance matrix under the Stein type loss function
(2)
where estimates and is the diagonal matrix of the positive eigenvalues of . The corresponding risk function is denoted by
where denotes the expectation with respect to the model (1). Note that the loss function (2) is an adaptation of the original Stein loss function (see Stein (1986)) to the context of the model (1) (see Tsukuma (2016) for more details).
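For reference, in the classical invertible setting the Stein loss takes the form tr(Σ⁻¹Σ̂) − log det(Σ⁻¹Σ̂) − p; it is non-negative and vanishes exactly when the estimator equals the true covariance matrix. A minimal numerical sketch of this classical form (the function and the test matrix below are our own illustration, not the adapted loss (2)):

```python
import numpy as np

def stein_loss(sigma_hat, sigma):
    """Classical Stein loss: tr(M) - log det(M) - p, with M = Sigma^{-1} Sigma_hat."""
    p = sigma.shape[0]
    M = np.linalg.solve(sigma, sigma_hat)        # Sigma^{-1} Sigma_hat
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p

sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

loss_exact = stein_loss(sigma, sigma)            # zero up to rounding error
loss_scaled = stein_loss(2.0 * sigma, sigma)     # strictly positive
```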
The difficulty of covariance estimation is commonly characterized by the ratio . The usual estimators of the form
(3)
perform poorly when with (see Ledoit and Wolf (2004)). Hence, in this situation, alternative estimators are needed. Indeed, as pointed out by James and Stein (1961), the larger (smaller) eigenvalues of are overestimated (underestimated) by these estimators. Therefore, a possible approach to deriving improved estimators is to regularize the eigenvalues of . This suggests considering the class of orthogonally invariant estimators (see Takemura (1984)) in (5) below.
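The eigenvalue distortion pointed out by James and Stein (1961) is easy to reproduce numerically: even when all true eigenvalues equal one, the extreme sample eigenvalues spread well away from one, and the spread worsens as the ratio of dimension to sample size grows. A quick illustration (the sizes are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)

p, n = 50, 100
Y = rng.standard_normal((p, n))     # columns i.i.d. N_p(0, I_p)
S = Y @ Y.T / n                     # normalized sample covariance matrix

eigvals = np.linalg.eigvalsh(S)

# Every eigenvalue of the true covariance I_p equals 1, yet the largest
# sample eigenvalue overshoots it and the smallest undershoots it badly.
spread = (eigvals.min(), eigvals.max())
```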
Considering the model (1), we deal, in a unified approach, with the following five cases:

1. the covariance matrix is invertible and the sample covariance matrix is non–invertible;
2. the covariance matrix and the sample covariance matrix are both invertible;
3. the covariance matrix and the sample covariance matrix are both non–invertible, with the same rank;
4. the covariance matrix and the sample covariance matrix are both non–invertible, with the same rank;
5. the covariance matrix and the sample covariance matrix are both non–invertible, with different ranks.
The class of orthogonally invariant estimators has been considered by various authors: see Stein (1986), Dey and Srinivasan (1985) and Haff (1980) for the case , Konno (2009) and Haddouche et al. (2021) for the cases and , and Chételat and Wells (2016) for the cases and . Recently, Tsukuma (2016) extended the Stein (1986) estimator to the five cases above in a unified approach. Similarly, we extend here the class of Haff (1980) estimators to the context of the model (1).
2 Main result
Improving on the class of natural estimators in (3) relies on improving on the optimal estimator within this class, that is, the one which minimizes the loss function (2).
Proposition 1 (Tsukuma (2016))
As mentioned in Section 1, we consider the class of orthogonally invariant estimators. Let be the eigenvalue decomposition of , where is a semi–orthogonal matrix of eigenvectors and , with , is the diagonal matrix of the corresponding positive eigenvalues (see Kubokawa and Srivastava (2008) for more details). The class of orthogonally invariant estimators is of the form
(5)
with , where () is a differentiable function of .
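Concretely, an orthogonally invariant estimator keeps the sample eigenvectors and only modifies the positive sample eigenvalues. A sketch of the construction (the shrinkage function `psi` below is an arbitrary placeholder, not the paper's choice in (5)):

```python
import numpy as np

rng = np.random.default_rng(2)

p, n = 8, 5
Y = rng.standard_normal((p, n))
S = Y @ Y.T                          # rank min(n, p) = 5 with probability one

# Keep only the positive eigenvalues: S = H diag(l_1, ..., l_r) H^T,
# where H is a p x r semi-orthogonal matrix of eigenvectors.
eigvals, eigvecs = np.linalg.eigh(S)
pos = eigvals > 1e-10
L, H = eigvals[pos], eigvecs[:, pos]

def psi(l):
    # Placeholder modification of the eigenvalues (illustrative only).
    return l / (n + 2.0)

sigma_hat = H @ np.diag(psi(L)) @ H.T   # orthogonally invariant estimator
```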
More precisely, we consider an extension of the class of Haff (1980) estimators to the context of the model (1), defined as
(6)
where is given in (4). The following proposition states our main result.
Proposition 2
Proof.
We aim to show that the risk difference between the Haff type estimators in (6) and the optimal estimator in (4), namely,
(7)
is non–positive. Note that can be written as
The risk of these estimators under the Stein loss function (2) is given by
(8)
First, dealing with , we apply Lemma A.2 in Tsukuma (2016) in order to get rid of the unknown parameter . It follows that
(9)
where, for ,
and
Using the fact that, for , , it can be shown that
(10)
Therefore, using (10), we obtain
From the submultiplicativity of the trace for positive semi–definite matrices, we have . Then an upper bound for (9) is given by
(11)
Secondly, dealing with in (8), it can be shown that
Note that and are full-rank matrices. It follows that
Therefore
(12)
Using the fact that , for any positive constant , we obtain
Thus
(13)
since, for , . Consequently, thanks to (13), a lower bound for (12) is given by
(14)
Now, relying on the proof of Proposition 2.1 in Tsukuma (2016), it can be shown that

(15)

Combining the bounds (11), (14) and (15) shows that the risk difference (7) is non-positive, which completes the proof.
3 Numerical study
We now study numerically the performance of the proposed estimators of the form
(16)
where is given in Proposition 2.
We consider the following structures of : the identity matrix and an autoregressive structure with coefficient . We set their smallest eigenvalues to zero in order to construct matrices of rank .
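These two structures can be generated as follows; the sketch (the coefficient 0.9 and the sizes are our own placeholders, and we assume the usual AR(1) parametrization with entries rho^|i-j|) builds the autoregressive matrix and then zeroes out its smallest eigenvalues to obtain a prescribed rank:

```python
import numpy as np

def ar1_cov(p, rho):
    """Autoregressive covariance structure: entry (i, j) equals rho**|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def truncate_rank(sigma, r):
    """Zero out the smallest eigenvalues so that the matrix has rank r."""
    eigvals, eigvecs = np.linalg.eigh(sigma)      # eigenvalues in ascending order
    eigvals[: sigma.shape[0] - r] = 0.0
    return eigvecs @ np.diag(eigvals) @ eigvecs.T

p, r = 20, 15
sigma_ar = truncate_rank(ar1_cov(p, 0.9), r)      # rank-15 covariance matrix
```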
To assess the performance of the proposed estimators, we compute the Percentage Reduction In Average Loss (PRIAL) for several values of , , and .
Table 1: PRIALs (%) of the proposed estimators (upper half: identity structure; lower half: autoregressive structure).

|     |       |       |       |       |       |
|-----|-------|-------|-------|-------|-------|
| 10  | 6.85  | 12.45 | 15.53 | 16.95 | 17.53 |
| 20  | 9.20  | 13.91 | 14.88 | 14.68 | 12.47 |
| 30  | 11.81 | 14.33 | 13.41 | 12.43 | 11.71 |
| 20  | 18.31 | 19.65 | 17.75 | 16.44 | 15.63 |
| 40  | 17.12 | 16.33 | 14.07 | 12.78 | 12.02 |
| 50  | 11.80 | 14.23 | 13.29 | 12.30 | 11.59 |
| 20  | 18.17 | 19.69 | 17.87 | 16.56 | 15.71 |
| 40  | 17.08 | 16.31 | 14.07 | 12.76 | 11.99 |
| 60  | 8.88  | 12.27 | 12.33 | 11.75 | 11.19 |
| 150 | 2.83  | 5.09  | 6.50  | 7.25  | 7.61  |
| 10  | 6.06  | 8.70  | 9.62  | 9.89  | 9.92  |
| 20  | 8.81  | 11.66 | 12.12 | 11.93 | 11.61 |
| 30  | 11.46 | 13.15 | 12.35 | 11.48 | 10.82 |
| 20  | 17.18 | 17.45 | 15.85 | 14.76 | 14.07 |
| 40  | 16.34 | 15.18 | 13.19 | 12.00 | 11.28 |
| 50  | 11.33 | 12.82 | 12.02 | 11.19 | 10.57 |
| 20  | 17.30 | 18.00 | 16.38 | 15.19 | 14.42 |
| 40  | 16.27 | 15.04 | 13.04 | 11.84 | 11.13 |
| 60  | 8.48  | 10.43 | 10.29 | 9.81  | 9.35  |
| 150 | 2.73  | 4.03  | 4.61  | 4.86  | 4.95  |
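A common definition of the PRIAL of a proposed estimator against a baseline is 100 × (average baseline loss − average proposed loss) / average baseline loss, with the averages taken over Monte Carlo replications. A minimal helper under that assumed definition (ours, not the paper's code):

```python
def prial(baseline_losses, proposed_losses):
    """Percentage Reduction In Average Loss of a proposed estimator,
    relative to a baseline, from Monte Carlo samples of the loss."""
    avg_base = sum(baseline_losses) / len(baseline_losses)
    avg_prop = sum(proposed_losses) / len(proposed_losses)
    return 100.0 * (avg_base - avg_prop) / avg_base

# An estimator that halves the average loss has a PRIAL of 50%.
halved = prial([2.0, 2.0, 2.0], [1.0, 1.0, 1.0])   # -> 50.0
```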
Table 1 shows that the proposed estimators improve over for any possible ordering of and . Compared to the other cases, the Haff type estimators (for ) perform best in the case where , with PRIALs higher than for both structures and of . We also note that the optimal value of , which maximizes the PRIALs, depends on and .
Acknowledgement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References
- Srivastava (2003) M. S. Srivastava, Singular Wishart and multivariate Beta distributions, Ann. Statist. 31 (2003) 1537–1560.
- Stein (1986) C. Stein, Lectures on the theory of estimation of many parameters, J. Sov. Math. 34 (1986) 1373–1403.
- Tsukuma (2016) H. Tsukuma, Estimation of a high-dimensional covariance matrix with the Stein loss, J. Multivar. Anal. 148 (2016) 1–17.
- Ledoit and Wolf (2004) O. Ledoit, M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, J. Multivar. Anal. 88 (2004) 365–411.
- James and Stein (1961) W. James, C. Stein, Estimation with quadratic loss, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, California, 1961, pp. 361–379.
- Takemura (1984) A. Takemura, An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population, Tsukuba J. Math. 8 (1984) 367–376.
- Dey and Srinivasan (1985) D. K. Dey, C. Srinivasan, Estimation of a covariance matrix under Stein’s loss, Ann. Statist. 13 (1985) 1581–1591.
- Haff (1980) L. Haff, Empirical Bayes estimation of the multivariate normal covariance matrix, Ann. Statist. 8 (1980) 586–597.
- Konno (2009) Y. Konno, Shrinkage estimators for large covariance matrices in multivariate real and complex normal distributions under an invariant quadratic loss, J. Multivar. Anal. 100 (2009) 2237–2253.
- Haddouche et al. (2021) A. M. Haddouche, D. Fourdrinier, F. Mezoued, Scale matrix estimation of an elliptically symmetric distribution in high and low dimensions, J. Multivar. Anal. 181 (2021) 104680.
- Chételat and Wells (2016) D. Chételat, M. T. Wells, Improved second order estimation in the singular multivariate normal model, J. Multivar. Anal. 147 (2016) 1–19.
- Kubokawa and Srivastava (2008) T. Kubokawa, M. Srivastava, Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data, J. Multivar. Anal. 99 (2008) 1906–1928.