
A unified approach for covariance matrix estimation under Stein loss

Anis M. Haddouche [label1], Wei Lu [label2]
Abstract

In this paper, we address the problem of estimating the covariance matrix of a multivariate Gaussian distribution, relative to a Stein loss function, from a decision-theoretic point of view. We investigate the case where the covariance matrix is invertible and the case where it is non-invertible, in a unified approach.

keywords: Orthogonally invariant estimators, singular covariance matrix, statistical decision theory, high-dimensional statistics.
MSC: [2010] 62H12, 62F10, 62C99.
journal: Statistics and Probability Letters
Affiliations:

[label1] INSA Rouen, Normandie Univ, UNIROUEN, UNIHAVRE, LITIS and LMI, avenue de l'Université, BP 8, 76801 Saint-Étienne-du-Rouvray, France. E-mail: Mohamed.haddouche@insa-rouen.fr

[label2] INSA Rouen, Normandie Univ, UNIROUEN, UNIHAVRE, LMI, avenue de l'Université, BP 8, 76801 Saint-Étienne-du-Rouvray, France. E-mail: wei.lu@insa-rouen.fr

1 Introduction

Let $\boldsymbol{X}$ be an observed $p\times n$ matrix of the form

$$\boldsymbol{X}=\boldsymbol{B}\boldsymbol{Z}, \qquad (1)$$

where $\boldsymbol{B}$ is a $p\times r$ matrix of unknown parameters, with $r\leq p$, and $\boldsymbol{Z}$ is an $r\times n$ random matrix. Assume that $r$ is known and that the columns of $\boldsymbol{Z}$ are independently and identically distributed as the $r$-dimensional multivariate normal distribution $\mathcal{N}_{r}(\boldsymbol{0}_{r},\boldsymbol{I}_{r})$. Then the columns of $\boldsymbol{X}$ are independently and identically distributed as the $p$-dimensional multivariate normal $\mathcal{N}_{p}(\boldsymbol{0}_{p},\boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}=\boldsymbol{B}\boldsymbol{B}^{\top}$ is the unknown $p\times p$ covariance matrix with

$${\rm rank}(\boldsymbol{\Sigma})=r\leq p\,.$$

It follows that the $p\times p$ sample covariance matrix $\boldsymbol{S}=\boldsymbol{X}\boldsymbol{X}^{\top}$ has a singular Wishart distribution (see Srivastava (2003)) such that

$${\rm rank}(\boldsymbol{S})=\min(n,r)=q\leq p\,,$$

with probability one. In the following, we denote by $\boldsymbol{S}^{+}$ and $\boldsymbol{\Sigma}^{+}$ the Moore-Penrose inverses of $\boldsymbol{S}$ and $\boldsymbol{\Sigma}$, respectively.

We consider the problem of estimating the covariance matrix $\boldsymbol{\Sigma}$ under the Stein-type loss function

$$L(\hat{\boldsymbol{\Sigma}},\boldsymbol{\Sigma})={\rm tr}(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}})-\ln|\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}})|-q\,, \qquad (2)$$

where $\hat{\boldsymbol{\Sigma}}$ estimates $\boldsymbol{\Sigma}$ and $\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}})$ is the diagonal matrix of the $q$ positive eigenvalues of $\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}$. The corresponding risk function is

$$R(\hat{\boldsymbol{\Sigma}},\boldsymbol{\Sigma})=E[L(\hat{\boldsymbol{\Sigma}},\boldsymbol{\Sigma})]\,,$$

where $E(\cdot)$ denotes the expectation with respect to the model (1). Note that the loss function (2) is an adaptation of the original Stein loss function (see Stein (1986)) to the context of the model (1) (see Tsukuma (2016) for more details).
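
Continuing the snippet above, the loss (2) can be evaluated numerically as follows; the function name and the eigenvalue handling are our own choices, offered as a sketch.

```python
def stein_loss(Sigma_hat, Sigma_plus, q):
    """Stein-type loss (2): tr(Sigma^+ Sigma_hat) - ln|Lambda(Sigma^+ Sigma_hat)| - q.

    Lambda(.) collects the q positive eigenvalues of Sigma^+ Sigma_hat.
    """
    M = Sigma_plus @ Sigma_hat
    eigs = np.sort(np.linalg.eigvals(M).real)[::-1]   # eigenvalues, largest first
    return np.trace(M).real - np.sum(np.log(eigs[:q])) - q
```

For instance, `stein_loss(S / n, Sigma_plus, q)` evaluates the loss of the scaled estimator $\boldsymbol{S}/n$.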

The difficulty of covariance estimation is commonly characterized by the ratio $p/n$. The usual estimators, of the form

$$\hat{\boldsymbol{\Sigma}}_{a}=a\,\boldsymbol{S},\quad\text{with}\quad a>0, \qquad (3)$$

perform poorly when $n,p\to\infty$ with $p/n\to c>0$ (see Ledoit and Wolf (2004)). Hence, in this situation, alternative estimators are needed. Indeed, as pointed out by James and Stein (1961), the larger (smaller) eigenvalues of $\boldsymbol{\Sigma}$ are overestimated (underestimated) by these estimators. Therefore, a possible approach to deriving improved estimators is to regularize the eigenvalues of $\hat{\boldsymbol{\Sigma}}_{a}$. This fact suggests considering the class of orthogonally invariant estimators (see Takemura (1984)) in (5) below.

Considering the model (1), we deal, in a unified approach, with the following cases:

  (i) $n<r=p$: $\boldsymbol{\Sigma}$ is invertible of rank $p$ and $\boldsymbol{S}$ is non-invertible of rank $n$;

  (ii) $r=p\leq n$: $\boldsymbol{\Sigma}$ and $\boldsymbol{S}$ are invertible;

  (iii) $r<p\leq n$: $\boldsymbol{\Sigma}$ and $\boldsymbol{S}$ are non-invertible of rank $r$;

  (iv) $r\leq n<p$: $\boldsymbol{\Sigma}$ and $\boldsymbol{S}$ are non-invertible of rank $r$;

  (v) $n<r<p$: $\boldsymbol{\Sigma}$ and $\boldsymbol{S}$ are non-invertible of ranks $r$ and $n$, respectively.

The class of orthogonally invariant estimators has been considered by various authors: see, for instance, Stein (1986), Dey and Srinivasan (1985) and Haff (1980) for case (i), and Konno (2009) and Haddouche et al. (2021) for cases (i) and (ii). See also Chételat and Wells (2016) for cases (iii) and (iv). Recently, Tsukuma (2016) extended the Stein (1986) estimator to the five cases above in a unified approach. Similarly, we extend here the class of Haff (1980) estimators to the context of the model (1).

The rest of this paper is organized as follows. In Section 2, we derive the improvement result of the proposed estimators over the usual estimators. We study numerically the behavior of the proposed estimators in Section 3.

2 Main result

Improving on the class of natural estimators in (3) relies on improving on the optimal estimator within this class, that is, the one which minimizes the risk associated with the loss function (2).

Proposition 1 (Tsukuma (2016))

Under the Stein loss function (2), the optimal estimator among the class (3) is given by

$$\hat{\boldsymbol{\Sigma}}_{a_{o}}=a_{o}\,\boldsymbol{S},\quad\text{where}\quad a_{o}=\frac{1}{m}\quad\text{and}\quad m=\max(n,r). \qquad (4)$$
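
Continuing the numpy sketch of Section 1, the optimal estimator (4) is a one-liner:

```python
m = max(n, r)            # n and r from the simulation sketch in Section 1
Sigma_hat_ao = S / m     # optimal estimator (4): a_o * S with a_o = 1/m
```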

As mentioned in Section 1, we consider the class of orthogonally invariant estimators. Let $\boldsymbol{S}=\boldsymbol{H}\,\boldsymbol{L}\,\boldsymbol{H}^{\top}$ be the eigenvalue decomposition of $\boldsymbol{S}$, where $\boldsymbol{H}$ is a $p\times q$ semi-orthogonal matrix of eigenvectors and $\boldsymbol{L}={\rm diag}(l_{1},\dots,l_{q})$, with $l_{1}>\dots>l_{q}$, is the diagonal matrix of the $q$ corresponding positive eigenvalues (see Kubokawa and Srivastava (2008) for more details). The class of orthogonally invariant estimators is of the form

$$\hat{\boldsymbol{\Sigma}}_{\Psi}=a_{o}\,\big(\boldsymbol{S}+\boldsymbol{H}\,\boldsymbol{L}\,\boldsymbol{\Psi}(\boldsymbol{L})\,\boldsymbol{H}^{\top}\big)\,, \qquad (5)$$

with $\boldsymbol{\Psi}(\boldsymbol{L})={\rm diag}(\psi_{1}(\boldsymbol{L}),\dots,\psi_{q}(\boldsymbol{L}))$, where each $\psi_{i}(\boldsymbol{L})$ ($i=1,\dots,q$) is a differentiable function of $\boldsymbol{L}$.

More precisely, we consider an extension of the class of Haff (1980) estimators to the context of the model (1), defined as

$$\hat{\boldsymbol{\Sigma}}_{\alpha}=a_{o}\big(\boldsymbol{S}+\boldsymbol{H}\boldsymbol{L}\boldsymbol{\Psi}(\boldsymbol{L})\boldsymbol{H}^{\top}\big)\quad\text{with}\quad\alpha\geq 1\,,\quad b>0\quad\text{and}\quad\boldsymbol{\Psi}(\boldsymbol{L})=b\,\frac{\boldsymbol{L}^{-\alpha}}{{\rm tr}(\boldsymbol{L}^{-\alpha})}, \qquad (6)$$

where $a_{o}$ is given in (4). We give our main result in Proposition 2 below.
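
Before stating it, here is a possible numpy implementation of the Haff-type estimator (6), continuing the earlier sketches; the function name, the default $\alpha$, and the default choice $b=b_{o}$ (anticipating Proposition 2) are ours.

```python
def haff_estimator(S, n, r, alpha=1.0, b=None):
    """Haff-type estimator (6) under model (1); a sketch, not the authors' code."""
    q, m = min(n, r), max(n, r)
    a_o = 1.0 / m
    if b is None:
        b = 2.0 * (q - 1) / (m - q + 1)            # b_o from Proposition 2 below
    w, V = np.linalg.eigh(S)                       # eigenvalues in ascending order
    L = w[::-1][:q]                                # l_1 > ... > l_q, the positive eigenvalues
    H = V[:, ::-1][:, :q]                          # p x q semi-orthogonal eigenvector matrix
    psi = b * L**(-alpha) / np.sum(L**(-alpha))    # diagonal entries of Psi(L)
    # a_o * (S + H L Psi(L) H^T) = a_o * H diag(l_i (1 + psi_i)) H^T
    return a_o * (H * (L * (1.0 + psi))) @ H.T
```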

Proposition 2

The Haff-type estimators in (6) improve on the optimal estimator in (4), under the loss function (2), as soon as

$$0<b\leq b_{o}=\frac{2\,(q-1)}{m-q+1}\,.$$
Proof.

We aim to show that the risk difference between the Haff-type estimators in (6) and the optimal estimator in (4), namely

$$\Delta_{(\alpha,a_{o})}=R(\hat{\boldsymbol{\Sigma}}_{\alpha},\boldsymbol{\Sigma})-R(\hat{\boldsymbol{\Sigma}}_{a_{o}},\boldsymbol{\Sigma}), \qquad (7)$$

is non-positive. Note that $\hat{\boldsymbol{\Sigma}}_{\alpha}$ can be written as

$$\hat{\boldsymbol{\Sigma}}_{\alpha}=a_{o}\,\boldsymbol{H}\boldsymbol{L}\boldsymbol{\Phi}(\boldsymbol{L})\boldsymbol{H}^{\top}\quad\text{with}\quad\boldsymbol{\Phi}(\boldsymbol{L})=\boldsymbol{I}_{q}+\boldsymbol{\Psi}(\boldsymbol{L}).$$

The risk of these estimators under the Stein loss function (2) is given by

$$R(\hat{\boldsymbol{\Sigma}}_{\alpha},\boldsymbol{\Sigma})=E\big({\rm tr}(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})\big)-E\big(\ln|\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})|\big)-q\,. \qquad (8)$$

First, dealing with $E\big({\rm tr}(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})\big)$, we apply Lemma A.2 in Tsukuma (2016) in order to get rid of the unknown parameter $\boldsymbol{\Sigma}^{+}$. It follows that

$$E\big({\rm tr}(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})\big)=a_{o}\,E\left(\sum_{i=1}^{q}\left\{(m-q+1)\,\varphi_{i}+2\,l_{i}\,\frac{\partial\varphi_{i}}{\partial l_{i}}+2\sum_{j>i}^{q}\frac{l_{i}\,\varphi_{i}-l_{j}\,\varphi_{j}}{l_{i}-l_{j}}\right\}\right), \qquad (9)$$

where, for $i=1,\dots,q$,

$$\varphi_{i}=1+b\,\frac{l_{i}^{-\alpha}}{{\rm tr}(\boldsymbol{L}^{-\alpha})}\,,\qquad\frac{\partial\varphi_{i}}{\partial l_{i}}=b\,\alpha\,\frac{1-{\rm tr}(\boldsymbol{L}^{-\alpha})\,l_{i}^{\alpha}}{{\rm tr}^{2}(\boldsymbol{L}^{-\alpha})\,l_{i}^{1+2\alpha}}$$

and

$$\sum_{j>i}^{q}\frac{l_{i}\,\varphi_{i}-l_{j}\,\varphi_{j}}{l_{i}-l_{j}}=\sum_{j>i}^{q}\left(1+\frac{b}{{\rm tr}(\boldsymbol{L}^{-\alpha})}\,\frac{l_{i}^{1-\alpha}-l_{j}^{1-\alpha}}{l_{i}-l_{j}}\right)\,.$$

Since $l_{i}>l_{j}$ for $j>i$ and $\alpha\geq 1$, we have $l_{i}^{1-\alpha}\leq l_{j}^{1-\alpha}$, so that each summand on the right-hand side above is at most $1$. Hence

$$\sum_{j>i}^{q}\frac{l_{i}\,\varphi_{i}-l_{j}\,\varphi_{j}}{l_{i}-l_{j}}\leq(q-i)\,. \qquad (10)$$

Therefore, using (10), we obtain

$$E\big({\rm tr}(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})\big)\leq a_{o}\,E\left(\sum_{i=1}^{q}\left\{(m-q+1)\left(1+b\,\frac{l_{i}^{-\alpha}}{{\rm tr}(\boldsymbol{L}^{-\alpha})}\right)+2\,b\,\alpha\,\frac{1-{\rm tr}(\boldsymbol{L}^{-\alpha})\,l_{i}^{\alpha}}{{\rm tr}^{2}(\boldsymbol{L}^{-\alpha})\,l_{i}^{2\alpha}}+2\,(q-i)\right\}\right)$$
$$=a_{o}\,m\,q+a_{o}\,b\,E\left(\sum_{i=1}^{q}\left\{(m-q+1)\,\frac{l_{i}^{-\alpha}}{{\rm tr}(\boldsymbol{L}^{-\alpha})}+2\,\alpha\,\frac{1-{\rm tr}(\boldsymbol{L}^{-\alpha})\,l_{i}^{\alpha}}{{\rm tr}^{2}(\boldsymbol{L}^{-\alpha})\,l_{i}^{2\alpha}}\right\}\right)$$
$$=a_{o}\,\big(m\,q+b\,(m-q+1)\big)+2\,a_{o}\,b\,\alpha\,E\left(\frac{{\rm tr}(\boldsymbol{L}^{-2\alpha})}{{\rm tr}^{2}(\boldsymbol{L}^{-\alpha})}-1\right)\,.$$

From the submultiplicativity of the trace for positive semi-definite matrices, we have ${\rm tr}(\boldsymbol{L}^{-2\alpha})\leq{\rm tr}^{2}(\boldsymbol{L}^{-\alpha})$, so that the last expectation above is non-positive. Then an upper bound for (9) is given by

$$E\big({\rm tr}(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})\big)\leq a_{o}\,\big(m\,q+b\,(m-q+1)\big)\,. \qquad (11)$$

Secondly, dealing with $E\big(\ln|\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})|\big)$ in (8), it can be shown that

$$\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})=a_{o}\,\Lambda\big(\boldsymbol{\Sigma}^{+}\boldsymbol{H}\boldsymbol{L}\boldsymbol{\Phi}(\boldsymbol{L})\boldsymbol{H}^{\top}\big)=a_{o}\,\Lambda\big(\boldsymbol{L}^{1/2}\boldsymbol{H}^{\top}\boldsymbol{\Sigma}^{+}\boldsymbol{H}\boldsymbol{L}^{1/2}\boldsymbol{\Phi}(\boldsymbol{L})\big)\,.$$

Note that $\boldsymbol{L}^{1/2}\boldsymbol{H}^{\top}\boldsymbol{\Sigma}^{+}\boldsymbol{H}\boldsymbol{L}^{1/2}$ and $\boldsymbol{\Phi}(\boldsymbol{L})$ are full-rank $q\times q$ matrices. It follows that

$$|\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})|=a_{o}^{q}\,|\boldsymbol{L}^{1/2}\boldsymbol{H}^{\top}\boldsymbol{\Sigma}^{+}\boldsymbol{H}\boldsymbol{L}^{1/2}\boldsymbol{\Phi}(\boldsymbol{L})|\,.$$

Therefore

$$E\big(\ln|\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})|\big)=q\,\ln(a_{o})+E\big(\ln|\boldsymbol{L}^{1/2}\boldsymbol{H}^{\top}\boldsymbol{\Sigma}^{+}\boldsymbol{H}\boldsymbol{L}^{1/2}|\big)+E\big(\ln|\boldsymbol{\Phi}(\boldsymbol{L})|\big)\,. \qquad (12)$$

Using the fact that $\ln(1+x)\geq 2\,x/(2+x)$ for any $x>0$, we have

$$\ln|\boldsymbol{\Phi}(\boldsymbol{L})|=\ln\left|\boldsymbol{I}_{q}+\frac{b\,\boldsymbol{L}^{-\alpha}}{{\rm tr}(\boldsymbol{L}^{-\alpha})}\right|=\sum_{i=1}^{q}\ln\left(1+\frac{b\,l_{i}^{-\alpha}}{{\rm tr}(\boldsymbol{L}^{-\alpha})}\right)\geq\sum_{i=1}^{q}\frac{2\,b\,l_{i}^{-\alpha}/{\rm tr}(\boldsymbol{L}^{-\alpha})}{2+b\,l_{i}^{-\alpha}/{\rm tr}(\boldsymbol{L}^{-\alpha})}\,.$$

Thus

$$\ln|\boldsymbol{\Phi}(\boldsymbol{L})|\geq\frac{2\,b}{2+b}\,, \qquad (13)$$

since $l_{i}^{-\alpha}\leq{\rm tr}(\boldsymbol{L}^{-\alpha})$ for $i=1,\dots,q$, so that each denominator above is at most $2+b$ while the numerators sum to $2\,b$. Consequently, thanks to (13), a lower bound for (12) is given by

$$E\big(\ln|\Lambda(\boldsymbol{\Sigma}^{+}\hat{\boldsymbol{\Sigma}}_{\alpha})|\big)\geq q\,\ln(a_{o})+E\big(\ln|\boldsymbol{L}^{1/2}\boldsymbol{H}^{\top}\boldsymbol{\Sigma}^{+}\boldsymbol{H}\boldsymbol{L}^{1/2}|\big)+\frac{2\,b}{2+b}\,. \qquad (14)$$

Now, relying on the proof of Proposition 2.1 in Tsukuma (2016), it can be shown that

$$R(\hat{\boldsymbol{\Sigma}}_{a_{o}},\boldsymbol{\Sigma})=-q\,\ln(a_{o})-E\big(\ln|\boldsymbol{L}^{1/2}\boldsymbol{H}^{\top}\boldsymbol{\Sigma}^{+}\boldsymbol{H}\boldsymbol{L}^{1/2}|\big)\,. \qquad (15)$$

Finally, combining (11), (14) and (15), and noting that $a_{o}\,m\,q=q$, an upper bound for the risk difference in (7) is given by

$$\Delta_{(\alpha,a_{o})}\leq a_{o}\,(m-q+1)\,b-\frac{2\,b}{2+b}=\frac{b}{2+b}\,\big(a_{o}\,(m-q+1)\,(b+2)-2\big),$$

which, since $a_{o}=1/m$, is non-positive as soon as

$$0<b\leq b_{o}=\frac{2\,(q-1)}{m-q+1}\,.$$

∎

3 Numerical study

We study here numerically the performance of the proposed estimators of the form

$$\hat{\boldsymbol{\Sigma}}_{\alpha}=a_{o}\big(\boldsymbol{S}+\boldsymbol{H}\boldsymbol{L}\boldsymbol{\Psi}(\boldsymbol{L})\boldsymbol{H}^{\top}\big)\quad\text{with}\quad\alpha\geq 1\quad\text{and}\quad\boldsymbol{\Psi}(\boldsymbol{L})=b_{o}\,\frac{\boldsymbol{L}^{-\alpha}}{{\rm tr}(\boldsymbol{L}^{-\alpha})}, \qquad (16)$$

where $b_{o}$ is given in Proposition 2.

We consider the following structures of $\boldsymbol{\Sigma}$: (i) the identity matrix $\boldsymbol{I}_{p}$ and (ii) an autoregressive structure with coefficient $0.9$. We set their $p-r$ smallest eigenvalues to zero in order to construct matrices of rank $r\leq p$.
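
A numpy sketch of these structures follows; we assume the autoregressive structure means the AR(1) matrix with entries $0.9^{|i-j|}$, which is not spelled out in the text, and the function name is ours.

```python
def make_sigma(p, r, structure):
    """Rank-r covariance matrices used in Section 3 (a sketch under stated assumptions)."""
    if structure == "identity":
        Sigma = np.eye(p)                                    # structure (i)
    else:
        idx = np.arange(p)
        Sigma = 0.9 ** np.abs(idx[:, None] - idx[None, :])   # assumed AR(1) form, structure (ii)
    w, V = np.linalg.eigh(Sigma)                             # eigenvalues in ascending order
    w[:p - r] = 0.0                                          # zero out the p - r smallest eigenvalues
    return (V * w) @ V.T                                     # rank-r covariance matrix
```

For example, `make_sigma(30, 10, "ar")` gives the rank-10 version of structure (ii) with $p=30$.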

To assess the performance of the proposed estimators, we compute the Percentage Reduction In Average Loss (PRIAL), for some values of $p$, $n$, $r$ and $\alpha$, defined as

$$\textrm{PRIAL}(\hat{\boldsymbol{\Sigma}}_{\alpha})=\frac{R(\hat{\boldsymbol{\Sigma}}_{a_{o}},\boldsymbol{\Sigma})-R(\hat{\boldsymbol{\Sigma}}_{\alpha},\boldsymbol{\Sigma})}{R(\hat{\boldsymbol{\Sigma}}_{a_{o}},\boldsymbol{\Sigma})}\times 100\,,$$

where $\hat{\boldsymbol{\Sigma}}_{a_{o}}$ and $\hat{\boldsymbol{\Sigma}}_{\alpha}$ are respectively defined in (4) and (16).
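
The PRIAL can be estimated by Monte Carlo, reusing `stein_loss` and `haff_estimator` from the earlier sketches; the replication count and seed below are arbitrary choices of ours.

```python
def prial(Sigma, n, r, alpha, n_rep=1000, seed=0):
    """Monte Carlo PRIAL of Sigma_hat_alpha in (16) over Sigma_hat_{a_o} in (4); a sketch."""
    rng = np.random.default_rng(seed)
    q, m = min(n, r), max(n, r)
    Sigma_plus = np.linalg.pinv(Sigma)
    w, V = np.linalg.eigh(Sigma)
    w = np.clip(w, 0.0, None)                       # guard against tiny negative eigenvalues
    B = V[:, -r:] * np.sqrt(w[-r:])                 # factor with Sigma = B B^T, rank r
    loss_o = loss_a = 0.0
    for _ in range(n_rep):
        X = B @ rng.standard_normal((r, n))         # draw from model (1)
        S = X @ X.T
        loss_o += stein_loss(S / m, Sigma_plus, q)  # optimal estimator (4)
        loss_a += stein_loss(haff_estimator(S, n, r, alpha), Sigma_plus, q)
    return 100.0 * (loss_o - loss_a) / loss_o
```

For instance, `prial(make_sigma(30, 10, "ar"), n=50, r=10, alpha=3)` should roughly match the corresponding entry of Table 1, up to Monte Carlo error and our assumption on the autoregressive structure.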

Σ      (p, n)      r     Σ̂_1     Σ̂_2     Σ̂_3     Σ̂_4     Σ̂_5
(i)    (30, 50)    10    6.85    12.45   15.53   16.95   17.53
                   20    9.20    13.91   14.88   14.68   12.47
                   30    11.81   14.33   13.41   12.43   11.71
       (50, 30)    20    18.31   19.65   17.75   16.44   15.63
                   40    17.12   16.33   14.07   12.78   12.02
                   50    11.80   14.23   13.29   12.30   11.59
       (150, 30)   20    18.17   19.69   17.87   16.56   15.71
                   40    17.08   16.31   14.07   12.76   11.99
                   60    8.88    12.27   12.33   11.75   11.19
                   150   2.83    5.09    6.50    7.25    7.61
(ii)   (30, 50)    10    6.06    8.70    9.62    9.89    9.92
                   20    8.81    11.66   12.12   11.93   11.61
                   30    11.46   13.15   12.35   11.48   10.82
       (50, 30)    20    17.18   17.45   15.85   14.76   14.07
                   40    16.34   15.18   13.19   12.00   11.28
                   50    11.33   12.82   12.02   11.19   10.57
       (150, 30)   20    17.30   18.00   16.38   15.19   14.42
                   40    16.27   15.04   13.04   11.84   11.13
                   60    8.48    10.43   10.29   9.81    9.35
                   150   2.73    4.03    4.61    4.86    4.95

Table 1: Effect of $\alpha=1,\dots,5$ on the PRIALs for the structures (i) and (ii) of $\boldsymbol{\Sigma}$.

Table 1 shows that the proposed estimators improve on $\hat{\boldsymbol{\Sigma}}_{a_{o}}$ for any possible ordering of $p$, $n$ and $r$. Compared to the other cases, the Haff-type estimators $\hat{\boldsymbol{\Sigma}}_{\alpha}$ (for $\alpha=1,\dots,5$) perform best in the case where $p>n>r$, with PRIALs higher than $14.07\%$ for both structures (i) and (ii) of $\boldsymbol{\Sigma}$. We note that the optimal value of $\alpha$, which maximizes the PRIAL, depends on $p$, $n$ and $r$.

Acknowledgement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

  • Srivastava (2003) M. S. Srivastava, Singular Wishart and multivariate Beta distributions, Ann. Statist. 31 (2003) 1537–1560.
  • Stein (1986) C. Stein, Lectures on the theory of estimation of many parameters, J. Sov. Math. 34 (1986) 1373–1403.
  • Tsukuma (2016) H. Tsukuma, Estimation of a high-dimensional covariance matrix with the Stein loss, J. Multivar. Anal. 148 (2016) 1–17.
  • Ledoit and Wolf (2004) O. Ledoit, M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, J. Multivar. Anal. 88 (2004) 365–411.
  • James and Stein (1961) W. James, C. Stein, Estimation with quadratic loss, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, California, 1961, pp. 361–379.
  • Takemura (1984) A. Takemura, An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population, Tsukuba J. Math. 8 (1984) 367–376.
  • Dey and Srinivasan (1985) D. K. Dey, C. Srinivasan, Estimation of a covariance matrix under Stein’s loss, Ann. Statist. 13 (1985) 1581–1591.
  • Haff (1980) L. Haff, Empirical Bayes estimation of the multivariate normal covariance matrix, Ann. Statist. 8 (1980) 586–597.
  • Konno (2009) Y. Konno, Shrinkage estimators for large covariance matrices in multivariate real and complex normal distributions under an invariant quadratic loss, J. Multivar. Anal. 100 (2009) 2237–2253.
  • Haddouche et al. (2021) A. M. Haddouche, D. Fourdrinier, F. Mezoued, Scale matrix estimation of an elliptically symmetric distribution in high and low dimensions, J. Multivar. Anal. 181 (2021) 104680.
  • Chételat and Wells (2016) D. Chételat, M. T. Wells, Improved second order estimation in the singular multivariate normal model, J. Multivar. Anal. 147 (2016) 1–19.
  • Kubokawa and Srivastava (2008) T. Kubokawa, M. Srivastava, Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data, J. Multivar. Anal. 99 (2008) 1906–1928.