
A symmetric matrix-variate normal local approximation for the Wishart distribution and some applications

Frédéric Ouimet. California Institute of Technology, Pasadena, CA 91125, USA; McGill University, Montreal, QC H3A 0B9, Canada. frederic.ouimet2@mcgill.ca
Abstract

The noncentral Wishart distribution has become more mainstream in statistics as the prevalence of applications involving sample covariances with underlying multivariate Gaussian populations has dramatically increased since the advent of computers. Multiple sources in the literature deal with local approximations of the noncentral Wishart distribution with respect to its central counterpart. However, no source has yet developed explicit local approximations for the (central) Wishart distribution in terms of a normal analogue, which is important since Gaussian distributions are at the heart of the asymptotic theory for many statistical methods. In this paper, we prove a precise asymptotic expansion for the ratio of the Wishart density to the symmetric matrix-variate normal density with the same mean and covariances. The result is then used to derive an upper bound on the total variation between the corresponding probability measures and to find the pointwise variance of a new density estimator on the space of positive definite matrices with a Wishart asymmetric kernel. For the sake of completeness, we also find expressions for the pointwise bias of our new estimator, the pointwise variance as we move towards the boundary of its support, the mean squared error, the mean integrated squared error away from the boundary, and we prove its asymptotic normality.

keywords:
asymmetric kernel, asymptotic statistics, density estimation, expansion, local approximation, matrix-variate normal, multivariate associated kernel, normal approximation, smoothing, total variation, Wishart distribution
MSC:
[2020] Primary: 62E20; Secondary: 62H10, 62H12, 62B15, 62G05, 62G07
journal: Journal of Multivariate Analysis

1 Introduction

Let $d\in\mathbb{N}$ be given. Define the space of (real) symmetric matrices of size $d\times d$ and the space of (real symmetric) positive definite matrices of size $d\times d$ as follows:

\mathcal{S}^{d} := \left\{\mathbb{M}\in\mathbb{R}^{d\times d} : \text{$\mathbb{M}$ is symmetric}\right\}, \quad (1)
\mathcal{S}_{++}^{d} := \left\{\mathbb{M}\in\mathbb{R}^{d\times d} : \text{$\mathbb{M}$ is symmetric and positive definite}\right\}. \quad (2)

For $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$, the density function of the $\mathrm{Wishart}_d(\nu,\mathbb{S})$ distribution is defined by

K_{\nu,\mathbb{S}}(\mathbb{X}) := \frac{|\mathbb{S}^{-1}\mathbb{X}|^{\nu/2-(d+1)/2}\exp\left(-\frac{1}{2}\mathrm{tr}(\mathbb{S}^{-1}\mathbb{X})\right)}{2^{\nu d/2}\,|\mathbb{S}|^{(d+1)/2}\,\pi^{d(d-1)/4}\prod_{i=1}^{d}\Gamma(\frac{1}{2}(\nu-(i+1))+1)}, \quad \mathbb{X}\in\mathcal{S}_{++}^{d}, \quad (3)

where $\nu$ is the number of degrees of freedom, $\mathbb{S}$ is the scale matrix, and

\Gamma(a) := \int_{0}^{\infty} t^{a-1} e^{-t}\,{\rm d}t, \quad a>0, \quad (4)

denotes the Euler gamma function. The mean and covariance matrix for the vectorization of $\mathbb{W}\sim\mathrm{Wishart}_d(\nu,\mathbb{S})$, namely

\mathrm{vecp}(\mathbb{W}) := (\mathbb{W}_{11},\mathbb{W}_{12},\mathbb{W}_{22},\dots,\mathbb{W}_{1d},\dots,\mathbb{W}_{dd})^{\top}, \quad (5)

($\mathrm{vecp}(\cdot)$ is the operator that stacks the columns of the upper triangular portion of a symmetric matrix on top of each other), are well known to be

\mathbb{E}[\mathrm{vecp}(\mathbb{W})] = \nu\,\mathrm{vecp}(\mathbb{S}) \quad \text{(alternatively, $\mathbb{E}[\mathbb{W}] = \nu\,\mathbb{S}$)} \quad (6)

and

\mathbb{V}\mathrm{ar}(\mathrm{vecp}(\mathbb{W})) = B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}, \quad (7)

where $\mathrm{I}_{d}$ is the identity matrix of order $d$, $B_{d}$ is a $d^{2}\times\frac{1}{2}d(d+1)$ transition matrix (see Gupta and Nagar [25, p.11] for the precise definition), and $\otimes$ denotes the Kronecker product.
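For readers who want to check the moment formulas (6)-(7) numerically, here is a minimal R sketch (ours, purely illustrative); it uses the base-R sampler stats::rWishart and the standard entrywise identity $\mathbb{V}\mathrm{ar}(\mathbb{W}_{ij}) = \nu(\mathbb{S}_{ij}^2 + \mathbb{S}_{ii}\mathbb{S}_{jj})$, which matches the diagonal of (7).

```r
# A numerical sanity check of (6)-(7) using base R's rWishart.
vecp <- function(M) M[upper.tri(M, diag = TRUE)]  # (W11, W12, W22, ...), as in (5)

set.seed(1)
d <- 2; nu <- 50
S <- matrix(c(2, 1, 1, 3), d, d)
W <- rWishart(1e5, df = nu, Sigma = S)            # d x d x 100000 array of draws

V <- t(apply(W, 3, vecp))                         # one vecp(W) draw per row
colMeans(V)                                       # ~ nu * vecp(S) = (100, 50, 150)
apply(V, 2, var)                                  # compare with the entrywise identity:
c(2 * nu * S[1, 1]^2,                             # Var(W11) = 400
  nu * (S[1, 2]^2 + S[1, 1] * S[2, 2]),           # Var(W12) = 350
  2 * nu * S[2, 2]^2)                             # Var(W22) = 900
```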

Multiple sources in the literature deal with local approximations of the noncentral Wishart distribution with respect to the (central) Wishart distribution, see, e.g., Steyn and Roux [54], Tan and Gupta [55], Kollo and von Rosen [37], Kocherlakota and Kocherlakota [34]. However, no source has yet developed explicit local approximations for the (central) Wishart distribution in terms of a normal analogue, which is important since Gaussian distributions are at the heart of the asymptotic theory for many statistical methods.

The main goal of our paper (Theorem 1) is to establish an asymptotic expansion for the ratio of the Wishart density (3) to the symmetric matrix-variate normal (SMN) density with the same mean and covariances. According to Gupta and Nagar [25, Eq.(2.5.8)], the density of the $\mathrm{SMN}_{d\times d}(\nu\,\mathbb{S},\, B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d})$ distribution is

g_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{\exp\left(-\frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right)}{\sqrt{(2\pi)^{d(d+1)/2}\,|B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}|}} = \frac{\exp\left(-\frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right)}{\sqrt{2^{d}\,\pi^{d(d+1)/2}\,|\sqrt{2\nu}\,\mathbb{S}|^{d+1}}}, \quad \mathbb{X}\in\mathcal{S}^{d}, \quad (8)

where the last equality follows from Gupta and Nagar [25, Eq.(1.2.18)], and

\Delta_{\nu,\mathbb{S}} := (\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\mathbb{X}-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}. \quad (9)

Rewritings of the density (8) are provided on page 71 of Gupta and Nagar [25] using the vectorization operators $\mathrm{vec}(\cdot)$ and $\mathrm{vecp}(\cdot)$. For example, we can rewrite $g_{\nu,\mathbb{S}}(\mathbb{X})$ in terms of $\mathrm{vecp}(\mathbb{X})$ as follows:

g_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{\exp\left(-\frac{1}{2}(\mathrm{vecp}(\mathbb{X})-\mathrm{vecp}(\nu\,\mathbb{S}))^{\top}\big[B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}\big]^{-1}(\mathrm{vecp}(\mathbb{X})-\mathrm{vecp}(\nu\,\mathbb{S}))\right)}{\sqrt{(2\pi)^{d(d+1)/2}\,|B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}|}}, \quad \mathbb{X}\in\mathcal{S}^{d}. \quad (10)
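As an illustration of formula (8), here is a minimal R sketch (ours, not from the paper's appendix) that evaluates $\log g_{\nu,\mathbb{S}}(\mathbb{X})$ directly; the helper matinvsqrt computes the inverse symmetric square root via an eigendecomposition.

```r
# Inverse symmetric square root A^{-1/2} of a positive definite matrix A.
matinvsqrt <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow(A)) %*% t(e$vectors)
}

# log g_{nu,S}(X) from (8): -tr(Delta^2)/2 minus the log-normalizing constant.
log_smn <- function(X, nu, S) {
  d <- nrow(S)
  R <- matinvsqrt(sqrt(2 * nu) * S)
  Delta <- R %*% (X - nu * S) %*% R        # Delta_{nu,S} as in (9)
  ld <- as.numeric(determinant(sqrt(2 * nu) * S, logarithm = TRUE)$modulus)
  -0.5 * sum(Delta * Delta) -              # sum(Delta^2) = tr(Delta^2), Delta symmetric
    0.5 * (d * log(2) + d * (d + 1) / 2 * log(pi) + (d + 1) * ld)
}

# Example: evaluate at the mean, X = nu * S.
S <- matrix(c(2, 1, 1, 3), 2, 2)
log_smn(50 * S, nu = 50, S = S)
```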

To give some practical motivation for the SMN distribution (8), note that noise in the estimate of individual voxels of diffusion tensor magnetic resonance imaging (DT-MRI) data has been shown to be well modeled by the $\mathrm{SMN}_{3\times 3}$ distribution in [44, 6, 45]. The SMN voxel distributions were combined into a tensor-variate normal distribution in [7, 23], which could help to predict how the whole image (not just individual voxels) changes when shearing and dilation operations are applied in image warping and registration problems, see Alexander et al. [3]. In [49], maximum likelihood estimators and likelihood ratio tests are developed for the eigenvalues and eigenvectors of a form of the SMN distribution with an orthogonally invariant covariance structure, both in one-sample problems (for example, in image interpolation) and two-sample problems (when comparing images) and under a broad variety of assumptions. This work significantly extended previous results of Mallows [41]. In [49], it is also mentioned that the polarization pattern of cosmic microwave background (CMB) radiation measurements can be represented by $2\times 2$ positive definite matrices, see the primer by Hu and White [30]. In a very recent and interesting paper, Vafaei Sadr and Movahed [56] presented evidence for the Gaussianity of the local extrema of CMB maps. We can also mention [22], where finite mixtures of skewed SMN distributions were applied to an image recognition problem.

In general, we know that the Gaussian distribution is an attractor for sums of i.i.d. random variables with finite variance, which makes many estimators in statistics asymptotically normal. Similarly, we expect the SMN distribution (8) to be an attractor for sums of i.i.d. random symmetric matrices with finite variances, thus including many estimators such as sample covariance matrices and score statistics for symmetric matrix parameters. In particular, if a given statistic or estimator is a function of the components of a sample covariance matrix for i.i.d. observations coming from a multivariate Gaussian population, then we could study its large sample properties (such as its moments) using Theorem 1 (for example, by turning a Wishart-moments estimation problem into a Gaussian-moments estimation problem).

In Section 3, we use our asymptotic expansion (Theorem 1) to find the pointwise variance of a new density estimator on the space of positive definite matrices with a Wishart asymmetric kernel (Section 3.1), and we derive an upper bound on the total variation between the probability measures on $\mathcal{S}^{d}$ induced by (3) and (8) (Section 3.2). These are two examples of applications, but it is clear that there could be many others under the proper context.

Remark 1 (Notation).

Throughout the paper, $a = \mathcal{O}(b)$ means that $\limsup |a/b| < C$ as $\nu\to\infty$ (or as $b\to 0$ or as $n\to\infty$ in Section 3.1, depending on the context), where $C>0$ is a universal constant. Whenever $C$ might depend on some parameter, we add a subscript (for example, $a = \mathcal{O}_{d}(b)$). Similarly, $a = \mathrm{o}(b)$ means that $\lim |a/b| = 0$, and subscripts indicate which parameters the convergence rate can depend on. The notation $\mathrm{tr}(\cdot)$ will denote the trace operator for matrices and $|\cdot|$ their determinant. For a matrix $\mathbb{M}\in\mathbb{R}^{d\times d}$ that is diagonalizable, $\lambda_{1}(\mathbb{M})\leq\dots\leq\lambda_{d}(\mathbb{M})$ will denote its eigenvalues, and we let $\boldsymbol{\lambda}(\mathbb{M}) := (\lambda_{1}(\mathbb{M}),\dots,\lambda_{d}(\mathbb{M}))^{\top}$.

In Section 3.1 and the related proofs, the symbol $\mathscr{D}$ over an arrow `$\longrightarrow$' will denote convergence in distribution (or law). We will also use the shorthand $[d] := \{1,\dots,d\}$ in several places. Finally, the bandwidth parameter $b = b(n)$ will always be implicitly a function of the number of observations, the only exceptions being in Theorem 2 and the related proof.

2 Main result

Below, we prove an asymptotic expansion for the ratio of the Wishart density to the symmetric matrix-variate normal (SMN) density with the same mean and covariances. This result is (much) stronger than the result found, for example, in [4, Theorem 3.6.2] or [20, Theorem 2.5.1], which says that for a sequence of i.i.d. multivariate Gaussian observations $\boldsymbol{X}_{1},\dots,\boldsymbol{X}_{n}\sim\mathcal{N}_{d}(\boldsymbol{\mu},\mathbb{S})$ with $\boldsymbol{\mu}\in\mathbb{R}^{d}$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$, the scaled and recentered sample covariance matrix of $\boldsymbol{X}_{1},\dots,\boldsymbol{X}_{n}$ converges in law to an SMN distribution, specifically,

n^{-1/2}\left[\sum_{i=1}^{n}(\boldsymbol{X}_{i}-\boldsymbol{\mu})(\boldsymbol{X}_{i}-\boldsymbol{\mu})^{\top} - n\,\mathbb{S}\right] \stackrel{\mathscr{D}}{\longrightarrow} \mathrm{SMN}_{d\times d}(0_{d\times d},\, 2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})\,B_{d}), \quad n\to\infty. \quad (11)

The result in Theorem 1 is stronger than (11) since it is well known that $\sum_{i=1}^{n}(\boldsymbol{X}_{i}-\boldsymbol{\mu})(\boldsymbol{X}_{i}-\boldsymbol{\mu})^{\top} \sim \mathrm{Wishart}_{d}(n,\mathbb{S})$ in this context.

Theorem 1.

Let $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given. Pick any $\eta\in(0,1)$ and let

B_{\nu,\mathbb{S}}(\eta) := \left\{\mathbb{X}\in\mathcal{S}_{++}^{d} : \max_{1\leq i\leq d}\left|\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right| \leq \eta\,\nu^{-1/3}\right\} \quad (12)

denote the bulk of the Wishart distribution. Then, as $\nu\to\infty$ and uniformly for $\mathbb{X}\in B_{\nu,\mathbb{S}}(\eta)$, we have

\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) = \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} \quad (13)
\qquad + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right).

Furthermore,

\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})} = 1 + \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{\frac{1}{9}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\right)^{2} - \frac{d+1}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) + \frac{(d+1)^{2}}{4}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right)^{2}\right. \quad (14)
\qquad \left. - \frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{9}}{\nu^{3/2}}\right).

As a direct consequence of Theorem 1, we obtain expansions for the ratio and log-ratio of the density function of a multivariate bijective mapping $\boldsymbol{h}(\cdot)$ applied to a Wishart random matrix to the density function of the same mapping applied to the corresponding SMN random matrix; the point is that the Jacobian factors coming from the change of variables are identical for both densities and therefore cancel in the ratio. In particular, the corollary below provides an asymptotic expansion for the density of a bijective mapping applied to a sample covariance matrix for i.i.d. observations coming from a multivariate Gaussian population.

Corollary 1.

Let $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given, and let $\mathbb{W}\sim\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ and $\mathbb{N}\sim\mathrm{SMN}_{d\times d}(\nu\,\mathbb{S},\, B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d})$. Let $\boldsymbol{h}(\cdot)$ be a one-to-one mapping from an open subset $\mathcal{D}$ of $\mathcal{S}_{++}^{d}$ onto a subset $\mathcal{R}$ of $\mathbb{R}^{d(d+1)/2}$. Assume further that $\boldsymbol{h}$ has continuous partial derivatives on $\mathcal{D}$ and its Jacobian determinant $\big|\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\big|$ is non-zero for all $\mathbb{X}\in\mathcal{D}$. Define

\widetilde{\Delta}_{\nu,\mathbb{S}} = (\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\boldsymbol{h}^{-1}(\boldsymbol{y})-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}, \quad \boldsymbol{y}\in\mathcal{R}, \quad (15)

and denote by $f_{\boldsymbol{h}(\mathbb{W})}$ and $f_{\boldsymbol{h}(\mathbb{N})}$ the density functions of $\boldsymbol{h}(\mathbb{W})$ and $\boldsymbol{h}(\mathbb{N})$, respectively. Fix any $\eta\in(0,1)$; then we have, as $\nu\to\infty$, and uniformly for $\boldsymbol{y}\in\mathcal{R}$ such that $\boldsymbol{h}^{-1}(\boldsymbol{y})\in B_{\nu,\mathbb{S}}(\eta)$,

\log\left(\frac{f_{\boldsymbol{h}(\mathbb{W})}(\boldsymbol{y})}{f_{\boldsymbol{h}(\mathbb{N})}(\boldsymbol{y})}\right) = \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} \quad (16)
\qquad + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\widetilde{\Delta}_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right),

and

\frac{f_{\boldsymbol{h}(\mathbb{W})}(\boldsymbol{y})}{f_{\boldsymbol{h}(\mathbb{N})}(\boldsymbol{y})} = 1 + \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{\frac{1}{9}\left(\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3})\right)^{2} - \frac{d+1}{3}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3})\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}) + \frac{(d+1)^{2}}{4}\left(\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}})\right)^{2}\right. \quad (17)
\qquad \left. - \frac{1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\widetilde{\Delta}_{\nu,\mathbb{S}})|^{9}}{\nu^{3/2}}\right).

Under the conditions of Corollary 1, we know from the multivariate delta method (see, e.g., [20, Theorem 2.5.2]) that the random vectors $\nu^{-1/2}(\boldsymbol{h}(\mathbb{W})-\boldsymbol{h}(\nu\,\mathbb{S}))$ and $\nu^{-1/2}(\boldsymbol{h}(\mathbb{N})-\boldsymbol{h}(\nu\,\mathbb{S}))$ both converge in distribution, as $\nu\to\infty$, to

\boldsymbol{Y} \sim \mathcal{N}_{d(d+1)/2}\left(\boldsymbol{0},\; \left(\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}}\right) 2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})\,B_{d} \left(\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}}\right)^{\top}\right),

where $\boldsymbol{h}(\mathbb{X}) = (h_{1}(\mathbb{X}),\dots,h_{d(d+1)/2}(\mathbb{X}))^{\top}$ and

\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}} = \left[\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}h_{1}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}} ~~ \frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}h_{2}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}} ~~ \dots ~~ \frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}h_{d(d+1)/2}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}}\right]^{\top}.

Therefore, it would have been neat to extend Corollary 1 by expanding the log-ratio $\log(f_{\nu^{-1/2}(\boldsymbol{h}(\mathbb{W})-\boldsymbol{h}(\nu\,\mathbb{S}))}(\boldsymbol{y})/f_{\boldsymbol{Y}}(\boldsymbol{y}))$. However, this would most likely require an expansion for the log-ratio $\log(f_{\nu^{-1/2}(\boldsymbol{h}(\mathbb{N})-\boldsymbol{h}(\nu\,\mathbb{S}))}(\boldsymbol{y})/f_{\boldsymbol{Y}}(\boldsymbol{y}))$, and it is unclear which restrictions we should impose on $\boldsymbol{h}$ to progress in that direction. This question is left open for future research.

Below, we provide numerical evidence (displayed graphically) for the validity of the expansion in Theorem 1 when $d=2$. We compare three levels of approximation for various choices of $\mathbb{S}$. For any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$, define

E_{0} := \sup_{\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})} \left|\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right)\right|, \quad (18)
E_{1} := \sup_{\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})} \left|\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) - \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\}\right|, \quad (19)
E_{2} := \sup_{\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})} \left|\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) - \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\} - \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\}\right|. \quad (20)

In order to avoid numerical errors in part due to the gamma functions in $K_{\nu,\mathbb{S}}(\mathbb{X})$, we have to work a bit to get an expression for $\log\left(K_{\nu,\mathbb{S}}(\mathbb{X})/g_{\nu,\mathbb{S}}(\mathbb{X})\right)$ which is numerically more stable. By taking the expression for $K_{\nu,\mathbb{S}}(\mathbb{X})$ in (54) and dividing by the expression for $g_{\nu,\mathbb{S}}(\mathbb{X})$ on the right-hand side of (8), we get

\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})} = |\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}|^{\nu/2-(d+1)/2} \cdot \frac{\exp\left(-\sqrt{\frac{\nu}{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) + \frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right)}{\prod_{i=1}^{d}\frac{(\nu-(i+1))^{(\nu-i)/2}}{e^{-(i+1)/2}\,\nu^{(\nu-i)/2}} \cdot \prod_{i=1}^{d}\frac{\Gamma(\frac{1}{2}(\nu-(i+1))+1)}{\sqrt{2\pi}\,e^{-\frac{1}{2}(\nu-(i+1))}\,[\frac{1}{2}(\nu-(i+1))]^{(\nu-i)/2}}}, \quad (21)

so that

\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) = \frac{\nu-(d+1)}{2}\sum_{i=1}^{d}\log\left(1+\sqrt{\frac{2}{\nu}}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right) - \sqrt{\frac{\nu}{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) + \frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \sum_{i=1}^{d}\left[\left(\frac{\nu-i}{2}\right)\log\left(1-\frac{i+1}{\nu}\right) + \frac{i+1}{2}\right] \quad (22)
\qquad - \sum_{i=1}^{d}\left[\log\Gamma\left(\frac{1}{2}(\nu-(i+1))+1\right) - \frac{1}{2}\log(2\pi) + \frac{1}{2}(\nu-(i+1)) - \frac{\nu-i}{2}\log\left(\frac{1}{2}(\nu-(i+1))\right)\right].

In R, we used this last equation to evaluate the log-ratios inside $E_{0}$, $E_{1}$ and $E_{2}$.
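For concreteness, here is a minimal R sketch of the numerically stable formula (22) (ours, not the appendix code); lgamma and log1p keep the gamma and logarithm terms stable for moderate $\nu$.

```r
matinvsqrt <- function(A) {  # inverse symmetric square root, as earlier
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow(A)) %*% t(e$vectors)
}

# Stable evaluation of log(K_{nu,S}(X) / g_{nu,S}(X)) via (22).
log_ratio <- function(X, nu, S) {
  d <- nrow(S)
  R <- matinvsqrt(sqrt(2 * nu) * S)
  lam <- eigen(R %*% (X - nu * S) %*% R, symmetric = TRUE)$values  # eigenvalues of Delta
  i <- 1:d
  (nu - (d + 1)) / 2 * sum(log1p(sqrt(2 / nu) * lam)) -
    sqrt(nu / 2) * sum(lam) + 0.5 * sum(lam^2) -       # tr(Delta), tr(Delta^2)
    sum((nu - i) / 2 * log1p(-(i + 1) / nu) + (i + 1) / 2) -
    sum(lgamma((nu - (i + 1)) / 2 + 1) - 0.5 * log(2 * pi) +
        (nu - (i + 1)) / 2 - (nu - i) / 2 * log((nu - (i + 1)) / 2))
}

# Quick check with Delta_{nu,S} = 0.2 * I fixed: the log-ratio decays at the
# nu^{-1/2} rate predicted by Theorem 1 (roughly halves when nu is quadrupled).
S <- matrix(c(2, 1, 1, 3), 2, 2)
sapply(c(50, 200, 800), function(nu) log_ratio(nu * S + 0.2 * sqrt(2 * nu) * S, nu, S))
```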

Note that $\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})$ implies $|\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{k})| \leq d\,2^{-k}$ for all $k\in\mathbb{N}$, so we expect from Theorem 1 that the maximum errors above ($E_{0}$, $E_{1}$ and $E_{2}$) will have the asymptotic behavior

E_{i} = \mathcal{O}_{d}(\nu^{-(1+i)/2}), \quad \text{for all } i\in\{0,1,2\}, \quad (23)

or equivalently,

\liminf_{\nu\to\infty}\frac{\log E_{i}}{\log(\nu^{-1})} \geq \frac{1+i}{2}, \quad \text{for all } i\in\{0,1,2\}. \quad (24)

The property (24) is verified in Fig. 2 below, for various choices of $\mathbb{S}$. Similarly, the corresponding log-log plots of the errors as a function of $\nu$ are displayed in Fig. 1. The simulations are limited to the range $5\leq\nu\leq 205$. The R code that generated Fig. 1 and Fig. 2 can be found in Appendix C.

[Figure 1 shows sixteen panels (a)-(p), one for each scale matrix $\mathbb{S} = \begin{pmatrix} s_{11} & 1 \\ 1 & s_{22} \end{pmatrix}$ with $s_{11}, s_{22} \in \{2,3,4,5\}$.]
Fig. 1: Plots of $1/E_{i}$ as a function of $\nu$, for various choices of $\mathbb{S}$. Both the horizontal and vertical axes are on a logarithmic scale. The plots clearly illustrate how the addition of correction terms from Theorem 1 to the base approximation (18) improves it.
[Figure 2 shows sixteen panels (a)-(p), for the same sixteen choices of $\mathbb{S}$ as in Fig. 1.]
Fig. 2: Plots of $\log E_{i}/\log(\nu^{-1})$ as a function of $\nu$, for various choices of $\mathbb{S}$. The plots confirm (24) for our choices of $\mathbb{S}$ and bring strong evidence for the validity of Theorem 1.

3 Applications

3.1 Asymptotic properties of Wishart asymmetric kernel estimators

Symmetric positive definite (SPD) matrix data are prevalent in modern statistical applications. As pointed out by Hadjicosta [26] and Hadjicosta and Richards [27], where goodness-of-fit tests for the Wishart distribution were developed based on integral transforms, factor analysis, diffusion tensor imaging, CMB radiation measurements, volatility models in finance, wireless communication systems and polarimetric radar imaging are just a few areas where SPD matrix data might be observed. Some articles have dealt with methods of density estimation on this space but the literature remains relatively scarce. Chevallier et al. [17] show how truncated Fourier series can be used for various applications, Kim and Richards [32, 33] and Haff et al. [28] explore the deconvolution of Wishart mixtures, Chevallier et al. [18] adapt the kernel estimator on compact Riemannian manifolds introduced by Pelletier [46] to compact subsets of the space of multivariate Gaussian distributions under the Fisher information metric and the Wasserstein metric, and Asta [5] defines a kernel density estimator on symmetric spaces of non-compact type (similar to Pelletier’s but for which Helgason–Fourier transforms are defined) and proves an upper bound on the convergence rate that is analogous to the minimax rate of classical kernel estimators on Euclidean spaces.

In a recent preprint, Li et al. [39] consider log-Gaussian kernel estimators (based on the logarithm map for SPD matrices) and a variant of the Wishart asymmetric kernel estimator that is slightly different from our definition below in (25). They prove various asymptotic properties for the former, and a simulation study compares them both. If we were to apply traditional multivariate kernel estimators to the vectorization (in $\mathbb{R}^{d(d+1)/2}$, recall (5)) of a sequence of i.i.d. random SPD matrices, then these estimators would misbehave near the boundary because of the condition on the eigenvalues (i.e., that they remain positive), and the usual boundary kernel modifications would not be appropriate either since positive definiteness is not a condition that can be translated to individual bounds on the entries of a matrix. To the best of our knowledge, [39] is the only paper that presents estimators on the space of SPD matrices that address (implicitly) the spill-over problem of traditional multivariate kernel estimators caused by the boundary condition on the eigenvalues. Similarly to Li et al. [39], we can construct a new density estimator with a Wishart asymmetric kernel that creates a variable smoothing in our space and has a uniformly negligible bias near the boundary of $\mathcal{S}_{++}^{d}$.

In terms of applications, our new density estimation method on $\mathcal{S}_{++}^{d}$ could be used, apart from visualization purposes, for nonparametric alternatives to regression and classification (both supervised and unsupervised) in any of the fields mentioned at the beginning of the first paragraph in this section. The favorable boundary properties of our estimator (the proof of Theorem 2 below shows that the pointwise bias is asymptotically uniformly negligible near the boundary) mean that it could be particularly useful for scarce data sets and/or data sets with clusters of observations near the boundary of $\mathcal{S}_{++}^{d}$.

Here is the definition of our estimator. Assume that we have a sequence of observations $\mathbb{X}_{1},\dots,\mathbb{X}_{n}\in\mathcal{S}_{++}^{d}$ that are independent and $F$ distributed ($F$ is unknown), with density $f$ supported on $\mathcal{S}_{++}^{d}$ for some $d\in\mathbb{N}$. Then, for a given bandwidth parameter $b>0$, let

\hat{f}_{n,b}(\mathbb{S}) := \frac{1}{n}\sum_{i=1}^{n} K_{1/b,\,b\,\mathbb{S}}(\mathbb{X}_{i}), \quad \mathbb{S}\in\mathcal{S}_{++}^{d}, \quad (25)

be the (or a) Wishart asymmetric kernel estimator for the density function $f$, where $K_{1/b,\,b\,\mathbb{S}}(\cdot)$ is defined in (3) (a minimal R sketch is given right after this paragraph). The estimator $\hat{f}_{n,b}$ can be seen as a continuous example in the broader class of multivariate associated kernel estimators introduced by Kokonendji and Somé [35, 36]. It is also a natural generalization of a slight variant of the (unmodified) Gamma kernel estimator introduced by Chen [16] because the Wishart distribution (recall (3)) is a matrix-variate analogue of the Gamma distribution. In [16, 12, 19, 9, 10, 11, 58, 8, 29], many asymptotic properties of Gamma kernel estimators of density functions supported on the half-line $[0,\infty)$ were studied, among other things: pointwise bias, pointwise variance, mean squared error, mean integrated squared error, asymptotic normality and uniform strong consistency. Also, bias reduction techniques were explored by Igarashi and Kakizawa [31] and Funke and Kawka [21], and adaptive Bayesian methods of bandwidth selection were presented by Somé [52] and Somé and Kokonendji [53].
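Here is a minimal, self-contained R sketch of (25), under the assumption that the observations are stored as a $d\times d\times n$ array; the Wishart log-density (3) is evaluated through the multivariate log-gamma function to avoid overflow. The function names are ours, for illustration only.

```r
# log of the multivariate gamma function, log Gamma_d(a).
lmvgamma <- function(a, d) d * (d - 1) / 4 * log(pi) + sum(lgamma(a + (1 - (1:d)) / 2))

# log K_{nu,Sigma}(X), the Wishart_d(nu, Sigma) log-density from (3).
log_wishart <- function(X, nu, Sigma) {
  d <- nrow(Sigma)
  ldX <- as.numeric(determinant(X, logarithm = TRUE)$modulus)
  ldS <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  (nu - d - 1) / 2 * ldX - 0.5 * sum(diag(solve(Sigma, X))) -
    nu * d / 2 * log(2) - nu / 2 * ldS - lmvgamma(nu / 2, d)
}

# Wishart asymmetric kernel estimator (25): average K_{1/b, bS} over the sample.
fhat <- function(S, Xarr, b) {
  mean(apply(Xarr, 3, function(X) exp(log_wishart(X, nu = 1 / b, Sigma = b * S))))
}

# Toy usage: data drawn from Wishart_2(10, I/10), so f is that density.
set.seed(2)
Xarr <- rWishart(500, df = 10, Sigma = diag(2) / 10)
fhat(diag(2), Xarr, b = 0.05)   # density estimate at S = I_2
```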

Below, we show how some of the asymptotic properties of $\hat{f}_{n,b}$ can be studied using the asymptotic expansion developed in Theorem 1. Assume that $f$ is Lipschitz continuous and bounded on $\mathcal{S}_{++}^{d}$. (To make sense of this assumption, note that $\mathcal{S}_{++}^{d}$ is an open and convex subset of the space of symmetric matrices of size $d\times d$, which itself is isomorphic to $\mathbb{R}^{d(d+1)/2}$.) Then, straightforward calculations show that, for any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$,

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) = n^{-1}\,\mathbb{E}\left[\left(K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right)^{2}\right] - n^{-1}\left(\mathbb{E}\left[K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right]\right)^{2} = n^{-1}\,\mathbb{E}\left[\left(K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right)^{2}\right] - \mathcal{O}_{d,\mathbb{S}}(n^{-1}), \quad (26)

where, using the local approximation (14) for the first equality,

\mathbb{E}\left[\left(K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right)^{2}\right] = \int_{\mathcal{S}^{d}} g_{1/b,\,b\,\mathbb{S}}^{2}(\mathbb{X})\,f(\mathbb{X})\,{\rm d}\mathbb{X} + \mathrm{o}_{d,\mathbb{S}}(1) = \frac{2^{-d(d+1)/4}\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2}))}{\sqrt{(2\pi)^{d(d+1)/2}\,2^{-d(d-1)/2}\,|\sqrt{2b}\,\mathbb{S}|^{d+1}}}\underbrace{\int_{\mathcal{S}^{d}} g_{1/b,\,\frac{b}{\sqrt{2}}\,\mathbb{S}}(\mathbb{X})\,{\rm d}\mathbb{X}}_{=\,1+\mathrm{o}_{d,\mathbb{S}}(1)} + \mathrm{o}_{d,\mathbb{S}}(1)
\qquad = b^{-d(d+1)/4}\,\frac{|\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2})). \quad (27)

By applying this last estimate in (26), we obtain the pointwise variance.

Proposition 1 (Pointwise variance).

Assume that $f$ is Lipschitz continuous and bounded on $\mathcal{S}_{++}^{d}$. For any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$, we have

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) = n^{-1} b^{-d(d+1)/4}\,\frac{|\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2})), \quad n\to\infty. \quad (28)
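As a rough sanity check of Proposition 1, one can compare the empirical variance of $\hat{f}_{n,b}(\mathbb{S})$ over independent replicates with the leading term of (28). The sketch below reuses log_wishart and fhat from the sketch earlier in this subsection; agreement is only approximate at finite $n$ and $b$, since (28) is asymptotic.

```r
set.seed(3)
n <- 200; b <- 0.05; d <- 2; S <- diag(d)
f_true <- function(X) exp(log_wishart(X, nu = 10, Sigma = diag(d) / 10))  # true f
reps <- replicate(300, fhat(S, rWishart(n, df = 10, Sigma = diag(d) / 10), b))
var(reps)                                              # empirical variance
(1 / n) * b^(-d * (d + 1) / 4) *                       # leading term of (28)
  det(sqrt(pi) * S)^(-(d + 1) / 2) / 2^(d * (d + 2) / 2) * f_true(S)
```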

From this, other asymptotic expressions can be derived, such as the mean squared error and the mean integrated squared error, and we can also optimize the bandwidth parameter $b$ with respect to these expressions to implement a plug-in selection method exactly as we would in the setting of traditional multivariate kernel estimators, see, e.g., Scott [50, Section 6.5] or Chacón and Duong [14, Section 3.6]. The expressions for the mean squared error and the mean integrated squared error (away from the boundary) are provided below in Corollary 2 and Theorem 4, respectively, together with the corresponding optimal choice of $b$. For the sake of completeness, we also provide results on the pointwise bias of our estimator (see Theorem 2), its pointwise variance as we move towards the boundary of $\mathcal{S}_{++}^{d}$ (see Theorem 3) and its asymptotic normality (see Theorem 5).

For each result in the remainder of this section, one of the following two assumptions will be used:

$\bullet$ The density $f$ is Lipschitz continuous and bounded on $\mathcal{S}_{++}^{d}$. (29)
$\bullet$ The density $f$ and its first-order partial derivatives are continuous and bounded on $\mathcal{S}_{++}^{d}$, (30)
and the second-order partial derivatives of $f$ are uniformly continuous and bounded on $\mathcal{S}_{++}^{d}$. (31)
Remark 2.

Again, to make sense of the above assumptions, note that $\mathcal{S}_{++}^{d}$ is an open and convex subset of the space of symmetric matrices of size $d\times d$, denoted by $\mathcal{S}^{d}$, which itself is isomorphic to $\mathbb{R}^{d(d+1)/2}$.

We denote the expectation of $\hat{f}_{n,b}(\mathbb{S})$ by

f_{b}(\mathbb{S}) := \mathbb{E}\left[\hat{f}_{n,b}(\mathbb{S})\right] = \mathbb{E}[K_{1/b,\,b\,\mathbb{S}}(\mathbb{X}_{1})] = \int_{\mathcal{S}_{++}^{d}} K_{1/b,\,b\,\mathbb{S}}(\mathbb{M})\,f(\mathbb{M})\,{\rm d}\mathbb{M}. \quad (32)

Alternatively, notice that if $\mathbb{W}_{\mathbb{S}}\sim\mathrm{Wishart}_{d}(1/b,\,b\,\mathbb{S})$, then we also have the representation

f_{b}(\mathbb{S}) = \mathbb{E}[f(\mathbb{W}_{\mathbb{S}})]. \quad (33)
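The representation (33) also suggests a direct Monte Carlo evaluation of $f_b(\mathbb{S})$; here is a short R sketch (ours, for illustration), where f is any density function on the space of SPD matrices supplied by the user.

```r
# Monte Carlo version of (33): f_b(S) = E[f(W_S)], W_S ~ Wishart_d(1/b, b*S).
# Requires 1/b > d - 1 for the Wishart density to exist.
fb <- function(S, b, f, M = 1e4) {
  mean(apply(rWishart(M, df = 1 / b, Sigma = b * S), 3, f))
}
```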

The asymptotics of the pointwise bias and variance were first computed by Chen [15, 16] for Beta and Gamma kernel estimators, by Ouimet and Tolosana-Delgado [43] for the Dirichlet kernel estimator of Aitchison and Lauder [2], and by Kokonendji and Somé [35, 36] for multivariate associated kernel estimators. The next two theorems below extend the (unmodified) Gamma case to our multidimensional setting.

Theorem 2 (Pointwise bias).

Assume that (31) holds. Then, as $b\to 0$ and for all $\mathbb{S}\in\mathcal{S}_{++}^{d}$, we have

\mathbb{B}\mathrm{ias}[\hat{f}_{n,b}(\mathbb{S})] = f_{b}(\mathbb{S}) - f(\mathbb{S}) = b\,g(\mathbb{S}) + \mathrm{o}_{d,\mathbb{S}}(b), \quad (34)

where

g(\mathbb{S}) := \frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\left[2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})B_{d}\right]_{(\boldsymbol{i},\boldsymbol{j})}\,\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S}), \quad (35)

and where $\otimes$ denotes the Kronecker product, and $[\,\cdot\,]_{(\boldsymbol{i},\boldsymbol{j})}$ means that we select the entry $((i_{2}-1)i_{2}/2+i_{1},\,(j_{2}-1)j_{2}/2+j_{1})$ in the $(d(d+1)/2)\times(d(d+1)/2)$ matrix.
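To make the index map in Theorem 2 concrete, here is a one-line R helper (ours, purely illustrative): entry $(i_1,i_2)$ with $i_1\leq i_2$ of a symmetric matrix sits at position $(i_2-1)i_2/2+i_1$ of $\mathrm{vecp}$, consistent with the column-stacking order in (5).

```r
# Position of entry (i1, i2), i1 <= i2, in the vecp ordering (5).
vecp_index <- function(i1, i2) (i2 - 1) * i2 / 2 + i1
vecp_index(1, 1); vecp_index(1, 2); vecp_index(2, 2)  # 1, 2, 3
```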

Theorem 3 (Pointwise variance near and away from the boundary of $\mathcal{S}_{++}^{d}$).

Assume that (29) holds. Furthermore, let $\mathbb{K}\in\mathcal{S}_{++}^{d}$ be independent of $b$ and assume that it diagonalizes as $\mathbb{K} = V\,\mathrm{diag}(\boldsymbol{\lambda}(\mathbb{K}))\,V^{\top}$. Pick any subset $\emptyset\subseteq\mathcal{J}\subseteq[d]$ and assume that

\mathbb{S} = V\,\mathrm{diag}(\boldsymbol{\lambda}_{b}(\mathbb{K}))\,V^{\top}, \quad \text{where the vector $\boldsymbol{\lambda}_{b}(\mathbb{K})$ satisfies} \quad [\boldsymbol{\lambda}_{b}(\mathbb{K})]_{i} := \begin{cases} \lambda_{i}(\mathbb{K}), & \text{if } i\in[d]\backslash\mathcal{J},\\ b\,\lambda_{i}(\mathbb{K}), & \text{if } i\in\mathcal{J}. \end{cases} \quad (36)

In particular, with this choice of $\mathbb{S}$, note that $|\mathbb{S}| = b^{|\mathcal{J}|}\,|\mathbb{K}|$. Then we have, as $n\to\infty$,

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) = n^{-1} b^{-r(d)/2 - |\mathcal{J}|\frac{d+1}{2}}\cdot\psi(\mathbb{K})\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2})) + \mathcal{O}_{d,\mathbb{S}}(n^{-1}), \quad (37)

where

r(d) := \frac{d(d+1)}{2} \qquad\text{and}\qquad \psi(\mathbb{K}) := \frac{|\sqrt{\pi}\,\mathbb{K}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}. \quad (38)

The above theorem means that the pointwise variance is $\mathcal{O}_{d,\mathbb{S}}(n^{-1}b^{-r(d)/2})$ away from the boundary of $\mathcal{S}_{++}^{d}$, and it gets multiplied by a factor $b^{-(d+1)/2}$ every time one of the $d$ eigenvalues approaches zero at a linear rate with respect to $b$. If $|\mathcal{J}|$ eigenvalues approach zero as $b\to 0$, then the pointwise variance is $\mathcal{O}_{d,\mathbb{S}}(n^{-1}b^{-r(d)/2 - |\mathcal{J}|\frac{d+1}{2}})$.

By combining Theorem 2 and Proposition 1 (equivalently, Theorem 3 for $\mathcal{J}=\emptyset$), we can compute the mean squared error of our estimator and optimize the choice of the bandwidth parameter $b$.

Corollary 2 (Mean squared error).

Assume that (31) holds. Then, as $n\to\infty$ and $b = b(n)\to 0$, and for any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$, we have

\mathrm{MSE}[\hat{f}_{n,b}(\mathbb{S})] := \mathbb{E}\left[\left|\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})\right|^{2}\right] = \mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) + \left(\mathbb{B}\mathrm{ias}[\hat{f}_{n,b}(\mathbb{S})]\right)^{2} \quad (39)
\qquad = n^{-1} b^{-r(d)/2}\cdot\psi(\mathbb{S})f(\mathbb{S}) + b^{2}g^{2}(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(n^{-1}b^{-r(d)/2+1/2}) + \mathrm{o}_{d,\mathbb{S}}(b^{2}).

In particular, if $f(\mathbb{S})\cdot g(\mathbb{S}) \neq 0$, the asymptotically optimal choice of $b$, with respect to $\mathrm{MSE}$, is

b_{\mathrm{opt}}(\mathbb{S}) = n^{-2/(r(d)+4)}\left[\frac{r(d)}{4}\cdot\frac{\psi(\mathbb{S})f(\mathbb{S})}{g^{2}(\mathbb{S})}\right]^{2/(r(d)+4)}, \quad (40)

with

\mathrm{MSE}[\hat{f}_{n,b_{\mathrm{opt}}(\mathbb{S})}(\mathbb{S})] = n^{-4/(r(d)+4)}\left[\frac{1+\frac{r(d)}{4}}{\left(\frac{r(d)}{4}\right)^{\frac{r(d)}{r(d)+4}}}\right]\frac{\left(\psi(\mathbb{S})f(\mathbb{S})\right)^{4/(r(d)+4)}}{\left(g^{2}(\mathbb{S})\right)^{-r(d)/(r(d)+4)}} + \mathrm{o}_{d,\mathbb{S}}(n^{-4/(r(d)+4)}), \quad n\to\infty. \quad (41)

More generally, if $n^{2/(r(d)+4)}\,b\to\lambda$ as $n\to\infty$ and $b = b(n)\to 0$ for some $\lambda>0$, then

\mathrm{MSE}[\hat{f}_{n,b}(\mathbb{S})] = n^{-4/(r(d)+4)}\left[\lambda^{-r(d)/2}\psi(\mathbb{S})f(\mathbb{S}) + \lambda^{2}g^{2}(\mathbb{S})\right] + \mathrm{o}_{d,\mathbb{S}}(n^{-4/(r(d)+4)}). \quad (42)
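For completeness, here is the routine bias-variance trade-off behind (40) (our own spelled-out step), with the shorthand $A := \psi(\mathbb{S})f(\mathbb{S})$ and $B := g^{2}(\mathbb{S})$: keeping only the leading terms of (39), we minimize over $b$,

\phi(b) := n^{-1}b^{-r(d)/2}A + b^{2}B, \qquad \phi'(b) = -\frac{r(d)}{2}\,n^{-1}b^{-r(d)/2-1}A + 2\,b\,B = 0 \quad\Longleftrightarrow\quad b^{\,r(d)/2+2} = n^{-1}\cdot\frac{r(d)}{4}\cdot\frac{A}{B},

and solving for $b$ gives $b_{\mathrm{opt}}(\mathbb{S}) = n^{-2/(r(d)+4)}\big[\tfrac{r(d)}{4}\cdot A/B\big]^{2/(r(d)+4)}$, which is (40); substituting $b_{\mathrm{opt}}(\mathbb{S})$ back into $\phi$ yields (41).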

By integrating the MSE on the following subset of $\mathcal{S}_{++}^{d}$,

\mathcal{S}_{++}^{d}(\delta) := \left\{\mathbb{M}\in\mathcal{S}_{++}^{d} : \delta \leq \lambda_{1}(\mathbb{M}) \leq \dots \leq \lambda_{d}(\mathbb{M}) \leq \delta^{-1}\right\}, \qquad 0<\delta<1, \quad (43)

we obtain the next result.

Theorem 4 (Mean integrated squared error on $\mathcal{S}_{++}^{d}(\delta)$).

Assume that (31) holds. For a given $\delta\in(0,1)$, assume also that

\int_{\mathcal{S}_{++}^{d}(\delta)} \psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S} < \infty, \qquad \int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S} < \infty, \quad (44)

where $\psi$ and $g$ were defined in (38) and (35), respectively. Then, as $n\to\infty$ and $b = b(n)\to 0$, we have

\mathrm{MISE}_{\delta}[\hat{f}_{n,b}] := \int_{\mathcal{S}_{++}^{d}(\delta)} \mathbb{E}\left[\left|\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})\right|^{2}\right]{\rm d}\mathbb{S} \quad (45)
\qquad = n^{-1} b^{-r(d)/2}\int_{\mathcal{S}_{++}^{d}(\delta)} \psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S} + b^{2}\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S} + \mathrm{o}_{d,\delta}(n^{-1}b^{-r(d)/2}) + \mathrm{o}_{d,\delta}(b^{2}).

In particular, if $\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S} > 0$, the asymptotically optimal choice of $b$, with respect to $\mathrm{MISE}_{\delta}$, is

b_{\mathrm{opt}} = n^{-2/(r(d)+4)}\left[\frac{r(d)}{4}\cdot\frac{\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S}}{\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}}\right]^{2/(r(d)+4)}, \quad (46)

with

\mathrm{MISE}_{\delta}[\hat{f}_{n,b_{\mathrm{opt}}}] = n^{-4/(r(d)+4)}\left[\frac{1+\frac{r(d)}{4}}{\left(\frac{r(d)}{4}\right)^{\frac{r(d)}{r(d)+4}}}\right]\frac{\left(\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S}\right)^{4/(r(d)+4)}}{\left(\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}\right)^{-r(d)/(r(d)+4)}} + \mathrm{o}_{d,\delta}(n^{-4/(r(d)+4)}), \quad n\to\infty. \quad (47)

More generally, if $n^{2/(r(d)+4)}\,b\to\lambda$ as $n\to\infty$ and $b = b(n)\to 0$ for some $\lambda>0$, then

\mathrm{MISE}_{\delta}[\hat{f}_{n,b}] = n^{-4/(r(d)+4)}\left[\lambda^{-r(d)/2}\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S} + \lambda^{2}\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}\right] + \mathrm{o}_{d,\delta}(n^{-4/(r(d)+4)}). \quad (48)

A straightforward verification of the Lindeberg condition for double arrays yields the asymptotic normality.

Theorem 5 (Asymptotic normality).

Assume that (29) holds. Let $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be such that $f(\mathbb{S})>0$. If $n^{1/2}b^{r(d)/4}\to\infty$ as $n\to\infty$ and $b = b(n)\to 0$, then

n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S}) - f_{b}(\mathbb{S})) \stackrel{\mathscr{D}}{\longrightarrow} \mathcal{N}(0,\psi(\mathbb{S})f(\mathbb{S})). \quad (49)

If we also have $n^{1/2}b^{r(d)/4+1/2}\to 0$ as $n\to\infty$ and $b = b(n)\to 0$, then Theorem 2 implies

n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})) \stackrel{\mathscr{D}}{\longrightarrow} \mathcal{N}(0,\psi(\mathbb{S})f(\mathbb{S})). \quad (50)

Independently of the above rates for $n$ and $b$, if we assume (31) instead and $n^{2/(r(d)+4)}\,b\to\lambda$ as $n\to\infty$ and $b = b(n)\to 0$ for some $\lambda>0$, then Theorem 2 implies

n^{2/(r(d)+4)}(\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})) \stackrel{\mathscr{D}}{\longrightarrow} \mathcal{N}(\lambda\,g(\mathbb{S}),\,\lambda^{-r(d)/2}\psi(\mathbb{S})f(\mathbb{S})). \quad (51)
Remark 3.

The rate of convergence for the traditional $d(d+1)/2$-dimensional kernel density estimator with i.i.d. data and bandwidth $h$ is $\mathcal{O}_{d}(n^{-1/2}h^{-r(d)/2})$ in Theorem 3.1.15 of Prakasa Rao [48], whereas $\hat{f}_{n,b}$ converges at a rate of $\mathcal{O}_{d}(n^{-1/2}b^{-r(d)/4})$. Hence, the relation between the bandwidth of $\hat{f}_{n,b}$ and the bandwidth of the traditional multivariate kernel density estimator is $b\approx h^{2}$.

3.2 Total variation and other probability metrics upper bounds between the Wishart and SMN distributions

Our second application of Theorem 1 is to compute an upper bound on the total variation between the probability measures induced by (3) and (8). Given the relations between the total variation and other probability metrics such as the Hellinger distance (see, e.g., Gibbs and Su [24, p.421]), we obtain several other upper bounds automatically. For the uninitiated reader, the utility of having total variation or Hellinger distance bounds between two measures is discussed by Pollard [47].

Theorem 6.

Let $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given. Let $\mathbb{Q}_{\nu,\mathbb{S}}$ be the law of the $\mathrm{SMN}_{d\times d}(\nu\,\mathbb{S},\, B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d})$ distribution defined in (8), and let $\mathbb{P}_{\nu,\mathbb{S}}$ be the law of the $\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ distribution defined in (3). Then, as $\nu\to\infty$, we have

\mathrm{dist}(\mathbb{P}_{\nu,\mathbb{S}},\mathbb{Q}_{\nu,\mathbb{S}}) \leq \frac{C\,d^{3/2}}{\sqrt{\nu}} \qquad\text{and}\qquad \mathcal{H}(\mathbb{P}_{\nu,\mathbb{S}},\mathbb{Q}_{\nu,\mathbb{S}}) \leq \sqrt{\frac{2\,C\,d^{3/2}}{\sqrt{\nu}}}, \quad (52)

where $C>0$ is a universal constant, $\mathcal{H}(\cdot,\cdot)$ denotes the Hellinger distance, and $\mathrm{dist}(\cdot,\cdot)$ can be replaced by any of the following probability metrics: Total variation, Kolmogorov (or Uniform) metric, Lévy metric, Discrepancy metric, Prokhorov metric.

4 Proofs

Proof of Theorem 1.

First, note that

\mathbb{S}^{-1/2}\,\mathbb{X}\,\mathbb{S}^{-1/2} = \nu\,(\nu\,\mathbb{S})^{-1/2}(\nu\,\mathbb{S} + \mathbb{X} - \nu\,\mathbb{S})(\nu\,\mathbb{S})^{-1/2} = \nu\,(\mathrm{I}_{d} + \sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}), \quad (53)

so we can rewrite (3) as

K_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{|\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}|^{\nu/2-(d+1)/2}}{2^{d/2}\,\pi^{d(d+1)/4}\,|\sqrt{2\nu}\,\mathbb{S}|^{(d+1)/2}} \cdot \frac{\exp\left(-\frac{\nu}{2}\mathrm{tr}(\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}})\right)\exp(\frac{\nu d}{2})}{\prod_{i=1}^{d}\frac{(\nu-(i+1))^{(\nu-i)/2}}{e^{-(i+1)/2}\,\nu^{(\nu-i)/2}} \cdot \prod_{i=1}^{d}\frac{\Gamma(\frac{1}{2}(\nu-(i+1))+1)}{\sqrt{2\pi}\,e^{-\frac{1}{2}(\nu-(i+1))}\,[\frac{1}{2}(\nu-(i+1))]^{(\nu-i)/2}}}. \quad (54)

Using the Taylor expansion

\log(1-y) = -y - \frac{y^{2}}{2} - \frac{y^{3}}{3} - \frac{y^{4}}{4} + \mathcal{O}(y^{5}), \quad |y|<1, \quad (55)

and Stirling's formula,

\log\Gamma(z+1) = \frac{1}{2}\log(2\pi) + (z+\tfrac{1}{2})\log z - z + \frac{1}{12z} + \mathcal{O}(z^{-3}), \quad z>0, \quad (56)

see, e.g., Abramowitz and Stegun [1, p.257], we have

\log\left(\prod_{i=1}^{d}\frac{(\nu-(i+1))^{(\nu-i)/2}}{e^{-(i+1)/2}\,\nu^{(\nu-i)/2}}\right) = \sum_{i=1}^{d}\frac{i+1}{2} + \sum_{i=1}^{d}\frac{\nu-i}{2}\log\left(1-\frac{i+1}{\nu}\right)
\qquad = \sum_{i=1}^{d}\frac{i+1}{2} - \sum_{i=1}^{d}\left\{\frac{(\nu-i)(i+1)}{2\nu} + \frac{(\nu-i)(i+1)^{2}}{4\nu^{2}} + \mathcal{O}_{d}(\nu^{-2})\right\}
\qquad = \sum_{i=1}^{d}\left\{\nu^{-1}\cdot\frac{i^{2}-1}{4} + \mathcal{O}_{d}(\nu^{-2})\right\} = \nu^{-1}\cdot\frac{d\,(2d^{2}+3d-5)}{24} + \mathcal{O}_{d}(\nu^{-2}), \quad (57)

and

\log\left(\prod_{i=1}^{d}\frac{\Gamma(\frac{1}{2}(\nu-(i+1))+1)}{\sqrt{2\pi}\,e^{-\frac{1}{2}(\nu-(i+1))}\,[\frac{1}{2}(\nu-(i+1))]^{(\nu-i)/2}}\right) = \sum_{i=1}^{d}\left\{\frac{1}{6(\nu-(i+1))} + \mathcal{O}_{d}(\nu^{-3})\right\} = \nu^{-1}\cdot\frac{d}{6} + \mathcal{O}_{d}(\nu^{-2}). \quad (58)

By taking the logarithm in (54) and using the expressions found in (57) and (58), we obtain (also using the fact that $\lambda_{i}(\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}) = 1+\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})$ for all $1\leq i\leq d$):

\log K_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{1}{2}(\nu-(d+1))\sum_{i=1}^{d}\log\left(1+\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right) - \frac{1}{2}\log\left(2^{d}\,\pi^{d(d+1)/2}\,|\sqrt{2\nu}\,\mathbb{S}|^{d+1}\right) - \frac{\nu}{2}\sum_{i=1}^{d}\sqrt{\frac{2}{\nu}}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}}) \quad (59)
\qquad - \nu^{-1}\cdot\left\{\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right\} + \mathcal{O}_{d}(\nu^{-2}).

By the Taylor expansion in (55) and the fact that $\sum_{i=1}^{d}\lambda_{i}^{k}(\Delta_{\nu,\mathbb{S}}) = \mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{k})$ for all $k\in\mathbb{N}$, we have, uniformly for $\mathbb{X}\in B_{\nu,\mathbb{S}}(\eta)$,

\sum_{i=1}^{d}\log\left(1+\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right) = \sqrt{\frac{2}{\nu}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) - \frac{1}{\nu}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) + \frac{2\sqrt{2}}{3\nu^{3/2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{1}{\nu^{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \mathcal{O}_{d,\eta}\left(\frac{\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{5/2}}\right). \quad (60)

Therefore,

logKν,𝕊(𝕏)\displaystyle\log K_{\nu,\mathbb{S}}(\mathbb{X}) =12log(2dπd(d+1)/2|2ν𝕊|d+1)12tr(Δν,𝕊2)+ν1/2{23tr(Δν,𝕊3)d+12tr(Δν,𝕊)}\displaystyle=-\frac{1}{2}\log\left(2^{d}\pi^{\hskip 0.85358ptd(d+1)/2}|\sqrt{2\nu}\,\mathbb{S}|^{d+1}\right)-\frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})+\nu^{-1/2}\cdot\Bigg{\{}\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})-\frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\Bigg{\}} (61)
+ν1{12tr(Δν,𝕊4)+d+12tr(Δν,𝕊2)(d(2d2+3d5)24+d6)}+𝒪d,η(1+max1id|λi(Δν,𝕊)|5ν3/2).\displaystyle\quad+\nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})+\frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})-\left(\frac{d\,(2d^{\hskip 0.85358pt2}+3d-5)}{24}+\frac{d}{6}\right)\right\}+\mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right).

With the expression for the symmetric matrix-variate normal density in (8), we can rewrite the above as

\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right)
= \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})-\frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\}
 + \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})+\frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})-\left(\frac{d\,(2d^{2}+3d-5)}{24}+\frac{d}{6}\right)\right\}
 + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right), \quad (62)

which proves (13). To obtain (14) and conclude the proof, we take the exponential on both sides of the last equation and we expand the right-hand side with

e^{y}=1+y+\frac{y^{2}}{2}+\mathcal{O}(e^{\widetilde{\eta}}\,y^{3}),\quad\text{for }-\infty<y\leq\widetilde{\eta}. \quad (63)

For $\nu$ large enough and uniformly for $\mathbb{X}\in B_{\nu,\mathbb{S}}(\eta)$, the right-hand side of (62) is $\mathcal{O}_{d}(1)$, so we get

\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}
= 1 + \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})-\frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\}
 + \nu^{-1}\cdot\left\{\frac{1}{9}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\right)^{2}-\frac{d+1}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})+\frac{(d+1)^{2}}{4}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right)^{2}\right.
 \left.\quad -\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})+\frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})-\left(\frac{d\,(2d^{2}+3d-5)}{24}+\frac{d}{6}\right)\right\}
 + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{9}}{\nu^{3/2}}\right). \quad (64)

This ends the proof. ∎
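Before moving on, note that the leading correction in (62) is easy to probe numerically. The sketch below (a Python illustration assuming scipy; the choice $\mathbb{S}=\mathrm{I}_{d}$, under which $\Delta_{\nu,\mathbb{S}}=(\mathbb{X}-\nu\,\mathrm{I}_{d})/\sqrt{2\nu}$, is only for convenience) compares the exact log-ratio $\log(K_{\nu,\mathbb{S}}/g_{\nu,\mathbb{S}})$ with its $\nu^{-1/2}$ term.

# Illustrative check of the expansion (62) at S = I_d; not part of the proof.
import numpy as np
from scipy.stats import wishart

d, nu = 3, 2000
S = np.eye(d)
X = wishart.rvs(df=nu, scale=S, random_state=np.random.default_rng(0))

Delta = (X - nu * S) / np.sqrt(2 * nu)
tr = lambda k: np.trace(np.linalg.matrix_power(Delta, k))

# log of the SMN density g_{nu,S} with matching mean and covariances, cf. (61)
log_g = (-0.5 * (d * np.log(2) + d * (d + 1) / 2 * np.log(np.pi)
                 + (d + 1) * d * np.log(np.sqrt(2 * nu)))
         - 0.5 * tr(2))
log_K = wishart.logpdf(X, df=nu, scale=S)

leading = (np.sqrt(2) / 3 * tr(3) - (d + 1) / np.sqrt(2) * tr(1)) / np.sqrt(nu)
print(log_K - log_g, leading)  # the two numbers agree up to O(1/nu)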

Proof of Theorem 2.

Assume that (31) holds, and let

\mathbb{W}_{\mathbb{S}}\vcentcolon=(\mathbb{W}_{\boldsymbol{i}})_{\boldsymbol{i}\in[d]^{2}}\sim\mathrm{Wishart}_{d}(1/b,\,b\,\mathbb{S}),\quad\mathbb{S}\in\mathcal{S}_{++}^{d},~b>0. \quad (65)

(Recall the notation $[d]\vcentcolon=\{1,\dots,d\}$.) By a second-order mean value theorem, we have

f(\mathbb{W}_{\mathbb{S}})-f(\mathbb{S})
= \sum_{\substack{\boldsymbol{i}\in[d]^{2}\\ i_{1}\leq i_{2}}}(\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}})\frac{\partial}{\partial\mathbb{S}_{\boldsymbol{i}}}f(\mathbb{S})+\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}(\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}})(\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}})\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})
 \quad+\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}(\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}})(\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}})\left(\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{M}_{\mathbb{S}})-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right), \quad (66)

for some random matrix $\mathbb{M}_{\mathbb{S}}\in\mathcal{S}_{++}^{d}$ on the line segment joining $\mathbb{W}_{\mathbb{S}}$ and $\mathbb{S}$ in $\mathcal{S}_{++}^{d}$. (The mean value theorem is applicable because the subspace $\mathcal{S}_{++}^{d}$ is open and convex in the space of symmetric matrices of size $d\times d$; recall Remark 2.) If we take the expectation in the last equation, and then use the estimates in (6) and (7), we get

\left|f_{b}(\mathbb{S})-f(\mathbb{S})-\frac{b}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\left[2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})B_{d}\right]_{(\boldsymbol{i},\boldsymbol{j})}\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|
\leq\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}||\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}}|\cdot\left|\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{M}_{\mathbb{S}})-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|\cdot\mathds{1}_{\{\|\mathrm{vec}(\mathbb{W}_{\mathbb{S}}-\mathbb{S})\|_{1}\leq\delta_{\varepsilon,d}\}}\right]
\quad+\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}||\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}}|\cdot\left|\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{M}_{\mathbb{S}})-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|\cdot\mathds{1}_{\{\|\mathrm{vec}(\mathbb{W}_{\mathbb{S}}-\mathbb{S})\|_{1}>\delta_{\varepsilon,d}\}}\right]
=\vcentcolon\Delta_{1}+\Delta_{2}, \quad (67)

where for any given $\varepsilon>0$, the real number $\delta_{\varepsilon,d}\in(0,1]$ is such that

\|\mathrm{vec}(\mathbb{S}'-\mathbb{S})\|_{1}\leq\delta_{\varepsilon,d}\quad\text{implies}\quad\left|\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S}')-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|<\varepsilon, \quad (68)

uniformly for $\mathbb{S},\mathbb{S}'\in\mathcal{S}_{++}^{d}$. (We know that such a number exists because the second-order partial derivatives of $f$ are assumed to be uniformly continuous on $\mathcal{S}_{++}^{d}$.) Equations (68) and (7) then yield, together with the Cauchy-Schwarz inequality,

\Delta_{1}\leq\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\varepsilon\cdot\sqrt{\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}|^{2}\right]}\sqrt{\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}}|^{2}\right]}=\varepsilon\cdot\mathcal{O}_{d,\mathbb{S}}(b). \quad (69)

The second-order partial derivatives of $f$ are also assumed to be bounded, say by some constant $M_{d}>0$. Furthermore, $\{\|\mathrm{vec}(\mathbb{W}_{\mathbb{S}}-\mathbb{S})\|_{1}>\delta_{\varepsilon,d}\}$ implies that at least one component of $(\mathbb{W}_{\boldsymbol{k}}-\mathbb{S}_{\boldsymbol{k}})_{\boldsymbol{k}\in[d]^{2}}$ exceeds $\delta_{\varepsilon,d}/d^{2}$ in absolute value, so a union bound over $\boldsymbol{k}$ followed by $d^{2}$ concentration bounds for the marginals of the Wishart distribution (the diagonal entries of a Wishart random matrix are chi-square distributed, while the off-diagonal entries are variance-gamma distributed) yields

\Delta_{2}\leq\mathcal{O}_{d,\mathbb{S}}\left(\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}2M_{d}\cdot\sqrt{\sum_{\boldsymbol{k}\in[d]^{2}}\mathbb{P}\left(|\mathbb{W}_{\boldsymbol{k}}-\mathbb{S}_{\boldsymbol{k}}|\geq\delta_{\varepsilon,d}/d^{2}\right)}\right)\leq\mathcal{O}_{d,\mathbb{S}}\left(d^{5}\,M_{d}\cdot\sqrt{2\exp\left(-\frac{(\delta_{\varepsilon,d}/d^{2})^{2}}{2\,b\,c_{d,\mathbb{S}}}\right)}\right), \quad (70)

where $c_{d,\mathbb{S}}>0$ is a large enough constant that depends only on $d$ and $\mathbb{S}$. If we choose a sequence $\varepsilon=\varepsilon(b)$ that goes to $0$ as $b\to 0$ slowly enough that $1\geq\delta_{\varepsilon,d}>d^{2}\,[100\,b\,c_{d,\mathbb{S}}\,|\log b|]^{1/2}$, for example, then $\Delta_{1}+\Delta_{2}$ in (67) is $\mathrm{o}_{d,\mathbb{S}}(b)$ by (69) and (70). This ends the proof. ∎
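As a side remark, the $\mathcal{O}(b)$ bias mechanism above is easy to visualize by simulation. The sketch below (a Python illustration assuming scipy; the toy choice $f(\mathbb{M})=\exp(-\mathrm{tr}(\mathbb{M}))$ and $\mathbb{S}=\mathrm{I}_{2}$ are hypothetical, chosen only because this $f$ is smooth with bounded derivatives) estimates $f_{b}(\mathbb{S})-f(\mathbb{S})$ and shows that it scales linearly in $b$.

# Monte Carlo illustration that E[f(W_S)] - f(S) = O(b) for smooth f,
# with W_S ~ Wishart_d(1/b, b S) as in (65). Illustrative only.
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(1)
S, n_mc = np.eye(2), 200_000
f = lambda M: np.exp(-np.trace(M))  # toy smooth function (hypothetical)

for b in [0.02, 0.01, 0.005]:
    W = wishart.rvs(df=1 / b, scale=b * S, size=n_mc, random_state=rng)
    bias = np.mean([f(M) for M in W]) - f(S)
    print(b, bias / b)  # the ratio roughly stabilizes as b decreases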

Proof of Theorem 3.

Assume that (29) holds. First, note that we can write

\hat{f}_{n,b}(\mathbb{S})-f_{b}(\mathbb{S})=\frac{1}{n}\sum_{i=1}^{n}Y_{i,b}(\mathbb{S}), \quad (71)

where the random variables

Y_{i,b}(\mathbb{S})\vcentcolon=K_{1/b,\,b\,\mathbb{S}}(\mathbb{X}_{i})-f_{b}(\mathbb{S}),\quad 1\leq i\leq n,\quad\text{are i.i.d.} \quad (72)
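For concreteness, the estimator $\hat{f}_{n,b}$ underlying (71) and (72) is just an average of Wishart densities with degrees of freedom $1/b$ and scale $b\,\mathbb{S}$ evaluated at the data. A minimal sketch (a Python illustration assuming scipy; the sampling density in the usage example is hypothetical) follows.

# Sketch of the Wishart asymmetric kernel density estimator; illustrative only.
import numpy as np
from scipy.stats import wishart

def f_hat(S, data, b):
    # Average of the Wishart_d(1/b, b S) density over the SPD data matrices
    return np.mean([wishart.pdf(X, df=1.0 / b, scale=b * S) for X in data])

# Usage: data drawn from a (hypothetical) Wishart_2(5, I/5) population,
# estimate evaluated at S = I_2.
rng = np.random.default_rng(2)
data = wishart.rvs(df=5, scale=np.eye(2) / 5, size=500, random_state=rng)
print(f_hat(np.eye(2), data, b=0.05))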

Hence, if $\widetilde{\mathbb{W}}_{\mathbb{S}}\sim\mathrm{Wishart}_{d}(2/b-(d+1),\,b\,\mathbb{S}/2)$, then

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S}))
= n^{-1}\,\mathbb{E}\left[K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})^{2}\right]-n^{-1}\left(f_{b}(\mathbb{S})\right)^{2}=n^{-1}A_{b}(\mathbb{S})\,\mathbb{E}[f(\widetilde{\mathbb{W}}_{\mathbb{S}})]-\mathcal{O}_{d,\mathbb{S}}(n^{-1}) \quad (73)
= n^{-1}A_{b}(\mathbb{S})\,(f(\mathbb{S})+\mathcal{O}_{d,\mathbb{S}}(b^{1/2}))-\mathcal{O}_{d,\mathbb{S}}(n^{-1}), \quad (74)

where

A_{b}(\mathbb{S})\vcentcolon=|2b\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}\,\pi^{\frac{d}{2}}\prod_{i=1}^{d}\frac{\Gamma\left(\frac{1}{b}-\frac{d+i}{2}\right)}{2^{\frac{1}{b}-i}\,\Gamma^{2}\left(\frac{1}{2b}-\frac{i+1}{2}+1\right)}, \quad (75)

and where the last line (74) follows from the Lipschitz continuity of $f$, the Cauchy-Schwarz inequality, and the analogue of (7) for $\widetilde{\mathbb{W}}_{\mathbb{S}}=(\widetilde{\mathbb{W}}_{\boldsymbol{i}})_{\boldsymbol{i}\in[d]^{2}}$:

\mathbb{E}[f(\widetilde{\mathbb{W}}_{\mathbb{S}})]-f(\mathbb{S})=\sum_{\substack{\boldsymbol{i}\in[d]^{2}\\ i_{1}\leq i_{2}}}\mathcal{O}\left(\mathbb{E}\left[|\widetilde{\mathbb{W}}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}|\right]\right)\leq\sum_{\substack{\boldsymbol{i}\in[d]^{2}\\ i_{1}\leq i_{2}}}\mathcal{O}\left(\sqrt{\mathbb{E}\left[|\widetilde{\mathbb{W}}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}|^{2}\right]}\right)=\mathcal{O}_{d,\mathbb{S}}(b^{1/2}). \quad (76)

Now, by Stirling’s formula,

\frac{\sqrt{2\pi}\,e^{-x}\,x^{x+1/2}}{\Gamma(x+1)}\longrightarrow 1,\quad x\to\infty. \quad (77)

Therefore,

A_{b}(\mathbb{S})
= |2b\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}\,\pi^{\frac{d}{2}}\cdot\frac{1}{(2\pi)^{d/2}}\,\prod_{i=1}^{d}\frac{e^{(d-i)/2}\left(\frac{1}{b}-\frac{d+i+2}{2}\right)^{\frac{1}{b}-\frac{d+i+1}{2}}}{2^{\frac{1}{b}-i}\left(\frac{1}{2b}-\frac{i+1}{2}\right)^{\frac{1}{b}-i}}\cdot(1+\mathcal{O}(b))
= |2b\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}\,\pi^{\frac{d}{2}}\cdot\frac{b^{\frac{d(d+1)}{4}}}{(2\pi)^{d/2}}\cdot\prod_{i=1}^{d}e^{(d-i)/2}\left(1-\frac{(d-i)/2}{\frac{1}{b}-(i+1)}\right)^{1/b}\cdot(1+\mathcal{O}(b))
= \frac{|\sqrt{b\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}\cdot(1+\mathcal{O}_{d}(b)), \quad (78)

where the last equality follows from the fact that $e^{x}(1-\frac{x}{n})^{n}\to 1$ as $n\to\infty$, for all $x\in\mathbb{R}$. By our assumption on $\mathbb{S}$, note that $|\mathbb{S}|=b^{|\mathcal{J}|}\,|\mathbb{K}|$, so we get the general expression in (37) by combining (73) and (78). This ends the proof. ∎
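The asymptotics in (78) can also be checked directly against the exact expression (75). Below is a short sketch (a Python illustration assuming scipy, and taking $\mathbb{S}=\mathrm{I}_{d}$ for simplicity) that compares the two on the log scale.

# Numerical check (illustrative) of the asymptotics (78) for A_b(I_d).
import numpy as np
from scipy.special import gammaln

def log_A(b, d):
    i = np.arange(1, d + 1)
    # log of (75) with S = I_d, computed via gammaln to avoid overflow
    return (-(d + 1) / 2 * d * np.log(2 * b * np.sqrt(np.pi))
            + d / 2 * np.log(np.pi)
            + np.sum(gammaln(1 / b - (d + i) / 2)
                     - (1 / b - i) * np.log(2)
                     - 2 * gammaln(1 / (2 * b) - (i + 1) / 2 + 1)))

d = 3
for b in [1e-2, 1e-3, 1e-4]:
    log_approx = -(d + 1) / 4 * d * np.log(b * np.pi) - d * (d + 2) / 2 * np.log(2)
    print(b, log_A(b, d) - log_approx)  # difference shrinks like O(b)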

Proof of Theorem 4.

Assume that (31) holds. By Theorem 2, Theorem 3 with $\mathcal{J}=\emptyset$, our assumptions in (44), and the dominated convergence theorem, it is possible to show that

\mathrm{MISE}_{\delta}[\hat{f}_{n,b}]
= \int_{\mathcal{S}_{++}^{d}(\delta)}\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S}))\,{\rm d}\mathbb{S}+\int_{\mathcal{S}_{++}^{d}(\delta)}\mathbb{B}\mathrm{ias}[\hat{f}_{n,b}(\mathbb{S})]^{2}\,{\rm d}\mathbb{S}
= n^{-1}b^{-r(d)/2}\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S}+b^{2}\int_{\mathcal{S}_{++}^{d}(\delta)}g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}+\mathrm{o}_{d,\delta}(n^{-1}b^{-r(d)/2})+\mathrm{o}_{d,\delta}(b^{2}). \quad (79)

This ends the proof. ∎

Proof of Theorem 5.

Assume that (31) holds. By (71), the asymptotic normality of $n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S})-f_{b}(\mathbb{S}))$ will be proved if we verify the following Lindeberg condition for double arrays (see, e.g., Section 1.9.3 in [51]): For every $\varepsilon>0$,

s_{b}^{-2}\,\mathbb{E}\left[|Y_{1,b}(\mathbb{S})|^{2}\,\mathds{1}_{\{|Y_{1,b}(\mathbb{S})|>\varepsilon n^{1/2}s_{b}\}}\right]\longrightarrow 0,\quad n\to\infty, \quad (80)

where $s_{b}^{2}\vcentcolon=\mathbb{E}\left[|Y_{1,b}(\mathbb{S})|^{2}\right]$ and $b=b(n)\to 0$. From Lemma 3 with $\nu=1/b$ and $\mathbb{M}=b\,\mathbb{S}$, we know that

|Y_{1,b}(\mathbb{S})|=\mathcal{O}\left(\psi(\mathbb{S})\,b^{-r(d)/2}\right)=\mathcal{O}_{d,\mathbb{S}}(b^{-r(d)/2}), \quad (81)

and we also know that $s_{b}=b^{-r(d)/4}\sqrt{\psi(\mathbb{S})f(\mathbb{S})}\,(1+\mathrm{o}_{d,\mathbb{S}}(1))$ when $f$ is Lipschitz continuous and bounded, by the proof of Theorem 3. Therefore, whenever $n^{1/2}b^{r(d)/4}\to\infty$ as $n\to\infty$ (and $b\to 0$), we have

\frac{|Y_{1,b}(\mathbb{S})|}{n^{1/2}s_{b}}=\mathcal{O}_{d,\mathbb{S}}(n^{-1/2}\,b^{r(d)/4}\,b^{-r(d)/2})=\mathcal{O}_{d,\mathbb{S}}(n^{-1/2}b^{-r(d)/4})\longrightarrow 0. \quad (82)

Under this condition, Equation (80) holds (since for any given $\varepsilon>0$, the indicator function is equal to $0$ for $n$ large enough, independently of $\omega$) and thus

n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S})-f_{b}(\mathbb{S}))=n^{1/2}b^{r(d)/4}\cdot\frac{1}{n}\sum_{i=1}^{n}Y_{i,b}(\mathbb{S})\stackrel{\mathscr{D}}{\longrightarrow}\mathcal{N}(0,\psi(\mathbb{S})f(\mathbb{S})). \quad (83)

This ends the proof. ∎
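This central limit behaviour can be eyeballed by simulation: replicate the estimator many times, standardize each replication empirically, and inspect the first moments of the resulting scores. The sketch below (a Python illustration assuming scipy; the sampling density and all tuning constants are hypothetical) does exactly this.

# Illustrative simulation of the CLT in Theorem 5 at S = I_2; not a proof.
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(3)
b, n, reps = 0.05, 400, 300
S = np.eye(2)

means, stds = [], []
for _ in range(reps):
    data = wishart.rvs(df=5, scale=np.eye(2) / 5, size=n, random_state=rng)
    vals = np.array([wishart.pdf(X, df=1 / b, scale=b * S) for X in data])
    means.append(vals.mean())
    stds.append(vals.std(ddof=1))

means, stds = np.array(means), np.array(stds)
z = np.sqrt(n) * (means - means.mean()) / stds
print(z.mean(), z.std(ddof=1))  # approximately 0 and 1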

Proof of Theorem 6.

By the comparison of the total variation norm $\|\cdot\|$ with the Hellinger distance on page 726 of Carter [13], we already know that

\|\mathbb{P}_{\nu,\mathbb{S}}-\mathbb{Q}_{\nu,\mathbb{S}}\|\leq\sqrt{2\,\mathbb{P}\left(\mathbb{X}\in B_{\nu,\mathbb{S}}^{c}(1/2)\right)+\mathbb{E}\left[\log\left(\frac{{\rm d}\mathbb{P}_{\nu,\mathbb{S}}}{{\rm d}\mathbb{Q}_{\nu,\mathbb{S}}}(\mathbb{X})\right)\,\mathds{1}_{\{\mathbb{X}\in B_{\nu,\mathbb{S}}(1/2)\}}\right]}. \quad (84)

Then, by applying a union bound followed by large deviation bounds on the eigenvalues of the Wishart matrix, we get, for $\nu$ large enough,

\mathbb{P}(\mathbb{X}\in B_{\nu,\mathbb{S}}^{c}(1/2))\leq\sum_{i=1}^{d}\mathbb{P}\left(|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|>\frac{\nu^{1/6}}{2\sqrt{2}}\right)\leq d\cdot 2\,\exp\left(-\frac{\nu^{1/3}}{100}\right). \quad (85)

By Theorem 1, we have

\mathbb{E}\left[\log\left(\frac{{\rm d}\mathbb{P}_{\nu,\mathbb{S}}}{{\rm d}\mathbb{Q}_{\nu,\mathbb{S}}}(\mathbb{X})\right)\,\mathds{1}_{\{\mathbb{X}\in B_{\nu,\mathbb{S}}(1/2)\}}\right]
= \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\cdot\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\right]-\frac{d+1}{\sqrt{2}}\cdot\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right]\right\}
 + \nu^{-1/2}\cdot\mathcal{O}\left(\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in B_{\nu,\mathbb{S}}^{c}(1/2)\}}\right]\right|+d\cdot\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in B_{\nu,\mathbb{S}}^{c}(1/2)\}}\right]\right|\right)
 + \nu^{-1}\cdot\mathcal{O}\left(\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})\right]\right|+d\cdot\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right]\right|+d^{3}\right). \quad (86)

On the right-hand side, the first and third lines are estimated using Lemma 1, and the second line is bounded using Lemma 2. We find

\mathbb{E}\left[\log\left(\frac{{\rm d}\mathbb{P}_{\nu,\mathbb{S}}}{{\rm d}\mathbb{Q}_{\nu,\mathbb{S}}}(\mathbb{X})\right)\,\mathds{1}_{\{\mathbb{X}\in B_{\nu,\mathbb{S}}(1/2)\}}\right]=\mathcal{O}(\nu^{-1}d^{3}). \quad (87)

Putting (85) and (87) together in (84) gives the conclusion. ∎
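To get a feel for the rate in Theorem 6, note that the total variation distance is proportional, up to the normalization convention used for $\|\cdot\|$, to $\mathbb{E}_{\mathbb{P}}|1-\frac{{\rm d}\mathbb{Q}}{{\rm d}\mathbb{P}}|$, which can be estimated by Monte Carlo. The sketch below (a Python illustration assuming scipy and taking $\mathbb{S}=\mathrm{I}_{d}$) suggests a decay on the order of $\nu^{-1/2}$, consistent with the bound obtained from (84), (85), and (87).

# Monte Carlo proxy (illustrative) for the total variation distance between
# P = Wishart_d(nu, I_d) and the matching SMN law Q; not part of the proof.
import numpy as np
from scipy.stats import wishart

def tv_proxy(nu, d, n_mc=20_000, seed=0):
    rng = np.random.default_rng(seed)
    X = wishart.rvs(df=nu, scale=np.eye(d), size=n_mc, random_state=rng)
    log_K = np.array([wishart.logpdf(x, df=nu, scale=np.eye(d)) for x in X])
    Delta = (X - nu * np.eye(d)) / np.sqrt(2 * nu)
    tr2 = np.einsum('nij,nji->n', Delta, Delta)  # tr(Delta^2) per sample
    log_g = (-0.5 * (d * np.log(2) + d * (d + 1) / 2 * np.log(np.pi)
                     + (d + 1) * d * np.log(np.sqrt(2 * nu))) - 0.5 * tr2)
    return np.mean(np.abs(1.0 - np.exp(log_g - log_K)))

for nu in [100, 400, 1600]:
    print(nu, tv_proxy(nu, d=2) * np.sqrt(nu))  # roughly constant in nu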

Appendix A Technical computations

Below, we compute the expectations of the trace of powers (up to $4$) of a normalized Wishart matrix. The lemma is used to estimate some trace moments and the $\asymp\nu^{-1}$ errors in (86) of the proof of Theorem 6, and also as a preliminary result for the proof of Lemma 2.

Lemma 1.

Let $\nu>d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given. If $\mathbb{X}\sim\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ according to (3), then

\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}})\right]=0,\qquad\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{2})\right]=\frac{d(d+1)}{2},\qquad\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{3})\right]=\nu^{-1/2}\cdot\frac{d(d^{2}+3d+4)}{2\sqrt{2}}, \quad (88)
\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{4})\right]=\frac{d(2d^{2}+5d+5)}{4}+\nu^{-1}\cdot\frac{d(d^{3}+6d^{2}+21d+20)}{4}, \quad (89)

where we recall that $\Delta_{\nu,\mathbb{S}}\vcentcolon=(\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\mathbb{X}-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}$.

Proof of Lemma 1.

Let $\mathbb{Y}\vcentcolon=\mathbb{S}^{-1/2}\,\mathbb{X}\,\mathbb{S}^{-1/2}\sim\mathrm{Wishart}_{d}(\nu,\mathrm{I}_{d})$. It was shown by Letac and Massam [38, pp. 308-310] (another source could be de Waal and Nel [57, p. 66], or Lu and Richards [40, Theorem 3.2], although the latter is less explicit) that

\mathbb{E}[\mathbb{Y}] = \nu\,\mathrm{I}_{d}, \quad (90)
\mathbb{E}[\mathbb{Y}^{2}] = \nu\,\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d})+(\nu^{2}+\nu)\,\mathrm{I}_{d}^{2}=\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}, \quad (91)
\mathbb{E}[\mathbb{Y}^{3}] = \nu\,\mathrm{I}_{d}\,(\mathrm{tr}(\mathrm{I}_{d}))^{2}+(\nu^{2}+\nu)\left(\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d}^{2})+2\,\mathrm{I}_{d}^{2}\,\mathrm{tr}(\mathrm{I}_{d})\right)+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}^{3}
 = \nu d^{2}\,\mathrm{I}_{d}+3(\nu^{2}+\nu)d\,\mathrm{I}_{d}+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}, \quad (92)
\mathbb{E}[\mathbb{Y}^{4}] = \nu\,\mathrm{I}_{d}\,(\mathrm{tr}(\mathrm{I}_{d}))^{3}+3(\nu^{2}+\nu)\left(\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d})\,\mathrm{tr}(\mathrm{I}_{d}^{2})+\mathrm{I}_{d}^{2}\,(\mathrm{tr}(\mathrm{I}_{d}))^{2}\right)+(\nu^{3}+3\nu^{2}+4\nu)\left(\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d}^{3})+3\,\mathrm{I}_{d}^{3}\,\mathrm{tr}(\mathrm{I}_{d})\right)
 \quad + (2\nu^{3}+5\nu^{2}+5\nu)\,\mathrm{I}_{d}^{2}\,\mathrm{tr}(\mathrm{I}_{d}^{2})+(\nu^{4}+6\nu^{3}+21\nu^{2}+20\nu)\,\mathrm{I}_{d}^{4}
 = \nu d^{3}\,\mathrm{I}_{d}+6(\nu^{2}+\nu)d^{2}\,\mathrm{I}_{d}+(6\nu^{3}+17\nu^{2}+21\nu)d\,\mathrm{I}_{d}+(\nu^{4}+6\nu^{3}+21\nu^{2}+20\nu)\,\mathrm{I}_{d}, \quad (93)

from which we deduce the following:

\mathbb{E}[\mathbb{Y}-\nu\,\mathrm{I}_{d}] = 0, \quad (94)
\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{2}] = \mathbb{E}[\mathbb{Y}^{2}]-(\nu\,\mathrm{I}_{d})^{2}=\left\{\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}\right\}-\nu^{2}\,\mathrm{I}_{d}=\nu\,(d+1)\,\mathrm{I}_{d}, \quad (95)
\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{3}] = \mathbb{E}[\mathbb{Y}^{3}]-3\nu\,\mathbb{E}[\mathbb{Y}^{2}]+2(\nu\,\mathrm{I}_{d})^{3}
 = \left\{\nu d^{2}\,\mathrm{I}_{d}+3(\nu^{2}+\nu)d\,\mathrm{I}_{d}+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}\right\}-3\nu\,\mathrm{I}_{d}\left\{\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}\right\}+2\nu^{3}\,\mathrm{I}_{d}^{3}
 = \nu\,(d^{2}+3d+4)\,\mathrm{I}_{d}, \quad (96)
\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{4}] = \mathbb{E}[\mathbb{Y}^{4}]-4(\nu\,\mathrm{I}_{d})\,\mathbb{E}[\mathbb{Y}^{3}]+6(\nu\,\mathrm{I}_{d})^{2}\,\mathbb{E}[\mathbb{Y}^{2}]-3(\nu\,\mathrm{I}_{d})^{4}
 = \nu d^{3}\,\mathrm{I}_{d}+6(\nu^{2}+\nu)d^{2}\,\mathrm{I}_{d}+(6\nu^{3}+17\nu^{2}+21\nu)d\,\mathrm{I}_{d}+(\nu^{4}+6\nu^{3}+21\nu^{2}+20\nu)\,\mathrm{I}_{d}
 \quad -4\nu\,\mathrm{I}_{d}\left\{\nu d^{2}\,\mathrm{I}_{d}+3(\nu^{2}+\nu)d\,\mathrm{I}_{d}+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}\right\}+6\nu^{2}\,\mathrm{I}_{d}^{2}\left\{\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}\right\}-3\nu^{4}\,\mathrm{I}_{d}^{4}
 = \nu^{2}\,(2d^{2}+5d+5)\,\mathrm{I}_{d}+\nu\,(d^{3}+6d^{2}+21d+20)\,\mathrm{I}_{d}. \quad (97)

By the linearity of expectations, we have

\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{k})\right]=(2\nu)^{-k/2}\,\mathrm{tr}\left(\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{k}]\right),\quad\text{for any }k\in\mathbb{N}. \quad (98)

The conclusion follows. ∎
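The closed forms in (88) and (89) can be verified by simulation. The sketch below (a Python illustration assuming scipy) compares Monte Carlo averages of $\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{k})$ with the stated formulas.

# Monte Carlo verification (illustrative) of the trace moments in Lemma 1.
import numpy as np
from scipy.stats import wishart

d, nu, n_mc = 3, 50, 100_000
rng = np.random.default_rng(4)
Y = wishart.rvs(df=nu, scale=np.eye(d), size=n_mc, random_state=rng)
Delta = (Y - nu * np.eye(d)) / np.sqrt(2 * nu)

P2 = Delta @ Delta                       # batched Delta^2
tr2 = np.einsum('nii->n', P2)            # tr(Delta^2)
tr3 = np.einsum('nij,nji->n', P2, Delta) # tr(Delta^3)
tr4 = np.einsum('nij,nji->n', P2, P2)    # tr(Delta^4)

print(tr2.mean(), d * (d + 1) / 2)
print(tr3.mean(), d * (d**2 + 3 * d + 4) / (2 * np.sqrt(2) * np.sqrt(nu)))
print(tr4.mean(), d * (2 * d**2 + 5 * d + 5) / 4
      + d * (d**3 + 6 * d**2 + 21 * d + 20) / (4 * nu))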

We can also estimate the moments of Lemma 1 on various events. The lemma below is used to estimate the $\asymp\nu^{-1/2}$ errors in (86) of the proof of Theorem 6.

Lemma 2.

Let $\nu>d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given, and let $A\in\mathscr{B}(\mathbb{R}^{d})$ be a Borel set. If $\mathbb{X}\sim\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ according to (3), then, for $\nu$ large enough,

\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]\right|\leq d^{3/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}, \quad (99)
\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]-\nu^{-1/2}\cdot\frac{d(d^{2}+3d+4)}{2\sqrt{2}}\right|\leq 3d^{5/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}, \quad (100)

where we recall that $\Delta_{\nu,\mathbb{S}}\vcentcolon=(\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\mathbb{X}-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}$.

Proof of Lemma 2.

By Lemma 1, the Cauchy-Schwarz inequality, and Jensen's inequality ($(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}))^{2}\leq d\cdot\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})$), we have

\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]\right|
= \left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\}}\right]\right|
\leq \left(\mathbb{E}\left[(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}))^{2}\right]\right)^{1/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}
\leq \left(d\cdot\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right]\right)^{1/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}
\leq d^{3/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}, \quad (101)

which proves (99). Similarly, by Lemma 1, Hölder's inequality, and Jensen's inequality ($(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}))^{4/3}\leq d^{1/3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})$), we have, for $\nu$ large enough,

\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]-\nu^{-1/2}\cdot\frac{d(d^{2}+3d+4)}{2\sqrt{2}}\right|
= \left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\}}\right]\right|
\leq \left(\mathbb{E}\left[(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}))^{4/3}\right]\right)^{3/4}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}
\leq \left(d^{1/3}\,\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})\right]\right)^{3/4}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}
\leq 3d^{5/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}, \quad (102)

which proves (100). This ends the proof. ∎

In the next lemma, we bound the density of the $\mathrm{Wishart}_{d}(\nu,\mathbb{M})$ distribution from (3) when $\nu>d+1$.

Lemma 3.

If $\nu>d+1$ and $\mathbb{M}\in\mathcal{S}_{++}^{d}$, then

\sup_{\mathbb{X}\in\mathcal{S}_{++}^{d}}K_{\nu,\mathbb{M}}(\mathbb{X})\leq\frac{(2\pi/e)^{-\frac{d(d+1)}{4}}\,|\mathbb{M}|^{-(d+1)/2}}{(2e)^{d/2}\,(\nu-(d+1))^{d(d+1)/4}}. \quad (103)
Proof of Lemma 3.

When $\nu>d+1$, it is easily verified that the mode of the $\mathrm{Wishart}_{d}(\nu,\mathbb{M})$ distribution is $(\nu-(d+1))\,\mathbb{M}$, so the expression for the Wishart density in (3) yields

\sup_{\mathbb{X}\in\mathcal{S}_{++}^{d}}K_{\nu,\mathbb{M}}(\mathbb{X})=\frac{(\nu-(d+1))^{\frac{\nu d}{2}-\frac{d(d+1)}{2}}\,|\mathbb{M}|^{-(d+1)/2}\exp\left(-\frac{d}{2}(\nu-(d+1))\right)}{2^{\nu d/2}\,\pi^{d(d-1)/4}\,\prod_{i=1}^{d}\Gamma(\frac{1}{2}(\nu-(i+1))+1)}. \quad (104)

From Lemma 1 in [42], we know that $\sqrt{2\pi e}\,e^{-(y-\frac{1}{2})}(y-1)^{y-\frac{1}{2}}\leq\Gamma(y)$ for all $y>1$, so we get

\sup_{\mathbb{X}\in\mathcal{S}_{++}^{d}}K_{\nu,\mathbb{M}}(\mathbb{X})\leq\frac{(\nu-(d+1))^{\frac{\nu d}{2}-\frac{d(d+1)}{2}}\,|\mathbb{M}|^{-(d+1)/2}\,e^{-\frac{\nu d}{2}+\frac{d}{2}(d+1)}}{2^{\nu d/2}\,\pi^{d(d-1)/4}\cdot(2\pi e)^{d/2}\,e^{-\frac{\nu d}{2}+\frac{d}{4}(d+1)}\left[\frac{1}{2}(\nu-(d+1))\right]^{\frac{\nu d}{2}-\frac{d(d+1)}{4}}}=\frac{(2\pi/e)^{-\frac{d(d+1)}{4}}\,|\mathbb{M}|^{-(d+1)/2}}{(2e)^{d/2}\,(\nu-(d+1))^{d(d+1)/4}}. \quad (105)

This ends the proof. ∎
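As a quick numerical illustration, the bound (103) can be compared with the density evaluated at its mode. A minimal sketch (a Python illustration assuming scipy) follows.

# Numerical illustration of the mode bound (103); not part of the proof.
import numpy as np
from scipy.stats import wishart

d, nu = 3, 30
M = np.eye(d)

density_at_mode = wishart.pdf((nu - (d + 1)) * M, df=nu, scale=M)
bound = ((2 * np.pi / np.e) ** (-d * (d + 1) / 4)
         * np.linalg.det(M) ** (-(d + 1) / 2)
         / ((2 * np.e) ** (d / 2) * (nu - (d + 1)) ** (d * (d + 1) / 4)))
print(density_at_mode, bound)  # the first value is at most the second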

Appendix B Acronyms

CMB: cosmic microwave background
i.i.d.: independent and identically distributed
SMN: symmetric matrix-variate normal
SPD: symmetric positive definite

Appendix C Simulation code

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jmva.2021.104.

Acknowledgments

First, I would like to thank Donald Richards for his indications on how to calculate the moments in Lemma 1. I also thank the Editor, the Associate Editor and the referees for their insightful remarks which led to improvements in the presentation of this paper. The author is supported by postdoctoral fellowships from the NSERC (PDF) and the FRQNT (B3X supplement and B3XR).

References

  • Abramowitz and Stegun [1964] M. Abramowitz, I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, volume 55 of National Bureau of Standards Applied Mathematics Series, For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C., 1964. MR0167642.
  • Aitchison and Lauder [1985] J. Aitchison, I. J. Lauder, Kernel density estimation for compositional data, J. Roy. Statist. Soc. Ser. C 34 (1985) 129–137. doi:10.2307/2347365.
  • Alexander et al. [2001] D. C. Alexander, C. Pierpaoli, P. J. Basser, J. C. Gee, Spatial transformations of diffusion tensor magnetic resonance images, IEEE Trans. Med. Imaging 20 (2001) 1131–1139. doi:10.1109/42.963816.
  • Anderson [2003] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, third edition, 2003. MR1990662.
  • Asta [2021] D. M. Asta, Kernel density estimation on symmetric spaces of non-compact type, J. Multivariate Anal. 181 (2021) 104676, 10. MR4172886.
  • Basser and Jones [2002] P. J. Basser, D. K. Jones, Diffusion-tensor MRI: theory, experimental design and data analysis - a technical review, NMR Biomed. 15 (2002) 456–467. doi:10.1002/nbm.783.
  • Basser and Pajevic [2003] P. J. Basser, S. Pajevic, A normal distribution for tensor-valued random variables: applications to diffusion tensor MRI, IEEE Trans. Med. Imaging 22 (2003) 785–794. doi:10.1109/TMI.2003.815059.
  • Bouezmarni et al. [2011] T. Bouezmarni, A. El Ghouch, M. Mesfioui, Gamma kernel estimators for density and hazard rate of right-censored data, J. Probab. Stat. (2011) Art. ID 937574, 16 pp. MR2801351.
  • Bouezmarni and Rombouts [2008] T. Bouezmarni, J. V. K. Rombouts, Density and hazard rate estimation for censored and $\alpha$-mixing data using gamma kernels, J. Nonparametr. Stat. 20 (2008) 627–643. MR2454617.
  • Bouezmarni and Rombouts [2010a] T. Bouezmarni, J. V. K. Rombouts, Nonparametric density estimation for multivariate bounded data, J. Statist. Plann. Inference 140 (2010a) 139–152. MR2568128.
  • Bouezmarni and Rombouts [2010b] T. Bouezmarni, J. V. K. Rombouts, Nonparametric density estimation for positive time series, Comput. Statist. Data Anal. 54 (2010b) 245–261. MR2756423.
  • Bouezmarni and Scaillet [2005] T. Bouezmarni, O. Scaillet, Consistency of asymmetric kernel density estimators and smoothed histograms with application to income data, Econom. Theor. 21 (2005) 390–412. MR2179543.
  • Carter [2002] A. V. Carter, Deficiency distance between multinomial and multivariate normal experiments, Ann. Statist. 30 (2002) 708–730. MR1922539.
  • Chacón and Duong [2018] J. E. Chacón, T. Duong, Multivariate Kernel Smoothing and Its Applications, volume 160 of Monographs on Statistics and Applied Probability, CRC Press, Boca Raton, FL, 2018. MR3822372.
  • Chen [1999] S. X. Chen, Beta kernel estimators for density functions, Comput. Statist. Data Anal. 31 (1999) 131–145. MR1718494.
  • Chen [2000] S. X. Chen, Probability density function estimation using gamma kernels, Ann. Inst. Statist. Math 52 (2000) 471–480. MR1794247.
  • Chevallier et al. [2014] E. Chevallier, A. Chevallier, J. Angulo, Computing histogram of tensor images using orthogonal series density estimation and Riemannian metrics, in: 22nd International Conference on Pattern Recognition, pp. 900–905. doi:10.1109/ICPR.2014.165.
  • Chevallier et al. [2017] E. Chevallier, E. Kalunga, J. Angulo, Kernel density estimation on spaces of Gaussian distributions and symmetric positive definite matrices, SIAM J. Imaging Sci. 10 (2017) 191–215. MR3606419.
  • Fernandes and Monteiro [2005] M. Fernandes, P. K. Monteiro, Central limit theorem for asymmetric kernel functionals, Ann. Inst. Statist. Math. 57 (2005) 425–442. MR2206532.
  • Fujikoshi et al. [2010] Y. Fujikoshi, V. V. Ulyanov, R. Shimizu, Multivariate Statistics, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ, 2010. MR2640807.
  • Funke and Kawka [2015] B. Funke, R. Kawka, Nonparametric density estimation for multivariate bounded data using two non-negative multiplicative bias correction methods, Comput. Statist. Data Anal. 92 (2015) 148–162. MR3384258.
  • Gallaugher and McNicholas [2018] M. P. B. Gallaugher, P. D. McNicholas, Finite mixtures of skewed matrix variate distributions, Pattern Recognit. 80 (2018) 83–93. doi:10.1016/j.patcog.2018.02.025.
  • Gasbarra et al. [2017] D. Gasbarra, S. Pajevic, P. J. Basser, Eigenvalues of random matrices with isotropic Gaussian noise and the design of diffusion tensor imaging experiments, SIAM J. Imaging Sci. 10 (2017) 1511–1548. doi:10.1137/16M1098693.
  • Gibbs and Su [2002] A. L. Gibbs, F. E. Su, On choosing and bounding probability metrics, Int. Stat. Rev. 70 (2002) 419–435. doi:10.2307/1403865.
  • Gupta and Nagar [1999] A. K. Gupta, D. K. Nagar, Matrix Variate Distributions, Chapman and Hall/CRC, first edition, 1999.
  • Hadjicosta [2019] E. Hadjicosta, Integral Transform Methods in Goodness-of-Fit Testing, PhD thesis, Pennsylvania State University, 2019.
  • Hadjicosta and Richards [2020] E. Hadjicosta, D. Richards, Integral transform methods in goodness-of-fit testing, II: the Wishart distributions, Ann. Inst. Statist. Math. 72 (2020) 1317–1370. MR4169380.
  • Haff et al. [2011] L. R. Haff, P. T. Kim, J.-Y. Koo, D. S. P. Richards, Minimax estimation for mixtures of Wishart distributions, Ann. Statist. 39 (2011) 3417–3440. MR3012414.
  • Hirukawa and Sakudo [2015] M. Hirukawa, M. Sakudo, Family of the generalised gamma kernels: a generator of asymmetric kernels for nonnegative data, J. Nonparametr. Stat. 27 (2015) 41–63. MR3304359.
  • Hu and White [1997] W. Hu, M. White, A CMB polarization primer, New Astronomy 2 (1997) 323–344. doi:10.1016/S1384-1076(97)00022-5.
  • Igarashi and Kakizawa [2018] G. Igarashi, Y. Kakizawa, Generalised gamma kernel density estimation for nonnegative data and its bias reduction, J. Nonparametr. Stat. 30 (2018) 598–639. MR3843043.
  • Kim and Richards [2008] P. T. Kim, D. S. P. Richards, Diffusion tensor imaging and deconvolution on spaces of positive definite symmetric matrices, in: 2nd MICCAI Workshop on Mathematical Foundations of Computational Anatomy, New York, United States, 2008, pp. 140–149. inria-00632882.
  • Kim and Richards [2011] P. T. Kim, D. S. P. Richards, Deconvolution density estimation on the space of positive definite symmetric matrices, in: Nonparametric statistics and mixture models, World Sci. Publ., Hackensack, NJ, 2011, pp. 147–168. MR2838725.
  • Kocherlakota and Kocherlakota [1999] S. Kocherlakota, K. Kocherlakota, Approximations for central and noncentral bivariate chi-square distributions, Commun. Stat. - Simul. Comput. 28 (1999) 909–930. doi:10.1080/03610919908813585.
  • Kokonendji and Somé [2018] C. C. Kokonendji, S. M. Somé, On multivariate associated kernels to estimate general density functions, J. Korean Statist. Soc. 47 (2018) 112–126. MR3760293.
  • Kokonendji and Somé [2021] C. C. Kokonendji, S. M. Somé, Bayesian bandwidths in semiparametric modelling for nonnegative orthant data with diagnostics, Stats 4 (2021) 162–183. doi:10.3390/stats4010013.
  • Kollo and von Rosen [1995] T. Kollo, D. von Rosen, Approximating by the Wishart distribution, Ann. Inst. Statist. Math. 47 (1995) 767–783. MR1370289.
  • Letac and Massam [2004] G. Letac, H. Massam, All invariant moments of the Wishart distribution, Scand. J. Statist. 31 (2004) 295–318. MR2066255.
  • Li et al. [2020] D. Li, Y. Lu, E. Chevallier, D. Dunson, Density estimation and modeling on symmetric spaces, Preprint (2020) 1–41. arXiv:2009.01983.
  • Lu and Richards [2001] I.-L. Lu, D. S. P. Richards, MacMahon’s master theorem, representation theory, and moments of Wishart distributions, Adv. in Appl. Math. 27 (2001) 531–547. MR1868979.
  • Mallows [1961] C. L. Mallows, Latent vectors of random symmetric matrices, Biometrika 48 (1961) 133–149. MR131312.
  • Minc and Sathre [1964/65] H. Minc, L. Sathre, Some inequalities involving $(r!)^{1/r}$, Proc. Edinburgh Math. Soc. (2) 14 (1964/65) 41–46. MR162751.
  • Ouimet and Tolosana-Delgado [2022] F. Ouimet, R. Tolosana-Delgado, Asymptotic properties of Dirichlet density estimators, J. Multivariate Anal. 187 (2022) 104832, 25 pp. doi:10.1016/j.jmva.2021.104832.
  • Pajevic and Basser [1999] S. Pajevic, P. J. Basser, Parametric description of noise in diffusion tensor MRI, in: 8th Annual Meeting of the ISMRM, Philadelphia, p. 1787.
  • Pajevic and Basser [2003] S. Pajevic, P. J. Basser, Parametric and non-parametric statistical analysis of DT-MRI data, J. Magn. Reson. 161 (2003) 1–14. doi:10.1016/s1090-7807(02)00178-7.
  • Pelletier [2005] B. Pelletier, Kernel density estimation on Riemannian manifolds, Statist. Probab. Lett. 73 (2005) 297–304. MR2179289.
  • Pollard [2005] D. Pollard, Total variation distance between measures, in: Asymptopia, version: 15feb05, 2005, pp. 1–15. http://www.stat.yale.edu/~pollard/Courses/607.spring05/handouts/Totalvar.pdf.
  • Prakasa Rao [1983] B. L. S. Prakasa Rao, Nonparametric Functional Estimation, Probability and Mathematical Statistics, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York, 1983. MR0740865.
  • Schwartzman et al. [2008] A. Schwartzman, W. F. Mascarenhas, J. E. Taylor, Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices, Ann. Statist. 36 (2008) 2886–2919. MR2485016.
  • Scott [2015] D. W. Scott, Multivariate Density Estimation, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ, second edition, 2015. MR3329609.
  • Serfling [1980] R. J. Serfling, Approximation Theorems of Mathematical Statistics, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1980. MR0595165.
  • Somé [2020] S. M. Somé, Bayesian selector of adaptive bandwidth for gamma kernel density estimator on $[0,\infty)$: simulations and applications, Communications in Statistics - Simulation and Computation (2020) 1–11. doi:10.1080/03610918.2020.1828921.
  • Somé and Kokonendji [2021] S. M. Somé, C. C. Kokonendji, Bayesian selector of adaptive bandwidth for multivariate gamma kernel estimator on $[0,\infty)^{d}$, Journal of Applied Statistics (2021) 1–22. doi:10.1080/02664763.2021.1881456.
  • Steyn and Roux [1972] H. S. Steyn, J. J. J. Roux, Approximations for the non-central Wishart distribution, South African Statist. J. 6 (1972) 165–173. MR326925.
  • Tan and Gupta [1982] W. Y. Tan, R. P. Gupta, On approximating the noncentral Wishart distribution by central Wishart distribution: a Monte Carlo study, Comm. Statist. B—Simulation Comput. 11 (1982) 47–64. MR648656.
  • Vafaei Sadr and Movahed [2021] A. Vafaei Sadr, S. M. S. Movahed, Clustering of local extrema in Planck CMB maps, MNRAS 503 (2021) 815–829. doi:10.1093/mnras/stab368.
  • de Waal and Nel [1973] D. J. de Waal, D. G. Nel, On some expectations with respect to Wishart matrices, South African Statist. J. 7 (1973) 61–67. MR347003.
  • Zhang [2010] S. Zhang, A note on the performance of the gamma kernel estimators at the boundary, Statist. Probab. Lett. 80 (2010) 548–557. MR2595129.