
A symmetric matrix-variate normal local approximation for the Wishart distribution and some applications

Frédéric Ouimet. California Institute of Technology, Pasadena, CA 91125, USA; McGill University, Montreal, QC H3A 0B9, Canada. frederic.ouimet2@mcgill.ca
Abstract

The noncentral Wishart distribution has become more mainstream in statistics as the prevalence of applications involving sample covariances with underlying multivariate Gaussian populations has dramatically increased since the advent of computers. Multiple sources in the literature deal with local approximations of the noncentral Wishart distribution with respect to its central counterpart. However, no source has yet developed explicit local approximations for the (central) Wishart distribution in terms of a normal analogue, which is important since Gaussian distributions are at the heart of the asymptotic theory for many statistical methods. In this paper, we prove a precise asymptotic expansion for the ratio of the Wishart density to the symmetric matrix-variate normal density with the same mean and covariances. The result is then used to derive an upper bound on the total variation between the corresponding probability measures and to find the pointwise variance of a new density estimator on the space of positive definite matrices with a Wishart asymmetric kernel. For the sake of completeness, we also find expressions for the pointwise bias of our new estimator, the pointwise variance as we move towards the boundary of its support, the mean squared error, the mean integrated squared error away from the boundary, and we prove its asymptotic normality.

keywords:
asymmetric kernel, asymptotic statistics, density estimation, expansion, local approximation, matrix-variate normal, multivariate associated kernel, normal approximation, smoothing, total variation, Wishart distribution
MSC:
[2020] Primary: 62E20; Secondary: 62H10, 62H12, 62B15, 62G05, 62G07
journal: Journal of Multivariate Analysis

1 Introduction

Let $d\in\mathbb{N}$ be given. Define the space of (real) symmetric matrices of size $d\times d$ and the space of (real symmetric) positive definite matrices of size $d\times d$ as follows:

\mathcal{S}^{d} := \left\{\mathbb{M}\in\mathbb{R}^{d\times d} : \text{$\mathbb{M}$ is symmetric}\right\}, \quad (1)
\mathcal{S}_{++}^{d} := \left\{\mathbb{M}\in\mathbb{R}^{d\times d} : \text{$\mathbb{M}$ is symmetric and positive definite}\right\}. \quad (2)

For $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$, the density function of the $\mathrm{Wishart}_d(\nu,\mathbb{S})$ distribution is defined by

K_{\nu,\mathbb{S}}(\mathbb{X}) := \frac{|\mathbb{S}^{-1}\mathbb{X}|^{\nu/2-(d+1)/2}\exp\left(-\frac{1}{2}\mathrm{tr}(\mathbb{S}^{-1}\mathbb{X})\right)}{2^{\nu d/2}\,|\mathbb{S}|^{(d+1)/2}\,\pi^{d(d-1)/4}\prod_{i=1}^{d}\Gamma(\frac{1}{2}(\nu-(i+1))+1)}, \quad \mathbb{X}\in\mathcal{S}_{++}^{d}, \quad (3)

where $\nu$ is the number of degrees of freedom, $\mathbb{S}$ is the scale matrix, and

\Gamma(a) := \int_{0}^{\infty} t^{a-1} e^{-t}\,{\rm d}t, \quad a>0, \quad (4)

denotes the Euler gamma function. The mean and covariance matrix for the vectorization of $\mathbb{W}\sim\mathrm{Wishart}_d(\nu,\mathbb{S})$, namely

\mathrm{vecp}(\mathbb{W}) := (\mathbb{W}_{11},\mathbb{W}_{12},\mathbb{W}_{22},\dots,\mathbb{W}_{1d},\dots,\mathbb{W}_{dd})^{\top}, \quad (5)

($\mathrm{vecp}(\cdot)$ is the operator that stacks the columns of the upper triangular portion of a symmetric matrix on top of each other), are well known to be

\mathbb{E}[\mathrm{vecp}(\mathbb{W})] = \nu\,\mathrm{vecp}(\mathbb{S}) \quad \text{(alternatively, $\mathbb{E}[\mathbb{W}] = \nu\,\mathbb{S}$)} \quad (6)

and

\mathbb{V}\mathrm{ar}(\mathrm{vecp}(\mathbb{W})) = B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}, \quad (7)

where $\mathrm{I}_{d}$ is the identity matrix of order $d$, $B_{d}$ is a $d^{2}\times\frac{1}{2}d(d+1)$ transition matrix (see Gupta and Nagar [25, p.11] for the precise definition), and $\otimes$ denotes the Kronecker product.
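For readers who want to check the moment formulas (6)-(7) numerically, here is a minimal R sketch (ours, purely illustrative); it uses the base-R sampler stats::rWishart and the standard entrywise identity $\mathbb{V}\mathrm{ar}(\mathbb{W}_{ij}) = \nu(\mathbb{S}_{ij}^2 + \mathbb{S}_{ii}\mathbb{S}_{jj})$, which matches the diagonal of (7).

```r
# A numerical sanity check of (6)-(7) using base R's rWishart.
vecp <- function(M) M[upper.tri(M, diag = TRUE)]  # (W11, W12, W22, ...), as in (5)

set.seed(1)
d <- 2; nu <- 50
S <- matrix(c(2, 1, 1, 3), d, d)
W <- rWishart(1e5, df = nu, Sigma = S)            # d x d x 100000 array of draws

V <- t(apply(W, 3, vecp))                         # one vecp(W) draw per row
colMeans(V)                                       # ~ nu * vecp(S) = (100, 50, 150)
apply(V, 2, var)                                  # compare with the entrywise identity:
c(2 * nu * S[1, 1]^2,                             # Var(W11) = 400
  nu * (S[1, 2]^2 + S[1, 1] * S[2, 2]),           # Var(W12) = 350
  2 * nu * S[2, 2]^2)                             # Var(W22) = 900
```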

Multiple sources in the literature deal with local approximations of the noncentral Wishart distribution with respect to the (central) Wishart distribution, see, e.g., Steyn and Roux [54], Tan and Gupta [55], Kollo and von Rosen [37], Kocherlakota and Kocherlakota [34]. However, no source has yet developed explicit local approximations for the (central) Wishart distribution in terms of a normal analogue, which is important since Gaussian distributions are at the heart of the asymptotic theory for many statistical methods.

The main goal of our paper (Theorem 1) is to establish an asymptotic expansion for the ratio of the Wishart density (3) to the symmetric matrix-variate normal (SMN) density with the same mean and covariances. According to Gupta and Nagar [25, Eq.(2.5.8)], the density of the $\mathrm{SMN}_{d\times d}(\nu\,\mathbb{S},\, B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d})$ distribution is

g_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{\exp\left(-\frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right)}{\sqrt{(2\pi)^{d(d+1)/2}\,|B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}|}} = \frac{\exp\left(-\frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right)}{\sqrt{2^{d}\,\pi^{d(d+1)/2}\,|\sqrt{2\nu}\,\mathbb{S}|^{d+1}}}, \quad \mathbb{X}\in\mathcal{S}^{d}, \quad (8)

where the last equality follows from Gupta and Nagar [25, Eq.(1.2.18)], and

\Delta_{\nu,\mathbb{S}} := (\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\mathbb{X}-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}. \quad (9)

Rewritings of the density (8) are provided on page 71 of Gupta and Nagar [25] using the vectorization operators $\mathrm{vec}(\cdot)$ and $\mathrm{vecp}(\cdot)$. For example, we can rewrite $g_{\nu,\mathbb{S}}(\mathbb{X})$ in terms of $\mathrm{vecp}(\mathbb{X})$ as follows:

g_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{\exp\left(-\frac{1}{2}(\mathrm{vecp}(\mathbb{X})-\mathrm{vecp}(\nu\,\mathbb{S}))^{\top}\big[B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}\big]^{-1}(\mathrm{vecp}(\mathbb{X})-\mathrm{vecp}(\nu\,\mathbb{S}))\right)}{\sqrt{(2\pi)^{d(d+1)/2}\,|B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d}|}}, \quad \mathbb{X}\in\mathcal{S}^{d}. \quad (10)
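As an illustration of formula (8), here is a minimal R sketch (ours, not from the paper's appendix) that evaluates $\log g_{\nu,\mathbb{S}}(\mathbb{X})$ directly; the helper matinvsqrt computes the inverse symmetric square root via an eigendecomposition.

```r
# Inverse symmetric square root A^{-1/2} of a positive definite matrix A.
matinvsqrt <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow(A)) %*% t(e$vectors)
}

# log g_{nu,S}(X) from (8): -tr(Delta^2)/2 minus the log-normalizing constant.
log_smn <- function(X, nu, S) {
  d <- nrow(S)
  R <- matinvsqrt(sqrt(2 * nu) * S)
  Delta <- R %*% (X - nu * S) %*% R        # Delta_{nu,S} as in (9)
  ld <- as.numeric(determinant(sqrt(2 * nu) * S, logarithm = TRUE)$modulus)
  -0.5 * sum(Delta * Delta) -              # sum(Delta^2) = tr(Delta^2), Delta symmetric
    0.5 * (d * log(2) + d * (d + 1) / 2 * log(pi) + (d + 1) * ld)
}

# Example: evaluate at the mean, X = nu * S.
S <- matrix(c(2, 1, 1, 3), 2, 2)
log_smn(50 * S, nu = 50, S = S)
```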

To give some practical motivation for the SMN distribution (8), note that noise in the estimate of individual voxels of diffusion tensor magnetic resonance imaging (DT-MRI) data has been shown to be well modeled by the $\mathrm{SMN}_{3\times 3}$ distribution in [44, 6, 45]. The SMN voxel distributions were combined into a tensor-variate normal distribution in [7, 23], which could help to predict how the whole image (not just individual voxels) changes when shearing and dilation operations are applied in image warping and registration problems, see Alexander et al. [3]. In [49], maximum likelihood estimators and likelihood ratio tests are developed for the eigenvalues and eigenvectors of a form of the SMN distribution with an orthogonally invariant covariance structure, both in one-sample problems (for example, in image interpolation) and two-sample problems (when comparing images) and under a broad variety of assumptions. This work significantly extended previous results of Mallows [41]. In [49], it is also mentioned that the polarization pattern of cosmic microwave background (CMB) radiation measurements can be represented by $2\times 2$ positive definite matrices, see the primer by Hu and White [30]. In a very recent and interesting paper, Vafaei Sadr and Movahed [56] presented evidence for the Gaussianity of the local extrema of CMB maps. We can also mention [22], where finite mixtures of skewed SMN distributions were applied to an image recognition problem.

In general, we know that the Gaussian distribution is an attractor for sums of i.i.d. random variables with finite variance, which makes many estimators in statistics asymptotically normal. Similarly, we expect the SMN distribution (8) to be an attractor for sums of i.i.d. random symmetric matrices with finite variances, thus including many estimators such as sample covariance matrices and score statistics for symmetric matrix parameters. In particular, if a given statistic or estimator is a function of the components of a sample covariance matrix for i.i.d. observations coming from a multivariate Gaussian population, then we could study its large sample properties (such as its moments) using Theorem 1 (for example, by turning a Wishart-moments estimation problem into a Gaussian-moments estimation problem).

In Section 3, we use our asymptotic expansion (Theorem 1) to find the pointwise variance of a new density estimator on the space of positive definite matrices with a Wishart asymmetric kernel (Section 3.1), and we derive an upper bound on the total variation between the probability measures on $\mathcal{S}^{d}$ induced by (3) and (8) (Section 3.2). These are two examples of applications, but it is clear that there could be many others under the proper context.

Remark 1 (Notation).

Throughout the paper, $a = \mathcal{O}(b)$ means that $\limsup |a/b| < C$ as $\nu\to\infty$ (or as $b\to 0$ or as $n\to\infty$ in Section 3.1, depending on the context), where $C>0$ is a universal constant. Whenever $C$ might depend on some parameter, we add a subscript (for example, $a = \mathcal{O}_{d}(b)$). Similarly, $a = \mathrm{o}(b)$ means that $\lim |a/b| = 0$, and subscripts indicate which parameters the convergence rate can depend on. The notation $\mathrm{tr}(\cdot)$ will denote the trace operator for matrices and $|\cdot|$ their determinant. For a matrix $\mathbb{M}\in\mathbb{R}^{d\times d}$ that is diagonalizable, $\lambda_{1}(\mathbb{M})\leq\dots\leq\lambda_{d}(\mathbb{M})$ will denote its eigenvalues, and we let $\boldsymbol{\lambda}(\mathbb{M}) := (\lambda_{1}(\mathbb{M}),\dots,\lambda_{d}(\mathbb{M}))^{\top}$.

In Section 3.1 and the related proofs, the symbol $\mathscr{D}$ over an arrow `$\longrightarrow$' will denote convergence in distribution (or law). We will also use the shorthand $[d] := \{1,\dots,d\}$ in several places. Finally, the bandwidth parameter $b = b(n)$ will always be implicitly a function of the number of observations, the only exceptions being in Theorem 2 and the related proof.

2 Main result

Below, we prove an asymptotic expansion for the ratio of the Wishart density to the symmetric matrix-variate normal (SMN) density with the same mean and covariances. This result is (much) stronger than the result found, for example, in [4, Theorem 3.6.2] or [20, Theorem 2.5.1], which says that for a sequence of i.i.d. multivariate Gaussian observations $\boldsymbol{X}_{1},\dots,\boldsymbol{X}_{n}\sim\mathcal{N}_{d}(\boldsymbol{\mu},\mathbb{S})$ with $\boldsymbol{\mu}\in\mathbb{R}^{d}$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$, the scaled and recentered sample covariance matrix of $\boldsymbol{X}_{1},\dots,\boldsymbol{X}_{n}$ converges in law to an SMN distribution, specifically,

n^{-1/2}\left[\sum_{i=1}^{n}(\boldsymbol{X}_{i}-\boldsymbol{\mu})(\boldsymbol{X}_{i}-\boldsymbol{\mu})^{\top} - n\,\mathbb{S}\right] \stackrel{\mathscr{D}}{\longrightarrow} \mathrm{SMN}_{d\times d}(0_{d\times d},\, 2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})\,B_{d}), \quad n\to\infty. \quad (11)

The result in Theorem 1 is stronger than (11) since it is well known that $\sum_{i=1}^{n}(\boldsymbol{X}_{i}-\boldsymbol{\mu})(\boldsymbol{X}_{i}-\boldsymbol{\mu})^{\top} \sim \mathrm{Wishart}_{d}(n,\mathbb{S})$ in this context.

Theorem 1.

Let $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given. Pick any $\eta\in(0,1)$ and let

B_{\nu,\mathbb{S}}(\eta) := \left\{\mathbb{X}\in\mathcal{S}_{++}^{d} : \max_{1\leq i\leq d}\left|\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right| \leq \eta\,\nu^{-1/3}\right\} \quad (12)

denote the bulk of the Wishart distribution. Then, as $\nu\to\infty$ and uniformly for $\mathbb{X}\in B_{\nu,\mathbb{S}}(\eta)$, we have

\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) = \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} \quad (13)
\qquad + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right).

Furthermore,

\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})} = 1 + \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{\frac{1}{9}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\right)^{2} - \frac{d+1}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) + \frac{(d+1)^{2}}{4}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right)^{2}\right. \quad (14)
\qquad \left. - \frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{9}}{\nu^{3/2}}\right).

As a direct consequence of Theorem 1, we obtain expansions for the ratio and log-ratio of the density function of a multivariate bijective mapping $\boldsymbol{h}(\cdot)$ applied to a Wishart random matrix to the density function of the same mapping applied to the corresponding SMN random matrix; the point is that the Jacobian factors coming from the change of variables are identical for both densities and therefore cancel in the ratio. In particular, the corollary below provides an asymptotic expansion for the density of a bijective mapping applied to a sample covariance matrix for i.i.d. observations coming from a multivariate Gaussian population.

Corollary 1.

Let $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given, and let $\mathbb{W}\sim\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ and $\mathbb{N}\sim\mathrm{SMN}_{d\times d}(\nu\,\mathbb{S},\, B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d})$. Let $\boldsymbol{h}(\cdot)$ be a one-to-one mapping from an open subset $\mathcal{D}$ of $\mathcal{S}_{++}^{d}$ onto a subset $\mathcal{R}$ of $\mathbb{R}^{d(d+1)/2}$. Assume further that $\boldsymbol{h}$ has continuous partial derivatives on $\mathcal{D}$ and its Jacobian determinant $\big|\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\big|$ is non-zero for all $\mathbb{X}\in\mathcal{D}$. Define

\widetilde{\Delta}_{\nu,\mathbb{S}} = (\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\boldsymbol{h}^{-1}(\boldsymbol{y})-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}, \quad \boldsymbol{y}\in\mathcal{R}, \quad (15)

and denote by $f_{\boldsymbol{h}(\mathbb{W})}$ and $f_{\boldsymbol{h}(\mathbb{N})}$ the density functions of $\boldsymbol{h}(\mathbb{W})$ and $\boldsymbol{h}(\mathbb{N})$, respectively. Fix any $\eta\in(0,1)$; then we have, as $\nu\to\infty$, and uniformly for $\boldsymbol{y}\in\mathcal{R}$ such that $\boldsymbol{h}^{-1}(\boldsymbol{y})\in B_{\nu,\mathbb{S}}(\eta)$,

\log\left(\frac{f_{\boldsymbol{h}(\mathbb{W})}(\boldsymbol{y})}{f_{\boldsymbol{h}(\mathbb{N})}(\boldsymbol{y})}\right) = \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} \quad (16)
\qquad + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\widetilde{\Delta}_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right),

and

\frac{f_{\boldsymbol{h}(\mathbb{W})}(\boldsymbol{y})}{f_{\boldsymbol{h}(\mathbb{N})}(\boldsymbol{y})} = 1 + \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}})\right\} + \nu^{-1}\cdot\left\{\frac{1}{9}\left(\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3})\right)^{2} - \frac{d+1}{3}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{3})\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}) + \frac{(d+1)^{2}}{4}\left(\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}})\right)^{2}\right. \quad (17)
\qquad \left. - \frac{1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\widetilde{\Delta}_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\} + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\widetilde{\Delta}_{\nu,\mathbb{S}})|^{9}}{\nu^{3/2}}\right).

Under the conditions of Corollary 1, we know from the multivariate delta method (see, e.g., [20, Theorem 2.5.2]) that the random vectors $\nu^{-1/2}(\boldsymbol{h}(\mathbb{W})-\boldsymbol{h}(\nu\,\mathbb{S}))$ and $\nu^{-1/2}(\boldsymbol{h}(\mathbb{N})-\boldsymbol{h}(\nu\,\mathbb{S}))$ both converge in distribution, as $\nu\to\infty$, to

\boldsymbol{Y} \sim \mathcal{N}_{d(d+1)/2}\left(\boldsymbol{0},\; \left(\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}}\right) 2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})\,B_{d} \left(\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}}\right)^{\top}\right),

where $\boldsymbol{h}(\mathbb{X}) = (h_{1}(\mathbb{X}),\dots,h_{d(d+1)/2}(\mathbb{X}))^{\top}$ and

\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}\boldsymbol{h}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}} = \left[\frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}h_{1}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}} ~~ \frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}h_{2}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}} ~~ \dots ~~ \frac{{\rm d}}{{\rm d}\,\mathrm{vecp}(\mathbb{X})}h_{d(d+1)/2}(\mathbb{X})\Big|_{\mathbb{X}=\nu\,\mathbb{S}}\right]^{\top}.

Therefore, it would have been neat to extend Corollary 1 by expanding the log-ratio $\log(f_{\nu^{-1/2}(\boldsymbol{h}(\mathbb{W})-\boldsymbol{h}(\nu\,\mathbb{S}))}(\boldsymbol{y})/f_{\boldsymbol{Y}}(\boldsymbol{y}))$. However, this would most likely require an expansion for the log-ratio $\log(f_{\nu^{-1/2}(\boldsymbol{h}(\mathbb{N})-\boldsymbol{h}(\nu\,\mathbb{S}))}(\boldsymbol{y})/f_{\boldsymbol{Y}}(\boldsymbol{y}))$, and it is unclear which restrictions we should impose on $\boldsymbol{h}$ to progress in that direction. This question is left open for future research.

Below, we provide numerical evidence (displayed graphically) for the validity of the expansion in Theorem 1 when $d=2$. We compare three levels of approximation for various choices of $\mathbb{S}$. For any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$, define

E_{0} := \sup_{\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})} \left|\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right)\right|, \quad (18)
E_{1} := \sup_{\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})} \left|\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) - \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\}\right|, \quad (19)
E_{2} := \sup_{\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})} \left|\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) - \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\} - \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \left(\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right)\right\}\right|. \quad (20)

In order to avoid numerical errors in part due to the gamma functions in $K_{\nu,\mathbb{S}}(\mathbb{X})$, we have to work a bit to get an expression for $\log\left(K_{\nu,\mathbb{S}}(\mathbb{X})/g_{\nu,\mathbb{S}}(\mathbb{X})\right)$ which is numerically more stable. By taking the expression for $K_{\nu,\mathbb{S}}(\mathbb{X})$ in (54) and dividing by the expression for $g_{\nu,\mathbb{S}}(\mathbb{X})$ on the right-hand side of (8), we get

\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})} = |\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}|^{\nu/2-(d+1)/2} \cdot \frac{\exp\left(-\sqrt{\frac{\nu}{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) + \frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right)}{\prod_{i=1}^{d}\frac{(\nu-(i+1))^{(\nu-i)/2}}{e^{-(i+1)/2}\,\nu^{(\nu-i)/2}} \cdot \prod_{i=1}^{d}\frac{\Gamma(\frac{1}{2}(\nu-(i+1))+1)}{\sqrt{2\pi}\,e^{-\frac{1}{2}(\nu-(i+1))}\,[\frac{1}{2}(\nu-(i+1))]^{(\nu-i)/2}}}, \quad (21)

so that

\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right) = \frac{\nu-(d+1)}{2}\sum_{i=1}^{d}\log\left(1+\sqrt{\frac{2}{\nu}}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right) - \sqrt{\frac{\nu}{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) + \frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) - \sum_{i=1}^{d}\left[\left(\frac{\nu-i}{2}\right)\log\left(1-\frac{i+1}{\nu}\right) + \frac{i+1}{2}\right] \quad (22)
\qquad - \sum_{i=1}^{d}\left[\log\Gamma\left(\frac{1}{2}(\nu-(i+1))+1\right) - \frac{1}{2}\log(2\pi) + \frac{1}{2}(\nu-(i+1)) - \frac{\nu-i}{2}\log\left(\frac{1}{2}(\nu-(i+1))\right)\right].

In R, we used this last equation to evaluate the log-ratios inside $E_{0}$, $E_{1}$ and $E_{2}$.
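For concreteness, here is a minimal R sketch of the numerically stable formula (22) (ours, not the appendix code); lgamma and log1p keep the gamma and logarithm terms stable for moderate $\nu$.

```r
matinvsqrt <- function(A) {  # inverse symmetric square root, as earlier
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow(A)) %*% t(e$vectors)
}

# Stable evaluation of log(K_{nu,S}(X) / g_{nu,S}(X)) via (22).
log_ratio <- function(X, nu, S) {
  d <- nrow(S)
  R <- matinvsqrt(sqrt(2 * nu) * S)
  lam <- eigen(R %*% (X - nu * S) %*% R, symmetric = TRUE)$values  # eigenvalues of Delta
  i <- 1:d
  (nu - (d + 1)) / 2 * sum(log1p(sqrt(2 / nu) * lam)) -
    sqrt(nu / 2) * sum(lam) + 0.5 * sum(lam^2) -       # tr(Delta), tr(Delta^2)
    sum((nu - i) / 2 * log1p(-(i + 1) / nu) + (i + 1) / 2) -
    sum(lgamma((nu - (i + 1)) / 2 + 1) - 0.5 * log(2 * pi) +
        (nu - (i + 1)) / 2 - (nu - i) / 2 * log((nu - (i + 1)) / 2))
}

# Quick check with Delta_{nu,S} = 0.2 * I fixed: the log-ratio decays at the
# nu^{-1/2} rate predicted by Theorem 1 (roughly halves when nu is quadrupled).
S <- matrix(c(2, 1, 1, 3), 2, 2)
sapply(c(50, 200, 800), function(nu) log_ratio(nu * S + 0.2 * sqrt(2 * nu) * S, nu, S))
```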

Note that $\mathbb{X}\in B_{\nu,\mathbb{S}}(2^{-1/2}\nu^{-1/6})$ implies $|\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{k})| \leq d\,2^{-k}$ for all $k\in\mathbb{N}$, so we expect from Theorem 1 that the maximum errors above ($E_{0}$, $E_{1}$ and $E_{2}$) will have the asymptotic behavior

E_{i} = \mathcal{O}_{d}(\nu^{-(1+i)/2}), \quad \text{for all } i\in\{0,1,2\}, \quad (23)

or equivalently,

\liminf_{\nu\to\infty}\frac{\log E_{i}}{\log(\nu^{-1})} \geq \frac{1+i}{2}, \quad \text{for all } i\in\{0,1,2\}. \quad (24)

The property (24) is verified in Fig. 2 below, for various choices of $\mathbb{S}$. Similarly, the corresponding log-log plots of the errors as a function of $\nu$ are displayed in Fig. 1. The simulations are limited to the range $5\leq\nu\leq 205$. The R code that generated Fig. 1 and Fig. 2 can be found in Appendix C.

[Figure 1 shows sixteen panels (a)-(p), one for each scale matrix $\mathbb{S} = \begin{pmatrix} s_{11} & 1 \\ 1 & s_{22} \end{pmatrix}$ with $s_{11}, s_{22} \in \{2,3,4,5\}$.]
Fig. 1: Plots of $1/E_{i}$ as a function of $\nu$, for various choices of $\mathbb{S}$. Both the horizontal and vertical axes are on a logarithmic scale. The plots clearly illustrate how the addition of correction terms from Theorem 1 to the base approximation (18) improves it.
[Figure 2 shows sixteen panels (a)-(p), for the same sixteen choices of $\mathbb{S}$ as in Fig. 1.]
Fig. 2: Plots of $\log E_{i}/\log(\nu^{-1})$ as a function of $\nu$, for various choices of $\mathbb{S}$. The plots confirm (24) for our choices of $\mathbb{S}$ and bring strong evidence for the validity of Theorem 1.

3 Applications

3.1 Asymptotic properties of Wishart asymmetric kernel estimators

Symmetric positive definite (SPD) matrix data are prevalent in modern statistical applications. As pointed out by Hadjicosta [26] and Hadjicosta and Richards [27], where goodness-of-fit tests for the Wishart distribution were developed based on integral transforms, factor analysis, diffusion tensor imaging, CMB radiation measurements, volatility models in finance, wireless communication systems and polarimetric radar imaging are just a few areas where SPD matrix data might be observed. Some articles have dealt with methods of density estimation on this space but the literature remains relatively scarce. Chevallier et al. [17] show how truncated Fourier series can be used for various applications, Kim and Richards [32, 33] and Haff et al. [28] explore the deconvolution of Wishart mixtures, Chevallier et al. [18] adapt the kernel estimator on compact Riemannian manifolds introduced by Pelletier [46] to compact subsets of the space of multivariate Gaussian distributions under the Fisher information metric and the Wasserstein metric, and Asta [5] defines a kernel density estimator on symmetric spaces of non-compact type (similar to Pelletier’s but for which Helgason–Fourier transforms are defined) and proves an upper bound on the convergence rate that is analogous to the minimax rate of classical kernel estimators on Euclidean spaces.

In a recent preprint, Li et al. [39] consider log-Gaussian kernel estimators (based on the logarithm map for SPD matrices) and a variant of the Wishart asymmetric kernel estimator that is slightly different from our definition below in (25). They prove various asymptotic properties for the former, and a simulation study compares them both. If we were to apply traditional multivariate kernel estimators to the vectorization (in $\mathbb{R}^{d(d+1)/2}$, recall (5)) of a sequence of i.i.d. random SPD matrices, then these estimators would misbehave near the boundary because of the condition on the eigenvalues (i.e., that they remain positive), and the usual boundary kernel modifications would not be appropriate either since positive definiteness is not a condition that can be translated to individual bounds on the entries of a matrix. To the best of our knowledge, [39] is the only paper that presents estimators on the space of SPD matrices that address (implicitly) the spill-over problem of traditional multivariate kernel estimators caused by the boundary condition on the eigenvalues. Similarly to Li et al. [39], we can construct a new density estimator with a Wishart asymmetric kernel that creates a variable smoothing in our space and has a uniformly negligible bias near the boundary of $\mathcal{S}_{++}^{d}$.

In terms of applications, our new density estimation method on $\mathcal{S}_{++}^{d}$ could be used, apart from visualization purposes, for nonparametric alternatives to regression and classification (both supervised and unsupervised) in any of the fields mentioned at the beginning of the first paragraph in this section. The favorable boundary properties of our estimator (the proof of Theorem 2 below shows that the pointwise bias is asymptotically uniformly negligible near the boundary) mean that it could be particularly useful for scarce data sets and/or data sets with clusters of observations near the boundary of $\mathcal{S}_{++}^{d}$.

Here is the definition of our estimator. Assume that we have a sequence of observations $\mathbb{X}_{1},\dots,\mathbb{X}_{n}\in\mathcal{S}_{++}^{d}$ that are independent and $F$ distributed ($F$ is unknown), with density $f$ supported on $\mathcal{S}_{++}^{d}$ for some $d\in\mathbb{N}$. Then, for a given bandwidth parameter $b>0$, let

\hat{f}_{n,b}(\mathbb{S}) := \frac{1}{n}\sum_{i=1}^{n} K_{1/b,\,b\,\mathbb{S}}(\mathbb{X}_{i}), \quad \mathbb{S}\in\mathcal{S}_{++}^{d}, \quad (25)

be the (or a) Wishart asymmetric kernel estimator for the density function $f$, where $K_{1/b,\,b\,\mathbb{S}}(\cdot)$ is defined in (3) (a minimal R sketch is given right after this paragraph). The estimator $\hat{f}_{n,b}$ can be seen as a continuous example in the broader class of multivariate associated kernel estimators introduced by Kokonendji and Somé [35, 36]. It is also a natural generalization of a slight variant of the (unmodified) Gamma kernel estimator introduced by Chen [16] because the Wishart distribution (recall (3)) is a matrix-variate analogue of the Gamma distribution. In [16, 12, 19, 9, 10, 11, 58, 8, 29], many asymptotic properties of Gamma kernel estimators of density functions supported on the half-line $[0,\infty)$ were studied, among other things: pointwise bias, pointwise variance, mean squared error, mean integrated squared error, asymptotic normality and uniform strong consistency. Also, bias reduction techniques were explored by Igarashi and Kakizawa [31] and Funke and Kawka [21], and adaptive Bayesian methods of bandwidth selection were presented by Somé [52] and Somé and Kokonendji [53].
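Here is a minimal, self-contained R sketch of (25), under the assumption that the observations are stored as a $d\times d\times n$ array; the Wishart log-density (3) is evaluated through the multivariate log-gamma function to avoid overflow. The function names are ours, for illustration only.

```r
# log of the multivariate gamma function, log Gamma_d(a).
lmvgamma <- function(a, d) d * (d - 1) / 4 * log(pi) + sum(lgamma(a + (1 - (1:d)) / 2))

# log K_{nu,Sigma}(X), the Wishart_d(nu, Sigma) log-density from (3).
log_wishart <- function(X, nu, Sigma) {
  d <- nrow(Sigma)
  ldX <- as.numeric(determinant(X, logarithm = TRUE)$modulus)
  ldS <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  (nu - d - 1) / 2 * ldX - 0.5 * sum(diag(solve(Sigma, X))) -
    nu * d / 2 * log(2) - nu / 2 * ldS - lmvgamma(nu / 2, d)
}

# Wishart asymmetric kernel estimator (25): average K_{1/b, bS} over the sample.
fhat <- function(S, Xarr, b) {
  mean(apply(Xarr, 3, function(X) exp(log_wishart(X, nu = 1 / b, Sigma = b * S))))
}

# Toy usage: data drawn from Wishart_2(10, I/10), so f is that density.
set.seed(2)
Xarr <- rWishart(500, df = 10, Sigma = diag(2) / 10)
fhat(diag(2), Xarr, b = 0.05)   # density estimate at S = I_2
```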

Below, we show how some of the asymptotic properties of $\hat{f}_{n,b}$ can be studied using the asymptotic expansion developed in Theorem 1. Assume that $f$ is Lipschitz continuous and bounded on $\mathcal{S}_{++}^{d}$. (To make sense of this assumption, note that $\mathcal{S}_{++}^{d}$ is an open and convex subset of the space of symmetric matrices of size $d\times d$, which itself is isomorphic to $\mathbb{R}^{d(d+1)/2}$.) Then, straightforward calculations show that, for any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$,

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) = n^{-1}\,\mathbb{E}\left[\left(K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right)^{2}\right] - n^{-1}\left(\mathbb{E}\left[K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right]\right)^{2} = n^{-1}\,\mathbb{E}\left[\left(K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right)^{2}\right] - \mathcal{O}_{d,\mathbb{S}}(n^{-1}), \quad (26)

where, using the local approximation (14) for the first equality,

\mathbb{E}\left[\left(K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})\right)^{2}\right] = \int_{\mathcal{S}^{d}} g_{1/b,\,b\,\mathbb{S}}^{2}(\mathbb{X})\,f(\mathbb{X})\,{\rm d}\mathbb{X} + \mathrm{o}_{d,\mathbb{S}}(1) = \frac{2^{-d(d+1)/4}\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2}))}{\sqrt{(2\pi)^{d(d+1)/2}\,2^{-d(d-1)/2}\,|\sqrt{2b}\,\mathbb{S}|^{d+1}}}\underbrace{\int_{\mathcal{S}^{d}} g_{1/b,\,\frac{b}{\sqrt{2}}\,\mathbb{S}}(\mathbb{X})\,{\rm d}\mathbb{X}}_{=\,1+\mathrm{o}_{d,\mathbb{S}}(1)} + \mathrm{o}_{d,\mathbb{S}}(1)
\qquad = b^{-d(d+1)/4}\,\frac{|\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2})). \quad (27)

By applying this last estimate in (26), we obtain the pointwise variance.

Proposition 1 (Pointwise variance).

Assume that $f$ is Lipschitz continuous and bounded on $\mathcal{S}_{++}^{d}$. For any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$, we have

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) = n^{-1} b^{-d(d+1)/4}\,\frac{|\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2})), \quad n\to\infty. \quad (28)
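As a rough sanity check of Proposition 1, one can compare the empirical variance of $\hat{f}_{n,b}(\mathbb{S})$ over independent replicates with the leading term of (28). The sketch below reuses log_wishart and fhat from the sketch earlier in this subsection; agreement is only approximate at finite $n$ and $b$, since (28) is asymptotic.

```r
set.seed(3)
n <- 200; b <- 0.05; d <- 2; S <- diag(d)
f_true <- function(X) exp(log_wishart(X, nu = 10, Sigma = diag(d) / 10))  # true f
reps <- replicate(300, fhat(S, rWishart(n, df = 10, Sigma = diag(d) / 10), b))
var(reps)                                              # empirical variance
(1 / n) * b^(-d * (d + 1) / 4) *                       # leading term of (28)
  det(sqrt(pi) * S)^(-(d + 1) / 2) / 2^(d * (d + 2) / 2) * f_true(S)
```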

From this, other asymptotic expressions can be derived, such as the mean squared error and the mean integrated squared error, and we can also optimize the bandwidth parameter $b$ with respect to these expressions to implement a plug-in selection method exactly as we would in the setting of traditional multivariate kernel estimators, see, e.g., Scott [50, Section 6.5] or Chacón and Duong [14, Section 3.6]. The expressions for the mean squared error and the mean integrated squared error (away from the boundary) are provided below in Corollary 2 and Theorem 4, respectively, together with the corresponding optimal choice of $b$. For the sake of completeness, we also provide results on the pointwise bias of our estimator (see Theorem 2), its pointwise variance as we move towards the boundary of $\mathcal{S}_{++}^{d}$ (see Theorem 3) and its asymptotic normality (see Theorem 5).

For each result in the remainder of this section, one of the following two assumptions will be used:

$\bullet$ The density $f$ is Lipschitz continuous and bounded on $\mathcal{S}_{++}^{d}$. (29)
$\bullet$ The density $f$ and its first-order partial derivatives are continuous and bounded on $\mathcal{S}_{++}^{d}$, (30)
and the second-order partial derivatives of $f$ are uniformly continuous and bounded on $\mathcal{S}_{++}^{d}$. (31)
Remark 2.

Again, to make sense of the above assumptions, note that $\mathcal{S}_{++}^{d}$ is an open and convex subset of the space of symmetric matrices of size $d\times d$, denoted by $\mathcal{S}^{d}$, which itself is isomorphic to $\mathbb{R}^{d(d+1)/2}$.

We denote the expectation of $\hat{f}_{n,b}(\mathbb{S})$ by

f_{b}(\mathbb{S}) := \mathbb{E}\left[\hat{f}_{n,b}(\mathbb{S})\right] = \mathbb{E}[K_{1/b,\,b\,\mathbb{S}}(\mathbb{X}_{1})] = \int_{\mathcal{S}_{++}^{d}} K_{1/b,\,b\,\mathbb{S}}(\mathbb{M})\,f(\mathbb{M})\,{\rm d}\mathbb{M}. \quad (32)

Alternatively, notice that if $\mathbb{W}_{\mathbb{S}}\sim\mathrm{Wishart}_{d}(1/b,\,b\,\mathbb{S})$, then we also have the representation

f_{b}(\mathbb{S}) = \mathbb{E}[f(\mathbb{W}_{\mathbb{S}})]. \quad (33)
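The representation (33) also suggests a direct Monte Carlo evaluation of $f_b(\mathbb{S})$; here is a short R sketch (ours, for illustration), where f is any density function on the space of SPD matrices supplied by the user.

```r
# Monte Carlo version of (33): f_b(S) = E[f(W_S)], W_S ~ Wishart_d(1/b, b*S).
# Requires 1/b > d - 1 for the Wishart density to exist.
fb <- function(S, b, f, M = 1e4) {
  mean(apply(rWishart(M, df = 1 / b, Sigma = b * S), 3, f))
}
```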

The asymptotics of the pointwise bias and variance were first computed by Chen [15, 16] for Beta and Gamma kernel estimators, by Ouimet and Tolosana-Delgado [43] for the Dirichlet kernel estimator of Aitchison and Lauder [2], and by Kokonendji and Somé [35, 36] for multivariate associated kernel estimators. The next two theorems below extend the (unmodified) Gamma case to our multidimensional setting.

Theorem 2 (Pointwise bias).

Assume that (31) holds. Then, as $b\to 0$ and for all $\mathbb{S}\in\mathcal{S}_{++}^{d}$, we have

\mathbb{B}\mathrm{ias}[\hat{f}_{n,b}(\mathbb{S})] = f_{b}(\mathbb{S}) - f(\mathbb{S}) = b\,g(\mathbb{S}) + \mathrm{o}_{d,\mathbb{S}}(b), \quad (34)

where

g(\mathbb{S}) := \frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\left[2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})B_{d}\right]_{(\boldsymbol{i},\boldsymbol{j})}\,\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S}), \quad (35)

and where $\otimes$ denotes the Kronecker product, and $[\,\cdot\,]_{(\boldsymbol{i},\boldsymbol{j})}$ means that we select the entry $((i_{2}-1)i_{2}/2+i_{1},\,(j_{2}-1)j_{2}/2+j_{1})$ in the $(d(d+1)/2)\times(d(d+1)/2)$ matrix.
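To make the index map in Theorem 2 concrete, here is a one-line R helper (ours, purely illustrative): entry $(i_1,i_2)$ with $i_1\leq i_2$ of a symmetric matrix sits at position $(i_2-1)i_2/2+i_1$ of $\mathrm{vecp}$, consistent with the column-stacking order in (5).

```r
# Position of entry (i1, i2), i1 <= i2, in the vecp ordering (5).
vecp_index <- function(i1, i2) (i2 - 1) * i2 / 2 + i1
vecp_index(1, 1); vecp_index(1, 2); vecp_index(2, 2)  # 1, 2, 3
```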

Theorem 3 (Pointwise variance near and away from the boundary of $\mathcal{S}_{++}^{d}$).

Assume that (29) holds. Furthermore, let $\mathbb{K}\in\mathcal{S}_{++}^{d}$ be independent of $b$ and assume that it diagonalizes as $\mathbb{K} = V\,\mathrm{diag}(\boldsymbol{\lambda}(\mathbb{K}))\,V^{\top}$. Pick any subset $\emptyset\subseteq\mathcal{J}\subseteq[d]$ and assume that

\mathbb{S} = V\,\mathrm{diag}(\boldsymbol{\lambda}_{b}(\mathbb{K}))\,V^{\top}, \quad \text{where the vector $\boldsymbol{\lambda}_{b}(\mathbb{K})$ satisfies} \quad [\boldsymbol{\lambda}_{b}(\mathbb{K})]_{i} := \begin{cases} \lambda_{i}(\mathbb{K}), & \text{if } i\in[d]\backslash\mathcal{J},\\ b\,\lambda_{i}(\mathbb{K}), & \text{if } i\in\mathcal{J}. \end{cases} \quad (36)

In particular, with this choice of $\mathbb{S}$, note that $|\mathbb{S}| = b^{|\mathcal{J}|}\,|\mathbb{K}|$. Then we have, as $n\to\infty$,

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) = n^{-1} b^{-r(d)/2 - |\mathcal{J}|\frac{d+1}{2}}\cdot\psi(\mathbb{K})\,(f(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(b^{1/2})) + \mathcal{O}_{d,\mathbb{S}}(n^{-1}), \quad (37)

where

r(d) := \frac{d(d+1)}{2} \qquad\text{and}\qquad \psi(\mathbb{K}) := \frac{|\sqrt{\pi}\,\mathbb{K}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}. \quad (38)

The above theorem means that the pointwise variance is $\mathcal{O}_{d,\mathbb{S}}(n^{-1}b^{-r(d)/2})$ away from the boundary of $\mathcal{S}_{++}^{d}$, and it gets multiplied by a factor $b^{-(d+1)/2}$ every time one of the $d$ eigenvalues approaches zero at a linear rate with respect to $b$. If $|\mathcal{J}|$ eigenvalues approach zero as $b\to 0$, then the pointwise variance is $\mathcal{O}_{d,\mathbb{S}}(n^{-1}b^{-r(d)/2 - |\mathcal{J}|\frac{d+1}{2}})$.

By combining Theorem 2 and Proposition 1 (equivalently, Theorem 3 for $\mathcal{J}=\emptyset$), we can compute the mean squared error of our estimator and optimize the choice of the bandwidth parameter $b$.

Corollary 2 (Mean squared error).

Assume that (31) holds. Then, as $n\to\infty$ and $b = b(n)\to 0$, and for any given $\mathbb{S}\in\mathcal{S}_{++}^{d}$, we have

\mathrm{MSE}[\hat{f}_{n,b}(\mathbb{S})] := \mathbb{E}\left[\left|\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})\right|^{2}\right] = \mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S})) + \left(\mathbb{B}\mathrm{ias}[\hat{f}_{n,b}(\mathbb{S})]\right)^{2} \quad (39)
\qquad = n^{-1} b^{-r(d)/2}\cdot\psi(\mathbb{S})f(\mathbb{S}) + b^{2}g^{2}(\mathbb{S}) + \mathcal{O}_{d,\mathbb{S}}(n^{-1}b^{-r(d)/2+1/2}) + \mathrm{o}_{d,\mathbb{S}}(b^{2}).

In particular, if $f(\mathbb{S})\cdot g(\mathbb{S}) \neq 0$, the asymptotically optimal choice of $b$, with respect to $\mathrm{MSE}$, is

b_{\mathrm{opt}}(\mathbb{S}) = n^{-2/(r(d)+4)}\left[\frac{r(d)}{4}\cdot\frac{\psi(\mathbb{S})f(\mathbb{S})}{g^{2}(\mathbb{S})}\right]^{2/(r(d)+4)}, \quad (40)

with

\mathrm{MSE}[\hat{f}_{n,b_{\mathrm{opt}}(\mathbb{S})}(\mathbb{S})] = n^{-4/(r(d)+4)}\left[\frac{1+\frac{r(d)}{4}}{\left(\frac{r(d)}{4}\right)^{\frac{r(d)}{r(d)+4}}}\right]\frac{\left(\psi(\mathbb{S})f(\mathbb{S})\right)^{4/(r(d)+4)}}{\left(g^{2}(\mathbb{S})\right)^{-r(d)/(r(d)+4)}} + \mathrm{o}_{d,\mathbb{S}}(n^{-4/(r(d)+4)}), \quad n\to\infty. \quad (41)

More generally, if $n^{2/(r(d)+4)}\,b\to\lambda$ as $n\to\infty$ and $b = b(n)\to 0$ for some $\lambda>0$, then

\mathrm{MSE}[\hat{f}_{n,b}(\mathbb{S})] = n^{-4/(r(d)+4)}\left[\lambda^{-r(d)/2}\psi(\mathbb{S})f(\mathbb{S}) + \lambda^{2}g^{2}(\mathbb{S})\right] + \mathrm{o}_{d,\mathbb{S}}(n^{-4/(r(d)+4)}). \quad (42)
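For completeness, here is the routine bias-variance trade-off behind (40) (our own spelled-out step), with the shorthand $A := \psi(\mathbb{S})f(\mathbb{S})$ and $B := g^{2}(\mathbb{S})$: keeping only the leading terms of (39), we minimize over $b$,

\phi(b) := n^{-1}b^{-r(d)/2}A + b^{2}B, \qquad \phi'(b) = -\frac{r(d)}{2}\,n^{-1}b^{-r(d)/2-1}A + 2\,b\,B = 0 \quad\Longleftrightarrow\quad b^{\,r(d)/2+2} = n^{-1}\cdot\frac{r(d)}{4}\cdot\frac{A}{B},

and solving for $b$ gives $b_{\mathrm{opt}}(\mathbb{S}) = n^{-2/(r(d)+4)}\big[\tfrac{r(d)}{4}\cdot A/B\big]^{2/(r(d)+4)}$, which is (40); substituting $b_{\mathrm{opt}}(\mathbb{S})$ back into $\phi$ yields (41).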

By integrating the MSE on the following subset of $\mathcal{S}_{++}^{d}$,

\mathcal{S}_{++}^{d}(\delta) := \left\{\mathbb{M}\in\mathcal{S}_{++}^{d} : \delta \leq \lambda_{1}(\mathbb{M}) \leq \dots \leq \lambda_{d}(\mathbb{M}) \leq \delta^{-1}\right\}, \qquad 0<\delta<1, \quad (43)

we obtain the next result.

Theorem 4 (Mean integrated squared error on $\mathcal{S}_{++}^{d}(\delta)$).

Assume that (31) holds. For a given $\delta\in(0,1)$, assume also that

\int_{\mathcal{S}_{++}^{d}(\delta)} \psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S} < \infty, \qquad \int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S} < \infty, \quad (44)

where $\psi$ and $g$ were defined in (38) and (35), respectively. Then, as $n\to\infty$ and $b = b(n)\to 0$, we have

\mathrm{MISE}_{\delta}[\hat{f}_{n,b}] := \int_{\mathcal{S}_{++}^{d}(\delta)} \mathbb{E}\left[\left|\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})\right|^{2}\right]{\rm d}\mathbb{S} \quad (45)
\qquad = n^{-1} b^{-r(d)/2}\int_{\mathcal{S}_{++}^{d}(\delta)} \psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S} + b^{2}\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S} + \mathrm{o}_{d,\delta}(n^{-1}b^{-r(d)/2}) + \mathrm{o}_{d,\delta}(b^{2}).

In particular, if $\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S} > 0$, the asymptotically optimal choice of $b$, with respect to $\mathrm{MISE}_{\delta}$, is

b_{\mathrm{opt}} = n^{-2/(r(d)+4)}\left[\frac{r(d)}{4}\cdot\frac{\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S}}{\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}}\right]^{2/(r(d)+4)}, \quad (46)

with

\mathrm{MISE}_{\delta}[\hat{f}_{n,b_{\mathrm{opt}}}] = n^{-4/(r(d)+4)}\left[\frac{1+\frac{r(d)}{4}}{\left(\frac{r(d)}{4}\right)^{\frac{r(d)}{r(d)+4}}}\right]\frac{\left(\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S}\right)^{4/(r(d)+4)}}{\left(\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}\right)^{-r(d)/(r(d)+4)}} + \mathrm{o}_{d,\delta}(n^{-4/(r(d)+4)}), \quad n\to\infty. \quad (47)

More generally, if $n^{2/(r(d)+4)}\,b\to\lambda$ as $n\to\infty$ and $b = b(n)\to 0$ for some $\lambda>0$, then

\mathrm{MISE}_{\delta}[\hat{f}_{n,b}] = n^{-4/(r(d)+4)}\left[\lambda^{-r(d)/2}\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S} + \lambda^{2}\int_{\mathcal{S}_{++}^{d}(\delta)} g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}\right] + \mathrm{o}_{d,\delta}(n^{-4/(r(d)+4)}). \quad (48)

A straightforward verification of the Lindeberg condition for double arrays yields the asymptotic normality.

Theorem 5 (Asymptotic normality).

Assume that (29) holds. Let $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be such that $f(\mathbb{S})>0$. If $n^{1/2}b^{r(d)/4}\to\infty$ as $n\to\infty$ and $b = b(n)\to 0$, then

n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S}) - f_{b}(\mathbb{S})) \stackrel{\mathscr{D}}{\longrightarrow} \mathcal{N}(0,\psi(\mathbb{S})f(\mathbb{S})). \quad (49)

If we also have $n^{1/2}b^{r(d)/4+1/2}\to 0$ as $n\to\infty$ and $b = b(n)\to 0$, then Theorem 2 implies

n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})) \stackrel{\mathscr{D}}{\longrightarrow} \mathcal{N}(0,\psi(\mathbb{S})f(\mathbb{S})). \quad (50)

Independently of the above rates for $n$ and $b$, if we assume (31) instead and $n^{2/(r(d)+4)}\,b\to\lambda$ as $n\to\infty$ and $b = b(n)\to 0$ for some $\lambda>0$, then Theorem 2 implies

n^{2/(r(d)+4)}(\hat{f}_{n,b}(\mathbb{S}) - f(\mathbb{S})) \stackrel{\mathscr{D}}{\longrightarrow} \mathcal{N}(\lambda\,g(\mathbb{S}),\,\lambda^{-r(d)/2}\psi(\mathbb{S})f(\mathbb{S})). \quad (51)
Remark 3.

The rate of convergence for the traditional $d(d+1)/2$-dimensional kernel density estimator with i.i.d. data and bandwidth $h$ is $\mathcal{O}_{d}(n^{-1/2}h^{-r(d)/2})$ in Theorem 3.1.15 of Prakasa Rao [48], whereas $\hat{f}_{n,b}$ converges at a rate of $\mathcal{O}_{d}(n^{-1/2}b^{-r(d)/4})$. Hence, the relation between the bandwidth of $\hat{f}_{n,b}$ and the bandwidth of the traditional multivariate kernel density estimator is $b\approx h^{2}$.

3.2 Total variation and other probability metrics upper bounds between the Wishart and SMN distributions

Our second application of Theorem 1 is to compute an upper bound on the total variation between the probability measures induced by (3) and (8). Given the relations between the total variation and other probability metrics such as the Hellinger distance (see, e.g., Gibbs and Su [24, p.421]), we obtain several other upper bounds automatically. For the uninitiated reader, the utility of having total variation or Hellinger distance bounds between two measures is discussed by Pollard [47].

Theorem 6.

Let $\nu > d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given. Let $\mathbb{Q}_{\nu,\mathbb{S}}$ be the law of the $\mathrm{SMN}_{d\times d}(\nu\,\mathbb{S},\, B_{d}^{\top}(\sqrt{2\nu}\,\mathbb{S}\otimes\sqrt{2\nu}\,\mathbb{S})B_{d})$ distribution defined in (8), and let $\mathbb{P}_{\nu,\mathbb{S}}$ be the law of the $\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ distribution defined in (3). Then, as $\nu\to\infty$, we have

\mathrm{dist}(\mathbb{P}_{\nu,\mathbb{S}},\mathbb{Q}_{\nu,\mathbb{S}}) \leq \frac{C\,d^{3/2}}{\sqrt{\nu}} \qquad\text{and}\qquad \mathcal{H}(\mathbb{P}_{\nu,\mathbb{S}},\mathbb{Q}_{\nu,\mathbb{S}}) \leq \sqrt{\frac{2\,C\,d^{3/2}}{\sqrt{\nu}}}, \quad (52)

where $C>0$ is a universal constant, $\mathcal{H}(\cdot,\cdot)$ denotes the Hellinger distance, and $\mathrm{dist}(\cdot,\cdot)$ can be replaced by any of the following probability metrics: Total variation, Kolmogorov (or Uniform) metric, Lévy metric, Discrepancy metric, Prokhorov metric.

4 Proofs

Proof of Theorem 1.

First, note that

\mathbb{S}^{-1/2}\,\mathbb{X}\,\mathbb{S}^{-1/2} = \nu\,(\nu\,\mathbb{S})^{-1/2}(\nu\,\mathbb{S} + \mathbb{X} - \nu\,\mathbb{S})(\nu\,\mathbb{S})^{-1/2} = \nu\,(\mathrm{I}_{d} + \sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}), \quad (53)

so we can rewrite (3) as

K_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{|\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}|^{\nu/2-(d+1)/2}}{2^{d/2}\,\pi^{d(d+1)/4}\,|\sqrt{2\nu}\,\mathbb{S}|^{(d+1)/2}} \cdot \frac{\exp\left(-\frac{\nu}{2}\mathrm{tr}(\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}})\right)\exp(\frac{\nu d}{2})}{\prod_{i=1}^{d}\frac{(\nu-(i+1))^{(\nu-i)/2}}{e^{-(i+1)/2}\,\nu^{(\nu-i)/2}} \cdot \prod_{i=1}^{d}\frac{\Gamma(\frac{1}{2}(\nu-(i+1))+1)}{\sqrt{2\pi}\,e^{-\frac{1}{2}(\nu-(i+1))}\,[\frac{1}{2}(\nu-(i+1))]^{(\nu-i)/2}}}. \quad (54)

Using the Taylor expansion

\log(1-y) = -y - \frac{y^{2}}{2} - \frac{y^{3}}{3} - \frac{y^{4}}{4} + \mathcal{O}(y^{5}), \quad |y|<1, \quad (55)

and Stirling's formula,

\log\Gamma(z+1) = \frac{1}{2}\log(2\pi) + (z+\tfrac{1}{2})\log z - z + \frac{1}{12z} + \mathcal{O}(z^{-3}), \quad z>0, \quad (56)

see, e.g., Abramowitz and Stegun [1, p.257], we have

\log\left(\prod_{i=1}^{d}\frac{(\nu-(i+1))^{(\nu-i)/2}}{e^{-(i+1)/2}\,\nu^{(\nu-i)/2}}\right) = \sum_{i=1}^{d}\frac{i+1}{2} + \sum_{i=1}^{d}\frac{\nu-i}{2}\log\left(1-\frac{i+1}{\nu}\right)
\qquad = \sum_{i=1}^{d}\frac{i+1}{2} - \sum_{i=1}^{d}\left\{\frac{(\nu-i)(i+1)}{2\nu} + \frac{(\nu-i)(i+1)^{2}}{4\nu^{2}} + \mathcal{O}_{d}(\nu^{-2})\right\}
\qquad = \sum_{i=1}^{d}\left\{\nu^{-1}\cdot\frac{i^{2}-1}{4} + \mathcal{O}_{d}(\nu^{-2})\right\} = \nu^{-1}\cdot\frac{d\,(2d^{2}+3d-5)}{24} + \mathcal{O}_{d}(\nu^{-2}), \quad (57)

and

\log\left(\prod_{i=1}^{d}\frac{\Gamma(\frac{1}{2}(\nu-(i+1))+1)}{\sqrt{2\pi}\,e^{-\frac{1}{2}(\nu-(i+1))}\,[\frac{1}{2}(\nu-(i+1))]^{(\nu-i)/2}}\right) = \sum_{i=1}^{d}\left\{\frac{1}{6(\nu-(i+1))} + \mathcal{O}_{d}(\nu^{-3})\right\} = \nu^{-1}\cdot\frac{d}{6} + \mathcal{O}_{d}(\nu^{-2}). \quad (58)

By taking the logarithm in (54) and using the expressions found in (57) and (58), we obtain (also using the fact that $\lambda_{i}(\mathrm{I}_{d}+\sqrt{2/\nu}\,\Delta_{\nu,\mathbb{S}}) = 1+\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})$ for all $1\leq i\leq d$):

\log K_{\nu,\mathbb{S}}(\mathbb{X}) = \frac{1}{2}(\nu-(d+1))\sum_{i=1}^{d}\log\left(1+\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right) - \frac{1}{2}\log\left(2^{d}\,\pi^{d(d+1)/2}\,|\sqrt{2\nu}\,\mathbb{S}|^{d+1}\right) - \frac{\nu}{2}\sum_{i=1}^{d}\sqrt{\frac{2}{\nu}}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}}) \quad (59)
\qquad - \nu^{-1}\cdot\left\{\frac{d\,(2d^{2}+3d-5)}{24} + \frac{d}{6}\right\} + \mathcal{O}_{d}(\nu^{-2}).

By the Taylor expansion in (55) and the fact that $\sum_{i=1}^{d}\lambda_{i}^{k}(\Delta_{\nu,\mathbb{S}}) = \mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{k})$ for all $k\in\mathbb{N}$, we have, uniformly for $\mathbb{X}\in B_{\nu,\mathbb{S}}(\eta)$,

\sum_{i=1}^{d}\log\left(1+\sqrt{2/\nu}\,\lambda_{i}(\Delta_{\nu,\mathbb{S}})\right) = \sqrt{\frac{2}{\nu}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}) - \frac{1}{\nu}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2}) + \frac{2\sqrt{2}}{3\nu^{3/2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}) - \frac{1}{\nu^{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4}) + \mathcal{O}_{d,\eta}\left(\frac{\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{5/2}}\right). \quad (60)

Therefore,

logKν,𝕊(𝕏)\displaystyle\log K_{\nu,\mathbb{S}}(\mathbb{X}) =12log(2dπd(d+1)/2|2ν𝕊|d+1)12tr(Δν,𝕊2)+ν1/2{23tr(Δν,𝕊3)d+12tr(Δν,𝕊)}\displaystyle=-\frac{1}{2}\log\left(2^{d}\pi^{\hskip 0.85358ptd(d+1)/2}|\sqrt{2\nu}\,\mathbb{S}|^{d+1}\right)-\frac{1}{2}\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})+\nu^{-1/2}\cdot\Bigg{\{}\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})-\frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\Bigg{\}} (61)
+ν1{12tr(Δν,𝕊4)+d+12tr(Δν,𝕊2)(d(2d2+3d5)24+d6)}+𝒪d,η(1+max1id|λi(Δν,𝕊)|5ν3/2).\displaystyle\quad+\nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})+\frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})-\left(\frac{d\,(2d^{\hskip 0.85358pt2}+3d-5)}{24}+\frac{d}{6}\right)\right\}+\mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right).

With the expression for the symmetric matrix-variate normal density in (8), we can rewrite the above as

\log\left(\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}\right)
= \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})-\frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\}
 + \nu^{-1}\cdot\left\{-\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})+\frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})-\left(\frac{d\,(2d^{2}+3d-5)}{24}+\frac{d}{6}\right)\right\}
 + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{5}}{\nu^{3/2}}\right), \quad (62)

which proves (13). To obtain (14) and conclude the proof, we take the exponential on both sides of the last equation and we expand the right-hand side with

e^{y}=1+y+\frac{y^{2}}{2}+\mathcal{O}(e^{\widetilde{\eta}}\,y^{3}),\quad\text{for }-\infty<y\leq\widetilde{\eta}. \quad (63)

For $\nu$ large enough and uniformly for $\mathbb{X}\in B_{\nu,\mathbb{S}}(\eta)$, the right-hand side of (62) is $\mathcal{O}_{d}(1)$, so we get

\frac{K_{\nu,\mathbb{S}}(\mathbb{X})}{g_{\nu,\mathbb{S}}(\mathbb{X})}
= 1 + \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})-\frac{d+1}{\sqrt{2}}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right\}
 + \nu^{-1}\cdot\left\{\frac{1}{9}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\right)^{2}-\frac{d+1}{3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}})+\frac{(d+1)^{2}}{4}\left(\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right)^{2}\right.
 \left.\quad -\frac{1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})+\frac{d+1}{2}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})-\left(\frac{d\,(2d^{2}+3d-5)}{24}+\frac{d}{6}\right)\right\}
 + \mathcal{O}_{d,\eta}\left(\frac{1+\max_{1\leq i\leq d}|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|^{9}}{\nu^{3/2}}\right). \quad (64)

This ends the proof. ∎
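Before moving on, note that the leading correction in (62) is easy to probe numerically. The sketch below (a Python illustration assuming scipy; the choice $\mathbb{S}=\mathrm{I}_{d}$, under which $\Delta_{\nu,\mathbb{S}}=(\mathbb{X}-\nu\,\mathrm{I}_{d})/\sqrt{2\nu}$, is only for convenience) compares the exact log-ratio $\log(K_{\nu,\mathbb{S}}/g_{\nu,\mathbb{S}})$ with its $\nu^{-1/2}$ term.

# Illustrative check of the expansion (62) at S = I_d; not part of the proof.
import numpy as np
from scipy.stats import wishart

d, nu = 3, 2000
S = np.eye(d)
X = wishart.rvs(df=nu, scale=S, random_state=np.random.default_rng(0))

Delta = (X - nu * S) / np.sqrt(2 * nu)
tr = lambda k: np.trace(np.linalg.matrix_power(Delta, k))

# log of the SMN density g_{nu,S} with matching mean and covariances, cf. (61)
log_g = (-0.5 * (d * np.log(2) + d * (d + 1) / 2 * np.log(np.pi)
                 + (d + 1) * d * np.log(np.sqrt(2 * nu)))
         - 0.5 * tr(2))
log_K = wishart.logpdf(X, df=nu, scale=S)

leading = (np.sqrt(2) / 3 * tr(3) - (d + 1) / np.sqrt(2) * tr(1)) / np.sqrt(nu)
print(log_K - log_g, leading)  # the two numbers agree up to O(1/nu)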

Proof of Theorem 2.

Assume that (31) holds, and let

\mathbb{W}_{\mathbb{S}}\vcentcolon=(\mathbb{W}_{\boldsymbol{i}})_{\boldsymbol{i}\in[d]^{2}}\sim\mathrm{Wishart}_{d}(1/b,\,b\,\mathbb{S}),\quad\mathbb{S}\in\mathcal{S}_{++}^{d},~b>0. \quad (65)

(Recall the notation $[d]\vcentcolon=\{1,\dots,d\}$.) By a second-order mean value theorem, we have

f(\mathbb{W}_{\mathbb{S}})-f(\mathbb{S})
= \sum_{\substack{\boldsymbol{i}\in[d]^{2}\\ i_{1}\leq i_{2}}}(\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}})\frac{\partial}{\partial\mathbb{S}_{\boldsymbol{i}}}f(\mathbb{S})+\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}(\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}})(\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}})\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})
 \quad+\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}(\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}})(\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}})\left(\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{M}_{\mathbb{S}})-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right), \quad (66)

for some random matrix $\mathbb{M}_{\mathbb{S}}\in\mathcal{S}_{++}^{d}$ on the line segment joining $\mathbb{W}_{\mathbb{S}}$ and $\mathbb{S}$ in $\mathcal{S}_{++}^{d}$. (The mean value theorem is applicable because the subspace $\mathcal{S}_{++}^{d}$ is open and convex in the space of symmetric matrices of size $d\times d$; recall Remark 2.) If we take the expectation in the last equation, and then use the estimates in (6) and (7), we get

\left|f_{b}(\mathbb{S})-f(\mathbb{S})-\frac{b}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\left[2\,B_{d}^{\top}(\mathbb{S}\otimes\mathbb{S})B_{d}\right]_{(\boldsymbol{i},\boldsymbol{j})}\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|
\leq\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}||\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}}|\cdot\left|\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{M}_{\mathbb{S}})-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|\cdot\mathds{1}_{\{\|\mathrm{vec}(\mathbb{W}_{\mathbb{S}}-\mathbb{S})\|_{1}\leq\delta_{\varepsilon,d}\}}\right]
\quad+\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}||\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}}|\cdot\left|\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{M}_{\mathbb{S}})-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|\cdot\mathds{1}_{\{\|\mathrm{vec}(\mathbb{W}_{\mathbb{S}}-\mathbb{S})\|_{1}>\delta_{\varepsilon,d}\}}\right]
=\vcentcolon\Delta_{1}+\Delta_{2}, \quad (67)

where for any given $\varepsilon>0$, the real number $\delta_{\varepsilon,d}\in(0,1]$ is such that

\|\mathrm{vec}(\mathbb{S}'-\mathbb{S})\|_{1}\leq\delta_{\varepsilon,d}\quad\text{implies}\quad\left|\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S}')-\frac{\partial^{2}}{\partial\mathbb{S}_{\boldsymbol{i}}\partial\mathbb{S}_{\boldsymbol{j}}}f(\mathbb{S})\right|<\varepsilon, \quad (68)

uniformly for $\mathbb{S},\mathbb{S}'\in\mathcal{S}_{++}^{d}$. (We know that such a number exists because the second-order partial derivatives of $f$ are assumed to be uniformly continuous on $\mathcal{S}_{++}^{d}$.) Equations (68) and (7) then yield, together with the Cauchy-Schwarz inequality,

\Delta_{1}\leq\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}\varepsilon\cdot\sqrt{\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}|^{2}\right]}\sqrt{\mathbb{E}\left[|\mathbb{W}_{\boldsymbol{j}}-\mathbb{S}_{\boldsymbol{j}}|^{2}\right]}=\varepsilon\cdot\mathcal{O}_{d,\mathbb{S}}(b). \quad (69)

The second-order partial derivatives of $f$ are also assumed to be bounded, say by some constant $M_{d}>0$. Furthermore, $\{\|\mathrm{vec}(\mathbb{W}_{\mathbb{S}}-\mathbb{S})\|_{1}>\delta_{\varepsilon,d}\}$ implies that at least one component of $(\mathbb{W}_{\boldsymbol{k}}-\mathbb{S}_{\boldsymbol{k}})_{\boldsymbol{k}\in[d]^{2}}$ exceeds $\delta_{\varepsilon,d}/d^{2}$ in absolute value, so a union bound over $\boldsymbol{k}$ followed by $d^{2}$ concentration bounds for the marginals of the Wishart distribution (the diagonal entries of a Wishart random matrix are chi-square distributed, while the off-diagonal entries are variance-gamma distributed) yields

\Delta_{2}\leq\mathcal{O}_{d,\mathbb{S}}\left(\frac{1}{2}\sum_{\substack{\boldsymbol{i},\boldsymbol{j}\in[d]^{2}\\ i_{1}\leq i_{2},\,j_{1}\leq j_{2}}}2M_{d}\cdot\sqrt{\sum_{\boldsymbol{k}\in[d]^{2}}\mathbb{P}\left(|\mathbb{W}_{\boldsymbol{k}}-\mathbb{S}_{\boldsymbol{k}}|\geq\delta_{\varepsilon,d}/d^{2}\right)}\right)\leq\mathcal{O}_{d,\mathbb{S}}\left(d^{5}\,M_{d}\cdot\sqrt{2\exp\left(-\frac{(\delta_{\varepsilon,d}/d^{2})^{2}}{2\,b\,c_{d,\mathbb{S}}}\right)}\right), \quad (70)

where $c_{d,\mathbb{S}}>0$ is a large enough constant that depends only on $d$ and $\mathbb{S}$. If we choose a sequence $\varepsilon=\varepsilon(b)$ that goes to $0$ as $b\to 0$ slowly enough that $1\geq\delta_{\varepsilon,d}>d^{2}\,[100\,b\,c_{d,\mathbb{S}}\,|\log b|]^{1/2}$, for example, then $\Delta_{1}+\Delta_{2}$ in (67) is $\mathrm{o}_{d,\mathbb{S}}(b)$ by (69) and (70). This ends the proof. ∎
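As a side remark, the $\mathcal{O}(b)$ bias mechanism above is easy to visualize by simulation. The sketch below (a Python illustration assuming scipy; the toy choice $f(\mathbb{M})=\exp(-\mathrm{tr}(\mathbb{M}))$ and $\mathbb{S}=\mathrm{I}_{2}$ are hypothetical, chosen only because this $f$ is smooth with bounded derivatives) estimates $f_{b}(\mathbb{S})-f(\mathbb{S})$ and shows that it scales linearly in $b$.

# Monte Carlo illustration that E[f(W_S)] - f(S) = O(b) for smooth f,
# with W_S ~ Wishart_d(1/b, b S) as in (65). Illustrative only.
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(1)
S, n_mc = np.eye(2), 200_000
f = lambda M: np.exp(-np.trace(M))  # toy smooth function (hypothetical)

for b in [0.02, 0.01, 0.005]:
    W = wishart.rvs(df=1 / b, scale=b * S, size=n_mc, random_state=rng)
    bias = np.mean([f(M) for M in W]) - f(S)
    print(b, bias / b)  # the ratio roughly stabilizes as b decreases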

Proof of Theorem 3.

Assume that (29) holds. First, note that we can write

\hat{f}_{n,b}(\mathbb{S})-f_{b}(\mathbb{S})=\frac{1}{n}\sum_{i=1}^{n}Y_{i,b}(\mathbb{S}), \quad (71)

where the random variables

Y_{i,b}(\mathbb{S})\vcentcolon=K_{1/b,\,b\,\mathbb{S}}(\mathbb{X}_{i})-f_{b}(\mathbb{S}),\quad 1\leq i\leq n,\quad\text{are i.i.d.} \quad (72)
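For concreteness, the estimator $\hat{f}_{n,b}$ underlying (71) and (72) is just an average of Wishart densities with degrees of freedom $1/b$ and scale $b\,\mathbb{S}$ evaluated at the data. A minimal sketch (a Python illustration assuming scipy; the sampling density in the usage example is hypothetical) follows.

# Sketch of the Wishart asymmetric kernel density estimator; illustrative only.
import numpy as np
from scipy.stats import wishart

def f_hat(S, data, b):
    # Average of the Wishart_d(1/b, b S) density over the SPD data matrices
    return np.mean([wishart.pdf(X, df=1.0 / b, scale=b * S) for X in data])

# Usage: data drawn from a (hypothetical) Wishart_2(5, I/5) population,
# estimate evaluated at S = I_2.
rng = np.random.default_rng(2)
data = wishart.rvs(df=5, scale=np.eye(2) / 5, size=500, random_state=rng)
print(f_hat(np.eye(2), data, b=0.05))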

Hence, if $\widetilde{\mathbb{W}}_{\mathbb{S}}\sim\mathrm{Wishart}_{d}(2/b-(d+1),\,b\,\mathbb{S}/2)$, then

\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S}))
= n^{-1}\,\mathbb{E}\left[K_{1/b,\,b\,\mathbb{S}}(\mathbb{X})^{2}\right]-n^{-1}\left(f_{b}(\mathbb{S})\right)^{2}=n^{-1}A_{b}(\mathbb{S})\,\mathbb{E}[f(\widetilde{\mathbb{W}}_{\mathbb{S}})]-\mathcal{O}_{d,\mathbb{S}}(n^{-1}) \quad (73)
= n^{-1}A_{b}(\mathbb{S})\,(f(\mathbb{S})+\mathcal{O}_{d,\mathbb{S}}(b^{1/2}))-\mathcal{O}_{d,\mathbb{S}}(n^{-1}), \quad (74)

where

A_{b}(\mathbb{S})\vcentcolon=|2b\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}\,\pi^{\frac{d}{2}}\prod_{i=1}^{d}\frac{\Gamma\left(\frac{1}{b}-\frac{d+i}{2}\right)}{2^{\frac{1}{b}-i}\,\Gamma^{2}\left(\frac{1}{2b}-\frac{i+1}{2}+1\right)}, \quad (75)

and where the last line (74) follows from the Lipschitz continuity of $f$, the Cauchy-Schwarz inequality, and the analogue of (7) for $\widetilde{\mathbb{W}}_{\mathbb{S}}=(\widetilde{\mathbb{W}}_{\boldsymbol{i}})_{\boldsymbol{i}\in[d]^{2}}$:

\mathbb{E}[f(\widetilde{\mathbb{W}}_{\mathbb{S}})]-f(\mathbb{S})=\sum_{\substack{\boldsymbol{i}\in[d]^{2}\\ i_{1}\leq i_{2}}}\mathcal{O}\left(\mathbb{E}\left[|\widetilde{\mathbb{W}}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}|\right]\right)\leq\sum_{\substack{\boldsymbol{i}\in[d]^{2}\\ i_{1}\leq i_{2}}}\mathcal{O}\left(\sqrt{\mathbb{E}\left[|\widetilde{\mathbb{W}}_{\boldsymbol{i}}-\mathbb{S}_{\boldsymbol{i}}|^{2}\right]}\right)=\mathcal{O}_{d,\mathbb{S}}(b^{1/2}). \quad (76)

Now, by Stirling’s formula,

\frac{\sqrt{2\pi}\,e^{-x}\,x^{x+1/2}}{\Gamma(x+1)}\longrightarrow 1,\quad x\to\infty. \quad (77)

Therefore,

A_{b}(\mathbb{S})
= |2b\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}\,\pi^{\frac{d}{2}}\cdot\frac{1}{(2\pi)^{d/2}}\,\prod_{i=1}^{d}\frac{e^{(d-i)/2}\left(\frac{1}{b}-\frac{d+i+2}{2}\right)^{\frac{1}{b}-\frac{d+i+1}{2}}}{2^{\frac{1}{b}-i}\left(\frac{1}{2b}-\frac{i+1}{2}\right)^{\frac{1}{b}-i}}\cdot(1+\mathcal{O}(b))
= |2b\sqrt{\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}\,\pi^{\frac{d}{2}}\cdot\frac{b^{\frac{d(d+1)}{4}}}{(2\pi)^{d/2}}\cdot\prod_{i=1}^{d}e^{(d-i)/2}\left(1-\frac{(d-i)/2}{\frac{1}{b}-(i+1)}\right)^{1/b}\cdot(1+\mathcal{O}(b))
= \frac{|\sqrt{b\pi}\,\mathbb{S}|^{-\frac{d+1}{2}}}{2^{d(d+2)/2}}\cdot(1+\mathcal{O}_{d}(b)), \quad (78)

where the last equality follows from the fact that $e^{x}(1-\frac{x}{n})^{n}\to 1$ as $n\to\infty$, for all $x\in\mathbb{R}$. By our assumption on $\mathbb{S}$, note that $|\mathbb{S}|=b^{|\mathcal{J}|}\,|\mathbb{K}|$, so we get the general expression in (37) by combining (73) and (78). This ends the proof. ∎
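The asymptotics in (78) can also be checked directly against the exact expression (75). Below is a short sketch (a Python illustration assuming scipy, and taking $\mathbb{S}=\mathrm{I}_{d}$ for simplicity) that compares the two on the log scale.

# Numerical check (illustrative) of the asymptotics (78) for A_b(I_d).
import numpy as np
from scipy.special import gammaln

def log_A(b, d):
    i = np.arange(1, d + 1)
    # log of (75) with S = I_d, computed via gammaln to avoid overflow
    return (-(d + 1) / 2 * d * np.log(2 * b * np.sqrt(np.pi))
            + d / 2 * np.log(np.pi)
            + np.sum(gammaln(1 / b - (d + i) / 2)
                     - (1 / b - i) * np.log(2)
                     - 2 * gammaln(1 / (2 * b) - (i + 1) / 2 + 1)))

d = 3
for b in [1e-2, 1e-3, 1e-4]:
    log_approx = -(d + 1) / 4 * d * np.log(b * np.pi) - d * (d + 2) / 2 * np.log(2)
    print(b, log_A(b, d) - log_approx)  # difference shrinks like O(b)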

Proof of Theorem 4.

Assume that (31) holds. By Theorem 2, Theorem 3 with $\mathcal{J}=\emptyset$, our assumptions in (44), and the dominated convergence theorem, it is possible to show that

\mathrm{MISE}_{\delta}[\hat{f}_{n,b}]
= \int_{\mathcal{S}_{++}^{d}(\delta)}\mathbb{V}\mathrm{ar}(\hat{f}_{n,b}(\mathbb{S}))\,{\rm d}\mathbb{S}+\int_{\mathcal{S}_{++}^{d}(\delta)}\mathbb{B}\mathrm{ias}[\hat{f}_{n,b}(\mathbb{S})]^{2}\,{\rm d}\mathbb{S}
= n^{-1}b^{-r(d)/2}\int_{\mathcal{S}_{++}^{d}(\delta)}\psi(\mathbb{S})f(\mathbb{S})\,{\rm d}\mathbb{S}+b^{2}\int_{\mathcal{S}_{++}^{d}(\delta)}g^{2}(\mathbb{S})\,{\rm d}\mathbb{S}+\mathrm{o}_{d,\delta}(n^{-1}b^{-r(d)/2})+\mathrm{o}_{d,\delta}(b^{2}). \quad (79)

This ends the proof. ∎

Proof of Theorem 5.

Assume that (31) holds. By (71), the asymptotic normality of $n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S})-f_{b}(\mathbb{S}))$ will be proved if we verify the following Lindeberg condition for double arrays (see, e.g., Section 1.9.3 in [51]): For every $\varepsilon>0$,

s_{b}^{-2}\,\mathbb{E}\left[|Y_{1,b}(\mathbb{S})|^{2}\,\mathds{1}_{\{|Y_{1,b}(\mathbb{S})|>\varepsilon n^{1/2}s_{b}\}}\right]\longrightarrow 0,\quad n\to\infty, \quad (80)

where $s_{b}^{2}\vcentcolon=\mathbb{E}\left[|Y_{1,b}(\mathbb{S})|^{2}\right]$ and $b=b(n)\to 0$. From Lemma 3 with $\nu=1/b$ and $\mathbb{M}=b\,\mathbb{S}$, we know that

|Y_{1,b}(\mathbb{S})|=\mathcal{O}\left(\psi(\mathbb{S})\,b^{-r(d)/2}\right)=\mathcal{O}_{d,\mathbb{S}}(b^{-r(d)/2}), \quad (81)

and we also know that $s_{b}=b^{-r(d)/4}\sqrt{\psi(\mathbb{S})f(\mathbb{S})}\,(1+\mathrm{o}_{d,\mathbb{S}}(1))$ when $f$ is Lipschitz continuous and bounded, by the proof of Theorem 3. Therefore, whenever $n^{1/2}b^{r(d)/4}\to\infty$ as $n\to\infty$ (and $b\to 0$), we have

\frac{|Y_{1,b}(\mathbb{S})|}{n^{1/2}s_{b}}=\mathcal{O}_{d,\mathbb{S}}(n^{-1/2}\,b^{r(d)/4}\,b^{-r(d)/2})=\mathcal{O}_{d,\mathbb{S}}(n^{-1/2}b^{-r(d)/4})\longrightarrow 0. \quad (82)

Under this condition, Equation (80) holds (since for any given $\varepsilon>0$, the indicator function is equal to $0$ for $n$ large enough, independently of $\omega$) and thus

n^{1/2}b^{r(d)/4}(\hat{f}_{n,b}(\mathbb{S})-f_{b}(\mathbb{S}))=n^{1/2}b^{r(d)/4}\cdot\frac{1}{n}\sum_{i=1}^{n}Y_{i,b}(\mathbb{S})\stackrel{\mathscr{D}}{\longrightarrow}\mathcal{N}(0,\psi(\mathbb{S})f(\mathbb{S})). \quad (83)

This ends the proof. ∎
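This central limit behaviour can be eyeballed by simulation: replicate the estimator many times, standardize each replication empirically, and inspect the first moments of the resulting scores. The sketch below (a Python illustration assuming scipy; the sampling density and all tuning constants are hypothetical) does exactly this.

# Illustrative simulation of the CLT in Theorem 5 at S = I_2; not a proof.
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(3)
b, n, reps = 0.05, 400, 300
S = np.eye(2)

means, stds = [], []
for _ in range(reps):
    data = wishart.rvs(df=5, scale=np.eye(2) / 5, size=n, random_state=rng)
    vals = np.array([wishart.pdf(X, df=1 / b, scale=b * S) for X in data])
    means.append(vals.mean())
    stds.append(vals.std(ddof=1))

means, stds = np.array(means), np.array(stds)
z = np.sqrt(n) * (means - means.mean()) / stds
print(z.mean(), z.std(ddof=1))  # approximately 0 and 1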

Proof of Theorem 6.

By the comparison of the total variation norm $\|\cdot\|$ with the Hellinger distance on page 726 of Carter [13], we already know that

\|\mathbb{P}_{\nu,\mathbb{S}}-\mathbb{Q}_{\nu,\mathbb{S}}\|\leq\sqrt{2\,\mathbb{P}\left(\mathbb{X}\in B_{\nu,\mathbb{S}}^{c}(1/2)\right)+\mathbb{E}\left[\log\left(\frac{{\rm d}\mathbb{P}_{\nu,\mathbb{S}}}{{\rm d}\mathbb{Q}_{\nu,\mathbb{S}}}(\mathbb{X})\right)\,\mathds{1}_{\{\mathbb{X}\in B_{\nu,\mathbb{S}}(1/2)\}}\right]}. \quad (84)

Then, by applying a union bound followed by large deviation bounds on the eigenvalues of the Wishart matrix, we get, for $\nu$ large enough,

\mathbb{P}(\mathbb{X}\in B_{\nu,\mathbb{S}}^{c}(1/2))\leq\sum_{i=1}^{d}\mathbb{P}\left(|\lambda_{i}(\Delta_{\nu,\mathbb{S}})|>\frac{\nu^{1/6}}{2\sqrt{2}}\right)\leq d\cdot 2\,\exp\left(-\frac{\nu^{1/3}}{100}\right). \quad (85)

By Theorem 1, we have

\mathbb{E}\left[\log\left(\frac{{\rm d}\mathbb{P}_{\nu,\mathbb{S}}}{{\rm d}\mathbb{Q}_{\nu,\mathbb{S}}}(\mathbb{X})\right)\,\mathds{1}_{\{\mathbb{X}\in B_{\nu,\mathbb{S}}(1/2)\}}\right]
= \nu^{-1/2}\cdot\left\{\frac{\sqrt{2}}{3}\cdot\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\right]-\frac{d+1}{\sqrt{2}}\cdot\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\right]\right\}
 + \nu^{-1/2}\cdot\mathcal{O}\left(\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in B_{\nu,\mathbb{S}}^{c}(1/2)\}}\right]\right|+d\cdot\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in B_{\nu,\mathbb{S}}^{c}(1/2)\}}\right]\right|\right)
 + \nu^{-1}\cdot\mathcal{O}\left(\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})\right]\right|+d\cdot\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right]\right|+d^{3}\right). \quad (86)

On the right-hand side, the first and third lines are estimated using Lemma 1, and the second line is bounded using Lemma 2. We find

\mathbb{E}\left[\log\left(\frac{{\rm d}\mathbb{P}_{\nu,\mathbb{S}}}{{\rm d}\mathbb{Q}_{\nu,\mathbb{S}}}(\mathbb{X})\right)\,\mathds{1}_{\{\mathbb{X}\in B_{\nu,\mathbb{S}}(1/2)\}}\right]=\mathcal{O}(\nu^{-1}d^{3}). \quad (87)

Putting (85) and (87) together in (84) gives the conclusion. ∎
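To get a feel for the rate in Theorem 6, note that the total variation distance is proportional, up to the normalization convention used for $\|\cdot\|$, to $\mathbb{E}_{\mathbb{P}}|1-\frac{{\rm d}\mathbb{Q}}{{\rm d}\mathbb{P}}|$, which can be estimated by Monte Carlo. The sketch below (a Python illustration assuming scipy and taking $\mathbb{S}=\mathrm{I}_{d}$) suggests a decay on the order of $\nu^{-1/2}$, consistent with the bound obtained from (84), (85), and (87).

# Monte Carlo proxy (illustrative) for the total variation distance between
# P = Wishart_d(nu, I_d) and the matching SMN law Q; not part of the proof.
import numpy as np
from scipy.stats import wishart

def tv_proxy(nu, d, n_mc=20_000, seed=0):
    rng = np.random.default_rng(seed)
    X = wishart.rvs(df=nu, scale=np.eye(d), size=n_mc, random_state=rng)
    log_K = np.array([wishart.logpdf(x, df=nu, scale=np.eye(d)) for x in X])
    Delta = (X - nu * np.eye(d)) / np.sqrt(2 * nu)
    tr2 = np.einsum('nij,nji->n', Delta, Delta)  # tr(Delta^2) per sample
    log_g = (-0.5 * (d * np.log(2) + d * (d + 1) / 2 * np.log(np.pi)
                     + (d + 1) * d * np.log(np.sqrt(2 * nu))) - 0.5 * tr2)
    return np.mean(np.abs(1.0 - np.exp(log_g - log_K)))

for nu in [100, 400, 1600]:
    print(nu, tv_proxy(nu, d=2) * np.sqrt(nu))  # roughly constant in nu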

Appendix A Technical computations

Below, we compute the expectations of the trace of powers (up to $4$) of a normalized Wishart matrix. The lemma is used to estimate some trace moments and the $\asymp\nu^{-1}$ errors in (86) of the proof of Theorem 6, and also as a preliminary result for the proof of Lemma 2.

Lemma 1.

Let $\nu>d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given. If $\mathbb{X}\sim\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ according to (3), then

\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}})\right]=0,\qquad\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{2})\right]=\frac{d(d+1)}{2},\qquad\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{3})\right]=\nu^{-1/2}\cdot\frac{d(d^{2}+3d+4)}{2\sqrt{2}}, \quad (88)
\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{4})\right]=\frac{d(2d^{2}+5d+5)}{4}+\nu^{-1}\cdot\frac{d(d^{3}+6d^{2}+21d+20)}{4}, \quad (89)

where we recall that $\Delta_{\nu,\mathbb{S}}\vcentcolon=(\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\mathbb{X}-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}$.

Proof of Lemma 1.

Let $\mathbb{Y}\vcentcolon=\mathbb{S}^{-1/2}\,\mathbb{X}\,\mathbb{S}^{-1/2}\sim\mathrm{Wishart}_{d}(\nu,\mathrm{I}_{d})$. It was shown by Letac and Massam [38, pp. 308-310] (another source could be de Waal and Nel [57, p. 66], or Lu and Richards [40, Theorem 3.2], although the latter is less explicit) that

\mathbb{E}[\mathbb{Y}] = \nu\,\mathrm{I}_{d}, \quad (90)
\mathbb{E}[\mathbb{Y}^{2}] = \nu\,\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d})+(\nu^{2}+\nu)\,\mathrm{I}_{d}^{2}=\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}, \quad (91)
\mathbb{E}[\mathbb{Y}^{3}] = \nu\,\mathrm{I}_{d}\,(\mathrm{tr}(\mathrm{I}_{d}))^{2}+(\nu^{2}+\nu)\left(\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d}^{2})+2\,\mathrm{I}_{d}^{2}\,\mathrm{tr}(\mathrm{I}_{d})\right)+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}^{3}
 = \nu d^{2}\,\mathrm{I}_{d}+3(\nu^{2}+\nu)d\,\mathrm{I}_{d}+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}, \quad (92)
\mathbb{E}[\mathbb{Y}^{4}] = \nu\,\mathrm{I}_{d}\,(\mathrm{tr}(\mathrm{I}_{d}))^{3}+3(\nu^{2}+\nu)\left(\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d})\,\mathrm{tr}(\mathrm{I}_{d}^{2})+\mathrm{I}_{d}^{2}\,(\mathrm{tr}(\mathrm{I}_{d}))^{2}\right)+(\nu^{3}+3\nu^{2}+4\nu)\left(\mathrm{I}_{d}\,\mathrm{tr}(\mathrm{I}_{d}^{3})+3\,\mathrm{I}_{d}^{3}\,\mathrm{tr}(\mathrm{I}_{d})\right)
 \quad + (2\nu^{3}+5\nu^{2}+5\nu)\,\mathrm{I}_{d}^{2}\,\mathrm{tr}(\mathrm{I}_{d}^{2})+(\nu^{4}+6\nu^{3}+21\nu^{2}+20\nu)\,\mathrm{I}_{d}^{4}
 = \nu d^{3}\,\mathrm{I}_{d}+6(\nu^{2}+\nu)d^{2}\,\mathrm{I}_{d}+(6\nu^{3}+17\nu^{2}+21\nu)d\,\mathrm{I}_{d}+(\nu^{4}+6\nu^{3}+21\nu^{2}+20\nu)\,\mathrm{I}_{d}, \quad (93)

from which we deduce the following:

\mathbb{E}[\mathbb{Y}-\nu\,\mathrm{I}_{d}] = 0, \quad (94)
\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{2}] = \mathbb{E}[\mathbb{Y}^{2}]-(\nu\,\mathrm{I}_{d})^{2}=\left\{\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}\right\}-\nu^{2}\,\mathrm{I}_{d}=\nu\,(d+1)\,\mathrm{I}_{d}, \quad (95)
\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{3}] = \mathbb{E}[\mathbb{Y}^{3}]-3\nu\,\mathbb{E}[\mathbb{Y}^{2}]+2(\nu\,\mathrm{I}_{d})^{3}
 = \left\{\nu d^{2}\,\mathrm{I}_{d}+3(\nu^{2}+\nu)d\,\mathrm{I}_{d}+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}\right\}-3\nu\,\mathrm{I}_{d}\left\{\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}\right\}+2\nu^{3}\,\mathrm{I}_{d}^{3}
 = \nu\,(d^{2}+3d+4)\,\mathrm{I}_{d}, \quad (96)
\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{4}] = \mathbb{E}[\mathbb{Y}^{4}]-4(\nu\,\mathrm{I}_{d})\,\mathbb{E}[\mathbb{Y}^{3}]+6(\nu\,\mathrm{I}_{d})^{2}\,\mathbb{E}[\mathbb{Y}^{2}]-3(\nu\,\mathrm{I}_{d})^{4}
 = \nu d^{3}\,\mathrm{I}_{d}+6(\nu^{2}+\nu)d^{2}\,\mathrm{I}_{d}+(6\nu^{3}+17\nu^{2}+21\nu)d\,\mathrm{I}_{d}+(\nu^{4}+6\nu^{3}+21\nu^{2}+20\nu)\,\mathrm{I}_{d}
 \quad -4\nu\,\mathrm{I}_{d}\left\{\nu d^{2}\,\mathrm{I}_{d}+3(\nu^{2}+\nu)d\,\mathrm{I}_{d}+(\nu^{3}+3\nu^{2}+4\nu)\,\mathrm{I}_{d}\right\}+6\nu^{2}\,\mathrm{I}_{d}^{2}\left\{\nu d\,\mathrm{I}_{d}+(\nu^{2}+\nu)\,\mathrm{I}_{d}\right\}-3\nu^{4}\,\mathrm{I}_{d}^{4}
 = \nu^{2}\,(2d^{2}+5d+5)\,\mathrm{I}_{d}+\nu\,(d^{3}+6d^{2}+21d+20)\,\mathrm{I}_{d}. \quad (97)

By the linearity of expectations, we have

\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{k})\right]=(2\nu)^{-k/2}\,\mathrm{tr}\left(\mathbb{E}[(\mathbb{Y}-\nu\,\mathrm{I}_{d})^{k}]\right),\quad\text{for any }k\in\mathbb{N}. \quad (98)

The conclusion follows. ∎
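The closed forms in (88) and (89) can be verified by simulation. The sketch below (a Python illustration assuming scipy) compares Monte Carlo averages of $\mathrm{tr}(\Delta_{\nu,\mathrm{I}_{d}}^{k})$ with the stated formulas.

# Monte Carlo verification (illustrative) of the trace moments in Lemma 1.
import numpy as np
from scipy.stats import wishart

d, nu, n_mc = 3, 50, 100_000
rng = np.random.default_rng(4)
Y = wishart.rvs(df=nu, scale=np.eye(d), size=n_mc, random_state=rng)
Delta = (Y - nu * np.eye(d)) / np.sqrt(2 * nu)

P2 = Delta @ Delta                       # batched Delta^2
tr2 = np.einsum('nii->n', P2)            # tr(Delta^2)
tr3 = np.einsum('nij,nji->n', P2, Delta) # tr(Delta^3)
tr4 = np.einsum('nij,nji->n', P2, P2)    # tr(Delta^4)

print(tr2.mean(), d * (d + 1) / 2)
print(tr3.mean(), d * (d**2 + 3 * d + 4) / (2 * np.sqrt(2) * np.sqrt(nu)))
print(tr4.mean(), d * (2 * d**2 + 5 * d + 5) / 4
      + d * (d**3 + 6 * d**2 + 21 * d + 20) / (4 * nu))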

We can also estimate the moments of Lemma 1 on various events. The lemma below is used to estimate the $\asymp\nu^{-1/2}$ errors in (86) of the proof of Theorem 6.

Lemma 2.

Let $\nu>d-1$ and $\mathbb{S}\in\mathcal{S}_{++}^{d}$ be given, and let $A\in\mathscr{B}(\mathbb{R}^{d})$ be a Borel set. If $\mathbb{X}\sim\mathrm{Wishart}_{d}(\nu,\mathbb{S})$ according to (3), then, for $\nu$ large enough,

\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]\right|\leq d^{3/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}, \quad (99)
\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]-\nu^{-1/2}\cdot\frac{d(d^{2}+3d+4)}{2\sqrt{2}}\right|\leq 3d^{5/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}, \quad (100)

where we recall that $\Delta_{\nu,\mathbb{S}}\vcentcolon=(\sqrt{2\nu}\,\mathbb{S})^{-1/2}(\mathbb{X}-\nu\,\mathbb{S})(\sqrt{2\nu}\,\mathbb{S})^{-1/2}$.

Proof of Lemma 2.

By Lemma 1, the Cauchy-Schwarz inequality, and Jensen's inequality ($(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}))^{2}\leq d\cdot\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})$), we have

\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]\right|
= \left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\}}\right]\right|
\leq \left(\mathbb{E}\left[(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}))^{2}\right]\right)^{1/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}
\leq \left(d\cdot\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{2})\right]\right)^{1/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}
\leq d^{3/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/2}, \quad (101)

which proves (99). Similarly, by Lemma 1, Hölder's inequality, and Jensen's inequality ($(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}))^{4/3}\leq d^{1/3}\,\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})$), we have, for $\nu$ large enough,

\left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A\}}\right]-\nu^{-1/2}\cdot\frac{d(d^{2}+3d+4)}{2\sqrt{2}}\right|
= \left|\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3})\,\mathds{1}_{\{\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\}}\right]\right|
\leq \left(\mathbb{E}\left[(\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{3}))^{4/3}\right]\right)^{3/4}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}
\leq \left(d^{1/3}\,\mathbb{E}\left[\mathrm{tr}(\Delta_{\nu,\mathbb{S}}^{4})\right]\right)^{3/4}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}
\leq 3d^{5/2}\left(\mathbb{P}\left(\boldsymbol{\lambda}(\Delta_{\nu,\mathbb{S}})\in A^{c}\right)\right)^{1/4}, \quad (102)

which proves (100). This ends the proof. ∎

In the next lemma, we bound the density of the $\mathrm{Wishart}_{d}(\nu,\mathbb{M})$ distribution from (3) when $\nu>d+1$.

Lemma 3.

If $\nu>d+1$ and $\mathbb{M}\in\mathcal{S}_{++}^{d}$, then

\sup_{\mathbb{X}\in\mathcal{S}_{++}^{d}}K_{\nu,\mathbb{M}}(\mathbb{X})\leq\frac{(2\pi/e)^{-\frac{d(d+1)}{4}}\,|\mathbb{M}|^{-(d+1)/2}}{(2e)^{d/2}\,(\nu-(d+1))^{d(d+1)/4}}. \quad (103)
Proof of Lemma 3.

When $\nu>d+1$, it is easily verified that the mode of the $\mathrm{Wishart}_{d}(\nu,\mathbb{M})$ distribution is $(\nu-(d+1))\,\mathbb{M}$, so the expression for the Wishart density in (3) yields

\sup_{\mathbb{X}\in\mathcal{S}_{++}^{d}}K_{\nu,\mathbb{M}}(\mathbb{X})=\frac{(\nu-(d+1))^{\frac{\nu d}{2}-\frac{d(d+1)}{2}}\,|\mathbb{M}|^{-(d+1)/2}\exp\left(-\frac{d}{2}(\nu-(d+1))\right)}{2^{\nu d/2}\,\pi^{d(d-1)/4}\,\prod_{i=1}^{d}\Gamma(\frac{1}{2}(\nu-(i+1))+1)}. \quad (104)

From Lemma 1 in [42], we know that $\sqrt{2\pi e}\,e^{-(y-\frac{1}{2})}(y-1)^{y-\frac{1}{2}}\leq\Gamma(y)$ for all $y>1$, so we get

\sup_{\mathbb{X}\in\mathcal{S}_{++}^{d}}K_{\nu,\mathbb{M}}(\mathbb{X})\leq\frac{(\nu-(d+1))^{\frac{\nu d}{2}-\frac{d(d+1)}{2}}\,|\mathbb{M}|^{-(d+1)/2}\,e^{-\frac{\nu d}{2}+\frac{d}{2}(d+1)}}{2^{\nu d/2}\,\pi^{d(d-1)/4}\cdot(2\pi e)^{d/2}\,e^{-\frac{\nu d}{2}+\frac{d}{4}(d+1)}\left[\frac{1}{2}(\nu-(d+1))\right]^{\frac{\nu d}{2}-\frac{d(d+1)}{4}}}=\frac{(2\pi/e)^{-\frac{d(d+1)}{4}}\,|\mathbb{M}|^{-(d+1)/2}}{(2e)^{d/2}\,(\nu-(d+1))^{d(d+1)/4}}. \quad (105)

This ends the proof. ∎
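As a quick numerical illustration, the bound (103) can be compared with the density evaluated at its mode. A minimal sketch (a Python illustration assuming scipy) follows.

# Numerical illustration of the mode bound (103); not part of the proof.
import numpy as np
from scipy.stats import wishart

d, nu = 3, 30
M = np.eye(d)

density_at_mode = wishart.pdf((nu - (d + 1)) * M, df=nu, scale=M)
bound = ((2 * np.pi / np.e) ** (-d * (d + 1) / 4)
         * np.linalg.det(M) ** (-(d + 1) / 2)
         / ((2 * np.e) ** (d / 2) * (nu - (d + 1)) ** (d * (d + 1) / 4)))
print(density_at_mode, bound)  # the first value is at most the second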

Appendix B Acronyms

CMB: cosmic microwave background
i.i.d.: independent and identically distributed
SMN: symmetric matrix-variate normal
SPD: symmetric positive definite

Appendix C Simulation code

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jmva.2021.104.

Acknowledgments

First, I would like to thank Donald Richards for his indications on how to calculate the moments in Lemma 1. I also thank the Editor, the Associate Editor and the referees for their insightful remarks which led to improvements in the presentation of this paper. The author is supported by postdoctoral fellowships from the NSERC (PDF) and the FRQNT (B3X supplement and B3XR).

References

  • Abramowitz and Stegun [1964] M. Abramowitz, I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, volume 55 of National Bureau of Standards Applied Mathematics Series, For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C., 1964. MR0167642.
  • Aitchison and Lauder [1985] J. Aitchison, I. J. Lauder, Kernel density estimation for compositional data, J. Roy. Statist. Soc. Ser. C 34 (1985) 129–137. doi:10.2307/2347365.
  • Alexander et al. [2001] D. C. Alexander, C. Pierpaoli, P. J. Basser, J. C. Gee, Spatial transformations of diffusion tensor magnetic resonance images, IEEE Trans. Med. Imaging 20 (2001) 1131–1139. doi:10.1109/42.963816.
  • Anderson [2003] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, third edition, 2003. MR1990662.
  • Asta [2021] D. M. Asta, Kernel density estimation on symmetric spaces of non-compact type, J. Multivariate Anal. 181 (2021) 104676, 10. MR4172886.
  • Basser and Jones [2002] P. J. Basser, D. K. Jones, Diffusion-tensor MRI: theory, experimental design and data analysis - a technical review, NMR Biomed. 15 (2002) 456–467. doi:10.1002/nbm.783.
  • Basser and Pajevic [2003] P. J. Basser, S. Pajevic, A normal distribution for tensor-valued random variables: applications to diffusion tensor MRI, IEEE Trans. Med. Imaging 22 (2003) 785–794. doi:10.1109/TMI.2003.815059.
  • Bouezmarni et al. [2011] T. Bouezmarni, A. El Ghouch, M. Mesfioui, Gamma kernel estimators for density and hazard rate of right-censored data, J. Probab. Stat. (2011) Art. ID 937574, 16 pp. MR2801351.
  • Bouezmarni and Rombouts [2008] T. Bouezmarni, J. V. K. Rombouts, Density and hazard rate estimation for censored and $\alpha$-mixing data using gamma kernels, J. Nonparametr. Stat. 20 (2008) 627–643. MR2454617.
  • Bouezmarni and Rombouts [2010a] T. Bouezmarni, J. V. K. Rombouts, Nonparametric density estimation for multivariate bounded data, J. Statist. Plann. Inference 140 (2010a) 139–152. MR2568128.
  • Bouezmarni and Rombouts [2010b] T. Bouezmarni, J. V. K. Rombouts, Nonparametric density estimation for positive time series, Comput. Statist. Data Anal. 54 (2010b) 245–261. MR2756423.
  • Bouezmarni and Scaillet [2005] T. Bouezmarni, O. Scaillet, Consistency of asymmetric kernel density estimators and smoothed histograms with application to income data, Econom. Theor. 21 (2005) 390–412. MR2179543.
  • Carter [2002] A. V. Carter, Deficiency distance between multinomial and multivariate normal experiments, Ann. Statist. 30 (2002) 708–730. MR1922539.
  • Chacón and Duong [2018] J. E. Chacón, T. Duong, Multivariate Kernel Smoothing and Its Applications, volume 160 of Monographs on Statistics and Applied Probability, CRC Press, Boca Raton, FL, 2018. MR3822372.
  • Chen [1999] S. X. Chen, Beta kernel estimators for density functions, Comput. Statist. Data Anal. 31 (1999) 131–145. MR1718494.
  • Chen [2000] S. X. Chen, Probability density function estimation using gamma kernels, Ann. Inst. Statist. Math 52 (2000) 471–480. MR1794247.
  • Chevallier et al. [2014] E. Chevallier, A. Chevallier, J. Angulo, Computing histogram of tensor images using orthogonal series density estimation and Riemannian metrics, in: 22nd International Conference on Pattern Recognition, pp. 900–905. doi:10.1109/ICPR.2014.165.
  • Chevallier et al. [2017] E. Chevallier, E. Kalunga, J. Angulo, Kernel density estimation on spaces of Gaussian distributions and symmetric positive definite matrices, SIAM J. Imaging Sci. 10 (2017) 191–215. MR3606419.
  • Fernandes and Monteiro [2005] M. Fernandes, P. K. Monteiro, Central limit theorem for asymmetric kernel functionals, Ann. Inst. Statist. Math. 57 (2005) 425–442. MR2206532.
  • Fujikoshi et al. [2010] Y. Fujikoshi, V. V. Ulyanov, R. Shimizu, Multivariate Statistics, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ, 2010. MR2640807.
  • Funke and Kawka [2015] B. Funke, R. Kawka, Nonparametric density estimation for multivariate bounded data using two non-negative multiplicative bias correction methods, Comput. Statist. Data Anal. 92 (2015) 148–162. MR3384258.
  • Gallaugher and McNicholas [2018] M. P. B. Gallaugher, P. D. McNicholas, Finite mixtures of skewed matrix variate distributions, Pattern Recognit. 80 (2018) 83–93. doi:10.1016/j.patcog.2018.02.025.
  • Gasbarra et al. [2017] D. Gasbarra, S. Pajevic, P. J. Basser, Eigenvalues of random matrices with isotropic Gaussian noise and the design of diffusion tensor imaging experiments, SIAM J. Imaging Sci. 10 (2017) 1511–1548. doi:10.1137/16M1098693.
  • Gibbs and Su [2002] A. L. Gibbs, F. E. Su, On choosing and bounding probability metrics, Int. Stat. Rev. 70 (2002) 419–435. doi:10.2307/1403865.
  • Gupta and Nagar [1999] A. K. Gupta, D. K. Nagar, Matrix Variate Distributions, Chapman and Hall/CRC, first edition, 1999.
  • Hadjicosta [2019] E. Hadjicosta, Integral Transform Methods in Goodness-of-Fit Testing, PhD thesis, Pennsylvania State University, 2019.
  • Hadjicosta and Richards [2020] E. Hadjicosta, D. Richards, Integral transform methods in goodness-of-fit testing, II: the Wishart distributions, Ann. Inst. Statist. Math. 72 (2020) 1317–1370. MR4169380.
  • Haff et al. [2011] L. R. Haff, P. T. Kim, J.-Y. Koo, D. S. P. Richards, Minimax estimation for mixtures of Wishart distributions, Ann. Statist. 39 (2011) 3417–3440. MR3012414.
  • Hirukawa and Sakudo [2015] M. Hirukawa, M. Sakudo, Family of the generalised gamma kernels: a generator of asymmetric kernels for nonnegative data, J. Nonparametr. Stat. 27 (2015) 41–63. MR3304359.
  • Hu and White [1997] W. Hu, M. White, A CMB polarization primer, New Astronomy 2 (1997) 323–344. doi:10.1016/S1384-1076(97)00022-5.
  • Igarashi and Kakizawa [2018] G. Igarashi, Y. Kakizawa, Generalised gamma kernel density estimation for nonnegative data and its bias reduction, J. Nonparametr. Stat. 30 (2018) 598–639. MR3843043.
  • Kim and Richards [2008] P. T. Kim, D. S. P. Richards, Diffusion tensor imaging and deconvolution on spaces of positive definite symmetric matrices, in: 2nd MICCAI Workshop on Mathematical Foundations of Computational Anatomy, New York, United States, 2008, pp. 140–149. inria-00632882.
  • Kim and Richards [2011] P. T. Kim, D. S. P. Richards, Deconvolution density estimation on the space of positive definite symmetric matrices, in: Nonparametric statistics and mixture models, World Sci. Publ., Hackensack, NJ, 2011, pp. 147–168. MR2838725.
  • Kocherlakota and Kocherlakota [1999] S. Kocherlakota, K. Kocherlakota, Approximations for central and noncentral bivariate chi-square distributions, Commun. Stat. - Simul. Comput. 28 (1999) 909–930. doi:10.1080/03610919908813585.
  • Kokonendji and Somé [2018] C. C. Kokonendji, S. M. Somé, On multivariate associated kernels to estimate general density functions, J. Korean Statist. Soc. 47 (2018) 112–126. MR3760293.
  • Kokonendji and Somé [2021] C. C. Kokonendji, S. M. Somé, Bayesian bandwidths in semiparametric modelling for nonnegative orthant data with diagnostics, Stats 4 (2021) 162–183. doi:10.3390/stats4010013.
  • Kollo and von Rosen [1995] T. Kollo, D. von Rosen, Approximating by the Wishart distribution, Ann. Inst. Statist. Math. 47 (1995) 767–783. MR1370289.
  • Letac and Massam [2004] G. Letac, H. Massam, All invariant moments of the Wishart distribution, Scand. J. Statist. 31 (2004) 295–318. MR2066255.
  • Li et al. [2020] D. Li, Y. Lu, E. Chevallier, D. Dunson, Density estimation and modeling on symmetric spaces, Preprint (2020) 1–41. arXiv:2009.01983.
  • Lu and Richards [2001] I.-L. Lu, D. S. P. Richards, MacMahon’s master theorem, representation theory, and moments of Wishart distributions, Adv. in Appl. Math. 27 (2001) 531–547. MR1868979.
  • Mallows [1961] C. L. Mallows, Latent vectors of random symmetric matrices, Biometrika 48 (1961) 133–149. MR131312.
  • Minc and Sathre [1964/65] H. Minc, L. Sathre, Some inequalities involving $(r!)^{1/r}$, Proc. Edinburgh Math. Soc. (2) 14 (1964/65) 41–46. MR162751.
  • Ouimet and Tolosana-Delgado [2022] F. Ouimet, R. Tolosana-Delgado, Asymptotic properties of Dirichlet density estimators, J. Multivariate Anal. 187 (2022) 104832, 25 pp. doi:10.1016/j.jmva.2021.104832.
  • Pajevic and Basser [1999] S. Pajevic, P. J. Basser, Parametric description of noise in diffusion tensor MRI, in: 8th Annual Meeting of the ISMRM, Philadelphia, p. 1787.
  • Pajevic and Basser [2003] S. Pajevic, P. J. Basser, Parametric and non-parametric statistical analysis of DT-MRI data, J. Magn. Reson. 161 (2003) 1–14. doi:10.1016/s1090-7807(02)00178-7.
  • Pelletier [2005] B. Pelletier, Kernel density estimation on Riemannian manifolds, Statist. Probab. Lett. 73 (2005) 297–304. MR2179289.
  • Pollard [2005] D. Pollard, Total variation distance between measures, in: Asymptopia, version: 15feb05, 2005, pp. 1–15. http://www.stat.yale.edu/~pollard/Courses/607.spring05/handouts/Totalvar.pdf.
  • Prakasa Rao [1983] B. L. S. Prakasa Rao, Nonparametric Functional Estimation, Probability and Mathematical Statistics, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York, 1983. MR0740865.
  • Schwartzman et al. [2008] A. Schwartzman, W. F. Mascarenhas, J. E. Taylor, Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices, Ann. Statist. 36 (2008) 2886–2919. MR2485016.
  • Scott [2015] D. W. Scott, Multivariate Density Estimation, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ, second edition, 2015. MR3329609.
  • Serfling [1980] R. J. Serfling, Approximation Theorems of Mathematical Statistics, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1980. MR0595165.
  • Somé [2020] S. M. Somé, Bayesian selector of adaptive bandwidth for gamma kernel density estimator on $[0,\infty)$: simulations and applications, Communications in Statistics - Simulation and Computation (2020) 1–11. doi:10.1080/03610918.2020.1828921.
  • Somé and Kokonendji [2021] S. M. Somé, C. C. Kokonendji, Bayesian selector of adaptive bandwidth for multivariate gamma kernel estimator on $[0,\infty)^{d}$, Journal of Applied Statistics (2021) 1–22. doi:10.1080/02664763.2021.1881456.
  • Steyn and Roux [1972] H. S. Steyn, J. J. J. Roux, Approximations for the non-central Wishart distribution, South African Statist. J. 6 (1972) 165–173. MR326925.
  • Tan and Gupta [1982] W. Y. Tan, R. P. Gupta, On approximating the noncentral Wishart distribution by central Wishart distribution: a Monte Carlo study, Comm. Statist. B—Simulation Comput. 11 (1982) 47–64. MR648656.
  • Vafaei Sadr and Movahed [2021] A. Vafaei Sadr, S. M. S. Movahed, Clustering of local extrema in Planck CMB maps, MNRAS 503 (2021) 815–829. doi:10.1093/mnras/stab368.
  • de Waal and Nel [1973] D. J. de Waal, D. G. Nel, On some expectations with respect to Wishart matrices, South African Statist. J. 7 (1973) 61–67. MR347003.
  • Zhang [2010] S. Zhang, A note on the performance of the gamma kernel estimators at the boundary, Statist. Probab. Lett. 80 (2010) 548–557. MR2595129.