
New Highly Efficient High-Breakdown Estimator of Multivariate Scatter and Location for Elliptical Distributions

Justin A. Fishbone, Lamine Mili
Abstract

High-breakdown-point estimators of multivariate location and shape matrices, such as the MM-estimator with smooth hard rejection and the Rocke S-estimator, are generally designed to have high efficiency at the Gaussian distribution. However, many phenomena are non-Gaussian, and these estimators can therefore have poor efficiency. This paper proposes a new tunable S-estimator, termed the S-q estimator, for the general class of symmetric elliptical distributions, a class containing many common families such as the multivariate Gaussian, t-, Cauchy, Laplace, hyperbolic, and normal inverse Gaussian distributions. Across this class, the S-q estimator is shown to generally provide higher maximum efficiency than other leading high-breakdown estimators while maintaining the maximum breakdown point. Furthermore, its robustness is demonstrated to be on par with these leading estimators while also being more stable with respect to initial conditions. From a practical viewpoint, these properties make the S-q broadly applicable for practitioners. This is demonstrated with an example application—the minimum-variance optimal allocation of financial portfolio investments.

keywords:
Covariance matrix estimation, Robust estimation, S-estimator, S-q estimator, Shape matrix estimation
journal: arXiv. Copyright © 2021 Justin Fishbone
Affiliation: Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 7054 Haycock Rd, Falls Church, VA 22043, USA

1 Introduction

Huber (1964) introduced what is now the most common class of robust estimators, M-estimators. Although M-estimators were originally defined for the location case, Maronna (1976) extended the definition to multivariate location and scatter. After the sample median, perhaps the most common robust M-estimators are those using general rho functions such as the Huber or Tukey bisquare functions.

However, the drawback of general rho functions is that they have limited efficiency when applied to parameter estimation for distributions other than the nominal one for which they were designed. To address this, various M-estimator approaches iteratively reweight the maximum likelihood estimator (MLE) based on the estimated probability density function (PDF) (e.g., Windham, 1995; Basu et al., 1998; Choi and Hall, 2000; Ferrari and Yang, 2010).

Even with these improvements, multivariate M-estimators inherently have limited robustness. For example, Maronna (1976) showed that the upper bound on the breakdown point of $p$-dimensional M-estimators is $(p+1)^{-1}$, which converges to zero as $p$ grows. To combat this weakness, Rousseeuw and Yohai (1984) introduced regression S-estimators, which Davies (1987) extended to multivariate location and scatter. Davies also showed that the asymptotic breakdown point of S-estimators can be set to 1/2, which is the theoretical maximum of any equivariant estimator.

In practical scenarios, however, estimators may exhibit large bias at contamination levels considerably lower than the breakdown point. For many years, the Tukey bisquare was the standard rho function for S-estimators (for example, see Lopuhaä, 1989; Rocke, 1996). However, in the context of multivariate S-estimators, the bisquare is not tunable, so its robustness degrades with increasing $p$. For this reason, Rocke (1996) introduced the tunable biflat and translated biweight rho functions. Maronna et al. (2006, sec. 6.4.4) slightly modified the biflat, proposing the Rocke rho function. The Rocke S-estimator (shortened here to S-Rocke) is currently the recommended high-breakdown estimator for large dimensions ($p\geq 15$) (Maronna and Yohai, 2017; Maronna et al., 2019, sec. 6.10). The recommended estimator for lower dimensions is the MM-estimator with the smoothed hard rejection function (MM-SHR).

There are two major shortcomings of the S-Rocke estimator that will be discussed in this paper. Firstly, it has low efficiency for small dimension $p$. Although this is an inherent disadvantage of all S-estimators, it is exceptionally acute for the S-Rocke. Secondly, the S-Rocke has poor efficiency for most common non-Gaussian distributions. This is a common problem for general-purpose estimators such as the Rocke and bisquare S-estimators, the MM-SHR, and the Huber and bisquare M-estimators. Examples of common phenomena that are frequently modeled by non-Gaussian distributions include stock returns, radar sea clutter, and speech signals, which approximately follow generalized hyperbolic (Konlack Socgnia and Wilcox, 2014), K- (Ward et al., 1990), and Laplace distributions (Gazor and Zhang, 2003), respectively.

This paper proposes and explores a new subclass of tunable, maximum-breakdown-point S-estimators that is applicable across common continuous elliptical distributions. This estimator, named the S-q estimator, uses a density-based reweighting to attain generally higher maximum efficiency across the elliptical class as compared to the S-Rocke and MM-SHR estimators. These estimators are compared from the viewpoints of statistical and computational efficiency, robustness, and stability.

Although the focus on elliptical distributions sounds limiting, as discussed in the next section, most common continuous multivariate distributions—such as the Gaussian, t-, Laplace, and hyperbolic distributions—fall into this class. As Frahm (2009) discussed, this assumption is “fundamental in multivariate analysis.”

This paper is organized as follows. Section 2 defines the new estimator and provides its functions for the most common elliptical distributions. Basic properties related to the consistency of the S-q estimator are summarized in Section 3. Section 4 provides the asymptotic distribution of the S-q estimator and compares the maximum achievable efficiencies of the S-q, S-Rocke, and MM-SHR estimators. In Section 5, the finite-sample breakdown point of the S-q is discussed, the theoretical influence functions of the estimators are compared, and the empirical finite-sample robustness of the estimators are briefly explored. Section 6 assesses two computational aspects of the estimators: computational efficiency, and stability with respect to initial estimates. A real-world example in Section 7 demonstrates the application of the estimators for the minimum-variance optimal allocation of financial portfolio investments. Finally, conclusions are summarized in Section 8.

2 Defining the S-q Estimator

This section builds the definition of the proposed S-q estimator. First, the elliptical class of distributions is reviewed. The multivariate S-estimator definition is then summarized, and finally, the S-q is defined.

2.1 Elliptical Distributions

The elliptical distribution is a general class of multivariate probability distributions encompassing many familiar subclasses such as the symmetric Gaussian, t-, Cauchy, Laplace, hyperbolic, variance gamma, and normal inverse Gaussian distributions. Table 1 summarizes the most common elliptical distributions (Fang et al., 1990, p. 69; Deng and Yao, 2018).

Table 1: Summary of Common Elliptical Distributions
Distribution Name    Generating Function $\phi(d)$
   {Range of Parameters}
Kotz type    $d^{N}\exp(-rd^{s})$
   $\left\{r>0,\,s>0,\,N>-\frac{p}{2}\right\}$
Gaussian (Kotz type with $N=0$, $s=1$, $r=1/2$)    $\exp\left(-\frac{d}{2}\right)$
Pearson type II    $(1-d)^{m},\quad d\in[0,1]$
   $\{m>0\}$
Pearson type VII    $(1+d/s)^{-N}$
   $\left\{N>p/2,\,s>0\right\}$
t (Pearson VII with $s=\nu$, $N=(\nu+p)/2$)    $(1+d/\nu)^{-(\nu+p)/2}$
   $\{\nu>0\}$
Cauchy (t with $\nu=1$)    $(1+d)^{-(1+p)/2}$
Generalized hyperbolic    $\left(\sqrt{\psi(\chi+d)}\right)^{\lambda-p/2}K_{\lambda-p/2}\left(\sqrt{\psi(\chi+d)}\right)$
   $\left\{\psi>0,\,[\chi>0,\,\lambda\in\mathbb{R}\ \text{or}\ \chi=0,\,\lambda>0]\right\}$
Variance gamma (Gen. hyperbolic with $\chi=0$)    $\left(\sqrt{\psi\,d}\right)^{\lambda-p/2}K_{\lambda-p/2}\left(\sqrt{\psi\,d}\right)$
   $\left\{\psi>0,\,\lambda>0\right\}$
Laplace (Variance gamma with $\psi=2$, $\lambda=1$)    $\left(\sqrt{2d}\right)^{1-p/2}K_{1-p/2}\left(\sqrt{2d}\right)$
Multivariate hyperbolic (Gen. hyperbolic with $\lambda=(p+1)/2$)    $\exp\left(-\sqrt{\psi(\chi+d)}\right)$
   $\left\{\psi>0,\,\chi\geq 0\right\}$
Hyperbolic with univariate marginals (Gen. hyperbolic with $\lambda=1$)    $\left(\sqrt{\psi(\chi+d)}\right)^{1-p/2}K_{1-p/2}\left(\sqrt{\psi(\chi+d)}\right)$
   $\left\{\psi>0,\,\chi\geq 0\right\}$
Normal inverse Gaussian (Gen. hyperbolic with $\lambda=-1/2$, $\chi>0$)    $\left(\sqrt{\psi(\chi+d)}\right)^{-(1+p)/2}K_{-(1+p)/2}\left(\sqrt{\psi(\chi+d)}\right)$
   $\left\{\psi>0,\,\chi>0\right\}$

Symmetric elliptical distributions are defined as being a function of the squared Mahalanobis distance (some texts define the Mahalanobis distance with the mean and covariance, but this more restrictive definition excludes thick-tailed distributions where these do not exist, such as Cauchy distributions), $d\left(\boldsymbol{x},\boldsymbol{\mu},\boldsymbol{\Sigma}\right)=\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\operatorname{T}}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)$, where $\boldsymbol{x}\in\mathbb{R}^{p}$, the location $\boldsymbol{\mu}\in\mathbb{R}^{p}$, and the scatter matrix $\boldsymbol{\Sigma}$ is $p\times p$ positive definite symmetric, $\boldsymbol{\Sigma}\in\operatorname{PDS}(p)$. When the PDF is defined, it has the form $f_{X}\left(\boldsymbol{x}\right)=\alpha_{p}|\boldsymbol{\Sigma}|^{-1/2}\phi\left(d\left(\boldsymbol{x},\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\right)$ for some generating function $\phi(d)$, where $\alpha_{p}$ is a constant that ensures $f_{X}\left(\boldsymbol{x}\right)$ integrates to one. Table 1 lists common generating functions. When the covariance exists, it is proportional to the scatter matrix $\boldsymbol{\Sigma}$. The corresponding shape matrix is commonly defined as

$$\boldsymbol{\Omega}=\boldsymbol{\Sigma}/|\boldsymbol{\Sigma}|^{1/p}. \qquad (1)$$

The PDF of $d\left(\boldsymbol{x},\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$ is given by (Kelker, 1970)

$$f_{D}\left(d\right)=\beta_{p}\,d^{p/2-1}\phi\left(d\right), \qquad (2)$$

where $\beta_{p}=\alpha_{p}\pi^{p/2}/\Gamma(p/2)$. Hereafter, all densities $f(d)$ refer to the density of $d\left(\boldsymbol{x},\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$ in (2), so the subscript $D$ will be omitted. It is also generally assumed that $p>2$.
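As a quick numerical check of (2) for the Gaussian case (an illustrative sketch added here, not part of the original derivation), the radial density with $\phi(d)=\exp(-d/2)$ and $\alpha_{p}=(2\pi)^{-p/2}$ reduces to the chi-squared density with $p$ degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gammaln

def radial_density_gaussian(d, p):
    """f_D(d) = beta_p * d^(p/2-1) * phi(d) with phi(d) = exp(-d/2), beta_p = 2^(-p/2)/Gamma(p/2)."""
    log_beta_p = -(p / 2) * np.log(2) - gammaln(p / 2)
    return np.exp(log_beta_p + (p / 2 - 1) * np.log(d) - d / 2)

p = 5
d = np.linspace(0.1, 20, 200)
# For Gaussian x, the squared Mahalanobis distance d(x, mu, Sigma) is chi-squared with p d.o.f.
assert np.allclose(radial_density_gaussian(d, p), chi2.pdf(d, df=p))
```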

2.2 S-Estimators

Given a set of $n$ $p$-dimensional samples, $\{\boldsymbol{x}_{1},\dots,\boldsymbol{x}_{n}\}$, S-estimators of location and shape are defined as (Maronna et al., 2006, Sec. 6.4.2)

$$\left(\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Omega}}\right)=\operatorname{arg\,min}\ \widehat{\sigma}\quad\text{subject to}\quad|\boldsymbol{\Omega}|=1,\quad\frac{1}{n}\sum_{i=1}^{n}\rho\left(\frac{d\left(\boldsymbol{x}_{i},\boldsymbol{\mu},\boldsymbol{\Omega}\right)}{\widehat{\sigma}}\right)=b, \qquad (3)$$

for some scalar rho function $\rho(t)$. A proper S-estimator rho function should be a continuously differentiable, nondecreasing function in $t\geq 0$ with $\rho(0)=0$, and for which there is a point $c$ such that $\rho(t)=\rho(\infty)$ for $t\geq c$. For simplicity, and without loss of generality, the rho functions will be normalized so that $\rho(\infty)=1$. The parameter $b$ is a scalar that affects the efficiency (see Section 4) and robustness of the estimator. The purpose of S-estimators is to achieve high robustness, so they are usually configured with $b=1/2-(p+1)/(2n)$, which achieves the maximum theoretical breakdown point that any affine equivariant estimator may have (see Section 5.1). To understand the derivation of the proposed estimator in the next section, note that $\widehat{\sigma}$ in the constraint is an M-estimator of the scale of $d\left(\boldsymbol{\mu},\boldsymbol{\Omega}\right)$. Local solutions of (3) can be found iteratively using the weighted sums $\sum_{i=1}^{n}w\left(d_{i}/\widehat{\sigma}\right)\left(\boldsymbol{x}_{i}-\widehat{\boldsymbol{\mu}}\right)=\boldsymbol{0}$ and $\sum_{i=1}^{n}w\left(d_{i}/\widehat{\sigma}\right)\left(\boldsymbol{x}_{i}-\widehat{\boldsymbol{\mu}}\right)\left(\boldsymbol{x}_{i}-\widehat{\boldsymbol{\mu}}\right)^{\operatorname{T}}\propto\widehat{\boldsymbol{\Omega}}$, where the weight function $w(t)=\rho^{\prime}(t)$, and where $\widehat{\boldsymbol{\Omega}}$ is re-normalized with each iteration. For the empirical results in this paper, the estimators are all solved using this weighted-sum algorithm.
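To make the weighted-sum iteration concrete, the following Python sketch shows one plausible implementation (our own illustration; the helper names and the root-finding M-scale step are assumptions, as the paper does not specify these details):

```python
import numpy as np
from scipy.optimize import brentq

def m_scale(d, rho, b, bracket=(1e-12, 1e12)):
    """M-estimate of the scale of d: solve (1/n) * sum rho(d_i / sigma) = b for sigma."""
    return brentq(lambda s: np.mean(rho(d / s)) - b, *bracket)

def s_estimator_weighted_sums(X, rho, w, b, mu0, Omega0, max_iter=100, tol=1e-10):
    """Iteratively reweighted updates for the S-estimator of location and shape in (3)."""
    n, p = X.shape
    mu, Omega = mu0.copy(), Omega0.copy()
    for _ in range(max_iter):
        diff = X - mu
        d = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Omega), diff)
        sigma = m_scale(d, rho, b)
        wt = w(d / sigma)                              # w(t) = rho'(t)
        mu_new = (wt[:, None] * X).sum(axis=0) / wt.sum()
        diff_new = X - mu_new
        S = (wt[:, None, None] * np.einsum('ij,ik->ijk', diff_new, diff_new)).sum(axis=0)
        Omega_new = S / np.linalg.det(S) ** (1 / p)    # re-normalize so that |Omega| = 1
        converged = (np.linalg.norm(mu_new - mu) < tol
                     and np.linalg.norm(Omega_new - Omega) < tol)
        mu, Omega = mu_new, Omega_new
        if converged:
            break
    return mu, Omega, sigma
```

Any rho/weight pair discussed in this paper (bisquare, Rocke, or S-q) could be passed as `rho` and `w`, provided the functions accept array arguments.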

To estimate the scatter matrix, a separate estimator of $|\boldsymbol{\Sigma}|^{1/p}$ can then be used to scale $\widehat{\boldsymbol{\Omega}}$ using (1). Maronna et al. (2006, p. 186) discussed a simple estimator to scale $\widehat{\boldsymbol{\Omega}}$ to $\widehat{\boldsymbol{\Sigma}}$. When $\boldsymbol{x}$ is normally distributed, $d$ has a chi-squared distribution with $p$ degrees of freedom. Therefore, they suggested using $\widehat{\boldsymbol{\Sigma}}=\operatorname{Median}\left\{d\left(\boldsymbol{x}_{1},\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Omega}}\right),\dots,d\left(\boldsymbol{x}_{n},\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Omega}}\right)\right\}\left(\chi^{2}_{p}(0.5)\right)^{-1}\widehat{\boldsymbol{\Omega}}$, where $\chi^{2}_{p}(0.5)$ is the 50th percentile of the chi-squared distribution. For the general case of elliptical distributions, we propose extending this to

$$\widehat{\boldsymbol{\Sigma}}=\frac{\operatorname{Median}\left\{d\left(\boldsymbol{x}_{1},\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Omega}}\right),\dots,d\left(\boldsymbol{x}_{n},\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Omega}}\right)\right\}}{F^{-1}(0.5)}\,\widehat{\boldsymbol{\Omega}},$$

where $F(d)$ is the distribution function corresponding to (2), and therefore $F^{-1}(0.5)$ is the 50th percentile of the distribution.
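A minimal sketch of this scaling step, assuming Gaussian data so that $F$ is the chi-squared distribution with $p$ degrees of freedom, might look like:

```python
import numpy as np
from scipy.stats import chi2

def scale_shape_to_scatter(d_hat, Omega_hat, median_of_F):
    """Scale a shape estimate (|Omega| = 1) to a scatter estimate via the median rule."""
    return np.median(d_hat) / median_of_F * Omega_hat

# Gaussian example: F^{-1}(0.5) is the chi-squared median with p degrees of freedom
# Sigma_hat = scale_shape_to_scatter(d_hat, Omega_hat, chi2.ppf(0.5, df=p))
```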

For the location and shape matrices, the S-estimator formulation in (3) is equivalent to the alternative one given by

$$\left(\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Sigma}}\right)=\operatorname{arg\,min}\ |\boldsymbol{\Sigma}|\quad\text{subject to}\quad\frac{1}{n}\sum_{i=1}^{n}\rho\left(\frac{d\left(\boldsymbol{x}_{i},\boldsymbol{\mu},\boldsymbol{\Sigma}\right)}{\sigma}\right)=b, \qquad (4)$$

which requires that $\sigma$ be defined such that $b=\operatorname{E}\left[\rho\left(d\left(\boldsymbol{x};\boldsymbol{\mu},\boldsymbol{\Sigma}\right)/\sigma\right)\right]$ for a consistent estimator of $\boldsymbol{\Sigma}$ at an assumed elliptical distribution (Rocke, 1996). While the first formulation is better for understanding the derivation of the proposed S-q estimator, this second formulation is better for defining and understanding its properties (for example, see Lopuhaä, 1989). The scale parameters in the two formulations are related asymptotically by $\sigma=|\boldsymbol{\Sigma}|^{-1/p}\operatorname{E}\left[\widehat{\sigma}\right]$ at the assumed distribution.

The two most common multivariate S-estimators are the bisquare and Rocke (Maronna et al., 2019, sec. 6.4.2, 6.4.4). The S-bisquare is given by $\rho_{\text{bisq}}(t)=\min\left\{1,1-(1-t)^{3}\right\}$ and $w_{\text{bisq}}(t)=3(1-t)^{2}\operatorname{I}(t\leq 1)$, which does not have a tuning parameter to control efficiency and robustness. The S-Rocke is given by

$$\rho_{\gamma}(t)=\begin{cases}0&\text{if}\;0\leq t\leq 1-\gamma\\ \frac{t-1}{4\gamma}\left[3-\left(\frac{t-1}{\gamma}\right)^{2}\right]+\frac{1}{2}&\text{if}\;1-\gamma<t<1+\gamma\\ 1&\text{if}\;1+\gamma\leq t,\end{cases}$$
$$w_{\gamma}(t)=\frac{3}{4\gamma}\left[1-\left(\frac{t-1}{\gamma}\right)^{2}\right]\operatorname{I}(1-\gamma\leq t\leq 1+\gamma),$$

where the parameter $\gamma\in(0,1]$ tunes the estimator's efficiency and robustness. The Rocke's maximum efficiency is generally limited at $\gamma=1$, which is extremely restricting for small $p$. Both $\rho_{\text{bisq}}(t)$ and $\rho_{\gamma}(t)$ are generic functions that do not depend on the underlying distribution. In the following section, an alternative S-estimator is defined that accounts for the underlying distribution and that generally has better performance across the most common elliptical distributions. It also does not have the same inherent restrictions for small $p$ as $\rho_{\gamma}(t)$.
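For reference, a direct vectorized transcription of the S-Rocke rho and weight functions above (a sketch written for this exposition, with function names of our choosing) is:

```python
import numpy as np

def rho_rocke(t, gamma):
    """Rocke rho function: 0 below 1-gamma, cubic ramp on (1-gamma, 1+gamma), 1 above 1+gamma."""
    t = np.asarray(t, dtype=float)
    ramp = (t - 1) / (4 * gamma) * (3 - ((t - 1) / gamma) ** 2) + 0.5
    return np.where(t <= 1 - gamma, 0.0, np.where(t < 1 + gamma, ramp, 1.0))

def w_rocke(t, gamma):
    """Derivative of rho_rocke, nonzero only on [1-gamma, 1+gamma]."""
    t = np.asarray(t, dtype=float)
    inside = (t >= 1 - gamma) & (t <= 1 + gamma)
    return np.where(inside, 3 / (4 * gamma) * (1 - ((t - 1) / gamma) ** 2), 0.0)
```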

2.3 Elliptical Density-Based S-q Estimator

The rho function corresponding to the maximum likelihood estimator, $\widehat{\sigma}$, of the scale of $d\left(\boldsymbol{\mu},\boldsymbol{\Omega}\right)$, or equivalently $d\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$, is $\rho_{\text{mle}}(t)=-t\,f^{\prime}\left(t\right)/f\left(t\right)$. We propose weighting this by the power transform of the density, $\tilde{\rho}_{q}\left(t\right)=f(t)^{1-q}\rho_{\text{mle}}(t)$, where the scalar $q\leq 1$ controls the estimator robustness, with $q=1$ corresponding to the maximum likelihood function, and with decreasing $q$ increasing the estimator robustness. In most cases, this rho function is not monotone, as required by S-estimators, so it is denoted with a tilde. This rho function is equivalent to the M-Lq and other M-estimators proposed, for example, by Windham (1995); Basu et al. (1998); Choi and Hall (2000); and Ferrari and Yang (2010). However, in this particular case of estimating the scale of the squared Mahalanobis distance of an elliptically distributed random vector, the density and rho function do not need to be regenerated with each numerical iteration $i$ based on the estimates $\widehat{\boldsymbol{\mu}}^{(i)}$ and $\widehat{\boldsymbol{\Omega}}^{(i)}$. Substituting the PDF from (2),

$$\tilde{\rho}_{q}\left(t\right)=-\left(\beta_{p}\phi(t)\right)^{s_{q}}t^{s_{p}s_{q}}\left(t\frac{\phi^{\prime}(t)}{\phi(t)}+s_{p}\right), \qquad (5)$$

where $s_{p}=p/2-1$ and $s_{q}=1-q$. Taking the derivative of $\tilde{\rho}_{q}(t)$, the corresponding weight function is given by

$$\tilde{w}_{q}(t)=-\left(\beta_{p}\phi(t)\right)^{s_{q}}t^{s_{p}s_{q}}\left(\frac{s_{q}s_{p}^{2}}{t}+\left(2s_{q}s_{p}+1\right)\frac{\phi^{\prime}(t)}{\phi(t)}-qt\left(\frac{\phi^{\prime}(t)}{\phi(t)}\right)^{2}+t\frac{\phi^{\prime\prime}(t)}{\phi(t)}\right). \qquad (6)$$

For simplicity, when $q<1$, the scalar $\beta_{p}$ can be dropped from the calculation of $\tilde{\rho}_{q}(t)$ and $\tilde{w}_{q}(t)$ in (5) and (6). When $\phi(t)$ is only positive over a finite domain (e.g., the Pearson type II distribution), we define $\tilde{\rho}_{q}(t)$ and $\tilde{w}_{q}(t)$ to be zero outside this domain.

For the common elliptical distributions listed in Table 1, $\tilde{\rho}_{q}(t)$ is monotone in its central region between its global extrema when using appropriate values for $q$ (defined below). The first extremum is the minimum, which we label point $a$, and the second is the maximum, labeled $c$. The distance between $a$ and $c$ varies monotonically with respect to $q$. We use this to define a tunable, double-hard-rejection S-estimator rho function. The value of $\tilde{\rho}_{q}(t)$ is held constant between zero and $a$ at the value $\tilde{\rho}_{q}(a)$, which hard rejects inliers, and the value of $\tilde{\rho}_{q}(t)$ is held constant above $c$ at the value $\tilde{\rho}_{q}(c)$, which hard rejects outliers. The resulting monotonic function is then scaled and shifted so it ranges from zero to one. This defines the S-q estimator.

Definition 1.

Assuming $\phi(t)$ is twice continuously differentiable over its region of support and $\frac{s_{q}s_{p}^{2}}{t}+\left(2s_{q}s_{p}+1\right)\frac{\phi^{\prime}(t)}{\phi(t)}-qt\left(\frac{\phi^{\prime}(t)}{\phi(t)}\right)^{2}+t\frac{\phi^{\prime\prime}(t)}{\phi(t)}$ has one or two zeros in $t\in(0,\infty)$ for $q<1$, the S-q estimator is the S-estimator with the rho function given by

$$\rho_{q}(t)=\begin{cases}0&\text{if $q<1$ and $t\leq a$}\\ s_{1}\left(\tilde{\rho}_{q}(t)-\tilde{\rho}_{q}(a)\right)&\text{if $q<1$ and $a<t<c$}\\ 1&\text{if $q<1$ and $t\geq c$}\\ \tilde{\rho}_{q}(t)&\text{if $q=1$,}\end{cases} \qquad (7)$$

where $s_{1}=\left(\tilde{\rho}_{q}(c)-\tilde{\rho}_{q}(a)\right)^{-1}$. The S-q estimator of Type I is the case with one zero (i.e., $a=0$), and the Type II S-q estimator is the case with two zeros.

For most distributions, $\lim_{q\rightarrow 1}c=\infty$, or at $q=1$, $\tilde{\rho}_{q}(t)$ is not bounded. Therefore, we do not scale or shift $\tilde{\rho}_{q}(t)$ in this case, and $\rho_{q}(t)$ is not a proper S-estimator rho function. However, when $q=1$ and $b=1$, the MLE of the scale of $d$ is obtained. The S-q weight function is the derivative of $\rho_{q}(t)$ and is given by

$$w_{q}(t)=\begin{cases}0&\text{if $q<1$ and $t\leq a$}\\ s_{1}\tilde{w}_{q}(t)&\text{if $q<1$ and $a<t<c$}\\ 0&\text{if $q<1$ and $t\geq c$}\\ \tilde{w}_{q}(t)&\text{if $q=1$.}\end{cases} \qquad (8)$$

Table 2 lists expressions for the inlier rejection point $a$ and the outlier rejection point $c$ for the common elliptical distributions in Table 1. For most of these distributions, the equation $\tilde{w}_{q}(t)=0$ is quadratic, which provides a closed-form solution for the values of $a$ and $c$.

Table 2: S-q Inlier and Outlier Rejection Points for Common Elliptical Distributions
Distribution    Inlier Rejection Point $a$ and Outlier Rejection Point $c$
Kotz type    $a,c=\left(\frac{s+2s_{q}N+2s_{p}s_{q}\mp\sqrt{s^{2}+4ss_{q}N+4ss_{p}s_{q}}}{2s_{q}rs}\right)^{1/s}$
Gaussian    $a,c=\frac{1+2s_{p}s_{q}\mp\sqrt{1+4s_{p}s_{q}}}{s_{q}}$
Pearson type II    $a,c=\frac{2s_{q}s_{p}^{2}+m\left(2s_{q}s_{p}+1\right)\mp\sqrt{m^{2}\left(4s_{q}s_{p}+1\right)+4ms_{q}s_{p}^{2}}}{2\left(s_{q}s_{p}^{2}+m\left(2s_{q}s_{p}+ms_{q}\right)\right)}$
Pearson type VII    $a,c=s\,\frac{2Ns_{q}s_{p}+N-2s_{q}s_{p}^{2}\mp\sqrt{4N^{2}s_{q}s_{p}-4Ns_{q}s_{p}^{2}+N^{2}}}{2s_{q}\left(s_{p}^{2}-2Ns_{p}+N^{2}\right)}$
Generalized hyperbolic    $a=\begin{cases}0&\text{when }\chi=0\text{ and }\lambda=1\\ \left\{t\,|\,\tilde{w}_{q}(t)=0\text{ and }t\in(0,c)\right\}&\text{otherwise}\end{cases}$
    $c=\left\{t\,|\,\tilde{w}_{q}(t)=0\text{ and }t\in(a,\infty)\right\}$
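As an illustration of (5)-(8) and the Gaussian row of Table 2, the following sketch (our own example code, not from the paper) evaluates the S-q rejection points and the normalized rho and weight functions for Gaussian data with $q<1$, dropping $\beta_{p}$ as permitted above:

```python
import numpy as np

def sq_functions_gaussian(p, q):
    """S-q rho/weight functions for phi(t) = exp(-t/2) (Gaussian), valid for q < 1 and t > 0."""
    sp, sq = p / 2 - 1, 1 - q
    # Gaussian row of Table 2: closed-form zeros of w~_q(t)
    root = np.sqrt(1 + 4 * sp * sq)
    a, c = (1 + 2 * sp * sq - root) / sq, (1 + 2 * sp * sq + root) / sq

    def rho_tilde(t):
        # Eq. (5) with phi'(t)/phi(t) = -1/2 and beta_p dropped
        return -np.exp(-sq * t / 2) * t ** (sp * sq) * (sp - t / 2)

    s1 = 1.0 / (rho_tilde(c) - rho_tilde(a))

    def rho_q(t):
        # Eq. (7): hard-rejected below a and above c, scaled/shifted to [0, 1] in between
        t = np.asarray(t, dtype=float)
        mid = s1 * (rho_tilde(t) - rho_tilde(a))
        return np.where(t <= a, 0.0, np.where(t >= c, 1.0, mid))

    def w_q(t):
        # Eq. (6)/(8) with phi'/phi = -1/2 and phi''/phi = 1/4
        t = np.asarray(t, dtype=float)
        inner = sq * sp ** 2 / t - (2 * sq * sp + 1) / 2 + sq * t / 4
        return np.where((t > a) & (t < c),
                        -s1 * np.exp(-sq * t / 2) * t ** (sp * sq) * inner, 0.0)

    return rho_q, w_q, a, c

# Example corresponding to Figure 1 (left panels): p = 5, q = 0.9
rho_q, w_q, a, c = sq_functions_gaussian(p=5, q=0.9)
```

For $q<1$ and $p>2$ this reproduces the Type II behavior shown in Figure 1: zero weight below $a$ and above $c$, with positive, density-shaped weights in between.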

The asymptotic rejection probability (ARP) is defined as $\Pr(d/\widehat{\sigma}\geq c)$ (Rocke, 1996). Table 2 can be used to determine $q$ from a desired ARP using $F^{-1}(\text{ARP})$. However, since $w_{q}(t)$ is very tapered (i.e., it applies little weight to values just below $c$), practitioners may choose alternative approaches to tuning that allow for higher estimator efficiencies. For example, the approach used in this paper, as well as in Maronna and Yohai (2017), is to tune the estimators to a desired expected efficiency, which is defined in the next section.

The general definition in (7) specifies that $q\leq 1$. In a few particular cases, however, there are some minor restrictions on $q$ (when $q<1$) in order to ensure that $a$ and $c$ are in the support of $f(d)$. Table 3 lists these restrictions.

Table 3: Restrictions on Parameter qq for Common Elliptical Distributions
Distribution    Valid Range of $q$
Kotz type    $q\leq 1$ unless $-1-s_{p}<N<-s_{p}$, then $1+\frac{s}{4\left(s_{p}+N\right)}<q\leq 1$
Gaussian    $q\leq 1$
Pearson type II    $q=1$ or $q<1-\frac{1}{m}$
Pearson type VII    $q\leq 1$
Generalized hyperbolic*    $q\leq 1$ unless $\chi=0$ and $\lambda<1$, then $\textit{unknown}<q\leq 1$
*Empirically inferred. Computational precision restricts $q\notin(0.998,1)$, approximately.

Figure 1 illustrates examples of the S-q functions $\tilde{\rho}_{q}(t)$, $\rho_{q}(t)$, and $w_{q}(t)$ for the five-dimensional Gaussian (S-q Type II) and Laplace (S-q Type I) distributions and for various values of $q$. As $q$ is decreased, the region of positive weights (the area between points $a$ and $c$) narrows, corresponding to increased robustness. The PDF is also plotted, illustrating how $w_{q}(t)$ roughly follows $f(t)$ in the central region.

Figure 1: Example S-q Rho and Weight Functions for Gaussian (Type II S-q) and Laplace (Type I S-q) Distributions. Rho functions $\rho_{q}(t)$ (top) and weight functions $w_{q}(t)$ (bottom) are plotted for the Gaussian distribution (left) and the Laplace distribution (right) for $q\in\{-2,0.5,0.9\}$ and $p=5$. On the top, the dotted lines depict the corresponding $\tilde{\rho}_{q}(t)$ functions, scaled and shifted to match $\rho_{q}(t)$. On the bottom, the dashed line depicts the density function. The solid dots indicate the points $a$, where $\tilde{\rho}_{q}(t)=0$, and $c$, where $\tilde{\rho}_{q}(t)=1$.

Figure 2 compares the S-q asymptotic weights with those of the MM-SHR, S-Rocke, S-bisquare, and maximum likelihood estimators, and with the corresponding PDF. The underlying model is a 10-dimensional standard Gaussian distribution. The MM-SHR and S-q estimators have been tuned to 80% asymptotic efficiency relative to the MLE. The S-Rocke estimator is tuned to its maximum efficiency, which is 77% in this instance. The estimators have been set to the maximum breakdown point, with $b=1/2$, which results in the shifts of the peaks of the weight curves relative to the PDF.

Figure 2: Example Comparison of Weight Functions for Various Estimators. For the 10-dimensional Gaussian distribution, the plot depicts the asymptotic weights for the S-q ($w_{q}$) and MM-SHR ($w_{\text{shr}}$) estimators tuned to 80% asymptotic relative efficiency, the S-Rocke ($w_{\gamma}$) estimator tuned to its maximum efficiency (77% for this case), and the non-tunable MLE ($w_{\text{mle}}$) and S-bisquare ($w_{\text{bisq}}$) estimators. The estimators are set to their maximum breakdown points.

From the figure, it is clear that the Gaussian MLE (i.e., the sample estimator) gives uniform weight to all samples, no matter how improbable. The S-Rocke has a quadratic weight function, which imposes a hard cutoff and cannot capture the tails of $f(d)$. The SHR weight function is cubic, and its shape better captures the shape of the right half of the PDF. However, the SHR function is designed to approximate a step function, which is poorly suited for many distributions (cf. $w_{\text{shr}}(t)$ in Figure 2 with the Laplace $f(d)$ in Figure 1). Only the S-q weight function follows the general shape of the PDF, giving less weight to less probable observations.

3 Consistency Properties of the S-q Estimator

As an S-estimator, the S-q estimator inherits properties from its parent class, such as affine equivariance. This section briefly summarizes properties related to its consistency. For more detailed discussion on these, see (Davies, 1987). Here, we use the alternative S-estimator formulation given by (4) under the assumptions (A1) that

(A1) $b=\operatorname{E}[\rho_{q}\left(d\left(\boldsymbol{x}_{i},\boldsymbol{\mu},\boldsymbol{\Sigma}\right)/\sigma\right)]$,
$\boldsymbol{x}\sim f_{X}\left(\boldsymbol{x},\boldsymbol{\mu},\boldsymbol{\Sigma},\phi(d)\right)$, where the $\boldsymbol{x}_{i}$ are i.i.d.,
$\phi(d)$ is non-increasing, and
$\phi(d)$ and $-\rho_{q}(d/\sigma)$ have common point(s) of decrease.
Theorem 1 (Uniqueness).

Given (A1), minimizing $|\widehat{\boldsymbol{\Sigma}}|$ subject to

$$\int_{0}^{\infty}\rho_{q}\left(\frac{d\left(\boldsymbol{x}_{i},\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Sigma}}\right)}{\sigma}\right)f_{X}\left(d\left(\boldsymbol{x}_{i},\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\right)\operatorname{d}\!\boldsymbol{x}=b$$

has a unique solution $\left(\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Sigma}}\right)=\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$.

Proof.

See (Davies, 1987, Th. 1). ∎

Theorem 2 (Existence).

Given (A1) and $n\geq(p+1)/(1-b/\rho_{q}(\infty))$, the S-q estimator has at least one solution with probability one.

Proof.

See (Davies, 1987, Th. 2). ∎

Theorem 3 (Consistency).

Given (A1), $b=\operatorname{E}[\rho_{q}(d/\sigma)]$, and $p+1\leq n(1-b/\rho_{q}(\infty))$, then

$$\lim_{n\rightarrow\infty}\left(\widehat{\boldsymbol{\mu}}_{n},\widehat{\boldsymbol{\Sigma}}_{n}\right)=\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right).$$
Proof.

See (Davies, 1987, Th. 3). ∎

4 Asymptotic Distribution and Relative Efficiencies

In this section, the asymptotic distribution of the S-q estimator is provided. From this, measures of efficiency are then defined. Finally, the efficiency of the S-q estimator is compared with leading high-breakdown point estimators.

4.1 Asymptotic Distribution

For the asymptotic distribution of the S-q estimate, we continue to use the alternative S-estimator formulation given by (4). Lopuhaä (1997) derived the distribution of S-estimators under assumptions appropriate for the S-q estimator, that is,

(A2) $\phi^{\prime}_{p}(t)$ is decreasing with $\phi^{\prime}_{p}(d)<0$.

Here, we use the following notation. The matrix $\boldsymbol{I}_{p^{2}}$ is the $p^{2}\times p^{2}$ identity matrix, $\boldsymbol{K}_{p^{2}}$ is the $p^{2}\times p^{2}$ commutation matrix, $\otimes$ is the Kronecker product operator, and the operator $\operatorname{vec}\left(\boldsymbol{\Sigma}\right)$ stacks the columns of $\boldsymbol{\Sigma}$ into a column vector.

Theorem 4 (Asymptotic distribution).

Given (A1) and (A2), the asymptotic distribution of the S-q estimate $\left(\widehat{\boldsymbol{\mu}}_{n},\widehat{\boldsymbol{\Sigma}}_{n}\right)$ is given by $\sqrt{n}\left(\widehat{\boldsymbol{\mu}}_{n}-\boldsymbol{\mu},\widehat{\boldsymbol{\Sigma}}_{n}-\boldsymbol{\Sigma}\right)\overset{d}{\rightarrow}\left(\boldsymbol{a},\boldsymbol{B}\right)$, with $\boldsymbol{a}\perp\boldsymbol{B}$. The vector $\boldsymbol{a}\sim\mathcal{N}\left(\boldsymbol{0},\boldsymbol{\Gamma}_{\boldsymbol{\mu}}\right)$, where

$$\boldsymbol{\Gamma}_{\boldsymbol{\mu}}=\frac{\omega_{1}}{\omega_{2}^{2}}\boldsymbol{\Sigma}, \qquad (9)$$

with $\omega_{1}=p^{-1}\operatorname{E}\left[d\,w_{q}^{2}(d/\sigma)\right]$ and $\omega_{2}=-2\beta\int_{0}^{\infty}p^{-1}d^{p/2}w_{q}(d/\sigma)\phi^{\prime}(d)\,\operatorname{d}\!d$. The matrix $\boldsymbol{B}\sim\mathcal{N}\left(\boldsymbol{0},\boldsymbol{\Gamma}_{\boldsymbol{\Sigma}}\right)$, where

$$\boldsymbol{\Gamma}_{\boldsymbol{\Sigma}}=\zeta_{1}\left(\boldsymbol{I}_{p^{2}}+\boldsymbol{K}_{p^{2}}\right)\left(\boldsymbol{\Sigma}\otimes\boldsymbol{\Sigma}\right)+\zeta_{2}\operatorname{vec}\left(\boldsymbol{\Sigma}\right)\operatorname{vec}\left(\boldsymbol{\Sigma}\right)^{\operatorname{T}}, \qquad (10)$$

with $\zeta_{1}=\lambda_{1}^{-2}p(p+2)\operatorname{E}\left[(d/\sigma)^{2}w_{q}^{2}(d/\sigma)\right]$ and $\zeta_{2}=\lambda_{2}^{-2}\operatorname{E}\left[(\rho_{q}(d/\sigma)-b)^{2}\right]-2p^{-1}\zeta_{1}$, where $\lambda_{1}=-2\beta\int_{0}^{\infty}\sigma^{-1}d^{p/2+1}w_{q}(d/\sigma)\phi^{\prime}(d)\,\operatorname{d}\!d$ and $\lambda_{2}=-\beta\int_{0}^{\infty}d^{p/2}\left(\rho_{q}(d/\sigma)-b\right)\phi^{\prime}(d)\,\operatorname{d}\!d$.

Proof.

See (Lopuhaä, 1997, Corollary 2). ∎

Frahm (2009) derived the asymptotic distribution of shape matrix estimates for affine equivariant estimators. This enables us to state the asymptotic distribution of the S-q shape estimate, which is applicable using either S-estimator formulation, (3) or (4).

Theorem 5 (Shape asymptotic distribution).

Given (A1) and (A2), the asymptotic distribution of the S-q estimate of shape is given by $\sqrt{n}\left(\widehat{\boldsymbol{\Omega}}_{n}-\boldsymbol{\Omega}\right)\sim\mathcal{N}\left(\boldsymbol{0},\boldsymbol{\Gamma}_{\boldsymbol{\Omega}}\right)$, where

$$\boldsymbol{\Gamma}_{\boldsymbol{\Omega}}=\zeta_{1}\left(\boldsymbol{I}_{p^{2}}+\boldsymbol{K}_{p^{2}}\right)\left(\boldsymbol{\Omega}\otimes\boldsymbol{\Omega}\right)-\frac{2\zeta_{1}}{p}\operatorname{vec}\left(\boldsymbol{\Omega}\right)\operatorname{vec}\left(\boldsymbol{\Omega}\right)^{\operatorname{T}}, \qquad (11)$$

with $\zeta_{1}$ defined as in Theorem 4.

Proof.

See (Frahm, 2009, Corollary 1). ∎

4.2 Measures of Efficiency

The asymptotic efficiency of an estimator, at an assumed distribution, is defined as the ratio of the asymptotic variance of the maximum likelihood estimate to the variance of the estimator under consideration. For multivariate estimation, this definition of efficiency is of large dimension: $p\times p$ for location and $p^{2}\times p^{2}$ for shape and scatter. However, for affine equivariant estimation of location and scatter of elliptical distributions, the covariance of the estimate depends only on a scalar. Specifically, (9), (10), and (11) are general, with only the scalars $\omega_{1}/\omega_{2}^{2}$ (Bilodeau and Brenner, 1999), and $\zeta_{1}$ and $\zeta_{2}$ (Tyler, 1982) depending on the estimator and the generating function $\phi(d)$. Therefore, the asymptotic efficiency of the estimate $\widehat{\boldsymbol{\mu}}$ can be alternatively defined as

$$\operatorname{eff}_{\infty}\left(\widehat{\boldsymbol{\mu}}\right)=\frac{\omega_{1,\text{mle}}/\omega_{2,\text{mle}}^{2}}{\omega_{1,\widehat{\boldsymbol{\mu}}}/\omega_{2,\widehat{\boldsymbol{\mu}}}^{2}},$$

and the asymptotic efficiency of the estimate $\widehat{\boldsymbol{\Omega}}$ can alternatively be defined as

$$\operatorname{eff}_{\infty}\left(\widehat{\boldsymbol{\Omega}}\right)=\frac{\zeta_{1,\text{mle}}}{\zeta_{1,\widehat{\boldsymbol{\Omega}}}}. \qquad (12)$$

It is common to define asymptotic efficiency this way (for example, see Tyler, 1983; Frahm, 2009).

Comparing the S-q estimator's efficiency to that of another estimator can likewise be achieved analytically using, for example, $\zeta_{1,\gamma}/\zeta_{1,q}$ for the S-Rocke estimator; a quotient greater than one indicates that the S-q has higher asymptotic efficiency than the S-Rocke estimator. For other S-estimators, the asymptotic distribution parameters $\omega_{1}$, $\omega_{2}$, and $\zeta_{1}$ are calculated as in Theorems 4 and 5 but using their respective weight functions. MM-estimators have the same asymptotic variance and influence function as S-estimators (Rousseeuw and Hubert, 2013). For MM-estimators, however, $\sigma$ is effectively the tuning parameter, and it can be set accordingly.

In general, finite-sample performance measures are difficult to derive analytically. Instead, it is common to characterize finite-sample performance empirically through metrics derived from the Kullback-Leibler divergence between the estimated and true distributions (for example, see Huang et al., 2006; Ferrari and Yang, 2010). For t-distributions, which include the Gaussian distribution, the Kullback-Leibler divergence between $t_{\nu}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$ and $t_{\nu}\left(\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Sigma}}\right)$ is given by (Abusev, 2015)

$$\operatorname{D}\left(\boldsymbol{\mu},\boldsymbol{\Sigma};\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Sigma}}\right)=\frac{1}{2}\left(\operatorname{Tr}\left(\boldsymbol{\Sigma}^{-1}\widehat{\boldsymbol{\Sigma}}\right)+\left(\boldsymbol{\mu}-\widehat{\boldsymbol{\mu}}\right)^{\operatorname{T}}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}-\widehat{\boldsymbol{\mu}}\right)-p-\log\frac{|\widehat{\boldsymbol{\Sigma}}|}{|\boldsymbol{\Sigma}|}\right).$$

Following Maronna and Yohai (2017), we then define the joint location and scatter finite-sample relative efficiency as

$$\operatorname{eff}_{n}\left(\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Sigma}};\widehat{\boldsymbol{\mu}}_{\text{mle}},\widehat{\boldsymbol{\Sigma}}_{\text{mle}}\right)=\frac{\operatorname{E}\left[\operatorname{D}\left(\boldsymbol{\mu},\boldsymbol{\Sigma};\widehat{\boldsymbol{\mu}}_{\text{mle}},\widehat{\boldsymbol{\Sigma}}_{\text{mle}}\right)\right]}{\operatorname{E}\left[\operatorname{D}\left(\boldsymbol{\mu},\boldsymbol{\Sigma};\widehat{\boldsymbol{\mu}},\widehat{\boldsymbol{\Sigma}}\right)\right]},$$

where $\widehat{\boldsymbol{\mu}}_{\text{mle}}$ and $\widehat{\boldsymbol{\Sigma}}_{\text{mle}}$ are the location and scatter matrices corresponding to the maximum likelihood estimate, and where the expectation is calculated empirically using the sample mean over $m$ Monte Carlo trials. The location and scatter finite-sample relative efficiencies are then respectively defined as $\operatorname{eff}_{n}\left(\widehat{\boldsymbol{\mu}},\boldsymbol{\Sigma};\widehat{\boldsymbol{\mu}}_{\text{mle}},\boldsymbol{\Sigma}\right)$ and $\operatorname{eff}_{n}\left(\boldsymbol{\mu},\widehat{\boldsymbol{\Sigma}};\boldsymbol{\mu},\widehat{\boldsymbol{\Sigma}}_{\text{mle}}\right)$. Likewise, we define the shape matrix finite-sample relative efficiency as

$$\operatorname{eff}_{n}\left(\widehat{\boldsymbol{\Omega}};\widehat{\boldsymbol{\Omega}}_{\text{mle}}\right)=\frac{\operatorname{E}\left[\operatorname{D}\left(\boldsymbol{\mu},\boldsymbol{\Omega};\boldsymbol{\mu},\widehat{\boldsymbol{\Omega}}_{\text{mle}}\right)\right]}{\operatorname{E}\left[\operatorname{D}\left(\boldsymbol{\mu},\boldsymbol{\Omega};\boldsymbol{\mu},\widehat{\boldsymbol{\Omega}}\right)\right]}. \qquad (13)$$
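As an illustration (placeholder code, with the estimator outputs assumed to be available from Monte Carlo trials), the divergence and the relative efficiencies above could be computed along these lines:

```python
import numpy as np

def kl_divergence(mu, Sigma, mu_hat, Sigma_hat):
    """D(mu, Sigma; mu_hat, Sigma_hat) as defined above."""
    p = len(mu)
    Sigma_inv = np.linalg.inv(Sigma)
    diff = mu - mu_hat
    return 0.5 * (np.trace(Sigma_inv @ Sigma_hat) + diff @ Sigma_inv @ diff
                  - p - np.log(np.linalg.det(Sigma_hat) / np.linalg.det(Sigma)))

def relative_efficiency(divergences_mle, divergences_est):
    """eff_n: ratio of Monte Carlo mean divergences of the MLE and the estimator under test."""
    return np.mean(divergences_mle) / np.mean(divergences_est)
```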

4.3 Comparison of Estimator Efficiency

Any estimator must provide a good estimate in the absence of contamination and when tuned to its maximum efficiency. This section compares the maximum achievable efficiencies of the S-q, S-Rocke, and MM-SHR estimators when set to their maximum breakdown point. The results below cover large swaths of the most common elliptical families in Table 1 for a moderate dimension of $p=20$. These swaths were specifically chosen to cover everyday distributions: the Gaussian, Cauchy, Laplace, hyperbolic, and normal inverse Gaussian distributions.

Robust scatter matrix estimation is generally “more difficult” than the estimation of location (Maronna et al., 2019), and as Maronna and Yohai (2017) demonstrated, divergence and efficiency metrics for scatter matrix estimators are generally much worse than for the corresponding estimators of location. Likewise, due to the high dimensionality of the estimate, the underlying shape matrix is the most difficult part of estimating the scatter matrix. Additionally, many practical applications such as multivariate regression, principal components analysis, linear discriminant analysis, and canonical correlation analysis only require the shape matrix, and not the full scatter or covariance matrices (Frahm, 2009). Therefore, unless otherwise noted, the performance results in this paper are for the shape matrix, with metrics given by (12), (13), and $\operatorname{D}\left(\boldsymbol{\mu},\boldsymbol{\Omega};\boldsymbol{\mu},\widehat{\boldsymbol{\Omega}}\right)$.

The maximum efficiencies of the S-q and S-Rocke generally occur when their parameters $q$ and $\gamma$ are set to one, although the maximum breakdown point of the S-q is only achieved when $q<1$. However, the maximum efficiency of the MM-SHR must be determined by a search, as depicted in Figure 3, which plots, as an example, asymptotic efficiency versus tuning parameter for the estimators at the 20-dimensional Cauchy distribution. In the limit, as the MM-SHR parameter is increased toward infinity, all samples receive equal weight, which is the MLE for the Gaussian distribution, but not for distributions such as the Cauchy. In general, for each tunable estimator, its efficiency decreases and its robustness increases as its parameter is decreased. At the lower limit of its parameter, its weight function is a delta function that may reject all the samples and may result in zero efficiency. At this point, the robustness is high, but the weighted-sum solution depends entirely on the initial estimates $\widehat{\boldsymbol{\mu}}^{(0)}$ and $\widehat{\boldsymbol{\Omega}}^{(0)}$.

Figure 3: Estimator Asymptotic Relative Efficiency versus Tuning Parameter. The relative efficiencies of the estimators are plotted as a function of the tuning parameter for the Cauchy distribution with $p=20$. All estimators are set to the maximum breakdown point. The S-Rocke parameter is in $[0,1]$, the MM-SHR parameter is in $(0,\infty)$, and the S-q parameter is in $(-\infty,1)$ for the maximum breakdown point.

It should be noted that although generally of high efficiency, the S-q estimate at its limit with $q=1$ is not necessarily the maximum likelihood estimate of location and scatter. The MLE weight function for location and scatter is given by (Tyler, 1982)

$$w_{\text{mle}}(t)=-2\frac{\phi^{\prime}(t)}{\phi(t)} \qquad (14)$$

whereas at $q=1$, (8) gives

$$w_{q=1}(t)=-\frac{\phi^{\prime}(t)}{\phi(t)}+t\left(\frac{\phi^{\prime}(t)}{\phi(t)}\right)^{2}-t\frac{\phi^{\prime\prime}(t)}{\phi(t)}. \qquad (15)$$
Theorem 6 (Relation to MLE efficiency).

Assuming $b=\operatorname{E}\left[\rho_{q}\left(d\left(\boldsymbol{x};\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\right)\right]$, the asymptotic S-q estimate with $q=1$ is the maximum likelihood estimate of the location and scatter matrices for distributions where

$$t\frac{\phi^{\prime\prime}(t)}{\phi(t)}-t\left(\frac{\phi^{\prime}(t)}{\phi(t)}\right)^{2}=y\frac{\phi^{\prime}(t)}{\phi(t)}, \qquad (16)$$

for some value $y$. Therefore, the S-q estimator can asymptotically achieve the Cramér–Rao lower bound for these distributions.

Proof.

The S-estimator scaling $b=\operatorname{E}\left[\rho_{q}\left(d\left(\boldsymbol{x};\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\right)\right]$ results in an estimate that is invariant to scaling of the weight function. Therefore, this theorem follows directly from proportionally equating (14) to (15). ∎

Remark 1.

Although this theorem inherently assumes the alternative S-estimator formulation given by (4), it still holds true for location and shape matrices using the primary S-estimator formulation in (3).

Remark 2.

If $\lim_{q\rightarrow 1}\tilde{\rho}_{q}(c)$ is finite, and $b\leq 1/2$, then both the Cramér–Rao lower bound (maximum efficiency) and the maximum breakdown point can be attained in the limit as $q\rightarrow 1$ (see Corollary 1 in the next section).

An example family that satisfies this theorem is the Kotz type with parameter $N=0$. Note, however, that $\lim_{q\rightarrow 1}\tilde{\rho}_{q}(c)=\infty$, so high breakdown cannot be achieved simultaneously. This is illustrated in Figure 4, which provides the estimators' maximum achievable asymptotic shape efficiencies for the Kotz type distribution with parameters $N=0$ and $r=1/2$ as a function of the parameter $s$. In this example, the S-q efficiency is plotted both for its maximum absolute efficiency with $q=1$ and for its approximate maximum high-breakdown efficiency with $q=0.99$. As seen in the figure, the high cost of high breakdown is particularly acute for large $s$.

Figure 4: Estimator Maximum Achievable Asymptotic Efficiency for the Kotz Type Distribution versus Parameter $s$. Maximum achievable asymptotic shape efficiencies at the maximum breakdown point are plotted for parameters $N=0$ and $r=1/2$. The maximum absolute asymptotic shape efficiency of the S-q estimator for $q=1$ is also shown.

The remainder of this paper will focus on maximum efficiency at the maximum breakdown point. Figure 4 also provides the S-Rocke and MM-SHR estimators' maximum efficiencies at their maximum breakdown points. The MM-SHR efficiency peaks at $s=1$, which is expected since this is the Gaussian distribution, and the S-Rocke efficiency peaks just above this point. Their efficiencies fall off precipitously for larger and smaller values of $s$. The efficiency of the S-q, conversely, increases toward unity for smaller $s$.

The estimators' maximum achievable asymptotic efficiencies for the t-distribution as a function of the distribution parameter $\nu$ are plotted on the left of Figure 5. When $\nu=1$, the t-distribution corresponds to the Cauchy distribution, and as $\nu\rightarrow\infty$, it corresponds to the Gaussian distribution. The S-q estimator offers the highest efficiency of the three estimators for thicker tails.

Figure 5: Estimator Maximum Achievable Efficiency for the t-Distribution versus Distribution Parameter $\nu$. Maximum achievable asymptotic (left) and small-sample ($n=3p$; right) efficiencies at the maximum breakdown point are plotted.

The maximum achievable small-sample relative efficiencies using $n=3p$ are plotted on the right of Figure 5. The initial estimates were made using the Peña and Prieto (2007) kurtosis plus specific directions (KSD) estimator, as recommended and provided by Maronna and Yohai (2017). Comparing these finite-sample results with the asymptotic ones on the left, it is seen that the relative results are similar. This general similarity implies that the relative performance of the asymptotic efficiencies can often be a good surrogate for the relative performance of the finite-sample efficiencies when there is no closed-form expression for the divergence in (13).

The estimators' maximum achievable asymptotic efficiencies for the variance gamma distribution with $\psi=2$ are plotted as a function of the parameter $\lambda$ on the left of Figure 6. The plots highlight the Laplace ($\lambda=1$) and multivariate hyperbolic ($\lambda=(p+1)/2$) distributions. The S-q exhibits good performance for the hyperbolic and remarkably good performance for the Laplace.

Figure 6: Estimator Maximum Achievable Asymptotic Efficiency for the Generalized Hyperbolic Distribution versus Parameter $\lambda$. Maximum achievable asymptotic efficiencies at the maximum breakdown point are plotted for the variance gamma distribution with parameter $\psi=2$ (left) and for the generalized hyperbolic distribution with parameters $\psi=2$ and $\chi=1$ (right).

The estimators' maximum achievable asymptotic efficiencies for the generalized hyperbolic distribution with $\psi=2$ and $\chi=1$ are plotted as a function of the parameter $\lambda$ on the right of Figure 6. The plots highlight the normal inverse Gaussian ($\lambda=-1/2$) and hyperbolic ($\lambda=(p+1)/2$) distributions. The S-q again exhibits good performance for the hyperbolic, and it exhibits remarkably good performance for the normal inverse Gaussian.

5 Robustness Analysis

The robustness of the S-q estimator is now explored. First, the breakdown point is provided. The influence function is then examined. Finally, finite-sample simulation results further illustrate the robustness of the high-breakdown estimators.

5.1 Breakdown Point

The finite-sample breakdown point of a multivariate estimator of location or scatter is defined in terms of the fraction of the $n$ samples that can be replaced in order to drive $\|\widehat{\boldsymbol{\mu}}\|$ to infinity or drive an eigenvalue of $\widehat{\boldsymbol{\Sigma}}$ to either zero or infinity. Unlike multivariate M-estimators, which only achieve an asymptotic breakdown point of $(p+1)^{-1}$ (Maronna, 1976), S-estimators are able to achieve the maximum possible finite-sample breakdown point that any affine equivariant estimator may have (Davies, 1987, Th. 6). For the following theorem, the term samples in general position means that no more than $p$ samples are contained in any hyperplane of dimension less than $p$.

Theorem 7 (Finite-sample breakdown point).

Assuming (A1) and $q<1$, when the $n$ samples are in general position and $n(1-2b)\geq p+1$, the breakdown point of the S-q estimator is $(\lfloor nb\rfloor+1)/n$.

Proof.

As discussed in Section 2.3, $q<1$ ensures a proper S-estimator with a finite value $c$ and a bounded rho function. See (Davies, 1987, Th. 5). ∎

Corollary 1.

The maximum breakdown point is $\lfloor(n-p+1)/2\rfloor/n$, which is achieved when $b=1/2-(p+1)/(2n)$. Asymptotically, this is $1/2$ at $b=1/2$.
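For example, with $p=20$ and $n=100p=2000$, the maximum breakdown point is $\lfloor(n-p+1)/2\rfloor/n=990/2000=0.495$, attained at $b=1/2-(p+1)/(2n)\approx 0.4948$, close to the asymptotic value of $1/2$.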

5.2 Influence Function

The influence function (IF) of an estimator characterizes its sensitivity to an infinitesimal point contamination at $\boldsymbol{z}\in\mathbb{R}^{p}$, standardized by the mass of the contamination, $\epsilon$. The influence function for estimator $\boldsymbol{T}$, at the nominal distribution $F$, is defined as

$$\boldsymbol{\operatorname{IF}}\left(\boldsymbol{z};\boldsymbol{T},F\right)=\lim_{\epsilon\rightarrow 0^{+}}\frac{\boldsymbol{T}((1-\epsilon)F+\epsilon\Delta_{\boldsymbol{z}})-\boldsymbol{T}(F)}{\epsilon}=\left.\frac{\partial}{\partial\epsilon}\boldsymbol{T}((1-\epsilon)F+\epsilon\Delta_{\boldsymbol{z}})\right|_{\epsilon=0},$$

where $\epsilon$ is the proportion of samples that are a point mass, $\Delta_{\boldsymbol{z}}$, located at $\boldsymbol{z}$.

Theorem 8 (Influence function).

Assuming (A1) and (A2), the influence functions of the S-q estimates $\widehat{\boldsymbol{\mu}}$ and $\widehat{\boldsymbol{\Sigma}}$ are given by

$$\boldsymbol{\operatorname{IF}}\left(\boldsymbol{z};\boldsymbol{\mu},F\right)=\frac{\sqrt{d_{z}}\,w_{q}(d_{z}/\sigma)}{\omega_{2}}\frac{\boldsymbol{z}_{c}}{\sqrt{d_{z}}},$$
$$\boldsymbol{\operatorname{IF}}\left(\boldsymbol{z};\boldsymbol{\Sigma},F\right)=\frac{\rho_{q}\left(d_{z}/\sigma\right)-b}{\lambda_{2}}\boldsymbol{\Sigma}+\frac{p(p+2)\left(d_{z}/\sigma\right)w_{q}\left(d_{z}/\sigma\right)}{\lambda_{1}}\left(\frac{\boldsymbol{z}_{c}\boldsymbol{z}_{c}^{\operatorname{T}}}{d_{z}}-\frac{1}{p}\boldsymbol{\Sigma}\right), \qquad (17)$$

where $\boldsymbol{z}_{c}=\boldsymbol{z}-\boldsymbol{\mu}$ and $d_{z}=\boldsymbol{z}_{c}^{\operatorname{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{z}_{c}$, and where the scalars $\omega_{2}$, $\lambda_{1}$, and $\lambda_{2}$ were defined in Theorem 4.

Proof.

See (Lopuhaä, 1989, Corollary 5.2) and (Lopuhaä, 1997, Remark 2). ∎

By the definition of S-estimators with a normalized rho function, the magnitude of the first term of (17) is clearly bounded by $\lambda_{2}^{-1}\boldsymbol{\Sigma}$. Therefore, to compare the influence functions of the S-q, S-Rocke, and MM-SHR estimators, we focus on the second term. From this term, define $\alpha_{\boldsymbol{\Sigma}}\left(d_{z}\right)=\lambda_{1}^{-1}p(p+2)(d_{z}/\sigma)w(d_{z}/\sigma)$ for each estimator. Figure 7 plots $\alpha_{\boldsymbol{\Sigma}}(d_{z})$ at the 10-dimensional Gaussian distribution for the estimators as configured in Figure 2.

Figure 7: Example Comparison of the Influence Function Parameter $\alpha_{\boldsymbol{\Sigma}}(d_{z})$. For the 10-dimensional Gaussian distribution, $\alpha_{\boldsymbol{\Sigma}}(d_{z})$ is plotted for the estimators depicted in Figure 2.

By definition, all highly robust estimators have bounded influence functions, and for the three estimators considered here, the influence functions are continuous. This means that small amounts of contamination have limited effects on their estimates. The gross-error sensitivity of an estimator is the maximum of $\boldsymbol{\operatorname{IF}}\left(\boldsymbol{z}\right)$, and in this example, the S-q demonstrates a lower gross-error sensitivity than the S-Rocke and MM-SHR estimators. By its definition, the MM-SHR has an inlier rejection point of zero, meaning inliers can negatively influence its estimates. However, proper Type II S-q functions have positive inlier rejection points, which provide robustness against inliers.

Relative to the S-Rocke and MM-SHR estimators, the S-q often has larger outlier rejection points. This is the cost of its generally higher efficiency and ability to reject inliers. However, due to its continuity, the influence near this point is still greatly attenuated.

5.3 Finite-Sample Robustness

To empirically compare the finite-sample robustness of the estimators, we employed the simulation method used by Maronna and Yohai (2017) and plot the shape matrix divergence, $\operatorname{D}\left(\boldsymbol{\mu},\boldsymbol{\Omega};\boldsymbol{\mu},\widehat{\boldsymbol{\Omega}}\right)$, versus the shift contamination value $k$. For a contamination proportion $\epsilon$, the first element of each of the $\lfloor\epsilon n\rfloor$ contaminated samples was replaced with the value $k$, that is, $x_{1}=k$. The initial estimates of the weighted algorithm were determined with the KSD estimator. Figure 8 provides divergence plots for normally distributed data with $\epsilon=10\%$ contamination, for dimensions $p=5$ and $p=20$, and for sample sizes $n=5p$ and $n=100p$. For the cases where $p=20$, the estimators were tuned to 90% uncontaminated relative efficiency. When $p=5$, the S-Rocke has poor maximum efficiency, so the estimators were tuned to match the maximum S-Rocke efficiency.

Figure 8: Estimator Divergence versus Contamination Value $k$ for the Gaussian Distribution. Gaussian shape matrix divergences are plotted for $p=5$ (left) and $p=20$ (right), and for small-sample ($n=5p$; top) and large-sample ($n=100p$; bottom) sizes.

These plots show that the robustness of the S-q is on par with that of the other two estimators. Consistent with the results in Maronna and Yohai (2017), the relative worst-case performance of the estimators varies with such factors as dimension, sample size, and contamination percentage. For example, the S-q performs best here for $p=5$, $n=100p$, but the MM-SHR is best for $p=5$, $n=5p$.

6 Computational Aspects

This section explores computational aspects of the S-q and other high-breakdown estimators when using their weighted-sum algorithms. The estimators' stabilities are first assessed by comparing their sensitivities to the initial estimates, $\widehat{\boldsymbol{\mu}}^{(0)}$ and $\widehat{\boldsymbol{\Omega}}^{(0)}$. The computational efficiency is then evaluated by comparing the computational convergence rates of the estimators.

6.1 Stability

The primary criticism of high-breakdown estimators is that their solutions are highly sensitive to the initial estimates $\widehat{\boldsymbol{\mu}}^{(0)}$ and $\widehat{\boldsymbol{\Omega}}^{(0)}$ due to the non-convexity of their objective functions. The S-q estimator helps mitigate this with a generally wider weight function (see, for example, Figure 2). To demonstrate that the S-q is more stable with respect to the initial estimates, $m$ Gaussian Monte Carlo simulation trials were run where, for each trial, $\boldsymbol{\Omega}$ was estimated twice for each estimator using different initializations. For the first estimate, $\widehat{\boldsymbol{\Omega}}^{(0)}$ was set to the MLE using all $n$ samples before contamination. For the second estimate, $\widehat{\boldsymbol{\Omega}}^{(0)}$ was set to the MLE using just 25% of the samples before contamination, resulting in a larger expected variance of $\widehat{\boldsymbol{\Omega}}^{(0)}$. The sample mean of the divergence between the two final estimates, $\bar{\operatorname{D}}\left(\widehat{\boldsymbol{\Omega}}_{1},\widehat{\boldsymbol{\Omega}}_{2}\right)$, was then calculated. The same values of $p$, $n$, and $\epsilon$ were used as in the Section 5.3 simulations. The contamination method was also the same, and the value of $k$ was set to the worst-case value for that estimator and for the values of $p$, $n$, and $\epsilon$ (see Figure 8).

The results are presented in the center of Table 4. It is seen that the S-q estimator was consistently the most stable of the three estimators, and the MM-SHR was generally the most sensitive. For the near-asymptotic cases, the S-q exhibited no measurable differences between the two estimates, unlike the MM-SHR. Like the S-q, the S-Rocke had no measurable differences between the two estimates for the uncontaminated near-asymptotic cases, but under contamination, its mean divergence was roughly on par with the MM-SHR.

Table 4: Estimator Stability and Computational Efficiency for Normally Distributed Data
Dim. pp   Samples nn   Contam. ϵ\epsilon   Mean Divergence (S-q / S-Rocke / MM-SHR)   Median No. of Iterations (S-q / S-Rocke / MM-SHR)
5 100p100p 0% 0 0 8e-5 13 14 14
5 100p100p 10% 0 7e-4 5e-4 14 15 15
20 100p100p 0% 0 0 5e-5 7 7 7
20 100p100p 10% 0 1e-3 3e-2 18 8 16
5 5p5p 0% 2e-1 4e-1 2e-0 32 14 15
5 5p5p 10% 3e-1 5e-1 2e-0 31 15 15
20 5p5p 0% 6e-5 4e-2 1e-0 28 18 37
20 5p5p 10% 1e-0 2e-0 3e-0 30 18 33
Note: Mean divergence values listed as “0” have simulated average divergences less than the numerical convergence criterion, D(𝛀^(i),𝛀^(i1))<1010.\operatorname{D}\left(\widehat{\boldsymbol{\Omega}}^{(i)},\widehat{\boldsymbol{\Omega}}^{(i-1)}\right)<10^{-10}.

6.2 Computational Efficiency

To compare the relative computational efficiencies of the high-breakdown estimators, we calculated the median number of iterations required for the estimators to converge on normally distributed data for various values of pp, nn, and ϵ\epsilon. All three estimators were set to use the same tight convergence criterion, D(𝛀^(i),𝛀^(i1))<1010.\operatorname{D}\left(\widehat{\boldsymbol{\Omega}}^{(i)},\widehat{\boldsymbol{\Omega}}^{(i-1)}\right)<10^{-10}. The initial estimates were determined with the KSD estimator, and the estimators were tuned as in the Section 5.3 simulations. The contamination method was also the same, and the value of kk was set to the worst-case value for that estimator.
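The iteration counting can be outlined as follows. The sketch below runs a generic weighted-sum scatter update until the divergence between successive shape estimates drops below 10^{-10}. It is a hedged illustration rather than any of the estimators compared here: the weight function w(d) = (p + ν)/(ν + d) is the multivariate t MLE weight, used only as a stand-in for the S-q, S-Rocke, and MM-SHR weight functions; the location is taken as known (zero); and `kl_shape_divergence` again plays the role of the divergence D.

```python
import numpy as np

def kl_shape_divergence(omega_a, omega_b):
    """KL-type divergence between zero-mean Gaussians (assumed stand-in for D)."""
    p = omega_a.shape[0]
    m = np.linalg.solve(omega_b, omega_a)
    return 0.5 * (np.trace(m) - p - np.log(np.linalg.det(m)))

def reweighted_scatter(X, omega0, nu=5.0, tol=1e-10, max_iter=500):
    """Generic weighted-sum scatter iteration with the convergence test D < tol."""
    n, p = X.shape
    omega = omega0.copy()
    for it in range(1, max_iter + 1):
        # Squared Mahalanobis distances under the current shape estimate.
        d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(omega), X)
        w = (p + nu) / (nu + d)                 # stand-in weight function (t MLE)
        omega_new = (w[:, None] * X).T @ X / n  # weighted sum-of-squares update
        omega_new *= p / np.trace(omega_new)    # normalize the scale (shape matrix)
        if kl_shape_divergence(omega_new, omega) < tol:
            return omega_new, it
        omega = omega_new
    return omega, max_iter

rng = np.random.default_rng(1)
p, n = 5, 500
X = rng.standard_normal((n, p))                 # zero-location Gaussian data
omega_hat, n_iter = reweighted_scatter(X, np.eye(p))
print(n_iter)                                   # iterations to reach the criterion
```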

The results are presented on the right of Table 4. For the large-sample (n=100pn=100p) simulations, the S-q converges approximately as fast as the other two estimators (except for the one case where p=20,p=20, ϵ=10%\epsilon=10\%, in which the S-Rocke performs notably better). The S-Rocke estimator consistently converges fastest for all of the small-sample (n=5pn=5p) cases, and the small-sample convergence of the S-q estimator is relatively consistent, albeit at the upper end of the range. The small-sample convergence of the MM-SHR is on par with that of the S-Rocke for small p,p, but worse than the others for large p.p.

7 Application to Financial Portfolio Optimization

A common financial application of mean vectors and covariance matrices is in modern portfolio theory for the optimal allocation of portfolio investments. Under modern portfolio theory’s mean-variance framework, a minimum-variance portfolio aims to minimize the risk (i.e., variance) of the portfolio return subject to a desired expected return (Markowitz, 1952). Mathematically, this is expressed as

\min_{\boldsymbol{\alpha}}\quad \boldsymbol{\alpha}^{\operatorname{T}}\boldsymbol{\Omega}_{r}\boldsymbol{\alpha}
\text{subject to}\quad \boldsymbol{\alpha}^{\operatorname{T}}\boldsymbol{\mu}_{r}=\mu_{p},\;\boldsymbol{\alpha}^{\operatorname{T}}\boldsymbol{1}=1,

where 𝜶\boldsymbol{\alpha} is the normalized vector of portfolio allocations across assets, 𝛀r\boldsymbol{\Omega}_{r} is the shape (or, equivalently, covariance) matrix of the asset returns, 𝝁r\boldsymbol{\mu}_{r} is the vector of expected asset returns, μp\mu_{p} is the desired expected portfolio return, and 𝟏\boldsymbol{1} is a vector of ones. The solution is given by (Roy, 1952; Merton, 1972)

\boldsymbol{\alpha} = s_{r}\left(\boldsymbol{\mu}_{r}^{\operatorname{T}}\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{\mu}_{r}\right)\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{1} - s_{r}\left(\boldsymbol{1}^{\operatorname{T}}\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{\mu}_{r}\right)\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{\mu}_{r}
\quad + s_{r}\mu_{p}\left(\boldsymbol{1}^{\operatorname{T}}\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{1}\right)\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{\mu}_{r} - s_{r}\mu_{p}\left(\boldsymbol{\mu}_{r}^{\operatorname{T}}\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{1}\right)\boldsymbol{\Omega}_{r}^{-1}\boldsymbol{1},   (18)

where srs_{r} is a scalar that ensures the elements of 𝜶\boldsymbol{\alpha} sum to one.
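As an illustration, the closed-form allocation (18) can be computed directly as in the sketch below; the numerical inputs are toy placeholders rather than estimates produced by any of the estimators discussed here.

```python
import numpy as np

def min_variance_weights(mu_r, omega_r, mu_p):
    """Minimum-variance allocation computed from the closed form in (18)."""
    ones = np.ones_like(mu_r)
    inv_mu = np.linalg.solve(omega_r, mu_r)    # Omega_r^{-1} mu_r
    inv_one = np.linalg.solve(omega_r, ones)   # Omega_r^{-1} 1
    alpha = ((mu_r @ inv_mu) * inv_one - (ones @ inv_mu) * inv_mu
             + mu_p * (ones @ inv_one) * inv_mu - mu_p * (mu_r @ inv_one) * inv_one)
    return alpha / alpha.sum()                 # s_r: rescale so the weights sum to one

# Toy placeholder inputs standing in for estimates of mu_r and Omega_r.
mu_r = np.array([0.0004, 0.0006, 0.0002])
omega_r = 1e-4 * np.array([[2.0, 0.5, 0.2],
                           [0.5, 1.5, 0.3],
                           [0.2, 0.3, 1.0]])
alpha = min_variance_weights(mu_r, omega_r, mu_p=0.00038)
print(alpha, alpha.sum(), alpha @ mu_r)        # weights, their sum (one), achieved expected return
```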

In this section, the performances of the MM-SHR, S-Rocke, and S-q estimators are compared for the optimal allocation of investment in the component stocks of the Dow Jones Industrial Average. For each estimator, the parameters 𝛀r\boldsymbol{\Omega}_{r} and 𝝁r\boldsymbol{\mu}_{r} were estimated for the daily returns of the component stocks. Then, using a desired portfolio daily return of μp=.038%\mu_{p}=.038\% (corresponding to a 10% annual return), the optimal allocations, 𝜶,\boldsymbol{\alpha}, were calculated using (18). Using the 𝜶\boldsymbol{\alpha} of each estimator, the portfolio return was then calculated for each business day of the verification period, assuming daily re-balancing of the investments. Finally, each estimator’s performance was characterized by the variance of these daily returns. This variance is a measure of the volatility of the portfolio.
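Given an allocation 𝜶 and a matrix of verification-period daily returns (one row per day, one column per asset), the daily re-balanced portfolio returns and their sample variance can be obtained as in this brief sketch, which uses simulated returns and an arbitrary fixed allocation in place of the Dow data and the estimated weights:

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_assets = 35, 3
returns = rng.normal(0.0004, 0.01, size=(n_days, n_assets))  # simulated daily asset returns
alpha = np.array([0.5, 0.3, 0.2])       # placeholder allocation; in practice this comes from (18)

portfolio_returns = returns @ alpha     # portfolio return achieved each day under daily re-balancing
volatility = portfolio_returns.var(ddof=1)  # sample variance of daily returns, the metric reported here
print(volatility)
```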

For the S-q estimator, we noted that Konlack Socgnia and Wilcox (2014) showed that the generalized hyperbolic distribution is a good model for stock returns and, specifically, that the variance gamma subclass has good parameter stability over time. Although their analysis is for log returns, daily gross returns are generally close to one, so log returns closely approximate linear returns, and the variance gamma model should therefore also fit gross (i.e., linear) returns well. For the variance gamma S-q estimator, a density-weighted M-estimator was used to estimate the model parameters λ\lambda and ψ\psi.
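To make the log-versus-linear argument concrete, the snippet below compares daily log and linear returns for a short, made-up price series; because daily gross returns stay close to one, the two nearly coincide:

```python
import numpy as np

prices = np.array([100.0, 100.4, 99.9, 101.1])   # made-up daily closing prices
gross = prices[1:] / prices[:-1]                 # gross returns, all close to one
linear = gross - 1.0                             # linear (simple) returns
log_ret = np.log(gross)                          # log returns
print(np.max(np.abs(log_ret - linear)))          # tiny: log and linear returns nearly coincide
```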

To demonstrate the robustness of the S-q estimator, we begin by noting that the first quarter of 2020 contained a once-in-a-generation period of extremely high volatility due to the COVID-19 pandemic, as depicted in Figure 9. This volatility started on approximately February 21. Each estimator’s performance was assessed by estimating the parameters 𝛀r\boldsymbol{\Omega}_{r} and 𝝁r\boldsymbol{\mu}_{r} using all the returns from the first quarter, then comparing the variances of the daily portfolio returns for only the pre-pandemic period (prior to February 21). Each estimator was set to its maximum breakdown point and then tuned to its maximum asymptotic efficiency with respect to the variance gamma distribution, whose parameters were estimated by maximum likelihood from the daily returns for the years 2016–2019.

Figure 9: Daily Returns of Dow Jones Industrial Average and Component Stocks for 2016–2020. The 25 stocks that were components of the index throughout the depicted period are plotted in the background. The overall index value is plotted on top.

Table 5 summarizes the results, listing the variances of the daily returns. The S-q estimator performed best, with the lowest variance, indicating high robustness. The MM-SHR performed second best, followed by the S-Rocke. The sample estimator of mean and covariance was also included to demonstrate its poor robustness.

Table 5: Sample Variances of Achieved Daily Returns for 01-Jan-2020 – 20-Feb-2020
Estimator Variance
S-q 76
MM-SHR 119
S-Rocke 147
Sample 176

Next, to demonstrate estimator efficiency, variances of daily returns were compared for a non-volatile period: 2016 through 2019. Using the same methodology and configuration as before, for each year and each estimator, 𝛀r\boldsymbol{\Omega}_{r} and 𝝁r\boldsymbol{\mu}_{r} were estimated. Then, 𝜶\boldsymbol{\alpha} was calculated and applied to each day of that year. The sample variances of each year’s daily portfolio returns are listed in Table 6. The S-q estimator resulted in the lowest portfolio variance for three of the four years, and the lowest variance on average, indicating high estimator efficiency. On average, the performance of the MM-SHR estimator was behind that of the S-q estimator, and the S-Rocke demonstrated substantially worse performance.

Table 6: Sample Variances of Achieved Daily Returns by Year
Year S-q MM-SHR S-Rocke
2016 3.51 3.56 4.10
2017 1.18 1.24 1.34
2018 6.76 6.91 7.31
2019 3.60 3.50 4.49
Sample Mean 3.77 3.80 4.31

8 Conclusion

The S-q estimator has been introduced as a new tunable multivariate estimator of location, scatter, and shape matrices for elliptical probability distributions. This new estimator is a subclass of S-estimators, which can achieve the maximum theoretical breakdown point. The S-q estimator has been compared with the leading high-breakdown estimators. Across elliptical distributions, the S-q generally has higher efficiency and stability, and its robustness is on par with that of these other leading estimators. Additionally, the S-q provides a monotonic and upper-bounded efficiency tuning parameter, which makes it simpler to tune than the MM-SHR. The S-q therefore provides practitioners with a good general-purpose high-breakdown multivariate estimator that can be applied across a broad range of practical problems, such as the optimal portfolio example.

References

  • Abusev (2015) Abusev, R.A., 2015. On the Distances Between Certain Distributions in Multivariate Statistical Analysis. Journal of Mathematical Sciences 205, 2–6. doi:10.1007/s10958-015-2222-y.
  • Basu et al. (1998) Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C., 1998. Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika 85, 549–559.
  • Bilodeau and Brenner (1999) Bilodeau, M., Brenner, D., 1999. Theory of Multivariate Statistics. Springer-Verlag, New York.
  • Choi and Hall (2000) Choi, B.Y.E., Hall, P., 2000. Rendering Parametric Procedures More Robust by Empirically Tilting the Model. Biometrika 87, 453–465.
  • Davies (1987) Davies, P.L., 1987. Asymptotic Behaviour of S-Estimates of Multivariate Location Parameters and Dispersion Matrices. The Annals of Statistics 15, 1269–1292.
  • Deng and Yao (2018) Deng, X., Yao, J., 2018. On the property of multivariate generalized hyperbolic distribution and the Stein-type inequality. Communications in Statistics - Theory and Methods 47, 5346–5356. doi:10.1080/03610926.2017.1390134.
  • Fang et al. (1990) Fang, K.T., Kotz, S., Ng, K.W., 1990. Symmetric Multivariate and Related Distributions. Chapman and Hall, London.
  • Ferrari and Yang (2010) Ferrari, D., Yang, Y., 2010. Maximum Lq-likelihood estimation. The Annals of Statistics 38, 753–783. doi:10.1214/09-AOS687.
  • Frahm (2009) Frahm, G., 2009. Asymptotic distributions of robust shape matrices and scales. Journal of Multivariate Analysis 100, 1329–1337. doi:10.1016/j.jmva.2008.11.007.
  • Gazor and Zhang (2003) Gazor, S., Zhang, W., 2003. Speech probability distribution. IEEE Signal Processing Letters 10, 204–207. doi:10.1109/LSP.2003.813679.
  • Huang et al. (2006) Huang, J.Z., Liu, N., Pourahmadi, M., Liu, L., 2006. Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93, 85–98. doi:10.1093/biomet/93.1.85.
  • Huber (1964) Huber, P.J., 1964. Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics 35, 73–101.
  • Kelker (1970) Kelker, D., 1970. Distribution Theory of Spherical Distributions and a Location-Scale Parameter Generalization. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 32, 419–430.
  • Konlack Socgnia and Wilcox (2014) Konlack Socgnia, V., Wilcox, D., 2014. A comparison of generalized hyperbolic distribution models for equity returns. Journal of Applied Mathematics 2014. doi:10.1155/2014/263465.
  • Lopuhaä (1989) Lopuhaä, H.P., 1989. On the Relation between S-Estimators and M-Estimators of Multivariate Location and Covariance. The Annals of Statistics 17, 1662–1683.
  • Lopuhaä (1997) Lopuhaä, H.P., 1997. Asymptotic expansion of S-estimators of location and covariance. Statistica Neerlandica 51, 220–237. doi:10.1111/1467-9574.00051.
  • Markowitz (1952) Markowitz, H., 1952. Portfolio Selection. The Journal of Finance 7, 77–91.
  • Maronna (1976) Maronna, R.A., 1976. Robust M-Estimators of Multivariate Location and Scatter. The Annals of Statistics 4, 51–67.
  • Maronna et al. (2006) Maronna, R.A., Martin, R.D., Yohai, V.J., 2006. Robust Statistics: Theory and Methods. First ed., John Wiley & Sons.
  • Maronna et al. (2019) Maronna, R.A., Martin, R.D., Yohai, V.J., Salibián-Barrera, M., 2019. Robust Statistics Theory and Methods (with R). Second ed., John Wiley & Sons.
  • Maronna and Yohai (2017) Maronna, R.A., Yohai, V.J., 2017. Robust and efficient estimation of multivariate scatter and location. Computational Statistics and Data Analysis 109, 64–75. doi:10.1016/j.csda.2016.11.006.
  • Merton (1972) Merton, R.C., 1972. An Analytic Derivation of the Efficient Portfolio Frontier. The Journal of Financial and Quantitative Analysis 7, 1851–1872.
  • Peña and Prieto (2007) Peña, D., Prieto, F.J., 2007. Combining Random and Specific Directions for Outlier Detection and Robust Estimation in High-Dimensional Multivariate Data. Journal of Computational and Graphical Statistics 16, 228–254.
  • Rocke (1996) Rocke, D.M., 1996. Robustness Properties of S-Estimators of Multivariate Location and Shape in High Dimension. The Annals of Statistics 24, 1327–1345.
  • Rousseeuw and Hubert (2013) Rousseeuw, P., Hubert, M., 2013. High-Breakdown Estimators of Multivariate Location and Scatter, in: Becker, C., Fried, R., Kuhnt, S. (Eds.), Robustness and Complex Data Structures. Springer, New York. chapter 4, pp. 49–66. doi:10.1007/978-3-642-35494-6.
  • Rousseeuw and Yohai (1984) Rousseeuw, P., Yohai, V., 1984. Robust Regression by Means of S-Estimators, in: Robust and Nonlinear Time Series Analysis. Springer US, New York, NY, pp. 256–272. doi:10.1007/978-1-4615-7821-5_15.
  • Roy (1952) Roy, A.D., 1952. Safety First and the Holding of Assets. Econometrica 20, 431–449.
  • Tyler (1982) Tyler, D.E., 1982. Radial estimates and the test for sphericity. Biometrika 69, 429–436. doi:10.1093/biomet/69.2.429.
  • Tyler (1983) Tyler, D.E., 1983. Robustness and efficiency properties of scatter matrices. Biometrika 70, 411–420. doi:10.1093/biomet/71.3.656-a.
  • Ward et al. (1990) Ward, K.D., Baker, C.J., Watts, S., 1990. Maritime surveillance radar. Part 1. Radar scattering from the ocean surface. IEE Proceedings F (Radar and Signal Processing) 137, 51–62. doi:10.1049/ip-f-2.1990.0009.
  • Windham (1995) Windham, M.P., 1995. Robustifying Model Fitting. Journal of the Royal Statistical Society. Series B (Methodological) 57, 599–609.