Another look at halfspace depth:
Flag halfspaces with applications

Dušan Pokorný, Petra Laketa, and Stanislav Nagy (nagy@karlin.mff.cuni.cz)
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic
Abstract.

The halfspace depth is a well-studied tool of nonparametric statistics in multivariate spaces, naturally inducing a multivariate generalisation of quantiles. The halfspace depth of a point with respect to a measure is defined as the infimum mass of closed halfspaces that contain the given point. In general, a closed halfspace that attains that infimum does not have to exist. We introduce a flag halfspace, an intermediary between a closed halfspace and its interior. We demonstrate that the halfspace depth can be equivalently formulated in terms of flag halfspaces, and that there always exists a flag halfspace whose boundary passes through any given point $x$ and whose mass is exactly equal to the halfspace depth of $x$. Flag halfspaces allow us to derive theoretical results regarding the halfspace depth without the need to differentiate absolutely continuous measures from measures containing atoms, as was frequently done previously. The notion of flag halfspaces is used to state results on the dimensionality of the halfspace median set for random samples. We prove that under mild conditions, the dimension of the sample halfspace median set of $d$-variate data cannot be $d-1$, and that for $d=2$ the sample halfspace median set must be either a two-dimensional convex polygon or a data point. The latter result guarantees that the computational algorithm for the sample halfspace median from the R package TukeyRegion is exact also in the case when the median set is less than full-dimensional in dimension $d=2$.

Key words and phrases:
flag halfspace; halfspace depth; halfspace median; Tukey depth
1991 Mathematics Subject Classification:
62H05, 62G35

1. Introduction: Halfspace depth and its median

Denote by $\mathcal{M}(\mathbb{R}^{d})$ the set of all finite Borel measures on the Euclidean space $\mathbb{R}^{d}$. [Footnote 1: We consider the halfspace depth w.r.t. finite measures $\mu$, that is when $\mu(\mathbb{R}^{d})<\infty$. Compared to the usual setup of probability measures, this extension is minor, and made only for notational convenience. All our results could be considered also for probability measures only, with obvious modifications.] The halfspace (or Tukey) depth of $x\in\mathbb{R}^{d}$ with respect to (w.r.t.) $\mu\in\mathcal{M}(\mathbb{R}^{d})$ is defined as

(1)  $D(x;\mu)=\inf\left\{\mu(H)\colon H\in\mathcal{H}(x)\right\},$

where $\mathcal{H}(x)$ is the collection of closed halfspaces in $\mathbb{R}^{d}$ that contain $x$ on their boundary. The halfspace depth quantifies the centrality of $x$ w.r.t. the mass of $\mu$. This is useful in nonparametric statistics, as it allows us to rank sample points according to their depth, from the central to the peripheral ones. As such, the depth enables the introduction of rankings, orderings, and quantile-like inference to multivariate datasets [3, 20, 21]. The upper level sets of the halfspace depth of $\mu$, given for $\alpha\geq 0$ by

(2)  $D_{\alpha}(\mu)=\left\{x\in\mathbb{R}^{d}\colon D(x;\mu)\geq\alpha\right\},$

play in nonparametric statistics the role of the inner quantile regions of $\mu$. They are often called the (halfspace) central regions of $\mu$. The sets (2) are nested, closed and convex; they are compact for $\alpha>0$, and non-empty for $\alpha\leq\alpha^{*}(\mu)$, where $\alpha^{*}(\mu)=\sup_{x\in\mathbb{R}^{d}}D(x;\mu)$ is the maximum halfspace depth of $\mu$. Of special importance is the set $D^{*}(\mu)=D_{\alpha^{*}(\mu)}(\mu)$, which contains the points that are most centrally positioned w.r.t. $\mu$. It is called the set of the halfspace medians of $\mu$ and, as its name suggests, it generalises the median to $\mathbb{R}^{d}$. The halfspace depth has many applications in multivariate statistics, and has been a subject of active research for 30 years [11, 12, 13, 15, 16]. Although many other statistical depth functions have been developed [2, 14, 21], in this paper we focus on the halfspace depth, and sometimes write simply depth instead of halfspace depth.

We call a measure $\mu\in\mathcal{M}(\mathbb{R}^{d})$ smooth if the $\mu$-mass of every hyperplane in $\mathbb{R}^{d}$ is zero. A measure with a density is smooth; examples of non-smooth measures are those with an atom. The infimum in (1) is attained for smooth measures. That is why theoretical results on the halfspace depth are often formulated only for smooth measures, and why the analysis of the sample halfspace depth (that is, the halfspace depth evaluated w.r.t. empirical measures of random samples) is performed using different techniques [10, 12, 13]. In this paper we introduce flag halfspaces, symmetrised variants of closed halfspaces that may be considered in (1) instead of $\mathcal{H}(x)$ without altering the depth, with the property that a flag halfspace attaining the depth always exists. We will see that our restatement of formula (1) simplifies many theoretical derivations about the halfspace depth, as it is no longer necessary to distinguish whether the infimum in (1) is attained.
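To make definition (1) concrete for an empirical measure in the plane, the following is a minimal sketch, not the algorithm of any package mentioned in this paper. It assumes planar data with integer coordinates (so that all sign tests are exact) and a non-empty sample; the function name `halfspace_depth_2d` is hypothetical. It relies on the observation that, for finitely many atoms, the infimum in (1) is reached by closed halfplanes obtained by slightly rotating a line through $x$ and a data point.

```python
from fractions import Fraction

def halfspace_depth_2d(x, points):
    """Halfspace depth (1) of x w.r.t. the empirical measure of `points`.

    Hypothetical helper: x and the points are pairs of integers, so that all
    sign tests below are exact.
    """
    x = tuple(x)
    pts = [tuple(p) for p in points]
    n = len(pts)
    at_x = sum(1 for p in pts if p == x)                 # atoms sitting at x
    rel = [(p[0] - x[0], p[1] - x[1]) for p in pts if p != x]
    if not rel:                                          # every atom equals x
        return Fraction(at_x, n)
    best = n
    for d in rel:                                        # line through x and a data point
        left = right = fwd = bwd = 0
        for q in rel:
            cross = d[0] * q[1] - d[1] * q[0]
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif d[0] * q[0] + d[1] * q[1] > 0:
                fwd += 1                                 # on the ray from x towards d
            else:
                bwd += 1                                 # on the opposite ray
        # Rotating the line slightly about x gives closed halfplanes that
        # contain one open side, one of the two rays, and x itself.
        best = min(best, at_x + min(left, right) + min(fwd, bwd))
    return Fraction(best, n)

square = [(0, 0), (2, 0), (0, 2), (-2, 0), (0, -2)]
print(halfspace_depth_2d((0, 0), square))   # depth of the centre
print(halfspace_depth_2d((3, 0), square))   # a point outside the convex hull
```

The quadratic loop is chosen for readability; sorting the data by angle around $x$ would be the usual way to speed this up.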

Flag halfspaces are introduced in Section 2. Two applications to the computation of the depth are given in Section 3. In Section 3.1, we investigate the dimensionality and the structure of the median set $D^{*}(\mu)$ for $\mu$ an empirical measure. We show that for datasets sampled from absolutely continuous probability measures in $\mathbb{R}^{d}$, the halfspace median set cannot be of dimension $d-1$, almost surely. In a series of examples in $\mathbb{R}^{3}$ we demonstrate that already for random samples of size $n=8$ from the standard Gaussian distribution, halfspace median sets of dimensions $0$, $1$, and $3$ occur with positive probability. In Section 3.2 we deal with the special situation of data of dimension $d=2$. We show that if the dataset satisfies a mild condition of general position, then the halfspace median set must be either a full-dimensional polygon, or a data point. Both these advances find applications in the computation of the halfspace median and the central regions (2), where the dimensionality of $D^{*}(\mu)$ plays a crucial role [6, 11]. The paper is complemented by online Supplementary Material containing R and Mathematica scripts with visualisations and computations completing examples from Section 3.

Notations.

Some of our proofs are based on convexity theory. As a basic reference we take [17]; we now gather notations and elementary definitions that will be used throughout the paper. The unit sphere in $\mathbb{R}^{d}$ is $\mathbb{S}^{d-1}$. We write $S\subset K$ for $S$ being a proper subset of $K$. The restriction of $\mu\in\mathcal{M}(\mathbb{R}^{d})$ to a Borel set $S\subseteq\mathbb{R}^{d}$ is denoted by $\mu|_{S}\in\mathcal{M}(\mathbb{R}^{d})$ and is defined by $\mu|_{S}(B)=\mu(B\cap S)$ for $B\subseteq\mathbb{R}^{d}$ Borel. The affine hull $\operatorname{aff}(S)$ of $S\subseteq\mathbb{R}^{d}$ is the smallest affine subspace of $\mathbb{R}^{d}$ containing $S$. The dimension $\dim(S)$ of $S$ is defined as the dimension of $\operatorname{aff}(S)$. For example, the affine hull of two different points in $\mathbb{R}^{d}$ is the infinite line joining them, and its dimension is $1$. We write $\operatorname{int}(S)$, $\operatorname{cl}(S)$, and $\operatorname{bd}(S)$ for the interior, closure, and boundary of $S\subseteq\mathbb{R}^{d}$. The interior, closure, and boundary of $S$ when considered as a subset of its affine hull $\operatorname{aff}(S)$ are denoted by $\operatorname{relint}(S)$, $\operatorname{relcl}(S)$ and $\operatorname{relbd}(S)$, and are called the relative interior, relative closure, and relative boundary of $S$, respectively. Of course, if $\dim(S)=d$, the relative interior coincides with the interior, and likewise for the closure and the boundary.

The class of all closed halfspaces in $\mathbb{R}^{d}$ is $\mathcal{H}$. A generic halfspace from $\mathcal{H}$ may be denoted simply by $H$; $H_{x,v}$ means a halfspace $\left\{y\in\mathbb{R}^{d}\colon\left\langle y,v\right\rangle\geq\left\langle x,v\right\rangle\right\}$ whose boundary passes through $x\in\mathbb{R}^{d}$ with inner normal $v\in\mathbb{R}^{d}\setminus\{0\}$. For an affine space $A\subseteq\mathbb{R}^{d}$ and $x\in A$ we denote by $\mathcal{H}(x,A)$ the set of all relatively closed halfspaces $H$ in $A$ whose relative boundary contains $x$; clearly $\mathcal{H}(x,\mathbb{R}^{d})\equiv\mathcal{H}(x)$. We say that a sequence of halfspaces $\{H_{x_{n},v_{n}}\}_{n=1}^{\infty}\subset\mathcal{H}$ converges to $H_{x,v}\in\mathcal{H}$ if $x_{n}\rightarrow x$ and $v_{n}\rightarrow v$. Finally, for any of the symbols $\mathcal{H}$, $\mathcal{H}(x)$, or $\mathcal{H}(x,A)$, a superscript $\circ$ designates the corresponding relatively open halfspaces, e.g. $\mathcal{H}^{\circ}(x,A)=\left\{\operatorname{relint}(H)\colon H\in\mathcal{H}(x,A)\right\}$.

2. Flag halfspaces

For $\mu\in\mathcal{M}(\mathbb{R}^{d})$ and $x\in\mathbb{R}^{d}$ we call $H\in\mathcal{H}(x)$ a minimising halfspace of $\mu$ at $x$ if $\mu(H)=D(x;\mu)$. For $d=1$ minimising halfspaces always trivially exist. They also exist if $\mu$ is smooth, or if $\mu$ is supported in a finite number of points. In general, however, the infimum in (1) does not have to be attained. We give a simple example.

Example 1.

Take $\mu\in\mathcal{M}(\mathbb{R}^{2})$ the sum of the Dirac measure at $a=(1,1)\in\mathbb{R}^{2}$ and the uniform distribution on the disk $\left\{x\in\mathbb{R}^{2}\colon\left\|x\right\|\leq 2\right\}$. For $x=(1,0)\in\mathbb{R}^{2}$ no minimising halfspace exists. As we see in Figure 1, the depth $D(x;\mu)$ is approached by $\mu(H_{x,v_{n}})$ for a sequence of halfspaces $H_{n}\equiv H_{x,v_{n}}$, $n=1,2,\dots$, with inner normals $v_{n}=\left(\cos(-1/n),\sin(-1/n)\right)$ that converge to $v=(1,0)\in\mathbb{S}^{1}$, yet $D(x;\mu)=\lim_{n\to\infty}\mu(H_{x,v_{n}})<\mu(H_{x,v})$.
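The behaviour described in Example 1 can be checked numerically. The sketch below assumes a normalisation that the example does not spell out: the uniform distribution on the disk is taken to have total mass 1, and the Dirac measure mass 1. The helper names `uniform_mass` and `mass` are hypothetical. The mass of a closed halfplane under the uniform part is the area of a circular segment divided by the area of the disk.

```python
import math

R = 2.0             # radius of the disk carrying the uniform part
ATOM = (1.0, 1.0)   # location of the Dirac mass a
X = (1.0, 0.0)      # the point x of Example 1

def uniform_mass(x, v, r=R):
    """Mass given by the uniform probability distribution on the disk of
    radius r to the closed halfplane {y : <y - x, v> >= 0}, v a unit vector."""
    h = x[0] * v[0] + x[1] * v[1]        # signed distance of the boundary from the origin
    h = max(-r, min(r, h))
    segment_area = r * r * math.acos(h / r) - h * math.sqrt(r * r - h * h)
    return segment_area / (math.pi * r * r)

def mass(x, v):
    """mu-mass of the closed halfplane H_{x,v} (assumed normalisation, see above)."""
    atom_inside = (ATOM[0] - x[0]) * v[0] + (ATOM[1] - x[1]) * v[1] >= 0
    return uniform_mass(x, v) + (1.0 if atom_inside else 0.0)

for n in (1, 10, 100, 1000):
    v_n = (math.cos(-1 / n), math.sin(-1 / n))
    print(n, mass(X, v_n))               # decreases towards the depth of x
print("limit halfplane:", mass(X, (1.0, 0.0)))   # picks up the atom on its boundary
```

The tilted halfplanes $H_{x,v_n}$ exclude the atom, so their masses decrease to the infimum, while the limiting halfplane $H_{x,v}$ contains $a$ on its boundary and its mass jumps up by the mass of the atom.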

Figure 1. The support of $\mu\in\mathcal{M}(\mathbb{R}^{2})$ from Example 1 (coloured disk) and its atom $a$ (diamond). No minimising halfspace of $\mu$ at $x=(1,0)\in\mathbb{R}^{2}$ (coloured point) exists. In the left hand panel we see a halfspace $H_{n}\in\mathcal{H}(x)$ whose $\mu$-mass is almost $D(x;\mu)$. It does not contain $a$. In the right hand panel the minimising flag halfspace $F\in\mathcal{F}(x)$ of $\mu$ at $x$ is displayed.

The problem with measures not attaining the infimum in (1) is elegantly resolved by considering flag halfspaces instead of the usual closed halfspaces.

Definition.

Define $\mathcal{F}(x)$ as the system of all sets $F$ of the form

(3)  $F=\{x\}\cup\left(\bigcup_{k=1}^{d}G_{k}\right)$

where $G_{d}\in\mathcal{H}^{\circ}(x)$, and $G_{k}\in\mathcal{H}^{\circ}(x,\operatorname{relbd}(G_{k+1}))$ for every $k=1,\dots,d-1$. Any element of $\mathcal{F}(x)$ is called a flag halfspace at $x$.

The name flag comes from geometry [17], where an analogous recursive construction is considered, involving nested faces of convex polytopes. The formal definition of flag halfspaces is somewhat convoluted, but these sets appear naturally. In $\mathbb{R}^{2}$, a flag halfspace at $x$ is the union of an open halfplane $G_{2}$ whose boundary passes through $x$, a relatively open halfline $G_{1}$ originating at $x$ contained in the one-dimensional affine space (line) $\operatorname{bd}(G_{2})$, and the $0$-dimensional point $x$ itself. For an example see Figure 1. A flag halfspace is neither an open nor a closed set. In contrast to a usual closed halfspace, the complement of a flag halfspace $F\in\mathcal{F}(x)$ is, except for its central point $x$, again a flag halfspace from $\mathcal{F}(x)$, i.e. $(\mathbb{R}^{d}\setminus F)\cup\{x\}\in\mathcal{F}(x)$. Several more interesting properties and characterisations of flag halfspaces can be found in [9].
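Definition (3) translates into a simple membership test. The sketch below is an illustration under an encoding assumed here, not taken from the paper: a flag halfspace at $x$ is stored as a list of linearly independent vectors $[v_d,\dots,v_1]$, where $v_k$ cuts out $G_k$ inside the relative boundary of $G_{k+1}$. A point $y\neq x$ then belongs to $F$ exactly when the first non-zero inner product $\langle y-x,v_k\rangle$, taken in the order $k=d,\dots,1$, is positive. Integer coordinates keep the sign tests exact; the name `in_flag` is hypothetical.

```python
def in_flag(y, x, frame):
    """Membership of y in the flag halfspace at x described by `frame`.

    `frame` lists linearly independent vectors [v_d, ..., v_1] (an encoding
    assumed here): G_d = {y : <y - x, v_d> > 0}, and each further G_k is cut
    out by v_k inside the relative boundary of G_{k+1}.
    """
    diff = [yi - xi for yi, xi in zip(y, x)]
    for v in frame:
        s = sum(a * b for a, b in zip(diff, v))
        if s > 0:
            return True      # y lies in the current relatively open halfspace
        if s < 0:
            return False     # y lies on the wrong side at this level
    return True              # all inner products vanish, so y == x

# A planar flag halfspace at x = (1, 0): open halfplane {x_1 > 1},
# open halfline {(1, x_2) : x_2 < 0}, and the point x itself.
x = (1, 0)
frame = [(1, 0), (0, -1)]
print(in_flag((1, 1), x, frame))     # point on the excluded halfline
print(in_flag((1, -1), x, frame))    # point on the included halfline
```

Negating every vector of the frame yields the flag halfspace $(\mathbb{R}^{d}\setminus F)\cup\{x\}$, which is one way to see the complement property stated above.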

We define a minimising flag halfspace of $\mu$ at $x$ to be any $F\in\mathcal{F}(x)$ that satisfies $\mu(F)=D(x;\mu)$. In the following Theorem 1 we show that the halfspace depth (1) of any measure can be expressed in terms of the $\mu$-mass of flag halfspaces, and that a minimising flag halfspace always exists. The intuition behind this result is as follows: Even if the minimising closed halfspace of $x\in\mathbb{R}^{d}$ does not exist, there is a sequence of closed halfspaces $\{H_{n}\}_{n=1}^{\infty}\subset\mathcal{H}(x)$ that satisfies

(4)  $\lim_{n\to\infty}\mu\left(H_{n}\right)=D(x;\mu).$

Because the unit normals $\{v_{n}\}_{n=1}^{\infty}$ of these halfspaces come from the compact set $\mathbb{S}^{d-1}$, we can also assume that the sequence of halfspaces is convergent and $\lim_{n\to\infty}v_{n}=v\in\mathbb{S}^{d-1}$ (otherwise, we extract a convergent subsequence). For $n$ large enough, $\mu\left(H_{n}\right)$ is arbitrarily close to $D(x;\mu)$, but this fact alone, of course, does not imply that the $\mu$-mass of the limit $H\equiv H_{x,v}$ defined as $H=\lim_{n\to\infty}H_{n}$ is equal to $D(x;\mu)$. It turns out that for general measures, it is not possible to find any useful upper bound on the mass $\mu\left(H\right)$, but it is possible to bound the mass of its interior by $\mu\left(\operatorname{int}(H)\right)\leq D(x;\mu)$. The interior of $H$ is the first open halfspace $G_{d}$ in the construction of the minimising flag halfspace (3). The remaining relatively open halfspaces $G_{k}$ are found by iterating the same process inside the relative boundary of the previous $G_{k+1}$, $k=1,\dots,d-1$. In the right hand panel of Figure 2 we see a visualisation of our setup, with $d=2$. In the situation displayed, as $n\to\infty$, the halfspaces $H_{n}$ do not intersect the halfline $G_{1}^{-}\subset\operatorname{bd}(H)$ originating at $x$, so the $\mu$-mass of $G_{1}^{-}$ does not contribute to the depth of $x$, and $G_{1}^{-}$ should not be contained in $F$. The formal statement of our theorem follows.

Figure 2. Left hand panel: A flag halfspace $F=\{x\}\cup G_{1}\cup G_{2}\in\mathcal{F}(x)$ in the plane. For any line segment with endpoints $y,z$ passing through $x$, exactly one of the points $y,z$ belongs to $F$. Right hand panel: $H=\operatorname{cl}(F)$ for $H$ being the limit of a sequence of closed halfspaces $\left\{H_{n}\right\}_{n=1}^{\infty}$ satisfying (4). Certainly, $\mu\left(H\right)$ is not necessarily equal to $D(x;\mu)$ because it is possible that $\mu\left(H\setminus H_{n}\right)\geq\mu\left(G_{1}^{-}\right)>0$.
Theorem 1.

For any $\mu\in\mathcal{M}(\mathbb{R}^{d})$ and $x\in\mathbb{R}^{d}$ we have

(5)  $D\left(x;\mu\right)=\min\left\{\mu\left(F\right)\colon F\in\mathcal{F}(x)\right\}.$

In particular, there always exists a minimising flag halfspace.

Proof.

Let $\left\{H_{n}\right\}_{n=1}^{\infty}\subset\mathcal{H}(x)$ be a sequence of halfspaces satisfying (4) with limit $H\equiv H_{x,v}=\lim_{n\to\infty}H_{n}$. For all $n=1,2,\dots$ we have

(6)  $\mu(H_{n})\geq\mu(H_{n}\cap H)=\mu\left(H_{n}\cap\operatorname{int}(H)\right)+\mu\left(H_{n}\cap\operatorname{bd}(H)\right).$

We first bound both summands on the right hand side from below. For each $n=1,2,\dots$ we define $A_{n}=\left(\bigcap_{m\geq n}H_{m}\right)\cap\operatorname{int}(H)\subseteq H_{n}\cap\operatorname{int}(H)$. From the convergence of the halfspaces $\left\{H_{n}\right\}_{n=1}^{\infty}$ we know that $A_{n}\uparrow\operatorname{int}(H)$ as $n\rightarrow\infty$, and using the continuity of measure from below [4, Theorem 3.1.11] we obtain the equality in

(7)  $\mu(\operatorname{int}(H))=\lim_{n\to\infty}\mu\left(A_{n}\right)\leq\liminf_{n\to\infty}\mu\left(H_{n}\cap\operatorname{int}(H)\right).$

On the other hand, $x\in H_{n}\cap\operatorname{bd}(H)$ for all $n=1,2,\dots$, so $H_{n}\cap\operatorname{bd}(H)$ is either a closed halfspace when considered in the $(d-1)$-dimensional space $\operatorname{bd}(H)$, or is equal to $\operatorname{bd}(H)$. In any case, we have that $\mu\left(H_{n}\cap\operatorname{bd}(H)\right)\geq D(x;\mu|_{\operatorname{bd}(H)})$ for $\mu|_{\operatorname{bd}(H)}$ the restriction of $\mu$ to the hyperplane $\operatorname{bd}(H)$. Consequently

(8)  $\liminf_{n\to\infty}\mu\left(H_{n}\cap\operatorname{bd}(H)\right)\geq D(x;\mu|_{\operatorname{bd}(H)}).$

Combining (6), (7) and (8) one gets

(9)  $\begin{aligned}
D(x;\mu) &=\lim_{n\to\infty}\mu(H_{n})\geq\liminf_{n\to\infty}\mu\left(H_{n}\cap\operatorname{int}(H)\right)+\liminf_{n\to\infty}\mu\left(H_{n}\cap\operatorname{bd}(H)\right)\\
&\geq\mu(\operatorname{int}(H))+D(x;\mu|_{\operatorname{bd}(H)}).
\end{aligned}$

Assume now for a contradiction that the inequality in (9) is strict, i.e. that $D(x;\mu)-\mu(\operatorname{int}(H))-D(x;\mu|_{\operatorname{bd}(H)})=c>0$. The definition of the halfspace depth implies that there exists a halfspace $\widetilde{H}\in\mathcal{H}(x,\operatorname{bd}(H))$ in the hyperplane $\operatorname{bd}(H)$ that satisfies

(10)  $\mu|_{\operatorname{bd}(H)}(\widetilde{H})<D(x;\mu|_{\operatorname{bd}(H)})+c/2.$

Denote by $\widetilde{v}\in\mathbb{S}^{d-1}$ the unit inner normal of $\widetilde{H}$ and set $w_{n}=v+\widetilde{v}/n$ and $C_{n}=H_{x,w_{n}}\setminus H$ for $n=1,2,\dots$. Then $w_{n}\to v$ and $C_{n}\downarrow\emptyset$ as $n\to\infty$, meaning that $\lim_{n\to\infty}H_{x,w_{n}}=H$ and $\lim_{n\to\infty}\mu(C_{n})=0$ due to the continuity of measure from above [4, Theorem 3.1.1]. For $n$ large enough we have $\mu(H_{x,w_{n}}\setminus H)<c/2$. Note also that $H_{x,w_{n}}\cap\operatorname{bd}(H)=\widetilde{H}$ for all $n=1,2,\dots$, due to the choice of $w_{n}$. Altogether, we have

(11)  $\begin{aligned}
\mu(H_{x,w_{n}}) &=\mu(H_{x,w_{n}}\cap\operatorname{int}(H))+\mu(H_{x,w_{n}}\cap\operatorname{bd}(H))+\mu(H_{x,w_{n}}\setminus H)\\
&<\mu(\operatorname{int}(H))+\mu(\widetilde{H})+c/2=\mu(\operatorname{int}(H))+\mu|_{\operatorname{bd}(H)}(\widetilde{H})+c/2\\
&<\mu(\operatorname{int}(H))+D(x;\mu|_{\operatorname{bd}(H)})+c=D(x;\mu),
\end{aligned}$

where the last inequality in (11) follows from (10). Note that because $H_{x,w_{n}}\in\mathcal{H}(x)$, inequality (11) contradicts the definition of the halfspace depth (1), and we get

(12)  $D(x;\mu)=\mu(G_{d})+D(x;\mu|_{\operatorname{relbd}(G_{d})}),$

where we denoted $G_{d}=\operatorname{int}(H)\in\mathcal{H}^{\circ}(x)$. We have just constructed the first open halfspace $G_{d}$ in the system (3). We proceed by induction. We consider $\mu|_{\operatorname{bd}(H)}=\mu|_{\operatorname{relbd}(G_{d})}$ instead of $\mu$ and, using the same argument, obtain $G_{d-1}\in\mathcal{H}^{\circ}(x,\operatorname{relbd}(G_{d}))$ that satisfies an equation analogous to (12), i.e. $D(x;\mu|_{\operatorname{relbd}(G_{d})})=\mu(G_{d-1})+D(x;\mu|_{\operatorname{relbd}(G_{d-1})})$. Continuing the same procedure we eventually obtain a flag halfspace $F=\{x\}\cup\left(\bigcup_{k=1}^{d}G_{k}\right)$ such that

$\begin{aligned}
D(x;\mu) &=\mu(\operatorname{int}(H))+D(x;\mu|_{\operatorname{bd}(H)})=\mu(G_{d})+\mu(G_{d-1})+D(x;\mu|_{\operatorname{relbd}(G_{d-1})})=\dots\\
&=\sum_{k=2}^{d}\mu(G_{k})+D(x;\mu|_{\operatorname{relbd}(G_{2})})=\sum_{k=1}^{d}\mu(G_{k})+\mu(\{x\})=\mu(F).
\end{aligned}$

The penultimate equality above follows from the fact that $\operatorname{relbd}(G_{2})$ is a line, meaning that $G_{1}$ is the one of the two relatively open halflines determined by $x$ in $\operatorname{relbd}(G_{2})$ that has the smaller $\mu$-mass. Thus, $D(x;\mu|_{\operatorname{relbd}(G_{2})})=\mu\left(\{x\}\right)+\mu(G_{1})$. ∎

In Example 1, the single minimising flag halfspace of $\mu$ at $x$ is

$F=\{x\}\cup\left\{\left(1,x_{2}\right)\in\mathbb{R}^{2}\colon x_{2}<0\right\}\cup\left\{\left(x_{1},x_{2}\right)\in\mathbb{R}^{2}\colon x_{1}>1\right\}\in\mathcal{F}(x).$
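As a worked check of this flag halfspace, assume (the example does not state the normalisation) that the uniform distribution on the disk has total mass 1. Writing $G_{1}=\{(1,x_{2})\colon x_{2}<0\}$ and $G_{2}=\{(x_{1},x_{2})\colon x_{1}>1\}$, and using that a chord at distance $h$ from the centre of a disk of radius $r$ cuts off a segment of area $r^{2}\arccos(h/r)-h\sqrt{r^{2}-h^{2}}$, one obtains with $r=2$ and $h=1$:

```latex
\begin{align*}
\mu(F) &= \mu(\{x\}) + \mu(G_{1}) + \mu(G_{2}), \\
\mu(\{x\}) &= 0, \qquad \mu(G_{1}) = 0
   \quad\text{(the atom $a=(1,1)$ lies on the opposite halfline)}, \\
\mu(G_{2}) &= \frac{1}{4\pi}\left(4\arccos\tfrac{1}{2}-\sqrt{3}\right)
            = \frac{1}{3}-\frac{\sqrt{3}}{4\pi} \approx 0.1955 .
\end{align*}
```

Under this assumption $\mu(F)\approx 0.1955$, whereas the closed halfspace $\operatorname{cl}(F)=H_{x,v}$ additionally contains the atom $a$ and therefore has strictly larger mass.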

In formula (12) in the proof of Theorem 1 we unveiled the recursive nature of the halfspace depth. The following result formalises that observation. In the special situation of an empirical measure $\mu\in\mathcal{M}(\mathbb{R}^{d})$, a related result has been observed in [5, Theorems 1 and 2] and successfully applied in the task of exact computation of the halfspace depth.

Corollary 2.

For $x\in\mathbb{R}^{d}$ and $\mu\in\mathcal{M}(\mathbb{R}^{d})$ it holds true that

$D(x;\mu)=\inf_{H\in\mathcal{H}(x)}\left(\mu(\operatorname{int}(H))+D(x;\mu|_{\operatorname{bd}(H)})\right).$
Proof.

There are more flag halfspaces in $\mathcal{F}(x)$ than closed halfspaces in $\mathcal{H}(x)$, in the sense that the mapping $\mathcal{F}(x)\to\mathcal{H}(x)\colon F\mapsto\operatorname{cl}(F)$ is not bijective. We define an equivalence relation $\sim$ between the elements of $\mathcal{F}(x)$ by

$F_{1}\sim F_{2}\mbox{ if and only if }\operatorname{cl}(F_{1})=\operatorname{cl}(F_{2}).$

By $\mathcal{K}$ we denote the quotient set of $\mathcal{F}(x)$ by $\sim$. This allows us to rewrite (5) from Theorem 1 as

$D(x;\mu)=\min_{F\in\mathcal{F}(x)}\mu(F)=\inf_{K\in\mathcal{K}}\inf_{F\in K}\mu(F).$

Note that for flag halfspaces, $\operatorname{int}(F_{1})=\operatorname{int}(F_{2})$ is equivalent to $\operatorname{cl}(F_{1})=\operatorname{cl}(F_{2})$. Take $K\in\mathcal{K}$ and denote $G_{K}=\operatorname{int}(F)\in\mathcal{H}^{\circ}(x)$ for $F\in K$. Then each $F\in K$ can be represented as $F=G_{K}\cup F^{\prime}$, for $F^{\prime}$ a flag halfspace centred at $x$ when considered inside the affine space $\operatorname{bd}(G_{K})$ (denoted by $F^{\prime}\in\mathcal{F}\left(x,\operatorname{bd}(G_{K})\right)$). We get, using Theorem 1 again,

$\inf_{F\in K}\mu(F)=\mu(G_{K})+\inf_{F^{\prime}\in\mathcal{F}\left(x,\operatorname{bd}(G_{K})\right)}\mu(F^{\prime})=\mu(G_{K})+D(x;\mu|_{\operatorname{bd}(G_{K})}).$

The mapping $\mathcal{H}^{\circ}(x)\to\mathcal{H}(x)\colon G\mapsto\operatorname{cl}(G)$ is a bijection, so any $K\in\mathcal{K}$ corresponds to exactly one element $H=\operatorname{cl}(G_{K})\in\mathcal{H}(x)$, and we obtain the desired result. ∎
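For a planar empirical measure, the recursion in Corollary 2 can be evaluated directly. The sketch below is an illustration only and is not the algorithm of [5]. It assumes integer coordinates and a non-empty sample, and it restricts the infimum to halfplanes whose boundary passes through $x$ and a further data point. For finitely many atoms this restriction loses nothing, because on such a line the bracketed term equals the smaller of its two values on neighbouring generic lines. The names `depth_1d` and `depth_via_corollary2` are hypothetical.

```python
from fractions import Fraction

def depth_1d(t0, ts):
    """Unnormalised halfspace depth of t0 among the reals ts (dimension 1)."""
    return min(sum(1 for t in ts if t <= t0), sum(1 for t in ts if t >= t0))

def depth_via_corollary2(x, points):
    """Depth of x as the minimum over closed halfplanes H of
    (#atoms in int(H)) + (depth of x within the line bd(H)), divided by n."""
    n = len(points)
    rel = [(p[0] - x[0], p[1] - x[1]) for p in points]
    dirs = [q for q in rel if q != (0, 0)]
    if not dirs:                       # all atoms sit at x
        return Fraction(1, 1)
    best = n
    for d in dirs:                     # bd(H) is the line through x with direction d
        # coordinates along d of the atoms lying on bd(H); x itself has coordinate 0
        on_line = [q[0] * d[0] + q[1] * d[1]
                   for q in rel if d[0] * q[1] - d[1] * q[0] == 0]
        boundary_depth = depth_1d(0, on_line)
        for sign in (1, -1):           # the two closed halfplanes sharing this boundary
            interior = sum(1 for q in rel
                           if sign * (d[0] * q[1] - d[1] * q[0]) > 0)
            best = min(best, interior + boundary_depth)
    return Fraction(best, n)

print(depth_via_corollary2((0, 0), [(-1, 0), (1, 0), (2, 0)]))
```

In the usage line all atoms lie on one line through $x$, so the interior term vanishes and the whole depth comes from the one-dimensional depth inside the boundary.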

3. Applications: Properties of the sample halfspace median

We now use flag halfspaces to derive several properties of the sample halfspace median that are of interest in the practice of the depth; additional applications of flag halfspaces to the theory of the halfspace depth can be found in [8, 9]. Write $\mathcal{A}(\mathbb{R}^{d})$ for the set of all empirical measures $\mu\in\mathcal{M}(\mathbb{R}^{d})$, that is all purely atomic probability measures with a finite number $n$ of atoms, each atom having $\mu$-mass $1/n$, for some $n=1,2,\dots$. These measures are typically obtained by observing a random sample $X_{1},\dots,X_{n}\in\mathbb{R}^{d}$ from a probability distribution $\nu\in\mathcal{M}(\mathbb{R}^{d})$, each sample point corresponding to an atom. To approximate the halfspace depth of $\nu$, the depth of $\mu$ is computed. The latter depth function is routinely used for inference about the unknown distribution $\nu$. Naturally, it is therefore crucial to understand the behaviour of the halfspace depth w.r.t. empirical measures. We provide results on the dimensionality of the median set, assuming that the atoms of $\mu\in\mathcal{A}(\mathbb{R}^{d})$ lie in a sufficiently general position. The last assumption is not restrictive; it is satisfied if, for instance, the measure $\nu$ from which we sample is smooth. The proof of the following lemma is standard and omitted.

Lemma 3.

Let $X_{1},X_{2},\dots,X_{n}$ be independent random variables sampled from smooth (and possibly different) probability measures from $\mathcal{M}(\mathbb{R}^{d})$. Then the following holds true almost surely.

  (i) The points $X_{1},X_{2},\dots,X_{n}$ are in general position. [Footnote 2: A set $S$ of points in $\mathbb{R}^{d}$ is said to lie in general position if no subset of $k$ of these points lies in a $(k-2)$-dimensional affine space, for all $k=2,\dots,d+1$. If there are $n>d$ points in $S$, this is equivalent to saying that no hyperplane in $\mathbb{R}^{d}$ contains more than $d$ points from $S$.]

  (ii) Writing $l(x,y)$ for the infinite line determined by $x\neq y\in\mathbb{R}^{d}$, if $d\geq 2$ and $k_{1},\dots,k_{6}\in\{1,2,\dots,n\}$ are pairwise different indices, then

    $l(X_{k_{1}},X_{k_{2}})\cap l(X_{k_{3}},X_{k_{4}})\cap l(X_{k_{5}},X_{k_{6}})=\emptyset.$
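Both conditions of Lemma 3 can be verified for a concrete planar dataset. The following sketch is a brute-force illustration for $d=2$ only, assuming integer coordinates; all function names are hypothetical. Condition (ii) is checked under the assumption that condition (i) already holds, so that two parallel lines through four different data points cannot coincide.

```python
from fractions import Fraction
from itertools import combinations

def cross(o, p, q):
    """Twice the signed area of the triangle o, p, q."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

def in_general_position(pts):
    """Condition (i) for d = 2: no two points coincide, no three are collinear."""
    if any(p == q for p, q in combinations(pts, 2)):
        return False
    return all(cross(p, q, r) != 0 for p, q, r in combinations(pts, 3))

def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y + c = 0 through p != q."""
    a, b = q[1] - p[1], p[0] - q[0]
    return a, b, -(a * p[0] + b * p[1])

def have_common_point(l1, l2, l3):
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = l1, l2, l3
    det = a1 * b2 - a2 * b1
    if det == 0:
        return False                 # parallel lines, distinct under condition (i)
    x = Fraction(b1 * c2 - b2 * c1, det)
    y = Fraction(c1 * a2 - c2 * a1, det)
    return a3 * x + b3 * y + c3 == 0

def pairings(six):
    """All 15 ways of splitting six items into three unordered pairs."""
    for i in range(1, 6):
        rest = [six[j] for j in range(1, 6) if j != i]
        for k in range(1, 4):
            others = [rest[j] for j in range(1, 4) if j != k]
            yield (six[0], six[i]), (rest[0], rest[k]), tuple(others)

def no_three_lines_concurrent(pts):
    """Condition (ii) for d = 2, assuming condition (i) holds."""
    for six in combinations(pts, 6):
        for (p1, p2), (p3, p4), (p5, p6) in pairings(six):
            if have_common_point(line_through(p1, p2), line_through(p3, p4),
                                 line_through(p5, p6)):
                return False
    return True

# Six points in general position whose three "diagonals" all pass through the origin.
pts = [(1, 0), (-1, 0), (0, 1), (0, -2), (1, 1), (-2, -2)]
print(in_general_position(pts), no_three_lines_concurrent(pts))
```

The usage example shows that condition (ii) is not implied by condition (i): the six points are in general position, yet three of the lines they determine meet at a single point.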

3.1. Dimensionality of the sample halfspace median

As our first application we show that for an empirical measure with atoms in general position, the median set $D^{*}(\mu)$ in dimension $d\geq 2$ cannot be $(d-1)$-dimensional, unless we are in the trivial case when the number of atoms is equal to $d$. Our findings should be seen as complementary to the earlier advances from [19, Lemma 6], where it was demonstrated that for $\mu\in\mathcal{A}(\mathbb{R}^{d})$ with atoms in general position, all the depth regions $D_{\alpha}(\mu)$ are full-dimensional, except possibly for the median set $D^{*}(\mu)$.

Theorem 4.

Let $\mu\in\mathcal{A}(\mathbb{R}^{d})$ be a measure with $n$ atoms of mass $1/n$ in general position. If $n\neq d\geq 2$, then $\dim(D^{*}(\mu))\neq d-1$.
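The excluded case $n=d$ is genuinely different, as a short worked example in the plane shows. Take two atoms $a\neq b$ in $\mathbb{R}^{2}$ (notation introduced here only for illustration) and write $[a,b]$ for the closed line segment between them. Every closed halfplane with a point of $[a,b]$ on its boundary contains $a$ or $b$, while any point outside $[a,b]$ lies on the boundary of a closed halfplane that misses both atoms:

```latex
\begin{align*}
\mu &= \tfrac{1}{2}\,\delta_{a} + \tfrac{1}{2}\,\delta_{b},
      \qquad a \neq b \in \mathbb{R}^{2}, \\
D(x;\mu) &=
  \begin{cases}
    1/2 & \text{if } x \in [a,b], \\
    0   & \text{otherwise},
  \end{cases} \\
D^{*}(\mu) &= [a,b], \qquad \dim\left(D^{*}(\mu)\right) = 1 = d-1 .
\end{align*}
```

Here $n=d=2$, and the median set is a segment of dimension $d-1$, which is why this case must be left out of the statement.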

Proof.

We use two auxiliary lemmas. Our first lemma is a special case of a more general result that can be found in [7, Lemma 4]. In [7], that lemma is formulated with a final inequality $\mu(\operatorname{int}(H))\leq\alpha$; for $\mu\in\mathcal{A}(\mathbb{R}^{d})$ also a strict inequality can be written, because the depth of $\mu$ attains only finitely many values.

Lemma 5.

Suppose that $\mu\in\mathcal{M}(\mathbb{R}^{d})$, $\alpha>0$, a point $x\notin D_{\alpha}(\mu)$ and a face $F$ of $D_{\alpha}(\mu)$ are given so that the relatively open line segment $L(x,y)$ formed by $x$ and $y$ does not intersect $D_{\alpha}(\mu)$ for any $y\in F$. Then there exists a touching halfspace $H\in\mathcal{H}$ of $D_{\alpha}(\mu)$ such that $\mu(\operatorname{int}(H))\leq\alpha$, $x\in H$ and $F\subset\operatorname{bd}(H)$. If, in addition, $\mu\in\mathcal{A}(\mathbb{R}^{d})$, then we can write even $\mu(\operatorname{int}(H))<\alpha$. [Footnote 3: A halfspace $H\in\mathcal{H}$ is a touching halfspace of a non-empty convex set $A\subset\mathbb{R}^{d}$ if $H\cap\operatorname{cl}(A)\neq\emptyset$ and $\operatorname{int}(H)\cap A=\emptyset$.]

Our second lemma is a simple observation about the structure of a simplex $S$, that is a convex hull of $k+1$ points in general position, in the linear space $\mathbb{R}^{k}$. These $k+1$ points are called the vertices of $S$.

Lemma 6.

For a simplex $S\subset\mathbb{R}^{k}$ and any convex set $K\subseteq S$ with non-empty interior there exist $x,y\in K$ and $v\in\mathbb{S}^{k-1}$ such that each of the disjoint halfspaces $H_{x,v}$ and $H_{y,-v}$ contains exactly one vertex of $S$.

Proof.

In this proof, all vectors are column vectors, and by $A^{\mathsf{T}}$ we denote the transpose of a matrix $A$. Denote by $s_{1},\dots,s_{k+1}\in S$ the vertices of $S$, and by $a$ any point in the interior of $K$. We first transform both $S$ and $K$ by an affine transform $T\colon\mathbb{R}^{k}\to\mathbb{R}^{k}\colon z\mapsto A\,z+b$, with $A\in\mathbb{R}^{k\times k}$ non-singular and $b\in\mathbb{R}^{k}$, such that $T(s_{i})=e_{i}$ for each $i=1,\dots,k$, where $e_{i}$ is the $i$-th standard basis vector in $\mathbb{R}^{k}$, and $T(a)=0$ is the origin in $\mathbb{R}^{k}$. Such an affine transform certainly exists, because each full-dimensional simplex in $\mathbb{R}^{k}$ can be uniquely mapped to any other one using an invertible affine mapping. Because $a\in\operatorname{int}(K)\subseteq\operatorname{int}(S)$, the origin $T(a)$ must be contained in the interior of the $T$-image of $S$ defined by $T(S)=\left\{T(z)\colon z\in S\right\}$, meaning that necessarily $T(s_{k+1})\in(-\infty,0)^{k}$. Since $K$ is a convex set with $a$ in its interior, also $T(K)$ is convex with $0=T(a)\in\operatorname{int}(T(K))$. Thus, there is a closed ball $B$ centred at the origin with radius $\delta>0$ small enough so that $B\subseteq T(K)\subseteq T(S)$. For $\widetilde{v}=e_{1}\in\mathbb{S}^{k-1}$ we have $\left\langle\widetilde{v},T(s_{1})\right\rangle=\left\langle\widetilde{v},e_{1}\right\rangle=1$, $\left\langle\widetilde{v},T(s_{i})\right\rangle=\left\langle\widetilde{v},e_{i}\right\rangle=0$ for $i=2,\dots,k$, and $\left\langle\widetilde{v},T(s_{k+1})\right\rangle<0$. Take $\widetilde{x}=\delta\,e_{1}\in B$ and $\widetilde{y}=-\widetilde{x}\in B$. Since $\widetilde{x},\widetilde{y}\in T(S)$ and a linear functional attains its extreme values over a simplex at vertices, we have $\delta\leq 1$ and $\left\langle\widetilde{v},T(s_{k+1})\right\rangle\leq-\delta$. Then $H_{\widetilde{x},\widetilde{v}}=\left\{z\in\mathbb{R}^{k}\colon\left\langle\widetilde{v},z\right\rangle\geq\delta\right\}$ contains $e_{1}=T(s_{1})$ as the only vertex of $T(S)$, and $H_{\widetilde{y},-\widetilde{v}}=\left\{z\in\mathbb{R}^{k}\colon\left\langle\widetilde{v},z\right\rangle\leq-\delta\right\}$ contains $T(s_{k+1})$ as the only vertex of $T(S)$. Certainly, also $H_{\widetilde{x},\widetilde{v}}\cap H_{\widetilde{y},-\widetilde{v}}=\emptyset$. Now it remains to apply the inverse affine transform $T^{-1}\colon\mathbb{R}^{k}\to\mathbb{R}^{k}\colon z\mapsto A^{-1}\left(z-b\right)$, with $A^{-1}\in\mathbb{R}^{k\times k}$ the inverse of $A$, and define $x=T^{-1}(\widetilde{x})$, $y=T^{-1}(\widetilde{y})$, and $v=\left(A^{\mathsf{T}}e_{1}\right)/\left\|A^{\mathsf{T}}e_{1}\right\|\in\mathbb{S}^{k-1}$. Because $v$ is the inner normal vector of $T^{-1}(H_{\widetilde{x},\widetilde{v}})=H_{x,v}$, and likewise $T^{-1}(H_{\widetilde{y},-\widetilde{v}})=H_{y,-v}$, we have found the desired pair of halfspaces $H_{x,v}$ and $H_{y,-v}$. ∎

We are ready to prove Theorem 4. Recall that $\alpha^{*}(\mu)=\sup_{x\in\mathbb{R}^{d}}D\left(x;\mu\right)$. Assume for a contradiction that $\dim(D^{*}\left(\mu\right))=d-1$. Then $D^{*}\left(\mu\right)$ is contained in a hyperplane that determines two different closed halfspaces, which we denote by $H^{+}$ and $H^{-}$, respectively. Take any $w\in\operatorname{int}\left(H^{+}\right)$ and $q\in\operatorname{int}\left(H^{-}\right)$. We can consider the set $D^{*}\left(\mu\right)$ itself as a $(d-1)$-dimensional face of $D^{*}\left(\mu\right)$ that satisfies the conditions of Lemma 5 for either of the choices $x=w$ or $x=q$. We apply Lemma 5 twice, first to $x=w$ and then to $x=q$. We obtain that $\mu(\operatorname{int}\left(H^{+}\right))<\alpha^{*}(\mu)$ and $\mu(\operatorname{int}\left(H^{-}\right))<\alpha^{*}(\mu)$. Denoting $G^{+}=\operatorname{int}\left(H^{+}\right)$, $G^{-}=\operatorname{int}\left(H^{-}\right)$ and $A=\operatorname{bd}\left(H^{+}\right)=\operatorname{bd}\left(H^{-}\right)$, we can write

\[ \max\left\{\mu(G^{+}),\mu(G^{-})\right\}<\alpha^{*}(\mu). \tag{13} \]

Applying Corollary 2 to $x\in D^{*}\left(\mu\right)$ and the halfspaces $H^{+}$ and $H^{-}$, and using (13), we get that

\[ D\left(x;\mu|_{A}\right)\geq\alpha^{*}(\mu)-\mu(G)>0\quad\mbox{for all }x\in D^{*}\left(\mu\right)\mbox{ and }G\in\{G^{+},G^{-}\}. \tag{14} \]

Because $\dim(D^{*}\left(\mu\right))=d-1$ and $D\left(x;\mu|_{A}\right)>0$ for all $x\in D^{*}\left(\mu\right)$, there must exist at least $d$ atoms of $\mu$ in the hyperplane $A$. At the same time, due to our assumption of the atoms of $\mu$ being in general position, there are at most $d$ atoms of $\mu$ in any hyperplane, meaning that $A$ contains exactly $d$ atoms of $\mu$, and these atoms are in general position inside $A$. Consequently,

\[ D\left(x;\mu|_{A}\right)=1/n\quad\mbox{for all }x\in D^{*}\left(\mu\right). \tag{15} \]

From (14) and (15) it follows that $\alpha^{*}(\mu)>\mu(G)\geq\alpha^{*}(\mu)-1/n$ for $G\in\{G^{+},G^{-}\}$. Since $\mu$ is an empirical measure with $n$ atoms, the $\mu$-mass of any set can only be a multiple of $1/n$, so it must be that $\mu(G^{+})=\mu(G^{-})=\alpha^{*}(\mu)-1/n$. We obtain

\[ 2\left(\alpha^{*}(\mu)-\frac{1}{n}\right)=\mu(G^{+})+\mu(G^{-})=1-\mu(A). \tag{16} \]

Since we have shown that there are exactly $d$ atoms of $\mu$ in $A$, it has to be $n\geq d$. From an assumption of our theorem we thus have $n>d$. Then there exists $z\in\mathbb{R}^{d}\setminus A$ such that $\mu(\{z\})=1/n$. We apply Lemma 6 in the subspace $A$ to conclude that there exist $x,y\in D^{*}\left(\mu\right)$ and closed halfspaces $H_{x,v},H_{y,-v}$ in the space $A$ such that $\mu|_{A}(H_{x,v})=\mu|_{A}(H_{y,-v})=1/n$ and $H_{x,v}\cap H_{y,-v}=\emptyset$. Choose a full-dimensional halfspace $H_{x,u}\in\mathcal{H}$ that meets the conditions $H_{x,u}\cap A=H_{x,v}$ and $z\notin H_{x,u}\cup H_{y,-u}$. Denote $S_{x}=H_{x,u}\setminus A$ and $S_{y}=H_{y,-u}\setminus A$. Then $H_{x,u}=H_{x,v}\cup S_{x}$ and $H_{y,-u}=H_{y,-v}\cup S_{y}$, so we have

\[ \mu(H_{x,u})=1/n+\mu(S_{x}),\quad\mu(H_{y,-u})=1/n+\mu(S_{y}). \tag{17} \]

Note that the sets $S_{x}$ and $S_{y}$ are disjoint and $\left(S_{x}\cup S_{y}\right)\cap\left(A\cup\{z\}\right)=\emptyset$, so

\[ \mu(S_{x})+\mu(S_{y})=\mu(S_{x}\cup S_{y})\leq 1-\mu\left(A\cup\{z\}\right)=1-\mu(A)-\frac{1}{n}. \tag{18} \]

Combining (16), (17) and (18), we obtain

\[ \mu(H_{x,u})+\mu(H_{y,-u})=\frac{2}{n}+\mu(S_{x})+\mu(S_{y})\leq 2\alpha^{*}(\mu)-\frac{1}{n}. \]
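Spelled out, the last bound chains (17), (18) and (16) in this order. The display below only rearranges these three relations and introduces no new assumption.

```latex
\[
\mu(H_{x,u})+\mu(H_{y,-u})
\overset{(17)}{=}\frac{2}{n}+\mu(S_{x})+\mu(S_{y})
\overset{(18)}{\leq}\frac{2}{n}+1-\mu(A)-\frac{1}{n}
\overset{(16)}{=}\frac{1}{n}+2\left(\alpha^{*}(\mu)-\frac{1}{n}\right)
=2\alpha^{*}(\mu)-\frac{1}{n}.
\]
```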

It follows that $\min\{\mu(H_{x,u}),\mu(H_{y,-u})\}<\alpha^{*}(\mu)$, a contradiction with our choice $\{x,y\}\subset D^{*}\left(\mu\right)$. ∎

Theorem 4 is valid for empirical measures; an analogous theorem for absolutely continuous measures can be found in [18, Proposition 3.4]. There, it was shown that for $\mu\in\mathcal{M}\left(\mathbb{R}^{d}\right)$ satisfying certain smoothness conditions including the existence of the density, the dimension of the median $D^{*}\left(\mu\right)$ cannot exceed $d-2$ provided that $d\geq 2$. A version of the latter theorem with weaker conditions, but still requiring smoothness and contiguous support of $\mu\in\mathcal{M}\left(\mathbb{R}^{d}\right)$, is given in [7, Corollary 7]. Unlike the proofs for smooth measures, the proof of Theorem 4 requires the use of flag halfspaces, which makes the derivation more technical and delicate. Without the assumption of general position, the claim of Theorem 4 is not valid. An example of a measure $\mu\in\mathcal{A}\left(\mathbb{R}^{2}\right)$ whose atoms are not in general position but $\dim(D^{*}\left(\mu\right))=1$ is given in [7, Section 2].

Having excluded the case of $\dim(D^{*}\left(\mu\right))=d-1$ for random samples from smooth probability measures, one can ask whether there are other dimensions that the sample median set cannot attain. The answer is negative already in the case of $n=8$ points sampled randomly from a Gaussian distribution in $\mathbb{R}^{3}$, as we show in the next example.

Figure 3. Example 2. Top left hand panel: Convex hull of the sample points in case $k=3$ (outer polyhedron) and the full-dimensional median set $D^{*}\left(\mu\right)$ (inner coloured polyhedron). Top right hand panel: Convex hull of the sample points in case $k=1$ (outer polyhedron), the median line segment (thick green line segment between the pair of the inner coloured points), and three planes, each separating two sample points from the median set (yellow planes). Bottom panels: Convex hull of the sample points (coloured polyhedron) with the single median (green point). The halfspace in the right hand panel is one minimising halfspace of the median $z$ containing $3$ sample points (the green one and two blue ones). For interactive visualisations see the supplementary Mathematica script.
Example 2.

For $\nu\in\mathcal{M}\left(\mathbb{R}^{3}\right)$ the standard Gaussian probability measure and $X_{1},\dots,X_{8}$ a random sample from $\nu$ with empirical measure $\mu\in\mathcal{A}\left(\mathbb{R}^{3}\right)$, the median set $D^{*}\left(\mu\right)$ has dimension $3$, $1$, or $0$ (a single-point set), each with positive probability. The claim follows by considering three setups of eight points $x_{1},\dots,x_{8}$ in the space $\mathbb{R}^{3}$. Denote $k=\dim\left(D^{*}\left(\mu\right)\right)$ and write $\mu\in\mathcal{A}\left(\mathbb{R}^{3}\right)$ for the empirical measure of $x_{1},\dots,x_{8}$. The direct computations described below are based on the analysis performed using the R package TukeyRegion [1] for evaluation of full-dimensional central regions, and the Mathematica visualisations provided in the script in the online Supplementary Material. Plots of the three setups below are displayed in Figure 3.

  • Case $k=3$. This situation is standard and common. For example, direct computation shows that randomly perturbed vertices of a unit cube in $\mathbb{R}^{3}$, i.e. points in a configuration where the convex hull of $x_{1},\dots,x_{8}$ contains all eight points on its boundary, already possess a full-dimensional polyhedral median set with maximum depth $2/8$.

  • Case $k=1$. Arrange the points so that $x_{1},x_{2},x_{3}$ form the vertices of a triangle $T_{1}$ in a plane, and $x_{4},x_{5},x_{6}$ form the vertices of a triangle $T_{2}$ in a plane parallel to that determined by $T_{1}$, so that the convex hull of $x_{1},\dots,x_{6}$ is a triangular prism in $\mathbb{R}^{3}$. To obtain points in general position, we perturb the six points slightly. Direct computation shows that for these six points, the halfspace median set is a three-dimensional polyhedron $M$ inside the prism that does not intersect $T_{1}$ or $T_{2}$, and that consists of points with depth $2/6$. Place the last two points $x_{7}$ and $x_{8}$ in the interior of $M$, so that the straight line $l(x_{7},x_{8})$ between these points intersects the relative interiors of both $T_{1}$ and $T_{2}$. Note that certainly $D\left(x_{7};\mu\right)=D\left(x_{8};\mu\right)=3/8$, since the two points were placed inside $M$. No point can have depth $4/8$, as in that situation the setup would exhibit halfspace symmetry, which is clearly impossible [10, Proposition 1]. Finally, projecting all points of $\mu$ into the plane orthogonal to $l(x_{7},x_{8})$ shows that any point $y\notin l(x_{7},x_{8})$ can be separated from $l(x_{7},x_{8})$ by a plane parallel to $l(x_{7},x_{8})$ whose closed halfspace on the side of $y$ contains only two sample points, meaning that $D\left(y;\mu\right)\leq 2/8$. The median set of $\mu$ is therefore the line segment between $x_{7}$ and $x_{8}$, with depth $3/8$.

  • Case $k=0$. Consider four points $x_{1},\dots,x_{4}$ forming the vertices of a tetrahedron $T$ (blue points in the bottom panels of Figure 3). Three points $x_{5},x_{6},x_{7}\notin T$ are attached to three different facets of $T$ so that each of these points together with its facet forms another (non-regular) tetrahedron not intersecting $\operatorname{int}\left(T\right)$ (red points in the bottom panels of Figure 3). Finally, a single point $x_{8}$ is placed strategically inside $T$ into the full-dimensional halfspace median of $x_{1},\dots,x_{7}$. An example is the configuration

    \[
    \begin{aligned}
    x_{1}&=\left(1,0,-\frac{1}{\sqrt{2}}\right), & x_{2}&=\left(-1,0,-\frac{1}{\sqrt{2}}\right), & x_{3}&=\left(0,-1,\frac{1}{\sqrt{2}}\right), & x_{4}&=\left(0,1,\frac{1}{\sqrt{2}}\right),\\
    x_{5}&=\left(0,1,-\frac{1}{4}\right), & x_{6}&=\left(\frac{1}{10},-1,-\frac{1}{4}\right), & x_{7}&=\left(\frac{3}{4},0,\frac{1}{4}\right), & x_{8}&=\left(\frac{1}{10},\frac{1}{10},0\right).
    \end{aligned}
    \]

    These points are in general position in $\mathbb{R}^{3}$. For the halfspaces $H_{x_{8},u_{i}}\in\mathcal{H}(x_{8})$ given by the normal vectors

    \[ u_{1}=\left(-\frac{7}{10},-\frac{3}{10},-\frac{3}{5}\right),\ u_{2}=\left(-\frac{2}{5},-\frac{1}{10},\frac{9}{10}\right),\ u_{3}=\left(-\frac{1}{5},\frac{4}{5},\frac{3}{5}\right),\ u_{4}=\left(1,\frac{1}{10},0\right) \]

    we obtain $\mu(H_{x_{8},u_{i}})=D\left(x_{8};\mu\right)=3/8$ for each $i=1,2,3,4$. At the same time, the union of the open halfspaces $\operatorname{int}\left(H_{x_{8},u_{i}}\right)$ is $\mathbb{R}^{3}\setminus\{x_{8}\}$, meaning that for any $y\neq x_{8}$ we can find $i$ with $y\in\operatorname{int}\left(H_{x_{8},u_{i}}\right)$, and the shifted closed halfspace $H_{y,u_{i}}=H_{x_{8},u_{i}}+(y-x_{8})\in\mathcal{H}(y)$ necessarily contains at most two atoms of $\mu$. Thus, $D\left(y;\mu\right)\leq 2/8$ and the point $x_{8}$ is the single halfspace median of $\mu$.
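The two facts used in case $k=0$ can be checked directly from the listed coordinates. The sketch below is an illustration and not the authors' Mathematica script. It counts the sample points in each closed halfspace $H_{x_{8},u_{i}}$, and it solves $\lambda_{1}u_{1}+\lambda_{2}u_{2}+\lambda_{3}u_{3}=-u_{4}$ by Cramer's rule. Strictly positive coefficients, together with linear independence of $u_{1},u_{2},u_{3}$, mean that the four open halfspaces cover every point other than $x_{8}$. All helper names are assumptions made here.

```python
import math

S = 1 / math.sqrt(2)
# Sample points x_1, ..., x_8 of case k = 0.
POINTS = [
    (1, 0, -S), (-1, 0, -S), (0, -1, S), (0, 1, S),
    (0, 1, -0.25), (0.1, -1, -0.25), (0.75, 0, 0.25), (0.1, 0.1, 0),
]
# Normal vectors u_1, ..., u_4 of the four halfspaces through x_8.
NORMALS = [(-0.7, -0.3, -0.6), (-0.4, -0.1, 0.9), (-0.2, 0.8, 0.6), (1, 0.1, 0)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def det3(a, b, c):
    """Determinant of the 3x3 matrix with columns a, b, c."""
    return dot(a, cross(b, c))


def count_in_halfspace(x, u, points):
    """Number of points p in the closed halfspace {p : <u, p - x> >= 0}."""
    return sum(1 for p in points
               if dot(u, [pi - xi for pi, xi in zip(p, x)]) >= 0)


x8 = POINTS[7]
counts = [count_in_halfspace(x8, u, POINTS) for u in NORMALS]

# Cramer's rule for l1*u1 + l2*u2 + l3*u3 = -u4.
u1, u2, u3, u4 = NORMALS
rhs = tuple(-c for c in u4)
d = det3(u1, u2, u3)
lambdas = (det3(rhs, u2, u3) / d, det3(u1, rhs, u3) / d, det3(u1, u2, rhs) / d)
```

Each entry of `counts` divided by $n=8$ is the mass of the corresponding halfspace. The check does not by itself establish $D(x_{8};\mu)=3/8$, which additionally requires that no halfspace through $x_{8}$ contains fewer than three sample points.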

The medians in all three cases above are stable in the sense that for a small perturbation of all the sample points, the dimension of the median set remains unchanged. Thus, in each setup and for each $x_{i}$ we can find a small open ball around $x_{i}$ such that if $x_{i}$ is replaced by any element of this ball, the dimension of the new median remains the same. In conclusion, all three cases $k=0,1,3$ occur with positive probability if $x_{1},\dots,x_{8}$ are sampled from any distribution in $\mathbb{R}^{3}$ with positive density everywhere.⁴

⁴ Note that our example for case $k=1$ happens to disagree with [10, Theorem 3], as for $n=8$ and $d=3$ we obtain the maximum depth $\lfloor(n-d+2)/2\rfloor/n=3/8$, but the median set is not a single point set. The problem appears to stem from formula (8) in [10] that is not valid in general.

3.2. Computation of the halfspace median in $\mathbb{R}^{2}$

In dimension $d=2$, Theorem 4 leaves only trivial cases: the halfspace median must be either full-dimensional or a singleton, and both situations may occur. But, as we show in our last result below, if $\mu\in\mathcal{A}\left(\mathbb{R}^{2}\right)$ has a unique median and $n\neq 4$, then the median must be one of the data points. The case of $n=4$ data points is trivial and not interesting.⁵

⁵ In the situation $n=4$ and under the assumptions of Lemma 3, the unique median is almost surely a singleton and is (i) either the atom contained in the interior of the convex hull of the remaining three sample points; or (ii) not an atom, but the single point of intersection of the two diagonals of the quadrilateral formed by the convex hull of the atoms.

Theorem 7.

Let $\mu\in\mathcal{A}\left(\mathbb{R}^{2}\right)$ be an empirical measure with precisely $n$ atoms of mass $1/n$, with $n\neq 4$, that satisfy conditions (i) and (ii) from Lemma 3. If the halfspace median $D^{*}\left(\mu\right)$ is a single point set, then it must be an atom of $\mu$. In particular, the median set is either full-dimensional, or an atom of $\mu$.

Proof.

Suppose without loss of generality that $x=0\in\mathbb{R}^{2}$ is the unique median of $\mu$. Assume, for a contradiction, that $\mu(\{0\})=0$. We start with the following observation: for every $v\in\mathbb{S}^{1}$ there is $w(v)\in\mathbb{S}^{1}$ that meets the following conditions:

\[ \langle v,w(v)\rangle\geq 0,\ \mu\left(\operatorname{int}\left(H_{0,w(v)}\right)\right)=\alpha^{*}(\mu)-\frac{1}{n},\mbox{ and }\mu\left(\operatorname{bd}\left(H_{0,w(v)}\right)\right)=2/n. \tag{19} \]

To prove the existence of $w=w(v)$ satisfying (19), pick a real sequence $a_{i}\downarrow 0$ and note that for every $i=1,2,\dots$ we have $a_{i}v\notin D^{*}\left(\mu\right)$, so there is $w_{i}\in\mathbb{S}^{1}$ such that

\[ \mu(H_{a_{i}v,w_{i}})=D\left(a_{i}v;\mu\right)\leq\alpha^{*}(\mu)-1/n<D\left(0;\mu\right). \tag{20} \]

The existence of a minimising halfspace $H_{a_{i}v,w_{i}}\in\mathcal{H}(a_{i}v)$ follows from the fact that minimising halfspaces always exist for $\mu\in\mathcal{A}\left(\mathbb{R}^{d}\right)$, as we observed in Section 2. Then necessarily $0\notin H_{a_{i}v,w_{i}}$, meaning that $\langle v,w_{i}\rangle>0$. The sequence $\{w_{i}\}_{i=1}^{\infty}\subset\mathbb{S}^{1}$ is bounded and therefore contains a convergent subsequence $\{w_{i_{j}}\}_{j=1}^{\infty}$ with a limit point $w\in\mathbb{S}^{1}$ that satisfies $\langle v,w\rangle\geq 0$. By the Fatou lemma [4, Lemma 4.3.3] applied to the sets $\operatorname{int}\left(H_{0,w}\right)\subseteq\liminf_{j\to\infty}\operatorname{int}\left(H_{a_{i_{j}}v,w_{i_{j}}}\right)$ and (20) we have that

\[ \mu(\operatorname{int}\left(H_{0,w}\right))\leq\liminf_{j\to\infty}\mu\left(\operatorname{int}\left(H_{a_{i_{j}}v,w_{i_{j}}}\right)\right)\leq\alpha^{*}(\mu)-1/n, \tag{21} \]

which together with Corollary 2 gives us

\[ \alpha^{*}(\mu)=D\left(0;\mu\right)\leq\mu(\operatorname{int}\left(H_{0,w}\right))+D\left(0;\mu|_{\operatorname{bd}\left(H_{0,w}\right)}\right)\leq\alpha^{*}(\mu)-\frac{1}{n}+D\left(0;\mu|_{\operatorname{bd}\left(H_{0,w}\right)}\right). \]

Therefore, $D\left(0;\mu|_{\operatorname{bd}\left(H_{0,w}\right)}\right)\geq 1/n$. Because the halfspace median $0$ is not an atom of $\mu$, the condition of general position of the atoms of $\mu$ from part (i) of Lemma 3 implies that the straight line $\operatorname{bd}\left(H_{0,w}\right)$ contains exactly two atoms of $\mu$ at some points $y,z\in\operatorname{bd}\left(H_{0,w}\right)$ such that $0$ is contained in the relatively open line segment formed by $y$ and $z$. Denote by $l_{y}\subset\operatorname{bd}\left(H_{0,w}\right)$ the open halfline starting at $0$ that contains $y$. The flag halfspace $F=\{y\}\cup l_{y}\cup\operatorname{int}\left(H_{0,w}\right)\in\mathcal{F}(0)$ then satisfies $\mu(F)=\mu\left(\operatorname{int}\left(H_{0,w}\right)\right)+1/n$. Inequality (21) implies $\mu(F)\leq\alpha^{*}(\mu)$, so it must be $\mu(F)=\alpha^{*}(\mu)$ because of Theorem 1. Consequently, $\mu\left(\operatorname{int}\left(H_{0,w}\right)\right)=\alpha^{*}(\mu)-1/n$ and we may take $w(v)=w$. We have proved (19).

Pick any $v\in\mathbb{S}^{1}$. There exists $u=w(v)\in\mathbb{S}^{1}$ that satisfies (19). Using the same observation again, we are able to find $u^{\prime}=w(-u)\in\mathbb{S}^{1}$ that satisfies (19) for $v$ replaced by $-u$. We consider two different cases.

First case: $u^{\prime}=-u$. By summing up the equalities $\mu\left(\operatorname{int}\left(H_{0,u}\right)\right)=\alpha^{*}(\mu)-1/n$, $\mu\left(\operatorname{int}\left(H_{0,-u}\right)\right)=\alpha^{*}(\mu)-1/n$ and $\mu\left(\operatorname{bd}\left(H_{0,u}\right)\right)=2/n$ that all follow from (19), we obtain $\alpha^{*}(\mu)=1/2$. Consider any infinite line $l$ that passes through the origin and the two open halfplanes $G^{+}$ and $G^{-}$ determined by $l$. If $\mu(l)=1/n$, then one of the open halfplanes $G^{+}$ and $G^{-}$ is of $\mu$-mass at most $1/2-1/(2n)$. Assume that $\mu(G^{+})\leq 1/2-1/(2n)$. Because $l$ contains only one atom of $\mu$ that is not at the origin, there is a flag halfspace $F\in\mathcal{F}(0)$ composed of $G^{+}$ and the relatively closed halfline in $l$ starting at $0$ that does not contain atoms. Then $\mu(F)=\mu(G^{+})\leq 1/2-1/(2n)<\alpha^{*}(\mu)$, which is impossible by Theorem 1, so the case $\mu(l)=1/n$ cannot occur. Due to the assumption of general position of atoms from part (i) of Lemma 3, we know that $\mu(l)\leq 2/n$, so $\mu(l)$ can take only one of two possible values: either $0$ or $2/n$. Because of our assumption from part (ii) of Lemma 3, however, there cannot be three different lines determined by pairs of sample points that all intersect in the origin. This means that for at most two lines $l$ in $\mathbb{R}^{2}$ passing through the origin, the $\mu$-mass of $l$ can be $2/n$; all the other lines through the origin have null $\mu$-mass (given that we have already excluded the case $\mu(l)=1/n$). This leaves only two possibilities: either $n=2$, or $n=4$. If $n=2$, then the median set $D^{*}\left(\mu\right)$ is the line segment determined by the only two atoms of $\mu$, and therefore it is one-dimensional. Only the case $n=4$, not covered by the statement of this theorem, remains.
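The two counting steps of the first case can be written out explicitly. The display below is a sketch that only rearranges (19) and the general position assumptions. The symbol $N_{2}$, denoting the number of lines through the origin with $\mu$-mass $2/n$, is introduced here for illustration and does not appear in the original argument.

```latex
% The plane is the disjoint union of the two open halfplanes and their common boundary line.
\[
1=\mu\left(\operatorname{int}\left(H_{0,u}\right)\right)+\mu\left(\operatorname{int}\left(H_{0,-u}\right)\right)+\mu\left(\operatorname{bd}\left(H_{0,u}\right)\right)
 =2\left(\alpha^{*}(\mu)-\frac{1}{n}\right)+\frac{2}{n}=2\,\alpha^{*}(\mu)
\quad\Longrightarrow\quad\alpha^{*}(\mu)=\frac{1}{2}.
\]
% The origin is not an atom, so every atom lies on exactly one line through the origin,
% and that line has positive mass, hence mass 2/n. The line bd(H_{0,u}) is one such line.
\[
n=2\,N_{2},\qquad 1\leq N_{2}\leq 2\quad\Longrightarrow\quad n\in\{2,4\}.
\]
```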

Second case: $u^{\prime}\neq-u$. There exists a closed halfspace $H_{0,v^{\prime}}$ whose boundary passes through the origin that does not contain any of the points $u$ and $u^{\prime}$. Let $\tilde{u}=w(v^{\prime})$ be the unit vector that satisfies (19) with $v=v^{\prime}$. Directly by (19), each of the three different lines $\operatorname{bd}\left(H_{0,u}\right)$, $\operatorname{bd}\left(H_{0,u^{\prime}}\right)$ and $\operatorname{bd}\left(H_{0,\tilde{u}}\right)$ contains two atoms of $\mu$, a contradiction with our assumption from part (ii) of Lemma 3.

The last part of the statement of Theorem 7 follows directly from Theorem 4. ∎

Theorems 4 and 7 fully justify the algorithmic procedure from [11] and [6] for finding the halfspace medians of samples from smooth probability distributions in $\mathbb{R}^{2}$. If the median set is full-dimensional, the algorithm from [11] implemented in the R package TukeyRegion [1] finds the median set exactly, as proved in [6]. If the median is not full-dimensional, we conclude that it has to be a single sample point, and evaluation of the maximum halfspace depth of all sample points gives the unique halfspace median.
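The fallback step described above, evaluating the depth of every sample point and keeping the deepest one, can be sketched as follows. This is a brute-force illustration and not the algorithm of [11] or the TukeyRegion implementation. The function names and the toy dataset are assumptions, and the data are assumed to be in general position. The depth is returned as a count, to be divided by $n$.

```python
import math


def depth_count_2d(x, points, tol=1e-12):
    """Smallest number of sample points in a closed halfplane with x on its boundary."""
    at_x = 0     # points coinciding with x lie in every such halfplane
    angles = []  # directions of the remaining points as seen from x
    for p in points:
        dx, dy = p[0] - x[0], p[1] - x[1]
        if abs(dx) <= tol and abs(dy) <= tol:
            at_x += 1
        else:
            angles.append(math.atan2(dy, dx))
    if not angles:
        return at_x
    # The count can change only when the normal is perpendicular to some p - x.
    crit = sorted({(a + s * math.pi / 2) % (2 * math.pi)
                   for a in angles for s in (-1, 1)})
    best = len(angles)
    for i, c in enumerate(crit):
        nxt = crit[i + 1] if i + 1 < len(crit) else crit[0] + 2 * math.pi
        phi = 0.5 * (c + nxt)  # normal direction strictly between two critical ones
        # For such a normal no point other than x lies on the boundary line.
        best = min(best, sum(1 for a in angles if math.cos(a - phi) > 0))
    return at_x + best


def deepest_sample_points(points):
    """Return the maximum depth count over the sample and the points attaining it."""
    depths = [depth_count_2d(p, points) for p in points]
    top = max(depths)
    return top, [p for p, d in zip(points, depths) if d == top]


# Hypothetical toy data: four corners of a square and one interior point.
data = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (2.1, 1.8)]
all_depths = [depth_count_2d(p, data) for p in data]
top_depth, deepest = deepest_sample_points(data)
```

The minimum over all directions is found by testing one normal inside each open arc between consecutive critical directions, since the closed-halfplane count is constant on such arcs and can only be larger at their endpoints. Whether the deepest sample point is the halfspace median still depends on the median set not being full-dimensional, which this sketch does not check.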

In dimension $d>2$, the situation with possible less-than-full-dimensional halfspace medians appears to be much more convoluted, as demonstrated already in Example 2. Our proof technique from Theorem 7 does not extend directly to $d>2$. One might, however, conjecture that in accordance with Theorem 7, a less-than-full-dimensional median of a dataset in general position must contain at least one atom of $\mu\in\mathcal{A}\left(\mathbb{R}^{d}\right)$. Our final example shows that this is not true: a configuration of points in general position without an atom in the halfspace median set is indeed possible.

Example 3.

Consider a dataset of $n=8$ points in $\mathbb{R}^{3}$ given by

\[
\begin{aligned}
x_{1}&=\left(-1,\frac{1}{3},-\frac{2}{3}\right), & x_{2}&=\left(1,0,-1\right), & x_{3}&=\left(0,\frac{3}{2},-1\right), & x_{4}&=\left(-\frac{1}{2},0,1\right),\\
x_{5}&=\left(1,0,\frac{4}{3}\right), & x_{6}&=\left(0,2,1\right), & x_{7}&=\left(-\frac{1}{3},\frac{1}{2},-2\right), & x_{8}&=\left(\frac{1}{3},\frac{1}{2},2\right).
\end{aligned}
\]

As in case $k=1$ in Example 2, the points $x_{1},\dots,x_{6}$ are perturbed vertices of a triangular prism. The points $x_{7}$ and $x_{8}$ determine a line segment that passes through both triangular bases of that prism. The dataset is in general position. A direct computation performed in Mathematica, provided in the script in the online Supplementary Material, confirms that the sample halfspace median set of this dataset attains depth $3/8$, and that the median set consists of the line segment between the points $\left(0,1/2,0\right)$ and $\left(3/44,1/2,9/22\right)$. This median line segment lies strictly in the relative interior of the line segment between the points $x_{7}$ and $x_{8}$, and does not contain any atoms of the corresponding measure $\mu\in\mathcal{A}\left(\mathbb{R}^{3}\right)$. Thus, it is possible that in dimension $d>2$, a less-than-full-dimensional median set contains no data points. For a visualisation of our dataset and its median set see Figure 4.
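The geometric part of this claim, namely that the reported median segment lies strictly between $x_{7}$ and $x_{8}$ on the line they determine, can be verified in exact rational arithmetic. The sketch below checks only this collinearity statement. It does not recompute the depth $3/8$, for which the text relies on the Mathematica script. The helper name `line_parameter` is an assumption made here.

```python
from fractions import Fraction as F

X7 = (F(-1, 3), F(1, 2), F(-2))
X8 = (F(1, 3), F(1, 2), F(2))
# Endpoints of the median line segment reported in Example 3.
ENDPOINTS = [(F(0), F(1, 2), F(0)), (F(3, 44), F(1, 2), F(9, 22))]


def line_parameter(p, a, b):
    """Return t with p = a + t (b - a), or None if p is not on the line through a and b."""
    direction = [bi - ai for ai, bi in zip(a, b)]
    offset = [pi - ai for ai, pi in zip(a, p)]
    # Solve for t in the first coordinate where the direction is non-zero.
    k = next(i for i, d in enumerate(direction) if d != 0)
    t = offset[k] / direction[k]
    # Accept t only if it reproduces every coordinate exactly.
    return t if all(o == t * d for o, d in zip(offset, direction)) else None


params = [line_parameter(p, X7, X8) for p in ENDPOINTS]
```

Both parameters lie strictly between $0$ and $1$, so neither endpoint coincides with $x_{7}$ or $x_{8}$. The first endpoint is the midpoint of the two data points.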

Figure 4. Example 3. Left hand panel: Convex hull of the sample points (outer polyhedron) and the full-dimensional central region $D_{\alpha}(\mu)$ with $\alpha=2/8$ (inner coloured polyhedron). Right hand panel: Convex hull of the sample points (outer polyhedron), the median line segment of depth $\alpha^{*}(\mu)=3/8$ (thick orange line segment), and the line between data points $x_{7}$ and $x_{8}$ (dashed black line). The halfspace median line segment forms a piece of the dashed black line, and is contained inside the region $D_{2/8}(\mu)$ from the left hand panel. For interactive visualisations see the supplementary Mathematica script.

In Example 3, we constructed a dataset in general position in dimension $d=3$, with a one-dimensional median set. We do not know an example of a dataset sampled from a distribution with a density in $\mathbb{R}^{d}$, $d>2$, with a unique (zero-dimensional) halfspace median that is not a data point. The higher dimensional situation therefore deserves further investigation.

Acknowledgement

P. Laketa was supported by the OP RDE project “International mobility of research, technical and administrative staff at the Charles University”, grant CZ.02.2.69/0.0/0.0/18_053/0016976. The work of S. Nagy was supported by Czech Science Foundation (EXPRO project n. 19-28231X).

References

  • [1] Barber, C. and Mozharovskyi, P. (2022). TukeyRegion: Tukey region and median. R package version 0.1.5.5.
  • [2] Chernozhukov, V., Galichon, A., Hallin, M., and Henry, M. (2017). Monge-Kantorovich depth, quantiles, ranks and signs. Ann. Statist., 45(1):223–256.
  • [3] Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Statist., 20(4):1803–1827.
  • [4] Dudley, R. M. (2002). Real analysis and probability, volume 74 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge.
  • [5] Dyckerhoff, R. and Mozharovskyi, P. (2016). Exact computation of the halfspace depth. Comput. Statist. Data Anal., 98:19–30.
  • [6] Fojtík, V., Laketa, P., Mozharovskyi, P., and Nagy, S. (2022). On exact computation of Tukey depth central regions. arXiv preprint arXiv:2208.04587.
  • [7] Laketa, P. and Nagy, S. (2022a). Halfspace depth for general measures: the ray basis theorem and its consequences. Statist. Papers, 63(3):849–883.
  • [8] Laketa, P. and Nagy, S. (2022b). Partial reconstruction of measures from halfspace depth. In Proceedings of CLADAG2021, Stud. Classification Data Anal. Knowledge Organ. Springer, Cham. To appear.
  • [9] Laketa, P., Pokorný, D., and Nagy, S. (2022). Simple halfspace depth. Under review.
  • [10] Liu, X., Luo, S., and Zuo, Y. (2020). Some results on the computing of Tukey’s halfspace median. Statist. Papers, 61(1):303–316.
  • [11] Liu, X., Mosler, K., and Mozharovskyi, P. (2019). Fast computation of Tukey trimmed regions and median in dimension $p>2$. J. Comput. Graph. Statist., 28(3):682–697.
  • [12] Massé, J.-C. (2004). Asymptotics for the Tukey depth process, with an application to a multivariate trimmed mean. Bernoulli, 10(3):397–419.
  • [13] Mizera, I. and Volauf, M. (2002). Continuity of halfspace depth contours and maximum depth estimators: diagnostics of depth-related methods. J. Multivariate Anal., 83(2):365–388.
  • [14] Mosler, K. and Mozharovskyi, P. (2022). Choosing among notions of multivariate depth statistics. Statist. Sci., 37(3):348–368.
  • [15] Nagy, S., Schütt, C., and Werner, E. M. (2019). Halfspace depth and floating body. Stat. Surv., 13:52–118.
  • [16] Rousseeuw, P. J. and Ruts, I. (1999). The depth function of a population distribution. Metrika, 49(3):213–244.
  • [17] Schneider, R. (2014). Convex bodies: the Brunn-Minkowski theory, volume 151 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, expanded edition.
  • [18] Small, C. G. (1987). Measures of centrality for multivariate and directional distributions. Canad. J. Statist., 15(1):31–39.
  • [19] Struyf, A. and Rousseeuw, P. J. (1999). Halfspace depth and regression depth characterize the empirical distribution. J. Multivariate Anal., 69(1):135–153.
  • [20] Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians (Vancouver, B. C., 1974), Vol. 2, pages 523–531. Canad. Math. Congress, Montreal, Que.
  • [21] Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. Ann. Statist., 28(2):461–482.