Extending the centerpoint theorem to multiple points

Alexander Pilz, Institute of Software Technology, Graz University of Technology (supported by a Schrödinger fellowship of the Austrian Science Fund (FWF): J-3847-N35).
apilz@ist.tugraz.at
Patrick Schnider, Department of Computer Science, ETH Zürich.
patrick.schnider@inf.ethz.ch
Abstract

The centerpoint theorem is a well-known and widely used result in discrete geometry. It states that for any point set $P$ of $n$ points in $\mathbb{R}^{d}$, there is a point $c$, not necessarily from $P$, such that each halfspace containing $c$ contains at least $\frac{n}{d+1}$ points of $P$. Such a point $c$ is called a centerpoint, and it can be viewed as a generalization of a median to higher dimensions. In other words, a centerpoint can be interpreted as a good representative for the point set $P$. But what if we allow more than one representative? For example, in one-dimensional data sets, certain quantiles are often chosen as representatives instead of the median.

We present a possible extension of the concept of quantiles to higher dimensions. The idea is to find a set $Q$ of (few) points such that every halfspace that contains one point of $Q$ contains a large fraction of the points of $P$, and every halfspace that contains more points of $Q$ contains an even larger fraction of $P$. This setting is comparable to the well-studied concepts of weak $\varepsilon$-nets and weak $\varepsilon$-approximations: it is stronger than the former but weaker than the latter. We show the following for any point set of size $n$ in $\mathbb{R}^{d}$ and any positive $\alpha_{1}\leq\alpha_{2}\leq\ldots\leq\alpha_{k}$ with $(d-1)\alpha_{k}+\alpha_{i}+\alpha_{j}\leq 1$ for all $i,j$ with $i+j\leq k+1$. We can find $Q$ of size $k$ such that each halfspace containing $j$ points of $Q$ contains at least $\alpha_{j}n$ points of $P$. For two-dimensional point sets, we further show that for every $\alpha$ and $\beta$ with $\alpha\leq\beta$ and $\alpha+\beta\leq\frac{2}{3}$, we can find $Q$ with $|Q|=3$ such that each halfplane containing one point of $Q$ contains at least $\alpha n$ of the points of $P$, and each halfplane containing all of $Q$ contains at least $\beta n$ points of $P$. All these results generalize to the setting where $P$ is any mass distribution. For the case where $P$ is a point set in $\mathbb{R}^{2}$ and $|Q|=2$, we provide algorithms to find such points in time $O(n\log^{3}n)$.

1 Introduction

Medians and quantiles are ubiquitous in the statistical analysis and visualization of data. These notions allow for quantifying how deep some point lies within a one-dimensional data set by measuring how many elements of the data set appear before the point and how many appear after it. In comparison to the mean, medians and quantiles have the advantage that they depend only on the order of the data points and not on their exact positions, making them robust against outliers. However, in many applications, data sets are multidimensional, and there is no natural order on the data. For this reason, various generalizations of medians to higher dimensions have been introduced and studied. Many of these generalized medians rely on a notion of depth of a query point within a data set, a median then being a query point with the highest depth among all possible query points. Several such depth measures have been introduced over time, most famously Tukey depth [19] (also called halfspace depth), simplicial depth, or convex hull peeling depth (see, e.g., [1]). All of these depth measures lead to generalized medians that are invariant under affine transformations. As for quantiles, only a few generalizations have been introduced (see, e.g., [7]). We propose such a generalization by extending a depth measure to sets with a fixed number of query points and defining a quantile as a set with maximal depth. The depth measure we extend is Tukey depth: the Tukey depth of a point $q$ with respect to a point set $P\subset\mathbb{R}^{d}$ is the minimal number of points of $P$ in any closed halfspace containing $q$. More formally, if $H$ denotes the set of closed halfspaces, then the Tukey depth $\textsf{td}_{P}(q)$ of $q$ with respect to $P$ is

$$\textsf{td}_{P}(q)=\min_{q\in h\in H}\{|h\cap P|\}\enspace.$$

Similarly, the Tukey depth can also be defined for a mass distribution $\mu$:

$$\textsf{td}_{\mu}(q)=\min_{q\in h\in H}\{\mu(h)\}\enspace.$$

Here, a mass distribution $\mu$ on $\mathbb{R}^{d}$ is a measure on $\mathbb{R}^{d}$ such that all open subsets of $\mathbb{R}^{d}$ are measurable, $0<\mu(\mathbb{R}^{d})<\infty$, and $\mu(S)=0$ for every lower-dimensional subset $S$ of $\mathbb{R}^{d}$.
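For a finite point set in the plane, $\textsf{td}_{P}(q)$ can be evaluated directly from the definition. The following Python sketch is our own illustration and not an algorithm from this paper. The name `tukey_depth_2d` is hypothetical. The sketch relies on two facts. First, a closed halfplane containing $q$ can be translated until its boundary passes through $q$ without gaining points. Second, the content of a closed halfplane bounded by a line through $q$ changes only when its normal becomes orthogonal to some $p-q$. At such a critical normal, a slight rotation in either direction moves exactly one of the two boundary rays to the open side. The minimum over all generic directions is therefore the strict count plus the smaller ray count. The running time is $O(n^{2})$, which is far from optimal.

```python
def tukey_depth_2d(q, points):
    """Tukey depth of q w.r.t. a finite planar point set (brute force, O(n^2))."""
    qx, qy = q
    vecs = [(px - qx, py - qy) for px, py in points]
    at_q = sum(1 for v in vecs if v == (0, 0))      # copies of q lie in every halfplane
    others = [v for v in vecs if v != (0, 0)]
    if not others:
        return at_q
    best = len(points)
    for wx, wy in others:
        # The line through q with direction w = p - q is a critical boundary line.
        for ux, uy in ((-wy, wx), (wy, -wx)):        # its two normals
            strict = ahead = behind = 0
            for vx, vy in others:
                s = vx * ux + vy * uy
                if s > 0:
                    strict += 1                      # strictly inside the halfplane
                elif s == 0:                         # on the boundary line
                    if vx * wx + vy * wy > 0:
                        ahead += 1                   # on the ray in direction w
                    else:
                        behind += 1                  # on the opposite ray
            # Rotating the normal slightly to either side moves exactly one of the
            # two boundary rays to the open side; keep the cheaper option.
            best = min(best, at_q + strict + min(ahead, behind))
    return best
```

For the four corners of the unit square, the center has depth 2, a corner has depth 1, and a point far outside the square has depth 0.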

The centerpoint theorem states that there is always a point of high depth, i.e., a point $q$ such that for every closed halfspace $h$ containing $q$ we have $|h\cap P|\geq\frac{|P|}{d+1}$ (or $\mu(h)\geq\frac{\mu(\mathbb{R}^{d})}{d+1}$ for masses). Note that, for $d=1$, such a centerpoint is a median: a median has the property that every halfline containing it contains at least half of the underlying data set. Quantiles can be interpreted similarly. The $\frac{1}{3}$-quantile and the $\frac{2}{3}$-quantile form a set of two points such that every halfline that contains one of them contains at least $\frac{1}{3}$ of the data set. Furthermore, a halfline containing both of the points contains at least $\frac{2}{3}$ of the underlying data set. In particular, halflines containing more points contain more of the data set. This idea leads to the following generalization of Tukey depth for a set $Q$ of multiple points:

$$\textsf{gtd}_{P}(Q):=\min_{h\in H\colon Q\cap h\neq\emptyset}\left\{\frac{|h\cap P|}{|h\cap Q|}\right\}\enspace.$$

Again, we can generalize this to mass distributions:

$$\textsf{gtd}_{\mu}(Q):=\min_{h\in H\colon Q\cap h\neq\emptyset}\left\{\frac{\mu(h)}{|h\cap Q|}\right\}\enspace.$$
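In dimension one, the closed halfspaces are closed halflines, and $\textsf{gtd}_{P}(Q)$ can be computed exactly. A halfline meeting $Q$ can be shrunk until its endpoint is a point of $Q$. This does not change $h\cap Q$ and gains no points of $P$, so only the $2|Q|$ halflines ending at points of $Q$ need to be checked. Below is a minimal sketch under these assumptions. The name `gtd_1d` is hypothetical, and the function returns the count-based value $\textsf{gtd}_{P}(Q)$ as an exact fraction.

```python
from fractions import Fraction

def gtd_1d(P, Q):
    """Generalized Tukey depth of a finite set Q w.r.t. a finite set P on the line."""
    best = None
    for c in Q:
        for side in (+1, -1):      # +1: halfline [c, inf), -1: halfline (-inf, c]
            p_in = sum(1 for x in P if side * (x - c) >= 0)
            q_in = sum(1 for x in Q if side * (x - c) >= 0)   # >= 1, c itself counts
            ratio = Fraction(p_in, q_in)
            if best is None or ratio < best:
                best = ratio
    return best
```

For $P=\{1,\ldots,9\}$ and $Q=\{3,7\}$, which play the role of the $\frac{1}{3}$- and $\frac{2}{3}$-quantiles, the function returns $3=\frac{n}{3}$. This matches the bound $\frac{n}{kd+1}$ for $k=2$ and $d=1$. The single median $Q=\{5\}$ gives $5$, which is at least $\frac{n}{d+1}=4.5$.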

We prove that there is always a set $Q$ of $k$ points that has generalized Tukey depth at least $\frac{1}{kd+1}$. In fact, we prove the following more general statement:

Theorem 1.

Let $\mu$ be a mass distribution in $\mathbb{R}^{d}$ with $\mu(\mathbb{R}^{d})=1$. Let $\alpha_{1},\ldots,\alpha_{k}$ be non-negative real numbers such that $\alpha_{1}\leq\alpha_{2}\leq\ldots\leq\alpha_{k}$ and $(d-1)\alpha_{k}+\alpha_{i}+\alpha_{j}\leq 1$ for every $i,j$ with $i+j\leq k+1$. Then there are $k$ points $p_{1},\ldots,p_{k}$ in $\mathbb{R}^{d}$ such that for each closed halfspace $h$ containing $j$ of the points $p_{1},\ldots,p_{k}$ we have $\mu(h)\geq\alpha_{j}$.

Note that, for $d=1$ and $k=2$, the points $p_{1}$ and $p_{2}$ correspond to the $\alpha_{1}$-quantile and the $(1-\alpha_{1})$-quantile. For $\alpha_{j}=\frac{j}{kd+1}$, we get our bound on the generalized Tukey depth, and for $\alpha_{1}=\ldots=\alpha_{k}$, the result implies the centerpoint theorem.
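The precondition of Theorem 1 is a finite system of linear inequalities, so it can be checked mechanically. The following sketch uses exact rational arithmetic; the helper name `satisfies_theorem1` is hypothetical. The checks reproduce several special cases discussed in this paper: $(\frac{1}{5},\frac{2}{5})$ for $d=2$ (Theorem 3), $\alpha_{j}=\frac{j}{kd+1}$, and equal values $\frac{1}{d+1}$. They also confirm that the values $(\frac{1}{6},\frac{1}{6},\frac{1}{2})$ of Theorem 6 violate the precondition.

```python
from fractions import Fraction

def satisfies_theorem1(alphas, d):
    """Check the hypotheses of Theorem 1 for the list [alpha_1, ..., alpha_k] in R^d."""
    k = len(alphas)
    if any(a < 0 for a in alphas) or any(x > y for x, y in zip(alphas, alphas[1:])):
        return False                       # must be non-negative and non-decreasing
    a = [None] + list(alphas)              # 1-based indexing: a[j] = alpha_j
    # (d-1)*alpha_k + alpha_i + alpha_j <= 1 for all i, j >= 1 with i + j <= k + 1
    return all((d - 1) * a[k] + a[i] + a[j] <= 1
               for i in range(1, k + 1) for j in range(1, k + 2 - i))
```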

Our second result is motivated by interpreting the $\frac{1}{3}$-quantile and the $\frac{2}{3}$-quantile not as two points, but as a one-dimensional simplex. Every halfline that contains a part of the simplex then contains at least $\frac{1}{3}$ of the underlying data set, and every halfline that contains the whole simplex contains at least $\frac{2}{3}$ of it. For this interpretation, too, we give a generalization to two dimensions:

Theorem 2.

Let $\mu$ be a mass distribution in $\mathbb{R}^{2}$ with $\mu(\mathbb{R}^{2})=1$. Let $\alpha$ and $\beta$ be real numbers such that $0<\alpha\leq\beta$ and $\alpha+\beta=\frac{2}{3}$. Then there is a triangle $\Delta$ in $\mathbb{R}^{2}$ such that

  (1) for each closed halfplane $h$ containing one of the vertices of $\Delta$, we have $\mu(h)\geq\alpha$, and

  (2) for each closed halfplane $h$ fully containing $\Delta$, we have $\mu(h)\geq\beta$.

Note that this again generalizes centerpoints for $\alpha=\beta$. However, this result does not give bounds on the generalized Tukey depth of these sets. For example, a halfplane containing two vertices of $\Delta$ may still contain only an $\alpha$-fraction of the mass.

Finally, we give algorithms to compute two points satisfying the two-dimensional case of Theorem 1 and three points satisfying Theorem 2 in time $O(n\log^{3}n)$.

Related work

Another way to view our setting is the following: given a multidimensional data set, we want to find a fixed number of representatives. The idea of small point sets representing a larger point set has been studied in many different settings. One of the most famous is the concept of $\varepsilon$-nets, introduced by Haussler and Welzl [8]. Consider a range space $(X,R)$, consisting of a set $X$ and a set $R$ of subsets of $X$. An $\varepsilon$-net on $P\subset X$ is a subset $N$ of $P$ with the property that every $r\in R$ with $|r\cap P|\geq\varepsilon|P|$ intersects $N$. In our setting, where we consider halfspaces, we would choose $X=\mathbb{R}^{d}$ and $R$ as the set of all halfspaces. It is known that for this range space, any point set $P$ has an $\varepsilon$-net of size $O(\frac{d}{\varepsilon}\log\frac{d}{\varepsilon})$. In particular, this bound does not depend on the size of $P$. Note that we require the $\varepsilon$-net to be a subset of $P$. If this condition is dropped, we arrive at the concept of weak $\varepsilon$-nets. Because the points of a weak $\varepsilon$-net can be chosen anywhere in $\mathbb{R}^{d}$, many range spaces admit very small weak $\varepsilon$-nets. There has been some work on weak $\varepsilon$-nets of small size. For halfplanes in $\mathbb{R}^{2}$, for example, Aronov et al. [4] have shown that there is always a weak $\frac{1}{2}$-net of two points. These two points both lie outside of the convex hull of $P$. They also consider many other range spaces, such as convex sets, disks and rectangles. Similarly, Babazadeh and Zarrabi-Zadeh [5] construct weak $\frac{1}{2}$-nets of size 3 for halfspaces in $\mathbb{R}^{3}$. For two-dimensional convex sets, Mustafa and Ray [16] have shown that there is always a weak $\frac{4}{7}$-net of two points; Shabbir [18] shows how to find two such points in $O(n\log^{4}n)$ time.

Another related concept is that of $\varepsilon$-approximations. For a range space $(X,R)$, an $\varepsilon$-approximation on $P\subset X$ is a subset $N$ of $P$ with the property that for every $r\in R$ we have $\left|\frac{|r\cap P|}{|P|}-\frac{|r\cap N|}{|N|}\right|\leq\varepsilon$. Again, dropping the restriction that $N$ be a subset of $P$ gives the notion of weak $\varepsilon$-approximations. Just as for $\varepsilon$-nets, there has been a lot of work on $\varepsilon$-approximations and weak $\varepsilon$-approximations; see [15] for a recent survey. In particular, it was shown that for halfspaces in $\mathbb{R}^{d}$, there is always an $\varepsilon$-approximation of size $O(\frac{1}{\varepsilon^{2-2/(d+1)}})$ [13, 14].

Our setting is related to weak $\varepsilon$-nets and weak $\varepsilon$-approximations for range spaces determined by halfspaces, but the differences are significant. As we discuss next, a halfspace missing all the points of $Q$ may still contain up to half of the points of the initial set. Thus $Q$ qualifies neither as a good weak $\varepsilon$-approximation nor as a good weak $\varepsilon$-net.

Note that for $|Q|=2$, consider the condition of Theorem 1 that any halfspace containing all of the points of $Q$ contains at least $\alpha_{2}n$ points of $P$. It is equivalent to the statement that every halfspace containing more than $(1-\alpha_{2})n$ of the points of $P$ contains at least one point of $Q$. So $Q$ is a weak $(1-\alpha_{2})$-net of $P$. Furthermore, consider the condition that any halfspace containing one of the points of $Q$ contains at least $\alpha_{1}n$ points of $P$. It translates to the statement that every halfspace containing more than $(1-\alpha_{1})n$ of the points of $P$ must contain all of $Q$. Thus, $Q$ is not only a weak $(1-\alpha_{2})$-net of $P$, but all points of $Q$ also lie somewhat deep within $P$. (For two points in the plane, this comes at the cost of having $\varepsilon$ larger than $\frac{1}{2}$.) On the other hand, halfspaces containing all points of $Q$ must contain many points of $P$, but halfspaces containing only one point of $Q$ may also contain many points of $P$. This separates our concept from weak $\varepsilon$-approximations. Note that when dealing with halfspaces and $\varepsilon$-nets of size 2, the weak $\frac{1}{2}$-net of Aronov et al. [4] is actually also a weak $\frac{1}{2}$-approximation. Similarly, Theorem 1 gives us a weak $(1-\alpha_{2})$-approximation of $P$. The optimal value is reached when $\alpha_{1}$ tends to zero, which actually corresponds to the result in [4]. Adding more points to $Q$ does not give us a better approximation. For $d=2$, the requirement that $(d-1)\alpha_{k}+\alpha_{i}+\alpha_{j}\leq 1$ for all $i,j$ with $i+j\leq k+1$ implies (taking $i=1$ and $j=k$) that $\alpha_{1}+2\alpha_{k}\leq 1$, and hence $\alpha_{k}\leq\frac{1}{2}$. So a halfspace containing no points of $Q$ may contain half of the points of $P$, and we therefore cannot get anything better than a weak $\frac{1}{2}$-approximation. For the same reason, we cannot get anything better than a weak $\frac{1}{2}$-net.

In fact, our setting is closely related to the concept of one-sided $\varepsilon$-approximants, recently introduced by Bukh and Nivasch [6]. For a range space $(X,R)$, a one-sided $\varepsilon$-approximant on $P\subset X$ is a subset $N$ of $P$ with the property that for every $r\in R$ we have $\frac{|r\cap P|}{|P|}-\frac{|r\cap N|}{|N|}\leq\varepsilon$. Once again, dropping the restriction that $N$ be a subset of $P$ gives the notion of weak one-sided $\varepsilon$-approximants. The only difference from the definition of $\varepsilon$-approximations is that one does not take the absolute value of the difference. In particular, suppose $\frac{|r\cap N|}{|N|}>\frac{|r\cap P|}{|P|}$, i.e., $r$ contains many points of $N$ despite containing only few points of $P$. Then the difference is negative, and thus smaller than $\varepsilon$.

In their paper, Bukh and Nivasch [6] consider the range space of convex sets. They show that any point set in $\mathbb{R}^{d}$ admits a one-sided $\varepsilon$-approximant for convex ranges of size $g(\varepsilon,d)$, where $g(\varepsilon,d)$ depends only on $\varepsilon$ and $d$, not on the size of $P$.

By similar reasoning, it makes sense to define an approximation by a set $N$ such that for every $r\in R$ we have $\frac{|r\cap N|}{|N|}-\frac{|r\cap P|}{|P|}\leq\varepsilon$. Intuitively, if a range $r$ contains a large fraction of the points of $N$, then it is guaranteed to contain a large fraction of the set $P$ we want to approximate. But here again, our approximation ratio is $\frac{1}{2}$ at best.

2 Two points

We first consider the case where the underlying data is a point set. Motivated by the definition of generalized Tukey depth, we consider $\alpha_{1}=\frac{1}{5}$ and $\alpha_{2}=\frac{2}{5}$. Although this result is a special case of Theorem 1, we give its proof for two reasons. First, the algorithm presented in Section 5 relies heavily on this proof. Second, the proof already illustrates the main ideas of the proof of Theorem 1.

Theorem 3.

Let $P$ be a set of $n$ points in general position in the plane. Then there are two points $p_{1}$ and $p_{2}$ in $\mathbb{R}^{2}$ such that

  (1) each closed halfplane containing one of the points $p_{1}$ and $p_{2}$ contains at least $\frac{n}{5}$ of the points of $P$, and

  (2) each closed halfplane containing both $p_{1}$ and $p_{2}$ contains at least $\frac{2n}{5}$ of the points of $P$.

Proof.

Note that condition (1) is equivalent to the condition that every open halfplane containing more than $\frac{4n}{5}$ of the points of $P$ must contain both $p_{1}$ and $p_{2}$. This holds because a closed halfplane and its complementary open halfplane together contain all $n$ points. Similarly, condition (2) is equivalent to the condition that every open halfplane containing more than $\frac{3n}{5}$ of the points of $P$ must contain one of $p_{1}$ and $p_{2}$. We will now construct two points $p_{1}$ and $p_{2}$ satisfying both these conditions.

Let $C$ be the intersection of all open halfplanes containing more than $\frac{4n}{5}$ of the points of $P$. Clearly, $C$ is convex. Also, note that $C$ is closed. The centerpoint region is a strict subset of $C$, and thus $C$ has a non-empty interior. In order to satisfy condition (1), both $p_{1}$ and $p_{2}$ have to be placed in $C$.

Now let $H$ be the set of all open halfplanes containing more than $\frac{3n}{5}$ of the points of $P$. For any $h_{i}$ in $H$, let $c_{i}$ be the intersection of $h_{i}$ and $C$. In order to also satisfy condition (2), we need to find two points $p_{1}$ and $p_{2}$ such that every $c_{i}$ contains at least one of them. To this end, we partition $H$ into two subsets $L$ and $R$. The set $L$ contains all halfplanes that lie on the left side of their respective boundary lines. Analogously, $R$ contains all halfplanes that lie on the right side of their respective boundary lines. For a halfplane $h_{i}$ that has a horizontal boundary line, we put $h_{i}$ in $L$ if and only if it lies above its boundary line.
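The partition into $L$ and $R$ depends only on the inner normal $a=(a_x,a_y)$ of an open halfplane $\{x : a\cdot x > b\}$. The halfplane lies to the left of its boundary line exactly when $a_x<0$. The tie $a_x=0$ (horizontal boundary) is broken towards $L$ when the halfplane lies above its boundary. The helper `lr_class` below is a hypothetical sketch of this rule, not code from the paper.

```python
def lr_class(normal):
    """Assign the open halfplane {x : a . x > b} with inner normal a to 'L' or 'R'."""
    ax, ay = normal
    if ax == 0 and ay == 0:
        raise ValueError("the inner normal must be non-zero")
    if ax != 0:
        # Non-horizontal boundary: the halfplane lies left of its line iff a points left.
        return "L" if ax < 0 else "R"
    # Horizontal boundary: halfplanes lying above their boundary line go to L.
    return "L" if ay > 0 else "R"
```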

Note that any three halfplanes in $L$ have a non-empty intersection. To see this, consider the inclusion-minimal halfplane $h\in L$ with horizontal boundary line and its intersection $r$ with the boundary of the convex hull of $P$. As $h$ is open, $r$ is not in $h$. However, we claim that any point $r^{\prime}$ in $h$ on the convex hull boundary of $P$ in an $\varepsilon$-neighborhood of $r$ lies in every halfplane of $L$. Indeed, if some halfplane in $L$ did not contain $r^{\prime}$, it would contain a strict subset of the intersection of the convex hull of $P$ with $h$. This would contradict the minimality of $h$. The analogous statement holds for $R$.

We will now show that for any two halfplanes $h_{1}$ and $h_{2}$ in $L$, their corresponding regions $c_{1}$ and $c_{2}$ have a non-empty intersection. The same arguments hold for any two halfplanes in $R$. Assume for the sake of contradiction that $c_{1}$ and $c_{2}$ do not intersect. As $C$ and $h_{1}\cap h_{2}$ are convex, there is an open halfplane $g$ with the following two properties. It contains more than $\frac{4n}{5}$ of the points of $P$, and the intersection of the boundary lines of $h_{1}$ and $h_{2}$ lies in $\overline{g}$, the complement of $g$ (see Figure 1). In particular, $g\cap h_{1}$ is a strict subset of $\overline{h_{2}}$. The set $\overline{g}$ contains strictly fewer than $\frac{n}{5}$ of the points of $P$, and $\overline{h_{1}}$ contains strictly fewer than $\frac{2n}{5}$. Hence $g\cap h_{1}$ must contain strictly more than $\frac{2n}{5}$ of the points of $P$. However, $g\cap h_{1}$ is a subset of $\overline{h_{2}}$, which also contains strictly fewer than $\frac{2n}{5}$ of the points of $P$, a contradiction. Hence, $c_{1}$ and $c_{2}$ intersect.
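Written out with counts of points of $P$, the contradiction above reads as follows. This is only a restatement of the argument; note that $\overline{h_{1}}$ has fewer than $\frac{2n}{5}$ points because $h_{1}$ has more than $\frac{3n}{5}$.

```latex
|P\cap g\cap h_1| \;\ge\; n-|P\cap\overline{g}|-|P\cap\overline{h_1}|
   \;>\; n-\tfrac{n}{5}-\tfrac{2n}{5} \;=\; \tfrac{2n}{5},
\qquad
|P\cap g\cap h_1| \;\le\; |P\cap\overline{h_2}| \;<\; \tfrac{2n}{5}.
```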

Figure 1: Two $c_{i}$s associated to $L$ must intersect (left). The intersection is non-empty in other variants (right).

No three halfplanes in $L$ have an empty intersection, and neither do two halfplanes in $L$ together with $C$. Helly's theorem therefore implies that some point lies in $C$ and in all halfplanes of $L$, i.e., all $c_{i}$s associated to $L$ have a non-empty intersection $D_{L}$. Again, the same holds for $R$, with a non-empty intersection $D_{R}$. Placing $p_{1}$ in $D_{L}$ and $p_{2}$ in $D_{R}$, we have thus constructed two points such that conditions (1) and (2) hold. ∎

This result is tight in the following sense: there is a point set for which it is not possible to improve both conditions at the same time. That is, no two points exist such that any halfplane containing one of them contains strictly more than $\frac{n}{5}$ of the points, and any halfplane containing both of them contains strictly more than $\frac{2n}{5}$ of the points. For this, consider a set of $n=5k$ points arranged in the following way. Partition the points into 5 sets $P_{1},\ldots,P_{5}$ of $k$ points each. Place $P_{1},\ldots,P_{5}$ such that the convex hull of each $P_{i}$ is disjoint from the convex hull of the union of the other four sets (see Figure 2).

Figure 2: A construction for which the bounds of Theorem 3 cannot be improved.

Denote by $S_{i,j}$ the convex hull $\textsf{CH}(P_{i}\cup P_{j})$ of $P_{i}\cup P_{j}$. Let $\ell$ be a line through $\textsf{CH}(P_{i})$ and $\textsf{CH}(P_{j})$. Note that any other set $P_{m}$ is not separated by $\ell$ (i.e., it lies entirely on one side). Let $A_{i,j}$ be the side of $\ell$ containing fewer of the other sets, and let $B_{i,j}$ be the other side. For any point $q$ in $\textsf{CH}(P_{1}\cup\ldots\cup P_{5})$, we say that $q$ is above $S_{i,j}$ if it is not in $S_{i,j}$ but is in $A_{i,j}$. Similarly, we say that such a point $q$ is below $S_{i,j}$ if it is not in $S_{i,j}$ but is in $B_{i,j}$. Suppose, for the sake of contradiction, that there exist two points $p_{1}$ and $p_{2}$ with the following properties. Any halfplane containing one of them contains strictly more than $k$ of the points of $P_{1}\cup\ldots\cup P_{5}$. Any halfplane containing both of them contains strictly more than $2k$ of these points. Consider two sets $P_{i}$ and $P_{j}$ such that $A_{i,j}$ contains exactly one other set. First, neither $p_{1}$ nor $p_{2}$ can lie above $S_{i,j}$. Otherwise, we can find a halfplane containing that point and only one of the sets, i.e., only $k$ points. Similarly, $p_{1}$ and $p_{2}$ cannot both lie below $S_{i,j}$. Otherwise, we can find a halfplane containing both points and only two of the sets, i.e., only $2k$ points. Also, we must clearly place both $p_{1}$ and $p_{2}$ in $\textsf{CH}(P_{1}\cup\ldots\cup P_{5})$. Thus, whenever $A_{i,j}$ contains exactly one other set, $S_{i,j}$ must contain at least one of $p_{1}$ and $p_{2}$. However, there are five such $S_{i,j}$, and $P_{1},\ldots,P_{5}$ can be placed so that no three of them have a common intersection. Then each of $p_{1}$ and $p_{2}$ lies in at most two of the five sets, so together they lie in at most four. So no matter how we place $p_{1}$ and $p_{2}$, one of the $S_{i,j}$ will contain neither of them.
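The five sets $S_{i,j}$ in this construction can be identified by a simple side test. In the sketch below, each cluster $P_{i}$ is replaced by a single center placed at a vertex of a regular pentagon. This concrete placement is our assumption; the text fixes only the combinatorial conditions. The function name `one_sided_pairs` is hypothetical. The pairs whose connecting line has exactly one other center on its smaller side are precisely the five diagonals of the pentagon. No three diagonals of a regular pentagon pass through a common point, so sufficiently small clusters around these vertices should give the required placement.

```python
import math
from itertools import combinations

def one_sided_pairs(centers):
    """Pairs (i, j) whose connecting line has exactly one other center on its smaller side."""
    pairs = set()
    for i, j in combinations(range(len(centers)), 2):
        (xi, yi), (xj, yj) = centers[i], centers[j]
        left = right = 0
        for m, (xm, ym) in enumerate(centers):
            if m in (i, j):
                continue
            # The sign of the cross product tells on which side of line(i, j) center m lies.
            cross = (xj - xi) * (ym - yi) - (yj - yi) * (xm - xi)
            if cross > 0:
                left += 1
            else:
                right += 1
        if min(left, right) == 1:
            pairs.add((i, j))
    return pairs

# Hypothetical placement: cluster centers at the vertices of a regular pentagon.
pentagon = [(math.cos(2 * math.pi * t / 5), math.sin(2 * math.pi * t / 5)) for t in range(5)]
```

Here `one_sided_pairs(pentagon)` yields the five diagonals $\{0,2\},\{0,3\},\{1,3\},\{1,4\},\{2,4\}$. The sides of the pentagon have all three other centers on one side, so they are excluded.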

3 An arbitrary number of points

We now strengthen Theorem 3 in four ways: we allow for arbitrarily many query points, we extend it to higher dimensions, we consider mass distributions instead of point sets, and we give a range of possible bounds:

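Theorem 1 (restated). Let $\mu$ be a mass distribution in $\mathbb{R}^{d}$ with $\mu(\mathbb{R}^{d})=1$. Let $\alpha_{1},\ldots,\alpha_{k}$ be non-negative real numbers such that $\alpha_{1}\leq\alpha_{2}\leq\ldots\leq\alpha_{k}$ and $(d-1)\alpha_{k}+\alpha_{i}+\alpha_{j}\leq 1$ for every $i,j$ with $i+j\leq k+1$. Then there are $k$ points $p_{1},\ldots,p_{k}$ in $\mathbb{R}^{d}$ such that for each closed halfspace $h$ containing $j$ of the points $p_{1},\ldots,p_{k}$ we have $\mu(h)\geq\alpha_{j}$.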

We use the following observation. It follows from the fact that if $d+1$ halfspaces have an empty intersection, then every point of $\mathbb{R}^{d}$ lies in at most $d$ of them.

Observation 4.

Let $\mu$ be a mass distribution in $\mathbb{R}^{d}$ with $\mu(\mathbb{R}^{d})=1$. Let $h_{1},\ldots,h_{d+1}$ be $d+1$ open halfspaces with $h_{1}\cap\ldots\cap h_{d+1}=\emptyset$. Then $\mu(h_{1})+\ldots+\mu(h_{d+1})\leq d$.
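The observation can be verified in one line by integrating indicator functions. Since the $h_{i}$ have an empty common intersection, every $x\in\mathbb{R}^{d}$ lies in at most $d$ of them, so the integrand below is bounded by $d$:

```latex
\sum_{i=1}^{d+1}\mu(h_i)
  = \int_{\mathbb{R}^d}\sum_{i=1}^{d+1}\mathbf{1}_{h_i}(x)\,d\mu(x)
  \le \int_{\mathbb{R}^d} d\;d\mu(x)
  = d\,\mu(\mathbb{R}^d) = d.
```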

Proof of Theorem 1.

The result is straightforward for $d=1$, so assume $d\geq 2$. Again, consider the condition that $\mu(h^{\prime})\geq\alpha_{j}$ for each closed halfspace $h^{\prime}$ containing $j$ of the points $p_{1},\ldots,p_{k}$. It is equivalent to the condition that every open halfspace $h$ with $\mu(h)>1-\alpha_{j}$ contains at least $k-j+1$ of the points $p_{1},\ldots,p_{k}$. Otherwise, the complement of $h$ would be a closed halfspace containing at least $j$ of the points but having mass less than $\alpha_{j}$. Let $\alpha_{0}=0$. For $1\leq j\leq k$, we call an open halfspace $h$ a $j$-halfspace if $1-\alpha_{k-j+1}<\mu(h)\leq 1-\alpha_{k-j}$; by the above, a $j$-halfspace has to contain at least $j$ of the points. Consider the $x_{1}$-$x_{2}$-plane, denoted by $X$. For each vector $v=(v_{1},v_{2},\ldots,v_{d})$ in $\mathbb{R}^{d}$, let $\pi(v)=(v_{1},v_{2},0,\ldots,0)$ be the projection of $v$ to $X$. Let $v_{1},\ldots,v_{k}$ be $k$ unit vectors in $X$ such that the angle between any $v_{i}$ and $v_{i+1}$ is $\frac{2\pi}{k}$. This implies that the angle between $v_{k}$ and $v_{1}$ is also $\frac{2\pi}{k}$. For each $v_{i}$, we construct a principal set $V_{i}$ of halfspaces as follows. For each $j$, consider all $j$-halfspaces. For any such halfspace $h$, let $n(h)$ be the normal vector to its bounding hyperplane that points into $h$. Put $h$ into $V_{i}$ if the angle between $\pi(n(h))$ and $v_{i}$ is at most $\frac{j\pi}{k}$. If $\pi(n(h))=0$, place $h$ arbitrarily in $j$ of the $V_{i}$'s. With this construction, each $j$-halfspace is contained in (at least) $j$ principal sets. Thus, suppose that for each principal set we can pick a point lying in all of its halfspaces. Then each $j$-halfspace contains at least $j$ of the chosen points.
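The bookkeeping in this construction can be made concrete. The sketch below uses the hypothetical names `halfspace_level` and `principal_sets`. The first function classifies an open halfspace by its mass as a $j$-halfspace. The second lists the principal sets that receive it, given the angle $\theta$ of its projected inner normal $\pi(n(h))$ in $X$. Indexing the $v_i$ from $0$ and placing them at angles $\frac{2\pi i}{k}$ is an arbitrary but valid choice we make here. For a generic angle, a $j$-halfspace lands in exactly $j$ principal sets.

```python
import math
from fractions import Fraction

def halfspace_level(mass, alphas):
    """Return j if an open halfspace of this mass is a j-halfspace, else 0."""
    k = len(alphas)
    a = [Fraction(0)] + list(alphas)      # a[0] = alpha_0 = 0, a[j] = alpha_j
    for j in range(1, k + 1):
        if 1 - a[k - j + 1] < mass <= 1 - a[k - j]:
            return j
    return 0                              # mass <= 1 - alpha_k: no point is required

def principal_sets(theta, j, k):
    """Indices i whose direction v_i (angle 2*pi*i/k) is within j*pi/k of theta."""
    result = []
    for i in range(k):
        diff = abs(theta - 2 * math.pi * i / k) % (2 * math.pi)
        diff = min(diff, 2 * math.pi - diff)      # angular distance on the circle
        if diff <= j * math.pi / k:
            result.append(i)
    return result
```

With $(\alpha_1,\alpha_2)=(\frac{1}{5},\frac{2}{5})$ as in Theorem 3, a halfspace of mass $\frac{9}{10}$ is a $2$-halfspace (it must contain both points) and one of mass $\frac{7}{10}$ is a $1$-halfspace.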

It remains to show that the halfspaces in each principal set have a common intersection. Let $h_{1},\ldots,h_{d+1}$ be $d+1$ halfspaces in $V_{i}$ and assume for the sake of contradiction that they have no common intersection. Then the positive hull (conical hull) of their projected normal vectors must be $X$. In particular, there are three of them, w.l.o.g. $h_{1}$, $h_{2}$ and $h_{3}$, whose projected normal vectors already have $X$ as their positive hull. Further, among those three halfspaces, there are two, w.l.o.g. $h_{1}$ and $h_{2}$, such that the angles between their projected normal vectors and $v_{i}$ sum up to more than $\pi$. If $h_{1}$ is a $j_{1}$-halfspace, then by construction of $V_{i}$ the angle between $\pi(n(h_{1}))$ and $v_{i}$ is at most $\frac{j_{1}\pi}{k}$. Analogously, if $h_{2}$ is a $j_{2}$-halfspace, the angle between $\pi(n(h_{2}))$ and $v_{i}$ is at most $\frac{j_{2}\pi}{k}$. By the choice of $h_{1}$ and $h_{2}$, we thus have $\frac{(j_{1}+j_{2})\pi}{k}>\pi$. This is equivalent to $j_{1}+j_{2}>k$, and, as $j_{1}$ and $j_{2}$ are integers, to $j_{1}+j_{2}\geq k+1$. By definition of a $j$-halfspace, we have

$$\mu(h_{1})+\mu(h_{2})>1-\alpha_{k+1-j_{1}}+1-\alpha_{k+1-j_{2}}\enspace.$$

Furthermore, we have $\mu(h_{i})>1-\alpha_{k}$ for every $i\in\{1,\ldots,d+1\}$, and thus

$$\mu(h_{1})+\mu(h_{2})+\mu(h_{3})+\ldots+\mu(h_{d+1})>1-\alpha_{k+1-j_{1}}+1-\alpha_{k+1-j_{2}}+(d-1)(1-\alpha_{k})\enspace,$$

which is equivalent to

$$(d-1)\alpha_{k}+\alpha_{k+1-j_{1}}+\alpha_{k+1-j_{2}}>d+1-(\mu(h_{1})+\ldots+\mu(h_{d+1}))\enspace.$$

As $(k+1-j_{1})+(k+1-j_{2})=2k+2-(j_{1}+j_{2})\leq k+1$, we have $(d-1)\alpha_{k}+\alpha_{k+1-j_{1}}+\alpha_{k+1-j_{2}}\leq 1$. Thus $\mu(h_{1})+\ldots+\mu(h_{d+1})>d$, which contradicts Observation 4. ∎

Setting $\alpha_{j}=\frac{j}{kd+1}$, we get a bound for the generalized Tukey depth:

Corollary 5.

Let $\mu$ be a mass distribution in $\mathbb{R}^{d}$ with $\mu(\mathbb{R}^{d})=1$. Then there exist $k$ points $p_{1},\ldots,p_{k}$ in $\mathbb{R}^{d}$ with generalized Tukey depth $\textsf{gtd}_{\mu}(\{p_{1},\ldots,p_{k}\})\geq\frac{1}{kd+1}$.
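As a quick check of Corollary 5 with $Q=\{p_{1},\ldots,p_{k}\}$, the choice $\alpha_{j}=\frac{j}{kd+1}$ is non-decreasing. It also satisfies the precondition of Theorem 1, since the left-hand side below is largest when $i+j=k+1$. Finally, it turns the guarantee $\mu(h)\geq\alpha_{j}$ into a bound on the ratio in the definition of $\textsf{gtd}_{\mu}$:

```latex
\begin{gather*}
(d-1)\alpha_k+\alpha_i+\alpha_j
  = \frac{(d-1)k+i+j}{kd+1}
  \le \frac{(d-1)k+k+1}{kd+1} = 1
  \qquad (i+j\le k+1),\\
\frac{\mu(h)}{|h\cap Q|} \ge \frac{\alpha_j}{j} = \frac{1}{kd+1}
  \qquad\text{whenever } |h\cap Q| = j\ge 1 .
\end{gather*}
```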

4 Triangles

As mentioned before, the $\frac{1}{3}$-quantile and the $\frac{2}{3}$-quantile can also be interpreted as a one-dimensional simplex. Every halfline that contains a part of this simplex contains at least $\frac{1}{3}$ of the underlying data set, and every halfline that contains the whole simplex contains at least $\frac{2}{3}$ of it. For this interpretation, we give a generalization to two dimensions. For ease of presentation, we only give a proof for point sets instead of mass distributions and for fixed values of $\alpha$ and $\beta$.

Theorem 6.

Let $P$ be a set of $n$ points in general position in the plane. Then there are three points $p_{1}$, $p_{2}$ and $p_{3}$ in $\mathbb{R}^{2}$ such that

  (1) each closed halfplane containing one of the points $p_{1}$, $p_{2}$ and $p_{3}$ contains at least $\frac{n}{6}$ of the points of $P$, and

  (2) each closed halfplane containing all of $p_{1}$, $p_{2}$ and $p_{3}$ contains at least $\frac{n}{2}$ points of $P$.

Note that the conclusion of this theorem has the form of Theorem 1 with $k=3$, $\alpha_{1}=\alpha_{2}=\frac{1}{6}$ and $\alpha_{3}=\frac{1}{2}$. However, as $\alpha_{3}+\alpha_{3}+\alpha_{1}>1$, the precondition of Theorem 1 is not satisfied. The proof of this result uses ideas similar to those in the above proofs.

Proof of Theorem 6.

Let $C$ be the intersection of all open halfplanes containing more than $\frac{5n}{6}$ of the points of $P$. Just as in the proof of Theorem 3, condition (1) is equivalent to $p_{1}$, $p_{2}$ and $p_{3}$ lying in $C$. Similarly, condition (2) is equivalent to the following statement: every open halfplane $h$ containing more than $\frac{n}{2}$ of the points of $P$ contains at least one of $p_{1}$, $p_{2}$ and $p_{3}$. For the latter, let $n_{i}$ be a vector, and let $N_{i}$ be the set of all of these halfplanes that have $n_{i}$ as their normal vector. The intersection of all elements of $N_{i}$ is a closed halfplane $h_{i}$. Let $K$ be the set of all of these $h_{i}$, i.e., for every possible direction of $n_{i}$. Then condition (2) is equivalent to the following statement: every halfplane $h$ in $K$ contains at least one of $p_{1}$, $p_{2}$ and $p_{3}$. For each such halfplane $h_{i}$, let $c_{i}$ be the intersection of $h_{i}$ and $C$. Note that $c_{i}$ is compact. We thus want to show that we can find three points $p_{1}$, $p_{2}$ and $p_{3}$ such that each $c_{i}$ contains at least one of them. Let $H$ be the set of $c_{i}$s that are minimal with respect to set inclusion. Clearly, it is enough to show the claim just for the elements of $H$. Let $N$ be the set of normal vectors of the defining halfplanes of the elements of $H$. Note that there is a natural mapping from $N$ to a subset of the unit circle $S^{1}$. We say that a subset of $N$ is connected if its image under this mapping to $S^{1}$ is connected.

First, we show that among any three elements of $H$, two of them intersect. Let $c_{1}$, $c_{2}$ and $c_{3}$ be elements of $H$, and let $h_{1}$, $h_{2}$ and $h_{3}$ be their associated halfplanes. Assume for the sake of contradiction that $c_{1}$, $c_{2}$ and $c_{3}$ are pairwise disjoint. Let $n_{i}$ be the normal vector of $h_{i}$, and let $A$ be the positive hull of $n_{1}$, $n_{2}$ and $n_{3}$. Then $A$ is either a cone or the whole plane; see Figure 3. If $A$ is the whole plane, then $h_{1}$, $h_{2}$ and $h_{3}$ have no common intersection. Otherwise, if $A$ is a cone, then one of $n_{1}$, $n_{2}$ and $n_{3}$ can be described as a positive linear combination of the other two. In particular, $h_{1}$, $h_{2}$ and $h_{3}$ have a common intersection, and one of them is redundant in the description of $h_{1}\cap h_{2}\cap h_{3}$. We thus consider two cases, depending on whether $h_{1}$, $h_{2}$ and $h_{3}$ have a common intersection.

Figure 3: Disjoint $c_{1}$, $c_{2}$, $c_{3}$, where the positive hull of the normal vectors spans the whole plane (left) or not (right).

First, assume that $h_{1}$, $h_{2}$ and $h_{3}$ have no common intersection. Then $h_{1}$, $h_{2}$ and $h_{3}$ partition the plane into seven regions (see Figure 4). These are $h_{i}\cap h_{j}$ for $i\neq j$, $h_{i}\setminus(h_{j}\cup h_{k})$ for pairwise distinct $i,j,k$, and $h_{1}^{c}\cap h_{2}^{c}\cap h_{3}^{c}$. Each $h_{i}\cap h_{j}$ contains strictly fewer than $\frac{n}{6}$ of the points of $P$, as otherwise the corresponding $c_{i}$ and $c_{j}$ would intersect. In particular, $h_{2}\setminus h_{1}$ contains more than $\frac{n}{2}-\frac{n}{6}=\frac{n}{3}$ points of $P$. It follows that $(h_{1}\cup h_{2})^{c}$, and thus also $h_{3}\setminus(h_{1}\cup h_{2})$, contains strictly fewer than $\frac{n}{6}$ of the points of $P$. The number of points in $h_{3}$ is the sum of the numbers of points in $h_{3}\cap h_{1}$, $h_{3}\cap h_{2}$ and $h_{3}\setminus(h_{1}\cup h_{2})$. Each of these sets contains strictly fewer than $\frac{n}{6}$ of the points of $P$. Hence $h_{3}$ contains fewer than $\frac{n}{2}$ of the points of $P$, which is a contradiction. This concludes the first case.
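The counting in this first case can be summarized as follows. All quantities are numbers of points of $P$. We use that each $h_{i}$ contains at least $\frac{n}{2}$ points of $P$. We also use that the three pieces of $h_{3}$ are disjoint, since $h_{1}\cap h_{2}\cap h_{3}=\emptyset$.

```latex
\begin{align*}
|P\cap(h_2\setminus h_1)| &= |P\cap h_2|-|P\cap h_1\cap h_2| > \tfrac{n}{2}-\tfrac{n}{6} = \tfrac{n}{3},\\
|P\cap(h_1\cup h_2)^c| &= n-|P\cap h_1|-|P\cap(h_2\setminus h_1)| < n-\tfrac{n}{2}-\tfrac{n}{3} = \tfrac{n}{6},\\
|P\cap h_3| &= |P\cap h_3\cap h_1|+|P\cap h_3\cap h_2|+|P\cap(h_3\setminus(h_1\cup h_2))| < 3\cdot\tfrac{n}{6} = \tfrac{n}{2}.
\end{align*}
```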

Figure 4: No three elements of $H$ can be pairwise disjoint.

For the second case, assume that $h_1$, $h_2$ and $h_3$ have a common intersection and that one of the halfplanes is redundant in the description of $h_1\cap h_2\cap h_3$; without loss of generality, let it be $h_3$. Just as in the first case, each $h_i\cap h_j$ contains strictly fewer than $\frac{n}{6}$ of the points of $P$, and again it follows that $h_3\setminus(h_1\cup h_2)$ contains strictly fewer than $\frac{n}{6}$ of the points of $P$. The sets $h_3\cap h_1$, $h_3\cap h_2$ and $h_3\setminus(h_1\cup h_2)$ cover $h_3$, implying that $h_3$ contains fewer than $\frac{n}{2}$ of the points of $P$, which is again a contradiction. This concludes the proof that among any three elements of $H$, two intersect.

It remains to show that we can find three points $p_1$, $p_2$ and $p_3$ such that each element of $H$ contains at least one of them. This can be achieved by picking one element of $H$ and placing two points $p_1$ and $p_2$ at the extreme intersection points of its defining line with the boundary of $C$; since among any three elements of $H$ two intersect, any two elements containing neither $p_1$ nor $p_2$ must intersect, and we may apply Helly's theorem in dimension one. However, we actually have more flexibility in choosing $p_1$. Note that the normal vectors pointing into the halfplanes defining the elements of $H$ define a circular order on $H$. Place $p_1$ at a topmost point of the boundary of $C$. Let $h_1$ be the first element of $H$ in counterclockwise direction whose defining halfplane does not contain $p_1$ in its interior. Place $p_2$ at the intersection of the defining line of $h_1$ with the boundary of $C$ that is furthest from $p_1$ in counterclockwise direction. Since $h_1$ is minimal, any element of $H$ intersecting $h_1$ contains either $p_1$ or $p_2$. (Note that so far $h_1$ contains neither $p_1$ nor $p_2$.) Therefore, all elements of $H$ that do not intersect $h_1$ have a common intersection, in which we place $p_3$. Recall that the defining halfplanes are open, and therefore no element of $H$ intersects $h_1$ in a single point. We may therefore move $p_2$ slightly in clockwise direction such that it is also contained in $h_1$. ∎
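The first construction relies on Helly's theorem in dimension one: pairwise intersecting intervals on a line share a common point. The following minimal Python sketch illustrates only this step. It assumes that the relevant elements of $H$ have already been mapped to open intervals $(l, r)$ along the boundary of $C$; that mapping is not shown, and the function name is hypothetical.

```python
def common_point_of_open_intervals(intervals):
    """Return a point lying in every open interval (l, r) of `intervals`.

    For pairwise intersecting open intervals, the interval with the largest
    left endpoint and the interval with the smallest right endpoint intersect,
    so max(l) < min(r), and the midpoint of these two values lies in every
    interval (one-dimensional Helly theorem).
    """
    if not intervals:
        raise ValueError("need at least one interval")
    left = max(l for l, _ in intervals)
    right = min(r for _, r in intervals)
    if not left < right:
        # Only reachable if the input violates the pairwise-intersection assumption.
        raise ValueError("intervals are not pairwise intersecting")
    return (left + right) / 2
```

For example, `common_point_of_open_intervals([(0, 3), (1, 4), (2, 5)])` returns `2.5`.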

The general statement (Theorem 2) can be proved analogously.

5 Construction in the plane

In this section, we describe algorithms for constructing the points described in Theorems 3 and 6. We first observe that the convex regions defined by the intersections of the halfplanes in sets like $L$ and $R$ in the proof of Theorem 3 correspond to levels in the dual line arrangement. We use the duality $p^*=(y=kx+d)\iff p=(k,d)$ that maps a point $p$ to a line $p^*$. The $k$-level of a line arrangement is the set of points with exactly $k-1$ lines below it and not more than $n-k$ lines above it. (It thus consists of segments of the lines of the arrangement.) Suppose we are given $\alpha_1$ and $\alpha_2$ such that $0<\alpha_1\leq\alpha_2$ and $\alpha_1+2\alpha_2=1$. Let $U$ be the set of open halfplanes that are above their boundary lines and contain more than $(1-\alpha_2)n$ points of $P$, and let $D_U$ be their intersection. A point $p$ is in $D_U$ if there is no line through it having at least $\lfloor(1-\alpha_2)n+1\rfloor$ points of $P$ above it. If the dual line $p^*$ of $p$ contains a point $\ell^*$ below the $\lceil\alpha_2 n\rceil$-level of the dual line arrangement of $P$, then the line $\ell$ passes through $p$ and has more than $(1-\alpha_2)n$ points of $P$ above it. Since a line has a point below that level if and only if it intersects the interior of the convex hull of the points on or below the level, the lines intersecting the interior of the convex hull of the $\lceil\alpha_2 n\rceil$-level are exactly those whose primal points are not in $D_U$. The supporting lines of the segments of the convex hull of the $\lceil\alpha_2 n\rceil$-level give the primal points that bound $D_U$. Matoušek [11] describes an algorithm for constructing the convex hull of the $k$-level of a line arrangement in $O(n\log^4 n)$ time. The $k$-hull of a set $P$ of $n$ points in the plane is the set of points $p$ in $\mathbb{R}^2$ such that any closed halfplane defined by a line through $p$ contains at least $k$ points of $P$.
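By the definition above, $p$ lies in the $k$-hull exactly when every closed halfplane bounded by a line through $p$ contains at least $k$ points of $P$, that is, when the Tukey depth of $p$ is at least $k$. The brute-force $O(n^2)$ Python sketch below only illustrates this definition. It is not the method of [11], it assumes integer coordinates so that all orientation tests are exact, and its function names are hypothetical.

```python
def _cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 iff b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _dot(o, a, b):
    """Dot product of the vectors a - o and b - o."""
    return (a[0] - o[0]) * (b[0] - o[0]) + (a[1] - o[1]) * (b[1] - o[1])


def tukey_depth(p, points):
    """Minimum number of points in a closed halfplane bounded by a line through p.

    The count only changes when the bounding line rotates past a point q, and
    the minimum is attained by lines through no other point. It therefore
    suffices to inspect lines slightly rotated away from each line pq. Such a
    halfplane holds the points strictly on one side of pq, the points of pq on
    one of the two rays from p, and every copy of p itself.
    """
    at_p = sum(1 for q in points if q == p)
    others = [q for q in points if q != p]
    if not others:
        return len(points)
    best = len(points)
    for q in others:
        left = right = forward = backward = 0
        for r in others:
            c = _cross(p, q, r)
            if c > 0:
                left += 1
            elif c < 0:
                right += 1
            elif _dot(p, q, r) > 0:
                forward += 1   # on line pq, on the same ray from p as q
            else:
                backward += 1  # on line pq, on the opposite ray
        for side in (left, right):
            for ray in (forward, backward):
                best = min(best, at_p + side + ray)
    return best


def in_k_hull(p, points, k):
    """True iff every closed halfplane bounded by a line through p has >= k points."""
    return tukey_depth(p, points) >= k
```

For the four corners of a square, the center has depth 2, a corner has depth 1, and a point outside the square has depth 0.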
The set $C$ in the proof of Theorem 3 is the intersection of all open halfplanes containing more than $\frac{4n}{5}$ points. $C$ is thus the $\lceil\frac{n}{5}\rceil$-hull of $P$. The $k$-hull of $P$ is obtained by computing the convex hulls of the $k$-level and the $(n-k)$-level of the dual line arrangement of $P$, which give the upper and lower envelopes of the $k$-hull [11]. To construct the points from Theorems 3 and 6 without explicitly constructing the levels, we use Matoušek's algorithmic tools from [11].

Lemma 7 (Matoušek [11, Lemma 3.2]).

In an arrangement of $n$ lines, let $\gamma$ be the boundary of the convex hull of the points on or below the $k$-level. Given the arrangement, $k$, and a point $p$, one can find the tangent to $\gamma$ passing through $p$ and touching $\gamma$ to the right of $p$ (if it exists) in time $O(n\log^2 n)$.
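When $\gamma$ is explicitly available as a list of vertices (as temporarily assumed in the proof of Lemma 8), the tangent of Lemma 7 can be found by binary search. The Python sketch below is not Matoušek's algorithm, which avoids constructing $\gamma$. It assumes that the bounded part of $\gamma$ is given as a concave chain of vertices with strictly increasing $x$-coordinates and that $p$ lies strictly to the left of all of them. The unbounded rays of $\gamma$ are ignored, and the function names are hypothetical.

```python
def orient(a, b, c):
    """> 0 iff c lies to the left of the directed line a->b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def right_tangent_vertex(chain, p):
    """Index of the chain vertex touched by the tangent from p (to the right of p).

    chain: vertices v_0..v_m of a concave chain, x strictly increasing.
    p: point with p[0] < chain[0][0].
    The slope from p to v_i increases while p lies above the supporting line of
    the edge (v_i, v_{i+1}) and decreases afterwards. The touching vertex is
    therefore the first i with orient(v_i, v_{i+1}, p) <= 0, or m if there is none.
    """
    lo, hi = 0, len(chain) - 1        # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if orient(chain[mid], chain[mid + 1], p) > 0:
            lo = mid + 1              # slope still increases past edge mid
        else:
            hi = mid
    return lo
```

With `chain = [(0, 0), (1, 1), (2, 1), (3, 0)]`, the tangent from `(-1, 0)` touches vertex 1 and the tangent from `(-1, 5)` touches vertex 3. In both cases no vertex lies above the line.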

Lemma 8.

Given an arrangement of $n$ lines and two numbers $k<l\leq n$, as well as a halfplane $h$, a line separating the $k$-level from the intersection of $h$ with the $l$-level can be found in $O(n\log^3 n)$ time, if it exists. The separating line is tangent to both level parts and, from left to right, first intersects the $k$-level and then the relevant part of the $l$-level.

Proof.

Let $\gamma$ be the boundary of the convex hull of all points below the $k$-level, and let $\nu$ be the intersection of $h$ with the $l$-level. Note that $\nu$ might not be connected. Suppose we want our line to be the counterclockwise bitangent of $\gamma$ and $\nu$ (i.e., from left to right, it first intersects $\gamma$, no point of which lies above the line, and then $\nu$). Our algorithm works by obtaining tangents to $\nu$ through points on $\gamma$. Matoušek's $O(n\log^2 n)$ algorithm for determining the tangent to a level through a given point, touching the level to the right of that point [11, Lemma 3.2] (our Lemma 7), also works directly for parts of a level such as $\nu$. It requires a subroutine that decides in $O(n\log n)$ time whether a given line $\ell$ intersects the level (or, in our case, the partial level $\nu$). This can be done by sorting the intersections of the lines of the arrangement along $\ell$ (see also [11, Lemma 3.1]) as well as along the line bounding $h$: either $\ell$ intersects the relevant part of $\nu$, or we can compare the intersection of $h$ with $\ell$ to the intersections of $h$ with $\nu$ to determine whether there is a point of $\nu$ below $\ell$.
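The sorting step of this subroutine can be illustrated for a full level, without the halfplane $h$. In the Python sketch below, all names are hypothetical. A line $y=ax+b$ is represented as the pair $(a,b)$; the sketch sorts the crossings of the arrangement with a query line $\ell$ and sweeps along $\ell$, maintaining the number of arrangement lines strictly below it. By the definition of the $k$-level, a point of $\ell$ that lies on no arrangement line is strictly below the $k$-level exactly when at most $k-1$ lines are below it, and strictly above it when at least $k$ are. Comparing the minimum and maximum counts with $k$ thus decides on which sides of the level $\ell$ passes. The sketch assumes general position (in particular, $\ell$ is not one of the arrangement lines) and uses exact rational arithmetic for the crossings.

```python
from fractions import Fraction
from itertools import groupby


def below_count_range(lines, query):
    """Min and max number of lines strictly below `query`, taken along `query`.

    Lines are pairs (a, b) for y = a*x + b. Counts are taken between
    consecutive crossings, where they are constant. Runs in O(n log n).
    """
    a0, b0 = query
    count = 0            # lines below the query line as x -> -infinity
    events = []          # (x-coordinate of a crossing, change in count)
    for a, b in lines:
        if a == a0:      # parallel: below everywhere or nowhere
            if b < b0:
                count += 1
            continue
        x = Fraction(b0 - b) / Fraction(a - a0)
        if a > a0:       # steeper: below on the left, above on the right
            count += 1
            events.append((x, -1))
        else:            # flatter: above on the left, below on the right
            events.append((x, +1))
    events.sort()
    lo = hi = count
    for _, group in groupby(events, key=lambda e: e[0]):
        count += sum(delta for _, delta in group)
        lo, hi = min(lo, count), max(hi, count)
    return lo, hi


def passes_below_level(lines, query, k):
    """True if some point of `query` lies strictly below the k-level."""
    return below_count_range(lines, query)[0] <= k - 1


def passes_above_level(lines, query, k):
    """True if some point of `query` lies strictly above the k-level."""
    return below_count_range(lines, query)[1] >= k
```

For the horizontal lines $y=0,1,2$ and the query line $y=1.5$, two lines are always below the query, so it stays above the 2-level and below the 3-level.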

Suppose first that we are given $\gamma$. (Obtaining it requires $O(n\log^4 n)$ time, though, so we eventually get rid of this assumption.) The convex hull of a level is known to have at most $n$ vertices [11, Lemma 2.1]. For a point $p$ on $\gamma$, we can find in $O(n\log^2 n)$ time the point $q$ on $\nu$ such that the line $pq$ has no point of $\nu$ below it. We can thus find, by binary search on the $O(n)$ vertices of $\gamma$, a vertex $p$ with $q$ on $\nu$ such that $pq$ separates $\gamma$ and $\nu$. The binary search takes $O(\log n)$ steps of $O(n\log^2 n)$ time each, so together with the construction of $\gamma$ this gives an $O(n\log^4 n)$ time algorithm for obtaining the bitangent. To improve on that bound, we need to avoid the explicit construction of $\gamma$ when finding the tangents to $\nu$.

To this end, we consider Matoušek's algorithm for constructing the convex hull boundary $\gamma$ of a level and compute only the relevant part (see [11, Section 4]). In particular, the algorithm works by finding, for a constant $c$ and two vertical lines, $c-1$ further vertical lines between the given ones such that there are at most $n^2/c$ crossings of the arrangement between any two consecutive vertical lines. This can be done in $O(n)$ time (as described in [12]). The tangents to $\gamma$ at its intersection points with the vertical lines can be computed in $O(n\log^3 n)$ time [11, Lemma 3.3]. It is shown in [11] that, when choosing $c=64$, there are at most $n/2$ lines of the arrangement relevant for the construction of $\gamma$ between two such vertical lines, and these lines can be found in $O(n)$ time. The original algorithm proceeds recursively within each interval defined by two neighboring vertical lines after removing the non-relevant lines. In our adaptation, however, we find the interval that contains the point $p$ of $\gamma$ for which there is a point $q$ on $\nu$ such that $pq$ is tangent to $\gamma$ and separates $\gamma$ and $\nu$. (We do this by considering the tangent to $\gamma$ at each of the constantly many intersections of a vertical line with $\gamma$.) Once we have found this interval, we can prune at least $n/2$ of the lines and recurse inside this interval. Note, however, that we cannot prune the set of lines when looking for a tangent to $\nu$. Thus, in each recursive call, we need $O(n\log^2 n)$ time for computing the tangent. As the recursion depth is $O(\log n)$, this amounts to $O(n\log^3 n)$ in total. Also, for $n_i$ lines during the $i$th recursion, we need $O(n_i\log^3 n_i)\subseteq O(n_i\log^3 n)$ time for determining the intervals. As $n_i$ decreases geometrically, this also amounts to $O(n\log^3 n)$. This is the total running time for finding the bitangent, as claimed. ∎
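The running-time bound at the end of this proof can be written out as follows. It assumes, as stated, $O(\log n)$ levels of recursion and at most $n_i\le n/2^i$ lines in the $i$th recursive call, since each round prunes at least half of the remaining lines.

```latex
\underbrace{\sum_{i=0}^{O(\log n)} O(n\log^2 n)}_{\text{tangents to } \nu}
\;+\; \underbrace{\sum_{i\ge 0} O\!\left(n_i\log^3 n\right)}_{\text{intervals}}
\;\le\; O(n\log^3 n) + O\!\Big(\log^3 n\sum_{i\ge 0}\frac{n}{2^i}\Big)
\;=\; O(n\log^3 n) + O(2n\log^3 n) \;=\; O(n\log^3 n).
```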

We call such a line the counterclockwise bitangent of the two subsets of the plane (i.e., its intersection with the region not above it has a smaller $x$-coordinate than its intersection with the region not below it). Note that by mirroring the plane horizontally or vertically, the lemma also provides other types of bitangents. Figure 5 shows an example.

Figure 5: A counterclockwise bitangent (brown, dash-dotted) between the $\lceil\frac{2n}{5}\rceil$-level (blue) and the $\lfloor\frac{4n}{5}\rfloor$-level (red) of an arrangement of seven lines (left). The primal point configuration is shown to the right; there, the orange region corresponds to the $\lceil\frac{n}{5}\rceil$-hull $C$, and the hatched green region corresponds to $D_U$. Observe that there can be vertices of $D_U$ outside of $C$.

Lemma 8 can also be obtained using the framework of Langerman and Steiger [10], similar to the computation of the Oja depth in [3]. (We merely sketch its application in this paragraph.) Let $P$ be the primal point set of the arrangement. In the formulation of [3], suppose there is a function $f:\mathbb{R}^2\to\mathbb{R}$ that attains its minimum at an intersection of supporting lines of point pairs of $P$ and for which, given a point $a$, we can find in $T(n)$ time a witness halfplane through $a$ such that $f(q)\geq f(a)$ for all points $q$ in that halfplane. Then there is an $O(T(n)\log^2 n+n\log^3 n)$ time algorithm for finding such a minimum point. We can test in $O(n\log n)$ time whether the dual $a^*$ of $a$ intersects $\gamma$ or $\nu$, and define a witness halfplane using the primal of such an intersection point (as described at the beginning of the previous proof), or the intersection point of $a^*$ with the line bounding $h$. If the dual line $a^*$ separates $\gamma$ from $\nu$, we need a separator with a larger slope to obtain a counterclockwise bitangent, and thus define a witness halfplane to the right of a vertical line through $a$.

Lemma 8 can now be used to obtain the following result.

Theorem 9.

Given a set $P$ of $n$ points in the plane, two points satisfying the conditions of Theorem 3 can be constructed in time $O(n\log^3 n)$.

Proof.

To find a point $p_1$ in the intersection of $C$ and $D_U$, observe first that we can restrict our attention in the dual to the convex hull of the points above the $\lfloor(1-\alpha_1)n\rfloor$-level of the dual line arrangement. This is because any primal line with more than $(1-\alpha_1)n$ points above it (which corresponds to a dual point below the $\lceil\alpha_1 n\rceil$-level) also defines a halfplane in $U$. A point in the intersection of $D_U$ and $C$ thus corresponds to a line on or above the $\lceil\alpha_2 n\rceil$-level and on or below the $\lfloor(1-\alpha_1)n\rfloor$-level. We find a bitangent to these two levels in $O(n\log^3 n)$ time using Lemma 8 (with $h=\mathbb{R}^2$). The primal point of this line is $p_1$; see the point indicated by the brown box in Figure 5 (right). We obtain $p_2$ analogously. ∎

Theorem 10.

Three points as described in Theorem 6 can be computed in time $O(n\log^3 n)$.

Proof.

Consider the dual line arrangement of the point set. The points $p_1,p_2,p_3$ dualize to three lines $p_1^*,p_2^*,p_3^*$ that lie between the $\lceil\frac{n}{6}\rceil$-level and the $\lfloor\frac{5n}{6}\rfloor$-level of the arrangement such that every point on the middle level has at least one of these lines above it and one of these lines below it. (We assume for simplicity that $n$ is odd, so that the middle level is the $\lceil\frac{n}{2}\rceil$-level of the arrangement; if $n$ is even, one has to consider the points between the $\frac{n}{2}$-level and the $(\frac{n}{2}+1)$-level.) Theorem 6 asserts that such lines exist, and its proof tells us that we can choose one of these lines to be an arbitrary tangent of one of the levels not intersecting the interior of the other one. We denote by $\gamma_b$ and $\gamma_t$ the convex hull boundaries of the points on or below the $\lceil\frac{n}{6}\rceil$-level and of the points on or above the $\lfloor\frac{5n}{6}\rfloor$-level, respectively.

We let $p_1^*$ be the clockwise bitangent of $\gamma_b$ and $\gamma_t$, which we can obtain in $O(n\log^3 n)$ time using Lemma 8. For simplicity of explanation, we also compute the counterclockwise bitangent $\ell$. (This step may be omitted in an actual implementation, but assuming it to be given facilitates the explanation and does not change the asymptotic running time.)

Figure 6: An arrangement of seven lines with the $\lceil\frac{n}{6}\rceil$-level and the $\lfloor\frac{5n}{6}\rfloor$-level (blue) and the clockwise bitangent $p_1^*$ (red, dashed) between them. The green boxes indicate the two points defining the counterclockwise bitangent between the $\lceil\frac{n}{6}\rceil$-level and $\mu_1$ (brown).

The line $p_1^*$ intersects the middle level of the arrangement. Let $\mu_1$ be the part of the middle level below $p_1^*$, and let $\mu_2$ be the part above it. Note that each of these parts may be disconnected. Using Lemma 8, we search for the counterclockwise bitangent between $\gamma_b$ (or, equivalently, the $\lceil\frac{n}{6}\rceil$-level) and $\mu_1$ (which is the intersection of the middle level with a halfplane defined by $p_1^*$) in $O(n\log^3 n)$ time. If it exists and its intersection point with $\gamma_b$ lies between the intersections of $\gamma_b$ with $p_1^*$ and $\ell$, we choose this line to be $p_2^*$. Otherwise, we continue our search on $\gamma_t$ in the same way (i.e., we look for the counterclockwise bitangent between $\gamma_t$ and $\mu_1$). The line $p_3^*$ can be found in an analogous manner. ∎

6 Conclusion

We proposed a generalization of quantiles to higher dimensions based on a generalization of Tukey depth to multiple points. Our bounds and algorithms seem to be merely a first step in this direction, and we can identify several interesting open problems. Except for special cases of Theorem 1, we do not believe that our bounds are tight, and in particular we expect significantly better bounds in higher dimensions. Naturally, there are many other range spaces for which this problem could be considered, e.g., convex sets, as in [6].

From an algorithmic point of view, the bottleneck for the running time of our approach is Lemma 8. The current methods result in $O(n\log^3 n)$ time. While solutions to such problems can usually only be verified in $\Theta(n\log n)$ time (see, e.g., [2, 17]), a linear-time algorithm, as for centerpoints [9], is conceivable. For arbitrarily many points, it seems tedious but doable to apply approaches similar to those in the proof of Theorem 9. Is there a good bound on the running time that is independent of $|Q|$?

Acknowledgments.

We thank Emo Welzl for initiating discussions on this topic, as well as anonymous reviewers for helpful comments.

References

  • [1] Greg Aloupis. Geometric measures of data depth. In Regina Y. Liu, Robert Serfling, and Diane L. Souvaine, editors, Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, pages 147–158. DIMACS/AMS, 2003.
  • [2] Greg Aloupis, Carmen Cortés, Francisco Gómez, Michael Soss, and Godfried Toussaint. Lower bounds for computing statistical depth. Comput. Statist. Data Anal., 40(2):223–229, 2002. doi:10.1016/S0167-9473(02)00032-4.
  • [3] Greg Aloupis, Stefan Langerman, Michael A. Soss, and Godfried T. Toussaint. Algorithms for bivariate medians and a Fermat-Torricelli problem for lines. Comput. Geom., 26(1):69–79, 2003. doi:10.1016/S0925-7721(02)00173-6.
  • [4] Boris Aronov, Franz Aurenhammer, Ferran Hurtado, Stefan Langerman, David Rappaport, Carlos Seara, and Shakhar Smorodinsky. Small weak epsilon-nets. Comput. Geom., 42(5):455–462, 2009. doi:10.1016/j.comgeo.2008.02.005.
  • [5] Maryam Babazadeh and Hamid Zarrabi-Zadeh. Small weak epsilon-nets in three dimensions. In Proc. 18th Canadian Conference on Computational Geometry (CCCG), 2006.
  • [6] Boris Bukh and Gabriel Nivasch. One-sided epsilon-approximants. In Martin Loebl, Jaroslav Nešetřil, and Robin Thomas, editors, A Journey Through Discrete Mathematics: A Tribute to Jiří Matoušek, pages 343–356. Springer, 2017. doi:10.1007/978-3-319-44479-6_12.
  • [7] Probal Chaudhuri. On a geometric notion of quantiles for multivariate data. J. American Statist. Assoc., 91(434):862–872, 1996.
  • [8] David Haussler and Emo Welzl. ε-nets and simplex range queries. Discrete Comput. Geom., 2:127–151, 1987. doi:10.1007/BF02187876.
  • [9] Shreesh Jadhav and Asish Mukhopadhyay. Computing a centerpoint of a finite planar set of points in linear time. Discrete Comput. Geom., 12:291–312, 1994. doi:10.1007/BF02574382.
  • [10] Stefan Langerman and William L. Steiger. Optimization in arrangements. In 20th Symp. on Theoretical Aspects of Computer Science (STACS), volume 2607 of LNCS, pages 50–61, 2003. doi:10.1007/3-540-36494-3_6.
  • [11] Jiří Matoušek. Computing the center of planar point sets. In Jacob E. Goodman, Richard Pollack, and William Steiger, editors, Discrete and Computational Geometry: Papers from the DIMACS Special Year, volume 6 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 221–230. DIMACS/AMS, 1990.
  • [12] Jiří Matoušek. Construction of epsilon-nets. Discrete Comput. Geom., 5:427–448, 1990. doi:10.1007/BF02187804.
  • [13] Jiří Matoušek. Tight upper bounds for the discrepancy of half-spaces. Discrete Comput. Geom., 13(3):593–601, 1995. doi:10.1007/BF02574066.
  • [14] Jiří Matoušek, Emo Welzl, and Lorenz Wernisch. Discrepancy and approximations for bounded VC-dimension. Combinatorica, 13(4):455–466, 1993. doi:10.1007/BF01303517.
  • [15] Nabil Mustafa and Kasturi Varadarajan. Epsilon-approximations and epsilon-nets. In Handbook of Discrete and Computational Geometry. 2017.
  • [16] Nabil H. Mustafa and Saurabh Ray. An optimal extension of the centerpoint theorem. Comput. Geom., 42(6):505–510, 2009. doi:10.1016/j.comgeo.2007.10.004.
  • [17] Sambuddha Roy and William Steiger. Some combinatorial and algorithmic applications of the Borsuk-Ulam theorem. Graphs Combin., 23:331–341, 2007. doi:10.1007/s00373-007-0716-1.
  • [18] Mudassir Shabbir. Some results in computational and combinatorial geometry. PhD thesis, Rutgers The State University of New Jersey, 2014.
  • [19] John W. Tukey. Mathematics and the picturing of data. In Proc. International Congress of Mathematicians, pages 523–531, 1975.