
Maximum Volume Subset Selection for Anchored Boxes

Karl Bringmann Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany.    Sergio Cabello Department of Mathematics, IMFM, and Department of Mathematics, FMF, University of Ljubljana, Slovenia. Supported by the Slovenian Research Agency, program P1-0297 and project L7-5459.    Michael T.M. Emmerich Leiden Institute of Advanced Computer Science (LIACS), Leiden University, the Netherlands.
Abstract

Let BB be a set of nn axis-parallel boxes in d\mathbb{R}^{d} such that each box has a corner at the origin and the other corner in the positive quadrant of d\mathbb{R}^{d}, and let kk be a positive integer. We study the problem of selecting kk boxes in BB that maximize the volume of the union of the selected boxes. This research is motivated by applications in skyline queries for databases and in multicriteria optimization, where the problem is known as the hypervolume subset selection problem. It is known that the problem can be solved in polynomial time in the plane, while the best known running time in any dimension d3d\geq 3 is Ω((nk))\Omega\big{(}\binom{n}{k}\big{)}. We show that:

  • The problem is NP-hard already in 3 dimensions.

  • In 3 dimensions, we break the bound Ω((nk))\Omega\big{(}\binom{n}{k}\big{)}, by providing an nO(k)n^{O(\sqrt{k})} algorithm.

  • For any constant dimension dd, we present an efficient polynomial-time approximation scheme.

1 Introduction

An anchored box is an orthogonal range of the form box(p):=[0,p1]××[0,pd]0d\textsc{box}(p):=[0,p_{1}]\times\ldots\times[0,p_{d}]\subset\mathbb{R}_{\geq 0}^{d}, spanned by the point p>0dp\in\mathbb{R}_{>0}^{d}. This paper is concerned with the problem Volume Selection: Given a set PP of nn points in >0d\mathbb{R}_{>0}^{d}, select kk points in PP maximizing the volume of the union of their anchored boxes. That is, we want to compute

VolSel(P,k):=maxSP,|S|=kvol(pSbox(p)),\textsc{VolSel}(P,k):=\max_{S\subseteq P,\,|S|=k}\textsc{vol}\Big{(}\bigcup_{p\in S}\textsc{box}(p)\Big{)},

as well as a set SPS^{*}\subseteq P of size kk realizing this value. Here, vol denotes the usual volume.
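For concreteness, the following small Python sketch (illustrative only, not part of the results of this paper) evaluates VolSel(P,k)\textsc{VolSel}(P,k) by brute force on tiny inputs. It uses the fact that the intersection of any set of anchored boxes is again an anchored box, so the volume of a union can be computed by inclusion-exclusion.

from itertools import combinations

def union_volume(points):
    # Volume of the union of anchored boxes box(p) by inclusion-exclusion:
    # vol(box(p_1) cap ... cap box(p_t)) = prod_i min_j (p_j)_i.
    pts = list(points)
    total = 0.0
    for t in range(1, len(pts) + 1):
        for T in combinations(pts, t):
            inter = 1.0
            for i in range(len(T[0])):
                inter *= min(p[i] for p in T)
            total += (-1) ** (t + 1) * inter
    return total

def volsel_bruteforce(P, k):
    # Exact VolSel(P, k) by enumerating all size-k subsets (exponential time).
    return max(union_volume(S) for S in combinations(P, k))

# Example with d = 2 and k = 2: any two of these three points give volume 5.
P = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
print(volsel_bruteforce(P, 2))  # 5.0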

Motivation

This geometric problem is of key importance in the context of multicriteria optimization and decision analysis, where it is known as the hypervolume subset selection problem (HSSP) [2, 3, 4, 24, 12, 13]. In this context, the points in PP correspond to solutions of an optimization problem with dd objectives, and the goal is to find a small subset of PP that “represents” the set PP well. The quality of a representative subset SPS\subseteq P is measured by the volume of the union of the anchored boxes spanned by points in SS; this is also known as the hypervolume indicator [36]. Note that with this quality indicator, finding the optimal size-kk representation is equivalent to our problem VolSel(P,k)\textsc{VolSel}(P,k). In applications, such bounded-size representations are required in archivers for non-dominated sets [23] and for multicriteria optimization algorithms and heuristics [3, 10, 7]. (We remark that in these applications the anchor point is often not the origin; however, by a simple translation we can move our anchor point from (0,,0)(0,\ldots,0) to any other point in d\mathbb{R}^{d}.) Besides, the problem has recently received attention in the context of skyline operators in databases [17].

In 2 dimensions, the problem can be solved in polynomial time [2, 13, 24], which is used in applications such as analyzing benchmark functions [2] and efficient postprocessing of multiobjective algorithms [12]. A natural question is whether efficient algorithms also exist in dimension d3d\geq 3, and thus whether these applications can be pushed beyond two objectives.

In this paper, we answer this question negatively, by proving that Volume Selection is NP-hard already in 3 dimensions. We then consider the question whether the previous Ω((nk))\Omega(\binom{n}{k}) bound can be improved, which we answer affirmatively in 3 dimensions. Finally, for any constant dimension, we improve the best-known (11/e)(1-1/e)-approximation to an efficient polynomial-time approximation scheme (EPTAS). See Section 1.2 for details.

1.1 Further Related Work

Klee's Measure Problem

Computing the volume of the union of nn (not necessarily anchored) axis-aligned boxes in d\mathbb{R}^{d} is known as Klee's measure problem. The fastest known algorithm takes time O(nd/2)O(n^{d/2}) (in OO-notation, we always assume dd to be a constant, and log(x)\log(x) is to be understood as max{1,log(x)}\max\{1,\log(x)\}), which can be improved to O(nd/3polylog(n))O(n^{d/3}\textup{polylog}(n)) if all boxes are cubes [15]. By a simple reduction [8], the same running time as on cubes can be obtained on anchored boxes, which can be improved to O(nlogn)O(n\log n) for d3d\leq 3 [6]. These results are relevant to this paper because Klee's measure problem on anchored boxes (spanned by the points in PP) is a special case of Volume Selection (by calling VolSel(P,|P|)\textsc{VolSel}(P,|P|)).

Chan [14] gave a reduction from kk-Clique to Klee's measure problem in 2k2k dimensions. This proves NP-hardness of Klee's measure problem when dd is part of the input (and thus dd can be as large as nn). Moreover, since kk-Clique has no f(k)no(k)f(k)\cdot n^{o(k)}-time algorithm under the Exponential Time Hypothesis [16], Klee's measure problem has no f(d)no(d)f(d)\cdot n^{o(d)}-time algorithm under the same assumption. The same hardness results also hold for Klee's measure problem on anchored boxes, by a reduction in [8] (NP-hardness was first proven in [11]).

Finally, we mention that Klee's measure problem has a very efficient randomized (1±ε)(1\pm\varepsilon)-approximation algorithm in time O(nlog(1/δ)/ε2)O(n\log(1/\delta)/\varepsilon^{2}) with error probability δ\delta [9].
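To illustrate what such an approximation computes (this is only a naive Monte Carlo estimator with a much worse sample complexity, not the algorithm of [9]), one can estimate the volume of a union of anchored boxes by sampling in a bounding box and testing domination:

import random

def mc_union_volume(points, samples=200_000, seed=0):
    # Naive Monte Carlo estimate of the volume of the union of anchored boxes:
    # sample uniformly from the bounding box [0, M_1] x ... x [0, M_d] and count
    # the samples that are dominated by some input point.
    rng = random.Random(seed)
    d = len(points[0])
    M = [max(p[i] for p in points) for i in range(d)]
    box_vol = 1.0
    for Mi in M:
        box_vol *= Mi
    hits = 0
    for _ in range(samples):
        q = [rng.uniform(0, M[i]) for i in range(d)]
        if any(all(q[i] <= p[i] for i in range(d)) for p in points):
            hits += 1
    return box_vol * hits / samples

print(mc_union_volume([(3.0, 1.0), (1.0, 3.0)]))  # close to the exact value 5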

Known Results for Volume Selection

As mentioned above, 2-dimensional Volume Selection can be solved in polynomial time; the initial O(kn2)O(kn^{2}) algorithm [2] was later improved to O((nk)k+nlogn)O((n-k)k+n\log n) [13, 24]. In higher dimensions, by enumerating all size-kk subsets and solving an instance of Klee's measure problem on anchored boxes for each one, there is an O((nk)kd/3polylog(k))O\big{(}\binom{n}{k}k^{d/3}\textup{polylog}(k)\big{)} algorithm. For small nkn-k, this can be improved to O(nd/2logn+nnk)O(n^{d/2}\log n+n^{n-k}) [10]. Volume Selection is NP-hard when dd is part of the input, since the same holds already for Klee's measure problem on anchored boxes. However, this does not explain the exponential dependence on kk for constant dd.

Since the volume of the union of boxes is a submodular function (see, e.g., [33]), the greedy algorithm for submodular function maximization [28] yields a (11/e)(1-1/e)-approximation of VolSel(P,k)\textsc{VolSel}(P,k). This algorithm solves O(nk)O(nk) instances of Klee's measure problem on at most kk anchored boxes, and thus runs in time O(nkd/3+1polylog(k))O(nk^{d/3+1}\textup{polylog}(k)). Using [9], this running time improves to O(nk2log(1/δ)/ε2)O(nk^{2}\log(1/\delta)/\varepsilon^{2}), at the cost of decreasing the approximation ratio to 11/eε1-1/e-\varepsilon and introducing an error probability δ\delta. See [20] for related results in 33 dimensions.
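A minimal sketch of this greedy strategy follows (it uses the exponential-time inclusion-exclusion routine in place of a fast Klee's-measure subroutine, so it is only suitable for tiny inputs):

from itertools import combinations

def union_volume(points):
    # Same inclusion-exclusion helper as in the earlier sketch (repeated for self-containment).
    pts = list(points)
    total = 0.0
    for t in range(1, len(pts) + 1):
        for T in combinations(pts, t):
            inter = 1.0
            for i in range(len(T[0])):
                inter *= min(p[i] for p in T)
            total += (-1) ** (t + 1) * inter
    return total

def greedy_volsel(P, k):
    # Greedy submodular maximization: repeatedly add the point with the largest
    # marginal increase of the union volume; this gives a (1 - 1/e)-approximation.
    chosen, remaining = [], list(P)
    for _ in range(min(k, len(remaining))):
        base = union_volume(chosen)
        best = max(remaining, key=lambda p: union_volume(chosen + [p]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen, union_volume(chosen)

print(greedy_volsel([(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)], 2))  # volume 5.0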

A problem closely related to Volume Selection is Convex Hull Subset Selection: Given nn points in d\mathbb{R}^{d}, select kk points that maximize the volume of their convex hull. For this problem, NP-hardness was recently announced in the case d=3d=3 [30].

1.2 Our Results

In this paper we push forward the understanding of Volume Selection. We prove that Volume Selection is NP-hard already for d=3d=3 (Section 3). Previously, NP-hardness was only known when dd is part of the input and thus can be as large as nn. Moreover, this establishes Volume Selection as another example for problems that can be solved in polynomial time in the plane but are NP-hard in three or more dimensions (see also [5, 26]).

In the remainder, we focus on the regime where d3d\geq 3 is a constant and knk\ll n. All known algorithms (explicitly or implicitly) enumerate all size-kk subsets of the input set PP and thus take time Ω((nk))=nΩ(k)\Omega\big{(}\binom{n}{k}\big{)}=n^{\Omega(k)}. In 3 dimensions, we break this time bound by providing an nO(k)n^{O(\sqrt{k})} algorithm (Section 4). To this end, we project the 3-dimensional Volume Selection to a 2-dimensional problem and then use planar separator techniques.

Finally, in Section 5 we design an EPTAS for Volume Selection. More precisely, we present a (1ε)(1-\varepsilon)-approximation algorithm running in time O(nεd(logn+k+2O(ε2log1/ε)d))O(n\cdot\varepsilon^{-d}(\log n+k+2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}})), for any constant dimension dd. Note that the “combinatorial explosion” is restricted to dd and ε\varepsilon; for any constant d,εd,\varepsilon the algorithm runs in time O(n(k+logn))O(n(k+\log n)). This improves the previously best-known (11/e)(1-1/e)-approximation, even in terms of running time.

2 Preliminaries

All boxes considered in the paper are axis-parallel and anchored at the origin. For points p=(p1,,pd),q=(q1,,qd)dp=(p_{1},\ldots,p_{d}),\,q=(q_{1},\ldots,q_{d})\in\mathbb{R}^{d}, we say that pp dominates qq if piqip_{i}\geq q_{i} for all 1id1\leq i\leq d. For p=(p1,,pd)>0dp=(p_{1},\ldots,p_{d})\in\mathbb{R}_{>0}^{d}, we let box(p):=[0,p1]××[0,pd]\textsc{box}(p):=[0,p_{1}]\times\ldots\times[0,p_{d}]. Note that box(p)\textsc{box}(p) is the set of all points q0dq\in\mathbb{R}_{\geq 0}^{d} that are dominated by pp. A point set PP is a set of points in >0d\mathbb{R}_{>0}^{d}. We denote the union pPbox(p)\bigcup_{p\in P}\textsc{box}(p) by 𝒰(P)\mathcal{U}(P). The usual Euclidean volume is denoted by vol. With this notation, we set

μ(P):=vol(𝒰(P))=vol(pPbox(p))=vol(pP[0,p1]××[0,pd]).\mu(P):=\textsc{vol}(\mathcal{U}(P))=\textsc{vol}\Big{(}\bigcup_{p\in P}\textsc{box}(p)\Big{)}=\textsc{vol}\Big{(}\bigcup_{p\in P}[0,p_{1}]\times\ldots\times[0,p_{d}]\Big{)}.

We study Volume Selection: Given a point set PP of size nn and 0kn0\leq k\leq n, compute

VolSel(P,k):=maxSP,|S|=kμ(S).\textsc{VolSel}(P,k):=\max_{S\subseteq P,\,|S|=k}\mu(S).

Note that we can relax the requirement |S|=k|S|=k to |S|k|S|\leq k without changing this value.

3 Hardness in 3 Dimensions

We consider the following decision variant of 3-dimensional Volume Selection.

3d Volume Selection
Input:
A triple (P,k,V)(P,k,V), where PP is a set of points in >03\mathbb{R}_{>0}^{3}, kk is a positive integer and VV is a positive real value.
Question: Is there a subset QPQ\subseteq P of kk points such that μ(Q)V\mu(Q)\geq V?

We are going to show that the problem is NP-complete. First, we show that an intermediate problem about selecting a large independent set in a given induced subgraph of the triangular grid is NP-hard. The reduction for this problem is from independent set in planar graphs of maximum degree 33. Then we argue that this problem can be embedded using boxes whose points lie in two parallel planes. One plane is used to define the triangular-grid-like structure and the other is used to encode the subset of vertices that describe the induced subgraph of the grid.

3.1 Triangular Grid

Let Γ\Gamma be the infinite graph with vertex set and edge set (see Figure 1)

V(Γ)\displaystyle V(\Gamma)~ ={(i+j1/2,j3/2)i,j},\displaystyle=~\big{\{}(i+j\cdot 1/2,j\cdot\sqrt{3}/2)\mid i,j\in\mathbb{N}\big{\}},
E(Γ)\displaystyle E(\Gamma)~ ={aba,bV(Γ),the Euclidean distance between a and b is exactly 1}.\displaystyle=~\left\{ab\mid a,b\in V(\Gamma),~\text{the Euclidean distance between $a$ and $b$ is exactly $1$}\right\}.
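The following illustrative sketch (not from the paper) generates a finite portion of Γ\Gamma and its unit-distance edges; it can be handy for experimenting with the constructions below.

from math import isclose, sqrt

def triangular_grid(max_i, max_j):
    # Vertices (i + j/2, j*sqrt(3)/2) for 0 <= i <= max_i and 0 <= j <= max_j,
    # with an edge between two vertices iff their Euclidean distance is exactly 1.
    V = [(i + j / 2, j * sqrt(3) / 2) for j in range(max_j + 1) for i in range(max_i + 1)]
    E = []
    for a in range(len(V)):
        for b in range(a + 1, len(V)):
            d2 = (V[a][0] - V[b][0]) ** 2 + (V[a][1] - V[b][1]) ** 2
            if isclose(d2, 1.0, abs_tol=1e-9):
                E.append((a, b))
    return V, E

V, E = triangular_grid(3, 3)
print(len(V), len(E))  # 16 vertices; every interior vertex has degree 6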

First we show that the following intermediate problem, which is closely related to independent set, is NP-hard.

Figure 1: Triangular grid Γ\Gamma.

Independent Set on Induced Triangular Grid
Input:
A pair (A,)(A,\ell), where AA is a subset of V(Γ)V(\Gamma) and \ell is a positive integer.
Question: Is there a subset BAB\subseteq A of size \ell such that no two vertices in BB are connected by an edge of E(Γ)E(\Gamma)?

Lemma 3.1.

Independent Set on Induced Triangular Grid is NP-complete.

Proof.

It is obvious that the problem is in NP.

Garey and Johnson [19] show that the problem Vertex Cover is NP-complete for planar graphs of degree at most 33. Since a subset UV(G)U\subseteq V(G) is a vertex cover of graph GG if and only if V(G)UV(G)\setminus U is an independent set of GG, it follows that the problem Independent Set is NP-complete for planar graphs of degree at most 33. For the rest of the proof, let GG be a planar graph of degree at most 33.

Let us define a Γ\Gamma-representation of GG to be a pair (H,φ)(H,\varphi), where HΓH\subset\Gamma and φ\varphi is a mapping, with the following properties:

  • Each vertex uu of GG is mapped to a distinct vertex φ(u)\varphi(u) of HH.

  • Each edge uvuv of GG is mapped to a simple path φ(uv)\varphi(uv) contained in HH and connecting φ(u)\varphi(u) to φ(v)\varphi(v).

  • For each two distinct edges uvuv and uvu^{\prime}v^{\prime} of GG, the paths φ(uv)\varphi(uv) and φ(uv)\varphi(u^{\prime}v^{\prime}) are disjoint except at the common endpoints {φ(u),φ(v)}{φ(u),φ(v)}\{\varphi(u),\varphi(v)\}\cap\{\varphi(u^{\prime}),\varphi(v^{\prime})\}.

  • The graph HH is precisely the union of φ(u)\varphi(u) and φ(uv)\varphi(uv) over all vertices uu and edges uvuv of GG.

Note that if (H,φ)(H,\varphi) is a Γ\Gamma-representation of GG then HH is a subdivision of GG. The map φ\varphi identifies which parts of HH correspond to which parts of GG.

A planar graph GG with nn vertices and maximum degree 33 (and also 44) can be drawn in a square grid of polynomial size, and such a drawing can be obtained in polynomial time, see, e.g., the results by Storer [31] or by Tamassia and Tollis [32]. Applying the shear mapping (x,y)(x+y/2,y3/2)(x,y)\mapsto(x+y/2,y\sqrt{3}/2) to the plane, the square grid becomes a subgraph of Γ\Gamma. Therefore, we can obtain a Γ\Gamma-representation (H1,φ1)(H_{1},\varphi_{1}) of GG of polynomial size. Note that we only use edges of Γ\Gamma that are horizontal or have positive slope; edges of Γ\Gamma with negative slope are not used.

Next, we obtain another Γ\Gamma-representation (H2,φ2)(H_{2},\varphi_{2}) such that H2H_{2} is an induced subgraph of Γ\Gamma. Induced means that two vertices of H2H_{2} are connected with an edge in H2H_{2} if and only if the edge exists in Γ\Gamma. For this, we first scale up the Γ\Gamma-representation (H1,φ1)(H_{1},\varphi_{1}) by a factor 22 so that each edge of H1H_{1} becomes a 2-edge path. The new vertices used in the subdivision have degree 22 and its 22 incident edges have the same orientation. After the subdivision, vertices of degree 33 look like in Figure 2. Scaling up the figure by a factor of 33, and rerouting within a small neighbourhood of each vertex vv that was already in H1H_{1}, we obtain a Γ\Gamma-representation (H2,φ2)(H_{2},\varphi_{2}) such that H2H_{2} is an induced subgraph of Γ\Gamma. See Figure 2 for an example of such a local transformation.

Figure 2: Transformation to get an induced subgraph of the triangular grid. Vertices from the subdivision of edges are green squares.

Now we have a Γ\Gamma-representation (H2,φ2)(H_{2},\varphi_{2}) such that H2H_{2} is an induced subgraph of Γ\Gamma. We want to obtain another Γ\Gamma-representation where for each edge uvE(G)uv\in E(G) the path φ2(uv)\varphi_{2}(uv) uses an even number of interior vertices. For this, we can slightly reroute each path φ2(uv)\varphi_{2}(uv) that has an odd number of interior vertices, see Figure 3. To make sure that the graph is still induced, we can first scale up the situation by a factor 22, and then reroute all the edges φ2(uv)\varphi_{2}(uv) that use an odd number of interior vertices. (This is actually all the edges uvE(G)uv\in E(G) because of the scaling.) Let (H3,φ3)(H_{3},\varphi_{3}) be the resulting Γ\Gamma-representation of GG. Note that H3H_{3} is an induced subgraph of Γ\Gamma and it is a subdivision of GG where each edge is subdivided an even number of times.

Figure 3: Choosing the parity of paths.

Let α(G)\alpha(G) denote the size of the largest independent set in GG. For each edge uvuv of GG, let 2kuv2k_{uv} be the number of internal vertices in the path φ3(uv)\varphi_{3}(uv). Then α(H3)=α(G)+uvE(G)kuv\alpha(H_{3})=\alpha(G)+\sum_{uv\in E(G)}k_{uv}. Indeed, we can obtain H3H_{3} from GG by repeatedly replacing an edge by a 33-edge path, i.e., making 2 subdivisions on the same edge. Moreover, any such replacement increases the size of the largest independent set by exactly 1.

It follows that the problem Independent Set is NP-complete in induced subgraphs of the triangular grid Γ\Gamma. This is precisely the problem Independent Set on Induced Triangular Grid, where we take AA to be the set of vertices defining the induced subgraph. ∎

3.2 The Point Set

Let m3m\geq 3 be an arbitrary integer and consider the point set PmP_{m} defined by (see Figure 4)

Pm={(x,y,z)3x+y+z=m}.P_{m}~=~\{(x,y,z)\in\mathbb{N}^{3}\mid x+y+z=m\}.
Figure 4: The point set PmP_{m} and the boxes box(p)\textsc{box}(p), with pPmp\in P_{m}, for m=9m=9.

Standard induction shows that the set PmP_{m} has 1+2++(m2)=(m1)(m2)/21+2+\dots+(m-2)=(m-1)(m-2)/2 points and that

μ(Pm)=vol(pPmbox(p))=m(m1)(m2)/6.\mu(P_{m})~=~\textsc{vol}\left(\bigcup_{p\in P_{m}}\textsc{box}(p)\right)~=~m(m-1)(m-2)/6.

This last number appears as sequence A000292, tetrahedral (or triangular pyramidal) numbers, in [27].
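Both counts are easy to check by computer for small mm (an illustrative sketch, not from the paper; here μ(Pm)\mu(P_{m}) is obtained by counting the unit lattice cubes covered by the union of the anchored boxes):

def P_m(m):
    # P_m = {(x, y, z) with positive integer coordinates and x + y + z = m}.
    return [(x, y, m - x - y) for x in range(1, m - 1) for y in range(1, m - x)]

def mu_P_m(m):
    # Since all coordinates are integers, mu(P_m) equals the number of unit lattice
    # cubes [x, x+1] x [y, y+1] x [z, z+1] whose upper corner is dominated by some point of P_m.
    pts = P_m(m)
    return sum(
        1
        for x in range(m) for y in range(m) for z in range(m)
        if any(p[0] >= x + 1 and p[1] >= y + 1 and p[2] >= z + 1 for p in pts)
    )

for m in range(3, 9):
    assert len(P_m(m)) == (m - 1) * (m - 2) // 2
    assert mu_P_m(m) == m * (m - 1) * (m - 2) // 6
print("formulas verified for m = 3, ..., 8")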

Consider the real number ε=1/4m2\varepsilon=1/4m^{2}, and define the vector Δε=(ε,ε,ε)\Delta_{\varepsilon}=(\varepsilon,\varepsilon,\varepsilon). Note that ε\varepsilon is much smaller than 11. For each point pPm1p\in P_{m-1}, consider the point p+Δεp+\Delta_{\varepsilon}, see Figure 5. Let us define the set QmQ_{m} to be

Qm={p+ΔεpPm1}.Q_{m}~=~\{p+\Delta_{\varepsilon}\mid p\in P_{m-1}\}.

It is clear that QmQ_{m} has |Pm1|=(m2)(m3)/2|P_{m-1}|=(m-2)(m-3)/2 points, for m3m\geq 3. The points of QmQ_{m} lie on the plane x+y+z=m1+3εx+y+z=m-1+3\varepsilon.

Figure 5: The point q=p+Δεq=p+\Delta_{\varepsilon} and the set diff(q)\textsc{diff}(q).

For each point qq of QmQ_{m} define

diff(q)=𝒰(Pm{q})𝒰(Pm)=(pPm{q}box(p))(pPmbox(p)).\textsc{diff}(q)~=~\mathcal{U}\big{(}P_{m}\cup\{q\}\big{)}\setminus\mathcal{U}\big{(}P_{m}\big{)}~=~\left(\bigcup_{p\in P_{m}\cup\{q\}}\textsc{box}(p)\right)\setminus\left(\bigcup_{p\in P_{m}}\textsc{box}(p)\right).

Note that diff(q)\textsc{diff}(q) is the union of 33 boxes of size ε×ε×1\varepsilon\times\varepsilon\times 1 and a cube of size ε×ε×ε\varepsilon\times\varepsilon\times\varepsilon, see Figure 5. To get the intuition for the following lemma, see Figure 6.

Figure 6: The sets diff(q)\textsc{diff}(q) for all qQmq\in Q_{m}.
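The volume vol(diff(q))=3ε2+ε3\textsc{vol}(\textsc{diff}(q))=3\varepsilon^{2}+\varepsilon^{3}, used in the next lemma, can also be checked numerically. The following sketch is illustrative only (with the concrete choice m=5m=5 and q=(1,1,m3)+Δεq=(1,1,m-3)+\Delta_{\varepsilon}); it computes union volumes by coordinate compression, which works for any set of anchored boxes.

from itertools import product

def union_volume_anchored(points):
    # Volume of the union of anchored boxes via coordinate compression: a cell of the
    # compressed grid is covered iff some input point dominates its upper corner.
    d = len(points[0])
    grids = [sorted({0.0} | {p[i] for p in points}) for i in range(d)]
    vol = 0.0
    for idx in product(*[range(len(g) - 1) for g in grids]):
        upper = [grids[i][idx[i] + 1] for i in range(d)]
        if any(all(p[i] >= upper[i] for i in range(d)) for p in points):
            cell = 1.0
            for i in range(d):
                cell *= grids[i][idx[i] + 1] - grids[i][idx[i]]
            vol += cell
    return vol

m = 5
eps = 1 / (4 * m * m)
Pm = [(x, y, m - x - y) for x in range(1, m - 1) for y in range(1, m - x)]
q = (1 + eps, 1 + eps, (m - 3) + eps)  # q = p + Delta_eps for p = (1, 1, m-3) in P_{m-1}
gain = union_volume_anchored(Pm + [q]) - union_volume_anchored(Pm)
print(gain, 3 * eps ** 2 + eps ** 3)  # the two values agree up to floating-point error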
Lemma 3.2.

Consider any QQmQ^{\prime}\subseteq Q_{m}.

  • If the sets diff(q)\textsc{diff}(q), for all qQq\in Q^{\prime}, are pairwise disjoint, then μ(PmQ)=μ(Pm)+|Q|(3ε2+ε3)\mu(P_{m}\cup Q^{\prime})=\mu(P_{m})+|Q^{\prime}|\cdot(3\varepsilon^{2}+\varepsilon^{3}).

  • If QQ^{\prime} contains two points q0q_{0} and q1q_{1} such that diff(q0)\textsc{diff}(q_{0}) and diff(q1)\textsc{diff}(q_{1}) intersect, then μ(PmQ)<μ(Pm)+|Q|(3ε2+ε3)\mu(P_{m}\cup Q^{\prime})<\mu(P_{m})+|Q^{\prime}|\cdot(3\varepsilon^{2}+\varepsilon^{3}).

Proof.

Note that for each qQmq\in Q_{m} we have

μ(Pm{q})μ(Pm)=vol(diff(q))=3ε2+ε3.\mu(P_{m}\cup\{q\})-\mu(P_{m})~=~\textsc{vol}(\textsc{diff}(q))~=~3\varepsilon^{2}+\varepsilon^{3}.

If the sets {diff(q)qQ}\{\textsc{diff}(q)\mid q\in Q^{\prime}\} are pairwise disjoint then

μ(PmQ)\displaystyle\mu(P_{m}\cup Q^{\prime})~ =μ(Pm)+vol(qQdiff(q))\displaystyle=~\mu(P_{m})+\textsc{vol}\left(\bigcup_{q\in Q^{\prime}}\textsc{diff}(q)\right)
=μ(Pm)+qQvol(diff(q))\displaystyle=~\mu(P_{m})+\sum_{q\in Q^{\prime}}\textsc{vol}(\textsc{diff}(q))
=μ(Pm)+|Q|(3ε2+ε3).\displaystyle=~\mu(P_{m})+|Q^{\prime}|\cdot\left(3\varepsilon^{2}+\varepsilon^{3}\right).

Consider now the case when QQ^{\prime} contains two points q0q_{0} and q1q_{1} such that diff(q0)\textsc{diff}(q_{0}) and diff(q1)\textsc{diff}(q_{1}) intersect. The geometry of the point set QQ^{\prime} implies that diff(q0)\textsc{diff}(q_{0}) and diff(q1)\textsc{diff}(q_{1}) intersect in a cube of size ε×ε×ε\varepsilon\times\varepsilon\times\varepsilon, see Figure 6. Therefore, we have

μ(PmQ)\displaystyle\mu(P_{m}\cup Q^{\prime})~ =μ(Pm)+vol(qQdiff(q))\displaystyle=~\mu(P_{m})+\textsc{vol}\left(\bigcup_{q\in Q^{\prime}}\textsc{diff}(q)\right)
μ(Pm)+qQvol(diff(q))vol(diff(q0)diff(q1))\displaystyle\leq~\mu(P_{m})+\sum_{q\in Q^{\prime}}\textsc{vol}(\textsc{diff}(q))-\textsc{vol}(\textsc{diff}(q_{0})\cap\textsc{diff}(q_{1}))
=μ(Pm)+|Q|(3ε2+ε3)ε2\displaystyle=~\mu(P_{m})+|Q^{\prime}|\cdot\left(3\varepsilon^{2}+\varepsilon^{3}\right)-\varepsilon^{2}
<μ(Pm)+|Q|(3ε2+ε3).\displaystyle<\mu(P_{m})+|Q^{\prime}|\cdot\left(3\varepsilon^{2}+\varepsilon^{3}\right).

We can naturally define a graph TmT_{m} on the set QmQ_{m} by using the intersection of the sets diff()\textsc{diff}(\cdot). The vertex set of TmT_{m} is QmQ_{m}, and two points q,qQmq,q^{\prime}\in Q_{m} define an edge qqqq^{\prime} of TmT_{m} if and only if diff(q)\textsc{diff}(q) and diff(q)\textsc{diff}(q^{\prime}) intersect, see Figure 7. Simple geometry shows that TmT_{m} is isomorphic to a part of the triangular grid Γ\Gamma. Thus, choosing mm large enough, we can get an arbitrarily large portion of the triangular grid Γ\Gamma. Note that a subset of vertices QQmQ^{\prime}\subseteq Q_{m} is independent in TmT_{m} if and only if the sets {diff(q)qQ}\left\{\textsc{diff}(q)\mid q\in Q^{\prime}\right\} are pairwise disjoint.

Figure 7: The graph TmT_{m} for m=9m=9.

We next show that picking points in PmP_{m} has higher priority than picking points in QmQ_{m}.

Lemma 3.3.

Let PP^{\prime} be a subset of PmP_{m} such that PmPP_{m}\setminus P^{\prime} is not empty. Then μ(PQm)<μ(Pm)\mu(P^{\prime}\cup Q_{m})<\mu(P_{m}).

Proof.

We may assume that PmPP_{m}\setminus P^{\prime} contains exactly one point, denoted by pp, since having a smaller set PP^{\prime} can only decrease the value of μ(PQm)\mu(P^{\prime}\cup Q_{m}). Then

μ(P)=μ(Pm)1.\mu(P^{\prime})~=~\mu(P_{m})-1.

Consider the sets of 33 points

Qm1(p)={\displaystyle Q^{1}_{m}(p)~=~\{ (px1,py,pz)+Δε,(px,py1,pz)+Δε,(px,py,pz1)+Δε}Qm,\displaystyle(p_{x}-1,p_{y},p_{z})+\Delta_{\varepsilon},(p_{x},p_{y}-1,p_{z})+\Delta_{\varepsilon},(p_{x},p_{y},p_{z}-1)+\Delta_{\varepsilon}\}~\subseteq~Q_{m},
Qm2(p)={\displaystyle Q^{2}_{m}(p)~=~\{ (px1,py1,pz+1)+Δε,(px+1,py1,pz1)+Δε,\displaystyle(p_{x}-1,p_{y}-1,p_{z}+1)+\Delta_{\varepsilon},(p_{x}+1,p_{y}-1,p_{z}-1)+\Delta_{\varepsilon},
(px1,py+1,pz1)+Δε}Qm.\displaystyle(p_{x}-1,p_{y}+1,p_{z}-1)+\Delta_{\varepsilon}\}~\subseteq~Q_{m}.

Figure 8 is useful for the following computations. For each point qQm1(p)q\in Q^{1}_{m}(p) we have

μ(Pq)=μ(P)+vol(diff(q))+ε.\mu(P^{\prime}\cup q)~=~\mu(P^{\prime})+\textsc{vol}(\textsc{diff}(q))+\varepsilon.

For each point qQm2(p)q\in Q^{2}_{m}(p) we have

μ(Pq)=μ(P)+vol(diff(q))+ε2.\mu(P^{\prime}\cup q)~=~\mu(P^{\prime})+\textsc{vol}(\textsc{diff}(q))+\varepsilon^{2}.

Using that ε2ε\varepsilon^{2}\leq\varepsilon because 0<ε<10<\varepsilon<1, we get

qQm1(p)Qm2(p):μ(Pq)μ(P)+vol(diff(q))+ε.\forall q\in Q^{1}_{m}(p)\cup Q^{2}_{m}(p):~~~\mu(P^{\prime}\cup q)~\leq~\mu(P^{\prime})+\textsc{vol}(\textsc{diff}(q))+\varepsilon.

For all points qq of Qm(Qm1(p)Qm2(p))Q_{m}\setminus(Q^{1}_{m}(p)\cup Q^{2}_{m}(p)) we have

μ(Pq)=μ(P)+vol(diff(q)).\mu(P^{\prime}\cup q)~=~\mu(P^{\prime})+\textsc{vol}(\textsc{diff}(q)).
Figure 8: Image for the proof of Lemma 3.3. The point pp of PmP_{m} that is missing in PP^{\prime} is indicated with a white cross. Left: the contribution of points from Qm1(p)Q^{1}_{m}(p). Right: the contribution of points from Qm2(p)Q^{2}_{m}(p).

We thus have

μ(PQm)\displaystyle\mu(P^{\prime}\cup Q_{m})~ μ(P)+qQmvol(diff(q))+qQm1(p)Qm2(p)ε\displaystyle\leq~\mu(P^{\prime})+\sum_{q\in Q_{m}}\textsc{vol}(\textsc{diff}(q))+\sum_{q\in Q^{1}_{m}(p)\cup Q^{2}_{m}(p)}\varepsilon
=μ(Pm)1+|Qm|(3ε2+ε3)+6ε\displaystyle=~\mu(P_{m})-1+|Q_{m}|\cdot(3\varepsilon^{2}+\varepsilon^{3})+6\cdot\varepsilon
μ(Pm)1+(m2)(m3)24ε+6ε\displaystyle\leq~\mu(P_{m})-1+\frac{(m-2)(m-3)}{2}\cdot 4\cdot\varepsilon+6\cdot\varepsilon
<μ(Pm),\displaystyle<~\mu(P_{m}),

where the last step uses ε=1/4m2\varepsilon=1/4m^{2}. ∎

3.3 The Reduction

We are now ready to prove NP-completeness of 3d Volume Selection.

Theorem 3.4.

The problem 3d Volume Selection is NP-complete.

Proof.

It is obvious that the problem is in NP. To show hardness we reduce from the problem Independent Set on Induced Triangular Grid, shown to be NP-complete in Lemma 3.1.

Consider an instance (A,)(A,\ell) to Independent Set on Induced Triangular Grid, where AA is a subset of the vertices of the triangular grid Γ\Gamma and \ell is an integer. Take mm large enough so that TmT_{m} is isomorphic to an induced subgraph of Γ\Gamma that contains AA. Recall that ε=1/4m2\varepsilon=1/4m^{2}. For each vertex vv of TmT_{m} let ψΓ(v)\psi_{\Gamma}(v) be the corresponding vertex of Γ\Gamma. For each subset BB of AA, let Qm(B)Q_{m}(B) be the subset of TmT_{m} that corresponds to BB, that is, Qm(B)={qQmψΓ(q)B}Q_{m}(B)=\{q\in Q_{m}\mid\psi_{\Gamma}(q)\in B\}.

Consider the set of points P=PmQm(A)P=P_{m}\cup Q_{m}(A), the parameter k=(m1)(m2)/2+k=(m-1)(m-2)/2+\ell, and the value V=m(m1)(m2)6+(3ε2+ε3)V=\frac{m(m-1)(m-2)}{6}+\ell\cdot(3\varepsilon^{2}+\varepsilon^{3}). We claim that (A,)(A,\ell) is a yes-instance for Independent Set on Induced Triangular Grid if and only if (P,k,V)(P,k,V) is a yes-instance for 3d Volume Selection.

If (A,)(A,\ell) is a yes-instance for Independent Set on Induced Triangular Grid, there is a subset BAB\subseteq A of \ell independent vertices in Γ\Gamma. This implies that Qm(B)Q_{m}(B) is an independent set in TmT_{m}, that is, the sets {diff(q)qQm(B)}\{\textsc{diff}(q)\mid q\in Q_{m}(B)\} are pairwise disjoint. Lemma 3.2 then implies that

μ(PmQm(B))\displaystyle\mu(P_{m}\cup Q_{m}(B))~ =μ(Pm)+|B|(3ε2+ε3)\displaystyle=~\mu(P_{m})+|B|\cdot(3\varepsilon^{2}+\varepsilon^{3})
=m(m1)(m2)6+(3ε2+ε3)\displaystyle=~\frac{m(m-1)(m-2)}{6}+\ell\cdot(3\varepsilon^{2}+\varepsilon^{3})
=V.\displaystyle=~V.

Therefore PmQm(B)P_{m}\cup Q_{m}(B) is a subset of PP with |Pm|+|B|=(m1)(m2)/2+=k|P_{m}|+|B|=(m-1)(m-2)/2+\ell=k points such that μ(PmQm(B))=V\mu(P_{m}\cup Q_{m}(B))=V. It follows that (P,k,V)(P,k,V) is a yes-instance for 3d Volume Selection.

Assume now that (P,k,V)(P,k,V) is a yes-instance for 3d Volume Selection. This means that PP contains a subset QQ of kk points such that

μ(Q)V=m(m1)(m2)6+(3ε2+ε3)=μ(Pm)+(3ε2+ε3)>μ(Pm).\mu(Q)~\geq~V~=~\frac{m(m-1)(m-2)}{6}+\ell\cdot(3\varepsilon^{2}+\varepsilon^{3})~=~\mu(P_{m})+\ell\cdot(3\varepsilon^{2}+\varepsilon^{3})~>~\mu(P_{m}).

Because of Lemma 3.3, it must be that PmP_{m} is contained in QQ, as otherwise we would have μ(Q)<μ(Pm)\mu(Q)<\mu(P_{m}). Since we have PmQP_{m}\subset Q and P=PmQm(A)P=P_{m}\cup Q_{m}(A), we obtain that QQ is PmQm(B)P_{m}\cup Q_{m}(B) for some BAB\subseteq A. Moreover, |B|=k|Pm|=|B|=k-|P_{m}|=\ell. By Lemma 3.2, if Qm(B)Q_{m}(B) is not an independent set in TmT_{m}, we have

μ(Q)=μ(PmQm(B))<μ(Pm)+(3ε2+ε3)=V,\mu(Q)~=~\mu(P_{m}\cup Q_{m}(B))~<~\mu(P_{m})+\ell(3\varepsilon^{2}+\varepsilon^{3})~=~V,

which contradicts the assumption that μ(Q)V\mu(Q)\geq V. Therefore it must be that Qm(B)Q_{m}(B) is an independent set in TmT_{m}. It follows that BAB\subset A has size \ell and forms an independent set in Γ\Gamma, and thus (A,)(A,\ell) is a yes-instance for Independent Set on Induced Triangular Grid. ∎

4 Exact Algorithm in 3 Dimensions

In this section we design an algorithm to solve Volume Selection in 3 dimensions in time nO(k)n^{O(\sqrt{k})}. The main insight is that, for an optimal solution QQ^{*}, the boundary of 𝒰(Q)\mathcal{U}(Q^{*}) is a planar graph with O(k)O(k) vertices, and therefore has a balanced separator with O(k)O(\sqrt{k}) vertices. We would like to guess the separator, break the problem into two subproblems, and solve each of them recursively. This basic idea leads to a few technical challenges to take care of. One obstacle is that subproblems should be really independent because we do not want to double count some covered parts. Essentially, a separator in the graph-theory sense does not imply independent subproblems in our context. Another technicality is that some of the subproblems that we encounter recursively cannot be solved optimally; we can only get a lower bound to the optimal value. However, for the subproblems that define the optimal solution at the higher level of the recursion, we do compute an optimal solution.

Let PP be a set of nn points in the positive quadrant of 3\mathbb{R}^{3}. Throughout our discussion, we will assume that PP is fixed and thus drop the dependency on PP and nn from the notation. We can assume that no point of PP is dominated by another point of PP. Using an infinitesimal perturbation of the points, we can assume that all points have all coordinates different. Indeed, we can replace each point pp by the point p+i(ε,ε,ε)p+i(\varepsilon,\varepsilon,\varepsilon), where ii is a different integer for each point of PP and ε>0\varepsilon>0 is an infinitesimal value or a value that is small enough.

Let MM be the largest xx- or yy-coordinate in PP, thus M=max{px,pypP}M=\max\{p_{x},p_{y}\mid p\in P\}. We define σ\sigma to be the square in 2\mathbb{R}^{2} defined by [1,M+1]×[1,M+1][-1,M+1]\times[-1,M+1]. It has side length M+2M+2.

For each subset QQ of PP, consider the projection of 𝒰(Q)\mathcal{U}(Q) onto the xyxy-plane. This defines a plane graph, which we denote by G(Q)G(Q), and which we define precisely in the following, see Figure 9. We consider G(Q)G(Q) as a geometric, embedded graph where each vertex is a point and each edge is (drawn as) a straight-line segment, in fact, a horizontal or vertical straight-line segment on the xyxy-plane. There are different types of vertices in G(Q)G(Q). The projection of each point qQq\in Q defines a vertex, which we denote by vqv_{q}. When for two distinct points q,qQq,q^{\prime}\in Q the boundary of the projection of the boxes box(q)\textsc{box}(q) and the boundary of the projection of box(q)\textsc{box}(q^{\prime}) intersect outside the xx- and yy-axis, then they do so exactly once because of our assumption on general position, and this defines a vertex that we denote by vq,qv_{q,q^{\prime}}. (Not all pairs (q,q)(q,q^{\prime}) define such a vertex.) Additionally, each point qQq\in Q defines a vertex vx,qv_{x,q} at position (qx,0)(q_{x},0) and a vertex vy,qv_{y,q} at position (0,qy)(0,q_{y}). Finally, we have a vertex vx,yv_{x,y} placed at the origin. The vertices of G(Q)G(Q) are connected in a natural way: the boundary of the visible part of box(q)\textsc{box}(q) connects the points that appear on that boundary. In particular, the vertices on the xx-axis are connected and so do those on the yy-axis. Since we assume general position, each vertex uniquely determines the boxes that define it. Each vertex qQq\in Q defines a bounded face f(q,Q)f(q,Q) in G(Q)G(Q). This is the projection of the face on the boundary of 𝒰(Q)\mathcal{U}(Q) contained in the plane {(x,y,z)3z=qz}\{(x,y,z)\in\mathbb{R}^{3}\mid z=q_{z}\}, see Figure 9, right. In fact, each bounded face of G(Q)G(Q) is f(q,Q)f(q,Q) for some qQq\in Q.

Figure 9: A sample of the different vertices in G(Q)G(Q) and the faces of G(Q)G(Q).

We triangulate each bounded face f(q,Q)f(q,Q) of G(Q)G(Q) canonically, as follows, see Figure 10. The boundary of a bounded face f(q,Q)f(q,Q) is made of a top horizontal segment t(q,Q)t(q,Q) (which may contain several edges of the graph), a right vertical segment r(q,Q)r(q,Q) (which may contain several edges of the graph), and a monotone path γ(q,Q)\gamma(q,Q) from the top, left corner to the bottom, right corner. Such a monotone path γ(q,Q)\gamma(q,Q) alternates horizontal and vertical segments and has non-decreasing xx-coordinates and non-increasing yy-coordinates. Let vt(q,Q)v_{t}(q,Q) be the first interior vertex of γ(q,Q)\gamma(q,Q) and let vr(q,Q)v_{r}(q,Q) be the last interior vertex of γ(q,Q)\gamma(q,Q). Note that vqv_{q} is the vertex where t(q,Q)t(q,Q) and r(q,Q)r(q,Q) meet. We add diagonals from vqv_{q} to all interior vertices of γ(q,Q)\gamma(q,Q), diagonals from vt(q,Q)v_{t}(q,Q) to all the interior vertices of t(q,Q)t(q,Q) and diagonals from vr(q,Q)v_{r}(q,Q) to all the interior vertices of r(q,Q)r(q,Q). This is the canonical triangulation of the face f(q,Q)f(q,Q), and we apply it to each bounded face of G(Q)G(Q).

Figure 10: Triangulating a bounded face of G(Q)G(Q).

The outer face of G(Q)G(Q) may also have many vertices. We place on top the square σ\sigma, with vertices {1,M+1}2\{-1,M+1\}^{2}. From the vertices at (1,1)(-1,-1) and (M+1,M+1)(M+1,M+1) we add all possible edges, while keeping planarity. From the vertex (1,M+1)(-1,M+1) we add the edges to (1,1)(-1,-1), to (M+1,M+1)(M+1,M+1), and to the highest vertex on the yy-axis. Similarly, from the vertex (M+1,1)(M+1,-1) we add the edges to (1,1)(-1,-1), to (M+1,M+1)(M+1,M+1), and to the rightmost vertex on the xx-axis. With such an operation, the outer face is defined by the boundary of the square σ\sigma.

Let T(Q)T(Q) be the resulting geometric, embedded graph, see Figure 11. The graph T(Q)T(Q) is a triangulation of the square σ\sigma with internal vertices. It is easy to see that G(Q)G(Q) and T(Q)T(Q) have O(|Q|)O(|Q|) vertices and edges. For example, one can argue that G(Q)G(Q) has |Q|+1|Q|+1 faces and no parallel edges, and the graph T(Q)T(Q) is a triangulation of G(Q)G(Q) with 44 additional vertices. To treat some extreme cases, we also define T()=σT(\emptyset)=\sigma, as a graph, with the diagonal of positive slope.

Figure 11: The graph T(Q)T(Q).

A polygonal domain is a subset of the plane defined by a polygon where we remove the interior of some polygons, which form holes. The combinatorial complexity of a domain DD, denoted by |D||D|, is the number of vertices and edges used to define it. We say that a polygonal curve or a family of polygonal curves in 2\mathbb{R}^{2} is QQ-compliant if the edges of the curves are also edges of T(Q)T(Q). A polygonal domain DD is QQ-compliant if its boundary is QQ-compliant. Note that a QQ-compliant polygonal domain has combinatorial complexity O(|Q|)O(|Q|) because the graph T(Q)T(Q) has O(|Q|)O(|Q|) edges.

Consider a set QPQ\subseteq P and a QQ-compliant polygonal curve γ\gamma. Let PγP_{\gamma} be the points of PP that participate in the definition of the vertices on γ\gamma. Thus, if vqv_{q} is in γ\gamma, we add qq to PγP_{\gamma}; if vq,qv_{q,q^{\prime}} is in γ\gamma, we add qq and qq^{\prime} to PγP_{\gamma}; if vx,qv_{x,q} is in γ\gamma, we add qq to PγP_{\gamma}, and so on. Since each vertex on γ\gamma contributes O(1)O(1) vertices to PγP_{\gamma}, we have |Pγ|=O(|γ|)|P_{\gamma}|=O(|\gamma|). For a family Γ\Gamma of polygonal curves we define PΓ=γΓPγP_{\Gamma}=\cup_{\gamma\in\Gamma}P_{\gamma}. For a polygonal domain DD with boundary D\partial D we then have |PD|=O(|D|)|P_{\partial D}|=O(|D|).

Lemma 4.1.

If γ\gamma is a QQ-compliant polygonal curve then, for each QQ^{\prime} with PγQQP_{\gamma}\subseteq Q^{\prime}\subset Q, the curve γ\gamma is also QQ^{\prime}-compliant.

Proof.

For each edge ee of T(Q)T(Q), the edge ee is also contained in T(Q~)T(\tilde{Q}) for all Q~\tilde{Q} that contain PeP_{e}. It follows that T(Q)T(Q^{\prime}) has all the edges ee contained in γ\gamma, and thus T(Q)T(Q^{\prime}) contains γ\gamma. ∎

We are going to use dynamic programming based on planar separators of T(Q)T(Q^{*}) for an optimal solution QQ^{*}. A valid tuple to define a subproblem is a tuple (S,D,)(S,D,\ell), where SPS\subset P, DD is an SS-compliant polygonal domain, and \ell is a positive integer. The tuple (S,D,)(S,D,\ell) models a subproblem where the points of SS are already selected to be part of the feasible solution, DD is a SS-compliant domain so that we only care about the volume inside the cylinder D×D\times\mathbb{R}, and we can still select \ell points from P(D×)P\cap(D\times\mathbb{R}). We have two different values associated to each valid tuple, depending on which subsets QQ of vertices from PDP\cap D can be selected:

Φfree(S,D,)=max{\displaystyle\Phi_{\rm free}(S,D,\ell)~=~\max\{ vol(𝒰(SQ)(D×))QP(D×),|Q|}.\displaystyle\textsc{vol}(\mathcal{U}(S\cup Q)\cap(D\times\mathbb{R}))\mid Q\subset P\cap(D\times\mathbb{R}),~|Q|\leq\ell\}.
Φcomp(S,D,)=max{\displaystyle\Phi_{\rm comp}(S,D,\ell)~=~\max\{ vol(𝒰(SQ)(D×))QP(D×),|Q|,\displaystyle\textsc{vol}(\mathcal{U}(S\cup Q)\cap(D\times\mathbb{R}))\mid Q\subset P\cap(D\times\mathbb{R}),~|Q|\leq\ell,
D is (SQ)-compliant}.\displaystyle\text{$D$ is $(S\cup Q)$-compliant}\}.

Obviously, we have for all valid tuples (S,D,)(S,D,\ell)

Φcomp(S,D,)Φfree(S,D,).\Phi_{\rm comp}(S,D,\ell)~\leq~\Phi_{\rm free}(S,D,\ell).

On the other hand, we are interested in the valid tuple (,σ,k)(\emptyset,\sigma,k), for which we have Φfree(,σ,k)=Φcomp(,σ,k)\Phi_{\rm free}(\emptyset,\sigma,k)=\Phi_{\rm comp}(\emptyset,\sigma,k).

We would like to get a recursive formula for Φfree(S,D,)\Phi_{\rm free}(S,D,\ell) or Φcomp(S,D,)\Phi_{\rm comp}(S,D,\ell) using planar separators. More precisely, we would like to use a separator in T(SQ)T(S\cup Q^{*}) for an optimal solution, and then branch on all possible such separators. However, neither of the two definitions seems good enough for this. If we used Φfree(S,D,)\Phi_{\rm free}(S,D,\ell), then we would divide into domains that may have too much freedom, and the interaction between subproblems would get complex. If we used Φcomp(S,D,)\Phi_{\rm comp}(S,D,\ell), then merging the subproblems becomes an issue. Thus, we take a mixed route where we argue that, for the valid tuples that are relevant for finding the optimal solution, we actually have Φfree=Φcomp\Phi_{\rm free}=\Phi_{\rm comp}.

We start by showing how to compute Φcomp(S,D,)\Phi_{\rm comp}(S,D,\ell) in the obvious way. This will be used to solve the base cases of the recursion.

Lemma 4.2.

We can compute Φcomp(S,D,)\Phi_{\rm comp}(S,D,\ell) in O(n+2)O(n^{\ell+2}) time.

Proof.

We enumerate all the subsets QQ of PDP\cap D with \ell points, and for each such QQ we proceed as follows. We first build T(SQ)T(S\cup Q) and check whether the boundary of DD is contained in the edge set of T(SQ)T(S\cup Q). If it is not, then DD is not (SQ)(S\cup Q)-compliant and we move to the next subset QQ. Otherwise, we compute 𝒰(SQ)\mathcal{U}(S\cup Q), its restriction to D×D\times\mathbb{R}, and its volume. Standard approaches can be used to do this in O((|S|+|Q|+|D|)2)=O(n2)O((|S|+|Q|+|D|)^{2})=O(n^{2}) time, for example working with the projection onto the xyxy-plane. (The actual degree of the polynomial is not relevant.) This procedure enumerates O(|P|)=O(n)O(|P|^{\ell})=O(n^{\ell}) subsets of PP and for each one spends O(n2)O(n^{2}) time. The result follows. ∎

A valid partition π\pi of (S,D,)(S,D,\ell) is a collection of valid tuples π={(S1,D1,1),,(St,Dt,t)}\pi=\left\{(S_{1},D_{1},\ell_{1}),\dots,(S_{t},D_{t},\ell_{t})\right\} such that

  • S1==St=SS0S_{1}=\dots=S_{t}=S\cup S_{0} for some set S0PDS_{0}\subset P\cap D;

  • |S0|=O(|S|+)|S_{0}|=O\left(\sqrt{|S|+\ell}\right);

  • the domains D1D_{1},\dots, DtD_{t} have pairwise disjoint interiors and D=iDiD=\bigcup_{i}D_{i};

  • =|S0|+ii\ell=|S_{0}|+\sum_{i}\ell_{i}; and

  • i2/3\ell_{i}\leq 2\ell/3 for each i=1,,ti=1,\dots,t.

Let Π(S,D,)\Pi(S,D,\ell) be the family of valid partitions for the tuple (S,D,)(S,D,\ell). We remark that different valid partitions may have different cardinality.

Lemma 4.3.

For each valid tuple (S,D,)(S,D,\ell) we have

Φfree(S,D,)maxπΠ(S,D,)(S,D,)πΦfree(S,D,).\Phi_{\rm free}(S,D,\ell)~\geq~\max_{\pi\in\Pi(S,D,\ell)}~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Phi_{\rm free}(S^{\prime},D^{\prime},\ell^{\prime}).
Proof.

For any valid partition πΠ(S,D,)\pi\in\Pi(S,D,\ell), let SπS_{\pi} be the smallest set such that S=SSπS^{\prime}=S\cup S_{\pi} for all tuples (S,D,)π(S^{\prime},D^{\prime},\ell^{\prime})\in\pi. This means that Sπ=SSS_{\pi}=S^{\prime}\setminus S for an arbitrary (S,D,)π(S^{\prime},D^{\prime},\ell^{\prime})\in\pi. For each such tuple (S,D,)π(S^{\prime},D^{\prime},\ell^{\prime})\in\pi, let Q(S,D,)Q^{*}(S^{\prime},D^{\prime},\ell^{\prime}) be an optimal solution to Φfree(S,D,)\Phi_{\rm free}(S^{\prime},D^{\prime},\ell^{\prime}), and define

Qπ=Sπ(S,D,)πQ(S,D,).Q_{\pi}=S_{\pi}\cup\bigcup_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}Q^{*}(S^{\prime},D^{\prime},\ell^{\prime}).

Then from the properties of valid partitions we have

|Qπ|=|Sπ|+(S,D,)π|Q(S,D,)|=|Sπ|+(S,D,)π=.|Q_{\pi}|~=~|S_{\pi}|+\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}|Q^{*}(S^{\prime},D^{\prime},\ell^{\prime})|~=~|S_{\pi}|+\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\ell^{\prime}~=~\ell.

Obviously, QπQ_{\pi} is contained in PDP\cap D because SπS_{\pi} is contained in PDP\cap D by the definition of a valid partition, and every domain DD^{\prime} appearing in π\pi is contained in DD.

We have seen that for each valid partition πΠ(S,D,)\pi\in\Pi(S,D,\ell) the set QπQ_{\pi} is a feasible solution considered in the problem Φfree(S,D,)\Phi_{\rm free}(S,D,\ell). Therefore

Φfree(S,D,)maxπΠ(S,D,)vol(𝒰(SQπ)(D×)).\Phi_{\rm free}(S,D,\ell)~\geq~\max_{\pi\in\Pi(S,D,\ell)}\textsc{vol}(\mathcal{U}(S\cup Q_{\pi})\cap(D\times\mathbb{R})).

Using that the interiors of {D(S,D,)π}\{D^{\prime}\mid(S^{\prime},D^{\prime},\ell^{\prime})\in\pi\} are pairwise disjoint, and then using that SQ(S,D,)S^{\prime}\cup Q^{*}(S^{\prime},D^{\prime},\ell^{\prime}) is contained in SQπS\cup Q_{\pi} for all (S,D,)π(S^{\prime},D^{\prime},\ell^{\prime})\in\pi, we obtain

Φfree(S,D,)\displaystyle\Phi_{\rm free}(S,D,\ell)~ maxπΠ(S,D,)(S,D,)πvol(𝒰(SQπ)(D×))\displaystyle\geq~\max_{\pi\in\Pi(S,D,\ell)}\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\textsc{vol}(\mathcal{U}(S\cup Q_{\pi})\cap(D^{\prime}\times\mathbb{R}))
maxπΠ(S,D,)(S,D,)πvol(𝒰(SQ(S,D,))(D×)).\displaystyle\geq~\max_{\pi\in\Pi(S,D,\ell)}\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\textsc{vol}(\mathcal{U}(S^{\prime}\cup Q^{*}(S^{\prime},D^{\prime},\ell^{\prime}))\cap(D^{\prime}\times\mathbb{R})).

Since Q(S,D,)Q^{*}(S^{\prime},D^{\prime},\ell^{\prime}) is optimal for Φfree(S,D,)\Phi_{\rm free}(S^{\prime},D^{\prime},\ell^{\prime}), we obtain the desired

Φfree(S,D,)maxπΠ(S,D,)(S,D,)πΦfree(S,D,).\Phi_{\rm free}(S,D,\ell)~\geq~\max_{\pi\in\Pi(S,D,\ell)}\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Phi_{\rm free}(S^{\prime},D^{\prime},\ell^{\prime}).\qed
Lemma 4.4.

For each valid tuple (S,D,)(S,D,\ell) we have

Φcomp(S,D,)maxπΠ(S,D,)(S,D,)πΦcomp(S,D,).\Phi_{\rm comp}(S,D,\ell)~\leq~\max_{\pi\in\Pi(S,D,\ell)}~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Phi_{\rm comp}(S^{\prime},D^{\prime},\ell^{\prime}).
Proof.

Let QQ^{*} be the optimal solution defining Φcomp(S,D,)\Phi_{\rm comp}(S,D,\ell). Thus, QPDQ^{*}\subseteq P\cap D has at most \ell points, DD is (SQ)(S\cup Q^{*})-compliant, and

Φcomp(S,D,)=vol(𝒰(SQ)(D×)).\Phi_{\rm comp}(S,D,\ell)~=~\textsc{vol}(\mathcal{U}(S\cup Q^{*})\cap(D\times\mathbb{R})).

Consider the triangulation T(SQ)T(S\cup Q^{*}). This is a 33-connected planar graph. Recall that the boundary of DD is contained in T(SQ)T(S\cup Q^{*}) because DD is (SQ)(S\cup Q^{*})-compliant. Note that T(SQ)T(S\cup Q^{*}) has O(|SQ|)=O(|S|+)O(|S\cup Q^{*}|)=O(|S|+\ell) vertices.

Assign weight 1/|Q|1/|Q^{*}| to the vertices vqv_{q}, qQq\in Q^{*}, and weight 0 to the rest of vertices in T(SQ)T(S\cup Q^{*}). The sum of the weights is obviously 11. Because of the cycle-separator theorem of Miller [25], there is a cycle γ\gamma in T(SQ)T(S\cup Q^{*}) with O(|S|+)O(\sqrt{|S|+\ell}) vertices, such that the interior of γ\gamma has at most 2|Q|/32|Q^{*}|/3 vertices of QQ^{*} and the exterior of γ\gamma has at most 2|Q|/32|Q^{*}|/3 vertices of QQ^{*}.

Since γ\gamma has O(|S|+)O(\sqrt{|S|+\ell}) vertices, the set PγP_{\gamma} also has O(|S|+)O(\sqrt{|S|+\ell}) vertices. Note that PγSQP_{\gamma}\subseteq S\cup Q^{*}. Take S0=PγSS_{0}=P_{\gamma}\setminus S, so that SPγS\cup P_{\gamma} is the disjoint union of SS and S0S_{0}. Because of Lemma 4.1, the domain DD and the cycle γ\gamma are (SS0)(S\cup S_{0})-compliant.

The cycle γ\gamma breaks the domain DD into at least 22 domains. Let 𝒟={D1,,Dt}\mathcal{D}=\{D_{1},\dots,D_{t}\} be those domains. Since the boundary of each domain Di𝒟D_{i}\in\mathcal{D} is contained in Dγ\partial D\cup\gamma, each domain Di𝒟D_{i}\in\mathcal{D} is (SS0)(S\cup S_{0})-compliant. For each domain Di𝒟D_{i}\in\mathcal{D}, let Qi={qQ(SS0)vqDi}Q^{*}_{i}=\{q\in Q^{*}\setminus(S\cup S_{0})\mid v_{q}\in D_{i}\} and let i=|Qi|\ell_{i}=|Q^{*}_{i}|. Since the interior of DiD_{i} is either in the interior or the exterior of γ\gamma, we have i2/3\ell_{i}\leq 2\ell/3 for each Di𝒟D_{i}\in\mathcal{D}. Moreover, ||=|S0|+ii|\ell|=|S_{0}|+\sum_{i}\ell_{i} because the points qq of QQ^{*} that could be counted twice have the corresponding vertex vqv_{q} on γ\gamma, but then they also belong to PγSS0P_{\gamma}\subset S\cup S_{0} and thus cannot belong to QiQ_{i}^{*}.

The properties we have derived imply that πγ={(SS0,Di,i)i=1,,t}\pi_{\gamma}=\{(S\cup S_{0},D_{i},\ell_{i})\mid i=1,\dots,t\} is a valid partition of (S,D,)(S,D,\ell), and thus πγΠ(S,D,)\pi_{\gamma}\in\Pi(S,D,\ell). Moreover QiQ^{*}_{i} is a feasible solution for the problem Φcomp(SS0,Di,i)\Phi_{\rm comp}(S\cup S_{0},D_{i},\ell_{i}). Indeed, since DiD_{i} is (SS0)(S\cup S_{0})-compliant and (SQ)(S\cup Q^{*})-compliant, Lemma 4.1 implies that DiD_{i} is also (SS0Qi)(S\cup S_{0}\cup Q^{*}_{i})-compliant.

Note that, for each (SS0,Di,i)(S\cup S_{0},D_{i},\ell_{i}) in the partition πγ\pi_{\gamma} we have

vol(𝒰(SQ)(Di×))=vol(𝒰(SS0Qi)(Di×)).\textsc{vol}(\mathcal{U}(S\cup Q^{*})\cap(D_{i}\times\mathbb{R}))~=~\textsc{vol}(\mathcal{U}(S\cup S_{0}\cup Q^{*}_{i})\cap(D_{i}\times\mathbb{R})). (1)

Indeed, for a point qQ(SS0Qi)q\in Q^{*}\setminus(S\cup S_{0}\cup Q^{*}_{i}), box(q)\textsc{box}(q) may contribute to the union 𝒰(SPγQ)\mathcal{U}(S\cup P_{\gamma}\cup Q^{*}), but when projected onto the xyxy-plane it lies outside the domain DiD_{i} because the face f(q,SQ)f(q,S\cup Q^{*}) lies outside DiD_{i}.

Therefore we obtain

Φcomp(S,D,)\displaystyle\Phi_{\rm comp}(S,D,\ell)~ =vol(𝒰(SQ)(D×))ivol(𝒰(SQ)(Di×)),\displaystyle=~\textsc{vol}(\mathcal{U}(S\cup Q^{*})\cap(D\times\mathbb{R}))~\leq~\sum_{i}\textsc{vol}(\mathcal{U}(S\cup Q^{*})\cap(D_{i}\times\mathbb{R})),

where we used D=iDiD=\bigcup_{i}D_{i}. With equation (1), and then using that QiQ_{i}^{*} is feasible for Φcomp(SS0,Di,i)\Phi_{\rm comp}(S\cup S_{0},D_{i},\ell_{i}), we get

Φcomp(S,D,)\displaystyle\Phi_{\rm comp}(S,D,\ell)~ ivol(𝒰(SS0Qi)(Di×))\displaystyle\leq~\sum_{i}\textsc{vol}(\mathcal{U}(S\cup S_{0}\cup Q^{*}_{i})\cap(D_{i}\times\mathbb{R}))
iΦcomp(SS0,Di,i)=(S,D,)πγΦcomp(S,D,).\displaystyle\leq~\sum_{i}\Phi_{\rm comp}(S\cup S_{0},D_{i},\ell_{i})~=~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi_{\gamma}}\Phi_{\rm comp}(S^{\prime},D^{\prime},\ell^{\prime}).

The statement now follows since πγΠ(S,D,)\pi_{\gamma}\in\Pi(S,D,\ell). ∎

Our dynamic programming algorithm closely follows the inequality of Lemma 4.4. Specifically, we define for each valid tuple (S,D,)(S,D,\ell) the value

Ψcomp(S,D,)={Φcomp(S,D,)if O(k);maxπΠ(S,D,)(S,D,)πΨcomp(S,D,),otherwise.\Psi_{\rm comp}(S,D,\ell)~=~\begin{cases}\Phi_{\rm comp}(S,D,\ell)&\text{if $\ell\leq O(\sqrt{k})$;}\\ {\displaystyle\max_{\pi\in\Pi(S,D,\ell)}~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Psi_{\rm comp}(S^{\prime},D^{\prime},\ell^{\prime})},&\text{otherwise.}\\ \end{cases}
Lemma 4.5.

For each valid tuple (S,D,)(S,D,\ell) we have

Φcomp(S,D,)Ψcomp(S,D,)Φfree(S,D,).\Phi_{\rm comp}(S,D,\ell)~\leq~\Psi_{\rm comp}(S,D,\ell)~\leq~\Phi_{\rm free}(S,D,\ell).
Proof.

We show this by induction on \ell. When O(k)\ell\leq O(\sqrt{k}), then from the definitions we have

Ψcomp(S,D,)=Φcomp(S,D,)Φfree(S,D,).\Psi_{\rm comp}(S,D,\ell)~=~\Phi_{\rm comp}(S,D,\ell)~\leq~\Phi_{\rm free}(S,D,\ell).

This covers the base cases. For larger values of \ell, we use Lemma 4.4, the induction hypothesis, and the definition of Ψcomp()\Psi_{\rm comp}(\cdot) to derive

Φcomp(S,D,)\displaystyle\Phi_{\rm comp}(S,D,\ell)~ maxπΠ(S,D,)(S,D,)πΦcomp(S,D,)\displaystyle\leq~\max_{\pi\in\Pi(S,D,\ell)}~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Phi_{\rm comp}(S^{\prime},D^{\prime},\ell^{\prime})
maxπΠ(S,D,)(S,D,)πΨcomp(S,D,)\displaystyle\leq~\max_{\pi\in\Pi(S,D,\ell)}~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Psi_{\rm comp}(S^{\prime},D^{\prime},\ell^{\prime})
=Ψcomp(S,D,).\displaystyle=~\Psi_{\rm comp}(S,D,\ell).

Also for larger values of \ell, we use the definition of Ψcomp()\Psi_{\rm comp}(\cdot), the induction hypothesis, and Lemma 4.3, to derive

Ψcomp(S,D,)\displaystyle\Psi_{\rm comp}(S,D,\ell)~ =maxπΠ(S,D,)(S,D,)πΨcomp(S,D,)\displaystyle=~\max_{\pi\in\Pi(S,D,\ell)}~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Psi_{\rm comp}(S^{\prime},D^{\prime},\ell^{\prime})
maxπΠ(S,D,)(S,D,)πΦfree(S,D,)\displaystyle\leq~\max_{\pi\in\Pi(S,D,\ell)}~\sum_{(S^{\prime},D^{\prime},\ell^{\prime})\in\pi}\Phi_{\rm free}(S^{\prime},D^{\prime},\ell^{\prime})
Φfree(S,D,).\displaystyle\leq~\Phi_{\rm free}(S,D,\ell).

Since we know that Φfree(,σ,k)=Φcomp(,σ,k)\Phi_{\rm free}(\emptyset,\sigma,k)=\Phi_{\rm comp}(\emptyset,\sigma,k), Lemma 4.5 implies that Ψcomp(,σ,k)=Φfree(,σ,k)\Psi_{\rm comp}(\emptyset,\sigma,k)=\Phi_{\rm free}(\emptyset,\sigma,k). Hence, it suffices to compute Ψcomp(,σ,k)\Psi_{\rm comp}(\emptyset,\sigma,k) using its recursive definition. In the remainder, we bound the running time of this algorithm.

Theorem 4.6.

In 3 dimensions, Volume Selection can be solved in time nO(k)n^{O(\sqrt{k})}.

Proof.

We compute Ψcomp(,σ,k)\Psi_{\rm comp}(\emptyset,\sigma,k) using its recursive definition. We need a bound on the number of different subproblems, defined by valid tuples (S,D,)(S,D,\ell) that appear in the recursion. We will see that there are nO(k)n^{O(\sqrt{k})} different subproblems.

Starting with (S1,D1,1)=(,σ,k)(S_{1},D_{1},\ell_{1})=(\emptyset,\sigma,k), consider a sequence of valid tuples (S1,D1,1)(S_{1},D_{1},\ell_{1}), (S2,D2,2)(S_{2},D_{2},\ell_{2}), \dots such that, for i2i\geq 2, the tuple (Si,Di,i)(S_{i},D_{i},\ell_{i}) appears in some valid partition of (Si1,Di1,i1)(S_{i-1},D_{i-1},\ell_{i-1}). Because of the properties of valid partitions, we have i2i1/3\ell_{i}\leq 2\ell_{i-1}/3 and |Si1||Si||Si1|+O(|Si|+i1)|S_{i-1}|\leq|S_{i}|\leq|S_{i-1}|+O(\sqrt{|S_{i}|+\ell_{i-1}}).

Let i0i_{0} be the first index ii with |Si|>i|S_{i}|>\ell_{i}. Consider first the indices i<i0i<i_{0}, where |Si|i|S_{i}|\leq\ell_{i}. Then |Si||Si1|+O(i1)|S_{i}|\leq|S_{i-1}|+O(\sqrt{\ell_{i-1}}) and it follows by induction that

|Si|\displaystyle|S_{i}|~ |S1|+O(1)+O(2)++O(i1)\displaystyle\leq~|S_{1}|+O(\sqrt{\ell_{1}})+O(\sqrt{\ell_{2}})+\dots+O(\sqrt{\ell_{i-1}})
0+O(j<ij)O(j<i(23)j1)O(1)O(k),\displaystyle\leq~0+O\bigg{(}\sum_{j<i}\sqrt{\ell_{j}}\bigg{)}~\leq~O\Bigg{(}\sum_{j<i}\sqrt{\left(\frac{2}{3}\right)^{j}\ell_{1}}\Bigg{)}~\leq~O\Big{(}\sqrt{\ell_{1}}\Big{)}~\leq~O\Big{(}\sqrt{k}\Big{)},

where we have used that 1=k\ell_{1}=k. By definition of i0i_{0}, for i>i0i>i_{0} we have |Si||Si0|+i02|Si0|=O(k)|S_{i}|\leq|S_{i_{0}}|+\ell_{i_{0}}\leq 2|S_{i_{0}}|=O(\sqrt{k}). Therefore, for all indices ii we have |Si|=O(k)|S_{i}|=O(\sqrt{k}).

For each valid tuple that appears in the recursive computation of Ψcomp(,σ,k)\Psi_{\rm comp}(\emptyset,\sigma,k), there is some sequence of valid tuples, as considered before, that contains it. It follows that, for all valid tuples (S,D,)(S,D,\ell) considered through the algorithm we have |S|=O(k)|S|=O(\sqrt{k}).

Let us give an upper bound on the valid tuples (S,D,)(S,D,\ell) that appear in the computation. There are nO(k)n^{O(\sqrt{k})} choices for the set SS. Once we have fixed SS, the domain DD has to be SS-compliant, and this means that we have to select edges in the triangulated graph T(S)T(S). Since T(S)T(S) has O(|S|)=O(k)O(|S|)=O(\sqrt{k}) vertices and edges, there are at most 2|E(T(S))|=2O(k)2^{|E(T(S))|}=2^{O(\sqrt{k})} possible choices for DD. Finally, we have kk options for the value \ell. Therefore, there are at most

nO(k)2O(k)k=nO(k)n^{O(\sqrt{k})}\cdot 2^{O(\sqrt{k})}\cdot k~=~n^{O(\sqrt{k})}

valid tuples (S,D,)(S,D,\ell) that appear in the recursion.

We next bound how much time we spend for each tuple. Consider a valid tuple (S,D,)(S,D,\ell) that appears through the recursion. If =O(k)\ell=O(\sqrt{k}), we compute Ψcomp(S,D,)\Psi_{\rm comp}(S,D,\ell) using Lemma 4.2 in nO()=nO(k)n^{O(\ell)}=n^{O(\sqrt{k})} time. Otherwise, to compute Ψcomp(S,D,)\Psi_{\rm comp}(S,D,\ell) we have to iterate over all the valid partitions Π(S,D,)\Pi(S,D,\ell). There are nO(k)n^{O(\sqrt{k})} such valid partitions. Indeed, we have to select the subset S0DPS_{0}\subset D\cap P with O(k)O(\sqrt{k}) vertices and then the partitioning of DD into regions D1,,DtD_{1},\dots,D_{t} that are (SS0)(S\cup S_{0})-compliant. This can be bounded by nO(k)n^{O(\sqrt{k})}. (Alternatively, we can iterate over the nO(k)n^{O(\sqrt{k})} possible options to define the separating cycle γ\gamma used in the proof of Lemma 4.4.)

We conclude that in the computation of Ψcomp(,σ,k)\Psi_{\rm comp}(\emptyset,\sigma,k) we have to consider nO(k)n^{O(\sqrt{k})} valid tuples and for each one of them computing Ψcomp()\Psi_{\rm comp}(\cdot) takes nO(k)n^{O(\sqrt{k})} time. The result follows. ∎

We only described an algorithm that computes VolSel(P,k)\textsc{VolSel}(P,k), i.e., the maximal volume realized by any size-kk subset of PP. It is easy to augment the algorithm with appropriate bookkeeping to also compute an actual optimal subset.

5 Efficient Polynomial-Time Approximation Scheme

In this section we design an approximation algorithm for Volume Selection.

Theorem 5.1.

Given a point set PP of size nn in >0d\mathbb{R}_{>0}^{d}, 0kn0\leq k\leq n, and 0<ε1/20<\varepsilon\leq 1/2, we can compute a (1±ε)(1\pm\varepsilon)-approximation of VolSel(P,k)\textsc{VolSel}(P,k) in time O(nεd(logn+k+2O(ε2log1/ε)d))O(n\cdot\varepsilon^{-d}(\log n+k+2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}})). We can also compute a set SPS\subseteq P of size at most kk such that μ(S)\mu(S) is a (1ε)(1-\varepsilon)-approximation of VolSel(P,k)\textsc{VolSel}(P,k) in the same time.

We also discuss an improvement to time O(2O(ε2log1/ε)dnlogn)O\big{(}2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}}\cdot n\log n\big{)} in Section 5.4.

The approach is based on the shifting technique of Hochbaum and Maass [21]. However, there are some non-standard aspects in our application. It is impossible to break the problem into independent subproblems because all the anchored boxes intersect around the origin. We instead break the input into subproblems that are almost independent. To achieve this, we use an exponential grid, instead of the usual regular grid with equal-size cells. Alternatively, this could be interpreted as using a regular grid in a log\log-log\log plot of the input points.

Throughout this section we need two numbers λ,τd/ε\lambda,\tau\approx d/\varepsilon. Specifically, we define τ\tau as the smallest integer larger than d/εd/\varepsilon, and λ\lambda as the smallest power of (1ε)1/d(1-\varepsilon)^{-1/d} larger than d/εd/\varepsilon. We consider a partitioning of the positive quadrant >0d\mathbb{R}_{>0}^{d} into regions of the form

R(x¯):=i=1d[λxi,λxi+1)forx¯=(x1,,xd)d.R(\bar{x}):=\prod_{i=1}^{d}[\lambda^{x_{i}},\lambda^{x_{i}+1})\quad\text{for}\quad\bar{x}=(x_{1},\ldots,x_{d})\in\mathbb{Z}^{d}.

On top of this partitioning we consider a grid, where each grid cell contains (τ1)d(\tau-1)^{d} regions and the grid boundaries are thick, i.e., two grid cells do not touch but have a region in between. More precisely, for any offset ¯=(1,,d)d{\bar{\ell}}=(\ell_{1},\ldots,\ell_{d})\in\mathbb{Z}^{d}, we define the grid cells

C¯(y¯):=i=1d[λτyi+i+1,λτ(yi+1)+i)fory¯=(y1,,yd)d.C_{\bar{\ell}}(\bar{y}):=\prod_{i=1}^{d}[\lambda^{\tau\cdot y_{i}+\ell_{i}+1},\lambda^{\tau(y_{i}+1)+\ell_{i}})\quad\text{for}\quad\bar{y}=(y_{1},\ldots,y_{d})\in\mathbb{Z}^{d}.

Note that each grid cell indeed consists of (τ1)d(\tau-1)^{d} regions, and the space not contained in any grid cell (i.e., the grid boundaries) consists of all regions R(x¯)R(\bar{x}) with xii(modτ)x_{i}\equiv\ell_{i}\pmod{\tau} for some 1id1\leq i\leq d.
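For concreteness, the following sketch (illustrative only; the parameter values in the example are hypothetical, and floating-point care is needed for points exactly on region boundaries) classifies a point of >0d\mathbb{R}_{>0}^{d} as lying either in a grid cell C¯(y¯)C_{\bar{\ell}}(\bar{y}) or on the thick grid boundary, for a given offset ¯{\bar{\ell}}.

import math

def grid_classify(p, offset, eps):
    # Classify p with respect to the exponential grid with offset in [tau]^d.
    # Returns ("boundary", None) if p lies in the thick grid boundary, and otherwise
    # ("cell", ybar) where ybar indexes the grid cell containing p.
    d = len(p)
    tau = math.floor(d / eps) + 1                              # smallest integer > d/eps
    base = (1 - eps) ** (-1.0 / d)
    lam = base ** (math.floor(math.log(d / eps, base)) + 1)    # smallest power of base > d/eps
    # region index: p_i lies in [lam^{x_i}, lam^{x_i + 1})
    x = [math.floor(math.log(pi, lam)) for pi in p]
    if any((xi - li) % tau == 0 for xi, li in zip(x, offset)):
        return ("boundary", None)
    y = tuple((xi - li - 1) // tau for xi, li in zip(x, offset))
    return ("cell", y)

# Hypothetical example: d = 3, eps = 1/2, offset (0, 1, 2).
print(grid_classify((0.7, 5.0, 40.0), offset=(0, 1, 2), eps=0.5))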

Our approximation algorithm now works as follows (cf. the pseudocode given below).

(1) Iterate over all grid offsets ¯[τ]d{\bar{\ell}}\in[\tau]^{d}. This is the key step of the shifting technique of Hochbaum and Maass [21].

(2) For any choice of the offset ¯{\bar{\ell}}, remove all points not contained in any grid cell, i.e., remove points contained in the thick grid boundaries. This yields a set PPP^{\prime}\subseteq P of remaining points.

(3) The grid cells now induce a partitioning of PP^{\prime} into sets P1,,PmP_{1}^{\prime},\ldots,P_{m}^{\prime}, where each PiP_{i}^{\prime} is the intersection of PP^{\prime} with a grid cell CiC_{i} (with Ci=C¯(y¯(i))C_{i}=C_{\bar{\ell}}(\bar{y}^{(i)}) for some y¯(i)d\bar{y}^{(i)}\in\mathbb{Z}^{d}). Note that these grid cell subproblems P1,,PmP_{1}^{\prime},\ldots,P_{m}^{\prime} are not independent, since any two boxes have a common intersection near the origin, no matter how different their coordinates are. However, we will see that we may treat P1,,PmP_{1}^{\prime},\ldots,P_{m}^{\prime} as independent subproblems since we only want an approximation.

(4) We discretize by rounding down all coordinates of all points in P1,,PmP_{1}^{\prime},\ldots,P_{m}^{\prime} to powers of (1ε)1/d(1-\varepsilon)^{1/d}. (Here we use that λ\lambda is a power of (1ε)1/d(1-\varepsilon)^{-1/d}, to ensure that rounded points are contained in the same cells as their originals.) We can remove duplicate points that are rounded to the same coordinates. This yields sets P~1,,P~m\tilde{P}_{1},\ldots,\tilde{P}_{m}. Note that within each grid cell in any dimension the largest and smallest coordinate differ by a factor of at most λτ1\lambda^{\tau-1}. Hence, there are at most log(1ε)1/d(λτ1)=O(ε2log1/ε)\log_{(1-\varepsilon)^{-1/d}}(\lambda^{\tau-1})=O(\varepsilon^{-2}\log 1/\varepsilon) different rounded coordinates in each dimension, and thus the total number of points in each P~i\tilde{P}_{i} is O(ε2log1/ε)dO(\varepsilon^{-2}\log 1/\varepsilon)^{d}.

(5) Since there are only few points in each P~i\tilde{P}_{i}, we can precompute all Volume Selection solutions on each set P~i\tilde{P}_{i}, i.e., for any 1im1\leq i\leq m and any 0k|P~i|0\leq k^{\prime}\leq|\tilde{P}_{i}| we precompute VolSel(P~i,k)\textsc{VolSel}(\tilde{P}_{i},k^{\prime}). We do so by exhaustively enumerating all 2|P~i|2^{|\tilde{P}_{i}|} subsets SS of P~i\tilde{P}_{i}, and for each one computing μ(S)\mu(S) by inclusion-exclusion in time O(2|S|)O(2^{|S|}) (see, e.g., [34, 35]). This runs in total time O(m2O(ε2log1/ε)d)=O(n2O(ε2log1/ε)d)O(m\cdot 2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}})=O(n\cdot 2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}}).

(6) It remains to split the kk points that we want to choose over the subproblems P~1,,P~m\tilde{P}_{1},\ldots,\tilde{P}_{m}. As we treat these subproblems independently, we compute

V(¯):=maxk1++kmki=1mVolSel(P~i,ki).V({\bar{\ell}}):=\max_{k_{1}+\ldots+k_{m}\leq k}\;\sum_{i=1}^{m}\textsc{VolSel}(\tilde{P}_{i},k_{i}).

Note that if the subproblems were independent, then this expression would yield the exact result. We argue below that the subproblems are sufficiently close to being independent that this expression yields a (1ε)(1-\varepsilon)-approximation of VolSel(i=1mP~i,k)\textsc{VolSel}(\bigcup_{i=1}^{m}\tilde{P}_{i},k). Observe that the expression V(¯)V({\bar{\ell}}) can be computed efficiently by dynamic programming, where we compute for each ii and kk^{\prime} the following value:

T[i,k]=maxk1++kiki=1iVolSel(P~i,ki).T[i,k^{\prime}]=\max_{k_{1}+\ldots+k_{i}\leq k^{\prime}}\;\sum_{i^{\prime}=1}^{i}\textsc{VolSel}(\tilde{P}_{i^{\prime}},k_{i^{\prime}}).

The following rule computes this table (see the pseudocode below for further details):

T[i,k]=max0κmin{k,|P~i|}(VolSel(P~i,κ)+T[i1,kκ]).T[i,k^{\prime}]=\max_{0\leq\kappa\leq\min\{k^{\prime},|\tilde{P}_{i}|\}}\big{(}\textsc{VolSel}(\tilde{P}_{i},\kappa)+T[i-1,k^{\prime}-\kappa]\big{)}.

(7) Finally, we optimize over the offset ¯{\bar{\ell}} by returning the maximal V(¯)V({\bar{\ell}}).

This finishes the description of the approximation algorithm. In pseudocode, this yields the following procedure; a small code sketch of steps (4)–(6) follows the pseudocode.

(1) Iterate over all offsets ¯=(1,,d)[τ]d{\bar{\ell}}=(\ell_{1},\ldots,\ell_{d})\in[\tau]^{d}:

  (2) P:=PP^{\prime}:=P. Delete any pp from PP^{\prime} that is not contained in any grid cell C¯(y¯)C_{\bar{\ell}}(\bar{y}).

  (3) Partition PP^{\prime} into P1,,PmP_{1}^{\prime},\ldots,P_{m}^{\prime}, where Pi=PCiP_{i}^{\prime}=P^{\prime}\cap C_{i} for some grid cell CiC_{i}.

  (4) Round down all coordinates to powers of (1ε)1/d(1-\varepsilon)^{1/d} and remove duplicates, obtaining P~1,,P~m\tilde{P}_{1},\ldots,\tilde{P}_{m}.

  (5) Compute H[i,k]:=VolSel(P~i,k)H[i,k^{\prime}]:=\textsc{VolSel}(\tilde{P}_{i},k^{\prime}) for all 1im1\leq i\leq m, 0k|P~i|0\leq k^{\prime}\leq|\tilde{P}_{i}|.

  (6) Compute V(¯):=maxk1++kmki=1mVolSel(P~i,ki)V({\bar{\ell}}):=\max_{k_{1}+\ldots+k_{m}\leq k}\sum_{i=1}^{m}\textsc{VolSel}(\tilde{P}_{i},k_{i}) by dynamic programming:

    (a) Initialize T[i,k]=0T[i,k^{\prime}]=0 for all 0im0\leq i\leq m, 0kk0\leq k^{\prime}\leq k.

    (b) For i=1,,mi=1,\ldots,m, for κ=0,,|P~i|\kappa=0,\ldots,|\tilde{P}_{i}|, and for k=κ,κ+1,,kk^{\prime}=\kappa,\kappa+1,\ldots,k: set T[i,k]:=max{T[i,k],H[i,κ]+T[i1,kκ]}T[i,k^{\prime}]:=\max\{T[i,k^{\prime}],H[i,\kappa]+T[i-1,k^{\prime}-\kappa]\}.

    (c) Set V(¯):=T[m,k]V({\bar{\ell}}):=T[m,k].

(7) Return max¯V(¯)\max_{\bar{\ell}}V({\bar{\ell}}).
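To illustrate steps (4)–(6), here is a small Python sketch (our own, not part of the analysis; it assumes the points have already been grouped by grid cell as in steps (2)–(3), and it ignores floating-point issues). It shows the rounding of step (4), the brute-force computation of the tables H[i,k'] by inclusion-exclusion as in step (5), and the dynamic program of step (6) that splits the budget k over the cells.

```python
import math
from itertools import combinations

def round_down(p, alpha):
    # Step (4): round every coordinate down to a power of alpha = (1 - eps)**(1/d).
    return tuple(alpha ** math.ceil(math.log(c, alpha)) for c in p)

def hypervolume(points):
    # Volume of the union of the anchored boxes box(p), by inclusion-exclusion.
    # Exponential in |points|, so we only call it on the small rounded sets.
    pts, total = list(points), 0.0
    for r in range(1, len(pts) + 1):
        for sub in combinations(pts, r):
            inter = 1.0
            for coords in zip(*sub):          # intersection of anchored boxes
                inter *= min(coords)
            total += (-1) ** (r + 1) * inter
    return total

def volsel_bruteforce(points, k_max):
    # Step (5): H[j] = VolSel(points, j) for 0 <= j <= min(k_max, |points|).
    pts, H = list(points), [0.0]
    for j in range(1, min(k_max, len(pts)) + 1):
        H.append(max(hypervolume(sub) for sub in combinations(pts, j)))
    return H

def combine_cells(tables, k):
    # Step (6): T[k'] = max over k_1 + ... + k_m <= k' of the sum of tables[i][k_i].
    T = [0.0] * (k + 1)
    for H in tables:
        new_T = T[:]                          # kappa = 0 keeps the previous row
        for kappa in range(1, len(H)):
            for kp in range(kappa, k + 1):
                new_T[kp] = max(new_T[kp], H[kappa] + T[kp - kappa])
        T = new_T
    return T[k]
```

For each offset one would round the points of every cell with round_down, remove duplicates, call volsel_bruteforce on each rounded cell, and feed the resulting tables to combine_cells; the value of that offset is the returned maximum, and the overall output is the best value over all offsets.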

5.1 Running Time

Step (1) yields a factor τd=O(1ε)d\tau^{d}=O(\frac{1}{\varepsilon})^{d} in the running time. Since we can compute for each point in constant time the grid cell it is contained in, step (2) runs in time O(n)O(n). For the partitioning in step (3), we use a dictionary data structure storing all y¯d\bar{y}\in\mathbb{Z}^{d} with nonempty PC¯(y¯)P^{\prime}\cap C_{\bar{\ell}}(\bar{y}). Then we can assign any point pPp\in P^{\prime} to the other points in its cell by one lookup in the dictionary, in time O(logn)O(\log n). Thus, step (3) can be performed in time O(nlogn)O(n\log n). Step (4) immediately works in the same running time. For step (5) we already argued above that it can be performed in time O(n2O(ε2log1/ε)d)O\big{(}n2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}}\big{)}. Finally, from the pseudocode for step (6) we read off a running time of O(i=1m|P~i|k)=O(nk)O(\sum_{i=1}^{m}|\tilde{P}_{i}|\cdot k)=O(nk). The total running time is thus

O(nεd(logn+k+2O(ε2log1/ε)d)).O\Big{(}n\cdot\varepsilon^{-d}\big{(}\log n+k+2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}}\big{)}\Big{)}.

5.2 Correctness

The following lemmas show that the above algorithm indeed computes a (1±O(ε))(1\pm O(\varepsilon))-approximation of VolSel(P,k)\textsc{VolSel}(P,k). Reducing ε\varepsilon by an appropriate constant factor then yields a (1±ε)(1\pm\varepsilon)-approximation.

Lemma 5.2 (Removing grid boundaries).

Let PP be a point set and let 0k|P|0\leq k\leq|P|. Remove all points contained in grid boundaries with offset ¯{\bar{\ell}} to obtain the point set P¯:=Py¯dC¯(y¯)P_{\bar{\ell}}:=P\cap\bigcup_{\bar{y}\in\mathbb{Z}^{d}}C_{\bar{\ell}}(\bar{y}). Then for all ¯d{\bar{\ell}}\in\mathbb{Z}^{d} we have

VolSel(P¯,k)VolSel(P,k),\textsc{VolSel}(P_{\bar{\ell}},k)\leq\textsc{VolSel}(P,k),

and for some ¯d{\bar{\ell}}\in\mathbb{Z}^{d} we have

VolSel(P¯,k)(1ε)VolSel(P,k).\textsc{VolSel}(P_{\bar{\ell}},k)\geq(1-\varepsilon)\textsc{VolSel}(P,k).
Proof.

Since we only remove points, the first inequality is immediate. For the second inequality we use a probabilistic argument. Consider an optimal solution, i.e., a set SPS\subseteq P of size at most kk with μ(S)=VolSel(P,k)\mu(S)=\textsc{VolSel}(P,k). Let S¯:=SP¯S_{\bar{\ell}}:=S\cap P_{\bar{\ell}}. For a uniformly random offset ¯[τ]d{\bar{\ell}}\in[\tau]^{d}, consider the probability that a fixed point pSp\in S survives, i.e., we have pS¯p\in S_{\bar{\ell}}. Consider the region R(x¯)=i=1d[λxi,λxi+1)R(\bar{x})=\prod_{i=1}^{d}[\lambda^{x_{i}},\lambda^{x_{i}+1}) containing point pp, where x¯=(x1,,xd)d\bar{x}=(x_{1},\ldots,x_{d})\in\mathbb{Z}^{d}. Recall that the grid boundaries consist of all regions R(x¯)R(\bar{x}) with xii(modτ)x_{i}\equiv\ell_{i}\pmod{\tau} for some 1id1\leq i\leq d. For a uniformly random ¯{\bar{\ell}}, for fixed ii the equation xii(modτ)x_{i}\equiv\ell_{i}\pmod{\tau} holds with probability 1/τ1/\tau. By a union bound, the probability that at least one of these equations holds for 1id1\leq i\leq d is at most d/τεd/\tau\leq\varepsilon (by definition of τ\tau as the smallest integer larger than d/εd/\varepsilon). Hence, pp survives with probability at least 1ε1-\varepsilon.

Now for each point q𝒰(S)q\in\mathcal{U}(S) identify a point s(q)Ss(q)\in S dominating qq. Since s(q)s(q) survives in S¯S_{\bar{\ell}} with probability at least 1ε1-\varepsilon, the point qq is dominated by S¯S_{\bar{\ell}} with probability at least 1ε1-\varepsilon. By integrating over all q𝒰(S)q\in\mathcal{U}(S) we thus obtain an expected volume of

𝔼¯[μ(S¯)]=𝒰(S)Pr[q is dominated by S¯]𝑑q𝒰(S)(1ε)𝑑q=(1ε)μ(S).\mathbb{E}_{\bar{\ell}}[\mu(S_{\bar{\ell}})]=\int_{\mathcal{U}(S)}\Pr[q\text{ is dominated by }S_{\bar{\ell}}]dq\geq\int_{\mathcal{U}(S)}(1-\varepsilon)dq=(1-\varepsilon)\mu(S).

It follows that for some ¯{\bar{\ell}} we have μ(S¯)𝔼[μ(S¯)](1ε)μ(S)\mu(S_{\bar{\ell}})\geq\mathbb{E}[\mu(S_{\bar{\ell}})]\geq(1-\varepsilon)\mu(S). For this ¯{\bar{\ell}} we have

VolSel(P¯,k)μ(S¯)(1ε)μ(S)=(1ε)VolSel(P,k),\textsc{VolSel}(P_{\bar{\ell}},k)\geq\mu(S_{\bar{\ell}})\geq(1-\varepsilon)\mu(S)=(1-\varepsilon)\textsc{VolSel}(P,k),

where the first inequality uses |S¯|k|S_{\bar{\ell}}|\leq k and the definition of VolSel as maximizing over all subsets, and the last inequality holds since we picked SS as an optimal solution, realizing VolSel(P,k)\textsc{VolSel}(P,k). ∎

Lemma 5.3 (Rounding down coordinates).

Let PP be a point set, and let P~\tilde{P} be the same point set after rounding down all coordinates to powers of (1ε)1/d(1-\varepsilon)^{1/d}. Then for any kk

(1ε)VolSel(P,k)VolSel(P~,k)VolSel(P,k).(1-\varepsilon)\textsc{VolSel}(P,k)\leq\textsc{VolSel}(\tilde{P},k)\leq\textsc{VolSel}(P,k).
Proof.

Let P^\hat{P} be the set PP with all coordinates scaled down by a factor α:=(1ε)1/d\alpha:=(1-\varepsilon)^{1/d}. By a simple scaling invariance, we have VolSel(P^,k)=αdVolSel(P,k)=(1ε)VolSel(P,k)\textsc{VolSel}(\hat{P},k)=\alpha^{d}\cdot\textsc{VolSel}(P,k)=(1-\varepsilon)\textsc{VolSel}(P,k). Note that for any point p~P~\tilde{p}\in\tilde{P} the corresponding point pPp\in P dominates p~\tilde{p}, and the corresponding point p^P^\hat{p}\in\hat{P} is dominated by p~\tilde{p}. Now pick any subset S~\tilde{S} of P~\tilde{P} of size kk, and let S,S^S,\hat{S} be the corresponding subsets of P,P^P,\hat{P}. Then we have 𝒰(S^)𝒰(S~)𝒰(S)\mathcal{U}(\hat{S})\subseteq\mathcal{U}(\tilde{S})\subseteq\mathcal{U}(S), which implies μ(S^)μ(S~)μ(S)\mu(\hat{S})\leq\mu(\tilde{S})\leq\mu(S), and thus

(1ε)VolSel(P,k)=VolSel(P^,k)VolSel(P~,k)VolSel(P,k).(1-\varepsilon)\textsc{VolSel}(P,k)=\textsc{VolSel}(\hat{P},k)\leq\textsc{VolSel}(\tilde{P},k)\leq\textsc{VolSel}(P,k).\qed

In the proof of the next lemma it is important that we defined the grid cells with thick grid boundaries, so that any two grid cells are separated by at least one region.

Lemma 5.4 (Treating subproblems as independent I).

For any offset ¯{\bar{\ell}}, let S1,,SmS_{1},\ldots,S_{m} be point sets contained in different grid cells with respect to offset ¯{\bar{\ell}}. Then we have

(1ε)i=1mμ(Si)μ(i=1mSi)i=1mμ(Si).(1-\varepsilon)\sum_{i=1}^{m}\mu(S_{i})\leq\mu\Big{(}\bigcup_{i=1}^{m}S_{i}\Big{)}\leq\sum_{i=1}^{m}\mu(S_{i}).
Proof.

The second inequality is essentially the union bound. Specifically, for any sets X1,,XmX_{1},\ldots,X_{m} the volume of i=1mXi\bigcup_{i=1}^{m}X_{i} is at most the sum over all volumes of XiX_{i} for 1im1\leq i\leq m. In particular, this statement holds with Xi=𝒰(Si)X_{i}=\mathcal{U}(S_{i}), which yields the second inequality.

For the first inequality, observe that we obtain the total volume of all points dominated by S1SmS_{1}\cup\ldots\cup S_{m} by summing up the volume of all points dominated by SiS_{i} but not by any SjS_{j}, j<ij<i, for each 1im1\leq i\leq m, i.e., we have

μ(i=1mSi)=i=1m(μ(Si)vol(𝒰(Si)j<i𝒰(Sj))).\mu\Big{(}\bigcup_{i=1}^{m}S_{i}\Big{)}=\sum_{i=1}^{m}\bigg{(}\mu(S_{i})-\textsc{vol}\Big{(}\mathcal{U}(S_{i})\cap\bigcup_{j<i}\mathcal{U}(S_{j})\Big{)}\bigg{)}. (2)

Now let C¯(y¯(i))C_{\bar{\ell}}(\bar{y}^{(i)}) be the grid cell containing SiS_{i} for 1im1\leq i\leq m, where y¯(i)=(y1(i),,yd(i))d\bar{y}^{(i)}=(y^{(i)}_{1},\ldots,y^{(i)}_{d})\in\mathbb{Z}^{d}. We may assume that these cells are ordered in non-decreasing order of y1(i)++yd(i)y^{(i)}_{1}+\ldots+y^{(i)}_{d}. Observe that in this ordering, for any j<ij<i we have yt(j)<yt(i)y^{(j)}_{t}<y^{(i)}_{t} for some 1td1\leq t\leq d. Recall that C¯(y¯)=t=1d[λτyt+t+1,λτ(yt+1)+t)C_{\bar{\ell}}(\bar{y})=\prod_{t=1}^{d}[\lambda^{\tau\cdot y_{t}+\ell_{t}+1},\lambda^{\tau(y_{t}+1)+\ell_{t}}). It follows that each point in j<i𝒰(Sj)\bigcup_{j<i}\mathcal{U}(S_{j}) has tt-th coordinate at most δt:=λτyt(i)+t\delta_{t}:=\lambda^{\tau\cdot y^{(i)}_{t}+\ell_{t}} for some 1td1\leq t\leq d. Setting Dt:={(z1,,zd)0dztδt}D_{t}:=\{(z_{1},\ldots,z_{d})\in\mathbb{R}_{\geq 0}^{d}\mid z_{t}\leq\delta_{t}\}, we thus have j<i𝒰(Sj)t=1dDt\bigcup_{j<i}\mathcal{U}(S_{j})\subseteq\bigcup_{t=1}^{d}D_{t}, which yields

vol(𝒰(Si)j<i𝒰(Sj))vol(𝒰(Si)t=1dDt)t=1dvol(𝒰(Si)Dt).\textsc{vol}\Big{(}\mathcal{U}(S_{i})\cap\bigcup_{j<i}\mathcal{U}(S_{j})\Big{)}\leq\textsc{vol}\Big{(}\mathcal{U}(S_{i})\cap\bigcup_{t=1}^{d}D_{t}\Big{)}\leq\sum_{t=1}^{d}\textsc{vol}\big{(}\mathcal{U}(S_{i})\cap D_{t}\big{)}. (3)

Let AA be the (d1)(d-1)-dimensional volume of the intersection of 𝒰(Si)\mathcal{U}(S_{i}) with the hyperplane xt=0x_{t}=0. Since all points in SiS_{i} have tt-th coordinate at least λτyt(i)+t+1=λδt\lambda^{\tau\cdot y^{(i)}_{t}+\ell_{t}+1}=\lambda\cdot\delta_{t}, we have μ(Si)Aλδt\mu(S_{i})\geq A\cdot\lambda\cdot\delta_{t}. Moreover, 𝒰(Si)Dt\mathcal{U}(S_{i})\cap D_{t} has dd-dimensional volume AδtA\cdot\delta_{t}. Together, this yields vol(𝒰(Si)Dt)μ(Si)/λ\textsc{vol}(\mathcal{U}(S_{i})\cap D_{t})\leq\mu(S_{i})/\lambda. With (2) and (3), we thus obtain

μ(i=1mSi)i=1m(μ(Si)dμ(Si)/λ)(1ε)i=1mμ(Si),\mu\Big{(}\bigcup_{i=1}^{m}S_{i}\Big{)}\geq\sum_{i=1}^{m}\big{(}\mu(S_{i})-d\cdot\mu(S_{i})/\lambda\big{)}\geq(1-\varepsilon)\sum_{i=1}^{m}\mu(S_{i}),

since λd/ε\lambda\geq d/\varepsilon. ∎

Lemma 5.5 (Treating subproblems as independent II).

For any offset ¯{\bar{\ell}}, let P1,,PmP_{1},\ldots,P_{m} be point sets contained in different grid cells, and k0k\geq 0. Set P:=i=1mPiP:=\bigcup_{i=1}^{m}P_{i}. Then we have

(1ε)maxk1++kmki=1mVolSel(Pi,ki)VolSel(P,k)maxk1++kmki=1mVolSel(Pi,ki).(1-\varepsilon)\cdot\max_{k_{1}+\ldots+k_{m}\leq k}\sum_{i=1}^{m}\textsc{VolSel}(P_{i},k_{i})\leq\textsc{VolSel}(P,k)\leq\max_{k_{1}+\ldots+k_{m}\leq k}\sum_{i=1}^{m}\textsc{VolSel}(P_{i},k_{i}).
Proof.

Consider an optimal solution SS of VolSel(P,k)\textsc{VolSel}(P,k) and let Si:=SPiS_{i}:=S\cap P_{i} for 1im1\leq i\leq m. Then by choice of SS as an optimal solution, and by Lemma 5.4, we have

VolSel(P,k)=μ(S)=μ(i=1mSi)i=1mμ(Si).\textsc{VolSel}(P,k)=\mu(S)=\mu\Big{(}\bigcup_{i=1}^{m}S_{i}\Big{)}\leq\sum_{i=1}^{m}\mu(S_{i}).

Since VolSel maximizes over all subsets and i=1m|Si|=|S|k\sum_{i=1}^{m}|S_{i}|=|S|\leq k, we further obtain

i=1mμ(Si)i=1mVolSel(Pi,|Si|)maxk1++kmki=1mVolSel(Pi,ki).\sum_{i=1}^{m}\mu(S_{i})\leq\sum_{i=1}^{m}\textsc{VolSel}(P_{i},|S_{i}|)\leq\max_{k_{1}+\ldots+k_{m}\leq k}\sum_{i=1}^{m}\textsc{VolSel}(P_{i},k_{i}).

This shows the second inequality.

For the first inequality, we pick sets S1,,SmS_{1},\ldots,S_{m}, where SiPiS_{i}\subseteq P_{i} for all ii and i=1m|Si|k\sum_{i=1}^{m}|S_{i}|\leq k, realizing maxk1++kmki=1mVolSel(Pi,ki)=i=1mμ(Si)\max_{k_{1}+\ldots+k_{m}\leq k}\sum_{i=1}^{m}\textsc{VolSel}(P_{i},k_{i})=\sum_{i=1}^{m}\mu(S_{i}). We then argue analogously:

(1ε)maxk1++kmki=1mVolSel(Pi,ki)=(1ε)i=1mμ(Si)μ(i=1mSi)VolSel(P,k).(1-\varepsilon)\max_{k_{1}+\ldots+k_{m}\leq k}\sum_{i=1}^{m}\textsc{VolSel}(P_{i},k_{i})=(1-\varepsilon)\sum_{i=1}^{m}\mu(S_{i})\leq\mu\Big{(}\bigcup_{i=1}^{m}S_{i}\Big{)}\leq\textsc{VolSel}(P,k).\qed

Note that the above lemmas indeed prove that the algorithm returns a (1±O(ε))(1\pm O(\varepsilon))-approximation to the value VolSel(P,k)\textsc{VolSel}(P,k). In step (2) we delete the points contained in the grid boundaries, which yields an approximation for some choice of the offset ¯{\bar{\ell}} by Lemma 5.2. As we iterate over all possible choices for ¯{\bar{\ell}} and maximize over the resulting volume, we obtain an approximation. In step (4) we round down coordinates, which yields an approximation by Lemma 5.3. Finally, in step (6) we solve the problem maxk1++kmki=1mVolSel(P~i,ki)\max_{k_{1}+\ldots+k_{m}\leq k}\sum_{i=1}^{m}\textsc{VolSel}(\tilde{P}_{i},k_{i}), which yields an approximation to VolSel(i=1mP~i,k)\textsc{VolSel}(\bigcup_{i=1}^{m}\tilde{P}_{i},k) by Lemma 5.5. All other steps do not change the point set or the considered problem. The final approximation factor is 1±O(ε)1\pm O(\varepsilon).

5.3 Computing an Output Set

The above algorithm only gives an approximation for the value VolSel(P,k)\textsc{VolSel}(P,k), but does not yield a subset SPS\subseteq P of size kk realizing this value. However, by tracing the dynamic programming table we can reconstruct values k1,,kmk_{1},\ldots,k_{m} with k1++kmkk_{1}+\ldots+k_{m}\leq k and V(¯)=i=1mVolSel(P~i,ki)V({\bar{\ell}})=\sum_{i=1}^{m}\textsc{VolSel}(\tilde{P}_{i},k_{i}). By storing in step (5) not only the values H[i,k]H[i,k^{\prime}] but also corresponding subsets S~i,kP~i\tilde{S}_{i,k^{\prime}}\subset\tilde{P}_{i}, we can thus construct a subset S~=S~1,k1S~m,km\tilde{S}=\tilde{S}_{1,k_{1}}\cup\ldots\cup\tilde{S}_{m,k_{m}} with V(¯)=i=1mμ(S~i,ki)V({\bar{\ell}})=\sum_{i=1}^{m}\mu(\tilde{S}_{i,k_{i}}). Lemma 5.4 now implies that

μ(S~)(1ε)V(¯).\mu(\tilde{S})\geq(1-\varepsilon)V({\bar{\ell}}).

By storing in step (4) for each rounded point an original point, we can construct a set SS corresponding to the rounded points S~\tilde{S} such that

μ(S)μ(S~)(1ε)V(¯)(1O(ε))VolSel(P,k),\mu(S)\geq\mu(\tilde{S})\geq(1-\varepsilon)V({\bar{\ell}})\geq(1-O(\varepsilon))\textsc{VolSel}(P,k),

and thus SS is a subset of PP of size at most kk yielding a (1O(ε))(1-O(\varepsilon))-approximation of the optimal volume VolSel(P,k)\textsc{VolSel}(P,k).
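A minimal sketch of this backtracking (our own; it assumes the full table T[i][k'] of step (6) and the per-cell tables H[i][·] of step (5) have been kept, with T[0][·] = 0):

```python
def split_budget(T, H, k):
    # Recover k_1, ..., k_m with T[m][k] = sum_i H[i][k_i] by walking the table backwards.
    # Exact float comparison is safe here: T[i][kp] is literally one of the compared sums.
    m, ks, kp = len(H), [0] * len(H), k
    for i in range(m, 0, -1):
        for kappa in range(min(kp, len(H[i - 1]) - 1) + 1):
            if T[i][kp] == H[i - 1][kappa] + T[i - 1][kp - kappa]:
                ks[i - 1], kp = kappa, kp - kappa
                break
    return ks
```

Looking up the stored subsets for these values k_1, ..., k_m then yields the rounded set described above, and replacing each rounded point by a stored original point gives the output set S.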

Note that we do not compute the exact volume μ(S)\mu(S) of the output set SS. Instead, the value V(¯)V({\bar{\ell}}) is only a (1+O(ε))(1+O(\varepsilon))-approximation of μ(S)\mu(S). To explain this effect, recall that exactly computing μ(T)\mu(T) for any given set TT takes time nΘ(d)n^{\Theta(d)} (under the Exponential Time Hypothesis). As our running time is O(n2)O(n^{2}) for any constant d,εd,\varepsilon, we cannot expect to compute μ(S)\mu(S) exactly.

5.4 Improved Algorithm

The following improvement was suggested to us by Timothy Chan. For constant dd and ε\varepsilon the algorithm shown above runs in time O(n(k+logn))O(n(k+\log n)). The bottleneck for the O(nk)O(nk)-term is step (6): Given Hi(k):=VolSel(P~i,k)H_{i}(k^{\prime}):=\textsc{VolSel}(\tilde{P}_{i},k^{\prime}) for all 1im1\leq i\leq m, 0k|P~i|0\leq k^{\prime}\leq|\tilde{P}_{i}|, we want to compute

maxk1++kmki=1mHi(ki).\max_{k_{1}+\ldots+k_{m}\leq k}\;\sum_{i=1}^{m}H_{i}(k_{i}).

Note that it suffices to compute a (1+ε)(1+\varepsilon)-approximation to this value in order to end up with a (1+O(ε))(1+O(\varepsilon))-approximation overall.

This problem is an instance of the multiple-choice 0/1 knapsack problem, where we are given a budget WW and items jSj\in S with corresponding weights wjw_{j} and profits pjp_{j}, as well as a partitioning S=S1SmS=S_{1}\cup\ldots\cup S_{m}, and the task is to compute the maximum jTpj\sum_{j\in T}p_{j} over all sets TST\subseteq S satisfying jTwjW\sum_{j\in T}w_{j}\leq W and |TSi|=1|T\cap S_{i}|=1 for all 1im1\leq i\leq m. In order to cast the above problem as an instance of multiple-choice 0/1 knapsack, we simply set Si:={0,1,,min{k,|P~i|}}S_{i}:=\{0,1,\ldots,\min\{k,|\tilde{P}_{i}|\}\} and define pj:=Hi(j)p_{j}:=H_{i}(j) and wj=jw_{j}=j for all jSij\in S_{i}. We also set W:=kW:=k. Note that now the constraint jTwjW\sum_{j\in T}w_{j}\leq W corresponds to k1++kmkk_{1}+\ldots+k_{m}\leq k and the objective jTpj\sum_{j\in T}p_{j} corresponds to i=1mHi(ki)\sum_{i=1}^{m}H_{i}(k_{i}).
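As a concrete illustration of this reduction, here is a small sketch of the instance construction (our own; the Item type and the function name are not from [29]):

```python
from collections import namedtuple

Item = namedtuple("Item", ["cls", "weight", "profit"])

def knapsack_instance(H, k):
    # H[i][j] = H_i(j); one item class per cell, budget W = k.
    items = []
    for i, H_i in enumerate(H):
        for j in range(min(k, len(H_i) - 1) + 1):
            # picking item (i, j) corresponds to selecting j points from cell i
            items.append(Item(cls=i, weight=j, profit=H_i[j]))
    return items, k
```

Running any PTAS for multiple-choice 0/1 knapsack on this instance and reading off, for each class i, the weight of the chosen item as k_i recovers the desired split of the budget.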

For the multiple-choice 0/1 knapsack problem there are known PTAS techniques. In particular, in his Master's thesis, Rhee [29, Section 4.2] claims a time bound of O(mε2log(m/ε)maxj|Sj|+|S|log|S|)O(m\varepsilon^{-2}\log(m/\varepsilon)\max_{j}|S_{j}|+|S|\log|S|). In our case, we have mnm\leq n and |Si|=min{k,|P~i|}+1=O(ε2log1/ε)d|S_{i}|=\min\{k,|\tilde{P}_{i}|\}+1=O(\varepsilon^{-2}\log 1/\varepsilon)^{d} for each ii. Moreover, |S|mmaxj|Sj||S|\leq m\cdot\max_{j}|S_{j}|. This yields a time of O(nlog(n/ε)(ε2log1/ε)d)O(n\log(n/\varepsilon)\cdot(\varepsilon^{-2}\log 1/\varepsilon)^{d}).

Plugging this solution for step (6) into the algorithm from the previous sections, we obtain time

O(nεd(logn+log(n/ε)(ε2log1/ε)d+2O(ε2log1/ε)d)).O\Big{(}n\cdot\varepsilon^{-d}\big{(}\log n+\log(n/\varepsilon)\cdot(\varepsilon^{-2}\log 1/\varepsilon)^{d}+2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}}\big{)}\Big{)}.

This can be simplified to O(n(log(n/ε)ε3dlogd(1/ε)+2O(ε2log1/ε)d))O\big{(}n\big{(}\log(n/\varepsilon)\cdot\varepsilon^{-3d}\cdot\log^{d}(1/\varepsilon)+2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}}\big{)}\big{)}, which is bounded by O(2O(ε2log1/ε)dnlogn)O\big{(}2^{O(\varepsilon^{-2}\log 1/\varepsilon)^{d}}\cdot n\log n\big{)}.

6 Conclusions

We considered the volume selection problem, where we are given nn points in >0d\mathbb{R}_{>0}^{d} and want to select kk of them that maximize the volume of the union of the spanned anchored boxes. We show: (1) Volume selection is NP-hard in dimension d=3d=3 (previously this was only known when dd is part of the input). (2) In 3 dimensions, we design an nO(k)n^{O(\sqrt{k})} algorithm (the best previous running time was Ω((nk))\Omega\big{(}\binom{n}{k}\big{)}). (3) We design an efficient polynomial-time approximation scheme for any constant dimension dd (previously only a (11/e)(1-1/e)-approximation was known).

We leave open whether our NP-hardness result can be strengthened to a matching lower bound under the Exponential Time Hypothesis, e.g., showing that in d=3d=3 any algorithm takes time nΩ(k)n^{\Omega(\sqrt{k})} and in any constant dimension d4d\geq 4 any algorithm takes time nΩ(k)n^{\Omega(k)}. Alternatively, there could be a faster algorithm, e.g., with running time nO(k11/d)n^{O(k^{1-1/d})}. Finally, we leave open the optimal dependence on n,k,d,εn,k,d,\varepsilon of a (1ε)(1-\varepsilon)-approximation algorithm.

Moving away from the applications, one could also study volume selection on general axis-aligned boxes in d\mathbb{R}^{d}, i.e., not necessarily anchored boxes. This problem, General Volume Selection, is an optimization variant of Klee's measure problem and is thus of theoretical interest. However, General Volume Selection is probably much harder than the restriction to anchored boxes, by analogy to the problem of computing a maximum independent set of boxes, which is not known to have a PTAS [1]. In particular, General Volume Selection is NP-hard already in 2 dimensions, which follows from the NP-hardness of computing a maximum independent set in a family of congruent squares in the plane [18, 22].

Acknowledgements

This work was initiated during the Fixed-Parameter Computational Geometry Workshop at the Lorentz Center, 2016. We are grateful to the other participants of the workshop and the Lorentz Center for their support. We are especially grateful to Günter Rote for several discussions and related work.

References

  • [1] A. Adamaszek and A. Wiese. Approximation schemes for maximum weight independent set of rectangles. In Proc. of the 54th IEEE Symp. on Found. of Comp. Science (FOCS), pages 400–409. IEEE, 2013.
  • [2] A. Auger, J. Bader, D. Brockhoff, and E. Zitzler. Investigating and exploiting the bias of the weighted hypervolume to articulate user preferences. In Proc. of the 11th Conf. on Genetic and Evolutionary Computation (GECCO), pages 563–570. ACM, 2009.
  • [3] A. Auger, J. Bader, D. Brockhoff, and E. Zitzler. Hypervolume-based multiobjective optimization: Theoretical foundations and practical implications. Theoretical Comp. Science, 425:75–103, 2012.
  • [4] J. Bader. Hypervolume-based search for multiobjective optimization: theory and methods. PhD thesis, ETH Zurich, Zurich, Switzerland, 2010.
  • [5] F. Barahona. On the computational complexity of Ising spin glass models. J. of Physics A: Mathematical and General, 15(10):3241, 1982.
  • [6] N. Beume, C. M. Fonseca, M. López-Ibáñez, L. Paquete, and J. Vahrenhold. On the complexity of computing the hypervolume indicator. IEEE Trans. on Evolutionary Computation, 13(5):1075–1082, 2009.
  • [7] N. Beume, B. Naujoks, and M. Emmerich. SMS-EMOA: Multiobjective selection based on dominated hypervolume. European J. of Operational Research, 181(3):1653–1669, 2007.
  • [8] K. Bringmann. Bringing order to special cases of Klee’s measure problem. In Proc. of the 38th Int. Symp. on Mathematical Foundations of Comp. Science (MFCS), pages 207–218. Springer, 2013.
  • [9] K. Bringmann and T. Friedrich. Approximating the volume of unions and intersections of high-dimensional geometric objects. Computational Geometry, 43(6):601 – 610, 2010.
  • [10] K. Bringmann and T. Friedrich. An efficient algorithm for computing hypervolume contributions. Evolutionary Computation, 18(3):383–402, 2010.
  • [11] K. Bringmann and T. Friedrich. Approximating the least hypervolume contributor: NP-hard in general, but fast in practice. Theoretical Comp. Science, 425:104–116, 2012.
  • [12] K. Bringmann, T. Friedrich, and P. Klitzke. Generic postprocessing via subset selection for hypervolume and epsilon-indicator. In Proc. of the 13th Int. Conf. on Parallel Problem Solving from Nature (PPSN), pages 518–527. Springer, 2014.
  • [13] K. Bringmann, T. Friedrich, and P. Klitzke. Two-dimensional subset selection for hypervolume and epsilon-indicator. In Proc. of the 16th Conf. on Genetic and Evolutionary Comput. (GECCO), pages 589–596. ACM, 2014.
  • [14] T. M. Chan. A (slightly) faster algorithm for Klee's measure problem. Computational Geometry, 43(3):243–250, 2010.
  • [15] T. M. Chan. Klee's measure problem made easy. In Proc. of the 54th IEEE Symp. on Found. of Comp. Science (FOCS), pages 410–419. IEEE, 2013.
  • [16] J. Chen, X. Huang, I. A. Kanj, and G. Xia. Linear FPT reductions and computational lower bounds. In Proc. of the 36th ACM Symp. on Theory of Computing (STOC), pages 212–221. ACM, 2004.
  • [17] M. Emmerich, A. H. Deutz, and I. Yevseyeva. A Bayesian approach to portfolio selection in multicriteria group decision making. Procedia Comp. Science, 64:993–1000, 2015.
  • [18] R. J. Fowler, M. S. Paterson, and S. L. Tanimoto. Optimal packing and covering in the plane are NP-complete. Information Processing Lett., 12(3):133–137, 1981.
  • [19] M. R. Garey and D. S. Johnson. The rectilinear Steiner tree problem is NP-complete. SIAM J. of Applied Math., 32:826–834, 1977.
  • [20] A. P. Guerreiro, C. M. Fonseca, and L. Paquete. Greedy hypervolume subset selection in low dimensions. Evolutionary Computation, 24(3):521–544, 2016.
  • [21] D. S. Hochbaum and W. Maass. Approximation schemes for covering and packing problems in image processing and VLSI. J. ACM, 32(1):130–136, 1985.
  • [22] H. Imai and T. Asano. Finding the connected components and a maximum clique of an intersection graph of rectangles in the plane. J. of Algorithms, 4(4):310–323, 1983.
  • [23] J. D. Knowles, D. W. Corne, and M. Fleischer. Bounded archiving using the Lebesgue measure. In Proc. of the 2003 Congress on Evolutionary Computation (CEC), volume 4, pages 2490–2497. IEEE, 2003.
  • [24] T. Kuhn, C. M. Fonseca, L. Paquete, S. Ruzika, M. M. Duarte, and J. R. Figueira. Hypervolume subset selection in two dimensions: Formulations and algorithms. Evolutionary Computation, 2015.
  • [25] G. L. Miller. Finding small simple cycle separators for 2-connected planar graphs. J. Comput. Syst. Sci., 32(3):265–279, 1986.
  • [26] J. S. B. Mitchell and M. Sharir. New results on shortest paths in three dimensions. In Proc. of the 20th ACM Symp. on Computational Geometry (SoCG), pages 124–133, 2004.
  • [27] N. J. A. Sloane, editor. The on-line encyclopedia of integer sequences. Published electronically at https://oeis.org. Visited November 19, 2016.
  • [28] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.
  • [29] D. Rhee. Faster fully polynomial approximation schemes for knapsack problems, 2015. Master's thesis. https://dspace.mit.edu/handle/1721.1/98564.
  • [30] G. Rote, K. Buchin, K. Bringmann, S. Cabello, and M. Emmerich. Selecting kk points that maximize the convex hull volume (extended abstract). In JCDCG3 2016; The 19th Japan Conf. on Discrete and Computational Geometry, Graphs, and Games, pages 58–60, 9 2016. http://www.jcdcgg.u-tokai.ac.jp/JCDCG3_abstracts.pdf.
  • [31] J. A. Storer. On minimal-node-cost planar embeddings. Networks, 14(2):181–212, 1984.
  • [32] R. Tamassia and I. G. Tollis. Planar grid embedding in linear time. IEEE Trans. on Circuits and Systems, 36(9):1230–1234, 1989.
  • [33] T. Ulrich and L. Thiele. Bounding the effectiveness of hypervolume-based (μ\mu+λ\lambda)-archiving algorithms. In Learning and Intelligent Optimization, pages 235–249. Springer, 2012.
  • [34] L. While, P. Hingston, L. Barone, and S. Huband. A faster algorithm for calculating hypervolume. IEEE Trans. on Evolutionary Computation, 10(1):29–38, 2006.
  • [35] J. Wu and S. Azarm. Metrics for quality assessment of a multiobjective design optimization solution set. J. of Mechanical Design, 123(1):18–25, 2001.
  • [36] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca. Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. on Evolutionary Computation, 7(2):117–132, 2003.