
Hong Kong University of Science and Technology
{jganad,golin}@cse.ust.hk

Fully Dynamic $k$-Center in Low Dimensions via Approximate Furthest Neighbors

Jinxiang Gan    Mordecai Jay Golin
Abstract

Let $P$ be a set of points in some metric space. The approximate furthest neighbor problem is, given a second point set $C$, to find a point $p\in P$ that is a $(1+\epsilon)$-approximate furthest neighbor from $C$.

The dynamic version is to maintain $P$, over insertions and deletions of points, in a way that permits efficiently solving the approximate furthest neighbor problem for the current $P$.

We provide the first algorithm for solving this problem in metric spaces with finite doubling dimension. Our algorithm is built on top of the navigating net data structure.

An immediate application is two new algorithms for solving the dynamic $k$-center problem. The first dynamically maintains $(2+\epsilon)$-approximate $k$-centers in general metric spaces with bounded doubling dimension; the second maintains $(1+\epsilon)$-approximate Euclidean $k$-centers. Both dynamic algorithms start from a known static algorithm for approximate $k$-center and replace the static exact furthest neighbor subroutine used by that algorithm with our new dynamic approximate furthest neighbor one.

Unlike previous algorithms for dynamic $k$-center with the same approximation ratios, our new ones do not require knowing $k$ or $\epsilon$ in advance. In the Euclidean case, our algorithm also seems to be the first deterministic solution.

Keywords:
PTAS, Dynamic Algorithms, $k$-center, Furthest Neighbor.

1 Introduction

The main technical result of this paper is an efficient procedure for calculating approximate furthest neighbors from a dynamically changing point set $P$. This procedure, in turn, leads to two new simple algorithms for maintaining approximate $k$-centers in dynamically changing point sets.

Let $B(c,r)$ denote the ball centered at $c$ with radius $r$. Given a point set $P$, the $k$-center problem is to find a minimum radius $r^*$ and an associated set $C$ of at most $k$ centers such that the union of balls $\bigcup_{c\in C}B(c,r^*)$ contains all of the points in $P$.

In the arbitrary metric space version of the problem, the centers are restricted to be points in $P$. In the Euclidean $k$-center problem, $(\mathcal{X},d)=(\mathbb{R}^D,\ell_2)$ with $D\geq 1$, and $C$ may be any set of $k$ points in $\mathbb{R}^D$. The Euclidean $1$-center problem is also known as the minimum enclosing ball (MEB) problem.
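To make the objective concrete, the following minimal Python sketch (our illustration, not part of the paper's algorithms) computes the covering radius of a candidate center set $C$ for a Euclidean point set $P$; the $k$-center problem asks for a $C$, $|C|\leq k$, minimizing this quantity.

```python
import math

def covering_radius(P, C):
    """Smallest r such that the balls B(c, r), c in C, cover P.
    Points are coordinate tuples; Euclidean (l2) metric."""
    return max(min(math.dist(p, c) for c in C) for p in P)

# Four points on a line and two candidate centers: radius 0.5 suffices.
P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
C = [(0.5, 0.0), (10.5, 0.0)]
print(covering_radius(P, C))  # 0.5
```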

A $\rho$-approximation algorithm finds, in polynomial time, a set of centers $C'$ with $|C'|\leq k$ and a radius $r'$ such that $\bigcup_{c\in C'}B(c,r')$ contains all of the points in $P$ and $r'\leq\rho r^*$. The $k$-center problem is known to be NP-hard to approximate within a factor smaller than $2$ for arbitrary metric spaces [HN79], and within a factor smaller than $\sqrt{3}$ for Euclidean spaces [FG88].

Static algorithms

There exist two 2-approximation algorithms [Gon85, HS85] for the $k$-center problem on an arbitrary metric space; the best-known approximation factor for Euclidean $k$-center remains $2$, even in two-dimensional space, when $k$ is part of the input (see [FG88]). There are better results for the special cases of the Euclidean $k$-center with fixed $k$, e.g., $k=1$ or $2$ (see [BHPI02, BC03, AS10, KA15, KS20]). There are also PTASs [BHPI02, BC03, KS20] for the Euclidean $k$-center when $k$ and $D$ are constants.

Dynamic algorithms

In many practical applications, the data set $P$ is not static but changes dynamically over time; e.g., a new point may be inserted into or deleted from $P$ at each step. $C$ and $r$ then need to be recomputed at selected query times. If only insertions are permitted, the problem is incremental; if both insertions and deletions are permitted, the problem is fully dynamic.

The running time of such dynamic algorithms is often split into the time required for an update (to register a change in the underlying data structure) and the time required for a query (to solve the problem on the current dataset). In dynamic algorithms, we require both update and query times to be nearly logarithmic or constant; by contrast, the static versions take linear time.

Some known results on these problems are listed in Table 1. As is standard, many of them are stated in terms of the aspect ratio of the point set $P$. Let $d_{\max}=\sup\{d(x,y):x,y\in P\text{ and }x\neq y\}$ and $d_{\min}=\inf\{d(x,y):x,y\in P\text{ and }x\neq y\}$. The aspect ratio $\Delta$ of $P$ is $\Delta=\frac{d_{\max}}{d_{\min}}$.

Arbitrary Metric Space $(\mathcal{X},d)$

| Author | Approx. | Dimensions | Update Time | Query Time | Fixed |
| Chan et al. [CGS18] | $2+\epsilon$ | High | $O(k^2\frac{\log\Delta}{\epsilon})$ (avg.) | $O(k)$ | $k,\epsilon$ |
| Goranci et al. [GHL+21] | $2+\epsilon$ | Low | $O((2/\epsilon)^{O(\dim(\mathcal{X}))}\log\Delta\log\log\Delta\cdot\ln\epsilon^{-1})$ | $O(\log\Delta+k)$ | $\epsilon$ |
| Bateni et al. [BEJM21] | $2+\epsilon$ | High | $O(\frac{\log\Delta\log n}{\epsilon}(k+\log n))$ (avg.) | $O(k)$ | $k,\epsilon$ |
| This paper | $2+\epsilon$ | Low | $O(2^{O(\dim(\mathcal{X}))}\log\Delta\log\log\Delta)$ | $O(k^2(\log\Delta+(1/\epsilon)^{O(\dim(\mathcal{X}))}))$ | — |

Euclidean Space $(\mathbb{R}^D,\ell_2)$

| Author | Approx. | Dimensions | Update Time | Query Time | Fixed |
| Chan [Cha09] | $1+\epsilon$ | Low | $O((\frac{1}{\epsilon})^D k^{O(1)}\log n)$ (avg.) | $O(\epsilon^{-D}k\log k\log n+(\frac{k}{\epsilon})^{O(k^{1-1/D})})$ | $k,\epsilon$ |
| Schmidt and Sohler [SS19] | $16$ | Low | $O((2\sqrt{d}+1)^d\log^2\Delta\log n)$ (avg.) | $O((2\sqrt{d}+1)^d(\log\Delta+\log n))$ | — |
| Schmidt and Sohler [SS19] | $O(f\cdot D)$ | High | $O(D^2\log^2 n\log\Delta\cdot n^{1/f})$ (avg.) | $O(f\cdot D\cdot\log n\log\Delta)$ | $f$ |
| (*) Bateni et al. [BEJM21] | $f(\sqrt{8}+\epsilon)$ | High | $O(\frac{\log\delta^{-1}\log\Delta}{\epsilon}Dn^{1/f^2+o(1)})$ | — | $\epsilon,f$ |
| This paper | $1+\epsilon$ | Low | $O(2^{O(D)}\log\Delta\log\log\Delta)$ | $O(D\cdot k(\log\Delta+(1/\epsilon)^{O(D)})2^{k\log k/\epsilon})$ | — |

Table 1: Previous results on approximate dynamic $k$-centers. More information on the model used by each is in the text. Note that all algorithms listed provide correct results except for Schmidt and Sohler [SS19], which maintains an $O(f\cdot D)$ solution with probability $1-1/n$, and Bateni et al. [BEJM21], which maintains an $f(\sqrt{8}+\epsilon)$ solution with probability $1-\delta$. [BEJM21] also combines the updates and queries.

The algorithms listed in the table work under slightly different models. More explicitly:

  1. For arbitrary metric spaces, both [GHL+21] and the current paper assume that the metric space has a bounded doubling dimension $\dim(\mathcal{X})$ (see Definition 2).

  2. In “Low dimension”, the update time may be exponential in $D$; in “High dimension” it may not.

  3. The “Fixed” column denotes parameter(s) that must be fixed in advance when initializing the corresponding data structure, e.g., $k$ and/or $\epsilon$. In addition, in both [SS19, BEJM21] for high-dimensional space, $f\geq 1$ is a constant selected in advance that appears in both the approximation factor and the running time.

     The data structure used in the current paper is the navigating net from [KL04]. It does not require knowing $k$ or $\epsilon$ in advance but instead accepts them as parameters at query time.

  4. In [Cha09], (avg.) denotes that the update time is in expectation (it is a randomized algorithm).

  5. Schmidt and Sohler [SS19] answer the slightly different membership query: given $p$, return the cluster containing $p$. In low dimensions, the running time of their algorithm is expected and amortized.

Our contributions and techniques

Our main results are two algorithms for solving the dynamic approximate $k$-center problem in, respectively, arbitrary metric spaces with a finite doubling dimension and Euclidean space.

  1. Our first new algorithm is for any metric space with finite doubling dimension:

     Theorem 1.1

     Let $(\mathcal{X},d)$ be a metric space with a finite doubling dimension $D$. Let $P\subset\mathcal{X}$ be a dynamically changing set of points. We can maintain $P$ in $O(2^{O(D)}\log\Delta\log\log\Delta)$ time per point insertion and deletion so as to support $(2+\epsilon)$-approximate $k$-center queries in $O(k^2(\log\Delta+(1/\epsilon)^{O(D)}))$ time.

     Compared with previous results (see Table 1), our data structure does not require knowing $\epsilon$ or $k$ in advance, whereas the construction of the previous data structures depends on knowing $k$ or $\epsilon$ up front.

  2. Our second new algorithm is for the Euclidean $k$-center problem:

     Theorem 1.2

     Let $P\subset\mathbb{R}^D$ be a dynamically changing set of points. We can maintain $P$ in $O(2^{O(D)}\log\Delta\log\log\Delta)$ time per point insertion and deletion so as to support $(1+\epsilon)$-approximate $k$-center queries in $O(D\cdot k(\log\Delta+(1/\epsilon)^{O(D)})2^{k\log k/\epsilon})$ time.

     This algorithm appears to be the first deterministic dynamic solution for the Euclidean $k$-center problem. Chan [Cha09] presents a randomized dynamic algorithm but did not find a way to derandomize it.

The motivation for our new approach was the observation that many previous results on static $k$-center, e.g., [BC03, BHPI02, Cha09, Gon85, KS20], work by iteratively searching for the furthest neighbor in $P$ from a changing set of points $C$.

As noted, the main technical result of this paper is an efficient procedure for calculating approximate furthest neighbors from a dynamically changing point set $P$; this procedure leads to two new simple algorithms for maintaining approximate $k$-centers.

Consider a set of $n$ points $P$ in some metric space $(\mathcal{X},d)$. A nearest neighbor in $P$ to a query point $q$ is a point $p\in P$ satisfying $d(p,q)=\min_{p'\in P}d(p',q)=d(P,q)$. A $(1+\epsilon)$-approximate nearest neighbor to $q$ is a point $p\in P$ satisfying $d(p,q)\leq(1+\epsilon)d(P,q)$.

Similarly, a furthest neighbor to a query point $q$ is a $p$ satisfying $d(p,q)=\max_{p'\in P}d(p',q)$. A $(1+\epsilon)$-approximate furthest neighbor to $q$ is a point $p\in P$ satisfying $\max_{p'\in P}d(p',q)\leq(1+\epsilon)d(p,q)$.

There exist efficient algorithms for maintaining a dynamic point set $P$ (under insertions and deletions) that, given a query point $q$, quickly permit calculating approximate nearest [KL04] and furthest [Bes96, PSSS15, Cha16] neighbors to $q$.

A $(1+\epsilon)$-approximate nearest neighbor to a query set $C$ is a point $p\in P$ satisfying $d(p,C)\leq(1+\epsilon)d(P,C)$. Because “nearest neighbor” is decomposable, i.e., $d(P,C)=\min_{q\in C}d(P,q)$, [KL04] also permits efficiently calculating an approximate nearest neighbor to a set $C$ from a dynamically changing $P$.

An approximate furthest neighbor to a query set $C$ is similarly defined as a point $p\in P$ satisfying $\max_{p'\in P}d(p',C)\leq(1+\epsilon)d(p,C)$. Our main new technical result is Theorem 2.1, which permits efficiently calculating an approximate furthest neighbor to a query set $C$ from a dynamically changing $P$. We note that, unlike nearest neighbor, furthest neighbor is not a decomposable problem, and such a procedure does not seem to have been previously known.

This technical result permits the creation of new algorithms for solving the dynamic $k$-center problem in low dimensions.

2 Searching for a $(1+\epsilon)$-Approximate Furthest Point in a Dynamically Changing Point Set

Let $(\mathcal{X},d)$ denote a fixed metric space.

Definition 1

Let $C,P\subset\mathcal{X}$ be finite sets of points and $q\in\mathcal{X}$. Set

$d(C,q)=d(q,C)=\min_{q'\in C}d(q',q)\quad\text{and}\quad d(C,P)=\min_{p\in P}d(C,p).$

$p\in P$ is a furthest neighbor in $P$ to $q$ if $d(q,p)=\max_{p'\in P}d(q,p')$.

$p\in P$ is a furthest neighbor in $P$ to a set $C$ if $d(C,p)=\max_{p'\in P}d(C,p')$.

$p\in P$ is a $(1+\epsilon)$-approximate furthest neighbor in $P$ to $q$ if

$\max_{p'\in P}d(q,p')\leq(1+\epsilon)d(q,p).$

$p\in P$ is a $(1+\epsilon)$-approximate furthest neighbor in $P$ to $C$ if

$\max_{p'\in P}d(C,p')\leq(1+\epsilon)d(C,p).$

$\mathrm{FN}(P,q)$ and $\mathrm{AFN}(P,q,\epsilon)$ will, respectively, denote procedures returning a furthest neighbor and a $(1+\epsilon)$-approximate furthest neighbor to $q$ in $P$.

$\mathrm{FN}(P,C)$ and $\mathrm{AFN}(P,C,\epsilon)$ will, respectively, denote procedures returning a furthest neighbor and a $(1+\epsilon)$-approximate furthest neighbor to $C$ in $P$.

Our algorithm assumes that $\mathcal{X}$ has finite doubling dimension.

Definition 2 (Doubling Dimension)

The doubling dimension of a metric space $(\mathcal{X},d)$ is the minimum value $\dim(\mathcal{X})$ such that any ball $B(x,r)$ in $(\mathcal{X},d)$ can be covered by $2^{\dim(\mathcal{X})}$ balls of radius $r/2$.

It is known that the doubling dimension of the Euclidean space $(\mathbb{R}^D,\ell_2)$ is $\Theta(D)$ [H+01].

Now let $(\mathcal{X},d)$ be a metric space with a finite doubling dimension and let $P\subset\mathcal{X}$ be a finite set of points. Recall that $d_{\max}=\sup\{d(x,y):x,y\in P\}$ and $d_{\min}=\inf\{d(x,y):x,y\in P,\ x\neq y\}$, and that the aspect ratio of $P$ is $\Delta=\frac{d_{\max}}{d_{\min}}$.

Our main technical theorem (proven below in Section 2.2) is:

Theorem 2.1

Let $(\mathcal{X},d)$ be a metric space with finite doubling dimension and let $P\subset\mathcal{X}$ be a point set stored in a navigating net data structure [KL04]. Let $C\subset\mathcal{X}$ be another point set. Then we can find a $(1+\epsilon)$-approximate furthest point among $P$ to $C$ in $O(|C|(\log\Delta+(1/\epsilon)^{O(\dim(\mathcal{X}))}))$ time, where $\Delta$ is the aspect ratio of the set $P$.

The navigating net data structure [KL04] is described in more detail below.

2.1 Navigating Nets [KL04]

Navigating nets are well-known structures for dynamically maintaining points in a metric space with finite doubling dimension in a way that permits approximate nearness queries. To the best of our knowledge, they have not previously been used for approximate “furthest point from a set” queries.

To describe the algorithm, we first need to quickly review some basic known facts about navigating nets. The following lemma is critical to our analysis.

Lemma 1

[KL04] Let $(\mathcal{X},d)$ be a metric space and $Y\subseteq\mathcal{X}$. If the aspect ratio of the metric induced on $Y$ is at most $\Delta$ and $\Delta\geq 2$, then $|Y|\leq\Delta^{O(\dim(\mathcal{X}))}$.

We next introduce some notation from [KL04]:

Definition 3 ($r$-net)

[KL04] Let $(\mathcal{X},d)$ be a metric space. For a given parameter $r>0$, a subset $Y\subseteq P$ is an $r$-net of $P$ if it satisfies:

  (1) For every pair of distinct $x,y\in Y$, $d(x,y)\geq r$;

  (2) $\forall x\in P$, there exists at least one $y\in Y$ such that $x\in B(y,r)$.

We now start the description of the navigating net data structure. Set $\Gamma=\{2^i:i\in\mathbb{Z}\}$. Each $r\in\Gamma$ is called a scale. For every $r\in\Gamma$, $Y_r$ will denote an $r$-net of $Y_{r/2}$. The base case is that for every scale $r\leq d_{\min}$, $Y_r=P$.

Let $\gamma\geq 4$ be some fixed constant. For each scale $r$ and each $y\in Y_r$, the data structure stores the set of points

$L_{y,r}=\{z\in Y_{r/2}:d(z,y)\leq\gamma r\}.$ (1)

$L_{y,r}$ is called the scale-$r$ navigation list of $y$.

Let $r_{\max}\in\Gamma$ denote the smallest $r$ satisfying $|Y_r|=1$ and $r_{\min}\in\Gamma$ denote the largest $r$ satisfying $L_{y,r}=\{y\}$ for every $y\in Y_r$. Scales $r\in[r_{\min},r_{\max}]$ are called non-trivial scales; all other scales are called trivial. Since $r_{\max}=\Theta(d_{\max})$ and $r_{\min}=\Theta(d_{\min})$, the number of non-trivial scales is $O(\log_2\frac{r_{\max}}{r_{\min}})=O(\log_2\Delta)$.
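To make this construction concrete, here is a minimal, static Python sketch that builds the nets $Y_r$ greedily and the navigation lists $L_{y,r}$ by brute force. The helper names (greedy_r_net, build_navigating_net) are ours, and the construction is for illustration only; the actual structure of [KL04] maintains the same objects dynamically.

```python
import math

GAMMA = 4  # the constant gamma >= 4 used in Equation (1)

def greedy_r_net(points, r, dist):
    """Greedy r-net (Definition 3): net points are pairwise >= r apart,
    and every input point is within r of some net point."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net

def build_navigating_net(P, dist=math.dist):
    """Return (nets, levels) with nets[r] = Y_r and levels[r][y] = L_{y,r}.
    P: a list of at least two distinct coordinate tuples."""
    d_min = min(dist(x, y) for x in P for y in P if x != y)
    r = 2.0 ** math.floor(math.log2(d_min))      # base case: Y_r = P
    nets, levels = {r: list(P)}, {}
    while len(nets[r]) > 1:                      # stop once |Y_r| = 1
        y_half, r = nets[r], 2 * r
        nets[r] = greedy_r_net(y_half, r, dist)  # Y_r is an r-net of Y_{r/2}
        levels[r] = {y: [z for z in y_half if dist(z, y) <= GAMMA * r]
                     for y in nets[r]}           # scale-r navigation lists
    return nets, levels
```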

Finally, we need a few more basic properties of navigating nets:

Lemma 2

[KL04] (Lemmas 2.1 and 2.2) For each scale $r$, we have:

  (1) $\forall y\in Y_r$, $|L_{y,r}|=O(2^{O(\dim(\mathcal{X}))})$;

  (2) $\forall z\in P$, $d(z,Y_r)<2r$;

  (3) $\forall x,y\in Y_r$, $d(x,y)\geq r$.

We provide an example (Figure 3) of navigating nets in the Appendix. Navigating nets were originally designed to solve dynamic approximate nearest neighbor queries and are useful because they can be quickly updated.

Theorem 2.2

([KL04]) Navigating nets use $O(2^{O(\dim(\mathcal{X}))}\cdot n)$ words. The data structure can be updated upon the insertion of a point into $P$, or the deletion of a point from $P$, in $O(2^{O(\dim(\mathcal{X}))}\log\Delta\log\log\Delta)$ time. This includes $O(2^{O(\dim(\mathcal{X}))}\log\Delta)$ distance computations. (Note: although the update time of the navigating net depends on $O(\log\Delta)$, the structure does not explicitly maintain the value of $\Delta$. Instead, it dynamically maintains the values $r_{\max}=\Theta(d_{\max})$ and $r_{\min}=\Theta(d_{\min})$; the update time depends on the number of non-trivial scales $\log\frac{r_{\max}}{r_{\min}}=\Theta(\log\Delta)$, without actually knowing $\Delta$.)

2.2 The Approximate Furthest Neighbor Algorithm $\mathrm{AFN}(P,C,\epsilon)$

Algorithm 1 Approximate Furthest Neighbor: $\mathrm{AFN}(P,C,\epsilon)$

Input: A navigating net for a set $P\subset\mathcal{X}$, a set $C\subset\mathcal{X}$, and a constant $\epsilon>0$.
Output: A $(1+\epsilon)$-approximate furthest neighbor among $P$ to $C$.

1: Set $r=r_{\max}$ and $Z_r=Y_{r_{\max}}$;
2: while $r>\max\{\frac{\epsilon}{2}\max_{z\in Z_r}d(z,C),\,r_{\min}\}$ do
3:     set $Z_{r/2}=\bigcup_{z\in Z_r}\{y\in L_{z,r}:d(y,C)\geq\max_{z'\in Z_r}d(z',C)-r\}$;
4:     set $r=r/2$;
5: Return the $z\in Z_r$ for which $d(z,C)$ is maximal.

$\mathrm{AFN}(P,C,\epsilon)$ is given in Algorithm 1; Figure 1 provides some geometric intuition. $\mathrm{AFN}(P,C,\epsilon)$ requires that $P$ be stored in a navigating net, along with the following definitions:

Definition 4 (The sets $Z_r$)
  • $Z_{r_{\max}}=Y_{r_{\max}}$, where $|Y_{r_{\max}}|=1$;

  • If $Z_r$ is defined, $Z_{r/2}=\bigcup_{z\in Z_r}\{y\in L_{z,r}:d(y,C)\geq\max_{z'\in Z_r}d(z',C)-r\}$.

Note that, by induction, $Z_r\subseteq Y_r$.

Figure 1: Illustration of line 3 of Algorithm 1, with $C=\{c_1,c_2\}$. Let $p\in Z_r$ be the furthest point in $Z_r$ to $C$ and set $R=\max_{z\in Z_r}d(z,C)$ (in the figure, $R=d(c_2,p)$). Then $Z_r\subseteq B(c_1,R)\cup B(c_2,R)$. If $z\in B(c_i,R)$ and $y\in L_{z,r}$, then $y\in B(c_i,R+\gamma r)$. Next, note that if $y\in Z_{r/2}$ then $y\in L_{z,r}$ for some $z\in Z_r$ and, for each $i$, $d(y,c_i)\geq d(y,C)\geq R-r$. This is illustrated in the right figure; $y$ must lie in one of the two blue annuli $B(c_i,R+\gamma r)\setminus B(c_i,R-r)$ ($i=1,2$). Thus $Z_{r/2}$ is contained in the union of the annuli.
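As a sanity check, the following is a direct Python transcription of Algorithm 1, run against the static sketch of Section 2.1 (the function name afn and its helpers are ours, not the paper's implementation):

```python
import math

def afn(nets, levels, C, eps, dist=math.dist):
    """AFN(P, C, eps): a (1+eps)-approximate furthest neighbor in P to
    the set C (Algorithm 1), over (nets, levels) built above."""
    d_C = lambda z: min(dist(z, c) for c in C)
    scales = sorted(nets)
    r, r_min = scales[-1], scales[0]      # line 1: r = r_max, Z = Y_{r_max}
    Z = list(nets[r])
    while r > max(0.5 * eps * max(map(d_C, Z)), r_min):   # line 2
        best = max(map(d_C, Z))
        # line 3: expand via navigation lists, prune hopeless candidates
        Z = [y for z in Z for y in levels[r][z] if d_C(y) >= best - r]
        Z = list(dict.fromkeys(Z))        # deduplicate, preserve order
        r /= 2                            # line 4
    return max(Z, key=d_C)                # line 5
```

For example, with the four collinear points used earlier, afn(nets, levels, [(0.0, 0.0)], 0.1) returns (11.0, 0.0), the exact furthest point.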

We now prove that $\mathrm{AFN}(P,C,\epsilon)$ returns a $(1+\epsilon)$-approximate furthest point among $P$ to $C$. We start by showing that, for every scale $r$, the furthest point to $C$ is close to $Z_r$.

Lemma 3

Let $a^*$ be the furthest point to $C$ in $P$. Then every set $Z_r$, as defined in Definition 4, contains a point $z_r$ satisfying $d(z_r,a^*)\leq 2r$.

Proof

The proof is illustrated in Figure 2. It works by downward induction on $r$. In the base case, $r=r_{\max}$ and $Z_{r_{\max}}=Y_{r_{\max}}$; thus $d(a^*,Z_{r_{\max}})\leq 2r$ by Lemma 2(2).

For the inductive step, we assume that $Z_r$ satisfies the induction hypothesis, i.e., $Z_r$ contains a point $z'$ satisfying $d(z',a^*)\leq 2r$. We will show that $Z_{r/2}$ contains a point $y$ satisfying $d(y,a^*)\leq r$.

Since $Y_{r/2}$ is an $\frac{r}{2}$-net of $P$, there exists a point $y\in Y_{r/2}$ satisfying $d(y,a^*)\leq r$ (Lemma 2(2)). Then

$d(z',y)\leq d(z',a^*)+d(a^*,y)\leq 2r+r=3r$

and thus, because $\gamma\geq 4$, $y\in L_{z',r}$. Finally, let $c'=\arg\min_{c_i\in C}d(y,c_i)$. Then

$d(y,C)=d(y,c')\geq d(a^*,c')-d(a^*,y)\geq d(a^*,C)-d(a^*,y)\geq\max_{z\in Z_r}d(z,C)-r.$

Thus $y\in Z_{r/2}$.

Figure 2: Illustration of Lemma 3, for an example in which $C=\{c_1,c_2\}$ and $d(c_2,z_{\max})=\max_{z\in Z_r}d(z,C)$. Suppose that, at scale $r$, $Z_r$ contains a point $z'$ satisfying $d(a^*,z')\leq 2r$. The proof shows that $Z_{r/2}$ contains a point $y$ satisfying $d(y,a^*)\leq 2\cdot(r/2)=r$.

Lemma 3 permits bounding the approximation ratio of algorithm $\mathrm{AFN}(P,C,\epsilon)$.

Lemma 4

Algorithm $\mathrm{AFN}(P,C,\epsilon)$ returns a point $q$ whose distance to $C$ satisfies $\max_{p\in P}d(p,C)\leq(1+\epsilon)d(q,C)$.

Proof

Let $r'$ denote the value of $r$ at the end of the algorithm. Let $a^*$ be the furthest point to $C$ among $P$. Consider the two following conditions on $r'$:

  1. $r'\leq\frac{\epsilon}{2}\max_{z\in Z_{r'}}d(z,C)$. In this case, by Lemma 3, there exists a point $z_{r'}\in Z_{r'}$ satisfying $d(z_{r'},a^*)\leq 2r'$. Let $c'=\arg\min_{c_i\in C}d(z_{r'},c_i)$. Then

     $\max_{z\in Z_{r'}}d(z,C)\geq d(z_{r'},C)=d(z_{r'},c')\geq d(a^*,c')-d(z_{r'},a^*)\geq d(a^*,C)-2r'\geq d(a^*,C)-\epsilon\max_{z\in Z_{r'}}d(z,C).$

     Thus,

     $(1+\epsilon)\max_{z\in Z_{r'}}d(z,C)\geq d(a^*,C)=\max_{x\in P}d(x,C).$ (2)

  2. $r'\leq r_{\min}$. In this case, recall that $Z_r\subseteq Y_r$ and that, for every scale $r\leq r_{\min}$ and every $y\in Y_r$, $L_{y,r}=\{y\}$. Then

     $Z_{r'/2}=\bigcup_{z\in Z_{r'}}\{y\in L_{z,r'}:d(y,C)\geq\max_{z'\in Z_{r'}}d(z',C)-r'\}\subseteq\bigcup_{z\in Z_{r'}}\{z\}=Z_{r'}.$

Now let $r_1$ be the largest scale for which $r_1\leq\frac{\epsilon}{2}\max_{z\in Z_{r_1}}d(z,C)$ and $r_2$ the scale at which $\mathrm{AFN}(P,C,\epsilon)$ terminates.

From point 1, Equation (2) holds with $r'=r_1$.

If $r_1\geq r_{\min}$, then $r_1=r_2$ and the lemma holds.

If $r_1<r_{\min}$, then $r_1\leq r_2\leq r_{\min}$, so, from point 2, $Z_{r_1}\subseteq Z_{r_2}$ and

$(1+\epsilon)\max_{z\in Z_{r_2}}d(z,C)\geq(1+\epsilon)\max_{z\in Z_{r_1}}d(z,C)\geq d(a^*,C)=\max_{x\in P}d(x,C),$

where the second inequality holds because $r_1$ satisfies condition 1. Hence the lemma again holds.

We now analyze the running time of $\mathrm{AFN}(P,C,\epsilon)$.

Lemma 5

In each iteration of $\mathrm{AFN}(P,C,\epsilon)$, $|Z_r|\leq 4|C|(\gamma+2/\epsilon)^{O(\dim(\mathcal{X}))}$.

Proof

We actually prove the equivalent statement that $|Z_{r/2}|\leq 4|C|(\gamma+2/\epsilon)^{O(\dim(\mathcal{X}))}$.

For all $y\in Z_{r/2}$, there exists a point $z'\in Z_r$ satisfying $y\in L_{z',r}$, i.e., $d(z',y)\leq\gamma r$. Let $c'=\arg\min_{c\in C}d(z',c)$. Thus,

$d(y,c')\leq d(c',z')+d(z',y)=d(z',C)+d(z',y)\leq\max_{z\in Z_r}d(z,C)+\gamma r.$

An iteration of $\mathrm{AFN}(P,C,\epsilon)$ will construct $Z_{r/2}$ only when $\max_{z\in Z_r}d(z,C)\leq\frac{2r}{\epsilon}$. Therefore $d(y,c')\leq(\gamma+2/\epsilon)r$. This implies $Z_{r/2}\subseteq\bigcup_{c\in C}B(c,(\gamma+2/\epsilon)r)$.

Next, notice that, since $Z_{r/2}\subseteq Y_{r/2}$ and $Y_{r/2}$ is an $r/2$-net, $\forall z_1,z_2\in Z_{r/2}$ with $z_1\neq z_2$, $d(z_1,z_2)\geq\frac{r}{2}$.

Finally, for fixed $c\in C$ and $\forall x,y\in Z_{r/2}\cap B(c,(\gamma+2/\epsilon)r)$, we have $\frac{r}{2}\leq d(x,y)\leq 2(\gamma+2/\epsilon)r$. Thus the aspect ratio of the set $Z_{r/2}\cap B(c,(\gamma+2/\epsilon)r)$ is at most $\frac{2(\gamma+2/\epsilon)r}{r/2}=4(\gamma+2/\epsilon)$. Therefore, by Lemma 1, $\forall c\in C$, $|Z_{r/2}\cap B(c,(\gamma+2/\epsilon)r)|\leq(4(\gamma+2/\epsilon))^{O(\dim(\mathcal{X}))}$.

Thus, $|Z_{r/2}|\leq|C|(4(\gamma+2/\epsilon))^{O(\dim(\mathcal{X}))}$.

Lemma 6

$\mathrm{AFN}(P,C,\epsilon)$ runs for at most $\log_2\Delta+O(1)$ iterations.

Proof

The algorithm starts with $r=r_{\max}$ and concludes with $r\geq r_{\min}/2$. Thus, the total number of iterations is at most

$\log_2\frac{r_{\max}}{r_{\min}/2}=1+\log_2\frac{r_{\max}}{r_{\min}}=1+\log_2\Theta\left(\frac{d_{\max}}{d_{\min}}\right)=O(1)+\log_2\Delta.$

Lemmas 5 and 6 immediately imply that the running time of $\mathrm{AFN}(P,C,\epsilon)$ is at most $O(|C|(4(\gamma+2/\epsilon))^{O(\dim(\mathcal{X}))}\log\Delta)$.

A more careful analysis leads to the proof of Theorem 2.1. Due to space limitations, the full proof is deferred to Appendix 0.B.

3 Modified $k$-Center Algorithms

$\mathrm{AFN}(P,C,\epsilon)$ will now be used to design two new dynamic $k$-center algorithms.

Lemma 2 hints that elements in $Y_r$ can serve as approximate centers. This observation motivated Goranci et al. [GHL+21] to search for the smallest $r$ such that $|Y_r|\leq k$ and return the elements of $Y_r$ as centers. Unfortunately, used this way, the original navigating net data structure only returns an 8-approximate solution; [GHL+21] improve this by simultaneously maintaining multiple nets.

Although we also apply navigating nets to construct approximate $k$-centers, our approach is very different from that of [GHL+21]. We do not use the elements of $Y_r$ as centers themselves; we only use the navigating net to support $\mathrm{AFN}(P,C,\epsilon)$. Our algorithms result from substituting $\mathrm{AFN}(P,C,\epsilon)$ for the exact furthest neighbor procedures used in static algorithms.

The next two subsections introduce the two modified algorithms.

3.1 A Modified Version of Gonzalez's [Gon85] Greedy Algorithm

Gonzalez [Gon85] described a simple and now well-known $O(kn)$-time 2-approximation algorithm that works for any metric space. It operates by performing $k$ exact furthest-neighbor-from-a-set queries. We directly replace those exact queries with our new approximate furthest neighbor query procedure.

It is then straightforward to modify Gonzalez's proof from [Gon85] that his original algorithm is a 2-approximation to show that our new algorithm is a $(2+\epsilon)$-approximation. The details of the algorithm (Algorithm 3) and the modified proof are provided in Appendix 0.C; a short executable sketch is given after Theorem 3.1 below. This yields:

Theorem 3.1

Let $P\subset\mathcal{X}$ be a finite set of points in a metric space $(\mathcal{X},d)$. Suppose $\mathrm{AFN}(P,C,\epsilon)$ can be implemented in $T(|C|,\epsilon)$ time. Algorithm 3 constructs a $(2+\epsilon)$-approximate solution for the $k$-center problem in $O\left(k\cdot T\left(k,\frac{\epsilon}{5}\right)\right)$ time.

Plugging Theorem 2.1 into this proves Theorem 1.1.
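Continuing the illustrative Python sketches from Section 2, Algorithm 3 is then only a few lines (build_navigating_net and afn are the hypothetical helpers defined earlier, not the paper's actual implementation):

```python
import math

def greedy_k_center(nets, levels, k, eps, dist=math.dist):
    """Modified Gonzalez greedy (Algorithm 3): a (2+eps)-approximate
    k-center, with afn(.., eps/5) replacing the exact furthest neighbor."""
    P = nets[min(nets)]                 # the smallest-scale net is P itself
    C = [P[0]]                          # line 1: arbitrary first center
    while len(C) < k:                   # lines 2-3
        C.append(afn(nets, levels, C, eps / 5, dist))
    q = afn(nets, levels, C, eps / 5, dist)
    r = (1 + eps / 5) * min(dist(q, c) for c in C)   # line 4
    return C, r

# Example: centers (0,0) and (11,0), radius 1.05 <= (2+eps) * 0.5 (optimal).
nets, levels = build_navigating_net(
    [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)])
print(greedy_k_center(nets, levels, k=2, eps=0.5))
```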

3.2 A Modified Version of the Kim and Schwarzwald [KS20] Algorithm

In what follows, $D\geq 1$ is an arbitrary dimension.

In 2020, Kim and Schwarzwald [KS20] gave an $O(nD/\epsilon)$-time $(1+\epsilon)$-approximation algorithm for the Euclidean 1-center (MEB) problem. They further showed how to extend this to obtain a $(1+\epsilon)$-approximation to the Euclidean $k$-center in $O(nD2^{O(k\log k/\epsilon)})$ time.

Their algorithms use, as a subroutine, a $\Theta(n)$ (or $\Theta(n|C|)$) time brute-force procedure for finding $\mathrm{FN}(P,q)$ (or $\mathrm{FN}(P,C)$).

This subsection shows that replacing $\mathrm{FN}(P,q)$ (or $\mathrm{FN}(P,C)$) by $\mathrm{AFN}(P,q,\epsilon/3)$ (or $\mathrm{AFN}(P,C,\epsilon/3)$), along with some other minor changes, maintains the correctness of the algorithm. Our modified version of Kim and Schwarzwald's [KS20] MEB algorithm is presented as Algorithm 2.

Let $\epsilon>0$ be a constant. Their algorithm runs for $O(1/\epsilon)$ iterations. The $i$-th iteration starts from some point $m_i$ and uses $O(n)$ time to search for the point $p_{i+1}=\mathrm{FN}(P,m_i)$ furthest from $m_i$. The iteration then selects a “good” point $m_{i+1}$ on the line segment $p_{i+1}m_i$ as the starting point for the next iteration, where “good” means that the distance from $m_{i+1}$ to the optimal center is suitably bounded. The time to select such a “good” point is $O(D)$. The total running time of their algorithm is $O(nD/\epsilon)$, and they prove that its performance ratio is at most $(1+\epsilon)$.

The running time of their algorithm is dominated by the $O(n)$ time required to find the point $\mathrm{FN}(P,m_i)$. As we will see in Theorem 3.2 below, finding the exact furthest point $\mathrm{FN}(P,m_i)$ is not necessary; it can be replaced by $\mathrm{AFN}(P,m_i,\epsilon/3)$.

The first result is that this minor modification of Kim and Schwarzwald's [KS20] algorithm still produces a $(1+\epsilon)$-approximation.

Theorem 3.2

Let $P\subset\mathbb{R}^D$ be a set of points whose minimum enclosing ball has (unknown) radius $r^*$. Suppose $\mathrm{AFN}(P,q,\epsilon)$ can be implemented in $T(\epsilon)$ time.

Let $c,r$ be the values returned by Algorithm 2. Then $P\subset B(c,r)$ and $r\leq(1+\epsilon)r^*$. Thus Algorithm 2 constructs a $(1+\epsilon)$-approximate solution, and it runs in $O\left(DT\left(\frac{\epsilon}{3}\right)\frac{1}{\epsilon}\right)$ time.

Plugging Theorem 2.1 into Theorem 3.2 proves Theorem 1.2 for $k=1$.

Algorithm 2 Modified MEB($P,\epsilon$)

Input: A set of points $P$ and a constant $\epsilon>0$.
Output: A $(1+\epsilon)$-approximate minimum enclosing ball $B(c,r)$ containing all points in $P$.
The algorithm presented is just a slight modification of that of [KS20]. The differences are that, in [KS20], line 4 was originally $p_{i+1}=\mathrm{FN}(P,m_i)$, and the four $(1+\epsilon/3)$ terms on lines 8 and 9 were all originally $(1+\epsilon)$.

1: Arbitrarily select a point $p_1$ from $P$;
2: Set $m_1=p_1$, $r=\infty$, and $\delta_1=1$;
3: for $i=1$ to $\lfloor 6/\epsilon\rfloor$ do
4:     $p_{i+1}=\mathrm{AFN}(P,m_i,\epsilon/3)$;
5:     $r_i=\left(1+\frac{\epsilon}{3}\right)d(m_i,p_{i+1})$;
6:     if $r_i<r$ then
7:         $c=m_i$; $r=r_i$;
8:     $m_{i+1}=m_i+(p_{i+1}-m_i)\cdot\frac{\delta_i^2+(1+\epsilon/3)^2-1}{2(1+\epsilon/3)^2}$;
9:     $\delta_{i+1}=\sqrt{1-\left(\frac{1+(1+\epsilon/3)^2-\delta_i^2}{2(1+\epsilon/3)}\right)^2}$;
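Before the proof, here is a minimal Python transcription of Algorithm 2, again on top of the illustrative afn sketch from Section 2 (the defensive max(0.0, ...) guard against floating-point drift is ours):

```python
import math

def modified_meb(nets, levels, P, eps, dist=math.dist):
    """Modified MEB (Algorithm 2): a (1+eps)-approximate minimum
    enclosing ball, with FN(P, m_i) replaced by afn(.., eps/3)."""
    t = 1 + eps / 3
    m, c, r, delta = P[0], P[0], math.inf, 1.0        # lines 1-2
    for _ in range(int(6 / eps)):                     # line 3
        p = afn(nets, levels, [m], eps / 3, dist)     # line 4
        r_i = t * dist(m, p)                          # line 5
        if r_i < r:                                   # lines 6-7
            c, r = m, r_i
        step = (delta**2 + t**2 - 1) / (2 * t**2)     # line 8
        m = tuple(mi + (pi - mi) * step for mi, pi in zip(m, p))
        delta = math.sqrt(max(0.0, 1 - ((1 + t**2 - delta**2) / (2*t))**2))
    return c, r
```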
Proof

Every ball $B(m_i,r_i)$ generated by Algorithm 2 encloses all of the points in $P$, i.e.,

$\forall i,\quad\max_{p\in P}d(m_i,p)\leq r_i.$ (3)

To prove the correctness of the algorithm, it suffices to show that $r\leq(1+\epsilon)r^*$. Without loss of generality, we assume $\epsilon\leq 1$.

Each iteration of lines 4-9 of $\mathrm{MEB}(P,\epsilon)$ must end in one of the two following cases:

  (1) $d(m_i,p_{i+1})\leq(1+\epsilon/3)r^*$;

  (2) $d(m_i,p_{i+1})>(1+\epsilon/3)r^*$.

Note that if Case (1) holds for some $i$, then, directly from Equation (3) (using $\epsilon\leq 1$),

$\max_{p\in P}d(m_i,p)\leq r_i=(1+\epsilon/3)d(m_i,p_{i+1})\leq(1+\epsilon/3)^2 r^*<(1+\epsilon)r^*.$

This implies that if Case (1) ever holds, Algorithm 2 is correct.

The main lemma is

Lemma 7

If, $\forall\,1\leq i\leq j$, Case (2) holds, i.e., $d(m_i,p_{i+1})>(1+\epsilon/3)r^*$, then $j\leq\frac{6}{\epsilon}-1$.

The proof of Lemma 7 is a straightforward modification of the proof given by Kim and Schwarzwald [KS20] for their original algorithm and is therefore omitted here. For completeness, we provide the full modified proof in Section 0.D.1.

Lemma 7 implies that, by the end of the algorithm, Case (1) must have occurred at least once, so $r\leq(1+\epsilon)r^*$ and the algorithm outputs a correct solution. The derivation of the running time of the algorithm is straightforward, completing the proof of Theorem 3.2.

[KS20] discuss (without providing details) how to use the “guessing” technique of [BHPI02, BC03] to extend their MEB algorithm to yield a $(1+\epsilon)$-approximate solution to the $k$-center problem for $k\geq 2$.

For MEB, the Euclidean 1-center, each iteration maintains the location of a candidate center $c$ and computes a furthest point from $c$ among $P$. For the Euclidean $k$-center, each step maintains the locations of a set $C$ of candidate centers, $|C|\leq k$, and computes a furthest point from $C$ among $P$ using an $\mathrm{FN}(P,C)$ procedure.

Again, we can modify their algorithm by replacing the $\mathrm{FN}(P,C)$ procedure with an $\mathrm{AFN}(P,C,\epsilon)$ one, computing an approximate furthest point from $C$ among $P$. This proves Theorem 1.2.

The full details of the modified version of their algorithm, which uses $\mathrm{AFN}(P,C,\epsilon)$ in place of $\mathrm{FN}(P,C)$, are provided in Section 0.D.2, along with an analysis of its correctness and running time.

4 Conclusion

Our main new technical contribution is an algorithm, $\mathrm{AFN}(P,C,\epsilon)$, that finds a $(1+\epsilon)$-approximate furthest point in $P$ to $C$. It works on top of a navigating net data structure [KL04] storing $P$.

The proofs of Theorems 1.1 and 1.2 follow immediately by maintaining a navigating net and plugging $\mathrm{AFN}(P,C,\epsilon)$ into Theorems 3.1 and 0.D.1, respectively.

These provide a fully dynamic and deterministic $(2+\epsilon)$-approximation algorithm for the $k$-center problem in a metric space with finite doubling dimension and a $(1+\epsilon)$-approximation algorithm for the Euclidean $k$-center problem, where $\epsilon,k$ are parameters given at query time.

One limitation of our algorithm is that, because $\mathrm{AFN}(P,C,\epsilon)$ is built on top of navigating nets, it depends upon the aspect ratio $\Delta$; this is the only dependence of the $k$-center algorithms on $\Delta$. An interesting future direction would be to develop algorithms for $\mathrm{AFN}(P,C,\epsilon)$ in special metric spaces, built on top of other structures, that are independent of $\Delta$. This would automatically lead to algorithms for approximate $k$-center that, in those spaces, would also be independent of $\Delta$.

References

  • [AS10] Pankaj K Agarwal and R Sharathkumar. Streaming algorithms for extent problems in high dimensions. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms, pages 1481–1489. SIAM, 2010.
  • [BC03] Mihai Badoiu and Kenneth L Clarkson. Smaller core-sets for balls. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms (SODA), volume 3, pages 801–802, 2003.
  • [BEJM21] Mohammad Hossein Bateni, Hossein Esfandiari, Rajesh Jayaram, and Vahab Mirrokni. Optimal fully dynamic $k$-centers clustering. arXiv preprint arXiv:2112.07050, 2021.
  • [Bes96] Sergei Bespamyatnikh. Dynamic algorithms for approximate neighbor searching. In 8th Canadian Conference on Computational Geometry (CCCG’96), pages 252–257, 1996.
  • [BHPI02] Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing (STOC), pages 250–257, 2002.
  • [CGS18] TH Hubert Chan, Arnaud Guerqin, and Mauro Sozio. Fully dynamic k-center clustering. In Proceedings of the 2018 World Wide Web Conference (WWW), pages 579–587, 2018.
  • [Cha09] Timothy M Chan. Dynamic coresets. Discrete & Computational Geometry, 42(3):469–488, 2009.
  • [Cha16] Timothy M Chan. Dynamic streaming algorithms for epsilon-kernels. In 32nd International Symposium on Computational Geometry (SoCG 2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
  • [FG88] Tomás Feder and Daniel Greene. Optimal algorithms for approximate clustering. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 434–444, 1988.
  • [GHL+21] Gramoz Goranci, Monika Henzinger, Dariusz Leniowski, Christian Schulz, and Alexander Svozil. Fully dynamic k-center clustering in low dimensional metrics. In 2021 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX), pages 143–153. SIAM, 2021.
  • [Gon85] Teofilo F Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.
  • [H+01] Juha Heinonen et al. Lectures on analysis on metric spaces. Springer Science & Business Media, 2001.
  • [HN79] Wen-Lian Hsu and George L Nemhauser. Easy and hard bottleneck location problems. Discrete Applied Mathematics, 1(3):209–215, 1979.
  • [HS85] Dorit S Hochbaum and David B Shmoys. A best possible heuristic for the k-center problem. Mathematics of operations research, 10(2):180–184, 1985.
  • [KA15] Sang-Sub Kim and HeeKap Ahn. An improved data stream algorithm for clustering. Computational Geometry, 48(9):635–645, 2015.
  • [KL04] Robert Krauthgamer and James R Lee. Navigating nets: Simple algorithms for proximity search. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms (SODA), pages 798–807, 2004.
  • [KS20] Sang-Sub Kim and Barbara Schwarzwald. A $(1+\varepsilon)$-approximation for the minimum enclosing ball problem in $\mathbb{R}^d$. In the 36th European Workshop on Computational Geometry (EuroCG), 2020.
  • [PSSS15] Rasmus Pagh, Francesco Silvestri, Johan Sivertsen, and Matthew Skala. Approximate furthest neighbor in high dimensions. In International Conference on Similarity Search and Applications, pages 3–14. Springer, 2015.
  • [SS19] Melanie Schmidt and Christian Sohler. Fully dynamic hierarchical diameter k-clustering and k-center. arXiv preprint arXiv:1908.02645, 2019.

Appendix 0.A A Navigating Nets Example

Figure 3: A navigating nets example with $P=\{1,2,\ldots,10\}$, $r_{\max}=4$ and $r_{\min}=1$. $Y_4=\{5\}$, $Y_2=\{2,5,7,8\}$ and $Y_1=P$. We note that $\forall x\in Y_2,\ d(x,Y_4)\leq 4$ and $\forall x\in P,\ d(x,Y_4)\leq 2\cdot 4$. $L_{2,2}=\{1,2,3,4,5,6\}$, $L_{5,2}=\{2,4,5,6,7,8\}$, $L_{7,2}=\{5,6,7\}$, and $L_{8,2}=\{5,8,9,10\}$.

Appendix 0.B The Proof of Theorem 2.1

Proof

It only remains to show that the running time of $\mathrm{AFN}(P,C,\epsilon)$ is bounded by $O(|C|\log\Delta)+O(|C|(1/\epsilon)^{O(\dim(\mathcal{X}))})$. We do this by splitting the set of scales $r$ processed by line 2 of Algorithm 1 into two ranges: (1) $\frac{r}{3}\geq\max_{x\in P}d(x,C)$ and (2) $\frac{r}{3}<\max_{x\in P}d(x,C)$.

We then study the two cases separately: for (1) we show a better bound on $|Z_r|$ than Lemma 5; for (2) we show that the number of processed scales is small.

  • (1) When $\frac{r}{3}\geq\max_{x\in P}d(x,C)$, the size of $Z_r$ is small. To see this, note that

    $\max_{z\in Z_r}d(z,C)\leq\max_{x\in P}d(x,C)\leq\frac{r}{3}.$

    Thus, for each $z\in Z_r$, there exists a $c\in C$ such that $z\in B(c,\frac{r}{3})$, i.e., $Z_r\subseteq\bigcup_{c\in C}B(c,\frac{r}{3})$. Additionally, $Z_r\subseteq Y_r$, so, $\forall x,y\in Z_r$, $d(x,y)\geq r$. Since the diameter of the ball $B(c,\frac{r}{3})$ is smaller than $r$, $|Z_r\cap B(c,\frac{r}{3})|\leq 1$ for every $c\in C$. Therefore $|Z_r|\leq|C|$.

    By Lemma 6, the number of iterations is at most $\log\Delta+O(1)$. Hence, the total running time for Case (1) is $O(|C|(\log\Delta+1))=O(|C|\log\Delta)$.

  • (2) When $\frac{r}{3}<\max_{x\in P}d(x,C)$, although the size of $Z_r$ can be larger, the number of possible iterations is small.

    By Lemma 5, $|Z_r|\leq 4|C|(\gamma+2/\epsilon)^{O(\dim(\mathcal{X}))}$. Let $r'$ be the value of $r$ when the algorithm terminates; then $2r'>\frac{\epsilon}{2}\max_{z\in Z_{r'}}d(z,C)$. From Lemma 4, $(1+\epsilon)\max_{z\in Z_{r'}}d(z,C)\geq\max_{x\in P}d(x,C)$. We have

    $3\max_{x\in P}d(x,C)>r\geq r'>\frac{\epsilon}{4}\max_{z\in Z_{r'}}d(z,C)\geq\frac{\epsilon}{4(1+\epsilon)}\max_{x\in P}d(x,C).$

    Thus, the total number of Case (2) iterations is at most $O(\log\frac{1}{\epsilon})$, and the total running time in Case (2) is $O(|C|(4(\gamma+2/\epsilon))^{O(\dim(\mathcal{X}))}\log(1/\epsilon))$.

Combining (1) and (2), the total running time of the algorithm is $O(|C|\log\Delta)+O(|C|(1/\epsilon)^{O(\dim(\mathcal{X}))})$.

Appendix 0.C The Modified Version of Gonzalez's Greedy Algorithm

As noted in the main text, Algorithm 3 is essentially Gonzalez's [Gon85] original algorithm with $\mathrm{FN}(P,C)$ replaced by $\mathrm{AFN}(P,C,\epsilon/5)$.

0.C.1 The Algorithm

Algorithm 3 Modified Greedy Algorithm: $GREEDY(P,\epsilon)$

Input: A set of points $P\subset\mathcal{X}$, a positive integer $k$, and a constant $\epsilon>0$.
Output: A set $C$ ($|C|\leq k$) and radius $r$ such that $P\subset\bigcup_{c\in C}B(c,r)$ and $r\leq(2+\epsilon)r^*$.

1: Arbitrarily select a point $p_1$ from $P$ and set $C=\{p_1\}$.
2: while $|C|<k$ do
3:     $C=C\cup\left\{\mathrm{AFN}\left(P,C,\frac{\epsilon}{5}\right)\right\}$    % $\mathrm{FN}(P,C)$ is replaced by $\mathrm{AFN}(P,C,\epsilon/5)$.
4: Set $r=\left(1+\frac{\epsilon}{5}\right)d(C,\mathrm{AFN}(P,C,\frac{\epsilon}{5}))$
5: Return $C,r$ as the solution.

Gonzalez's original algorithm only returned $C$, since in the deterministic case the radius could be calculated from $\mathrm{FN}(P,C)$ in $O(kn)$ time.

0.C.2 Proof of Theorem 3.1

As noted, this proof is just a modification of the proof of correctness of Gonzalez's [Gon85] original algorithm (which used $\mathrm{FN}(P,C)$ rather than $\mathrm{AFN}(P,C,\epsilon/5)$).

Proof

Let $C=\{q_1,\ldots,q_k\}$ and $r$ denote the solution returned by $GREEDY(P,\epsilon)$.

Let $q=\mathrm{AFN}(P,C,\frac{\epsilon}{5})$ be the $(1+\frac{\epsilon}{5})$-approximate furthest neighbor from $P$ to $C$ returned on line 4. Thus

$\forall p\in P,\quad d(p,C)\leq\left(1+\frac{\epsilon}{5}\right)d(C,q)=r,$

i.e., $P\subseteq\bigcup_{i=1}^k B(q_i,r)$. The output of the algorithm $GREEDY(P,\epsilon)$ is thus a feasible solution.

Let $O=\{o_1,\ldots,o_k\}$ denote an optimal $k$-center solution for the point set $P$, with $r^*$ being the optimal radius. Recall that $|S|$ denotes the number of points in a set $S$. We consider two cases:

  Case 1: $\forall\,1\leq i\leq k$, $|C\cap B(o_i,r^*)|=1$.

    Fix $p\in P$. Let $o_i$ be such that $p\in B(o_i,r^*)$, and let $q_j$ satisfy $q_j\in C\cap B(o_i,r^*)$.

    Then, by the triangle inequality, $d(p,q_j)\leq d(p,o_i)+d(o_i,q_j)\leq 2r^*$.

    We have just shown that $\forall p\in P$, $d(p,C)\leq 2r^*$. In particular,

    $d(C,q)\leq\max_{p\in P}d(p,C)\leq 2r^*.$

    Therefore,

    $r=\left(1+\frac{\epsilon}{5}\right)d(C,q)\leq\left(1+\frac{\epsilon}{5}\right)2r^*<(2+\epsilon)r^*.$

  Case 2: There exists $o'\in O$ such that $|C\cap B(o',r^*)|\geq 2$.

    Let $q_i$ be the $i$-th point added to $C$ and $C_i=\{q_1,\ldots,q_i\}$; thus $C_1\subset C_2\subset\cdots\subset C_k=C$. In Case 2, $C\cap B(o',r^*)$ contains at least two points $q_i$ and $q_j$ ($i<j$). From line 3 of $GREEDY(P,\epsilon)$, we have $q_j=\mathrm{AFN}\left(P,C_{j-1},\frac{\epsilon}{5}\right)$. Furthermore,

    $\max_{p\in P}d(p,C)\leq\max_{p\in P}d(p,C_{j-1})$   % (because $C_{j-1}\subseteq C$)
    $\leq\left(1+\frac{\epsilon}{5}\right)d(C_{j-1},q_j)$
    $\leq\left(1+\frac{\epsilon}{5}\right)d(q_i,q_j)$   % (because $q_i\in C_{j-1}$)
    $\leq\left(1+\frac{\epsilon}{5}\right)(d(q_i,o')+d(o',q_j))$
    $\leq\left(1+\frac{\epsilon}{5}\right)(r^*+r^*)=\left(2+\frac{2\epsilon}{5}\right)r^*.$   $(*)$

    Then, consider the radius returned by $GREEDY(P,\epsilon)$:

    $r=\left(1+\frac{\epsilon}{5}\right)d(C,q)\leq\left(1+\frac{\epsilon}{5}\right)\max_{p\in P}d(p,C)\leq\left(1+\frac{\epsilon}{5}\right)\left(2+\frac{2\epsilon}{5}\right)r^*\leq\left(2+\frac{4\epsilon}{5}+\frac{2\epsilon^2}{25}\right)r^*\leq(2+\epsilon)r^*,$

    where the third inequality uses $(*)$ and the last inequality assumes, without loss of generality, that $\epsilon\leq 1$.

Thus $P\subseteq\bigcup_{i=1}^k B(q_i,r)$ and, in both cases, $r\leq(2+\epsilon)r^*$. Hence $GREEDY(P,\epsilon)$ always computes a $(2+\epsilon)$-approximate solution.

Appendix 0.D Missing Details Associated with the Modified Kim and Schwarzwald [KS20] Algorithm

0.D.1 Proof of Lemma 7

As noted previously, this proof is a modification of the proof of correctness given by Kim and Schwarzwald [KS20] for their algorithm (which used $\mathrm{FN}(P,q)$ rather than $\mathrm{AFN}(P,q,\epsilon/3)$).

The proof needs an important geometric observation due to Kim and Schwarzwald [KS20] (slightly rephrased here). This extends an earlier observation by Kim and Ahn [KA15] that was used to design a streaming algorithm for the Euclidean 2-center problem.

Lemma 8

[KS20] (See Figure 4.) Fix $D\geq 2$. Let $B$ and $B'$ be two $D$-dimensional balls with radii $r$ and $r'$, $r>r'$, around the same center point $c$. Let $p\in\partial B$ and $p'\in\partial B'$ with $d(p,p')=l\geq r$.

Define $B''$ to be the $D$-dimensional ball centered at $c$ that is tangential to $pp'$. Denote that tangent point as $m$ and define the distances $l_1=d(p',m)$ and $l_2=d(p,m)$. Note that $l=l_1+l_2$.

Consider any line segment $p_1p_2$ satisfying $d(p_1,p_2)>l$, $p_1\in B'$ and $p_2\in B$. Then any point $m^*$ on $p_1p_2$ with $d(m^*,p_1)\geq l_1$ and $d(m^*,p_2)\geq l_2$ lies inside $B''$.

Figure 4: Illustration of Lemma 8. This figure is reproduced from [KS20].

Lemma 8 implies that if Case (2), $d(m_i,p_{i+1})>(1+\epsilon/3)r^*$, always occurs, then the distance from $c^*$ to $m_i$ is bounded:

Corollary 1

If, $\forall\,1\leq i\leq j$, Case (2) holds, i.e., $d(m_i,p_{i+1})>(1+\epsilon/3)r^*$, then $d(c^*,m_{j+1})\leq\delta_{j+1}\cdot r^*$ and $\delta_{j+1}<\delta_j$.

Proof

The proof is by induction. In the base case, $\delta_1=1$ and $m_1$ is arbitrarily selected from $P$, so $d(c^*,m_1)\leq\max_{x\in P}d(c^*,x)=r^*=\delta_1\cdot r^*$.

Now suppose that, $\forall\,1\leq i\leq j$, $d(m_i,p_{i+1})>(1+\epsilon/3)r^*$ holds. From the induction hypothesis, we may assume that $d(c^*,m_j)\leq\delta_j\cdot r^*$ with $\delta_j\leq 1$.

Set $B=B(c^*,r^*)$ and $B'=B(c^*,\delta_j r^*)$.

Next, arbitrarily select a point $p$ on the boundary of $B(c^*,r^*)$ and construct the ball $\bar{B}=B(p,(1+\epsilon/3)r^*)$. From the induction hypothesis,

$d(m_j,p_{j+1})\leq d(c^*,m_j)+r^*\leq\delta_j r^*+r^*.$

If $\delta_j r^*\leq(\epsilon/3)r^*$, this would imply

$d(m_j,p_{j+1})\leq(1+\epsilon/3)r^*,$

contradicting that this is Case (2). Thus $\delta_j r^*>(\epsilon/3)r^*$.

Since $\delta_j\leq 1$, this implies that $\bar{B}$ must intersect $B'$. Arbitrarily select one of the two intersection points as $p'$, and set $l=d(p,p')=(1+\epsilon/3)r^*$.

Finally, define $B''$ to be the ball centered at $c^*$ that is tangent to the line segment $pp'$. Denote that tangent point as $m$ and define the distances $l_1=d(p',m)$ and $l_2=d(p,m)$. We will show that $B''=B(c^*,\delta_{j+1}r^*)$, i.e., that $d(c^*,m)=\delta_{j+1}r^*$, where $\delta_{j+1}$ is as defined on line 9 of Algorithm 2.

Note that $l=l_1+l_2$. Thus $l_1$ can be computed from the following equation (illustrated in Figure 5):

Figure 5: $l=d(p,p')=(1+\epsilon/3)r^*$, $d(p',m)=l_1$ and $d(m,p)=l-l_1$.

$(\delta_j r^*)^2-l_1^2=(r^*)^2-((1+\epsilon/3)r^*-l_1)^2.$

This solves to $l_1=\frac{\delta_j^2+(1+\epsilon/3)^2-1}{2(1+\epsilon/3)}\cdot r^*$. Thus

$\delta_{j+1}=\sqrt{1-\left(\frac{1+(1+\epsilon/3)^2-\delta_j^2}{2(1+\epsilon/3)}\right)^2},$

as required, and, by construction, $\delta_{j+1}<\delta_j$. Furthermore,

$\frac{\delta_j^2+(1+\epsilon/3)^2-1}{2(1+\epsilon/3)^2}=\frac{l_1}{l}.$

Plugging into line 8 of Algorithm 2 yields

$m_{j+1}=m_j+(p_{j+1}-m_j)\cdot\frac{l_1}{l}=m_j+(p_{j+1}-m_j)\cdot\left(1-\frac{l_2}{l}\right).$

Since $d(m_j,p_{j+1})>(1+\epsilon/3)r^*$, we have

$d(m_j,m_{j+1})=\frac{l_1}{l}\cdot d(m_j,p_{j+1})>\frac{l_1}{(1+\epsilon/3)r^*}(1+\epsilon/3)r^*=l_1$

and $d(p_{j+1},m_{j+1})=\frac{l_2}{l}\cdot d(m_j,p_{j+1})>l_2$.

From the induction hypothesis, $d(c^*,m_j)\leq\delta_j\cdot r^*$, so $m_j\in B(c^*,\delta_j r^*)$. The definition of $c^*,r^*$ further implies $p_{j+1}\in B(c^*,r^*)$.

We now apply Lemma 8 with $p_1=m_j$, $p_2=p_{j+1}$ and $m^*=m_{j+1}$. Since these three points are collinear, Lemma 8 implies that $m^*\in B''$, i.e., that

$d(c^*,m_{j+1})\leq\delta_{j+1}\cdot r^*.$

We can now prove Lemma 7. We again note that this is just a slight modification of the proof given by Kim and Schwarzwald [KS20] for their original algorithm.

Proof

(of Lemma 7.)

Recall that the goal is to prove that if,  1ij\forall 1\leqslant i\leqslant j, case (2) holds, i.e., d(mi,pi+1)>(1+ϵ/3)rd(m_{i},p_{i+1})>(1+\epsilon/3)r^{*}, then j6ϵ1.j\leq\frac{6}{\epsilon}-1.

Consider the triangle ppc\bigtriangleup pp^{\prime}c^{*} (Figure 5) constructed in the proof of Corollary 1.

Let p(i),p(i),m(i)p(i),p^{\prime}(i),m(i) denote p,p,mp,p^{\prime},m in step ii of the algorithm.

Recall that in the construction, p(i)p(i) is on the boundary of B(c,r)B(c^{*},r^{*}) and p(i)p^{\prime}(i) is on the boundary of B=B(c,δir)B^{\prime}=B(c^{*},\delta_{i}r^{*}). Additionally, d(p(i),p(i))=(1+ϵ/3)rd(p(i),p^{\prime}(i))=(1+\epsilon/3)r^{*} and line segment cmc^{*}m is vertical to line segment p(i)p(i).p(i)p^{\prime}(i). Recall that

d(p(i),m(i))=l1=δi2+(1+ϵ/3)212(1+ϵ/3)randd(p(i),m(i))=(1ϵ/3)rl1.d(p^{\prime}(i),m(i))=l_{1}=\frac{\delta_{i}^{2}+(1+\epsilon/3)^{2}-1}{2(1+\epsilon/3)}r^{*}\quad\mbox{and}\quad d(p(i),m(i))=(1-\epsilon/3)r^{*}-l_{1}.

Now define βi,αi\beta_{i},\alpha_{i} so that

d(m(i),p(i))=βid(p(i),p(i))=βi(1+ϵ/3)rand(p(i),m(i))=αi(1+ϵ/3)r.d(m(i),p(i))=\beta_{i}\cdot d(p(i),p^{\prime}(i))=\beta_{i}(1+\epsilon/3)r^{*}\quad\mbox{and}\quad(p^{\prime}(i),m(i))=\alpha_{i}(1+\epsilon/3)r^{*}.

Note that αi+βi=1.\alpha_{i}+\beta_{i}=1.

Since m(i)p(i)c\bigtriangleup m(i)p(i)c^{*} is a right triangle, d(m(i),p(i))=βi(1+ϵ/3)rd(c,p(i))=rd(m(i),p(i))=\beta_{i}(1+\epsilon/3)r^{*}\leqslant d(c^{*},p(i))=r^{*} so, i,\forall i, βi1(1+ϵ/3)\beta_{i}\leq\frac{1}{(1+\epsilon/3)}.

We have therefore just proven that if,  1ij\forall 1\leqslant i\leqslant j, case (2) holds, i.e., d(mi,pi+1)>(1+ϵ/3)rd(m_{i},p_{i+1})>(1+\epsilon/3)r^{*}, then for all such i,i, βi1(1+ϵ/3)\beta_{i}\leq\frac{1}{(1+\epsilon/3)} and, in particular, βj1(1+ϵ/3).\beta_{j}\leq\frac{1}{(1+\epsilon/3)}.

By construction, (δir)2=(r)2(βi(1+ϵ/3)r)2(\delta_{i}r^{*})^{2}=(r^{*})^{2}-(\beta_{i}(1+\epsilon/3)r^{*})^{2} and (αi(1+ϵ/3)r)2=(δi1r)2(δir)2(\alpha_{i}(1+\epsilon/3)r^{*})^{2}=(\delta_{i-1}r^{*})^{2}-(\delta_{i}r^{*})^{2}. Plugging the first (twice) into the second yields

(αi(1+ϵ/3)r)2=((r)2(βi1(1+ϵ/3)r)2)((r)2(βi(1+ϵ/3)r)2),(\alpha_{i}(1+\epsilon/3)r^{*})^{2}=\left((r^{*})^{2}-(\beta_{i-1}(1+\epsilon/3)r^{*})^{2}\right)-\left((r^{*})^{2}-(\beta_{i}(1+\epsilon/3)r^{*})^{2}\right),

or βi2=αi2+βi12.\beta_{i}^{2}=\alpha_{i}^{2}+\beta_{i-1}^{2}.

Combining this with βi=1αi\beta_{i}=1-\alpha_{i} yields βi=1+βi122\beta_{i}=\frac{1+\beta^{2}_{i-1}}{2}. Set φi=11βi\varphi_{i}=\frac{1}{1-\beta_{i}}. Then

φi=11βi=111+βi122=11βi122=11βi11+βi12=φi11+(11φi1)2=φi1112φi1.\varphi_{i}=\frac{1}{1-\beta_{i}}=\frac{1}{1-\frac{1+\beta_{i-1}^{2}}{2}}=\frac{1}{\frac{1-\beta_{i-1}^{2}}{2}}=\frac{\frac{1}{1-\beta_{i-1}}}{\frac{1+\beta_{i-1}}{2}}=\frac{\varphi_{i-1}}{\frac{1+(1-\frac{1}{\varphi_{i-1}})}{2}}=\frac{\varphi_{i-1}}{1-\frac{1}{2\varphi_{i-1}}}.

Thus

φi=φi1112φi1=φi1(1+12φi1+1(2φi1)2+)φi1+12.\varphi_{i}=\frac{\varphi_{i-1}}{1-\frac{1}{2\varphi_{i-1}}}=\varphi_{i-1}(1+\frac{1}{2\varphi_{i-1}}+\frac{1}{(2\varphi_{i-1})^{2}}+\cdots)\geqslant\varphi_{i-1}+\frac{1}{2}.

Recall that \delta_{1}=1 and d(c^{*},p(1))=d(c^{*},p^{\prime}(1))=r^{*}, i.e., \bigtriangleup p(1)p^{\prime}(1)c^{*} is an isosceles triangle. Therefore, \alpha_{1}=\beta_{1}=\frac{1}{2} and \varphi_{1}=2. Iterating the inequality above yields \varphi_{i}\geqslant 2+\frac{i-1}{2}, so \beta_{i}=1-\frac{1}{\varphi_{i}}\geqslant 1-\frac{2}{3+i}. Thus, if j>\frac{6}{\epsilon}-1, then \beta_{j}>\frac{1}{1+\epsilon/3}, which we previously saw is impossible.
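Since the recurrence is easy to mis-transcribe, the following short Python check (ours; the choices eps = 0.3 and tolerance 1e-12 are arbitrary) iterates \beta_{i}=\frac{1+\beta_{i-1}^{2}}{2} from \beta_{1}=\frac{1}{2} and confirms both bounds derived above, together with the step bound of Lemma 7:

# beta_1 = 1/2, beta_i = (1 + beta_{i-1}^2)/2, phi_i = 1/(1 - beta_i).
# Verify phi_i >= 2 + (i-1)/2 and beta_i >= 1 - 2/(3+i), and confirm that
# beta_i first exceeds 1/(1 + eps/3) at some step i <= 6/eps, as Lemma 7 predicts.
eps = 0.3
threshold = 1.0 / (1.0 + eps / 3.0)
beta, i = 0.5, 1
while beta <= threshold:
    phi = 1.0 / (1.0 - beta)
    assert phi >= 2.0 + (i - 1) / 2.0 - 1e-12
    assert beta >= 1.0 - 2.0 / (3.0 + i) - 1e-12
    beta = (1.0 + beta * beta) / 2.0
    i += 1
assert i <= 6.0 / eps
print(f"beta first exceeds {threshold:.4f} at step {i}; 6/eps = {6.0 / eps:.0f}")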

0.D.2 The Actual Modified Algorithm for Euclidean kk Center

Algorithm 4 Modified kk-center(P,ϵ,kP,\epsilon,k)

Input: A set of points PP, positive integer kk and a constant ϵ>0\epsilon>0.
Output: A set \bar{C} (|\bar{C}|\leq k) and radius \bar{r} such that P\subset\bigcup_{c\in\bar{C}}B(c,\bar{r}) and \bar{r}\leq(1+\epsilon)r^{*}.
In the algorithm, each mj,im_{j,i}, j{1,,k},j\in\{1,\ldots,k\}, is either undefined or a point in D.\mathbb{R}^{D}. MiM_{i} denotes the set of defined mj,i.m_{j,i}. \mathcal{F} is the set of all functions from {1,,k6/ϵ}\{1,\ldots,k\lfloor 6/\epsilon\rfloor\} to {1,,k}.\{1,\ldots,k\}.

1:r¯=\bar{r}=\infty
2:for every function ff\in\mathcal{F} do
3:     j{1k},\forall j\in\{1\ldots k\}, set δj,1=1;\delta_{j,1}=1; Set r=;r=\infty;
4:     Arbitrarily select a point p1p_{1} from PP;
5:     mj,1={undefinedif jf(1)p1if j=f(1)m_{j,1}=\begin{cases}\mbox{undefined}&\mbox{if $j\not=f(1)$}\\ p_{1}&\mbox{if $j=f(1)$}\end{cases}
6:     for i=1i=1 to k6/ϵk\lfloor 6/\epsilon\rfloor do
7:         pi+1=p_{i+1}=AFN(P,Mi,ϵ/3)(P,M_{i},\epsilon/3);
8:         ri=(1+ϵ3)d(Mi,pi+1);r_{i}=\left(1+\frac{\epsilon}{3}\right)d(M_{i},p_{i+1});
9:         if ri<rr_{i}<r  then
10:              C=MiC=M_{i}; r=rir=r_{i};          
11:         mj,i+1={mj,iif jf(i)mj,i+(pi+1mj,i)δj,i2+(1+ϵ/3)212(1+ϵ/3)2if j=f(i)m_{j,i+1}=\begin{cases}m_{j,i}&\mbox{if $j\not=f(i)$}\\ m_{j,i}+(p_{i+1}-m_{j,i})\cdot\frac{\delta_{j,i}^{2}+(1+\epsilon/3)^{2}-1}{2(1+\epsilon/3)^{2}}&\mbox{if $j=f(i)$}\end{cases}
12:         δj,i+1={δj,iif jf(i)1(1+(1+ϵ/3)2δj,i22(1+ϵ/3))2if j=f(i)\delta_{j,i+1}=\begin{cases}\delta_{j,i}&\mbox{if $j\not=f(i)$}\\ \sqrt{1-\left(\frac{1+(1+\epsilon/3)^{2}-\delta^{2}_{j,i}}{2(1+\epsilon/3)}\right)^{2}}&\mbox{if $j=f(i)$}\end{cases}      
13:     if r<r¯r<\bar{r}  then
14:          r¯=r;\bar{r}=r; C¯=C;\bar{C}=C;      

Before starting, we provide some brief intuition. If, for each p\in P, the algorithm knew in advance which of the k clusters contains p, it could solve the problem by running Algorithm 2 separately on each cluster and returning the largest radius found. Since it does not know this information in advance, it “guesses” the assignment. This guess is encoded by the function f\in{\mathcal{F}} introduced in Algorithm 4. The algorithm runs the procedure for every possible guess; since one of the guesses must be correct, it returns a correct answer.

Again, we emphasize that Algorithm 4 is essentially the algorithm alluded to in Kim and Schwarzwald [KS20], with calls to \mbox{FN}(P,C) replaced by calls to \mbox{AFN}(P,C,\epsilon/3). (We write “alluded to” because [KS20] do not actually provide details. They only say that they are utilizing the guessing technique from [BC03]. In our algorithm, we have provided full details of how this can be done.)
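For concreteness, here is a compact, unoptimized Python sketch of Algorithm 4 (ours; all identifiers are illustrative). It substitutes a brute-force exact furthest-neighbor search for the dynamic AFN structure (exact answers trivially satisfy the AFN contract), and it resolves the case in which m_{f(i),i} is still undefined at line 11 by simply setting m_{f(i),i+1}=p_{i+1}, mirroring the initialization of Algorithm 2:

import itertools
import numpy as np

def afn(P, M, eps):
    # Stand-in for AFN(P, M, eps): brute-force the EXACT furthest neighbor,
    # which trivially meets the (1+eps)-approximation contract.
    # Returns (point, d(M, point)) with d(M, p) = min over q in M of |p - q|.
    dists = [min(np.linalg.norm(p - q) for q in M) for p in P]
    j = int(np.argmax(dists))
    return P[j], dists[j]

def modified_k_center(P, k, eps):
    # Sketch of Algorithm 4: try every guess f and keep the best cover found.
    steps = k * int(6 / eps)                 # k * floor(6/eps) iterations
    t = 1.0 + eps / 3.0
    best_r, best_C = float("inf"), None
    for f in itertools.product(range(k), repeat=steps):
        m = [None] * k                       # tentative centers m_{j,i}
        delta = [1.0] * k                    # the delta_{j,i}
        r, C = float("inf"), None
        m[f[0]] = P[0]                       # line 5: m_{f(1),1} = p_1
        for i in range(steps):
            M = [c for c in m if c is not None]
            p, d = afn(P, M, eps / 3.0)      # line 7: p_{i+1}
            if t * d < r:                    # lines 8-10
                C, r = list(M), t * d
            j = f[i]                         # lines 11-12: update center f(i)
            if m[j] is None:
                m[j] = p                     # first point assigned to cluster j
            else:
                step = (delta[j] ** 2 + t ** 2 - 1.0) / (2.0 * t ** 2)
                m[j] = m[j] + (p - m[j]) * step
                delta[j] = np.sqrt(max(0.0,
                    1.0 - ((1.0 + t ** 2 - delta[j] ** 2) / (2.0 * t)) ** 2))
        if r < best_r:                       # lines 13-14
            best_r, best_C = r, C
    return best_C, best_r

For example, modified_k_center([np.array(x, dtype=float) for x in pts], k=2, eps=1.5) enumerates 2^{2\lfloor 6/1.5\rfloor}=256 guesses; since the number of guesses grows as k^{k\lfloor 6/\epsilon\rfloor}, the sketch is only practical for very small k and moderate \epsilon.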

Theorem 0.D.1

Let P\subset\mathbb{R}^{D} be a finite set of points. Suppose \mbox{AFN}(P,C,\epsilon) can be implemented in T(|C|,\epsilon) time. Then a (1+\epsilon)-approximate k-center solution for P can be constructed in O\left(DT\left(k,\frac{\epsilon}{3}\right)2^{O(k\log k/\epsilon)}\right) time.

Proof

Let C={c1,,ck}C^{*}=\{c_{1}^{*},\ldots,c^{*}_{k}\} be a set of optimal centers and r=d(C,P).r^{*}=d(C^{*},P). Partition the points in PP into Pi,P_{i}, i=1,,k,i=1,\ldots,k, so that PiPB(ci,r).P_{i}\subseteq P\cap B(c^{*}_{i},r^{*}). Let ri=d(ci,Pi),r^{*}_{i}=d(c^{*}_{i},P_{i}), i.e., B(ci,ri)B(c^{*}_{i},r^{*}_{i}) is a minimum enclosing ball for Pi.P_{i}. Note that r=maxiri.r^{*}=\max_{i}r^{*}_{i}.

Fix f:\{1,\ldots,k\lfloor 6/\epsilon\rfloor\}\rightarrow\{1,\ldots,k\} to be an arbitrary function. Lines 3-12 of Algorithm 4 maintain a list of k tentative centers; m_{1,i},\ldots,m_{k,i} denote the list at the start of iteration i. Note that some of the m_{j,i} might be undefined, i.e., not yet exist. M_{i} denotes the set of defined items in the list at the start of iteration i. During iteration i, the algorithm updates (only) the tentative center m_{f(i),i} and also constructs a radius r_{i}.

The algorithm starts with all of the m_{j,1} being undefined, chooses an arbitrary point of P, calls it p_{1}, and then sets m_{f(1),1}=p_{1}.

At step i,i, it sets pi+1=AFN(P,Mi,ϵ/3)p_{i+1}=\mbox{AFN}\left(P,M_{i},\epsilon/3\right) and ri=(1+ϵ3)d(Mi,pi+1).r_{i}=\left(1+\frac{\epsilon}{3}\right)d(M_{i},p_{i+1}).

Note that the definitions of AFN(P,Mi,ϵ/3)\mbox{AFN}\left(P,M_{i},\epsilon/3\right) and rir_{i} immediately imply

i,PqMiB(q,ri).\forall i,\quad P\subset\bigcup_{q\in M_{i}}B(q,r_{i}). (4)

Thus, lines 3-12 return a set C and radius r such that \bigcup_{q\in C}B(q,r) covers all points in P, in O\left(D\frac{k}{\epsilon}T\left(k,\frac{\epsilon}{3}\right)\right) time.

So far, the analysis has not considered lines 11-12 of the algorithm.

The algorithm arbitrarily chooses p_{1}. Now consider the unique function f^{\prime} that, for each i, returns the index of the P_{j} containing p_{i+1}, i.e., p_{i+1}\in P_{f^{\prime}(i)}. For this f^{\prime}, lines 11 and 12 of the algorithm work as if they were running the original modified MEB algorithm on each of the P_{j} separately.

By the generalized pigeonhole principle, there must exist at least one index j such that f^{\prime}(i)=j at least \lfloor 6/\epsilon\rfloor times. For such a j, consider the value of i at which f^{\prime}(i)=j for the \lfloor 6/\epsilon\rfloor-th time. Then, from the analysis of Algorithm 2, for this particular j,

d(mj,i,pi+1)(1+ϵ/3)rj,d(m_{j,i},p_{i+1})\leq(1+\epsilon/3)r^{*}_{j},

so

d(Mi,pi+1)d(mj,i,pi+1)(1+ϵ/3)rj(1+ϵ/3)r.d(M_{i},p_{i+1})\leq d(m_{j,i},p_{i+1})\leq(1+\epsilon/3)r^{*}_{j}\leq(1+\epsilon/3)r^{*}.

Thus

r_{i}=\left(1+\frac{\epsilon}{3}\right)d(M_{i},p_{i+1})\leq\left(1+\frac{\epsilon}{3}\right)^{2}r^{*}\leq\left(1+\epsilon\right)r^{*},

where the final inequality holds because \left(1+\frac{\epsilon}{3}\right)^{2}=1+\frac{2\epsilon}{3}+\frac{\epsilon^{2}}{9}\leq 1+\epsilon whenever \epsilon\leq 3.

In particular, lines 3-12 run with f^{\prime} return a (1+\epsilon)-approximate solution.

Algorithm 4 runs lines 3-12 on all \Theta\left(k^{k\lfloor 6/\epsilon\rfloor}\right)=2^{O(k\log k/\epsilon)} possible functions f. Since these include f^{\prime}, the full algorithm also returns a (1+\epsilon)-approximate solution.