An efficient approximation for point-set diameter in higher dimensions
Abstract
In this paper, we study the problem of computing the diameter of a set of points in -dimensional Euclidean space for a fixed dimension , and propose a new -approximation algorithm with time and space, where . We also show that the proposed algorithm can be modified to a -approximation algorithm with running time. These results provide some improvements in comparison with existing algorithms in terms of simplicity and data structure.
1 Introduction
Given a finite set of points, the diameter of , denoted by is the maximum distance between two points of . Namely, we want to find a diametrical pair and such that . Computing the diameter of a set of points has a long history, and it may be required in various fields such as databases, data mining, and vision. A trivial brute-force algorithm for this problem is as follows: compute the distance between each pair of points and then choose the maximum distance. Since computing each distance takes constant time, this algorithm takes time, which is too slow for the large-scale datasets that occur in these fields. Hence, we need a faster algorithm, which may be exact or approximate.
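The brute-force baseline described above can be sketched in a few lines of Python. The function name `brute_force_diameter` and the representation of points as coordinate tuples are illustrative assumptions, not part of the paper.

```python
import math
from itertools import combinations


def brute_force_diameter(points):
    """Return (distance, p, q) for a farthest pair by checking every pair."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    best = (-1.0, None, None)
    for p, q in combinations(points, 2):
        dist = math.dist(p, q)  # Euclidean distance in any dimension
        if dist > best[0]:
            best = (dist, p, q)
    return best


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (-1.0, 2.0)]
    print(brute_force_diameter(pts))
```

The number of pairs grows quadratically with the number of points, which is the cost the rest of the paper tries to avoid.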
By reduction from the set disjointness problem, it can be shown that computing the diameter of points in requires operations in the algebraic computation-tree model [1]. Yao showed that it is possible to compute the diameter in sub-quadratic time in each dimension [2]. There are well-known solutions in two and three dimensions. In the plane, this problem can be solved in optimal time , but in three dimensions, it is more difficult. Clarkson and Shor [3] present an -time randomized algorithm. Their algorithm needs to compute the intersection of balls (with the same radius) in . It may be slower than the brute-force algorithm for most practical datasets. Moreover, it is not an efficient method for higher dimensions because the intersection of balls with the same radius has a large size. Some deterministic algorithms with running time [4, 5] and [6, 7] have been found for solving this problem. Finally, Ramos [8, 9] introduced an optimal deterministic -time algorithm in . Cheong et al. [10] present an randomized algorithm that computes the all-pairs farthest neighbors for points in convex position in .
In the absence of fast algorithms, many attempts have been made to approximate the diameter in low and high dimensions. A 2-approximation algorithm with time in dimensions can be found easily by selecting a point of and then finding the farthest point from it by brute force. The first non-trivial approximation algorithm for the diameter was presented by Egecioglu and Kalantari [11]; it approximates the diameter with factor and operations cost in dimensions. They also present an iterative algorithm with iterations and the cost for each iteration, which has approximation factor . Agarwal et al. [12] present a -approximation algorithm in with running time by projection to directions. Barequet and Har-Peled [13] present a -approximate diameter method with time. They also describe a -approximate approach for computing the diameter with time in . They show that the running time can be improved to . Similarly, Har-Peled [14] presents an approach which, for most inputs, is able to compute the exact diameter, or an approximation, very quickly. However, in the worst case, the running time of the algorithm is still quadratic, and it is sensitive to the hardness of the input. His algorithm is able to return a pair of points and such that , for each value in each dimension with running time. He shows that with a complicated analysis, this running time can be reduced to . Simultaneously, Malandain and Boissonnat [15] present an exact algorithm for the diameter by computing the double normals in each dimension, but their algorithm is not worst-case optimal. They also show that, given double normals, a -approximation of the diameter in each dimension is obtained. Moreover, Finocchiaro and Pellegrini [16] describe an algorithm that finds in time with high probability a -approximation for the diameter of a set of points in -dimensional Euclidean space. Chan [17] observes that a combination of two approaches in [12] and [13] yields a -approximation algorithm with time and a -approximation algorithm with time. He also introduces a core-set theorem, and shows that using this theorem, a -approximation for the diameter in time can be found [18]. Recently, Chan [19] has proposed an approximation algorithm with time by applying the Chebyshev polynomials for the diameter in low constant dimensions, and Arya et al. [20] show that by applying an efficient decomposition of a convex body using a hierarchy of Macbeath regions, it is possible to compute an approximation for the diameter of a point set in time, where is an arbitrarily small positive constant. Table 1 provides a summary of some non-constant approximation algorithms for the point-set diameter.
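As a small illustration of the simple 2-approximation mentioned at the start of this paragraph, the following Python sketch picks an arbitrary point and returns the distance to its farthest point. The function name `two_approx_diameter` and the choice of the first point as the arbitrary point are assumptions made for this example.

```python
import math


def two_approx_diameter(points):
    """Distance from an arbitrary point (here: the first) to its farthest point.

    If D is the true diameter with diametrical pair (p, q), the triangle
    inequality gives D <= |a p| + |a q| <= 2 * result for the chosen point a,
    so D / 2 <= result <= D.
    """
    anchor = points[0]
    return max(math.dist(anchor, p) for p in points)
```

One pass over the points suffices, at the price of a factor of 2 in the worst case.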
1.1 Our results
In this paper, we propose a new -approximation algorithm for computing the diameter of a set of points in with time and space, where . Moreover, we show that the proposed algorithm can be modified to a -approximation algorithm with time and space. As stated above, two new results have recently been presented for the diameter problem in [19] and [20]. It should be noted that our algorithms are completely different in terms of computational technique. The polynomial technique provided by Chan [19] is based on Chebyshev polynomials and a discrete upper envelope subroutine [18], and the method presented by Arya et al. [20] requires complex data structures to approximately answer queries for polytope membership, directional width, and nearest neighbor. In comparison, our algorithms are simpler to understand and use simpler data structures. The remainder of this paper is organized as follows: in Section 2, we describe our proposed algorithm. Subsection 2.1 contains our analysis of the algorithm. Subsection 2.2 presents a modified version of the proposed algorithm. In Section 3, we draw our conclusion.
2 The proposed algorithm
In this section, we describe our new approximation algorithm to compute the diameter of a set of points in . We first find the extreme points in each coordinate and compute the axis-parallel bounding box of , which is denoted by . We use the longest side of to impose grids on the point set. In fact, we first decompose into a grid of regular hypercubes with side length , where . We call each hypercube a cell. Then, each point of is rounded to its corresponding central cell-point. Figure 1 shows an example of the rounding process for a point set in .
Next, we impose a second -grid on for . Then, we round each point of the rounded point set to its nearest grid-point in this new grid, which results in a point set . Let be a hypercube with side length and central point . We restrict our search for diametrical pairs of the first rounded point set to the two hypercubes and corresponding to a diametrical pair and in the point set . Let us use two point sets and to maintain the points of the rounded point set that are inside the two hypercubes and , respectively. See Figure 2. Then, it is sufficient to find a diameter between the points of that are in the two point sets and . We use the notation for the process of computing the diameter of the point set . Altogether, we can present the following algorithm.
Algorithm 1: APPROXIMATE DIAMETER
Input: a set of points in and an error parameter .
Output: approximate diameter .
1: Compute the axis-parallel bounding box for a point set .
2: Find the length of the largest side in .
3: Set and .
4: Round each point of to its central-cell point in a -grid.
5: Round each point of to its nearest grid-point in a -grid.
6: Compute the diameter of the point set by brute force and, simultaneously, a list of the diametrical pairs , such that .
7: Find the points of which are in the two hypercubes and for each diametrical pair .
8: Compute , corresponding to each diametrical pair by brute force and return the maximum value among them.
9: .
10: Output .
Note that we only need time to round points to their central-cell points. We need time for computing the central cell-point for each point. Thus, we can round all points of a set of points to their central-cell points in time. Similarly, rounding points to their nearest grid-point can be done in time.
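The following Python sketch walks through the two rounding steps and the restricted brute-force search of Algorithm 1. It is only an illustration under stated assumptions: the cell side lengths `xi` (fine grid) and `xi2` (coarse grid) are passed in as hypothetical parameters rather than derived from the error parameter as in line 3, the search hypercubes are assumed to have side length `2 * xi2`, and the function name `approx_diameter` is invented for this example.

```python
import math
from itertools import combinations, product


def approx_diameter(points, xi, xi2):
    """Two-level grid sketch; xi and xi2 are assumed cell side lengths."""
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]  # bounding-box corner

    # Line 4: snap every point to the centre of its fine cell (deduplicated).
    fine = {tuple(lo[i] + (math.floor((p[i] - lo[i]) / xi) + 0.5) * xi
                  for i in range(d)) for p in points}
    # Line 5: snap every fine point to the nearest coarse grid point.
    coarse = sorted({tuple(lo[i] + round((p[i] - lo[i]) / xi2) * xi2
                           for i in range(d)) for p in fine})

    # Line 6: brute force on the coarse set, keeping every diametrical pair.
    pairs = [(coarse[0], coarse[0])]  # degenerate case: a single coarse point
    top = 0.0
    for p, q in combinations(coarse, 2):
        dist = math.dist(p, q)
        if dist > top * (1 + 1e-12):
            top, pairs = dist, [(p, q)]
        elif math.isclose(dist, top, rel_tol=1e-12):
            pairs.append((p, q))

    def inside(center):
        # Line 7: fine points in the hypercube of side 2 * xi2 around center.
        return [p for p in fine
                if all(abs(p[i] - center[i]) <= xi2 * (1 + 1e-9)
                       for i in range(d))]

    # Line 8: brute force between the two point sets of each diametrical pair.
    best = 0.0
    for p_hat, q_hat in pairs:
        for a, b in product(inside(p_hat), inside(q_hat)):
            best = max(best, math.dist(a, b))
    return best
```

Deduplicating through Python sets plays the role of the grid here: after rounding, the number of distinct points depends on the grid resolution rather than on the input size. The sketch does not apply the pruning of internal points and makes no claim about the approximation factor proved in the paper.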
2.1 Analysis
In this subsection, we analyze the proposed algorithm.
Theorem 1.
Algorithm 1 computes an approximate diameter for a set of points in in time and space, where .
Proof.
Finding the extreme points in all coordinates and finding the largest side of can be done in time. The rounding step takes time for each point, and time for all of them. To bound the cost of computing the diameter over the rounded point set, we need to know the number of points in the set . We know that the largest side of the bounding box has length and the side length of each cell in the -grid is . On the other hand, the volume of a hypercube of side length in -dimensional space is . Since, corresponding to each point in the point set , we can take a hypercube of side length , the number of grid-points in a -grid imposed on the bounding box is at most
(1) |
So, the number of points in is at most . Hence, by the brute-force quadratic algorithm, we need time for computing all distances between grid-points of the set and its list of diametrical pairs. Then, for a diametrical pair in the point set , we compute two sets and . They contain the points of which are inside the two hypercubes and , respectively. This step takes time. In addition, for computing the diameter of the point set , we need to know the number of points in each of them. The number of points in either of the two sets or is at most
(2) |
Hence, for computing , we need time by brute force, but we might have more than one diametrical pair . Since the point set is a set of grid-points, we could have in the worst case different diametrical pairs in the point set . This means that this step takes at most time.
Now, we can present the complexity of our algorithm as follows:
(3) |
Since is fixed, we have:
(4) |
We can also reduce the running time of Algorithm 1 by discarding some internal points which cannot belong to a diametrical pair in the rounded point set and, similarly, in the two point sets and . This can be done by considering all the points which agree in their coordinates and keeping only the highest and lowest of them. Then, the number of points in the point set , and in the two point sets and , can be reduced to . So, using the brute-force quadratic algorithm, we need time to find the diametrical pairs. Hence, this gives us the total running time . As for the required space, we only need space for storing the required point sets. This completes the proof. ∎
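The pruning step at the end of the proof can be illustrated with a short Python sketch. It assumes one particular reading of the rule: points that agree in all coordinates except the last one form a column, and only the lowest and highest point of each column is kept. The function name `prune_columns` and the use of tuples for grid points are likewise assumptions.

```python
from collections import defaultdict


def prune_columns(grid_points):
    """Keep only the lowest and highest point of every column.

    A column is the set of points sharing all coordinates except the last
    (an assumed reading of the pruning rule in the text).
    """
    columns = defaultdict(list)
    for p in grid_points:
        columns[tuple(p[:-1])].append(p[-1])
    kept = set()
    for prefix, lasts in columns.items():
        kept.add(prefix + (min(lasts),))
        kept.add(prefix + (max(lasts),))
    return kept
```

Under this reading the diameter is preserved: the distance from any fixed point to a point moving along a column is a convex function of the moving coordinate, so it is maximized at one of the two ends of the column.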
Now, we explain the details of the approximation.
Theorem 2.
Algorithm 1 computes an approximate diameter such that: , where .
Proof.
In line 7 of Algorithm 1, we compute two point sets and for each diametrical pair in the point set . We know that a grid-point in the point set is formed from points of the set which are inside the hypercube . We use a hypercube of side length to make sure that we do not lose any candidate diametrical pair of the first rounded point set around a diametrical point (see Figure 2). In the next step, we should find the diametrical pair for the points which are inside the two point sets and . Hence, it remains to show that the diameter, which is computed from two points and , is a -approximation of the true diameter. Let and be two central-cell points of the rounded point set which are used in line 8 of Algorithm 1 for computing the approximate diameter . Then, we have two cases: either the two true points and are at the farthest distance from each other in their corresponding cells (Figure 3 (a)), or they are at the nearest distance from each other (Figure 3 (b)). All other cases lie between these two cases.
For the first case (Figure 3 (a)), we can rotate line such that the two points and are projected on line . Let these two projected points be and , and we set and . We know that the side length of each cell in the grid which is used for the point set is . So, the hypercube (cell) diagonal is . From Figure 3 (a) it can be seen that and . Therefore, we have
(5) |
Similarly, for the second case (Figure 3 (b)), we can project two points and on line . Let these two projected points be and . We know that and . Therefore, we have
(6) |
Then, from (5) and (6) we can conclude:
(7) |
Since we know that , we have:
(8) |
Now, we can simplify (8) as follows:
(9) |
We know that . For this reason, we can conclude:
(10) |
Finally, if we assume that , we have:
(11) |
Therefore, the theorem is proven. ∎
2.2 The modified algorithm
In this subsection, we present a modified version of our proposed algorithm by combining it with a recursive approach due to Chan [17]. Hence, we first explain Chan’s recursive approach and then use it in a phase of our proposed algorithm. As mentioned before, Agarwal et al. [12] proposed a -approximation algorithm for computing the diameter of a set of points in with running time by projecting on directions. In fact, they found a small set of directions which approximates all directions well. This can be done by forming unit vectors from the origin to the grid-points of a uniform grid on a unit sphere [12], or to the grid-points on the boundary of a box [18]. These sets of directions have cardinality . The following observation explains how we can find these directions on the boundary of a box.
Observation 3.
([18]) Consider a box which includes the origin such that the boundary of this box () is at distance at least 1 from the origin. For a -grid on and for each vector , there is a grid point on such that the angle between the two vectors and is at most .
Proof.
By scaling, we may assume that . Since in a -grid the diagonal of each cell is , there is a grid point such that: . Then, we have:
(12) |
This yields the observation, because: . ∎
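The direction set of Observation 3, together with the projection idea of Agarwal et al. [12] recalled at the start of this subsection, can be sketched in Python as follows. The box is assumed to be the cube with corners at plus and minus 1 in every coordinate, `delta` is a hypothetical grid spacing on its faces, and the function names `box_directions` and `projected_diameter` are invented for this example. Only the faces with one coordinate fixed to +1 are generated, since a direction and its negation give the same projected extent.

```python
import math
from itertools import product


def box_directions(d, delta):
    """Unit vectors from the origin to grid points on faces of [-1, 1]^d."""
    k = math.ceil(2.0 / delta)  # actual spacing 2 / k is at most delta
    ticks = [-1.0 + 2.0 * i / k for i in range(k + 1)]
    dirs = []
    for axis in range(d):
        for rest in product(ticks, repeat=d - 1):
            # Grid point on the face where coordinate `axis` equals +1.
            g = list(rest[:axis]) + [1.0] + list(rest[axis:])
            norm = math.sqrt(sum(c * c for c in g))
            dirs.append(tuple(c / norm for c in g))
    return dirs


def projected_diameter(points, dirs):
    """Largest extent of the projected point set over all given directions."""
    best = 0.0
    for u in dirs:
        proj = [sum(a * b for a, b in zip(p, u)) for p in points]
        best = max(best, max(proj) - min(proj))
    return best
```

Each direction costs one pass over the points, so the total work is the number of directions times the number of points. The returned value never exceeds the true diameter, because the projection of a segment onto a unit vector is never longer than the segment, and it approaches the true diameter as `delta` shrinks.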
This observation explains that the grid-points on the boundary of a box () form a set of unit vectors in such that for each , there is a vector from the origin to a grid-point on , where the angle between the two vectors and is at most . On the other hand, according to Observation 3, there is a vector such that if is the angle between the two vectors and , then , and so . If is the projection of the vector on the vector , then:
(13) |
So, we have:
(14) |
This means that if the pair is the diametrical pair of a point set, then there is a vector such that the angle between the two vectors and is at most . See Figure 4. Then, the pair , which is the projection of the pair on the vector , is a -approximation of the true diametrical pair , and we have:
(15) |
In other words, we can project the point set on directions for all , and compute a -approximation of the diameter by finding the maximum diameter over all directions. We project points on directions. In addition, computing the extreme points in each direction takes time. Consequently, the algorithm of Agarwal et al. [12] computes a -approximation of the diameter in time. Chan [17] proposes that if we reduce the number of points from to by rounding to a grid and then apply the method of Agarwal et al. [12] to this rounded point set, we need time to compute the maximum diameter over all directions. Taking into account the time for rounding to a grid, this new approach takes time for computing a -approximation for the diameter of a set of points. Moreover, Chan [17] observed that the bottleneck of this approach is the large number of projection operations. Hence, he proposes projecting the points on a set of 2-dimensional unit vectors instead of -dimensional unit vectors to reduce the problem to -dimensional subproblems which can be solved recursively. Then, according to relation (14), for a vector , there is a vector such that:
(16) |
Since is a unit vector (), we have . Hence, we can rewrite the previous relation as follows:
(17) |
or
(18) |
where is the th coordinate of a point . We can expand (18) to:
(19) |
Now, define the projection . Then, we can rewrite relation (19) for each vector as follows:
(20) |
So, since we have for diametrical pair :
(21) |
Therefore, for finding a -approximation for the diameter of the point set , it is sufficient to approximate recursively the diameter of a projected point set over each of the vectors . Then, the maximum diametrical pair computed in the recursive calls is a -approximation to the diametrical pair. Now, let us reduce the number of points from to by rounding to a grid. Let us denote the time required for computing the diameter of points in -dimensional space by . Then, for a rounded point set on a grid with points, this approach breaks the problem into subproblems in a dimension. Hence, we have a recurrence . By assuming , we can rewrite the recurrence as:
(22) |
This can be solved to: . In this case, , so this recursion takes time. Taking into account the time spent for rounding to a grid at the beginning, Chan’s recursive approach computes a -approximation for the diameter of a set of points in time [17].
In the following, we use Chan’s recursive approach in a phase of our proposed algorithm and present a modified version of it with running time .
Algorithm 2: APPROXIMATE DIAMETER 2
Input: a set of points in and an error parameter .
Output: approximate diameter .
1: Compute the axis-parallel bounding box for a point set .
2: Find the length of the largest side in .
3: Set and .
4: Round each point of to its central-cell point in a -grid.
5: Round each point of to its nearest grid-point in a -grid.
6: Compute the diameter of the point set by brute force and, simultaneously, a list of the diametrical pairs , such that .
7: Find the points of which are in the two hypercubes and for each diametrical pair .
8: Compute , corresponding to each diametrical pair by using Chan’s [17] recursive approach and return the maximum value over all of them.
9: Output .
Now we analyze the running time and approximation factor of Algorithm 2.
Theorem 4.
A -approximation for the diameter of a set of n points in -dimensional Euclidean space can be computed in time, where .
Proof.
As can be seen, lines 1 to 5 of Algorithm 2 are the same as in Algorithm 1. We round to grids twice and obtain a point set in time. In this case, the number of points in the rounded point set is at most:
(23) |
This can be reduced to by keeping only the highest and lowest points among those which agree in their coordinates. So, for finding all diametrical pairs of the point set , we can use the quadratic brute-force algorithm with time. Then, for each diametrical pair , we compute two sets and which contain the points of the set that are inside the two hypercubes and , respectively. Moreover, the number of points in either of the two sets or is at most
(24) |
This can be reduced to by keeping only the highest and lowest points among those which agree in their coordinates. Now, for computing , we use Chan’s [17] recursive approach instead of the quadratic brute-force algorithm on the point set . Computing the diameter of a set of points using Chan’s recursive approach leads to the following recurrence based on relation (22): . By assuming , we can rewrite the recurrence as:
(25) |
This can be solved to: . In this case, , so this recursion takes time. Also, if we have more than one diametrical pair in the point set , then this step takes at most time. Therefore, we can write the time complexity of the algorithm as follows:
(26) |
Since is fixed, we have:
(27) |
In addition, Chan’s recursive approach in line 8 of the Algorithm 2 returns a diametrical pair which is a -approximation for the diametrical pair . This means that:
(28) |
Moreover, the diametrical pair is an approximation of the true diametrical pair , and according to relation (10), we have:
(29) |
Hence, from (28) and (29) we can conclude:
(30) |
So, Algorithm 2 finds a -approximation of the diameter of a point set of points in time. Therefore, this completes the proof. ∎
3 Conclusion
We have presented a new -approximation algorithm to compute the diameter of a point set of points in for a fixed dimension in time, where . Moreover, we show that the proposed algorithm can be modified to a -approximation algorithm with time. Our proposed algorithms provide some improvements in comparison with existing algorithms in terms of simplicity, understanding and data structure.
References
- [1] Preparata, F. P., Shamos, M. I.: Computational Geometry: an Introduction. New York, Springer-Verlag, pp. 176-182, (1985)
- [2] Yao, A. C.: On constructing minimum spanning trees in -dimensional spaces and related problems. SIAM Journal on Computing, 11, pp. 721-736, (1982)
- [3] Clarkson, K. L., Shor, P. W.: Applications of random sampling in computational geometry. Discrete and Computational Geometry, 4, pp. 387-421, (1989)
- [4] Amato, N. M., Goodrich, M. T., Ramos, E. A.: Parallel algorithms for higher dimensional convex hulls. In Proceedings of the 35th annual Symposium on Foundations of Computer Science, pp. 683-694, (1994)
- [5] Ramos, E. A.: Intersection of unit-balls and diameter of a point set in . Computational Geometry: Theory and Applications, 8, pp. 57-65, (1997)
- [6] Ramos, E. A.: Construction of 1-d lower envelopes and applications. In Proceedings of the 13th annual ACM Symposium on Computational Geometry (SoCG’97), pp. 57-66, (1997)
- [7] Bespamyatnikh, S.: An efficient algorithm for the three-dimensional diameter problem. Discrete and Computational Geometry, 25(2), pp. 235-255, (2001)
- [8] Ramos, E. A.: Deterministic algorithms for 3-D diameter and some 2-D lower envelopes. In Proceedings of the 16th annual Symposium on Computational Geometry (SoCG’00), (2000)
- [9] Ramos, E. A.: An optimal deterministic algorithm for computing the diameter of a three-dimensional point set. Discrete and Computational Geometry, 26, pp. 245-265, (2001)
- [10] Cheong, O., Shin, C. S., Vigneron, A.: Computing farthest neighbors on a convex polytope. In Proceedings of the 7th Annual International Computational and Combinatoric Conference, pp. 159-169, (2001)
- [11] Egecioglu, O., Kalantari, B.: Approximating the diameter of a set of points in the Euclidean space. Information Processing Letters, 32(4), pp. 205-211, (1989)
- [12] Agarwal, P. K., Matousek, J., Suri, S.: Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Computational Geometry: Theory and Applications, 1, pp. 189-201, (1992)
- [13] Barequet, G., Har-Peled, S.: Efficiently approximating the minimum-volume bounding box of a point set in three dimensions. Journal of Algorithms, 38, pp. 91-109, (2001)
- [14] Har-Peled, S.: A practical approach for computing the diameter of a point set. In Proceedings of the 17th annual Symposium on Computational Geometry (SoCG’01), pp. 177-186, (2001)
- [15] Malandain, G., Boissonnat, J. D.: Computing the diameter of a point set. International Journal of Computational Geometry and Applications, 12(6), pp. 489-509, (2002)
- [16] Finocchiaro, D. V., Pellegrini, M.: On computing the diameter of a point set in high dimensional Euclidean space. Theoretical Computer Science, 287, pp. 501-514, (2002)
- [17] Chan, T. M.: Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus. International Journal of Computational Geometry and Applications, pp. 67-85, (2002)
- [18] Chan, T. M.: Faster core-set constructions and data stream algorithms in fixed dimensions. Computational Geometry: Theory and Applications, 35, pp. 20-35, (2006)
- [19] Chan, T. M.: Applications of Chebyshev polynomials to low-dimensional computational geometry. In Proceedings of the 33rd International Symposium on Computational Geometry (SoCG’17), pp. 26:1-15, (2017)
- [20] Arya, S., da Fonseca, G. D., Mount, D. M.: Near-optimal -kernel construction and related problems. In Proceedings of the 33rd International Symposium on Computational Geometry (SoCG’17), pp. 10:1-15, (2017)