Dror Aiger, Google, aigerd@google.com
Haim Kaplan, School of Computer Science, Tel Aviv University, Tel Aviv, and Google, haimk@tau.ac.il. Partially supported by ISF grant 1841/14, by grant 1367/2016 from the German-Israeli Science Foundation (GIF), and by the Blavatnik Research Fund in Computer Science at Tel Aviv University.
Efi Kokiopoulou, Google, efi@google.com
Micha Sharir, School of Computer Science, Tel Aviv University, Tel Aviv, Israel, michas@tau.ac.il. Partially supported by ISF Grant 260/18, by grant 1367/2016 from the German-Israeli Science Foundation (GIF), and by the Blavatnik Research Fund in Computer Science at Tel Aviv University.
Bernhard Zeisl, Google, bzeisl@google.com
General techniques for approximate incidences and their application to the camera posing problem
Abstract.
We consider the classical camera pose estimation problem that arises in many computer vision applications, in which we are given 2D-3D correspondences between points in the scene and points in the camera image (some of which are incorrect associations), and we aim to determine the camera pose (the position and orientation of the camera in the scene) from this data. We demonstrate that this posing problem can be reduced to the problem of computing ε-approximate incidences between two-dimensional surfaces (derived from the input correspondences) and points (on a grid) in a four-dimensional pose space. Similar reductions can be applied to other camera pose problems, as well as to similar problems in related application areas.
We describe and analyze three techniques for solving the resulting ε-approximate incidences problem in the context of our camera posing application. The first is a straightforward assignment of surfaces to the cells of a grid (of side length proportional to ε) that they intersect. The second is a variant of a primal-dual technique, recently introduced by a subset of the authors [2] for different (and simpler) applications. The third is a non-trivial generalization of a data structure of Fonseca and Mount [3], originally designed for the case of hyperplanes. We present and analyze this technique in full generality, and then apply it to the camera posing problem at hand.
We compare our methods experimentally on real and synthetic data. Our experiments show that for typical values of n and ε, the primal-dual method is the fastest, also in practice.
Key words and phrases:
Camera positioning, Approximate incidences, Incidences
1991 Mathematics Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
1. Introduction
Camera pose estimation is a fundamental problem in computer vision, which aims at determining the position and orientation of a camera solely from an image. This localization problem appears in many interesting real-world applications, such as the navigation of self-driving cars [5], incremental environment mapping such as Structure-from-Motion (SfM) [1, 11, 13], and augmented reality [8, 9, 14], in all of which a significant component is an algorithm that estimates an accurate camera pose in the world from image data.
Given a three-dimensional point-cloud model of a scene, the classical, but also state-of-the-art, approach to absolute camera pose estimation consists of a two-step procedure. First, one matches a large number of features in the two-dimensional camera image with corresponding features in the three-dimensional scene. Then one uses these putative correspondences to determine the position and orientation of the camera. Typically, the matches obtained in the first step contain many incorrect associations, forcing the second step to use filtering techniques to reject incorrect matches. Subsequently, the absolute 6 degrees-of-freedom (DoF) camera pose is estimated, for example, with a perspective three-point pose solver [6] within a RANSAC scheme [4].
In this work we concentrate on the second step of the camera pose problem. That is, we consider the task of estimating the camera pose and orientation from a (potentially large) set of already calculated image-to-scene correspondences.
Further, we assume that we are given a common direction between the world and camera frames. For example, inertial sensors, available on any smartphone nowadays, allow one to estimate the vertical gravity direction in the three-dimensional camera coordinate system. This alignment of the vertical direction fixes two degrees of freedom of the rotation between the frames, and we are left to estimate four degrees of freedom out of the general six. To obtain four equations (in the four remaining degrees of freedom), this setup requires two image-to-scene correspondence pairs for a minimal solver (as we will see later in detail, each correspondence imposes two constraints on the camera pose). Hence a corresponding naive RANSAC-based scheme requires a quadratic number of filtering steps, where in each iteration a pose hypothesis based on a different pair of correspondences is computed and verified against all other correspondences; a schematic sketch of this baseline is given below.
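For concreteness, here is a minimal Python sketch of that baseline scheme; `solve_pose_from_two_matches` and `reprojection_error` are hypothetical placeholders for the minimal solver and the verification test, not part of the method developed in this paper.

```python
import itertools

def baseline_two_point_ransac(correspondences, solve_pose_from_two_matches,
                              reprojection_error, tol):
    """Schematic baseline: try every pair of 2D-3D matches as a minimal
    sample, and verify the resulting pose hypothesis against all matches."""
    best_pose, best_inliers = None, -1
    for m1, m2 in itertools.combinations(correspondences, 2):   # quadratically many pairs
        pose = solve_pose_from_two_matches(m1, m2)               # 4-DoF minimal solver
        if pose is None:
            continue
        inliers = sum(1 for m in correspondences
                      if reprojection_error(pose, m) <= tol)     # linear-time verification
        if inliers > best_inliers:
            best_pose, best_inliers = pose, inliers
    return best_pose, best_inliers
```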
Recently, Zeisl et al. [17] proposed a Hough-voting inspired outlier filtering and camera posing approach, which computes the camera pose up to an accuracy of ε from a set of n 2D-3D correspondences, under the same alignment assumption on the vertical direction. In this paper we propose new algorithms that work considerably faster in practice, and under milder assumptions. Our method is based on a reduction of the problem to a problem of counting ε-approximate incidences between points and surfaces, where a point p is ε-approximately incident (or just ε-incident) to a surface σ if the (suitably defined) distance between p and σ is at most ε. This notion has recently been introduced by a subset of the authors in [2], and applied in a variety of instances, involving somewhat simpler scenarios than the one considered here. Our approach enables us to compute a camera pose when the number of correspondences is large, and many of them are expected to be outliers. In contrast, a direct application of RANSAC-based methods to such inputs is very slow, since the fraction of inliers is small. In the limit, trying all pairs of matches involves Θ(n²) RANSAC iterations. Moreover, our methods enhance the quality of the posing considerably [17], since each generated candidate pose is close to (i.e., consistent with) many of the correspondences.
Our results. We formalize the four degree-of-freedom camera pose problem as an approximate incidences problem in Section 2. Each 2D-3D correspondence is represented as a two-dimensional surface in the 4-dimensional pose-space, which is the locus of all possible positions and orientations of the camera that fit the correspondence exactly. Ideally, we would like to find a point (a pose) that lies on as many surfaces as possible, but since we expect the data to be noisy, and the exact problem is inefficient to solve anyway, we settle for an approximate version, in which we seek a point with a large number of approximate incidences with the surfaces.
Formally, we solve the following problem. We have an error parameter ε; we lay down a grid on the pose space of side length ε, and compute, for each vertex v of the grid, a count N(v) of surfaces that are approximately incident to v, so that (i) every surface that is ε-incident to v is counted in N(v), and (ii) every surface that is counted in N(v) is cε-incident to v, for some small constant c (but not all cε-incident surfaces are necessarily counted). We output the grid vertex with the largest count (or a list of vertices with the highest counts, if so desired).
As we will comment later, (a) restricting the algorithm to grid vertices only does not miss a good pose p: a vertex of the grid cell containing p serves as a good substitute for p, and (b) we have no real control over the value of N(v), which might be much larger than the number of surfaces that are ε-incident to v, but all the surfaces that we count are 'good': they are reasonably close to v. In the computer vision application, and in many related applications, neither of these issues is significant.
We give three algorithms for this camera-pose approximate-incidences problem. The first algorithm simply computes the grid cells that each surface intersects, and considers the number of intersecting surfaces per cell as its approximate ε-incidences count. Since each constant-degree 2-surface crosses roughly 1/ε² cells of the grid, which itself has roughly 1/ε⁴ cells, this method takes time roughly proportional to n/ε² + 1/ε⁴ for all vertices of the grid. We then describe a faster algorithm using geometric duality, in Section 3. It uses a coarser grid in the primal space and switches to a dual 5-dimensional space (a 5-tuple is needed to specify a 2D-3D correspondence and its surface, now dualized to a point). In the dual space each query (i.e., a vertex of the grid) becomes a 3-dimensional surface, and each original 2-dimensional surface in the primal 4-dimensional space becomes a point. This algorithm is asymptotically faster than the simple algorithm in a suitable range of n and ε.
Finally, we give a general method for constructing an approximate incidences data structure for general k-dimensional algebraic surfaces (that satisfy certain mild conditions) in d-space, in Section 4. It extends the technique of Fonseca and Mount [3], designed for the case of hyperplanes, and takes time linear in n plus a polynomial in 1/ε, where the degree of the polynomial depends on the number of parameters needed to specify a surface, the dimension of the surfaces, and the dimension of the ambient space. We first present and analyze this technique in full generality, and then apply it to the surfaces obtained for our camera posing problem. In this case, the data structure requires storage of the same form and is constructed in roughly the same time. This is asymptotically faster than our primal-dual scheme in a complementary range of the parameters (in the remaining range the dominant terms coincide and the two methods are asymptotically the same). Due to its generality, the latter technique is easily adapted to other surfaces and is thus of general interest and potential. In contrast, the primal-dual method requires nontrivial adaptation as it switches from one approximate-incidences problem to another, and the dual space and its distance function depend on the type of the input surfaces.
We implemented our algorithms and compared their performance on real and synthetic data. Our experimentation shows that, for values of n and ε that are common in practical scenarios, the primal-dual scheme is considerably faster than the other algorithms, and should thus be the method of choice. Due to lack of space, the experimentation details are omitted in this version, with the exception of a few highlights; they can be found in the appendix.
2. From camera positioning to approximate incidences
Suppose we are given a pre-computed three-dimensional scene and a two-dimensional picture of it. Our goal is to deduce from this image the location and orientation of the camera in the scene. In general, the camera, as a rigid body in 3-space, has six degrees of freedom, three of translation and three of rotation (commonly referred to as the yaw, pitch and roll). We simplify the problem by making the realistic assumption that the vertical direction of the scene is known in the camera coordinate frame (e.g., estimated by an inertial sensor on smartphones). This allows us to rotate the camera coordinate frame so that its vertical axis is parallel to the world vertical axis, thereby fixing the pitch and roll of the camera and leaving only four degrees of freedom (x, y, z, θ), where (x, y, z) is the location of the camera center and θ is its yaw, i.e., the horizontal orientation of the optical axis around the vertical direction. See Figure 1.

By preprocessing the scene, we record the spatial coordinates of a discrete (large) set of salient points. We assume that some (ideally a large number) of these distinguished points are identified in the camera image, resulting in a set of image-to-scene correspondences. Each correspondence is parameterized by five parameters: the spatial position of the salient point in the scene and its position in the camera's plane of view. Our goal is to find a camera pose so that as many correspondences as possible are (approximately) consistent with it, i.e., the ray from the camera center to the scene point passes approximately through its corresponding image point in the image plane, when the yaw of the camera is the one given by the pose.
2.1. Camera posing as an ε-incidences problem
Each correspondence and its 5-tuple of parameters define a two-dimensional surface in the parametric 4-space of camera poses, which is the locus of all poses of the camera at which it sees the scene point at the given coordinates in its image. For n correspondences, we have a set of n such surfaces. We prove that each point in the parametric 4-space of camera poses that is close to a surface, in a suitable metric defined in that space, represents a camera pose at which the scene point is projected to a point in the camera viewing plane that is close to its observed image point, and vice versa (see Section 2.2 for the actual expressions for these projections). Therefore, a point in pose space that is close to a large number of surfaces represents a camera pose with many approximately consistent correspondences, which is a strong indication of being close to the correct pose.
Extending the notation used in the earlier work [2], we say that a point p is ε-incident to a surface σ if the distance between p and σ is at most ε. Our algorithms approximate, for each vertex of a grid of side length ε, the number of ε-incident surfaces, and suggest the vertex with the largest count as the best candidate for the camera pose. This work extends the approximate incidences methodology in [2] to the (considerably more involved) case at hand.
2.2. The surfaces
Let be a salient point in , and assume that the camera is positioned at . We represent the orientation of the vector , within the world frame, by its spherical coordinates , except that, unlike the standard convention, we take to be the angle with the -plane (rather than with the -axis):
In the two-dimensional frame of the camera the -coordinates model the view of , which differs from above polar representation of the vector only by the polar orientation of the viewing plane itself. Writing for , we have
(1) |
We note that using does not distinguish between and , but we will restrict to lie in or in similar narrower ranges, thereby resolving this issue.
We use with coordinates as our primal space, where each point models a possible pose of the camera. Each correspondence is parameterized by the triple , and defines a two-dimensional algebraic surface of degree at most , whose equations (in ) are given in (1). It is the locus of all camera poses at which it sees at image coordinates . We can rewrite these equations into the following parametric representation of , expressing and as functions of and :
(2) |
For a camera pose , and a point , we write
(3) |
In this notation we can write the Equations (1) characterizing (when regarded as equations in ) as and .
2.3. Measuring proximity
Given a guessed pose of the camera, we want to measure how well it fits the scene that the camera sees. For this, given a correspondence, we define the frame distance between the pose and the correspondence as the distance between the observed image coordinates of the scene point and the image coordinates at which the camera, placed at the guessed pose, would see it, as given in Eq. (3). That is,
(4) |
Note that these are the coordinates at which the camera would see the scene point if it were placed at the guessed pose, so the frame distance is the distance between these coordinates and the actual coordinates at which the camera sees the point; this serves as a natural measure of how close the guessed pose is to the actual pose of the camera.
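As an illustration, the following sketch evaluates the frame distance of a pose to a single correspondence, assuming a hypothetical helper `project_to_image` that evaluates the projection equations (1), and assuming the ℓ∞ norm for the distance between image coordinates.

```python
def frame_distance(pose, correspondence, project_to_image):
    """Frame distance of a pose to one 2D-3D correspondence: how far the
    predicted image location of the scene point is from its observed one.
    `project_to_image(pose, scene_point)` is a hypothetical helper that
    evaluates the projection equations (1); the norm is assumed to be l_inf."""
    scene_point, observed = correspondence           # (a, b, c), (xi, eta)
    predicted = project_to_image(pose, scene_point)  # predicted (xi, eta) at this pose
    return max(abs(predicted[0] - observed[0]), abs(predicted[1] - observed[1]))
```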
We are given a viewed scene of n distinguished points (correspondences), and we let S denote the set of the n surfaces representing these correspondences. We assume that the salient features and the camera are all located within some bounded region, say the unit cube. The replacement of the yaw by its tangent makes its range unbounded, so we break the problem into four subproblems, in each of which the yaw is confined to one quadrant. We only consider here the first subproblem; the treatment of the others is fully analogous. In each such range, replacing the yaw by its tangent does not incur the ambiguity of identifying an angle with its antipodal angle.
Given an error parameter ε, we seek an approximate pose of the camera, at which many correspondences are within frame distance at most ε from it, as given in (4).
The following two lemmas relate our frame distance to the Euclidean distance. Their (rather technical) proofs are given in the appendix.
Lemma 2.1.
Let , and let be the surface associated with a correspondence . Let be a point on such that (where denotes the Euclidean norm). If
(i) , and
(ii) , for some absolute constant ,
then for some constant that depends on .
Informally, Condition (i) requires that the absolute value of the coordinate of the position of in the viewing plane, with the camera positioned at , is not too large (i.e., that is not too close to ). We can ensure this property by restricting the camera image to some suitably bounded -range.
Similarly, Condition (ii) requires that the -projection of the vector is not too small. It can be violated in two scenarios. Either we look at a data point that is too close to , or we see it looking too much ‘upwards’ or ‘downwards’. We can ensure that the latter situation does not arise, by restricting the camera image, as in the preceding paragraph, to some suitably bounded -range too. That done, we ensure that the former situation does not arise by requiring that the physical distance between and be at least some multiple of .
The next lemma establishes the converse connection.
Lemma 2.2.
Let be a camera pose and a correspondence, such that . Assume that , for some absolute constant , and consider the point where (see Eq. (2))
Then and , for some constant , again depending on .
Informally, the condition means that the orientation of the camera, when it is positioned at and sees at coordinate of the viewing plane is not too close to . This is a somewhat artificial constraint that is satisfied by our restriction on the allowed yaws of the camera (the range of ).
A simple algorithm. Using Lemma 2.2 and Lemma 2.1 we can derive a simple naive solution that does not require any of the sophisticated machinery developed in this work. We construct a grid over the pose space, whose cells have dimensions proportional to ε, with constants of proportionality that come from Lemma 2.2. We use this non-square grid since we want to find ε-approximate incidences in terms of frame distance. For each cell of the grid we compute the number of surfaces that intersect it; this gives an approximate incidences count for the center of the cell. Further details and a precise statement can be found in the appendix.
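The following sketch summarizes this simple algorithm; `cells_crossed_by` is a hypothetical enumerator of the grid cells that a given surface intersects (in our setting, a constant-degree 2-surface crosses only a small number of cells).

```python
from collections import defaultdict

def simple_grid_counts(surfaces, cells_crossed_by):
    """Simple algorithm sketch: bucket each surface into the grid cells it
    crosses and report the cell with the largest count.  The count of a cell
    is the approximate eps-incidences count of its center."""
    counts = defaultdict(int)
    for sigma in surfaces:
        for cell in cells_crossed_by(sigma):
            counts[cell] += 1
    best_cell = max(counts, key=counts.get, default=None)
    return best_cell, counts
```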
3. Primal-dual algorithm for geometric proximity
Following the general approach in [2], we use a suitable duality, with some care. We write the coarse grid resolution in terms of suitable parameters, whose concrete values are fixed later, and apply the decomposition scheme developed in [2], tailored to the case at hand. Specifically, we consider a coarser grid in the primal space, with cell dimensions proportional to this coarser resolution (the constant of proportionality is the one from Lemma 2.2), that tiles up the domain of possible camera positions. For each cell of this grid, we collect the set of surfaces that cross either the cell or one of the eight cells adjacent to it in the designated directions. (The choice of these directions is arbitrary, but it is natural for the analysis, given in the appendix.) The duality is illustrated in Figure 2.

We discretize the set of all possible positions of the camera by the vertices of the finer grid , defined as , with replacing , that tiles up . The number of these candidate positions is . For each vertex , we want to approximate the number of surfaces that are -incident to , and output the vertex with the largest count as the best candidate for the position of the camera. Let be the subset of contained in . We ensure that the boxes of are pairwise disjoint by making them half open, in the sense that if is the vertex of a box that has the smallest coordinates, then the box is defined by , , , . This makes the sets pairwise disjoint as well. Put and . We have for each . Since the surfaces are two-dimensional algebraic surfaces of constant degree, each of them crosses cells of , so we have .
We now pass to the dual five-dimensional space. Each point in that space represents a correspondence . We use the first three components as the first three coordinates, but modify the - and -coordinates in a manner that depends on the primal cell . Let be the midpoint of the primal box . For each we map , where , to the point , where and , with and as given in (3). We have
Corollary 3.1.
If crosses then , for some absolute constant , provided that the following two properties hold, for some absolute constant (the constant depends on ).
(i) , and
(ii) , where are the -coordinates of the center of .
Proof 3.2.
We take the provided by Corollary 3.1 as the in the definition of and . We map each point to the dual surface . Using (3), we have
By Corollary 3.1, the points , for the surfaces that cross , lie in the region . We partition into a grid of small congruent boxes, each of dimensions .
Exactly as in the primal setup, we make each of these boxes half-open, thereby making the sets of dual vertices in the smaller boxes pairwise disjoint. We assign to each of these dual cells the set of dual points that lie in , and the set of the dual surfaces that cross either or one of the eight cells adjacent to in the -directions. Put and . Since the dual cells are pairwise disjoint, we have . Since the dual surfaces are three-dimensional algebraic surfaces of constant degree, each of them crosses grid cells, so .
We compute, for each dual surface , the sum , over the dual cells that are either crossed by or that one of their adjacent cells in the -directions is crossed by . We output the vertex of with the largest resulting count, over all primal cells .
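The overall primal-dual scheme can be summarized by the following skeleton, in which the geometric primitives (collecting the surfaces relevant to a coarse cell, mapping a surface to the small dual box containing its dual point, and enumerating the dual boxes crossed by the dual surface of a fine-grid vertex) are left as hypothetical callbacks rather than implemented here.

```python
from collections import defaultdict

def primal_dual_counts(coarse_cells, vertices_in, surfaces_crossing,
                       dual_box_of_surface, dual_boxes_of_vertex):
    """Skeleton of the primal-dual counting scheme (a sketch, not the full method).
    For each coarse primal cell tau: dualize the relevant surfaces to points and
    bucket them into small dual boxes; then, for each fine-grid vertex in tau,
    sum the bucket weights over the dual boxes crossed by its dual surface."""
    best_vertex, best_count = None, -1
    for tau in coarse_cells:
        points_in_box = defaultdict(int)
        for sigma in surfaces_crossing(tau):              # surfaces crossing tau or a neighbor
            points_in_box[dual_box_of_surface(sigma, tau)] += 1
        for v in vertices_in(tau):                        # fine-grid vertices inside tau
            count = sum(points_in_box[b] for b in dual_boxes_of_vertex(v, tau))
            if count > best_count:
                best_vertex, best_count = v, count
    return best_vertex, best_count
```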
The following theorem establishes the correctness of our technique. Its proof is given in Appendix B.
Theorem 3.3.
Suppose that for every cell and for every point and every such that intersects either or one of its adjacent cells in the -directions, we have that, for some absolute constant ,
(i) ,
(ii) , and
(iii) .
Then (a) For each , every pair at frame distance is counted (as an -incidence of ) by the algorithm. (b) For each , every pair that we count lies at frame distance , for some constant depending on .
3.1. Running time analysis
The cost of the algorithm is clearly proportional to over all primal cells and the dual cells associated with each cell . We have
Optimizing the choice of and , we choose and . These choices make sense as long as each of , lies between and . That is, and , or , where and are absolute constants (that depend on ).
If , we use only the primal setup, taking (for the primal subdivision). The cost is then Similarly, if , we use only the dual setup, taking and , and the cost is thus Adding everything together, to cover all three subranges, the running time is then Substituting , we get a running time of The first term dominates when and . In conclusion, we have the following result.
Theorem 3.4.
Given data points that are seen (and identified) in a two-dimensional image taken by a vertically positioned camera, and an error parameter , where the viewed points satisfy the assumptions made in Theorem 3.3, we can compute, in time, a vertex of that maximizes the approximate count of -incident correspondences, where “approximate” means that every correspondence whose surface is at frame distance at most from is counted and every correspondence that we count lies at frame distance at most from , for some fixed constant .
Restricting ourselves only to grid vertices does not really miss any solution. We only lose a bit in the quality of approximation, replacing ε by a slightly larger constant multiple thereof, when we move from the best solution to a vertex of its grid cell.
4. Geometric proximity via canonical surfaces
In this section we present a general technique to preprocess a set of algebraic surfaces into a data structure that can answer approximate incidences queries. In this technique we round the original surfaces into a set of canonical surfaces, whose size depends only on , such that each original surface has a canonical surface that is “close” to it. Then we build an octree-based data structure for approximate incidences queries with respect to the canonical surfaces. However, to reduce the number of intersections between the cells of the octree and the surfaces, we further reduce the number of surfaces as we go from one level of the octree to the next, by rounding them in a coarser manner into a smaller set of surfaces.
This technique has been introduced by Fonseca and Mount [3] for the case of hyperplanes. We describe as a warmup step, in Section C of the appendix, our interpretation of their technique applied to hyperplanes. We then extend here the technique to general surfaces, and apply it to the specific instance of 2-surfaces in 4-space that arise in the camera pose problem.
We have a set of k-dimensional surfaces in d-space that cross the unit cube, and a given error parameter ε. We assume that each surface is given in parametric form, where the first k coordinates are the parameters, so its equations are
Moreover, we assume that each is defined in terms of essential parameters , and additional free additive parameters , one free parameter for each dependent coordinate. Concretely, we assume that the equations defining the surface , parameterized by and (we then denote as ), are
For each equation of the surface that does not have a free parameter in the original expression, we introduce an artificial free parameter, and initialize its value to . (We need this separation into essential and free parameters for technical reasons that will become clear later.) We assume that (resp., ) varies over (resp., ).
Remark. The distinction between free and essential parameters seems to be artificial, but yet free parameters do arise in certain basic cases, such as the case of hyperplanes discussed in Section C of the appendix. In the case of our 2-surfaces in 4-space, the parameter is free, and we introduce a second artificial free parameter into the equation for . The number of essential parameters is (they are ,,, and ).
We assume that the functions are all continuous and differentiable, in all of their dependent variables , and (this is a trivial assumption for ), and that they satisfy the following two conditions.
(i) Bounded gradients. for each , for any and any , where is some absolute constant. Here (resp., ) means the gradient with respect to only the variables (resp., ).
(ii) Lipschitz gradients. for each , for any and any , , where is some absolute constant. This assumption is implied by the assumption that all the eigenvalues of the mixed part of the Hessian matrix have absolute value bounded by .
4.1. Canonizing the input surfaces
We first replace each surface by a canonical “nearby” surface . Let where is the constant from Condition (ii). We get from (resp., from ) by rounding each coordinate in the essential parametric domain (resp., in the parametric domain ) to a multiple of . Note that each of the artificial free parameters (those that did not exist in the original equations) has the initial value for all surfaces, and remains in the rounded surfaces. We get canonical rounded surfaces, where is the number of original parameters, that is, the number of essential parameters plus the number of non-artificial free parameters; in the worst case we have .
For a surface and its rounded version we have, for each ,
where is some intermediate value, which is irrelevant due to Condition (i).
We will use the -norm of the difference vector as the measure of proximity between the surfaces and at , and denote it as . The maximum measures the global proximity of the two surfaces. (Note that it is an upper bound on the Hausdorff distance between the two surfaces.) We thus have when is the canonical surface approximating .
We define the weight of each canonical surface to be the number of original surfaces that got rounded to it; the set of all canonical surfaces, with these weights, is the input to the next phase.
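A minimal sketch of this canonization step, representing each surface simply by the tuple of parameters that defines it and assuming the rounding step δ is given:

```python
from collections import Counter

def canonize(surface_params, delta):
    """Round each surface's defining parameters to the nearest integer multiple
    of delta, and record how many original surfaces map to each canonical one
    (this count is the weight of the canonical surface)."""
    weights = Counter()
    for params in surface_params:
        canonical = tuple(round(p / delta) * delta for p in params)
        weights[canonical] += 1
    return weights   # canonical parameter tuple -> weight
```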
4.2. Approximately counting -incidences
We describe an algorithm for approximating the ε-incidences counts between the surfaces in the canonical set and the vertices of a grid of side length ε.
We construct an octree decomposition of the unit cube, all the way down to subcubes of side length ε, such that each vertex of the grid is the center of a leaf-cube. We propagate the canonical surfaces down this octree, further rounding each of them within each subcube that it crosses.
The root of the octree corresponds to , and we set . At level of the recursion, we have subcubes of of side length . For each such , we set to be the subset of the surfaces in (that have been produced at the parent cube of ) that intersect . We now show how to further round the surfaces of , so as to get a coarser set of surfaces that we associate with , and that we process recursively within .
At any node at level of our rounding process, each surface of is of the form , for where , and .
(a) For each the function is a translation of . That is for some constant . Thus the gradients of also satisfy Conditions (i) and (ii).
(b) is some vector of essential parameters, and each coordinate of is an integer multiple of , where .
(c) is a vector of free parameters, each is a multiple of .
Note that the surfaces in , namely the set of initial canonical surfaces constructed in Section 4.1, are of this form (for and ). We get from by the following steps. The first step just changes the presentation of and , and the following steps do the actual rounding to obtain .
(1) Let be the point in of smallest coordinates and set . We rewrite the equations of each surface of as follows: , for , where , and , for . Note that in this reformulation we have not changed the essential parameters, but we did change the free parameters from to , where depends on , , , and . Note also that for .
(2) We replace the essential parameters of a surface by , which we obtain by rounding each coordinate of to the nearest integer multiple of . So the rounded surface has the equations , for . Note that we also have that , for .
(3) For each surface, we round each free parameter , , to an integral multiple of , and denote the rounded vector by . Our final equations for each rounded surface that we put in are for .
By construction, when and and and get rounded to the same vectors and then the corresponding two surfaces in get rounded to the same surface in . The weight of each surface in is the sum of the weights of the surfaces in that got rounded to it, which, by induction, is the number of original surfaces that are recursively rounded to it. In the next step of the recursion the ’s of the parametrization of the surfaces in are the functions defined above.
The total weight of the surfaces in the set associated with a leaf cell is the approximate ε-incidences count that we associate with the center of that cell.
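The recursive octree phase can be sketched as follows; here `cell` is assumed to expose its side length and its subdivision into children, while `crosses`, `reround`, and `report_leaf` are hypothetical helpers for the surface-cell intersection test, the coarser re-rounding of a surface relative to a child cell (Steps 1-3 above), and the output at a leaf, respectively.

```python
def octree_counts(cell, weighted_surfaces, eps, crosses, reround, report_leaf):
    """Propagate weighted rounded surfaces down the octree, re-rounding them
    more coarsely at each level and merging the weights of surfaces that become
    identical.  At a leaf (side length eps), the total weight is the approximate
    eps-incidences count of the leaf's center."""
    if cell.side <= eps:
        report_leaf(cell, sum(weighted_surfaces.values()))
        return
    for child in cell.subdivide():                      # 2^d children of the cell
        merged = {}
        for surf, weight in weighted_surfaces.items():
            if crosses(surf, child):
                key = reround(surf, child)              # coarser rounding within the child
                merged[key] = merged.get(key, 0) + weight
        octree_counts(child, merged, eps, crosses, reround, report_leaf)
```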
4.3. Error analysis
We now bound the error incurred by our discretization. We start with the following lemma, whose proof is given in Appendix A.
Lemma 4.1.
Let be a cell of the octree and let , for be a surface obtained in Step 1 of the rounding process described above. For any , for any , and for each , we have
(5) |
where is the constant of Condition (ii), and consists of the first coordinates of the point in of smallest coordinates.
Lemma 4.2.
For any , for any , , and for each , we have
(6) |
where is the constant of Condition (ii).
Proof 4.3.
We now bound the number of surfaces in . Since and each of its coordinates is a multiple of , we have at most different values for . To bound the number of possible values of , we prove the following lemma (see the appendix for the proof).
Lemma 4.4.
Let , for , be a surface in . For each , we have , where is the constant of Condition (i).
Lemma 4.4 implies that each , , has only possible values, for a total of at most possible values for . Combining the number of possible values for and , we get that the number of newly discretized surfaces in is
(7) |
It follows that each level of the recursive octree decomposition generates
re-discretized surfaces, where the first factor in the left-hand side expression is the number of cubes generated at this recursive level, and the second factor is the one in (7).
Summing over the recursive levels , where the cube size is at level , we get a total size of . We get different estimates for the sum according to the sign of . If the sum is . If the sum is . If the sum is Accordingly, the overall size of the structure, taking also into account the cost of the first phase, is
(8) |
The following theorem summarizes the result of this section. Its proof follows in a straightforward way from the preceding discussion and from Lemma 4.2, analogously to the proof of Lemma 4.4 in the appendix.
Theorem 4.5.
Let be a set of surfaces in that cross the unit cube , given parametrically as for , where the functions satisfy conditions (i) and (ii), and . Let be the -grid within . The algorithm described above reports for each vertex of an approximate -incidences count that includes all surfaces at distance at most from and may include some surfaces at distance at most from . The running time of this algorithm is proportional to the total number of rounded surfaces that it generates, which is given by Equation (8), plus an additive term for the initial canonization of the surfaces.
We can modify our data structure so that it can answer approximate or exact -incidence queries as we describe in Section C of the appendix for the case of hyperplanes.
5. Experimental Results
The goal of the experimental results is to show the practical relation between the naive, the primal-dual, and the general canonical-surfaces algorithms. It is not our intention to obtain the fastest possible code, but to obtain a platform for a fair comparison between the techniques. We have performed a preliminary experimental comparison using synthetic as well as real-world data. We focus on values of n and ε that are practical in real applications. Typically, we have several thousand to a few tens of thousands of 3D points, bounded by a rectangle of size 100-150 meters, and the uncertainty is around 3m (so the relative error ε is roughly 0.02-0.03). The three methods that we evaluate are the naive grid algorithm, the primal-dual algorithm, and the canonical-surfaces algorithm.
In all experiments we normalize the data, so that the camera position and the 3D points lie in the unit box, and the fourth parameter, representing the camera orientation, lies in a correspondingly normalized range.
5.1. Random synthetic data
Starting from a fixed known camera pose, we generate a set of uniformly sampled 3D points, which are projected onto the camera image plane using Eq. (1). To model outliers in the association process we use random projections for 90% of the points, resulting in an inlier ratio of 10%. We add Gaussian noise of zero mean and small standard deviation to the coordinates of each point. This provides us with 2D-3D correspondences that are used for estimating the camera pose. We apply the three algorithms above and measure the run-times, where each algorithm is tested for its ability to reach approximately the (known) solution. We remark that the actual implementation may be slowed down by the (constant) cost of some of its primitive operations, but it can also gain efficiency from certain practical heuristic improvements. For example, in contrast to the worst-case analysis, we could stop the recursion in the algorithm of Section 4, at any step of the octree expansion, whenever the maximum incidence count obtained so far is larger than the number of surfaces crossing a cell of the octree. The same applies to the primal-dual technique in the dual stage. On the other hand, finding whether or not a surface crosses a box in pose space takes at least the time to test for intersections of the surface with the 32 edges of the box, and this constant greatly affects the run-time. The worst-case bound for the canonical-surfaces algorithm is huge and has no effect in practice for this problem; for this reason, the overall number of surfaces that we have to consider in the recursion can be very large. The canonical-surfaces algorithm in our setting does not change much with n, because we are far from the regime where the second term dominates. We show in Figure 3 a comparison of the three algorithms.



The computed camera poses corresponding to Figure 3, obtained by the three algorithms for various problem sizes, are displayed in Table 1, compared to the known pose. The goal here is not to obtain the most accurate algorithm but to show that they are comparable in accuracy in this setting, so that the runtime comparison is fair.
n | x (N/PD/C) | y (N/PD/C) | z (N/PD/C) | θ (N/PD/C)
---|---|---|---|---|
8000 | 0.31/0.31/0.28 | 0.22/0.2/0.18 | 0.1/0.12/0.09 | 0.55/0.66/0.59 |
12000 | 0.31/0.33/0.28 | 0.22/0.17/0.19 | 0.1/0.1/0.1 | 0.55/0.65/0.6 |
24000 | 0.31/0.3/0.28 | 0.22/0.2/0.18 | 0.1/0.1/0.09 | 0.55/0.61/0.59 |
32000 | 0.31/0.27/0.28 | 0.22/0.2/0.19 | 0.1/0.08/0.09 | 0.55/0.57/0.59 |
True pose | 0.3 | 0.2 | 0.1 | 0.6 |
5.2. Real-world data
We evaluated the performance of the algorithms also on real-world datasets for which the true camera pose is known. The input is a set of correspondences, each represented by a 5-tuple consisting of the three coordinates of a salient feature in the scene and the two coordinates of its corresponding projection in the camera frame. We computed the camera pose from these matches using both the primal-dual and the naive algorithms and compared the poses to the true one. An example of the data we have used is shown in Figure 4.





We evaluated the runtime for different problem sizes and checked the correctness of the camera pose approximation as the size increased. To get different input sizes, we added random correspondences to a base set of actual correspondences. The number of random correspondences determines the input size, but also the fraction of good correspondences (percentage of inliers), which goes down as the input size increases (the fraction of inliers in real-world cases is typically 10%). We show the same plots as before in Figure 5 and Table 2.

n | x | y | z | θ
---|---|---|---|---|
2000 | 0.26 | 0.62 | 0.06 | 0.72 |
3626 | 0.36 | 0.56 | 0.12 | 0.52 |
5626 | 0.38 | 0.60 | 0.12 | 0.62 |
7626 | 0.40 | 0.64 | 0.08 | 0.57 |
11626 | 0.43 | 0.68 | 0.13 | 0.63 |
True pose | 0.37 | 0.59 | 0.06 | - |
6. Future work
We note that similar approaches can be applied for computing the relative pose [10] between two cameras (that look at the same scene), except that the pose estimation then uses 2D-2D matches between the two images (rather than 2D-3D image-to-model correspondences). Determining the relative motion between images is a prerequisite for stereo depth estimation [12], for multi-view geometry [7], and for the initialization of the view graph [15, 16] in SfM, and is therefore an equally important task in computer vision. In addition, in future work we want to also consider the case of a generalized or distributed camera setup, and likewise transform the camera posing problem into an ε-incidence problem.
References
- [1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building Rome in a day. Communications of the ACM, 54(10):105–112, 2011.
- [2] D. Aiger, H. Kaplan, and M. Sharir. Output sensitive algorithms for approximate incidences and their applications. In Computational Geometry, to appear. Also in European Symposium on Algorithms, volume 5, pages 1–13, 2017.
- [3] G. D. Da Fonseca and D. M. Mount. Approximate range searching: The absolute model. Computational Geometry, 43(4):434–444, 2010.
- [4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
- [5] C. Häne, L. Heng, G. H. Lee, F. Fraundorfer, P. Furgale, T. Sattler, and M. Pollefeys. 3d visual perception for self-driving cars using a multi-camera system: Calibration, mapping, localization, and obstacle detection. Image and Vision Computing, 68:14–27, 2017.
- [6] B. M. Haralick, C.-N. Lee, K. Ottenberg, and M. Nölle. Review and analysis of solutions of the three point perspective pose estimation problem. International Journal of Computer Vision, 13(3):331–356, 1994.
- [7] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge university press, 2003.
- [8] G. Klein and D. Murray. Parallel tracking and mapping for small ar workspaces. In ISMAR, pages 83–86. IEEE, 2009.
- [9] S. Middelberg, T. Sattler, O. Untzelmann, and L. Kobbelt. Scalable 6-dof localization on mobile devices. In European Conference on Computer Vision, pages 268–283. Springer, 2014.
- [10] D. Nistér. An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):756–770, 2004.
- [11] M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch. Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3):207–232, 2004.
- [12] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3):7–42, 2002.
- [13] J. L. Schonberger and J.-M. Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
- [14] C. Sweeney, J. Flynn, B. Nuernberger, M. Turk, and T. Höllerer. Efficient computation of absolute pose for gravity-aware augmented reality. In ISMAR, pages 19–24. IEEE, 2015.
- [15] C. Sweeney, T. Sattler, T. Hollerer, M. Turk, and M. Pollefeys. Optimizing the viewing graph for structure-from-motion. In Proceedings of the IEEE International Conference on Computer Vision, pages 801–809, 2015.
- [16] C. Zach, M. Klopschitz, and M. Pollefeys. Disambiguating visual relations using loop constraints. In Computer Vision and Pattern Recognition, pages 1426–1433. IEEE, 2010.
- [17] B. Zeisl, T. Sattler, and M. Pollefeys. Camera pose voting for large-scale image-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2704–2712, 2015.
Appendix A Omitted proofs
Proof (Proof of Lemma 2.1:).
Since we have that , . We want to show that for some constant that depends on .
Regarding and as functions of , we compute their gradients as follows.
and
Conditions (i) and (ii) in the lemma, plus the facts that we restrict both and to lie in the bounded domain , and that is also at most , are then easily seen to imply that the -gradients are at most , for some constant that depends on , and so and , and the lemma follows.
Proof A.1 (Proof of Lemma 2.2:).
Let
(9) | ||||
Since we have that .
Since the Equations (2) are the inverse system of those of (1), we can rewrite (9) as
Hence
It follows right away that (recall that all the points lie in the unit cube). For the other difference, writing
(with the other parameters being fixed), we get
As is easily verified, we have
Since , and , the denominator of is bounded away from zero (assuming that is sufficiently small), and for , where is some fixed positive constant. This implies that , and the lemma follows.
Proof A.2 (Proof of Theorem 3.3. Part (a):).
Let be a pair at frame distance . By Lemma 2.2 and the definition of , there exists a cell such that and .
By definition, the surface contains the point
where and are given by (9). Since , the points and lie at -distance at most , therefore where is the cell that contains .
Together, these two properties imply that is counted by the algorithm. Moreover, since we kept both primal and dual boxes pairwise disjoint, each such pair is counted exactly once.
Part (b): Let be an -incident pair that we encounter, where and are encoded as above. That is, crosses the primal cell of that contains , or a neighboring cell in the -directions, and crosses the dual cell that contains , or a neighboring cell in the -directions. This means that (or a neighboring cell) contains a point , and (or a neighboring cell) contains a point . The former containment means that
and that
To interpret the latter containment, we write, using the definition of , , and the fact that ,
where , and where is the centerpoint of , and
where
By definition (of , , and the frame distance), we have
which we can bound by writing
We are given that
so it remains to bound the other term in each of the two right-hand sides. Consider for example the expression
(10) |
Write and , for suitable vectors . We expand the expression up to second order, by writing
where (resp., ) denotes the gradient with respect to the variables (resp., ), and where (resp., , ) denotes the Hessian submatrix of second derivatives in which both derivatives are with respect to (resp., both are with respect to , one derivative is with respect to and the other is with respect to ).
Substituting in (10), we get that, up to second order,
where is the maximum of the absolute values of all the “mixed” second derivatives. (Note that the mixed part of the Hessian arises also in the analysis of the algorithm in Section 4.) Arguing as in the preceding analysis and using the assumptions in the theorem, one can show that all these derivatives are bounded by some absolute constants, concluding that
which implies that
Applying an analogous analysis to , we also have
Together, these bounds complete the proof of part (b) of the theorem.
Proof A.3 (Proof of Lemma 4.1:).
Fix , consider the function
and recall our assumption that . Then we can also write the left-hand side of (5) as . By the intermediate value theorem, it can be written as
for some intermediate value (that depends on and ). By definition, we have,
(11) |
whose norm is bounded by by Condition (ii). Using the Cauchy-Schwarz inequality, we can thus conclude that
as asserted.
Proof A.4 (Proof of Lemma 4.4:).
Each surface in meets . That is, there exists a point in that lies on , so we have for each , where . Hence, for some intermediate value , we have
where the first inequality follows by the triangle inequality, the second follows since , the third by the intermediate value theorem and the Cauchy-Schwarz inequality, and the fourth by Condition (i).
Appendix B A simple algorithm
We present a simple naive solution which does not require any of the sophisticated machinery developed in this work. It actually turns out to be the most efficient solution when n is small.
We construct a grid over , of cells , each of dimensions , where is the constant of Lemma 2.2. (We use this non-square grid since we want to find -approximate incidences in terms of frame distance.) For each cell of we compute the number of surfaces that intersect .
Consider now a shifted version of in which the vertices of are the centers of the cells of . To report how many surfaces are within frame distance from a vertex , we return the count of the cell of whose center is . By Lemma 2.2 and Lemma 2.1, this includes all surfaces at frame distance from , but may also count surfaces at frame distance at most from , where is the constant in Lemma 2.1. (The distance from to the farthest corner of its cell is .)
It takes time to construct this data structure. Indeed, cell boundaries reside on hyperplanes, so we compute the intersection curve of each surface with each of these hyperplanes, in a total of time. Then, for each such curve we find the cell boundaries that it intersects within its three-dimensional hyperplane in time. We summarize this result in the following theorem.
Theorem B.1.
The algorithm described above approximates the number of surfaces that are at frame distance at most ε from each vertex of the shifted grid in pose space. (The approximation is in the sense defined above.)
Proof B.2.
In fact, we can find, for each vertex of the grid, the exact number of ε-incident surfaces (i.e., surfaces at frame distance at most ε from it). For this we keep, with each cell of the grid, the list of the surfaces that intersect it. Then, for each vertex, we traverse the surfaces stored in its cell and check which of them are within frame distance ε from it. The asymptotic running time remains the same.
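A sketch of this exact variant, with `cell_of`, `surfaces_in_cell`, and `frame_distance` as the hypothetical primitives described above:

```python
def exact_counts_at_vertices(vertices, cell_of, surfaces_in_cell, frame_distance, eps):
    """For each grid vertex, scan the surfaces stored with its cell and count
    those whose frame distance to the vertex is at most eps."""
    counts = {}
    for v in vertices:
        counts[v] = sum(1 for sigma in surfaces_in_cell(cell_of(v))
                        if frame_distance(v, sigma) <= eps)
    return counts
```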
If we want to get incidences counts at the vertices of a finer grid than the one above, we use a union of several shifted grids as above. This also allows us to construct a data structure that can return an ε-incidences count for any query point.
For the camera pose problem we use the vertex of the grid with the largest ε-incidences count as the candidate pose of the camera.
Appendix C Geometric proximity via canonical surfaces: The case of hyperplanes
We have a set of hyperplanes in d-space that cross the unit cube, and a given error parameter ε. Each hyperplane is given by an equation that expresses one distinguished (dependent) coordinate as a linear function of the others. We assume, for simplicity, that the coefficients of this linear function are suitably bounded; moreover, since each hyperplane crosses the unit cube, its constant term is suitably bounded too, as is easily checked. (This can always be enforced by rewriting the equation so that the variable with the coefficient of largest absolute value becomes the dependent coordinate.)
For our rounding scheme we define . We discretize each hyperplane as follows. Let the equation of be . We replace each by the integer multiple of that is nearest to it, and do the same for . Denoting these ‘snapped’ values as and , respectively, we replace by the hyperplane , given by . For any , the -vertical distance between and at is
We define the weight of each canonical hyperplane to be the number of original hyperplanes that got rounded to it, and we refer to the set of all canonical hyperplanes by .
We describe a recursive procedure that approximates the number of -incident hyperplanes of to each vertex of a -grid that tiles up . Specifically, for each vertex of we report a count that includes all hyperplanes in that are at Euclidean distance at most from but it may also count hyperplanes of that are at distance up to from .
Our procedure constructs an octree decomposition of , all the way to subcubes of side length . (We assume that is a negative power of to avoid rounding issues.) We shift the grid such that its vertices are centers of these leaf-subcubes. At level of the recursive construction, we have subcubes of side length . For each such we construct a set of more coarsely rounded hyperplanes. The weight of each hyperplane in is the sum of the weights of the hyperplanes in the parent cube of that got rounded to , which, by induction, is the number of original hyperplanes that are rounded to it (by repeated rounding along the path in the recursion tree leading to ).
At the root, where , we set (where each has the initial weight of the number of original hyperplanes rounded to it, as described above). At any other cell we obtain by applying a rounding step to the set of the hyperplanes of that intersect .
The coarser discretization of the hyperplanes of that produces the set proceeds as follows. Let denote the coordinates of the corner of with smallest coordinates, so .
Let be a hyperplane of , and rewrite its equation as
This rewriting only changes the value of but does not affect the ’s. Since crosses , we have (and for each ). We now re-discretize each coefficient (resp., ) to the integer multiple of (resp., ) that is nearest to it. Denoting these snapped values as and , respectively, we replace by the hyperplane given by
This re-discretization of the coefficients is a coarsening of the discretization of the hyperplanes in . The set contains all the new, more coarsely rounded hyperplanes that we obtain from the hyperplanes in in this manner. Note that several hyperplanes in may be rounded to the same hyperplane in . We set the weight of each hyperplane in to be the sum of the weights of the hyperplanes in that got rounded to it. (Note that although every hyperplane of crosses , such an may get rounded to a hyperplane that misses , in which case it is not represented by any hyperplane in .)
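The re-discretization of a single hyperplane within a cell can be sketched as follows, assuming each hyperplane is written with one distinguished dependent coordinate, say x_d = Σᵢ aᵢxᵢ + b; the concrete snapping step sizes are the ones prescribed by the analysis and are treated here as inputs.

```python
def reround_hyperplane(a, b, corner, coeff_step, offset_step):
    """Re-round a hyperplane x_d = sum_i a[i]*x_i + b inside an octree cell whose
    lowest corner is `corner`.  Rewriting the equation in coordinates relative to
    the corner changes only the constant term; the coefficients and the new
    constant are then snapped to integer multiples of the given step sizes."""
    # constant term after the change of variables y_i = x_i - corner_i
    b_local = b + sum(ai * qi for ai, qi in zip(a, corner[:-1])) - corner[-1]
    snapped_a = tuple(round(ai / coeff_step) * coeff_step for ai in a)
    snapped_b = round(b_local / offset_step) * offset_step
    return snapped_a, snapped_b
```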
For any , the -vertical distance between and at is
Since the original value of is in and we round it to an integer multiple of , the hyperplanes in have possible values for each coefficient . Furthermore, these hyperplanes also have possible values for , because for every hyperplane in (since it intersects ). It follows that , and the total size of all sets , over all cells at the same level of the octree, is .
Finally, at every leaf of the octree we report the sum of the weights of the hyperplanes in as the approximate -incidences count of the vertex of at the center of .
Theorem C.1.
Let be a set of hyperplanes in that cross the unit cube , and let be the -grid within . The algorithm described above reports for each vertex of an approximate -incidences count that includes all hyperplanes at Euclidean distance at most from and may include some hyperplanes at distance at most from . The running time of this algorithm is .
Proof C.2.
Let , and consider a hyperplane at distance at most from . The hyperplane is rounded to a hyperplane which is at distance at most from it (the distance between two hyperplanes is defined here to be the maximum vertical distance between them), and thereby at distance at most from . The hyperplane is further rounded to other hyperplanes while propagating down the octree. The distance from the hyperplane that it is rounded to in is at most , and, in general, the distance from any hyperplane that it is rounded to at any level is at most . (Note that it is rounded to different hyperplanes in different cells of the same level.) Therefore the distance of from is at most (since ). It follows that is rounded to some hyperplane that crosses the cell that contains , at each level of the octree. In particular, it is (repeatedly) rounded to some hyperplane at the leaf containing , and is included in the weight of some hyperplane at that leaf.
Consider now a hyperplane that is rounded to some hyperplane at the leaf containing . The hyperplane is at distance at most from . Therefore is at distance at most from the boundary of the leaf-cell containing . The distance of to the boundary of the leaf-cell containing it is at most , so the distance of from is at most .
The running time follows from the fact that the total size of the sets for all cells at a particular level of the octree is as bounded above, and there are O(log(1/ε)) levels.
Note that if we consider an arbitrary point then each hyperplane at distance at most from is included in the approximate count of at least one of the vertices of the grid surrounding . In this rather weak sense, the largest approximate incidences count of a vertex of can be considered as an approximation to the number of -close hyperplanes to the point with the largest number of -close hyperplanes.
Our octree data structure can give an approximate ε-incidences count for any query point (albeit with somewhat worse constants). For this we construct a constant number of octree structures over shifted (by integral multiples of ε) grids of a somewhat larger side length. The grids are shifted such that each cell of a finer grid of side length ε is centered in a larger grid cell of one of our grids. We use that structure to answer queries that lie in the corresponding cell, by returning the sum of the weights of the hyperplanes stored at the leaf containing the query point.
We can also modify this data structure so that it can answer ε-incidences queries exactly. That is, given a query point, it can count (or report) the number of hyperplanes at distance at most ε from it, and only these hyperplanes. To do this we maintain pointers from each hyperplane in the set of a cell to the hyperplanes in its parent's set that got rounded to it. To answer a query, we find the leaf cell containing the query point and then traverse back the pointers of the hyperplanes stored there, all the way up the octree, to identify the original hyperplanes that were rounded to them. We then traverse this set of original hyperplanes and count (or report) those that are at distance at most ε from the query point.