This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Google, aigerd@google.com School of Computer Science, Tel Aviv University, Tel Aviv, and Googlehaimk@tau.ac.ilPartially supported by ISF grant 1841/14, by grant 1367/2016 from the German-Israeli Science Foundation (GIF), and by Blavatnik Research Fund in Computer Science at Tel Aviv University.Google, efi@google.com School of Computer Science, Tel Aviv University, Tel Aviv, Israelmichas@tau.ac.ilPartially supported by ISF Grant 260/18, by grant 1367/2016 from the German-Israeli Science Foundation (GIF), and by Blavatnik Research Fund in Computer Science at Tel Aviv University. Google, bzeisl@google.com \CopyrightD. Aiger, H. Kaplan, E. Kokiopoulou, M. Sharir, B. Zeisl \EventEditors \EventNoEds2 \EventLongTitle \EventShortTitle \EventAcronym \EventYear \EventDate \EventLocation \EventLogo \SeriesVolume \ArticleNo

General techniques for approximate incidences and their application to the camera posing problem

Dror Aiger Haim Kaplan Efi Kokiopoulou Micha Sharir  and  Bernhard Zeisl
Abstract.

We consider the classical camera pose estimation problem that arises in many computer vision applications, in which we are given nn 2D-3D correspondences between points in the scene and points in the camera image (some of which are incorrect associations), and where we aim to determine the camera pose (the position and orientation of the camera in the scene) from this data. We demonstrate that this posing problem can be reduced to the problem of computing ε{\varepsilon}-approximate incidences between two-dimensional surfaces (derived from the input correspondences) and points (on a grid) in a four-dimensional pose space. Similar reductions can be applied to other camera pose problems, as well as to similar problems in related application areas.

We describe and analyze three techniques for solving the resulting ε{\varepsilon}-approximate incidences problem in the context of our camera posing application. The first is a straightforward assignment of surfaces to the cells of a grid (of side-length ε{\varepsilon}) that they intersect. The second is a variant of a primal-dual technique, recently introduced by a subset of the authors [2] for different (and simpler) applications. The third is a non-trivial generalization of a data structure Fonseca and Mount [3], originally designed for the case of hyperplanes. We present and analyze this technique in full generality, and then apply it to the camera posing problem at hand.

We compare our methods experimentally on real and synthetic data. Our experiments show that for the typical values of nn and ε{\varepsilon}, the primal-dual method is the fastest, also in practice.

Key words and phrases:
Camera positioning, Approximate incidences, Incidences
1991 Mathematics Subject Classification:
F.2.2 Nonnumerical Algorithms and Problems

1. Introduction

Camera pose estimation is a fundamental problem in computer vision, which aims at determining the pose and orientation of a camera solely from an image. This localization problem appears in many interesting real-world applications, such as for the navigation of self-driving cars [5], in incremental environment mapping such as Structure-from-Motion (SfM) [1, 11, 13], or for augmented reality [8, 9, 14], where a significant component are algorithms that aim to estimate an accurate camera pose in the world from image data.

Given a three-dimensional point-cloud model of a scene, the classical, but also state-of-the-art approach to absolute camera pose estimation consists of a two-step procedure. First, one matches a large number of features in the two-dimensional camera image with corresponding features in the three-dimensional scene. Then one uses these putative correspondences to determine the pose and orientation of the camera. Typically, the matches obtained in the first step contain many incorrect associations, forcing the second step to use filtering techniques to reject incorrect matches. Subsequently, the absolute 6 degrees-of-freedom (DoF) camera pose is estimated, for example, with a perspective nn-point pose solver [6] within a RANSAC scheme [4].

In this work we concentrate on the second step of the camera pose problem. That is, we consider the task of estimating the camera pose and orientation from a (potentially large) set of nn already calculated image-to-scene correspondences.

Further, we assume that we are given a common direction between the world and camera frames. For example, inertial sensors, available on any smart-phone nowadays, allow to estimate the vertical gravity direction in the three-dimensional camera coordinate system. This alignment of the vertical direction fixes two degrees of freedom for the rotation between the frames and we are left to estimate four degrees of freedom out of the general six. To obtain four equations (in the four remaining degrees of freedom), this setup requires two pairs of image-to-scene correspondences111As we will see later in detail, each correspondence imposes two constraints on the camera pose. for a minimal solver. Hence a corresponding naive RANSAC-based scheme requires O(n2)O(n^{2}) filtering steps, where in each iterations a pose hypothesis based on a different pair of correspondences is computed and verified against all other correspondences.

Recently, Zeisl et al. [17] proposed a Hough-voting inspired outlier filtering and camera posing approach, which computes the camera pose up to an accuracy of ε>0{\varepsilon}>0 from a set of 22D-33D correspondences, in O(n/ε2)O(n/{\varepsilon}^{2}) time, under the same alignment assumptions of the vertical direction. In this paper we propose new algorithms that work considerably faster in practice, but under milder assumptions. Our method is based on a reduction of the problem to a problem of counting ε{\varepsilon}-approximate incidences between points and surfaces, where a point pp is ε{\varepsilon}-approximately incident (or just ε{\varepsilon}-incident) to a surface σ\sigma if the (suitably defined) distance between pp and σ\sigma is at most ε{\varepsilon}. This notion has recently been introduced by a subset of the authors in [2], and applied in a variety of instances, involving somewhat simpler scenarios than the one considered here. Our approach enables us to compute a camera pose when the number of correspondences nn is large, and many of which are expected to be outliers. In contrast, a direct application of RANSAC-based methods on such inputs is very slow, since the fraction of inliers is small. In the limit, trying all pairs of matches involves Ω(n2)\Omega(n^{2}) RANSAC iterations. Moreover, our methods enhance the quality of the posing considerably [17], since each generated candidate pose is close to (i.e., consistent with) with many of the correspondences.

Our results. We formalize the four degree-of-freedom camera pose problem as an approximate incidences problem in Section 2. Each 2D-3D correspondence is represented as a two-dimensional surface in the 4-dimensional pose-space, which is the locus of all possible positions and orientations of the camera that fit the correspondence exactly. Ideally, we would like to find a point (a pose) that lies on as many surfaces as possible, but since we expect the data to be noisy, and the exact problem is inefficient to solve anyway, we settle for an approximate version, in which we seek a point with a large number of approximate incidences with the surfaces.

Formally, we solve the following problem. We have an error parameter ε>0{\varepsilon}>0, we lay down a grid on [0,1]d[0,1]^{d} of side length ε{\varepsilon}, and compute, for each vertex vv of the grid, a count I(v)I(v) of surfaces that are approximately incident to vv, so that (i) every surface that is ε{\varepsilon}-incident to vv is counted in I(v)I(v), and (ii) every surface that is counted in I(v)I(v) is αε\alpha{\varepsilon}-incident to vv, for some small constant α>1\alpha>1 (but not all αε\alpha{\varepsilon}-incident surfaces are necessarily counted). We output the grid vertex vv with the largest count I(v)I(v) (or a list of vertices with the highest counts, if so desired).

As we will comment later, (a) restricting the algorithm to grid vertices only does not miss a good pose vv: a vertex of the grid cell containing vv serves as a good substitute for vv, and (b) we have no real control on the value of I(v)I(v), which might be much larger than the number of surfaces that are ε{\varepsilon}-incident to vv, but all the surfaces that we count are ‘good’—they are reasonably close to vv. In the computer vision application, and in many related applications, neither of these issues is significant.

We give three algorithms for this camera-pose approximate-incidences problem. The first algorithm simply computes the grid cells that each surface intersects, and considers the number of intersecting surfaces per cell as its approximate ε{\varepsilon}-incidences count. This method takes time O(nε2)O\left(\frac{n}{{\varepsilon}^{2}}\right) for all vertices of our ε{\varepsilon}-size grid. We then describe a faster algorithm using geometric duality, in Section 3. It uses a coarser grid in the primal space and switches to a dual 5-dimensional space (a 5-tuple is needed to specify a 22D-33D correspondence and its surface, now dualized to a point). In the dual space each query (i.e., a vertex of the grid) becomes a 3-dimensional surface, and each original 2-dimensional surface in the primal 4-dimensional space becomes a point. This algorithm takes O(n3/5ε14/5+n+1ε4)O\left(\frac{n^{3/5}}{{\varepsilon}^{14/5}}+n+\frac{1}{{\varepsilon}^{4}}\right) time, and is asymptotically faster than the simple algorithm for n>1/ε2n>1/{\varepsilon}^{2}.

Finally, we give a general method for constructing an approximate incidences data structure for general kk-dimensional algebraic surfaces (that satisfy certain mild conditions) in d{\mathbb{R}}^{d}, in Section 4. It extends the technique of Fonseca and Mount [3], designed for the case of hyperplanes, and takes O(n+poly(1/ε))O(n+{\rm poly}(1/{\varepsilon})) time, where the degree of the polynomial in 1/ε1/{\varepsilon} depends on the number of parameters needed to specify a surface, the dimension of the surfaces, and the dimension of the ambient space. We first present and analyze this technique in full generality, and then apply it to the surfaces obtained for our camera posing problem. In this case, the data structure requires O(n+1/ε6)O(n+1/{\varepsilon}^{6}) storage and is constructed in roughly the same time. This is asymptotically faster than our primal-dual scheme when n1/ε16/3n\geq 1/{\varepsilon}^{16/3} (for n1/ε7n\geq 1/{\varepsilon}^{7} the O(n)O(n) term dominates and these two methods are asymptotically the same). Due to its generality, the latter technique is easily adapted to other surfaces and thus is of general interest and potential. In contrast, the primal-dual method requires nontrivial adaptation as it switches from one approximate-incidences problem to another and the dual space and its distance function depend on the type of the input surfaces.

We implemented our algorithms and compared their performance on real and synthetic data. Our experimentation shows that, for commonly used values of nn and ε{\varepsilon} in practical scenarios (n[8K,32K]n\in[8K,32K], ε[0.02,0.03]{\varepsilon}\in[0.02,0.03]), the primal-dual scheme is considerably faster than the other algorithms, and should thus be the method of choice. Due to lack of space, the experimentation details are omitted in this version, with the exception of a few highlights. They can be found in the appendix.

2. From camera positioning to approximate incidences

Suppose we are given a pre-computed three-dimensional scene and a two-dimensional picture of it. Our goal is to deduce from this image the location and orientation of the camera in the scene. In general, the camera, as a rigid body in 3-space, has six degrees of freedom, three of translation and three of rotation (commonly referred to as the yaw, pitch and roll). We simplify the problem by making the realistic assumption, that the vertical direction of the scene is known in the camera coordinate frame (e.g., estimated by en inertial sensor on smart phones). This allows us to rotate the camera coordinate frame such that its zz-axis is parallel to the world zz-axis, thereby fixing the pitch and roll of the camera and leaving only four degrees of freedom (x,y,z,θ)(x,y,z,\theta), where c=(x,y,z)c=(x,y,z) is the location of the camera center, say, and θ\theta is its yaw, i.e. horizontal the orientation of the optical axis around the vertical direction. See Figure 1.

Refer to caption
Figure 1. With the knowledge of a common vertical direction between the camera and world frame the general 6DoF camera posing problem reduces to estimating 4 parameters. This is the setup we consider in our work.

By preprocessing the scene, we record the spatial coordinates w=(w1,w2,w3)w=(w_{1},w_{2},w_{3}) of a discrete (large) set of salient points. We assume that some (ideally a large number) of the distinguished points are identified in the camera image, resulting in a set of image-to-scene correspondences. Each correspondence 𝐰={w1,w2,w3,ξ,η}{\bf w}=\{w_{1},w_{2},w_{3},\xi,\eta\} is parameterized by five parameters, the spatial position ww and the position v=(ξ,η)v=(\xi,\eta) in the camera plane of view of the same salient point. Our goal is to find a camera pose (x,y,z,θ)(x,y,z,\theta) so that as many correspondences as possible are (approximately) consistent with it, i.e., the ray from the camera center cc to ww goes approximately through (ξ,η)(\xi,\eta) in the image plane, when the yaw of the camera is θ\theta.

2.1. Camera posing as an ε{\varepsilon}-incidences problem

Each correspondence and its 5-tuple 𝐰{\bf w} define a two-dimensional surface σ𝐰\sigma_{\bf w} in parametric 4-space, which is the locus of all poses (x,y,z,θ)(x,y,z,\theta) of the camera at which it sees ww at coordinates (ξ,η)(\xi,\eta) in its image. For nn correspondences, we have a set of nn such surfaces. We prove that each point in the parametric 44-space of camera poses that is close to a surface σ𝐰\sigma_{\bf w}, in a suitable metric defined in that 44-space, represents a camera pose where ww is projected to a point in the camera viewing plane that is close to (ξ,η)(\xi,\eta), and vice versa (see Section 2.2 for the actual expressions for these projections). Therefore, a point in 44-space that is close to a large number of surfaces represents a camera pose with many approximately consistent correspondences, which is a strong indication of being close to the correct pose.

Extending the notation used in the earlier work [2], we say that a point qq is ε{\varepsilon}-incident to a surface σ\sigma if 𝖽𝗂𝗌𝗍(q,σ)ε{\sf dist}(q,\sigma)\leq{\varepsilon}. Our algorithms approximate, for each vertex of a grid GεG^{\varepsilon} of side length ε{\varepsilon}, the number of ε{\varepsilon}-incident surfaces and suggest the vertex with the largest count as the best candidate for the camera pose. This work extends the approximate incidences methodology in [2] to the (considerably more involved) case at hand.

2.2. The surfaces σ𝐰\sigma_{\bf w}

Let w=(w1,w2,w3)w=(w_{1},w_{2},w_{3}) be a salient point in 3{\mathbb{R}}^{3}, and assume that the camera is positioned at (c,θ)=(x,y,z,θ)(c,\theta)=(x,y,z,\theta). We represent the orientation of the vector wcw-c, within the world frame, by its spherical coordinates (φ,ψ)(\varphi,\psi), except that, unlike the standard convention, we take ψ\psi to be the angle with the xyxy-plane (rather than with the zz-axis):

tanψ=w3z(w1x)2+(w2y)2tanφ=w2yw1x\tan\psi=\frac{w_{3}-z}{\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}}\quad\quad\quad\tan\varphi=\frac{w_{2}-y}{w_{1}-x}

In the two-dimensional frame of the camera the (ξ,η)(\xi,\eta)-coordinates model the view of ww, which differs from above polar representation of the vector wcw-c only by the polar orientation θ\theta of the viewing plane itself. Writing κ\kappa for tanθ\tan\theta, we have

ξ=tan(φθ)=tanφtanθ1+tanφtanθ=(w2y)κ(w1x)(w1x)+κ(w2y),\xi=\tan(\varphi-\theta)=\frac{\tan\varphi-\tan\theta}{1+\tan\varphi\tan\theta}=\frac{(w_{2}-y)-\kappa(w_{1}-x)}{(w_{1}-x)+\kappa(w_{2}-y)}, (1)
η=tanψ=w3z(w1x)2+(w2y)2.\eta=\tan\psi=\frac{w_{3}-z}{\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}}.

We note that using tanθ\tan\theta does not distinguish between θ\theta and θ+π\theta+\pi, but we will restrict θ\theta to lie in [π/4,π/4][-\pi/4,\pi/4] or in similar narrower ranges, thereby resolving this issue.

We use 4{\mathbb{R}}^{4} with coordinates (x,y,z,κ)(x,y,z,\kappa) as our primal space, where each point models a possible pose of the camera. Each correspondence 𝐰{\bf w} is parameterized by the triple (w,ξ,η)(w,\xi,\eta), and defines a two-dimensional algebraic surface σ𝐰\sigma_{\bf w} of degree at most 44, whose equations (in x,y,z,κx,y,z,\kappa) are given in (1). It is the locus of all camera poses v=(x,y,z,κ)v=(x,y,z,\kappa) at which it sees ww at image coordinates (ξ,η)(\xi,\eta). We can rewrite these equations into the following parametric representation of σ𝐰\sigma_{\bf w}, expressing zz and κ\kappa as functions of xx and yy:

κ=(w2y)ξ(w1x)(w1x)+ξ(w2y)z=w3η(w1x)2+(w2y)2.\kappa=\frac{(w_{2}-y)-\xi(w_{1}-x)}{(w_{1}-x)+\xi(w_{2}-y)}\quad\quad\quad z=w_{3}-\eta\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}. (2)

For a camera pose v=(x,y,z,κ)v=(x,y,z,\kappa), and a point w=(w1,w2,w3)w=(w_{1},w_{2},w_{3}), we write

F(v;w)=(w2y)κ(w1x)(w1x)+κ(w2y)G(v;w)=w3z(w1x)2+(w2y)2.F(v;w)=\frac{(w_{2}-y)-\kappa(w_{1}-x)}{(w_{1}-x)+\kappa(w_{2}-y)}\quad\quad\quad G(v;w)=\frac{w_{3}-z}{\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}}. (3)

In this notation we can write the Equations (1) characterizing σ𝐰\sigma_{\bf w} (when regarded as equations in vv) as ξ=F(v;w)\xi=F(v;w) and η=G(v;w)\eta=G(v;w).

2.3. Measuring proximity

Given a guessed pose v=(x,y,z,κ)v=(x,y,z,\kappa) of the camera, we want to measure how well it fits the scene that the camera sees. For this, given a correspondence 𝐰=(w,ξ,η){\bf w}=(w,\xi,\eta), we define the frame distance 𝖿𝖽{\sf fd} between vv and 𝐰{\bf w} as the LL_{\infty}-distance between (ξ,η)(\xi,\eta) and (ξv,ηv)(\xi_{v},\eta_{v}), where, as in Eq. (3), ξv=F(v;w)\xi_{v}=F(v;w), ηv=G(v;w)\eta_{v}=G(v;w). That is,

𝖿𝖽(v,𝐰)=max{|ξvξ|,|ηvη|}.{\sf fd}(v,{\bf w})=\max\left\{|\xi_{v}-\xi|,\;|\eta_{v}-\eta|\right\}. (4)

Note that (ξv,ηv)(\xi_{v},\eta_{v}) are the coordinates at which the camera would see ww if it were placed at position vv, so the frame distance is the LL_{\infty}-distance between these coordinates and the actual coordinates (ξ,η)(\xi,\eta) at which the camera sees ww; this serves as a natural measure of how close vv is to the actual pose of the camera.

We are given a viewed scene of nn distinguished points (correspondences) 𝐰=(w,ξ,η){\bf w}=(w,\xi,\eta). Let SS denote the set of nn surfaces σ𝐰\sigma_{\bf w}, representing these correspondences. We assume that the salient features ww and the camera are all located within some bounded region, say [0,1]3[0,1]^{3}. The replacement of θ\theta by κ=tanθ\kappa=\tan\theta makes its range unbounded, so we break the problem into four subproblems, in each of which θ\theta is confined to some sector. In the first subproblem we assume that π/4θπ/4-\pi/4\leq\theta\leq\pi/4, so 1κ1-1\leq\kappa\leq 1. The other three subproblems involve the ranges [π/4,3π/4][\pi/4,3\pi/4], [3π/4,5π/4][3\pi/4,5\pi/4], and [5π/4,7π/4][5\pi/4,7\pi/4]. We only consider here the first subproblem; the treatment of the others is fully analogous. In each such range, replacing θ\theta by tanθ\tan\theta does not incur the ambiguity of identifying θ\theta with θ+π\theta+\pi.

Given an error parameter ε>0{\varepsilon}>0, we seek an approximate pose vv of the camera, at which many correspondences 𝐰{\bf w} are within frame distance at most ε{\varepsilon} from vv, as given in (4).

The following two lemmas relate our frame distance to the Euclidean distance. Their (rather technical) proofs are given in the appendix.

Lemma 2.1.

Let v=(x,y,z,κ)v=(x,y,z,\kappa), and let σ𝐰\sigma_{\bf w} be the surface associated with a correspondence 𝐰={w1,w2,w3,ξ,η}{\bf w}=\{w_{1},w_{2},w_{3},\xi,\eta\}. Let vv^{\prime} be a point on σ𝐰\sigma_{\bf w} such that |vv|ε|v-v^{\prime}|\leq{\varepsilon} (where |||\cdot| denotes the Euclidean norm). If

(i) |(w1x)+κ(w2y)|a>0\left|(w_{1}-x)+\kappa(w_{2}-y)\right|\geq a>0, and

(ii) (w1x)2+(w2y)2a>0(w_{1}-x)^{2}+(w_{2}-y)^{2}\geq a>0, for some absolute constant aa,

then 𝖿𝖽(v,𝐰)βε{\sf fd}(v,{\bf w})\leq\beta{\varepsilon} for some constant β\beta that depends on aa.

Informally, Condition (i) requires that the absolute value of the ξ=tan(φθ)\xi=\tan(\varphi-\theta) coordinate of the position of ww in the viewing plane, with the camera positioned at vv, is not too large (i.e., that |(φθ)||(\varphi-\theta)| is not too close to π/2\pi/2). We can ensure this property by restricting the camera image to some suitably bounded ξ\xi-range.

Similarly, Condition (ii) requires that the xyxy-projection of the vector wcw-c is not too small. It can be violated in two scenarios. Either we look at a data point that is too close to cc, or we see it looking too much ‘upwards’ or ‘downwards’. We can ensure that the latter situation does not arise, by restricting the camera image, as in the preceding paragraph, to some suitably bounded η\eta-range too. That done, we ensure that the former situation does not arise by requiring that the physical distance between cc and ww be at least some multiple of aa.

The next lemma establishes the converse connection.

Lemma 2.2.

Let v=(x,y,z,κ)v=(x,y,z,\kappa) be a camera pose and 𝐰={w1,w2,w3,ξ,η}{\bf w}=\{w_{1},w_{2},w_{3},\xi,\eta\} a correspondence, such that 𝖿𝖽(v,𝐰)ε{\sf fd}(v,{\bf w})\leq{\varepsilon}. Assume that |(w1x)+ξ(w2y)|a>0\left|(w_{1}-x)+\xi(w_{2}-y)\right|\geq a>0, for some absolute constant aa, and consider the point v=(x,y,z,κ)σ𝐰v^{\prime}=(x,y,z^{\prime},\kappa^{\prime})\in\sigma_{\bf w} where (see Eq. (2))

z=w3η(w1x)2+(w2y)2κ=(w2y)ξ(w1x)(w1x)+ξ(w2y).z^{\prime}=w_{3}-\eta\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}\;\;\;\;\;\;\;\;\;\;\kappa^{\prime}=\frac{(w_{2}-y)-\xi(w_{1}-x)}{(w_{1}-x)+\xi(w_{2}-y)}.

Then |zz|2ε|z-z^{\prime}|\leq\sqrt{2}{\varepsilon} and |κκ|cε|\kappa-\kappa^{\prime}|\leq c{\varepsilon}, for some constant cc, again depending on aa.

Informally, the condition |(w1x)+ξ(w2y)|a>0\left|(w_{1}-x)+\xi(w_{2}-y)\right|\geq a>0 means that the orientation of the camera, when it is positioned at (x,y)(x,y) and sees ww at coordinate ξ\xi of the viewing plane is not too close to ±π/2\pm\pi/2. This is a somewhat artificial constraint that is satisfied by our restriction on the allowed yaws of the camera (the range of κ\kappa).

A Simple algorithm. Using Lemma 2.2 and Lemma 2.1 we can derive a simple naive solution which does not require any of the sophisticated machinery developed in this work. We construct a grid GG over Q=[0,1]3×[1,1]Q=[0,1]^{3}\times[-1,1], of cells τ\tau, each of dimensions ε×ε×22ε×2aε{\varepsilon}\times{\varepsilon}\times 2\sqrt{2}{\varepsilon}\times 2a{\varepsilon}, where aa is the constant of Lemma 2.2. We use this non-square grid GG since we want to find ε{\varepsilon}-approximate incidences in terms of frame distance. For each cell τ\tau of GG we compute the number of surfaces σ𝐰\sigma_{\bf w} that intersect τ\tau. This gives an approximate incidences count for the center of τ\tau. Further details and a precise statement can be found in the appendix.

3. Primal-dual algorithm for geometric proximity

Following the general approach in [2], we use a suitable duality, with some care. We write ε=2γδ1δ2{\varepsilon}=2\gamma\delta_{1}\delta_{2}, for suitable parameters γ\gamma, and ε/(2γ)δ1,δ21{\varepsilon}/(2\gamma)\leq\delta_{1},\,\delta_{2}\leq 1, whose concrete values are fixed later, and apply the decomposition scheme developed in [2] tailored to the case at hand. Specifically, we consider the coarser grid Gδ1G_{\delta_{1}} in the primal space, of cell dimensions δ1×δ1×2δ1×cδ1\delta_{1}\times\delta_{1}\times\sqrt{2}\delta_{1}\times c\delta_{1}, where cc is is the constant from Lemma 2.2, that tiles up the domain Q=[0,1]3×[1,1]Q=[0,1]^{3}\times[-1,1] of possible camera positions. For each cell τ\tau of Gδ1G_{\delta_{1}}, let SτS_{\tau} denote the set of surfaces that cross either τ\tau or one of the eight cells adjacent to τ\tau in the (z,κ)(z,\kappa)-directions.222The choice of zz, κ\kappa is arbitrary, but it is natural for the analysis, given in the appendix. The duality is illustrated in Figure 2.

Refer to caption
Figure 2. A schematic illustration of our duality-based algorithm.

We discretize the set of all possible positions of the camera by the vertices of the finer grid GεG_{\varepsilon}, defined as Gδ1G_{\delta_{1}}, with ε{\varepsilon} replacing δ1\delta_{1}, that tiles up QQ. The number of these candidate positions is m:=O(1/ε4)m:=O(1/{\varepsilon}^{4}). For each vertex qGεq\in G_{\varepsilon}, we want to approximate the number of surfaces that are ε{\varepsilon}-incident to qq, and output the vertex with the largest count as the best candidate for the position of the camera. Let VτV_{\tau} be the subset of GεG_{\varepsilon} contained in τ\tau. We ensure that the boxes of Gδ1G_{\delta_{1}} are pairwise disjoint by making them half open, in the sense that if (x0,y0,z0,κ0)(x_{0},y_{0},z_{0},\kappa_{0}) is the vertex of a box that has the smallest coordinates, then the box is defined by x0x<x0+δ1x_{0}\leq x<x_{0}+\delta_{1}, y0y<y0+δ1y_{0}\leq y<y_{0}+\delta_{1}, z0z<z0+2δ1z_{0}\leq z<z_{0}+\sqrt{2}\delta_{1}, κ0κ<κ0+cδ1\kappa_{0}\leq\kappa<\kappa_{0}+c\delta_{1}. This makes the sets VτV_{\tau} pairwise disjoint as well. Put mτ=|Vτ|m_{\tau}=|V_{\tau}| and nτ=|Sτ|n_{\tau}=|S_{\tau}|. We have mτ=O((δ1/ε)4)m_{\tau}=O\left((\delta_{1}/{\varepsilon})^{4}\right) for each τ\tau. Since the surfaces σ𝐰\sigma_{\bf w} are two-dimensional algebraic surfaces of constant degree, each of them crosses O(1/δ12)O(1/\delta_{1}^{2}) cells of Gδ1G_{\delta_{1}}, so we have τnτ=O(n/δ12)\sum_{\tau}n_{\tau}=O(n/\delta_{1}^{2}).

We now pass to the dual five-dimensional space. Each point in that space represents a correspondence 𝐰=(w1,w2,w3,ξ,η){\bf w}=(w_{1},w_{2},w_{3},\xi,\eta). We use the first three components (w1,w2,w3)(w_{1},w_{2},w_{3}) as the first three coordinates, but modify the ξ\xi- and η\eta-coordinates in a manner that depends on the primal cell τ\tau. Let cτ=(xτ,yτ,zτ,κτ)c_{\tau}=(x_{\tau},y_{\tau},z_{\tau},\kappa_{\tau}) be the midpoint of the primal box τ\tau. For each σ𝐰Sτ\sigma_{\bf w}\in S_{\tau} we map 𝐰=(w,ξ,η){\bf w}=(w,\xi,\eta), where w=(w1,w2,w3)w=(w_{1},w_{2},w_{3}), to the point 𝐰τ=(w1,w2,w3,ξτ,ητ){\bf w}_{\tau}=(w_{1},w_{2},w_{3},\xi_{\tau},\eta_{\tau}), where ξτ=ξF(cτ;w)\xi_{\tau}=\xi-F(c_{\tau};w) and ητ=ηG(cτ;w)\eta_{\tau}=\eta-G(c_{\tau};w), with FF and GG as given in (3). We have

Corollary 3.1.

If σ𝐰\sigma_{\bf w} crosses τ\tau then |ξτ|,|ητ|γδ1|\xi_{\tau}|,\;|\eta_{\tau}|\leq\gamma\delta_{1}, for some absolute constant γ\gamma, provided that the following two properties hold, for some absolute constant a>0a>0 (the constant γ\gamma depends on aa).

(i) |(w1xτ)+κτ(w2yτ)|a\left|(w_{1}-x_{\tau})+\kappa_{\tau}(w_{2}-y_{\tau})\right|\geq a, and

(ii) (w1xτ)2+(w2yτ)2a(w_{1}-x_{\tau})^{2}+(w_{2}-y_{\tau})^{2}\geq a, where (xτ,yτ)(x_{\tau},y_{\tau}) are the (x,y)(x,y)-coordinates of the center of τ\tau.

Proof 3.2.

If σ𝐰Sτ\sigma_{\bf w}\in S_{\tau} then it contains a point vv^{\prime} such that |vcτ|cδ1|v^{\prime}-c_{\tau}|\leq c^{\prime}\delta_{1}, for a suitable absolute constant cc^{\prime} (that depends on cc). We now apply Lemma 2.1, recalling (4).

We take the γ\gamma provided by Corollary 3.1 as the γ\gamma in the definition of δ1\delta_{1} and δ2\delta_{2}. We map each point vVτv\in V_{\tau} to the dual surface σv=σv;τ={𝐰τvσ𝐰}\sigma^{*}_{v}=\sigma^{*}_{v;\tau}=\{{\bf w}_{\tau}\mid v\in\sigma_{\bf w}\}. Using (3), we have

σv;τ={(w,F(v;w)F(cτ;w),G(v;w)G(cτ;w))w=(w1,w2,w2)[0,1]3}.\sigma^{*}_{v;\tau}=\{(w,\;F(v;w)-F(c_{\tau};w),G(v;w)-G(c_{\tau};w))\mid w=(w_{1},w_{2},w_{2})\in[0,1]^{3}\}.

By Corollary 3.1, the points 𝐰τ{\bf w}_{\tau}, for the surfaces σ𝐰\sigma_{\bf w} that cross τ\tau, lie in the region Rτ=[0,1]3×[γδ1,γδ1]2R_{\tau}=[0,1]^{3}\times[-\gamma\delta_{1},\gamma\delta_{1}]^{2}. We partition RτR_{\tau} into a grid Gδ2G_{\delta_{2}} of 1/δ251/\delta_{2}^{5} small congruent boxes, each of dimensions δ2×δ2×δ2×(2γδ1δ2)×(2γδ1δ2)=δ2×δ2×δ2×ε×ε\delta_{2}\times\delta_{2}\times\delta_{2}\times(2\gamma\delta_{1}\delta_{2})\times(2\gamma\delta_{1}\delta_{2})=\delta_{2}\times\delta_{2}\times\delta_{2}\times{\varepsilon}\times{\varepsilon}.

Exactly as in the primal setup, we make each of these boxes half-open, thereby making the sets of dual vertices in the smaller boxes pairwise disjoint. We assign to each of these dual cells τ\tau^{*} the set SτS^{*}_{\tau^{*}} of dual points that lie in τ\tau^{*}, and the set VτV^{*}_{\tau^{*}} of the dual surfaces that cross either τ\tau^{*} or one of the eight cells adjacent to τ\tau^{*} in the (ξτ,ητ)(\xi_{\tau},\eta_{\tau})-directions. Put nτ=|Sτ|n_{\tau^{*}}=|S^{*}_{\tau^{*}}| and mτ=|Vτ|m_{\tau^{*}}=|V^{*}_{\tau^{*}}|. Since the dual cells are pairwise disjoint, we have τnτ=nτ\sum_{\tau^{*}}n_{\tau^{*}}=n_{\tau}. Since the dual surfaces are three-dimensional algebraic surfaces of constant degree, each of them crosses O(1/δ23)O(1/\delta_{2}^{3}) grid cells, so τmτ=O(mτ/δ23)\sum_{\tau^{*}}m_{\tau^{*}}=O\left(m_{\tau}/\delta_{2}^{3}\right).

We compute, for each dual surface σv\sigma^{*}_{v}, the sum τ|Sτ|\sum_{\tau^{*}}|S^{*}_{\tau^{*}}|, over the dual cells τ\tau^{*} that are either crossed by σv\sigma^{*}_{v} or that one of their adjacent cells in the (ξτ,ητ)(\xi_{\tau},\eta_{\tau})-directions is crossed by σv\sigma^{*}_{v}. We output the vertex vv of GεG_{\varepsilon} with the largest resulting count, over all primal cells τ\tau.

The following theorem establishes the correctness of our technique. Its proof is given in Appendix B.

Theorem 3.3.

Suppose that for every cell τGδ1\tau\in G_{\delta_{1}} and for every point v=(x,y,z,κ)Vτv=(x,y,z,\kappa)\in V_{\tau} and every 𝐰=((w1,w2,w3),ξ,η){\bf w}=((w_{1},w_{2},w_{3}),\xi,\eta) such that σ𝐰\sigma_{\bf w} intersects either τ\tau or one of its adjacent cells in the (ξτ,ητ)(\xi_{\tau},\eta_{\tau})-directions, we have that, for some absolute constant a>0a>0,

(i) |(w1x)+κ(w2y)|a\left|(w_{1}-x)+\kappa(w_{2}-y)\right|\geq a,

(ii) (w1x)2+(w2y)2a(w_{1}-x)^{2}+(w_{2}-y)^{2}\geq a, and

(iii) |(w1x)+ξ(w2y)|a\left|(w_{1}-x)+\xi(w_{2}-y)\right|\geq a.

Then (a) For each vVv\in V, every pair (v,𝐰)(v,{\bf w}) at frame distance ε\leq{\varepsilon} is counted (as an ε{\varepsilon}-incidence of vv) by the algorithm. (b) For each vVv\in V, every pair (v,𝐰)(v,{\bf w}) that we count lies at frame distance αε\leq\alpha{\varepsilon}, for some constant α>0\alpha>0 depending on aa.

3.1. Running time analysis

The cost of the algorithm is clearly proportional to ττ(mτ+nτ),\sum_{\tau}\sum_{\tau^{*}}\left(m_{\tau^{*}}+n_{\tau^{*}}\right), over all primal cells τ\tau and the dual cells τ\tau^{*} associated with each cell τ\tau. We have

ττ(mτ+nτ)=O(τ(mτ/δ23+nτ))=O(m/δ23+n/δ12).\sum_{\tau}\sum_{\tau^{*}}\left(m_{\tau^{*}}+n_{\tau^{*}}\right)=O\left(\sum_{\tau}\left(m_{\tau}/\delta_{2}^{3}+n_{\tau}\right)\right)=O\left(m/\delta_{2}^{3}+n/\delta_{1}^{2}\right).

Optimizing the choice of δ1\delta_{1} and δ2\delta_{2}, we choose δ1=(ε3nm)1/5\delta_{1}=\left(\frac{{\varepsilon}^{3}n}{m}\right)^{1/5} and δ2=(ε2mn)1/5\delta_{2}=\left(\frac{{\varepsilon}^{2}m}{n}\right)^{1/5}. These choices make sense as long as each of δ1\delta_{1}, δ2\delta_{2} lies between ε/(2γ){\varepsilon}/(2\gamma) and 11. That is, ε2γ(ε3nm)1/51\frac{{\varepsilon}}{2\gamma}\leq\left(\frac{{\varepsilon}^{3}n}{m}\right)^{1/5}\leq 1 and ε2γ(ε2mn)1/51\frac{{\varepsilon}}{2\gamma}\leq\left(\frac{{\varepsilon}^{2}m}{n}\right)^{1/5}\leq 1, or cε2mnc′′mε3c^{\prime}{\varepsilon}^{2}m\leq n\leq\frac{c^{\prime\prime}m}{{\varepsilon}^{3}}, where cc^{\prime} and c′′c^{\prime\prime} are absolute constants (that depend on γ\gamma).

If n<cε2mn<c^{\prime}{\varepsilon}^{2}m, we use only the primal setup, taking δ1=ε\delta_{1}={\varepsilon} (for the primal subdivision). The cost is then O(n/ε2+m)=O(m).O\left(n/{\varepsilon}^{2}+m\right)=O\left(m\right). Similarly, if n>c′′mε3{\displaystyle n>\frac{c^{\prime\prime}m}{{\varepsilon}^{3}}}, we use only the dual setup, taking δ1=1\delta_{1}=1 and δ2=ε/(2γ)\delta_{2}={\varepsilon}/(2\gamma), and the cost is thus O(n+m/ε3)=O(n).O\left(n+m/{\varepsilon}^{3}\right)=O(n). Adding everything together, to cover all three subranges, the running time is then O(m2/5n3/5ε6/5+n+m).O\left(\frac{m^{2/5}n^{3/5}}{{\varepsilon}^{6/5}}+n+m\right). Substituting m=O(1/ε4)m=O\left(1/{\varepsilon}^{4}\right), we get a running time of O(n3/5ε14/5+n+1ε4).O\left(\frac{n^{3/5}}{{\varepsilon}^{14/5}}+n+\frac{1}{{\varepsilon}^{4}}\right). The first term dominates when n=Ω(1ε2)n=\Omega(\frac{1}{{\varepsilon}^{2}}) and n=O(1ε7)n=O(\frac{1}{{\varepsilon}^{7}}) . In conclusion, we have the following result.

Theorem 3.4.

Given nn data points that are seen (and identified) in a two-dimensional image taken by a vertically positioned camera, and an error parameter ε>0{\varepsilon}>0, where the viewed points satisfy the assumptions made in Theorem 3.3, we can compute, in O(n3/5ε14/5+n+1ε4){\displaystyle O\left(\frac{n^{3/5}}{{\varepsilon}^{14/5}}+n+\frac{1}{{\varepsilon}^{4}}\right)} time, a vertex vv of GεG_{\varepsilon} that maximizes the approximate count of ε{\varepsilon}-incident correspondences, where “approximate” means that every correspondence 𝐰{\bf w} whose surface σ𝐰{\bf\sigma_{w}} is at frame distance at most ε{\varepsilon} from vv is counted and every correspondence that we count lies at frame distance at most αε\alpha{\varepsilon} from vv, for some fixed constant α\alpha.

Restricting ourselves only to grid vertices does not really miss any solution. We only lose a bit in the quality of approximation, replacing ε{\varepsilon} by a slightly large constant multiple thereof, when we move from the best solution to a vertex of its grid cell.

4. Geometric proximity via canonical surfaces

In this section we present a general technique to preprocess a set of algebraic surfaces into a data structure that can answer approximate incidences queries. In this technique we round the nn original surfaces into a set of canonical surfaces, whose size depends only on ε{\varepsilon}, such that each original surface has a canonical surface that is “close” to it. Then we build an octree-based data structure for approximate incidences queries with respect to the canonical surfaces. However, to reduce the number of intersections between the cells of the octree and the surfaces, we further reduce the number of surfaces as we go from one level of the octree to the next, by rounding them in a coarser manner into a smaller set of surfaces.

This technique has been introduced by Fonseca and Mount [3] for the case of hyperplanes. We describe as a warmup step, in Section C of the appendix, our interpretation of their technique applied to hyperplanes. We then extend here the technique to general surfaces, and apply it to the specific instance of 2-surfaces in 4-space that arise in the camera pose problem.

We have a set SS of nn kk-dimensional surfaces in d{\mathbb{R}}^{d} that cross the unit cube [0,1]d[0,1]^{d}, and a given error parameter ε{\varepsilon}. We assume that each surface σS\sigma\in S is given in parametric form, where the first kk coordinates are the parameters, so its equations are

xj=Fj(σ)(x1,,xk),for j=k+1,,d.x_{j}=F_{j}^{(\sigma)}(x_{1},\ldots,x_{k}),\qquad\text{for $j=k+1,\ldots,d$}.

Moreover, we assume that each σS\sigma\in S is defined in terms of \ell essential parameters 𝐭=(t1,,t){\bf t}=(t_{1},\ldots,t_{\ell}), and dkd-k additional free additive parameters 𝐟=(fk+1,,fd){\bf f}=(f_{k+1},\ldots,f_{d}), one free parameter for each dependent coordinate. Concretely, we assume that the equations defining the surface σS\sigma\in S, parameterized by 𝐭{\bf t} and 𝐟{\bf f} (we then denote σ\sigma as σ𝐭,𝐟\sigma_{{\bf t},{\bf f}}), are

xj=Fj(𝐱;𝐭)+fj=Fj(x1,,xk;t1,,t)+fj,for j=k+1,,d.x_{j}=F_{j}({\bf x};{\bf t})+f_{j}=F_{j}(x_{1},\ldots,x_{k};t_{1},\ldots,t_{\ell})+f_{j},\qquad\text{for $j=k+1,\ldots,d$}.

For each equation of the surface that does not have a free parameter in the original expression, we introduce an artificial free parameter, and initialize its value to 0. (We need this separation into essential and free parameters for technical reasons that will become clear later.) We assume that 𝐭{\bf t} (resp., 𝐟{\bf f}) varies over [0,1][0,1]^{\ell} (resp., [0,1]dk[0,1]^{d-k}).

Remark. The distinction between free and essential parameters seems to be artificial, but yet free parameters do arise in certain basic cases, such as the case of hyperplanes discussed in Section C of the appendix. In the case of our 2-surfaces in 4-space, the parameter w3w_{3} is free, and we introduce a second artificial free parameter into the equation for κ\kappa. The number of essential parameters is =4\ell=4 (they are w1w_{1},w2w_{2},ξ\xi, and η\eta).

We assume that the functions FjF_{j} are all continuous and differentiable, in all of their dependent variables 𝐱{\bf x}, 𝐭{\bf t} and 𝐟{\bf f} (this is a trivial assumption for 𝐟{\bf f}), and that they satisfy the following two conditions.

(i) Bounded gradients. |𝐱Fj(𝐱;𝐭)|c1,|𝐭Fj(𝐱;𝐭)|c1,\left|\nabla_{\bf x}F_{j}({\bf x};{\bf t})\right|\leq c_{1},\quad\left|\nabla_{\bf t}F_{j}({\bf x};{\bf t})\right|\leq c_{1}, for each j=k+1,,dj=k+1,\ldots,d, for any 𝐱[0,1]k{\bf x}\in[0,1]^{k} and any 𝐭[0,1]{\bf t}\in[0,1]^{\ell}, where c1c_{1} is some absolute constant. Here 𝐱\nabla_{\bf x} (resp., 𝐭\nabla_{\bf t}) means the gradient with respect to only the variables 𝐱{\bf x} (resp., 𝐭{\bf t}).

(ii) Lipschitz gradients. |𝐱Fj(𝐱;𝐭)𝐱Fj(𝐱;𝐭)|c2|𝐭𝐭|,\left|\nabla_{\bf x}F_{j}({\bf x};{\bf t})-\nabla_{\bf x}F_{j}({\bf x};{\bf t}^{\prime})\right|\leq c_{2}|{\bf t}-{\bf t}^{\prime}|, for each j=k+1,,dj=k+1,\ldots,d, for any 𝐱[0,1]k{\bf x}\in[0,1]^{k} and any 𝐭{\bf t}, 𝐭[0,1]{\bf t}^{\prime}\in[0,1]^{\ell}, where c2c_{2} is some absolute constant. This assumption is implied by the assumption that all the eigenvalues of the mixed part of the Hessian matrix 𝐭𝐱Fj(𝐱;𝐭)\nabla_{\bf t}\nabla_{\bf x}F_{j}({\bf x};{\bf t}) have absolute value bounded by c2c_{2}.

4.1. Canonizing the input surfaces

We first replace each surface σ𝐭,𝐟S\sigma_{{\bf t},{\bf f}}\in S by a canonical “nearby” surface σ𝐬,𝐠\sigma_{{\bf s},{\bf g}}. Let ε=εc2log(1/ε){\varepsilon}^{\prime}=\frac{{\varepsilon}}{c_{2}\log(1/{\varepsilon})} where c2c_{2} is the constant from Condition (ii). We get 𝐬{\bf s} from 𝐭{\bf t} (resp., 𝐠{\bf g} from 𝐟{\bf f}) by rounding each coordinate in the essential parametric domain LL (resp., in the parametric domain Φ\Phi) to a multiple of ε/(+1){\varepsilon}^{\prime}/(\ell+1). Note that each of the artificial free parameters (those that did not exist in the original equations) has the initial value 0 for all surfaces, and remains 0 in the rounded surfaces. We get O((1/ε))O\left((1/{\varepsilon}^{\prime})^{\ell^{\prime}}\right) canonical rounded surfaces, where \ell^{\prime}\geq\ell is the number of original parameters, that is, the number of essential parameters plus the number of non-artificial free parameters; in the worst case we have =+dk\ell^{\prime}=\ell+d-k.

For a surface σ𝐭,𝐟\sigma_{{\bf t},{\bf f}} and its rounded version σ𝐬,𝐠\sigma_{{\bf s},{\bf g}} we have, for each jj,

|(Fj(𝐱;𝐭)+fj)(Fj(𝐱;𝐬)+gj)|\displaystyle\left|\left(F_{j}({\bf x};{\bf t})+f_{j}\right)-\left(F_{j}({\bf x};{\bf s})+g_{j}\right)\right| |𝐭Fj(𝐱;𝐭)||𝐭𝐬|+|fjgj|\displaystyle\leq|\nabla_{\bf t}F_{j}({\bf x};{\bf t}^{\prime})|\cdot|{\bf t}-{\bf s}|+|f_{j}-g_{j}|
c1|𝐭𝐬|+|fjgj|(c1+1)ε,\displaystyle\leq c_{1}|{\bf t}-{\bf s}|+|f_{j}-g_{j}|\leq(c_{1}+1){\varepsilon}^{\prime},

where 𝐭{\bf t}^{\prime} is some intermediate value, which is irrelevant due to Condition (i).

We will use the 2\ell_{2}-norm of the difference vector ((Fj(𝐱;𝐭)+fj)(Fj(𝐱;𝐬)+gj))j=k+1d\left((F_{j}({\bf x};{\bf t})+f_{j})-(F_{j}({\bf x};{\bf s})+g_{j})\right)_{j=k+1}^{d} as the measure of proximity between the surfaces σ𝐭,𝐟\sigma_{{\bf t},{\bf f}} and σ𝐬,𝐠\sigma_{{\bf s},{\bf g}} at 𝐱{\bf x}, and denote it as 𝖽𝗂𝗌𝗍(σ𝐭,𝐟,σ𝐬,𝐠;𝐱){\sf dist}(\sigma_{{\bf t},{\bf f}},\sigma_{{\bf s},{\bf g}};{\bf x}). The maximum 𝖽𝗂𝗌𝗍(σ𝐭,𝐟,σ𝐬,𝐠):=max𝐱[0,1]k𝖽𝗂𝗌𝗍(σ𝐭,𝐟,σ𝐬,𝐠;𝐱){\sf dist}(\sigma_{{\bf t},{\bf f}},\sigma_{{\bf s},{\bf g}}):=\max_{{\bf x}\in[0,1]^{k}}{\sf dist}(\sigma_{{\bf t},{\bf f}},\sigma_{{\bf s},{\bf g}};{\bf x}) measures the global proximity of the two surfaces. (Note that it is an upper bound on the Hausdorff distance between the two surfaces.) We thus have 𝖽𝗂𝗌𝗍(σ𝐭,𝐟,σ𝐬,𝐠)(c1+1)ε{\sf dist}(\sigma_{{\bf t},{\bf f}},\sigma_{{\bf s},{\bf g}})\leq(c_{1}+1){\varepsilon}^{\prime} when σ𝐬,𝐠\sigma_{{\bf s},{\bf g}} is the canonical surface approximating σ𝐭,𝐟\sigma_{{\bf t},{\bf f}}.

We define the weight of each canonical surface to be the number of original surfaces that got rounded to it, and we refer to the set of all canonical surfaces by ScS^{c}.

4.2. Approximately counting ε{\varepsilon}-incidences

We describe an algorithm for approximating the ε{\varepsilon}-incidences counts of the surfaces in SS and the vertices of a grid GG of side length 4ε4{\varepsilon}.

We construct an octree decomposition of τ0:=[0,1]d\tau_{0}:=[0,1]^{d}, all the way to subcubes of side length 4ε4{\varepsilon} such that each vertex of GG is the center of a leaf-cube. We propagate the surfaces of ScS^{c} down this octree, further rounding each of them within each subcube that it crosses.

The root of the octree corresponds to τ0\tau_{0}, and we set Sτ0=ScS_{\tau_{0}}=S^{c}. At level j1j\geq 1 of the recursion, we have subcubes τ\tau of τ0\tau_{0} of side length δ=1/2j\delta=1/2^{j}. For each such τ\tau, we set S~τ\tilde{S}_{\tau} to be the subset of the surfaces in Sp(τ)S_{p(\tau)} (that have been produced at the parent cube p(τ)p(\tau) of τ\tau) that intersect τ\tau. We now show how to further round the surfaces of S~τ\tilde{S}_{\tau}, so as to get a coarser set SτS_{\tau} of surfaces that we associate with τ\tau, and that we process recursively within τ\tau.

At any node τ\tau at level jj of our rounding process, each surface σ\sigma of SτS_{\tau} is of the form xj=Hj(𝐱;𝐭)+fjx_{j}=H_{j}({\bf x};{\bf t})+f_{j}, for j=k+1,,dj=k+1,\ldots,d where 𝐱=(x1,,xk){\bf x}=(x_{1},\ldots,x_{k}), and 𝐭=(t1,,t){\bf t}=(t_{1},\ldots,t_{\ell}).

(a) For each j=k+1,,dj=k+1,\ldots,d the function HjH_{j} is a translation of FjF_{j}. That is Hj(𝐱;𝐭)=Fj(𝐱;𝐭)+cH_{j}({\bf x};{\bf t})=F_{j}({\bf x};{\bf t})+c for some constant cc. Thus the gradients of HjH_{j} also satisfy Conditions (i) and (ii).

(b) 𝐭{\bf t} is some vector of \ell essential parameters, and each coordinate of 𝐭{\bf t} is an integer multiple of ε(+1)δ\frac{{\varepsilon}^{\prime}}{(\ell+1)\delta}, where δ=1/2j\delta=1/2^{j}.

(c) 𝐟=(fk+1,,fd){\bf f}=(f_{k+1},\ldots,f_{d}) is a vector of free parameters, each is a multiple of ε/(+1){\varepsilon}^{\prime}/(\ell+1).

Note that the surfaces in Sτ0=ScS_{\tau_{0}}=S^{c}, namely the set of initial canonical surfaces constructed in Section 4.1, are of this form (for j=0j=0 and Hj=FjH_{j}=F_{j}). We get SτS_{\tau} from S~τSp(τ)\tilde{S}_{\tau}\subseteq S_{p(\tau)} by the following steps. The first step just changes the presentation of τ\tau and S~τ\tilde{S}_{\tau}, and the following steps do the actual rounding to obtain SτS_{\tau}.

  1. (1)

    Let (ξ1,,ξk,ξk+1,,ξd)(\xi_{1},\ldots,\xi_{k},\xi_{k+1},\ldots,\xi_{d}) be the point in τ\tau of smallest coordinates and set ξ=(ξ1,,ξk){\bf\xi}=(\xi_{1},\ldots,\xi_{k}). We rewrite the equations of each surface of S~τ\tilde{S}_{\tau} as follows: xj=Gj(𝐱;𝐭)+fjx_{j}=G_{j}({\bf x};{\bf t})+f^{\prime}_{j}, for j=k+1,,dj=k+1,\ldots,d, where Gj(𝐱;𝐭)=Hj(𝐱;𝐭)Hj(ξ;𝐭)+ξjG_{j}({\bf x};{\bf t})=H_{j}({\bf x};{\bf t})-H_{j}({\bf\xi};{\bf t})+\xi_{j}, and fj=fj+Hj(ξ;𝐭)ξjf^{\prime}_{j}=f_{j}+H_{j}({\bf\xi};{\bf t})-\xi_{j}, for j=k+1,,dj=k+1,\ldots,d. Note that in this reformulation we have not changed the essential parameters, but we did change the free parameters from fjf_{j} to fjf^{\prime}_{j}, where fjf^{\prime}_{j} depends on fjf_{j}, 𝐭{\bf t}, ξ{\bf\xi}, and ξj\xi_{j}. Note also that Gj(ξ;𝐭)=ξjG_{j}({\bf\xi};{\bf t})=\xi_{j} for j=k+1,,dj=k+1,\ldots,d.

  2. (2)

    We replace the essential parameters 𝐭{\bf t} of a surface σ𝐭,𝐟\sigma_{{\bf t},{\bf f}} by 𝐬{\bf s}, which we obtain by rounding each coordinate of 𝐭{\bf t} to the nearest integer multiple of ε(+1)δ\frac{{\varepsilon}^{\prime}}{(\ell+1)\delta}. So the rounded surface has the equations xj=Gj(𝐱;𝐬)+fjx_{j}=G_{j}({\bf x};{\bf s})+f^{\prime}_{j}, for j=k+1,,dj=k+1,\ldots,d. Note that we also have that Gj(ξ;𝐬)=ξjG_{j}({\bf\xi};{\bf s})=\xi_{j}, for j=k+1,,dj=k+1,\ldots,d.

  3. (3)

    For each surface, we round each free parameter fjf^{\prime}_{j}, j=k+1,,dj=k+1,\ldots,d, to an integral multiple of ε+1\frac{{\varepsilon}^{\prime}}{\ell+1}, and denote the rounded vector by 𝐠{\bf g}. Our final equations for each rounded surface that we put in SτS_{\tau} are xj=Gj(𝐱;𝐬)+gjx_{j}=G_{j}({\bf x};{\bf s})+g_{j} for j=k+1,,dj=k+1,\ldots,d.

By construction, when 𝐭1{\bf t}_{1} and 𝐟1{\bf f}^{\prime}_{1} and 𝐭2{\bf t}_{2} and 𝐟2{\bf f}^{\prime}_{2} get rounded to the same vectors 𝐬{\bf s} and 𝐠{\bf g} then the corresponding two surfaces in S~τ\tilde{S}_{\tau} get rounded to the same surface in SτS_{\tau}. The weight of each surface in SτS_{\tau} is the sum of the weights of the surfaces in Sp(τ)S_{p(\tau)} that got rounded to it, which, by induction, is the number of original surfaces that are recursively rounded to it. In the next step of the recursion the HjH_{j}’s of the parametrization of the surfaces in SτS_{\tau} are the functions GjG_{j} defined above.

The total weight of the surface in SτS_{\tau} for a leaf cell τ\tau is the approximate ε{\varepsilon}-incidences count that we associate with the center of τ\tau.

4.3. Error analysis

We now bound the error incurred by our discretization. We start with the following lemma, whose proof is given in Appendix .

Lemma 4.1.

Let τ\tau be a cell of the octtree and let xj=Gj(𝐱;𝐭)+fjx_{j}=G_{j}({\bf x};{\bf t})+f^{\prime}_{j}, for j=k+1,,dj=k+1,\ldots,d be a surface obtained in Step 1 of the rounding process described above. For any 𝐱=(x1,,xk)[0,δ]k{\bf x}=(x_{1},\ldots,x_{k})\in[0,\delta]^{k}, for any 𝐭,𝐬[0,1]{\bf t},{\bf s}\in[0,1]^{\ell}, and for each j=k+1,,dj=k+1,\ldots,d, we have

|Gj(𝐱;𝐬)Gj(𝐱;𝐭)|c2|𝐱ξ||𝐭𝐬|,\left|G_{j}({\bf x};{\bf s})-G_{j}({\bf x};{\bf t})\right|\leq c_{2}|{\bf x}-{\bf\xi}|\cdot|{\bf t}-{\bf s}|, (5)

where c2c_{2} is the constant of Condition (ii), and ξ=(ξ1,,ξk){\bf\xi}=(\xi_{1},\ldots,\xi_{k}) consists of the first kk coordinates of the point in τ\tau of smallest coordinates.

Lemma 4.2.

For any 𝐱=(x1,,xk)[0,δ]k{\bf x}=(x_{1},\ldots,x_{k})\in[0,\delta]^{k}, for any 𝐭{\bf t}, 𝐬[0,1]{\bf s}\in[0,1]^{\ell}, and for each j=k+1,,dj=k+1,\ldots,d, we have

|Gj(𝐱;𝐬)+gj(Gj(𝐱;𝐭)+fj)|c2εεlog(1/ε),\left|G_{j}({\bf x};{\bf s})+g_{j}-(G_{j}({\bf x};{\bf t})+f^{\prime}_{j})\right|\leq c_{2}{\varepsilon}^{\prime}\leq\frac{{\varepsilon}}{\log(1/{\varepsilon})}, (6)

where c2c_{2} is the constant of Condition (ii).

Proof 4.3.

Using the triangle inequality and Lemma 4.1, we get that

|Gj(𝐱;𝐬)+gj(Gj(𝐱;𝐭)+fj)|\displaystyle\left|G_{j}({\bf x};{\bf s})+g_{j}-(G_{j}({\bf x};{\bf t})+f^{\prime}_{j})\right| |Gj(𝐱;𝐬)Gj(𝐱;𝐭)|+|gjfj|c2|𝐱ξ||𝐭𝐬|+ε+1.\displaystyle\leq\left|G_{j}({\bf x};{\bf s})-G_{j}({\bf x};{\bf t})\right|+\left|g_{j}-f^{\prime}_{j}\right|\leq c_{2}|{\bf x}-{\bf\xi}||{\bf t}-{\bf s}|+\frac{{\varepsilon}^{\prime}}{\ell+1}\ .

Since |𝐱ξ|δ|{\bf x}-{\bf\xi}|\leq\delta, |𝐭𝐬|ε(+1)δ|{\bf t}-{\bf s}|\leq\frac{\ell{\varepsilon}^{\prime}}{(\ell+1)\delta}, and |gjfj|ε+1|g_{j}-f^{\prime}_{j}|\leq\frac{{\varepsilon}^{\prime}}{\ell+1}, the lemma follows.

We now bound the number of surfaces in SτS_{\tau}. Since 𝐬[0,1]{\bf s}\in[0,1]^{\ell} and each of its coordinates is a multiple of ε(+1)δ\frac{{\varepsilon}^{\prime}}{(\ell+1)\delta}, we have at most (δε)(\frac{\delta}{{\varepsilon}^{\prime}})^{\ell} different values for 𝐬{\bf s}. To bound the number of possible values of 𝐠{\bf g}, we prove the following lemma (see the appendix for the proof).

Lemma 4.4.

Let xj=Gj(𝐱;𝐭)+fjx_{j}=G_{j}({\bf x};{\bf t})+f^{\prime}_{j}, for j=k+1,,dj=k+1,\ldots,d, be a surface σ𝐭,𝐟\sigma_{{\bf t},{\bf f}^{\prime}} in S~τ\tilde{S}_{\tau}. For each j=k+1,,dj=k+1,\ldots,d, we have |fj|(c1+1)δ\left|f^{\prime}_{j}\right|\leq(c_{1}+1)\delta, where c1c_{1} is the constant of Condition (i).

Lemma 4.4 implies that each gjg_{j}, j=k+1,,dj=k+1,\ldots,d, has only O(δε)O(\frac{\delta}{{\varepsilon}^{\prime}}) possible values, for a total of at most O((δε)dk)O((\frac{\delta}{{\varepsilon}^{\prime}})^{d-k}) possible values for 𝐠{\bf g}. Combining the number of possible values for 𝐬{\bf s} and 𝐠{\bf g}, we get that the number of newly discretized surfaces in SτS_{\tau} is

O((δε)(δε)dk)=O((δε)+dk).O\left(\left(\frac{\delta}{{\varepsilon}^{\prime}}\right)^{\ell}\cdot\left(\frac{\delta}{{\varepsilon}^{\prime}}\right)^{d-k}\right)=O\left(\left(\frac{\delta}{{\varepsilon}^{\prime}}\right)^{\ell+d-k}\right). (7)

It follows that each level of the recursive octree decomposition generates

O((1δ)d(δε)+dk)=O(δk(ε)+dk)O\left(\left(\frac{1}{\delta}\right)^{d}\cdot\left(\frac{\delta}{{\varepsilon}^{\prime}}\right)^{\ell+d-k}\right)=O\left(\frac{\delta^{\ell-k}}{({\varepsilon}^{\prime})^{\ell+d-k}}\right)

re-discretized surfaces, where the first factor in the left-hand side expression is the number of cubes generated at this recursive level, and the second factor is the one in (7).

Summing over the recursive levels j=0,,log1εj=0,\ldots,\log\frac{1}{{\varepsilon}}, where the cube size δ\delta is 1/2j1/2^{j} at level jj, we get a total size of O(1(ε)+dkj=0log1ε12j(k))O\left(\frac{1}{({\varepsilon}^{\prime})^{\ell+d-k}}\sum_{j=0}^{\log\frac{1}{{\varepsilon}}}\frac{1}{2^{j(\ell-k)}}\right). We get different estimates for the sum according to the sign of k\ell-k. If >k\ell>k the sum is O(1)O(1). If =k\ell=k the sum is O(log1ε)O\left(\log\frac{1}{{\varepsilon}}\right). If <k\ell<k the sum is O(2jmax(k))=O(1(ε)k).O\left(2^{j_{\rm max}(k-\ell)}\right)=O\left(\frac{1}{({\varepsilon}^{\prime})^{k-\ell}}\right). Accordingly, the overall size of the structure, taking also into account the cost of the first phase, is

{O(1(ε)+dk)for >kO(1(ε)dlog1ε)for =kO(1(ε)d)for <k.\begin{cases}O\left(\frac{1}{({\varepsilon}^{\prime})^{\ell+d-k}}\right)&\text{for $\ell>k$}\\ O\left(\frac{1}{({\varepsilon}^{\prime})^{d}}\log\frac{1}{{\varepsilon}}\right)&\text{for $\ell=k$}\\ O\left(\frac{1}{({\varepsilon}^{\prime})^{d}}\right)&\text{for $\ell<k$}.\end{cases} (8)

The following theorem summarizes the result of this section. Its proof follows in a straightforward way from the preceding discussion from Lemma 4.2, analogously to the proof of Lemma 4.4 in the appendix.

Theorem 4.5.

Let SS be a set of nn surfaces in d{\mathbb{R}}^{d} that cross the unit cube [0,1]d[0,1]^{d}, given parametrically as xj=Fj(𝐱;𝐭)+fjx_{j}=F_{j}({\bf x};{\bf t})+f_{j} for j=k+1,,dj=k+1,\ldots,d, where the functions FjF_{j} satisfy conditions (i) and (ii), and 𝐭=(t1,,t){\bf t}=(t_{1},\ldots,t_{\ell}). Let GG be the (4ε)(4{\varepsilon})-grid within [0,1]d[0,1]^{d}. The algorithm described above reports for each vertex vv of GG an approximate ε{\varepsilon}-incidences count that includes all surfaces at distance at most ε{\varepsilon} from vv and may include some surfaces at distance at most (2d+1)ε(2\sqrt{d}+1){\varepsilon} from vv. The running time of this algorithm is proportional to the total number of rounded surfaces that it generates, which is given by Equation (8), plus an additive O(n)O(n) term for the initial canonization of the surfaces.

We can modify our data structure so that it can answer approximate or exact ε{\varepsilon}-incidence queries as we describe in Section C of the appendix for the case of hyperplanes.

5. Experimental Results

The goal of the experimental results is to show the practical relation between the naive, the primal-dual and the general canonical surfaces algorithms. It is not our intention to obtain the fastest possible code, but to obtain a platform for fair comparison between the techniques. We have performed a preliminary experimental comparison using synthetic as well as real-world data. We focus on values of n,εn,{\varepsilon} that are practical in real applications. Typically, we have 100K100K-200K200K 33D points bounded by a rectangle of size 100-150 meters and the uncertainty is around 3m (so the relative error is ε=0.03{\varepsilon}\leavevmode\nobreak\ =0.03). The three methods that we evaluate are:

  • The naive method, with asymptotic run-time O(nε2)O(\frac{n}{{\varepsilon}^{2}}).

  • The primal-dual method (cf. Section 3), with asymptotic run-time O(n+n3/5ε14/5+1ε4)O(n+\frac{n^{3/5}}{{\varepsilon}^{14/5}}+\frac{1}{{\varepsilon}^{4}}).

  • The canonical surfaces method (cf. Section 4), with asymptotic run-time O~(n+1ε6)\tilde{O}(n+\frac{1}{{\varepsilon}^{6}}) (ignoring poly logarithmic factors).

In all experiments we normalize the data, so that the camera position (x,y,zx,y,z) and the 33D points lie in the unit box [0,1]3[0,1]^{3}, and the forth parameter (κ\kappa) representing the camera orientation lies in [1,1][-1,1].

5.1. Random synthetic data

Starting from a fixed known camera pose, we generate a set of nn uniformly sampled 33D points which are projected onto the camera image plane using Eq. (1). To model outliers in the association process we use random projections for 90% of the 3D3D points, resulting in an inlier ratio of 10%10\%. We add Gaussian noise of zero mean and σ=0.02\sigma=0.02 to the coordinates of each 3D3D point. This provides us with 22D-33D correspondences that are used for estimating the camera pose. We apply the three algorithms above and measure the run-times, where each algorithm is tested for its ability to reach approximately the (known) solution. We remark that the actual implementation may be slowed down by the (constant) cost of some of its primitive operations, but it can also gain efficiency from certain practical heuristic improvements. For example, in contrast to the worst case analysis, we could stop the recursion in the algorithm of Section 4, at any step of the octree expansion, whenever the maximum incidence count obtained so far is larger than the number of surfaces crossing a cell of the octree. The same applies for the primal-dual technique in the dual stage. On the other hand, finding whether or not a surface crosses a box in pose space, takes at least the time to test for intersections of the surface with 32 edges of the box, and this constant affects greatly the run-time. The O(1/ε6)O(1/{\varepsilon}^{6}) bound in the canonical surfaces algorithm is huge and has no effect in practice for this problem. For this reason, the overall number of surfaces that we have to consider in the recursion can be very large. The canonical surfaces algorithm in our setting does not change much with ε{\varepsilon} because we are far from the second term effect. We show in Figure 3, a comparison of the three algorithms.

Refer to caption
Refer to caption
Refer to caption
Figure 3. The run-time of the three methods for various values of ε{\varepsilon}.

The computed camera poses corresponding to Figure 3, obtained by the three algorithm for various problem sizes, are displayed in Table 1, compared to the known pose. The goal here is not to obtain the most accurate algorithm but to show that they are comparable in accuracy in this setting so the runtime comparison is fair.

n x(N/PD/C) y(N/PD/C) z(N/PD/C) κ\kappa(N/PD/C)
8000 0.31/0.31/0.28 0.22/0.2/0.18 0.1/0.12/0.09 0.55/0.66/0.59
12000 0.31/0.33/0.28 0.22/0.17/0.19 0.1/0.1/0.1 0.55/0.65/0.6
24000 0.31/0.3/0.28 0.22/0.2/0.18 0.1/0.1/0.09 0.55/0.61/0.59
32000 0.31/0.27/0.28 0.22/0.2/0.19 0.1/0.08/0.09 0.55/0.57/0.59
True pose 0.3 0.2 0.1 0.6
Table 1. Poses computed by the three algorithms for ε=0.03{\varepsilon}=0.03 and various problem sizes (N:naive, PD:primal-dual, C:canonical surfaces).

5.2. Real-world data

We evaluated the performance of the algorithms also on real-world datasets for which the true camera pose is known. The input is a set of correspondences, each represented by a 55-tuple (w1,w2,w3,ξ,η)(w_{1},w_{2},w_{3},\xi,\eta), where (w1,w2,w3)(w_{1},w_{2},w_{3}) are the 3D3D coordinates of a salient feature in the scene and (ξ,η)(\xi,\eta) is its corresponding projection in the camera frame. We computed the camera pose from these matches using both primal-dual and naive algorithms and compared the poses to the true one. An example of the data we have used is shown in Figure 4.

Refer to caption
(a) 140000 33D landmarks in correspondence to image features in  4.
Refer to caption
(b) Query image.
Refer to caption
(c) Corresponding (ξ,η)(\xi,\eta) points, each corresponds to many landmarks.
Refer to caption
(d) Inliers ((ξ,η)(\xi,\eta) with ε{\varepsilon}-close projection of landmarks) found by the algorithm (green) along with the rays from matched landmarks
Refer to caption
(e) The pose found by the algorithm (within 3m and 10 degrees from the known true pose) with rays to matched landmarks.
Figure 4. Real-world data input and pose

We evaluated the runtime for different problem sizes and checked the correctness of the camera pose approximation when the size increased. To get different input sizes, we added random correspondences to a base set of actual correspondences. The number of random correspondences determines the input size but also the fraction of good correspondences (percentage of inliers) which goes down with increased input size (the number of inliers in real world cases is typically 10%). We show the same plots as before in Figure 5 and Table 2.

Refer to caption
Figure 5. Runtime for real-world data with increased nn and decreased number of inliers.
n x y z κ\kappa
2000 0.26 0.62 0.06 0.72
3626 0.36 0.56 0.12 0.52
5626 0.38 0.60 0.12 0.62
7626 0.40 0.64 0.08 0.57
11626 0.43 0.68 0.13 0.63
True pose 0.37 0.59 0.06 -
Table 2. Poses computed by the primal-dual algorithms for real-world data (we do not know the actual orientation here).

6. Future work

We note that similar approaches can be applied for computing the relative pose [10] between two cameras (that look at the same scene), except that the pose estimation then uses 22D-22D matches between the two images (rather than 2D-3D image-to-model correspondences). Determining the relative motion between images is a prerequisite for stereo depth estimation [12], in multi-view geometry [7], and for the initialization of the view graph [15, 16] in SfM, and is therefore an equally important task in computer vision. In addition, in future work we want to also consider the case of a generalized or distributed camera setup and likewise transform the camera posing problem to an ε{\varepsilon}-incidence problem.

References

  • [1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building rome in a day. In Commun. ACM 54(10), pages 105–112. ACM, 2011.
  • [2] D. Aiger, H. Kaplan, and M. Sharir. Output sensitive algorithms for approximate incidences and their applications. In Computational Geometry, to appear. Also in European Symposium on Algorithms, volume 5, pages 1–13, 2017.
  • [3] G. D. Da Fonseca and D. M. Mount. Approximate range searching: The absolute model. Computational Geometry, 43(4):434–444, 2010.
  • [4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
  • [5] C. Häne, L. Heng, G. H. Lee, F. Fraundorfer, P. Furgale, T. Sattler, and M. Pollefeys. 3d visual perception for self-driving cars using a multi-camera system: Calibration, mapping, localization, and obstacle detection. Image and Vision Computing, 68:14–27, 2017.
  • [6] B. M. Haralick, C.-N. Lee, K. Ottenberg, and M. Nölle. Review and analysis of solutions of the three point perspective pose estimation problem. International Journal of Computer Vision, 13(3):331–356, 1994.
  • [7] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge university press, 2003.
  • [8] G. Klein and D. Murray. Parallel tracking and mapping for small ar workspaces. In ISMAR, pages 83–86. IEEE, 2009.
  • [9] S. Middelberg, T. Sattler, O. Untzelmann, and L. Kobbelt. Scalable 6-dof localization on mobile devices. In European Conference on Computer Vision, pages 268–283. Springer, 2014.
  • [10] D. Nistér. An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):756–770, 2004.
  • [11] M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch. Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3):207–232, 2004.
  • [12] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3):7–42, 2002.
  • [13] J. L. Schonberger and J.-M. Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
  • [14] C. Sweeney, J. Flynn, B. Nuernberger, M. Turk, and T. Höllerer. Efficient computation of absolute pose for gravity-aware augmented reality. In ISMAR, pages 19–24. IEEE, 2015.
  • [15] C. Sweeney, T. Sattler, T. Hollerer, M. Turk, and M. Pollefeys. Optimizing the viewing graph for structure-from-motion. In Proceedings of the IEEE International Conference on Computer Vision, pages 801–809, 2015.
  • [16] C. Zach, M. Klopschitz, and M. Pollefeys. Disambiguating visual relations using loop. constraints. In Computer Vision and Pattern Recognition, pages 1426–1433. IEEE, 2010.
  • [17] B. Zeisl, T. Sattler, and M. Pollefeys. Camera pose voting for large-scale image-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2704–2712, 2015.

Appendix A Omitted proofs

In all experiments we normalize the data, so that the camera position (x,y,zx,y,z) and the 33D points lie in the unit box [0,1]3[0,1]^{3}, and the forth parameter (κ\kappa) representing the camera orientation lies in [1,1][-1,1]. Let ξv:=F(v;w)\xi_{v}:=F(v;w), We apply the three algorithms above and measure the run-times, where each algorithm is tested for its ability to reach approximately the (known) solution. We remark that the actual implementation may be slowed down by the (constant) cost of some of its primitive operations, but it can also gain efficiency from certain practical heuristic improvements. For example, in contrast to the worst case analysis, we could stop the recursion in the algorithm of Section 4, at any step of the octree expansion, whenever the maximum incidence count obtained so far is larger than the number of surfaces crossing a cell of the octree. The same applies for the primal-dual technique in the dual stage. On the other hand, finding whether or not a surface crosses a box in pose space, takes at least the time to test for intersections of the surface with 32 edges of the box, and this constant affects greatly the run-time. The O(1/ε6)O(1/{\varepsilon}^{6}) bound in the canonical surfaces algorithm is huge and has no effect in practice for this problem. For this reason, the overall number of surfaces that we have to consider in the recursion can be very large. The canonical surfaces algorithm in our setting does not change much with ε{\varepsilon} because we are far from the second term effect. We show in Figure 3, a comparison of the three algorithms. Since vσ𝐰v^{\prime}\in\sigma_{\bf w} we have that ξ=F(v;w)\xi=F(v^{\prime};w), η=G(v;w)\eta=G(v^{\prime};w). We want to show that 𝖿𝖽(v,𝐰)=max{|ξvξ|,|ηvη|}βε{\sf fd}(v,{\bf w})=\max\left\{|\xi_{v}-\xi|,\;|\eta_{v}-\eta|\right\}\leq\beta{\varepsilon} for some constant β\beta that depends on aa.

Regarding FF and GG as functions of vv, we compute their gradients as follows.

Fx\displaystyle F_{x} =κ[(w1x)+κ(w2y)]+[(w2y)κ(w1x)]((w1x)+κ(w2y))2=(1+κ2)(w2y)((w1x)+κ(w2y))2\displaystyle=\frac{\kappa\left[(w_{1}-x)+\kappa(w_{2}-y)\right]+\left[(w_{2}-y)-\kappa(w_{1}-x)\right]}{\left((w_{1}-x)+\kappa(w_{2}-y)\right)^{2}}=\frac{(1+\kappa^{2})(w_{2}-y)}{\left((w_{1}-x)+\kappa(w_{2}-y)\right)^{2}}
Fy\displaystyle F_{y} =[(w1x)+κ(w2y)]+κ[(w2y)κ(w1x)]((w1x)+κ(w2y))2=(1+κ2)(w1x)((w1x)+κ(w2y))2\displaystyle=\frac{-\left[(w_{1}-x)+\kappa(w_{2}-y)\right]+\kappa\left[(w_{2}-y)-\kappa(w_{1}-x)\right]}{\left((w_{1}-x)+\kappa(w_{2}-y)\right)^{2}}=-\frac{(1+\kappa^{2})(w_{1}-x)}{\left((w_{1}-x)+\kappa(w_{2}-y)\right)^{2}}
Fz\displaystyle F_{z} =0\displaystyle=0
Fκ\displaystyle F_{\kappa} =(w1x)[(w1x)+κ(w2y)](w2y)[(w2y)κ(w1x)]((w1x)+κ(w2y))2\displaystyle=\frac{-(w_{1}-x)\left[(w_{1}-x)+\kappa(w_{2}-y)\right]-(w_{2}-y)\left[(w_{2}-y)-\kappa(w_{1}-x)\right]}{\left((w_{1}-x)+\kappa(w_{2}-y)\right)^{2}}
=(w1x)2+(w2y)2((w1x)+κ(w2y))2,\displaystyle=-\frac{(w_{1}-x)^{2}+(w_{2}-y)^{2}}{\left((w_{1}-x)+\kappa(w_{2}-y)\right)^{2}},

and

Gx\displaystyle G_{x} =(w3z)(w1x)((w1x)2+(w2y)2)3/2\displaystyle=\frac{(w_{3}-z)(w_{1}-x)}{\left((w_{1}-x)^{2}+(w_{2}-y)^{2}\right)^{3/2}}
Gy\displaystyle G_{y} =(w3z)(w2y)((w1x)2+(w2y)2)3/2\displaystyle=\frac{(w_{3}-z)(w_{2}-y)}{\left((w_{1}-x)^{2}+(w_{2}-y)^{2}\right)^{3/2}}
Gz\displaystyle G_{z} =1((w1x)2+(w2y)2)1/2\displaystyle=-\frac{1}{\left((w_{1}-x)^{2}+(w_{2}-y)^{2}\right)^{1/2}}
Gκ\displaystyle G_{\kappa} =0.\displaystyle=0.

Conditions (i) and (ii) in the lemma, plus the facts that we restrict both (w1,w2,w3)(w_{1},w_{2},w_{3}) and (x,y,z)(x,y,z) to lie in the bounded domain [0,1]3[0,1]^{3}, and that |κ||\kappa| is also at most 11, are then easily seen to imply that the vv-gradients |F|,|G||\nabla F|,\;|\nabla G| are at most β\beta, for some constant β\beta that depends on aa, and so |ξvξ|β|vv|βε|\xi_{v}-\xi|\leq\beta|v-v^{\prime}|\leq\beta{\varepsilon} and |ηvη|β|vv|βε|\eta_{v}-\eta|\leq\beta|v-v^{\prime}|\leq\beta{\varepsilon}, and the lemma follows.

Proof A.1 (Proof of Lemma 2.2:).

Let

ξv\displaystyle\xi_{v} =(w2y)κ(w1x)(w1x)+κ(w2y)\displaystyle=\frac{(w_{2}-y)-\kappa(w_{1}-x)}{(w_{1}-x)+\kappa(w_{2}-y)} (9)
ηv\displaystyle\eta_{v} =w3z(w1x)2+(w2y)2.\displaystyle=\frac{w_{3}-z}{\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}}.

Since 𝖿𝖽(v,𝐰)ε{\sf fd}(v,{\bf w})\leq{\varepsilon} we have that |ξvξ|,|ηvη|ε|\xi_{v}-\xi|,\;|\eta_{v}-\eta|\leq{\varepsilon}.

Since the Equations (2) are the inverse system of those of (1), we can rewrite (9) as

z\displaystyle z =w3ηv(w1x)2+(w2y)2\displaystyle=w_{3}-\eta_{v}\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}
κ\displaystyle\kappa =(w2y)ξv(w1x)(w1x)+ξv(w2y).\displaystyle=\frac{(w_{2}-y)-\xi_{v}(w_{1}-x)}{(w_{1}-x)+\xi_{v}(w_{2}-y)}.

Hence

zz\displaystyle z-z^{\prime} =(ηηv)(w1x)2+(w2y)2\displaystyle=(\eta-\eta_{v})\sqrt{(w_{1}-x)^{2}+(w_{2}-y)^{2}}
κκ\displaystyle\kappa-\kappa^{\prime} =(w2y)ξv(w1x)(w1x)+ξv(w2y)(w2y)ξ(w1x)(w1x)+ξ(w2y).\displaystyle=\frac{(w_{2}-y)-\xi_{v}(w_{1}-x)}{(w_{1}-x)+\xi_{v}(w_{2}-y)}-\frac{(w_{2}-y)-\xi(w_{1}-x)}{(w_{1}-x)+\xi(w_{2}-y)}.

It follows right away that |zz|2ε|z-z^{\prime}|\leq\sqrt{2}{\varepsilon} (recall that all the points lie in the unit cube). For the other difference, writing

H(t)=(w2y)t(w1x)(w1x)+t(w2y)H(t)=\frac{(w_{2}-y)-t(w_{1}-x)}{(w_{1}-x)+t(w_{2}-y)}

(with the other parameters being fixed), we get

|κκ|maxt[ξv,ξ]|H(t)||ξvξ|.|\kappa-\kappa^{\prime}|\leq\max_{t\in[\xi_{v},\xi]}|H^{\prime}(t)||\xi_{v}-\xi|.

As is easily verified, we have

H(t)=(w1x)2+(w2y)2[(w1x)+t(w2y)]2.H^{\prime}(t)=-\frac{(w_{1}-x)^{2}+(w_{2}-y)^{2}}{\left[(w_{1}-x)+t(w_{2}-y)\right]^{2}}.

Since |(w1x)+ξ(w2y)|a>0\left|(w_{1}-x)+\xi(w_{2}-y)\right|\geq a>0, and |ξvξ|ε|\xi_{v}-\xi|\leq{\varepsilon}, the denominator of H(t)H^{\prime}(t) is bounded away from zero (assuming that ε{\varepsilon} is sufficiently small), and |H(t)|c|H^{\prime}(t)|\leq c for t[ξ,ξv]t\in[\xi,\xi_{v}], where cc is some fixed positive constant. This implies that |κκ|cε|\kappa-\kappa^{\prime}|\leq c{\varepsilon}, and the lemma follows.

Proof A.2 (Proof of Theorem 3.3. Part (a):).

Let (v,𝐰)(v,{\bf w}) be a pair at frame distance ε\leq{\varepsilon}. By Lemma 2.2 and the definition of Gδ1G_{\delta_{1}}, there exists a cell τGδ1\tau\in G_{\delta_{1}} such that vτv\in\tau and 𝐰Sτ{\bf w}\in S_{\tau}.

By definition, the surface σv;τ\sigma^{*}_{v;\tau} contains the point

(w1,w2,w3,ξvF(cτ;w),ηvG(cτ;w)),(w_{1},w_{2},w_{3},\xi_{v}-F(c_{\tau};w),\eta_{v}-G(c_{\tau};w)),

where ξv\xi_{v} and ηv\eta_{v} are given by (9). Since 𝖿𝖽(v,𝐰)ε{\sf fd}(v,{\bf w})\leq{\varepsilon}, the points (w1,w2,w3,ξvF(cτ;w),ηvG(cτ;w))(w_{1},w_{2},w_{3},\xi_{v}-F(c_{\tau};w),\eta_{v}-G(c_{\tau};w)) and (w1,w2,w3,ξτ,ητ)(w_{1},w_{2},w_{3},\xi_{\tau},\eta_{\tau}) lie at LL_{\infty}-distance at most ε{\varepsilon}, therefore σvVτ\sigma^{*}_{v}\in V^{*}_{\tau^{*}} where τGδ2\tau^{*}\in G_{\delta_{2}} is the cell that contains 𝐰τ{\bf w}_{\tau}.

Together, these two properties imply that (v,𝐰)(v,{\bf w}) is counted by the algorithm. Moreover, since we kept both primal and dual boxes pairwise disjoint, each such pair is counted exactly once.

Part (b): Let (v,𝐰)(v,{\bf w}) be an ε{\varepsilon}-incident pair that we encounter, where vv and 𝐰{\bf w} are encoded as above. That is, σ𝐰\sigma_{\bf w} crosses the primal cell τ\tau of Gδ1G_{\delta_{1}} that contains vv, or a neighboring cell in the (z,κ)(z,\kappa)-directions, and σv;τ\sigma^{*}_{v;\tau} crosses the dual cell τ\tau^{*} that contains 𝐰τ{\bf w}_{\tau}, or a neighboring cell in the (ξτ,ητ)(\xi_{\tau},\eta_{\tau})-directions. This means that τ\tau (or a neighboring cell) contains a point c=(x,y,z,κ)σ𝐰c=(x^{\prime},y^{\prime},z^{\prime},\kappa^{\prime})\in\sigma_{\bf w}, and τ\tau^{*} (or a neighboring cell) contains a point 𝐰τ=(w1,w2,w3,ξτ,ητ)σv;τ{\bf w}^{\prime}_{\tau}=(w_{1}^{\prime},w_{2}^{\prime},w_{3}^{\prime},\xi^{\prime}_{\tau},\eta^{\prime}_{\tau})\in\sigma^{*}_{v;\tau}. The former containment means that

ξ=(w2y)κ(w1x)(w1x)+κ(w2y),η=w3z(w1x)2+(w2y)2,\xi=\frac{(w_{2}-y^{\prime})-\kappa^{\prime}(w_{1}-x^{\prime})}{(w_{1}-x^{\prime})+\kappa^{\prime}(w_{2}-y^{\prime})},\quad\quad\eta=\frac{w_{3}-z^{\prime}}{\sqrt{(w_{1}-x^{\prime})^{2}+(w_{2}-y^{\prime})^{2}}},

and that

|xx|,|yy|δ1,|zz|22δ1,and|κκ|2cδ1.|x-x^{\prime}|,|y-y^{\prime}|\leq\delta_{1},\quad\quad|z-z^{\prime}|\leq 2\sqrt{2}\delta_{1},\quad\text{and}\quad|\kappa-\kappa^{\prime}|\leq 2c\delta_{1}.

To interpret the latter containment, we write, using the definition of ξτ\xi^{\prime}_{\tau}, ητ\eta^{\prime}_{\tau}, and the fact that 𝐰τσv;τ{\bf w}^{\prime}_{\tau}\in\sigma^{*}_{v;\tau},

ξτ\displaystyle\xi^{\prime}_{\tau} =F(v;w)F(cτ;w)=(w2y)κ(w1x)(w1x)+κ(w2y)(w2yτ)κτ(w1xτ)(w1xτ)+κτ(w2yτ)\displaystyle=F(v;w^{\prime})-F(c_{\tau};w^{\prime})=\frac{(w^{\prime}_{2}-y)-\kappa(w^{\prime}_{1}-x)}{(w^{\prime}_{1}-x)+\kappa(w^{\prime}_{2}-y)}-\frac{(w^{\prime}_{2}-y_{\tau})-\kappa_{\tau}(w^{\prime}_{1}-x_{\tau})}{(w^{\prime}_{1}-x_{\tau})+\kappa_{\tau}(w^{\prime}_{2}-y_{\tau})}
ητ\displaystyle\eta^{\prime}_{\tau} =G(v;w)G(cτ;w1)=w3z(w1x)2+(w2y)2w3zτ(w1xτ)2+(w2yτ)2,\displaystyle=G(v;w^{\prime})-G(c_{\tau};w^{\prime}_{1})=\frac{w^{\prime}_{3}-z}{\sqrt{(w^{\prime}_{1}-x)^{2}+(w^{\prime}_{2}-y)^{2}}}-\frac{w^{\prime}_{3}-z_{\tau}}{\sqrt{(w^{\prime}_{1}-x_{\tau})^{2}+(w^{\prime}_{2}-y_{\tau})^{2}}},

where w=(w1,w2,w3)w^{\prime}=(w^{\prime}_{1},w^{\prime}_{2},w^{\prime}_{3}), and where cτ=(xτ,yτ,zτ,κτ)c_{\tau}=(x_{\tau},y_{\tau},z_{\tau},\kappa_{\tau}) is the centerpoint of τ\tau, and

max{|w1w1|,|w2w2|,|w3w3|}\displaystyle\max\left\{|w_{1}-w_{1}^{\prime}|,\;|w_{2}-w_{2}^{\prime}|,\;|w_{3}-w_{3}^{\prime}|\right\} =2δ2\displaystyle=2\delta_{2}
max{|ξτξτ|,|ητητ|}\displaystyle\max\left\{|\xi_{\tau}-\xi^{\prime}_{\tau}|,\;|\eta_{\tau}-\eta^{\prime}_{\tau}|\right\} =2ε,\displaystyle=2{\varepsilon},

where

ξτ=ξF(cτ;w)andητ=ηG(cτ;w).\xi_{\tau}=\xi-F(c_{\tau};w)\quad\quad\text{and}\quad\quad\eta_{\tau}=\eta-G(c_{\tau};w).

By definition (of FF, GG, and the frame distance), we have

𝖿𝖽(v,𝐰)=max{|ξF(v;w)|,|ηG(v;w)|},{\sf fd}(v,{\bf w})=\max\left\{\left|\xi-F(v;w)\right|,\;\left|\eta-G(v;w)\right|\right\},

which we can bound by writing

|ξF(v;w)|\displaystyle\left|\xi-F(v;w)\right| |ξF(v;w)+F(cτ;w)F(cτ;w)|\displaystyle\leq\left|\xi-F(v;w^{\prime})+F(c_{\tau};w^{\prime})-F(c_{\tau};w)\right|
+|F(v;w)F(v;w)+F(cτ;w)F(cτ;w)|\displaystyle+\left|F(v;w)-F(v;w^{\prime})+F(c_{\tau};w^{\prime})-F(c_{\tau};w)\right|
=|ξτξτ|+|F(v;w)F(v;w)+F(cτ;w)F(cτ;w)|,\displaystyle=\left|\xi_{\tau}-\xi^{\prime}_{\tau}\right|+\left|F(v;w)-F(v;w^{\prime})+F(c_{\tau};w^{\prime})-F(c_{\tau};w)\right|,
|ηG(v;w)|\displaystyle\left|\eta-G(v;w)\right| |ηG(v;w)+G(cτ;w)G(cτ;w)|\displaystyle\leq\left|\eta-G(v;w^{\prime})+G(c_{\tau};w^{\prime})-G(c_{\tau};w)\right|
+|G(v;w)G(v;w)+G(cτ;w)G(cτ;w)|\displaystyle+\left|G(v;w)-G(v;w^{\prime})+G(c_{\tau};w^{\prime})-G(c_{\tau};w)\right|
=|ητητ|+|G(v;w)G(v;w)+G(cτ;w)G(cτ;w)|.\displaystyle=\left|\eta_{\tau}-\eta^{\prime}_{\tau}\right|+\left|G(v;w)-G(v;w^{\prime})+G(c_{\tau};w^{\prime})-G(c_{\tau};w)\right|.

We are given that

|ξτξτ|,|ητητ|2ε,\left|\xi_{\tau}-\xi^{\prime}_{\tau}\right|,\;\left|\eta_{\tau}-\eta^{\prime}_{\tau}\right|\leq 2{\varepsilon},

so it remains to bound the other term in each of the two right-hand sides. Consider for example the expression

F(v;w)F(v;w)+F(cτ;w)F(cτ;w).F(v;w)-F(v;w^{\prime})+F(c_{\tau};w^{\prime})-F(c_{\tau};w). (10)

Write cτ=v+tc_{\tau}=v+t and w=w+sw^{\prime}=w+s, for suitable vectors t,s3t,s\in{\mathbb{R}}^{3}. We expand the expression up to second order, by writing

F(v;w)\displaystyle F(v;w^{\prime}) =F(v;w+s)=F(v,w)+swF(v;w)+12sTHw(v;w)s\displaystyle=F(v;w+s)=F(v,w)+s\cdot\nabla_{w}F(v;w)+\frac{1}{2}s^{T}H_{w}(v;w)s
F(cτ;w)\displaystyle F(c_{\tau};w) =F(v+t;w)=F(v,w)+tvF(v;w)+12tTHv(v;w)t\displaystyle=F(v+t;w)=F(v,w)+t\cdot\nabla_{v}F(v;w)+\frac{1}{2}t^{T}H_{v}(v;w)t
F(cτ;w)\displaystyle F(c_{\tau};w^{\prime}) =F(v+t;w+s)=F(v,w)+swF(v;w)+tvF(v;w)\displaystyle=F(v+t;w+s)=F(v,w)+s\cdot\nabla_{w}F(v;w)+t\cdot\nabla_{v}F(v;w)
+12sTHw(v;w)s+12tTHv(v;w)t+tTHv;w(v;w)s,\displaystyle+\frac{1}{2}s^{T}H_{w}(v;w)s+\frac{1}{2}t^{T}H_{v}(v;w)t+t^{T}H_{v;w}(v;w)s,

where w\nabla_{w} (resp., v\nabla_{v}) denotes the gradient with respect to the variables ww (resp., vv), and where HwH_{w} (resp., HvH_{v}, Hv;wH_{v;w}) denotes the Hessian submatrix of second derivatives in which both derivatives are with respect to ww (resp., both are with respect to vv, one derivative is with respect to vv and the other is with respect to ww).

Substituting in (10), we get that, up to second order,

|F(v;w)F(v;w)\displaystyle|F(v;w)-F(v;w^{\prime}) +F(cτ;w)F(cτ;w)|\displaystyle+F(c_{\tau};w^{\prime})-F(c_{\tau};w)|
=|tTHv;w(v;w)s|Hv;w(v;w)|t||s|,\displaystyle=|t^{T}H_{v;w}(v;w)s|\leq\|H_{v;w}(v;w)\|_{\infty}|t||s|,

where Hv;w(v;w)\|H_{v;w}(v;w)\|_{\infty} is the maximum of the absolute values of all the “mixed” second derivatives. (Note that the mixed part of the Hessian of the Hessian arises also in the analysis of the algorithm in Section 4.) Arguing as in the preceding analysis and using the assumptions in the theorem, one can show that all these derivatives are bounded by some absolute constants, concluding that

|F(v;w)F(v;w)+F(cτ;w)F(cτ;w)|=O(δ1δ2)=O(ε),|F(v;w)-F(v;w^{\prime})+F(c_{\tau};w^{\prime})-F(c_{\tau};w)|=O(\delta_{1}\delta_{2})=O({\varepsilon}),

which implies that

|ξF(v;w)|=O(ε).\left|\xi-F(v;w)\right|=O({\varepsilon}).

Applying an analogous analysis to GG, we also have

|ηG(v;w)|=O(ε).\left|\eta-G(v;w)\right|=O({\varepsilon}).

Together, these bounds complete the proof of part (b) of the theorem.

Proof A.3 (Proof of Lemma 4.1:).

Fix jj, consider the function

Kj(𝐱):=Gj(𝐱;𝐬)Gj(𝐱;𝐭),K_{j}({\bf x}):=G_{j}({\bf x};{\bf s})-G_{j}({\bf x};{\bf t}),

and recall our assumption that Gj(ξ;𝐭)=Gj(ξ;𝐬)=ξjG_{j}(\xi;{\bf t})=G_{j}(\xi;{\bf s})=\xi_{j}. Then we can also write the left-hand side of (5) as Kj(𝐱)Kj(ξ)K_{j}({\bf x})-K_{j}({\bf\xi}). By the intermediate value theorem, it can be written as

Kj(𝐱)Kj(ξ)=Kj(𝐱),𝐱ξ,K_{j}({\bf x})-K_{j}({\bf\xi})=\langle\nabla K_{j}({\bf x}^{\prime}),{\bf x}-{\bf\xi}\rangle,

for some intermediate value 𝐱{\bf x}^{\prime} (that depends on 𝐬{\bf s} and 𝐭{\bf t}). By definition, we have,

Kj(𝐱)=𝐱Gj(𝐱;𝐬)𝐱Gj(𝐱;𝐭),\nabla K_{j}({\bf x}^{\prime})=\nabla_{\bf x}G_{j}({\bf x}^{\prime};{\bf s})-\nabla_{\bf x}G_{j}({\bf x}^{\prime};{\bf t}), (11)

whose norm is bounded by c2|st|c_{2}|s-t| by Condition (ii). Using the Cauchy-Schwarz inequality, we can thus conclude that

|Gj(𝐱;𝐬)Gj(𝐱;𝐭)|\displaystyle\left|G_{j}({\bf x};{\bf s})-G_{j}({\bf x};{\bf t})\right| =|Kj(𝐱)Kj(ξ)|\displaystyle=\left|K_{j}({\bf x})-K_{j}({\bf\xi})\right|
|𝐱ξ||𝐱Gj(𝐱;𝐬)𝐱Gj(𝐱;𝐭)|\displaystyle\leq|{\bf x}-{\bf\xi}|\cdot\left|\nabla_{\bf x}G_{j}({\bf x}^{\prime};{\bf s})-\nabla_{\bf x}G_{j}({\bf x}^{\prime};{\bf t})\right|
c2|𝐱ξ||𝐭𝐬|,\displaystyle\leq c_{2}|{\bf x}-{\bf\xi}||{\bf t}-{\bf s}|\ ,

as asserted.

Proof A.4 (Proof of Lemma 4.4:).

Each surface σ𝐭,𝐟\sigma_{{\bf t},{\bf f}^{\prime}} in S~τ\tilde{S}_{\tau} meets τ\tau. That is, there exists a point (x1,,xd)(x_{1},\ldots,x_{d}) in τ=[0,δ]d\tau=[0,\delta]^{d} that lies on σ𝐭,𝐟\sigma_{{\bf t},{\bf f}^{\prime}}, so we have ξjGj(𝐱;𝐭)+fjξj+δ\xi_{j}\leq G_{j}({\bf x};{\bf t})+f^{\prime}_{j}\leq\xi_{j}+\delta for each j=k+1,,dj=k+1,\ldots,d, where 𝐱=(x1,,xk){\bf x}=(x_{1},\ldots,x_{k}). Hence, for some intermediate value 𝐱{\bf x}^{\prime}, we have

|fj|\displaystyle\left|f^{\prime}_{j}\right| =|Gj(ξ;𝐭)fjGj(𝐱;𝐭)+Gj(𝐱;𝐭)Gj(ξ;𝐭)|\displaystyle=\left|G_{j}({\bf\xi};{\bf t})-f^{\prime}_{j}-G_{j}({\bf x};{\bf t})+G_{j}({\bf x};{\bf t})-G_{j}({\bf\xi};{\bf t})\right|
|ξj(fj+Gj(𝐱;𝐭))|+|Gj(𝐱;𝐭)Gj(ξ;𝐭)|\displaystyle\leq\left|\xi_{j}-(f^{\prime}_{j}+G_{j}({\bf x};{\bf t}))\right|+\left|G_{j}({\bf x};{\bf t})-G_{j}({\bf\xi};{\bf t})\right|
δ+|𝐱Gj(𝐱;𝐭)||𝐱ξ|\displaystyle\leq\delta+\left|\nabla_{\bf x}G_{j}({\bf x}^{\prime};{\bf t})\right|\cdot|{\bf x}-{\bf\xi}|
δ+c1δ=(c1+1)δ,\displaystyle\leq\delta+c_{1}\delta=(c_{1}+1)\delta,

where the first inequality follows by the triangle inequality, the second follows since |ξj(fj+Gj(𝐱;𝐭))|δ\left|\xi_{j}-(f^{\prime}_{j}+G_{j}({\bf x};{\bf t}))\right|\leq\delta, the third by the intermediate value theorem and the Cauchy-Schwarz inequality, and the fourth by Condition (i).

Appendix B A simple algorithm

We present a simple naive solution which does not require any of the sophisticated machinery developed in this work. It actually turns out to be the most efficient solution when nn is small.

We construct a grid GG over Q=[0,1]3×[1,1]Q=[0,1]^{3}\times[-1,1], of cells τ\tau, each of dimensions ε×ε×22ε×2cε{\varepsilon}\times{\varepsilon}\times 2\sqrt{2}{\varepsilon}\times 2c{\varepsilon}, where cc is the constant of Lemma 2.2. (We use this non-square grid GG since we want to find ε{\varepsilon}-approximate incidences in terms of frame distance.) For each cell τ\tau of GG we compute the number of surfaces σ𝐰\sigma_{\bf w} that intersect τ\tau.

Consider now a shifted version GG^{\prime} of GG in which the vertices of GG^{\prime} are the centers of the cells of GG. To report how many surfaces are within frame distance ε{\varepsilon} from a vertex qGq\in G^{\prime}, we return the count of the cell of GG whose center is qq. By Lemma 2.2 and Lemma 2.1, this includes all surfaces at frame distance ε{\varepsilon} from qq, but may also count surfaces at frame distance at most 10+4c2βε\sqrt{10+4c^{2}}\beta{\varepsilon} from qq, where β\beta is the constant in Lemma 2.1. (The distance from qq to the farthest corner of its cell is 12+12+(2c)2+(22)2ε=10+4c2βε\sqrt{1^{2}+1^{2}+(2c)^{2}+(2\sqrt{2})^{2}}{\varepsilon}=\sqrt{10+4c^{2}}\beta{\varepsilon}.)

It takes O(nε2)O(\frac{n}{{\varepsilon}^{2}}) time to construct this data structure. Indeed, cell boundaries reside on O(1ε)O(\frac{1}{{\varepsilon}}) hyperplanes, so we compute the intersection curve of each surface with each of these hyperplanes, in a total of O(nε)O(\frac{n}{{\varepsilon}}) time. Then, for each such curve we find the cell boundaries that it intersects within its three-dimensional hyperplane in O(1ε)O(\frac{1}{{\varepsilon}}) time. We summarize this result in the following theorem.

Theorem B.1.

The algorithm described above approximates the the number of surfaces that are at distance ε{\varepsilon} to each vertex qGq\in G^{\prime} where GG^{\prime} is an ε×ε×22ε×2cε{\varepsilon}\times{\varepsilon}\times 2\sqrt{2}{\varepsilon}\times 2c{\varepsilon} grid in O(nε2)O(\frac{n}{{\varepsilon}^{2}}). (The approximation is in the sense defined above.)

Proof B.2.

Correctness follow from Lemmas 2.2 and 2.1, the running time follows since there are only O(nε2)O(\frac{n}{{\varepsilon}^{2}}) cells of GG that at least one surface intersects.

In fact we can find for each vertex qq of GG^{\prime}, the exact number of ε{\varepsilon}-incident surfaces (i.e. surfaces at distance at most ε{\varepsilon} from qq). For this we keep with each cell τ\tau of GG, the list of the surfaces that intersect τ\tau. Then for each vertex qGq\in G^{\prime} we traverse the surfaces stored in its cell and check which of them is within frame distance ε{\varepsilon} from qq. The asymptotic running time is still O(nε2)O(\frac{n}{{\varepsilon}^{2}}).

If we want to get an incidences counts of vertices of a finer grid that GG, we use a union of several shifted grids as above. This also allows to construct a data structure that can return an ε{\varepsilon}-incidences count of any query point.

For the camera pose problem we use the vertex of GεG_{\varepsilon} of largest ε{\varepsilon}-incidences count as the position of the camera.

Appendix C Geometric proximity via canonical surfaces: The case of hyperplanes

We have a set HH of nn hyperplanes in d{\mathbb{R}}^{d} that cross the unit cube τ0=[0,1]d\tau_{0}=[0,1]^{d}, and a given error parameter ε{\varepsilon}. Each hyperplane hHh\in H is given by an equation of the form xd=i=1d1aixi+bx_{d}=\sum_{i=1}^{d-1}a_{i}x_{i}+b. We assume, for simplicity, that |ai|1|a_{i}|\leq 1 for each hHh\in H and for each i=1,,d1i=1,\ldots,d-1. Moreover, since hh crosses τ0\tau_{0}, we have |b|d|b|\leq d, as is easily checked. (This can always be enforced by rewriting the equation turning the xix_{i} with the coefficient aia_{i} of largest absolute value into the independent coordinate.)

For our rounding scheme we define ε=ε/log(1/ε){\varepsilon}^{\prime}={\varepsilon}/\log(1/{\varepsilon}). We discretize each hyperplane hHh\in H as follows. Let the equation of hh be xd=i=1d1aixi+bx_{d}=\sum_{i=1}^{d-1}a_{i}x_{i}+b. We replace each aia_{i} by the integer multiple of ε/d{\varepsilon}^{\prime}/d that is nearest to it, and do the same for bb. Denoting these ‘snapped’ values as aia^{\prime}_{i} and bb^{\prime}, respectively, we replace hh by the hyperplane hh^{\prime}, given by xd=i=1d1aixi+bx_{d}=\sum_{i=1}^{d-1}a^{\prime}_{i}x_{i}+b^{\prime}. For any 𝐱=(x1,,xd1)[0,1]d1{\bf x}=(x_{1},\ldots,x_{d-1})\in[0,1]^{d-1}, the xdx_{d}-vertical distance between hh and hh^{\prime} at 𝐱{\bf x} is

|(i=1d1aixi+b)(i=1d1aixi+b)|i=1d1|aiai|xi+|bb|i=1d1|aiai|+|bb|ε.\left|\left(\sum_{i=1}^{d-1}a_{i}x_{i}+b\right)-\left(\sum_{i=1}^{d-1}a^{\prime}_{i}x_{i}+b^{\prime}\right)\right|\leq\sum_{i=1}^{d-1}|a_{i}-a^{\prime}_{i}|x_{i}+|b-b^{\prime}|\leq\sum_{i=1}^{d-1}|a_{i}-a^{\prime}_{i}|+|b-b^{\prime}|\leq{\varepsilon}^{\prime}.

We define the weight of each canonical hyperplane to be the number of original hyperplanes that got rounded to it, and we refer to the set of all canonical hyperplanes by HcH^{c}.

We describe a recursive procedure that approximates the number of ε{\varepsilon}-incident hyperplanes of HH to each vertex of a (4ε)(4{\varepsilon})-grid GG that tiles up [0,1]d[0,1]^{d}. Specifically, for each vertex vv of GG we report a count that includes all hyperplanes in HH that are at Euclidean distance at most ε{\varepsilon} from vv but it may also count hyperplanes of HH that are at distance up to (2d+1)ε(2\sqrt{d}+1){\varepsilon} from vv.

Our procedure constructs an octree decomposition of τ0\tau_{0}, all the way to subcubes of side length 4ε4{\varepsilon}. (We assume that 4ε4{\varepsilon} is a negative power of 22 to avoid rounding issues.) We shift the grid GG such that its vertices are centers of these leaf-subcubes. At level jj of the recursive construction, we have subcubes τ\tau of side length δ=1/2j\delta=1/2^{j}. For each such τ\tau we construct a set HτH_{\tau} of more coarsely rounded hyperplanes. The weight of each hyperplane hh in HτH_{\tau} is the sum of the weights of the hyperplanes in the parent cube p(τ)p(\tau) of τ\tau that got rounded to hh, which, by induction, is the number of original hyperplanes that are rounded to it (by repeated rounding along the path in the recursion tree leading to τ\tau).

At the root, where j=0j=0, we set Hτ=HcH_{\tau}=H^{c} (where each hHτh\in H_{\tau} has the initial weight of the number of original hyperplanes rounded to it, as described above). At any other cell τ\tau we obtain HτH_{\tau} by applying a rounding step to the set H~τ\tilde{H}_{\tau} of the hyperplanes of Hp(τ)H_{p(\tau)} that intersect τ\tau.

The coarser discretization of the hyperplanes of H~τ\tilde{H}_{\tau} that produces the set HτH_{\tau} proceeds as follows. Let (ξ1,,ξd)(\xi_{1},\ldots,\xi_{d}) denote the coordinates of the corner of τ\tau with smallest coordinates, so τ=i=1d[ξi,ξi+δ]\tau=\prod_{i=1}^{d}[\xi_{i},\xi_{i}+\delta].

Let hh be a hyperplane of HτH_{\tau}, and rewrite its equation as

xdξd=i=1d1ai(xiξi)+b.x_{d}-\xi_{d}=\sum_{i=1}^{d-1}a_{i}(x_{i}-\xi_{i})+b\ .

This rewriting only changes the value of bb but does not affect the aia_{i}’s. Since hh crosses τ\tau, we have |b|dδ|b|\leq d\delta (and |ai|1|a_{i}|\leq 1 for each ii). We now re-discretize each coefficient aia_{i} (resp., bb) to the integer multiple of εdδ\frac{{\varepsilon}^{\prime}}{d\delta} (resp., εd\frac{{\varepsilon}^{\prime}}{d}) that is nearest to it. Denoting these snapped values as aia^{\prime}_{i} and bb^{\prime}, respectively, we replace hh by the hyperplane hh^{\prime} given by

xdξd=i=1d1ai(xiξi)+b.x_{d}-\xi_{d}=\sum_{i=1}^{d-1}a^{\prime}_{i}(x_{i}-\xi_{i})+b^{\prime}.

This re-discretization of the coefficients aia_{i} is a coarsening of the discretization of the hyperplanes in H~τ\tilde{H}_{\tau}. The set HτH_{\tau} contains all the new, more coarsely rounded hyperplanes that we obtain from the hyperplanes in H~τ\tilde{H}_{\tau} in this manner. Note that several hyperplanes in H~τ\tilde{H}_{\tau} may be rounded to the same hyperplane in HτH_{\tau}. We set the weight of each hyperplane in HτH_{\tau} to be the sum of the weights of the hyperplanes in H~τ\tilde{H}_{\tau} that got rounded to it. (Note that although every hyperplane of H~τ\tilde{H}_{\tau} crosses τ\tau, such an hh may get rounded to a hyperplane that misses τ\tau, in which case it is not represented by any hyperplane in HτH_{\tau}.)

For any 𝐱=(x1,,xd1)i=1d1[ξi,ξi+δ]{\bf x}=(x_{1},\ldots,x_{d-1})\in\prod_{i=1}^{d-1}[\xi_{i},\xi_{i}+\delta], the xdx_{d}-vertical distance between hh and hh^{\prime} at 𝐱{\bf x} is

|(ξd+i=1d1ai(xiξi)+b)(ξd+i=1d1ai(xiξi)+b)|\displaystyle\Bigl{|}\Bigl{(}\xi_{d}+\sum_{i=1}^{d-1}a_{i}(x_{i}-\xi_{i})+b\Bigr{)}-\Bigl{(}\xi_{d}+\sum_{i=1}^{d-1}a^{\prime}_{i}(x_{i}-\xi_{i})+b^{\prime}\Bigr{)}\Bigr{|}
i=1d1|aiai|(xiξi)+|bb|i=1d1|aiai|δ+|bb|ε.\displaystyle\leq\sum_{i=1}^{d-1}|a_{i}-a^{\prime}_{i}|(x_{i}-\xi_{i})+|b-b^{\prime}|\leq\sum_{i=1}^{d-1}|a_{i}-a^{\prime}_{i}|\delta+|b-b^{\prime}|\leq{\varepsilon}^{\prime}.

Since the original value of aia_{i} is in [1,1][-1,1] and we round it to an integer multiple of εdδ\frac{{\varepsilon}^{\prime}}{d\delta}, the hyperplanes in HτH_{\tau} have O(δε)O(\frac{\delta}{{\varepsilon}^{\prime}}) possible values for each coefficient aia_{i}. Furthermore, these hyperplanes also have O(δε)O(\frac{\delta}{{\varepsilon}^{\prime}}) possible values for bb, because |b|δd|b|\leq\delta d for every hyperplane in H~τ\tilde{H}_{\tau} (since it intersects τ\tau). It follows that |Hτ|=O((δε)d)|H_{\tau}|=O\left(\left(\frac{\delta}{{\varepsilon}^{\prime}}\right)^{d}\right), and the total size of all sets HτH_{\tau}, over all cells τ\tau at the same level of the octree, is O((1ε)d)O\left(\left(\frac{1}{{\varepsilon}^{\prime}}\right)^{d}\right).

Finally, at every leaf τ\tau of the octree we report the sum of the weights of the hyperplanes in HτH_{\tau} as the approximate ε{\varepsilon}-incidences count of the vertex of GG at the center of τ\tau.

Theorem C.1.

Let HH be a set of nn hyperplanes in d{\mathbb{R}}^{d} that cross the unit cube [0,1]d[0,1]^{d}, and let GG be the (4ε)(4{\varepsilon})-grid within [0,1]d[0,1]^{d}. The algorithm described above reports for each vertex vv of GG an approximate ε{\varepsilon}-incidences count that includes all hyperplanes at Euclidean distance at most ε{\varepsilon} from vv and may include some hyperplanes at distance at most (2d+1)ε(2\sqrt{d}+1){\varepsilon} from vv. The running time of this algorithm is O(n+(log(1/ε))d+1εd)O\left(n+\frac{(\log(1/{\varepsilon}))^{d+1}}{{\varepsilon}^{d}}\right).

Proof C.2.

Let vGv\in G, and consider a hyperplane hHh\in H at distance at most ε{\varepsilon} from vv. The hyperplane hh is rounded to a hyperplane hHch^{\prime}\in H^{c} which is at distance at most ε=εlog(1/ε){\varepsilon}^{\prime}=\frac{{\varepsilon}}{\log(1/{\varepsilon})} from hh,333The distance between two hyperplanes is defined to be the maximum vertical distance between them. and thereby at distance at most ε+ε{\varepsilon}+{\varepsilon}^{\prime} from vv. The hyperplane hh^{\prime} is further rounded to other hyperplanes while propagating down the octree. The distance from hh^{\prime} from the hyperplane that it is rounded to in HcH^{c} is at most εlog(1/ε){{\varepsilon}}{\log(1/{\varepsilon})}, and, in general the distance of hh^{\prime} from any hyperplane hjh_{j}, that it is rounded to at any level jj, is at most (j+1)ε=(j+1)εlog(1/ε)(j+1){\varepsilon}^{\prime}=\frac{(j+1){\varepsilon}}{\log(1/{\varepsilon})}. (Note that hh^{\prime} is rounded to different hyperplanes in different cells of level jj.) Therefore the distance of hjh_{j} from vv is at most ε+(j+1)εlog(1/ε)2ε{\varepsilon}+\frac{(j+1){\varepsilon}}{\log(1/{\varepsilon})}\leq 2{\varepsilon} (since j+1log(1/ε)j+1\leq\log(1/{\varepsilon})). It follows that hh^{\prime} is rounded to some hyperplane that crosses the cell that contains vv, at each level of the octree. In particular hh^{\prime} is (repeatedly) rounded to some hyperplane at the leaf containing vv and is included in the weight of some hyperplane at that leaf.

Consider now a hyperplane hHh\in H that is rounded to some hyperplane hh^{\ell} at the leaf τ\tau containing vv. The hyperplane hh^{\ell} is at distance at most ε{\varepsilon} from hh. Therefore hh is at distance at most ε{\varepsilon} from the boundary of the leaf-cell containing vv. The distance of vv to the boundary of the leaf-cell containing it is at most 2dε2\sqrt{d}{\varepsilon}, so the distance of hh from vv is at most (2d+1)ε(2\sqrt{d}+1){\varepsilon}.

The running time follows from the fact that the total size of the sets HτH_{\tau} for all cells τ\tau at a particular level of the quadtree is O((log(1/ε))dεd)O\left(\frac{(\log(1/{\varepsilon}))^{d}}{{\varepsilon}^{d}}\right) and there are log(1/(4ε))\log(1/(4{\varepsilon})) levels.

Note that if we consider an arbitrary point pp then each hyperplane at distance at most ε{\varepsilon} from pp is included in the approximate count of at least one of the vertices of the grid GG surrounding pp. In this rather weak sense, the largest approximate incidences count of a vertex of GG can be considered as an approximation to the number of ε{\varepsilon}-close hyperplanes to the point pdp\in{\mathbb{R}}^{d} with the largest number of ε{\varepsilon}-close hyperplanes.

Our octree data structure can give an approximate ε{\varepsilon}-incidences count for any query point qq (albeit with somewhat worse constants). For this we construct a constant number of octree structures over 5d5^{d} shifted (by intergral multiple of ε{\varepsilon}) grids of a somewhat larger side-length, say 5ε5{\varepsilon}. The grids are shifted such that each cell cc of a finer grid of side length ε{\varepsilon} is centered in a larger grid cell of one of our grids, say GcG_{c}. We use GcG_{c} to answer queries qq that lie in cc, by returning the sum of the weights of the hyperplanes in hτh_{\tau} where τ\tau is the leaf of GcG_{c} containing qq.

We can also modify this data structure such that it can answer ε{\varepsilon}-incidences queries exactly. That is, given a query point qq, it can count (or report) the number of hyperplanes at distance at most ε{\varepsilon} from qq and only these hyperplanes. To do this we maintain pointers from each hyperplane hh in HτH_{\tau} to the hyperplanes in Hp(τ)H_{p(\tau)} that got rounded to hh. To answer a query qq, we find the leaf cell τ\tau containing qq and then we traverse back the pointers of the hyperplanes of HτH_{\tau} all the way up the octree to identify the original hyperplanes that were rounded to them. We then traverse this set of original hypeprlanes and count (or report) those that are at distance at most ε{\varepsilon} from qq.