Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation ${}^{\text{*}}$

Hiroya Sato${}^{\text{1,\,2}}$, Takuya Ikeda${}^{\text{1}}$, and Koichi Nishiwaki${}^{\text{1}}$

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

${}^{\text{*}}$ This is a contribution in Woven Planet Holdings, Inc. Part of the work is the result of Summer Internship Program.

${}^{\text{1}}$ All authors are with the Woven Planet Holdings, Inc. 3 Chome-2-1 Nihonbashimuromachi, Chuo City, Tokyo, 103-0022, Japan, [firstname.lastname]@woven-planet.global

${}^{\text{2}}$ Hiroya Sato is with Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan. h-sato@jsk.t.u-tokyo.ac.jp

Abstract

In recent years, deep learning frameworks have been widely used for object pose estimation. While the quaternion is a common choice for the rotation representation of a 6D pose, it cannot represent the uncertainty of an observation. To handle this uncertainty, the Bingham distribution is a promising solution because it has suitable features, such as a smooth representation over $\mathop{\mathrm{SO}}(3)$ , in addition to the ability to represent ambiguity. However, it requires the complex computation of a normalizing constant, which is the bottleneck of loss computation when training neural networks based on the Bingham representation. We therefore propose a fast-computable and easy-to-implement loss function for the Bingham distribution. We also examine the parametrization of the Bingham distribution and present an application based on our loss function.

I INTRODUCTION

Recently, there have been many research efforts on pose estimation based on deep learning frameworks, such as [1, 2]. In these works, the quaternion is widely used as the rotation representation. However, since a single quaternion can only represent a single rotation, it cannot capture the uncertainty of an observation. Handling uncertainty is important, especially when a target object is occluded or has a symmetric shape [3, 4].

Many researchers have considered how to represent the ambiguity of rotations. One way is to use the Bingham distribution [5], which has two main advantages. Firstly, the Bingham distribution is a probability distribution that is consistent with the spatial rotation group $\mathop{\mathrm{SO}}(3)$ (described in detail in Section III-B) and is easy to parametrize. Secondly, a continuous representation, which is suitable for a neural network, can be derived from this distribution; Peretroukhin et al. [6] give one example. Because of these characteristics, we choose the Bingham distribution as our probabilistic rotation representation.

To optimize a probability distribution, the negative log-likelihood (NLL) is, in general, a common choice of loss function. Since the NLL is defined in a mathematically natural way, many theoretical results are available for it. Moreover, it has the potential for wide application because it can easily be extended to more complicated models, such as mixture models [7, 8].

Figure 1: Inference sample of our Bingham model. The upper figure shows that the handle is visible and the corresponding distribution is low-variance, while the lower figure shows that the mug’s handle is occluded and the distribution is widely spread.

The Bingham distribution, however, has a normalizing constant that is difficult to calculate, which is the key bottleneck of Bingham NLL computation. Since the normalizing constant depends on the distribution’s parameters, we must compute it for every parameter value encountered during optimization. To the best of our knowledge, one common solution is to prepare a pre-computed table of the constants so that the repeated, complicated calculation during optimization can be avoided. However, it takes time to create the table.

In this paper, we introduce a fast-computable and easy-to-implement NLL function for the Bingham distribution, which is based on a novel algorithm by Chen et al. [9]. We also examine the parametrization of the Bingham distribution. Furthermore, we show its application to object pose estimation in Section V and demonstrate how easily our method can be applied to an existing pose estimator.

II RELATED WORKS

II-A Continuous Rotation Representations

4-dimensional rotation representations are widely used. In particular, the quaternion is a popular representation; it is used, for example, in PoseCNN [1], PoseNet [10], and 6D-VNet [11]. Another 4-dimensional representation is the axis-rotation representation, introduced, for example, in MapNet [12]. Although these 4D representations are valid in some cases, it is known that every $d$ -dimensional ( $d<5$ ) rotation representation is “discontinuous” (in the sense of [13]), because the 3-dimensional real projective space $\mathbb{R}P^{3}(\cong\mathop{\mathrm{SO}}(3))$ cannot be embedded in $\mathbb{R}^{d}$ unless $d\geq 5$ [14]. Because these representations have singular points that prevent the network from regressing stably, a continuous rotation representation is preferable when training neural networks.

Researchers have proposed various higher-dimensional representations. For example, a 9D representation based on the singular value decomposition (SVD) was proposed in [15]. A 10D representation was proposed in [6] using a 4-dimensional symmetric matrix, which can be used to parametrize the Bingham distribution. We will examine other continuous parametrizations in Section II-B.

II-B Expressions of Rotational Ambiguity

There are several ways to express rotational ambiguity. The authors of [3, 16] captured the ambiguity by using multiple quaternions and minimizing a special loss function. KOSNet [4] uses three parameters to describe the camera’s rotation (elevation, azimuth, and rotation around the optical axis) and employs a Gaussian distribution to describe their ambiguity. These two representations, however, are discontinuous, as described in Section II-A, since their dimensions are both less than 5.

Another approach is to use the Bingham distribution. This distribution is easy to parametrize and has been used in the pose estimation field. It has been used as the model distribution of a Bayesian filter to realize online pose estimation [17], for visual self-localization [7], for multiview fusion [8], and for describing the pose ambiguity of objects with symmetric shapes [18]. Moreover, this distribution admits a continuous parametrization; Peretroukhin et al. [6] give an example. We adopt the Bingham distribution as our probabilistic rotation representation.

II-C Loss Functions for Bingham Representation

An NLL loss function is the common choice and has the advantages described in the Introduction. When computing the Bingham loss, the main barrier is the computation of the normalizing constant. One solution to this problem is to take the time to pre-compute a table of normalizing constants. For example, Kume et al. [19] use the saddlepoint approximation technique to construct this table. However, even with a pre-computed table, we also need to implement a smooth interpolation function, as described in [18], to compute constants missing from the table and the derivative of each constant for backpropagation. Thus, it is desirable to use the Bingham NLL loss without a pre-computed table.

Instead of using the Bingham NLL loss, another loss function is defined in [6]. The authors defined the QCQP loss to avoid these troublesome computations. While it shows moderate performance by calculating a uni-modal Bingham loss implicitly, it is difficult to extend to more complicated models such as mixture models [7, 8], because the normalizing constant is required to mix multiple Bingham distributions. In contrast, our Bingham loss can easily be extended to mixture models.

III Rotation Representations

III-A Quaternion and Spatial Rotation

III-A1 Quaternion

We introduce symbols $i,j,k$ that satisfy the following property:

i^{2}=j^{2}=k^{2}=ijk=-1.

(1)

A quaternion is an expression of the form:

q=w+xi+yj+zk

(2)

where $w,x,y,z$ are real numbers. $i,j,k$ are called the imaginary units of the quaternion. The set of quaternions forms a 4D vector space whose basis is $\{1,i,j,k\}$ . Therefore, we identify a quaternion $q$ defined in (2) with

\bm{q}=(w,x,y,z)^{\top}\in\mathbb{R}^{4}.

(3)

III-A2 Product of Quaternions

For any quaternion $q^{\prime}=a+bi+cj+dk$ , we can define a product of quaternions $q^{\prime}q$ thanks to the rule (1). The set of nonzero quaternions forms a group under this multiplication. $q^{\prime}q$ can also be identified with an element of $\mathbb{R}^{4}$ . We denote it $\bm{q}^{\prime}\odot\bm{q}\in\mathbb{R}^{4}$ . Note that $q^{\prime}q\neq qq^{\prime}$ in general. Since $\bm{q}^{\prime}\odot\bm{q}$ is bilinear w.r.t. $\bm{q}^{\prime}$ and $\bm{q}$ , we can define matrices $\mathop{\mathrm{\Omega_{L}}}(\bm{q}^{\prime})$ and $\mathop{\mathrm{\Omega_{R}}}(\bm{q})$ satisfying

\bm{q}^{\prime}\odot\bm{q}=\mathop{\mathrm{\Omega_{L}}}(\bm{q}^{\prime})\bm{q}=\mathop{\mathrm{\Omega_{R}}}(\bm{q})\bm{q}^{\prime}.

(4)

$\mathop{\mathrm{\Omega_{L}}}(\bm{q}^{\prime})$ and $\mathop{\mathrm{\Omega_{R}}}(\bm{q})$ can be written in closed form:

	$\displaystyle\mathop{\mathrm{\Omega_{L}}}(\bm{q}^{\prime})$	$\displaystyle=\begin{pmatrix}a&-b&-c&-d\\ b&a&-d&c\\ c&d&a&-b\\ d&-c&b&a\end{pmatrix},$		(5)
	$\displaystyle\mathop{\mathrm{\Omega_{R}}}(\bm{q})$	$\displaystyle=\begin{pmatrix}w&-x&-y&-z\\ x&w&z&-y\\ y&-z&w&x\\ z&y&-x&w\end{pmatrix}.$		(6)
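As a minimal sketch of (4)-(6), the following NumPy snippet builds both matrices and compares them with a component-wise Hamilton product. The function names `omega_left`, `omega_right`, and `quat_mul` are illustrative choices made here and do not come from the paper.

```python
import numpy as np

def omega_left(q):
    """Matrix of left multiplication by q = (a, b, c, d), as in (5)."""
    a, b, c, d = q
    return np.array([[a, -b, -c, -d],
                     [b,  a, -d,  c],
                     [c,  d,  a, -b],
                     [d, -c,  b,  a]], dtype=float)

def omega_right(q):
    """Matrix of right multiplication by q = (w, x, y, z), as in (6)."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]], dtype=float)

def quat_mul(p, q):
    """Hamilton product p (.) q written out from the rule (1)."""
    a, b, c, d = p
    w, x, y, z = q
    return np.array([a*w - b*x - c*y - d*z,
                     a*x + b*w + c*z - d*y,
                     a*y - b*z + c*w + d*x,
                     a*z + b*y - c*x + d*w], dtype=float)

if __name__ == "__main__":
    p = np.array([1.0, 2.0, 3.0, 4.0])
    q = np.array([0.5, -1.0, 2.0, 0.25])
    # Both matrix forms in (4) should reproduce the same product.
    print(np.allclose(omega_left(p) @ q, quat_mul(p, q)))
    print(np.allclose(omega_right(q) @ p, quat_mul(p, q)))
```

Writing the product as a matrix-vector multiplication is convenient in a deep learning framework because it maps directly onto batched linear algebra.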

III-A3 Conjugate, Norm, and Unit Quaternion

The conjugate of $\bm{q}$ is defined by $\overline{\bm{q}}=(w,-x,-y,-z)^{\top}$ . In general, $\overline{\bm{q}\odot\bm{q}^{\prime}}=\overline{\bm{q}}^{\prime}\odot\overline{\bm{q}}$ . In particular, taking $\bm{q}^{\prime}=\overline{\bm{q}}$ shows that $\bm{q}\odot\overline{\bm{q}}$ equals its own conjugate and is therefore a real number:

\bm{q}\odot\overline{\bm{q}}=\overline{\bm{q}}\odot\bm{q}=w^{2}+x^{2}+y^{2}+z^{2}.

(7)

By definition (5) and (6), we get

\mathop{\mathrm{\Omega_{L}}}(\overline{\bm{q}})=\mathop{\mathrm{\Omega_{L}}}(\bm{q})^{\top},\quad\mathop{\mathrm{\Omega_{R}}}(\overline{\bm{q}})=\mathop{\mathrm{\Omega_{R}}}(\bm{q})^{\top}.

(8)

We define the norm of quaternion $\|\bm{q}\|$ as

\|\bm{q}\|=\sqrt{\bm{q}\odot\overline{\bm{q}}}=\sqrt{\overline{\bm{q}}\odot\bm{q}}.

(9)

We call $\bm{q}$ a unit quaternion if $\|\bm{q}\|=1$ . For a unit quaternion, its inverse coincides with its conjugate: $\bm{q}^{-1}=\overline{\bm{q}}$ . Using (8), we can see that $\mathop{\mathrm{\Omega_{L}}}(\bm{q})$ and $\mathop{\mathrm{\Omega_{R}}}(\bm{q})$ are both orthogonal:

\mathop{\mathrm{\Omega_{L}}}(\bm{q})^{\top}\mathop{\mathrm{\Omega_{L}}}(\bm{q})=\mathop{\mathrm{\Omega_{R}}}(\bm{q})^{\top}\mathop{\mathrm{\Omega_{R}}}(\bm{q})=I_{4}

(10)

where $\bm{q}$ is any unit quaternion, and $I_{4}$ is the 4-dimensional identity matrix.

We denote the set of unit quaternions $\mathbb{S}^{3}$ because it is homeomorphic to a 3-sphere $\mathbb{S}^{3}$ .

III-A4 Unit Quaternion and Spatial rotation

It is well known that unit quaternions can represent the spatial rotation. A mapping $R:\mathbb{S}^{3}\to\mathop{\mathrm{SO}}(3)$ defined below is in fact a group homomorphism:

\displaystyle R(\bm{q})=\left(\begin{array}[]{ccc}1-2y^{2}-2z^{2}&-2wz+2xy&2wy+2xz\\ 2wz+2xy&1-2x^{2}-2z^{2}&-2wx+2yz\\ -2wy+2xz&2wx+2yz&1-2x^{2}-2y^{2}\end{array}\right).

(14)

Crucially, antipodal unit quaternions represent the same rotation; namely, $R(-\bm{q})=R(\bm{q})$ .
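The map (14) and the antipodal property can be checked numerically with a short sketch. The function name `quat_to_rotmat` is an illustrative choice, and the input is assumed to be a unit quaternion in the order $(w,x,y,z)$ used in (3).

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix R(q) of (14) for a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*y*y - 2*z*z, -2*w*z + 2*x*y,     2*w*y + 2*x*z],
        [2*w*z + 2*x*y,      1 - 2*x*x - 2*z*z, -2*w*x + 2*y*z],
        [-2*w*y + 2*x*z,     2*w*x + 2*y*z,      1 - 2*x*x - 2*y*y],
    ], dtype=float)

if __name__ == "__main__":
    q = np.array([1.0, 2.0, 3.0, 4.0])
    q /= np.linalg.norm(q)                     # project onto the unit sphere
    R = quat_to_rotmat(q)
    print(np.allclose(R @ R.T, np.eye(3)))     # R is orthogonal
    print(np.allclose(quat_to_rotmat(-q), R))  # q and -q give the same rotation
```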

III-B Definition of Bingham Distribution and Its Properties

The Bingham distribution [5] is a probability distribution on the unit sphere $\mathbb{S}^{d-1}\subset\mathbb{R}^{d}$ with the property of antipodal symmetry, which is consistent with the antipodal property of unit quaternions. We set $d=4$ throughout this paper because we only consider $\mathbb{S}^{3}$ . We define the Bingham distribution as follows.

\mathfrak{B}(\bm{q};{D},\bm{\lambda})=\frac{1}{\mathcal{C}(\bm{\lambda})}\exp\left(\bm{q}^{\top}{D}\mathop{\mathrm{diag}}(\bm{\lambda}){D}^{\top}\bm{q}\right).

(15)

where $\bm{q}\in\mathbb{S}^{3},{D}\in{\mathrm{O}}(4),\bm{\lambda}\in\mathbb{R}^{4}$ . Here ${\mathrm{O}}(n)$ denotes the $n$ -dimensional orthogonal group. Note that it is a Lie group and its Lie algebra ${\mathfrak{o}}(n)$ is formed by $n$ -dimensional skew-symmetric matrices.

Here we define $\mathop{\mathrm{diag}}:\mathbb{R}^{m}\to\mathbb{R}^{m\times m}$ as below.

\mathop{\mathrm{diag}}:\begin{pmatrix}v_{1}\\ \vdots\\ v_{m}\end{pmatrix}\mapsto\begin{pmatrix}v_{1}&&\\ &\ddots&\\ &&v_{m}\end{pmatrix}

(16)

$\mathcal{C}(\bm{\lambda})$ is called a normalizing factor or a normalizing constant of a Bingham distribution $\mathfrak{B}(\bm{q};{D},\bm{\lambda})$ . $\mathcal{C}(\bm{\lambda})$ is defined as below:

\mathcal{C}(\bm{\lambda})=\int_{\bm{q}\in\mathbb{S}^{3}}\exp\left(\bm{q}^{\top}\mathop{\mathrm{diag}}(\bm{\lambda})\bm{q}\right)\mathrm{d}_{\mathbb{S}^{3}}(\bm{q})

(17)

where $\mathrm{d}_{\mathbb{S}^{3}}(\cdot)$ is the uniform measure on the $\mathbb{S}^{3}$ . Note that a normalizing factor depends only on $\bm{\lambda}$ . It is easy to check that for any $c\in\mathbb{R}$ ,

\mathfrak{B}(\bm{q};D,\bm{\lambda}+c)=\mathfrak{B}(\bm{q};D,\bm{\lambda})

(18)

where $\bm{\lambda}+c=(\lambda_{1}+c,\dots,\lambda_{4}+c)$ . Therefore, we can set $\bm{\lambda}$ satisfying

0=\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}\geq\lambda_{4}

(19)

by shifting $\bm{\lambda}$ and reordering the columns of $D$ if necessary. The processed $D$ and the processed $\bm{\lambda}$ are denoted by $D_{\text{shifted}}$ and $\bm{\lambda}_{\text{shifted}}$ , respectively. It follows directly from the Rayleigh quotient formula that

\arg\max_{\bm{q}\in\mathbb{S}^{3}}\mathfrak{B}(\bm{q};D,\bm{\lambda})=\bm{q}_{\lambda_{1}}

(20)

where $\bm{q}_{\lambda_{1}}$ is a column vector of $D$ corresponding to the maximum entry of $\bm{\lambda}$ . If we sort $\bm{\lambda}$ as (19), $\bm{q}_{\lambda_{1}}$ coincides with the left-most column vector of $D$ .
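The sort-and-shift step behind (19) and the mode extraction in (20) can be sketched as follows. The helper names `sort_and_shift` and `bingham_mode` are assumptions made for illustration; the sketch relies only on (18)-(20).

```python
import numpy as np

def sort_and_shift(D, lam):
    """Reorder the columns of D and shift lam so that 0 = l1 >= l2 >= l3 >= l4."""
    lam = np.asarray(lam, dtype=float)
    order = np.argsort(-lam)            # indices in descending order of lam
    lam_sorted = lam[order]
    return D[:, order], lam_sorted - lam_sorted[0]

def bingham_mode(D, lam):
    """Column of D that belongs to the largest entry of lam, as in (20)."""
    return D[:, int(np.argmax(lam))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # an arbitrary orthogonal matrix
    lam = np.array([-3.0, 1.0, -0.5, 0.0])
    D_s, lam_s = sort_and_shift(D, lam)
    print(lam_s)                                   # starts at 0 and is non-increasing
    print(np.allclose(D_s[:, 0], bingham_mode(D, lam)))
```

Because of (18), the shifted parameters describe the same distribution as the original ones; only the normalizing constant is rescaled.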

III-C Parametrization of Bingham Distribution

III-C1 Representaion using Symmetric Matrix

There are several choices for the parametrization of the Bingham distribution. Firstly, we introduce the 10D parametrization using a symmetric matrix, which was proposed in [6]. Since every symmetric matrix can be diagonalized by some orthogonal matrix, we can rewrite the distribution (15) as

\mathfrak{B}_{\text{sym}}(\bm{q};A)=\frac{1}{\mathcal{C}(\bm{\lambda})}\exp\left(\bm{q}^{\top}A\bm{q}\right),

(21)

where $A$ is a 4-dimensional symmetric matrix and $\bm{\lambda}$ denotes its eigenvalues. If the eigenvalues of $A$ are sorted and shifted so as to satisfy (19), we call the resulting matrix $A_{\text{shifted}}$ . In what follows, we assume that all parameters are shifted.

Here we define a bijective map $\mathop{\mathrm{triu}}:\mathrm{Sym}_{4}\to\mathbb{R}^{10}$ as below:

\mathop{\mathrm{triu}}:\begin{pmatrix}\theta_{1}&\theta_{2}&\theta_{3}&\theta_{4}\\ \theta_{2}&\theta_{5}&\theta_{6}&\theta_{7}\\ \theta_{3}&\theta_{6}&\theta_{8}&\theta_{9}\\ \theta_{4}&\theta_{7}&\theta_{9}&\theta_{10}\end{pmatrix}\mapsto\begin{pmatrix}\theta_{1}\\ \vdots\\ \theta_{10}\end{pmatrix}

(22)

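where $\mathrm{Sym}_{n}$ denotes the set of $n$ -dimensional symmetric matrices. We can use this map for the 10D parametrization $\mathcal{P}_{10}$ of the Bingham distribution as follows, where, with a slight abuse of notation, $\mathop{\mathrm{triu}}(\bm{\theta})$ for $\bm{\theta}\in\mathbb{R}^{10}$ denotes the symmetric matrix obtained by inverting the map (22):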

\mathcal{P}_{10}:\mathbb{R}^{10}\ni\bm{\theta}=\begin{pmatrix}\theta_{1}\\ \vdots\\ \theta_{10}\end{pmatrix}\mapsto\mathfrak{B}_{\text{sym}}\left(\bm{q};\mathop{\mathrm{triu}}(\bm{\theta})\right).

(23)

We call this representation Peretroukhin representation here.
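The bijection in (22) between symmetric matrices and 10D vectors can be sketched in a few lines. The names `vec_to_sym` and `sym_to_vec` are illustrative; the ordering of the entries follows the row-major upper-triangular layout shown in (22).

```python
import numpy as np

# Row-major indices of the upper triangle of a 4x4 matrix:
# (0,0), (0,1), (0,2), (0,3), (1,1), (1,2), (1,3), (2,2), (2,3), (3,3),
# which is the order theta_1, ..., theta_10 used in (22).
_IU = np.triu_indices(4)

def vec_to_sym(theta):
    """Build the 4x4 symmetric matrix whose upper triangle is theta (10 entries)."""
    A = np.zeros((4, 4))
    A[_IU] = np.asarray(theta, dtype=float)
    return A + np.triu(A, k=1).T        # mirror the strictly upper part

def sym_to_vec(A):
    """Read the 10 upper-triangular entries of a symmetric 4x4 matrix."""
    return np.asarray(A, dtype=float)[_IU]

if __name__ == "__main__":
    theta = np.arange(1.0, 11.0)
    A = vec_to_sym(theta)
    print(A)
    print(np.allclose(sym_to_vec(A), theta))
```

In a network, the 10 outputs of the last layer would play the role of `theta`; the eigenvalues of the resulting matrix are still needed to evaluate the normalizing constant.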

III-C2 Representations of Orthogonal Matrices

The Peretroukhin representation defined above is simple; however, it requires an eigenvalue decomposition to calculate the normalizing factor $\mathcal{C}(\bm{\lambda})$ , which has a high computational cost. It is therefore reasonable to let the network directly infer an orthogonal matrix $D$ and the diagonal entries $\bm{\lambda}$ , and then reconstruct $A=D\mathop{\mathrm{diag}}(\bm{\lambda})D^{\top}$ .

To parametrize $D\in{\mathrm{O}}(4)$ , we introduce the following two strategies. The first is the Cayley transformation [20], which is a commonly used representation of orthogonal matrices. It is expressed as follows:

\operatorname{cay}:\mathbb{R}^{6}\ni\bm{\theta}\mapsto\left(I-S(\bm{\theta})\right)^{-1}\left(I+S(\bm{\theta})\right)\in{\mathrm{O}}(4)

(24)

where $S:\mathbb{R}^{6}\to{\mathfrak{o}}(4)$ is defined as follows:

S:\begin{pmatrix}\theta_{1}\\ \vdots\\ \theta_{6}\end{pmatrix}\mapsto\begin{pmatrix}0&\theta_{1}&-\theta_{2}&\theta_{3}\\ -\theta_{1}&0&\theta_{4}&-\theta_{5}\\ \theta_{2}&-\theta_{4}&0&\theta_{6}\\ -\theta_{3}&\theta_{5}&-\theta_{6}&0\end{pmatrix}.

(25)

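Every matrix produced in this way is orthogonal with determinant $+1$ and does not have $-1$ as an eigenvalue, so this representation can express all rotation matrices except those with $-1$ as an eigenvalue. Since flipping the sign of a column of $D$ does not change $D\mathop{\mathrm{diag}}(\bm{\lambda})D^{\top}$ , restricting the determinant to $+1$ does not restrict the distributions that can be represented. We call this orthogonal matrix representation, $\operatorname{cay}$ , a Cayley representation.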
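A minimal sketch of (24) and (25) is given below. The names `skew4` and `cayley` are illustrative, and solving a linear system instead of forming the inverse explicitly is an implementation choice made here, not something prescribed by the text.

```python
import numpy as np

def skew4(theta):
    """Skew-symmetric matrix S(theta) of (25) built from six parameters."""
    t1, t2, t3, t4, t5, t6 = theta
    return np.array([[0.0,  t1, -t2,  t3],
                     [-t1, 0.0,  t4, -t5],
                     [ t2, -t4, 0.0,  t6],
                     [-t3,  t5, -t6, 0.0]])

def cayley(theta):
    """Cayley transform (I - S)^{-1} (I + S) of (24)."""
    S = skew4(theta)
    I = np.eye(4)
    # I - S is always invertible because the eigenvalues of S are purely imaginary.
    return np.linalg.solve(I - S, I + S)

if __name__ == "__main__":
    D = cayley([0.3, -1.2, 0.5, 2.0, -0.7, 0.1])
    print(np.allclose(D.T @ D, np.eye(4)))   # D is orthogonal
    print(np.linalg.det(D))                  # determinant of the result
```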

The second is to use the 4-dimensional representation defined by Birdal et al. [21]. Although some orthogonal matrices cannot be represented with four parameters, this representation works well in practice, for example in [7]. It uses the fact that the matrix representation (5) of a unit quaternion is orthogonal, as shown in (10). That is,

\operatorname{bir}:\mathbb{R}^{4}\ni\bm{\theta}=\begin{pmatrix}\theta_{1}\\ \vdots\\ \theta_{4}\end{pmatrix}\mapsto\mathop{\mathrm{\Omega_{L}}}(\bm{\theta})\in{\mathrm{O}}(4).

(26)

We call this orthogonal matrix representation $\operatorname{bir}$ a Birdal representation.

III-C3 Representations of Eigenvalues

The simplest choice for the representation of $\bm{\lambda}$ is to let the network infer $\bm{\lambda}\in\mathbb{R}^{4}$ directly and then sort and shift it so as to satisfy (19):

\Lambda_{4}:\mathbb{R}^{4}\ni\begin{pmatrix}\theta_{1}\\ \theta_{2}\\ \theta_{3}\\ \theta_{4}\end{pmatrix}\mapsto\begin{pmatrix}0\\ \lambda_{2}-\lambda_{1}\\ \lambda_{3}-\lambda_{1}\\ \lambda_{4}-\lambda_{1}\end{pmatrix}\in\mathbb{R}^{4}

(27)

where $\lambda_{i}$ $(i=1,\dots,4)$ is a permutation of $\{\theta_{1},\dots,\theta_{4}\}$ satisfying $\lambda_{1}\geq\dots\geq\lambda_{4}$ .

Another approach is to use the softplus function [22], defined as below:

\phi(x)=\log(1+\exp(x)).

(28)

Note that $\phi(x)>0$ for all $x\in\mathbb{R}$ . Using the softplus function, we can define a 3D representation as follows [7]:

\Lambda_{3}:\mathbb{R}^{3}\ni\begin{pmatrix}\theta_{1}\\ \theta_{2}\\ \theta_{3}\end{pmatrix}\mapsto\begin{pmatrix}0\\ -\phi(\theta_{1})\\ -\phi(\theta_{1})-\phi(\theta_{2})\\ -\phi(\theta_{1})-\phi(\theta_{2})-\phi(\theta_{3})\end{pmatrix}\in\mathbb{R}^{4}

(29)

The resulting tuple automatically satisfies (19).
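The two eigenvalue maps (27) and (29) can be sketched as follows. The function names are illustrative, and `np.logaddexp` is used here as a numerically stable way to evaluate the softplus (28); that choice is an assumption of this sketch.

```python
import numpy as np

def softplus(x):
    """phi(x) = log(1 + exp(x)) of (28), evaluated in a numerically stable way."""
    return np.logaddexp(0.0, x)

def lambda4(theta):
    """Map (27): sort four raw outputs in descending order and shift the top one to 0."""
    lam = np.sort(np.asarray(theta, dtype=float))[::-1]
    return lam - lam[0]

def lambda3(theta):
    """Map (29): cumulative sums of negated softplus values, with a leading 0."""
    steps = softplus(np.asarray(theta, dtype=float))
    return np.concatenate(([0.0], -np.cumsum(steps)))

if __name__ == "__main__":
    print(lambda4([1.0, 3.0, -2.0, 0.0]))
    print(lambda3([0.0, 0.0, 0.0]))
```

Both outputs satisfy the ordering (19). The difference is that `lambda3` enforces strictly decreasing values by construction, whereas `lambda4` may return repeated values.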

So far we have introduced two representations for $D$ and two for $\bm{\lambda}$ . Together with the 10D representation $\mathcal{P}_{10}$ , we now have five choices for the parametrization of the Bingham distribution:

\begin{array}[]{crr@{\,\,\mapsto\,\,}l}\bullet&\mathcal{P}_{10}:&\mathbb{R}^{10}\ni\bm{\theta}&\mathfrak{B}_{\text{sym}}\left(\bm{q};\mathop{\mathrm{triu}}(\bm{\theta})\right)\\ \bullet&\mathcal{P}_{4+3}:&\mathbb{R}^{4}\times\mathbb{R}^{3}\ni(\bm{d},\bm{\lambda})&\mathfrak{B}\left(\bm{q};\operatorname{bir}(\bm{d}),\Lambda_{3}(\bm{\lambda})\right)\\ \bullet&\mathcal{P}_{4+4}:&\mathbb{R}^{4}\times\mathbb{R}^{4}\ni(\bm{d},\bm{\lambda})&\mathfrak{B}\left(\bm{q};\operatorname{bir}(\bm{d}),\Lambda_{4}(\bm{\lambda})\right)\\ \bullet&\mathcal{P}_{6+3}:&\mathbb{R}^{6}\times\mathbb{R}^{3}\ni(\bm{d},\bm{\lambda})&\mathfrak{B}\left(\bm{q};\operatorname{cay}(\bm{d}),\Lambda_{3}(\bm{\lambda})\right)\\ \bullet&\mathcal{P}_{6+4}:&\mathbb{R}^{6}\times\mathbb{R}^{4}\ni(\bm{d},\bm{\lambda})&\mathfrak{B}\left(\bm{q};\operatorname{cay}(\bm{d}),\Lambda_{4}(\bm{\lambda})\right)\end{array}

(30)

Note that these representations are all continuous in the sense of [13]. We will compare these representations in Section V.

Algorithm 1 Our implementation of the loss function

1: function Integrator( $f_{\text{integrand}}$ , $\bm{\lambda}$ )
2:     $N_{\text{min}}\leftarrow 15$ ; $N\leftarrow 200$
3:     $r\leftarrow 2.5$ ; $\omega_{d}\leftarrow 0.5$
4:     Define $c$ as in (33); $d\leftarrow c/2$
5:     Define $h,p_{1},p_{2}$ as in (33)
6:     $S\leftarrow 0$
7:     for $n=-N-1,\dots,N$ do
8:         $S\leftarrow S+w(|nh|)\cdot f_{\text{integrand}}(nh,\bm{\lambda})\cdot e^{nh\sqrt{-1}}$
9:             $\triangleright$ $w$ is defined in (34)
10:     end for
11:     return the real part of $\pi e^{c}hS$
12: end function
13:
14: function BinghamLoss( $D$ , $\bm{\lambda}$ , $\bm{q}_{\text{gt}}$ )
15:     $D_{\text{shifted}}$ , $\bm{\lambda}_{\text{shifted}}$ $\leftarrow$ Sort&Shift( $D$ , $\bm{\lambda}$ )
16:     $A_{\text{shifted}}$ $\leftarrow$ $D_{\text{shifted}}\mathop{\mathrm{diag}}(\bm{\lambda}_{\text{shifted}})D_{\text{shifted}}^{\top}$
17:     $\mathcal{C}\leftarrow$ Integrator( $\mathcal{F}$ , $\bm{\lambda}_{\text{shifted}}$ ) $\triangleright$ see (38)
18:     return $-\bm{q}_{\text{gt}}^{\top}A_{\text{shifted}}\,\bm{q}_{\text{gt}}+\ln\mathcal{C}$
19: end function

IV EFFICIENT COMPUTATION OF LOSS FUNCTION

IV-A Definition of Loss Function

The negative log-likelihood function of the Bingham distribution can be written as follows:

\mathcal{L}(D,\bm{\lambda},\bm{q}_{\text{gt}})=-\bm{q}_{\text{gt}}^{\top}D\mathop{\mathrm{diag}}(\bm{\lambda})D^{\top}\,\bm{q}_{\text{gt}}+\ln\mathcal{C}(\bm{\lambda}).

(31)

Computing $\mathcal{C}(\bm{\lambda})$ was a hard problem until a highly efficient computation method was proposed in [9]. Our loss function is implemented mainly based on [9].

IV-B Calculation of Normalizing Constant and Its Derivative

The whole procedure is shown in Algorithm 1. Let $r,\omega_{d}$ be real numbers satisfying

r\geq 2\quad\text{and}\quad\frac{1}{r}\leq\omega_{d}\leq 1.

(32)

We chose $r=2.5,\,\omega_{d}=0.5$ here. Let $c,h,p_{1},p_{2}$ be defined as

\begin{array}[]{c}\displaystyle c=\frac{N_{\text{min}}\pi}{r^{2}(1+r)\omega_{d}},\quad h=\sqrt{\frac{2\pi d(1+r)}{\omega_{d}N}},\\[12.0pt] \displaystyle p_{1}=\sqrt{\frac{Nh}{\omega_{d}}},\quad p_{2}=\sqrt{\frac{\omega_{d}Nh}{4}},\end{array}

(33)

where $d$ is any positive number satisfying $d<c$ . We chose $d=c/2$ here. $N$ is a positive integer satisfying $N\geq N_{\text{min}}$ . One can choose $N_{\text{min}}$ arbitrarily; however, too small a value of $N_{\text{min}}$ may lead to unstable computation. We chose $N_{\text{min}}=15$ here.

We define a function $w$ parametrized by $p_{1},\,p_{2}$ in (33) as below.

w(x)=\frac{1}{2}\mathop{\mathrm{erfc}}\left(\frac{x}{p_{1}}-p_{2}\right),

(34)

where $\mathop{\mathrm{erfc}}$ is the complementary error function:

\mathop{\mathrm{erfc}}(x)=1-\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-t^{2}}dt.

(35)

Then we define

	$\displaystyle\mathcal{F}(t,\bm{\lambda})$	$\displaystyle=\prod^{4}_{k=1}\left(-\lambda_{k}+t\sqrt{-1}+c\right)^{-1/2},$	(36)
and, for each $i=1,\dots,4$ , we get
	$\displaystyle\frac{\partial\mathcal{F}}{\partial\lambda_{i}}(t,\bm{\lambda})$	$\displaystyle=\frac{1}{2}\left(-\lambda_{i}+t\sqrt{-1}+c\right)^{-1}\mathcal{F}(t,\bm{\lambda}).$	(37)

Now we can calculate the normalizing constant $\mathcal{C}$ as below

	$\displaystyle\mathcal{C}(\bm{\lambda})$	$\displaystyle=\pi e^{c}h\sum_{n=-N-1}^{N}w(\|nh\|)\,\mathcal{F}(nh,\bm{\lambda})\,e^{nh\sqrt{-1}},$		(38)
	$\displaystyle\frac{\partial\mathcal{C}}{\partial\lambda_{i}}(\bm{\lambda})$	$\displaystyle=\pi e^{c}h\sum_{n=-N-1}^{N}w(\|nh\|)\,\frac{\partial\mathcal{F}}{\partial\lambda_{i}}(nh,\bm{\lambda})\,e^{nh\sqrt{-1}},$		(39)

for each $i=1,\dots,4$ . Although the exact values of $\mathcal{C}(\bm{\lambda})$ and ${\partial\mathcal{C}}/{\partial\lambda_{i}}(\bm{\lambda})$ are real numbers, the numerical result may be a complex number with a very small imaginary part. In our implementation shown in Algorithm 1, we ignore the imaginary part, assuming that it is sufficiently small.

It is noteworthy that if we denote the true value of $\mathcal{C}(\bm{\lambda})$ by $\mathcal{C}_{\text{truth}}(\bm{\lambda})$ , we get

|\mathcal{C}_{\text{truth}}(\bm{\lambda})-\mathcal{C}(\bm{\lambda})|=O\left(\sqrt{N}e^{-c\sqrt{N}}\right)

(40)

for a constant $c>0$ independent of $N$ [23]. This means that we can achieve any desired accuracy by choosing a large enough $N$ . In this paper, we set $N=200$ in consideration of the computation time.
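The quadrature (33)-(39) and the loss (31) can be sketched in NumPy as below. This is a plain re-implementation written for illustration under the parameter choices stated above ( $N_{\text{min}}=15$ , $N=200$ , $r=2.5$ , $\omega_{d}=0.5$ , $d=c/2$ ); it is not the authors’ code, and the function names `_quadrature`, `bingham_normalizer`, and `bingham_nll` are hypothetical. It assumes all entries of $\bm{\lambda}$ are at most $0$ , as in (19), so that the principal branch of the square root in (36) is valid.

```python
import math
import numpy as np

def _quadrature(N=200, N_min=15, r=2.5, omega_d=0.5):
    """Nodes t_n = n*h, window weights w(|t_n|), and the constants c, h of (33)-(34)."""
    c = N_min * math.pi / (r**2 * (1.0 + r) * omega_d)
    d = c / 2.0
    h = math.sqrt(2.0 * math.pi * d * (1.0 + r) / (omega_d * N))
    p1 = math.sqrt(N * h / omega_d)
    p2 = math.sqrt(omega_d * N * h / 4.0)
    t = h * np.arange(-N - 1, N + 1)                    # n = -N-1, ..., N
    w = 0.5 * np.array([math.erfc(abs(x) / p1 - p2) for x in t])
    return t, w, c, h

def bingham_normalizer(lam, with_grad=False, **kw):
    """Approximate C(lam) by (38); optionally also its gradient by (39)."""
    lam = np.asarray(lam, dtype=float)
    t, w, c, h = _quadrature(**kw)
    z = c - lam[None, :] + 1j * t[:, None]              # shape (nodes, 4)
    F = np.prod(z ** -0.5, axis=1)                      # integrand (36)
    kernel = w * np.exp(1j * t)
    scale = math.pi * math.exp(c) * h
    C = (scale * np.sum(kernel * F)).real               # drop the tiny imaginary part
    if not with_grad:
        return C
    dF = 0.5 * F[:, None] / z                           # derivative (37)
    grad = (scale * np.sum(kernel[:, None] * dF, axis=0)).real
    return C, grad

def bingham_nll(D, lam, q_gt):
    """Negative log-likelihood (31), evaluated on sorted and shifted parameters."""
    lam = np.asarray(lam, dtype=float)
    order = np.argsort(-lam)
    D_s = D[:, order]
    lam_s = lam[order] - lam[order][0]
    A_s = D_s @ np.diag(lam_s) @ D_s.T
    return -q_gt @ A_s @ q_gt + math.log(bingham_normalizer(lam_s))

if __name__ == "__main__":
    # For lam = 0 the integral (17) is the surface area of the 3-sphere, 2*pi^2.
    print(bingham_normalizer(np.zeros(4)), 2.0 * math.pi**2)
    C, g = bingham_normalizer([0.0, -1.0, -2.0, -5.0], with_grad=True)
    print(C, g)
```

Two identities are convenient for sanity checks: $\mathcal{C}(\bm{0})=2\pi^{2}$ , and, by the shift property behind (18), $\sum_{i}\partial\mathcal{C}/\partial\lambda_{i}=\mathcal{C}$ . In a training setup the same arithmetic would be written with the framework’s complex tensor operations so that automatic differentiation or the explicit gradient (39) can be used.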

V APPLICATION TO POSE ESTIMATOR

V-A Implementation

In this section, we introduce an application of our representation to an existing pose estimator. We use PoseCNN [1] as the backbone framework. Our implementation is based on PoseCNN-PyTorch [24] by NVIDIA Research Projects. We call our network “Bingham-PoseCNN” for convenience. Fig. 2 shows the overview of Bingham-PoseCNN. “Ours” in the figure marks the point that differs from the original PoseCNN. We changed only the dimension of the final FCN layer from 4 to $d$ ( $d$ varies from 7 to 10).

Our network is trained with the Bingham NLL loss function described in Section IV. In contrast, the original PoseCNN used the following loss functions for training:

	$\displaystyle\operatorname{PLOSS}(\bm{q},\bm{q}_{\text{gt}})$	$\displaystyle=\frac{1}{2m}\sum_{\bm{x}\in\mathcal{M}}\|R(\bm{q}_{\text{gt}})\bm{x}-R(\bm{q})\bm{x}\|^{2},$		(41)
	$\displaystyle\operatorname{SLOSS}(\bm{q},\bm{q}_{\text{gt}})$	$\displaystyle=\frac{1}{2m}\sum_{\bm{x}_{1}\in\mathcal{M}}\min_{\bm{x}_{2}\in\mathcal{M}}\left\|R(\bm{q}_{\text{gt}})\bm{x}_{1}-R(\bm{q})\bm{x}_{2}\right\|^{2},$		(42)

where $\mathcal{M}$ denotes the set of points on the mesh model of each object and $m$ is the number of these points. PoseCNN uses PLoss if the object has no symmetry and SLoss if the object has symmetry. While this requires annotating the symmetry property of each object, our method does not need these annotations. In addition, our method does not need mesh models of the target objects.

V-B Dataset

We tested our model on the YCB-Video dataset, which is the same dataset used in [1]. This dataset provides 80 videos for training and 2949 keyframes for testing, which are extracted from the remaining 12 videos. In addition, the YCB-Video dataset contains 80000 synthetic images, which we also used for training.

V-C Evaluation Metrics

We used ADD and ADD-S metrics to evaluate the performance of our pose estimator. Let $(R,\bm{t})$ be a pair of the groundtruth rotation $R$ and translation $\bm{t}$ , and $(\widehat{R},\widehat{\bm{t}})$ be a pair of the estimated rotation and translation. Then, ADD and ADD-S are defined as below:

	ADD	$\displaystyle=\frac{1}{m}\sum_{\bm{x}\in\mathcal{M}}\|(R\bm{x}+\bm{t})-(\widehat{R}\bm{x}+\widehat{\bm{t}})\|,$		(43)
	ADD-S	$\displaystyle=\frac{1}{m}\sum_{\bm{x}_{1}\in\mathcal{M}}\min_{\bm{x}_{2}\in\mathcal{M}}\|(R\bm{x}_{1}+\bm{t})-(\widehat{R}\bm{x}_{2}+\widehat{\bm{t}})\|,$		(44)

where $\mathcal{M}$ is the set of the $m$ points sampled from the surface of the 3D mesh model. These metrics are the same as those used in PoseCNN [1].
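The metrics (43) and (44) can be sketched directly from their definitions. The function names and the `(m, 3)` array layout for the model points are assumptions of this sketch; the brute-force pairwise distance matrix is chosen for clarity and may need a nearest-neighbor structure for large point sets.

```python
import numpy as np

def add_metric(R, t, R_hat, t_hat, pts):
    """ADD (43): mean distance between corresponding transformed model points."""
    gt = pts @ R.T + t
    est = pts @ R_hat.T + t_hat
    return np.linalg.norm(gt - est, axis=1).mean()

def adds_metric(R, t, R_hat, t_hat, pts):
    """ADD-S (44): mean distance from each ground-truth point to its closest estimate."""
    gt = pts @ R.T + t
    est = pts @ R_hat.T + t_hat
    dists = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)  # (m, m)
    return dists.min(axis=1).mean()

if __name__ == "__main__":
    # Four points with a 90-degree rotational symmetry about the z-axis.
    pts = np.array([[1.0, 0, 0], [0, 1.0, 0], [-1.0, 0, 0], [0, -1.0, 0]])
    Rz90 = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
    t = np.zeros(3)
    print(add_metric(np.eye(3), t, Rz90, t, pts))    # penalizes the symmetric pose
    print(adds_metric(np.eye(3), t, Rz90, t, pts))   # does not penalize it
```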

V-D Results

V-D1 Comparison of Bingham Representations

Table I shows the mean values of the area under the curve (AUC) of ADD and ADD-S. Peretroukhin’s 10D representation achieves the best score among the six representations. Thus, we adopt this representation in our network. The 8D representation, which uses the Birdal representation, comes next.

It can be seen that the 4D eigenvalue representation achieves higher scores than the 3D representation. This may be related to the fact that $\Lambda_{4}$ , defined in (27), tends to give smaller eigenvalues than $\Lambda_{3}$ , defined in (29). The distribution is more concentrated when the eigenvalues are small, as described in Section VI-A1. In a well-trained network, the mode quaternion gets closer to the given quaternion as the dispersion of the distribution becomes smaller. This is a possible reason why $\Lambda_{4}$ gives a better result than $\Lambda_{3}$ .

TABLE I: Comparison of Parametrization. Only the mode quaternion is used for ADDs calculation for Bingham representation.

Param	Ortho. Matrix	Diag. Entries	Dim	ADD	ADD-S
$\mathcal{P}_{4+3}$	Birdal [21]	3D	7	49.2	72.9
$\mathcal{P}_{4+4}$	Birdal [21]	4D	8	53.0	74.2
$\mathcal{P}_{6+3}$	Cayley [20]	3D	9	13.5	58.8
$\mathcal{P}_{6+4}$	Cayley [20]	4D	10	23.5	66.2
$\mathcal{P}_{10}$	Symmetric Matrix [6]		10	55.1	75.1
–	Quaternion [1]		4	52.9	74.1

V-D2 Conventional Quaternion vs Bingham’s Mode Quaternion

TABLE II: Area under the curve of Fig. 4. “Ratio” is the ratio of Ours score to Original PoseCNN’s score. Ratio $>$ 95.0% are shown in bold.

	RGB						RGB + Depth
	ADD		ADD-S		Ratio		ADD		ADD-S		Ratio
objects	Ours	Original	Ours	Original	ADD	ADD-S	Ours	Original	Ours	Original	ADD	ADD-S
002_master_chef_can	61.0	60.5	86.7	88.7	100.9	97.7	70.7	70.0	93.1	93.4	101.0	99.7
003_cracker_box	36.6	61.2	63.3	79.5	59.8	79.6	64.8	79.1	72.8	85.4	81.9	85.3
004_sugar_box	56.7	51.6	76.7	73.4	109.8	104.5	91.3	90.7	94.7	93.7	100.6	101.1
005_tomato_soup_can	71.1	69.5	83.7	82.6	102.4	101.3	86.8	87.8	93.2	93.5	98.8	99.7
006_mustard_bottle	88.7	84.5	94.0	92.1	105.0	102.1	94.9	90.6	96.6	93.5	104.8	103.2
007_tuna_fish_can	73.5	68.4	91.5	87.8	107.5	104.2	84.7	84.3	96.9	95.1	100.4	101.9
008_pudding_box	29.3	67.8	53.8	83.4	43.2	64.5	81.8	86.0	90.3	93.7	95.1	96.4
009_gelatin_box	87.8	80.2	92.8	89.4	109.4	103.8	65.4	95.3	67.4	97.2	68.7	69.4
010_potted_meat_can	60.1	59.7	79.3	78.2	100.6	101.3	80.4	78.6	89.4	88.3	102.3	101.2
011_banana	69.8	77.4	85.4	89.9	90.2	95.0	81.2	89.1	90.0	95.1	91.2	94.6
019_pitcher_base	68.6	67.8	84.4	83.8	101.1	100.6	92.3	93.9	96.2	96.5	98.3	99.7
021_bleach_cleanser	50.7	51.1	67.5	70.1	99.2	96.3	85.2	84.2	93.9	91.5	101.2	102.6
024_bowl	4.3	4.9	60.5	74.2	87.7	81.6	17.9	8.4	90.4	78.1	213.8	115.7
025_mug	71.2	47.4	87.8	72.4	150.3	121.4	77.7	84.6	88.8	95.0	91.9	93.5
035_power_drill	61.4	52.7	77.9	72.7	116.6	107.2	87.4	86.3	92.8	91.7	101.4	101.2
036_wood_block	0.9	1.3	21.7	15.8	69.7	137.3	39.5	29.0	86.2	88.7	136.1	97.1
037_scissors	43.5	50.1	65.1	68.8	86.7	94.6	62.1	72.8	77.1	82.2	85.3	93.8
040_large_marker	55.1	55.2	66.9	67.2	99.8	99.5	82.3	86.1	90.4	93.2	95.6	96.9
051_large_clamp	43.2	12.9	68.4	38.9	334.9	175.8	63.1	56.7	81.6	76.8	111.3	106.3
052_extra_large_clamp	8.1	6.3	37.6	38.6	128.5	97.5	27.3	9.2	49.6	40.6	295.8	122.1
061_foam_brick	50.3	56.8	83.9	90.0	88.5	93.2	63.4	67.2	95.0	96.6	94.3	98.3
all	55.1	53.0	75.1	74.1	104.0	101.4	75.5	75.8	88.3	88.7	99.6	99.5

Table II shows the results of PoseCNN with our Bingham representation and with the conventional quaternion representation. The mode quaternion described in (20) is used to compare the performance of the Bingham representation with that of the conventional one. The Bingham representation with our loss function achieved performance equivalent to that of quaternions.

VI DISCUSSIONS

VI-A Evaluation of Inferred Probabilistic Representation

Table II is evaluated only with the mode quaternion. Our method can additionally extract information about the ambiguity or uncertainty of the inference result. Since the inferred results are probability distributions, we may interpret them in several ways. We evaluate the results under two interpretations: confidence and shape ambiguity.

VI-A1 Interpret as Confidence (Epistemic uncertainty)

Fig. 1 shows an example of the inference results of our model. The plots in the right column show poses sampled from the inferred distribution. The sampling algorithm for the Bingham distribution is based on [25].

In Fig. 1, if the handle of the mug is visible, the resulting distribution becomes low-variance and concentrated. This can be interpreted as the inferred result having high confidence. In contrast, if the handle is occluded, the distribution becomes widely spread. This can be interpreted as the inferred result having low confidence.

To explain this more quantitatively, here we introduce a random variable $\Delta Q$ . Given a random quaternion from the estimated distribution $Q\sim\mathfrak{B}(D,\bm{\lambda})$ and the groundtruth $\bm{q}_{\text{gt}}$ , we define a random variable $\Delta Q$ as follows:

	$\displaystyle\Delta Q$	$\displaystyle=2\arccos\left(\left\|\mathop{\mathrm{Re}}(Q\odot\bm{q}_{\text{gt}}^{-1})\right\|\right)$
		$\displaystyle=2\arccos\left(\left\|{\bm{q}_{\text{gt}}^{\top}Q}\right\|\right),$		(45)

where $\mathop{\mathrm{Re}}(\bm{q})$ is the real part of $\bm{q}$; that is, if $\bm{q}=w+xi+yj+zk$, then $\mathop{\mathrm{Re}}(\bm{q})=w$. A realization of $\Delta Q$ is the rotation angle between a rotation sampled from the inferred distribution and the groundtruth.
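As an illustration of (45), the following minimal sketch computes realizations of $\Delta Q$ with NumPy. It assumes that the sampled quaternions are already available as an array in $(w,x,y,z)$ order; the function name `delta_q` and the array layout are our own choices for illustration and are not taken from the authors' implementation.

```python
import numpy as np

def delta_q(samples, q_gt):
    """Angles (radians) between sampled quaternions and the groundtruth.

    samples: array-like of shape (N, 4), one quaternion per row.
    q_gt:    array-like of shape (4,).
    Evaluates 2 * arccos(|q_gt^T Q|), the second form of Eq. (45).
    """
    samples = np.asarray(samples, dtype=float)
    q_gt = np.asarray(q_gt, dtype=float)
    # Normalize defensively so that the dot product is a valid cosine.
    samples = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    q_gt = q_gt / np.linalg.norm(q_gt)
    # The absolute value identifies q and -q, which encode the same rotation.
    dots = np.abs(samples @ q_gt)
    # Clip to guard against values slightly above 1 caused by rounding.
    return 2.0 * np.arccos(np.clip(dots, 0.0, 1.0))
```

Averaging the returned array over many samples gives a Monte Carlo estimate of $\operatorname{E}[\Delta Q]$.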

Now we introduce an indicator of inference uncertainty proposed in [6]. The authors empirically found that the smaller the trace of the shifted parameter matrix

\operatorname{tr}(A_{\text{shifted}})=\operatorname{tr}(A)-4\max(\bm{\lambda})

(46)

becomes, the more confident the estimation is. The upper plot in Fig. 3 shows the inference results for 025_mug as an example. The red and green points in the plot correspond to the minimum and maximum values of the trace, respectively. The lower plots show the distribution at the corresponding points in the upper plot: the left one corresponds to the red point and the right one to the green point. We can see that if the trace is large, $\Delta Q$ is widely distributed, that is, the confidence is low. We can also see that $\operatorname{E}[\Delta Q]$ becomes smaller as the trace decreases. This implies that an inference with a large trace may have a large error.
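The indicator in (46) can be evaluated directly from the parameter matrix. The sketch below assumes that $A$ is given as a symmetric $4\times 4$ NumPy array; the function name `shifted_trace` is hypothetical and only illustrates the formula.

```python
import numpy as np

def shifted_trace(A):
    """Return tr(A) - 4 * max(lambda) for a symmetric 4x4 matrix A, as in Eq. (46)."""
    A = np.asarray(A, dtype=float)
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order.
    lam = np.linalg.eigvalsh(A)
    return np.trace(A) - 4.0 * lam[-1]
```

Adding a multiple of the identity to $A$ leaves this value unchanged, so the indicator does not depend on how the eigenvalues are offset.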

VI-A2 Interpret as Rotation Symmetry (Aleatoric uncertainty)

In Fig. 1, we can see that the rotations are zonally spread around the $z$-axis. This can be interpreted as the mug in this view having rotational symmetry around the $z$-axis. We can examine the symmetry characteristics of the observed objects quantitatively by inspecting the eigenvalues. According to [26], for eigenvalues sorted as in (19), $\bm{\lambda}$ gives

• a bipolar distribution, if $\lambda_{2}+\lambda_{3}<\lambda_{4}$,
• a circular distribution, if $\lambda_{2}+\lambda_{3}=\lambda_{4}$,
• a spherical distribution, if $\lambda_{2}+\lambda_{3}>\lambda_{4}$,
• a uniform distribution, if $\lambda_{2}=\lambda_{3}=\lambda_{4}$.

The orientation of the symmetry axis is determined by the orthogonal matrix $D$ .
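The case distinction above can be written as a small helper. This is only a sketch: it assumes the caller has already sorted the eigenvalues as in (19), it uses a numerical tolerance `tol` that we introduce because exact equality rarely holds for network outputs, and it tests the uniform case first because the listed conditions can overlap. The function name `bingham_shape` is hypothetical.

```python
def bingham_shape(lam2, lam3, lam4, tol=1e-9):
    """Classify the distribution shape from eigenvalues sorted as in (19).

    The tolerance `tol` and the order of the checks are assumptions of
    this sketch, not part of the criteria quoted from [26].
    """
    # Uniform case: all three eigenvalues coincide (up to tol).
    if abs(lam2 - lam3) <= tol and abs(lam3 - lam4) <= tol:
        return "uniform"
    s = lam2 + lam3
    # Circular case: lam2 + lam3 equals lam4 (up to tol).
    if abs(s - lam4) <= tol:
        return "circular"
    # Remaining cases are decided by the sign of (lam2 + lam3) - lam4.
    return "bipolar" if s < lam4 else "spherical"
```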

VI-B Explanation How Our Bingham Representation Works

Fig. 5 shows synthetic images of a red cup. It shows the groundtruth rotation angles around the $z$-axis instead of the quaternions. The inferred distributions and their 10D parameters are also shown. We use this figure to explain the learning mechanism of our network.

Our model can be written mathematically as follows:

\bm{\theta}_{i}=F(I_{i})

(47)

where $I_{i}$ is an input image, $\bm{\theta}_{i}$ is the inferred parameter of the distribution, and $F$ is the network to be trained. Suppose that we have pairs $\{I_{i},\bm{q}_{i}\}_{i=1}^{N}$ whose images are similar to each other but whose groundtruth quaternions are all different. The middle column of Fig. 5 corresponds to this situation. In this case, there is a $\bm{\theta}$ satisfying

F(I_{i})\approx\bm{\theta}\quad\text{for all $i=1,\dots,N$}.

(48)

Let $M$ be a function that transforms a given parameter vector to a parameter matrix of Bingham distribution. Then our problem becomes

Find $\bm{\theta}$ that minimizes $\displaystyle\sum_{i=1}^{N}\mathcal{L}(M(\bm{\theta}),\bm{q}_{i})$.

By solving this problem, we obtain a $\bm{\theta}$ that is optimized with respect to all of the given quaternions $\bm{q}_{i}$. The problem is equivalent to solving the maximum likelihood estimation (MLE) problem for $M(F(I_{i}))$, given $\bm{q}_{i}$. This means that the resulting parameter carries information about the distribution of the quaternions that share similar views.
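To make the MLE reading concrete, the following worked equation is a sketch under the assumption that $\mathcal{L}$ is the negative log-likelihood of the standard Bingham density $p(\bm{q};A)=\exp(\bm{q}^{\top}A\bm{q})/C(A)$. The symbols $A$, $C(A)$, and $S$ are introduced here for illustration only; the loss defined earlier in the paper remains authoritative.

```latex
\sum_{i=1}^{N}\mathcal{L}(A,\bm{q}_{i})
  = \sum_{i=1}^{N}\left(\log C(A)-\bm{q}_{i}^{\top}A\bm{q}_{i}\right)
  = N\left(\log C(A)-\operatorname{tr}(AS)\right),
\qquad
A = M(\bm{\theta}),\quad
S = \frac{1}{N}\sum_{i=1}^{N}\bm{q}_{i}\bm{q}_{i}^{\top}.
```

Under this assumption, the objective depends on the quaternions only through the scatter matrix $S$, which is consistent with the optimized parameter summarizing how the $\bm{q}_{i}$ that share a view are spread.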

In Fig. 5, the inferred results in the left and right columns are both concentrated because the quaternions that give similar views are close to each other. In contrast, the result in the middle column is zonally spread because the quaternions sharing that view are widely spread. Our network thus learns a parameter that covers the rotations sharing a similar view.

VI-C Adapting to Objects with Discrete Symmetry

In Table II, objects with discrete symmetry, such as 036_wood_block and 052_extra_large_clamp, obtained relatively low ADD and ADD-S scores. This is because a single Bingham distribution cannot capture ambiguity with multiple modes well. The scores could be improved by introducing a mixture Bingham representation such as the one in [7]. Our NLL loss function is easy to extend to such mixtures, compared with non-NLL losses such as the QCQP loss presented in [6].
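As a hedged illustration of why an NLL-type loss extends naturally to mixtures, a generic $K$-component mixture of Bingham densities would give the loss below. The mixture weights $\pi_{k}$, the component matrices $A_{k}$, and the normalizing constants $C(A_{k})$ are notation assumed here for illustration; this is the textbook mixture form and not necessarily the exact formulation of [7].

```latex
\mathcal{L}_{\text{mix}}(\bm{q})
  = -\log\sum_{k=1}^{K}\pi_{k}\,
    \frac{\exp\left(\bm{q}^{\top}A_{k}\bm{q}\right)}{C(A_{k})},
\qquad
\pi_{k}\geq 0,\quad \sum_{k=1}^{K}\pi_{k}=1.
```

Each term reuses the same single-component quantities, so only the weighted sum inside the logarithm is new.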

VII CONCLUSIONS

We proposed and implemented a Bingham NLL loss function that does not require a pre-computed lookup table. It is directly computable, and no interpolation is needed. We also showed that our loss function is easy to implement and to use for training, and that it can easily be introduced into an existing 6D pose estimator. We tested it with PoseCNN as an example and showed that our representation successfully expresses the ambiguity of rotation, while evaluation at the peak of the distribution gives performance equivalent to that of the original PoseCNN. Furthermore, we examined the relationship between various parametrizations of the Bingham distribution and their performance from the perspective of object pose estimation. In future work, we would like to handle mixture Bingham distributions based on this loss function to extend its capabilities, especially for objects with discrete symmetry.

References

[1] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes,” 2018.
[2] M. Bui, S. Zakharov, S. Albarqouni, S. Ilic, and N. Navab, “When regression meets manifold learning for object recognition and pose estimation,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018, pp. 6140–6146.
[3] F. Manhardt, D. M. Arroyo, C. Rupprecht, B. Busam, T. Birdal, N. Navab, and F. Tombari, “Explaining the ambiguity of object detection and 6D pose from visual data,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6841–6850.
[4] K. Hashimoto, D.-N. Ta, E. Cousineau, and R. Tedrake, “KOSNet: A Unified Keypoint, Orientation and Scale Network for Probabilistic 6D Pose Estimation,” 2019.
[5] C. Bingham, “An Antipodally Symmetric Distribution on the Sphere,” The Annals of Statistics, vol. 2, no. 6, pp. 1201 – 1225, 1974. [Online]. Available: https://doi.org/10.1214/aos/1176342874
[6] V. Peretroukhin, M. Giamou, D. M. Rosen, W. N. Greene, N. Roy, and J. Kelly, “A Smooth Representation of SO(3) for Deep Rotation Learning with Uncertainty,” in Proceedings of Robotics: Science and Systems (RSS’20), Jul. 12–16 2020.
[7] H. Deng, M. Bui, N. Navab, L. Guibas, S. Ilic, and T. Birdal, “Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation,” 2020.
[8] S. Riedel, Z. C. Marton, and S. Kriegel, “Multi-view orientation estimation using Bingham mixture models,” 2016 20th IEEE Int. Conf. Autom. Qual. Testing, Robot. AQTR 2016 - Proc., vol. 2, 2016.
[9] Y. Chen and K. Tanaka, “Maximum likelihood estimation of the Fisher–Bingham distribution via efficient calculation of its normalizing constant,” Statistics and Computing, vol. 31, 07 2021.
[10] A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization,” 12 2015, pp. 2938–2946.
[11] W. Zou, D. Wu, S. Tian, C. Xiang, X. Li, and L. Zhang, “End-to-end 6DoF pose estimation from monocular RGB images,” IEEE Transactions on Consumer Electronics, vol. 67, no. 1, pp. 87–96, 2021.
[12] S. Brahmbhatt, J. Gu, K. Kim, J. Hays, and J. Kautz, “Geometry-aware learning of maps for camera localization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[13] Y. Zhou, C. Barnes, J. Lu, J. Yang, and H. Li, “On the continuity of rotation representations in neural networks,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5738–5746.
[14] D. M. Davis, “Embeddings of real projective spaces,” Boletin De La Sociedad Matematica Mexicana, vol. 4, pp. 115–122, 1998.
[15] J. Levinson, C. Esteves, K. Chen, N. Snavely, A. Kanazawa, A. Rostamizadeh, and A. Makadia, “An analysis of SVD for deep rotation estimation,” in Advances in Neural Information Processing Systems 33, 2020.
[16] M. Bui, T. Birdal, H. Deng, S. Albarqouni, L. Guibas, S. Ilic, and N. Navab, “6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference,” in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds. Cham: Springer International Publishing, 2020, pp. 139–157.
[17] R. A. Srivatsan, M. Xu, N. Zevallos, and H. Choset, “Bingham distribution-based linear filter for online pose estimation,” Robot. Sci. Syst., vol. 13, 2017.
[18] I. Gilitschenski, R. Sahoo, W. Schwarting, A. Amini, S. Karaman, and D. Rus, “Deep orientation uncertainty learning based on a Bingham loss,” in International Conference on Learning Representations, 2020.
[19] A. Kume, S. Preston, and A. Wood, “Saddlepoint approximations for the normalizing constant of Fisher-Bingham distributions on products of spheres and Stiefel manifolds,” Biometrika, vol. 4, 12 2013.
[20] E. Hairer and G. Wanner, Geometric Numerical Integration, ser. Springer Series in Computational Mathematics. Berlin/Heidelberg: Springer-Verlag, 2006, vol. 31. [Online]. Available: http://link.springer.com/10.1007/3-540-30666-8
[21] T. Birdal, U. Şimşekli, M. O. Eken, and S. Ilic, “Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, ser. NIPS’18. Red Hook, NY, USA: Curran Associates Inc., 2018, pp. 306–317.
[22] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning, ser. ICML’10. Madison, WI, USA: Omnipress, 2010, pp. 807–814.
[23] K. Tanaka, “Error control of a numerical formula for the Fourier transform by Ooura’s continuous Euler transform and fractional FFT,” Journal of Computational and Applied Mathematics, vol. 266, pp. 73–86, 2014.
[24] NVIDIA, “PoseCNN-PyTorch,” 2020. [Online]. Available: https://github.com/NVlabs/PoseCNN-PyTorch
[25] J. Kent, A. Ganeiber, and K. Mardia, “A new method to simulate the Bingham and related distributions in directional data analysis with applications,” 10 2013.
[26] K. Kunze and H. Schaeben, “The Bingham distribution of quaternions and its spherical Radon transform in texture analysis,” Mathematical Geology, vol. 36, pp. 917–943, 11 2004.
