
Efficient Estimation of Sensor Biases for the 3-Dimensional Asynchronous Multi-Sensor System

Wenqiang Pu, Ya-Feng Liu, Zhi-Quan Luo. Part of this work has been presented at the 2018 IEEE GlobalSIP [1]. The work of W. Pu was supported by the National Natural Science Foundation of China (NSFC) under Grant 62101350. The work of Y.-F. Liu was supported in part by the NSFC under Grant 12022116 and Grant 12288201. The work of Z.-Q. Luo was supported in part by the National Key Research and Development Program of China under Grant 2022YFA1003900 and the Guangdong Provincial Key Laboratory of Big Data Computing. (Corresponding Author: Ya-Feng Liu.) W. Pu and Z.-Q. Luo are with the Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen 518172, China (e-mail: {wenqiangpu,luozq}@cuhk.edu.cn). Y.-F. Liu is with the State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China (e-mail: yafliu@lsec.cc.ac.cn).
Abstract

An important preliminary procedure in multi-sensor data fusion is sensor registration, and the key step in this procedure is to estimate sensor biases from their noisy measurements. There are generally two difficulties in this bias estimation problem: one is the unknown target states, which serve as nuisance variables in the estimation problem, and the other is the highly nonlinear coordinate transformation between the local and global coordinate systems of the sensors. In this paper, we focus on the 3-dimensional asynchronous multi-sensor scenario and propose a weighted nonlinear least squares (NLS) formulation by assuming that there is a target moving with a nearly constant velocity. We propose two possible choices of the weighting matrix in the NLS formulation, which correspond to the classical NLS estimation and the maximum likelihood (ML) estimation, respectively. To address the intrinsic nonlinearity, we propose a block coordinate descent (BCD) algorithm for solving the formulated problem, which alternately updates different kinds of bias estimates. Specifically, the proposed BCD algorithm involves solving linear LS problems and nonconvex quadratically constrained quadratic program (QCQP) subproblems with special structures. Instead of adopting the semidefinite relaxation technique, we develop a much more computationally efficient algorithm based on the alternating direction method of multipliers (ADMM) to solve the nonconvex QCQP subproblems. The convergence of the ADMM to the global solution of the QCQP subproblems is established under mild conditions. The effectiveness and efficiency of the proposed BCD algorithm are demonstrated via numerical simulations.

Index Terms:
Alternating direction method of multipliers, block coordinate descent algorithm, nonlinear least squares, sensor registration problem

I Introduction

In the past decades, multi-sensor data fusion has attracted a lot of research interest in various applications, e.g., tracking systems [2, 3]. Compared with a single-sensor system, integrating inexpensive stand-alone sensors into a multi-sensor system can significantly improve performance and is also cost-effective in practice. However, the success of data fusion not only depends on the data fusion algorithms but also relies on an important calibration process called sensor registration. The sensor registration process refers to expressing each sensor's local data in a common reference frame by removing the biases caused by the improper alignment of each sensor [4]. Consequently, the key step in the sensor registration process is to estimate the sensor biases.

I-A Related Works

An early work [4] on sensor registration dates back to the 1980s, where the authors identified various factors which dominate the alignment errors in a multi-sensor system and established a bias model to compensate for the alignment errors. Under the assumption that there exists a bias-free sensor, the maximum likelihood (ML) estimation approach [4, 5, 6] and the least squares (LS) estimation approach [7] were proposed. However, the bias-free sensor assumption is not practical, and a more common situation in practice is that all sensors contain biases. For this general situation, various approaches have been proposed in the last four decades. From the perspective of parameter estimation, these approaches can be divided into three types: LS estimation [8, 9], ML estimation [10, 11, 6], and Bayesian estimation [12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. Note that the sensor registration problem contains not only the unknown sensor biases to be estimated but also the target states, i.e., the positions of the target at different time instances. The major differences among the above three types of approaches lie in their treatment of the unknown target states. In the LS approach, the target states are expressed as nonlinear functions of the sensor biases in a common coordinate system, so that the target states are eliminated and only the sensor biases need to be estimated. In the ML approach, a joint ML function of the sensor biases and the target states is formulated and maximized in an iterative manner (with two steps per iteration), i.e., one step to estimate the sensor biases and one step to estimate the target states. Instead of jointly estimating the sensor biases and the target states as in the ML approach, the (recursive) Bayesian approach exploits the Markov property of the transition of the target states between different time instances, i.e., the probability distribution of the target state at the next time instance only depends on the target state at the current time instance. The sensor biases and the target states are recursively estimated at every time instance.

Since sensor measurements are taken in each sensor's local polar coordinate system, the nonlinear coordinate transformation between the local polar and global Cartesian coordinates introduces an intrinsic nonlinearity into the sensor registration problem. The approaches mentioned above adopt different kinds of approximation schemes to deal with this nonlinearity: using linear expansions to approximate some nonlinear functions [8, 5, 9, 7, 4, 10, 12, 11, 6, 13, 14, 15, 16, 17], which usually leads to an uncontrolled model mismatch, or using a large number of discrete samples (particles) to approximate some nonlinear functions [18, 20, 21], which often incurs an expensive computational cost when the number of sensors is large. In addition, many of the above works only consider the synchronous case, i.e., all sensors simultaneously measure the position of the target at the same time instance, which is not always satisfied in practical scenarios.

Recent works [22, 23] proposed a semidefinite relaxation (SDR) based block coordinate descent (BCD) optimization approach to address the nonlinearity issue in the asynchronous sensor registration problem in the 2-dimensional scenario. The approach assumes the presence of one reference target moving with a nearly constant velocity (e.g., commercial planes and drones in the real world) and is effective in handling the nonlinearity while also guaranteeing an exact recovery of all sensor biases in the noiseless case. Consequently, it is interesting to explore such a BCD algorithm in the 3-dimensional scenario, where more kinds of biases are involved and the estimation problem becomes even more highly nonlinear, i.e., more trigonometric functions with respect to different biases are multiplied together in the coordinate transformation. Furthermore, the BCD algorithm in [22, 23] needs to solve convex semidefinite programming (SDP) subproblems at each iteration, whose computational cost is \mathcal{O}(M^{4.5}) (where M is the total number of sensors). Such a high computational complexity limits the practical use of the BCD algorithm, even though it deals well with the nonlinearity issue. An interesting question is whether, instead of solving SDPs, there is an algorithm with a lower computational complexity that can solve the original nonconvex problem with guaranteed convergence. This paper provides a positive answer to this question.

I-B Our Contributions

In this paper, we consider the 3-dimensional sensor registration problem, where more kinds of biases and noises are involved. Few works tackle the 3-dimensional registration problem, except for some notable works [12, 11, 7, 6, 1]; however, all of them assume that the sensors work synchronously. In this paper, we consider a practical scenario where all sensors are biased and work asynchronously. The major contributions are summarized below:

\bullet 3-dimensional asynchronous sensor registration: We consider the sensor registration problem in a general scenario, where sensors work asynchronously in 3 dimensions and all sensors are biased. Such a scenario is of great practical interest but has not been well studied yet. Compared with most existing studies, which consider the 2-dimensional scenario, there are more kinds of sensor biases in the 3-dimensional scenario, i.e., polar measurement biases and orientation angle biases, which increases the intrinsic nonlinearity of the bias estimation problem.

\bullet A weighted nonlinear least squares formulation: By exploiting the prior knowledge that the target moves with a nearly constant velocity, we propose a weighted nonlinear LS formulation for the 3-dimensional asynchronous multi-sensor registration problem, which takes different practical uncertainties into consideration, i.e., noise in sensor measurements and dynamic maneuverability in the target motion. The proposed formulation incorporates the covariance information of different practical uncertainties by appropriately choosing the weight matrix, which enables it to efficiently handle situations with different uncertainties.

\bullet A computationally efficient BCD algorithm: We exploit the special structure of the proposed weighted nonlinear LS formulation and develop a computationally efficient BCD algorithm for solving it. The proposed algorithm alternately updates different kinds of biases by solving linear LS problems and nonconvex quadratically constrained quadratic program (QCQP) problems. Compared with the previous work [1], which utilizes the SDR-based technique for solving the nonconvex QCQP subproblems with a computational complexity of \mathcal{O}(M^{4.5}), we develop a low-cost algorithm based on the alternating direction method of multipliers (ADMM) for solving those QCQP subproblems with a global optimality guarantee (under mild conditions), which only requires a computational complexity of \mathcal{O}(M^{2}).

I-C Organization and Notations

The organization of this paper is as follows. In Section II, we introduce basic models of the sensor measurements and the target motion. In Section III, we propose a nonlinear LS formulation for estimating all kinds of biases. To effectively solve the proposed nonlinear LS formulation, we develop a BCD algorithm in Section IV. Numerical simulation results are presented in Section V. Finally, Section VI concludes the paper.

We adopt the following notations in this paper. Normal case letters, lowercase bold letters, and uppercase bold letters represent scalars, vectors, and matrices, respectively. \mathbf{X}^{T} and \mathbf{X}^{-1} represent the transpose of matrix \mathbf{X} and the inverse of invertible matrix \mathbf{X}, respectively. [\bm{X}]_{i:j,m:n} represents the submatrix formed by the elements of matrix \bm{X} from rows i to j and columns m to n. \mathbf{x}_{n} denotes the n-th component of \mathbf{x} and \|\mathbf{x}\| denotes the Euclidean norm of \mathbf{x}. \mathbb{E}_{\mathbf{w}}\{\cdot\} is the expectation operator with respect to the random variable \mathbf{w}. \mathbb{R}^{M} represents the set of M-dimensional real vectors. Other notations will be explained when they first appear.

II Sensor Measurement Model

Consider a 3-dimensional multi-sensor system with M (M>1) sensors located at different known positions \bm{p}_{m}\in\mathbb{R}^{3},\forall\,m. Assume there is a reference target moving with a nearly constant velocity in the space (such a reference target can be selected from, e.g., civilian airplanes) and that the sensors measure the relative polar coordinates of the target (i.e., range, azimuth, and elevation) at different time instances in an asynchronous mode, i.e., the polar coordinates of the target at different time instances are measured by different sensors. After a time period in which a set of local measurements has been collected, all sensors' measurements are sent to a fusion center, where they are sorted and mapped onto a common time axis indexed by the time instances k=1,2,\ldots,K. Without loss of generality, we make the following (mild) regularity assumption.

Assumption 1.

Assume that only one sensor observes the target at each time instance and that every sensor has at least one measurement in the whole observation interval. In addition, the total number of observations is greater than the number of sensors, i.e., K\geq M+1.

At time instance k, let the corresponding sensor be indexed by s_{k}\in\{1,2,\ldots,M\} and let \bm{\xi}_{k}=(x_{k},y_{k},z_{k}) be the position of the target in the global Cartesian coordinate system. The target's position in sensor s_{k}'s local Cartesian coordinate system is denoted as \bm{\xi}_{k}^{\prime}\in\mathbb{R}^{3}, which can be expressed as a function of \bm{\xi}_{k} as follows:

\bm{\xi}_{k}^{\prime}=R^{-1}(\bm{\zeta}_{s_{k}}+\Delta\bm{\zeta}_{s_{k}})(\bm{\xi}_{k}-\bm{p}_{s_{k}}), (1)

where

R(\bm{\zeta})=R(\alpha,\beta,\gamma)=R_{x}(\alpha)R_{y}(\beta)R_{z}(\gamma) (2)
=\begin{bmatrix}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{bmatrix}\begin{bmatrix}\cos\beta&0&\sin\beta\\ 0&1&0\\ -\sin\beta&0&\cos\beta\end{bmatrix}\begin{bmatrix}\cos\gamma&-\sin\gamma&0\\ \sin\gamma&\cos\gamma&0\\ 0&0&1\end{bmatrix}.

In the above, R(\alpha,\beta,\gamma) is a 3\times 3 rotation matrix, which represents the coordinate rotation from the local to the global Cartesian coordinate system, and the parameters \alpha,\beta,\gamma\in[-\pi,\pi] are the rotation angles corresponding to roll, pitch, and yaw, respectively. The parameter \bm{\zeta}_{m}=(\alpha_{m},\beta_{m},\gamma_{m}) in (1) contains the presumed rotation angles measured by sensor m's orientation system, and \Delta\bm{\zeta}_{m}=(\Delta\alpha_{m},\Delta\beta_{m},\Delta\gamma_{m}) contains the unknown rotation angle biases. Note that \Delta\bm{\zeta}_{m} is due to the imperfection of the orientation system of sensor m, and we call these the orientation biases in this paper. Hence, the term \bm{\zeta}_{s_{k}}+\Delta\bm{\zeta}_{s_{k}} represents the true rotation angles of sensor s_{k}.
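To make the convention in (2) concrete, the following Python sketch (an illustration of ours, not part of the original derivation) constructs R(\alpha,\beta,\gamma) and checks the orthogonality R^{-1}(\cdot)=R^{T}(\cdot) used below; all function names are ours.

import numpy as np

def rotation_matrix(alpha, beta, gamma):
    # R(alpha, beta, gamma) = Rx(alpha) Ry(beta) Rz(gamma), cf. (2); angles in radians.
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

# Rotation matrices are orthogonal, so R^{-1} = R^T (the fact used in (3)).
R = rotation_matrix(0.1, -0.05, 0.2)
assert np.allclose(R.T @ R, np.eye(3))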

In the following, we will discuss another category of sensor biases arising from imperfections in the sensor measurement system. Define the Cartesian-to-polar coordinate transformation function h(x,y,z) as

h(x,y,z)=\begin{bmatrix}\rho\\ \phi\\ \eta\end{bmatrix}\triangleq\begin{bmatrix}\sqrt{x^{2}+y^{2}+z^{2}}\\ \arctan(x,y)\\ \arctan(\sqrt{x^{2}+y^{2}},z)\end{bmatrix},

where \rho, \phi, and \eta are the polar coordinates corresponding to range, azimuth, and elevation, respectively. Then, based on (1) and the fact that R^{-1}(\cdot)=R^{T}(\cdot), the polar coordinates of \bm{\xi}_{k}^{\prime} measured by sensor s_{k} are denoted as \bm{z}_{k}=(\rho_{k},\phi_{k},\eta_{k}), which can be expressed as

\bm{z}_{k}=h(\bm{\xi}_{k}^{\prime})-\Delta\bm{z}_{s_{k}}+\bm{w}_{k}=h\left(R^{T}\left(\Delta\bm{\zeta}_{s_{k}}+\bm{\zeta}_{s_{k}}\right)(\bm{\xi}_{k}-\bm{p}_{s_{k}})\right)-\Delta\bm{z}_{s_{k}}+\bm{w}_{k}. (3)

In (3), \Delta\bm{z}_{m}=(\Delta\rho_{m},\Delta\phi_{m},\Delta\eta_{m}) contains the measurement biases of sensor m, and \bm{w}_{k}\in\mathbb{R}^{3} is the zero-mean Gaussian noise with

\bm{w}_{k}\sim\mathcal{N}(\bm{0},\textrm{diag}([\sigma_{\rho}^{2},\sigma_{\phi}^{2},\sigma_{\eta}^{2}])),

where \sigma_{\rho}^{2}, \sigma_{\phi}^{2}, and \sigma_{\eta}^{2} are the noise variances corresponding to range, azimuth, and elevation, respectively.
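A minimal simulation of the measurement model (3) can then be sketched as follows (Python; it reuses rotation_matrix from the sketch above, assumes the atan2 convention for the two-argument arctan in h, and all names are ours):

def cart_to_polar(xi):
    # h(x, y, z): range, azimuth, elevation (atan2 convention assumed).
    x, y, z = xi
    rho = np.sqrt(x**2 + y**2 + z**2)
    phi = np.arctan2(y, x)                # azimuth
    eta = np.arctan2(z, np.hypot(x, y))   # elevation
    return np.array([rho, phi, eta])

def measure(xi_global, p_m, zeta_m, dzeta_m, dz_m, sigma):
    # One measurement z_k per (3): rotate into the local frame with the true
    # angles zeta_m + dzeta_m, convert to polar, subtract the measurement
    # biases dz_m, and add zero-mean Gaussian noise with std sigma (3-vector).
    R = rotation_matrix(*(np.asarray(zeta_m) + np.asarray(dzeta_m)))
    xi_local = R.T @ (np.asarray(xi_global) - np.asarray(p_m))   # R^{-1} = R^T
    w = np.random.randn(3) * np.asarray(sigma)
    return cart_to_polar(xi_local) - np.asarray(dz_m) + w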

In view of the measurement model in (3), there are two types of biases, i.e., the orientation biases \Delta\bm{\zeta}_{s_{k}} and the measurement biases \Delta\bm{z}_{s_{k}}. Both of them affect the expression for the global Cartesian coordinates \bm{\xi}_{k}. For each sensor, there are in total six biases, i.e., \Delta\alpha_{m},\Delta\beta_{m},\Delta\gamma_{m},\Delta\rho_{m},\Delta\phi_{m}, and \Delta\eta_{m}. Each of these biases has a different impact on \bm{z}_{k}, and a geometric illustration of them is shown in Fig. 1. We note that there is an intrinsic ambiguity among these six kinds of biases, which is formally stated in Proposition 1.

Figure 1: An illustration of different kinds of sensor biases.
Proposition 1.

Let c\in\mathbb{R}. Then any pair (\Delta\gamma_{s_{k}},\Delta\phi_{s_{k}}) satisfying \Delta\gamma_{s_{k}}+\Delta\phi_{s_{k}}=c gives the same \bm{z}_{k} in (3).

Proof.

See Appendix A. ∎

Proposition 1 states that the biases \Delta\gamma and \Delta\phi cannot be distinguished since they are intrinsically coupled in the measurement model (3). Only their sum affects the measurements, and thus they can be treated as one kind of bias [7]. In the rest of this paper, we fix \Delta\phi_{m}=0,\forall\,m, and only consider estimating \Delta\gamma_{m},\forall\,m.

Consequently, the goal of the 3-dimensional multi-sensor registration problem is to determine sensor biases

\bm{\theta}_{m}=[\Delta\rho_{m},\Delta\eta_{m},\Delta\alpha_{m},\Delta\beta_{m},\Delta\gamma_{m}],\ \forall\,m,

from the sensor measurements \{\bm{z}_{k}\}_{k=1}^{K}. It is worth noting that determining the sensor biases solely from (3) is a challenging task since the target position \bm{\xi}_{k} at time instance k is unknown: any combination of \bm{\theta}_{s_{k}} and \bm{\xi}_{k} can satisfy (3). However, by leveraging the fact that the target moves with a nearly constant velocity, the registration problem can be addressed.

III A Weighted Nonlinear LS Formulation

In this section, we first introduce the unbiased spherical-to-Cartesian coordinate transformation to represent the local noisy measurements \{\bm{z}_{k}\} in the common Cartesian coordinate system, as well as the nearly-constant-velocity motion model for the target. Then, we propose a weighted nonlinear least squares formulation for estimating the sensor biases.

According to the sensor measurement model in (3), the target position \bm{\xi}_{k} can be expressed by a nonlinear transformation with respect to \bm{z}_{k} and \bm{\theta}_{s_{k}}, given in Proposition 2.

Proposition 2 (Unbiased Coordinate Transformation [24]).

Given the measurement model in (3), suppose \bm{\theta}_{s_{k}} takes the value of the true biases. Then, we have

R(\bm{\zeta}_{s_{k}}+\Delta\bm{\zeta}_{s_{k}})\bar{h}^{-1}(\bm{z}_{k}+\Delta\bm{z}_{s_{k}})+\bm{p}_{s_{k}}+\bm{\varepsilon}_{k}=\bm{\xi}_{k}, (4)

where \bm{\varepsilon}_{k} is a zero-mean random noise and

\bar{h}^{-1}(\bm{z}_{k})=\begin{bmatrix}\lambda_{\phi}^{-1}\lambda_{\eta}^{-1}\rho_{k}\cos\phi_{k}\cos\eta_{k}\\ \lambda_{\phi}^{-1}\lambda_{\eta}^{-1}\rho_{k}\sin\phi_{k}\cos\eta_{k}\\ \lambda_{\eta}^{-1}\rho_{k}\sin\eta_{k}\end{bmatrix}

is the unbiased polar-to-Cartesian transformation function with \lambda_{\phi}=e^{-\sigma_{\phi}^{2}/2} and \lambda_{\eta}=e^{-\sigma_{\eta}^{2}/2} being the compensation factors.

Note that \bm{\varepsilon}_{k} in Proposition 2 compensates for the effects of the Gaussian noise \bm{w}_{k} (in \bm{z}_{k}) after the nonlinear mapping R(\bm{\zeta}_{s_{k}}+\Delta\bm{\zeta}_{s_{k}})\bar{h}^{-1}(\cdot), and \bm{\varepsilon}_{k} can be viewed as a function of \bm{w}_{k}. The compensation factors \lambda_{\phi} and \lambda_{\eta} in \bar{h}^{-1}(\cdot) ensure \mathbb{E}_{\bm{w}_{k}}[\bm{\varepsilon}_{k}]=\bm{0}, but \bm{\varepsilon}_{k} is no longer Gaussian due to the nonlinear mapping.
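A direct Python sketch of \bar{h}^{-1}(\cdot) in Proposition 2 (names ours):

def unbiased_polar_to_cart(z, sigma_phi, sigma_eta):
    # h_bar^{-1}(z) with compensation factors lambda_phi, lambda_eta (Prop. 2).
    rho, phi, eta = z
    lam_phi = np.exp(-sigma_phi**2 / 2)
    lam_eta = np.exp(-sigma_eta**2 / 2)
    return np.array([
        rho * np.cos(phi) * np.cos(eta) / (lam_phi * lam_eta),
        rho * np.sin(phi) * np.cos(eta) / (lam_phi * lam_eta),
        rho * np.sin(eta) / lam_eta,
    ])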

Next, assume that the reference target moves with a nearly constant velocity [25], i.e.,

\bm{\xi}_{k+1}=\bm{\xi}_{k}+T_{k}\dot{\bm{\xi}}_{k}+\bm{n}_{k},\quad \dot{\bm{\xi}}_{k+1}=\dot{\bm{\xi}}_{k}+\dot{\bm{n}}_{k}, (5)

where \bm{\xi}_{k}=[x_{k},y_{k},z_{k}]^{T}\in\mathbb{R}^{3} and \dot{\bm{\xi}}_{k}=[\dot{x}_{k},\dot{y}_{k},\dot{z}_{k}]^{T}\in\mathbb{R}^{3} are the position and velocity of the target at time instance k, respectively, T_{k}\geq 0 is the interval between time instances k and k+1, and \bm{n}_{k}\in\mathbb{R}^{3} and \dot{\bm{n}}_{k}\in\mathbb{R}^{3} are the motion process noises for position and velocity at time instance k, respectively. The process noises \bm{n}_{k} and \dot{\bm{n}}_{k} obey the Gaussian distributions

\bm{n}_{k}\sim\mathcal{N}(\bm{0},qT_{k}^{3}/3\,\bm{I}_{3}),\quad \dot{\bm{n}}_{k}\sim\mathcal{N}(\bm{0},qT_{k}\bm{I}_{3}). (6)

In the above, q is the noise power spectral density of the target motion and \mathcal{N}(\bm{\mu},\bm{\Sigma}) denotes the Gaussian distribution with mean \bm{\mu} and covariance \bm{\Sigma}.
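For illustration, one step of the model (5)-(6) can be simulated as follows (Python sketch; names ours; the position and velocity noises are drawn independently, as (6) specifies no cross-covariance):

def propagate_cv(xi, xi_dot, T, q, rng=None):
    # One step of the nearly-constant-velocity model (5)-(6).
    rng = np.random.default_rng() if rng is None else rng
    n = rng.normal(scale=np.sqrt(q * T**3 / 3), size=3)   # position noise, cf. (6)
    n_dot = rng.normal(scale=np.sqrt(q * T), size=3)      # velocity noise, cf. (6)
    return xi + T * xi_dot + n, xi_dot + n_dot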

Combining Proposition 2 with the target motion model in (5), we can immediately connect the measurements at time instances k and k+1, as given in Proposition 3.

Proposition 3.

Given the measurement model in (3) and the target motion model in (5), consider time instances k and k+1. Then, under Assumption 1, we have

g_{k}(\bm{\theta}_{s_{k}})+\bm{\varepsilon}_{k}+T_{k}\dot{\bm{\xi}}_{k}+\bm{n}_{k}=g_{k+1}(\bm{\theta}_{s_{k+1}})+\bm{\varepsilon}_{k+1},

where g_{k}(\bm{\theta}_{s_{k}})=R(\bm{\zeta}_{s_{k}}+\Delta\bm{\zeta}_{s_{k}})\bar{h}^{-1}(\bm{z}_{k}+\Delta\bm{z}_{s_{k}})+\bm{p}_{s_{k}}.

By Proposition 3, we can establish the following nonlinear equations for the sensor bias estimation problem:

\bm{\epsilon}_{k}=g_{k+1}(\bm{\theta}_{s_{k+1}})-g_{k}(\bm{\theta}_{s_{k}})-T_{k}\dot{\bm{\xi}}_{k},\ \forall\,k,\quad \dot{\bm{n}}_{k}=\dot{\bm{\xi}}_{k+1}-\dot{\bm{\xi}}_{k},\ \forall\,k, (7)

where \bm{\epsilon}_{k}=\bm{\varepsilon}_{k+1}-\bm{\varepsilon}_{k}-\bm{n}_{k} is also a zero-mean random noise.

The equations in (7) combine the sensor measurement model (3) and the target motion model (5). They can be regarded as the constructed measurement models for the parameters \{\bm{\theta}_{m}\} and \{\dot{\bm{\xi}}_{k}\}, where \{\bm{\epsilon}_{k}\} and \{\dot{\bm{n}}_{k}\} are the zero-mean noises.

Based on (7), we propose to minimize the squares of \bm{\epsilon}_{k} and \dot{\bm{n}}_{k}, which results in the following nonlinear least squares formulation for estimating the sensor biases:

\min\limits_{\{\bm{\theta}_{m}\},\{\dot{\bm{\xi}}_{k}\}}\sum_{k=1}^{K-1}\left\|\begin{bmatrix}g_{k+1}(\bm{\theta}_{s_{k+1}})\\ \dot{\bm{\xi}}_{k+1}\end{bmatrix}-\begin{bmatrix}g_{k}(\bm{\theta}_{s_{k}})\\ \dot{\bm{\xi}}_{k}\end{bmatrix}-\begin{bmatrix}T_{k}\dot{\bm{\xi}}_{k}\\ \bm{0}\end{bmatrix}\right\|_{\bm{Q}_{k}}^{2}, (8)

where \|\cdot\|_{\bm{Q}} denotes the weighted \ell_{2} norm associated with the positive definite symmetric matrix \bm{Q}, i.e., \|\bm{x}\|_{\bm{Q}}^{2}=\bm{x}^{T}\bm{Q}\bm{x}. Notice that the LS form in (8) is derived from the nearly constant velocity model (5), where the term g_{k+1}(\bm{\theta}_{s_{k+1}})-g_{k}(\bm{\theta}_{s_{k}})-T_{k}\dot{\bm{\xi}}_{k} represents the mismatch of the position of the target in the global coordinate system and the other term \dot{\bm{\xi}}_{k+1}-\dot{\bm{\xi}}_{k} corresponds to the mismatch of the nearly constant velocity. The current LS formulation cannot be directly applied to reference targets with strong maneuverability, such as those performing a constant-turn motion. However, the fundamental concept of linking measurements at different time instances through a motion model is still relevant and can be further explored for such scenarios.
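For concreteness, the objective in (8) can be evaluated as follows once the values g_{k}(\bm{\theta}_{s_{k}}) have been computed (Python sketch; the helper name and the precomputed inputs are our own assumptions):

def nls_objective(g_vals, xi_dot, T, Q_list):
    # Objective (8): g_vals[k] = g_k(theta_{s_k}), xi_dot[k] = velocity at k,
    # T[k] = interval between instances k and k+1, Q_list[k] = weight matrix Q_k.
    total = 0.0
    for k in range(len(T)):
        r = np.concatenate([
            g_vals[k + 1] - g_vals[k] - T[k] * xi_dot[k],  # position mismatch
            xi_dot[k + 1] - xi_dot[k],                     # velocity mismatch
        ])
        total += r @ Q_list[k] @ r                         # ||r||_{Q_k}^2
    return total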

The weight matrix \bm{Q}_{k} in (8) can be chosen as either the identity matrix or the inverse of the (approximate) covariance matrix of \bm{\epsilon}_{k} and \dot{\bm{n}}_{k}. More specifically, with \bm{Q}_{k}=\bm{I},\ \forall\,k, problem (8) is a classical nonlinear LS problem based on the zero-mean property of the noises \bm{\epsilon}_{k} and \dot{\bm{n}}_{k}. Instead, let \bm{R}_{k} be the approximate covariance matrix of \bm{\varepsilon}_{k} given in [24]; then problem (8) with

\bm{Q}_{k}=\begin{bmatrix}\bm{R}_{k+1}+\bm{R}_{k}+qT_{k}^{3}/3\,\bm{I}&qT_{k}^{2}/2\,\bm{I}\\ qT_{k}^{2}/2\,\bm{I}&qT_{k}\bm{I}\end{bmatrix}^{-1} (9)

becomes a weighted NLS estimation. It is worth mentioning that two comprehensive works [26, 27] have provided different formulations for evaluating \bar{h}(\cdot) and \bm{R}_{k} to resolve the incompatibility issue in [24]. However, as the focus of this paper is to utilize optimization techniques to handle the nonlinearity in the bias estimation problem, we choose to use the basic version of the unbiased coordinate transformation [24]. Nevertheless, the optimization techniques developed in this paper are also applicable to other unbiased coordinate transformations.
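Assembling \bm{Q}_{k} per (9) is mechanical once the 3\times 3 covariances \bm{R}_{k} from the chosen unbiased transformation are available; a Python sketch (with \bm{R}_{k} and \bm{R}_{k+1} passed in as arguments):

def weight_matrix(R_k, R_k1, T_k, q):
    # Q_k in (9); R_k and R_k1 approximate the covariances of eps_k and eps_{k+1}.
    I3 = np.eye(3)
    S = np.block([
        [R_k1 + R_k + q * T_k**3 / 3 * I3, q * T_k**2 / 2 * I3],
        [q * T_k**2 / 2 * I3,              q * T_k * I3       ],
    ])
    return np.linalg.inv(S)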

The mathematical difficulty of solving problem (8) (no matter how \bm{Q}_{k} is chosen) lies in its nonlinearity, as g_{k}(\cdot) contains multiple products of trigonometric functions. In the next section, we will exploit the special structure of problem (8) and develop a BCD algorithm for solving it (with any positive definite matrix \bm{Q}_{k}).

IV A Block Coordinate Descent Algorithm

The proposed BCD algorithm iteratively minimizes the objective function with respect to one type of biases with the others fixed. The intuition behind such a decomposition is that different types of biases affect the measurements differently. By dividing the optimization variables into multiple blocks according to the type of bias, we can leverage the distinct mathematical effect of each type of biases on problem (8). In particular, each subproblem with respect to one type of biases has a compact form and can be solved uniquely under mild conditions. This enables us to develop a computationally efficient algorithm.

Let \bm{v}=[\dot{\bm{\xi}}_{1},\dot{\bm{\xi}}_{2},\ldots,\dot{\bm{\xi}}_{K}]^{T}, \Delta\bm{\rho}=[\Delta\rho_{1},\Delta\rho_{2},\ldots,\Delta\rho_{M}]^{T}, and similarly for the other kinds of biases. The objective function in (8) can then be expressed as f(\bm{v},\Delta\bm{\rho},\Delta\bm{\eta},\Delta\bm{\alpha},\Delta\bm{\beta},\Delta\bm{\gamma}), and the proposed BCD algorithm at iteration t is

\bm{v}^{t+1}=\arg\min_{\bm{v}}f(\bm{v},\Delta\bm{\rho}^{t},\ldots,\Delta\bm{\gamma}^{t}), (10a)
\Delta\bm{\rho}^{t+1}=\arg\min_{\Delta\bm{\rho}}f(\bm{v}^{t+1},\Delta\bm{\rho},\ldots,\Delta\bm{\gamma}^{t}), (10b)
\Delta\bm{\eta}^{t+1}=\arg\min_{\Delta\bm{\eta}}f(\bm{v}^{t+1},\ldots,\Delta\bm{\eta},\ldots,\Delta\bm{\gamma}^{t}), (10c)
\Delta\bm{\alpha}^{t+1}=\arg\min_{\Delta\bm{\alpha}}f(\bm{v}^{t+1},\ldots,\Delta\bm{\alpha},\ldots,\Delta\bm{\gamma}^{t}), (10d)
\Delta\bm{\beta}^{t+1}=\arg\min_{\Delta\bm{\beta}}f(\bm{v}^{t+1},\ldots,\Delta\bm{\beta},\Delta\bm{\gamma}^{t}), (10e)
\Delta\bm{\gamma}^{t+1}=\arg\min_{\Delta\bm{\gamma}}f(\bm{v}^{t+1},\ldots,\Delta\bm{\beta}^{t+1},\Delta\bm{\gamma}). (10f)

Next, we derive the solutions to the above subproblems (10a)–(10f). For simplicity, we omit the iteration index t in the following derivation.

IV-A LS Solution for Subproblems (10a) and (10b)

By (8), we have the following convex quadratic reformulation for subproblem (10a):

\min_{\bm{v}}\ \|\bm{H}_{v}\bm{v}+\bm{c}_{v}\|_{\bm{Q}}^{2}, (11)

where \bm{H}_{v}\in\mathbb{R}^{6(K-1)\times 3K} is the coefficient matrix, \bm{c}_{v}\in\mathbb{R}^{6(K-1)} is the constant vector, and \bm{Q}\in\mathbb{S}_{+}^{6(K-1)} is a positive definite matrix. Their detailed expressions are given in Appendix C-A of the Supplementary Material. Setting the gradient to zero, the closed-form solution to subproblem (10a) is

\bm{v}^{*}=-(\bm{H}_{v}^{T}\bm{Q}\bm{H}_{v})^{\dagger}\bm{H}_{v}^{T}\bm{Q}\bm{c}_{v}, (12)

where \dagger denotes the pseudo-inverse operator.

The subproblem (10b) for updating \Delta\bm{\rho} also has a closed-form solution since its objective is also convex quadratic with respect to \Delta\bm{\rho}. Subproblem (10b) can be reformulated as

\min_{\Delta\bm{\rho}}\ \|\bm{H}_{\rho}\Delta\bm{\rho}+\bm{c}_{\rho}\|_{\bm{Q}}^{2}, (13)

where \bm{H}_{\rho}\in\mathbb{R}^{6(K-1)\times M} is the coefficient matrix and \bm{c}_{\rho}\in\mathbb{R}^{6(K-1)} is the constant vector. Their detailed expressions are given in Appendix C-B of the Supplementary Material. Similar to subproblem (10a), the closed-form solution of (10b) is

\Delta\bm{\rho}^{*}=-(\bm{H}_{\rho}^{T}\bm{Q}\bm{H}_{\rho})^{\dagger}\bm{H}_{\rho}^{T}\bm{Q}\bm{c}_{\rho}. (14)
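Both (12) and (14) are instances of the same weighted LS solve, which can be sketched in Python as follows (names ours; H, c, Q as in (11) or (13)):

def weighted_ls(H, c, Q):
    # Minimizer of ||H x + c||_Q^2, cf. (12) and (14); the pseudo-inverse
    # covers the (probability-zero) rank-deficient case.
    return -np.linalg.pinv(H.T @ Q @ H) @ (H.T @ Q @ c)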

IV-B Solution for Subproblems (10c)–(10f)

Since g_{k}(\cdot) in (8) is linear with respect to the trigonometric functions of the angle biases, all four subproblems (10c)–(10f) can be reformulated into the same QCQP form. For better presentation, we use \Delta\bm{\vartheta}\in\Omega=\{\Delta\bm{\eta},\Delta\bm{\alpha},\Delta\bm{\beta},\Delta\bm{\gamma}\} to denote one kind of these angle biases, and the corresponding subproblem can be reformulated as

\min_{\bm{x}\in\mathbb{R}^{2M}}\ \|\bm{H}_{\vartheta}\bm{x}+\bm{c}_{\vartheta}\|_{\bm{Q}}^{2},\quad \textrm{s.t.}\ \bm{x}\in\mathcal{C}, (15)

where

\bm{x}=[\cos\Delta\vartheta_{1},\sin\Delta\vartheta_{1},\ldots,\cos\Delta\vartheta_{M},\sin\Delta\vartheta_{M}]^{T}

and the set \mathcal{C} is defined as

\mathcal{C}\triangleq\{\bm{x}\in\mathbb{R}^{2M}\mid x_{2i}^{2}+x_{2i-1}^{2}=1,\ i=1,\dots,M\}.

In (15), \bm{H}_{\vartheta}\in\mathbb{R}^{6(K-1)\times 2M} is the coefficient matrix and \bm{c}_{\vartheta}\in\mathbb{R}^{6(K-1)} is the constant vector. Their detailed expressions are given in Appendix C-C of the Supplementary Material.

Problem (15) is a non-convex problem due to its non-convex equality constraints. In general, this class of problems is known to be NP-hard [28], and the SDR based approach [29, 30] has been used to solve it. In particular, in [23], the authors utilized the SDR technique to solve a special case of problem (15) which can be reformulated as a complex QCQP, and proved that the SDR is tight, i.e., the global minimizer of the QCQP problem can be obtained by solving the SDR under a mild condition. This motivates us to ask whether problem (15) can also be globally solved and whether there are efficient algorithms better than solving the SDR.

Note that the non-convex constraints in problem (15) enjoy an "easy-projection" property. In particular, let \textrm{Proj}_{\mathcal{C}}(\bm{x}) denote the projection of any non-zero point \bm{x}\in\mathbb{R}^{2M} onto \mathcal{C}. Then \textrm{Proj}_{\mathcal{C}}(\bm{x}) admits the following closed-form solution:

\textrm{Proj}_{\mathcal{C}}(\bm{x})=\arg\min_{\bm{z}\in\mathcal{C}}\|\bm{x}-\bm{z}\|^{2}=\bm{D}_{\bm{x}}^{-1}\bm{x},

where \bm{D}_{\bm{x}} is a diagonal matrix defined as

\bm{D}_{\bm{x}}=\textrm{Diag}\left(\left[\sqrt{\bm{x}_{1}^{2}+\bm{x}_{2}^{2}},\ldots,\sqrt{\bm{x}_{2M-1}^{2}+\bm{x}_{2M}^{2}}\right]\right)\otimes\bm{I}_{2},

and \otimes denotes the Kronecker product. Such an "easy-projection" property, together with recent results on the ADMM for nonconvex problems [31], provides a potential way of efficiently solving problem (15) with guaranteed convergence. The choice of the ADMM over other potential methods such as the gradient projection (GP) method is discussed in Remark 1. Next, we apply the ADMM to problem (15) and show that it is able to globally solve the problem under mild conditions.
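Both the projection \textrm{Proj}_{\mathcal{C}}(\cdot) and the recovery of the angle estimates from \bm{x} admit short implementations; a Python sketch (names ours):

def project_onto_C(x):
    # Proj_C(x): normalize each (cos, sin) pair of x to unit norm, i.e., D_x^{-1} x.
    pairs = x.reshape(-1, 2)
    return (pairs / np.linalg.norm(pairs, axis=1, keepdims=True)).ravel()

def angles_from_x(x):
    # Recover each angle bias from x = [cos d_1, sin d_1, ..., cos d_M, sin d_M].
    pairs = x.reshape(-1, 2)
    return np.arctan2(pairs[:, 1], pairs[:, 0])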

Remark 1 (ADMM versus GP).

Note that the GP method is also a viable option for solving problem (15), whose convergence to a stationary point can be guaranteed under the Kurdyka-Lojasiewicz framework [32]. In this paper, we choose the ADMM rather than the GP method for solving problem (15) because the ADMM exhibits a more stable numerical behavior; in particular, it is more stable with respect to the choice of the step size than the GP method. This is particularly important for solving our problem of interest (8), as subproblems of the form (15) need to be solved many times in the proposed BCD algorithm.

Based on this "easy-projection" property, we introduce an auxiliary variable \bm{z} with \bm{x}=\bm{z} to split problem (15), which makes all subproblems in the ADMM iterations admit closed-form solutions. In particular, problem (15) is reformulated as

\min_{\bm{z},\bm{x}}\ \|\bm{H}_{\vartheta}\bm{x}+\bm{c}_{\vartheta}\|_{\bm{Q}}^{2}+\iota_{\mathcal{C}}(\bm{z}),\quad \textrm{s.t.}\ \bm{x}=\bm{z}, (16)

where \iota_{\mathcal{C}}(\bm{z}) is the indicator function of the set \mathcal{C}, given as

\iota_{\mathcal{C}}(\bm{z})=\begin{cases}0,&\textrm{if }\bm{z}\in\mathcal{C},\\ \infty,&\textrm{otherwise}.\end{cases} (17)

The augmented Lagrangian function for problem (16) is

L_{\rho}(\bm{x},\bm{z},\bm{\lambda})=\|\bm{H}_{\vartheta}\bm{x}+\bm{c}_{\vartheta}\|_{\bm{Q}}^{2}+\bm{\lambda}^{T}(\bm{x}-\bm{z})+\frac{\rho}{2}\|\bm{x}-\bm{z}\|^{2}+\iota_{\mathcal{C}}(\bm{z}), (18)

where \rho>0 is the penalty parameter and \bm{\lambda}\in\mathbb{R}^{2M} is the Lagrange multiplier for the equality constraint \bm{x}=\bm{z}. Then, the ADMM updates at iteration \ell are

\bm{x}^{\ell+1}=\arg\min_{\bm{x}}L_{\rho}(\bm{x},\bm{z}^{\ell},\bm{\lambda}^{\ell}), (19a)
\bm{z}^{\ell+1}=\arg\min_{\bm{z}}L_{\rho}(\bm{x}^{\ell+1},\bm{z},\bm{\lambda}^{\ell}), (19b)
\bm{\lambda}^{\ell+1}=\bm{\lambda}^{\ell}+\rho(\bm{x}^{\ell+1}-\bm{z}^{\ell+1}). (19c)

Note that subproblem (19a) is an unconstrained strongly convex quadratic problem and subproblem (19b) is actually a projection problem, both of which can be solved in closed forms. Details are given in Algorithm 1.

Algorithm 1 ADMM for Solving Problem (16)
1: Initialize \bm{x}^{0},\bm{z}^{0},\bm{\lambda}^{0} and \rho>0
2: for \ell=0,1,2,\ldots do
3: \bm{x}^{\ell+1}=-(\bm{H}_{\vartheta}^{T}\bm{Q}\bm{H}_{\vartheta}+\frac{\rho}{2}\bm{I})^{-1}(2\bm{H}_{\vartheta}^{T}\bm{Q}\bm{c}_{\vartheta}+\bm{\lambda}^{\ell}-\rho\bm{z}^{\ell})/2;
4: \bm{z}^{\ell+1}=\textrm{Proj}_{\mathcal{C}}\big((\bm{\lambda}^{\ell}+\rho\bm{x}^{\ell+1})/\rho\big);
5: \bm{\lambda}^{\ell+1}=\bm{\lambda}^{\ell}+\rho(\bm{x}^{\ell+1}-\bm{z}^{\ell+1});
6: end for
7: Output \bm{x}^{*},\bm{z}^{*}
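A compact Python sketch of Algorithm 1 is given below (names ours); it reuses project_onto_C from the earlier sketch, factors the x-update matrix once since it is fixed across iterations, and, if \rho is not supplied, adopts the choice described in Section V (the average diagonal element of \bm{H}_{\vartheta}^{T}\bm{Q}\bm{H}_{\vartheta}), together with the stopping rule on the primal and dual residuals also used there.

def admm_qcqp(H, c, Q, rho=None, max_iter=5000, tol=1e-9):
    # ADMM (Algorithm 1) for problem (16).
    HtQH, HtQc = H.T @ Q @ H, H.T @ Q @ c
    rho = np.mean(np.diag(HtQH)) if rho is None else rho
    n = H.shape[1]
    x, lam = np.zeros(n), np.zeros(n)
    z = project_onto_C(np.ones(n))
    A = HtQH + rho / 2 * np.eye(n)          # fixed x-update matrix
    for _ in range(max_iter):
        x = -np.linalg.solve(A, 2 * HtQc + lam - rho * z) / 2     # step 3
        z_new = project_onto_C((lam + rho * x) / rho)             # step 4
        lam = lam + rho * (x - z_new)                             # step 5
        primal = np.linalg.norm(x - z_new)
        dual = rho * np.linalg.norm(z_new - z)
        z = z_new
        if primal < tol and dual < tol:
            break
    return x, z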

Next, we discuss the convergence property of Algorithm 1. Before doing so, we first study the structures of the coefficients \bm{H}_{\vartheta} and \bm{c}_{\vartheta} in problem (15). Note that \bm{H}_{\vartheta} and \bm{c}_{\vartheta} can be viewed as random matrices/vectors, where the randomness is caused by the zero-mean Gaussian noises, i.e., \bm{w}_{k} in (3) and \bm{n}_{k},\,\dot{\bm{n}}_{k} in (5). To understand the effects of these noises, let us suppose that there is no noise, i.e., \bm{w}_{k}=\bm{0},\bm{n}_{k}=\dot{\bm{n}}_{k}=\bm{0} for all k, and that all the other biases \Omega\setminus\Delta\bm{\vartheta} and the velocity \bm{v} take their true values. Then, by the same derivations as in Appendix C of the Supplementary Material (with compensation factors \lambda_{\eta}=\lambda_{\phi}=1), we obtain new \bar{\bm{H}}_{\vartheta} and \bar{\bm{c}}_{\vartheta}. Intuitively, \bar{\bm{H}}_{\vartheta} and \bar{\bm{c}}_{\vartheta} represent the "true" parts of \bm{H}_{\vartheta} and \bm{c}_{\vartheta}, obtained by removing the measurement noise \bm{w}_{k}, the motion process noises \bm{n}_{k} and \dot{\bm{n}}_{k}, and the errors of the other biases and the velocity. Consequently, the "errors" in \bm{H}_{\vartheta} and \bm{c}_{\vartheta} can be defined as

\Delta\bm{H}_{\vartheta}=\bm{H}_{\vartheta}-\bar{\bm{H}}_{\vartheta},\quad \Delta\bm{c}_{\vartheta}=\bm{c}_{\vartheta}-\bar{\bm{c}}_{\vartheta}. (20)

Technically, the decomposition of \bm{H}_{\vartheta} and \bm{c}_{\vartheta} in (20) provides a natural way of characterizing the convergence behavior of Algorithm 1, given in Theorem 1.

Theorem 1.

Consider applying Algorithm 1 to solve problem (16) with a sufficiently large penalty parameter \rho. Then:

1. the sequence generated by Algorithm 1 has at least one limit point, and each limit point is a stationary point of L_{\rho}(\bm{x},\bm{z},\bm{\lambda}) (defined in (18));

2. if \Delta\bm{H}_{\vartheta} and \Delta\bm{c}_{\vartheta} in (20) are sufficiently small, then L_{\rho}(\bm{x},\bm{z},\bm{\lambda}) has a unique stationary point (\bar{\bm{x}},\bar{\bm{z}},\bar{\bm{\lambda}}) and the entire sequence generated by Algorithm 1 converges to (\bar{\bm{x}},\bar{\bm{z}},\bar{\bm{\lambda}}); furthermore, if \Delta\bm{H}_{\vartheta}=\bm{0} and \Delta\bm{c}_{\vartheta}=\bm{0}, then \bar{\bm{x}} is equal to the corresponding true bias.

Proof.

See Appendix B. ∎

The first statement of Theorem 1 is a direct result of [31, Corollary 2], which ensures the convergence of the ADMM for a class of nonconvex problems (with a Lipschitz continuous gradient) over a compact manifold. The second statement of Theorem 1 on the uniqueness of the stationary point is due to the special structure of problem (16). The entire-sequence convergence in Theorem 1 is a combined result of the general convergence of the ADMM in [31, Corollary 2] and the special structure of problem (16).

IV-C Proposed BCD Algorithm

The proposed BCD algorithm for solving problem (8) is summarized as Algorithm 2. Three remarks on its convergence, computational complexity, and comparison with other BCD variants are given in order.

Algorithm 2 Proposed BCD Algorithm for Problem (8)
0: Input: sensor measurements \{\bm{z}_{k}\}_{k=1}^{K}
1: Initialize \Delta\bm{\rho}^{0},\Delta\bm{\eta}^{0},\Delta\bm{\alpha}^{0},\Delta\bm{\beta}^{0},\Delta\bm{\gamma}^{0}, and \bm{v}^{0}
2: for t=0,1,2,\ldots do
3: Update \bm{v}^{t+1} by (12);
4: Update \Delta\bm{\rho}^{t+1} by (14);
5: Sequentially update \Delta\bm{\eta}^{t+1}, \Delta\bm{\alpha}^{t+1}, \Delta\bm{\beta}^{t+1}, \Delta\bm{\gamma}^{t+1} by Algorithm 1;
6: end for
7: Output: sensor biases \Delta\bm{\rho}^{*},\Delta\bm{\eta}^{*},\Delta\bm{\alpha}^{*},\Delta\bm{\beta}^{*},\Delta\bm{\gamma}^{*}.

Convergence of Algorithm 2. The well-known convergence analysis of the BCD method [33, 34] shows that if all subproblems in the proposed BCD Algorithm 2 have unique solutions and can be globally solved, then the iterates converge to a stationary point of the non-convex problem (8). The uniqueness and the global optimality of the solutions \bm{v}^{*} (in (12)) for subproblem (10a) and \Delta\bm{\rho}^{*} (in (14)) for subproblem (10b) are guaranteed if \bm{H}_{v} and \bm{H}_{\rho} have full rank. For subproblems (10c)–(10f), the full rankness of \bm{H}_{\vartheta}, together with sufficiently small \Delta\bm{H}_{\vartheta} and \Delta\bm{c}_{\vartheta} in (20), further ensures that each of the corresponding subproblems has a unique solution, which can be numerically attained by the proposed ADMM Algorithm 1 (cf. Theorem 1). According to the definitions of \bm{H}_{v}, \bm{H}_{\rho}, and \bm{H}_{\vartheta} in Appendix C of the Supplementary Material and Assumption 1, we know that (i) \bm{H}_{v} in (11) always has full rank since each sensor has at least one measurement; (ii) \bm{H}_{\rho} in (13) and \bm{H}_{\vartheta} in (15) have full rank with probability one (the proof is provided in Appendix D of the Supplementary Material). Based on the above discussion, we conclude that Algorithm 2 converges to a stationary point of the non-convex problem (8) if \Delta\bm{H}_{\vartheta} and \Delta\bm{c}_{\vartheta} along the BCD iterates are all sufficiently small. Finally, we remark that this sufficiently-small condition provides a theoretical understanding of the convergence of Algorithm 2, and in general it depends on the initial point. Our numerical results in Section V suggest that this condition always holds in various scenarios when all biases are simply initialized to zero. The results in the noiseless setting (Section V-A) even verify the global optimality of Algorithm 2.

Computational complexity of Algorithm 2. All subproblems in the proposed BCD updates (10a)–(10f) take two special forms, i.e., unconstrained linear LS problems (subproblems (11) and (13)) and quadratic programs with special non-convex constraints (subproblem (15)). It can be shown that the computational complexity of solving subproblems (11), (13), and (15) is \mathcal{O}(K^{2}+K), \mathcal{O}(M^{2}+MK+M), and \mathcal{O}(I((2M)^{2}+2MK+2M)) (where I is the total iteration number of the ADMM), respectively. It is worth mentioning that the computational complexity of solving (15) by the ADMM, which is \mathcal{O}(I(2M)^{2}), is generally much less than that of the SDR technique, which is \mathcal{O}((2M)^{4.5}) [29]. The computational time comparison in Table II (Section V-B2) demonstrates the much better computational efficiency of the ADMM over the SDR based approach for solving subproblem (15).

BCD variants. Finally, we remark that there are several nice extensions of the BCD framework, such as the block successive minimization method [35] and the maximum block improvement (MBI) method [36]. Both can be applied to solve problem (8) with guaranteed convergence (with a suitable approximation of the objective function), and the sequential optimization version of MBI [36] can effectively handle nonconvex constraints. In this paper, we still use the basic version of BCD because the nonconvex BCD subproblems with respect to the angle biases enjoy the unique-solution property (cf. Theorem 1), which reveals an interesting insight into the considered bias estimation problem: the true biases can be recovered in the ideal case (i.e., \Delta\bm{H}_{\vartheta}=\bm{0} and \Delta\bm{c}_{\vartheta}=\bm{0}). MBI enjoys the same uniqueness property of the subproblems but requires more computational cost than the classical BCD. A parallel implementation of MBI could improve its computational efficiency, but this is beyond the scope of this paper. Simulation results in Section V-A further show that all biases are exactly recovered by the proposed BCD Algorithm 2 in the noiseless case.

V Simulation Results

In this section, we present simulation results to illustrate the performance of using our proposed BCD Algorithm 2 to solve the proposed formulation (8).

Scenario setting: We consider a scenario with four sensors and one target in the 3-dimensional space, as illustrated in Fig. 2. The locations and biases of the sensors are listed in Table I. The target moves with a nearly constant velocity \mathbb{E}[\dot{\bm{\xi}}_{k}]=[0,0.3,0] km/s with the initial position [-30,-5,8] km. The presumed rotation angles \bm{\zeta}_{m} are fixed at (0^{\circ},0^{\circ},0^{\circ}) for all m=1,2,3,4. The sensors work in an asynchronous mode: the four sensors measure the target position every 10 s with different starting times, i.e., 2.5 s, 5 s, 7.5 s, and 10 s, respectively. Moreover, each sensor sends 20 time-stamped measurements to a fusion center for sensor registration, and the total observation lasts 210 s. All simulations are run on a laptop (Intel Core i7) with MATLAB 2018b.

Baseline approaches: We select three representative approaches from the literature and extend them to the considered 3-dimensional case where all sensors are biased and asynchronous. The first approach is the augmented state Kalman filtering (ASKF) approach [37], which treats the sensor biases as augmented states and uses the Kalman filter to jointly estimate the sensor biases and the target states. The second approach is the LS approach [7], which solves the proposed NLS formulation in (8) using the linear approximation (referred to as Linearized LS). The third approach is the ML approach [11, 6], which was originally proposed for the synchronous multi-sensor system [11] or for the scenario with one bias-free sensor [6]. To extend it to the considered 3-dimensional case with a stable numerical behavior, we first estimate the target states by the smoothed Kalman filter (SKF) and then estimate the sensor biases by the Gauss-Newton method [33]. For simplicity, we refer to this approach as SKF-GN.

Proposed algorithms' parameter selection: In all of the following simulations, the parameters of the BCD and ADMM algorithms are selected as follows: the proposed BCD algorithm is initialized with all biases being zeros, and the proposed ADMMs for solving subproblems (10c)–(10f) are also initialized with the corresponding angle biases being zeros; the penalty parameter \rho in the ADMM is set to the average of the diagonal elements of the matrix \bm{H}_{\vartheta}^{T}\bm{Q}\bm{H}_{\vartheta} in (15); the proposed BCD algorithm is terminated when the maximum difference between two consecutive BCD iterates is less than 10^{-5}, and the ADMM is terminated when both the primal and dual residuals are less than 10^{-9}.

TABLE I: Sensor Positions and Biases [km, degree].
Position \Delta\rho \Delta\eta \Delta\alpha \Delta\beta \Delta\gamma
Sensor 1 [0,-15,0] -0.5 -2 -2 1 -1
Sensor 2 [-20,5,2] 0.3 -2 2 -1 -1
Sensor 3 [20,5,0] -0.4 -2 2 -2 2
Sensor 4 [0,10,-1] -0.2 -1 -2 -1 1
Figure 2: An illustration of the simulation scenario.

V-A Global Optimality of Proposed BCD in the Noiseless Case

In this subsection, we test the convergence behavior of the proposed BCD algorithm for solving problem (8) with \bm{Q}_{k}=\bm{I}, \forall\,k. We consider the noiseless case with the measurement noise levels \sigma_{\rho}^{2}=0\,\textrm{km}, \sigma_{\eta}^{2}=\sigma_{\phi}^{2}=0^{\circ} and the target motion noise q=0\,\textrm{m}^{2}/\textrm{s}^{3}. In such a case, the true biases achieve a zero loss and hence are a global solution of problem (8). Fig. 3 plots the optimization residuals over the number of iterations and the elapsed time. The optimization residual for each kind of bias is defined as the sum of the squared estimation errors over all sensors, e.g., the optimization residual for the range bias is \sum_{m=1}^{M}(\Delta\hat{\rho}_{m}-\Delta\rho_{m})^{2}, where \Delta\hat{\rho}_{m} is the estimated range bias of sensor m. The convergence curves in Fig. 3 demonstrate the global optimality of the proposed BCD algorithm in the noiseless case, i.e., the loss function converges to zero and the final solution returned by the BCD algorithm is very close to the true biases. Similar convergence behavior has also been observed in many other scenarios. In particular, Fig. 4 shows the convergence curves of the proposed BCD algorithm in 20 randomly generated scenarios (the sensors' positions are uniformly placed over the region [-20,20]^{2}\times[-1,1] km and the sensors' biases are generated from Gaussian distributions, i.e., \Delta\rho\sim\mathcal{N}(0,(1\,\textrm{km})^{2}) and \Delta\eta,\Delta\alpha,\Delta\beta,\Delta\gamma\sim\mathcal{N}(0,(3^{\circ})^{2})).

Figure 3: Convergence of the proposed BCD algorithm in the noiseless case of the scenario illustrated in Fig. 2.
Figure 4: Convergence of the proposed BCD algorithm in 20 randomly generated scenarios, where each curve corresponds to one randomly generated scenario.

V-B Estimation Performance in the Noisy Case

In this subsection, we evaluate the estimation performance of the proposed BCD algorithm in the noisy case. Note that the global optimality in the noisy case is generally hard to verify, as the global solution of the corresponding problem is unknown. However, our numerical results show that the solution found by the BCD algorithm achieves quite satisfactory estimation performance.

In the rest of the simulations, we use the root mean square error (RMSE) as the estimation performance metric. The root of the hybrid Cramer-Rao lower bound (RHCRLB) [38, 7] is used as the performance benchmark. The RMSEs and RHCRLBs in all of the following figures are obtained by averaging over 100 independent Monte Carlo runs. We compare the proposed BCD algorithm with the aforementioned three approaches under different setups.
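For reference, the RMSE over Monte Carlo runs can be computed with a small helper like the following (Python sketch; names ours):

def rmse(estimates, truth):
    # estimates: array of shape (num_runs, ...); truth broadcasts over runs.
    err = np.asarray(estimates) - np.asarray(truth)
    return np.sqrt(np.mean(err**2, axis=0))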

V-B1 Estimation Performance Comparison

In this part, we compare the estimation performance of the different approaches with \sigma_{\rho}=0.05\,\textrm{km}, \sigma_{\eta}=\sigma_{\phi}=0.02^{\circ}, and q=0.5\,\textrm{m}^{2}/\textrm{s}^{3}. Two different choices of \bm{Q}_{k} in the proposed formulation (8) are considered: one is \bm{Q}_{k}=\bm{I}, which corresponds to classical nonlinear LS estimation (referred to as BCD-NLS), and the other is \bm{Q}_{k} in (9), which can be regarded as pseudo ML estimation (referred to as BCD-PML). The RMSEs of the range biases of the different sensors are shown in Fig. 5 and the RMSEs of the different kinds of angle biases are shown in Fig. 6. We can observe that the two proposed approaches (i.e., BCD-NLS and BCD-PML) achieve much smaller RMSEs than the other three approaches. The main reason is that the other three approaches all rely on the linear approximation to deal with the nonlinearity, and such approximation usually results in a model mismatch that degrades the estimation performance. Notice that the only difference between Linearized LS and BCD-NLS is the algorithm used for solving problem (8) with \bm{Q}=\bm{I}: Linearized LS utilizes the linear approximation, which admits a closed-form solution for estimating the biases, while BCD-NLS uses the proposed BCD algorithm to deal with the nonlinearity. Clearly, BCD-NLS achieves smaller RMSEs. Furthermore, by incorporating the second-order statistics, BCD-PML achieves smaller RMSEs than BCD-NLS, which demonstrates the effectiveness of the proposed choice of \bm{Q}_{k} in (9).

Figure 5: RMSE and RHCRLB for \sigma_{\rho}=0.05\,\textrm{km}, \sigma_{\eta}=\sigma_{\phi}=0.02^{\circ}, and q=0.5\,\textrm{m}^{2}/\textrm{s}^{3}.
Figure 6: RMSE and RHCRLB for \sigma_{\rho}=0.05\,\textrm{km}, \sigma_{\eta}=\sigma_{\phi}=0.02^{\circ}, and q=0.5\,\textrm{m}^{2}/\textrm{s}^{3}. (a) Elevation bias. (b) Roll bias. (c) Pitch bias. (d) Yaw bias.

V-B2 Computational Time Comparison

In this part, we compare the computational time of the different approaches, summarized in Table II. Notice that the SDR technique in [23] can be extended to deal with the nonlinearity in the 3-dimensional scenario (under mild conditions, the SDR technique gives the same solution of problem (15) as the proposed ADMM; in particular, the proof in Appendix B can be modified to show that the SDR admits a unique rank-one solution if \Delta\bm{H}_{\vartheta} and \Delta\bm{c}_{\vartheta} in (20) are sufficiently small, which implies that the SDR technique and the ADMM give the same solution). We also include the computational time of employing the SDR technique to solve the QCQP subproblems of the form (15) to demonstrate the computational effectiveness of the proposed ADMM. The values in brackets in Table II correspond to the computational time of the BCD algorithm where CVX [39] is used to solve the SDRs of the corresponding QCQP subproblems. In our simulations, we observe that the difference between the final solutions returned by the BCD algorithm with the SDR technique and with the ADMM (for solving the subproblems) is at most 10^{-5}, which implies that the two approaches return almost the same solutions.

From Table II, it can be observed that the three approaches based on the linear approximation, i.e., ASKF, Linearized LS, and SKF-GN, take much less computational time than the two proposed approaches. This is because the linear approximation simplifies the estimation procedure, although, as shown above, it also degrades the estimation performance. Comparing our proposed algorithms with the ADMM and with the SDR technique (in brackets) for solving the subproblems, we can observe that the proposed ADMM significantly reduces the computational time. We remark that sensor biases in practice usually change slowly [7], i.e., over hours or days, and thus the computational time of the proposed approaches is acceptable for practical applications.

TABLE II: Computational Time [second]
ASKF Linearized LS SKF-GN BCD-NLS BCD-PML
0.03 0.02 0.16 34.61 (1309.21) 33.28 (1289.54)
Figure 7: RMSE and RHCRLB of range and elevation biases with different measurement noise levels (\sigma_{\rho},\sigma_{\phi},\sigma_{\eta}).

V-B3 Impact of Measurement Noise

In this part, we present simulation results to illustrate the effect of the measurement noise on the estimation performance. We fix the motion noise density q=0.5\,\textrm{m}^{2}/\textrm{s}^{3} and set different values of the measurement noise level (\sigma_{\rho},\sigma_{\phi},\sigma_{\eta}). Since the measurement noise level has similar impacts on the estimation performance of the four sensors, we only present the RMSEs and RHCRLBs averaged over the four sensors. The comparison of the range biases is presented in Fig. 7 and the comparison of the other biases is presented in Fig. 10. It can be observed that BCD-PML achieves the smallest RMSEs in all cases and that BCD-NLS is slightly worse than BCD-PML. The other three approaches, due to the model mismatch error introduced by the linear approximation, all suffer a "threshold" effect when the measurement noise is small, i.e., \sigma_{\phi}=\sigma_{\eta}=0.02^{\circ}, \sigma_{\rho}=0.01\,\textrm{km}. In contrast, the proposed BCD algorithm exploits the special problem structure and utilizes nonlinear optimization techniques to deal with the nonlinearity. Therefore, BCD-NLS and BCD-PML do not exhibit the "threshold" effect, and their RMSEs are quite close to the RHCRLBs when the measurement noise level is small.

V-B4 Impact of Target Motion Noise

In this part, we present simulation results illustrating the effect of the target motion noise on the estimation performance. We fix the measurement noise level at σρ = 0.1 km and σϕ = ση = 0.1° and vary the motion noise density q in the interval [0.2, 2] m²/s³. Similar to Section V-B3, we compare the averaged RMSEs of the range biases in Fig. 8 and the RMSEs of the other angle biases in Fig. 11. It can be observed that all approaches are fairly robust to the motion process noise. As before, the proposed approaches perform better than the other three approaches, and their RMSEs are very close to the RHCRLBs.

Figure 8: RMSE and RHCRLB of range and elevation biases with different motion process noise levels q.

V-B5 Impact of Sensor Bias

To illustrate the effect of the bias level on the estimation performance (especially for the other three approaches, which rely on the linear approximation), we present simulation results with different levels of the sensor biases. Specifically, we fix σρ = 0.1 km, σϕ = ση = 0.1°, and q = 0.5 m²/s³, and scale the bias values in Table I by a positive constant c, i.e., c = 0.2, 0.5, 1, 2. The averaged RMSEs of the range biases and of the other angle biases are compared in Figs. 9 and 12, respectively. When the bias level is small, i.e., c = 0.2, all approaches have similar RMSEs since the linear approximation incurs only a small model mismatch error. However, as c increases, the RMSEs of the other three approaches grow rapidly, whereas the two proposed approaches remain robust to the bias level.

Figure 9: RMSE and RHCRLB of range and elevation biases with different bias levels c.
Figure 10: RMSE and RHCRLB of rotation angle biases with different measurement noise levels (σρ, σϕ, ση). (a) Elevation bias; (b) roll bias; (c) pitch bias; (d) yaw bias.
Figure 11: RMSE and RHCRLB of rotation angle biases with different motion process noise levels q. (a) Elevation bias; (b) roll bias; (c) pitch bias; (d) yaw bias.
Figure 12: RMSE and RHCRLB of rotation angle biases with different bias levels c. (a) Elevation bias; (b) roll bias; (c) pitch bias; (d) yaw bias.

VI Conclusion and Future Work

In this paper, we have presented a weighted nonlinear LS formulation for the 3-dimensional asynchronous multi-sensor registration problem by assuming the existence of a reference target moving with a nearly constant velocity. By choosing appropriate weight matrices, the proposed formulation includes the classical nonlinear LS and ML estimation as special cases. Moreover, we have proposed an efficient BCD algorithm for solving the formulated problem and developed a computationally efficient algorithm based on the ADMM to globally solve the nonconvex subproblems with respect to the angle biases in the BCD iteration. Numerical simulation results demonstrate the effectiveness and efficiency of the proposed formulation and the BCD algorithm. In particular, the proposed approaches achieve significantly better estimation performance than well-known existing approaches based on the linear approximation, and they are the most robust to the sensor bias level.

It is important to note that the proposed formulation (8) implicitly assumes that all observed data originate from a single moving target. Other important issues in practical multi-sensor systems, such as missed detections, multiple targets, and clutter [40], are not considered. These more complicated scenarios are of great practical interest, and dealing with the data association problem (with biased data) arising there is necessary [19]. Combining the developed optimization technique with advanced multi-target tracking techniques [41] would be an interesting direction to explore further, especially for sensors on moving platforms. Finally, we remark that the use of the rotation matrix in (2) may lead to a gimbal lock problem when the pitch angle is ±90°. In the context of this study, where sensors are fixed at ground locations with fixed positions and attitude angles, this issue can be avoided by pre-aligning the rotation angles between the local and global Cartesian coordinates to be zero. In contrast, for sensors mounted on moving platforms, it is advisable to use quaternions to prevent gimbal lock from occurring. Exploring techniques for compensating biases in quaternion representations would be an interesting topic for future research.

Appendix A Proof of Proposition 1

Since the result is independent of the noise 𝒘k, we assume 𝒘k = 𝟎 for notational simplicity. Define (r, ω, υ) such that [r cos ω cos υ, r sin ω cos υ, r sin υ]^T = R_y^T(β_{s_k} + Δβ_{s_k}) R_x^T(α_{s_k} + Δα_{s_k})(𝝃k − 𝒑_{s_k}). Then, by (3) and omitting the subscripts k and s_k, we have

[ρϕη]\displaystyle\small\begin{bmatrix}{\rho}\\ {\phi}\\ {\eta}\end{bmatrix} =h(RzT(γ+Δγ)[rcosωcosυrsinωcosυrsinυ])[ΔρΔϕΔη]\displaystyle=h\left(R_{z}^{T}(\gamma+\Delta\gamma)\begin{bmatrix}r\cos\omega\cos\upsilon\\ r\sin\omega\cos\upsilon\\ r\sin\upsilon\end{bmatrix}\right)-\begin{bmatrix}\Delta\rho\\ \Delta\phi\\ \Delta\eta\end{bmatrix}
=h([rcos(ωγΔγ)cosυrsin(ωγΔγ)cosυrsinυ])[ΔρΔϕΔη]\displaystyle=h\left(\begin{bmatrix}r\cos(\omega-\gamma-\Delta\gamma)\cos\upsilon\\ r\sin(\omega-\gamma-\Delta\gamma)\cos\upsilon\\ r\sin\upsilon\end{bmatrix}\right)-\begin{bmatrix}\Delta\rho\\ \Delta\phi\\ \Delta\eta\end{bmatrix}
=[rΔρωγΔγΔϕυΔη].\displaystyle=\begin{bmatrix}r-\Delta\rho\\ \omega-\gamma-\Delta\gamma-\Delta\phi\\ \upsilon-\Delta\eta\end{bmatrix}.

This completes the proof.
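For intuition, the identity above can be checked numerically. The following minimal sketch (Python/NumPy, not from the paper's code; it assumes the spherical convention of (3), i.e., h(x, y, z) = (‖·‖₂, atan2(y, x), arcsin(z/‖·‖₂)), and draws geometry that keeps the azimuth away from the ±π wrap-around) verifies the closed form of the noiseless biased measurement:

```python
import numpy as np

# Rotation matrices about the x-, y-, and z-axes (standard right-handed convention).
def Rx(a): return np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
def Ry(b): return np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
def Rz(g): return np.array([[np.cos(g), -np.sin(g), 0], [np.sin(g), np.cos(g), 0], [0, 0, 1]])

def h(p):  # Cartesian -> (range, azimuth, elevation), the assumed convention of (3)
    rho = np.linalg.norm(p)
    return np.array([rho, np.arctan2(p[1], p[0]), np.arcsin(p[2] / rho)])

rng = np.random.default_rng(0)
xi = rng.uniform(5.0, 10.0, size=3)                 # target position (azimuth stays away from +-pi)
p = rng.normal(size=3) * 0.1                        # sensor position
alpha, beta, gamma = rng.uniform(-0.5, 0.5, size=3)  # attitude angles
da, db, dg = rng.uniform(-0.05, 0.05, size=3)       # attitude angle biases
drho, dphi, deta = 0.1, 0.02, 0.02                  # range/azimuth/elevation biases

# (r, omega, upsilon): spherical coordinates after the x- and y-axis rotations
q = Ry(beta + db).T @ Rx(alpha + da).T @ (xi - p)
r, omega, upsilon = h(q)

# Noiseless biased measurement, as in the first line of the displayed equation
z = h(Rz(gamma + dg).T @ q) - np.array([drho, dphi, deta])

# Claim of Proposition 1
assert np.allclose(z, [r - drho, omega - gamma - dg - dphi, upsilon - deta])
```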

Appendix B Proof of Theorem 1

In this appendix, we use Δϑ¯ to denote the true value of the corresponding bias and define 𝒙(Δϑ) = [cos Δϑ₁, sin Δϑ₁, …, cos Δϑ_M, sin Δϑ_M]^T. For notational simplicity, we also omit the subscript ϑ of 𝑯ϑ and 𝒄ϑ. Since the objective function in (15) is a quadratic function whose gradient is Lipschitz continuous and the constraint set 𝒞 is a smooth compact manifold, by [31, Corollary 2], for a sufficiently large penalty parameter ρ, the sequence {𝒙^t, 𝒛^t, 𝝀^t} generated by Algorithm 1 has at least one limit point, and any limit point is a stationary point of the augmented Lagrangian function L_ρ(𝒙, 𝒛, 𝝀). This proves the first statement in Theorem 1.

Next, we prove the second statement. From the definition of L_ρ(𝒙, 𝒛, 𝝀) in (18), a stationary point (𝒙, 𝒛, 𝝀) of it is defined by

Lρ𝒙=2𝑯T𝑸𝑯𝒙+2𝑯T𝑸𝒄+𝝀+ρ𝒙ρ𝒛=𝟎,\displaystyle\frac{\partial L_{\rho}}{\partial\bm{x}}=2\bm{H}^{T}\bm{Q}\bm{H}\bm{x}+2\bm{H}^{T}\bm{Q}\bm{c}+\bm{\lambda}+\rho\bm{x}-\rho\bm{z}=\bm{0}, (21a)
𝟎Lρ𝒛=𝝀+ρ𝒛ρ𝒙+ι𝒞(𝒛),\displaystyle\bm{0}\in\frac{\partial L_{\rho}}{\partial\bm{z}}=-\bm{\lambda}+\rho\bm{z}-\rho\bm{x}+\partial\iota_{\mathcal{C}}(\bm{z}), (21b)
Lρ𝝀=𝒙𝒛=𝟎.\displaystyle\frac{\partial L_{\rho}}{\partial\bm{\lambda}}=\bm{x}-\bm{z}=\bm{0}. (21c)

In the above, ι𝒞(𝒛)\partial\iota_{\mathcal{C}}(\bm{z}) denotes the Fréchet subdifferential [42], and by Proposition 4 below, we have ι𝒞(𝒛)=𝚲𝒛\partial\iota_{\mathcal{C}}(\bm{z})=\bm{\Lambda}\bm{z}, where 𝚲=Diag(𝜼)𝑰2\bm{\Lambda}=\textrm{Diag}(\bm{\eta})\otimes\bm{I}_{2} and 𝜼M\bm{\eta}\in\mathbb{R}^{M}. Then, (21) can be simplified as follows:

𝑨𝒙 + 𝒃 + 𝚲𝒙 = 𝟎, 𝒙 ∈ 𝒞, (22)

where 𝑨=2𝑯T𝑸𝑯\bm{A}=2\bm{H}^{T}\bm{Q}\bm{H} and 𝒃=2𝑯T𝑸𝒄\bm{b}=2\bm{H}^{T}\bm{Q}\bm{c}. By (20), we have 𝑯=𝑯¯+Δ𝑯\bm{H}=\bar{\bm{H}}+\Delta\bm{H} and 𝒄=𝒄¯+Δ𝒄\bm{c}=\bar{\bm{c}}+\Delta\bm{c}.

Express 𝒙\bm{x} by Δϑ\Delta\bm{\vartheta}, i.e., 𝒙=𝒙(Δϑ)\bm{x}=\bm{x}(\Delta\bm{\vartheta}). Then the constraint 𝒙𝒞\bm{x}\in\mathcal{C} in (22) can be eliminated and the left-hand side of (22) can be regarded as a function with respect to Δ𝑯\Delta\bm{H}, Δ𝒄\Delta\bm{c}, Δϑ\Delta\bm{\vartheta}, and 𝜼\bm{\eta}, denoted as f(Δ𝑯,Δ𝒄,Δϑ,𝜼)f(\Delta\bm{H},\Delta\bm{c},\Delta\bm{\vartheta},\bm{\eta}); see (24) further ahead. Thus, (22) is further re-expressed as

f(Δ𝑯,Δ𝒄,Δϑ,𝜼)=𝟎.f(\Delta\bm{H},\Delta\bm{c},\Delta\bm{\vartheta},\bm{\eta})=\bm{0}.

Then, by Proposition 5 below, there exist two continuous functions dϑ(Δ𝑯,Δ𝒄)d_{\vartheta}(\Delta\bm{H},\Delta\bm{c}) and dη(Δ𝑯,Δ𝒄)d_{\eta}(\Delta\bm{H},\Delta\bm{c}) in the neighborhood of (Δ𝑯,Δ𝒄)=(𝟎,𝟎)(\Delta\bm{H},\Delta\bm{c})=(\bm{0},\bm{0}) and these two functions are uniquely defined in this neighborhood such that

f(Δ𝑯,Δ𝒄,Δϑ~,𝜼~)=𝟎f(\Delta\bm{H},\Delta\bm{c},\Delta\tilde{\bm{\vartheta}},\tilde{\bm{\eta}})=\bm{0}

holds with Δϑ~=dϑ(Δ𝑯,Δ𝒄)\Delta\tilde{\bm{\vartheta}}=d_{\vartheta}(\Delta\bm{H},\Delta\bm{c}) and 𝜼~=dη(Δ𝑯,Δ𝒄)\tilde{\bm{\eta}}=d_{\eta}(\Delta\bm{H},\Delta\bm{c}). The uniqueness of these two functions implies that, for any sufficiently small (Δ𝑯,Δ𝒄)(\Delta\bm{H},\Delta\bm{c}), (Δϑ~,𝜼~)(\Delta\tilde{\bm{\vartheta}},\tilde{\bm{\eta}}) is the only solution (depending on (Δ𝑯,Δ𝒄)(\Delta\bm{H},\Delta\bm{c})) such that (22) holds. Hence, by the equivalence between (21) and (22),

(𝒙¯,𝒛¯,𝝀¯)=(𝒙(Δϑ~),𝒙(Δϑ~),[Diag(𝜼~)𝑰2]𝒙(Δϑ~))(\bar{\bm{x}},\bar{\bm{z}},\bar{\bm{\lambda}})=\left(\bm{x}(\Delta\tilde{\bm{\vartheta}}),\bm{x}(\Delta\tilde{\bm{\vartheta}}),[\textrm{Diag}(\tilde{\bm{\eta}})\otimes\bm{I}_{2}]\bm{x}(\Delta\tilde{\bm{\vartheta}})\right)

is the unique stationary point of L_ρ(𝒙, 𝒛, 𝝀) (for sufficiently small (Δ𝑯, Δ𝒄)). Furthermore, combining this with [31, Corollary 2], we know that (𝒙¯, 𝒛¯, 𝝀¯) is the unique limit point of the sequence {𝒙^t, 𝒛^t, 𝝀^t}. By [31, Lemma 6], {𝒙^t, 𝒛^t, 𝝀^t} is bounded, and hence every subsequence of {𝒙^t, 𝒛^t, 𝝀^t} converges to (𝒙¯, 𝒛¯, 𝝀¯): otherwise, some subsequence would converge to a point different from (𝒙¯, 𝒛¯, 𝝀¯), contradicting the fact that (𝒙¯, 𝒛¯, 𝝀¯) is the unique limit point. As such, the entire sequence {𝒙^t, 𝒛^t, 𝝀^t} converges to (𝒙¯, 𝒛¯, 𝝀¯).

Finally, if Δ𝑯 = 𝟎 and Δ𝒄 = 𝟎, then f(𝟎, 𝟎, Δϑ¯, 𝟎) corresponds to the gradient of ‖𝑯¯𝒙 + 𝒄¯‖²_𝑸 with respect to 𝒙 at the point 𝒙(Δϑ¯). By Proposition 6, it is straightforward to check that f(𝟎, 𝟎, Δϑ¯, 𝟎) = 𝟎, where Δϑ¯ corresponds to the true bias. Therefore, the unique stationary point 𝒙¯ is 𝒙(Δϑ¯). This completes the proof of the second statement of Theorem 1.
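For concreteness, the following minimal sketch (Python/NumPy; not the authors' implementation) carries out ADMM updates consistent with the augmented Lagrangian (18) and the stationarity system (21): the 𝒙-update solves (2𝑯^T𝑸𝑯 + ρ𝑰)𝒙 = ρ𝒛 − 𝝀 − 2𝑯^T𝑸𝒄 (cf. (21a)), the 𝒛-update projects 𝒙 + 𝝀/ρ blockwise onto the unit circles defining 𝒞, and the dual update is 𝝀 ← 𝝀 + ρ(𝒙 − 𝒛). The penalty ρ, the iteration count, and the synthetic data are illustrative choices.

```python
import numpy as np

def admm_circle_qcqp(H, c, Q, rho=50.0, iters=500):
    """ADMM sketch for min ||H x + c||_Q^2 s.t. each pair (x_{2m-1}, x_{2m}) lies on the unit circle."""
    n = H.shape[1]                                # n = 2M
    A, b = 2 * H.T @ Q @ H, 2 * H.T @ Q @ c       # objective gradient is A x + b
    x, lam = np.zeros(n), np.zeros(n)
    z = np.tile([1.0, 0.0], n // 2)               # feasible starting point on C
    chol = np.linalg.cholesky(A + rho * np.eye(n))
    for _ in range(iters):
        # x-update: solve (A + rho I) x = rho z - lam - b, cf. (21a)
        x = np.linalg.solve(chol.T, np.linalg.solve(chol, rho * z - lam - b))
        # z-update: blockwise projection of x + lam/rho onto C
        # (block norms are nonzero with probability one for generic data)
        zb = (x + lam / rho).reshape(-1, 2)
        z = (zb / np.linalg.norm(zb, axis=1, keepdims=True)).ravel()
        lam = lam + rho * (x - z)                 # dual update, cf. (21c)
    return z

# Illustrative data: M = 2, with c chosen so that the true x lies on C and attains zero loss
rng = np.random.default_rng(1)
theta = np.array([0.1, -0.2])                     # stand-in true angle biases
x_true = np.stack([np.cos(theta), np.sin(theta)], axis=1).ravel()
H, Q = rng.normal(size=(12, 4)), np.eye(12)
print(admm_circle_qcqp(H, -H @ x_true, Q))        # approximately x_true
```

This corresponds to the Δ𝑯 = 𝟎, Δ𝒄 = 𝟎 case discussed above, where Theorem 1 guarantees convergence to 𝒙(Δϑ¯).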

Proposition 4.

For the indicator function ι𝒞()\iota_{\mathcal{C}}(\cdot) defined in (17), its Fréchet subdifferential at point 𝐳\bm{z} is

ι𝒞(𝒛)={𝚲𝒛𝚲=Diag(𝜼)𝑰2,𝜼M}.\partial\iota_{\mathcal{C}}(\bm{z})=\left\{\bm{\Lambda}\bm{z}\mid\bm{\Lambda}=\textrm{Diag}(\bm{\eta})\otimes\bm{I}_{2},\bm{\eta}\in\mathbb{R}^{M}\right\}.
Proof.

We first reformulate the indicator function ι𝒞()\iota_{\mathcal{C}}(\cdot) as ι𝒞(𝒛)=m=1Mι(𝒛2m1,𝒛2m),\iota_{\mathcal{C}}(\bm{z})=\sum_{m=1}^{M}\iota(\bm{z}_{2m-1},\bm{z}_{2m}), where

ι(𝒖)={0,if 𝒖𝒰{𝒖2u12+u22=1},,otherwise.\iota(\bm{u})=\left\{\begin{aligned} 0,\ &\textrm{if }\bm{u}\in\mathcal{U}\triangleq\{\bm{u}\in\mathbb{R}^{2}\mid u_{1}^{2}+u_{2}^{2}=1\},\\ \infty,\ &\textrm{otherwise}.\end{aligned}\right. (23)

Then, the Fréchet subdifferential ι()\partial\iota(\cdot) of ι()\iota(\cdot) at point 𝒖2\bm{u}\in\mathbb{R}^{2} is the Fréchet normal cone of set 𝒰\mathcal{U} [42, Proposition 1.18], which is defined as

Cι(𝒖)={𝒈2lim inf𝒖𝒰𝒖𝒈T(𝒖𝒖)𝒖𝒖0},\displaystyle C\iota(\bm{u})=\left\{\bm{g}\in\mathbb{R}^{2}\mid\liminf_{\bm{u}^{\prime}\stackrel{{\scriptstyle\mathcal{U}}}{{\rightarrow}}\bm{u}}\frac{-\bm{g}^{T}(\bm{u}^{\prime}-\bm{u})}{\|\bm{u}^{\prime}-\bm{u}\|}\geq 0\right\},

where the notation 𝒖𝒰𝒖\bm{u}^{\prime}\stackrel{{\scriptstyle\mathcal{U}}}{{\rightarrow}}\bm{u} represents 𝒖𝒖\bm{u}^{\prime}\rightarrow\bm{u} with 𝒖,𝒖𝒰\bm{u},\bm{u}^{\prime}\in\mathcal{U}. By expressing 𝒖=(cosθ,sinθ)\bm{u}=(\cos\theta,\sin\theta) and 𝒖=(cosθ,sinθ)\bm{u}^{\prime}=(\cos\theta^{\prime},\sin\theta^{\prime}) with θ,θ[π,π]\theta,\theta^{\prime}\in[-\pi,\pi], we have

lim𝒖𝒰𝒖𝒖𝒖𝒖𝒖\displaystyle\lim_{\bm{u}^{\prime}\stackrel{{\scriptstyle\mathcal{U}}}{{\rightarrow}}\bm{u}}\frac{\bm{u}^{\prime}-\bm{u}}{\|\bm{u}^{\prime}-\bm{u}\|} =limθθ(cosθcosθ,sinθsinθ)(cosθcosθ)2+(sinθsinθ)2\displaystyle=\lim_{\theta^{\prime}\rightarrow\theta}\frac{(\cos\theta^{\prime}-\cos\theta,\sin\theta^{\prime}-\sin\theta)}{\sqrt{(\cos\theta^{\prime}-\cos\theta)^{2}+(\sin\theta^{\prime}-\sin\theta)^{2}}}
=(i)limθθ(cosθ,sinθ)(cosθ,sinθ)22cos(θθ)\displaystyle\overset{(i)}{=}\lim_{\theta^{\prime}\rightarrow\theta}\frac{(\cos\theta^{\prime},\sin\theta^{\prime})-(\cos\theta,\sin\theta)}{\sqrt{2-2\cos(\theta^{\prime}-\theta)}}
=(ii){limθθ+(cosθcosθ,sinθsinθ)2sin(θθ2)limθθ(cosθcosθ,sinθsinθ)2sin(θθ2)\displaystyle\overset{(ii)}{=}\left\{\begin{aligned} \lim_{\theta^{\prime}\rightarrow\theta^{+}}\frac{(\cos\theta^{\prime}-\cos\theta,\sin\theta^{\prime}-\sin\theta)}{2\sin(\frac{\theta^{\prime}-\theta}{2})}\\ \lim_{\theta^{\prime}\rightarrow\theta^{-}}\frac{(\cos\theta^{\prime}-\cos\theta,\sin\theta^{\prime}-\sin\theta)}{-2\sin(\frac{\theta^{\prime}-\theta}{2})}\end{aligned}\right.
=(iii)(sinθ,±cosθ)=±[0110]𝒖,\displaystyle\overset{(iii)}{=}(\mp\sin\theta,\pm\cos\theta)=\pm\begin{bmatrix}0&-1\\ 1&0\end{bmatrix}\bm{u},

where θ′ → θ⁺ (or θ⁻) denotes that θ′ approaches θ with θ′ > θ (or θ′ < θ). In the above, (i) is due to cos(α − β) = cos α cos β + sin α sin β, (ii) is by √2 sin(α/2) = ±√(1 − cos α), and (iii) is by L'Hôpital's rule. Hence,

lim inf𝒖𝒰𝒖𝒈T(𝒖𝒖)𝒖𝒖0lim𝒖𝒰𝒖𝒈T(𝒖𝒖)𝒖𝒖=0\displaystyle\liminf_{\bm{u}^{\prime}\stackrel{{\scriptstyle\mathcal{U}}}{{\rightarrow}}\bm{u}}\frac{-\bm{g}^{T}(\bm{u}^{\prime}-\bm{u})}{\|\bm{u}^{\prime}-\bm{u}\|}\geq 0\Leftrightarrow\lim_{\bm{u}^{\prime}\stackrel{{\scriptstyle\mathcal{U}}}{{\rightarrow}}\bm{u}}\frac{-\bm{g}^{T}(\bm{u}^{\prime}-\bm{u})}{\|\bm{u}^{\prime}-\bm{u}\|}=0
𝒈T[0110]𝒖=0𝒈=η𝒖,η,\displaystyle\Leftrightarrow\bm{g}^{T}\begin{bmatrix}0&-1\\ 1&0\end{bmatrix}\bm{u}=0\Leftrightarrow\bm{g}=\eta\bm{u},\eta\in\mathbb{R},

and ι(𝒖)={η𝒖2η}.\partial\iota(\bm{u})=\{\eta\bm{u}\in\mathbb{R}^{2}\mid\eta\in\mathbb{R}\}.

For ι𝒞(𝒛)=m=1Mι(𝒛2m1,𝒛2m)\iota_{\mathcal{C}}(\bm{z})=\sum_{m=1}^{M}\iota(\bm{z}_{2m-1},\bm{z}_{2m}), denoting

𝒖m=[𝒛2m1,𝒛2m]T,m=1,2,,M,\bm{u}_{m}=[\bm{z}_{2m-1},\bm{z}_{2m}]^{T},m=1,2,\ldots,M,

we have

ι𝒞(𝒛)=\displaystyle\partial\iota_{\mathcal{C}}(\bm{z})= (ι(𝒖1),ι(𝒖2),,ι(𝒖M))\displaystyle(\partial\iota(\bm{u}_{1}),\partial\iota(\bm{u}_{2}),\ldots,\partial\iota(\bm{u}_{M}))
=\displaystyle= {𝚲𝒛𝚲=Diag(𝜼)𝑰2,𝜼M},\displaystyle\left\{\bm{\Lambda}\bm{z}\mid\bm{\Lambda}=\textrm{Diag}(\bm{\eta})\otimes\bm{I}_{2},\bm{\eta}\in\mathbb{R}^{M}\right\},

where \otimes is the Kronecker product. ∎

Proposition 5.

Express 𝐱\bm{x} by Δϑ\Delta\bm{\vartheta}, i.e., 𝐱=𝐱(Δϑ)\bm{x}=\bm{x}(\Delta\bm{\vartheta}), then (22)\eqref{appeq:spKKT} can be expressed as f(Δ𝐇,Δ𝐜,Δϑ,𝛈)=𝟎f(\Delta\bm{H},\Delta\bm{c},\Delta\bm{\vartheta},\bm{\eta})=\bm{0}, where

f(Δ𝑯,Δ𝒄,Δϑ,𝜼)=\displaystyle f(\Delta\bm{H},\Delta\bm{c},\Delta\bm{\vartheta},\bm{\eta})= 2(𝑯¯+Δ𝑯)T𝑸(𝑯¯+Δ𝑯)𝒙(Δϑ)\displaystyle 2(\bar{\bm{H}}+\Delta\bm{H})^{T}\bm{Q}(\bar{\bm{H}}+\Delta\bm{H})\bm{x}(\Delta\bm{\vartheta}) (24)
+2(𝑯¯+Δ𝑯)T𝑸(𝒄¯+Δ𝒄)\displaystyle+2(\bar{\bm{H}}+\Delta\bm{H})^{T}\bm{Q}(\bar{\bm{c}}+\Delta\bm{c})
+[Diag(𝜼)𝑰2]𝒙(Δϑ).\displaystyle+[\textrm{Diag}(\bm{\eta})\otimes\bm{I}_{2}]\bm{x}(\Delta\bm{\vartheta}).

Further, there exist two continuous functions dϑ(Δ𝐇,Δ𝐜)d_{\vartheta}(\Delta\bm{H},\Delta\bm{c}) and dη(Δ𝐇,Δ𝐜)d_{\eta}(\Delta\bm{H},\Delta\bm{c}) in the neighborhood of (Δ𝐇,Δ𝐜)=(𝟎,𝟎)(\Delta\bm{H},\Delta\bm{c})=(\bm{0},\bm{0}) and these two functions are uniquely defined in this neighborhood such that f(Δ𝐇,Δ𝐜,dϑ(Δ𝐇,Δ𝐜),dη(Δ𝐇,Δ𝐜))=𝟎f(\Delta\bm{H},\Delta\bm{c},d_{\vartheta}(\Delta\bm{H},\Delta\bm{c}),d_{\eta}(\Delta\bm{H},\Delta\bm{c}))=\bm{0}.

Proof.

By (20), we have 𝑯=𝑯¯+Δ𝑯\bm{H}=\bar{\bm{H}}+\Delta\bm{H} and 𝒄=𝒄¯+Δ𝒄\bm{c}=\bar{\bm{c}}+\Delta\bm{c}. From Proposition 6 below, we know that (22) holds for Δ𝑯=𝟎,Δ𝒄=𝟎,𝜼=𝟎\Delta\bm{H}=\bm{0},\Delta\bm{c}=\bm{0},\bm{\eta}=\bm{0}, and 𝒙=𝒙(Δϑ¯)\bm{x}=\bm{x}(\Delta\bar{\bm{\vartheta}}). This is equivalent to f(𝟎,𝟎,Δϑ¯,𝟎)=𝟎f(\bm{0},\bm{0},\Delta\bar{\bm{\vartheta}},\bm{0})=\bm{0}. Moreover, by [43, Theorem 9.3], if the Jacobian matrix 𝑫Δϑ,𝜼\bm{D}_{\Delta\bm{\vartheta},\bm{\eta}} of f(Δ𝑯,Δ𝒄,Δϑ,𝜼)f(\Delta\bm{H},\Delta\bm{c},\Delta\bm{\vartheta},\bm{\eta}) with respect to Δϑ\Delta\bm{\vartheta} and 𝜼\bm{\eta} is invertible at point (Δ𝑯,Δ𝒄,Δϑ,𝜼)=(𝟎,𝟎,Δϑ¯,𝟎)(\Delta\bm{H},\Delta\bm{c},\Delta\bm{\vartheta},\bm{\eta})=(\bm{0},\bm{0},\Delta\bar{\bm{\vartheta}},\bm{0}), then Proposition 5 holds. Next, we show that the Jacobian matrix 𝑫Δϑ,𝜼\bm{D}_{\Delta\bm{\vartheta},\bm{\eta}} at (𝟎,𝟎,Δϑ¯,𝟎)(\bm{0},\bm{0},\Delta\bar{\bm{\vartheta}},\bm{0}) is indeed invertible.

By (24), 𝑫Δϑ,𝜼\bm{D}_{\Delta\bm{\vartheta},\bm{\eta}} is given as

𝑫Δϑ,𝜼=[f𝜼fΔϑ]=[𝑺𝑮𝑪]=[𝑰2M𝑮][𝑺𝟎𝟎𝑪],\bm{D}_{\Delta\bm{\vartheta},\bm{\eta}}=\left[\frac{\partial f}{\partial\bm{\eta}}\ \frac{\partial f}{\partial\Delta\bm{\vartheta}}\right]=\left[\bm{S}\ \bm{G}\bm{C}\right]=\begin{bmatrix}\bm{I}_{2M}&\bm{G}\end{bmatrix}\begin{bmatrix}\bm{S}&\bm{0}\\ \bm{0}&\bm{C}\end{bmatrix},

where

𝑮 = 2𝑯^T𝑸𝑯 + [Diag(𝜼) ⊗ 𝑰₂],
𝑺=blkdiag(𝒔1,𝒔2,,𝒔M),𝒔m=[cosΔϑm,sinΔϑm]T,\bm{S}=\textrm{blkdiag}(\bm{s}_{1},\bm{s}_{2},\ldots,\bm{s}_{M}),\bm{s}_{m}=[\cos\Delta{\vartheta}_{m},\sin\Delta{\vartheta}_{m}]^{T},
𝑪=blkdiag(𝒄1,𝒄2,,𝒄M),𝒄m=[sinΔϑm,cosΔϑm]T.\bm{C}=\textrm{blkdiag}(\bm{c}_{1},\bm{c}_{2},\ldots,\bm{c}_{M}),\bm{c}_{m}=[-\sin\Delta{\vartheta}_{m},\cos\Delta{\vartheta}_{m}]^{T}.

Hence, at point (Δ𝑯,Δ𝒄,Δϑ,𝜼)=(𝟎,𝟎,Δϑ¯,𝟎)(\Delta\bm{H},\Delta\bm{c},\Delta\bm{\vartheta},\bm{\eta})=(\bm{0},\bm{0},\Delta\bar{\bm{\vartheta}},\bm{0}), we have

rank([𝑰2M𝑮])=2M,rank([𝑺𝟎𝟎𝑪])=2M,\textrm{rank}\left(\begin{bmatrix}\bm{I}_{2M}&\bm{G}\end{bmatrix}\right)=2M,\ \textrm{rank}\left(\begin{bmatrix}\bm{S}&\bm{0}\\ \bm{0}&\bm{C}\end{bmatrix}\right)=2M,

which, together with Sylvester’s rank inequality [44], immediately implies rank(𝑫Δϑ,𝜼)=2M\textrm{rank}(\bm{D}_{\Delta\bm{\vartheta},\bm{\eta}})=2M and thus 𝑫Δϑ,𝜼\bm{D}_{\Delta\bm{\vartheta},\bm{\eta}} is invertible. This completes the proof. ∎
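The invertibility of 𝑫_{Δϑ,𝜼} can also be checked numerically. The sketch below builds 𝑺, 𝑪, and 𝑮 = 2𝑯^T𝑸𝑯 (the value of 𝑮 at 𝜼 = 𝟎) from random stand-in data (assuming, as in the text, that 𝑯 has full column rank) and confirms that 𝑫 = [𝑺 𝑮𝑪] has rank 2M:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 3
theta = rng.uniform(-np.pi, np.pi, size=M)       # stand-in true angle biases
H = rng.normal(size=(6 * (M + 1), 2 * M))        # full column rank with probability one
Q = np.eye(6 * (M + 1))
G = 2 * H.T @ Q @ H                              # G evaluated at eta = 0

def blkdiag2(cols):
    """Stack M two-vectors into a (2M x M) block-diagonal matrix, as for S and C."""
    B = np.zeros((2 * len(cols), len(cols)))
    for m, v in enumerate(cols):
        B[2 * m:2 * m + 2, m] = v
    return B

S = blkdiag2([[np.cos(t), np.sin(t)] for t in theta])
C = blkdiag2([[-np.sin(t), np.cos(t)] for t in theta])
D = np.hstack([S, G @ C])                        # D = [S, G C], a (2M x 2M) matrix
print(np.linalg.matrix_rank(D))                  # prints 2M = 6, i.e., D is invertible
```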

Proposition 6.

If Δ𝐇ϑ=𝟎\Delta\bm{H}_{\vartheta}=\bm{0} and Δ𝐜ϑ=𝟎\Delta\bm{c}_{\vartheta}=\bm{0}, then we have 𝐇ϑ𝐱(Δϑ¯)+𝐜ϑ=𝟎{\bm{H}}_{\vartheta}{\bm{x}}(\Delta\bar{\bm{\vartheta}})+{\bm{c}}_{\vartheta}=\bm{0}, where Δϑ¯\Delta\bar{\bm{\vartheta}} is the true value of the corresponding bias.

Proof.

According to the definitions of Δ𝑯ϑ\Delta\bm{H}_{\vartheta} and Δ𝒄ϑ\Delta\bm{c}_{\vartheta} and the derivations in Appendix C, we know that Δ𝑯ϑ=𝟎\Delta\bm{H}_{\vartheta}=\bm{0} and Δ𝒄ϑ=𝟎\Delta\bm{c}_{\vartheta}=\bm{0} hold when 𝒘k=𝟎,𝒏k=𝒏˙k=𝟎\bm{w}_{k}=\bm{0},\bm{n}_{k}=\dot{\bm{n}}_{k}=\bm{0}, for all kk, λη=λϕ=1\lambda_{\eta}=\lambda_{\phi}=1, and all the other biases ΩΔϑ\Omega\setminus\Delta\bm{\vartheta} and velocity 𝒗\bm{v} take the true values. In this case, the constructed measurement equations in (7) become

𝟎\displaystyle\bm{0} =gk+1(𝜽sk+1)gk(𝜽sk)Tk𝝃˙k,k,\displaystyle=g_{k+1}(\bm{\theta}_{s_{k+1}})-g_{k}(\bm{\theta}_{s_{k}})-T_{k}\dot{\bm{\xi}}_{k},\ \forall\,k,
𝟎\displaystyle\bm{0} =𝝃˙k+1𝝃˙k,k,\displaystyle=\dot{\bm{\xi}}_{k+1}-\dot{\bm{\xi}}_{k},\ \forall\,k,

where 𝜽m, m = 1, 2, …, M, take the true bias values. Since problem (15) is an equivalent reformulation of problem (8) with respect to one type of bias Δϑ, the above equations can also be reformulated with respect to Δϑ as follows:

𝑯ϑ𝒙(Δϑ¯)+𝒄ϑ=𝟎,{\bm{H}}_{\vartheta}{\bm{x}}(\Delta\bar{\bm{\vartheta}})+{\bm{c}}_{\vartheta}=\bm{0},

where Δϑ¯\Delta\bar{\bm{\vartheta}} corresponds to the true bias value. This completes the proof. ∎

References

  • [1] S. Jiang, W. Pu, and Z.-Q. Luo, “Optimal asynchronous multi-sensor registration in 3 dimensions,” in Proc. of IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018, pp. 81–85.
  • [2] Y. Bar-Shalom, “Multitarget-Multisensor Tracking: Advanced Applications,” Norwood, MA, Artech House, 1990.
  • [3] J. Yan, W. Pu, S. Zhou, H. Liu, and M. S. Greco, “Optimal resource allocation for asynchronous multiple targets tracking in heterogeneous radar networks,” IEEE Transactions on Signal Processing, vol. 68, pp. 4055–4068, 2020.
  • [4] W. L. Fischer, C. E. Muehe, and A. G. Cameron, “Registration errors in a netted air surveillance system,” DTIC Document, Tech. Rep., 1980.
  • [5] M. P. Dana, “Registration: A prerequisite for multiple sensor tracking,” Multitarget-Multisensor Tracking: Advanced Applications, 1990.
  • [6] S. Fortunati, F. Gini, A. Farina, and A. Graziano, “On the application of the expectation-maximisation algorithm to the relative sensor registration problem,” IET Radar Sonar Navigation, vol. 7, no. 2, pp. 191–203, Feb. 2013.
  • [7] S. Fortunati, A. Farina, F. Gini, A. Graziano, M. S. Greco, and S. Giompapa, “Least squares estimation and Cramér-Rao type lower bounds for relative sensor registration process,” IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 1075–1087, Mar. 2011.
  • [8] H. Leung, M. Blanchette, and C. Harrison, “A least squares fusion of multiple radar data,” in Proceedings of RADAR, vol. 94, 1994, pp. 364–369.
  • [9] D. C. Cowley and B. Shafai, “Registration in multi-sensor data fusion and tracking,” in Proc. 1993 American Control Conference, June 1993, pp. 875–879.
  • [10] Y. Zhou, H. Leung, and P. C. Yip, “An exact maximum likelihood registration algorithm for data fusion,” IEEE Transactions on Signal Processing, vol. 45, no. 6, pp. 1560–1573, Jun. 1997.
  • [11] B. Ristic and N. Okello, “Sensor registration in ECEF coordinates using the MLR algorithm,” in Proc. of the Sixth International Conference of Information Fusion, 2003, pp. 135–142.
  • [12] R. E. Helmick and T. R. Rice, “Removal of alignment errors in an integrated system of two 3-D sensors,” IEEE Transactions on Aerospace and Electronic Systems, vol. 29, no. 4, pp. 1333–1343, Oct. 1993.
  • [13] N. Nabaa and R. H. Bishop, “Solution to a multisensor tracking problem with sensor registration errors,” IEEE Transactions on Aerospace and Electronic Systems, vol. 35, no. 1, pp. 354–363, Jan. 1999.
  • [14] N. N. Okello and S. Challa, “Joint sensor registration and track-to-track fusion for distributed trackers,” IEEE Transactions on Aerospace and Electronic Systems, vol. 40, no. 3, pp. 808–823, July 2004.
  • [15] X. Lin and Y. Bar-Shalom, “Multisensor target tracking performance with bias compensation,” IEEE Transactions on Aerospace and Electronic Systems, vol. 42, no. 3, pp. 1139–1149, July 2006.
  • [16] Z. Li, S. Chen, H. Leung, and E. Bosse, “Joint data association, registration, and fusion using EM-KF,” IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 2, pp. 496–507, April 2010.
  • [17] E. Taghavi, R. Tharmarasa, T. Kirubarajan, Y. Bar-Shalom, and M. Mcdonald, “A practical bias estimation algorithm for multisensor-multitarget tracking,” IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 1, pp. 2–19, February 2016.
  • [18] A. Zia, T. Kirubarajan, J. P. Reilly, D. Yee, K. Punithakumar, and S. Shirani, “An EM algorithm for nonlinear state estimation with model uncertainties,” IEEE Transactions on Signal Processing, vol. 56, no. 3, pp. 921–936, Mar. 2008.
  • [19] D. F. Crouse, Y. Bar-Shalom, and P. Willett, “Sensor bias estimation in the presence of data association uncertainty,” in Proc. of Signal and Data Processing of Small Targets 2009, vol. 7445.   SPIE, 2009, pp. 281–302.
  • [20] R. Mahler and A. El-Fallah, “Bayesian unified registration and tracking,” in Proc. of SPIE Defense, Security, and Sensing.   International Society for Optics and Photonics, 2011.
  • [21] B. Ristic, D. E. Clark, and N. Gordon, “Calibration of multi-target tracking algorithms using non-cooperative targets,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 3, pp. 390–398, June 2013.
  • [22] W. Pu, Y.-F. Liu, J. Yan, S. Zhou, H. Liu, and Z.-Q. Luo, “A two-stage optimization approach to the asynchronous multi-sensor registration problem,” in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2017, pp. 3271–3275.
  • [23] W. Pu, Y.-F. Liu, J. Yan, H. Liu, and Z.-Q. Luo, “Optimal estimation of sensor biases for asynchronous multi-sensor data fusion,” Mathematical Programming, vol. 170, no. 1, pp. 357–386, 2018.
  • [24] X. Song, Y. Zhou, and Y. Bar-Shalom, “Unbiased converted measurements for tracking,” IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 1023–1027, 1998.
  • [25] X. R. Li and V. P. Jilkov, “Survey of maneuvering target tracking. Part I: Dynamic models,” IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1333–1364, 2004.
  • [26] Z. Duan, C. Han, and X. R. Li, “Comments on ‘Unbiased converted measurements for tracking’,” IEEE Transactions on Aerospace and Electronic Systems, vol. 40, no. 4, p. 1374, 2004.
  • [27] S. Bordonaro, P. Willett, and Y. Bar-Shalom, “Decorrelated unbiased converted measurement Kalman filter,” IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 2, pp. 1431–1444, 2014.
  • [28] Z.-Q. Luo and T.-H. Chang, “SDP relaxation of homogeneous quadratic optimization: Approximation bounds and applications,” in Convex Optimization in Signal Processing and Communications, 2010, pp. 117–165.
  • [29] Z.-Q. Luo, W.-K. Ma, A. M.-C. So, Y. Ye, and S. Zhang, “Semidefinite relaxation of quadratic optimization problems,” IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 20–34, May 2010.
  • [30] C. Lu, Y.-F. Liu, W.-Q. Zhang, and S. Zhang, “Tightness of a new and enhanced semidefinite relaxation for MIMO detection,” SIAM Journal on Optimization, vol. 29, no. 1, pp. 719–742, 2019.
  • [31] Y. Wang, W. Yin, and J. Zeng, “Global convergence of ADMM in nonconvex nonsmooth optimization,” Journal of Scientific Computing, vol. 78, no. 1, pp. 29–63, 2019.
  • [32] H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods,” Mathematical Programming, vol. 137, no. 1, pp. 91–129, 2013.
  • [33] D. P. Bertsekas, Nonlinear Programming (Third Edition).   Athena Scientific Belmont, 1999.
  • [34] S. J. Wright, “Coordinate descent algorithms,” Mathematical Programming, vol. 151, no. 1, pp. 3–34, 2015.
  • [35] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126–1153, 2013.
  • [36] A. Aubry, A. De Maio, A. Zappone, M. Razaviyayn, and Z.-Q. Luo, “A new sequential optimization procedure and its applications to resource allocation for wireless systems,” IEEE Transactions on Signal Processing, vol. 66, no. 24, pp. 6518–6533, 2018.
  • [37] Y. Zhou, “A Kalman filter based registration approach for asynchronous sensors in multiple sensor fusion applications,” in Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2.   IEEE, May 2004, pp. ii–293.
  • [38] Y. Noam and H. Messer, “Notes on the tightness of the hybrid Cramér-Rao lower bound,” IEEE Transactions on Signal Processing, vol. 57, no. 6, pp. 2074–2084, June 2009.
  • [39] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.
  • [40] Y. Bar-Shalom, T. E. Fortmann, and P. G. Cable, Tracking and Data Association.   Acoustical Society of America, 1990.
  • [41] F. Meyer, T. Kropfreiter, J. L. Williams, R. Lau, F. Hlawatsch, P. Braca, and M. Z. Win, “Message passing algorithms for scalable multitarget tracking,” Proceedings of the IEEE, vol. 106, no. 2, pp. 221–259, 2018.
  • [42] A. Y. Kruger, “On Fréchet subdifferentials,” Journal of Mathematical Sciences, vol. 116, no. 3, pp. 3325–3358, 2003.
  • [43] L. H. Loomis and S. Sternberg, Advanced Calculus.   Reading, Massachusetts: Addison–Wesley, 1968.
  • [44] R. A. Horn and C. R. Johnson, Matrix Analysis.   Cambridge University Press, 1985.

Supplementary Material

Appendix C Derivations of Subproblems (10a)–(10f)

Denote 𝑸 = blkdiag(𝑸₁, 𝑸₂, …, 𝑸_K), where blkdiag(·) stacks the matrices 𝑸₁, …, 𝑸_K into a block diagonal matrix. Also, for Δϑ ∈ ℝ^M, define the vector 𝒙(Δϑ) = [cos Δϑ₁, sin Δϑ₁, …, cos Δϑ_M, sin Δϑ_M]^T.

C-A Derivation for Subproblem (10a)

With all biases being fixed, problem (8) with respect to velocities is reformulated as follows:

min{𝝃˙k}k=1K1𝒄v,k+𝑯v,k[𝝃˙k𝝃˙k+1]𝑸k2,\min\limits_{\{\dot{\bm{\xi}}_{k}\}}\sum_{k=1}^{K-1}\left\|\bm{c}_{v,k}+\bm{H}_{v,k}\begin{bmatrix}\dot{\bm{\xi}}_{k}\\ \dot{\bm{\xi}}_{k+1}\end{bmatrix}\right\|_{\bm{Q}_{k}}^{2}, (25)

where

𝒄v,k=[gk+1(𝜽sk+1)gk(𝜽sk)𝟎],𝑯v,k=[Tk𝑰3𝟎𝑰3𝑰3].\bm{c}_{v,k}=\begin{bmatrix}g_{k+1}(\bm{\theta}_{s_{k+1}})-g_{k}(\bm{\theta}_{s_{k}})\\ \bm{0}\end{bmatrix},\ \bm{H}_{v,k}=\begin{bmatrix}-T_{k}\bm{I}_{3}&\bm{0}\\ -\bm{I}_{3}&\bm{I}_{3}\end{bmatrix}.

Compactly, defining 𝒗=[𝝃˙1,𝝃˙2,,𝝃˙K]T\bm{v}=[\dot{\bm{\xi}}_{1},\dot{\bm{\xi}}_{2},\ldots,\dot{\bm{\xi}}_{K}]^{T}, we then have an equivalent form for (25) as follows:

min𝒗𝑯v𝒗+𝒄v𝑸2,\min\limits_{\bm{v}}\|\bm{H}_{v}\bm{v}+\bm{c}_{v}\|_{\bm{Q}}^{2}, (26)

where 𝒄v=[𝒄v,1T,𝒄v,2T,,𝒄v,K1T]T\bm{c}_{v}=[\bm{c}_{v,1}^{T},\bm{c}_{v,2}^{T},\ldots,\bm{c}_{v,K-1}^{T}]^{T} and 𝑯v\bm{H}_{v} is a matrix with [𝑯v]6k5:6k,3k2:3k+3=𝑯v,k,k=1,2,,K1[\bm{H}_{v}]_{6k-5:6k,3k-2:3k+3}=\bm{H}_{v,k},\ k=1,2,\ldots,K-1, and all the other elements being zeros.
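As a concrete illustration of the block structure of 𝑯v, the following sketch assembles (26) for a small instance and solves it as an ordinary LS problem. The assumptions here are for brevity only: K = 5, T_k = 1, 𝑸 = 𝑰, and random stand-ins for g_k(𝜽_{s_k}).

```python
import numpy as np

rng = np.random.default_rng(3)
K = 5
g = rng.normal(size=(K, 3))          # stand-ins for g_k(theta_{s_k})
T = np.ones(K - 1)                   # sampling intervals T_k
I3 = np.eye(3)

Hv = np.zeros((6 * (K - 1), 3 * K))
cv = np.zeros(6 * (K - 1))
for k in range(K - 1):               # H_{v,k} occupies rows 6k-5:6k, cols 3k-2:3k+3
    r, c = 6 * k, 3 * k              # (paper's 1-indexed rule; 0-indexed here)
    Hv[r:r + 3, c:c + 3] = -T[k] * I3
    Hv[r + 3:r + 6, c:c + 3] = -I3
    Hv[r + 3:r + 6, c + 3:c + 6] = I3
    cv[r:r + 3] = g[k + 1] - g[k]    # c_{v,k} = [g_{k+1} - g_k; 0]

# With Q = I, the weighted LS (26) reduces to ordinary LS: min_v ||Hv v + cv||^2
v, *_ = np.linalg.lstsq(Hv, -cv, rcond=None)
print(v.reshape(K, 3))               # estimated velocities xi_dot_1, ..., xi_dot_K
```

For a general positive definite 𝑸, one can left-multiply 𝑯v and 𝒄v by a Cholesky factor of 𝑸 and proceed identically; the same reduction applies to the linear subproblem (28).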

C-B Derivation for Subproblem (10b)

With velocities and all the other biases being fixed (except Δ𝝆\Delta\bm{\rho}), problem (8) with respect to Δ𝝆\Delta\bm{\rho} is reformulated as follows:

minΔ𝝆k=1K1\displaystyle\min_{\Delta\bm{\rho}}\ \sum_{k=1}^{K-1}\| 𝒉ρ,k+1Δρsk+1𝒉ρ,kΔρsk\displaystyle\bm{h}_{\rho,k+1}\Delta\rho_{s_{k+1}}-\bm{h}_{\rho,k}\Delta\rho_{s_{k}} (27)
+𝒄ρ,k+1𝒄ρ,k+𝒗ρ,k𝑸k2,\displaystyle+\bm{c}_{\rho,{k+1}}-\bm{c}_{\rho,k}+\bm{v}_{\rho,k}\|_{\bm{Q}_{k}}^{2},

where

𝒉ρ,k=\displaystyle\scriptsize\bm{h}_{\rho,k}= [𝒉¯ρ,kT,𝟎T]T,𝒄ρ,k=[𝒄¯ρ,kT,𝟎T]T,\displaystyle[\bar{\bm{h}}_{\rho,k}^{T},\bm{0}^{T}]^{T},\ \bm{c}_{\rho,k}=[\bar{\bm{c}}_{\rho,k}^{T},\bm{0}^{T}]^{T},
𝒗ρ,k=\displaystyle\bm{v}_{\rho,k}= [𝒑sk+1T𝒑skTTk𝝃˙kT,𝝃˙k+1T𝝃˙kT]T,\displaystyle[\bm{p}_{s_{k+1}}^{T}-\bm{p}_{s_{k}}^{T}-T_{k}\dot{\bm{\xi}}_{k}^{T},\dot{\bm{\xi}}_{k+1}^{T}-\dot{\bm{\xi}}_{k}^{T}]^{T},
𝒉¯ρ,k=R(𝜻sk+\displaystyle\bar{\bm{h}}_{\rho,k}=R(\bm{\zeta}_{s_{k}}+ Δ𝜻sk)[λϕ1λη1cosϕkcos(ηk+Δηsk)λϕ1λη1sinϕkcos(ηk+Δηsk)λη1sin(ηk+Δηsk)],\displaystyle\Delta\bm{\zeta}_{s_{k}})\left[\begin{matrix}\lambda_{\phi}^{-1}\lambda_{\eta}^{-1}\cos\phi_{k}\cos(\eta_{k}+\Delta\eta_{s_{k}})\\ \lambda_{\phi}^{-1}\lambda_{\eta}^{-1}\sin\phi_{k}\cos(\eta_{k}+\Delta\eta_{s_{k}})\\ \lambda_{\eta}^{-1}\sin(\eta_{k}+\Delta\eta_{s_{k}})\end{matrix}\right],
𝒄¯ρ,k=R(𝜻sk+\displaystyle\bar{\bm{c}}_{\rho,k}=R(\bm{\zeta}_{s_{k}}+ Δ𝜻sk)[λϕ1λη1ρkcosϕkcos(ηk+Δηsk)λϕ1λη1ρksinϕkcos(ηk+Δηsk)λη1ρksin(ηk+Δηsk)].\displaystyle\Delta\bm{\zeta}_{s_{k}})\left[\begin{matrix}\lambda_{\phi}^{-1}\lambda_{\eta}^{-1}\rho_{k}\cos\phi_{k}\cos(\eta_{k}+\Delta\eta_{s_{k}})\\ \lambda_{\phi}^{-1}\lambda_{\eta}^{-1}\rho_{k}\sin\phi_{k}\cos(\eta_{k}+\Delta\eta_{s_{k}})\\ \lambda_{\eta}^{-1}\rho_{k}\sin(\eta_{k}+\Delta\eta_{s_{k}})\end{matrix}\right].

Compactly, (27) can be equivalently formulated as

minΔ𝝆𝑯ρΔ𝝆+𝒄ρ𝑸2,\min\limits_{\Delta\bm{\rho}}\ \|\bm{H}_{\rho}\Delta\bm{\rho}+\bm{c}_{{\rho}}\|_{\bm{Q}}^{2}, (28)

where 𝑯ρ\bm{H}_{\rho} is a zero matrix except

[𝑯ρ]6k5:6k,m={𝒉ρ,k+1,if m=sk+1,𝒉ρ,k,if m=sk,[\bm{H}_{\rho}]_{6k-5:6k,m}=\left\{\begin{aligned} &\bm{h}_{\rho,k+1},\ \textrm{if }m=s_{k+1},\\ &-\bm{h}_{\rho,k},\ \textrm{if }m=s_{k},\end{aligned}\right.

and 𝒄ρ=[𝒄~ρ,1T,,𝒄~ρ,K1T]T\bm{c}_{\rho}=[\tilde{\bm{c}}_{\rho,1}^{T},\ldots,\tilde{\bm{c}}_{\rho,{K-1}}^{T}]^{T} with 𝒄~ρ,k=𝒄ρ,k+1𝒄ρ,k+𝒗ρ,k\tilde{\bm{c}}_{\rho,k}=\bm{c}_{\rho,{k+1}}-\bm{c}_{\rho,k}+\bm{v}_{\rho,k}.

C-C Derivation for Subproblems (10c)–(10f)

For notational simplicity, we first define 𝒖k as

𝒖k[λϕ1λη1(ρk+Δρsk)cosϕkcos(ηk+Δηsk)λϕ1λη1(ρk+Δρsk)sinϕkcos(ηk+Δηsk)λη1(ρk+Δρsk)sin(ηk+Δηsk)],\bm{u}_{k}\triangleq\left[\begin{matrix}\lambda_{\phi}^{-1}\lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\cos\phi_{k}\cos(\eta_{k}+\Delta\eta_{s_{k}})\\ \lambda_{\phi}^{-1}\lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\sin\phi_{k}\cos(\eta_{k}+\Delta\eta_{s_{k}})\\ \lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\sin(\eta_{k}+\Delta\eta_{s_{k}})\end{matrix}\right],

which will be used in this subsection.

C-C1 Subproblem (10c)

With velocities and all other kinds of biases being fixed (except Δ𝜼\Delta\bm{\eta}), problem (8) with respect to Δ𝜼\Delta\bm{\eta} is reformulated as follows:

minΔ𝜼k=1K1\displaystyle\min_{\Delta\bm{\eta}}\ \sum_{k=1}^{K-1}\| 𝒉η,k+1ccosΔηsk+1+𝒉η,k+1ssinΔηsk+1\displaystyle\bm{h}_{\eta,k+1}^{c}\cos\Delta\eta_{s_{k+1}}+\bm{h}_{\eta,k+1}^{s}\sin\Delta\eta_{s_{k+1}}- (29)
𝒉η,kccosΔηsk𝒉η,kssinΔηsk+𝒄η,k𝑸k2,\displaystyle\bm{h}_{\eta,k}^{c}\cos\Delta\eta_{s_{k}}-\bm{h}_{\eta,k}^{s}\sin\Delta\eta_{s_{k}}+\bm{c}_{\eta,k}\|_{\bm{Q}_{k}}^{2},

where

𝒉η,kc=[(𝒉¯η,kc)T,𝟎T]T,𝒉η,ks=[(𝒉¯η,ks)T,𝟎T]T,𝒄η,k=𝒗ρ,k,\bm{h}_{\eta,k}^{c}=[(\bar{\bm{h}}_{\eta,k}^{c})^{T},\bm{0}^{T}]^{T},\ \bm{h}_{\eta,k}^{s}=[(\bar{\bm{h}}_{\eta,k}^{s})^{T},\bm{0}^{T}]^{T},\ \bm{c}_{\eta,k}=\bm{v}_{\rho,k},

and

𝒉¯η,kc=\displaystyle\scriptsize\bar{\bm{h}}_{\eta,k}^{c}= R(𝜻sk+Δ𝜻sk)[λϕ1λη1(ρk+Δρsk)cosϕkcosηkλϕ1λη1(ρk+Δρsk)sinϕkcosηkλη1(ρk+Δρsk)sinηk],\displaystyle R(\bm{\zeta}_{s_{k}}+\Delta\bm{\zeta}_{s_{k}})\left[\begin{matrix}\lambda_{\phi}^{-1}\lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\cos\phi_{k}\cos\eta_{k}\\ \lambda_{\phi}^{-1}\lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\sin\phi_{k}\cos\eta_{k}\\ \lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\sin\eta_{k}\end{matrix}\right],
𝒉¯η,ks=\displaystyle\bar{\bm{h}}_{\eta,k}^{s}= R(𝜻sk+Δ𝜻sk)[λϕ1λη1(ρk+Δρsk)cosϕksinηkλϕ1λη1(ρk+Δρsk)sinϕksinηkλη1(ρk+Δρsk)cosηk].\displaystyle R(\bm{\zeta}_{s_{k}}+\Delta\bm{\zeta}_{s_{k}})\left[\begin{matrix}-\lambda_{\phi}^{-1}\lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\cos\phi_{k}\sin\eta_{k}\\ -\lambda_{\phi}^{-1}\lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\sin\phi_{k}\sin\eta_{k}\\ \lambda_{\eta}^{-1}(\rho_{k}+\Delta\rho_{s_{k}})\cos\eta_{k}\end{matrix}\right].

Letting 𝒙=𝒙(Δ𝜼)\bm{x}=\bm{x}(\Delta\bm{\eta}), (29) can be equivalently reformulated as

min𝒙𝑯η𝒙+𝒄η𝑸2,s.t.𝒙𝒞,\displaystyle\min_{\bm{x}}\ \|\bm{H}_{\eta}\bm{x}+\bm{c}_{\eta}\|_{\bm{Q}}^{2},\quad\textrm{s.t.}\ \bm{x}\in\mathcal{C},

where 𝑯η\bm{H}_{\eta} is a zero matrix except

[𝑯η]6k5:6k,m={𝒉η,k+1c,if m=2sk+11,𝒉η,k+1s,if m=2sk+1,𝒉η,kc,if m=2sk1,𝒉η,ks,if m=2sk,[\bm{H}_{\eta}]_{6k-5:6k,m}=\left\{\begin{aligned} &\bm{h}_{\eta,k+1}^{c},\ \textrm{if }m=2s_{k+1}-1,\\ &\bm{h}_{\eta,k+1}^{s},\ \textrm{if }m=2s_{k+1},\\ &-\bm{h}_{\eta,k}^{c},\ \textrm{if }m=2s_{k}-1,\\ &-\bm{h}_{\eta,k}^{s},\ \textrm{if }m=2s_{k},\end{aligned}\right.

and 𝒄η=[𝒄η,1T,𝒄η,2T,,𝒄η,K1T]T\bm{c}_{\eta}=[{\bm{c}}_{\eta,1}^{T},{\bm{c}}_{\eta,2}^{T},\ldots,{\bm{c}}_{\eta,{K-1}}^{T}]^{T}.
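The sketch below illustrates the column-placement rule above for a toy measurement schedule s_k (0-indexed in the code, so the paper's columns 2s_k − 1 and 2s_k become 2s[k] and 2s[k] + 1). The vectors 𝒉η,k^c, 𝒉η,k^s are random stand-ins, and 𝒄η is constructed so that the true 𝒙(Δ𝜼¯) solves the system exactly; the ADMM sketch from Appendix B can then be applied to this (𝑯η, 𝒄η) pair. The same assembly pattern covers the subproblems (10d)–(10f) below.

```python
import numpy as np

rng = np.random.default_rng(4)
K, M = 6, 2
s = [0, 1, 0, 1, 0, 1]                          # sensor index s_k (0-indexed)
hc = rng.normal(size=(K, 6))                    # stand-ins for h^c_{eta,k}
hs = rng.normal(size=(K, 6))                    # stand-ins for h^s_{eta,k}

Heta = np.zeros((6 * (K - 1), 2 * M))
for k in range(K - 1):
    r = 6 * k
    Heta[r:r + 6, 2 * s[k + 1]] += hc[k + 1]    # paper's column 2 s_{k+1} - 1
    Heta[r:r + 6, 2 * s[k + 1] + 1] += hs[k + 1]  # paper's column 2 s_{k+1}
    Heta[r:r + 6, 2 * s[k]] -= hc[k]            # paper's column 2 s_k - 1
    Heta[r:r + 6, 2 * s[k] + 1] -= hs[k]        # paper's column 2 s_k

d_eta = np.array([0.05, -0.1])                  # stand-in true elevation biases
x_true = np.stack([np.cos(d_eta), np.sin(d_eta)], axis=1).ravel()
c_eta = -Heta @ x_true                          # right-hand side consistent with x_true
assert np.allclose(Heta @ x_true + c_eta, 0)    # x(d_eta) attains zero objective
# x = admm_circle_qcqp(Heta, c_eta, np.eye(6 * (K - 1)))  # reuse the Appendix B sketch
```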

C-C2 Subproblem (10d)

With velocities and all other kinds of biases being fixed (except Δ𝜶\Delta\bm{\alpha}), problem (8) with respect to Δ𝜶\Delta\bm{\alpha} is reformulated as follows:

minΔ𝜶k=1K1\displaystyle\min_{\Delta\bm{\alpha}}\ \sum_{k=1}^{K-1} 𝒉α,k+1ccosΔαsk+1+𝒉α,k+1ssinΔαsk+1\displaystyle\|\bm{h}_{\alpha,k+1}^{c}\cos\Delta\alpha_{s_{k+1}}+\bm{h}_{\alpha,k+1}^{s}\sin\Delta\alpha_{s_{k+1}}- (30)
𝒉α,kccosΔαsk𝒉α,kssinΔαsk+𝒄α,k𝑸k2,\displaystyle\bm{h}_{\alpha,k}^{c}\cos\Delta\alpha_{s_{k}}-\bm{h}_{\alpha,k}^{s}\sin\Delta\alpha_{s_{k}}+\bm{c}_{\alpha,k}\|_{\bm{Q}_{k}}^{2},

where

𝒉α,kc=[(𝒉¯α,kc)T,𝟎T]T,𝒉α,ks=[(𝒉¯α,ks)T,𝟎T]T,\bm{h}_{\alpha,k}^{c}=[(\bar{\bm{h}}_{\alpha,k}^{c})^{T},\bm{0}^{T}]^{T},\ \bm{h}_{\alpha,k}^{s}=[(\bar{\bm{h}}_{\alpha,k}^{s})^{T},\bm{0}^{T}]^{T},

and 𝒄α,k=𝒗ρ,k+[𝒄¯α,kT,𝟎T]T\bm{c}_{\alpha,k}=\bm{v}_{\rho,k}+[\bar{\bm{c}}_{\alpha,k}^{T},\bm{0}^{T}]^{T}. Define notations

E1=[000010001],E2=[000001010],E3=[100000000],E_{1}=\begin{bmatrix}0&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix},\ E_{2}=\begin{bmatrix}0&0&0\\ 0&0&-1\\ 0&1&0\end{bmatrix},\ E_{3}=\begin{bmatrix}1&0&0\\ 0&0&0\\ 0&0&0\end{bmatrix},

then 𝒉¯α,kc\bar{\bm{h}}_{\alpha,k}^{c}, 𝒉¯α,ks\bar{\bm{h}}_{\alpha,k}^{s}, and 𝒄¯α,k\bar{\bm{c}}_{\alpha,k} are expressed as

𝒉¯α,kc=E1Rx(αsk)Ry(βsk+Δβsk)Rz(γsk+Δγsk)𝒖k,\displaystyle\bar{\bm{h}}_{\alpha,k}^{c}=E_{1}R_{x}(\alpha_{s_{k}})R_{y}(\beta_{s_{k}}+\Delta\beta_{s_{k}})R_{z}(\gamma_{s_{k}}+\Delta\gamma_{s_{k}})\bm{u}_{k},
𝒉¯α,ks=E2Rx(αsk)Ry(βsk+Δβsk)Rz(γsk+Δγsk)𝒖k,\displaystyle\bar{\bm{h}}_{\alpha,k}^{s}=E_{2}R_{x}(\alpha_{s_{k}})R_{y}(\beta_{s_{k}}+\Delta\beta_{s_{k}})R_{z}(\gamma_{s_{k}}+\Delta\gamma_{s_{k}})\bm{u}_{k},
𝒄¯α,k=E3Rx(αsk)Ry(βsk+Δβsk)Rz(γsk+Δγsk)𝒖k.\displaystyle{\bar{\bm{c}}_{\alpha,k}}=E_{3}R_{x}(\alpha_{s_{k}})R_{y}(\beta_{s_{k}}+\Delta\beta_{s_{k}})R_{z}(\gamma_{s_{k}}+\Delta\gamma_{s_{k}})\bm{u}_{k}.

Letting 𝒙=𝒙(Δ𝜶)\bm{x}=\bm{x}(\Delta\bm{\alpha}), (30) can be equivalently reformulated as

min𝒙𝑯α𝒙+𝒄α𝑸2,s.t.𝒙𝒞,\displaystyle\min_{\bm{x}}\ \|\bm{H}_{\alpha}\bm{x}+\bm{c}_{\alpha}\|_{\bm{Q}}^{2},\quad\textrm{s.t.}\ \bm{x}\in\mathcal{C},

where 𝑯α\bm{H}_{\alpha} is a zero matrix except

[𝑯α]6k5:6k,m={𝒉α,k+1c,if m=2sk+11,𝒉α,k+1s,if m=2sk+1,𝒉α,kc,if m=2sk1,𝒉α,ks,if m=2sk,[\bm{H}_{\alpha}]_{6k-5:6k,m}=\left\{\begin{aligned} &\bm{h}_{\alpha,k+1}^{c},\ \textrm{if }m=2s_{k+1}-1,\\ &\bm{h}_{\alpha,k+1}^{s},\ \textrm{if }m=2s_{k+1},\\ &-\bm{h}_{\alpha,k}^{c},\ \textrm{if }m=2s_{k}-1,\\ &-\bm{h}_{\alpha,k}^{s},\ \textrm{if }m=2s_{k},\end{aligned}\right.

and 𝒄α=[𝒄α,1T,𝒄α,2T,,𝒄α,K1T]T\bm{c}_{\alpha}=[{\bm{c}}_{\alpha,1}^{T},{\bm{c}}_{\alpha,2}^{T},\ldots,{\bm{c}}_{\alpha,{K-1}}^{T}]^{T}.

C-C3 Subproblem (10e)

With velocities and all other kinds of biases being fixed (except Δ𝜷\Delta\bm{\beta}), problem (8) with respect to Δ𝜷\Delta\bm{\beta} is reformulated as follows:

minΔ𝜷k=1K1\displaystyle\min_{\Delta\bm{\beta}}\ \sum_{k=1}^{K-1} 𝒉β,k+1ccosΔβsk+1+𝒉β,k+1ssinΔβsk+1\displaystyle\|\bm{h}_{\beta,k+1}^{c}\cos\Delta\beta_{s_{k+1}}+\bm{h}_{\beta,k+1}^{s}\sin\Delta\beta_{s_{k+1}}- (31)
𝒉β,kccosΔβsk𝒉β,kssinΔβsk+𝒄β,k𝑸k2,\displaystyle\bm{h}_{\beta,k}^{c}\cos\Delta\beta_{s_{k}}-\bm{h}_{\beta,k}^{s}\sin\Delta\beta_{s_{k}}+\bm{c}_{\beta,k}\|_{\bm{Q}_{k}}^{2},

where

𝒉β,kc=[(𝒉¯β,kc)T,𝟎T]T,𝒉β,ks=[(𝒉¯β,ks)T,𝟎T]T,\bm{h}_{\beta,k}^{c}=[(\bar{\bm{h}}_{\beta,k}^{c})^{T},\bm{0}^{T}]^{T},\ \bm{h}_{\beta,k}^{s}=[(\bar{\bm{h}}_{\beta,k}^{s})^{T},\bm{0}^{T}]^{T},

and 𝒄β,k=𝒗ρ,k+[𝒄¯β,kT,𝟎T]T\bm{c}_{\beta,k}=\bm{v}_{\rho,k}+[\bar{\bm{c}}_{\beta,k}^{T},\bm{0}^{T}]^{T}. Define notations

E1=[100000001],E2=[001000100],E3=[000010000],E_{1}=\begin{bmatrix}1&0&0\\ 0&0&0\\ 0&0&1\end{bmatrix},\ E_{2}=\begin{bmatrix}0&0&1\\ 0&0&0\\ -1&0&0\end{bmatrix},\ E_{3}=\begin{bmatrix}0&0&0\\ 0&1&0\\ 0&0&0\end{bmatrix},

then 𝒉¯β,kc\bar{\bm{h}}_{\beta,k}^{c}, 𝒉¯β,ks\bar{\bm{h}}_{\beta,k}^{s}, and 𝒄¯β,k\bar{\bm{c}}_{\beta,k} are expressed as

𝒉¯β,k^c = R_x(α_{s_k} + Δα_{s_k}) E₁ R_y(β_{s_k}) R_z(γ_{s_k} + Δγ_{s_k}) 𝒖k,
𝒉¯β,k^s = R_x(α_{s_k} + Δα_{s_k}) E₂ R_y(β_{s_k}) R_z(γ_{s_k} + Δγ_{s_k}) 𝒖k,
𝒄¯β,k = R_x(α_{s_k} + Δα_{s_k}) E₃ R_y(β_{s_k}) R_z(γ_{s_k} + Δγ_{s_k}) 𝒖k.

Letting 𝒙=𝒙(Δ𝜷)\bm{x}=\bm{x}(\Delta\bm{\beta}), (31) can be equivalently reformulated as

min𝒙𝑯β𝒙+𝒄β𝑸2,s.t.𝒙𝒞,\displaystyle\min_{\bm{x}}\ \|\bm{H}_{\beta}\bm{x}+\bm{c}_{\beta}\|_{\bm{Q}}^{2},\quad\textrm{s.t.}\ \bm{x}\in\mathcal{C},

where 𝑯β\bm{H}_{\beta} is a zero matrix except

[𝑯β]6k5:6k,m={𝒉β,k+1c,if m=2sk+11,𝒉β,k+1s,if m=2sk+1,𝒉β,kc,if m=2sk1,𝒉β,ks,if m=2sk,[\bm{H}_{\beta}]_{6k-5:6k,m}=\left\{\begin{aligned} &\bm{h}_{\beta,k+1}^{c},\ \textrm{if }m=2s_{k+1}-1,\\ &\bm{h}_{\beta,k+1}^{s},\ \textrm{if }m=2s_{k+1},\\ &-\bm{h}_{\beta,k}^{c},\ \textrm{if }m=2s_{k}-1,\\ &-\bm{h}_{\beta,k}^{s},\ \textrm{if }m=2s_{k},\end{aligned}\right.

and 𝒄β=[𝒄β,1T,𝒄β,2T,,𝒄β,K1T]T\bm{c}_{\beta}=[{\bm{c}}_{\beta,1}^{T},{\bm{c}}_{\beta,2}^{T},\ldots,{\bm{c}}_{\beta,{K-1}}^{T}]^{T}.

C-C4 Subproblem (10f)

With velocities and all other kinds of biases being fixed (except Δ𝜸), problem (8) with respect to Δ𝜸 is reformulated as follows:

minΔ𝜸k=1K1\displaystyle\min_{\Delta\bm{\gamma}}\ \sum_{k=1}^{K-1} 𝒉γ,k+1ccosΔγsk+1+𝒉γ,k+1ssinΔγsk+1\displaystyle\|\bm{h}_{\gamma,k+1}^{c}\cos\Delta\gamma_{s_{k+1}}+\bm{h}_{\gamma,k+1}^{s}\sin\Delta\gamma_{s_{k+1}}- (32)
𝒉γ,kccosΔγsk𝒉γ,kssinΔγsk+𝒄γ,k𝑸k2,\displaystyle\bm{h}_{\gamma,k}^{c}\cos\Delta\gamma_{s_{k}}-\bm{h}_{\gamma,k}^{s}\sin\Delta\gamma_{s_{k}}+\bm{c}_{\gamma,k}\|_{\bm{Q}_{k}}^{2},

where

𝒉γ,kc=[(𝒉¯γ,kc)T,𝟎T]T,𝒉γ,ks=[(𝒉¯γ,ks)T,𝟎T]T,\bm{h}_{\gamma,k}^{c}=[(\bar{\bm{h}}_{\gamma,k}^{c})^{T},\bm{0}^{T}]^{T},\ \bm{h}_{\gamma,k}^{s}=[(\bar{\bm{h}}_{\gamma,k}^{s})^{T},\bm{0}^{T}]^{T},

and 𝒄γ,k=𝒗ρ,k+[𝒄¯γ,kT,𝟎T]T\bm{c}_{\gamma,k}=\bm{v}_{\rho,k}+[\bar{\bm{c}}_{\gamma,k}^{T},\bm{0}^{T}]^{T}. Define notations

E1=[100010000],E2=[010100000],E3=[000000001],E_{1}=\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&0\end{bmatrix},\ E_{2}=\begin{bmatrix}0&-1&0\\ 1&0&0\\ 0&0&0\end{bmatrix},\ E_{3}=\begin{bmatrix}0&0&0\\ 0&0&0\\ 0&0&1\end{bmatrix},

then 𝒉¯γ,kc\bar{\bm{h}}_{\gamma,k}^{c}, 𝒉¯γ,ks\bar{\bm{h}}_{\gamma,k}^{s}, and 𝒄¯γ,k\bar{\bm{c}}_{\gamma,k} are expressed as

𝒉¯γ,kc=\displaystyle\bar{\bm{h}}_{\gamma,k}^{c}= Rx(αsk+Δαsk)Ry(βsk+Δβsk)E1Rz(γsk)𝒖k,\displaystyle R_{x}(\alpha_{s_{k}}+\Delta\alpha_{s_{k}})R_{y}(\beta_{s_{k}}+\Delta\beta_{s_{k}})E_{1}R_{z}(\gamma_{s_{k}})\bm{u}_{k},
𝒉¯γ,ks=\displaystyle\bar{\bm{h}}_{\gamma,k}^{s}= Rx(αsk+Δαsk)Ry(βsk+Δβsk)E2Rz(γsk)𝒖k,\displaystyle R_{x}(\alpha_{s_{k}}+\Delta\alpha_{s_{k}})R_{y}(\beta_{s_{k}}+\Delta\beta_{s_{k}})E_{2}R_{z}(\gamma_{s_{k}})\bm{u}_{k},
𝒄¯γ,k=\displaystyle{\bar{\bm{c}}_{\gamma,k}}= Rx(αsk+Δαsk)Ry(βsk+Δβsk)E3Rz(γsk)𝒖k.\displaystyle R_{x}(\alpha_{s_{k}}+\Delta\alpha_{s_{k}})R_{y}(\beta_{s_{k}}+\Delta\beta_{s_{k}})E_{3}R_{z}(\gamma_{s_{k}})\bm{u}_{k}.

Letting 𝒙=𝒙(Δ𝜸)\bm{x}=\bm{x}(\Delta\bm{\gamma}), (32) can be equivalently reformulated as

min𝒙𝑯γ𝒙+𝒄γ𝑸2,s.t.𝒙𝒞,\displaystyle\min_{\bm{x}}\ \|\bm{H}_{\gamma}\bm{x}+\bm{c}_{\gamma}\|_{\bm{Q}}^{2},\quad\textrm{s.t.}\ \bm{x}\in\mathcal{C},

where 𝑯γ\bm{H}_{\gamma} is a zero matrix except

[𝑯γ]6k5:6k,m={𝒉γ,k+1c,if m=2sk+11,𝒉γ,k+1s,if m=2sk+1,𝒉γ,kc,if m=2sk1,𝒉γ,ks,if m=2sk,[\bm{H}_{\gamma}]_{6k-5:6k,m}=\left\{\begin{aligned} &\bm{h}_{\gamma,k+1}^{c},\ \textrm{if }m=2s_{k+1}-1,\\ &\bm{h}_{\gamma,k+1}^{s},\ \textrm{if }m=2s_{k+1},\\ &-\bm{h}_{\gamma,k}^{c},\ \textrm{if }m=2s_{k}-1,\\ &-\bm{h}_{\gamma,k}^{s},\ \textrm{if }m=2s_{k},\end{aligned}\right.

and 𝒄γ=[𝒄γ,1T,𝒄γ,2T,,𝒄γ,K1T]T\bm{c}_{\gamma}=[{\bm{c}}_{\gamma,1}^{T},{\bm{c}}_{\gamma,2}^{T},\ldots,{\bm{c}}_{\gamma,{K-1}}^{T}]^{T}.

Appendix D Full Rankness of Matrices 𝑯ρ\bm{H}_{\rho} and 𝑯ϑ\bm{H}_{\vartheta}

This section shows that, under Assumption 1, the matrices 𝑯ρ in (13) and 𝑯ϑ in (15) are of full rank with probability one.

First, we note that the structures of 𝑯ρ and 𝑯ϑ are closely related to a binary matrix 𝑩 ∈ {0,1}^{K×M} (with K > M), where K is the total number of measurements and M is the number of sensors. The entry 𝑩km = 1 indicates that sensor m has a measurement at time instance k, and 𝑩km = 0 otherwise. Once 𝑩 is of full (column) rank, it is not hard to show that 𝑯ρ and 𝑯ϑ are also of full (column) rank (this will be discussed later). By Assumption 1, each row of 𝑩 is a one-hot row vector. Then, by applying a proper row permutation to 𝑩, we have

𝑩row permutation[𝑰]\bm{B}{\xrightarrow{\text{row permutation}}}\begin{bmatrix}\bm{I}\\ *\end{bmatrix}

and hence 𝑩\bm{B} is always of full (column) rank.

Next, we discuss the relation between 𝑩\bm{B} and matrices 𝑯ρ\bm{H}_{\rho} and 𝑯ϑ\bm{H}_{\vartheta}. We take 𝑯α\bm{H}_{\alpha} as an example and the analysis is similar for other matrices 𝑯ρ\bm{H}_{\rho}, 𝑯β\bm{H}_{\beta}, 𝑯γ\bm{H}_{\gamma}, and 𝑯η\bm{H}_{\eta}. First, we need to expand 𝑩\bm{B} to a 6K×2M6K\times 2M matrix 𝑮\bm{G}, where 11 in the kk-th row of 𝑩\bm{B} is replaced by

𝑯k=[𝒉α,kc,𝒉α,ks]6×2\bm{H}_{k}=[\bm{h}_{\alpha,k}^{c},\bm{h}_{\alpha,k}^{s}]\in\mathbb{R}^{6\times 2}

and each 0 in 𝑩 is replaced by 𝟎 ∈ ℝ^{6×2}. The vectors 𝒉α,k^c and 𝒉α,k^s are defined in Appendix C-C2. According to the definition of 𝑯k, the event that 𝑯k is not of rank 2 has zero probability, since this event requires that the measurement noise 𝒘k = (w_k^ρ, w_k^ϕ, w_k^η) (contained in (ρk, ϕk, ηk)) satisfies the following equation

𝒉α,kc=C𝒉α,ks\bm{h}_{\alpha,k}^{c}=C\bm{h}_{\alpha,k}^{s}

for a constant CC\in\mathbb{R}. To be specific, denote

(x,y,z)=Rx(αsk)Ry(βsk+Δβsk)Rz(γsk+Δγsk)𝒖k.(x,y,z)=R_{x}(\alpha_{s_{k}})R_{y}(\beta_{s_{k}}+\Delta\beta_{s_{k}})R_{z}(\gamma_{s_{k}}+\Delta\gamma_{s_{k}})\bm{u}_{k}.

Then, we see that (x, y, z) is uniquely determined by the polar coordinates (w_k^ρ, w_k^ϕ, w_k^η), and the above equation becomes

y=Czandz=Cy.y=-Cz~{}\text{and}~{}z=Cy.

It is easy to check that the above linear equations (in terms of y and z) have the unique solution z = y = 0. However, as (w_k^ρ, w_k^ϕ, w_k^η) are continuous random variables, the event z = y = 0 occurs with probability zero.

Finally, let 𝑮k:l denote the submatrix formed by the rows of 𝑮 from index k to l (k ≤ l). Then, by Assumption 1 (i.e., each sensor has at least one measurement and K ≥ M + 1), we know that 𝑮1:6(K−1) is of full (column) rank with probability one. Performing proper elementary row operations on 𝑮, we get 𝑮~, where 𝑮~6(k−1)+1:6k = 𝑮6(k−1)+1:6k − 𝑮6k+1:6(k+1), k = 1, 2, …, K−1, and 𝑮~6(K−1)+1:6K = 𝑮6(K−1)+1:6K. Due to these simple row operations and the fact that 𝑯k − 𝑯k′ (with k′ ≠ k) is of full (column) rank with probability one (which can be proved using the same argument as in the proof that 𝑯k is of full column rank), one can show that 𝑮~1:6(K−1) is also of full (column) rank with probability one. Noticing that 𝑯α = 𝑮~1:6(K−1), we immediately obtain that 𝑯α is also of full (column) rank with probability one.

We now use the following toy example to illustrate the above proof. Suppose that there are two sensors and three measurements. Without loss of generality, we assume that sensor 1 has a measurement at time instance 1 and sensor 2 has a measurement at time instance 2. There are then two different cases: either sensor 1 or sensor 2 has a measurement at time instance 3. We first consider the case where sensor 1 has a measurement at time instance 3. In this case, we have

𝑮=[𝑯1𝟎𝟎𝑯2𝑯3𝟎],𝑮~=[𝑯1𝑯2𝑯3𝑯2𝑯3𝟎]\bm{G}=\left[\begin{array}[]{cc}\bm{H}_{1}&\bm{0}\\ \bm{0}&\bm{H}_{2}\\ \bm{H}_{3}&\bm{0}\end{array}\right],~{}\tilde{\bm{G}}=\left[\begin{array}[]{ccc}\bm{H}_{1}&-\bm{H}_{2}\\ -\bm{H}_{3}&\bm{H}_{2}\\ \bm{H}_{3}&\bm{0}\end{array}\right]

and

𝑯α=[𝑯1𝑯2𝑯3𝑯2].\bm{H}_{\alpha}=\left[\begin{array}[]{ccc}\bm{H}_{1}&-\bm{H}_{2}\\ -\bm{H}_{3}&\bm{H}_{2}\end{array}\right].

One can show that 𝑯1 − 𝑯3 is of full (column) rank with probability one, and hence 𝑯α above is also of full (column) rank with probability one. In the second case, where sensor 2 has a measurement at time instance 3, we have

𝑮=[𝑯1𝟎𝟎𝑯2𝟎𝑯3],𝑮~=[𝑯1𝑯2𝟎𝑯2𝑯3𝟎𝑯3]\bm{G}=\left[\begin{array}[]{cc}\bm{H}_{1}&\bm{0}\\ \bm{0}&\bm{H}_{2}\\ \bm{0}&\bm{H}_{3}\end{array}\right],~{}\tilde{\bm{G}}=\left[\begin{array}[]{ccc}\bm{H}_{1}&-\bm{H}_{2}\\ \bm{0}&\bm{H}_{2}-\bm{H}_{3}\\ \bm{0}&\bm{H}_{3}\end{array}\right]

and

𝑯α=[𝑯1𝑯2𝟎𝑯2𝑯3].\bm{H}_{\alpha}=\left[\begin{array}[]{ccc}\bm{H}_{1}&-\bm{H}_{2}\\ \bm{0}&\bm{H}_{2}-\bm{H}_{3}\end{array}\right].

Again, 𝑯2𝑯3\bm{H}_{2}-\bm{H}_{3} and 𝑯α\bm{H}_{\alpha} in the above are of full (column) rank with probability one.
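As a numerical companion to this toy example, the following sketch uses random 6×2 stand-ins for the blocks 𝑯1, 𝑯2, 𝑯3 and confirms that both instances of 𝑯α above have full column rank:

```python
import numpy as np

rng = np.random.default_rng(5)
H1, H2, H3 = (rng.normal(size=(6, 2)) for _ in range(3))   # random stand-in blocks
Z = np.zeros((6, 2))

# Case 1: sensor 1 also measures at time instance 3
Ha1 = np.block([[H1, -H2], [-H3, H2]])
# Case 2: sensor 2 also measures at time instance 3
Ha2 = np.block([[H1, -H2], [Z, H2 - H3]])

print(np.linalg.matrix_rank(Ha1), np.linalg.matrix_rank(Ha2))   # prints "4 4" (= 2M)
```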