
Kalman Filter from the Mutual Information Perspective

Yarong Luo, yarongluo@whu.edu.cn
Jianlang Hu, hujianlang123@whu.edu.cn
Chi Guo, guochi@whu.edu.cn
GNSS Research Center, Wuhan University

Abstract
The Kalman filter is the best linear unbiased state estimator. It is also comprehensible from the point of view of Bayesian estimation. However, this note gives a detailed derivation of the Kalman filter from the mutual information perspective for the first time. We then extend this result to the Rényi mutual information. Finally, we draw the conclusion that the measurement update of the Kalman filter is the key step in minimizing the uncertainty of the state of the dynamical system.

Key Words
Kalman filter, mutual information, Rényi mutual information, uncertainty, measurement update

Introduction

The Kalman filter has been widely used in various fields as an effective state estimator, such as integrated navigation [1] and robotics [2]. The classical Kalman filter can be derived as the best linear unbiased estimate [1], and it is easy to understand it from the probabilistic perspective [2]. Recently, the Kalman filter has also been presented using the methods of maximum relative entropy [3] and the temporal derivative of the Rényi entropy [4], which go beyond the general Bayesian filter. More and more evidence shows that the Kalman filter can be regarded as a direct extension of information theory. This note gives a new perspective on the Kalman filter from mutual information, which further bridges the gap between optimal state estimation and information theory. The main contribution of this note is to derive the Kalman filter from the perspective of mutual information and to extend the result to the Rényi mutual information case.

Kalman Filter from the Mutual Information

Consider the following discrete-time state-space model:

X_{k} = \Phi_{k|k-1} X_{k-1} + \Gamma_{k|k-1} W_{k-1}  (1)
Z_{k} = H_{k} X_{k} + V_{k}  (2)

where $X_k$ is the $n$-dimensional state vector; $Z_k$ is the $m$-dimensional measurement vector; $\Phi_{k|k-1}$, $\Gamma_{k|k-1}$ and $H_k$ are the known system structure parameters, namely the $n\times n$ one-step state update matrix, the $n\times l$ system noise distribution matrix, and the $m\times n$ measurement matrix, respectively; $W_{k-1}$ is the $l$-dimensional system noise vector, and $V_k$ is the $m$-dimensional measurement noise vector. Both noise sequences are zero-mean Gaussian and independent of each other:

\mathbb{E}[W_{k}] = 0, \quad \mathbb{E}[W_{k}W_{j}^{T}] = Q_{k}\delta_{kj}  (3)
\mathbb{E}[V_{k}] = 0, \quad \mathbb{E}[V_{k}V_{j}^{T}] = R_{k}\delta_{kj}  (4)
\mathbb{E}[W_{k}V_{j}^{T}] = 0  (5)
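To make the model concrete, the following minimal Python sketch simulates equations (1)-(2). All dimensions, system matrices and noise covariances here are arbitrary illustrative assumptions, not values taken from the derivation.

# Illustrative simulation of the state-space model (1)-(2).
# Phi, Gamma, H, Q, R, the dimensions and the zero initial state are assumed values for this sketch.
import numpy as np

rng = np.random.default_rng(0)
n, l, m = 2, 1, 1                      # state, system-noise and measurement dimensions
Phi = np.array([[1.0, 1.0],
                [0.0, 1.0]])           # one-step state update matrix
Gamma = np.array([[0.5],
                  [1.0]])              # system noise distribution matrix
H = np.array([[1.0, 0.0]])             # measurement matrix
Q = np.array([[0.01]])                 # system noise covariance
R = np.array([[0.25]])                 # measurement noise covariance

X = np.zeros(n)                        # assumed initial state
for k in range(100):
    W = rng.multivariate_normal(np.zeros(l), Q)    # W_{k-1} ~ N(0, Q), equation (3)
    V = rng.multivariate_normal(np.zeros(m), R)    # V_k ~ N(0, R), equation (4)
    X = Phi @ X + Gamma @ W                        # equation (1)
    Z = H @ X + V                                  # equation (2)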

The one-step prediction covariance matrix is denoted as $\Sigma_{k|k-1}$. The state estimate at $t_k$ is denoted as $\mathcal{N}(\hat{X}_{k},\Sigma_{k})$, where $\hat{X}_{k}$ is the mean of the estimated state and $\Sigma_{k}$ is the covariance matrix of the estimation error. Assume the optimal estimate of the state can be calculated as follows:

\hat{X}_{k} = X_{k|k-1}^{-} + K_{k}\tilde{Z}_{k|k-1}  (6)

where $K_k$ is the undetermined correction factor matrix, $X_{k|k-1}^{-} = \Phi_{k|k-1}\hat{X}_{k-1}$ is the one-step state prediction (the a priori estimate), and $\tilde{Z}_{k|k-1} = Z_k - H_k X_{k|k-1}^{-}$ is the measurement one-step prediction error.

Then, the mean square error matrix of the state estimate $\hat{X}_k$ is given by [1]

\Sigma_{k} = (I - K_{k}H_{k})\Sigma_{k|k-1}(I - K_{k}H_{k})^{T} + K_{k}R_{k}K_{k}^{T}  (7)

The mean square error matrix $\Sigma_k$ is positive definite for any $K_k$: both $(I-K_kH_k)\Sigma_{k|k-1}(I-K_kH_k)^T$ and $K_kR_kK_k^T$ are positive semi-definite, and any nonzero $v$ with $K_k^Tv=0$ satisfies $(I-K_kH_k)^Tv=v$ and hence $v^T\Sigma_kv = v^T\Sigma_{k|k-1}v > 0$.
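As an illustration, the update (6) and the covariance form (7) can be written as a short Python function. The function name measurement_update and its arguments are hypothetical names introduced here; the inputs are assumed to come from a prediction step such as the sketch above, and the formula holds for an arbitrary gain K, not only the optimal one derived below.

# Measurement update (6) and the mean square error matrix (7) for an arbitrary gain K.
import numpy as np

def measurement_update(X_pred, Sigma_pred, Z, H, R, K):
    """Return the updated state (6) and its mean square error matrix (7)."""
    innovation = Z - H @ X_pred                       # measurement one-step prediction error
    X_hat = X_pred + K @ innovation                   # equation (6)
    I_KH = np.eye(len(X_pred)) - K @ H
    Sigma = I_KH @ Sigma_pred @ I_KH.T + K @ R @ K.T  # equation (7), valid for any K
    return X_hat, Sigma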

The joint Gaussian distribution can be expressed as

p(X,Y) \sim \mathcal{N}\left(\begin{bmatrix}\hat{X}\\ \hat{Y}\end{bmatrix}, \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy}\\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}\right)  (8)

where $X\sim\mathcal{N}(\hat{X},\Sigma_{xx})$ and $Y\sim\mathcal{N}(\hat{Y},\Sigma_{yy})$.

The mutual information for a joint Gaussian PDF can be represented by

I(X,Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y)  (9)
       = \frac{1}{2}\ln\left((2\pi e)^{N}\det\Sigma_{xx}\right) + \frac{1}{2}\ln\left((2\pi e)^{M}\det\Sigma_{yy}\right) - \frac{1}{2}\ln\left((2\pi e)^{M+N}\det\Sigma\right)
       = -\frac{1}{2}\ln\left(\frac{\det\Sigma}{\det\Sigma_{xx}\det\Sigma_{yy}}\right)

where $H(X)=\frac{1}{2}\ln\left((2\pi e)^{N}\det\Sigma_{xx}\right)$ is the entropy of a Gaussian random variable, and

\det\Sigma = \det\Sigma_{xx}\det\left(\Sigma_{yy}-\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\right) = \det\Sigma_{yy}\det\left(\Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right)  (10)

Therefore, the mutual information describes the reduction of the uncertainty in variable X due to gaining knowledge of variable Y.
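The identities (9) and (10) can be verified numerically. The following sketch uses a randomly generated positive definite joint covariance, which is purely an illustrative assumption.

# Numerical check of equations (9)-(10) for an arbitrary joint Gaussian.
import numpy as np

rng = np.random.default_rng(1)
N, M = 3, 2
A = rng.standard_normal((N + M, N + M))
Sigma = A @ A.T + (N + M) * np.eye(N + M)           # random positive definite joint covariance
Sxx, Sxy = Sigma[:N, :N], Sigma[:N, N:]
Syx, Syy = Sigma[N:, :N], Sigma[N:, N:]

H_x = 0.5 * np.log((2 * np.pi * np.e) ** N * np.linalg.det(Sxx))
H_y = 0.5 * np.log((2 * np.pi * np.e) ** M * np.linalg.det(Syy))
H_xy = 0.5 * np.log((2 * np.pi * np.e) ** (N + M) * np.linalg.det(Sigma))
I_from_entropies = H_x + H_y - H_xy
I_closed_form = -0.5 * np.log(np.linalg.det(Sigma) / (np.linalg.det(Sxx) * np.linalg.det(Syy)))

# Equation (9): both expressions agree; equation (10): the Schur complement gives the same determinant.
det_schur = np.linalg.det(Sxx) * np.linalg.det(Syy - Syx @ np.linalg.solve(Sxx, Sxy))
assert np.allclose(I_from_entropies, I_closed_form)
assert np.allclose(np.linalg.det(Sigma), det_schur)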

Similarly, the mutual information at time step $t_k$ can be easily computed from the a priori and a posteriori PDFs and the Kalman gain $K_k$ as

I(\hat{X}_{k|k-1}, Z_{k}) = \frac{1}{2}\ln\left(\frac{\det\Sigma_{k|k-1}}{\det\Sigma_{k}}\right) = \frac{1}{2}\ln\left(\frac{\det\Sigma_{k|k-1}}{\det\left((I-K_{k}H_{k})\Sigma_{k|k-1}(I-K_{k}H_{k})^{T}+K_{k}R_{k}K_{k}^{T}\right)}\right)  (11)

It describes the reduction of the uncertainty of the state due to gaining knowledge from the measurement $Z_k$. Consequently, we want to maximize the mutual information $I(\hat{X}_{k|k-1}, Z_k)$. Since $\Sigma_{k|k-1}$ does not depend on $K_k$, maximizing the mutual information is equivalent to minimizing $\ln\det\Sigma_k$. Equation (11) is a function of the unknown correction factor matrix $K_k$, so its maximum can be found by taking the derivative with respect to $K_k$ and setting it equal to zero,

\frac{dI(\hat{X}_{k|k-1},Z_{k})}{dK_{k}} = -\Sigma_{k}^{-T}\frac{d\Sigma_{k}}{dK_{k}} = -\Sigma_{k}^{-T}\left(-2(I-K_{k}H_{k})\Sigma_{k|k-1}H_{k}^{T}+2K_{k}R_{k}\right) = 0  (12)

where $\frac{\partial\ln\det X}{\partial X}=X^{-T}$ [5] has been used, and solving for $K_k$ gives

K_{k} = \Sigma_{k|k-1}H_{k}^{T}\left(H_{k}\Sigma_{k|k-1}H_{k}^{T}+R_{k}\right)^{-1}  (13)
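Before turning to the second-order check, the following sketch verifies numerically that the gain (13) indeed maximizes the mutual information (11); the prior covariance, $H_k$ and $R_k$ used here are arbitrary illustrative values.

# Sketch check that the gain (13) maximizes the mutual information (11).
import numpy as np

rng = np.random.default_rng(2)
n, m = 2, 1
Sigma_pred = np.array([[2.0, 0.3],
                       [0.3, 1.0]])                     # Sigma_{k|k-1}, assumed for illustration
H = np.array([[1.0, 0.5]])
R = np.array([[0.4]])

def mutual_info(K):
    """Equation (11) as a function of the correction factor matrix K_k."""
    I_KH = np.eye(n) - K @ H
    Sigma = I_KH @ Sigma_pred @ I_KH.T + K @ R @ K.T    # equation (7)
    return 0.5 * np.log(np.linalg.det(Sigma_pred) / np.linalg.det(Sigma))

S = H @ Sigma_pred @ H.T + R
K_opt = Sigma_pred @ H.T @ np.linalg.inv(S)             # equation (13)

# Any perturbation of the optimal gain reduces the mutual information.
for _ in range(1000):
    K_perturbed = K_opt + 0.1 * rng.standard_normal(K_opt.shape)
    assert mutual_info(K_perturbed) <= mutual_info(K_opt) + 1e-12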

A second derivative of equation (12) must be taken to check that this stationary point is a maximum, that is

\frac{d}{dK_{k}}\left(-\Sigma_{k}^{-T}\left(-2(I-K_{k}H_{k})\Sigma_{k|k-1}H_{k}^{T}+2K_{k}R_{k}\right)\right)  (14)

Substituting equation (12) into the above equation results in

\frac{d}{dK_{k}}\left(-\Sigma_{k}^{-T}\left(-2(I-K_{k}H_{k})\Sigma_{k|k-1}H_{k}^{T}+2K_{k}R_{k}\right)\right) = -\Sigma_{k}^{-T}\left(H_{k}\Sigma_{k|k-1}H_{k}^{T}+R_{k}\right)  (15)

which is always negative definite, because $R_k$ and $H_k\Sigma_{k|k-1}H_k^T$ are covariance matrices and their sum is positive definite, ensuring that the solution for $K_k$ is a maximum.

Kalman Filter from the Rényi Mutual Information

Moreover, the Rényi mutual information of a joint Gaussian PDF can be calculated similarly to equation (9):

I_{R}^{\alpha}(X,Y) = H_{R}^{\alpha}(X) + H_{R}^{\alpha}(Y) - H_{R}^{\alpha}(X,Y)  (16)
                   = \frac{1}{2}\ln\left((2\pi)^{N}\alpha^{\frac{N}{\alpha-1}}\det\Sigma_{xx}\right) + \frac{1}{2}\ln\left((2\pi)^{M}\alpha^{\frac{M}{\alpha-1}}\det\Sigma_{yy}\right) - \frac{1}{2}\ln\left((2\pi)^{N+M}\alpha^{\frac{N+M}{\alpha-1}}\det\Sigma\right)
                   = \frac{1}{2}\ln\left(\frac{\det\Sigma_{xx}\det\Sigma_{yy}}{\det\Sigma}\right)
                   = I(X,Y)

where $H_{R}^{\alpha}(X)=\frac{1}{2}\ln\left((2\pi)^{N}\alpha^{\frac{N}{\alpha-1}}\det\Sigma_{xx}\right)$ is the Rényi entropy of order $\alpha$ of a continuous random variable with a multivariate Gaussian PDF. Consequently, the mutual information coincides with the Rényi mutual information for the joint Gaussian PDF, and maximizing the Rényi mutual information yields the same result as equation (13).
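The equality in (16) can also be confirmed numerically; in the following sketch the joint covariance and the order $\alpha$ are arbitrary illustrative choices.

# Numerical check that the Rényi mutual information (16) reduces to (9) for a joint Gaussian.
import numpy as np

rng = np.random.default_rng(3)
N, M, alpha = 3, 2, 1.7                               # dimensions and order alpha are assumed values
A = rng.standard_normal((N + M, N + M))
Sigma = A @ A.T + (N + M) * np.eye(N + M)             # random positive definite joint covariance
Sxx, Syy = Sigma[:N, :N], Sigma[N:, N:]

def renyi_entropy(cov, dim):
    """Rényi entropy of order alpha of a dim-dimensional Gaussian with covariance cov."""
    return 0.5 * np.log((2 * np.pi) ** dim * alpha ** (dim / (alpha - 1)) * np.linalg.det(cov))

I_renyi = renyi_entropy(Sxx, N) + renyi_entropy(Syy, M) - renyi_entropy(Sigma, N + M)
I_shannon = 0.5 * np.log(np.linalg.det(Sxx) * np.linalg.det(Syy) / np.linalg.det(Sigma))
assert np.allclose(I_renyi, I_shannon)                # the (2*pi) and alpha factors cancel exactly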

Conclusions

In this note, the Kalman filter is derived from the perspective of mutual information and extended to the Rényi mutual information case. We show that the measurement update of the Kalman filter minimizes the uncertainty of the state: the update is formulated as the mutual information between the predicted state and the measurement, and this mutual information is maximized. Furthermore, we can, a little more radically, think of the Kalman filter as an extension of information theory.

Acknowledgement
This research was supported by a grant from the National Key Research and Development Program of China (2018YFB1305001). We thank the GNSS Research Center, Wuhan University.

References

  • [1] Y. Gongmin and W. Jun, Lectures on Strapdown Inertial Navigation Algorithm and Integrated Navigation Principles. Northwestern Polytechnical University Press: Xi’an, China, 2019.
  • [2] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
  • [3] A. Giffin and R. Urniezius, "The Kalman filter revisited using maximum relative entropy," Entropy, vol. 16, no. 2, pp. 1047–1069, 2014.
  • [4] Y. Luo, C. Guo, S. You, and J. Liu, "A novel perspective of the Kalman filter from the Rényi entropy," Entropy, vol. 22, no. 9, p. 982, 2020.
  • [5] X.-D. Zhang, Matrix Analysis and Applications. Cambridge University Press, 2017.