This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Uncertainty Quantification of Hyperspectral Image Denoising Frameworks based on Sliding-Window Low-Rank Matrix Approximation

Jingwei Song, Shaobo Xia, Jun Wang, Mitesh Patel, Dong Chen This work was supported in part by the National Natural Science Foundation of China under Grant 41971415; in part by the Natural Science Foundation of Jiangsu Province under Grant BK20201387; in part by the Toyota Research Institute; and in part by the Talent Startup Project of Zhejiang A & F University Scientific Research Development Foundation under Grant 2034020104. (Corresponding author: Dong Chen)Jingwei Song is with the College of Civil Engineering, Nanjing Forestry University, Nanjing 210037, China, and the Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor, MI 48109, USA. e-mail:(jingweso@umich.edu); this work was partly performed during the tenure of his position at the FXPAL inc.Shaobo Xia is with the College of Environment and Resources, Zhejiang A&F University, Hangzhou, Zhejiang, China, and also with the Department of Geomatics Engineering, University of Calgary, T2N 1N4, Canada. e-mail: (shaobo.xia@zafu.edu.cn). Jun Wang is with the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China. e-mail:(wangjun@radi.ac.cn). Mitesh Patel is with FX Palo Alto Laboratory Inc. Palo Alto, CA - 94304,USA. e-mail:(patelmiteshn@gmail.com). Dong Chen is with the College of Civil Engineering, Nanjing Forestry University, Nanjing 210037, China. e-mail: (chendong@njfu.edu.cn).
Abstract

Sliding-window based low-rank matrix approximation (LRMA) is a technique widely used in hyperspectral images (HSIs) denoising or completion. However, the uncertainty quantification of the restored HSI has not been addressed to date. Accurate uncertainty quantification of the denoised HSI facilitates applications such as multi-source or multi-scale data fusion, data assimilation, and product uncertainty quantification since these applications require an accurate approach to describe the statistical distributions of the input data. Therefore, we propose a prior-free closed-form element-wise uncertainty quantification method for LRMA-based HSI restoration. Our closed-form algorithm overcomes the difficulty of handling uncertainty in HSI patch mixing caused by the sliding-window strategy used in the conventional LRMA process. The proposed approach only requires the uncertainty of the observed HSI and provides the uncertainty result relatively rapidly and with similar computational complexity as the LRMA technique. We conduct extensive experiments to validate the estimation accuracy of the proposed closed-form uncertainty approach. The method is robust to at least 10%10\% random impulse noise at the cost of 1020%10-20\% of additional processing time compared to the LRMA. The experiments indicate that the proposed closed-form uncertainty quantification method is more applicable to real-world applications than the baseline Monte Carlo test, which is computationally expensive.

Index Terms:
Low-rank matrix approximation, hyperspectral images, uncertainty quantification.

I INTRODUCTION

Data uncertainty, which refers to the lack of sureness about the data [1], is an essential factor to consider when utilizing data and derived products [2]. When the data uncertainty is known, users can determine whether a product is suitable for specific applications or can balance the contributions of different data sources. For example, in [3], the minimum detectable deformation can be estimated based on the uncertainty measures in a point cloud so that engineers can develop appropriate project plans, including sensor selection and set-up schemes [4]. In multiple data fusion tasks, the label uncertainties of hyperspectral images (HSIs) and point clouds are often considered to achieve high fusion accuracy [5].

In theory, uncertainty can be classified as aleatoric uncertainty, and epistemic uncertainty [6]. Aleatoric uncertainty is caused by noise in the data and the processing steps. In contrast, epistemic uncertainty is introduced into the observation models, and data post-processing steps [2, 7, 4, 8, 1]. In the absence of ground truth data to measure the model performance, the epistemic uncertainty is ignored, and the output uncertainty is defined as the aleatoric uncertainty due to the propagation of an arbitrary model. Most studies have defined the probabilistic density function of the noise as a mixture of the normal distribution and the random distribution. Assuming that random noise is significantly low, the uncertainty can be described by the Gaussian distribution. In HSI restoration, symmetric uncertainty based on mutual information was proposed in [9] and was used in HSI classifications. In HSI unmixing, the end member uncertainty of unmixing methods based on the normal compositional model is defined as the covariance matrix, which is used to predict the unmixing errors without knowing the ground truth [10]. Uncertainty has been described differently in different studies. [1] proposed an entropy-based uncertainty measure for evaluating the classification results of multi-spectral remote sensing images. The calculated entropy, which was defined as the pixel-wise classification uncertainty, had a linear relation with the accuracy of the classification results. In [11], the uncertainty of 3D point clouds was defined as the measurement repeatability, which was represented by the median and the two percentiles (5% and 95%) related to repeated measurements.

This paper focuses on the uncertainty quantification in HSI denoising. HSIs consisting of hundreds or thousands of spectral bands have been widely used in various applications (e.g., land cover mapping, pollution monitoring, and crop yield estimation) due to their rich spectral information compared to the spatial information. In HSI data processing, image denoising is regarded as an important preprocessing step for advanced applications and has been researched intensively in the past decades [12]. According to [13], existing HSI denoising methods can be categorized into three groups: filter-based methods, model optimization methods, and deep learning methods. Filter-based methods try to remove noise using various filters. Classical methods in this category are based on nonlocal filters [14, 15, 16, 17]. Model optimization methods can be further divided into total variation methods [18], sparse representation-based [19], and sliding-window low-rank matrix-based methods [20]. The major advantages of this group of denoising methods are utilizing image priors and considering spatial and spectral features simultaneously. These methods are often combined to achieve better performance [21, 22]. The last group is based on deep learning frameworks, which have attracted increasing attention in the past three years [23]. A recent deep learning-based framework was proposed in [13] to remove hybrid noise in a spatial-spectral framework.

HSI denoising based on the sliding-window low-rank matrix approximation (LRMA) framework has become a state-of-the-art technique in the community [20, 24]. These methods are model optimization methods aimed at exploiting the low-dimensional structure in the high-dimensional data space. LRMA-based studies [25, 20, 21] assume that the noise-free HSI can be represented by a low-rank matrix. Thus, HSI denoising can be regarded as a classical sliding-window low-rank matrix recovery problem. A representative HSI denoising framework based on LRMA is described in [20], where a clean image, impulse noise, and Gaussian noise were formulated separately in the objective function, which was minimized under a low-rank assumption. Many variants of LRMA-based HSI denoising methods have been proposed in the past five years [24, 26, 21, 27, 28]. For example, [24] proposed an LRMA-based framework containing a total-variation regularization term to preserve spatial information. The spatial-spectral structure in HSI denoising was also exploited by replacing the sliding-window low-rank matrix analysis with low-rank tensor approximation [28]. In general, LRMA-based HSI denoising methods are state-of-the-art frameworks that have been widely used in this area.

Despite the progress in LRMA-based HSI denoising, no studies investigated the uncertainty quantification of denoised HSI to date. Some works aimed to determine the noise of the raw observations [29, 30], but the uncertainty of denoised HSI has not yet been addressed, although it is critical in subsequent applications, as discussed above. However, the quantification of uncertainties of denoised HSIs based on LRMA is difficult due to the nonlinear discontinuous functions in LRMA models (e.g., nuclear norm), which hinders the deduction of the closed-form uncertainty propagation formulations. Recently, [31] proposed an optimal uncertainty quantification and inference method for noisy matrix completion. Inspired by this work, we expect that it is possible to quantify the uncertainties of outputs from LRMA-based HSI denoising frameworks.

A sliding-window strategy is always used in HSI denoising to handle the extensive data volume of HSI [24, 20]; the denoised element values are obtained by averaging the values from several traversed patches. The mixed values from multiple patches pose difficulty in quantifying the uncertainty of the estimations. Unlike the conventional perspective of multiple observations, which obey the same independent and identical distribution (i.i.d.), multiple overlapping sliding windows are inter-connected statistically. Thus, the covariance of the overlapping window should also be addressed. To tackle these problems, we introduce and improve the method described in [32] to quantify the uncertainty of the low-rank-based HSI approach. Note that we assume the input HSI is a mixture of Gaussian noise and random noise. Previous studies [30, 16, 21] already proved that the Poisson noise was the primary concern in real HSI data but could be approximated by additive Gaussian noise. Based on these results, Gaussian noise with mean and covariance is suitable to establish the uncertainty propagation model.

This research does not aim at providing a new LRMA formulation. Instead, we propose a general uncertainty quantification method for the existing LRMA approach. The main contributions of this paper are threefold,

  1. 1.

    To the best of our knowledge, the uncertainty quantification of the denoised HSI based on LRMA is presented for the first time in this paper. We provide a closed-form time-efficient uncertainty propagation model to predict the element-wise uncertainty of the denoised HSI without any ground truth data.

  2. 2.

    To solve the “uncertainty mixing” problem caused by the sliding-window strategy, we deduce a weighted average uncertainty formulation, which proves effective based on extensive experiments.

  3. 3.

    The proposed uncertainty estimation approach is independent of the choice of the LRMA algorithm. It is based on the global minimum assumption of the LRMA solver; thus, a more accurate solver leads to better uncertainty description.

The rest of this paper is organized as follows. In Section II, we briefly summarize the canonical formulation of LRMA-based HSI denoising. Section III presents the proposed algorithm for the uncertainty propagation in a classical LRMA-based HSI denoising framework. Section IV shows experiments on both simulated and real HSI data. The discussion of the parameter settings and limitations is also presented in Section IV. Section V summarizes this paper.

Refer to caption
Figure 1: Illustration of the routine process of HSI denoising. The 3D matrices at each end represent the raw and denoised HSI, respectively. The sliding window operates on each patch of the matrix. The selected 3D patch matrix is converted to a 2D matrix by concatenating each band of the 2D matrix as a 1D vector. The permutated 2D matrix goes through the LRMA process to generate the denoised matrix. Symmetrically, the denoised 2D matrix is inversely permutated to the 3D patch matrix and constitutes the final restored HSI in 3D.

II LRMA algorithm for LRMA HSI denoising

We cite [20] as a typical LRMA approach and discuss the uncertainty evaluation based on the LRMA approach. Section II is a brief formulation of the routine LRMA method [20]. The results provided in Section II have been strictly tested and validated in different LRMA studies. Section III describes the proposed uncertainty estimation method for an LRMA approach, as described in [20]. We emphasize that this work DOES NOT aim at proposing another LRMA formulation. The goal of this approach is to quantify the uncertainty of LRMA denoised images in a closed-form manner.

II-A Revisiting the general HSI restoration process

Fig. 1 demonstrates the routine process of the LMRA-based HSI restoration. The HSI is processed with a 3D sliding window to reduce the computational burden. For each 3D patch in the window, all the bands are vectorized and stacked into the 2D matrix. The permutated 2D matrix goes through the routine LMRA process, and the result is re-permutated to recover the 3D denoised patch. Finally, all the patches are summarized and averaged to create the restored HSI.

An HSI acquired by a sensor is typically modeled as a 3D image composed of Gaussian noise and random noise:

𝐖=𝐖+𝐆+𝐍,\mathbf{W}=\mathbf{W}^{\star}+\mathbf{G}+\mathbf{N}, (1)

where 𝐖M×N×P\mathbf{W}\in\mathbb{R}^{M\times N\times P} is the image acquired from the hyperspectral sensor with the size M×NM\times N and PP channels, 𝐖M×N×P\mathbf{W}^{\star}\in\mathbb{R}^{M\times N\times P} is the noise-free image denoted as the ground truth, 𝐆M×N×P\mathbf{G}\in\mathbb{R}^{M\times N\times P} is the element-wise i.i.d. Gaussian noise and 𝐍M×N×P\mathbf{N}\in\mathbb{R}^{M\times N\times P} is the sparse and random noise (impulse signal). The random noise 𝐍\mathbf{N} is normally considered negligible with regard to 𝐆\mathbf{G}, and the data quality quantification process is inferring the statistical distribution of 𝐆\mathbf{G}. In practice, the ground truth HSI 𝐖\mathbf{W}^{\star} cannot be obtained. Thus, we denote 𝐖^\hat{\mathbf{W}} as the estimated denoised HSI calculated from studies based on the low-rank assumption [21, 22].

The goal of HSI denoising is to recover the denoised 𝐖^\hat{\mathbf{W}} with the 3D HSI noisy observation 𝐖\mathbf{W} and the exact rank prior. Assuming that the random noise 𝐍\mathbf{N} is negligible and the residual follows the i.i.d. zero-mean Gaussian distribution, the restored image 𝐖^\hat{\mathbf{W}} is recovered with the following formulation:

min𝐖^μrank(𝐖^)+𝐆F2, s.t. 𝐖^=𝐖+𝐆 such that 𝐖^=g(𝐖^),\begin{array}[]{l}{\underset{\hat{\mathbf{W}}}{\min}\ \mu\operatorname{rank}(\hat{\mathbf{W}}^{\sharp})+\|\mathbf{G}\|^{2}_{\mathrm{F}},\text{\ s.t. }}{\hat{\mathbf{W}}=\mathbf{W}+\mathbf{G}}\\ \text{\ such that }{\hat{\mathbf{W}}^{\sharp}=g(\hat{\mathbf{W}})}\end{array}, (2)

where μ\mu is the hyperparameter balancing the two terms. rank()\operatorname{rank}(\cdot) constraints the rank of the input matrix to be no greater than the given rank. ||||F||\cdot||_{\mathrm{F}} is the Frobenius norm that minimizes the residuals in Gaussian distribution. g()g(\cdot) permutates the 3D HSI to a 2D matrix by vectorizing and concatenating to a general 2D matrix (Fig. 1). The low-rank formulation was originally proposed by [33] and was called “robust principal component analysis” (RPCA). The work in [20] introduced RPCA into LRMA scenario.

II-B The sliding-window in the HSI restoration process

HSI restoration differs greatly from conventional image processing due to the data size. The extremely large permutated 2D matrix hinders the iterative regular singular value decomposition (SVD) process, which is essential in solving Eq. (2). The HSI is segmented into small patches (denoted as the overlapped sliding window) and solved in parallel to overcome the heavy computational burden in minimizing the objective function due to the huge HSI size. Previous studies regarded the sliding-window strategy as an engineering compromise and ignored the formal mathematical presentation. However, we realize that the unique sliding-window structure significantly impairs the uncertainty quantification. By following [20], we obtain a sliding-window version that minimizes all patches in Eq. (2):

min𝐖^(ijk)μrank(𝐖^(ijk))+12𝐖(ijk)𝐖^(ijk)F2such that𝐖^(ijk)=g(𝐖^(ijk)) (i,j,k)Ω,\begin{aligned} &\underset{\hat{\mathbf{W}}^{(ijk)}}{\min}&&\mu\operatorname{rank}(\hat{\mathbf{W}}^{\sharp{(ijk)}})+\frac{1}{2}\|\mathbf{W}^{(ijk)}-\hat{\mathbf{W}}^{(ijk)}\|_{\mathrm{F}}^{2}\\ &\text{such that}&&{\hat{\mathbf{W}}^{\sharp{(ijk)}}=g(\hat{\mathbf{W}}^{(ijk)})}\\ &&&\text{ }(i,j,k)\in\Omega\\ \end{aligned}, (3)

where the matrices 𝐖(ijk)\mathbf{W}^{(ijk)}, 𝐖^(ijk)\hat{\mathbf{W}}^{(ijk)} and 𝐖^(ijk)J×J\hat{\mathbf{W}}^{\sharp{(ijk)}}\in\mathbb{R}^{J\times J} are the patch indexed in (ijk){(ijk)} with regard to 𝐖\mathbf{W}, 𝐖^\hat{\mathbf{W}} and 𝐖^\hat{\mathbf{W}}^{\sharp}. JJ is the size of the permutated matrix. Ω{1,,M}×{1,,N}×{1,,P}\Omega\subseteq\{1,\cdots,M^{\prime}\}\times\{1,\cdots,N^{\prime}\}\times\{1,\cdots,P^{\prime}\} is the number of patches in the HSI. After all the patches are optimized, the optimal 𝐖^\hat{\mathbf{W}} is restored by averaging the overlapped patches 𝐖^(ijk)\hat{\mathbf{W}}^{(ijk)} element-wisely. By ignoring the random noise, we obtain the observation corrupted with the element-wise Gaussian noise:

𝐖(ijk)𝐖(ijk)+𝐆(ijk),𝐆(ijk)i.i.d.𝒩(0,σ02),\displaystyle\mathbf{W}^{(ijk)}\approx{\mathbf{W}}^{\star{(ijk)}}+\mathbf{G}^{(ijk)},\quad\mathbf{G}^{(ijk)}\stackrel{{\scriptstyle\mathrm{i.i.d.}}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}), (4)

where σ02\sigma_{0}^{2} is the variance of the noise obtained from the image or estimated by algorithms such as described in [25]. 𝐆(ijk)\mathbf{G}^{(ijk)} is the Gaussian noise in patch (ijk)(ijk). In previous studies, the choice of rank was often estimated by manual-tuning strategies based on the differences between the reconstructed signal and the original one [21].

The idea of Eq. (LABEL:Eq_image_recovery) is to convert the 3D patch into a typical 2D small-size matrix so that the conventional low-rank factorization algorithm can be executed with limited computational resources. The equation is often solved with a Lagrange multiplier or an augmented Lagrange multiplier (ALM); off-the-shelf low-rank solvers can be used. In this work, we choose the GoDec algorithm proposed by [34] to be consistent with the LRMA method [20]. However, it should be emphasized that the proposed uncertainty quantification method is independent of the solver since it only assumes that the estimation of the LRMA is close to the minimum. The better the performance of the LRMA solver, the more accurate the confidence interval of our method describes.

After all small overlapped patches have gone through the process of Eq. (LABEL:Eq_image_recovery), the optimized denoised patches are summed to restore the HSI:

𝐖^=[i,j,kΩfp(𝐖^(ijk))]𝐐,\hat{\mathbf{W}}=\left[\sum_{i,j,k\in\Omega}\mathit{f}_{p}({\hat{\mathbf{W}}^{(ijk)}})\right]\oslash\mathbf{Q}, (5)

where fp()\mathit{f}_{p}(\cdot) is the padding function to convert the patch to the same size of 𝐖^\hat{\mathbf{W}} by padding the rest of the elements with 0. The matrix 𝐐M×N×P\mathbf{Q}\in\mathbb{R}^{M\times N\times P} counts the number of the patches valid on each voxel. \oslash is the Hadamard division operator.

III Uncertainty quantification method for LRMA-based HSI denoising

Based on the above-mentioned formulation, we propose a closed-form element-wise uncertainty quantification method that can handle the sliding-window LRMA-based HSI denoising.

III-A The general framework

Refer to caption
Figure 2: The general framework of the proposed approach. The blue boxes represent the typical LRMA process ([20] as an example). In addition to the LRMA HSI denoising, our work aims at providing an extra module to quantify the uncertainty of the output and the restored HSI using the LRMA algorithm.

Fig. 2 illustrates the general framework and the relationship with the routine LRMA approach. In the LRMA process, the temporary variable, i.e., the 2D sliding-window patch, is utilized to provide the uncertainty to the HSI patch based on our closed-form formulation. After the HSI patch and its associated closed-form uncertainty have been generated, the routine LRMA approach averages them to yield the restored HSI. We also provide a closed-form uncertainty propagation to quantify the final restored HSI. As shown in Fig. 2, in all processes, the proposed approach is independent of the choice of the low-rank solver. The method can also be easily integrated into the derivations of the LRMA process. In addition, since our method is a closed-form solution and only utilizes temporary variables, the processing time is relatively short compared to the routine LRMA process.

One major contribution of this work is that it provides an element-wise closed-form solution to quantify the uncertainty in a time-efficient manner. For the restored HSI 𝐖^\hat{\mathbf{W}}, one straightforward approach of the element-wise uncertainty quantification is conducting the Monte Carlo test with the given statistical distribution of the observation 𝐖\mathbf{W}. However, the size of HSIs is notoriously large, resulting in a heavy computational burden to conduct numerous Monte Carlo tests. Roughly, a one-time process in [20] on a 100×100×100100\times 100\times 100 HSI requires 20 minutes with a typical commercial desktop computer. Since a robust uncertainty quantification requires a reasonable number of trials, the Monte Carlo test is inefficient for use in the element-wise HSI uncertainty quantification.

Another benefit of this research is that it solves the uncertainty propagation of the sliding-window process. The uniqueness of HSIs is that the permutated 2D matrix after dimensionality reduction is still relatively large. The solution cannot be achieved with a one-time LRMA process. Instead, researchers have adopted the sliding-window strategy to segment the HSI into many small patches and process them parallel. The different statistical properties of each overlapped patch make it challenging to quantify the uncertainty of the final restored HSI. We solve this problem and present a closed-form solution in the proposed sliding-window element-wise closed-form HSI uncertainty quantification.

III-B Uncertainty quantification for the exact low-rank factorization

This section focuses on solving the uncertainty quantification of the patch-wise low-rank factorization process (Eq. (LABEL:Eq_image_recovery)). We follow the example of [31] and draw a similar conclusion for the uncertainty quantification of the exact low-rank patch factorization. A concise and clear proof is also provided. HSI patch-based averaging (Eq. (5)) will be discussed in the next subsection. A recent study [31] proposed an uncertainty quantification method for the exact low-rank 2D matrix. This element-wise algorithm can quantify the confidence interval for a common low-rank based 2D matrix denoising algorithm. Synthetic and real-world tests validate the efficiency of this method. We introduce this approach and modify it to enable the quantification of the uncertainty of each estimated 3D HSI patch 𝐖^(ijk)\hat{\mathbf{W}}^{(ijk)} using Eq. (LABEL:Eq_image_recovery). We also provide concise proof different from [31].

In the patch optimization process shown in Eq. (LABEL:Eq_image_recovery), we first aim to quantify the uncertainty of the temporary 2D square matrix 𝐖^(ijk)\hat{\mathbf{W}}^{\sharp{(ijk)}} (𝐖^(ijk)=g(𝐖^(ijk))\hat{\mathbf{W}}^{\sharp{(ijk)}}=g(\hat{\mathbf{W}}^{{(ijk)}}) and it can be any patch in Ω\Omega). We denote the ground truth of 𝐖^(ijk)\hat{\mathbf{W}}^{\sharp{(ijk)}} and 𝐖^(ijk)\hat{\mathbf{W}}^{{(ijk)}} as 𝐖(ijk){\mathbf{W}}^{\sharp\star{(ijk)}} and 𝐖(ijk){\mathbf{W}}^{\star{(ijk)}}. For the sake of conciseness, the rank-r SVD decomposition is 𝐖(ijk)=𝐔(ijk)Σ(ijk)𝐕(ijk){\mathbf{W}}^{\sharp\star{(ijk)}}=\mathbf{U}^{\star(ijk)}\Sigma^{\star(ijk)}\mathbf{V}^{\star(ijk)\top}. For conciseness, we simplify (𝐔(ijk),Σ(ijk),𝐕(ijk))(\mathbf{U}^{\star(ijk)},\Sigma^{\star(ijk)},\mathbf{V}^{\star(ijk)}) by ignoring the patch index (ijk)(ijk) and rewrite it as (𝐔,Σ,𝐕)(\mathbf{U}^{\star},\Sigma^{\star},\mathbf{V}^{\star}). Two matrices are defined as the product of the unitary matrix and the squared singular matrix is: 𝐗𝐔Σ1/2K×r\mathbf{X}^{\star}\triangleq\mathbf{U}^{\star}\Sigma^{\star 1/2}\in\mathbb{R}^{K\times r}, 𝐘𝐕Σ1/2L×r\mathbf{Y}^{\star}\triangleq\mathbf{V}^{\star}\Sigma^{\star 1/2}\in\mathbb{R}^{L\times r}. The following rules apply:

𝐗𝐗=𝐘𝐘=Σ.\displaystyle\mathbf{X}^{\star\top}\mathbf{X}^{\star}=\mathbf{Y}^{\star\top}\mathbf{Y}^{\star}=\Sigma^{\star}. (6)

Following the above definition, the ground truth 2D temporary matrix is 𝐖(ijk)=𝐗𝐘\mathbf{W}^{\sharp\star{(ijk)}}=\mathbf{X}^{\star}\mathbf{Y}^{\star\top}. Similarly, we define the estimated version 𝐖^(ijk)𝐗^𝐘^\hat{\mathbf{W}}^{\sharp{(ijk)}}\approx\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top} where 𝐗^\hat{\mathbf{X}} and 𝐘^\hat{\mathbf{Y}} are the estimated matrices regarding 𝐗\mathbf{X}^{\star} and 𝐘\mathbf{Y}^{\star}. We also obtain the estimated version of Σ\Sigma^{\star} as Σ^\hat{\Sigma}:

Σ^=𝐗^𝐗^=𝐘^𝐘^Σ.\displaystyle\hat{\Sigma}=\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}=\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\approx\Sigma^{\star}. (7)

It should be noticed that the factorization of 𝐖^\hat{\mathbf{W}} is not unique. For each pair of estimated auxiliary matrices (𝐗^,𝐘^\hat{\mathbf{X}},\hat{\mathbf{Y}}), there are infinite unitary matrices 𝐓\mathbf{T} that satisfy:

𝐖^(ijk)𝐗^𝐘^=(𝐗^𝐓)(𝐓𝐘^).\hat{\mathbf{W}}^{\sharp{(ijk)}}\approx\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top}=(\hat{\mathbf{X}}\mathbf{T})(\mathbf{T}^{\top}\hat{\mathbf{Y}}^{\top}). (8)

Among the unitary matrices, there is a rectification matrix 𝐑^\hat{\mathbf{R}} that minimizes the distance between the auxilliary matrices (𝐗^,𝐘^\hat{\mathbf{X}},\hat{\mathbf{Y}}) and the ground truth (𝐗,𝐘{\mathbf{X}}^{\star},{\mathbf{Y}}^{\star}):

min𝐑^𝐗𝐑^𝐗𝐅2+𝐘𝐑^𝐘𝐅2.\underset{\hat{\mathbf{R}}}{\min}||{\mathbf{X}}\hat{\mathbf{R}}-{\mathbf{X}}^{\star}||^{2}_{\mathbf{F}}+||{\mathbf{Y}}\hat{\mathbf{R}}-{\mathbf{Y}}^{\star}||^{2}_{\mathbf{F}}. (9)
Theorem 1.

On condition that each pixel of the observed image 𝐖\mathbf{W} only has i.i.d. Gaussian noise (𝐆(ijk)i.i.d.𝒩(0,σ02)\mathbf{G}^{(ijk)}\stackrel{{\scriptstyle\mathrm{i.i.d.}}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}) in Eq. (LABEL:Eq_Wsharp_noise)) and the low-rank solver achieves the global minimum, the differences between the rectified auxiliary matrices estimations 𝐗^\hat{\mathbf{X}} and 𝐘^\hat{\mathbf{Y}} and the ground truth 𝐗\mathbf{X}^{\star} and 𝐘\mathbf{Y}^{\star} are:

𝐗^𝐑^𝐗=𝐙𝐗+ΨX𝐘^𝐑^𝐘=𝐙𝐘+ΨY,\begin{array}[]{l}{\hat{\mathbf{X}}\hat{\mathbf{R}}-\mathbf{X}^{\star}=\mathbf{Z}_{\mathbf{X}}+\Psi}_{X}\\ {\hat{\mathbf{Y}}\hat{\mathbf{R}}-\mathbf{Y}^{\star}=\mathbf{Z}_{\mathbf{Y}}+\Psi}_{Y}\end{array}, (10)

where the unitary matrix 𝐑^r×r\hat{\mathbf{R}}\in\mathbb{R}^{r\times r} (𝐑^𝐑^=𝐈\hat{\mathbf{R}}\hat{\mathbf{R}}^{\top}=\mathbf{I}) is the rectification matrix to rectify (𝐗^\hat{\mathbf{X}}, 𝐘^\hat{\mathbf{Y}}) to (𝐗\mathbf{X}^{\star}, 𝐘\mathbf{Y}^{\star}). ΨXK×r,ΨYL×r\Psi_{X}\in\mathbb{R}^{K\times r},\Psi_{Y}\in\mathbb{R}^{L\times r} are the negligible residuals. The rows of the error matrix 𝐙XK×r\mathbf{Z}_{X}\in\mathbb{R}^{K\times r} (resp. 𝐙YL×r\mathbf{Z}_{Y}\in\mathbb{R}^{L\times r}) are independent and obey:

𝐙X𝐞v i.i.d. 𝒩(0,σ02(Σ)1), for 1jK𝐙Y𝐞v i.i.d. 𝒩(0,σ02(Σ)1), for 1jK,\begin{array}[]{ll}{\mathbf{Z}_{X}^{\top}\mathbf{e}_{v}\stackrel{{\scriptstyle\text{ i.i.d. }}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}({\Sigma}^{\star})^{-1}),}&{\text{ for }1\leq j\leq K}\\ {\mathbf{Z}_{Y}^{\top}\mathbf{e}_{v}\stackrel{{\scriptstyle\text{ i.i.d. }}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}({\Sigma}^{\star})^{-1}),}&{\text{ for }1\leq j\leq K}\end{array}, (11)

where 𝐞v\mathbf{e}_{v} is the basis vector.

We provide the proof of theorem 1 in Section A. This proof is specifically designed for the full HSI patch denoising scenario and is clearer and more concise than the one in [31].

In theorem 1, we make two assumptions, i.e., the Gaussian noise distribution and the global minimum obtained from the low-rank solver. The former ensures the correctness of the error propagation equations. The latter ensures that the residual noises ΨX\Psi_{X} and ΨY\Psi_{Y} are negligible compared to 𝐙𝐗\mathbf{Z}_{\mathbf{X}} and 𝐙𝐘\mathbf{Z}_{\mathbf{Y}}, and Eq. (10) is approximated to:

𝐗^𝐑^𝐗𝐙𝐗𝐘^𝐑^𝐘𝐙𝐘.\begin{array}[]{l}{\hat{\mathbf{X}}\hat{\mathbf{R}}-\mathbf{X}^{\star}\approx\mathbf{Z}_{\mathbf{X}}}\\ {\hat{\mathbf{Y}}\hat{\mathbf{R}}-\mathbf{Y}^{\star}\approx\mathbf{Z}_{\mathbf{Y}}}\end{array}. (12)

Assuming that the first-order expansion is reasonably tight and omitting the high order expansion, the element-wise error between the estimation 𝐖^uv(ijk)\hat{\mathbf{W}}_{uv}^{\sharp{(ijk)}} and the ground truth 𝐖uv(ijk){\mathbf{W}}_{uv}^{\sharp\star{(ijk)}} at the position uvuv is:

𝐖^uv(ijk)𝐖uv(ijk)\displaystyle\hat{\mathbf{W}}_{uv}^{\sharp{(ijk)}}-{\mathbf{W}}_{uv}^{\sharp\star{(ijk)}} (13)
=[𝐞u𝐗^𝐑^(𝐘^𝐑^)𝐞v𝐞u(𝐗𝐘)𝐞v]\displaystyle=[\mathbf{e}_{u}^{\top}\hat{\mathbf{X}}\hat{\mathbf{R}}(\hat{\mathbf{Y}}\hat{\mathbf{R}})^{\top}\mathbf{e}_{v}-\mathbf{e}_{u}^{\top}(\mathbf{X}^{\star}{\mathbf{Y}^{\star\top}})\mathbf{e}_{v}]
=[𝐞u(𝐗^𝐑^𝐗+𝐗)(𝐘^𝐑^𝐘+𝐘)𝐞v𝐞u(𝐗𝐘)𝐞v]\displaystyle=[\mathbf{e}_{u}^{\top}(\hat{\mathbf{X}}\hat{\mathbf{R}}-\mathbf{X}^{\star}+\mathbf{X}^{\star})(\hat{\mathbf{Y}}\hat{\mathbf{R}}-\mathbf{Y}^{\star}+\mathbf{Y}^{\star})^{\top}\mathbf{e}_{v}-\mathbf{e}_{u}^{\top}(\mathbf{X}^{\star}{\mathbf{Y}^{\star\top}})\mathbf{e}_{v}]
[𝐞u(𝐗^𝐑^𝐗)𝐘𝐞v+𝐞u𝐗(𝐘^𝐑^𝐘)𝐞v]\displaystyle\approx[\mathbf{e}_{u}^{\top}(\hat{\mathbf{X}}\hat{\mathbf{R}}-\mathbf{X}^{\star})\mathbf{Y}^{\star\top}\mathbf{e}_{v}+\mathbf{e}_{u}^{\top}\mathbf{X}^{\star}(\hat{\mathbf{Y}}\hat{\mathbf{R}}-\mathbf{Y}^{\star})^{\top}\mathbf{e}_{v}]
=[𝐞u(𝐙X𝐘)𝐞v+𝐞u(𝐗𝐙Y)𝐞v],\displaystyle=[\mathbf{e}_{u}^{\top}(\mathbf{Z}_{X}{\mathbf{Y}^{\star\top}})\mathbf{e}_{v}+\mathbf{e}_{u}^{\top}(\mathbf{X}^{\star}\mathbf{Z}_{Y}^{\top})\mathbf{e}_{v}],

where the bases 𝐞u,𝐞v\mathbf{e}_{u},\ \mathbf{e}_{v} localize the elements involved in calculating the element in position (u,v)(u,v). After some manipulation, we have the element-wise variance of the error as:

var(𝐖^uv(ijk)𝐖uv(ijk))\displaystyle{\operatorname{var}}(\hat{\mathbf{W}}_{uv}^{\sharp{(ijk)}}-{\mathbf{W}}_{uv}^{\sharp\star{(ijk)}}) (14)
(i)[var(𝐞u(𝐙X𝐘)𝐞v)+var(𝐞u(𝐗𝐙Y)𝐞v)]\displaystyle\stackrel{{\scriptstyle(\mathrm{i})}}{{\approx}}[{\operatorname{var}}(\mathbf{e}_{u}^{\top}(\mathbf{Z}_{X}{\mathbf{Y}^{\star\top}})\mathbf{e}_{v})+{\operatorname{var}}(\mathbf{e}_{u}^{\top}(\mathbf{X}^{\star}\mathbf{Z}_{Y}^{\top})\mathbf{e}_{v})]
=( ii )σ02[𝐞v𝐘(Σ)1𝐘𝐞v+𝐞u𝐗(Σ)1𝐗𝐞u]\displaystyle\stackrel{{\scriptstyle(\text{ ii })}}{{=}}\sigma_{0}^{2}[{\mathbf{e}_{v}}^{\top}\mathbf{Y}^{\star}(\Sigma^{\star})^{-1}{\mathbf{Y}^{\star\top}}{\mathbf{e}_{v}}+\mathbf{e}_{u}^{\top}\mathbf{X}^{\star}(\Sigma^{\star})^{-1}\mathbf{X}^{\star\top}\mathbf{e}_{u}]
=( iii )σ02(𝐔u,.22+𝐕v,.22)=σ02vuv,\displaystyle\stackrel{{\scriptstyle(\text{ iii })}}{{=}}\sigma_{0}^{2}(\|\mathbf{U}_{u,.}^{\star}\|_{2}^{2}+\|\mathbf{V}_{v,.}^{\star}\|_{2}^{2})=\sigma_{0}^{2}{v}_{uv}^{\star},

where (i) is derived from theorem 1 since 𝐙X\mathbf{Z}_{X} and 𝐙Y\mathbf{Z}_{Y} are nearly independent. (ii) is obtained from Eq. (10). (iii) 𝐔u,.\mathbf{U}_{u,.}^{\star} and 𝐕v,.\mathbf{V}_{v,.}^{\star} are the uu/vvth row of 𝐔\mathbf{U}^{\star} and 𝐕\mathbf{V}^{\star}. For conciseness, we define vuv=(𝐔u,.22+𝐕v,.22){v}_{uv}^{\star}=(\|\mathbf{U}_{u,.}^{\star}\|_{2}^{2}+\|\mathbf{V}_{v,.}^{\star}\|_{2}^{2}), and var(){\operatorname{var}}(\cdot) is the variance.

In practice, the rows of the ground truth 𝐔u,.\mathbf{U}_{u,.}^{\star} and 𝐕v,.\mathbf{V}_{v,.}^{\star} are impossible to obtain. Assuming that the solver achieves a solution close to the global minimum, we approximate them with the estimated version 𝐔^u,.\hat{\mathbf{U}}_{u,.} and 𝐕^v,.\hat{\mathbf{V}}_{v,.}, which are the uu/vvth row of the estimations 𝐔^\hat{\mathbf{U}} and 𝐕^\hat{\mathbf{V}}. Thus the element-wise variance can be approximated as:

var(𝐖^uv(ijk)𝐖uv(ijk))σ02(𝐔^u,.22+𝐕^v,.22)=σ02v^uv,\displaystyle{\operatorname{var}}(\hat{\mathbf{W}}_{uv}^{\sharp{(ijk)}}-{\mathbf{W}}_{uv}^{\sharp\star{(ijk)}}){\approx}\sigma_{0}^{2}(\|\hat{\mathbf{U}}_{u,.}\|_{2}^{2}+\|\hat{\mathbf{V}}_{v,.}\|_{2}^{2}){=}\sigma_{0}^{2}\hat{v}_{uv}, (15)

where v^uv\hat{v}_{uv} can be regarded as the estimated version of vuv{v}_{uv}^{\star}. Due to the relationship of 𝐖uv(ijk)\mathbf{W}^{\sharp\star(ijk)}_{uv} and 𝐖^uv(ijk)\hat{\mathbf{W}}_{uv}^{\sharp{(ijk)}}, the corresponding variance of 𝐖^uv(ijk)\hat{\mathbf{W}}^{(ijk)}_{uv} is Eq. (16) and provides a closed-form variance describing the confidence of 𝐖^uv(ijk)\hat{\mathbf{W}}^{(ijk)}_{uv}. The variance of 𝐖^uv(ijk)\hat{\mathbf{W}}^{(ijk)}_{uv} is:

σξ(ijk)2=var(𝐖^uv(ijk)𝐖uv(ijk))g1(σ02v^uv),{\sigma_{\xi}^{(ijk)}}^{2}={\operatorname{var}}(\hat{\mathbf{W}}^{(ijk)}_{uv}-\mathbf{W}_{uv}^{\star{(ijk)}})\approx g^{-1}(\sigma_{0}^{2}\hat{v}_{uv}), (16)

where g1()g^{-1}(\cdot) is the inverse permutation function of g()g(\cdot). ξ{1,,M}×{1,,N}×{1,,P}\xi\subseteq\{1,\cdots,M\}\times\{1,\cdots,N\}\times\{1,\cdots,P\} is the corresponding 3D index of the 2D uvuv at patch (ijk)(ijk).

III-C Uncertainty estimation of the sliding-window

Another unique characteristic of the uncertainty quantification in the LRMA technique is the “uncertainty mixing” caused by the sliding-window strategy. The sliding window approach is seldom used in the 2D image domain, and recent work [31] did not address this issue. Thus we propose a method to solve the “uncertainty mixing” issue in HSI denoising due to the sliding-window strategy. Unlike averaging all patches to generate the final restored HSI (Eq. (2)), the uncertainty of the product does not follow the typical error propagation, such as estimating the uncertainty of multiple observations [35]. In the scenario of uncertainty estimation for i.i.d. observations, if the variance of each observation is σo2\sigma_{o}^{2} and we conduct n\mathrm{n} times observation, the variance of the averaged observation is 1nσo2\frac{1}{n}\sigma_{o}^{2}. This shows that the accuracy of the average observation is nn times more accurate than every single observation. However, in the LRMA process, we only make one observation. If we can process the entire HSI using one-time LRMA and the accuracy is σo2\sigma_{o}^{2}, the sliding-window strategy in LRMA should not exceed σo2\sigma_{o}^{2} because no new information has been introduced. In other words, 1nσo2\frac{1}{n}\sigma_{o}^{2} is not the correct uncertainty. The difference between multiple i.i.d. observations and the single processing scenarios is that the former has i.i.d. observations for all events, whereas the observations in the latter are correlated. In other words, the covariance between the overlapped patches in the LRMA is not null. For two related events XX and YY, the ratio η\eta is often used to describe the relationship:

η=cov(X,Y)var(X)var(Y),\eta=\frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\cdot\operatorname{var}(Y)}}, (17)

where cov()\operatorname{cov}(\cdot) defines the covariance of the two events. If η\eta is 0, the events XX and YY are i.i.d. If η\eta is 1, the events XX and YY are linearly correlated, having the highest correlation.

We provide a toy example as the weighted average of events XX and YY as F=aX+bYF=\mathrm{a}X+\mathrm{b}Y. Assuming the variance of XX and YY are σX2\sigma_{X}^{2} and σY2\sigma_{Y}^{2}, the propagated uncertainty of FF is σF2\sigma_{F}^{2}:

σF2=a2σX2+b2σY2+2ηXYabσXσY,\sigma_{F}^{2}=\mathrm{a}^{2}\sigma_{X}^{2}+\mathrm{b}^{2}\sigma_{Y}^{2}+2\eta_{XY}\mathrm{a}\mathrm{b}\sigma_{X}\sigma_{Y}, (18)

where ηXY\eta_{XY} is the variance ratio defining the relation between events XX and YY.

For convenience, we rewrite Eq. (5) to the element-wise form:

𝐖^ξ=1ϕ(ξ)(i,j,k)Φ(ξ)fp(𝐖^(ijk))uv\displaystyle\hat{\mathbf{W}}_{\xi}=\frac{1}{\phi(\xi)}\sum_{(i,j,k)\in\Phi(\xi)}\mathit{f}_{p}({\hat{\mathbf{W}}^{(ijk)})_{uv}} (19)
such that (u,v)=G(ξ),\displaystyle\text{such that\ }(u,v)={G}(\xi),

where Φ(ξ)\Phi(\xi) defines the index of the patches covering the position (u,v)(u,v) and ϕ(ξ)\phi(\xi) is the number of those patches. Similar to the permutation function g()g(\cdot), G()G(\cdot) converts the 3D index of 𝐖^\hat{\mathbf{W}} to the 2D premutated version 𝐖^\hat{\mathbf{W}}^{\sharp}. By denoting the element-wise variance of each fp(𝐖^(ijk))uv\mathit{f}_{p}({\hat{\mathbf{W}}^{(ijk)})_{uv}} is σuv(ijk)\sigma^{(ijk)}_{uv}, based on the uncertainty propagation theory, we obtain the variance that describes 𝐖^uv\hat{\mathbf{W}}_{uv} as:

σ^ξ2\displaystyle\hat{\sigma}_{\xi}^{2} =1ϕ(ξ)2(i,j,k)ϕ(ξ)σξ(ijk)2+\displaystyle=\frac{1}{\phi(\xi)^{2}}\sum_{(i,j,k)\in\phi(\xi)}{\sigma_{\xi}^{(ijk)}}^{2}+ (20)
1ϕ(ξ)2(i,j,k)ϕ(ξ)(i,j,k)ϕ(ξ)2η(ijk)(ijk)σξ(ijk)σξ(ijk),\displaystyle\frac{1}{\phi(\xi)^{2}}\sum_{(i,j,k)\in\phi(\xi)}{\sum_{(i^{\prime},j^{\prime},k^{\prime})\in\phi(\xi)}{2\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})}{\sigma^{(ijk)}_{\xi}}{\sigma^{(i‘j’k‘)}_{\xi}}}},

where each (i,j,k)(i,j,k) is different from (i,j,k)(i^{\prime},j^{\prime},k^{\prime}). η(ijk)(ijk)\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})} defines the relation between the element ξ\xi in patch (ijk)(ijk) and patch (ijk)(i^{\prime}j^{\prime}k^{\prime}). Readers may be aware that this is an extended multiple event version of the toy example in Eq. (18).

We provide a practical solution to estimate the optimal ratio η(ijk)(ijk)\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})}. Similar to theorem 1, we assume that the adopted low-rank solver (GoDec algorithm in this paper, but it could be any solver) achieves the global minimum, the information on the denoised element ξ\xi in the patch 𝐖^(ijk)\hat{\mathbf{W}}^{(ijk)} is determined by the sum of the information embedded in all the elements in the patch 𝐖^(ijk)\hat{\mathbf{W}}^{(ijk)}. The mutual information encoded in two overlapping patches is linearly related to the number of elements shared by the overlapping patches [36]. Thus the mutual information of the element from two patches is linearly related to the number of shared elements in the two patches. Therefore, the optimal ratio between the two patches η(ijk)(ijk)\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})} is close to:

η(ijk)(ijk)[N((ijk),(ijk))KL],\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})}{\to}\left[\frac{\mathit{N}((ijk),(i^{\prime}j^{\prime}k^{\prime}))}{KL}\right]^{-}, (21)

where N((ijk),(ijk))\mathit{N}((ijk),(i^{\prime}j^{\prime}k^{\prime})) defines the number of shared elements in patch (ijk)(ijk) and patch (ijk)(i^{\prime}j^{\prime}k^{\prime}). []\left[\cdot\right]^{-} means slightly below \cdot. The ratio on the left is close to but smaller than that of the right term. The result is attributed to the assumption of a global minimum. If the global minimum is guaranteed, we have the information in each patch fully exploited; thus, the ratio is linearly related to the overlap. Thus, the mutual information and number of elements are not strictly linearly related. Extreme case is the full overlap (η(ijk)(ijk)=1\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})}=1) and no overlap (η(ijk)(ijk)=0\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})}=0). Otherwise, if ΨX\Psi_{X} and ΨY\Psi_{Y} are not ignored, or the low-rank solver does not converge adequately, the variance based on the local minimum is not accurate and behaves more randomly; thus, the ratio is less than the expected right term. In the extreme case when the low-rank solver fails and converges randomly either due to bad performance of the algorithm or heavy random noise, the η(ijk)(ijk)\eta^{(ijk)(i^{\prime}j^{\prime}k^{\prime})} should be 0. All other things being equal, a smaller ratio will lead to a smaller variance and slight over-confidence. However, a quantitative assessment of the approximation of Eq. (21) is difficult because the performance of the low-rank solver remains difficult to quantify.

Fig. 3 presents an example of Eq. (21). Each neighboring block pair has 50%50\% overlap. Based on the statistics of the number of overlapping elements, the η\eta of the pairs (1,2)(1,2), (2,3)(2,3), (3,4)(3,4) and (4,1)(4,1) is 0.5, whereas the η\eta of pairs (1,3)(1,3) and (2,4)(2,4) is 0.25.

Refer to caption
Figure 3: An example to explain the definition of mutual information defined in Eq. (21). Each of the four 2D overlapping patches 1, 2, 3, and 4 has four pixels.

III-D The impact of random noise

The result of Eq. (16) is based on the assumption that the noises ΨX\Psi_{X} and ΨY\Psi_{Y} are significantly smaller than the variables 𝐙𝐗\mathbf{Z}_{\mathbf{X}} and 𝐙𝐘\mathbf{Z}_{\mathbf{Y}} with a normal distribution. If ΨX\Psi_{X} and ΨY\Psi_{Y} are not ignored, the conclusion does not hold. Therefore, it remains challenging to quantify the impact of random noise versus Gaussian noise. However, as [33] pointed out, the RPCA-like formulation (Eq. (LABEL:Eq_image_recovery).) is robust against random noise since the low-rank solver will adjust the use of the norm-1 or norm-2 to regularize the state matrix. Correspondingly, elements not corrupted by random noise are unaffected, whereas the impulse noise can be identified. Therefore, regarding the impact of random noise, the robustness of our approach is closely related to the choice of the LRMA algorithm. It should be noted that most existing HSI denoising studies have focused on modeling Gaussian noise [21, 25]. A few LRMA-based methods have addressed non-Gaussian noise [24]. These approaches utilize a total-variance related term to filter out non-Gaussian noise in the HSI to maintain good boundaries after filtering. Similarly, the proposed uncertainty estimation method can also be integrated with the method in [24] since the total-variance term is only a constraint.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 4: (a) False-color image of the Indian Pines dataset using bands 10, 37, and 70. This is the manually classified ground truth of the Indian Pines dataset. (b) False-color image of the University of Pavia dataset composed of the bands 20, 42, and 72. (c) False-color image of the Washington D.C dataset composed of bands 60, 27, and 17.
Refer to caption
(a) Indian dataset. σ0=0.050\sigma_{0}=0.050
Refer to caption
(b) Indian dataset. σ0=0.075\sigma_{0}=0.075
Refer to caption
(c) Indian dataset. σ0=0.100\sigma_{0}=0.100
Refer to caption
(d) Indian dataset. σ0=0.125\sigma_{0}=0.125
Refer to caption
(e) University of Pavia dataset. σ0=0.050\sigma_{0}=0.050
Refer to caption
(f) University of Pavia dataset. σ0=0.075\sigma_{0}=0.075
Refer to caption
(g) University of Pavia dataset. σ0=0.100\sigma_{0}=0.100
Refer to caption
(h) University of Pavia dataset. σ0=0.125\sigma_{0}=0.125
Refer to caption
(i) Washington,D.C. σ0=0.050\sigma_{0}=0.050
Refer to caption
(j) Washington,D.C. σ0=0.075\sigma_{0}=0.075
Refer to caption
(k) Washington,D.C. σ0=0.100\sigma_{0}=0.100
Refer to caption
(l) Washington,D.C. σ0=0.125\sigma_{0}=0.125
Figure 5: The Q-Q plots of a random element position in 100 independent trials to assess the noise in the three datasets.

IV Results and discussion

Our work aims at providing an uncertainty quantification approach and does not consider convergence analysis, hyperparameter definition, denoised image visualization, or comparison of HSI denoising methods. For clarifications on the methods described in Section II, we refer the reader to [20, 24, 26, 21, 27, 28]. This work is NOT a new LRMA formulation. We only demonstrate that the proposed closed-form uncertainty estimation approach (Section III) quantifies the denoised HSI accurately, efficiently, and robustly. Since previous works have not provided any uncertainty quantification methods, we only show that our proposed approach can statistically describe the uncertainty based on the Monte Carlo tests.

To the best of our knowledge, a Monte Carlo test is the only appropriate method to validate the efficiency of the proposed method. The complexity of the LRMA formulation hinders the closed-form retrieval of the posterior distribution of the denoised image. This problem motivated us to provide the closed-form estimation of the variance. Therefore, we could not design an experiment providing the ground truth information on both the noisy image and the denoised image. Moreover, regarding the Monte Carlo tests, we should emphasize that the noise of the original image follows identical Gaussian distribution while noise of the denoised image is non-identical. The noise of any pixel in original observations follows 𝐆(ijk)i.i.d.𝒩(0,σ02)\mathbf{G}^{(ijk)}\stackrel{{\scriptstyle\mathrm{i.i.d.}}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}). However, as Eq. (16) indicates, the noise of the denoised image depends on the ratio v^uv\hat{v}_{uv}. Thus, the pixels of the denoised image do not maintain the same noise level. The simulated identical noise can only be enforced on the original image. The Monte Carlo test is the optimal method to obtain the posterior distribution of the denoised image. Theoretically, the Monte Carlo test can generate the ground truth variance when the number of trials is large enough.

We validate the proposed method on the Indian Pines dataset (simulation dataset) 111https://engineering.purdue.edu/ biehl/MultiSpec/hyperspectral.html, the University of Pavia dataset (real dataset) 222http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral
_Remote_Sensing_Scenes
, and the Washington, D.C. mall dataset 333https://engineering.purdue.edu/ biehl/MultiSpec/hyperspectral.html (real dataset). The Indian Pines dataset is a subset of the data acquired by the AVIRIS sensor. It consists of 145×145145\times 145 pixels and 224 spectral reflectance bands in the wavelength range of 0.42.51060.4–2.510^{-6} meters and covers the Indian Pines test site in northwestern Indiana, USA. The Indian Pines scene consists of two-thirds agricultural land and one-third forest or other natural perennial vegetation. We use the ground truth of the Indian Pines dataset and simulate arbitrary Gaussian noise. The University of Pavia dataset was acquired by the ROSIS sensor and covers Pavia in northern Italy. It consists of 610×610610\times 610 pixels with 103 spectral bands; however, some of the samples in the images contained no information and were discarded before the analysis. The geometric resolution is 1.3 meters. We select a subset of 610×610×103610\times 610\times 103 in the experiments. In the Washington, D.C. mall dataset, the image contains 191 spectral bands with a size of 1208×3071208\times 307. We clip the image to a size of 256×256×191256\times 256\times 191 in our experiment. Fig. 4 shows the false-color images of the three datasets.

We strictly follow the parameter and noise setting of the selected LRMA method to provide a convincing validation [20]. The size of the sliding window is 20×20×P20\times 20\times P, where PP is the number of bands in the dataset. The step size is 4, and the rank r\mathrm{r} is 7. In the image restoration process, all images are normalized to the range [0 1][0\ 1]. The zero-mean Gaussian noise is uniformly applied to all pixels in the datasets with different levels of noise. Since the two real datasets do not have ground truth, we use the output of [20] as the ground truth. The impact of impulse noise is further tested on the datasets. It should be noted that this research is based on the known raw HSI noise, which can be retrieved using previously described methods [29, 30].

TABLE I: The coverage rates of var(𝐖^ξ{l}𝐖^¯ξ){\operatorname{var}}(\hat{\mathbf{W}}_{\xi}^{\{l\}}-\overline{\hat{\mathbf{W}}}_{\xi}) for different σ0\sigma_{0} for 100 Monte Carlo trials. “Std” denotes the standard deviation. The proposed closed-form uncertainty estimation approach describes the probability distribution well.
Indian University of Pavia Washington, D.C.
σ0\sigma_{0} Mean Std Mean Std Mean Std
0.025 0.9514 0.0483 0.9573 0.0515 0.9411 0.0583
0.050 0.9579 0.0371 0.9476 0.0292 0.9368 0.0348
0.075 0.9576 0.0383 0.9556 0.0263 0.9501 0.0311
0.100 0.9643 0.0319 0.9652 0.0233 0.9626 0.0266
0.125 0.9704 0.0273 0.9735 0.0205 0.9712 0.0297
Refer to caption
Figure 6: The relationship between the average coverage rate and the rank at the noise level of σ0=0.125\sigma_{0}=0.125. As compliment to Table I with fixed scale of noise, this figure provides a comparison of different level of noises.

IV-A The validation of the uncertainty description

The accuracy of the proposed closed-form uncertainty quantification method is extensively validated with numerical Monte Carlo tests. We apply different levels of Gaussian noises to the HSI. The numerical experiments provide sufficient samples to test if the statistical distributions of the samples are described accurately with the output confidence interval. Specifically, T=100\mathrm{T}=100 Monte Carlo tests are performed on the input of the same noise σ02\sigma_{0}^{2}. We should emphasize that the 100 trials are chosen due to limited computational resources. Due to the large volume of the HSIs, 100 Monte Carlo trials are sufficient to obtain a statistic assessment of the efficiency of the proposed method. The code is provided for readers to conduct more tests. For each element of the estimated HSI in trial ll at position ξ\xi as 𝐖^ξ{l}\hat{\mathbf{W}}_{\xi}^{\{l\}}, we define its deviation from the corresponding T\mathrm{T} times mean value as 𝐖^¯ξ\overline{\hat{\mathbf{W}}}_{\xi}. The deviation from the mean is defined as:

eξ{l}=𝐖^ξ{l}𝐖^¯ξ1,l[1,,T].\mathrm{e}^{\{l\}}_{\xi}=||\hat{\mathbf{W}}_{\xi}^{\{l\}}-\overline{\hat{\mathbf{W}}}_{\xi}||_{1},\ l\in[1,\cdots,\mathrm{T}]. (22)

If the closed-form solution describes the statistical distribution of the estimation well, 95%95\% of 𝐖^ξ{l}\hat{\mathbf{W}}_{\xi}^{\{l\}} should fall within the range of [𝐖^¯ξ1.96σ^ξ,𝐖^¯ξ+1.96σ^ξ]\left[\overline{\hat{\mathbf{W}}}_{\xi}-1.96\hat{\sigma}_{\xi},\ \overline{\hat{\mathbf{W}}}_{\xi}+1.96\hat{\sigma}_{\xi}\right]. Following Eq. (22), we define the ratio of eξ{l}\mathrm{e}^{\{l\}}_{\xi} falling within the 1.96σ^ξ1.96\hat{\sigma}_{\xi} bound as the element-wise coverage rate. The average coverage rate of all elements is the average coverage rate. Table I presents the average and standard deviation of the coverage rate. The result indicates that the uncertainty approach robustly and accurately describes the element-wise denoised HSI when the level of noise is low (σ0\sigma_{0} less than 0.1000.100) 444https://JingweiSong@bitbucket.org/JingweiSong/uncertainty-estimation-for-hyper-spectral-image-denoising.git.. In the case σ00.100\sigma_{0}\geq 0.100, we notice an under-confident of the prediction. We hypothesize that the accuracy is related to the level of noise and the choice of the optimal rank. When the noise level increases, the fixed rank LRMA does not accurately filter the noise while retaining useful information. A lower rank is needed to ensure that the residual follows the assumption of the normal distribution of the noise. As depicted in Table I, we strictly follow [20] by setting r=7\mathrm{r}=7. To determine the reason, we conduct an experiment by choosing different r\mathrm{r}. The results are shown in Fig. (6), indicating that the performance (uncertainty quantification) of our approach improves by decreasing the arbitrary rank. This implies that the Monte Carlo experiment provides a method to validate the correct choice of the rank. Further studies are needed to determine the rank at high noise level.

IV-B The validation of the normal distribution of the restored HSI

In addition to the average coverage rate, we also adopt the Monte Carlo experiments to prove that the restored HSI 𝐖^{l}\hat{\mathbf{W}}^{\{l\}} has an innate Gaussian distribution. Quantile-Quantile (Q-Q) plots are created, and Shapiro-Wilk tests are conducted to validate the proposed approach. Fig. 5 shows the Q-Q plots of all the datasets. Note that we randomly plot one element in each dataset as an example, but all other elements are similar since the observation is isotropic. Fig. 5 indicates that the Monte Carlo estimation of all the trials 𝐖^{l}\hat{\mathbf{W}}^{\{l\}} is close to a line; thus, the sample follows the Gaussian distribution.

In the Shapiro-Wilk tests, it is not possible to present all the elements. Thus, we randomly choose several elements from the three datasets with different levels of Gaussian noise. The results are presented in Table II. The pp-values indicate that all samples follow the normal distribution (>0.05>0.05). Since all pixels are acquired under the same conditions, and their statistics should be consistent (either obey or reject the normal distribution assumption), we can conclude that the restored HSI still follows the Gaussian distribution.

TABLE II: The probability values (pp-value) of the Shapiro-Wilk tests of the Indian Pines, University of Pavia, and Washington, D.C. mall datasets. We cite the corresponding element positions shown in Fig. (5) as an example through all the tests. The pp-value >0.05>0.05 indicates a significant possibility that the element is normally distributed.
Dataset Indian University of Pavia Washington, D.C.
0.050 0.4239 0.1531 0.2046
0.075 0.1103 0.1924 0.0585
0.100 0.8498 0.7537 0.6977
0.125 0.4346 0.1531 0.5331

IV-C Time consumption comparison

In addition to the accuracy tests, we determine the time consumption. We find that the closed-form uncertainty estimation method has similar time consumption as the LRMA and far less time consumption than the Monte Carlo numerical tests. We use the LRMA code used in [20] and add the uncertainty estimation module. The proposed algorithm, the LRMA, and the Monte Carlo tests are all implemented in Matlab 2019a. In the Monte Carlo tests, we utilize the Parallel Computing Toolbox of Matlab to speed up the steps. The hardware set is 3.1GHz3.1GHz Intel Xeon E5-2687wv3 and 64 GB DDR4-Ram. In the Monte Carlo experiments, all 20 CPU threads are utilized. Note that in the Monte Carlo tests, each element is covered by several sliding windows. Since the memory is not sufficient to store all sliding windows, we calculate the result of the patch, store it on the hard disk, and reload it to calculate each element in the final step.

TABLE III: The time consumption of the Monte Carlo uncertainty estimation, the LRMA method [20], and the proposed method. Note that 100 trials are conducted in the Monte Carlo tests. The time is recorded in seconds.
Dataset Monte Carlo LRMA LRMA+Uncertainty
Indian 7862 653 864
University of Pavia 8393 815 930
Washington, D.C. 12437 1023 1590

Table III shows the time consumption of the Monte Carlo uncertainty estimation, the LRMA method [20], and the proposed method integrated into [20]. The 100 Monte Carlo trials are significantly speeded up because of the running in parallel using 20 CPU threads. The results indicate that the proposed closed-form estimation does not add a significant amount of time to the LRMA method, whereas the Monte Carlo test requires substantially more time. As indicated by Eq. (14), the proposed approach requires no more than one band-wise SVD decomposition of the given HSI in addition to the original LRMA solver.

IV-D The impact of random noise

TABLE IV: The coverage rate and standard deviation of different ratios of impulse noise. We add m%,m[5,10,20,30]\mathrm{m}\%,\mathrm{m}\in[5,10,20,30] ratios of impulse noise to the observations, and test the coverage rates of different scenarios in 100 Monte Carlo trials.
5% 10% 20% 30%
Dataset σ0\sigma_{0} Mean Std Mean Std Mean Std Mean Std
Indian 0.050 0.9006 0.0619 0.8429 0.0770 0.7309 0.0924 0.6278 0.0952
0.075 0.9348 0.0476 0.9063 0.0578 0.8395 0.0745 0.7621 0.0860
0.100 0.9516 0.0382 0.9340 0.0456 0.8888 0.0606 0.8293 0.0747
0.125 0.9624 0.0318 0.9495 0.0379 0.9148 0.0516 0.8657 0.0663
University of Pavia 0.050 0.9008 0.0438 0.8472 0.0558 0.7466 0.0711 0.6578 0.0783
0.075 0.9434 0.0343 0.9189 0.0432 0.8640 0.0591 0.8020 0.0715
0.100 0.9649 0.0275 0.9519 0.0340 0.9195 0.0477 0.8778 0.0607
0.125 0.9778 0.0220 0.9700 0.0270 0.9496 0.0382 0.9201 0.0503
Washington, D.C 0.050 0.9010 0.0549 0.8536 0.0690 0.7649 0.0860 0.6840 0.0933
0.075 0.9456 0.0417 0.9250 0.0514 0.8781 0.0689 0.8248 0.0835
0.100 0.9674 0.0319 0.9566 0.0388 0.9301 0.0534 0.8954 0.0688
0.125 0.9798 0.0245 0.9735 0.0297 0.9566 0.0417 0.9330 0.0560

In addition to the coverage rate tests, we assess if the proposed approach is robust to the random noise from the sensor. As Eq. (2) shows, our work ignores the random noise 𝐍\mathbf{N} and only assumes that the residuals follow the Gaussian noise 𝐆\mathbf{G}. As described in Section III-D, we expect that the proposed method is robust to random noise such as impulse noise, dead lines, and stripes [37, 38]. We conduct the tests by deliberately corrupting the original HSI with random noise and test the performance of the proposed uncertainty quantification method. We simulate different proportions of impulse noise on all datasets and determine the coverage rate of the Monte Carlo experiments.

Table IV shows the results of the Monte Carlo experiments for different coverage rates. The results indicate a relationship between the robustness of the proposed closed-form method and the coverage rate. When the noise σ0\sigma_{0} is greater than 0.10.1, the uncertainty quantification approach is less sensitive to pulse noises. When the noise σ0\sigma_{0} is less than 0.0750.075, the proposed method is susceptible to random noise. Theoretically, the proposed approach is only effective for Gaussian noise, which is the low-rank prior to Eq. (2). Therefore, less Gaussian noise means less reliable of the assumption that 𝐍\mathbf{N} is less than 𝐆\mathbf{G} in Eq. (1). Thus, the results from the closed-form approach are less reliable. When the Gaussian noise level of the observation is low, the proposed approach tends to be over-confident of the result, and the result is lower than the coverage rate with a higher σ0\sigma_{0}. A higher σ0\sigma_{0} means that the distribution is closer to the impulse noise. Furthermore, since the solvers (either GoDec or ALM solver) are all based on SVD decomposition, our approach has the same computational complexity as the original LRMA methods.

V Conclusion

This is the first study that addresses the uncertainty quantification of LRMA-based HSI denoising techniques. We provide a time-efficient closed-form uncertainty propagation model to quantify the element-wise uncertainty of the denoised HSI without requiring any ground truth. We also propose a weighted average uncertainty formulation to tackle the “uncertainty mixing” problem caused by the sliding-window strategy in HSI denoising. Moreover, the proposed uncertainty estimation approach is independent of the choice of the LRMA solver. It is based on the global minimum assumption of LRMA; thus, a more accurate algorithm leads to a better performance of the uncertainty estimation. Extensive numerical experiments indicate that the proposed uncertainty quantification method is accurate and robust and describes the statistical behavior of the estimates obtained from the LRMA approach. The time consumption of the closed-form formulation is low; therefore, the method can be deployed in any state-of-the-art LRMA algorithm. The proposed uncertainty approach can also be used in applications that require multi-source or multi-scale data fusion, data assimilation, and remote sensing uncertainty estimation.

Future works may also focus on tackling the observable random noise, such as impulse noises, deadlines, and stripes. The percentage of the observable random noise can be utilized to adjust the proposed closed-form uncertainty method.

Appendix A Appendix

Proof of Theorem 1. If the selected LRMA solver provides the optimal solution to the objective function Eq. (LABEL:Eq_image_recovery), the optimizer 𝐗^\hat{\mathbf{X}}, 𝐘^\hat{\mathbf{Y}} should satisfy the local minima of Eq. (LABEL:Eq_image_recovery). F(𝐗^,𝐘^)\mathit{F}(\hat{\mathbf{X}},\hat{\mathbf{Y}}) are denoted as:

F(𝐗^,𝐘^)\displaystyle\mathit{F}(\hat{\mathbf{X}},\hat{\mathbf{Y}}) =12𝐖(ijk)g1(𝐗^𝐘^)F2\displaystyle=\frac{1}{2}\|\mathbf{W}^{(ijk)}-g^{-1}(\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})\|_{\mathrm{F}}^{2} (23)
=(i)12𝐖(ijk)𝐗^𝐘^F2,\displaystyle\stackrel{{\scriptstyle(\mathrm{i})}}{{=}}\frac{1}{2}\|\mathbf{W}^{\sharp{(ijk)}}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top}\|_{\mathrm{F}}^{2},

where (i)(\mathrm{i}) is because g1()g^{-1}(\cdot) is the permutation function and g(𝐖(ijk))=𝐖(ijk)g(\mathbf{W}^{(ijk)})=\mathbf{W}^{\sharp(ijk)}. Assuming the solver yields F(𝐗^,𝐘^)F(\hat{\mathbf{X}},\hat{\mathbf{Y}}) close to the global minimum F(𝐗,𝐘)F(\mathbf{X}^{\star},\mathbf{Y}^{\star}). The first-order expansion of F(𝐗^,𝐘^)F(\hat{\mathbf{X}},\hat{\mathbf{Y}}) is close to the zero matrix defined as 𝒪\mathcal{O}. By solving Eq. (1) and ignoring random noise, we define the patch function:

𝐖(ijk)=𝐖(ijk)+𝐆(ijk)\displaystyle\mathbf{W}^{(ijk)}=\mathbf{W}^{\star{(ijk)}}+\mathbf{G}^{(ijk)} (24)
𝐖(ijk)=𝐖(ijk)+𝐆(ijk),\displaystyle\mathbf{W}^{\sharp(ijk)}=\mathbf{W}^{\sharp\star{(ijk)}}+\mathbf{G}^{{\sharp(ijk)}},

where 𝐆(ijk)=g(𝐆(ijk))\mathbf{G}^{{\sharp(ijk)}}=g(\mathbf{G}^{(ijk)}). The optimizer F(𝐗^,𝐘^)F(\hat{\mathbf{X}},\hat{\mathbf{Y}}) achieves local minima which is close to the global minima; thus, we obtain the partial derivatives equal to the zero matrix:

F(𝐗^,𝐘^)𝐗^\displaystyle\frac{\partial{F(\hat{\mathbf{X}},\hat{\mathbf{Y}})}}{\partial{\hat{\mathbf{X}}}} =(i)(𝐖(ijk)𝐗^𝐘^)𝐘^\displaystyle\stackrel{{\scriptstyle(\mathrm{i})}}{{=}}(\mathbf{W}^{\sharp(ijk)}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})\hat{\mathbf{Y}} (25)
=(𝐆(ijk)+𝐖(ijk)𝐗^𝐘^)𝐘^\displaystyle=(\mathbf{G}^{\sharp(ijk)}+\mathbf{W}^{\star(ijk)}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})\hat{\mathbf{Y}}
=(𝐆(ijk)+𝐗𝐘𝐗^𝐘^)𝐘^\displaystyle=(\mathbf{G}^{\sharp(ijk)}+\mathbf{X}^{\star}{\mathbf{Y}^{\star\top}}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})\hat{\mathbf{Y}}
(ii)(𝐆(ijk)+𝐗𝐑^𝐘^𝐗^𝐘^)𝐘^𝒪,\displaystyle\stackrel{{\scriptstyle(\mathrm{ii})}}{{\approx}}(\mathbf{G}^{\sharp(ijk)}+\mathbf{X}^{\star}\hat{\mathbf{R}}^{\top}\hat{\mathbf{Y}}^{\top}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})\hat{\mathbf{Y}}\approx\mathcal{O},
F(𝐗^,𝐘^)𝐘^\displaystyle\frac{\partial{F(\hat{\mathbf{X}},\hat{\mathbf{Y}})}}{\partial{\hat{\mathbf{Y}}}} =(i)𝐗^(𝐖(ijk)𝐗^𝐘^)\displaystyle\stackrel{{\scriptstyle(\mathrm{i})}}{{=}}\hat{\mathbf{X}}^{\top}(\mathbf{W}^{\sharp(ijk)}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top}) (26)
=𝐗^(𝐆(ijk)+𝐖(ijk)𝐗^𝐘^)\displaystyle=\hat{\mathbf{X}}^{\top}(\mathbf{G}^{\sharp(ijk)}+\mathbf{W}^{\star(ijk)}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})
=𝐗^(𝐆(ijk)+𝐗𝐘𝐗^𝐘^)\displaystyle=\hat{\mathbf{X}}^{\top}(\mathbf{G}^{\sharp(ijk)}+\mathbf{X}^{\star}{\mathbf{Y}^{\star\top}}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})
(ii)𝐗^(𝐆(ijk)+𝐗^𝐑^𝐘𝐗^𝐘^)𝒪,\displaystyle\stackrel{{\scriptstyle(\mathrm{ii})}}{{\approx}}\hat{\mathbf{X}}^{\top}(\mathbf{G}^{\sharp(ijk)}+\hat{\mathbf{X}}\hat{\mathbf{R}}{\mathbf{Y}}^{\star\top}-\hat{\mathbf{X}}\hat{\mathbf{Y}}^{\top})\approx\mathcal{O},

where (i) is the result of d𝐖(ijk)=d𝐗^𝐘^+𝐗^(d𝐘^)d\mathbf{W}^{\sharp(ijk)}=d{\hat{\mathbf{X}}}{\hat{\mathbf{Y}}^{\top}}+\hat{\mathbf{X}}(d\hat{\mathbf{Y}})^{\top}. (ii) is approximated by the fact that the rectified local minima (𝐗^𝐑^\hat{\mathbf{X}}\hat{\mathbf{R}},𝐘^𝐑^\hat{\mathbf{Y}}\hat{\mathbf{R}}) are close to the global minima (𝐗\mathbf{X}^{\star},𝐘\mathbf{Y}^{\star}). We manipulate Eq. (25) and Eq. (26) to:

𝐆(ijk)𝐘^(𝐗^𝐗𝐑^)𝐘^𝐘^\displaystyle\mathbf{G}^{\sharp(ijk)}\hat{\mathbf{Y}}\approx(\hat{\mathbf{X}}-\mathbf{X}^{\star}\hat{\mathbf{R}}^{\top})\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}} (27)
(𝐗^𝐗𝐑^)𝐆(ijk)𝐘^(𝐘^𝐘^)1\displaystyle(\hat{\mathbf{X}}-\mathbf{X}^{\star}\hat{\mathbf{R}}^{\top})\approx\mathbf{G}^{\sharp(ijk)}\hat{\mathbf{Y}}(\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}})^{-1}
(𝐗^𝐑^𝐗)𝐆(ijk)𝐘^(𝐘^𝐑^𝐑^𝐘^)1𝐑^\displaystyle(\hat{\mathbf{X}}\hat{\mathbf{R}}-\mathbf{X}^{\star})\approx\mathbf{G}^{\sharp(ijk)}\hat{\mathbf{Y}}(\hat{\mathbf{Y}}^{\top}\hat{\mathbf{R}}^{\top}\hat{\mathbf{R}}\hat{\mathbf{Y}})^{-1}\hat{\mathbf{R}}
(𝐗^𝐑^𝐗)(i)𝐆(ijk)𝐘^𝐑^(𝐘^𝐑^𝐑^𝐘^)1\displaystyle(\hat{\mathbf{X}}\hat{\mathbf{R}}-\mathbf{X}^{\star})\stackrel{{\scriptstyle(\mathrm{i})}}{{\approx}}\mathbf{G}^{\sharp(ijk)}\hat{\mathbf{Y}}\hat{\mathbf{R}}(\hat{\mathbf{Y}}^{\top}\hat{\mathbf{R}}^{\top}\hat{\mathbf{R}}\hat{\mathbf{Y}})^{-1}
(𝐑^𝐗^𝐗)𝐞v[𝐆(ijk)𝐘^(𝐘^𝐘^)1]𝐞v(ii)𝒩(0,σ02(Σ)1),\displaystyle(\hat{\mathbf{R}}^{\top}\hat{\mathbf{X}}^{\top}-\mathbf{X}^{\star\top})\mathbf{e}_{v}\approx[\mathbf{G}^{\sharp(ijk)}\widehat{\mathbf{Y}}(\widehat{\mathbf{Y}}^{\top}\widehat{\mathbf{Y}})^{-1}]^{\top}\mathbf{e}_{v}\stackrel{{\scriptstyle(\mathrm{ii})}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}(\Sigma^{\star})^{-1}),
𝐗^𝐆(ijk)=𝐗^𝐗^(𝐘^𝐑^𝐘)\displaystyle\hat{\mathbf{X}}^{\top}\mathbf{G}^{\sharp(ijk)}=\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}({\hat{\mathbf{Y}}}^{\top}-\hat{\mathbf{R}}{\mathbf{Y}^{\star\top}}) (28)
(𝐘^𝐑^𝐘)(𝐗^𝐗^)1𝐗^𝐆(ijk)\displaystyle({\hat{\mathbf{Y}}}^{\top}-\hat{\mathbf{R}}{\mathbf{Y}^{\star\top}})\approx(\hat{\mathbf{X}}^{\top}{\hat{\mathbf{X}}})^{-1}{\hat{\mathbf{X}}^{\top}}\mathbf{G}^{\sharp(ijk)}
(𝐑^𝐘^𝐘)𝐑^(𝐗^𝐑^𝐑^𝐗^)1𝐗^𝐆(ijk)\displaystyle(\hat{\mathbf{R}}^{\top}{\hat{\mathbf{Y}}}^{\top}-{\mathbf{Y}^{\star\top}})\approx\hat{\mathbf{R}}^{\top}(\hat{\mathbf{X}}^{\top}\hat{\mathbf{R}}^{\top}\hat{\mathbf{R}}{\hat{\mathbf{X}}})^{-1}{\hat{\mathbf{X}}^{\top}}\mathbf{G}^{\sharp(ijk)}
(𝐑^𝐘^𝐘)(i)(𝐗^𝐑^𝐑^𝐗^)1𝐑^𝐗^𝐆(ijk)\displaystyle(\hat{\mathbf{R}}^{\top}{\hat{\mathbf{Y}}}^{\top}-{\mathbf{Y}^{\star\top}})\stackrel{{\scriptstyle(\mathrm{i})}}{{\approx}}(\hat{\mathbf{X}}^{\top}\hat{\mathbf{R}}^{\top}\hat{\mathbf{R}}{\hat{\mathbf{X}}})^{-1}\hat{\mathbf{R}}^{\top}{\hat{\mathbf{X}}^{\top}}\mathbf{G}^{\sharp(ijk)}
(𝐑^𝐘^𝐘)𝐞v(𝐗^𝐗^)1𝐗^𝐆(ijk)𝐞v(iii)𝒩(0,σ02(Σ)1),\displaystyle(\hat{\mathbf{R}}^{\top}{\hat{\mathbf{Y}}^{\top}}-{\mathbf{Y}^{\star\top}})\mathbf{e}_{v}\approx(\widehat{\mathbf{X}}^{\top}{\widehat{\mathbf{X}}})^{-1}{\widehat{\mathbf{X}}^{\top}}\mathbf{G}^{\sharp(ijk)}\mathbf{e}_{v}\stackrel{{\scriptstyle(\mathrm{iii})}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}(\Sigma^{\star})^{-1}),

where the matrix exchange in step (i) is enabled because (𝐗^𝐑^𝐑^𝐗^)(\hat{\mathbf{X}}^{\top}\hat{\mathbf{R}}^{\top}\hat{\mathbf{R}}{\hat{\mathbf{X}}}) and (𝐘^𝐑^𝐑^𝐘^)(\hat{\mathbf{Y}}^{\top}\hat{\mathbf{R}}^{\top}\hat{\mathbf{R}}{\hat{\mathbf{Y}}}) are diagonal matrices. We denote the rectified local minima (𝐗^\widehat{\mathbf{X}},𝐘^\widehat{\mathbf{Y}}) equal to (𝐗^𝐑^\hat{\mathbf{X}}\hat{\mathbf{R}},𝐘^𝐑^\hat{\mathbf{Y}}\hat{\mathbf{R}}) for a concise presentation.

The proof of step (ii): Each element of the Gaussian noise matrix 𝐆(ijk)𝒩(0,σ2)\mathbf{G}^{\sharp(ijk)}{\sim}\mathcal{N}(0,\sigma^{2}). According to Eq. (7), Σ\Sigma^{\star} is a diagonal matrix and the diagonal element at position [i,i][i,i] (Σii)1(\Sigma^{\star}_{ii})^{-1} equals to 1/𝐗.,i221/||\mathbf{X}_{.,i}||_{2}^{2} or 1/𝐘.,i221/||\mathbf{Y}_{.,i}||_{2}^{2} where 𝐗^.,i\widehat{\mathbf{X}}_{.,i} or 𝐘^.,i\widehat{\mathbf{Y}}_{.,i} is the iith column of matrix 𝐗^\widehat{\mathbf{X}} or 𝐘^\widehat{\mathbf{Y}}.

The proof of (i):

[𝐆(ijk)𝐘^(𝐘^𝐘^)1]𝐞v\displaystyle[\mathbf{G}^{\sharp(ijk)}\widehat{\mathbf{Y}}(\widehat{\mathbf{Y}}^{\top}\widehat{\mathbf{Y}})^{-1}]^{\top}\mathbf{e}_{v} (29)
=[i=1L1𝐘^.,1||22𝐘^i1𝐆vi(ijk)i=1L1𝐘^,r||22𝐘^ir𝐆vi(ijk)].\displaystyle=[\sum_{i=1}^{L}\frac{1}{\|\widehat{\mathbf{Y}}_{.,1}||_{2}^{2}}\widehat{\mathbf{Y}}_{i1}\mathbf{G}^{\sharp(ijk)}_{vi}\ \cdots\ \sum_{i=1}^{L}\frac{1}{\|\widehat{\mathbf{Y}}_{,r}||_{2}^{2}}\widehat{\mathbf{Y}}_{ir}\mathbf{G}^{\sharp(ijk)}_{vi}].

Take the first element i=1L1𝐘^.,1||22𝐘^i1𝐆vi(ijk)\sum_{i=1}^{L}\frac{1}{\|\widehat{\mathbf{Y}}_{.,1}||_{2}^{2}}\widehat{\mathbf{Y}}_{i1}\mathbf{G}^{\sharp(ijk)}_{vi} as an example:

var(i=1L1𝐘^.,1||22𝐘^i1𝐆vi(ijk))\displaystyle\operatorname{var}(\sum_{i=1}^{L}\frac{1}{\|\widehat{\mathbf{Y}}_{.,1}||_{2}^{2}}\widehat{\mathbf{Y}}_{i1}\mathbf{G}^{\sharp(ijk)}_{vi}) =(i=1L1𝐘^.,1||22𝐘^i1)2σ02\displaystyle=(\sum_{i=1}^{L}\frac{1}{\|\widehat{\mathbf{Y}}_{.,1}||_{2}^{2}}\widehat{\mathbf{Y}}_{i1})^{2}\sigma_{0}^{2} (30)
=1(𝐘^.,1||22)2(i=1L𝐘^i1)2σ02\displaystyle=\frac{1}{(\|\widehat{\mathbf{Y}}_{.,1}||_{2}^{2})^{2}}(\sum_{i=1}^{L}\widehat{\mathbf{Y}}_{i1})^{2}\sigma_{0}^{2}
=1(𝐘^.,1||22)σ02\displaystyle=\frac{1}{(\|\widehat{\mathbf{Y}}_{.,1}||_{2}^{2})}\sigma_{0}^{2}
(Σ11)1σ02.\displaystyle\approx(\Sigma^{\star}_{11})^{-1}\sigma_{0}^{2}.

The same conclusion also applies to other elements.

Similarly, the proof of (ii):

(𝐗^𝐗^)1𝐗^𝐆(ijk)𝐞v\displaystyle(\widehat{\mathbf{X}}^{\top}{\widehat{\mathbf{X}}})^{-1}{\widehat{\mathbf{X}}}\mathbf{G}^{\sharp(ijk)}\mathbf{e}_{v} (31)
=[i=1K1𝐗^.,1||22𝐗^i1𝐆iv(ijk)i=1K1𝐗^,r||22𝐗^i1𝐆iv(ijk)]\displaystyle=[\sum_{i=1}^{K}\frac{1}{\|\widehat{\mathbf{X}}_{.,1}||_{2}^{2}}\widehat{\mathbf{X}}_{i1}\mathbf{G}^{\sharp(ijk)}_{iv}\ \cdots\ \sum_{i=1}^{K}\frac{1}{\|\widehat{\mathbf{X}}_{,r}||_{2}^{2}}\widehat{\mathbf{X}}_{i1}\mathbf{G}^{\sharp(ijk)}_{iv}]

Take the first element i=1K1𝐗^.,1||22𝐗^iv𝐆i1(ijk)\sum_{i=1}^{K}\frac{1}{\|\widehat{\mathbf{X}}_{.,1}||_{2}^{2}}\widehat{\mathbf{X}}_{iv}\mathbf{G}^{\sharp(ijk)}_{i1} as an example:

var(i=1K1𝐗^.,1||22𝐗^i1𝐆iv(ijk))\displaystyle\operatorname{var}(\sum_{i=1}^{K}\frac{1}{\|\widehat{\mathbf{X}}_{.,1}||_{2}^{2}}\widehat{\mathbf{X}}_{i1}\mathbf{G}^{\sharp(ijk)}_{iv}) =(i=1K1𝐗^.,1||22𝐗^i1)2σ02\displaystyle=(\sum_{i=1}^{K}\frac{1}{\|\widehat{\mathbf{X}}_{.,1}||_{2}^{2}}\widehat{\mathbf{X}}_{i1})^{2}\sigma_{0}^{2} (32)
=1(𝐗^.,1||22)2(i=1K𝐗^i1)2σ02\displaystyle=\frac{1}{(\|\widehat{\mathbf{X}}_{.,1}||_{2}^{2})^{2}}(\sum_{i=1}^{K}\widehat{\mathbf{X}}_{i1})^{2}\sigma_{0}^{2}
=1(𝐗^.,1||22)σ02\displaystyle=\frac{1}{(\|\widehat{\mathbf{X}}_{.,1}||_{2}^{2})}\sigma_{0}^{2}
(Σ11)1σ02.\displaystyle\approx(\Sigma^{\star}_{11})^{-1}\sigma_{0}^{2}.

The same conclusion also applies to other elements.

Following Eqs. (27) and (28), we strictly obtain the conclusion that the residual of Gaussian noise 𝐆(ijk)i.i.d.𝒩(0,σ02)\mathbf{G}^{(ijk)}\stackrel{{\scriptstyle\mathrm{i.i.d.}}}{{\sim}}\mathcal{N}(0,\sigma_{0}^{2}). Thus, ΨXK×r,ΨYL×r\Psi_{X}\in\mathbb{R}^{K\times r},\Psi_{Y}\in\mathbb{R}^{L\times r} are the negligible residuals due to numerical issues.

This concludes the proof of (i)(i) and (ii)(ii) and also the proof of theorem 1.

References

  • [1] H. Dehghan and H. Ghassemian, “Measurement of uncertainty by the entropy: application to the classification of mss data,” International journal of remote sensing, vol. 27, no. 18, pp. 4005–4014, 2006.
  • [2] G. M. Foody and P. M. Atkinson, Uncertainty in remote sensing and GIS.   John Wiley & Sons, 2003.
  • [3] W. Verhoef and H. Bach, “Coupled soil–leaf-canopy and atmosphere radiative transfer modeling to simulate hyperspectral multi-angular surface reflectance and toa radiance data,” Remote Sensing of Environment, vol. 109, no. 2, pp. 166–182, 2007.
  • [4] X. Chen, K. Yu, and H. Wu, “Determination of minimum detectable deformation of terrestrial laser scanning based on error entropy model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 1, pp. 105–116, 2017.
  • [5] X. Du and A. Zare, “Multiresolution multimodal sensor fusion for remote sensing data with label uncertainty,” IEEE Transactions on Geoscience and Remote Sensing, 2019.
  • [6] R. Senge, S. Bösner, K. Dembczyński, J. Haasenritter, O. Hirsch, N. Donner-Banzhoff, and E. Hüllermeier, “Reliable classification: Learning classifiers that distinguish aleatoric and epistemic uncertainty,” Information Sciences, vol. 255, pp. 16–29, 2014.
  • [7] A. Oxley, Uncertainties in GPS Positioning: A mathematical discourse.   Academic Press, 2017.
  • [8] L. Bastin, P. F. Fisher, and J. Wood, “Visualizing uncertainty in multi-spectral remotely sensed imagery,” Computers & Geosciences, vol. 28, no. 3, pp. 337–350, 2002.
  • [9] E. Sarhrouni, A. Hammouch, and D. Aboutajdine, “Application of symmetric uncertainty and mutual information to dimensionality reduction of and classification hyperspectral images,” International Journal of Engineering and Technology (IJET), vol. 4, no. 5, pp. 268–276, 2012.
  • [10] Y. Zhou, A. Rangarajan, and P. D. Gader, “A spatial compositional model for linear unmixing and endmember uncertainty estimation,” IEEE Transactions on Image Processing, vol. 25, no. 12, pp. 5987–6002, 2016.
  • [11] M.-E. Polo, Á. M. Felicísimo, A. G. Villanueva, and J.-Á. Martinez-del Pozo, “Estimating the uncertainty of terrestrial laser scanner measurements,” IEEE transactions on geoscience and remote sensing, vol. 50, no. 11, pp. 4804–4808, 2012.
  • [12] P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 37–78, 2017.
  • [13] Q. Zhang, Q. Yuan, J. Li, X. Liu, H. Shen, and L. Zhang, “Hybrid noise removal in hyperspectral imagery with a spatial–spectral gradient network,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 10, pp. 7317–7329, 2019.
  • [14] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2.   IEEE, 2005, pp. 60–65.
  • [15] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi, “Nonlocal transform-domain filter for volumetric data denoising and reconstruction,” IEEE transactions on image processing, vol. 22, no. 1, pp. 119–133, 2012.
  • [16] Y. Qian and M. Ye, “Hyperspectral imagery restoration using nonlocal spectral-spatial structured sparse representation with noise estimation,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 499–515, 2012.
  • [17] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Transactions on image processing, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [18] Q. Yuan, L. Zhang, and H. Shen, “Hyperspectral image denoising employing a spectral–spatial adaptive total variation model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 10, pp. 3660–3677, 2012.
  • [19] J. Li, Q. Yuan, H. Shen, and L. Zhang, “Noise removal from hyperspectral image with joint spectral–spatial distributed sparse representation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5425–5439, 2016.
  • [20] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan, “Hyperspectral image restoration using low-rank matrix recovery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 8, pp. 4729–4743, 2013.
  • [21] L. Zhuang and J. M. Bioucas-Dias, “Fast hyperspectral image denoising and inpainting based on low-rank and sparse representations,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 3, pp. 730–742, 2018.
  • [22] W. He, H. Zhang, L. Zhang, and H. Shen, “Hyperspectral image denoising via noise-adjusted iterative low-rank matrix approximation,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 3050–3061, 2015.
  • [23] Q. Yuan, Q. Zhang, J. Li, H. Shen, and L. Zhang, “Hyperspectral image denoising employing a spatial–spectral deep residual convolutional neural network,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 2, pp. 1205–1218, 2018.
  • [24] W. He, H. Zhang, L. Zhang, and H. Shen, “Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration,” IEEE transactions on geoscience and remote sensing, vol. 54, no. 1, pp. 178–188, 2015.
  • [25] J. M. Bioucas-Dias and J. M. Nascimento, “Hyperspectral subspace identification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 46, no. 8, pp. 2435–2445, 2008.
  • [26] Y. Chen, X. Cao, Q. Zhao, D. Meng, and Z. Xu, “Denoising hyperspectral image with non-iid noise structure,” IEEE transactions on cybernetics, vol. 48, no. 3, pp. 1054–1066, 2017.
  • [27] H. Fan, C. Li, Y. Guo, G. Kuang, and J. Ma, “Spatial–spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 10, pp. 6196–6213, 2018.
  • [28] Y. Chen, W. He, N. Yokoya, and T.-Z. Huang, “Hyperspectral image restoration using weighted group sparsity-regularized low-rank tensor decomposition,” IEEE transactions on cybernetics, 2019.
  • [29] B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, P. Marcoionni, I. Pippi, and M. Selva, “Noise modelling and estimation of hyperspectral data from airborne imaging spectrometers,” Annals of Geophysics, vol. 49, no. 1, 2006.
  • [30] N. Acito, M. Diani, and G. Corsini, “Signal-dependent noise modeling and model parameter estimation in hyperspectral images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 8, pp. 2957–2971, 2011.
  • [31] Y. Chen, J. Fan, C. Ma, and Y. Yan, “Inference and uncertainty quantification for noisy matrix completion,” Proceedings of the National Academy of Sciences, vol. 116, no. 46, pp. 22 931–22 937, 2019.
  • [32] Y. Chen, Y. Chi, J. Fan, C. Ma, and Y. Yan, “Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization,” SIAM Journal on Optimization, vol. 30, no. 4, pp. 3098–3121, 2020.
  • [33] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization,” in Advances in neural information processing systems, 2009, pp. 2080–2088.
  • [34] T. Zhou and D. Tao, “Godec: Randomized low-rank & sparse matrix decomposition in noisy case,” in Proceedings of the 28th International Conference on Machine Learning, ICML 2011, 2011.
  • [35] G. Casella and R. L. Berger, Statistical inference.   Duxbury Pacific Grove, CA, 2002, vol. 2.
  • [36] R. M. Gray, Entropy and information theory.   Springer Science & Business Media, 2011.
  • [37] G. Vane, R. O. Green, T. G. Chrien, H. T. Enmark, E. G. Hansen, and W. M. Porter, “The airborne visible/infrared imaging spectrometer (aviris),” Remote sensing of environment, vol. 44, no. 2-3, pp. 127–143, 1993.
  • [38] R. O. Green, M. L. Eastwood, C. M. Sarture, T. G. Chrien, M. Aronsson, B. J. Chippendale, J. A. Faust, B. E. Pavri, C. J. Chovit, M. Solis et al., “Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (aviris),” Remote sensing of environment, vol. 65, no. 3, pp. 227–248, 1998.
[Uncaptioned image] Jingwei Song obtained his doctoral degree in Center of Autonomous System, University of Technology Sydney in 2020. Before that, he received B.E. degree in Remote sensing science and technology from Wuhan University, Wuhan in 2012. He obtained a master’s degree in Cartography and Geographic Information System from Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences in 2015. He is a research associate at the Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor. His research interests include computer vision, simultaneous localization and mapping, and robotics.
[Uncaptioned image] Shaobo Xia received the bachelor’s degree in geodesy and geomatics from the School of Geodesy and Geomatics, Wuhan University, Wuhan, China, in 2013, the master’s degree in cartography and geographic information systems from the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing, China, in 2016, and the Ph.D. degree in geomatics from the University of Calgary, Calgary, AB, Canada, in 2020. He is an Assistant Professor with the College of Environmental and Resource Sciences, Zhejiang A&F University, Hangzhou, China. His research interests include mobile mapping and remote sensing.
[Uncaptioned image] Jun Wang received B.E. degree in Remote sensing science and technology from Wuhan University, Wuhan, China, in 2012 and is currently pursuing the Ph.D. degree in Remote sensing science from University of Chinese Academy of Sciences. He also got a joint Ph.D. program in Robotics from University of Technology, Sydney. His research interests include computer vision, simultaneous localization and mapping, robotics.
[Uncaptioned image] Mitesh Patel received my Ph.D. in Robotics from Center of Autonomous Systems (CAS) at the University of Technology Sydney in 2014 where I focused on modeling a wide spectrum of high-level user activities (also known as activities of daily living) using different probabilistic techniques. Most recently, he was a senior research scientist at FX Palo Alto Laboratory Inc., where he worked on developing indoor localization system using variety of sensors such as RF, RGB, RGB-D images, LiDAR.
[Uncaptioned image] Dong Chen (Member, IEEE) received the bachelor’s degree in computer science from the Qingdao University of Science and Technology, Qingdao, China, in 2005, the master’s degree in cartography and geographical information engineering from the Xi’an University of Science and Technology, Xi’an, China, in 2009, and the Ph.D. degree in geographical information sciences from Beijing Normal University, Beijing, China, in 2013. He is an Associate Professor with Nanjing Forestry University, Nanjing, China. He is also a Post-Doctoral Fellow with the Department of Geomatics Engineering, University of Calgary, Calgary, AB, Canada. His research interests include image-and LiDAR-based segmentation and reconstruction, full-waveform LiDAR data processing, and related remote sensing applications in the field of forest ecosystems.