E. Demircan-Tureyen
Computer Engineering, Istanbul Technical University, 34467 Istanbul, Turkey
Tel.: +90-212-4984719
E-mail: e.demircan@iku.edu.tr

M. E. Kamasak
Computer Engineering, Istanbul Technical University, 34467 Istanbul, Turkey
E-mail: kamasak@itu.edu.tr
Nonlocal Adaptive Direction-Guided Structure Tensor Total Variation for Image Recovery

This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under grant 115R285.
Abstract
A common strategy in variational image recovery is utilizing the nonlocal self-similarity (NSS) property when designing energy functionals. One such contribution is nonlocal structure tensor total variation (NLSTV), which lies at the core of this study. This paper is concerned with boosting the NLSTV regularization term through the use of directional priors. More specifically, NLSTV is leveraged so that, at each image point, it gains more sensitivity in the direction that is presumed to have the minimum local variation. The actual difficulty here is capturing this directional information from the corrupted image. In this regard, we propose a method that employs anisotropic Gaussian kernels to estimate the directional features that are later used by our proposed model. The experiments validate that our entire two-stage framework achieves better results than the NLSTV model and two other competing local models, in terms of both visual and quantitative evaluation.
Keywords: Directional total variation · Image recovery · Nonlocal regularization · Orientation field estimation · Structure tensor

1 Introduction
The general inverse imaging problem seeks the recovery of the underlying image $\mathbf{f} \in \mathbb{R}^{NC}$ (assuming that each $N$-pixel channel is vectorized and all $C$ of these channels are stacked together into a single vector) from an observation model of the form $\mathbf{y} = \mathbf{H}\mathbf{f} + \mathbf{n}$, where $\mathbf{n}$ is the noise and $\mathbf{H}$ is a known linear operator. The type of the recovery problem is determined by the almost always ill-conditioned $\mathbf{H}$, which mainly corresponds to the impulse response of the imaging device. If $\mathbf{H} = \mathbf{I}$, then the problem is denoising. Besides, $\mathbf{H}$ may be a matrix that functions as a mask (i.e., inpainting), or a circulant matrix built from all shifts of a blurring kernel (i.e., deblurring). It may also be a composite operator, i.e., $\mathbf{H} = \mathbf{S}\mathbf{F}$, where $\mathbf{F}$ blurs and $\mathbf{S}$ down-samples (i.e., single-image super-resolution), or $\mathbf{F}$ applies the Fourier transform and $\mathbf{S}$ works as a mask that retains only a subset of the Fourier coefficients (i.e., reconstruction from sparse Fourier measurements).
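The operator $\mathbf{H}$ can be made concrete with a toy NumPy sketch; the image sizes, the box PSF, and the masking probability below are hypothetical stand-ins, not values from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((16, 16))          # ground-truth image (one channel)

# Denoising: H is the identity, so the observation is f plus noise.
y_denoise = f + 0.05 * rng.standard_normal(f.shape)

# Inpainting: H acts as a mask that zeroes out the missing pixels.
mask = rng.random(f.shape) > 0.3  # True where a pixel was observed
y_inpaint = f * mask

# Deblurring: H is a (circular) convolution with a blurring kernel.
kernel = np.ones((3, 3)) / 9.0    # simple box blur as the PSF
pad = np.zeros_like(f)
pad[:3, :3] = kernel
H_hat = np.fft.fft2(np.roll(pad, (-1, -1), axis=(0, 1)))  # centered PSF spectrum
y_deblur = np.real(np.fft.ifft2(np.fft.fft2(f) * H_hat))

# Super-resolution: blur followed by down-sampling, H = S F.
y_sr = y_deblur[::2, ::2]
```

Each observation above corresponds to one choice of $\mathbf{H}$ in the model $\mathbf{y} = \mathbf{H}\mathbf{f} + \mathbf{n}$ (noise omitted except in the denoising case).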
The variational approach to such recovery tasks aims at minimizing an energy functional of the form:

$E(\mathbf{f}) = \tfrac{1}{2}\lVert \mathbf{y} - \mathbf{H}\mathbf{f} \rVert_2^2 + \tau \mathcal{R}(\mathbf{f})$ (1)
where the former term is known as the data fidelity and the latter one is the regularization term that encodes a 'prior'. The regularization parameter $\tau$ balances the contribution of each term to the cost. The regularization criterion has a critical role in model-driven recovery; thus its proper selection has been studied immensely. One old (but, through its extended versions, still functional) regularizer is total variation (TV) by Rudin et al (1992), which favors piecewise-constant solutions. Over the years, TV has been extended in many aspects. Some researchers brought anisotropicity to better handle angled boundaries Grasmair and Lenzen (2010); Lou et al (2015); Bayram and Kamasak (2012); Lefkimmiatis et al (2015), some designed higher-order functionals to overcome staircase artifacts by favoring piecewise-smoothness Chan et al (2000); Bredies et al (2010); Papafitsoros and Schönlieb (2014), and some others employed nonlocal priors to benefit from the inherent nonlocal self-similarity (NSS) as well Gilboa and Osher (2008); Lefkimmiatis and Osher (2015). Structure tensor total variation (STV) Lefkimmiatis et al (2015) is one such TV variant, which redesigns TV in a semi-local fashion. Rather than penalizing the gradient, it penalizes a summary of the gradients within a local neighborhood. This neighborhood-awareness provides a better representation of the local image variations and significantly boosts the performance. Recently, in Demircan-Tureyen and Kamasak (2019, 2020), we showed that penalizing the summary of the directional gradients within a patch, instead of the gradients, increases the performance as long as the local directions are accurately estimated. In this paper, we design a nonlocal counterpart to our prior direction-guided regularizer, under the influence of nonlocal STV (NLSTV Lefkimmiatis and Osher (2015)), so that long-distance dependencies across the image can also be modelled.
We also design an algorithm that employs anisotropic Gaussian kernels and nonlocal structure tensors to estimate the directional parameters that are needed by the proposed regularizer.
2 Background
In Lefkimmiatis and Osher (2015), the nonlocal counterpart of STV was proposed. The extension leveraged the STV term, which had already robustly encoded the local structural variation in a semi-local fashion, by widening its scope to involve the nonlocal variations as well. This idea takes its source from the fact that a local patch often has similar nonlocal patches across the image. This NSS prior was first exploited in the form of nonlocal means (NLM) filtering Buades et al (2011), which approximates the underlying intensity value at a spatial coordinate $i$ by taking the weighted average of the patches similar to the one around the $i$-th pixel. In Gilboa and Osher (2008) and Elmoataz et al (2008), the idea behind NLM was adopted into the variational approach to enrich TV-based models. They treated images as graphs, which allowed them to define nonlocal gradients, so that the interaction of two distant image points became possible. The nonlocal TV (NLTV) term involves the following nonlocal gradient operator:
$(\nabla_w f)(x, y) = \big(f(y) - f(x)\big)\sqrt{w(x, y)}, \quad \forall y \in \Omega$ (2)
where $w : \Omega \times \Omega \rightarrow \mathbb{R}^{+}$ is a function that assigns weights between points by considering the pairwise similarities of the intensities and the relative distances. Therefore, the NLTV prior is defined as follows:
$\mathrm{NLTV}(f) = \int_{\Omega} \lVert (\nabla_w f)(x, \cdot) \rVert_{2}\, dx = \int_{\Omega} \Big( \int_{\Omega} \big(f(y) - f(x)\big)^{2} w(x, y)\, dy \Big)^{1/2} dx$ (3)
The rationale behind NLTV is to penalize the weighted averages of the gradient magnitudes gathered from the similar patches, under the assumption that the gradients of similar patches are also similar. The NLTV regularization term is designed only for scalar-valued images, i.e., $C = 1$. If we turn back to NLSTV Lefkimmiatis and Osher (2015): pursuing the same idea as NLTV, it involves a nonlocal structure tensor operator primarily defined as
$(S_w f)(x) = \int_{\Omega} w(x, y)\, (Jf)(y)^{T} (Jf)(y)\, dy$ (4)

where $w$ is the weighting function of Eq. (2) and $J$ is the Jacobian operator. It was later reformulated as $S_w f = (J_w f)^{T}(J_w f)$ to be able to decompose the NLSTV into linear functionals (see Lefkimmiatis et al (2015); Lefkimmiatis and Osher (2015)). Here, the operator $J_w$ is the nonlocal Jacobian, of the form:
$J_w f = \big[ (\nabla_w f^{1})^{T}, \dots, (\nabla_w f^{C})^{T} \big]^{T}$ (5)

where each superscript refers to a certain channel and $\nabla_w$ is the nonlocal gradient with the definition:

$(\nabla_w f^{c})(x, y) = \sqrt{w(x, y)}\, \nabla f^{c}(y)$ (6)
This definition differs from Eq. (2) in the sense that it acts on the local gradients at the similar points rather than on raw intensity differences, so it can also exploit the underlying structure, as opposed to the graph-based approach. Having defined the nonlocal Jacobian, the NLSTV is formulated as

$\mathrm{NLSTV}(f) = \int_{\Omega} \lVert (J_w f)(x, \cdot) \rVert_{S_p}\, dx$ (7)
where $\lVert \cdot \rVert_{S_p}$ is the Schatten norm of order $p \geq 1$, i.e., $\lVert X \rVert_{S_p} = \big( \sum_{k} \sigma_{k}^{p}(X) \big)^{1/p}$ for a generic matrix $X$, with $\sigma_{k}(X)$ returning its singular values Bhatia (2013).
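As a quick numeric illustration of the Schatten norm just defined, the following sketch computes it from the singular values; the matrix $X$ is an arbitrary example:

```python
import numpy as np

def schatten_norm(X, p=1):
    """S_p norm of a matrix: the l_p norm of its singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

X = np.array([[3.0, 0.0], [0.0, 4.0]])
# Singular values of this diagonal matrix are 4 and 3.
nuc = schatten_norm(X, p=1)    # nuclear norm: 4 + 3 = 7
frob = schatten_norm(X, p=2)   # Frobenius norm: sqrt(16 + 9) = 5
spec = max(np.linalg.svd(X, compute_uv=False))  # S_inf: largest singular value
```

Special cases: $p=1$ gives the nuclear norm, $p=2$ the Frobenius norm, and $p=\infty$ the spectral norm.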
3 Method
Our recovery algorithm involves two stages: (1) the stage that estimates the directional parameters required by the subsequent stage, and (2) the stage that solves the inverse problem at hand by employing the proposed regularizer. The following subsection starts with the explanation of the proposed regularizer. Since we used the term adaptive direction-guided STV (ADSTV) in Demircan-Tureyen and Kamasak (2020) to refer to the extended version of DSTV Demircan-Tureyen and Kamasak (2019), which gained the ability of handling multi-directional images, the regularization term that we propose here will be called nonlocal ADSTV (NLADSTV).
3.1 NLADSTV Regularizer
From the NLSTV's point of view, we can further extend our ADSTV to NLADSTV by pursuing the assumption that the directional gradients of similar patches are also similar. In this respect, we define a directional nonlocal gradient that involves directional parameters estimated beforehand, i.e.,

$(\tilde{\nabla}_w f^{c})(x, y) = \sqrt{w(x, y)}\, \Lambda_{\alpha(y)} R_{\theta(y)} \nabla f^{c}(y)$ (8)
where $R_{\theta}$ and $\Lambda_{\alpha}$ stand for the rotation and scale matrices, i.e.,

$R_{\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}, \qquad \Lambda_{\alpha} = \begin{bmatrix} 1 & 0 \\ 0 & \alpha \end{bmatrix}$ (9)
with $\theta \in [0, \pi)$ and $\alpha \in (0, 1]$. The estimation of these parameters is explained in Section 3.2. Thus, the directional nonlocal Jacobian takes the shape of:

$\tilde{J}_w f = \big[ (\tilde{\nabla}_w f^{1})^{T}, \dots, (\tilde{\nabla}_w f^{C})^{T} \big]^{T}$ (10)
and our NLADSTV is nothing but the mixed norm of Eq. (10), i.e.,

$\mathrm{NLADSTV}(f) = \int_{\Omega} \lVert (\tilde{J}_w f)(x, \cdot) \rVert_{S_p}\, dx$ (11)
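To make the discrete penalty concrete, the sketch below assembles the directional nonlocal Jacobian of a single pixel and takes its Schatten norm; the shapes, helper names, and toy gradients are all hypothetical, and setting every $\alpha$ to 1 with $\theta = 0$ recovers the plain NLSTV atom:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def nladstv_atom(grads, weights, thetas, alphas, p=1):
    """Penalty contribution of one pixel.

    grads:   (M, C, 2) local gradients at the M similar pixels, per channel
    weights: (M,) nonlocal weights w(x, y_k)
    thetas, alphas: (M,) directional parameters at the similar pixels
    """
    rows = []
    for k in range(grads.shape[0]):
        D = np.diag([1.0, alphas[k]]) @ rot(thetas[k])   # Lambda_alpha R_theta
        for c in range(grads.shape[1]):
            rows.append(np.sqrt(weights[k]) * D @ grads[k, c])
    J = np.stack(rows)                                   # (M*C, 2) Jacobian
    s = np.linalg.svd(J, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)                   # Schatten norm

g = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])              # M = 2, C = 1
val = nladstv_atom(g, np.ones(2), np.zeros(2), np.ones(2))  # isotropic case
```

With $\alpha < 1$, gradients along the second (rotated) axis are down-weighted, which is exactly the direction-sensitive behaviour described above.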
Since our NLADSTV preserves NLSTV's convexity, the numerical optimization of the NLADSTV-based Eq. (1) is performed by applying the same steps as the NLSTV model, as described in Lefkimmiatis and Osher (2015). It was solved by employing the augmented Lagrangian framework Bertsekas (2014) as the convex optimization tool. Due to its straightforwardness, and to avoid repetition, we skip the numerical optimization part in this paper, referring the reader to Lefkimmiatis and Osher (2015). However, since the adjoint of the directional nonlocal Jacobian is needed by the optimization procedure, adapting Proposition 1 in Lefkimmiatis and Osher (2015), it is expressed as
$[\tilde{J}_w^{*} \Omega]^{c} = \tilde{\nabla}_w^{*} \Omega^{c}, \quad c = 1, \dots, C$ (12)

where $\Omega$ is an arbitrary matrix field, $\Omega^{c}$ collects its entries associated with channel $c$, and the adjoint nonlocal gradient acts on such a field $\Phi$ as follows:

$\tilde{\nabla}_w^{*} \Phi = -\sum_{k=1}^{M} \operatorname{div}\big( \sqrt{w_k}\; R_{\theta}^{T} \Lambda_{\alpha}\, [\phi_k^{1}\; \phi_k^{2}]^{T} \big)$ (13)

where $\phi_k^{j}$ refers to the $j$-th column of the $k$-th matrix of the matrix field $\Phi$, div is the discrete divergence operator, and $w_k$ collects the weights of the $k$-th nonlocal connections.
When it comes to the nonlocal weights, similar to the NLM, the following function is used:

$w(x, y) = \exp\!\big( -d(x, y)/h^{2} \big)$ (14)
where $d(\cdot, \cdot)$ is the distance function of patches and $h$ is a filtering parameter. Note that, in the practical realization of NLSTV, the target space of the nonlocal gradient was shrunk by using a sparse version of the weighting strategy, which retains only $M \ll N$ nonzero weights per pixel. This strategy seeks the similar patches inside a search window rather than the entire image. Next, it takes only the $M$ most similar pairs of patches into consideration Lefkimmiatis and Osher (2015). The distance function that serves this purpose is given below:

$d(x, y) = \sum_{t} G(t)\, \big( f(x + t) - f(y + t) \big)^{2}$ (15)

where $G$ is a symmetric weighting kernel of size $K \times K$ over whose support $t$ runs, so that $d(x, y)$ returns the relative distance between the patches around the pixels $x$ and $y$. The same realization of the weighting function is used by our NLADSTV regularizer.
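The sparse weighting strategy described above can be sketched as follows; the patch size, window size, M, and h below are illustrative values, not the ones used in the experiments, and a uniform (rather than symmetric Gaussian) patch kernel is assumed for brevity:

```python
import numpy as np

def top_m_weights(img, x, patch=2, search=5, M=3, h=0.1):
    """NLM-style weights for pixel x, keeping only the M best matches
    inside a search window (the sparse strategy)."""
    H, W = img.shape
    r, c = x
    ref = img[r - patch:r + patch + 1, c - patch:c + patch + 1]
    cands = []
    for i in range(max(patch, r - search), min(H - patch, r + search + 1)):
        for j in range(max(patch, c - search), min(W - patch, c + search + 1)):
            if (i, j) == (r, c):
                continue
            q = img[i - patch:i + patch + 1, j - patch:j + patch + 1]
            d = np.mean((ref - q) ** 2)          # patch distance
            cands.append((d, (i, j)))
    cands.sort(key=lambda t: t[0])               # most similar first
    return [(pos, np.exp(-d / h ** 2)) for d, pos in cands[:M]]

rng = np.random.default_rng(1)
img = rng.random((20, 20))
matches = top_m_weights(img, (10, 10))           # M (position, weight) pairs
```

Only these M weights per pixel are nonzero, which keeps the nonlocal Jacobian sparse and the optimization tractable.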
3.2 Parameter Estimation
Our regularization term requires three additional parameters at each spatial image point: the orientation $\theta$, the linearity $\lambda$, and a constant $\alpha_{\min} \in (0, 1]$, from which the anisotropy factor $\alpha$ of the scale matrix is derived. The parameter $\theta$ corresponds to the orientation map of the image, while $\lambda$ and $\alpha_{\min}$ together determine the anisotropic behaviour of our regularizer. If, at a point $x$, $\alpha(x)$ is equal to 1, then the regularizer behaves like NLSTV. Otherwise, as $\lambda(x)$ gets closer to one, it gains a more anisotropic behaviour; in other words, the sensitivity to the changes towards $\theta(x)$ increases.
In Demircan-Tureyen and Kamasak (2020), we suggested an algorithm to estimate these parameters. To put it simply, it applied successive eigendecompositions on the structure tensor (ST) of the observed image (or of the luminance information of the observed image, for vector-valued images), and TV-based smoothing on the parameter fields obtained at multiple scales. In this paper, in order to boost the parameter estimation performance, we propose a different procedure that employs anisotropic Gaussian kernels and the nonlocal structure tensor. This subsection elaborates our directional parameter estimation procedure (hereinafter referred to as DPE) by dividing it into steps (see Fig. 1 for the illustration). Note that the term "input image" refers to the luminance information of the observed image, unless otherwise stated. We denote this image by $g$.

3.2.1 Structure Tensor Eigendecomposition
We start by obtaining two initial maps through the eigendecomposition of the structure tensor (ST) of the input image. The initial orientation map $\theta_0$ is composed of the orientations of the eigenvectors associated with the smallest eigenvalue at each image point $x$. The initial linearity map $\lambda_0$, on the other side, is made of the entries computed as the difference of the eigenvalues normalized by the largest eigenvalue, i.e., $\lambda_0(x) = \big(\mu_1(x) - \mu_2(x)\big)/\mu_1(x)$, where $\mu_1 \geq \mu_2 \geq 0$ are the eigenvalues. This non-negative ratio serves as a measure of local linearity.
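A minimal sketch of this step, using the closed-form eigenvalues of the 2x2 structure tensor (the smoothing scale and the test image are arbitrary choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def st_orientation_linearity(img, sigma=1.5):
    """Initial orientation and linearity maps from the structure tensor."""
    gy, gx = np.gradient(img)
    # Smoothed outer products of the gradient: the tensor entries.
    Jxx = gaussian_filter(gx * gx, sigma)
    Jxy = gaussian_filter(gx * gy, sigma)
    Jyy = gaussian_filter(gy * gy, sigma)
    # Closed-form eigenvalues of the 2x2 symmetric tensor.
    tr = Jxx + Jyy
    root = np.sqrt((Jxx - Jyy) ** 2 + 4 * Jxy ** 2)
    mu1 = 0.5 * (tr + root)                 # largest eigenvalue
    mu2 = 0.5 * (tr - root)                 # smallest eigenvalue
    # Orientation of the eigenvector of the *smallest* eigenvalue.
    theta = 0.5 * np.arctan2(2 * Jxy, Jxx - Jyy) + np.pi / 2
    lin = np.where(mu1 > 1e-12, (mu1 - mu2) / mu1, 0.0)
    return theta, lin

# Vertical stripes: variation is horizontal, minimal variation is vertical.
img = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))
theta, lin = st_orientation_linearity(img)
```

On the striped test image, the recovered orientation at the center is close to pi/2 (vertical) and the linearity is close to 1, as expected for a strongly oriented pattern.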
3.2.2 Postprocessing Initial Orientation Map
The initial orientation map is converted into binary masks, each of which filters out all orientations except a certain (main) orientation and the orientations falling within a fixed interval around it. In our work, the main orientations are sampled at 15-degree intervals over $[0^{\circ}, 180^{\circ})$; thus each mask covers an interval of $\pm 7.5$ degrees.
Each orientation mask is further processed by using morphological closings in order to fill in the cracks and the holes, so that the filtered-out pixels inside them are also classified under the corresponding orientation. The cracks along the corresponding orientation are closed by using oriented line segments of 5-pixel length as the structuring element (SE); the orientation of the line segment matches the main orientation of the mask. The holes, on the other hand, are closed by using a 3-by-3 square SE.
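The masking and closing steps can be sketched as below; `line_se` is a hypothetical helper for the oriented line SE, and the tolerance of +/-7.5 degrees reflects the 15-degree sampling:

```python
import numpy as np
from scipy.ndimage import binary_closing

def orientation_masks(theta_map, step_deg=15, tol_deg=7.5):
    """Split an orientation map into binary masks, one per main orientation."""
    masks = {}
    deg = np.degrees(theta_map) % 180
    for main in range(0, 180, step_deg):
        diff = np.abs(deg - main)
        diff = np.minimum(diff, 180 - diff)   # wrap-around angular distance
        masks[main] = diff <= tol_deg
    return masks

def line_se(angle_deg, length=5):
    """Oriented line-segment structuring element (hypothetical helper)."""
    se = np.zeros((length, length), dtype=bool)
    c = length // 2
    for t in np.linspace(-c, c, 4 * length):
        r = int(round(c - t * np.sin(np.radians(angle_deg))))
        q = int(round(c + t * np.cos(np.radians(angle_deg))))
        se[r, q] = True
    return se

theta = np.full((10, 10), np.radians(30.0))   # toy constant-orientation map
masks = orientation_masks(theta)
closed = binary_closing(masks[30], structure=line_se(30))
```

The hole-filling with the 3-by-3 square SE would be one more `binary_closing` call with `structure=np.ones((3, 3), dtype=bool)`.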
3.2.3 Filtering with Anisotropic Gaussian Kernels
Anisotropic Gaussian kernels (AGK) and their derivatives have usually been applied to edge Zhang et al (2017); Wang et al (2019a), line Wang et al (2019b), and corner Zhang and Sun (2020) detection tasks. These kernels are nothing but elongated versions of their isotropic counterpart, and are defined as

$G_{\sigma, \rho, \theta}(x) = \frac{1}{2\pi\sigma^{2}} \exp\!\Big( -\frac{1}{2\sigma^{2}}\, x^{T} R_{\theta}^{T} \begin{bmatrix} \rho^{-2} & 0 \\ 0 & \rho^{2} \end{bmatrix} R_{\theta}\, x \Big)$ (16)

where $R_{\theta}$ is the rotation matrix given in Eq. (9), $\sigma$ represents the scale, $\rho$ denotes the factor of anisotropy ($\rho = 1$ for the isotropic kernel), and $\theta$ gives the direction of the kernel.
We convolve both the input image $g$ and the initial linearity map $\lambda_0$ with a filter bank of discrete AGKs covering the main orientations. As mentioned above, the main orientations are sampled at 15-degree intervals over $[0^{\circ}, 180^{\circ})$. We also fix the scale parameter $\sigma$ and the anisotropy factor $\rho$ across the bank. Among all orientations, we select the ones coinciding with the angles in the postprocessed version of the initial orientation map, by performing an element-wise multiplication of each filter response with the corresponding mask and by taking the union of those multiplications. Let us call the resulting filtered versions of $g$ and $\lambda_0$ the filtered input image $g_F$ and the filtered linearity map $\lambda_F$, respectively.
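A discrete AGK consistent with the elongated-Gaussian form of Eq. (16) can be generated as follows; the size, scale, and anisotropy values are placeholders, and the normalization to unit sum is a practical choice for filtering:

```python
import numpy as np

def agk(size=11, sigma=2.0, rho=2.0, theta=0.0):
    """Discrete anisotropic Gaussian kernel: an isotropic Gaussian of scale
    sigma, elongated by the anisotropy factor rho along direction theta."""
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates into the kernel's frame ...
    u = xx * np.cos(theta) + yy * np.sin(theta)
    v = -xx * np.sin(theta) + yy * np.cos(theta)
    # ... then stretch by rho along u and shrink by rho along v.
    g = np.exp(-(u ** 2 / rho ** 2 + v ** 2 * rho ** 2) / (2 * sigma ** 2))
    return g / g.sum()

k = agk(theta=0.0)     # elongated along the horizontal axis
iso = agk(rho=1.0)     # rho = 1 recovers the isotropic Gaussian
```

Convolving with a bank of such kernels over the 12 main orientations, then masking and summing the responses, realizes the orientation-selective smoothing of this step.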
3.2.4 Nonlocal Structure Tensor Eigendecomposition
This step is very similar to the first one, except that this time the structure tensor is nonlocal and computed on the filtered input image $g_F$, rather than on $g$. The rationale behind not applying the nonlocal ST eigendecomposition in the first step is its lower robustness to noise and artifacts, when compared to its local counterpart. The orientation map $\theta_{nl}$, where the subscript indicates that the eigenvectors are derived from the nonlocal ST, is now the final orientation map, and it will be fed into the restoration process as it is. On the other side, the entries of the nonlocal linearity map $\lambda_{nl}$ are obtained as in the first step and averaged with the filtered linearity map to reach the final linearity map, i.e., $\bar{\lambda} = (\lambda_{nl} + \lambda_F)/2$.
3.2.5 The Final Directional Parameters
The orientation map $\theta_{nl}$ to be used by the NLADSTV regularizer is the direct output of the nonlocal ST eigendecomposition, as stated in the previous step. The parameter $\alpha$, on the other hand, is computed by inversely scaling the entries of $\bar{\lambda}$ onto the range $[\alpha_{\min}, 1]$, i.e.,

$\alpha(x) = 1 - (1 - \alpha_{\min})\, \bar{\lambda}(x)$ (17)
In Fig. 1, the dashed red box shows the final directional parameters.
3.3 Overall Algorithm
Our recovery framework starts by estimating the directional parameters as described in Subsection 3.2. Next, the NLADSTV regularized inverse problem (see Eq. (18)) is solved by following the optimization scheme in Lefkimmiatis and Osher (2015). The overall algorithm is provided in Algorithm 1. This algorithm actually repeats the steps of Algorithm 1 in Lefkimmiatis and Osher (2015), except for the presence of the DPE procedure (called at the third line) and the nonlocal Jacobian operator being switched to our directional nonlocal Jacobian.
As mentioned earlier, we do not go into the details of the numerical optimization in this paper; however, Algorithm 1 needs some explanation. It solves the following NLADSTV regularized inverse problem:
$\hat{\mathbf{f}} = \underset{\mathbf{f}}{\arg\min}\; \tfrac{1}{2}\lVert \mathbf{y} - \mathbf{H}\mathbf{f} \rVert_2^2 + \tau\, \mathrm{NLADSTV}(\mathbf{f}) + \iota_{C}(\mathbf{f})$ (18)

where $\iota_{C}$ is the indicator function of a convex set $C$, which takes the value 0 if $\mathbf{f} \in C$ and $\infty$ otherwise. The set $C$ corresponds to an additional constraint, such as nonnegativity, and if there is no constraint, then $C = \mathbb{R}^{NC}$. The authors of Lefkimmiatis and Osher (2015) reformulate this problem in constrained form by using auxiliary variables and seek a solution through augmented Lagrangian methods Bertsekas (2014). They employ the alternating-direction method of multipliers (ADMM) Eckstein and Bertsekas (1992).
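We do not reproduce the actual solver here, but the flavor of the splitting can be seen in a 1-D toy analogue, where first differences stand in for the (directional nonlocal) Jacobian and their l1 norm for the Schatten penalty; all values are illustrative:

```python
import numpy as np

def admm_tv1d(y, tau=1.0, rho=1.0, iters=200):
    """Toy 1-D analogue of the ADMM splitting used for Eq. (18): the
    regularizer input (first differences here) is split off as an
    auxiliary variable z and handled by its proximal map."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)            # difference operator, (n-1, n)
    f = y.copy()
    z = D @ f
    u = np.zeros_like(z)                      # scaled dual variable
    A = np.eye(n) + rho * D.T @ D             # normal equations of the f-step
    for _ in range(iters):
        f = np.linalg.solve(A, y + rho * D.T @ (z - u))        # data-fit step
        v = D @ f + u
        z = np.sign(v) * np.maximum(np.abs(v) - tau / rho, 0)  # l1 prox
        u = u + D @ f - z                     # Lagrange multiplier update
    return f

y = np.concatenate([np.zeros(20), np.ones(20)]) \
    + 0.05 * np.random.default_rng(2).standard_normal(40)
f_hat = admm_tv1d(y, tau=0.5)                 # near piecewise-constant result
```

In the actual algorithm, the l1 prox is replaced by the proximal map of the mixed Schatten norm and D by the directional nonlocal Jacobian, but the alternation between a quadratic data-fit step, a prox step, and a multiplier update is the same.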
In line 5, the operator at the RHS corresponds to the proximal map of the mixed norm weighted by $\tau$. For a function $\psi$ evaluated at $\mathbf{z}$, the proximal map is defined as:

$\mathrm{prox}_{\psi}(\mathbf{z}) = \underset{\mathbf{u}}{\arg\min}\; \tfrac{1}{2}\lVert \mathbf{u} - \mathbf{z} \rVert_2^2 + \psi(\mathbf{u})$ (19)
While deriving a minimization approach in Lefkimmiatis and Osher (2015) for the NLSTV regularized inverse problem, Eq. (19) is reduced to the proximal map of the $S_p$ norm by inputting the entries of the argument separately. Next, through the SVD of each entry, the problem is reduced to the proximal map of the $\ell_p$ norm, with an input consisting of the singular values. For the efficient evaluation of this map, they utilize the iterative proximal algorithm proposed in Liu and Ye (2010) (see Section IV in Lefkimmiatis and Osher (2015) for the details).
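For the special case p = 1, the proximal map of the Schatten norm has a closed form, soft-thresholding of the singular values, which the sketch below illustrates on a toy matrix (the general p requires the iterative scheme of Liu and Ye (2010)):

```python
import numpy as np

def prox_schatten1(Z, gamma):
    """Prox of gamma * ||.||_{S1}: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - gamma, 0.0)
    return (U * s) @ Vt

Z = np.array([[3.0, 0.0], [0.0, 1.0]])
P = prox_schatten1(Z, 1.5)
# Singular values 3 and 1 shrink to 1.5 and 0.
```

Applied entry-by-entry to the auxiliary variable, this is the z-step of the ADMM iteration for the nuclear-norm case.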
In line 6, the operator is nothing but the projection of its argument onto the convex set $C$. For the nonnegativity set that we consider, it reduces to the element-wise operation $\Pi_{C}(\mathbf{z}) = \max(\mathbf{z}, 0)$ Lefkimmiatis and Osher (2015).
The rest of Algorithm 1 realizes the steps of the ADMM iterations and updates the Lagrange multipliers (the reader is referred to Lefkimmiatis and Osher (2015) for details).
4 Experimental Results

We consider the problems of image denoising and image deblurring by comparing four related methods: STV, ADSTV, NLSTV, and NLADSTV. The experiments are conducted on the vector-valued images shown in Fig. 2. The images are taken from the popular Berkeley Segmentation Dataset (BSD) Arbelaez et al (2010). For the quantitative evaluation, we use the peak signal-to-noise ratio (PSNR). The source codes of STV (https://github.com/cig-skoltech/Structure_Tensor_Total_Variation) and NLSTV (https://github.com/cig-skoltech/NLSTV) that we use were made publicly available by the authors on GitHub. Our NLADSTV is implemented on top of NLSTV.
Since the algorithms under consideration all aim to minimize Eq. (1), they need the regularization parameter $\tau$ to be tuned. We fine-tuned $\tau$ such that it leads to the best PSNR in all experiments. For both STV and ADSTV, a small Gaussian convolution kernel is used for the structure tensor. For NLSTV and NLADSTV, on the other hand, the sizes of the neighborhood (used for the nonlocal weights) and of the search window are fixed, and the number of similar patches $M$ is set as suggested in Lefkimmiatis and Osher (2015). For the sake of fairness, we fixed the free parameter of ADSTV and NLADSTV as 4, but it is possible to increase the restoration quality of these methods by fine-tuning it. When it comes to the setting of the DPE procedure, a Gaussian kernel is used for the computation of the local structure tensor, and the sizes of the neighborhood and of the search window for the nonlocal one are fixed similarly. In addition, note that the same order $p$ of the Schatten norm is used for all regularizers under consideration.
4.1 Image Denoising
Table 1: PSNR (dB) comparisons for image denoising.
σ: | 0.05 | 0.1 | 0.15 | 0.2
Method: | STV | ADSTV | NLSTV | NLADSTV | STV | ADSTV | NLSTV | NLADSTV | STV | ADSTV | NLSTV | NLADSTV | STV | ADSTV | NLSTV | NLADSTV |
Chateau | 32.38 | 32.73 | 32.92 | 33.15 | 28.79 | 29.33 | 29.41 | 29.71 | 26.93 | 27.60 | 27.56 | 27.93 | 25.74 | 26.37 | 26.29 | 26.71 |
Dog | 32.19 | 32.56 | 32.25 | 32.61 | 29.00 | 29.38 | 29.09 | 29.44 | 27.44 | 27.82 | 27.59 | 27.87 | 26.46 | 26.78 | 26.63 | 26.85 |
Mural | 30.16 | 30.41 | 30.20 | 30.46 | 26.59 | 26.95 | 26.74 | 27.14 | 24.78 | 25.19 | 25.03 | 25.47 | 23.61 | 24.00 | 23.88 | 24.35 |
Workers | 31.90 | 32.38 | 32.40 | 32.87 | 28.04 | 28.69 | 28.66 | 29.35 | 25.94 | 26.65 | 26.60 | 27.35 | 24.55 | 25.20 | 25.13 | 25.92 |
Buildings | 31.24 | 31.82 | 31.35 | 32.13 | 27.44 | 28.15 | 27.72 | 28.55 | 25.47 | 26.25 | 25.90 | 26.74 | 24.20 | 24.93 | 24.64 | 25.50 |
Deer | 31.49 | 31.76 | 31.31 | 31.69 | 28.30 | 28.65 | 28.22 | 28.54 | 26.69 | 27.04 | 26.67 | 26.90 | 25.64 | 25.97 | 25.66 | 25.80 |
Arch | 30.86 | 31.03 | 30.92 | 31.07 | 27.63 | 27.90 | 27.76 | 28.14 | 26.08 | 26.38 | 26.27 | 26.61 | 25.15 | 25.34 | 25.36 | 25.63 |
Man | 33.42 | 33.64 | 33.65 | 33.97 | 29.54 | 29.85 | 29.88 | 30.31 | 27.38 | 27.71 | 27.71 | 28.24 | 25.90 | 26.19 | 26.16 | 26.75 |
Monarch | 35.16 | 35.52 | 35.42 | 35.73 | 31.68 | 32.09 | 32.03 | 32.37 | 29.73 | 30.17 | 30.21 | 30.51 | 28.39 | 28.78 | 28.88 | 29.23 |
Structure | 30.00 | 30.31 | 30.32 | 30.65 | 26.27 | 26.68 | 26.58 | 27.05 | 24.51 | 24.92 | 24.82 | 25.29 | 23.45 | 23.82 | 23.73 | 24.18 |
Avg. | 31.88 | 32.22 | 32.07 | 32.43 | 28.33 | 28.77 | 28.61 | 29.06 | 26.50 | 26.97 | 26.84 | 27.29 | 25.31 | 25.74 | 25.64 | 26.09 |
We consider additive i.i.d. Gaussian noise with four noise levels $\sigma \in \{0.05, 0.1, 0.15, 0.2\}$. Table 1 reports the PSNR scores obtained by using the four competing methods. By inspecting the results, one observes that the local regularizers perform worse than their nonlocal counterparts, except for the Deer image, which may not exhibit the NSS property. STV is the worst-performing regularizer at all four noise levels. As was shown in Demircan-Tureyen and Kamasak (2019), ADSTV systematically outperforms STV, with an improvement of around 0.4 dB. Another observation is that, as the noise level increases, ADSTV also produces better results than NLSTV. Our NLADSTV, on the other hand, is the best-performing regularizer, with a 0.4 dB improvement over NLSTV and 0.3 dB over its local counterpart ADSTV, on average.

Fig. 3 is also provided for visual judgement. We demonstrate two representative detail patches. As we observe from the patch taken from the Structure image, the straight lines appear smoother in the NLADSTV reconstruction. The oil-painting-like artifacts present in STV and NLSTV are far less visible in ADSTV and NLADSTV. When compared to ADSTV, NLADSTV is better at removing noise while preserving details, thanks to its nonlocality. In the Arch image, the details of the plant and of the bricks around the arch are more distinguishably reconstructed by our NLADSTV than by the other methods.
4.2 Image Deblurring
In the deblurring problem, instead of feeding the direct observation (the blurred image) to the DPE procedure, we input the version of the image deblurred by a Wiener filter, since otherwise the blur causes poor localization of the edges and the loss of some small gaps between parallel lines. By means of the Wiener filter, our DPE procedure can better estimate the directional parameters.
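A frequency-domain Wiener filter of the kind used for this pre-deblurring step can be sketched as follows; the PSF, the test image, and the noise-to-signal ratio are toy stand-ins, not the experimental setup:

```python
import numpy as np

def wiener_deblur(y, psf, nsr=1e-3):
    """Frequency-domain Wiener filter: conj(H) / (|H|^2 + NSR)."""
    H = np.fft.fft2(psf, s=y.shape)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)   # regularized inverse filter
    return np.real(np.fft.ifft2(np.fft.fft2(y) * W))

f = np.zeros((32, 32)); f[:, 16:] = 1.0       # sharp vertical edge
psf = np.ones((1, 5)) / 5.0                   # horizontal box blur (motion-like)
y = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(psf, s=f.shape)))
x = wiener_deblur(y, psf)                     # edge re-localized before DPE
```

The point of this rough inversion is only to re-localize the edges well enough for the structure-tensor steps; the actual restoration is still performed by the NLADSTV regularized solver.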
Table 2: PSNR (dB) comparisons for image deblurring.
Gaussian PSF | Motion PSF
BSNR: | 20 dB | 30 dB | 20 dB | 30 dB | ||||||||||||
STV | ADSTV | NLSTV | NLADSTV | STV | ADSTV | NLSTV | NLADSTV | STV | ADSTV | NLSTV | NLADSTV | STV | ADSTV | NLSTV | NLADSTV | |
Chateau | 25.62 | 25.92 | 25.83 | 26.04 | 27.85 | 28.10 | 27.77 | 28.27 | 27.73 | 28.14 | 28.15 | 28.43 | 31.81 | 32.28 | 32.44 | 32.78 |
Dog | 27.18 | 27.25 | 27.19 | 27.25 | 28.70 | 28.84 | 28.76 | 28.90 | 29.54 | 29.76 | 29.59 | 29.85 | 33.81 | 34.14 | 33.88 | 34.24 |
Mural | 23.91 | 24.13 | 24.08 | 24.20 | 26.12 | 26.25 | 26.34 | 26.42 | 26.30 | 26.71 | 26.56 | 26.93 | 31.17 | 31.51 | 31.34 | 31.66 |
Workers | 23.80 | 24.09 | 24.04 | 24.37 | 26.26 | 26.52 | 26.62 | 26.89 | 26.38 | 26.99 | 26.96 | 27.74 | 31.42 | 31.99 | 32.16 | 32.78 |
Buildings | 24.02 | 24.21 | 24.37 | 24.58 | 25.98 | 26.25 | 26.24 | 26.51 | 27.22 | 27.71 | 27.50 | 28.08 | 31.81 | 32.42 | 31.89 | 32.65 |
Deer | 27.39 | 27.54 | 27.30 | 27.43 | 29.28 | 29.44 | 29.28 | 29.39 | 29.25 | 29.54 | 29.20 | 29.51 | 33.90 | 34.20 | 33.81 | 34.19 |
Arch | 25.93 | 26.00 | 26.00 | 26.11 | 27.58 | 27.58 | 27.63 | 27.68 | 27.64 | 27.84 | 27.73 | 28.06 | 31.88 | 32.07 | 31.96 | 32.27 |
Man | 25.38 | 25.45 | 25.40 | 25.45 | 28.15 | 28.28 | 28.35 | 28.44 | 27.34 | 27.63 | 27.74 | 28.09 | 32.40 | 32.72 | 32.86 | 33.23 |
Monarch | 30.15 | 30.15 | 30.39 | 30.31 | 32.54 | 32.63 | 32.97 | 32.97 | 33.20 | 33.64 | 33.70 | 34.00 | 38.40 | 38.76 | 38.88 | 39.20 |
Structure | 22.99 | 23.10 | 23.02 | 23.14 | 24.22 | 24.35 | 24.30 | 24.46 | 25.32 | 25.64 | 25.44 | 25.90 | 29.79 | 30.13 | 30.06 | 30.43 |
Avg. | 25.64 | 25.79 | 25.76 | 25.89 | 27.67 | 27.82 | 27.83 | 27.99 | 27.99 | 28.36 | 28.26 | 28.66 | 32.64 | 33.02 | 32.93 | 33.34 |

Table 2 reports the deblurring performances of the methods under consideration for two types of blurring kernels: a Gaussian kernel and a motion kernel. The blurred images are also degraded with Gaussian noise corresponding to a blurred SNR (BSNR, the variance of $\mathbf{Hf}$ divided by the noise variance) of 20 and 30 dB. While the deblurring results for the motion blur are on par with the denoising results, the improvement that we obtain with NLADSTV is less notable (around 0.15 dB over both ADSTV and NLSTV, on average) when it comes to the Gaussian blur.
We demonstrate two exemplary results in Fig. 4. As one can observe from the detail patch of the Buildings image, the artifacts between the closely parallel lines are far less visible in the NLADSTV regularized solution. In the Workers image, again, the cloudy look around the lines is more apparent in the other methods than in our NLADSTV.
5 Discussion
According to the experiments conducted, our approach produces promising results. However, the effectiveness of the NLADSTV based regularization highly depends on the performance of the directional parameter estimation (DPE) procedure. In the case of Gaussian blur, for instance, since our DPE procedure estimated less reliable parameters, the PSNR gain was not as satisfying as in the other experiments. Fortunately, because the structure tensor is a positive semi-definite matrix, the linearity factor also behaves like a coherence factor, which measures the reliability of the estimated orientations. Thus, for unreliable orientation estimations, the linearity becomes small and NLADSTV starts working like NLSTV. Therefore, the restoration quality never falls below the one that we obtain by using NLSTV.
The NLADSTV comes with additional computational demands. The DPE procedure takes around 10 sec on average, on a computer equipped with an Intel Core i7-7500U processor (2.70 GHz) and 16 GB of memory. Besides, the convergence of the NLADSTV based problem requires more iterations than the NLSTV, which makes the entire framework run about 2 times slower than the NLSTV based image recovery.
6 Conclusion
This paper proposes a competitive framework for variational image recovery. We establish a two-stage algorithm to advance the nonlocal structure tensor total variation (NLSTV) regularizer, which already exploits the local structural regularity and nonlocal self-similarity properties. We further enrich these priors by encoding some spatially varying directional characteristics of the underlying image. In order to capture these characteristics, we design an algorithm that estimates the directional parameters required by our regularizer. The experiments show that the proposed method achieves significant performance improvements over its local counterpart and the NLSTV.
References
- Arbelaez et al (2010) Arbelaez P, Maire M, Fowlkes C, Malik J (2010) Contour detection and hierarchical image segmentation. IEEE transactions on pattern analysis and machine intelligence 33(5):898–916
- Bayram and Kamasak (2012) Bayram I, Kamasak ME (2012) Directional total variation. IEEE Signal Processing Letters 19(12):781–784
- Bertsekas (2014) Bertsekas DP (2014) Constrained optimization and Lagrange multiplier methods. Academic press
- Bhatia (2013) Bhatia R (2013) Matrix analysis, vol 169. Springer Science & Business Media
- Bredies et al (2010) Bredies K, Kunisch K, Pock T (2010) Total generalized variation. SIAM Journal on Imaging Sciences 3(3):492–526
- Buades et al (2011) Buades A, Coll B, Morel JM (2011) Non-local means denoising. Image Processing On Line 1:208–212
- Chan et al (2000) Chan T, Marquina A, Mulet P (2000) High-order total variation-based image restoration. SIAM Journal on Scientific Computing 22(2):503–516
- Demircan-Tureyen and Kamasak (2019) Demircan-Tureyen E, Kamasak ME (2019) On the direction guidance in structure tensor total variation based denoising. In: Iberian Conference on Pattern Recognition and Image Analysis, Springer, pp 89–100
- Demircan-Tureyen and Kamasak (2020) Demircan-Tureyen E, Kamasak ME (2020) Adaptive direction-guided structure tensor total variation. arXiv preprint arXiv:2001.05717
- Eckstein and Bertsekas (1992) Eckstein J, Bertsekas DP (1992) On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming 55(1-3):293–318
- Elmoataz et al (2008) Elmoataz A, Lezoray O, Bougleux S (2008) Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing. IEEE transactions on Image Processing 17(7):1047–1060
- Gilboa and Osher (2008) Gilboa G, Osher S (2008) Nonlocal operators with applications to image processing. Multiscale Modeling & Simulation 7(3):1005–1028
- Grasmair and Lenzen (2010) Grasmair M, Lenzen F (2010) Anisotropic total variation filtering. Applied Mathematics & Optimization 62(3):323–339
- Lefkimmiatis and Osher (2015) Lefkimmiatis S, Osher S (2015) Nonlocal structure tensor functionals for image regularization. IEEE Transactions on Computational Imaging 1(1):16–29
- Lefkimmiatis et al (2015) Lefkimmiatis S, Roussos A, Maragos P, Unser M (2015) Structure tensor total variation. SIAM Journal on Imaging Sciences 8(2):1090–1122
- Liu and Ye (2010) Liu J, Ye J (2010) Efficient L1/Lq norm regularization. arXiv preprint arXiv:1009.4766
- Lou et al (2015) Lou Y, Zeng T, Osher S, Xin J (2015) A weighted difference of anisotropic and isotropic total variation model for image processing. SIAM Journal on Imaging Sciences 8(3):1798–1823
- Papafitsoros and Schönlieb (2014) Papafitsoros K, Schönlieb CB (2014) A combined first and second order variational approach for image reconstruction. Journal of mathematical imaging and vision 48(2):308–338
- Rudin et al (1992) Rudin LI, Osher S, Fatemi E (1992) Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60(1-4):259–268
- Wang et al (2019a) Wang G, Lopez-Molina C, De Baets B (2019a) Multiscale edge detection using first-order derivative of anisotropic gaussian kernels. Journal of Mathematical Imaging and Vision 61(8):1096–1111
- Wang et al (2019b) Wang G, Lopez-Molina C, de Ulzurrun GVD, De Baets B (2019b) Noise-robust line detection using normalized and adaptive second-order anisotropic gaussian kernels. Signal Processing 160:252–262
- Zhang and Sun (2020) Zhang W, Sun C (2020) Corner detection using multi-directional structure tensor with multiple scales. International Journal of Computer Vision 128(2):438–459
- Zhang et al (2017) Zhang W, Zhao Y, Breckon TP, Chen L (2017) Noise robust image edge detection based upon the automatic anisotropic gaussian kernels. Pattern Recognition 63:193–205