

This work was partially supported by JSPS Grants-in-Aid (23K22762). This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Continuous Relaxation of Discontinuous Shrinkage Operator: Proximal Inclusion and Conversion

Masahiro Yukawa, Senior Member, IEEE (Department of Electronics and Electrical Engineering, Keio University, Yokohama, Japan)
Abstract

We present a principled way of deriving a continuous relaxation of a given discontinuous shrinkage operator, based on two fundamental results. First, the image of a point under the "set-valued" proximity operator of a nonconvex function is included in its image under the proximity operator of the function's lower semicontinuous (l.s.c.) 1-weakly-convex envelope. Second, the "set-valued" proximity operator of a proper l.s.c. 1-weakly-convex function is converted, via double inversion, to a "single-valued" proximity operator which is Lipschitz continuous. As a specific example, we derive a continuous relaxation of the discontinuous shrinkage operator associated with the reversely ordered weighted $\ell_1$ (ROWL) penalty. Numerical examples demonstrate potential advantages of the continuous relaxation.

Index Terms: convex analysis, proximity operator, weakly convex function

1 Introduction

Triggered by the seminal work [1, 2] on waveshrink by Donoho and Johnstone in 1994, the soft and hard shrinkage operators have been studied in the context of sparse modeling [3, 4], which has a wide range of signal processing applications such as magnetic resonance imaging, radar, sparse coding, compressed sensing/sampling, and signal dimensionality reduction, to name just a few [5]. The soft shrinkage operator is characterized as the proximity operator of the $\ell_1$ norm; the proximity operator of a convex function is nonexpansive (Lipschitz continuous with unity constant) in general. Because of this, it has widely been used in operator splitting algorithms [6, 7, 8]. The hard shrinkage operator, on the other hand, is "discontinuous", while it causes no extra bias in the estimates of large-magnitude coefficients. Hard shrinkage is derived from the set-valued proximity operator of the $\ell_0$ pseudo-norm.

The firm shrinkage operator [9] was proposed in 1997, possessing a good balance in the sense of (i) being Lipschitz continuous (with constant strictly greater than unity) and (ii) yielding nearly unbiased estimates of the large-magnitude coefficients. It is the (generalized) proximity operator of the minimax concave (MC) penalty [10, 11], which is weakly convex. Owing to this property, the proximal forward-backward splitting algorithm employing firm shrinkage has a guarantee of convergence to a global minimizer of the objective function [12, 13].

Another discontinuous shrinkage operator has been proposed in the context of image restoration based on the reversely ordered weighted $\ell_1$ (ROWL) penalty [14], which gives small (possibly zero) weights to dominant coefficients to avoid underestimation, so that edge sharpness and gradation smoothness of images are enhanced simultaneously. The ROWL shrinkage operator requires no knowledge about the "magnitude" of the dominant coefficients; instead, it essentially requires the "number" of those dominant components. This is in sharp contrast to firm shrinkage, which requires (at least a rough approximation of, or a bound on) the "magnitude" of the dominant coefficients but does not care about their "number". This implies that ROWL shrinkage would be preferable in situations where the magnitude of dominant components tends to change but the number (or at least a rough estimate of it) can be assumed to be known a priori. Despite this potential benefit, the ROWL shrinkage operator is discontinuous on the boundaries where some of the components share the same magnitude, provided that the corresponding weights are different. A natural question is the following: is it possible to convert the discontinuous operator to a continuous one?

To address the above question, our arguments in the present study rely on our recent result [13]: a given operator is the "single-valued" proximity operator of an $\eta$-weakly convex function for $\eta\in(0,1)$ if and only if it is a monotone Lipschitz-continuous gradient (MoL-Grad) denoiser (i.e., it can be expressed as a Lipschitz-continuous gradient operator of a differentiable convex function). See Fact 5. It has been shown in [13] that the operator-regularization approach (or the plug-and-play method [15]) employing a MoL-Grad denoiser actually solves a variational problem involving a weakly convex regularizer which is characterized explicitly by the denoiser.

In this paper, we present a pair of fundamental findings concerning the "set-valued" proximity operator (the "set-valued" proximity operator has been studied previously in the literature [16, 17]; see also [18]) of a (proper) nonconvex function, aiming to build a way of converting a discontinuous operator to its continuous relaxation. First, given a nonconvex function, the image of a point under the proximity operator is included in its image under the proximity operator of its lower-semicontinuous 1-weakly-convex envelope (Theorem 1). This explains the known fact that hard shrinkage can also be derived from the proximity operator of a certain weakly convex function [19, 20], as elaborated in Section 4.1. Second, the "set-valued" proximity operator of a (proper) lower-semicontinuous 1-weakly-convex function can be converted, via double inversion, to a MoL-Grad denoiser (Theorem 2). These proximal inclusion and conversion results lead to a principled way of deriving a continuous relaxation (a Lipschitz-continuous relaxation, more specifically) of a given discontinuous operator. As an illustrative example, we show that the firm shrinkage operator is obtained as a continuous relaxation of the hard shrinkage operator. Under the same principle, we derive a continuous relaxation of the ROWL shrinkage operator. Numerical examples show that the continuous relaxation has potential advantages over the original discontinuous shrinkage.

2 Preliminaries

Let $(\mathcal{H},\langle\cdot,\cdot\rangle)$ be a real Hilbert space with the induced norm $\|\cdot\|$. Let $\mathbb{R}$, $\mathbb{R}_+$, $\mathbb{R}_{++}$, and $\mathbb{N}$ denote the sets of real numbers, nonnegative real numbers, strictly positive real numbers, and nonnegative integers, respectively.

2.1 Lipschitz Continuity of Operator, Set-valued Operator

Let $\mathrm{Id}:\mathcal{H}\rightarrow\mathcal{H}: x\mapsto x$ denote the identity operator on $\mathcal{H}$. An operator $T:\mathcal{H}\rightarrow\mathcal{H}$ is Lipschitz continuous with constant $\kappa>0$ (or $\kappa$-Lipschitz continuous for short) if

$\|T(x)-T(y)\|\leq\kappa\|x-y\|,~\forall(x,y)\in\mathcal{H}^2.$ (1)

Let $2^{\mathcal{H}}$ denote the power set (the family of all subsets) of $\mathcal{H}$. An operator $\mathsf{T}:\mathcal{H}\rightarrow 2^{\mathcal{H}}$ is called a set-valued operator, where $\mathsf{T}(x)\subset\mathcal{H}$ for every $x\in\mathcal{H}$. Given a set-valued operator $\mathsf{T}$, a mapping $U:\mathcal{H}\rightarrow\mathcal{H}$ such that $U(x)\in\mathsf{T}(x)$ for every $x\in\mathcal{H}$ is called a selection of $\mathsf{T}$. The inverse of a set-valued operator $\mathsf{T}$ is defined by $\mathsf{T}^{-1}:\mathcal{H}\rightarrow 2^{\mathcal{H}}: y\mapsto\{x\in\mathcal{H}\mid y\in\mathsf{T}(x)\}$, which is again a set-valued operator in general.

An operator $\mathsf{T}:\mathcal{H}\rightarrow 2^{\mathcal{H}}$ is monotone if

$\langle x-y, u-v\rangle\geq 0,~\forall(x,u)\in\operatorname{gra}\mathsf{T},~\forall(y,v)\in\operatorname{gra}\mathsf{T},$ (2)

where $\operatorname{gra}\mathsf{T}:=\{(x,u)\in\mathcal{H}^2\mid u\in\mathsf{T}(x)\}$ is the graph of $\mathsf{T}$. The following fact is known [21, Proposition 20.10].

Fact 1 (Preservation of monotonicity)

Let $\mathcal{K}$ be a real Hilbert space, $A:\mathcal{H}\rightarrow 2^{\mathcal{H}}$ and $B:\mathcal{K}\rightarrow 2^{\mathcal{K}}$ be monotone operators, $L:\mathcal{H}\rightarrow\mathcal{K}$ be a bounded linear operator, and $c\in\mathbb{R}_+$ be a nonnegative constant. Then, the operators $A^{-1}$, $cA$, and $A+L^*BL$ are monotone, where $L^*$ denotes the adjoint operator of $L$.

A monotone operator $\mathsf{T}$ is maximally monotone if no other monotone operator has its graph containing $\operatorname{gra}\mathsf{T}$ properly.

2.2 Lower Semicontinuity and Convexity of Function

A function $f:\mathcal{H}\rightarrow(-\infty,+\infty]:=\mathbb{R}\cup\{+\infty\}$ is proper if its domain is nonempty, i.e., $\operatorname{dom} f:=\{x\in\mathcal{H}\mid f(x)<+\infty\}\neq\varnothing$. A function $f$ is lower semicontinuous (l.s.c.) on $\mathcal{H}$ if the level set $\operatorname{lev}_{\leq a}f:=\{x\in\mathcal{H}: f(x)\leq a\}$ is closed for every $a\in\mathbb{R}$. A function $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ is convex on $\mathcal{H}$ if $f(\alpha x+(1-\alpha)y)\leq\alpha f(x)+(1-\alpha)f(y)$ for every $(x,y,\alpha)\in\operatorname{dom} f\times\operatorname{dom} f\times[0,1]$. For $\eta\in(-\infty,+\infty]$, we say that $f$ is $\eta$-weakly convex if $f+(\eta/2)\|\cdot\|^2$ is convex. Here, $\eta<0$ (i.e., $f-(|\eta|/2)\|\cdot\|^2$ is convex) means that $f$ is $|\eta|$-strongly convex. Clearly, $\eta$-weak convexity of $f$ implies $\tilde{\eta}$-weak convexity of $f$ for an arbitrary $\tilde{\eta}\geq\eta$. When the "minimal" weak-convexity parameter is $\eta=+\infty$, $f+\alpha\|\cdot\|^2$ is nonconvex for every $\alpha\in\mathbb{R}$, i.e., $f$ is not weakly convex (for any parameter in $\mathbb{R}$). The set of all proper l.s.c. convex functions $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ is denoted by $\Gamma_0(\mathcal{H})$.

Given a proper function $f:\mathcal{H}\rightarrow(-\infty,+\infty]$, the Fenchel conjugate (a.k.a. the Legendre transform) of $f$ is $f^*:\mathcal{H}\rightarrow(-\infty,+\infty]: u\mapsto\sup_{x\in\mathcal{H}}(\langle x,u\rangle-f(x))$. The conjugate $f^{**}:=(f^*)^*$ of $f^*$ is called the biconjugate of $f$. In general, $f^*$ is l.s.c. and convex [21, Proposition 13.13]. (It may happen that $\operatorname{dom} f^*=\varnothing$; for instance, the conjugate of $f:\mathbb{R}\rightarrow\mathbb{R}: x\mapsto -x^2$ is $f^*(u)=+\infty$ for every $u\in\mathbb{R}$.) If $\operatorname{dom} f^*\neq\varnothing$, or equivalently if $f$ possesses a continuous affine minorant (i.e., there exists some $(a,b)\in\mathcal{H}\times\mathbb{R}$ such that $f(x)\geq\langle a,x\rangle+b$ for every $x\in\mathcal{H}$), then $f^*$ is proper and $f^{**}=\breve{f}$; otherwise $f^{**}(x)=-\infty$ for every $x\in\mathcal{H}$ [21, Proposition 13.45]. Here, $\breve{f}$ is the l.s.c. convex envelope of $f$, i.e., the largest l.s.c. convex function $g$ such that $f(x)\geq g(x)$, $\forall x\in\mathcal{H}$.

Fact 2 (Fenchel–Moreau Theorem [21])

Given a proper function $f:\mathcal{H}\rightarrow(-\infty,+\infty]$, the following equivalence and implication hold: $f\in\Gamma_0(\mathcal{H})\Leftrightarrow f=f^{**}\Rightarrow f^*\in\Gamma_0(\mathcal{H})$.

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be proper. Then, the set-valued operator $\partial f:\mathcal{H}\rightarrow 2^{\mathcal{H}}$ such that

$\partial f: x\mapsto\{z\in\mathcal{H}\mid\langle y-x, z\rangle+f(x)\leq f(y),~\forall y\in\mathcal{H}\}$ (3)

is the subdifferential of $f$ [21].

Fact 3

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be proper. Then, the following statements hold.

1. $\partial f$ is monotone [21, Example 20.3].

2. $\partial f(x)\subset(\partial f^*)^{-1}(x)$ for every $x\in\mathcal{H}$, and $(\partial f)^{-1}(u)\subset\partial f^*(u)$ for every $u\in\mathcal{H}$ [21, Proposition 16.10].

Fact 4

Let $f\in\Gamma_0(\mathcal{H})$. Then, the following statements hold.

1. $\partial f$ is maximally monotone [21, Theorem 20.25].

2. $(\partial f)^{-1}=\partial f^*$ [21, Corollary 16.30].

3. $\partial f(x)\neq\varnothing$ if $f$ is continuous at $x\in\mathcal{H}$ [21, Proposition 16.17].

4. $\partial f(x)=\{\nabla f(x)\}$ if $f$ is (Gâteaux) differentiable with its (Gâteaux) derivative $\nabla f$ [21, Proposition 17.31].

2.3 Proximity Operator of Nonconvex Function

Definition 1 (Proximity operator [18, 17])

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be proper. The proximity operator of $f$ of index $\gamma\in\mathbb{R}_{++}$ is then defined by

$\mathbf{Prox}_{\gamma f}:\mathcal{H}\rightarrow 2^{\mathcal{H}}: x\mapsto\operatorname{argmin}_{y\in\mathcal{H}}\Big(f(y)+\frac{1}{2\gamma}\|x-y\|^2\Big),$ (4)

which is set-valued in general.
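To make the set-valued nature of (4) concrete, here is a minimal numerical sketch (ours, not part of the paper; the grid resolution and the helper name `prox_grid` are illustrative choices) that approximates the proximity operator of a 1-D function by brute-force minimization and returns all near-minimizers:

```python
import numpy as np

def prox_grid(f, x, gamma=1.0, lo=-4.0, hi=4.0, n=2**18 + 1, tol=1e-12):
    """Approximate the set-valued Prox_{gamma f}(x) of a 1-D function f by
    collecting all near-minimizers of f(y) + (y-x)^2/(2*gamma) on a grid
    (dyadic step, so y = 0 is hit exactly)."""
    y = np.linspace(lo, hi, n)
    vals = f(y) + (y - x) ** 2 / (2.0 * gamma)
    return y[vals <= vals.min() + tol]

g = lambda y: 0.5 * (y != 0.0)  # (1/2)||y||_0, a nonconvex test function
print(prox_grid(g, 0.5))        # [0.]      single-valued here
print(prox_grid(g, 1.0))        # [0. 1.]   set-valued at the threshold
print(prox_grid(g, 1.5))        # [1.5]     identity beyond the threshold
```

The test function anticipates $g_\tau$ of Section 4.1 with $\tau:=1$: the output is a two-point set exactly at $|x|=\tau$, in accordance with (12) below.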

We present a slight extension of the previous result [13, Lemma 1] below.

Lemma 1

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be a proper function. Then, for every positive constant $\gamma\in\mathbb{R}_{++}$, it holds that

$\mathbf{Prox}_{\gamma f}=\Big[\partial\Big(f+\frac{1}{2\gamma}\|\cdot\|^2\Big)\Big]^{-1}\circ(\gamma^{-1}\mathrm{Id}),$ (5)

which is monotone.

Proof.

The case of $\gamma:=1$ is given in [13, Lemma 1], from which the following equivalence can be verified:

$p\in\mathbf{Prox}_{\gamma f}(x)=\left[\partial\left(\gamma f+(1/2)\|\cdot\|^2\right)\right]^{-1}(x)$
$\Leftrightarrow~ x\in\partial\left(\gamma f+(1/2)\|\cdot\|^2\right)(p)=\gamma\,\partial\left(f+(\gamma^{-1}/2)\|\cdot\|^2\right)(p)$
$\Leftrightarrow~ p\in\left[\partial\left(f+(\gamma^{-1}/2)\|\cdot\|^2\right)\right]^{-1}(\gamma^{-1}x).$ (6)

Finally, monotonicity of $\mathbf{Prox}_{\gamma f}$ can readily be verified by combining Facts 1 and 3.1. ∎

Definition 2 (Single-valued proximity operator [13])

If $\mathbf{Prox}_{\gamma f}$ is single-valued, it is denoted by $\mathrm{s\text{-}Prox}_{\gamma f}:\mathcal{H}\rightarrow\mathcal{H}$, which is referred to as the s-prox operator of $f$ of index $\gamma$.

As a particular instance, if $f+(\eta/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$ for some constant $\eta\in(-\infty,\gamma^{-1})$, existence and uniqueness of the minimizer are automatically ensured. In the convex case of $f\in\Gamma_0(\mathcal{H})$, $\mathrm{s\text{-}Prox}_{\gamma f}$ reduces to the classical Moreau proximity operator [22].
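As a concrete instance of the convex case, the following sketch (ours, for illustration) implements the s-prox operator of $\gamma|\cdot|$ on $\mathbb{R}$, i.e., the soft shrinkage operator, and checks its nonexpansiveness at a pair of points:

```python
import numpy as np

def soft(x, gamma):
    """s-Prox of gamma*|.| (Moreau's proximity operator of the absolute
    value): the soft shrinkage operator with threshold gamma."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

# Nonexpansiveness (Lipschitz constant 1), as for any f in Gamma_0:
x, y = 1.3, -0.2
print(abs(soft(x, 1.0) - soft(y, 1.0)) <= abs(x - y))  # True
```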

Fact 5 ([13] MoL-Grad Denoiser)

Let $T:\mathcal{H}\rightarrow\mathcal{H}$. Then, for every $\eta\in[0,1)$, the following two conditions are equivalent. (The case of $\eta:=0$ is due to [22].)

(C1) $T=\mathrm{s\text{-}Prox}_{\varphi}$ for some $\varphi:\mathcal{H}\rightarrow(-\infty,+\infty]$ such that $\varphi+(\eta/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$.

(C2) $T$ is a $(1-\eta)^{-1}$-Lipschitz continuous gradient of a (Fréchet) differentiable convex function $\psi\in\Gamma_0(\mathcal{H})$. In other words, $T$ is a monotone $(1-\eta)^{-1}$-Lipschitz-continuous gradient (of a differentiable function).

If (C1), or equivalently (C2), is satisfied, then it holds that $\varphi=\psi^*-(1/2)\|\cdot\|^2$ $(\Leftrightarrow\psi=(\varphi+(1/2)\|\cdot\|^2)^*)$.

3 Proximal Inclusion and Conversion

Suppose that a given discontinuous (monotone) operator $T$ is a selection of the (set-valued) proximity operator of some proper function. Then, there exists a principled way of constructing a continuous relaxation of $T$ which is the s-prox operator of a certain weakly convex function. This is the main claim of this article, supported by the key results of proximal inclusion and conversion. Some other results on set-valued proximity operators are also presented. All results (lemma, propositions, theorems, corollaries) in what follows are original contributions of this work.

3.1 Interplay of Maximality of Monotone Operator and Weak Convexity of Function

Fact 5 presented above gives an interplay between $\eta$-weakly convex functions for $\eta\in[0,1)$ and the $(1-\eta)^{-1}$-Lipschitz continuous gradients of smooth convex functions, stemming essentially from the duality between strongly convex functions and smooth convex functions. Now, the question is the following: is there any such relation in the case of $\eta\in[1,+\infty)$?

While the proximity operator $\mathrm{s\text{-}Prox}_{\varphi}$ is (Lipschitz) continuous in the case of $\eta<1$, the case of $\eta\geq 1$ (more specifically, the case in which $\varphi+(\eta/2)\|\cdot\|^2\notin\Gamma_0(\mathcal{H})$ for any $\eta<1$) includes functions whose "set-valued" proximity operator contains a discontinuous shrinkage operator as a selection. The following proposition concerns the case of $\eta=1$, which will be linked to the case of $\eta>1$ in Section 3.2.

Proposition 1

Given a set-valued operator $\mathsf{T}:\mathcal{H}\rightarrow 2^{\mathcal{H}}$, the following statements are equivalent.

1. $\mathsf{T}=\partial\psi$ for some convex function $\psi\in\Gamma_0(\mathcal{H})$.

2. $\mathsf{T}=\mathbf{Prox}_{\phi}$ for some $\phi$ such that $\phi+(1/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$.

Moreover, if statements 1 and 2 are true, it holds that $\phi=\psi^*-(1/2)\|\cdot\|^2$ $(\Leftrightarrow\psi=(\phi+(1/2)\|\cdot\|^2)^*)$.

Proof.

The equivalence 1) $\Leftrightarrow$ 2) can be verified by showing the following equivalence:

$\mathsf{T}=\partial\psi,~\exists\psi\in\Gamma_0(\mathcal{H})$
$\Leftrightarrow~\mathsf{T}=(\partial\psi^*)^{-1}=\Big[\partial\Big(\underbrace{\big(\psi^*-\tfrac{1}{2}\|\cdot\|^2\big)}_{=:\phi}+\tfrac{1}{2}\|\cdot\|^2\Big)\Big]^{-1}=\mathbf{Prox}_{\psi^*-(1/2)\|\cdot\|^2},~\exists\psi\in\Gamma_0(\mathcal{H}).$

(Proof of $\Rightarrow$) The last equality can be verified by Lemma 1, and $\phi+(1/2)\|\cdot\|^2=\psi^*\in\Gamma_0(\mathcal{H})$ follows by Fact 2.

(Proof of $\Leftarrow$) Letting $\psi:=(\phi+(1/2)\|\cdot\|^2)^*$, we have $\psi\in\Gamma_0(\mathcal{H})$ again by Fact 2, so that $\mathsf{T}=(\partial(\phi+(1/2)\|\cdot\|^2))^{-1}=\partial[(\phi+(1/2)\|\cdot\|^2)^*]=\partial\psi$. ∎

Proposition 1 bridges the subdifferential of a convex function and the proximity operator of a 1-weakly convex function, indicating an interplay between maximality of monotone operators and 1-weak convexity of functions.

Remark 1 (Role of Monotonicity)

From Proposition 1 together with Fact 4.1, $\phi+(1/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$ implies that $\mathsf{T}=\mathbf{Prox}_{\phi}$ is maximally monotone. Viewing the proposition in light of Rockafellar's cyclic monotonicity theorem [21, Theorem 22.18], moreover, one can see that $\mathsf{T}=\mathbf{Prox}_{\phi}$ with $\phi+(1/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$ if and only if $\mathsf{T}$ is maximally cyclically monotone. (An operator $\mathsf{T}:\mathcal{H}\rightarrow 2^{\mathcal{H}}$ is cyclically monotone if, for every integer $n\geq 2$, $u_i\in\mathsf{T}(x_i)$ for $i\in\{1,2,\cdots,n\}$ and $x_{n+1}=x_1$ imply $\sum_{i=1}^n\langle x_{i+1}-x_i,u_i\rangle\leq 0$. It is maximally cyclically monotone if no other cyclically monotone operator has its graph containing $\operatorname{gra}\mathsf{T}$ properly [21]. An operator $\mathsf{T}:\mathcal{H}\rightarrow 2^{\mathcal{H}}$ is maximally cyclically monotone if and only if $\mathsf{T}=\partial f$ for some $f\in\Gamma_0(\mathcal{H})$ (Rockafellar's cyclic monotonicity theorem). It is clear under Fact 4.1 that a maximally cyclically monotone operator is maximally monotone; the converse is true when $\mathcal{H}=\mathbb{R}$ but not in general.) In words, maximal cyclic monotonicity characterizes the property that $\mathsf{T}$ can be expressed as the proximity operator (which is set-valued in general) of a 1-weakly convex function. Note that, whereas the proximity operator of a 1-weakly convex function is maximally monotone, the converse is not true in general; i.e., "maximal monotonicity" itself does not ensure that $\mathsf{T}$ can be expressed as the proximity operator of a 1-weakly convex function.

In Fact 5, monotonicity plays the role of ensuring the convexity of $\psi$. Indeed, the assumption on $T$ in Fact 5 (i.e., the condition for a MoL-Grad denoiser) implies maximal cyclic monotonicity, since $\eta$-weak convexity for $\eta:=1-\beta\in(0,1)$ implies 1-weak convexity. The assumption required for $\eta\in(0,1)$ is actually even stronger than maximal cyclic monotonicity (which is required for $\eta=1$).

3.2 Proximal Inclusion

We start with the definition of the l.s.c. 1-weakly-convex envelope.

Definition 3 (L.s.c. 1-weakly-convex envelope)

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be a proper function such that $(f+(1/2)\|\cdot\|^2)^*$ is proper as well. Then, $(f+(1/2)\|\cdot\|^2)^{**}-(1/2)\|\cdot\|^2$ is the l.s.c. 1-weakly-convex envelope of $f$. The notation $\widetilde{(\cdot)}$ will be used to denote the l.s.c. 1-weakly-convex envelope, as in $\widetilde{f}:=(f+(1/2)\|\cdot\|^2)^{**}-(1/2)\|\cdot\|^2$. The envelope $\widetilde{f}$ is the largest (proper) l.s.c. 1-weakly-convex function $\phi:\mathcal{H}\rightarrow(-\infty,+\infty]$ such that $f(x)\geq\phi(x)$, $\forall x\in\mathcal{H}$.
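Definition 3 can also be evaluated numerically by discretizing the two conjugations. The following sketch (ours; grid-based, and essentially exact here only because the maximizers of both conjugations happen to lie on the grid) computes the l.s.c. 1-weakly-convex envelope of the scaled $\ell_0$ penalty and recovers the MC-type function that will appear in (14) of Section 4.1:

```python
import numpy as np

def conjugate(xs, fx):
    """Discrete Fenchel conjugate f*(u) = sup_x (x*u - f(x)) over the grid xs,
    evaluated at u = xs."""
    return np.max(np.outer(xs, xs) - fx[None, :], axis=1)

def wc_envelope(xs, fx):
    """L.s.c. 1-weakly-convex envelope (f + (1/2)(.)^2)** - (1/2)(.)^2."""
    h = fx + 0.5 * xs**2
    return conjugate(xs, conjugate(xs, h)) - 0.5 * xs**2

xs = np.linspace(-4.0, 4.0, 2**11 + 1)    # dyadic step: 0 is on the grid
tau = 1.0
g = (tau**2 / 2) * (xs != 0.0)            # g_tau: scaled l0 pseudo-norm
env = wc_envelope(xs, g)
mc = np.where(np.abs(xs) <= tau, tau * np.abs(xs) - xs**2 / 2, tau**2 / 2)
print(np.max(np.abs(env - mc)))           # ~1e-15: matches (14) with tau = 1
```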

The first key result is presented below.

Theorem 1 (Proximal inclusion)

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be a proper function such that $(f+(1/2)\|\cdot\|^2)^*$ is proper as well. Then, the following inclusion holds between the proximity operators of $f$ and its l.s.c. 1-weakly-convex envelope $\widetilde{f}$:

$\mathbf{Prox}_f(x)\subset\mathbf{Prox}_{\widetilde{f}}(x),~\forall x\in\mathcal{H},$ (7)

i.e., $\operatorname{gra}\mathbf{Prox}_f\subset\operatorname{gra}\mathbf{Prox}_{\widetilde{f}}$.

Proof.

By the assumption, we have $(f+(1/2)\|\cdot\|^2)^*\in\Gamma_0(\mathcal{H})$, and thus $\widetilde{f}+(1/2)\|\cdot\|^2=(f+(1/2)\|\cdot\|^2)^{**}\in\Gamma_0(\mathcal{H})$ by Fact 2. Hence, for every $x\in\mathcal{H}$, it follows that

$\mathbf{Prox}_f(x)=[\partial(f+(1/2)\|\cdot\|^2)]^{-1}(x)\subset\partial[(f+(1/2)\|\cdot\|^2)^*](x)=[\partial((f+(1/2)\|\cdot\|^2)^{**})]^{-1}(x)=[\partial(\widetilde{f}+(1/2)\|\cdot\|^2)]^{-1}(x)=\mathbf{Prox}_{\widetilde{f}}(x),$ (8)

where the first and last equalities are due to Lemma 1, the inclusion is due to Fact 3.2, and the third equality is due to Fact 4.2. ∎

Theorem 1 implies that, if a discontinuous operator is a selection of $\mathbf{Prox}_f$ for a nonconvex function $f$, it can also be expressed as a selection of $\mathbf{Prox}_{\widetilde{f}}$ for a 1-weakly-convex function $\widetilde{f}$, which coincides with the l.s.c. 1-weakly-convex envelope of $f$. This indicates that, when seeking a shrinkage operator as (a selection of) the proximity operator, one may restrict attention to the class of weakly convex functions.
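The inclusion (7) can be verified numerically in the same brute-force manner as the sketch given after Definition 1; here $f:=(1/2)\|\cdot\|_0$ and its envelope (the functions $g_1$ and $\widetilde{g}_1$ of Section 4.1) are hard-coded for illustration (our own choice of test case):

```python
import numpy as np

def prox_set(f, x, lo=-4.0, hi=4.0, n=2**18 + 1, tol=1e-12):
    """All near-minimizers of f(y) + (y-x)^2/2 on a dyadic grid."""
    y = np.linspace(lo, hi, n)
    v = f(y) + (y - x) ** 2 / 2.0
    return y[v <= v.min() + tol]

f = lambda y: 0.5 * (y != 0.0)  # g_1 = (1/2)||.||_0
ft = lambda y: np.where(np.abs(y) <= 1.0, np.abs(y) - y**2 / 2, 0.5)  # envelope

for x in (0.5, 1.0, 1.5):
    S, T = prox_set(f, x), prox_set(ft, x)
    print(x, len(S), len(T), set(S) <= set(T))  # inclusion (7): always True
```

At $x=1$ the envelope's proximity operator returns (the grid points of) the whole interval $[0,1]$, strictly larger than the two-point set for $f$, while the inclusion holds at every tested $x$.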

We remark that Theorem 1 is trivial if $f+(1/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$, because $\widetilde{f}=f$ in that case by Fact 2. Hence, our primary focus in Theorem 1 is on $\eta$-weakly convex functions for $\eta>1$, although Theorem 1 itself has no such restriction. The following corollary is a direct consequence of Fact 5 ($\eta<1$), Proposition 1 ($\eta=1$), and Theorem 1 ($\eta>1$).

Corollary 1

Let $f+(\eta/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$ for $\eta\in(-\infty,+\infty]$. Then, the following statements hold.

1. If $\eta\leq 1$, $\mathbf{Prox}_f$ is maximally cyclically monotone. In particular, if $\eta<1$, the proximity operator is single-valued, and $\mathrm{s\text{-}Prox}_f$ is $(1-\eta)^{-1}$-Lipschitz continuous.

2. If $\eta>1$ (more specifically, if $f+(1/2)\|\cdot\|^2\notin\Gamma_0(\mathcal{H})$), $\mathbf{Prox}_f$ cannot be maximally cyclically monotone, as $\mathbf{Prox}_f(x)\subsetneq\mathbf{Prox}_{(f+(1/2)\|\cdot\|^2)^{**}-(1/2)\|\cdot\|^2}(x)$ for every $x\in\mathcal{H}$, provided that $(f+(1/2)\|\cdot\|^2)^*$ is proper.

Remark 2 (On Theorem 1)
1. If the properness assumption on $(f+(1/2)\|\cdot\|^2)^*$ is violated, it holds that $\varnothing=\operatorname{dom}(f+(1/2)\|\cdot\|^2)^*\supset\operatorname{dom}\partial(f+(1/2)\|\cdot\|^2)^*$ [21, Proposition 16.4], which implies that $\operatorname{dom}\partial(f+(1/2)\|\cdot\|^2)^*=\varnothing$. Thus, by Lemma 1, it can be verified, for every $x\in\mathcal{H}$, that $\varnothing=\partial(f+(1/2)\|\cdot\|^2)^*(x)\supset[\partial(f+(1/2)\|\cdot\|^2)]^{-1}(x)=\mathbf{Prox}_f(x)$ (cf. Section 2), which implies that $\mathbf{Prox}_f(x)=\varnothing$.

2. By Fact 2, $\widetilde{f}=f$ if and only if $f+(1/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$.

3. The operator $\mathbf{Prox}_{\widetilde{f}}$ is maximally cyclically monotone.

3.3 Proximal Conversion

The second key result, presented below, gives a principled way of converting the set-valued proximity operator of a 1-weakly convex function into a MoL-Grad denoiser.

Theorem 2 (Proximal conversion)

Let $\phi$ be a function such that $\phi+(1/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$. Then, the proximity operator of the $(\delta+1)^{-1}$-weakly convex function $\phi/(\delta+1)$ for the relaxation parameter $\delta\in\mathbb{R}_{++}$ can be expressed as

$\mathrm{s\text{-}Prox}_{\phi/(\delta+1)}=\left[\mathbf{Prox}_{\phi}^{-1}+\delta\,\mathrm{Id}\right]^{-1}\circ(\delta+1)\mathrm{Id},$ (9)

which is $(1+1/\delta)$-Lipschitz continuous.

Proof.

Since $\phi$ is 1-weakly convex, it is clear that $\phi/(\delta+1)$ is $\eta$-weakly convex for $\eta:=(\delta+1)^{-1}\in(0,1)$. Letting $f:=\phi$ and $\gamma:=1/(\delta+1)$ in Lemma 1 yields

$\mathrm{s\text{-}Prox}_{\phi/(\delta+1)}=\left[\partial(\phi+((\delta+1)/2)\|\cdot\|^2)\right]^{-1}\circ(\delta+1)\mathrm{Id}.$ (10)

Here, since $\phi+(1/2)\|\cdot\|^2\in\Gamma_0(\mathcal{H})$, it holds that $\partial(\phi+((\delta+1)/2)\|\cdot\|^2)=\partial(\phi+(1/2)\|\cdot\|^2)+\partial[(\delta/2)\|\cdot\|^2]=\partial(\phi+(1/2)\|\cdot\|^2)+\delta\,\mathrm{Id}=\mathbf{Prox}_{\phi}^{-1}+\delta\,\mathrm{Id}$, where the last equality can be verified by letting $f:=\phi$ and $\gamma:=1$ in Lemma 1. From Fact 5, the Lipschitz constant is given by $(1-\eta)^{-1}=1+1/\delta$, which completes the proof. ∎
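As a numerical sanity check of Theorem 2 (ours, not from the paper; the test function is the 1-weakly convex $\widetilde{g}_1$ that will appear in (14) of Section 4.1), one can evaluate $\mathrm{s\text{-}Prox}_{\phi/(\delta+1)}$ by direct minimization on a grid and estimate its Lipschitz constant:

```python
import numpy as np

def phi(y):
    """A 1-weakly convex test function: phi + (1/2)(.)^2 is in Gamma_0(R)."""
    return np.where(np.abs(y) <= 1.0, np.abs(y) - y**2 / 2, 0.5)

Y = np.linspace(-4.0, 4.0, 2**17 + 1)

def sprox(x, delta):
    """s-Prox_{phi/(delta+1)}(x) by brute-force minimization over the grid Y."""
    return Y[np.argmin(phi(Y) / (delta + 1) + (Y - x) ** 2 / 2)]

delta = 0.5
xs = np.linspace(-2.0, 2.0, 401)
ps = np.array([sprox(x, delta) for x in xs])
slopes = np.abs(np.diff(ps) / np.diff(xs))
print(slopes.max(), 1 + 1 / delta)  # empirical slope ~ 3 = 1 + 1/delta
```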

Remark 3

Theorems 1 and 2 give a constructive proof for the existence of a continuous relaxation of a discontinuous operator, say $T$, provided that $T$ is a selection of $\mathbf{Prox}_f$ for some proper function $f:\mathcal{H}\rightarrow(-\infty,+\infty]$. Such an $f$ exists if and only if there exists a proper function $\psi:\mathcal{H}\rightarrow(-\infty,+\infty]$ such that $T^{-1}(u)\subset\partial\psi(u)$, $\forall u\in\operatorname{range} T$. Concrete examples will be given in Section 4.

3.4 Surjectivity of Mapping and Continuity of Its Associated Function

An interplay between surjectivity of an operator $T$ and continuity (under weak convexity) of a function can be seen through the 'lens' of the proximity operator. (It is known that convexity of $f:\mathcal{H}\rightarrow\mathbb{R}$ implies continuity of $f$ when $\mathcal{H}$ is finite dimensional [21, Corollary 8.40].)

Lemma 2

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be a proper function. Then, the following two statements are equivalent.

1. $\operatorname{dom}\partial f=\mathcal{H}$.

2. $f:\mathcal{H}\rightarrow\mathbb{R}$ is convex and continuous over $\mathcal{H}$.

Proof.

1) $\Rightarrow$ 2): By [21, Proposition 16.5] and Fact 2, it holds that $\operatorname{dom}\partial f=\mathcal{H}\Rightarrow f=f^{**}\Leftrightarrow f\in\Gamma_0(\mathcal{H})$. Hence, invoking [21, Propositions 16.4 and 16.27], we obtain $\mathcal{H}=\operatorname{dom}\partial f\subset\operatorname{dom} f=\mathcal{H}\Rightarrow\operatorname{int}\operatorname{dom} f=\operatorname{cont} f=\mathcal{H}$.

2) $\Rightarrow$ 1): Clear from [21, Proposition 16.17(ii)]. ∎

Proposition 2

Let $f:\mathcal{H}\rightarrow(-\infty,+\infty]$ be a proper l.s.c. function. Define an operator $U:\mathcal{H}\rightarrow\mathcal{H}$ such that $U(x)\in\mathbf{Prox}_f(x)$ for every $x\in\mathcal{H}$. Consider the following two statements.

1. $\operatorname{range} U=\mathcal{H}$.

2. $f+(1/2)\|\cdot\|^2$ is convex and continuous.

Then, 1) $\Rightarrow$ 2). Assume that $U=\mathrm{s\text{-}Prox}_f$; i.e., $f+(1/2)\|\cdot-x\|^2$ has a unique minimizer for every $x\in\mathcal{H}$. Then, 1) $\Leftrightarrow$ 2).

Proof.

By the assumption, it holds that $\mathcal{H}=\operatorname{range} U\subset\operatorname{range}[\partial(f+(1/2)\|\cdot\|^2)]^{-1}=\operatorname{dom}\partial(f+(1/2)\|\cdot\|^2)=\mathcal{H}$. Hence, by Lemma 2, $\operatorname{range} U=\mathcal{H}$ implies convexity and continuity of $f+(1/2)\|\cdot\|^2$. Under the assumption that $U=\mathrm{s\text{-}Prox}_f$, by Lemma 2 again, $\operatorname{range} U=\operatorname{dom}\partial(f+(1/2)\|\cdot\|^2)=\mathcal{H}$. ∎

4 Application: Continuous Relaxation of Discontinuous Operator

As an illustrative example, we first show how hard shrinkage is converted to a continuous operator by leveraging Theorems 1 and 2. We then apply the same idea to the ROWL-based discontinuous operator to obtain its continuous relaxation.

4.1 An Illustrative Example: Converting Discontinuous Hard Shrinkage to Continuous Firm Shrinkage

The hard shrinkage operator with the threshold $\tau\in\mathbb{R}_{++}$ is defined by [4]

${\rm hard}_\tau:\mathbb{R}\rightarrow\mathbb{R}: x\mapsto x\,1(|x|>\tau):=\begin{cases}0,&\text{if }|x|\leq\tau,\\ x,&\text{if }|x|>\tau,\end{cases}$ (11)

for which it holds that

${\rm hard}_\tau(x)\in\mathbf{Prox}_{g_\tau}(x)=\begin{cases}\{0\},&\text{if }|x|<\tau,\\ \{0,x\},&\text{if }|x|=\tau,\\ \{x\},&\text{if }|x|>\tau,\end{cases}$ (12)

where

$g_\tau(x):=\frac{\tau^2}{2}\|x\|_0=\begin{cases}0,&\text{if }x=0,\\ \frac{\tau^2}{2},&\text{if }x\neq 0,\end{cases}\quad\forall x\in\mathbb{R}.$ (13)

The l.s.c. 1-weakly convex envelope $\widetilde{g}_\tau:=(g_\tau+(1/2)(\cdot)^2)^{**}-(1/2)(\cdot)^2$ of $g_\tau$ is given by (see Fig. 1)

$\widetilde{g}_\tau(x)=\tau\varphi_\tau^{\rm MC}(x)=\begin{cases}\tau|x|-\frac{1}{2}x^2,&\text{if }|x|\leq\tau,\\ \frac{\tau^2}{2},&\text{if }|x|>\tau.\end{cases}$ (14)

Here, $\varphi_{\tau_2}^{\rm MC}(x):=\begin{cases}|x|-\frac{1}{2\tau_2}x^2,&\text{if }|x|\leq\tau_2,\\ \frac{1}{2}\tau_2,&\text{if }|x|>\tau_2,\end{cases}$ is the MC penalty with parameter $\tau_2\in\mathbb{R}_{++}$ [10, 11].

Figure 1: The $\ell_0$ pseudo-norm $\|\cdot\|_0$, its l.s.c. 1-weakly-convex envelope $\widetilde{\|\cdot\|}_0$, and related functions. The biconjugate $(\|\cdot\|_0+\|\cdot\|^2/2)^{**}$ is the l.s.c. convex envelope of $\|\cdot\|_0+\|\cdot\|^2/2$.

The proximity operator of $\widetilde{g}_\tau$ is given by

$\mathbf{Prox}_{\widetilde{g}_\tau}: x\mapsto\begin{cases}\{0\},&\text{if }|x|<\tau,\\ \overline{\rm conv}\{0,x\},&\text{if }|x|=\tau,\\ \{x\},&\text{if }|x|>\tau,\end{cases}$ (15)

where $\overline{\rm conv}$ denotes the closed convex hull ($\overline{\rm conv}\{0,\tau\}=[0,\tau]$ and $\overline{\rm conv}\{0,-\tau\}=[-\tau,0]$). Comparing (12) and (15), it can be seen that

$({\rm hard}_\tau(x)\in)~\mathbf{Prox}_{g_\tau}(x)\subset\mathbf{Prox}_{\widetilde{g}_\tau}(x),~\forall x\in\mathbb{R},$ (16)

consistent with Theorem 1. This implies that ${\rm hard}_\tau$ is also a selection of the proximity operator $\mathbf{Prox}_{\widetilde{g}_\tau}$ of the 1-weakly convex function $\widetilde{g}_\tau$, which is maximally monotone. Indeed, $\mathbf{Prox}_{\widetilde{g}_\tau}$ in (15) is the (unique) maximally monotone extension of ${\rm hard}_\tau$ [20] (cf. Remark 2).

We now invoke Theorem 2 to obtain the continuous operator

$\mathrm{s\text{-}Prox}_{\widetilde{g}_\tau/(\delta+1)}=\big[\mathbf{Prox}_{\widetilde{g}_\tau}^{-1}+\delta\,\mathrm{Id}\big]^{-1}\circ(\delta+1)\mathrm{Id}={\rm firm}_{\tau/(\delta+1),\tau},$ (17)

where the firm shrinkage operator [9] for the thresholds $\tau_1\in\mathbb{R}_{++}$, $\tau_2\in(\tau_1,+\infty)$ is defined by ${\rm firm}_{\tau_1,\tau_2}:=\mathrm{s\text{-}Prox}_{\tau_1\varphi_{\tau_2}^{\rm MC}}:\mathbb{R}\rightarrow\mathbb{R}: x\mapsto 0$ if $|x|\leq\tau_1$; $x\mapsto{\rm sign}(x)\frac{\tau_2(|x|-\tau_1)}{\tau_2-\tau_1}$ if $\tau_1<|x|\leq\tau_2$; $x\mapsto x$ if $|x|>\tau_2$. We remark that $\tau_1\varphi_{\tau_2}^{\rm MC}$ is $\tau_1/\tau_2$-weakly convex ($\widetilde{g}_\tau/(\delta+1)$ in (17) is $1/(\delta+1)$-weakly convex) and its "single-valued" proximity operator gives firm shrinkage, while $\widetilde{g}_\tau$ in (16) is 1-weakly convex and a selection of its "set-valued" proximity operator gives hard shrinkage. We also mention that the limit of the continuous operator ${\rm firm}_{\tau/(\delta+1),\tau}$ as the relaxation parameter $\delta\downarrow 0$ coincides with a discontinuous operator; i.e., $\lim_{\delta\downarrow 0}{\rm firm}_{\tau/(\delta+1),\tau}(x)=\widetilde{\rm hard}_\tau(x)=x\,1(|x|\geq\tau)\in\mathbf{Prox}_{g_\tau}(x)$ for every $x\in\mathbb{R}$ (see Fig. 2).
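In code (a minimal sketch, ours; the sample points and $\delta$ values are illustrative), the pointwise convergence of ${\rm firm}_{\tau/(\delta+1),\tau}$ to $\widetilde{\rm hard}_\tau$ is visible directly:

```python
import numpy as np

def hard_tilde(x, tau):
    """The limit selection x * 1(|x| >= tau) of Prox_{g_tau}."""
    return np.where(np.abs(x) >= tau, x, 0.0)

def firm(x, t1, t2):
    """Firm shrinkage: 0 below t1, identity beyond t2, linear in between."""
    ax, s = np.abs(x), np.sign(x)
    mid = s * t2 * (ax - t1) / (t2 - t1)
    return np.where(ax <= t1, 0.0, np.where(ax <= t2, mid, x))

tau = 1.0
xs = np.array([0.5, 0.9, 1.0, 1.5])
for delta in (1.0, 0.25, 0.01):
    print(delta, firm(xs, tau / (delta + 1), tau))
print(hard_tilde(xs, tau))  # the delta -> 0 limit: [0. 0. 1. 1.5]
```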

Figure 2: Pointwise convergence of ${\rm firm}_{1/(\delta+1),1}$ to $\widetilde{\rm hard}_1$ as $\delta\downarrow 0$.

Fig. 3(a)–(g) illustrates the process of obtaining the continuous relaxation $\mathrm{s\text{-}Prox}_{\frac{2}{3}\widetilde{g}_1}$ of the discontinuous operator ${\rm hard}_1$, corresponding to the case of $\tau:=1$ and $\delta:=1/2$. Comparing the graphs of the discontinuous operator ${\rm hard}_1$, $\mathbf{Prox}_{g_1}$, and $\mathbf{Prox}_{\widetilde{g}_1}(=:\mathsf{H})$, one can observe that (see Theorem 1 for the second inclusion)

$\operatorname{gra}{\rm hard}_1\subset\operatorname{gra}\mathbf{Prox}_{g_1}\subset\operatorname{gra}\mathbf{Prox}_{\widetilde{g}_1}.$ (18)

The maximally monotone operator $\mathbf{Prox}_{\widetilde{g}_1}$ (see Remark 1) is then converted to $\mathrm{s\text{-}Prox}_{(2/3)\widetilde{g}_1}={\rm firm}_{2\tau/3,\tau}$ in a step-by-step manner using (17) (which is based on Theorem 2). Inspecting the figure under Corollary 1, Figs. 3(b) and 3(h) correspond to Corollary 1.2 (not maximally monotone) for $\eta:=+\infty$ and $\eta:=4$, respectively, and Figs. 3(c) and 3(g) correspond to Corollary 1.1 (maximally monotone) for $\eta:=1$ and $\eta:=2/3$, respectively.

Letting $\eta:=1/(\delta+1)$, (17) for $\delta\in\mathbb{R}_{++}$ concerns the case of $\eta\in(0,1)$. In the case of $\eta\in[1,+\infty)$, on the other hand, the proximity operator of the $\eta$-weakly convex function $\eta\widetilde{g}_\tau$ is set-valued. Actually, it is not difficult to verify that $\mathbf{Prox}_{\eta\widetilde{g}_\tau}=\mathbf{Prox}_{\eta g_\tau}$ for every $\eta\in(1,+\infty)$. Note here that this is not true for $\eta:=1$, i.e., $\mathbf{Prox}_{\widetilde{g}_\tau}\neq\mathbf{Prox}_{g_\tau}$, as can be seen from Figs. 3(b) and 3(c). See Fig. 3(h) for the case of $\eta:=4$.

Figure 3: Graphs of (a) ${\rm hard}_1$; the proximity operators of (b) $g_1=(1/2)\|\cdot\|_0$ ($\eta=+\infty$), (c) $\mathsf{H}:=\mathbf{Prox}_{\widetilde{g}_1}$ ($\eta=1$), and (g) $(2/3)\widetilde{g}_1$ ($\eta=2/3$); (d) $\mathsf{H}^{-1}$, (e) $\mathsf{H}^{-1}+0.5\,\mathrm{Id}$, and (f) $(\mathsf{H}^{-1}+0.5\,\mathrm{Id})^{-1}$, the intermediate operators in the conversion from (c) to (g); and (h) $\mathbf{Prox}_{4\widetilde{g}_1}=\mathbf{Prox}_{4g_1}$ ($\eta=4$).

4.2 eROWL Shrinkage: Continuous Relaxation of ROWL Shrinkage Operator

We have seen that the discontinuous hard shrinkage operator is converted to the continuous firm shrinkage operator via the transformation from (i) to (vi) in Fig. 3(a). We mimic this procedure for another discontinuous operator.

We consider the Euclidean case $\mathcal{H}:=\mathbb{R}^N$ for $N\geq 2$. Let $\boldsymbol{w}\in\mathbb{R}_+^N$ be the weight vector such that $0\leq w_1\leq w_2\leq\cdots\leq w_N$. Given $\boldsymbol{x}\in\mathbb{R}^N$, we define $|\boldsymbol{x}|\in\mathbb{R}_+^N$ whose $i$th component is $|x_i|$. Let $|\boldsymbol{x}|_\downarrow\in\mathbb{R}_{+\downarrow}^N:=\{\boldsymbol{x}\in\mathbb{R}_+^N\mid x_1\geq x_2\geq\cdots\geq x_N\}$ denote a version of $|\boldsymbol{x}|$ sorted in nonincreasing order; i.e., $[|\boldsymbol{x}|_\downarrow]_1\geq[|\boldsymbol{x}|_\downarrow]_2\geq\cdots\geq[|\boldsymbol{x}|_\downarrow]_N$. The reversely ordered weighted $\ell_1$ (ROWL) penalty [14] is defined by $\Omega_{\boldsymbol{w}}(\boldsymbol{x}):=\boldsymbol{w}^\top|\boldsymbol{x}|_\downarrow$. The penalty $\Omega_{\boldsymbol{w}}$ is nonconvex and thus not a norm. (If $w_1\geq w_2\geq\cdots\geq w_N$, the function $\boldsymbol{x}\mapsto\boldsymbol{w}^\top|\boldsymbol{x}|_\downarrow$ is convex, and it is called the ordered weighted $\ell_1$ (OWL) norm [23].)

In this case, the associated proximity operator will be discontinuous, which implies, in light of Fact 5, that $\Omega_{\boldsymbol{w}}$ is not even weakly convex. To see this, consider the case of $N=2$, and let $(0\leq)\,w_1<w_2$. In this case, by symmetry, the proximity operator is given by $\mathbf{Prox}_{\Omega_{\boldsymbol{w}}}(\boldsymbol{x})={\rm sgn}(\boldsymbol{x})\odot\mathbf{Prox}_{\Omega_{\boldsymbol{w}}}(|\boldsymbol{x}|)$, $\boldsymbol{x}\in\mathbb{R}^2$, where ${\rm sgn}(\cdot)$ is the componentwise signum function, $\odot$ denotes the Hadamard (componentwise) product, and, for $\boldsymbol{x}\in\mathbb{R}_+^2$,

$\mathbf{Prox}_{\Omega_{\boldsymbol{w}}}(\boldsymbol{x})=\begin{cases}\{(\boldsymbol{x}-\boldsymbol{w})_+\},&\text{if }x_1>x_2,\\ \{(\boldsymbol{x}-\boldsymbol{w})_+,(\boldsymbol{x}-\boldsymbol{w}_\downarrow)_+\},&\text{if }x_1=x_2,\\ \{(\boldsymbol{x}-\boldsymbol{w}_\downarrow)_+\},&\text{if }x_1<x_2.\end{cases}$ (19)

Here, $\boldsymbol{w}_\downarrow:=[w_2,w_1]^\top$, and $(\cdot)_+:\mathbb{R}^2\rightarrow\mathbb{R}^2:[y_1,y_2]^\top\mapsto[\max\{y_1,0\},\max\{y_2,0\}]^\top$ is the 'ramp' function. A selection of $\mathbf{Prox}_{\Omega_{\boldsymbol{w}}}$ will be referred to as ROWL shrinkage.
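A short sketch (ours; $N=2$, with the tie at $x_1=x_2$ resolved toward the first case of (19)) implements the ROWL penalty and this selection, and exposes the discontinuity across the diagonal:

```python
import numpy as np

def rowl(x, w):
    """ROWL penalty: nondecreasing weights paired with |x| sorted
    in nonincreasing order."""
    return np.dot(w, np.sort(np.abs(x))[::-1])

def rowl_shrink(x, w):
    """A selection of Prox_{Omega_w} in (19) for N = 2 (tie: first case)."""
    a = np.abs(x)
    wd = w if a[0] >= a[1] else w[::-1]
    return np.sign(x) * np.maximum(a - wd, 0.0)

w = np.array([1.0, 5.0])                    # 0 <= w1 <= w2
print(rowl(np.array([3.0, -1.0]), w))       # 8.0: small weight on the dominant
print(rowl_shrink(np.array([2.0 + 1e-6, 2.0]), w))  # ~[1, 0]
print(rowl_shrink(np.array([2.0, 2.0 + 1e-6]), w))  # ~[0, 1]: jump at x1 = x2
```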

Note that the set $\{(\boldsymbol{x}-\boldsymbol{w})_+,(\boldsymbol{x}-\boldsymbol{w}_\downarrow)_+\}\subset\mathbb{R}^2$ in (19) is discrete. This is similar to the case of $\mathbf{Prox}_{\|\cdot\|_0}$ in (12). Thus, resembling the relation between $\mathbf{Prox}_{g_\tau}$ and $\mathbf{Prox}_{\widetilde{g}_\tau}$, corresponding to (i) and (ii) of Fig. 3(a), respectively, we replace the discrete set by its closed convex hull $\overline{\rm conv}\{(\boldsymbol{x}-\boldsymbol{w})_+,(\boldsymbol{x}-\boldsymbol{w}_\downarrow)_+\}$. (For a set $\{\boldsymbol{a},\boldsymbol{b}\}\subset\mathbb{R}^2$, the closed convex hull is given by $\{\alpha\boldsymbol{a}+(1-\alpha)\boldsymbol{b}\mid\alpha\in[0,1]\}$.) This replacement yields the set-valued operator $\mathsf{R}:\mathbb{R}^2\rightarrow 2^{\mathbb{R}^2}:\boldsymbol{x}\mapsto\{{\rm sgn}(\boldsymbol{x})\odot\boldsymbol{y}\mid\boldsymbol{y}\in\mathsf{R}(|\boldsymbol{x}|)\}$, where for $\boldsymbol{x}\in\mathbb{R}_+^2$

$\mathsf{R}(\boldsymbol{x})=\begin{cases}\{(\boldsymbol{x}-\boldsymbol{w})_+\},&\text{if }x_1>x_2,\\ \overline{\rm conv}\{(\boldsymbol{x}-\boldsymbol{w})_+,(\boldsymbol{x}-\boldsymbol{w}_\downarrow)_+\},&\text{if }x_1=x_2,\\ \{(\boldsymbol{x}-\boldsymbol{w}_\downarrow)_+\},&\text{if }x_1<x_2.\end{cases}$ (20)

As expected, it can be shown that $\mathsf{R}=\mathbf{Prox}_{\widetilde{\Omega}_{\boldsymbol{w}}}$ for the following 1-weakly convex function (one may add $\boldsymbol{w}^\top\boldsymbol{w}/4$ to $\widetilde{\Omega}_{\boldsymbol{w}}$ to make the minimum value zero); see Figs. 4 and 5:

$\widetilde{\Omega}_{\boldsymbol{w}}(\boldsymbol{x}):=(\Omega_{\boldsymbol{w}}+(1/2)\|\cdot\|^2)^{**}(\boldsymbol{x})-(1/2)\|\boldsymbol{x}\|^2=\begin{cases}\boldsymbol{w}^\top|\boldsymbol{x}|_\downarrow,&\text{if }|\boldsymbol{x}|_\downarrow\in\mathcal{K}_1,\\ \boldsymbol{w}^\top|\boldsymbol{x}|_\downarrow-\frac{1}{2}(|\boldsymbol{x}|_\downarrow+\boldsymbol{w})^\top\boldsymbol{C}(|\boldsymbol{x}|_\downarrow+\boldsymbol{w}),&\text{if }|\boldsymbol{x}|_\downarrow\in\mathcal{K}_2,\end{cases}$

where $\boldsymbol{C}:=\frac{1}{2}\begin{bmatrix}1&-1\\-1&1\end{bmatrix}=\boldsymbol{V}\begin{bmatrix}1&0\\0&0\end{bmatrix}\boldsymbol{V}^\top$ with $\boldsymbol{V}:=\frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\-1&1\end{bmatrix}$, $\mathcal{K}_1:=\{\boldsymbol{x}\in\mathbb{R}_+^2\mid x_1\geq x_2+w_2-w_1\}$, and $\mathcal{K}_2:=\mathbb{R}_{+\downarrow}^2\setminus\mathcal{K}_1=\{\boldsymbol{x}\in\mathbb{R}_+^2\mid x_2\leq x_1<x_2+w_2-w_1\}$.

Figure 4: Surface and contours of eROWL (panels (a), (b)) and ROWL (panels (c), (d)) for the weight vector $\boldsymbol{w}=[1,5]^\top$.

We now derive the extended ROWL (eROWL) shrinkage operator by resembling the relation between $\mathbf{Prox}_{\widetilde{\|\cdot\|}_0}$ and $\mathrm{s\text{-}Prox}_{\widetilde{\|\cdot\|}_0/(\delta+1)}$, corresponding to (ii) and (vi) of Fig. 3, respectively. The eROWL shrinkage operator for the relaxation parameter $\delta\in\mathbb{R}_{++}$ is defined by

$R_\delta:=\mathrm{s\text{-}Prox}_{\widetilde{\Omega}_{\boldsymbol{w}}/(\delta+1)},$ (21)

which is $(1+1/\delta)$-Lipschitz continuous (see Theorem 2). It can readily be verified that $\boldsymbol{p}:=R_\delta(\boldsymbol{x})=[\mathsf{R}^{-1}+\delta\,\mathrm{Id}]^{-1}((\delta+1)\boldsymbol{x})\Leftrightarrow(\delta+1)\boldsymbol{x}\in\mathsf{R}^{-1}(\boldsymbol{p})+\delta\boldsymbol{p}$, which gives the geometric interpretation shown in Fig. 6. Using this, we can verify that $R_\delta(\boldsymbol{x})={\rm sgn}(\boldsymbol{x})\odot R_\delta(|\boldsymbol{x}|)$, where, for $\boldsymbol{x}\in\mathbb{R}_+^2$,

$R_\delta(\boldsymbol{x})=\begin{cases}\big(\boldsymbol{x}-\frac{1}{\delta+1}\boldsymbol{w}\big)_+,&\text{if }x_1\geq x_2,~\boldsymbol{x}\notin\mathcal{C}_1\cup\mathcal{C}_2,\\ \big(\boldsymbol{x}-\frac{1}{\delta+1}\boldsymbol{w}_\downarrow\big)_+,&\text{if }x_1<x_2,~\boldsymbol{x}\notin\mathcal{C}_1\cup\mathcal{C}_2,\\ \begin{bmatrix}m-w_1\\0\end{bmatrix}+\dfrac{(\delta+1)x_2-m}{\delta}\begin{bmatrix}-1\\1\end{bmatrix},&\text{if }\boldsymbol{x}\in\mathcal{C}_1,\\ \boldsymbol{x}-\frac{1}{\delta+1}\boldsymbol{w}_{\alpha(\boldsymbol{x})},&\text{if }\boldsymbol{x}\in\mathcal{C}_2.\end{cases}$ (30)

Here, $m:=[(\delta+1)(x_1+x_2)+\delta w_1]/(\delta+2)\in\mathbb{R}_+$, $\boldsymbol{w}_{\alpha(\boldsymbol{x})}:=\alpha(\boldsymbol{x})\boldsymbol{w}+(1-\alpha(\boldsymbol{x}))\boldsymbol{w}_\downarrow\in\mathbb{R}_+^2$ for $\alpha(\boldsymbol{x}):=1/2+(\delta+1)(x_1-x_2)/(2\delta(w_2-w_1))\in(0,1)$,

$\mathcal{C}_1:=\mathcal{H}_\backslash^-\cap\mathcal{H}_{/1}^+\cap\mathcal{H}_{/2}^+$ (31)

is a triangle given by the intersection of three halfspaces

$\mathcal{H}_\backslash^-:=\{\boldsymbol{x}\in\mathbb{R}^2\mid x_1+x_2\leq(w_1+w_2)/(\delta+1)+w_2-w_1\},$
$\mathcal{H}_{/1}^+:=\{\boldsymbol{x}\in\mathbb{R}^2\mid -x_1+(\delta+1)x_2>\delta w_1/(\delta+1)\},$
$\mathcal{H}_{/2}^+:=\{\boldsymbol{x}\in\mathbb{R}^2\mid (\delta+1)x_1-x_2>\delta w_1/(\delta+1)\},$

and

$\mathcal{C}_2:=\mathcal{H}_\backslash^+\cap\mathcal{S}$ (32)

is an unbounded set given by the intersection of the hyperslab and the halfspace

$\mathcal{S}:=\{\boldsymbol{x}\in\mathbb{R}^2\mid |x_1-x_2|<\delta(w_2-w_1)/(\delta+1)\}$ (33)
$\mathcal{H}_\backslash^+:=\{\boldsymbol{x}\in\mathbb{R}^2\mid x_1+x_2>(w_1+w_2)/(\delta+1)+w_2-w_1\}=\mathbb{R}^2\setminus\mathcal{H}_\backslash^-.$ (34)
Figure 5: Surface and contours of eROWL (panels (a), (b)) and ROWL (panels (c), (d)) for the weight vector $\boldsymbol{w}=[0,5]^\top$.

For every $\boldsymbol{x}\in\mathbb{R}^2$, it holds that $\lim_{\delta\downarrow 0}R_\delta(\boldsymbol{x})\in\mathsf{R}(\boldsymbol{x})$, where $\lim_{\delta\downarrow 0}R_\delta(\boldsymbol{x})=\boldsymbol{x}-\boldsymbol{w}_{1/2}$ over $\{\boldsymbol{x}\in\mathbb{R}_+^2\mid x_1=x_2\}$. An arbitrary selection $U:\mathbb{R}^2\rightarrow\mathbb{R}^2$ of the set-valued operator $\mathsf{R}(=\mathbf{Prox}_{\widetilde{\Omega}_{\boldsymbol{w}}})$ jointly satisfies (i) $\operatorname{range} U\neq\mathbb{R}^2$ and (ii) $\widetilde{\Omega}_{\boldsymbol{w}}+(1/2)\|\cdot\|^2\in\Gamma_0(\mathbb{R}^2)$. This gives a counterexample where 2) $\not\Rightarrow$ 1) in Proposition 2. (The same applies to a selection of $\mathbf{Prox}_{\widetilde{\|\cdot\|}_0}$.) On the other hand, it holds that $\operatorname{range} R_\delta=\mathbb{R}^2$, consistent with Proposition 2. The operator $R_\delta$ is a MoL-Grad denoiser; this is a direct consequence of Theorem 2 and Fact 5.
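Since the closed form (30) involves several regions, a simple way to reproduce $R_\delta$ numerically is brute-force minimization of $\widetilde{\Omega}_{\boldsymbol{w}}(\cdot)/(\delta+1)+\frac{1}{2}\|\cdot-\boldsymbol{x}\|^2$ using the closed form of $\widetilde{\Omega}_{\boldsymbol{w}}$ above. The following sketch (ours; coarse grid, so outputs are accurate only up to the grid step) illustrates that nearby inputs across the diagonal give nearby outputs, unlike ROWL shrinkage:

```python
import numpy as np

C = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])

def omega_tilde(x, w):
    """Closed-form l.s.c. 1-weakly-convex envelope of the ROWL penalty (N=2)."""
    s = np.sort(np.abs(x))[::-1]          # |x| sorted in nonincreasing order
    val = np.dot(w, s)
    if s[0] < s[1] + (w[1] - w[0]):       # the sorted vector lies in K_2
        v = s + w
        val -= 0.5 * v @ C @ v
    return val

def erowl_grid(x, w, delta, lo=-8.0, hi=8.0, n=161):
    """R_delta(x): brute-force argmin of omega_tilde/(delta+1) + (1/2)||.-x||^2."""
    g = np.linspace(lo, hi, n)
    best, arg = np.inf, None
    for y1 in g:
        for y2 in g:
            y = np.array([y1, y2])
            v = omega_tilde(y, w) / (delta + 1) + 0.5 * np.sum((y - x) ** 2)
            if v < best:
                best, arg = v, y
    return arg

w, delta = np.array([1.0, 5.0]), 0.5
for x in ([5.2, 4.8], [5.0, 5.0], [4.8, 5.2]):
    print(x, erowl_grid(np.array(x), w, delta))  # no jump across the diagonal
```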

Corollary 2

For every $\delta\in\mathbb{R}_{++}$, $R_\delta=\mathrm{s\text{-}Prox}_{\widetilde{\Omega}_{\boldsymbol{w}}/(\delta+1)}$ can be expressed as the $(1+1/\delta)$-Lipschitz continuous gradient of a differentiable convex function.

Figure 6: A geometric interpretation of the conversion from $\mathsf{R}$ to $R_\delta$.
Figure 7: Behaviours of (a) ROWL shrinkage and (b) eROWL shrinkage (continuous relaxation) $R_\delta$. The shaded region is the range of the operator.

Figure 7 depicts how each point of $\mathbb{R}^{2}$ is mapped by the operators, where the dotted line on the diagonal is the set of $\boldsymbol{x}$'s with $x_{1}=x_{2}$. In the case of ROWL, the mapping is discontinuous on the 'border' of the diagonal line. This discontinuity may cause difficulty in analysing convergence when the operator is used in splitting algorithms. In the case of $R_{\delta}$, the displacement vector $R_{\delta}(\boldsymbol{x})-\boldsymbol{x}$ on the 'border' is aligned with $[-1,-1]^{\top}$, and it changes continuously as $\boldsymbol{x}$ gradually moves away from there.

Remark 4

The eROWL shrinkage operator $R_{\delta}$ given in (30) can be represented with another set of parameters $\widetilde{\boldsymbol{w}}_{\rm eROWL}:=\boldsymbol{w}/(\delta+1)$ and $\varpi:=(w_{2}-w_{1})\delta/(\delta+1)$, in place of $\boldsymbol{w}$ and $\delta$. Comparing (20) and (30) under this representation, one can see that $\widetilde{\boldsymbol{w}}_{\rm eROWL}$ of $R_{\delta}$ (eROWL) corresponds to $\boldsymbol{w}_{\rm ROWL}$ (the weight vector $\boldsymbol{w}$ of $\mathsf{R}$); it is therefore fair to set the weight vectors so that $\widetilde{\boldsymbol{w}}_{\rm eROWL}=\boldsymbol{w}_{\rm ROWL}$. The parameter $\varpi$ gives the bound on $|x_{1}-x_{2}|$ in the definition of $\mathcal{S}$ given in (33).
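A small illustration of this reparameterization in code (a sketch with hypothetical helper names, assuming NumPy):

import numpy as np

def erowl_reparam(w, delta):
    # Remark 4: (w, delta) -> (w_tilde, varpi).
    # w_tilde plays the role of the ROWL weight vector w_ROWL;
    # varpi bounds |x1 - x2| in the hyperslab S of (33).
    w = np.asarray(w, dtype=float)
    return w / (delta + 1.0), (w[1] - w[0]) * delta / (delta + 1.0)

def fair_erowl_weights(w_rowl, delta):
    # Fair comparison: choose w_eROWL such that w_eROWL/(delta+1) = w_ROWL.
    return np.asarray(w_rowl, dtype=float) * (delta + 1.0)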

5 Numerical Examples: Sparse Signal Recovery

We consider the simple linear model $\boldsymbol{y}:=\boldsymbol{A}\boldsymbol{x}_{\diamond}+\boldsymbol{\epsilon}$, where $\boldsymbol{x}_{\diamond}\in\mathbb{R}^{N}$ is the sparse (or weakly sparse) signal, $\boldsymbol{A}\in\mathbb{R}^{M\times N}$ is the measurement matrix, and $\boldsymbol{\epsilon}\in\mathbb{R}^{M}$ is an i.i.d. Gaussian noise vector. To recover the signal $\boldsymbol{x}_{\diamond}$, we consider the iterative shrinkage algorithm of the following form:

$\boldsymbol{x}_{k+1}:=T\left(\boldsymbol{x}_{k}-\mu\nabla f(\boldsymbol{x}_{k})\right),\quad k\in\mathbb{N},$ (35)

where $T$ is the shrinkage operator (hard, firm, soft, ROWL, or eROWL), $f:\mathbb{R}^{N}\rightarrow\mathbb{R}_{+}:\boldsymbol{x}\mapsto\frac{1}{2}\|\boldsymbol{A}\boldsymbol{x}-\boldsymbol{y}\|_{2}^{2}$ is the squared-error function, and $\mu\in\mathbb{R}_{++}$ is the step-size parameter. Clearly, the function $f$ is $\rho$-strongly convex with $\kappa$-Lipschitz continuous gradient $\nabla f:\boldsymbol{x}\mapsto\boldsymbol{A}^{\top}(\boldsymbol{A}\boldsymbol{x}-\boldsymbol{y})$ for $\rho:=\lambda_{\min}(\boldsymbol{A}^{\top}\boldsymbol{A})$ and $\kappa:=\lambda_{\max}(\boldsymbol{A}^{\top}\boldsymbol{A})$. As an evaluation metric, we adopt the system mismatch $\|\boldsymbol{x}_{\diamond}-\boldsymbol{x}_{k}\|_{2}^{2}/\|\boldsymbol{x}_{\diamond}\|_{2}^{2}$.
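For concreteness, a minimal NumPy sketch of iteration (35), with the shrinkage operator $T$ passed in as a function (helper names are illustrative, not from the paper):

import numpy as np

def iterative_shrinkage(A, y, T, mu, num_iters=500):
    # Iteration (35): x_{k+1} = T(x_k - mu * grad f(x_k)),
    # with f(x) = 0.5 * ||A x - y||_2^2, i.e., grad f(x) = A^T (A x - y).
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        x = T(x - mu * (A.T @ (A @ x - y)))
    return x

def curvature_constants(A):
    # rho = lambda_min(A^T A), kappa = lambda_max(A^T A).
    eigs = np.linalg.eigvalsh(A.T @ A)
    return eigs[0], eigs[-1]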

5.1 Hard Shrinkage versus Firm Shrinkage

We compare the performance of the "discontinuous" hard shrinkage operator with its continuous relaxation, i.e., firm shrinkage. For comparison, we also test soft shrinkage. The signal $\boldsymbol{x}_{\diamond}\in\mathbb{R}^{N}$ of dimension $N:=50$ is generated as follows: the first $s$ components are drawn from the i.i.d. standard Gaussian distribution $\mathcal{N}(0,1)$, and the other $N-s$ components are drawn from i.i.d. $\mathcal{N}(0,1.0\times 10^{-4})$. The entries of the matrix $\boldsymbol{A}$ are also generated from i.i.d. $\mathcal{N}(0,1)$. We study the impact of the parameter $\tau_{1}$ of firm shrinkage and the threshold $\tau$ of soft/hard shrinkage on the performance. Although $\mu$ and $\tau_{2}$ can be tuned within the range given in [13, Theorem 2], those parameters are set systematically to $\mu:=(2-\varepsilon)/(\kappa+\rho)$ and $\tau_{2}:=\tau_{1}/(\mu\rho)$ for $\varepsilon:=1.0\times 10^{-6}$ (see Appendix 8). The step size for soft/hard shrinkage is set to $1/\kappa\in(0,2/\kappa)$. The results are averaged over 20,000 independent trials.
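For reference, the three scalar shrinkage rules compared here (standard forms of soft, hard, and firm shrinkage [9], applied componentwise) can be sketched as follows; the comment at the end records the parameter settings described above.

import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def hard(x, tau):
    return np.where(np.abs(x) > tau, x, 0.0)

def firm(x, tau1, tau2):
    # Firm shrinkage [9] (tau2 > tau1): zero below tau1,
    # identity beyond tau2, linear interpolation in between.
    a = np.abs(x)
    mid = np.sign(x) * tau2 * (a - tau1) / (tau2 - tau1)
    return np.where(a <= tau1, 0.0, np.where(a <= tau2, mid, x))

# Settings of Section 5.1 (see also Appendix 8):
#   mu   = (2 - eps) / (kappa + rho)   # step size for firm shrinkage
#   tau2 = tau1 / (mu * rho)           # second threshold of firm shrinkage
#   step size 1/kappa for soft/hard shrinkage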

Figure 8 depicts the results for three sparsity levels $s:=5$, $10$, $20$ under $M:=100$, $200$ and signal-to-noise ratios (SNRs) of $10$ and $20$ dB. Table 1 summarizes the reduction rate (gain) of firm shrinkage relative to hard/soft shrinkage. Thanks to the continuous relaxation, firm shrinkage gains 19.5–44.9% over hard shrinkage. Compared to soft shrinkage, the gain is 5.3–37.9%.

Figure 8: Comparisons of the iterative shrinkage algorithms with hard shrinkage (blue), firm shrinkage (red), and soft shrinkage (green): (a) $M=100$, SNR $=10$ dB; (b) $M=200$, SNR $=10$ dB; (c) $M=100$, SNR $=20$ dB; (d) $M=200$, SNR $=20$ dB. Solid, dashed, and dotted curves correspond to sparsity levels $s:=5$, $s:=10$, and $s:=20$, respectively. The mark (square, pentagon, or circle) indicates the best point (the lowest system mismatch) on each curve.

5.2 ROWL Shrinkage versus eROWL Shrinkage

The operator $R_{\delta}$ is Lipschitz continuous with constant $(1-\eta)^{-1}=1+1/\delta$ (see Theorem 2), where $\eta=1/(1+\delta)$. To exploit [13, Theorem 2], let $\beta:=1-\eta=\delta/(1+\delta)$. Then, the sequence $(\boldsymbol{x}_{k})_{k\in\mathbb{N}}$ generated by (35) converges to a minimizer (if one exists) of $\mu f+\widetilde{\Omega}_{\boldsymbol{w}}/(\delta+1)$, provided that (a) $\delta>(\kappa-\rho)/(2\rho)$ ($\Leftrightarrow\beta>(\kappa-\rho)/(\kappa+\rho)$) and (b) $\mu\in[(1-\beta)/\rho,(1+\beta)/\kappa)$. Thus, unless otherwise stated, we set $\delta:=\gamma_{\delta}(\kappa-\rho)/(2\rho)$ and $\mu:=\gamma_{\mu}(1-\beta)/\rho+(1-\gamma_{\mu})(1+\beta)/\kappa$ with the additional parameters $\gamma_{\delta}>1$ and $\gamma_{\mu}\in(0,1]$ fixed to $\gamma_{\delta}:=1.01$ and $\gamma_{\mu}:=0.5$. In the following, eROWL shrinkage $R_{\delta}$ is compared to ROWL shrinkage as well as firm shrinkage, where $R_{\delta}$ in (35) is replaced by those other shrinkage operators.
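In code, the systematic parameter choice described above reads as follows (an illustrative helper, assuming the appendix form of the step-size condition):

def erowl_parameters(kappa, rho, gamma_delta=1.01, gamma_mu=0.5):
    # Condition (a): delta > (kappa - rho)/(2 rho)  <=>  beta > (kappa - rho)/(kappa + rho).
    delta = gamma_delta * (kappa - rho) / (2.0 * rho)
    beta = delta / (1.0 + delta)  # beta = 1 - eta with eta = 1/(1 + delta)
    # Condition (b): mu in [(1 - beta)/rho, (1 + beta)/kappa); interpolate between the endpoints.
    mu = gamma_mu * (1.0 - beta) / rho + (1.0 - gamma_mu) * (1.0 + beta) / kappa
    return delta, mu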

Table 1: Reduction rate (%) of firm shrinkage in system mismatch

              against hard shrinkage       against soft shrinkage
              s=5     s=10    s=20         s=5     s=10    s=20
  case (a)    41.1    38.5    28.7         29.5    25.1    12.1
  case (b)    35.9    32.4    24.8         37.9    35.3    20.8
  case (c)    36.2    44.9    41.5         25.1    37.9    32.3
  case (d)    19.5    24.7    24.9          5.3    32.5    31.8

5.2.1 Illustrative Examples

Figure 9: Illustrative examples in which ROWL fails but eROWL works: (a) ROWL; (b) eROWL. The signal $\boldsymbol{x}_{\diamond}$ is depicted by a pentagram.

Unlike the continuous eROWL shrinkage $R_{\delta}$, ROWL shrinkage is discontinuous, and its corresponding iterate has no guarantee of convergence. To illuminate the potential issue, we consider the noiseless situation (i.e., $\boldsymbol{\epsilon}:=\boldsymbol{0}$) with the $2\times 2$ matrix $\boldsymbol{A}:=\boldsymbol{Q}\begin{bmatrix}1&0\\0&0.1\end{bmatrix}\boldsymbol{Q}^{\top}$ (i.e., $M:=2$) for $\boldsymbol{Q}:=\begin{bmatrix}1&-0.9\\0.9&1\end{bmatrix}$ and the sparse signal $\boldsymbol{x}_{\diamond}=[0,1.0]^{\top}$. In this case, $\kappa=0.82$ and $\rho=8.2\times 10^{-3}$. For the sake of illustration, the step size is set to $\mu:=2.0$ for both ROWL and eROWL, the weight vector of ROWL is set to $\boldsymbol{w}_{\rm ROWL}:=[0,0.03]^{\top}$, and the weight vector $\boldsymbol{w}_{\rm eROWL}$ of eROWL is chosen, for fairness, in such a way that $(\widetilde{\boldsymbol{w}}_{\rm eROWL}=)\,\boldsymbol{w}_{\rm eROWL}/(\delta+1)=\boldsymbol{w}_{\rm ROWL}$ with $\delta=50.0$ (see Remark 4). The convergence of eROWL is guaranteed for every $\mu\in[1.6\times 10^{-4},2.4)$. The algorithms are initialized to $\boldsymbol{x}_{0}:=\boldsymbol{0}$.

Figure 9 plots the points $\boldsymbol{x}_{0},\boldsymbol{x}_{0.5},\boldsymbol{x}_{1},\boldsymbol{x}_{1.5},\boldsymbol{x}_{2},\ldots$, where $\boldsymbol{x}_{k+1/2}:=\boldsymbol{x}_{k}-\mu\nabla f(\boldsymbol{x}_{k})$, $k\in\mathbb{N}$, is the intermediate vector between $\boldsymbol{x}_{k}$ and $\boldsymbol{x}_{k+1}$. The red arrows depict the displacement vectors $\boldsymbol{x}_{k+1}-\boldsymbol{x}_{k+1/2}$, visualizing how each shrinkage operator works. In view of (19), the displacement vector for ROWL is essentially given by $-\boldsymbol{w}$, guiding the estimate in a wrong direction. The ROWL iterate converges numerically to $\boldsymbol{x}_{\infty}^{\rm ROWL}:=[0.88,0]^{\top}$, failing to identify the active component. In sharp contrast, the eROWL iterate converges to $\boldsymbol{x}_{\infty}^{\rm eROWL}:=[0,0.99]^{\top}$, identifying the active component correctly. This is because the vector $\boldsymbol{w}_{\alpha}$ governing the displacement vector depends on the position of the current estimate. More precisely, since the estimate at the early phase is located in the neighborhood of the diagonal (where $x_{1}=x_{2}$), we have $\boldsymbol{w}_{\alpha}\approx(1/2)(\boldsymbol{w}+\boldsymbol{w}_{\downarrow})\propto[1,1]^{\top}$, which allows the estimate to be updated toward $\boldsymbol{x}_{\diamond}$. We emphasize that this notable advantage comes from the continuity of the eROWL shrinkage operator $R_{\delta}$.

Figure 10: Comparisons for fixed $\boldsymbol{x}_{\diamond}:=[0,1]^{\top}$: (a) performance versus SNR; (b) impact of the tuning parameter $w_{2}$ ($\tau$).

5.2.2 ROWL versus eROWL

We first compare the performance of ROWL and eROWL with the entries of the matrix $\boldsymbol{A}$ generated randomly from i.i.d. $\mathcal{N}(0,1)$ for $M:=8$ and with i.i.d. zero-mean Gaussian noise $\boldsymbol{\epsilon}$. For reference, soft shrinkage and firm shrinkage, as well as the least squares (LS) estimate, are also tested. Two types of signal are considered: (i) $\boldsymbol{x}_{\diamond}:=[0,1]^{\top}$ (deterministic) and (ii) $\boldsymbol{x}_{\diamond}:=[0,\xi]^{\top}$ (stochastic) with $\xi$ generated randomly from $\mathcal{N}(0,1)$. The results are averaged over 500,000 independent trials.

Figures 10 and 11 depict the results of the deterministic case and the stochastic case, respectively. In Figs. 10(a) and 11(a), the performance for different SNRs is plotted, where the weights of ROWL and eROWL are set to $\boldsymbol{w}=[0,w_{2}]^{\top}$ with the second weight $w_{2}$ tuned individually by grid search under SNR 20 dB. The threshold $\tau$ of soft shrinkage and $\tau_{1}$ of firm shrinkage are also tuned individually under SNR 20 dB. It can be seen that eROWL preserves good performance over the whole range, while the performance curves of soft, firm, and ROWL saturate as the SNR increases.

Figure 11: Comparisons for $\boldsymbol{x}_{\diamond}:=[0,\xi]^{\top}$, where $\xi$ is random: (a) performance versus SNR; (b) impact of the tuning parameter $w_{2}$ ($\tau$).

It should also be mentioned that, in the stochastic case, the performance of firm shrinkage for low SNRs is nearly identical to that of LS. This is because the threshold $\tau_{1}$ must fit the magnitude of the active component of $\boldsymbol{x}_{\diamond}$, which is difficult in this case as the magnitude changes randomly at every trial. In sharp contrast, eROWL depends only on the "number" of active components (but not on their "magnitude"), and this is why it performs well. The performance of ROWL for low SNRs is only slightly better than that of LS because of its discontinuity (see Section 5.2.1). In the deterministic case, although firm and ROWL achieve performance comparable to eROWL under SNR 20 dB, for which the parameters of each shrinkage operator are tuned, the performance of those shrinkage operators degrades significantly as the SNR moves away from 20 dB.

Figures 10(b) and 11(b) show the impact of the tuning parameter $w_{2}$ (or $\tau$) of each shrinkage operator for SNR 20 dB. They clearly show the stable performance of eROWL, which comes from the continuity of the eROWL shrinkage operator. This means that $w_{2}$ is easy to tune and also that eROWL is expected to be robust against possible environmental changes. In contrast, the performance of ROWL degrades as $w_{2}$ becomes larger than the best value due to its discontinuity. Note that soft/firm shrinkage with too large a threshold yields the zero solution, for which the system mismatch is unity.

6 Conclusion

We presented a principled way of constructing a continuous relaxation of a discontinuous shrinkage operator by leveraging the proximal inclusion and conversion (Theorems 1 and 2). As a specific application, the continuous relaxation of ROWL shrinkage was derived. Numerical examples showed clear advantages of firm shrinkage and eROWL shrinkage over hard shrinkage and ROWL shrinkage (their discontinuous counterparts), demonstrating the efficacy of the continuous relaxation. A specific situation was presented in which ROWL shrinkage fails but eROWL shrinkage gives a good approximation of the true solution. The simulation results also indicated the potential advantages of eROWL in terms of simplicity of parameter tuning as well as robustness against environmental changes. Although the present study of eROWL is limited to the two-dimensional case, its extension to an arbitrary (finite) dimension has been presented in [24], where advantages over ROWL have also been shown. We finally mention that the continuous relaxation approach is expected to be useful for other nonconvex regularizers as well, such as the one proposed in [25].

References

  • [1] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, Sep. 1994.
  • [2] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard, “Wavelet shrinkage: Asymptopia?” J. Royal Statistical Society: Series B (Methodological), vol. 57, no. 2, pp. 301–369, Jul. 1995, (with discussions).
  • [3] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inform. Theory, vol. 41, no. 3, pp. 613–627, May 1995.
  • [4] T. Blumensath and M. Davies, “Iterative thresholding for sparse approximations,” J. Fourier Anal. Appl., vol. 14, pp. 629–654, 2008.
  • [5] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing.   New York: Springer, 2013.
  • [6] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” SIAM Journal on Multiscale Modeling and Simulation, vol. 4, no. 4, pp. 1168–1200, 2005.
  • [7] H. H. Bauschke, R. S. Burachik, and D. R. Luke, Eds., Splitting Algorithms, Modern Operator Theory, and Applications.   Cham, Switzerland: Springer, 2019.
  • [8] L. Condat, D. Kitahara, A. Contreras, and A. Hirabayashi, “Proximal splitting algorithms for convex optimization: A tour of recent advances, with new twists,” SIAM Review, vol. 65, no. 2, pp. 375–435, 2023.
  • [9] H.-Y. Gao and A. G. Bruce, “Waveshrink with firm shrinkage,” Statistica Sinica, vol. 7, no. 4, pp. 855–874, 1997.
  • [10] C. H. Zhang, “Nearly unbiased variable selection under minimax concave penalty,” The Annals of Statistics, vol. 38, no. 2, pp. 894–942, Apr. 2010.
  • [11] I. Selesnick, “Sparse regularization via convex analysis,” IEEE Trans. Signal Process., vol. 65, no. 17, pp. 4481–4494, Sep. 2017.
  • [12] I. Bayram, “On the convergence of the iterative shrinkage/thresholding algorithm with a weakly convex penalty,” IEEE Trans. Signal Process., vol. 64, no. 6, pp. 1597–1608, 2016.
  • [13] M. Yukawa and I. Yamada, “Monotone Lipschitz-gradient denoiser: Explainability of operator regularization approaches and convergence to optimal point,” 2024, arXiv:2406.04676v2 [math.OC].
  • [14] T. Sasaki, Y. Bandoh, and M. Kitahara, “Sparse regularization based on reverse ordered weighted l1-norm and its application to edge-preserving smoothing,” in Proc. IEEE ICASSP, 2024, pp. 9531–9535.
  • [15] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Proc. IEEE Global Conf. Signal Inf. Process, 2013, pp. 945–948.
  • [16] R. Gribonval and M. Nikolova, “A characterization of proximity operators,” Journal of Mathematical Imaging and Vision, vol. 62, pp. 773–789, 2020.
  • [17] H. H. Bauschke, W. M. Moursi, and X. Wang, “Generalized monotone operators and their averaged resolvents,” Math. Program., vol. 189, pp. 55–74, 2021.
  • [18] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, 3rd ed.   Berlin Heidelberg: Springer, 2010.
  • [19] M. Kowalski, “Thresholding rules and iterative shrinkage/thresholding algorithm: A convergence study,” in Proc. IEEE ICIP, 2014, pp. 4151–4155.
  • [20] I. Bayram, “Penalty functions derived from monotone mappings,” IEEE Signal Process. Lett., vol. 22, no. 3, pp. 265–269, 2015.
  • [21] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.   New York, NY: Springer, 2017.
  • [22] J. J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bull. Soc. Math. France, vol. 93, pp. 273–299, 1965.
  • [23] X. Zeng and M. A. T. Figueiredo, “The ordered weighted 1\ell_{1} norm: Atomic formulation, projections, and algorithms,” arXiv, 2015, arXiv:1409.4271 [cs.DS].
  • [24] T. Okuda, K. Suzuki, and M. Yukawa, “Sparse signal recovery based on continuous relaxation of reversely ordered weighted 1\ell_{1} shrinkage operator,” in Proc. IEICE Signal Processing Symposium, Sapporo, Japan, Dec. 2024.
  • [25] C. Wang, Y. Wei, and M. Yukawa, “Dispersed-sparsity-aware LMS algorithm for scattering-sparse system identification,” Signal Processing, vol. 225, 2024.

7 Derivation of $R_{\delta}(\boldsymbol{x}_{2})$ for $\boldsymbol{x}_{2}\in\mathcal{C}_{1}$

The representation of $R_{\delta}$ is visualized in Fig. A.1. The halfspaces $\mathcal{H}_{\backslash}^{-}$ and $\mathcal{H}_{\backslash}^{+}$ share the same boundary, which is depicted by the blue dotted line, and the boundaries of the hyperslab $\mathcal{S}$ are depicted by the black dashed lines. Figure A.1(b), specifically, illustrates the case in which $\mathsf{R}(\boldsymbol{m})$ touches the $x_{1}$ and $x_{2}$ axes. This situation happens when $\boldsymbol{m}$ lies between $[w_{1},w_{1}]^{\top}$ and $[w_{2},w_{2}]^{\top}$, and it corresponds to the case of $\boldsymbol{x}\in\mathcal{C}_{1}$. The point $\boldsymbol{p}_{1}$ is located on the $x_{2}$ axis, and its inverse image $\mathsf{R}^{-1}(\boldsymbol{p}_{1})$ is a set (the line segment in magenta). In this case, $\boldsymbol{p}_{1}$ can be expressed as $\boldsymbol{p}_{1}=R_{\delta}(\boldsymbol{x}_{1})=(\boldsymbol{x}_{1}-\frac{1}{\delta+1}\boldsymbol{w}_{\downarrow})_{+}$, and it can also be expressed as $\boldsymbol{p}_{1}=R_{\delta}(\hat{\boldsymbol{x}}_{1})=(\hat{\boldsymbol{x}}_{1}-\frac{1}{\delta+1}\boldsymbol{w}_{\downarrow})_{+}$.

We rephrase the inclusion relation (see Section 4.2)

$(\delta+1)\boldsymbol{x}\in\mathsf{R}^{-1}(\boldsymbol{p})+\delta\boldsymbol{p},$ (A.1)

which plays a key role in the derivation. Let $\boldsymbol{m}:=[m,m]^{\top}$ for $m\in[w_{1},w_{2}]$. Then, $\boldsymbol{p}_{1},\boldsymbol{p}_{3}\in\mathsf{R}(\boldsymbol{m})$ in Fig. A.1(b) can be expressed as $\boldsymbol{p}_{1}=(\boldsymbol{m}-\boldsymbol{w}_{\downarrow})_{+}=[0,m-w_{1}]^{\top}$ and $\boldsymbol{p}_{3}=(\boldsymbol{m}-\boldsymbol{w})_{+}=[m-w_{1},0]^{\top}$, and thus their convex combination $\boldsymbol{p}_{2}$ is given by

$\boldsymbol{p}_{2}=\omega\boldsymbol{p}_{1}+(1-\omega)\boldsymbol{p}_{3}$ (A.2)

for $\omega\in(0,1)$. Since (A.1) implies that

$(\delta+1)\boldsymbol{x}_{2}=\boldsymbol{m}+\delta\boldsymbol{p}_{2},$ (A.3)

we obtain

$m=[(\delta+1)(x_{1}+x_{2})+\delta w_{1}]/(\delta+2),$ (A.4)

$\omega=\dfrac{(\delta+1)x_{2}-m}{\delta(m-w_{1})}.$ (A.5)

Substituting (A.5) into (A.2) yields

$R_{\delta}(\boldsymbol{x}_{2})=\boldsymbol{p}_{2}=\begin{bmatrix}m-w_{1}\\0\end{bmatrix}+\dfrac{(\delta+1)x_{2}-m}{\delta}\begin{bmatrix}-1\\1\end{bmatrix}.$ (A.6)

The halfspaces $\mathcal{H}_{/1}^{+}$ and $\mathcal{H}_{/2}^{+}$ are derived from the condition $\omega\in(0,1)$, and $\mathcal{H}_{\backslash}^{-}$ comes from $m\leq w_{2}$.
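The derivation can be verified numerically: pick $m$ and $\omega$, synthesize $\boldsymbol{x}_{2}$ via (A.3), and check that (A.4), (A.5), and (A.6) recover them together with $\boldsymbol{p}_{2}$. A minimal sketch (the test values are arbitrary, subject to $m\in[w_{1},w_{2}]$ and $\omega\in(0,1)$):

import numpy as np

w1, w2, delta = 1.0, 5.0, 2.0
m_true, omega_true = 3.0, 0.3

p1 = np.array([0.0, m_true - w1])      # (m - w_down)_+
p3 = np.array([m_true - w1, 0.0])      # (m - w)_+
p2 = omega_true * p1 + (1 - omega_true) * p3                  # (A.2)
x = (np.array([m_true, m_true]) + delta * p2) / (delta + 1)   # (A.3)

m = ((delta + 1) * (x[0] + x[1]) + delta * w1) / (delta + 2)  # (A.4)
omega = ((delta + 1) * x[1] - m) / (delta * (m - w1))         # (A.5)
R_x = np.array([m - w1, 0.0]) \
    + ((delta + 1) * x[1] - m) / delta * np.array([-1.0, 1.0])  # (A.6)

assert np.isclose(m, m_true) and np.isclose(omega, omega_true)
assert np.allclose(R_x, p2)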

The expression of $R_{\delta}(\boldsymbol{x})$ for $\boldsymbol{x}\in\mathcal{C}_{2}$ can be derived analogously by using $\boldsymbol{p}_{1}=[m-w_{2},m-w_{1}]^{\top}$ and $\boldsymbol{p}_{3}=[m-w_{1},m-w_{2}]^{\top}$ for $m>w_{2}$.

Figure A.1: Geometric interpretations of the operator $R_{\delta}$.

8 Parameters of Firm Shrinkage

By [13, Theorem 2], the iterate given in (35) converges to the minimizer of $\mu f+\varphi$ under the conditions $\beta\in((\kappa-\rho)/(\kappa+\rho),1)$ and $\mu\in[(1-\beta)/\rho,(1+\beta)/\kappa)$ (see Fact 5 for the relation between $\varphi$ and the given operator $T$). Since ${\rm firm}_{\tau_{1},\tau_{2}}$ is $\tau_{2}/(\tau_{2}-\tau_{1})$-Lipschitz continuous, we have $\beta^{-1}=\tau_{2}/(\tau_{2}-\tau_{1})\Leftrightarrow\beta=1-\tau_{1}/\tau_{2}$. This leads to the condition $1-\tau_{1}/\tau_{2}>(\kappa-\rho)/(\kappa+\rho)\Leftrightarrow\tau_{2}>\tau_{1}(\kappa+\rho)/(2\rho)$. To satisfy this inequality, we set $\tau_{2}:=\tau_{1}(\kappa+\rho)/((2-\varepsilon)\rho)$ for a small constant $\varepsilon\in(0,2)$. In this case, we have $\beta=(\kappa-(1-\varepsilon)\rho)/(\kappa+\rho)$, and setting the step size $\mu$ to its lower bound gives $\mu=(1-\beta)/\rho=(2-\varepsilon)/(\kappa+\rho)$.
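These relations admit a quick numerical check (the test values are arbitrary):

import numpy as np

kappa, rho, tau1, eps = 4.0, 0.5, 0.1, 1.0e-6
tau2 = tau1 * (kappa + rho) / ((2 - eps) * rho)   # ensures tau2 > tau1 (kappa+rho)/(2 rho)
beta = 1 - tau1 / tau2
mu = (1 - beta) / rho                             # lower-bound step size

assert beta > (kappa - rho) / (kappa + rho)       # condition on beta
assert np.isclose(beta, (kappa - (1 - eps) * rho) / (kappa + rho))
assert np.isclose(mu, (2 - eps) / (kappa + rho))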
