
Parameter Choices for Sparse Multi-Parameter Regularization with the $\ell_1$ Norm

Qianru Liu1, Rui Wang2 (author to whom any correspondence should be addressed) and Yuesheng Xu3
1 School of Mathematics and Statistics, Henan University of Technology, Zhengzhou 450001, People's Republic of China
2 School of Mathematics, Jilin University, Changchun 130012, People's Republic of China
3 Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, United States of America
liuqr23@haut.edu.cn, rwang11@jlu.edu.cn, y1xu@odu.edu
Abstract

This paper introduces a multi-parameter regularization approach using the $\ell_1$ norm, designed to better adapt to complex data structures and problem characteristics while offering enhanced flexibility in promoting sparsity in regularized solutions. As data volumes grow, sparse representations of learned functions become critical for reducing computational costs during function operations. We investigate how the selection of multiple regularization parameters influences the sparsity of regularized solutions. Specifically, we characterize the relationship between these parameters and the sparsity of solutions under transform matrices, enabling the development of an iterative scheme for selecting parameters that achieve prescribed sparsity levels. Special attention is given to scenarios where the fidelity term is non-differentiable and the transform matrix lacks full row rank. In such cases, the regularized solution and two auxiliary vectors arising in the sparsity characterization are essential components of the multi-parameter selection strategy. To address this, we propose a fixed-point proximity algorithm that determines these three vectors simultaneously. This algorithm, combined with our sparsity characterization, forms the basis of a practical multi-parameter selection strategy. Numerical experiments demonstrate the effectiveness of the proposed approach, yielding regularized solutions with both predetermined sparsity levels and satisfactory approximation accuracy.


1 Introduction

Multi-parameter regularization is a widely used technique for addressing ill-posed problems [4, 6, 8, 9, 10, 18, 19, 28]. Motivated by the challenges posed by big data in practical applications, sparse multi-parameter regularization using the $\ell_1$ norm has become a prominent area of research [1, 11, 21, 22, 23, 26]. Empirical studies have shown that, compared to single-parameter regularization with the $\ell_1$ norm, the multi-parameter approach offers greater flexibility in promoting sparsity in the regularized solutions while effectively mitigating the ill-posedness inherent in practical problems.

This paper aims to develop a practical parameter selection strategy for multi-parameter regularization with the $\ell_1$ norm, enabling the regularized solution to achieve a specified level of sparsity. To achieve this, we first establish a theoretical understanding of how the regularization parameters affect the sparsity of the solution under transform matrices. Unlike single-parameter regularization, the multi-parameter framework allows independent control over the sparsity of the solution corresponding to each transform matrix. Using tools from convex analysis, we characterize the relationship between the choice of multiple regularization parameters and the sparsity of the solution under the transform matrices. In the special case where the transform matrices are degenerate identities and the fidelity term exhibits block separability, this characterization provides a direct multi-parameter selection strategy that ensures each sub-vector of the solution achieves a desired level of block sparsity. In the general case, where such direct adoption is not feasible, we propose an iterative scheme to select the multiple regularization parameters. This scheme ensures the regularized solution attains the prescribed sparsity levels under the transform matrices. Unlike single-parameter regularization models, where only one parameter is optimized, our iterative algorithm considers the interplay between multiple parameters, thereby enhancing the flexibility and effectiveness of the parameter selection process.

In our recent papers [17] and [23], we investigated parameter selection strategies for sparse regularization using the $\ell_1$ norm. Compared to these studies, this paper makes two significant contributions. First, we have established strategies for selecting multiple regularization parameters in a model where the objective function comprises a convex fidelity term and multiple $\ell_1$-based regularization terms, each composed with a linear transform. Unlike the iterative algorithm in [17], which focused on selecting a single parameter, the algorithm proposed here addresses multiple parameters and explicitly incorporates their interdependencies. While [23] also explored iterative schemes for multi-parameter selection in a highly nonconvex optimization problem with numerous network parameters and extensive training data, the model in this paper facilitates a more sophisticated iterative algorithm. Second, we have developed iterative schemes for selecting multiple regularization parameters in two distinct scenarios: (1) when the fidelity term is differentiable and the transform matrix has full row rank, and (2) when the fidelity term is nondifferentiable and the transform matrix lacks full row rank. In contrast, the approaches in [17] and [23] address parameter selection only for the case of a differentiable fidelity term and a full-rank transform matrix.

For the nondifferentiable fidelity term and non-full-rank transform matrix, each iteration requires determining not only the regularized solution but also two auxiliary vectors. To address this, we have characterized these three components using a system of fixed-point equations involving proximity operators. Based on this characterization, we have devised a fixed-point proximity algorithm capable of simultaneously computing all three vectors. The convergence of this algorithm is rigorously analyzed in this paper. By combining the fixed-point proximity algorithm with the sparsity characterization of the regularized solution, we propose a robust multi-parameter selection strategy that ensures the solution achieves a specified level of sparsity.

This paper is organized into seven sections and two appendices. Section 2 introduces the general multi-parameter regularization problem under investigation and highlights several examples of practical significance. In Section 3, we establish a characterization of how each regularization parameter affects the sparsity of the regularized solution under its associated transform matrix. This characterization is further specialized in Section 4 to the case where the transform matrices are degenerate identities. Building on the insights from Sections 3 and 4, Section 5 presents iterative schemes for simultaneously determining multiple parameters and solutions with prescribed sparsity levels under the transform matrices. Two scenarios are addressed: (1) when the fidelity term is differentiable and the transform matrix has full row rank, and (2) when the fidelity term is nondifferentiable and the transform matrix does not have full row rank. Section 6 demonstrates the effectiveness of the proposed parameter selection strategies through a series of numerical examples. Section 7 concludes the paper by summarizing the findings. Appendix A contains the proof of the convergence analysis for the fixed-point proximity algorithm introduced in Section 5, while Appendix B provides closed-form expressions for the proximity operators used in the algorithm.

2 Multi-parameter regularization with the $\ell_1$ norm

In this section, we introduce the multi-parameter regularization problem considered in this paper. Additionally, we review several optimization models of practical significance and illustrate how they can be formulated within this general framework.

We begin with describing the multi-parameter regularization problem. For each $s\in\mathbb{N}$, let $\mathbb{N}_{s}:=\{1,2,\ldots,s\}$ and set $\mathbb{N}_{0}:=\emptyset$. Suppose that $\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+}:=[0,+\infty)$ is a convex function and, for each $j\in\mathbb{N}_{d}$, $m_{j}\in\mathbb{N}$ and $\mathbf{B}_{j}$ is an $m_{j}\times n$ real matrix. For each $m\in\mathbb{N}$ and $\mathbf{x}:=[x_{j}:j\in\mathbb{N}_{m}]\in\mathbb{R}^{m}$, we define its $\ell_1$ norm by $\|\mathbf{x}\|_{1}:=\sum_{j\in\mathbb{N}_{m}}|x_{j}|$. We consider the multi-parameter regularization problem

\min\left\{\bm{\psi}(\mathbf{u})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{B}_{j}\mathbf{u}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\}, \quad (1)

where $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, are positive regularization parameters.

The multi-parameter regularization problem (1) appears in many application areas. Below, we present several examples. In image reconstruction problems, a combination of multiple regularizers is used to encourage the solution to simultaneously exhibit the characteristics enforced by each of them. For example, a combination of frame-based synthesis and analysis $\ell_1$ norm regularizers was proposed in [1] for an image deblurring problem. Specifically, for $m,n,s\in\mathbb{N}$, we assume that $\mathbf{A}\in\mathbb{R}^{m\times s}$ represents a periodic convolution, $\mathbf{W}\in\mathbb{R}^{s\times n}$ is a synthesis operator whose columns contain the elements of a frame, and $\mathbf{P}\in\mathbb{R}^{m\times s}$ is an analysis operator of a tight Parseval frame satisfying $\mathbf{P}^{\top}\mathbf{P}=\mathbf{I}_{s}$, where $\mathbf{I}_{s}$ denotes the identity matrix of order $s$. Let $\mathbf{y}\in\mathbb{R}^{m}$ be the observed data. The regularization problem combining the synthesis and analysis $\ell_1$ norm regularizers has the form

\min\left\{\frac{1}{2}\|\mathbf{AW}\mathbf{u}-\mathbf{y}\|_{2}^{2}+\lambda_{1}\|\mathbf{u}\|_{1}+\lambda_{2}\|\mathbf{PW}\mathbf{u}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\}. \quad (2)

Clearly, problem (2) may be identified as a special case of (1) with

\bm{\psi}(\mathbf{u}):=\frac{1}{2}\|\mathbf{AW}\mathbf{u}-\mathbf{y}\|_{2}^{2}\ \mbox{for}\ \mathbf{u}\in\mathbb{R}^{n},\qquad\mathbf{B}_{1}:=\mathbf{I}_{n},\qquad\mathbf{B}_{2}:=\mathbf{PW}.

As a generalization of the lasso regularized model [25], the fused lasso regularized model was proposed in [26] for problems with features that can be ordered in some meaningful way. Let $p,n\in\mathbb{N}$. Suppose that a prediction problem with $p$ cases has outcomes $y_{i}$, $i\in\mathbb{N}_{p}$, and features $x_{ij}$, $i\in\mathbb{N}_{p}$, $j\in\mathbb{N}_{n}$. Let $\mathbf{X}:=[x_{ij}:i\in\mathbb{N}_{p},j\in\mathbb{N}_{n}]$ be the $p\times n$ matrix of features, and let $\mathbf{y}:=[y_{i}:i\in\mathbb{N}_{p}]\in\mathbb{R}^{p}$ and $\mathbf{u}\in\mathbb{R}^{n}$ be the vectors of outcomes and coefficients, respectively. The fused lasso regularized model is formulated as

\min\left\{\frac{1}{2}\|\mathbf{X}\mathbf{u}-\mathbf{y}\|_{2}^{2}+\lambda_{1}\|\mathbf{u}\|_{1}+\lambda_{2}\|\mathbf{Du}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\}, \quad (3)

where $\mathbf{D}:=[d_{ij}:i\in\mathbb{N}_{n-1},j\in\mathbb{N}_{n}]$ is the $(n-1)\times n$ first-order difference matrix with $d_{ii}=-1$, $d_{i,i+1}=1$ for $i\in\mathbb{N}_{n-1}$, and $d_{ij}=0$ otherwise. By introducing

\bm{\psi}(\mathbf{u}):=\frac{1}{2}\|\mathbf{X}\mathbf{u}-\mathbf{y}\|_{2}^{2}\ \mbox{for}\ \mathbf{u}\in\mathbb{R}^{n},\qquad\mathbf{B}_{1}:=\mathbf{I}_{n},\qquad\mathbf{B}_{2}:=\mathbf{D},

the fused lasso model (3) can be rewritten in the form (1). By penalizing the $\ell_1$ norm of both the coefficients and their successive differences, the fused lasso regularized model encourages sparsity of the coefficients as well as sparsity of their differences. As a special case, the fused lasso signal approximation [11] has the form (3) with the feature matrix $\mathbf{X}$ being the identity matrix.
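For readers who wish to experiment with this model, the following is a minimal computational sketch (ours, not part of the paper; it assumes the CVXPY package and synthetic data) that builds the first-order difference matrix $\mathbf{D}$ as defined above and solves the fused lasso problem (3):

```python
# A minimal sketch (not from the paper): solving the fused lasso model (3)
# with CVXPY on illustrative synthetic data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
p, n = 50, 20
X = rng.standard_normal((p, n))
y = rng.standard_normal(p)

# First-order difference matrix D: d_ii = -1, d_{i,i+1} = 1, zero otherwise.
D = -np.eye(n - 1, n) + np.eye(n - 1, n, k=1)

lam1, lam2 = 0.5, 0.5  # regularization parameters (chosen arbitrarily here)
u = cp.Variable(n)
objective = 0.5 * cp.sum_squares(X @ u - y) + lam1 * cp.norm1(u) + lam2 * cp.norm1(D @ u)
cp.Problem(cp.Minimize(objective)).solve()
print("nonzeros in u :", int(np.sum(np.abs(u.value) > 1e-8)))
print("nonzeros in Du:", int(np.sum(np.abs(D @ u.value) > 1e-8)))
```

Increasing $\lambda_1$ drives more entries of $\mathbf{u}$ to zero, while increasing $\lambda_2$ drives more successive differences to zero, which is precisely the flexibility that motivates the multi-parameter model.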

Filtering noisy data was considered in [22] for the case where the underlying signal comprises a low-frequency component and a sparse or sparse-derivative component. Specifically, assume that the noisy data $y(t)$ can be modeled as

y(t)=f(t)+u(t)+\eta(t),\ t\in\mathbb{N}_{n}, \quad (4)

where $f$ is a low-pass signal, $u$ is a sparse and sparse-derivative signal, and $\eta$ is stationary white Gaussian noise. Given noisy data of the form (4), one seeks estimates of $f$ and $u$ individually. For this purpose, we first solve the compound sparse denoising problem

\min\left\{\frac{1}{2}\|\mathbf{H}(\mathbf{y}-\mathbf{u})\|_{2}^{2}+\lambda_{1}\|\mathbf{u}\|_{1}+\lambda_{2}\|\mathbf{D}\mathbf{u}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\} \quad (5)

to obtain the estimate $\mathbf{u}^{*}$ of $u$. Here, $\mathbf{y}:=[y(t):t\in\mathbb{N}_{n}]$ and $\mathbf{H}$ is the high-pass filter matrix of the form $\mathbf{H}:=\mathbf{A}^{-1}\mathbf{C}$, where $\mathbf{A}$ and $\mathbf{C}$ are banded matrices. We then obtain the estimate $\mathbf{f}^{*}$ of $f$ as $\mathbf{f}^{*}:=(\tilde{\mathbf{I}}-\mathbf{A}^{-1}\mathbf{C})(\mathbf{y}-\mathbf{u}^{*})$. It is clear that the compound sparse denoising model (5) has the form (1) with

\bm{\psi}(\mathbf{u}):=\frac{1}{2}\|\mathbf{H}(\mathbf{y}-\mathbf{u})\|_{2}^{2}\ \mbox{for all}\ \mathbf{u}\in\mathbb{R}^{n}, \quad (6)

and $\mathbf{B}_{1}:=\mathbf{I}_{n}$, $\mathbf{B}_{2}:=\mathbf{D}$.

Using a technique similar to that used in the fused lasso regularized model, the fused SVM was proposed for classification of array-based comparative genomic hybridization (arrayCGH) data [21, 27]. Suppose we are given training data $\{(\mathbf{x}_{j},y_{j}):j\in\mathbb{N}_{p}\}$ composed of sample points $\{\mathbf{x}_{j}:j\in\mathbb{N}_{p}\}\subset\mathbb{R}^{n}$ and labels $\{y_{j}:j\in\mathbb{N}_{p}\}\subset\{-1,1\}$. The aim of binary classification is to find a decision function $f(\mathbf{x}):=\mathbf{u}^{\top}\mathbf{x}$, $\mathbf{x}\in\mathbb{R}^{n}$, predicting the class $y_{j}=-1$ or $y_{j}=1$. The class prediction for a profile $\mathbf{x}$ is $1$ if $f(\mathbf{x})\geq 0$ and $-1$ otherwise. The fused SVM based on the hinge loss function has the form

\min\left\{\sum_{j\in\mathbb{N}_{p}}\max(0,1-y_{j}\mathbf{u}^{\top}\mathbf{x}_{j})+\lambda_{1}\|\mathbf{u}\|_{1}+\lambda_{2}\|\mathbf{D}\mathbf{u}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\}. \quad (7)

We rewrite model (7) in the form (1) as follows. We define the matrix $\mathbf{X}:=[\mathbf{x}_{j}:j\in\mathbb{N}_{p}]^{\top}$, the matrix $\mathbf{Y}:=\mathrm{diag}(y_{j}:j\in\mathbb{N}_{p})$, and the function $\bm{\phi}(\mathbf{z}):=\sum_{j\in\mathbb{N}_{p}}\max\{0,1-z_{j}\}$ for all $\mathbf{z}:=[z_{j}:j\in\mathbb{N}_{p}]\in\mathbb{R}^{p}$. Then, by introducing $\bm{\psi}(\mathbf{u}):=\bm{\phi}(\mathbf{YXu})$ for $\mathbf{u}\in\mathbb{R}^{n}$, $\mathbf{B}_{1}:=\mathbf{I}_{n}$, and $\mathbf{B}_{2}:=\mathbf{D}$, the fused SVM model (7) can be represented in the form (1).
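A similar sketch (again ours, not from the paper, assuming CVXPY; cp.pos denotes the positive part) expresses the nondifferentiable hinge-loss fidelity term and solves the fused SVM model (7):

```python
# A minimal sketch (not from the paper): the fused SVM model (7) via CVXPY,
# with the hinge loss written as sum(pos(1 - y_j * u^T x_j)).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
p, n = 40, 15
X = rng.standard_normal((p, n))        # rows are the sample points x_j
y = rng.choice([-1.0, 1.0], size=p)    # labels

D = -np.eye(n - 1, n) + np.eye(n - 1, n, k=1)
lam1, lam2 = 0.1, 0.1                  # arbitrary illustrative parameters

u = cp.Variable(n)
hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ u)))
cp.Problem(cp.Minimize(hinge + lam1 * cp.norm1(u) + lam2 * cp.norm1(D @ u))).solve()
pred = np.sign(X @ u.value)            # class prediction for the training profiles
```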

3 Choices of the multiple regularization parameters

In this section, we characterize the relation between the multiple regularization parameters and the sparsity of the regularized solution under the transform matrices $\mathbf{B}_{j}$, $j\in\mathbb{N}_{d}$, of the regularization problem (1). Unlike the single-parameter regularization problem, the use of multiple regularization parameters in problem (1) allows us to consider separately the sparsity of the solution under each transform matrix $\mathbf{B}_{j}$.

We begin by recalling the definition of the level of sparsity of a vector in $\mathbb{R}^{s}$. For $s\in\mathbb{N}$, we set $\mathbb{Z}_{s}:=\{0,1,\ldots,s-1\}$. A vector $\mathbf{x}\in\mathbb{R}^{s}$ is said to have sparsity of level $l\in\mathbb{Z}_{s+1}$ if it has exactly $l$ nonzero components.
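In computations, the level of sparsity is determined up to a numerical tolerance; a minimal helper (ours, with an arbitrary tolerance) is:

```python
# A minimal helper (not from the paper): the sparsity level of a vector is
# the number of entries whose magnitude exceeds a small tolerance.
import numpy as np

def sparsity_level(x, tol=1e-10):
    """Number of nonzero components of x, up to the tolerance tol."""
    return int(np.sum(np.abs(np.asarray(x)) > tol))

# Example: this vector has sparsity of level 2.
assert sparsity_level([0.0, 3.0, 0.0, -1.5]) == 2
```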

We now reformulate the regularization problem (1) into an equivalent form to facilitate the characterization of the sparsity of the solution under the transform matrices $\mathbf{B}_{j}$, $j\in\mathbb{N}_{d}$. Let $p_{0}:=0$ and $p_{j}:=\sum_{i\in\mathbb{N}_{j}}m_{i}$, $j\in\mathbb{N}_{d}$. We decompose a vector $\mathbf{z}:=[z_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}}$ into $d$ sub-vectors by setting

\mathbf{z}_{j}:=[z_{p_{j-1}+i}:i\in\mathbb{N}_{m_{j}}]\in\mathbb{R}^{m_{j}}\ \mbox{for all}\ j\in\mathbb{N}_{d}.

By introducing a column block matrix

\mathbf{B}:=[\mathbf{B}_{j}:j\in\mathbb{N}_{d}]\in\mathbb{R}^{p_{d}\times n}, \quad (8)

we write $\mathbf{B}\mathbf{u}$ for the block column vector $[\mathbf{B}_{j}\mathbf{u}:j\in\mathbb{N}_{d}]$. Thus, we may rewrite the regularization problem (1) as

\min\left\{\bm{\psi}(\mathbf{u})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|(\mathbf{B}\mathbf{u})_{j}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\}. \quad (9)

We further convert problem (9) into an equivalent form by utilizing a change of variables. To achieve this, we consider inverting the linear system

\mathbf{B}\mathbf{u}=\mathbf{z},\ \mbox{for each}\ \mathbf{z}\in\mathcal{R}(\mathbf{B}). \quad (10)

Here, $\mathcal{R}(\mathbf{B})$ denotes the range of $\mathbf{B}$. It is known from [5, 12] that the general solution of the linear system (10) can be represented by the pseudoinverse of $\mathbf{B}$. An alternative form of the general solution was provided in [17]. To describe this result, we recall that if $\mathbf{B}$ has rank $r$ satisfying $0<r\leq\min\{p_{d},n\}$, then $\mathbf{B}$ has the SVD $\mathbf{B}=\mathbf{U}\mathbf{\Lambda}\mathbf{V}^{\top}$, where $\mathbf{U}$ and $\mathbf{V}$ are $p_{d}\times p_{d}$ and $n\times n$ orthogonal matrices, respectively, and $\mathbf{\Lambda}$ is a $p_{d}\times n$ diagonal matrix with nonzero diagonal entries $\sigma_{1}\geq\cdots\geq\sigma_{r}>0$, the nonzero singular values of $\mathbf{B}$. In order to represent the general solution of the linear system (10), we define an $n\times(p_{d}+n-r)$ matrix by employing the SVD of $\mathbf{B}$. Specifically, we denote by $\widetilde{\mathbf{U}}_{r}\in\mathbb{R}^{p_{d}\times r}$ the matrix composed of the first $r$ columns of $\mathbf{U}$ and define an $n\times(p_{d}+n-r)$ block diagonal matrix by setting

\mathbf{U}^{\prime}:=\mathrm{diag}\left(\widetilde{\mathbf{U}}_{r}^{\top},\mathbf{I}_{n-r}\right).

We also introduce a diagonal matrix of order $n$ by

\mathbf{\Lambda}^{\prime}:=\mathrm{diag}\left(\sigma_{1}^{-1},\sigma_{2}^{-1},\ldots,\sigma_{r}^{-1},1,\ldots,1\right).

Using these matrices, we define an $n\times(p_{d}+n-r)$ matrix by $\mathbf{B}^{\prime}:=\mathbf{V}\mathbf{\Lambda}^{\prime}\mathbf{U}^{\prime}$. As has been shown in [17], for each solution $\mathbf{u}$ of system (10), there exists a unique vector $\mathbf{v}\in\mathbb{R}^{n-r}$ such that

\mathbf{u}=\mathbf{B}^{\prime}\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}. \quad (11)

As a result, the mapping $\mathcal{B}$, defined for each $\mathbf{u}\in\mathbb{R}^{n}$ by $\mathcal{B}\mathbf{u}:=\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}$, where $\mathbf{z}:=\mathbf{B}\mathbf{u}$ and $\mathbf{v}\in\mathbb{R}^{n-r}$ satisfies equation (11), is a bijection from $\mathbb{R}^{n}$ onto $\mathcal{R}(\mathbf{B})\times\mathbb{R}^{n-r}$.
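The construction of $\mathbf{B}^{\prime}$ and the change of variables (11) can be realized numerically from the SVD. The following sketch (ours, using NumPy on a small rank-deficient example) builds $\mathbf{U}^{\prime}$, $\mathbf{\Lambda}^{\prime}$ and $\mathbf{B}^{\prime}$, and verifies that $\mathbf{u}=\mathbf{B}^{\prime}[\mathbf{z};\mathbf{v}]$ with $\mathbf{z}=\mathbf{B}\mathbf{u}$ and, as a consequence of the definitions above, $\mathbf{v}$ equal to the last $n-r$ coordinates of $\mathbf{V}^{\top}\mathbf{u}$:

```python
# A minimal numerical sketch (not from the paper) of the matrices U', Lambda'
# and B' built from the SVD of B, and of the change of variables u <-> (z, v).
import numpy as np

rng = np.random.default_rng(2)
p_d, n = 5, 4
B = rng.standard_normal((p_d, 2)) @ rng.standard_normal((2, n))  # rank 2
U, sigma, Vt = np.linalg.svd(B)        # full SVD: B = U @ Lam @ Vt
r = int(np.sum(sigma > 1e-10))         # numerical rank of B

# U' := diag(U_r^T, I_{n-r}), an n x (p_d + n - r) block diagonal matrix.
Uprime = np.zeros((n, p_d + n - r))
Uprime[:r, :p_d] = U[:, :r].T
Uprime[r:, p_d:] = np.eye(n - r)

# Lambda' := diag(1/sigma_1, ..., 1/sigma_r, 1, ..., 1), of order n.
Lamprime = np.diag(np.concatenate([1.0 / sigma[:r], np.ones(n - r)]))

Bprime = Vt.T @ Lamprime @ Uprime      # B' := V Lambda' U'

# Change of variables (11): z = B u, and v gives the remaining coordinates.
u = rng.standard_normal(n)
z, v = B @ u, (Vt @ u)[r:]
assert np.allclose(Bprime @ np.concatenate([z, v]), u)
```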

By making use of the change of variables defined by equation (11), we reformulate problem (9) as an equivalent multi-parameter regularization problem in which each $\mathbf{B}_{j}$, $j\in\mathbb{N}_{d}$, is a degenerated identity. We set $\mathbb{M}:=\mathcal{R}(\mathbf{B})\times\mathbb{R}^{n-r}$ and let $\iota_{\mathbb{M}}:\mathbb{R}^{p_{d}+n-r}\rightarrow\overline{\mathbb{R}}:=\mathbb{R}\cup\{+\infty\}$ denote the indicator function of $\mathbb{M}$, that is, $\iota_{\mathbb{M}}(\mathbf{w})=0$ if $\mathbf{w}\in\mathbb{M}$, and $+\infty$ otherwise. For each $j\in\mathbb{N}_{d}$, we introduce a degenerated identity matrix by

\mathbf{I}^{\prime}_{j}:=[\mathbf{0}_{m_{j}\times p_{j-1}}\ \mathbf{I}_{m_{j}}\ \mathbf{0}_{m_{j}\times(p_{d}-p_{j})}\ \mathbf{0}_{m_{j}\times(n-r)}]\in\mathbb{R}^{m_{j}\times(p_{d}+n-r)}, \quad (12)

where $\mathbf{0}_{s\times t}$ denotes the $s\times t$ zero matrix. We show in the following lemma that the regularization problem (1) is equivalent to the regularization problem

\min\left\{\bm{\psi}\circ\mathbf{B}^{\prime}(\mathbf{w})+\iota_{\mathbb{M}}(\mathbf{w})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{I}^{\prime}_{j}\mathbf{w}\|_{1}:\mathbf{w}\in\mathbb{R}^{p_{d}+n-r}\right\}. \quad (13)
Lemma 3.1.

If the matrix $\mathbf{B}$ has the form (8) and $\mathbf{B}^{\prime}$, $\mathcal{B}$ are defined as above, then $\mathbf{u}^{*}$ is a solution of the regularization problem (1) if and only if $\mathcal{B}\mathbf{u}^{*}$ is a solution of the regularization problem (13).

Proof.

We first prove that $\mathbf{u}^{*}$ is a solution of problem (1) if and only if $\mathcal{B}\mathbf{u}^{*}$ is a solution of the constrained optimization problem

\min\left\{\bm{\psi}\circ\mathbf{B}^{\prime}(\mathbf{w})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{I}^{\prime}_{j}\mathbf{w}\|_{1}:\mathbf{w}\in\mathbb{M}\right\}. \quad (14)

As has been shown in [17], $\mathcal{B}$ is a bijective mapping from $\mathbb{R}^{n}$ to $\mathbb{M}$. It suffices to verify that for all $\mathbf{u}\in\mathbb{R}^{n}$ there holds

\bm{\psi}(\mathbf{u})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|(\mathbf{B}\mathbf{u})_{j}\|_{1}=\bm{\psi}\circ\mathbf{B}^{\prime}(\mathcal{B}\mathbf{u})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{I}^{\prime}_{j}(\mathcal{B}\mathbf{u})\|_{1}.

By the definition of the mapping $\mathcal{B}$, we get that $\mathbf{B}^{\prime}\mathcal{B}\mathbf{u}=\mathbf{u}$ and $\mathbf{I}^{\prime}_{j}(\mathcal{B}\mathbf{u})=(\mathbf{B}\mathbf{u})_{j}$, which confirm the validity of the equation above.

We next show that problem (14), as a constrained optimization problem, is equivalent to the unconstrained optimization problem (13). By the definition of the indicator function $\iota_{\mathbb{M}}$, the minimum of problem (13) is attained at an element $\mathbf{w}\in\mathbb{M}$. Thus, problem (13) can be rewritten as problem (14). ∎

Below, we consider how the regularization parameters $\lambda_{j}$, $j\in\mathbb{N}_{d}$, influence the sparsity of the solution of problem (13). To characterize the solution of problem (13), we need the notion of the subdifferential of a convex function. Suppose that $f:\mathbb{R}^{s}\to\overline{\mathbb{R}}$ is a proper convex function. The subdifferential of $f$ at $\mathbf{x}\in\mathrm{dom}(f):=\{\mathbf{y}\in\mathbb{R}^{s}:f(\mathbf{y})<+\infty\}$ is defined by

\partial f(\mathbf{x}):=\{\mathbf{y}\in\mathbb{R}^{s}:\ f(\mathbf{z})\geq f(\mathbf{x})+\langle\mathbf{y},\mathbf{z}-\mathbf{x}\rangle,\ \mbox{for all}\ \mathbf{z}\in\mathbb{R}^{s}\}.

It is known [30] that for two convex functions $f$ and $g$ on $\mathbb{R}^{s}$, if $g$ is continuous on $\mathbb{R}^{s}$, then $\partial(f+g)(\mathbf{x})=\partial f(\mathbf{x})+\partial g(\mathbf{x})$ for all $\mathbf{x}\in\mathbb{R}^{s}$. We also describe the chain rule of the subdifferential [24]. Suppose that $f:\mathbb{R}^{s}\rightarrow\overline{\mathbb{R}}$ is a convex function and $\mathbf{M}$ is an $s\times t$ matrix. If $f$ is continuous at some point of the range of $\mathbf{M}$, then for all $\mathbf{x}\in\mathbb{R}^{t}$,

\partial(f\circ\mathbf{M})(\mathbf{x})=\mathbf{M}^{\top}\partial f(\mathbf{M}\mathbf{x}). \quad (15)

The Fermat rule [30] states that a proper convex function $f:\mathbb{R}^{s}\rightarrow\overline{\mathbb{R}}$ has a minimum at $\mathbf{x}\in\mathbb{R}^{s}$ if and only if $\mathbf{0}\in\partial f(\mathbf{x})$.

For the purpose of characterizing the sparsity of vectors in $\mathbb{R}^{s}$, we also recall the notion of the sparsity partition of $\mathbb{R}^{s}$ introduced in [29]. Specifically, by using the canonical basis $\mathbf{e}_{s,j}$, $j\in\mathbb{N}_{s}$, of $\mathbb{R}^{s}$, we introduce $s+1$ subsets of $\mathbb{R}^{s}$ by

\Omega_{s,0}:=\{\mathbf{0}\in\mathbb{R}^{s}\},
\Omega_{s,l}:=\left\{\sum_{j\in\mathbb{N}_{l}}x_{k_{j}}\mathbf{e}_{s,k_{j}}:x_{k_{j}}\in\mathbb{R}\setminus\{0\},\ \mbox{for}\ 1\leq k_{1}<k_{2}<\cdots<k_{l}\leq s\right\},\ \mbox{for}\ l\in\mathbb{N}_{s}.

It is clear that the sets $\Omega_{s,l}$, $l\in\mathbb{Z}_{s+1}$, form a partition of $\mathbb{R}^{s}$, and for each $l\in\mathbb{Z}_{s+1}$, $\Omega_{s,l}$ coincides with the set of all vectors in $\mathbb{R}^{s}$ having sparsity of level $l$.

We provide in the next lemma a characterization of the sparsity of the solution of problem (13). For an $s\times t$ matrix $\mathbf{M}$, we denote by $\mathcal{N}(\mathbf{M})$ the null space of $\mathbf{M}$ and, for each $i\in\mathbb{N}_{t}$, by $\mathbf{M}_{(i)}$ the $i$th column of $\mathbf{M}$.

Lemma 3.2.

Suppose that $\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+}$ is a convex function and, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}$ is an $m_{j}\times n$ matrix. Let $\mathbf{B}$ be defined as in (8). Then problem (13) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{w}^{*}:=\begin{bmatrix}\mathbf{z}^{*}\\ \mathbf{v}^{*}\end{bmatrix}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{z}_{j}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if there exist $\mathbf{a}\in\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w}^{*})$ and $\mathbf{b}:=[b_{j}:j\in\mathbb{N}_{p_{d}}]\in\mathcal{N}(\mathbf{B}^{\top})$ such that

(\mathbf{B}^{\prime}_{(i)})^{\top}\mathbf{a}=0,\ i\in\mathbb{N}_{p_{d}+n-r}\setminus\mathbb{N}_{p_{d}}, \quad (16)

and for each $j\in\mathbb{N}_{d}$,

\lambda_{j}=-\left((\mathbf{B}_{(p_{j-1}+k_{i})}^{\prime})^{\top}\mathbf{a}+b_{p_{j-1}+k_{i}}\right)\mathrm{sign}(z_{p_{j-1}+k_{i}}^{*}),\ i\in\mathbb{N}_{l_{j}}, \quad (17)
\lambda_{j}\geq\left|(\mathbf{B}_{(p_{j-1}+i)}^{\prime})^{\top}\mathbf{a}+b_{p_{j-1}+i}\right|,\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}. \quad (18)
Proof.

By the Fermat rule, $\mathbf{w}^{*}:=\begin{bmatrix}\mathbf{z}^{*}\\ \mathbf{v}^{*}\end{bmatrix}$ is a solution of problem (13) if and only if

\mathbf{0}\in\partial\left(\bm{\psi}\circ\mathbf{B}^{\prime}+\iota_{\mathbb{M}}+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{\prime}_{j}\right)(\mathbf{w}^{*}),

which, by the continuity of the $\ell_1$ norm and the chain rule (15) of the subdifferential, is equivalent to

\mathbf{0}\in(\mathbf{B}^{\prime})^{\top}\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w}^{*})+\partial\iota_{\mathbb{M}}(\mathbf{w}^{*})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}(\mathbf{I}^{\prime}_{j})^{\top}\partial\|\cdot\|_{1}(\mathbf{z}_{j}^{*}). \quad (19)

Let $\mathbb{M}^{\bot}$ denote the orthogonal complement of $\mathbb{M}$. It is known that $\partial\iota_{\mathbb{M}}(\mathbf{w})=\mathbb{M}^{\bot}$ for all $\mathbf{w}\in\mathbb{M}$. Recalling that $\mathbb{M}:=\mathcal{R}(\mathbf{B})\times\mathbb{R}^{n-r}$, we get that

\mathbb{M}^{\bot}=(\mathcal{R}(\mathbf{B}))^{\bot}\times(\mathbb{R}^{n-r})^{\bot}=\mathcal{N}(\mathbf{B}^{\top})\times\{\mathbf{0}\}.

As a result, $\partial\iota_{\mathbb{M}}(\mathbf{w})=\mathcal{N}(\mathbf{B}^{\top})\times\{\mathbf{0}\}$ for all $\mathbf{w}\in\mathbb{M}$. Substituting the above equation with $\mathbf{w}:=\mathbf{w}^{*}$ into the inclusion relation (19), we conclude that $\mathbf{w}^{*}:=\begin{bmatrix}\mathbf{z}^{*}\\ \mathbf{v}^{*}\end{bmatrix}$ is a solution of problem (13) if and only if there exist $\mathbf{a}\in\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w}^{*})$ and $\mathbf{b}\in\mathcal{N}(\mathbf{B}^{\top})$ such that

-(\mathbf{B}^{\prime})^{\top}\mathbf{a}-\begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\in\sum_{j\in\mathbb{N}_{d}}\lambda_{j}(\mathbf{I}^{\prime}_{j})^{\top}\partial\|\cdot\|_{1}(\mathbf{z}_{j}^{*}). \quad (20)

Note that for each $j\in\mathbb{N}_{d}$,

\mathbf{z}_{j}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}},\ \mbox{with}\ z_{p_{j-1}+k_{i}}^{*}\in\mathbb{R}\setminus\{0\},\ i\in\mathbb{N}_{l_{j}},

with which we obtain that

\partial\|\cdot\|_{1}(\mathbf{z}_{j}^{*})=\left\{\mathbf{x}\in\mathbb{R}^{m_{j}}:x_{k_{i}}=\mathrm{sign}(z_{p_{j-1}+k_{i}}^{*}),\ i\in\mathbb{N}_{l_{j}},\ \mbox{and}\ |x_{i}|\leq 1,\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}\right\}.

Combining the inclusion relation (20) with the above equation, we have that $(\mathbf{B}^{\prime}_{(i)})^{\top}\mathbf{a}=0$ for all $i\in\mathbb{N}_{p_{d}+n-r}\setminus\mathbb{N}_{p_{d}}$, which coincides with equation (16), and for each $j\in\mathbb{N}_{d}$,

-(\mathbf{B}_{(p_{j-1}+k_{i})}^{\prime})^{\top}\mathbf{a}-b_{p_{j-1}+k_{i}}=\lambda_{j}\mathrm{sign}(z_{p_{j-1}+k_{i}}^{*}),\ i\in\mathbb{N}_{l_{j}},
-(\mathbf{B}_{(p_{j-1}+i)}^{\prime})^{\top}\mathbf{a}-b_{p_{j-1}+i}\in[-\lambda_{j},\lambda_{j}],\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\},

which are equivalent to equation (17) and inequality (18), respectively. ∎

Combining Lemma 3.1 with Lemma 3.2, we establish the relation between the regularization parameters and the sparsity of the solution of problem (1).

Theorem 3.3.

Suppose that $\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+}$ is a convex function and, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}$ is an $m_{j}\times n$ matrix. Let $\mathbf{B}$ be defined as in (8). Then problem (1) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}\in\mathbb{R}^{n}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}\mathbf{u}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if there exist $\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*})$ and $\mathbf{b}:=[b_{j}:j\in\mathbb{N}_{p_{d}}]\in\mathcal{N}(\mathbf{B}^{\top})$ such that (16), (17) and (18) hold.

Proof.

It follows from Lemma 3.1 that $\mathbf{u}^{*}\in\mathbb{R}^{n}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}\mathbf{u}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ is a solution of problem (1) if and only if $\mathcal{B}\mathbf{u}^{*}:=\begin{bmatrix}\mathbf{z}^{*}\\ \mathbf{v}^{*}\end{bmatrix}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{z}_{j}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$, is a solution of problem (13). By Lemma 3.2, the latter is equivalent to the existence of $\mathbf{a}\in\partial\bm{\psi}(\mathbf{B}^{\prime}\mathcal{B}\mathbf{u}^{*})$ and $\mathbf{b}\in\mathcal{N}(\mathbf{B}^{\top})$ such that (16), (17) and (18) hold. It suffices to show that $\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*})$. This follows from noting that $\mathbf{B}^{\prime}\mathcal{B}\mathbf{u}^{*}=\mathbf{u}^{*}$. ∎

Theorem 3.3 provides a characterization of the multiple regularization parameters $\lambda_{j}$, $j\in\mathbb{N}_{d}$, with which problem (1) has a solution with sparsity of a certain level $l_{j}$ under each transform matrix $\mathbf{B}_{j}$. By specifying the sparsity level of the solution under each transform matrix $\mathbf{B}_{j}$ to be $l_{j}^{*}$, our goal is to find regularization parameters $\lambda_{j}^{*}$, $j\in\mathbb{N}_{d}$, satisfying conditions (16), (17) and (18). However, since these conditions depend on the corresponding solution, the characterization stated in Theorem 3.3 cannot be used directly as a multi-parameter choice strategy. Motivated by Theorem 3.3, an iterative scheme to be developed in Section 5 will enable us to choose multiple regularization parameters with which a minimizer of problem (1) has a prescribed sparsity level under each transform matrix.

The next result concerns the special case where the matrix $\mathbf{B}$ defined by (8) has full row rank, that is, $\mathrm{rank}(\mathbf{B})=p_{d}$.

Corollary 3.4.

Suppose that $\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+}$ is a convex function and, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}$ is an $m_{j}\times n$ matrix. If $\mathbf{B}$ defined by (8) satisfies $\mathrm{rank}(\mathbf{B})=p_{d}$, then problem (1) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}\in\mathbb{R}^{n}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}\mathbf{u}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if there exists $\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*})$ such that

(\mathbf{B}^{\prime}_{(i)})^{\top}\mathbf{a}=0,\ i\in\mathbb{N}_{n}\setminus\mathbb{N}_{p_{d}}, \quad (21)

and for each $j\in\mathbb{N}_{d}$,

\lambda_{j}=-(\mathbf{B}_{(p_{j-1}+k_{i})}^{\prime})^{\top}\mathbf{a}\,\mathrm{sign}(z_{p_{j-1}+k_{i}}^{*}),\ i\in\mathbb{N}_{l_{j}}, \quad (22)
\lambda_{j}\geq\left|(\mathbf{B}_{(p_{j-1}+i)}^{\prime})^{\top}\mathbf{a}\right|,\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}. \quad (23)
Proof.

Theorem 3.3 ensures that $\mathbf{u}^{*}\in\mathbb{R}^{n}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}\mathbf{u}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ is a solution of problem (1) if and only if there exist $\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*})$ and $\mathbf{b}\in\mathcal{N}(\mathbf{B}^{\top})$ such that (16), (17) and (18) hold. By the assumption that $\mathrm{rank}(\mathbf{B})=p_{d}$, we rewrite equation (16) as equation (21). It also follows from $\mathrm{rank}(\mathbf{B})=p_{d}$ that $\mathcal{N}(\mathbf{B}^{\top})=(\mathcal{R}(\mathbf{B}))^{\bot}=\{\mathbf{0}\}$. Hence the vector $\mathbf{b}$ in (17) and (18) is the zero vector, and (22) and (23) follow directly. ∎

Note that if $\bm{\psi}$ is differentiable, then the subdifferential of $\bm{\psi}$ at $\mathbf{u}^{*}$ is the singleton $\{\nabla\bm{\psi}(\mathbf{u}^{*})\}$. In this case, Theorem 3.3 and Corollary 3.4 have the following simple form.

Corollary 3.5.

Suppose that $\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+}$ is a differentiable convex function and, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}$ is an $m_{j}\times n$ matrix. Let $\mathbf{B}$ be defined as in (8). Then problem (1) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}\in\mathbb{R}^{n}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{B}_{j}\mathbf{u}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if there exists $\mathbf{b}:=[b_{j}:j\in\mathbb{N}_{p_{d}}]\in\mathcal{N}(\mathbf{B}^{\top})$ such that

(\mathbf{B}^{\prime}_{(i)})^{\top}\nabla\bm{\psi}(\mathbf{u}^{*})=0,\ i\in\mathbb{N}_{p_{d}+n-r}\setminus\mathbb{N}_{p_{d}},

and for each $j\in\mathbb{N}_{d}$,

\lambda_{j}=-\left((\mathbf{B}_{(p_{j-1}+k_{i})}^{\prime})^{\top}\nabla\bm{\psi}(\mathbf{u}^{*})+b_{p_{j-1}+k_{i}}\right)\mathrm{sign}(z_{p_{j-1}+k_{i}}^{*}),\ i\in\mathbb{N}_{l_{j}},
\lambda_{j}\geq\left|(\mathbf{B}_{(p_{j-1}+i)}^{\prime})^{\top}\nabla\bm{\psi}(\mathbf{u}^{*})+b_{p_{j-1}+i}\right|,\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}.

In particular, if $\mathrm{rank}(\mathbf{B})=p_{d}$, then the conditions reduce to

(\mathbf{B}^{\prime}_{(i)})^{\top}\nabla\bm{\psi}(\mathbf{u}^{*})=0,\ i\in\mathbb{N}_{n}\setminus\mathbb{N}_{p_{d}},

and for each $j\in\mathbb{N}_{d}$,

\lambda_{j}=-(\mathbf{B}_{(p_{j-1}+k_{i})}^{\prime})^{\top}\nabla\bm{\psi}(\mathbf{u}^{*})\,\mathrm{sign}(z_{p_{j-1}+k_{i}}^{*}),\ i\in\mathbb{N}_{l_{j}},
\lambda_{j}\geq\left|(\mathbf{B}_{(p_{j-1}+i)}^{\prime})^{\top}\nabla\bm{\psi}(\mathbf{u}^{*})\right|,\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}.

4 A special model with degenerated transform matrices

In this section, we consider the special case where $p_{d}=n$ and, for each $j\in\mathbb{N}_{d}$, the transform matrix $\mathbf{B}_{j}$ takes the form

\mathbf{B}_{j}:=[\mathbf{0}_{m_{j}\times p_{j-1}}\ \mathbf{I}_{m_{j}}\ \mathbf{0}_{m_{j}\times(n-p_{j})}]\in\mathbb{R}^{m_{j}\times n}. \quad (24)

In this scenario, the multi-parameter regularization problem (1) assumes the special form

\min\left\{\bm{\psi}(\mathbf{u})+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{u}_{j}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\}. \quad (25)

We specialize the characterizations of the sparsity of regularized solutions established in the previous section to this case. Moreover, particular attention is given to scenarios where the fidelity term $\bm{\psi}$ is block separable.

We first characterize the sparsity of the solution of problem (25). It follows from equation (24) that the matrix $\mathbf{B}$ defined by (8) coincides with the identity matrix of order $n$. Given that $\mathbf{B}$ has full row rank, we specialize Corollary 3.4 to the regularization problem (25). In addition, the transform matrices $\mathbf{B}_{j}$, $j\in\mathbb{N}_{d}$, of the form (24) enable us to consider the sparsity of each sub-vector of the regularized solution separately.

Theorem 4.1.

Suppose that $\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+}$ is a convex function. Then problem (25) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{u}_{j}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}u^{*}_{p_{j-1}+k_{i}}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if there exists $\mathbf{a}:=[a_{j}:j\in\mathbb{N}_{n}]\in\partial\bm{\psi}(\mathbf{u}^{*})$ such that for each $j\in\mathbb{N}_{d}$,

\lambda_{j}=-a_{p_{j-1}+k_{i}}\mathrm{sign}(u^{*}_{p_{j-1}+k_{i}}),\ i\in\mathbb{N}_{l_{j}}, \quad (26)
\lambda_{j}\geq|a_{p_{j-1}+i}|,\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}. \quad (27)

In particular, if $\bm{\psi}$ is differentiable, then the conditions reduce to, for each $j\in\mathbb{N}_{d}$,

\lambda_{j}=-\frac{\partial\bm{\psi}}{\partial u_{p_{j-1}+k_{i}}}(\mathbf{u}^{*})\,\mathrm{sign}(u^{*}_{p_{j-1}+k_{i}}),\ i\in\mathbb{N}_{l_{j}}, \quad (28)
\lambda_{j}\geq\left|\frac{\partial\bm{\psi}}{\partial u_{p_{j-1}+i}}(\mathbf{u}^{*})\right|,\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}. \quad (29)
Proof.

Since the matrix $\mathbf{B}$ defined by equation (8) has full row rank, Corollary 3.4 ensures that problem (25) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{u}_{j}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}u^{*}_{p_{j-1}+k_{i}}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if there exists $\mathbf{a}:=[a_{j}:j\in\mathbb{N}_{n}]\in\partial\bm{\psi}(\mathbf{u}^{*})$ such that conditions (21), (22) and (23) hold for each $j\in\mathbb{N}_{d}$. Note that the index set in condition (21) is empty since $p_{d}=n$. It is clear that the matrix $\mathbf{B}^{\prime}$ appearing in conditions (22) and (23) is also the identity matrix. Hence, conditions (22) and (23) reduce to (26) and (27), respectively. If $\bm{\psi}$ is differentiable, then the subdifferential of $\bm{\psi}$ at $\mathbf{u}^{*}$ is the singleton $\{\nabla\bm{\psi}(\mathbf{u}^{*})\}$. Substituting $a_{j}=\frac{\partial\bm{\psi}}{\partial u_{j}}(\mathbf{u}^{*})$ into (26) and (27) leads directly to (28) and (29), respectively. ∎

As a specific example, we consider the regularization problem

\min\left\{\frac{1}{2}\|\mathbf{A}\mathbf{u}-\mathbf{x}\|_{2}^{2}+\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{u}_{j}\|_{1}:\mathbf{u}\in\mathbb{R}^{n}\right\}. \quad (30)

In this model, the fidelity term

\bm{\psi}(\mathbf{u}):=\frac{1}{2}\|\mathbf{A}\mathbf{u}-\mathbf{x}\|_{2}^{2},\ \mathbf{u}\in\mathbb{R}^{n}, \quad (31)

is convex and differentiable. As a result, we apply Theorem 4.1 to this model.

Corollary 4.2.

Suppose that $\mathbf{x}\in\mathbb{R}^{t}$ and $\mathbf{A}\in\mathbb{R}^{t\times n}$ are given. Then the regularization problem (30) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{u}_{j}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}u_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if for each $j\in\mathbb{N}_{d}$,

\lambda_{j}=(\mathbf{A}_{(p_{j-1}+k_{i})})^{\top}(\mathbf{x}-\mathbf{A}\mathbf{u}^{*})\,\mathrm{sign}(u^{*}_{p_{j-1}+k_{i}}),\ i\in\mathbb{N}_{l_{j}}, \quad (32)
\lambda_{j}\geq\big{|}(\mathbf{A}_{(p_{j-1}+i)})^{\top}(\mathbf{A}\mathbf{u}^{*}-\mathbf{x})\big{|},\ i\in\mathbb{N}_{m_{j}}\setminus\{k_{i}:i\in\mathbb{N}_{l_{j}}\}. \quad (33)
Proof.

Since the fidelity term $\bm{\psi}$ defined by (31) is convex and differentiable, Theorem 4.1 confirms that problem (30) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{u}_{j}^{*}=\sum_{i\in\mathbb{N}_{l_{j}}}u^{*}_{p_{j-1}+k_{i}}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}}$ for some $l_{j}\in\mathbb{Z}_{m_{j}+1}$, if and only if (28) and (29) hold. Note that the gradient of $\bm{\psi}$ at $\mathbf{u}^{*}$ has the form $\nabla\bm{\psi}(\mathbf{u}^{*})=\mathbf{A}^{\top}(\mathbf{A}\mathbf{u}^{*}-\mathbf{x})$. As a result, for each $j\in\mathbb{N}_{d}$ and each $i\in\mathbb{N}_{m_{j}}$,

\frac{\partial\bm{\psi}}{\partial u_{p_{j-1}+i}}(\mathbf{u}^{*})=(\mathbf{A}_{(p_{j-1}+i)})^{\top}(\mathbf{A}\mathbf{u}^{*}-\mathbf{x}).

According to these representations of the partial derivatives of $\bm{\psi}$, conditions (28) and (29) reduce to (32) and (33), respectively. ∎
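Conditions (32) and (33) can be verified numerically for a computed solution. The following sketch (ours, not from the paper, assuming CVXPY for the solve) checks them for a lasso instance of (30) with $d=1$, up to solver tolerance:

```python
# A minimal sketch (not from the paper): verifying conditions (32)-(33)
# for a lasso instance of problem (30) with d = 1.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
t, n, lam = 60, 25, 2.0
A = rng.standard_normal((t, n))
x = rng.standard_normal(t)

u = cp.Variable(n)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ u - x) + lam * cp.norm1(u))).solve()

ustar = u.value
corr = A.T @ (x - A @ ustar)              # (A_(i))^T (x - A u*), i in N_n
support = np.abs(ustar) > 1e-6
# Condition (32): lambda = corr_i * sign(u*_i) on the support.
print(np.allclose(corr[support] * np.sign(ustar[support]), lam, atol=1e-3))
# Condition (33): lambda >= |corr_i| off the support.
print(np.all(np.abs(corr[~support]) <= lam + 1e-3))
```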

We next study the case where the fidelity term $\bm{\psi}$ in problem (25) has a special structure, namely, $\bm{\psi}$ is block separable. To describe the block separability of a function on $\mathbb{R}^{s}$, we introduce a partition of the index set $\mathbb{N}_{s}$. Let $q\in\mathbb{N}$ with $q\leq s$. We suppose that $\mathcal{S}_{s,q}:=\{S_{s,1},S_{s,2},\ldots,S_{s,q}\}$ is a partition of $\mathbb{N}_{s}$ in the sense that $S_{s,k}\neq\emptyset$ for all $k\in\mathbb{N}_{q}$, $S_{s,k}\cap S_{s,l}=\emptyset$ if $k\neq l$, and $\cup_{k\in\mathbb{N}_{q}}S_{s,k}=\mathbb{N}_{s}$. For each $k\in\mathbb{N}_{q}$, we denote by $s_{k}$ the cardinality of $S_{s,k}$ and regard $S_{s,k}$ as an ordered set in the natural order of the elements of $\mathbb{N}_{s}$. That is,

S_{s,k}:=\{i_{k,1},\ldots,i_{k,s_{k}}\},\ \mbox{with}\ i_{k,l}\in\mathbb{N}_{s},\ l\in\mathbb{N}_{s_{k}},\ \mbox{and}\ i_{k,1}<\cdots<i_{k,s_{k}}.

Associated with the partition $\mathcal{S}_{s,q}$, we decompose $\mathbf{w}:=[w_{k}:k\in\mathbb{N}_{s}]\in\mathbb{R}^{s}$ into $q$ sub-vectors by setting

\mathbf{w}_{S_{s,k}}:=[w_{i_{k,1}},\ldots,w_{i_{k,s_{k}}}]\in\mathbb{R}^{s_{k}},\ k\in\mathbb{N}_{q}.

A function $\bm{\phi}:\mathbb{R}^{s}\rightarrow\mathbb{R}$ is called $\mathcal{S}_{s,q}$-block separable if there exist functions $\bm{\phi}_{k}:\mathbb{R}^{s_{k}}\rightarrow\mathbb{R}$, $k\in\mathbb{N}_{q}$, such that

\bm{\phi}(\mathbf{w})=\sum_{k\in\mathbb{N}_{q}}\bm{\phi}_{k}(\mathbf{w}_{S_{s,k}}),\ \mathbf{w}\in\mathbb{R}^{s}.

We now describe the block separability of the fidelity term $\bm{\psi}$. Recall that $p_{d}=n$. If the partition $\mathcal{S}_{n,d}:=\{S_{n,1},S_{n,2},\ldots,S_{n,d}\}$ of $\mathbb{N}_{n}$ is chosen with $S_{n,j}:=\{p_{j-1}+i:i\in\mathbb{N}_{m_{j}}\}$, $j\in\mathbb{N}_{d}$, then for each $j\in\mathbb{N}_{d}$ the sub-vector $\mathbf{u}_{S_{n,j}}$ of $\mathbf{u}\in\mathbb{R}^{n}$ coincides with $\mathbf{u}_{j}$. It is clear that the regularization term in problem (25) is $\mathcal{S}_{n,d}$-block separable. We also assume that $\bm{\psi}$ is $\mathcal{S}_{n,d}$-block separable, that is, there exist functions $\bm{\psi}_{j}:\mathbb{R}^{m_{j}}\rightarrow\mathbb{R}_{+}$, $j\in\mathbb{N}_{d}$, such that

\bm{\psi}(\mathbf{u})=\sum_{j\in\mathbb{N}_{d}}\bm{\psi}_{j}(\mathbf{u}_{j}),\ \mathbf{u}\in\mathbb{R}^{n}. \quad (34)

Combining the block separability of the fidelity term $\bm{\psi}$ with that of the norm function $\|\cdot\|_{1}$, the multi-parameter regularization problem (25) can be reduced to the following lower-dimensional single-parameter regularization problems:

\min\left\{\bm{\psi}_{j}(\mathbf{u}_{j})+\lambda_{j}\|\mathbf{u}_{j}\|_{1}:\mathbf{u}_{j}\in\mathbb{R}^{m_{j}}\right\},\ j\in\mathbb{N}_{d}. \quad (35)
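For instance, in the special case $\bm{\psi}_{j}(\mathbf{u}_{j}):=\frac{1}{2}\|\mathbf{u}_{j}-\mathbf{c}_{j}\|_{2}^{2}$ (an illustrative choice of ours, not one made in the paper), each subproblem (35) is solved in closed form by the soft-thresholding proximity operator, as the following sketch shows:

```python
# A minimal sketch (our illustration): when psi_j(u_j) = 0.5*||u_j - c_j||_2^2,
# each subproblem (35) decouples and is solved by soft-thresholding,
# u_j* = prox_{lambda_j ||.||_1}(c_j).
import numpy as np

def soft_threshold(c, lam):
    """Proximity operator of lam*||.||_1 at c."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

c_blocks = [np.array([0.3, -2.0, 1.2]), np.array([0.05, -0.4])]
lams = [0.5, 0.1]
u_star = [soft_threshold(c, lam) for c, lam in zip(c_blocks, lams)]
# A larger lambda_j forces more zero entries in the corresponding block u_j*.
```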

Note that the sparsity of the solution of each single-parameter regularization problem (35) was characterized in [17]. This characterization can also be derived from Theorem 4.1. We further assume that the functions $\bm{\psi}_{j}$, $j\in\mathbb{N}_{d}$, are themselves block separable. For each $j\in\mathbb{N}_{d}$, let $\mathcal{S}_{m_{j},q_{j}}:=\{S_{m_{j},1},S_{m_{j},2},\ldots,S_{m_{j},q_{j}}\}$ be a partition of $\mathbb{N}_{m_{j}}$ and, for each $k\in\mathbb{N}_{q_{j}}$, let $s_{j,k}$ be the cardinality of $S_{m_{j},k}$. For each $\mathbf{u}\in\mathbb{R}^{n}$, we set $\mathbf{u}_{j,k}:=(\mathbf{u}_{j})_{S_{m_{j},k}}$ for all $j\in\mathbb{N}_{d}$ and $k\in\mathbb{N}_{q_{j}}$. Suppose that for each $j\in\mathbb{N}_{d}$, $\bm{\psi}_{j}$ has the form

\bm{\psi}_{j}(\mathbf{u}_{j})=\sum_{k\in\mathbb{N}_{q_{j}}}\bm{\psi}_{j,k}(\mathbf{u}_{j,k}),\ \mathbf{u}_{j}\in\mathbb{R}^{m_{j}}, \quad (36)

with $\bm{\psi}_{j,k}$, $k\in\mathbb{N}_{q_{j}}$, being functions from $\mathbb{R}^{s_{j,k}}$ to $\mathbb{R}_{+}$.

We are ready to characterize the block sparsity of each sub-vector of the solution of problem (25) when $\bm{\psi}$ has the block separability described above. Here, we say that a vector $\mathbf{x}\in\mathbb{R}^{s}$ has $\mathcal{S}_{s,q}$-block sparsity of level $l\in\mathbb{Z}_{q+1}$ if $\mathbf{x}$ has exactly $l$ nonzero sub-vectors with respect to the partition $\mathcal{S}_{s,q}$.

Theorem 4.3.

Suppose that for each $j\in\mathbb{N}_{d}$ and each $k\in\mathbb{N}_{q_{j}}$, $\bm{\psi}_{j,k}:\mathbb{R}^{s_{j,k}}\to\mathbb{R}_{+}$ is a convex function and $\bm{\psi}_{j}$ is an $\mathcal{S}_{m_{j},q_{j}}$-block separable function of the form (36). Let $\bm{\psi}$ be the function of the form (34). Then problem (25) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{u}_{j}^{*}$ having $\mathcal{S}_{m_{j},q_{j}}$-block sparsity of level $l^{\prime}_{j}\leq l_{j}$ for some $l_{j}\in\mathbb{Z}_{q_{j}+1}$, if and only if for each $j\in\mathbb{N}_{d}$ there exist distinct $k_{j,i}\in\mathbb{N}_{q_{j}}$, $i\in\mathbb{N}_{l_{j}}$, such that

\lambda_{j}\geq\min\left\{\|\mathbf{y}\|_{\infty}:\mathbf{y}\in\partial\bm{\psi}_{j,k}(\mathbf{0})\right\},\ \mbox{for all}\ k\in\mathbb{N}_{q_{j}}\setminus\{k_{j,i}:i\in\mathbb{N}_{l_{j}}\}. \quad (37)

In particular, if $\bm{\psi}_{j,k}$, $j\in\mathbb{N}_{d}$, $k\in\mathbb{N}_{q_{j}}$, are differentiable, then condition (37) reduces to

\lambda_{j}\geq\|\nabla\bm{\psi}_{j,k}(\mathbf{0})\|_{\infty},\ \mbox{for all}\ k\in\mathbb{N}_{q_{j}}\setminus\{k_{j,i}:i\in\mathbb{N}_{l_{j}}\}. \quad (38)
Proof.

From the block separability of the functions $\bm{\psi}$ and $\|\cdot\|_{1}$, we conclude that $\mathbf{u}^{*}\in\mathbb{R}^{n}$ is a solution of problem (25) if and only if for each $j\in\mathbb{N}_{d}$, $\mathbf{u}^{*}_{j}\in\mathbb{R}^{m_{j}}$ is a solution of problem (35). Theorem 3.2 of [17] ensures that for each $j\in\mathbb{N}_{d}$, $\mathbf{u}^{*}_{j}\in\mathbb{R}^{m_{j}}$ is a solution of problem (35) having $\mathcal{S}_{m_{j},q_{j}}$-block sparsity of level $l^{\prime}_{j}\leq l_{j}$ for some $l_{j}\in\mathbb{Z}_{q_{j}+1}$ if and only if there exist distinct $k_{j,i}\in\mathbb{N}_{q_{j}}$, $i\in\mathbb{N}_{l_{j}}$, such that (37) holds. For the case where $\bm{\psi}_{j,k}$, $j\in\mathbb{N}_{d}$, $k\in\mathbb{N}_{q_{j}}$, are all differentiable, it suffices to notice that the subdifferential of $\bm{\psi}_{j,k}$ at zero is the singleton $\{\nabla\bm{\psi}_{j,k}(\mathbf{0})\}$. This together with inequality (37) leads to inequality (38). ∎

Unlike the characterizations in Theorems 3.3 and 4.1, the characterization stated in Theorem 4.3 can be taken as a multi-parameter choice strategy. That is, when the fidelity term is block separable, if for each $j\in\mathbb{N}_{d}$ the regularization parameter $\lambda_{j}$ is chosen so that inequality (37) (or (38)) holds, then the regularization problem (25) has a solution in which each sub-vector has block sparsity of a prescribed level. The choice of the parameters depends only on the subdifferentials or the gradients of the functions $\bm{\psi}_{j,k}$, $j\in\mathbb{N}_{d}$, $k\in\mathbb{N}_{q_{j}}$, at zero.

We also specialize Theorem 4.3 to the regularization problem (30). For this purpose, we require that the fidelity term $\bm{\psi}$ defined by (31) be block separable. Associated with the partition $\mathcal{S}_{n,d}:=\{S_{n,1},S_{n,2},\ldots,S_{n,d}\}$ of $\mathbb{N}_{n}$ with $S_{n,j}:=\{p_{j-1}+i:i\in\mathbb{N}_{m_{j}}\}$, $j\in\mathbb{N}_{d}$, we decompose the matrix $\mathbf{A}\in\mathbb{R}^{t\times n}$ into $d$ sub-matrices by setting

\mathbf{A}_{[j]}:=[\mathbf{A}_{(i)}:i\in S_{n,j}]\in\mathbb{R}^{t\times m_{j}},\ j\in\mathbb{N}_{d}.

By Lemma 3.4 of [17], the fidelity term $\bm{\psi}$ defined by (31) is $\mathcal{S}_{n,d}$-block separable if and only if

(\mathbf{A}_{[j]})^{\top}\mathbf{A}_{[k]}=\mathbf{0},\ \mbox{for all}\ j,k\in\mathbb{N}_{d}\ \mbox{with}\ j\neq k. \quad (39)

It follows from the decomposition of $\mathbf{A}$ and that of each vector $\mathbf{u}\in\mathbb{R}^{n}$ with respect to $\mathcal{S}_{n,d}$ that $\mathbf{A}\mathbf{u}=\sum_{j\in\mathbb{N}_{d}}\mathbf{A}_{[j]}\mathbf{u}_{j}$ for all $\mathbf{u}\in\mathbb{R}^{n}$. According to this equation and condition (39), we represent $\bm{\psi}$ defined by (31) as in (34) with $\bm{\psi}_{j}$, $j\in\mathbb{N}_{d}$, defined by

\bm{\psi}_{j}(\mathbf{u}_{j}):=\frac{1}{2}\|\mathbf{A}_{[j]}\mathbf{u}_{j}\|_{2}^{2}-\mathbf{x}^{\top}\mathbf{A}_{[j]}\mathbf{u}_{j}+\frac{1}{2d}\mathbf{x}^{\top}\mathbf{x},\ \mathbf{u}_{j}\in\mathbb{R}^{m_{j}}. \quad (40)

To describe the block separability of the functions $\bm{\psi}_{j}$, $j\in\mathbb{N}_{d}$, we recall that for each $j\in\mathbb{N}_{d}$, $\mathcal{S}_{m_{j},q_{j}}:=\{S_{m_{j},1},S_{m_{j},2},\ldots,S_{m_{j},q_{j}}\}$ is a partition of $\mathbb{N}_{m_{j}}$. Associated with the partition $\mathcal{S}_{m_{j},q_{j}}$, the matrix $\mathbf{A}_{[j]}$ can be decomposed into $q_{j}$ sub-matrices by setting

\mathbf{A}_{[j,k]}:=[(\mathbf{A}_{[j]})_{(i)}:i\in S_{m_{j},k}]\in\mathbb{R}^{t\times s_{j,k}},\ k\in\mathbb{N}_{q_{j}}.

It is clear that the last two terms on the right-hand side of equation (40) are both $\mathcal{S}_{m_{j},q_{j}}$-block separable. Hence, again by Lemma 3.4 of [17], we conclude that the function $\bm{\psi}_{j}$ of the form (40) is $\mathcal{S}_{m_{j},q_{j}}$-block separable if and only if

(\mathbf{A}_{[j,k]})^{\top}\mathbf{A}_{[j,l]}=\mathbf{0},\ \mbox{for all}\ k,l\in\mathbb{N}_{q_{j}}\ \mbox{with}\ k\neq l. \quad (41)

We represent $\bm{\psi}_{j}$, $j\in\mathbb{N}_{d}$, as in (36) when condition (41) holds. For each $j\in\mathbb{N}_{d}$, the decomposition of $\mathbf{A}_{[j]}$ and that of each vector $\mathbf{u}_{j}\in\mathbb{R}^{m_{j}}$ with respect to $\mathcal{S}_{m_{j},q_{j}}$ lead to $\mathbf{A}_{[j]}\mathbf{u}_{j}=\sum_{k\in\mathbb{N}_{q_{j}}}\mathbf{A}_{[j,k]}\mathbf{u}_{j,k}$ for all $\mathbf{u}_{j}\in\mathbb{R}^{m_{j}}$. Substituting this equation into definition (40) and noting that condition (41) holds, we represent $\bm{\psi}_{j}$ as in (36) with $\bm{\psi}_{j,k}$, $k\in\mathbb{N}_{q_{j}}$, having the form

\bm{\psi}_{j,k}(\mathbf{u}_{j,k}):=\frac{1}{2}\|\mathbf{A}_{[j,k]}\mathbf{u}_{j,k}\|_{2}^{2}-\mathbf{x}^{\top}\mathbf{A}_{[j,k]}\mathbf{u}_{j,k}+\frac{1}{2dq_{j}}\mathbf{x}^{\top}\mathbf{x},\ \mathbf{u}_{j,k}\in\mathbb{R}^{s_{j,k}}. \quad (42)

We now apply Theorem 4.3 to the regularization problem (30) when the matrix $\mathbf{A}$ satisfies conditions (39) and (41).

Corollary 4.4.

Suppose that $\mathbf{x}\in\mathbb{R}^{t}$ and $\mathbf{A}\in\mathbb{R}^{t\times n}$ satisfies conditions (39) and (41). Then the regularization problem (30) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{u}^{*}_{j}$ having $\mathcal{S}_{m_{j},q_{j}}$-block sparsity of level $l^{\prime}_{j}\leq l_{j}$ for some $l_{j}\in\mathbb{Z}_{q_{j}+1}$, if and only if for each $j\in\mathbb{N}_{d}$ there exist distinct $k_{j,i}\in\mathbb{N}_{q_{j}}$, $i\in\mathbb{N}_{l_{j}}$, such that $\lambda_{j}\geq\|(\mathbf{A}_{[j,k]})^{\top}\mathbf{x}\|_{\infty}$ for all $k\in\mathbb{N}_{q_{j}}\setminus\{k_{j,i}:i\in\mathbb{N}_{l_{j}}\}$.

Proof.

As pointed out before, condition (39) ensures that the fidelity term $\bm{\psi}$ defined by (31) is $\mathcal{S}_{n,d}$-block separable and has the form (34) with $\bm{\psi}_{j}$, $j\in\mathbb{N}_{d}$, defined by (40). Moreover, condition (41) guarantees that for each $j\in\mathbb{N}_{d}$, the function $\bm{\psi}_{j}$ is $\mathcal{S}_{m_{j},q_{j}}$-block separable and can be represented as in (36) with $\bm{\psi}_{j,k}$, $k\in\mathbb{N}_{q_{j}}$, having the form (42). Clearly, $\bm{\psi}_{j,k}$, $j\in\mathbb{N}_{d}$, $k\in\mathbb{N}_{q_{j}}$, are all convex and differentiable. Consequently, we conclude by Theorem 4.3 that the regularization problem (30) with $\lambda_{j}>0$, $j\in\mathbb{N}_{d}$, has a solution $\mathbf{u}^{*}$ with, for each $j\in\mathbb{N}_{d}$, $\mathbf{u}^{*}_{j}$ having $\mathcal{S}_{m_{j},q_{j}}$-block sparsity of level $l^{\prime}_{j}\leq l_{j}$ for some $l_{j}\in\mathbb{Z}_{q_{j}+1}$, if and only if for each $j\in\mathbb{N}_{d}$ there exist distinct $k_{j,i}\in\mathbb{N}_{q_{j}}$, $i\in\mathbb{N}_{l_{j}}$, such that inequality (38) holds. Note that for each $j\in\mathbb{N}_{d}$, $\nabla\bm{\psi}_{j,k}(\mathbf{0})=-(\mathbf{A}_{[j,k]})^{\top}\mathbf{x}$ for all $k\in\mathbb{N}_{q_{j}}$. Substituting this equation into inequality (38) leads directly to the desired inequality. ∎
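The inequality in Corollary 4.4 suggests a direct computational recipe: within each block $j$, rank the sub-blocks by $\|(\mathbf{A}_{[j,k]})^{\top}\mathbf{x}\|_{\infty}$, keep the $l_{j}$ largest as the potentially nonzero ones, and choose $\lambda_{j}$ to dominate the rest. A minimal sketch (ours, not from the paper) is:

```python
# A minimal sketch (our illustration) of the direct parameter choice suggested
# by Corollary 4.4: within block j, keep the l_j sub-blocks with the largest
# values of ||A_{[j,k]}^T x||_inf and set lambda_j to dominate the rest.
import numpy as np

def choose_lambda(A_blocks, x, l_j):
    """A_blocks: list of sub-matrices A_{[j,k]}, k in N_{q_j}, for one block j."""
    scores = np.array([np.max(np.abs(Ak.T @ x)) for Ak in A_blocks])
    keep = np.argsort(scores)[::-1][:l_j]          # indices k_{j,i}, i in N_{l_j}
    rest = np.setdiff1d(np.arange(len(A_blocks)), keep)
    lam_j = scores[rest].max() if rest.size else 0.0
    return lam_j, keep
```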

5 Iterative schemes for parameter choices

Theorem 3.3 characterizes the influence of each regularization parameter λj\lambda_{j} on the sparsity of the solution to problem (1) under the transform matrix 𝐁j\mathbf{B}_{j}. Based on this characterization, we develop iterative schemes in this section for selecting multiple regularization parameters that achieve prescribed sparsity levels in the solution of problem (1) under different transform matrices. We consider two cases: when the fidelity term 𝝍\bm{\psi} is differentiable and the transform matrix 𝐁\mathbf{B} has full row rank, and when 𝝍\bm{\psi} is non-differentiable and 𝐁\mathbf{B} does not have full row rank.

Theorem 3.3 shows that if 𝐮n\mathbf{u}^{*}\in\mathbb{R}^{n} is a solution of problem (1) with λj>0,\lambda_{j}^{*}>0, jd,j\in\mathbb{N}_{d}, and for each jdj\in\mathbb{N}_{d}, 𝐮\mathbf{u}^{*} has sparsity of level ljmj+1l_{j}^{*}\in\mathbb{Z}_{m_{j}+1} under 𝐁j\mathbf{B}_{j}, then there exist 𝐚𝝍(𝐮)\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*}) and 𝐛:=[bj:jpd]𝒩(𝐁)\mathbf{b}:=[b_{j}:j\in\mathbb{N}_{p_{d}}]\in\mathcal{N}(\mathbf{B}^{\top}) such that for each jdj\in\mathbb{N}_{d}, λj\lambda_{j}^{*} satisfies conditions (17) and (18). According to these conditions, we introduce for each jdj\in\mathbb{N}_{d}, a sequence γj,i(𝐮),\gamma_{j,i}(\mathbf{u}^{*}), imji\in\mathbb{N}_{m_{j}} by

γj,i(𝐮):=|(𝐁(pj1+i))𝐚+bpj1+i|,imj,\gamma_{j,i}(\mathbf{u}^{*}):=\left|(\mathbf{B}_{(p_{j-1}+i)}^{\prime})^{\top}\mathbf{a}+b_{p_{j-1}+i}\right|,\ i\in\mathbb{N}_{m_{j}}, (43)

and rearrange them in a nondecreasing order:

γj,i1(𝐮)γj,i2(𝐮)γj,imj(𝐮),with{i1,i2,,imj}=mj.\gamma_{j,i_{1}}(\mathbf{u}^{*})\leq\gamma_{j,i_{2}}(\mathbf{u}^{*})\leq\cdots\leq\gamma_{j,i_{m_{j}}}(\mathbf{u}^{*}),\ \mbox{with}\ \{i_{1},i_{2},\ldots,i_{m_{j}}\}=\mathbb{N}_{m_{j}}. (44)

The equality (17) and the inequality (18) that the parameter λj\lambda_{j}^{*} needs to satisfy correspond to the nonzero components and the zero components of 𝐁j𝐮\mathbf{B}_{j}\mathbf{u}^{*}, respectively. Thus, if λj>γj,i(𝐮)\lambda_{j}^{*}>\gamma_{j,i}(\mathbf{u}^{*}), then (𝐁j𝐮)i(\mathbf{B}_{j}\mathbf{u}^{*})_{i} must be zero, and if λj=γj,i(𝐮)\lambda_{j}^{*}=\gamma_{j,i}(\mathbf{u}^{*}), then (𝐁j𝐮)i(\mathbf{B}_{j}\mathbf{u}^{*})_{i} may be zero or nonzero. With the help of this observation, we present the following result.
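For reference, the computation (43) together with the sorting (44) amounts to one matrix-vector product and one sort per transform matrix. A minimal Python sketch is given below; the array Bp_cols stacking the columns 𝐁′_(k) of 𝐁′ as rows, and the vectors 𝐚 and 𝐛, are assumed to be available (hypothetical names).

    import numpy as np

    def gamma_sequence(Bp_cols, a, b, p_prev, m_j):
        # gamma_{j,i}(u*) = |(B'_(p_{j-1}+i))^T a + b_{p_{j-1}+i}|, i in N_{m_j}, as in (43);
        # returned sorted nondecreasingly, together with the permutation i_1, ..., i_{m_j} of (44)
        idx = p_prev + np.arange(m_j)            # 0-based indices p_{j-1}, ..., p_{j-1}+m_j-1
        gam = np.abs(Bp_cols[idx] @ a + b[idx])
        order = np.argsort(gam, kind="stable")
        return gam[order], order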

Theorem 5.1.

Let 𝛙:n+\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+} be a convex function, for each jdj\in\mathbb{N}_{d}, 𝐁j\mathbf{B}_{j} be an mj×nm_{j}\times n matrix and 𝐁\mathbf{B} be defined by (8). Suppose that 𝐮\mathbf{u}^{*} is a solution of problem (1) with λj>0,\lambda_{j}^{*}>0, jdj\in\mathbb{N}_{d}, and for each jdj\in\mathbb{N}_{d}, γj,i(𝐮),\gamma_{j,i}(\mathbf{u}^{*}), imji\in\mathbb{N}_{m_{j}}, defined by (43), are ordered as in (44). Then the following statements hold true.

(a) If for each jdj\in\mathbb{N}_{d}, 𝐮\mathbf{u}^{*} has sparsity of level ljmj+1l_{j}^{*}\in\mathbb{Z}_{m_{j}+1} under 𝐁j\mathbf{B}_{j}, then for each jdj\in\mathbb{N}_{d}, λj\lambda_{j}^{*} satisfies

γj,i1(𝐮)γj,imjlj(𝐮)λj=γj,imjlj+1(𝐮)==γj,imj(𝐮).\gamma_{j,i_{1}}(\mathbf{u}^{*})\leq\cdots\leq\gamma_{j,i_{m_{j}-l^{*}_{j}}}(\mathbf{u}^{*})\leq\lambda_{j}^{*}=\gamma_{j,i_{m_{j}-l^{*}_{j}+1}}(\mathbf{u}^{*})=\cdots=\gamma_{j,i_{m_{j}}}(\mathbf{u}^{*}). (45)

(b) If for each jdj\in\mathbb{N}_{d}, 𝐮\mathbf{u}^{*} has sparsity of level ljmj+1l_{j}^{*}\in\mathbb{Z}_{m_{j}+1} under 𝐁j\mathbf{B}_{j}, then for each jdj\in\mathbb{N}_{d}, there exists ljmj+1l_{j}\in\mathbb{Z}_{m_{j}+1} with ljljl_{j}\geq l_{j}^{*} such that λj\lambda_{j}^{*} satisfies

γj,i1(𝐮)γj,imjlj(𝐮)<λj=γj,imjlj+1(𝐮)==γj,imj(𝐮).\gamma_{j,i_{1}}(\mathbf{u}^{*})\leq\cdots\leq\gamma_{j,i_{m_{j}-l_{j}}}(\mathbf{u}^{*})<\lambda_{j}^{*}=\gamma_{j,i_{m_{j}-l_{j}+1}}(\mathbf{u}^{*})=\cdots=\gamma_{j,i_{m_{j}}}(\mathbf{u}^{*}). (46)

(c) If for each jdj\in\mathbb{N}_{d}, there exists ljmj+1l_{j}\in\mathbb{Z}_{m_{j}+1} such that λj\lambda_{j}^{*} satisfies inequality (46), then for each jdj\in\mathbb{N}_{d}, 𝐮\mathbf{u}^{*} has sparsity of level ljljl_{j}^{*}\leq l_{j} under 𝐁j\mathbf{B}_{j}.

Proof.

We first prove Item (a). If 𝐮\mathbf{u}^{*} is a solution of problem (1) with λj>0,\lambda_{j}^{*}>0, jdj\in\mathbb{N}_{d}, and for each jdj\in\mathbb{N}_{d}, 𝐮\mathbf{u}^{*} has sparsity of level ljmj+1l_{j}^{*}\in\mathbb{Z}_{m_{j}+1} under 𝐁j\mathbf{B}_{j}, then for each jdj\in\mathbb{N}_{d}, the parameter λj\lambda_{j}^{*}, guaranteed by Theorem 3.3, satisfies equality (17) and inequality (18). Since the subset {ki:ilj}\{k_{i}:i\in\mathbb{N}_{l_{j}^{*}}\} of mj\mathbb{N}_{m_{j}} has cardinality ljl_{j}^{*}, there are exactly ljl_{j}^{*} elements of {γj,i(𝐮):imj}\{\gamma_{j,i}(\mathbf{u}^{*}):i\in\mathbb{N}_{m_{j}}\} equal to λj\lambda_{j}^{*}, with the remaining mjljm_{j}-l_{j}^{*} elements less than or equal to λj\lambda_{j}^{*}. This, together with the ordering of γj,i(𝐮)\gamma_{j,i}(\mathbf{u}^{*}), imji\in\mathbb{N}_{m_{j}}, in (44), leads to the desired inequality (45).

We next verify Item (b). As shown in Item (a), for each jdj\in\mathbb{N}_{d}, λj\lambda_{j}^{*} satisfies inequality (45). If no element of the sequence {γj,i(𝐮):imj}\{\gamma_{j,i}(\mathbf{u}^{*}):i\in\mathbb{N}_{m_{j}}\} is smaller than λj\lambda_{j}^{*}, then inequality (45) reduces to λj=γj,ik(𝐮)\lambda^{*}_{j}=\gamma_{j,i_{k}}(\mathbf{u}^{*}), kmjk\in\mathbb{N}_{m_{j}}, and we obtain inequality (46) with lj:=mjl_{j}:=m_{j}. Otherwise, we choose kjmjljk_{j}\in\mathbb{N}_{m_{j}-l^{*}_{j}} such that γj,ikj(𝐮)<λj=γj,ikj+1(𝐮)\gamma_{j,i_{k_{j}}}(\mathbf{u}^{*})<\lambda_{j}^{*}=\gamma_{j,i_{k_{j}+1}}(\mathbf{u}^{*}). We then rewrite inequality (45) as inequality (46) with lj:=mjkjl_{j}:=m_{j}-k_{j}. It is clear that ljmj(mjlj)=ljl_{j}\geq m_{j}-(m_{j}-l_{j}^{*})=l_{j}^{*}.

It remains to show Item (c). If lj=mj,l_{j}=m_{j}, clearly, the sparsity level ljl^{*}_{j} of 𝐮\mathbf{u}^{*} under 𝐁j\mathbf{B}_{j} satisfies ljljl_{j}^{*}\leq l_{j}. We now consider the case when lj<mjl_{j}<m_{j}. According to Theorem 3.3, the relation γj,i1(𝐮)γj,imjlj(𝐮)<λj\gamma_{j,i_{1}}(\mathbf{u}^{*})\leq\cdots\leq\gamma_{j,i_{m_{j}-l_{j}}}(\mathbf{u}^{*})<\lambda_{j}^{*} leads to (𝐁j𝐮)ik=0,(\mathbf{B}_{j}\mathbf{u}^{*})_{i_{k}}=0, for all kmjljk\in\mathbb{N}_{m_{j}-l_{j}}. Hence, 𝐁j𝐮\mathbf{B}_{j}\mathbf{u}^{*} has at least mjljm_{j}-l_{j} zero components. In other words, the number of nonzero components of 𝐁j𝐮\mathbf{B}_{j}\mathbf{u}^{*} is at most ljl_{j}, that is, 𝐮\mathbf{u}^{*} has sparsity of level ljljl_{j}^{*}\leq l_{j} under 𝐁j\mathbf{B}_{j}. ∎

Our goal is to find regularization parameters λj\lambda^{*}_{j}, jdj\in\mathbb{N}_{d} that ensure the resulting solution 𝐮\mathbf{u}^{*} of problem (1) achieves a prescribed sparsity level ljl^{*}_{j} under each 𝐁j\mathbf{B}_{j}. According to Item (a) of Theorem 5.1, for each jdj\in\mathbb{N}_{d}, the parameter λj\lambda_{j}^{*} satisfies inequality (45). Since the sequence {γj,i(𝐮):imj}\{\gamma_{j,i}(\mathbf{u}^{*}):i\in\mathbb{N}_{m_{j}}\} depends on the corresponding solution, inequality (45) cannot be used directly as a parameter choice strategy. Instead, it motivates us to propose an iterative scheme. The iteration begins with initial regularization parameters λj0\lambda_{j}^{0}, jdj\in\mathbb{N}_{d}, which are large enough so that for each jdj\in\mathbb{N}_{d}, the sparsity level lj0l_{j}^{0} of the corresponding solution 𝐮0\mathbf{u}^{0} of problem (1) under 𝐁j\mathbf{B}_{j} is smaller than the given target sparsity level ljl^{*}_{j}. Suppose that at step kk, we have λjk\lambda_{j}^{k}, jdj\in\mathbb{N}_{d} and the corresponding solution 𝐮k\mathbf{u}^{k} with the sparsity level ljk<ljl_{j}^{k}<l^{*}_{j}, jdj\in\mathbb{N}_{d} under the transform matrices 𝐁j\mathbf{B}_{j}, jdj\in\mathbb{N}_{d}, respectively. Item (a) of Theorem 5.1 ensures that for each jdj\in\mathbb{N}_{d}, the parameter λjk\lambda_{j}^{k} satisfies

γj,i1k(𝐮k)γj,imjljkk(𝐮k)λjk=γj,imjljk+1k(𝐮k)==γj,imjk(𝐮k).\gamma_{j,i^{k}_{1}}(\mathbf{u}^{k})\leq\cdots\leq\gamma_{j,i^{k}_{m_{j}-l^{k}_{j}}}(\mathbf{u}^{k})\leq\lambda_{j}^{k}=\gamma_{j,i^{k}_{m_{j}-l^{k}_{j}+1}}(\mathbf{u}^{k})=\cdots=\gamma_{j,i^{k}_{m_{j}}}(\mathbf{u}^{k}). (47)

We choose the parameter λjk+1\lambda_{j}^{k+1} at step k+1k+1 from the ordered sequence in (47). Motivated by inequality (45), we take λjk+1\lambda_{j}^{k+1} to be the (mjlj+1)(m_{j}-l^{*}_{j}+1)-th element of this sequence, that is,

λjk+1:=γj,imjlj+1k(𝐮k).\lambda_{j}^{k+1}:=\gamma_{j,i^{k}_{m_{j}-l^{*}_{j}+1}}(\mathbf{u}^{k}). (48)

As a result, for each jdj\in\mathbb{N}_{d}, parameter λjk+1\lambda_{j}^{k+1} satisfies

\gamma_{j,i^{k}_{1}}(\mathbf{u}^{k})\leq\cdots\leq\gamma_{j,i^{k}_{m_{j}-l^{*}_{j}}}(\mathbf{u}^{k})\leq\lambda_{j}^{k+1}=\gamma_{j,i^{k}_{m_{j}-l^{*}_{j}+1}}(\mathbf{u}^{k})\leq\cdots\leq\gamma_{j,i^{k}_{m_{j}-l^{k}_{j}+1}}(\mathbf{u}^{k})=\cdots=\gamma_{j,i^{k}_{m_{j}}}(\mathbf{u}^{k}). (49)

Below, we show that if the algorithm converges, then the parameters it produces satisfy inequality (45) of Theorem 5.1, which is a necessary condition for the resulting solution to have the given target sparsity levels. To this end, we state the following assumptions about the convergence of the algorithm. A convergence analysis of the proposed algorithm for choosing the multiple regularization parameters is left for future work.

(A1) For each jdj\in\mathbb{N}_{d}, the sequence ljkl_{j}^{k}, kk\in\mathbb{N}, generated by the iteration scheme satisfies that ljkljl_{j}^{k}\leq l^{*}_{j} for all kk\in\mathbb{N}.

(A2) For each jdj\in\mathbb{N}_{d}, the sequence λjk\lambda_{j}^{k}, kk\in\mathbb{N}, generated by the iteration scheme satisfies λjkλj\lambda_{j}^{k}\rightarrow\lambda_{j} as k+k\rightarrow+\infty for some λj>0\lambda_{j}>0.

(A3) The solution 𝐮n\mathbf{u}\in\mathbb{R}^{n} of problem (1) with λj\lambda_{j}, jdj\in\mathbb{N}_{d} satisfies that for each jdj\in\mathbb{N}_{d} and each smjs\in\mathbb{N}_{m_{j}}, γj,isk(𝐮k)γj,is(𝐮)\gamma_{j,i^{k}_{s}}(\mathbf{u}^{k})\rightarrow\gamma_{j,i_{s}}(\mathbf{u}) as k+k\rightarrow+\infty.

Proposition 5.2.

If assumptions (A1), (A2) and (A3) hold, then for each jdj\in\mathbb{N}_{d}, λj\lambda_{j} satisfies

γj,i1(𝐮)γj,imjlj(𝐮)λj=γj,imjlj+1(𝐮)==γj,imj(𝐮).\gamma_{j,i_{1}}(\mathbf{u})\leq\cdots\leq\gamma_{j,i_{m_{j}-l^{*}_{j}}}(\mathbf{u})\leq\lambda_{j}=\gamma_{j,i_{m_{j}-l^{*}_{j}+1}}(\mathbf{u})=\cdots=\gamma_{j,i_{m_{j}}}(\mathbf{u}). (50)
Proof.

Note that assumption (A1) allows us to choose the parameter λjk+1\lambda_{j}^{k+1} as in (48) at step k+1k+1. This together with inequality (47) leads to inequality (49). Taking k+k\rightarrow+\infty in each term of inequality (49) and invoking assumptions (A2) and (A3), we get that

γj,i1(𝐮)γj,imjlj(𝐮)λj=γj,imjlj+1(𝐮)γj,imj(𝐮).\gamma_{j,i_{1}}(\mathbf{u})\leq\cdots\leq\gamma_{j,i_{m_{j}-l^{*}_{j}}}(\mathbf{u})\leq\lambda_{j}=\gamma_{j,i_{m_{j}-l^{*}_{j}+1}}(\mathbf{u})\leq\cdots\leq\gamma_{j,i_{m_{j}}}(\mathbf{u}). (51)

It follows from inequality (47) that λjk=γj,imjk(𝐮k)\lambda_{j}^{k}=\gamma_{j,i^{k}_{m_{j}}}(\mathbf{u}^{k}) for all kk\in\mathbb{N}. Taking k+k\rightarrow+\infty on both sides of this equation, we get that λj=γj,imj(𝐮).\lambda_{j}=\gamma_{j,i_{m_{j}}}(\mathbf{u}). This together with inequality (51) leads to the desired inequality (50). ∎

We note that three issues need to be considered in the iterative scheme. First, if λjk=γj,imjlj+1k(𝐮k)\lambda_{j}^{k}=\gamma_{j,i^{k}_{m_{j}-l^{*}_{j}+1}}(\mathbf{u}^{k}) for some kk\in\mathbb{N} and jdj\in\mathbb{N}_{d}, the choice of λjk+1\lambda_{j}^{k+1} as in (48) leads to λjk+1=λjk\lambda_{j}^{k+1}=\lambda_{j}^{k} and thus makes no progress. To address this issue, we choose λjk+1\lambda^{k+1}_{j}, motivated by inequality (46), from the sequence {γj,i(𝐮k):γj,i(𝐮k)<λjk}\{\gamma_{j,i}(\mathbf{u}^{k}):\gamma_{j,i}(\mathbf{u}^{k})<\lambda_{j}^{k}\}. We set Γj(𝐮k):=max{γj,i(𝐮k):γj,i(𝐮k)<λjk}\Gamma_{j}(\mathbf{u}^{k}):=\max\{\gamma_{j,i}(\mathbf{u}^{k}):\gamma_{j,i}(\mathbf{u}^{k})<\lambda_{j}^{k}\} and choose λjk+1\lambda^{k+1}_{j} as

λjk+1:=min{γj,imjlj+1k(𝐮k),Γj(𝐮k)}.\lambda^{k+1}_{j}:=\min\{\gamma_{j,i^{k}_{m_{j}-l^{*}_{j}+1}}(\mathbf{u}^{k}),\Gamma_{j}(\mathbf{u}^{k})\}. (52)

Second, assumption (A1) may not always hold true. If ljk>ljl_{j}^{k}>l^{*}_{j} for some kk\in\mathbb{N} and jdj\in\mathbb{N}_{d}, this indicates that λjk\lambda_{j}^{k} is too small, and thus we should choose λjk+1\lambda_{j}^{k+1} at step k+1k+1 greater than λjk\lambda_{j}^{k}. Since, as shown in inequality (47), all the elements γj,isk(𝐮k)\gamma_{j,i^{k}_{s}}(\mathbf{u}^{k}), smjs\in\mathbb{N}_{m_{j}}, are less than or equal to λjk\lambda_{j}^{k}, the choice (52) cannot provide a parameter greater than λjk\lambda_{j}^{k}. In this case, we go back to the sequence of step k1k-1 to choose an appropriate parameter λjk\lambda_{j}^{k}. Finally, due to the interplay between the multiple regularization parameters, we do not require an exact match of the sparsity levels; instead, we allow a tolerance. For each jdj\in\mathbb{N}_{d}, let ljl_{j} denote the sparsity level of a solution of problem (1) under 𝐁j\mathbf{B}_{j}. With a given tolerance ϵ>0\epsilon>0, we say that the solution achieves the target sparsity levels ljl^{*}_{j}, jdj\in\mathbb{N}_{d} if

jd|ljlj|ϵ.\sum_{j\in\mathbb{N}_{d}}|l_{j}-l_{j}^{*}|\leq\epsilon.
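To make these rules concrete, the following Python sketch implements the safeguarded update (52) and the stopping test above for a single transform matrix 𝐁_j; it is a minimal illustration assuming that the solution 𝐮^k and its sorted γ-sequence are produced elsewhere.

    import numpy as np

    def update_lambda(gammas_sorted, lam_k, l_target):
        # safeguarded update (52): gammas_sorted is the nondecreasing sequence
        # gamma_{j,i_1} <= ... <= gamma_{j,i_{m_j}} at u^k, lam_k the current lambda_j^k
        m = gammas_sorted.size
        candidate = gammas_sorted[m - l_target]          # (m_j - l_j^* + 1)-th smallest, 0-based
        below = gammas_sorted[gammas_sorted < lam_k]     # elements strictly below lambda_j^k
        if below.size == 0:
            return candidate
        return min(candidate, below.max())               # below.max() is Gamma_j(u^k)

    def target_reached(levels, targets, eps):
        # stopping test: sum_j |l_j - l_j^*| <= eps
        return sum(abs(l - t) for l, t in zip(levels, targets)) <= eps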

At step k+1k+1 of the iterative scheme, the parameter λjk+1\lambda_{j}^{k+1} is chosen from the ordered sequence in (47). To compute γj,i(𝐮k)\gamma_{j,i}(\mathbf{u}^{k}), imji\in\mathbb{N}_{m_{j}}, according to definition (43), we need to obtain the solution 𝐮k\mathbf{u}^{k} of problem (1) with λjk>0,\lambda_{j}^{k}>0, jd,j\in\mathbb{N}_{d}, and then determine the vectors 𝐚𝝍(𝐮k)\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{k}) and 𝐛𝒩(𝐁)\mathbf{b}\in\mathcal{N}(\mathbf{B}^{\top}) in Theorem 3.3.

We first consider the case when 𝝍\bm{\psi} is differentiable and 𝐁\mathbf{B} has full row rank. In this case, the subdifferential of 𝝍\bm{\psi} at 𝐮k\mathbf{u}^{k} is the singleton 𝝍(𝐮k)\nabla\bm{\psi}(\mathbf{u}^{k}) and 𝒩(𝐁)={𝟎}\mathcal{N}(\mathbf{B}^{\top})=\{\mathbf{0}\}. That is, 𝐚=𝝍(𝐮k)\mathbf{a}=\nabla\bm{\psi}(\mathbf{u}^{k}) and 𝐛=𝟎\mathbf{b}=\mathbf{0}. Accordingly, for each jdj\in\mathbb{N}_{d}, the sequence γj,i(𝐮k)\gamma_{j,i}(\mathbf{u}^{k}), imji\in\mathbb{N}_{m_{j}}, has the form

γj,i(𝐮k):=|(𝐁(pj1+i))𝝍(𝐮k)|,imj.\gamma_{j,i}(\mathbf{u}^{k}):=\left|(\mathbf{B}_{(p_{j-1}+i)}^{\prime})^{\top}\nabla\bm{\psi}(\mathbf{u}^{k})\right|,\ i\in\mathbb{N}_{m_{j}}.

As a result, we merely need to solve problem (1) with λjk>0,\lambda_{j}^{k}>0, jd,j\in\mathbb{N}_{d}, to obtain the solution 𝐮k\mathbf{u}^{k}. This can be done by employing the Fixed Point Proximity Algorithm (FPPA) which was developed in [2, 14, 20].

We describe the FPPA as follows. Let 𝕊+s\mathbb{S}_{+}^{s} denote the set of symmetric and positive definite matrices. For 𝐇𝕊+s\mathbf{H}\in\mathbb{S}_{+}^{s}, we define the weighted inner product of 𝐱\mathbf{x}, 𝐲s\mathbf{y}\in\mathbb{R}^{s} by 𝐱,𝐲𝐇:=𝐱,𝐇𝐲\langle\mathbf{x},\mathbf{y}\rangle_{\mathbf{H}}:=\langle\mathbf{x},\mathbf{H}\mathbf{y}\rangle and the weighted 2\ell_{2}-norm of 𝐱s\mathbf{x}\in\mathbb{R}^{s} by 𝐱𝐇:=𝐱,𝐱𝐇1/2\|\mathbf{x}\|_{\mathbf{H}}:=\langle\mathbf{x},\mathbf{x}\rangle_{\mathbf{H}}^{1/2}. Suppose that f:s¯f:\mathbb{R}^{s}\to\overline{\mathbb{R}} is a convex function, with dom(f).\mathrm{dom}(f)\neq{\emptyset}. The proximity operator proxf,𝐇:ss\text{prox}_{f,\mathbf{H}}:\mathbb{R}^{s}\to\mathbb{R}^{s} of ff with respect to 𝐇𝕊+s\mathbf{H}\in\mathbb{S}_{+}^{s} is defined for 𝐰s\mathbf{w}\in\mathbb{R}^{s} by

proxf,𝐇(𝐰):=argmin{12𝐮𝐰𝐇2+f(𝐮):𝐮s}.\text{prox}_{f,\mathbf{H}}(\mathbf{w}):=\operatorname*{argmin}\left\{\frac{1}{2}\|\mathbf{u}-\mathbf{w}\|_{\mathbf{H}}^{2}+f(\mathbf{u}):\mathbf{u}\in\mathbb{R}^{s}\right\}. (53)

In the case that 𝐇\mathbf{H} coincides with the s×ss\times s identity matrix 𝐈s\mathbf{I}_{s}, proxf,𝐈s\text{prox}_{f,\mathbf{I}_{s}} will be abbreviated as proxf\text{prox}_{f}. Suppose that Φ:s¯\Phi:\mathbb{R}^{s}\to\overline{\mathbb{R}} and Ψ:t¯\Psi:\mathbb{R}^{t}\to\overline{\mathbb{R}} are two convex functions which may not be differentiable, and 𝐂t×s\mathbf{C}\in\mathbb{R}^{t\times s}. The optimization problem

min{Φ(𝐮)+Ψ(𝐂𝐮):𝐮s}\min\{\Phi(\mathbf{u})+\Psi(\mathbf{C}\mathbf{u}):\mathbf{u}\in\mathbb{R}^{s}\}

can be solved by the FPPA: For given positive constants α\alpha, ρ\rho and initial points 𝐮0s\mathbf{u}^{0}\in\mathbb{R}^{s}, 𝐯0t\mathbf{v}^{0}\in\mathbb{R}^{t},

{𝐮k=proxαΦ(𝐮k1α𝐂𝐯k1),𝐯k=ρ(𝐈tprox1ρΨ)(1ρ𝐯k1+𝐂(2𝐮k𝐮k1)).\left\{\begin{array}[]{l}\mathbf{u}^{k}=\operatorname{prox}_{\alpha\Phi}\left(\mathbf{u}^{k-1}-\alpha\mathbf{C}^{\top}\mathbf{v}^{k-1}\right),\\ \mathbf{v}^{k}=\rho\left(\mathbf{I}_{t}-\operatorname{prox}_{\frac{1}{\rho}\Psi}\right)\left(\frac{1}{\rho}\mathbf{v}^{k-1}+\mathbf{C}\left(2\mathbf{u}^{k}-\mathbf{u}^{k-1}\right)\right).\end{array}\right. (54)

In Algorithm (54), positive constants α\alpha and ρ\rho may be selected to satisfy αρ<1/𝐂22\alpha\rho<1/\|\mathbf{C}\|_{2}^{2} so that the algorithm converges. When 𝝍\bm{\psi} is differentiable and 𝐁\mathbf{B} has full row rank, in each step of the iterative algorithm for choosing the multiple regularization parameters, we solve problem (1) by Algorithm (54) with specific choices of functions Φ\Phi, Ψ\Psi and matrix 𝐂\mathbf{C}.
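As an illustration of scheme (54), the following Python sketch runs FPPA on a toy instance with Φ(u) := λ‖u‖₁, Ψ := ½‖·−𝐱‖₂² and 𝐂 := 𝐀, for which both proximity operators have simple closed forms; the step sizes obey the bound αρ < 1/‖𝐂‖₂². This is a minimal sketch, not the code used in the experiments of Section 6.

    import numpy as np

    def soft_threshold(z, t):
        # prox of t*||.||_1: componentwise soft thresholding
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def fppa(A, x, lam, n_iter=500):
        # FPPA (54) for Phi(u) = lam*||u||_1, Psi(v) = 0.5*||v - x||_2^2, C = A;
        # here prox_{(1/rho)Psi}(z) has the closed form (rho*z + x)/(rho + 1)
        t, s = A.shape
        alpha = rho = 0.9 / np.linalg.norm(A, 2)     # alpha*rho = 0.81/||C||_2^2 < 1/||C||_2^2
        u, v = np.zeros(s), np.zeros(t)
        for _ in range(n_iter):
            u_new = soft_threshold(u - alpha * (A.T @ v), alpha * lam)
            z = v / rho + A @ (2.0 * u_new - u)
            v = rho * (z - (rho * z + x) / (rho + 1.0))
            u = u_new
        return u

    # with A = np.eye(4) and x = np.array([3.0, -0.5, 2.0, 0.2]),
    # fppa(A, x, lam=1.0) approaches soft_threshold(x, 1.0) = [2, 0, 1, 0]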

We summarize the iterative scheme for choosing the multiple regularization parameters when 𝝍\bm{\psi} is differentiable and 𝐁\mathbf{B} has full row rank in Algorithm 1. Numerical experiments to be presented in Section 6 demonstrate the effectiveness of this algorithm in identifying the desired regularization parameters.

Input: B, ψ, {l_j^* : j ∈ ℕ_d}, ϵ.
Initialization: choose {λ_j^0 : j ∈ ℕ_d} large enough to guarantee l_j^0 ≤ l_j^* for all j ∈ ℕ_d.
for k = 0, 1, 2, … do
    Solve (1) with λ_j^k, j ∈ ℕ_d, by Algorithm (54) and count the sparsity level l_j^k of B_j u^k.
    if Σ_{j∈ℕ_d} |l_j^k − l_j^*| ≤ ϵ then break.
    for j = 1, 2, …, d do
        if l_j^k < l_j^* then
            Compute γ_{j,i} := |(B′_{(p_{j−1}+i)})^⊤ ∇ψ(u^k)|, i ∈ ℕ_{m_j}.
            Sort: γ_{j,i_1} ≤ ⋯ ≤ γ_{j,i_{m_j}} with {i_1, i_2, …, i_{m_j}} = ℕ_{m_j}.
            Compute Γ_j := max{γ_{j,i} : γ_{j,i} < λ_j^k, i ∈ ℕ_{m_j}}.
            Update λ_j^k := min{γ_{j,i_{m_j−l_j^*+1}}, Γ_j}.
        else if l_j^k > l_j^* then
            Set s_j = 0.
            for i = 1, 2, … do
                Update s_j := s_j + l_j^k − l_j^*.
                Update λ_j^k := min{γ_{j,i_{m_j−l_j^*+1+s_j}}, Γ_j}.
                Solve (1) with λ_j^k, j ∈ ℕ_d, by Algorithm (54) and count the sparsity level l_j^k of B_j u^k.
                if l_j^k ≤ l_j^* then break.
    Update λ_j^{k+1} := λ_j^k for all j ∈ ℕ_d.
Output: {λ_j^k : j ∈ ℕ_d}, u^k.
Algorithm 1 Iterative scheme selecting multiple regularization parameters for problem (1) when ψ is differentiable and B has full row rank

We next consider the case when 𝝍\bm{\psi} is non-differentiable and 𝐁\mathbf{B} does not have full row rank. In this case, at each step of the iterative scheme, we must not only find the solution 𝐮k\mathbf{u}^{k} of problem (1) but also determine the vectors 𝐚𝝍(𝐮k)\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{k}) and 𝐛𝒩(𝐁)\mathbf{b}\in\mathcal{N}(\mathbf{B}^{\top}) satisfying (16), (17) and (18) simultaneously. To this end, we establish a characterization of the vectors 𝐚\mathbf{a} and 𝐛\mathbf{b} by using a fixed-point formulation via the proximity operators of the functions appearing in the objective function of problem (13).

We begin by recalling some useful results about the proximity operator. It is known [20] that the proximity operator of a convex function is intimately related to its subdifferential. Specifically, if ff is a convex function from s\mathbb{R}^{s} to ¯\overline{\mathbb{R}}, then for all 𝐱dom(f)\mathbf{x}\in\mathrm{dom}(f), 𝐲s\mathbf{y}\in\mathbb{R}^{s} and 𝐇𝕊+s\mathbf{H}\in\mathbb{S}_{+}^{s}

𝐇𝐲f(𝐱)if and only if𝐱=proxf,𝐇(𝐱+𝐲).\mathbf{Hy}\in\partial f(\mathbf{x})\ \text{if and only if}\ \mathbf{x}=\mathrm{prox}_{f,\mathbf{H}}(\mathbf{x}+\mathbf{y}). (55)

The conjugate function of a convex function f:s¯f:\mathbb{R}^{s}\to\overline{\mathbb{R}} is defined as f(𝐲):=sup{𝐱,𝐲f(𝐱):𝐱s}f^{*}(\mathbf{y}):=\sup\{\langle\mathbf{x},\mathbf{y}\rangle-f(\mathbf{x}):\mathbf{x}\in\mathbb{R}^{s}\} for all 𝐲s\mathbf{y}\in\mathbb{R}^{s}. There is a relation between the subdifferential of a convex function ff and that of its conjugate function ff^{*}. Specifically, for all 𝐱dom(f)\mathbf{x}\in\mathrm{dom}(f) and all 𝐲dom(f)\mathbf{y}\in\mathrm{dom}(f^{*}), there holds

𝐱f(𝐲)if and only if𝐲f(𝐱).\mathbf{x}\in\partial f^{*}(\mathbf{y})\ \text{if and only if}\ \mathbf{y}\in\partial f(\mathbf{x}). (56)

This leads to the relation between the proximity operators of ff and ff^{*}: proxf=𝐈sproxf.\mathrm{prox}_{f}=\mathbf{I}_{s}-\mathrm{prox}_{f^{*}}.
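As a quick numerical illustration of this identity, take f := λ‖·‖₁: its conjugate f* is the indicator of the ℓ∞-ball of radius λ, so prox_{f*} is componentwise clipping onto that ball. The snippet below (illustrative only) verifies prox_f = I − prox_{f*} on a sample vector.

    import numpy as np

    lam = 0.7
    x = np.array([1.5, -0.3, 0.9, -2.0])
    prox_f = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # prox of f = lam*||.||_1
    prox_f_conj = np.clip(x, -lam, lam)                      # projection onto {y: ||y||_inf <= lam}
    assert np.allclose(prox_f + prox_f_conj, x)              # Moreau decomposition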

In the following proposition, we establish the fixed-point equation formulation of the solution of problem (13).

Proposition 5.3.

Suppose that 𝛙:n+\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+} is a convex function and for each jdj\in\mathbb{N}_{d}, 𝐁j\mathbf{B}_{j} is an mj×nm_{j}\times n matrix. Let 𝐁\mathbf{B} be defined as in (8). Then the following statements hold true.

(a) If 𝐰pd+nr\mathbf{w}\in\mathbb{R}^{p_{d}+n-r} is a solution of problem (13) with λj\lambda_{j}, jdj\in\mathbb{N}_{d}, then there exist vectors 𝐚n\mathbf{a}\in\mathbb{R}^{n} and 𝐜pd+nr\mathbf{c}\in\mathbb{R}^{p_{d}+n-r} satisfying

𝐰=proxjdλj1𝐈j,𝐎(𝐰𝐎1(𝐁)𝐚𝐎1𝐜),\ \quad\quad\quad\mathbf{w}=\mathrm{prox}_{\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j},\mathbf{O}}\left(\mathbf{w}-\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}\mathbf{a}-\mathbf{O}^{-1}\mathbf{c}\right), (57)
𝐚=prox𝝍,𝐏(𝐏1𝐁𝐰+𝐚),\mathbf{a}=\mathrm{prox}_{\bm{\psi}^{*},\mathbf{P}}(\mathbf{P}^{-1}\mathbf{B}^{\prime}\mathbf{w}+\mathbf{a}),\qquad\quad\qquad\quad (58)
𝐜=proxι𝕄,𝐐(𝐐1𝐰+𝐜).\mathbf{c}=\mathrm{prox}_{\iota_{\mathbb{M}}^{*},\mathbf{Q}}(\mathbf{Q}^{-1}\mathbf{w}+\mathbf{c}).\quad\quad\quad\qquad\qquad (59)

for any matrices 𝐎𝕊+pd+nr\mathbf{O}\in\mathbb{S}_{+}^{p_{d}+n-r}, 𝐏𝕊+n\mathbf{P}\in\mathbb{S}_{+}^{n} and 𝐐𝕊+pd+nr\mathbf{Q}\in\mathbb{S}_{+}^{p_{d}+n-r}.

(b) If there exist vectors 𝐰pd+nr\mathbf{w}\in\mathbb{R}^{p_{d}+n-r}, 𝐚n\mathbf{a}\in\mathbb{R}^{n}, 𝐜pd+nr\mathbf{c}\in\mathbb{R}^{p_{d}+n-r} and matrices 𝐎𝕊+pd+nr\mathbf{O}\in\mathbb{S}_{+}^{p_{d}+n-r}, 𝐏𝕊+n\mathbf{P}\in\mathbb{S}_{+}^{n}, 𝐐𝕊+pd+nr\mathbf{Q}\in\mathbb{S}_{+}^{p_{d}+n-r} satisfying equations (57), (58) and (59), then 𝐰\mathbf{w} is a solution of problem (13) with λj\lambda_{j}, jdj\in\mathbb{N}_{d}.

Proof.

According to the Fermat rule and the chain rule (15) of the subdifferential, we have that 𝐰pd+nr\mathbf{w}\in\mathbb{R}^{p_{d}+n-r} is a solution of problem (13) if and only if

𝟎(𝐁)𝝍(𝐁𝐰)+ι𝕄(𝐰)+(jdλj1𝐈j)(𝐰).\mathbf{0}\in(\mathbf{B}^{\prime})^{\top}\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w})+\partial\iota_{\mathbb{M}}(\mathbf{w})+\partial\left(\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j}\right)(\mathbf{w}).

The latter is equivalent to the existence of 𝐚𝝍(𝐁𝐰)\mathbf{a}\in\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w}) and 𝐜ι𝕄(𝐰)\mathbf{c}\in\partial\iota_{\mathbb{M}}(\mathbf{w}) such that

(𝐁)𝐚𝐜(jdλj1𝐈j)(𝐰).-(\mathbf{B}^{\prime})^{\top}\mathbf{a}-\mathbf{c}\in\partial\left(\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j}\right)(\mathbf{w}). (60)

We first prove Item (a). If 𝐰pd+nr\mathbf{w}\in\mathbb{R}^{p_{d}+n-r} is a solution of problem (13), then there exist 𝐚n\mathbf{a}\in\mathbb{R}^{n} and 𝐜pd+nr\mathbf{c}\in\mathbb{R}^{p_{d}+n-r} satisfying inclusion relation (60), which further leads to 𝐎(𝐎1(𝐁)𝐚𝐎1𝐜)(jdλj1𝐈j)(𝐰)\mathbf{O}(-\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}\mathbf{a}-\mathbf{O}^{-1}\mathbf{c})\in\partial\left(\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j}\right)(\mathbf{w}) for any 𝐎𝕊+pd+nr\mathbf{O}\in\mathbb{S}_{+}^{p_{d}+n-r}. Relation (55) ensures the equivalence between the inclusion relation above and equation (57). According to relation (56), we rewrite the inclusion relation 𝐚𝝍(𝐁𝐰)\mathbf{a}\in\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w}) as 𝐁𝐰𝝍(𝐚)\mathbf{B}^{\prime}\mathbf{w}\in\partial\bm{\psi}^{*}(\mathbf{a}), which further leads to 𝐏(𝐏1𝐁𝐰)𝝍(𝐚)\mathbf{P}(\mathbf{P}^{-1}\mathbf{B}^{\prime}\mathbf{w})\in\partial\bm{\psi}^{*}(\mathbf{a}) for any 𝐏𝕊+n\mathbf{P}\in\mathbb{S}_{+}^{n}. By relation (55), this is equivalent to equation (58). Again by relation (56), the inclusion relation 𝐜ι𝕄(𝐰)\mathbf{c}\in\partial\iota_{\mathbb{M}}(\mathbf{w}) can be rewritten as 𝐰ι𝕄(𝐜)\mathbf{w}\in\partial\iota_{\mathbb{M}}^{*}(\mathbf{c}). Hence, for any 𝐐𝕊+pd+nr\mathbf{Q}\in\mathbb{S}_{+}^{p_{d}+n-r}, we obtain that 𝐐(𝐐1𝐰)ι𝕄(𝐜)\mathbf{Q}(\mathbf{Q}^{-1}\mathbf{w})\in\partial\iota_{\mathbb{M}}^{*}(\mathbf{c}), which, by relation (55), is equivalent to equation (59).

We next verify Item (b). Suppose that the vectors 𝐰pd+nr\mathbf{w}\in\mathbb{R}^{p_{d}+n-r}, 𝐚n\mathbf{a}\in\mathbb{R}^{n}, 𝐜pd+nr\mathbf{c}\in\mathbb{R}^{p_{d}+n-r} and the matrices 𝐎𝕊+pd+nr\mathbf{O}\in\mathbb{S}_{+}^{p_{d}+n-r}, 𝐏𝕊+n\mathbf{P}\in\mathbb{S}_{+}^{n}, 𝐐𝕊+pd+nr\mathbf{Q}\in\mathbb{S}_{+}^{p_{d}+n-r} satisfy equations (57), (58) and (59). As pointed out in the proof of Item (a), equations (58) and (59) are equivalent to the inclusion relations 𝐚𝝍(𝐁𝐰)\mathbf{a}\in\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w}) and 𝐜ι𝕄(𝐰)\mathbf{c}\in\partial\iota_{\mathbb{M}}(\mathbf{w}), respectively. Moreover, equation (57) is equivalent to inclusion relation (60). Consequently, we conclude that 𝐰\mathbf{w} is a solution of problem (13). ∎

Proposition 5.3 provides fixed-point equation formulations not only for the solution of problem (13) but also for the two auxiliary vectors 𝐚\mathbf{a} and 𝐜\mathbf{c}. It follows from Lemma 3.1 that the solutions of problems (1) and (13) are closely related. Below, we show that 𝐚\mathbf{a} and a subvector of 𝐜\mathbf{c} coincide with the two desired vectors appearing in Theorem 3.3.

Theorem 5.4.

Suppose that 𝛙:n+\bm{\psi}:\mathbb{R}^{n}\to\mathbb{R}_{+} is a convex function and for each jdj\in\mathbb{N}_{d}, 𝐁j\mathbf{B}_{j} is an mj×nm_{j}\times n matrix. Let 𝐁\mathbf{B} be defined as in (8). Suppose that 𝐰:=[𝐳𝐯]pd+nr\mathbf{w}:=\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}\in\mathbb{R}^{p_{d}+n-r} with 𝐯nr\mathbf{v}\in\mathbb{R}^{n-r}, 𝐳pd\mathbf{z}\in\mathbb{R}^{p_{d}} and for each jd,j\in\mathbb{N}_{d}, 𝐳j=iljzpj1+ki𝐞mj,kiΩmj,lj\mathbf{z}_{j}=\sum_{i\in\mathbb{N}_{l_{j}}}z_{p_{j-1}+k_{i}}^{*}\mathbf{e}_{m_{j},k_{i}}\in\Omega_{m_{j},l_{j}} for some ljmj+1l_{j}\in\mathbb{Z}_{m_{j}+1}, and in addition 𝐚n\mathbf{a}\in\mathbb{R}^{n}, 𝐜:=[cj:jpd+nr]pd+nr\mathbf{c}:=[c_{j}:j\in\mathbb{N}_{p_{d}+n-r}]\in\mathbb{R}^{p_{d}+n-r}. If vectors 𝐰\mathbf{w}, 𝐚\mathbf{a} and 𝐜\mathbf{c} satisfy equations (57), (58) and (59) for some matrices 𝐎𝕊+pd+nr\mathbf{O}\in\mathbb{S}_{+}^{p_{d}+n-r}, 𝐏𝕊+n\mathbf{P}\in\mathbb{S}_{+}^{n}, 𝐐𝕊+pd+nr\mathbf{Q}\in\mathbb{S}_{+}^{p_{d}+n-r}, then 𝐮:=𝐁𝐰\mathbf{u}^{*}:=\mathbf{B}^{\prime}\mathbf{w} is a solution of problem (1) with λj\lambda_{j}, jdj\in\mathbb{N}_{d} and for each jdj\in\mathbb{N}_{d}, 𝐮\mathbf{u}^{*} has sparsity of level ljl_{j} under 𝐁j\mathbf{B}_{j}. Moreover, 𝐚𝛙(𝐮)\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*}) and 𝐛:=[cj:jpd]𝒩(𝐁)\mathbf{b}:=[c_{j}:j\in\mathbb{N}_{p_{d}}]\in\mathcal{N}(\mathbf{B}^{\top}) satisfy (16), (17) and (18).

Proof.

Item (b) of Proposition 5.3 ensures that if 𝐰\mathbf{w}, 𝐚\mathbf{a} and 𝐜\mathbf{c} satisfy equations (57), (58) and (59) for some matrices 𝐎𝕊+pd+nr\mathbf{O}\in\mathbb{S}_{+}^{p_{d}+n-r}, 𝐏𝕊+n\mathbf{P}\in\mathbb{S}_{+}^{n}, 𝐐𝕊+pd+nr\mathbf{Q}\in\mathbb{S}_{+}^{p_{d}+n-r}, then 𝐰\mathbf{w} is a solution of problem (13) with λj\lambda_{j}, jdj\in\mathbb{N}_{d}. According to Lemma 3.1, we get that 𝐮:=1𝐰\mathbf{u}^{*}:=\mathcal{B}^{-1}\mathbf{w} is a solution of problem (1) with λj\lambda_{j}, jdj\in\mathbb{N}_{d}. By definition of mapping \mathcal{B}, we get that 𝐮=𝐁𝐰\mathbf{u}^{*}=\mathbf{B}^{\prime}\mathbf{w} and 𝐁𝐮=𝐳\mathbf{B}\mathbf{u}^{*}=\mathbf{z}. It follows from definition of matrix 𝐁\mathbf{B} that for each jd,j\in\mathbb{N}_{d}, 𝐁j𝐮=𝐳j\mathbf{B}_{j}\mathbf{u}^{*}=\mathbf{z}_{j}, which shows that 𝐮\mathbf{u}^{*} has sparsity of level ljl_{j} under the transform matrix 𝐁j\mathbf{B}_{j}.

It suffices to verify that 𝐚𝝍(𝐮)\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*}), 𝐛𝒩(𝐁)\mathbf{b}\in\mathcal{N}(\mathbf{B}^{\top}) and that (16), (17) and (18) hold. As pointed out in the proof of Item (a) of Proposition 5.3, equations (58) and (59) are equivalent to the inclusion relations 𝐚𝝍(𝐁𝐰)\mathbf{a}\in\partial\bm{\psi}(\mathbf{B}^{\prime}\mathbf{w}) and 𝐜ι𝕄(𝐰)\mathbf{c}\in\partial\iota_{\mathbb{M}}(\mathbf{w}), respectively. By noting that 𝐮=𝐁𝐰\mathbf{u}^{*}=\mathbf{B}^{\prime}\mathbf{w}, we get that 𝐚𝝍(𝐮)\mathbf{a}\in\partial\bm{\psi}(\mathbf{u}^{*}). Recalling that ι𝕄(𝐰)=𝒩(𝐁)×{𝟎}\partial\iota_{\mathbb{M}}(\mathbf{w})=\mathcal{N}(\mathbf{B}^{\top})\times\{\mathbf{0}\} leads to 𝐛𝒩(𝐁)\mathbf{b}\in\mathcal{N}(\mathbf{B}^{\top}). Note that equation (57) is equivalent to inclusion relation (60). The latter holds if and only if inclusion relation (20) holds. As pointed out in the proof of Lemma 3.2, inclusion relation (20) yields that 𝐚\mathbf{a} and 𝐛\mathbf{b} satisfy (16), (17) and (18). ∎

Theorem 5.4 shows that obtaining the solution 𝐮\mathbf{u}^{*} of problem (1) and the vectors 𝐚\mathbf{a}, 𝐛\mathbf{b} satisfying (16), (17) and (18) can be done by solving fixed-point equations (57), (58) and (59). These three fixed-point equations are coupled together and they have to be solved simultaneously by iteration. It is convenient to write equations (57), (58) and (59) in a compact form. To this end, we utilize the three column vectors 𝐰\mathbf{w}, 𝐚\mathbf{a} and 𝐜\mathbf{c} to form a block column vector 𝐯2pd+3n2r\mathbf{v}\in\mathbb{R}^{2p_{d}+3n-2r} having 𝐰\mathbf{w}, 𝐚\mathbf{a} and 𝐜\mathbf{c} as its three blocks. That is, 𝐯:=[𝐰,𝐚,𝐜]\mathbf{v}^{\top}:=[\mathbf{w}^{\top},\mathbf{a}^{\top},\mathbf{c}^{\top}]. By integrating together the three proximity operators involved in equations (57), (58) and (59), we introduce an operator from 2pd+3n2r\mathbb{R}^{2p_{d}+3n-2r} to itself by

𝒫:=[proxjdλj1𝐈j,𝐎prox𝝍,𝐏proxι𝕄,𝐐].\mathcal{P}:=\begin{bmatrix}\mathrm{prox}_{\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j},\mathbf{O}}\\ \mathrm{prox}_{\bm{\psi}^{*},\mathbf{P}}\\ \mathrm{prox}_{\iota_{\mathbb{M}}^{*},\mathbf{Q}}\end{bmatrix}.

We also define a block matrix by

\mathbf{E}:=\begin{bmatrix}\mathbf{I}_{p_{d}+n-r}&-\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}&-\mathbf{O}^{-1}\\ \mathbf{P}^{-1}\mathbf{B}^{\prime}&\mathbf{I}_{n}&\mathbf{0}\\ \mathbf{Q}^{-1}&\mathbf{0}&\mathbf{I}_{p_{d}+n-r}\end{bmatrix}.

With the above notation, we rewrite equations (57), (58) and (59) in the following compact form

𝐯=(𝒫𝐄)(𝐯).\mathbf{v}=(\mathcal{P}\circ\mathbf{E})(\mathbf{v}). (61)

Since equations (57), (58) and (59) are represented in the compact form (61), one may define the Picard iteration based on (61) to solve the fixed-point 𝐯\mathbf{v} of the operator 𝒫𝐄\mathcal{P}\circ\mathbf{E}, that is

𝐯k+1=(𝒫𝐄)(𝐯k),k=0,1,\mathbf{v}^{k+1}=(\mathcal{P}\circ\mathbf{E})(\mathbf{v}^{k}),\ k=0,1,\cdots

If it converges, the Picard sequence 𝐯k\mathbf{v}^{k}, k=0,1,k=0,1,\cdots, generated by the iteration above converges to a fixed-point of the operator 𝒫𝐄\mathcal{P}\circ\mathbf{E}. It is known [7] that the convergence of the Picard sequence requires the firm non-expansiveness of the operator 𝒫𝐄\mathcal{P}\circ\mathbf{E}. However, by arguments similar to those used in the proofs of Lemmas 3.1 and 3.2 of [14], one can show that the operator 𝒫𝐄\mathcal{P}\circ\mathbf{E} is not firmly non-expansive. We therefore reformulate the fixed-point equation (61) by appropriately splitting the matrix 𝐄\mathbf{E}, guided by the theory of non-expansive maps.

We describe the split of 𝐄\mathbf{E} as follows. Set 𝐑:=diag(𝐎,𝐏,𝐐)\mathbf{R}:=\text{diag}(\mathbf{O},\mathbf{P},\mathbf{Q}). By introducing three (2pd+3n2r)×(2pd+3n2r)(2p_{d}+3n-2r)\times(2p_{d}+3n-2r) matrices 𝐌0,𝐌1,𝐌2\mathbf{M}_{0},\mathbf{M}_{1},\mathbf{M}_{2} satisfying 𝐌0=𝐌1+𝐌2\mathbf{M}_{0}=\mathbf{M}_{1}+\mathbf{M}_{2}, we split the expansive matrix 𝐄\mathbf{E} as

𝐄=(𝐄𝐑1𝐌0)+𝐑1𝐌1+𝐑1𝐌2.\mathbf{E}=(\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})+\mathbf{R}^{-1}\mathbf{M}_{1}+\mathbf{R}^{-1}\mathbf{M}_{2}.

Accordingly, the fixed-point equation (61) can be rewritten as

𝐯=𝒫((𝐄𝐑1𝐌0)𝐯+𝐑1𝐌1𝐯+𝐑1𝐌2𝐯).\mathbf{v}=\mathcal{P}((\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})\mathbf{v}+\mathbf{R}^{-1}\mathbf{M}_{1}\mathbf{v}+\mathbf{R}^{-1}\mathbf{M}_{2}\mathbf{v}).

Based upon the above equation, we define a two-step iteration to solve the fixed-point 𝐯\mathbf{v} of the operator 𝒫𝐄\mathcal{P}\circ\mathbf{E} as

𝐯k+1=𝒫((𝐄𝐑1𝐌0)𝐯k+1+𝐑1𝐌1𝐯k+𝐑1𝐌2𝐯k1).\mathbf{v}^{k+1}=\mathcal{P}((\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})\mathbf{v}^{k+1}+\mathbf{R}^{-1}\mathbf{M}_{1}\mathbf{v}^{k}+\mathbf{R}^{-1}\mathbf{M}_{2}\mathbf{v}^{k-1}). (62)

To develop an efficient iterative algorithm, we need to consider two issues: The first issue is the solvability of equation (62). The second issue is the convergence of the iterative algorithm. In fact, these two issues may be addressed by choosing appropriate matrices 𝐌0\mathbf{M}_{0}, 𝐌1\mathbf{M}_{1}, 𝐌2\mathbf{M}_{2}. Specifically, by introducing a real number θ\theta, we choose 𝐌0\mathbf{M}_{0}, 𝐌1\mathbf{M}_{1}, 𝐌2\mathbf{M}_{2} as

𝐌0:=[𝐎(𝐁)𝐈pd+nrθ𝐁𝐏𝟎θ𝐈pd+nr𝟎𝐐],\mathbf{M}_{0}:=\begin{bmatrix}\mathbf{O}&-(\mathbf{B}^{\prime})^{\top}&-\mathbf{I}_{p_{d}+n-r}\\ -\theta\mathbf{B}^{\prime}&\mathbf{P}&\mathbf{0}\\ -\theta\mathbf{I}_{p_{d}+n-r}&\mathbf{0}&\mathbf{Q}\end{bmatrix}, (63)
𝐌1:=[𝐎(θ2)(𝐁)(θ2)𝐈pd+nrθ𝐁𝐏𝟎θ𝐈pd+nr𝟎𝐐],\mathbf{M}_{1}:=\begin{bmatrix}\mathbf{O}&(\theta-2)(\mathbf{B}^{\prime})^{\top}&(\theta-2)\mathbf{I}_{p_{d}+n-r}\\ -\theta\mathbf{B}^{\prime}&\mathbf{P}&\mathbf{0}\\ -\theta\mathbf{I}_{p_{d}+n-r}&\mathbf{0}&\mathbf{Q}\end{bmatrix}, (64)

and

𝐌2:=[𝟎(1θ)(𝐁)(1θ)𝐈pd+nr𝟎𝟎𝟎𝟎𝟎𝟎].\mathbf{M}_{2}:=\begin{bmatrix}\mathbf{0}&(1-\theta)(\mathbf{B}^{\prime})^{\top}&(1-\theta)\mathbf{I}_{p_{d}+n-r}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\end{bmatrix}. (65)

It is clear that 𝐌0=𝐌1+𝐌2\mathbf{M}_{0}=\mathbf{M}_{1}+\mathbf{M}_{2}. Associated with these matrices, we have that

(𝐄𝐑1𝐌0)=[𝟎𝟎𝟎(1+θ)𝐏1𝐁𝟎𝟎(1+θ)𝐐1𝟎𝟎],(\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})=\begin{bmatrix}\mathbf{0}&\mathbf{0}&\mathbf{0}\\ (1+\theta)\mathbf{P}^{-1}\mathbf{B}^{\prime}&\mathbf{0}&\mathbf{0}\\ (1+\theta)\mathbf{Q}^{-1}&\mathbf{0}&\mathbf{0}\end{bmatrix}, (66)
𝐑1𝐌1=[𝐈pd+nr(θ2)𝐎1(𝐁)(θ2)𝐎1θ𝐏1𝐁𝐈n𝟎θ𝐐1𝟎𝐈pd+nr]\mathbf{R}^{-1}\mathbf{M}_{1}=\begin{bmatrix}\mathbf{I}_{p_{d}+n-r}&(\theta-2)\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}&(\theta-2)\mathbf{O}^{-1}\\ -\theta\mathbf{P}^{-1}\mathbf{B}^{\prime}&\mathbf{I}_{n}&\mathbf{0}\\ -\theta\mathbf{Q}^{-1}&\mathbf{0}&\mathbf{I}_{p_{d}+n-r}\end{bmatrix} (67)

and

𝐑1𝐌2=[𝟎(1θ)𝐎1(𝐁)(1θ)𝐎1𝟎𝟎𝟎𝟎𝟎𝟎].\mathbf{R}^{-1}\mathbf{M}_{2}=\begin{bmatrix}\mathbf{0}&(1-\theta)\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}&(1-\theta)\mathbf{O}^{-1}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{0}\end{bmatrix}. (68)

We then rewrite the iterative scheme (62) in terms of the vectors 𝐰k\mathbf{w}^{k}, 𝐚k\mathbf{a}^{k} and 𝐜k\mathbf{c}^{k} as

\left\{\begin{array}{l}\mathbf{w}^{k+1}=\mathrm{prox}_{\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{\prime}_{j},\mathbf{O}}\big(\mathbf{w}^{k}+\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}((\theta-2)\mathbf{a}^{k}+(1-\theta)\mathbf{a}^{k-1})+\mathbf{O}^{-1}((\theta-2)\mathbf{c}^{k}+(1-\theta)\mathbf{c}^{k-1})\big),\\ \mathbf{a}^{k+1}=\mathrm{prox}_{\bm{\psi}^{*},\mathbf{P}}\big(\mathbf{a}^{k}+\mathbf{P}^{-1}\mathbf{B}^{\prime}(\mathbf{w}^{k+1}+\theta(\mathbf{w}^{k+1}-\mathbf{w}^{k}))\big),\\ \mathbf{c}^{k+1}=\mathrm{prox}_{\iota_{\mathbb{M}}^{*},\mathbf{Q}}\big(\mathbf{c}^{k}+\mathbf{Q}^{-1}(\mathbf{w}^{k+1}+\theta(\mathbf{w}^{k+1}-\mathbf{w}^{k}))\big).\end{array}\right. (69)

We first note that since the matrix 𝐄𝐑1𝐌0\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0} is strictly block lower triangular, the two-step iteration scheme (62), an implicit scheme in general, reduces to the explicit scheme (69). We next establish the convergence of Algorithm (69) in the following theorem. The convergence can be obtained by arguments similar to those in [14, 15], and we provide its complete proof in Appendix A for the convenience of the reader. To this end, we introduce two block matrices by

𝐅:=[𝐏𝟎𝟎𝐐]and𝐆:=[𝐁𝐈pd+nr].\mathbf{F}:=\begin{bmatrix}\mathbf{P}&\mathbf{0}\\ \mathbf{0}&\mathbf{Q}\end{bmatrix}\ \text{and}\ \mathbf{G}:=\begin{bmatrix}\mathbf{B}^{\prime}\\ \mathbf{I}_{p_{d}+n-r}\end{bmatrix}. (70)
Theorem 5.5.

If matrices 𝐎𝕊+pd+nr\mathbf{O}\in\mathbb{S}_{+}^{p_{d}+n-r}, 𝐏𝕊+n\mathbf{P}\in\mathbb{S}_{+}^{n}, 𝐐𝕊+pd+nr\mathbf{Q}\in\mathbb{S}_{+}^{p_{d}+n-r} and θ\theta\in\mathbb{R} satisfy

|θ|𝐅12𝐆𝐎122<1|\theta|\big{\|}\mathbf{F}^{-\frac{1}{2}}\mathbf{G}\mathbf{O}^{-\frac{1}{2}}\big{\|}_{2}<1 (71)

and

|1θ|𝐆2max{𝐎12,𝐅12}1|θ|𝐅12𝐆𝐎122<12,\frac{|1-\theta|\|\mathbf{G}\|_{2}\max\big{\{}\|\mathbf{O}^{-1}\|_{2},\left\|\mathbf{F}^{-1}\right\|_{2}\big{\}}}{1-|\theta|\big{\|}\mathbf{F}^{-\frac{1}{2}}\mathbf{G}\mathbf{O}^{-\frac{1}{2}}\big{\|}_{2}}<\frac{1}{2}, (72)

then the sequence 𝐯k:=[𝐰k𝐚k𝐜k]\mathbf{v}^{k}:=\scriptsize{\begin{bmatrix}\mathbf{w}^{k}\\ \mathbf{a}^{k}\\ \mathbf{c}^{k}\end{bmatrix}}, kk\in\mathbb{N}, generated by Algorithm (69) for any given 𝐯0,𝐯12pd+3n2r\mathbf{v}^{0},\mathbf{v}^{1}\in\mathbb{R}^{2p_{d}+3n-2r}, converges to a fixed-point of operator 𝒫𝐄\mathcal{P}\circ\mathbf{E}.
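For the diagonal scalings used later in Subsection 6.3, namely O := (1/α)I, P := (1/ρ)I and Q := (1/β)I (an assumed simple parametrization), conditions (71) and (72) can be verified numerically, as sketched below in Python; with θ = 1, condition (72) holds trivially and (71) reduces to the bound quoted in Subsection 6.3.

    import numpy as np

    def check_theorem_5_5(Bp, alpha, rho, beta, theta):
        # check (71)-(72) for O = (1/alpha)I, P = (1/rho)I, Q = (1/beta)I;
        # Bp stands for B', an n x (p_d + n - r) array
        m = Bp.shape[1]
        # F^{-1/2} G O^{-1/2} = sqrt(alpha) * [sqrt(rho) B'; sqrt(beta) I]
        FGO = np.sqrt(alpha) * np.vstack([np.sqrt(rho) * Bp, np.sqrt(beta) * np.eye(m)])
        kappa = np.linalg.norm(FGO, 2)
        if abs(theta) * kappa >= 1.0:                 # condition (71) fails
            return False
        G = np.vstack([Bp, np.eye(m)])                # G = [B'; I] as in (70)
        # ||O^{-1}||_2 = alpha and ||F^{-1}||_2 = max(rho, beta)
        lhs = abs(1.0 - theta) * np.linalg.norm(G, 2) * max(alpha, rho, beta) / (1.0 - abs(theta) * kappa)
        return lhs < 0.5                              # condition (72)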

To end this section, we summarize the iterative scheme for parameter choices when 𝝍\bm{\psi} is non-differentiable and 𝐁\mathbf{B} does not have full row rank in Algorithm 2. Note that the computation of the proximity operators involved in Algorithm (69) is essential for the implementation of Algorithm 2. In Appendix B, we will provide the closed-form formulas for these proximity operators.

Input: B, ψ, {l_j^* : j ∈ ℕ_d}, ϵ.
Initialization: choose {λ_j^0 : j ∈ ℕ_d} large enough to guarantee l_j^0 ≤ l_j^* for all j ∈ ℕ_d.
for k = 0, 1, 2, … do
    Solve (13) with λ_j^k, j ∈ ℕ_d, by Algorithm (69) to obtain the vectors w^k = [z^k; v^k], a^k and c^k, and count the sparsity level l_j^k of z_j^k.
    if Σ_{j∈ℕ_d} |l_j^k − l_j^*| ≤ ϵ then break.
    for j = 1, 2, …, d do
        if l_j^k < l_j^* then
            Compute γ_{j,i} := |(B′_{(p_{j−1}+i)})^⊤ a^k + c^k_{p_{j−1}+i}|, i ∈ ℕ_{m_j}.
            Sort: γ_{j,i_1} ≤ ⋯ ≤ γ_{j,i_{m_j}} with {i_1, i_2, …, i_{m_j}} = ℕ_{m_j}.
            Compute Γ_j := max{γ_{j,i} : γ_{j,i} < λ_j^k, i ∈ ℕ_{m_j}}.
            Update λ_j^k := min{γ_{j,i_{m_j−l_j^*+1}}, Γ_j}.
        else if l_j^k > l_j^* then
            Set s_j = 0.
            for i = 1, 2, … do
                Update s_j := s_j + l_j^k − l_j^*.
                Update λ_j^k := min{γ_{j,i_{m_j−l_j^*+1+s_j}}, Γ_j}.
                Solve (13) with λ_j^k, j ∈ ℕ_d, by Algorithm (69) to obtain w^k, a^k and c^k, and count the sparsity level l_j^k of z_j^k.
                if l_j^k ≤ l_j^* then break.
    Update λ_j^{k+1} := λ_j^k for all j ∈ ℕ_d.
Output: {λ_j^k : j ∈ ℕ_d}, w^k.
Algorithm 2 Iterative scheme selecting multiple regularization parameters for problem (1)

6 Numerical experiments

In this section, we present four numerical experiments to demonstrate the efficiency of the proposed multi-parameter choice strategies.

In presenting the numerical results, we use “TSLs” and “SLs” to represent the target sparsity levels {lj:jd}\{l^{*}_{j}:j\in\mathbb{N}_{d}\} and the sparsity levels {lj:jd}\{l_{j}:j\in\mathbb{N}_{d}\} of the solution obtained from the parameter choice strategies, respectively. For the single-parameter regularization model, we use “TSL” and “SL” to denote the target sparsity level ll^{*} and the sparsity level ll of the obtained solution, respectively. In the first two experiments, related to the regularization problem (25), we use “Ratio” to denote the ratio rr of the number of nonzero components to the total number of components of the obtained solution. In the last two experiments, associated with the general regularization problem (1), we use “Ratios” to represent the ratios {rj:jd}\{r_{j}:j\in\mathbb{N}_{d}\}. Specifically, for each jdj\in\mathbb{N}_{d}, rjr_{j} refers to the ratio of the number of nonzero components to the total number of components of the obtained solution under the transform matrix 𝐁j\mathbf{B}_{j}.

6.1 Parameter choices: a block separable case

In this experiment, we validate the parameter choice strategy proposed in Corollary 4.4 by considering the multi-parameter regularization model (30) for signal denoising. The single-parameter regularization model for signal denoising is also considered for comparison.

We employ model (30) to recover the Doppler signal function

f(t):=t(1t)sin((2.1π)/(t+0.05)),t[0,1],f(t):=\sqrt{t(1-t)}\sin((2.1\pi)/(t+0.05)),\ t\in[0,1], (73)

from its noisy data. Let n:=4096n:=4096 and tjt_{j}, jnj\in\mathbb{N}_{n}, be the sample points on a uniform grid in [0,1][0,1] with step size h:=1/(n1)h:={1}/{(n-1)}. We recover the signal 𝐟:=[f(tj):jn]\mathbf{f}:=[f(t_{j}):j\in\mathbb{N}_{n}] from the noisy signal 𝐱:=𝐟+𝜼\mathbf{x}:=\mathbf{f}+\bm{\eta}, where 𝜼\bm{\eta} is an additive white Gaussian noise with the signal-to-noise ratio SNR=80\text{SNR}=80.

We describe the multi-parameter regularization model (30) as follows. In this model, we let d:=7d:=7 and choose the matrix 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} as the Daubechies wavelet transform with the vanishing moments N:=6\mathrm{N}:=6 and the coarsest resolution level L:=6\mathrm{L}:=6. Let m1:=26m_{1}:=2^{6}, mj:=2j+4m_{j}:=2^{j+4}, jd{1}j\in\mathbb{N}_{d}\setminus\{1\} and set p0:=0p_{0}:=0, pj:=ijmip_{j}:=\sum_{i\in\mathbb{N}_{j}}m_{i}, jdj\in\mathbb{N}_{d}. We choose the partition 𝒮n,d:={Sn,1,Sn,2,,Sn,d}\mathcal{S}_{n,d}:=\left\{S_{n,1},S_{n,2},\ldots,S_{n,d}\right\} for n\mathbb{N}_{n} with Sn,j:={pj1+k:kmj}S_{n,j}:=\{p_{j-1}+k:k\in\mathbb{N}_{m_{j}}\}, jdj\in\mathbb{N}_{d}. Associated with this partition, we decompose 𝐮:=[uk:kn]n\mathbf{u}:=[u_{k}:k\in\mathbb{N}_{n}]\in\mathbb{R}^{n} into dd sub-vectors 𝐮j:=[upj1+k:kmj]\mathbf{u}_{j}:=[u_{p_{j-1}+k}:k\in\mathbb{N}_{m_{j}}], jdj\in\mathbb{N}_{d}, and decompose 𝐀\mathbf{A} into dd sub-matrices 𝐀[j]:=[𝐀(pj1+k):kmj]n×mj,jd\mathbf{A}_{[j]}:=[\mathbf{A}_{(p_{j-1}+k)}:k\in\mathbb{N}_{m_{j}}]\in\mathbb{R}^{n\times m_{j}},\ j\in\mathbb{N}_{d}. Moreover, for each jdj\in\mathbb{N}_{d}, we choose the natural partition 𝒮mj,mj:={Smj,1,Smj,2,,Smj,mj}\mathcal{S}_{m_{j},m_{j}}:=\left\{S_{m_{j},1},S_{m_{j},2},\ldots,S_{m_{j},m_{j}}\right\} for mj\mathbb{N}_{m_{j}}. That is, Smj,k:={pj1+k}S_{m_{j},k}:=\{p_{j-1}+k\}, kmjk\in\mathbb{N}_{m_{j}}. Accordingly, we decompose vector 𝐮j\mathbf{u}_{j} into mjm_{j} sub-vectors 𝐮j,k:=upj1+k\mathbf{u}_{j,k}:=u_{p_{j-1}+k} and decompose matrix 𝐀[j]\mathbf{A}_{[j]} into mjm_{j} sub-matrices 𝐀[j,k]:=𝐀(pj1+k)\mathbf{A}_{[j,k]}:=\mathbf{A}_{(p_{j-1}+k)}, kmjk\in\mathbb{N}_{m_{j}}. It follows from the orthogonality of matrix 𝐀\mathbf{A} that conditions (39) and (41) are satisfied. This allows us to choose the regularization parameters according to the strategy stated in Corollary 4.4. We note that if 𝐮𝝀\mathbf{u}_{\bm{\lambda}} is a solution of problem (30) with 𝝀:=[λj:jd]\bm{\lambda}:=[\lambda_{j}:j\in\mathbb{N}_{d}], then for each jdj\in\mathbb{N}_{d}, the 𝒮mj,mj\mathcal{S}_{m_{j},m_{j}}-block sparsity level of (𝐮𝝀)j(\mathbf{u}_{\bm{\lambda}})_{j} coincides with its sparsity level.
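As a small consistency check, the block sizes m_j and offsets p_j defined above can be generated as follows (Python sketch); they indeed sum to n = 4096.

    d, n = 7, 4096
    m = [2**6] + [2**(j + 4) for j in range(2, d + 1)]   # m_1 = 2^6, m_j = 2^{j+4} for j = 2, ..., d
    p = [0]
    for m_j in m:
        p.append(p[-1] + m_j)                            # p_j = m_1 + ... + m_j
    blocks = [range(p[j] + 1, p[j + 1] + 1) for j in range(d)]  # index sets S_{n,j}
    assert m == [64, 64, 128, 256, 512, 1024, 2048] and p[-1] == n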

In this experiment, we solve model (30) by Algorithm (54) with Φ(𝐮):=jdλj𝐮j1\Phi(\mathbf{u}):=\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{u}_{j}\|_{1}, Ψ:=12𝐱22\Psi:=\frac{1}{2}\|\cdot-\mathbf{x}\|_{2}^{2} and 𝐂:=𝐀\mathbf{C}:=\mathbf{A}. The proximity operator proxαΦ\mathrm{prox}_{\alpha\Phi} at 𝐳n\mathbf{z}\in\mathbb{R}^{n} has the form proxαΦ(𝐳):=𝝁\mathrm{prox}_{\alpha\Phi}(\mathbf{z}):=\bm{\mu} with μpj1+i\mu_{p_{j-1}+i}, jdj\in\mathbb{N}_{d}, imji\in\mathbb{N}_{m_{j}} being defined by (86). The proximity operator prox1ρΨ\mathrm{prox}_{\frac{1}{\rho}\Psi} at 𝐳n\mathbf{z}\in\mathbb{R}^{n} has the form prox1ρΨ(𝐳):=𝐰\mathrm{prox}_{\frac{1}{\rho}\Psi}(\mathbf{z}):=\mathbf{w} with 𝐰:=[wj:jn]\mathbf{w}:=[w_{j}:j\in\mathbb{N}_{n}] and wjw_{j}, jnj\in\mathbb{N}_{n} being defined by (92).

We first validate the parameter choice strategy stated in Corollary 4.4. We set three prescribed TSLs values [64,41,50,58,61,66,60][64,41,50,58,61,66,60], [64,51,68,86,108,116,107][64,51,68,86,108,116,107] and [64,55,91,128,206,226,230][64,55,91,128,206,226,230]. According to the strategy stated in Corollary 4.4, we select the parameter 𝝀:=[λj:jd]\bm{\lambda}^{*}:=[\lambda_{j}^{*}:j\in\mathbb{N}_{d}] with which model (30) has solutions having the target sparsity levels. For each jdj\in\mathbb{N}_{d}, we set γj,k:=|(𝐀[j,k])𝐱|\gamma_{j,k}:=\big{|}(\mathbf{A}_{[j,k]})^{\top}\mathbf{x}\big{|}, kmjk\in\mathbb{N}_{m_{j}} and rearrange them in a nondecreasing order: γj,k1γj,k2γj,kmj\gamma_{j,k_{1}}\leq\gamma_{j,k_{2}}\leq\cdots\leq\gamma_{j,k_{m_{j}}} with {k1,k2,,kmj}=mj\{k_{1},k_{2},\ldots,k_{m_{j}}\}=\mathbb{N}_{m_{j}}. We then choose λj:=γj,kmjlj\lambda^{*}_{j}:=\gamma_{j,k_{m_{j}-l_{j}^{*}}} if lj<mjl^{*}_{j}<m_{j} and λj:=0.001γj,k1\lambda^{*}_{j}:=0.001\gamma_{j,k_{1}} if lj=mjl^{*}_{j}=m_{j}. We solve model (30) with each selected value of 𝝀\bm{\lambda}^{*} for the corresponding solution 𝐮𝝀\mathbf{u}_{\bm{\lambda}^{*}} and determine the actual sparsity level SLs of 𝐮𝝀\mathbf{u}_{\bm{\lambda}^{*}}.
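The selection just described amounts to one sort of the transformed data magnitudes per block; a Python sketch reads as follows, with the column blocks 𝐀_[j] of the wavelet matrix assumed to be available as arrays.

    import numpy as np

    def select_lambdas(A_blocks, x, targets):
        # gamma_{j,k} = |(A_[j,k])^T x| sorted nondecreasingly; lambda_j^* is
        # gamma_{j,k_{m_j - l_j^*}} if l_j^* < m_j, and 0.001*gamma_{j,k_1} otherwise
        lambdas = []
        for A_j, l_star in zip(A_blocks, targets):
            gammas = np.sort(np.abs(A_j.T @ x))
            m_j = gammas.size
            lambdas.append(gammas[m_j - l_star - 1] if l_star < m_j else 1e-3 * gammas[0])
        return lambdas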

We report in Table 1 the numerical results of this experiment: the targeted sparsity levels TSLs, the selected values of parameter 𝝀\bm{\lambda}^{*}, the actual sparsity levels SLs of 𝐮𝝀\mathbf{u}_{\bm{\lambda}^{*}}, the Ratio values of 𝐮𝝀\mathbf{u}_{\bm{\lambda}^{*}} and the MSE\mathrm{MSE} values of the denoised signals 𝐀𝐮𝝀\mathbf{A}\mathbf{u}_{\bm{\lambda}^{*}}. Here and in the next subsection, MSE:=1n𝐀𝐮𝝀𝐱22\mathrm{MSE}:=\frac{1}{n}\|\mathbf{A}\mathbf{u}_{\bm{\lambda}^{*}}-\mathbf{x}\|_{2}^{2}. As Table 1 shows, the SLs values match the TSLs values exactly. The numerical results in Table 1 demonstrate the efficacy of the strategy stated in Corollary 4.4 in selecting regularization parameters with which the corresponding solution achieves a desired sparsity level while preserving the approximation accuracy.

Table 1: Multi-parameter choices λ_j^* := γ_{j,k_{m_j−l_j^*}} for signal denoising (block separable)

  TSLs   [64,41,50,58,61,66,60]                        [64,51,68,86,108,116,107]                     [64,55,91,128,206,226,230]
  λ^*    [1.18,3.12,5.20,6.80,7.42,7.06,7.17]×10^{−5}   [1.18,1.98,3.05,4.05,4.82,5.60,6.15]×10^{−5}   [1.18,1.26,1.79,2.62,2.99,4.12,4.91]×10^{−5}
  SLs    [64,41,50,58,61,66,60]                        [64,51,68,86,108,116,107]                     [64,55,91,128,206,226,230]
  Ratio  9.77%                                         14.65%                                        24.41%
  MSE    3.50×10^{−10}                                 2.33×10^{−10}                                 2.04×10^{−10}

For comparison, we also consider the single-parameter regularization model (30) with d:=1d:=1. Let γk:=|(𝐀(k))𝐱|\gamma_{k}:=|(\mathbf{A}_{(k)})^{\top}\mathbf{x}|, knk\in\mathbb{N}_{n}, rearranged in a nondecreasing order: γk1γk2γkn\gamma_{k_{1}}\leq\gamma_{k_{2}}\leq\cdots\leq\gamma_{k_{n}} with {k1,k2,,kn}=n\{k_{1},k_{2},\cdots,k_{n}\}=\mathbb{N}_{n}. For the three TSL values 400400, 600600 and 10001000, we choose λ:=γknl\lambda^{*}:=\gamma_{k_{n-l^{*}}} according to Corollary 4.4 with d:=1d:=1. By solving model (30) with d:=1d:=1 by Algorithm (54) with Φ(𝐮):=λ𝐮1\Phi(\mathbf{u}):=\lambda\|\mathbf{u}\|_{1}, Ψ:=12𝐱22\Psi:=\frac{1}{2}\|\cdot-\mathbf{x}\|_{2}^{2} and 𝐂:=𝐀n×n\mathbf{C}:=\mathbf{A}\in\mathbb{R}^{n\times n}, we obtain the solution 𝐮λ\mathbf{u}_{\lambda^{*}} and determine the actual sparsity level SL of 𝐮λ\mathbf{u}_{\lambda^{*}}. The targeted sparsity levels TSL, the selected values of parameter λ\lambda^{*}, the actual sparsity levels SL\mathrm{SL} of 𝐮λ\mathbf{u}_{\lambda^{*}}, the Ratio values of 𝐮λ\mathbf{u}_{\lambda^{*}} and the MSE\mathrm{MSE} values of the denoised signals 𝐀𝐮λ\mathbf{A}\mathbf{u}_{\lambda^{*}} are reported in Table 2. As Tables 1 and 2 show, the multi-parameter regularization model may provide a solution with better approximation accuracy than the single-parameter model at the same sparsity level.

Table 2: Single-parameter choices λ:=γknl\lambda^{*}:=\gamma_{k_{n-l^{*}}} for signal denoising (block separable)

6.2 Parameter choices: a nonseparable case

This experiment is devoted to validating the efficacy of Algorithm 1. We again consider recovering the Doppler signal function defined by (73) from its noisy data by the multi-parameter regularization model (30). The original signal 𝐟\mathbf{f} and the noisy signal 𝐱\mathbf{x} are chosen in the same way as in Subsection 6.1.

In this experiment, the matrix 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} is determined by the biorthogonal wavelet ‘bior2.2’ available in Matlab with the coarsest resolution level L:=6\mathrm{L}:=6. In both the analysis and synthesis filters, ‘bior2.2’ possesses 2 vanishing moments. Such a matrix does not satisfy conditions (39) and (41). As a result, we choose the regularization parameters by employing Algorithm 1. The number dd of the regularization parameters, the sub-vectors 𝐮j\mathbf{u}_{j}, jdj\in\mathbb{N}_{d} of a vector 𝐮n\mathbf{u}\in\mathbb{R}^{n} and the sub-matrices 𝐀[j]\mathbf{A}_{[j]}, jdj\in\mathbb{N}_{d} of 𝐀\mathbf{A} are all defined as in Subsection 6.1.

We set three prescribed TSLs values [64,64,105,123,145,157,142][64,64,105,123,145,157,142], [64,64,115,143,191,208,215][64,64,115,143,191,208,215] and [64,64,124,237,295,389,427][64,64,124,237,295,389,427]. We choose the parameters λj\lambda_{j}^{*}, jdj\in\mathbb{N}_{d} by Algorithm 1 with 𝝍(𝐮):=12𝐀𝐮𝐱22\bm{\psi}(\mathbf{u}):=\frac{1}{2}\|\mathbf{Au}-\mathbf{x}\|_{2}^{2}, 𝐁=𝐈n\mathbf{B}=\mathbf{I}_{n}, 𝐁=𝐈n\mathbf{B}^{\prime}=\mathbf{I}_{n} and ϵ=7\epsilon=7. In Algorithm 1, we choose the initial parameter λj0=𝐀[j]𝐱\lambda_{j}^{0}=\big{\|}\mathbf{A}_{[j]}^{\top}{\mathbf{x}}\big{\|}_{\infty}, jdj\in\mathbb{N}_{d} and solve model (30) by Algorithm (54) with Φ(𝐮):=jdλj𝐮j1\Phi(\mathbf{u}):=\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\mathbf{u}_{j}\|_{1}, Ψ:=12𝐱22\Psi:=\frac{1}{2}\|\cdot-\mathbf{x}\|_{2}^{2} and 𝐂:=𝐀\mathbf{C}:=\mathbf{A}. The targeted sparsity levels TSLs, the selected values of parameter 𝝀\bm{\lambda}^{*} chosen by Algorithm 1, the actual sparsity levels SLs\mathrm{SLs} of 𝐮𝝀\mathbf{u}_{\bm{\lambda}^{*}}, the Ratio values of 𝐮𝝀\mathbf{u}_{\bm{\lambda}^{*}}, the numbers NUM\mathrm{NUM} of iterations for 𝝀\bm{\lambda}^{*} and the MSE\mathrm{MSE} values of the denoised signals 𝐀𝐮𝝀\mathbf{A}\mathbf{u}_{\bm{\lambda}^{*}} are reported in Table 3. For the three TSLs values, the algorithm reaches the stopping criterion within 1111, 44 and 1515 iterations, respectively. The SLs values obtained by Algorithm 1 match the TSLs values within the tolerance ϵ=7\epsilon=7. The numerical results in Table 3 validate the efficacy of Algorithm 1 for obtaining regularization parameters leading to a solution with the desired sparsity levels and approximation error.

Table 3: Multi-parameter choices by Algorithm 1 for signal denoising (nonseparable)

  TSLs   [64,64,105,123,145,157,142]                   [64,64,115,143,191,208,215]                   [64,64,124,237,295,389,427]
  λ^*    [5.94,0.56,1.91,2.27,2.57,3.36,6.68]×10^{−4}   [5.94,0.56,1.06,1.40,1.12,1.54,2.82]×10^{−4}   [59.39,5.59,2.19,2.44,3.69,4.87,7.72]×10^{−5}
  SLs    [64,64,105,124,147,156,141]                   [64,64,116,143,194,210,216]                   [64,64,124,237,294,388,426]
  Ratio  19.56%                                        24.58%                                        38.99%
  NUM    11                                            4                                             15
  MSE    4.81×10^{−8}                                  1.98×10^{−8}                                  9.93×10^{−9}

6.3 Compound sparse denoising

We consider in this experiment the parameter choice of the compound sparse denoising regularization model (5) implemented by Algorithm 2. In this case, the fidelity term 𝝍\bm{\psi} defined by (6) is differentiable and the transform matrix 𝐁:=[𝐈n𝐃]\mathbf{B}:=\scriptsize{\begin{bmatrix}\mathbf{I}_{n}\\ \mathbf{D}\end{bmatrix}} satisfies rank(𝐁)=nrank(\mathbf{B})=n. Clearly, 𝐁\mathbf{B} does not have full row rank.
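For concreteness, the structure of this transform matrix can be reproduced in a few lines (Python sketch with a small illustrative n; 𝐃 is taken to be the first-order difference matrix, as specified below for Algorithm (69)).

    import numpy as np

    n = 6                                          # small size for illustration
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)   # (n-1) x n first-order difference matrix
    B = np.vstack([np.eye(n), D])                  # B = [I_n; D], of size (2n-1) x n
    assert np.linalg.matrix_rank(B) == n           # rank(B) = n: full column rank, yet B has
                                                   # more rows than columns, so no full row rank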

In this experiment, the noisy data 𝐲:=[y(t):tn]\mathbf{y}:=[y(t):t\in\mathbb{N}_{n}] with n=300n=300 is generated by adding a low-frequency sinusoid signal f(t):=sin(0.021πt)f(t):=\sin(0.021\pi t), two additive step discontinuities u(t)=2u(t)=2 for t<0.3nt<0.3n, 11 for t>0.6nt>0.6n and 0 otherwise, and additive white Gaussian noise η(t)\eta(t) with standard deviation 0.3 and zero mean. Set 𝐟:=[f(t):tn]\mathbf{f}:=[f(t):t\in\mathbb{N}_{n}] and 𝐮:=[u(t):tn]\mathbf{u}:=[u(t):t\in\mathbb{N}_{n}]. We estimate 𝐟\mathbf{f} and 𝐮\mathbf{u} from the noisy data 𝐲\mathbf{y} by using simultaneous low-pass filtering and compound sparse denoising. The estimate 𝐮\mathbf{u}^{*} of 𝐮\mathbf{u} is obtained by solving the compound sparse denoising model (5), where the (n4)×n(n-4)\times n high-pass filter matrix 𝐇\mathbf{H} is chosen as in Example A of [22]. The estimate 𝐟\mathbf{f}^{*} of 𝐟\mathbf{f} is then obtained by 𝐟=(𝐈~𝐇)(𝐲𝐮)\mathbf{f}^{*}=(\tilde{\mathbf{I}}-\mathbf{H})(\mathbf{y}-\mathbf{u}^{*}), where 𝐈~\tilde{\mathbf{I}} is the identity matrix of order nn with the first 22 and last 22 rows removed. To measure the filtering effect, we define the MSE\mathrm{MSE} by

MSE:=1n4𝐈~(𝐟+𝐮)𝐟𝐈~𝐮22.\mathrm{MSE}:=\frac{1}{n-4}\big{\|}\tilde{\mathbf{I}}(\mathbf{f}+\mathbf{u})-\mathbf{f}^{*}-\tilde{\mathbf{I}}\mathbf{u}^{*}\big{\|}_{2}^{2}.
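For concreteness, the noisy test signal just described can be generated as follows. This is a minimal sketch; the random seed and variable names are ours.

```python
import numpy as np

n = 300
t = np.arange(1, n + 1)                        # t in N_n
f = np.sin(0.021 * np.pi * t)                  # low-frequency sinusoid f(t)
u = np.where(t < 0.3 * n, 2.0,                 # step discontinuities u(t)
             np.where(t > 0.6 * n, 1.0, 0.0))
rng = np.random.default_rng(0)                 # seed is arbitrary
eta = 0.3 * rng.standard_normal(n)             # zero-mean Gaussian noise, std 0.3
y = f + u + eta                                # noisy data y(t)
```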

We employ Algorithm 2 to select multiple regularization parameters that ensure the resulting solution of model (5) achieves the given TSLs. In Algorithm 2, we solve model (13) by Algorithm (69), where d=2, \bm{\psi} has the form (6), \mathbf{B}_{1} is the identity matrix of order n and \mathbf{B}_{2} is the (n-1)\times n first-order difference matrix. Recall that p_{d}=2n-1. We choose the real number \theta:=1 and the three matrices \mathbf{O}:=\frac{1}{\alpha}\mathbf{I}_{p_{d}}, \mathbf{P}:=\frac{1}{\rho}\mathbf{I}_{n} and \mathbf{Q}:=\frac{1}{\beta}\mathbf{I}_{p_{d}} with \alpha, \rho and \beta being positive real numbers. According to Theorem 5.5, to ensure the convergence of Algorithm (69), we choose \alpha, \rho and \beta satisfying \left\|\begin{bmatrix}\sqrt{\alpha\rho}\mathbf{B}^{\prime}\\ \sqrt{\alpha\beta}\mathbf{I}_{p_{d}}\end{bmatrix}\right\|_{2}<1. In this case, Algorithm (69) reduces to (84). The closed-form formulas of the proximity operators \mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{\prime}_{j}}, \mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}} and \mathrm{prox}_{\frac{1}{\rho}\bm{\psi}}, with \bm{\psi} defined by (6), are given in (85), (91) and (93), respectively.
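The convergence condition above can be verified numerically before running the iteration. A sketch, writing Bp for \mathbf{B}^{\prime}; the helper name is ours.

```python
import numpy as np

def step_sizes_admissible(Bp, alpha, rho, beta):
    # Checks || [ sqrt(alpha*rho) B' ; sqrt(alpha*beta) I ] ||_2 < 1,
    # the condition of Theorem 5.5; the identity block matches the
    # column dimension of B'.
    p = Bp.shape[1]
    M = np.vstack([np.sqrt(alpha * rho) * Bp,
                   np.sqrt(alpha * beta) * np.eye(p)])
    return np.linalg.norm(M, 2) < 1.0
```

In practice one may fix the ratios of \alpha, \rho and \beta and shrink them uniformly until this check passes.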

We set five prescribed \mathrm{TSLs} values [10,5], [20,20], [20,30], [50,40] and [80,60] and choose the regularization parameters \lambda_{j}^{*}, j\in\mathbb{N}_{d}, by employing Algorithm 2 with \epsilon=2. We report in Table 4 the targeted sparsity levels \mathrm{TSLs}, the initial values of \bm{\lambda}^{0}, the values of the parameter \bm{\lambda}^{*} selected by Algorithm 2, the actual sparsity levels \mathrm{SLs} of \mathbf{u}^{*}, the Ratio values of \mathbf{u}^{*}, the numbers \mathrm{NUM} of iterations for \bm{\lambda}^{*} and the \mathrm{MSE} values of the filtered signal. For the five TSLs values, the algorithm meets the stopping criterion after 8, 8, 6, 2 and 12 iterations, respectively. The SLs values obtained by Algorithm 2 match the TSLs values within the tolerance \epsilon=2. The numerical results in Table 4 demonstrate the efficacy of Algorithm 2 in selecting multiple regularization parameters to achieve the desired sparsity levels of the solution.

Table 4: Multi-parameter choices by Algorithm 2 for model (5)

6.4 Fused SVM

The goal of this experiment is to validate the parameter choice of the fused SVM model (7) implemented by Algorithm 2. In model (7), the fidelity term \bm{\psi}(\mathbf{u}):=\bm{\phi}(\mathbf{YXu}) is not differentiable and the transform matrix \mathbf{B}:=\begin{bmatrix}\mathbf{I}_{n}\\ \mathbf{D}\end{bmatrix} does not have full row rank.

The dataset used in this experiment is the set of handwritten digits from the Modified National Institute of Standards and Technology (MNIST) database [13]. The original MNIST database consists of 60000 training samples and 10000 testing samples of the digits ‘0’ through ‘9’. We consider the binary classification problem for the two digits ‘7’ and ‘9’, taking 8141 training samples and 2037 testing samples of these two digits from the database. Let p:=8141 be the number of training samples and \{y_{j}:j\in\mathbb{N}_{p}\}\subset\{-1,1\} be the labels of the training data, in which -1 and 1 represent the digits ‘7’ and ‘9’, respectively. In addition, let n:=784 be the number of pixels in each sample.

We implement Algorithm 2 to select multiple regularization parameters with which the resulting solution of model (7) achieves the given TSLs. Choosing the real number \theta and the matrices \mathbf{O}, \mathbf{P}, \mathbf{Q} in the same way as in Subsection 6.3, we solve model (13) by Algorithm (84), where d=2, \bm{\psi}(\mathbf{u}):=\bm{\phi}(\mathbf{YXu}), \mathbf{B}_{1} is the identity matrix of order n and \mathbf{B}_{2} is the (n-1)\times n first-order difference matrix. It is worth noting that the proximity operator of the function \frac{1}{\rho}\bm{\psi} cannot be expressed explicitly. Instead, we solve model (13) by Algorithm (84) with \bm{\psi} and \mathbf{B}^{\prime} replaced by \bm{\phi} and \mathbf{YX}\mathbf{B}^{\prime}, respectively, and obtain at step k three vectors \widetilde{\mathbf{w}}^{k}, \widetilde{\mathbf{a}}^{k} and \widetilde{\mathbf{c}}^{k}. The vectors \mathbf{w}^{k}, \mathbf{a}^{k} and \mathbf{c}^{k} appearing in Algorithm 2 can then be obtained by \mathbf{w}^{k}:=\widetilde{\mathbf{w}}^{k}, \mathbf{a}^{k}:=(\mathbf{YX})^{\top}\widetilde{\mathbf{a}}^{k} and \mathbf{c}^{k}:=\widetilde{\mathbf{c}}^{k}. Again by Theorem 5.5, to guarantee the convergence of Algorithm (84), we choose the real numbers \alpha, \rho and \beta in Algorithm (84) satisfying \left\|\begin{bmatrix}\sqrt{\alpha\rho}\mathbf{YX}\mathbf{B}^{\prime}\\ \sqrt{\alpha\beta}\mathbf{I}_{p_{d}}\end{bmatrix}\right\|_{2}<1. The closed-form formulas of the proximity operators \mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{\prime}_{j}}, \mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}} and \mathrm{prox}_{\frac{1}{\rho}\bm{\phi}} are given in (85), (91) and (94), respectively.

We set five prescribed TSLs values [80,80], [90,110], [150,120], [280,200] and [360,400] and use Algorithm 2 with \epsilon=2 to select the regularization parameters \lambda_{j}^{*}, j\in\mathbb{N}_{d}. The targeted sparsity levels \mathrm{TSLs}, the initial values of \bm{\lambda}^{0}, the values of the parameter \bm{\lambda}^{*} selected by Algorithm 2, the actual sparsity levels \mathrm{SLs} of \mathbf{u}^{*}, the Ratio values of \mathbf{u}^{*}, the numbers \mathrm{NUM} of iterations for \bm{\lambda}^{*}, the accuracy on the training dataset (\mathrm{TrA}) and the accuracy on the testing dataset (\mathrm{TeA}) are reported in Table 5. Algorithm 2 meets the stopping criterion with \epsilon=2 within 10, 13, 15, 15 and 8 iterations for the five TSLs values, respectively. The SLs values obtained by Algorithm 2 match the TSLs values within the tolerance \epsilon=2. These results validate the effectiveness of Algorithm 2 for obtaining multiple regularization parameters that lead to a solution with the desired sparsity levels.

Table 5: Multi-parameter choices by Algorithm 2 for model (7)

7 Conclusion

In this paper, we have explored strategies for selecting multiple regularization parameters in the multi-parameter regularization model with the 1\ell_{1} norm. We established the relationship between the regularization parameters and the sparsity of the regularized solution under transform matrices. Leveraging this relationship, we developed an iterative algorithm to determine multiple regularization parameters, ensuring that the resulting regularized solution achieves prescribed sparsity levels under the transform matrices. For scenarios where the fidelity term is nondifferentiable and the transform matrix lacks full row rank, we introduced a fixed-point proximity algorithm capable of simultaneously determining the regularized solution and two auxiliary vectors arising in the sparsity characterization. This algorithm served as a critical component in constructing the iterative parameter selection scheme. Finally, numerical experiments demonstrated the effectiveness of the proposed multi-parameter selection strategies, confirming their ability to produce solutions with the desired sparsity and performance.


Acknowledgments. Q. Liu is supported in part by the Doctor Foundation of Henan University of Technology, China (No.2023BS061), and the Innovative Funds Plan of Henan University of Technology (No.2021ZKCJ11). R. Wang is supported in part by the Natural Science Foundation of China under grant 12171202. Y. Xu is supported in part by the US National Science Foundation under grant DMS-2208386, and the US National Institutes of Health under grant R21CA263876.

Appendix A Proof of Theorem 5.5

In this appendix, we present a complete proof for Theorem 5.5, which shows the convergence of Algorithm (69).

We begin by reviewing the notion of weakly firmly non-expansive operators introduced in [14]. An operator T:\mathbb{R}^{2s}\to\mathbb{R}^{s} is called weakly firmly non-expansive with respect to a set \mathcal{M}:=\{\mathbf{M}_{0},\mathbf{M}_{1},\mathbf{M}_{2}\} of s\times s matrices if for any (\mathbf{s}^{i},\mathbf{z}^{i},\bm{\omega}^{i})\in\mathbb{R}^{s}\times\mathbb{R}^{s}\times\mathbb{R}^{s} satisfying \bm{\omega}^{i}=T(\mathbf{s}^{i},\mathbf{z}^{i}) for i=1,2, there holds

𝝎2𝝎1,𝐌0(𝝎2𝝎1)𝝎2𝝎1,𝐌1(𝐬2𝐬1)+𝐌2(𝐳2𝐳1).\big{\langle}\bm{\omega}^{2}-\bm{\omega}^{1},\mathbf{M}_{0}(\bm{\omega}^{2}-\bm{\omega}^{1})\big{\rangle}\leq\big{\langle}\bm{\omega}^{2}-\bm{\omega}^{1},\mathbf{M}_{1}(\mathbf{s}^{2}-\mathbf{s}^{1})+\mathbf{M}_{2}(\mathbf{z}^{2}-\mathbf{z}^{1})\big{\rangle}. (74)

The graph gra(T){\rm gra}(T) of operator TT is defined by

gra(T):={(𝐬,𝐳,𝝎):(𝐬,𝐳,𝝎)s×s×s,𝝎=T(𝐬,𝐳)}.{\rm gra}(T):=\{(\mathbf{s},\mathbf{z},\bm{\omega}):(\mathbf{s},\mathbf{z},\bm{\omega})\in\mathbb{R}^{s}\times\mathbb{R}^{s}\times\mathbb{R}^{s},\bm{\omega}=T(\mathbf{s},\mathbf{z})\}.

We say the graph {\rm gra}(T) of T is closed if for any sequence \{(\mathbf{s}^{k},\mathbf{z}^{k},\bm{\omega}^{k})\in{\rm gra}(T):k\in\mathbb{N}\} converging to (\mathbf{s},\mathbf{z},\bm{\omega}), there holds (\mathbf{s},\mathbf{z},\bm{\omega})\in{\rm gra}(T). Following [14], we also need the notion of Condition-\mathbf{M}. We say a set \mathcal{M}:=\{\mathbf{M}_{0},\mathbf{M}_{1},\mathbf{M}_{2}\} of s\times s matrices satisfies Condition-\mathbf{M} if the following three hypotheses hold: (i) \mathbf{M}_{0}=\mathbf{M}_{1}+\mathbf{M}_{2}; (ii) \mathbf{H}:=\mathbf{M}_{0}+\mathbf{M}_{2} is in \mathbb{S}^{s}_{+}; (iii) \big\|\mathbf{H}^{-1/2}\mathbf{M}_{2}\mathbf{H}^{-1/2}\big\|_{2}<1/2.

The next result, established in [14], demonstrates that if an operator T with closed graph is weakly firmly non-expansive with respect to a set \mathcal{M}:=\{\mathbf{M}_{0},\mathbf{M}_{1},\mathbf{M}_{2}\} of s\times s matrices satisfying Condition-\mathbf{M}, then the sequence \{\bm{\omega}^{k}:k\in\mathbb{N}\} generated by

𝝎k+1=T(𝝎k,𝝎k1)\bm{\omega}^{k+1}=T(\bm{\omega}^{k},\bm{\omega}^{k-1}) (75)

converges to a fixed point 𝝎\bm{\omega} of TT, that is, 𝝎=T(𝝎,𝝎)\bm{\omega}=T(\bm{\omega},\bm{\omega}).

Lemma A.1.

Suppose that a set :={𝐌0,𝐌1,𝐌2}\mathcal{M}:=\{\mathbf{M}_{0},\mathbf{M}_{1},\mathbf{M}_{2}\} of s×ss\times s matrices satisfies Condition-𝐌\mathbf{M}, the operator T:2ssT:\mathbb{R}^{2s}\to\mathbb{R}^{s} is weakly firmly non-expansive with respect to \mathcal{M}, the set of fixed-points of TT is nonempty and dom(T)=2s\mathrm{dom}(T)=\mathbb{R}^{2s}. If the sequence {𝛚k:k}\{\bm{\omega}^{k}:k\in\mathbb{N}\} is generated by (75) for any given 𝛚0\bm{\omega}^{0}, 𝛚1s\bm{\omega}^{1}\in\mathbb{R}^{s}, then {𝛚k:k}\{\bm{\omega}^{k}:k\in\mathbb{N}\} converges. Moreover, if the graph gra(T){\rm gra}(T) of TT is closed, then {𝛚k:k}\{\bm{\omega}^{k}:k\in\mathbb{N}\} converges to a fixed-point of TT.

Below, we establish the convergence of Algorithm (69) by employing Lemma A.1. For this purpose, we construct a new operator from operator 𝒫𝐄\mathcal{P}\circ\mathbf{E}. Let 𝐌0\mathbf{M}_{0}, 𝐌1\mathbf{M}_{1}, 𝐌2\mathbf{M}_{2} be (2pd+3n2r)×(2pd+3n2r)(2p_{d}+3n-2r)\times(2p_{d}+3n-2r) matrices defined by (63), (64) and (65), respectively. Associated with the set :={𝐌0,𝐌1,𝐌2}\mathcal{M}:=\{\mathbf{M}_{0},\mathbf{M}_{1},\mathbf{M}_{2}\}, we define an operator 𝒯:2pd+3n2r×2pd+3n2r2pd+3n2r\mathcal{T}_{\mathcal{M}}:\mathbb{R}^{2p_{d}+3n-2r}\times\mathbb{R}^{2p_{d}+3n-2r}\to\mathbb{R}^{2p_{d}+3n-2r} for any (𝐬,𝐳)2pd+3n2r×2pd+3n2r(\mathbf{s},\mathbf{z})\in\mathbb{R}^{2p_{d}+3n-2r}\times\mathbb{R}^{2p_{d}+3n-2r} by 𝝎:=𝒯(𝐬,𝐳)\bm{\omega}:=\mathcal{T}_{\mathcal{M}}(\mathbf{s},\mathbf{z}) with 𝝎\bm{\omega} satisfying

𝝎=𝒫((𝐄𝐑1𝐌0)𝝎+𝐑1𝐌1𝐬+𝐑1𝐌2𝐳).\bm{\omega}=\mathcal{P}((\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})\bm{\omega}+\mathbf{R}^{-1}\mathbf{M}_{1}\mathbf{s}+\mathbf{R}^{-1}\mathbf{M}_{2}\mathbf{z}). (76)

We note that the operator \mathcal{T}_{\mathcal{M}} is well-defined. To see this, we decompose any \mathbf{x}\in\mathbb{R}^{2p_{d}+3n-2r} as \mathbf{x}:=\begin{bmatrix}\mathbf{x}_{1}\\ \mathbf{x}_{2}\\ \mathbf{x}_{3}\end{bmatrix} with \mathbf{x}_{1},\mathbf{x}_{3}\in\mathbb{R}^{p_{d}+n-r} and \mathbf{x}_{2}\in\mathbb{R}^{n}. By using representations (66), (67) and (68), we rewrite equation (76) as

\left\{\begin{array}{l}\bm{\omega}_{1}=\mathrm{prox}_{\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{\prime}_{j},\mathbf{O}}\big(\mathbf{s}_{1}+(\theta-2)\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}\mathbf{s}_{2}+(\theta-2)\mathbf{O}^{-1}\mathbf{s}_{3}\\ \quad\qquad+(1-\theta)\mathbf{O}^{-1}(\mathbf{B}^{\prime})^{\top}\mathbf{z}_{2}+(1-\theta)\mathbf{O}^{-1}\mathbf{z}_{3}\big),\\ \bm{\omega}_{2}=\mathrm{prox}_{\bm{\psi}^{*},\mathbf{P}}\big(\mathbf{s}_{2}+(1+\theta)\mathbf{P}^{-1}\mathbf{B}^{\prime}\bm{\omega}_{1}-\theta\mathbf{P}^{-1}\mathbf{B}^{\prime}\mathbf{s}_{1}\big),\\ \bm{\omega}_{3}=\mathrm{prox}_{\iota_{\mathbb{M}}^{*},\mathbf{Q}}\big(\mathbf{s}_{3}+(1+\theta)\mathbf{Q}^{-1}\bm{\omega}_{1}-\theta\mathbf{Q}^{-1}\mathbf{s}_{1}\big).\end{array}\right. (77)

Clearly, for any (𝐬,𝐳)2pd+3n2r×2pd+3n2r(\mathbf{s},\mathbf{z})\in\mathbb{R}^{2p_{d}+3n-2r}\times\mathbb{R}^{2p_{d}+3n-2r}, there exists a unique 𝝎2pd+3n2r\bm{\omega}\in\mathbb{R}^{2p_{d}+3n-2r} satisfying equation (77).

It follows from equation (76) that the operator \mathcal{T}_{\mathcal{M}} has the same fixed-point set as \mathcal{P}\circ\mathbf{E}. With the help of the operator \mathcal{T}_{\mathcal{M}}, we represent equation (62) in the explicit form

𝐯k+1=𝒯(𝐯k,𝐯k1)\mathbf{v}^{k+1}=\mathcal{T}_{\mathcal{M}}(\mathbf{v}^{k},\mathbf{v}^{k-1}) (78)

and obtain a fixed-point of 𝒯\mathcal{T}_{\mathcal{M}} by this iteration. According to Lemma A.1, in order to obtain the convergence of iterative scheme (78), it suffices to prove that operator 𝒯\mathcal{T}_{\mathcal{M}} defined by (76) is weakly firmly non-expansive with respect to \mathcal{M} and the graph gra(𝒯){\rm gra}(\mathcal{T}_{\mathcal{M}}) of 𝒯\mathcal{T}_{\mathcal{M}} is closed, and in addition, the set \mathcal{M} satisfies Condition-𝐌\mathbf{M}. We first show the properties of operator 𝒯\mathcal{T}_{\mathcal{M}}. For this purpose, we introduce a (2pd+3n2r)×(2pd+3n2r)(2p_{d}+3n-2r)\times(2p_{d}+3n-2r) skew-symmetric matrix 𝐒𝐁\mathbf{S}_{\mathbf{B}} as

\mathbf{S}_{\mathbf{B}}:=\begin{bmatrix}\mathbf{0}&-(\mathbf{B}^{\prime})^{\top}&-\mathbf{I}_{p_{d}+n-r}\\ \mathbf{B}^{\prime}&\mathbf{0}&\mathbf{0}\\ \mathbf{I}_{p_{d}+n-r}&\mathbf{0}&\mathbf{0}\end{bmatrix}

and then represent the matrix \mathbf{E} as \mathbf{E}=\mathbf{I}_{2p_{d}+3n-2r}+\mathbf{R}^{-1}\mathbf{S}_{\mathbf{B}}.

Proposition A.2.

Let 𝐌0\mathbf{M}_{0}, 𝐌1\mathbf{M}_{1}, 𝐌2\mathbf{M}_{2} be (2pd+3n2r)×(2pd+3n2r)(2p_{d}+3n-2r)\times(2p_{d}+3n-2r) matrices defined by (63), (64) and (65), respectively, and :={𝐌0,𝐌1,𝐌2}\mathcal{M}:=\{\mathbf{M}_{0},\mathbf{M}_{1},\mathbf{M}_{2}\}. If 𝒯\mathcal{T}_{\mathcal{M}} is defined by equation (76), then 𝒯\mathcal{T}_{\mathcal{M}} is weakly firmly non-expansive with respect to \mathcal{M} and the graph gra(𝒯){\rm gra}(\mathcal{T}_{\mathcal{M}}) of 𝒯\mathcal{T}_{\mathcal{M}} is closed.

Proof.

We first prove that 𝒯\mathcal{T}_{\mathcal{M}} is weakly firmly non-expansive with respect to \mathcal{M}. It suffices to prove that for any (𝐬i,𝐳i,𝝎i)2pd+3n2r×2pd+3n2r×2pd+3n2r(\mathbf{s}^{i},\mathbf{z}^{i},\bm{\omega}^{i})\in\mathbb{R}^{2p_{d}+3n-2r}\times\mathbb{R}^{2p_{d}+3n-2r}\times\mathbb{R}^{2p_{d}+3n-2r} satisfying 𝝎i=𝒯(𝐬i,𝐳i)\bm{\omega}^{i}=\mathcal{T}_{\mathcal{M}}(\mathbf{s}^{i},\mathbf{z}^{i}) for i=1,2,i=1,2, there holds equation (74). It follows from definition (76) of 𝒯\mathcal{T}_{\mathcal{M}} that

𝝎i=𝒫((𝐄𝐑1𝐌0)𝝎i+𝐑1𝐌1𝐬i+𝐑1𝐌2𝐳i),i=1,2.\bm{\omega}^{i}=\mathcal{P}\big{(}(\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})\bm{\omega}^{i}+\mathbf{R}^{-1}\mathbf{M}_{1}\mathbf{s}^{i}+\mathbf{R}^{-1}\mathbf{M}_{2}\mathbf{z}^{i}\big{)},\ i=1,2. (79)

By arguments similar to those used in the proof of Lemma 3.1 of [14], we have that operator 𝒫\mathcal{P} is firmly non-expansive with respect to 𝐑\mathbf{R}, that is, for all 𝐱\mathbf{x}, 𝐲2pd+3n2r\mathbf{y}\in\mathbb{R}^{2p_{d}+3n-2r},

𝒫𝐱𝒫𝐲𝐑2𝒫𝐱𝒫𝐲,𝐱𝐲𝐑.\|\mathcal{P}\mathbf{x}-\mathcal{P}\mathbf{y}\|^{2}_{\mathbf{R}}\leq\langle\mathcal{P}\mathbf{x}-\mathcal{P}\mathbf{y},\mathbf{x}-\mathbf{y}\rangle_{\mathbf{R}}.

As a result, we get by equation (79) that

𝝎2𝝎1𝐑2𝝎2𝝎1,(𝐑𝐄𝐌0)(𝝎2𝝎1)+𝐌1(𝐬2𝐬1)+𝐌2(𝐳2𝐳1).\|\bm{\omega}^{2}-\bm{\omega}^{1}\|^{2}_{\mathbf{R}}\leq\big{\langle}\bm{\omega}^{2}-\bm{\omega}^{1},(\mathbf{R}\mathbf{E}-\mathbf{M}_{0})(\bm{\omega}^{2}-\bm{\omega}^{1})+\mathbf{M}_{1}(\mathbf{s}^{2}-\mathbf{s}^{1})+\mathbf{M}_{2}(\mathbf{z}^{2}-\mathbf{z}^{1})\big{\rangle}.

Substituting 𝐑𝐄=𝐑+𝐒𝐁\mathbf{R}\mathbf{E}=\mathbf{R}+\mathbf{S}_{\mathbf{B}} into the right hand side of the inequality above, we obtain that

𝝎2𝝎1,𝐌0(𝝎2𝝎1)𝝎2𝝎1,𝐒𝐁(𝝎2𝝎1)+𝐌1(𝐬2𝐬1)+𝐌2(𝐳2𝐳1).\big{\langle}\bm{\omega}^{2}-\bm{\omega}^{1},\mathbf{M}_{0}(\bm{\omega}^{2}-\bm{\omega}^{1})\big{\rangle}\leq\big{\langle}\bm{\omega}^{2}-\bm{\omega}^{1},\mathbf{S}_{\mathbf{B}}(\bm{\omega}^{2}-\bm{\omega}^{1})+\mathbf{M}_{1}(\mathbf{s}^{2}-\mathbf{s}^{1})+\mathbf{M}_{2}(\mathbf{z}^{2}-\mathbf{z}^{1})\big{\rangle}.

This together with the fact that 𝝎2𝝎1,𝐒𝐁(𝝎2𝝎1)=0\langle\bm{\omega}^{2}-\bm{\omega}^{1},\mathbf{S}_{\mathbf{B}}(\bm{\omega}^{2}-\bm{\omega}^{1})\rangle=0 leads to equation (74). Consequently, we conclude that 𝒯\mathcal{T}_{\mathcal{M}} is weakly firmly non-expansive with respect to \mathcal{M}.

It remains to show the closedness of the graph of operator 𝒯\mathcal{T}_{\mathcal{M}}. For any sequence {(𝐬k,𝐳k,𝝎k)gra(𝒯):k}\big{\{}(\mathbf{s}^{k},\mathbf{z}^{k},\bm{\omega}^{k})\in{\rm gra}(\mathcal{T}_{\mathcal{M}}):k\in\mathbb{N}\big{\}} converging to (𝐬,𝐳,𝝎)2pd+3n2r×2pd+3n2r×2pd+3n2r(\mathbf{s},\mathbf{z},\bm{\omega})\in\mathbb{R}^{2p_{d}+3n-2r}\times\mathbb{R}^{2p_{d}+3n-2r}\times\mathbb{R}^{2p_{d}+3n-2r}, we obtain from definition (76) of 𝒯\mathcal{T}_{\mathcal{M}} that

𝝎k=𝒫((𝐄𝐑1𝐌0)𝝎k+𝐑1𝐌1𝐬k+𝐑1𝐌2𝐳k).\bm{\omega}^{k}=\mathcal{P}((\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})\bm{\omega}^{k}+\mathbf{R}^{-1}\mathbf{M}_{1}\mathbf{s}^{k}+\mathbf{R}^{-1}\mathbf{M}_{2}\mathbf{z}^{k}).

Associated with (𝐬,𝐳,𝝎)(\mathbf{s},\mathbf{z},\bm{\omega}), we introduce a vector 𝝎~2pd+3n2r\widetilde{\bm{\omega}}\in\mathbb{R}^{2p_{d}+3n-2r} as

𝝎~:=𝒫((𝐄𝐑1𝐌0)𝝎+𝐑1𝐌1𝐬+𝐑1𝐌2𝐳).\widetilde{\bm{\omega}}:=\mathcal{P}((\mathbf{E}-\mathbf{R}^{-1}\mathbf{M}_{0})\bm{\omega}+\mathbf{R}^{-1}\mathbf{M}_{1}\mathbf{s}+\mathbf{R}^{-1}\mathbf{M}_{2}\mathbf{z}).

Since 𝒫\mathcal{P} is firmly non-expansive with respect 𝐑\mathbf{R}, there holds for any kk\in\mathbb{N}

\|\bm{\omega}^{k}-\widetilde{\bm{\omega}}\|^{2}_{\mathbf{R}}\leq\big\langle\bm{\omega}^{k}-\widetilde{\bm{\omega}},(\mathbf{R}\mathbf{E}-\mathbf{M}_{0})(\bm{\omega}^{k}-\bm{\omega})+\mathbf{M}_{1}(\mathbf{s}^{k}-\mathbf{s})+\mathbf{M}_{2}(\mathbf{z}^{k}-\mathbf{z})\big\rangle.

By letting k\rightarrow+\infty and noting that \mathbf{s}^{k}\rightarrow\mathbf{s}, \mathbf{z}^{k}\rightarrow\mathbf{z} and \bm{\omega}^{k}\rightarrow\bm{\omega} as k\rightarrow+\infty, we get \lim_{k\to+\infty}\bm{\omega}^{k}=\widetilde{\bm{\omega}}. As a result, \bm{\omega}=\widetilde{\bm{\omega}}. This together with the definition of \widetilde{\bm{\omega}} leads directly to \bm{\omega}=\mathcal{T}_{\mathcal{M}}(\mathbf{s},\mathbf{z}). Therefore, the graph {\rm gra}(\mathcal{T}_{\mathcal{M}}) of \mathcal{T}_{\mathcal{M}} is closed. ∎

The next proposition reveals that the set \mathcal{M} satisfies Condition-𝐌\mathbf{M}.

Proposition A.3.

Let 𝐌0\mathbf{M}_{0}, 𝐌1\mathbf{M}_{1}, 𝐌2\mathbf{M}_{2} be (2pd+3n2r)×(2pd+3n2r)(2p_{d}+3n-2r)\times(2p_{d}+3n-2r) matrices defined by (63), (64) and (65), respectively. If conditions (71) and (72) are satisfied, then the set :={𝐌0\mathcal{M}:=\{\mathbf{M}_{0}, 𝐌1\mathbf{M}_{1}, 𝐌2}\mathbf{M}_{2}\} satisfies Condition-𝐌\mathbf{M}.

Proof.

It is clear that 𝐌0=𝐌1+𝐌2\mathbf{M}_{0}=\mathbf{M}_{1}+\mathbf{M}_{2}, that is, Item (i) of Condition-𝐌\mathbf{M} holds. To show the validity of Item (ii), we set 𝐇:=𝐌0+𝐌2\mathbf{H}:=\mathbf{M}_{0}+\mathbf{M}_{2}. By introducing block matrices 𝐅\mathbf{F} and 𝐆\mathbf{G} as in (70), we represent 𝐇\mathbf{H} as

\mathbf{H}=\begin{bmatrix}\mathbf{O}&-\theta\mathbf{G}^{\top}\\ -\theta\mathbf{G}&\mathbf{F}\end{bmatrix}.

It follows from condition (71) that

\big\|\mathbf{F}^{-\frac{1}{2}}(-\theta\mathbf{G})\mathbf{O}^{-\frac{1}{2}}\big\|_{2}<1, (80)

which, by Lemma 6.2 in [14], is equivalent to \mathbf{H}\in\mathbb{S}_{+}^{2p_{d}+3n-2r}. Hence, Item (ii) of Condition-\mathbf{M} holds.

It remains to verify Item (iii) of Condition-\mathbf{M}. Lemma 6.2 in [14] ensures that if inequality (80) holds, then the norm of the matrix \mathbf{H}^{-1} can be estimated by

𝐇12max{𝐎12,𝐅12}1|θ|𝐅12𝐆𝐎122.\|\mathbf{H}^{-1}\|_{2}\leq\frac{\max\big{\{}\|\mathbf{O}^{-1}\|_{2},\left\|\mathbf{F}^{-1}\right\|_{2}\big{\}}}{1-|\theta|\big{\|}\mathbf{F}^{-\frac{1}{2}}\mathbf{G}\mathbf{O}^{-\frac{1}{2}}\big{\|}_{2}}. (81)

We observe that

\mathbf{M}_{2}^{\top}\mathbf{M}_{2}=(1-\theta)^{2}\begin{bmatrix}\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{G}\mathbf{G}^{\top}\end{bmatrix},

which yields that

𝐌22=|1θ|𝐆2.\|\mathbf{M}_{2}\|_{2}=|1-\theta|\|\mathbf{G}\|_{2}. (82)

It follows that

𝐇1/2𝐌2𝐇1/22𝐇1/222𝐌22.\big{\|}\mathbf{H}^{-1/2}\mathbf{M}_{2}\mathbf{H}^{-1/2}\big{\|}_{2}\leq\big{\|}\mathbf{H}^{-1/2}\big{\|}_{2}^{2}\big{\|}\mathbf{M}_{2}\big{\|}_{2}. (83)

Substituting inequality (81) and equation (82) into inequality (83) yields that

𝐇1/2𝐌2𝐇1/22|1θ|𝐆2max{𝐎12,𝐅12}1|θ|𝐅12𝐆𝐎122,\big{\|}\mathbf{H}^{-1/2}\mathbf{M}_{2}\mathbf{H}^{-1/2}\big{\|}_{2}\leq\frac{|1-\theta|\|\mathbf{G}\|_{2}\max\big{\{}\|\mathbf{O}^{-1}\|_{2},\left\|\mathbf{F}^{-1}\right\|_{2}\big{\}}}{1-|\theta|\big{\|}\mathbf{F}^{-\frac{1}{2}}\mathbf{G}\mathbf{O}^{-\frac{1}{2}}\big{\|}_{2}},

which together with condition (72) further leads to Item (iii) of Condition-𝐌\mathbf{M}. ∎

Combining Lemma A.1 with Propositions A.2 and A.3, we are now ready to present the proof of Theorem 5.5.

Proof of Theorem 5.5:   Proposition A.2 ensures that \mathcal{T}_{\mathcal{M}} is weakly firmly non-expansive with respect to \mathcal{M} and that the graph {\rm gra}(\mathcal{T}_{\mathcal{M}}) of \mathcal{T}_{\mathcal{M}} is closed. Moreover, Proposition A.3 guarantees that the set \mathcal{M} satisfies Condition-\mathbf{M}. Hence, the hypotheses of Lemma A.1 are satisfied. By Lemma A.1, the sequence \{\mathbf{v}^{k}:k\in\mathbb{N}\}, generated by (78) for any given \mathbf{v}^{0}, \mathbf{v}^{1}\in\mathbb{R}^{2p_{d}+3n-2r}, converges to a fixed-point of \mathcal{T}_{\mathcal{M}}. Note that Algorithm (69) has the equivalent form (78) and that the operator \mathcal{T}_{\mathcal{M}} has the same fixed-point set as \mathcal{P}\circ\mathbf{E}. Consequently, we conclude that the sequence \{\mathbf{v}^{k}:k\in\mathbb{N}\}, generated by Algorithm (69) for any given \mathbf{v}^{0},\mathbf{v}^{1}\in\mathbb{R}^{2p_{d}+3n-2r}, converges to a fixed-point of the operator \mathcal{P}\circ\mathbf{E}. ∎

Appendix B The closed-form formulas of proximity operators

In this appendix, we provide the closed-form formulas for the proximity operators corresponding to the functions utilized in Algorithm (69).

As in the numerical experiments, the real number \theta and the matrices \mathbf{O}, \mathbf{P}, \mathbf{Q} are chosen as

θ:=1,𝐎:=1α𝐈pd+nr,𝐏:=1ρ𝐈n,𝐐:=1β𝐈pd+nr\theta:=1,\ \mathbf{O}:=\frac{1}{\alpha}\mathbf{I}_{p_{d}+n-r},\ \mathbf{P}:=\frac{1}{\rho}\mathbf{I}_{n},\ \mathbf{Q}:=\frac{1}{\beta}\mathbf{I}_{p_{d}+n-r}

with α\alpha, ρ\rho and β\beta being positive real numbers. It is known [3] that for any convex function ff from s\mathbb{R}^{s} to ¯\overline{\mathbb{R}} and μ>0\mu>0

proxf,μ𝐈s=prox1μfandprox1μf=1μ(𝐈sproxμf)(μ𝐈s).\mathrm{prox}_{f^{*},\mu\mathbf{I}_{s}}=\mathrm{prox}_{\frac{1}{\mu}f^{*}}\ \ \text{and}\ \ \mathrm{prox}_{\frac{1}{\mu}f^{*}}=\frac{1}{\mu}(\mathbf{I}_{s}-\mathrm{prox}_{\mu f})\circ(\mu\mathbf{I}_{s}).

Accordingly, Algorithm (69) reduces to

{𝐰k+1=proxαjdλj1𝐈j(𝐰kα(𝐁)𝐚kα𝐜k),𝐚k+1=ρ(𝐈nprox1ρ𝝍)(1ρ𝐚k+𝐁(2𝐰k+1𝐰k)),𝐜k+1=β(𝐈pd+nrprox1βι𝕄)(1β𝐜k+2𝐰k+1𝐰k).\left\{\begin{array}[]{l}\mathbf{w}^{k+1}=\mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j}}\big{(}\mathbf{w}^{k}-\alpha(\mathbf{B}^{\prime})^{\top}\mathbf{a}^{k}-\alpha\mathbf{c}^{k}\big{)},\\ \ \mathbf{a}^{k+1}=\rho\big{(}\mathbf{I}_{n}-\mathrm{prox}_{\frac{1}{\rho}\bm{\psi}}\big{)}\big{(}\frac{1}{\rho}\mathbf{a}^{k}+\mathbf{B}^{\prime}(2\mathbf{w}^{k+1}-\mathbf{w}^{k})\big{)},\\ \ \mathbf{c}^{k+1}=\beta\big{(}\mathbf{I}_{p_{d}+n-r}-\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}}\big{)}\big{(}\frac{1}{\beta}\mathbf{c}^{k}+2\mathbf{w}^{k+1}-\mathbf{w}^{k}\big{)}.\end{array}\right. (84)
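To make the structure of iteration (84) concrete, the following sketch performs one pass of the three updates on NumPy arrays; prox_l1_blocks, prox_psi and proj_M are assumed to implement the proximity operators given below in (85), (93) (or (94)) and (91), and Bp stands for \mathbf{B}^{\prime}. The names are ours.

```python
def iterate_84(w, a, c, Bp, alpha, rho, beta, prox_l1_blocks, prox_psi, proj_M):
    # One step of (84); the three callables are the proximity operators.
    w_new = prox_l1_blocks(w - alpha * (Bp.T @ a) - alpha * c)
    t = a / rho + Bp @ (2.0 * w_new - w)
    a_new = rho * (t - prox_psi(t))          # rho (I - prox_{(1/rho) psi})(t)
    s = c / beta + 2.0 * w_new - w
    c_new = beta * (s - proj_M(s))           # beta (I - prox_{(1/beta) iota_M})(s)
    return w_new, a_new, c_new
```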

We now compute the three proximity operators in (84) explicitly. We begin with the proximity operator \mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{\prime}_{j}} with \alpha>0, \lambda_{j}>0, j\in\mathbb{N}_{d}, and \mathbf{I}^{\prime}_{j}, j\in\mathbb{N}_{d}, defined by (12).

Proposition B.1.

If \alpha>0, \lambda_{j}>0, j\in\mathbb{N}_{d}, and \mathbf{I}^{\prime}_{j}, j\in\mathbb{N}_{d}, are defined by (12), then the proximity operator \mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{\prime}_{j}} at \begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}\in\mathbb{R}^{p_{d}+n-r} with \mathbf{z}:=[z_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}} and \mathbf{v}\in\mathbb{R}^{n-r} has the form

proxαjdλj1𝐈j([𝐳𝐯]):=[𝝁𝐯]with𝝁:=[μk:kpd]pd,\mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j}}\left({\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}}\right):={\scriptsize{\begin{bmatrix}\bm{\mu}\\ \mathbf{v}\end{bmatrix}}}\ \mbox{with}\ \bm{\mu}:=[\mu_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}}, (85)

where for each jdj\in\mathbb{N}_{d} and imji\in\mathbb{N}_{m_{j}}

μpj1+i={zpj1+iαλj,ifzpj1+i>αλj,zpj1+i+αλj,ifzpj1+i<αλj,0,ifzpj1+i[αλj,αλj].\mu_{p_{j-1}+i}=\left\{\begin{array}[]{ll}z_{p_{j-1}+i}-\alpha\lambda_{j},&\mathrm{if}\ z_{p_{j-1}+i}>\alpha\lambda_{j},\\ z_{p_{j-1}+i}+\alpha\lambda_{j},&\mathrm{if}\ z_{p_{j-1}+i}<-\alpha\lambda_{j},\\ 0,&\mathrm{if}\ z_{p_{j-1}+i}\in[-\alpha\lambda_{j},\alpha\lambda_{j}].\end{array}\right. (86)
Proof.

For each 𝐳:=[zk:kpd]pd\mathbf{z}:=[z_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}} and each 𝐯nr\mathbf{v}\in\mathbb{R}^{n-r}, we set proxαjdλj1𝐈j([𝐳𝐯]):=[𝝁𝝂]\mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|\cdot\|_{1}\circ\mathbf{I}^{{}^{\prime}}_{j}}\left({\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}}\right):={\scriptsize{\begin{bmatrix}\bm{\mu}\\ \bm{\nu}\end{bmatrix}}} with 𝝁:=[μk:kpd]pd\bm{\mu}:=[\mu_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}} and 𝝂nr\bm{\nu}\in\mathbb{R}^{n-r}. It follows from definition (53) of the proximity operator that

[𝝁𝝂]=argmin{12[𝐳~𝐯~][𝐳𝐯]22+αjdλj𝐈j[𝐳~𝐯~]1:[𝐳~𝐯~]pd+nr}{\scriptsize{\begin{bmatrix}\bm{\mu}\\ \bm{\nu}\end{bmatrix}}}=\operatorname*{argmin}\left\{\frac{1}{2}\left\|\scriptsize{\begin{bmatrix}\widetilde{\mathbf{z}}\\ \widetilde{\mathbf{v}}\end{bmatrix}}-\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}\right\|_{2}^{2}+\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\left\|\mathbf{I}^{{}^{\prime}}_{j}\scriptsize{\begin{bmatrix}\widetilde{\mathbf{z}}\\ \widetilde{\mathbf{v}}\end{bmatrix}}\right\|_{1}:\scriptsize{\begin{bmatrix}\widetilde{\mathbf{z}}\\ \widetilde{\mathbf{v}}\end{bmatrix}}\in\mathbb{R}^{p_{d}+n-r}\right\}

which further leads to

[𝝁𝝂]=argmin{jd(12𝐳~j𝐳j22+αλj𝐳~j1)+12𝐯~𝐯22:[𝐳~𝐯~]pd+nr}.{\scriptsize{\begin{bmatrix}\bm{\mu}\\ \bm{\nu}\end{bmatrix}}}=\operatorname*{argmin}\left\{\sum_{j\in\mathbb{N}_{d}}\left(\frac{1}{2}\|\widetilde{\mathbf{z}}_{j}-\mathbf{z}_{j}\|_{2}^{2}+\alpha\lambda_{j}\|\widetilde{\mathbf{z}}_{j}\|_{1}\right)+\frac{1}{2}\|\widetilde{\mathbf{v}}-\mathbf{v}\|_{2}^{2}:\scriptsize{\begin{bmatrix}\widetilde{\mathbf{z}}\\ \widetilde{\mathbf{v}}\end{bmatrix}}\in\mathbb{R}^{p_{d}+n-r}\right\}.

As a result, we have that

𝝁j=argmin{12𝐳~j𝐳j22+αλj𝐳~j1:𝐳~jmj},jd,\bm{\mu}_{j}=\operatorname*{argmin}\left\{\frac{1}{2}\|\widetilde{\mathbf{z}}_{j}-\mathbf{z}_{j}\|_{2}^{2}+\alpha\lambda_{j}\|\widetilde{\mathbf{z}}_{j}\|_{1}:\widetilde{\mathbf{z}}_{j}\in\mathbb{R}^{m_{j}}\right\},\ j\in\mathbb{N}_{d}, (87)

and

𝝂=argmin{12𝐯~𝐯22:𝐯~nr}.\bm{\nu}=\operatorname*{argmin}\left\{\frac{1}{2}\|\widetilde{\mathbf{v}}-\mathbf{v}\|_{2}^{2}:\widetilde{\mathbf{v}}\in\mathbb{R}^{n-r}\right\}. (88)

According to Examples 2.3 and 2.4 in [20] and noting that 𝝁j:=[μpj1+i:imj]\bm{\mu}_{j}:=[\mu_{p_{j-1}+i}:i\in\mathbb{N}_{m_{j}}] and 𝐳j:=[zpj1+i:imj]\mathbf{z}_{j}:=[z_{p_{j-1}+i}:i\in\mathbb{N}_{m_{j}}], jdj\in\mathbb{N}_{d}, we obtain from equation (87) that

μpj1+i=max{|zpj1+i|αλj,0}sign(zpj1+i),jd,imj.\mu_{p_{j-1}+i}=\max\{|z_{p_{j-1}+i}|-\alpha\lambda_{j},0\}{\rm sign}(z_{p_{j-1}+i}),\ j\in\mathbb{N}_{d},\ i\in\mathbb{N}_{m_{j}}.

That is, equation (85) holds. Moreover, equation (88) leads directly to 𝝂=𝐯\bm{\nu}=\mathbf{v}. ∎
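Formulas (85)–(86) amount to block-wise soft thresholding of the \mathbf{z}-part at levels \alpha\lambda_{j}, with the \mathbf{v}-part left unchanged. A minimal sketch; the function names and the list-of-blocks representation are ours.

```python
import numpy as np

def soft_threshold(z, tau):
    # Component-wise soft thresholding with threshold tau >= 0; cf. (86).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_weighted_l1(z_blocks, v, alpha, lambdas):
    # Evaluates (85): block j of z is thresholded at level alpha * lambda_j,
    # while the trailing block v is returned unchanged.
    mu = [soft_threshold(zj, alpha * lj) for zj, lj in zip(z_blocks, lambdas)]
    return mu, v
```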

We also consider the special case that n=rn=r. In this case, we present a closed-form formula for the proximity operator proxαjdλj()j1\mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|(\cdot)_{j}\|_{1}} with λj>0\lambda_{j}>0, jdj\in\mathbb{N}_{d} and α>0\alpha>0.

Corollary B.2.

If λj>0\lambda_{j}>0, jdj\in\mathbb{N}_{d} and α>0\alpha>0, then the proximity operator proxαjdλj()j1\mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|(\cdot)_{j}\|_{1}} at 𝐳:=[zk:kpd]pd\mathbf{z}:=[z_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}} has the form

proxαjdλj()j1(𝐳):=𝝁with𝝁:=[μk:kpd]pd,\mathrm{prox}_{\alpha\sum_{j\in\mathbb{N}_{d}}\lambda_{j}\|(\cdot)_{j}\|_{1}}(\mathbf{z}):=\bm{\mu}\ \mbox{with}\ \bm{\mu}:=[\mu_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}},

where for each jdj\in\mathbb{N}_{d} and imji\in\mathbb{N}_{m_{j}}, μpj1+i\mu_{p_{j-1}+i} is defined by (86).

The next closed-form formula concerns the proximity operator prox1βι𝕄\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}} with 𝕄:=(𝐁)×nr\mathbb{M}:=\mathcal{R}(\mathbf{B})\times\mathbb{R}^{n-r} and β>0\beta>0. For a matrix 𝐌\mathbf{M}, we denote by 𝐌\mathbf{M}^{{\dagger}} the pseudoinverse of 𝐌\mathbf{M}.

Proposition B.3.

If 𝕄:=(𝐁)×nr\mathbb{M}:=\mathcal{R}(\mathbf{B})\times\mathbb{R}^{n-r} and β>0\beta>0, then the proximity operator prox1βι𝕄\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}} at [𝐳𝐯]pd+nr\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}\in\mathbb{R}^{p_{d}+n-r} with 𝐳:=[zk:kpd]pd\mathbf{z}:=[z_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}} and 𝐯nr\mathbf{v}\in\mathbb{R}^{n-r} has the form

prox1βι𝕄([𝐳𝐯])=[𝐁(𝐁𝐁)𝐁𝐳𝐯].\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}}\left(\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}\right)=\scriptsize{\begin{bmatrix}\mathbf{B}(\mathbf{B}^{\top}\mathbf{B})^{{\dagger}}\mathbf{B}^{\top}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}.
Proof.

According to definition (53) of the proximity operator, we obtain for each 𝐳pd\mathbf{z}\in\mathbb{R}^{p_{d}} and each 𝐯nr\mathbf{v}\in\mathbb{R}^{n-r} that prox1βι𝕄([𝐳𝐯]):=[𝝁𝝂]\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}}\left({\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}}\right):={\scriptsize{\begin{bmatrix}\bm{\mu}\\ \bm{\nu}\end{bmatrix}}} with 𝝁:=[μk:kpd]pd\bm{\mu}:=[\mu_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}} and 𝝂nr\bm{\nu}\in\mathbb{R}^{n-r}, where

[𝝁𝝂]=argmin{12[𝐳~𝐯~][𝐳𝐯]22+1βι𝕄([𝐳~𝐯~]):[𝐳~𝐯~]pd+nr}.{\scriptsize{\begin{bmatrix}\bm{\mu}\\ \bm{\nu}\end{bmatrix}}}=\operatorname*{argmin}\left\{\frac{1}{2}\left\|\scriptsize{\begin{bmatrix}\widetilde{\mathbf{z}}\\ \widetilde{\mathbf{v}}\end{bmatrix}}-\scriptsize{\begin{bmatrix}\mathbf{z}\\ \mathbf{v}\end{bmatrix}}\right\|_{2}^{2}+\frac{1}{\beta}\iota_{\mathbb{M}}\left({\scriptsize{\begin{bmatrix}\widetilde{\mathbf{z}}\\ \widetilde{\mathbf{v}}\end{bmatrix}}}\right):\scriptsize{\begin{bmatrix}\widetilde{\mathbf{z}}\\ \widetilde{\mathbf{v}}\end{bmatrix}}\in\mathbb{R}^{p_{d}+n-r}\right\}.

By noting that 𝕄:=(𝐁)×nr\mathbb{M}:=\mathcal{R}(\mathbf{B})\times\mathbb{R}^{n-r}, we rewrite the above equation as

𝝁=argmin{12𝐳~𝐳22:𝐳~(𝐁)}\bm{\mu}=\operatorname*{argmin}\left\{\frac{1}{2}\|\widetilde{\mathbf{z}}-\mathbf{z}\|_{2}^{2}:\widetilde{\mathbf{z}}\in\mathcal{R}(\mathbf{B})\right\} (89)

and

𝝂=argmin{12𝐯~𝐯22:𝐯~nr}.\bm{\nu}=\operatorname*{argmin}\left\{\frac{1}{2}\|\widetilde{\mathbf{v}}-\mathbf{v}\|_{2}^{2}:\widetilde{\mathbf{v}}\in\mathbb{R}^{n-r}\right\}. (90)

Equation (89) shows that 𝝁\bm{\mu} is the best approximation to 𝐳\mathbf{z} from the subspace (𝐁)\mathcal{R}(\mathbf{B}). Hence, we get that 𝝁=𝐁𝐱\bm{\mu}=\mathbf{B}\mathbf{x} with vector 𝐱n\mathbf{x}\in\mathbb{R}^{n} satisfying

(𝐁𝐲)(𝐳𝐁𝐱)=0,for all𝐲n.(\mathbf{B}\mathbf{y})^{\top}(\mathbf{z}-\mathbf{B}\mathbf{x})=0,\ \mbox{for all}\ \mathbf{y}\in\mathbb{R}^{n}.

By rewriting the above equation as

𝐲𝐁(𝐳𝐁𝐱)=0,for all𝐲n,\mathbf{y}^{\top}\mathbf{B}^{\top}(\mathbf{z}-\mathbf{B}\mathbf{x})=0,\ \mbox{for all}\ \mathbf{y}\in\mathbb{R}^{n},

we have that vector 𝐱\mathbf{x} is a solution of the linear system

𝐁𝐁𝐱=𝐁𝐳.\mathbf{B}^{\top}\mathbf{B}\mathbf{x}=\mathbf{B}^{\top}\mathbf{z}.

By using the pseudoinverse of 𝐁𝐁\mathbf{B}^{\top}\mathbf{B}, we represent 𝐱\mathbf{x} as

𝐱=(𝐁𝐁)𝐁𝐳+𝐱0,\mathbf{x}=(\mathbf{B}^{\top}\mathbf{B})^{{\dagger}}\mathbf{B}^{\top}\mathbf{z}+\mathbf{x}_{0},

where \mathbf{x}_{0}\in\mathbb{R}^{n} satisfies \mathbf{B}^{\top}\mathbf{B}\mathbf{x}_{0}=0. Note that \mathbf{B}^{\top}\mathbf{B}\mathbf{x}_{0}=0 if and only if \mathbf{B}\mathbf{x}_{0}=0. Thus, \bm{\mu}=\mathbf{B}\mathbf{x}=\mathbf{B}(\mathbf{B}^{\top}\mathbf{B})^{{\dagger}}\mathbf{B}^{\top}\mathbf{z}. Moreover, we obtain directly from equation (90) that \bm{\nu}=\mathbf{v}. ∎

In the case when n=rn=r, the subspace 𝕄\mathbb{M} reduces to (𝐁)\mathcal{R}(\mathbf{B}). Since 𝐁\mathbf{B} has full column rank, the matrix 𝐁𝐁\mathbf{B}^{\top}\mathbf{B} is nonsingular. As a direct consequence of Proposition B.3, we describe the closed-form formula of the proximity operator prox1βι𝕄\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}} for this special case as follows.

Corollary B.4.

If matrix 𝐁\mathbf{B} has full column rank, 𝕄:=(𝐁)\mathbb{M}:=\mathcal{R}(\mathbf{B}) and β>0\beta>0, then the proximity operator prox1βι𝕄\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}} at 𝐳:=[zk:kpd]pd\mathbf{z}:=[z_{k}:k\in\mathbb{N}_{p_{d}}]\in\mathbb{R}^{p_{d}} has the form

prox1βι𝕄(𝐳)=𝐁(𝐁𝐁)1𝐁𝐳.\mathrm{prox}_{\frac{1}{\beta}\iota_{\mathbb{M}}}(\mathbf{z})=\mathbf{B}(\mathbf{B}^{\top}\mathbf{B})^{-1}\mathbf{B}^{\top}\mathbf{z}. (91)
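The projections in Proposition B.3 and Corollary B.4 can be evaluated stably without forming (\mathbf{B}^{\top}\mathbf{B})^{{\dagger}} explicitly, since \mathbf{B}(\mathbf{B}^{\top}\mathbf{B})^{{\dagger}}\mathbf{B}^{\top}\mathbf{z} is the orthogonal projection of \mathbf{z} onto \mathcal{R}(\mathbf{B}). A sketch via a least-squares solve; the function name is ours.

```python
import numpy as np

def project_onto_range(B, z):
    # Least-squares solve: x minimizes ||B x - z||_2, so B @ x equals
    # B (B^T B)^† B^T z, the projection of z onto range(B); cf. (91).
    x, *_ = np.linalg.lstsq(B, z, rcond=None)
    return B @ x
```

Since \mathbf{B} is fixed during the iteration, a QR factorization of \mathbf{B} may be precomputed once and reused at every step.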

Finally, we give closed-form formulas for the proximity operators of some loss functions, which will be used in numerical experiments. The first loss function is the 2\ell_{2} norm composed with a matrix.

Proposition B.5.

If 𝐀\mathbf{A} is an m×nm\times n matrix, 𝐱m\mathbf{x}\in\mathbb{R}^{m} and ρ>0\rho>0, then the proximity operator prox12ρ𝐀𝐱22\mathrm{prox}_{\frac{1}{2\rho}\|\mathbf{A}\cdot-\mathbf{x}\|_{2}^{2}} at 𝐳n\mathbf{z}\in\mathbb{R}^{n} has the form

prox12ρ𝐀𝐱22(𝐳)=(ρ𝐈n+𝐀𝐀)1(ρ𝐳+𝐀𝐱).\mathrm{prox}_{\frac{1}{2\rho}\|\mathbf{A}\cdot-\mathbf{x}\|_{2}^{2}}(\mathbf{z})=\left(\rho\mathbf{I}_{n}+\mathbf{A}^{\top}\mathbf{A}\right)^{-1}\left(\rho\mathbf{z}+\mathbf{A}^{\top}\mathbf{x}\right).
Proof.

By setting prox12ρ𝐀𝐱22(𝐳):=𝝁\mathrm{prox}_{\frac{1}{2\rho}\|\mathbf{A}\cdot-\mathbf{x}\|_{2}^{2}}(\mathbf{z}):=\bm{\mu}, we obtain from definition (53) of the proximity operator that

𝝁=argmin{12𝐮𝐳22+12ρ𝐀𝐮𝐱22:𝐮n},\bm{\mu}=\operatorname*{argmin}\left\{\frac{1}{2}\|\mathbf{u}-\mathbf{z}\|_{2}^{2}+\frac{1}{2\rho}\|\mathbf{A}\mathbf{u}-\mathbf{x}\|_{2}^{2}:\mathbf{u}\in\mathbb{R}^{n}\right\},

which together with the Fermat rule leads to

𝝁𝐳+1ρ𝐀(𝐀𝝁𝐱)=𝟎.\bm{\mu}-\mathbf{z}+\frac{1}{\rho}\mathbf{A}^{\top}(\mathbf{A}\bm{\mu}-\mathbf{x})=\mathbf{0}.

By rewriting the above equation as

(ρ𝐈n+𝐀𝐀)𝝁=ρ𝐳+𝐀𝐱\left(\rho\mathbf{I}_{n}+\mathbf{A}^{\top}\mathbf{A}\right)\bm{\mu}=\rho\mathbf{z}+\mathbf{A}^{\top}\mathbf{x}

and noting that ρ𝐈n+𝐀𝐀\rho\mathbf{I}_{n}+\mathbf{A}^{\top}\mathbf{A} is nonsingular, we obtain that

𝝁=(ρ𝐈n+𝐀𝐀)1(ρ𝐳+𝐀𝐱),\bm{\mu}=\left(\rho\mathbf{I}_{n}+\mathbf{A}^{\top}\mathbf{A}\right)^{-1}\left(\rho\mathbf{z}+\mathbf{A}^{\top}\mathbf{x}\right),

which completes the proof of this proposition. ∎

Two special cases of Proposition B.5 are used in the numerical experiments. In Subsection 6.1, the loss function is chosen as \Psi:=\frac{1}{2}\|\cdot-\mathbf{x}\|_{2}^{2}. According to Proposition B.5, the proximity operator \mathrm{prox}_{\frac{1}{\rho}\Psi} at \mathbf{z}:=[z_{j}:j\in\mathbb{N}_{n}]\in\mathbb{R}^{n} has the form \mathrm{prox}_{\frac{1}{\rho}\Psi}(\mathbf{z}):=\mathbf{w} with \mathbf{w}:=[w_{j}:j\in\mathbb{N}_{n}], where

wj:=(ρzj+xj)/(ρ+1),for alljn.w_{j}:=(\rho{z_{j}}+x_{j})/(\rho+1),\ \mbox{for all}\ j\in\mathbb{N}_{n}. (92)

In Subsection 6.3, the loss function 𝝍\bm{\psi} is defined by (6) and the proximity operator prox1ρ𝝍\mathrm{prox}_{\frac{1}{\rho}\bm{\psi}} at 𝐳n\mathbf{z}\in\mathbb{R}^{n} can be represented by

prox1ρ𝝍(𝐳):=(ρ𝐈n+𝐇𝐇)1(ρ𝐳+𝐇𝐇𝐲).\mathrm{prox}_{\frac{1}{\rho}\bm{\psi}}(\mathbf{z}):=(\rho\mathbf{I}_{n}+\mathbf{H}^{\top}\mathbf{H})^{-1}(\rho\mathbf{z}+\mathbf{H}^{\top}\mathbf{Hy}). (93)
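Proposition B.5, and hence the special cases (92) and (93), reduces to a single symmetric positive definite linear solve. A minimal sketch; the function name is ours.

```python
import numpy as np

def prox_quadratic(A, x, rho, z):
    # Solves (rho I + A^T A) mu = rho z + A^T x, cf. Proposition B.5;
    # with A = H and x = H y this evaluates (93).
    n = A.shape[1]
    return np.linalg.solve(rho * np.eye(n) + A.T @ A, rho * z + A.T @ x)
```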

In Subsection 6.4, the loss function is chosen as \bm{\phi}(\mathbf{z}):=\sum_{j\in\mathbb{N}_{p}}\max\{0,1-z_{j}\} for \mathbf{z}:=[z_{j}:j\in\mathbb{N}_{p}]\in\mathbb{R}^{p}. The closed-form formula of the proximity operator \mathrm{prox}_{\frac{1}{\rho}\bm{\phi}} with \rho>0 was given in [16]. Specifically, the proximity operator \mathrm{prox}_{\frac{1}{\rho}\bm{\phi}} at \mathbf{z}\in\mathbb{R}^{p} has the form \mathrm{prox}_{\frac{1}{\rho}\bm{\phi}}(\mathbf{z}):=[\mu_{j}:j\in\mathbb{N}_{p}], where for all j\in\mathbb{N}_{p},

μj:={zj+1ρ,ifzj<11ρ,zj,ifzj>1,1,ifzj[11ρ,1].\mu_{j}:=\left\{\begin{array}[]{ll}z_{j}+\frac{1}{\rho},&\mathrm{if}\ z_{j}<1-\frac{1}{\rho},\\ z_{j},&\mathrm{if}\ z_{j}>1,\\ 1,&\mathrm{if}\ z_{j}\in[1-\frac{1}{\rho},1].\end{array}\right. (94)
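Formula (94) is also component-wise. A minimal sketch; the function name is ours.

```python
import numpy as np

def prox_hinge(z, rho):
    # Evaluates (94) for phi(z) = sum_j max{0, 1 - z_j}: shift by 1/rho
    # below 1 - 1/rho, identity above 1, clamp to 1 in between.
    mu = np.where(z < 1.0 - 1.0 / rho, z + 1.0 / rho, z)
    return np.where((z >= 1.0 - 1.0 / rho) & (z <= 1.0), 1.0, mu)
```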

References

  • [1] Afonso M V, Bioucas-Dias J M and Figueiredo M A T 2010 An augmented Lagrangian approach to linear inverse problems with compound regularization IEEE International Conference on Image Processing 4169–72
  • [2] Argyriou A, Micchelli C A, Pontil M, Shen L and Xu Y 2011 Efficient first order methods for linear composite regularizers (arXiv:1104.1436)
  • [3] Bauschke H H and Combettes P L 2011 Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer, New York)
  • [4] Belkin M, Niyogi P and Sindhwani V 2006 Manifold regularization: a geometric framework for learning from labeled and unlabeled examples J. Mach. Learn. Res. 7 2399–434
  • [5] Björck Å 1996 Numerical methods for least squares problems (Philadelphia, PA: SIAM)
  • [6] Brezinski C, Redivo-Zaglia M, Rodriguez G and Seatzu S 2003 Multi-parameter regularization techniques for ill-conditioned linear systems Numer. Math. 94 203–28
  • [7] Byrne C 2003 A unified treatment of some iterative algorithms in signal processing and image reconstruction Inverse Problems 20 103
  • [8] Chen Z, Lu Y, Xu Y and Yang H 2008 Multi-parameter Tikhonov regularization for ill-posed operator equations J. Comput. Math. 26 37–55
  • [9] Ding L and Han W 2019 α1\alpha\ell_{1}-β2\beta\ell_{2} regularization for sparse recovery Inverse Problems 35 125009
  • [10] Düvelmeyer D and Hofmann B 2006 A multi-parameter regularization approach for estimating parameters in jump diffusion process J. Inverse Ill-posed Probl. 14 861–80
  • [11] Friedman J, Hastie T, Höfling H and Tibshirani R 2007 Pathwise coordinate optimization Ann. Appl. Stat. 1 302–32
  • [12] Horn R A and Johnson C R 2012 Matrix Analysis (Cambridge: Cambridge University Press)
  • [13] LeCun Y, Bottou L, Bengio Y and Haffner P 1998 Gradient-based learning applied to document recognition Proc. IEEE 86 2278–324
  • [14] Li Q, Shen L, Xu Y and Zhang N 2015 Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing Adv. Comput. Math. 41 387–422
  • [15] Li Q, Xu Y and Zhang N 2017 Two-step fixed-point proximity algorithms for multi-block separable convex problems J. Sci. Comput. 70 1204–28
  • [16] Li Z, Song G and Xu Y 2019 A two-step fixed-point proximity algorithm for a class of non-differentiable optimization models in machine learning J. Sci. Comput. 81 923–40
  • [17] Liu Q, Wang R, Xu Y and Yan M 2023 Parameter Choices for Sparse Regularization with the 1\ell_{1} Norm Inverse Problems 39 025004
  • [18] Lu S and Pereverzev S V 2013 Regularization Theory for Ill-Posed Problems. Selected Topics (De Gruyter, Berlin, Boston)
  • [19] Lu Y, Shen L and Xu Y 2007 Multi-parameter regularization methods for high-resolution image reconstruction with displacement errors IEEE Trans. Circuits Systems I 54 1788–99
  • [20] Micchelli C A, Shen L and Xu Y 2011 Proximity algorithms for image models: denoising Inverse Problems 27 045009
  • [21] Rapaport F, Barillot E and Vert J P 2008 Classification of arrayCGH data using fused SVM Bioinformatics 24 i375–82
  • [22] Selesnick I W, Graber H L, Pfeil D S and Barbour R L 2014 Simultaneous low-pass filtering and total variation denoising IEEE Trans. Signal Process. 62 1109–24
  • [23] Shen L, Wang R, Xu Y and Yan M 2024 Sparse Deep Learning Models with the 1\ell_{1} Regularization (arXiv:2408.02801)
  • [24] Showalter R E 1997 Monotone Operators in Banach Spaces and Nonlinear Partial Differential Equations (American Mathematical Society, Providence)
  • [25] Tibshirani R 1996 Regression shrinkage and selection via the lasso J. Roy. Statist. Soc. Ser. B 58 267–88
  • [26] Tibshirani R, Saunders M, Rosset S, Zhu J and Knight K 2005 Sparsity and smoothness via the fused lasso J. R. Stat. Soc. B 67 91–108
  • [27] Tolosi L and Lengauer T 2011 Classification with correlated features: unreliability of feature ranking and solutions Bioinformatics 27 1986–94
  • [28] Wang W, Lu S, Mao H and Cheng J 2013 Multi-parameter Tikhonov regularization with the 0\ell_{0} sparsity constraint Inverse Problems 29 065018
  • [29] Xu Y 2023 Sparse regularization with the 0\ell_{0} norm Anal. Appl. 21 901–29
  • [30] Zălinescu C 2002 Convex Analysis in General Vector Spaces (River Edge, NJ: World Scientific)