Parameter Choices for Sparse Multi-Parameter Regularization with the ℓ1 Norm
Abstract
This paper introduces a multi-parameter regularization approach using the ℓ1 norm, designed to better adapt to complex data structures and problem characteristics while offering enhanced flexibility in promoting sparsity in regularized solutions. As data volumes grow, sparse representations of learned functions become critical for reducing computational costs during function operations. We investigate how the selection of multiple regularization parameters influences the sparsity of regularized solutions. Specifically, we characterize the relationship between these parameters and the sparsity of solutions under transform matrices, enabling the development of an iterative scheme for selecting parameters that achieve prescribed sparsity levels. Special attention is given to scenarios where the fidelity term is non-differentiable and the transform matrix lacks full row rank. In such cases, the regularized solution and two auxiliary vectors arising in the sparsity characterization are essential components of the multi-parameter selection strategy. To address this, we propose a fixed-point proximity algorithm that simultaneously determines these three vectors. This algorithm, combined with our sparsity characterization, forms the basis of a practical multi-parameter selection strategy. Numerical experiments demonstrate the effectiveness of the proposed approach, yielding regularized solutions with both predetermined sparsity levels and satisfactory approximation accuracy.
Q. Liu, R. Wang, and Y. Xu
1 Introduction
Multi-parameter regularization is a widely used technique for addressing ill-posed problems [4, 6, 8, 9, 10, 18, 19, 28]. Motivated by the challenges posed by big data in practical applications, sparse multi-parameter regularization using the ℓ1 norm has become a prominent area of research [1, 11, 21, 22, 23, 26]. Empirical studies have shown that, compared to single-parameter regularization with the ℓ1 norm, the multi-parameter approach offers greater flexibility in promoting sparsity in the regularized solutions while effectively mitigating the ill-posedness inherent in practical problems.
This paper aims to develop a practical parameter selection strategy for multi-parameter regularization with the ℓ1 norm, enabling the regularized solution to achieve a specified level of sparsity. To achieve this, we first establish a theoretical understanding of how the regularization parameters affect the sparsity of the solution under transform matrices. Unlike single-parameter regularization, the multi-parameter framework allows independent control over the sparsity of the solution corresponding to each transform matrix. Using tools from convex analysis, we characterize the relationship between the choice of multiple regularization parameters and the sparsity of the solution under the transform matrices. In the special case where the transform matrices are degenerate identities and the fidelity term exhibits block separability, this characterization provides a direct multi-parameter selection strategy that ensures each sub-vector of the solution achieves a desired level of block sparsity. In the general case, where such direct adoption is not feasible, we propose an iterative scheme to select the multiple regularization parameters. This scheme ensures the regularized solution attains the prescribed sparsity levels under the transform matrices. Unlike single-parameter regularization models, where only one parameter is optimized, our iterative algorithm considers the interplay between multiple parameters, thereby enhancing the flexibility and effectiveness of the parameter selection process.
In our recent papers [17] and [23], we investigated parameter selection strategies for sparse regularization using the ℓ1 norm. Compared to these studies, this paper makes two significant contributions: First, we have established strategies for selecting multiple regularization parameters in a model where the objective function comprises a convex fidelity term and multiple ℓ1-based regularization terms, each composed with a linear transform. Unlike the iterative algorithm in [17], which focused on selecting a single parameter, the algorithm proposed here addresses multiple parameters and explicitly incorporates their interdependencies. While [23] also explored iterative schemes for multi-parameter selection in a highly nonconvex optimization problem with numerous network parameters and extensive training data, the model in this paper facilitates a more sophisticated iterative algorithm. Second, we have developed iterative schemes for selecting multiple regularization parameters in two distinct scenarios: (1) when the fidelity term is differentiable, and the transform matrix has full row rank, and (2) when the fidelity term is nondifferentiable, and the transform matrix lacks full row rank. In contrast, the approaches in [17] and [23] address parameter selection only for the case of a differentiable fidelity term and a full-rank transform matrix.
For the nondifferentiable fidelity term and non-full-rank transform matrix, each iteration requires determining not only the regularized solution but also two auxiliary vectors. To address this, we have characterized these three components using a system of fixed-point equations involving proximity operators. Based on this characterization, we have devised a fixed-point proximity algorithm capable of simultaneously computing all three vectors. The convergence of this algorithm is rigorously analyzed in this paper. By combining the fixed-point proximity algorithm with the sparsity characterization of the regularized solution, we propose a robust multi-parameter selection strategy that ensures the solution achieves a specified level of sparsity.
This paper is organized into seven sections and two appendices. Section 2 introduces the general multi-parameter regularization problem under investigation and highlights several examples of practical significance. In Section 3, we establish a characterization of how each regularization parameter affects the sparsity of the regularized solution under its associated transform matrix. This characterization is further specialized in Section 4 to the case where the transform matrices are degenerate identities. Building on the insights from Sections 3 and 4, Section 5 presents iterative schemes for simultaneously determining multiple parameters and solutions with prescribed sparsity levels under the transform matrices. Two scenarios are addressed: (1) when the fidelity term is differentiable and the transform matrix has full row rank, and (2) when the fidelity term is nondifferentiable and the transform matrix does not have full row rank. Section 6 demonstrates the effectiveness of the proposed parameter selection strategies through a series of numerical examples. Section 7 concludes the paper by summarizing the findings. Appendix A contains the proof of the convergence analysis for the fixed-point proximity algorithm introduced in Section 5, while Appendix B provides closed-form expressions for the proximity operators used in the algorithm.
2 Multi-parameter regularization with the ℓ1 norm
In this section, we introduce the multi-parameter regularization problem considered in this paper. Additionally, we review several optimization models of practical significance and illustrate how they can be formulated within this general framework.
We begin by describing the multi-parameter regularization problem. For each , let and set . Suppose that is a convex function and for each , and is an real matrix. For each and , we define its ℓ1 norm by . We consider the multi-parameter regularization problem
(1) |
where is a sequence of positive regularization parameters.
The multi-parameter regularization problem (1) appears in many application areas. Below, we present several examples of it. In image reconstruction problems, a combination of multiple regularizers was used to encourage the solution to simultaneously exhibit the characteristics enforced by each of them. For example, a combination of frame-based synthesis and analysis ℓ1-norm regularizers was proposed in [1] for an image deblurring problem. Specifically, for , we assume that represents a periodic convolution, is a synthesis operator whose columns contain the elements of a frame, and is an analysis operator of a tight Parseval frame satisfying , where denotes the identity matrix of order . Let be observed data. The regularization problem combining the synthesis and analysis ℓ1-norm regularizers has the form
(2) |
Clearly, problem (2) may be identified as a special case of (1) with
As a generalization of the lasso regularized model [25], the fused lasso regularized model was proposed in [26] for problems with features that can be ordered in some meaningful way. Let . Suppose that a prediction problem with cases has outcomes , and features , , . Let be the matrix of features and , be the vectors of outcomes and coefficients, respectively. The fused lasso regularized model is formulated as
(3) |
where is the first order difference matrix with for and otherwise. By introducing
the fused lasso model (3) can be rewritten in the form of (1). By penalizing the ℓ1 norm of both the coefficients and their successive differences, the fused lasso regularized model encourages the sparsity of the coefficients and also the sparsity of their differences. As a special case, the fused lasso signal approximation [11] has the form (3) with the feature matrix being the identity matrix.
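To make the structure of this penalty concrete, the following sketch builds a first-order difference matrix and evaluates a fused-lasso-type objective. It is only an illustration: the sign convention of the difference matrix, the 1/2 scaling of the least-squares term, and all function and variable names are assumptions, since the displayed formula (3) is not reproduced here.

```python
import numpy as np

def first_order_difference_matrix(p):
    """(p-1) x p matrix D with (D x)_i = x_{i+1} - x_i (sign convention assumed)."""
    D = np.zeros((p - 1, p))
    for i in range(p - 1):
        D[i, i] = -1.0
        D[i, i + 1] = 1.0
    return D

def fused_lasso_objective(beta, X, y, lam1, lam2):
    """Least-squares fit plus l1 penalties on the coefficients and on their
    successive differences, in the spirit of model (3)."""
    D = first_order_difference_matrix(beta.size)
    return (0.5 * np.sum((X @ beta - y) ** 2)
            + lam1 * np.sum(np.abs(beta))
            + lam2 * np.sum(np.abs(D @ beta)))
```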
Filtering noisy data was considered in [22] for the case where the underlying signal comprises a low-frequency component and a sparse or sparse-derivative component. Specifically, assume that the noisy data can be modeled as
(4) |
where is a low-pass signal, is a sparse and sparse-derivative signal and is stationary white Gaussian noise. Given noisy data of the form (4), one seeks the estimate of and individually. For this purpose, we first solve the compound sparse denoising problem
(5) |
to obtain the estimate of . Here, and is the high-pass filter matrix with the form , where and are banded matrices. We then get the estimate of as . It is clear that the compound sparse denoising model (5) has the form (1) with
(6) |
and , .
Using a technique similar to that used in the fused lasso regularized model, the fused SVM was proposed for classification of array-based comparative genomic hybridization (arrayCGH) data [21, 27]. Given training data composed of sample points and labels , the aim of binary classification is to find a decision function , predicting the class or . The class prediction for a profile is then if and otherwise. The fused SVM based on the hinge loss function has the form
(7) |
We rewrite model (7) in the form (1) as follows. We define the matrix , the matrix and the function , for all . Then by introducing , for , , and , the fused SVM model (7) can be represented in the form (1).
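For concreteness, a minimal sketch of the hinge loss underlying model (7) follows. Whether model (7) includes an intercept term and how ties are classified are not recoverable from the text above, so those details, along with the function names, are assumptions.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Average hinge loss of the decision function f(x) = <w, x> + b on
    samples X (one row per sample) with labels y in {-1, +1}."""
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins))

def predict(w, b, X):
    """Class prediction: +1 where f(x) >= 0 and -1 otherwise (tie rule assumed)."""
    return np.where(X @ w + b >= 0, 1, -1)
```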
3 Choices of the multiple regularization parameters
In this section, we characterize the relation between the multiple regularization parameters and the sparsity of the regularized solution under the transform matrices , , of the regularization problem (1). Unlike the single-parameter regularization problem, the use of multiple regularization parameters in problem (1) allows us to separately consider the sparsity of the solution under each transform matrix .
We begin by recalling the definition of the level of sparsity for a vector in . For , we set . A vector is said to have sparsity of level if it has exactly nonzero components.
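In standard notation (the symbols below are ours, since the original display is not reproduced here), this reads
\[
\|z\|_0 := \#\{\, j : z_j \neq 0 \,\}, \qquad z \in \mathbb{R}^d,
\]
so that $z$ has sparsity of level $l$ exactly when $\|z\|_0 = l$.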
We now reformulate the regularization problem (1) into an equivalent form to facilitate the characterization of the sparsity of the solution under the transform matrices , . Let and , . We decompose a vector into sub-vectors by setting
By introducing a column block matrix
(8) |
we write for the block column vector . Thus, we may rewrite the regularization problem (1) as
(9) |
We further convert problem (9) into an equivalent form by utilizing a change of variables. To achieve this, we consider inverting the linear system
(10) |
Here, denotes the range of . It is known from [5, 12] that the general solution of the linear system (10) can be represented by the pseudoinverse of . An alternative form of the general solution was provided in [17]. To describe this result, we recall that if has the rank satisfying , then has the SVD as , where and are and orthogonal matrices, respectively, and is a diagonal matrix with the nonzero diagonal entries , which are the nonzero singular values of . In order to represent the general solution of linear system (10), we define an matrix by employing the SVD of . Specifically, we denote by the matrix composed of the first columns of and define an block diagonal matrix by setting
We also introduce a diagonal matrix of order by
Using these matrices, we define an matrix by . As has been shown in [17], for each solution of system (10), there exists a unique vector such that
(11) |
As a result, the mapping , defined for each by where and satisfies equation (11), is bijective from onto .
By making use of the change of variables defined by equation (11), we reformulate problem (9) as an equivalent multi-parameter regularization problem with for each , being a degenerate identity. We set and let denote the indicator function of , that is, if , and otherwise. For each , we introduce a degenerate identity matrix by
(12) |
where denotes the zero matrix of order . We show in the following lemma that the regularization problem (1) is equivalent to the regularization problem
(13) |
Lemma 3.1.
Proof.
We first prove that is a solution of problem (1) if and only if is a solution of the constrained optimization problem
(14) |
As has been shown in [17], is a bijective mapping from to . It suffices to verify that for all there holds
By the definition of mapping , we get that and , which confirm the validity of the equation above.
Below, we consider how the regularization parameters , influence the sparsity of the solution of problem (13). To characterize the solution of problem (13), we need the notion of the subdifferential of a convex function. Suppose that is a proper convex function. The subdifferential of at is defined by
It is known [30] that for two convex functions and on , if is continuous on then , for all . We also describe the chain rule of the subdifferential [24]. Suppose that is a convex function and is an matrix. If is continuous at some point of the range of , then for all
(15) |
The Fermat rule [30] states that a proper convex function has a minimum at if and only if .
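For the reader's convenience, the standard forms of these two facts read as follows (the symbols are generic placeholders, since the original displays are not reproduced here): for a convex $\psi:\mathbb{R}^m\to\mathbb{R}$ and $M\in\mathbb{R}^{m\times d}$ with $\psi$ continuous at some point of the range of $M$,
\[
\partial(\psi \circ M)(x) = M^\top \partial\psi(Mx) \quad \text{for all } x \in \mathbb{R}^d,
\]
and a proper convex function $f$ attains a minimum at $x$ if and only if $0 \in \partial f(x)$.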
For the purpose of characterizing sparsity of vectors in , we also recall the notion of the sparsity partition of introduced in [29]. Specifically, by using the canonical basis for , we introduce numbers of subsets of by
It is clear that the sets , form a partition for and for each , coincides with the set of all vectors in having sparsity of level .
We provide in the next lemma a characterization of the sparsity of the solution of problem (13). For an matrix , we denote by the null space of and for each , denote by the th column of .
Lemma 3.2.
Proof.
By the Fermat rule, we have that is a solution of problem (13) if and only if
which by the continuity of the norm and the chain rule (15) of the subdifferential is equivalent to
(19) |
Let denote the orthogonal complement of . It is known that for all . Recalling that , we get that
As a result, , for all . Substituting the above equation with into the inclusion relation (19), we conclude that is a solution of problem (13) if and only if there exist and such that
(20) |
Note that for each
with which we obtain that
Combining inclusion relation (20) with the above equation, we have that for all , which coincides with equation (16) and for each
which are equivalent to equation (17) and inequality (18), respectively. ∎
Combining Lemmas 3.1 and 3.2, we establish the relation between the regularization parameters and the sparsity of the solution of problem (1).
Theorem 3.3.
Proof.
It follows from Lemma 3.1 that with for each is a solution of problem (1) if and only if with for each is a solution of problem (13). The latter, by Lemma 3.2, is equivalent to the existence of and such that (16), (17) and (18) hold. It suffices to show that . This is done by noting that . ∎
Theorem 3.3 provides a characterization of the multiple regularization parameters , , with which problem (1) has a solution with sparsity of a certain level under each transform matrix . By specifying the sparsity level of the solution under each transform matrix to be , our goal is to find regularization parameters , , satisfying conditions (16), (17) and (18). However, since these conditions depend on the corresponding solution, the characterization stated in Theorem 3.3 cannot be used directly as a multi-parameter choice strategy. Motivated by Theorem 3.3, an iterative scheme to be developed in Section 5 will enable us to choose multiple regularization parameters with which a minimizer of problem (1) has a prescribed sparsity level under each transform matrix.
The next result concerns the special case that matrix defined by (8) has full row rank, that is, .
Corollary 3.4.
Proof.
Theorem 3.3 ensures that with for each is a solution of problem (1) if and only if there exist and such that (16), (17) and (18) hold. By the assumption that , we rewrite equation (16) as equation (21). It follows from that . Then vector in (17) and (18) is the zero vector. Thus, (22) and (23) can be obtained directly. ∎
4 A special model with degenerate transform matrices
In this section, we consider the special case where and for each the transform matrix takes the form
(24) |
In this scenario, the multi-parameter regularization problem (1) assumes the special form
(25) |
We specialize the characterizations of the sparsity of regularized solutions established in the previous section to this special case. Moreover, particular attention is given to scenarios where the fidelity term is block-separable.
We first characterize the sparsity of the solution of problem (25). It follows from equation (24) that matrix defined by (8) coincides with the identity matrix of order . Given that matrix has full row rank, we specialize Corollary 3.4 to the regularization problem (25). In addition, the transform matrices , , with the form (24) enable us to consider the sparsity of each sub-vector of the regularized solution separately.
Theorem 4.1.
Suppose that is a convex function. Then problem (25) with , , has a solution with for each , , for some if and only if there exists such that for each
(26) |
(27) |
In particular, if is differentiable, then the conditions reduce to for each
(28) |
(29) |
Proof.
Since matrix defined by equation (8) has full row rank, Corollary 3.4 ensures that problem (25) with , , has a solution with for each , , for some if and only if for each there exists such that conditions (21), (22) and (23) hold. Note that the set of indices for which equation (21) holds is empty since . It is clear that matrix appearing in conditions (22) and (23) is also an identity matrix. Hence, conditions (22) and (23) reduce to (26) and (27), respectively. If is differentiable, then the subdifferential of at is the singleton . Substituting into (26) and (27) leads directly to (28) and (29), respectively. ∎
As a specific example, we consider the regularization problem
(30) |
In this model, the fidelity term
(31) |
is convex and differentiable. As a result, we apply Theorem 4.1 to this model.
Corollary 4.2.
Suppose that and are given. Then the regularization problem (30) with , has a solution with for each , , for some if and only if for each
(32) |
(33) |
Proof.
Since the fidelity term defined by (31) is convex and differentiable, Theorem 4.1 confirms that problem (30) with , , has a solution with for each , , for some if and only if (28) and (29) hold. Note that the gradient of at has the form . As a result, there holds for each and each
According to the above representations of the partial derivatives of , conditions (28) and (29) reduce to (32) and (33), respectively.
∎
We next study the case that the fidelity term involved in problem (25) has a special structure, that is, it is block separable. To describe the block separability of a function on , we introduce a partition of the index set . Let with . We suppose that is a partition of in the sense that , for all , if , and . For each we denote by the cardinality of and regard as an ordered set in the natural order of the elements in . That is,
Associated with partition , we decompose into sub-vectors by setting
A function is called -block separable if there exist functions , such that
We now describe the block separability of the fidelity term . Recall that . If the partition for is chosen with then for each the sub-vector of coincides with . It is clear that the regularization term in problem (25) is -block separable. We also assume that is -block separable, that is, there exist functions , such that
(34) |
Combining the block separability of the fidelity term and the ℓ1 norm function , the multi-parameter regularization problem (25) can be reduced to the following lower-dimensional single-parameter regularization problems
(35) |
Note that the sparsity of the solution of each single-parameter regularization problem (35) was characterized in [17]. This characterization can also be derived from Theorem 4.1. We further assume that the functions , have block separability. For each , let be a partition of and for each , be the cardinality of . For each , we set for all and . Suppose that for each , has the form
(36) |
with being functions from to , .
We are ready to characterize the block sparsity of each sub-vector of the solution of problem (25) when has the block separability described above. Here, we say that a vector has -block sparsity of level if has exactly nonzero sub-vectors with respect to partition .
Theorem 4.3.
Suppose that for each and each , is a convex function and is an -block separable function having the form (36). Let be the function with the form (34). Then problem (25) with , , has a solution with for each , having the -block sparsity of level for some if and only if for each there exist distinct , , such that
(37) |
In particular, if , , are differentiable, then condition (37) reduces to
(38) |
Proof.
Observing the block separability of the functions and , we conclude that is a solution of problem (25) if and only if for each , is a solution of problem (35). Theorem 3.2 in [17] ensures that for each , is a solution of problem (35) and has the -block sparsity of level for some if and only if there exist distinct , , such that (37) holds. For the case that , , are all differentiable, it suffices to notice that the subdifferential of at zero is the singleton . This together with inequality (37) leads to inequality (38). ∎
Unlike in Theorems 3.3 and 4.1, the characterization stated in Theorem 4.3 can be taken as a multi-parameter choice strategy. That is, when the fidelity term is block separable, if for each , the regularization parameter is chosen so that inequality (37) (or (38)) holds, then the regularization problem (25) has a solution with each sub-vector having a block sparsity of a prescribed level. The choice of the parameters depends on the subdifferentials or the gradients of the functions , , .
We also specialize Theorem 4.3 to the regularization problem (30). For this purpose, we require that the fidelity term defined by (31) is block separable. Associated with the partition for with , we decompose matrix into sub-matrices by setting
By Lemma 3.4 of [17], the fidelity term defined by (31) is -block separable if and only if there holds
(39) |
It follows from the decomposition of and that of each vector in with respect to that , for all . According to this equation and condition (39), we represent defined by (31) as in (34) with , , being defined by
(40) |
To describe the block separability of functions , , we recall that for each , is a partition of . Associated with the partition , matrix can be decomposed into sub-matrices by setting
It is clear that the last two terms in the right hand side of equation (40) are both -block separable. Hence, again by Lemma 3.4 of [17], we conclude that the functions with the form (40) are -block separable if and only if there holds
(41) |
We represent , , as in (36) when condition (41) holds. For each , the decomposition of and that of each vector in with respect to lead to , for all . Substituting the above equation into definition (40) and noting that condition (41) holds, we represent as in (36) with , , having the form
(42) |
We now apply Theorem 4.3 to the regularization problem (30) when the matrix satisfies conditions (39) and (41).
Corollary 4.4.
Proof.
As pointed out before, condition (39) ensures that the fidelity term defined by (31) is -block separable and has the form (34) with , , being defined by (40). Moreover, condition (41) guarantees that for each , the function is -block separable and can be represented as in (36) with , , having the form (42). Clearly, , , are all convex and differentiable functions. Consequently, we conclude by Theorem 4.3 that the regularization problem (30) with , , has a solution with for each , having the -block sparsity of level for some if and only if for each there exist distinct , , such that inequality (38) holds. Note that for each , for all . Substituting this equation into inequality (38) leads directly to the desired inequality. ∎
5 Iterative schemes for parameter choices
Theorem 3.3 characterizes the influence of each regularization parameter on the sparsity of the solution to problem (1) under the transform matrix . Based on this characterization, we develop iterative schemes in this section for selecting multiple regularization parameters that achieve prescribed sparsity levels in the solution of problem (1) under different transform matrices. We consider two cases: when the fidelity term is differentiable and the transform matrix has full row rank, as well as when is non-differentiable and does not have full row rank.
Theorem 3.3 shows that if is a solution of problem (1) with and for each , has sparsity of level under , then there exist and such that for each , satisfies conditions (17) and (18). According to these conditions, we introduce for each , a sequence by
(43) |
and rearrange them in a nondecreasing order:
(44) |
The equality (17) and the inequality (18) that the parameter needs to satisfy correspond to the nonzero components and the zero components of , respectively. Thus, if , then must be zero and if , then may be zero or nonzero. With the help of the observation above, we present the following result.
Theorem 5.1.
Let be a convex function, for each , be an matrix and be defined by (8). Suppose that is a solution of problem (1) with , and for each , , defined by (43), are ordered as in (44). Then the following statements hold true.
(a) If for each , has sparsity of level under , then for each , satisfies
(45) |
(b) If for each , has sparsity of level under , then for each , there exists with such that satisfies
(46) |
(c) If for each , there exists such that satisfies inequality (46), then for each , has sparsity of level under .
Proof.
We first prove Item (a). If is a solution of problem (1) with , and for each , has sparsity of level under , then for each , the parameter , guaranteed by Theorem 3.3, satisfies equality (17) and inequality (18). Noting that the subset of has the cardinality , there are exactly elements of equal to and the remaining elements less than or equal to . This together with the order of , as in (44) leads to the desired inequality (45).
We next verify Item (b). As has been shown in Item (a), for each , satisfies inequality (45). If no element of the sequence is smaller than , then inequality (45) reduces to , . We then get inequality (46) with . Otherwise, we choose such that . We then rewrite inequality (45) as inequality (46) with . It is clear that .
It remains to show Item (c). If , then clearly the sparsity level of under satisfies . We now consider the case when . According to Theorem 3.3, the relation leads to for all . Hence, has at least zero components. In other words, the number of nonzero components of is at most , that is, has sparsity of level under . ∎
Our goal is to find regularization parameters , that ensure the resulting solution of problem (1) achieves a prescribed sparsity level under each . According to Item (a) of Theorem 5.1, for each , the parameter satisfies inequality (45). Since the sequence depends on the corresponding solution, inequality (45) cannot be used directly as a parameter choice strategy. Instead, it motivates us to propose an iterative scheme. The iteration begins with initial regularization parameters , which are large enough so that for each , the sparsity level of the corresponding solution of problem (1) under is smaller than the given target sparsity level . Suppose that at step , we have , and the corresponding solution with the sparsity level , under the transform matrices , , respectively. Item (a) of Theorem 5.1 ensures that for each , parameter satisfies
(47) |
We choose the parameter at step from the elements of the ordered sequence in (47). Motivated by inequality (45), we choose as the -th element of the ordered sequence in (47), that is,
(48) |
As a result, for each , parameter satisfies
(49) |
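A minimal sketch of this update step is given below. The exact index prescribed in (48) is not reproduced in the text above, so the choice of the (l+1)-th largest candidate value, as well as all function and variable names, are assumptions made for illustration, consistent with the ordering argument behind (45).

```python
import numpy as np

def update_parameters(candidates, targets):
    """One step of the (hypothetical) multi-parameter update.

    candidates : list of 1-D arrays; candidates[i] holds the values of (43)
                 computed at the current solution for the i-th transform.
    targets    : list of target sparsity levels l_i.
    """
    new_lams = []
    for c, l in zip(candidates, targets):
        c_sorted = np.sort(np.asarray(c))      # nondecreasing order, as in (44)
        idx = max(c_sorted.size - l - 1, 0)    # position of the (l+1)-th largest value
        new_lams.append(c_sorted[idx])
    return new_lams
```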
Below, we claim that if the algorithm is convergent, then the parameters obtained by this algorithm satisfy inequality (45) of Theorem 5.1, which is a necessary condition for the resulting solution to have the given target sparsity levels. To this end, we state the following assumptions about the convergence of the algorithm. The convergence analysis of the proposed algorithm for choosing the multiple regularization parameters will be the subject of our future research.
(A1) For each , the sequence , , generated by the iteration scheme satisfies that for all .
(A2) For each , the sequences , generated by the iteration scheme satisfy that as for some .
(A3) The solution of problem (1) with , satisfies that for each and each , as .
Proposition 5.2.
If assumptions (A1), (A2) and (A3) hold, then for each , satisfies
(50) |
Proof.
Note that assumption (A1) allows us to choose parameter as in (48) at step . This together with inequality (47) leads to inequality (49). By taking limits on each item of inequality (49) and using assumptions (A2) and (A3), we get that
(51) |
It follows from inequality (47) that for all . Taking limits on both sides of this inequality, we get that . This together with inequality (51) leads to the desired inequality (50). ∎
We note that three issues need to be considered in the iterative scheme. First, if for some and , the choice of as in (48) leads to and thus is invalid. To address this issue, we choose , motivated by inequality (46), among the sequence . We set and choose as
(52) |
Second, assumption (A1) may not always hold true. If for some and , this indicates that is too small and thus we should choose at step greater than . Since, as shown in inequality (47), all the elements , , are less than or equal to , the choice (52) cannot provide a desired parameter greater than . In this case, we should go back to the sequence in step to choose an appropriate parameter . Finally, due to the interplay between the multiple regularization parameters, we do not require an exact match of the sparsity levels; instead, we allow a tolerance error. For each , let denote the sparsity level of a solution of problem (1) under . With a given tolerance , we say that the solution achieves target sparsity levels , if
At step of the iterative scheme, the parameter is chosen from the elements of the ordered sequence in (47). To compute , , according to definition (43), we need to obtain the solution of problem (1) with and then determine the vectors and in Theorem 3.3.
We first consider the case when is differentiable and has full row rank. In this case, the subdifferential of at is the singleton and . That is, and . Accordingly, for each , the sequence , , has the form
As a result, we merely need to solve problem (1) with to obtain the solution . This can be done by employing the Fixed-Point Proximity Algorithm (FPPA), which was developed in [2, 14, 20].
We describe the FPPA as follows. Let denote the set of symmetric and positive definite matrices. For , we define the weighted inner product of , by and the weighted -norm of by . Suppose that is a convex function, with . The proximity operator of with respect to is defined for by
(53) |
In the case that coincides with the identity matrix , will be abbreviated as . Suppose that and are two convex functions which may not be differentiable, and . The optimization problem
can be solved by the FPPA: For given positive constants , and initial points , ,
(54) |
In Algorithm (54), positive constants and may be selected to satisfy so that the algorithm converges. When is differentiable and has full row rank, in each step of the iterative algorithm for choosing the multiple regularization parameters, we solve problem (1) by Algorithm (54) with specific choices of functions , and matrix .
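The following sketch illustrates a fixed-point proximity iteration of the family developed in [2, 14, 20] for a model of the form min over x of f(x) + lam*||Bx||_1. It is not claimed to be the exact scheme (54), whose constants and update order are not reproduced above; the convergence condition used here, ||B||^2/(rho*beta) < 1, is the standard one for this primal-dual variant, and all names are illustrative.

```python
import numpy as np

def fppa(prox_f, B, lam, x0, rho, beta, n_iter=500):
    """Primal-dual fixed-point proximity iteration for min_x f(x) + lam*||Bx||_1.

    prox_f(z, t) must return prox_{t f}(z).  Converges (for this variant)
    when ||B||_2^2 / (rho * beta) < 1.
    """
    x = x0.copy()
    y = np.zeros(B.shape[0])                     # dual variable
    for _ in range(n_iter):
        x_new = prox_f(x - (1.0 / rho) * (B.T @ y), 1.0 / rho)
        z = y + (1.0 / beta) * (B @ (2.0 * x_new - x))
        # prox of the conjugate of lam*||.||_1 is the projection onto
        # the l_inf ball of radius lam (Moreau decomposition)
        y = np.clip(z, -lam, lam)
        x = x_new
    return x
```

For a least-squares fidelity of the form one half times the squared residual norm (which we assume is the fidelity used in model (30)), prox of t*f at z has the closed form obtained by solving (I + t*A^T A) x = z + t*A^T w, which can be supplied as prox_f above.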
We summarize the iterative scheme for choosing the multiple regularization parameters when is differentiable and has full row rank in Algorithm 1. Numerical experiments to be presented in Section 6 demonstrate the effectiveness of this algorithm in identifying the desired regularization parameters.
We next consider the case when is non-differentiable and does not have full row rank. In this case, at each step of the iterative scheme, we should not only find the solution of problem (1) but also determine the vectors and satisfying (16), (17) and (18) simultaneously. To this end, we establish a characterization of vectors and by using a fixed-point formulation via the proximity operators of the functions appearing in the objective functions of problem (13).
We begin by recalling some useful results about the proximity operator. It is known [20] that the proximity operator of a convex function is intimately related to its subdifferential. Specifically, if is a convex function from to , then for all , and
(55) |
The conjugate function of a convex function is defined as for all . There is a relation between the subdifferential of a convex function and that of its conjugate function . Specifically, for all and all , there holds
(56) |
This leads to the relation between the proximity operators of and :
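In its standard form (reconstructed here under the assumption that the paper uses the usual scaling), this is the Moreau decomposition:
\[
\mathrm{prox}_{f}(x) + \mathrm{prox}_{f^{*}}(x) = x, \qquad x \in \mathbb{R}^d,
\]
and, more generally, $\mathrm{prox}_{\lambda f}(x) + \lambda\, \mathrm{prox}_{\lambda^{-1} f^{*}}(x/\lambda) = x$ for every $\lambda > 0$.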
In the following proposition, we establish the fixed-point equation formulation of the solution of problem (13).
Proposition 5.3.
Suppose that is a convex function and for each , is an matrix. Let be defined as in (8). Then the following statements hold true.
(a) If is a solution of problem (13) with , , then there exist vectors and satisfying
(57) |
(58) |
(59) |
for any matrices , and .
Proof.
According to the Fermat rule and the chain rule (15) of the subdifferential, we have that is a solution of problem (13) if and only if
The latter is equivalent to the existence of and such that
(60) |
We first prove Item (a). If is a solution of problem (13), then there exist and satisfying inclusion relation (60), which further leads to for any . Relation (55) ensures the equivalence between the inclusion relation above and equation (57). According to relation (56), we rewrite the inclusion relation as , which further leads to for any . This, guaranteed by relation (55), is equivalent to equation (58). Again by relation (56), the inclusion relation can be rewritten as . Hence, for any , we obtain that , which, guaranteed by relation (55), is equivalent to equation (59).
We next verify Item (b). Suppose that the vectors , , and the matrices , , satisfy equations (57), (58) and (59). As pointed out in the proof of Item (a), equations (58) and (59) are equivalent to inclusion relations and , respectively. Moreover, equation (57) is equivalent to inclusion relation (60). Consequently, we conclude that is a solution of problem (13). ∎
Proposition 5.3 provides the fixed-point equation formulations not only for the solution of problem (13) but also for another two vectors and . It follows from Lemma 3.1 that the solutions of problems (1) and (13) are closely related. Below, we show that and a subvector of just coincide with the desired two vectors appearing in Theorem 3.3, respectively.
Theorem 5.4.
Suppose that is a convex function and for each , is an matrix. Let be defined as in (8). Suppose that with , and for each for some , and in addition , . If vectors , and satisfy equations (57), (58) and (59) for some matrices , , , then is a solution of problem (1) with , and for each , has sparsity of level under . Moreover, and satisfy (16), (17) and (18).
Proof.
Item (b) of Proposition 5.3 ensures that if , and satisfy equations (57), (58) and (59) for some matrices , , , then is a solution of problem (13) with , . According to Lemma 3.1, we get that is a solution of problem (1) with , . By definition of mapping , we get that and . It follows from definition of matrix that for each , which shows that has sparsity of level under the transform matrix .
It suffices to verify that , and there hold (16), (17) and (18). As pointed out in the proof of Item (a) of Proposition 5.3, equations (58) and (59) are equivalent to inclusion relations and , respectively. By noting that , we get that . Recalling that leads to . Note that equation (57) is equivalent to inclusion relation (60). The latter holds if and only if inclusion relation (20) holds. As pointed out in the proof of Lemma 3.2, inclusion relation (20) yields that and satisfy (16), (17) and (18). ∎
Theorem 5.4 shows that obtaining the solution of problem (1) and the vectors , satisfying (16), (17) and (18) can be done by solving fixed-point equations (57), (58) and (59). These three fixed-point equations are coupled together and they have to be solved simultaneously by iteration. It is convenient to write equations (57), (58) and (59) in a compact form. To this end, we utilize the three column vectors , and to form a block column vector having , and as its three blocks. That is, . By integrating together the three proximity operators involved in equations (57), (58) and (59), we introduce an operator from to itself by
We also define a block matrix by
With the above notation, we rewrite equations (57), (58) and (59) in the following compact form
(61) |
Since equations (57), (58) and (59) are represented in the compact form (61), one may define the Picard iteration based on (61) to solve the fixed-point of the operator , that is
When it converges, the Picard sequence , , generated by the Picard iteration above, converges to a fixed-point of the operator . It is known [7] that the convergence of the Picard sequence requires the firm non-expansiveness of the operator . However, by arguments similar to those used in the proof of Lemmas 3.1 and 3.2 of [14], we can prove that the operator is not firmly non-expansive. We need to reformulate the fixed-point equation (61) by appropriately splitting the matrix , guided by the theory of non-expansive maps.
We describe the split of as follows. Set . By introducing three matrices satisfying , we split the expansive matrix as
Accordingly, the fixed-point equation (61) can be rewritten as
Based upon the above equation, we define a two-step iteration to solve the fixed-point of the operator as
(62) |
To develop an efficient iterative algorithm, we need to consider two issues: The first issue is the solvability of equation (62). The second issue is the convergence of the iterative algorithm. In fact, these two issues may be addressed by choosing appropriate matrices , , . Specifically, by introducing a real number , we choose , , as
(63) |
(64) |
and
(65) |
It is clear that . Associated with these matrices, we have that
(66) |
(67) |
and
(68) |
We then rewrite the iterative scheme (62) in terms of the vector , and as
(69) |
We first note that since matrix is strictly block lower triangular, the two-step iteration scheme (62), which is an implicit scheme in general, reduces to the explicit scheme (69). We next establish the convergence of Algorithm (69) in the following theorem. The convergence can be obtained by using arguments similar to those in [14, 15], and we provide the complete proof in Appendix A for the convenience of the reader. To this end, we introduce two block matrices by
(70) |
Theorem 5.5.
If matrices , , and satisfy
(71) |
and
(72) |
then the sequence , , generated by Algorithm (69) for any given , converges to a fixed-point of operator .
To end this section, we summarize the iterative scheme for parameter choices when is non-differentiable and does not have full row rank in Algorithm 2. Note that the computation of the proximity operators involved in Algorithm (69) is essential for the implementation of Algorithm 2. In Appendix B, we will provide the closed-form formulas for these proximity operators.
6 Numerical experiments
In this section, we present four numerical experiments to demonstrate the efficiency of the proposed multi-parameter choice strategies.
In the presentation of the numerical results, we use “TSLs” and to represent the target sparsity levels and the sparsity levels of the solution obtained from the parameter choice strategies, respectively. For the single-parameter regularization model, we use “TSL” and to denote the target sparsity level and the sparsity level of the solution obtained from the parameter choice strategies, respectively. In the first two experiments related to the regularization problem (25), we use “Ratio” to denote the ratio of the number of nonzero components to the total number of components of the obtained solution. In the last two experiments associated with the general regularization problem (1), we use “Ratios” to represent the ratios . Specifically, for each , refers to the ratio of the number of nonzero components to the total number of components of the obtained solution under the transform matrix .
6.1 Parameter choices: a block separable case
In this experiment, we validate the parameter choice strategy proposed in Corollary 4.4 by considering the multi-parameter regularization model (30) for signal denoising. The single-parameter regularization model for signal denoising is also considered for the comparison purpose.
We employ model (30) to recover the Doppler signal function
(73) |
from its noisy data. Let and , , be the sample points on a uniform grid in with step size . We recover the signal from the noisy signal , where is additive white Gaussian noise with signal-to-noise ratio .
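For reproducibility, a sketch of the data generation is given below. It assumes the standard Donoho-Johnstone Doppler signal for (73); the sample size, SNR value, and grid convention shown are illustrative placeholders rather than the exact settings of this experiment.

```python
import numpy as np

def doppler(t, eps=0.05):
    """Donoho-Johnstone Doppler test signal (assumed form of (73))."""
    return np.sqrt(t * (1.0 - t)) * np.sin(2.0 * np.pi * (1.0 + eps) / (t + eps))

def noisy_doppler(n=1024, snr_db=20.0, seed=0):
    """Sample the Doppler signal on a uniform grid and add white Gaussian
    noise at a prescribed signal-to-noise ratio (in dB)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1) / n
    x = doppler(t)
    noise = rng.standard_normal(n)
    noise *= np.linalg.norm(x) / (np.linalg.norm(noise) * 10.0 ** (snr_db / 20.0))
    return t, x, x + noise
```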
We describe the multi-parameter regularization model (30) as follows. In this model, we let and choose the matrix as the Daubechies wavelet transform with the vanishing moments and the coarsest resolution level . Let , , and set , , . We choose the partition for with , . Associated with this partition, we decompose into sub-vectors , , and decompose into sub-matrices . Moreover, for each , we choose the natural partition for . That is, , . Accordingly, we decompose vector into sub-vectors and decompose matrix into sub-matrices , . It follows from the orthogonality of matrix that conditions (39) and (41) are satisfied. This allows us to choose the regularization parameters according to the strategy stated in Corollary 4.4. We note that if is a solution of problem (30) with , then for each , the -block sparsity level of coincides with its sparsity level.
In this experiment, we solve model (30) by Algorithm (54) with , and . The proximity operator at has the form with , , being defined by (86). The proximity operator at has the form with and , being defined by (92).
We first validate the parameter choice strategy stated in Corollary 4.4. We set three prescribed TSLs values , and . According to the strategy stated in Corollary 4.4, we select the parameter with which model (30) has solutions having the target sparsity levels. For each , we set , and rearrange them in a nondecreasing order: with . We then choose if and if . We solve model (30) with each selected value of for the corresponding solution and determine the actual sparsity level SLs of .
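The selection just described can be summarized by the following sketch: given the candidate values prescribed by Corollary 4.4 for one index, sort them in nondecreasing order and place the parameter between the two order statistics that bracket the target sparsity level. The midpoint choice, the handling of the two extreme cases, and the names used are assumptions, since the exact thresholds in the text above are not reproduced here.

```python
import numpy as np

def choose_lambda(candidates, l):
    """Direct parameter choice in the block-separable case: pick lambda so
    that exactly l of the candidate values exceed it."""
    c = np.sort(np.asarray(candidates))            # nondecreasing, as in Subsection 6.1
    n = c.size
    lower = c[n - l - 1] if l < n else 0.0         # largest candidate that should stay below lambda
    upper = c[n - l] if l > 0 else c[-1] + 1.0     # smallest candidate that should exceed lambda
    return 0.5 * (lower + upper)
```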
We report in Table 1 the numerical results of this experiment: the targeted sparsity levels TSLs, the selected values of parameter , the actual sparsity levels SLs of , the Ratio values of and the values of the denoised signals . Here and in the next subsection, . As observed from Table 1, the SLs values match the TSLs values. The numerical results in Table 1 demonstrate the efficacy of the strategy stated in Corollary 4.4 in selecting regularization parameters with which the corresponding solution achieves a desired sparsity level and preserves the approximation accuracy.
For comparison purposes, we also consider the single-parameter regularization model (30) with . Let , , rearranged in a nondecreasing order: with . For three TSL values , and , we choose according to Corollary 4.4 with . By solving model (30) with by Algorithm (54) with , and , we obtain the solution and determine the actual sparsity level SL of . The targeted sparsity levels TSL, the selected values of parameter , the actual sparsity levels of , the Ratio values of and the values of the denoised signals are reported in Table 2. As observed from Tables 1 and 2, compared with the single-parameter regularization model, the multi-parameter regularization model may provide a solution with better approximation accuracy at the same sparsity level.
6.2 Parameter choices: a nonseparable case
This experiment is devoted to validating the efficacy of Algorithm 1. We again consider recovering the Doppler signal function defined by (73) from its noisy data by the multi-parameter regularization model (30). The original signal and the noisy signal are chosen in the same way as in Subsection 6.1.
In this experiment, the matrix is determined by the biorthogonal wavelet ‘bior2.2’ available in Matlab with the coarsest resolution level . In both the analysis and synthesis filters, ‘bior2.2’ possesses 2 vanishing moments. Such a matrix does not satisfy conditions (39) and (41). As a result, we choose the regularization parameters by employing Algorithm 1. The number of the regularization parameters, the sub-vectors , of a vector and the sub-matrices , of are all defined as in Subsection 6.1.
We set three prescribed TSLs values , , . We choose the parameters , by Algorithm 1 with , , and . In Algorithm 1, we choose the initial parameter , and solve model (30) by Algorithm (54) with , and . The targeted sparsity levels TSLs, the selected values of parameter chosen by Algorithm 1, the actual sparsity levels of , the Ratio values of , the numbers of iterations for and the values of the denoised signals are reported in Table 3. For the three TSLs values, the algorithm reaches the stopping criteria within , and iterations, respectively. The SLs values obtained by Algorithm 1 match with the TSLs values within tolerance error . The numerical results in Table 3 validate the efficacy of Algorithm 1 for obtaining regularization parameters leading to a solution with desired sparsity level and approximation error.
6.3 Compound sparse denoising
We consider in this experiment the parameter choice of the compound sparse denoising regularization model (5) implemented by Algorithm 2. In this case, the fidelity term defined by (6) is differentiable and the transform matrix satisfies . Clearly, does not have full row rank.
In this experiment, the noisy data with is generated by adding a low-frequency sinusoid signal , two additive step discontinuities for , for and otherwise, and additive white Gaussian noise with 0.3 standard deviation and zero mean. Set and . We estimate and from the noisy data by using simultaneous low-pass filtering and compound sparse denoising. The estimate of is obtained by solving the compound sparse denoising model (5), where the high-pass filter matrix is chosen as in Example A of [22]. The estimate of is then obtained by , where is the identity matrix of order with the first and last rows removed. To measure the filtering effect, we define the by
We employ Algorithm 2 to select multiple regularization parameters that ensure the resulting solution of model (5) achieves given TSLs. In Algorithm 2, we solve model (13) by Algorithm (69) where , has the form (6), is the identity matrix of order and is the first order difference matrix. Recall that . We choose the real number and the three matrices , and with , and being positive real numbers. According to Theorem 5.5, to ensure the convergence of Algorithm (69), we choose the real numbers , and satisfying . In this case, Algorithm (69) reduces to (84). The closed-form formulas of the proximity operators , and with being defined by (6) are given in (85), (91) and (93), respectively.
We set five prescribed values , , , and and choose the regularization parameters , by employing Algorithm 2 with . We report in Table 4 the targeted sparsity levels , the initial values of , the selected values of parameter chosen by Algorithm 2, the actual sparsity levels of , the Ratios values of , the numbers of iterations for and the values of the filtered signal. For the five TSLs values, the algorithm meets the stopping criteria after , , , , and iterations, respectively. The SLs values obtained by Algorithm 2 match with the TSLs values within tolerance error . The numerical results in Table 4 demonstrate the efficacy of Algorithm 2 in selecting multiple regularization parameters to achieve desired sparsity levels of the solution.
6.4 Fused SVM
The goal of this experiment is to validate the parameter choice of the fused SVM model (7) implemented by Algorithm 2. In model (7), the fidelity term is not differentiable and the transform matrix does not have full row rank.
The dataset utilized for this experiment is the set of handwritten digits sourced from the Modified National Institute of Standards and Technology (MNIST) database [13]. The original MNIST database consists of 60000 training samples and 10000 testing samples of the digits ‘0’ through ‘9’. We consider the binary classification problem with two digits ‘7’ and ‘9’, by taking 8141 training samples and 2037 testing samples of these two digits from the database. Let be the number of training samples and be the labels of training data in which -1 and 1 represent the digits ‘7’ and ‘9’, respectively. In addition, let be the number of pixels in each sample.
We implement Algorithm 2 to select multiple regularization parameters with which the resulting solution of model (7) achieves given TSLs. By choosing the real number and the matrices , , in the same way as in Subsection 6.3, we solve model (13) by Algorithm (84) where , , is the identity matrix of order and is the first order difference matrix. It is worth noting that the proximity operator of function cannot be expressed explicitly. Instead, we solve model (13) by Algorithm (84) with and being replaced by and , respectively, and obtain at step k three vectors , and . Then the vectors , and emerging in Algorithm 2 can be obtained by , and . Again by Theorem 5.5, to guarantee the convergence of Algorithm (84), we choose the real numbers , and in Algorithm (84) satisfying . The closed-form formulas of the proximity operators , and are given in (85), (91) and (94), respectively.
We set five prescribed TSLs values , , , and and use Algorithm 2 with to select the regularization parameters , . The targeted sparsity levels , the initial values of , the selected values of parameter chosen by Algorithm 2, the actual sparsity levels of , the Ratios values of , the numbers of iterations for , the accuracy on the training datasets () and the accuracy on the testing datasets () are reported in Table 5. Algorithm 2 meets the stopping criteria within 10, 13, 15, 15 and 8 iterations for the five TSLs values, respectively. The SLs values obtained by Algorithm 2 match with the TSLs values within tolerance error . These results validate the effectiveness of Algorithm 2 for obtaining multiple regularization parameters leading to a solution with desired sparsity levels.
7 Conclusion
In this paper, we have explored strategies for selecting multiple regularization parameters in the multi-parameter regularization model with the ℓ1 norm. We established the relationship between the regularization parameters and the sparsity of the regularized solution under transform matrices. Leveraging this relationship, we developed an iterative algorithm to determine multiple regularization parameters, ensuring that the resulting regularized solution achieves prescribed sparsity levels under the transform matrices. For scenarios where the fidelity term is nondifferentiable and the transform matrix lacks full row rank, we introduced a fixed-point proximity algorithm capable of simultaneously determining the regularized solution and two auxiliary vectors arising in the sparsity characterization. This algorithm served as a critical component in constructing the iterative parameter selection scheme. Finally, numerical experiments demonstrated the effectiveness of the proposed multi-parameter selection strategies, confirming their ability to produce solutions with the desired sparsity and performance.
Acknowledgments. Q. Liu is supported in part by the Doctor Foundation of Henan University of Technology, China (No.2023BS061), and the Innovative Funds Plan of Henan University of Technology (No.2021ZKCJ11). R. Wang is supported in part by the Natural Science Foundation of China under grant 12171202. Y. Xu is supported in part by the US National Science Foundation under grant DMS-2208386, and the US National Institutes of Health under grant R21CA263876.
Appendix A Proof of Theorem 5.5
In this appendix, we present a complete proof for Theorem 5.5, which shows the convergence of Algorithm (69).
We start with reviewing the notion of weakly firmly non-expansive operators introduced in [14]. An operator is called weakly firmly non-expansive with respect to a set of matrices if for any satisfying for there holds
(74) |
The graph of operator is defined by
We say the graph of is a closed set if for any sequence converging to , there holds . Following [14], we also need the notion of Condition-. We say a set of matrices satisfies Condition- if the following three hypotheses are satisfied: (i) ; (ii) is in ; (iii) .
The next result established in [14] demonstrates that if an operator with closed graph is weakly firmly non-expansive with respect to a set of matrices satisfying Condition-, then a sequence generated by
(75) |
converges to a fixed point of , that is, .
Lemma A.1.
Suppose that a set of matrices satisfies Condition-, the operator is weakly firmly non-expansive with respect to , the set of fixed-points of is nonempty and . If the sequence is generated by (75) for any given , , then converges. Moreover, if the graph of is closed, then converges to a fixed-point of .
Below, we establish the convergence of Algorithm (69) by employing Lemma A.1. For this purpose, we construct a new operator from operator . Let , , be matrices defined by (63), (64) and (65), respectively. Associated with the set , we define an operator for any by with satisfying
(76) |
We note that the operator is well-defined. To see this, we decompose any as with and . By using representations (66),(67) and (68), we rewrite equation (76) as
(77) |
Clearly, for any , there exists a unique satisfying equation (77).
From equation (76), we observe that operator has the same fixed-point set as . With the help of operator , we represent equation (62) in an explicit form as
(78) |
and obtain a fixed-point of by this iteration. According to Lemma A.1, in order to obtain the convergence of iterative scheme (78), it suffices to prove that operator defined by (76) is weakly firmly non-expansive with respect to and the graph of is closed, and in addition, the set satisfies Condition-. We first show the properties of operator . For this purpose, we introduce a skew-symmetric matrix as
and then represent matrix as .
Proposition A.2.
Proof.
We first prove that is weakly firmly non-expansive with respect to . It suffices to prove that for any satisfying for there holds equation (74). It follows from definition (76) of that
(79) |
By arguments similar to those used in the proof of Lemma 3.1 of [14], we have that operator is firmly non-expansive with respect to , that is, for all , ,
As a result, we get by equation (79) that
Substituting into the right hand side of the inequality above, we obtain that
This together with the fact that leads to equation (74). Consequently, we conclude that is weakly firmly non-expansive with respect to .
It remains to show the closedness of the graph of operator . For any sequence converging to , we obtain from definition (76) of that
Associated with , we introduce a vector as
Since is firmly non-expansive with respect to , there holds for any
By letting and noting that and as , we get . As a result, . This together with the definition of leads directly to . Therefore, the graph of is closed. ∎
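The firm non-expansiveness invoked in the proof above is a standard property of proximity operators. The sketch below verifies the unweighted form of the defining inequality numerically for the proximity operator of the ℓ1 norm (soft thresholding), which serves only as a generic stand-in for the operators appearing here.

```python
# Numerical check of firm non-expansiveness of a proximity operator T:
#   ||T(x) - T(y)||^2 <= <T(x) - T(y), x - y>   for all x, y.
# Here T is soft thresholding, the proximity operator of 0.3*||.||_1.
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    dT = soft_threshold(x, 0.3) - soft_threshold(y, 0.3)
    assert dT @ dT <= dT @ (x - y) + 1e-12   # firm non-expansiveness holds
```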
The next proposition reveals that the set satisfies Condition-.
Proposition A.3.
Proof.
It is clear that , that is, Item (i) of Condition- holds. To show the validity of Item (ii), we set . By introducing block matrices and as in (70), we represent as
It follows from condition (71) that
(80)
which, by Lemma 6.2 in [14], is equivalent to . Hence, Item (ii) of Condition- holds.
It remains to verify Item (iii) of Condition-. Lemma 6.2 in [14] ensures that if inequality (80) holds, then the norm of matrix can be estimated by
(81)
We observe that
which yields that
(82)
It follows that
(83)
Substituting inequality (81) and equation (82) into inequality (83) yields that
which together with condition (72) further leads to Item (iii) of Condition-. ∎
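In practice, verifying a condition such as (72) involves estimating the norm of a matrix built from the algorithm's parameters, as in (81)-(83). The sketch below, with a hypothetical placeholder matrix E standing in for the matrix whose norm is estimated, shows how such a bound can be checked numerically.

```python
# A sketch of a numerical check of a norm bound of the kind appearing in
# (81)-(83); E is a hypothetical placeholder, not the matrix of the paper.
import numpy as np

rng = np.random.default_rng(1)
E = 0.4 * rng.normal(size=(6, 6)) / np.sqrt(6)   # placeholder matrix
spectral_norm = np.linalg.norm(E, 2)             # largest singular value of E
print(spectral_norm, spectral_norm < 1.0)        # bound of the form ||E|| < 1
```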
Combining Lemma A.1 with Propositions A.2 and A.3, we are now ready to present the proof of Theorem 5.5.
Proof of Theorem 5.5: Proposition A.2 ensures that is weakly firmly non-expansive with respect to and that the graph of is closed. Moreover, Proposition A.3 guarantees that the set satisfies Condition-. Hence the hypotheses of Lemma A.1 are satisfied, and the sequence , generated by (78) for any given , , converges to a fixed-point of . Since Algorithm (69) has the equivalent form (78) and operator has the same fixed-point set as , we conclude that the sequence , generated by Algorithm (69) for any given , converges to a fixed-point of operator . ∎
Appendix B The closed-form formulas of proximity operators
In this appendix, we provide the closed-form formulas for the proximity operators corresponding to the functions utilized in Algorithm (69).
As in the numerical experiments, the real number and the matrices , , are chosen as
with , and being positive real numbers. It is known [3] that for any convex function from to and
Accordingly, Algorithm (69) reduces to
(84)
We consider computing the three proximity operators in (84) explicitly. We begin with the proximity operator with , , and , being defined by (12).
Proposition B.1.
If , , and , being defined by (12), then the proximity operator at with and has the form
(85)
where for each and
(86)
Proof.
For each and each , we set with and . It follows from definition (53) of the proximity operator that
which further leads to
As a result, we have that
(87)
and
(88)
According to Examples 2.3 and 2.4 in [20] and noting that and , , we obtain from equation (87) that
That is, equation (85) holds. Moreover, equation (88) leads directly to . ∎
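For reference, the componentwise closed forms that typically arise in results of this kind are the standard proximity operators of the ℓ1 norm (soft thresholding) and of the Euclidean norm (block soft thresholding). The sketch below records these standard formulas only; the exact expressions (85) and (86) additionally involve the quantities defined in (12).

```python
# Standard closed-form proximity operators often used as building blocks:
#   prox of lam*||.||_1 : componentwise soft thresholding;
#   prox of lam*||.||_2 : block (group) soft thresholding.
import numpy as np

def prox_l1(x, lam):
    # argmin_u lam*||u||_1 + 0.5*||u - x||^2
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_l2(x, lam):
    # argmin_u lam*||u||_2 + 0.5*||u - x||^2
    nx = np.linalg.norm(x)
    return np.zeros_like(x) if nx <= lam else (1.0 - lam / nx) * x

print(prox_l1(np.array([1.5, -0.2, 0.7]), 0.5))  # [ 1.  -0.   0.2]
print(prox_l2(np.array([3.0, 4.0]), 1.0))        # scales the vector by 0.8
```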
We also consider the special case that . In this case, we present a closed-form formula for the proximity operator with , and .
Corollary B.2.
The next closed-form formula concerns the proximity operator with and . For a matrix , we denote by the pseudoinverse of .
Proposition B.3.
If and , then the proximity operator at with and has the form
Proof.
According to definition (53) of the proximity operator, we obtain for each and each that with and , where
By noting that , we rewrite the above equation as
(89)
and
(90)
Equation (89) shows that is the best approximation to from the subspace . Hence, we get that with vector satisfying
By rewriting the above equation as
we have that vector is a solution of the linear system
By using the pseudoinverse of , we represent as
with satisfying . Note that if and only if . Thus, . Moreover, we can obtain directly from equation (90) that . ∎
In the case when , the subspace reduces to . Since has full column rank, the matrix is nonsingular. As a direct consequence of Proposition B.3, we describe the closed-form formula of the proximity operator for this special case as follows.
Corollary B.4.
If matrix has full column rank, and , then the proximity operator at has the form
(91)
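Corollary B.4 is, in essence, an orthogonal-projection formula. The following is a minimal sketch under the assumption that the subspace in question is the range of a full-column-rank matrix A (a hypothetical stand-in for the matrix of the corollary), in which case the projector is A(AᵀA)⁻¹Aᵀ and the best-approximation property of equation (89) can be checked directly.

```python
# Orthogonal projection onto range(A) for a full-column-rank matrix A,
# as a generic illustration of the projection formula in Corollary B.4.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 3))                # full column rank (generically)
P = A @ np.linalg.solve(A.T @ A, A.T)      # projector A (A^T A)^{-1} A^T
z = rng.normal(size=8)
p = P @ z
# best-approximation property: the residual is orthogonal to the subspace
assert np.allclose(A.T @ (z - p), 0.0)
assert np.allclose(P @ p, p)               # P is idempotent
```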
Finally, we give closed-form formulas for the proximity operators of some loss functions, which will be used in numerical experiments. The first loss function is the norm composed with a matrix.
Proposition B.5.
If is an matrix, and , then the proximity operator at has the form
Proof.
By setting , we obtain from definition (53) of the proximity operator that
which together with the Fermat rule leads to
By rewriting the above equation as
and noting that is nonsingular, we obtain that
which completes the proof of this proposition. ∎
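As a concrete illustration of Proposition B.5, the following sketch assumes that the loss is the squared ℓ2 data-fidelity term ψ(x) = ½‖Mx − y‖² with a matrix M and data y (an assumption, since the precise loss is specified earlier in the paper); in that case the Fermat rule reduces to a linear system with a nonsingular coefficient matrix, mirroring the structure of the proof above.

```python
# Sketch of the closed form, assuming psi(x) = 0.5*||M x - y||^2.  The Fermat
# rule 0 = lam*M^T (M u - y) + (u - x) gives (I + lam*M^T M) u = x + lam*M^T y.
import numpy as np

def prox_quadratic_fidelity(x, M, y, lam):
    n = M.shape[1]
    A = np.eye(n) + lam * (M.T @ M)        # nonsingular since M^T M >= 0
    return np.linalg.solve(A, x + lam * (M.T @ y))

rng = np.random.default_rng(3)
M, y, x = rng.normal(size=(5, 3)), rng.normal(size=5), rng.normal(size=3)
u = prox_quadratic_fidelity(x, M, y, lam=0.7)
# optimality check: gradient of lam*psi(u) + 0.5*||u - x||^2 vanishes at u
assert np.allclose(0.7 * M.T @ (M @ u - y) + (u - x), 0.0)
```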
Two special cases of Proposition B.5 are used in numerical experiments. In Subsection 6.1, the loss function is chosen as . According to Proposition B.5, the proximity operator at has the form with , where
(92)
In Subsection 6.3, the loss function is defined by (6) and the proximity operator at can be represented by
(93)
References
- [1] Afonso M V, Bioucas-Dias J M and Figueiredo M A T 2010 An augmented Lagrangian approach to linear inverse problems with compound regularization IEEE International Conference on Image Processing 4169–72
- [2] Argyriou A, Micchelli C A, Pontil M, Shen L and Xu Y 2011 Efficient first order methods for linear composite regularizers (arXiv:1104.1436)
- [3] Bauschke H H and Combettes P L 2011 Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer, New York)
- [4] Belkin M, Niyogi P and Sindhwani V 2006 Manifold regularization: a geometric framework for learning from labeled and unlabeled examples J. Mach. Learn. Res. 7 2399–434
- [5] Björck Å 1996 Numerical methods for least squares problems (Philadelphia, PA: SIAM)
- [6] Brezinski C, Redivo-Zaglia M, Rodriguez G and Seatzu S 2003 Multi-parameter regularization techniques for ill-conditioned linear system Numer. Math. 94 203–28
- [7] Byrne C 2003 A unified treatment of some iterative algorithms in signal processing and image reconstruction Inverse Problems 20 103
- [8] Chen Z, Lu Y, Xu Y and Yang H 2008 Multi-parameter Tikhonov regularization for ill-posed operator equations J. Comput. Math. 26 37–55
- [9] Ding L and Han W 2019 - regularization for sparse recovery Inverse Problems 35 125009
- [10] Düvelmeyer D and Hofmann B 2006 A multi-parameter regularization approach for estimating parameters in jump diffusion process J. Inverse Ill-posed Probl. 14 861–80
- [11] Friedman J, Hastie T, Höfling H and Tibshirani R 2007 Pathwise coordinate optimization Ann. Appl. Stat. 1 302–32
- [12] Horn R A and Johnson C R 2012 Matrix Analysis (Cambridge: Cambridge University Press)
- [13] LeCun Y, Bottou L, Bengio Y and Haffner P 1998 Gradient-based learning applied to document recognition Proc. IEEE 86 2278–324
- [14] Li Q, Shen L, Xu Y and Zhang N 2015 Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing Adv. Comput. Math. 41 387–422
- [15] Li Q, Xu Y and Zhang N 2017 Two-step fixed-point proximity algorithms for multi-block separable convex problems J. Sci. Comput. 70 1204–28
- [16] Li Z, Song G and Xu Y 2019 A two-step fixed-point proximity algorithm for a class of non-differentiable optimization models in machine learning J. Sci. Comput. 81 923–40
- [17] Liu Q, Wang R, Xu Y and Yan M 2023 Parameter Choices for Sparse Regularization with the Norm Inverse Problems 39 025004
- [18] Lu S and Pereverzev S V 2013 Regularization Theory for Ill-Posed Problems. Selected Topics (De Gruyter, Berlin, Boston)
- [19] Lu Y, Shen L and Xu Y 2007 Multi-parameter regularization methods for high-resolution image reconstruction with displacement errors IEEE Trans. Circuits Systems I 54 1788–99
- [20] Micchelli C A, Shen L and Xu Y 2011 Proximity algorithms for image models: denoising Inverse Problems 27 045009
- [21] Rapaport F, Barillot E and Vert J P 2008 Classification of arrayCGH data using fused SVM Bioinformatics 24 i375–82
- [22] Selesnick I W, Graber H L, Pfeil D S and Barbour R L 2014 Simultaneous low-pass filtering and total variation denoising IEEE Trans. Image Process. 62 1109–24
- [23] Shen L, Wang R, Xu Y and Yan M 2024 Sparse Deep Learning Models with the Regularization (arXiv:2408.02801)
- [24] Showalter R E 1997 Monotone Operators in Banach Spaces and Nonlinear Partial Differential Equations (American Mathematical Society, Providence)
- [25] Tibshirani R 1996 Regression shrinkage and selection via the lasso J. Roy. Statist. Soc. Ser. B 58 267–88
- [26] Tibshirani R, Saunders M, Rosset S, Zhu J and Knight K 2005 Sparsity and smoothness via the fused lasso J. R. Stat. Soc. B 67 91–108
- [27] Tolosi L and Lengauer T 2011 Classification with correlated features: unreliability of feature ranking and solutions Bioinformatics 27 1986–94
- [28] Wang W, Lu S, Mao H and Cheng J 2013 Multi-parameter Tikhonov regularization with the sparsity constraint Inverse Problems 29 065018
- [29] Xu Y 2023 Sparse regularization with the norm Anal. Appl. 21 901–29
- [30] Zălinescu C 2002 Convex Analysis in General Vector Spaces (River Edge, NJ: World Scientific)