
[2] Xinmin Yang

[1] College of Mathematics, Sichuan University, Chengdu 610065, China

[2] School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China

General inertial smoothing proximal gradient algorithm for the relaxation of matrix rank minimization problem

Abstract

We consider the exact continuous relaxation model of the matrix rank minimization problem proposed by Yu and Zhang (Comput. Optim. Appl. 1-20, 2022). Motivated by the inertial technique, we propose a general inertial smoothing proximal gradient algorithm (GIMSPG) for this class of problems. It is shown that the singular values of any accumulation point have a common support set and that the nonzero singular values have a unified lower bound. Moreover, the zero singular values of the accumulation point are identified within finitely many iterations. We further prove that any accumulation point of the sequence generated by the GIMSPG algorithm is a lifted stationary point of the continuous relaxation model under a flexible parameter constraint. Finally, we carry out numerical experiments on random data and image data to illustrate the efficiency of the GIMSPG algorithm.

keywords:
Smoothing approximation, Proximal gradient method, Inertial, Rank minimization problem

1 Introduction

In recent years, much work has been devoted to the matrix rank minimization problem, which arises in many applications, especially in computer vision Zheng_Y and matrix completion Recht. Many models, methods and their variants have been studied; one can refer to the literature Ma_S; Cai_J; Mesbahi; Lai_M; Fornasier_M; Ma_T; Ji_S; Lu_Z; He_Y; Zhao_Q. In detail, the matrix rank minimization problem can be represented as

\min_{X\in\mathbb{R}^{m\times n}}\ \mathcal{F}_{\ell_{0}}(X):=f(X)+\lambda\cdot\mathrm{rank}(X),  (1)

where $\mathrm{rank}(X)$ denotes the number of nonzero singular values of $X$.

In this paper, we concentrate on the exact continuous relaxation of the matrix rank minimization problem proposed in Yu_Q_Zhang_X, that is, the following nonconvex relaxation problem,

\min_{X\in\mathbb{R}^{m\times n}}\ \mathcal{F}(X):=f(X)+\lambda\Phi(X),  (2)

where $m\geq n$, $f:\mathbb{R}^{m\times n}\to[0,\infty)$ is convex and not necessarily smooth, and $\lambda$ is a positive parameter. $\Phi(X)=\sum_{i=1}^{n}\phi(\sigma_{i}(X))$ is an exact continuous relaxation of the rank penalty. The capped-$\ell_{1}$ function $\phi$ with given $v>0$ is

\phi(t)=\min\{1,\,t/v\},\quad t\geq 0.  (3)

It is observed that $\phi$ in (3) can be seen as a DC (difference of convex) function, that is,

\phi(t)=\frac{t}{v}-\max\{\theta_{1}(t),\theta_{2}(t)\},  (4)

with $\theta_{1}(t)=0$, $\theta_{2}(t)=t/v-1$ for $t\geq 0$, and $\mathcal{D}(t)=\{i\in\{1,2\}:\theta_{i}(t)=\max\{\theta_{1}(t),\theta_{2}(t)\}\}$. When the matrix $X$ is diagonal, problem (2) reduces to the exact continuous relaxation model of the $\ell_{0}$ regularization problem given in Bian_W_Chen_X, i.e.,

\min_{x\in\mathbb{R}^{n}}\ \mathcal{F}(x):=f(x)+\lambda\Phi(x),  (5)

where $\Phi(x)=\sum_{i=1}^{n}\phi(x_{i})$ and $\phi$ is defined as $\phi(t)=\min\{1,|t|/v\}$ for $t\in\mathbb{R}$. In Bian_W_Chen_X, an efficient smoothing proximal gradient method was proposed to solve this kind of problem with box constraints.
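To make the relaxation concrete, the following is a minimal NumPy sketch of the capped-$\ell_{1}$ penalty $\Phi$ used in (5); the function name and the use of NumPy are our own illustrative choices, not part of the original paper.

```python
import numpy as np

def capped_l1_penalty(x, v):
    """Phi(x) = sum_i min{1, |x_i| / v}: the exact continuous relaxation
    of the l0 penalty in (5), applied entrywise and summed."""
    return float(np.sum(np.minimum(1.0, np.abs(x) / v)))
```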

In Zhang_J, the authors established the smoothing proximal gradient method with extrapolation (SPGE) for solving (5), where the extrapolation coefficients may satisfy $\sup\beta_{k}=1$. Further, under a stricter condition on the extrapolation coefficients, which covers the coefficients of sFISTA with fixed restart ODonoghue, it is proved that any accumulation point of the sequence generated by SPGE is a lifted stationary point of (5). Besides, a convergence rate based on the proximal residual is developed. Since the penalty term of problem (2) is nonconvex for a fixed $d$ in (12), the framework of the SPGE algorithm in Zhang_J cannot be directly applied to the matrix case.

As is well known, incorporating an inertial term, also called an extrapolation term, into the proximal gradient method is a popular technique to improve its efficiency for solving the following composite optimization problem,

\min_{x\in\mathbb{R}^{n}}\ \mathcal{F}(x)=f(x)+g(x),  (6)

where $f:\mathbb{R}^{n}\to\mathbb{R}$ is smooth with Lipschitz continuous gradient and possibly nonconvex, and $g:\mathbb{R}^{n}\to(-\infty,+\infty]$ is proper, closed and convex. This composite structure, with one smooth term and one convex term, makes the proximal gradient method Parikh_N_Boyd_S widely used. Specifically, the updating scheme reads:

x^{k+1}=\mathrm{prox}_{t_{k}g}\big(x^{k}-t_{k}\nabla f(x^{k})\big),

where $t_{k}$ is the stepsize. Throughout this paper, the proximal mapping Parikh_N_Boyd_S of $\lambda g$ is defined as

\mathrm{prox}_{\lambda g}(u):=\arg\min_{x\in\mathbb{R}^{n}}\left\{g(x)+\frac{1}{2\lambda}\|x-u\|^{2}\right\},

where $\lambda>0$ and $u\in\mathbb{R}^{n}$. Accelerated proximal gradient methods for convex optimization problems have been well studied Nesterov_Y0; Nesterov_Y1. In particular, Beck and Teboulle Beck_A_Teboulle_M established the remarkable Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) for the convex case, based on Nesterov's method Nesterov_Y1; Nesterov_Y. Several variants of FISTA have been developed by choosing appropriate extrapolation parameters; we refer the readers to Chambolle_A_Dossal_C; Liang_J_and_C_B and the references therein. In Wu_Z_Li_M, under proper parameter constraints, Wu and Li established a proximal gradient algorithm with different extrapolation (PGe) coefficients on the proximal step and the gradient step for solving problem (6) when the smooth component is nonconvex and the nonsmooth component is convex. The general framework is given as follows:

\begin{aligned}
y^{k}&=x^{k}+\alpha_{k}(x^{k}-x^{k-1}),\\
z^{k}&=x^{k}+\beta_{k}(x^{k}-x^{k-1}),\\
x^{k+1}&=\arg\min_{x\in\mathbb{R}^{n}}\left\{g(x)+\langle\nabla f(z^{k}),x-y^{k}\rangle+\frac{1}{2\lambda_{k}}\|x-y^{k}\|^{2}\right\},
\end{aligned}  (7)

where $\alpha_{k},\beta_{k}$ are the extrapolation coefficients and $\lambda_{k}$ is the stepsize; a minimal sketch of one such iteration is given below. In Wu_zhongming, the authors further developed an inertial Bregman proximal gradient method for minimizing the sum of two possibly nonconvex functions. In their method, a generalized Bregman distance replaces the Euclidean distance, and two different inertial terms are used on the proximal step and the gradient step, respectively.
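As a reading aid, the following sketch shows one iteration of the scheme (7); `grad_f` and `prox_g` are assumed user-supplied handles for $\nabla f$ and the proximal mapping of $g$, and the names are ours.

```python
import numpy as np

def pge_step(x, x_prev, grad_f, prox_g, alpha_k, beta_k, lam_k):
    """One iteration of (7): different extrapolations on the proximal point y^k
    and the gradient point z^k, followed by a proximal step of g."""
    y = x + alpha_k * (x - x_prev)   # proximal-step extrapolation
    z = x + beta_k * (x - x_prev)    # gradient-step extrapolation
    # argmin_x { g(x) + <grad f(z), x - y> + ||x - y||^2 / (2 lam) } = prox_{lam g}(y - lam grad f(z))
    return prox_g(y - lam_k * grad_f(z), lam_k)
```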

For the matrix case, Toh and Yun Toh_K_C_Yun_S proposed an accelerated proximal gradient algorithm for the following convex problem:

\min_{X\in\mathbb{R}^{m\times n}}\ F(X)=f(X)+g(X),  (8)

where $f$ is convex and smooth with Lipschitz continuous gradient and $g$ is a proper closed convex function. This problem arises in applications such as matrix completion Toh_K_C_Yun_S, multi-task learning A_Argyriou and principal component analysis (PCA) Mesbahi; Xu_H. When the matrix is diagonal, the problem reduces to the vector optimization problem (6).

Since optimization problem (2) is a matrix generalization of the vector optimization problem (5), it is natural to explore the possibility of extending some of the algorithms developed for problem (5) to solve problem (2). However, the methods above require the restrictive assumption that the loss function $f$ is smooth, so it is difficult to apply the proximal gradient method directly to problem (2). Fortunately, the smoothing approximation method proposed in Chen_X can overcome this difficulty. The smoothing approximation method is efficient and widely used in other problems; one can refer to Zhang_C_Chen_X; Wu_F_Bian_W_Xue_X and the references therein. In particular, in Wei_B, the author proposed the smoothing fast iterative shrinkage-thresholding algorithm (sFISTA) for the convex problem in the vector case with the following extrapolation coefficients, i.e.,

\alpha_{k}=\beta_{k}=\frac{t_{k}-1}{t_{k+1}},\qquad t_{k+1}=\frac{1+\sqrt{1+4t_{k}^{2}\frac{\mu_{k}}{\mu_{k+1}}}}{2},  (9)

where $t_{0}=1$ and $\mu_{0}\in(0,1)$. A global convergence rate of $O(\ln k/k)$ on the objective function was established.
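The coefficient update in (9) can be written as the following small helper (a sketch; the function name is ours):

```python
def sfista_coefficient(t_k, mu_k, mu_next):
    """Return (alpha_k, t_{k+1}) from (9); mu_next denotes mu_{k+1}."""
    t_next = (1.0 + (1.0 + 4.0 * t_k ** 2 * mu_k / mu_next) ** 0.5) / 2.0
    return (t_k - 1.0) / t_next, t_next
```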

Motivated by the smoothing method and the general acceleration technique in (7), we propose a general inertial smoothing proximal gradient method (GIMSPG) for the matrix case. Thanks to the special structure of the penalty term of (2), although the subproblem in the GIMSPG algorithm is nonconvex, it has a closed-form solution. Most recently, Li and Bian li_wenjing studied a class of sparse group $\ell_{0}$ regularized optimization problems and gave an exact continuous relaxation model. They proposed difference-of-convex (DC) algorithms for solving the relaxation model, which has a DC structure, and showed that the zero elements of the accumulation point are identified within finitely many iterations. Inspired by this, under certain conditions on the parameters, we prove that the zero singular values of any accumulation point of the iterates generated by the GIMSPG algorithm have the same locations and that the nonzero singular values are bounded below by a fixed positive value. Besides, the GIMSPG algorithm identifies the zero singular values of any accumulation point within finitely many iterations. Furthermore, we show that any accumulation point of the sequence generated by the GIMSPG algorithm is a lifted stationary point of the continuous relaxation problem (2).

The outline of this paper is as follows. Some preliminaries are presented in Section 2. Then we establish the GIMSPG algorithm framework and discuss its convergence in Section 3. In Section 4, we conduct numerical experiments to illustrate the efficiency of the proposed GIMSPG algorithm. Finally, we draw some conclusions in Section 5.

Notations. Let $\mathbb{R}^{m\times n}$ be the space of $m\times n$ matrices with the standard inner product, i.e., for any $X,Y\in\mathbb{R}^{m\times n}$ ($m\geq n$), $\langle X,Y\rangle=\mathrm{tr}(X^{T}Y)$, where $\mathrm{tr}(X^{T}Y)$ is the trace of the matrix $X^{T}Y$. For any matrix $X\in\mathbb{R}^{m\times n}$, the Frobenius norm and the nuclear norm are denoted by $\|X\|=\|X\|_{F}=\sqrt{\mathrm{tr}(X^{T}X)}$ and $\|X\|_{*}=\sum_{i=1}^{n}\sigma_{i}(X)$, respectively, where $\sigma(X):=(\sigma_{1}(X),\ldots,\sigma_{n}(X))^{T}$ is the vector of singular values of $X$ with $\sigma_{1}(X)\geq\sigma_{2}(X)\geq\cdots\geq\sigma_{n}(X)\geq 0$. Denote $\mathcal{A}(\sigma(X))=\{i\in\{1,2,\ldots,n\}:\sigma_{i}(X)\neq 0\}$ and $\|X\|_{1}=\|\mathrm{vec}(X)\|_{1}$, where $\mathrm{vec}(X)$ is the vectorization of the matrix $X$. Denote by $\partial f(X)$ the Clarke subdifferential at $X\in\mathbb{R}^{m\times n}$ of a locally Lipschitz continuous function $f:\mathbb{R}^{m\times n}\to\mathbb{R}$. $\mathscr{D}(x)$ is the diagonal matrix generated by a vector $x$, and $E_{i}=\mathscr{D}(e_{i})$, where $e_{i}$ is the unit vector whose $i$-th element is 1. Denote by $\mathbb{Q}^{n}$ the set of $n\times n$ orthogonal matrices and by $\mathbb{S}^{n}$ the set of real symmetric $n\times n$ matrices. For a vector $x\in\mathbb{R}^{n}$, $\|x\|_{1}=\sum_{i=1}^{n}|x_{i}|$. Denote $\mathbb{N}=\{0,1,2,\ldots\}$ and $\mathbb{D}^{n}=\{d\in\mathbb{R}^{n}:d_{i}\in\{1,2\},\,i=1,2,\ldots,n\}$.

2 Preliminaries

In this section, we first recall some definitions and preliminary results which form the basis for the rest of the discussion. First, we give the form of the Clarke subdifferential of $\Phi$ at $X\in\mathbb{R}^{m\times n}$, following A_S_Lewis, that is,

\partial\Phi(X)=\Big\{U\mathscr{D}(x)V^{T}:x\in\partial\sum_{i=1}^{n}\phi(\sigma_{i}(X)),\ (U,V)\in\mathcal{M}(X)\Big\},

where $\mathcal{M}(X)=\{(U,V)\in\mathbb{Q}^{m\times m}\times\mathbb{Q}^{n\times n}:U^{T}U=V^{T}V=I,\ X=U\mathscr{D}(\sigma(X))V^{T}\}$.

Definition 1.

(Lifted stationary point Yu_Q_Zhang_X) We say that $X\in\mathbb{R}^{m\times n}$ is a lifted stationary point of (2) if there exist $d_{i}\in\mathcal{D}(\sigma_{i}(X))$ for $i=1,2,\ldots,n$ such that

\lambda\sum_{i=1}^{n}\theta^{\prime}_{d_{i}}(\sigma_{i}(X))E_{i}\in\Big\{U\partial f(X)V^{T}+\frac{\lambda}{v}\mathscr{D}\big(\partial\|x\|_{1}\big|_{x=\sigma(X)}\big):(U,V)\in\mathcal{M}(X)\Big\},  (10)

where $\sigma_{i}(X)$ is the $i$-th largest singular value of $X$.

Assumption 1.

$f$ is Lipschitz continuous on $\mathbb{R}^{m\times n}$ with Lipschitz constant $L_{f}$.

Assumption 2.

The parameter $v$ in (3) satisfies $0<v<\bar{v}:=\frac{\lambda}{L_{f}}$.

Assumption 3.

$\mathcal{F}$ in (2) (or $\mathcal{F}_{\ell_{0}}$ in (1)) is level bounded on $\mathbb{R}^{m\times n}$.

Lemma 1.

(Yu_Q_Zhang_X) If $\bar{X}$ is a lifted stationary point of (2), then the vector $d^{\bar{X}}=(d_{1}^{\bar{X}},\ldots,d_{n}^{\bar{X}})\in\prod_{i=1}^{n}\mathcal{D}(\sigma_{i}(\bar{X}))$ satisfying (10) is unique. In particular, for $i=1,2,\ldots,n$, it holds that

d_{i}^{\bar{X}}=\begin{cases}1,&\text{if }\sigma_{i}(\bar{X})<v,\\ 2,&\text{if }\sigma_{i}(\bar{X})\geq v.\end{cases}  (11)
Lemma 2.

(Yu_Q_Zhang_X) $\bar{X}$ is a global minimizer of (2) if and only if it is a global minimizer of (1), and the two objective functions have the same value at $\bar{X}$.

Lemma 3.

(Yu_Q_Zhang_X) For a given $W\in\mathbb{R}^{m\times n}$ and $\tau>0$, let $U\mathscr{D}(w)V^{T}$ be the singular value decomposition of $W$ and $\hat{x}=\mathrm{prox}_{\tau\Phi^{d}}(w)$. Then $\hat{X}=U\mathscr{D}(\hat{x})V^{T}$ is an optimal solution of the problem

\min_{X}\Big\{\tau\Phi^{d}(X)+\frac{1}{2}\|X-W\|^{2}\Big\}.

3 The general inertial smoothing proximal gradient algorithm and its convergence analysis

In order to overcome the nondifferentiability of the convex loss function $f$ in (2), we use a smoothing function defined as in Yu_Q_Zhang_X to approximate the convex function $f$.

Definition 2.

We call $\tilde{f}:\mathbb{R}^{m\times n}\times[0,\bar{\mu}]\to\mathbb{R}$ with $\bar{\mu}>0$ a smoothing function of the function $f$ in (2) if $\tilde{f}(\cdot,\mu)$ is continuously differentiable on $\mathbb{R}^{m\times n}$ for any fixed $\mu>0$ and the following conditions hold:

  (i) $\lim_{Z\to X,\mu\downarrow 0}\tilde{f}(Z,\mu)=f(X)$ for all $X\in\mathbb{R}^{m\times n}$;

  (ii) $\tilde{f}(X,\mu)$ is convex with respect to $X$ for any fixed $\mu>0$;

  (iii) $\{\lim_{Z\to X,\mu\downarrow 0}\nabla_{Z}\tilde{f}(Z,\mu)\}\subseteq\partial f(X)$ for all $X\in\mathbb{R}^{m\times n}$;

  (iv) there exists a positive constant $\kappa$ such that

    |\tilde{f}(X,\mu_{2})-\tilde{f}(X,\mu_{1})|\leq\kappa|\mu_{1}-\mu_{2}|,\quad\forall X\in\mathbb{R}^{m\times n},\ \mu_{1},\mu_{2}\in[0,\bar{\mu}],

    and in particular,

    |\tilde{f}(X,\mu)-f(X)|\leq\kappa\mu,\quad\forall X\in\mathbb{R}^{m\times n},\ 0<\mu\leq\bar{\mu};

  (v) there exists a constant $\tilde{L}>0$ such that for any $\mu\in(0,\bar{\mu}]$, $\nabla_{X}\tilde{f}(\cdot,\mu)$ is Lipschitz continuous with Lipschitz constant $\tilde{L}\mu^{-1}$.

Denote

\Phi^{d}(X):=\sum_{i=1}^{n}\Big(\frac{\sigma_{i}(X)}{v}-\theta_{d_{i}}(\sigma_{i}(X))\Big),\quad d=(d_{1},d_{2},\ldots,d_{n})^{T}\in\mathbb{D}^{n}.  (12)

It can easily be seen that

\Phi(X)=\min_{d\in\mathbb{D}^{n}}\Phi^{d}(X)\quad\forall X\in\mathbb{R}^{m\times n},  (13)

and $\Phi(\bar{X})=\Phi^{d^{\bar{X}}}(\bar{X})$ for fixed $\bar{X}$, where $d^{\bar{X}}$ is defined in (11).

For convenience of the discussion, we introduce the notations

\widetilde{\mathcal{F}}^{d}(X,\mu):=\tilde{f}(X,\mu)+\lambda\Phi^{d}(X)\quad\text{and}\quad\widetilde{\mathcal{F}}(X,\mu):=\tilde{f}(X,\mu)+\lambda\Phi(X),

where $\tilde{f}$ is a smoothing function of $f$, $\mu>0$, and $d\in\mathbb{D}^{n}$. By the formulation (13), we have

\widetilde{\mathcal{F}}^{d}(X,\mu)\geq\widetilde{\mathcal{F}}(X,\mu),\quad\forall d\in\mathbb{D}^{n},\ X\in\mathbb{R}^{m\times n},\ \mu\in(0,\bar{\mu}].  (14)

In the following, we focus on the nonconvex optimization problem with given smoothing parameter $\mu>0$ and $d\in\mathbb{D}^{n}$:

\widetilde{\mathcal{F}}^{d}(X,\mu)=\tilde{f}(X,\mu)+\lambda\Phi^{d}(X).  (15)

When the matrix $X$ is diagonal, since the penalty term of (15) is a piecewise linear function, the proximal operator of $\tau\Phi^{d}$ on $\mathbb{R}^{n}$ has a closed-form solution for given vectors $d\in\mathbb{D}^{n}$, $w\in\mathbb{R}^{n}$ and a positive number $\tau>0$. In detail, the proximal operator

\hat{x}=\arg\min_{x\in\mathbb{R}^{n}}\Big\{\tau\Phi^{d}(x)+\frac{1}{2}\|x-w\|^{2}\Big\}  (16)

can be calculated by $\hat{x}_{i}=\max\{\hat{w}_{i}-\frac{\tau}{v},0\}$ for $i=1,2,\ldots,n$, where

\hat{w}_{i}=\begin{cases}w_{i},&\text{if }d_{i}=1,\\ w_{i}+\tau/v,&\text{if }d_{i}=2.\end{cases}  (17)
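The closed form (16)-(17) amounts to a shifted soft-thresholding on the entries of $w$; a minimal NumPy sketch (with our own naming) is:

```python
import numpy as np

def prox_capped_dc(w, d, tau, v):
    """Closed-form minimizer of tau*Phi^d(x) + 0.5*||x - w||^2 in (16):
    hat{w}_i = w_i if d_i = 1 and w_i + tau/v if d_i = 2 (see (17)),
    then hat{x}_i = max{hat{w}_i - tau/v, 0}."""
    w_hat = np.where(np.asarray(d) == 2, w + tau / v, w)
    return np.maximum(w_hat - tau / v, 0.0)
```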

Motivated by the smoothing approximation technique, the proximal gradient algorithm and the efficiency of inertial methods, we consider the following general inertial smoothing proximal gradient scheme for $\widetilde{\mathcal{F}}^{d^{k}}(\cdot,\mu_{k})$, i.e.,

\begin{cases}
Y^{k}=X^{k}+\alpha_{k}(X^{k}-X^{k-1}),\\
Z^{k}=X^{k}+\beta_{k}(X^{k}-X^{k-1}),\\
Q_{d^{k}}(X,Y^{k},Z^{k},\mu_{k})=\langle X-Y^{k},\nabla\tilde{f}(Z^{k},\mu_{k})\rangle+\frac{h_{k}\mu_{k}^{-1}}{2}\|X-Y^{k}\|^{2}+\lambda\Phi^{d^{k}}(X),
\end{cases}

where $\alpha_{k},\beta_{k}$ are different extrapolation parameters on which certain conditions are imposed, and $h_{k}$ is a parameter depending on $\tilde{L}$. In particular, when $\alpha_{k}=\beta_{k}=0$, the GIMSPG algorithm reduces to the MSPG algorithm. Based on (16) and Lemma 3, the problem $\min_{X}Q_{d^{k}}(X,Y^{k},Z^{k},\mu_{k})$ has a minimizer $\hat{X}=U\mathscr{D}(\hat{x})V^{T}$ with $\hat{x}=\mathrm{prox}_{\tau\Phi^{d^{k}}}(w^{k})$ and $\tau=\lambda h_{k}^{-1}\mu_{k}$, where $W^{k}=U\mathscr{D}(w^{k})V^{T}$ is a singular value decomposition of $W^{k}=Y^{k}-h_{k}^{-1}\mu_{k}\nabla\tilde{f}(Z^{k},\mu_{k})$. A minimal code sketch of this subproblem solve is given below.
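Putting Lemma 3 and (16) together, the subproblem $\min_{X}Q_{d^{k}}$ is solved by one SVD followed by the vector proximal operator above. A sketch, assuming NumPy, the `prox_capped_dc` helper from the previous snippet, and a handle `grad_f_smooth` for $\nabla\tilde{f}(\cdot,\mu_{k})$:

```python
import numpy as np

def gimspg_subproblem(Y, Z, grad_f_smooth, d, h_k, mu_k, lam, v):
    """Minimizer of Q_{d^k}(., Y^k, Z^k, mu_k): SVD of W^k followed by the
    closed-form proximal operator (16) applied to the singular values."""
    W = Y - (mu_k / h_k) * grad_f_smooth(Z)            # W^k = Y^k - h_k^{-1} mu_k grad f~(Z^k, mu_k)
    U, w, Vt = np.linalg.svd(W, full_matrices=False)   # W^k = U diag(w) V^T
    x_hat = prox_capped_dc(w, d, lam * mu_k / h_k, v)  # tau = lambda h_k^{-1} mu_k
    return (U * x_hat) @ Vt                            # X^{k+1} = U diag(x_hat) V^T
```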

Assumption 4.

(Parameter constraints) The parameters $\alpha_{k},\beta_{k},h_{k}$ in the GIMSPG algorithm satisfy the following conditions: for any $0<\varepsilon\ll 1$, $\{\alpha_{k}\}$ is nonincreasing with $\alpha_{k}\in[0,\frac{1-2^{\sigma}\varepsilon}{1+2^{\sigma}})$, $\beta_{k}\in[0,1]$, and $\{h_{k}\}$ is nonincreasing and satisfies $h_{0}^{-1}\leq h_{k}^{-1}\leq\min\big\{\frac{1-\alpha_{k}-(\alpha_{k}+\varepsilon)\frac{\mu_{k}}{\mu_{k+1}}}{(1-\beta_{k})\tilde{L}},\ \frac{\alpha_{k}}{\beta_{k}\tilde{L}}\big\}$.

Algorithm 1 GIMSPG algorithm
1: Input: $X^{-1}=X^{0}\in\mathbb{R}^{m\times n}$, $\mu_{-1}=\mu_{0}\in(0,\bar{\mu}]$, $\sigma\in(0,1)$; choose the parameters $\alpha_{k},\beta_{k},h_{k}$ satisfying Assumption 4. Set $k=0$.
2: while a termination criterion is not met do
3:  Step 1: Let $d^{k}=d^{X^{k}}$, where $d_{i}^{k}$ is defined as in (11).
4:  Step 2: Compute
\begin{cases}
Y^{k}=X^{k}+\alpha_{k}(X^{k}-X^{k-1}),\\
Z^{k}=X^{k}+\beta_{k}(X^{k}-X^{k-1}),\\
X^{k+1}\in\arg\min_{X\in\mathbb{R}^{m\times n}}Q_{d^{k}}(X,Y^{k},Z^{k},\mu_{k}).
\end{cases}  (18)
5:  Step 3: If
H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})\leq-\alpha\mu_{k}^{2},  (19)
set $\mu_{k+1}=\mu_{k}$; otherwise, set
\mu_{k+1}=\frac{\mu_{0}}{(k+1)^{\sigma}}.  (20)
Set $k:=k+1$ and return to Step 1.
6: end while
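Step 3 only decides whether to keep or shrink the smoothing parameter. A minimal sketch of this test, with `H_new` and `H_old` the two auxiliary values appearing in (19) (their definition is given in (21) below):

```python
def update_mu(H_new, H_old, mu_k, mu0, k, alpha, sigma):
    """Step 3 of Algorithm 1: keep mu_k if the sufficient-decrease test (19)
    holds, otherwise shrink the smoothing parameter by the schedule (20)."""
    return mu_k if H_new - H_old <= -alpha * mu_k ** 2 else mu0 / (k + 1) ** sigma
```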
Remark 1.

The parameter $h_{k}$ is bounded. Indeed, if $\frac{1-\alpha_{k}-(\alpha_{k}+\varepsilon)\frac{\mu_{k}}{\mu_{k+1}}}{(1-\beta_{k})\tilde{L}}>\frac{1}{\tilde{L}}$ and $\frac{\alpha_{k}}{\beta_{k}\tilde{L}}>\frac{1}{\tilde{L}}$, then $\alpha_{k}+(\alpha_{k}+\varepsilon)\frac{\mu_{k}}{\mu_{k+1}}<\beta_{k}<\alpha_{k}$, which is impossible; hence $\frac{1}{\tilde{L}}$ can be taken as an upper bound of $h_{k}^{-1}$.

Remark 2.

From Assumption 4, when $\frac{1-\alpha_{k}-(\alpha_{k}+\varepsilon)\frac{\mu_{k}}{\mu_{k+1}}}{(1-\beta_{k})\tilde{L}}=\frac{\alpha_{k}}{\beta_{k}\tilde{L}}$, we obtain the largest range of the stepsize. In this case, $\alpha_{k}=\frac{\beta_{k}(1-\varepsilon\frac{\mu_{k}}{\mu_{k+1}})}{1+\beta_{k}\frac{\mu_{k}}{\mu_{k+1}}}$. Moreover, from $\inf\frac{\mu_{k}}{\mu_{k+1}}=1$ and $\sup\frac{\mu_{k}}{\mu_{k+1}}=2^{\sigma}$, we can take $\alpha_{k}=\frac{0.98\beta_{k}}{1+2^{\sigma}\beta_{k}}$ and $h_{k}=\frac{(1+2^{\sigma}\beta_{k})\tilde{L}}{0.98}$. It is worth mentioning that the parameters in the GIMSPG algorithm can be selected adaptively; however, how to select the parameters optimally is beyond the scope of this paper. A small helper implementing this choice is sketched below.
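The concrete choice of this remark can be packaged as follows (a sketch; the constant 0.98 is the one used in the remark):

```python
def gimspg_parameters(beta_k, sigma, L_tilde):
    """alpha_k and h_k as suggested in Remark 2 for a given beta_k in [0, 1]."""
    alpha_k = 0.98 * beta_k / (1.0 + 2.0 ** sigma * beta_k)
    h_k = (1.0 + 2.0 ** sigma * beta_k) * L_tilde / 0.98
    return alpha_k, h_k
```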

Next, we are ready to discuss the convergence by constructing an auxiliary sequence. For any $k\in\mathbb{N}$, define

H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k}):=\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})+\kappa\mu_{k}+\delta_{k+1}\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|^{2},  (21)

where $\{X^{k}\}$, $\{\mu_{k}\}$ are the sequences generated by the GIMSPG algorithm and $\delta_{k+1}=\frac{h_{k+1}\alpha_{k+1}}{2}$. This auxiliary sequence is essential to the convergence analysis of the GIMSPG algorithm. We start by showing that the GIMSPG algorithm is well defined.

Lemma 4.

The proposed GIMSPG algorithm is well defined, and the sequence $\{\mu_{k}\}$ generated by GIMSPG is such that the set $\mathcal{N}^{s}:=\{k\in\mathbb{N}:\mu_{k+1}\neq\mu_{k}\}$ is infinite and $\lim_{k\to+\infty}\mu_{k}=0$.

Proof: The result follows directly from Lemma 4 in Zhang_J.

Next we prove that the auxiliary sequence $H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})$ is monotonically nonincreasing.

Lemma 5.

For any $k\in\mathbb{N}$, let $\{X^{k}\}$ be the sequence generated by the GIMSPG algorithm. It holds that

\begin{aligned}
&H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})\\
&\leq\Big(\big(\tfrac{\tilde{L}-h_{k}}{2}+\tfrac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2}\big)\mu_{k}^{-1}+\tfrac{\alpha_{k}h_{k}\mu_{k+1}^{-1}}{2}\Big)\|X^{k+1}-X^{k}\|^{2}+\tfrac{\tilde{L}\beta_{k}(\beta_{k}-1)}{2}\mu_{k}^{-1}\|X^{k}-X^{k-1}\|^{2}.
\end{aligned}  (22)

Moreover, when the parameters $\alpha_{k},\beta_{k},h_{k}$ satisfy Assumption 4, it holds that

H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})\leq-\tfrac{\varepsilon h_{k}}{2}\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|^{2},  (23)

and the sequence $H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})$ is nonincreasing.

Proof: From the optimality of $X^{k+1}$ in the subproblem (18), we have

\begin{aligned}
&\langle X^{k+1}-Y^{k},\nabla\tilde{f}(Z^{k},\mu_{k})\rangle+\tfrac{1}{2}h_{k}\mu_{k}^{-1}\|X^{k+1}-Y^{k}\|^{2}+\lambda\Phi^{d^{k}}(X^{k+1})\\
&\leq\langle X^{k}-Y^{k},\nabla\tilde{f}(Z^{k},\mu_{k})\rangle+\tfrac{1}{2}h_{k}\mu_{k}^{-1}\|X^{k}-Y^{k}\|^{2}+\lambda\Phi^{d^{k}}(X^{k}).
\end{aligned}  (24)

Since $\nabla\tilde{f}(\cdot,\mu_{k})$ is Lipschitz continuous with modulus $\tilde{L}\mu_{k}^{-1}$, it follows from Definition 2(v) that

\tilde{f}(X^{k+1},\mu_{k})\leq\tilde{f}(Z^{k},\mu_{k})+\langle X^{k+1}-Z^{k},\nabla\tilde{f}(Z^{k},\mu_{k})\rangle+\tfrac{1}{2}\tilde{L}\mu_{k}^{-1}\|X^{k+1}-Z^{k}\|^{2}.  (25)

Moreover, by the convexity of $\tilde{f}(\cdot,\mu_{k})$, it holds that

\tilde{f}(Z^{k},\mu_{k})+\langle X^{k}-Z^{k},\nabla\tilde{f}(Z^{k},\mu_{k})\rangle\leq\tilde{f}(X^{k},\mu_{k}).  (26)

Combining (24), (25) and (26), we have

\widetilde{\mathcal{F}}^{d^{k}}(X^{k+1},\mu_{k})-\widetilde{\mathcal{F}}^{d^{k}}(X^{k},\mu_{k})\leq h_{k}\mu_{k}^{-1}\langle Y^{k}-X^{k},X^{k+1}-X^{k}\rangle-\tfrac{1}{2}h_{k}\mu_{k}^{-1}\|X^{k+1}-X^{k}\|^{2}+\tfrac{\tilde{L}\mu_{k}^{-1}}{2}\|X^{k+1}-Z^{k}\|^{2}.  (27)

Denote $\Delta_{k}:=X^{k}-X^{k-1}$; then $\alpha_{k}\Delta_{k}=Y^{k}-X^{k}$, $\beta_{k}\Delta_{k}=Z^{k}-X^{k}$ and $\beta_{k}\Delta_{k}-\Delta_{k+1}=Z^{k}-X^{k+1}$. It then follows from (27) that

\begin{aligned}
\widetilde{\mathcal{F}}^{d^{k}}(X^{k+1},\mu_{k})-\widetilde{\mathcal{F}}^{d^{k}}(X^{k},\mu_{k})
&\leq h_{k}\mu_{k}^{-1}\langle\alpha_{k}\Delta_{k},\Delta_{k+1}\rangle+\tfrac{\tilde{L}\mu_{k}^{-1}}{2}\|\beta_{k}\Delta_{k}-\Delta_{k+1}\|^{2}-\tfrac{1}{2}h_{k}\mu_{k}^{-1}\|\Delta_{k+1}\|^{2}\\
&=\tfrac{(\tilde{L}-h_{k})\mu_{k}^{-1}}{2}\|\Delta_{k+1}\|^{2}+\tfrac{\tilde{L}}{2}\beta_{k}^{2}\mu_{k}^{-1}\|\Delta_{k}\|^{2}+(h_{k}\alpha_{k}-\beta_{k}\tilde{L})\mu_{k}^{-1}\langle\Delta_{k},\Delta_{k+1}\rangle\\
&\leq\big(\tfrac{\tilde{L}-h_{k}}{2}+\tfrac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2}\big)\mu_{k}^{-1}\|\Delta_{k+1}\|^{2}+\big(\tfrac{\tilde{L}\beta_{k}^{2}}{2}+\tfrac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2}\big)\mu_{k}^{-1}\|\Delta_{k}\|^{2},
\end{aligned}  (28)

where the second inequality follows from Assumption 4 and the Cauchy-Schwarz inequality.

Letting $d^{k}=d^{X^{k}}$ and using $\widetilde{\mathcal{F}}^{d^{k}}(X^{k+1},\mu_{k})\geq\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})$, we have

\begin{aligned}
&\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})+\tfrac{\alpha_{k+1}h_{k+1}\mu_{k+1}^{-1}}{2}\|\Delta_{k+1}\|^{2}-\Big(\widetilde{\mathcal{F}}(X^{k},\mu_{k})+\tfrac{\alpha_{k}h_{k}\mu_{k}^{-1}}{2}\|\Delta_{k}\|^{2}\Big)\\
&\leq\Big(\big(\tfrac{\tilde{L}-h_{k}}{2}+\tfrac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2}\big)\mu_{k}^{-1}+\tfrac{\alpha_{k+1}h_{k+1}\mu_{k+1}^{-1}}{2}\Big)\|\Delta_{k+1}\|^{2}+\Big(\tfrac{\tilde{L}\beta_{k}^{2}+h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2}\mu_{k}^{-1}-\tfrac{\alpha_{k}h_{k}\mu_{k}^{-1}}{2}\Big)\|\Delta_{k}\|^{2}.
\end{aligned}  (29)

By Definition 2(iv), we easily have that

\widetilde{\mathcal{F}}(X^{k},\mu_{k})\leq\widetilde{\mathcal{F}}(X^{k},\mu_{k-1})+\kappa(\mu_{k-1}-\mu_{k}).  (30)

Combining (29) and (30) with the nonincreasing property of $h_{k}$ and $\alpha_{k}$, we get

\begin{aligned}
&H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})\\
&\leq\Big(\big(\tfrac{\tilde{L}-h_{k}}{2}+\tfrac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2}\big)\mu_{k}^{-1}+\tfrac{\alpha_{k}h_{k}\mu_{k+1}^{-1}}{2}\Big)\|\Delta_{k+1}\|^{2}+\tfrac{\tilde{L}(\beta_{k}^{2}-\beta_{k})}{2}\mu_{k}^{-1}\|\Delta_{k}\|^{2}.
\end{aligned}

According to the parameter constraints in Assumption 4, we easily have $(\tilde{L}-h_{k}+h_{k}\alpha_{k}-\beta_{k}\tilde{L})\mu_{k}^{-1}+\alpha_{k}h_{k}\mu_{k+1}^{-1}<-h_{k}\mu_{k+1}^{-1}\varepsilon$ and $\beta_{k}^{2}-\beta_{k}\leq 0$. Hence inequality (23) holds and the desired results are obtained.

Corollary 1.

For any $k\in\mathbb{N}$, under Assumption 4 there exists $\zeta\in\mathbb{R}$ satisfying

\zeta:=\lim_{k\to+\infty}H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k}).

Further, it holds that

\lim_{k\to+\infty}H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})=\lim_{k\to+\infty}\widetilde{\mathcal{F}}(X^{k},\mu_{k-1})=\lim_{k\to+\infty}\mathcal{F}(X^{k})=\zeta,  (31)

and the sequence $\{X^{k}\}$ is bounded.

Proof: By the definition of $H_{\delta_{k+1}}$ in (21), we obtain

H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})\geq\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})+\kappa\mu_{k}\geq\mathcal{F}(X^{k+1})\geq\min_{X\in\mathbb{R}^{m\times n}}\mathcal{F}(X)=\min_{X\in\mathbb{R}^{m\times n}}\mathcal{F}_{\ell_{0}}(X),

where the last equality holds because problems (2) and (1) have the same global minimizers by Lemma 2. Further, since $\{H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})\}$ is nonincreasing, there exists $\zeta\in\mathbb{R}$ such that

\lim_{k\to+\infty}H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})=\zeta,\quad H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})\geq\zeta,\ \forall k\in\mathbb{N}.  (32)

Next, note that

\begin{aligned}
&|\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})+\kappa\mu_{k}-\zeta|\\
&=\Big|H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-\zeta-\tfrac{\alpha_{k+1}h_{k+1}\mu_{k+1}^{-1}}{2}\|\Delta_{k+1}\|^{2}\Big|\\
&\leq|H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-\zeta|+\tfrac{\alpha_{k+1}h_{k+1}}{\varepsilon h_{k}}\cdot\tfrac{\varepsilon h_{k}\mu_{k+1}^{-1}}{2}\|\Delta_{k+1}\|^{2}\\
&\leq|H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-\zeta|+\tfrac{\alpha_{k+1}h_{k+1}}{\varepsilon h_{k}}\big(H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})-\zeta\big),
\end{aligned}  (33)

where the first inequality holds by the triangle inequality and the second follows from (23) and (32). When $k$ is sufficiently large, the right-hand side of the above inequality tends to 0 by (32) and the boundedness of $\frac{\alpha_{k+1}h_{k+1}}{\varepsilon h_{k}}$. Therefore, it holds that

\lim_{k\to+\infty}H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})=\lim_{k\to+\infty}\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})=\lim_{k\to+\infty}\mathcal{F}(X^{k})=\zeta.

From the nonincreasing property of the sequence $\{H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})\}$ again, it holds that

\mathcal{F}(X^{k+1})\leq\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})+\kappa\mu_{k}\leq H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})\leq H_{\delta_{0}}(X^{0},X^{-1},\mu_{0},\mu_{-1})<+\infty.

Then $\{X^{k}\}$ is bounded by the level boundedness of $\mathcal{F}$ from Assumption 3.

Corollary 2.

Under the parameter constraints in Assumption 4, it holds that

\sum_{k=0}^{+\infty}\mu_{k}^{-1}\|X^{k}-X^{k-1}\|^{2}<+\infty,\quad\sum_{k=0}^{+\infty}\mu_{k-1}^{-1}\|X^{k}-X^{k-1}\|^{2}<+\infty.

Moreover, $\lim_{k\to+\infty}\|X^{k}-X^{k-1}\|=0$.

Proof: Summing inequality (23) from $k=0$ to $k=K$ for any $K\in\mathbb{N}$ yields

\sum_{k=0}^{K}\tfrac{\varepsilon h_{k}}{2}\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|^{2}\leq H_{\delta_{0}}(X^{0},X^{-1},\mu_{0},\mu_{-1})-H_{\delta_{K+1}}(X^{K+1},X^{K},\mu_{K+1},\mu_{K}).

Letting $K\to+\infty$, from (32) and the boundedness of $h_{k}$ we have

\sum_{k=0}^{+\infty}\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|^{2}<+\infty.

Since $\{\mu_{k}\}$ is a nonincreasing sequence, it follows that

\sum_{k=0}^{+\infty}\mu_{k}^{-1}\|X^{k+1}-X^{k}\|^{2}\leq\sum_{k=0}^{+\infty}\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|^{2}<+\infty.

Further, from $\mu_{k}\leq\bar{\mu}$ for any $k\in\mathbb{N}$, we have

\lim_{k\to+\infty}\|X^{k+1}-X^{k}\|=0.

So the proof is completed.

Theorem 1.

Let $\{X^{k}\}$ be the sequence generated by the GIMSPG algorithm. Then:

  (i) for any $k\in\mathcal{N}^{s}$, $d^{k}$ in Algorithm 1 changes only a finite number of times;

  (ii) any two accumulation points $\bar{X}$ and $X^{*}$ of $\{X^{k}:k\in\mathcal{N}^{s}\}$ satisfy $\mathcal{A}^{c}(\sigma(\bar{X}))=\mathcal{A}^{c}(\sigma(X^{*}))$; moreover, there exists $J\in\mathbb{N}$ such that for any $i\geq J$,

    \|\sigma(X^{k_{i}})_{\mathcal{A}^{c}(\sigma(\bar{X}))}-\sigma(\bar{X})_{\mathcal{A}^{c}(\sigma(\bar{X}))}\|=0,

    where $\{X^{k_{i}}\}$ is a subsequence of $\{X^{k}:k\in\mathcal{N}^{s}\}$ such that $\lim_{i\to+\infty}X^{k_{i}}=\bar{X}$.

Proof: (i) From the first-order optimality condition of (18), there exist $(U,V)\in\mathcal{M}(X^{k+1})$ such that

U\big(\nabla\tilde{f}(Z^{k},\mu_{k})+h_{k}\mu_{k}^{-1}(X^{k+1}-Y^{k})\big)V^{T}+\frac{\lambda}{v}\partial\|\sigma(X^{k+1})\|_{1}-\lambda\sum_{j=1}^{n}\nabla\theta_{d^{k}_{j}}(\sigma_{j}(X^{k+1}))E_{j}=\textbf{0}.  (34)

From (23) and the fact that for any $k\in\mathcal{N}^{s}$ the inequality (19) does not hold, we get

\tfrac{\varepsilon h_{k}}{2}\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|^{2}\leq\alpha\mu_{k}^{2}.

That is,

\sqrt{\tfrac{\varepsilon h_{k}}{2}}\,\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|\leq\sqrt{\alpha\mu_{k}^{2}\mu_{k+1}^{-1}}.

According to $\lim_{k\to+\infty}\mu_{k}=0$, the nonincreasing property of $\mu_{k}$ and the boundedness of $h_{k}$, we have

\lim_{k\to+\infty}\mu_{k+1}^{-1}\|X^{k+1}-X^{k}\|=0,\quad\lim_{k\to+\infty}\mu_{k}^{-1}\|X^{k+1}-X^{k}\|=0.  (35)

Further, since

\mu_{k}^{-1}\|X^{k+1}-Y^{k}\|\leq\mu_{k}^{-1}\|X^{k+1}-X^{k}\|+\mu_{k}^{-1}\|X^{k}-X^{k-1}\|,

we have $\lim_{k\to+\infty}\mu_{k}^{-1}\|X^{k+1}-Y^{k}\|=0$. From the boundedness of $h_{k}$, we get

\lim_{k\to+\infty}h_{k}\mu_{k}^{-1}\|X^{k+1}-Y^{k}\|=0,  (36)

so there exists $K\in\mathbb{N}$ such that for any $k\geq K$,

h_{k}\mu_{k}^{-1}\|X^{k+1}-Y^{k}\|<\varepsilon,  (37)

where $\varepsilon=\frac{1}{2}(\frac{\lambda}{v}-L_{f})$.

If there exist $k_{0}\geq K$ and $j\in\{1,2,\ldots,n\}$ such that $\sigma_{j}(X^{k_{0}})<v$, then $d^{k_{0}}_{j}=1$ by (11), which yields $\nabla\theta_{d^{k_{0}}_{j}}(\sigma_{j}(X^{k_{0}+1}))=0$.

Next, we prove $\sigma_{j}(X^{k_{0}+1})=0$ by contradiction. If $\sigma_{j}(X^{k_{0}+1})\neq 0$, then from (37), Assumption 2 and Definition 2(iii) it holds that

(U\nabla\tilde{f}(Z^{k_{0}},\mu_{k_{0}})V^{T})_{jj}+h_{k_{0}}\mu_{k_{0}}^{-1}(U(X^{k_{0}+1}-Y^{k_{0}})V^{T})_{jj}+\frac{\lambda}{v}\neq 0,

which contradicts (34). In conclusion, if there exist $k_{0}\geq K$ and $j\in\{1,2,\ldots,n\}$ satisfying $\sigma_{j}(X^{k_{0}})<v$, then $\sigma_{j}(X^{k_{0}+1})=0$, and consequently $\sigma_{j}(X^{\tilde{k}})\equiv 0$ for any $\tilde{k}>k_{0}$. Therefore, for sufficiently large $\tilde{k}$, either $\sigma_{j}(X^{\tilde{k}})\equiv 0$ or $\sigma_{j}(X^{\tilde{k}})\geq v$ for each $j=1,\ldots,n$. So $d^{k}$ changes only a finite number of times.

(ii) Let $\bar{X}$ and $X^{*}$ be any two accumulation points of $\{X^{k}:k\in\mathcal{N}^{s}\}$; then $\mathcal{A}^{c}(\sigma(\bar{X}))=\mathcal{A}^{c}(\sigma(X^{*}))$ by part (i) of this theorem. From the boundedness of $\{X^{k}\}$, there exists a subsequence $\{X^{k_{i}},k_{i}\in\mathcal{N}^{s}\}$ of $\{X^{k},k\in\mathcal{N}^{s}\}$ such that $\lim_{i\to+\infty}X^{k_{i}}=\bar{X}$. Then there exists $J\in\mathbb{N}$ such that for any $i\geq J$,

\|\sigma(X^{k_{i}})_{\mathcal{A}^{c}(\sigma(\bar{X}))}-\sigma(\bar{X})_{\mathcal{A}^{c}(\sigma(\bar{X}))}\|=0,

so the proof is completed.

In the following, we establish the subsequence convergence result.

Theorem 2.

Under Assumption 4, any accumulation point of $\{X^{k}:k\in\mathcal{N}^{s}\}$ is a lifted stationary point of problem (2).

Proof: By the boundedness of $\{X^{k}:k\in\mathcal{N}^{s}\}$ from Corollary 1, let $\bar{X}$ be an accumulation point of $\{X^{k}:k\in\mathcal{N}^{s}\}$; then there exists a subsequence $\{X^{k_{i}}\}_{k_{i}\in\mathcal{N}^{s}}$ of $\{X^{k}\}_{k\in\mathcal{N}^{s}}$ satisfying $\lim_{i\to+\infty}X^{k_{i}}=\bar{X}$.

Using the first-order necessary optimality condition of (18), we get that

\textbf{0}\in\nabla\tilde{f}(Z^{k_{i}},\mu_{k_{i}})+h_{k_{i}}\mu_{k_{i}}^{-1}(X^{k_{i}+1}-Y^{k_{i}})+\lambda\xi^{k_{i}}\quad\text{for some }\xi^{k_{i}}\in\partial\Phi^{d^{k_{i}}}(X^{k_{i}+1}).  (38)

By virtue of $\lim_{i\to+\infty}X^{k_{i}}=\bar{X}$ and the fact that the set $\{d^{k_{i}}:i\in\mathbb{N}\}$ is finite, there exist a subsequence $\{k_{i_{j}}\}$ of $\{k_{i}\}$ and $\bar{d}\in\mathcal{D}(\sigma(\bar{X}))$ such that $d^{k_{i_{j}}}=\bar{d}$ for all $j\in\mathbb{N}$. Further, from $\lim_{j\to+\infty}\|X^{k_{i_{j}}+1}-X^{k_{i_{j}}}\|=0$, (35) and the upper semicontinuity of $\partial\Phi^{d^{k_{i_{j}}}}$, we get that

\lim_{j\to+\infty}X^{k_{i_{j}}+1}=\bar{X},\quad\lim_{j\to+\infty}Z^{k_{i_{j}}}=\bar{X},\quad\lim_{j\to+\infty}\mu_{k_{i_{j}}}^{-1}\|X^{k_{i_{j}}+1}-Y^{k_{i_{j}}}\|=0,

and

\Big\{\lim_{j\to+\infty}\xi^{k_{i_{j}}}:\xi^{k_{i_{j}}}\in\partial\Phi^{d^{k_{i_{j}}}}(X^{k_{i_{j}}+1})\Big\}\subseteq\partial\Phi^{\bar{d}}(\bar{X}).  (39)

Taking $k_{i}=k_{i_{j}}$ in (38), letting $j\to+\infty$ and combining (35), (38), (39) with Definition 2(iii), we deduce that there exist $\bar{\psi}\in\partial f(\bar{X})$ and $\bar{\xi}^{\bar{d}}\in\partial\Phi^{\bar{d}}(\bar{X})$ satisfying

\textbf{0}=\bar{\psi}+\lambda\bar{\xi}^{\bar{d}}.

This completes the proof.

Remark 3.

It is well known that, in the vector case, the Bregman distance generalizes the Euclidean distance; one can refer to the literature Bolte_J; Teboulle_M, which studies theoretical analysis and methods based on the Bregman distance. Similarly, in the matrix case, for any $X,Y\in\mathbb{S}^{n}$, the Euclidean distance $\frac{1}{2}\|X-Y\|^{2}$ used in solving the subproblem (18) can be replaced by a general Bregman distance $D_{\phi}(X,Y)$, where $\phi:\mathbb{S}^{n}\to\mathbb{R}$ is a convex and continuously differentiable function; see Ma_S; Kulis_B.

For any $X,Y\in\mathbb{S}^{n}$, the Bregman distance on matrix space is defined as

D_{\phi}(X,Y):=\phi(X)-\phi(Y)-\mathrm{tr}\big(\nabla\phi(Y)^{T}(X-Y)\big).

In particular, when $\phi(X)=\frac{1}{2}\|X\|^{2}$, we have $D_{\phi}(X,Y)=\frac{1}{2}\|X-Y\|^{2}$. If the function $\phi$ is strongly convex with modulus $\varrho$, then $D_{\phi}(X,Y)\geq\frac{\varrho}{2}\|X-Y\|^{2}$. In this case, the subproblem in (18) is generalized as follows:

\begin{cases}
Y^{k}=X^{k}+\alpha_{k}(X^{k}-X^{k-1}),\\
Z^{k}=X^{k}+\beta_{k}(X^{k}-X^{k-1}),\\
X^{k+1}=\arg\min_{X\in\mathbb{S}^{n}}Q^{\prime}_{d^{k}}(X,Y^{k},Z^{k},\mu_{k}),
\end{cases}  (40)

where $Q^{\prime}_{d^{k}}(X,Y^{k},Z^{k},\mu_{k})=\langle X,\nabla\tilde{f}(Z^{k},\mu_{k})-h_{k}\mu_{k}^{-1}(Y^{k}-X^{k})\rangle+h_{k}\mu_{k}^{-1}D_{\phi}(X,X^{k})+\lambda\Phi^{d^{k}}(X)$. In this case, the auxiliary sequence defined in (21) is decreasing under Assumption 5, and the proof is presented in Appendix A. The parameter constraints in Assumption 4 are generalized as follows:

Assumption 5.

For any $0<\varepsilon\ll 1$, $\alpha_{k}\in[0,\frac{\varrho-2^{\sigma}\varepsilon}{1+2^{\sigma}})$, $\beta_{k}\in[0,1]$, and $\{h_{k}\}$ is nonincreasing and satisfies $h_{0}^{-1}\leq h_{k}^{-1}\leq\min\big\{\frac{\varrho-\alpha_{k}-(\alpha_{k}+\varepsilon)\frac{\mu_{k}}{\mu_{k+1}}}{(1-\beta_{k})\tilde{L}},\ \frac{\alpha_{k}}{\beta_{k}\tilde{L}}\big\}$.

In particular, if the convex function $\phi$ is strongly convex with modulus $\varrho\geq 1+2^{\sigma}$, then $\alpha_{k}\in[0,1)$ and $\beta_{k}\in[0,1]$. This means that the parameters $\alpha_{k}$ and $\beta_{k}$ can be chosen as in the sFISTA scheme (9) with fixed restart.
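For illustration, a minimal sketch of the matrix Bregman distance above (assuming NumPy; `phi` and `grad_phi` are user-supplied handles for $\phi$ and $\nabla\phi$):

```python
import numpy as np

def bregman_distance(X, Y, phi, grad_phi):
    """D_phi(X, Y) = phi(X) - phi(Y) - tr(grad_phi(Y)^T (X - Y)) on matrix space."""
    return phi(X) - phi(Y) - np.trace(grad_phi(Y).T @ (X - Y))

# Sanity check: with phi(X) = 0.5 * ||X||_F^2 (so grad_phi(Y) = Y) this reduces to
# 0.5 * ||X - Y||_F^2, the Euclidean case used in (18).
```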

4 Numerical Experiments

In this section, we verify the efficiency of the GIMSPG algorithm compared with the MSPG algorithm Yu_Q_Zhang_X, FPCA Ma_S, SVT Cai_J and VBMFL1 Zhao_Q on the matrix completion problem, i.e.,

\min_{X\in\mathbb{R}^{m\times n}}\ \mathcal{F}(X):=\|P_{\Omega}(X-M)\|_{1}+\lambda\cdot\mathrm{rank}(X),  (41)

where $M\in\mathbb{R}^{m\times n}$, $\Omega$ is a subset of the index set of the matrix $M$, and $P_{\Omega}(\cdot)$ is the projection operator onto the subspace of sparse matrices with nonzero entries confined to the index subset $\Omega$. The goal of this problem is to recover the missing entries of the partially observed low-rank matrix $M$ from the known elements $\{M_{ij}:(i,j)\in\Omega\}$. All numerical experiments are carried out on a 1.80 GHz Core i5 PC with 12 GB of RAM.

In the following, we denote by $\bar{X}$ the output of the GIMSPG algorithm and use "Iter" and "time" to denote the number of iterations and the CPU time in seconds, respectively. We pick $X^{0}=P_{\Omega}(M)$ as the initial point. The stopping criterion is

\frac{\|X^{k+1}-X^{k}\|}{\|X^{k}\|}\leq 10^{-4}.

For the $\ell_{1}$ loss function

f(x)=\|x-b\|_{1},\quad\text{with }x,b\in\mathbb{R}^{n},

the smoothing function is defined as

\tilde{f}(x,\mu)=\sum_{i=1}^{n}\tilde{\theta}(x_{i}-b_{i},\mu)\quad\text{with}\quad\tilde{\theta}(s,\mu)=\begin{cases}|s|&\text{if }|s|>\mu,\\ \frac{s^{2}}{2\mu}+\frac{\mu}{2}&\text{if }|s|\leq\mu.\end{cases}  (42)
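A sketch of (42) together with its gradient; the gradient formula is not written out in the paper, but it follows directly from the two branches of $\tilde{\theta}$:

```python
import numpy as np

def smoothed_l1(x, b, mu):
    """Huber-type smoothing (42) of ||x - b||_1 and its gradient with respect to x."""
    s = x - b
    value = np.where(np.abs(s) > mu, np.abs(s), s ** 2 / (2 * mu) + mu / 2).sum()
    grad = np.where(np.abs(s) > mu, np.sign(s), s / mu)
    return value, grad
```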

In the numerical simulations, the non-Gaussian noise model is the typical two-component Gaussian mixture model (GMM), whose probability density function has the form

p_{v}(i)=(1-c)N(0,\sigma^{2}_{A})+cN(0,\sigma^{2}_{B}),  (43)

where $N(0,\sigma^{2}_{A})$ represents the general noise disturbance with variance $\sigma^{2}_{A}$ and $N(0,\sigma^{2}_{B})$ represents the outliers with a large variance $\sigma^{2}_{B}$. The parameter $c$ trades off between them.
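The noise used in the experiments can be generated as follows (a sketch; the default variances are the ones used in the experiments below):

```python
import numpy as np

def gmm_noise(shape, c=0.1, var_a=1e-4, var_b=0.1, seed=0):
    """Two-component Gaussian mixture (43): N(0, var_a) w.p. 1-c and N(0, var_b) w.p. c."""
    rng = np.random.default_rng(seed)
    outlier = rng.random(shape) < c
    return np.where(outlier,
                    rng.normal(0.0, np.sqrt(var_b), shape),
                    rng.normal(0.0, np.sqrt(var_a), shape))
```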

The parameters in the GIMSPG algorithm are set as follows: $\tilde{L}=1$, $\sigma=0.9$, $\mu_{0}=1$, $\alpha=1$, $\lambda=20$. From Remark 2, we take $\alpha_{k}=\frac{0.98\beta_{k}}{1+2^{\sigma}\beta_{k}}$ and $h_{k}=\frac{(1+2^{\sigma}\beta_{k})\tilde{L}}{0.98}$.

Moreover, in order to choose an appropriate value of $\beta_{k}$, we test the influence of different values of $\beta_{k}$ on the GIMSPG algorithm in terms of the RMSE, the last value $\mu$ of the smoothing factor and the running time. The details are presented in Figure 1. We observe that the larger $\beta_{k}$ is, the less time is used, but the RMSE and the last value of the smoothing factor become larger. For the sake of balance, we choose $\beta_{k}=0.4$ in Sections 4.1 and 4.2, and $\beta_{k}=0.3$ in Section 4.3.

4.1 Matrix completion on the random data

In this subsection, we perform numerical experiments on randomly generated data. For a fair comparison, the results are averaged over 20 independent trials. The data are generated as in Yu_Q_Zhang_X: we first randomly generate matrices $M_{L}\in\mathbb{R}^{m\times r}$ and $M_{R}\in\mathbb{R}^{n\times r}$ with i.i.d. standard Gaussian entries and let $M=M_{L}M_{R}^{T}$. We then sample a subset $\Omega$ with sample ratio $sr$ uniformly at random, where $sr=\frac{|\Omega|}{mn}$. In the following experiments, the parameters in the GMM noise are set as $\sigma^{2}_{A}=0.0001$, $\sigma^{2}_{B}=0.1$, $c=0.1$. The rank of the matrix $M$ is set as $r=30$ and the sample ratio $sr$ takes three values $sr=0.2,\,0.6,\,0.8$. The size $m$ of the square matrix increases from 100 to 200 with increment 10.

The root-mean-square error (RMSE) is used as the error criterion,

\mathrm{RMSE}:=\sqrt{\frac{\|\bar{X}-M\|^{2}}{mn}}.  (44)
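In code, (44) reads (a minimal sketch assuming NumPy):

```python
import numpy as np

def rmse(X_bar, M):
    """Root-mean-square error (44) between the recovered matrix and the ground truth."""
    return np.sqrt(np.linalg.norm(X_bar - M) ** 2 / M.size)
```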

Figure 2 presents the final value of the objective function and the final value $\mu$ of the smoothing factor for the GIMSPG and MSPG algorithms under different matrix sizes and $sr=0.2,\,0.6,\,0.8$. The final objective value of the GIMSPG algorithm is lower than that of the MSPG algorithm except for $sr=0.2$. Besides, the final value $\mu$ of the smoothing factor of the GIMSPG algorithm is far lower than that of the MSPG algorithm, i.e., the model in the GIMSPG algorithm approximates the original problem better. Moreover, Tables 1, 2 and 3 compare GIMSPG with MSPG, FPCA, SVT and VBMFL1 in terms of the RMSE and the running time $T$ under different matrix sizes for $sr=0.2,\,0.6,\,0.8$. The running time of the GIMSPG algorithm is the smallest among these algorithms, and the time grows with the dimension. Compared with the MSPG algorithm, the RMSE of GIMSPG is better when the sample ratio is low, and compared with the other algorithms, GIMSPG always attains the smallest RMSE.

Figure 1: The running time, the last value of $\mu$ and the RMSE for $sr=0.6$ and $m=n=150$.
Figure 2: The last value of the objective function (2) and of the smoothing factor $\mu$ for $sr=0.2,\,0.6,\,0.8$.
Table 1: Numerical results of the random matrix problem for $sr=0.8$
m RMSE T
GIMSPG MSPG VBMFL1 FPCA SVT GIMSPG MSPG VBMFL1 FPCA SVT
100 0.030 0.031 0.051 0.551 0.242 0.32 1.01 3.33 5.13 3.70
110 0.029 0.030 0.045 0.472 0.076 0.33 1.21 3.53 4.97 4.03
120 0.029 0.029 0.038 0.352 0.049 0.33 1.26 3.56 5.55 4.31
130 0.028 0.029 0.036 0.289 0.036 0.45 1.46 3.84 6.44 4.32
140 0.028 0.028 0.035 0.284 0.028 0.51 1.78 3.56 6.72 4.51
150 0.028 0.028 0.034 0.271 0.026 0.55 1.85 3.58 6.72 4.75
160 0.028 0.028 0.037 0.271 0.023 0.62 2.21 4.06 7.74 4.52
170 0.028 0.027 0.034 0.230 0.022 0.78 2.66 4.30 8.28 4.70
180 0.027 0.026 0.032 0.209 0.022 0.84 2.89 4.16 8.86 4.41
190 0.025 0.026 0.031 0.201 0.021 0.90 3.19 4.59 9.72 4.31
200 0.025 0.025 0.030 0.196 0.020 1.04 3.47 4.94 10.03 4.84
Table 2: Numerical results of the random matrix problem for $sr=0.6$
m RMSE T
GIMSPG MSPG VBMFL1 FPCA SVT GIMSPG MSPG VBMFL1 FPCA SVT
100 0.036 0.036 0.124 1.640 1.860 0.42 1.40 5.11 4.71 5.23
110 0.033 0.033 0.080 1.360 1.380 0.57 1.50 5.19 5.09 4.84
120 0.033 0.033 0.062 0.870 1.324 0.85 1.70 5.38 5.94 5.70
130 0.031 0.031 0.054 0.610 0.972 0.91 1.88 5.44 6.05 8.41
140 0.030 0.031 0.051 0.500 0.707 1.09 2.10 5.52 6.67 7.27
150 0.029 0.029 0.047 0.437 0.552 1.87 2.32 6.21 7.17 7.47
160 0.028 0.028 0.042 0.396 0.368 1.90 2.41 6.36 8.09 8.55
170 0.027 0.028 0.041 0.357 0.219 1.99 2.99 6.59 8.64 9.01
180 0.027 0.027 0.038 0.326 0.163 0.84 3.10 7.16 10.06 9.03
190 0.026 0.027 0.035 0.310 0.127 2.14 4.11 7.51 10.23 10.06
200 0.026 0.027 0.034 0.289 0.068 2.38 4.37 8.14 10.55 10.84
Table 3: Numerical results of the random matrix problem for $sr=0.2$
m RMSE T
GIMSPG MSPG VBMFL1 FPCA SVT GIMSPG MSPG VBMFL1 FPCA SVT
100 0.070 0.075 5.649 5.770 6.536 0.69 1.40 13.42 2.38 8.17
110 0.071 0.074 5.496 5.660 6.497 0.80 1.50 16.59 2.64 995
120 0.069 0.073 5.486 5.540 6.563 0.94 1.60 17.38 3.27 10.41
130 0.066 0.073 5.456 5.300 6.618 1.08 1.70 18.33 3.27 13.51
140 0.066 0.075 5.446 5.209 6.255 1.50 1.98 20.84 3.73 15.41
150 0.066 0.072 5.442 5.270 6.230 1.66 2.19 21.84 4.06 15.84
160 0.065 0.071 5.508 5.230 6.210 1.80 2.57 25.67 3.99 27.09
170 0.065 0.068 5.309 5.050 6.194 2.30 2.80 26.77 4.31 23.89
180 0.064 0.069 5.212 5.010 5.984 2.60 3.39 29.16 5.25 29.01
190 0.062 0.068 5.091 4.970 5.020 2.63 3.58 29.98 5.98 29.21
200 0.062 0.067 4.239 4.680 5.064 3.20 4.82 32.83 6.66 29.77

4.2 Image inpainting

In this subsection, we illustrate the efficiency of the GIMSPG algorithm by solving a grayscale image inpainting problem with non-Gaussian noise. The image inpainting problem is to fill in the missing pixel values of an image at given pixel locations. A grayscale image can be viewed as a matrix, and the inpainting problem can be cast as a matrix completion problem when this matrix is of low rank. In our experiments, two grayscale images from the USC-SIPI image database (http://sipi.usc.edu/database/) are considered: "Chart" and "Ruler". "Chart" is an image with $256\times 256$ pixels and rank $r=185$; "Ruler" is an image with $512\times 512$ pixels and rank $r=67$. To evaluate the performance of the GIMSPG algorithm under non-Gaussian noise, the peak signal-to-noise ratio (PSNR) is used as the evaluation metric, defined as follows:

\mathrm{PSNR}:=10\log_{10}\Big(\frac{mn}{\|\bar{X}-M\|^{2}_{F}}\Big).
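A sketch of the PSNR above (pixel intensities are assumed scaled so that the peak value is 1):

```python
import numpy as np

def psnr(X_bar, M):
    """PSNR = 10 log10( mn / ||X_bar - M||_F^2 ) as defined above."""
    return 10.0 * np.log10(M.size / np.linalg.norm(X_bar - M) ** 2)
```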

We solve the image inpainting problem under three sample ratios $sr=0.2,\,0.6,\,0.8$. From the results in Table 4, we observe that the final value $\mu$ of the smoothing factor of the GIMSPG algorithm is smaller than that of the MSPG algorithm in each case. The time $T$ used by GIMSPG is the smallest among these algorithms. The PSNR of the GIMSPG algorithm is higher than that of MSPG for $sr=0.2$ and clearly higher than those of the other algorithms in all cases. From Figures 3 and 4, one can see that the recovery quality of the GIMSPG algorithm is slightly better than that of the MSPG algorithm, especially at the lower sample rate, and clearly better than the others.

Figure 3: Results of "Chart" recovery: (a) Observed, (b) GIMSPG, (c) MSPG, (d) VBMFL1, (e) FPCA, (f) SVT. The first column shows the sampled images with $sr=0.2,\,0.6,\,0.8$; the remaining columns show the images recovered by the corresponding algorithms.
Figure 4: Results of "Ruler" recovery: (a) Observed, (b) GIMSPG, (c) MSPG, (d) VBMFL1, (e) FPCA, (f) SVT. The first column shows the sampled images with $sr=0.2,\,0.6,\,0.8$; the remaining columns show the images recovered by the corresponding algorithms.
Table 4: Numerical results on the image data for $sr=0.2,\,0.6,\,0.8$
Image sr PSNR T $\mu$
GIMSPG MSPG VBMFL1 FPCA SVT GIMSPG MSPG VBMFL1 FPCA SVT GIMSPG MSPG
chart 0.2 18.04 17.16 15.96 15.60 9.16 9.78 18.10 18.67 0.53 2.72 0.022 0.062
0.6 29.13 29.01 19.80 18.80 10.49 2.19 12.59 33.10 14.16 2.78 0.035 0.131
0.8 30.84 30.18 20.63 19.50 10.65 1.55 3.16 47.94 13.64 3.58 0.107 0.234
ruler 0.2 21.17 20.74 20.10 19.90 5.21 25.45 55.76 5.23 26.25 5.42 0.020 0.105
0.6 27.02 27.04 19.80 18.80 5.22 11.50 12.86 19.67 54.91 6.31 0.044 0.238
0.8 30.01 30.39 20.50 20.20 5.24 7.11 14.83 49.91 49.98 7.09 0.093 0.273
Table 5: Numerical results of the MRI volume dataset for $sr=0.6,\,0.8$
Image sr PSNR T $\mu$
GIMSPG MSPG VBMFL1 FPCA SVT GIMSPG MSPG VBMFL1 FPCA SVT GIMSPG MSPG
MRI 0.6 29.13 29.01 29.58 26.70 13.56 2.19 12.59 32.91 10.67 2.22 0.035 0.131
0.8 33.06 32.99 30.33 27.20 13.59 1.79 10.57 26.61 11.58 2.44 0.029 0.137

4.3 MRI Volume Dataset

In this subsection, we perform experiments on an MRI image of size $217\times 181$ with 181 slices; we select the 38th slice for the experiments. Two sample ratios $sr=0.6,\,0.8$ are considered. The parameters in the GMM noise are set as $\sigma^{2}_{A}=0.0001$, $\sigma^{2}_{B}=0.1$ and $c=0.1$.

From Figure 5 and Table 5, we see that the recovery quality of VBMFL1 is relatively better, but it takes more time. In particular, the PSNR of the GIMSPG algorithm is higher than those of the other algorithms except VBMFL1 in Table 5. Besides, the time $T$ used by GIMSPG is the smallest. Compared with MSPG, the final value $\mu$ of the smoothing factor of the GIMSPG algorithm is lower.

Figure 5: Results of "MRI" recovery: (a) Observed, (b) GIMSPG, (c) MSPG, (d) VBMFL1, (e) FPCA, (f) SVT. The first column shows the sampled images with $sr=0.6,\,0.8$; the remaining columns show the images recovered by the corresponding algorithms.

5 Conclusions

This paper proposes the GIMSPG algorithm for the exact continuous relaxation model Yu_Q_Zhang_X of the matrix rank minimization problem and analyzes its convergence under a flexible parameter condition. It is shown that the singular values of any accumulation point have a common support set and that the nonzero singular values have a unified lower bound; furthermore, the zero singular values of the accumulation point are attained within finitely many iterations. We also prove that any accumulation point of the sequence generated by the GIMSPG algorithm is a lifted stationary point of the continuous relaxation model under the flexible parameter constraint. In addition, the Euclidean distance is generalized to the Bregman distance in the GIMSPG algorithm. Under appropriate parameters, the admissible range of the extrapolation parameters is large enough to include the sFISTA scheme (1) with fixed restart. Finally, the efficiency of the GIMSPG algorithm is verified on random data and real image data.

Appendix A Appendix

Lemma 6.

For any k\in\mathbb{N}, let \{X^{k}\} be the sequence generated by GIMSPG. Then we have

Hδk+1(Xk+1,Xk,μk+1,μk)Hδk(Xk,Xk1,μk,μk1)\displaystyle H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})
\displaystyle\leq ((L~ϱhk2+hkαkβkL~2)μk1+αkhkμk+112)Xk+1Xk2\displaystyle((\frac{\tilde{L}-\varrho h_{k}}{2}+\frac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2})\mu^{-1}_{k}+\frac{\alpha_{k}h_{k}\mu^{-1}_{k+1}}{2})\|X^{k+1}-X^{k}\|^{2}
+L~βk(βk1)2μk1XkXk12.\displaystyle+\frac{\tilde{L}\beta_{k}(\beta_{k}-1)}{2}\mu^{-1}_{k}{\|X^{k}-X^{k-1}\|}^{2}.

When the parameters \alpha_{k},\beta_{k},h_{k} satisfy Assumption 5, it holds that

Hδk+1(Xk+1,Xk,μk+1,μk)Hδk(Xk,Xk1,μk,μk1)εhk2μk+11Xk+1Xk2\displaystyle H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})\leq\frac{-\varepsilon h_{k}}{2}\mu^{-1}_{k+1}{\|X^{k+1}-X^{k}\|}^{2} (45)

Moreover, since the right-hand side of (45) is nonpositive, the sequence \{H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})\} is nonincreasing.

Proof: By (40), we have

\displaystyle\langle X^{k+1},\nabla\tilde{f}(Z^{k},\mu_{k})-h_{k}\mu^{-1}_{k}(Y^{k}-X^{k})\rangle+h_{k}\mu^{-1}_{k}D_{\phi}(X^{k+1},X^{k})+\lambda{\Phi}^{d^{k}}(X^{k+1})
\displaystyle\leq\langle X^{k},\nabla\tilde{f}(Z^{k},\mu_{k})-h_{k}\mu^{-1}_{k}(Y^{k}-X^{k})\rangle+\lambda{\Phi}^{d^{k}}(X^{k}). (46)

Since \nabla\tilde{f}(\cdot,\mu_{k}) is Lipschitz continuous with constant \tilde{L}\mu_{k}^{-1}, it follows from Definition 2(v) that

\displaystyle\tilde{f}(X^{k+1},\mu_{k})\leq\tilde{f}(Z^{k},\mu_{k})+\langle X^{k+1}-Z^{k},\nabla\tilde{f}(Z^{k},\mu_{k})\rangle+\frac{1}{2}\tilde{L}\mu_{k}^{-1}{\|X^{k+1}-Z^{k}\|}^{2}. (47)

Moreover, since \tilde{f}(\cdot,\mu_{k}) is convex, we have

f~(Zk,μk)+XkZk,f~(Zk,μk)f~(Xk,μk)\displaystyle\tilde{f}(Z^{k},\mu_{k})+\langle X^{k}-Z^{k},\nabla\tilde{f}(Z^{k},\mu_{k})\rangle\leq\tilde{f}(X^{k},\mu_{k}) (48)

Combining (46), (47) and (48), we have

~dk(Xk+1,μk)~dk(Xk,μk)\displaystyle\widetilde{\mathcal{F}}^{d^{k}}(X^{k+1},\mu_{k})-\widetilde{\mathcal{F}}^{d^{k}}(X^{k},\mu_{k}) (49)
\displaystyle\leq hkμk1YkXk,XkXk+1hkμk1Dϕ(Xk+1,Xk)+L~μk12Xk+1Zk2.\displaystyle-h_{k}{\mu^{-1}_{k}}\langle Y^{k}-X^{k},X^{k}-X^{k+1}\rangle-h_{k}{\mu^{-1}_{k}}D_{\phi}(X^{k+1},X^{k})+\frac{\tilde{L}\mu^{-1}_{k}}{2}{\|{X}^{k+1}-Z^{k}\|}^{2}.

Denote \Delta_{k}:=X^{k}-X^{k-1}. Then \alpha_{k}\Delta_{k}=Y^{k}-X^{k}, \beta_{k}\Delta_{k}=Z^{k}-X^{k} and \beta_{k}\Delta_{k}-\Delta_{k+1}=Z^{k}-X^{k+1}. That is,

~dk(Xk+1,μk)~dk(Xk,μk)\displaystyle\widetilde{\mathcal{F}}^{d^{k}}(X^{k+1},\mu_{k})-\widetilde{\mathcal{F}}^{d^{k}}(X^{k},\mu_{k})
hkμk1αkΔk,Δk+1+L~μk12βkΔkΔk+1212ϱhkμk1Δk+12\displaystyle\leq h_{k}{\mu_{k}}^{-1}\langle\alpha_{k}\Delta_{k},\Delta_{k+1}\rangle+\frac{\tilde{L}\mu^{-1}_{k}}{2}{\|\beta_{k}\Delta_{k}-\Delta_{k+1}\|}^{2}-\frac{1}{2}\varrho h_{k}{\mu_{k}}^{-1}{\|\Delta_{k+1}\|}^{2}
=(L~ϱhk)μk12Δk+12+L~2βk2μk1Δk2+(hkαkβkL~)μk1Δk,Δk+1\displaystyle=\frac{(\tilde{L}-\varrho h_{k})\mu^{-1}_{k}}{2}{\|\Delta_{k+1}\|}^{2}+\frac{\tilde{L}}{2}\beta^{2}_{k}\mu^{-1}_{k}{\|\Delta_{k}\|}^{2}+(h_{k}\alpha_{k}-\beta_{k}\tilde{L})\mu^{-1}_{k}\langle\Delta_{k},\Delta_{k+1}\rangle
(L~ϱhk2+hkαkβkL~2)μk1Δk+12+(L~βk22+hkαkβkL~2)μk1Δk2\displaystyle\leq(\frac{\tilde{L}-\varrho h_{k}}{2}+\frac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2})\mu^{-1}_{k}{\|\Delta_{k+1}\|}^{2}+(\frac{\tilde{L}\beta^{2}_{k}}{2}+\frac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2})\mu^{-1}_{k}{\|\Delta_{k}\|}^{2} (50)

where the second inequality follows from the Cauchy-Schwarz inequality together with Young's inequality (recorded below for convenience).
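For the reader's convenience, the elementary estimates used above are the expansion of Z^{k}-X^{k+1} and Young's inequality; note that dropping the absolute value in the last estimate implicitly assumes h_{k}\alpha_{k}-\beta_{k}\tilde{L}\geq 0, which we take as given here:

\displaystyle Z^{k}-X^{k+1}=(Z^{k}-X^{k})-(X^{k+1}-X^{k})=\beta_{k}\Delta_{k}-\Delta_{k+1},
\displaystyle\|\beta_{k}\Delta_{k}-\Delta_{k+1}\|^{2}=\beta_{k}^{2}\|\Delta_{k}\|^{2}-2\beta_{k}\langle\Delta_{k},\Delta_{k+1}\rangle+\|\Delta_{k+1}\|^{2},
\displaystyle(h_{k}\alpha_{k}-\beta_{k}\tilde{L})\langle\Delta_{k},\Delta_{k+1}\rangle\leq\frac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2}\left(\|\Delta_{k}\|^{2}+\|\Delta_{k+1}\|^{2}\right).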

Letting d^{k}=d^{X^{k}} and using \widetilde{\mathcal{F}}^{d^{k}}(X^{k+1},\mu_{k})\geq\widetilde{\mathcal{F}}(X^{k+1},\mu_{k}), we have

~(Xk+1,μk)+αk+1hk+1μk+112Δk+12(~(Xk,μk)+αkhkμk12Δk2)\displaystyle\widetilde{\mathcal{F}}(X^{k+1},\mu_{k})+\frac{\alpha_{k+1}h_{k+1}\mu^{-1}_{k+1}}{2}{\|\Delta_{k+1}\|}^{2}-(\widetilde{\mathcal{F}}(X^{k},\mu_{k})+\frac{\alpha_{k}h_{k}\mu^{-1}_{k}}{2}{\|\Delta_{k}\|}^{2})
((L~ϱhk2+hkαkβkL~2)μk1+αk+1hk+1μk+112)Δk+12\displaystyle\leq((\frac{\tilde{L}-\varrho h_{k}}{2}+\frac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2})\mu^{-1}_{k}+\frac{\alpha_{k+1}h_{k+1}\mu^{-1}_{k+1}}{2}){\|\Delta_{k+1}\|}^{2}
+((L~βk2+hkαkβkL~2)μk1αkhkμk12)Δk2\displaystyle\quad+((\frac{\tilde{L}\beta^{2}_{k}+h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2})\mu^{-1}_{k}-\frac{\alpha_{k}h_{k}\mu^{-1}_{k}}{2}){\|\Delta_{k}\|}^{2} (51)

By Definition 2(iv) and the nonincreasing property of \mu_{k}, we have

~(Xk,μk)~(Xk,μk1)+κ(μk1μk).\displaystyle\widetilde{\mathcal{F}}(X^{k},\mu_{k})\leq\widetilde{\mathcal{F}}(X^{k},\mu_{k-1})+\kappa(\mu_{k-1}-\mu_{k}). (52)

Combining (51) with (52), we get

Hδk+1(Xk+1,Xk,μk+1,μk)Hδk(Xk,Xk1,μk,μk1)\displaystyle H_{\delta_{k+1}}(X^{k+1},X^{k},\mu_{k+1},\mu_{k})-H_{\delta_{k}}(X^{k},X^{k-1},\mu_{k},\mu_{k-1})
\displaystyle\leq ((L~ϱhk2+hkαkβkL~2)μk1+αk+1hk+1μk+112)Δk+12+L~(βk2βk)2μk1Δk2\displaystyle((\frac{\tilde{L}-\varrho h_{k}}{2}+\frac{h_{k}\alpha_{k}-\beta_{k}\tilde{L}}{2})\mu^{-1}_{k}+\frac{\alpha_{k+1}h_{k+1}\mu^{-1}_{k+1}}{2}){\|\Delta_{k+1}\|}^{2}+\frac{\tilde{L}(\beta^{2}_{k}-\beta_{k})}{2}\mu^{-1}_{k}{\|\Delta_{k}\|}^{2}

According to the parameter constraints in Assumption 5, we have (\tilde{L}-\varrho h_{k}+h_{k}\alpha_{k}-\beta_{k}\tilde{L})\mu^{-1}_{k}+\alpha_{k}h_{k}\mu^{-1}_{k+1}<-\varepsilon h_{k}\mu^{-1}_{k+1} and \beta^{2}_{k}-\beta_{k}\leq 0. Hence inequality (45) holds, and the desired result follows.

References

  • (1) Argyriou, A., Evgeniou, T., Pontil, M.: Convex multi-task feature learning. Mach. Learn. 73, 243-272 (2008)
  • (2) Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183-202 (2009)
  • (3) Bian, W.: Smoothing accelerated algorithm for constrained nonsmooth convex optimization problems (in Chinese). Sci. Sin. Math. 50, 1651-1666 (2020)
  • (4) Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First-order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28, 2131-2151 (2018)
  • (5) Bian, W., Chen, X.: A smoothing proximal gradient algorithm for nonsmooth convex regression with cardinality penalty. SIAM J. Numer. Anal. 58, 858-883 (2020)
  • (6) Bouwmans, T., Zahzah, E.H.: Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance. Comput. Vis. Image Underst. 122, 22-34 (2014)
  • (7) Cai, J., Candes, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956-1982 (2008)
  • (8) Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm". J. Optim. Theory Appl. 166, 968-982 (2015)
  • (9) Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134, 71-99 (2012)
  • (10) Fornasier, M., Rauhut, H., Ward, R.: Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM J. Optim. 21, 1614-1640 (2011)
  • (11) He, Y., Wang, F., Li, Y., Qin, J., Chen, B.: Robust matrix completion via maximum correntropy criterion and half-quadratic optimization. IEEE T. Signal Proces. 68, 181-195 (2020)
  • (12) Ji, S., Ye, J.: An accelerated gradient method for trace norm minimization. Proceedings of the 26th annual international conference on machine learning. 457-464 (2009)
  • (13) Kulis, B., Sustik, M.A., Dhillon, I.S.: Low-rank kernel learning with Bregman matrix divergences. J. Mach. Learn. Res. 10 (2009)
  • (14) Lai, M., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \ell_{q} minimization. SIAM J. Numer. Anal. 51, 927-957 (2013)
  • (15) Lewis, A.S., Sendov, H.S.: Nonsmooth analysis of singular values. Part II: Applications. Set-Valued Anal. 13, 243-264 (2005)
  • (16) Li, W., Bian, W., Toh, K.C.: DC algorithms for a class of sparse group \ell_{0} regularized optimization problems (2021). arXiv:2109.05251
  • (17) Liang, J., Schonlieb, C.B.: Faster FISTA. European Signal Processing Conference (2018)
  • (18) Lu, Z., Zhang, Y., Lu, J.: p\ell_{p} Regularized low-rank approximation via iterative reweighted singular value minimization. Comput. Optim. Appl. 68, 619-642 (2017)
  • (19) Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. 128, 321-353 (2011)
  • (20) Ma, T.H., Lou, Y., Huang, T.Z.: Truncated \ell_{1-2} models for sparse recovery and rank minimization. SIAM J. Imaging Sci. 10, 1346-1380 (2017)
  • (21) Mesbahi, M., Papavassilopoulos, G.P.: On the rank minimization problem over a positive semidefinite linear matrix inequality. IEEE T. Automat. Contr. 42, 239-243 (1997)
  • (22) Nesterov, Y.: Introductory Lectures on Convex Programming. Kluwer Academic Publisher, Dordrecht (2004)
  • (23) Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125-161 (2013)
  • (24) Nesterov, Y.: A method for solving the convex programming problem with convergence rate O(1/k^{2}). Dokl. Akad. Nauk SSSR 269, 543-547 (1983)
  • (25) O’Donoghue, B., Candes, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715-732 (2015)
  • (26) Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Mach. Learn. 1, 127-239 (2014)
  • (27) Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. Siam Rev. 52, 471-501 (2010)
  • (28) Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170, 67-96 (2018)
  • (29) Toh, K.C., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pac. J. Optim. 6, 615-640 (2010)
  • (30) Wu, F., Bian, W., Xue, X.: Smoothing fast iterative hard thresholding algorithm for 0\ell_{0} regularized nonsmooth convex regression problem (2021). arXiv:2104.13107
  • (31) Wu, Z., Li, C., Li, M., Lim, A.: Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems. J. Global Optim. 79, 617-644 (2021)
  • (32) Wu, Z., Li, M.: General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems. Comput. Optim. Appl. 73, 129-158 (2019)
  • (33) Xu, H., Caramanis, C., Sanghavi, S.: Robust PCA via outlier pursuit. IEEE Trans. Inf. Theory. 58, 3047-3064 (2012)
  • (34) Yu, Q., Zhang, X.: A smoothing proximal gradient algorithm for matrix rank minimization problem. Comput. Optim. Appl. 1-20 (2022)
  • (35) Zhang, C., Chen, X.: Smoothing projected gradient method and its application to stochastic linear complementarity problems. SIAM J. Optim. 20, 627-649 (2009)
  • (36) Zhang, J., Yang, X., Li, G., Zhang, K.: The smoothing proximal gradient algorithm with extrapolation for the relaxation of \ell_{0} regularization problem (2021). arXiv:2112.01114
  • (37) Zhao, Q., Meng, D., Xu, Z., Yan, Y.: L_{1}-norm low-rank matrix factorization by variational Bayesian method. IEEE T. Neur. Net. Lear. 26, 825-839 (2015)
  • (38) Zheng, Y., Liu, G., Sugimoto, S., Yan, S., Okutomi, M.: Practical low-rank matrix approximation under robust L_{1}-norm. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1410-1417 (2012)